# Day19
## Web Page Structure Parsing: Using CSS Selectors with the BeautifulSoup Package
- Use the HTML parser
- Use CSS selector syntax to get child nodes
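
As a minimal sketch of what "getting child nodes" with a CSS selector can look like, the example below compares the child combinator (`>`) with the descendant combinator (a space) using BeautifulSoup's `select()`. The HTML snippet and the names `demo_html` and `demo_soup` are made up for illustration and are not part of the assignment page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for this illustration.
demo_html = """
<div id="box">
  <p>direct child</p>
  <section><p>nested grandchild</p></section>
</div>
"""
demo_soup = BeautifulSoup(demo_html, "html.parser")

# "#box > p": child combinator, only <p> elements directly under #box.
children = [p.get_text() for p in demo_soup.select("#box > p")]

# "#box p": descendant combinator, every <p> anywhere inside #box.
descendants = [p.get_text() for p in demo_soup.select("#box p")]

print(children)
print(descendants)
```

The child combinator is the stricter of the two, which helps when a page nests elements with the same tag name at several depths.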

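## Assignment
The previous day's assignment already covered some of the locating tools, so today we practice further variations of CSS selectors in more depth.

- Target site:
https://pokemondb.net/pokedex/all
- Use CSS selector techniques to scrape the Pokémon table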

In [1]:
from bs4 import BeautifulSoup
import requests

### `GET` Request

In [2]:
url = 'https://pokemondb.net/pokedex/all'

req_text = requests.get(url)

print(req_text.status_code, "\n\n", req_text.text[:2000])


200 

 <!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Pokémon Pokédex: list of Pokémon with stats | Pokémon Database</title>
<link rel="preconnect" href="https://img.pokemondb.net">
<style>@font-face{font-family:"Fira Sans";font-style:normal;font-weight:400;font-display:swap;src:url("/static/fonts/fira-sans-v10-latin-400.woff2") format("woff2");unicode-range:U+0000-00FF,U+0131,U+0152-0153,U+02BB-02BC,U+02C6,U+02DA,U+02DC,U+2000-206F,U+2074,U+20AC,U+2122,U+2191,U+2193,U+2212,U+2215,U+FEFF,U+FFFD}@font-face{font-family:"Fira Sans";font-style:italic;font-weight:400;font-display:swap;src:url("/static/fonts/fira-sans-v10-latin-400i.woff2") format("woff2");unicode-range:U+0000-00FF,U+0131,U+0152-0153,U+02BB-02BC,U+02C6,U+02DA,U+02DC,U+2000-206F,U+2074,U+20AC,U+2122,U+2191,U+2193,U+2212,U+2215,U+FEFF,U+FFFD}@font-face{font-family:"Fira Sans";font-style:normal;font-weight:700;font-display:swap;src:url("/static/fonts/fira-sans-v10-latin-600.woff2") format("woff2");unicode

### Using the HTML Parser

In [3]:
# Convert the response text into a BeautifulSoup object
soup = BeautifulSoup(req_text.text, 'html.parser')
soup.title

<title>Pokémon Pokédex: list of Pokémon with stats | Pokémon Database</title>

### Selecting a node by matching features
- Find the table that holds the Pokémon information
- Usage: `soup.find(<tag_name>, {<attribute>: <attribute_value>})`
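
The same lookup can probably be written as a CSS selector, since `#pokedex` is the selector form of `id="pokedex"`. The sketch below runs on a small made-up snippet (`sample_html` is an assumption, not the real page) and checks that `select_one()` and `find()` return the same node.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified HTML; the real page is much larger.
sample_html = """
<table id="other"><tr><td>ignore me</td></tr></table>
<table id="pokedex" class="data-table"><tr><td>Bulbasaur</td></tr></table>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

# Attribute-dictionary lookup, as in the usage line above.
by_find = sample_soup.find("table", {"id": "pokedex"})

# CSS selector lookup: tag name followed by "#<id>".
by_css = sample_soup.select_one("table#pokedex")

print(by_css is by_find)
print(by_css.td.get_text())
```

`select_one()` returns the first match or `None`, which mirrors `find()`; `select()` returns a list, which mirrors `find_all()`.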


In [4]:
table = soup.find('table', {'id': 'pokedex'})
table

<table class="data-table sticky-header block-wide" id="pokedex">
<thead>
<tr>
<th class="sorting" data-sort="int"><div class="sortwrap">#</div></th> <th class="sorting" data-sort="string"><div class="sortwrap">Name</div></th> <th><div class="sortwrap">Type</div></th> <th class="sorting" data-sort="int"><div class="sortwrap">Total</div></th> <th class="sorting" data-sort="int"><div class="sortwrap">HP</div></th> <th class="sorting" data-sort="int"><div class="sortwrap">Attack</div></th> <th class="sorting" data-sort="int"><div class="sortwrap">Defense</div></th> <th class="sorting" data-sort="int"><div class="sortwrap">Sp. Atk</div></th> <th class="sorting" data-sort="int"><div class="sortwrap">Sp. Def</div></th> <th class="sorting" data-sort="int"><div class="sortwrap">Speed</div></th> </tr>
</thead>
<tbody>
<tr>
<td class="cell-num cell-fixed" data-sort-value="1"><span class="infocard-cell-img"><span class="img-fixed icon-pkmn" data-alt="Bulbasaur icon" data-src="https://img.pokemondb

### Chained lookups
- Get all the rows in the table
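
A chained lookup can also be collapsed into a single selector. The sketch below assumes the table has explicit `<thead>` and `<tbody>` sections, as the printed table output suggests; the snippet itself is a made-up stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the table structure.
sample_html = """
<table id="pokedex">
  <thead><tr><th>#</th><th>Name</th></tr></thead>
  <tbody>
    <tr><td>001</td><td>Bulbasaur</td></tr>
    <tr><td>002</td><td>Ivysaur</td></tr>
  </tbody>
</table>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

# Header cells: any <th> inside the <thead> of the target table.
header_cells = sample_soup.select("table#pokedex > thead th")

# Data rows only: <tr> elements that are direct children of <tbody>.
data_rows = sample_soup.select("table#pokedex > tbody > tr")

print([th.get_text() for th in header_cells])
print(len(data_rows))
```

One difference worth noting: `find_all('tr')` on the whole table also returns the header row, while `tbody > tr` skips it, so no later filtering of empty rows is needed.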

In [5]:
header = table.find_all('th')
body_rows = table.find_all('tr')

In [6]:
header

[<th class="sorting" data-sort="int"><div class="sortwrap">#</div></th>,
 <th class="sorting" data-sort="string"><div class="sortwrap">Name</div></th>,
 <th><div class="sortwrap">Type</div></th>,
 <th class="sorting" data-sort="int"><div class="sortwrap">Total</div></th>,
 <th class="sorting" data-sort="int"><div class="sortwrap">HP</div></th>,
 <th class="sorting" data-sort="int"><div class="sortwrap">Attack</div></th>,
 <th class="sorting" data-sort="int"><div class="sortwrap">Defense</div></th>,
 <th class="sorting" data-sort="int"><div class="sortwrap">Sp. Atk</div></th>,
 <th class="sorting" data-sort="int"><div class="sortwrap">Sp. Def</div></th>,
 <th class="sorting" data-sort="int"><div class="sortwrap">Speed</div></th>]

In [7]:
body_rows

[<tr>
 <th class="sorting" data-sort="int"><div class="sortwrap">#</div></th> <th class="sorting" data-sort="string"><div class="sortwrap">Name</div></th> <th><div class="sortwrap">Type</div></th> <th class="sorting" data-sort="int"><div class="sortwrap">Total</div></th> <th class="sorting" data-sort="int"><div class="sortwrap">HP</div></th> <th class="sorting" data-sort="int"><div class="sortwrap">Attack</div></th> <th class="sorting" data-sort="int"><div class="sortwrap">Defense</div></th> <th class="sorting" data-sort="int"><div class="sortwrap">Sp. Atk</div></th> <th class="sorting" data-sort="int"><div class="sortwrap">Sp. Def</div></th> <th class="sorting" data-sort="int"><div class="sortwrap">Speed</div></th> </tr>,
 <tr>
 <td class="cell-num cell-fixed" data-sort-value="1"><span class="infocard-cell-img"><span class="img-fixed icon-pkmn" data-alt="Bulbasaur icon" data-src="https://img.pokemondb.net/sprites/sword-shield/icon/bulbasaur.png"></span></span><span class="infocard-cel

### Matching a node by its text: find the node whose text is Ivysaur
- Hint: use `soup.find("a", string=<text_in_the_html_node_>)`
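
Standard CSS selectors match on structure and attributes rather than on text, so one hedged alternative is to target the link's `href` attribute, or to select by class and filter on text in Python. The `ent-name` class and the `/pokedex/ivysaur` path are taken from the link markup on this page; the surrounding snippet is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet modeled on the name links in the table.
sample_html = """
<a class="ent-name" href="/pokedex/bulbasaur">Bulbasaur</a>
<a class="ent-name" href="/pokedex/ivysaur">Ivysaur</a>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

# Option 1: exact attribute match with [attr="value"].
by_href = sample_soup.select_one('a.ent-name[href="/pokedex/ivysaur"]')

# Option 2: select by class, then compare the text in Python.
by_text = [
    a for a in sample_soup.select("a.ent-name")
    if a.get_text(strip=True) == "Ivysaur"
]

print(by_href)
print(len(by_text))
```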

In [8]:
soup.find("a", string='Ivysaur')

<a class="ent-name" href="/pokedex/ivysaur" title="View Pokedex for #002 Ivysaur">Ivysaur</a>

### Finding nodes whose attribute contains partial text: find the type labels for each kind of Pokémon

- Use a regex
- `soup.find(<tag_name>, {<attribute>: <regex_pattern>})`
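
Because every type label on this page carries the shared class `type-icon`, a class selector is a possible alternative to the regex. The sketch below uses a made-up snippet and collects the distinct type names with a set comprehension over the link text.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: two table cells with type labels plus an unrelated link.
sample_html = """
<td><a class="type-icon type-grass" href="/type/grass">Grass</a>
    <a class="type-icon type-poison" href="/type/poison">Poison</a></td>
<td><a class="type-icon type-grass" href="/type/grass">Grass</a></td>
<a class="ent-name" href="/pokedex/ivysaur">Ivysaur</a>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

# ".type-icon" matches any element whose class list includes "type-icon".
type_links = sample_soup.select("a.type-icon")

# De-duplicate on the visible text rather than on the whole tag.
type_names = sorted({a.get_text() for a in type_links})

print(len(type_links))
print(type_names)
```

Collecting plain strings instead of tags gives a result that is easy to sort and compare.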

In [9]:
# Select the type labels of every Pokémon (GRASS, POISON, ...) and use a set to keep only the distinct types

import re

regex_pattern = re.compile("type-.*")
type(regex_pattern)

_sre.SRE_Pattern

In [10]:
set(soup.find_all('a', {'class': regex_pattern}))

{<a class="type-icon type-bug" href="/type/bug">Bug</a>,
 <a class="type-icon type-dark" href="/type/dark">Dark</a>,
 <a class="type-icon type-dragon" href="/type/dragon">Dragon</a>,
 <a class="type-icon type-electric" href="/type/electric">Electric</a>,
 <a class="type-icon type-fairy" href="/type/fairy">Fairy</a>,
 <a class="type-icon type-fighting" href="/type/fighting">Fighting</a>,
 <a class="type-icon type-fire" href="/type/fire">Fire</a>,
 <a class="type-icon type-flying" href="/type/flying">Flying</a>,
 <a class="type-icon type-ghost" href="/type/ghost">Ghost</a>,
 <a class="type-icon type-grass" href="/type/grass">Grass</a>,
 <a class="type-icon type-ground" href="/type/ground">Ground</a>,
 <a class="type-icon type-ice" href="/type/ice">Ice</a>,
 <a class="type-icon type-normal" href="/type/normal">Normal</a>,
 <a class="type-icon type-poison" href="/type/poison">Poison</a>,
 <a class="type-icon type-psychic" href="/type/psychic">Psychic</a>,
 <a class="type-icon type-rock" hr

### Assembling the information into a table

In [11]:
for col in header:
    print(col.text)

#
Name
Type
Total
HP
Attack
Defense
Sp. Atk
Sp. Def
Speed


In [12]:
for row in body_rows:
    #print(row.text.split('\n'))
    #print(row.find_all('td'))
    for td in row.find_all('td'):
        print(td.text)

001
Bulbasaur
Grass Poison
318
45
49
49
65
65
45
002
Ivysaur
Grass Poison
405
60
62
63
80
80
60
003
Venusaur
Grass Poison
525
80
82
83
100
100
80
003
Venusaur Mega Venusaur
Grass Poison
625
80
100
123
122
120
80
004
Charmander
Fire 
309
39
52
43
60
50
65
005
Charmeleon
Fire 
405
58
64
58
80
65
80
006
Charizard
Fire Flying
534
78
84
78
109
85
100
006
Charizard Mega Charizard X
Fire Dragon
634
78
130
111
130
85
100
006
Charizard Mega Charizard Y
Fire Flying
634
78
104
78
159
115
100
007
Squirtle
Water 
314
44
48
65
50
64
43
008
Wartortle
Water 
405
59
63
80
65
80
58
009
Blastoise
Water 
530
79
83
100
85
105
78
009
Blastoise Mega Blastoise
Water 
630
79
103
120
135
115
78
010
Caterpie
Bug 
195
45
30
35
20
20
45
011
Metapod
Bug 
205
50
20
55
25
25
30
012
Butterfree
Bug Flying
395
60
45
50
90
80
70
013
Weedle
Bug Poison
195
40
35
30
20
20
50
014
Kakuna
Bug Poison
205
45
25
50
25
25
35
015
Beedrill
Bug Poison
395
65
90
40
45
80
75
015
Beedrill Mega Beedrill
Bug Poison
495
65
150
40
15
80
145

60
62
80
63
80
60
154
Meganium
Grass 
525
80
82
100
83
100
80
155
Cyndaquil
Fire 
309
39
52
43
60
50
65
156
Quilava
Fire 
405
58
64
58
80
65
80
157
Typhlosion
Fire 
534
78
84
78
109
85
100
157
Typhlosion Hisuian Typhlosion
Fire Ghost
534
73
84
78
119
85
95
158
Totodile
Water 
314
50
65
64
44
48
43
159
Croconaw
Water 
405
65
80
80
59
63
58
160
Feraligatr
Water 
530
85
105
100
79
83
78
161
Sentret
Normal 
215
35
46
34
35
45
20
162
Furret
Normal 
415
85
76
64
45
55
90
163
Hoothoot
Normal Flying
262
60
30
30
36
56
50
164
Noctowl
Normal Flying
452
100
50
50
86
96
70
165
Ledyba
Bug Flying
265
40
20
30
40
80
55
166
Ledian
Bug Flying
390
55
35
50
55
110
85
167
Spinarak
Bug Poison
250
40
60
40
40
40
30
168
Ariados
Bug Poison
400
70
90
70
60
70
40
169
Crobat
Poison Flying
535
85
90
80
70
80
130
170
Chinchou
Water Electric
330
75
38
38
56
56
67
171
Lanturn
Water Electric
460
125
58
58
76
76
67
172
Pichu
Electric 
205
20
40
15
35
35
60
173
Cleffa
Fairy 
218
50
25
28
45
55
15
174
Igglybuff
Normal F

85
314
Illumise
Bug 
430
65
47
75
73
85
85
315
Roselia
Grass Poison
400
50
60
45
100
80
65
316
Gulpin
Poison 
302
70
43
53
43
53
40
317
Swalot
Poison 
467
100
73
83
73
83
55
318
Carvanha
Water Dark
305
45
90
20
65
20
65
319
Sharpedo
Water Dark
460
70
120
40
95
40
95
319
Sharpedo Mega Sharpedo
Water Dark
560
70
140
70
110
65
105
320
Wailmer
Water 
400
130
70
35
70
35
60
321
Wailord
Water 
500
170
90
45
90
45
60
322
Numel
Fire Ground
305
60
60
40
65
45
35
323
Camerupt
Fire Ground
460
70
100
70
105
75
40
323
Camerupt Mega Camerupt
Fire Ground
560
70
120
100
145
105
20
324
Torkoal
Fire 
470
70
85
140
85
70
20
325
Spoink
Psychic 
330
60
25
35
70
80
60
326
Grumpig
Psychic 
470
80
45
65
90
110
80
327
Spinda
Normal 
360
60
60
60
60
60
60
328
Trapinch
Ground 
290
45
100
45
45
45
10
329
Vibrava
Ground Dragon
340
50
70
50
50
50
70
330
Flygon
Ground Dragon
520
80
100
80
80
80
100
331
Cacnea
Grass 
335
50
85
40
85
40
35
332
Cacturne
Grass Dark
475
70
115
60
115
60
55
333
Swablu
Normal Flying
310
45

41
50
37
50
37
66
510
Liepard
Dark 
446
64
88
50
88
50
106
511
Pansage
Grass 
316
50
53
48
53
48
64
512
Simisage
Grass 
498
75
98
63
98
63
101
513
Pansear
Fire 
316
50
53
48
53
48
64
514
Simisear
Fire 
498
75
98
63
98
63
101
515
Panpour
Water 
316
50
53
48
53
48
64
516
Simipour
Water 
498
75
98
63
98
63
101
517
Munna
Psychic 
292
76
25
45
67
55
24
518
Musharna
Psychic 
487
116
55
85
107
95
29
519
Pidove
Normal Flying
264
50
55
50
36
30
43
520
Tranquill
Normal Flying
358
62
77
62
50
42
65
521
Unfezant
Normal Flying
488
80
115
80
65
55
93
522
Blitzle
Electric 
295
45
60
32
50
32
76
523
Zebstrika
Electric 
497
75
100
63
80
63
116
524
Roggenrola
Rock 
280
55
75
85
25
25
15
525
Boldore
Rock 
390
70
105
105
50
40
20
526
Gigalith
Rock 
515
85
135
130
60
80
25
527
Woobat
Psychic Flying
323
65
45
43
55
43
72
528
Swoobat
Psychic Flying
425
67
57
55
77
55
114
529
Drilbur
Ground 
328
60
85
40
30
45
68
530
Excadrill
Ground Steel
508
110
135
60
50
65
88
531
Audino
Normal 
445
103
60
86
60
86
50
531


72
686
Inkay
Dark Psychic
288
53
54
53
37
46
45
687
Malamar
Dark Psychic
482
86
92
88
68
75
73
688
Binacle
Rock Water
306
42
52
67
39
56
50
689
Barbaracle
Rock Water
500
72
105
115
54
86
68
690
Skrelp
Poison Water
320
50
60
60
60
60
30
691
Dragalge
Poison Dragon
494
65
75
90
97
123
44
692
Clauncher
Water 
330
50
53
62
58
63
44
693
Clawitzer
Water 
500
71
73
88
120
89
59
694
Helioptile
Electric Normal
289
44
38
33
61
43
70
695
Heliolisk
Electric Normal
481
62
55
52
109
94
109
696
Tyrunt
Rock Dragon
362
58
89
77
45
45
48
697
Tyrantrum
Rock Dragon
521
82
121
119
69
59
71
698
Amaura
Rock Ice
362
77
59
50
67
63
46
699
Aurorus
Rock Ice
521
123
77
72
99
92
58
700
Sylveon
Fairy 
525
95
65
65
110
130
60
701
Hawlucha
Fighting Flying
500
78
92
75
74
63
118
702
Dedenne
Electric Fairy
431
67
58
57
81
67
101
703
Carbink
Rock Fairy
500
50
50
150
50
150
50
704
Goomy
Dragon 
300
45
50
35
55
75
40
705
Sliggoo
Dragon 
452
68
75
53
83
113
60
705
Sliggoo Hisuian Sliggoo
Dragon Steel
452
58
75
83
83
113
40


30
843
Silicobra
Ground 
315
52
57
75
35
50
46
844
Sandaconda
Ground 
510
72
107
125
65
70
71
845
Cramorant
Flying Water
475
70
85
55
85
95
85
846
Arrokuda
Water 
280
41
63
40
40
30
66
847
Barraskewda
Water 
490
61
123
60
60
50
136
848
Toxel
Electric Poison
242
40
38
35
54
35
40
849
Toxtricity Amped Form
Electric Poison
502
75
98
70
114
70
75
849
Toxtricity Low Key Form
Electric Poison
502
75
98
70
114
70
75
850
Sizzlipede
Fire Bug
305
50
65
45
50
50
45
851
Centiskorch
Fire Bug
525
100
115
65
90
90
65
852
Clobbopus
Fighting 
310
50
68
60
50
50
32
853
Grapploct
Fighting 
480
80
118
90
70
80
42
854
Sinistea
Ghost 
308
40
45
45
74
54
50
855
Polteageist
Ghost 
508
60
65
65
134
114
70
856
Hatenna
Psychic 
265
42
30
45
56
53
39
857
Hattrem
Psychic 
370
57
40
65
86
73
49
858
Hatterene
Psychic Fairy
510
57
90
95
136
103
29
859
Impidimp
Dark Fairy
265
45
45
30
55
40
50
860
Morgrem
Dark Fairy
370
65
60
45
75
55
70
861
Grimmsnarl
Dark Fairy
510
95
120
65
95
75
60
862
Obstagoon
Dark Normal
520
93


In [13]:
header_cols = [col.text for col in header]
row_values  = list(filter(None, [[td.text for td in row.find_all('td')] for row in body_rows]))
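
An alternative sketch for building the rows: read the type names from the individual `a.type-icon` links instead of splitting the cell text on spaces afterwards. The HTML below is a simplified, made-up stand-in for two table rows, and `records` is a hypothetical name.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified rows; the real cells carry more attributes.
sample_html = """
<table id="pokedex"><tbody>
<tr><td>001</td><td><a class="ent-name">Bulbasaur</a></td>
    <td><a class="type-icon type-grass">Grass</a> <a class="type-icon type-poison">Poison</a></td></tr>
<tr><td>004</td><td><a class="ent-name">Charmander</a></td>
    <td><a class="type-icon type-fire">Fire</a> </td></tr>
</tbody></table>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

records = []
for tr in sample_soup.select("table#pokedex tbody tr"):
    cells = tr.select("td")
    records.append({
        "#": cells[0].get_text(strip=True),
        "Name": cells[1].get_text(" ", strip=True),
        # One list entry per type link, so no trailing-space cleanup is needed.
        "Type": [a.get_text() for a in cells[2].select("a.type-icon")],
    })

print(records)
```

A list of dictionaries like this can be passed directly to `pd.DataFrame(records)`.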

In [14]:
header_cols

['#',
 'Name',
 'Type',
 'Total',
 'HP',
 'Attack',
 'Defense',
 'Sp. Atk',
 'Sp. Def',
 'Speed']

In [15]:
row_values

[['001',
  'Bulbasaur',
  'Grass Poison',
  '318',
  '45',
  '49',
  '49',
  '65',
  '65',
  '45'],
 ['002', 'Ivysaur', 'Grass Poison', '405', '60', '62', '63', '80', '80', '60'],
 ['003',
  'Venusaur',
  'Grass Poison',
  '525',
  '80',
  '82',
  '83',
  '100',
  '100',
  '80'],
 ['003',
  'Venusaur Mega Venusaur',
  'Grass Poison',
  '625',
  '80',
  '100',
  '123',
  '122',
  '120',
  '80'],
 ['004', 'Charmander', 'Fire ', '309', '39', '52', '43', '60', '50', '65'],
 ['005', 'Charmeleon', 'Fire ', '405', '58', '64', '58', '80', '65', '80'],
 ['006',
  'Charizard',
  'Fire Flying',
  '534',
  '78',
  '84',
  '78',
  '109',
  '85',
  '100'],
 ['006',
  'Charizard Mega Charizard X',
  'Fire Dragon',
  '634',
  '78',
  '130',
  '111',
  '130',
  '85',
  '100'],
 ['006',
  'Charizard Mega Charizard Y',
  'Fire Flying',
  '634',
  '78',
  '104',
  '78',
  '159',
  '115',
  '100'],
 ['007', 'Squirtle', 'Water ', '314', '44', '48', '65', '50', '64', '43'],
 ['008', 'Wartortle', 'Water ', '4

In [16]:
import pandas as pd

df = pd.DataFrame(row_values, columns=header_cols)
df['Type'] = df['Type'].apply(lambda x: x.strip().split(' '))
df

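|      | #   | Name                     | Type               | Total | HP  | Attack | Defense | Sp. Atk | Sp. Def | Speed |
| ---- | --- | ------------------------ | ------------------ | ----- | --- | ------ | ------- | ------- | ------- | ----- |
| 0    | 001 | Bulbasaur                | [Grass, Poison]    | 318   | 45  | 49     | 49      | 65      | 65      | 45    |
| 1    | 002 | Ivysaur                  | [Grass, Poison]    | 405   | 60  | 62     | 63      | 80      | 80      | 60    |
| 2    | 003 | Venusaur                 | [Grass, Poison]    | 525   | 80  | 82     | 83      | 100     | 100     | 80    |
| 3    | 003 | Venusaur Mega Venusaur   | [Grass, Poison]    | 625   | 80  | 100    | 123     | 122     | 120     | 80    |
| 4    | 004 | Charmander               | [Fire]             | 309   | 39  | 52     | 43      | 60      | 50      | 65    |
| ...  | ... | ...                      | ...                | ...   | ... | ...    | ...     | ...     | ...     | ...   |
| 1070 | 902 | Basculegion Female       | [Water, Ghost]     | 530   | 120 | 92     | 65      | 100     | 75      | 78    |
| 1071 | 903 | Sneasler                 | [Poison, Fighting] | 510   | 80  | 130    | 60      | 40      | 80      | 120   |
| 1072 | 904 | Overqwil                 | [Dark, Poison]     | 510   | 85  | 115    | 95      | 65      | 65      | 85    |
| 1073 | 905 | Enamorus Incarnate Forme | [Fairy, Flying]    | 580   | 74  | 115    | 70      | 135     | 80      | 106   |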


In [17]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1075 entries, 0 to 1074
Data columns (total 10 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   #        1075 non-null   object
 1   Name     1075 non-null   object
 2   Type     1075 non-null   object
 3   Total    1075 non-null   object
 4   HP       1075 non-null   object
 5   Attack   1075 non-null   object
 6   Defense  1075 non-null   object
 7   Sp. Atk  1075 non-null   object
 8   Sp. Def  1075 non-null   object
 9   Speed    1075 non-null   object
dtypes: object(10)
memory usage: 84.1+ KB
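
The `info()` output shows every column as `object`, because the scraped cell text is stored as strings. A minimal sketch of converting the stat columns to numbers with `pd.to_numeric` follows; `sample_df` is a made-up two-row frame, and keeping `#` as a string (to preserve leading zeros such as `001`) is a design assumption, not something the assignment prescribes.

```python
import pandas as pd

# Hypothetical two-row frame with string values, mimicking the scraped data.
sample_df = pd.DataFrame(
    [["001", "Bulbasaur", "318", "45"], ["002", "Ivysaur", "405", "60"]],
    columns=["#", "Name", "Total", "HP"],
)

# Convert only the stat columns; "#" stays a string to keep its leading zeros.
numeric_cols = ["Total", "HP"]
sample_df[numeric_cols] = sample_df[numeric_cols].apply(pd.to_numeric)

print(sample_df.dtypes)
print(sample_df["HP"].sum())
```

With numeric dtypes, operations such as sorting by `Total` or averaging `HP` work on values rather than on text.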
