# Pokemon Moves Data Scrape

The purpose of this program is to scrape data from the website https://pokemondb.net/move/all
and save it as a CSV file in a format that facilitates analysis.

It seeks to answer Question 3: Which Pokémon types have the most powerful moves? When move power is combined with the stats of the Pokémon themselves (e.g. HP, Speed), which Pokémon are the strongest?

The following tutorial was referenced to complete this web scraping project:
https://towardsdatascience.com/web-scraping-html-tables-with-python-c9baba21059

### 1. Obtain content of table from webpage
First, visit the website and observe the structure of its HTML elements:
https://pokemondb.net/move/all

In [1]:
import requests  # gets html contents of website
import lxml.html as lh   # parses fields
import pandas as pd    # work with data in dataframe
from lxml.cssselect import CSSSelector

In [2]:
# store website html
url = "https://pokemondb.net/move/all"

# create a handle for contents of website
page = requests.get(url)

# use the status_code attribute to check the HTTP response
page.status_code 

200

In [3]:
# .status_code returned 200, which means the request was successful and the server responded with the requested data.
# https://realpython.com/python-requests/
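
The meaning of a status code can also be looked up programmatically. The sketch below uses only the standard library; `describe_status` is a hypothetical helper for illustration, not part of `requests`.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short description of an HTTP status code (hypothetical helper)."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # the code is not one of the registered HTTP status codes
        phrase = "Unknown"
    # 2xx codes signal success; anything else deserves a closer look before parsing
    outcome = "success" if 200 <= code < 300 else "not a success"
    return "{} {} ({})".format(code, phrase, outcome)

print(describe_status(200))
print(describe_status(404))
```

In the notebook itself, calling `page.raise_for_status()` is a common alternative: it raises an exception for 4xx and 5xx responses instead of requiring a manual check.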

In [4]:
# store contents of website 
# lh.fromstring(html): parses an HTML document or fragment from the given string (here, the raw bytes in page.content)
doc = lh.fromstring(page.content)

# parse data that are stored in rows (between <tr>..</tr> of HTML)
# https://www.w3schools.com/tags/tag_tr.asp
# the xpath method performs a global XPath query against the document or root node
tr_elements = doc.xpath('//tr')
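
To see what the row query returns without hitting the network, here is a minimal sketch on a tiny sample table. It uses the standard library's `xml.etree.ElementTree` as a stand-in for `lxml` (an assumption made so the example runs without extra packages; unlike `lxml.html`, it needs well-formed markup). The sample markup is invented for illustration.

```python
import xml.etree.ElementTree as ET

# invented sample markup, much smaller than the real page
sample = """
<table>
  <tr><th>Name</th><th>Type</th><th>Power</th></tr>
  <tr><td>Absorb</td><td>Grass</td><td>20</td></tr>
</table>
"""

root = ET.fromstring(sample)

# './/tr' finds every <tr> below the root, similar to doc.xpath('//tr') in lxml
rows = root.findall('.//tr')

# len(row) counts the child cells (<th> or <td>) of a row
cell_counts = [len(row) for row in rows]

# collect the text of each cell in the header row
header_text = ' '.join(cell.text for cell in rows[0])

print(cell_counts)
print(header_text)
```

Each row element behaves like a list of its cells, which is why taking `len()` of a row gives the number of columns in it.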

### 2. Explore format of content

In [5]:
tr_elements

[<Element tr at 0x11da5f1d8>,
 <Element tr at 0x11da5f278>,
 <Element tr at 0x11da5f728>,
 <Element tr at 0x11da5f318>,
 <Element tr at 0x11da5f7c8>,
 <Element tr at 0x11da5f818>,
 <Element tr at 0x11da5f868>,
 <Element tr at 0x11da5f8b8>,
 <Element tr at 0x11da5f908>,
 <Element tr at 0x11da5f958>,
 <Element tr at 0x11da5f9a8>,
 <Element tr at 0x11da5f9f8>,
 <Element tr at 0x11da5fa48>,
 <Element tr at 0x11da5fa98>,
 <Element tr at 0x11da5fae8>,
 <Element tr at 0x11da5fb38>,
 <Element tr at 0x11da5fb88>,
 <Element tr at 0x11da5fbd8>,
 <Element tr at 0x11da5fc28>,
 <Element tr at 0x11da5fc78>,
 <Element tr at 0x11da5fcc8>,
 <Element tr at 0x11da5fd18>,
 <Element tr at 0x11da5fd68>,
 <Element tr at 0x11da5fdb8>,
 <Element tr at 0x11da5fe08>,
 <Element tr at 0x11da5fe58>,
 <Element tr at 0x11da5fea8>,
 <Element tr at 0x11da5fef8>,
 <Element tr at 0x11da5ff48>,
 <Element tr at 0x11da5ff98>,
 <Element tr at 0x11da7d048>,
 <Element tr at 0x11da7d098>,
 <Element tr at 0x11da7d0e8>,
 <Element 

In [6]:
# check the length (number of cells) of every row to ensure that the table is uniformly formatted
[len(T) for T in tr_elements]

[9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,
 9,


In [7]:
# preview contents 
page.content[:500]

b'<!DOCTYPE html>\n<html lang="en">\n<head>\n<meta charset="utf-8">\n<title>Pok\xc3\xa9mon moves: list of attacks | Pok\xc3\xa9mon Database</title>\n<link rel="preconnect" href="https://fonts.gstatic.com">\n<link rel="preconnect" href="https://img.pokemondb.net">\n<link rel="stylesheet" href="/static/css/pokemondb-3866aea80e.css">\n<meta name="viewport" content="width=device-width, initial-scale=1">\n<meta property="og:description" name="description" content="There are over 460 Pok\xc3\xa9mon attacks in total; this page lis'

In [8]:
# preview contents
tr_elements[0].text_content()

'\nName Type Cat. Power Acc. PP TM Effect Prob. (%) '

In [9]:
# preview content
tr_elements[2].text_content()

'\nAbsorbGrass 20 100 25\n\nUser recovers half the HP inflicted on opponent.\n—\n'

In [10]:
# observe length/row count of table
len(tr_elements)

805

### 3. Define table structure

In [11]:
## Parse the first row as a header
# create empty list to store (column name, values) tuples
col = []

# for each cell in the header row, store the column name and an empty list
for i, t in enumerate(tr_elements[0]):
    name = t.text_content()
    print('{:2d}: {:s}'.format(i+1, name))
    col.append((name, []))

 1: Name
 2: Type
 3: Cat.
 4: Power
 5: Acc.
 6: PP
 7: TM
 8: Effect
 9: Prob. (%)


In [12]:
# view resulting list of tuples
col

[('Name', []),
 ('Type', []),
 ('Cat.', []),
 ('Power', []),
 ('Acc.', []),
 ('PP', []),
 ('TM', []),
 ('Effect', []),
 ('Prob. (%)', [])]

### 4. Store text content in list of tuples

In [13]:
# parse through data and store each point from a row in a column
for r in range(1,len(tr_elements)):  # skip the header row (index 0) and visit every remaining row once
    T=tr_elements[r]  # temporarily store the current row in T
    
    # stop at the first row that does not have exactly 9 cells, since it does not match the table format
    if len(T) != 9:
        break
    
    # initiate column index, i
    i=0
    
    # iterate through each cell of the row
    for t in T.iterchildren():   # iterate through children of element. https://kite.com/python/docs/lxml.etree.ElementBase.iterchildren
        data=t.text_content()
        # leave the first column (Name) as text
        if i>0:
            # convert numeric values to integers
            try:
                data=int(data)
            except ValueError:
                pass  # if not numeric, keep the original text and move on to the next element
        # append data to the list of the appropriate column. Remember, col is a list of tuples, in which the first element (0) is the column name and the second element is the list of values
        col[i][1].append(data)
        # increment i for next column
        i+=1
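
The conversion rule inside the loop can be isolated into a small function to see how it treats different kinds of cell. This is only a sketch: `to_int_or_text` is a hypothetical name, and the sample values mirror cells seen in the table (numbers, the dash placeholder, an empty cell, and a TM code).

```python
def to_int_or_text(value):
    """Return value as an int when it is numeric, otherwise return it unchanged."""
    try:
        return int(value)
    except ValueError:
        return value

DASH = "\u2014"  # the dash placeholder that appears in the table's cells

samples = ["20", "100", DASH, "", "TM62"]
converted = [to_int_or_text(s) for s in samples]
print(converted)
```

Because numbers and placeholders end up in the same list, a column such as Power holds a mix of integers and strings, which is worth keeping in mind before doing arithmetic on it.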

In [14]:
## Check number of observations in each column. 
print('Size of original table, including column labels:',len(tr_elements))
print('\nSize of columns in parsed table. All should be 1 less than size of original table')
for c in col:
    print(c[0],':',len(c[1]))

Size of original table, including column labels: 805

Size of columns in parsed table. All should be 1 less than size of original table
Name : 804
Type : 804
Cat. : 804
Power : 804
Acc. : 804
PP : 804
TM : 804
Effect : 804
Prob. (%) : 804


### 5. Convert to Pandas Dataframe

In [15]:
# first, create a dictionary of lists using a Python dict comprehension
# https://stackoverflow.com/questions/1747817/create-a-dictionary-with-list-comprehension
Dict = {title:column for title, column in col}
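
As a small illustration of the comprehension above, the sketch below applies the same pattern to a shortened version of `col`. The names `sample_col` and `sample_dict` are hypothetical, and the length check at the end is an extra step that is not part of the original notebook.

```python
# a shortened stand-in for col: a list of (column name, values) tuples
sample_col = [
    ('Name', ['Absorb', 'Acid']),
    ('Type', ['Grass', 'Poison']),
    ('Power', [20, 40]),
]

# each tuple unpacks into a key (title) and a value (column)
sample_dict = {title: column for title, column in sample_col}

print(list(sample_dict))
print(sample_dict['Power'])

# every column should hold the same number of values before building a DataFrame
lengths = {len(column) for column in sample_dict.values()}
print(lengths)
```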

In [16]:
# preview the Cat. (category) column, which is unlikely to load correctly: text_content() only captures text, and these cells appear to hold none
Dict["Cat."]

['',
 '',
 '',
 '',
 '',
 '—',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '—',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '—',
 '',
 '',
 '',
 '',
 '—',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '—',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '—',
 '',
 '',
 '',
 '',
 '—',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '—',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '
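
One possible reason for the empty strings is that the category is drawn as an icon whose label sits in an attribute rather than in the cell text. The sketch below shows that idea on invented markup. The `<img title="...">` structure is an assumption, not something confirmed from the page, so inspect the real cell in the browser before relying on it. The example uses `xml.etree.ElementTree` as a stand-in for `lxml`, and `cell_label` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# invented markup: ASSUMES the category is an icon with a title attribute
sample_row = ET.fromstring(
    '<tr>'
    '<td>Absorb</td>'
    '<td><img src="special.png" title="Special" /></td>'
    '<td>\u2014</td>'
    '</tr>'
)

def cell_label(cell):
    """Return the cell text, or a title attribute found in the cell if there is no text."""
    text = ''.join(cell.itertext()).strip()
    if text:
        return text
    # no visible text: look for a title attribute on the cell or its descendants
    for element in cell.iter():
        title = element.get('title')
        if title:
            return title
    return ''

labels = [cell_label(td) for td in sample_row]
print(labels)
```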

In [17]:
# create pandas dataframe from dictionary
df = pd.DataFrame(Dict)
# preview column format
df

|   | Name | Type | Cat. | Power | Acc. | PP | TM | Effect | Prob. (%) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10,000,000 Volt Thunderbolt | Electric | | 195 | — | 1 | | Pikachu-exclusive Z-Move. | |
| 1 | Absorb | Grass | | 20 | 100 | 25 | | User recovers half the HP inflicted on opponent. | — |
| 2 | Accelerock | Rock | | 40 | 100 | 20 | | User attacks first. | |
| 3 | Acid | Poison | | 40 | 100 | 30 | | May lower opponent's Special Defense. | 10 |
| 4 | Acid Armor | Poison | | — | — | 20 | | Sharply raises user's Defense. | — |
| 5 | Acid Downpour | Poison | — | — | — | 1 | | Poison type Z-Move. | |
| 6 | Acid Spray | Poison | | 40 | 100 | 20 | | Sharply lowers opponent's Special Defense. | 100 |
| 7 | Acrobatics | Flying | | 55 | 100 | 15 | TM62 | Stronger when the user does not have a held item. | — |
| 8 | Acupressure | Normal | | — | — | 30 | | Sharply raises a random stat. | — |
| 9 | Aerial Ace | Flying | | 60 | ∞ | 20 | TM40 | Ignores Accuracy and Evasiveness. | — |


### 6. Export as CSV file

In [18]:
df.to_csv('pokemon_moves.csv', index=False)
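
One detail worth checking after export is that move names containing commas, such as "10,000,000 Volt Thunderbolt", survive a round trip. `DataFrame.to_csv` quotes such fields by default. The sketch below illustrates the same quoting behaviour with the standard library's `csv` module as a stand-in (an assumption made to keep the example dependency-free), writing to an in-memory buffer rather than to `pokemon_moves.csv`.

```python
import csv
import io

rows = [
    ['Name', 'Type', 'Power'],
    ['10,000,000 Volt Thunderbolt', 'Electric', '195'],
]

# write the rows as CSV text into an in-memory buffer
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(text)

# reading the text back recovers the original fields, commas included
read_back = list(csv.reader(io.StringIO(text)))
print(read_back[1][0])
```

The field with embedded commas is wrapped in double quotes on the way out, so a CSV reader does not split it into several columns on the way back in.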