# Getting data directly from a website
This notebook walks through the steps for collecting data from [Bulbapedia's National Pokédex](https://bulbapedia.bulbagarden.net/wiki/List_of_Pok%C3%A9mon_by_National_Pok%C3%A9dex_number) using `requests` and `BeautifulSoup`.

### Import `requests` library
The `requests` package fetches a web page's HTML so that you can extract data from it. Let's save the page's URL in the `URL` variable.

In [1]:
import requests
import pandas as pd
import numpy as np
URL = "https://bulbapedia.bulbagarden.net/wiki/List_of_Pok%C3%A9mon_by_National_Pok%C3%A9dex_number"

### Load the page

Because the page is fetched from the live website, an internet connection is required for the `requests` call to work.

In [2]:
page = requests.get(URL)
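
A request can fail quietly (for example with a 404 or 503 response), and parsing an error page would produce confusing results later on. The following is a minimal sketch of a more defensive fetch. The helper name `fetch_page` and the 10-second timeout are assumptions made for illustration, not part of the original notebook.

```python
import requests


def fetch_page(url, timeout=10):
    """Return the raw HTML bytes of `url`, raising if the request fails."""
    # The timeout stops the call from waiting forever on a dead connection
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError on 4xx/5xx responses
    response.raise_for_status()
    return response.content
```

With this helper, `fetch_page(URL)` returns the same bytes as `page.content`, so its result could be passed to the parser in the next step.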

### Parse HTML data

#### BeautifulSoup
This is a Python package that lets you parse the contents of an HTML document. The code below tells BeautifulSoup to parse the page content fetched with `requests.get`.

In [3]:
from bs4 import BeautifulSoup

soup = BeautifulSoup(page.content, 'html.parser')
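
To get a feel for how the parsed tree is navigated, here is a small sketch on a hand-written HTML fragment. The fragment only mimics the general shape of a wiki table and is an assumption; the real page's markup is more complex and may differ. It shows that `.contents` keeps the whitespace between tags as text nodes, which is why table rows can end up on alternating indexes, while `find_all("tr")` returns only the row tags.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hand-written fragment for illustration only (not the real Bulbapedia markup)
html = """<div id="mw-content-text">
<table>
<tr><th>Ndex</th><th>Name</th></tr>
<tr><td>#001</td><td>Bulbasaur</td></tr>
<tr><td>#004</td><td>Charmander</td></tr>
</table>
</div>"""

demo_soup = BeautifulSoup(html, "html.parser")
content = demo_soup.find(id="mw-content-text")
table = content.find_all("table")[0]

# .contents lists every child node, including the newline text between rows
children = table.contents
is_tag = [isinstance(child, Tag) for child in children]

# find_all("tr") skips the whitespace text nodes and returns only row tags
rows = table.find_all("tr")
first_row_cells = [cell.get_text() for cell in rows[1].find_all("td")]
```

Here `children` has seven entries, alternating between newline strings and `<tr>` tags, while `rows` holds just the three rows.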

## Findings
The number of text elements in a row indicates which kind of entry it is:

    7 - Other regional form with one type
    8 - Other regional form with two types
    9 - Original form with one type
    10 - Original form with two types

# How to Scrape

Pseudocode:

    for each row in the Pokémon info (even index that is not zero):
        count the elements in the row's text after splitting it on newlines

        if 7: get the index corresponding to each field

        elif 8: get the index corresponding to each field

        elif 9: get the index corresponding to each field

        elif 10: get the index corresponding to each field

        else: print a message to flag the unexpected row
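
The branching above can also be sketched as a lookup table that maps the element count to the position of each field. This swaps the `if`/`elif` chain for a dictionary, which is an alternative formulation rather than the code used in this notebook. The names `LAYOUTS` and `parse_row` are hypothetical, the positions are the ones used later in this notebook, and the sample row uses empty-string placeholders in the positions that are not read.

```python
# Element count -> positions of (kdex, ndex, name, type1, type2).
# None means the row has no second type.
LAYOUTS = {
    7: (0, 1, 4, 6, None),   # other regional form, one type
    8: (0, 1, 4, 6, 7),      # other regional form, two types
    9: (0, 2, 6, 8, None),   # original form, one type
    10: (0, 2, 6, 8, 9),     # original form, two types
}


def parse_row(fields):
    """Return a dict of named fields, or None for an unknown row length."""
    layout = LAYOUTS.get(len(fields))
    if layout is None:
        return None
    kdex_i, ndex_i, name_i, type1_i, type2_i = layout
    return {
        "kdex": fields[kdex_i],
        "ndex": fields[ndex_i],
        "poke_name": fields[name_i],
        "type1": fields[type1_i],
        "type2": fields[type2_i] if type2_i is not None else "",
    }


# Illustrative 10-element row; the empty strings are placeholders
sample = ["#001", "", "#001", "", "", "", "Bulbasaur", "", "Grass", "Poison"]
parsed = parse_row(sample)
```

Returning `None` for an unexpected length lets the caller decide whether to print a warning, skip the row, or stop.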

# Strategy 2: JSON Conversion per Loop

In [7]:
gen_json = []

# Index of the child node in each generation table that holds the Pokémon rows
info_start = 1

for generation in range(1, 9):
    poke_content=soup.find(id='mw-content-text')
    poke_tables=poke_content.find_all('table')
    gen_list=poke_tables[generation]
    info_row=gen_list.contents[info_start]
    

    for even_index_chec, pokemon_info_values in enumerate(info_row.contents):
        # Each Pokémon's values are stored at even indexes (divisible by 2 and not 0)
        if even_index_chec % 2 == 0 and even_index_chec != 0:
            pokemon_raw_info = pokemon_info_values.text.strip().split('\n')
            scrapetime = np.datetime64('now')


            # Other regional forms with one type
            if len(pokemon_raw_info) == 7:
                kdex = pokemon_raw_info[0]
                ndex = pokemon_raw_info[1]
                poke_name = pokemon_raw_info[4]
                type1 = pokemon_raw_info[6]
                type2 = ''
                categ = 'Other Form Single Type'
    #             print(kdex,ndex,poke_name, type1,type2)
    #             print(pokemon_info_values.text.strip().split('\n'))
    #             print('####')

            # Other regional forms with two types
            elif len(pokemon_raw_info) == 8:
                kdex = pokemon_raw_info[0]
                ndex = pokemon_raw_info[1]
                poke_name = pokemon_raw_info[4]
                type1 = pokemon_raw_info[6]
                type2 = pokemon_raw_info[7]
                categ = 'Other Form Multi Type'
    #             print(kdex,ndex,poke_name, type1,type2)
    #             print(pokemon_info_values.text.strip().split('\n'))
    #             print('####')

            # Original forms with one type
            elif len(pokemon_raw_info) == 9:
                kdex = pokemon_raw_info[0]
                ndex = pokemon_raw_info[2]
                poke_name = pokemon_raw_info[6]
                type1 = pokemon_raw_info[8]
                type2 = ''
                categ = 'Orig Form Single Type'
    #             print(kdex,ndex,poke_name, type1,type2)
    #             print(pokemon_info_values.text.strip().split('\n'))
    #             print('####')


            # Original forms with two types
            elif len(pokemon_raw_info) == 10:
                kdex = pokemon_raw_info[0]
                ndex = pokemon_raw_info[2]
                poke_name = pokemon_raw_info[6]
                type1 = pokemon_raw_info[8]
                type2 = pokemon_raw_info[9]
                categ = 'Orig Form Multi Type'
    #             print(kdex,ndex,poke_name, type1,type2)
    #             print(pokemon_info_values.text.strip().split('\n'))
    #             print('####')

            else:
                # Unknown row layout: flag it and skip it so stale values are not appended
                print('Check out rows containing ' + str(len(pokemon_raw_info)) + ' elements')
                continue

            # Save the record as a dictionary
            gen_json.append({"kdex" : kdex,
                              "ndex" : ndex,
                              "poke_name" : poke_name,
                              "type1" : type1,
                              "type2" : type2,
                              "generation" : generation,
                              "scrapetime" : scrapetime})
        

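A list of dictionaries like `gen_json` converts directly into a pandas DataFrame, which is easier to inspect and filter. The sketch below uses two records copied from the output shown in this notebook. The names `records`, `poke_df`, and the `ndex_num` column are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# Two records in the same shape as the entries appended to gen_json
records = [
    {"kdex": "#001", "ndex": "#001", "poke_name": "Bulbasaur",
     "type1": "Grass", "type2": "Poison", "generation": 1,
     "scrapetime": np.datetime64("2022-06-13T02:54:55")},
    {"kdex": "#004", "ndex": "#004", "poke_name": "Charmander",
     "type1": "Fire", "type2": "", "generation": 1,
     "scrapetime": np.datetime64("2022-06-13T02:54:55")},
]

# Each dictionary becomes one row; the keys become the column names
poke_df = pd.DataFrame(records)

# Strip the leading '#' so the Pokédex number can be used as an integer
poke_df["ndex_num"] = poke_df["ndex"].str.lstrip("#").astype(int)
```

In the notebook itself, `pd.DataFrame(gen_json)` would build the same kind of table from the full scrape.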

In [12]:
gen_json

[{'kdex': '#001',
  'ndex': '#001',
  'poke_name': 'Bulbasaur',
  'type1': 'Grass',
  'type2': 'Poison',
  'generation': 1,
  'scrapetime': numpy.datetime64('2022-06-13T02:54:55')},
 {'kdex': '#002',
  'ndex': '#002',
  'poke_name': 'Ivysaur',
  'type1': 'Grass',
  'type2': 'Poison',
  'generation': 1,
  'scrapetime': numpy.datetime64('2022-06-13T02:54:55')},
 {'kdex': '#003',
  'ndex': '#003',
  'poke_name': 'Venusaur',
  'type1': 'Grass',
  'type2': 'Poison',
  'generation': 1,
  'scrapetime': numpy.datetime64('2022-06-13T02:54:55')},
 {'kdex': '#004',
  'ndex': '#004',
  'poke_name': 'Charmander',
  'type1': 'Fire',
  'type2': '',
  'generation': 1,
  'scrapetime': numpy.datetime64('2022-06-13T02:54:55')},
 {'kdex': '#005',
  'ndex': '#005',
  'poke_name': 'Charmeleon',
  'type1': 'Fire',
  'type2': '',
  'generation': 1,
  'scrapetime': numpy.datetime64('2022-06-13T02:54:55')},
 {'kdex': '#006',
  'ndex': '#006',
  'poke_name': 'Charizard',
  'type1': 'Fire',
  'type2': 'Flying',
 