# Getting started with pandas

**Extra resources:**

* Python Data Science Handbook: https://jakevdp.github.io/PythonDataScienceHandbook/
* Pandas cheatsheet: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf 

<img src = "memes/pandas.PNG">

In [None]:
import numpy
import pandas

import seaborn
import matplotlib.pyplot as plt

import time
import sys
import os

import requests
from bs4 import BeautifulSoup

In [None]:
pandas.__version__

In [None]:
numpy.__version__

# Dataframes, series and indexes

### Dataframes

Think of them as Excel tables that you can explore with code: named columns, plus a label (index) for every row.

In [None]:
pokemons = ["Bulbasaur", "Squirtle", "Charizard", "Pikachu", "Onix", "Geodude", "Vulpix", "Starmie"]
types = ["Grass / Poison", "Water", "Fire / Flying", "Electric", "Rock / Ground",  "Rock / Ground", "Fire", "Water"]
master = ["Ash", "Ash", "Ash", "Ash", "Brock", "Brock", "Misty", "Misty"]
pokedex_number = [1, 7, 6, 25, 95, 74, 37, 121]
evolved = ["No", "No", "Yes", "No", "No", "No", "No", "Yes"]

In [None]:
# build the DataFrame from the lists above: each dictionary key becomes a column name
# (all lists must have the same length, one value per pokemon)
df_pokemon = pandas.DataFrame({"name": pokemons,
                               "types": types,
                               "master": master,
                               "number": pokedex_number,
                               "is_evolved": evolved})
df_pokemon

### Series

Each column of a pandas DataFrame is called a Series.

A Series is basically a NumPy array with a name and an index on top. This means you can apply NumPy array operations (arithmetic, comparisons, functions such as `numpy.sqrt`) directly to pandas columns.
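Here is a minimal sketch of that idea using a small, made-up column of pokedex numbers. NumPy functions and arithmetic act element-wise on the Series, and the result keeps the Series name and index.

```python
import numpy
import pandas

# A Series: a NumPy array plus a name and an index
numbers = pandas.Series([1, 7, 6, 25], name="number")

# The underlying data is a plain numpy.ndarray
values = numbers.to_numpy()

# Element-wise arithmetic, exactly as with an array; the name "number" is kept
doubled = numbers * 2

# NumPy universal functions accept a Series and return a Series
roots = numpy.sqrt(numbers)
```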

### Indexes

In [None]:
df_pokemon

In [None]:
df_pokemon.columns

In [None]:
df_pokemon.index

In [None]:
# .loc[row_label, column_label]: the "name" value stored in row 0
df_pokemon.loc[0, "name"]

Indexes let us access individual values and slices of the dataframe. `.index` holds the row labels and `.columns` holds the column labels.
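The sketch below contrasts label-based access (`.loc`) with position-based access (`.iloc`). So that it runs on its own, it rebuilds a few rows of the pokemon table.

```python
import pandas

df_pokemon = pandas.DataFrame({
    "name": ["Bulbasaur", "Squirtle", "Charizard", "Pikachu"],
    "number": [1, 7, 6, 25],
})

# .loc selects by label: row label 2, column label "name"
single_value = df_pokemon.loc[2, "name"]

# Label slices with .loc include BOTH endpoints (rows 0, 1 and 2)
first_three = df_pokemon.loc[0:2, ["name", "number"]]

# .iloc selects by integer position and excludes the end (rows 0 and 1)
first_two = df_pokemon.iloc[0:2]

# A more meaningful index: use the names as row labels
by_name = df_pokemon.set_index("name")
pikachu_number = by_name.loc["Pikachu", "number"]
```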

### What can you do with a dataframe?

*Sort pokemon by pokedex number*
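One way to do this is with `sort_values`. The snippet rebuilds a few rows of the pokemon table so that it is self-contained.

```python
import pandas

df_pokemon = pandas.DataFrame({
    "name": ["Bulbasaur", "Squirtle", "Charizard", "Pikachu", "Onix"],
    "number": [1, 7, 6, 25, 95],
})

# sort_values returns a new, sorted DataFrame; df_pokemon itself is unchanged
df_sorted = df_pokemon.sort_values("number")

# ascending=False puts the highest pokedex number first
df_desc = df_pokemon.sort_values("number", ascending=False)
```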

*Which pokemon are evolved?*
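A possible answer uses boolean indexing on the `is_evolved` column. Comparing the column to a value produces a True/False Series, and passing that Series back to the DataFrame keeps only the True rows. The data below is a small excerpt of the table.

```python
import pandas

df_pokemon = pandas.DataFrame({
    "name": ["Bulbasaur", "Charizard", "Pikachu", "Starmie"],
    "is_evolved": ["No", "Yes", "No", "Yes"],
})

# Boolean Series: True where the pokemon is evolved
mask = df_pokemon["is_evolved"] == "Yes"

# Indexing with the mask keeps only the rows where it is True
evolved = df_pokemon[mask]
```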

*Which pokemon are dual type?*
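In this table a dual type is written as two names separated by `/` (for example `"Grass / Poison"`). A sketch can therefore look for that character with the `.str.contains` string method.

```python
import pandas

df_pokemon = pandas.DataFrame({
    "name": ["Bulbasaur", "Squirtle", "Charizard", "Onix"],
    "types": ["Grass / Poison", "Water", "Fire / Flying", "Rock / Ground"],
})

# regex=False treats "/" as a plain substring, not a regular expression
is_dual = df_pokemon["types"].str.contains("/", regex=False)
dual_type = df_pokemon[is_dual]
```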

*Add a region column*

In [None]:
df_pokemon["region"] = "Kanto"
df_pokemon

*Add a new row*
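A minimal sketch follows. It adds Togepi with its pokedex number left missing, which sets up the next exercise. Setting `.loc` at a label that does not exist yet appends a row. Using `len(df_pokemon)` as that label assumes the default 0..n-1 integer index. The columns here are a reduced, illustrative subset.

```python
import numpy
import pandas

df_pokemon = pandas.DataFrame({
    "name": ["Pikachu", "Starmie"],
    "master": ["Ash", "Misty"],
    "number": [25, 121],
})

# Assigning to a new row label appends a row (values given in column order).
# The number is left missing on purpose, so "number" becomes a float column.
df_pokemon.loc[len(df_pokemon)] = ["Togepi", "Misty", numpy.nan]
```

To add several rows at once, `pandas.concat([df_pokemon, other_df], ignore_index=True)` is the usual tool. `DataFrame.append` was removed in pandas 2.0.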

*Fill Togepi pokedex number (175)*
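One way to fill the value is to locate Togepi's row with a boolean mask and assign through `.loc`. The snippet starts from a small table where Togepi's number is missing.

```python
import numpy
import pandas

df_pokemon = pandas.DataFrame({
    "name": ["Pikachu", "Togepi"],
    "number": [25, numpy.nan],
})

# .loc[row_mask, column] = value sets the value only in the matching rows
df_pokemon.loc[df_pokemon["name"] == "Togepi", "number"] = 175

# The NaN had forced a float column; with no gaps left, convert back to integers
df_pokemon["number"] = df_pokemon["number"].astype(int)
```

If Togepi is the only missing number, `df_pokemon["number"].fillna(175)` would also work. The mask version stays correct when other rows are missing too.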

# Pandas foundations

In [None]:
os.listdir("data")

## Lego dataset (reading flat files)

In [None]:
df_legos = pandas.read_csv("data/lego_sets.csv")
df_legos.head()

In [None]:
len(df_legos)

In [None]:
df_legos.info()

In [None]:
df_legos.describe()

In [None]:
df_legos = df_legos.dropna()
df_legos.info()

In [None]:
df_legos = df_legos.drop(["prod_id", "prod_long_desc", "num_reviews", "val_star_rating"], axis = 1)
df_legos.head()

## Are there any Marvel LEGO sets with easy difficulty that cost less than 150 dollars? (row and column selection)

*How many difficulty levels are there?*

In [None]:
df_legos["review_difficulty"].unique()

In [None]:
df_legos["review_difficulty"].nunique()

In [None]:
df_legos["theme_name"].unique()

In [None]:
df_legos["theme_name"].nunique()

### Column selection

In [None]:
df_legos_mini = df_legos[["set_name", "theme_name", "ages", "piece_count", "review_difficulty", "country"]] 


### Row selection 
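Rows are usually selected with boolean conditions. The sketch below uses a few made-up rows that mimic some columns of `lego_sets.csv`. The set names and numbers are illustrative only.

```python
import pandas

# Made-up rows imitating a few columns of the lego dataset
df = pandas.DataFrame({
    "set_name": ["set A", "set B", "set C", "set D"],
    "theme_name": ["Architecture", "City", "Architecture", "City"],
    "piece_count": [744, 60, 1210, 301],
    "review_difficulty": ["Average", "Easy", "Challenging", "Easy"],
})

# A condition on one column gives a boolean Series that selects rows
big_sets = df[df["piece_count"] > 500]

# Combine conditions with & (and), | (or), ~ (not); each condition needs parentheses
easy_city = df[(df["theme_name"] == "City") & (df["review_difficulty"] == "Easy")]

# .query() expresses the same filter as a string
easy_city_q = df.query("theme_name == 'City' and review_difficulty == 'Easy'")
```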

### Combined selection

In [None]:
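# example of str.contains: keep the sets whose theme name mentions "Marvel",
# showing only a few columns (rows and columns selected in one .loc call)
df_legos.loc[df_legos["theme_name"].str.contains("Marvel", na=False),
             ["set_name", "theme_name", "review_difficulty"]]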

*Answering the original question*
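A sketch of the full answer combines three conditions in one `.loc` call. It rests on two assumptions: the price column is named `list_price`, and the easy level is labelled `"Easy"`. Check both with `df_legos.columns` and `df_legos["review_difficulty"].unique()`. Note also that the `df_legos_mini` selection above does not include a price column. The rows below are made up.

```python
import pandas

# Made-up rows; "list_price" is an ASSUMED column name for the set price
df = pandas.DataFrame({
    "set_name": ["set A", "set B", "set C"],
    "theme_name": ["Marvel Super Heroes", "Marvel Super Heroes", "City"],
    "review_difficulty": ["Easy", "Challenging", "Easy"],
    "list_price": [39.99, 199.99, 19.99],
})

is_marvel = df["theme_name"].str.contains("Marvel", na=False)
is_easy = df["review_difficulty"] == "Easy"        # assumed label for easy sets
is_cheap = df["list_price"] < 150

answer = df.loc[is_marvel & is_easy & is_cheap, ["set_name", "theme_name", "list_price"]]
any_found = not answer.empty                       # True means "yes, there are such sets"
```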

## What's the brand of the toys? (broadcasting)
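Broadcasting means a single value is stretched to match every row. Assigning a scalar to a new column fills the whole column, and arithmetic between a column and a scalar applies to each element. This is a sketch: `list_price` is an assumed column name, and the 10% discount is a made-up example.

```python
import pandas

df = pandas.DataFrame({"set_name": ["set A", "set B"], "list_price": [39.99, 19.99]})

# The scalar "LEGO" is broadcast to every row of the new column
df["brand"] = "LEGO"

# Scalar arithmetic is broadcast element-wise too (hypothetical 10% discount)
df["discounted_price"] = df["list_price"] * 0.9
```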

*Reordering columns*
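Selecting columns with a list returns them in the order of that list, so reordering is just a selection. The second variant moves one column to the front without typing out all the others.

```python
import pandas

df = pandas.DataFrame({
    "set_name": ["set A", "set B"],
    "piece_count": [744, 60],
    "brand": ["LEGO", "LEGO"],
})

# Explicit order
first = df[["brand", "set_name", "piece_count"]]

# Generic: put "brand" first and keep the remaining columns in their current order
cols = ["brand"] + [c for c in df.columns if c != "brand"]
moved = df[cols]
```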

## Which Architecture LEGO sets are the most difficult and have the most pieces? (sorting)
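Difficulty is stored as text, so a plain sort would be alphabetical ("Average" < "Challenging" < "Easy"). This sketch declares the intended order with an ordered categorical and then sorts by two columns. The level names are an assumption; compare them with the output of `df_legos["review_difficulty"].unique()`. The rows are made up.

```python
import pandas

df = pandas.DataFrame({
    "set_name": ["set A", "set B", "set C", "set D"],
    "theme_name": ["Architecture"] * 4,
    "review_difficulty": ["Average", "Challenging", "Easy", "Challenging"],
    "piece_count": [744, 1210, 321, 2276],
})

# ASSUMED difficulty levels, from easiest to hardest
levels = ["Very Easy", "Easy", "Average", "Challenging", "Very Challenging"]
df["review_difficulty"] = pandas.Categorical(df["review_difficulty"], categories=levels, ordered=True)

architecture = df[df["theme_name"] == "Architecture"]

# Hardest first; within the same difficulty, most pieces first
ranked = architecture.sort_values(["review_difficulty", "piece_count"], ascending=[False, False])
```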

# Making a pokedex (reading html code)

What if you don't have a CSV? A lot of information is available online, and pandas can scrape tables published in HTML or XML format.

* `pandas.read_html` https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.read_html.html
* Pokedex database: https://pokemondb.net/pokedex/all
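`pandas.read_html` returns a list with one DataFrame per `<table>` it finds. This sketch parses a tiny hand-written table so that it runs without internet access. Literal HTML is wrapped in `StringIO` because recent pandas versions deprecate passing raw HTML strings. `read_html` also needs an HTML parser installed: `lxml`, or BeautifulSoup4 together with `html5lib`.

```python
from io import StringIO

import pandas

# Hand-written table standing in for a web page
html = """
<table>
  <thead><tr><th>#</th><th>Name</th><th>Speed</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>Bulbasaur</td><td>45</td></tr>
    <tr><td>4</td><td>Charmander</td><td>65</td></tr>
  </tbody>
</table>
"""

tables = pandas.read_html(StringIO(html))  # list of DataFrames, one per <table>
df_pokedex = tables[0]

# With the live page you would pass the URL instead (requires internet access):
# tables = pandas.read_html("https://pokemondb.net/pokedex/all")
```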

## Create separate type 1 and type 2 columns for each pokemon
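A sketch with `str.split`, assuming the scraped table has one `Type` column with the types separated by a space (for example `"Grass Poison"`). Inspect a few rows of the real table to confirm the separator. `n=1` splits at most once, and `expand=True` returns the pieces as separate columns. Single-type pokemon get a missing value in the second column.

```python
import pandas

# ASSUMED layout: a single "Type" column, types separated by a space
df = pandas.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Charizard"],
    "Type": ["Grass Poison", "Fire", "Fire Flying"],
})

# Split once on the first space into two new columns
df[["type_1", "type_2"]] = df["Type"].str.split(" ", n=1, expand=True)
```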

## What's the strongest physical attacker?
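Physical strength is the `Attack` stat. `nlargest` returns the top n rows by one column, which gives both the winner and the runners-up. The column names are assumed to match the stat names shown on the pokedex page, and only a few rows are included here.

```python
import pandas

# A few rows; column names ASSUMED to follow the pokedex page
df = pandas.DataFrame({
    "Name": ["Machamp", "Alakazam", "Dragonite"],
    "Attack": [130, 50, 134],
    "Sp. Atk": [65, 135, 100],
})

# The 3 rows with the highest Attack, highest first
top_physical = df.nlargest(3, "Attack")
strongest_physical = top_physical.iloc[0]["Name"]
```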

## What's the strongest special attacker?
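Special strength is the special attack stat, assumed here to be the `Sp. Atk` column. `idxmax` returns the index label of the maximum, and `.loc` turns that label into the row. With ties, `idxmax` returns only the first occurrence.

```python
import pandas

df = pandas.DataFrame({
    "Name": ["Machamp", "Alakazam", "Dragonite"],
    "Sp. Atk": [65, 135, 100],  # ASSUMED column name
})

best_label = df["Sp. Atk"].idxmax()           # index label of the highest value
strongest_special = df.loc[best_label, "Name"]
```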

## Rename the # column
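A column called `#` cannot be used with attribute access (`df.#`) and is awkward inside `query` strings. `rename` takes an `{old: new}` mapping and leaves other columns unchanged. The new name `number` is just a choice.

```python
import pandas

df = pandas.DataFrame({"#": [1, 4], "Name": ["Bulbasaur", "Charmander"]})

# Only "#" is renamed; "Name" is untouched
df = df.rename(columns={"#": "number"})
```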

## What's the fastest pokemon of the first generation?
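The first generation covers pokedex numbers 1 to 151, so one approach is to filter with `between` and then sort by `Speed`. This sketch assumes the `#` column has already been renamed to `number`. Ninjask (#291) is included to show that, without the filter, a later-generation pokemon would win.

```python
import pandas

df = pandas.DataFrame({
    "number": [101, 135, 142, 291],
    "Name": ["Electrode", "Jolteon", "Aerodactyl", "Ninjask"],
    "Speed": [150, 130, 130, 160],
})

# between is inclusive on both ends by default
gen1 = df[df["number"].between(1, 151)]
fastest = gen1.sort_values("Speed", ascending=False).iloc[0]["Name"]
```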

## What's the fastest pokemon of the first generation? (no megas)
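Mega forms share the pokedex number of their base form, so the number filter alone does not remove them. This sketch assumes that mega rows contain `"Mega "` (with a trailing space) in the `Name` column. Check how names look in your scraped table before relying on this. The trailing space keeps names such as Meganium from matching.

```python
import pandas

# Mega forms share "number" with their base form
df = pandas.DataFrame({
    "number": [142, 142, 101, 135],
    "Name": ["Aerodactyl", "Aerodactyl Mega Aerodactyl", "Electrode", "Jolteon"],
    "Speed": [130, 150, 150, 130],
})

# ASSUMED naming convention for mega forms
is_mega = df["Name"].str.contains("Mega ", regex=False)

gen1_no_megas = df[df["number"].between(1, 151) & ~is_mega]
fastest = gen1_no_megas.sort_values("Speed", ascending=False).iloc[0]["Name"]
```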

## Which pokemon have a mega evolution?
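A sketch in two steps. First, find the pokedex numbers that have at least one mega row. Then keep the non-mega rows with those numbers. As in the previous section, it assumes mega rows contain `"Mega "` in `Name`. A plain `contains("Mega")` would also match Meganium.

```python
import pandas

df = pandas.DataFrame({
    "number": [6, 6, 6, 154],
    "Name": ["Charizard", "Charizard Mega Charizard X", "Charizard Mega Charizard Y", "Meganium"],
})

naive = df["Name"].str.contains("Mega")                    # also matches "Meganium"
is_mega = df["Name"].str.contains("Mega ", regex=False)    # ASSUMED mega naming

mega_numbers = df.loc[is_mega, "number"].unique()
has_mega = df[df["number"].isin(mega_numbers) & ~is_mega]
```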

## Which pokemon change type when they mega evolve?
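One possible approach pairs each mega form with its base form through the shared pokedex number using `merge`, then compares the types. The sketch assumes exactly one non-mega row per number. Alternate forms such as regional variants would need extra filtering. It also reuses the `"Mega "` naming assumption.

```python
import pandas

df = pandas.DataFrame({
    "number": [6, 6, 6, 3, 3],
    "Name": ["Charizard", "Charizard Mega Charizard X", "Charizard Mega Charizard Y",
             "Venusaur", "Venusaur Mega Venusaur"],
    "Type": ["Fire Flying", "Fire Dragon", "Fire Flying", "Grass Poison", "Grass Poison"],
})

is_mega = df["Name"].str.contains("Mega ", regex=False)   # ASSUMED mega naming
base = df[~is_mega]
megas = df[is_mega]

# One row per mega form, with its base form's columns alongside
pairs = megas.merge(base, on="number", suffixes=("_mega", "_base"))

changed = pairs[pairs["Type_mega"] != pairs["Type_base"]]
```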

# What's next?

* Grouping and aggregation functions
* Merging dataframes (joins, appends)
* Apply vectorized functions
* Working with Big Data
* Pandas pro tips and tricks

# Contact

Manuel Montoya 
* Email: manuel.montoya@pucp.edu.pe
* Linkedin: https://www.linkedin.com/in/manuel-montoya-gamio/

<img src="https://miro.medium.com/max/3006/1*KdxlBR9P3mDp9JZ_URMdYQ.jpeg">