<a href="https://colab.research.google.com/github/cowdinosaur/colab_notebooks/blob/main/09B_Pandas_and_Dataframes_Pokemon.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to Pandas and Dataframes

**Pandas** is a popular Python library for data manipulation and statistical analysis. We operate on **DataFrames**, which you can think of as "Excel sheets as variables in Python".

## Importing data

Now, let's import some data! We do this using the `read_csv` function from the pandas library. Pandas is conventionally imported under the alias `pd` (`import pandas as pd`), so we call the function as `pd.read_csv`. We tell it one thing: the location of the file to be read.

Let's get started by importing a CSV file, which stores a table of data values as plain text! Some sample CSV files can be found in this server's `Resources` folder. To use them, you'll need to go to the homepage, download the CSV files from the `Resources` folder, and **re-upload them to the same folder as this notebook**.

Once you're ready, run this line to store all that data in a single variable:
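A minimal sketch of this step. To stay self-contained, it reads a tiny inline CSV through `io.StringIO` instead of a file on disk; the three sample rows and the names `sample_csv` and `pokemon` are assumptions made for illustration. With a real file, you would pass the name of the CSV you uploaded to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk (assumed sample rows, not the course file).
sample_csv = io.StringIO(
    "name,hp,attack,speed,generation\n"
    "Bulbasaur,45,49,45,1\n"
    "Charmander,39,52,65,1\n"
    "Squirtle,44,48,43,1\n"
)

# read_csv accepts a file path or a file-like object and returns a DataFrame.
pokemon = pd.read_csv(sample_csv)
print(pokemon)
```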

Printing is a bit unwieldy because of all the data. To read just a bit of info from the beginning, which lets you check if everything imported correctly, use `.head()`:
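As a rough illustration, `.head()` returns the first five rows by default, and `.head(n)` returns the first `n`. The inline sample data below is an assumption standing in for the real CSV file.

```python
import io
import pandas as pd

# Assumed sample rows standing in for the real CSV file.
sample_csv = io.StringIO(
    "name,hp,attack,speed,generation\n"
    "Bulbasaur,45,49,45,1\n"
    "Charmander,39,52,65,1\n"
    "Squirtle,44,48,43,1\n"
)
pokemon = pd.read_csv(sample_csv)

# Look at only the first two rows to check that the import worked.
first_two = pokemon.head(2)
print(first_two)
```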

Notice that the column headings come from the columns in the CSV file, and the row headings are just numbers. This is OK, but not quite right--we want each row heading to be the Pokemon name. In pandas terminology, we want to **set the Pokemon name column as the index**. To do this, we re-import, and specify an `index_col`:
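A sketch of the re-import, again using assumed inline sample data in place of the real file. Passing `index_col="name"` tells `read_csv` to use the `name` column as the row labels.

```python
import io
import pandas as pd

# Assumed sample rows standing in for the real CSV file.
sample_csv = io.StringIO(
    "name,hp,attack,speed,generation\n"
    "Bulbasaur,45,49,45,1\n"
    "Charmander,39,52,65,1\n"
    "Squirtle,44,48,43,1\n"
)

# Use the "name" column as the index instead of the default 0, 1, 2, ...
pokemon = pd.read_csv(sample_csv, index_col="name")
print(pokemon.head())
```

The `name` column no longer appears among the regular columns; it now labels the rows.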

Now, the first two rows look a bit weird, but that's pandas' way of telling us that "name" is an index column. You can have multiple index columns, but that's beyond the scope of this class.

## Some DataFrame operations

Below are some useful operations you can perform on DataFrames. Try them out, and see what they do!
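A few commonly used operations, shown as a sketch on assumed inline sample data. This selection is illustrative and is not necessarily the exact set covered in class.

```python
import io
import pandas as pd

# Assumed sample rows standing in for the real CSV file.
sample_csv = io.StringIO(
    "name,hp,attack,speed,generation\n"
    "Bulbasaur,45,49,45,1\n"
    "Charmander,39,52,65,1\n"
    "Squirtle,44,48,43,1\n"
)
pokemon = pd.read_csv(sample_csv, index_col="name")

print(pokemon.shape)       # (number of rows, number of columns)
print(pokemon.columns)     # column labels
print(pokemon.dtypes)      # data type of each column
print(pokemon.describe())  # summary statistics for the numeric columns

# Sort rows by speed, fastest first; this returns a new DataFrame.
by_speed = pokemon.sort_values("speed", ascending=False)
print(by_speed)
```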

### <font color="red">Exercise 1: Import and check</font>

New pokedex data was just released! Import the data from the new CSV (from Apr 2021) and check how many rows of data it has.

<hr>

## Reading and filtering data in DataFrames

There are many, many ways of accessing data in DataFrames. Here are a few ways--you can read up on other ways at the [Pandas DataFrame documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) page. It's worth clicking over just to see the variety of functions you can call to handle DataFrames!

Here, though, we'll start with accessing information the way you'd expect, by row and column:
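A minimal sketch of row and column access on assumed inline sample data. `.loc` looks things up by label, `.iloc` by integer position, and square brackets with a column name select a whole column.

```python
import io
import pandas as pd

# Assumed sample rows standing in for the real CSV file.
sample_csv = io.StringIO(
    "name,hp,attack,speed,generation\n"
    "Bulbasaur,45,49,45,1\n"
    "Charmander,39,52,65,1\n"
    "Squirtle,44,48,43,1\n"
)
pokemon = pd.read_csv(sample_csv, index_col="name")

bulbasaur_row = pokemon.loc["Bulbasaur"]             # one row, by index label
speed_column = pokemon["speed"]                      # one column, by name
squirtle_attack = pokemon.loc["Squirtle", "attack"]  # one cell: row, then column
first_row = pokemon.iloc[0]                          # one row, by position

print(bulbasaur_row)
print(speed_column)
print(squirtle_attack)
```

Both `.loc` and `.iloc` also accept lists, so several rows and columns can be selected at once.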

### <font color="red">Exercise 2: Get data</font>

Import the necessary CSV file, and set up a DataFrame for the base stats 'hp', 'attack', 'speed' for each of the Pokemon 'Bulbasaur', 'Charmander' and 'Squirtle'. Your result should look like this:

| name | hp | attack | speed |
|---|:---:|:---:|:---:|
| Bulbasaur | 45 | 49 | 45 |
| Charmander | 39 | 52 | 65 |
| Squirtle | 44 | 48 | 43 |

<hr>

## Filtering data

Here's one of the most powerful features of DataFrames--being able to quickly work with large chunks of data. If you had to do this with for loops, it'd be a bit of a pain to filter everything out item by item, not to mention having to reconstruct your lists one by one.
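As a sketch of the idea on assumed inline sample data: comparing a column to a value produces a column of `True`/`False` values (a boolean mask), and indexing the DataFrame with that mask keeps only the rows marked `True`. No loop is needed.

```python
import io
import pandas as pd

# Assumed sample rows standing in for the real CSV file.
sample_csv = io.StringIO(
    "name,hp,attack,speed,generation\n"
    "Bulbasaur,45,49,45,1\n"
    "Charmander,39,52,65,1\n"
    "Squirtle,44,48,43,1\n"
)
pokemon = pd.read_csv(sample_csv, index_col="name")

# One True/False value per row: is this Pokemon's speed above 44?
is_fast = pokemon["speed"] > 44
print(is_fast)

# Keep only the rows where the mask is True.
fast = pokemon[is_fast]
print(fast)
```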

## More Filtering

Take a look at what's being done, and try to figure it out, particularly when it comes to the two-condition criteria!

```python
# Can you figure out what's being done in the code below?

pokeAll = pd.read_csv("pokedex_(Update_2021.04).csv", index_col="name")
criteria = (pokeAll["generation"] == 1) & (pokeAll["total_points"] > 100)
pokeAll["all_defense"] = pokeAll["defense"] + pokeAll["sp_defense"]
pokeAll[criteria][["defense", "sp_defense", "all_defense"]]
```

| name | defense | sp_defense | all_defense |
|---|:---:|:---:|:---:|
| Bulbasaur | 49 | 65 | 114 |
| Ivysaur | 63 | 80 | 143 |
| Venusaur | 83 | 100 | 183 |
| Mega Venusaur | 123 | 120 | 243 |
| Charmander | 43 | 50 | 93 |
| ... | ... | ... | ... |
| Dragonite | 95 | 100 | 195 |
| Mewtwo | 90 | 90 | 180 |
| Mega Mewtwo X | 100 | 100 | 200 |
| Mega Mewtwo Y | 70 | 120 | 190 |


### <font color="red">Exercise 3: Raichu data and filtering</font>

* Read data from the `pokedex_(Update_2021.04).csv` file
* Find the speed of Raichu
* Find all the Pokemon in Generation 1 whose speed is greater than Raichu's

There should be 17 Pokemon.

Bonus challenge: Filter out all "Mega" pokemon from the results.

### <font color="red">Exercise 4: What does this do?</font>

What does each line in this chunk of code do? Run it, find out, and explain it to someone sitting next to you.

```python
first = pd.read_csv('pokedex_(Update_2021.04).csv', index_col='name')
second = first[first['generation'] == 1]
third = second.drop('Pikachu')
fourth = third.drop('generation', axis=1)
fourth.to_csv('result.csv')
```

## Inserting data

Inserting column data into your DataFrames is straightforward. Just add it in:
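A minimal sketch on assumed inline sample data. Assigning to a column name that does not exist yet creates the column; the names `hp_plus_attack` and `starter` are made up for this example.

```python
import io
import pandas as pd

# Assumed sample rows standing in for the real CSV file.
sample_csv = io.StringIO(
    "name,hp,attack,speed,generation\n"
    "Bulbasaur,45,49,45,1\n"
    "Charmander,39,52,65,1\n"
    "Squirtle,44,48,43,1\n"
)
pokemon = pd.read_csv(sample_csv, index_col="name")

# New column computed row by row from two existing columns.
pokemon["hp_plus_attack"] = pokemon["hp"] + pokemon["attack"]

# New column holding the same value in every row.
pokemon["starter"] = True

print(pokemon)
```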

### <font color="red">Exercise 5: Final Pokemon data analysis</font>

Import the latest Pokedex version. Write an expression to display each of the following:

* The Japanese names of all pokemon
* All data for the pokemon Charizard
* Total points for all pokemon in Generation 1.
* Total points for all Electric pokemon for all generations. (Hint: Type 1: Electric and Type 2: Electric both count)
* Total points for all pokemon after Generation 3 with greater than 500 total points.
* Attack points for all pokemon in Generation 4 as a multiple of special attack (sp_attack) for that pokemon. Show a DataFrame with 'attack', 'sp_attack', and "attack vs sp_attack", for example:

| name | attack | sp_attack | attack vs sp_attack |
|---|:---:|:---:|:---:|
| Turtwig | 68 | 45 | 1.51 |
| Grotle | 89 | 55 | 1.62 |
| Torterra | 109 | 75 | 1.45 |
| Chimchar | 58 | 58 | 1.00 |
| ... | ... | ... | ... |