## Loading/Exploring the data

Load the `iris.csv` file into a pandas DataFrame. Take a minute to familiarize yourself with the data.

## Import Pandas

Import the `pandas` library as `pd`

In [2]:
import pandas as pd

Read the `iris.csv` dataset into an object named `iris`

In [23]:
iris = pd.read_csv('iris.csv')
print("Loaded")

Loaded


How many different species are in this dataset?

In [25]:
num_of_species = iris['species'].nunique()
print(num_of_species)

3


What are their names?

In [27]:
names = iris['species'].unique()
print(names)

['setosa' 'versicolor' 'virginica']


How many samples are there per species?

<details><summary>Hint</summary>Use the <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html"><code>.value_counts()</code></a> method</details>

In [32]:
num_of_samples = iris['species'].value_counts()
print(num_of_samples)

species
setosa        50
versicolor    50
virginica     50
Name: count, dtype: int64
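
As a small illustration of what `.value_counts()` returns, the sketch below uses a hypothetical toy Series (not the iris data). It also shows `normalize=True` for proportions and a `groupby(...).size()` call that yields the same counts ordered by label.

```python
import pandas as pd

# Hypothetical toy labels, not the iris data.
labels = pd.Series(["a", "b", "a", "c", "a", "b"], name="label")

counts = labels.value_counts()                 # counts, most frequent first
shares = labels.value_counts(normalize=True)   # fractions that sum to 1
by_group = labels.groupby(labels).size()       # same counts, ordered by label

print(counts)
print(shares)
print(by_group)
```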


## Feature Engineering

Create a new column called `'sepal_ratio'` which is equal to sepal width / sepal length

In [41]:
iris['sepal_ratio'] = iris['sepal width (cm)'] / iris['sepal length (cm)']
print(iris)

     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                  5.1               3.5                1.4               0.2   
1                  4.9               3.0                1.4               0.2   
2                  4.7               3.2                1.3               0.2   
3                  4.6               3.1                1.5               0.2   
4                  5.0               3.6                1.4               0.2   
..                 ...               ...                ...               ...   
145                6.7               3.0                5.2               2.3   
146                6.3               2.5                5.0               1.9   
147                6.5               3.0                5.2               2.0   
148                6.2               3.4                5.4               2.3   
149                5.9               3.0                5.1               1.8   

       species  sepal_ratio

Create a similar column called `'petal_ratio'`: petal width / petal length

In [45]:
iris['petal_ratio'] = iris['petal width (cm)'] / iris['petal length (cm)']
print(iris)

     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                  5.1               3.5                1.4               0.2   
1                  4.9               3.0                1.4               0.2   
2                  4.7               3.2                1.3               0.2   
3                  4.6               3.1                1.5               0.2   
4                  5.0               3.6                1.4               0.2   
..                 ...               ...                ...               ...   
145                6.7               3.0                5.2               2.3   
146                6.3               2.5                5.0               1.9   
147                6.5               3.0                5.2               2.0   
148                6.2               3.4                5.4               2.3   
149                5.9               3.0                5.1               1.8   

       species  sepal_ratio

Create four new columns that hold the `sepal length (cm)`, `sepal width (cm)`, `petal length (cm)`, and `petal width (cm)` measurements converted to inches.

In [51]:
inch = 0.393701  # inches per centimeter (1 / 2.54)

iris["sepal_length (in)"] = iris["sepal length (cm)"] * inch
iris["sepal_width (in)"] = iris["sepal width (cm)"] * inch
iris["petal_length (in)"] = iris["petal length (cm)"] * inch
iris["petal_width (in)"] = iris["petal width (cm)"] * inch

print(iris)

     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                  5.1               3.5                1.4               0.2   
1                  4.9               3.0                1.4               0.2   
2                  4.7               3.2                1.3               0.2   
3                  4.6               3.1                1.5               0.2   
4                  5.0               3.6                1.4               0.2   
..                 ...               ...                ...               ...   
145                6.7               3.0                5.2               2.3   
146                6.3               2.5                5.0               1.9   
147                6.5               3.0                5.2               2.0   
148                6.2               3.4                5.4               2.3   
149                5.9               3.0                5.1               1.8   

       species  sepal_ratio
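
The four conversions follow the same pattern, so they could also be written as a loop. The sketch below is one possible approach on a hypothetical two-row frame; the name `CM_PER_INCH` is an assumption, and the new column names differ slightly from the solution above (`sepal length (in)` rather than `sepal_length (in)`).

```python
import pandas as pd

CM_PER_INCH = 2.54  # one inch is defined as exactly 2.54 cm

# Hypothetical two-row frame reusing two of the iris column names.
df = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9],
    "petal width (cm)": [0.2, 0.2],
})

# Collect the "(cm)" columns first, then add one "(in)" column for each.
cm_columns = [c for c in df.columns if c.endswith("(cm)")]
for col in cm_columns:
    new_name = col.replace(" (cm)", " (in)")
    df[new_name] = df[col] / CM_PER_INCH

print(df)
```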

## Apply

Create a column called `'encoded_species'`:
- 0 for setosa
- 1 for versicolor
- 2 for virginica


<details><summary>Hint 1</summary>Create a dictionary using the species as keys and the numbers 0-2 as values</details>

<details><summary>Hint 2</summary>Use the dictionary in hint 1 with the <code>.apply()</code> method to create the new column</details>


In [53]:
species_encoding = {"setosa": 0, "versicolor": 1, "virginica": 2}
iris["encoded_species"] = iris["species"].apply(lambda x: species_encoding[x])

print(iris)


     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                  5.1               3.5                1.4               0.2   
1                  4.9               3.0                1.4               0.2   
2                  4.7               3.2                1.3               0.2   
3                  4.6               3.1                1.5               0.2   
4                  5.0               3.6                1.4               0.2   
..                 ...               ...                ...               ...   
145                6.7               3.0                5.2               2.3   
146                6.3               2.5                5.0               1.9   
147                6.5               3.0                5.2               2.0   
148                6.2               3.4                5.4               2.3   
149                5.9               3.0                5.1               1.8   

       species  sepal_ratio
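
The exercise asks for `.apply()`, but a dictionary lookup like this can also be passed straight to `Series.map()`. The sketch below compares the two on a hypothetical four-value Series. One difference worth knowing: with a label missing from the dictionary, the `apply` version raises a `KeyError`, while `map` produces `NaN`.

```python
import pandas as pd

# Hypothetical sample of species labels, not the full iris column.
species = pd.Series(["setosa", "virginica", "versicolor", "setosa"])
species_encoding = {"setosa": 0, "versicolor": 1, "virginica": 2}

# The approach used in the exercise: look each label up inside a lambda.
encoded_apply = species.apply(lambda x: species_encoding[x])

# Alternative: pass the dictionary directly to map().
encoded_map = species.map(species_encoding)

print(encoded_map.tolist())  # [0, 2, 1, 0]
```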

## March Madness

Let's switch from flowers to a different dataset: March Madness!

Read in the dataset `ncaa-seeds.csv` to an object named `seeds`.

This dataframe simulates the games that will occur in the first round of the [NCAA basketball tournament](http://www.sportingnews.com/au/ncaa-basketball/news/ncaa-tournament-2017-march-madness-bracket-schedule-matchups-print-a-bracket/1r6cau9sb1xj4131zzhay2dj5g). In the first row, you should see the following:

| team_seed | opponent_seed |
|-----------|---------------|
| 01N       | 16N           |

In [63]:
seeds = pd.read_csv("ncaa-seeds.csv")
print(seeds.head())


  team_seed,opponent_seed
0                 01N,16N
1                 02N,15N
2                 03N,14N
3                 04N,13N
4                 05N,12N
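
The printed frame above has a single column named `team_seed,opponent_seed`, which suggests each row was read as one comma-joined string rather than two fields. The cause is not shown here, so the following is only a sketch of one way to recover two columns, using a hypothetical frame (`merged`) that mimics that output.

```python
import pandas as pd

# Hypothetical frame mimicking the single merged column printed above.
merged = pd.DataFrame({"team_seed,opponent_seed": ["01N,16N", "02N,15N", "03N,14N"]})

# Split the lone column on commas; expand=True returns one column per piece.
col = merged.columns[0]
seeds = merged[col].str.split(",", expand=True)

# Reuse the pieces of the merged header as the new column names.
seeds.columns = col.split(",")

print(seeds.head())
```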


In `team_seed`, `01` is the team's seed and `N` is its division (North). This row says that the 1st seed in the North division will play the 16th seed in the same division.

Using the `.apply()` method, create the following new columns:
- `team_division`
- `opponent_division`

The first row of your result should look as follows:

| team_seed | opponent_seed | team_division | opponent_division |
|-----------|---------------|---------------|-------------------|
| 01N       | 16N           | N             | N                 |


In [72]:
# Sample rows entered by hand
data = {
    "team_seed": ["01N", "02S", "03E", "04W", "05N"],
    "opponent_seed": ["16N", "15S", "14E", "13W", "12N"],
}
game = pd.DataFrame(data)

# The division is the last character of each seed string
game["team_division"] = game["team_seed"].apply(lambda x: x[-1])
game["opponent_division"] = game["opponent_seed"].apply(lambda x: x[-1])

print(game)

  team_seed opponent_seed team_division opponent_division
0       01N           16N             N                 N
1       02S           15S             S                 S
2       03E           14E             E                 E
3       04W           13W             W                 W
4       05N           12N             N                 N
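
For string columns, the `.str` accessor supports the same indexing without a lambda. The sketch below, on a hypothetical three-row frame, shows that `.str[-1]` gives the same result as the `.apply()` call used above.

```python
import pandas as pd

# Hypothetical sample rows in the same format as the seeds data.
game = pd.DataFrame({
    "team_seed": ["01N", "02S", "03E"],
    "opponent_seed": ["16N", "15S", "14E"],
})

# Last character of each string, two ways.
via_apply = game["team_seed"].apply(lambda x: x[-1])
via_str = game["team_seed"].str[-1]

print(via_str.tolist())  # ['N', 'S', 'E']
```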


Now that you have the divisions, change the `team_seed` and `opponent_seed` columns to just be the numbers.

The first row of your result should look as follows:

| team_seed | opponent_seed | team_division | opponent_division |
|-----------|---------------|---------------|-------------------|
| 1         | 16            | N             | N                 |

In [75]:
# The division columns already exist from the previous cell,
# so only the numeric part of each seed is extracted here.
game["team_seed"] = game["team_seed"].apply(lambda x: int(x[:-1]))
game["opponent_seed"] = game["opponent_seed"].apply(lambda x: int(x[:-1]))

print(game)

   team_seed  opponent_seed team_division opponent_division
0          1             16             N                 N
1          2             15             S                 S
2          3             14             E                 E
3          4             13             W                 W
4          5             12             N                 N
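
The seed number and the division letter can also be pulled apart in a single step with `Series.str.extract()` and a regular expression. This is a sketch on a hypothetical two-row frame; the pattern and the group names `seed` and `division` are assumptions chosen for illustration, and the pattern expects digits followed by exactly one letter.

```python
import pandas as pd

# Hypothetical sample rows in the same format as the seeds data.
game = pd.DataFrame({"team_seed": ["01N", "02S"], "opponent_seed": ["16N", "15S"]})

# One or more digits (the seed) followed by a single letter (the division).
pattern = r"^(?P<seed>\d+)(?P<division>[A-Za-z])$"

# extract() returns a DataFrame with one column per named group.
parts = game["team_seed"].str.extract(pattern)
game["team_division"] = parts["division"]
game["team_seed"] = parts["seed"].astype(int)

print(game)
```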


Create a new column called `seed_delta`, which is the team's seed minus their opponent's seed.

The first row of your result should look as follows:

| team_seed | opponent_seed | team_division | opponent_division | seed_delta |
|-----------|---------------|---------------|-------------------|------------|
| 1         | 16            | N             | N                 | -15        |

<br>
<details><summary>Did you get an error?</summary>
<code>team_seed</code> and <code>opponent_seed</code> need to be numerical columns in order for you to perform mathematical operations on them.
</details>

In [79]:
game["seed_delta"] = game["team_seed"] - game["opponent_seed"]
print(game)

   team_seed  opponent_seed team_division opponent_division  seed_delta
0          1             16             N                 N         -15
1          2             15             S                 S         -13
2          3             14             E                 E         -11
3          4             13             W                 W          -9
4          5             12             N                 N          -7
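
If the seed columns are still stored as strings when you reach this step, they can be converted before subtracting. The sketch below uses a hypothetical two-row frame whose seeds are digit strings and converts them with `pd.to_numeric()`.

```python
import pandas as pd

# Hypothetical frame whose seed columns hold digit strings, not numbers.
seeds = pd.DataFrame({"team_seed": ["1", "2"], "opponent_seed": ["16", "15"]})

# Convert both columns to a numeric dtype.
for col in ["team_seed", "opponent_seed"]:
    seeds[col] = pd.to_numeric(seeds[col])

# Subtraction now works element-wise on numbers.
seeds["seed_delta"] = seeds["team_seed"] - seeds["opponent_seed"]

print(seeds.dtypes)
print(seeds)
```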
