![rmotr](https://user-images.githubusercontent.com/7065401/52071918-bda15380-2562-11e9-828c-7f95297e4a82.png)
<hr style="margin-bottom: 40px;">

# Pandas DataFrame exercises


In [1]:
# Import the numpy package under the name np
import numpy as np

# Import the pandas package under the name pd
import pandas as pd

# Import the matplotlib package under the name plt
import matplotlib.pyplot as plt
%matplotlib inline

# Print the pandas version
print(pd.__version__)


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## DataFrame creation

### Create an empty pandas DataFrame


In [None]:
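# Calling the constructor with no arguments gives a frame with no rows and no columns.
# Passing [None] for data, index and columns would instead create a 1x1 frame holding None.
pd.DataFrame()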
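
A quick way to confirm that a DataFrame really is empty is to look at its `empty` attribute and its `shape`. The sketch below uses illustrative variable names (`empty_df`, `typed_empty_df`) that are not part of the exercise, and contrasts a frame built with no arguments against one that declares columns but holds no rows.

```python
import pandas as pd

# No arguments: no rows and no columns
empty_df = pd.DataFrame()

# Columns declared up front, but still no rows
typed_empty_df = pd.DataFrame(columns=['name', 'sex', 'first_appearance'])

print(empty_df.empty, empty_df.shape)
print(typed_empty_df.empty, typed_empty_df.shape)
```

Both frames report `empty` as `True`; only the second one carries column labels.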

<img width=400 src="https://cdn.dribbble.com/users/4678/screenshots/1986600/avengers.png"></img>

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Create a `marvel_df` pandas DataFrame with the given marvel data


In [None]:
marvel_data = [
    ['Spider-Man', 'male', 1962],
    ['Captain America', 'male', 1941],
    ['Wolverine', 'male', 1974],
    ['Iron Man', 'male', 1963],
    ['Thor', 'male', 1963],
    ['Thing', 'male', 1961],
    ['Mister Fantastic', 'male', 1961],
    ['Hulk', 'male', 1962],
    ['Beast', 'male', 1963],
    ['Invisible Woman', 'female', 1961],
    ['Storm', 'female', 1975],
    ['Namor', 'male', 1939],
    ['Hawkeye', 'male', 1964],
    ['Daredevil', 'male', 1964],
    ['Doctor Strange', 'male', 1963],
    ['Hank Pym', 'male', 1962],
    ['Scarlet Witch', 'female', 1964],
    ['Wasp', 'female', 1963],
    ['Black Widow', 'female', 1964],
    ['Vision', 'male', 1968]
]

In [None]:
df = pd.DataFrame(data=marvel_data)

df

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Add column names to the `marvel_df`


In [None]:
col_names = ['name', 'sex', 'first_appearance']

df.columns = col_names
df
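
Column names can also be supplied when the frame is created, through the `columns` argument of `pd.DataFrame`. This is a minimal sketch on a two-row sample; `sample_data` and `sample_df` are illustrative names, not part of the exercise.

```python
import pandas as pd

# A small subset of the marvel data, used only for illustration
sample_data = [
    ['Spider-Man', 'male', 1962],
    ['Storm', 'female', 1975],
]

# columns= names the columns at construction time
sample_df = pd.DataFrame(sample_data,
                         columns=['name', 'sex', 'first_appearance'])
print(sample_df.columns.tolist())
```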

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Add index names to the `marvel_df` (use the character name as index)


In [None]:
df.index = df['name']
df

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Drop the name column as it's now the index

In [None]:
# columns=['name'] is equivalent to passing ['name'] with axis=1
df = df.drop(columns=['name'])
df
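
Setting the index from a column and then dropping that column can likely be collapsed into a single call with `set_index`, which moves the column into the index. A minimal sketch on a two-row sample, with `sample_df` and `indexed_df` as illustrative names:

```python
import pandas as pd

sample_df = pd.DataFrame(
    [['Thor', 'male', 1963], ['Wasp', 'female', 1963]],
    columns=['name', 'sex', 'first_appearance'],
)

# set_index moves 'name' into the index and removes it from the columns
indexed_df = sample_df.set_index('name')
print(indexed_df)
```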

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Drop 'Namor' and 'Hank Pym' rows


In [None]:
df = df.drop(['Namor', 'Hank Pym'], axis=0)
df
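
Note that `drop` returns a new DataFrame and leaves the original untouched, which is why the result is assigned back. The sketch below also shows `errors='ignore'`, which skips labels that are not present instead of raising `KeyError`. The three-row sample and the variable names are illustrative.

```python
import pandas as pd

sample_df = pd.DataFrame(
    {'sex': ['male', 'male', 'female'],
     'first_appearance': [1939, 1962, 1975]},
    index=['Namor', 'Hank Pym', 'Storm'],
)

# drop returns a new frame; sample_df keeps all three rows
trimmed_df = sample_df.drop(index=['Namor', 'Hank Pym'])

# 'Namor' is already gone, so errors='ignore' makes this a no-op
same_df = trimmed_df.drop(index=['Namor'], errors='ignore')

print(len(sample_df), len(trimmed_df), len(same_df))
```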

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## DataFrame selection, slicing and indexation

### Show the first 5 rows of `marvel_df`


In [None]:
df.head()

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Show the last 5 rows of `marvel_df`


In [None]:
df.tail()

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Show just the sex of the first 5 rows of `marvel_df`

In [None]:
df.head().sex.to_frame()

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Show the `first_appearance` of all middle rows of `marvel_df` (every row except the first and last)

In [None]:
df.iloc[1:-1].first_appearance.to_frame()

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Show the first and last rows of `marvel_df`


In [None]:
# A list of positions selects exactly those rows; all columns are kept
df.iloc[[0, -1]]
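
The selections in this section rely on `iloc`, which works by integer position. The label-based counterpart is `loc`. This sketch, on an illustrative four-row sample, shows that both can return the same rows and that a positional slice `1:-1` leaves out the first and last rows.

```python
import pandas as pd

sample_df = pd.DataFrame(
    {'sex': ['male', 'male', 'female', 'male'],
     'first_appearance': [1962, 1941, 1975, 1968]},
    index=['Spider-Man', 'Captain America', 'Storm', 'Vision'],
)

# Positional selection: first and last rows
by_position = sample_df.iloc[[0, -1]]

# Label-based selection of the same two rows
by_label = sample_df.loc[['Spider-Man', 'Vision']]

# Positional slice: everything except the first and last rows
middle = sample_df.iloc[1:-1]

print(by_position)
print(middle)
```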

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## DataFrame manipulation and operations

### Modify the `first_appearance` of 'Vision' to year 1964

In [None]:
df.loc['Vision', 'first_appearance'] = 1964

df

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Add a new column to `marvel_df` called 'years_since' with the years since `first_appearance`


In [None]:
df['years_since'] = 2018 - df['first_appearance']

df
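
The cell above hardcodes 2018 as the reference year. One possible variation, shown here as a sketch rather than as the exercise's intended answer, takes the current year from the standard library so the column does not go stale. `sample_df` and `current_year` are illustrative names.

```python
import datetime

import pandas as pd

sample_df = pd.DataFrame(
    {'first_appearance': [1941, 1975]},
    index=['Captain America', 'Storm'],
)

# Use today's year instead of a hardcoded constant
current_year = datetime.date.today().year

# The subtraction is applied to every row of the column at once
sample_df['years_since'] = current_year - sample_df['first_appearance']
print(sample_df)
```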

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## DataFrame boolean arrays (also called masks)

### Given the `marvel_df` pandas DataFrame, make a mask showing the female characters


In [None]:
mask = df['sex'] == 'female'

mask

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the `marvel_df` pandas DataFrame, get the male characters


In [None]:
mask = df['sex'] == 'male'

df[mask]

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the `marvel_df` pandas DataFrame, get the characters with `first_appearance` after 1970


In [None]:
mask = df['first_appearance'] > 1970

df[mask]

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the `marvel_df` pandas DataFrame, get the female characters with `first_appearance` after 1970

In [None]:
mask = (df['sex'] == 'female') & (df['first_appearance'] > 1970)

df[mask]
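
Masks combine with `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses when written inline, because these operators bind more tightly than `==` and `>`. A minimal sketch on an illustrative four-row sample:

```python
import pandas as pd

sample_df = pd.DataFrame(
    {'sex': ['male', 'female', 'female', 'male'],
     'first_appearance': [1974, 1975, 1961, 1941]},
    index=['Wolverine', 'Storm', 'Invisible Woman', 'Captain America'],
)

is_female = sample_df['sex'] == 'female'
after_1970 = sample_df['first_appearance'] > 1970

both = sample_df[is_female & after_1970]        # female AND after 1970
either = sample_df[is_female | after_1970]      # female OR after 1970
neither = sample_df[~(is_female | after_1970)]  # NOT (female OR after 1970)

print(both.index.tolist())
print(either.index.tolist())
print(neither.index.tolist())
```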

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## DataFrame summary statistics

### Show basic statistics of `marvel_df`

In [None]:
df.describe()
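
By default `describe` summarises only the numeric columns when a frame has any, so a text column such as `sex` is left out. Passing `include='all'` adds it. The sample below is illustrative.

```python
import pandas as pd

sample_df = pd.DataFrame(
    {'sex': ['male', 'female', 'male'],
     'first_appearance': [1962, 1975, 1941]},
    index=['Spider-Man', 'Storm', 'Captain America'],
)

# Default: numeric columns only
numeric_summary = sample_df.describe()

# include='all' also reports count, unique, top and freq for text columns
full_summary = sample_df.describe(include='all')

print(numeric_summary.columns.tolist())
print(full_summary.columns.tolist())
```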

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the `marvel_df` pandas DataFrame, show the mean value of `first_appearance`

In [None]:

df.first_appearance.mean()

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the `marvel_df` pandas DataFrame, show the min value of `first_appearance`


In [None]:

df.first_appearance.min()

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Given the `marvel_df` pandas DataFrame, get the characters with the min value of `first_appearance`

In [None]:
mask = df['first_appearance'] == df.first_appearance.min()
df[mask]
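
An alternative to the mask is `idxmin`, which returns the index label of the minimum. The two are not interchangeable when there are ties: the mask keeps every row that matches the minimum, while `idxmin` returns only the first. A sketch on an illustrative three-row sample:

```python
import pandas as pd

sample_df = pd.DataFrame(
    {'first_appearance': [1961, 1961, 1962]},
    index=['Thing', 'Mister Fantastic', 'Hulk'],
)

# Mask: every row tied for the minimum
earliest_year = sample_df['first_appearance'].min()
tied_rows = sample_df[sample_df['first_appearance'] == earliest_year]

# idxmin: label of the first row holding the minimum
first_label = sample_df['first_appearance'].idxmin()

print(tied_rows.index.tolist())
print(first_label)
```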

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## DataFrame basic plotting

### Reset index names of `marvel_df`


In [None]:
df = df.reset_index()

df
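
By default `reset_index` moves the old index back into a regular column and replaces it with integer positions. Passing `drop=True` discards the old index instead. A minimal sketch with illustrative names:

```python
import pandas as pd

indexed_df = pd.DataFrame(
    {'sex': ['male', 'female']},
    index=pd.Index(['Thor', 'Wasp'], name='name'),
)

# Default: the old index becomes a 'name' column
restored_df = indexed_df.reset_index()

# drop=True: the old index is thrown away
discarded_df = indexed_df.reset_index(drop=True)

print(restored_df.columns.tolist())
print(discarded_df.columns.tolist())
```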

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Plot the values of `first_appearance`


In [None]:
plt.plot(df.index, df.first_appearance)

In [None]:
df.first_appearance.plot()

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

### Plot a histogram (plot.hist) with values of `first_appearance`


In [None]:
# Use the pandas plot.hist accessor named in the exercise title
df.first_appearance.plot.hist()
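
To see the numbers behind a histogram without drawing it, the values can be grouped into bins by hand. This sketch bins a few years from the marvel data by decade using integer division. The decade bin width is an assumption made for illustration and will not match the default bins a plotting call chooses.

```python
import pandas as pd

years = pd.Series([1962, 1941, 1974, 1963, 1963, 1961])

# Floor each year to the start of its decade, e.g. 1963 -> 1960
decades = years // 10 * 10

# Count how many values fall in each decade, ordered by decade
counts = decades.value_counts().sort_index()
print(counts)
```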

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)
