### Introduction to Pandas Bonus/Study Notebook

Topics to study for the test. Some of these are already covered in this notebook; I suggest you add the remaining ones. See the lecture notebooks (Panda Introduction and Pandas_merge_groupby_apply) for examples.

- Pandas Series (enhanced one-dimensional array)
    - initialize with list or dictionary
    - accessing values
    - calculate descriptive statistics
    - creating custom index
- Pandas DataFrames (enhanced two-dimensional array)
    - initialize from CSV
    - export to CSV
    - customize index
    - select columns and select rows (at the same time or independently)
    - boolean (conditional) indexing/selection 
    - accessing a specific cell by row and column 
    - calculate descriptive statistics on the entire DataFrame or specific columns and rows
    - calculate sum, mean, min, max per column or per row (the default axis gives one result per column; set axis='columns' to get one result per row) for the entire DataFrame or specific columns or specific rows
    - add a new column to a DataFrame based on a list
    - groupby and sort_values
    - unique values and counts for columns (value_counts)
    - info, shape, head, tail
    - add a new column to DataFrame based on existing columns
    - dealing with null values (replacing values or dropping rows or columns with null values)
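
The Series topics at the top of the list are not demonstrated further down, so here is a minimal sketch of them. It assumes three Potions grades copied from the grades table in this notebook; the variable names are only for illustration.

```python
import pandas as pd

# Initialize from a list: pandas assigns the default integer index 0, 1, 2
potions_list = pd.Series([80.0, 100.0, 70.0])

# Initialize from a dictionary: the keys become the index
potions = pd.Series({'Harry': 80.0, 'Hermione': 100.0, 'Ron': 70.0})

# Create a custom index for a Series built from a list
potions_named = pd.Series([80.0, 100.0, 70.0], index=['Harry', 'Hermione', 'Ron'])

# Access values by label and by position
harry_by_label = potions['Harry']
harry_by_position = potions.iloc[0]

# Descriptive statistics
potions_mean = potions.mean()
potions_max = potions.max()
summary = potions.describe()   # count, mean, std, min, quartiles, max
print(summary)
```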


In [88]:
import os
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

import sklearn.linear_model
import matplotlib as mpl
import matplotlib.pyplot as plt

%matplotlib inline

In [89]:
# Set the path to our data location
datapath = os.path.join(".", "")
datapath

'./'

In [90]:
hogwarts_datafile = 'hogwarts_grades.csv'

In [91]:

grades = pd.read_csv(datapath + hogwarts_datafile, index_col='Name')

In [92]:
grades = pd.read_csv(datapath + hogwarts_datafile)
grades

,Name,Potions,Transfiguration,Runes,Defense,Divination,Data Science,Charms,Herbology
0,Harry,80.0,92.0,,100.0,71.0,92.0,93.0,83.0
1,Hermione,100.0,100.0,100.0,100.0,,100.0,100.0,100.0
2,Ron,70.0,83.0,,92.0,73.0,98.0,95.0,87.0
3,Draco,100.0,88.0,,72.0,75.0,72.0,92.0,92.0
4,Crabbe,31.0,15.0,,29.0,6.0,3.0,70.0,52.0
5,Fred,75.0,93.0,,91.0,,58.0,,
6,George,75.0,93.0,,91.0,,58.0,,
7,Goyle,23.0,43.0,,32.0,11.0,21.0,41.0,49.0
8,Luna,94.0,97.0,100.0,93.0,98.0,100.0,98.0,98.0
9,Cho,,92.0,,95.0,93.0,98.0,95.0,97.0


In [93]:
grades.set_index('Name',inplace=True)
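
Exporting to CSV is on the topic list but is not shown in this notebook. The following is a minimal sketch, assuming a small stand-in DataFrame built from two rows of the grades table and a hypothetical file name `grades_copy.csv` written to a temporary directory.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the grades DataFrame (two rows and two columns from the table above)
sample = pd.DataFrame(
    {'Potions': [80.0, 100.0], 'Herbology': [83.0, 100.0]},
    index=pd.Index(['Harry', 'Hermione'], name='Name'),
)

with tempfile.TemporaryDirectory() as tmpdir:
    out_path = os.path.join(tmpdir, 'grades_copy.csv')   # hypothetical file name
    sample.to_csv(out_path)                              # the index is written as the first column
    restored = pd.read_csv(out_path, index_col='Name')   # read it back with Name as the index

print(restored)
```

Passing `index=False` to `to_csv` would leave the index out of the file, which is usually not what you want when the index holds the student names.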

Get information about the DataFrame

In [94]:
grades.info()

<class 'pandas.core.frame.DataFrame'>
Index: 12 entries, Harry to Neville
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Potions          9 non-null      float64
 1   Transfiguration  12 non-null     float64
 2   Runes            2 non-null      float64
 3   Defense          11 non-null     float64
 4   Divination       8 non-null      float64
 5   Data Science     10 non-null     float64
 6   Charms           8 non-null      float64
 7   Herbology        9 non-null      float64
dtypes: float64(8)
memory usage: 864.0+ bytes
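
While `info()` reports types and non-null counts, descriptive statistics come from `describe()`. A small sketch, assuming a stand-in DataFrame with four rows copied from the grades table:

```python
import numpy as np
import pandas as pd

# Stand-in for the grades DataFrame (values copied from the table above)
sample = pd.DataFrame(
    {'Potions': [80.0, 100.0, 70.0, np.nan],
     'Transfiguration': [92.0, 100.0, 83.0, 92.0]},
    index=pd.Index(['Harry', 'Hermione', 'Ron', 'Cho'], name='Name'),
)

all_stats = sample.describe()                    # every numeric column
potions_stats = sample['Potions'].describe()     # one column
hermione_stats = sample.loc['Hermione'].describe()   # one row

print(all_stats)
```

NaN values are left out of the statistics, so the count for Potions is lower than the count for Transfiguration.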


See the beginning of the DataFrame

In [95]:
grades.head(5)

Name,Potions,Transfiguration,Runes,Defense,Divination,Data Science,Charms,Herbology
Harry,80.0,92.0,,100.0,71.0,92.0,93.0,83.0
Hermione,100.0,100.0,100.0,100.0,,100.0,100.0,100.0
Ron,70.0,83.0,,92.0,73.0,98.0,95.0,87.0
Draco,100.0,88.0,,72.0,75.0,72.0,92.0,92.0
Crabbe,31.0,15.0,,29.0,6.0,3.0,70.0,52.0


See the end of the DataFrame

In [96]:
grades.tail(5)

Name,Potions,Transfiguration,Runes,Defense,Divination,Data Science,Charms,Herbology
Goyle,23.0,43.0,,32.0,11.0,21.0,41.0,49.0
Luna,94.0,97.0,100.0,93.0,98.0,100.0,98.0,98.0
Cho,,92.0,,95.0,93.0,98.0,95.0,97.0
Cedric,,98.0,,,,,,
Neville,,84.0,,71.0,78.0,,,100.0


See the shape of the DataFrame (rows and columns)

In [97]:
grades.shape

(12, 8)

Get the column names

In [98]:
grades.columns

Index(['Potions', 'Transfiguration', 'Runes', 'Defense', 'Divination',
       'Data Science', 'Charms', 'Herbology'],
      dtype='object')

Show a specific set of columns

In [99]:
grades[['Potions','Herbology']]

Name,Potions,Herbology
Harry,80.0,83.0
Hermione,100.0,100.0
Ron,70.0,87.0
Draco,100.0,92.0
Crabbe,31.0,52.0
Fred,75.0,
George,75.0,
Goyle,23.0,49.0
Luna,94.0,98.0
Cho,,97.0
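
Rows can be selected the same way, either on their own or together with columns. A minimal sketch using `loc` (labels) and `iloc` (positions), assuming a stand-in DataFrame with three rows copied from the grades table:

```python
import numpy as np
import pandas as pd

# Stand-in for the grades DataFrame (values copied from the table above)
sample = pd.DataFrame(
    {'Potions': [80.0, 100.0, 70.0],
     'Runes': [np.nan, 100.0, np.nan],
     'Herbology': [83.0, 100.0, 87.0]},
    index=pd.Index(['Harry', 'Hermione', 'Ron'], name='Name'),
)

one_row = sample.loc['Hermione']                                  # a single row, returned as a Series
by_label = sample.loc[['Harry', 'Ron'], ['Potions', 'Herbology']]  # rows and columns by label
by_position = sample.iloc[0:2, 0:2]                               # first two rows, first two columns
label_slice = sample.loc['Harry':'Hermione', 'Potions':'Runes']   # label slices include both endpoints

print(by_label)
```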


Change all the null/NaN values to -1 in place.

In [100]:
grades.fillna(-1, inplace=True)

In [101]:
grades

Name,Potions,Transfiguration,Runes,Defense,Divination,Data Science,Charms,Herbology
Harry,80.0,92.0,-1.0,100.0,71.0,92.0,93.0,83.0
Hermione,100.0,100.0,100.0,100.0,-1.0,100.0,100.0,100.0
Ron,70.0,83.0,-1.0,92.0,73.0,98.0,95.0,87.0
Draco,100.0,88.0,-1.0,72.0,75.0,72.0,92.0,92.0
Crabbe,31.0,15.0,-1.0,29.0,6.0,3.0,70.0,52.0
Fred,75.0,93.0,-1.0,91.0,-1.0,58.0,-1.0,-1.0
George,75.0,93.0,-1.0,91.0,-1.0,58.0,-1.0,-1.0
Goyle,23.0,43.0,-1.0,32.0,11.0,21.0,41.0,49.0
Luna,94.0,97.0,100.0,93.0,98.0,100.0,98.0,98.0
Cho,-1.0,92.0,-1.0,95.0,93.0,98.0,95.0,97.0


Change all -1 back to NaN.

In [112]:
#grades[grades == -1] = np.nan
# OR
grades.replace(-1,np.nan, inplace=True)

In [103]:
grades

Name,Potions,Transfiguration,Runes,Defense,Divination,Data Science,Charms,Herbology
Harry,80.0,92.0,,100.0,71.0,92.0,93.0,83.0
Hermione,100.0,100.0,100.0,100.0,,100.0,100.0,100.0
Ron,70.0,83.0,,92.0,73.0,98.0,95.0,87.0
Draco,100.0,88.0,,72.0,75.0,72.0,92.0,92.0
Crabbe,31.0,15.0,,29.0,6.0,3.0,70.0,52.0
Fred,75.0,93.0,,91.0,,58.0,,
George,75.0,93.0,,91.0,,58.0,,
Goyle,23.0,43.0,,32.0,11.0,21.0,41.0,49.0
Luna,94.0,97.0,100.0,93.0,98.0,100.0,98.0,98.0
Cho,,92.0,,95.0,93.0,98.0,95.0,97.0
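
A single placeholder such as -1 is only one way to replace nulls. The sketch below counts the nulls per column and then fills each column with its own mean. It assumes a stand-in DataFrame with three rows copied from the grades table; whether the mean is a sensible fill value depends on the analysis, so treat it as an illustration only.

```python
import numpy as np
import pandas as pd

# Stand-in for the grades DataFrame (values copied from the table above)
sample = pd.DataFrame(
    {'Divination': [71.0, np.nan, 73.0],
     'Runes': [np.nan, 100.0, np.nan]},
    index=pd.Index(['Harry', 'Hermione', 'Ron'], name='Name'),
)

null_counts = sample.isna().sum()              # number of NaN values in each column
filled_mean = sample.fillna(sample.mean())     # fill each column with that column's mean
filled_div = sample['Divination'].fillna(0)    # fill a single column with a constant

print(null_counts)
print(filled_mean)
```

Without `inplace=True`, `fillna` returns a new object and leaves `sample` unchanged.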


Drop all rows that contain NaN. Without inplace=True this returns a new DataFrame and leaves grades unchanged.

In [104]:
grades.dropna()

Name,Potions,Transfiguration,Runes,Defense,Divination,Data Science,Charms,Herbology
Luna,94.0,97.0,100.0,93.0,98.0,100.0,98.0,98.0


Drop all columns that contain NaN. Again, grades itself is unchanged.

In [105]:
grades.dropna(axis='columns')

Name,Transfiguration
Harry,92.0
Hermione,100.0
Ron,83.0
Draco,88.0
Crabbe,15.0
Fred,93.0
George,93.0
Goyle,43.0
Luna,97.0
Cho,92.0
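
Dropping every row or column that has any NaN can discard most of the data. `dropna` also accepts `subset`, `thresh`, and `how` to be more selective. A minimal sketch, assuming a stand-in DataFrame with three rows copied from the grades table:

```python
import numpy as np
import pandas as pd

# Stand-in for the grades DataFrame (values copied from the table above)
sample = pd.DataFrame(
    {'Potions': [94.0, np.nan, np.nan],
     'Transfiguration': [97.0, 92.0, 98.0],
     'Defense': [93.0, 95.0, np.nan]},
    index=pd.Index(['Luna', 'Cho', 'Cedric'], name='Name'),
)

has_defense = sample.dropna(subset=['Defense'])   # drop rows only if Defense is NaN
at_least_two = sample.dropna(thresh=2)            # keep rows with at least 2 non-null values
not_all_empty = sample.dropna(how='all')          # drop rows only if every value is NaN

print(has_defense)
```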


Add a column named 'Total' that holds the sum of all points for each student. NaN values are skipped by default.

In [106]:
grades['Total'] = grades.sum(axis='columns')
grades

Name,Potions,Transfiguration,Runes,Defense,Divination,Data Science,Charms,Herbology,Total
Harry,80.0,92.0,,100.0,71.0,92.0,93.0,83.0,611.0
Hermione,100.0,100.0,100.0,100.0,,100.0,100.0,100.0,700.0
Ron,70.0,83.0,,92.0,73.0,98.0,95.0,87.0,598.0
Draco,100.0,88.0,,72.0,75.0,72.0,92.0,92.0,591.0
Crabbe,31.0,15.0,,29.0,6.0,3.0,70.0,52.0,206.0
Fred,75.0,93.0,,91.0,,58.0,,,317.0
George,75.0,93.0,,91.0,,58.0,,,317.0
Goyle,23.0,43.0,,32.0,11.0,21.0,41.0,49.0,220.0
Luna,94.0,97.0,100.0,93.0,98.0,100.0,98.0,98.0,778.0
Cho,,92.0,,95.0,93.0,98.0,95.0,97.0,570.0
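
The same axis argument works for mean, min, and max, and new columns can also come from a list or from existing columns. A minimal sketch, assuming a stand-in DataFrame with three rows copied from the grades table; the `House` column is a hypothetical addition that is not part of the CSV.

```python
import pandas as pd

# Stand-in for the grades DataFrame (values copied from the table above)
sample = pd.DataFrame(
    {'Potions': [80.0, 100.0, 70.0],
     'Transfiguration': [92.0, 100.0, 83.0]},
    index=pd.Index(['Harry', 'Hermione', 'Ron'], name='Name'),
)

column_means = sample.mean()               # default axis: one value per column
row_means = sample.mean(axis='columns')    # one value per student
column_max = sample.max()
row_min = sample.min(axis='columns')
ron_total = sample.loc['Ron'].sum()        # sum over a specific row
potions_sum = sample['Potions'].sum()      # sum over a specific column

# New column based on existing columns
sample['Average'] = (sample['Potions'] + sample['Transfiguration']) / 2

# New column based on a list (hypothetical data; the list length must match the number of rows)
sample['House'] = ['Gryffindor', 'Gryffindor', 'Gryffindor']

print(sample)
```

The statistics are computed before the text column is added, because aggregating across columns of mixed types needs `numeric_only=True`.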


Drop the column 'Total' that was just added.

In [107]:
grades.drop('Total',axis='columns',inplace=True)

In [108]:
grades

Name,Potions,Transfiguration,Runes,Defense,Divination,Data Science,Charms,Herbology
Harry,80.0,92.0,,100.0,71.0,92.0,93.0,83.0
Hermione,100.0,100.0,100.0,100.0,,100.0,100.0,100.0
Ron,70.0,83.0,,92.0,73.0,98.0,95.0,87.0
Draco,100.0,88.0,,72.0,75.0,72.0,92.0,92.0
Crabbe,31.0,15.0,,29.0,6.0,3.0,70.0,52.0
Fred,75.0,93.0,,91.0,,58.0,,
George,75.0,93.0,,91.0,,58.0,,
Goyle,23.0,43.0,,32.0,11.0,21.0,41.0,49.0
Luna,94.0,97.0,100.0,93.0,98.0,100.0,98.0,98.0
Cho,,92.0,,95.0,93.0,98.0,95.0,97.0


Change Fred's grade in Potions to an 88.

In [109]:
grades.loc['Fred','Potions'] = 88

In [110]:
grades

Name,Potions,Transfiguration,Runes,Defense,Divination,Data Science,Charms,Herbology
Harry,80.0,92.0,,100.0,71.0,92.0,93.0,83.0
Hermione,100.0,100.0,100.0,100.0,,100.0,100.0,100.0
Ron,70.0,83.0,,92.0,73.0,98.0,95.0,87.0
Draco,100.0,88.0,,72.0,75.0,72.0,92.0,92.0
Crabbe,31.0,15.0,,29.0,6.0,3.0,70.0,52.0
Fred,88.0,93.0,,91.0,,58.0,,
George,75.0,93.0,,91.0,,58.0,,
Goyle,23.0,43.0,,32.0,11.0,21.0,41.0,49.0
Luna,94.0,97.0,100.0,93.0,98.0,100.0,98.0,98.0
Cho,,92.0,,95.0,93.0,98.0,95.0,97.0
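
Reading a single cell works with the same row-and-column addressing. The sketch below shows four equivalent ways to read one cell and uses `at` to write one. It assumes a stand-in DataFrame with the Fred and George rows copied from the grades table.

```python
import pandas as pd

# Stand-in for the grades DataFrame (values copied from the table above)
sample = pd.DataFrame(
    {'Potions': [75.0, 75.0], 'Transfiguration': [93.0, 93.0]},
    index=pd.Index(['Fred', 'George'], name='Name'),
)

by_loc = sample.loc['Fred', 'Potions']   # label-based
by_at = sample.at['Fred', 'Potions']     # label-based, single cell only
by_iloc = sample.iloc[0, 0]              # position-based
by_iat = sample.iat[0, 0]                # position-based, single cell only

sample.at['Fred', 'Potions'] = 88        # same update as above, written with at
print(sample)
```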


Select rows based on a condition: students with Transfiguration below 60 and Charms below 70.

In [111]:
grades.loc[(grades['Transfiguration'] < 60) & (grades['Charms'] < 70)]

Name,Potions,Transfiguration,Runes,Defense,Divination,Data Science,Charms,Herbology
Goyle,23.0,43.0,,32.0,11.0,21.0,41.0,49.0
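
Conditions can also be combined with `|` (or) and negated with `~` (not), and a boolean mask can be paired with a column selection. A minimal sketch, assuming a stand-in DataFrame with five rows copied from the grades table:

```python
import pandas as pd

# Stand-in for the grades DataFrame (values copied from the table above)
sample = pd.DataFrame(
    {'Transfiguration': [92.0, 88.0, 15.0, 43.0, 97.0],
     'Charms': [93.0, 92.0, 70.0, 41.0, 98.0]},
    index=pd.Index(['Harry', 'Draco', 'Crabbe', 'Goyle', 'Luna'], name='Name'),
)

low = (sample['Transfiguration'] < 60) | (sample['Charms'] < 70)   # either condition holds
either = sample[low]
neither = sample[~low]                                             # rows where neither holds

# Boolean row selection combined with a column selection
top_charms = sample.loc[sample['Transfiguration'] >= 90, 'Charms']

print(either)
```

Each condition needs its own parentheses, because `&` and `|` bind more tightly than the comparison operators.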

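The topic list also names `sort_values`, `value_counts`, unique values, and `groupby`, none of which appear above. The grades table has no categorical column to group on, so this sketch adds a hypothetical `House` column to a stand-in DataFrame. The grades are copied from the table; the house labels are an assumption for illustration.

```python
import pandas as pd

# Stand-in for the grades DataFrame; House is hypothetical and not part of the CSV
sample = pd.DataFrame(
    {'House': ['Gryffindor', 'Slytherin', 'Slytherin', 'Slytherin', 'Ravenclaw'],
     'Potions': [80.0, 100.0, 31.0, 23.0, 94.0],
     'Charms': [93.0, 92.0, 70.0, 41.0, 98.0]},
    index=pd.Index(['Harry', 'Draco', 'Crabbe', 'Goyle', 'Luna'], name='Name'),
)

by_charms = sample.sort_values('Charms', ascending=False)        # highest Charms grade first
houses = sample['House'].unique()                                # distinct values in a column
house_counts = sample['House'].value_counts()                    # rows per distinct value
house_means = sample.groupby('House')[['Potions', 'Charms']].mean()  # mean grade per house

print(by_charms)
print(house_counts)
print(house_means)
```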