In [1]:
# Add the Pandas dependency.
import pandas as pd

In [2]:
# Files to load
file_to_load = "Resources/missing_grades.csv"

# Read the CSV into a DataFrame
missing_grade_df = pd.read_csv(file_to_load)
missing_grade_df

Unnamed: 0,Student ID,student_name,gender,grade,reading_score,math_score
0,0,Paul Bradley,M,9th,66.0,79.0
1,1,Victor Smith,M,12th,94.0,61.0
2,2,Kevin Rodriguez,M,12th,,60.0
3,3,Dr. Richard Scott,M,12th,67.0,58.0
4,4,Bonnie Ray,F,9th,97.0,84.0
5,5,Bryan Miranda,M,9th,94.0,
6,6,Sheena Carter,F,11th,82.0,80.0
7,7,Nicole Baker,F,12th,96.0,69.0


# Options to Handle Missing Data

## Option 1: Do Nothing

If we do nothing, Pandas skips the NaNs by default when we sum or average the reading and math scores (just as Excel ignores blank cells in a sum or an average). In this situation, the missing values do not break the calculation; the result is simply computed from the values that are present.

However, arithmetic that uses a NaN directly, such as multiplying or dividing two columns row by row, returns NaN for every row that has a missing value. This can cause problems if the rest of our code depends on that result.
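
The two behaviors described above can be checked with a minimal sketch. The `scores` DataFrame here is rebuilt by hand from the two score columns in the table shown earlier (an assumption made so the example runs without the CSV file); the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Score columns copied by hand from the table above (not read from the CSV).
scores = pd.DataFrame({
    "reading_score": [66.0, 94.0, np.nan, 67.0, 97.0, 94.0, 82.0, 96.0],
    "math_score": [79.0, 61.0, 60.0, 58.0, 84.0, np.nan, 80.0, 69.0],
})

# sum() and mean() skip NaN by default (skipna=True).
reading_total = scores["reading_score"].sum()
reading_mean = scores["reading_score"].mean()

# Row-by-row arithmetic propagates NaN: rows 2 and 5 come out as NaN.
product = scores["reading_score"] * scores["math_score"]

print(reading_total)
print(reading_mean)
print(product)
```

Here the total is taken over the seven reading scores that are present, while `product` is NaN wherever either score is missing.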



## Option 2: Drop the Row

To drop rows that contain NaNs, Pandas provides the dropna() method. Use this method on the missing_grade_df DataFrame like this:

In [3]:
# Drop the rows that contain NaNs.
missing_grade_df.dropna()

Unnamed: 0,Student ID,student_name,gender,grade,reading_score,math_score
0,0,Paul Bradley,M,9th,66.0,79.0
1,1,Victor Smith,M,12th,94.0,61.0
3,3,Dr. Richard Scott,M,12th,67.0,58.0
4,4,Bonnie Ray,F,9th,97.0,84.0
6,6,Sheena Carter,F,11th,82.0,80.0
7,7,Nicole Baker,F,12th,96.0,69.0
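
As a hedged aside, `dropna()` returns a new DataFrame and leaves the original untouched unless the result is assigned back, and its `subset` parameter limits the check to particular columns. The sketch below uses a three-row DataFrame, `df`, copied by hand from the table above; the variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Three rows copied by hand from the table above (not read from the CSV).
df = pd.DataFrame({
    "student_name": ["Victor Smith", "Kevin Rodriguez", "Bryan Miranda"],
    "reading_score": [94.0, np.nan, 94.0],
    "math_score": [61.0, 60.0, np.nan],
})

# dropna() returns a new DataFrame; df itself is unchanged.
complete_rows = df.dropna()

# subset= only considers the listed columns when deciding which rows to drop.
has_reading = df.dropna(subset=["reading_score"])

print(len(df), len(complete_rows), len(has_reading))
```

With `subset=["reading_score"]`, the row that is missing only a math score is kept.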


Dropping rows can affect the story you are trying to tell with the data. Before removing rows with NaN, you should ask yourself two key questions:

- How much data would be removed if NaNs are dropped?

- How would this impact the analysis?

These questions need to be addressed for every dataset you work with.
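
One way to approach the first question is to count the missing values before dropping anything. This is a minimal sketch, assuming a DataFrame `df` rebuilt by hand from the score columns shown earlier; the names `missing_per_column` and `share_dropped` are illustrative only.

```python
import numpy as np
import pandas as pd

# Score columns copied by hand from the table above (not read from the CSV).
df = pd.DataFrame({
    "reading_score": [66.0, 94.0, np.nan, 67.0, 97.0, 94.0, 82.0, 96.0],
    "math_score": [79.0, 61.0, 60.0, 58.0, 84.0, np.nan, 80.0, 69.0],
})

# Number of missing values in each column.
missing_per_column = df.isna().sum()

# Number of rows before and after dropping rows with any NaN.
rows_before = len(df)
rows_after = len(df.dropna())
share_dropped = (rows_before - rows_after) / rows_before

print(missing_per_column)
print(f"{rows_before - rows_after} of {rows_before} rows would be dropped ({share_dropped:.0%})")
```

In this small sample, two of the eight rows would be removed. Whether that share is acceptable depends on the analysis.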

## Option 3: Fill in the Row

Filling in missing values must be done with caution. For example, filling them with 0 can distort arithmetic calculations such as averages. If you decide to fill in missing values, consider carefully how the values you insert will affect every downstream analysis you perform.

In Pandas, we use the **fillna()** method to fill in missing values. To fill every missing value in a DataFrame with zero, pass 0 in the parentheses like this: **df.fillna(0)**. The cell below uses 85 instead.

In [4]:
# Fill in the missing values with 85.
missing_grade_df.fillna(85)

Unnamed: 0,Student ID,student_name,gender,grade,reading_score,math_score
0,0,Paul Bradley,M,9th,66.0,79.0
1,1,Victor Smith,M,12th,94.0,61.0
2,2,Kevin Rodriguez,M,12th,85.0,60.0
3,3,Dr. Richard Scott,M,12th,67.0,58.0
4,4,Bonnie Ray,F,9th,97.0,84.0
5,5,Bryan Miranda,M,9th,94.0,85.0
6,6,Sheena Carter,F,11th,82.0,80.0
7,7,Nicole Baker,F,12th,96.0,69.0
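
To illustrate why the fill value matters, the sketch below compares the reading average when NaNs are skipped, when they are filled with 0, and when each column is filled with its own mean. Filling with the column mean is just one possible choice, shown here as an assumption rather than a recommendation; the `scores` DataFrame is copied by hand from the table above.

```python
import numpy as np
import pandas as pd

# Score columns copied by hand from the table above (not read from the CSV).
scores = pd.DataFrame({
    "reading_score": [66.0, 94.0, np.nan, 67.0, 97.0, 94.0, 82.0, 96.0],
    "math_score": [79.0, 61.0, 60.0, 58.0, 84.0, np.nan, 80.0, 69.0],
})

# Average of the seven reading scores that are present (NaN skipped).
mean_skipping_nan = scores["reading_score"].mean()

# Filling with 0 adds an eighth value of 0, which pulls the average down.
zero_filled = scores.fillna(0)
mean_zero_filled = zero_filled["reading_score"].mean()

# fillna() also accepts a dict that maps column names to fill values.
column_means = {
    "reading_score": scores["reading_score"].mean(),
    "math_score": scores["math_score"].mean(),
}
mean_filled = scores.fillna(column_means)

print(mean_skipping_nan, mean_zero_filled)
print(mean_filled)
```

Like `dropna()`, `fillna()` returns a new DataFrame, so `scores` still contains its NaNs unless the result is assigned back.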
