# **Filling missing values with pandas**

- When building data pipelines, you will inevitably run into missing data. In some cases, you may want to remove the affected records from the dataset. In others, you'll need to impute values for the missing information. In this exercise, you'll practice using pandas to impute missing test scores.

- Data from the file "testing_scores.json" has been read into a DataFrame and stored in the variable raw_testing_scores. pandas has also been loaded as pd.
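
The two options mentioned above, removing records or imputing values, can be compared on a small example. This is a minimal sketch on a toy DataFrame: the name `toy_scores`, its index labels, and its values are made up for illustration and are not taken from "testing_scores.json".

```python
import numpy as np
import pandas as pd

# Toy data with hypothetical values (not the real testing scores)
toy_scores = pd.DataFrame(
    {
        "city": ["Manhattan", "Manhattan", "Bronx"],
        "math_score": [np.nan, 400.0, 500.0],
    },
    index=["A", "B", "C"],
)

# Option 1: remove the records that have no math score
dropped = toy_scores.dropna(subset=["math_score"])

# Option 2: keep every record and impute the column mean instead
imputed = toy_scores.fillna({"math_score": toy_scores["math_score"].mean()})

print(dropped)
print(imputed)
```

Dropping shrinks the dataset, while imputing keeps every row at the cost of inserting an estimated value. Neither call modifies `toy_scores` itself, because both return a new DataFrame.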

**Instructions**

- Print the head of the raw_testing_scores DataFrame, and observe the NaN values.

In [None]:
# Print the head of the `raw_testing_scores` DataFrame
print(raw_testing_scores.head())

In [None]:
              street_address       city  math_score  reading_score  writing_score
02M260  425 West 33rd Street  Manhattan         NaN            NaN            NaN
06M211    650 Academy Street  Manhattan         NaN            NaN            NaN
01M539   111 Columbia Street  Manhattan       657.0          601.0          601.0
02M294      350 Grand Street  Manhattan       395.0          411.0          387.0
02M308      350 Grand Street  Manhattan       418.0          428.0          415.0
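
Beyond reading the head by eye, it can help to count the NaN values per column. The sketch below rebuilds only the five rows printed above by hand (the name `head_rows` is made up for illustration); on the full dataset the same `isna().sum()` call would presumably be applied to raw_testing_scores directly.

```python
import numpy as np
import pandas as pd

# The five rows shown in the printed head, rebuilt by hand for illustration
head_rows = pd.DataFrame(
    {
        "street_address": [
            "425 West 33rd Street",
            "650 Academy Street",
            "111 Columbia Street",
            "350 Grand Street",
            "350 Grand Street",
        ],
        "city": ["Manhattan"] * 5,
        "math_score": [np.nan, np.nan, 657.0, 395.0, 418.0],
        "reading_score": [np.nan, np.nan, 601.0, 411.0, 428.0],
        "writing_score": [np.nan, np.nan, 601.0, 387.0, 415.0],
    },
    index=["02M260", "06M211", "01M539", "02M294", "02M308"],
)

# isna() marks each missing cell as True; sum() then counts them per column
missing_counts = head_rows.isna().sum()
print(missing_counts)
```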

**Instructions**

- Use the average of the "math_score" column to fill the NaN values in the "math_score" column.
- Print the head of the updated DataFrame.

In [None]:
# Fill NaN values with the average from that column
raw_testing_scores["math_score"] = raw_testing_scores["math_score"].fillna(raw_testing_scores["math_score"].mean())

# Print the head of the raw_testing_scores DataFrame
print(raw_testing_scores.head())

In [None]:
              street_address       city  math_score  reading_score  writing_score
02M260  425 West 33rd Street  Manhattan     432.944            NaN            NaN
06M211    650 Academy Street  Manhattan     432.944            NaN            NaN
01M539   111 Columbia Street  Manhattan     657.000          601.0          601.0
02M294      350 Grand Street  Manhattan     395.000          411.0          387.0
02M308      350 Grand Street  Manhattan     418.000          428.0          415.0
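
The fill works because `mean()` skips NaN values by default (`skipna=True`), so the average is computed from the known scores only. A minimal sketch using just the five math scores printed earlier (the names `math_scores`, `column_mean`, and `filled` are illustrative): the mean of these five rows alone is 490, not the 432.944 shown above, since the real column contains more rows.

```python
import numpy as np
import pandas as pd

# Math scores from the five rows printed earlier; the real column has more rows
math_scores = pd.Series([np.nan, np.nan, 657.0, 395.0, 418.0])

# mean() ignores NaN by default, so only the three known values are averaged
column_mean = math_scores.mean()

# fillna() returns a new Series; math_scores itself is left unchanged
filled = math_scores.fillna(column_mean)

print(column_mean)
print(filled.mean())
```

Filling with the mean leaves the column's mean unchanged, although it does reduce the column's variance, which is worth keeping in mind for later analysis.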

**Instructions**

- For the "math_score", "reading_score" and "writing_score" columns, update the transform() function to fill NaN values with the mean of the respective columns, in place.
- Print the head of the cleaned DataFrame.

In [None]:
def transform(raw_data):
    # Fill NaN values in each score column with that column's mean, in place
    raw_data.fillna(
        value={
            "math_score": raw_data["math_score"].mean(),
            "reading_score": raw_data["reading_score"].mean(),
            "writing_score": raw_data["writing_score"].mean(),
        },
        inplace=True,
    )
    return raw_data

clean_testing_scores = transform(raw_testing_scores)

# Print the head of the clean_testing_scores DataFrame
print(clean_testing_scores.head())

In [None]:
              street_address       city  math_score  reading_score  writing_score
02M260  425 West 33rd Street  Manhattan     432.944        424.504        418.459
06M211    650 Academy Street  Manhattan     432.944        424.504        418.459
01M539   111 Columbia Street  Manhattan     657.000        601.000        601.000
02M294      350 Grand Street  Manhattan     395.000        411.000        387.000
02M308      350 Grand Street  Manhattan     418.000        428.000        415.000
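
Because transform() fills in place and returns the same object, clean_testing_scores and raw_testing_scores refer to one DataFrame after the call. If the raw data needs to stay intact, a non-mutating variant is possible. The sketch below is one way to write it, not the exercise's solution: `transform_copy`, `SCORE_COLUMNS`, and the toy values are hypothetical names and data introduced for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical constant listing the columns to impute
SCORE_COLUMNS = ["math_score", "reading_score", "writing_score"]


def transform_copy(raw_data):
    """Return a new DataFrame with NaN scores replaced by column means."""
    # mean() on a DataFrame yields a Series indexed by column name
    column_means = raw_data[SCORE_COLUMNS].mean()
    # Passing a Series to fillna() fills each matching column with its value
    return raw_data.fillna(column_means)


# Toy data with made-up values, used only to exercise the function
sample = pd.DataFrame(
    {
        "math_score": [np.nan, 400.0, 500.0],
        "reading_score": [410.0, np.nan, 430.0],
        "writing_score": [np.nan, np.nan, 420.0],
    },
    index=["A", "B", "C"],
)

cleaned = transform_copy(sample)
print(cleaned)
print(sample.isna().sum())
```

Here `sample` still contains its NaN values after the call, while `cleaned` does not, so the raw and cleaned stages of a pipeline stay separate.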