# Imputing missing plane prices
Now there's just one column with missing values left!

You've removed the `"Additional_Info"` column from `planes`; the last step is to impute the missing data in the `"Price"` column of the dataset.

As a reminder, the boxplot of `"Price"` by `"Airline"` that you generated earlier suggested that imputing the median price for each airline is a solid approach!

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

planes = pd.read_csv(r"D:\Cursos\Data_Science_Python\data_sets\planesv2.csv")
planes.head(3)

# Remove "Additional_Info" first so it can never end up in the dropna subset
planes = planes.drop("Additional_Info", axis=1)

# Find the five percent threshold (5% of the number of rows)
threshold = len(planes) * 0.05

# Select the columns with at most 5% missing values
cols_to_drop = planes.columns[planes.isna().sum() <= threshold]

# Drop rows with missing values in those columns only;
# "Price" is above the threshold, so its missing values are kept for imputation
planes.dropna(subset=cols_to_drop, inplace=True)
```
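The 5% rule in the cell above can be checked on a tiny DataFrame. This is a minimal sketch on made-up data: the column names `a`, `b`, `c` and their values are hypothetical, not taken from `planes`. With 20 rows the threshold is 1 missing value, so `a` (1 missing) and `c` (none) get their incomplete rows dropped, while `b` (3 missing) keeps its gaps for later imputation.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 20 rows, so 5% of the rows is exactly 1 row
df = pd.DataFrame({
    "a": [np.nan] + list(range(19)),      # 1 missing value (5%)
    "b": [np.nan] * 3 + list(range(17)),  # 3 missing values (15%)
    "c": range(20),                       # no missing values
})

threshold = len(df) * 0.05  # 1.0

# Columns with at most 5% missing values: safe to drop their incomplete rows
cols_to_drop = df.columns[df.isna().sum() <= threshold]
print(list(cols_to_drop))  # ['a', 'c']

# Only row 0 (missing in "a") is removed; "b" keeps its remaining gaps
df = df.dropna(subset=cols_to_drop)
print(len(df))                # 19
print(df["b"].isna().sum())   # 2
```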

Group planes by airline and calculate the median price.

```python
# Calculate median plane ticket prices by Airline
airline_prices = planes.groupby("Airline")["Price"].median()

print(airline_prices)
```

```
Airline
Air Asia              5192.0
Air India             9443.0
GoAir                 5003.5
IndiGo                5054.0
Jet Airways          11507.0
Multiple carriers    10197.0
SpiceJet              3873.0
Vistara               8028.0
Name: Price, dtype: float64
```
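These medians are computed only from rows that already have a price, because pandas' `GroupBy.median()` excludes missing values. A minimal sketch on made-up data (the airlines `"X"` and `"Y"` and their prices are hypothetical) shows this:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for planes[["Airline", "Price"]]
toy = pd.DataFrame({
    "Airline": ["X", "X", "X", "Y", "Y"],
    "Price": [100.0, np.nan, 300.0, 50.0, np.nan],
})

# NaN prices are skipped, so each median uses only that airline's known prices
medians = toy.groupby("Airline")["Price"].median()
print(medians)
# X: median of [100.0, 300.0] = 200.0
# Y: median of [50.0]         = 50.0
```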


Convert the grouped median prices to a dictionary.

```python
# Convert to a dictionary
prices_dict = airline_prices.to_dict()
prices_dict
```

```
{'Air Asia': 5192.0,
 'Air India': 9443.0,
 'GoAir': 5003.5,
 'IndiGo': 5054.0,
 'Jet Airways': 11507.0,
 'Multiple carriers': 10197.0,
 'SpiceJet': 3873.0,
 'Vistara': 8028.0}
```

* Conditionally impute missing values for `"Price"` by mapping the values in the `"Airline"` column with `prices_dict`.
* Check for remaining missing values.
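The mapping step relies on two pandas behaviors: `Series.map()` with a dictionary looks up each value among the dictionary's keys and returns `NaN` where there is no match, and `Series.fillna()` given a Series fills only the missing positions, aligned by index. A minimal sketch on hypothetical data (airlines `"X"`, `"Y"`, `"Z"` and a toy `prices_dict`, not the real one) shows that existing prices are untouched and that an airline absent from the dictionary would stay missing:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; prices_dict plays the role of the per-airline medians
toy = pd.DataFrame({
    "Airline": ["X", "Y", "X", "Z"],
    "Price": [120.0, np.nan, np.nan, np.nan],
})
prices_dict = {"X": 200.0, "Y": 50.0}  # deliberately no entry for "Z"

# map() looks up each airline in the dict; unknown keys ("Z") become NaN
mapped = toy["Airline"].map(prices_dict)
print(mapped.tolist())  # [200.0, 50.0, 200.0, nan]

# fillna() with a Series fills by index label, only where Price is missing
toy["Price"] = toy["Price"].fillna(mapped)
print(toy["Price"].tolist())  # [120.0, 50.0, 200.0, nan]
```

In `planes` every airline has a median in `prices_dict`, so no gaps of the `"Z"` kind remain.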

```python
# Map the dictionary to missing values of Price by Airline
planes["Price"] = planes["Price"].fillna(planes["Airline"].map(prices_dict))

# Check for missing values
print(planes.isna().sum())
```

```
Airline            0
Date_of_Journey    0
Source             0
Destination        0
Route              0
Dep_Time           0
Arrival_Time       0
Duration           0
Total_Stops        0
Price              0
dtype: int64
```
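An alternative that skips the intermediate dictionary is `groupby(...).transform("median")`, which returns each group's median aligned to the original rows. This is a swapped-in technique, not the one used in this exercise; the sketch below runs it on hypothetical toy data rather than `planes`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for planes
toy = pd.DataFrame({
    "Airline": ["X", "X", "X", "Y", "Y"],
    "Price": [100.0, np.nan, 300.0, 50.0, np.nan],
})

# transform("median") broadcasts each airline's median back onto its own rows
group_medians = toy.groupby("Airline")["Price"].transform("median")
toy["Price"] = toy["Price"].fillna(group_medians)

print(toy["Price"].tolist())      # [100.0, 200.0, 300.0, 50.0, 50.0]
print(toy["Price"].isna().sum())  # 0
```

Both approaches fill the same values; the dictionary version makes the per-airline medians easy to inspect, while `transform` keeps the imputation in a single expression.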
