There are multiple ways of handling missing data, and the right choice varies case by case. There is no universally best way of dealing with missing data. Use your best judgement and explore different options to determine which method works best for your data set.
- Drop rows (or columns, with axis=1) that have any missing value - df.dropna()
- Drop rows or columns only when all of their values are missing - df.dropna(how='all')
- Drop rows or columns based on a threshold of non-missing values. For instance, thresh=4 keeps only the rows that have at least 4 non-missing values and drops the rest - df.dropna(thresh=4)
- Drop rows that have a missing value in a particular subset of columns - df.dropna(subset=['column1','column2'])
- Fill with a constant value - df.fillna(0)
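- Fill with an aggregated value such as a column mean - df['column1'].fillna(df['column1'].mean()). Note that df.fillna(df['column1'].mean()) would fill every column with the mean of column1.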
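- Replace with the previous value (forward fill) or the next value (backward fill) - df.ffill() or df.bfill(). The older spelling df.fillna(method='bfill') is deprecated in recent pandas versions.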
- Fill by using another data frame that has the same columns - df.fillna(df2)
- Fill with a predicted value, for example one estimated by interpolation or by an ML algorithm - df.interpolate()
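
The options listed above can be compared side by side on a small example. The following is a minimal sketch, not a recommendation: the frame, its values, and the names column1 to column3 and df2 are made up for illustration, and the -1.0 fallback is an arbitrary placeholder.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame(
    {
        "column1": [1.0, np.nan, 3.0, np.nan],
        "column2": [10.0, 20.0, np.nan, np.nan],
        "column3": [100.0, 200.0, np.nan, np.nan],
    }
)

# --- Dropping ---
dropped_any = df.dropna()                        # keep rows with no missing values
dropped_all = df.dropna(how="all")               # drop rows where every value is missing
dropped_thresh = df.dropna(thresh=2)             # keep rows with at least 2 non-missing values
dropped_subset = df.dropna(subset=["column1"])   # only column1 decides which rows go

# --- Filling ---
filled_const = df.fillna(0)                      # constant value everywhere
# A dict restricts the fill to the named column (here: the mean of column1).
filled_mean = df.fillna({"column1": df["column1"].mean()})
filled_fwd = df.ffill()                          # carry the previous value forward
filled_back = df.bfill()                         # pull the next value backward

# Fill from another frame with the same index and columns (assumed fallback data).
df2 = pd.DataFrame(-1.0, index=df.index, columns=df.columns)
filled_from_other = df.fillna(df2)

# Linear interpolation between neighbouring known values in one column.
interpolated = df["column1"].interpolate()

print(dropped_thresh)
print(filled_mean)
```

Each call returns a new object and leaves df unchanged, so the results can be inspected and compared before committing to one approach.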