Rainfall prediction is a difficult and uncertain task with a significant impact on human society. Timely and accurate forecasts can help reduce human and financial losses. This study presents a set of experiments that use common machine learning techniques to build models that predict whether it will rain tomorrow based on the current day's weather data.
- The imbalanced dataset is balanced by oversampling.
- Categorical variables are label-encoded.
- A sophisticated imputation method, MICE (Multiple Imputation by Chained Equations), is used.
- Outliers are detected and excluded from the data.
- Filter and wrapper methods are used for feature selection.
- Popular models are compared on both speed and performance.
- Precision and F1 score are considered as the metrics best suited to judging performance on an imbalanced dataset.
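
Three of the preprocessing steps listed above (label encoding, outlier exclusion, and oversampling) can be sketched with the Python standard library alone. This is a minimal illustration, not the study's actual code: the source does not say which libraries or thresholds were used, so the column names (`wind_dir`, `rainfall`, `rain_tomorrow`), the 1.5 × IQR outlier rule, and the simple random oversampling are all assumptions. MICE imputation and feature selection are left out because they are usually done with a dedicated library.

```python
import random
from collections import Counter
from statistics import quantiles


def label_encode(values):
    """Map each distinct category to an integer code (sorted for determinism)."""
    mapping = {cat: code for code, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def drop_iqr_outliers(rows, column, k=1.5):
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = quantiles([r[column] for r in rows], n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in rows if low <= r[column] <= high]


def random_oversample(rows, target, seed=0):
    """Duplicate randomly chosen rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    counts = Counter(r[target] for r in rows)
    largest = max(counts.values())
    balanced = list(rows)
    for cls, n in counts.items():
        pool = [r for r in rows if r[target] == cls]
        balanced.extend(rng.choice(pool) for _ in range(largest - n))
    return balanced


# Hypothetical toy data: 6 "No" days, 2 "Yes" days, one extreme rainfall value.
rainfall = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 50.0]
labels = ["No", "No", "Yes", "No", "No", "Yes", "No", "No"]
winds = ["N", "S", "N", "E", "S", "N", "E", "S"]
rows = [
    {"wind_dir": w, "rainfall": r, "rain_tomorrow": y}
    for w, r, y in zip(winds, rainfall, labels)
]

# 1. Label-encode the categorical column.
codes, wind_mapping = label_encode([r["wind_dir"] for r in rows])
for row, code in zip(rows, codes):
    row["wind_dir"] = code

# 2. Exclude outliers on the numeric column.
clean = drop_iqr_outliers(rows, "rainfall")

# 3. Oversample the minority class so both classes are equally frequent.
balanced = random_oversample(clean, "rain_tomorrow")
print(Counter(r["rain_tomorrow"] for r in balanced))
```

In practice, oversampling is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.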
Cohen’s kappa statistic is a good measure that handles both multi-class and imbalanced-class problems well.
Cohen’s kappa is defined as:
Kappa = (observed accuracy - expected accuracy)/(1 - expected accuracy)
For more details Check
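
The kappa formula above can be computed directly from true and predicted labels. The sketch below is a plain-Python illustration; the function name `cohen_kappa` and the toy labels are made up for this example. Expected accuracy is taken as the agreement that would occur by chance given each class's frequency in the true and predicted labels.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Kappa = (observed accuracy - expected accuracy) / (1 - expected accuracy)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Observed accuracy: fraction of samples where prediction matches truth.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Expected accuracy: chance agreement from the marginal class frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when only one class is present")
    return (observed - expected) / (1 - expected)


# Toy imbalanced labels: 8 dry days (0) and 2 rainy days (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# A model that always predicts "no rain" is 80% accurate, yet kappa is 0.
always_dry = [0] * 10
print(cohen_kappa(y_true, always_dry))

# A model with the same 80% accuracy that finds one rainy day scores higher.
some_skill = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
print(cohen_kappa(y_true, some_skill))
```

Both toy models have the same accuracy, but only the second earns a positive kappa (0.375 versus 0), which illustrates why kappa is more informative than accuracy on imbalanced data.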
XGBoost and Random Forest performed better than the other models. However, if speed is an important consideration, Random Forest can be used instead of XGBoost.