GitHub folder contains:
- R code for the project (.R format): Cab Fare Prediction Using R.R
- Python code for the project (.ipynb format): Cab Fare Prediction Using Python.ipynb
- Project report: Cab Fare Prediction.pdf
- Problem Statement.pdf
- Saved model trained on the entire training dataset in Python: cab_fare_xgboost_model.pkl
- Saved model trained on the entire training dataset in R: final_Xgboost_model_using_R.rds
- Predictions on the test dataset in CSV format: predictions_xgboost.csv
The objective of this project is to predict the cab fare amount from the following attributes in the dataset:
pickup_datetime - timestamp value indicating when the cab ride started.
pickup_longitude - float for longitude coordinate of where the cab ride started.
pickup_latitude - float for latitude coordinate of where the cab ride started.
dropoff_longitude - float for longitude coordinate of where the cab ride ended.
dropoff_latitude - float for latitude coordinate of where the cab ride ended.
passenger_count - an integer indicating the number of passengers in the cab ride.
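The raw attributes above are usually turned into model inputs before training. The sketch below shows one possible way to do that, using only the Python standard library: it splits pickup_datetime into calendar parts and computes the great-circle (haversine) distance between the pickup and dropoff points. The timestamp layout, the sample record, and the helper names haversine_km and derive_features are assumptions made for illustration; they are not taken from the project code.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def derive_features(row):
    """Turn one raw record (dict keyed by the attribute names) into model inputs."""
    # Assumed timestamp layout; adjust the format string to the actual file.
    ts = datetime.strptime(row["pickup_datetime"], "%Y-%m-%d %H:%M:%S UTC")
    return {
        "year": ts.year,
        "month": ts.month,
        "weekday": ts.weekday(),  # Monday = 0
        "hour": ts.hour,
        "distance_km": haversine_km(
            row["pickup_latitude"], row["pickup_longitude"],
            row["dropoff_latitude"], row["dropoff_longitude"],
        ),
        "passenger_count": int(row["passenger_count"]),
    }


# Made-up record, for illustration only.
example = {
    "pickup_datetime": "2015-01-27 13:08:24 UTC",
    "pickup_longitude": -73.9798,
    "pickup_latitude": 40.7614,
    "dropoff_longitude": -73.8803,
    "dropoff_latitude": 40.6513,
    "passenger_count": 1,
}
features = derive_features(example)
```

A distance feature of this kind is a common choice for fare models because fare tends to grow with trip length, but whether the project derives it this way is not stated here.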
The project covers the following steps:
- Data Pre-processing.
- Data Visualization.
- Outlier Analysis.
- Missing value Analysis.
- Feature Selection.
- Correlation analysis.
- Chi-Square test.
- Analysis of Variance (ANOVA) Test.
- Multicollinearity Test.
- Feature Scaling.
- Normalization.
- Splitting into Train and Validation Dataset.
- Hyperparameter Optimization.
- Model Development:
  I. Linear Regression
  II. Ridge Regression
  III. Lasso Regression
  IV. Decision Tree
  V. Random Forest
- Improve Accuracy:
  a) Algorithm Tuning
  b) Ensembles: XGBoost for Regression
- Finalize Model:
  a) Predictions on validation dataset
  b) Create standalone model on entire training dataset
  c) Save model for later use
- Python Code
- R Code
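
The later steps in the list above (normalization, train/validation split, model fitting, validation, refitting on all training data, and saving the model) can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the project's implementation: it uses synthetic data, a single feature, and a plain least-squares line as a stand-in for the listed models (Linear Regression through XGBoost), and it relies only on the Python standard library. All function names and the file name model.pkl are hypothetical.

```python
import math
import os
import pickle
import random
import tempfile


def min_max_scale(values):
    """Normalization: rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def train_validation_split(xs, ys, validation_fraction=0.2, seed=42):
    """Shuffle indices and hold out a fraction of the rows for validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_valid = round(len(idx) * validation_fraction)
    valid, train = idx[:n_valid], idx[n_valid:]
    return ([xs[i] for i in train], [ys[i] for i in train],
            [xs[i] for i in valid], [ys[i] for i in valid])


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": my - slope * mx}


def predict(model, xs):
    return [model["slope"] * x + model["intercept"] for x in xs]


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Synthetic data, made up for illustration: fare grows with distance plus noise.
rng = random.Random(0)
distance = [rng.uniform(0.5, 20.0) for _ in range(200)]
fare = [2.5 + 1.8 * d + rng.gauss(0, 0.5) for d in distance]

# Scale, split, fit on the training part, and score on the validation part.
x = min_max_scale(distance)
x_tr, y_tr, x_va, y_va = train_validation_split(x, fare)
model = fit_line(x_tr, y_tr)
validation_rmse = rmse(y_va, predict(model, x_va))

# Refit on the entire training data, then save and reload the standalone model.
final_model = fit_line(x, fare)
path = os.path.join(tempfile.mkdtemp(), "model.pkl")  # hypothetical file name
with open(path, "wb") as fh:
    pickle.dump(final_model, fh)
with open(path, "rb") as fh:
    restored = pickle.load(fh)
```

The same structure carries over when the stand-in is replaced by a library model: compare candidates by validation error, refit the chosen one on all training rows, and serialize it. Scaling is applied before the split here only to mirror the order of the list; fitting the scaler on the training part alone avoids leaking validation statistics.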