GitHub folder contains:

  1. R code for the project (.R format): Cab Fare Prediction Using R.R
  2. Python code for the project (.ipynb format): Cab Fare Prediction Using Python.ipynb
  3. Project report: Cab Fare Prediction.pdf
  4. Problem statement: Problem Statement.pdf
  5. Saved model trained on the entire training dataset in Python: cab_fare_xgboost_model.pkl
  6. Saved model trained on the entire training dataset in R: final_Xgboost_model_using_R.rds
  7. Predictions on the test dataset in CSV format: predictions_xgboost.csv

Problem Statement

The objective of this project is to predict the cab fare amount from the following attributes in the dataset:

pickup_datetime - timestamp value indicating when the cab ride started.
pickup_longitude - float for longitude coordinate of where the cab ride started.
pickup_latitude - float for latitude coordinate of where the cab ride started.
dropoff_longitude - float for longitude coordinate of where the cab ride ended.
dropoff_latitude - float for latitude coordinate of where the cab ride ended.
passenger_count - an integer indicating the number of passengers in the cab ride.

This is a regression problem.
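
As a rough illustration of the attributes listed above, one row of the dataset could be represented in Python as follows. This is only a sketch: the CabRide class, the parse_ride helper, the fare_amount target name, and the timestamp format are assumptions made for illustration and are not taken from the project code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CabRide:
    """One row of the dataset (hypothetical container, not from the project)."""
    pickup_datetime: datetime
    pickup_longitude: float
    pickup_latitude: float
    dropoff_longitude: float
    dropoff_latitude: float
    passenger_count: int
    # Assumed name for the regression target; absent for rows to be predicted.
    fare_amount: Optional[float] = None


def parse_ride(row: dict, fmt: str = "%Y-%m-%d %H:%M:%S") -> CabRide:
    """Convert a dict of raw strings (e.g. from csv.DictReader) into a CabRide.

    The default timestamp format is an assumption; adjust fmt to the real data.
    """
    fare = row.get("fare_amount")
    return CabRide(
        pickup_datetime=datetime.strptime(row["pickup_datetime"], fmt),
        pickup_longitude=float(row["pickup_longitude"]),
        pickup_latitude=float(row["pickup_latitude"]),
        dropoff_longitude=float(row["dropoff_longitude"]),
        dropoff_latitude=float(row["dropoff_latitude"]),
        # Go through float first so values such as "2.0" are also accepted.
        passenger_count=int(float(row["passenger_count"])),
        fare_amount=float(fare) if fare not in (None, "") else None,
    )


# Made-up values, used only to show the call.
example = parse_ride({
    "pickup_datetime": "2015-01-27 17:08:24",
    "pickup_longitude": "-73.97",
    "pickup_latitude": "40.76",
    "dropoff_longitude": "-73.99",
    "dropoff_latitude": "40.75",
    "passenger_count": "2.0",
})
```

Keeping the target optional lets the same record type describe both training rows and test rows whose fare is still to be predicted.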

All the steps implemented in this project

  1. Data Pre-processing.
  2. Data Visualization.
  3. Outlier Analysis.
  4. Missing Value Analysis.
  5. Feature Selection.
     • Correlation analysis.
     • Chi-Square test.
     • Analysis of Variance (ANOVA) test.
     • Multicollinearity test.
  6. Feature Scaling.
     • Normalization.
  7. Splitting into Train and Validation Datasets.
  8. Hyperparameter Optimization.
  9. Model Development.
     I. Linear Regression
     II. Ridge Regression
     III. Lasso Regression
     IV. Decision Tree
     V. Random Forest
  10. Improve Accuracy.
     a) Algorithm tuning
     b) Ensembles: XGBoost for regression
  11. Finalize Model.
     a) Predictions on validation dataset
     b) Create standalone model on entire training dataset
     c) Save model for later use
  12. Python Code
  13. R Code
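
Steps such as normalization, the train/validation split, and scoring on the validation set can be sketched in plain Python as below. This is a simplified illustration under assumptions: the helper names (min_max_normalize, train_validation_split, rmse), the 80/20 split ratio, the fixed seed, and the choice of RMSE as the error metric are all hypothetical, and the project itself may rely on library routines instead.

```python
import math
import random
from typing import List, Sequence, Tuple


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values to the range [0, 1]; a constant column maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def train_validation_split(
    rows: Sequence, validation_fraction: float = 0.2, seed: int = 42
) -> Tuple[list, list]:
    """Shuffle a copy of rows and split it into (train, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_val = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy values, not project data.
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
scaled = min_max_normalize(feature)

train, validation = train_validation_split(list(range(10)))

# Error of a made-up prediction against made-up fares.
error = rmse([3.0, 5.0], [1.0, 5.0])
```

Fitting the scaling bounds on the training portion only, and reusing them for validation and test rows, avoids leaking information from held-out data into the model.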

See the project report (Cab Fare Prediction.pdf) for more details.

