
Predictive model in Python to predict a car's destination based on time and starting position




Car destination prediction model

The goal of this project is to create a predictive model that predicts the destination position of a car based on a trip start date and time and a starting position.

System requirements

The project uses Python and Spark as its main technologies. To run the notebooks, you need the following installed:

  • Spark 2.2
  • Python 3.6. Also, these packages are required:
    • notebook
    • findspark
    • numpy
    • pandas
    • scikit-learn
    • python-geohash
    • matplotlib
    • gmaps

If you have Anaconda installed on your computer, you can easily get your Python environment ready by loading python-environment.yml, which contains all the dependencies. Simply run:

conda env create -f python-environment.yml
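Once the environment is active, a quick way to confirm the dependencies are importable is a small check like the one below. It is a hypothetical helper, not part of the project, and it assumes the usual import names: scikit-learn is imported as sklearn and python-geohash as geohash.

```python
import importlib.util

# Import names for the packages listed above (assumed: scikit-learn -> "sklearn",
# python-geohash -> "geohash"; the others import under their package names).
REQUIRED_MODULES = ["notebook", "findspark", "numpy", "pandas", "sklearn",
                    "geohash", "matplotlib", "gmaps"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```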

Although it is not needed for the data processing or for running the models, you will need a Google Maps JavaScript API key to visualize maps with gmaps in some notebooks. After you activate the key in the Google Developers Console, add it to your environment with:

export GOOGLE_API_KEY=[Your fantastic API KEY goes here]
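Inside a notebook, the exported key can then be read back from the environment. The helper below is a hypothetical sketch that fails with a clear message when the variable is missing, instead of letting the map widget fail later.

```python
import os


def google_api_key(env_var="GOOGLE_API_KEY"):
    """Return the Google Maps API key from the environment, or fail with a clear message."""
    key = os.environ.get(env_var)
    if not key:
        # Hypothetical error text; the project itself may handle a missing key differently.
        raise RuntimeError(
            f"{env_var} is not set; export it before starting Jupyter to use gmaps maps."
        )
    return key
```

With gmaps installed, the key would typically be passed to the library with gmaps.configure(api_key=google_api_key()) before drawing any map.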

Project structure

The project contains the following types of files:

  • Jupyter notebooks. They contain the code for the project implementation. The project is easiest to follow if you go through them in this order:
    • data-cleansing.ipynb: Reads and explores the raw dataset, applies some data cleanup transformations, and produces visualizations (maps). Its output is the file processed-dataset.csv.
    • features-preparation: Normalizes the data, expands its dimensionality, and in general computes new features that could be useful depending on the model chosen later. Its output is featured-dataset.csv.
    • random-forest-model.ipynb: Implements the Random Forest prediction model.
    • k-nearest-model.ipynb: Implements the K-Nearest Neighbor prediction model.
  • Python script.
    • This script runs the models generated in the notebooks to predict where a vehicle is heading (its destination) based on its starting position and time.
  • Models. The trained models are stored in the following files:
    • random_forest_model.pkl
    • k_nearest_model.pkl
  • Analysis Documentation. A PDF file, predictive-analytics-connected-car.pdf, details the analysis and decision making and discusses the implementation code.
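As an illustration of the kind of work the features-preparation step does, the sketch below expands a trip start time into several numeric columns and min-max normalizes a column. The feature set (hour, weekday, weekend flag, sine/cosine encodings) and the helper names are hypothetical; the notebook's actual features may differ.

```python
import math
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # same layout as the script's "yyyy-MM-dd HH:mm:ss"


def time_features(timestamp):
    """Expand a trip start time into numeric features (hypothetical feature set)."""
    t = datetime.strptime(timestamp, TIME_FORMAT)
    seconds_of_day = t.hour * 3600 + t.minute * 60 + t.second
    day_angle = 2 * math.pi * seconds_of_day / 86400
    week_angle = 2 * math.pi * t.weekday() / 7
    return {
        "hour": t.hour,
        "weekday": t.weekday(),  # Monday = 0 ... Sunday = 6
        "is_weekend": int(t.weekday() >= 5),
        # Cyclical encoding, so 23:59 and 00:00 end up close to each other.
        "day_sin": math.sin(day_angle),
        "day_cos": math.cos(day_angle),
        "week_sin": math.sin(week_angle),
        "week_cos": math.cos(week_angle),
    }


def min_max_normalize(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]
```

Turning one timestamp into several columns is one way to "expand dimensionality": distance-based models such as K-Nearest Neighbor then see 23:59 and 00:01 as close, which a raw hour value would not capture.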

Running the models

To make evaluating the models easier, I've created a simple Python script so that you can try different values and see the predictions.

The script does not need Spark; it only requires the numpy, scikit-learn, and python-geohash Python packages. If you created the environment provided with the project, you already have everything you need.
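The README does not say exactly how the script uses geohash; presumably it encodes positions into geohash cells. For readers unfamiliar with the encoding, here is a pure-Python sketch of the standard geohash algorithm (alternately bisecting the longitude and latitude ranges and packing every 5 bits into one base-32 character). It only illustrates the idea; the project itself relies on the python-geohash package, and the function name and default precision are hypothetical.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(latitude, longitude, precision=7):
    """Encode a position as a geohash string using the standard bisection scheme."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0        # bits collected for the current base-32 character
    bit_count = 0
    even = True     # a geohash starts with a longitude bit
    while len(chars) < precision:
        rng, value = (lon_range, longitude) if even else (lat_range, latitude)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half of the interval
        even = not even
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

Nearby positions share a common prefix, so truncating the hash to fewer characters groups trips into larger cells, which is what makes it handy as a categorical position feature.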

If you run the following from a command line:

./ -h

You will get help on how to use it:

usage: [-h] {forest,knn} time latitude longitude

positional arguments:
  {forest,knn}  Predictive model to use, can be either forest or knn
  time          Start trip time, with the format "yyyy-MM-dd HH:mm:ss". It
                must be between quotation marks. For instance, you could use:
                "2017-05-24 12:26:37"
  latitude      Latitude of the trip start position. For instance, you could
                use: 47.409291
  longitude     Longitude of the trip start position. For instance, you could
                use: 8.546942

optional arguments:
  -h, --help    show this help message and exit
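The help text above has the layout produced by Python's argparse module. The script's own source is not shown here, so the following is only a hypothetical sketch of a parser that would give a similar interface, including validation of the time format; the argument names and the trip_time helper are assumptions.

```python
import argparse
from datetime import datetime


def trip_time(value):
    """Parse a start time in the "yyyy-MM-dd HH:mm:ss" layout shown in the help text."""
    try:
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        raise argparse.ArgumentTypeError(
            f'invalid time {value!r}; expected "yyyy-MM-dd HH:mm:ss"') from None


def build_parser():
    """Build a parser mirroring the documented positional arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("model", choices=["forest", "knn"],
                        help="Predictive model to use, can be either forest or knn")
    parser.add_argument("time", type=trip_time,
                        help='Start trip time, with the format "yyyy-MM-dd HH:mm:ss"')
    parser.add_argument("latitude", type=float,
                        help="Latitude of the trip start position")
    parser.add_argument("longitude", type=float,
                        help="Longitude of the trip start position")
    return parser
```

Because the parser defines no options that look like negative numbers, argparse accepts a value such as -97.263840 as a positional longitude without extra quoting.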

For example, to make a prediction using the K-Nearest Neighbor model:

./ knn "2017-05-29 18:23:27" 32.989318 -97.263840
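Conceptually, the knn option predicts a destination from past trips that started at a similar time and place. The stdlib-only sketch below shows k-nearest-neighbour regression on already normalized feature tuples. The project's notebook builds its model with scikit-learn and its own features, so the function name, the feature tuples, and the plain averaging of coordinates are assumptions for illustration.

```python
import math


def _euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_features, train_destinations, query, k=3):
    """Predict a destination as the mean destination of the k nearest training trips.

    train_features: list of normalized feature tuples for past trips.
    train_destinations: list of (latitude, longitude) for the same trips.
    query: feature tuple for the new trip, built the same way.
    """
    if not 0 < k <= len(train_features):
        raise ValueError("k must be between 1 and the number of training trips")
    # Rank training trips by distance to the query in feature space.
    ranked = sorted(range(len(train_features)),
                    key=lambda i: _euclidean(train_features[i], query))
    nearest = ranked[:k]
    # Plain averaging of coordinates is reasonable for nearby destinations,
    # but not for points on opposite sides of the antimeridian.
    lat = sum(train_destinations[i][0] for i in nearest) / k
    lon = sum(train_destinations[i][1] for i in nearest) / k
    return lat, lon
```

Normalizing the features first matters here: without it, whichever feature has the largest numeric range would dominate the distance.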

And that's all. Enjoy the code! Feedback is welcome ;-)

