dontmindifiduda/sf-crime

Team 4 San Francisco Crime Kaggle Competition Solution

Team Members

  • Scott Duda
  • Steve Estes
  • George Jiang
  • Morgan Kaiser

Exploratory Data Analysis

This repo includes a Jupyter notebook with our exploratory data analysis. The visualizations it generates are stored in the images subdirectory, and it also produces the bar chart race MP4 file.

Directions

  1. Make sure you have the original training and test data from the competition (train.csv.zip and test.csv.zip) saved in the same directory as the scripts included in this repo.
  2. Run the preprocessing script.
  3. Using the preprocessing script's output files (p_train.csv and p_test.csv), train each individual model with its corresponding script.
  4. Using the output files from each model training script, run the stacking-combine script to create the final submission file, a stacked ensemble built from the predictions of the five models.
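
The idea behind step 4 can be illustrated with a minimal sketch of stacking: each base model's class probabilities become input features for a small meta-learner, which produces the final probabilities. This is not the repo's stacking-combine script, which may work differently. The softmax-regression meta-learner, the function names, and the toy data (two hypothetical base models, two classes) are all assumptions made for illustration. In practice the meta-learner would typically be trained on out-of-fold predictions so that it never sees predictions made on rows a base model was trained on.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities (shifted for numerical stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def meta_predict(weights, biases, features):
    """Class probabilities from the meta-learner for one row of stacked features."""
    scores = [
        sum(w * x for w, x in zip(class_weights, features)) + bias
        for class_weights, bias in zip(weights, biases)
    ]
    return softmax(scores)


def fit_meta_learner(rows, labels, n_classes, lr=0.5, epochs=200):
    """Fit softmax regression on stacked base-model predictions with plain SGD."""
    n_features = len(rows[0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            probs = meta_predict(weights, biases, features)
            for k in range(n_classes):
                # Gradient of the log loss with respect to the score of class k.
                grad = probs[k] - (1.0 if k == label else 0.0)
                for j in range(n_features):
                    weights[k][j] -= lr * grad * features[j]
                biases[k] -= lr * grad
    return weights, biases


# Toy (made-up) predictions from two hypothetical base models, two classes.
model_a = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]]
model_b = [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.4, 0.6]]
labels = [0, 1, 0, 1]

# Stack: concatenate each model's probability row into one feature row.
stacked = [a + b for a, b in zip(model_a, model_b)]
weights, biases = fit_meta_learner(stacked, labels, n_classes=2)
final = [meta_predict(weights, biases, row) for row in stacked]
```

With five base models the stacked feature row would simply be five probability rows concatenated instead of two. A library classifier could replace the hand-written meta-learner.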

Model Results

Final Competition Score: 2.19915 (evaluated using multi-class logarithmic loss)
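
For readers unfamiliar with the metric, multi-class logarithmic loss is the average negative log of the probability a model assigned to the true class, so lower is better. The sketch below follows the standard definition. The function name, the clipping constant `eps`, and the choice to rescale each row to sum to one before clipping are assumptions for illustration, not code from this repo.

```python
import math


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class of each row."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Rescale the row so its probabilities sum to one.
        row_sum = sum(row)
        p_true = row[label] / row_sum
        # Clip away from 0 and 1 so the logarithm stays finite.
        p_true = min(max(p_true, eps), 1 - eps)
        total += -math.log(p_true)
    return total / len(y_true)
```

As a point of reference, a uniform guess over K classes scores ln K under this metric, and a perfectly confident correct prediction scores approximately zero.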

Additional Information

The code in this repo was developed as part of the PGH Data Science Meetup Group's 2020 Kaggle Competition. A copy of the presentation of the model results can be found here.

Additional code was developed to incorporate external datasets into the original data provided by the competition hosts. This code will be added to the repo in the near future. Several interesting takeaways from the other teams' presentations may also inspire future code modifications.

UPDATE: This project is also summarized in a Medium article.
