Team 4 San Francisco Crime Kaggle Competition Solution

Team Members

Scott Duda
Steve Estes
George Jiang
Morgan Kaiser

Exploratory Data Analysis

This repo includes a Jupyter notebook (SF Crime Data - EDA.ipynb) with our exploratory data analysis. Visualizations generated by this notebook are stored in the images subdirectory. The notebook also produces the bar chart race video (bar_chart_race.mp4).

Directions

1. Save the original competition training and test data (train.csv.zip and test.csv.zip) in the same directory as the scripts included in this repo.
2. Run the preprocessing script (preprocessing.py).
3. Using the output files from the preprocessing script (p_train.csv and p_test.csv), train each of the five individual models with its corresponding script (stacking-catboost.py, stacking-lightgbm.py, stacking-nn.py, stacking-rf.py, and stacking-xgboost.py).
4. Using the output files from the five model training scripts, run the stacking-combine script (stacking-combine.py). It builds a stacked ensemble from the five models' predictions and writes the final submission file.
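
To illustrate the idea behind the final combine step, here is a minimal sketch that blends class-probability predictions from several models by weighted averaging. This is a simplified stand-in, not the logic of stacking-combine.py: a full stacked ensemble would typically fit a meta-model on the base models' predictions. The function name, the input layout (one table of probability rows per model), and the weights are all assumptions made for illustration.

```python
def average_predictions(model_probs, weights=None):
    """Blend per-model class-probability tables into a single table.

    model_probs: list with one table per model; each table is a list of rows,
    and each row holds the class probabilities for one test record.
    weights: optional per-model weights (hypothetical; equal by default).
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    total_weight = sum(weights)

    blended = []
    for rows in zip(*model_probs):  # the same record across all models
        combined = [
            sum(w * p for w, p in zip(weights, class_probs)) / total_weight
            for class_probs in zip(*rows)  # the same class across all models
        ]
        # Renormalize so each blended row sums to 1.
        row_sum = sum(combined)
        blended.append([p / row_sum for p in combined])
    return blended


# Two hypothetical models, one record, two classes.
model_a = [[0.8, 0.2]]
model_b = [[0.4, 0.6]]
blended = average_predictions([model_a, model_b])
```

With equal weights, the blended row is the per-class mean of the two input rows. Unequal weights let a stronger model contribute more to the final prediction.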

Model Results

Final Competition Score: 2.19915 (evaluated using multi-class logarithmic loss)
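
For reference, multi-class logarithmic loss is the average negative log of the probability a model assigns to each record's true class, so lower is better. Below is a minimal sketch of the metric, assuming integer class labels and one row of class probabilities per record. The function name and the clipping constant eps are illustrative choices, not taken from this repo or from the competition's scoring code.

```python
import math


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of integer class indices, one per record.
    probs: list of rows; probs[i][j] is the predicted probability
           that record i belongs to class j.
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip to avoid log(0), then renormalize the row to sum to 1.
        clipped = [min(max(p, eps), 1 - eps) for p in row]
        norm = sum(clipped)
        total += -math.log(clipped[label] / norm)
    return total / len(y_true)


# Two records, two classes: the model gives the true classes 0.5 and 0.75.
loss = multiclass_log_loss([0, 1], [[0.5, 0.5], [0.25, 0.75]])
```

As a point of comparison, a model that predicts a uniform distribution over K classes scores ln(K) under this metric, which gives a simple baseline for judging a reported score.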

Additional Information

The code included in this repo was developed as part of the PGH Data Science Meetup Group's 2020 Kaggle Competition. A copy of the presentation of the model results can be found here.

Additional code was developed to incorporate external datasets into the original data provided by the competition hosts. This code will be added to the repo in the near future. There were several interesting takeaways from the other team presentations that may inspire future code modifications as well.

UPDATE: This project is also summarized in a Medium article.

