A machine learning pipeline that prioritizes the Chicago business licenses most likely to fail within two years.
The full project report can be found here
- cofigs: configuration files for different combinations of features, used to pass all required parameters into the pipeline.
- data: an sh script to download the cleaned full data set.
- data_collector: all code used to collect and clean the data.
- output: results of the pipeline, including the performance table, precision-recall curve plots, and AUC-ROC curve plots.
- pipeline: modules for imputation, evaluation, discretization, dummy encoding, and scaling (a preprocessing sketch follows this list).
- tests: tests for the pipeline.
- main.py: the main script to run the models and produce results.
- transformer.py: preprocesses the data set before modeling.
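As a rough illustration of the kind of preprocessing the pipeline modules and transformer.py perform, the sketch below uses scikit-learn to impute, dummy-encode, and scale features. The column names and the numeric/categorical split are hypothetical, not taken from this repository.

```python
# Minimal sketch of the preprocessing steps described above (imputation,
# dummy encoding, scaling). The column names and the numeric/categorical
# split are hypothetical, not taken from this repository.
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_features = ["median_income", "crime_count"]        # hypothetical
categorical_features = ["license_type", "community_area"]  # hypothetical

numeric_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # standardize features
])

categorical_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("dummies", OneHotEncoder(handle_unknown="ignore")),  # "get dummies"
])

preprocess = ColumnTransformer([
    ("num", numeric_pipeline, numeric_features),
    ("cat", categorical_pipeline, categorical_features),
])
```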
Get the full dataset
cd data
sh get_fullfiles.sh
All package requirements are listed in environment.yml.
To create the environment, run the following:
conda env create -f environment.yml
To activate the environment, run the following:
conda activate myenv
To install the package, run:
python setup.py install
To run the test suite:
py.test
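A test under tests might look roughly like the following; the impute_median helper is a hypothetical stand-in defined inline, not the actual code in this repository.

```python
# Hypothetical pytest-style test for an imputation helper; impute_median
# is a stand-in defined inline, not the actual code in this repository.
import pandas as pd


def impute_median(df, column):
    """Fill missing values in `column` with the column median."""
    df = df.copy()
    df[column] = df[column].fillna(df[column].median())
    return df


def test_impute_median_fills_missing_values():
    df = pd.DataFrame({"x": [1.0, None, 3.0]})
    result = impute_median(df, "x")
    assert result["x"].isna().sum() == 0
    assert result["x"].iloc[1] == 2.0  # median of [1.0, 3.0]
```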
To run the pipeline with a chosen configuration, run, for example:
python main.py --config ./cofigs/acs_geo.yml
Each config file specifies a different combination of features drawn from ACS, reported 311 requests, reported crimes, and business license data; choose the one that matches the feature set you want to run.
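As an illustration of how main.py might consume such a config, the sketch below loads a YAML file with PyYAML; the keys shown (feature_groups, label) are assumptions, not the actual schema of acs_geo.yml.

```python
# Sketch of loading a feature-combination config with PyYAML. The keys
# shown here (feature_groups, label) are assumptions; the real YAML files
# in cofigs/ may use a different schema.
import yaml

with open("./cofigs/acs_geo.yml") as f:
    config = yaml.safe_load(f)

feature_groups = config.get("feature_groups", [])        # hypothetical key
label_column = config.get("label", "dead_in_2_years")    # hypothetical key
print("Using feature groups:", feature_groups)
print("Predicting label:", label_column)
```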
The results of the pipeline are saved in the output folder.
Under the performance folder, CSV files record the performance of all models.
Under the pr folder, precision-recall curves are saved; under the roc folder, ROC curves are saved (see the generic plotting sketch after the list below).
The output also includes:
- temporal validation table
- feature list
- feature importance of the best model
- final prioritized list produced by the best model
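The precision-recall and ROC outputs described above can be produced with standard scikit-learn and matplotlib calls; the snippet below is a generic sketch under that assumption, not the repository's evaluation code.

```python
# Generic sketch of saving precision-recall and ROC curves for one model;
# y_test and y_scores stand in for the pipeline's labels and predicted
# scores, and the output paths mirror the pr/ and roc/ folders above.
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, roc_auc_score, roc_curve


def save_curves(y_test, y_scores, model_name, out_dir="output"):
    # Precision-recall curve
    precision, recall, _ = precision_recall_curve(y_test, y_scores)
    plt.figure()
    plt.plot(recall, precision)
    plt.xlabel("Recall")
    plt.ylabel("Precision")
    plt.title(f"Precision-Recall: {model_name}")
    plt.savefig(f"{out_dir}/pr/{model_name}_pr.png")
    plt.close()

    # ROC curve with AUC in the legend
    fpr, tpr, _ = roc_curve(y_test, y_scores)
    auc = roc_auc_score(y_test, y_scores)
    plt.figure()
    plt.plot(fpr, tpr, label=f"AUC = {auc:.3f}")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.title(f"ROC: {model_name}")
    plt.legend()
    plt.savefig(f"{out_dir}/roc/{model_name}_roc.png")
    plt.close()
```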
This project is licensed under the MIT License - see the LICENSE.md file for details
This project is the final project for the Machine Learning for Public Policy course at the University of Chicago.
- Supervised by Professor Rayid Ghani
- Inspired by Satej