
FeldmanMike/transit-ml


Ticket to Ride Where?

Ticket to Ride Where? is a machine learning project that predicts public transit ridership rates in Illinois Census tracts. The project generates a model that predicts which tracts would increase their use of public transportation the most given an increase in transit access. The hope is that this model can be used as a tool to inform community-based decision-making about where to prioritize investments in public transit infrastructure.

Installation

In your preferred directory, clone the repository using git:

git clone https://github.com/FeldmanMike/transit-ml.git

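Move into the project directory, then create and activate a virtual environment and install the required packages:

cd transit-ml
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt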

Structure

main.py is the entry point and provides the repository's text-based interface. From the terminal, users can choose to reuse archived versions of the data and/or the best model, or run the entire program from scratch (which takes several hours).
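The archive choices map naturally onto command-line flags. Below is a minimal sketch of one way to handle them with the standard library's argparse. Only the short flags -m and -d come from this README; the long option names, the stage names, and the functions parse_args and stages_to_run are hypothetical and not taken from main.py.

```python
import argparse


def parse_args(argv=None):
    """Parse the archive flags listed under Usage (long names are assumptions)."""
    parser = argparse.ArgumentParser(description="Ticket to Ride Where?")
    parser.add_argument("-m", "--archived-model", action="store_true",
                        help="reuse the archived data and the archived best model")
    parser.add_argument("-d", "--archived-data", action="store_true",
                        help="reuse the archived data but rerun model selection")
    return parser.parse_args(argv)


def stages_to_run(args):
    """Return the (hypothetical) pipeline stages that must run from scratch."""
    if args.archived_model:
        # Archived data and model: only the recommendations are produced.
        return ["recommend"]
    if args.archived_data:
        # Archived data: model selection is rerun on it.
        return ["model_selection", "recommend"]
    # No archives: download and clean data, then select a model (several hours).
    return ["download", "model_selection", "recommend"]
```

For example, stages_to_run(parse_args(["-d"])) returns ["model_selection", "recommend"]. Letting -m take precedence over -d reflects that archiving the model only makes sense together with archived data.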

pipeline.py contains helper functions that fetch American Community Survey (ACS) data, impute missing values, run a grid search over multiple pipelines, and identify and record information about the best-performing models.
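One way to fetch tract-level ACS data is the Census Bureau's public Data API, which returns a JSON list of lists whose first row is the header. The sketch below builds such a query for every tract in Illinois (state FIPS code 17) and fills missing values with the column median. The survey year, the variable IDs (B08301 is the "Means of Transportation to Work" table), the median strategy, and all function names are assumptions for illustration; pipeline.py may fetch and impute differently.

```python
from statistics import median
from urllib.parse import urlencode

ACS_BASE = "https://api.census.gov/data/{year}/acs/acs5"


def acs_tract_url(year, variables, state_fips="17", api_key=None):
    """Build an ACS 5-year query for all tracts in a state (17 = Illinois)."""
    params = {
        "get": ",".join(["NAME", *variables]),
        "for": "tract:*",
        "in": f"state:{state_fips}",
    }
    if api_key:  # the API also answers a limited number of keyless requests
        params["key"] = api_key
    return ACS_BASE.format(year=year) + "?" + urlencode(params)


def rows_to_records(payload):
    """Turn the API's list-of-lists response into one dict per tract."""
    header, *rows = payload
    return [dict(zip(header, row)) for row in rows]


def impute_median(records, columns):
    """Replace missing (None) or negative values with the column median.

    Treating negatives as missing is an assumption: counts cannot be negative.
    """
    for col in columns:
        observed = [float(r[col]) for r in records
                    if r[col] is not None and float(r[col]) >= 0]
        fill = median(observed)  # raises StatisticsError if nothing is observed
        for r in records:
            if r[col] is None or float(r[col]) < 0:
                r[col] = fill
            else:
                r[col] = float(r[col])
    return records
```

The query string can be fetched with any HTTP client; the parsing and imputation steps are kept separate so they can be tested without network access.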

download.py and data_wrangling.py fetch data from the ACS and the WalkScore API, merge it with the files in data_sources, and perform the necessary cleaning before returning a DataFrame ready for modeling.
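Merging sources at the tract level usually means joining on the 11-digit tract GEOID (2-digit state + 3-digit county + 6-digit tract code). The sketch below builds that key from ACS rows, inner-joins a second source keyed by GEOID, and computes a transit commute share. The variable IDs B08301_001E (total workers) and B08301_010E (public transportation) as the target definition, the transit_score field, and the function names are illustrative assumptions, not the repository's actual schema.

```python
def tract_geoid(record):
    """Concatenate state (2), county (3) and tract (6) codes into an 11-digit GEOID."""
    return record["state"] + record["county"] + record["tract"]


def merge_on_geoid(acs_records, other_by_geoid):
    """Inner-join ACS rows with another source (e.g. walkability scores) by GEOID."""
    merged = []
    for rec in acs_records:
        geoid = tract_geoid(rec)
        extra = other_by_geoid.get(geoid)
        if extra is not None:  # drop tracts missing from the other source
            merged.append({**rec, **extra, "GEOID": geoid})
    return merged


def transit_share(record, total_col="B08301_001E", transit_col="B08301_010E"):
    """Share of workers commuting by public transit; None when a tract has no workers."""
    total = float(record[total_col])
    return float(record[transit_col]) / total if total > 0 else None
```

An inner join mirrors a pandas merge with how="inner"; tracts lacking a score are dropped rather than imputed, which is a design choice worth revisiting if many tracts are lost.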

model_selection.py takes the DataFrame, splits it into training and testing sets, and runs a grid search over pre-selected regression models and hyperparameters to identify the best model, which is saved to best_model.pkl.
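A grid search over several regression models can be expressed with scikit-learn as a single Pipeline whose final "model" step is swapped by a list of parameter grids. This is a minimal sketch assuming scikit-learn is the library in use; the choice of Ridge and random forest, the hyperparameter values, the 80/20 split, and the function name select_best_model are placeholders, not the repository's pre-selected models.

```python
import pickle

from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def select_best_model(X, y, path="best_model.pkl", seed=0):
    """Split the data, grid-search several regressors, and pickle the winner."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    pipe = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("model", Ridge()),  # placeholder; replaced by each grid below
    ])
    # A list of grids lets one search compare entirely different estimators.
    param_grid = [
        {"model": [Ridge()], "model__alpha": [0.1, 1.0, 10.0]},
        {"model": [RandomForestRegressor(random_state=seed)],
         "model__n_estimators": [50, 100], "model__max_depth": [None, 5]},
    ]
    search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
    search.fit(X_train, y_train)
    # Held-out R^2 of the refit best pipeline.
    test_r2 = search.best_estimator_.score(X_test, y_test)
    with open(path, "wb") as f:
        pickle.dump(search.best_estimator_, f)
    return search.best_estimator_, search.best_params_, test_r2
```

Putting the imputer inside the pipeline means imputation statistics are learned only from each training fold, which avoids leaking test-set information into cross-validation.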

recommend.py produces two DataFrames: (1) Census tracts recommended for increased transit investment, based on the results of the best model, and (2) Census tracts recommended for further inspection because of a large positive difference between the best model's predictions and the tract's actual ridership rates.
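Both recommendation lists can be sketched with any fitted model exposed as a predict function. The first ranks tracts by the predicted ridership gain when one access feature is raised; the second flags tracts where the prediction most exceeds observed ridership. The feature names, the size of the bump, taking the difference as prediction minus actual, and the dict-based tract layout are all assumptions made for illustration.

```python
def access_uplift(tracts, predict, feature, bump, top_k=2):
    """Rank tracts by predicted ridership gain when `feature` rises by `bump`."""
    ranked = []
    for t in tracts:
        boosted = dict(t["features"])  # copy so the original tract is unchanged
        boosted[feature] += bump
        ranked.append((t["GEOID"], predict(boosted) - predict(t["features"])))
    return sorted(ranked, key=lambda r: r[1], reverse=True)[:top_k]


def residual_flags(tracts, predict, top_k=2):
    """Tracts whose predicted ridership exceeds actual ridership by the most."""
    gaps = [(t["GEOID"], predict(t["features"]) - t["actual"]) for t in tracts]
    positive = [g for g in gaps if g[1] > 0]
    return sorted(positive, key=lambda g: g[1], reverse=True)[:top_k]
```

With a purely linear model the uplift from a fixed bump is identical for every tract, so this ranking is only informative when the model captures interactions (for example, access mattering more in dense tracts).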

CENSUS_DATA_COLS.json contains a dictionary mapping ACS 5-year table IDs to column labels that describe what each table contains.
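A mapping like this is typically used to rename raw ACS columns to readable labels. The sketch assumes the file is a flat JSON object of ID-to-label strings; the real file may be nested, and the label text and function names below are hypothetical.

```python
import json
from pathlib import Path


def load_column_labels(path="CENSUS_DATA_COLS.json"):
    """Read the table-ID-to-label dictionary (assumed to be a flat JSON object)."""
    return json.loads(Path(path).read_text())


def relabel(records, labels):
    """Rename ACS columns to readable labels, leaving unmapped keys unchanged."""
    return [{labels.get(k, k): v for k, v in r.items()} for r in records]
```

Keeping unmapped keys such as the geography codes unchanged means the GEOID pieces survive the rename and can still be used for joins.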

The pickle_files folder contains .pkl files generated by other scripts in the repository: model-selection results and the Census tracts recommended by recommend.py.
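Archives like these support the "reuse or rebuild" choice offered by main.py. A minimal sketch of that caching pattern with the standard pickle module follows; the function name load_or_build and the use_archive switch are hypothetical, and only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import pickle
from pathlib import Path


def load_or_build(path, build, use_archive=True):
    """Return the archived object at `path`, or call `build()` and archive the result."""
    path = Path(path)
    if use_archive and path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    obj = build()  # the expensive step, e.g. downloading data or model selection
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(obj, f)
    return obj
```

Passing use_archive=False forces a rebuild and overwrites the archive, which corresponds to running without archives.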

The data_sources folder contains shapefiles and Longitudinal Employer-Household Dynamics (LEHD) Origin-Destination Employment Statistics (LODES).
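LODES origin-destination files are published at the Census block level, with columns such as w_geocode (workplace block), h_geocode (home block), and S000 (total jobs). Because a 15-digit block GEOID begins with its 11-digit tract GEOID, block rows can be rolled up to tracts by truncation. How the repository actually uses LODES is not stated here; counting jobs per workplace tract is one plausible feature, and the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def jobs_by_work_tract(od_csv_text):
    """Sum total jobs (S000) by workplace tract from LODES OD rows.

    Block GEOIDs have 15 digits; the first 11 identify the tract.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(od_csv_text)):
        totals[row["w_geocode"][:11]] += int(row["S000"])
    return dict(totals)
```

Reading geocodes as strings (as csv does) matters: parsing them as integers would strip leading zeros in states whose FIPS codes start with 0.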

Usage

To run using an archived version of the data and best model:

python3 main.py -m

To run using an archived version of the data (model selection is rerun):

python3 main.py -d

To run the entire program from scratch, without archives (this takes several hours):

python3 main.py

Authors

The authors of this repository are Nathan Dignazio, Mike Feldman, and Nguyen Luong, three graduate students at the University of Chicago.

License

MIT
