Ticket to Ride Where?

Ticket to Ride Where? is a machine learning project that predicts public transit ridership rates in Illinois Census tracts. The project generates a model that predicts which tracts would increase their use of public transportation the most given an increase in transit access. The hope is that this model can be used as a tool to inform community-based decision-making about where to prioritize investments in public transit infrastructure.
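One way to frame "which tracts would increase their use of public transportation the most given an increase in transit access" is to compare a trained model's prediction for each tract before and after raising an access feature. The following is a minimal sketch under assumptions: the generic predict callable, the transit_score feature name, the toy model, and the tract values are all hypothetical and not taken from the repository.

```python
# Hypothetical sketch: rank tracts by the predicted change in ridership when
# one access feature is increased, holding every other feature fixed.
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]


def rank_by_access_gain(
    tracts: Dict[str, Features],
    predict: Callable[[Features], float],
    access_feature: str,
    increase: float,
) -> List[Tuple[str, float]]:
    """Return (tract_id, predicted gain) pairs, largest gain first."""
    gains = []
    for tract_id, features in tracts.items():
        baseline = predict(features)
        # Copy the row so the caller's features are not mutated.
        boosted = dict(features)
        boosted[access_feature] += increase
        gains.append((tract_id, predict(boosted) - baseline))
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up toy model whose response to access grows with density.
    def toy_model(f: Features) -> float:
        return 0.02 * f["transit_score"] * f["density"] + 0.1

    tracts = {
        "17031010100": {"transit_score": 40.0, "density": 1.5},
        "17031010200": {"transit_score": 60.0, "density": 0.5},
    }
    print(rank_by_access_gain(tracts, toy_model, "transit_score", 10.0))
```

Note that this ranks tracts by how the model responds to the change, which is a predictive comparison rather than a measured causal effect.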

Installation

In your preferred directory, clone the repository using git:

git clone https://github.com/FeldmanMike/transit-ml.git

Move into the cloned directory and install the required packages in a virtual environment:

cd transit-ml
pip3 install -r requirements.txt

Structure

main.py is the entry point and provides the repository's text-based UI. From the terminal, users can reuse archived versions of the data and/or the best model, or run the entire program from scratch (which takes several hours).

pipeline.py contains helper functions that fetch ACS data, impute missing values, run grid searches over multiple pipelines, and identify and record information about the best-performing models.
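As an illustration of the imputation step, here is a minimal sketch of median imputation that learns medians from training rows only and reuses them on test rows, so no test information leaks into training. The function names, the None-for-missing convention, and the 0.0 fallback are hypothetical choices, not necessarily what pipeline.py does.

```python
# Hypothetical sketch of train-only median imputation.
from statistics import median
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def fit_medians(rows: List[Row], columns: List[str]) -> Dict[str, float]:
    """Compute each column's median over its non-missing training values."""
    medians = {}
    for col in columns:
        observed = [row[col] for row in rows if row.get(col) is not None]
        # Assumed fallback: 0.0 when a column is entirely missing in training.
        medians[col] = median(observed) if observed else 0.0
    return medians


def apply_medians(rows: List[Row], medians: Dict[str, float]) -> List[Row]:
    """Return copies of the rows with missing values replaced by medians."""
    filled = []
    for row in rows:
        new_row = dict(row)
        for col, value in medians.items():
            if new_row.get(col) is None:
                new_row[col] = value
        filled.append(new_row)
    return filled
```

For example, fitting on incomes of 40.0, None, 60.0, and 50.0 yields a median of 50.0, which then fills any missing income in the test rows.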

download.py and data_wrangling.py get data from ACS and the WalkScore API, merge it with the files in data_sources, and perform the necessary cleaning before returning a DataFrame ready for modeling.
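Merging tract-level ACS data with other sources usually hinges on the 11-digit tract GEOID (2-digit state, 3-digit county, 6-digit tract). The sketch below assumes, hypothetically, that LODES workplace jobs are rolled up from 15-digit block codes to tracts by taking the first 11 digits, then left-joined onto the ACS rows. The column names follow common ACS API and LODES conventions, but how the repository actually merges these files is not stated here.

```python
# Hypothetical sketch: key ACS rows by tract GEOID and attach LODES job counts.
from collections import defaultdict
from typing import Dict, Iterable, List


def tract_geoid(state: str, county: str, tract: str) -> str:
    """Tract GEOID = 2-digit state + 3-digit county + 6-digit tract code."""
    # zfill is defensive; padded strings pass through unchanged.
    return state.zfill(2) + county.zfill(3) + tract.zfill(6)


def jobs_by_tract(block_jobs: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Sum block-level job counts into tracts using the GEOID prefix."""
    totals: Dict[str, int] = defaultdict(int)
    for row in block_jobs:
        # A 15-digit block GEOID starts with its 11-digit tract GEOID.
        totals[row["w_geocode"][:11]] += int(row["S000"])
    return dict(totals)


def merge_acs_with_jobs(acs_rows: List[Dict[str, str]],
                        jobs: Dict[str, int]) -> List[Dict[str, object]]:
    """Left join: every ACS tract is kept, with 0 jobs if LODES has none."""
    merged = []
    for row in acs_rows:
        geoid = tract_geoid(row["state"], row["county"], row["tract"])
        merged.append({**row, "GEOID": geoid, "jobs": jobs.get(geoid, 0)})
    return merged
```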

model_selection.py takes the DataFrame, splits it into training and testing sets, and runs a grid search over pre-selected regression models and hyperparameters to identify the best model, which is saved to best_model.pkl.
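A minimal sketch of the split-then-grid-search step, assuming scikit-learn. The candidate models (Ridge, RandomForestRegressor), their grids, the 80/20 split, 5-fold cross-validation, and the select_best_model name are illustrative assumptions, not necessarily the choices in model_selection.py. Listing whole estimators under the pipeline's "model" step lets a single GridSearchCV compare several model families at once.

```python
# Sketch assuming scikit-learn: one Pipeline whose "model" step is swapped
# between estimator families during the grid search.
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def select_best_model(X, y, path="best_model.pkl", seed=0):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge())])
    # Each dict is one candidate model family with its own hyperparameters.
    param_grid = [
        {"model": [Ridge()], "model__alpha": [0.1, 1.0, 10.0]},
        {"model": [RandomForestRegressor(random_state=seed)],
         "model__max_depth": [3, None]},
    ]
    search = GridSearchCV(pipe, param_grid, cv=5,
                          scoring="neg_mean_squared_error")
    search.fit(X_train, y_train)
    # Score the refit best pipeline on the held-out test set (R^2).
    test_r2 = search.best_estimator_.score(X_test, y_test)
    with open(path, "wb") as f:
        pickle.dump(search.best_estimator_, f)
    return search.best_params_, test_r2


if __name__ == "__main__":
    # Synthetic data for demonstration only.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)
    params, r2 = select_best_model(X, y)
    print(params, round(r2, 3))
```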

recommend.py produces two DataFrames: 1) Census tracts recommended for increased transit investment, based on the results of the best model, and 2) Census tracts recommended for further inspection because the best model's predicted ridership rate is much higher than the tract's actual ridership rate.
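The "further inspection" list can be thought of as a residual filter: keep tracts whose predicted rate exceeds the actual rate by at least some margin. A minimal sketch, assuming predictions and actual rates keyed by tract GEOID; the function name and the min_gap threshold are hypothetical, since the README does not say how "large" is defined.

```python
# Hypothetical sketch: flag tracts where predicted ridership is much higher
# than observed ridership.
from typing import Dict, List, Tuple


def flag_underperformers(predicted: Dict[str, float],
                         actual: Dict[str, float],
                         min_gap: float) -> List[Tuple[str, float]]:
    """Return (tract, predicted - actual) pairs with gap >= min_gap."""
    # Skip tracts that have a prediction but no observed rate.
    gaps = [(geoid, predicted[geoid] - actual[geoid])
            for geoid in predicted if geoid in actual]
    flagged = [(geoid, gap) for geoid, gap in gaps if gap >= min_gap]
    # Largest positive gap first.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```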

CENSUS_DATA_COLS.json contains a dictionary mapping ACS 5-year table IDs to column labels describing what each table contains.
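A mapping like this is typically used to replace raw ACS IDs with readable column names. The sketch assumes the file is a flat {"ID": "label"} dictionary, which may not match its actual structure; the IDs and labels in SAMPLE_JSON and the rename_columns helper are illustrative.

```python
# Hypothetical sketch: rename raw ACS IDs using a JSON mapping.
import json
from typing import Dict, List

# Stand-in for the real file; in practice one would json.load() it.
SAMPLE_JSON = ('{"B08301_001E": "total_workers", '
               '"B08301_010E": "public_transit_workers"}')


def rename_columns(rows: List[Dict[str, str]],
                   mapping: Dict[str, str]) -> List[Dict[str, str]]:
    """Replace mapped keys with readable labels; leave unmapped keys as-is."""
    return [{mapping.get(key, key): value for key, value in row.items()}
            for row in rows]


if __name__ == "__main__":
    mapping = json.loads(SAMPLE_JSON)
    print(rename_columns([{"B08301_001E": "100", "tract": "010100"}], mapping))
```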

The pickle_files folder contains .pkl files generated by other scripts in the repository, including model-selection results and the Census tracts recommended by recommend.py.
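Archived .pkl files make it possible to skip expensive steps on later runs. A minimal "load if archived, otherwise build and save" sketch follows; the load_or_build helper is hypothetical, and main.py may organize its archives differently. Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.

```python
# Hypothetical sketch of reusing an archived result stored as a pickle.
import os
import pickle
from typing import Any, Callable


def load_or_build(path: str, build: Callable[[], Any]) -> Any:
    """Load the pickled object at path, or build it and save it there."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = build()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result
```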

The data_sources folder contains shapefiles and LEHD Origin-Destination Employment Statistics (LODES).

Usage

To run using an archived version of the data and best model:

python3 main.py -m

To run using an archived version of the data:

python3 main.py -d

To run using no archives:

python3 main.py
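The three modes above can be read as deciding which stages run from scratch. The sketch below parses the -m and -d flags with argparse and maps them to stages; the parse_args and plan_stages helpers and the stage names are hypothetical, and the assumption that recommendations are always produced is not confirmed by this README.

```python
# Hypothetical sketch of how -m and -d could select which stages to rerun.
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Ticket to Ride Where?")
    parser.add_argument("-m", action="store_true",
                        help="use the archived data and archived best model")
    parser.add_argument("-d", action="store_true",
                        help="use the archived data, rerun model selection")
    return parser.parse_args(argv)


def plan_stages(args: argparse.Namespace) -> List[str]:
    """Return the stages that run from scratch for the given flags."""
    use_archived_data = args.m or args.d
    stages = []
    if not use_archived_data:
        stages.append("download and clean data")
    if not args.m:
        stages.append("model selection")
    # Assumed: recommendations are produced in every mode.
    stages.append("recommend tracts")
    return stages
```

For example, plan_stages(parse_args(["-d"])) returns ["model selection", "recommend tracts"].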

Authors

The authors of this repository are Nathan Dignazio, Mike Feldman, and Nguyen Luong, three graduate students at the University of Chicago.

License

MIT
