Skip to content

Replication repository for Hwang and Naik (2023). Code, data, and models for SSO@S pipeline for public use.

Notifications You must be signed in to change notification settings

Changing-Cities-Research-Lab/rep-gsv-trash

Repository files navigation

gsv-trash-replication-repo

This repository provides code, data, and training models to reproduce the SSO@S pipeline outlined in Hwang and Naik's (2023) paper, "Systematic Social Observation at Scale: Using Crowdsourcing and Computer Vision to Measure Visible Neighborhood Conditions." This repository accompanies data and statistical code to replicate tables and figures in the manuscript provided here.

Preferred Citation:

Hwang, J. and Naik, N. (2023). Unrestricted data and statistical code accompanying Hwang, J. and N. Naik. 2023. "Systematic Social Observation at Scale: Using Crowdsourcing and Computer Vision to Measure Visible Neighborhood Conditions". Stanford Digital Repository. Available at https://purl.stanford.edu/xy095yh6422. https://doi.org/10.25740/xy095yh6422.

Instructions for Set-up

Download Conda following the following online installation guide:

https://conda.io/projects/conda/en/stable/user-guide/install/download.html

Setup Conda virtual environment for all needed dependencies with the following commands:

conda env create -f environment.yml

(If on a M2 Mac use : conda env create -f environment_mac.yml )

conda activate gsv_trash

Python Scripts:

  • constants.py: Specifies global constants such as trash trueskill thresholds and hyperparams. Additionally, it holds information that can be used to train the resnet classifier model. IE Information such as where to store the CSV file, where to find images, where to output CSV files, etc.

  • discretize_trueskill.py: Creates csv using inputted thresholds and raw trueskill scores to produce true labels.

  • extract_vectors.py: Uses resnet model to produce feature embeddings of images.

  • build_image_directory.py: Given a directory of images and a csv of images and their labels, splits and copies into new folders based on true labels for resnet training.

  • trainer.py: Defines Trainer class that is utilized for training, checkpointing, evaluating, logging, and creating metrics for the resnet classifier training process.

  • train.py: Initiates trainer and data loaders utilized for the training process and begins the training process for the resnet.

  • image2vec.py: Class to convert images to vector embeddings used to train/test SVMs, uses trained Resnet Classifier to create the embeddings . Used to create CSV of columns: Image name, renet prediction, embedding, and label to be used in svm_classifier.py

  • model.py: Defines the Resnet backbone classifier model

  • svm_classifier.py: Given csv with image feature vectors and associated true labels, trains an SVC (or SVR if specified).

  • test_model.py: Suite of methods to help with error analysis/model testing

  • util.py: Provides miscellaneous helper functions

Training Pipeline:

Inputs: image_dir (directory with all images), trueskill_csv (a csv that contains image_name and associated score)

  1. use discretize_trueskill.py using the trueskill_csv produce a csv containing image_name and true label

  2. run build_image_directory.py use image_dir and discretize_trueskill.py output to create labeled image directories to be used for training

  3. run train.py to use the labeled images from the previous step to train a classifier model with resnet

(To read the evaluation metrics during the training process use the following command: tensorboard --logdir <LOG_DIR>)

  1. run extract_vectors.py utilizes a trained classifier model to extract training and test image vectors

  2. run svm_classify.py to utilize extracted feature vectors to train/test an SVM model

  3. run Trash_analysis.ipynb to make an analysis of the final trained classifier model.

Outputs: Resnet Classifier/Feature Extractor, Feature extractions of the images, Trained SVM classifier on Feature extractions, Tensorboard logs

Correspondence

Contact Jackelyn Hwang at jihwang@stanford.edu

License

Shield: CC BY 4.0

This work is licensed under a Creative Commons Attribution 4.0 International License.

CC BY 4.0

About

Replication repository for Hwang and Naik (2023). Code, data, and models for SSO@S pipeline for public use.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published