Overview

This repository provides tools to train a neural network to detect sidewalk features in Google Street View (GSV) imagery, and tools to use a trained network. Everything is implemented in Python and PyTorch. For the purposes of our 2019 ASSETS submission, which you might want to read, the sidewalk features we focus on detecting are:

  • Curb Ramp
  • Missing Curb Ramp
  • Surface Problem
  • Obstruction

We add a fifth category, null, to these four to enable the network to detect the absence of sidewalk features.

Network Architecture

A significant portion of the 2019 ASSETS paper focused on experimenting with different network architectures to improve performance. All of our architectures are based on ResNet, a popular family of neural network architectures that achieves state-of-the-art performance on the ImageNet dataset.

The ResNet architecture takes as input square color images, in the form of a 224 x 224 x 3 channel (RGB) tensor. Instead of feeding an entire GSV panorama into the network, we input small crops from a panorama. We modify this network architecture by incorporating additional features (a minimal sketch of the idea follows the list), loosely divided into:

  • Positional Features, which describe where in the panorama a (potential) label is located, such as the X and Y coordinates in the panorama image, the yaw degree, and the angle above/below the horizon.
  • Geographic Features, which describe where in the city the panorama is located. These include the distance and compass heading from the panorama to the central business district (CBD), the position within the street block, and the distance to the nearest intersection.
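
The exact architectures live in pytorch_pretrained/resnet_extended*.py. As a rough illustration of the idea (not the paper's exact implementation), the extra features can be concatenated with ResNet's pooled image features just before the final classifier. Note that the repo's TwoFileFolder packs the crop and its extra features into a single tensor, while this sketch takes them as two arguments for clarity:

import torch
import torch.nn as nn
from torchvision import models

class ExtendedResNetSketch(nn.Module):
    def __init__(self, len_ex_feats, num_classes=5):
        super().__init__()
        resnet = models.resnet18(pretrained=True)
        # keep everything up to (but not including) the final fc layer
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        # widen the classifier to accept the extra features
        self.fc = nn.Linear(resnet.fc.in_features + len_ex_feats, num_classes)

    def forward(self, images, ex_feats):
        x = self.backbone(images).flatten(1)  # (batch, 512) pooled image features
        x = torch.cat([x, ex_feats], dim=1)   # append positional/geographic features
        return self.fc(x)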

Use Cases

We developed the system with the intention of applying it to two different tasks. While our approaches to these two tasks have much in common, there are some important differences to be aware of.

Validation Task

For validation, the neural network takes as input square crops from a GSV panorama and attempts to identify the presence or absence of an accessibility problem by classifying each crop as a curb ramp, missing curb ramp, surface problem, obstruction, or null. To achieve the best performance on this task, we trained the network on crops of GSV imagery centered directly on crowdsourced labels. To create examples of "null" crops, we randomly sampled crops from the imagery.

Labeling Task

For labeling, the model is tasked with locating and labeling all of the accessibility problems in an entire GSV panorama. Our approach for this task uses a sliding window, a standard object-detection technique in the computer vision community, which breaks the large scene into small, overlapping crops that are then passed to a neural network for classification. The neural network outputs a single predicted class for each crop: curb ramp, missing curb ramp, surface problem, obstruction, or null. Crops with a predicted class of null are ignored, and the remaining predictions are then clustered using non-maximum suppression: overlapping predictions for a given label type are grouped together, and the prediction with the highest neural network output value or ‘strength’ is kept, while weaker predictions are suppressed.
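
To make the sliding window and suppression steps concrete, here is a rough sketch; the window size, stride, and grouping distance below are illustrative placeholders, not the parameters used in the paper:

def sliding_window_boxes(pano_w, pano_h, win=700, stride=350):
    '''yield (x, y, size) crop boxes covering a pano of the given dimensions'''
    for y in range(0, pano_h - win + 1, stride):
        for x in range(0, pano_w - win + 1, stride):
            yield (x, y, win)

def suppress(predictions, min_dist=350):
    '''predictions: list of (x, y, label, strength) tuples; keep the strongest
       prediction in each cluster of nearby same-label predictions'''
    kept = []
    for x, y, label, strength in sorted(predictions, key=lambda p: -p[3]):
        if label == 'Null':
            continue  # crops classified as null are ignored
        nearby = any(kl == label and abs(kx - x) < min_dist and abs(ky - y) < min_dist
                     for kx, ky, kl, _ in kept)
        if not nearby:
            kept.append((x, y, label, strength))
    return kept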

Setup

For development, we used Anaconda to manage all necessary Python packages. The pytorch_pretrained/environment.yml file should make it easy to create a new conda environment with the necessary packages installed.

To do so, install Anaconda, then cd into the pytorch_pretrained directory and run:

conda env create -f environment.yml

Once this is done, activate the environment with:

conda activate sidewalk_pytorch
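
As an optional sanity check that the environment works, you can confirm that PyTorch imports and see whether a GPU is visible:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"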

Training a Model

Todo

Using a Model

This section assumes that you already have a trained model and would like to use it to validate or label GSV imagery. A large number of models are included in this repository in the pytorch_pretrained/models directory. Each model is a *.pt file storing the parameters that are loaded onto one of the pre-defined architectures in pytorch_pretrained/resnet_extended*.py. The models in pytorch_pretrained/models were trained on a variety of architectures incorporating different sets of the additional features described in the Overview, and on different datasets.

If the model you would like to use requires additional features, then you must use the TwoFileFolder dataloader, which makes it easy to load both a crop and its associated positional and geographic features into a single PyTorch tensor.

Using a Model for Validation

Setup

As mentioned above, if you're planning on using a model that requires additional features, you should use the TwoFileFolder dataloader provided by pytorch_pretrained/TwoFileFolder.py. The dataloader expects your files to be organized in the following directory structure:

root/
     label1/
            file1.jpg
            file1.json
            file2.jpg
            file2.json
     label2/
            file3.jpg
            file3.json
            file4.jpg
            file4.json

where the .jpg files are square crops (of any resolution) and the .json files contain the following fields:

{"dist to cbd": 4.094305012075221, "bearing to cbd": 64.74029765051874, "crop size": 1492.1348109969322, "sv_x": 9300.0, "sv_y": -1500.0, "longitude": -76.967779, "pano id": "__1c3_5IArbrml1--v7meQ", "dist to intersection": 23.45792342820621, "block middleness": 42.95327159437733, "latitude": 38.872448, "pano yaw": -179.67633056640602, "crop_y": 4828.0, "crop_x": 2655.9685763888974}

These crops and .json files can be produced easily and simultaneously using the bulk_extract_crops function from GSVutils/utils.py. This function takes as input a .csv with the following columns:

Pano ID, SV_x, SV_y, Label, Photog Heading, Heading, Label ID 
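
Such a .csv can be written with the standard library; the row below uses placeholder values:

import csv

# columns expected by bulk_extract_crops; the values written are placeholders
columns = ['Pano ID', 'SV_x', 'SV_y', 'Label', 'Photog Heading', 'Heading', 'Label ID']
with open('crops_to_make.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=columns)
    writer.writeheader()
    writer.writerow({'Pano ID': '__1c3_5IArbrml1--v7meQ', 'SV_x': 9300.0,
                     'SV_y': -1500.0, 'Label': 'Curb Cut',
                     'Photog Heading': 0.0, 'Heading': 0.0, 'Label ID': 1})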

Using the Model

Once you've got your crops and additional-feature .json files, you're ready to go. First, define your data transforms and build the dataset from your crop directory using TwoFileFolder:

# imports assume you are working inside the pytorch_pretrained directory
import numpy as np
import torch
from torchvision import transforms

from TwoFileFolder import TwoFileFolder  # pytorch_pretrained/TwoFileFolder.py

data_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# the dataset loads the files into pytorch vectors
image_dataset = TwoFileFolder(dir_containing_crops, meta_to_tensor_version=2, transform=data_transform)

# the dataloader takes these vectors and batches them together for parallelization, increasing performance
dataloader    = torch.utils.data.DataLoader(image_dataset, batch_size=4, shuffle=True, num_workers=4)

# this is the number of additional features provided by the dataset
len_ex_feats = image_dataset.len_ex_feats
dataset_size = len(image_dataset)

With this done, we can load the model itself. First, we load the architecture, then we load the weights from the .pt file onto the architecture:

from torch import optim
# extended_resnet18 is defined in the matching pytorch_pretrained/resnet_extended*.py
# module; the exact module name may vary
from resnet_extended import extended_resnet18

model_ft = extended_resnet18(len_ex_feats=len_ex_feats)

try:
    model_ft.load_state_dict(torch.load(model_path))
except RuntimeError:
    # fall back to loading GPU-trained weights onto the CPU
    model_ft.load_state_dict(torch.load(model_path, map_location='cpu'))
model_ft = model_ft.to(device)
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# eval() puts the model in evaluation mode (fixed batch-norm and dropout behavior);
# gradients are disabled below with torch.set_grad_enabled(False), since we're
# using the model to get predictions, not training
model_ft.eval()

Now we can actually compute the predictions by looping over all the data in the dataloader. We accumulate the raw model outputs in pred_out and, for simplicity, the corresponding file paths in paths_out, both initialized as empty lists:

paths_out = []
pred_out  = []

for inputs, labels, paths in dataloader:
    inputs = inputs.to(device)
    labels = labels.to(device)

    # zero the parameter gradients
    optimizer_ft.zero_grad()

    with torch.set_grad_enabled(False):
        outputs = model_ft(inputs)
        _, preds = torch.max(outputs, 1)

        paths_out += list(paths)
        pred_out  += outputs.tolist()

With this finished, we now have the raw model outputs in pred_out and the corresponding image paths in paths_out. Taking np.argmax of an entry of pred_out gives that crop's prediction in integer form. What do I mean by integer prediction? To save memory, PyTorch assigns each string label an integer index and stores those indices instead of the strings. Our labels are ('Missing Cut', "Null", 'Obstruction', "Curb Cut", "Sfc Problem"), so if paths_out[0] is /example_dir/example_label/example_img.jpg and np.argmax(pred_out[0]) is 0, then the model assigned a prediction of Missing Curb Ramp to the image example_img.jpg.
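
Concretely, mapping an output vector back to its string label looks like this (the same tuple appears in the full function below):

import numpy as np

pytorch_label_from_int = ('Missing Cut', "Null", 'Obstruction', "Curb Cut", "Sfc Problem")
str_prediction = pytorch_label_from_int[np.argmax(pred_out[0])]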

Now we're pretty much finished! We can wrap this all into a single easy function for you to use for whatever purpose you like. It returns a list of (img_path, predicted_label) tuples:

def predict_from_crops(dir_containing_crops, model_path):
    ''' use the TwoFileFolder dataloader to load images and feed them
        through the model
        returns a list of (img_path, predicted_label) tuples
    '''
    data_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ])

    print("Building dataset and loading model...")
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    image_dataset = TwoFileFolder(dir_containing_crops, meta_to_tensor_version=2, transform=data_transform)
    dataloader    = torch.utils.data.DataLoader(image_dataset, batch_size=4, shuffle=True, num_workers=4)

    len_ex_feats = image_dataset.len_ex_feats
    dataset_size = len(image_dataset)

    panos = image_dataset.classes

    print("Using dataloader that supplies {} extra features.".format(len_ex_feats))
    print("")
    print("Finished loading data. Got crops from {} panos.".format(len(panos)))


    model_ft = extended_resnet18(len_ex_feats=len_ex_feats)

    try:
        model_ft.load_state_dict(torch.load(model_path))
    except RuntimeError:
        # fall back to loading GPU-trained weights onto the CPU
        model_ft.load_state_dict(torch.load(model_path, map_location='cpu'))
    model_ft = model_ft.to(device)
    optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

    model_ft.eval()

    paths_out = []
    pred_out  = []

    print("Computing predictions....")
    for inputs, labels, paths in dataloader:
        inputs = inputs.to(device)
        labels = labels.to(device)

        # zero the parameter gradients
        optimizer_ft.zero_grad()

        with torch.set_grad_enabled(False):
            outputs = model_ft(inputs)
            _, preds = torch.max(outputs, 1)

            paths_out += list(paths)
            pred_out  += outputs.tolist()

    print("Finished!")
    pytorch_label_from_int = ('Missing Cut', "Null", 'Obstruction', "Curb Cut", "Sfc Problem")
    str_predictions = [pytorch_label_from_int[np.argmax(x)] for x in pred_out]

    return list(zip(paths_out, str_predictions))
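
For example (both paths here are placeholders):

predictions = predict_from_crops('path/to/crops/', 'pytorch_pretrained/models/my_model.pt')
for img_path, label in predictions:
    print(img_path, label)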

Using a Model for Labeling

Todo
