<a href="https://colab.research.google.com/github/DataScienceAndEngineering/deep-learning-final-project-project-sidewalk/blob/main/docs/Project_Sidewalk_Report.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Project Sidewalk Report
##### Analyzing sidewalk accessibility through images of the street
##### Rabiul Hossain, Nicholas Leotta, Nancy Sea




# **Abstract**
***1-2 paragraphs of 200–250 words. Should concisely state the problem, why it is important,
and give some indication of what you accomplished (2-3 discoveries)***

This work aims to develop a system that can quickly determine how accessible a given sidewalk is. Such a system could help visually impaired individuals navigate the densely obstructed walkways common in major cities and could support autonomous navigation in dynamically changing sidewalk environments. We use the Cityscapes dataset [Cordts et al., 2016], a collection of 5k images taken from the dashcam of a car while driving through 50 European cities. Each image comes with a fine, instanced annotation covering 30 classes. We subsampled this dataset to approximately 3k images that contain a visible, segmented sidewalk. We then applied several models that segment these images into their component objects, with the goal of using the location and identity of each object to determine the degree of obstruction present in the image.

We explored several methods for obtaining segmentations: MaskRCNN, U-Net, SAM (Segment Anything Model), and YOLO, with Random Forest serving as a baseline. MaskRCNN was the best choice because it produces instanced segmentations while also determining the class of each segmented object. The MaskRCNN model is built on FPN and ResNet101 and is pre-trained on the MS COCO dataset, so it produces instanced segmentations for everyday objects. By adding dense layers for classification, we attempted to train the model to use these segmentations to predict the observed obstruction of the sidewalk.

# **Introduction**
***State your data and research question(s). Indicate why it is important. Describe your
research plan so that readers can easily follow your thought process and the flow of the
report. Please also include key results at the beginning so that readers know what to look for.
Here you can very briefly mention any important data cleaning or preparation. Do not talk
about virtual results i.e. things you tried or wanted to do but didn’t do. Virtual results are
worse than worthless. They highlight failure.***

The inspiration for this work is Project Sidewalk from the University of Washington, which aims to collect a dataset containing the locations and timestamps of sidewalk obstructions and accessibility items. Its authors plan to use this data for machine learning, but their public APIs only provide access to the location information of identified obstacles and objects. We aim to expand upon this idea and create a system capable of automatically determining the obstructions present on a sidewalk from simple images or video frames.

To structure our research, we identified three tasks. The first was to obtain a segmented sidewalk region within an image. The second was to obtain segmentations of the objects that make up common sidewalk obstructions. The third was to use the location and type of each object to determine how likely it is to constitute an obstruction. To compare our model against a baseline, we used a Random Forest to produce sidewalk segmentations from the input image. We did not construct baseline predictions for the full obstruction classification, as this problem is considerably more complex.

The first segmentation task was relatively easy: obtaining semantic sidewalk segmentations with both pre-trained and custom models was largely successful. We obtained acceptable performance on a simple sidewalk segmentation task using a custom-trained U-Net, which substantially outperformed our baseline. The Random Forest obtained a Dice score of 0.077, while the U-Net obtained a Dice score of 0.72 on the same validation dataset.
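For reference, the Dice score used in this comparison can be computed from a pair of binary masks as twice the overlap divided by the total number of foreground pixels. The snippet below is a minimal sketch with NumPy, not the evaluation code used in this project. The function name `dice_score`, the toy masks, and the choice to score two empty masks as 1.0 are assumptions made for illustration.

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * intersection / total)


# Toy 4x4 masks (1 = sidewalk pixel), purely illustrative.
target = np.array([[0, 1, 1, 0],
                   [0, 1, 1, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]])
pred = np.array([[0, 1, 1, 0],
                 [0, 1, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 0]])

score = dice_score(pred, target)
print(f"Dice: {score:.2f}")
```

A score of 1 means the predicted and ground-truth masks coincide exactly, and a score of 0 means they share no foreground pixels.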

The second segmentation task was more challenging. We required a segmentation for each object in the image that could constitute a sidewalk obstruction. For each object to be processed individually, semantic segmentation alone is not sufficient; we need instanced segmentations. To accomplish this, we explored several pre-trained models: MaskRCNN, YOLO, and SAM. Each model successfully produced the required instanced segmentations from a minimally processed image; however, each came with significant compatibility issues. After significant tinkering, MaskRCNN proved to be the best model to move forward with in our specific development environment.

Finally, obstruction classification was the most challenging aspect of this work. Building on the instanced segmentation model, we needed to append additional layers to determine the level of obstruction for each segmented object. To accomplish this, we added dense layers in parallel with the class and bounding box prediction layers and compared the obstruction predictions to an algorithmically determined ground truth obtained from an augmented dataset.

# **Background**
***Discuss other relevant work on solving this problem. Most of your references are here. Cite
 all sources. There is no specific formatting requirement for citations but be consistent.***

This project is largely inspired by Project Sidewalk from the University of Washington. The original project focuses on crowdsourcing accessibility information for a given city by gamifying the data collection process. Their approach was effective: human participants achieved a recall of 63% and a precision of 71% when annotating sidewalk images [Saha et al., 2019]. We propose that a fully automated machine learning system could surpass these results and accurately determine sidewalk obstructions using deep learning models.

Recent advances in automatic instanced segmentation have made this goal feasible. Models such as MaskRCNN [He et al., 2017] and SAM (Segment Anything Model) [Kirillov et al., 2023] can extract objects from images with high accuracy. We want to build on these models to categorize segmented objects based on their relative location and identity. Using these models as a starting point saves us from training an instanced segmenter from scratch, which is a very intensive and demanding process. For example, SAM was trained on a dataset of 11 million images for either 90k or 180k iterations, depending on the desired mask quality [Kirillov et al., 2023], and MaskRCNN was trained on a dataset of 135k images for 90k iterations [He et al., 2017]. Leveraging these models gives us a solid foundation for instanced segmentation and lets us focus on adding further classification to their outputs.



# **Data**
***Where you got the data. Describe the variables. You can begin discussing the data wrangling,
and data cleaning. Some EDA may happen here. This includes your data source (including
URL if applicable), any articles behind the data source.***

We used the Cityscapes dataset [Cordts et al., 2016] for this work. It comprises over 5k images collected by dashcam while driving through 50 cities, primarily in Germany. The dataset is extensive; however, we only needed a subset for our purposes, namely the images containing sidewalks. After isolating and processing the data, we were left with slightly over 3k samples. Each sample includes an image, a semantic segmentation mask covering each object class, and a JSON file detailing the polygon coordinates and class of each occurrence of an object within the image (instanced segmentation). After extraction, we applied a custom train/validation/test split, allocating 10% of the data to validation, 10% to testing, and the remaining 80% to training.
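The subsetting and splitting steps described above can be sketched as follows. This is an illustrative example rather than the project's actual preprocessing code. It assumes each annotation has already been loaded from its JSON file into a dictionary with an `objects` list whose entries carry a `label` and a `polygon`. The helper names `contains_sidewalk` and `split_samples`, the fixed seed, and the toy annotations are hypothetical.

```python
import random


def contains_sidewalk(annotation):
    """Return True if the polygon annotation has at least one sidewalk object."""
    return any(obj.get("label") == "sidewalk"
               for obj in annotation.get("objects", []))


def split_samples(sample_ids, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle sample ids reproducibly and split them into train/val/test."""
    ids = sorted(sample_ids)            # fixed starting order for reproducibility
    random.Random(seed).shuffle(ids)    # seeded shuffle, independent of global state
    n_val = int(len(ids) * val_frac)    # fractions are floored to whole samples
    n_test = int(len(ids) * test_frac)
    val = ids[:n_val]
    test = ids[n_val:n_val + n_test]
    train = ids[n_val + n_test:]
    return train, val, test


# Toy annotations standing in for the per-image JSON files (hypothetical data):
# every second image contains a sidewalk polygon.
annotations = {
    f"img_{i:03d}": {
        "objects": [{
            "label": "sidewalk" if i % 2 == 0 else "road",
            "polygon": [[0, 0], [10, 0], [10, 10]],
        }]
    }
    for i in range(40)
}

kept = [name for name, ann in annotations.items() if contains_sidewalk(ann)]
train, val, test = split_samples(kept)
print(len(kept), len(train), len(val), len(test))
```

Splitting on sample ids rather than on loaded arrays keeps the image, mask, and JSON file of each sample together in the same partition.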

To train the obstruction classifier, we constructed an augmented dataset in which obstructions are determined algorithmically, using intersection over union (IoU) to score the prevalence of each object within the sidewalk boundary.
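One way to read this scoring step is as an IoU between an object's instance mask and the sidewalk mask. The snippet below is a minimal sketch under that reading, not the exact labeling procedure used to build the augmented dataset. The function name `mask_iou`, the toy masks, and the cutoff `OBSTRUCTION_THRESHOLD` are hypothetical and chosen only for illustration.

```python
import numpy as np


def mask_iou(mask_a, mask_b):
    """Intersection over union of two binary masks of the same shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # neither mask has foreground pixels
    return float(np.logical_and(a, b).sum() / union)


# Toy 6x6 scene: the sidewalk fills the bottom three rows (18 pixels).
sidewalk = np.zeros((6, 6), dtype=bool)
sidewalk[3:, :] = True

# A 3x2 object (6 pixels), of which 4 pixels lie on the sidewalk.
obj = np.zeros((6, 6), dtype=bool)
obj[2:5, 1:3] = True

iou = mask_iou(obj, sidewalk)

# Hypothetical cutoff for labeling an object as an obstruction.
OBSTRUCTION_THRESHOLD = 0.1
is_obstruction = iou >= OBSTRUCTION_THRESHOLD
print(f"IoU: {iou:.2f}, obstruction: {is_obstruction}")
```

Because the sidewalk region is usually much larger than any single object, IoU values computed this way tend to be small, so any cutoff would need to be tuned on the data.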

# **Methods**
***How did you take your data and set up the problem? Describe things like normalization,
feature selection, the models you chose. In this section, you may have EDA and graphs
showing the exploration of hyper-parameters. Note: Use graphs to illustrate interesting
relationships that are important to your final analyses. DO NOT just show a bunch of
graphs because you can. You should label and discuss every graph you include. There is no
required number to include. The graphs should help us understand your analysis process
and illuminate key features of the data.***

# **Evaluation**
***Here you are going to show your different models’ performance. It is particularly useful to
show multiple metrics and things like ROC curves (for binary classifiers). Make sure it is
clearly not just what the score is but for which instances in the data one has the largest
errors (in a regression), or just sample examples misclassified. Make an attempt to
interpret the parameters of the model to understand what was useful about the input data.
Method comparison and sensitivity analyses are absolutely CRUCIAL to good scientific
work. To that end, you MUST compare at least 2 different methods from class in answering
your scientific questions. It is important to report what you tried but do so SUCCINCTLY.***

# **Conclusion**
***How well did it work? Characterize how robust you think the results are (did you have
enough data?) Try for interpretation of what the model found (what variables were useful,
what was not)? Try to avoid describing what you would do if you had more time. If you
have to make a statement about “future work” limit it to one short statement.***

# **Attribution**
***Using the number and size of GitHub commits by author (bar graph), and the GitHub
visualizations of when the commits occurred. Using these measures each person should
self-report how many code-hours of their work are visible in the repo with 2-3 sentences
listing their contribution. Do not report any code hours that cannot be traced to commits. If
you spend hours on a 2-line change of code or side-reading you did, you cannot report. If
you do searches or research for the project that does not result in code, you must create
notes in a markdown file (eg. in the project wiki) and the notes should be commensurate
with the amount of work reported. Notes cannot be simply copy-pasted from elsewhere
(obviously).***

# **References**
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016). The Cityscapes dataset for semantic urban scene understanding. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr.2016.350

He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. 2017 IEEE International Conference on Computer Vision (ICCV). https://doi.org/10.1109/iccv.2017.322

Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C., Lo, W.-Y., Dollár, P., & Girshick, R. (2023). Segment Anything. ArXiv:2304.02643.

Saha, M., Saugstad, M., Maddali, H. T., Zeng, A., Holland, R., Bower, S., Dash, A., Chen, S., Li, A., Hara, K., & Froehlich, J. (2019). Project Sidewalk. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3290605.3300292







