<a href="https://colab.research.google.com/github/5harad/DPI-617/blob/main/labs/surveillance-answers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Law, Order, and Algorithms**
## Estimating the prevalence and placement of surveillance cameras

**Getting started**

Before you start, create a copy of this Jupyter notebook in your own Google Drive by clicking `Copy to Drive` in the menubar. If you do not do this, your work will not be saved!

Remember to save your work frequently by pressing command-S or clicking File > Save in the menubar. 

We recommend completing this problem set in Google Chrome.

**Installing libraries**

As before, we'll start by loading the libraries that we'll use in this lab. This time we'll need to install the `ggmap` library first, which lets us visualize some of our geographic results. 

Run the cell below to install and load the libraries. It may take a minute or two to complete.

In [None]:
# install the library ggmap
install.packages('ggmap')

# load libraries
library(tidyverse)
library(ggmap)

# Set some formatting options
options(digits = 3, repr.matrix.max.rows = 10, repr.matrix.max.cols = 100)
theme_set(theme_bw())

## Background

In this lab, we'll replicate the results from [Sheng et al. (2021)](https://5harad.com/papers/surveilling-surveillance.pdf), in which the authors estimate the prevalence of surveillance cameras across the United States using Google Street View data.

Specifically, we will perform two tasks in this lab:
1. Estimate the density of surveillance cameras in San Francisco;
2. Examine the relationship between camera placement and the demographic composition of a neighborhood.

### Data

Run the cell below to load the data we'll be using in this lab.

In [None]:
# Load data
fname <- 'https://github.com/5harad/DPI-617/raw/main/data/surveillance_sf.RData?raw=true'
load(url(fname))

We are loading four objects into the notebook. They are:

#### cameras_sf
(one row per image)
- `panoid`: ID of image 
- `lat`, `lon`: coordinates of image
- `period`: year of image
- `detected`, `verified`: whether a (verified) camera is in the image, explained below

#### census_sf
(one row per census block-group)
- `GEOID`, `NAME`: ID and name of census block group (CBG)
- `total_pop`: total population of CBG
- `total_white`: total non-Hispanic white population of CBG
- `geometry`: multipolygon shape of census block group

#### cameras_all
(one row per image, covering 10 U.S. cities)
- `panoid`: ID of image 
- `city`: city of image
- `period`: year of image
- `verified`: indicator for whether the image contains a verified camera
- `zone_type`: the designation of the area (e.g., `Residential` or `Commercial`) 
- `percentage_minority`: demographic composition of the CBG where the image was taken
- `census_block_group`: the CBG of the image

#### ggmap_sf
- saved map of San Francisco

In [None]:
head(cameras_sf)

### Object detection models

In order to identify cameras using street view images, we will use an [object detection](https://en.wikipedia.org/wiki/Object_detection) model.
An object detection model is a computer vision model that detects instances of objects of a certain class within an image.
In this particular case we are interested in detecting cameras in an image, but in practice such models can be used to detect all kinds of objects.

Object detection models have improved dramatically in recent years, due in large part to the rise of deep learning, and this has enabled a number of previously impossible applications. They are still imperfect, however.
In order to understand how well the camera detection model works, we evaluate it using two useful metrics: [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall).

To define precision and recall, we need to distinguish between model predictions (i.e., whether the model *thinks* an image contains a camera) and the correct answer (i.e., whether an image *actually* contains a camera). The **precision** is the proportion of images that actually contain a camera, among those images where the model believes there is a camera. The **recall** is the proportion of images the model identified as having a camera, among those images that actually contain a camera.

In general, given an arbitrary model, we can define precision and recall in terms of model-predicted and actual "positive" and "negative" cases, terminology that comes from epidemiology. In our case, a "positive" image is one that has a camera, and a "negative" image is one that does not. Using this terminology, we can then define the following quantities.

* TP: true positives; the number of predicted positive cases that were real positives 
* TN: true negatives; the number of predicted negative cases that were real negatives 
* FP: false positives; the number of predicted positives that were actually negative in the data (false alarms, Type I error)
* FN: false negatives; the number of predicted negatives that were actually positive in the data (Type II error) 

Their definitions can be illustrated using the following table:

<br>
<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0; margin: auto;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:#BBBBBB;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:#BBBBBB;}
.tg .tg-baqh{text-align:center;vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-baqh"></th>
    <th class="tg-baqh">Real positive</th>
    <th class="tg-baqh">Real negative</th>
  </tr>
  <tr>
    <td class="tg-baqh"><strong>Predicted positive</strong></td>
    <td class="tg-baqh">TP</td>
    <td class="tg-baqh">FP</td>
  </tr>
  <tr>
    <td class="tg-baqh"><strong>Predicted negative</strong></td>
    <td class="tg-baqh">FN</td>
    <td class="tg-baqh">TN</td>
  </tr>
</table>
<br>


Precision is defined as
$\frac{TP}{TP + FP} = \frac{N_{\text{correct model-identified cameras}}}{N_{\text{model-identified cameras}}}$

Recall is defined as
$\frac{TP}{TP + FN} = \frac{N_{\text{correct model-identified cameras}}}{N_{\text{real cameras}}}$
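
As a quick check on these definitions, the sketch below computes both metrics in R from a small set of made-up labels. The vectors `predicted` and `actual` are hypothetical and are not part of the lab data.

```r
# Toy labels for ten images (hypothetical, not from the lab data)
predicted <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
actual    <- c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)

tp <- sum(predicted & actual)    # model says camera, camera is there
fp <- sum(predicted & !actual)   # model says camera, no camera is there
fn <- sum(!predicted & actual)   # model misses a camera that is there

precision <- tp / (tp + fp)      # share of predicted positives that are real
recall    <- tp / (tp + fn)      # share of real positives that were predicted
```

With these toy labels the two metrics differ, which shows that a model can be fairly precise while still missing a sizable share of real cameras.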


This diagram from Wikipedia visually illustrates these definitions:

<p><a href="https://commons.wikimedia.org/wiki/File:Precisionrecall.svg#/media/File:Precisionrecall.svg"><img width='350px' src="https://upload.wikimedia.org/wikipedia/commons/thumb/2/26/Precisionrecall.svg/1200px-Precisionrecall.svg.png" alt="Precisionrecall.svg"></a><br>
By <a href="//commons.wikimedia.org/wiki/User:Walber" title="User:Walber">Walber</a> - <span class="int-own-work" lang="en">Own work</span>, <a href="https://creativecommons.org/licenses/by-sa/4.0" title="Creative Commons Attribution-Share Alike 4.0">CC BY-SA 4.0</a>, <a href="https://commons.wikimedia.org/w/index.php?curid=36926283">Link</a></p>


### Exercise: calculate model precision
We previously trained a camera detection model, which we then applied to a sample of Google Street View images. Afterwards, human annotators reviewed each image in which a camera was `detected` by the model and `verified` whether a camera was actually there.
Now, as an exercise, let's calculate the precision of our model using the columns `verified` and `detected` from the data frame `cameras_sf`. 

In [None]:
# Your code here!
# START solution
cameras_sf %>%
  filter(detected) %>%
  summarize(precision = mean(verified))
# END solution

### Estimating the prevalence of surveillance cameras in San Francisco

As we see above, the model's performance is far from perfect.
However, we are still able to use this model to help us estimate the number of cameras in SF.

By running the camera detection model on a random sample of street view images (in `cameras_sf`), we can estimate the total number of cameras in SF using three ingredients:

1. The number of verified cameras in this sample
1. The fraction of all roads in SF covered by the sample
1. The model's recall

### Exercise: compute the number of verified cameras in this sample

In [None]:
# Calculate the number of verified cameras
# Store the answer in the variable n_verified
# Your code here!
# START solution
n_verified <- sum(cameras_sf$verified)
# END solution

### Exercise: calculate the fraction of roadway in SF covered by the sample

There are about 3.1 million meters of road in San Francisco, and an image covers, on average, 24.1 meters. When computing the fraction of roadway covered by the images, keep in mind that every street view image covers only one side of a two-sided street.

In [None]:
# Constants
# Total road length in San Francisco (in meters)
road_length_m <- 3108000
# Average length of road covered by one image in the dataset (in meters)
avg_image_length_m <- 24.1

# Calculate the fraction of all roads in SF covered by the sample
# Store the answer in the variable frac_road_covered
# Your code here!
# START solution
n_images <- nrow(cameras_sf)
frac_road_covered <- avg_image_length_m * n_images / (2 * road_length_m)
# END solution
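
Written out, the solution above corresponds to the formula below. The symbols are shorthand introduced here for illustration: $f$ is `frac_road_covered`, $\ell$ is `avg_image_length_m`, $n$ is the number of images in the sample, and $L$ is `road_length_m`. The factor of 2 in the denominator reflects that each image covers only one side of a two-sided street, so the total length to be covered is twice the road length.

$$
f = \frac{\ell \cdot n}{2L}
$$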

### Exercise: estimating recall

Discuss how you would estimate the **recall** of our camera detection model. Is estimating recall easier or harder than estimating precision in our case?
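
One possible approach, sketched here under stated assumptions rather than as the method used in the paper: draw a random sample of images, have annotators label every image regardless of the model's output, and compute the share of truly positive images that the model flagged. The data frame `audit` and its columns are hypothetical.

```r
# Hypothetical audit sample: every image is hand-labeled, whether or not
# the model flagged it (made-up values, not from the lab data)
audit <- data.frame(
  detected   = c(TRUE, TRUE, FALSE, TRUE,  FALSE, FALSE, TRUE, FALSE),
  has_camera = c(TRUE, TRUE, TRUE,  FALSE, FALSE, FALSE, TRUE, FALSE)
)

# Recall: among images that truly contain a camera,
# the share that the model flagged
with_camera <- audit[audit$has_camera, ]
recall_est  <- mean(with_camera$detected)
```

Note that this requires labeling images the model did not flag, whereas precision only requires reviewing the flagged ones.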

Sheng et al. estimated that the camera detection model's recall is 0.67. With this estimate, we can now estimate the total number of cameras in SF visible from the road.


### Exercise: derive a formula to estimate the number of cameras in SF

Use the quantities we computed above to estimate the number of cameras in San Francisco.

Step 1. Using our estimated `recall`, estimate the actual number of cameras in our *sample* of street-view images.

Step 2. Using `frac_road_covered`, estimate the number of cameras in the whole city.

In [None]:
# Constants
# Recall of the camera detection model
recall <- 0.67

# Estimate the number of cameras in the sample and in the whole city
# Store the answer in the variable est_cameras
# Your code here!
# START solution
est_cameras_in_sample <- n_verified / recall
est_cameras <- est_cameras_in_sample / frac_road_covered
# END solution
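
The two steps above can be summarized in one formula. The symbols are shorthand introduced here for illustration: $n_{\text{verified}}$ is `n_verified`, $r$ is `recall`, and $f$ is `frac_road_covered`. The second step assumes that the sampled images are representative of roads across the city.

$$
\hat{N}_{\text{sample}} = \frac{n_{\text{verified}}}{r},
\qquad
\hat{N}_{\text{SF}} = \frac{\hat{N}_{\text{sample}}}{f} = \frac{n_{\text{verified}}}{r \cdot f}
$$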

### Plotting the location of camera detections
We can plot the locations of detected cameras in our sample on a map to help us better understand their spatial distribution. What patterns do you notice in the placement of cameras?

In [None]:
ggmap(ggmap_sf, extent = "device") +
  geom_point(data = cameras_sf %>% 
               filter(verified),
             aes(x = lon, y = lat),
             position = "jitter", 
             shape = 25, fill = "white", color = "red",
             alpha = 1, size = 1.5) +
  theme(axis.text = element_blank(), 
        axis.title = element_blank(),
        axis.ticks = element_blank(),
        panel.grid = element_blank(),
        panel.border = element_blank())

### Racial disparities in camera placement
By performing the analysis above for multiple cities across the U.S. and combining the results with census data,
we can examine the relationship between camera density and the minority share of the population (defined as those who identify as either Hispanic or non-white) in each location.



In [None]:
cameras_all %>%
  ggplot(aes(x = percentage_minority, y = verified)) +
  #geom_hline(yintercept = avg_detection_rate, linetype = "dashed", color = "gray") + # avg detection rate
  geom_smooth(method = "lm", 
              formula = y ~ poly(x, degree = 2),
              se = TRUE) +
  scale_x_continuous(
    name = "Minority share of population (in Census Block Group)", 
    #breaks = seq(0, 1, 0.1),
    expand = expansion(mult = c(0, 0.05)),
    labels = scales::percent_format(accuracy = 1)
  ) +
  scale_y_continuous(
    name = "Camera identification rate",  
    breaks = seq(0, 0.012, 0.003),
    expand = expansion(mult = c(0, 0.1)),
    labels = scales::percent_format(accuracy = 0.1)
  ) +
  theme(
    panel.grid = element_blank(),
    panel.border = element_blank(),
    axis.text = element_text(size = 16, family = "Helvetica", color = "black"),
    axis.title = element_text(size = 16, family = "Helvetica", color = "black"),
    axis.line = element_line(linewidth = 0.5, color = "black"),
    axis.ticks.x = element_line(linewidth = 0.5, color = "black"),
    axis.ticks.y = element_line(linewidth = 0.5, color = "black")
  ) 
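
As a rough complement to the smoothed curve, one could also compare raw identification rates across bins of minority share. The sketch below uses a made-up data frame `toy` standing in for `cameras_all`; its values are hypothetical, and it assumes `percentage_minority` is stored on a 0 to 1 scale, as the percent-formatted axis above suggests.

```r
# Made-up image-level data standing in for cameras_all (hypothetical values)
toy <- data.frame(
  percentage_minority = c(0.05, 0.15, 0.30, 0.45, 0.55, 0.70, 0.85, 0.95),
  verified            = c(FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)
)

# Split minority share into two equal-width bins
toy$bin <- cut(toy$percentage_minority,
               breaks = c(0, 0.5, 1),
               include.lowest = TRUE)

# Share of images with a verified camera in each bin
rate_by_bin <- tapply(toy$verified, toy$bin, mean)
```

On the real data, more bins would be needed to see the shape of the relationship, and rates in sparsely populated bins would be noisy.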

### Exercise: disparate impacts in camera placement

Discuss the plot above. What might be driving the results?