# Traffic Delays (COMPLETE)
## Chapter 3.1-3.4 Describing Distribution

In [None]:
# This code will load the R packages we will use
library(coursekata)

# set styles
css <- suppressWarnings(readLines("https://raw.githubusercontent.com/jimstigler/jupyter/master/ck_jupyter_styles_v2.css"))
IRdisplay::display_html(sprintf('<style>%s</style>', paste(css, collapse = "\n")))

<div class="teacher-note">
    <b>Teacher Note:</b> The purpose of this mini-JNB is for students to practice creating histograms and to deepen their understanding of how bins and binwidth can be used to adjust the appearance of a distribution. Students will also practice summarizing the shape, center, spread, and weirdness of a distribution.
</div>

## 1. Can traffic lights cause delays?

Researchers want to know if different types of traffic lights affect traffic. 

There are two kinds of traffic lights we will consider today:
1. **Timed lights:** These change on a regular schedule, no matter how many cars or people are waiting.
2. **Flexible lights:** These adjust based on how many cars or people are nearby. For example, if there are no cars, the light might stay green longer for the other direction.

### 1.1 As a bus moves through its route, which kind of traffic light do you think would slow the bus down more? Why do you think so?

### 1.2 About the `TrafficFlow` data frame.

Researchers ran a computer simulation where they tested how buses moved through traffic with different kinds of lights.

In each run of the simulation, they measured how long the bus was delayed (in minutes) when using timed lights or flexible lights. They also calculated the difference in delay between the two kinds of lights.

The `TrafficFlow` data frame has 24 rows, each representing a simulated bus route. It contains three variables:
- `Timed`: Delay time (in minutes) for fixed timed lights
- `Flexible`: Delay time (in minutes) for flexible communicating lights
- `Difference`: Difference in delay (`Timed` - `Flexible`) for each simulation

For more details, this is the link to the [R documentation for this dataset](https://search.r-project.org/CRAN/refmans/Lock5Data/html/TrafficFlow.html).
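
To make the `Difference` column concrete, here is a small sketch of how it is defined. The values below are made up for illustration (they are not rows from `TrafficFlow`), and `toy` is a hypothetical name.

```r
# Hypothetical delay times in minutes; NOT taken from TrafficFlow
toy <- data.frame(
  Timed    = c(100, 110, 95),
  Flexible = c(45, 50, 40)
)

# Difference is defined as Timed minus Flexible for each simulation
toy$Difference <- toy$Timed - toy$Flexible

toy
```

A positive difference means the bus was delayed longer under timed lights than under flexible lights in that simulation.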

### 1.3 Write R code to take a look at the data frame.

In [None]:
# write code

# sample code
head(TrafficFlow)
glimpse(TrafficFlow)

## 2. Explore the distribution of delay times with `Timed` lights

### 2.1 Take a look at the distribution of delay times with `Timed` lights with a histogram.



<div class="teacher-note"><b>Teacher Note:</b>         
This will give students the chance to start with scaffolded code, and to see the default histogram.
</div>

In [None]:
# Modify this code
gf_histogram(~Thumb, data = Fingers)

# sample response
gf_histogram(~Timed, data = TrafficFlow)


### 2.2 Describe this distribution. 

Remember that we typically describe distributions by noting the shape, center, spread, and weird things.

Shape:

Center:

Spread:

Weird things:

<div class="teacher-note"><b>Sample Response</b>         

Shape: Skewed right

Center: A little over 100 minutes

Spread: about 15 minutes

Weird things: There is a big gap from about 115 to 135 minutes, and a few really high delay times near 140 minutes.

</div>

### 2.3 How would you adjust your code so that the histogram only has 5 bins?

In [None]:
# sample code

gf_histogram(~Timed, data = TrafficFlow, boundary = 0, bins = 5)

### 2.4 How would you adjust your basic histogram so that it has a binwidth of 10?

In [None]:
# sample code

gf_histogram(~Timed, data = TrafficFlow, boundary = 0, binwidth = 10)
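
One way to see what `binwidth = 10` with `boundary = 0` does is to build the bin edges by hand. This is a base R sketch under stated assumptions: the delay values are made up (not from `TrafficFlow`), and `delays`, `edges`, and `counts` are hypothetical names. It illustrates the idea of bins anchored at multiples of 10, not the exact internals of `gf_histogram()`.

```r
# Hypothetical delay times in minutes; NOT taken from TrafficFlow
delays <- c(96, 98, 101, 103, 104, 107, 112, 138)

binwidth <- 10

# With a boundary at 0, bin edges fall on multiples of the binwidth
lower <- floor(min(delays) / binwidth) * binwidth
upper <- ceiling(max(delays) / binwidth) * binwidth
edges <- seq(lower, upper, by = binwidth)
edges

# Count how many values land in each bin; cut() uses intervals of the form (a, b]
counts <- table(cut(delays, breaks = edges))
counts
```

Each count corresponds to the height of one bar. A bin with a count of 0 shows up as a gap in the histogram.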

### 2.5 Does changing the number of `bins` or the `binwidth` change the *center* of the distribution? Why or why not?

<div class="teacher-note">

<b>Teacher Note</b>:
Potential misconception: Some students may look at the middle of the x-axis in each plot, rather than the center of the distribution, and conclude from that alone that the center does not change.



<b>Sample Response</b>:
The center stays at roughly the same value. Changing the bins changes how the data are grouped, not the data themselves, so the apparent center varies only a little from plot to plot.

</div>
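
The idea behind this question can also be checked numerically: the mean and median are computed from the raw values, so they do not depend on how a histogram is binned. The sketch below uses base R and made-up values (not from `TrafficFlow`); `delays`, `wide`, and `narrow` are hypothetical names.

```r
# Hypothetical delay times in minutes; NOT taken from TrafficFlow
delays <- c(96, 98, 101, 103, 104, 107, 112, 138)

# Measures of center use the raw values, never the bins
mean(delays)
median(delays)

# The same values binned two ways (plot = FALSE returns the counts without drawing)
wide   <- hist(delays, breaks = seq(90, 140, by = 25), plot = FALSE)
narrow <- hist(delays, breaks = seq(90, 140, by = 5),  plot = FALSE)

wide$counts
narrow$counts
```

The two sets of counts look very different, but both describe the same eight values, so the mean and median are unchanged.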

## 3. Explore the distribution of delay times with `Flexible` lights

### 3.1 Take a look at the distribution of delay times with `Flexible` lights with a histogram.


In [None]:
# write code
gf_histogram(~ Flexible, data = TrafficFlow)

### 3.2 Describe this distribution. 

Remember that we typically describe distributions by noting the shape, center, spread, and weird things.

Shape:

Center:

Spread:

Weird things:

<div class="teacher-note"><b>Sample Response</b>         

Shape: roughly normal, but might seem slightly skewed right because of an outlier or two

Center: about 44 minutes

Spread: about 3-4 minutes

Weird things: The peak around 44 is much higher than the rest of the distribution. There are some gaps in the right tail area.

</div>

### 3.3 Adjust your histogram of `Flexible` delay times so that it has just a few bins. How can you do that using only the `binwidth` argument?

In [None]:
# sample code
gf_histogram(~Flexible, data = TrafficFlow, boundary = 0, binwidth = 5)
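
A rough way to choose a `binwidth` that gives only a few bins is to divide the range of the data by the binwidth. The sketch below uses made-up values (not from `TrafficFlow`), and all variable names are hypothetical. The result is approximate, since the exact number of bars also depends on where the bin edges fall.

```r
# Hypothetical delay times in minutes; NOT taken from TrafficFlow
flex <- c(41, 42, 43, 44, 44, 45, 47, 52)

# Range of the data: largest value minus smallest value
data_range <- max(flex) - min(flex)

# Approximate number of bins needed to cover the range
bins_narrow <- ceiling(data_range / 1)  # small binwidth: many bins
bins_wide   <- ceiling(data_range / 5)  # large binwidth: only a few bins

bins_narrow
bins_wide
```

The larger the `binwidth`, the fewer bins are needed to cover the same range.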

### 3.4 If we added another case to the distribution that had a delay time of 20 minutes under the `Flexible` traffic lights, how would it change the histogram? How about a delay time of 60 minutes?
Would these new values fit into the existing bins?

<div class="teacher-note">

<b>Sample Response</b>:
Both values are far from the rest of the data, so they would not fit into the existing bins. The histogram would need new bins outside the current range.
 
</div>
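
This can be illustrated by counting how many bins are needed before and after adding each new value. The sketch below is base R under stated assumptions: the values are made up (not from `TrafficFlow`), and `bin_edges()` is a hypothetical helper that places edges on multiples of the binwidth. It is not how `gf_histogram()` is implemented.

```r
# Hypothetical delay times in minutes; NOT taken from TrafficFlow
flex <- c(41, 42, 43, 44, 44, 45, 47, 52)

# Hypothetical helper: bin edges on multiples of binwidth that cover all of x
bin_edges <- function(x, binwidth) {
  lower <- floor(min(x) / binwidth) * binwidth
  upper <- ceiling(max(x) / binwidth) * binwidth
  seq(lower, upper, by = binwidth)
}

# Number of bins = number of edges minus 1
n_original <- length(bin_edges(flex, 5)) - 1
n_with_20  <- length(bin_edges(c(flex, 20), 5)) - 1
n_with_60  <- length(bin_edges(c(flex, 60), 5)) - 1

c(original = n_original, with_20 = n_with_20, with_60 = n_with_60)
```

In both cases more bins are needed, and most of the new bins between the added value and the rest of the data would be empty.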

## 4. Compare delays with `Timed` and `Flexible` traffic lights

### 4.1 Based on the data you have looked at today, if you were planning the traffic lights for an urban intersection, what would you do?

<div class="teacher-note">

<b>Sample Response</b>:
If I needed to design a traffic light for an urban intersection and I wanted to reduce traffic delays, I would use Flexible traffic lights.
 

</div>