## Better Ice Cream Sales through Minimization

In [2]:
# Run this cell to set up the notebook, but please don't change it.

# These lines import the Numpy and Datascience modules.
import numpy as np
from datascience import *

# These lines do some fancy plotting magic
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import warnings
warnings.simplefilter('ignore', FutureWarning)
from matplotlib import patches
from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets

# These lines load the tests.
from client.api.assignment import load_assignment 
tests = load_assignment('ice_cream.ok')

In this exercise, we'll use `minimize` to find an optimal location for an ice cream truck.  Minimization is useful in a vast array of applications; it's not just for finding the best line through a scatter plot!

You'll see 3 different ways to do minimization:
1. Using a slider to find the best location manually
2. Trying a bunch of locations using `apply` and finding the best one using `sort`
3. Using `minimize`

Data 8 is poised to disrupt the ice cream market.  We're catering to San Francisco hipsters, so we operate a truck that sells our locally-sourced organic Sriracha-Kale ice cream.  Today we have driven our truck to Ocean Beach, a long, narrow beach on the western coast of the city.

<img src="ocean_beach.jpg">

Upon arriving, we find that our potential customers are spread out along the beach.  We decide we want to park our truck in the location that's closest *on average* to all the customers.  That way, customers will be more likely to come to our truck.

(This may not be a great way to choose our truck's location.  Maybe you can think of a better way to decide on a location.)

We canvass the beach and record the location of each beachgoer in a table called `customers`.  The beach is oriented roughly north/south, and it's narrow, so we ignore how close each beachgoer is to the water.  We record only how far north each person is from the southern end of the beach.

Suppose there are 2 people on the beach, at 600 meters and 950 meters from the southern end, respectively.  If we park our truck at 750 meters, the average distance from our truck to customers is:

$$\frac{|600 - 750| + |950 - 750|}{2}.$$
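Evaluating each term gives distances of 150 and 200 meters, so the average is:

$$\frac{150 + 200}{2} = 175 \text{ meters}.$$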



<img src="beach_locations.jpg">

By now, the Python code that computes this might look a little familiar:

In [70]:
# The customer locations:
two_customer_locations = make_array(600, 950)

first_truck_location = 750

two_customers_mean_distance_from_750 = np.mean(np.abs(two_customer_locations - first_truck_location))
two_customers_mean_distance_from_750

<div class="hide">\pagebreak</div>
#### Question 1
A new person shows up on the beach, so the new customer locations are 600, 950, and 1,150 meters from the southern end.  If we park our ice cream truck at the *mean* of those locations, what is the average distance from our truck to customers?

In [4]:
three_customer_locations = make_array(600, 950, 1150)

# Compute this.
three_customers_mean_distance_from_mean = ...
three_customers_mean_distance_from_mean

In [5]:
_ = tests.grade('q1')

<div class="hide">\pagebreak</div>
#### Question 2
The mean is 900 meters.  If we park our truck at 925 meters instead, what's the average distance from our truck to a customer?

In [73]:
# Fill in three_customers_mean_distance_from_925.  Use code to compute it.
three_customers_mean_distance_from_925 = ...
three_customers_mean_distance_from_925

In [7]:
_ = tests.grade('q2')

The average distance went down!  Despite what your intuition might say, the mean of the customer locations isn't the best location to pick.
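To see why the mean can lose, here is a small sketch in plain Python using made-up locations (0, 10, and 100 meters, chosen only for illustration and not part of the beach data). The one far-away customer pulls the mean toward itself, which moves the truck away from the other two customers.

```python
# Hypothetical customer locations (meters), chosen for illustration only.
locations = [0, 10, 100]

def mean_distance(truck):
    """Average absolute distance from the truck to each customer."""
    return sum(abs(x - truck) for x in locations) / len(locations)

mean_location = sum(locations) / len(locations)  # about 36.67
print(mean_distance(mean_location))  # about 42.22
print(mean_distance(10))             # about 33.33
```

Moving the truck from the mean to 10 meters lowers the average distance, just as moving from 900 to 925 meters did for the three beach customers.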

Use the slider created by the next cell to find approximately the best location for the `three_customer_locations` dataset.  (The slider moves in steps of 5 meters, so you'll only be able to get within 5 meters of the best location.  It's okay if your submission doesn't display the slider.)

In [16]:
def three_customers_distance(location):
    return np.mean(np.abs(three_customer_locations - location))

interact(three_customers_distance, location=widgets.FloatSlider(min=700, max=1300, step=5, value=900, msg_throttle=1));

<div class="hide">\pagebreak</div>
#### Question 3
What location did you find?  What was the average distance to customers from that location?  Is that location around the same as any familiar statistic of the data?

*Write your answer here, replacing this text.*

#### The full dataset
Now let's look at the full customer dataset.  In this dataset, there are 1,000 people on the beach.  The next cell displays a histogram of their locations.

In [5]:
# Just run this cell.
customers = Table.read_table("customers.csv")
customers.hist(bins=np.arange(0, 2001, 100))
customers

Let's think very precisely about what we're trying to optimize.  Given these customer locations, we want to find a *single location*.  If we park our truck at that location, we want it to result in the smallest *average distance from our truck to customers*.
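Stated as a formula (the symbols $x_i$, $t$, $t^*$, and $n$ are notation introduced here): with $x_1, \dots, x_n$ the customer locations and $t$ the truck location, we want the location $t^*$ that minimizes the average distance,

$$t^* = \arg\min_{t} \; \frac{1}{n} \sum_{i=1}^{n} |x_i - t|.$$

For the `customers` table, $n = 1{,}000$.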

<div class="hide">\pagebreak</div>
#### Question 4
Write a function called `average_distance`.  It should take a single number as its argument (a truck location) and return the average distance from that location to the customers in the `customers` table.

In [8]:
def average_distance(location):
    # Fill in the function definition here.
    ...

# An example call to your function:
average_distance(1000)

In [12]:
_ = tests.grade('q4')

`average_distance` tells us how badly we're meeting our objective.  A mathematician would call this an *objective function*.  We want to find the location that produces the smallest value of this objective function.
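An objective function is built from the data but takes as input only the quantity we control, here the truck location. A minimal plain-Python sketch of that idea follows; `make_average_distance` is a hypothetical helper (not part of `datascience`), and the two locations are the 600 m / 950 m example from earlier.

```python
def make_average_distance(customer_locations):
    """Return an objective function: truck location -> average distance to customers."""
    locations = list(customer_locations)

    def objective(truck_location):
        # Sum the absolute distances, then divide by the number of customers.
        total = sum(abs(x - truck_location) for x in locations)
        return total / len(locations)

    return objective

# The two-customer example from earlier (600 m and 950 m).
two_customer_objective = make_average_distance([600, 950])
print(two_customer_objective(750))  # 175.0
```

The returned function takes a single number, which is the form of function that `minimize` is given when there is only one thing to choose.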

Use the slider created by the next cell to find approximately the best location for the `customers` dataset.  (You'll only be able to get within 5 meters of the best location.)

In [9]:
interact(average_distance, location=widgets.FloatSlider(min=700, max=1300, step=5, value=800, msg_throttle=1));

<div class="hide">\pagebreak</div>
#### Question 5
What location did you find, and what was the average distance to customers from that location?

*Write your answer here, replacing this text.*

<div class="hide">\pagebreak</div>
#### Question 6
Create a table called `average_distances` with two columns:

1. `"location"`, a truck location.  The smallest location should be 700 and the largest should be 1300, and they should go up in increments of 1.
2. `"average distance to customers"`.  The average distance from customers (in the `customers` table) to that location.

**Then**, sort the table to find the location with the smallest average distance to customers.  Name the sorted table `sorted_average_distances`, and name the best location (a number) `best_location_by_sorting`.

**Hint:** The staff solution used the table method `apply`.  If you don't, you'll need to use a `for` loop, and your code will be longer than the skeleton suggests.
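Sorting a table of candidate locations is a *grid search*: evaluate the objective at evenly spaced candidates and keep the one with the smallest value. A plain-Python sketch of the same idea on made-up data follows; the five locations are hypothetical, and Python lists and tuples stand in for the `Table`, `apply`, and `sort` calls the question asks for.

```python
# Hypothetical customer locations (meters), for illustration only.
customer_locations = [120, 480, 510, 900, 1400]

def average_distance_to(truck_location):
    return sum(abs(x - truck_location) for x in customer_locations) / len(customer_locations)

# Candidate truck locations 0, 1, 2, ..., 2000 (a grid with step 1).
candidates = range(0, 2001)

# Pair each candidate with its objective value, then sort by that value.
scored = sorted((average_distance_to(t), t) for t in candidates)

best_distance, best_location = scored[0]
print(best_location, best_distance)  # 510 340.0
```

Using `min(...)` instead of `sorted(...)[0]` would avoid sorting the whole list, but sorting mirrors what the question asks you to do with the table and lets you inspect the runners-up.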

In [86]:
locations = Table().with_column("location", np.arange(700, 1300+1, 1))

average_distances = locations.with_column("average distance to customers", ...)

sorted_average_distances = ...
sorted_average_distances.show(5)

best_location_by_sorting = ...
best_location_by_sorting

The function `minimize` does basically the same thing you just did.

It takes a *function*, the objective function, as its argument, and it returns the input (that is, the argument) that produces the smallest output value of the objective function.  If the objective function takes several arguments, `minimize` returns the combination of arguments that produces the smallest output value, all together in one array.
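To make "search for the input with the smallest output" concrete without checking every grid point, here is a sketch of one simple search strategy, *ternary search*. This is **not** how `minimize` is implemented; it is a stand-in that works for convex objectives like average absolute distance (a single valley with no separate local minima). It repeatedly discards the third of the interval that cannot contain a minimizer. The function name and the customer locations are hypothetical.

```python
def ternary_search_min(f, lo, hi, tol=1e-6):
    """Approximate a minimizer of a convex function f on the interval [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) <= f(m2):
            hi = m2  # for convex f, a minimizer lies in [lo, m2]
        else:
            lo = m1  # for convex f, a minimizer lies in [m1, hi]
    return (lo + hi) / 2

# Hypothetical customer locations (meters), for illustration only.
customer_locations = [120, 480, 510, 900, 1400]

def average_distance_to(truck_location):
    return sum(abs(x - truck_location) for x in customer_locations) / len(customer_locations)

best = ternary_search_min(average_distance_to, 0, 2000)
print(round(best, 3))  # 510.0
```

Each step shrinks the interval to two thirds of its width, so about 53 steps take it from 2,000 meters to under a millionth of a meter, far fewer objective evaluations than a fine grid.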

<div class="hide">\pagebreak</div>
#### Question 7
Use `minimize` to find the best location for our ice cream truck.

In [85]:
# Write code to compute the best location, using minimize.
best_location = ...
best_location

Your answer should match `best_location_by_sorting` up to a few decimal places.

Later in the day, the distribution of potential customers along the beach has changed.  `customers2.csv` contains their new locations.

In [12]:
customers2 = Table.read_table('customers2.csv')
customers2.hist(bins=np.arange(0, 2000+100, 100))

<div class="hide">\pagebreak</div>
#### Question 8
Find the new best location for our ice cream truck.

In [15]:
# Hint: The staff solution defined a function called average_distance2.
# We recommend doing that.
def average_distance2(location):
    ...

new_best_location = ...
new_best_location

If you'd like to check your answer, try doing what you did in Question 6.  Your answer to Question 3 may also be useful.

In [45]:
# For your convenience, you can run this cell to run all the tests at once!
import os
_ = [tests.grade(q[:-3]) for q in os.listdir("tests") if q.startswith('q')]

In [46]:
# Run this cell to submit your work *after* you have passed all of the test cells.
# It's ok to run this cell multiple times. Only your final submission will be scored.

!TZ=America/Los_Angeles jupyter nbconvert --output=".more_regression_$(date +%m%d_%H%M)_submission.html" more_regression.ipynb && echo "Submitted successfully."