![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Santa Visiting Homes in Strathcona County

There are a lot of homes in Strathcona County and Santa's internal GPS is malfunctioning. We [think](https://www.sciencealert.com/turns-out-we-have-no-idea-why-the-northern-lights-wreak-havoc-on-our-satellite-technology) the GPS interference is due to strong aurora borealis (Northern Lights) activity, which is caused by intense solar storms. Luckily, [Strathcona County's Open Data Portal](https://data.strathcona.ca/) includes the location data for all of the homes in the county.

That's where you, as a data scientist, come in. Using the county's data, you need to reprogram Santa's GPS so that he visits the homes in the county on Christmas Eve as efficiently as possible.

## Importing Code
First we'll import some Python libraries that we'll use. These libraries are code that other people have written to help make our programming easier.

In [None]:
print('Importing code libraries...')
# We will store the data into a 'dataframe' using pandas
import pandas as pd
try: # if we are running in JupyterLite
    # pyodide is a Python interpreter that runs in the browser
    import pyodide
    # piplite will be used for installing packages
    import piplite
    await piplite.install(['nbformat','plotly','haversine','folium'])
except:
    pass
# Plotly will be used to visualize the data
import plotly.express as px
# We will visualize the coordinates on a map using folium
import folium
# We want to cluster them using the FastMarkerCluster plugin from folium.plugins
from folium.plugins import FastMarkerCluster
# The haversine library will help us calculate distances on a map
try:
    import haversine as hs
except ImportError:
    !pip install haversine
    import haversine as hs

print('Successfully imported Python libraries')

## Getting Data About Homes

Next we will: 

1. Retrieve the data from the Strathcona County Open Data Portal.
2. Put the data in a dataframe named `home_data`. Think of a dataframe as a powerful spreadsheet.
3. Have a look at the first five rows using `.head()`.

In [None]:
data_url = 'https://data.strathcona.ca/api/views/c9fr-ivqf/rows.csv?accessType=DOWNLOAD'
try:
    home_data = pd.read_csv(pyodide.http.open_url(data_url))
except:
    home_data = pd.read_csv(data_url)
home_data.head()  # show the first five rows

## Visualizing Home Locations

Let's use folium to plot the home locations in our dataframe on an interactive map.

In [None]:
m = folium.Map(location=[53.5701, -113.0741], zoom_start=10)
m.add_child(FastMarkerCluster(home_data[['LATITUDE', 'LONGITUDE']].values.tolist()))
display(m)

## Counting Homes

That's a lot of homes for Santa to visit, and this is just in Strathcona County. To find out how many homes are in the dataset, we can use `.shape`, which gives the number of rows (one per property) followed by the number of columns.

In [None]:
home_data.shape

## Calculating Travel Time

We can approximate Santa's travel time using the equation $t = \frac{d}{v}$ where $t$ is time, $d$ is distance, and $v$ is speed or velocity.

Start by assuming that Santa can travel close to the speed of sound, or about 300 meters per second, and that he spends about 30 seconds in each home.
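Because Santa also stops at every home, the total time is the flight time plus the time spent in each home: $t_{total} = \frac{d}{v} + n \times t_{home}$, where $n$ is the number of homes and $t_{home}$ is the time per home. The small sketch below checks that arithmetic with made-up numbers; the function name `total_time_seconds` is hypothetical and not part of the notebook.

```python
def total_time_seconds(distance_m, number_of_homes, speed_m_per_s=300, seconds_per_home=30):
    """Return flight time plus time spent inside homes, in seconds."""
    flight_time = distance_m / speed_m_per_s          # t = d / v
    visit_time = number_of_homes * seconds_per_home   # every home adds its own stop
    return flight_time + visit_time

# Hypothetical example: 15 km of flying and 10 homes
t = total_time_seconds(15_000, 10)
print(t, 'seconds')   # 350.0 seconds (50 s flying + 300 s in homes)
print(t / 3600, 'hours')
```

Notice that the time spent in homes grows with the number of homes, so it matters even when Santa flies very fast.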

In [None]:
# Function with Santa's speed and time per home to calculate the total time required, in seconds
# Feel free to change the values for flight_speed and time_per_home and re-run this cell
def calculate_required_time(travel_distance, number_of_homes):
    flight_speed = 300  # meters per second
    time_per_home = 30  # seconds spent in each home
    # flight time (t = d / v) plus the time spent in every home
    time_required = travel_distance / flight_speed + time_per_home * number_of_homes
    return time_required
print('We have defined the function calculate_required_time()')

In [None]:
number_of_homes = home_data.shape[0]
total_distance = 0
previous_location = (53.5701, -113.0741) # starting from the middle of Strathcona County
for record in home_data.iterrows():
    current_location = (record[1]['LATITUDE'], record[1]['LONGITUDE'])
    travel_distance = hs.haversine(previous_location, current_location, unit=hs.Unit.METERS)
    total_distance = total_distance + travel_distance
    previous_location = current_location

print(total_distance, 'meters')
required_time = calculate_required_time(total_distance, number_of_homes)
print(required_time, 'seconds required.')
print(required_time/3600, 'hours required')

That seems like a long time: more than a week. You can, of course, change the values in the `calculate_required_time()` function so that Santa travels faster or spends less time in each home.
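The distance loop earlier in this section relies on `hs.haversine()` to measure how far apart two latitude/longitude points are along the Earth's surface. Here is a minimal standard-library sketch of the haversine formula, assuming a spherical Earth with a mean radius of 6,371,008.8 m (this is meant to match the haversine library's default, but treat it as an assumption). The helper names `haversine_m` and `route_length_m` are hypothetical.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius, in meters

def haversine_m(point_a, point_b):
    """Great-circle distance in meters between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(radians, point_a)
    lat2, lon2 = map(radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))

def route_length_m(points, start):
    """Sum of the hop distances when visiting points in the given order, beginning at start."""
    total = 0.0
    previous = start
    for current in points:
        total += haversine_m(previous, current)
        previous = current
    return total

# One degree of longitude along the equator is roughly 111 km
print(haversine_m((0, 0), (0, 1)))
```

`route_length_m` does the same job as the loop over `home_data.iterrows()`: the order of the points changes the total, which is why the visiting order matters.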

# Visualizing the Path

A better way to decrease the travel time, though, would be to visit homes in an optimal order. We will visualize the route using Plotly Express. Right now, Santa visits homes in the order they are listed in the data:

In [None]:
px.line(home_data, x="LONGITUDE", y="LATITUDE")

Looking at just the first 50 homes, we can see that this is not an efficient path:

In [None]:
px.line(home_data.head(50), x="LONGITUDE", y="LATITUDE")

# Travelling Salesman Problem

Optimizing Santa's travel path is a version of the classic [travelling salesman problem](https://simple.wikipedia.org/wiki/Travelling_salesman_problem), which is very hard to solve exactly: the number of possible routes grows extremely quickly as more stops are added.

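Nobody has found a method that is guaranteed to find the shortest route efficiently for any number of stops. Finding one (or proving that none exists) would settle the P vs NP problem, which has a [$1,000,000 prize](http://www.claymath.org/millennium-problems/p-vs-np-problem) attached to it.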
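To see why the problem gets hard so quickly, here is a brute-force sketch that tries every possible visiting order for a handful of made-up points. It uses straight-line distance on (x, y) coordinates to keep things short, which is a simplification of the map distances used elsewhere in this notebook. With a fixed starting point and $n$ homes there are $n!$ orders to check, so this approach only works for very small $n$.

```python
from itertools import permutations
from math import dist, factorial

def best_order(start, points):
    """Try every visiting order and return (shortest_length, best_order)."""
    best_length = float('inf')
    best = None
    for order in permutations(points):
        length = 0.0
        previous = start
        for p in order:
            length += dist(previous, p)
            previous = p
        if length < best_length:
            best_length, best = length, order
    return best_length, best

# Hypothetical homes along a straight road, listed out of order
homes = [(3, 0), (1, 0), (4, 0), (2, 0)]
length, order = best_order((0, 0), homes)
print(length, order)   # 4.0 ((1, 0), (2, 0), (3, 0), (4, 0))

# The number of possible orders explodes as homes are added
for n in (5, 10, 20):
    print(n, 'homes:', factorial(n), 'orders')
```

Twenty homes already give more than $10^{18}$ orders, and Strathcona County has far more than twenty homes.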

## Filtering and Sorting Data

Assuming that you haven't solved the travelling salesman problem already, we'll try to optimize Santa's route by eliminating some homes. Let's see what data categories are available to us in our `home_data` dataframe:

In [None]:
home_data.columns

There are a couple of column names that are interesting for our purposes: `'ASSESSCLAS'` and perhaps `'FIREPLACE'`. Let's look at the types of "assessment classes":

In [None]:
# We are using a for loop to identify the unique values of "assessment classes"
for assessment_class in home_data['ASSESSCLAS'].unique():
    print(assessment_class)
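`unique()` lists each distinct value once, in the order it first appears, and the graphing section later uses `value_counts()` to tally how many rows fall into each class. As a rough plain-Python analogy (not how pandas implements them), `dict.fromkeys` and `collections.Counter` do the same two jobs on a list. The sample class labels below are made up; check the real ones in the output above.

```python
from collections import Counter

# Hypothetical assessment classes for a few properties
classes = ['Residential', 'Farmland', 'Residential', 'Non-Residential', 'Residential']

distinct = list(dict.fromkeys(classes))  # like .unique(): first-seen order, no repeats
counts = Counter(classes)                # like .value_counts(): how many of each

print(distinct)
print(counts.most_common())  # most frequent class first, like value_counts()
```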

To get only the homes that are `'Residential'`, run the following. Notice that we create a new dataframe called `home_data_filtered`.

In [None]:
condition = home_data['ASSESSCLAS']=='Residential'
home_data_filtered = home_data[condition]
home_data_filtered.shape

To keep only the homes with fireplaces, try:

`condition = home_data['FIREPLACE']=='Y'`

In [None]:
# enter your code below


You can also combine two conditions. Try it yourself:

---

`condition1 = home_data['ASSESSCLAS'] == 'Residential'`

`condition2 = home_data['YEAR_BUILT'] < 2000`

`home_data_filtered = home_data[(condition1) & (condition2)]`

---

`&` means **and**

`|` means **or**
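In pandas, each condition is a column of `True`/`False` values, one per row, and `&` and `|` combine them row by row. The plain-Python sketch below mimics that idea with lists; the sample rows and their year values are made up for illustration. In pandas you also need parentheses around each comparison when you write them inline, because `&` and `|` are evaluated before `==` and `<`.

```python
# Hypothetical rows: (assessment class, year built)
rows = [
    ('Residential', 1985),
    ('Residential', 2010),
    ('Farmland', 1970),
    ('Non-Residential', 2005),
]

condition1 = [cls == 'Residential' for cls, _ in rows]
condition2 = [year < 2000 for _, year in rows]

# Row-by-row AND / OR, the same idea as pandas' & and |
both = [a and b for a, b in zip(condition1, condition2)]
either = [a or b for a, b in zip(condition1, condition2)]

# Keep only the rows where the combined condition is True (like home_data[mask])
print([row for row, keep in zip(rows, both) if keep])  # [('Residential', 1985)]
```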

In [None]:
# enter your code below


Ordering the data by longitude (from west to east) might also help. Notice that we create a new dataframe called `home_data_sorted`. 

`home_data_sorted = home_data_filtered.sort_values(by=['LONGITUDE'])`

In [None]:
home_data_sorted = home_data_filtered.sort_values(by=['LONGITUDE'])
px.line(home_data_sorted, x="LONGITUDE", y="LATITUDE")
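To check whether sorting actually shortens the trip, you can compare the route length before and after sorting. The sketch below does this for a few made-up points, using straight-line distance on (x, y) pairs where x plays the role of longitude; this is a simplification, and the helper name `path_length` is hypothetical. Sorting by one coordinate often helps, but it is not guaranteed to give the shortest route.

```python
from math import dist

def path_length(start, points):
    """Total straight-line length of visiting points in order, beginning at start."""
    total = 0.0
    previous = start
    for p in points:
        total += dist(previous, p)
        previous = p
    return total

start = (0.0, 0.0)
# Hypothetical (longitude-like, latitude-like) points in the order they happen to be listed
listed = [(5, 1), (1, 0), (4, 0), (2, 1), (3, 0)]
by_longitude = sorted(listed, key=lambda p: p[0])  # like sort_values(by=['LONGITUDE'])

print(path_length(start, listed))
print(path_length(start, by_longitude))
```

With the real data, `calculate_time_from_dataframe()` (defined later in this notebook) makes the same comparison between `home_data_filtered` and `home_data_sorted`.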

## Graphing Data

Here are some graphs to help you visualize the data and make decisions about which homes to include in Santa's route. 

In [None]:
fireplaces = home_data['FIREPLACE'].value_counts()
px.pie(values=fireplaces.values, names=fireplaces.index, title='Buildings with Fireplaces').show()
px.bar(fireplaces, title='Buildings with Fireplaces').show()

px.bar(home_data['ASSESSCLAS'].value_counts(), title='Building Counts by Assessment Class').show()

### Challenge

After running the code cell above, edit this cell to explain what you see in the graphs.

1. Enter your explanation for the "FIREPLACE" graph here. Does this help you to decide which homes to include in Santa's route?
2. Enter your explanation for the "ASSESSCLAS" graph here. Does this help you to decide which homes to include in Santa's route?
3. In the code cell below, try to plot another graph based on a different category. Remember, you can list all the categories or column names using `home_data.columns`.

In [None]:
# enter your code below


What did you try graphing? What do you see in your graph? Does this help you to decide which homes to include in Santa's route?

* Enter your explanation for the graph you created.

## Calculating Total Time From a Dataframe

To make things easier, let's define a function that calculates Santa's total time using data in a dataframe.

After running this cell, we'll be able to see Santa's total time by calling this function with one of your dataframes like this:

`calculate_time_from_dataframe(home_data_sorted)`

In [None]:
# Function with Santa's speed and "time per home" to calculate total time required
def calculate_time_from_dataframe(df):
    flight_speed = 300  # in m/s; the speed of sound is about 331 + 0.6*T m/s, where T is the temperature in Celsius
    time_per_home = 30  # in seconds
    number_of_homes = df.shape[0]
    total_distance = 0
    previous_location = (53.5701, -113.0741) # the middle of Strathcona County
    for row in df.iterrows():
        current_location = (row[1]['LATITUDE'], row[1]['LONGITUDE'])
        travel_distance = hs.haversine(previous_location, current_location, unit=hs.Unit.METERS)
        total_distance = total_distance + travel_distance
        previous_location = current_location
    # flight time (t = d / v) plus the time spent in every home
    time_required = total_distance / flight_speed + time_per_home * number_of_homes
    print(total_distance, 'meters')
    print(time_required, 'seconds')
    print(time_required/3600, 'hours')
    return time_required
print('We have defined the function calculate_time_from_dataframe()')

# Analysis Challenge

Try out different filtering and sorting ideas to see how best to minimize the time that Santa takes to visit homes in Strathcona County.
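One ordering idea you could try is the nearest-neighbour heuristic: from the current location, always fly to the closest home that has not been visited yet. It is simple and usually much better than an arbitrary order, but it is not guaranteed to find the shortest route. The sketch below uses straight-line distance on made-up (x, y) points; to use it on a dataframe you would first turn the rows into coordinate pairs and could swap in `hs.haversine` as the distance. The function name `nearest_neighbour_order` is hypothetical. This pure-Python version compares every remaining home at each step, so it may be slow on the full dataset; try it on a filtered subset first.

```python
from math import dist

def nearest_neighbour_order(start, points):
    """Greedy route: repeatedly visit the closest unvisited point."""
    remaining = list(points)
    route = []
    current = start
    while remaining:
        # Pick the unvisited point closest to where Santa is now
        next_point = min(remaining, key=lambda p: dist(current, p))
        remaining.remove(next_point)
        route.append(next_point)
        current = next_point
    return route

homes = [(3, 0), (1, 0), (4, 0), (2, 0)]
print(nearest_neighbour_order((0, 0), homes))  # [(1, 0), (2, 0), (3, 0), (4, 0)]
```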

In [None]:
# enter your code below


In [None]:
# enter your code below


# Conclusions

Edit this cell to describe how you would **minimize Santa's travel time**. Include any data filtering and sorting steps that you recommend, and why you would recommend them.



## Reflections

Write about some or all of the following questions, either individually in separate markdown cells or as a group.
- What is something you learned through this process?
- How well did your group work together? Why do you think that is?
- What were some of the hardest parts?
- What are you proud of? What would you like to show others?
- Are you curious about anything else related to this? Did anything surprise you?
- How can you apply your learning to future activities?

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)