## Introduction to Maps

Welcome to Lab 12! Today we will be exploring maps and how to create them by taking a look at the water usage, geography, and income in California. The water data for this lab was procured from the [California State Water Resources Control Board](http://www2.pacinst.org/gpcd/table.html) and curated by the [Pacific Institute](http://pacinst.org/). The map data includes [US topography](https://github.com/jgoodall/us-maps), [California counties](https://github.com/johan/world.geo.json/tree/master/countries/USA/CA), and [ZIP codes](http://bl.ocks.org/jefffriesen/6892860).

Today's lab is slightly different from lecture: we'll be using `Map` instead of `.map_table` because we're going to be working with areas on a map rather than points. If you feel like you still don't understand `.map_table`, ask your TA/UGSI to review it!


As usual, run the cell below to prepare everything!

In [1]:
# Run this cell to set up the notebook, but please don't change it.

# These lines import the NumPy, math, and datascience modules.
import numpy as np
import math
from datascience import *

# These lines do some fancy plotting magic.
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import warnings
warnings.simplefilter('ignore', FutureWarning)

# These lines load the tests.
from client.api.notebook import Notebook
ok = Notebook('lab12.ok')

In [None]:
_ = ok.backup()

Let's begin by loading in the data!

In [None]:
# Run this cell, but please don't change it.

districts = Map.read_geojson('water_districts.geojson')
zips = Map.read_geojson('ca_zips.geojson.gz')
usage_raw = Table.read_table('water_usage.csv', dtype={'pwsid': str})
income_raw = Table.read_table('ca_income_by_zip.csv', dtype={'ZIP': str}).drop('STATEFIPS', 'STATE', 'agi_stub')
wd_vs_zip = Table.read_table('wd_vs_zip.csv', dtype={'PWSID': str, 'ZIP': str}).set_format(make_array(2, 3), PercentFormatter)

Looking at Maps
======

The `districts` and `zips` data sets are `Map` objects. Documentation on mapping in the `datascience` package can be found at [data8.org/datascience/maps.html](http://data8.org/datascience/maps.html).  To view a map of California's water districts, run the cell below. Click on a district to see its description.

In [None]:
districts.format(width=400, height=200)

A `Map` is a collection of regions and other features such as points and markers, each of which has a **string** `id` and various properties. You can view the features of the `districts` map as a table using `Table.from_records`.

In [None]:
district_table = Table.from_records(districts.features)
district_table.show(3)
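
To make the idea of "features as table rows" concrete, here is a minimal sketch in plain Python. It is not the `datascience` implementation; the feature IDs, the `name` property, and the `to_records` helper are all hypothetical and only illustrate how each feature could become one record with an `id`, the feature itself, and its properties.

```python
# Hypothetical features keyed by a string id. Real features carry full
# GeoJSON geometry; a placeholder string stands in for it here.
features = {
    "district-a": {"properties": {"name": "District A"}, "geometry": "..."},
    "district-b": {"properties": {"name": "District B"}, "geometry": "..."},
}

def to_records(features):
    """Flatten each feature into one dictionary (one table row)."""
    records = []
    for feature_id, feature in features.items():
        record = {"id": feature_id, "feature": feature}
        # Each property becomes its own column.
        record.update(feature["properties"])
        records.append(record)
    return records

records = to_records(features)
for record in records:
    print(record["id"], record["name"])
```

Each record keeps the whole feature in its `feature` entry, which is why a column of features can be handed back to a mapping function later.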

To display a `Map` containing only two features from the `district_table`, call `Map` on an array containing those two features from the `feature` column.

**Question 1** Draw a map of the Alameda County Water District (row 0) and the East Bay Municipal Utilities District (row 2).

In [None]:
# Fill in the next line so the last line draws a map of those two districts.
# Hint: Use .take to take the rows you want, and then use .column to get features to map
alameda_and_east_bay = ...
Map(alameda_and_east_bay, height=300, width=300)

In [None]:
_ = ok.grade('q1')

In the next cell, we've created a table called `zip_features` that contains each ZIP code district, along with income data for that district.

`zip_features` will allow us to investigate income and number of farmers in California. 

Explore the table!

In [None]:
#Run this cell, but please don't change it!

income_by_zipcode = income_raw.group('ZIP', sum) 
for label in income_by_zipcode.labels: 
    income_by_zipcode.relabel(label, label.replace(' sum', ''))
income = Table().with_columns(
        'ZIP', income_by_zipcode.column('ZIP'),
        'num returns', income_by_zipcode.column('N02650'),
        'total income ($)', income_by_zipcode.column('A02650'), 
        'num farmers', income_by_zipcode.column('SCHF') 
    )
income = income.where(income.column('ZIP') != '99999') 
with_averages = income.with_columns(
    "Proportion of farmers", income.column('num farmers')/income.column('num returns'),
    "Average income ($)", 1000*income.column('total income ($)') / income.column('num returns'))
zip_features = Table.from_records(zips.features)
zip_features = with_averages.join('ZIP', zip_features)
"""
ZIP: ZIP code of the district
num returns: number of tax returns
total income ($): the total income of all tax returns, in thousands of dollars
num farmers: number of farmer tax returns
Proportion of farmers: proportion of tax returns from farmers
Average income ($): average income per tax return for the district, in dollars
"""
zip_features
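
The cell above sums the raw rows for each ZIP code and then divides to get a per-return average. As a rough illustration of that arithmetic only, here is a plain-Python sketch; the ZIP codes and numbers are made up and it does not use the `datascience` package.

```python
from collections import defaultdict

# Hypothetical raw rows: (ZIP, number of returns, total income in thousands of dollars).
raw_rows = [
    ("00001", 100, 5000),
    ("00001", 50, 4000),
    ("00002", 200, 30000),
]

# Group by ZIP and sum both numeric columns.
totals = defaultdict(lambda: [0, 0])
for zip_code, num_returns, income_thousands in raw_rows:
    totals[zip_code][0] += num_returns
    totals[zip_code][1] += income_thousands

# Income is recorded in thousands, so multiply by 1000 to get dollars per return.
average_income = {
    zip_code: 1000 * income / returns
    for zip_code, (returns, income) in totals.items()
}
print(average_income)
```

The factor of 1000 is the same one that appears in the `Average income ($)` column above.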

To get your creative juices flowing, we've provided a simple example where we mapped only the districts that have a high average income (specifically one above $100,000!). 

Notice how we can use `.where` to filter the table of districts and then map those districts by calling `Map`!

In [None]:
high_average_zips = zip_features.where('Average income ($)', are.above(100000)) 
Map(high_average_zips.column('feature'), width=400, height=300)
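
If the filtering step feels abstract, the sketch below shows the same idea with ordinary Python lists. The rows are hypothetical, and the comparison is strict (greater than, not greater than or equal to), which is how `are.above` behaves.

```python
# Hypothetical rows standing in for zip_features.
rows = [
    {"ZIP": "00001", "Average income ($)": 85000.0},
    {"ZIP": "00002", "Average income ($)": 100000.0},
    {"ZIP": "00003", "Average income ($)": 120000.0},
]

threshold = 100000

# Keep only the rows whose average income is strictly above the threshold.
high_rows = [row for row in rows if row["Average income ($)"] > threshold]
high_zips = [row["ZIP"] for row in high_rows]
print(high_zips)
```

Only the rows that survive the filter would go on to be mapped; a row sitting exactly at the threshold is left out.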

**Question 2** 

Investigate the above map a little more closely. Are there any associations that you can observe? 

*Write your answer here, replacing this text.*

**Question 3**: 

Now, think about how you can use `.where` to filter the table to create maps that allow us to visualize associations. Write down 3 ideas and share them with a neighbor or your TA/UGSI!

*Write your answer here, replacing this text.*

**Question 4** 

Now we've got to create the maps! Use the following cells to filter your tables and `Map` out the data! 

Remember: the function `Map` takes in an array of features as its input. 

In [None]:
# Here's an example of how to use the Map function: 
#  Because we haven't filtered the table, we're just Mapping every 
#  zipcode district that we have data on.
# Note: This example might take a while to load 
Map(zip_features.column("feature"))

In [None]:
#Use .where and Map to make your first Map!

In [None]:
#Use .where and Map to make your second Map!

In [None]:
#Use .where and Map to make your third Map!

**Question 5**

What did you learn? If you didn't observe any associations, it's OK to note that there isn't an association! If you did, can you think of any reasons that might lead to those associations?

Check your answers with your neighbors (if you mapped the same things) or your TA/UGSI!

*Write your answer here, replacing this text.*

In the following cell, we've created a table called `usage_features` that has data about water usage in California. 

In [None]:
usage_raw.set_format(4, NumberFormatter)
max_pop = usage_raw.select(0, 'population').group(0, max).relabeled(1, 'Population')
avg_water = usage_raw.select(0, 'res_gpcd').group(0, np.mean).relabeled(1, 'Water')
usage = max_pop.join('pwsid', avg_water).relabeled(0, 'PWSID')
usage_features = usage.join('PWSID', district_table)
"""
PWSID: the public water supply identifier of the district
Population: Estimate of average population served in 2015
Water: Average residential water use (gallons per person per day) in 2014-2015
"""
usage_features

**Question 6**

Calculate the average water usage per person across all available water districts. Then create a map that displays which districts have an average water usage that is above that average.
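
To see why population matters when averaging per-person figures, here is a small worked example with made-up numbers. The two districts and their values are hypothetical; the point is only the difference between a simple mean and a population-weighted mean.

```python
# Hypothetical districts: a small one with high usage, a large one with low usage.
populations = [1000, 9000]
gallons_per_person = [200.0, 100.0]

# A simple mean treats both districts as equally important.
simple_mean = sum(gallons_per_person) / len(gallons_per_person)

# A weighted mean divides total water used by total people served.
total_water = sum(p * g for p, g in zip(populations, gallons_per_person))
weighted_mean = total_water / sum(populations)

print(simple_mean)    # 150.0
print(weighted_mean)  # 110.0
```

The weighted figure is lower because most people in this toy example live in the district that uses less water per person.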

In [None]:
#Hint - remember that each district has a certain population
# If we want to know the average water usage per person, we need to 
# take this into account. 
avg_water_usage = ...
...

In [None]:
avg_water_usage

In [None]:
_ = ok.grade("q6")

**Question 7** Based on the map above, which part of California appears to use more water per person: the San Francisco area or the Los Angeles area?

*Write your answer here, replacing this text.*

**OPTIONAL:** Coloring Maps
=======

Here we will see that we can shade the regions of a map according to a value for each region by using the `.color` method.

`.color` takes as a first input a table with 2 columns, one for the PWSID, and one for the values that we wish to color by. 

In [None]:
population = usage_features.select("PWSID", "Population")
districts.color(population, key_on='feature.properties.PWSID')
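
Conceptually, shading a map means looking up each region's value by its key and turning that value into a color. The sketch below illustrates that idea in plain Python under stated assumptions: the IDs, values, palette, and `shade` helper are all hypothetical, and this is not how the `datascience` package implements `.color`.

```python
# Hypothetical values keyed by PWSID, and features that carry a PWSID property.
values_by_id = {"A": 5000, "B": 250000, "C": 500000}
features = [{"properties": {"PWSID": pwsid}} for pwsid in ["A", "B", "C", "D"]]

palette = ["#ffffcc", "#fd8d3c", "#800026"]  # light to dark
NO_DATA = "#cccccc"

def shade(value, low, high, palette):
    """Pick a palette color based on where value falls between low and high."""
    if value is None:
        return NO_DATA
    if high == low:
        return palette[0]
    fraction = (value - low) / (high - low)
    index = min(int(fraction * len(palette)), len(palette) - 1)
    return palette[index]

low, high = min(values_by_id.values()), max(values_by_id.values())
colors = {}
for feature in features:
    key = feature["properties"]["PWSID"]  # the role played by key_on
    colors[key] = shade(values_by_id.get(key), low, high, palette)
print(colors)
```

Region `D` has no matching value, so it falls back to the no-data color; this is why the key column in the table needs to match the property named in `key_on`.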

**Question 8**
To investigate the use of water with a slightly better visual, let's use `.color` to create a map of the water usage of each PWSID. 

In [None]:
#Start by making a per_pwsid_usage table
per_pwsid_usage = ...

districts.color(per_pwsid_usage, key_on='feature.properties.PWSID') 

In [None]:
_ = ok.grade('q8')

**Question 9** Based on the shaded map above, does the map support your answer to Question 7?

*Write your answer here, replacing this text.*

Reviewing `.map_table`
======

`.map_table` is a method that takes in a table that should contain the following: 

1. lat - the latitude of the point
2. lon - the longitude of the point
3. name - the name of the point 
4. (optional) color - what color to represent the point with
5. (optional) area - the size of the point

ex. `Marker.map_table(table)`

ex. `Circle.map_table(table)`
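
Because the column order matters, it can help to arrange your data deliberately before mapping it. The sketch below is a plain-Python illustration, assuming the order listed above; the `to_ordered_row` helper and the sample record are hypothetical and are not part of the `datascience` package.

```python
# Column order described above; "color" and "area" are optional.
REQUIRED = ["lat", "lon", "name"]
OPTIONAL = ["color", "area"]

def to_ordered_row(record):
    """Return (column names, values) arranged in the expected order."""
    missing = [column for column in REQUIRED if column not in record]
    if missing:
        raise KeyError(f"missing required columns: {missing}")
    columns = REQUIRED + [column for column in OPTIONAL if column in record]
    return columns, [record[column] for column in columns]

# A hypothetical point whose fields arrive in a scrambled order.
columns, row = to_ordered_row(
    {"name": "Store 1", "color": "red", "lon": -122.27, "lat": 37.87}
)
print(columns)
print(row)
```

With a table, the equivalent step is selecting the columns in this order before calling `.map_table`.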

Suppose that you own a business, and you want to map where you get the most sales in order to market to the right locations.

In [59]:
jan_sales = Table.read_table("SalesJan2009.csv").select(0, 1, 2, 3, 10, 11).relabeled("Latitude", "lat").relabeled("Longitude", "lon")
def remove_commas(string): 
    return string.replace(",", "")
jan_sales = jan_sales.with_column("Price1", jan_sales.apply(remove_commas, 'Price'))
jan_sales = jan_sales.with_column("Price2", jan_sales.apply(int, 'Price1'))
jan_sales = jan_sales.drop(2,6).relabeled("Price2", "Price")
jan_sales
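
The two-step cleanup above is needed because a price such as `"3,600"` is text that `int` cannot parse directly. Here is a small standalone sketch of that step; the sample prices are made up.

```python
def remove_commas(string):
    return string.replace(",", "")

# Hypothetical prices as they might appear in a CSV file.
raw_prices = ["1200", "3,600", "13,000"]

# int("3,600") would raise ValueError, so strip the commas first.
clean_prices = [int(remove_commas(price)) for price in raw_prices]
print(clean_prices)
```

Once the prices are integers, they can be used in arithmetic, for example to scale the area of each point.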

**Question 10** Using `.map_table`, map the locations of your sales!

In [36]:
#Try using Circle! Make sure that you add a .show() onto the end of your code here to display it!
...
#Now try using Marker!
...

**Question 11** Where do the majority of your sales come from? 

*Write your answer here, replacing this text.*

**Question 12** You're considering dropping Mastercard, and making all your customers use Visa. Create a Map that draws Mastercard purchases in red and Visa purchases in blue.  

In [42]:
#Apply this function to the table to get the colors! Think about which column you should apply it to!
def correct_color(credit_card_manu):
    if (credit_card_manu == "Mastercard"):
        return "red"
    else: 
        return "blue"

#Make your map here!
#Remember that a table must have the following columns in order to be mapped with color, in the same order, with the same names:
#  lat, lon, name, color
# For this question, use the Payment_Type as the column for name
sales_with_color = ...
...

**Question 13** Can you make a decision about this? Why or why not?

*Write your answer here, replacing this text.*

**Question 14** To make the money involved in the purchase show up on our map, let's scale the points with the area. Make sure that you multiply the prices by 100000000 so that we can actually see the points. 

In [47]:
sales_with_color

In [66]:
#Make your map here!
#Remember that a table must have the following columns in order to be mapped with area, in the same order, with the same names:
#  lat, lon, name, color, area
sales_with_purchase_amount = ...
...

**Question 15** Can we deduce anything from this map?

*Write your answer here, replacing this text.*

# Congrats! 

You're finished with lab12!

Make sure that you run all the cells below!

In [None]:
# For your convenience, you can run this cell to run all the tests at once.
_ = ok.grade_all()

In [None]:
# Run this cell to submit your work.
# You can submit as many times as you want.  If you want us to grade a
# submission other than your most recent one, you can choose which submission
# is graded at https://okpy.org/cal/data8r/su17/ .

_ = ok.submit()