**IMPORTANT:** If you are running this notebook for the first time, please run the first code cell below by pressing Shift-Enter while in it. If not, please skip it. This digital case study has been developed for Jupyter Notebook Python 3 kernel.

In [1]:
%run ./resources/library.py

The <font color='red'>`style_notebook()`</font> function increases the font size to improve readability. You can skip the cell below if you don't need a larger font size.

In [2]:
style_notebook()

# Notebook 2: Calculating the Mean Center Point for Mortality Locations

## <font color='blue'>Background</font>

One of the benefits of digital data is the ability to triangulate observations using multiple digital analytical tools. We will extend Dr. John Snow's toolkit by quickly performing a mean center analysis of mortality locations in the Soho district.

### Steps to carry out for Mean Center Analysis

* **Step 1.** Unpickle the dataframes from Notebook 1.
* **Step 2.** Review Mean Center Analysis equations.
* **Step 3.** Transform Equations to Python Code.
* **Step 4.** Recreate the Notebook 1 map, `map1`.
* **Step 5.** Add Mean Center Point to Notebook 1 map.

## <font color='blue'>Step 1. Unpickle Dataframe Files</font>

We learned to pickle the `deaths_df` and `pumps_df` dataframes in Notebook 1. To unpickle these, we will use the `pandas` function <font color='red'>`pd.read_pickle()`</font>, which reads a pickle file back into a dataframe.

In [2]:
# Don't forget to import pandas
import pandas as pd

# Let's read the pickle files back into their dataframes.
deaths_df = pd.read_pickle('outputs/deaths_df.pickle')
pumps_df = pd.read_pickle('outputs/pumps_df.pickle')
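
As a quick sanity check on the pickle round trip, the sketch below writes a small dataframe to a temporary file and reads it back. The names `sample_df`, `restored_df`, and `round_trip_ok` are illustrative and not part of the notebook; the two rows are taken from the first rows of `deaths_df` shown later in this notebook.

```python
import os
import tempfile

import pandas as pd

# Illustrative stand-in for deaths_df (two rows only)
sample_df = pd.DataFrame({
    "DEATHS": [3, 2],
    "LON": [-0.13793, -0.137883],
    "LAT": [51.513418, 51.513361],
})

# Write to a temporary directory so that nothing in outputs/ is touched
with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, "sample_df.pickle")
    sample_df.to_pickle(pickle_path)           # pickle
    restored_df = pd.read_pickle(pickle_path)  # unpickle

# .equals() compares both values and dtypes
round_trip_ok = restored_df.equals(sample_df)
print(round_trip_ok)
```

If the round trip worked, the restored dataframe should have the same columns, values, and dtypes as the original.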

## <font color='blue'>Step 2. Review Mean Center Analysis Equations</font>

Assuming that those who got sick and died drew their water from a pump within walking distance, let's obtain the mean center of all points, weighted by the number of deaths at each point.

The equations for the mean center are as follows (written in Markdown using [LaTeX](http://data-blog.udacity.com/posts/2016/10/latex-primer/)):

$\begin{align} \small Equation-1 && \normalsize\bar x _{weighted} = \normalsize\frac{\Sigma (x_{1..n} w_{1..n})}{\Sigma (w_{1..n})}\end{align}$

$\begin{align} \small Equation-2 && \normalsize \bar y _{weighted} = \normalsize\frac{\Sigma (y_{1..n} w_{1..n})}{\Sigma (w_{1..n})}\end{align}$

$\begin{align} \small Equation-3 && \normalsize {mean\ center} = \normalsize(\bar x _{weighted}, \bar y _{weighted}) \end{align}$
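
To make the three equations concrete, here is a minimal plain-Python sketch. The function name `weighted_mean_center` and the toy values are assumptions for illustration only; they are not from the cholera data, and the notebook itself uses `pandas` and `numpy` in the next step.

```python
def weighted_mean_center(xs, ys, ws):
    """Return (x_bar, y_bar), the weighted mean center of the points."""
    total_weight = sum(ws)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    # Equation 1: sum of x * w divided by sum of w
    x_bar = sum(x * w for x, w in zip(xs, ws)) / total_weight
    # Equation 2: sum of y * w divided by sum of w
    y_bar = sum(y * w for y, w in zip(ys, ws)) / total_weight
    # Equation 3: the mean center is the pair (x_bar, y_bar)
    return (x_bar, y_bar)


# Toy values: two points, the second weighted three times as heavily
toy_xs = [0.0, 4.0]
toy_ys = [0.0, 2.0]
toy_ws = [1, 3]

toy_center = weighted_mean_center(toy_xs, toy_ys, toy_ws)
print(toy_center)  # (3.0, 1.5)
```

The result sits three quarters of the way toward the second point, because that point carries three of the four units of weight.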

## <font color='blue'>Step 3. Transform Equations to Python Code</font>

1. We substitute longitude values for **`x`**, latitude values for **`y`** (<font color='red'>important to remember these variable mappings</font>), and deaths for **`w`**
2. We create two new columns, `product_LON` (for the numerator of Equation 1) and `product_LAT` (for the numerator of Equation 2), in the `deaths_df` dataframe.
3. Let's display the new dataframe.

In [3]:
# Weight (multiply) LAT and LON by the number of deaths at each location:
#   product_LON is the numerator term of Equation 1 (x)
#   product_LAT is the numerator term of Equation 2 (y)
deaths_df['product_LAT'] = deaths_df['LAT'] * deaths_df['DEATHS']
deaths_df['product_LON'] = deaths_df['LON'] * deaths_df['DEATHS']

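# Let's copy this dataframe to a new one which we can save (pickle).
# .copy() creates an independent dataframe; plain assignment would only
#   create a second name for deaths_df.
mean_center_df = deaths_df.copy()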

# Let's display it. Type the command to display mean_center_df below.
mean_center_df.head()

Unnamed: 0,FID,DEATHS,LON,LAT,product_LAT,product_LON
0,0,3,-0.13793,51.513418,154.540254,-0.41379
1,1,2,-0.137883,51.513361,103.026722,-0.275766
2,2,1,-0.137853,51.513317,51.513317,-0.137853
3,3,1,-0.137812,51.513262,51.513262,-0.137812
4,4,4,-0.137767,51.513204,206.052816,-0.551068


Let's pickle this dataframe.

In [4]:
mean_center_df.to_pickle("outputs/mean_center_df.pickle")

Let's obtain the mean center coordinates by calculating `mean_LON` and `mean_LAT` and then combining them. We will use the <font color='red'>`np.sum()`</font> function of the `numpy` package.

In [5]:
import numpy as np

Let's finish calculating Equation 1, solving for the weighted mean `x` (longitude). We already have the values of the product inside the parentheses, stored in `product_LON`. We use the `numpy` function <font color='red'>`np.sum()`</font> to sum the numerator and the denominator.

$\begin{align} \small Equation-1 && \normalsize\bar x _{weighted} = \normalsize\frac{\Sigma (x_{1..n} w_{1..n})}{\Sigma (w_{1..n})}\end{align}$

In [6]:
# This corresponds to the x bar, weighted, in the mean center formula
# Equation 1
mean_LON = np.sum(deaths_df['product_LON'])/np.sum(deaths_df['DEATHS'])

Let's finish calculating Equation 2, using the same approach to solve for the weighted mean `y` (latitude).

$\begin{align} \small Equation-2 && \normalsize \bar y _{weighted} = \normalsize\frac{\Sigma (y_{1..n} w_{1..n})}{\Sigma (w_{1..n})}\end{align}$

In [7]:
# This corresponds to y bar, weighted, in the mean center formula
# Equation 2
mean_LAT = np.sum(deaths_df['product_LAT'])/np.sum(deaths_df['DEATHS'])

Let's put together Equation 3 and display the mean center point.

$\begin{align} \small Equation-3 && \normalsize {mean\ center} = \normalsize(\bar x _{weighted}, \bar y _{weighted}) \end{align}$

In [9]:
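# Let's put these two together as coordinates, called mean_center_POINT
# Equation 3
# Note: Folium expects (latitude, longitude), i.e. (y, x), so the tuple
#   is stored in that order rather than the (x, y) order of Equation 3
mean_center_POINT = (mean_LAT, mean_LON)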

mean_center_POINT

(51.51339831083845, -0.1364029734151329)
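
One way to cross-check the manual calculation is `np.average()` with its `weights` argument, which computes the same weighted mean in a single call. The sketch below uses only the five rows of `deaths_df` displayed earlier in this notebook, so its result will differ from the full-dataset mean center shown above; `check_df` and the other variable names are illustrative.

```python
import numpy as np
import pandas as pd

# First five rows of deaths_df, as displayed earlier in this notebook
check_df = pd.DataFrame({
    "DEATHS": [3, 2, 1, 1, 4],
    "LON": [-0.13793, -0.137883, -0.137853, -0.137812, -0.137767],
    "LAT": [51.513418, 51.513361, 51.513317, 51.513262, 51.513204],
})

# Manual form, mirroring Equations 1 and 2
manual_lon = np.sum(check_df["LON"] * check_df["DEATHS"]) / np.sum(check_df["DEATHS"])
manual_lat = np.sum(check_df["LAT"] * check_df["DEATHS"]) / np.sum(check_df["DEATHS"])

# Same quantities via np.average with weights
avg_lon = np.average(check_df["LON"], weights=check_df["DEATHS"])
avg_lat = np.average(check_df["LAT"], weights=check_df["DEATHS"])

# Both approaches should agree to within floating-point tolerance
print(np.isclose(manual_lon, avg_lon), np.isclose(manual_lat, avg_lat))
```

Applied to the full `deaths_df`, the same two `np.average()` calls should reproduce `mean_LON` and `mean_LAT`.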

## <font color='blue'>Step 4. Recreate the Notebook 1 map, `map1`</font>

We can copy-paste all the code from the last map rendering in Notebook 1.

In [10]:
import pandas as pd
import folium

# let's import the folium plugins
from folium import plugins

deaths_df = pd.read_csv('resources/cholera_deaths.csv')
pumps_df = pd.read_csv('resources/pumps.csv')

SOHO_COORDINATES = (51.513578, -0.136722)

map1 = folium.Map(location=SOHO_COORDINATES, zoom_start=17)

folium.TileLayer('cartodbpositron').add_to(map1)

locationlist = deaths_df[["LAT","LON"]].values.tolist()
radiuslist = deaths_df[["DEATHS"]].values.tolist()

# Iterate through the rows of the cholera deaths data frame 
#    and add each cholera death circle marker to map1
for i in range(0, len(locationlist)):

    # Create a popup for each marker
    # Each marker will show point information and 
    #   number of deaths in that location.    
    popup = folium.Popup('Location: '+'('+str(locationlist[i][0])+\
                         ', '+str(locationlist[i][1])+')'+\
                         '<br/>'+\
                        'Deaths: '+ str(radiuslist[i][0]))
    
    # Add each circle marker with popup representing 
    #   a location with deaths to map1
    folium.CircleMarker(
                    location=locationlist[i], \
                    radius=radiuslist[i][0], \
                    popup=popup, \
                    color='black', \
                    weight=1, \
                    fill=True, \
                    fill_color='red', \
                    fill_opacity=1).add_to(map1)

# Iterate through the rows of the pumps_df data frame 
#   and add each water pump to map1
for each in pumps_df.iterrows():
    
    # Create a popup for each marker
    # Each marker will show pump location information
    popup = folium.Popup('Location: '+'('+str(each[1]['LAT'])+', '+str(each[1]['LON'])+')')

    # Add each circle marker with popup representing 
    #   a pump location to map1
    folium.RegularPolygonMarker([each[1]['LAT'],each[1]['LON']], \
                    color='black', \
                    weight=1,\
                    fill_opacity=1, \
                    fill_color='blue', \
                    number_of_sides=4, \
                    popup=popup, \
                    radius=10).add_to(map1)

# let's use the "Fullscreen" plugin
# add the button to the top right corner
plugins.Fullscreen(
    position='topright',
    title='Expand me',
    title_cancel='Exit me',
    force_separate_button=True
).add_to(map1)

map1

## <font color='blue'>Step 5. Add Mean Center Point to Notebook 1 map</font>

Let's plug the `mean_center_POINT` value into `map1` as a Folium `CircleMarker` and find out where the mean center of the case locations, weighted by the number of deaths at each location, falls on the map.

In [11]:
folium.CircleMarker(
            location=mean_center_POINT, \
            color='black', \
            weight=2, \
            fill_opacity=1, \
            fill_color="yellowgreen", \
            popup=folium.Popup('Mean Center Point: '+ \
                str(mean_center_POINT)), \
            radius=10).add_to(map1)
map1

## Congratulations!

You have:
1. Recreated the famous John Snow Cholera map within a Jupyter notebook
2. Added Mean Center analysis to triangulate observations of pump and mortality locations on the map

## References


### Weighted Mean Center

1. https://glenbambrick.com/tag/weighted-mean-center/
2. https://docs.scipy.org/doc/numpy/index.html
3. http://data-blog.udacity.com/posts/2016/10/latex-primer/

*For case study suggestions for improvement, please contact Herman Tolentino, Jan MacGregor, James Tobias or Zhanar Haimovich.*