# Lab 8

# Visualization with Matplotlib

In this lab, you'll be working through Chapter 4 to get an introduction to the data visualization and analysis package for Python, Matplotlib. This notebook is made up of two sections.

- Section 1: Work through the code samples in Chapter 4
- Section 2: Exercises

# Section 1: Code Practice

In this section, you will be reading through the various chapter sections and **typing out**/running the code samples given in the sections. The purpose of this is for you to practice using Jupyter to run Python code as well as learn about the functionality available to you in both IPython and Jupyter.

**Do not copy/paste the code**. Type it out. Don't go zen, either. Pay attention to the meaning of what you are typing. Pay attention to the parameters and the types of arguments. Find the similarities and differences among the various object APIs. 

---

##### Executing code in Jupyter

When typing and executing code in Jupyter, it is helpful to know the various keyboard shortcuts. You can find the full list of these by clicking **Help &rarr; Keyboard Shortcuts** in the menu. However, the three most useful keyboard shortcuts are:

- `Shift-Enter`: Execute the current cell and advance to the next cell. This will create one if none exists, but if a cell exists below your current cell, a new cell will **not** be created.
- `Alt-Enter`: Execute the current cell and **create** a new cell below.
- `Control-Enter`: Execute the current cell without advancing to the next cell.

When writing your code, you will be using these commands to make sure input/output (`In`/`Out`) is consistent with what is found in the chapter. If you create a cell by mistake, you can always go to **Edit &rarr; Delete Cells** to remove it.

#### Purpose of Section 1

Your purpose in this section is to:

- **Type out** the code examples from the chapter (do not copy and paste)
- **Run** them
- **Check** to **make sure** you are getting the same results as what is contained in the chapter

---




## Multiple Subplots

[Chapter link](https://nbviewer.jupyter.org/urls/bitbucket.org/dogwynn/pythondatasciencehandbook/raw/master/notebooks/04.08-Multiple-Subplots.ipynb)

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-white')  # named 'seaborn-white' before Matplotlib 3.6
import numpy as np

### `plt.axes`: Subplots by Hand

In [None]:
ax1 = plt.axes()  # standard axes
ax2 = plt.axes([0.65, 0.65, 0.2, 0.2])


In [None]:
fig = plt.figure()
ax1 = fig.add_axes([0.1, 0.5, 0.8, 0.4],
                   xticklabels=[], ylim=(-1.2, 1.2))
ax2 = fig.add_axes([0.1, 0.1, 0.8, 0.4],
                   ylim=(-1.2, 1.2))

x = np.linspace(0, 10)
ax1.plot(np.sin(x))
ax2.plot(np.cos(x));

### `plt.subplot`: Simple Grids of Subplots

In [None]:
for i in range(1, 7):
    plt.subplot(2, 3, i)
    plt.text(0.5, 0.5, str((2, 3, i)),
             fontsize=18, ha='center')

In [None]:
fig = plt.figure()
fig.subplots_adjust(hspace=0.4, wspace=0.4)
for i in range(1, 7):
    ax = fig.add_subplot(2, 3, i)
    ax.text(0.5, 0.5, str((2, 3, i)),
           fontsize=18, ha='center')

### `plt.subplots`: The Whole Grid in One Go

In [None]:
fig, ax = plt.subplots(2, 3, sharex='col', sharey='row')

In [None]:
# axes are in a two-dimensional array, indexed by [row, col]
for i in range(2):
    for j in range(3):
        ax[i, j].text(0.5, 0.5, str((i, j)),
                      fontsize=18, ha='center')
fig

### `plt.GridSpec`: More Complicated Arrangements

In [None]:
grid = plt.GridSpec(2, 3, wspace=0.4, hspace=0.3)

In [None]:
# Create some normally distributed data
mean = [0, 0]
cov = [[1, 1], [1, 2]]
x, y = np.random.multivariate_normal(mean, cov, 3000).T

# Set up the axes with gridspec
fig = plt.figure(figsize=(6, 6))
grid = plt.GridSpec(4, 4, hspace=0.2, wspace=0.2)
main_ax = fig.add_subplot(grid[:-1, 1:])
y_hist = fig.add_subplot(grid[:-1, 0], xticklabels=[], sharey=main_ax)
x_hist = fig.add_subplot(grid[-1, 1:], yticklabels=[], sharex=main_ax)

# scatter points on the main axes
main_ax.plot(x, y, 'ok', markersize=3, alpha=0.2)

# histogram on the attached axes
x_hist.hist(x, 40, histtype='stepfilled',
            orientation='vertical', color='gray')
x_hist.invert_yaxis()

y_hist.hist(y, 40, histtype='stepfilled',
            orientation='horizontal', color='gray')
y_hist.invert_xaxis()

---

## Text and Annotation

[Chapter link](https://nbviewer.jupyter.org/urls/bitbucket.org/dogwynn/pythondatasciencehandbook/raw/master/notebooks/04.09-Text-and-Annotation.ipynb)

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib as mpl
plt.style.use('seaborn-v0_8-whitegrid')  # named 'seaborn-whitegrid' before Matplotlib 3.6
import numpy as np
import pandas as pd

### Example: Effect of Holidays on US Births

Load births data using the following:

```python
births = pd.read_csv('https://belhavencs.nyc3.digitaloceanspaces.com/csc311/births.csv')
```

In [None]:
births = pd.read_csv('https://belhavencs.nyc3.digitaloceanspaces.com/csc311/births.csv')

quartiles = np.percentile(births['births'], [25, 50, 75])
mu, sig = quartiles[1], 0.74 * (quartiles[2] - quartiles[0])
births = births.query('(births > @mu - 5 * @sig) & (births < @mu + 5 * @sig)')

births['day'] = births['day'].astype(int)

births.index = pd.to_datetime(10000 * births.year +
                              100 * births.month +
                              births.day, format='%Y%m%d')
births_by_date = births.pivot_table('births',
                                    [births.index.month, births.index.day])
births_by_date.index = [pd.Timestamp(2012, month, day)  # pd.datetime was removed from pandas
                        for (month, day) in births_by_date.index]
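
The cell above removes outliers with a robust estimate of the standard deviation, `0.74 * (Q3 - Q1)`. As a rough check on where the factor 0.74 comes from, the sketch below uses only the standard library and assumes the data are approximately Gaussian. The variable names are illustrative and are not part of the chapter.

```python
from statistics import NormalDist

# For a standard normal distribution the 75th percentile is about 0.6745,
# so the interquartile range (Q3 - Q1) spans about 1.349 standard deviations.
q75 = NormalDist().inv_cdf(0.75)
iqr_in_sigmas = 2 * q75

# Dividing by that width turns an interquartile range back into a sigma estimate.
factor = 1 / iqr_in_sigmas
print(round(factor, 2))
```

Because quartiles are barely affected by a handful of extreme values, this estimate is less sensitive to outliers than the ordinary standard deviation.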

In [None]:
fig, ax = plt.subplots(figsize=(12, 4))
births_by_date.plot(ax=ax);


In [None]:
fig, ax = plt.subplots(figsize=(12, 4))
births_by_date.plot(ax=ax)

# Add labels to the plot
style = dict(size=10, color='gray')

ax.text('2012-1-1', 3950, "New Year's Day", **style)
ax.text('2012-7-4', 4250, "Independence Day", ha='center', **style)
ax.text('2012-9-4', 4850, "Labor Day", ha='center', **style)
ax.text('2012-10-31', 4600, "Halloween", ha='right', **style)
ax.text('2012-11-25', 4450, "Thanksgiving", ha='center', **style)
ax.text('2012-12-25', 3850, "Christmas ", ha='right', **style)

# Label the axes
ax.set(title='USA births by day of year (1969-1988)',
       ylabel='average daily births')

# Format the x axis with centered month labels
ax.xaxis.set_major_locator(mpl.dates.MonthLocator())
ax.xaxis.set_minor_locator(mpl.dates.MonthLocator(bymonthday=15))
ax.xaxis.set_major_formatter(plt.NullFormatter())
ax.xaxis.set_minor_formatter(mpl.dates.DateFormatter('%h'));


### Transforms and Text Position

In [None]:
fig, ax = plt.subplots(facecolor='lightgray')
ax.axis([0, 10, 0, 10])

# transform=ax.transData is the default, but we'll specify it anyway
ax.text(1, 5, ". Data: (1, 5)", transform=ax.transData)
ax.text(0.5, 0.1, ". Axes: (0.5, 0.1)", transform=ax.transAxes)
ax.text(0.2, 0.2, ". Figure: (0.2, 0.2)", transform=fig.transFigure);

In [None]:
ax.set_xlim(0, 2)
ax.set_ylim(-6, 6)
fig
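
To see why only the data-coordinate label moves when the limits change, it can help to ask each transform where a point lands in display (pixel) coordinates. This is a minimal sketch, assuming a fresh figure with the same initial limits as above. The variable names are illustrative.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.axis([0, 10, 0, 10])

# With limits 0..10, the data point (5, 5) and the axes-relative point
# (0.5, 0.5) both sit at the centre of the axes, so they map to the same pixel.
data_px = ax.transData.transform((5, 5))
axes_px = ax.transAxes.transform((0.5, 0.5))

# Changing the limits moves the data point on screen,
# while the axes-relative point stays where it was.
ax.set_xlim(0, 2)
ax.set_ylim(-6, 6)
data_px_after = ax.transData.transform((5, 5))
axes_px_after = ax.transAxes.transform((0.5, 0.5))

print(data_px, axes_px)
print(data_px_after, axes_px_after)
```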


### Arrows and Annotation

In [None]:
%matplotlib inline

fig, ax = plt.subplots()

x = np.linspace(0, 20, 1000)
ax.plot(x, np.cos(x))
ax.axis('equal')

ax.annotate('local maximum', xy=(6.28, 1), xytext=(10, 4),
            arrowprops=dict(facecolor='black', shrink=0.05))

ax.annotate('local minimum', xy=(5 * np.pi, -1), xytext=(2, -6),
            arrowprops=dict(arrowstyle="->",
                            connectionstyle="angle3,angleA=0,angleB=-

In [None]:
fig, ax = plt.subplots(figsize=(12, 4))
births_by_date.plot(ax=ax)

# Add labels to the plot
ax.annotate("New Year's Day", xy=('2012-1-1', 4100),  xycoords='data',
            xytext=(50, -30), textcoords='offset points',
            arrowprops=dict(arrowstyle="->",
                            connectionstyle="arc3,rad=-0.2"))

ax.annotate("Independence Day", xy=('2012-7-4', 4250),  xycoords='data',
            bbox=dict(boxstyle="round", fc="none", ec="gray"),
            xytext=(10, -40), textcoords='offset points', ha='center',
            arrowprops=dict(arrowstyle="->"))

ax.annotate('Labor Day', xy=('2012-9-4', 4850), xycoords='data', ha='center',
            xytext=(0, -20), textcoords='offset points')
ax.annotate('', xy=('2012-9-1', 4850), xytext=('2012-9-7', 4850),
            xycoords='data', textcoords='data',
            arrowprops={'arrowstyle': '|-|,widthA=0.2,widthB=0.2', })

ax.annotate('Halloween', xy=('2012-10-31', 4600),  xycoords='data',
            xytext=(-80, -40), textcoords='offset points',
            arrowprops=dict(arrowstyle="fancy",
                            fc="0.6", ec="none",
                            connectionstyle="angle3,angleA=0,angleB=-90"))

ax.annotate('Thanksgiving', xy=('2012-11-25', 4500),  xycoords='data',
            xytext=(-120, -60), textcoords='offset points',
            bbox=dict(boxstyle="round4,pad=.5", fc="0.9"),
            arrowprops=dict(arrowstyle="->",
                            connectionstyle="angle,angleA=0,angleB=80,rad=20"))


ax.annotate('Christmas', xy=('2012-12-25', 3850),  xycoords='data',
             xytext=(-30, 0), textcoords='offset points',
             size=13, ha='right', va="center",
             bbox=dict(boxstyle="round", alpha=0.1),
             arrowprops=dict(arrowstyle="wedge,tail_width=0.5", alpha=0.1));

# Label the axes
ax.set(title='USA births by day of year (1969-1988)',
       ylabel='average daily births')

# Format the x axis with centered month labels
ax.xaxis.set_major_locator(mpl.dates.MonthLocator())
ax.xaxis.set_minor_locator(mpl.dates.MonthLocator(bymonthday=15))
ax.xaxis.set_major_formatter(plt.NullFormatter())
ax.xaxis.set_minor_formatter(mpl.dates.DateFormatter('%h'));

ax.set_ylim(3600, 5400);

---

## Customizing Ticks

[Chapter link](https://nbviewer.jupyter.org/urls/bitbucket.org/dogwynn/pythondatasciencehandbook/raw/master/notebooks/04.10-Customizing-Ticks.ipynb)

### Major and Minor Ticks

In [None]:
import matplotlib.pyplot as plt
plt.style.use('classic')
%matplotlib inline
import numpy as np

In [None]:
ax = plt.axes(xscale='log', yscale='log')
ax.grid();

In [None]:
print(ax.xaxis.get_major_locator())
print(ax.xaxis.get_minor_locator())

### Hiding Ticks or Labels

In [None]:
ax = plt.axes()
ax.plot(np.random.rand(50))

ax.yaxis.set_major_locator(plt.NullLocator())
ax.xaxis.set_major_formatter(plt.NullFormatter())

In [None]:
fig, ax = plt.subplots(5, 5, figsize=(5, 5))
fig.subplots_adjust(hspace=0, wspace=0)

# Get some face data from scikit-learn
from sklearn.datasets import fetch_olivetti_faces
faces = fetch_olivetti_faces().images

for i in range(5):
    for j in range(5):
        ax[i, j].xaxis.set_major_locator(plt.NullLocator())
        ax[i, j].yaxis.set_major_locator(plt.NullLocator())
        ax[i, j].imshow(faces[10 * i + j], cmap="bone")


### Reducing or Increasing the Number of Ticks

In [None]:
fig, ax = plt.subplots(4, 4, sharex=True, sharey=True)

In [None]:
# For every axis, set the x and y major locator
for axi in ax.flat:
    axi.xaxis.set_major_locator(plt.MaxNLocator(3))
    axi.yaxis.set_major_locator(plt.MaxNLocator(3))
fig

### Fancy Tick Formats

In [None]:
# Plot a sine and cosine curve
fig, ax = plt.subplots()
x = np.linspace(0, 3 * np.pi, 1000)
ax.plot(x, np.sin(x), lw=3, label='Sine')
ax.plot(x, np.cos(x), lw=3, label='Cosine')

# Set up grid, legend, and limits
ax.grid(True)
ax.legend(frameon=False)
ax.axis('equal')
ax.set_xlim(0, 3 * np.pi);

In [None]:
ax.xaxis.set_major_locator(plt.MultipleLocator(np.pi / 2))
ax.xaxis.set_minor_locator(plt.MultipleLocator(np.pi / 4))
fig


In [None]:
def format_func(value, tick_number):
    # find number of multiples of pi/2
    N = int(np.round(2 * value / np.pi))
    if N == 0:
        return "0"
    elif N == 1:
        return r"$\pi/2$"
    elif N == 2:
        return r"$\pi$"
    elif N % 2 > 0:
        return r"${0}\pi/2$".format(N)
    else:
        return r"${0}\pi$".format(N // 2)

ax.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
fig
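
One way to check a tick formatter before attaching it to an axis is to call it directly on a few tick positions. The sketch below repeats the same logic with the standard `math` module so that it runs without a figure. The `labels` list is only for illustration.

```python
import math

def format_func(value, tick_number):
    # Number of multiples of pi/2 closest to this tick position
    n = int(round(2 * value / math.pi))
    if n == 0:
        return "0"
    elif n == 1:
        return r"$\pi/2$"
    elif n == 2:
        return r"$\pi$"
    elif n % 2 > 0:
        return r"${0}\pi/2$".format(n)
    else:
        return r"${0}\pi$".format(n // 2)

# Tick positions at 0, pi/2, pi, 3*pi/2 and 2*pi
labels = [format_func(k * math.pi / 2, None) for k in range(5)]
print(labels)
```

The second argument (`tick_number`) is unused here, but a `FuncFormatter` always passes both the tick value and its position.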

---

## Customizing Matplotlib: Configurations and Stylesheets

[Chapter link](https://nbviewer.jupyter.org/urls/bitbucket.org/dogwynn/pythondatasciencehandbook/raw/master/notebooks/04.11-Settings-and-Stylesheets.ipynb)

### Plot Customization by Hand

In [None]:
import matplotlib.pyplot as plt
plt.style.use('classic')
import numpy as np


In [None]:
x = np.random.randn(1000)
plt.hist(x);

In [None]:
# use a gray background
ax = plt.axes(facecolor='#E6E6E6')  # 'axisbg' was replaced by 'facecolor' in Matplotlib 2.x
ax.set_axisbelow(True)

# draw solid white grid lines
plt.grid(color='w', linestyle='solid')

# hide axis spines
for spine in ax.spines.values():
    spine.set_visible(False)
    
# hide top and right ticks
ax.xaxis.tick_bottom()
ax.yaxis.tick_left()

# lighten ticks and labels
ax.tick_params(colors='gray', direction='out')
for tick in ax.get_xticklabels():
    tick.set_color('gray')
for tick in ax.get_yticklabels():
    tick.set_color('gray')
    
# control face and edge color of histogram
ax.hist(x, edgecolor='#E6E6E6', color='#EE6666');

### Changing the Defaults: `rcParams`

In [None]:
IPython_default = plt.rcParams.copy()

In [None]:
from matplotlib import cycler
colors = cycler('color',
                ['#EE6666', '#3388BB', '#9988DD',
                 '#EECC55', '#88BB44', '#FFBBBB'])
plt.rc('axes', facecolor='#E6E6E6', edgecolor='none',
       axisbelow=True, grid=True, prop_cycle=colors)
plt.rc('grid', color='w', linestyle='solid')
plt.rc('xtick', direction='out', color='gray')
plt.rc('ytick', direction='out', color='gray')
plt.rc('patch', edgecolor='#E6E6E6')
plt.rc('lines', linewidth=2)

In [None]:
plt.hist(x);


In [None]:
for i in range(4):
    plt.plot(np.random.rand(10))

### Stylesheets

In [None]:
plt.style.available[:5]

In [None]:
def hist_and_lines():
    np.random.seed(0)
    fig, ax = plt.subplots(1, 2, figsize=(11, 4))
    ax[0].hist(np.random.randn(1000))
    for i in range(3):
        ax[1].plot(np.random.rand(10))
    ax[1].legend(['a', 'b', 'c'], loc='lower left')

#### Default style

In [None]:
# reset rcParams
plt.rcParams.update(IPython_default);

In [None]:
hist_and_lines()
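
Saving and restoring a full copy of `rcParams` works, but it is easy to forget the restore step. As an alternative that the chapter cells above do not use, `plt.rc_context` applies settings only inside a `with` block. The following is a minimal sketch, and the chosen setting is arbitrary.

```python
import matplotlib.pyplot as plt

before = plt.rcParams['lines.linewidth']

# Settings passed to rc_context apply only inside the with-block
with plt.rc_context({'lines.linewidth': 4}):
    inside = plt.rcParams['lines.linewidth']

# On exit, the previous value is restored automatically
after = plt.rcParams['lines.linewidth']
print(before, inside, after)
```

This mirrors how `plt.style.context` is used in the stylesheet cells that follow, where a style is applied temporarily.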

#### FiveThirtyEight style

In [None]:
with plt.style.context('fivethirtyeight'):
    hist_and_lines()


#### ggplot

In [None]:
with plt.style.context('ggplot'):
    hist_and_lines()

#### Bayesian Methods for Hackers style

In [None]:
with plt.style.context('bmh'):
    hist_and_lines()


#### Dark background

In [None]:
with plt.style.context('dark_background'):
    hist_and_lines()


#### Grayscale

In [None]:
with plt.style.context('grayscale'):
    hist_and_lines()


#### Seaborn style

In [None]:
import seaborn
seaborn.set()  # since seaborn 0.8, importing alone no longer applies the style
hist_and_lines()


---

## Three-Dimensional Plotting in Matplotlib

[Chapter link](https://nbviewer.jupyter.org/urls/bitbucket.org/dogwynn/pythondatasciencehandbook/raw/master/notebooks/04.12-Three-Dimensional-Plotting.ipynb)

In [None]:
from mpl_toolkits import mplot3d

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

### Three-dimensional Points and Lines

In [None]:
ax = plt.axes(projection='3d')

# Data for a three-dimensional line
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, 'gray')

# Data for three-dimensional scattered points
zdata = 15 * np.random.random(100)
xdata = np.sin(zdata) + 0.1 * np.random.randn(100)
ydata = np.cos(zdata) + 0.1 * np.random.randn(100)
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens');

### Three-dimensional Contour Plots

In [None]:
def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))

x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)

X, Y = np.meshgrid(x, y)
Z = f(X, Y)

In [None]:
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.contour3D(X, Y, Z, 50, cmap='binary')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z');

In [None]:
ax.view_init(60, 35)
fig

### Wireframes and Surface Plots

In [None]:
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.plot_wireframe(X, Y, Z, color='black')
ax.set_title('wireframe');

In [None]:
ax = plt.axes(projection='3d')
ax.plot_surface(X, Y, Z, rstride=1, cstride=1,
                cmap='viridis', edgecolor='none')
ax.set_title('surface');


In [None]:
r = np.linspace(0, 6, 20)
theta = np.linspace(-0.9 * np.pi, 0.8 * np.pi, 40)
r, theta = np.meshgrid(r, theta)

X = r * np.sin(theta)
Y = r * np.cos(theta)
Z = f(X, Y)

ax = plt.axes(projection='3d')
ax.plot_surface(X, Y, Z, rstride=1, cstride=1,
                cmap='viridis', edgecolor='none');

### Surface Triangulations

In [None]:
theta = 2 * np.pi * np.random.random(1000)
r = 6 * np.random.random(1000)
x = np.ravel(r * np.sin(theta))
y = np.ravel(r * np.cos(theta))
z = f(x, y)

In [None]:
ax = plt.axes(projection='3d')
ax.scatter(x, y, z, c=z, cmap='viridis', linewidth=0.5);

In [None]:
ax = plt.axes(projection='3d')
ax.plot_trisurf(x, y, z,
                cmap='viridis', edgecolor='none');

#### Example: Visualizing a Möbius strip

In [None]:
theta = np.linspace(0, 2 * np.pi, 30)
w = np.linspace(-0.25, 0.25, 8)
w, theta = np.meshgrid(w, theta)

In [None]:
phi = 0.5 * theta

In [None]:
# radius in x-y plane
r = 1 + w * np.cos(phi)

x = np.ravel(r * np.cos(theta))
y = np.ravel(r * np.sin(theta))
z = np.ravel(w * np.sin(phi))

In [None]:
# triangulate in the underlying parametrization
from matplotlib.tri import Triangulation
tri = Triangulation(np.ravel(w), np.ravel(theta))

ax = plt.axes(projection='3d')
ax.plot_trisurf(x, y, z, triangles=tri.triangles,
                cmap='viridis', linewidths=0.2);

ax.set_xlim(-1, 1); ax.set_ylim(-1, 1); ax.set_zlim(-1, 1);

---

## Geographic Data with Basemap

[Chapter link](https://nbviewer.jupyter.org/urls/bitbucket.org/dogwynn/pythondatasciencehandbook/raw/master/notebooks/04.13-Geographic-Data-With-Basemap.ipynb)

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

In [None]:
plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=-100)
m.bluemarble(scale=0.5);

In [None]:
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None,
            width=8E6, height=8E6, 
            lat_0=45, lon_0=-100,)
m.etopo(scale=0.5, alpha=0.5)

# Map (long, lat) to (x, y) for plotting
x, y = m(-122.3, 47.6)
plt.plot(x, y, 'ok', markersize=5)
plt.text(x, y, ' Seattle', fontsize=12);

### Map Projections

In [None]:
from itertools import chain

def draw_map(m, scale=0.2):
    # draw a shaded-relief image
    m.shadedrelief(scale=scale)
    
    # lats and longs are returned as a dictionary
    lats = m.drawparallels(np.linspace(-90, 90, 13))
    lons = m.drawmeridians(np.linspace(-180, 180, 13))

    # values contain the plt.Line2D instances
    lat_lines = chain(*(tup[1][0] for tup in lats.items()))
    lon_lines = chain(*(tup[1][0] for tup in lons.items()))
    all_lines = chain(lat_lines, lon_lines)
    
    # cycle through these lines and set the desired style
    for line in all_lines:
        line.set(linestyle='-', alpha=0.3, color='w')
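
The `chain(*(...))` expression in `draw_map` flattens the per-coordinate lists of line objects into one sequence. The sketch below shows the same pattern on a hypothetical stand-in dictionary with the shape `{coordinate: (list_of_lines, list_of_labels)}`. The strings are placeholders, not real Matplotlib objects.

```python
from itertools import chain

# Hypothetical stand-in for the dictionary returned by drawparallels:
# each value is a tuple (list of line objects, list of label objects).
lats = {
    -30.0: (["line_a"], ["label_a"]),
    0.0: (["line_b", "line_c"], []),
    30.0: (["line_d"], ["label_d"]),
}

# tup is a (key, value) pair; tup[1][0] is the list of lines for that key
lat_lines = list(chain(*(tup[1][0] for tup in lats.items())))
print(lat_lines)

# An equivalent spelling that avoids unpacking with *
same_lines = list(chain.from_iterable(value[0] for value in lats.values()))
```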

#### Cylindrical projections

In [None]:
fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='cyl', resolution=None,
            llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180, )
draw_map(m)

#### Pseudo-cylindrical projections

In [None]:
fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='moll', resolution=None,
            lat_0=0, lon_0=0)
draw_map(m)

#### Perspective projections

In [None]:
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None,
            lat_0=50, lon_0=0)
draw_map(m);

#### Conic projections

In [None]:
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None,
            lon_0=0, lat_0=50, lat_1=45, lat_2=55,
            width=1.6E7, height=1.2E7)
draw_map(m)

### Drawing a Map Background

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(12, 8))

for i, res in enumerate(['l', 'h']):
    m = Basemap(projection='gnom', lat_0=57.3, lon_0=-6.2,
                width=90000, height=120000, resolution=res, ax=ax[i])
    m.fillcontinents(color="#FFDDCC", lake_color='#DDEEFF')
    m.drawmapboundary(fill_color="#DDEEFF")
    m.drawcoastlines()
    ax[i].set_title("resolution='{0}'".format(res));

### Plotting Data on Maps

### Example: California Cities

Load data using the following:

```python
cities = pd.read_csv("https://belhavencs.nyc3.digitaloceanspaces.com/csc311/california_cities.csv")
```

In [None]:
import pandas as pd
cities = pd.read_csv("https://belhavencs.nyc3.digitaloceanspaces.com/csc311/california_cities.csv")

# Extract the data we're interested in
lat = cities['latd'].values
lon = cities['longd'].values
population = cities['population_total'].values
area = cities['area_total_km2'].values

In [None]:
# 1. Draw the map background
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution='h', 
            lat_0=37.5, lon_0=-119,
            width=1E6, height=1.2E6)
m.shadedrelief()
m.drawcoastlines(color='gray')
m.drawcountries(color='gray')
m.drawstates(color='gray')

# 2. scatter city data, with color reflecting population
# and size reflecting area
m.scatter(lon, lat, latlon=True,
          c=np.log10(population), s=area,
          cmap='Reds', alpha=0.5)

# 3. create colorbar and legend
plt.colorbar(label=r'$\log_{10}({\rm population})$')
plt.clim(3, 7)

# make legend with dummy points
for a in [100, 300, 500]:
    plt.scatter([], [], c='k', alpha=0.5, s=a,
                label=str(a) + ' km$^2$')
plt.legend(scatterpoints=1, frameon=False,
           labelspacing=1, loc='lower left');

### Example: Surface Temperature Data

To save a local copy of temperature data:

```python
import gzip
import urllib.request
from pathlib import Path
path = Path('gistemp250.nc')
if not path.exists():
    response = urllib.request.urlopen('https://belhavencs.nyc3.digitaloceanspaces.com/csc311/gistemp250.nc.gz')
    path.write_bytes(gzip.decompress(response.read()))
```

In [None]:
# !curl -O http://data.giss.nasa.gov/pub/gistemp/gistemp250.nc.gz
# !gunzip gistemp250.nc.gz

In [None]:
from netCDF4 import Dataset
data = Dataset('gistemp250.nc')

In [None]:
from netCDF4 import date2index
from datetime import datetime
timeindex = date2index(datetime(2014, 1, 15),
                       data.variables['time'])

In [None]:
lat = data.variables['lat'][:]
lon = data.variables['lon'][:]
lon, lat = np.meshgrid(lon, lat)
temp_anomaly = data.variables['tempanomaly'][timeindex]

---

## Visualization with Seaborn

[Chapter link](https://nbviewer.jupyter.org/urls/bitbucket.org/dogwynn/pythondatasciencehandbook/raw/master/notebooks/04.14-Visualization-With-Seaborn.ipynb)

### Seaborn Versus Matplotlib

In [None]:
import matplotlib.pyplot as plt
plt.style.use('classic')
%matplotlib inline
import numpy as np
import pandas as pd

In [None]:
# Create some data
rng = np.random.RandomState(0)
x = np.linspace(0, 10, 500)
y = np.cumsum(rng.randn(500, 6), 0)

In [None]:
# Plot the data with Matplotlib defaults
plt.plot(x, y)
plt.legend('ABCDEF', ncol=2, loc='upper left');

In [None]:
import seaborn as sns
sns.set()

In [None]:
# same plotting code as above!
plt.plot(x, y)
plt.legend('ABCDEF', ncol=2, loc='upper left');

### Exploring Seaborn Plots

#### Histograms, KDE, and densities

In [None]:
data = np.random.multivariate_normal([0, 0], [[5, 2], [2, 2]], size=2000)
data = pd.DataFrame(data, columns=['x', 'y'])

for col in 'xy':
    plt.hist(data[col], density=True, alpha=0.5)  # 'normed' was removed from Matplotlib

In [None]:
for col in 'xy':
    sns.kdeplot(data[col], fill=True)  # 'shade' is deprecated in favor of 'fill'

In [None]:
sns.distplot(data['x'])
sns.distplot(data['y']);


In [None]:
sns.kdeplot(data=data, x='x', y='y');  # name both columns to get the two-dimensional KDE


In [None]:
with sns.axes_style('white'):
    sns.jointplot(x="x", y="y", data=data, kind='kde');

In [None]:
with sns.axes_style('white'):
    sns.jointplot(x="x", y="y", data=data, kind='hex')

#### Pair plots

In [None]:
iris = sns.load_dataset("iris")
iris.head()

In [None]:
sns.pairplot(iris, hue='species', height=2.5);  # 'size' was renamed to 'height'

#### Faceted histograms

In [None]:
tips = sns.load_dataset('tips')
tips.head()

In [None]:
tips['tip_pct'] = 100 * tips['tip'] / tips['total_bill']

grid = sns.FacetGrid(tips, row="sex", col="time", margin_titles=True)
grid.map(plt.hist, "tip_pct", bins=np.linspace(0, 40, 15));

#### Factor plots

In [None]:
with sns.axes_style(style='ticks'):
    # factorplot was renamed to catplot; x, y and hue must be passed by keyword
    g = sns.catplot(x="day", y="total_bill", hue="sex", data=tips, kind="box")
    g.set_axis_labels("Day", "Total Bill");

#### Joint distributions

In [None]:
with sns.axes_style('white'):
    sns.jointplot(x="total_bill", y="tip", data=tips, kind='hex')

In [None]:
sns.jointplot(x="total_bill", y="tip", data=tips, kind='reg');

#### Bar plots

In [None]:
# Load the planets data used by the bar plots below
planets = sns.load_dataset('planets')

with sns.axes_style('white'):
    g = sns.catplot(x="year", data=planets, aspect=2,
                    kind="count", color='steelblue')
    g.set_xticklabels(step=5)

In [None]:
with sns.axes_style('white'):
    g = sns.catplot(x="year", data=planets, aspect=4.0, kind='count',
                    hue='method', order=range(2001, 2015))
    g.set_ylabels('Number of Planets Discovered')

### Example: Exploring Marathon Finishing Times

To load the data:

```python
data = pd.read_csv('https://belhavencs.nyc3.digitaloceanspaces.com/csc311/marathon-data.csv')
```

In [None]:
# !curl -O https://raw.githubusercontent.com/jakevdp/marathon-data/ma

In [None]:
data = pd.read_csv('https://belhavencs.nyc3.digitaloceanspaces.com/csc311/marathon-data.csv')
data.head()

In [None]:
data.dtypes

In [None]:
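from datetime import timedelta

def convert_time(s):
    # Parse an 'H:MM:SS' string into a timedelta (pd.datetools was removed from pandas)
    h, m, s = map(int, s.split(':'))
    return timedelta(hours=h, minutes=m, seconds=s)

data = pd.read_csv('https://belhavencs.nyc3.digitaloceanspaces.com/csc311/marathon-data.csv',
                   converters={'split':convert_time, 'final':convert_time})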
data.head()

In [None]:
data.dtypes

In [None]:
# Convert the timedelta columns to seconds
data['split_sec'] = data['split'].dt.total_seconds()
data['final_sec'] = data['final'].dt.total_seconds()
data.head()
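
The conversion from a time string to seconds can be checked on a single value with the standard library alone. This is a small sketch, and the input string below is an illustrative value rather than a claim about any particular row of the data.

```python
from datetime import timedelta

def convert_time(s):
    # Split 'H:MM:SS' into integer hours, minutes and seconds
    h, m, sec = map(int, s.split(":"))
    return timedelta(hours=h, minutes=m, seconds=sec)

split = convert_time("01:05:38")

# total_seconds() gives the whole duration as a float number of seconds
split_sec = split.total_seconds()
print(split, split_sec)
```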

In [None]:
data['split_frac'] = 1 - 2 * data['split_sec'] / data['final_sec']
data.head()
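
To make the sign of `split_frac` concrete, the sketch below applies the same formula to three hypothetical runners who all finish in 10800 seconds. The helper name and the numbers are made up for illustration.

```python
def split_fraction(split_sec, final_sec):
    # Same formula as the split_frac column: 1 - 2 * (first-half time) / (total time)
    return 1 - 2 * split_sec / final_sec

even = split_fraction(5400, 10800)      # both halves take the same time
positive = split_fraction(5000, 10800)  # first half faster, second half slower
negative = split_fraction(5500, 10800)  # first half slower, second half faster

print(even, positive, negative)
```

A value below zero therefore marks a runner who ran the second half faster than the first, which is what the dashed line at zero in the next plot separates.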

In [None]:
sns.distplot(data['split_frac'], kde=False);
plt.axvline(0, color="k", linestyle="--");

In [None]:
sum(data.split_frac < 0)

In [None]:
g = sns.PairGrid(data, vars=['age', 'split_sec', 'final_sec', 'split_frac'],
                 hue='gender', palette='RdBu_r')
g.map(plt.scatter, alpha=0.8)
g.add_legend();

In [None]:
sns.kdeplot(data.split_frac[data.gender=='M'], label='men', fill=True)
sns.kdeplot(data.split_frac[data.gender=='W'], label='women', fill=True)
plt.xlabel('split_frac');


In [None]:
sns.violinplot(x="gender", y="split_frac", data=data,
               palette=["lightblue", "lightpink"]);


In [None]:
data['age_dec'] = data.age.map(lambda age: 10 * (age // 10))
data.head()

In [None]:
men = (data.gender == 'M')
women = (data.gender == 'W')

with sns.axes_style(style=None):
    sns.violinplot(x="age_dec", y="split_frac", hue="gender", data=data,
                   split=True, inner="quartile",
                   palette=["lightblue", "lightpink"]);

In [None]:
(data.age > 80).sum()

In [None]:
g = sns.lmplot(x='final_sec', y='split_frac', col='gender', data=data,
               markers=".", scatter_kws=dict(color='c'))
g.map(plt.axhline, y=0.1, color="k", ls=":");

---

# Section 2: Exercises

In this section, you will be provided a few exercises to demonstrate your understanding of the chapter contents. Each exercise will have a Markdown section describing the problem, and you will provide cells below the description with code, comments and visual demonstrations of your solution.

---

### Problem 1



Use the 2014 Seattle precipitation data set:

https://belhavencs.nyc3.digitaloceanspaces.com/csc311/Seattle2014.csv

Provide line plots and histograms (4, 8, 52 bins) of the following columns over the time period provided in the data:
- `'PRCP'`
- `'SNWD'`
- `'SNOW'`

You should have two figures, one for line plots and one for histograms. 
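
To get a feel for how the bin count changes a histogram, the sketch below counts values in equal-width bins by hand. It only approximates what `plt.hist` does with an integer `bins` argument, and the helper name and sample values are made up for illustration.

```python
def equal_width_counts(values, n_bins):
    # Split the range [min, max] into n_bins equal-width bins and count values.
    # Assumes the values are not all identical (otherwise the width would be zero).
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

sample = [0, 0, 0, 1, 2, 5, 8, 13, 20, 40]
counts_4 = equal_width_counts(sample, 4)
counts_8 = equal_width_counts(sample, 8)
print(counts_4)
print(counts_8)
```

With skewed data such as daily precipitation, few bins lump most days into the first bar, while many bins reveal more of the tail at the cost of sparsely filled bars.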

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Load the data
url = 'https://belhavencs.nyc3.digitaloceanspaces.com/csc311/Seattle2014.csv'
data = pd.read_csv(url)

# Convert 'DATE' column to datetime
# (assuming dates are stored as YYYYMMDD integers, e.g. 20140101)
data['DATE'] = pd.to_datetime(data['DATE'].astype(str), format='%Y%m%d')

# Set the date as the index
data.set_index('DATE', inplace=True)

# Create line plots
plt.figure(figsize=(12, 6))
plt.subplot(3, 1, 1)
plt.plot(data.index, data['PRCP'], label='Precipitation', color='blue')
plt.title('Daily Precipitation in Seattle (2014)')
plt.ylabel('Precipitation (mm)')
plt.legend()

plt.subplot(3, 1, 2)
plt.plot(data.index, data['SNWD'], label='Snow Depth', color='cyan')
plt.title('Daily Snow Depth in Seattle (2014)')
plt.ylabel('Snow Depth (mm)')
plt.legend()

plt.subplot(3, 1, 3)
plt.plot(data.index, data['SNOW'], label='Snowfall', color='lightblue')
plt.title('Daily Snowfall in Seattle (2014)')
plt.ylabel('Snowfall (mm)')
plt.legend()

plt.tight_layout()
plt.show()

# Create histograms of each column with 4, 8, and 52 bins
columns = {'PRCP': 'Precipitation', 'SNWD': 'Snow Depth', 'SNOW': 'Snowfall'}
bin_counts = [4, 8, 52]
fig, axs = plt.subplots(len(columns), len(bin_counts), figsize=(15, 12))

# One row per column, one histogram per bin count
for row, (col, name) in enumerate(columns.items()):
    for j, bins in enumerate(bin_counts):
        ax = axs[row, j]
        ax.hist(data[col], bins=bins, color='blue', alpha=0.7)
        ax.set_title(f'Histogram of {name} ({bins} bins)')
        ax.set_xlabel(f'{name} (mm)')
        ax.set_ylabel('Frequency')

plt.tight_layout()
plt.show()


---

### Problem 2

- Load in the following datasets:
    - https://belhavencs.nyc3.digitaloceanspaces.com/csc311/FremontBridge.csv
        - The `index_col` should be `'Date'`
        - `parse_dates` should be `True`
        - Resample to daily counts
        - Variable name: `daily`
        - Reset columns to `['west', 'east']`
    - https://belhavencs.nyc3.digitaloceanspaces.com/csc311/BicycleWeather.csv
        - The `index_col` should be `'DATE'`
        - `parse_dates` should be `True`
        - Variable name: `weather`
- Create a column in `daily`
    - `'total'`: equal to the counts for `daily.east + daily.west`
- Create four columns in `weather`
    - `temp_min`: minimum temperature each day in Fahrenheit
    - `temp_max`: maximum temperature each day in Fahrenheit
    - `temp`: average of `temp_min` and `temp_max`
    - `precip`: `'PRCP'` column (in 1/10mm) converted to inches
- Produce a figure with three axes (x-axis aligned)
    - Axis 1: Total rider count per day
    - Axis 2: Total precipitation per day
    - Axis 3: Average temperature per day
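
The unit conversions in this problem can be sketched as small helper functions. The precipitation scale (1/10 mm) comes from the problem statement. That the temperature columns use tenths of a degree Celsius is an assumption here and should be checked against the data. The function names are illustrative.

```python
def tenths_mm_to_inches(value):
    # value is in 1/10 mm; one inch is 25.4 mm, i.e. 254 tenths of a millimetre
    return value / 254

def tenths_c_to_fahrenheit(value):
    # Assumption: value is in tenths of a degree Celsius
    celsius = value / 10
    return celsius * 9 / 5 + 32

print(tenths_mm_to_inches(254))     # 25.4 mm of rain expressed in inches
print(tenths_c_to_fahrenheit(100))  # 10.0 degrees C expressed in Fahrenheit
```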

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Load the Fremont Bridge dataset
daily = pd.read_csv('https://belhavencs.nyc3.digitaloceanspaces.com/csc311/FremontBridge.csv',
                     index_col='Date', parse_dates=True)
daily = daily.resample('D').sum()
daily.reset_index(drop=False, inplace=True)
daily.columns = ['Date', 'west', 'east']

# Create a 'total' column
daily['total'] = daily['west'] + daily['east']

# Load the Bicycle Weather dataset
weather = pd.read_csv('https://belhavencs.nyc3.digitaloceanspaces.com/csc311/BicycleWeather.csv',
                      index_col='DATE', parse_dates=True)

# Create columns in weather
# Assumption: TMIN/TMAX are stored in tenths of a degree Celsius (as PRCP is in 1/10 mm)
weather['temp_min'] = weather['TMIN'] / 10 * 9/5 + 32
weather['temp_max'] = weather['TMAX'] / 10 * 9/5 + 32
weather['temp'] = (weather['temp_min'] + weather['temp_max']) / 2
weather['precip'] = weather['PRCP'] / 254.0  # Convert to inches

# Merge daily counts with weather data
daily['Date'] = pd.to_datetime(daily['Date'])
daily = daily.set_index('Date').join(weather, how='inner')

# Produce the figure with three axes
fig, axs = plt.subplots(3, 1, figsize=(10, 12), sharex=True)  # align the x-axes

# Total rider count per day
axs[0].plot(daily.index, daily['total'], color='blue', label='Total Riders')
axs[0].set_title('Total Rider Count Per Day')
axs[0].set_ylabel('Count')
axs[0].legend()

# Total precipitation per day
axs[1].plot(daily.index, daily['precip'], color='green', label='Precipitation')
axs[1].set_title('Total Precipitation Per Day (inches)')
axs[1].set_ylabel('Inches')
axs[1].legend()

# Average temperature per day
axs[2].plot(daily.index, daily['temp'], color='orange', label='Average Temp')
axs[2].set_title('Average Temperature Per Day (°F)')
axs[2].set_ylabel('Temperature (°F)')
axs[2].legend()

plt.tight_layout()
plt.show()
