# Matplotlib and Visualizing Data

In [None]:
import numpy as np
import matplotlib.pyplot as plt

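np.random.seed(18)  # fix the seed so the generated data is reproducible
sqft = np.random.randint(650, 2000, 2000).astype(np.int64)  # 2000 random house sizes
price = sqft * 2 + np.random.normal(0, 300, 2000)  # true slope of 2 plus Gaussian noise
w = 4  # slope guess (steeper than the true slope of 2)
b = 3  # intercept guess

reg_x = np.linspace(600, 2000, 100)
reg_y = b + w * reg_x  # candidate regression line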

In [None]:
plt.scatter(sqft, price)
plt.plot(reg_x, reg_y, c='orange')
plt.show()

In [None]:
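w = 2  # slope guess matching the slope used to generate the data
b = 3  # intercept guess

reg_x = np.linspace(600, 2000, 100)
reg_y = b + w * reg_x  # candidate regression line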

In [None]:
plt.scatter(sqft, price)
plt.plot(reg_x, reg_y, c='orange')
plt.show()

## plt.plot()
We can plot data very easily by taking any array/list of data and passing it into `plt.plot(data)`.
This generates a line plot, though, so all points will be connected by straight lines.
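
As a quick check of what `plt.plot(data)` does with a single argument, the sketch below inspects the line object it returns: when only y-values are passed, Matplotlib fills in the indices 0..N-1 as x-values. The sample values and variable names are made up for illustration, and the `Agg` backend line is an assumption so the snippet also runs as a plain script; in a notebook it can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

data = np.array([3.0, 1.0, 4.0, 1.0, 5.0])  # illustrative values only

(line,) = plt.plot(data)  # plot() returns a list of Line2D objects; unpack the single line

x_used = line.get_xdata()  # x-values Matplotlib filled in for us
y_used = line.get_ydata()  # the y-values we passed in
print(x_used, y_used)
```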

In [None]:
#Setting up python for plotting

import matplotlib.pyplot as plt #this is the typical setup for plotting using matplotlib
import numpy as np

# Plain plotting (i.e. plt.plot(data)) draws a line plot

plt.plot(np.arange(10)) # We can plot numpy arrays using plt.plot()
plt.show() # The plot won't actually show until we call plt.show()

plt.plot(np.random.rand(100))
plt.show() # This is what the random.rand generates


## Plot Figures
We can generate a figure with `plt.figure()`, which will be the canvas for our plots.

From here we can define **axes/subplots** by adding them with `ax = fig.add_subplot(nrows, ncols, position)`. *__nrows/ncols__ define the grid of __subplots__, while __position__ (counted from 1, left to right and then top to bottom) selects which __subplot__ the __axes__ object describes.*

Example: `fig.add_subplot(2, 3, 4)` would create a 2x3 grid of plots, in which the current **axes** refers to the fourth plot (first column, second row).
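
To make the position numbering concrete, here is a minimal sketch that fills a 2x3 grid and titles each subplot with its row and column. The helper `grid_cell` is hypothetical (it is not part of Matplotlib) and simply encodes the left-to-right, top-to-bottom counting; the `Agg` backend line is an assumption so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, not needed inside a notebook
import matplotlib.pyplot as plt


def grid_cell(ncols, position):
    """Hypothetical helper: 1-based (row, col) for a 1-based subplot position."""
    row, col = divmod(position - 1, ncols)
    return row + 1, col + 1


nrows, ncols = 2, 3
fig = plt.figure()
titles = []
for position in range(1, nrows * ncols + 1):
    ax = fig.add_subplot(nrows, ncols, position)
    row, col = grid_cell(ncols, position)
    ax.set_title(f"pos {position}: row {row}, col {col}")
    titles.append(ax.get_title())

print(grid_cell(ncols, 4))  # row and column of the fourth subplot
```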

In [None]:
# Matplotlib's Figures

import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure() #Usually we want to actually create a figure, a customized location for our plots

# Two line plots on the initial plot
ax1 = fig.add_subplot(2, 1, 1) # add_subplot() adds a subplot to the figure and returns its axes
ax1.plot(np.arange(100)) #Now we can target specific axes when we plot elements
ax1.plot(np.random.rand(100))

# Random noise plotted on the second graph
ax2 = fig.add_subplot(2, 1, 2) # The inputs are (number of rows, number of columns, index of the axes you want)
ax2.plot(np.random.rand(100))

plt.show()


## In class work: Problem 1
Plot 2 arrays (size = 1000) generated from numpy's random normal on the same subplot with the following criteria:
 1. The first dataset should be generated from a standard normal distribution
 2. The second dataset should be a normal distribution with a loc = 5 and scale = 2

In [None]:
#Problem 1


## Figures cont.
There are a lot of different types of graphs available to us through matplotlib:
 - Histogram
 - Scatter Plot
 - Line Plot
 - Etc.

In [None]:
# Matplotlib's Figures

fig = plt.figure() # Usually we want to actually create a figure, a customized location for our plots
fig.subplots_adjust(wspace = .5, hspace = .5) # We can adjust axes spacing with the figure method subplots_adjust

ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)

ax1.hist(np.random.normal(size=100)) # hist() on an axes will create a histogram
ax2.scatter(np.arange(100), np.random.randn(100)) # scatter() on an axes will create a scatter plot
ax3.plot(np.arange(50)+(np.random.rand(50)*10)) # plot() on an axes will create a line plot

plt.show()

## Labels and Coloring
Most graph types accept a **color** parameter, which sets the color used to draw the data (note: you can pass single-character codes such as `'r'`, color names, hex strings, etc.).

In addition, most matplotlib plots/figures have some form of a **title** method that lets you set a title. For axes it looks like `ax1.set_title('title string')`.
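
As a small illustration of the different color formats, the sketch below uses `matplotlib.colors.to_rgba` to show that a single-character code, a color name, and a hex string can all describe the same red. The variable names are made up for this example.

```python
import matplotlib.colors as mcolors

# Three ways of writing the same color
red_from_char = mcolors.to_rgba('r')        # single-character code
red_from_name = mcolors.to_rgba('red')      # named color
red_from_hex = mcolors.to_rgba('#ff0000')   # hex string

# Each result is an (R, G, B, A) tuple with components between 0 and 1
print(red_from_char, red_from_name, red_from_hex)
```

Any of these forms can be passed as the `color` argument of `hist()`, `plot()`, or `scatter()`.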

In [None]:
# Labels and Coloring

import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure() # Usually we want to actually create a figure, a customized location for our plots

ax1 = fig.add_subplot(1, 1, 1)
data = np.random.normal(loc = 0, scale = 1, size = 1000)
ax1.hist(data, color='r') # The color parameter, for most plots, defines the base color
ax1.set_title("Random Normal Histogram") # We can set the title with the set_title() method

plt.show()



### We can even add in gridlines
Similarly, gridlines can be switched on or off with the `ax1.grid(Boolean)` method.

In [None]:
fig = plt.figure() # Usually we want to actually create a figure, a customized location for our plots

ax1 = fig.add_subplot(1, 1, 1)
data = np.random.normal(loc = 0, scale = 1, size = 1000)
ax1.hist(data, color='r') # The color parameter, for most plots, defines the base color
ax1.set_title("Random Normal Histogram") # We can set the title with the set_title() method
ax1.grid(True) # We can add a grid pretty easily too

plt.show()



## Tick Marks
For most graphs we want to highlight certain key **x** and **y** coordinates. To accomplish this, we can use `ax.set_xticks(values)` and `ax.set_xticklabels(labels)` (with `set_yticks` and `set_yticklabels` for the y-axis).

In [None]:
# Setting tick marks - can be done for X and Y axes

import matplotlib.pyplot as plt
import numpy as np

data = np.random.normal(loc = 0, scale = 1, size = 1000)

fig = plt.figure()

ax1 = fig.add_subplot(1, 2, 1)
ax1.hist(data, color='r')
ax1.set_title("Random Normal Histogram")
ticks = ax1.set_xticks([-3, -2, -1, 0, 1, 2, 3]) # We can manually define the location of tick marks
# This sets a label for each tick mark (signs distinguish the two sides of the mean)
labels = ax1.set_xticklabels(["-3 SD", "-2 SD", "-1 SD", "Mean", "+1 SD", "+2 SD", "+3 SD"], rotation = 45)

ax2 = fig.add_subplot(1, 2, 2)
ax2.hist(data, color='r')

ax1.set_xlabel("Standard Deviations")
ax1.set_ylabel("Frequency")

plt.show()

## In class work: Problem 2
Generate a random normal dataset (at a new location and scale) and plot its histogram. Create the x-tick marks in the same manner as above, but make them span every data point (you will need to determine the min/max and how many SDs away they are).

In [None]:
#Problem 2


## Overlapping Graphs
We can overlay graph data simply by plotting it on the same figure/axes.

*Note: Depending on the scale of the data, some details may not be visible*
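
One possible way to handle the scale issue mentioned in the note is a second y-axis created with `ax.twinx()`. The sketch below assumes two series with very different ranges; the data, seed, and variable names are made up for illustration, and the `Agg` backend line is only there so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # assumption: seeded generator for reproducibility
large = np.arange(100) + 10 * rng.standard_normal(100)  # values on the order of 0 to 100
small = rng.binomial(10, 0.05, 100)                      # small counts, mostly 0 or 1

fig = plt.figure()
ax_left = fig.add_subplot(1, 1, 1)
ax_right = ax_left.twinx()  # second axes sharing the x-axis, with its own y-axis

ax_left.plot(large, color='b')
ax_right.plot(small, color='g')

ax_left.set_ylabel("0-100 + noise")
ax_right.set_ylabel("Binomial")
```

Because each series gets its own y-scale, the small counts are no longer squashed against the bottom of the plot.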

In [None]:
# Plotting multiple graphs on one graph
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np

data1 = np.arange(100) + 10 * np.random.randn(100)
data2 = np.random.normal(loc = 50, scale = 10, size = 100)
data3 = np.random.binomial(10, .05, 100)

fig = plt.figure()
ax1 = fig.add_subplot(1, 1, 1)

ax1.plot(data1, label = "0-100 + noise", color='b')
ax1.plot(data2, label = "Normal Dist Data")
ax1.plot(data3, label = "Binomial")

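leg = ax1.legend(loc="best") # This adds a legend (based on labels) at the 'best' location (where it overlaps the data least)
leg.set_draggable(True) # Dragging only works with an interactive backend such as '%matplotlib notebook'; set_draggable() replaces draggable(), which newer Matplotlib releases removed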
# TALK about interactive environment
ax1.grid(True)

plt.show()

### Alpha/Opacity

While this can be extremely useful when trying to compare datasets, overlapping solid shapes can be hard to distinguish. The **alpha** parameter allows us to manipulate the opacity of our objects to visualize overlapping solid segments.

In [None]:
# Overlapping histograms
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

data1 = np.random.normal(loc=8, scale=3, size=1000) # mean of 8, matching the 'Mean8' label below
data2 = np.random.normal(loc=12, scale=3, size=1000)

fig = plt.figure()
ax1 = fig.add_subplot(1,1,1)

purple = mpatches.Patch(color='purple', label='overlap')

ax1.hist(data1, label='Mean8', alpha=.4, color='r')
ax1.hist(data2, label='Mean12', alpha=.4, color='black')
leg = ax1.legend(loc="best")

plt.show()

## Annotations and Drawings on Graphs
We can add additional information to our graphs with annotation and drawing functionality.

---
### Patches
`patches.Shape(xy, width, height, color, **kwargs)`

`ax.add_artist(patch)`

Rectangle Patch:
 - xy is the bottom left location of the Rectangle
 - width is the width
 - height is the height
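
As a minimal sketch of how these three arguments fit together, the snippet below builds a rectangle and derives its corners from its accessor methods. The numbers and variable names are chosen for illustration only.

```python
from matplotlib import patches

# xy is the bottom-left corner; width and height extend right and up from it
rect = patches.Rectangle(xy=(-1, 0), width=2, height=1)

left, bottom = rect.get_x(), rect.get_y()
right = left + rect.get_width()
top = bottom + rect.get_height()

print((left, bottom), (right, top))  # bottom-left and top-right corners
```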

---

### Annotations
`ax.annotate(text, xy, xytext, arrowprops)`
 - xy defines the location we are pointing to
 - xytext is the location where the text is placed
 - arrowprops is a dictionary of properties for the arrow

In [None]:
#Annotations and Drawings

import matplotlib.pyplot as plt
import numpy as np
from matplotlib import patches #patches are necessary to add shapes to plot

data = np.random.normal(loc = 0, scale = 1, size = 5000)

fig = plt.figure()
ax1 = fig.add_subplot(1, 1, 1)
ax1.hist(data, color = 'r', density = True) # density=True scales the histogram to unit area (it replaces the removed normed argument)

rect = patches.Rectangle(xy=(-1, 0), width=2, height=1, color = 'b')
rect.set_alpha(.3) #Setting the alpha for any object changes its transparency
ax1.add_artist(rect)

ax1.annotate('68% of prob should fall in rectangle', xy=(0, .5), xytext=(.1, .65),
             arrowprops=dict(facecolor='black', shrink=0.05))


ax1.set_ylim((0, .7)) #We can change the axes limits by set_xlim or set_ylim to reshape scope

plt.show()
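
The annotation's claim can be checked numerically. The sketch below counts what share of standard-normal samples fall between -1 and +1, which should come out at roughly 68%. The seeded generator and variable names are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: seeded generator for reproducibility
data = rng.normal(loc=0, scale=1, size=5000)

inside = np.abs(data) <= 1       # True for samples within one SD of the mean
fraction_inside = inside.mean()  # share of samples inside the shaded band

print(f"{fraction_inside:.1%} of samples fall within one SD of the mean")
```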


## In class work: Problem 3
Plot some randomly generated data and add an annotation at the max and min of the data generated.

In [None]:
#Problem 3
