# Basic Python Visualization Tools

Whether you are an aspiring or working data scientist, a data analyst, or even just an analyst, there is a lot of data around these days. Given the nature of your role, it is your job to help others understand all this data quickly and easily. Put better, you need to make large amounts of data easily digestible for those around you. One great way to do this is: PICTURES! No, not FB pics, but charts and graphs that let others quickly glean insight.

With that said, this notebook introduces some of <b>Python's basic visualization tools/packages</b>. It mostly uses matplotlib and plotly.

## Histogram

A histogram is a graphical representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous (quantitative) variable. To construct a histogram, the first step is to "bin" the range of values (that is, divide the entire range into a series of intervals) and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins must be adjacent and are usually of equal size.
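To make the binning step concrete, here is a minimal sketch in plain Python. It uses only the first ten height values from this notebook, and the helper name `bin_counts` is made up for illustration; it is not part of matplotlib or plotly. It assumes equal-width bins and puts the maximum value in the last bin.

```python
# First ten values of the notebook's height data, used only for illustration.
heights = [65.78, 71.52, 69.4, 68.22, 67.79, 68.7, 69.8, 70.01, 67.9, 66.78]

def bin_counts(values, n_bins):
    """Split [min, max] into n_bins equal-width intervals and count the values in each.

    Hypothetical helper for illustration; assumes the values are not all identical.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Map the value to a bin index; the maximum value goes in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

edges, counts = bin_counts(heights, 3)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:.2f} to {right:.2f}: {count}")
```

Each printed line is one bar of the histogram: the interval is the bar's position and the count is its height. Plotting libraries do this counting for you, usually with more bins by default.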

Below, the notebook plots a histogram using height data with matplotlib and plotly. 

In [2]:
#Add some data to graph
Height=[65.78, 71.52, 69.4, 68.22, 67.79, 68.7, 69.8, 70.01, 67.9, 66.78,
 66.49, 67.62, 68.3, 67.12, 68.28, 71.09, 66.46, 68.65, 71.23, 67.13, 67.83, 
68.88, 63.48, 68.42, 67.63, 67.21, 70.84, 67.49, 66.53, 65.44, 69.52, 65.81, 
67.82, 70.6, 71.8, 69.21, 66.8, 67.66, 67.81, 64.05, 68.57, 65.18, 69.66, 67.97, 
65.98, 68.67, 66.88, 67.7, 69.82, 69.09]
Weight=[112.99, 136.49, 153.03, 142.34, 144.3, 123.3, 141.49, 136.46, 
112.37, 120.67, 127.45, 114.14, 125.61, 122.46, 116.09, 140.0, 129.5, 142.97, 
137.9, 124.04, 141.28, 143.54, 97.9, 129.5, 141.85, 129.72, 142.42, 131.55, 
108.33, 113.89, 103.3, 120.75, 125.79, 136.22, 140.1, 128.75, 141.8, 121.23, 
131.35, 106.71, 124.36, 124.86, 139.67, 137.37, 106.45, 128.76, 145.68, 116.82, 
143.62, 134.93]

In [6]:
import matplotlib.pyplot as plt
import numpy as np

#use matplotlib.pyplot to make a histogram
plt.hist(Height)
plt.title("Heights")
plt.xlabel("Value")
plt.ylabel("Frequency")

plt.savefig('basichistogram.png')

### Height Histogram with matplotlib

<img src="basichistogram.png" alt="basic histogram" align="left" style="width:304px;height:228px;">

In [15]:
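#plot using plotly
#note: plotly.plotly is the online (Chart Studio) interface; in plotly 4 and later
#it lives in the separate chart_studio package (import chart_studio.plotly as py)
import plotly.plotly as py
import plotly.graph_objs as go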

data = [
    go.Histogram(
    x=Height
    )
]

py.iplot(data, filename='basic-histogram')

## Scatterplot

A scatter plot is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data. If the points are color-coded you can increase the number of displayed variables to three. The data is displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis.
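The color-coding idea can be sketched with matplotlib's `c` and `cmap` arguments to `scatter`. This is only an illustration under stated assumptions: it uses the first eight weight and height values from this notebook, and the third variable (weight divided by height) is derived here purely to have something to color by. The `Agg` backend line is for running as a script without a display; drop it inside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line inside a notebook
import matplotlib.pyplot as plt

# First eight values of the notebook's data, used only for illustration.
weights = [112.99, 136.49, 153.03, 142.34, 144.3, 123.3, 141.49, 136.46]
heights = [65.78, 71.52, 69.4, 68.22, 67.79, 68.7, 69.8, 70.01]

# Third variable, derived for this sketch: weight per unit of height.
ratio = [w / h for w, h in zip(weights, heights)]

fig, ax = plt.subplots()
# c= supplies one number per point; cmap= maps those numbers to colors.
points = ax.scatter(weights, heights, c=ratio, cmap="viridis")
fig.colorbar(points, ax=ax, label="Weight / Height")
ax.set_title("Weight vs Height, colored by weight-to-height ratio")
ax.set_xlabel("Weight")
ax.set_ylabel("Height")
```

The colorbar acts as the legend for the third variable. To write the figure to disk, call `fig.savefig` with a file name of your choice.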

Below, the notebook graphs a scatter plot of weight vs height in matplotlib and plotly.

In [39]:
#clear figure
plt.clf()

#plot scatter plot
plt.scatter(Weight,Height,color='green')

#make titles
plt.title("Weight vs Height")
plt.xlabel("Weight")
plt.ylabel("Height")

plt.savefig('basicscatter.png')

### Weight Height Scatter Plot with matplotlib

<img src="basicscatter.png" alt="basic scatter" align="left" style="width:304px;height:228px;">

In [16]:
# Graph scatter plot using plotly
# Create a trace
trace = go.Scatter(
    x = Weight,
    y = Height,
    mode = 'markers'
)

data = [trace]

# Plot and embed in ipython notebook!
py.iplot(data, filename='basic-scatter')

## Bar Chart

A bar chart or bar graph is a chart that presents grouped data with rectangular bars with lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally. A vertical bar chart is sometimes called a column bar chart.

Below, the notebook plots a bar chart in both matplotlib and plotly. It first does a quick transformation (a data reduction that cuts the number of observations) to make the data more convenient to show in a bar chart.

In [32]:
#transform weight and height data into something more reasonable for a bar chart
import graphlab as gl

#put data into dataframe
w=gl.SArray(Weight)
h=gl.SArray(Height)
sf=gl.SFrame([w,h])

# arr holds the weight bin boundaries used by the filter loop
# for each bin, compute the average weight and average height of the observations in it
arr=[0,110,120,130,140,150,300]
weight_av=[]
height_av=[]
i=0
while i<6:
    sf['test']=sf['X1'].apply(lambda x: 1 if (x>=arr[i] and x<arr[i+1]) else 0)
    test=sf.filter_by(1, 'test')
    test.__materialize__()
    weight_av.append(np.mean(list(test['X1'])))
    height_av.append(np.mean(list(test['X2'])))
    i+=1

In [33]:
weight_av

[104.53800000000001,
 114.38333333333333,
 125.421875,
 135.77111111111111,
 142.41461538461539,
 153.03]

In [34]:
height_av

[65.912000000000006,
 67.120000000000005,
 67.470624999999984,
 69.486666666666665,
 68.925384615384615,
 69.400000000000006]
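GraphLab Create may not be available in every environment, so here is a rough plain-Python sketch of the same idea: group observations into weight bins, then average the weight and height within each bin. It is an alternative illustration, not the notebook's method. It uses only the first eight observations, so its numbers differ from the full-data output above, and the helper name `binned_means` is made up for this example.

```python
# First eight observations from the notebook's data, used only for illustration.
weights = [112.99, 136.49, 153.03, 142.34, 144.3, 123.3, 141.49, 136.46]
heights = [65.78, 71.52, 69.4, 68.22, 67.79, 68.7, 69.8, 70.01]

# Same weight bin boundaries as arr in the GraphLab cell above.
edges = [0, 110, 120, 130, 140, 150, 300]

def binned_means(weights, heights, edges):
    """Return (low, high, mean weight, mean height) for each non-empty weight bin.

    Hypothetical helper for illustration; bins are half-open: low <= weight < high.
    """
    rows = []
    for low, high in zip(edges, edges[1:]):
        members = [(w, h) for w, h in zip(weights, heights) if low <= w < high]
        if not members:
            continue  # design choice in this sketch: skip bins with no observations
        mean_w = sum(w for w, _ in members) / len(members)
        mean_h = sum(h for _, h in members) / len(members)
        rows.append((low, high, mean_w, mean_h))
    return rows

rows = binned_means(weights, heights, edges)
weight_means = [r[2] for r in rows]
height_means = [r[3] for r in rows]
for low, high, mean_w, mean_h in rows:
    print(f"{low} to {high}: avg weight {mean_w:.2f}, avg height {mean_h:.2f}")
```

With this small sample the 0 to 110 bin is empty, so it is skipped rather than reported as a missing value. The resulting `weight_means` and `height_means` lists play the same role as `weight_av` and `height_av`.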

In [41]:
#graph bar chart with matplotlib
#clear figure
plt.clf()

#plot bar chart
plt.bar(weight_av,height_av, width=2.0)

#make titles
plt.title("Weight Avs vs Height Avs in that Weight Bin")
plt.xlabel("Weight Avs")
plt.ylabel("Height Avs")

plt.savefig('basicbar.png')

### Weight Height Bar Plot with matplotlib

<img src="basicbar.png" alt="basic bar" align="left" style="width:304px;height:228px;">

In [42]:
#create bar chart in plotly
#set up data to be plotted
trace0 = go.Bar(
    x=weight_av,
    y=height_av,
    text=['ave weight in 0 to 110 group', 'ave weight in 110 to 120 group','ave weight in 120 to 130 group','ave weight in 130 to 140 group','ave weight in 140 to 150 group','ave weight in 150 to 300 group'],
    marker=dict(
        color='rgb(158,202,225)',
        line=dict(
            color='rgb(8,48,107)',
            width=1.5,
        )
    ),
    opacity=0.6
)

data = [trace0]
#add title
layout = go.Layout(
    title='Weight Avs vs Height Avs in that Weight Bin',
)

#set figure
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='text-hover-bar')

## Making horizontal bar charts

In [48]:
#graph bar chart with matplotlib
#clear figure
plt.clf()

#plot horizontal bar chart
plt.barh(weight_av,height_av)

#make titles
plt.title("Weight Avs vs Height Avs in that Weight Bin")
plt.xlabel("Height Avs")
plt.ylabel("Weight Avs")

plt.savefig('basicbar_h.png')

### Weight Height Horizontal Bar Plot with matplotlib

<img src="basicbar_h.png" alt="basic bar h" align="left" style="width:304px;height:228px;">

In [49]:
#create horizontal bar chart in plotly
#set up data to be plotted
trace0 = go.Bar(
    x=height_av,
    y=weight_av,
    text=['ave weight in 0 to 110 group', 'ave weight in 110 to 120 group','ave weight in 120 to 130 group','ave weight in 130 to 140 group','ave weight in 140 to 150 group','ave weight in 150 to 300 group'],
    orientation = 'h',
    marker=dict(
        color='rgb(158,202,225)',
        line=dict(
            color='rgb(8,48,107)',
            width=3,
        )
    ),
    opacity=0.6
)

data = [trace0]
#add title
layout = go.Layout(
    title='Horizontal Bar Chart',
)

#set figure
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='text-hover-h-bar')

## Conclusion

Well, that is it for the basic visualization tools. This notebook has only scratched the surface. You can use these basic tools to create much more meaningful and aesthetically appealing visualizations; you just need to play around and explore. You can also learn about other visualization tools that exist: word clouds, infographics, etc.

This webpage can also help you learn about other visualization packages in Python:
<a href="http://pbpython.com/visualization-tools-1.html">visualization tools</a>.

Anyway, Happy Exploring.
 