![Mercer SSE Image](http://paulemacneil.com/ECELogo.png)
# SSE 691 : Engineering Data Visualization
# Project #2
# October 26, 2015
***

Topics Covered | Topic Examples
------------ | -------------
Numpy & Pandas Modules | Powerful Arrays, Data Filtering & Analysis
Math Data Analysis | Contour Maps, 3D Graphs

***

# 1. Numpy

The Numpy module is used for efficient numerical computing in Python.  It provides powerful and efficient types for *n*-dimensional arrays, vector and matrix arithmetic, linear algebra, etc.  One of the most commonly used objects provided by the Numpy module is the **ndarray** object.  It adds fixed-type, multidimensional storage and elementwise arithmetic to what Python's built-in list offers, and it has several similarities to Matlab's matrix data structure (matrix operations, linear algebra, signal processing, etc.).  The examples below illustrate some of these features.

In [None]:
# Import Numpy module
import numpy as np

In [None]:
# Create a Numpy array using a standard Python list
v = np.array([1, 2])
v

Even though the collection looks like a standard Python list when displayed, it is actually an **ndarray**, as can be seen below.

In [None]:
type(v)

The next few lines of code show how easy it is to quickly create large arrays of varying dimensions.  For instance, the **zeros** function accepts a single integer and returns a one-dimensional array made up exclusively of zeros.

In [None]:
# Create a 1D Numpy array of 10 zeroes
v = np.zeros(10)
v

Passing a tuple of two integers to the zeros function produces a two-dimensional array composed of zeros.  Creating such an array with standard Python types would typically require nested loops or comprehensions, whereas Numpy's zeros function accomplishes this in a single call.

In [None]:
# Create a 2D Numpy array of 10x10 zeroes
v = np.zeros((10, 10))
v

Once an ndarray has been created, Numpy provides quick mechanisms for altering the values within the array, or matrix.

In [None]:
# Add 5 to each element; the result is a new 10x10 array of 5's (v itself is left unchanged)
v + 5
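
One detail of the cell above is worth spelling out: an expression such as `v + 5` returns a new array rather than altering `v`. The following is a minimal sketch of that behavior on a smaller 2x3 array; the names `a` and `b` are illustrative only and do not appear elsewhere in this project.

```python
import numpy as np

# A small 2x3 array of zeros (smaller than the 10x10 above, for readability).
a = np.zeros((2, 3))

# Adding a scalar applies it to every element and returns a NEW array.
b = a + 5

# The original array is untouched by the expression above.
print(a.sum())   # 0.0
print(b.sum())   # 30.0

# An augmented assignment modifies the array in place instead.
a += 5
print(np.array_equal(a, b))   # True
```

Use the augmented form (`+=`) only when the original values are no longer needed.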

In [None]:
# Create the 3x3 Identity Matrix
v = np.identity(3)
v

In [None]:
# Create a 1D Numpy array made up of every value between -5 and +5 while stepping every 0.5 (-5, -4.5, -4,..., 4, 4.5, 5 etc.)
v = np.arange(-5, 5.5, 0.5)
v

In [None]:
# Create the same array using Numpy's linspace function.
v = np.linspace(-5, +5, 21)  # Here, the number of samples (21) must be calculated manually and provided as the third parameter.
v
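
The relationship between the two calls above can be checked directly. The sketch below assumes the same range and spacing (-5 to +5 in steps of 0.5); the variable names `start`, `stop`, `step`, and `num` are introduced here for illustration.

```python
import numpy as np

start, stop, step = -5.0, 5.0, 0.5

# Number of samples linspace needs to reproduce the 0.5 spacing,
# counting both endpoints.
num = int(round((stop - start) / step)) + 1

# arange excludes its stop value, so extend it by one step to include +5.
# (For step sizes that are not exactly representable in floating point,
# the length returned by arange can be less predictable.)
a = np.arange(start, stop + step, step)
b = np.linspace(start, stop, num)

print(num)                 # 21
print(np.allclose(a, b))   # True
```

As a rule of thumb, `arange` is convenient when the step is known and `linspace` when the number of points is known.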

## 1.1 Interactive Example

The example below illustrates the efficiency of Numpy's ndarray operations versus a standard Python list comprehension.  A multiplication table is displayed based on the user's selection from a drop-down control (*n* x *n* table, where *n* = 5, 15, 25, 50).

In [None]:
# Create a function for generating a multiplication table using Python's comprehension feature.
def multiplicationTblPy(n):
    return np.array([[(i + 1) * (j + 1) for i in range(n)] for j in range(n)])

# Create a function for generating a multiplication table using Numpy functions.
def multiplicationTbl(n):
    M = np.arange(1, n + 1).reshape((-1, 1))
    M = np.tile(M, (1, n))
    N = np.arange(1, n + 1).reshape((1, -1))
    N = np.tile(N, (n, 1))
    return M * N
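
The Numpy function above builds two tiled *n* x *n* arrays before multiplying them. As a hedged alternative, the same table can be produced more directly with an outer product or with broadcasting. The function names below (`multiplication_table_outer`, `multiplication_table_broadcast`) are illustrative and are not part of the original project.

```python
import numpy as np

def multiplication_table_outer(n):
    """Return the n x n multiplication table via an outer product."""
    k = np.arange(1, n + 1)
    return np.outer(k, k)

def multiplication_table_broadcast(n):
    """Same table, multiplying a column vector by a row vector."""
    k = np.arange(1, n + 1)
    return k[:, np.newaxis] * k[np.newaxis, :]

table = multiplication_table_outer(5)
print(table[4, 4])   # 25
print(np.array_equal(table, multiplication_table_broadcast(5)))   # True
```

Both variants avoid the explicit `tile` calls because broadcasting stretches the column and row to a common shape during the multiplication.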

In [None]:
# Import modules for capturing execution times, using ordered dictionaries,
# and displaying Dropdown widgets
from timeit import timeit
from collections import OrderedDict
from IPython.display import display, clear_output
from IPython.html.widgets import Dropdown


# Create a Dropdown widget that will be used to control the level
# of the multiplication table (5x5, 15x15, etc.)
dw = Dropdown(options = OrderedDict([("n=5", 5),
                                   ("n=15", 15),
                                   ("n=25", 25),
                                   ("n=50", 50)]))

# Create the callback function that will be used to update the display
# when the value of the Dropdown widget changes.
def dropdownValueChanged(sender, val):
    clear_output()
    execNum = 1000
    
    # Capture the execution times of the different implementations
    # used for creating the multiplication table
    compTime = timeit("multiplicationTblPy({})".format(val),
                      setup="from __main__ import multiplicationTblPy",
                      number=execNum)
    numpyTime = timeit("multiplicationTbl({})".format(val),
                       setup="from __main__ import multiplicationTbl",
                       number=execNum)
    
    # Display the results
    print("Using list comprehension @ n={0} -> {1:.9f} s".format(val, compTime))
    print("Using Numpy @ n={0} -> {1:.9f} s".format(val, numpyTime))
    print()
    print(multiplicationTbl(val))

# Assign the callback function to the event handler that is triggered
# when the value for Dropdown widget is changed.
dw.on_trait_change(dropdownValueChanged, 'value')

# Give the Dropdown widget an initial value.
dw.value = dw.options["n=5"]

# Display the Dropdown widget
display(dw)

# Manually trigger the value-changed event so that the display will update.
dropdownValueChanged(None, dw.value)

Note how the gap between the execution times widens as *n* increases (each figure covers 1000 timed executions).  The table has *n*² entries, so the execution time for the list comprehension-based function, *multiplicationTblPy*, grows roughly quadratically with *n*, whereas the time for the Numpy-based function, *multiplicationTbl*, increases only slightly over the same range, even as *n* reaches 50.

# 2. Pandas

The Pandas module is most useful for the manipulation and analysis of numerical tables and time series.  The module provides several different types of data structures - such as the **DataFrame** - which are used to easily and efficiently analyze tabular data.  The Pandas module is also very useful when loading tabular data that contain different data types (versus just numeric values).  In the example below, tabular data is read in from a web-based data source for state populations and is analyzed, filtered and presented in graphical form.

In [None]:
# Import the Pandas module.
import pandas as pd

The Pandas module makes it very easy to create a data structure from a comma-separated values (csv) file via the **read_csv** function.  Here, the data source is simply a URL to a csv file containing population estimates for each state in 2014.

In [None]:
# Use the read_csv function to create a Pandas DataFrame containing the relevant data from the web-based csv file.
url = "http://www.census.gov/popest/data/state/asrh/2014/files/SCPRC-EST2014-18+POP-RES.csv"
data = pd.read_csv(url)

# Display the data.
data

As seen here, the data object is a *DataFrame* object, which is a Pandas type consisting of a two-dimensional labeled data structure with columns of potentially different types (like an Excel spreadsheet).

In [None]:
type(data)

Auxiliary members, such as the **shape** attribute and the **keys** method, reveal the different properties and characteristics of the DataFrame.  The output below shows that *data* has 53 rows and 8 columns of data representing the population estimates of 18+ individuals in each state and the nation.  The headers of each column are also displayed.

In [None]:
data.shape, data.keys()
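
Because the census URL above may not always be reachable, the same steps can be tried on an in-memory CSV. This is a minimal sketch under that assumption: the three rows and the column names `NAME`, `REGION`, and `POP` are made up for illustration and are not the census data.

```python
import io
import pandas as pd

# Hypothetical stand-in for the downloaded file: three made-up rows.
csv_text = """NAME,REGION,POP
Alpha,1,1000
Beta,2,2500
Gamma,2,400
"""

# read_csv accepts any file-like object, not only a path or URL.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)           # (3, 3)
print(list(df.columns))   # ['NAME', 'REGION', 'POP']

# Each column gets its own inferred type: NAME holds text, POP holds integers.
print(df.dtypes)
```

The per-column types are what allow a DataFrame to mix text and numeric columns in one table.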

In [None]:
# Use the head function to view a subset of the data structure composed of the first few rows of the dataset.
data.head()

In [None]:
# Use the tail function to view a subset of the data structure composed of the last few rows of the dataset.
data.tail()

A nice feature of DataFrame objects is that each column of the DataFrame can be accessed through its name. In IPython, tab completion proposes the different columns of the data.

In [None]:
# Retrieve the name of the first state
data.NAME[1]  # Index 0 is the national statistic, so use an index of 1 for the name of the first state.

In [None]:
# Retrieve and display a view of the data structure composed of only the territory/state and the population numbers.
data[['NAME', 'POPESTIMATE2014', 'POPEST18PLUS2014']]

In [None]:
# Since the index is not apparent, use Boolean indexing to retrieve the data row for the state of Georgia.
data[data.NAME == "Georgia"]
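
Boolean indexing, as used above, works in two steps: the comparison produces a Series of True/False values, and that Series then selects the matching rows. The sketch below shows both steps on a small, made-up DataFrame (the names and numbers are hypothetical, not census figures).

```python
import pandas as pd

# Made-up data for illustration only; not the census figures.
df = pd.DataFrame({
    "NAME": ["Total", "Alpha", "Beta", "Gamma"],
    "POP": [3900, 1000, 2500, 400],
})

# Drop the first (summary) row by position.
parts = df.iloc[1:]

# Comparing a column to a value yields a Boolean Series...
mask = parts["POP"] >= 1000
print(mask.tolist())                  # [True, True, False]

# ...which, used as an index, keeps only the rows where it is True.
print(parts[mask]["NAME"].tolist())   # ['Alpha', 'Beta']

# Conditions can be combined with & and |; each needs its own parentheses.
both = parts[(parts["POP"] >= 1000) & (parts["NAME"] != "Alpha")]
print(both["NAME"].tolist())          # ['Beta']
```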

In [None]:
# Use the positional indexer, iloc, to retrieve an entire row by its position rather than by a column name.
data.iloc[1]

In [None]:
# Create a Pandas DataFrame object that omits the national population data.
statesOnly = data.iloc[1:]
statesOnly.head()

Note how the row for "United States" is not part of the *statesOnly* dataset.

In [None]:
# Retrieve a subset of states whose estimated 18+ population is at least 10 million.
statesOnly = data.iloc[1:]
statesOnly[statesOnly.POPEST18PLUS2014 >= 10000000]

The Pandas module also provides several built-in functions useful for accounting and statistics purposes.

In [None]:
# Display the mean, standard deviation, min/max, and 25/50/75 percent quantiles
# of the 2014 estimated population amongst the US states/territories.
statesOnly.POPESTIMATE2014.describe()

In [None]:
# Display the state with the lowest estimated total population.
statesOnly[statesOnly.POPESTIMATE2014 == statesOnly.POPESTIMATE2014.min()]

In [None]:
# Display the state with the largest estimated total population.
statesOnly[statesOnly.POPESTIMATE2014 == statesOnly.POPESTIMATE2014.max()]
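
The two cells above locate the extremes by comparing every row against the minimum or maximum. As a hedged alternative, Pandas also offers `idxmin`, `idxmax`, and `sort_values`; the sketch below uses a small, made-up DataFrame rather than the census data.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    "NAME": ["Alpha", "Beta", "Gamma"],
    "POP": [1000, 2500, 400],
})

# idxmin/idxmax return the index label of the smallest/largest value.
smallest = df.loc[df["POP"].idxmin()]
largest = df.loc[df["POP"].idxmax()]
print(smallest["NAME"])   # Gamma
print(largest["NAME"])    # Beta

# sort_values orders the whole table, which also answers "top k" questions.
top_two = df.sort_values("POP", ascending=False).head(2)
print(top_two["NAME"].tolist())   # ['Beta', 'Alpha']
```

One difference to keep in mind: `idxmin` and `idxmax` return only the first matching row, whereas the Boolean comparison used above returns every row that ties for the extreme value.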

The plotting functions in Pandas are built on top of the Matplotlib module, which makes it easy to visualize Pandas data structures as graphs.  In the example below, the population data is shown as a horizontal bar graph, giving a clearer view of how the states and territories compare.

In [None]:
# Plot population data using Pandas basic plot function to display a horizontal bar graph.

# Instruct this notebook to display the image inline with the text.
%matplotlib inline

# Create the horizontal bar graph of the population data.
my_plot = statesOnly.plot(x='NAME', y='POPESTIMATE2014', kind='barh', figsize=(12,12), legend=None,
                          title="2014 US State/Territory Population Estimate")
my_plot.set_xlabel("Estimated Population (Tens of Millions)")
my_plot.set_ylabel("State / Territory")

# 3. Data Visualizations

This section demonstrates the capabilities provided by the Numpy and Matplotlib modules for calculating and visualizing the characteristics of complex mathematical problems.  The Matplotlib module makes it very easy to quickly display interactive graphs of complex mathematical functions.  Each graph in this section is interactive, allowing for panning, zooming, and saving.

In [None]:
# Importing all of pylab imports Numpy and Matplotlib.
from pylab import *

In [None]:
# Instruct this notebook to display the graphs inline with the text, but with interactive capabilities enabled.
%matplotlib notebook

# Display an interactive graph of contrasting functions that includes gridlines, labels, etc.
figure(figsize=(8, 6))
x = arange(-15, 15, 0.1)
plot(x, sin(x), '-r', label=r'$f(x) = \mathrm{sin}(x)$')
plot(x, cos(x), '--g', label=r'$f(x) = \mathrm{cos}(x)$', lw=1.5)
xticks([-10, 0, 10])
yticks([-1, 0, 1])
ylim(-2, 2)
legend(loc=2)
grid()
title('Sine and Cosine functions displayed on the same graph.')

## 3.1 Contour Maps
This section demonstrates the inherent capabilities of the Matplotlib module in support of contour maps, both 2-D and 3-D.

### 3.1.1   2-D Contour Map of $f(x, y) = x^2 + y^2$

In [None]:
# Plot contour map of f(x, y)= x**2 + y**2
figure()

# Create vectors for the x/y-axis, respectively, to be used as the independent variables of the contour map.
x_vector = arange(-5, 5, 0.15)
y_vector = arange(-5, 5, 0.15)

# Create a matrix/grid from the independent variables for the contour function, z.
# Using a grid allows the contour function to be written as a function of x and y in Python.
x, y = meshgrid(x_vector, y_vector)

# z = f(x, y) = x**2 + y**2
z = (x**2 + y**2)

# Create & display the graph
contour(x, y, z, 20)
title('Contour map of {0}'.format(r'$f(x,y) = x^2 + y^2$'))

### 3.1.2  2-D Contour Map of $f(x, y) = x^2 - y^2$

Due to the interactive capabilities of the IPython notebook, the contour function, *z*, illustrated in the previous section can be quickly changed to reveal an entirely different contour map.  Because *x* and *y* are the coordinate grids returned by Numpy's **meshgrid** function, *z* can be written in Python much like a standard mathematical equation and then passed as a parameter to Matplotlib's **contour** function.

In [None]:
# Plot contour map of f(x, y) = x**2 - y**2
figure()

# z = f(x, y) = x**2 - y**2
z = x**2 - y**2

# Create & display the graph
contour(x, y, z, 20)
title('Contour map of {0}'.format(r'$f(x,y) = x^2 - y^2$'))
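
To make the role of **meshgrid** more concrete, the sketch below builds a deliberately tiny grid and evaluates $f(x, y) = x^2 - y^2$ on it without plotting. The names `gx`, `gy`, and `gz` are chosen here to avoid overwriting the notebook's own `x`, `y`, and `z`.

```python
import numpy as np

# Short axes so the grids are easy to inspect.
x_vector = np.array([-1.0, 0.0, 1.0])   # 3 points along x
y_vector = np.array([-2.0, 2.0])        # 2 points along y

# meshgrid returns two 2-D arrays with shape (len(y_vector), len(x_vector)).
gx, gy = np.meshgrid(x_vector, y_vector)
print(gx.shape)   # (2, 3)

# gx repeats x_vector along each row; gy repeats y_vector down each column.
print(gx[0].tolist())      # [-1.0, 0.0, 1.0]
print(gy[:, 0].tolist())   # [-2.0, 2.0]

# Elementwise arithmetic evaluates f at every grid point at once.
gz = gx**2 - gy**2
print(gz.tolist())   # [[-3.0, -4.0, -3.0], [-3.0, -4.0, -3.0]]
```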

### 3.1.3  3-D Contour Map of $f(x, y) = x^2 + y^2$

In [None]:
# Import Matplotlib's 3-D toolkit, mplot3d, for 3-D graphs.
from mpl_toolkits.mplot3d import Axes3D

**NOTE:** The interactive controls for the 3-D graphs only allow for rotating and/or saving the view.  Moving and zooming features are not currently operational with Matplotlib's 3-D graphs.

In [None]:
# Similar to before, create the x/y-axis for the contour grid, except using more resolution for the 3-D graph.
x_vector = linspace(-1, 1, 100)
y_vector = linspace(-1, 1, 100)
x, y = meshgrid(x_vector, y_vector)
fig = figure()
z = x**2 + y**2
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x, y, z)
title('3-D contour map of {0}'.format(r'$f(x,y) = x^2 + y^2$'))

### 3.1.4  3-D Contour Map of $f(x, y) = x^2 - y^2$

In [None]:
x_vector = linspace(-1, 1, 150)
y_vector = linspace(-1, 1, 150)
x, y = meshgrid(x_vector, y_vector)
fig = figure()
z = x**2 - y**2
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x, y, z)
title('3-D contour map of {0}'.format(r'$f(x,y) = x^2 - y^2$'))

## 3.2 Fourier Expansion of a Square Wave

The Fourier series of a square wave is:

$f(x) = \frac{4}{\pi}\sum_{n=1,3,5,\ldots}^{\infty}\frac{1}{n}\sin\left(\frac{n \pi x}{L}\right)$, where $x \in [0, 2L]$

See also ["Fourier Series" from *MathWorld*](http://mathworld.wolfram.com/FourierSeries.html).

In [None]:
# Graph the Fourier Expansion of a Square Wave
figure()

# Successive plot() calls draw on the same axes by default, so each partial sum is overlaid on the previous ones.

# Number of points to display the wave
N = 256
L = 1
x = linspace(0, 2*L, N)
y = zeros(N)

for n in range(1, 8, 2):
    # Add the n-th harmonic to the running partial sum
    y += 4/(pi*n)*sin((pi*n*x)/L)
    
    # Plot the partial sum through harmonic n
    plot(x, y, label='n={0}'.format(n))
    

# annotate the graph
axis([0, 2*L, -1.5, 1.5])
grid()
legend()
xlabel('Seconds')
ylabel('Value')
title('Fourier expansion of a Square Wave')
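
The loop above accumulates the series one harmonic at a time. As a rough numerical check of the formula, the sketch below sums the first few odd harmonics in a single vectorized step and evaluates the result at $x = L/2$, where the series sums to 1. The function name `square_wave_partial_sum` and its `terms` parameter are assumptions introduced for this illustration.

```python
import numpy as np

def square_wave_partial_sum(x, L=1.0, terms=4):
    """Sum the first `terms` odd harmonics of the square-wave series."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    n = np.arange(1, 2 * terms, 2)   # 1, 3, 5, ...
    # One row per harmonic, one column per x value, then sum over harmonics.
    harmonics = np.sin(np.outer(n, x) * np.pi / L) / n[:, np.newaxis]
    return (4 / np.pi) * harmonics.sum(axis=0)

# At x = L/2 the series sums to 1; the partial sums approach that value.
for terms in (4, 40, 400):
    print(terms, square_wave_partial_sum(0.5, terms=terms)[0])
```

With `terms=4` the function uses the same harmonics (n = 1, 3, 5, 7) as the plotted loop.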

#See project folder for time log.