# <span style="color: Steelblue"> Arrays, dataframes, and plotting </span>
## <span style="color: Steelblue"> A short primer on Numpy, Pandas, and Matplotlib </span>

>Jupyter Notebook: **'Arrays, dataframes & plotting with Python'**

>Goal: first steps with Numpy, Matplotlib & Pandas

>By: H.J. Megens

>Where you can reach me: hendrik-jan.megens -at- wur.nl

>Last modified: 26 September 2017

## Arrays and basic Numpy

Arrays are one-dimensional (vectors), two-dimensional (matrices), or multidimensional data structures. One characteristic of arrays is that they hold only one type of data (usually numerical, either int or float, but strings work too). Python has a specialized package for working with numerical array data, called Numpy. Numpy is extremely efficient in working with numerical data and has many built-in methods for mathematical operations. Numpy is one of the major reasons Python has become so important in data analytics, 'Big Data', and machine learning. Here you will briefly explore a few characteristics of Numpy.

### Basic properties of vectors and arrays


In [None]:
# Cell #1
# First we import the numpy module
import numpy as np # accepted convention of importing numpy

In [None]:
# Cell #2
# How to make a numpy array ...
# ... or rather a 1-dim vector
a = np.array([1,2,3])
print(a)  # see what it contains
print(a.dtype) # see what datatype the array is

In [None]:
# Cell #3
# and another one...
b = np.array([4,5,6])
b

In [None]:
# Cell #4
# doing simple mathematical operations is easy-peasy.
a + b

In [None]:
# Cell #5
# Just note how different this behavior is from working with normal lists:
a_list = [1,2,3]
b_list = [4,5,6]
a_list + b_list

In [None]:
# Cell #6
# Concatenating matrices or vectors
# This is an example of a so-called 'vertical-stack'
# Other options are 'hstack' - horizontal - and 'dstack'
c = np.vstack((a,b))

In [None]:
# Cell #7
# show contents of c
c
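
To see how the three stacking functions differ, it can help to compare the shapes they produce. The sketch below reuses the values of `a` and `b` from the cells above; the names `v`, `h` and `dd` are chosen only for this illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

v = np.vstack((a, b))   # vectors become rows, placed on top of each other
h = np.hstack((a, b))   # vectors are joined end to end into one long vector
dd = np.dstack((a, b))  # vectors are paired up along a third axis

print(v.shape)   # (2, 3)
print(h.shape)   # (6,)
print(dd.shape)  # (1, 3, 2)
```

Checking `.shape` after stacking is a quick way to confirm that the result has the layout you intended.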

In [None]:
# Cell #8
# get the value of the first row, and second column
c[0,1]

In [None]:
# Cell #9
# Notice the difference with retrieving elements from
# nested lists (lists inside a list)
c_list = [[1,2,3],[4,5,6]]
# In this case you need to go 'one deeper' into the list
c_list[0][1]

In [None]:
# Cell #10
# various mathematical operations can be applied to matrices
# for instance squaring
d = c**2

In [None]:
# Cell #11
# d is again a 2x3 array (2 rows, 3 columns), now containing the squared values of c
d

In [None]:
# Cell #12
# 'd' is a numpy array object, which has several built-in methods, such
# as summing the values
d.sum()
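
Many of these methods also accept an `axis` argument, so that the calculation runs per column or per row instead of over the whole array. A minimal sketch on a small made-up array `m` (not one of the arrays used in the tasks):

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

total = m.sum()           # all values together
col_sums = m.sum(axis=0)  # collapse the rows: one sum per column
row_sums = m.sum(axis=1)  # collapse the columns: one sum per row

print(total)              # 21
print(col_sums.tolist())  # [5, 7, 9]
print(row_sums.tolist())  # [6, 15]
```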

In [None]:
# Cell #13
# larger than 20?
d > 20

In [None]:
# Cell #14
# You can change values in an array
d[0,1] = 21
d

In [None]:
# Cell #15
# Which ones are now larger than 20?
d > 20

### Selecting elements from arrays based on logic values - True or False?

Assume you would like to slice a list based on an arbitrary list that simply states 'select yes-or-no'. You could try something like this:

In [None]:
# Cell #16

mylist = ['a','b','c','d']
my_logic_list = [False,True,True,False]
mylist[my_logic_list]  # fails: a list cannot be indexed with a list of Booleans


... but that doesn't work for standard Python lists: a list can only be indexed with an integer position or a slice, so the cell above fails. In this case, because the elements we want to extract are adjacent, a slice does the job, but that won't work when the elements you want to select are not adjacent.

In [None]:
# Cell 17
mylist[1:3]
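
For plain lists, a yes-or-no selection of non-adjacent elements has to be written out by hand, for instance with a list comprehension. This is only a sketch to show how much extra work it is compared with the Numpy approach; the names `letters` and `keep` are made up for the example.

```python
letters = ['a', 'b', 'c', 'd']
keep = [True, False, False, True]

# walk through both lists in parallel and keep the items flagged True
selected = [item for item, flag in zip(letters, keep) if flag]
print(selected)  # ['a', 'd']
```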

With numpy arrays/vectors, you can select an arbitrary selection of elements, based on supplying a vector (or array) of similar dimensions that contains Booleans `True` or `False` in the order of elements to be selected, or not selected. For instance:

In [None]:
# Cell 18
myvec = np.array(['a','b','c','d'])
my_logic_vector = np.array([False,True,True,False])
myvec[my_logic_vector]

Just to note, Numpy arrays can be transformed to standard Python lists, and vice versa. For instance:

In [None]:
# Cell 19
selection = list(myvec[my_logic_vector])
selection
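
One detail worth knowing: `list()` keeps the individual elements as Numpy values, whereas the array method `.tolist()` converts them to ordinary Python values. A small sketch, using a made-up vector `vec`:

```python
import numpy as np

vec = np.array([1.5, 2.5])

as_list = list(vec)     # a Python list holding Numpy float64 values
as_python = vec.tolist()  # a Python list holding plain Python floats
back = np.array(as_python)  # and back to a Numpy array again

print(type(as_list[0]))
print(type(as_python[0]))
```

For most purposes the two behave the same, but the difference can matter when you pass the values on to code that expects built-in Python types.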

You might wonder: "So what?". What's so great about this way of selecting elements in a vector or array? Well, consider this example, where you might be interested in selecting only the numbers in the array that are larger than 8:

In [None]:
# Cell 20
myvec = np.array([10, 5.0, 7, 15])
myvec > 8

The code above results in an array, of data type Boolean. First and last elements are indeed larger than 8, the two middle ones are not. This Boolean array can subsequently be passed back into the original array (or vector) to then only show the elements that are `True`.

In [None]:
# Cell 21
myvec[myvec>8]

Note that all elements are of type float. You can't mix types in a Numpy array; we'll come back to that in the section 'Dataframes'.

You can also construct composite logical expressions, for instance:

In [None]:
# Cell 22
myvec[(myvec > 8) & (myvec < 12)]

As before, this works because of the Boolean vector created by the two tests: number larger than 8 `AND` smaller than 12. 

In [None]:
# Cell 23
(myvec > 8) & (myvec < 12)
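
Besides `&` (and), Numpy also supports `|` (or) and `~` (not) on Boolean arrays. The parentheses around each test are needed because these operators bind more tightly than the comparisons. A short sketch with the same numbers as above, stored under the made-up name `numbers`:

```python
import numpy as np

numbers = np.array([10, 5.0, 7, 15])

# smaller than 6 OR larger than 12
outside = numbers[(numbers < 6) | (numbers > 12)]
# NOT larger than 8
not_large = numbers[~(numbers > 8)]

print(outside.tolist())    # [5.0, 15.0]
print(not_large.tolist())  # [5.0, 7.0]
```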

## Task 1: Getting Familiar With Numpy #

**Learning goals:**
* Familiarize yourself with Numpy, and how to apply internal functions (methods)
* Understand array coordinates


**a)** Create a numpy array ‘x’, based on the following list:

**`[[67,98,202],[43,2,6],[12,99,100]]`**

Have a look at Cells #2 and #3 for basic syntax. 
Produce the element-wise square of the matrix (name **`x_square`**). Look at cell #10 for basic syntax. What is the value in row 3, column 2? Cell #8 (among others) shows basic syntax.

In [None]:
# a



**b)** Multiply the array '**`x`**' by the vector '**`a`**' from the Notebook and call the resulting matrix '**`a_x`**'. In which ‘direction’ does the multiplication occur?

In [None]:
# b



**c)** By applying '`dir(a_x)`' you can find out which methods can be applied to your array. Calculate **sum**, **mean**, **standard deviation**, **minimum value** and **maximum value** of the matrix. See cell #12 for an example.

In [None]:
# c




**d)** Create a truth table (array with True and False) for values larger than 200. 

In [None]:
# d


**e)** Then, use that truth table to retrieve all values of '**`a_x`**' larger than 200. Cells #21 and #22 provide further information.

In [None]:
# e


## Task 2: The *Iris* dataset

**Learning goals:**
* Learn how to understand data structures and their features
* Learn how to use (Boolean) vectors or arrays to subset other arrays.

![iris species](iris_species.jpg)
The *Iris* dataset is one of the classic biological datasets, made famous by the statistician and geneticist R.A. Fisher, who used it in a 1936 paper; the measurements themselves were collected by the botanist Edgar Anderson. The data is derived from measurements on flowers of three *Iris* species: *I. versicolor*, *I. setosa*, and *I. virginica*. The dataset contains four measurements per flower: sepal length and width, and petal length and width.

![Iris flower](Iris_flower.jpg)

The data set is often used to demonstrate or benchmark machine-learning techniques. For more information on the *Iris* dataset, visit the <a href=https://en.wikipedia.org/wiki/Iris_flower_data_set>Wikipedia page</a>.

Because the *Iris* data set is small, and often used for demonstrating machine learning techniques, it is often bundled in analytical packages for languages often used in data science, such as R and Python. In Python, the most used package for machine learning is the Scikit-learn package. 

We will first load the dataset and explore some of its properties. 

In [None]:
import numpy as np  # import numpy
from sklearn import datasets # import the 'datasets' 
iris = datasets.load_iris()

The Scikit-Learn data packages have a specific structure which you don't need to remember. However, you can always use the `dir()` function on any object to investigate which methods are available. For now, we will focus on the `.keys()` method, which tells us that the datastructure is a bit like a dictionary.

In [None]:
iris.keys()

You will see the different 'items' in the data package. Let's explore them. One is called 'DESCR', and holds a description of the data package. Curious what it has to say? Just do:

In [None]:
print(iris.DESCR)

Note the absence of parentheses after `DESCR`: `iris.DESCR` is an attribute that holds data, not a method (a function built into the object).

This is a nice overview of the data. The data is derived from three species of *Iris*: *I. setosa*, *I. versicolor*, and *I. virginica*. These names can also be found in `iris.target_names`:

In [None]:
iris.target_names

In total, 150 flowers were measured, 50 for every species. The order of the species in the actual data array is given by the `iris.target` vector. 

In [None]:
iris.target

**a)** What species do the 0's, 1's and 2's correspond to, respectively? How can you subset the `iris.target_names` vector to extract the name 'setosa'?

In [None]:
#a


The structures of the flowers that were measured are called 'features'. Their names too can be retrieved from `iris`:

In [None]:
iris.feature_names

The actual measurements are found in the `iris.data` datastructure. The first ten rows of the iris data show the following values:

In [None]:
iris.data[:10,:]

In the next questions you will explore some of the features of `iris.data`. 

**b)** What kind of data structure is `iris.data`? Use both the general Python `type()` function, as well as the iris.data specific `dtype` attribute.

In [None]:
#b



The `iris.data` object has, like any Python object, methods (object-specific functions), which require parentheses (just like regular functions), but it also has attributes, which hold information specific to the object. Both methods and attributes can be listed by typing `iris.data.<TAB>`, which will autocomplete, and in the notebook will bring up a list of all methods and attributes. 

**c)** Can you find an attribute or method that will tell you the 'shape' of the data matrix? Not sure if the likely name is a method or attribute? Just try. If you invoke it with parentheses and get an error saying the object is 'not callable', it is an attribute. If you leave the parentheses off a method, you get a description of the method instead of a result. In addition, you can always use the '?' after the name for more information!

In [None]:
# c


The first column of `iris.data` holds the measurements of the sepal length. You can extract the first column in this way:
```Python
iris.data[:,0]
```
The ':' (colon) is a slice from start to end. Since it is the first index value, this means: "all rows". The '0' stands for the first column, since it is the second index value.
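
The same row-and-column logic can be tried out on a small array first. The sketch below uses a made-up 3x2 array called `table` (these are not the iris measurements) to show a column, a row, and a block of rows:

```python
import numpy as np

table = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0]])

first_column = table[:, 0]     # all rows, first column
first_row = table[0, :]        # first row, all columns
first_two_rows = table[:2, :]  # rows 0 and 1, all columns

print(first_column.tolist())  # [1.0, 2.0, 3.0]
print(first_row.tolist())     # [1.0, 10.0]
print(first_two_rows.shape)   # (2, 2)
```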

**d)** Capture the first column in a variable called `sepal_length`. What type is `sepal_length`? What is its shape?

In [None]:
# d




**e)** The `sepal_length` object that you've made has a number of attributes and methods. Find the methods to calculate mean and standard deviation, and calculate these values. Check if the values are the same as in the description of the iris data package. 

In [None]:
# e



The mean and standard deviation you have just calculated were based on all three species. We might want to do the same for each of the species. Let's break that down first into a number of steps. Say we want to do this first for one species, versicolor. This species is the second in the `iris.target_names` vector, and therefore has index value `1`, both in the `iris.target_names` and corresponding `iris.target` vectors. 

**f)** Create a Boolean vector called `is_versicolor` based on iris.target by asking when iris.target is 1. 
```Python
iris.target == 1
```
How long is the Boolean vector? Which values does it contain?
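
If the idea of comparing a whole vector to a single number is new, a toy example may help before you try it on the real data. The `labels` vector below is made up for the illustration; it only mimics the way `iris.target` codes species as integers.

```python
import numpy as np

labels = np.array([0, 0, 1, 1, 1, 2])  # made-up species codes

is_one = labels == 1   # one True/False per element of labels

print(is_one.tolist())  # [False, False, True, True, True, False]
print(len(is_one))      # 6
print(is_one.sum())     # 3
```

Because `True` counts as 1 and `False` as 0, summing a Boolean vector tells you how many elements passed the test.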

In [None]:
# f



**g)** Create a vector, called `versicolor_sepal_length`, that has *only* the versicolor sepal length information. This means you have to get the first column from `iris.data`, and then subset that vector based on the Boolean vector `is_versicolor` created in the previous question.

In [None]:
# g



Now we have to make use of the realization that once you know how to do something for one thing, you can do it for all. Automation! Looping! (Remember! Laziness is a virtue! Or at least, it *can* be...)

**h)** Make a for loop where you will calculate mean and std for every one of the three species.

In [None]:
# h



## Plotting with Matplotlib
### Task 3: A very short primer on Matplotlib

**Learning goals:**
* Explore plotting with Python, particularly with the Matplotlib library
* Familiarize yourself with a few basic plotting types, and basic elements of a plot
* Learn to set up basic plots, and then extend them by adding more elements.

In this section we will explore a few basic properties of Matplotlib and demonstrate how you can make nice figures with just a limited number of lines of code. One of the added learning goals is to explore the extensibility of the Matplotlib code. By adding figure titles, legends, axis labels, and by modifying color and other features of plotted lines and dots, you can make beautiful, publication-quality figures tailored to your needs, and, importantly, automate it without much additional effort.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

Our first demonstration will be on a well-known mathematical operation: taking the sine and cosine of a bunch of values, and then plotting the x and y values. For this we first need to have a vector of 'x-coordinates'. Numpy has a neat little function for that, called 'linspace'. The next line of code will produce 100 values between zero and ten (both included), evenly spaced:

In [None]:
x = np.linspace(0,10,100)
x
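
`linspace` asks for the *number* of points and includes the end point. Its cousin `np.arange` asks for the *step size* instead and leaves the end point out. A small sketch with fewer points, so the difference is easy to see (the variable names are only for illustration):

```python
import numpy as np

by_count = np.linspace(0, 10, 5)  # 5 evenly spaced values, 10 is included
by_step = np.arange(0, 10, 2.5)   # steps of 2.5, 10 is not included

print(by_count.tolist())  # [0.0, 2.5, 5.0, 7.5, 10.0]
print(by_step.tolist())   # [0.0, 2.5, 5.0, 7.5]
```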

Matplotlib supports different coding styles. This can become a bit confusing, so we will only show one today, which is also the one that is most often used. It is in line with the 'Matlab' way of plotting. This is not surprising as <a href=https://en.wikipedia.org/wiki/MATLAB>Matlab</a> has inspired the Python Matplotlib makers. There is also a more 'Pythonesque', object-oriented way of doing it, which has a number of advantages, but you will see that slightly less often in coding examples (although Matplotlib developers appear to <a href=http://matplotlib.org/faq/usage_faq.html#coding-styles>encourage the object-oriented, Python way</a>).

The general procedure is as follows: open an 'active' figure. This is usually not strictly necessary in Matlab style, but has a few advantages. First, it explicitly delineates the active plot. Second, it allows you to modify canvas properties. After activating a canvas, you can add a number of plotting statements, and add other features. After all the elements are added to the active plot, you can display it (or write it to a file).
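
For comparison only, here is a minimal sketch of what the object-oriented style looks like. It is not used in the rest of this practical. Instead of relying on an 'active' figure, you keep explicit handles (here called `fig` and `ax`, a common but arbitrary choice of names) and call methods on them.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

fig, ax = plt.subplots()                   # a figure with one set of axes
ax.plot(x, np.sin(x), 'b', label='sin')    # plot on these specific axes
ax.plot(x, np.cos(x), '--g', label='cos')
ax.set_title('Sine and cosine')            # note: set_title, not title
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.legend()
# In a notebook with %matplotlib inline, the figure is displayed
# when the cell finishes running.
```

The main practical difference is that every call says explicitly which axes it applies to, which becomes convenient once a figure contains several subplots.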

**a)** In the first exercise you will add more features. You can do this by removing the `#` in front of a line of code, or swapping out lines. Each time, run the cell, and see what changes. Note also the syntax to define line style and color. 'b' is blue, 'g' is green, and 'r' is red. There are many more. What does '--' do?

In [None]:
# a
plt.figure() # create figure

plt.plot(x,np.sin(x)) # basic plotting function
plt.plot(x,np.cos(x)) # basic plotting function
# replace these two lines by:
# plt.plot(x,np.sin(x), 'r')   # you can modify colors
# plt.plot(x,np.cos(x), '--b') # and line patterns
# and then replace these in turn by:
# plt.plot(x,np.sin(x), 'b',label = 'sin') # 'label' is shown by plt.legend()
# plt.plot(x,np.cos(x), '--g', label = 'cos') # 'label' is shown by plt.legend()

# Then, one by one, add these lines. Observe and report differences.
# plt.xlim((-1,11))       # change the default limits of the plot, horizontal direction
# plt.ylim((-1.2,1.2))    # change the default limits of the plot, vertical direction
# plt.title('This is my title') # add a title to the plot
# plt.xlabel('X')  # add label for x-axis
# plt.ylabel('Y')  # add label for y-axis
# plt.legend() # when activating this, make sure that 'label' is added to plot function!
plt.show() # finally: display

One of the key concepts of this plotting style is the 'active plotting window' where you can add and modify stuff until you decide to plot. Within a plot you can also add subplots, for instance if you would like to plot the sine and cosine functions in separate graphs. The same principle then holds. Here we demonstrate plotting using different subplots.

**b)** For each subplot, add a title, saying either 'sin' or 'cos'. For syntax how to plot a title, see the previous question.

In [None]:
# b
x = np.linspace(0,10,100)
plt.figure()
plt.subplot(2,1,1)      # subplot: two rows, one column, 1st plot
plt.plot(x,np.sin(x))
plt.subplot(2,1,2)      # subplot: two rows, one column, 2nd plot
plt.plot(x,np.cos(x), '--g')
plt.show()

**c)** Change the layout of the plot, placing the sine and cosine plots side by side (horizontally) instead of stacking them vertically as in the previous plot.

In [None]:
# c
x = np.linspace(0,10,100)
plt.figure()
# plt.subplot(?)        # modify this line of code and uncomment it
plt.plot(x,np.sin(x))
# plt.subplot(?)        # modify this line of code and uncomment it
plt.plot(x,np.cos(x), '--g')
plt.show()

### Task 4: Plotting the Iris data

**Learning goals:**
* Learn to integrate complex data structures in a plotting strategy
* Learn to tinker with code to make attractive plots

For a more meaningful example, we will explore the *Iris* dataset again. First order of business: import the libraries we will use in this exercise. 

In [None]:
import matplotlib.pyplot as plt  # matplotlib
%matplotlib inline              
                                 # Jupyter notebook specific 'magic' - figures
                                 # will be plotted in the notebook.

import numpy as np               # numpy
from sklearn import datasets     # import the 'datasets' 
iris = datasets.load_iris()      # load the iris dataset

The next example makes a `scatterplot` of the Iris data: Sepal Length vs Sepal Width. Again, the plot is initialized, a plot statement is made (in this case 'scatter'), using first and second columns of the `iris.data` array. Then a title is added.

**a)** Modify the code below so that x-axis and y-axis labels are plotted ('Sepal Length', 'Sepal Width')

In [None]:
# a
plt.figure()
plt.scatter(iris.data[:,0],iris.data[:,1])
plt.title('Iris data: Sepal Length vs Sepal Width')
plt.show()

Now that you know how to make a plot of all Iris samples, you can actually take your solution for Task 2, question h, to sequentially plot the data for the three different species. As you sequentially invoke a plot statement for each species separately, you can modify, for instance, the 'label', which is important for the legend. But also the color of the dots (or anything else if you want; don't hesitate to experiment).

**b)** Replace the '?' with the correct variable/vector so that it works. Question: how are label names and dot color dynamically modified?


In [None]:
# b
colors = ['blue','red','green']
plt.figure()
for i in range(3):
    sepal_length = iris.data[:,0][iris.target==i]
    sepal_width = iris.data[:,1][iris.target==i]
    plt.scatter('?','?', label = iris.target_names[i], c=colors[i] ) # replace '?'
plt.title('Iris data: Sepal Length vs Sepal Width')
plt.legend()
plt.show()

So far we have seen 'plot' and 'scatter' functions. There are many other [plotting functions](https://matplotlib.org/api/pyplot_summary.html), such as 'bar', 'boxplot', and 'hist' (histogram). There is no time to explore them all. One we will explore for the Iris data is the histogram. One of the interesting features of the *Iris* data set is that you cannot distinguish the three species on any single character alone. Every feature has a distribution which overlaps with that of another species. Let's explore the distribution of 'Sepal Length' for *I. versicolor*: 

In [None]:
# plot histogram of sepal lengths.
plt.figure()
plt.hist(iris.data[iris.target ==1,0], color='blue', alpha = 0.9, label = 'versicolor')
plt.title('Iris data, histogram of Sepal Length')
plt.legend()
plt.show()
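
A histogram is nothing more than a count of how many values fall into each interval ('bin'). If you want to see those counts as numbers, Numpy's `histogram` function returns them together with the bin edges. The sketch below uses a handful of made-up values (not the iris data) and asks for 3 bins; `plt.hist` accepts a `bins` argument in the same way, and uses 10 bins if you don't supply one.

```python
import numpy as np

lengths = np.array([1, 2, 2, 3, 3, 3, 4])  # toy values for illustration

counts, edges = np.histogram(lengths, bins=3)

print(counts.tolist())  # [1, 2, 4]
print(edges.tolist())   # [1.0, 2.0, 3.0, 4.0]
```

Note that the last bin includes its right-hand edge, which is why the value 4 is counted together with the 3's.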

**c)** Now add one line, similar to the second line of code in the previous cell, so that you add *I. setosa* to this plot. Make sure you select the appropriate species and column from the `iris.data` array! And modify the line so that 'label' and color are changed, maybe change it to red? **Also note the 'alpha' parameter. You can set it to a value between 0 and 1**. It changes the transparency. **Play** with it a bit to create a histogram that you like most. 

In [None]:
# c
plt.figure()
plt.hist(iris.data[iris.target ==1,0], color='blue', alpha = 0.9, label = 'versicolor')
# ADD LINE OF CODE HERE

plt.title('Iris data, histogram of Sepal Length')
plt.legend()
plt.show()

## Dataframes 
### Task 5: A short primer on Dataframes using Pandas

**Learning goals:**
* Understand the properties of a data frame, and how that differs from an array
* Explore the power of Pandas for numerical data manipulation and analysis

Arrays and vectors are incredibly powerful structures for numerical data analysis. In some languages, such as R, arrays and vectors are primary data types, which in large part explains the appeal of R for statistical data analysis - well, in fact that language has numerical data analysis as primary focus. Numpy has done much the same for Python. Python is, for instance, quite popular in physics research, and there is a trend of Python replacing other languages and programming environments, such as Matlab.

Arrays and vectors, however, also show a fundamental limitation: they can only hold a single data type, and that data type is usually numerical. Consider the following example:

In [None]:
my_list = ['some random string', 1, 0.3]
my_list

`my_list` is a list that holds three elements, each of different type: a string, integer, and float. Now consider what happens next:

In [None]:
np.array(my_list)

Numpy arrays always have a single type. Since one of the elements from the list is a string, that requires the rest to be of type string as well, because a string can not be converted (meaningfully) to a numerical value.

When we supply a list of integers, on the other hand, a numpy array of type integer can be made:

In [None]:
my_list= [1,1,1]
my_vector = np.array(my_list)
print(my_vector.dtype)
my_vector

Note that Numpy stores integers as 64-bit values by default on most platforms. If you have many integers in an array, say hundreds of millions, but each of the integers is small (between 0 and 255, in fact), you can get away with explicitly converting to unsigned 8-bit integers (`uint8`). This reduces memory use by a factor of 8. This is important for working with bit-mapped pictures, but can also be relevant for DNA data, if you code your bases simply as 0, 1, 2, 3.
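
The memory saving is easy to check with the `nbytes` attribute of an array. The sketch below builds a made-up vector of 4000 'base codes' (the name `bases` and its contents are just for illustration), stores it once as 64-bit integers and once as unsigned 8-bit integers, and compares the sizes.

```python
import numpy as np

# 4000 toy base codes, explicitly stored as 64-bit integers
bases = np.array([0, 1, 2, 3] * 1000, dtype=np.int64)

# the same values stored as unsigned 8-bit integers (range 0-255)
compact = bases.astype(np.uint8)

print(bases.nbytes)    # 32000
print(compact.nbytes)  # 4000
```

The conversion is only safe when all values really fit in the smaller type, so it is worth checking the minimum and maximum of your data before converting.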

Now, again, numpy arrays can only have a single type, so when all values are numerical, but not all are integers:

In [None]:
my_list= [1.0,1,1]
my_vector = np.array(my_list)
print(my_vector.dtype)
my_vector

So, arrays and vectors don't allow you to mix data types, but lists do. 

In practice, many of the data sets we are likely to encounter are organized in columns ('tables'), i.e. have a 'matrix-like' layout, which could make them suitable for manipulation and computation in arrays. However, tables often have a mix of data types. Our Crane data set is a good case in point. To accommodate this type of dataset, you need to use the Data Frame. If that sounds arcane, it shouldn't. There is no doubt you have worked with data frames before. Just think of 'Excel sheets'.

Data frames are usually organized as a collection of named vectors or lists. Each column holds one data type, for instance integers, and has a name attached to it, which can be a string. Different columns can have different data types, such as string or float, but within each column the data type is the same, allowing for optimal efficiency of working with the data per column. 
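
To make the 'collection of named columns' idea concrete, here is a minimal sketch that builds a tiny data frame by hand from a dictionary, using the Pandas package that is introduced properly further down. The column names and values are invented for the illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    'name': ['a', 'b', 'c'],     # a column of text
    'count': [1, 2, 3],          # a column of integers
    'weight': [0.5, 1.5, 2.5],   # a column of floats
})

print(toy.shape)   # (3, 3): three rows, three columns
print(toy.dtypes)  # one data type per column
```

Each key of the dictionary becomes a column name, and each column gets its own data type.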

Data frames, like arrays, are basic data structures in some languages, such as R and Matlab, although internally they are always compound types. In Python too, it was recognized that with the Numpy and SciPy modules in place and very popular, there was a need to support dataframes. For this purpose the Pandas module was created. 

Like dataframes in other languages, the Pandas dataframe supports slicing and indexing, although the syntax requires a bit of practice. The same holds for languages like R, and in fact the dataframe syntax of Pandas is quite similar, apart from some quirks (it can be argued that these quirks make the Pandas dataframe more easily sliceable than R dataframes). Since the Pandas dataframe internally consists of Numpy vectors, all the mathematical tools and efficiency that people have come to love in Numpy are built into Pandas as well.

Let's start this part on Pandas with the Iris dataset, but now, instead of getting it from a built-in package, we'll load it directly from an Excel sheet. 

*Caveat emptor: like with every strategy where you want to load an entire datastructure into memory, this could go south quite significantly if your dataset is huge. Like, seriously crashing your computer. So pay attention to the size of your input data. There are some interesting 'Out of Memory' solutions that are compatible with Pandas to deal with very large arrays, such as hdf5-based arrays, but these are several parsecs beyond the scope of this practical.*

In [None]:
import pandas as pd # the most accepted way of importing the pandas package


**a)** Explore the `pd` object. Find a method/function that can load the `Iris.xlsx` data into a dataframe with variable name `iris_pd`.

In [None]:
#a
iris_pd = pd.<METHOD>('Iris.xlsx') # apply appropriate method

**b)** Now that you have loaded the Iris data into the `iris_pd` dataframe, the `iris_pd` object has Pandas-specific methods that allow you to do things with it. One of the things is to show just the first lines of the table, much like the 'head' command in the shell. Explore your `iris_pd` dataframe for a method that might do just that, and apply it. 

In [None]:
# b


As with arrays, it is very easy to retrieve a single column of data. To extract the column that holds the sepal length measurements, you simply do the following:


In [None]:
iris_sepal_length_pd = iris_pd['sepal length (cm)']

You might have noticed that the syntax is very similar to extracting a value from a dictionary by supplying a key to the dictionary. That is not a coincidence: the Pandas dataframe organizes the different columns very much like items in a dictionary. 

**c)** Explore properties of the `iris_sepal_length_pd` object. What type is it? Can you determine its shape?

In [None]:
# c


**d)** Calculate mean and std of sepal length. See if there are methods in the object that can do this for you.

In [None]:
# d


To extract rows, use `loc` (or `iloc`). Selecting values with `loc` works much as you saw earlier for Numpy arrays, with one difference: you can use (lists of) column names as keys (for rows too, if you have named rows, which we don't have here). For the rows you can also use index values, which can be a slice, a numpy array with row numbers, or a Boolean vector.
```Python
iris_pd.loc[?,['species','sepal length (cm)','sepal width (cm)']]
```

Where the '?' is a placeholder for the rows to select. 
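
One quirk to be aware of before you start: a slice in `loc` works on row *labels* and includes the end label, whereas a slice in `iloc` works on *positions* and, like ordinary Python slices, excludes the end. The sketch below shows the difference on a small made-up data frame called `toy` (not the iris data).

```python
import pandas as pd

toy = pd.DataFrame({'letter': list('abcdef'),
                    'value': [10, 20, 30, 40, 50, 60]})

by_label = toy.loc[0:2, ['letter', 'value']]  # labels 0, 1 and 2
by_position = toy.iloc[0:2]                   # positions 0 and 1 only

print(len(by_label))     # 3
print(len(by_position))  # 2
```

With the default row labels (0, 1, 2, ...) the two look almost the same, which makes it easy to end up with one row more or fewer than intended.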

**e)** Replace the '?' with a 'slice' that returns the first 5 rows.

In [None]:
# e
iris_pd.loc['?',['species','sepal length (cm)','sepal width (cm)']] # replace '?'

**f)** Replace the '?' with a numpy array with values to retrieve only lines 1,3,5,141 and 142 in the data frame. 

In [None]:
# f
iris_pd.loc['?',['species','sepal length (cm)','sepal width (cm)']] # replace '?'

**g)** To extract only the *I. setosa* samples from the data frame, replace the index in the previous cell with a Boolean vector (or, in this case, Pandas series), based on this statement:
```Python
iris_pd['species'] == 'setosa'
```
As usual, if you don't quite understand each step, break it down. You can always catch any outcome of code (well, if it returns an object), in a variable and explore it further.

In [None]:
# g
iris_pd.loc['?','sepal length (cm)'] # replace '?'

Next, a little demonstration of adding a column. We can calculate the ratio between sepal length and sepal width and store it in an additional column, called 'ratio':

In [None]:
iris_pd['ratio'] = iris_pd['sepal length (cm)']/iris_pd['sepal width (cm)']

**h)** Show the first 10 lines of the data frame, and show the dimensions ('shape'). What has changed? Is this an 'in-place' addition, i.e. can a data frame be changed?

In [None]:
# h




*Final note on Pandas: Pandas does have built-in plotting methods. However, since dealing with just one plotting method is already confusing enough (and, as mentioned, Matplotlib alone already has two styles!), I don't want to bother you with it here. Furthermore, Matplotlib's options are far more extensive, and all the Pandas data structures are compatible with Matplotlib anyway.*

## Further reading:
- [matplotlib website](https://matplotlib.org/)
- [matplotlib examples](https://matplotlib.org/gallery.html)
- [Pandas](http://pandas.pydata.org/)
- [Python data science handbook (O'Reilly) - recommended](http://a.co/2jL2pqf)
