# Libraries

---

The Python standard library is robust and rich in features. IMHO, one of the key reasons Python is so successful is its extensibility through a broad range of community-driven, open-source third-party libraries: pandas (built on NumPy), Matplotlib, Jupyter, scikit-learn and Prophet (formerly FbProphet), to name a few.

From the command line, run the following:
    
    pip install pandas
    
...or, if you installed Python with the Anaconda Distribution:
    
    conda install pandas
    
The libraries demoed here are just the tip of the iceberg!

In [None]:
# import the os library for file manipulation
import os
# import the pandas library and alias it
import pandas as pd
# we've already seen how to import a subset of features from a library using "from"
from numpy import array

In [None]:
# create a data frame from a sample csv
df = pd.read_csv("SalesOrderHeader.csv", delimiter="\t")
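
If `SalesOrderHeader.csv` isn't to hand, the same call can be tried on an in-memory sample. This is a minimal sketch: the column names match the ones used later in this notebook, but the values and the `sample_tsv` / `sample_df` names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample; the values are invented for illustration
sample_tsv = (
    "SalesOrderID\tTerritoryID\tTotalDue\n"
    "43659\t5\t23153.23\n"
    "43660\t5\t1457.33\n"
    "43661\t6\t36865.80\n"
)

# read_csv accepts any file-like object; sep="\t" is equivalent to delimiter="\t"
sample_df = pd.read_csv(io.StringIO(sample_tsv), sep="\t")
print(sample_df.shape)
print(sample_df.dtypes)
```

Despite the `.csv` extension, the file used here is tab-separated, which is why the delimiter has to be passed explicitly.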

In [None]:
# top of the data frame
df.head()

In [None]:
# actually, I want more
df.head(10)

In [None]:
# and the end
df.tail()

In [None]:
# Summarise the DataFrame
df.describe()
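
To see what `describe()` reports without the sales file, here is a small sketch on a made-up frame (the `demo` data is hypothetical). By default only numeric columns are summarised.

```python
import pandas as pd

# Made-up values, for illustration only
demo = pd.DataFrame({
    "TerritoryID": [1, 1, 2, 2],
    "TotalDue": [100.0, 200.0, 300.0, 400.0],
    "Status": ["open", "closed", "open", "open"],
})

# By default describe() summarises only the numeric columns
summary = demo.describe()
print(summary)

# include="all" adds non-numeric columns (count, unique, top, freq)
print(demo.describe(include="all"))
```

Note that ID-like columns such as `TerritoryID` get a mean and standard deviation too, even though those numbers are not meaningful for identifiers.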

In [None]:
# Show a single column from the data frame, use .head() to show the top again
df["SalesOrderID"].head()

In [None]:
# Get the data from a single row (in this case, the 3rd row)
df.iloc[2]

In [None]:
# Get the SalesOrderID from a single row
df.iloc[2]["SalesOrderID"]

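In [None]:
# Get the first column's value from the 3rd row, purely by position
# (df.iloc[2][0] relies on positional Series indexing, which newer pandas versions deprecate)
df.iloc[2, 0]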
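
The cells above mix position-based and name-based selection. As a rough side-by-side sketch on a made-up frame (the `orders` data is hypothetical), the three common ways to reach a single value look like this:

```python
import pandas as pd

# Made-up values, for illustration only
orders = pd.DataFrame({
    "SalesOrderID": [43659, 43660, 43661],
    "TotalDue": [23153.23, 1457.33, 36865.80],
})

# Position-based: row at position 2, column at position 0
by_position = orders.iloc[2, 0]

# Mixed: pick the row by position, then the column by name
by_name = orders.iloc[2]["SalesOrderID"]

# Label-based: with the default index, the row label 2 is also position 2
by_label = orders.loc[2, "SalesOrderID"]

print(by_position, by_name, by_label)
```

Selecting a whole row first (`orders.iloc[2]`) builds an intermediate Series with a single dtype, so the ID may come back as a float; the one-step forms avoid that.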

---

## Plotting libraries

For data analytics, Python makes it easy to visualise data using plotting libraries such as:
* Matplotlib
* seaborn
* Bokeh
* HoloViews
* ggplot
* Plotly

In [None]:
# Matplotlib is one of the most commonly used libraries for dataviz
# import the Matplotlib plotting interface and assign an alias
import matplotlib.pyplot as plt

In [None]:
# Plot a simple chart
plt.plot(df["SalesOrderID"], df["TotalDue"])
plt.ylabel('TotalDue')
plt.xlabel('Sales Order ID')
plt.show()

In [None]:
# plot() always joins the points with a line, in the order given,
# which is not a good fit for a categorical x axis like TerritoryID...
plt.plot(df["TerritoryID"], df["TotalDue"])
plt.ylabel('TotalDue')
plt.xlabel('Territory ID')
plt.show()

In [None]:
# Use a groupby to total the numeric columns for each TerritoryID
groupedDF = df.groupby("TerritoryID").sum(numeric_only=True)
groupedDF.head()
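
A minimal sketch of what the groupby does, using a made-up frame (the `sales` data is hypothetical): rows sharing a `TerritoryID` are collapsed into one row, and the grouping column becomes the index of the result.

```python
import pandas as pd

# Made-up values, for illustration only
sales = pd.DataFrame({
    "TerritoryID": [1, 1, 2, 2, 2],
    "TotalDue": [10.0, 20.0, 5.0, 5.0, 15.0],
    "Comment": ["a", "b", "c", "d", "e"],
})

# Sum only the numeric columns within each territory
totals = sales.groupby("TerritoryID").sum(numeric_only=True)
print(totals)

# Selecting the column first keeps the result focused on one measure
total_due = sales.groupby("TerritoryID")["TotalDue"].sum()
print(total_due)
```

Because `TerritoryID` ends up as the index, it is read back with `.index` rather than as a column when plotting.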

In [None]:
# there are more explicit methods for specifying a graph type e.g. bar
plt.bar(groupedDF.index, groupedDF["TotalDue"])
# compare to the line chart you would get from
# plt.plot(groupedDF.index, groupedDF["TotalDue"])
plt.ylabel("Total Due")
plt.xlabel("Territory ID")
plt.title("Total Due by Territory")
plt.show()
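
To make the line-versus-bar comparison concrete, here is a small sketch that draws both side by side. The territory totals are made up, and it uses Matplotlib's `plt.subplots()` interface rather than the `plt.*` calls used above; that is a stylistic choice for this example, not a requirement.

```python
import matplotlib.pyplot as plt

# Made-up totals per territory, for illustration only
territories = [1, 2, 3]
totals = [30.0, 25.0, 40.0]

# One figure with two side-by-side axes for comparison
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

# plot() joins the points with a line, implying a trend between territories
ax_line.plot(territories, totals)
ax_line.set_title("plot()")

# bar() draws one bar per territory, which suits categorical data
ax_bar.bar(territories, totals)
ax_bar.set_title("bar()")

for ax in (ax_line, ax_bar):
    ax.set_xlabel("Territory ID")
    ax.set_ylabel("Total Due")

fig.tight_layout()
```

In a notebook the figure is typically rendered at the end of the cell; when running as a script, add `plt.show()`.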

---

## Plotting Math functions with NumPy

NumPy is one of the most commonly used libraries for numerical computing: fast arrays plus a large set of mathematical and statistical functions.

It is also a key building block of pandas.

In [None]:
# matplotlib.pyplot was already imported above as plt, so no re-import is needed
import numpy as np

x = np.linspace(0, 20, 100)
plt.plot(x, np.sin(x))
plt.show() 
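
What makes the plot above a one-liner is that NumPy functions work on whole arrays at once. A minimal sketch, using fewer points so the values are easy to inspect:

```python
import numpy as np

# 5 evenly spaced points from 0 to pi, endpoints included
x = np.linspace(0, np.pi, 5)

# NumPy functions are applied element-wise to the whole array, no loop needed
y = np.sin(x)

print(x)
print(np.round(y, 3))
```

`np.linspace(start, stop, num)` includes both endpoints by default, so the 100 points used above run from exactly 0 to exactly 20.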

### *Other plotting libraries are available!*

---