# Python libraries and an introduction to data manipulation

The core Python language is, by design, somewhat minimal.  Like other programming languages, Python has an ecosystem of modules (libraries of code) that augment the base language.  Some of these libraries are "standard", meaning that they are included with your Python distribution.  Many other open source libraries can be obtained from the organizations that support their development, and are typically installed with a package manager such as `pip` or `conda`.

Think of a library as a collection of functions and data types that can be accessed to complete certain programming tasks without having to implement everything yourself from scratch.
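As a small illustration, the sketch below uses the standard library's `statistics` module, which ships with every Python installation. The list of scores is made-up data for illustration. The point is that `statistics.mean` and `statistics.median` give you tested implementations, so you do not need to write the sorting and averaging logic yourself.

```python
import statistics

# Hypothetical example data
scores = [88, 92, 79, 93, 85]

# Doing it "from scratch" works for the mean...
manual_mean = sum(scores) / len(scores)

# ...but the library already provides this and much more
library_mean = statistics.mean(scores)
library_median = statistics.median(scores)  # sorts the data and picks the middle value

print(manual_mean, library_mean, library_median)
```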

This course will make extensive use of the following libraries:

* **[Numpy](http://numpy.org)** is a library for working with arrays of data.

* **[Pandas](http://pandas.pydata.org)** provides high-performance, easy-to-use data structures and data analysis tools.

* **[Scipy](http://scipy.org)** is a library of techniques for numerical and scientific computing.

* **[Matplotlib](http://matplotlib.org)** is a library for making graphs.

* **[Seaborn](http://seaborn.pydata.org)** is a higher-level interface to Matplotlib that can be used to simplify many graphing tasks.

* **[Statsmodels](http://www.statsmodels.org)** is a library that implements many statistical techniques.
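Not every Python installation includes all of these libraries. The following is a minimal sketch, assuming only the standard library, that checks which of them are available. The import statements in `CONVENTIONAL_IMPORTS` (a hypothetical name chosen here) reflect common community conventions rather than anything Python enforces, and `check_installed` is likewise a helper invented for this example.

```python
import importlib.util

# Commonly used import statements for the libraries listed above.
# These aliases are community conventions, not rules enforced by Python.
CONVENTIONAL_IMPORTS = {
    "numpy": "import numpy as np",
    "pandas": "import pandas as pd",
    "scipy": "from scipy import stats",
    "matplotlib": "import matplotlib.pyplot as plt",
    "seaborn": "import seaborn as sns",
    "statsmodels": "import statsmodels.api as sm",
}


def check_installed(names):
    """Map each top-level package name to True if Python can locate it.

    importlib.util.find_spec looks a package up without importing it,
    so this check is quick even for large libraries.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_installed(CONVENTIONAL_IMPORTS).items():
        status = "installed" if found else "missing"
        print(f"{name:<12} {status:<10} {CONVENTIONAL_IMPORTS[name]}")
```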

This notebook introduces the Pandas and Numpy libraries, which are used to manipulate datasets.  Next week we will give an overview of the Matplotlib and Seaborn libraries that are used to produce graphs.  The Statsmodels package will be used in the second and third courses of the series that introduce formal statistical analysis and modeling. 
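For a first taste of what "manipulating a dataset" means, here is a short Pandas sketch. The table is hypothetical data made up for illustration; it shows three everyday operations: deriving a new column, filtering rows by a condition, and summarizing a column.

```python
import pandas as pd

# A small, made-up table (hypothetical data for illustration)
df = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "population": [120, 450, 80, 300],
    "area": [10, 30, 8, 20],
})

# Derive a new column from existing ones (element-wise division)
df["density"] = df["population"] / df["area"]

# Keep only the rows that satisfy a condition
large = df[df["population"] > 100]

# Summarize a single column
print(df["population"].mean())
print(large)
```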

# Documentation

No data scientist or software engineer memorizes every feature of every software tool they use.  Effective data scientists take advantage of resources (mostly online) to resolve the challenges they encounter when developing code and analyzing data.  Documentation is the official, authoritative resource for any programming language or library. Here are links to the official documentation for the [Python language](https://docs.python.org/3/) and the [Python Standard Library](https://docs.python.org/3/library/index.html#library-index).
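Alongside the online documentation, much of it is also available from inside Python itself. The sketch below, which uses only the standard library, shows three ways to look up a function: `help()` prints its documentation, the `__doc__` attribute holds the same docstring as a string, and `inspect.signature` reports the arguments it accepts. Docstrings are usually shorter than the online pages, so treat them as a quick reminder rather than a replacement.

```python
import inspect
import statistics

# Print the full documentation for a function
help(statistics.mean)

# The same text is available as a plain string
doc = statistics.mean.__doc__
print(doc.splitlines()[0])  # first line of the docstring

# Show which arguments the function accepts
print(inspect.signature(statistics.mean))
```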

### Importing libraries

When using Python, you will generally begin your scripts by importing the libraries that you will be using. 

The following statements import the Numpy and Pandas libraries, giving them abbreviated names:

```python
import numpy as np
import pandas as pd
```
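Importing under an alias is not the only option. As a sketch of the alternative, `from numpy import ...` brings individual functions into your script so they can be called without any prefix. The list `values` is made-up example data.

```python
# Import the whole library under its conventional alias
import numpy as np

# Alternatively, import only specific functions; they can then be
# called without any prefix
from numpy import mean, median

values = [3, 1, 4, 1, 5]
print(np.mean(values))  # prefixed call through the alias
print(mean(values))     # the same function, imported directly
print(median(values))
```

The prefixed style makes it obvious which library each function comes from, whereas a bare name like `mean` could be confused with, or overwritten by, a function of the same name from another library (for example `statistics.mean`).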

### Utilizing library functions

After importing a library, you call its functions by writing the name the library was imported under, a dot, and then the function name.  For example, to use the `dot` function from the `numpy` library, you would write `numpy.dot`.  To avoid repeatedly typing the full library name in your scripts, it is conventional to import each library under a two- or three-letter abbreviation; for example, `numpy` is usually abbreviated as `np`.  This allows us to write `np.dot` instead of `numpy.dot`.  Similarly, the Pandas library is typically abbreviated as `pd`.
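To see that an abbreviation is just a second name for the same library, the sketch below imports NumPy both ways and calls `dot` (the dot product of two vectors) through each name. The vectors `x` and `y` are arbitrary example values.

```python
import numpy
import numpy as np

# The alias is simply another name bound to the same module object
print(np is numpy)      # True
print(np.__name__)      # numpy

# Both spellings call the same function
x = [1, 2, 3]
y = [4, 5, 6]
print(numpy.dot(x, y))  # 1*4 + 2*5 + 3*6 = 32
print(np.dot(x, y))     # 32
```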

The next cell shows how to call functions from an imported library:

```python
a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
np.mean(a)  # returns np.float64(5.0)
```
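`np.mean` is only one of many summary functions. The sketch below builds the same array with `np.arange(11)` (the integers 0 through 10) rather than typing each value, then computes a few other statistics. Note that `np.std` computes the population standard deviation by default (`ddof=0`), and that many functions are also available as methods on the array itself.

```python
import numpy as np

# np.arange(11) produces the same values as np.array([0, 1, ..., 10])
a = np.arange(11)

print(np.mean(a))    # arithmetic mean: 5.0
print(np.median(a))  # middle value: 5.0
print(np.std(a))     # population standard deviation: sqrt(10), about 3.1623
print(a.mean())      # the same mean, called as an array method
print(a.sum(), a.min(), a.max())  # 55 0 10
```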