# Numpy

Numpy is a library for working with large arrays and matrices of numeric data. Its array operations typically run much faster than equivalent loops over plain Python lists.

The __array object class__ (`ndarray`) is the basic element of Numpy.  
Numpy arrays are like Python lists, with one important difference: every element in an array must have the same type.
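
The same-type rule can be checked by inspecting an array's `dtype`. This is a minimal sketch with illustrative variable names: when the input mixes integers and floats, Numpy converts everything to one common type.

```python
import numpy as np

ints = np.array([1, 2, 3])            # all integers: integer dtype
mixed = np.array([1, 2.5, 3])         # integers are upcast to float
forced = np.array([1, 2, 3], float)   # dtype requested explicitly

print(ints.dtype)    # an integer dtype (exact width depends on the platform)
print(mixed.dtype)   # float64
print(forced.dtype)  # float64
```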

In [9]:
import numpy as np

In [3]:
array = np.array([1, 4, 5, 8], float)
print(array)
print("")
array = np.array([[1, 2, 3], [4, 5, 6]], float)  # a 2D array (matrix)
print(array)

[ 1.  4.  5.  8.]

[[ 1.  2.  3.]
 [ 4.  5.  6.]]


## Numpy array: index, slice, manipulate

In [5]:
array = np.array([1, 4, 5, 8], float)
print(array)
print("")
print(array[1])      # single element by position
print("")
print(array[:2])     # slice: first two elements
print("")
array[1] = 5.0       # arrays are mutable
print(array[1])


two_D_array = np.array([[1, 2, 3], [4, 5, 6]], float)
print(two_D_array)
print("")
print(two_D_array[1][1])   # row 1, column 1
print("")
print(two_D_array[1, :])   # all of row 1
print("")
print(two_D_array[:, 2])   # all of column 2

[ 1.  4.  5.  8.]

4.0

[ 1.  4.]

5.0
[[ 1.  2.  3.]
 [ 4.  5.  6.]]

5.0

[ 4.  5.  6.]

[ 3.  6.]
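
One detail worth knowing when manipulating slices: a basic slice is a view onto the original array, so writing through the slice changes the original. The sketch below uses illustrative names and also shows that `[1][1]` and `[1, 1]` select the same element.

```python
import numpy as np

array = np.array([1, 4, 5, 8], float)
first_two = array[:2]          # basic slice: a view, not a copy
first_two[0] = 100.0           # also changes array[0]

independent = array[:2].copy()
independent[1] = -1.0          # the original array is untouched

two_d = np.array([[1, 2, 3], [4, 5, 6]], float)
same = two_d[1][1] == two_d[1, 1]   # both pick row 1, column 1

print(array)
print(same)
```

Call `.copy()` whenever the slice should be modified independently of the source array.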


## Numpy array: arithmetic operations

In [8]:
# Arithmetic operators work element by element
array_1 = np.array([1, 2, 3], float)
array_2 = np.array([5, 2, 6], float)
print(array_1 + array_2)
print("")
print(array_1 - array_2)
print("")
print(array_1 * array_2)

# The same holds for 2D arrays
array_1 = np.array([[1, 2], [3, 4]], float)
array_2 = np.array([[5, 6], [7, 8]], float)
print(array_1 + array_2)
print("")
print(array_1 - array_2)
print("")
print(array_1 * array_2)

[ 6.  4.  9.]

[-4.  0. -3.]

[  5.   4.  18.]
[[  6.   8.]
 [ 10.  12.]]

[[-4. -4.]
 [-4. -4.]]

[[  5.  12.]
 [ 21.  32.]]
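
Note that `*` on 2D arrays multiplies matching entries; it is not matrix multiplication. A short sketch contrasting the two, reusing the values from the cell above (the names `a` and `b` are illustrative):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]], float)
b = np.array([[5, 6], [7, 8]], float)

elementwise = a * b      # multiplies matching entries
matrix = np.dot(a, b)    # rows of a combined with columns of b

print(elementwise)
print("")
print(matrix)
```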


## Numpy array: mean and dot products

In [7]:
array_1 = np.array([1, 2, 3], float)
array_2 = np.array([[6], [7], [8]], float)
print(np.mean(array_1))
print(np.mean(array_2))
print("")
print(np.dot(array_1, array_2))  # shape (3,) dot shape (3, 1) gives shape (1,)

2.0
7.0

[ 44.]
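
`np.mean` averages over every element by default. For a 2D array it can also average along one axis, as in this small sketch (the array values are chosen only for illustration):

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]], float)

overall = np.mean(matrix)              # mean of all six values
per_column = np.mean(matrix, axis=0)   # one mean per column
per_row = np.mean(matrix, axis=1)      # one mean per row

print(overall)
print(per_column)
print(per_row)
```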


# Pandas 
## Series

A __Series__ in Pandas is a 1D object similar to an array, a list, or a column in a database. By default, each item in the Series is assigned an index label from 0 to N-1.

In [10]:
import pandas as pd

### Create a Series object

In [None]:
series = pd.Series(['Dave', 'Cheng-Han', 'Udacity', 42, -1789710578])
print(series)
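
The cell above has no recorded output, so here is a small sketch of what can be inspected on that Series: its default index labels and its dtype.

```python
import pandas as pd

series = pd.Series(['Dave', 'Cheng-Han', 'Udacity', 42, -1789710578])

labels = list(series.index)   # default labels 0..N-1
print(labels)
print(series.dtype)           # object, because the items have mixed types
print(series[2])              # look up the item labelled 2
```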


### Assign custom indices to the items in the Series

In [12]:
series = pd.Series(['Dave', 'Cheng-Han', 359, 9001],
                       index=['Instructor', 'Curriculum Manager',
                              'Course Number', 'Power Level'])
print(series)

### Selection by index

In [13]:
series = pd.Series(['Dave', 'Cheng-Han', 359, 9001],
                       index=['Instructor', 'Curriculum Manager',
                              'Course Number', 'Power Level'])
print(series['Instructor'])
print("")
print(series[['Instructor', 'Curriculum Manager', 'Course Number']])

Dave

Instructor                 Dave
Curriculum Manager    Cheng-Han
Course Number               359
dtype: object
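
Besides plain brackets, a Series can be indexed explicitly by label with `.loc` or by position with `.iloc`. A minimal sketch on the same data (the result variable names are illustrative):

```python
import pandas as pd

series = pd.Series(['Dave', 'Cheng-Han', 359, 9001],
                   index=['Instructor', 'Curriculum Manager',
                          'Course Number', 'Power Level'])

by_label = series.loc['Power Level']   # lookup by index label
by_position = series.iloc[-1]          # lookup by position (last item)
first_two = series.iloc[:2]            # positional slice

print(by_label)
print(by_position)
print(first_two)
```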


### Selection using boolean operators

In [14]:
cuteness = pd.Series([1, 2, 3, 4, 5], index=['Cockroach', 'Fish', 'Mini Pig',
                                                 'Puppy', 'Kitten'])
print(cuteness > 3)            # boolean mask, one entry per item
print("")
print(cuteness[cuteness > 3])  # keep only items where the mask is True

Cockroach    False
Fish         False
Mini Pig     False
Puppy         True
Kitten        True
dtype: bool

Puppy     4
Kitten    5
dtype: int64
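
Boolean masks can be combined with `&` (and), `|` (or), and `~` (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons. A short sketch on the same Series:

```python
import pandas as pd

cuteness = pd.Series([1, 2, 3, 4, 5],
                     index=['Cockroach', 'Fish', 'Mini Pig', 'Puppy', 'Kitten'])

middle = cuteness[(cuteness > 1) & (cuteness < 5)]      # both conditions hold
extremes = cuteness[(cuteness == 1) | (cuteness == 5)]  # either condition holds
not_cute = cuteness[~(cuteness > 3)]                    # condition negated

print(middle)
print(extremes)
print(not_cute)
```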


## DataFrames


To create a DataFrame, pass a dictionary of lists to the DataFrame
constructor:
1. Each key of the dictionary becomes a column name
2. The associated list supplies the values in that column

In [16]:
data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
            'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions',
                     'Lions', 'Lions'],
            'wins': [11, 8, 10, 15, 11, 6, 10, 4],
            'losses': [5, 8, 6, 1, 5, 10, 6, 12]}
football = pd.DataFrame(data)
print(football)

   losses     team  wins  year
0       5    Bears    11  2010
1       8    Bears     8  2011
2       6    Bears    10  2012
3       1  Packers    15  2011
4       5  Packers    11  2012
5      10    Lions     6  2010
6       6    Lions    10  2011
7      12    Lions     4  2012
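
The recorded output lists the columns alphabetically, which appears to be how the Pandas version used here ordered dictionary keys; newer versions are expected to keep the dictionary's insertion order instead. To avoid depending on either behavior, the order can be stated explicitly with the `columns` argument, as in this sketch:

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
        'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions',
                 'Lions', 'Lions'],
        'wins': [11, 8, 10, 15, 11, 6, 10, 4],
        'losses': [5, 8, 6, 1, 5, 10, 6, 12]}

# Explicit column order, independent of the Pandas version
football = pd.DataFrame(data, columns=['team', 'year', 'wins', 'losses'])
print(football.head(3))
```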


In [17]:
data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
            'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions',
                     'Lions', 'Lions'],
            'wins': [11, 8, 10, 15, 11, 6, 10, 4],
            'losses': [5, 8, 6, 1, 5, 10, 6, 12]}
football = pd.DataFrame(data)
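print(football.dtypes)      # data type of each column
print("")
print(football.describe())  # summary statistics for numeric columns
print("")
print(football.head())      # first five rows
print("")
print(football.tail())      # last five rows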


losses     int64
team      object
wins       int64
year       int64
dtype: object

          losses       wins         year
count   8.000000   8.000000     8.000000
mean    6.625000   9.375000  2011.125000
std     3.377975   3.377975     0.834523
min     1.000000   4.000000  2010.000000
25%     5.000000   7.500000  2010.750000
50%     6.000000  10.000000  2011.000000
75%     8.500000  11.000000  2012.000000
max    12.000000  15.000000  2012.000000

   losses     team  wins  year
0       5    Bears    11  2010
1       8    Bears     8  2011
2       6    Bears    10  2012
3       1  Packers    15  2011
4       5  Packers    11  2012

   losses     team  wins  year
3       1  Packers    15  2011
4       5  Packers    11  2012
5      10    Lions     6  2010
6       6    Lions    10  2011
7      12    Lions     4  2012
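
The figures reported by `describe()` can also be computed one at a time on a single column. A minimal sketch, assuming the same `data` dictionary as above:

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
        'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions',
                 'Lions', 'Lions'],
        'wins': [11, 8, 10, 15, 11, 6, 10, 4],
        'losses': [5, 8, 6, 1, 5, 10, 6, 12]}
football = pd.DataFrame(data)

wins_mean = football['wins'].mean()       # matches the "mean" row of describe()
wins_median = football['wins'].median()   # matches the "50%" row
max_losses = football['losses'].max()
rows_per_team = football['team'].value_counts().to_dict()

print(football.shape)   # (number of rows, number of columns)
print(wins_mean)
print(rows_per_team)
```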


### Indexing DataFrames

1. Selecting a single column returns a Series
2. Selecting multiple columns returns a DataFrame

In [19]:
data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
            'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions',
                     'Lions', 'Lions'],
            'wins': [11, 8, 10, 15, 11, 6, 10, 4],
            'losses': [5, 8, 6, 1, 5, 10, 6, 12]}

football = pd.DataFrame(data)
print(football['year'])
print('---')
print(football.year)  # shorthand for football['year']
print('---')
print(football[['year', 'wins', 'losses']])

0    2010
1    2011
2    2012
3    2011
4    2012
5    2010
6    2011
7    2012
Name: year, dtype: int64
---
0    2010
1    2011
2    2012
3    2011
4    2012
5    2010
6    2011
7    2012
Name: year, dtype: int64
---
   year  wins  losses
0  2010    11       5
1  2011     8       8
2  2012    10       6
3  2011    15       1
4  2012    11       5
5  2010     6      10
6  2011    10       6
7  2012     4      12
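
The Series-versus-DataFrame distinction can be verified with `isinstance`. Note that a one-item list of column names still returns a DataFrame. A short sketch using a trimmed-down version of the data:

```python
import pandas as pd

data = {'year': [2010, 2011, 2012],
        'team': ['Bears', 'Bears', 'Bears'],
        'wins': [11, 8, 10],
        'losses': [5, 8, 6]}
football = pd.DataFrame(data)

one_column = football['year']            # string key: Series
as_frame = football[['year']]            # list key: DataFrame, even with one name
several = football[['year', 'wins']]

print(type(one_column))
print(type(as_frame))
print(several.shape)
```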


### Row selection
Basic methods:
1. Slicing
2. Individual index (iloc or loc)
3. Boolean index

In [29]:
print(football.iloc[[0]])   # by position
print("")
print(football.loc[[0]])    # by index label
print("")
print(football[3:5])        # slice of rows
print("")
print(football[football.wins > 10])   # boolean index
print("")
print(football[(football.wins > 10) & (football.team == "Packers")])

## ?pd.DataFrame.loc
## ?pd.DataFrame.iloc

   losses   team  wins  year
0       5  Bears    11  2010

   losses   team  wins  year
0       5  Bears    11  2010

   losses     team  wins  year
3       1  Packers    15  2011
4       5  Packers    11  2012

   losses     team  wins  year
0       5    Bears    11  2010
3       1  Packers    15  2011
4       5  Packers    11  2012

   losses     team  wins  year
3       1  Packers    15  2011
4       5  Packers    11  2012
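
With the default index, `iloc[[0]]` and `loc[[0]]` return the same row, which hides the difference between them. It shows up once rows have been filtered, or when slicing: `loc` slices by label and includes the end label, while `iloc` slices by position and excludes the end. A sketch assuming the same `data` dictionary as above:

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
        'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions',
                 'Lions', 'Lions'],
        'wins': [11, 8, 10, 15, 11, 6, 10, 4],
        'losses': [5, 8, 6, 1, 5, 10, 6, 12]}
football = pd.DataFrame(data)

winners = football[football.wins > 10]   # keeps index labels 0, 3, 4
second_by_position = winners.iloc[1]     # second remaining row
same_row_by_label = winners.loc[3]       # the row whose label is 3

label_slice = football.loc[3:5]          # labels 3, 4 and 5
position_slice = football.iloc[3:5]      # positions 3 and 4 only

print(list(winners.index))
print(len(label_slice), len(position_slice))
```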


## Pandas vectorized methods