# Welcome to MSA8010-CIS8005 Section II

---

Welcome to MSA8010-CIS8005 Section II. In this section, we will be using a variety of tools that require some initial configuration. To make sure everything goes smoothly, we will set up most of those tools in this session. Some of this will likely be dull, but doing it now will let us spend the following weeks on more interesting work instead of getting bogged down in software configuration.


## Getting Python

You will be using Python throughout the rest of the course, including many popular third-party Python libraries for scientific computing. [Anaconda](http://continuum.io/downloads) is an easy-to-install bundle of Python and most of these libraries, and we recommend that you use it for this course.

Please visit [this page](https://github.com/MSA8010-CIS8005/contents/blob/master/Lesson1/Installing-Python.md) and follow the instructions to set up Python.

---

## Hello, Python

The IPython notebook is an application for building interactive computational notebooks. You'll be using notebooks to complete labs, homework, and projects. Once you've set up Python, please <a href=https://github.com/MSA8010-CIS8005/contents/blob/master/Lesson1/PythonLibs.ipynb download="PythonLibs.ipynb">download this page</a> and open it with IPython by typing:

```
ipython notebook <name_of_downloaded_file>
```

In newer installations, where the notebook has moved to the Jupyter project, the equivalent command is `jupyter notebook <name_of_downloaded_file>`.

Notebooks are composed of many "cells", which can contain text (like this one) or code (like the one below). Click on the code cell below and run it, either by clicking the "play" button in the toolbar or by pressing Shift+Enter.

In [None]:
x = [10, 20, 30, 40, 50]
for item in x:
    print("Item is", item)

## Python Libraries

We will be using several different libraries throughout this course. If you've successfully completed the [installation instructions](https://github.com/MSA8010-CIS8005/contents/blob/master/Lesson1/Installing-Python.md), all of the following statements should run.

In [None]:
import sys
print(sys.path)
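`sys.path` lists the folders Python searches for modules. When more than one Python or Anaconda installation is present, as the notes in the next cell describe, it can also help to print which interpreter the notebook is running and where a given module would be loaded from. The sketch below uses only the standard library and needs Python 3. `module_location` is a hypothetical helper name chosen for illustration.

```python
import importlib.util
import sys

# Which Python executable is running this notebook, and which version is it?
print("Executable:", sys.executable)
print("Version:   ", sys.version.split()[0])


def module_location(name):
    """Return the file a top-level module would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# If these paths do not point inside your Anaconda folder, the notebook is
# probably running a different Python than the one you installed packages into.
for name in ["numpy", "pandas"]:
    print(name, "->", module_location(name))
```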

In [None]:
# The sys.path printed above lists the folders (for Python 2 and/or 3) that
# this Jupyter (IPython) notebook searches for modules.

# Also check your operating system's PATH to see which Anaconda installation
# comes first. The pip that ships with an Anaconda installation installs
# modules into that installation's lib/python2.7/site-packages folder.
# In my case I had two Anaconda installations, and the one on my PATH was
# broken, so I went to the bin folder of the working installation
# (Users/XXXX/anaconda/bin) and ran ./pip from there.
# For an Anaconda Python 3 installation, find its folder, go to its bin
# subfolder, and run ./pip3 install <package_name> so that the modules are
# available to Python 3 as well.
# Mine was: /Library/Frameworks/Python.framework/Versions/3.4/bin

# My installation only had a Python 2 kernel, so I ran the following from
# Users/XXXX/anaconda/bin to make a Python 3 kernel available too:
# sudo ipython3 kernel install

# IPython is what you are using now to run the notebook
import IPython
print("IPython version:      %6.6s (need at least 1.0)" % IPython.__version__)

# NumPy is a library for working with arrays
import numpy as np
print("NumPy version:        %6.6s (need at least 1.7.1)" % np.__version__)

# SciPy implements many different numerical algorithms
import scipy as sp
print("SciPy version:        %6.6s (need at least 0.12.0)" % sp.__version__)

# pandas makes working with data tables easier
import pandas as pd
print("Pandas version:       %6.6s (need at least 0.11.0)" % pd.__version__)

# Matplotlib is a module for plotting
import matplotlib
print("Matplotlib version:   %6.6s (need at least 1.2.1)" % matplotlib.__version__)

# scikit-learn implements several machine learning algorithms
import sklearn
print("Scikit-Learn version: %6.6s (need at least 0.13.1)" % sklearn.__version__)

# Requests is a library for getting data from the Web
import requests
print("Requests version:     %6.6s (need at least 1.2.3)" % requests.__version__)

# NetworkX is a library for working with networks
import networkx as nx
print("NetworkX version:     %6.6s (need at least 1.7)" % nx.__version__)

# BeautifulSoup 4 (the bs4 package) parses HTML and XML documents
import bs4
print("BeautifulSoup version:%6.6s (need at least 3.2)" % bs4.__version__)

# mrjob is a library for running MapReduce jobs, including on Amazon's computers
import mrjob
print("mrjob version:        %6.6s (need at least 0.4)" % mrjob.__version__)

# Pattern has lots of tools for working with data from the internet.
# The pattern module is compatible only with Python 2. For Python 3 you can
# install pattern3 instead and change the import to pattern3.
import pattern
print("Pattern version:      %6.6s (need at least 2.6)" % pattern.__version__)

If any of these libraries are missing or out of date, you will need to [install them](https://github.com/MSA8010-CIS8005/contents/blob/master/Lesson1/Installing-Python.md) and restart IPython.
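The cell above prints each version so you can compare it by eye. The sketch below does the comparison automatically. The names `MINIMUM_VERSIONS`, `parse_version`, `at_least`, and `check_library` are hypothetical helpers, and the minimum versions are copied from that cell. The version parsing is deliberately simple: it assumes dotted numbers and ignores suffixes such as `rc1`. For full PEP 440 handling, the `packaging` library's `packaging.version.Version` is the usual tool.

```python
import importlib
import re

# Minimum versions copied from the version-check cell above.
MINIMUM_VERSIONS = {
    "IPython": "1.0",
    "numpy": "1.7.1",
    "scipy": "0.12.0",
    "pandas": "0.11.0",
    "matplotlib": "1.2.1",
    "sklearn": "0.13.1",
    "requests": "1.2.3",
    "networkx": "1.7",
    "bs4": "3.2",
    "mrjob": "0.4",
    "pattern": "2.6",
}


def parse_version(text):
    """Turn a version string such as '1.26.4rc1' into a tuple of ints, (1, 26, 4).

    Only the leading dotted-number part is used, so suffixes like 'rc1' or
    '.dev0' are ignored. This is a simplification, not a full PEP 440 parser.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(installed, minimum):
    """Return True if `installed` is the same as or newer than `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '0.4' compares equal to '0.4.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_library(module_name, minimum):
    """Return 'OK', 'TOO OLD', 'UNKNOWN' (no __version__ attribute) or 'MISSING'."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Not installed, or not importable by this Python (e.g. Python 2-only code).
        return "MISSING"
    installed = getattr(module, "__version__", None)
    if installed is None:
        return "UNKNOWN"
    return "OK" if at_least(str(installed), minimum) else "TOO OLD"


for name, minimum in MINIMUM_VERSIONS.items():
    print("%-12s %s" % (name, check_library(name, minimum)))
```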

## Hello matplotlib

The notebook integrates nicely with Matplotlib, the primary plotting package for Python. This should embed a figure of a sine wave:

In [None]:
# This line prepares IPython for working with matplotlib
%matplotlib inline

# This actually imports matplotlib (NumPy is imported too, in case earlier cells were skipped)
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 30)  # array of 30 evenly spaced points from 0 to 10
y = np.sin(x)
z = y + np.random.normal(size=30) * .2  # add Gaussian noise with standard deviation 0.2
# print(z)
plt.plot(x, y, 'ro-', label='A sine wave')
plt.plot(x, z, 'b-', label='Noisy sine')
plt.legend(loc='lower right')
plt.xlabel("X axis")
plt.ylabel("Y axis")

If that last cell complained about the `%matplotlib` line, you need to update IPython to version 1.0 or later and restart the notebook. See the [installation page](https://github.com/cs109/content/wiki/Installing-Python).
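The cell above uses pyplot's implicit "current figure" interface. A minimal sketch of the same figure built with explicit `Figure` and `Axes` objects follows. This style makes it easier to save the figure or add more panels later. The seed `0` and the output filename `sine_wave.png` are arbitrary choices for illustration, and `np.random.default_rng` requires NumPy 1.17 or later.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 30)
y = np.sin(x)
rng = np.random.default_rng(0)          # seeded so the noise is repeatable
z = y + rng.normal(size=30) * 0.2       # Gaussian noise, standard deviation 0.2

fig, ax = plt.subplots(figsize=(6, 4))  # one Figure containing one Axes
ax.plot(x, y, "ro-", label="A sine wave")
ax.plot(x, z, "b-", label="Noisy sine")
ax.legend(loc="lower right")
ax.set_xlabel("X axis")
ax.set_ylabel("Y axis")

# Write the figure to a PNG file in the current folder.
fig.savefig("sine_wave.png", dpi=100)
```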

## Hello Numpy

The NumPy array processing library is the basis of nearly all numerical computing in Python. Here's a 30-second crash course. For more details, consult Chapter 4 of Python for Data Analysis, or the [NumPy User's Guide](http://docs.scipy.org/doc/numpy-dev/user/index.html).

In [None]:
print("Make a 3 row x 4 column array of random numbers")
x = np.random.random((3, 4))
print(x)
print()

print("Add 1 to every element")
x = x + 1
print(x)
print()

print("Get the element at row 1, column 2")
print(x[1, 2])
print()

# The colon syntax is called "slicing" the array.
print("Get the first row")
print(x[0, :])
print()

print("Get every 2nd column of the first row")
print(x[0, ::2])
print()
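Random numbers make it hard to check by eye whether an index picked the element you intended. The sketch below repeats the same indexing and slicing on a predictable array, the numbers 0 to 11 laid out row by row. It also adds two extra slices, a whole column and a negative row index. Rows and columns are counted from 0.

```python
import numpy as np

# A predictable 3 x 4 array: the numbers 0..11 laid out row by row.
a = np.arange(12).reshape(3, 4)
print(a)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(a[1, 2])      # row 1, column 2 -> 6
print(a[0, :])      # first row -> [0 1 2 3]
print(a[0, ::2])    # every 2nd column of the first row -> [0 2]
print(a[:, 1])      # second column (index 1) of every row -> [1 5 9]
print(a[-1, :])     # negative index counts from the end: last row -> [ 8  9 10 11]
print(a.shape)      # (3, 4)
```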

Print the maximum, minimum, and mean of the array. This does **not** require writing a loop. In the code cell below, type `x.m<TAB>` to find built-in operations for common array statistics like these.

In [None]:
print("Max is  ", x.max())
print("Min is  ", x.min())
print("Mean is ", x.mean())

Call the `x.max` function again, but use the `axis` keyword to print the maximum of each row in x.

In [None]:
print(x.max(axis=1))
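The `axis` keyword names the dimension that gets collapsed. With `axis=1` the operation runs across the columns, giving one value per row. With `axis=0` it runs down the rows, giving one value per column. The small fixed array below makes the results easy to check by hand.

```python
import numpy as np

a = np.array([[1, 8, 3],
              [7, 2, 9]])

print(a.max())         # 9        largest value in the whole array
print(a.max(axis=1))   # [8 9]    largest value in each row
print(a.max(axis=0))   # [7 8 9]  largest value in each column
print(a.mean(axis=1))  # [4. 6.]  mean of each row: (1+8+3)/3 and (7+2+9)/3
```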

In a binomial experiment, each trial has two mutually exclusive outcomes, often referred to as "success" and "failure". If the probability of success is p, the probability of failure is 1 - p.

A single random experiment with exactly two possible outcomes, "success" or "failure", is called a Bernoulli trial, after the Swiss mathematician Jacob Bernoulli (1654 - 1705).
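Counting the successes in n independent Bernoulli trials, each with success probability p, gives a binomial random variable X. Its standard probability mass function, mean, and variance are:

$$
P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}, \qquad k = 0, 1, \dots, n
$$

$$
E[X] = np, \qquad \operatorname{Var}(X) = np(1 - p)
$$

For 500 fair coin tosses (n = 500, p = 0.5), this gives an expected 250 Heads with variance 125, so the standard deviation is about 11.18.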

Here's a way to quickly simulate 500 fair coin tosses (where the probability of getting Heads is 50%, or 0.5):

In [None]:
x = np.random.binomial(500, .5)
print("number of heads:", x)
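`np.random.binomial(500, .5)` returns only the total number of Heads. That total is the sum of 500 Bernoulli trials, so you can also generate the individual tosses, by asking for n = 1 per draw, and add them up. The sketch below uses NumPy's `Generator` interface (`np.random.default_rng`, NumPy 1.17+) rather than the older `np.random.binomial`. The seed `42` is an arbitrary choice so the result is repeatable.

```python
import numpy as np

rng = np.random.default_rng(42)

# 500 individual Bernoulli trials: each entry is 1 (Heads) or 0 (Tails).
tosses = rng.binomial(1, 0.5, size=500)
print(tosses[:10])

# Counting the 1s gives one binomial draw with n=500, p=0.5.
number_of_heads = tosses.sum()
print("number of heads:", number_of_heads)
```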

Repeat this simulation 500 times, and use the [plt.hist() function](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hist) to plot a histogram of the number of Heads (1s) in each simulation.

In [None]:
# 3 ways to run the simulations

# loop
heads = []
for i in range(500):
    heads.append(np.random.binomial(500, .5))

# "list comprehension"
heads = [np.random.binomial(500, .5) for i in range(500)]

# pure numpy
heads = np.random.binomial(500, .5, size=500)

histogram = plt.hist(heads, bins=10)

In [None]:
type(heads)

In [None]:
heads.shape
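The histogram should be centred near the binomial mean np = 250, with a spread close to the standard deviation sqrt(np(1 - p)), about 11.18. The sketch below checks the 500 simulations against those values numerically. It also reports the fraction that falls within two standard deviations of the mean, which the normal approximation puts at roughly 95%. The seed `0` is an arbitrary choice for repeatability.

```python
import numpy as np

n, p, simulations = 500, 0.5, 500
rng = np.random.default_rng(0)  # seeded so the numbers are repeatable

# One binomial draw per simulation: the number of Heads in n fair tosses.
heads = rng.binomial(n, p, size=simulations)

theory_mean = n * p                    # 250.0
theory_std = np.sqrt(n * p * (1 - p))  # about 11.18

print("simulated mean:", heads.mean(), "  theory:", theory_mean)
print("simulated std: ", round(heads.std(), 2), "  theory:", round(theory_std, 2))

# By the normal approximation, roughly 95% of simulations should land
# within two standard deviations of the mean.
within_two_sd = np.mean(np.abs(heads - theory_mean) <= 2 * theory_std)
print("fraction within 2 SD:", within_two_sd)
print(type(heads), heads.shape)
```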