# Using Jupyter Notebooks for Python Data Analysis

## What is Python
* Programming language first released in 1991
* High-level language that abstracts away most details of the underlying computer
* Emphasizes readability

In short, it helps you fly.

![Python Makes You Fly](https://imgs.xkcd.com/comics/python.png)

## Zen of Python
* Beautiful is better than ugly.

* Explicit is better than implicit.

* Simple is better than complex.

* Complex is better than complicated.

* Flat is better than nested.

* Sparse is better than dense.

* Readability counts.

* Special cases aren't special enough to break the rules.

* Although practicality beats purity.

* Errors should never pass silently.

* Unless explicitly silenced.

* In the face of ambiguity, refuse the temptation to guess.

* There should be one-- and preferably only one --obvious way to do it.

* Although that way may not be obvious at first unless you're Dutch.

* Now is better than never.

* Although never is often better than *right* now.

* If the implementation is hard to explain, it's a bad idea.

* If the implementation is easy to explain, it may be a good idea.

* Namespaces are one honking great idea -- let's do more of those!

## What is a Jupyter Notebook
* IPython began in 2001 with the goal of creating a better interactive Python interpreter
* Encourages the execute-explore workflow
* In 2014, the Jupyter project was announced to build language-agnostic interactive computing tools
    * Supports over 40 languages
    * IPython is a part of the Jupyter project
* Notebooks allow for a number of cell types including Markdown (like this cell!) for formatting purposes
    * [Markdown examples](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)

## How to run Jupyter Notebook Cells
* Run a cell with SHIFT + ENTER

## Python Language Basics

### Comments

In [11]:
# Comments are indicated by a pound sign (#) at the beginning of the line
# Nothing appears in the output cell when a line begins with a pound sign

### Strings

In [12]:
# Strings are enclosed with either single or double quotes i.e. 'string', or "string"
print("Hello World")

Hello World
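
As a small sketch of the quoting rule above, the cell below compares the two quote styles and shows one reason to prefer one over the other. The variable names (`single`, `double`, `nickname`) are illustrative only and do not appear elsewhere in this notebook.

```python
# Single and double quotes create the same kind of string
single = 'Hello World'
double = "Hello World"
print(single == double)

# Using double quotes avoids escaping the apostrophe inside the text
nickname = "Seattle's DH"
print(nickname)
print(len(nickname))
```

Either style works; the usual habit is to pick one and switch only when the text itself contains that quote character.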


### Numeric Types
* There are two main numeric types, int and float
* int stores integers, and float stores floating-point numbers

In [13]:
type(2)

int

In [14]:
type(2.5)

float

In [15]:
## Note: In Python 3, dividing two integers with / returns a float.
## (In Python 2, 3 / 4 returned 0 unless one operand was a float. Python 3 uses // for floor division.)
print(3 / 4)
print(3 / 4.0)

0.75
0.75
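
The difference between the division operators can be sketched as follows, assuming Python 3 (which the outputs above suggest). The variable names are placeholders chosen for this example.

```python
# True division always produces a float in Python 3
true_div = 3 / 4
# Floor division rounds down to the nearest whole number
floor_div = 3 // 4
# The modulo operator returns the remainder of the division
remainder = 3 % 4

print(true_div, type(true_div))
print(floor_div, type(floor_div))
print(remainder)
```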


### Functions
* Functions help clean and organize code
* It is best to use a function if similar code will be used more than once
* Functions make code more readable by providing a name to a group of Python statements
* We can use a function to take the average of 4 numbers
* First we will sum up the numbers 2, 4, 6, and 8, then divide by 4 (using parentheses to make the order of operations explicit)

In [16]:
def take_average_of_4_numbers(num1, num2, num3, num4):
    sum_numbers = num1+num2+num3+num4
    count_numbers = len([num1, num2, num3, num4])
    return ((sum_numbers)/float(count_numbers))

In [17]:
take_average_of_4_numbers(2, 4, 6, 8)

5.0
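
The function above is fixed at four inputs. One possible generalization, shown here only as a sketch, accepts any number of arguments through `*numbers`. The name `take_average` is hypothetical and not part of the original notebook.

```python
def take_average(*numbers):
    """Return the mean of any number of numeric arguments (hypothetical helper)."""
    if not numbers:
        # Dividing by len(numbers) would fail on an empty call, so fail early with a clear message
        raise ValueError("take_average() needs at least one number")
    return sum(numbers) / len(numbers)

print(take_average(2, 4, 6, 8))
print(take_average(1, 2))
```

Because the count comes from `len(numbers)`, the same function works for two values or twenty without being rewritten.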

## Certain packages make tasks much easier
* NumPy adds support for arrays and matrices and makes high-level math easy to run

In [18]:
import numpy as np

* np.mean is a function, which means it takes arguments
* We can also define our list of numbers as a variable, making the code simpler to read
* Now we will call the mean function from NumPy
* You can look up the docs online (http://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html)
* We need to pass the numbers as a single list for this to work

In [19]:
np.mean([2, 4, 6, 8])

5.0

In [20]:
x = [2, 4, 6, 8]
np.mean(x)

5.0
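
As a quick check on what `np.mean` is doing, the sketch below compares it with the manual sum-and-divide calculation and with the `.mean()` method of a NumPy array. It assumes NumPy is installed; `manual_mean`, `numpy_mean`, and `arr` are names chosen for this example.

```python
import numpy as np

x = [2, 4, 6, 8]

# The manual calculation: add the values up and divide by how many there are
manual_mean = sum(x) / len(x)
# np.mean accepts a plain list and converts it to an array internally
numpy_mean = np.mean(x)
print(manual_mean, numpy_mean)

# Converting to an array first gives access to array methods such as .mean()
arr = np.array(x)
print(arr.mean())
```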

## Lists
* An important concept in Python is the list, which can be used for a variety of purposes
* Lists are defined with square brackets
* We can even have a list of string values (but don't try taking a mean over them!)

In [21]:
best_mlb_players = ['Edgar Martinez', 'Ken Griffey Jr.', 'Felix Hernandez']
np.mean(best_mlb_players)

TypeError: cannot perform reduce with flexible type

## Errors
* Speaking of errors, they are very important
* Be sure to read through the error to find out why you might have an issue
* Our error was "TypeError: cannot perform reduce with flexible type", which says NumPy could not add up (reduce) values of a string type
* We can search Stack Overflow for the error message to find out why it occurred
* http://stackoverflow.com/questions/28393103/typeerror-cannot-perform-reduce-with-flexible-type
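
Errors can also be caught and inspected in code. The sketch below uses the built-in `sum` instead of `np.mean`, so the message differs from the NumPy one quoted above, but the idea of reading the error text is the same. The names `players` and `message` are illustrative.

```python
players = ['Edgar Martinez', 'Ken Griffey Jr.', 'Felix Hernandez']

try:
    # sum() starts from 0, so adding a string to it raises a TypeError
    average = sum(players) / len(players)
except TypeError as error:
    # The message explains which operation failed and on which types
    message = str(error)
    print("Caught a TypeError:", message)
```

Catching the error keeps the notebook running, but the message still needs to be read to understand what went wrong.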

In [22]:
# Indexing a list is a quick way to pull out the item you want
# Be aware that Python counts from 0, so to get the first person in our list, we ask for element 0
best_mlb_players[0]

'Edgar Martinez'

In [23]:
# We can also select a range of values with a slice (the end index is not included)
best_mlb_players[0:3]

['Edgar Martinez', 'Ken Griffey Jr.', 'Felix Hernandez']
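
A few more indexing and slicing patterns, sketched on a fresh copy of the same three names so the cell can run on its own. The variable names are placeholders.

```python
players = ['Edgar Martinez', 'Ken Griffey Jr.', 'Felix Hernandez']

first_two = players[0:2]   # start index is included, stop index is excluded
last_one = players[-1]     # negative indices count from the end of the list
from_second = players[1:]  # omitting the stop index runs to the end of the list

print(first_two)
print(last_one)
print(from_second)
```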

In [24]:
# We can "concatenate" lists by adding them together
print(best_mlb_players + ['Jay Buhner', 'Jamie Moyer'])
# But the original list is unchanged unless we reassign the result or append values in place
best_mlb_players.append('Jose Cruz Jr.')
print(best_mlb_players)
# Check the length of a list
print(len(best_mlb_players + ['Jay Buhner', 'Jamie Moyer']))

['Edgar Martinez', 'Ken Griffey Jr.', 'Felix Hernandez', 'Jay Buhner', 'Jamie Moyer']
['Edgar Martinez', 'Ken Griffey Jr.', 'Felix Hernandez', 'Jose Cruz Jr.']
6
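
The difference between building a new list and changing a list in place can be sketched like this. The list here is a shortened stand-in for `best_mlb_players`, and `combined` is a name chosen for the example.

```python
players = ['Edgar Martinez', 'Ken Griffey Jr.']

# + builds a new list and leaves the original untouched
combined = players + ['Jay Buhner']
print(len(players), len(combined))

# Reassigning the result is one way to keep the change
players = players + ['Jay Buhner']

# extend() adds several items in place, while append() adds exactly one
players.extend(['Jamie Moyer', 'Felix Hernandez'])
players.append('Jose Cruz Jr.')
print(players)
```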


### For Loops and Indentation
* For loops allow us to run the same code over a collection or an iterator
* Python uses whitespace instead of braces to structure code blocks such as for loops
* Colons denote the start of an indented block

In [25]:
for player in best_mlb_players:
    print(player)

Edgar Martinez
Ken Griffey Jr.
Felix Hernandez
Jose Cruz Jr.
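
When the position of each item matters as well as its value, the built-in `enumerate` can supply both. This is a minimal sketch on a three-name list; `numbered` and `line` are names introduced for the example.

```python
players = ['Edgar Martinez', 'Ken Griffey Jr.', 'Felix Hernandez']

numbered = []
# enumerate yields (position, item) pairs, and start=1 makes the count begin at 1
for position, player in enumerate(players, start=1):
    line = "{}. {}".format(position, player)
    numbered.append(line)
    print(line)
```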


### Binary Operators
* We can use binary operators, such as the comparisons == and <, in for loops and if/else statements

In [26]:
for player in best_mlb_players:
    if player=='Edgar Martinez':
        print("Edgar is the best!")
    else:
        continue

Edgar is the best!


In [27]:
for player in best_mlb_players:
    if len(player)<14:
        print("{} has the shortest name in our list with only {} characters".format(player, len(player)))
    else:
        print(player, len(player))

Edgar Martinez 14
Ken Griffey Jr. 15
Felix Hernandez 15
Jose Cruz Jr. has the shortest name in our list with only 13 characters
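
The cell above relies on a hard-coded threshold of 14 characters. One alternative, sketched here under the assumption that we want the shortest name found automatically, uses the built-in `min` with `key=len`. The list is re-declared so the cell is self-contained, and `shortest` is a name chosen for the example.

```python
players = ['Edgar Martinez', 'Ken Griffey Jr.', 'Felix Hernandez', 'Jose Cruz Jr.']

# min() with key=len compares the names by length rather than alphabetically
shortest = min(players, key=len)

for player in players:
    if player == shortest:
        print("{} has the shortest name with {} characters".format(player, len(player)))
    else:
        print(player, len(player))
```

This version keeps working if names are added or removed, since nothing depends on a fixed character count.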
