# LAC Data and Research Summit
## Introduction to Python
### Isabel Oñate
### 8/28/2019

#### <span style="color:#a50e3e;">Description: </span> 

Python is a programming language that is increasingly popular for data science. We will review some basic concepts and tools for using Python to conduct data analysis. 

Some resources:

- <a href="https://www.datacamp.com/community/tutorials/python-developer-set-up">Data camp course</a> 

- <a href="https://www.python.org/dev/peps/pep-0008/">Style guide for python code</a> 

- <a href="https://github.com/nuitrcs/intro-python-summer2019/blob/master/Python3_reference_cheat_sheet.pdf">Python beginners cheat sheet</a>  
- NumPy cheat sheet
- Pandas cheat sheet

*_Some code and exercises were based on materials from other workshops: <a href="https://github.com/nuitrcs/intro-python-summer2019/blob/master/python-script1.md">Introduction to Python workshop by NU</a>; <a href="https://nbviewer.jupyter.org/gist/rpmuller/5920182">A Crash Course in Python for Scientists by Rick Muller</a>; <a href="https://github.com/nuitrcs/numpy-scipy-workshop">NumPy and Scipy workshop by NU</a>; and <a href="https://docs.python.org/3/tutorial/index.html">Pandas workshop by NU</a>._

#### <span style="color:#a50e3e;">Jupyter notebook: </span> 

The <a href="https://jupyter.org/">Jupyter Notebook </a> is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.

Jupyter supports many programming languages, including Python and R. Notebooks can be shared with others using email, Dropbox, GitHub and the Jupyter Notebook Viewer. Your code can produce rich, interactive output: HTML, images, videos, LaTeX, and custom MIME types.

You can find instructions for installing Jupyter Notebook <a href="https://jupyter.org/install">here</a>.



### Basic syntax

#### <span style="color:#a50e3e;">Indentation </span> 
Python does not use braces to delimit blocks of code in loops, conditionals, or functions. Blocks are denoted by indentation, which is rigidly enforced: contiguous lines indented by the same amount form a block, and every statement within a block must use the same indentation.

In [238]:
if "yes" == "yes":
    print("true")
else:
    print("False")

true
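
As a small sketch of how blocks nest, the example below places an if/else inside a for loop. The variable names are illustrative only, and the 4-space indent per level follows the PEP 8 style guide linked above.

```python
# Each nested block is indented one more level (4 spaces per level).
labels = []
for number in [1, 2, 3]:
    # Everything indented under "for" runs once per number
    if number % 2 == 0:
        labels.append("even")  # belongs to the if block
    else:
        labels.append("odd")   # belongs to the else block
# Back at the left margin: this runs once, after the loop has finished
print(labels)
```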


#### <span style="color:#a50e3e;">Multi-line statements </span> 
Statements in Python typically end with a new line. However, Python allows the line continuation character (a backslash) at the end of a line to indicate that the statement continues on the next line.
Statements contained within [ ], { }, or ( ) brackets do not need the line continuation character.

In [239]:
total = 3 + \
        4 + \
        6

In [240]:
days = ['Monday', 'Tuesday', 'Wednesday',
        'Thursday', 'Friday']
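
Because brackets allow implicit continuation, the same sum could also be written with parentheses instead of backslashes, which is the form PEP 8 prefers. A minimal sketch:

```python
# Wrapping the expression in parentheses lets it span several lines
# without any backslashes.
total = (3
         + 4
         + 6)
print(total)
```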

#### <span style="color:#a50e3e;">Quotation </span> 
Python accepts single ('), double (") and triple (''' or """) quotes to denote string literals, as long as the same type of quote starts and ends the string. The triple quotes are used to span the string across multiple lines. 

In [241]:
word = 'word'
sentence = "This is a sentence."
paragraph = """This is a paragraph. It is
made up of multiple lines and sentences."""
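
Having several quote styles is handy when the text itself contains quotes. The sketch below uses made-up example strings to show the options:

```python
# Double quotes let the string contain an apostrophe without escaping
apostrophe = "It's a sentence."
# Single quotes let the string contain double quotes
quoted = 'She said "hello".'
# A backslash escapes a quote of the same type as the delimiters
escaped = 'It\'s a sentence.'
print(apostrophe == escaped)

# Triple-quoted strings keep the line breaks typed inside them
two_lines = """line one
line two"""
print(two_lines.count("\n"))
```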

#### <span style="color:#a50e3e;">Comments </span> 
A hash sign (#) that is not inside a string literal begins a comment. All characters after the # and up to the end of the physical line are part of the comment and the Python interpreter ignores them.

In [242]:
# Do not print this

# First comment
print("Hello, Python!")  # second comment

Hello, Python!


### Using Python as a calculator

Python can be used as a simple calculator for tasks including addition, subtraction, multiplication and division, exponentiation, etc.

In [243]:
print(5 + 2)
print(5 - 2)
print(3 * 5)
print(10 / 2) # division always returns a floating point number
print(9 // 4) # floor division discards fractional part
print(18 % 7) # This operator returns the remainder of the division
print(4 ** 2) # exponentiation
print(4 ** (1/2)) # square root; without parentheses, 4 ** 1/2 is evaluated as (4 ** 1) / 2

7
3
15
5.0
2
4
16
2.0
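
Operator precedence matters when these operators are combined: exponentiation binds more tightly than multiplication and division, which in turn bind more tightly than addition and subtraction. The following sketch, with illustrative variable names, shows a few cases where parentheses change the result:

```python
without_parens = 2 + 3 * 4     # multiplication first: 2 + 12
with_parens = (2 + 3) * 4      # parentheses first: 5 * 4
half_of_nine = 9 ** 1 / 2      # evaluated as (9 ** 1) / 2
root_of_nine = 9 ** (1 / 2)    # the square root of 9
negative = -2 ** 2             # evaluated as -(2 ** 2)

print(without_parens, with_parens)
print(half_of_nine, root_of_nine)
print(negative)
```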


In [244]:
print(3 == 3.1)
print(3 != 3.1)
print(3 > 3.1)
print(3 < 3.1)

print(not True)
print(not False)
print(not 3 == 3.1)

False
True
False
True
False
True
True


### Variables

In Python, a variable lets you refer to a value by name; values are assigned with "=". Variables can hold numbers (integers, floats, etc.), strings, Booleans, and other objects that we will review below.

In [245]:
x = 1
print(x)

x = x + 2
print(x)

x += 2
print(x)

1
3
5


In [246]:
# Assign a value to multiple variables
a = b = c = 10
print(a)
print(a, b, c) 

10
10 10 10


In [247]:
# Define multiple variables in one line
x, y = 2, 3
print(x, y)

# Swap values
x, y = y, x  
print(x, y)

2 3
3 2


In [248]:
# Float
z = 4.56
print(z)

4.56


In [249]:
# String
st = "hello"
print(st)

hello


In [250]:
# Boolean - either True or False
bl = True
print(bl)

True


In [251]:
# Variable type
print(type(x))
print(type(z))
print(type(st))
print(type(bl))

<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>


In [252]:
type(x)

int

### Lists

In Python, a list is an ordered collection of objects, which can be numbers, strings, other lists, and so on. A list is created with "[ ]". 

In [253]:
# Create a list of strings
my_list = ['a', 'b', 'c', 'd']
print(my_list)

# The elements in a list don't have to be of the same type, but they usually are.
my_mixed_list = [1, 'a', 2.3, [4, 5, 6]]
print(my_mixed_list)

['a', 'b', 'c', 'd']
[1, 'a', 2.3, [4, 5, 6]]


In [254]:
# Get the length of a list
print(len(my_list))

4


In [255]:
# Lists are indexed, starting with 0. List indices allow us to access the individual elements in a list or parts of the list.
my_list[0]
my_list[2]
my_list[-1]

'd'

In [256]:
my_list[0:3]
my_list[0:30]  # if the slice goes past the end, we only get what's available
my_list[-3:]
my_list[:2]
my_list[2:]

['c', 'd']

In [257]:
# When you have nested lists, you need a separate set of brackets for each list:
nested_lists = [[2, 3, 5, 7, 11], [2, 4, 6, 8]]
len(nested_lists)

2

In [258]:
nested_lists[0]
nested_lists[0][1]
nested_lists[-1][-1]

8

In [259]:
# We can use indices with strings too
string_var = 'abcde'
string_var[0]

'a'

In [260]:
# Adding lists
primes = [2, 3, 5, 7, 11]
more_primes = [13, 17, 19]
primes + more_primes

[2, 3, 5, 7, 11, 13, 17, 19]

In [261]:
# Adding one value
primes.append(13)
print(primes)
primes + [39]  # + returns a new list; primes itself is not modified
print(primes)

[2, 3, 5, 7, 11, 13]
[2, 3, 5, 7, 11, 13]


In [262]:
# You can change elements by assigning to them directly. This cannot be done with strings
fruit = ['apple', 'banana', 'pear']
fruit[0] = 'fig'
fruit

['fig', 'banana', 'pear']
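
To illustrate the difference with strings, the sketch below tries the same kind of assignment on a string and catches the resulting error. The variable names are illustrative only.

```python
word = 'pear'
try:
    word[0] = 'b'  # strings do not support item assignment
except TypeError as error:
    print("TypeError:", error)

# Build a new string instead of modifying the existing one
new_word = 'b' + word[1:]
print(new_word)
```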

In [263]:
# You can assign to slices of lists too
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
letters[2:5] = ['C', 'D', 'E']
letters

['a', 'b', 'C', 'D', 'E', 'f', 'g']

### Dictionaries

Dictionaries hold key-value pairs. Keys are labels you can use to look up the stored values associated with them. Keys are usually strings and are unique. A dictionary is created with "{ }".

In [264]:
ages = {'Isabel': 41, 'Pablo': 4, 'Diego': 2, 'Ana': 0, 'Maria': 36}
print(ages)

{'Isabel': 41, 'Pablo': 4, 'Diego': 2, 'Ana': 0, 'Maria': 36}


In [265]:
# Instead of indexing with positions, we index with keys:
print(ages['Isabel'])
print(ages['Diego'])

41
2


In [266]:
# We can add a key and value pair to a dictionary by setting it:
ages['Juan'] = 35
print(ages['Juan'])

# We can also change the value by assigning to it
ages['Juan'] = 37
print(ages['Juan'])

35
37
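
Looking up a key that is not in the dictionary with square brackets raises a KeyError. A minimal sketch of some ways to handle that, and to loop over all pairs, is shown below; the name 'Lucia' is a hypothetical missing key chosen for illustration.

```python
ages = {'Isabel': 41, 'Pablo': 4, 'Diego': 2}

# Check whether a key exists before using it
print('Lucia' in ages)

# get() returns None, or a default you supply, instead of raising KeyError
missing = ages.get('Lucia')
fallback = ages.get('Lucia', -1)
print(missing, fallback)

# Loop over key-value pairs
for name, age in ages.items():
    print(name, age)
```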


### NumPy

An array is a data structure that stores values of the same data type. <a href="https://www.numpy.org/">NumPy</a> is a Python package for scientific computing that provides arrays along with functions for fast vector, matrix, and linear algebra operations.

The n-dimensional array (ndarray) is the fundamental data structure in NumPy. It is a table of elements, all of the same type, which can be indexed (that is, accessed) by a tuple of integers.

In [267]:
# Import library
import numpy as np

In [268]:
# Array
np.array([1, 2, 3, 4, 5, 6])

array([1, 2, 3, 4, 5, 6])

In [269]:
# Array - integers
a = np.array([1, 2, 3, 4, 5, 6])
print(a.dtype)

# Array - float
b = np.array([1.0, 2.0, 3.0, 4.0])
print(b.dtype)

int32
float64


In [270]:
# Array of zeros
z = np.zeros((2, 5, 3))
print(z)

[[[0. 0. 0.]
  [0. 0. 0.]
  [0. 0. 0.]
  [0. 0. 0.]
  [0. 0. 0.]]

 [[0. 0. 0.]
  [0. 0. 0.]
  [0. 0. 0.]
  [0. 0. 0.]
  [0. 0. 0.]]]


In [271]:
z.shape

(2, 5, 3)

In [272]:
# Array of random integers
rand = np.random.randint(0, 10, (3, 3))
print(rand)

[[1 3 0]
 [5 5 8]
 [7 3 9]]


In [273]:
rand.shape

(3, 3)

In [274]:
# Slicing
print(a)
a[1:3]

[1 2 3 4 5 6]


array([2, 3])

In [275]:
a = np.arange(9).reshape(3, 3)
print(a)

[[0 1 2]
 [3 4 5]
 [6 7 8]]


In [276]:
a[1, 1]

4

In [277]:
a[:, 0]

array([0, 3, 6])

In [278]:
a[1:, :-1] # all rows starting with the second row, and all columns but the last one

array([[3, 4],
       [6, 7]])

In [279]:
# Functions
print(np.mean(a))
print(np.max(a))

4.0
8
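
The vector and matrix operations mentioned above work on whole arrays at once, without writing loops. The following is a small sketch on the same 3x3 array; the variable names are illustrative.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

doubled = a * 2              # arithmetic is applied element by element
col_sums = a.sum(axis=0)     # sum down each column
row_means = a.mean(axis=1)   # mean across each row
identity = np.eye(3, dtype=int)
product = a @ identity       # matrix product; multiplying by the identity returns a

print(doubled)
print(col_sums)
print(row_means)
print(product)
```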


### Pandas

Pandas is a widely used data analysis library for Python. It provides fast and flexible data structures built on top of NumPy. In pandas, each data structure has an Index, which associates data values with labels. 

A Series is a one-dimensional array of indexed data, and a DataFrame is a two-dimensional array of indexed data.

In [280]:
# Import library
import pandas as pd

In [281]:
max_depths_dict = {
    'Erie': 64,
    'Huron': 229,
    'Michigan': 281,
    'Ontario': 244,
    'Superior': 406,
}

max_depths = pd.Series(max_depths_dict)
max_depths

Erie         64
Huron       229
Michigan    281
Ontario     244
Superior    406
dtype: int64

In [282]:
avg_depths_dict  = {
    'Erie': 19,
    'Huron': 59,
    'Michigan': 85,
    'Ontario': 86,
    'Superior': 149,
}

avg_depths = pd.Series(avg_depths_dict)
avg_depths

Erie         19
Huron        59
Michigan     85
Ontario      86
Superior    149
dtype: int64

In [283]:
lakes = pd.DataFrame({'Max Depth (m)': max_depths, 'Avg Depth (m)': avg_depths})
lakes

          Max Depth (m)  Avg Depth (m)
Erie                 64             19
Huron               229             59
Michigan            281             85
Ontario             244             86
Superior            406            149


In [284]:
avg_depths['Michigan']

85

In [285]:
lakes['Avg Depth (m)']

Erie         19
Huron        59
Michigan     85
Ontario      86
Superior    149
Name: Avg Depth (m), dtype: int64

In [286]:
avg_depths > 60

Erie        False
Huron       False
Michigan     True
Ontario      True
Superior     True
dtype: bool
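
A Boolean Series like this one can be used to select rows. The sketch below rebuilds the lakes table directly from lists so that it runs on its own, then keeps only the rows where the condition is True and looks up a single value by row and column label.

```python
import pandas as pd

lakes = pd.DataFrame(
    {'Max Depth (m)': [64, 229, 281, 244, 406],
     'Avg Depth (m)': [19, 59, 85, 86, 149]},
    index=['Erie', 'Huron', 'Michigan', 'Ontario', 'Superior'],
)

# Keep only the lakes whose average depth exceeds 60 m
deep = lakes[lakes['Avg Depth (m)'] > 60]
print(deep)

# Select one value by row label and column label
huron_max = lakes.loc['Huron', 'Max Depth (m)']
print(huron_max)
```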

In [287]:
lakes.describe()

       Max Depth (m)  Avg Depth (m)
count            5.0            5.0
mean           244.8           79.6
std       122.713895      47.389872
min             64.0           19.0
25%            229.0           59.0
50%            244.0           85.0
75%            281.0           86.0
max            406.0          149.0


In [288]:
lakes['Max Depth (m)'].max()

406

### Programming

A common exercise in programming books is to compute the Fibonacci sequence up to some number n. The Fibonacci sequence is a sequence in math that starts with 0 and 1, and then each successive entry is the sum of the previous two. Thus, the sequence goes 0,1,1,2,3,5,8,13,21,34,55,89,...

Let's write the code for this:

In [289]:
# Define variable n
n = 10
# Initialize a list with numbers 0 and 1
sequence = [0,1]
# Loop over integers from 2 to n to append new element of the list
for i in range(2,n): # This is going to be a problem if we ever set n < 2!
    sequence.append(sequence[i-1]+sequence[i-2])
print(sequence)

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]


Now let's put this code into a function:

In [290]:
def fibonacci(sequence_length):
    "Return the Fibonacci sequence of length *sequence_length*"
    sequence = [0,1]
    if sequence_length < 1:
        print("Fibonacci sequence only defined for length 1 or greater")
        return
    if 0 < sequence_length < 3:
        return sequence[:sequence_length]
    for i in range(2,sequence_length): 
        sequence.append(sequence[i-1]+sequence[i-2])
    return sequence

In [291]:
fibonacci(10)

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
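
One possible variation, sketched here under the hypothetical name fibonacci_strict, raises a ValueError for invalid lengths instead of printing a message and returning None. This is only one design option; it lets the calling code decide how to handle the error.

```python
def fibonacci_strict(sequence_length):
    """Return the first *sequence_length* Fibonacci numbers as a list."""
    if sequence_length < 1:
        raise ValueError("sequence_length must be 1 or greater")
    sequence = [0, 1]
    for i in range(2, sequence_length):
        sequence.append(sequence[i - 1] + sequence[i - 2])
    # Slicing handles sequence_length == 1, where only [0] is wanted
    return sequence[:sequence_length]


print(fibonacci_strict(1))
print(fibonacci_strict(10))

try:
    fibonacci_strict(0)
except ValueError as error:
    print("ValueError:", error)
```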