ISRC Python Workshop: Basics I

## Introduction

![Use Python 2](http://i.imgur.com/v4OD30P.png)

### Interpreted Language

*You can type your command and get the response instantly.*

Example: 
![Example of Python code](http://i.imgur.com/VMP3D0T.png)

### Popular Language for Data Analysis

 ![Python Packages](http://i.imgur.com/Q8trGd1.png)

### Variables
*Variables can be thought of as containers. You can put anything inside one without declaring its size or type, which you would have to do in Java or C. Note that Python is case-sensitive, so names that differ only in letter case refer to different variables.*

In [1]:
x = 3 # integer
y = 3. # floating point number
z = "Hello!" # string
Z = "Wonderful!" # another string, stored in a separate variable, capital Z
print(x)
print(y)
print(z)
print(Z)

3
3.0
Hello!
Wonderful!
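*A minimal sketch of the two points above, written to run under both Python 2 and 3: a name can be rebound to a value of a different type without any declaration, and names that differ only in case are separate variables. The variable names are only illustrative.*

```python
# A variable is just a name; it can be rebound to a value of another type.
x = 3
type_before = type(x).__name__   # "int"
x = "three"
type_after = type(x).__name__    # "str"
print(type_before + " -> " + type_after)

# Names are case-sensitive: z and Z are two independent variables.
z = "Hello!"
Z = "Wonderful!"
print(z == Z)  # False, the two names hold different strings
```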


*You can do operations on numeric values as well as strings.*

In [2]:
sum_ = x + y # int + float = float
v = "World!"
sum_string = z + " " + v # concatenate strings
print(sum_)
print(sum_string)

6.0
Hello! World!


*Print with formatting*

In [3]:
print("The sum of x and y is %f"%sum_)

The sum of x and y is 6.000000
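*A few more format specifiers, as a hedged sketch. The value of `store` is made up for illustration; `%f`, `%.2f`, `%s`, `%i`, and `str.format` are standard Python features.*

```python
sum_ = 6.0
store = "HyVee"  # illustrative value, not part of the cells above

default_float = "The sum is %f" % sum_             # %f shows six decimal places by default
two_decimals = "The sum is %.2f" % sum_            # %.2f keeps two decimal places
mixed = "%s has %i letters" % (store, len(store))  # several values go in a tuple
with_format = "The sum is {:.1f}".format(sum_)     # str.format is an alternative syntax

print(default_float)  # The sum is 6.000000
print(two_decimals)   # The sum is 6.00
print(mixed)          # HyVee has 5 letters
print(with_format)    # The sum is 6.0
```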


*__Some notes on Strings__*

*To initialize a string variable, you can use either double or single quotes.*

In [4]:
store_name = "HyVee"
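*A small sketch showing that the two quote styles produce the same string, and that choosing one style lets the other appear inside the text without escaping. The example strings are illustrative.*

```python
single = 'HyVee'
double = "HyVee"
print(single == double)  # True

# Using one kind of quote lets the other appear inside the string unescaped.
with_apostrophe = "It's HyVee."
with_quotes = 'She said "Hello!"'
print(with_apostrophe)
print(with_quotes)
```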

*You can think of a string as a sequence of characters, so indices and bracket notation can be used to access a single character or a range of characters.*

In [5]:
name_13 = store_name[1:4] # [start, end): the end index is exclusive; Python indexing starts at 0, NOT 1
print(name_13)
last_letter = store_name[-1] # -1 means the last element
print(last_letter)

yVe
e
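*Slices can also leave out the start or the end. The sketch below reuses the same store name and assumes nothing beyond built-in string slicing.*

```python
store_name = "HyVee"

first_two = store_name[:2]    # omitted start means "from the beginning"
from_third = store_name[2:]   # omitted end means "through the last character"
last_two = store_name[-2:]    # negative indices count from the end

print(first_two)        # Hy
print(from_third)       # Vee
print(last_two)         # ee
print(len(store_name))  # 5
```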


### Control Logic

*The following examples cover comparisons, if-else statements, for loops, and while loops.*

#### Comparison

In [6]:
print(store_name == "HyVee") # Will return a boolean value True or False
print(sum_ < 0)

True
False


#### If-Else

In [7]:
if store_name != "Walmart":
    print("The store is not Walmart. It's " + store_name + ".")
else:
    print("The store is Walmart.")

The store is not Walmart. It's HyVee.


In [8]:
if sum_ == 0:
    print("sum_ is 0")
elif sum_ < 0:
    print("sum_ is less than 0")
else:
    print("sum_ is above 0 and its value is " + str(sum_)) # Cast sum_ into string type.

sum_ is above 0 and its value is 6.0


#### For loop: Iterating through a sequence

In [9]:
for letter in store_name:
    print(letter)
# Use index to access specific elements
# range() is a function that creates integer sequences
print("range(5) gives: " + str(list(range(5)))) # By default starts from 0
print("range(1,9) gives: " + str(list(range(1, 9)))) # From 1 to 9-1 (Again the end index is exclusive.)
for index in range(len(store_name)): # length of a sequence
    print("The %ith letter in store_name is: "%index + store_name[index])

H
y
V
e
e
range(5) gives: [0, 1, 2, 3, 4]
range(1,9) gives: [1, 2, 3, 4, 5, 6, 7, 8]
The 0th letter in store_name is: H
The 1th letter in store_name is: y
The 2th letter in store_name is: V
The 3th letter in store_name is: e
The 4th letter in store_name is: e


#### While loop: Repeating until a condition no longer holds

*Use __for__ when you know the exact number of iterations; use __while__ when you do not (e.g., checking convergence).*

In [10]:
flag = True
index = 0
while flag:
    print(store_name[index])
    index += 1 # a += b means a = a + b
    if index >= len(store_name):
        flag = False # if we get to the last element of string, the condition no longer holds
        print("The End!")

H
y
V
e
e
The End!
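*As a sketch of the convergence case mentioned above, the loop below approximates a square root by repeatedly averaging a guess with `target / guess`. The number of iterations is not known in advance, so a while loop fits. The tolerance and the starting guess are arbitrary choices made for this illustration.*

```python
target = 2.0
guess = 1.0          # arbitrary starting point
tolerance = 1e-10    # arbitrary stopping threshold
iterations = 0

# Keep refining the guess until guess * guess is close enough to target.
while abs(guess * guess - target) > tolerance:
    guess = (guess + target / guess) / 2.0
    iterations += 1

print("Approximate square root of 2: %f" % guess)
print("Iterations needed: %i" % iterations)
```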


#### Notes: Keywords *break* and *continue*

*break* means get out of the loop immediately. Any code in the loop after the break will NOT be executed.

In [11]:
flag = True
index = 0
while flag:
    print(store_name[index])
    index += 1 # a += b means a = a + b
    if store_name[index] == "V":
        print("End at V")
        break # instead of setting flag to False, we can directly break out of the loop
        print("Hello!") # This will NOT be run

H
y
End at V


*continue* skips the rest of the current iteration and moves straight on to the next one.

In [12]:
for letter in store_name:
    if letter == "V":
        continue # Not printing V
    else:
        print(letter)

H
y
e
e


### Data Structures

*In this section, we show some major data structures in Python.*

#### List

*Initialize a list with square brackets. A list can hold anything, including elements of different types.*

In [13]:
a_list = [1, 2, 3] # commas separate the elements
print("Length of a_list is: %i"%(len(a_list)))
print("The 3rd element of a_list is: %s" %(a_list[2])) # Remember Python starts with 0
print("The sum of a_list is %.2f"%(sum(a_list)))
b_list = [20, True, "good", "good"] # We can put different types in a list

Length of a_list is: 3
The 3rd element of a_list is: 3
The sum of a_list is 6.00


*Update a list: __pop__, __remove__, __append__, __extend__*

In [14]:
print("Pop %i out of a_list"%a_list.pop(1)) # pop removes and returns the element at the given index
print(a_list)
print("Remove the string good from b_list:")
b_list.remove("good") # remove the first occurrence of a specific value
print(b_list)
a_list.append(10)
print("After appending a new value, a_list is now: %s"%(str(a_list)))
# merge a_list and b_list
a_list.extend(b_list)
## This is equivalent to a_list += b_list
print("Merging a_list and b_list: %s"%(str(a_list)))
print("We can also use + to concatenate two lists: a_list + b_list = %s"%(a_list+b_list))

Pop 2 out of a_list
[1, 3]
Remove the string good from b_list:
[20, True, 'good']
After appending a new value, a_list is now: [1, 3, 10]
Merging a_list and b_list: [1, 3, 10, 20, True, 'good']
We can also use + to concatenate two lists: a_list + b_list = [1, 3, 10, 20, True, 'good', 20, True, 'good']
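*One detail worth seeing in isolation: `+` builds a new list, while `extend` changes the existing list in place. A minimal sketch with the same values as above:*

```python
a_list = [1, 3, 10]
b_list = [20, True, "good"]

combined = a_list + b_list   # builds a new list; a_list is unchanged
print(a_list)    # [1, 3, 10]
print(combined)  # [1, 3, 10, 20, True, 'good']

a_list.extend(b_list)        # modifies a_list in place and returns None
print(a_list)    # [1, 3, 10, 20, True, 'good']

removed = a_list.pop()       # with no index, pop removes and returns the last element
print(removed)   # good
```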


#### Tuple (a list-like sequence whose elements cannot be changed)
*Initialize a tuple with parentheses. The main practical difference between a list and a tuple is that you can alter a list but not a tuple.*

In [15]:
a_tuple = (1, 2, 3, 10)
print(a_tuple)
print("First element of a_tuple: %i"%a_tuple[0])
# You cannot change the values of a_tuple
a_tuple[0] = 5

(1, 2, 3, 10)
First element of a_tuple: 1


TypeError: 'tuple' object does not support item assignment
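*If you want the program to keep running after such an error, you can catch it. The sketch below also shows one way to get a "changed" tuple: build a new one. The name `b_tuple` is only illustrative.*

```python
a_tuple = (1, 2, 3, 10)

try:
    a_tuple[0] = 5           # not allowed: tuples are immutable
except TypeError as err:
    print("Cannot modify a tuple: %s" % err)

# Build a new tuple instead of changing the old one.
b_tuple = (5,) + a_tuple[1:]  # (5,) is a one-element tuple; the comma is required
print(a_tuple)  # (1, 2, 3, 10)
print(b_tuple)  # (5, 2, 3, 10)
```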

#### Dictionary: key-value pairs

*Initialize a dict with curly brackets.*

In [16]:
d = {} # empty dictionary
d[1] = "1 value" # add a key-value pair with d[key] = value. Values can be anything; keys must be hashable (e.g., numbers, strings, tuples).
print(d)
# Use for loop to add values
for index in range(2, 10):
    d[index] = "%i value"%index
print(d)
print("All the keys: " + str(list(d.keys())))
print("All the values: " + str(list(d.values())))
for key in d:
    print("Key is: %i, Value is : %s"%(key, d[key]))

{1: '1 value'}
{1: '1 value', 2: '2 value', 3: '3 value', 4: '4 value', 5: '5 value', 6: '6 value', 7: '7 value', 8: '8 value', 9: '9 value'}
All the keys: [1, 2, 3, 4, 5, 6, 7, 8, 9]
All the values: ['1 value', '2 value', '3 value', '4 value', '5 value', '6 value', '7 value', '8 value', '9 value']
Key is: 1, Value is : 1 value
Key is: 2, Value is : 2 value
Key is: 3, Value is : 3 value
Key is: 4, Value is : 4 value
Key is: 5, Value is : 5 value
Key is: 6, Value is : 6 value
Key is: 7, Value is : 7 value
Key is: 8, Value is : 8 value
Key is: 9, Value is : 9 value
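*Looking up a key that is not in the dictionary with `d[key]` raises a KeyError. The sketch below shows two common ways to avoid that, plus iterating over keys and values together. The fallback text "no value" is an arbitrary choice.*

```python
d = {1: "1 value", 2: "2 value"}

print(1 in d)                # True: membership tests look at the keys
print(5 in d)                # False
print(d.get(5, "no value"))  # no value (get returns the fallback instead of raising)

# items() yields (key, value) pairs; sorted() makes the order predictable.
for key, value in sorted(d.items()):
    print("Key is: %i, Value is : %s" % (key, value))
```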


### Functions

*Now we can write our first function by combining everything we have covered above.*

*A function is a block of code that takes input arguments and, optionally, returns values, written for a specific purpose.*

In [17]:
def mySum(list_to_sum):
    return sum(list_to_sum)
def mySumUsingLoop(list_to_sum):
    sum_ = list_to_sum[0]
    for item in list_to_sum[1:]:
        sum_ += item
    return sum_
#################################
print(mySum(range(5)))
print(mySumUsingLoop(range(5)))

10
10
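*Two details about functions, as a hedged sketch with hypothetical function names. Starting the running total at 0 lets the loop version handle an empty list, and a function without a return statement gives back `None`.*

```python
def my_sum_from_zero(list_to_sum):
    """Sum the items with a loop, starting from 0 so an empty list works."""
    total = 0
    for item in list_to_sum:
        total += item
    return total

def print_sum(list_to_sum):
    """Print the sum; there is no return statement, so the result is None."""
    print("The sum is %s" % my_sum_from_zero(list_to_sum))

print(my_sum_from_zero(range(5)))  # 10
print(my_sum_from_zero([]))        # 0
result = print_sum([1, 2, 3])      # prints: The sum is 6
print(result)                      # None
```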


### File I/O
*This section covers the basics of writing data to, and reading data from, files on your hard disk.*

#### Write data to a file

In [18]:
f = open("./tmp.csv", "w") # f is a file object (handle), while "w" is the mode (w for write)
data = range(10)
for item in data:
    f.write(str(item))
    f.write("\n") # add newline character
f.close()

#### Read data from a file

In [19]:
f = open("./tmp.csv", "r") # this time, use read mode
contents = [item for item in f] # list comprehension: equivalent to a for loop that collects each line, but more concise
contents = [item.strip("\n") for item in contents] # strip the newline
print(contents)
int_values = list(map(int, contents)) # convert each string to an integer
print(int_values)
f.close()

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
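*The same write-then-read steps can be sketched with a `with` block, which closes the file automatically even if an error occurs. Writing into a temporary directory is an assumption made here so the example does not touch your working folder; `./tmp.csv` would work the same way.*

```python
import os
import tempfile

# Assumption for this sketch: use a throwaway directory instead of "./tmp.csv".
path = os.path.join(tempfile.mkdtemp(), "tmp.csv")

with open(path, "w") as f:         # the file is closed when the block ends
    for item in range(10):
        f.write(str(item) + "\n")

with open(path, "r") as f:
    int_values = [int(line.strip()) for line in f]  # strip the newline, then convert

print(int_values)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```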


### Libraries

#### Built-in Libraries

*Python ships with many built-in packages, so you do not have to reimplement common, useful functions yourself.*

*We will use __math__ as an example.*

In [20]:
# use import to load a library
import math
x = 3
print("e^x = e^3 = %f"%math.exp(x))
print("log(x) = log(3) = %f"%math.log(x))

e^x = e^3 = 20.085537
log(x) = log(3) = 1.098612


In [21]:
# You can import a specific function
from math import exp
print(exp(x)) # This way, you don't need to use math.exp but just exp
# Import all functions (convenient, but it can overwrite names you have already defined)
from math import *
print(exp(x))
print(log(x)) # Calling exp or log before they are imported raises a NameError

20.0855369232
20.0855369232
1.09861228867
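*To see why the import is needed, the sketch below calls `exp` in a fresh session where nothing has been imported from math yet, and catches the resulting error. It assumes no earlier cell has already imported `exp`.*

```python
try:
    print(exp(3))            # exp has not been imported, so this name is unknown
except NameError as err:
    error_message = str(err)
    print("NameError: " + error_message)

import math                  # after importing, use the module prefix
print(math.exp(0))           # 1.0
```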


#### External Libraries

*Sometimes you'll want advanced utility functions that Python's built-in libraries do not provide. Many useful third-party packages fill that gap.*

*We'll use __numpy__ as an example. (__numpy__, __scipy__, __matplotlib__, and probably __pandas__ will be the most important packages for your data analyses.)*

*The easiest way to install Python packages is with <a href="https://packaging.python.org/installing/" target="_blank">pip</a>.*

In [22]:
# After you install numpy, load it
import numpy as np # you can use np instead of numpy to call the functions in numpy package
x = np.array([[1,2,3], [4,5,6]], dtype=float) # create a numpy array object, specify the data type as float
print(x)

[[ 1.  2.  3.]
 [ 4.  5.  6.]]


In [24]:
# SciPy and NumPy provide extensive utilities for manipulating data and running simple analyses
from scipy.stats import pearsonr, spearmanr # correlation functions
print(pearsonr(x[1, :], x[0, :]))
print(spearmanr(x[1, :], x[0, :]))

(1.0, 0.0)
SpearmanrResult(correlation=1.0, pvalue=0.0)
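*To make the first number in the pearsonr result less of a black box, here is a plain-Python sketch of the textbook Pearson correlation formula. It is not SciPy's implementation, it does not compute the p-value, and the function name `pearson` is hypothetical.*

```python
import math

def pearson(xs, ys):
    """Textbook Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)

r_same = pearson([4.0, 5.0, 6.0], [1.0, 2.0, 3.0])      # the two rows of x above
r_opposite = pearson([4.0, 5.0, 6.0], [3.0, 2.0, 1.0])  # second row reversed
print(r_same)      # 1.0
print(r_opposite)  # -1.0
```

*The two rows rise together in a perfectly straight line, which is why the coefficient is 1.0; reversing one row flips the sign.*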
