# <img style="float: left; padding-right: 10px; width: 45px" src="../../styles/iacs.png"> S-109A Introduction to Data Science 


# Lab 1: Solutions to Introduction to Python 

**Harvard University**<br>
**Fall 2018**<br>
**Instructors:** Pavlos Protopapas and Kevin Rader <br>
**Lab Instructor:** Rahul Dave <br>
**Authors:** Rahul Dave, David Sondak, Will Claybaugh, Pavlos Protopapas


---



In [2]:
## RUN THIS CELL TO GET THE RIGHT FORMATTING 
from IPython.core.display import HTML
def css_styling():
    with open("../../styles/cs109.css", "r") as f:  # close the file once it has been read
        styles = f.read()
    return HTML(styles)
css_styling()

## Programming Expectations
All assignments for this class will use Python and the browser-based iPython notebook format you are currently viewing. Python experience is not a prerequisite for this course, as long as you are comfortable learning on your own as needed. While we strive to make the programming component of this course straightforward, we won't devote much time to teaching programming or Python syntax. 

Note, though, that **programming at the level of CS 50 is a prerequisite** for this course.   If you have concerns about the prerequisite, please come speak with any of the instructors. 

We will refer to the Python 3 [documentation](https://docs.python.org/3/) in this lab and throughout the course.  There are also many introductory tutorials to help build programming skills, which are listed in the last section of this lab.

## Table of Contents 
<ol start="0">
<li> Learning Goals </li>
<li> Getting Started</li>
<li> Lists </li>
<li> Strings and Listiness </li>
<li> Dictionaries </li>
<li> Functions </li>
<li> Text Analysis of Hamlet </li>
<li> References </li>
</ol>

## Part 0:  Learning Goals 
This introductory lab is a condensed tutorial in Python programming.  By the end of this lab, you will feel more comfortable:

- Writing short Python code using functions, loops, lists, dictionaries, strings, and if statements.

- Manipulating Python lists and recognizing the listy properties of other Python containers.

- Learning and reading Python documentation.  

*Lab 1 relates to material in lectures 0, 1, 2, and 3 and homework 0.*

## Part 1: Getting Started

### Importing modules
All notebooks should begin with code that imports *modules*, collections of reusable, commonly-used Python functions.  Below we import the NumPy module, a fast numerical programming library for scientific computing.  Future labs will require additional modules, which we'll import with the same `import MODULE_NAME as MODULE_NICKNAME` syntax.

In [3]:
import numpy as np #imports a fast numerical programming library

Now that Numpy has been imported, we can access some useful functions.  For example, we can use `mean` to calculate the mean of a set of numbers.

In [4]:
np.mean([1.2, 2, 3.3])

2.1666666666666665

The cell above calculates the mean of 1.2, 2, and 3.3.

The code above is not particularly efficient, and efficiency will be important for you when dealing with large data sets. Later labs will introduce more efficient options.

### Calculations and variables

At the most basic level we can use Python as a simple calculator.

In [5]:
1 + 2

3

Notice integer division (//) and floating-point error below!

In [6]:
1/2, 1//2, 1.0/2.0, 3*3.2

(0.5, 0, 0.5, 9.600000000000001)
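Because of floating-point error like the `9.600000000000001` above, testing floats for exact equality can give surprising answers. Here is a minimal sketch of a more forgiving comparison using `math.isclose` from the standard library; the variable names are only illustrative.

```python
import math

product = 3 * 3.2   # stored in binary floating point, so not exactly 9.6

exact_match = (product == 9.6)             # exact comparison of the two floats
close_match = math.isclose(product, 9.6)   # comparison within a small relative tolerance

print(product, exact_match, close_match)
```

The default tolerance of `math.isclose` is relative, so when comparing against values near zero you may want to pass `abs_tol` as well.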

The last line in a cell is returned as the output value, as above.  For cells with multiple lines of results, we can display results using ``print``, as can be seen below.

In [7]:
print(1 + 3.0, "\n", 9, 7)
5/3

4.0 
 9 7


1.6666666666666667

We can store integer or floating point values as variables.  The other basic Python data types -- booleans, strings, lists -- can also be stored as variables. 

In [8]:
a = 1
b = 2.0

Here we store a list in a variable:

In [9]:
a = [1, 2, 3]

Think of a variable as a label for a value, not a box in which you put the value.

![](images/sticksnotboxes.png)

(image taken from Fluent Python by Luciano Ramalho)

In [10]:
b = a
b

[1, 2, 3]

This DOES NOT create a new copy of `a`. It merely puts a second label, `b`, on the same list in memory that `a` labels, as can be seen by the following code:

In [11]:
print("a", a)
print("b", b)
a[1] = 7
print("a after change", a)
print("b after change", b)

a [1, 2, 3]
b [1, 2, 3]
a after change [1, 7, 3]
b after change [1, 7, 3]
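If we do want an independent copy, we have to ask for one. The sketch below contrasts a second label with a copy made by `list(...)`; the names `original`, `alias`, and `copy_of_original` are made up for this illustration.

```python
original = [1, 2, 3]
alias = original                  # a second label on the same list object
copy_of_original = list(original) # a new list holding the same items (original.copy() also works)

original[1] = 7

print(alias)             # follows the change, because it is the same object
print(copy_of_original)  # keeps the old values
print(alias is original, copy_of_original is original)
```

Note that this is a shallow copy: the new list is a separate object, but any mutable items inside it are still shared.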


Multiple comma-separated expressions on the last line of a cell are returned as a *tuple*, an immutable sequence of Python objects.

In [12]:
a = 1
b = 2.0
a + a, a - b, b * b, 10*a

(2, -1.0, 4.0, 10)

We can obtain the type of a variable, and use boolean comparisons to test these types. 

In [13]:
type(a) == float

False

In [14]:
type(a) == int

True
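Comparing `type(a)` to a type works, but the built-in `isinstance` is a common alternative that can test against several types at once. A short sketch, with illustrative variable names:

```python
x = 1
y = 2.0

print(isinstance(x, int))            # True
print(isinstance(y, (int, float)))   # True: matches any type in the tuple

# isinstance also accepts subclasses; bool is a subclass of int in Python
exact_type_check = (type(True) == int)
subclass_check = isinstance(True, int)
print(exact_type_check, subclass_check)
```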

For reference, below are common arithmetic and comparison operations.

<img src="images/ops1_v2.png" alt="Drawing" style="width: 600px;"/>

<img src="images/ops2_v2.png" alt="Drawing" style="width: 650px;"/>

>**EXERCISE**:  Create a tuple called `tup` with the following five objects:

> - The first element is an integer of your choice
> - The second element is a float of your choice  
> - The third element is the sum of the first two elements
> - The fourth element is the difference of the first two elements
> - The fifth element is the first element divided by the second element

> Display the output of `tup`.  What is the type of the variable `tup`? What happens if you try to change an item in the tuple? 

In [17]:
# your code here
a = 1
b = 2.0
tup = (a, b, a + b, a - b, a/b)
print(tup, type(tup))
print(type(a))
tup[1] = 4

(1, 2.0, 3.0, -1.0, 0.5) <class 'tuple'>
<class 'int'>


TypeError: 'tuple' object does not support item assignment

## Part 2: Lists

Much of Python is based on the notion of a list.  In Python, a list is a sequence of items separated by commas, all within square brackets.  The items can be integers, floats, or any other type.  Unlike in C arrays, items in a Python list can be of different types, so Python lists are more versatile than traditional arrays in C or in other languages. 

Let's start out by creating a few lists.  

In [16]:
empty_list = []
float_list = [1., 3., 5., 4., 2.]
int_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mixed_list = [1, 2., 3, 4., 5]
print(empty_list)
print(int_list)
print(mixed_list, float_list)

[]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[1, 2.0, 3, 4.0, 5] [1.0, 3.0, 5.0, 4.0, 2.0]


Lists in Python are zero-indexed, as in C.  The first entry of the list has index 0, the second has index 1, and so on.

In [15]:
print(int_list[0])
print(float_list[1])

1
3.0


What happens if we try to use an index that doesn't exist for that list?  Python will complain!

In [16]:
print(float_list[10])

IndexError: list index out of range

A list has a length at any given point in the execution of the code, which we can find using the `len` function.

In [17]:
print(float_list)
len(float_list)

[1.0, 3.0, 5.0, 4.0, 2.0]


5

### Indexing on lists

And since Python is zero-indexed, the last element of `float_list` is

In [18]:
float_list[len(float_list)-1]

2.0

It is more idiomatic in Python to use -1 for the last element, -2 for the second last, and so on.

In [19]:
float_list[-1]

2.0

We can use the ``:`` operator to access a subset of the list.  This is called *slicing.* 

In [20]:
print(float_list[1:5])
print(float_list[0:2])

[3.0, 5.0, 4.0, 2.0]
[1.0, 3.0]


Below is a summary of list slicing operations:

<img src="images/ops3_v2.png" alt="Drawing" style="width: 600px;"/>

You can slice "backwards" as well:

In [21]:
float_list[:-2] # everything except the last two elements

[1.0, 3.0, 5.0]

In [22]:
float_list[:4] # up to but not including 5th element

[1.0, 3.0, 5.0, 4.0]

You can also slice with a stride:

In [23]:
float_list[:4:2] # same range as above, but taking every second element

[1.0, 5.0]
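A few more slices in the same spirit, as a sketch. The list `demo_list` is a stand-in with the same values as `float_list`, so that the lab's own variables are left alone.

```python
demo_list = [1., 3., 5., 4., 2.]

every_other = demo_list[::2]     # whole list, taking every second element
reversed_list = demo_list[::-1]  # a negative stride walks through the list backwards
last_two = demo_list[-2:]        # from the second last element to the end

print(every_other)
print(reversed_list)
print(last_two)
```

Each slice builds a new list; `demo_list` itself is unchanged.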

We can iterate through a list using a loop.  Here's a for loop.

In [24]:
for ele in float_list:
    print(ele)

1.0
3.0
5.0
4.0
2.0


Or, if we like, we can iterate through a list by index, using a for loop with `in range`. This is not idiomatic and is not recommended, but accomplishes the same thing as above.

In [25]:
for i in range(len(float_list)):
    print(float_list[i])

1.0
3.0
5.0
4.0
2.0


What if you wanted the index as well?

Python has other useful functions such as `enumerate`, which produces one tuple of the form `(index, value)` for each item. 

In [26]:
for i, ele in enumerate(float_list):
    print(i,ele)

0 1.0
1 3.0
2 5.0
3 4.0
4 2.0


In [27]:
list(enumerate(float_list))

[(0, 1.0), (1, 3.0), (2, 5.0), (3, 4.0), (4, 2.0)]

This is an example of an *iterator*, something that can be used to set up an iteration. When you call `enumerate`, a list of tuples is not created. Rather, an object is created which, when iterated over (or passed to the `list` function as an argument), produces one tuple at a time, as if you were in a loop.
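To see the one-tuple-at-a-time behavior directly, we can pull items out by hand with the built-in `next`. This is only a sketch; the list `letters` is invented for the example.

```python
letters = ["x", "y", "z"]
pairs = enumerate(letters)   # an enumerate object, not a list

first = next(pairs)          # (0, 'x')
second = next(pairs)         # (1, 'y')
remaining = list(pairs)      # whatever has not been produced yet: [(2, 'z')]

print(first, second, remaining)
```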

### Appending and deleting

We can also add items to the end of a list. The `+` operator builds a new, longer list and leaves the original unchanged, while `append` modifies the existing list in place.

In [28]:
float_list + [.333]

[1.0, 3.0, 5.0, 4.0, 2.0, 0.333]

In [29]:
float_list.append(.444)

In [30]:
print(float_list)
len(float_list)

[1.0, 3.0, 5.0, 4.0, 2.0, 0.444]


6

Go and run the cell with `float_list.append` a second time.  Then run the next line.  What happens?  

To remove an item from the list, use `del`.

In [31]:
del(float_list[2])
print(float_list)

[1.0, 3.0, 4.0, 2.0, 0.444]
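The difference between building a new list and changing a list in place is easy to lose track of, so here is a small sketch. It uses a throwaway list `nums`, and also shows the list methods `pop` and `remove` as alternatives to `del`.

```python
nums = [1.0, 3.0, 5.0]

longer = nums + [0.333]   # builds a new list; nums is unchanged
nums.append(0.444)        # modifies nums in place (and returns None)

removed = nums.pop(0)     # removes the item at position 0 and returns it
nums.remove(5.0)          # removes the first item equal to 5.0

print(longer)
print(removed, nums)
```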


### List Comprehensions

Lists can be constructed in a compact way using a *list comprehension*.  Here's a simple example.

In [32]:
squaredlist = [i*i for i in int_list]
squaredlist

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

And here's a more complicated one, requiring a conditional.

In [33]:
comp_list1 = [2*i for i in squaredlist if i % 2 == 0]
print(comp_list1)

[8, 32, 72, 128, 200]


This is entirely equivalent to creating `comp_list1` using a loop with a conditional, as below:

In [34]:
comp_list2 = []
for i in squaredlist:
    if i % 2 == 0:
        comp_list2.append(2*i)
        
comp_list2

[8, 32, 72, 128, 200]

The list comprehension syntax

```
[expression for item in list if conditional]
```

is equivalent to building up a list with the loop

```
result = []
for item in list:
    if conditional:
        result.append(expression)
```

>**EXERCISE**:  Build a list that contains every prime number between 1 and 100, in two different ways:
1.  Using for loops and conditional if statements.
2.  *(Stretch Goal)* Using a list comprehension.  You should be able to do this in one line of code, and it may be helpful to look up the function `all` in the documentation.

In [35]:
# your code here
N = 100

# using loops and if statements
primes = []
for j in range(2, N):
    count = 0  # number of divisors of j found between 2 and j-1
    for i in range(2, j):
        if j % i == 0:
            count = count + 1
    if count == 0:
        primes.append(j)
primes

[2,
 3,
 5,
 7,
 11,
 13,
 17,
 19,
 23,
 29,
 31,
 37,
 41,
 43,
 47,
 53,
 59,
 61,
 67,
 71,
 73,
 79,
 83,
 89,
 97]

In [36]:
# your code here
# using list comprehension 
primes_lc = [j for j in range(2, N) if all(j % i != 0 for i in range(2, j))]

print(primes)
print(primes_lc)

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
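Both solutions test every candidate divisor below `j`. One possible refinement, which is not part of the solution above, is to stop at the square root of the number, since any larger divisor would pair with a smaller one already tested. The helper `is_prime` below is a hypothetical name, and `math.isqrt` assumes Python 3.8 or newer.

```python
import math

def is_prime(n):
    """Return True if n is prime, testing divisors only up to the integer square root of n."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, math.isqrt(n) + 1))

primes_sqrt = [n for n in range(2, 100) if is_prime(n)]
print(primes_sqrt)
```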


## Part 3:  Strings and listiness

A list is a container that holds a bunch of objects.  We're particularly interested in Python lists because many other containers in Python, like strings, dictionaries, numpy arrays, pandas series and dataframes, and iterators like `enumerate`, have list-like properties.  This is known as [duck](https://en.wikipedia.org/wiki/Duck_typing) typing, a term coined by Alex Martelli, which refers to the notion that  *if it quacks like a duck, it is a duck*.  We'll soon see that these  containers quack like lists, so for practical purposes we can think of these containers as lists!  They are listy!

Containers that are listy have a set length, can be sliced, and can be iterated over with a loop.  Let's look at some listy containers now.

### Strings
We claim that strings are listy.  Here's a string.

In [37]:
astring = "kevin"

Like lists, this string has a set length, the number of characters in the string.

In [38]:
len(astring)

5

Like lists, we can slice the string.

In [39]:
print(astring[0:2])
print(astring[0:6:2])
print(astring[-1])

ke
kvn
n


And we can iterate through the string with a loop.  Below is a while loop:

In [40]:
i = 0
while i < len(astring):
    print(astring[i])
    i = i + 1

k
e
v
i
n


This is equivalent to the for loop:

In [41]:
for character in astring:
    print(character)

k
e
v
i
n


So strings are listy.  

How are strings different from lists?  While lists are mutable, strings are immutable.  Note that an error occurs when we try to change the second element of `astring` from `e` to `b`.

In [42]:
print(float_list)
float_list[1] = 2.09
print(float_list)
print(astring)
astring[1] = 'b'
print(astring)

[1.0, 3.0, 4.0, 2.0, 0.444]
[1.0, 2.09, 4.0, 2.0, 0.444]
kevin


TypeError: 'str' object does not support item assignment

We can't use `append` but we can concatenate with `+`. Why is this?

In [43]:
astring = astring + ', pavlos, ' + 'rahul, ' + 'margo'
print(astring)
type(astring)

kevin, pavlos, rahul, margo


str

What is happening here is that we are creating a new string in memory when we do `astring + ', pavlos, ' + 'rahul, ' + 'margo'`. Then we are relabelling this new string with the old label `astring`. This means that the old memory that `astring` labelled is forgotten. What happens to it? We'll find out in a later lab.
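We can watch the relabelling happen by keeping a second label on the original string. This is just a sketch, with made-up variable names:

```python
name = "kevin"
old_label = name            # keep a second label on the original string

name = name + ", pavlos"    # builds a new string and moves the label `name` onto it

print(old_label)            # the original string is untouched
print(name)
print(name is old_label)    # the two labels now point at different objects
```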

Instead of `+`, we could also build the string with `join`.  See below for a summary of common string operations.  

<img src="images/ops4_v3.png" alt="Drawing" style="width: 600px;"/>

To summarize this section, for  practical purposes all containers that are listy have the following properties:

1.  Have a set length, which you can find using `len`
2.  Are iterable (via a loop)
3.  Are sliceable via `:` operations

We will encounter other listy containers soon.
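The three properties can be turned into a rough check. The function `is_listy` below is a hypothetical helper written for this illustration, not something Python provides: it simply tries `len`, iteration, and a slice, and reports whether all three work.

```python
def is_listy(container):
    """Return True if container supports len, iteration, and slicing."""
    try:
        len(container)     # has a length
        iter(container)    # can be looped over
        container[0:1]     # can be sliced
    except (TypeError, KeyError):
        return False
    return True

print(is_listy([1, 2, 3]), is_listy("kevin"), is_listy((1, 2.0)))
print(is_listy(42))
```

A list, a string, and a tuple all pass, while a plain integer fails at the first step.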

>**EXERCISE**: Make three strings, called `first`, `middle`, and `last`, with your first, middle, and last names, respectively.  If you don't have a middle name, make up a middle name!  

>Then create a string called `full_name` that joins your first, middle, and last name, with a space separating your first, middle, and last names.  

>Finally make a string called `full_name_rev` which takes `full_name` and reverses the letters.  For example, if `full_name` is `Jane Beth Doe`, then `full_name_rev` is `eoD hteB enaJ`.



In [44]:
list(range(-1, -5))

[]

In [45]:
# your code here
first = 'Margo'
middle = 'Suzanne'
last = 'Levine'
full_name = ' '.join([first, middle, last])

full_name_rev = []
for i in range(len(full_name)):
    full_name_rev.append(full_name[len(full_name)-1 - i]) 
    
full_name_rev = ''.join(full_name_rev[0:])
print(full_name_rev)
print(full_name)
print(full_name[::-1])

eniveL ennazuS ograM
Margo Suzanne Levine
eniveL ennazuS ograM


## Part 4: Dictionaries
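A dictionary is another storage container.  Like a list, a dictionary is a collection of items.  Unlike a list, its items are accessed with keys and not integer positions, so a dictionary has no notion of an item's index.  (Since Python 3.7 a dictionary does remember the order in which its keys were inserted.)  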

Dictionaries are the closest container we have to a database.

Let's make a dictionary with a few Harvard courses and their corresponding enrollment numbers.

In [46]:
enroll2016_dict = {'CS50': 692, 'CS109 / Stat 121 / AC 209': 312, 'Econ1011a': 95, 'AM21a': 153, 'Stat110': 485}
enroll2016_dict

{'AM21a': 153,
 'CS109 / Stat 121 / AC 209': 312,
 'CS50': 692,
 'Econ1011a': 95,
 'Stat110': 485}

In [47]:
enroll2016_dict.values()

dict_values([692, 312, 95, 153, 485])

In [48]:
enroll2016_dict.items()

dict_items([('CS50', 692), ('CS109 / Stat 121 / AC 209', 312), ('Econ1011a', 95), ('AM21a', 153), ('Stat110', 485)])

In [49]:
for key, value in enroll2016_dict.items():
    print("%s: %d" %(key, value))

CS50: 692
CS109 / Stat 121 / AC 209: 312
Econ1011a: 95
AM21a: 153
Stat110: 485


Simply iterating over a dictionary gives us the keys. This is useful when we want to do something with each item:

In [50]:
second_dict={}
for key in enroll2016_dict:
    second_dict[key] = enroll2016_dict[key]
second_dict

{'AM21a': 153,
 'CS109 / Stat 121 / AC 209': 312,
 'CS50': 692,
 'Econ1011a': 95,
 'Stat110': 485}

The above is an actual copy to another part of memory, unlike `second_dict = enroll2016_dict`, which would have made both variables label the same memory location.
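The same copy can be made without writing the loop. The sketch below uses a small made-up dictionary, `enrollment`, to contrast a second label with a copy built by `dict(...)`.

```python
enrollment = {"CS50": 692, "Stat110": 485}

same_object = enrollment          # another label on the same dictionary
shallow_copy = dict(enrollment)   # a new dictionary; enrollment.copy() is equivalent

enrollment["CS50"] = 700

print(same_object["CS50"])    # sees the change
print(shallow_copy["CS50"])   # still holds the old value
```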

In this example, the keys are strings corresponding to course names.  Keys don't have to be strings though.  
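For instance, integers and tuples work as keys, because they are immutable and therefore hashable. A list does not, as this sketch with invented keys and values shows:

```python
mixed_keys = {
    109: "an integer key",
    (42.37, -71.11): "a tuple key",
    "CS50": "a string key",
}

try:
    mixed_keys[[1, 2]] = "a list key"   # lists are mutable, hence unhashable
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(mixed_keys[109], "|", mixed_keys[(42.37, -71.11)])
print(list_key_allowed)
```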

Like lists, dictionaries can be constructed in a compact way, using a *dictionary comprehension*, which is similar to a list comprehension. Notice the braces `{}` and the use of `zip`, another iterator, which pairs up the elements of two lists.

In [51]:
my_dict = {k:v for (k, v) in zip(int_list, float_list)}
my_dict

{1: 1.0, 2: 2.09, 3: 4.0, 4: 2.0, 5: 0.444}
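Notice that `int_list` has ten elements but the dictionary has only five entries: `zip` stops as soon as the shorter input runs out. A sketch with stand-in lists (`keys` and `values` are illustrative names holding the same numbers):

```python
keys = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
values = [1.0, 2.09, 4.0, 2.0, 0.444]

paired = list(zip(keys, values))   # stops after the 5 values are used up
lookup = dict(zip(keys, values))   # dict also accepts an iterable of (key, value) pairs

print(paired)
print(lookup)
```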

You can also create dictionaries nicely using the *constructor* function `dict`.

In [52]:
dict(a = 1, b = 2)

{'a': 1, 'b': 2}

While dictionaries have some similarity to lists, they are not listy.  They do have a set length, and they can be iterated through with a loop, but they cannot be sliced, since their items are looked up by key rather than by position. In technical terms, dictionaries, lists, and strings are all sized, iterable *collections*, but only lists and strings satisfy Python's *Sequence* protocol, which is what supports integer indexing and slicing.
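A quick sketch of what does and does not work on a dictionary. The dictionary `courses` is made up for the example, and the final lines assume Python 3.7 or newer, where dictionaries keep insertion order.

```python
from itertools import islice

courses = {"CS50": 692, "Econ1011a": 95, "AM21a": 153}

n_courses = len(courses)            # dictionaries have a length
course_names = [k for k in courses] # and can be iterated over (this gives the keys)

try:
    courses[0:2]                    # but slicing is not supported
    sliceable = True
except (TypeError, KeyError):       # the exact exception depends on the Python version
    sliceable = False

# To take "the first two" items, iterate instead of slicing
first_two = list(islice(courses.items(), 2))
print(n_courses, course_names, sliceable, first_two)
```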

### A cautionary word on iterators (read at home)

Iterators are a bit different from lists in the sense that they can be "exhausted". Perhaps it's best to explain with an example.

In [53]:
an_iterator = enumerate(astring)

In [54]:
type(an_iterator)

enumerate

In [55]:
for i, c in an_iterator:
    print(i,c)

0 k
1 e
2 v
3 i
4 n
5 ,
6  
7 p
8 a
9 v
10 l
11 o
12 s
13 ,
14  
15 r
16 a
17 h
18 u
19 l
20 ,
21  
22 m
23 a
24 r
25 g
26 o


In [56]:
for i, c in an_iterator:
    print(i,c)

What happened? You get nothing when you run the loop a second time! This is because the iterator has been "exhausted", i.e., all its items are used up. I have had answers go wrong because I wasn't careful about this. You must either track the state of the iterator or sidestep the problem by not storing `enumerate(BLA)` in a variable, so that you don't inadvertently use that variable twice.
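Here is the same pitfall in miniature, together with two ways around it. The string `word` and the other names are invented for this sketch.

```python
word = "abc"

stored = enumerate(word)
first_pass = list(stored)    # consumes the iterator
second_pass = list(stored)   # nothing is left, so this is empty

# Option 1: call enumerate again each time you need to loop
fresh_pass = [pair for pair in enumerate(word)]

# Option 2: store the results in a list, which can be reused freely
saved_pairs = list(enumerate(word))
reused = [saved_pairs, saved_pairs]

print(first_pass, second_pass, fresh_pass)
```

Storing a list costs memory for large inputs, so re-creating the iterator is usually the lighter option.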