# Python Functions, Files, and Dictionaries

## Introduction: Working with Data Files
In Python, we must **open** files before we can use them and **close** them when we are done with them. As you might expect, **once a file is opened it becomes a Python object** just like all other data. The following shows the functions and methods that can be used to open and close files.

    `open` | `open(filename,'r')` --> Open a file called filename and use it for reading. This will return a reference to a file object.

    `open` | `open(filename,'w')` --> Open a file called filename and use it for writing. This will also return a reference to a file object.

    `close` | `filevariable.close()` --> File use is complete.

## Reading a File
As an example, suppose we have a text file called `olympics.txt` that contains data about Olympians across different years. The contents of the file are shown at the bottom of the page.

To `open` this file, we would call the `open` function. The variable `fileref` now holds a reference to the file object returned by `open`. When we are finished with the file, we can close it by using the `close` method. After the file is closed, any further attempts to use `fileref` will result in an error.

We can now illustrate how file reading works. Python provides some functions for reading data from an existing file, and there are two steps. First, you call the `open` function to open the file. Here we have an invocation of `open`, and we pass in two arguments. One is a string, the name of the file, `olympics.txt`. The other says what to do with the file: in our case, `r` for reading. Later on, we'll see `w` for writing. The `open` function returns a file object, and we assign it to the variable `fileref`.

This step only creates the file object. There will be some lines of code, which we haven't written yet, that actually get the contents from the file and do something with them. Then there is a corresponding close operation, `fileref.close()` on line three, that lets Python know we're done working with this file object and that it's okay to stop keeping track of it. If I run this, we won't see any output, because all we've done is open the file and then close it. We haven't read the contents, and we certainly haven't printed anything.

In [3]:
fileref = open("olympics.txt", "r")
## other code here that refers to variable fileref
fileref.close()

What if we did want to read the contents and print them out? There are a few different ways of working with file objects. The first method that we'll use is `.read`, **which brings in the entire contents of the file as a single string.** So, we call the `.read` method on the `fileref` object. That returns a string, which I'm assigning to the variable `contents`, and then I can print out the first 100 characters of it. Now, we'll see something in the "Output" window: the first 100 characters from the file, which gives us three lines and a little bit of the fourth.



In [4]:
fileref = open("olympics.txt", "r")
contents = fileref.read()
print(contents[:100])
fileref.close()

Name,Sex,Age,Team,Event,Medal
A Dijiang,M,24,China,Basketball,NA
A Lamusi,M,23,China,Judo,NA
Gunnar 


You'll rarely use this method of reading the entire contents of the file all at once as a big string, partly because if you had a really big file, it would be a problem for your computer to handle all of that in memory at once. **The only time we're going to use the `.read` method is when you want to grab the whole file as a string and pass it to some other function that parses it**. Even then, there will usually be some other function available that reads directly from the file object a little bit at a time and parses its contents.

The second method, instead of reading everything at once, is `.readlines()`. **Instead of getting everything as a single string, it returns a list of strings, one string for each line in the file**. So, let's print out the first four lines of the file this way. The variable is called `lines`, because that's the name I assigned on line two.


In [5]:
fileref = open("olympics.txt", "r")
lines = fileref.readlines()
print(lines[:4])
fileref.close()

['Name,Sex,Age,Team,Event,Medal\n', 'A Dijiang,M,24,China,Basketball,NA\n', 'A Lamusi,M,23,China,Judo,NA\n', 'Gunnar Nielsen Aaby,M,24,Denmark,Football,NA\n']


You can see now that we're printing out a list, square brackets and all. __Inside the list, there are four strings__. Notice that each string ends with the special `\n` character. **That's the newline character, because in the file, we have a bunch of lines of text**. So, when we read these lines in, each of the strings has a `\n` at the end of it.

Instead of printing the whole list, I could get a slightly prettier printout by iterating through it. So, `for lin in lines[:4]`, taking just the first four lines again, and I print each individual line. Now, when I run it, it iterates through these four lines, and each one goes on its own line. We no longer get the square brackets, because we're not printing the whole list; we're iterating through the individual strings. We also don't get the quote marks, because when we pass a string to `print`, it just shows the string's contents in the "Output" window. Let's see how that looks when we run it, and sure enough, we get each of the lines separately.



In [6]:
fileref = open("olympics.txt", "r")
lines = fileref.readlines()
for lin in lines[:4]:
    print(lin)
fileref.close()

Name,Sex,Age,Team,Event,Medal

A Dijiang,M,24,China,Basketball,NA

A Lamusi,M,23,China,Judo,NA

Gunnar Nielsen Aaby,M,24,Denmark,Football,NA



Now, you might notice something a little strange here, which is that we get these blank lines. **The reason is that each of the strings, you'll remember, has that newline character at the end, which means do a carriage return. The print function always adds a carriage return of its own, so we're getting two of them. One starts us on a new line, and the other starts us on a new line again, so we get a blank line. What if we didn't want that extra blank line?**

Well, you've seen the **`.strip`** method before. I can strip the whitespace from the beginning and ends of each of these lines, so the `.strip` method gets rid of any whitespace at the beginning or the end. **Whitespace is the space character, a tab character, or a new line character.** So, if I call this, now, I'm going to get the printout that doesn't have the blank lines, and sure enough, we've got the first four lines from the file. 


In [7]:
fileref = open("olympics.txt", "r")
lines = fileref.readlines()
for lin in lines[:4]:
    print(lin.strip())
fileref.close()

Name,Sex,Age,Team,Event,Medal
A Dijiang,M,24,China,Basketball,NA
A Lamusi,M,23,China,Judo,NA
Gunnar Nielsen Aaby,M,24,Denmark,Football,NA
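
As a variation on the example above, and not something the walkthrough itself uses: `.rstrip("\n")` removes only the trailing newline and leaves other whitespace alone, and `print(..., end="")` stops `print` from adding its own newline. A minimal sketch, using a made-up string rather than a line actually read from the file:

```python
# A hypothetical line, shaped like one read from olympics.txt
lin = "  A Dijiang,M,24,China,Basketball,NA\n"

stripped = lin.strip()          # removes whitespace at both ends
right_only = lin.rstrip("\n")   # removes only the trailing newline

print(stripped)
print(right_only)
print(lin, end="")              # print adds nothing, so the line's own \n ends it
```

Which one to prefer depends on whether leading spaces in the data are meaningful; for the olympics file, `.strip()` is enough.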


**Now, there's a shorter way to iterate over the lines, if iterating over all of them is all we're going to do**. Let me show you that, because it's more Pythonic than reading the entire file into a list. We can directly iterate over all of the lines by saying `for lin in fileref`. Here, `fileref` is a file object. It's not a list, but it knows how to be iterated over, and each time through the loop we get one more line. This does exactly the same thing we had before, except now we get all the lines in the file.

In [8]:
fileref = open("olympics.txt", "r")
# no call to fileref.readlines() is needed here
for lin in fileref:
    print(lin.strip())
fileref.close()

Name,Sex,Age,Team,Event,Medal
A Dijiang,M,24,China,Basketball,NA
A Lamusi,M,23,China,Judo,NA
Gunnar Nielsen Aaby,M,24,Denmark,Football,NA
Edgar Lindenau Aabye,M,34,Denmark/Sweden,Tug-Of-War,Gold
Christine Jacoba Aaftink,F,21,Netherlands,Speed Skating,NA
Christine Jacoba Aaftink,F,21,Netherlands,Speed Skating,NA
Christine Jacoba Aaftink,F,25,Netherlands,Speed Skating,NA
Christine Jacoba Aaftink,F,25,Netherlands,Speed Skating,NA
Christine Jacoba Aaftink,F,27,Netherlands,Speed Skating,NA
Christine Jacoba Aaftink,F,27,Netherlands,Speed Skating,NA
Per Knut Aaland,M,31,United States,Cross Country Skiing,NA
Per Knut Aaland,M,31,United States,Cross Country Skiing,NA
Per Knut Aaland,M,31,United States,Cross Country Skiing,NA
Per Knut Aaland,M,31,United States,Cross Country Skiing,NA
Per Knut Aaland,M,33,United States,Cross Country Skiing,NA
Per Knut Aaland,M,33,United States,Cross Country Skiing,NA
Per Knut Aaland,M,33,United States,Cross Country Skiing,NA
Per Knut Aaland,M,33,United States,Cro

So, we can iterate over this file object directly. **We can't do this thing of taking a slice of it like we did with lists**. That gives us an error. 

In [10]:
fileref = open("olympics.txt", "r")
for lin in fileref[:4]:
    print(lin.strip())
fileref.close()

TypeError: '_io.TextIOWrapper' object is not subscriptable

**So, a file object supports iteration, but it does not support taking slices.** So, if we wanted to just do something with the first four lines, we'd have to use the `.readlines()`, rather than just iterating over the file object. If we're prepared to process all the lines, which is the normal thing that you're going to do with a file, this is the standard Pythonic idiom. 
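
If reading the whole file into a list just to take a slice feels wasteful, the standard library's `itertools.islice` can take the first few items from any iterable, including a file object. This is an alternative to what the walkthrough does, sketched here with a small temporary file (the file name and contents are made up for illustration):

```python
import itertools
import os
import tempfile

# Build a small stand-in file so the example is self-contained
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.txt")
outfile = open(path, "w")
for i in range(10):
    outfile.write("line {}\n".format(i))
outfile.close()

fileref = open(path, "r")
first_four = []
# islice stops after four lines; the remaining lines are never put in a list
for lin in itertools.islice(fileref, 4):
    first_four.append(lin.strip())
fileref.close()

print(first_four)
```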

Now, when should you actually call `.readlines` or `.read`? Well, one reason to call `.readlines` is, if you wanted to take slices. Another reason might be that you wanted to just get a `count` of how many lines are in the file. So, if I get all of the lines and put them in a variable, I could now print out the length of lines, and that would tell me how many lines were in the file. Turns out there are 60 lines in the file. 

In [15]:
fileref = open("olympics.txt", "r")
lines = fileref.readlines()
print(len(lines))

60


If I wanted to find out how many _characters_ are in the file, I could read the entire file as one character string, and then I could ask for its length.



In [16]:
fileref = open("olympics.txt", "r")
contents = fileref.read()
print(len(contents))

3178


**So, except in those special cases, the more common thing that you're going to want to do is to just iterate over the file object itself. We won't use `.read` or `.readlines`; instead, we'll just iterate over the file object. This is the most common way that you'll be working with files. So, that's Python code for reading from a file.**

## Alternative File Reading Methods
![image](img/06Capture.PNG)
![image](img/07Capture.PNG)

**Note**: A common error that novice programmers make is not realizing that all these ways of reading the file contents, use up the file. **After you call readlines(), if you call it again you’ll get an empty list.**
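
The note above can be checked with a short experiment. This is a minimal sketch that writes its own two-line temporary file (the name `sample.txt` is made up) and then calls `readlines()` twice on the same file object:

```python
import os
import tempfile

# Create a small stand-in file so the example is self-contained
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.txt")
outfile = open(path, "w")
outfile.write("first\nsecond\n")
outfile.close()

fileref = open(path, "r")
first_read = fileref.readlines()    # reads every line in the file
second_read = fileref.readlines()   # nothing is left to read
fileref.close()

print(first_read)
print(second_read)
```

To read the contents again, the simplest approach is to open the file a second time.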

### Check your Understanding

1. Using the file `school_prompt2.txt`, find the number of characters in the file and assign that value to the variable `num_char`.

In [20]:
fileref = open("school_prompt2.txt", "r")
contents = fileref.read()
num_char = len(contents)
print(num_char)
print('....................')
print(contents[0:10])

536
....................
Writing es


2. Find the number of lines in the file, `travel_plans2.txt`, and assign it to the variable num_lines.



In [24]:
fileref = open("travel_plans2.txt", 'r')
lines = fileref.readlines()
num_lines = len(lines)
print(num_lines)
print('..................')
print(lines[:3])

11
..................
['This summer I will be travelling.\n', 'I will go to...\n', 'Italy: Rome\n']


3. Create a string called `first_forty` that is comprised of the first 40 characters of `emotion_words2.txt`.

In [25]:
fileref = open("emotion_words2.txt", "r")
contents = fileref.read()
first_forty = contents[:40]
print(first_forty)

Sad upset blue down melancholy somber bi


##  Iterating over lines in a file
We will now use this file as input in a program that will do some data processing. In the program, we will **examine each line of the file and print it with some additional text. Because `readlines()` returns a list of lines of text, we can use the `for loop` to iterate through each line of the file**.

A **line** of a file is defined to be a sequence of characters up to and including a special character called the **newline** character. If you evaluate a string that contains a newline character you will see the character represented as `\n`. If you print a string that contains a newline you will not see the `\n`, you will just see its effects (a carriage return).
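
The difference between evaluating and printing a string with a newline can be seen with `repr`, which returns the text Python would show when evaluating the string. A small sketch, using a made-up line rather than one read from a file:

```python
# A hypothetical line, shaped like one from olympics.txt
line = "A Lamusi,M,23,China,Judo,NA\n"

shown = repr(line)   # the evaluated form, with \n visible
print(shown)
print(line)          # the newline takes effect instead of being displayed

has_newline = line.endswith("\n")
print(has_newline)
```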

As the for loop iterates through each line of the file the loop variable will contain the current line of the file as a string of characters. The general pattern for processing each line of a text file is as follows:
        
        for line in myFile.readlines():
            statement1
            statement2
            ...
To process all of our Olympics data, we will use a `for loop` to iterate over the lines of the file. **Using the `split` method, we can break each line into a list containing all the fields of interest about the athlete**. We can then take the values corresponding to name, team and event to construct a simple sentence.

In [71]:
olypmicsfile = open("olympics.txt", "r")
for aline in olypmicsfile.readlines():
    values = aline.split(",")
    print(values[0], "is from", values[3], "and is on the roster for", values[4])
olypmicsfile.close()

Name is from Team and is on the roster for Event
A Dijiang is from China and is on the roster for Basketball
A Lamusi is from China and is on the roster for Judo
Gunnar Nielsen Aaby is from Denmark and is on the roster for Football
Edgar Lindenau Aabye is from Denmark/Sweden and is on the roster for Tug-Of-War
Christine Jacoba Aaftink is from Netherlands and is on the roster for Speed Skating
Christine Jacoba Aaftink is from Netherlands and is on the roster for Speed Skating
Christine Jacoba Aaftink is from Netherlands and is on the roster for Speed Skating
Christine Jacoba Aaftink is from Netherlands and is on the roster for Speed Skating
Christine Jacoba Aaftink is from Netherlands and is on the roster for Speed Skating
Christine Jacoba Aaftink is from Netherlands and is on the roster for Speed Skating
Per Knut Aaland is from United States and is on the roster for Cross Country Skiing
Per Knut Aaland is from United States and is on the roster for Cross Country Skiing
Per Knut Aaland 

To make the code a little simpler, and to allow for more efficient processing, Python provides a built-in way to iterate through the contents of a file one line at a time, without first reading them all into a list. 

In [27]:
olypmicsfile = open("olympics.txt", "r")
for aline in olypmicsfile:
    values = aline.split(",")
#    print(values[0], "is from", values[3], "and is on the roster for", values[4])
olypmicsfile.close()

Write code to find out how many lines are in the file `emotion_words.txt`. Save this value to the variable `num_lines`. **Do not use the `len` function**.

In [28]:
fileref = open("emotion_words.txt", "r")
num_lines = 0
lines = fileref.readlines()
for lin in lines:
    num_lines += 1
print(num_lines)

7


## Finding a File in Your File System
![image](img/09Capture.PNG)
![image](img/08Capture.PNG)
If your file and your Python program **are in the same directory you can simply use the filename.** For example, with the file hierarchy in the diagram, the file `myPythonProgram.py` could contain the code `open('data1.txt', 'r')`.

If your file and your Python program **are in different directories, however, then you need to specify a `path`.** You can think of the filename as the short name for a file, and the **path as the full name**. Typically, you will specify a relative file path, which says where to find the file to open, relative to the directory where the code is running from. For example, the file `myPythonProgram.py` could contain the code `open('../myData/data2.txt', 'r')`. The `../` means to go up one level in the directory structure, to the containing folder (`allProjects`); `myData/` says to descend into the `myData` subfolder.

There is also an option to use an `absolute file path`. For example, suppose the **file structure in the figure is stored on a computer in the user’s home directory**, `/Users/joebob01/myFiles`. Then code in any Python program running from any file folder could open `data2.txt` via `open('/Users/joebob01/myFiles/allProjects/myData/data2.txt', 'r')`. 
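
To see how the relative and absolute paths relate, the sketch below combines them by hand. The folder name `myProject` is an assumption (the text does not name the folder that holds `myPythonProgram.py`), and `posixpath` is used so the forward-slash paths behave the same on any operating system; in everyday code `os.path` is the usual choice.

```python
import posixpath

# Hypothetical folder holding myPythonProgram.py; the name "myProject" is assumed
program_dir = "/Users/joebob01/myFiles/allProjects/myProject"
relative = "../myData/data2.txt"

# Join the relative path onto the program's directory, then collapse the ".."
absolute = posixpath.normpath(posixpath.join(program_dir, relative))
print(absolute)
```

The result is the same absolute path given in the text, which is why either form opens the same file when the program runs from that folder.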

 [**Further Reading**](https://runestone.academy/runestone/books/published/fopp/Files/FindingaFileonyourDisk.html)

## Writing Text Files
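Recall that text files contain sequences of characters. We usually think of these character sequences as being the lines of the file, where each line ends with the newline `\n` character. Be very careful to notice that the `write` method takes one parameter, a string.

As an example, we will write the squares of a few numbers to a file. Before involving a file at all, we start with a loop that simply prints each square, so we can check the computation.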

In [30]:
for number in range(1,4):
    square = number * number
    print(square)
        

1
4
9


Once we are satisfied that it is creating the appropriate output, the next step is to add the necessary pieces to produce an output file and write the data lines to it. To start, we need to open a new output file by calling the `open` function, `outfile = open("squared_numbers.txt",'w')`, using the `'w'` flag.

In [31]:
outfile = open("squared_numbers.txt", 'w')
for number in range(1, 5):
    square = number * number
    print(square)
outfile.close()

1
4
9
16


Once the file has been created, we just need to call the `write` method passing the string that we wish to add to the file. In this case, the string is already being printed so we will just change the `print` into a call to the `write` method. However, there is an additional step to take, since the write method can only accept a string as input. We’ll need to convert the number to a string. Then, we just need to add one extra character to the string. The newline character needs to be concatenated to the end of the line. The entire line now becomes `outfile.write(str(square)+ '\n')`. 

In [32]:
outfile = open("squared_numbers.txt", "w")
for number in range(1,13):
    square = number * number
    outfile.write(str(square)) # no newline character is written here
outfile.close()

But the output of the code above is quite different from what we expect. The file contains `14916253649...`, all on one line, so we need to add `'\n'` after each number to put it on its own line.

In [33]:
outfile = open("squared_numbers.txt", "w")
for number in range(1,13):
    square = number * number
    outfile.write(str(square)) #we don't have here a newline character
    outfile.write('\n') #we add here a newline character
outfile.close()

In [34]:
outfile = open("squared_numbers.txt", "w")
for number in range(1,13):
    square = number * number
    # outfile.write(str(square)) #we don't have here a newline character
    # outfile.write('\n') #we add here a newline character
    outfile.write(str(square) + '\n') #combine the two
outfile.close()

new_outfile = open("squared_numbers.txt", "r")
print(new_outfile.read()[:10])
new_outfile.close()


1
4
9
16
2


## Using with for Files
The Python with statement makes using context managers easy. The general form of a with statement is:

    with <create some object that understands context> as <some name>:
        do some stuff with the object
        ...
A simple example will clear up all of this abstract discussion of contexts. Here are the contents of a file called “mydata.txt”.

In [35]:
with open('mydata.txt', 'r') as md: #which is equal to: md = open('mydata.txt', 'r')
    for line in md:
        print(line)

1 2 3

4 5 6


The above is equivalent to

In [36]:
md = open('mydata.txt', 'r')
for line in md:
    print(line)
md.close()

1 2 3

4 5 6
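
One way to confirm that `with` really closes the file is to check the file object's `closed` attribute inside and after the block. The sketch below is illustrative only: it writes its own temporary copy of the data (the name `mydata_demo.txt` is made up) so that it runs anywhere. Unlike the manual version, `with` also closes the file if an error is raised inside the block.

```python
import os
import tempfile

# Create a stand-in for mydata.txt so the example is self-contained
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "mydata_demo.txt")
with open(path, "w") as out:
    out.write("1 2 3\n4 5 6\n")

lines = []
with open(path, "r") as md:
    for line in md:
        lines.append(line.strip())
    open_inside = not md.closed   # still open while inside the block

closed_after = md.closed          # closed automatically on leaving the block
print(lines, open_inside, closed_after)
```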


The `with` statement works the same way when writing to a file:

In [37]:
fname = 'mydata2.txt'
with open(fname, 'w') as md:
    #md.read()
    #md.readlines()
    #for line in md:
    for num in range(10):
        md.write(str(num))
        md.write('\n')
    

### Recipe for Reading and Processing a File
Here’s a foolproof recipe for processing the contents of a text file. If you’ve fully digested the previous sections, you’ll understand that there are other options as well. Some of those options are preferable for some situations, and some are preferred by python programmers for efficiency reasons. In this course, though, you can always succeed by following this recipe.
1. Open the file using `with` and `open`.
2. Use `.readlines()` to get a list of the lines of text in the file.
3. Use a `for` loop to iterate through the strings in the list, each being one line from the file. On each iteration, process that line of text
4. When you are done extracting data from the file, continue writing your code outside of the indentation. Using `with` will automatically close the file once the program exits the with block.

In [38]:
fname = "yourfile.txt"
with open(fname, 'r') as fileref:         # step 1
    lines = fileref.readlines()           # step 2
    for lin in lines:                     # step 3
        pass  # placeholder: replace with code that references the variable lin
# some other code not relying on fileref  # step 4

A second option is to iterate over the file object itself, which still visits each line in the file:



In [39]:
fname = "yourfile.txt"
with open(fname, 'r') as fileref:         # step 1
    for lin in fileref:                   # step 2
        pass  # placeholder: replace with code that references the variable lin
# some other code not relying on fileref  # step 3

## csv Format

#### Reading in data from a CSV File

All the file methods that we have mentioned (`read`, `readline`, and `readlines`), as well as simply iterating over the file object itself, will work on CSV files. In our examples, we will iterate over the lines. Because the values on each line are separated by commas, we can use the `.split(',')` method to parse each line into a collection of separate values.

In [41]:
fileconnection = open("olympics.txt", 'r')
lines = fileconnection.readlines()
for lin in lines[:6]:
    print(lin.strip())

Name,Sex,Age,Team,Event,Medal
A Dijiang,M,24,China,Basketball,NA
A Lamusi,M,23,China,Judo,NA
Gunnar Nielsen Aaby,M,24,Denmark,Football,NA
Edgar Lindenau Aabye,M,34,Denmark/Sweden,Tug-Of-War,Gold
Christine Jacoba Aaftink,F,21,Netherlands,Speed Skating,NA


In [42]:
header = lines[0]
field_names = header.strip().split(",")
print(field_names)

['Name', 'Sex', 'Age', 'Team', 'Event', 'Medal']


In [43]:
for row in lines[1:]:
    vals = row.strip().split(",")
    if vals[5] != "NA":
        print("{}:{};{}".format(vals[0],vals[4], vals[5]))

Edgar Lindenau Aabye:Tug-Of-War;Gold
Arvo Ossian Aaltonen:Swimming;Bronze
Arvo Ossian Aaltonen:Swimming;Bronze
Juhamatti Tapio Aaltonen:Ice Hockey;Bronze
Paavo Johannes Aaltonen:Gymnastics;Bronze
Paavo Johannes Aaltonen:Gymnastics;Gold
Paavo Johannes Aaltonen:Gymnastics;Gold
Paavo Johannes Aaltonen:Gymnastics;Gold
Paavo Johannes Aaltonen:Gymnastics;Bronze


If you get a file that uses a different separator, you can just call `.split('|')` or `.split('\t')` instead.
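
As a quick illustration of other separators, the sketch below splits a tab-separated row and a pipe-separated row. Both rows are made-up strings, not data from a real file:

```python
# Hypothetical rows using a tab and a pipe as separators
tab_row = "Wakako Abe\t18\tCycling\n"
pipe_row = "Wakako Abe|18|Cycling\n"

# Strip the trailing newline first, then split on the separator
tab_vals = tab_row.strip().split("\t")
pipe_vals = pipe_row.strip().split("|")

print(tab_vals)
print(pipe_vals)
```

Note that `"\t"` with a single backslash is the tab character; a doubled backslash would look for a literal backslash followed by the letter t.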

### Writing data to a CSV File

The typical pattern for writing data to a CSV file will be to write a header row and loop through the items in a list, outputting one row for each. Here we have a list of tuples, each representing one Olympian, a subset of the rows and columns from the file we have been reading from.

In [44]:
olympians = [("John Aalberg", 31, "Cross Country Skiing"),
             ("Minna Maarit Aalto", 30, "Sailing"),
             ("Win Valdemar Aaltonen", 54, "Art Competitions"),
             ("Wakako Abe", 18, "Cycling")]

outfile = open("reduced_olympics.csv", "w")
# output the header row
outfile.write('Name,Age,Sport')
outfile.write('\n')
# output each of the rows:
for olympian in olympians:
    row_string = '{},{},{}'.format(olympian[0], olympian[1], olympian[2])
    outfile.write(row_string)
    outfile.write('\n')
outfile.close()

**There are a few things worth noting in the code above**.

* First, using `.format()` makes it really clear what we’re doing when we create the variable `row_string`. We are making a comma-separated set of values; the `{}` curly braces indicate where to substitute in the actual values. The equivalent string concatenation would be very hard to read. An alternative, also clear way to do it would be with the `.join` method, which takes a single list of strings, so the age must be converted first: `row_string = ','.join([olympian[0], str(olympian[1]), olympian[2]])`.

* Second, unlike the `print` function, remember that the `.write()` method on a file object **does not automatically insert a newline**. Instead, we have to explicitly add the character `'\n'` at the end of each line.

* Third, we have to explicitly refer to each of the elements of olympian when building the string to write. Note that just putting `.format(olympian)` **wouldn’t work** because the interpreter would see only one value (a tuple) when it was expecting three values to try to substitute into the string template. Later in the book we will see that python provides an advanced technique for automatically unpacking the three values from the tuple, with `.format(*olympian)`.
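
The three ways of building a row mentioned in the list above can be compared side by side. This is a small sketch using one tuple from the example; the variable names other than `olympian` are made up for illustration:

```python
olympian = ("John Aalberg", 31, "Cross Country Skiing")

# Referring to each element explicitly
by_index = '{},{},{}'.format(olympian[0], olympian[1], olympian[2])

# Unpacking the tuple into three separate arguments
unpacked = '{},{},{}'.format(*olympian)

# join needs one iterable of strings, so the age is converted with str
joined = ','.join([str(field) for field in olympian])

print(by_index)
print(unpacked)
print(joined)
```

All three produce the same string; `.format()` converts the integer age for us, while `.join` requires the explicit `str` call.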

As described previously, if one or more columns contain text, and that text could contain commas, we need to do something to distinguish a comma in the text from a comma that is separating different values (cells in the table). If we want to enclose each value in double quotes, it can start to get a little tricky, because we will need to have the double quote character inside the string output. But it is doable. Indeed, one reason Python allows strings to be delimited with either single quotes or double quotes is so that one can be used to delimit the string and the other can be a character in the string. If you get to the point where you need to quote all of the values, we recommend learning to use python’s csv module.

In [19]:
olympians = [("John Aalberg", 31, "Cross Country Skiing, 15KM"),
             ("Minna Maarit Aalto", 30, "Sailing"),
             ("Win Valdemar Aaltonen", 54, "Art Competitions"),
             ("Wakako Abe", 18, "Cycling")]

outfile = open("reduced_olympics2.csv", "w")
# output the header row
outfile.write('"Name","Age","Sport"')
outfile.write('\n')
# output each of the rows:
for olympian in olympians:
    row_string = '"{}","{}","{}"'.format(olympian[0], olympian[1], olympian[2])
    outfile.write(row_string)
    outfile.write('\n')
outfile.close()
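
For comparison, here is a rough sketch of the same task using Python's `csv` module, which the text recommends once quoting is needed. It is an illustration rather than the course's own solution: it writes to a temporary file (the name `reduced_olympics_demo.csv` is made up) and reads it back to show that the comma inside the sport name survives.

```python
import csv
import os
import tempfile

olympians = [("John Aalberg", 31, "Cross Country Skiing, 15KM"),
             ("Minna Maarit Aalto", 30, "Sailing")]

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "reduced_olympics_demo.csv")

# newline="" lets the csv module control line endings itself
with open(path, "w", newline="") as outfile:
    writer = csv.writer(outfile, quoting=csv.QUOTE_ALL)  # quote every value
    writer.writerow(["Name", "Age", "Sport"])
    for olympian in olympians:
        writer.writerow(olympian)

# Read the file back; each row comes back as a list of strings
with open(path, "r", newline="") as infile:
    rows = list(csv.reader(infile))

print(rows)
```

The reader returns every value as a string, so the ages would need `int()` if they are to be used as numbers.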


# Course 2 Assessment 1

1. The textfile, `travel_plans.txt`, contains the summer travel plans for someone with some commentary. Find the total number of characters in the file and save to the variable `num`.

In [47]:
filename = open("travel_plans.txt", 'r')
chars = filename.read()
num = len(chars)
print(num)

FileNotFoundError: [Errno 2] No such file or directory: 'travel_plans.txt'

2. We have provided a file called `emotion_words.txt` that contains lines of words that describe emotions. Find the total number of words in the file and assign this value to the variable `num_words`

In [41]:
fname = open("emotion_words.txt", 'r')
words = fname.read()
num_words = len(words.split())
print(num_words)

48


3. Assign to the variable `num_lines` the number of lines in the file `school_prompt.txt`.

In [43]:
fname = open("school_prompt2.txt", "r")
lines = fname.readlines()
num_lines = len(lines)
print(num_lines)

10


4. Assign the first 30 characters of `school_prompt.txt` as a string to the variable `beginning_chars`.

In [44]:
fname = open("school_prompt2.txt", "r")
chars = fname.read()
beginning_chars = chars[:30]
print(beginning_chars)

Writing essays for school can 


5. **Challenge**: Using the file `school_prompt.txt`, assign the third word of every line to a list called `three`.

In [72]:
fname = open("school_prompt2.txt", "r")
lines = fname.readlines()
three = []
for lin in lines:
    words = lin.split()
    three.append(words[2])
print(three)

['for', 'find', 'to', 'many', 'they', 'solid', 'for', 'have', 'some', 'ups,']


6. **Challenge**: Create a list called `emotions` that contains the first word of every line in `emotion_words.txt`.

In [50]:
fname = open("emotion_words.txt", "r")
lines = fname.readlines()
emotions = []
for lin in lines:
    words = lin.split()
    #print(words)
    emotions.append(words[0])
print(emotions)

['Sad', 'Angry', 'Happy', 'Confused', 'Excited', 'Scared', 'Nervous']


7. Assign the first 33 characters from the textfile, `travel_plans.txt` to the variable `first_chars`.

In [76]:
fname = open("travel_plans2.txt", "r")
char = fname.read()
#for lin in lines:
#    words = lin.split()
#    emotions.append(words[0])
first_chars = char[:33]
print(first_chars)

This summer I will be travelling.


8. Challenge: Using the file `school_prompt.txt`, if the character ‘p’ is in a word, then add the word to a list called `p_words`.

In [59]:
fname = open("school_prompt2.txt", "r")
words = fname.read().split()
#print(words)
p_words = []
for aword in words:
    if "p" in aword:
        p_words.append(aword)
print(p_words)




['topic', 'point', 'papers,', 'ups,', 'scripts.']


In [61]:
fname = open("school_prompt2.txt", "r")
for lines in fname.readlines():
    lines = lines.split()
    print(lines)

['Writing', 'essays', 'for', 'school', 'can', 'be', 'difficult', 'but']
['many', 'students', 'find', 'that', 'by', 'researching', 'their', 'topic', 'that', 'they']
['have', 'more', 'to', 'say', 'and', 'are', 'better', 'informed.', 'Here', 'are', 'the', 'university']
['we', 'require', 'many', 'undergraduate', 'students', 'to', 'take', 'a', 'first', 'year', 'writing', 'requirement']
['so', 'that', 'they', 'can']
['have', 'a', 'solid', 'foundation', 'for', 'their', 'writing', 'skills.', 'This', 'comes']
['in', 'handy', 'for', 'many', 'students.']
['Different', 'schools', 'have', 'different', 'requirements,', 'but', 'everyone', 'uses']
['writing', 'at', 'some', 'point', 'in', 'their', 'academic', 'career,', 'be', 'it', 'essays,', 'research', 'papers,']
['technical', 'write', 'ups,', 'or', 'scripts.']


In [62]:
fileref = open('school_prompt2.txt', 'r')
words = fileref.read().split()
p_words = [word for word in words if 'p' in word]
print(p_words)

# line 3 above is equivalent to:
# p_words = []
# for word in words:
#    if "p" in word:
#        p_words.append(word)

['topic', 'point', 'papers,', 'ups,', 'scripts.']


## Dictionary Mechanics
One way to create a dictionary is to start with the empty dictionary and add `key-value pairs`. The empty dictionary is denoted `{}`.

In [63]:
eng2sp = {}
eng2sp['one'] = 'uno'
eng2sp['two'] = 'dos'
eng2sp['three'] = 'tres'
print(eng2sp)

{'one': 'uno', 'two': 'dos', 'three': 'tres'}


Here is how we use a key to look up the corresponding value.



In [64]:
value = eng2sp['two']
print(value)
print('_____')
print(eng2sp['one'])

dos
_____
uno


**For understanding**: Create a dictionary that keeps track of the USA’s Olympic medal count. Each key of the dictionary should be the type of medal (gold, silver, or bronze) and each key’s value should be the number of that type of medal the USA has won. Currently, the USA has 33 gold medals, 17 silver, and 12 bronze. Create a dictionary saved in the variable `medals` that reflects this information.

In [65]:
medals = {'gold': 33, 'silver': 17, 'bronze': 12}
print(medals)

{'gold': 33, 'silver': 17, 'bronze': 12}


You are keeping track of Olympic medals for Italy in the 2016 Rio Summer Olympics! At the moment, Italy has 7 gold medals, 8 silver medals, and 6 bronze medals. Create a dictionary called `olympics` where the keys are the types of medals, and the values are the number of that type of medal that Italy has won so far.

In [66]:
# the same task as above, done a different way: start empty and add key-value pairs one at a time
olympics = {}
olympics['gold'] = 7
olympics['silver'] = 8
olympics['bronze'] = 6
print(olympics)

{'gold': 7, 'silver': 8, 'bronze': 6}


### Dictionary Operations

The `del` statement removes a key-value pair from a dictionary. 

In [67]:
inventory = {'apples': 430, 'bananas': 312, 'oranges': 525, 'pears': 217}
print(inventory)
del inventory['pears']
print(inventory)

{'apples': 430, 'bananas': 312, 'oranges': 525, 'pears': 217}
{'apples': 430, 'bananas': 312, 'oranges': 525}
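One caveat: `del` raises a `KeyError` if the key is not in the dictionary. A minimal sketch, reusing the `inventory` dictionary above, guards the deletion with the `in` operator (covered later in this section). The key `'kiwis'` is made up for illustration.

```python
inventory = {'apples': 430, 'bananas': 312, 'oranges': 525, 'pears': 217}

# 'kiwis' is a hypothetical key that is not in the dictionary
for fruit in ['pears', 'kiwis']:
    if fruit in inventory:
        del inventory[fruit]      # safe: the key exists
        print("removed", fruit)
    else:
        print(fruit, "not found, nothing to delete")

print(inventory)
```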


Dictionaries are **mutable**, as the delete operation above indicates. As we’ve seen before with lists, this means that the dictionary can be modified by referencing an association on the left hand side of the assignment statement. In the previous example, instead of deleting the entry for pears, we could have set its count to `0`.

In [68]:
inventory = {'apples': 430, 'bananas': 312, 'oranges': 525, 'pears': 217}
inventory['pears'] = 0
print(inventory)

{'apples': 430, 'bananas': 312, 'oranges': 525, 'pears': 0}


Similarly, a new shipment of 200 bananas arriving could be handled like this. Notice that there are now 512 bananas: the dictionary has been modified. Note also that the `len` function works on dictionaries. It returns the number of `key-value pairs`.

In [9]:
inventory = {'apples': 430, 'bananas': 312, 'oranges': 525, 'pears': 217}
inventory['bananas'] = inventory['bananas'] + 200
print(inventory)

{'apples': 430, 'bananas': 512, 'oranges': 525, 'pears': 217}


In [10]:
numItems = len(inventory)
print(numItems)

4


Update the value for “Phelps” in the dictionary `swimmers` to include his medals from the Rio Olympics by adding 5 to the current value (Phelps will now have 28 total medals). Do not rewrite the dictionary.

In [11]:
swimmers = {'Manuel':4, 'Lochte':12, 'Adrian':7, 'Ledecky':5, 'Dirado':4, 'Phelps':23}
swimmers['Phelps'] = swimmers['Phelps'] + 5
print(swimmers['Phelps'])
print(swimmers)

28
{'Manuel': 4, 'Lochte': 12, 'Adrian': 7, 'Ledecky': 5, 'Dirado': 4, 'Phelps': 28}


### Dictionary methods
![image](img/10Capture.PNG)

In [22]:
inventory = {'apples': 430, 'bananas': 312, 'oranges': 525, 'pears': 217}
for akey in inventory.keys(): # here we can remove .keys()
    print("Got key", akey, "which maps to the value", inventory[akey])

Got key apples which maps to the value 430
Got key bananas which maps to the value 312
Got key oranges which maps to the value 525
Got key pears which maps to the value 217


In [23]:
# build a list of the dictionary's keys
ks = list(inventory.keys())
print(ks)

['apples', 'bananas', 'oranges', 'pears']


It’s so common to **iterate over the keys in a dictionary** that you can omit the keys method call in the `for` loop — **iterating over a dictionary implicitly iterates over its keys**.

In [24]:
inventory = {'apples': 430, 'bananas': 312, 'oranges': 525, 'pears': 217}
for k in inventory:
    print("got key", k)

got key apples
got key bananas
got key oranges
got key pears


The `values` and `items` methods are similar to `keys`. Each returns a view object that can be iterated over. Note that each element produced by `items` is a tuple containing a key and its associated value.

In [69]:
inventory = {'apples': 430, 'bananas': 312, 'oranges': 525, 'pears': 217}
print("keys:", list(inventory.keys()))
print("values:", list(inventory.values()))
print("Items:", list(inventory.items()))

keys: ['apples', 'bananas', 'oranges', 'pears']
values: [430, 312, 525, 217]
Items: [('apples', 430), ('bananas', 312), ('oranges', 525), ('pears', 217)]
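Because each item is a (key, value) tuple, a `for` loop can unpack it into two variables directly. The sketch below shows that pattern on the same `inventory` dictionary; the variable names `fruit`, `count`, and `lines` are just illustrative choices.

```python
inventory = {'apples': 430, 'bananas': 312, 'oranges': 525, 'pears': 217}

lines = []
for fruit, count in inventory.items():   # unpack each (key, value) tuple
    lines.append(fruit + " -> " + str(count))

for line in lines:
    print(line)
```

This avoids a separate `inventory[fruit]` lookup inside the loop body.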


In [70]:
for k in inventory:
    print("Got", k, "that maps to", inventory[k])

Got apples that maps to 430
Got bananas that maps to 312
Got oranges that maps to 525
Got pears that maps to 217


The `in` and `not in` operators can test if a key is in the dictionary:

In [72]:
inventory = {'apples': 430, 'bananas': 312, 'oranges': 525, 'pears': 217}
print('apples' in inventory)
print("cherries" in inventory)
print('...................')

if 'bananas' in inventory:
    print(inventory['bananas'])
else:
    print("We have no bananas")

True
False
...................
312
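Note that `in` looks only at keys, never at values. As a small sketch on the same dictionary, checking for a value means going through `values()` explicitly (the variable names are illustrative):

```python
inventory = {'apples': 430, 'bananas': 312, 'oranges': 525, 'pears': 217}

key_check = 'apples' in inventory           # 'apples' is a key
value_as_key = 430 in inventory             # 430 is a value, not a key
value_check = 430 in inventory.values()     # search the values instead

print(key_check, value_as_key, value_check)
```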


The `get` method allows us to access the value associated with a key, similar to the `[ ]` operator. The important difference is that `get` will not cause a runtime error if the key is not present; it returns `None` instead. `get` also accepts an optional second argument, which is returned in place of `None` when the key is not present.

In [74]:
inventory = {'apples': 430, 'bananas': 312, 'oranges': 525, 'pears': 217}

print(inventory.get('apples'))
print(inventory['apples'])

print(inventory.get('cherries'))
print(inventory.get('cherries', 0))
print(inventory.get('apples', 0))


430
430
None
0
430


In [38]:
# Checking Understanding
# What is printed by the following statements?
total = 0
mydict = {"cat":12, "dog":6, "elephant":23, "bear":20}
for akey in mydict:
   if len(akey) > 3:
      total = total + mydict[akey]
print(total)

43


Every four years, the summer Olympics are held in a different country. Add a key-value pair to the dictionary places that reflects that the 2016 Olympics were held in Brazil.

In [39]:
places = {"Australia":2000, "Greece":2004, "China":2008, "England":2012}
print(places)
places['Brazil'] = 2016
print(places)


{'Australia': 2000, 'Greece': 2004, 'China': 2008, 'England': 2012}
{'Australia': 2000, 'Greece': 2004, 'China': 2008, 'England': 2012, 'Brazil': 2016}


We have a dictionary of the specific events that Italy has won medals in and the number of medals they have won for each event. Assign to the variable `events` a list of the keys from the dictionary `medal_events`.

In [42]:

medal_events = {'Shooting': 7, 'Fencing': 4, 'Judo': 2, 'Swimming': 3, 'Diving': 2}
events = list(medal_events.keys())  # list() turns the dict_keys view into a real list
print(medal_events.keys())
print('.......')
print(events)

dict_keys(['Shooting', 'Fencing', 'Judo', 'Swimming', 'Diving'])
.......
['Shooting', 'Fencing', 'Judo', 'Swimming', 'Diving']
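The object returned by `keys()` is a view of the dictionary rather than a separate list: it reflects later changes to the dictionary, while a list built from it is a snapshot. A minimal sketch of the difference, using a shortened dictionary and a made-up extra key `'Rowing'`:

```python
medal_events = {'Shooting': 7, 'Fencing': 4, 'Judo': 2}

view = medal_events.keys()            # live view of the keys
snapshot = list(medal_events.keys())  # independent list, fixed at this moment

medal_events['Rowing'] = 1            # hypothetical new entry

print(list(view))    # the view sees the new key
print(snapshot)      # the list does not
```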


#### Aliasing and copying
Because dictionaries are `mutable`, you need to be aware of `aliasing` (as we saw with lists). Whenever two variables refer to the same dictionary object, changes made through one are visible through the other. For example, `opposites` is a dictionary that contains pairs of opposites.

In [75]:
opposites = {'up': 'down', 'right': 'wrong', 'true': 'false'}
alias = opposites

print(alias is opposites)

alias['right'] = 'left'
print(opposites['right'])
print(alias['right'])

True
left
left


**Note**: As you can see from the `is` operator, `alias` and `opposites` refer to the same object. If you want to modify a dictionary and keep a `copy` of the original, use the dictionary `copy` method. Since `acopy` is a copy of the dictionary, changes to it will not affect the original.

In [45]:
opposites = {'up': 'down', 'right': 'wrong', 'true': 'false'}
acopy = opposites.copy()
acopy['right'] = 'left'    # does not change opposites
print(acopy)
print(opposites)

{'up': 'down', 'right': 'left', 'true': 'false'}
{'up': 'down', 'right': 'wrong', 'true': 'false'}
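One caveat: `copy` makes a shallow copy. The new dictionary has its own set of key-value associations, but the value objects themselves are shared, so mutating a mutable value (such as a list) is visible through both dictionaries. The sketch below uses a made-up `teams` dictionary to contrast reassigning a key with mutating a shared value.

```python
teams = {'red': ['Ana', 'Bo'], 'blue': ['Cy']}   # hypothetical example data
acopy = teams.copy()

acopy['blue'] = ['Di']        # reassigning a key changes only the copy
acopy['red'].append('Eli')    # mutating a shared list shows up in both

print(teams)
print(acopy)
```

With string or number values, as in `opposites`, this distinction does not arise, because those values cannot be mutated in place.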


In [46]:
# What is printed by the following statements?

mydict = {"cat":12, "dog":6, "elephant":23, "bear":20}
yourdict = mydict
yourdict["elephant"] = 999
print(mydict["elephant"])

999


## Course 2 Assessment 2

1. At the halfway point during the Rio Olympics, the United States had 70 medals, Great Britain had 38 medals, China had 45 medals, Russia had 30 medals, and Germany had 17 medals. Create a dictionary assigned to the variable `medal_count` with the country names as the keys and the number of medals the country had as each key’s value.


In [47]:
medal_count = {"United States":70, "Great Britain":38, "China":45, "Russia":30, "Germany" : 17}

2. Given the dictionary swimmers, add an additional key-value pair to the dictionary with "Phelps" as the key and the integer 23 as the value. Do not rewrite the entire dictionary.

In [48]:

swimmers = {'Manuel':4, 'Lochte':12, 'Adrian':7, 'Ledecky':5, 'Dirado':4}
swimmers['Phelps'] = 23
print(swimmers)

{'Manuel': 4, 'Lochte': 12, 'Adrian': 7, 'Ledecky': 5, 'Dirado': 4, 'Phelps': 23}


3. The dictionary golds contains information about how many gold medals each country won in the 2016 Olympics. But today, Spain won 2 more gold medals. Update golds to reflect this information.

In [50]:
golds = {"Italy": 12, "USA": 33, "Brazil": 15, "China": 27, "Spain": 19, "Canada": 22, "Argentina": 8, "England": 29}
golds['Spain'] = golds['Spain'] + 2
print(golds)

{'Italy': 12, 'USA': 33, 'Brazil': 15, 'China': 27, 'Spain': 21, 'Canada': 22, 'Argentina': 8, 'England': 29}


4. Create a list of the countries that are in the dictionary golds, and assign that list to the variable name countries. Do not hard code this.

In [51]:
golds = {"Italy": 12, "USA": 33, "Brazil": 15, "China": 27, "Spain": 19, "Canada": 22, "Argentina": 8, "England": 29}
countries = list(golds.keys())
print(countries)

['Italy', 'USA', 'Brazil', 'China', 'Spain', 'Canada', 'Argentina', 'England']


5. Provided is the dictionary, medal_count, which lists countries and their respective medal count at the halfway point in the 2016 Rio Olympics. Using dictionary mechanics, assign the medal count value for "Belarus" to the variable belarus. Do not hardcode this.

In [55]:
medal_count = {'United States': 70, 'Great Britain':38, 'China':45, 'Russia':30, 'Germany':17, 'Italy':22, 'France': 22, 'Japan':26, 'Australia':22, 'South Korea':14, 'Hungary':12, 'Netherlands':10, 'Spain':5, 'New Zealand':8, 'Canada':13, 'Kazakhstan':8, 'Colombia':4, 'Switzerland':5, 'Belgium':4, 'Thailand':4, 'Croatia':3, 'Iran':3, 'Jamaica':3, 'South Africa':7, 'Sweden':6, 'Denmark':7, 'North Korea':6, 'Kenya':4, 'Brazil':7, 'Belarus':4, 'Cuba':5, 'Poland':4, 'Romania':4, 'Slovenia':3, 'Argentina':2, 'Bahrain':2, 'Slovakia':2, 'Vietnam':2, 'Czech Republic':6, 'Uzbekistan':5}
belarus = medal_count["Belarus"]
print(belarus)

4


6. The dictionary total_golds contains the total number of gold medals that countries have won over the course of history. Use dictionary mechanics to find the number of golds Chile has won, and assign that number to the variable name chile_golds. Do not hard code this!

In [56]:
total_golds = {"Italy": 114, "Germany": 782, "Pakistan": 10, "Sweden": 627, "USA": 2681, "Zimbabwe": 8, "Greece": 111, "Mongolia": 24, "Brazil": 108, "Croatia": 34, "Algeria": 15, "Switzerland": 323, "Yugoslavia": 87, "China": 526, "Egypt": 26, "Norway": 477, "Spain": 133, "Australia": 480, "Slovakia": 29, "Canada": 22, "New Zealand": 100, "Denmark": 180, "Chile": 13, "Argentina": 70, "Thailand": 24, "Cuba": 209, "Uganda": 7,  "England": 806, "Denmark": 180, "Ukraine": 122, "Bahamas": 12}
chile_golds = total_golds['Chile']
print(chile_golds)

13


## Accumulating Multiple Results In a Dictionary
Rather than accumulating a single result, it’s possible to accumulate many results. Suppose, for example, we wanted to find out which letters are used most frequently in English.

If we want to find out how often the letter `t` occurs, we can accumulate the result in a count variable.

In [4]:
f = open('scarlet.txt', 'r')
txt = f.read()
# now txt is one long string containing all the characters
t_count = 0
for c in txt:
    if c == 't':
        t_count = t_count + 1
print('There are ' + str(t_count) + " occurrences of t.")

There are 15886 occurrences of t.


We can accumulate counts for more than one character as we traverse the text. Suppose, for example, we wanted to compare the counts of `t` and `s` in the text.

In [6]:
f = open('scarlet.txt', 'r')
txt = f.read()
t_count = 0
s_count = 0
for c in txt:
    if c == 't':
        t_count += 1
    elif c == 's':
        s_count += 1
print('There are ' + str(t_count) + " occurrences of t.")
print('There are ' + str(s_count) + " occurrences of s.")

There are 15886 occurrences of t.
There are 10989 occurrences of s.


You can see this is going to get tedious if we try to accumulate counts for all the letters. We will have to initialize a lot of accumulators, and there will be a very long if..elif..elif statement. **Using a dictionary, we can do a lot better**.

One dictionary can hold all of the accumulator variables. Each `key` in the dictionary will be one letter, and the corresponding value will be the count so far of how many times that letter has occurred.

In [12]:
f = open('scarlet.txt', 'r')
txt = f.read()
x = {}
x['t'] = 0
x['s'] = 0

for c in txt:
    if c == 't':
        x['t'] = x['t'] + 1
    elif c == 's':
        x['s'] = x['s'] + 1

print('There are ' + str(x['t']) + " occurrences of t.")
print('There are ' + str(x['s']) + " occurrences of s.")
print('x =:', x)

There are 15886 occurrences of t.
There are 10989 occurrences of s.
x =: {'t': 15886, 's': 10989}


In [9]:
print(x)

{'t': 15886, 's': 10989}


This hasn’t really improved things yet, but look closely at lines 8-11 in the code above. Whichever character we’re seeing, `t` or `s`, we’re incrementing the counter for that character. So lines 9 and 11 could really be the same.

Previously, our assignment statements referred directly to keys, with `x['s']` and `x['t']`. Here we are just using a variable `c` whose value is `'s'` or `'t'`, or some other character.

In [13]:
f = open('scarlet.txt', 'r')
txt = f.read()
x = {}
x['t'] = 0
x['s'] = 0

for c in txt:
    if c == 't':
        x[c] = x[c] + 1
    elif c == 's':
        x[c] = x[c] + 1

print('There are ' + str(x['t']) + " occurrences of t.")
print('There are ' + str(x['s']) + " occurrences of s.")
print('x =:', x)

There are 15886 occurrences of t.
There are 10989 occurrences of s.
x =: {'t': 15886, 's': 10989}


We can do better still. One other nice thing about using a dictionary is that **we don’t have to prespecify what all the letters will be.** In this case, we know in advance what the alphabet for English is, but later in the chapter we will count the occurrences of words, and we do not know in advance all of the words that may be used. **Rather than pre-specifying which letters to keep accumulator counts for, we can start with an empty dictionary and add a counter to the dictionary each time we encounter a new thing that we want to start keeping count of.**

In [29]:
f = open('scarlet.txt', 'r')
txt = f.read()
# now txt is one long string containing all the characters
x = {} # start with an empty dictionary
for c in txt:
    if c not in x:
        # we have not seen this character before, so initialize a counter for it
        x[c] = 0

    #whether we've seen it before or not, increment its counter
    x[c] = x[c] + 1

print("t: " + str(x['t']) + " occurrences")
print("s: " + str(x['s']) + " occurrences")
print('.......................')
print(str(x['a']))
print(str(x['z']))
print(str(x['Z']))

print(x['e'] < x['t'])

t: 15886 occurrences
s: 10989 occurrences
.......................
14774
134
2
False
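The `if c not in x` check and the increment can also be folded into one line with the `get` method described earlier, since `x.get(c, 0)` returns 0 for a character that has not been seen yet. A minimal sketch on a short hard-coded string (a stand-in used instead of `scarlet.txt` so the example is self-contained):

```python
txt = "the cat sat on the mat"   # stand-in text, not the novel
x = {}
for c in txt:
    # get returns the current count, or 0 if c is not yet a key
    x[c] = x.get(c, 0) + 1

print(x['t'], x['a'], x[' '])
```

Both forms build the same dictionary; the explicit `if` makes the initialization step easier to see, while `get` is more compact.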


Note that the print statements at the end pick out the specific keys ‘t’ and ‘s’. **We can generalize that, too, to print out the occurrence counts for all of the characters, using a for loop to iterate through the keys in x.**

In [34]:
f = open('scarlet2.txt', 'r') # a shortened copy of the text, so the list of characters printed below stays short
txt = f.read()
# now txt is one long string containing all the characters
letter_counts = {} # start with an empty dictionary
for c in txt:
    if c not in letter_counts:
        # we have not seen this character before, so initialize a counter for it
        letter_counts[c] = 0
    #whether we've seen it before or not, increment its counter
    letter_counts[c] = letter_counts[c] + 1

for c in letter_counts.keys():
    print(c + ": " + str(letter_counts[c]) + " occurrences")
    
print('...................')
print(letter_counts.keys())
print(letter_counts.values())
print(list(letter_counts.items()))

I: 2 occurrences
N: 1 occurrences
 : 9 occurrences
t: 3 occurrences
h: 1 occurrences
e: 5 occurrences
y: 2 occurrences
a: 1 occurrences
r: 3 occurrences
1: 1 occurrences
8: 2 occurrences
7: 1 occurrences
o: 5 occurrences
k: 1 occurrences
m: 1 occurrences
d: 1 occurrences
g: 1 occurrences
f: 1 occurrences
D: 1 occurrences
c: 1 occurrences
...................
dict_keys(['I', 'N', ' ', 't', 'h', 'e', 'y', 'a', 'r', '1', '8', '7', 'o', 'k', 'm', 'd', 'g', 'f', 'D', 'c'])
dict_values([2, 1, 9, 3, 1, 5, 2, 1, 3, 1, 2, 1, 5, 1, 1, 1, 1, 1, 1, 1])
[('I', 2), ('N', 1), (' ', 9), ('t', 3), ('h', 1), ('e', 5), ('y', 2), ('a', 1), ('r', 3), ('1', 1), ('8', 2), ('7', 1), ('o', 5), ('k', 1), ('m', 1), ('d', 1), ('g', 1), ('f', 1), ('D', 1), ('c', 1)]


Note that only the characters that actually occur in the text are shown. Letters and punctuation marks that are possible in English but were never used in the text are omitted completely. When the same loop is run on the full `scarlet.txt`, one line of the output looks blank, which may surprise you: its key is the newline character, `\n`, and its count is the number of newlines in the text. That count was 5155 for the copy of the file this passage was written about, which would mean 5155 lines of text; the number depends on the copy of the file. Let’s test the hypothesis on our copy by counting its lines.

In [40]:
f = open('scarlet.txt', 'r')
txt_lines = f.readlines()
print(len(txt_lines))
print('..............')
print(txt_lines[70:75])

4647
..............
['exuberance of my joy, I asked him to lunch with me at the Holborn, and\n', 'we started off together in a hansom.\n', '\n', '“Whatever have you been doing with yourself, Watson?” he asked in\n', 'undisguised wonder, as we rattled through the crowded London streets.\n']
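The same comparison can be tried on a tiny in-memory file, where the answer is easy to check by hand. The sketch below uses `io.StringIO` from the standard library as a stand-in for an opened text file; the sample texts are made up. It also shows the one case where the two numbers differ: a last line with no trailing newline.

```python
import io

# two hypothetical file contents: with and without a final newline
samples = ["one\ntwo\n\nfour\n", "one\ntwo\n\nfour"]

results = []
for text in samples:
    f = io.StringIO(text)        # behaves like a file opened for reading
    lines = f.readlines()
    f.close()
    # pair up (number of newline characters, number of lines)
    results.append((text.count('\n'), len(lines)))

print(results)
```

So the newline count and the line count agree exactly only when the file ends with a newline; otherwise they differ by one.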


2. Provided is a string saved to the variable name `sentence`. Split the string into a list of words, then create a dictionary that contains each word and the number of times it occurs. Save this dictionary to the variable name `word_counts`.

In [47]:
sentence = "The dog chased the rabbit into the forest but the rabbit was too quick."
words = sentence.split()
word_counts = {}
for w in words:
    if w not in word_counts:
        word_counts[w] = 0
    word_counts[w] +=  1
print(word_counts)
        

{'The': 1, 'dog': 1, 'chased': 1, 'the': 3, 'rabbit': 2, 'into': 1, 'forest': 1, 'but': 1, 'was': 1, 'too': 1, 'quick.': 1}


3. Create a dictionary called `char_d` from the string `stri`, so that the key is a character and the value is how many times it occurs.

In [48]:
stri = "what can I do"
char_d = {}
for c in stri:
    if c not in char_d:
        char_d[c] = 0
    char_d[c] += 1
print(char_d)


{'w': 1, 'h': 1, 'a': 2, 't': 1, ' ': 3, 'c': 1, 'n': 1, 'I': 1, 'd': 1, 'o': 1}


Just as we have iterated through the elements of a list to accumulate a result, we can also iterate through the keys in a dictionary, accumulating a result that may depend on the values associated with each of the keys.

For example, suppose that we wanted to compute a Scrabble score for the Study in Scarlet text. Each occurrence of the letter ‘e’ earns one point, but ‘q’ earns 10. We have a second dictionary, stored in the variable `letter_values`. Now, to compute the total score, we start an accumulator at 0 and go through each of the letters in the counts dictionary. For each of those letters that has a letter value (no points for spaces, punctuation, capital letters, etc.), we add to the total score.

In [51]:
f = open('scarlet.txt', 'r')
txt = f.read()
# now txt is one long string containing all the characters
x = {} # start with an empty dictionary
for c in txt:
    if c not in x:
        # we have not seen this character before, so initialize a counter for it
        x[c] = 0
    #whether we've seen it before or not, increment its counter
    x[c] = x[c] + 1

letter_values = {'a': 1, 'b': 3, 'c': 3, 'd': 2, 'e': 1, 'f':4, 'g': 2, 'h':4, 'i':1, 'j':8, 'k':5, 'l':1, 'm':3, 'n':1, 'o':1, 'p':3, 'q':10, 'r':1, 's':1, 't':1, 'u':1, 'v':4, 'w':4, 'x':8, 'y':4, 'z':10}

tot = 0

for y in x:
    if y in letter_values:
        tot = tot + letter_values[y] * x[y]

print(tot)


309998
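To see the arithmetic on something small enough to check by hand, here is the same two-step pattern applied to a single word. The word is an arbitrary choice, and only the letter values that word needs are copied from `letter_values` above.

```python
word = "quizzes"   # arbitrary example word

# step 1: count the letters, as in the code above
counts = {}
for c in word:
    if c not in counts:
        counts[c] = 0
    counts[c] = counts[c] + 1

# a subset of letter_values, just the letters this word needs
letter_values = {'e': 1, 'i': 1, 'q': 10, 's': 1, 'u': 1, 'z': 10}

# step 2: accumulate count * value over the keys of counts
tot = 0
for letter in counts:
    if letter in letter_values:
        tot = tot + letter_values[letter] * counts[letter]

print(counts)
print(tot)
```

The two occurrences of `z` contribute 2 * 10 points, which is why multiplying the count by the value matters.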


1. The dictionary travel contains the number of countries within each continent that Jackie has traveled to. Find the total number of countries that Jackie has been to, and save this number to the variable name `total`. Do not hard code this!

In [76]:
travel = {"North America": 2, "Europe": 8, "South America": 3, "Asia": 4, "Africa":1, "Antarctica": 0, "Australia": 1}
total = 0
for t in travel:
    total = total + travel[t] 
print(total)


19


2. schedule is a dictionary where a class name is a key and its value is how many credits it was worth. Go through and accumulate the total number of credits that have been earned so far and assign that to the variable `total_credits`. Do not hardcode.

In [77]:
schedule = {"UARTS 150": 3, "SPANISH 103": 4, "ENGLISH 125": 4, "SI 110": 4, "ENS 356": 2, "WOMENSTD 240": 4, "SI 106": 4, "BIO 118": 3, "SPANISH 231": 4, "PSYCH 111": 4, "LING 111": 3, "SPANISH 232": 4, "STATS 250": 4, "SI 206": 4, "COGSCI 200": 4, "AMCULT 202": 4, "ANTHRO 101": 4}
total_credits = 0
for t in schedule:
    total_credits = total_credits + schedule[t] 
print(total_credits)

63
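When the result is just the sum of all the values, the accumulator loop can also be written with the built-in `sum` function applied to `values()`. A short sketch, reusing the `travel` dictionary from the first exercise (the name `total_builtin` is illustrative):

```python
travel = {"North America": 2, "Europe": 8, "South America": 3, "Asia": 4,
          "Africa": 1, "Antarctica": 0, "Australia": 1}

# explicit accumulator, as in the exercises above
total = 0
for continent in travel:
    total = total + travel[continent]

# equivalent built-in form
total_builtin = sum(travel.values())

print(total, total_builtin)
```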


### Accumulating the Best Key
Now what if we want to find the key associated with the maximum value? It would be nice to just find the maximum value as above, and then look up the key associated with it, but dictionaries don’t work that way. You can look up the value associated with a key, but not the key associated with a value. (The reason for that is there may be more than one key that has the same value).

__The trick is to have the accumulator keep track of the best key so far instead of the best value so far__. For simplicity, let’s assume that there are at least two keys in the dictionary. Then, similar to our first version of computing the max of a list, we can initialize the best-key-so-far to be the first key, and loop through the keys, replacing the best-so-far whenever we find a better one.

Write a program that finds the key in a dictionary that has the maximum value. If two keys have the same maximum value, it’s OK to print out either one. Fill in the skeleton code.

In [78]:
d = {'a': 194, 'b': 54, 'c':34, 'd': 44, 'e': 312, 'full':31}

ks = d.keys()
# initialize variable best_key_so_far to be the first key in d
best_key_so_far = list(ks)[0]  # Have to turn ks into a real list before using [] to select an item
for k in ks:
    # check if the value associated with the current key is
    # bigger than the value associated with the best_key_so_far
    if d[k] > d[best_key_so_far]:
        # if so, save the current key as the best so far
        best_key_so_far = k

print("key " + best_key_so_far + " has the highest value, " + str(d[best_key_so_far]))


key e has the highest value, 312
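Once the accumulator pattern is clear, it is worth knowing that the built-in `max` and `min` functions can do the same search when given a `key` function. This is an alternative to the loop above, not a replacement for understanding it. The sketch reuses the dictionary `d`; the names `best_key` and `worst_key` are illustrative.

```python
d = {'a': 194, 'b': 54, 'c': 34, 'd': 44, 'e': 312, 'full': 31}

# max iterates over the keys of d and compares them by d.get(key), i.e. by value
best_key = max(d, key=d.get)
worst_key = min(d, key=d.get)

print(best_key, d[best_key])
print(worst_key, d[worst_key])
```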


### Check your Understanding



1. Create a dictionary called `d` that keeps track of all the characters in the string `placement` and notes how many times each character was seen. Then, find the key with the lowest value in this dictionary and assign that key to `min_value`.


In [78]:
placement = "Beaches are cool places to visit in spring however the Mackinaw Bridge is near. Most people visit Mackinaw later since the island is a cool place to explore."
d = {}
for cha in placement:
    if cha not in d:
        d[cha] = 0
    d[cha] = d[cha] + 1
print(d)
ks = d.keys()
min_value = list(ks)[0]
print("Min value:....", min_value)
for k in ks:
    if d[k] < d[min_value]:
        min_value = k
print(min_value)
print("key " + min_value + " has the lowest value, " + str(d[min_value]))

{'B': 2, 'e': 17, 'a': 12, 'c': 8, 'h': 4, 's': 10, ' ': 27, 'r': 7, 'o': 10, 'l': 8, 'p': 6, 't': 8, 'v': 3, 'i': 13, 'n': 7, 'g': 2, 'w': 3, 'M': 3, 'k': 2, 'd': 2, '.': 2, 'x': 1}
Min value:.... B
x
key x has the lowest value, 1


5. Create a dictionary called `lett_d` that keeps track of all of the characters in the string `product` and notes how many times each character was seen. Then, find the key with the highest value in this dictionary and assign that key to `max_value`.

In [87]:
product = "iphone and android phones"
lett_d = {}
for c in product:
    if c not in lett_d:
        lett_d[c] = 0
    lett_d[c] += 1
print(lett_d)
print("......................")
ks = lett_d.keys()
max_value = list(ks)[0]
for k in ks:
    if lett_d[k] > lett_d[max_value]:
        max_value = k
print(max_value)
print("The maximum value is :", lett_d[max_value])

{'i': 2, 'p': 2, 'h': 2, 'o': 3, 'n': 4, 'e': 2, ' ': 3, 'a': 2, 'd': 3, 'r': 1, 's': 1}
......................
n
The maximum value is : 4


## Course 2 Assessment 3

1. The dictionary `Junior` shows a schedule for a junior year semester. The key is the course name and the value is the number of credits. Find the total number of credits taken this semester and assign it to the variable `credits`. 

In [89]:
Junior = {'SI 206':4, 'SI 310':4, 'BL 300':3, 'TO 313':3, 'BCOM 350':1, 'MO 300':3}
credits = 0
for d in Junior:
    credits = credits + Junior[d]
print(credits)


18


2. Create a dictionary, `freq`, that displays each character in string `str1` as the key and its frequency as the value.

In [115]:
str1 = "peter piper picked a peck of pickled peppers"
freq = {}
for c in str1:
    if c not in freq:
        freq[c] = 0
    freq[c] += 1
print(freq)


{'p': 9, 'e': 8, 't': 1, 'r': 3, ' ': 7, 'i': 3, 'c': 3, 'k': 3, 'd': 2, 'a': 1, 'o': 1, 'f': 1, 'l': 1, 's': 1}


3. Provided is a string saved to the variable name `s1`. Create a dictionary named `counts` that contains each letter in `s1` and the number of times it occurs.

In [91]:
s1 = "hello"
counts = {}
for c in s1:
    if c not in counts:
        counts[c] = 0
    counts[c] += 1
print(counts)

{'h': 1, 'e': 1, 'l': 2, 'o': 1}


4. Create a dictionary, `freq_words`, that contains each word in string `str1` as the key and its frequency as the value.

In [94]:
str1 = "I wish I wish with all my heart to fly with dragons in a land apart"
words = str1.split()
freq_words = {}
for c in words:
    if c not in freq_words:
        freq_words[c] = 0
    freq_words[c] += 1
print(freq_words)

{'I': 2, 'wish': 2, 'with': 2, 'all': 1, 'my': 1, 'heart': 1, 'to': 1, 'fly': 1, 'dragons': 1, 'in': 1, 'a': 1, 'land': 1, 'apart': 1}


5. Create a dictionary called `wrd_d` from the string `sent`, so that the key is a word and the value is how many times you have seen that word.

In [95]:
sent = "Singing in the rain and playing in the rain are two entirely different situations but both can be good"
words = sent.split()
wrd_d = {}
for c in words:
    if c not in wrd_d:
        wrd_d[c] = 0
    wrd_d[c] += 1
print(wrd_d)



{'Singing': 1, 'in': 2, 'the': 2, 'rain': 2, 'and': 1, 'playing': 1, 'are': 1, 'two': 1, 'entirely': 1, 'different': 1, 'situations': 1, 'but': 1, 'both': 1, 'can': 1, 'be': 1, 'good': 1}


6. Create the dictionary `characters` that shows each character from the string `sally` and its frequency. Then, find the most frequent letter based on the dictionary. Assign this letter to the variable `best_char`.

In [102]:
sally = "sally sells sea shells by the sea shore"
characters = {}
for c in sally:
    if c not in characters:
        characters[c] = 0
    characters[c] = characters[c] + 1
# print(characters)
# {'s': 8, 'a': 3, 'l': 6, 'y': 2, ' ': 7, 'e': 6, 'h': 3, 'b': 1, 't': 1, 'o': 1, 'r': 1}

ks = characters.keys()
best_char = list(ks)[0]
for k in ks:
    if characters[k] > characters[best_char]:
        best_char = k
print(best_char)

s


7. Find the least frequent letter. Create the dictionary `characters` that shows each character from string `sally` and its frequency. Then, find the least frequent letter in the string and assign the letter to the variable `worst_char`.

In [103]:
sally = "sally sells sea shells by the sea shore and by the road"
characters = {}
for c in sally:
    if c not in characters:
        characters[c] = 0
    characters[c] = characters[c] + 1
# print(characters)
# {'s': 8, 'a': 5, 'l': 6, 'y': 3, ' ': 11, 'e': 7, 'h': 4, 'b': 2, 't': 2, 'o': 2, 'r': 2, 'n': 1, 'd': 2}

ks = characters.keys()
worst_char = list(ks)[0]
for k in ks:
    if characters[k] < characters[worst_char]:
        worst_char = k
print(worst_char)


n


8. Create a dictionary named `letter_counts` that contains each letter and the number of times it occurs in `string1`. **Challenge**: Letters should not be counted separately as upper-case and lower-case. Instead, all of them should be counted as lower-case.

In [106]:
string1 = "There is a tide in the affairs of men, Which taken at the flood, leads on to fortune. Omitted, all the voyage of their life is bound in shallows and in miseries. On such a full sea are we now afloat. And we must take the current when it serves, or lose our ventures."
letter_counts = {}
for c in string1:
    c = c.lower()
    if c not in letter_counts:
        letter_counts[c] = 0
    letter_counts[c] += 1
print(letter_counts)

{'t': 19, 'h': 11, 'e': 29, 'r': 12, ' ': 53, 'i': 14, 's': 15, 'a': 17, 'd': 7, 'n': 15, 'f': 9, 'o': 17, 'm': 4, ',': 4, 'w': 6, 'c': 3, 'k': 2, 'l': 11, 'u': 8, '.': 4, 'v': 3, 'y': 1, 'g': 1, 'b': 1}


9. Create a dictionary called `low_d` that keeps track of all the characters in the string `p` and notes how many times each character was seen. Make sure that there are no repeats of characters as keys, such that `“T”` and `“t”` are both seen as a `“t”` for example.

In [108]:
p = "Summer is a great time to go outside. You have to be careful of the sun though because of the heat."
low_d = {}
for c in p:
    c = c.lower()
    if c not in low_d:
        low_d[c] = 0
    low_d[c] += 1
print(low_d)

{'s': 5, 'u': 7, 'm': 3, 'e': 12, 'r': 3, ' ': 20, 'i': 3, 'a': 6, 'g': 3, 't': 9, 'o': 8, 'd': 1, '.': 2, 'y': 1, 'h': 6, 'v': 1, 'b': 2, 'c': 2, 'f': 3, 'l': 1, 'n': 1}
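Both solutions above also count spaces and punctuation, because every character in the string becomes a key. If only letters should be counted, one option is to skip non-letters with the string method `isalpha`. A minimal sketch on a made-up string (the names `text` and `letters_only` are illustrative):

```python
text = "Tea time, at ten!"    # hypothetical example string
letters_only = {}
for c in text:
    c = c.lower()             # treat 'T' and 't' as the same key
    if c.isalpha():           # ignore spaces, commas, and other non-letters
        if c not in letters_only:
            letters_only[c] = 0
        letters_only[c] += 1

print(letters_only)
```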


## Python Functions, Files, and Dictionaries


### Introduction: Functions
In Python, a function is a chunk of code that performs some operation that is meaningful for a person to think about as a whole unit. In this chapter you will learn about named functions, that is, functions that can be referred to by name when you want to execute them.

### Function Definition
The syntax for creating a named function, a **function definition**, is:

    def name( parameters ):
        statements

In a function definition, the keyword in the header is `def`, which is followed by the name of the function and some parameter names enclosed in parentheses. The parameter list may be empty, or it may contain any number of parameters separated from one another by commas.

In [3]:
def hello():
    """This function says hello and greets you"""
    print("Hello")
    print("Glad to meet you")
# defining the function does not print anything; it only runs when called

In [4]:
hello()

Hello
Glad to meet you


In [83]:
import turtle

def drawSquare(t, sz):
    """Make turtle t draw a square with side sz."""

    for i in range(4):
        t.forward(sz)
        t.left(90)


wn = turtle.Screen()      # Set up the window and its attributes
wn.bgcolor("lightgreen")

alex = turtle.Turtle()    # create alex
drawSquare(alex, 50)      # Call the function to draw the square passing the actual turtle and the actual side size

wn.exitonclick()


This function is named `drawSquare`. It has two parameters — one to tell the function which turtle to move around and the other to tell it the size of the square we want drawn. In the function definition they are called `t` and `sz` respectively. Make sure you know where the body of the function ends — it depends on the indentation and the blank lines don’t count for this purpose!

#### Function Invocation
Defining a new function does not make the function run. To execute the function, we need a **function call**. This is also known as a **function invocation**.

__The way to invoke a function is to refer to it by name, followed by parentheses__. Since there are no parameters for the function hello, we won’t need to put anything inside the parentheses when we call it.

In [3]:
def hello():
    print("Hello")
    print("Nice to meet you")

In [8]:
print(type(hello))
print('..............')
print(type('hello'))
print('..............')
hello()

<class 'function'>
..............
<class 'str'>
..............
Hello
Nice to meet you


### Function Parameters
Named functions are nice because, once they are defined and we understand what they do, we can refer to them by name and not think too much about what they do.

In the definition, the parameter list is sometimes referred to as the __formal parameters or parameter names__. These names can be any valid variable name. If there is more than one, they are separated by commas.

In [11]:
def hello2(s):
    print("Hello " + s)
    print("Glad to meet you")

In [12]:
hello2("Iman")
print("...............")
hello2("Jackie")

Hello Iman
Glad to meet you
...............
Hello Jackie
Glad to meet you


In [18]:
def hello2(s):
    print("Hello " + s)
    print("Glad to meet you")

hello2("Iman" + " and Jackie")
print('...................')
hello2("class " * 3) 

Hello Iman and Jackie
Glad to meet you
...................
Hello class class class 
Glad to meet you


Now let’s consider a function with two parameters. This version of hello takes a parameter that controls how many times the greeting will be printed.

In [23]:
def hello3(s , n):
    greeting = "Hello {} ".format(s)
    print(greeting * n)
    
hello3("wei", 4)
print('.................')
hello3("", 1)
print('.................')
hello3(3, 1)
print('.................')
hello3("Zeki", 1)

    

Hello wei Hello wei Hello wei Hello wei 
.................
Hello  
.................
Hello 3 
.................
Hello Zeki 


### Returning a value from a function
Not only can you pass a parameter value into a function, a function can also produce a value. You have already seen this in some previous functions that you have used. For example, `len` takes a list or string as a parameter value and returns a number, the length of that list or string. `range` takes an integer as a parameter value and returns a sequence of all the numbers from 0 up to, but not including, that parameter value (in Python 3 this is a `range` object rather than a list).
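As a quick check of those return values, the sketch below stores what `len` and `range` return. In Python 3, `range` returns a range object, so the example wraps it in `list()` to show the numbers it produces. The variable names are only illustrative.

```python
# Built-in functions return values that can be stored and reused.
word_length = len("hello")        # number of characters in the string
list_length = len([10, 20, 30])   # number of items in the list
numbers = list(range(4))          # list() collects the values the range produces

print(word_length)
print(list_length)
print(numbers)
```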

Functions that return values are sometimes called __fruitful functions__. In many other languages, a function that doesn’t return a value is called a __procedure__, but we will stick here with the Python way of also calling it a __function__, or if we want to stress it, a non-fruitful function.

In [62]:
def square(x):
    y = x * x
    return y

toSquare = 10
result = square(toSquare)
result2 = 2 * square(toSquare)

print("The result of {} squared is {}.".format(toSquare, result))
print("The result of {} squared multiplied by 2 is {}.".format(toSquare, result2))


The result of 10 squared is 100.
The result of 10 squared multiplied by 2 is 200.


In [58]:
def square(x):
    y = x * x
    return y

x = 10
result = square(x)
#result = 2 * square(x)
#result = 3 * square(x)

print("The result of {} squared is {}.".format(x, result))


The result of 10 squared is 100.


__There is one more aspect of function return values that should be noted__. All Python functions return the special value `None` unless there is an explicit return statement with a value other than `None`. Consider the following common mistake made by beginning Python programmers:

In [59]:
def square(x):
    y = x * x
    print(y)   # Bad! This is confusing! Should use return instead!

toSquare = 10
squareResult = square(toSquare)
print("The result of {} squared is {}.".format(toSquare, squareResult))


100
The result of 10 squared is None.


Since line 6 uses the return value as the right hand side of an assignment statement, squareResult will have None as its value and the result printed in line 7 is incorrect. Typically, functions will return values that can be printed or processed in some other way by the caller.
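A corrected version might look like the sketch below, which returns the computed value instead of printing it, so the caller receives 100 rather than `None`. The name `square_fixed` is hypothetical, chosen only to keep it apart from the version above.

```python
def square_fixed(x):
    """Return x squared instead of printing it."""
    y = x * x
    return y   # hand the value back to the caller

toSquare = 10
squareResult = square_fixed(toSquare)
print("The result of {} squared is {}.".format(toSquare, squareResult))
```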

In the following code, when line 3 executes, the value 5 is returned and the function ends immediately, so lines 4 and 5 never execute. Run the following code and try modifying it to make sure you understand why “there” and 10 never print out.

In [46]:
def weird():
    print("here")
    return 5
    print("there") 
    return 10

print(weird())

here
5


In [51]:
  
x = weird()


here


In [52]:
print(x)

5


In [50]:
weird() 

here


5

Consider a situation where you want to write a function to find out, from a class attendance list, whether anyone’s first name is longer than five letters, called __longer_than_five__. If there is anyone in class whose first name is longer than 5 letters, the function should return `True`. Otherwise, it should return `False`.

In [53]:
def longer_than_five(list_of_names):
    for name in list_of_names: # iterate over the list to look at each name
        if len(name) > 5: # as soon as you see a name longer than 5 letters,
            return True # then return True!
            # If Python executes that return statement, the function is over and the rest of the code will not run -- you already have your answer!
    return False # You will only get to this line if you
    # iterated over the whole list and did not get a name where
    # the if expression evaluated to True, so at this point, it's correct to return False!

# Here are a couple sample calls to the function with different lists of names. Try running this code in Codelens a few times and make sure you understand exactly what is happening.

list1 = ["Sam","Tera","Sal","Amita"]
list2 = ["Rey","Ayo","Lauren","Natalie"]

print(longer_than_five(list1))
print(longer_than_five(list2))

False
True


### Check your understanding



In [63]:
#What will the following code output?
def square(x):
    y = x * x
    return y

print(square(5) + square(5))

50


In [64]:
# What will the following code output?

def square(x):
    y = x * x
    return y

print(square(square(2)))

16


In [67]:
#What will the following code output?

def cyu2(s1, s2):
    x = len(s1)
    y = len(s2)
    return x-y

z = cyu2("Yes", "no")
if z > 0:
    print("First one was longer")
else:
    print("Second one was at least as long")

First one was longer


In [68]:
# Which will print out first, square, g, or a number?

def square(x):
    print("square")
    return x*x

def g(y):
    print("g")
    return y + 3

print(square(g(2)))

g
square
25


In [71]:
# How many lines will the following code print?

def show_me_numbers(list_of_ints):
    print(10)
    print("Next we'll accumulate the sum")
    accum = 0
    for num in list_of_ints:
        accum = accum + num
    return accum
    print("All done with accumulation!")

show_me_numbers([4,2,3])
##################### Two printed lines, and then the function body execution reaches a return statement.

10
Next we'll accumulate the sum


9

8. Write a function named same that takes a string as input, and simply returns that string.



In [90]:
def same(s):
    new_string =  s
    return new_string
s = "Hello"
same(s)
    

'Hello'

9. Write a function called same_thing that returns the parameter, unchanged.

In [91]:
def same_thing(z):
    y = z
    return y
z = 2
same_thing(z)

2

10. Write a function called `subtract_three` that takes an integer or any number as input, and returns that number minus three.

In [96]:
def subtract_three(x):
    y = x - 3
    return y
subtract_three(6)

3

11. Write a function called change that takes one number as its input and returns that number, plus 7.

In [97]:
def change(x):
    y = x + 7
    return y
change(6)

13

12. Write a function named `intro` that takes a string as input. Given the string “Becky” as input, the function should return: “Hello, my name is Becky and I love SI 106.”

In [104]:
def intro(si):
    xi = ("Hello, my name is " + si + " and I love SI 106")
    return xi

print(intro("Becky"))

Hello, my name is Becky and I love SI 106


13. Write a function called `s_change` that takes one string as input and returns that string, concatenated with the string ” for fun.”.

In [111]:
def s_change(si):
    xi = si + " for fun."
    return xi

yi = "We go to the beach"
print(s_change(yi))

We go to the beach for fun.


14. Write a function called `decision` that takes a string as input, and then checks the number of characters. If it has over 17 characters, return “This is a long string”, if it is shorter or has 17 characters, return “This is a short string”.

In [122]:
def decision(s1):
    # Return the message directly, as the exercise asks
    if len(s1) > 17:
        return "This is a long string"
    return "This is a short string"

s2 = "Hello World!"
print(decision(s2))

This is a short string


### Way of the Programmer: Decoding a Function
To build your understanding of any function, you should aim to answer the following questions:
1. How many parameters does it have?
2. What is the type of values that will be passed when the function is invoked?
3. What is the type of the return value that the function produces when it executes?

The second and third questions are not always so easy to answer. In Python, unlike some other programming languages, variables are not declared to have fixed types, and the same holds true for the variable names that appear as formal parameters of functions. You have to figure it out from context.

Here are some clues that can help you determine the type of object associated with any variable, including a function parameter. If you see…
* `len(x)`, then x must be a string or a list. (Actually, it can also be a dictionary, in which case it is equivalent to the expression len(x.keys()). Later in the course, we will also see some other sequence types that it could be). x can’t be a number or a Boolean.
* `x - y`, x and y must be numbers (integer or float)
* `x + y`, x and y must both be numbers, both be strings, or both be lists
* `x[3]`, x must be a string or a list containing at least four items, or x must be a dictionary that includes 3 as a key.
* `x['3']`, x must be a dictionary, with ‘3’ as a key.
* `x[y:z]`, x must be a sequence (string or list), and y and z must be integers
* `x and y`, x and y are normally Boolean (Python accepts other types here, but that is rarely what the code intends)
* `for x in y`, y must be a sequence (string or list) or a dictionary (in which case it’s really the dictionary’s keys); x must be a character if y is a string; if y is a list, x could be of any type.
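One way to check these clues while experimenting is to ask Python directly with `type` and `isinstance`. The helper `describe` below is a hypothetical illustration, not part of the course code; it reports the type of a value and whether `len` can be applied to it.

```python
def describe(x):
    """Return a short description of x: its type name and whether len() works on it."""
    type_name = type(x).__name__
    # Strings, lists, tuples, and dictionaries all support len()
    has_length = isinstance(x, (str, list, tuple, dict))
    return "{} (supports len: {})".format(type_name, has_length)

print(describe("hello"))
print(describe([1, 2, 3]))
print(describe(42))
print(describe({"3": "three"}))
```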

### A function that accumulates
Now that we know how to define functions, we could define len ourselves if it did not exist. Previously, we have used the accumulator pattern to count the number of lines in a file. Let’s use that same idea and just wrap it in a function definition. We’ll call it mylen to distinguish it from the real len which already exists. We actually could call it len, but that wouldn’t be a very good idea, because it would replace the original len function, and our implementation may not be a very good one.

In [131]:
def mylen(seq):
    c = 0 # initialize count variable to 0
    for _ in seq: 
        c = c + 1   # increment the counter for each item in seq
    return c

print(mylen("hello, Zeki"))
print(mylen([1, 2, 7]))

11
3


 1. Write a function named `total` that takes a list of integers as input, and returns the total value of all those integers added together.

In [132]:
# First, see the accumulation on its own; then turn it into a function
lst = [2,5,4,7]
total = 0
for c in lst:
    total = total + c # add each item to the running total
    
print(total) 


18


In function form

In [134]:
def total(lst):
    tot = 0
    for c in lst:
        tot = tot + c
    return tot

mylist = [2,5,4,7]
total(mylist)

18

2. Write a function called `count` that takes a list of numbers as input and returns a count of the number of elements in the list

In [136]:
def count(lst):
    tot = 0
    for c in lst:
        tot = tot + 1 # add 1 instead of c in order to count the items
    return tot

mylist = [1, 5, 9, -2, 9, 23]
count(mylist)

6

__Joke__: Why did the functions stop calling each other? Because they had too many arguments.

## Local and Global Variables, and Side Effects
In this lesson, we highlight a few subtleties of functions: _each execution gets a fresh set of local variables that disappear at the end of the function execution, functions can call other functions, and functions can have side effects on mutable objects_. 
At the end of this lesson, you should be able to:
1. avoid the use of global variables in function definitions by creating formal parameters for all values that are needed, and
2. identify whether a function has any side effects, including mutations to lists and dictionaries.

### Local and Global Variables


In [88]:
# What is the result of the following code?

def adding(x):
    y = 3
    z = y + x + x
    return z

def producing(x):
 
    z = x * y
    return z
y = 3 
print(producing(adding(4)))

33


As written, the code prints 33. `adding(4)` returns 3 + 4 + 4 = 11, using its own local `y`. `producing` has no local `y`, so Python finds the global `y = 3` assigned just before the call and returns 11 * 3 = 33. Without that global assignment, the call to `producing` would raise an error, because the `y` inside `adding` is a local variable and is not visible in any other function.
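To see what happens when a function refers to a name that exists neither locally nor globally, the sketch below uses a hypothetical function `producing_no_global` and catches the resulting `NameError`. It assumes that no variable named `missing_factor` is defined anywhere in the program.

```python
def producing_no_global(x):
    # missing_factor is neither a local nor a global variable,
    # so looking it up raises NameError when this line runs
    z = x * missing_factor
    return z

try:
    outcome = producing_no_global(11)
except NameError as err:
    outcome = "NameError"
    print("Lookup failed:", err)

print(outcome)
```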

#### Global Variables
Variable names that are at the _top-level_, not inside any function definition, are called __global__.

It is legal for a function to access a global variable. However, this is considered __bad form__ by nearly all programmers and should be avoided. Things can get pretty confusing when you mix local and global variables, so you really shouldn’t do it.

Look at the following, nonsensical variation of the square function. Although the badsquare function works, it is silly and poorly written. We have done it here to illustrate an important rule about how variables are looked up in Python. First, Python looks at the variables that are defined as __local variables__ in the function. We call this the `local scope`. If the variable name is not found in the local scope, then Python looks at the global variables, or `global scope`. This is exactly the case illustrated in the code below: `power` is not found locally in `badsquare` but it does exist globally.

In [4]:
def badsquare(x):
    y = x ** power
    return y

power = 2
result = badsquare(10)
print(result)

100


The appropriate way to write this function would be to pass power as a parameter. For practice, you should rewrite the badsquare example to have a second parameter called power.
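One possible rewrite, sketched under the hypothetical name `goodsquare`, takes the exponent as a second parameter so the function no longer depends on any global variable.

```python
def goodsquare(x, power):
    """Raise x to the given power; both values arrive as parameters."""
    y = x ** power
    return y

result = goodsquare(10, 2)
print(result)
```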

__Assignment statements in the local function cannot change variables defined outside the function__. Consider the following codelens example:

In [5]:
def powerof(x,p):
    power = p   # Another dumb mistake
    y = x ** power
    return y

power = 3
result = powerof(10,2)
print(result)

100


The value of `power` in the local scope was different than the `global` scope. That is because in this example power was used on the left hand side of the assignment statement `power = p`. When a variable name is used on the left hand side of an assignment statement Python creates a local variable. When a local variable has the same name as a global variable we say that the local shadows the global. A **shadow** means that the global variable cannot be accessed by Python because the local variable will be found first. This is another good reason not to use global variables. As you can see, it makes your code confusing and difficult to understand.
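A small sketch can confirm that the global variable is left untouched by the local assignment. It repeats the `powerof` function and records the global `power` after the call; the name `global_after_call` is only illustrative.

```python
def powerof(x, p):
    power = p          # creates a local variable that shadows the global one
    y = x ** power
    return y

power = 3
result = powerof(10, 2)
global_after_call = power   # still the global value; the local variable is gone

print(result)
print(global_after_call)
```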

In [6]:
def powerof(x,p):
    y = x ** p
    return y

result = powerof(10,2)
print(result)

100


If you really want to change the value of a global variable inside a function, you can do it by explicitly declaring the variable to be global, as in the example below. **Again, you should not do this in your code**. The example is here only to cement your understanding of how Python works.

In [7]:
def powerof(x,p):
    global power
    power = p
    y = x ** power
    return y

power = 3
result = powerof(10,2)
print(result)
print(power)

100
2


#### Functions can call other functions (Composition)
It is important to understand that each of the functions we write can be used and called from other functions we write. This is one of the most important ways that computer programmers take a large problem and break it down into a group of smaller problems. This process of breaking a problem into smaller subproblems is called __functional decomposition__.

Here’s a simple example of functional decomposition using two functions. The first function called `square` simply computes the square of a given number. The second function called `sum_of_squares` makes use of square to compute the sum of three numbers that have been squared.

Note that the body of `square` is __not executed until it is called from inside the `sum_of_squares` function for the first time on line 6__. Also notice that when square is called (at Step 8, for example), there are two groups of local variables, one for square and one for sum_of_squares. Each group of local variables is called a **stack frame**. The variables x, and y are local variables in both functions. **These are completely different variables, even though they have the same name**. Each function invocation creates a new frame, and variables are looked up in that frame.

In [9]:
def square(x):
    y = x * x
    return y

def sum_of_squares(x,y,z):   
    a = square(x)
    b = square(y)
    c = square(z)
    
    return a+b+c

a = -5
b = 2
c = 10
result = sum_of_squares(a,b,c)
print(result)

129


Let’s use **composition** to build up a little more useful function. Recall from the dictionaries chapter that we had a two-step process for finding the letter that appears most frequently in a text string:
1. Accumulate a dictionary with letters as keys and counts as values. 
2. Find the best key from that dictionary.

We can make functions for each of those and then compose them into a single function that finds the most common letter.

In [17]:
def most_common_letter(s):
    frequencies = count_freqs(s)
    return best_key(frequencies)

def count_freqs(st):
    d = {}
    for c in st:
        if c not in d:
             d[c] = 0
        d[c] = d[c] + 1
    return d     # for the call below, d is {'a': 1, 'b': 11, 'c': 4, 'd': 5}

# The best_key function identifies the key with the largest value in a given dictionary, e.g. d = {'a': 1, 'b': 11, 'c': 4, 'd': 5}

def best_key(dictionary):
    ks = dictionary.keys()
    best_key_so_far = list(ks)[0]  # Have to turn ks into a real list before using [] to select an item
    for k in ks:
        if dictionary[k] > dictionary[best_key_so_far]:
            best_key_so_far = k
    return best_key_so_far

print(most_common_letter("abbbbbbbbbbbccccddddd"))


b


1. Write two functions, one called `addit` and one called `mult`. addit takes one number as an input and adds 5. `mult` takes one number as an input, and multiplies that input by whatever is returned by `addit`, and then returns the result.

In [30]:
def addit(a):
    y = a + 5
    return y

def mult(a):
    z = a * addit(a)
    return z

print(mult(addit(3)))

104


#### Flow of Execution Summary
When you are working with functions it is really important to know **the order in which statements are executed**. This is called the **flow of execution**.
Execution always begins at the first statement of the program. Statements are executed one at a time, in order, from top to bottom. Function definitions do not alter the flow of execution of the program, but remember that statements inside the function are not executed until the function is called. Function calls are like a detour in the flow of execution. Instead of going to the next statement, the flow jumps to the first line of the called function, executes all the statements there, and then comes back to pick up where it left off. 

**Don’t read from top to bottom. Instead, follow the flow of execution. This means that you will read the def statements as you are scanning from top to bottom, but you should skip the body of the function until you reach a point where that function is called**.
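One way to trace the flow of execution is to record each step as it happens. The sketch below uses two hypothetical functions, `inner` and `outer`, and a list named `trace` that collects the order of events.

```python
trace = []   # records the order in which statements run

def inner():
    trace.append("inner body")

def outer():
    trace.append("outer body starts")
    inner()                      # detour into inner, then come back here
    trace.append("outer body ends")

trace.append("before call")      # the def statements above ran, but not their bodies
outer()
trace.append("after call")

for step in trace:
    print(step)
```

Reading the printed steps shows that the function bodies run only when the functions are called, not when they are defined.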

In [31]:
# Consider the following Python code. What does this function print?

def pow(b, p):
    y = b ** p
    return y

def square(x):
    a = pow(x, 2)
    return a

n = 5
result = square(n)
print(result)

25


#### Passing Mutable Objects

In [32]:
def double(y):
    y = 2 * y

def changeit(lst):
    lst[0] = "Michigan"
    lst[1] = "Wolverines"

y = 5
double(y)
print(y)

mylst = ['our', 'students', 'are', 'awesome']
changeit(mylst)
print(mylst)

5
['Michigan', 'Wolverines', 'are', 'awesome']


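Similar to examples we have seen before, running `double` does not change the global y: the assignment inside `double` only rebinds the function's own local name. But running `changeit` does change `mylst`, because the parameter `lst` and the global `mylst` refer to the same mutable list object, and assigning to `lst[0]` modifies that shared object.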
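The sharing can be made visible with the built-in `id` function, which returns an identifier for the object a name refers to. In this sketch, `change_first` and `rebind` are hypothetical functions written only for illustration.

```python
def change_first(lst):
    lst[0] = "Michigan"    # mutates the list object the caller passed in
    return id(lst)         # identity of the object the parameter refers to

def rebind(n):
    n = 2 * n              # rebinds the local name n only
    return n

mylst = ['our', 'students', 'are', 'awesome']
same_object = change_first(mylst) == id(mylst)

y = 5
doubled = rebind(y)

print(same_object)
print(mylst[0])
print(y, doubled)
```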

In [33]:
def double(n):
    n = 2 * n

def changeit(lst):
    lst[0] = "Michigan"
    lst[1] = "Wolverines"

y = 5
double(y)
print(y)

mylst = ['106', 'students', 'are', 'awesome']
changeit(mylst)
print(mylst)

5
['Michigan', 'Wolverines', 'are', 'awesome']


We say that the function `changeit` has a **side effect** on the list object that is passed to it. Global variables are another way to have side effects. For example, similar to examples you have seen above, we could make `double` have a side effect on the global variable y.

In [36]:
def double(n):
    global y
    y = 2 * n

y = 5
double(y)
print(y)

10


**Notice the difference**

In [37]:
def double(n):
    #global y
    y = 2 * n

y = 5
double(y)
print(y)

5


**Note:** You can use the same coding pattern to avoid confusing side effects with sharing of mutable objects. To do that, explicitly make a copy of an object and pass the copy in to the function. Then return the modified copy and reassign it to the original variable if you want to save the changes. The built-in `list` function, which takes a sequence as a parameter and returns a new list, works to copy an existing list. For dictionaries, you can similarly call the `dict` function, passing in a dictionary to get a copy of the dictionary back as a return value.

**In general, any lasting effect that occurs in a function, not through its return value, is called a side effect.**



In [38]:
def changeit(lst):
    lst[0] = "Michigan"
    lst[1] = "Wolverines"
    return lst

mylst = ['106', 'students', 'are', 'awesome']
newlst = changeit(list(mylst))
print(mylst)
print(newlst)

['106', 'students', 'are', 'awesome']
['Michigan', 'Wolverines', 'are', 'awesome']
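The same copy-then-modify pattern works for dictionaries with the built-in `dict` function. The function `add_bonus` and the score data below are hypothetical, used only to illustrate the idea. Note that `dict` makes a shallow copy, which is enough here because the values are plain numbers.

```python
def add_bonus(scores):
    """Add 5 points to every score in the dictionary passed in and return it."""
    for name in scores:
        scores[name] = scores[name] + 5
    return scores

original = {'Sam': 80, 'Tera': 90}
updated = add_bonus(dict(original))   # pass a copy so original is not mutated

print(original)
print(updated)
```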


## Course 2 Assessment 4

1. Write a function called `int_return` that takes an integer as input and returns the same integer.

In [39]:
def int_return(x):
    return x
a = 2
print(int_return(a))

2


2. Write a function called `add` that takes any number as its input and returns that sum with 2 added.

In [40]:
def add(x):
    z = x + 2
    return z

a = 2
print(add(a))

4


3. Write a function called `change` that takes any string, adds `“Nice to meet you!”` to the end of the argument given, and returns that new string.

In [42]:
def change(s1):
    s2 = s1 + "Nice to meet you!"
    return s2

a = "Hello, "
print(change(a))

Hello, Nice to meet you!


4. Write a function, `accum`, that takes a list of integers as input and returns the sum of those integers.


In [44]:
def accum(lst):
    ls = 0
    for i in lst:
        ls = ls + i
    return ls
    
integers = [1,2,3,4,5,6,7,8]
print(accum(integers))


36


5. Write a function, `length`, that takes in a list as the input. If the length of the list is greater than or equal to 5, return “Longer than 5”. If the length is less than 5, return “Less than 5”.

In [68]:
def length(x):
    if len(x) >= 5:
        return "Longer than 5"
    return "Less than 5"

x = [1,2,3,4,5]
length(x)

'Longer than 5'

In [70]:
#But if you want a shorter code, use:

def length(x):
    return '%s than 5' % ['Less', 'Longer'][len(x) >= 5]

x = [1,2,3,4,5]
print(length(x))

Longer than 5


In [71]:
#OR
def length(x):
    return '{} than 5'.format(['Less', 'Longer'][len(x) >= 5])

x = [1,2,3,4,5]
print(length(x))

Longer than 5


6. You will need to write two functions for this problem. The first function, `divide` that takes in any number and returns that same number divided by 2. The second function called `sum` should take any number, divide it by 2, and add 6. It should return this new number. You should call the divide function within the `sum` function. Do not worry about decimals.

In [73]:
def divide(a):
    y = a / 2
    return y

def sum(b):
    z = divide(b) + 6   # call divide inside sum, as the exercise asks
    return z

a = 2

print(sum(a))

7.0


## Tuples
This lesson introduces a convenient feature, __packing and unpacking__. It doesn't let you do anything new, but it makes your code a little more readable. At the end of this lesson, you should be able to: 
1. recognize when code is using `implicit tuple packing`, and use implicit tuple packing to return multiple values from a function, 
2. and you should be able to read and write code that unpacks a tuple into multiple variables. 

You have previously seen tuples, a sequence type that works just like lists except that they are immutable. When working with multiple values or multiple variable names, the Python interpreter does some automatic packing and unpacking to and from tuples, which allows some simplifications in the code you write.

#### Tuple Packing
**Wherever python expects a single value, if multiple expressions are provided, separated by commas**, they are automatically **packed** into a tuple. For example, we can omit the parentheses when assigning a tuple of values to a single variable

In [74]:
julia = ("Julia", "Roberts", 1967, "Duplicity", 2009, "Actress", "Atlanta, Georgia")
# or equivalently
julia = "Julia", "Roberts", 1967, "Duplicity", 2009, "Actress", "Atlanta, Georgia"
print(julia[4])

2009


One place where this is especially useful is when a function wants to return multiple values.
For example, in this code, `circleInfo` is a function
and it wants to return two values. The circumference of the circle, and the area of the circle.
**You can only return one value from a Python function, but that value can be a tuple as we've done here.**

In [75]:
def circleInfo(r):
    '''Return (circumference, area) of a circle of radius r'''
    c = 2 * 3.14159 * r
    a = 3.14159 * r * r
    return c, a

print(circleInfo(10))

(62.8318, 314.159)


In [78]:
#2. Create a tuple called practice that has four elements: ‘y’, ‘h’, ‘z’, and ‘x’.

practice = ('y','h','z','x')
print(practice)
print(type(practice))

('y', 'h', 'z', 'x')
<class 'tuple'>


4. Provided is a list of tuples. Create another list called `t_check` that contains the third element of every tuple.



In [89]:
lst_tups = [('Articuno', 'Moltres', 'Zaptos'), ('Beedrill', 'Metapod', 'Charizard', 'Venasaur', 'Squirtle'), ('Oddish', 'Poliwag', 'Diglett', 'Bellsprout'), ('Ponyta', "Farfetch'd", "Tauros", 'Dragonite'), ('Hoothoot', 'Chikorita', 'Lanturn', 'Flaaffy', 'Unown', 'Teddiursa', 'Phanpy'), ('Loudred', 'Volbeat', 'Wailord', 'Seviper', 'Sealeo')]
t_check = []
for lin in lst_tups:
    t_check.append(lin[2])
print(tuple(t_check))


('Zaptos', 'Charizard', 'Diglett', 'Tauros', 'Lanturn', 'Wailord')


5. Below, we have provided a list of tuples. Write a for loop that saves the second element of each tuple into a list called `seconds`.



In [93]:
tups = [('a', 'b', 'c'), (8, 7, 6, 5), ('blue', 'green', 'yellow', 'orange', 'red'), (5.6, 9.99, 2.5, 8.2), ('squirrel', 'chipmunk')]
seconds  = []
for lin in tups:
    seconds.append(lin[1])
print(tuple(seconds))

('b', 7, 'green', 9.99, 'chipmunk')


#### Tuple Assignment with Unpacking
Python has a very powerful **tuple assignment** feature that allows **a tuple of variable names on the left of an assignment statement to be assigned values from a tuple on the right of the assignment**. Another way to think of this is that the tuple of values is **unpacked** into the variable names.

In [95]:
julia = "Julia", "Roberts", 1967, "Duplicity", 2009, "Actress", "Atlanta, Georgia"
name, surname, birth_year, movie, movie_year, profession, birth_place = julia
print(name)

Julia


**This does the equivalent of seven assignment statements, all on one easy line.**
Naturally, **the number of variables on the left and the number of values on the right have to be the same.**

Unpacking into multiple variable names also works with lists, or any other sequence type, as long as there is exactly one value for each variable. For example, you can write `x, y = [3, 4]`.
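When the counts do not match, Python raises a `ValueError`. The following sketch shows one successful unpacking and one failing one; the variable names are only illustrative.

```python
x, y = [3, 4]          # two names, two values: works
print(x, y)

try:
    a, b = (1, 2, 3)   # two names, three values: raises ValueError
except ValueError as err:
    message = str(err)
    print("Unpacking failed:", message)
```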

#### Swapping Values between Variables
Tuple assignment makes it easy to swap the values of two variables. With conventional assignment statements, we have to use a temporary variable. For example, to swap `a` and `b`:

In [96]:
a = 1
b = 2
temp = a 
a = b 
b = temp 
print(a, b, temp)

2 1 1


**Tuple assignment solves this problem neatly:**

In [99]:
a = 1
b = 2
(a, b) = (b, a)
print(a,b)
# The left side is a tuple of variables; the right side is a tuple of values. 
# Each value is assigned to its respective variable

2 1


### Unpacking Into Iterator Variables
Multiple assignment with unpacking is particularly useful when you iterate through a list of tuples. You can unpack each tuple into several loop variables. For example: on the first iteration the tuple ('Paul', 'Resnick') is unpacked into the two variables first_name and last_name, and so on. 

In [101]:
authors = [('Paul', 'Resnick'), ('Brad', 'Miller'), ('Lauren', 'Murphy'), ('Zeki', 'Mulu')]
for first_name, last_name in authors:
    print("First name:", first_name, ", Last name:", last_name)

First name: Paul , Last name: Resnick
First name: Brad , Last name: Miller
First name: Lauren , Last name: Murphy
First name: Zeki , Last name: Mulu


#### The Pythonic Way to Enumerate Items in a Sequence
When we first introduced the for loop, we provided an example of how to iterate through the indexes of a sequence, and thus **enumerate the items and their positions in the sequence**. Compare this approach with the one that follows it.

In [102]:
fruits = ['apple', 'pear', 'apricot', 'cherry', 'peach']
for n in range(len(fruits)):
    print(n, fruits[n])

0 apple
1 pear
2 apricot
3 cherry
4 peach


It's important to understand a more pythonic approach to enumerating items in a sequence. Python provides a built-in function `enumerate`. **It takes a sequence as input and returns a sequence of tuples. In each tuple, the first element is an integer and the second is an item from the original sequence**. (It actually produces an “iterable” rather than a list, but we can use it in a for loop as the sequence to iterate over.)

In [108]:
fruits = ['apple', 'pear', 'apricot', 'cherry', 'peach']
for item in enumerate(fruits):
    print(item)
   # print(item[0], item[1])


(0, 'apple')
(1, 'pear')
(2, 'apricot')
(3, 'cherry')
(4, 'peach')


In [109]:
fruits = ['apple', 'pear', 'apricot', 'cherry', 'peach']
for item in enumerate(fruits):
    print(item[0], item[1])

0 apple
1 pear
2 apricot
3 cherry
4 peach


**The pythonic way to consume the results of enumerate, however, is to unpack the tuples while iterating through them, so that the code is easier to understand.**

In [114]:
fruits = ['apple', 'pear', 'apricot', 'cherry', 'peach']
for idx, fruit in enumerate(fruits):
    print(idx, fruit)

0 apple
1 pear
2 apricot
3 cherry
4 peach
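`enumerate` also accepts an optional `start` argument for cases where the numbering should begin somewhere other than 0. The sketch below, which is an addition to the course example, numbers the fruits from 1; the list `numbered` is a hypothetical name used to collect the formatted lines.

```python
fruits = ['apple', 'pear', 'apricot', 'cherry', 'peach']
numbered = []
for position, fruit in enumerate(fruits, start=1):
    # position counts from 1 because of start=1
    numbered.append("{}. {}".format(position, fruit))

for line in numbered:
    print(line)
```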


**For understanding:** With only one line of code, assign the variables water, fire, electric, and grass to the values “Squirtle”, “Charmander”, “Pikachu”, and “Bulbasaur”

In [124]:
water, fire, electric, grass = "Squirtle", "Charmander", "Pikachu",  "Bulbasaur"

2. If you remember, the `.items()` dictionary method produces a sequence of tuples. Keeping this in mind, we have provided you a dictionary called `pokemon`. For every key value pair, append the key to the list `p_names`, and append the value to the list `p_number`. Do not use the `.keys()` or `.values()` methods.

In [151]:
pokemon = {'Rattata': 19, 'Machop': 66, 'Seel': 86, 'Volbeat': 86, 'Solrock': 126}
p_names = []
p_number = []
for i in pokemon.items():
    p_names.append(i[0])
    p_number.append(i[1])
    
print(p_names)
print(p_number)  

['Rattata', 'Machop', 'Seel', 'Volbeat', 'Solrock']
[19, 66, 86, 86, 126]
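The same loop can also be written with tuple unpacking, which avoids the `i[0]` and `i[1]` indexing. This is a sketch of an alternative solution; the loop variable names `name` and `number` are illustrative choices.

```python
pokemon = {'Rattata': 19, 'Machop': 66, 'Seel': 86, 'Volbeat': 86, 'Solrock': 126}
p_names = []
p_number = []
for name, number in pokemon.items():   # unpack each (key, value) tuple
    p_names.append(name)
    p_number.append(number)

print(p_names)
print(p_number)
```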


3. The `.items()` method produces a sequence of key-value pair tuples. With this in mind, write code to create a list of keys from the dictionary `track_medal_counts` and assign the list to the variable name `track_events`. Do NOT use the `.keys()` method

In [158]:
track_medal_counts = {'shot put': 1, 'long jump': 3, '100 meters': 2, '400 meters': 2, '100 meter hurdles': 3, 'triple jump': 3, 'steeplechase': 2, '1500 meters': 1, '5K': 0, '10K': 0, 'marathon': 0, '200 meters': 0, '400 meter hurdles': 0, 'high jump': 1}
track_events = []
for dics in track_medal_counts.items():
    track_events.append(dics[0])
print(track_events)

['shot put', 'long jump', '100 meters', '400 meters', '100 meter hurdles', 'triple jump', 'steeplechase', '1500 meters', '5K', '10K', 'marathon', '200 meters', '400 meter hurdles', 'high jump']
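
The two solutions above index into each tuple with `[0]` and `[1]`. Since `.items()` yields key-value tuples, the same unpacking used with `enumerate` should work here too. A minimal sketch, reusing the `pokemon` dictionary (the loop variable names `name` and `number` are just illustrative):

```python
pokemon = {'Rattata': 19, 'Machop': 66, 'Seel': 86, 'Volbeat': 86, 'Solrock': 126}
p_names = []
p_number = []

# each item is a (key, value) tuple, unpacked straight into two names
for name, number in pokemon.items():
    p_names.append(name)
    p_number.append(number)

print(p_names)
print(p_number)
```

Naming the two parts tends to make the loop body easier to read than `i[0]` and `i[1]`.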


### Tuples as Return Values
Functions can return tuples as return values. This is very useful — we often want to know some batsman’s highest and lowest score, or we want to find the mean and the standard deviation, or we want to know the year, the month, and the day, or if we’re doing some ecological modeling we may want to know the number of rabbits and the number of wolves on an island at a given time. In each case, a function (which can only return a single value), can create a single tuple holding multiple elements.

In [159]:
def circleInfo(r):
    """ Return (circumference, area) of a circle of radius r """
    c = 2 * 3.14159 * r
    a = 3.14159 * r * r
    return (c, a)
# or return c, a to make it more readable

print(circleInfo(10))

(62.8318, 314.159)


**It’s common to unpack the returned values into multiple variables.**

In [160]:
def circleInfo(r):
    """ Return (circumference, area) of a circle of radius r """
    c = 2 * 3.14159 * r
    a = 3.14159 * r * r
    return c, a

print(circleInfo(10))

circumference, area = circleInfo(10)
print(circumference)
print(area)

circumference_two, area_two = circleInfo(45)
print(circumference_two)
print(area_two)

(62.8318, 314.159)
62.8318
314.159
282.74309999999997
6361.719749999999
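
The mean and standard deviation case mentioned at the start of this section can be sketched the same way. The function name `mean_and_std` and the sample `scores` are made up for illustration, and the formula used is the population standard deviation.

```python
import math

def mean_and_std(values):
    """Return (mean, population standard deviation) of a non-empty list of numbers."""
    mean = sum(values) / len(values)
    squared_diffs = 0
    for v in values:
        squared_diffs = squared_diffs + (v - mean) ** 2
    variance = squared_diffs / len(values)
    return mean, math.sqrt(variance)   # two values, packed into one tuple

scores = [2, 4, 4, 4, 5, 5, 7, 9]
mean, std = mean_and_std(scores)       # unpack the returned tuple
print(mean)
print(std)
```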


Define a function called `information` that takes as input, the variables `name`, `birth_year`, `fav_color`, and `hometown`. It should return a tuple of these variables in this order.

In [188]:
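def information(name, birth_year, fav_color, hometown):
    """ Return the four arguments as a tuple, in the order given """
    return name, birth_year, fav_color, hometown

print(information("name", "birth_year", "fav_color", "hometown"))

('name', 'birth_year', 'fav_color', 'hometown')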


#### Unpacking Tuples as Arguments to Function Calls
Python even provides a way to pass a single tuple to a function and have it be unpacked for assignment to the named parameters. Simply passing the tuple, as in `add(z)` below, won’t quite work. It will cause an error, because the function `add` is expecting two parameters, but you’re only passing one (a tuple). If only there were a way to tell Python to unpack that tuple and use the first element to assign to `x` and the second to `y`.

Actually, there is a way.

In [192]:
def add(x, y):
    return x + y

print(add(3, 4))
z = (5, 4)
print(add(*z)) # this line will cause the values to be unpacked
# print(add(z)) # this line will cause an error

7
9
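
To see both behaviours side by side, here is a small sketch that calls `add` on a list of tuples and also captures the error from the call without the `*`. The names `pairs`, `totals`, and `error_seen` are only for illustration.

```python
def add(x, y):
    return x + y

pairs = [(1, 2), (10, 20), (5, 4)]

# the * in the call spreads each tuple across the parameters x and y
totals = []
for pair in pairs:
    totals.append(add(*pair))
print(totals)

# without the *, the whole tuple is bound to x and nothing is left for y
try:
    add(pairs[0])
    error_seen = False
except TypeError:
    error_seen = True
print(error_seen)
```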


## Course 2 Assessment 5

1. Create a tuple called `olympics` with four elements: “Beijing”, “London”, “Rio”, “Tokyo”.

In [193]:
olympics = ("Beijing", "London", "Rio", "Tokyo")
print(type(olympics))

<class 'tuple'>


2. The list below, `tuples_lst`, is a list of tuples. Create a list of the second elements of each tuple and assign this list to the variable `country`.

In [196]:
tuples_lst = [('Beijing', 'China', 2008), ('London', 'England', 2012), ('Rio', 'Brazil', 2016, 'Current'), ('Tokyo', 'Japan', 2020, 'Future')]
country = []
for i in tuples_lst:
    country.append(i[1])
print(country)

['China', 'England', 'Brazil', 'Japan']


3. With only one line of code, assign the variables `city`, `country`, and `year` to the values of the tuple `olymp`.

In [197]:
olymp = ('Rio', 'Brazil', 2016)
city, country,  year = olymp

4. Define a function called `info` with five parameters: `name`, `gender`, `age`, `bday_month`, and `hometown`. The function should then return a tuple with all five parameters in that order.

In [198]:
def info(a,b,c,d,e):
    return a, b, c, d, e

a, b, c, d, e = ("name", "gender", "age", "bday_month",  "hometown")
print(info(a, b, c, d, e))

('name', 'gender', 'age', 'bday_month', 'hometown')


5. Given is the dictionary, `gold`, which shows the country and the number of gold medals they have earned so far in the 2016 Olympics. Create a list, `num_medals`, that contains only the number of medals for each country. You must use the .items() method. Note: The .items() method provides a list of tuples. Do not use .keys() method.

In [199]:
gold = {'USA':31, 'Great Britain':19, 'China':19, 'Germany':13, 'Russia':12, 'Japan':10, 'France':8, 'Italy':8}
num_medals = []
for mum in gold.items():
    num_medals.append(mum[1])
print(num_medals)

[31, 19, 19, 13, 12, 10, 8, 8]


# The while Statement
The `while` statement is another way to repeat a block of code. Similar to the `if` statement, it uses a boolean expression to control the flow of execution. The body of a `while` loop is repeated as long as the controlling boolean expression evaluates to `True`.

We can use the while loop to create any type of iteration we wish, including anything that we have previously done with a for loop. For example, the program in the previous section could be rewritten using while. Instead of relying on the range function to produce the numbers for our summation, we will need to produce them ourselves.

In [4]:
def sumTo(aBound):
    """ Return the sum of 1+2+3 ... aBound """
    theSum = 0
    aNumber = 1
    while aNumber <= aBound:
        theSum = theSum + aNumber
        aNumber = aNumber + 1
    return theSum

print(sumTo(0))
print(sumTo(1))
print(sumTo(2))
print(sumTo(3))
print(sumTo(4))
print(sumTo(10))
print(sumTo(100))


0
1
3
6
10
55
5050


More formally, here is the flow of execution for a while statement:

1. Evaluate the condition, yielding False or True.
2. If the condition is False, exit the while statement and continue execution at the next statement.
3. If the condition is True, execute each of the statements in the body and then go back to step 1.

The body consists of all of the statements below the header with the same indentation.
This type of flow is called a loop because the third step loops back around to the top. Notice that if the condition is False the first time through the loop, the statements inside the loop are never executed.
The body of the loop should change the value of one or more variables so that eventually the condition becomes False and the loop terminates. Otherwise the loop will repeat forever. This is called an infinite loop.
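
The three steps can be traced in a short sketch. The function `count_halvings` is a made-up example, chosen because the number of iterations is not obvious before the loop starts.

```python
def count_halvings(n):
    """Count how many times n can be halved before it drops below 1."""
    steps = 0
    while n >= 1:            # step 1: evaluate the condition
        n = n / 2            # step 3: the body changes n, so the condition can become False
        steps = steps + 1
    return steps             # step 2: execution continues here once the condition is False

print(count_halvings(0.5))   # condition is False the first time, so the body never runs
print(count_halvings(1))
print(count_halvings(100))
```

Because the body makes `n` smaller on every pass, the condition must eventually fail, which is what keeps this from being an infinite loop.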

**Check for Understanding:**
1. Write a `while` loop that is initialized at 0 and stops at 15. If the counter is an even number, append the counter to a list called `eve_nums`.

In [13]:
count = 0
eve_nums = []
while count < 15:
    if count % 2 == 0: # test before incrementing; if the order is swapped, 0 is missed because 1 has already been added
        eve_nums.append(count)
    count = count + 1

print(count)
print(eve_nums)
    

15
[0, 2, 4, 6, 8, 10, 12, 14]


2. Below, we’ve provided a for loop that sums all the elements of `list1`. Write code that accomplishes the same task, but instead uses a while loop. Assign the accumulator variable to the name `accum`.

In [18]:
list1 = [8, 3, 4, 5, 6, 7, 9]

count = 0
accum = 0

while count < len(list1):
    accum = accum + list1[count]
    count = count + 1

print(count)
print(accum)

7
42


3. Write a function called `stop_at_four` that iterates through a list of numbers. Using a while loop, append each number to a new list until the number 4 appears. The function should return the new list

In [4]:
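nums = [3,2,3,4,5]   # renamed from list, which would shadow the built-in type

def stop_at_four(lst):
    count = 0
    accum = []
    # the bounds check keeps the loop from running off the end when there is no 4
    while count < len(lst) and lst[count] != 4:
        accum.append(lst[count])
        count = count + 1
    return accum
print(stop_at_four(nums))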

[3, 2, 3]


## The Listener Loop


At the end of the previous section, __we advised using a for loop whenever it will be known at the beginning of the iteration process how many times the block of code needs to be executed__. Usually, in python, you will use a for loop rather than a while loop. When is it not known at the beginning of the iteration how many times the code block needs to be executed? The answer is, when it depends on something that happens during the execution.

One very common pattern is called a **listener loop**. Inside the while loop there is a function call to get user input. The loop repeats indefinitely, until a particular input is received.

In [6]:
theSum = 0
x = -1
while (x != 0):
    x = int(input("next number to add up (enter 0 if no more numbers): "))
    theSum = theSum + x

print(theSum)

next number to add up (enter 0 if no more numbers): 5
next number to add up (enter 0 if no more numbers): 5
next number to add up (enter 0 if no more numbers): 7
next number to add up (enter 0 if no more numbers): 04
next number to add up (enter 0 if no more numbers): 5
next number to add up (enter 0 if no more numbers): 0
26


### Other uses of while: Sentinel Values

In [1]:
def checkout():
    total = 0
    count = 0
    moreItems = True
    while moreItems:
        price = float(input('Enter price of item (0 when done): '))
        if price != 0:
            count = count + 1
            total = total + price
            print('Subtotal: $', total)
        else:
            moreItems = False
    average = total / count
    print('Total items:', count)
    print('Total $', total)
    print('Average price per item: $', average)

checkout()


Enter price of item (0 when done): 0


ZeroDivisionError: division by zero
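
In `checkout`, the value 0 acts as a sentinel: a special input that marks the end of the data rather than being data itself. The run above fails because the sentinel was the very first input, so `count` is still 0 when the average is computed. Below is one possible way to guard against that. It is a sketch, not the course's solution: the hypothetical `checkout_prices` takes a list that stands in for what a user would type, so it can run without prompting.

```python
def checkout_prices(prices):
    """Total the prices that appear before the sentinel value 0.

    `prices` is a list standing in for the values a user would type.
    """
    total = 0
    count = 0
    idx = 0
    while idx < len(prices) and prices[idx] != 0:
        total = total + prices[idx]
        count = count + 1
        idx = idx + 1
    # guard the division: with no items there is no average to report
    if count > 0:
        average = total / count
    else:
        average = None
    return count, total, average

print(checkout_prices([2.5, 1.5, 0, 9.0]))   # the 9.0 after the sentinel is ignored
print(checkout_prices([0]))                  # sentinel first: no division is attempted
```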

### Validating Input
You can also use a `while` loop when you want to __validate__ input; when you want to make sure the user has entered valid input for a prompt. Let’s say you want a function that asks a yes-or-no question. In this case, you want to make sure that the person using your program enters either a Y for yes or N for no (in either upper or lower case). Here is a program that uses a `while` loop to keep asking until it receives a valid answer. As a preview of coming attractions, it uses the `upper()` method which is described in String Methods to convert a string to upper case. When you run the following code, try typing something other than Y or N to see how the code reacts:

In [8]:
def get_yes_or_no(message):
    valid_input = False
    while not valid_input:
        answer = input(message)
        answer = answer.upper() # convert to upper case
        if answer == 'Y' or answer == 'N':
            valid_input = True
        else:
            print('Please enter Y for yes or N for no.')
    return answer

response = get_yes_or_no('Do you like lima beans? Y)es or N)o: ')
if response == 'Y':
    print('Great! They are very healthy.')
else:
    print('Too bad. If cooked right, they are quite tasty.')


Do you like lima beans? Y)es or N)o: v
Please enter Y for yes or N for no.
Do you like lima beans? Y)es or N)o: 
Please enter Y for yes or N for no.
Do you like lima beans? Y)es or N)o: y
Great! They are very healthy.
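
The same pattern can validate other kinds of input. The sketch below keeps asking until it gets a whole number within a range. Everything here is hypothetical: the function `get_int_in_range`, its `read` parameter (which defaults to `input` and exists only so that scripted replies can replace a person at the keyboard), and the sample replies. It assumes `low` is not negative, since a reply such as `-3` is rejected by the digit check.

```python
def get_int_in_range(message, low, high, read=input):
    """Keep asking until the reply is a whole number between low and high."""
    valid_input = False
    while not valid_input:
        answer = read(message).strip()
        # isdecimal() is True only for non-empty strings made up entirely of decimal digits
        if answer.isdecimal() and low <= int(answer) <= high:
            valid_input = True
        else:
            print('Please enter a whole number from', low, 'to', high)
    return int(answer)

# scripted replies stand in for a person typing at the keyboard
replies = ['ten', '', '42', '7']
def fake_read(message):
    return replies.pop(0)

print(get_int_in_range('Pick a number from 1 to 10: ', 1, 10, read=fake_read))
```

Calling it as `get_int_in_range('Pick a number from 1 to 10: ', 1, 10)` would prompt a real user instead.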


### Randomly Walking Turtles
Suppose we want to entertain ourselves by watching a turtle wander around randomly inside the screen. When we run the program we want the turtle and program to behave in the following way:
1. The turtle begins in the center of the screen.
2. Flip a coin. If it’s heads then turn to the left 90 degrees. If it’s tails then turn to the right 90 degrees.
3. Take 50 steps forward.
4. If the turtle has moved outside the screen then stop, otherwise go back to step 2 and repeat.

Notice that we cannot predict how many times the turtle will need to flip the coin before it wanders out of the screen, so we can’t use a for loop in this case. In fact, although it is very unlikely, this program might never end. That is why we call this indefinite iteration.

In [11]:
import random
import turtle

def isInScreen(w, t):
    if random.random() > 0.1:
        return True
    else:
        return False

t = turtle.Turtle()
wn = turtle.Screen()

t.shape('turtle')
while isInScreen(wn, t):
    coin = random.randrange(0, 2)
    if coin == 0:              # heads
        t.left(90)
    else:                      # tails
        t.right(90)

    t.forward(50)

wn.exitonclick()
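
Note that `isInScreen` above is only a stand-in: it ignores the turtle's position and returns False about one time in ten. To study the loop logic without opening a window, the walk can be simulated with plain coordinates. The sketch below is an assumption-laden illustration, not the turtle program itself: the function `random_walk`, the screen half-width and half-height of 200, and the `max_steps` cap are all invented here.

```python
import random

def random_walk(half_width=200, half_height=200, step=50, max_steps=10000, seed=None):
    """Simulate the walk on paper: return (steps taken, final x, final y)."""
    rng = random.Random(seed)
    x, y = 0, 0          # the turtle begins in the center
    dx, dy = 1, 0        # heading, starting out facing east
    steps = 0
    # keep going while the position is inside the screen (and under the safety cap)
    while abs(x) <= half_width and abs(y) <= half_height and steps < max_steps:
        if rng.randrange(0, 2) == 0:   # heads: turn left 90 degrees
            dx, dy = -dy, dx
        else:                          # tails: turn right 90 degrees
            dx, dy = dy, -dx
        x = x + dx * step              # take 50 steps forward
        y = y + dy * step
        steps = steps + 1
    return steps, x, y

steps, x, y = random_walk(seed=1)
print(steps, x, y)
```

The `max_steps` cap is there because, as noted above, nothing guarantees that an indefinite loop like this one ever ends.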

### Break and Continue
Python provides ways for us to control the flow of iteration with two keywords: `break` and `continue`.
`break` allows the program to immediately ‘break out’ of the loop, regardless of the loop’s conditional structure.

In [12]:
while True:
    print("this phrase will always print")
    break
    print("Does this phrase print?")

print("We are done with the while loop.")

this phrase will always print
We are done with the while loop.


Using `continue` allows the program to immediately “continue” with the next iteration. The program skips the rest of the current iteration, rechecks the condition, and does another iteration only if the condition set for the while loop is still true.

In [13]:
x = 0
while x < 10:
    print("we are incrementing x")
    if x % 2 == 0:
        x += 3
        continue
    if x % 3 == 0:
        x += 5
    x += 1
print("Done with our loop! X has the value: " + str(x))

we are incrementing x
we are incrementing x
we are incrementing x
Done with our loop! X has the value: 15
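
The two keywords are often used together. As a rough sketch, the loop below adds up a list of made-up `readings`, using `continue` to skip negative values and `break` to stop at a sentinel value of 0.

```python
readings = [4, -1, 7, -3, 2, 0, 9]

total = 0
idx = 0
while idx < len(readings):
    value = readings[idx]
    idx += 1               # advance first, so that continue cannot cause an infinite loop
    if value == 0:
        break              # sentinel: leave the loop immediately
    if value < 0:
        continue           # skip the rest of this iteration
    total += value
print(total)
```

Incrementing `idx` before any `continue` matters: if the increment came last, a negative value would be re-read forever.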


## Course 2 Assessment 6
1. Write a function, `sublist`, that takes in a list of numbers as the parameter. In the function, use a while loop to return a sublist of the input list. The sublist should contain the same values of the original list up until it reaches the number 5 (it should not contain the number 5).

In [25]:
def sublist(x):
    y = []
    sub = 0
    # check the index first so that a list without a 5 does not cause an IndexError
    while sub < len(x) and x[sub] != 5:
        y.append(x[sub])
        sub = sub + 1
    return y

x = [3,4,5,4]
print(sublist(x))

[3, 4]


2. Write a function called `check_nums` that takes a list as its parameter, and contains a while loop that only stops once the element of the list is the number 7. What is returned is a list of all of the numbers up until it reaches 7.


In [26]:
def check_nums(x):
    y = []
    sub = 0
    while sub < len(x) and x[sub] !=7:
        y.append(x[sub])
        sub = sub + 1
    return y

x = [0,7]
print(check_nums(x))

[0]


3. Write a function, `sublist`, that takes in a list of strings as the parameter. In the function, use a while loop to return a sublist of the input list. The sublist should contain the same values of the original list up until it reaches the string “STOP” (it should not contain the string “STOP”).

In [27]:
#stp = "it should not contain the stop string"
#stg = stp.split().upper()

def sublist(stg):
    lt = []
    cha = 0
    while cha < len(stg) and stg[cha] != 'STOP':
        lt.append(stg[cha])
        cha += 1
    return lt

stg = ['it', 'should', 'not', 'contain', 'the', 'STOP', 'string']
print(sublist(stg))
    

['it', 'should', 'not', 'contain', 'the']


4. Write a function called `stop_at_z` that iterates through a list of strings. Using a while loop, append each string to a new list until the string that appears is “z”. The function should return the new list.

In [32]:
def stop_at_z(stg):
    lt = []
    cha = 0
    while cha < len(stg) and stg[cha] != 'z':
        lt.append(stg[cha])
        cha += 1
    return lt


stg = "A thesis can be dozens of pages in mathematics"
print(stop_at_z(stg))

['A', ' ', 't', 'h', 'e', 's', 'i', 's', ' ', 'c', 'a', 'n', ' ', 'b', 'e', ' ', 'd', 'o']


5. Below is a for loop that works. Underneath the for loop, rewrite the problem so that it does the same thing, but using a while loop instead of a for loop. Assign the accumulated total in the while loop code to the variable `sum2`. Once complete, `sum2` should equal sum1.

In [36]:
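lst = [65, 78, 21, 33]

# a for loop standing in for the one the exercise provides
sum1 = 0
for x in lst:
    sum1 = sum1 + x

# the same accumulation with a while loop
sum2 = 0
idx = 0
while idx < len(lst):
    sum2 = sum2 + lst[idx]
    idx = idx + 1

print(sum1)
print(sum2)

197
197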


5. __Challenge__: Write a function called `beginning` that takes a list as input and contains a while loop that only stops once the element of the list is the string ‘bye’. What is returned is a list that contains up to the first 10 strings, regardless of where the loop stops. (i.e., if it stops on the 32nd element, the first 10 are returned. If “bye” is the 5th element, the first 4 are returned.) If you want to make this even more of a challenge, do this without slicing


In [41]:
lst = "list as input and contains  a while loop that only stops once the element of the list is the string bye"
stg = lst.split()


def beginning(stg):
    lt = []
    cha = 0
    while cha < len(stg) and cha < 10 and stg[cha] != 'bye':
        lt.append(stg[cha])
        cha += 1
    return lt

print(beginning(stg))    

['list', 'as', 'input', 'and', 'contains', 'a', 'while', 'loop', 'that', 'only']


## Advanced 
### Introduction: Optional Parameters
When defining a function, you can specify a default value for a parameter. That parameter then becomes an optional parameter when the function is called. The way to specify a default value is with an assignment statement inside the parameter list. Consider the following code, for example.

In [42]:
initial = 7
def f(x, y =3, z=initial):
    print("x, y, z, are: " + str(x) + ", " + str(y) + ", " + str(z))

f(2)
f(2, 5)
f(2, 5, 8)


x, y, z, are: 2, 3, 7
x, y, z, are: 2, 5, 7
x, y, z, are: 2, 5, 8
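
One detail worth checking with this example: the default for `z` is computed when the `def` statement runs, not each time `f` is called. A minimal sketch (this version of `f` returns a tuple instead of printing, purely to make the result easy to inspect):

```python
initial = 7
def f(x, y=3, z=initial):
    return (x, y, z)

initial = 10             # rebinding the name later does not change the stored default
print(f(2))
print(f(2, z=initial))   # to use the new value, pass it explicitly
```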


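There are two tricky things about default values. The first is that the default value is determined when the function is defined, not when it is invoked. In the example above, setting `initial = 10` just before calling `f(2)` would not change the default for `z`; it would still be 7. The second tricky thing is that if the default value is set to a mutable object, such as a list or a dictionary, that object will be shared in all invocations of the function. This can get very confusing, __so I suggest that you never set a default value that is a mutable object__. For example, follow the execution of this one carefully.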

In [44]:
def f(a, L=[]):
    L.append(a)
    return L

print(f(1))
print(f(2))
print(f(3))
print(f(4, ["Hello"]))
print(f(5, ["Hello"]))


[1]
[1, 2]
[1, 2, 3]
['Hello', 4]
['Hello', 5]
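
A common way to avoid the shared list is to use `None` as the default and create the list inside the function. The sketch below rewrites `f` along those lines; it is one conventional idiom, not the only option.

```python
def f(a, L=None):
    # None is immutable, so it is a safe default; build a fresh list on each call
    if L is None:
        L = []
    L.append(a)
    return L

print(f(1))
print(f(2))              # no leftover 1 from the previous call
print(f(4, ["Hello"]))
```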


3. Write a function called `str_mult` that takes in a required string parameter and an optional integer parameter. The default value for the integer parameter should be 3. The function should return the string multiplied by the integer parameter.

In [6]:
def str_mult(a,b=3):
    l = a * b
    return l
print(str_mult('haha'))

hahahahahaha


## Keyword Parameters
In the previous section on optional parameters, you learned how to define default values for formal parameters, which made it optional to provide values for those parameters when invoking the functions.

Here, you’ll see one more way to invoke functions with optional parameters, **with keyword-based parameter passing. This is particularly convenient when there are several optional parameters and you want to provide a value for one of the later parameters while not providing a value for the earlier ones**.

The official Python tutorial covers this topic in its section on keyword arguments. If you read it, don’t worry about the `def cheeseshop(kind, *arguments, **keywords):` example. You should be able to get by without understanding `*parameters` and `**parameters` in this course. But do make sure you understand the material that comes before it.

The basic idea of passing arguments by keyword is very simple. When invoking a function, inside the parentheses there are always 0 or more values, separated by commas. With keyword arguments, some of the values can be of the form `paramname = <expr>` instead of just `<expr>`. Note that when you have `paramname = <expr>` in a function definition, it is defining the default value for a parameter when no value is provided in the invocation; when you have `paramname = <expr>` in the invocation, it is supplying a value, overriding the default for that paramname.


In [16]:
def parrot(voltage, state='a stiff', action='voom', type='Norwegian Blue'):
    print("-- This parrot wouldn't " + action)
    print("If you put " + str(voltage) + " volts through it.")
    print("-- Lovely plumage, the " +  type)
    print("-- It's " + state + "!")

#parrot(1000)                                          # 1 positional argument
#parrot(voltage=1000)                                  # 1 keyword argument
parrot(voltage=1000000, action='VOOOOOM')             # 2 keyword arguments
#parrot(action='VOOOOOM', voltage=1000000)             # 2 keyword arguments
#parrot('a million', 'bereft of life', 'jump')         # 3 positional arguments
#parrot('a thousand', state='pushing up the daisies')  # 1 positional, 1 keyword

-- This parrot wouldn't VOOOOOM
If you put 1000000 volts through it.
-- Lovely plumage, the Norwegian Blue
-- It's a stiff!
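
It can also help to see which calls are rejected. In the sketch below, `parrot` is cut down to return a string (an assumption made only to keep the example short), and each invalid call is wrapped in `try`/`except` so that the script keeps running. The list `errors` is just for illustration.

```python
def parrot(voltage, state='a stiff', action='voom', type='Norwegian Blue'):
    return "{} volts, {}, {}, {}".format(voltage, state, action, type)

errors = []
try:
    parrot()                                   # the required argument voltage is missing
except TypeError as e:
    errors.append(str(e))
try:
    parrot(110, voltage=220)                   # two values for the same parameter
except TypeError as e:
    errors.append(str(e))
try:
    parrot(voltage=110, actor='John Cleese')   # actor is not a parameter name
except TypeError as e:
    errors.append(str(e))

print(len(errors))
print(parrot(action='VOOOOOM', voltage=1000000))   # keyword order does not matter
```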


## Keyword Parameters with .format

Earlier you learned how to use the `format` method for strings, which allows you to structure strings like fill-in-the-blank sentences. Now that you’ve learned about optional and keyword parameters, we can introduce a new way to use the `format` method.

This other option is to specifically refer to keywords for __interpolation values__, like below.

In [17]:
names_scores = [("Jack",[67,89,91]),("Emily",[72,95,42]),("Taylor",[83,92,86])]
for name, scores in names_scores:
    print("The scores {nm} got were: {s1},{s2},{s3}.".format(nm=name,s1=scores[0],s2=scores[1],s3=scores[2]))

The scores Jack got were: 67,89,91.
The scores Emily got were: 72,95,42.
The scores Taylor got were: 83,92,86.


Sometimes, you may want to use the `.format` method to insert the same value into a string multiple times. You can do this by simply passing the same value into the format method several times, assuming you have included `{}`s in the string everywhere you want to interpolate it. But you can also use positional references to do this! The order in which you pass arguments into the format method matters: the first one is argument `0`, the second is argument `1`, and so on.

In [19]:
# this works
names = ["Jack","Jill","Mary"]
for n in names:
    print("'{}!' she yelled. '{}! {}, {}!'".format(n,n,n,"say hello"))
print("................................")
# but this also works!
names = ["Jack","Jill","Mary"]
for n in names:
    print("'{0}!' she yelled. '{0}! {0}, {1}!'".format(n,"say hello"))

'Jack!' she yelled. 'Jack! Jack, say hello!'
'Jill!' she yelled. 'Jill! Jill, say hello!'
'Mary!' she yelled. 'Mary! Mary, say hello!'
................................
'Jack!' she yelled. 'Jack! Jack, say hello!'
'Jill!' she yelled. 'Jill! Jill, say hello!'
'Mary!' she yelled. 'Mary! Mary, say hello!'
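
Positional references and keyword names can be combined in a single call to `format`, as long as the positional arguments come first. A small sketch with made-up wording (the names `best`, `worst`, and `lines` are illustrative):

```python
names_scores = [("Jack", [67, 89, 91]), ("Emily", [72, 95, 42])]
lines = []
for name, scores in names_scores:
    # {0} is the first positional argument; {best} and {worst} are keyword arguments
    line = "{0} scored between {worst} and {best}. Well done, {0}!".format(
        name, best=max(scores), worst=min(scores))
    lines.append(line)
    print(line)
```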


### Check your understanding



In [20]:
# What value will be printed for z?

initial = 7
def f(x, y = 3, z = initial):
    print("x, y, z are:", x, y, z)

f(2, 5)

x, y, z are: 2 5 7


In [21]:
#What value will be printed for y?

initial = 7
def f(x, y = 3, z = initial):
    print("x, y, z are:", x, y, z)

f(2, z = 10)

x, y, z are: 2 3 10


In [23]:
#What value will be printed below?

names = ["Alexey", "Catalina", "Misuki", "Pablo"]
print("'{first}!' she yelled. 'Come here, {first}! {f_one}, {f_two}, and {f_three} are here!'"
      .format(first = names[1], f_one = names[0], f_two = names[2], f_three = names[3]))

'Catalina!' she yelled. 'Come here, Catalina! Alexey, Misuki, and Pablo are here!'


5. Define a function called `multiply`. It should have one required parameter, a string. It should also have one optional parameter, an integer, named `mult_int`, with a default value of 10. The function should return the string multiplied by the integer. (i.e.: Given inputs “Hello”, mult_int=3, the function should return “HelloHelloHello”)

In [25]:
def multiply(a, mult_int = 10):
    resp = a * mult_int
    return resp
print(multiply('Hello '))


Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello 


## Anonymous Function with Lambda Expressions
To further drive home the idea that we are passing a function object as a parameter to the `sorted` function, let’s see an alternative notation for creating a function, a __lambda expression__. The syntax of a lambda expression is the word “lambda” followed by parameter names, separated by commas but not inside parentheses, followed by a colon and then an expression. `lambda arguments: expression` yields a function object. This unnamed object behaves just like the function object constructed below.

    def fname(arguments):
        return expression
Then:

    def func(args):
        return ret_value

is equivalent to 

    lambda args: ret_value


In [36]:
def f(x):
    return x - 1

print(f)
print(type(f))
print(f(3))
print("...........................")
print(lambda x: x-2)
print(type(lambda x: x-2))
print((lambda x: x-2)(6))

<function f at 0x0000000005130A60>
<class 'function'>
2
...........................
<function <lambda> at 0x00000000051309D8>
<class 'function'>
4


The lambda expression had to go in parentheses just for the purposes of grouping all its contents together.

In [45]:
(lambda x: x-2) (6)

4

Say we want to create a function that takes a string and returns the last character in that string. What might this look like with the functions you’ve used before?
    
    def last_char(s):
        return s[-1]

In [54]:
s = "LAST"

last_char = lambda s:s[-1]

last_char(s)


'T'

In [55]:
x = 'xyz'
(lambda s:s[-1])(x)


'z'
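
Two more things a lambda expression can do, sketched below: take several parameters, and be handed directly to another function without ever being given a name. The helper `apply_to_all` is hypothetical, written here only to have something to pass a function to.

```python
def apply_to_all(func, items):
    """Hypothetical helper: call func on every item and collect the results."""
    results = []
    for item in items:
        results.append(func(item))
    return results

add = lambda x, y: x + y      # two parameters, separated by a comma, no parentheses
print(add(3, 4))

# the lambda is created and passed in one step
print(apply_to_all(lambda s: s[-1], ["LAST", "xyz"]))
```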

## Course 2 Assessment 7
1. Create a function called `mult` that has two parameters, the first is required and should be an integer, the second is an optional parameter that can either be a number or a string but whose default is 6. The function should return the first parameter multiplied by the second.

In [63]:
def mult(a, b = 6):
    m = a * b
    return m
print(mult("XYZ "))

XYZ XYZ XYZ XYZ XYZ XYZ 


2. The following function, `greeting`, does not work. Please fix the code so that it runs without error. This only requires one change in the definition of the function.

In [64]:
def greeting(name, greeting="Hello ", excl="!"):
    return greeting + name + excl

print(greeting("Bob"))
print(greeting(""))
print(greeting("Bob", excl="!!!"))

Hello Bob!
Hello !
Hello Bob!!!


4. Write a function, `test`, that takes in three parameters: a required integer, an optional boolean whose default value is `True`, and an optional dictionary, called `dict1`, whose default value is `{2:3, 4:5, 6:8}`. If the boolean parameter is True, the function should test to see if the integer is a key in the dictionary. The value of that key should then be returned. If the boolean parameter is False, return the boolean value “False”.

In [65]:
def test(x, abool = True, dict1 = {2:3, 4:5, 6:8}):
    return abool and dict1.get(x, False)
test(2)

3

5. Write a function called `checkingIfIn` that takes three parameters. The first is a required parameter, which should be a string. The second is an optional parameter called `direction` with a default value of `True`. The third is an optional parameter called `d` that has a default value of `{'apple': 2, 'pear': 1, 'fruit': 19, 'orange': 5, 'banana': 3, 'grapes': 2, 'watermelon': 7}`. Write the function `checkingIfIn` so that when the second parameter is `True`, it checks to see if the first parameter is a key in the third parameter; if it is, return True, otherwise return False. But if the second parameter is False, then the function should check to see if the first parameter is not a key of the third. If it’s not, the function should return True in this case, and if it is, it should return False.

In [66]:
def checkingIfIn(a, direction = True, 
                 d = {'apple': 2, 'pear': 1, 'fruit': 19, 'orange': 5, 'banana': 3, 'grapes': 2, 'watermelon': 7}):
    if direction == True:
        if a in d:
            return True 
        else:
            return False
    else:
        if a not in d:
            return True
        else:
            return False
checkingIfIn('pear')

True

6. We have provided the function `checkingIfIn` such that if the first input parameter is a key in the third input parameter, a dictionary, then the function returns the value for that key, and otherwise it returns False. Follow the instructions in the active code window for specific variable assignments.


In [67]:
def checkingIfIn(a, direction = True, 
                 d = {'apple': 2, 'pear': 1, 'fruit': 19, 'orange': 5, 'banana': 3, 'grapes': 2, 'watermelon': 7}):
    if direction == True:
        if a in d:
            return d[a]
        else:
            return False
    else:
        if a not in d:
            return True
        else:
            return d[a]

# Call the function so that it returns False and assign that function call to the variable c_false
c_false = checkingIfIn('peas')
print(c_false)
# Call the function so that it returns True and assign it to the variable c_true
c_true = checkingIfIn('apples', False, {'carrots': 1, 'peas': 9, 'potatos': 8, 'corn': 32, 'beans': 1})
print(c_true)
# Call the function so that the value of fruit is assigned to the variable fruit_ans
fruit_ans= checkingIfIn('fruit')
print(fruit_ans)
# Call the function using the first and third parameter so that the value 8 is assigned to the variable param_check
param_check = checkingIfIn('potatos', d={'carrots': 1, 'peas': 9, 'potatos': 8, 'corn': 32, 'beans': 1})
print(param_check)



False
True
19
8


### Sorting with Sort and Sorted

In [62]:
L1 = [1, 7, 4, -2, 3]
L2 = ["Cherry", "Apple", "Blueberry"]

L1.sort()
print(L1)
L2.sort()
print(L2)

[-2, 1, 3, 4, 7]
['Apple', 'Blueberry', 'Cherry']


Note that the `sort` method does not return a sorted version of the list. In fact, it returns the value None. But the list itself has been modified. This kind of operation that works by having a _side effect_ on the list can be quite confusing.

Here, we will generally use an alternative way of sorting, the **function `sorted`** rather than the **method `sort`**. Because it is a function rather than a method, it is invoked on a list by passing the list as a parameter inside the parentheses, rather than putting the list before the period. _More importantly, sorted does not change the original list. Instead, it returns a new list_.



In [2]:
L2 = ["Cherry", "Apple", "Blueberry"]

L3 = sorted(L2)
print(L3)
print(sorted(L2))
print(L2) # unchanged

print("----")

L2.sort()
print(L2)
print(L2.sort())  #return value is None

['Apple', 'Blueberry', 'Cherry']
['Apple', 'Blueberry', 'Cherry']
['Cherry', 'Apple', 'Blueberry']
----
['Apple', 'Blueberry', 'Cherry']
None
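
A mistake that follows directly from this difference is assigning the result of `sort` back to a variable. The sketch below shows what happens; the names `result`, `L4`, and `ordered` are only for illustration.

```python
L1 = [1, 7, 4, -2, 3]

result = L1.sort()        # sorts L1 in place; result is None, not a list
print(result)
print(L1)

L4 = [1, 7, 4, -2, 3]
ordered = sorted(L4)      # a new list; L4 keeps its original order
print(ordered)
print(L4)
```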


### Optional reverse parameter
The sorted function takes some optional parameters. 
- The first optional parameter is a key function, which will be described later. 
- The second optional parameter is a Boolean value which determines whether to sort the items in _reverse order_. By default, it is False, but if you set it to True, the list will be sorted in reverse order.

In [3]:
L2 = ["Cherry", "Apple", "Blueberry"]
print(sorted(L2, reverse=True))

['Cherry', 'Blueberry', 'Apple']


1. Sort the list, `lst` from largest to smallest. Save this new list to the variable `lst_sorted`.

In [4]:
lst = [3, 5, 1, 6, 7, 2, 9, -2, 5]
lst_sorted = sorted(lst, reverse=True)
print(lst_sorted)

[9, 7, 6, 5, 5, 3, 2, 1, -2]


### Optional key parameter
If you want to sort things in some order **other than the “natural” or its reverse**, you can provide an additional parameter, the key parameter. For example, suppose you want to sort a list of numbers based on their absolute value, so that -4 comes after 3? Or suppose you have a dictionary with strings as the keys and numbers as the values. Instead of sorting them in alphabetic order based on the keys, you might like to sort them in order based on their values.

In [68]:
L1 = [1, 7, 4, -2, 3]

def absolute(x):
    if x >= 0:
        return x
    else:
        return -x

print(absolute(3))
print(absolute(-119))

for y in L1:
    print(absolute(y))

3
119
1
7
4
2
3


**Now, we can pass the `absolute` function to `sorted` in order to specify that we want the items sorted in order of their absolute value, rather than in order of their actual value.**

In [69]:
L1 = [1, 7, 4, -2, 3]

def absolute(x):
    if x >= 0:
        return x
    else:
        return -x

L2 = sorted(L1, key=absolute)
print(L2)

#or in reverse order
print(sorted(L1, reverse=True, key=absolute))


[1, -2, 3, 4, 7]
[7, 4, 3, -2, 1]
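
Python's built-in `abs` does the same job as the hand-written `absolute`, so it can be passed as the key directly. The sketch below also adds a `-1` to the list to show what happens when two items have the same key: Python's sort is stable, so items with equal keys stay in their original relative order.

```python
L1 = [1, 7, 4, -2, 3, -1]

# abs is passed without parentheses: we hand over the function, we do not call it
print(sorted(L1, key=abs))

# the key values are only used for comparing; the result holds the original items
by_size = sorted(L1, key=abs, reverse=True)
print(by_size)
```

In both results, `1` appears before `-1`, because that is the order they have in `L1`.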


#### Check Your Understanding
1. You will be sorting the following list by each element’s second letter, a to z. Create a function to use when sorting, called `second_let`. It will take a string as input and return the second letter of that string. Then sort the list, create a variable called `sorted_by_second_let` and assign the sorted list to it. Do not use lambda.


In [132]:
ex_lst = ['hi', 'how are you', 'bye', 'apple', 'zebra', 'dance']
def second_let(wrd):
    return wrd[1]
sorted_by_second_let = sorted(ex_lst, key=second_let)
print(sorted_by_second_let)

['dance', 'zebra', 'hi', 'how are you', 'apple', 'bye']


In [115]:
#With lambda function
ex_lst = ['hi', 'how are you', 'bye', 'apple', 'zebra', 'dance']

sorted(ex_lst, key=lambda x: x[1])
# ['dance', 'zebra', 'hi', 'how are you', 'apple', 'bye']

['dance', 'zebra', 'hi', 'how are you', 'apple', 'bye']

2. Below, we have provided a list of strings called `nums`. Write a function called `last_char` that takes a string as input, and returns only its last character. Use this function to sort the list `nums` by the last digit of each number, from highest to lowest, and save this as a new list called `nums_sorted`.



In [117]:
nums = ['1450', '33', '871', '19', '14378', '32', '1005', '44', '8907', '16']

def last_char(num):
    return num[-1]

nums_sorted = sorted(nums, reverse=True, key=last_char)
print(nums_sorted)

['19', '14378', '8907', '16', '1005', '44', '33', '32', '871', '1450']


3. Once again, sort the list `nums` based on the last digit of each number from highest to lowest. However, now you should do so by writing a lambda function. Save the new list as `nums_sorted_lambda`.

In [119]:
nums = ['1450', '33', '871', '19', '14378', '32', '1005', '44', '8907', '16']

nums_sorted_lambda = sorted(nums, key=lambda num: num[-1], reverse=True)

print(nums_sorted_lambda)

#sorted(ex_lst, key=lambda x: x[1])

['19', '14378', '8907', '16', '1005', '44', '33', '32', '871', '1450']


### Sorting a Dictionary

Previously, you have used a dictionary to accumulate counts, such as the frequencies of letters or words in a text. For example, the following code counts the frequencies of different letters in the list.

In [121]:
L = ['E', 'F', 'B', 'A', 'D', 'I', 'I', 'C', 'B', 'A', 'D', 'D', 'E', 'D']

d = {}
for x in L:
    if x in d:
        d[x] = d[x] + 1
    else:
        d[x] = 1
for x in d.keys():
    print("{} appears {} times".format(x, d[x]))
print(d)

E appears 2 times
F appears 1 times
B appears 2 times
A appears 2 times
D appears 4 times
I appears 2 times
C appears 1 times
{'E': 2, 'F': 1, 'B': 2, 'A': 2, 'D': 4, 'I': 2, 'C': 1}


We can force the results to be displayed in some fixed ordering, by sorting the keys.

In [191]:
L = ['E', 'F', 'B', 'A', 'D', 'I', 'I', 'C', 'B', 'A', 'D', 'D', 'E', 'D']

d = {}

for x in L:
    if x in d:
        d[x] = d[x] +1
    else:
        d[x] = 1

y = sorted(d.keys())
for k in y:
    print("{} appears {} times.".format(k, d[k]))

A appears 2 times.
B appears 2 times.
C appears 1 times.
D appears 4 times.
E appears 2 times.
F appears 1 times.
I appears 2 times.


__With a dictionary that’s maintaining counts or some other kind of score, we might prefer to get the outputs sorted based on the count rather than based on the items. The standard way to do that in Python is to sort based on a property of the key, in particular its value in the dictionary.__

The key function always takes as input one item from the sequence and returns a property of the item. In our case, the items to be sorted are the dictionary’s keys, so each item is one key from the dictionary. To remind ourselves of that, we’ve named the parameter in the lambda expression k. The property of key k that is supposed to be returned is its associated value in the dictionary. Hence, we have the lambda expression `lambda k: d[k]`.

In [138]:
L = ['E', 'F', 'B', 'A', 'D', 'I', 'I', 'C', 'B', 'A', 'D', 'D', 'E', 'D']

d = {}
for x in L:
    if x in d:
        d[x] = d[x] + 1
    else:
        d[x] = 1

y = sorted(d.keys(), key=lambda k: d[k], reverse=True)
for k in y:
    print("{} appears {} times".format(k, d[k]))

D appears 4 times
E appears 2 times
B appears 2 times
A appears 2 times
I appears 2 times
F appears 1 times
C appears 1 times


Here’s a version of that using a named function.

In [141]:
L = ['E', 'F', 'B', 'A', 'D', 'I', 'I', 'C', 'B', 'A', 'D', 'D', 'E', 'D']

d = {}
for x in L:
    if x in d:
        d[x] = d[x] + 1
    else:
        d[x] = 1
        
def g(k):
    return d[k]

y = sorted(d.keys(), key=g, reverse=True)

for k in y:
    print("{} appears {} times.".format(k, d[k]))

D appears 4 times.
E appears 2 times.
B appears 2 times.
A appears 2 times.
I appears 2 times.
F appears 1 times.
C appears 1 times.


**Note:** When we sort the keys, passing a function with `key=lambda x: d[x]` is not what specifies that the keys of the dictionary are to be sorted. The list of keys is passed as the first argument in the invocation of `sorted`. The key parameter provides a function that says how to sort them.

An experienced programmer would probably not even separate out the sorting step. They might also take advantage of the fact that when you pass a dictionary to something that expects a sequence, such as `sorted`, it’s the same as passing the list of keys.

In [143]:
L = ['E', 'F', 'B', 'A', 'D', 'I', 'I', 'C', 'B', 'A', 'D', 'D', 'E', 'D']

d = {}
for x in L:
    if x in d:
        d[x] = d[x] + 1
    else:
        d[x] = 1

# now loop through the sorted keys
for k in sorted(d, key=lambda k:d[k], reverse=True):
    print("{} appears {} times.".format(k, d[k]))


D appears 4 times.
E appears 2 times.
B appears 2 times.
A appears 2 times.
I appears 2 times.
F appears 1 times.
C appears 1 times.
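
To see why `sorted(d, ...)` works without calling `.keys()`, here is a small sketch using a hypothetical dictionary `counts`. Iterating over a dictionary yields its keys, so both calls should return the same list.

```python
counts = {'E': 2, 'F': 1, 'D': 4}  # hypothetical example data

# Iterating over a dictionary visits its keys, so sorted() sees the
# same items whether or not .keys() is called explicitly.
from_dict = sorted(counts, key=lambda k: counts[k], reverse=True)
from_keys = sorted(counts.keys(), key=lambda k: counts[k], reverse=True)

print(from_dict)
print(from_dict == from_keys)
```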


### Check Your Understanding



1. Which of the following will sort the keys of d in ascending order of their values (i.e., from lowest to highest)?

In [168]:
#The lambda function takes just one parameter, and calls g with two parameters.
#  `sorted(ks, key=lambda x: g(x, d))`
#The lambda function looks up the value of x in d.
#   sorted(ks, key=lambda x: d[x])

L = [4, 5, 1, 0, 3, 8, 8, 2, 1, 0, 3, 3, 4, 3]

d = {}
for x in L:
    if x in d:
        d[x] = d[x] + 1
    else:
        d[x] = 1

#def g(k, d):
#    return d[k]

ks = sorted(d.keys(), key=lambda x:d[x])

for k in ks:
    print("{} appears {} times".format(k, d[k]))


5 appears 1 times
2 appears 1 times
4 appears 2 times
1 appears 2 times
0 appears 2 times
8 appears 2 times
3 appears 4 times


In [173]:
#The lambda function takes just one parameter, and calls g with two parameters.
#  `sorted(ks, key=lambda x: g(x, d))`
#The lambda function looks up the value of x in d.
#   sorted(ks, key=lambda x: d[x])

L = [4, 5, 1, 0, 3, 8, 8, 2, 1, 0, 3, 3, 4, 3]

d = {}
for x in L:
    if x in d:
        d[x] = d[x] + 1
    else:
        d[x] = 1

def g(k, d):
    return d[k]

ks = sorted(d.keys(), key=lambda x:g(x,d))

for k in ks:
    print("{} appears {} times".format(k, d[k]))

5 appears 1 times
2 appears 1 times
4 appears 2 times
1 appears 2 times
0 appears 2 times
8 appears 2 times
3 appears 4 times


2. Sort the following dictionary based on the keys so that they are sorted a to z. Assign the resulting value to the variable `sorted_keys`.

In [177]:
dictionary = {"Flowers": 10, 'Trees': 20, 'Chairs': 6, "Firepit": 1, 'Grill': 2, 'Lights': 14}

sorted_keys = sorted(dictionary.keys())
print(sorted_keys)

['Chairs', 'Firepit', 'Flowers', 'Grill', 'Lights', 'Trees']


3. Below, we have provided the dictionary `groceries`, whose keys are grocery items, and values are the number of each item that you need to buy at the store. Sort the dictionary’s keys into alphabetical order, and save them as a list called `grocery_keys_sorted`.

In [178]:
groceries = {'apples': 5, 'pasta': 3, 'carrots': 12, 'orange juice': 2, 'bananas': 8, 'popcorn': 1, 'salsa': 3, 'cereal': 4, 'coffee': 5, 'granola bars': 15, 'onions': 7, 'rice': 1, 'peanut butter': 2, 'spinach': 9}

grocery_keys_sorted = sorted(groceries.keys())

print(grocery_keys_sorted)

['apples', 'bananas', 'carrots', 'cereal', 'coffee', 'granola bars', 'onions', 'orange juice', 'pasta', 'peanut butter', 'popcorn', 'rice', 'salsa', 'spinach']


4. Sort the following dictionary’s keys based on the value from highest to lowest. Assign the resulting value to the variable `sorted_values`

In [193]:
dictionary = {"Flowers": 10, "Trees": 20, "Chairs": 6, "Firepit": 1, "Grill": 2, "Lights": 14}

sorted_values = sorted(dictionary.keys(), key=lambda k: dictionary[k], reverse=True)

print(sorted_values)



['Trees', 'Lights', 'Flowers', 'Chairs', 'Grill', 'Firepit']


### Breaking Ties: Second Sorting
What happens when two items are “tied” in the sort order? For example, suppose we sort a list of words by their lengths. Which five-letter word will appear first?
The answer is that Python’s sort is stable: tied items stay in the same relative order they were in before the sorting.
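
As a quick illustration of this tie-breaking rule, the sketch below sorts a hypothetical list `words` by length only. The five-letter words come out in the same relative order they had in the input.

```python
words = ['peach', 'kiwi', 'mango', 'fig', 'apple']  # hypothetical example data

# Only the length is used as the key, so 'peach', 'mango', and 'apple'
# are tied. Tied items keep the order they had in the original list.
by_length = sorted(words, key=len)
print(by_length)
```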

First, let’s see how Python sorts tuples. We’ve already seen that there’s a built-in sort order, if we don’t specify any key function. For numbers, it’s lowest to highest. For strings, it’s alphabetic order. For a sequence of tuples, the default sort order is based on the default sort order for the first elements of the tuples, with ties being broken by the second elements, and then third elements if necessary, etc. For example,

In [194]:
tups = [('A', 3, 2),
        ('C', 1, 4),
        ('B', 3, 1),
        ('A', 2, 4),
        ('C', 1, 2)]
for tup in sorted(tups):
    print(tup)

('A', 2, 4)
('A', 3, 2)
('B', 3, 1)
('C', 1, 2)
('C', 1, 4)


In the code below, we are going to sort a list of fruit words first by their length, smallest to largest, and then alphabetically to break ties among words of the same length. To do that, we have __the key function return a tuple whose first element is the length of the fruit’s name, and second element is the fruit name itself__.

In [195]:
fruits = ['peach', 'kiwi', 'apple', 'blueberry', 'papaya', 'mango', 'pear']
new_order = sorted(fruits, key=lambda fruit_name: (len(fruit_name), fruit_name))
for fruit in new_order:
    print(fruit)

kiwi
pear
apple
mango
peach
papaya
blueberry


Here, each word is evaluated first on its length, then by its alphabetical order. Note that we could continue to specify other conditions by including more elements in the tuple.

What would happen, though, if we wanted to sort the words from longest to shortest, and then by alphabetical order?

In [202]:
fruits = ['peach', 'kiwi', 'apple', 'blueberry', 'papaya', 'mango', 'pear']
new_order = sorted(fruits, key=lambda fruit_name: (len(fruit_name), fruit_name), reverse=True)
for fruit in new_order:
    print(fruit)

blueberry
papaya
peach
mango
apple
pear
kiwi


Do you see a problem here? Not only does it sort the words from largest to smallest, but also in reverse alphabetical order! Can you think of any ways you can solve this issue?

__One solution is to add a negative sign in front of len(fruit_name), which will convert all positive numbers to negative, and all negative numbers to positive. As a result, the longest elements would be first and the shortest elements would be last__.

We can use this for any numerical value that we want to sort. However, it will not work for strings, because a string cannot be negated.

In [203]:
fruits = ['peach', 'kiwi', 'apple', 'blueberry', 'papaya', 'mango', 'pear']
new_order = sorted(fruits, key=lambda fruit_name: (-len(fruit_name), fruit_name))
for fruit in new_order:
    print(fruit)

blueberry
papaya
apple
mango
peach
kiwi
pear
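
Because a string cannot be negated, combining an ascending length with a reverse alphabetical tie-break needs a different approach. One option, sketched here as an assumption rather than as the course’s own solution, is to sort twice and rely on the fact that tied items keep their existing order: first sort by the secondary key, then by the primary key.

```python
fruits = ['peach', 'kiwi', 'apple', 'blueberry', 'papaya', 'mango', 'pear']

# Pass 1: secondary key, names from z to a.
by_name_desc = sorted(fruits, reverse=True)

# Pass 2: primary key, shortest first. Words of the same length stay
# in the z-to-a order produced by pass 1.
new_order = sorted(by_name_desc, key=len)

for fruit in new_order:
    print(fruit)
```

The order of the two passes matters: the last sort applied decides the primary ordering, and earlier sorts only show up where the last one has ties.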


#### Check Your Understanding

In [207]:
# 1: What will the sorted function sort by?

weather = {'Reykjavik': {'temp':60, 'condition': 'rainy'},
           'Buenos Aires': {'temp': 55, 'condition': 'cloudy'},
           'Cairo': {'temp': 96, 'condition': 'sunny'},
           'Berlin': {'temp': 89, 'condition': 'sunny'},
           'Caloocan': {'temp': 78, 'condition': 'sunny'}}

sorted_weather = sorted(weather, key=lambda w: (w, weather[w]['temp']))

print(sorted_weather)

# first city name (alphabetically), then temperature (lowest to highest)

['Berlin', 'Buenos Aires', 'Cairo', 'Caloocan', 'Reykjavik']


In [208]:
#2: How will the following data be sorted?

weather = {'Reykjavik': {'temp':60, 'condition': 'rainy'},
           'Buenos Aires': {'temp': 55, 'condition': 'cloudy'},
           'Cairo': {'temp': 96, 'condition': 'sunny'},
           'Berlin': {'temp': 89, 'condition': 'sunny'},
           'Caloocan': {'temp': 78, 'condition': 'sunny'}}

sorted_weather = sorted(weather, key=lambda w: (w, -weather[w]['temp']), reverse=True)
print(sorted_weather)

#first city name (reverse alphabetically), then temperature (lowest to highest)

['Reykjavik', 'Caloocan', 'Cairo', 'Buenos Aires', 'Berlin']


### When to use a Lambda Expression
_Though you can often use a lambda expression or a named function interchangeably when sorting, it’s generally best to use lambda expressions until the process is too complicated, and then a function should be used._

For example, in the following examples, we’ll be sorting a dictionary’s keys by properties of its values. Each key is a state name and each value is a list of city names.

For our first sort order, we want to sort the keys of `states` in order by the length of the first city name. Here, it’s pretty easy to compute that property. `states[state]` is the list of cities associated with a particular state. So if `states[state]` is a list of city strings, `len(states[state][0])` is the length of the first city name. Thus, we can use a `lambda` expression:

In [210]:
states = {"Minnesota": ["St. Paul", "Minneapolis", "Saint Cloud", "Stillwater"],
          "Michigan": ["Ann Arbor", "Traverse City", "Lansing", "Kalamazoo"],
          "Washington": ["Seattle", "Tacoma", "Olympia", "Vancouver"]}

print(sorted(states, key=lambda state: len(states[state][0])))

['Washington', 'Minnesota', 'Michigan']


That’s already pushing the limits of how complex a lambda expression can be before it’s really hard to read (or debug).

For our second sort order, the property we want to sort by is the number of cities that begin with the letter ‘S’. The function defining this property is harder to express, requiring a filter and count accumulation pattern. So we are better off defining a separate, named function. Here, we’ve chosen to make a lambda expression that looks up the value associated with the particular state and pass that value to the named function s_cities_count. We could have passed just the key, but then the function would have to look up the value, and it would be a little confusing, from the code, to figure out what dictionary the key is supposed to be looked up in. Here, we’ve done the lookup right in the lambda expression, which makes it a little bit clearer that we’re just sorting the keys of the states dictionary based on a property of their values. It also makes it easier to reuse the counting function on other city lists, even if they aren’t embedded in that particular states dictionary.

In [211]:
def s_cities_count(city_list):
    ct = 0
    for city in city_list:
        if city[0] == "S":
            ct += 1
    return ct

states = {"Minnesota": ["St. Paul", "Minneapolis", "Saint Cloud", "Stillwater"],
          "Michigan": ["Ann Arbor", "Traverse City", "Lansing", "Kalamazoo"],
          "Washington": ["Seattle", "Tacoma", "Olympia", "Vancouver"]}

print(sorted(states, key=lambda state: s_cities_count(states[state])))


['Michigan', 'Washington', 'Minnesota']
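
For comparison, here is a sketch of the alternative mentioned above, in which the key function receives just the state name and does the dictionary lookup itself. The names `s_cities_count_for_state` and `by_s_cities` are hypothetical. It should give the same order, but the function now depends on the global `states` dictionary and cannot be reused on a standalone city list.

```python
states = {"Minnesota": ["St. Paul", "Minneapolis", "Saint Cloud", "Stillwater"],
          "Michigan": ["Ann Arbor", "Traverse City", "Lansing", "Kalamazoo"],
          "Washington": ["Seattle", "Tacoma", "Olympia", "Vancouver"]}

def s_cities_count_for_state(state):
    # Hypothetical variant: looks up the city list in the global
    # states dictionary instead of receiving it as a parameter.
    ct = 0
    for city in states[state]:
        if city[0] == "S":
            ct += 1
    return ct

# No lambda is needed, but the reader has to open the function body
# to find out which dictionary the key is looked up in.
by_s_cities = sorted(states, key=s_cities_count_for_state)
print(by_s_cities)
```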


## Course 2 Assessment 8
1. Sort the following string alphabetically, from z to a, and assign it to the variable `sorted_letters`

In [212]:
letters = "alwnfiwaksuezlaeiajsdl"
sorted_letters = sorted(letters, reverse=True)
print(sorted_letters)

['z', 'w', 'w', 'u', 's', 's', 'n', 'l', 'l', 'l', 'k', 'j', 'i', 'i', 'f', 'e', 'e', 'd', 'a', 'a', 'a', 'a']


2. Sort the list below, animals, into alphabetical order, a-z. Save the new list as `animals_sorted`.



In [213]:
animals = ['elephant', 'cat', 'moose', 'antelope', 'elk', 'rabbit', 'zebra', 'yak', 'salamander', 'deer', 'otter', 'minx', 'giraffe', 'goat', 'cow', 'tiger', 'bear']

animals_sorted = sorted(animals)
print(animals_sorted)

['antelope', 'bear', 'cat', 'cow', 'deer', 'elephant', 'elk', 'giraffe', 'goat', 'minx', 'moose', 'otter', 'rabbit', 'salamander', 'tiger', 'yak', 'zebra']


3. The dictionary, `medals`, shows the medal count for six countries during the Rio Olympics. Sort the country names so they appear alphabetically. Save this list to the variable `alphabetical`

In [214]:
medals = {'Japan':41, 'Russia':56, 'South Korea':21, 'United States':121, 'Germany':42, 'China':70}

alphabetical = sorted(medals.keys())
print(alphabetical)


['China', 'Germany', 'Japan', 'Russia', 'South Korea', 'United States']


**4. Given the same dictionary, `medals`, now sort by the medal count. Save the three countries with the highest medal count to the list, `top_three`**

In [229]:
medals = {'Japan':41, 'Russia':56, 'South Korea':21, 'United States':121, 'Germany':42, 'China':70}

top_three = sorted(medals, key= lambda x:medals[x], reverse=True)[:3]
print(top_three)

['United States', 'China', 'Russia']


5. We have provided the dictionary `groceries`. You should return a list of its keys, but they should be sorted by their values, from highest to lowest. Save the new list as `most_needed`

In [232]:
groceries = {'apples': 5, 'pasta': 3, 'carrots': 12, 'orange juice': 2, 
             'bananas': 8, 'popcorn': 1, 'salsa': 3, 'cereal': 4, 'coffee': 5, 
             'granola bars': 15, 'onions': 7, 'rice': 1, 'peanut butter': 2, 'spinach': 9}

most_needed  = sorted(groceries, key=lambda k:groceries[k], reverse=True)
print(most_needed)

['granola bars', 'carrots', 'spinach', 'bananas', 'onions', 'apples', 'coffee', 'cereal', 'pasta', 'salsa', 'orange juice', 'peanut butter', 'popcorn', 'rice']


6. Create a function called `last_four` that takes in an ID number and returns the last four digits. For example, the number 17573005 should return 3005. Then, use this function to sort the list of ids stored in the variable, ids, from lowest to highest. Save this sorted list in the variable, `sorted_ids`. Hint: _Remember that integers cannot be indexed but strings can, so conversions may be needed_.

In [250]:

def last_four(x):
    return str(x)[-4:]

ids = [17573005, 17572342, 17579000, 17570002, 17572345, 17579329]

sorted_ids = sorted(ids, key=last_four)
print(sorted_ids)

[17570002, 17572342, 17572345, 17573005, 17579000, 17579329]
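
Note that `last_four` above returns a string such as `'3005'`. Comparing the strings works here because every result has exactly four characters. As a sketch of a variant (the name `last_four_int` is hypothetical), the slice can be converted back to an integer so that the comparison is numeric:

```python
def last_four_int(x):
    # Convert to a string to slice off the last four digits,
    # then back to an integer so the result compares numerically.
    return int(str(x)[-4:])

ids = [17573005, 17572342, 17579000, 17570002, 17572345, 17579329]

sorted_ids = sorted(ids, key=last_four_int)
print(last_four_int(17573005))
print(sorted_ids)
```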


7. Sort the list ids by the last four digits of each id. Do this using `lambda` and not using a defined function. Save this sorted list in the variable `sorted_id`.

In [253]:
ids = [17573005, 17572342, 17579000, 17570002, 17572345, 17579329]

sorted_id = sorted(ids, key=lambda x:str(x)[-4:])

print(sorted_id)

[17570002, 17572342, 17572345, 17573005, 17579000, 17579329]


8. Sort the following list by each element’s second letter a to z. Do so by using lambda. Assign the resulting value to the variable `lambda_sort`

In [254]:
#With lambda function
ex_lst = ['hi', 'how are you', 'bye', 'apple', 'zebra', 'dance']

lambda_sort = sorted(ex_lst, key=lambda x: x[1])

print(lambda_sort)

['dance', 'zebra', 'hi', 'how are you', 'apple', 'bye']


# Project - Part 1: Sentiment Classifier

We have provided some synthetic (fake, semi-randomly generated) Twitter data in a csv file named `project_twitter_data.csv`, which has the text of a tweet, the number of retweets of that tweet, and the number of replies to that tweet. We have also provided words that express `positive sentiment` and `negative sentiment`, in the files `positive_words.txt` and `negative_words.txt`.

**Your task is to build a sentiment classifier, which will detect how positive or negative each tweet is**. You will create a csv file, which contains columns for the Number of Retweets, Number of Replies, Positive Score (which is how many happy words are in the tweet), Negative Score (which is how many angry words are in the tweet), and the Net Score for each tweet. At the end, you upload the csv file to Excel or Google Sheets, and produce a graph of the Net Score vs Number of Retweets.

To start, define a function called `strip_punctuation` which takes one parameter, a string which represents a word, and removes characters considered punctuation from everywhere in the word. (Hint: remember the .replace() method for strings.)