Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All). Check your output to make sure it all looks as you expected.

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name below:

In [1]:
NAME = "Bryan Tchakote"

---

**Here are some key points on how to use the notebook and submit your work.**
We will grade on the assumption that you have read and understood them.
    
1. **Using the Notebook to Show Your Work**: You must learn to write code in the notebook. It is a core tool for data science, and becoming good at it will make it easier to develop and document your work. Writing your code in another tool and pasting it into the notebook will probably not work well (you may forget to include code that lives elsewhere, mess up the spacing, or paste code that doesn't run as copied). You must be sure your code cells execute because we will test them, so learn how to run code in the notebook cells to double-check your work.
2. **Read Directions Carefully**:  The instructions for your code are very important. If you don't follow the requirements, your application won't do as requested. Making it work correctly is part of learning to program.  If we worded something unclearly, ask the teacher.

The second set of issues concerns coding style:

1. **Indentation**: In Python, indentation matters and must be consistent. If you write your code in the Notebook, the **tab** key will indent properly. If you use another editor and paste into the notebook, the code might not be correctly indented. (When you do write code in another editor, make sure you set your tabs to indent as 4 spaces, not as a tab character.) You must make sure that your pasted code runs in the notebook or it will not get a good grade. In any case, we recommend that beginners work in a Jupyter notebook for this course, whether it's this one or a draft file.
2. **Spacing**: Follow closely the spacing shown in the lessons. There should not be a space between a function name and the parentheses with the arguments. Style is very important for a programmer. If you work with other programmers in the future, they will sometimes have "lint" checkers that test your code for style and reject it if it doesn't follow the appropriate spacing and blank-line rules. Think of it as a matter of politeness toward other people reading your code \ (•◡•) /
3. **Names of Variables**: In Python, there's a culture of making everything readable. Don't use ``x`` and ``y`` as your variable names... use words like ``pounds`` and ``kilograms``. It will be easier for colleagues (and yourself) to understand the code later.
4. **Error Messages**: Please use informative error messages that tell the user what they did wrong and what kind of input you expect. Imagine you are designing the user experience! Think about how to help your user. And remember **you** are the user when you debug!

We will take points off for issues of non-standard spacing, indentation, bad error messages, and bad variable names in the future.  This will continue for the entire course.

There are multiple ways to code all the answers.  Here are a few more code style tips:

1. If you do a calculation or a transformation, like ``float(pounds)``, do it once and save the result in a variable; don't do it multiple times. You should try not to have code that repeats itself too much. If you repeat things, you can make mistakes like typos, and they will be harder to find. Repetition also wastes computer power.
2. Tests like ``4 < test < 40`` need to be saved in a variable or used in an ``if`` statement. Otherwise they won't do anything useful.
3. ``try``/``except`` should be used to catch errors. (In fancier, more formal Python, there is more careful error catching where the type of error is detected and handled. We're just doing the basic try/except right now.) Anytime you have a conversion or something that could result in an error, you should wrap it in try/except. Do not allow a user to run code that results in an un-handled error.
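
The three tips above can be sketched together as follows. This is only an illustration: the function name `pounds_to_kilograms` and the 4 to 40 range check are hypothetical and are not part of any graded exercise.

```python
KILOGRAMS_PER_POUND = 0.45359237  # definition of the international pound


def pounds_to_kilograms(pounds_text):
    """Convert a string such as "10" to kilograms; return None on bad input."""
    # Tip 3: wrap the risky conversion in try/except.
    try:
        pounds = float(pounds_text)  # Tip 1: convert once, keep the result
    except ValueError:
        print("Error: please enter a number of pounds, for example 12.5")
        return None

    # Tip 2: a comparison only matters if it is stored or used in an if.
    in_range = 4 < pounds < 40
    if not in_range:
        print("Error: expected a weight between 4 and 40 pounds")
        return None

    return pounds * KILOGRAMS_PER_POUND
```

Calling `pounds_to_kilograms("abc")` prints an error that starts with "Error:" and returns `None` instead of crashing.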

---

# Notebook 2, Chapters 5-8 in Python For Everyone
[Book Link](https://books.trinket.io/pfe/05-iterations.html)

It is a lot of reading, but you really need to master these tools to become good with data analysis and visualization in Python.

Fill in your work below each assignment, by adding new cells and showing them running correctly.

**Note: We will deduct points for unclear variable names, irregular indents and spacing.**

**Important: Don't use exit() in the notebook, even though the e-book uses it. It stops the kernel which you will have to restart.  You don't need exit(); instead use 'return' at the end of a function, or 'break' to exit a loop.**
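
As a minimal sketch of the two alternatives (the function names and sample inputs are made up for illustration):

```python
def first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for number in numbers:
        if number < 0:
            return number  # return leaves the loop and the function at once
    return None


def count_until_stop(words):
    """Count the words that come before the word "stop"."""
    count = 0
    for word in words:
        if word == "stop":
            break  # break leaves only the loop; the code after it still runs
        count += 1
    return count
```

Neither function touches the kernel, so the rest of the notebook keeps running after they finish.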



## Exercises for Chapter 5: "Iterations"

**Question 1)** Write a function which repeatedly asks for the input of a "name" and prints out ``Hi [name]! Good to see you!`` (replace [name] with the name they entered).  If they enter "Bob", print out ``oh, not you, Bob!`` and stop the loop asking for input. 

**Important**: Use ``return`` or ``break``, do not use ``exit()`` as we said above. ``exit()`` will stop your notebook from computing (stop the kernel).

Save the longest and shortest name encountered (count the letters). When you stop the loop after seeing "Bob", print out the shortest and longest names previously entered.  Use these strings or your code won't pass the tests: ``Longest name was [name]`` then ``Shortest name was [name]``.

If the user enters anything other than a string of characters, for instance, numbers, print out an error message, and continue on to ask for a name again.  Your error should start with "Error:".

You might read chapter 6 on **strings** to see how to handle them in this program.  In particular, there is a string method, ``str.isalpha()``, that you should look up.
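
A quick sketch of how `isalpha` behaves on a few sample strings (the strings are arbitrary examples). It returns `True` only when the string is non-empty and every character is a letter, so digits, spaces, and the empty string all fail the check:

```python
samples = ["Bryan", "Bob2", "Mary Ann", "", "42"]

# Map each sample to the result of the check.
results = {text: text.isalpha() for text in samples}

for text, is_alphabetic in results.items():
    print(repr(text), "->", is_alphabetic)
```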

In [2]:
def bob_loop():
    # the following strings can help you to produce the exact expected messages
    bob_msg = "oh, not you, Bob!"
    other_msg = "Hi {}! Good to see you!"  # this is a format string. Use the ``format`` method on it
    names = ["Bob"]
    
    while True:
        name = input("What's your name: ")
        
        if not name.isalpha():
            print("Error: only alphabetic characters are allowed")
            continue
            
        if name == "Bob":
            print(bob_msg)
            break
        
        print(other_msg.format(name))
        if name not in names:
            names.append(name)
        
    lengths = [len(name) for name in names]
    longest = names[lengths.index(max(lengths))]
    shortest = names[lengths.index(min(lengths))]
    
    print("\nLongest name was", longest)
    print("Shortest name was", shortest)

In [3]:
# Make sure this works with any input including Bob.
bob_loop()

What's your name: Bryan
Hi Bryan! Good to see you!
What's your name: Manuel
Hi Manuel! Good to see you!
What's your name: T
Hi T! Good to see you!
What's your name: Bob
oh, not you, Bob!

Longest name was Manuel
Shortest name was T


In [4]:
# Hidden test cell


In [5]:
# Hidden test cell


##  Exercise for Chapter 6: "Strings"

**Question 3)** Write a function which takes a string argument with no spaces in it, searches for vowels (the letters "a", "e", "i", "o", "u") in the string, replaces them by upper case characters, and prints out the new string with the upper cases as well as returns the new string from the function. You should verify it is a string argument using ``isalpha`` (so no spaces are allowed!) and return with an error if not (the error message should begin with "Error:"). 

Reminder:  in ``def myfunction(arg):`` the string arg is the argument position and variable name for the argument.  You will run it like:

    myfunction("hi there")

For instance, if the string input is "miscellaneous", then your program will print out and return "mIscEllAnEOUs".  If nothing in the string is a vowel, print "Nothing to convert!" and return ``None``.

**Hint Note: If you are doing the reading, ``for letter in word`` and ``if letter in vowels`` should be clear to you as a way to go through a string. You can also add ``print`` statements to show the value of things as you run your code... but comment them out or delete them before you submit your work.**
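
The hint can be illustrated with a small sketch that counts vowels rather than converting them (the function name `count_vowels` is hypothetical, and this is not the solution to Question 3):

```python
def count_vowels(word):
    """Count the lower-case vowels in word."""
    vowels = "aeiou"
    total = 0
    for letter in word:        # visit the characters one at a time
        if letter in vowels:   # membership test against the vowel string
            total += 1
    return total
```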


In [6]:
def uppercase(word):
    vowels = "aeiou"
    nothing_msg = "Nothing to convert!"
    
    if not word.isalpha():
        print("Error: parameter should verify 'isalpha'")
        return
    
    converted_word = ""
    for letter in word:
        if letter in vowels:
            letter = letter.upper()
        
        converted_word += letter
    
    if converted_word == word:
        print(nothing_msg)
        return
    
    return converted_word

In [7]:
# this should print "Error:" plus some helpful string if you did it right. Spaces are not allowed.
uppercase("well hello there")

Error: parameter should verify 'isalpha'


In [8]:
# this should print out and return the word with uppercased vowels
uppercase("hello")

'hEllO'

In [9]:
# this should print out "Nothing to convert!"  It returns None, the python term for nothing.
uppercase("HI")

Nothing to convert!


In [10]:
import sys
from unittest import mock
from io import StringIO

In [11]:
# Grading tests you can ignore: output from word with all uppercase already.

with mock.patch('sys.stdout', new_callable=StringIO):
    assert uppercase("HI") is None
    assert "Nothing to convert!" in sys.stdout.getvalue()

In [12]:
assert uppercase("hello") == "hEllO"

In [13]:
# A grading test for your code, you can ignore this. Returns None and prints error.

with mock.patch('sys.stdout', new_callable=StringIO):
    assert uppercase("H3432e") is None
    assert "Error" in sys.stdout.getvalue()

**Question 4)** 

Take the following Python code that stores a string in the variable ``mystring``:

    mystring = 'X-DSPAM-Confidence:0.8475'

Write a function that uses ``find`` and string slicing to extract the portion of the string after the colon character (:) and then use the ``float`` function to convert the extracted string into a floating point number. Print and then return the floating point number from the function.  If no colon character is found, return ``None``.
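
As a sketch of the `find`-and-slice technique on a different string (the sample text and variable names here are invented for illustration):

```python
reading = "temperature=21.5"

separator_position = reading.find("=")   # index of the first "=", or -1
if separator_position == -1:
    value = None
else:
    # Slice from just after the separator to the end, then convert.
    value = float(reading[separator_position + 1:])

missing_position = "no separator here".find("=")  # find returns -1 here
```

Because `find` returns -1 rather than raising an error, the result has to be checked before slicing.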


In [14]:
help(str.find)  # ask the Python interpreter for more info about the find method of str

Help on method_descriptor:

find(...)
    S.find(sub[, start[, end]]) -> int
    
    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.
    
    Return -1 on failure.



In [15]:
def get_number(stringinput):
    """ Your stringinput should be a string as described above. Print the float part after the :
    and also return it.  Return None if no : is found in the string."""
    
    err_msg = "Error: The input should have a float only after the colon"
    
    start = stringinput.find(":")
    if start == -1:
        return
    
    try:
        number = float(stringinput[start+1:])
    except ValueError:
        print(err_msg)
        return
        
    print("The extracted number is", number)
    return number

In [16]:
# Check your code does the right thing. It should print and also return the float.
mystring = 'X-DSPAM-Confidence:0.8475'
get_number(mystring)

The extracted number is 0.8475


0.8475

In [17]:
# testing your return value:
mystring = 'X-DSPAM-Confidence:0.8475'
assert get_number(mystring) == 0.8475


The extracted number is 0.8475


In [18]:
# test return from a missing :
mystring = "No colon in this"
assert get_number(mystring) is None

In [19]:
## Extra credit test that's hidden :)


## Exercises for Chapter 7: "Files"

This is a big important chapter.  All data files have to be loaded and parsed into Python before you can do analysis on them.  And you can also write out your results in files after doing this chapter.


**Question 5)** Get the *mbox-short.txt* file. Be sure you understand where it is located so you can read it in. If it is in the same directory as this notebook, you don't need to give the full path to the file.  A path is the working directory info + filename. For instance:

```
myfile = "/Users/Documents/PythonTeaching/mbox-short.txt"
```


Create a function that reads in the file and looks for lines with "http" in them. (Hint: "http" can appear inside a longer string like "https".) Keep a count of how many URLs appear. Print out the first URL in each line, but nothing else in that line. This means you need to isolate the URL from the rest of the line (you can use space as the delimiter.)

At the end, return the total count of lines containing http in the file.
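
One way to isolate a space-delimited token is sketched below on a made-up line (the line, the address `http://example.org/page`, and the helper name `first_token_with` are assumptions for illustration):

```python
def first_token_with(text, marker):
    """Return the first whitespace-delimited token containing marker, or None."""
    for token in text.split():     # split() with no argument splits on whitespace
        if marker in token:
            return token
    return None


sample_line = "Details: see http://example.org/page for more\n"
found = first_token_with(sample_line, "http")
```

Since "http" is a substring of "https", the same test matches both kinds of address.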

**Note: the preferred way to do the file open and read is as follows, not as shown in the book.  Please use this model in your code:**

In [20]:
## Example of how to read in the file and do things with it:

# you need to give the full path to the file you are opening in the string, or a path relative to your notebook
# A relative path means a folder + filename inside the current folder where the notebook is running.
# Here, mine is in a folder called data_files.

with open('data_files/mbox-short.txt', 'r') as handle:
    for line in handle:
        # do stuff here with each line.  Try uncommenting the print and removing the 'continue'.
        # Think about removing the \n at the end of the line.
        # print(line)
        continue
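
To see what the trailing `\n` looks like and how `rstrip` removes it, here is a self-contained sketch that first writes a small temporary file (the file contents are invented, and the temporary file stands in for *mbox-short.txt*):

```python
import os
import tempfile

# Create a small throwaway file so the example runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as temp:
    temp.write("first line\nsecond line\n")
    temp_path = temp.name

raw_lines = []
clean_lines = []
with open(temp_path, "r") as handle:
    for line in handle:
        raw_lines.append(line)                  # still ends with "\n"
        clean_lines.append(line.rstrip("\n"))   # newline removed

os.remove(temp_path)  # tidy up the temporary file
```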

In [21]:
def count_urls(filename):
    """
    Pass in the full path to the file you are searching for URLs.
    Return the total of lines with urls found.
    """
    
    n_lines = 0
    with open(filename, 'r') as handle:
        for line in handle:
            if "http" in line:
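                # isolate the first URL: keep the text from "http" up to the next space
                url = line[line.find("http"):].split()[0]
                # drop a closing parenthesis left over from URLs written in brackets
                if url.endswith(")"):
                    url = url[:-1]
                print(url)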
                n_lines += 1
    return n_lines

In [22]:
# test your code on your function -- put the path to the file in the argument.

filepath = 'data_files/mbox-short.txt'
count_urls(filepath)

http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772
https://collab.sakaiproject.org/portal
http://source.sakaiproject.org/viewsvn/?view=rev&rev=39771
https://collab.sakaiproject.org/portal
http://source.sakaiproject.org/viewsvn/?view=rev&rev=39770
https://source.sakaiproject.org/svn/site-manage/trunk/
https://collab.sakaiproject.org/portal
http://source.sakaiproject.org/viewsvn/?view=rev&rev=39769
https://collab.sakaiproject.org/portal
http://source.sakaiproject.org/viewsvn/?view=rev&rev=39766
https://collab.sakaiproject.org/portal
http://source.sakaiproject.org/viewsvn/?view=rev&rev=39765
https://collab.sakaiproject.org/portal
http://source.sakaiproject.org/viewsvn/?view=rev&rev=39764
https://source.sakaiproject.org/svn/msgcntr/trunk
http://jira.sakaiproject.org/jira/browse/SAK-12488
https://collab.sakaiproject.org/portal
http://source.sakaiproject.org/viewsvn/?view=rev&rev=39763
https://source.sakaiproject.org/svn/msgcntr/trunk
http://jira.sakaiproject.org/jira/browse/SAK-1248

75

In [23]:
# Hidden tests for your code. Manually grading the url printing.  Testing the output count is correct.

## Exercises for Chapter 8: "Lists"

We know you're tired, but understanding lists in Python is essential.  They are one of the coolest things about the language.

**Question 6)** Download *romeo.txt*.

Write a function that takes as argument the path to the file *romeo.txt* and reads it line by line.  Use the same pattern as above, using "with open" and everything indented under it!  For each line, split the line into a list of words by the space character, using the ``split`` function.

Create a single empty list that will hold all the unique words in the entire file. For each word in each line, check to see if the word is already in that list. If the word is not in that list, add it to the list (you don't need to lowercase the words).

When the program completes, sort the list and print the resulting words in alphabetical order using a statement like this:

    print(' '.join(words))

Finally, return the first word of the list from the function!
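
The pattern of building a list of unique items and then sorting it can be sketched on two made-up sentences (not *romeo.txt*). Note that `sort` compares strings by character code, so words starting with an upper-case letter come before all lower-case words:

```python
text_lines = ["the Cat saw the dog", "a dog saw the Cat"]

unique = []
for text_line in text_lines:
    for word in text_line.split():
        if word not in unique:      # only keep the first occurrence
            unique.append(word)

unique.sort()                       # in-place sort; upper case sorts first
first_word = unique[0]
print(" ".join(unique))
```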


In [24]:
def unique_words(filepath):
    """ This function reads in a filepath text file and prints the sorted unique words. 
    It returns the first unique word.
    """
    words = []
    with open(filepath, 'r') as handle:
        for line in handle:
            for word in line.split():
                if word not in words:
                    words.append(word)

    words.sort()
    print(" ".join(words))
    return words[0] if words else None

In [25]:
# show this works.  Set the path to where your file is. If it's the same folder, no / part needed.

myfile = 'data_files/romeo.txt'
unique_words(myfile)

Arise But It Juliet Who already and breaks east envious fair grief is kill light moon pale sick soft sun the through what window with yonder


'Arise'

In [26]:
# Hidden test cell


### Short questions about lists and strings

**NOTE:** These are specifically about the reading. You will lose points if your solution isn't what we focused on in the text.  Also, we want general-purpose functions that work on lists or strings, not hard-coded upper case "I" etc.

In [27]:
# a) Write a function that takes a list of words like this and joins it into a single string separated by spaces. Print the result.
words = ["I'm", "sorry", "Dave", "I", "can't", "do", "that."]

In [28]:
def print_string(wordlist):
    sentence = " ".join(wordlist)
    return sentence

In [29]:
# show it works with the list above (words)
print_string(words)

"I'm sorry Dave I can't do that."

In [30]:
# Hidden test cell


In [31]:
# b) How would you split this into separate data values using the comma? 
# Write a function that returns the list of strings.
csvstring = "34,fred  ,25,35.5,24,india"

In [32]:
def separate_words(string):
    return string.split(",")

In [33]:
# show it works on the csvstring above.
separate_words(csvstring)

['34', 'fred  ', '25', '35.5', '24', 'india']

In [34]:
# Hidden test cell


**NOTE:** Again, use a general function, there are many for helping with string operations.

c) Write a function that uses a for-loop to remove whitespace from around the words using strip, 
capitalizes the words, and prints the length of each of the strings in your words list above after doing those cleaning operations. Return a clean list of the same words, with caps and no spaces around each one.
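
A short sketch of what `strip` and `capitalize` do to individual strings (the sample values are arbitrary). `capitalize` upper-cases the first character and lower-cases the rest, so a string of digits comes back unchanged:

```python
padded = "  fred  "
stripped = padded.strip()            # whitespace removed from both ends
capitalized = stripped.capitalize()  # first letter upper-cased

shouting = "INDIA".capitalize()      # the rest of the word is lower-cased
number_text = "35.5".capitalize()    # no letters, so nothing changes
lengths = (len(padded), len(stripped))
```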

In [35]:
words = ['34', 'fred  ', '25', '35.5', '24', 'india']

def clean_words(wordlist):
    """ Takes a list of strings and capitalizes words, removes blank spaces around any of them.
    Returns a cleaned list."""
    
    cleaned_list = []
    for word in wordlist:
        cleaned_word = word.strip().capitalize()
        print(len(cleaned_word))
        cleaned_list.append(cleaned_word)
    return cleaned_list

In [36]:
# this should work for you and return the list of cleaned words:
clean_words(words)

2
4
2
4
2
5


['34', 'Fred', '25', '35.5', '24', 'India']

In [37]:
# Hidden test cell


d) Create a variable "sliced_words" using index slicing to get the first 4 elements of the "words" list above.
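
Slice notation in brief, on a made-up list (the list `letters` is only for illustration). A slice `items[start:stop]` includes `start` and excludes `stop`; omitting either end means the beginning or the end of the list:

```python
letters = ["a", "b", "c", "d", "e", "f"]

first_three = letters[:3]    # indexes 0, 1, 2
middle = letters[2:4]        # indexes 2 and 3
last_two = letters[-2:]      # negative indexes count from the end
too_long = letters[:100]     # slicing past the end is not an error
```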

In [38]:
sliced_words = words[:4]
sliced_words

['34', 'fred  ', '25', '35.5']

In [39]:
# Hidden test cell
