Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All). Check your output to make sure it all looks as you expected.

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name below:

In [1]:
NAME = "Bryan Tchakote"

---

**Here are some key points on how to use the notebook and submit your work.**
We will grade on the assumption that you have read and understood them.
    
1. **Using the Notebook to Show Your Work**: You must learn to write code in the notebook. It is a core tool for data science, and becoming good at using it will make it easier to develop and document your work. Writing your code in another tool and pasting it into the notebook will probably not work well (you may forget to include code written elsewhere, mess up the spacing, or paste code that doesn't run as copied). You must be sure your code cells execute because we will test them. So learn how to run code in the notebook cells to double-check your work.
2. **Read Directions Carefully**:  The instructions for your code are very important. If you don't follow the requirements, your application won't do as requested. Making it work correctly is part of learning to program.  If we worded something unclearly, ask the teacher.

The second set of issues is coding style:

1. **Indentation**: In Python, indentation matters and must be consistent. If you write your code in the Notebook, the **tab** key will indent properly. If you use another editor and paste into the notebook, it might not be correctly indented. (When you do write code in another editor, make sure you set your tabs to indent as 4 spaces, not as a tab character.) You must make sure that your pasted code runs in the notebook or it will not get a good grade. In any case, we recommend that beginners work in a Jupyter notebook for this course, whether it's this one or a draft file.
2. **Spacing**: Follow closely the spacing shown in the lessons. There should not be a space between a function name and the parentheses with the arguments. For a programmer, style is very important. If you work with programmers in the future, they sometimes have "lint" checkers to test your code for style and reject it if it doesn't follow the appropriate spacing and blank-line rules. Think of it as a matter of politeness for other people reading your code \ (•◡•) /
3. **Names of Variables**: In Python, there's a culture of making everything readable. Don't use ``x`` and ``y`` as your variable names... use words like ``pounds`` and ``kilograms``. It will be easier for colleagues (and yourself) to understand the code later.
4. **Error Messages**: Please use informative error messages that tell the user what they did wrong and what kind of input you expect. Imagine you are designing the user experience! Think about how to help your user. And remember **you** are the user when you debug!

We will take points off for issues of non-standard spacing, indentation, bad error messages, and bad variable names in the future.  This will continue for the entire course.

There are multiple ways to code all the answers.  Here are a few more code style tips:

1. If you do a calculation or a transformation, like ``float(pounds)`` -- do it once and save it as a variable, don't do it multiple times.  You should try not to have code that repeats itself too much.  If you repeat things, you can make mistakes like typos and it will be harder to find them. Also, it's wasting computer power.
2. Tests like "4 < test < 40" need to be saved in a variable or used in an ``if`` statement.  It won't do anything relevant otherwise.
3. ``try``/``except`` should be used to catch errors. (In fancier, more formal Python, there is more careful error catching where the type of error is detected and handled. We're just doing the basic try/except right now.) Anytime you have a conversion or something that could result in an error, you should wrap it in try/except. Do not allow a user to run code that results in an un-handled error.
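
The three tips above can be sketched together as follows. This is only an illustration, not part of the assignment: the function name `describe_weight` and the 4-to-40 bounds are hypothetical, chosen to mirror the examples in the tips. It also names the specific exceptions it expects, which goes one small step beyond the basic `try`/`except` described above.

```python
def describe_weight(raw_pounds):
    """Hypothetical helper: convert once, test the range in an `if`, handle bad input."""
    try:
        # Tips 1 and 3: do the conversion once, inside try/except, and save the result.
        pounds = float(raw_pounds)
    except (TypeError, ValueError):
        print("Error: expected a number of pounds such as 12.5, got {!r}".format(raw_pounds))
        return None

    # Tip 2: the chained comparison is used in an `if`, so its result matters.
    if 4 < pounds < 40:
        return "{} pounds is in range".format(pounds)
    return "{} pounds is out of range".format(pounds)


print(describe_weight("12.5"))
print(describe_weight("abc"))
```

Note that `pounds` is computed a single time and reused in both the test and the messages.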

---

# Notebook 3, Homework

**Reminder: Do not use `exit()` in a notebook.  Use `return` to exit a  function.**

This homework includes list comprehensions, which are a very common pattern in Python code.  They can be schematically described as being of the form `[something(x) for x in list]`, which returns a new list with the function `something` applied to each member `x` in the list (or other iterable, such as a tuple).

Make sure you have read the chapter on List Comprehensions here: 
    http://introtopython.org/lists_tuples.html#List-Comprehensions
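
As a small illustration of the `[something(x) for x in list]` form described above, the sketch below builds the same list twice, once with a loop and once with a list comprehension. The sample data and variable names are hypothetical.

```python
words = ("kilo", "mega", "giga")  # hypothetical sample data; a tuple works as well as a list

# Loop version: build the list one element at a time.
upper_loop = []
for word in words:
    upper_loop.append(word.upper())

# List comprehension version: same result in one line.
upper_comp = [word.upper() for word in words]

print(upper_comp)
```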



In [2]:
# Here is a list of names we will use for the first problem:

names = ['frederica', 'gilbert', 'amine', 'hasan', 'annie', 'bob']

**Question 1**: Write a function **using a list comprehension** to create a new list with just the length of each of the strings in the input list of names above.  Return the new list of string lengths.

In [3]:
def get_lengths(words):
    """ Takes a list of words, returns a list of their lengths. """
    return [len(word) for word in words]

In [4]:
# show your code running here:
get_lengths(names)

[9, 7, 5, 5, 5, 3]

In [5]:
## A test for your code that you can ignore. We will check manually too. 

assert get_lengths(names) == [9, 7, 5, 5, 5, 3]


In [6]:
## another test to make sure you used a list comprehension
import ast
import inspect

src = inspect.getsource(get_lengths)
node = ast.parse(src)

class ListCompChecker(ast.NodeVisitor):
    def __init__(self):
        super().__init__()
        self.Found = False
    def visit_ListComp(self, node):
        self.Found = True

checker = ListCompChecker()
checker.visit(node)
assert checker.Found

**Question 2**:
    How would you convert the following for-loop code (`for val in mylist`) into a single line list comprehension? (The print part can be a separate line after your list comprehension.)
   

In [7]:
mylist = [2,4,8,10,3.4,4,2]
tens = []
for val in mylist:
    tens.append(val * 10)
print(tens)

[20, 40, 80, 100, 34.0, 40, 20]


In [8]:
tens = [x*10 for x in mylist]
print(tens)

[20, 40, 80, 100, 34.0, 40, 20]


## Chapter 9: Dictionaries 

You should have read the book chapter up to Advanced Text Parsing (you can skip that if you want) and also read this: http://introtopython.org/dictionaries.html  

Pay particular attention to looping through keys and values in dictionaries using `items()`, which is not mentioned in the book.  This is the most common way to access the parts of the dictionary.

The pattern to use with `items()` is this:

````
for key, value in mydict.items():
    # do something with key and/or value
````

Make sure you understand this common pattern.
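
Here is a minimal sketch of the `items()` pattern in action. The dictionary `prices` and its contents are hypothetical sample data, not taken from the assignment.

```python
# Hypothetical sample data, not from the assignment files.
prices = {"apple": 0.5, "bread": 2.25, "milk": 1.2}

labels = []
for item, price in prices.items():
    # Each pass through the loop unpacks one (key, value) pair.
    labels.append("{} costs {}".format(item, price))

print(labels)
```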

#### Question 3:

Write a function that adds one to each numeric value in a dictionary. If the value is not numeric (i.e. not an int or float), don't add anything, but keep the same value. The function should take a dictionary as argument, and return the modified dictionary.


In [9]:
def addtovalue(dictionary):
    "Add one to each numeric value of the dictionary, return the dict"
    for key, value in dictionary.items():
        if isinstance(value, (int, float)):
            dictionary[key] += 1
    return dictionary

In [10]:
testdict = { 'fred': 3.3, 'marie': '5', 'jean': 14, 'angus': 44, 'amine': None}

result = addtovalue(testdict)
assert result['fred'] == 4.3
assert result['jean'] == 15
assert result['angus'] == 45

In [11]:
testdict = { 'fred': 3, 'marie': '5', 'jean': 14, 'angus': 44, 'amine': None}
result = addtovalue(testdict)
assert result['amine'] is None
assert result['marie'] == '5'

**Question 4**:

You can use multiple functions if you want, but you have to have one main function that calls the others, if you do. This is because the test code relies on the main function to check your output.

Create a function that takes a file name (and path if needed) as the argument.  When you test the function, open and read in the file **mountains.csv**. Inside the function, use a try/except to be sure the file exists and is readable. If the file location is wrong or it can't be opened, print an error that begins with "Error:".  (You can test it with a junk path or filename that doesn't exist.)

The pattern suggested for this is:

````
try:
    with open('mountains.csv', 'r') as handle:
        for line in handle:
            # ...do stuff here (you can have other try/except in here if you want)
            pass
except:
    print("Error: Something wrong with your file location?")
    return
````

An alternate pattern is:

````
try:
    handle = open(filename, 'r')
except:
    print("Error: trouble with file opening")
    return
````

But you must remember to close the handle if you do this.  The book says:

> We could close the files which we open for read as well, but we can be a little sloppy if we are only opening a few files since Python makes sure that all open files are closed when the program ends. When we are writing files, we want to explicitly close the files so as to leave nothing to chance.

If you are using the recommended pattern with the ``with open() as handle:`` idiom, you don't need to close it explicitly, it will be closed for you.  That's why it's recommended.
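
The difference between the two patterns can be checked with the `closed` attribute of a file handle. The sketch below is an illustration only: it writes a small throwaway file (hypothetical content) with the standard `tempfile` module so that it runs on its own, and it uses `try`/`finally` as one possible way to guarantee the explicit close.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained (hypothetical content).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("K2,8611,Karakoram\n")
    path = tmp.name

# Recommended pattern: the file is closed automatically when the block ends.
with open(path, "r") as handle:
    first_line = handle.readline()
print(handle.closed)

# Alternate pattern: you must close the handle yourself; `finally` runs even on errors.
handle = open(path, "r")
try:
    same_line = handle.readline()
finally:
    handle.close()
print(handle.closed)

os.remove(path)  # clean up the throwaway file
```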

*mountains.csv* is a comma-separated list of mountains, their height in meters, and the range they belong to (look at it in a text editor, but don't edit the file!). A CSV file is a common format for raw data.  Other types of raw data files are semi-colon (point-virgule) separated files or tab-separated files.  However the columns are separated, you must use that character in your "split" code.

In this case, it's a comma. Split each line by the comma, and make a dictionary where the key is the mountain name (the first element) and the height is the value, the second element. Make sure to convert the height to a number.
Then print the keys and values of the dictionary using ``.items()``, in readable sentences that say, for instance, "The height of K2 is 8611 meters.".  Return the dictionary at the end of the function.
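
The split-and-convert step described above can be sketched on its own, without a file. The two rows below are hypothetical in-memory stand-ins for lines read from a CSV file; for a semicolon- or tab-separated file, the argument to `split` would be `";"` or `"\t"` instead.

```python
# Hypothetical in-memory rows standing in for lines read from a file.
rows = ["K2,8611,Karakoram\n", "Lhotse,8516,Himalayas\n"]

heights = {}
for row in rows:
    # Remove the trailing newline, then split on the delimiter (a comma here).
    fields = row.rstrip("\n").split(",")
    name = fields[0]
    heights[name] = int(fields[1])  # convert the height text to a number

print(heights)
```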

Reminder about print with {} in your string: use `print(string.format(variable))` to fill in the {} with your variable. If there are 2 {}'s, use `.format(var1, var2)`.
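
For instance, with two placeholders (the variable names here are just for illustration):

```python
template = "The height of {} is {} meters."
# Values fill the {} placeholders in order, left to right.
sentence = template.format("K2", 8611)
print(sentence)
```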


In [12]:
def reading_file(filename):
    '''
    Takes a filename as a parameter, goes through the file (if it exists) and records in
    nested lists all the information that will be useful for the following functions.
    Prints an error and returns None if the file cannot be read or parsed.
    '''
    data = []
    try:
        with open(filename) as handle:
            for line in handle:
                try:
                    mountain, height, mountain_range = line.rstrip('\n').split(',')[:3]
                    assert(len(mountain) > 0 and len(height) > 0 and len(mountain_range) > 0)
                except (ValueError, AssertionError):
                    print("Error: One of the lines in your file has an invalid number of entries")
                    return
                
                try:
                    height = int(height)
                except ValueError:
                    print(f"Error: {mountain}'s height could not be cast to 'int'")
                    return
                
                data.append([mountain, height, mountain_range])
            return data
    except OSError:
        print("Error: File doesn't exist or is unreadable.")
        return
    
def mountain_height(filename):
    """ Read in a csv file of mountain names and heights.  
    Parse the lines and print the names and heights. 
    Return the data as a dictionary. 
    The key is the mountain and the height is the value.
    """
    data = reading_file(filename)
    if data is None:
        # reading_file already printed an "Error: ..." message
        return
    
    mountains = dict()
    msg = "The height of {} is {} meters."
    
    for mountain in data:
        # reading_file has already converted the height to an int
        mountains[mountain[0]] = mountain[1]
            
    for name, height in mountains.items():
        print(msg.format(name, height))
            
    return mountains    

In [13]:

# Edit this to have the path to your file mountains.csv.  
# Show that it runs.

filename = "./data_files/mountains.csv"
mountain_height(filename)

The height of Mount Everest is 8848 meters.
The height of K2 is 8611 meters.
The height of Kangchenjunga is 8586 meters.
The height of Lhotse is 8516 meters.
The height of Makalu is 8485 meters.
The height of Cho Oyu is 8201 meters.
The height of Dhaulagiri is 8167 meters.
The height of Manaslu is 8163 meters.
The height of Nanga Parbat is 8126 meters.
The height of Annapurna is 8091 meters.
The height of Gasherbrum I is 8080 meters.
The height of Broad Peak is 8051 meters.
The height of Gasherbrum II is 8035 meters.
The height of Shishapangma is 8027 meters.


{'Mount Everest': 8848,
 'K2': 8611,
 'Kangchenjunga': 8586,
 'Lhotse': 8516,
 'Makalu': 8485,
 'Cho Oyu': 8201,
 'Dhaulagiri': 8167,
 'Manaslu': 8163,
 'Nanga Parbat': 8126,
 'Annapurna': 8091,
 'Gasherbrum I': 8080,
 'Broad Peak': 8051,
 'Gasherbrum II': 8035,
 'Shishapangma': 8027}

In [14]:
# Test code for grading your function. You can ignore this.

filename = "./data_files/mountains.csv"
output = mountain_height(filename)
assert len(output.keys()) == 14
assert output['Annapurna'] == 8091

The height of Mount Everest is 8848 meters.
The height of K2 is 8611 meters.
The height of Kangchenjunga is 8586 meters.
The height of Lhotse is 8516 meters.
The height of Makalu is 8485 meters.
The height of Cho Oyu is 8201 meters.
The height of Dhaulagiri is 8167 meters.
The height of Manaslu is 8163 meters.
The height of Nanga Parbat is 8126 meters.
The height of Annapurna is 8091 meters.
The height of Gasherbrum I is 8080 meters.
The height of Broad Peak is 8051 meters.
The height of Gasherbrum II is 8035 meters.
The height of Shishapangma is 8027 meters.


In [15]:
# Hidden test for your printing the data.


In [16]:
# Hidden test for your error condition with a bad filename/path.


**Question 5**:

Rewrite your function to use the collections module's Counter to count how many times each mountain range is mentioned. Each row contains a mountain, its height, and the range it is part of. The ranges are still in the 3rd column of the mountains.csv file!  You can use more than one function if you want.

Also add a dictionary that records all the heights of the mountains in a particular range. You will use a list for the values of the heights. So this is a dictionary with a list value! The key will be the range name. Each time you see a new mountain in the range, add the height to the list for that key. For example, after reading all the data, ``mountains['Himalayas'] == [8848, 8586, 8516, 8485, 8201, 8167, 8163, 8126, 8091, 8027]``.  (The "Himalayas" are the range.)

You may use a regular ``dict`` or a ``defaultdict``, but you must beware of ``KeyError`` with a regular dictionary if the key doesn't exist yet.
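
The two options can be compared side by side. This is a minimal sketch on hypothetical `(group, value)` pairs, not on the mountain data: the regular dictionary needs an explicit check before the first `append`, while `defaultdict(list)` creates the empty list for you.

```python
from collections import defaultdict

# Hypothetical (group, value) pairs.
pairs = [("a", 1), ("b", 2), ("a", 3)]

# Regular dict: create the empty list the first time a key is seen,
# otherwise grouped[key].append(...) would raise KeyError.
grouped = {}
for key, value in pairs:
    if key not in grouped:
        grouped[key] = []
    grouped[key].append(value)

# defaultdict(list): a missing key is given an empty list automatically.
grouped_default = defaultdict(list)
for key, value in pairs:
    grouped_default[key].append(value)

print(grouped)
print(dict(grouped_default))
```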

Your function should print the top 2 ranges (according to their Counter value; hint: look at the method ``most_common()``). Adding the mountain range name to the counter requires a little care (look at ``update``).
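
The "little care" is that `Counter.update` expects an iterable of items, and a string is itself an iterable of characters. A minimal sketch on hypothetical colour names (not the mountain data) shows one way to handle this:

```python
from collections import Counter

counts = Counter()
for colour in ["red", "blue", "red", "green", "red", "blue"]:  # hypothetical data
    # Wrap the string in a list: update("red") would count the letters r, e, d.
    counts.update([colour])

print(counts.most_common(2))
```

Writing `counts[colour] += 1` has the same effect, since a `Counter` treats missing keys as zero.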

Then, print the average height of the mountains in each range. (They don't have to be in order. Hint: You may need to find out how to import a ``mean`` function, or else calculate it by hand.)
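
Both routes to the average can be sketched as below, using the four Karakoram heights that appear in this notebook's output for mountains.csv. The variable names are illustrative.

```python
from statistics import mean

values = [8611, 8080, 8051, 8035]  # Karakoram heights from mountains.csv

# By hand: the sum divided by the number of values.
by_hand = sum(values) / len(values)

print(mean(values), by_hand)
```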

Return the dictionary object with the ranges and their lists of mountain heights after all the printing.

**Show that this code works with the other file, "highest_mountains.csv" too.**


In [17]:
# Using Counter()
from collections import Counter
from collections import defaultdict
from statistics import mean

# define your dicts inside the function, so they can be re-used each time it is called.
def mountain_ranges(filename):
    mountains = reading_file(filename)
    if mountains is None:
        # reading_file already printed an "Error: ..." message
        return
    
    ranges = []
    heights = defaultdict(list)
    
    for mountain in mountains:
        heights[mountain[2]].append(mountain[1])
        ranges.append(mountain[2])
    ranges = Counter(ranges)
    print("The two top ranges are:", ranges.most_common(2), "\n")
            
    for range_name in heights:
        print(f"Average height of the mountains in {range_name}'s range: {mean(heights[range_name])}")
            
    return heights

In [18]:
mountain_ranges('data_files/mountains.csv')

The two top ranges are: [('Himalayas', 10), ('Karakoram', 4)] 

Average height of the mountains in Himalayas's range: 8321
Average height of the mountains in Karakoram's range: 8194.25


defaultdict(list,
            {'Himalayas': [8848,
              8586,
              8516,
              8485,
              8201,
              8167,
              8163,
              8126,
              8091,
              8027],
             'Karakoram': [8611, 8080, 8051, 8035]})

In [19]:
# Show your code working.  Points for the right messages with values.
filepath = "./data_files/mountains.csv"
mountain_ranges(filepath)

The two top ranges are: [('Himalayas', 10), ('Karakoram', 4)] 

Average height of the mountains in Himalayas's range: 8321
Average height of the mountains in Karakoram's range: 8194.25


defaultdict(list,
            {'Himalayas': [8848,
              8586,
              8516,
              8485,
              8201,
              8167,
              8163,
              8126,
              8091,
              8027],
             'Karakoram': [8611, 8080, 8051, 8035]})

In [20]:
## Testing the output contains values we expect from the counts and means.

from unittest import mock
from io import StringIO
import sys

with mock.patch('sys.stdout', new_callable=StringIO):
    mountain_ranges("data_files/mountains.csv")
    assert "8321" in sys.stdout.getvalue()
    assert "10" in sys.stdout.getvalue()

In [21]:
# Testing your output for the grade. Ignore this. Handgrading of the printed output.

filepath = "./data_files/mountains.csv"
result = mountain_ranges(filepath)
assert result['Karakoram'] == [8611, 8080, 8051, 8035]

The two top ranges are: [('Himalayas', 10), ('Karakoram', 4)] 

Average height of the mountains in Himalayas's range: 8321
Average height of the mountains in Karakoram's range: 8194.25


In [22]:
# Show your code works with the other file, highest_mountains too. 
# Fix the path!
filepath = "./data_files/highest_mountains.csv"
mountain_ranges(filepath)


The two top ranges are: [('Mahalangur Himalaya', 12), ('Baltoro Karakoram', 10)] 

Average height of the mountains in Mahalangur Himalaya's range: 7866.083333333333
Average height of the mountains in Baltoro Karakoram's range: 7820
Average height of the mountains in Kangchenjunga Himalaya's range: 7647.166666666667
Average height of the mountains in Dhaulagiri Himalaya's range: 7585.142857142857
Average height of the mountains in Manaslu Himalaya's range: 7975.666666666667
Average height of the mountains in Nanga Parbat Himalaya's range: 8126
Average height of the mountains in Annapurna Himalaya's range: 7651.4
Average height of the mountains in Jugal Himalaya's range: 8027
Average height of the mountains in Hispar Karakoram's range: 7513.9
Average height of the mountains in Masherbrum Karakoram's range: 7520
Average height of the mountains in Garhwal Himalaya's range: 7488.2
Average height of the mountains in Batura Karakoram's range: 7567.5
Average height of the mountains in Rakaposh

defaultdict(list,
            {'Mahalangur Himalaya': [8848,
              8516,
              8485,
              8188,
              7952,
              7864,
              7804,
              7543,
              7350,
              7321,
              7309,
              7213],
             'Baltoro Karakoram': [8611,
              8080,
              8051,
              8034,
              7946,
              7932,
              7545,
              7410,
              7315,
              7276],
             'Kangchenjunga Himalaya': [8586, 7711, 7462, 7412, 7362, 7350],
             'Dhaulagiri Himalaya': [8167, 7751, 7661, 7618, 7385, 7268, 7246],
             'Manaslu Himalaya': [8163, 7893, 7871],
             'Nanga Parbat Himalaya': [8126],
             'Annapurna Himalaya': [8091, 7937, 7555, 7455, 7219],
             'Jugal Himalaya': [8027],
             'Hispar Karakoram': [7884,
              7823,
              7790,
              7577,
              7492,
              

## Congratulations, now you are doing basic data science!