This is a Jupyter notebook! Some basics to know:

1. You can type directly into the page and execute your code by pressing `Shift+Enter`.
2. There is a test function right below each function you're asked to implement. Don't change these test functions! When you execute your code block, you'll immediately know whether you wrote your code correctly.
3. If your code is incorrect, you can see where it failed in the error message.
4. You can choose to run all of your code blocks at once by going up to the menu and selecting Cell > Run All.

Good luck and let us know if you have any questions!

Make sure to run this block first.

In [None]:
import csv, sys

# Using data structures and files

This exercise should warm you up on how the `dictionary` data structure can be used.

Dictionaries, or hashmaps, allow for a way to associate a 'key' with a 'value'. 
Imagine it like an index of a textbook. You know what topic you want to look 
for (the key), and you want a list of where it can be found (the value).
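
As a quick illustration of keys and values, here is a minimal sketch using a made-up index (the topic names and page numbers are hypothetical and are not the exercise data). It shows a plain lookup, a membership test with `in`, and `dict.get` with a default for a key that may be missing.

```python
# Hypothetical index, unrelated to the exercise data below
index = {'loops': [12, 40], 'strings': [7, 12, 31]}

# Look up the value stored under a key that exists
pages = index['loops']

# Check whether a key is present before using it
has_recursion = 'recursion' in index

# .get returns the given default instead of raising KeyError
fallback = index.get('recursion', [])

print(pages, has_recursion, fallback)
```

Looking up a missing key with square brackets raises a `KeyError`, which is why the membership test or `.get` is useful when a topic might not be in the index.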

For this exercise, I want you to take the idea of a book index and find
the pages on which two different topics both appear. For example, if our book index looked like:
    apples: [2, 5, 64, 66, 70]
    oranges: [3, 6, 63, 64, 70]
    grapes: [3, 4, 5, 50, 64]

and we called the function with 'apples' and 'oranges' as our topics, the function should return
[64, 70]
If one of the queried topics is not in book_index, you should return an empty list (`[]`), which is what the test below expects.


You may find some help from these docs: https://docs.python.org/3/tutorial/datastructures.html#dictionaries

In [None]:
def dictionary_exercise(book_index, topic1, topic2):
    return False

# Do not change anything below this line
def test_dictionary():
    print('Test dictionary function')
    book_index = {
        'apples': [2, 5, 64, 66, 70],
        'oranges': [3, 6, 63, 64, 70],
        'grapes': [3, 4, 5, 50, 64]
    }
    assert (dictionary_exercise(book_index, 'apples', 'oranges') == [64,70])
    assert (dictionary_exercise(book_index, 'apples', 'bananas') == [])
    
test_dictionary()

.CSV exercise
(CSV files are like raw versions of Excel files: they are tabulated using commas and newlines.)

One awesome part of Python, as with many other languages, is that it can read in files,
parse their data, and return information.

For example, if we had a file that contained your grade history from high school, you
would be able to calculate metrics such as your GPA just by specifying which file to use.

In this case, I want you to calculate the GPA from files whose rows are in the format
[ClassName, Grade]

Let's make a few assumptions as well:
A-/A/A+ -> 4.0
B-/B/B+ -> 3.0
C-/C/C+ -> 2.0
D-/D/D+ -> 1.0
F       -> 0.0


Our parameter, csvfile, is a string that has the file name. In order to access its contents, 
you'll have to open the file to expose a file object. Then, you'll have to create a csv reader 
object and read the file line-by-line.

You may find some help from these docs:
- `with open('filename', 'r') as f`
- csv reader objects and their available functions - https://docs.python.org/2/library/csv.html
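
The reading steps described above can be sketched as follows. To keep the example self-contained, it reads from an in-memory string via `io.StringIO` rather than a real file; the two rows are made up and are not from `gpadata.csv`. With a real file, you would pass the object returned by `open` to `csv.reader` instead.

```python
import csv
import io

# Hypothetical two-column data standing in for a file on disk.
# With a real file: with open(csvfile, 'r') as f: reader = csv.reader(f)
sample = io.StringIO("Algebra,A\nHistory,B+\n")

rows = []
reader = csv.reader(sample)
for row in reader:
    # Each row is a list of strings, one entry per column
    rows.append(row)

print(rows)
```

Note that every value comes back as a string, so any numeric column has to be converted (for example with `float`) before doing arithmetic on it.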

In [None]:
def calculate_GPA_CSV(csvfile):
    # This is a default return value for this function. You'll want to change this!
    return 0

# Do not change anything below this line
def test_calculate_GPA_CSV():
    print('Test calculate GPA function')
    gpa = calculate_GPA_CSV('gpadata.csv')
    assert (round(gpa, 2) == 3.52)

test_calculate_GPA_CSV()

In data science, we want to know not only the average, the median, the maximum, and the minimum of a set of numbers, but also how much those numbers vary.

For this exercise, I'll refer to the array of numbers as our data. Each number in that array is called a data point.

To measure this, we use the concepts of variance and standard deviation. Variance, intuitively, gives us a sense of how far the data points are from the average. If the variance is small, our data is mostly centered around the average, and the average is representative of the data points. If the variance is large, the data varies too much for the average to be representative.

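You can calculate the variance for this exercise in 3 steps.
1. Find the mean (or average).
2. For each data point, calculate its difference from the mean. Square this difference.
3. Sum all of the squared differences you find.

Note that the standard definition of variance goes one step further and divides this sum by the number of data points. This exercise follows the three steps as written.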

Taking the square root of variance yields a measure called standard deviation. Standard deviation is also a measure of how spread out our data points are. It is more often used by statisticians and data scientists to describe the spread of data points.
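
Here is a small worked sketch of the steps above on a tiny made-up list (the numbers are hypothetical, not GDP data). It also shows the division by the number of data points used in the usual population variance, and the square root that gives the corresponding standard deviation.

```python
import math

# Hypothetical data points
data = [2.0, 4.0, 6.0]

# Step 1: the mean
mean = sum(data) / len(data)

# Step 2: squared difference of each data point from the mean
squared_diffs = [(x - mean) ** 2 for x in data]

# Step 3: sum of the squared differences
total = sum(squared_diffs)

# Usual population variance: the average of the squared differences
population_variance = total / len(data)

# Standard deviation: the square root of the variance
population_std = math.sqrt(population_variance)

print(mean, squared_diffs, total)
```

For this list the mean is 4.0, the squared differences are 4.0, 0.0 and 4.0, and their sum is 8.0.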

In this case, we give you a CSV file that has the following format: [Country, GDP]

Using the CSV parsing techniques you've learned above to read this file and its values, fill in the function below so that it calculates the following statistics about countries and their GDP values:
- Average GDP
- Max GDP and which country has that GDP
- Min GDP and which country has that GDP
- Variance
- Standard Deviation

Hints:
- More reading on standard deviation and variance: http://www.mathsisfun.com/data/standard-deviation.html
- If you're interested in where this data came from: http://data.worldbank.org/indicator/NY.GDP.MKTP.CD
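- `sys.float_info.max` is the largest representable float, which makes it a handy starting value when searching for a minimum (`sys` is already imported for you)
- You'll want to store the GDP values you encounter while reading the CSV file in a list to calculate the variance - `list.append`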
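
The last two hints can be sketched on made-up data as follows. The labels and values are hypothetical; the point is the pattern of starting the running minimum at `sys.float_info.max` and collecting values with `append` as you go.

```python
import sys

# Hypothetical (label, value) pairs, not taken from GDPData.csv
readings = [('a', 3.5), ('b', 1.25), ('c', 9.0)]

# Start above any possible value so the first comparison always succeeds
lowest = sys.float_info.max
lowest_label = None
values = []

for label, value in readings:
    # Keep every value for later calculations
    values.append(value)
    # Track the smallest value seen so far and where it came from
    if value < lowest:
        lowest = value
        lowest_label = label

print(lowest_label, lowest, values)
```

The same pattern works for a running maximum by starting from a value below anything in the data and flipping the comparison.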

In [None]:
def calculate_statistics(gdpfile):
    # Default values are set for you
    average = 0

    max_gdp = 0
    min_gdp = sys.float_info.max
    country_with_highest_gdp = 'USA'
    country_with_lowest_gdp = 'USA'

    variance = 0
    standard_deviation = 0

    # Insert your code here!

    return average, max_gdp, min_gdp, country_with_highest_gdp, country_with_lowest_gdp, variance, standard_deviation

# Do not change anything below this line
def test_calculate_statistics():
    print('Test calculate statistics function')
    average, max_gdp, min_gdp, country_with_highest_gdp, country_with_lowest_gdp, variance, standard_deviation = calculate_statistics('GDPData.csv')
    assert (abs(average - 4.452496609*(10 ** 11)) <= 1)
    assert (abs(max_gdp - 1.7946996*(10**13)) <= 1)
    assert (country_with_highest_gdp == "United States")
    assert (abs(min_gdp - 145237022.012) <= 1)
    assert (country_with_lowest_gdp == "Kiribati")
    assert (abs(variance - 4.90159965423*(10**26)) <= 4*(10**13))
    assert (abs(standard_deviation - 2.21395565769*(10**13)) <= 100)

test_calculate_statistics()