Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE`/`raise NotImplementedError` or "YOUR ANSWER HERE", as well as your name and collaborators below:

## Exercises on Representing General Data Sets

### References

In addition to Chapter 3 of the class textbook, you might find the following references helpful:

- Discovering Computer Science, Jessen Havill, Sections 8.3 and 9.1
- Online CS1 textbook: https://runestone.academy/runestone/books/published/thinkcspy/Lists/toctree.html and https://runestone.academy/runestone/books/published/thinkcspy/Dictionaries/toctree.html

### Exercises

Note that you will lose significant credit for any function that does not have a **docstring**.  If you do not recall how or where docstrings are placed in functions, see, for example:  

- Discovering Computer Science, Jessen Havill, Chapter 3, Section 3.4
- https://docs.python-guide.org/writing/documentation/#writing-docstrings
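
As a quick reminder, a docstring is a string literal placed as the first statement in the function body. The function `fahrenheit_to_celsius` below is a hypothetical example and is not part of this assignment; the layout of the docstring is one reasonable style, not a required one.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius.

    Parameter:
        temp_f: a number giving the temperature in degrees Fahrenheit

    Return value:
        the equivalent temperature in degrees Celsius, as a float
    """
    return (temp_f - 32) * 5 / 9


# The docstring is stored on the function and is what help() displays.
print(fahrenheit_to_celsius.__doc__)
```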

In [None]:
import os
import os.path
import io
import sys
from contextlib import redirect_stdout

datadir = "publicdata"

**Q1:** Write a function

    readBabynames2DoL(path)

that reads from the file at location `path` and returns a dictionary mapping from column names to lists containing the data in those columns. The file is a *CSV file*.  The first line is a comma-separated list of the string names of the columns in the file, in this case `year,name,count`.  Each subsequent line holds the comma-separated values for one row of data: the first value is the year, the second is the string name, and the third is the integer count.
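
To make the line format concrete, the sketch below parses a single data line held in a string. The variable names are illustrative, and the sample line is built from values that appear in the testing cell for Q2; opening and looping over the file is left to your function.

```python
line = "1880,Mary,7065\n"   # one data line as it might be read from the file

# Remove the trailing newline, then split on commas to get the three fields.
fields = line.strip().split(",")

# Every field is still a string at this point, so convert the numeric ones.
year = int(fields[0])
name = fields[1]
count = int(fields[2])

print(fields)
print(year, name, count)
```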

Your function will accumulate a dictionary mapping from `year` to the list of years, from `name` to the list of names (in the same order), and from `count` to the list of counts (in the same order).  Make sure you convert `year` and `count` to integers.
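
For orientation, here is what a Dictionary of Lists with this shape could look like for just two rows. The name `example` is hypothetical, and the two rows reuse values from the Q2 testing cell rather than the full file.

```python
example = {
    "year": [1880, 2018],
    "name": ["Mary", "Emma"],
    "count": [7065, 18688],
}

# Values at the same index in each list belong to the same row of the file.
i = 1
print(example["year"][i], example["name"][i], example["count"][i])

# A whole column is available with a single lookup.
print(example["name"])
```

Because the three lists are kept in the same order, index `i` identifies one row across all of them.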

In the `datadir` directory, there are files `topfemale.csv` and `topmale.csv` that are formatted in this way.  In the testing cell, we use your `readBabynames2DoL()` function, assign to variable `females` the dictionary obtained from `topfemale.csv`, and assign to variable `males` the dictionary obtained from `topmale.csv`. Note that, unlike the example in the reading, `sex` is not a variable.

In [None]:
# Solution cell

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Testing cell

femalepath = os.path.join(datadir, "topfemale.csv")
females = readBabynames2DoL(femalepath)
assert len(females) == 3
assert len(females['year']) == 139
assert 1880 in females['year']
assert 2018 in females['year']
assert 'Mary' in females['name']
assert 'Emma' in females['name']
assert 6919 in females['count']


malepath = os.path.join(datadir, "topmale.csv")
males = readBabynames2DoL(malepath)
assert len(males) == 3
assert len(males['count']) == 139
assert 1880 in males['year']
assert 2018 in males['year']
assert 'John' in males['name']
assert 'Liam' in males['name']
assert 8769 in males['count']


**Q2:** Write a function

    readNamesCount2LoL(path)

that reads from the file at location `path` and returns a List of Lists (where inner lists are rows of the data set).  The file is a *CSV file*.  The first line is a comma-separated list of the string names of the columns in the file, in this case `year,name,count`.  Each subsequent line holds the comma-separated values for one row of data: the first value is the year (the independent variable), the second is the string name, and the third is the integer count. Be sure to convert `year` and `count` to integers.
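
For comparison with the Dictionary of Lists, the following sketch shows the List of Lists shape for two rows. The names `header`, `rows`, and `counts` are illustrative only, and the rows reuse values from the testing cell below.

```python
header = ["year", "name", "count"]
rows = [
    [1880, "Mary", 7065],
    [2018, "Emma", 18688],
]

# rows[i] is one complete row; rows[i][j] is the value in column header[j].
print(rows[0])
print(header[1], "=", rows[0][1])

# Extracting a whole column takes a pass over all of the rows.
counts = [row[2] for row in rows]
print(counts)
```

Here a row is a single lookup and a column requires a loop, which is the reverse of the Dictionary of Lists.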

Your function should return both a list of the column names, and the list of lists, in that order.
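
If returning two values from one function is unfamiliar, the hypothetical helper `first_and_rest` below shows the mechanics: the values are listed after `return` separated by a comma, and the caller unpacks them in the same order.

```python
def first_and_rest(items):
    """Return the first element of items and a list of the remaining ones."""
    return items[0], items[1:]


# The caller receives the two values in the order they were returned.
head, tail = first_and_rest(["year", "name", "count"])
print(head)
print(tail)
```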

In the `datadir` directory, there are files `topfemale.csv` and `topmale.csv` that are formatted in this way. In the testing cell, we use your `readNamesCount2LoL()` function to extract the header (`fheader`) and the List of Lists (`females`) obtained from `topfemale.csv`. Similarly, we extract `mheader` and `males` from `topmale.csv`.

In [None]:
# Solution cell

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Testing cell

femalepath = os.path.join(datadir, "topfemale.csv")
fheader, females = readNamesCount2LoL(femalepath)
assert fheader == ['year','name','count']
assert len(females) == 139
assert [1880,'Mary',7065] in females
assert [2018,'Emma',18688] in females
assert females[0][1] == 'Mary'
assert females[138][0] == 2018
assert females[137][2] == 19800


malepath = os.path.join(datadir, "topmale.csv")
mheader, males = readNamesCount2LoL(malepath)
assert mheader == ['year','name','count']
assert len(males) == 139
assert [1880,'John',9655] in males
assert [2018,'Liam',19837] in males
assert males[0][1] == 'John'
assert males[138][0] == 2018
assert males[137][2] == 18798

**Q3:** Write a function

    convertDoL2LoL(D)

that converts from a Dictionary of Lists representation in `D` to the equivalent List of Lists representation, returning both the list of column names, as given by the keys of `D`, and a list of lists storing the data in the values of `D`. Note that, because we should not assume a particular order of mappings in a dictionary, the order of fields in the column names and in the rows of the data set is not known in advance. But as long as every row uses the same field order as the list of column names, the conversion is valid. Hint: a list comprehension might come in handy.
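
The consistency requirement can be illustrated on a two-row dictionary. This is only a sketch for a single row, with hypothetical names `columns` and `row0`: the idea is to fix one ordering of the keys and then visit the columns in that same order whenever a row is built.

```python
D = {
    "year": [1880, 2018],
    "name": ["Mary", "Emma"],
    "count": [7065, 18688],
}

# Fix one ordering of the columns and reuse it everywhere.
columns = list(D.keys())

# Build a single row (here row 0) by visiting the columns in that same order.
row0 = [D[col][0] for col in columns]

print(columns)
print(row0)
```

Whatever order `columns` turns out to have, `row0[j]` is the value for `columns[j]`, which is what makes the result consistent.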

In the testing cell, we use your function on `topfemale.csv` and `topmale.csv` as above, and we also rely on your function `readBabynames2DoL(path)` from above.

In [None]:
# Solution cell

# YOUR CODE HERE
raise NotImplementedError()
    


In [None]:
# Testing cell

femalepath = os.path.join(datadir, "topfemale.csv")
femaleDict = readBabynames2DoL(femalepath)
fheader, females = convertDoL2LoL(femaleDict)
assert len(females) == 139
assert [1880,'Mary',7065] in females
assert [2018,'Emma',18688] in females


malepath = os.path.join(datadir, "topmale.csv")
malesDict = readBabynames2DoL(malepath)
mheader, males = convertDoL2LoL(malesDict)
assert len(males) == 139
assert [1880,'John',9655] in males
assert [2018,'Liam',19837] in males


**Q4:** Write a function

    convertLoL2DoL(columns, data)
    
that converts a List of Lists representation to a Dictionary of Lists `D` and returns `D`. Here, `columns` is a list of column names (which will become the keys in `D`), and `data` is a list of row lists.
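
One way to think about the relationship between `columns` and `data` is that the position of a name in `columns` is also the position of its value within every row. The sketch below shows this for a single column on two illustrative rows; the variable `names` is hypothetical.

```python
columns = ["year", "name", "count"]
data = [
    [1880, "Mary", 7065],
    [2018, "Emma", 18688],
]

# enumerate pairs each column name with its position within a row.
for j, col in enumerate(columns):
    print(j, col)

# Collecting position 1 of every row gives the values for columns[1].
names = [row[1] for row in data]
print(columns[1], names)
```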

In the testing cell, we use your function on `topfemale.csv` and `topmale.csv` as above, and we also rely on your function `readNamesCount2LoL(path)` from above.

In [None]:
# Solution cell

# YOUR CODE HERE
raise NotImplementedError()
    


In [None]:
# Testing cell

femalepath = os.path.join(datadir, "topfemale.csv")
fheader, femalesList = readNamesCount2LoL(femalepath)
females = convertLoL2DoL(fheader, femalesList)
assert len(females) == 3
assert len(females['year']) == 139
assert 1880 in females['year']
assert 2018 in females['year']
assert 'Mary' in females['name']
assert 'Emma' in females['name']
assert 6919 in females['count']


malepath = os.path.join(datadir, "topmale.csv")
mheader, malesList = readNamesCount2LoL(malepath)
males = convertLoL2DoL(mheader, malesList)
assert len(males) == 3
assert len(males['count']) == 139
assert 1880 in males['year']
assert 2018 in males['year']
assert 'John' in males['name']
assert 'Liam' in males['name']
assert 8769 in males['count']
