A few things you should keep in mind when working on assignments:

1. Make sure you fill in any place that says `YOUR CODE HERE`. Do **not** write your answer anywhere other than where it says `YOUR CODE HERE`. Anything you write anywhere else will be removed or overwritten by the autograder.

2. Before you submit your assignment, make sure everything runs as expected. In the menubar, select _Kernel_, then restart the kernel and run all cells (_Restart & Run All_).

3. Do not change the title (i.e. file name) of this notebook.

4. Make sure that you save your work (in the menubar, select _File_ → _Save and Checkpoint_).

5. You are allowed to submit an assignment multiple times, but only the most recent submission will be graded.

# Problem 3. Data Formats

In this problem, we read data from a CSV file and write it to a JSON file.

In [None]:
import csv
import json
from pprint import pprint

import os      # used in tests
import random  # used in tests

from nose.tools import assert_equal, assert_is_instance, assert_true

Suppose we have a CSV file with 4 columns: `Year`, `Month`, `DayofMonth`, and `ArrDelay`.

```
Year,Month,DayofMonth,ArrDelay
2001,1,17,-3
2001,1,18,4
2001,1,19,23
2001,1,20,10
2001,1,20,20
```

Note that this CSV file is the same file we used in [Week 2 Problem 3](https://github.com/UI-DataScience/accy570-fa16/blob/master/Week2/assignments/Problem_3_Pandas.ipynb). `ArrDelay` is the arrival delay, in minutes, of a flight on the given date: a negative value means the flight arrived early, and a positive value means it arrived late. So the first row says that on January 17, 2001, a flight arrived 3 minutes earlier than scheduled; the second row says that on January 18, 2001, a flight arrived 4 minutes late; and so on. For simplicity, we suppose that the CSV file has only 5 rows of data, although real-world data would have many more.
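The sign convention for `ArrDelay` can be made concrete with a small sketch. `describe_row` is a hypothetical helper, not part of the assignment; it assumes each row is already a list of four strings in the order `Year`, `Month`, `DayofMonth`, `ArrDelay`, with the delay measured in minutes.

```python
from datetime import date


def describe_row(row):
    """Describe a [Year, Month, DayofMonth, ArrDelay] row of strings in words.

    Hypothetical helper for illustration only; it assumes ArrDelay is in
    minutes, with negative values meaning early and positive values late.
    """
    year, month, day, delay = (int(value) for value in row)
    when = date(year, month, day).isoformat()  # e.g. '2001-01-17'
    if delay < 0:
        return '{}: arrived {} minutes early'.format(when, -delay)
    if delay > 0:
        return '{}: arrived {} minutes late'.format(when, delay)
    return '{}: arrived on time'.format(when)


print(describe_row(['2001', '1', '17', '-3']))  # 2001-01-17: arrived 3 minutes early
print(describe_row(['2001', '1', '18', '4']))   # 2001-01-18: arrived 4 minutes late
```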


In the following cell, we will use Python to create the CSV file and name it `sample.csv`. (For simplicity, we won't include a header row this time.)

In [None]:
csv_text = """2001,1,17,-3
2001,1,18,4
2001,1,19,23
2001,1,20,10
2001,1,20,20"""

with open('sample.csv', 'w') as f:
    f.write(csv_text)

In the following code cell, we use the IPython `%cat` magic, which displays the contents of a file, to verify that the CSV file was created successfully.
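`%cat` relies on IPython's `cat` alias, which, as far as we know, is only defined on Unix-like systems such as Linux and macOS. A portable plain-Python substitute might look like the sketch below; `show_file` is a hypothetical name, and the demo writes to a temporary file so that it leaves `sample.csv` untouched.

```python
import os
import tempfile


def show_file(filename):
    """Print a text file's contents (like %cat) and also return them."""
    with open(filename, 'r') as f:
        contents = f.read()
    print(contents)
    return contents


# Demo on a throwaway file so this example does not touch sample.csv.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write('2001,1,17,-3\n2001,1,18,4')
shown = show_file(tmp.name)
os.remove(tmp.name)  # clean up the temporary file
```

In the notebook, `show_file('sample.csv')` should then print the same five lines as `%cat sample.csv`.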

In [None]:
%cat sample.csv

## Write a function named `read_csv` that reads a CSV file and returns a list of lists of strings that can be written to a JSON file.

To describe what this function is supposed to do, let's use `sample.csv` as an example.

```python
>>> %cat sample.csv
```
```
2001,1,17,-3
2001,1,18,4
2001,1,19,23
2001,1,20,10
2001,1,20,20
```

When we pass `'sample.csv'` to the `read_csv` function and pretty-print the result,

```python
>>> data = read_csv('sample.csv')
>>> pprint(data)
```

the output should be

```
[['2001', '1', '17', '-3'],
 ['2001', '1', '18', '4'],
 ['2001', '1', '19', '23'],
 ['2001', '1', '20', '10'],
 ['2001', '1', '20', '20']]
```

Note that each value is read from the CSV file as a string, and each line becomes a list of strings. The whole file is therefore a list of these lists, so the function returns a list of lists of strings.
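Because every value is a string, arithmetic on the raw values does not behave numerically. The sketch below uses two rows typed in by hand (copied from the expected output above) rather than reading the file:

```python
# Two rows copied from the expected output of read_csv above.
rows = [['2001', '1', '17', '-3'],
        ['2001', '1', '18', '4']]

# Every value is a str, so + concatenates the ArrDelay values...
as_text = rows[0][3] + rows[1][3]
print(as_text)  # -34

# ...while converting with int() first gives the numeric sum.
as_numbers = int(rows[0][3]) + int(rows[1][3])
print(as_numbers)  # 1
```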

Once the data is in this format, we can write it to a JSON file simply by dumping it:

```python
>>> with open('sample.json', 'w') as fout:
...     json.dump(data, fout)
```
```python
>>> %cat sample.json
```
```
[["2001", "1", "17", "-3"], ["2001", "1", "18", "4"], ["2001", "1", "19", "23"], ["2001", "1", "20", "10"], ["2001", "1", "20", "20"]]
```

Note that JSON strings are enclosed in double quotes, and by default `json.dump` writes everything on a single line.
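To check what would end up in the file without opening it, `json.dumps` returns as a string the same text that `json.dump` writes. The sketch below, using two hand-typed rows, also shows the `indent` parameter, which spreads the output over multiple lines:

```python
import json

rows = [['2001', '1', '17', '-3'],
        ['2001', '1', '18', '4']]

# Compact form: this is the text json.dump(rows, fout) would write.
compact = json.dumps(rows)
print(compact)  # [["2001", "1", "17", "-3"], ["2001", "1", "18", "4"]]

# indent=2 changes only the layout; the data itself is unchanged.
pretty = json.dumps(rows, indent=2)
print(pretty)
```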

In [None]:
def read_csv(filename):
    """
    Reads a CSV file and returns a list of lists of strings.
    
    Parameters
    ----------
    filename: A string, e.g. 'sample.csv', 'airports.csv'.
    
    Returns
    -------
    A list of lists of strings.
    """
    
    # YOUR CODE HERE
    
    return result

In [None]:
data = read_csv('sample.csv')
pprint(data)

In the following code cell, we use the `json` library to check that `data` can be written to a JSON file and read back. Since we simply dump the data and load it again without changing anything in between, the output of this cell should be identical to the output of the previous cell.
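The round trip is lossless here because JSON has direct equivalents for Python lists and strings. That does not hold for every Python type: JSON has no tuple type, so tuples come back as lists. The sketch below illustrates this with hand-typed rows; it is worth keeping in mind because the tests compare the output of `read_csv` against lists.

```python
import json

rows = [['2001', '1', '17', '-3'], ['2001', '1', '18', '4']]
# Lists of strings survive a dump/load round trip unchanged.
same = json.loads(json.dumps(rows))
print(same == rows)  # True

# JSON has no tuple type, so a tuple row is restored as a list.
tuple_rows = [('2001', '1', '17', '-3')]
restored = json.loads(json.dumps(tuple_rows))
print(restored)                # [['2001', '1', '17', '-3']]
print(restored == tuple_rows)  # False
```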

In [None]:
with open('sample.json', 'w') as fout:
    json.dump(data, fout)
    
with open('sample.json', 'r') as fin:
    data = json.load(fin)

pprint(data)

In [None]:
assert_is_instance(data, list)
assert_equal(len(data), 5)
assert_equal(data[0], ['2001', '1', '17', '-3'])
assert_equal(data[1], ['2001', '1', '18', '4'])
assert_equal(data[2], ['2001', '1', '19', '23'])
assert_equal(data[3], ['2001', '1', '20', '10'])
assert_equal(data[4], ['2001', '1', '20', '20'])

# if the function can only handle a specific case (sample.csv)
# and cannot handle other CSV files, the following test will fail.
def make_random_line():
    return [str(i) for i in random.sample(range(10000), 10)]

test_csv = [make_random_line() for _ in range(10)]

with open('test.csv', 'w') as f:
    for line in test_csv:
        text = ','.join(line)
        f.write('{}\n'.format(text))

test_data = read_csv('test.csv')
assert_equal(test_csv, test_data)
os.remove('test.csv')