# literary fiction, part 2
When last we left Lalama, she had figured out how to load her csv file into the `reader` and parse all the lines.

But there were still a bunch of useless lines at the top of the file.

Let's get that file back in memory now.

```Python
import urllib.request
import csv

url_for_file = "https://raw.githubusercontent.com/NickleDave/EWIN-coding-bootcamp/master/Python/Wiltshire3_means.csv"
with urllib.request.urlopen(url_for_file) as response:
    # read the raw bytes, decode them to text, and split the text into a list of lines
    csv_file = response.read().decode('utf-8').splitlines()
```

Let's look again at the first seven items (indices 0 through 6) in the list of lines from the file.

```Python
for index, line in enumerate(csv_file[0:7]):
    print("line {0} is: {1}".format(index, line))
```

```
line 0 is: "// Phenotype data set from Mouse Phenome Database (phenome.jax.org)",,,,,,,,
line 1 is: "// Data set: Wiltshire3    Title: Drug study: Neurobiochemical analytes in response to chronic fluoxetine treatment in males of 30 inbred mouse strains    Year: 2011",,,,,,,,
line 2 is: "// List of strain means and summary statistics",,,,,,,,,,,
line 3 is: "// For more info on these data visit phenome.jax.org and type Wiltshire3 into search box",,,,,,,,,,
line 4 is: ,,,,,,,,,,,
line 5 is: measnum,varname,strain,sex,mean,sd,sem,nmice,cv,zscore,,,
line 6 is: 38101,ACTH_cont,"129S1/SvImJ",m,1.80,0.0660,0.0381,3,0.0366,-1.40
```


Notice the following:

* Lines 0-3 are comments that begin with `"//` (the comment text is wrapped in quotes)

* Line 4 was just a bunch of commas

* Line 5 is the **header**, the row of a csv file that tells us the name of each column of values.

* So really we want to skip lines 0-5, although we might want to keep line 5, the header, for later.
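Instead of hard-coding how many lines to skip, we could try to detect the junk lines. Below is a minimal sketch, assuming that every comment line starts with `//` (possibly after a quote), that the blank-ish lines contain only commas, and that the first line that is neither is the header. That holds for the lines printed above, but it is an assumption about this one file. `is_junk` and `split_header_and_data` are hypothetical helper names, not part of any library.

```python
import csv

# Lines copied from the output above (line 1 left out for brevity)
sample_lines = [
    '"// Phenotype data set from Mouse Phenome Database (phenome.jax.org)",,,,,,,,',
    '"// List of strain means and summary statistics",,,,,,,,,,,',
    '"// For more info on these data visit phenome.jax.org and type Wiltshire3 into search box",,,,,,,,,,',
    ',,,,,,,,,,,',
    'measnum,varname,strain,sex,mean,sd,sem,nmice,cv,zscore,,,',
    '38101,ACTH_cont,"129S1/SvImJ",m,1.80,0.0660,0.0381,3,0.0366,-1.40',
]


def is_junk(line):
    """Assumed rule: a comment starts with // (maybe after a quote), or the line is only commas."""
    stripped = line.strip()
    return stripped.lstrip('"').startswith("//") or stripped.strip(",") == ""


def split_header_and_data(lines):
    """Return (column names, remaining lines), treating the first non-junk line as the header."""
    for position, line in enumerate(lines):
        if not is_junk(line):
            # parse the header with csv so quoting is handled, then drop the
            # empty names left over from trailing commas
            header = [name for name in next(csv.reader([line])) if name]
            return header, lines[position + 1:]
    return [], []  # no header found


header, data_lines = split_header_and_data(sample_lines)
print(header)
# → ['measnum', 'varname', 'strain', 'sex', 'mean', 'sd', 'sem', 'nmice', 'cv', 'zscore']
print(len(data_lines))  # → 1
```

If a different Jackson Labs file uses another comment style, `is_junk` is the one place that would need to change.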

Also, let's talk about what we just did in the cell above.

## enumerate

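When you call `enumerate` on an iterable (such as a list, a string, or a `csv.reader`), it returns an iterator that yields `(count, item)` pairs. The count starts at 0 unless you pass a different `start` value.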

https://docs.python.org/3.5/library/functions.html#enumerate
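A small sketch of what `enumerate` actually hands back; the list `letters` is just a made-up example.

```python
letters = ["a", "b", "c"]

# each item enumerate yields is a (count, item) tuple; the count starts at 0
pairs = list(enumerate(letters))
print(pairs)  # → [(0, 'a'), (1, 'b'), (2, 'c')]

# the start argument changes the first count
for number, letter in enumerate(letters, start=1):
    print(number, letter)  # prints 1 a, then 2 b, then 3 c

# enumerate works on any iterable, not only sequences, e.g. a generator
squares = (n * n for n in range(3))
print(list(enumerate(squares)))  # → [(0, 0), (1, 1), (2, 4)]
```

Unpacking the tuple in the `for` statement (`for index, line in ...`) is what lets the loop above use `index` and `line` as separate names.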

This is useful, e.g., when you need to modify every item in a list in place:

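```Python
list_of_grades = [72, 85, 90]  # example grades
POINTS_FOR_CURVE = 20
for ind, val in enumerate(list_of_grades):
    # write the curved grade back into the same position in the list
    list_of_grades[ind] = val + POINTS_FOR_CURVE
```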

## format

In the bad old days, programmers used a function called `sprintf` to write formatted data to a string.

In our example above, we aren't really formatting the data; `format` just drops `index` and `line` into the `{0}` and `{1}` slots of the string.

An example of formatting would be "show this number in scientific notation with only 2 significant digits".

Python strings (`str` objects) now have a `format` method that makes formatting more human-readable.

Here's a good page on how `format` works: https://pyformat.info/
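A few format specifications to make this concrete, including the "scientific notation with 2 significant digits" case mentioned above. The number `12345.678` is an arbitrary example value.

```python
value = 12345.678

# two significant digits in scientific notation: one digit before the point, one after
sci = "{:.1e}".format(value)
print(sci)  # → 1.2e+04

# fixed point with two decimal places
fixed = "{:.2f}".format(value)
print(fixed)  # → 12345.68

# numbered fields, like the enumerate loop earlier in this notebook
message = "line {0} is: {1}".format(4, ",,,")
print(message)  # → line 4 is: ,,,

# named fields plus padding: right-align the index in a 3-character column
padded = "line {index:>3} is: {text}".format(index=4, text="measnum")
print(padded)  # → line   4 is: measnum
```

On Python 3.6 and later, f-strings such as `f"{value:.1e}"` accept the same format specifications.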

Okay, so we want to skip the first six lines (lines 0-5).

Remember that we can get the next line out of the reader by passing the reader (an iterator) to the built-in `next` function.

So we could do this:

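```Python
with urllib.request.urlopen(url_for_file) as response:
    csv_file = response.read().decode('utf-8').splitlines()

reader = csv.reader(csv_file)
for skip_index in range(0, 6):
    next(reader)  # discard lines 0-5; the next call returns line 6, the first data row
```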
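A self-contained sketch of the same idea that doesn't need the network. It assumes a simplified stand-in for `csv_file` (shortened comment lines, only the first four columns) and, unlike the cell above, keeps the header instead of throwing it away.

```python
import csv

# Simplified stand-in for csv_file: four comment lines, a comma-only line,
# the header, and one data row (columns trimmed for brevity)
sample_lines = [
    '"// comment 0",,,',
    '"// comment 1",,,',
    '"// comment 2",,,',
    '"// comment 3",,,',
    ',,,,',
    'measnum,varname,strain,sex',
    '38101,ACTH_cont,"129S1/SvImJ",m',
]

reader = csv.reader(sample_lines)

# throw away lines 0-4 (the comments and the comma-only line)
for _ in range(5):
    next(reader)

# line 5 is the header; keep it this time
header = next(reader)
rows = list(reader)  # everything left is data

print(header)  # → ['measnum', 'varname', 'strain', 'sex']
print(rows)    # → [['38101', 'ACTH_cont', '129S1/SvImJ', 'm']]
```

Note that the reader strips the quotes around `129S1/SvImJ` for us.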

How many commas are we looking for?

```Python
len(csv_file[4])
```

```
11
```
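`len` counts every character, so it only equals the number of commas because line 4 contains nothing but commas. A quick sketch contrasting `len` with `str.count`, using line 4 and the header line exactly as printed earlier:

```python
# line 4 of the file as printed above: 11 commas and nothing else
separator_line = "," * 11

print(len(separator_line))        # → 11 (counts every character)
print(separator_line.count(","))  # → 11 (counts only commas)

# on a line with other text, the two numbers differ
header_line = "measnum,varname,strain,sex,mean,sd,sem,nmice,cv,zscore,,,"
print(len(header_line), header_line.count(","))  # → 57 12
```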

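```Python
def parse_jackson_csv(csv_string):
    """
    Parses csv files from the Jackson Labs website.
    Deals with some of the idiosyncrasies that csv.Sniffer doesn't recognize by
    dropping everything up to and including the line of commas before the header.
    Expects the whole file as a single string.
    """
    # end of the last comment line (10 commas) followed by the comma-only line (11 commas)
    SEPARATOR_BEFORE_HEADER = ",,,,,,,,,,\n,,,,,,,,,,,\n"
    csv_string = csv_string.replace("\r\n", "\n")  # normalize Windows line endings
    index = csv_string.rfind(SEPARATOR_BEFORE_HEADER)
    if index == -1:
        return csv_string  # separator not found; leave the text unchanged
    new_start_index = index + len(SEPARATOR_BEFORE_HEADER)  # first character of the header
    return csv_string[new_start_index:]
```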

```Python
# csv_file is a list of lines, so join it back into one string first
csv_string = parse_jackson_csv("\n".join(csv_file))
```

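```Python
dialect = csv.Sniffer().sniff(csv_string)
# csv.reader needs an iterable of lines; a plain string would be read one character at a time
reader = csv.reader(csv_string.splitlines(), dialect)
thing = list(reader)
```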

```Python
thing
```
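To see what `csv.Sniffer` works out on its own, here is a small offline sketch. It assumes the text left after cutting off the junk lines looks like the header (without its trailing commas) plus the first data row shown earlier. `sniff` raises `csv.Error` if it cannot determine a delimiter, so on messier files you may need to pass a `delimiters` string or skip sniffing.

```python
import csv

# header (trailing commas removed) and first data row, as printed earlier
sample_text = (
    "measnum,varname,strain,sex,mean,sd,sem,nmice,cv,zscore\n"
    '38101,ACTH_cont,"129S1/SvImJ",m,1.80,0.0660,0.0381,3,0.0366,-1.40\n'
)

sniffer = csv.Sniffer()
dialect = sniffer.sniff(sample_text)
print(dialect.delimiter)  # → ,

# has_header makes a heuristic guess (e.g. numeric columns under text names)
print(sniffer.has_header(sample_text))

# as above, hand csv.reader a list of lines rather than one big string
rows = list(csv.reader(sample_text.splitlines(), dialect))
print(rows[0][:3])  # → ['measnum', 'varname', 'strain']
print(rows[1][2])   # → 129S1/SvImJ
```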

```Python
url_for_file = "http://phenome.jax.org/tmp/Willott1_table.csv"
with urllib.request.urlopen(url_for_file) as response:
    csv_file = response.read().decode('utf-8')
# split into lines so csv.reader sees rows, not single characters
reader = csv.reader(csv_file.splitlines(), delimiter=',', quotechar='"')
```

## the easy way

```
!pip install openpyxl
```