# literary fiction, part 2
When last we left Lalama, she had figured out how to load her csv file into the `reader` and parse all the lines.

But there were still a bunch of useless lines at the top of the file.

Let's get that file back in memory now.

In [None]:
import urllib.request

url_for_file = "https://raw.githubusercontent.com/NickleDave/EWIN-coding-bootcamp/master/Python/Wiltshire3_means.csv"
with urllib.request.urlopen(url_for_file) as response:
   csv_file = response.read().decode('utf-8').splitlines()

Let's look again at the first seven items in the list of lines from the file.

In [None]:
for index,line in enumerate(csv_file[0:7]):
    print("line {0} is: {1}".format(index,line))

In [None]:
csv_file[5]

Notice the following:

* Lines 0-3 are comments, and begin with "//"

* Line 4 was just a bunch of commas

* Line 5 looks like it's the **header**, the row of a csv file that tells us the name of each column of values.
  - **Notice that the header row ends with 3 commas. Weird.**

* So really we want to skip lines 0-5, although we might want line 5 (the header) for later.

Also, let's talk about what we just did in the cell above.

## enumerate

When you pass `enumerate` a `sequence`, it returns an `iterator` that yields a pair, a `counter` and the next item from the sequence, each time the `next` method is called.

https://docs.python.org/3.5/library/functions.html#enumerate

This is useful, e.g., when you need to modify every item in a list:

```Python
POINTS_FOR_CURVE = 20
for ind,val in enumerate(list_of_grades):
    list_of_grades[ind] = val + POINTS_FOR_CURVE
```
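To see what `enumerate` actually yields, here is a minimal sketch. The list `grades` and its values are made up for illustration.

```Python
grades = [70, 85, 92]  # hypothetical sample data

# each item that enumerate yields is a (counter, item) tuple
pairs = list(enumerate(grades))
print(pairs)  # [(0, 70), (1, 85), (2, 92)]

# the optional start argument changes where the counter begins
for number, grade in enumerate(grades, start=1):
    print("grade {0} is: {1}".format(number, grade))
```

Unpacking the pair right in the `for` line (`for number, grade in ...`) is what lets us use both the counter and the value inside the loop.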

## format

In the bad old days, programmers used a function called `sprintf` to write formatted data to a string.

Old school Python used a syntax similar to `sprintf` for string formatting.

For our example above, we didn't really have to format the data.
We could have also written
```Python
print("line " + str(index) + " is: " + line)
```
because Python **overloads** the plus sign: it not only adds numbers together, it also **concatenates** strings. (We have to convert `index` with `str` first, because Python won't concatenate an integer and a string.)

An example of formatting would be "show this number in scientific notation with only 2 significant digits".

Python `strings` now have a `format` method to make formatting more human readable.
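As a rough sketch of the scientific notation example, here is one way to do it with `format`. The number `value` is made up for illustration.

```Python
value = 12345.678  # hypothetical number to format

# .1e means scientific notation with one digit after the decimal point,
# i.e. two significant digits in total
sci = "{:.1e}".format(value)
print(sci)  # 1.2e+04

# positional fields, like the ones used in the loop earlier
message = "line {0} is: {1}".format(3, "// a comment")
print(message)  # line 3 is: // a comment
```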

Okay, so we want to skip the first six lines (lines 0-5).

Remember that we can get the next line out of the reader by calling the built-in `next` function on it, since the reader is an iterator.

So we could do this:

In [None]:
with urllib.request.urlopen(url_for_file) as response:
   csv_file = response.read().decode('utf-8').splitlines()

import csv

reader = csv.reader(csv_file, delimiter=',')

for skip_index in range(0,6):
    next(reader)

parsed_file = list(reader)
print("First item in parsed_file list is now:\n {}".format(parsed_file[0]))
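If it's not obvious why calling `next` skips lines, this small sketch may help. It uses a made-up list called `lines` in place of the downloaded file, so the values are assumptions and not the real data.

```Python
import csv

# hypothetical stand-in for the lines of a downloaded file
lines = ["// a comment", ",,,", "name,value", "a,1", "b,2"]

reader = csv.reader(lines, delimiter=',')

first = next(reader)   # consumes the first line
second = next(reader)  # consumes the second line
print(first)   # ['// a comment']
print(second)  # ['', '', '', '']

# whatever has not been consumed yet is still waiting in the reader
remaining = list(reader)
print(remaining)  # [['name', 'value'], ['a', '1'], ['b', '2']]
```

Each call to `next` uses up one line, so by the time we call `list` only the lines we care about are left.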

## range

Recall that since Python lets us iterate over sequences, we don't have to make counters for every for loop as we do in other languages.

But sometimes you do need a counter.

* **range** function gives you an iterable that spits out counters (kind of like `enumerate` but more flexible)
    - syntax: `range(start,stop[,step])`
    - the `stop` value is not included, just like when you index a `list[start:stop]`
        ```Python
        list(range(10)) # range is lazy, so we pass it to the list function to see the values
        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
        ```
    - if you don't give a value for `start`, the default is zero
    - you can give `range` a `step` value so that it yields `[start, start + step, start + 2 * step, ...]`
        ```Python
        [val for val in range(5,25,5)] # <-- a list comprehension
        # [5, 10, 15, 20]
        list(range(25,5,-5))
        # [25, 20, 15, 10]
        ```

So the first item of our `parsed_file` list is now the first line of values from the csv file.

Ok, great.

* But what if we have another csv file and it has a different number of lines with comments?
* And what if our header doesn't have those pointless commas at the end?

In [None]:
url_for_file2 = 'https://raw.githubusercontent.com/NickleDave/EWIN-coding-bootcamp/master/Python/Willott1_table-1.csv'
with urllib.request.urlopen(url_for_file2) as response:
   csv_file2 = response.read().decode('utf-8').splitlines()
csv_file2[0:7]

* We could:
  - open up each file
  - look at how many lines there are with comments and the header
  - and then change our little script so it skips the right number of lines

* boring
* takes time
* computer can do this for us

Let's write a function that will figure out how many lines we need to skip in each file.

## writing functions

* functions
  - begin with the **def** keyword
  - followed by the function name
  - and then the argument list in parentheses
  - then a colon
  - then (if you are a nice programmer) a docstring, enclosed in quotes
      - so other people, and you, months later, can figure out what your code does
  - then the code, indented
  - and, if necessary, a **return** statement

```Python
def adder_function(argument1,argument2):
    """
    adds argument1 and argument2
    """
    return argument1 + argument2
```
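Here is a short sketch of calling that function once it's defined. The argument values are arbitrary.

```Python
def adder_function(argument1, argument2):
    """
    adds argument1 and argument2
    """
    return argument1 + argument2

total = adder_function(2, 3)
print(total)  # 5

# the plus sign also concatenates strings, so this works too
joined = adder_function("foo", "bar")
print(joined)  # foobar

# the docstring is stored on the function object
print(adder_function.__doc__)
```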

* Here's a simple function that lets us skip a certain number of lines in our file:

In [None]:
def skip_lines(csv_file,num_to_skip):
    """
    skips lines at the beginning of a csv file, before it's parsed by csv.reader
    
    arguments:
        csv_file -- list of strings, rows from csv file
        num_to_skip -- integer, number of rows to skip
    
    returns:
        csv_file[num_to_skip:] -- list with num_to_skip rows removed from beginning
    """
    
    return csv_file[num_to_skip:] # start at num_to_skip because of zero indexing

This function isn't much of an improvement though. We still have to count how many lines to skip.

In [None]:
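# skip_lines slices a list, so we call it on the list of lines
# *before* handing those lines to csv.reader (a reader can't be sliced)
rows = skip_lines(csv_file, 6) # skip lines 0-5: comments, comma row, and header

reader = csv.reader(rows, delimiter=',')

parsed_file = list(reader)

print("First item in parsed_file list is now:\n {}".format(parsed_file[0]))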

Let's get Python to parse these weird csv files for us.

Remember the format.

That will help us plan what we want our function to do.

* First there are lines with comments; they begin with "//"
* Then there's a line with a bunch of commas

**We want to skip these lines.**
 
* Then there's the **header**, the row of a csv file that tells us the name of each column of values.

**We want to give the user the option to extract the header in a separate variable, in case they need to use the names of the columns.**


## conditionals

Python, like other languages, has many keywords that let us handle different conditions.

Before we can use them, we have to figure out what condition our condition is in.
That's why we have

* **Boolean values**
  - True or False
  - Python keywords when capitalized
  - not keywords when not capitalized
    - common typo that can cause bugs
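A quick sketch of where Boolean values come from, and of what happens with the lowercase typo. The strings used here are arbitrary examples.

```Python
# comparisons and membership tests evaluate to Boolean values
print(3 > 2)          # True
print("//" in "a,b")  # False

# lowercase 'true' is just an undefined name, not the Boolean value
try:
    flag = true
except NameError as error:
    caught = str(error)
    print("NameError: {}".format(caught))
```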

## 'if' statements

When the condition of an `if statement` evaluates as `True`, the code block below it is executed.

In [None]:
if True:
    print("I feel like I was just executed.")

Notice you can use 1 and 0 in place of `True` and `False`

In [None]:
if 0:
    print("If this code block will never get executed, ")
    print("do I even exist?")
print("I will always be executed.")
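Numbers aren't the only values that can stand in for `True` and `False`. As a rough sketch, the built-in `bool` function shows how Python will treat a value in an `if statement`; the sample values below are made up.

```Python
values = [0, 1, "", "name", [], ["a", "b"]]  # hypothetical sample values

# bool() shows how Python treats each value in an if statement
results = [bool(value) for value in values]
print(results)  # [False, True, False, True, False, True]

for value in values:
    if value:
        print("{!r} counts as True".format(value))
    else:
        print("{!r} counts as False".format(value))
```

Zero, empty strings, and empty lists count as `False`; most other values count as `True`.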

## 'while' statements

The code block in a **while statement** is executed as long as the conditional evaluates to `True`.

This gives rise to a very common way of writing loops:

In [None]:
counter = 0
while 1:
    line = csv_file[counter]
    if '//' not in line: # if the line does not contain "//"
        break # break keyword breaks out of the loop
    counter += 1 # counter = counter + 1

print("I broke out of the while loop at the line \"{}\"".format(line))
print("That's line number {}".format(counter))

This suggests another way of writing our function.

In [None]:
def skip_comments_and_comma_row(csv_file):
    counter = 0
    while 1:
        line = csv_file[counter]
        if '//' not in line:
            break 
        counter += 1
    return csv_file[counter+1:] # <-- +1 because we also don't want the line with the commas

How well does this work?

In [None]:
with urllib.request.urlopen(url_for_file) as response:
   csv_file = response.read().decode('utf-8').splitlines()
csv_file = skip_comments_and_comma_row(csv_file)
print("first line of csv file is now:\n{}".format(csv_file[0]))

Okay, nice.

But we still have the header as the first row.
Really we want to either throw that row out, or maybe keep it in a separate variable.
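One caveat about the `while` loop above: it checks whether "//" appears *anywhere* in a line, and it will raise an `IndexError` if every line in the file is a comment. A possible alternative, sketched here with a hypothetical helper name and made-up sample lines, is a `for` loop with `enumerate` and the string method `startswith`.

```Python
def count_comment_lines(lines):
    """
    counts the lines at the top of a file that begin with "//"
    (hypothetical helper, not part of the original notebook)
    """
    for index, line in enumerate(lines):
        if not line.startswith('//'):
            return index  # index of the first non-comment line
    return len(lines)     # every line was a comment

sample = ["// comment one", "// comment two", ",,,", "name,value", "a,1"]
print(count_comment_lines(sample))            # 2
print(count_comment_lines(["// only this"]))  # 1
```

This is only a sketch under the assumption that comments always start at the beginning of a line; the `while` version works fine for the files used here.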

## optional arguments and default values

* Optional arguments
  - appear at the *end* of the argument list
  - optional because they have a **default** value
    - default value appears in the function definition

All the arguments in front of the optional arguments are called **positional arguments** because Python assigns them values based on the position they're in.

The ordering doesn't matter for optional arguments, as long as you pass them by name.

In [None]:
def a_function(foo,bar,baz=0,qux=0):
    print("bar = {}".format(bar))
    print("qux = {}".format(qux))

In [None]:
a_function(1,2)
a_function(10,20,qux=30)
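To make the point about ordering concrete, here is a sketch using a variant of `a_function` that returns its arguments as a tuple instead of printing them. That change is only for illustration.

```Python
def a_function(foo, bar, baz=0, qux=0):
    return (foo, bar, baz, qux)

# optional arguments passed by name can come in any order
print(a_function(1, 2, qux=30, baz=20))  # (1, 2, 20, 30)
print(a_function(1, 2, baz=20, qux=30))  # (1, 2, 20, 30)

# optional arguments can also be passed by position
print(a_function(1, 2, 3))               # (1, 2, 3, 0)

# and positional arguments can be passed by name
print(a_function(bar=2, foo=1))          # (1, 2, 0, 0)
```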

We'll add an optional argument to our function definition that is a Boolean.

If it's `True` then we'll return the header row along with the list of rows.

If it's `False` we'll return the list of rows without the header.

We'll set the *default* to `False`.

In [None]:
def skip_comments_and_comma_row(csv_file,keep_header=False):
    counter = 0
    while 1:
        line = csv_file[counter]
        if '//' not in line:
            break 
        counter += 1
    if keep_header:
        header = csv_file[counter+1] 
        return csv_file[counter+2:],header
    else:
        return csv_file[counter+2:] # <-- +2 because we don't want the line with the commas or the header

We'll test our function to see if it does what we expect.

First, let's try not keeping the header.

In [None]:
# without the header
with urllib.request.urlopen(url_for_file) as response:
   csv_file = response.read().decode('utf-8').splitlines()
csv_file = skip_comments_and_comma_row(csv_file)
print("first line of csv file is now:\n{}".format(csv_file[0]))

Looks like a row of values from a csv file to me.

What about when we keep the header?

In [None]:
# with the header
with urllib.request.urlopen(url_for_file) as response:
   csv_file = response.read().decode('utf-8').splitlines()
csv_file, header = skip_comments_and_comma_row(csv_file,keep_header=True)
print("header is:\n{}".format(header))
print("first line of csv file is now:\n{}".format(csv_file[0]))

Nice!

But it still annoys Lalama/me/you/everyone that those extra commas are at the end of our header row.

We can use conditionals inside a list comprehension to keep only the non-empty elements in the list.

In [None]:
header_without_commas = [item for item in header.split(',') if item != ''] # != compares values; "is not" compares identity

Woah, that was intense. Let's break it down a bit.

We can spread the list comprehension across several lines, since it's inside brackets.

Python will parse it the same but it's easier for us to read.

In [None]:
header_without_commas = [item
                             for item in header.split(',')
                                             if item != '']

With the list comprehension split up into its constituent parts, we can see more easily that it's basically a more concise `for loop`.
Here's the same thing written as a for loop:

In [None]:
header_without_commas = []
for item in header.split(','):
    if item != '':
        header_without_commas.append(item)
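To check that the two versions really give the same result, here is a small sketch that runs both on a made-up header string. The column names are assumptions, not the real ones from the file. It compares with `!=`, which tests whether two values are equal.

```Python
header = "Subject,Age,Score,,,"  # hypothetical header row with trailing commas

print(header.split(','))
# ['Subject', 'Age', 'Score', '', '', '']

with_comprehension = [item for item in header.split(',') if item != '']

with_loop = []
for item in header.split(','):
    if item != '':
        with_loop.append(item)

print(with_comprehension)               # ['Subject', 'Age', 'Score']
print(with_comprehension == with_loop)  # True
```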

## exercises

Our function does what we want, but it's annoying that we have to repeat these lines of code each time before we call the function:

```Python
with urllib.request.urlopen(url_for_file) as response:
   csv_file = response.read().decode('utf-8').splitlines()
```

What would be nice is if we could just pass a filename to the function and have it open the file for us.

Here's what those same lines of code look like when you open a file on your computer instead of pulling from a url on the web: (Don't worry about how it works for right now)
```Python
with open('actual_filename.csv','r') as dummy_filename:
    csv_file = dummy_filename.readlines()
```

You can run this in a code cell below (as long as the csv files are in the same directory as this notebook, which they should be if you haven't moved them).

1) Open the file 'Wiltshire3_means.csv' using the code above.

You can double click on this cell, highlight the code, then copy it with Ctrl-C.

You should only have to change one argument to the `open` function.

2) What `type` of object is `csv_file`? Do we need to call the `splitlines` method on it like we did with the raw file loaded from a url on the web?

3) Rewrite our function so that it accepts a filename as the first argument and then inside the function opens the file using the syntax you just used.

Hint: you can call the `open` function and pass it a variable that refers to a string as the first argument, instead of passing a string literal for the first argument.

i.e., this is totally valid:
```Python
my_favorite_csv = 'Willot1.csv'
with open(my_favorite_csv,'r') as dummy_filename:
    csv_file = dummy_filename.readlines()
```

In [None]:
%%writefile test.py
def skip_comments_and_comma_row(csv_file,keep_header=False):
    with open(csv_file,'r') as dummy_filename:
       csv_file = dummy_filename.readlines()
    counter = 0
    while 1:
        line = csv_file[counter]
        if '//' not in line:
            break 
        counter += 1
    if keep_header:
        header = csv_file[counter+1] 
        return csv_file[counter+2:],header
    else:
        return csv_file[counter+2:]

4) Now save your function to a file using a **cell magic** command.
Enter the following at the top of the cell that contains your function.
`%%writefile your_module_name_here.py`
Then when you hit `shift-enter` or `ctrl-enter` to execute the cell, it should save a file with the name `your_module_name_here.py` to the same directory as this notebook.

Congratulations! You just made your first module.

* cell magic
    - specific to each kernel that runs in a Jupyter notebook
    - right now we're using the IPython kernel by default
    - IPython used to be the only kernel but now there are kernels for many other languages

5) Now you should be able to `import` your function from the file using the following syntax:
```Python
from your_module_name_here import whatever_you_named_the_function
```

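6) You can tell it was imported from the right place by looking at the `__module__` attribute of the function object, which holds the name of the module where the function was defined.
Just enter `whatever_you_named_the_function.__module__` in a cell and you should see the name of your module. To see the actual filename, run `import your_module_name_here` and then enter `your_module_name_here.__file__` (modules have a `__file__` attribute; function objects don't).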


7) To test whether your module really works, click on 'kernel' in the menu bar above and then click on 'Restart & Clear Output'.

Then change the cell below so it imports your module and parses the csv file.

In [None]:
# it's good form to import modules from the standard library before your own
import csv

from whatever_you_named_your_module import your_function

file_with_skipped_lines = your_function('csv_file')
reader = csv.reader(file_with_skipped_lines, delimiter=',')
parsed_file = list(reader)

8) Okay, now rewrite your function *yet again* so that it parses all of the rows after the header and returns the already parsed list.
Use the writefile cell magic to save this new module version.
Then test it again by clearing the outputs and running your new function on the file.

**You will need to `import` the `csv` module *within* your module!**

## more reading

### how `format` works:
    https://pyformat.info/

### conditionals and control flow:
    http://openbookproject.net/thinkcs/python/english3e/conditionals.html
    https://docs.python.org/3/tutorial/controlflow.html