Let's do some actual coding.

# Literary fiction interlude:

Your name is Lalama Evans, half-Hawaiian, former server at Ruby Tuesdays, and currently a psychology graduate student at the Metropolitan University of Fruitville, Florida.

Your advisor, who shall remain nameless, looks exactly like the advisor in Ph.D. comics.

He wants your project to involve the neurotransmitter serotonin, mainly because he finds it really interesting that this amine is also found in plants, and he rambles on about it constantly. Especially at departmental mixers. Especially if he has had too much wine.

You on the other hand would like your project to be about something useful that will help you get a job after grad school.

So you compromise: you'll study something useful that happens to involve serotonin.

You are also a bit of a hacker, because you had a good teacher for computing class in middle school. She was also the mayor of your small town but that's a story for another day.

You figure that you are probably going to study serotonin in some strains of mice, so you go to the Jackson labs website, where they have a ton of data about different strains.

A lot of the data is in files that use the _comma-separated values_ format, called csv files for short. You start by downloading a csv file that has information about serotonin receptor levels in the brains of different mouse strains.

#  Parsing a csv file, part 1

In [1]:
import urllib.request
import csv

url_for_file = "https://raw.githubusercontent.com/NickleDave/EWIN-coding-bootcamp/master/Python/Wiltshire3_means.csv"
with urllib.request.urlopen(url_for_file) as response:
   csv_file = response.read().decode('utf-8')

reader = csv.reader(csv_file, delimiter=',')
parsed_file_1st_attempt = list(reader)

## import statements
###  what did we just do?

First we `import`ed two things: `urllib.request`, a module inside the `urllib` package, and `csv`, a standalone module.
* **package**: just what it sounds like, a way to "package up" a set of objects and functions.
* smallest unit of code that you can import in Python is a **module**
    - a module is one file
    - a package can have multiple modules
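
If you're curious, here's one rough way to tell the two apart from inside Python. It's only a sketch: it relies on the fact that packages carry a `__path__` attribute and on `csv` being a single-file module in the standard CPython distribution. The variable names are made up for illustration.

```python
import csv
import urllib

# A package has a __path__ attribute that says where its modules live.
urllib_is_package = hasattr(urllib, "__path__")

# A plain module is a single file, so it has no __path__.
csv_is_package = hasattr(csv, "__path__")

print("urllib is a package:", urllib_is_package)
print("csv is a package:", csv_is_package)
```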

You can `import` wherever you want in a script, but it is considered good form to put your list of `import` statements at the top.

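There are three common ways to use the `import` statement.

1. wholesale:
    - e.g., `import urllib.request`
    - this loads the module, and you refer to everything in it by its full dotted name, e.g., `urllib.request.urlopen`
    - note that a plain `import urllib` is not guaranteed to load sub-modules such as `urllib.request`, which is why we imported `urllib.request` explicitly
2. selective:
    - e.g., `from urllib import request, parse`
    - this lets you load only the sub-modules you want to use. Convenient if you only need those and you don't want to type the name of the package followed by the sub-module every time.
3. abbreviated: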
    - e.g., `import numpy as np`
    - So now you can type `np.mean()` instead of `numpy.mean()`
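
Here are the three styles side by side, as a small sketch. To keep it runnable without installing anything, it uses the standard library's `statistics` module in place of `numpy` for the abbreviated form. The variable names and the "mouse strain" text are made up.

```python
# 1. wholesale: import the module, then use its full dotted name
import urllib.parse
wholesale = urllib.parse.quote("mouse strain")

# 2. selective: pull just the sub-module you need out of the package
from urllib import parse
selective = parse.quote("mouse strain")

# 3. abbreviated: give the module a shorter alias
import statistics as stats
abbreviated = stats.mean([1, 2, 3])

print(wholesale, selective, abbreviated)
```

The first two lines of output are identical because `urllib.parse.quote` and `parse.quote` are the same function reached by two different names.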

The `urllib` library lets us get stuff off the web. You don't have to know anything else about this for now.

We used it to download the file, which comes from phenome.jax.org, into the variable `csv_file`.
Let's see what's in that variable.

In [None]:
csv_file

The `csv` module is for parsing csv files (obvs).

When we ran these lines of code...

`
reader = csv.reader(csv_file, delimiter=',')
parsed_file_1st_attempt = list(reader)
`

...it should in theory have split each line of the file up wherever it found a comma.
Let's see if it did that.

In [None]:
parsed_file_1st_attempt[57:73] # <-- the numbers inside square brackets are indices, we'll explain that in a bit

Hmm, looks like it didn't.
Instead it split up the string at each letter, except for the quoted bits.
Maybe there's something we're not understanding about `urllib`.

Let's turn to our old friend Google, who takes us to a stackoverflow post.
(All programmers should know stackoverflow.)
http://stackoverflow.com/questions/21351882/reading-data-from-a-csv-file-online-in-python-3

Oh, I get it now, the `csv_file` is one big long string. Since a string is a type of sequence, and `csv.reader` splits up sequences, it's splitting up the string into characters.
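
You can see the same thing on a tiny scale without downloading anything. This is just a sketch with a made-up four-character "file"; `text` and the two result names are invented for illustration.

```python
import csv

text = "ab,c"  # made-up one-line "file" held in a single string

# Looping over a string gives one character at a time,
# so csv.reader treats every character as its own "line".
rows_from_string = list(csv.reader(text))
print(rows_from_string)

# Put the same text inside a list and the reader sees one whole line.
rows_from_list = list(csv.reader([text]))
print(rows_from_list)
```

The first result has one tiny row per character (the comma on its own parses as two empty fields), while the second is the single row `['ab', 'c']`.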

What we need to do is split up our big long string so there's some other kind of sequence.

The different lines in the file are actually separated by the character that represents a new line, '\n'.

But we have to tell Python to split the string up whenever it sees that character.

To do that we can call the `splitlines` method on the string, as shown in the cell below.

In [None]:
url_for_file = "https://raw.githubusercontent.com/NickleDave/EWIN-coding-bootcamp/master/Python/Wiltshire3_means.csv" # same file as in the first cell
with urllib.request.urlopen(url_for_file) as response:
    csv_file_split = response.read().decode('utf-8').splitlines() # <-- calling the splitlines method

**Notice that we can chain multiple method calls together using the period.**

The methods will be processed in order from left to right.

Here's the line again from the cell above where we did that:

`csv_file_split = response.read().decode('utf-8').splitlines()`

What this line of code does is say:
1. call the `read` method of the `response` object, i.e., get whatever's at the link I gave you, as raw bytes
2. then `decode` those bytes into text using the "utf-8" scheme (one of many encodings for turning bytes into text)
3. then finally take the text and split it up with the `splitlines` method, which splits wherever it finds a line ending such as the special character '\n' (what you get when you press `Enter`)
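
To convince yourself that the chain is just those steps run one after another, here's a sketch that does it both ways. The `raw_bytes` value is a made-up stand-in for what `response.read()` would return, so this runs without touching the web.

```python
raw_bytes = b"strain,sex\nA/J,f\n"  # made-up stand-in for response.read()

# Step by step, giving each intermediate result a name
text = raw_bytes.decode("utf-8")    # bytes -> one long string
lines_stepwise = text.splitlines()  # string -> list of strings

# The same thing as one chain, processed left to right
lines_chained = raw_bytes.decode("utf-8").splitlines()

print(lines_stepwise)
print(lines_chained)
```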

So what does `csv_file_split` look like after we run `splitlines`?

In [None]:
csv_file_split[:4] # <-- the numbers inside square brackets are indices, we'll explain that in a bit

Notice that now when we display the variable, it's surrounded by square brackets.
That's because `splitlines` returns a *list* of strings.

## Lists

* one of the main data types in Python
  - often called *arrays* in other languages


* just a way of grouping things together


* Lists are **ordered**.
  - item 2 comes after item 1, and always will, unless we modify the list
  
  
This matters, for example, when we want to process the lines from a file in order.
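
A quick sketch of what "ordered" means in practice. The strings here are made-up stand-ins for lines of a file, not real data.

```python
# Made-up stand-ins for lines of a file
lines = ["strain,sex", "A/J,f", "A/J,m"]

# append adds to the end, so the earlier items keep their positions
lines.append("AKR/J,f")
print(lines)
print(len(lines))  # how many items are in the list

# Order is part of what a list is: same items in a different order is a different list
same_order = ["a", "b"] == ["a", "b"]
swapped = ["a", "b"] == ["b", "a"]
print(same_order, swapped)
```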

Now that we have a list of strings, we can use the csv.reader on them.

Recall that we figured out from a stackoverflow post that the csv.reader will take whatever sequence we give it and split it up.
Since a list, like a string, is a sequence, we can get the reader to split it up.

In [None]:
reader = csv.reader(csv_file_split, delimiter=',')

Okay, so now we made a `reader` object (from the `csv` module).
Did we parse our file yet?

In [None]:
reader

What the heckin' heck is that?

That's Python's default way of displaying an object: its type, plus the object's location in memory.

But it's not what we wanted at all. We want the lines from our csv file, split up at the commas.

The `reader` is actually an `iterator` object.
In Pythonese, we say an object is an `iterator` if it implements a `__next__` method (along with an `__iter__` method that returns the object itself).
In this case, each time we call the built-in `next` function on the `reader`, it calls that method and spits out one of the lines from our list, only split up wherever it found a comma.



In [None]:
next(reader) # the built-in next function calls reader.__next__() for us
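
Here's the same idea on a made-up two-line "file", so you can also see what happens when an iterator runs out. The names `lines` and `demo_reader` are invented for this sketch.

```python
import csv

lines = ["strain,sex", "A/J,f"]  # made-up stand-in for csv_file_split
demo_reader = csv.reader(lines)

first = next(demo_reader)   # the first line, split at the comma
second = next(demo_reader)  # the second line, split at the comma

# Once the lines run out, next() raises StopIteration
try:
    next(demo_reader)
    ran_out = False
except StopIteration:
    ran_out = True

print(first, second, ran_out)
```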

So you could in theory write a script like the cell below to get all the lines of the file:

In [None]:
#... 399 lines above
line400 = next(reader)
line401 = next(reader)
line402 = next(reader)
line403 = next(reader)
#... and so on

But that would be silly. There are much better ways to get all the lines of the file out of the `reader`.

## for loops

We use a *for loop* to *iterate* over the rows of the file.
To hold all the lines from the csv file after they're parsed, we'll create an empty list.

In [None]:
reader = csv.reader(csv_file_split, delimiter=',') # create reader again since we had already called 'next'
parsed_file_for_loop = [] # an empty list
for row in reader:
    parsed_file_for_loop.append(row)

Here's what happens each time through the loop:
   1. the reader object parses one line of the file and puts that parsed line inside the variable `row`
   2. the `append` method of our list object `parsed_file_for_loop` is called, to add `row` to the end of our list

- things to know about for loops in Python:
  - we loop over an **iterable**
    - a sequence such as a list or a string, or an iterator such as our `reader`
  - we put a *dummy variable* after the keyword `for`
    - each time through the loop, this *dummy variable* contains the next item from the iterable
  - a common gotcha: **don't forget the colon at the end of the `for` statement**
    - if you do, the interpreter will stop with a `SyntaxError` (recent versions of Python tell you it `expected ':'`, older ones just say `invalid syntax`)
    - I do this all the time and you will too

**Notice that we don't have to tell Python how many times to go through the loop.** It just iterates until it runs out of lines. If you want to sound like a brilliant computer scientist, you can say the for loop *consumes* the `iterator`.
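
Here's a sketch of what "consuming" means, using a made-up three-line "file". The names `lines`, `demo_reader`, `rows`, and `leftovers` are invented for illustration.

```python
import csv

lines = ["strain,sex", "A/J,f", "A/J,m"]  # made-up stand-in for csv_file_split
demo_reader = csv.reader(lines)

rows = []
for row in demo_reader:  # no counter needed, the loop stops by itself
    rows.append(row)

# The loop used up the iterator, so a second pass finds nothing left
leftovers = list(demo_reader)

print(len(rows), leftovers)
```

This is why the cells that follow create a fresh `reader` each time: once an iterator has been consumed, it has nothing more to give.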

This is different from other languages you might have seen.
Here's a for loop in Java:

```Java
int k = 0;
for (int i = 0; i < 10; i++)
    { // <-- curly braces
    int j = i * 10;
    k = k + j;
    }
System.out.println(k);
```

## significant white space

* other languages:
  - use curly braces to identify blocks of code
    - like the for loop above
  - also require semi-colons at the end of lines


* Python doesn't require either


* How does the Python interpreter know which lines of code belong inside our *for loop*?
  - white space is used to organize code
  - All lines indented the same amount belong to a given block

```Python
k = 0 # k needs a starting value before we can add to it
for i in range(10):
    # beginning of code block in for loop
    j = i * 10
    k = k + j
    # end of code block
print(k)
```

Many people who are used to other languages find this weird at first.

You'll get over it.

Once you do, you'll realize that it enforces readability in a way that other languages do not.

For example, in Java, people can choose how much they indent. As long as there's opening and closing curly braces, it doesn't matter where they are. In practice, good Java coders use consistent indentation, but there's actually nothing about the language that says they have to.
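
In Python, on the other hand, the indentation is the structure. Here's a small sketch in which the only difference between the two loops is whether the last line is indented; all the variable names are made up.

```python
# Version 1: the append line is indented, so it runs on every pass
total_inside = 0
seen_inside = []
for i in range(3):
    total_inside = total_inside + i
    seen_inside.append(total_inside)

# Version 2: the append line is not indented, so it runs once, after the loop
total_outside = 0
seen_outside = []
for i in range(3):
    total_outside = total_outside + i
seen_outside.append(total_outside)

print(seen_inside)
print(seen_outside)
```

The first list gets a running total from each pass through the loop; the second gets only the final total.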

## other ways to consume your iterators
### list comprehension
A `list comprehension` is like a `for loop` that you write inside of a list.

In [None]:
reader = csv.reader(csv_file_split, delimiter=',') # create reader again since the for loop consumed the last one
parsed_file_list_comp = [row for row in reader]
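
A list comprehension can also do something to each item on the way in. As a sketch on made-up data, here's a for loop and the equivalent comprehension that both keep only the first column of each row (`row[0]` means "the first item of `row`"). The names are invented for illustration.

```python
import csv

lines = ["strain,sex", "A/J,f", "AKR/J,m"]  # made-up stand-in for csv_file_split

# for-loop version
first_column_loop = []
for row in csv.reader(lines):
    first_column_loop.append(row[0])

# list-comprehension version: the same loop, written inside the brackets
first_column_comp = [row[0] for row in csv.reader(lines)]

print(first_column_loop)
print(first_column_comp)
```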

### calling the list function
* `list` function can convert an iterator to a list
* notice that this is very easy for humans to read

In [None]:
reader = csv.reader(csv_file_split, delimiter=',') # create reader again since the list comprehension consumed the last one
parsed_file_list_func = list(reader)

These all gave us the same result:

In [None]:
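print("parsed_file_for_loop is the same as parsed_file_list_comp: ",
      parsed_file_for_loop == parsed_file_list_comp) # a double equals sign means "are these two things equal?"
print("parsed_file_for_loop is the same as parsed_file_list_func: ",
      parsed_file_for_loop == parsed_file_list_func)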

## indexing
How does our parsed file look now that we gave the `reader` a `list` instead of a `string`?

In [None]:
parsed_file_for_loop[5]

Okay, so that looks like what we got from

In [None]:
parsed_file_1st_attempt[57:73]

only this time it isn't split up into individual characters.
Good.
That's what we wanted.

You examine elements/items from a list by **indexing**.

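* To get a single item, we put its index number inside square brackets, e.g., `parsed_file_for_loop[5]`. To get a range of items, we use **slice notation**: index numbers and colons inside square brackets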
     `parsed_file_1st_attempt[57:73]`
     
Things to know about slice notation:
  - it is **zero indexed**
    - the first item in a list is item 0, the second item is item 1
  - if you don't put a number in front of the colon, that implies index 0
  - the *start* index is *always* included, and the *end* index is *never* included
    - so `parsed_file[:5]` is the first five elements from `parsed_file`
  - if you don't put a number after the colon, the slice runs through the end of the list, no matter how long the list is
    - so `parsed_file[:5] + parsed_file[5:]` gives you all of `parsed_file` without any repeated elements
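
Here are those rules on a small made-up list, as a quick sketch:

```python
letters = ["a", "b", "c", "d", "e", "f", "g"]  # made-up list

first_item = letters[0]   # zero indexed: item 0 is the first item
first_five = letters[:5]  # no start given, so it starts at index 0
the_rest = letters[5:]    # no end given, so it runs to the end of the list
middle = letters[2:4]     # start (2) is included, end (4) is not

# The two halves fit back together with nothing repeated or missing
rejoined = letters[:5] + letters[5:]

print(first_item, first_five, the_rest, middle)
```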

Why zero index, and why leave out the end index?
  - say you want the first five items of a list
    - in a one-indexed world where slices include the last element, that's
      `list_thing[1:5]`
  - but what if you want to start at index five, and then keep five items?
      `list_thing[5:5+5-1]` (because 5 + 5 = 10, but we only want `[5,6,7,8,9]`)
  - whenever slices include the end index, we have to do this kind of minus-one arithmetic
  - with Python's convention it's just `list_thing[5:5+5]`, and the number of items is always the end index minus the start index
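
A sketch of how that arithmetic works out in Python itself, using a made-up list of numbers so that each item equals its own index:

```python
numbers = list(range(20))  # [0, 1, 2, ..., 19], so each item equals its index

start = 5
count = 5
chunk = numbers[start:start + count]  # no minus one needed
print(chunk)

# Splitting the whole list into back-to-back chunks of five is just as tidy:
# each chunk starts exactly where the previous one ended.
chunks = [numbers[i:i + count] for i in range(0, len(numbers), count)]
print(chunks)
```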
  


# Exercises

1) Display the first six items of the `parsed_file_for_loop` list.

In [None]:
parsed_file_for_loop[:6]

2) Assign item 4 from the `parsed_file_for_loop` list to a variable with a name of your choosing

3) What type of object is the item that you assigned to the variable? Hint: there's a function you call to find out.

4) Use the same function to find out what type of object the original `csv_file` variable is (that we didn't use `splitlines` on)

What happens if you try to use slice notation on that `csv_file` object? E.g. using the indices [57:73] from above

Use a `for loop` and the `print` function to display items 57-73 of `csv_file` on separate lines