# Bootcamp day 1
- who are we
- what are we trying to teach you
- what we **can't** teach you in this workshop

# Python and R
* Python
 - full-fledged, general-purpose programming language
 - 2nd most popular language (depending on who you ask)
 
two things Python is very good at:
  1. the web
   - used for Dropbox, YouTube, Instagram, Pinterest, and many other sites
  2. science
   - many popular libraries: NumPy, SciPy, scikit-learn, Jupyter
   - many libraries for specific fields
* R

# A brief digression: Jupyter notebooks
* what are they?
 - documents that you read in a browser
 - that let you mix code and text
 - and program in multiple languages
* why would you use them?
 - to present your results to other people
 - to do exploratory analysis, so you don't have to write a bunch of little scripts and run them from the command line
 - for interactive tutorials, like this one

* Outline:
  - Preliminary Python
    - ye olde Print function
    - Variables, How Do They Work?
    - Objects
  - A Case Study
    - parsing a csv file
    - rolling your own (functions)

# Python preliminaries

## Displaying text
To display text, we use the ```print``` function.

A _function_ is a reusable piece of code.
Python comes with a lot of built-in functions. That's part of its 'batteries included' philosophy. You can also write your own functions.

When you call a function, you pass it _arguments_ in parentheses, as shown in the cell below.

To execute the code in the cell, click on it so that it is surrounded by a green box, and then hit ```shift``` and ```enter``` at the same time.

In [None]:
print("Hello, nerds")
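`print` can take more than one argument. Here is a small sketch using its `sep` and `end` keyword arguments (both are standard parameters of the built-in `print`); the strings themselves are just examples.

```python
# print accepts any number of arguments and puts a space between them by default
print("Hello", "nerds")            # Hello nerds

# keyword arguments change that: sep goes between the items, end goes after the last one
print("Hello", "nerds", sep=", ")  # Hello, nerds
print("no newline here", end="")
print()  # an empty call just prints a newline
```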

In [None]:
print(significance)

Okay, so that gave us an error.
What is a ```NameError```? I'll explain that in a second.

What you should know is that whenever you put something in quotes, you're telling Python:
"Hey, this is text. Don't try to interpret it as part of a program."

The computer jargon word for text is a ```string```, but no one remembers why (http://stackoverflow.com/questions/880195/the-history-behind-the-definition-of-a-string).

In [None]:
print("Hey, I'm a string")
print('Hey, I\'m also a string, but I\'m enclosed by single quotes')  # Notice the backslash that "escapes" each apostrophe


"""
Hey, I'm a multiple-line string.
That's why I'm enclosed in triple quotes.
I'm often used for something called docstrings.
You'll find out about them later.
"""
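To make the quoting rules concrete, here is a short sketch comparing the spellings; the variable names `s1`, `s2` and `poem` are made up for illustration.

```python
# Double quotes let you use an apostrophe without escaping it
s1 = "Hey, I'm a string"
# Single quotes need a backslash in front of the apostrophe
s2 = 'Hey, I\'m a string'
print(s1 == s2)  # True: both spellings produce the same string

# Triple quotes keep the line breaks you type
poem = """line one
line two"""
print(poem.count("\n"))  # 1: one newline character between the two lines
```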

# Variables: like algebra, but not really

Sometimes you want to store a value in the computer's memory.
To store values somewhere, we assign them to what are called _variables_.
Just like in math, a _variable_ can take on whatever value we want it to.
But unlike in math, where a variable can stand for a whole range of values (that's how we solve entire classes of problems at once), a variable in a computer program holds exactly one value at any given moment.

To stick some value inside a variable, we use the _assignment operator_.
That's fancy computer science jargon for the equals sign.
Like so:

In [None]:
significance = 0.04  # at least, hopefully <-- by the way, this is a comment. Your comments should be more useful.

Note that the equals sign is _not_ acting like it does in a math equation.

It's saying: 'take some chunk of memory that I will refer to with the name on the left side of the equals sign, and then put the value on the right side of the equals sign inside that chunk of memory'.
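The clearest way to see that `=` isn't an equation is a line that would be nonsense in algebra but is perfectly normal in Python. A minimal sketch, using a made-up variable `count`:

```python
count = 1
# As algebra this is impossible; as assignment it means
# "compute the old value plus one, then store that result under the name count"
count = count + 1
print(count)  # 2
```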

Notice that you can `print` a variable just like we did with a string.

In [None]:
print(significance)

Okay, so now we don't get that ```NameError``` anymore.

But ... why?


A ```NameError``` is the Python interpreter's way of saying that you gave it what looks like a name, but it can't find anything defined under that name.

So once we **defined** a variable named `significance`, the interpreter was able to find that object and `print` its value.
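You can peek at that list of names yourself: calling `dir()` with no arguments returns the names defined in the current scope. A small sketch with a hypothetical variable `threshold` (if you run the cell twice in a notebook, the first line will print `True`, because the name will already exist):

```python
print("threshold" in dir())  # False: the name hasn't been defined yet
threshold = 0.05
print("threshold" in dir())  # True: now the interpreter can find it
```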


# everything you need to know about object-oriented programming (for now)

An _object_ is a highfalutin' computer term that comes from an even more highfalutin' sub-field called _object-oriented programming_.

Everything in Python is an object.

So having even a vague idea of what objects are will help you use other people's code and write your own.

Here's all you have to know about objects for now:

* you can have _classes_ of objects
 - for example, you could have a class called "car"
 - if you type

`car1 = car()`

then you would have created a variable that contains an _instance_ of the "car" class.

* Objects have _properties_ (Python usually calls them _attributes_)
 - for example, your car might have a "speed" property

`print(car1.speed)`

`45`

* Objects have _methods_
 - for example, your car might have an "accelerate" method that increases the speed by some amount.


`car1.accelerate(amount=3)`

`print(car1.speed)`

`48`
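The `car` example above is pseudocode. Here is one minimal way to write such a class in Python, as a sketch: the class name `Car` (capitalized, following the usual Python convention), the default speed of 45 and the `accelerate` method are all hypothetical.

```python
class Car:
    """A toy class for illustrating attributes and methods (hypothetical)."""

    def __init__(self, speed=45):
        # an attribute ("property"): each Car instance stores its own speed
        self.speed = speed

    def accelerate(self, amount):
        # a method: a function attached to the object that can change its attributes
        self.speed += amount


car1 = Car()               # create an instance of the Car class
print(car1.speed)          # 45
car1.accelerate(amount=3)
print(car1.speed)          # 48
```

Inside the class, `self` refers to the particular instance the method was called on, which is how `car1.accelerate` knows whose speed to change.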

A string is an instance of the `str` class in Python.
Its value is the actual sequence of characters you assign to it.
One method of a string is `title`.
Let's explore that a bit.
First we'll make a string.

In [None]:
book = "mostly harmless"

We can use the ```type``` function to find out what kind of object our variable is.

In [None]:
type(book)

We call a method of an object by writing the object's name, a period, the method name, and then parentheses (with any arguments inside them).

In [None]:
book.title() # the title method capitalizes words in a string as if the string were the title, e.g., of a book
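Strings come with many other methods. Note that there is no `reverse` method for strings; a common idiom for reversing one is slicing with a step of -1. A quick sketch:

```python
book = "mostly harmless"

print(book.upper())  # MOSTLY HARMLESS
print(book.split())  # ['mostly', 'harmless']: split on whitespace
print(book[::-1])    # sselmrah yltsom: a slice with step -1 walks the string backwards
```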

You can see the methods and properties of an object by calling the `dir` function on it.

In [None]:
dir(book)
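The output of `dir` is long, and the names wrapped in double underscores are mostly for Python's internal use. One way to filter them out, sketched with a list comprehension:

```python
book = "mostly harmless"

# keep only the names that don't start with an underscore
public_names = [name for name in dir(book) if not name.startswith("_")]
print(len(public_names))
print(public_names[:5])
```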

Notice that everything in Python is an object.

In [None]:
type("Oh I wonder what kind of object I am")

In [None]:
type(4)

In [None]:
type(str) # meta

In [None]:
type(type) # uber-meta
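Another way to check the "everything is an object" claim is the built-in `isinstance`, which tells you whether a value is an instance of a given class. Every value, including functions and classes themselves, is an instance of `object`:

```python
values = ["text", 4, 0.04, str, type, print]

for value in values:
    # type(value).__name__ is the class name as a plain string;
    # every line ends in True
    print(type(value).__name__, isinstance(value, object))
```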

Okay, that's enough preliminary stuff. Let's do some actual coding.

# Literary fiction interlude:

Your name is Lalama Evans, half-Hawaiian, former server at Ruby Tuesdays, and currently a psychology graduate student at the Metropolitan University of Fruitville, Florida.

Your advisor, who shall remain nameless, looks exactly like the advisor in Ph.D. comics.

He wants your project to involve the neurotransmitter serotonin, mainly because he finds it really interesting that this amine is also found in plants, and he rambles on about it constantly. Especially at departmental mixers. Especially if he has had too much wine.

You on the other hand would like your project to be about something useful that will help you get a job after grad school.

So you compromise: you'll study something useful that happens to involve serotonin.

You are also a bit of a hacker, because for a while you went out with this guy who worked in I.T. He had to go, because he did things like eat breakfast burritos in the shower so he could be more "efficient". But you put the script-writing skills he helped you learn to work as you try to figure out what your project will be.

You figure that you are probably going to study serotonin in some strains of mice, so you go to the Jackson Labs website, where they have a ton of data about different strains.

A lot of the data is in files that have the _comma-separated value_ format, or csv files for short. You start by downloading a csv file that has information about serotonin receptor levels in the brains of different mouse strains.

# Parsing a csv file, part 1

In [None]:
import urllib.request
import csv

url_for_file = "http://phenome.jax.org/tmp/Wiltshire3_means.csv"
with urllib.request.urlopen(url_for_file) as response:
    csv_file = response.read().decode('utf-8')

reader = csv.reader(csv_file, delimiter=',')
parsed_file = list(reader)

### So what did we just do?

First, we `import`ed two modules (collections of reusable functions and other code).

You can `import` wherever you want in a script, but it is considered good form to put your list of `import` statements at the top.

There are three common ways to use the import command.

1. wholesale:
    - e.g., `import urllib`
    - this makes the name `urllib` available, but for a package like `urllib` it is not guaranteed to load every sub-module, so you would still write `import urllib.request`
2. selective:
    - e.g., `from urllib import request, parse`
    - this lets you load only the sub-modules you want to use. Convenient if you only need those and you don't want to type the name of the module followed by its sub-module every time.
3. abbreviated
    - e.g., `import numpy as np`
    - So now you can type `np.mean()` instead of `numpy.mean()`
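Here are the three styles side by side, sketched with standard-library modules (`math`, `os.path`, `statistics`) instead of `urllib`, so the cell runs without a network connection:

```python
import math                 # wholesale: refer to things as math.sqrt
from os import path         # selective: use path.join directly, no "os." prefix
import statistics as stats  # abbreviated: stats.mean instead of statistics.mean

print(math.sqrt(16))                  # 4.0
print(path.join("data", "mice.csv"))  # data/mice.csv on macOS/Linux, data\mice.csv on Windows
print(stats.mean([1, 2, 3, 4]))       # 2.5
```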

The `urllib` library lets us get stuff off the web.

We used it to load the file from phenome.jax.org into the variable `csv_file`.

In [None]:
csv_file

The `csv` module is for parsing csv files (obvs).
In theory, it should have split each line of the file up wherever it found a comma.
Let's see if it did that.

In [None]:
parsed_file

Hmm, looks like it didn't.
Instead it split up the string at each letter, except for the quoted bits.
Maybe there's something we're not understanding about `urllib`.

Let's turn to our old friend Google, who takes us to a stackoverflow post.
(All programmers should know stackoverflow.)
http://stackoverflow.com/questions/21351882/reading-data-from-a-csv-file-online-in-python-3

Oh, I get it: `csv_file` is one big long string, and `csv.reader` expects a sequence of lines. A string is itself a sequence (of characters), so the reader treated every single character as its own line.
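You can reproduce the problem with a tiny made-up csv string, no download required. This sketch feeds `csv.reader` the raw string and then a list of lines:

```python
import csv

# made-up data, two lines long
text = "strain,value\nstrain_1,3.1\n"

rows_wrong = list(csv.reader(text))               # iterates over the string one character at a time
rows_right = list(csv.reader(text.splitlines()))  # iterates over it one line at a time

print(rows_wrong[:3])  # [['s'], ['t'], ['r']]
print(rows_right)      # [['strain', 'value'], ['strain_1', '3.1']]
```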

What we need to do is split up our big long string.

The different lines in the file are actually separated by a newline character, '\n'.

But we have to tell Python to split the string up whenever it sees that character.
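One way to do that is `split("\n")`, but the string method `splitlines()` is usually safer, because it also handles Windows-style `\r\n` line endings and doesn't leave an empty string at the end. A small comparison on made-up text:

```python
text = "a,b\r\nc,d\r\n"  # made-up text with Windows-style line endings

print(text.split("\n"))   # ['a,b\r', 'c,d\r', '']: stray \r characters and an empty last item
print(text.splitlines())  # ['a,b', 'c,d']
```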

In [None]:
import urllib.request
import csv

url_for_file = "http://phenome.jax.org/tmp/Wiltshire3_means.csv"
with urllib.request.urlopen(url_for_file) as response:
    # splitlines() turns the one big string into a list with one string per line
    csv_file = response.read().decode('utf-8').splitlines()
reader = csv.reader(csv_file, delimiter=',')
parsed_file = list(reader)

How does our `parsed_file` look now?

In [None]:
parsed_file

In [None]:
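def parse_jackson_csv(csv_string):
    """
    Parses csv files from the Jackson Labs website.
    Deals with some of the idiosyncrasies that csv.Sniffer doesn't recognize.
    """
    SEPARATOR_BEFORE_HEADER = ",,,,,,,,,,\n,,,,,,,,,,,\n"
    # rfind returns the position of the LAST occurrence, or -1 if it isn't there
    index = csv_string.rfind(SEPARATOR_BEFORE_HEADER)
    if index == -1:
        return csv_string  # separator not found: leave the text unchanged
    # skip past the separator, plus one more character
    new_start_index = index + len(SEPARATOR_BEFORE_HEADER) + 1
    csv_string = csv_string[new_start_index:]
    return csv_string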
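The function leans on two string tools: `rfind`, which returns the index of the *last* occurrence of a substring (or `-1` if it isn't there), and slicing, which keeps everything from an index onward. A toy sketch, with a made-up marker standing in for `SEPARATOR_BEFORE_HEADER`:

```python
MARKER = "---\n"  # hypothetical separator for this example only
text = "notes\n---\nmore notes\n---\nstrain,value\nstrain_1,3.1\n"

index = text.rfind(MARKER)         # start of the LAST "---\n"
tail = text[index + len(MARKER):]  # everything after that marker
print(repr(tail))                  # 'strain,value\nstrain_1,3.1\n'

print(text.rfind("no such marker"))  # -1 when the substring is missing
```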

In [None]:
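# csv_file is a list of lines at this point (we called splitlines() above),
# so join it back into one string before looking for the separator
csv_string = parse_jackson_csv("\n".join(csv_file))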

In [None]:
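dialect = csv.Sniffer().sniff(csv_string)
# pass a list of lines, not the raw string, so the reader doesn't split on every character
reader = csv.reader(csv_string.splitlines(), dialect)
parsed_rows = list(reader)

In [None]:
parsed_rows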
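To see what `csv.Sniffer` does without downloading anything, here is a sketch on made-up semicolon-separated data. The `delimiters` argument limits which characters the sniffer will consider:

```python
import csv

# made-up data separated by semicolons instead of commas
sample = "strain;value\nstrain_1;3.1\nstrain_2;2.7\n"

dialect = csv.Sniffer().sniff(sample, delimiters=",;\t")
print(dialect.delimiter)  # ;

# the sniffed dialect can be handed straight to csv.reader
rows = list(csv.reader(sample.splitlines(), dialect))
print(rows)  # [['strain', 'value'], ['strain_1', '3.1'], ['strain_2', '2.7']]
```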

In [None]:
url_for_file = "http://phenome.jax.org/tmp/Willott1_table.csv"
with urllib.request.urlopen(url_for_file) as response:
    # split into lines so csv.reader sees one line at a time, not one character at a time
    csv_file = response.read().decode('utf-8').splitlines()
reader = csv.reader(csv_file, delimiter=',', quotechar='"')