# 1.0 Python is a calculator
This tutorial notebook introduces the following key concepts and their implementations in Python:
- numerical types `int` and `float`
- variables
- errors
- calling `function`s
- the `list` data structure
- packages `math` and `scipy.stats` for doing various useful calculations.
### Requirements
**Dependencies:**  
Python 3  
`scipy`

**Prerequisites:**  
See tutorial 0, 'Setting up your workstation' for instructions on how to install Python and run this notebook.

In [None]:
# Any text following a hash '#' is a *comment*. Comments aren't read by the computer; they are for documenting your code.

A programming language (such as Python 3) is essentially a very generalizable calculator. Here are some of the core components:
- simple data types called *primitives* such as numbers and characters (`3`, `2.1`, `'a'`, etc)
- complex data *objects* composed of simpler parts (`[3,4]`, `'some text'`)
- *variables* that can be assigned a value (`x = 4`)
- *functions* that perform some transformation on your data (`f(x)`)

We'll explain these in more detail later. For now, let's start with the very useful function `print`. 

## `print()`

In [None]:
# print() takes an input and shows that input to the user:
print('Hello world')

# If print() is given a variable, it prints the value:
x = 'This is the python basics tutorial.'
print(x)

# print can accept more than one input, separated by a comma:
my_variable = 7
print("Here is the value of my_variable:", my_variable)

## Errors
Computers are very picky about grammar. If your code doesn't follow the rules of the programming language, it will throw an *error*.

In [None]:
# A function requires parentheses `()` around the inputs.
print 'Oops, forgot the parentheses.' # this code will raise a SyntaxError.

Usually the error message tries to give helpful advice about what might be incorrect in your code. Reading error messages is a skill that comes with time and practice. 

## Numbers with `int` and `float`

`int` is a simple data type and follows the mathematical definition of an integer, e.g. `-5000`, `0`, `17`.  
`float` is short for 'floating point' and is any number written with a decimal point, e.g. `3.1415`, `4.0`.  
[numerical types docs](https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex)
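
To see which type Python has assigned to a value, the built-in `type()` function can help. The following is a minimal sketch; the values and the variable name `total` are only illustrations.

```python
# type() reports the data type of a value
print(type(17))      # <class 'int'>
print(type(3.1415))  # <class 'float'>
print(type(4.0))     # a whole number written with a decimal point is still a float

# mixing an int and a float in arithmetic gives a float
total = 17 + 4.0
print(total, type(total))
```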

In [None]:
# Simple arithmetic

a = 20 # an int
b = 3 # another int

print(a, "+", b, "=", a+b)
print(a, "-", b, "=", a-b)
print(a, "*", b, "=", a*b) # `*` indicates multiplication
print(a, "**", b, "=", a**b) # `**` indicates exponent
print(a, "/", b, "=", a/b)

**Exercise**

Note that since 20 is not divisible by 3, the result of dividing an `int` by an `int` is a `float`. However, some integers are divisible by other integers, e.g. 20 / 2 = 10. In that case, do you think Python should return an `int` or a `float`? 

How would you check?

In [None]:
# write your code here:


Python can also do more complex math, but those functions aren't loaded by default. Instead, we have to `import` a *package*, which lets us use programs and functions written by someone else. The `math` package is part of Python's standard library, so it is installed along with Python, but it can't be used until you `import` it.

[`math` docs](https://docs.python.org/3/library/math.html)

In [None]:
import math
print('The square root of', a, 'is', math.sqrt(a) )
print('Log', a, 'base', b, 'is', math.log(a,b) )

## Variables

So far we've defined the *variables* `my_variable`, `a`, and `b`. Variables in Python work a lot like variables in classical algebra; we can set them to a value, input them into functions, etc.
Variable names can be any combination of letters, digits, and the underscore `_`, but cannot start with a digit. You also can't use *reserved keywords* such as `for`, `if`, or `import` as names. Built-in function names like `print` are not reserved, so Python will let you reassign them, but doing so hides the original function and should be avoided.

Valid: `my_var87`

Invalid: `my-var87`, `87_myvar`, `for`
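
One way to check a candidate name without triggering an error is to ask Python directly. The sketch below uses the string method `isidentifier()` and the standard library's `keyword` module; the variable names holding the results are made up for illustration. Note that `isidentifier()` also accepts non-English letters, which Python 3 allows in names.

```python
import keyword

# isidentifier() checks the naming rule: letters, digits, underscores,
# and no digit in the first position
good_name = "my_var87".isidentifier()
has_hyphen = "my-var87".isidentifier()
starts_with_digit = "87_myvar".isidentifier()

print(good_name)          # True
print(has_hyphen)         # False
print(starts_with_digit)  # False

# iskeyword() reports whether a word is reserved by the language
for_is_reserved = keyword.iskeyword("for")
print(for_is_reserved)    # True
```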

In [None]:
# define a new variable
c = 7

# redefine to a new value
c = c + b

# use it in a function
d = math.log(c,10)
# what is the value of d?

## Lists

So far we have only discussed scalar data points each with a single value. However, sometimes we want to manipulate *collections* of data points. Common use cases include:
- *map*: apply the same operation to a set of similar data points, e.g. translation, rotation or scaling;
- *reduce*: perform some transformation that aggregates a set of data points into a single value, e.g. mean or variance.
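
As a rough illustration of the two patterns, here is a small sketch using a Python list (lists are introduced just below). The data values and the names `points`, `scaled` and `mean` are hypothetical, and the `[... for ... in ...]` syntax is a list comprehension, which this tutorial does not cover in detail.

```python
import statistics

# Hypothetical measurements stored in a list
points = [1.0, 2.5, 4.0]

# map: apply the same operation (scaling by 2) to every element
scaled = [2 * p for p in points]
print(scaled)  # [2.0, 5.0, 8.0]

# reduce: aggregate all elements into a single value (here, the mean)
mean = statistics.mean(points)
print(mean)    # 2.5
```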

There are tons of *data structures* out there, but for now we introduce the `list`.  
[`list` docs](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range)

Lists are the most generally useful way to store a collection of data in Python. Create a list using square brackets `[]` with elements separated by commas `,`:

In [None]:
# create a list
elementary_school_years = [2000,2001,2002,2003,2004] 

# add an element to the end of a list using `append`.
elementary_school_years.append(2005) 

print(elementary_school_years)

Note that you can put anything in a `list`, even other lists! Here, though, we're just using integers.  
Lists are *ordered* so that you can retrieve any element from its *index*. List indices begin at zero, so the first, second and third elements are accessible at indices 0, 1, and 2. 

In [None]:
# get the first element of a list
print("The first element:",elementary_school_years[0])

# to access elements backwards from the end of a list, use negative indices:
print("The last element:",elementary_school_years[-1])

Get a subset of the list using the syntax `my_list[start:stop]`. The element at index `start` is included, but the element at index `stop` is not. If `start` or `stop` is omitted, it is assumed to be 0 or the length of the list, respectively, so `my_list[:]` returns a copy of the entire `my_list`.

In [None]:
# get the first 3 items
print(elementary_school_years[:3])
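
A few more slices may help make the `start:stop` behavior concrete. This is a small sketch on a fresh list; the names `years`, `middle` and `tail` are chosen only for this example.

```python
years = [2000, 2001, 2002, 2003, 2004, 2005]

# start is included, stop is excluded: this picks indices 1, 2 and 3
middle = years[1:4]
print(middle)      # [2001, 2002, 2003]

# omit stop to slice through to the end of the list
tail = years[4:]
print(tail)        # [2004, 2005]

# slicing builds a new list; the original is unchanged
print(len(years))  # 6
```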

There's lots more you can do with a list, but for now I'll just introduce a couple of operations on lists that we'll use later in this tutorial.

In [None]:
# get the length of a list
print(len(elementary_school_years))

# reverse a list
print(elementary_school_years[::-1])

**Exercise**: Is `len` a mapping or a reduction? How about reversal?

## Statistics with `math` and `scipy`

We now have nearly all the basic ingredients to begin doing useful science with Python! To demonstrate, let's start with some example data. 

Suppose we're engineering a knockout model of a gene. To test whether the knockout was successful, we measure gene expression in 5 replicates of the knockout model and 5 controls: 

In [None]:
ko = [54.00, 58.15, 65.30, 72.06, 96.18]
ctrl = [412.91, 462.26, 492.03, 563.65, 664.11]

It sure looks like we were successful, but our PI is demanding a p-value. Let's do that in Python.

For statistics in Python, one of the most widely used packages is `scipy`. `scipy.stats` has functions for a wide range of statistical tests, but we just need the humble t-test. Take a look at the [scipy docs](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html) for the t-test to see how to run it.

Looks like you need inputs `a` and `b`; there are several other parameters, which we'll leave at their defaults.

In [None]:
import scipy.stats

scipy.stats.ttest_ind(a=ko, b=ctrl)

Wonderful. Our p-value is 7.6e-6, which is definitely significant.
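
To get a feel for what the test is doing, the t-statistic can be reproduced with the tools from earlier in this notebook plus the standard library's `statistics` module. This is only a sketch, and it assumes the classic equal-variance (pooled) form of the two-sample t-test, which is what `ttest_ind` uses with its default `equal_var=True`. It computes the statistic only, not the p-value; all variable names other than `ko` and `ctrl` are made up for illustration.

```python
import math
import statistics

ko = [54.00, 58.15, 65.30, 72.06, 96.18]
ctrl = [412.91, 462.26, 492.03, 563.65, 664.11]

n_ko = len(ko)
n_ctrl = len(ctrl)

# difference between the two group means
mean_diff = statistics.mean(ko) - statistics.mean(ctrl)

# pooled sample variance, assuming both groups share the same variance
pooled_var = (
    (n_ko - 1) * statistics.variance(ko)
    + (n_ctrl - 1) * statistics.variance(ctrl)
) / (n_ko + n_ctrl - 2)

# standard error of the difference in means
standard_error = math.sqrt(pooled_var * (1 / n_ko + 1 / n_ctrl))

t_statistic = mean_diff / standard_error
print("t =", round(t_statistic, 2))  # about -10.1
```

The large negative value reflects how far the knockout mean sits below the control mean relative to the spread within each group.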

Our PI is pretty demanding. To prepare a visualization for our biweekly meeting, read on to tutorial 1.1.

## Solutions

## Resources

## Postscript: arrays, matrices and tensors

You may notice that the scipy doc for the t-test states that parameters `a` and `b` should be 'array-like'. The array (`ndarray`) is a class defined in the `numpy` package for numerical computing, and it implements the idea of first-order vectors, second-order matrices, and higher-order tensors. The lists of numbers we used are first-order; lists of lists would be considered second-order, etc. If, for example, we had done RNA-seq to get gene expression of thousands of genes across our 10 samples, we could organize these into a second-order matrix and use `scipy.stats.ttest_ind` to perform t-tests for each gene. But that's beyond the scope of this introductory tutorial.
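
As a small taste of the second-order idea using only plain lists, the two groups from this tutorial can be stacked into a list of lists. This is a sketch; the name `expression` and the row/column layout are assumptions made for illustration, not a `numpy` array.

```python
ko = [54.00, 58.15, 65.30, 72.06, 96.18]
ctrl = [412.91, 462.26, 492.03, 563.65, 664.11]

# a list of lists: one row per group, one column per replicate
expression = [ko, ctrl]

n_rows = len(expression)
n_cols = len(expression[0])
print(n_rows, n_cols)    # 2 5

# two indices are needed to reach a single value: row first, then column
first_ctrl = expression[1][0]
print(first_ctrl)        # 412.91
```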