# Introduction to Python

In this notebook we are going to go over some very quick Python basics to prepare for the Spatial Analysis, Git, and Using Python for Data Analysis sections.

We will go over the following: 

1. Importing external libraries
2. Data Types
3. Variables
4. Arrays
5. Lists
6. Loops
7. Functions
8. Further Reading


## 1. Importing external libraries

In Python we use the `import` command to load external libraries into our programs. An external library can be thought of as a suite of tools you can pull off the shelf. We are going to import the numpy library, a library for numerical computing.

In [1]:
import numpy
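As a small illustration of the same idea, the sketch below imports `math`, a module from Python's standard library, and calls one of its functions using the `library.function` pattern. `math` is chosen here only as an example; it is not used elsewhere in this notebook.

```python
# math ships with Python, so nothing extra needs to be installed
import math

# Call a function from the library using the library.function pattern
root = math.sqrt(16)
print(root)  # 4.0
```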

## 2. Data Types 

Python has several basic data types built into the language. Five common ones are:

1. Ints: Ints are integer numbers, as the name indicates (e.g., 10).
2. Floats: Floats are representations of decimal numbers (e.g., 10.05).
3. Strings: Strings represent characters (e.g., Chicago).
4. Lists: Lists in Python are containers that can hold multiple types of variables.
5. Dictionaries: Dictionaries are key-value pair lookups.

In [2]:
print(10, type(10))
print(10.05, type(10.05))
print('Chicago', type('Chicago'))

10 <class 'int'>
10.05 <class 'float'>
Chicago <class 'str'>
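Lists and dictionaries can be inspected with `type` in the same way. The sketch below uses made-up values (the names `cities` and `city_temps` and the temperatures are assumptions for illustration only).

```python
# A list holds several values, possibly of different types
cities = ['Chicago', 'New York', 10, 10.05]
print(cities, type(cities))

# A dictionary maps keys to values (these temperatures are invented)
city_temps = {'Chicago': 20, 'New York': 26}
print(city_temps['Chicago'], type(city_temps))
```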


### Arithmetic

We can do the normal arithmetic operations on ints and floats:
    
1. +: addition
2. -: subtraction
3. *: multiplication
4. /: division
5. **: exponentiation

In [3]:
print('int arithmetic')
print(10+5)
print(10-5)
print(10*5)
print(10/5)
print(10**5)
print('\n')
print('float arithmetic')
print(10.3+5.4)
print(10.3-5.4)
print(10.3*5.4)
print(10.3/5.4)
print(10.2**2.3)

int arithmetic
15
5
50
2.0
100000


float arithmetic
15.700000000000001
4.9
55.620000000000005
1.9074074074074074
208.82399263596477
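The trailing digits in results such as `15.700000000000001` appear because floats are stored in binary and most decimal fractions cannot be represented exactly. A minimal sketch of how to compare or display such values, using `math.isclose` and `round` from standard Python:

```python
import math

total = 10.3 + 5.4
print(total)                      # 15.700000000000001, as in the output above
print(total == 15.7)              # False: the two floats differ in their last bits
print(math.isclose(total, 15.7))  # True: equal within a small tolerance
print(round(total, 2))            # 15.7
```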


### Mixing float and integer arithmetic

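In Python 2, dividing two ints with `/` (e.g. `5/2`) returns 2 and cuts off the decimal portion; an expression that contains a float, such as `5/2.3`, still returns 2.174.
In Python 3, `/` always keeps the decimal portion: `5/2` returns 2.5 and `5/2.3` returns 2.174.

When converting from Python 2 to Python 3 this can often be a source of subtle, pernicious bugs, so be careful!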


In [4]:
print(5/2.3)

2.173913043478261
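To make the difference concrete, here is a short sketch of how Python 3 treats division. The `//` operator (floor division) gives the truncating result that `/` produced for two ints in Python 2.

```python
print(5 / 2)     # 2.5  true division always returns a float
print(5 // 2)    # 2    floor division drops the fractional part
print(5 / 2.3)   # 2.173913043478261
print(10 / 5)    # 2.0  a float even when the result is a whole number
```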


Strings are a data type that represents characters. They are created by enclosing the text in single or double quotes.

In [5]:
print("Chicago")
print("New York")

Chicago
New York
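Single and double quotes are interchangeable, which is handy when the text itself contains a quote character. A brief sketch (the variable name `airport` is only for illustration):

```python
# Single and double quotes create the same string
print('Chicago' == "Chicago")   # True

# Wrap the text in double quotes when it contains an apostrophe
airport = "Chicago O'Hare"
print(airport)
print(len(airport))             # 14 characters, counting the space and apostrophe
```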


## 3. Variables

Let's load weather data recorded from Chicago O'Hare. 

In [6]:
numpy.loadtxt('OHARE_TEMP_2015.csv',delimiter=',')

array([[  2.01501010e+07,   2.00000000e+01],
       [  2.01501020e+07,   2.60000000e+01],
       [  2.01501030e+07,   3.10000000e+01],
       [  2.01501040e+07,   2.90000000e+01],
       [  2.01501050e+07,   2.00000000e+00],
       [  2.01501060e+07,   5.00000000e+00],
       [  2.01501070e+07,   3.00000000e+00],
       [  2.01501080e+07,  -2.00000000e+00],
       [  2.01501090e+07,   8.00000000e+00],
       [  2.01501100e+07,   5.00000000e+00],
       [  2.01501110e+07,   2.60000000e+01],
       [  2.01501120e+07,   2.30000000e+01],
       [  2.01501130e+07,   1.70000000e+01],
       [  2.01501140e+07,   1.10000000e+01],
       [  2.01501150e+07,   2.20000000e+01],
       [  2.01501160e+07,   2.80000000e+01],
       [  2.01501170e+07,   3.40000000e+01],
       [  2.01501180e+07,   3.70000000e+01],
       [  2.01501190e+07,   3.20000000e+01],
       [  2.01501200e+07,   3.40000000e+01],
       [  2.01501210e+07,   3.10000000e+01],
       [  2.01501220e+07,   3.00000000e+01],
       [  

Let's break down the command `numpy.loadtxt('OHARE_TEMP_2015.csv',delimiter=',')`.

We called the `loadtxt` function from the numpy library using the syntax *numpy.somecommand*. You can think of functions as verbs that do things in your program; this one loads data from a CSV file. A function takes arguments (also called parameters) and returns an output. Here we passed two arguments, `'OHARE_TEMP_2015.csv'` and `delimiter=','`, in two different ways: as a *positional* and a *keyword* argument, respectively.

The filename `'OHARE_TEMP_2015.csv'` is the positional argument, which has to be passed to the function first. The keyword argument is `delimiter`. Our file has fields separated by commas, so we tell the function this by setting the keyword `delimiter` equal to the string `','`: `delimiter=','`.
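The difference between positional and keyword arguments can be sketched with a small function of our own. `split_fields` below is a hypothetical helper written for this example, not part of numpy; it only mimics the way `loadtxt` accepts a `delimiter` keyword.

```python
def split_fields(line, delimiter=','):
    """Split one line of text into fields (hypothetical helper)."""
    return line.split(delimiter)

# Positional argument only: delimiter falls back to its default, ','
row = split_fields('20150101,20')
print(row)       # ['20150101', '20']

# Positional argument followed by a keyword argument
tab_row = split_fields('20150101\t20', delimiter='\t')
print(tab_row)   # ['20150101', '20']
```

Keyword arguments are matched by name, so the call documents what the second value means.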

When we call `loadtxt` it outputs an array of data. We can capture the output by storing it in a variable that we can reference later. You can think of the variable as a label on the data. In Python, variable names must start with a letter or an underscore. A good variable name should also be self-documenting. This means `np_ohare_temperature` is better than, say, `x`, because the former communicates that the variable labels a numpy array containing temperature data from O'Hare, while `x` communicates nothing.

In [7]:
np_ohare_temperature = numpy.loadtxt('OHARE_TEMP_2015.csv',delimiter=',')

## 4. Arrays

Notice that when we assigned the output of the `numpy.loadtxt` command to a variable it did not display the output of the command. 

We can examine the output using the `print` function, which prints the content labelled by a variable.

In [8]:
print(np_ohare_temperature)

[[  2.01501010e+07   2.00000000e+01]
 [  2.01501020e+07   2.60000000e+01]
 [  2.01501030e+07   3.10000000e+01]
 [  2.01501040e+07   2.90000000e+01]
 [  2.01501050e+07   2.00000000e+00]
 [  2.01501060e+07   5.00000000e+00]
 [  2.01501070e+07   3.00000000e+00]
 [  2.01501080e+07  -2.00000000e+00]
 [  2.01501090e+07   8.00000000e+00]
 [  2.01501100e+07   5.00000000e+00]
 [  2.01501110e+07   2.60000000e+01]
 [  2.01501120e+07   2.30000000e+01]
 [  2.01501130e+07   1.70000000e+01]
 [  2.01501140e+07   1.10000000e+01]
 [  2.01501150e+07   2.20000000e+01]
 [  2.01501160e+07   2.80000000e+01]
 [  2.01501170e+07   3.40000000e+01]
 [  2.01501180e+07   3.70000000e+01]
 [  2.01501190e+07   3.20000000e+01]
 [  2.01501200e+07   3.40000000e+01]
 [  2.01501210e+07   3.10000000e+01]
 [  2.01501220e+07   3.00000000e+01]
 [  2.01501230e+07   3.00000000e+01]
 [  2.01501240e+07   3.30000000e+01]
 [  2.01501250e+07   3.10000000e+01]
 [  2.01501260e+07   2.30000000e+01]
 [  2.01501270e+07   2.70000000e+01]
 

Our output is a two-column array. The first column contains the date a measurement was taken and the second column provides the maximum temperature on that day.

Let's examine how many rows and columns are in our array by accessing the `shape` attribute.

In [9]:
np_ohare_temperature.shape

(365, 2)

An attribute describes the "state" of an object: if functions are verbs that do something, then attributes are the adjectives. In this case the `shape` attribute indicates how many rows and columns we have in the array: 365 rows for the 365 days of the year and 2 columns for the date and average temperature.
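Arrays carry a few other attributes besides `shape`. The sketch below builds a tiny stand-in array (named `sample` here; the values are copied from the first three rows printed above) so it can run without the CSV file.

```python
import numpy

# A tiny stand-in for the full data set: three rows of (date, temperature)
sample = numpy.array([[20150101, 20],
                      [20150102, 26],
                      [20150103, 31]], dtype=float)

# Attributes are read without parentheses because nothing is being called
print(sample.shape)  # (3, 2): three rows, two columns
print(sample.ndim)   # 2: the number of dimensions
print(sample.size)   # 6: the total number of values
```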

### Indexing

To access specific values in the array we have to index it. For reasons that have to do with how arrays are implemented under the hood, array indices begin at 0 and end at N-1, where N is the length of the array along that dimension.

In [10]:
#index the first value of the array 0,0
#this is the first row and first column
print(np_ohare_temperature[0,0])
#index the second row and first column of the array 1,0
print(np_ohare_temperature[1,0])

20150101.0
20150102.0


These two indices return the first and second dates in the array.

In [11]:
print(np_ohare_temperature[0,1])
print(np_ohare_temperature[1,1])

20.0
26.0


By changing the second index to 1, as in the cell above, we get the average temperature for those two days.
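The "0 to N-1" rule can be checked directly. This sketch again uses a small stand-in array, `sample`, with values taken from the first rows of the data, and shows what happens when the index goes one step too far.

```python
import numpy

sample = numpy.array([[20150101, 20],
                      [20150102, 26],
                      [20150103, 31]], dtype=float)

n_rows = sample.shape[0]        # N, the number of rows
print(sample[n_rows - 1, 1])    # 31.0, the temperature in the last row

# Index N is one past the end, so numpy raises an IndexError
try:
    sample[n_rows, 1]
except IndexError as err:
    print('IndexError:', err)
```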

## Lists 

`lists` are a Python data type built into the language, as opposed to the NumPy array, which has to be loaded from an external library. To create a list we put values separated by commas between square brackets.

In [15]:
[0,1,2,3,4,5,6]

[0, 1, 2, 3, 4, 5, 6]

In [16]:
ls_num = [0,1,2,3,4,5,6]
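A few common operations on the list just created, shown as a minimal sketch: indexing follows the same zero-based rule as arrays, `len` reports the number of elements, and `append` adds a value to the end.

```python
ls_num = [0, 1, 2, 3, 4, 5, 6]

print(ls_num[0])     # 0, the first element
print(len(ls_num))   # 7

# Lists can grow after they are created
ls_num.append(7)
print(ls_num)        # [0, 1, 2, 3, 4, 5, 6, 7]
```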

## Slicing 

Now that we have `lists` in our toolbox, let's pull out the first seven days of the year and display them.

In [18]:
np_ohare_temperature[[0,1,2,3,4,5,6],0]

array([ 20150101.,  20150102.,  20150103.,  20150104.,  20150105.,
        20150106.,  20150107.])

It is useful to get multiple dates from the array, but we also want the average temperatures on those days. If we leave out the column index, we get both columns for the selected rows.

In [34]:
np_ohare_temperature[[0,1,2,3,4,5,6]]

array([[  2.01501010e+07,   2.00000000e+01],
       [  2.01501020e+07,   2.60000000e+01],
       [  2.01501030e+07,   3.10000000e+01],
       [  2.01501040e+07,   2.90000000e+01],
       [  2.01501050e+07,   2.00000000e+00],
       [  2.01501060e+07,   5.00000000e+00],
       [  2.01501070e+07,   3.00000000e+00]])
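A list of row indices can also be combined with a single column index to pull out only the temperatures. The sketch below uses a four-row stand-in array, `sample`, with values copied from the output above.

```python
import numpy

sample = numpy.array([[20150101, 20],
                      [20150102, 26],
                      [20150103, 31],
                      [20150104, 29]], dtype=float)

rows = [0, 1, 2]
print(sample[rows, 1])   # temperatures only, for the three selected rows
print(sample[rows])      # both columns for the same rows
```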

A more convenient, scalable and less fragile way to pull out values from arrays is through slicing. We slice into the array in the following way:

In [35]:
np_ohare_temperature[0:7]

array([[  2.01501010e+07,   2.00000000e+01],
       [  2.01501020e+07,   2.60000000e+01],
       [  2.01501030e+07,   3.10000000e+01],
       [  2.01501040e+07,   2.90000000e+01],
       [  2.01501050e+07,   2.00000000e+00],
       [  2.01501060e+07,   5.00000000e+00],
       [  2.01501070e+07,   3.00000000e+00]])

`0:7` is an instruction to start at index 0 of the array and go up to, but not including, index 7.
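The same slice notation works on built-in lists, which makes the "up to but not including" rule easy to check. A short sketch with a made-up list named `days`:

```python
days = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

first_week = days[0:7]
print(first_week)        # [0, 1, 2, 3, 4, 5, 6]
print(len(first_week))   # 7, which is stop - start

# The stop index itself is excluded
print(7 in first_week)   # False
```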

Slices do not have to begin at zero. We could start at index 3 (the fourth row), for instance.

In [36]:
np_ohare_temperature[3:7]

array([[  2.01501040e+07,   2.90000000e+01],
       [  2.01501050e+07,   2.00000000e+00],
       [  2.01501060e+07,   5.00000000e+00],
       [  2.01501070e+07,   3.00000000e+00]])

If we omit the left index in the slice, it will by default start at the beginning of the array, as if there were an implicit zero.

In [37]:
np_ohare_temperature[:7]

array([[  2.01501010e+07,   2.00000000e+01],
       [  2.01501020e+07,   2.60000000e+01],
       [  2.01501030e+07,   3.10000000e+01],
       [  2.01501040e+07,   2.90000000e+01],
       [  2.01501050e+07,   2.00000000e+00],
       [  2.01501060e+07,   5.00000000e+00],
       [  2.01501070e+07,   3.00000000e+00]])

If we omit the right index in the slice it will by default end at the end of the array. 

In [41]:
np_ohare_temperature[360:]

array([[  2.01512270e+07,   3.80000000e+01],
       [  2.01512280e+07,   3.30000000e+01],
       [  2.01512290e+07,   3.30000000e+01],
       [  2.01512300e+07,   2.90000000e+01],
       [  2.01512310e+07,   2.60000000e+01]])

If there is no right or left index specified then the slice will range over the entire array. 

In [42]:
np_ohare_temperature[:]

array([[  2.01501010e+07,   2.00000000e+01],
       [  2.01501020e+07,   2.60000000e+01],
       [  2.01501030e+07,   3.10000000e+01],
       [  2.01501040e+07,   2.90000000e+01],
       [  2.01501050e+07,   2.00000000e+00],
       [  2.01501060e+07,   5.00000000e+00],
       [  2.01501070e+07,   3.00000000e+00],
       [  2.01501080e+07,  -2.00000000e+00],
       [  2.01501090e+07,   8.00000000e+00],
       [  2.01501100e+07,   5.00000000e+00],
       [  2.01501110e+07,   2.60000000e+01],
       [  2.01501120e+07,   2.30000000e+01],
       [  2.01501130e+07,   1.70000000e+01],
       [  2.01501140e+07,   1.10000000e+01],
       [  2.01501150e+07,   2.20000000e+01],
       [  2.01501160e+07,   2.80000000e+01],
       [  2.01501170e+07,   3.40000000e+01],
       [  2.01501180e+07,   3.70000000e+01],
       [  2.01501190e+07,   3.20000000e+01],
       [  2.01501200e+07,   3.40000000e+01],
       [  2.01501210e+07,   3.10000000e+01],
       [  2.01501220e+07,   3.00000000e+01],
       [  

## Loops 

## Functions