# Introduction

This material assumes that you have programmed before. This first session provides a quick introduction to programming in Python for those who either haven't used Python before or need a quick refresher.

Let's start with a hypothetical problem we want to solve. We are interested in understanding the relationship between the weather and the number of mosquitos occurring in a particular year so that we can plan mosquito control measures accordingly. Since we want to apply these measures at a number of different sites, we need to understand both the relationship at a particular site and whether it is consistent across sites.

The data we have to address this problem were provided by another research group and are stored as tables in comma-separated values (CSV) files. Each file holds the data for a single location, each row holds the information for a single year at that location, and the columns hold the mosquito numbers along with the average temperature and rainfall from the beginning of the mosquito breeding season. The first few rows of our first file look like this:

~~~
year,temperature,rainfall,mosquitos
2001,87,222,198
2002,72,103,105
2003,77,176,166
~~~

## Objectives

* Conduct variable assignment, looping, and conditionals in Python
* Use an external Python library
* Read tabular data from a file
* Subset and perform analysis on data
* Display simple graphs

## Loading Data

In [1]:
import pandas

In [2]:
pandas.read_csv('A1_mosquito_data.csv')

   year  temperature  rainfall  mosquitos
0  2001           80       157        150
1  2002           85       252        217
2  2003           86       154        153
3  2004           87       159        158
4  2005           74       292        243
5  2006           75       283        237
6  2007           80       214        190
7  2008           85       197        181
8  2009           74       231        200
9  2010           74       207        184


In [6]:
data = pandas.read_csv('A1_mosquito_data.csv')

In [7]:
print(data)


   year  temperature  rainfall  mosquitos
0  2001           80       157        150
1  2002           85       252        217
2  2003           86       154        153
3  2004           87       159        158
4  2005           74       292        243
5  2006           75       283        237
6  2007           80       214        190
7  2008           85       197        181
8  2009           74       231        200
9  2010           74       207        184


In [8]:
data


   year  temperature  rainfall  mosquitos
0  2001           80       157        150
1  2002           85       252        217
2  2003           86       154        153
3  2004           87       159        158
4  2005           74       292        243
5  2006           75       283        237
6  2007           80       214        190
7  2008           85       197        181
8  2009           74       231        200
9  2010           74       207        184
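
If the CSV file is not at hand, the same loading step can be tried on text held in memory. The sketch below is an illustration rather than part of the original session: it assumes Python 3 and passes the sample rows shown in the introduction to `pandas.read_csv` through `io.StringIO`. The names `sample_text` and `sample` are made up for the example.

```python
import io

import pandas

# The sample rows from the introduction, held in a string instead of a file.
sample_text = """year,temperature,rainfall,mosquitos
2001,87,222,198
2002,72,103,105
2003,77,176,166
"""

# read_csv accepts a file-like object, so wrap the string in StringIO.
sample = pandas.read_csv(io.StringIO(sample_text))

print(sample.shape)          # (number of rows, number of columns)
print(list(sample.columns))  # column names taken from the header line
```

The header line becomes the column names, and each remaining line becomes one row with an automatically generated index starting at 0.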


## Manipulating data

In [9]:
print(type(data))

<class 'pandas.core.frame.DataFrame'>


In [10]:
print(data['year'])

0    2001
1    2002
2    2003
3    2004
4    2005
5    2006
6    2007
7    2008
8    2009
9    2010
Name: year, dtype: int64


In [11]:
print(data[['year', 'temperature']])

   year  temperature
0  2001           80
1  2002           85
2  2003           86
3  2004           87
4  2005           74
5  2006           75
6  2007           80
7  2008           85
8  2009           74
9  2010           74


In [12]:
# A slice selects rows; the stop index is not included
print(data[0:2])

   year  temperature  rainfall  mosquitos
0  2001           80       157        150
1  2002           85       252        217


In [13]:
print(data[0:5])

   year  temperature  rainfall  mosquitos
0  2001           80       157        150
1  2002           85       252        217
2  2003           86       154        153
3  2004           87       159        158
4  2005           74       292        243


In [14]:
print(data[0:1])

   year  temperature  rainfall  mosquitos
0  2001           80       157        150


In [15]:
# A single integer in square brackets is looked up as a column name, not a row
print(data[1])

KeyError: 1

In [16]:
# Use .iloc to select a row by its position
print(data.iloc[1])

year           2002
temperature      85
rainfall        252
mosquitos       217
Name: 1, dtype: int64
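
The `KeyError` above comes from the way square brackets treat a single value: it is looked up among the column names. The following sketch contrasts the three access styles on a small stand-in table (the first three rows of the data shown earlier). It assumes the default integer index, where position and label happen to coincide.

```python
import pandas

# A small stand-in table: the first three rows of the data shown earlier.
data = pandas.DataFrame({
    "year": [2001, 2002, 2003],
    "temperature": [80, 85, 86],
    "rainfall": [157, 252, 154],
    "mosquitos": [150, 217, 153],
})

# Square brackets with a single label select a column...
years = data["year"]

# ...so a bare integer is looked up as a column name and fails.
try:
    data[1]
except KeyError:
    print("no column is named 1")

# .iloc selects by position, .loc selects by index label.
second_row = data.iloc[1]
same_row = data.loc[1]               # default index labels are 0, 1, 2, ...
one_value = data.loc[1, "rainfall"]  # row label first, then column label

print(second_row["year"])
print(one_value)
```

With a non-default index (for example, one built from the `year` column) `.iloc[1]` and `.loc[1]` would no longer refer to the same row.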


In [18]:
print(data[['temperature', 'year']][data['year'] > 2005])

   temperature  year
5           75  2006
6           80  2007
7           85  2008
8           74  2009
9           74  2010
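
The row selection above works because a comparison on a column produces one `True` or `False` per row, and that sequence is then used as a filter. A minimal sketch of the idea, using a four-row stand-in table and variable names chosen for illustration:

```python
import pandas

data = pandas.DataFrame({
    "year": [2004, 2005, 2006, 2007],
    "temperature": [87, 74, 75, 80],
})

# Comparing a column with a value gives one True/False per row.
after_2005 = data["year"] > 2005
print(after_2005.tolist())

# Passing the mask to .loc keeps the True rows; the second argument picks columns.
recent = data.loc[after_2005, ["temperature", "year"]]
print(recent)

# Masks combine with & (and) and | (or); each comparison needs its own parentheses.
warm_and_recent = data[(data["year"] > 2005) & (data["temperature"] >= 80)]
print(warm_and_recent)
```

Selecting rows and columns in a single `.loc` call is one common alternative to chaining two sets of square brackets.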


In [21]:
print(data[['temperature', 'rainfall']].mean())

temperature     80.0
rainfall       214.6
dtype: float64


In [23]:
print(data['temperature'].min())

74


In [24]:
print(data['mosquitos'][1:3].std())

45.2548339959
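
To see where that number comes from, the calculation can be repeated by hand. The slice `[1:3]` holds the two values 217 and 153, and `std()` in pandas divides by n - 1 by default (the sample standard deviation). The sketch below uses a stand-in Series with the first five mosquito counts.

```python
import math

import pandas

mosquitos = pandas.Series([150, 217, 153, 158, 243])

subset = mosquitos[1:3]   # rows 1 and 2: the values 217 and 153
mean = subset.mean()      # (217 + 153) / 2 = 185

# Sample standard deviation: sum the squared deviations, divide by n - 1.
squared = [(value - mean) ** 2 for value in subset]
by_hand = math.sqrt(sum(squared) / (len(subset) - 1))

print(by_hand)
print(subset.std())        # pandas uses the n - 1 form by default
print(subset.std(ddof=0))  # population form, dividing by n instead
```

Each value lies 32 away from the mean, so the n - 1 form gives the square root of 2048 while the population form gives exactly 32.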


### Challenge

Import the data from `A2_mosquito_data.csv`, create a new variable that holds a data frame with only the weather data, and print the means and standard deviations for the weather variables.

In [25]:
newdata = pandas.read_csv('A2_mosquito_data.csv')

In [26]:
weatherdata = newdata[['temperature', 'rainfall']]

In [27]:
print(weatherdata.mean())

temperature     80.392157
rainfall       207.039216
dtype: float64


In [30]:
print(weatherdata.std())

temperature     6.135400
rainfall       56.560396
dtype: float64
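
As an aside, both statistics can be requested in a single call. This is a sketch of one alternative, not the solution used above: it relies on `DataFrame.agg` with a list of statistic names and uses the ten weather rows from `A1_mosquito_data.csv` shown earlier as a stand-in, since the contents of the second file are not listed here.

```python
import pandas

# Stand-in for the weather columns: the ten rows of the first file shown earlier.
weather = pandas.DataFrame({
    "temperature": [80, 85, 86, 87, 74, 75, 80, 85, 74, 74],
    "rainfall": [157, 252, 154, 159, 292, 283, 214, 197, 231, 207],
})

# agg takes a list of statistic names and returns one row per statistic.
summary = weather.agg(["mean", "std"])
print(summary)

# Individual results are looked up by row label, then column label.
print(summary.loc["mean", "rainfall"])
```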


## Loops

for item in list:
    do_something

In [31]:
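temps = data['temperature']

In [36]:
for temp_in_f in temps:
    # Use 5.0 so the division is not truncated to 0 by integer arithmetic
    temp_in_c = (temp_in_f - 32) * 5.0 / 9
    print(round(temp_in_c, 1))

26.7
29.4
30.0
30.6
23.3
23.9
26.7
29.4
23.3
23.3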
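
The Fahrenheit-to-Celsius formula, C = (F - 32) × 5/9, can also be written once as a function and applied to a whole column, because arithmetic on a pandas Series acts on every element. The sketch below is an illustration only; `fahrenheit_to_celsius` is a name chosen for this example, and the Series holds the first five temperatures shown earlier.

```python
import pandas


def fahrenheit_to_celsius(temp_in_f):
    """Convert Fahrenheit to Celsius; accepts a number or a whole Series."""
    # Writing 5.0 keeps this a floating-point division in Python 2 as well.
    return (temp_in_f - 32) * 5.0 / 9


temps = pandas.Series([80, 85, 86, 87, 74])

# Element by element, as a for loop would do it.
looped = [fahrenheit_to_celsius(temp) for temp in temps]

# All at once: the arithmetic is applied to every value in the Series.
vectorized = fahrenheit_to_celsius(temps)

print(vectorized.round(1).tolist())
```

Both approaches give the same numbers; the loop makes each step explicit, while the whole-column form is shorter.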


## Conditionals

In [None]:
if conditional_test:
    do_something

In [38]:
temp = data['temperature'][0]
if temp > 80:
    print("The temperature is greater than 80")

In [39]:
temp = data['temperature'][0]
if temp < 87:
    print("The temperature is less than 87")
elif temp > 87:
    print("The temperature is greater than 87")
else:
    print("The temperature is equal to 87")

The temperature is less than 87
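
In an `if`/`elif`/`else` chain the tests are tried in order and only the first branch whose test is true runs; `else` catches whatever is left. The sketch below wraps the same three-way comparison in a small function so it can be tried on several values. The helper name `compare_to` and the test values are made up for illustration.

```python
def compare_to(value, threshold):
    """Describe how value relates to threshold (hypothetical helper)."""
    if value < threshold:
        return "less than"
    elif value > threshold:
        return "greater than"
    else:
        # Reached only when neither test above was true.
        return "equal to"


for temp in [80, 87, 90]:
    print("The temperature is " + compare_to(temp, 87) + " 87")
```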


### Challenge

Import the data from `A2_mosquito_data.csv`, determine the mean temperature, and loop over the temperature values. For each value, print out whether it is greater than the mean, less than the mean, or equal to the mean.

In [47]:
tempdata = pandas.read_csv('A2_mosquito_data.csv')
temperatures = tempdata['temperature']

tempmean = temperatures.mean()

for temp_in_f in temperatures:
    if temp_in_f > tempmean:
        print("Greater than mean")
    elif temp_in_f < tempmean:
        print("Smaller than mean")
    else:
        print("Equal to mean")


Greater than mean
Smaller than mean
Greater than mean
Smaller than mean
Smaller than mean
Greater than mean
Greater than mean
Smaller than mean
Smaller than mean
Greater than mean
Greater than mean
Smaller than mean
Greater than mean
Smaller than mean
Greater than mean
Greater than mean
Smaller than mean
Greater than mean
Greater than mean
Greater than mean
Greater than mean
Smaller than mean
Greater than mean
Smaller than mean
Smaller than mean
Smaller than mean
Smaller than mean
Greater than mean
Greater than mean
Greater than mean
Greater than mean
Smaller than mean
Smaller than mean
Smaller than mean
Smaller than mean
Greater than mean
Greater than mean
Smaller than mean
Greater than mean
Smaller than mean
Smaller than mean
Greater than mean
Greater than mean
Smaller than mean
Smaller than mean
Smaller than mean
Greater than mean
Smaller than mean
Smaller than mean
Greater than mean
Greater than mean


## Plotting
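
As a starting point, here is a minimal sketch of a line plot made with `pyplot`. It is an illustration under stated assumptions, not the session's original material: it plots mosquito counts against year for the first five rows of the data shown earlier, selects the non-interactive `Agg` backend so that it runs without a display, and uses axis labels chosen for the example.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; leave this line out in a notebook
import matplotlib.pyplot as plt
import pandas

# Stand-in data: the first five rows of the table shown earlier.
data = pandas.DataFrame({
    "year": [2001, 2002, 2003, 2004, 2005],
    "temperature": [80, 85, 86, 87, 74],
    "mosquitos": [150, 217, 153, 158, 243],
})

fig, ax = plt.subplots()

# x values first, then y values; marker="o" draws a dot at each year.
ax.plot(data["year"], data["mosquitos"], marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Number of mosquitos")
```

In an interactive session `plt.show()` displays the figure, and `fig.savefig(...)` writes it to a file. To look at the relationship between two measured variables rather than a trend over time, `ax.scatter` takes the same x and y arguments.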

### Challenge

Using the data in `A2_mosquito_data.csv` plot the relationship between the number of mosquitos and temperature and the number of mosquitos and rainfall.

### Key Points

*   Import a library into a program using `import libraryname`.
*   Use the `pandas` library to work with data tables in Python.
*   Use `variable = value` to assign a value to a variable.
*   Use `print(something)` to display the value of `something`.
*   Use `dataframe['columnname']` to select a column of data.
*   Use `dataframe[start_row:stop_row]` to select rows from a data frame.
*   Indices start at 0, not 1.
*   Use `dataframe.mean()`, `dataframe.max()`, and `dataframe.min()` to calculate simple statistics.
*   Use `for x in list:` to loop over values.
*   Use `if condition:` to make conditional decisions.
*   Use the `pyplot` library from `matplotlib` for creating simple visualizations.

## Next steps

With the requisite Python background out of the way, we're ready to dig into analyzing our data and, along the way, learn how to write code more efficiently and make it more likely to be correct.