# Data Analysis with Python
## Hugo Serrano
### hbarbosa@biocomplexlab.org
![BioComplex Lab](img/biocomplex.png)

# Scientific Python
![img](./img/sciwheel.png)

### Roadmap
1. Introduction
   1. Python
   2. IPython
2. Data analysis and visualization
   1. NumPy
   2. Matplotlib
   3. Pandas
3. Network analysis
   1. igraph
   2. powerlaw
   3. graph_tool
4. Geographic data analysis
   1. geopy
   2. shapely
   3. geoplotlib

# Basic Python syntax
#### First, let us start the Python interpreter
#### Windows users (assuming you already have the Python interpreter installed on your machine)
##### Open the command prompt, type `python`, and press `<Enter>`
```
C:\> python
```
```
Python 2.7.6 (default, Sep  9 2014, 15:04:36) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 
``` 
#### *NIX Users
##### Open the terminal and start the Python interpreter
```
$ python
```
```
Python 2.7.6 (default, Sep  9 2014, 15:04:36) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 
```

### Python interpreter as a simple calculator

```python
>>> 3 + 5
8
>>> 5 * 10
50
>>> 5 / 2 # Integer division: in Python 2, dividing two integers truncates the result
2
```


### Python infers the operation to be performed from the operand types
```python
>>> 5/2.0 # If one of the operands is a float, you get regular (true) division
2.5
>>> 5 * (2+3j)
(10+15j)

>>> 2**4
16

>>> _**2 # This special variable _ stores the output of the last operation
256

>>> _**2
65536
```
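
The truncating result of `5 / 2` shown above is Python 2 behaviour; in Python 3 the same expression evaluates to `2.5`. The following is a small sketch of operations that give the same result in both versions. The variable names are only illustrative.

```python
# Floor division and remainder behave the same in Python 2 and Python 3.
quotient = 5 // 2
remainder = 5 % 2

# A float operand gives true division in either version.
ratio = 5 / 2.0

# divmod returns the quotient and the remainder together as a tuple.
pair = divmod(5, 2)

print("%d %d %.1f" % (quotient, remainder, ratio))  # prints: 2 1 2.5
```

Writing `//` whenever truncation is intended keeps the code independent of the interpreter version.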


## Basic data types
```python
>>> a = 4
>>> type(a)
<type 'int'>
```

```python
>>> a = 2.5
>>> type(a)
<type 'float'>
```

```python
>>> a = 'Hello'
>>> type(a)
<type 'str'>
```

## Advanced data types
### Lists
```python
>>> my_list = [1,2,3,4]
>>> my_other_list = [8,7,2.4,3]
>>> my_list + my_other_list # The addition operation over lists concatenates them
[1, 2, 3, 4, 8, 7, 2.4, 3]
```

```python
>>> type(my_list)
<type 'list'>
```

```python
>>> my_list[0] = 100 # You can reassign an element of a list through its index
>>> print my_list
[100, 2, 3, 4]
```

```python
>>> my_list[-1] = 100**2  # You can also access elements using a negative index
>>> print my_list
[100, 2, 3, 10000]
```


### Slicing

```python 
>>> obs = [2, 5, 3, 1, 3, 1, 5, 7]
>>> print obs[:3]
[2, 5, 3]
```

```python
>>> print obs[3:]
[1, 3, 1, 5, 7]
```

```python
>>> print obs[-3:]
[1, 5, 7]

```
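
Slices also accept both bounds at once and an optional third field, the step. A short sketch reusing the `obs` list from above (the result names are illustrative):

```python
obs = [2, 5, 3, 1, 3, 1, 5, 7]

middle = obs[2:5]          # indices 2, 3 and 4 -> [3, 1, 3]
every_other = obs[::2]     # every second element -> [2, 3, 3, 5]
reversed_obs = obs[::-1]   # negative step walks backwards -> [7, 5, 1, 3, 1, 3, 5, 2]
shallow_copy = obs[:]      # a full slice creates a new list with the same elements

print(middle)
print(every_other)
print(reversed_obs)
```

A slice always returns a new list, so modifying `shallow_copy` leaves `obs` untouched.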

##### (In Python, lists can store any object)
```python
>>> weird_list = ["Hello",2.3, 4, 3 + 2j, ["this guy here belongs to a sublist"]]
>>> for i in weird_list:
...    print type(i)
... 
<type 'str'>
<type 'float'>
<type 'int'>
<type 'complex'>
<type 'list'>
```

## Tuples

```python
>>> my_tuple = (2,3,0,3)
>>> len(my_tuple)
4
```

```python
>>> my_tuple[0] = 100
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
```
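
Because tuples cannot be modified, they can be unpacked into variables and used as dictionary keys, which lists cannot. A minimal sketch; the coordinate values and the names `point` and `labels` are made up for illustration:

```python
point = (28.06, -80.62)        # hypothetical (latitude, longitude) pair

lat, lon = point               # tuple unpacking assigns both values at once

labels = {point: "campus"}     # tuples are hashable, so they work as dict keys

try:
    labels[[28.06, -80.62]] = "campus"   # a list is not hashable
except TypeError as err:
    message = str(err)

print(lat)
print(labels[(28.06, -80.62)])
print(message)
```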

## Dictionaries
```python
>>> user = {"name":"Hugo","email":"hbarbosa@biocomplex.org"}
>>> user["name"]
'Hugo'
>>> user["email"]
'hbarbosa@biocomplex.org'
```

### Dictionary assignment and nested dictionaries
```python
>>> user["affiliation"] = {"institution":"Florida Tech","dept":"Computer Sciences"}
>>> print user["affiliation"]["dept"]
Computer Sciences
>>> print user
{'affiliation': {'dept': 'Computer Sciences', 'institution': 'Florida Tech'}, 'name': 'Hugo', 'email': 'hbarbosa@biocomplex.org'}
```

### Querying the dictionary structure
```python
>>> user.keys()
['affiliation', 'name', 'email']
>>> user.values()
[{'dept': 'Computer Sciences', 'institution': 'Florida Tech'}, 'Hugo', 'hbarbosa@biocomplex.org']
>>> user.items()
[('affiliation', {'dept': 'Computer Sciences', 'institution': 'Florida Tech'}), ('name', 'Hugo'), ('email', 'hbarbosa@biocomplex.org')]
```
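
Key order in a Python 2 dictionary is arbitrary, which is why the output above does not follow insertion order. The sketch below shows a few common ways to query a dictionary safely; the `"phone"` key and the `"not provided"` default are hypothetical.

```python
user = {"name": "Hugo", "email": "hbarbosa@biocomplex.org"}

has_phone = "phone" in user                  # membership test on the keys
phone = user.get("phone", "not provided")    # .get returns a default instead of raising KeyError

# Sorting the items gives a stable order regardless of how the dict stores them.
lines = []
for key, value in sorted(user.items()):
    lines.append("%s: %s" % (key, value))

print(has_phone)
print(phone)
print(lines)
```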


## Strings
 - Python strings support operators such as repetition (`*`) and concatenation (`+`), which is especially handy when you need to deal with text data files.

```python
>>> a = "abc"
>>> a*10 # repeat string `a` 10 times
'abcabcabcabcabcabcabcabcabcabc'
```

```python
>>> c = "123"
>>> (a*10)+c # repeat string `a` 10 times and concatenate with string `c`
'abcabcabcabcabcabcabcabcabcabc123'
```

### Filling strings with zeros
```python
>>> x = 10
>>> print str(x).zfill(3)
010
```


```python
>>> values = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
>>> for v in values:
...    print "output_file_%s.txt"%(str(v).zfill(3))
... 
output_file_001.txt
output_file_002.txt
output_file_003.txt
output_file_004.txt
output_file_005.txt
output_file_006.txt
output_file_007.txt
output_file_008.txt
output_file_009.txt
output_file_010.txt
output_file_011.txt
output_file_012.txt
output_file_013.txt
output_file_014.txt
output_file_015.txt
```
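
The same zero padding can be expressed directly in the format specifier, which avoids the explicit `str(...).zfill(...)` call. A brief sketch of two equivalent spellings; the sample values are arbitrary:

```python
# %03d pads an integer with zeros to a width of 3.
names = ["output_file_%03d.txt" % v for v in [1, 10, 100]]

# str.format accepts the same width specification.
same_idea = "output_file_{:03d}.txt".format(7)

print(names)
print(same_idea)
```

Unlike `zfill`, these specifiers require a number, so they fail loudly if a value is accidentally a string.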

### Wrangling text data

```python
>>> row = "2.45,3.46,10.82,100,100,0,0,0"
>>> row
'2.45,3.46,10.82,100,100,0,0,0'
>>> row.split(",")
['2.45', '3.46', '10.82', '100', '100', '0', '0', '0']
```
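
Note that `split` returns strings, so the fields usually need converting before any arithmetic. A minimal sketch using the same `row` as above (the names `values`, `total` and `rebuilt` are illustrative):

```python
row = "2.45,3.46,10.82,100,100,0,0,0"

# Convert every field to a float so the values can be used in calculations.
values = []
for field in row.split(","):
    values.append(float(field))

total = sum(values)              # about 216.73

# join is the inverse of split: it glues the fields back together.
rebuilt = ",".join(row.split(","))

print(values)
print(rebuilt == row)
```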

### Loops

```python
cities = ['Los Angeles','Chicago','Austin','Miami']
populations = [3884,2718,885,417]
for city in cities:
    print city
```
```
Los Angeles
Chicago
Austin
Miami
```


Notice that in Python, blocks are delimited by indentation. For example:

```python
for i in range(0,5):
    for j in range(5,10):
        print i*j,
    print "\n----------"
print "End"
```
```
0 0 0 0 0
----------
5 6 7 8 9
----------
10 12 14 16 18
----------
15 18 21 24 27
----------
20 24 28 32 36
----------
End
```


```python
for pop,city in zip(populations,cities):
    print "%s has a population of %d (thousands)"%(city,pop)
```
```
Los Angeles has a population of 3884 (thousands)
Chicago has a population of 2718 (thousands)
Austin has a population of 885 (thousands)
Miami has a population of 417 (thousands)
```
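
Two related built-ins are often useful alongside `zip`: `enumerate` supplies a counter while looping, and `dict` can turn zipped pairs into a lookup table. A short sketch reusing the `cities` and `populations` lists (the names `ranking` and `pop_by_city` are illustrative):

```python
cities = ['Los Angeles', 'Chicago', 'Austin', 'Miami']
populations = [3884, 2718, 885, 417]

# enumerate yields (counter, element) pairs; the second argument sets the start value.
ranking = []
for rank, city in enumerate(cities, 1):
    ranking.append("%d. %s" % (rank, city))

# zip pairs the two lists element by element; dict turns the pairs into a mapping.
pop_by_city = dict(zip(cities, populations))

print(ranking)
print(pop_by_city['Austin'])
```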


```python
for i in range(10):
    print i,i**2,i**3
```
```
0 0 0
1 1 1
2 4 8
3 9 27
4 16 64
5 25 125
6 36 216
7 49 343
8 64 512
9 81 729
```


### Conditional statements

```python
num = int(raw_input("Enter a number between 20 and 30: "))
if num < 20:
    print "Too small. Please, try again!"
elif num >= 20 and num <= 30:
    print "%d^2 = %d"%(num,num**2)
else:
    print "Too large. Please, try again!"
```
```
Enter a number between 20 and 30: 28
28^2 = 784
```


### Conditional expression

```python
message = "%d^2 = %d"%(num,num**2) if (num >= 20) and (num <= 30) else "Error"
print message
```
```
28^2 = 784
```
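
Python also allows comparisons to be chained, so the range test can be written the way it reads in mathematics. A small sketch, with `num` fixed to 28 instead of being read from the keyboard so that it runs unattended:

```python
num = 28  # fixed value for illustration

in_range = 20 <= num <= 30                  # chained comparison
same_test = (num >= 20) and (num <= 30)     # the equivalent explicit form

message = "%d^2 = %d" % (num, num ** 2) if in_range else "Error"
print(message)  # prints: 28^2 = 784
```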


### Functions

```python
def add(a,b):
    return a+b
print add(10,20)
```
```
30
```


```python
def power(a,b=2): # default parameter values
    return a**b

print power(3,4)
print power(3)
```
```
81
9
```


```python
def client(host='localhost',port=80,username='hugo',cert_file='my_cert.pem'):
    return "Connecting user %s to host %s, port %d with password %s"%(username,
                                                                        host,
                                                                        port,
                                                                        cert_file)
```

```python
client() # using all the default values
```
```
'Connecting user hugo to host localhost, port 80 with password my_cert.pem'
```

```python
client(cert_file='./certs/remote_machine.pem',username='john') # using the argument names
```
```
'Connecting user john to host localhost, port 80 with password ./certs/remote_machine.pem'
```

```python
client('163.118.78.22',23) # using the argument order
```
```
'Connecting user hugo to host 163.118.78.22, port 23 with password my_cert.pem'
```
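
Positional and keyword arguments can also be mixed, and a dictionary of options can be unpacked into keyword arguments with `**`. The sketch below repeats the `client` definition so that it runs on its own; the `options` dictionary and its values are hypothetical.

```python
def client(host='localhost', port=80, username='hugo', cert_file='my_cert.pem'):
    return "Connecting user %s to host %s, port %d with password %s" % (
        username, host, port, cert_file)

# Positional arguments come first; keyword arguments may follow in any order.
mixed = client('163.118.78.22', username='john')

# ** unpacks a dictionary into keyword arguments.
options = {'port': 8080, 'username': 'ana'}   # hypothetical settings
unpacked = client(**options)

print(mixed)
print(unpacked)
```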

## Some functional features
### Lambda expressions

```python
power2 = lambda x: x**2
power2(9)
```
```
81
```

### Map

```python
map(power2,range(10))
```
```
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

### Filter

```python
filter(lambda x: x%2 == 0, range(20))
```
```
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

### List comprehension

```python
[i**2 for i in range(20)]
```
```
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361]
```


```python
[i for i in range(20) if i%2 == 0]
```
```
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

```python
[(i,j) for i in range(2,10) for j in range(2,10) if i%j==0]
```
```
[(2, 2),
 (3, 3),
 (4, 2),
 (4, 4),
 (5, 5),
 (6, 2),
 (6, 3),
 (6, 6),
 (7, 7),
 (8, 2),
 (8, 4),
 (8, 8),
 (9, 3),
 (9, 9)]
```
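
A list comprehension can express a `map` and a `filter` in a single expression. In Python 2 `map` and `filter` return lists, while in Python 3 they return lazy iterators, so the sketch below wraps them in `list()` to behave the same in both versions. The variable names are illustrative.

```python
# Squares of the even numbers below 10, written with map and filter...
with_map_filter = list(map(lambda x: x ** 2,
                           filter(lambda x: x % 2 == 0, range(10))))

# ...and the same computation as a list comprehension.
with_comprehension = [x ** 2 for x in range(10) if x % 2 == 0]

print(with_map_filter)
print(with_comprehension)
```

Both lines print `[0, 4, 16, 36, 64]`; the comprehension is usually the easier one to read.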

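### Reading files
 * Create a directory called `data` in the current directory
 * Download the data file http://goo.gl/IKmRnz to the `data` directory, so that it is available as `./data/iris.txt`

```python
raw_data = open("./data/iris.txt","r")
rows = [i.split(',') for i in raw_data]  # split each line into a list of strings
rows[:3]
raw_data.seek(0)  # rewind the file so that it can be read again
```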

```python
# Fields containing a decimal point become floats; the rest (the species label) stay strings
data = [[float(i) if "." in i else i for i in row] for row in rows]
data[:4]
```
```
[[5.1, 3.5, 1.4, 0.2, 'Iris-setosa\n'],
 [4.9, 3.0, 1.4, 0.2, 'Iris-setosa\n'],
 [4.7, 3.2, 1.3, 0.2, 'Iris-setosa\n'],
 [4.6, 3.1, 1.5, 0.2, 'Iris-setosa\n']]
```

### One-liner

```python
# Split, convert and strip the trailing newline from the label in a single pass
data = [[float(i) if "." in i else i.strip() for i in row.split(",")] for row in raw_data]
```
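
The one-liner relies on `raw_data` still being open. A common alternative is the `with` statement, which closes the file automatically. The sketch below is self-contained: instead of the downloaded file it writes two sample rows (taken from the output shown earlier) to a temporary file, so the path used here is an assumption and not the real `./data/iris.txt`.

```python
import os
import tempfile

# Write two sample rows to a temporary file that stands in for ./data/iris.txt.
sample = "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n"
handle, path = tempfile.mkstemp(suffix=".txt")
os.close(handle)
with open(path, "w") as out:
    out.write(sample)

# The file is closed as soon as the with block ends, even if an error occurs.
with open(path, "r") as raw_data:
    data = [[float(i) if "." in i else i.strip() for i in row.split(",")]
            for row in raw_data]

os.remove(path)  # clean up the temporary file
print(data)
```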

![doge](./img/doge.png)

## Practice
 #### 1 - Write a ``read_csv`` function that receives a filename and a delimiter and returns a 2D list with the data using:
 * 1.1 For loops
 * 1.2 List comprehension
```python
def read_csv(filename,delimiter=','):
    ...
    return data
```
