## **Why Python?**

#### General characteristics of Python:

* **clean and simple language:** Easy-to-read and intuitive code, easy-to-learn syntax.
* **expressive language:** Fewer lines of code, fewer bugs, easier to maintain.
* **popularity:** Very popular, especially among academics.

<img src="../images/google_trends.jpg" width=400 height=400> 

## **Jupyter Notebook**

* Powerful tool for developing and presenting data science projects. 
* Integrates code and its output into a single document. 
* Combines visualisations, text, mathematical equations, and other media. 
* Can be shared as a notebook (`.ipynb`) file or exported to HTML, which makes it great for collaboration.
* Supports many popular programming languages through separate kernels.
* Open source.

<img src="../images/jupyter_shorts.png" width=400 height=400> 

Code in each cell can be run individually, but variables, functions, and other definitions persist in the kernel across cells.

In [58]:
a = 5
b = 2.0
c = "var"

In [59]:
print(a + b, c)  # Python 3 print function

7.0 var


You do not have to declare variables or their types; Python infers the type from the assigned value:

In [60]:
type(a), type(b), type(c)

(int, float, str)
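Because Python is dynamically typed, a name is not tied to one type: the same variable can later be rebound to a value of a different type. A minimal sketch (the name `x` is just for illustration):

```python
# Python infers the type from the value; no declaration is needed
x = 5
print(type(x))  # <class 'int'>

# The same name can later be rebound to a value of another type
x = "five"
print(type(x))  # <class 'str'>
```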

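Be careful when dividing integers. In Python 2, `/` between two integers performs floor division, so `7/2` returns `3`. In Python 3, `/` always returns a float (`7/2` gives `3.5`), and floor division is written `//`:

In [61]:
7 // 2

3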

In [62]:
7.0/2, 7/2.0

(3.5, 3.5)
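A short sketch of the Python 3 division-related operators side by side, assuming Python 3 semantics. Note that floor division rounds toward negative infinity, not toward zero:

```python
print(7 / 2)         # true division always returns a float: 3.5
print(7 // 2)        # floor division: 3
print(7 % 2)         # remainder: 1
print(divmod(7, 2))  # quotient and remainder together: (3, 1)
print(-7 // 2)       # rounds toward negative infinity: -4
print(-7 % 2)        # remainder takes the sign of the divisor: 1
```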

<img src="../images/Arithmetic_operators.png" width=400 height=400> 

## **NumPy**

In [33]:
import numpy as np #import the module

There are four built-in collection data types in Python:

* **List** is a collection which is ordered and changeable. Allows duplicate members.
* **Tuple** is a collection which is ordered and unchangeable. Allows duplicate members.
* **Set** is a collection which is unordered and unindexed. No duplicate members.
* **Dictionary** is a collection of key-value pairs which is changeable and indexed by key. No duplicate keys. (Since Python 3.7, dictionaries preserve insertion order.)
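A minimal sketch of how the four collection types behave; the variable names and values are illustrative only:

```python
nums_list = [3, 1, 3]          # list: ordered, changeable, duplicates allowed
nums_list.append(2)
nums_tuple = (3, 1, 3)         # tuple: ordered, unchangeable, duplicates allowed
nums_set = {3, 1, 3}           # set: unordered, the duplicate 3 is dropped
ages = {"ann": 31, "bob": 27}  # dict: maps unique keys to values
ages["ann"] = 32               # assigning to an existing key overwrites its value

print(nums_list)      # [3, 1, 3, 2]
print(nums_tuple[0])  # 3
print(len(nums_set))  # 2
print(ages)           # {'ann': 32, 'bob': 27}

try:
    nums_tuple[0] = 0  # tuples cannot be modified in place
except TypeError as err:
    print("TypeError:", err)
```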

Lists (and other sequences) can be indexed with integers or with **slice notation**, `start:stop:step`, where the `stop` index is excluded:

In [34]:
lst = list(range(1000))
lst[:10]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [35]:
lst[2:12]

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

In [37]:
lst[10:26:2]

[10, 12, 14, 16, 18, 20, 22, 24]

In [38]:
lst[-1]

999

In [39]:
lst[-10:]

[990, 991, 992, 993, 994, 995, 996, 997, 998, 999]
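A few more slicing behaviours worth knowing, sketched on the same `lst`: a negative step walks backwards, a stop beyond the end is clipped rather than raising an error, and slicing a list produces a new list:

```python
lst = list(range(1000))

print(lst[::-1][:3])  # a negative step walks backwards: [999, 998, 997]
print(lst[995:2000])  # a stop past the end is clipped, no IndexError: [995, 996, 997, 998, 999]

sub = lst[:5]         # slicing a list returns a new (shallow) copy
sub[0] = -1
print(sub)            # [-1, 1, 2, 3, 4]
print(lst[0])         # the original list is unchanged: 0
```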

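Python lists are slow for numerical work on many elements. NumPy arrays store elements of a single type in contiguous memory, and their operations run in compiled C code, which makes array operations much faster.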

In [45]:
arr = np.arange(1000)
arr[:10]

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

We can compare speeds for squaring all elements in the list vs array.

In [42]:
%timeit [i ** 2 for i in lst]

The slowest run took 12.91 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 50.3 µs per loop


In [43]:
%timeit arr ** 2

The slowest run took 41.12 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 899 ns per loop
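`%timeit` is a Jupyter/IPython magic. Outside a notebook, a similar comparison can be sketched with the standard-library `timeit` module. The sketch also checks that both approaches produce the same squares. Exact timings depend on the machine, so none are shown here:

```python
import timeit

import numpy as np

lst = list(range(1000))
arr = np.arange(1000)

# Both approaches compute the same values
assert np.array_equal(np.array([i ** 2 for i in lst]), arr ** 2)

# timeit.timeit runs the callable `number` times and returns the total seconds
runs = 1000
t_list = timeit.timeit(lambda: [i ** 2 for i in lst], number=runs)
t_arr = timeit.timeit(lambda: arr ** 2, number=runs)

print(f"list:  {t_list / runs * 1e6:.1f} µs per loop")
print(f"array: {t_arr / runs * 1e6:.1f} µs per loop")
```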


## Array creation with NumPy

In [50]:
np.zeros(10, dtype=float)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [51]:
np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [52]:
np.zeros(10, dtype=complex)

array([0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j,
       0.+0.j, 0.+0.j])

In [53]:
np.ones(10, dtype=float)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [54]:
np.linspace(0, 1, num=5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [None]:
np.logspace(1, 4, num=4)
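`np.logspace(start, stop, num)` returns `num` points spaced evenly on a log scale, from `10**start` to `10**stop` (base 10 by default). A small sketch showing that this matches raising 10 to the points from `np.linspace`:

```python
import numpy as np

log_pts = np.logspace(1, 4, num=4)        # values: 10, 100, 1000, 10000
lin_exponents = np.linspace(1, 4, num=4)  # values: 1, 2, 3, 4

print(log_pts)
print(np.allclose(log_pts, 10 ** lin_exponents))  # True
```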

Generating arrays with random numbers

In [56]:
np.random.randn(5)

array([ 2.1319502 , -1.67159136,  0.22428274,  0.18943046,  0.62340621])

In [57]:
norm10 = np.random.normal(loc=9, scale=3, size=10)
norm10

array([ 9.05629743,  9.69192769,  7.31719416, 10.44477208,  9.10006243,
       10.81279822,  9.2971247 , 10.02187846,  2.81872823, 14.5167189 ])
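The random values above change on every run. To make results reproducible, one option (assuming NumPy 1.17 or newer) is a seeded `Generator` from `np.random.default_rng`; the seed value `42` here is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
sample = rng.normal(loc=9, scale=3, size=10)

# A second generator with the same seed yields exactly the same numbers
rng_again = np.random.default_rng(seed=42)
same = rng_again.normal(loc=9, scale=3, size=10)

print(np.array_equal(sample, same))  # True
```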

## Index Arrays

We have already shown how to index with integers and **slices**.

NumPy indexing is much more powerful than Python indexing. For example, you can index with other arrays:
  * Boolean arrays
  * Integer arrays

Consider for example that in the array `norm10` we want to replace all values above 9 with the value 0.  We can do so by first finding the *mask* that indicates where this condition is `True` or `False`:

In [63]:
mask = norm10 > 9
mask

array([ True,  True, False,  True,  True,  True,  True,  True, False,
        True])

In [65]:
norm10[mask]

array([ 9.05629743,  9.69192769, 10.44477208,  9.10006243, 10.81279822,
        9.2971247 , 10.02187846, 14.5167189 ])
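Masks can be combined element-wise with `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses because `&` binds more tightly than `>`. The sketch below hard-codes values rounded from the sample above so that it is reproducible:

```python
import numpy as np

# Values rounded from the random sample above (hard-coded for reproducibility)
norm10 = np.array([9.06, 9.69, 7.32, 10.44, 9.10, 10.81, 9.30, 10.02, 2.82, 14.52])

mask = (norm10 > 9) & (norm10 < 10)  # True where 9 < value < 10
print(norm10[mask])
print(mask.sum())                    # True counts as 1, so this counts matches: 4
print(norm10[~mask].size)            # elements outside that range: 6
```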

### Integer Indexing

In [66]:
norm10[[1, 4, 6]]

array([9.69192769, 9.10006243, 9.2971247 ])
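With integer-array indexing, the result follows the order of the index list, indices may repeat, and negative indices count from the end. A small sketch on a predictable array:

```python
import numpy as np

arr = np.arange(10, 20)         # [10, 11, ..., 19]
print(arr[[4, 0, 4]])           # order follows the index list, repeats allowed: [14 10 14]
print(arr[np.array([-1, -2])])  # negative indices count from the end: [19 18]
```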

### Assignment

In [68]:
norm10[norm10 > 9] = 0
norm10

array([0.        , 0.        , 7.31719416, 0.        , 0.        ,
       0.        , 0.        , 0.        , 2.81872823, 0.        ])

In [69]:
norm10[[1, 4, 7]] = 10
norm10

array([ 0.        , 10.        ,  7.31719416,  0.        , 10.        ,
        0.        ,  0.        , 10.        ,  2.81872823,  0.        ])
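Assignment through an index modifies the array in place, but reading is subtler: a basic slice of a NumPy array is a *view* that shares memory with the original, while integer (fancy) indexing returns a *copy*. If the original should stay untouched, `np.where` builds a new array instead. A minimal sketch:

```python
import numpy as np

a = np.arange(5)
view = a[1:3]         # basic slice: a view sharing memory with a
view[:] = 99
print(a.tolist())     # [0, 99, 99, 3, 4]

b = np.arange(5)
picked = b[[1, 2]]    # integer (fancy) indexing: an independent copy
picked[:] = 99
print(b.tolist())     # [0, 1, 2, 3, 4]

# np.where(condition, x, y) picks from x where True, from y where False
clipped = np.where(b > 2, 0, b)
print(clipped.tolist())  # [0, 1, 2, 0, 0]
```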

## Looping in Python

In [None]:
#while loop

In [None]:
#for loop
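The two loop cells above are still placeholders. A minimal sketch of what they might contain; the variable names are illustrative and not from the original notebook:

```python
# while loop: repeat as long as the condition holds
count = 0
while count < 3:
    print("count =", count)
    count += 1

# for loop: iterate directly over the items of any iterable
squares = []
for i in range(5):
    squares.append(i ** 2)
print(squares)  # [0, 1, 4, 9, 16]

# enumerate yields the index alongside each item
for idx, word in enumerate(["a", "b"]):
    print(idx, word)
```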

### matplotlib

### Pandas

### Reading in data files

In [None]:
#table

In [None]:
#csv with bad header

In [None]:
#csv with fix

### igraph check

### References