# Python

## Tasks for those who "feel like a pro":
### TASK 1
Write the code to enumerate items in the list:
* items are not ordered
* items are not unique
* don't use loops
* try to be as short as possible (not considering import statements)

Example:

*Input*

    items = ['foo', 'bar', 'baz', 'foo', 'baz', 'bar']
    
*Output*

    # something like:
    [0, 1, 2, 0, 2, 1]

### TASK 2
For each element of the list [0, 1, 2, ..., N-1], build all possible pairs with the other elements of that list.
* exclude "self-pairing" (e.g. 0-0, 1-1, 2-2)
* don't use loops
* try to be as short as possible (not considering import statements)

Example:

*Input*:

    [0, 1, 2, 3] or just 4
    
*Output*:

    0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3

    1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2

## Python vs MATLAB
MATLAB is a well-known tool for numerical linear algebra, but it has its disadvantages.
If you are an experienced MATLAB user, take a look at [NumPy for MATLAB users](http://scipy.github.io/old-wiki/pages/NumPy_for_Matlab_Users).

The main differences to keep in mind:

* Indexing starts from 0
* `[ ]` instead of `( )` for accessing array elements
* Python is a powerful and flexible general-purpose programming language :)
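
The first two differences can be seen in a minimal sketch with a plain Python list (the variable names are only illustrative; NumPy arrays are indexed the same way):

```python
# A plain Python list used only for illustration.
values = [10, 20, 30]

first = values[0]    # indexing starts from 0 and uses square brackets
last = values[-1]    # negative indices count from the end

print(first, last)
```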

## Style guides
PEP 8 (PEP = Python Enhancement Proposal): https://www.python.org/dev/peps/pep-0008/

### Notebook structure

* Notebook consists of cells of different types
* Code in the cells is executed by pressing Shift + Enter
* To add a cell, use Insert in the menu
* To edit the cell, double-click on it

### Important points

* Notebooks are good for short code snippets
* Larger code is typically written in separate .py files and used as packages
* It is a good idea to use notebooks to present the final results of your computations
* Notebooks support $\LaTeX$
* It is a good idea to test your scripts in Jupyter, since you need to load the data only once

In [None]:
%%time

a = []

for i in range(10000):
    a.append(i)

In [None]:
%%time

a = [None] * 10000

for i in range(10000):
    a[i] = i
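
Outside a notebook, the same comparison can be made with the standard `timeit` module. This is a minimal sketch: the function names `build_with_append` and `build_preallocated` are hypothetical, and the measured times depend on the machine and Python version.

```python
import timeit


def build_with_append(n=10000):
    """Grow the list one element at a time."""
    a = []
    for i in range(n):
        a.append(i)
    return a


def build_preallocated(n=10000):
    """Allocate the list once, then fill it in place."""
    a = [None] * n
    for i in range(n):
        a[i] = i
    return a


# Both strategies produce the same list; only the running time may differ.
t_append = timeit.timeit(build_with_append, number=100)
t_prealloc = timeit.timeit(build_preallocated, number=100)
print(f"append: {t_append:.4f} s, preallocated: {t_prealloc:.4f} s")
```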

### Magic functions

The console and the notebooks support so-called magic functions, which are invoked by prefixing a command with the % character. Note that the automagic setting, which is enabled by default, allows you to omit the leading % sign for line magics (cell magics still need %%). Thus, you can type just the name of the magic function and it will work.

In [None]:
%ls

In [None]:
ls

In [None]:
!ls

In [None]:
%pwd

In [None]:
pwd

In [None]:
!pwd

In [None]:
%timeit x = 10

In [None]:
%time x = 10

In [None]:
%%timeit

list(range(1000))

### Important packages

You have to import several basic packages to start doing basic stuff for linear algebra and plotting.
There are a lot of useful Python packages in the world for almost every task: always look on the web
before writing your own code!

Basic libraries:
* numpy (for doing matrix & vector stuff)
* matplotlib (for doing plotting), prettyplotlib (for nice plotting)
* scipy (a huge math library)

In [None]:
# virtual env installations only
# in anaconda these packages are already installed
! pip install numpy matplotlib scipy pandas theano tensorflow

In [None]:
# conventions to import these packages are:
import numpy as np
import matplotlib.pyplot as plt 
import scipy as sp

In [None]:
import this

### Numpy

Python objects

* high-level number objects: integers, floating point
* containers: lists (costless insertion and append), dictionaries (fast lookup)

NumPy provides

* an extension package to Python for multi-dimensional arrays
* closer to hardware (efficiency)
* designed for scientific computation
* also known as array-oriented computing

In [None]:
import numpy as np

a = np.array([0, 1, 2, 3])
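
A short sketch of what "multi-dimensional array" means in practice: every array carries its number of dimensions, its shape and its element type. The second array `b` is an illustrative addition, not part of the original cell.

```python
import numpy as np

a = np.array([0, 1, 2, 3])

print(a.ndim)    # number of dimensions
print(a.shape)   # size along each dimension
print(a.dtype)   # element type, chosen by NumPy from the input values

# A nested list gives a two-dimensional array.
b = np.array([[0, 1, 2], [3, 4, 5]])
print(b.shape)
```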

Try to forget about loops when possible.
Why?

In [None]:
%%timeit

result = [None] * 1000

for i in range(1000):
    result[i] = i**2

In [None]:
%%timeit

a = np.arange(1000)

result = a**2
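
Both cells above compute the same values; only the amount of work done at the Python level differs. A minimal check of that claim (the names `loop_result` and `vector_result` are illustrative):

```python
import numpy as np

n = 1000

# Loop version: one Python-level operation per element.
loop_result = [None] * n
for i in range(n):
    loop_result[i] = i**2

# Vectorized version: a single expression over the whole array.
vector_result = np.arange(n)**2

print(np.array_equal(loop_result, vector_result))
```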

#### Statistics

In [None]:
a = np.random.random(size=100)
plt.plot(a)

In [None]:
a.mean()

In [None]:
a.min()

In [None]:
a.var()

In [None]:
a.max()

In [None]:
a = np.array([1, 2, 3, 4])
print(a.min(), a.max(), a.mean(), a.var())
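
Note that `var()` divides by the number of elements by default (population variance). The sketch below recomputes it by hand for the same array and shows the `ddof=1` variant, which divides by n - 1; the names `var_manual` and `var_sample` are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

mean = a.mean()
# Population variance: average squared deviation from the mean (ddof=0, the default).
var_manual = ((a - mean)**2).sum() / a.size
# Sample variance: divide by (n - 1) instead, selected with ddof=1.
var_sample = a.var(ddof=1)

print(mean, a.var(), var_manual, var_sample)
```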

### Matplotlib

In [None]:
# Make plots to be drawn in the notebook
%matplotlib inline

In [None]:
x = np.linspace(0, 2 * np.pi)
plt.plot(x, np.sin(x), label='sin(x)')
plt.ylabel('y')
plt.xlabel('x')
plt.legend(loc='best')
plt.title('The plot')
plt.show()

In [None]:
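# Note: in newer SciPy releases face() lives in scipy.datasets instead;
# there, use `import scipy.datasets` and `sp.datasets.face()`.
import scipy.misc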

In [None]:
plt.imshow(sp.misc.face())
plt.colorbar()

In [None]:
plt.imshow(sp.misc.face(gray=True), cmap=plt.cm.Greys_r)
plt.colorbar()

## Solutions for advanced tasks
### Task 1

In [None]:
items = ['foo', 'bar', 'baz', 'foo', 'baz', 'bar']

method 1

In [None]:
from collections import defaultdict

item_ids = defaultdict(lambda: len(item_ids))
list(map(item_ids.__getitem__, items))
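
How method 1 works: looking up a missing key makes `defaultdict` call its factory, and at that moment `len(item_ids)` equals the number of distinct items seen so far, which becomes the id of the new item. A self-contained version of the cell above, with the intermediate dictionary printed (the name `ids` is illustrative):

```python
from collections import defaultdict

items = ['foo', 'bar', 'baz', 'foo', 'baz', 'bar']

# The factory runs only for keys that are not in the dict yet.
item_ids = defaultdict(lambda: len(item_ids))

ids = list(map(item_ids.__getitem__, items))

print(ids)             # [0, 1, 2, 0, 2, 1]
print(dict(item_ids))  # {'foo': 0, 'bar': 1, 'baz': 2}
```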

method 2

In [None]:
import pandas as pd

# ngroup() numbers the groups; with sort=False the ids follow the order of first appearance
pd.DataFrame({'items': items}).groupby('items', sort=False).ngroup().values

method 3

In [None]:
import numpy as np

np.unique(items, return_inverse=True)[1]
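
One detail worth knowing about method 3: `np.unique` sorts the distinct values, so the ids follow alphabetical order rather than order of first appearance. The task accepts any consistent numbering, so this is still a valid answer. A small sketch (the names `uniques` and `ids` are illustrative):

```python
import numpy as np

items = ['foo', 'bar', 'baz', 'foo', 'baz', 'bar']

# np.unique returns the sorted distinct values and, with return_inverse=True,
# the index of each original item within that sorted array.
uniques, ids = np.unique(items, return_inverse=True)

print(uniques.tolist())  # ['bar', 'baz', 'foo']
print(ids.tolist())      # [2, 0, 1, 2, 1, 0]
```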

method 4 (plain loop, for comparison)

In [None]:
last = 0
counts = {}
result = []
for item in items:
    try:
        count = counts[item]
    except KeyError:
        counts[item] = count = last
        last += 1
    result.append(count)

result

### Task 2

In [None]:
N = 1000

In [None]:
from itertools import permutations

%timeit list(permutations(range(N), 2))
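
To get the output in the two-row form shown in the task statement, the pairs can be transposed with `zip`. A minimal sketch for N = 4 (the names `first` and `second` are illustrative):

```python
from itertools import permutations

N = 4

# permutations(range(N), 2) yields every ordered pair (i, j) with i != j;
# zip(*...) transposes the pairs into a row of first and a row of second elements.
first, second = zip(*permutations(range(N), 2))

print(first)   # (0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3)
print(second)  # (1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2)
```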

# Some interesting Python and TF problems

In [None]:
import numpy as np

def sum_squares(N):
    t = np.arange(N).astype(np.float64)
    return t.dot(t)

In [None]:
%%time
sum_squares(10**8)
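
As a sanity check, the result can be compared with the closed-form sum of squares, 0² + 1² + ... + (N-1)² = (N-1)·N·(2N-1)/6. The helper `sum_squares_closed_form` below is a hypothetical addition; for large N the float64 result of `sum_squares` is rounded, so the two agree only approximately there.

```python
import numpy as np


def sum_squares(N):
    t = np.arange(N).astype(np.float64)
    return t.dot(t)


def sum_squares_closed_form(N):
    # 0^2 + 1^2 + ... + (N-1)^2 = (N-1) * N * (2N-1) / 6
    return (N - 1) * N * (2 * N - 1) // 6


print(sum_squares(10), sum_squares_closed_form(10))  # 285.0 285
```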

In [None]:
import tensorflow as tf

# Note: tf.placeholder and tf.Session belong to the TensorFlow 1.x graph API

# I am going to be a function parameter
N = tf.placeholder(tf.float64)

# I am a recipe for producing the sum of squares of arange(N), given N
result = tf.tensordot(tf.range(N), tf.range(N), 1)

# I am a session
session = tf.Session()

In [None]:
%%time

session.run(result, feed_dict={N: 10**8})

In [None]:
import theano
import theano.tensor as T

input_X = T.scalar()

result = T.arange(input_X).dot(T.arange(input_X))

In [None]:
f = theano.function([input_X], result)

In [None]:
%%time

f(10**8)

## Why do graph computations matter?
* You can compute derivatives and gradients automatically
* Derivatives are computed symbolically, not numerically
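
To see why the second point matters, here is a framework-free sketch that compares the exact derivative of x² with a numerical forward-difference approximation. The function names and the step size `h` are assumptions made for illustration; a graph library derives the exact expression automatically instead of having it written by hand.

```python
def f(x):
    return x**2


def exact_derivative(x):
    # d/dx x^2 = 2x, the expression a symbolic differentiator would produce
    return 2 * x


def forward_difference(func, x, h=1e-3):
    # Numerical approximation: its error depends on the step size h
    return (func(x + h) - func(x)) / h


x0 = 3.0
print(exact_derivative(x0))        # 6.0
print(forward_difference(f, x0))   # approximately 6.001: the extra ~h is truncation error
```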

In [None]:
my_scalar = tf.placeholder(dtype=tf.float64)

scalar_squared = my_scalar**2

# the derivative of scalar_squared with respect to my_scalar
derivative = tf.gradients(scalar_squared, my_scalar)

fun = lambda x: session.run(scalar_squared, {my_scalar: x})
grad = lambda x: session.run(derivative, {my_scalar: x})

In [None]:
x = np.linspace(-3, 3)
x_squared = fun(x)
x_squared_der = grad(x)[0]

plt.plot(x, x_squared, label='x^2')
plt.plot(x, x_squared_der, label='derivative')
plt.legend(loc='best')

## Why that rocks

In [None]:
my_vector = tf.placeholder(tf.float64)

# Compute the gradient of the following weird function with respect to my_scalar and my_vector
# Warning! Trying to understand the meaning of that function may result in permanent brain damage

weird_psychotic_function = tf.reduce_mean((my_vector + my_scalar)**(1 + tf.nn.moments(my_vector, axes=[0])[1]) + 1. / tf.log(my_scalar + tf.sqrt(my_scalar**2 + 1))) / (my_scalar**2 + 1) + 0.01 * tf.sin(2 * my_scalar**1.5) * (tf.reduce_sum(my_vector) * my_scalar**2) * tf.exp((my_scalar - 4)**2) / (1 + tf.exp((my_scalar - 4)**2)) * (1. - (tf.exp( - (my_scalar - 4)**2)) / (1 + tf.exp( - (my_scalar - 4)**2)))**2

# tf.gradients returns one gradient tensor per variable in the list
der_by_scalar, der_by_vector = tf.gradients(weird_psychotic_function, [my_scalar, my_vector])

compute_weird_function = lambda x, y: session.run(weird_psychotic_function, {my_scalar : x, my_vector : y })
compute_der_by_scalar = lambda x, y: session.run(der_by_scalar, {my_scalar : x, my_vector : y })

In [None]:
#Plotting your derivative
vector_0 = [1, 2, 3]

scalar_space = np.linspace(0, 7)

y = [compute_weird_function(x, vector_0) for x in scalar_space]
plt.plot(scalar_space, y, label='function')
y_der_by_scalar = [compute_der_by_scalar(x, vector_0) for x in scalar_space]
plt.plot(scalar_space, y_der_by_scalar, label='derivative')
plt.grid()
plt.legend(loc='best')

## The price for Theano speed

In [None]:
import tensorflow as tf

x = tf.placeholder(tf.float32)
f = x / x

with tf.Session() as session:
    print(session.run(f, {x : 0}))

In [None]:
x = T.scalar()
f = x / x

compute_f = theano.function([x], f)

In [None]:
compute_f(0)
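
For reference, evaluating x / x literally at x = 0 under IEEE 754 floating-point rules gives NaN. A graph library that simplifies x / x algebraically before evaluating it can return a different value, which is why the results of the two cells above are worth comparing. A minimal sketch of the literal evaluation, using NumPy only for illustration (the name `literal` is hypothetical):

```python
import numpy as np

x = np.float32(0)

# Evaluate x / x literally; errstate silences the floating-point warning.
with np.errstate(invalid='ignore', divide='ignore'):
    literal = x / x

print(literal)            # nan
print(np.isnan(literal))  # True
```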