# Important Python libraries

 - `numpy` (numerics) + `scipy` (scientific functions) 
 - `matplotlib` - plotting
 - `astropy` - convenient operations on data for Data Science (`pandas` is an alternative)
 - `scikit-learn` - machine learning
 
We'll meet them very soon during the ML session.

## Hello numpy!

`numpy` is the core of scientific Python. It is the most convenient way to organize number-crunching in Python.

In [1]:
import numpy

In [2]:
x = numpy.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [3]:
x.reshape(5, 2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [4]:
x.reshape(2, 5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
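
Two details of `reshape` that the cells above rely on can be sketched as follows (an illustrative addition, not part of the original session): one dimension may be given as `-1` and is then inferred, and the new shape must hold exactly as many elements as the array has.

```python
import numpy

x = numpy.arange(10)

# -1 asks numpy to infer that dimension from the total number of elements
inferred = x.reshape(-1, 2)
print(inferred.shape)  # (5, 2)

# a shape with a different number of elements (3 * 4 != 10) is rejected
try:
    x.reshape(3, 4)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True
```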

In [5]:
# slicing follows the same rules as for lists, strings and tuples
x[:4]

array([0, 1, 2, 3])

In [6]:
print(x[:3])
print(x[3:7])
print(x[7:])

[0 1 2]
[3 4 5 6]
[7 8 9]
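
As a small sketch of the claim that slicing works the same way across sequence types, the same `start:stop` slice is applied below to a list, a string and an array. The variable names `values` and `text` are made up for this illustration.

```python
import numpy

values = list(range(10))      # plain Python list
text = 'abcdefghij'           # string of the same length
x = numpy.arange(10)          # numpy array

# the same slice selects positions 3, 4, 5, 6 in every case
list_part = values[3:7]
text_part = text[3:7]
array_part = x[3:7]

print(list_part)    # [3, 4, 5, 6]
print(text_part)    # defg
print(array_part)   # [3 4 5 6]

# a negative step reverses the sequence, again for all three types
print(x[::-1][:3])  # [9 8 7]
```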


### Vector operations

In [7]:
x = numpy.arange(10 ** 6)
# vector operations apply the same computation to every element: here each element is multiplied by 3, then 12 is added
3 * x + 12.

array([  1.20000000e+01,   1.50000000e+01,   1.80000000e+01, ...,
         3.00000300e+06,   3.00000600e+06,   3.00000900e+06])

In [8]:
# use the %timeit magic to see that this is quite fast
%timeit 3 * x + 12.

The slowest run took 4.63 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 7.44 ms per loop
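
To make the meaning of "vector operation" concrete, the sketch below computes the same expression twice, once with an explicit Python loop and once as a single array expression, and checks that the results agree. It uses a smaller array than the cell above so the loop stays quick; in a notebook, `%timeit` can be applied to each version to compare their speed.

```python
import numpy

x = numpy.arange(1000)

# element-by-element version: a Python-level loop over the array
looped = numpy.array([3 * value + 12. for value in x])

# vectorized version: one expression over the whole array
vectorized = 3 * x + 12.

print(numpy.array_equal(looped, vectorized))  # True
print(vectorized.dtype)                       # float64, because 12. is a float
```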


In [9]:
Z = numpy.arange(15).reshape(5, 3)
Z

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [10]:
numpy.log(numpy.exp(Z)) # the integer array was converted to floats

array([[  0.,   1.,   2.],
       [  3.,   4.,   5.],
       [  6.,   7.,   8.],
       [  9.,  10.,  11.],
       [ 12.,  13.,  14.]])
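
The type conversion can be inspected directly through the `dtype` attribute. The following is a minimal sketch; `Z_float` is a name introduced only for this example.

```python
import numpy

Z = numpy.arange(15).reshape(5, 3)

# arange with integer arguments produces an integer array
print(Z.dtype.kind)            # i

# exp is not defined on integers, so numpy switches to floating point
print(numpy.exp(Z).dtype)      # float64

# an explicit conversion is also available
Z_float = Z.astype(float)
print(Z_float.dtype)           # float64
```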

In [11]:
Z += 4

In [12]:
Z

array([[ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])

In [13]:
Z[::2, :]

array([[ 4,  5,  6],
       [10, 11, 12],
       [16, 17, 18]])

In [14]:
Z[[0, 2, 4], :]

array([[ 4,  5,  6],
       [10, 11, 12],
       [16, 17, 18]])
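
The last two cells return the same rows, but a list of indices is more flexible than a slice. As an illustrative sketch (using the original `Z` before `4` was added), an index list can also reorder and repeat rows; `picked` is a name made up for this example.

```python
import numpy

Z = numpy.arange(15).reshape(5, 3)

# a regular pattern of rows can be written either way
same_rows = numpy.array_equal(Z[::2, :], Z[[0, 2, 4], :])
print(same_rows)  # True

# an index list may list rows in any order and more than once
picked = Z[[4, 0, 0], :]
print(picked.tolist())  # [[12, 13, 14], [0, 1, 2], [0, 1, 2]]
```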

In [15]:
Z.sum(axis=1)

array([15, 24, 33, 42, 51])

In [16]:
# axes are also numbered from zero
Z.sum(axis=0)

array([50, 55, 60])

In [17]:
Z.max(axis=1)

array([ 6,  9, 12, 15, 18])
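
One way to remember what `axis` does is that the reduced axis disappears from the shape. The sketch below shows this on the original `Z` (before `4` was added), along with the `keepdims` option, which keeps the reduced axis with length 1.

```python
import numpy

Z = numpy.arange(15).reshape(5, 3)

column_sums = Z.sum(axis=0)   # axis 0 (rows) is collapsed: one value per column
row_sums = Z.sum(axis=1)      # axis 1 (columns) is collapsed: one value per row
total = Z.sum()               # no axis given: everything is summed

print(column_sums.shape)      # (3,)
print(row_sums.shape)         # (5,)
print(total)                  # 105

# keepdims=True keeps the collapsed axis with length 1
print(Z.sum(axis=1, keepdims=True).shape)  # (5, 1)
```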

In [18]:
Z2 = - Z
Z2 = numpy.sort(Z2, axis=1)
Z2

array([[ -6,  -5,  -4],
       [ -9,  -8,  -7],
       [-12, -11, -10],
       [-15, -14, -13],
       [-18, -17, -16]])
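
Negating before sorting is commonly the first half of a descending sort. As a hedged sketch of that idea (the small array here is made up for illustration), negating once more after the sort, or reversing an ascending sort, gives each row in descending order.

```python
import numpy

Z = numpy.array([[3, 1, 2],
                 [9, 7, 8]])

# sort the negated values, then negate back
descending = -numpy.sort(-Z, axis=1)

# equivalent: sort ascending, then reverse the order within each row
also_descending = numpy.sort(Z, axis=1)[:, ::-1]

print(descending.tolist())  # [[3, 2, 1], [9, 8, 7]]
```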

## Indexing with a boolean array

In [19]:
x = numpy.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [20]:
x > 3

array([False, False, False, False,  True,  True,  True,  True,  True,  True], dtype=bool)

In [21]:
x[x < 7.4]

array([0, 1, 2, 3, 4, 5, 6, 7])
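
Boolean arrays can be combined before they are used as an index. A minimal sketch, with `in_range` as a name introduced for this example: use `&`, `|` and `~` (not `and`, `or`, `not`), and put each comparison in parentheses.

```python
import numpy

x = numpy.arange(10)

# elements strictly between 3 and 8
in_range = (x > 3) & (x < 8)

print(x[in_range])     # [4 5 6 7]
print(in_range.sum())  # 4, since each True counts as 1
print(x[~in_range])    # [0 1 2 3 8 9]
```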

## Copies

Many operations in `numpy` (slicing, for example) don't create copies: the result shares memory with the original array.

In [22]:
x = numpy.arange(10)
y = x[:5]

print(x, y)
y[0] = 10
print(x, y)

[0 1 2 3 4 5 6 7 8 9] [0 1 2 3 4]
[10  1  2  3  4  5  6  7  8  9] [10  1  2  3  4]


This happened because `y` is a view of `x`: both point __to the same place in memory__.

In [23]:
x = numpy.arange(10)
y = x[:5].copy()
print(x, y)
y[0] = 10
print(x, y)

[0 1 2 3 4 5 6 7 8 9] [0 1 2 3 4]
[0 1 2 3 4 5 6 7 8 9] [10  1  2  3  4]
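
Slicing is not the only operation that returns a view. As an illustrative sketch, `reshape` on a freshly created array also shares memory with the original, and `numpy.shares_memory` can be used to check whether two arrays overlap. The name `table` is made up for this example.

```python
import numpy

x = numpy.arange(10)
table = x.reshape(2, 5)   # a view for a contiguous array such as this one

# writing through the view changes the original array
table[0, 0] = 100
print(x[0])  # 100

print(numpy.shares_memory(x, table))          # True
print(numpy.shares_memory(x, x[:5].copy()))   # False
```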


## Random numbers

The `numpy.random` module generates random numbers.

In [24]:
# generating 10000 random numbers at once
numpy.random.normal(loc=2, scale=12, size=10000)

array([ 17.76707519,   3.80915936,  31.57415017, ...,  -1.01854039,
        -3.40413227,  -0.72928708])
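
Random draws differ from run to run unless a seed is fixed. The sketch below assumes NumPy 1.17 or newer, where `numpy.random.default_rng` is available; the seed value `0` and the variable names are arbitrary choices for this illustration.

```python
import numpy

# two generators created with the same seed produce identical samples
rng_a = numpy.random.default_rng(0)
rng_b = numpy.random.default_rng(0)

sample_a = rng_a.normal(loc=2, scale=12, size=10000)
sample_b = rng_b.normal(loc=2, scale=12, size=10000)

print(numpy.array_equal(sample_a, sample_b))  # True

# the sample mean and standard deviation should be close to loc and scale
print(sample_a.mean(), sample_a.std())
```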

## Sorting

In [25]:
x = numpy.random.random(size=1000)
x = numpy.sort(x)

In [26]:
print(x[:10])
print(x[-10:])

[ 0.00055514  0.00121122  0.00254052  0.00380894  0.00423944  0.00471339
  0.00752337  0.00905938  0.00925372  0.00933808]
[ 0.98876922  0.98915809  0.98944017  0.99220378  0.9928575   0.99307615
  0.99723381  0.99821899  0.9992816   0.99959361]


## Arg...

Arg-functions (`argsort`, `argmin`, `argmax`) return indices rather than values, which makes it possible to write non-trivial operations in a couple of lines.

In [27]:
# random.random draws uniform samples from [0, 1)
random_numbers = numpy.random.random(size=1000)
indices = numpy.argsort(random_numbers)

In [28]:
numpy.all(random_numbers[indices] == numpy.sort(random_numbers))

True

In [29]:
indices[:10]

array([  8,  86, 419, 790, 735, 315, 396, 244, 717, 982])

In [30]:
random_numbers.min(), random_numbers.max()

(0.00012309600182414115, 0.99902336186425167)

In [31]:
random_numbers.argmax(), random_numbers[random_numbers.argmax()]

(458, 0.99902336186425167)

In [32]:
random_numbers.argmin(), random_numbers[random_numbers.argmin()]

(8, 0.00012309600182414115)
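
A typical use of the indices returned by arg-functions is to reorder a second array that is aligned with the first. In the sketch below, `names` and `scores` are made-up data for illustration.

```python
import numpy

names = numpy.array(['a', 'b', 'c', 'd'])
scores = numpy.array([0.3, 0.9, 0.1, 0.5])

# indices that would sort scores in ascending order
order = numpy.argsort(scores)
print(order.tolist())          # [2, 0, 3, 1]

# apply the same order to the aligned array
print(names[order].tolist())   # ['c', 'a', 'd', 'b']

# names of the two highest scores, best first
top_two = names[order[-2:][::-1]]
print(top_two.tolist())        # ['b', 'd']

# argmax gives the position of the single best score
best = names[scores.argmax()]
print(best)                    # b
```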

## Exercise

In [33]:
# 0. import numpy
import numpy as np

In [36]:
# 1. sample 1000 elements from the standard normal distribution
arr = np.random.normal(size=1000)
arr

array([  4.17288094e-02,   1.86405786e+00,   1.55083517e+00,
         1.38650966e+00,  -5.22391141e-01,  -3.01203130e-01,
         5.98477194e-02,   2.43227826e+00,   5.15554143e-02,
         5.35838100e-01,  -1.13503933e+00,   5.73630088e-02,
        -5.13531520e-01,  -4.93819906e-01,   8.48248261e-01,
         1.84707689e+00,   5.65066782e-01,   1.13889814e+00,
        -7.71664501e-01,  -6.50440939e-02,   2.23272607e+00,
        -9.02620616e-01,   1.34178896e+00,  -9.48914189e-01,
         1.80211405e+00,  -6.02078669e-01,   1.04822101e+00,
         5.91598270e-01,  -7.70828132e-01,   1.60885973e+00,
        -1.08793468e-01,   1.14169675e+00,  -5.20442656e-01,
         4.19429089e-01,  -1.83182115e+00,   3.89417144e-01,
        -5.83110877e-01,  -8.41171087e-01,  -4.13722063e-01,
         1.07694222e+00,   1.46522868e-01,   3.57895849e-01,
         4.53164747e-01,   1.14700630e+00,  -5.88797519e-01,
         1.87316610e-03,  -2.01638914e+00,   2.25468514e+00,
        -5.59494062e-01,

In [39]:
# 2. keep only the positive numbers (from the previous exercise)
arr = arr[arr > 0]
arr

array([  4.17288094e-02,   1.86405786e+00,   1.55083517e+00,
         1.38650966e+00,   5.98477194e-02,   2.43227826e+00,
         5.15554143e-02,   5.35838100e-01,   5.73630088e-02,
         8.48248261e-01,   1.84707689e+00,   5.65066782e-01,
         1.13889814e+00,   2.23272607e+00,   1.34178896e+00,
         1.80211405e+00,   1.04822101e+00,   5.91598270e-01,
         1.60885973e+00,   1.14169675e+00,   4.19429089e-01,
         3.89417144e-01,   1.07694222e+00,   1.46522868e-01,
         3.57895849e-01,   4.53164747e-01,   1.14700630e+00,
         1.87316610e-03,   2.25468514e+00,   3.06010657e-01,
         2.29554233e-01,   1.37331002e+00,   2.90600553e-01,
         1.01205427e+00,   9.13310063e-01,   1.16377607e+00,
         5.64077102e-01,   8.53734054e-02,   2.30688093e+00,
         1.92886948e+00,   1.54849485e-02,   4.26757632e-01,
         2.15125835e-02,   6.96560757e-03,   2.54881012e-01,
         2.10541678e-01,   3.68119303e-01,   9.50158042e-01,
         2.08935890e-01,

In [41]:
# 3. count the remaining numbers and compute their minimum, maximum, mean and variance
result = len(arr), arr.min(), arr.max(), arr.mean(), arr.std() ** 2
result

(512,
 0.0018731661048011976,
 3.1883412781584028,
 0.80018587635496052,
 0.37459670566291214)
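
The variance in step 3 is computed as the square of the standard deviation. As a hedged sketch with its own freshly generated sample (so the numbers will not match the output above), `arr.var()` gives the same quantity directly, and `ddof=1` switches to the unbiased sample variance. `numpy.random.default_rng` assumes NumPy 1.17 or newer; the seed and the name `positive` are arbitrary.

```python
import numpy

rng = numpy.random.default_rng(0)
arr = rng.normal(size=1000)
positive = arr[arr > 0]

# var() matches std() ** 2 up to floating-point rounding
print(numpy.isclose(positive.var(), positive.std() ** 2))  # True

# by default numpy divides by n; ddof=1 divides by n - 1 instead
print(positive.var(ddof=1) > positive.var())  # True

# counting the True values of the mask gives the same count as len()
print(numpy.count_nonzero(arr > 0) == len(positive))  # True
```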

## References:
* `numpy` documentation: https://docs.scipy.org/doc/numpy/reference/
    * almost any question about `numpy` is already answered on Stack Overflow
* [From python to numpy: a beautiful book about numpy](https://github.com/rougier/from-python-to-numpy)
* Data manipulation with `numpy`: tips and tricks [part1](http://arogozhnikov.github.io/2015/09/29/NumpyTipsAndTricks1.html), [part2](http://arogozhnikov.github.io/2015/09/30/NumpyTipsAndTricks2.html)
