# Important python libraries

 - `numpy` (numerics) + `scipy` (scientific functions) 
 - `matplotlib` - plotting
 - `astropy` - convenient operations on data for data science (`pandas` is an alternative)
 - `scikit-learn` - machine learning
 
We'll meet them very soon during the ML session.

## Hello numpy!

`numpy` is the core of scientific python. It is the most convenient way to organize number-crunching in python.

In [1]:
import numpy

In [2]:
x = numpy.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [3]:
x.reshape(5, 2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [4]:
x.reshape(2, 5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
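
Both reshapes above work because the requested shape has exactly as many elements as the array (5 × 2 = 2 × 5 = 10). A minimal sketch of two related behaviours, assuming a reasonably recent `numpy`: passing `-1` lets numpy infer one dimension, and a shape that does not fit raises an error.

```python
import numpy

x = numpy.arange(10)

# -1 asks numpy to infer this dimension from the total number of elements
a = x.reshape(5, -1)
print(a.shape)   # (5, 2)

# a shape whose product is not 10 cannot be satisfied
try:
    x.reshape(3, 4)
except ValueError as error:
    print("cannot reshape:", error)
```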

In [5]:
# slicing follows the same rules for lists, strings, tuples, numpy arrays, etc.
x[:4]

array([0, 1, 2, 3])

In [6]:
print(x[:3])
print(x[3:7])
print(x[7:])

[0 1 2]
[3 4 5 6]
[7 8 9]


### Vector operations

In [7]:
x = numpy.arange(10 ** 6)
# vector operations apply the same operation to every element.
# Here each element is multiplied by 3 and then 12 is added.
3 * x + 12.

array([  1.20000000e+01,   1.50000000e+01,   1.80000000e+01, ...,
         3.00000300e+06,   3.00000600e+06,   3.00000900e+06])

In [4]:
# use the %timeit magic to see that this is quite fast
%timeit 3 * x + 12.

The slowest run took 23.26 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.76 µs per loop
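
Outside a notebook the `%timeit` magic is not available. A rough sketch of the same comparison with the standard `timeit` module is given here; the helper names `with_loop` and `vectorized` are made up for this illustration, and the measured times will differ from machine to machine.

```python
import timeit
import numpy

x = numpy.arange(10 ** 5)

def with_loop(values):
    # pure-python loop: one multiplication and one addition per element
    return [3 * v + 12. for v in values]

def vectorized(values):
    # the same arithmetic, applied to the whole array at once
    return 3 * values + 12.

# both approaches give the same numbers
print(numpy.allclose(with_loop(x), vectorized(x)))

loop_time = timeit.timeit(lambda: with_loop(x), number=3)
vector_time = timeit.timeit(lambda: vectorized(x), number=3)
print("loop: {:.4f} s, vectorized: {:.4f} s".format(loop_time, vector_time))
```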


In [11]:
Z = numpy.arange(15).reshape(5, 3)
Z

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [17]:
numpy.log(numpy.exp(Z)) # the result is float although Z holds integers: a type conversion happened

array([[  0.,   1.,   2.],
       [  3.,   4.,   5.],
       [  6.,   7.,   8.],
       [  9.,  10.,  11.],
       [ 12.,  13.,  14.]])
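
The conversion can be checked through the `dtype` attribute. This is a small sketch, not part of the original session; the exact integer type of `Z` depends on the platform, and `restored` is just an illustrative name.

```python
import numpy

Z = numpy.arange(15).reshape(5, 3)

print(Z.dtype)              # an integer type, platform dependent
print(numpy.exp(Z).dtype)   # float64
print((Z * 1.5).dtype)      # float64

# explicit conversion back to integers: round to the nearest integer first,
# because log(exp(Z)) is only approximately equal to Z
restored = numpy.rint(numpy.log(numpy.exp(Z))).astype(Z.dtype)
print(numpy.array_equal(restored, Z))
```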

In [11]:
Z += 4

In [12]:
Z

array([[ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])

In [13]:
Z[::2, :]

array([[ 4,  5,  6],
       [10, 11, 12],
       [16, 17, 18]])

In [14]:
Z[[0, 2, 4], :]

array([[ 4,  5,  6],
       [10, 11, 12],
       [16, 17, 18]])

In [15]:
Z.sum(axis=1)

array([15, 24, 33, 42, 51])

In [16]:
# axes are also numbered from zero
Z.sum(axis=0)

array([50, 55, 60])

In [17]:
Z.max(axis=1)

array([ 6,  9, 12, 15, 18])
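
One way to remember the `axis` argument: the axis you pass is the one that disappears from the shape. The sketch below rebuilds the same `Z` and also shows `keepdims=True`, which keeps the reduced axis with length 1; `row_sums` and `normalized` are names chosen for this example.

```python
import numpy

Z = numpy.arange(15).reshape(5, 3) + 4

print(Z.sum(axis=0).shape)   # (3,)  axis 0 (rows) is collapsed
print(Z.sum(axis=1).shape)   # (5,)  axis 1 (columns) is collapsed
print(Z.sum())               # 165, sum over all elements

# keepdims=True keeps the reduced axis, which makes broadcasting convenient
row_sums = Z.sum(axis=1, keepdims=True)
print(row_sums.shape)        # (5, 1)

# divide each row by its own sum; cast to float to avoid integer division
normalized = Z / row_sums.astype(float)
print(normalized.sum(axis=1))
```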

In [18]:
Z2 = - Z
Z2 = numpy.sort(Z2, axis=1)
Z2

array([[ -6,  -5,  -4],
       [ -9,  -8,  -7],
       [-12, -11, -10],
       [-15, -14, -13],
       [-18, -17, -16]])
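
Negating before sorting is a common trick for getting descending order, since `numpy.sort` only sorts in ascending order. A short sketch of the complete trick (negate, sort, negate back) next to an equivalent alternative that reverses the sorted axis:

```python
import numpy

Z = numpy.arange(15).reshape(5, 3) + 4

# negate, sort ascending, negate back: each row ends up in descending order
descending = -numpy.sort(-Z, axis=1)
print(descending[0])   # [6 5 4]

# alternative: sort ascending, then reverse along the same axis
also_descending = numpy.sort(Z, axis=1)[:, ::-1]
print(numpy.array_equal(descending, also_descending))
```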

## Indexing with boolean array

In [19]:
x = numpy.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [20]:
x > 3

array([False, False, False, False,  True,  True,  True,  True,  True,  True], dtype=bool)

In [21]:
x[x < 7.4]

array([0, 1, 2, 3, 4, 5, 6, 7])
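
Boolean arrays can be combined and can also be used on the left-hand side of an assignment. A brief sketch (the variable names `mask` and `y` are only for illustration):

```python
import numpy

x = numpy.arange(10)

# combine conditions with & (and), | (or), ~ (not);
# the parentheses are required because of operator precedence
mask = (x > 3) & (x % 2 == 0)
print(x[mask])                # [4 6 8]

# a mask can select the elements to overwrite
y = x.copy()
y[y > 6] = 0
print(y)                      # [0 1 2 3 4 5 6 0 0 0]

# numpy.where returns the positions where the mask is True
print(numpy.where(mask)[0])   # [4 6 8]
```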

## Copies

Many numpy operations, such as slicing, don't create copies but return views onto the same memory.

In [22]:
x = numpy.arange(10)
y = x[:5]

print(x, y)
y[0] = 10
print(x, y)

[0 1 2 3 4 5 6 7 8 9] [0 1 2 3 4]
[10  1  2  3  4  5  6  7  8  9] [10  1  2  3  4]


This happened because `x` and `y` point __to the same place in memory__.

In [23]:
x = numpy.arange(10)
y = x[:5].copy()
print(x, y)
y[0] = 10
print(x, y)

[0 1 2 3 4 5 6 7 8 9] [0 1 2 3 4]
[0 1 2 3 4 5 6 7 8 9] [10  1  2  3  4]
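
When it is unclear whether an operation returned a view or a copy, `numpy.shares_memory` can tell. The sketch below assumes the usual behaviour: basic slicing and reshaping a freshly created array give views, while indexing with a list or a boolean array gives a copy.

```python
import numpy

x = numpy.arange(10)

view = x[:5]                 # basic slicing: a view
fancy = x[[0, 1, 2, 3, 4]]   # indexing with a list: a copy
masked = x[x < 5]            # boolean indexing: a copy

print(numpy.shares_memory(x, view))     # True
print(numpy.shares_memory(x, fancy))    # False
print(numpy.shares_memory(x, masked))   # False

# reshape of this array is a view as well, so writes are visible in x
grid = x.reshape(2, 5)
grid[0, 0] = 100
print(x[0])                             # 100
```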


## Random numbers

The `numpy.random` module helps with generating random numbers.

In [24]:
# generating 10000 random numbers at once
numpy.random.normal(loc=2, scale=12, size=10000)

array([ 11.78220635,   8.93663103, -17.50627085, ...,  17.10257018,
         7.24243523,  -0.95099759])
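
Each run of the cell above produces different numbers. If reproducible results are needed, one option is a seeded generator object; the sketch below uses `numpy.random.RandomState`, and the seed `42` is an arbitrary choice for this example. It also checks that `loc` and `scale` are the mean and standard deviation of the distribution.

```python
import numpy

# a generator with a fixed seed always produces the same sequence
rng = numpy.random.RandomState(42)
sample = rng.normal(loc=2, scale=12, size=10000)

print(sample.mean())   # close to loc=2
print(sample.std())    # close to scale=12

# a second generator with the same seed reproduces the sample exactly
same = numpy.random.RandomState(42).normal(loc=2, scale=12, size=10000)
print(numpy.array_equal(sample, same))
```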

## Sorting

In [25]:
x = numpy.random.random(size=1000)
x = numpy.sort(x)

In [26]:
print(x[:10])
print(x[-10:])

[ 0.00021619  0.00066141  0.0022386   0.00231872  0.00866861  0.00889664
  0.01014405  0.01216177  0.01273415  0.01466487]
[ 0.99120831  0.99250643  0.99419286  0.99490179  0.99550734  0.99653262
  0.99681669  0.99848937  0.99872478  0.99887051]
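
Note that `numpy.sort` returns a sorted copy, which is why the result is assigned back to `x` above. The array method `.sort()` sorts in place instead. A tiny sketch of the difference:

```python
import numpy

x = numpy.array([3, 1, 2])

sorted_copy = numpy.sort(x)   # returns a new array, x is unchanged
print(x, sorted_copy)         # [3 1 2] [1 2 3]

result = x.sort()             # sorts x itself and returns None
print(x, result)              # [1 2 3] None
```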


## Arg...

Arg-functions return indices instead of values, which allows writing non-trivial operations in a couple of lines.

In [18]:
# random.random generates uniform numbers in the half-open interval [0, 1)
random_numbers = numpy.random.random(size=1000)
indices = numpy.argsort(random_numbers)
print(indices)

[  2 258 753  87 686 616 182 338 471 987 860 324 782 750 549 434 236 242
  35 169 760 187 419 918 417 921 867 965 508 198 106 190 143 879 367 801
 149 113  92 437 400 534   8 998 479 784 882 393 922 547 292 518 955 157
 887 370 423 586 897  69 643 980 301 502 445 432  91 961 220  89 574 718
 661 517 763 226 341 331 477 475  95  21 868 596 890 708 978 847 614 330
 499 985 505 797 896 649 904 850 907 613 532 234 257 668 776 489 108 140
 716 958 742 931 838 153 405 623 977 821 461 885 318 166 412 706 349 582
 780 299 959 497 559  71 520 632 994 578 843 251 188 690 386 184 639 515
 320 773  88 281 738 222 659 912 761 135 608 132 593 512 869 129 403 826
 107 739 411 551 491 853 569 667 581 823 435 863 170   1 256 252 468 335
 346 235 250  84 134 365 288 213 558  54  79 501  11 317 930 615 719 564
 713 209 883 216 803 731 388 427 741 793 117 128 942 997 319 156 430 340
 560  39 798 544 306  78 802 781 358 375 794  74 289 840 555 541 120 660
 701 509 112 932 910 830 390 628 633 855 355 775 73

In [28]:
# numpy.alltrue is a deprecated alias (removed in numpy 2.0); numpy.all does the same
numpy.all(random_numbers[indices] == numpy.sort(random_numbers))

True

In [29]:
indices[:10]

array([794,   2, 719, 257, 385, 486, 763, 772, 554, 559])

In [30]:
random_numbers.min(), random_numbers.max()

(0.0004961517756608691, 0.99889210024516806)

In [31]:
random_numbers.argmax(), random_numbers[random_numbers.argmax()]

(219, 0.99889210024516806)

In [32]:
random_numbers.argmin(), random_numbers[random_numbers.argmin()]

(794, 0.0004961517756608691)
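
A typical use of `argsort` is reordering one array by the values of another. The sketch below uses made-up `names` and `scores` arrays to show this, together with picking the top entries and computing ranks.

```python
import numpy

# hypothetical data for illustration
names = numpy.array(["a", "b", "c", "d", "e"])
scores = numpy.array([0.3, 0.9, 0.1, 0.7, 0.5])

# indices that would sort scores in ascending order
order = numpy.argsort(scores)
print(names[order])    # ['c' 'a' 'e' 'd' 'b']

# indices of the two largest scores, best first
top2 = order[-2:][::-1]
print(names[top2])     # ['b' 'd']

# applying argsort to the order gives the rank of each element
ranks = numpy.argsort(order)
print(ranks)           # [1 4 0 3 2]
```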

## Exercise

In [19]:
# 0. import numpy
import numpy

In [22]:
# 1. sample 1000 elements from normal distribution 
randoms = numpy.random.normal(size=1000)

In [24]:
# 2. keep only the positive numbers (from the previous step)
pos_randoms = randoms[randoms > 0]
pos_randoms

array([ 0.24695829,  0.48101415,  0.92848437,  0.15255527,  0.56416802,
        0.68540697,  2.2567756 ,  0.53095245,  0.94147656,  0.26112885,
        0.30673373,  1.54968123,  1.02462386,  1.25821803,  0.12199264,
        0.22445825,  0.50538239,  0.04581875,  2.18347911,  0.89605758,
        1.34097015,  0.38048917,  0.99759762,  1.72485918,  0.34917694,
        0.8176039 ,  0.41604474,  0.34592906,  1.3364683 ,  1.20032671,
        1.89896368,  0.79350388,  0.12880059,  0.00999192,  0.62913883,
        0.21834002,  1.11015997,  0.1044611 ,  1.08929949,  0.64450511,
        1.40184608,  0.68940736,  0.28476806,  0.92026848,  0.62990837,
        0.82637544,  0.83746358,  0.90156114,  0.80454512,  1.82197291,
        0.26836023,  1.93511102,  0.97371773,  1.3927284 ,  0.80796658,
        0.53937145,  1.85220072,  0.1766439 ,  1.44010694,  0.73461231,
        0.95042409,  1.50302406,  1.16908554,  0.45173865,  1.16164782,
        1.41076417,  1.03978833,  0.55980909,  0.16469978,  1.64

In [25]:
# 3. count the numbers that were left out (the negative ones) and compute their minimum, maximum, mean and variance
neg_randoms = randoms[randoms < 0]
print("Left numbers: {}".format(len(neg_randoms)))
print("Min: {}".format(neg_randoms.min()))
print("Max: {}".format(neg_randoms.max()))
print("Mean: {}".format(neg_randoms.mean()))
print("Var: {}".format(neg_randoms.var()))

Left numbers: 506
Min: -3.52714261539
Max: -0.00125959683035
Mean: -0.82235377202
Var: 0.388022230119


## References:
* `numpy` documentation: https://docs.scipy.org/doc/numpy/reference/
    * almost any question about `numpy` is already answered on Stack Overflow
* [From python to numpy: a beautiful book about numpy](https://github.com/rougier/from-python-to-numpy)
* Data manipulation with `numpy`: tips and tricks [part1](http://arogozhnikov.github.io/2015/09/29/NumpyTipsAndTricks1.html), [part2](http://arogozhnikov.github.io/2015/09/30/NumpyTipsAndTricks2.html)
