# Important Python libraries

 - `numpy` (numerics) + `scipy` (scientific functions) 
 - `matplotlib` - plotting
 - `astropy` - convenient operations on data for data science (`pandas` is an alternative)
 - `scikit-learn` - machine learning
 
We'll meet them very soon during the ML session.

## Hello numpy!

`numpy` is the core of scientific Python. It is the most convenient way to organize number-crunching in Python.

In [1]:
import numpy

In [2]:
x = numpy.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [3]:
x.reshape(5, 2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [4]:
x.reshape(2, 5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
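As a small illustrative addition to the `reshape` calls above: one of the dimensions can be given as `-1`, and numpy infers it from the number of elements. The shapes must still be compatible with the array size, otherwise a `ValueError` is raised. The (3, 4) shape below is just an example chosen to trigger that error.

```python
import numpy

x = numpy.arange(10)

# -1 means "work this dimension out from the total number of elements"
print(x.reshape(-1, 2).shape)   # (5, 2)
print(x.reshape(2, -1).shape)   # (2, 5)

try:
    x.reshape(3, 4)             # 3 * 4 != 10, so this cannot work
except ValueError as error:
    print("reshape failed:", error)
```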

In [5]:
# slicing works the same way for lists, strings, tuples, numpy arrays, etc.
x[:4]

array([0, 1, 2, 3])

In [7]:
print(x[:3])
print(x[3:7])
print(x[7:])

[0 1 2]
[3 4 5 6]
[7 8 9]


### Vector operations

In [9]:
x = numpy.arange(10 ** 6)
# vector operations apply the same operation to every element: here each element is multiplied by 3, then 12 is added
3 * x + 12.

array([  1.20000000e+01,   1.50000000e+01,   1.80000000e+01, ...,
         3.00000300e+06,   3.00000600e+06,   3.00000900e+06])

In [10]:
# use the %timeit magic to see that this is quite fast
%timeit 3 * x + 12.

100 loops, best of 3: 3.03 ms per loop
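To get a feeling for what "quite fast" means, the sketch below compares the vectorized expression with an element-by-element Python loop. It uses the standard `timeit` module instead of the `%timeit` magic so that it also runs outside a notebook. The helper names `with_loop` and `vectorized` are made up for this example, the array is smaller than the one above to keep the loop short, and the actual timings depend on the machine.

```python
import timeit

import numpy

x = numpy.arange(10 ** 5)


def with_loop(values):
    # plain Python: one multiplication and one addition per element
    return [3 * v + 12. for v in values]


def vectorized(values):
    # numpy: the same arithmetic applied to the whole array at once
    return 3 * values + 12.


# both approaches produce the same numbers
print(numpy.allclose(with_loop(x), vectorized(x)))

loop_time = timeit.timeit(lambda: with_loop(x), number=5)
vector_time = timeit.timeit(lambda: vectorized(x), number=5)
print("loop: %.4f s, vectorized: %.4f s" % (loop_time, vector_time))
```

On typical hardware the vectorized version is expected to be much faster, because the loop over elements runs in compiled code rather than in the Python interpreter.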


In [11]:
Z = numpy.arange(15).reshape(5, 3)
Z

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [12]:
numpy.log(numpy.exp(Z)) # the integers were converted to floats

array([[  0.,   1.,   2.],
       [  3.,   4.,   5.],
       [  6.,   7.,   8.],
       [  9.,  10.,  11.],
       [ 12.,  13.,  14.]])
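The conversion can be made visible by looking at the `dtype` attribute of each array. This is a minimal sketch: the exact integer type of `Z` depends on the platform, and `Z_float` is a name introduced here for illustration.

```python
import numpy

Z = numpy.arange(15).reshape(5, 3)

print(Z.dtype)              # an integer type; the width depends on the platform
print(numpy.exp(Z).dtype)   # float64: exp of integers returns floats

# astype creates a converted copy explicitly, Z itself keeps its integer type
Z_float = Z.astype(float)
print(Z_float.dtype)        # float64
print(Z.dtype == Z_float.dtype)  # False
```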

In [13]:
Z += 4

In [14]:
Z

array([[ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])

In [15]:
Z[::2, :]

array([[ 4,  5,  6],
       [10, 11, 12],
       [16, 17, 18]])

In [16]:
Z[[0, 2, 4], :]

array([[ 4,  5,  6],
       [10, 11, 12],
       [16, 17, 18]])

In [20]:
Z.sum(axis=1)

array([15, 24, 33, 42, 51])

In [21]:
# axes are also numbered from zero
Z.sum(axis=0)

array([50, 55, 60])

In [22]:
Z.max(axis=1)

array([ 6,  9, 12, 15, 18])
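A short sketch of how the `axis` argument affects the shape of the result: the axis that is reduced disappears, unless `keepdims=True` is passed. Keeping the dimension is handy for row-wise normalization, shown here as one possible use. `Z` is rebuilt with the same values as above, and `normalized` is a name chosen for this example.

```python
import numpy

Z = numpy.arange(15).reshape(5, 3) + 4   # same values as Z above

print(Z.sum(axis=0).shape)   # (3,): one sum per column
print(Z.sum(axis=1).shape)   # (5,): one sum per row
print(Z.sum())               # 165: without axis, everything is summed

# keepdims=True keeps the reduced axis with length 1, giving shape (5, 1)
row_sums = Z.sum(axis=1, keepdims=True)
normalized = Z / row_sums    # every row now sums to 1
print(normalized.sum(axis=1))
```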

In [23]:
Z2 = - Z
Z2 = numpy.sort(Z2, axis=1)
Z2

array([[ -6,  -5,  -4],
       [ -9,  -8,  -7],
       [-12, -11, -10],
       [-15, -14, -13],
       [-18, -17, -16]])
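Sorting the negated array is a common trick for getting a descending order: negate, sort, and negate back. The sketch below shows this next to an alternative that reverses an ascending sort. `Z` is rebuilt with the same values as above, and the variable names are illustrative.

```python
import numpy

Z = numpy.arange(15).reshape(5, 3) + 4   # same values as Z above

# negate, sort ascending along each row, negate back: descending order
descending = -numpy.sort(-Z, axis=1)
print(descending[0])   # [6 5 4]

# alternative: sort ascending, then reverse the order of the columns
also_descending = numpy.sort(Z, axis=1)[:, ::-1]
print(numpy.array_equal(descending, also_descending))   # True
```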

## Indexing with a boolean array

In [24]:
x = numpy.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [25]:
x > 3

array([False, False, False, False,  True,  True,  True,  True,  True,  True], dtype=bool)

In [26]:
x[x < 7.4]

array([0, 1, 2, 3, 4, 5, 6, 7])
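Boolean arrays can also be combined before indexing. The sketch below uses the element-wise operators `&` (and) and `~` (not). The particular condition (greater than 3 and even) is just an example.

```python
import numpy

x = numpy.arange(10)

# parentheses are required: & binds more tightly than the comparisons
mask = (x > 3) & (x % 2 == 0)

print(x[mask])      # [4 6 8]
print(mask.sum())   # 3, because True counts as 1
print(x[~mask])     # [0 1 2 3 5 7 9]
```

Python's `and`, `or` and `not` do not work element-wise on arrays, which is why the operators `&`, `|` and `~` are used here.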

## Copies

Many operations in numpy don't create copies; the result shares memory with the original array.

In [22]:
x = numpy.arange(10)
y = x[:5]

print(x, y)
y[0] = 10
print(x, y)

[0 1 2 3 4 5 6 7 8 9] [0 1 2 3 4]
[10  1  2  3  4  5  6  7  8  9] [10  1  2  3  4]


This happened because `x` and `y` point __to the same place in memory__: slicing returns a view of the array, not a copy.

In [23]:
x = numpy.arange(10)
y = x[:5].copy()
print(x, y)
y[0] = 10
print(x, y)

[0 1 2 3 4 5 6 7 8 9] [0 1 2 3 4]
[0 1 2 3 4 5 6 7 8 9] [10  1  2  3  4]
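Whether two arrays share memory can be checked with `numpy.shares_memory`. The sketch below applies it to the two ways of selecting rows used earlier: slicing gives a view, while indexing with a list of integers gives a copy. The names `by_slice` and `by_list` are made up for this example.

```python
import numpy

Z = numpy.arange(15).reshape(5, 3)

by_slice = Z[::2, :]         # basic slicing: a view on Z's memory
by_list = Z[[0, 2, 4], :]    # indexing with a list of integers: a copy

print(numpy.shares_memory(Z, by_slice))   # True
print(numpy.shares_memory(Z, by_list))    # False

by_list[0, 0] = 100    # changes only the copy
by_slice[0, 0] = -1    # changes Z as well
print(Z[0, 0])         # -1
```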


## Random numbers

The `numpy.random` module generates random numbers.

In [24]:
# generating 10000 random numbers at once
numpy.random.normal(loc=2, scale=12, size=10000)

array([ 11.78220635,   8.93663103, -17.50627085, ...,  17.10257018,
         7.24243523,  -0.95099759])
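Random samples differ from run to run, which makes results hard to reproduce. One way to fix this, assuming NumPy 1.17 or newer, is to create a generator with an explicit seed via `numpy.random.default_rng`. The seed value 42 and the variable names below are arbitrary choices for this sketch.

```python
import numpy

# two generators created with the same seed produce the same numbers
rng_a = numpy.random.default_rng(42)
rng_b = numpy.random.default_rng(42)

sample_a = rng_a.normal(loc=2, scale=12, size=10000)
sample_b = rng_b.normal(loc=2, scale=12, size=10000)

print(numpy.array_equal(sample_a, sample_b))   # True
# the sample mean and standard deviation should be close to loc and scale
print(sample_a.mean(), sample_a.std())
```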

## Sorting

In [27]:
x = numpy.random.random(size=1000)
x = numpy.sort(x)

In [29]:
print (x[:10])
print (x[-10:])

[ 0.00270847  0.00351284  0.0039025   0.0042111   0.00434205  0.00706188
  0.00752606  0.01037089  0.01305716  0.01738311]
[ 0.99073867  0.99147926  0.99172226  0.99405947  0.99412325  0.99413127
  0.99526191  0.99577347  0.99712885  0.99901777]


## Arg-functions

Arg-functions (`argsort`, `argmin`, `argmax`) return indices instead of values, which makes non-trivial operations possible in a couple of lines.

In [30]:
# random.random samples uniformly from [0, 1)
random_numbers = numpy.random.random(size=1000)
indices = numpy.argsort(random_numbers)

In [31]:
numpy.all(random_numbers[indices] == numpy.sort(random_numbers))

True

In [34]:
indices[:10]

array([519, 290, 143, 525, 499, 156, 615, 145, 496, 594])

In [45]:
random_numbers.min(), random_numbers.max()

(9.6967623511412526e-05, 0.99986754268274269)

In [31]:
random_numbers.argmax(), random_numbers[random_numbers.argmax()]

(219, 0.99889210024516806)

In [32]:
random_numbers.argmin(), random_numbers[random_numbers.argmin()]

(794, 0.0004961517756608691)
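A typical use of the indices returned by arg-functions is to reorder one array according to the values of another. The sketch below uses two small made-up arrays, `names` and `scores`, purely for illustration.

```python
import numpy

names = numpy.array(["c", "a", "d", "b"])
scores = numpy.array([0.7, 0.1, 0.9, 0.4])

# indices that would sort scores in ascending order
order = numpy.argsort(scores)

print(names[order])              # ['a' 'b' 'c' 'd']
print(scores[order])             # [0.1 0.4 0.7 0.9]
print(names[scores.argmax()])    # d: the name with the highest score
print(names[order[::-1][:2]])    # ['d' 'c']: names of the two highest scores
```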

## Exercise

In [36]:
# 0. import numpy
import numpy as np

In [50]:
# 1. sample 1000 elements from a normal distribution
rnd = np.random.normal(loc=0, scale=10, size=1000)
rnd

array([ -1.55665743e+01,  -5.31957089e+00,  -1.09956522e+01,
        -2.44947509e+00,   3.28313023e+00,  -9.84714911e-01,
        -2.65456635e+00,  -3.38807378e+00,  -1.30929412e+01,
        -1.06094716e+01,  -8.96151042e-01,   2.43460278e-01,
        -5.62457451e+00,   7.12115019e-01,   4.35269780e+00,
         1.12577553e+01,  -3.61013319e-01,  -1.41103889e+01,
        -1.20360347e+01,  -7.88726769e+00,  -1.12685745e+01,
        -1.98781113e+01,   1.22385340e+01,  -5.20498800e+00,
         8.68194454e-01,   6.59169369e+00,   1.16018028e+01,
        -2.29754088e+00,   6.15431395e+00,  -8.95466667e+00,
        -1.23564648e+01,  -8.83370365e+00,   1.08214337e+01,
        -1.34880025e+01,  -1.50045982e+01,  -1.09197861e+01,
        -9.16680604e+00,   3.64091018e+00,  -1.67846960e+01,
         1.28853611e+01,  -1.09207657e+01,  -7.13959279e+00,
         1.08156697e+01,  -7.00235982e+00,   4.72504438e+00,
        -5.77839564e+00,   8.21246272e+00,  -1.20073094e+01,
        -9.45677114e+00,

In [51]:
# 2. keep only the positive numbers (from the previous step)
pos_rnd = rnd[rnd > 0]
print(pos_rnd)

[  3.28313023   0.24346028   0.71211502   4.3526978   11.25775532
  12.23853398   0.86819445   6.59169369  11.60180276   6.15431395
  10.82143374   3.64091018  12.88536106  10.81566973   4.72504438
   8.21246272  10.86113141   6.01177999   4.61994897   8.91663121
   4.07902536  10.18486611   1.44625488   2.81373076   9.90962353
  11.03608221  14.22482546   3.98624206  15.07039164  10.3660514
   8.68375782   8.22725766  10.32987248   0.41774051   2.06155638
   6.67922837  17.69907876  11.03028311   3.49247545   1.73956279
  17.48742474   8.08817931  17.81144005   1.55924914   3.70822024
  12.83521249  14.2975844    1.46235191   7.98226227   3.52455696
   1.31749443   5.06442723  14.88312469  10.02052005   2.37627671
   5.71489783   9.1117672    3.4883678    6.83949029   5.07967921
  13.54076496   1.21896909   9.07543763  11.2901521    8.14830641
   1.78190101   2.27084061   1.10633455   2.53883424   7.10560571
   9.28544063   3.51459861   0.8321405    7.29893122   5.36543565
   9.559596

In [52]:
# 3. count the remaining numbers and compute their minimum, maximum, mean and variance
print(pos_rnd.size)
print(pos_rnd.min())
print(pos_rnd.max())
print(pos_rnd.mean())
print(pos_rnd.var())

506
0.0290646679718
24.9286581069
7.56097955527
32.0145001868


## References:
* `numpy` documentation: https://docs.scipy.org/doc/numpy/reference/
    * almost any question about `numpy` is already answered on Stack Overflow
* [From python to numpy: a beautiful book about numpy](https://github.com/rougier/from-python-to-numpy)
* Data manipulation with `numpy`: tips and tricks [part1](http://arogozhnikov.github.io/2015/09/29/NumpyTipsAndTricks1.html), [part2](http://arogozhnikov.github.io/2015/09/30/NumpyTipsAndTricks2.html)
