# Important Python libraries

 - `numpy` (numerics) + `scipy` (scientific functions) 
 - `matplotlib` - plotting
 - `astropy` - convenient operations on data for data science (`pandas` is an alternative)
 - `scikit-learn` - machine learning
 
We'll meet them very soon during the ML session.

## Hello numpy!

`numpy` is the core of scientific Python. It is the most convenient way to organize number-crunching in Python.

In [1]:
import numpy

In [2]:
x = numpy.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [3]:
x.reshape(5, 2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [4]:
x.reshape(2, 5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
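A small sketch that goes slightly beyond the cells above (it is an addition, not part of the original notebook): `reshape` can infer one dimension when it is given as `-1`, and it refuses shapes whose total size does not match.

```python
import numpy

x = numpy.arange(10)

# -1 asks numpy to infer that dimension from the array size
two_rows = x.reshape(2, -1)
print(two_rows.shape)  # (2, 5)

# the total number of elements must stay the same,
# otherwise numpy raises ValueError
try:
    x.reshape(3, 4)
except ValueError as error:
    reshape_error = str(error)
    print("reshape failed:", reshape_error)
```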

In [5]:
# slicing has the same logic for lists / strings / tuples / numpy, etc
x[:4]

array([0, 1, 2, 3])

In [6]:
print(x[:3])
print(x[3:7])
print(x[7:])

[0 1 2]
[3 4 5 6]
[7 8 9]
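To illustrate the comment about shared slicing logic, here is a minimal sketch (the variable names `items`, `text` and `values` are made up for this example) that applies the same `start:stop:step` slice to a list, a string and an array:

```python
import numpy

items = list(range(10))
text = "0123456789"
values = numpy.arange(10)

# the start:stop:step rule is shared by all three types
print(items[2:8:2])    # [2, 4, 6]
print(text[2:8:2])     # 246
print(values[2:8:2])   # [2 4 6]

# negative indices count from the end
print(values[-3:])     # [7 8 9]
```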


### Vector operations

In [7]:
x = numpy.arange(10 ** 6)
# vector operations apply the same computation to every element: here each element is multiplied by 3, then 12 is added
3 * x + 12.

array([  1.20000000e+01,   1.50000000e+01,   1.80000000e+01, ...,
         3.00000300e+06,   3.00000600e+06,   3.00000900e+06])

In [8]:
# use the %timeit magic to see that this is quite fast
%timeit 3 * x + 12.

100 loops, best of 3: 6.26 ms per loop
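To see what "quite fast" is relative to, one option is to time the same computation written as a pure Python loop. The sketch below uses the standard `timeit` module instead of the notebook magic so it also runs as a plain script. The helper names `python_loop` and `vectorized` are made up for this example, and the measured times depend on the machine, so none are quoted here.

```python
import timeit
import numpy

x = numpy.arange(10 ** 5)

def python_loop(values):
    # element-by-element computation in pure Python
    return [3 * value + 12. for value in values]

def vectorized(values):
    # the same computation as a single numpy expression
    return 3 * values + 12.

loop_time = timeit.timeit(lambda: python_loop(x), number=3)
vector_time = timeit.timeit(lambda: vectorized(x), number=3)
print("loop: {:.4f} s, vectorized: {:.4f} s".format(loop_time, vector_time))
```

On typical hardware the vectorized version should come out far ahead, because the loop over elements runs in compiled code rather than in the Python interpreter.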


In [9]:
Z = numpy.arange(15).reshape(5, 3)
Z

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [10]:
numpy.log(numpy.exp(Z)) # the integers were converted to floats

array([[  0.,   1.,   2.],
       [  3.,   4.,   5.],
       [  6.,   7.,   8.],
       [  9.,  10.,  11.],
       [ 12.,  13.,  14.]])
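The conversion can be checked directly through the `dtype` attribute. This is an illustrative sketch, not part of the original notebook; the exact integer width of `Z` depends on the platform, so only the float results are annotated.

```python
import numpy

Z = numpy.arange(15).reshape(5, 3)

print(Z.dtype)              # an integer type, width depends on the platform
print(numpy.exp(Z).dtype)   # float64
print((Z * 1.5).dtype)      # float64

# an explicit conversion is also available
Z_float = Z.astype(float)
print(Z_float.dtype)        # float64
```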

In [11]:
Z += 4

In [12]:
Z

array([[ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])

In [13]:
Z[::2, :]

array([[ 4,  5,  6],
       [10, 11, 12],
       [16, 17, 18]])

In [14]:
Z[[0, 2, 4], :]

array([[ 4,  5,  6],
       [10, 11, 12],
       [16, 17, 18]])

In [15]:
Z.sum(axis=1)

array([15, 24, 33, 42, 51])

In [16]:
# axes are also zero-indexed
Z.sum(axis=0)

array([50, 55, 60])

In [17]:
Z.max(axis=1)

array([ 6,  9, 12, 15, 18])
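A way to remember the `axis` argument: it names the axis that gets collapsed. The sketch below rebuilds the same `Z` and also shows `keepdims=True`, which is an addition to what the notebook covers; it keeps the collapsed axis with length 1 so the result can be combined with the original array.

```python
import numpy

Z = numpy.arange(15).reshape(5, 3) + 4

# axis=0 collapses the rows, leaving one value per column
column_sums = Z.sum(axis=0)
# axis=1 collapses the columns, leaving one value per row
row_sums = Z.sum(axis=1)
print(column_sums.shape, row_sums.shape)  # (3,) (5,)

# keepdims=True keeps the collapsed axis with length 1,
# so the row means broadcast against the original array
row_means = Z.mean(axis=1, keepdims=True)
centered = Z - row_means
print(centered.sum(axis=1))  # [0. 0. 0. 0. 0.]
```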

In [18]:
Z2 = - Z
Z2 = numpy.sort(Z2, axis=1)
Z2

array([[ -6,  -5,  -4],
       [ -9,  -8,  -7],
       [-12, -11, -10],
       [-15, -14, -13],
       [-18, -17, -16]])
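The notebook does not say why it negates before sorting. One plausible use, shown here as an assumption, is sorting in descending order, since `numpy.sort` itself only sorts ascending. The small array below is made up for the example.

```python
import numpy

Z = numpy.array([[5, 4, 6],
                 [9, 7, 8]])

# sort each row in ascending order, then reverse the column order
descending = numpy.sort(Z, axis=1)[:, ::-1]
print(descending)

# negating before and after the sort gives the same result
same = -numpy.sort(-Z, axis=1)
print(numpy.array_equal(descending, same))  # True
```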

## Indexing with boolean arrays

In [19]:
x = numpy.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [20]:
x > 3

array([False, False, False, False,  True,  True,  True,  True,  True,  True], dtype=bool)

In [21]:
x[x < 7.4]

array([0, 1, 2, 3, 4, 5, 6, 7])
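Boolean masks can also be combined and used for assignment. A short sketch of the usual patterns (an addition to the cells above):

```python
import numpy

x = numpy.arange(10)

# combine conditions with & (and), | (or), ~ (not);
# the parentheses around each comparison are required
middle = x[(x > 2) & (x < 7)]
print(middle)   # [3 4 5 6]

edges = x[(x < 2) | (x > 7)]
print(edges)    # [0 1 8 9]

# a mask can select the elements to overwrite
clipped = x.copy()
clipped[clipped > 5] = 5
print(clipped)  # [0 1 2 3 4 5 5 5 5 5]

# summing a mask counts its True values
n_large = (x > 3).sum()
print(n_large)  # 6
```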

## Copies

Many operations in `numpy` don't create copies; they return views that operate on the same memory.

In [22]:
x = numpy.arange(10)
y = x[:5]

print(x, y)
y[0] = 10
print(x, y)

[0 1 2 3 4 5 6 7 8 9] [0 1 2 3 4]
[10  1  2  3  4  5  6  7  8  9] [10  1  2  3  4]


This happened because `y` is a view of `x`: both point __to the same place in memory__.

In [23]:
x = numpy.arange(10)
y = x[:5].copy()
print(x, y)
y[0] = 10
print(x, y)

[0 1 2 3 4 5 6 7 8 9] [0 1 2 3 4]
[0 1 2 3 4 5 6 7 8 9] [10  1  2  3  4]
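When it is unclear whether an operation returned a view or a copy, `numpy.shares_memory` can check. The sketch below is an addition to the notebook; it compares basic slicing with an explicit copy, indexing with a list of indices, and boolean indexing.

```python
import numpy

x = numpy.arange(10)

view = x[:5]             # basic slicing returns a view
copied = x[:5].copy()    # explicit copy
fancy = x[[0, 1, 2]]     # indexing with a list of indices returns a copy
masked = x[x < 5]        # boolean indexing returns a copy

print(numpy.shares_memory(x, view))    # True
print(numpy.shares_memory(x, copied))  # False
print(numpy.shares_memory(x, fancy))   # False
print(numpy.shares_memory(x, masked))  # False

# only a write through the view changes the original array
fancy[0] = -1
view[0] = 10
print(x[0])  # 10
```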


## Random numbers

The `numpy.random` module helps with generating random numbers.

In [24]:
# generating 10000 random numbers at once
numpy.random.normal(loc=2, scale=12, size=10000)

array([ 11.78220635,   8.93663103, -17.50627085, ...,  17.10257018,
         7.24243523,  -0.95099759])
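Results like the one above change on every run. For reproducible numbers, a seeded generator can be used. This sketch relies on `numpy.random.default_rng`, which the notebook itself does not use and which needs a reasonably recent `numpy`; the seed value 42 is arbitrary.

```python
import numpy

# a generator with a fixed seed produces the same stream every time
rng = numpy.random.default_rng(seed=42)
sample = rng.normal(loc=2, scale=12, size=10000)

# the sample mean and standard deviation should be close to loc and scale
print(sample.mean(), sample.std())

# re-creating the generator with the same seed repeats the sample exactly
rng_again = numpy.random.default_rng(seed=42)
same_sample = rng_again.normal(loc=2, scale=12, size=10000)
print(numpy.array_equal(sample, same_sample))  # True
```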

## Sorting

In [25]:
x = numpy.random.random(size=1000)
x = numpy.sort(x)

In [26]:
print(x[:10])
print(x[-10:])

[ 0.00021619  0.00066141  0.0022386   0.00231872  0.00866861  0.00889664
  0.01014405  0.01216177  0.01273415  0.01466487]
[ 0.99120831  0.99250643  0.99419286  0.99490179  0.99550734  0.99653262
  0.99681669  0.99848937  0.99872478  0.99887051]


## Arg...

The arg-functions (`argsort`, `argmin`, `argmax`) return indices rather than values, which makes non-trivial operations possible in a couple of lines.

In [27]:
# random.random generates uniform in [0, 1]
random_numbers = numpy.random.random(size=1000)
indices = numpy.argsort(random_numbers)

In [28]:
numpy.all(random_numbers[indices] == numpy.sort(random_numbers))

True

In [29]:
indices[:10]

array([794,   2, 719, 257, 385, 486, 763, 772, 554, 559])

In [30]:
random_numbers.min(), random_numbers.max()

(0.0004961517756608691, 0.99889210024516806)

In [31]:
random_numbers.argmax(), random_numbers[random_numbers.argmax()]

(219, 0.99889210024516806)

In [32]:
random_numbers.argmin(), random_numbers[random_numbers.argmin()]

(794, 0.0004961517756608691)
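A typical use of these indices is reordering one array by the values of another. The sketch below uses made-up `names` and `scores` arrays purely for illustration:

```python
import numpy

names = numpy.array(["a", "b", "c", "d"])
scores = numpy.array([0.7, 0.1, 0.9, 0.4])

# argsort returns the indices that would sort scores in ascending order
order = numpy.argsort(scores)
print(names[order])    # ['b' 'd' 'a' 'c']
print(scores[order])   # [0.1 0.4 0.7 0.9]

# reversing the order gives the names with the highest scores first
top_two = names[order[::-1][:2]]
print(top_two)         # ['c' 'a']

# argmax finds the position of the best score directly
best = names[scores.argmax()]
print(best)            # c
```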

## Exercise

In [2]:
# 0. import numpy
import numpy as np

In [3]:
# 1. sample 1000 elements from a normal distribution
x = np.random.normal(0, 1, 1000)
print(x)

[ -6.43412938e-01  -1.89117607e+00  -4.74297677e-01  -1.27422736e+00
  -1.40070642e-01  -3.18300883e-01   7.80940998e-01   1.74164536e+00
   5.02305170e-02  -6.25594725e-01   1.31648723e+00  -1.06917383e+00
   9.82288035e-01  -6.54946033e-01  -1.68277672e-01   1.41802900e-01
  -5.70865475e-01   5.43847628e-01  -1.78030994e-01  -1.13196564e+00
  -1.17760261e+00   7.80748966e-01  -2.42141854e-02  -1.29139693e+00
   3.26305221e-01   2.17467033e-01  -2.28535466e-01   9.05246848e-01
  -1.13335558e+00   2.77609389e-01  -1.58424777e+00  -3.61730946e-01
  -7.45151258e-01   1.06682676e-01  -9.86236778e-01  -1.51823114e+00
  -1.21433273e-01  -9.32562288e-01   9.74739613e-01   1.95235358e-01
   3.17981178e-01  -8.16129018e-01   4.19331764e-02  -4.95443641e-01
  -1.05548597e-01  -3.16016508e-01  -1.91913391e-01  -4.17158163e-01
   6.55562450e-01  -1.23762540e+00   6.74219837e-01   9.82611387e-01
   8.13431486e-02  -6.06884553e-01  -1.44628600e+00  -1.36103494e-01
  -8.11865755e-02   7.54373220e-01

In [7]:
# 2. keep only the positive numbers (from the previous exercise)
y = x[x > 0]
print(y)

[ 0.780941    1.74164536  0.05023052  1.31648723  0.98228804  0.1418029
  0.54384763  0.78074897  0.32630522  0.21746703  0.90524685  0.27760939
  0.10668268  0.97473961  0.19523536  0.31798118  0.04193318  0.65556245
  0.67421984  0.98261139  0.08134315  0.75437322  0.25418364  0.82713855
  0.30925078  1.20169634  0.44625707  1.52459595  1.67693516  0.96394558
  1.15042015  0.88078845  0.95789292  1.57902979  0.07045337  0.32752987
  0.27574795  0.25843899  1.8762092   0.703751    1.00244542  1.01788225
  0.08608828  0.3839163   0.54019239  0.25860404  0.03383181  0.19418732
  1.22620034  0.27638686  1.12047744  0.95243812  0.24901581  0.96982349
  0.63194063  1.35432597  0.52322339  1.84037761  1.1007735   0.27369785
  0.60434874  0.40141651  0.55503779  0.23648326  0.43294828  0.70477821
  0.09207242  0.1751668   0.23658637  0.02217098  0.26102168  0.64283815
  0.30660743  0.3864757   0.56187962  0.16823895  0.23945941  1.332054
  0.1907339   1.21089513  0.43097079  0.82254446  2.02

In [8]:
# 3. count the remaining numbers and compute their minimum, maximum, mean and variance
count = y.size
maximum = np.max(y)
minimum = np.min(y)
mean = np.mean(y)
variance = np.var(y)

print("count: {}\nmaximum: {}\nminimum: {}\nmean: {}\nvariance: {}".format(count, maximum, minimum, mean, variance))

count: 479
maximum: 3.19930946475
minimum: 0.00340928123709
mean: 0.764226380531
variance: 0.346660709819
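One detail worth knowing about the variance in this solution: by default `np.var` divides by the number of elements (population variance). Passing `ddof=1` divides by `n - 1` instead (sample variance). A small sketch with a made-up array:

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

population_variance = np.var(values)       # divides by n
sample_variance = np.var(values, ddof=1)   # divides by n - 1
print(population_variance, sample_variance)  # 1.25 1.6666666666666667
```

For the several hundred values kept in the exercise the two differ only slightly.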


## References:
* `numpy` documentation: https://docs.scipy.org/doc/numpy/reference/
    * almost any question about `numpy` is already answered on Stack Overflow
* [From python to numpy: a beautiful book about numpy](https://github.com/rougier/from-python-to-numpy)
* Data manipulation with `numpy`: tips and tricks [part1](http://arogozhnikov.github.io/2015/09/29/NumpyTipsAndTricks1.html), [part2](http://arogozhnikov.github.io/2015/09/30/NumpyTipsAndTricks2.html)
