<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/NumPy_logo.svg/1200px-NumPy_logo.svg.png">


NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays.

- mathematical, logical, shape manipulation
- sorting, selecting
- I/O
- discrete Fourier transforms
- basic linear algebra
- basic statistical operations
- random simulation and much more.



In [28]:
import numpy as np

## The following links provide richer background

https://numpy.org/doc/1.17/user/basics.html

https://numpy.org/doc/1.17/user/basics.types.html

https://numpy.org/doc/1.17/user/whatisnumpy.html




##### If you are coming from MATLAB, consider the following dictionary

https://numpy.org/doc/1.17/user/numpy-for-matlab-users.html

### There are 5 general mechanisms for creating arrays:

1. Conversion from other Python structures (e.g., lists, tuples)
2. Intrinsic numpy array creation objects (e.g., arange, ones, zeros, etc.)
3. Reading arrays from disk, either from standard or custom formats
4. Creating arrays from raw bytes through the use of strings or buffers
5. Use of special library functions (e.g., random)

## 1. Conversion

In [29]:
np.array([1,2,3,4,5])

array([1, 2, 3, 4, 5])

In [30]:
np.array(range(10))

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [31]:
np.array([[1,2.0],[0,0],(1+1j,3.)])

array([[1.+0.j, 2.+0.j],
       [0.+0.j, 0.+0.j],
       [1.+1.j, 3.+0.j]])
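As the last example shows, `np.array` upcasts mixed inputs to one common dtype (here complex). A minimal sketch of inspecting the inferred dtype and overriding it with the standard `dtype` argument:

```python
import numpy as np

ints = np.array([1, 2, 3])                  # all ints -> an integer dtype
mixed = np.array([1, 2.0, 3])               # one float upcasts the whole array
forced = np.array([1, 2, 3], dtype=float)   # explicit dtype overrides inference

print(ints.dtype.kind, mixed.dtype, forced.dtype)
```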

## 2. Intrinsic numpy array creation

In [32]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [33]:
np.arange(2, 10, dtype=float)

array([2., 3., 4., 5., 6., 7., 8., 9.])

In [34]:
np.arange(2, 3, 0.1)

array([2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9])

In [35]:
np.linspace(1., 4., 6)

array([1. , 1.6, 2.2, 2.8, 3.4, 4. ])
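The list of mechanisms also mentions `zeros` and `ones`, which are not shown above. A short sketch of those plus `np.eye`, and of the usual advice for non-integer steps: because the number of elements `arange` returns with a float step depends on floating-point rounding, `linspace` (which takes the count explicitly) is often the safer choice.

```python
import numpy as np

zeros = np.zeros((2, 3))          # 2x3 array of 0.0 (float64 by default)
ones = np.ones(4, dtype=int)      # 1-d array of integer ones
eye = np.eye(3)                   # 3x3 identity matrix

# Same values as np.arange(2, 3, 0.1), but the element count (10) is fixed
steps = np.linspace(2, 3, 10, endpoint=False)
```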

`np.indices()` creates a set of arrays (stacked into a single array with one extra dimension), one per dimension, each representing the variation along that dimension. An example illustrates this much better than a verbal description:

In [36]:
np.indices((3,3))

array([[[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2]],

       [[0, 1, 2],
        [0, 1, 2],
        [0, 1, 2]]])

This is particularly useful for evaluating functions of multiple dimensions on a regular grid.
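A minimal sketch of that use: unpack the index arrays and evaluate a function of both coordinates on every grid point at once. The function `10*i + j` is an arbitrary example chosen for illustration.

```python
import numpy as np

# rows[i, j] == i and cols[i, j] == j for every point of a 3x4 grid
rows, cols = np.indices((3, 4))

# Evaluate f(i, j) = 10*i + j on the whole grid without a Python loop
grid = 10 * rows + cols
print(grid)
```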



## 3. Reading from disk

In [38]:
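x = np.arange(10)

In [39]:
# sep="," writes comma-separated text, not binary data
x.tofile("myarray.csv", sep=",")

In [40]:
# In text mode the dtype must be given explicitly; no shape is stored
y = np.fromfile("myarray.csv", dtype="i8", sep=",")
y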

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
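`tofile`/`fromfile` store no shape or dtype information, so the reader has to supply the dtype (as `dtype="i8"` does above) and restore the shape by hand. A sketch of the alternatives `np.save`/`np.load` (binary `.npy`, keeps dtype and shape) and `np.savetxt`/`np.loadtxt` (plain text); the temporary directory and file names are illustrative only.

```python
import os
import tempfile

import numpy as np

x = np.arange(10).reshape(2, 5)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "myarray.npy")   # illustrative file name
    np.save(npy_path, x)          # binary .npy file that records dtype and shape
    y = np.load(npy_path)         # comes back as a 2x5 array with the same dtype

    txt_path = os.path.join(tmp, "myarray.txt")   # illustrative file name
    np.savetxt(txt_path, x, fmt="%d", delimiter=",")   # human-readable text
    z = np.loadtxt(txt_path, dtype=int, delimiter=",")
```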

## 4. Creating arrays from buffers

In [41]:
from io import StringIO

In [42]:
data = u"1, 2, 3\n4, 5, 6"

np.genfromtxt(StringIO(data), delimiter=",")

array([[1., 2., 3.],
       [4., 5., 6.]])

Alternatively, we may be dealing with a fixed-width file, where columns are defined as a given number of characters. In that case, we need to set delimiter to a single integer (if all the columns have the same size) or to a sequence of integers (if columns can have different sizes):



In [43]:
data = u"123456789\n   4  7 9\n   4567 9"
np.genfromtxt(StringIO(data),delimiter=(4, 3, 2))

array([[1234.,  567.,   89.],
       [   4.,    7.,    9.],
       [   4.,  567.,    9.]])

In [44]:
data = u"1, abc , 2\n 3, xxx, 4"
# autostrip=True strips the whitespace around each field
np.genfromtxt(StringIO(data), delimiter=",", dtype="|U5", autostrip=True)

array([['1', 'abc', '2'],
       ['3', 'xxx', '4']], dtype='<U5')

In [45]:
data = u"""#
 # Skip me !
 # Skip me too !
 1, 2
 3, 4
 5, 6 #This is the third line of the data
 7, 8
 # And here comes the last line
 9, 0
"""
np.genfromtxt(StringIO(data), comments="#", delimiter=",")

array([[1., 2.],
       [3., 4.],
       [5., 6.],
       [7., 8.],
       [9., 0.]])

## Selecting columns


In [46]:
data = u"1 2 3\n4 5 6"
np.genfromtxt(StringIO(data), usecols=(0, -1))

array([[1., 3.],
       [4., 6.]])

In [47]:
np.genfromtxt(StringIO(data),
              names="a, b, c", usecols=("a", "c"))

array([(1., 3.), (4., 6.)], dtype=[('a', '<f8'), ('c', '<f8')])

## Specifying data types

In [48]:
np.genfromtxt(StringIO(data), dtype=[(_, int) for _ in "abc"])

array([(1, 2, 3), (4, 5, 6)],
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])

In [49]:
data = "So it goes\n#a b c\n1 2 3\n 4 5 6"

np.genfromtxt(StringIO(data), skip_header=1, names=True)


array([(1., 2., 3.), (4., 5., 6.)],
      dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8')])

In [50]:
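data = u"1, 2.3%, 45.\n6, 78.9%, 0"
# "2.3%" -> 0.023; converters may receive bytes or str depending on the NumPy version
convertfunc = lambda x: float((x.decode() if isinstance(x, bytes) else x).strip("%")) / 100.
np.genfromtxt(StringIO(data), delimiter=",", names=["a", "b", "c"],
              converters={1: convertfunc})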

## More cleaning tricks on import

In [51]:
data = u"N/A, 2, 3\n4, ,???"
kwargs = dict(delimiter=",",
    dtype=int,
    names="a,b,c",
    missing_values={0:"N/A", 'b':" ", 2:"???"},
    filling_values={0:0, 'b':0, 2:-999})

np.genfromtxt(StringIO(data), **kwargs)

array([(0, 2,    3), (4, 0, -999)],
      dtype=[('a', '<i8'), ('b', '<i8'), ('c', '<i8')])
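The `genfromtxt` examples in this section parse text. For mechanism 4 in the list (raw bytes), `np.frombuffer` interprets an existing bytes object directly. A minimal sketch, assuming the bytes were produced with a known dtype (here little-endian 32-bit integers):

```python
import numpy as np

raw = np.arange(4, dtype="<i4").tobytes()   # 16 raw bytes: four little-endian int32s

# frombuffer does not copy; the result is a read-only view of the bytes object
a = np.frombuffer(raw, dtype="<i4")

# Reading the same bytes with the wrong dtype silently gives different numbers
b = np.frombuffer(raw, dtype="<i2")
print(a, b)
```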

## 5. Generating arrays using pseudo-random generators

In [52]:
from numpy import random as r

r.random((2,2))

array([[0.54142345, 0.32565863],
       [0.83166418, 0.35442452]])

In [53]:
r.uniform(1,2,size=(2,2))

array([[1.32652709, 1.03521421],
       [1.39753258, 1.465488  ]])

### Standard normal distribution and shuffle in place

In [54]:
N = r.randn(5)
N

array([-2.59577433, -1.47212151,  0.16090415,  0.24718511, -1.34799267])

In [55]:
r.shuffle(N)
N

array([ 0.16090415, -1.34799267, -2.59577433,  0.24718511, -1.47212151])
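The functions above use NumPy's legacy global random state. Since NumPy 1.17 the recommended interface is a `Generator` created with `np.random.default_rng`, and passing a seed makes the results reproducible. A sketch of the same operations with that API (the seed value 42 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

u = rng.random((2, 2))                # like r.random: uniform on [0, 1)
v = rng.uniform(1, 2, size=(2, 2))    # like r.uniform
n = rng.standard_normal(5)            # like r.randn(5)
rng.shuffle(n)                        # in-place shuffle, like r.shuffle

# The same seed reproduces the same stream of numbers
again = np.random.default_rng(seed=42).random((2, 2))
```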

## Mini exercise

Create the following array using 3 of the 5 mechanisms above.

```
[
 1 1 1 1 1
 1 0 0 0 1
 1 0 0 0 1
 1 0 0 0 1
 1 1 1 1 1
]
``` 
Time: 15 min

Create a new notebook for this (File > New Notebook > Python 3).
*Hint: random number generation is less suitable here.*

## Indexing

*Array indexing refers to any use of the square brackets `[]` to index array values.*


*There are many options to indexing, which give numpy indexing great power, but with power comes some complexity and the potential for confusion.*

In [23]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [24]:
x[0]

0

In [25]:
x[-1]

9

## Higher dimensions

In [68]:
x.shape = (2,5)  # reshapes x in place; x.reshape(2,5) returns a new array instead
x

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [69]:
x[1,3]

8

In [70]:
x[0]

array([0, 1, 2, 3, 4])

## Slicing and multiple indexing

In [71]:
x[0,:]

array([0, 1, 2, 3, 4])

In [72]:
x[0][2]

2
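`x[0][2]` gives the same value as `x[0,2]`, but it indexes twice: `x[0]` first builds an intermediate array (a view of the first row), which is then indexed by `2`. The NumPy indexing docs note that the single-tuple form `x[0,2]` is more efficient for this reason. A quick check:

```python
import numpy as np

x = np.arange(10).reshape(2, 5)

row = x[0]                        # intermediate 1-d view of the first row
assert x[0][2] == x[0, 2] == row[2]
assert np.shares_memory(row, x)   # the intermediate is a view, not a copy
```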

## Access multiple elements using indexing

Using `slicing`

In [40]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [41]:
x[2:5]

array([2, 3, 4])

In [42]:
x[:-7]

array([0, 1, 2])

In [43]:
x[1:7:2]

array([1, 3, 5])

### Note

Slices of arrays do not copy the internal array data; they only produce new views of the original data.

This is different from list or tuple slicing. Because a view keeps the whole original array alive, an explicit `copy()` is recommended when only a small slice is kept and the original data is no longer required.
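A short demonstration of view semantics: writing through a slice changes the original array, while a slice followed by `.copy()` owns its own memory.

```python
import numpy as np

x = np.arange(10)

view = x[2:5]                 # a view: shares memory with x
view[0] = 100                 # ...so this also changes x[2]

independent = x[2:5].copy()   # a new array with its own memory
independent[0] = -1           # x is left untouched
```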

### Accessing multidimensional arrays

In [109]:
y = np.arange(35).reshape(5,7)
y

array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27],
       [28, 29, 30, 31, 32, 33, 34]])

In [110]:
y[0]

array([0, 1, 2, 3, 4, 5, 6])

In [111]:
y[0,:]

array([0, 1, 2, 3, 4, 5, 6])

## Mini exercise

Access the multi-dimensional array `y` to extract the subarray

```
[
    9  10  11
    16 17  18
    23 24  25
]

``` 


## Indexing with NumPy arrays

The use of index arrays ranges from simple, straightforward cases to complex, hard-to-understand cases. For all cases of index arrays, what is returned is a copy of the original data, not a view as one gets for slices.
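A minimal check of that difference, using `np.shares_memory` to test whether two arrays overlap in memory:

```python
import numpy as np

x = np.arange(10)

picked = x[[3, 3, 1, 8]]   # index array -> always a new array (a copy)
picked[0] = 99             # does not affect x

sliced = x[1:4]            # slice -> a view of x
```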

In [120]:
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [121]:
x[[3,3,1,8]]

array([3, 3, 1, 8])

In [122]:
x[np.array([3, 3, 1, 8])]

array([3, 3, 1, 8])

In [127]:
x[np.array([3,3,-3,8])]
x[[3,3,-3,8]]

array([3, 3, 7, 8])

In [47]:
x[np.array([[1,1],[2,3]])]

array([[1, 1],
       [2, 3]])

## Multidimensional arrays indexed with multidimensional arrays

In [49]:
y = np.arange(35).reshape(5,7)
y

array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27],
       [28, 29, 30, 31, 32, 33, 34]])

In [66]:
y[np.array([0,2,4]), np.array([0,1,2])]

array([ 0, 15, 30])

The first index value is `0` for both index arrays, and thus the first value of the resultant array is `y[0,0]`.

The next value is `y[2,1]`, and the last is `y[4,2]`.
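The pairing rule above is equivalent to zipping the index arrays together. To select the cross product instead (every listed row combined with every listed column), `np.ix_` builds index arrays that broadcast into a 2-d selection. A sketch of both:

```python
import numpy as np

y = np.arange(35).reshape(5, 7)
rows = np.array([0, 2, 4])
cols = np.array([0, 1, 2])

# Element-wise pairing: (0,0), (2,1), (4,2)
paired = y[rows, cols]
by_hand = np.array([y[r, c] for r, c in zip(rows, cols)])

# Cross product: rows {0,2,4} x columns {0,1,2} -> a 3x3 block
block = y[np.ix_(rows, cols)]
```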

##### Mismatching shape of indexing array

In [58]:
y[np.array([0,2,4]), np.array([0,1])]

IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (2,) 

#### Scalars are broadcast against any index array

In [60]:
y[np.array([0,2,4]), 1]

array([ 1, 15, 29])

In [61]:
y[np.array([0,2,4])]

array([[ 0,  1,  2,  3,  4,  5,  6],
       [14, 15, 16, 17, 18, 19, 20],
       [28, 29, 30, 31, 32, 33, 34]])

The first row of the result is `y[0,:]` (equivalently `y[0]`), the second is `y[2,:]`, and so forth.

## Matching using boolean expressions

In [73]:
b = y>20

In [77]:
y[b]

array([21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])

In [78]:
b

array([[False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False],
       [ True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True]])
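Boolean masks can be combined with `&` (and), `|` (or) and `~` (not), and a mask can also appear on the left-hand side of an assignment. A short sketch; note that the parentheses around each comparison are required because `&` and `|` bind more tightly than `>` and `<`.

```python
import numpy as np

y = np.arange(35).reshape(5, 7)

between = y[(y > 20) & (y < 25)]   # elements strictly between 20 and 25

z = y.copy()
z[z % 2 == 1] = 0                  # set every odd entry to zero, in place
```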

## More MATLAB-like syntax
Use `np.nonzero` to get behaviour similar to `find(y>20)`

In [27]:
y[np.nonzero(b)]

array([21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])

In [80]:
np.nonzero(b)

(array([3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4]),
 array([0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6]))

In [82]:
y[b[:,5],1:3]

array([[22, 23],
       [29, 30]])
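MATLAB's `find` returns linear indices. The closest NumPy analogue is `np.flatnonzero`, with one caveat: NumPy linear indices start at 0 and follow row-major (C) order, whereas MATLAB's start at 1 and follow column-major order, so the raw numbers generally differ. `np.unravel_index` converts them back to (row, column) pairs. A sketch:

```python
import numpy as np

y = np.arange(35).reshape(5, 7)

lin = np.flatnonzero(y > 20)                 # 0-based, row-major linear indices
rows, cols = np.unravel_index(lin, y.shape)  # back to (row, col) pairs

values = y.ravel()[lin]                      # the same elements as y[y > 20]
```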

## Structural indexing tools

In [84]:
y.shape

(5, 7)

In [85]:
y[:,np.newaxis,:].shape

(5, 1, 7)

### Tricks with structural indices

In [87]:
x = np.arange(5)
x + x

array([0, 2, 4, 6, 8])

In [88]:
x[:,np.newaxis] + x[np.newaxis,:]

array([[0, 1, 2, 3, 4],
       [1, 2, 3, 4, 5],
       [2, 3, 4, 5, 6],
       [3, 4, 5, 6, 7],
       [4, 5, 6, 7, 8]])
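This pattern is an "outer" operation. NumPy also exposes it directly as the `outer` method of ufuncs such as `np.add` and `np.multiply`, which gives the same result as the `np.newaxis` version:

```python
import numpy as np

x = np.arange(5)

table = x[:, np.newaxis] + x[np.newaxis, :]   # broadcasting: (5,1) + (1,5) -> (5,5)
same = np.add.outer(x, x)                      # ufunc method, same result

times = np.multiply.outer(x, x)                # multiplication table for 0..4
```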

## Ellipsis access

In [98]:
z = np.arange(81).reshape(3,3,3,3)

In [99]:
z[1,...,2]

array([[29, 32, 35],
       [38, 41, 44],
       [47, 50, 53]])

In [100]:
z[1,:,:,2]

array([[29, 32, 35],
       [38, 41, 44],
       [47, 50, 53]])
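The ellipsis stands for as many `:` as are needed to index every remaining axis, so it works at the start, the end, or in the middle of an index:

```python
import numpy as np

z = np.arange(81).reshape(3, 3, 3, 3)

assert np.array_equal(z[1, ..., 2], z[1, :, :, 2])
last = z[..., 0]     # same as z[:, :, :, 0]
first = z[0, ...]    # same as z[0] and z[0, :, :, :]
```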

## Assignment to indexed arrays

In [114]:
x = np.arange(10)
x[2:7] = 1
x

array([0, 1, 1, 1, 1, 1, 1, 7, 8, 9])

In [115]:
x[2:7] = np.arange(5)
x

array([0, 1, 0, 1, 2, 3, 4, 7, 8, 9])

In [116]:
x[1] = 1.2
x

array([0, 1, 0, 1, 2, 3, 4, 7, 8, 9])
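Because `x` has an integer dtype, the assigned float `1.2` is converted to the integer `1`; the effect is invisible above only because `x[1]` was already `1`. A value with a different integer part makes the truncation visible:

```python
import numpy as np

x = np.arange(10)
x[1] = 5.9            # stored as 5: the fractional part is dropped, not rounded
x[2] = -5.9           # stored as -5: truncation is toward zero

f = x.astype(float)   # convert to a float array first to keep fractional values
f[1] = 5.9
```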

## Mini exercise

Create the following array again, this time using assignment and `np.ones`.

```
[
 1 1 1 1 1
 1 0 0 0 1
 1 0 0 0 1
 1 0 0 0 1
 1 1 1 1 1
]
``` 


### Assigning based on type and incrementing

In [117]:
x[1] = 1.2j

TypeError: can't convert complex to int

In [118]:
x

array([0, 1, 0, 1, 2, 3, 4, 7, 8, 9])

In [119]:
x += 1
x

array([ 1,  2,  1,  2,  3,  4,  5,  8,  9, 10])
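One caveat from the NumPy indexing documentation: when an index array repeats a position, `+=` still increments it only once, because `x[idx] += 1` is evaluated as `x[idx] = x[idx] + 1`. `np.add.at` performs unbuffered accumulation instead:

```python
import numpy as np

x = np.zeros(5, dtype=int)
x[[1, 1, 3, 1]] += 1             # index 1 appears three times but is incremented once

y = np.zeros(5, dtype=int)
np.add.at(y, [1, 1, 3, 1], 1)    # unbuffered: index 1 is incremented three times
```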

### Broadcasting

The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. 
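The "certain constraints" are: shapes are compared starting from the trailing (rightmost) dimension, two dimensions are compatible when they are equal or one of them is 1, and missing leading dimensions count as 1. The sketch below spells this rule out in a hypothetical helper, `broadcast_shape`, written only for illustration, and compares it with the shape NumPy's own `np.broadcast` reports.

```python
import itertools

import numpy as np


def broadcast_shape(a, b):
    """Hypothetical helper: result shape of broadcasting shapes a and b."""
    result = []
    # Walk both shapes from the right, padding the shorter one with 1s
    for da, db in itertools.zip_longest(reversed(a), reversed(b), fillvalue=1):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
    return tuple(reversed(result))


for a, b in [((4, 1), (5,)), ((3, 4), (4,)), ((3,), (3,))]:
    numpy_shape = np.broadcast(np.empty(a), np.empty(b)).shape
    print(a, b, "->", broadcast_shape(a, b), numpy_shape)
```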

In [208]:
a = np.array(range(3))
a

array([0, 1, 2])

In [209]:
b = a.copy()
a*b

array([0, 1, 4])

In [210]:
b = 2.0
a*b

array([0., 2., 4.])

In [211]:
x = np.arange(4)
xx = x.reshape(4,1)
y = np.ones(5)
z = np.ones((3,4))

In [212]:
x.shape

(4,)

In [213]:
y.shape

(5,)

In [214]:
x + y

ValueError: operands could not be broadcast together with shapes (4,) (5,) 

In [215]:
xx.shape

(4, 1)

In [216]:
(xx + y).shape

(4, 5)

In [217]:
xx + y

array([[1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4.]])

In [218]:
z.shape

(3, 4)

In [219]:
(x + z).shape

(3, 4)

In [220]:
x + z

array([[1., 2., 3., 4.],
       [1., 2., 3., 4.],
       [1., 2., 3., 4.]])

### Broadcasting using `newaxis`

In [221]:
x

array([0, 1, 2, 3])

In [222]:
y

array([1., 1., 1., 1., 1.])

Broadcasting provides a convenient way of taking the outer product (or any other outer operation) of two arrays. The following example shows an outer addition operation of two 1-d arrays:

In [224]:
x[:, np.newaxis] + y

array([[1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4.]])

Here the `newaxis` index operator inserts a new axis into `x`, making it a two-dimensional 4x1 array. Combining the 4x1 array with `y`, which has shape (5,), yields a 4x5 array.

## Benchmarking

Vectorizing is the whole point of NumPy. NumPy code that isn't vectorized, such as calling `np.log` on one scalar at a time, is typically slower than bare Python code (see the timings below), and is arguably just as "wrong" as cracking a single walnut with a jackhammer. Either find the right tool or get more nuts.



In [150]:
import math
import random
from tqdm import tqdm

In [151]:
%timeit math.log(10)
%timeit np.log(10)
%timeit math.exp(3)
%timeit np.exp(3)

99.4 ns ± 1.93 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
992 ns ± 29.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
91.6 ns ± 1.36 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
979 ns ± 29 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [204]:
%timeit random.gauss(0, 1)
%timeit np.random.normal()

479 ns ± 5.87 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
1.38 µs ± 19.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [170]:
def log(x):
    return [math.log(_) for _ in x ]

In [203]:
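# Benchmark np.log against the list-comprehension log() for growing list sizes.
# Disabled (`if False`) because it is slow; the results were saved to CSV once
# and are loaded in the next cell instead.
steps = range(3, 2000, 2)
results_np = []
results_base = []
if False:
    for size in tqdm(steps):
        values = list(range(2, size))
        # -n 100: 100 loops per run, -o: return the timing result, -q: quiet
        # (np.log(values) also pays for converting the list to an array)
        timer_np = %timeit -n 100 -o -q np.log(values)
        timer_base = %timeit -n 100 -o -q log(values)

        results_np.append(timer_np.average)
        results_base.append(timer_base.average)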

In [129]:
import pandas as pd

# pd.DataFrame(
#     {
#     "steps":steps,
#     "numpy": results_np,
#     "base": results_base
#     }
# ).to_csv("bench_small.csv")
df = pd.read_csv("bench_small.csv")

In [130]:
import plotly.graph_objs as go
from plotly.offline import iplot,init_notebook_mode
init_notebook_mode(connected=True)

def performance_plot(df):
    fig = go.Figure(
        data=[
            go.Scatter(x=df.steps, y=df.numpy, name="Numpy"),
            go.Scatter(x=df.steps, y=df.base, name="base"),
        ],
        layout={
            "xaxis": {"title": "Size of array"},
            "yaxis": {"title": "Time in seconds"},
        },
    )
    iplot(fig)
performance_plot(df)

In [131]:
df = pd.read_csv("bench.csv")
performance_plot(df)

### Why is NumPy Fast?
Vectorization describes the absence of any explicit looping, indexing, etc., in the code. These things still take place, of course, just “behind the scenes” in optimized, pre-compiled C code.
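A minimal sketch of the difference: both functions below compute the sum of squares of the same values, one with an explicit Python loop and one with a single vectorized expression whose loop runs inside NumPy's compiled code. Timings depend on the machine, so none are quoted here.

```python
import timeit

import numpy as np

values = np.arange(200_000, dtype=np.float64)


def sum_squares_loop(a):
    total = 0.0
    for v in a:                  # explicit Python-level loop, one element at a time
        total += v * v
    return total


def sum_squares_vectorized(a):
    return float(np.sum(a * a))  # elementwise multiply and sum run in compiled code


t_loop = timeit.timeit(lambda: sum_squares_loop(values), number=1)
t_vec = timeit.timeit(lambda: sum_squares_vectorized(values), number=1)
print(f"loop: {t_loop:.4f} s, vectorized: {t_vec:.4f} s")
```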