## NumPy: Numeric Computing Library
NumPy is one of the core packages for numerical computing in Python. Pandas, Matplotlib, statsmodels and many other scientific libraries rely on NumPy.

NumPy's major contributions are:
* Efficient numeric computation with C primitives
* Efficient collections with vectorized operations
* An integrated and natural Linear Algebra API
* A C API for connecting NumPy with libraries written in C, C++ or Fortran

In Python, everything is an object, which means that even simple ints are objects, carrying all the machinery required to make an object work. We call them 'boxed ints'. In contrast, NumPy uses primitive numeric types (floats, ints), which makes storage and computation efficient.
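
One visible consequence of storing primitives is that a NumPy array holds a single element type of fixed width, while a list can mix arbitrary objects. A minimal sketch of that contrast (the names `mixed` and `arr` are illustrative and not part of the original notebook):

```python
import numpy as np

# A Python list stores references to full objects, so types can be mixed.
mixed = [1, 2.5, "three"]
print([type(x).__name__ for x in mixed])   # ['int', 'float', 'str']

# A NumPy array stores raw values of one dtype; mixed numeric input is
# upcast to a common type (here float64, 8 bytes per element).
arr = np.array([1, 2.5, 3])
print(arr.dtype, arr.itemsize)             # float64 8
```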

In [2]:
import numpy as np
import sys

### Basic NumPy Arrays

In [7]:
np.array([1,2,3,4])

array([1, 2, 3, 4])

In [8]:
a = np.array([1,2,3,4])

In [9]:
b= np.array([0, .5, 1, 1.5, 2])

In [10]:
a[0], a[1]

(1, 2)

In [14]:
n =5
sys.getsizeof(n)

28

In [16]:
a[0:], a[1:3], a[1:-1], a[::2]

(array([1, 2, 3, 4]), array([2, 3]), array([2, 3]), array([1, 3]))

In [19]:
b[[0,2,-1]] # pass a list of indices; b[0,2,-1] would be read as a 3-D index and raise an IndexError

array([0., 1., 2.])

## Array types

In [21]:
a

array([1, 2, 3, 4])

In [22]:
a.dtype, b.dtype

(dtype('int32'), dtype('float64'))

In [23]:
np.array([1,2,3,4], dtype = float)

array([1., 2., 3., 4.])

In [25]:
c= np.array(['a', 'b', 'c'])

In [26]:
c.dtype

dtype('<U1')

In [44]:
d= np.array([{'a':1}, sys])   # 'O': generic Python object dtype

In [29]:
d.dtype

dtype('O')

## Dimensions and Shapes

In [30]:
A = np.array([
                [1,2,3],
                [4,5,6] ])

In [31]:
A.shape

(2, 3)

In [32]:
A.ndim    # number of dimensions

2

In [33]:
A.size    # number of elements

6

In [38]:
B = np.array([
    [
            [12,11,10],
            [9,8,7]
    ],
    [
            [6,5,4],
            [3,2,1]
    ]
])

In [39]:
B.shape

(2, 2, 3)

In [40]:
B.ndim

3

In [41]:
B.size

12

If the shape isn't consistent, NumPy cannot build a regular multidimensional array and falls back to an array of generic Python objects, such as:

In [53]:
C = np.array([
    [
        [12,11,10],
        [9,8,7]
    ],
    [
        [6,5,4]
    ]
], dtype = 'O')  # without dtype='O' (object), recent NumPy versions (1.24+) raise a ValueError for ragged input like this

In [45]:
C.dtype

dtype('O')

In [46]:
C.shape

(2,)

In [47]:
C.size

2

In [54]:
type(C[0])

list
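
The fallback can be compared side by side with a regular array. This is a small sketch with made-up data (`ragged` and `regular` are illustrative names), showing that the object array is one-dimensional and holds plain lists, while rows of equal length produce a true 2-D numeric array:

```python
import numpy as np

# Rows of different lengths: only an object array can hold them.
ragged = np.array([[1, 2, 3], [4, 5]], dtype=object)
print(ragged.shape)         # (2,)
print(type(ragged[0]))      # <class 'list'>

# Rows of equal length: a true 2-D numeric array.
regular = np.array([[1, 2, 3], [4, 5, 6]])
print(regular.shape)        # (2, 3)
print(regular.dtype.kind)   # i (signed integer; exact width is platform-dependent)
```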

## Indexing and Slicing of Matrices

In [56]:
# Square matrix
A = np.array ([
    #0 1 2
    [1,2,3],   #0
    [4,5,6],   #1
    [7,8,9] ]) #2

In [75]:
A[1]

array([4, 5, 6])

In [74]:
A[1][0]

4

In [76]:
# A[dim1, dim2, dim3...]
A[1,0]

4

In [60]:
A[0:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [61]:
A[:,:2]

array([[1, 2],
       [4, 5],
       [7, 8]])

In [62]:
A[:2, :2]

array([[1, 2],
       [4, 5]])

In [63]:
A[2:, 2:]

array([[9]])

In [64]:
A[1] = np.array([10,10,10])

In [65]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [ 7,  8,  9]])

In [66]:
A[2] =99

In [67]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [99, 99, 99]])

## Summary Statistics

In [79]:
a = np.array([1,2,3,4])

In [80]:
a.sum()

10

In [81]:
a.mean()

2.5

In [82]:
a.std()

1.118033988749895

In [83]:
a.var()

1.25

In [90]:
A = np.array([
    [1,2,3],
    [4,5,6],
    [7,8,9] ])

In [97]:
A.sum(), A.mean(), A.var(), A.std(), A.ndim

(45, 5.0, 6.666666666666667, 2.581988897471611, 2)

In [94]:
A.sum(axis=0) # axis=0: sum down the rows, giving one total per column

array([12, 15, 18])

In [95]:
A.sum(axis=1) # axis=1: sum across the columns, giving one total per row

array([ 6, 15, 24])

In [99]:
A.mean(axis=0), A.mean(axis=1), A.std(axis=0), A.std(axis=1)

(array([4., 5., 6.]),
 array([2., 5., 8.]),
 array([2.44948974, 2.44948974, 2.44948974]),
 array([0.81649658, 0.81649658, 0.81649658]))
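
The `axis` argument is easier to read on a non-square matrix, where the result shapes differ. A minimal sketch (the matrix `M` is an illustrative example, not from the notebook): the axis you name is the one that gets collapsed.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)

col_totals = M.sum(axis=0)     # collapses the 2 rows -> one total per column
row_totals = M.sum(axis=1)     # collapses the 3 columns -> one total per row

print(col_totals.tolist(), col_totals.shape)   # [5, 7, 9] (3,)
print(row_totals.tolist(), row_totals.shape)   # [6, 15] (2,)
```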

## Broadcasting and Vectorized operations

In [121]:
a=np.arange(4)

In [122]:
a

array([0, 1, 2, 3])

In [123]:
a + 10, a*10

(array([10, 11, 12, 13]), array([ 0, 10, 20, 30]))

In [124]:
a+=100

In [125]:
a

array([100, 101, 102, 103])

In [129]:
l = np.arange(4)
l

array([0, 1, 2, 3])

In [130]:
[i * 10 for i in l]

[0, 10, 20, 30]

In [133]:
a=np.arange(4)
a

array([0, 1, 2, 3])

In [134]:
b= np.array([10,10,10,10])
b

array([10, 10, 10, 10])

In [136]:
a + b, a-b, a*b

(array([10, 11, 12, 13]), array([-10,  -9,  -8,  -7]), array([ 0, 10, 20, 30]))
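
The cells above combine an array with a scalar or with an array of the same shape. Broadcasting also covers arrays of different but compatible shapes. A hedged sketch with illustrative data (`M`, `row` and `col` are assumed names): a 1-D row is applied to every row of a matrix, and a column of shape `(2, 1)` is applied to every column.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,)
col = np.array([[100], [200]])     # shape (2, 1)

# row is stretched across both rows of M
print((M + row).tolist())   # [[11, 22, 33], [14, 25, 36]]

# col is stretched across all three columns of M
print((M + col).tolist())   # [[101, 102, 103], [204, 205, 206]]
```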

## Boolean arrays

(Also called masks)

In [142]:
a = np.arange(4)
a

array([0, 1, 2, 3])

In [143]:
a[[0, -1]] # a[0,-1] would raise an IndexError: too many indices for a 1-D array

array([0, 3])

In [144]:
a[[True, False,False, True]]

array([0, 3])

In [145]:
a>=2

array([False, False,  True,  True])

In [146]:
a[a>=2]

array([2, 3])

In [147]:
a.mean(), a[a> a.mean()]

(1.5, array([2, 3]))

In [162]:
a[~(a>a.mean())]  # ~: not

array([0, 1])

In [150]:
a[(a==0) | (a==1)]

array([0, 1])

In [151]:
a[(a<=2) & (a%2==0)]

array([0, 2])

In [6]:
A = np.random.randint(100, size =(3,3))
A

array([[70,  9, 77],
       [ 2, 53,  3],
       [20, 40,  7]])

In [7]:
A[(np.array([
            [True, False, True],
            [False, True, False],
            [True, False, True]
]))]  # a 2-D boolean mask selects the True positions and returns them as a 1-D array

array([70, 77, 53, 20,  7])

In [8]:
A>30

array([[ True, False,  True],
       [False,  True, False],
       [False,  True, False]])
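
A mask such as `A > 30` can be used for more than selection. The sketch below reuses the values shown in the random matrix above as a fixed array (so it is reproducible) and shows counting, selecting, and assigning through the mask. The names `mask` and `capped` are illustrative.

```python
import numpy as np

A = np.array([[70,  9, 77],
              [ 2, 53,  3],
              [20, 40,  7]])

mask = A > 30

# True counts as 1, so summing the mask counts the matches.
print(int(mask.sum()))        # 4

# Selecting with the mask returns the matching values as a 1-D array.
print(A[mask].tolist())       # [70, 77, 53, 40]

# Assigning through the mask changes only the matching positions.
capped = A.copy()
capped[mask] = 30
print(capped.tolist())        # [[30, 9, 30], [2, 30, 3], [20, 30, 7]]
```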

## Linear Algebra

In [9]:
A = np.array([
    [1,2,3],
    [4,5,6],
    [7,8,9]
])

In [10]:
B = np.array([
    [6,5],
    [4,3],
    [2,1]
])

In [11]:
A.dot(B)  # matrix product, equivalent to A @ B

array([[20, 14],
       [56, 41],
       [92, 68]])

In [12]:
A@B

array([[20, 14],
       [56, 41],
       [92, 68]])

In [13]:
B.T

array([[6, 4, 2],
       [5, 3, 1]])

In [14]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [15]:
B.T@A

array([[36, 48, 60],
       [24, 33, 42]])
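
It is easy to confuse the matrix product with the element-wise `*` used in the vectorized operations earlier. A short sketch on two small illustrative matrices (`X` and `Y` are assumed names) shows that the two give different results:

```python
import numpy as np

X = np.array([[1, 2],
              [3, 4]])
Y = np.array([[5, 6],
              [7, 8]])

# Element-wise product: multiplies matching positions.
print((X * Y).tolist())   # [[5, 12], [21, 32]]

# Matrix product: rows of X times columns of Y.
print((X @ Y).tolist())   # [[19, 22], [43, 50]]
```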

## Size of Objects in memory

In [25]:
# A Python int takes more than 24 bytes, and larger integers take even more
sys.getsizeof(1), sys.getsizeof(10**100) #(bytes)

(28, 72)

In [17]:
# NumPy element sizes are much smaller
np.dtype(int).itemsize, np.dtype(float).itemsize

(4, 8)

## Lists are even larger

In [20]:
# A one-element list in Python and an array with one element in NumPy
sys.getsizeof([1]), np.array([1]).nbytes

(64, 4)

## And performance is also important

In [21]:
l = list(range(1000))
a = np.arange(1000)

In [23]:
%time np.sum(a**2)

Wall time: 0 ns


332833500

In [24]:
%time sum([x**2 for x in l])

Wall time: 996 µs


332833500
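
The `%time` magic only works inside IPython/Jupyter, and a single run of such a small computation can be too short to measure (hence the `0 ns` reading). A sketch of the same comparison using the standard-library `timeit` module, which repeats each call many times; the function names are illustrative, and timings depend on the machine, so none are shown here:

```python
import timeit
import numpy as np

values = list(range(1000))
array = np.arange(1000)

def python_sum_of_squares():
    # Pure Python: one boxed int operation per element.
    return sum([x ** 2 for x in values])

def numpy_sum_of_squares():
    # NumPy: the loop runs in compiled code over primitive values.
    return int(np.sum(array ** 2))

# Run each function 1000 times and report the total elapsed seconds.
py_seconds = timeit.timeit(python_sum_of_squares, number=1000)
np_seconds = timeit.timeit(numpy_sum_of_squares, number=1000)

print(f"Python list: {py_seconds:.4f} s for 1000 runs")
print(f"NumPy array: {np_seconds:.4f} s for 1000 runs")
```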

## Useful NumPy functions

### Random

In [27]:
np.random.random(size =2)

array([0.11170566, 0.18636257])

In [32]:
np.random.normal(size=2)

array([0.51053713, 0.47605524])

In [33]:
np.random.rand(2,4)

array([[0.99103696, 0.46267015, 0.28999179, 0.86002602],
       [0.44114587, 0.35207927, 0.00890734, 0.97536064]])

### Arange

In [26]:
np.arange(10), np.arange(5,10), np.arange(0,1,.1)

(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([5, 6, 7, 8, 9]),
 array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]))

### Reshape

In [39]:
np.arange(10), np.arange(10).reshape(2,5), np.arange(10).reshape(5,2)

(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]]),
 array([[0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9]]))

### Linspace

In [40]:
np.linspace(0,1,5), np.linspace(0,1,20), np.linspace(0,1,20, endpoint=False)

(array([0.  , 0.25, 0.5 , 0.75, 1.  ]),
 array([0.        , 0.05263158, 0.10526316, 0.15789474, 0.21052632,
        0.26315789, 0.31578947, 0.36842105, 0.42105263, 0.47368421,
        0.52631579, 0.57894737, 0.63157895, 0.68421053, 0.73684211,
        0.78947368, 0.84210526, 0.89473684, 0.94736842, 1.        ]),
 array([0.  , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 ,
        0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95]))

### Zeros, ones, empty

In [41]:
np.zeros(5), np.zeros((3,3)), np.zeros((3,3), dtype=int)

(array([0., 0., 0., 0., 0.]),
 array([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]]),
 array([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]))

In [42]:
np.ones(5), np.ones((5,5)), np.ones((5,5), dtype=int)

(array([1., 1., 1., 1., 1.]),
 array([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]]),
 array([[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]]))

In [43]:
np.empty(5), np.empty((2,2))

(array([1., 1., 1., 1., 1.]),
 array([[0.25, 0.5 ],
        [0.75, 1.  ]]))
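
The values printed by `np.empty` above are not meaningful: `np.empty` allocates memory without initializing it, so the array shows whatever happened to be there. A minimal sketch of the usual pattern, with illustrative names `buf` and `safe`: allocate, then write every element before reading.

```python
import numpy as np

buf = np.empty((2, 3))     # contents are arbitrary until written
buf.fill(7.0)              # overwrite every element before reading
print(buf.tolist())        # [[7.0, 7.0, 7.0], [7.0, 7.0, 7.0]]

safe = np.zeros((2, 3))    # use zeros when a defined starting value is needed
print(safe.tolist())       # [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```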

### Identity and eye

In [49]:
np.identity(3), np.eye(3,3)

(array([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]]),
 array([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]]))

In [54]:
np.eye(4,4, k=2), np.eye(4,4,k=-2) # k: index of the diagonal; positive = above the main diagonal, negative = below it, default 0 = main diagonal

(array([[0., 0., 1., 0.],
        [0., 0., 0., 1.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]),
 array([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [1., 0., 0., 0.],
        [0., 1., 0., 0.]]))

In [51]:
'hello world'[:7], 'hello world'[6] # spaces count as characters when indexing

('hello w', 'w')