![rmotr](https://user-images.githubusercontent.com/7065401/52071918-bda15380-2562-11e9-828c-7f95297e4a82.png)
<hr style="margin-bottom: 40px;">

<img src="https://user-images.githubusercontent.com/7065401/39118381-910eb0c2-46e9-11e8-81f1-a5b897401c23.jpeg"
    style="width:300px; float: right; margin: 0 40px 40px 40px;"></img>

# NumPy: Numeric computing library

NumPy (Numerical Python) is one of the core packages for numerical computing in Python. Pandas, Matplotlib, Statsmodels and many other scientific libraries rely on NumPy.

NumPy's major contributions are:

* Efficient numeric computation with C primitives
* Efficient collections with vectorized operations
* An integrated and natural Linear Algebra API
* A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.

Let's expand on efficiency. In Python, **everything is an object**, which means that even simple ints are objects, with all the machinery required to make objects work. We call them "Boxed Ints". In contrast, NumPy uses primitive numeric types (floats, ints), which makes storage and computation efficient.

<img src="https://docs.google.com/drawings/d/e/2PACX-1vTkDtKYMUVdpfVb3TTpr_8rrVtpal2dOknUUEOu85wJ1RitzHHf5nsJqz1O0SnTt8BwgJjxXMYXyIqs/pub?w=726&h=396" />


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Hands on! 

In [1]:
import sys
import numpy as np

## Basic Numpy Arrays

In [2]:
my_array = [1, 2, 3, 4]
np.array(my_array)

array([1, 2, 3, 4])

In [3]:
a = np.array([1, 2, 3, 4])

In [4]:
b = np.array([0, .5, 1, 1.5, 2])

In [5]:
a[0], a[1]

(1, 2)

In [6]:
a[0:]

array([1, 2, 3, 4])

In [7]:
a[1:3]

array([2, 3])

In [8]:
a[1:-1]

array([2, 3])

In [9]:
a[::2]

array([1, 3])

In [10]:
b

array([0. , 0.5, 1. , 1.5, 2. ])

In [11]:
b[[1, 3]]

array([0.5, 1.5])

In [12]:
b = np.array([0, .5, 1, 1.5, 2])
b[[0, 2, -1]]

array([0., 1., 2.])

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Array Types

In [13]:
a = np.array([1, 2, 3, 4])

In [14]:
a.dtype

dtype('int32')

The dtype `int32` is a 32-bit integer:

* `int` stands for integer: whole numbers (positive, negative, or zero).
* `32` is the number of bits used to store each value, so the representable range is -2,147,483,648 to 2,147,483,647 (about ±2.1 billion).

When NumPy creates an array without an explicit `dtype`, it infers one from the input values. All elements of `a = np.array([1, 2, 3, 4])` are integers, so NumPy uses its default integer type. That default depends on the platform and NumPy version: it is `int32` in this run (typical of Windows with NumPy 1.x) and `int64` on most Linux and macOS installations.

If you need a larger range, specify the data type explicitly when creating the array, for example `np.array([1, 2, 3, 4], dtype=np.int64)` for 64-bit integers.
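
As a minimal sketch of the points above, the following inspects the default integer dtype on the current machine and the range of `int32` using `np.iinfo`. The variable names `info32` and `a64` are illustrative only, and the first printed line will differ between platforms.

```python
import numpy as np

# Without an explicit dtype, NumPy picks the platform's default integer type
# (commonly int64 on Linux/macOS, and int32 on Windows with NumPy 1.x).
a = np.array([1, 2, 3, 4])
print(a.dtype, a.itemsize)

# np.iinfo reports the range an integer dtype can represent.
info32 = np.iinfo(np.int32)
print(info32.min, info32.max)   # -2147483648 2147483647

# Requesting int64 explicitly uses 8 bytes per element and widens the range.
a64 = np.array([1, 2, 3, 4], dtype=np.int64)
print(a64.dtype, a64.itemsize)  # int64 8
```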

In [15]:
b

array([0. , 0.5, 1. , 1.5, 2. ])

In [16]:
b.dtype

dtype('float64')

In [17]:
np.array([1, 2, 3, 4], dtype=np.float64)

array([1., 2., 3., 4.])

In [18]:
np.array([1, 2, 3, 4], dtype=np.int8)

array([1, 2, 3, 4], dtype=int8)

In [19]:
c = np.array(['a', 'b', 'c'])

In [20]:
c.dtype

dtype('<U1')

In `<U1`:

* `<` is the byte order (little-endian).
* `U` means a Unicode string. NumPy stores text as Unicode, so it can hold characters from any language.
* `1` is the maximum number of characters in each element, not the number of bytes. NumPy stores every character as a 4-byte code point, so each element of `c` occupies 4 bytes.

So `<U1` means that `c` is an array of Unicode strings of at most one character each.

A Unicode character is an abstract unit of text: a letter, digit, punctuation mark, or symbol from any writing system. Key aspects:

* Universality: Unicode aims to cover all characters used in human languages, both historical and contemporary, so text can be processed and exchanged consistently across languages and platforms.
* Unique code point: each character has a unique numeric identifier called a code point, which lets software recognize it unambiguously regardless of language or system.
* Encoding-dependent size: the number of bytes per character depends on the encoding. In UTF-8, ASCII characters take 1 byte and other characters take 2 to 4 bytes; NumPy's `U` dtype uses a fixed 4 bytes per character.

This lets software display text from different writing systems correctly, exchange text between systems without encoding problems, and support multiple languages with the same text-processing code.
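
A short sketch of how the string dtype behaves in practice: the number after `U` counts characters, while `itemsize` reports bytes. The array `words` is a made-up example, and the `<` prefix shown in the comments assumes a little-endian machine.

```python
import numpy as np

c = np.array(['a', 'b', 'c'])
print(c.dtype)           # <U1: strings of at most 1 character
print(c.dtype.itemsize)  # 4: each character is stored as a 4-byte code point

# The width grows to fit the longest string in the array.
words = np.array(['a', 'bcd'])
print(words.dtype)           # <U3
print(words.dtype.itemsize)  # 12

# Assigning a longer string into a fixed-width array silently truncates it.
c[0] = 'xyz'
print(c[0])  # x
```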

In [21]:
d = np.array([{'a': 1}, sys])

In [22]:
d.dtype

dtype('O')

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Dimensions and shapes

In [23]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

In [24]:
A.shape

(2, 3)

In [25]:
A.ndim

2

Why is the dimension 2 here? The dimension of an array is the number of indices needed to access one of its elements. `A` needs two (a row and a column), so `A.ndim` is 2. NumPy calls this the number of axes; older documentation sometimes calls it the rank, which is unrelated to the rank of a matrix in linear algebra.
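
To make the idea concrete, here is a small sketch showing that `ndim` equals the length of `shape` and the number of indices needed to reach one element. The matrix `M` is a hypothetical example chosen so that its linear-algebra rank differs from its number of dimensions.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

# ndim is the number of axes, which is also the length of the shape tuple.
print(A.ndim, len(A.shape))  # 2 2

# Two indices (row, column) are needed to reach a single element.
print(A[1, 2])  # 6

# A three-dimensional array needs three indices.
B = np.zeros((2, 2, 3))
print(B.ndim)  # 3

# The second row of M is twice the first, so its matrix rank is 1
# even though the array still has 2 dimensions.
M = np.array([[1, 2, 3],
              [2, 4, 6]])
print(M.ndim, np.linalg.matrix_rank(M))  # 2 1
```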

In [26]:
A.size

6

In [27]:
B = np.array([
    [
        [12, 11, 10],
        [9, 8, 7],
    ],
    [
        [6, 5, 4],
        [3, 2, 1]
    ]
])

In [28]:
B

array([[[12, 11, 10],
        [ 9,  8,  7]],

       [[ 6,  5,  4],
        [ 3,  2,  1]]])

In [29]:
B.shape

(2, 2, 3)

In [30]:
B.ndim

3

In [31]:
B.size

12

If the shape isn't consistent, NumPy cannot build a regular multi-dimensional array. Older versions silently fell back to an array of regular Python objects; recent versions (1.24 and later) raise `ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.` unless you request `dtype=object` explicitly:

In [32]:
C = np.array([
    [
        [12, 11, 10],
        [9, 8, 7],
    ],
    [
        [6, 5, 4]
    ]
], dtype=object)

In [None]:
C.dtype

dtype('O')

In [None]:
C.shape

(2,)

In [None]:
C.size

2

In [None]:
type(C[0])

list

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Indexing and Slicing of Matrices

In [None]:
# Square matrix
A = np.array([
#    0  1  2
    [1, 2, 3], # 0
    [4, 5, 6], # 1
    [7, 8, 9]  # 2
])

In [None]:
A[1]

array([4, 5, 6])

In [None]:
A[1][0]

4

In [None]:
# A[d1, d2, d3, d4]

In [None]:
A[1, 0]

4

In [None]:
A[0:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [None]:
A[:, :2]

array([[1, 2],
       [4, 5],
       [7, 8]])

In [None]:
A[:2, :2]

array([[1, 2],
       [4, 5]])

In [None]:
A[:2, 2:]

array([[3],
       [6]])

In [None]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [None]:
A[1] = np.array([10, 10, 10])

In [None]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [ 7,  8,  9]])

In [None]:
A[2] = 99

In [None]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [99, 99, 99]])

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Summary statistics

In [None]:
a = np.array([1, 2, 3, 4])

In [None]:
a.sum()

10

In [None]:
a.mean()

2.5

In [None]:
a.var()

1.25

The `a.var()` method calculates the variance of the elements of the array `a`. Variance measures the spread of a set of values: it is the average of the squared differences from the mean.

In [None]:
a.std()

1.118033988749895

The standard deviation is the square root of the variance.
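
A minimal sketch of both definitions, computed by hand and compared with the built-in methods. The name `manual_var` is illustrative; the `ddof=1` line shows the sample variance, which divides by n - 1 instead of n.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# Population variance: mean of the squared deviations from the mean.
manual_var = ((a - a.mean()) ** 2).mean()
print(manual_var, a.var())  # 1.25 1.25

# Standard deviation is the square root of the variance.
manual_std = np.sqrt(manual_var)
print(manual_std, a.std())

# ddof=1 divides by n - 1 instead of n (the sample variance).
print(a.var(ddof=1))  # 1.6666666666666667
```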

In [None]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [None]:
A.sum()

45

In [None]:
A.mean()

5.0

In [None]:
A.std()

2.581988897471611

In [None]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
A.mean(axis=0)

array([4., 5., 6.])

`A.sum(axis=0)` sums along axis 0 (down the rows), which is column-wise for a 2D array.
It adds up the numbers in each column and returns a new array with one sum per column.
`A.sum(axis=1)` does the same along axis 1, giving one sum per row.
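
One way to check the meaning of `axis` is to compute the column sums by hand and compare. This is only a sketch; `manual_col_sums` is an illustrative name, and `keepdims=True` is shown to make the collapsed axis visible.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# axis=0 collapses the rows, leaving one value per column.
col_sums = A.sum(axis=0)
manual_col_sums = np.array([A[:, j].sum() for j in range(A.shape[1])])
print(col_sums, manual_col_sums)  # [12 15 18] [12 15 18]

# axis=1 collapses the columns, leaving one value per row.
row_sums = A.sum(axis=1)
print(row_sums)  # [ 6 15 24]

# keepdims=True keeps the reduced axis with length 1.
print(A.sum(axis=0, keepdims=True).shape)  # (1, 3)
```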

In [None]:
A.sum(axis=0)

array([12, 15, 18])

In [None]:
A.sum(axis=1)

array([ 6, 15, 24])

In [None]:
A.mean(axis=0)

array([4., 5., 6.])

In [None]:
A.mean(axis=1)

array([2., 5., 8.])

In [None]:
A.std(axis=0)

array([2.44948974, 2.44948974, 2.44948974])

In [None]:
A.std(axis=1)

array([0.81649658, 0.81649658, 0.81649658])

And [many more](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.ndarray.html#array-methods)...

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Broadcasting and Vectorized operations

In [None]:
a = np.arange(4)
a

array([0, 1, 2, 3])

`np.arange(4)` generates a one-dimensional array with values from 0 up to (but not including) 4.

In [None]:
a

array([0, 1, 2, 3])

In [None]:
a + 10

array([10, 11, 12, 13])

In [None]:
a * 10

array([ 0, 10, 20, 30])

In [None]:
a

array([0, 1, 2, 3])

In [None]:
a += 100

In [None]:
a

array([100, 101, 102, 103])

In [None]:
l = [0, 1, 2, 3]

In [None]:
[i * 10 for i in l]

[0, 10, 20, 30]

In [None]:
a = np.arange(4)

In [None]:
a

array([0, 1, 2, 3])

In [None]:
b = np.array([10, 10, 10, 10])

In [None]:
b

array([10, 10, 10, 10])

In [None]:
a + b

array([10, 11, 12, 13])

In [None]:
a * b

array([ 0, 10, 20, 30])
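
The cells above combine an array with a scalar or with another array of the same shape. As a small sketch of broadcasting between different shapes, the following adds a column of shape (3, 1) to a row of shape (3,); the names `row`, `col` and `grid` are illustrative.

```python
import numpy as np

row = np.arange(3)                 # shape (3,)
col = np.array([[0], [10], [20]])  # shape (3, 1)

# The (3, 1) column and the (3,) row are stretched to a common (3, 3) shape.
grid = col + row
print(grid)
# [[ 0  1  2]
#  [10 11 12]
#  [20 21 22]]

# Shapes that cannot be aligned raise a ValueError.
try:
    np.arange(3) + np.arange(4)
    failed = False
except ValueError as exc:
    failed = True
    print("broadcast failed:", exc)
```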

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Boolean arrays
_(Also called masks)_

In [None]:
a = np.arange(4)

In [None]:
a

array([0, 1, 2, 3])

In [None]:
a[0], a[-1]

(0, 3)

In [None]:
a[[0, -1]]

array([0, 3])

In [34]:
a[[True, False, True, True]]

array([0, 2, 3])

In [None]:
a

array([0, 1, 2, 3])

In [None]:
a >= 2

array([False, False,  True,  True])

In [None]:
a[a >= 2]

array([2, 3])

In [None]:
a.mean()

1.5

In [None]:
a[a > a.mean()]

array([2, 3])

In [None]:
a[~(a > a.mean())]

array([0, 1])

The tilde `~` is the bitwise NOT operator; on a boolean array it flips `True` to `False` and `False` to `True`.
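
A brief sketch of how `~` behaves on boolean and integer arrays, and why Python's `not` cannot be used in its place. The variable `mask` is just a name for the boolean array.

```python
import numpy as np

a = np.arange(4)
mask = a > a.mean()

print(mask)   # [False False  True  True]
print(~mask)  # [ True  True False False]

# On boolean arrays, ~ gives the same result as np.logical_not.
print(np.array_equal(~mask, np.logical_not(mask)))  # True

# On an integer array, ~ flips the bits instead (~x == -x - 1).
print(~a)     # [-1 -2 -3 -4]

# Python's `not` is not element-wise and raises on multi-element arrays.
try:
    not mask
    raised = False
except ValueError as exc:
    raised = True
    print("not mask ->", exc)
```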

In [None]:
a[(a == 0) | (a == 1)]

array([0, 1])

In [None]:
a[(a <= 2) & (a % 2 == 0)]

array([0, 2])

In [41]:
A = np.random.randint(100, size=(3, 3))

In [42]:
A

array([[94,  4, 66],
       [90, 40, 89],
       [16, 43,  8]])

In [43]:
A[np.array([
    [True, False, True],
    [False, True, False],
    [True, False, True]
])]

array([94, 66, 40, 16,  8])

In [37]:
A > 30

array([[ True, False,  True],
       [ True,  True,  True],
       [False,  True, False]])

In [None]:
A[A > 30]

array([94, 66, 90, 40, 89, 43])

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Linear Algebra

In [None]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [None]:
B = np.array([
    [6, 5],
    [4, 3],
    [2, 1]
])

In [None]:
A.dot(B)

array([[20, 14],
       [56, 41],
       [92, 68]])

Let's break down the output of the matrix product of `A` and `B`. Each row of `A` is multiplied element by element with each column of `B`, and the products are summed. `A` is 3x3 and `B` is 3x2, so the result is 3x2: the element at position (i, j) is the dot product of row i of `A` and column j of `B`.

Element at position (0, 0):

* Row 0 of `A`: [1, 2, 3]
* Column 0 of `B`: [6, 4, 2]
* Dot product: `1*6 + 2*4 + 3*2 = 20`

Element at position (0, 1):

* Row 0 of `A`: [1, 2, 3]
* Column 1 of `B`: [5, 3, 1]
* Dot product: `1*5 + 2*3 + 3*1 = 14`

The other elements are computed the same way.
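
The calculation above can be sketched with explicit loops and checked against `A @ B`. This is an illustration of the definition, not how NumPy computes the product internally; the names `rows`, `inner`, `cols` and `C` are illustrative.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
B = np.array([
    [6, 5],
    [4, 3],
    [2, 1]
])

rows, inner = A.shape
cols = B.shape[1]
C = np.zeros((rows, cols), dtype=A.dtype)

for i in range(rows):
    for j in range(cols):
        # Element (i, j) is the dot product of row i of A and column j of B.
        C[i, j] = sum(A[i, k] * B[k, j] for k in range(inner))

print(C)
# [[20 14]
#  [56 41]
#  [92 68]]
print(np.array_equal(C, A @ B))  # True
```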

In [None]:
A @ B

array([[20, 14],
       [56, 41],
       [92, 68]])

In [None]:
B.T

array([[6, 4, 2],
       [5, 3, 1]])

The expression `B.T` is the transpose of the array `B`.
The transpose of a matrix swaps its rows with its columns.
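
A small sketch of what the transpose does to shapes and indices, using the same `A` and `B` as above. It also checks the standard identity that the transpose of a product equals the product of the transposes in reverse order.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.array([[6, 5], [4, 3], [2, 1]])

# Transposing swaps the two axes, so a (3, 2) array becomes (2, 3).
print(B.shape, B.T.shape)  # (3, 2) (2, 3)

# Element (i, j) of B is element (j, i) of B.T.
print(B[2, 0], B.T[0, 2])  # 2 2

# B.T is a view on the same data, not a copy.
print(np.shares_memory(B, B.T))  # True

# (A @ B).T equals B.T @ A.T.
print(np.array_equal((A @ B).T, B.T @ A.T))  # True
```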

In [None]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [None]:
B.T @ A

array([[36, 48, 60],
       [24, 33, 42]])

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Size of objects in Memory

### Int, floats

In [46]:
# An integer in Python takes more than 24 bytes
import sys
sys.getsizeof(14)

28

In [45]:
# Longs are even larger
sys.getsizeof(10**100)

72

In [47]:
# NumPy size is much smaller (4 bytes here; 8 where the default integer is int64)
np.dtype(int).itemsize

4

In [48]:
# Numpy size is much smaller
np.dtype(np.int8).itemsize

1

In [49]:
np.dtype(float).itemsize

8

### Lists are even larger

In [None]:
# A one-element list
sys.getsizeof([1])

64

In [50]:
# An array of one element in numpy
np.array([1]).nbytes

4

### And performance is also important

In [55]:
l = list(range(100000))

In [74]:
# Use int64 explicitly: with 32-bit integers the sum of squares overflows
a = np.arange(100000, dtype=np.int64)

In [80]:
%time np.sum(a ** 2)

CPU times: total: 0 ns
Wall time: 1.01 ms


333328333350000

In [62]:
%time sum([x ** 2 for x in l])

CPU times: total: 15.6 ms
Wall time: 11.2 ms


333328333350000
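
Outside a notebook, where the `%time` magic is not available, the same comparison can be sketched with the standard-library `timeit` module. The repeat count of 20 is an arbitrary choice, timings vary by machine, and `dtype=np.int64` is set explicitly so the NumPy sum does not overflow on platforms whose default integer is 32 bits wide.

```python
import timeit

import numpy as np

n = 100_000
l = list(range(n))
a = np.arange(n, dtype=np.int64)  # explicit int64 avoids 32-bit overflow

# Both approaches should give the same sum of squares.
numpy_result = int(np.sum(a ** 2))
list_result = sum(x ** 2 for x in l)
print(numpy_result == list_result)  # True

# Average the wall time over several runs for a steadier measurement.
runs = 20
numpy_time = timeit.timeit(lambda: np.sum(a ** 2), number=runs) / runs
list_time = timeit.timeit(lambda: sum([x ** 2 for x in l]), number=runs) / runs
print(f"NumPy: {numpy_time * 1e3:.3f} ms per run")
print(f"List:  {list_time * 1e3:.3f} ms per run")
```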

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Useful Numpy functions

### `random` 

In [106]:
np.random.random(size=4)

array([0.09462765, 0.66100104, 0.30343741, 0.98904495])

In [112]:
np.random.normal(size=2)

array([1.10285995, 0.112718  ])

In [89]:
np.random.rand(2, 4)

array([[0.26444711, 0.30400314, 0.17934046, 0.2838062 ],
       [0.67537864, 0.02105044, 0.33843918, 0.25329039]])

---
### `arange`

In [None]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
np.arange(5, 10)

array([5, 6, 7, 8, 9])

In [None]:
np.arange(0, 1, .1)

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

---
### `reshape`

In [None]:
my_lst = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Convert list to a NumPy array and then reshape it to 3x3
my_array = np.array(my_lst).reshape(3, 3)

my_array

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [None]:
np.arange(10).reshape(2, 5)

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [113]:
np.arange(10).reshape(5, 2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

---
### `linspace`

In [114]:
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [115]:
np.linspace(0, 1, 20)

array([0.        , 0.05263158, 0.10526316, 0.15789474, 0.21052632,
       0.26315789, 0.31578947, 0.36842105, 0.42105263, 0.47368421,
       0.52631579, 0.57894737, 0.63157895, 0.68421053, 0.73684211,
       0.78947368, 0.84210526, 0.89473684, 0.94736842, 1.        ])

In [116]:
np.linspace(0, 1, 20, False)

array([0.  , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 ,
       0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])

---
### `zeros`, `ones`, `empty`

In [117]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [118]:
np.zeros((3, 3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [120]:
np.zeros((3, 3), dtype=np.int8)

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]], dtype=int8)

In [121]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [None]:
np.ones((3, 3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [130]:
np.empty(5)

array([1., 1., 1., 1., 1.])

In [131]:
np.empty((2, 2))

array([[0.25, 0.5 ],
       [0.75, 1.  ]])
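
The values shown for `np.empty` are leftovers from whatever was previously in that memory, so they will differ from run to run. A minimal sketch of the intended usage, allocating first and filling afterwards; `buf` is an illustrative name.

```python
import numpy as np

# np.empty allocates without initializing: the contents are arbitrary
# until they are overwritten.
buf = np.empty(5)
buf[:] = np.arange(5) * 0.5
print(buf)  # [0.  0.5 1.  1.5 2. ]

# When a defined starting value is needed, use np.zeros, np.ones or np.full.
filled = np.full(3, 7.0)
print(filled)  # [7. 7. 7.]
```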

---
### `identity` and `eye`

In [132]:
np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [133]:
np.eye(3, 3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [None]:
np.eye(8, 4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [None]:
np.eye(8, 4, k=1)

array([[0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [134]:
np.eye(8, 4, k=-3)

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.]])

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)