So far, you have learned about the Python `list` and how to use it for various tasks. However, a `list` is not always the best choice for every situation. There are other data types that offer different advantages and trade-offs.

We want to explore some of these alternatives and understand when and why they might be preferable to `list`. But before that, let's look at the different types of containers.

### Container Sequences

`list` is a **container sequence**. A **container sequence** holds references to the objects it contains, which may be of any type.

In [1]:
my_list = ["ali", 5, None, [1, 2, 3], max]

The `list` above contains objects of different types: a string, an integer, `None`, another `list`, and finally a function.

### Flat Sequences

A **flat sequence** stores the values of its contents in its own memory space, not as distinct Python objects. Thus, flat sequences are more compact, but they are limited to holding primitive machine values like bytes, integers, and floats.

A well-known data structure that you might have seen in a basic programming course is the `Array`.

An `Array` is a collection of items stored at contiguous memory locations. The idea is to store multiple items of the same type together. This makes it easier to calculate the position of each element by simply adding an offset to a base value, i.e., the memory location of the first element of the array (generally denoted by the name of the array).

The picture below demonstrates the difference between a Python `list` and an `array`.
![image.png](attachment:image.png)
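
The difference between the two kinds of sequence can be sketched with the standard library's `array` module, used here only as an illustration of a flat sequence. The variable names and sample values are assumptions made for this example.

```python
from array import array

# Container sequence: a list can hold references to objects of any type.
mixed = ["ali", 5, None]

# Flat sequence: array("i") stores C signed integers directly, all of one type.
numbers = array("i", [1, 2, 3, 4])
print(numbers.typecode, numbers.itemsize)  # type code and bytes per item

# A flat sequence rejects values that do not match its element type.
try:
    numbers.append("ali")
except TypeError as exc:
    print("TypeError:", exc)
```

The list accepts anything, while the typed array refuses the string and keeps its four integers unchanged.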

# NumPy

NumPy (Numerical Python) is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

To install, uncomment the cell below and run it:

In [2]:
# !pip install numpy

To use a package, you just need to **import** it. Once done, you can use it throughout the entire session.

For more convenience, we choose an alias like `np`.

In [3]:
import numpy as np

### The Benefits

Here are the top four benefits that NumPy can bring to your code:

1. **More speed:** NumPy's core routines are implemented in C, so array operations typically run far faster than equivalent Python loops.
2. **Fewer loops:** NumPy helps you to reduce loops and keep from getting tangled up in iteration indices.
3. **Clearer code:** Without loops, your code will look more like the equations you’re trying to calculate.
4. **Better quality:** There are thousands of contributors working to keep NumPy fast, friendly, and bug free.

## Array Objects

NumPy provides an N-dimensional array type, the `ndarray`, which describes a collection of “items” of the same type. The items can be indexed using for example N integers.

An `ndarray` is a (usually fixed-size) multidimensional container of items of the same type and size. The number of dimensions and items in an array is defined by its `shape`, which is a `tuple` of N non-negative integers that specify the sizes of each dimension. The type of items in the array is specified by a separate data-type object (`dtype`), one of which is associated with each ndarray.

As with other container objects in Python, the contents of an ndarray can be accessed and modified by indexing or slicing the array (using, for example, N integers), and via the methods and attributes of the ndarray.

### Creating Arrays

In [4]:
my_list = [[1, 2, 3], [4, 5, 6]]
x = np.array(my_list)

print(x)
print()
print("the type of object is:", type(x))
print("the shape of x is:", x.shape)
print("the data type of elements of x is:", x.dtype)

[[1 2 3]
 [4 5 6]]

the type of object is: <class 'numpy.ndarray'>
the shape of x is: (2, 3)
the data type of elements of x is: int64


In [5]:
my_list = ["ali", 5]
x = np.array(my_list)

print(x)
print()
print("the type of object is:", type(x))
print("the shape of x is:", x.shape)
print("the data type of elements of x is:", x.dtype)

# <U21 means Unicode strings of at most 21 characters
# the 5 is no longer an integer; it is now the string '5'

['ali' '5']

the type of object is: <class 'numpy.ndarray'>
the shape of x is: (2,)
the data type of elements of x is: <U21
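
Because every element of an array shares one `dtype`, NumPy converts mixed inputs to a common type. The sketch below shows the same effect with numbers and how the type can be chosen explicitly; the sample values and variable names are made up for illustration.

```python
import numpy as np

# Mixing integers and floats upcasts every element to a float type.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64

# Passing dtype explicitly fixes the element type at creation time.
as_float = np.array([1, 2, 3], dtype=np.float64)
print(as_float, as_float.dtype)

# astype returns a converted copy; the original array is unchanged.
as_int = mixed.astype(np.int64)  # fractional parts are truncated
print(as_int)  # [1 2 3]
```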


### Creating random arrays

The `random` module in NumPy provides various functions for creating arrays of random values. The random module broadly consists of functions that:
- Create random arrays
- Create random permutations of arrays
- Generate arrays with specific probability distributions

In [6]:
# creating a matrix of shape (2, 3) with random numbers
# the samples are drawn from a uniform distribution
x = np.random.rand(2, 3)
print(x)

[[0.65586565 0.64386187 0.09724725]
 [0.55274171 0.86320698 0.28934086]]


In [7]:
# creating a matrix of shape (2, 3) with random numbers
# the samples are drawn from a standard normal distribution
x = np.random.randn(2, 3)
print(x)

[[ 1.05811845  0.42699568  0.46165164]
 [-0.026601   -0.57806791  0.09755761]]


In [8]:
# creating a matrix of shape (2, 3) with random integer numbers
# the samples are drawn from a uniform distribution
# random integers are from `low` (inclusive) to `high` (exclusive)

x = np.random.randint(low=1, high=20, size=(2, 3)) 
print(x)

[[13 15 16]
 [ 3 12  8]]
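
The same three kinds of samples can also be drawn from a seeded `Generator` object, which makes the results reproducible. This is a minimal sketch; the seed value `42` and the variable names are arbitrary choices for this example.

```python
import numpy as np

# default_rng returns a Generator; the seed 42 is an arbitrary choice.
rng = np.random.default_rng(42)

uniform = rng.random((2, 3))                 # floats in [0.0, 1.0)
normal = rng.standard_normal((2, 3))         # standard normal samples
integers = rng.integers(1, 20, size=(2, 3))  # 1 (inclusive) to 20 (exclusive)

# Re-creating the generator with the same seed reproduces the same draws.
rng_again = np.random.default_rng(42)
print(np.array_equal(uniform, rng_again.random((2, 3))))  # True
```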


### Some other useful arrays
There are a few other array creation functions, such as `zeros()`, `ones()`, `eye()`, and others that can be used to create NumPy arrays. Their use is fairly straightforward.

In [9]:
# creating a matrix of shape (3, 4) with zeros
x = np.zeros((3, 4))
print(x)

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]


In [10]:
# creating a matrix of shape (3, 4) with ones
x = np.ones((3, 4))
print(x)

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]


In [11]:
# creating an Identity matrix of size 5
x = np.eye(5)
print(x)

[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]
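
A few more creation functions follow the same pattern. The sketch below uses `np.full`, `np.arange`, and `np.linspace` with values picked only for illustration.

```python
import numpy as np

filled = np.full((2, 3), 7)        # 2x3 array where every element is 7
steps = np.arange(0, 10, 2)        # start, stop (exclusive), step
points = np.linspace(0.0, 1.0, 5)  # 5 evenly spaced values, both ends included

print(filled)
print(steps)   # [0 2 4 6 8]
print(points)  # the values 0, 0.25, 0.5, 0.75 and 1
```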


### Vectorization

**Vectorization** is the process of performing the same operation in the same way for each element in an array. This removes `for` loops from your code but achieves the same result.

In [12]:
x = np.arange(10)
print(x)

print(x + x)

[0 1 2 3 4 5 6 7 8 9]
[ 0  2  4  6  8 10 12 14 16 18]


In [13]:
# This is a very important feature that you should keep in mind.
print(x > 5)

[False False False False False False  True  True  True  True]
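
To make the idea concrete, here is the same calculation written once with a loop and once vectorized. The Celsius values are made-up sample data and the variable names are assumptions for this sketch.

```python
import numpy as np

celsius = np.array([-10.0, 0.0, 25.0, 100.0])  # made-up sample values

# Loop version: one Python-level operation per element.
fahrenheit_loop = np.empty_like(celsius)
for i in range(len(celsius)):
    fahrenheit_loop[i] = celsius[i] * 9 / 5 + 32

# Vectorized version: the same formula applied to the whole array at once.
fahrenheit = celsius * 9 / 5 + 32

print(fahrenheit)
print(np.array_equal(fahrenheit, fahrenheit_loop))  # True
```

The vectorized line reads like the formula itself, which is the "clearer code" benefit mentioned earlier.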


### Broadcasting
The term **Broadcasting** describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is **broadcast** across the larger array so that they have compatible shapes.

**Broadcasting** provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations. 

NumPy operations are usually done on pairs of arrays on an element-by-element basis. In the simplest case, the two arrays must have exactly the same shape, as in the following example:

In [14]:
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 2.0])
print(a * b)

[2. 4. 6.]


NumPy’s broadcasting rule relaxes this constraint when the arrays’ shapes meet certain constraints. The simplest **Broadcasting** example occurs when an array and a scalar value are combined in an operation:

In [15]:
a = np.array([1.0, 2.0, 3.0])
b = 2.0
print(a * b)

[2. 4. 6.]


![broadcasting_1.png](attachment:broadcasting_1.png)

![broadcasting_2.png](attachment:broadcasting_2.png)

A one-dimensional array added to a two-dimensional array results in broadcasting if the number of elements in the 1-D array matches the number of columns in the 2-D array.

When the trailing dimensions of the arrays are unequal (and neither of them is 1), broadcasting fails because it is impossible to align the values in the rows of the first array with the elements of the second array for element-by-element addition.


![broadcasting_3.png](attachment:broadcasting_3.png)
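
The two situations described above can be reproduced in a few lines. This is a small sketch with made-up arrays; `table`, `row`, and `col` are names chosen for this example.

```python
import numpy as np

table = np.arange(12).reshape(4, 3)  # shape (4, 3)
row = np.array([10, 20, 30])         # shape (3,): matches the 3 columns
print(table + row)                   # row is added to every row of table

col = np.array([1, 2, 3, 4])         # shape (4,): does not match 3 columns
try:
    table + col
except ValueError as exc:
    print("ValueError:", exc)

# Making col a (4, 1) column lets it broadcast across the columns instead.
print(table + col[:, np.newaxis])
```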

## What is the point?

You might be wondering how these features are important.

The fact is that not only do they make operations easier, but also much faster and more efficient.

Let's see some examples:

In [16]:
# Trying to multiply each element by 2 in lists
my_list = [1, 2, 3, 4, 5, 6]
print(my_list * 2)

[1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]


In [17]:
# above is not the behavior we wanted :(
# we need to use for loops
my_list = [1, 2, 3, 4, 5, 6]

for i in range(len(my_list)):
    my_list[i] *= 2 
    
print(my_list)

[2, 4, 6, 8, 10, 12]


In [18]:
# let's do the same operation using NumPy
my_list = [1, 2, 3, 4, 5, 6]
my_arr = np.array(my_list)
my_arr = my_arr * 2

print(my_arr)

[ 2  4  6  8 10 12]


With the `%%timeit` magic command, we can measure how long a cell takes to run. Names assigned inside the timed cell are not kept afterwards, but in-place changes to existing objects still apply.

Let's test the performance on larger arrays:


In [19]:
# generating a list containing 10000 random integers
my_list = [np.random.randint(1, 200) for _ in range(10000)]
my_arr = np.array(my_list)
print(my_list[:5])

[50, 130, 39, 181, 34]


In [20]:
%%timeit
for i in range(len(my_list)):
    my_list[i] *= 2 

2.98 ms ± 1.09 ms per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [21]:
%%timeit
my_arr * 2

4.62 µs ± 126 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In this run, NumPy performs the task more than 600 times faster (2.98 ms versus 4.62 µs per loop)!
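
Outside a notebook, a similar comparison can be made with the standard `timeit` module. The sketch below is one possible setup, not the measurement shown above: it uses its own illustrative data and builds a new result on every run instead of doubling the list in place, so the inputs stay the same between repetitions. Timings depend on the machine, so no numbers are shown.

```python
import timeit
import numpy as np

values = list(range(10_000))  # illustrative data, not the list used above
arr = np.array(values)

# Each run builds a new result, so the inputs never change between runs.
list_time = timeit.timeit(lambda: [v * 2 for v in values], number=100)
arr_time = timeit.timeit(lambda: arr * 2, number=100)

print(f"list comprehension: {list_time / 100 * 1e6:.1f} µs per run")
print(f"NumPy array:        {arr_time / 100 * 1e6:.1f} µs per run")
```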

### Reshaping Arrays

Another important concept is reshaping NumPy arrays, especially when you are dealing with multidimensional arrays. It is common to create an array in one dimension and reshape it to several dimensions later, or vice versa. The key idea is that you can change the shape of an array, but not the number of elements; for example, you can't reshape a `3x3` array into a `10x1` array. The total number of elements (the data buffer in the ndarray's internal organization) must be the same before and after reshaping. If you need a different number of elements you have to resize instead, but that's another story. Now, let's look at some shape manipulations:

In [22]:
x = np.arange(24)
print(x)
print(x.shape)
print()

x = x.reshape(6, 4)
print(x)
print(x.shape)
print()

x = x.reshape(3, -1)
print(x)
print(x.shape)
print()

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
(24,)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]
(6, 4)

[[ 0  1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14 15]
 [16 17 18 19 20 21 22 23]]
(3, 8)
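
The rule that the number of elements must stay the same can be checked directly. This is a short sketch; the target shapes are chosen only for illustration.

```python
import numpy as np

x = np.arange(24)

# 24 elements cannot fill a 5x5 grid, so reshape raises ValueError.
try:
    x.reshape(5, 5)
except ValueError as exc:
    print("ValueError:", exc)

grid = x.reshape(2, 3, 4)  # three dimensions: 2 * 3 * 4 == 24
print(grid.shape)          # (2, 3, 4)

flat = grid.reshape(-1)    # -1 alone flattens back to one dimension
print(flat.shape)          # (24,)
```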



### Indexing

You can access the values of an array much as you would with a `list`.

In [23]:
x = np.arange(16)
print(x)
print(x[3])

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]
3


In [24]:
x = x.reshape(-1, 4)
print(x)
print(x[3, 1])

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
13


In [25]:
print(x[1:3, 0:4:2])

[[ 4  6]
 [ 8 10]]
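
A few more indexing patterns on the same kind of 4x4 array, as a quick sketch: whole rows, whole columns, negative indices, and a step over rows.

```python
import numpy as np

x = np.arange(16).reshape(4, 4)

print(x[1])       # second row: [4 5 6 7]
print(x[:, 1])    # second column: [ 1  5  9 13]
print(x[-1, -1])  # last row, last column: 15
print(x[::2])     # every other row (rows 0 and 2)
```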


**Important Note**

Sometimes you create a new NumPy array by selecting some rows and columns of an existing one. In that case NumPy may return a `view` of the original data instead of a `copy`, so the two arrays share memory.

You don't need to know the details, but you probably should :)

For more information, check the [numpy documentation about Copies and views](https://numpy.org/doc/stable/user/basics.copies.html)

Let's see what might happen:

In [26]:
x = np.arange(24).reshape(4, -1)
print(x)

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]


In [27]:
y = x[0:3, 1:4]
print(y)

[[ 1  2  3]
 [ 7  8  9]
 [13 14 15]]


In [28]:
# Let's check whether they share memory.
print(np.may_share_memory(x, y))

True


In [29]:
# let's change values in x
x[0:3] = 10
print(x)
print()

# now let's see what happened to y
print(y)

[[10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [18 19 20 21 22 23]]

[[10 10 10]
 [10 10 10]
 [10 10 10]]


To check whether two arrays might be using the same memory, and so prevent such problems, you can use `np.may_share_memory(x, y)`.

If it returns `True`, tell NumPy explicitly that you want a copy.
See the example:

In [30]:
x = np.arange(24).reshape(4, -1)
y = x[0:3, 1:4].copy()

# Let's check whether they share memory.
print(np.may_share_memory(x, y))

False


In [31]:
# let's change values in x
x[0:3] = 10
print(x)
print()

# now let's see what happened to y
print(y)

[[10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [18 19 20 21 22 23]]

[[ 1  2  3]
 [ 7  8  9]
 [13 14 15]]
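
Whether you get a view or a copy depends on how you index. The sketch below contrasts basic slicing with indexing by a list of rows, and uses `np.shares_memory`, which checks for an actual overlap; the variable names are assumptions for this example.

```python
import numpy as np

x = np.arange(24).reshape(4, -1)

view = x[0:3, 1:4]  # basic slicing returns a view
fancy = x[[0, 2]]   # indexing with a list of rows returns a copy

print(np.shares_memory(x, view))   # True
print(np.shares_memory(x, fancy))  # False

view[0, 0] = -1     # writes through to x
print(x[0, 1])      # -1
```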


### Filtering

If you pass a numpy array containing Boolean values (with the same shape as the array) as `obj` in `arr[obj]`, the returned value will be elements that correspond to **True** values.

Let's see an example:

In [32]:
x = np.arange(24).reshape(4, -1)
print(x)
print(x.shape)

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
(4, 6)


In [33]:
mask = x > 8
print(mask)
print(mask.shape)

[[False False False False False False]
 [False False False  True  True  True]
 [ True  True  True  True  True  True]
 [ True  True  True  True  True  True]]
(4, 6)


In [34]:
print(x[mask])

[ 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]


Another example:

In [35]:
x = np.arange(24).reshape(4, -1)
# selecting even values only
mask = ((x % 2) == 0) 
print(mask)

[[ True False  True False  True False]
 [ True False  True False  True False]
 [ True False  True False  True False]
 [ True False  True False  True False]]


In [36]:
print(x[mask])

[ 0  2  4  6  8 10 12 14 16 18 20 22]
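
Masks can also be combined and used for assignment. This is a small sketch on the same 4x6 array; the threshold values are picked only for illustration.

```python
import numpy as np

x = np.arange(24).reshape(4, -1)

# Combine conditions with & (and) and | (or); the parentheses are required.
mask = (x > 8) & (x % 2 == 0)
print(x[mask])      # [10 12 14 16 18 20 22]

# A mask can also be used on the left-hand side to assign values.
clipped = x.copy()
clipped[clipped > 20] = 20
print(clipped[-1])  # [18 19 20 20 20 20]
```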


### Aggregation

Aggregating functions such as `max`, `min`, and `mean` could hardly be easier to use with NumPy :)

You just need to understand how axes work in a NumPy array. See the picture below:
![image.png](attachment:image.png)

In [37]:
# creating a random array
x = np.random.randint(0, 100, (10, 5))
print(x)

[[96 78 38 15 12]
 [31 15 96 18 57]
 [47 60 40 71 69]
 [34 11 43 39 36]
 [36 87 30 56  3]
 [17 36 54  3 29]
 [42 49 41 32 74]
 [35 19 22 46 71]
 [62 22 23 60 60]
 [49 29 20 50 16]]


In [38]:
print(x.sum())

2079


In [39]:
print(x.sum(axis=0))

[449 406 407 390 427]


In [40]:
print(x.sum(axis=1))

[239 217 287 163 212 139 238 193 227 164]


The same logic works for other aggregate functions as well:

In [41]:
print(x.mean(axis=1))

[47.8 43.4 57.4 32.6 42.4 27.8 47.6 38.6 45.4 32.8]


In [42]:
# std ~ standard deviation
print(x.std(axis=0))

[20.49609719 25.49195952 21.1047388  20.70265683 25.37735211]
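
The `axis` argument works the same way for `max` and `argmax`, and `keepdims=True` makes the result easy to combine with the original array. The sketch below uses a small made-up array rather than the random one above.

```python
import numpy as np

x = np.array([[96, 78, 38],
              [31, 15, 96],
              [47, 60, 40]])  # small made-up sample

print(x.max(axis=0))     # largest value in each column: [96 78 96]
print(x.argmax(axis=1))  # column index of each row's maximum: [0 2 1]

# keepdims=True keeps the reduced axis with length 1, so the result
# broadcasts back against x (here: subtract each row's mean).
row_means = x.mean(axis=1, keepdims=True)
print(row_means.shape)   # (3, 1)
centered = x - row_means
print(centered.mean(axis=1))  # approximately zero in every row
```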


# References

https://numpy.org/doc

https://www.geeksforgeeks.org

https://github.com/pytopia
