
---

# 2. NumPy

It is very important that you search the web and consult the documentation as you go. If you haven't installed NumPy yet, now is a good time. To use a library, we must first import it.

Once NumPy is imported as __np__, our work environment knows that anything written as __np.something__ must be looked up in NumPy. Important: if the library was NOT installed correctly, the import will fail with an error message.

The main data type NumPy works with is the array. Arrays are similar to lists and can in fact be created from them.

In [1]:
import numpy as np

In [3]:
my_list = [0,1,2,3,4,5]
array = np.array(my_list)

print(my_list)
print(array)

[0, 1, 2, 3, 4, 5]
[0 1 2 3 4 5]


But arrays are more than lists. Some things that couldn't be done with lists can now be done with arrays. Remember that adding a number to every element of a list was not allowed.

In [4]:
my_list + 1


TypeError: can only concatenate list (not "int") to list

But with arrays it works!

In [5]:
array + 1

array([1, 2, 3, 4, 5, 6])

In [6]:
print(array - 5)
print(array - 2)
print(array * 4)
print(array ** 2)

[-5 -4 -3 -2 -1  0]
[-2 -1  0  1  2  3]
[ 0  4  8 12 16 20]
[ 0  1  4  9 16 25]


**TODO:** Of these operations, only multiplication is allowed on lists. What does multiplication do to a list?
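
One way to explore this question is to apply the same multiplication to a list and to an array and compare the results. The short list below is an illustrative assumption, not data from the exercise.

```python
import numpy as np

my_list = [0, 1, 2]
array = np.array(my_list)

# Multiplying a list by an integer repeats (concatenates) the list.
repeated = my_list * 3
print(repeated)  # [0, 1, 2, 0, 1, 2, 0, 1, 2]

# Multiplying an array by an integer multiplies every element.
scaled = array * 3
print(scaled)    # [0 3 6]
```

The list grows in length while its values stay the same; the array keeps its length and its values change.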

## Array Creation

While we can create arrays from lists, NumPy provides many functions for creating them directly. Let's see some.

A widely used one is __np.arange()__. Check its documentation. Alternatively, in a code cell, type __np.arange__ and press __shift + tab__ to bring up the help.

Play around with the example below.

In [7]:
array = np.arange(3, 20, 2)
print(array)

[ 3  5  7  9 11 13 15 17 19]


**TODO:**
Investigate and create examples with the following functions

* np.linspace
* np.arange
* np.zeros and np.ones

In [8]:
import numpy as np

# 1. np.linspace
# Useful for plotting when you need exact points between a start and end.
linspace_ex = np.linspace(0, 10, 5)
print(f"Linspace (0 to 10, 5 points): {linspace_ex}")

# 2. np.arange
# Works like Python's range(): start, stop (exclusive), step.
arange_ex = np.arange(0, 10, 2)
print(f"Arange (0 to 10, step 2): {arange_ex}")

# 3. np.zeros
# Pass a number (vector) or a tuple (matrix/2D array).
zeros_ex = np.zeros((3, 3)) # 3x3 Matrix
print(f"\nZeros (3x3):\n{zeros_ex}")

# 4. np.ones
# Similar to zeros, but filled with ones.
ones_ex = np.ones(5)
print(f"\nOnes (vector of 5): {ones_ex}")

Linspace (0 to 10, 5 points): [ 0.   2.5  5.   7.5 10. ]
Arange (0 to 10, step 2): [0 2 4 6 8]

Zeros (3x3):
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

Ones (vector of 5): [1. 1. 1. 1. 1.]
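
The examples above suggest one practical difference worth checking: __np.linspace__ includes the stop value by default, while __np.arange__ excludes it. The sketch below uses the `endpoint` and `retstep` parameters of `np.linspace` to make that visible; the variable names are illustrative.

```python
import numpy as np

# linspace takes a number of points and includes the stop value by default.
points, step = np.linspace(0, 10, 5, retstep=True)
print(points)  # [ 0.   2.5  5.   7.5 10. ]
print(step)    # 2.5

# With endpoint=False the stop value is left out, as in arange.
no_endpoint = np.linspace(0, 10, 5, endpoint=False)
print(no_endpoint)  # [0. 2. 4. 6. 8.]

# arange takes a step size and always excludes the stop value.
stepped = np.arange(0, 10, 2)
print(stepped)      # [0 2 4 6 8]
```

As a rule of thumb, `np.linspace` is convenient when the number of points is known and `np.arange` when the step is known.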


## Array Shape
Arrays have many more properties than lists. In particular, they may have more than one axis or dimension. Let's see what we mean:

In [None]:
array_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(array_2d)

[[1 2 3 4]
 [5 6 7 8]]


In [None]:
array_2d.shape

(2, 4)

In [None]:
array_2d = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
print(array_2d)
print(array_2d.shape)

[[1 2]
 [3 4]
 [5 6]
 [7 8]]
(4, 2)


To find out how many elements an array has, we can use __.size__:

In [None]:
print(array_2d.size)

8
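
As a small sketch of how these properties relate, the number of elements is the product of the entries in the shape, and `.ndim` gives the number of axes. The example reuses the 4x2 array from above.

```python
import numpy as np

array_2d = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

rows, cols = array_2d.shape
print(array_2d.ndim)  # 2 (number of axes)
print(rows * cols)    # 8 (same value as array_2d.size)
print(len(array_2d))  # 4 (len() only counts along the first axis)
```

Note that `len()` reports the number of rows, not the total number of elements.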


## Array operations

NumPy arrays come with many methods that operate on their contents.

In [None]:
array = np.array([-100,2,3,17,25,1,95])

print(array.min())
print(array.max())

-100
95


In the 2D case, we can ask these functions to operate on the entire array or along a given axis.

**TODO:** Try to understand the difference between the following instructions

In [10]:
import numpy as np

array_2d = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

print("Original Array:")
print(array_2d)
print("-" * 20)

# 1. Max of the entire array (global maximum)
print(f"Max total: {array_2d.max()}")

# 2. Max along axis 0 (Columns/Vertical)
# (1 vs 4 vs 7), (2 vs 5 vs 8), (3 vs 6 vs 9)
print(f"Max axis=0 (Columns): {array_2d.max(axis=0)}")

# 3. Max along axis 1 (Rows/Horizontal)
# (1,2,3 -> 3), (4,5,6 -> 6), (7,8,9 -> 9)
print(f"Max axis=1 (Rows): {array_2d.max(axis=1)}")

print(array_2d.max())
print(array_2d.max(axis=0))
print(array_2d.max(axis=1))

Original Array:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
--------------------
Max total: 9
Max axis=0 (Columns): [7 8 9]
Max axis=1 (Rows): [3 6 9]
9
[7 8 9]
[3 6 9]


## Slicing

We haven't said anything about indexing or slicing yet. This is because, for 1D arrays, it works much as it does for lists. For 2D arrays, it's slightly more complicated.

In [11]:
array_2d = np.arange(9).reshape(3,3)

print(array_2d)
print(array_2d[1,:])

[[0 1 2]
 [3 4 5]
 [6 7 8]]
[3 4 5]
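
Beyond selecting a whole row, the same `[row, column]` notation can pick a single element, a column, or a rectangular block. A minimal sketch on the same 3x3 array (the variable names are illustrative):

```python
import numpy as np

array_2d = np.arange(9).reshape(3, 3)

# A single element: row 1, column 2.
element = array_2d[1, 2]
print(element)  # 5

# A whole column: every row, column 0.
column = array_2d[:, 0]
print(column)   # [0 3 6]

# A sub-block: rows 0-1, columns 1-2.
block = array_2d[:2, 1:]
print(block)
# [[1 2]
#  [4 5]]
```

The index before the comma always refers to rows (axis 0) and the one after it to columns (axis 1).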


**TODO:** Write an array with 100 equally spaced numbers from 0 to 9

In [12]:
# To complete

array_100 = np.linspace(0, 9, 100)
print(array_100)

[0.         0.09090909 0.18181818 0.27272727 0.36363636 0.45454545
 0.54545455 0.63636364 0.72727273 0.81818182 0.90909091 1.
 1.09090909 1.18181818 1.27272727 1.36363636 1.45454545 1.54545455
 1.63636364 1.72727273 1.81818182 1.90909091 2.         2.09090909
 2.18181818 2.27272727 2.36363636 2.45454545 2.54545455 2.63636364
 2.72727273 2.81818182 2.90909091 3.         3.09090909 3.18181818
 3.27272727 3.36363636 3.45454545 3.54545455 3.63636364 3.72727273
 3.81818182 3.90909091 4.         4.09090909 4.18181818 4.27272727
 4.36363636 4.45454545 4.54545455 4.63636364 4.72727273 4.81818182
 4.90909091 5.         5.09090909 5.18181818 5.27272727 5.36363636
 5.45454545 5.54545455 5.63636364 5.72727273 5.81818182 5.90909091
 6.         6.09090909 6.18181818 6.27272727 6.36363636 6.45454545
 6.54545455 6.63636364 6.72727273 6.81818182 6.90909091 7.
 7.09090909 7.18181818 7.27272727 7.36363636 7.45454545 7.54545455
 7.63636364 7.72727273 7.81818182 7.90909091 8.         8.09090909
 8.18181818

**TODO:** Create a 1D array of 20 zeros. Replace the first 15 elements with ones.

In [13]:
# To complete

zeros_20 = np.zeros(20)
zeros_20[:15] = 1
print(zeros_20)

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0.]


**TODO:** Create a 1D array of 50 zeros. Replace the first 25 elements with the natural numbers from 0 to 24.

In [14]:
# To complete

zeros_50 = np.zeros(50)
zeros_50[:25] = np.arange(25)
print(zeros_50)

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17.
 18. 19. 20. 21. 22. 23. 24.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]


**TODO:** Create a 2D array of 3 rows and 3 columns, filled with zeros. Replace the elements of the second column by the numbers 1, 2 and 3 respectively.

In [15]:
# To complete

matrix_3x3 = np.zeros((3, 3))
matrix_3x3[:, 1] = [1, 2, 3]
print(matrix_3x3)

[[0. 1. 0.]
 [0. 2. 0.]
 [0. 3. 0.]]


**TODO:** Create a 2D array of 3 rows and 3 columns, filled with zeros. Replace the elements of the diagonal by ones.

In [23]:
import numpy as np

matrix_diag = np.zeros((3, 3))

np.fill_diagonal(matrix_diag, 1)

print(matrix_diag)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


In [17]:
np.identity(3)


array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [18]:
# To complete
import numpy as np

arr = np.zeros((3,3))
diag = np.diag([1]*3)
w = np.where(diag == 1)
arr[w] = 1

arr

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [19]:
w = np.where(diag == 1)
arr = np.zeros((3,3))
arr[w] = 1
arr

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [20]:
np.where(diag == 1)

(array([0, 1, 2]), array([0, 1, 2]))

In [21]:
np.sort
np.argsort


<function argsort at 0x7ce941361330>

**TODO:** Create a 2D array of 100 rows and 100 columns, filled with zeros. Replace the elements of the diagonal by ones.

In [22]:
# To complete

matrix_huge = np.eye(100)
print(matrix_huge)

[[1. 0. 0. ... 0. 0. 0.]
 [0. 1. 0. ... 0. 0. 0.]
 [0. 0. 1. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 1. 0. 0.]
 [0. 0. 0. ... 0. 1. 0.]
 [0. 0. 0. ... 0. 0. 1.]]


## Statistics

Descriptive statistics helps us begin to analyze and understand a data set. For numerical data, it does so by computing summary values that, in some way, stand in for the data. For example, it is very difficult to read and make sense of the ages of 1000 people. But a small set of statistics (minimum, maximum, mean, standard deviation, etc.) describes that set in a much more understandable way.

Given a set of numbers, the average is usually considered the most representative number of that set. This is not always the case.
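
To illustrate why the mean is not always representative, the sketch below compares it with the median on a small made-up list of ages containing one outlier. The data is hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical ages: five similar values and one outlier.
ages = np.array([21, 22, 23, 24, 25, 95])

mean_age = np.mean(ages)
median_age = np.median(ages)

print(mean_age)    # 35.0
print(median_age)  # 23.5
```

A single extreme value pulls the mean above every typical age in the list, while the median stays close to the bulk of the data.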

**TODO:**
Given the following list of numbers, write a routine that computes their mean, [variance](https://en.wikipedia.org/wiki/Variance), and [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation).

In [24]:
my_list = [11, 12, 63, 31, 82, 56, 32, 686, 1, 2, 39, 9, 99, 2, 1]
n = len(my_list)

In [25]:
# To complete

# 1. Mean (Average)
mean = sum(my_list) / n

# 2. Variance (Average of squared differences from the mean)
variance = sum((x - mean) ** 2 for x in my_list) / n

# 3. Standard Deviation (Square root of variance)
std_dev = variance ** 0.5

print(f"Mean: {mean}")
print(f"Variance: {variance}")
print(f"Std Dev: {std_dev}")

Mean: 75.06666666666666
Variance: 27570.862222222226
Std Dev: 166.04475969515636


**TODO:** Now with NumPy

In [26]:
# To complete

import numpy as np

# Convert list to array
my_array = np.array(my_list)

print(f"NumPy Mean: {np.mean(my_array)}")
print(f"NumPy Variance: {np.var(my_array)}")
print(f"NumPy Std Dev: {np.std(my_array)}")

NumPy Mean: 75.06666666666666
NumPy Variance: 27570.862222222226
NumPy Std Dev: 166.04475969515636
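
The results match the hand-written routine because `np.var` and `np.std` divide by `n` by default (`ddof=0`), the population formula. If the data is treated as a sample, the `ddof=1` argument divides by `n - 1` instead. A minimal sketch of the difference:

```python
import numpy as np

my_list = [11, 12, 63, 31, 82, 56, 32, 686, 1, 2, 39, 9, 99, 2, 1]
my_array = np.array(my_list)
n = my_array.size

# Population variance: divides by n (ddof=0 is the default).
population_var = np.var(my_array)

# Sample variance: divides by n - 1.
sample_var = np.var(my_array, ddof=1)

print(f"Population variance: {population_var}")
print(f"Sample variance:     {sample_var}")
```

The two values differ by a factor of `n / (n - 1)`, which matters most for small samples.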


## Random

One extremely useful thing we can do with NumPy is generate random samples. These functions live in NumPy's __np.random__ module.

In [None]:
dice_samples = np.random.randint(1, 7, size=15)
print(dice_samples)

dice_samples = np.random.choice([1, 2, 3, 4, 5, 6], size=15)
print(dice_samples)

[3 1 1 6 1 4 5 2 3 3 4 1 6 6 2]
[2 3 2 2 3 5 5 3 4 1 2 6 2 6 4]
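
Because the samples are random, the output above changes on every run. When reproducible results are needed, one option is NumPy's `np.random.default_rng` generator with a fixed seed. This is an alternative to the calls used above, and the seed value 42 is an arbitrary choice.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

# integers(low, high) excludes high, like np.random.randint.
samples_a = rng_a.integers(1, 7, size=15)
samples_b = rng_b.integers(1, 7, size=15)

print(samples_a)
print(np.array_equal(samples_a, samples_b))  # True
```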


**TODO:**
What will be the mean of the values obtained by throwing a die many times? We are going to try to answer this question by simulating a die. To do so:

* Draw random samples of die throws with different sample sizes (e.g. 10, 100, 1000, ...).
* Calculate the mean and standard deviation of each sample.
* From what sample size does the mean "stabilize"?

In [27]:
# To complete

sample_sizes = [10, 100, 1000, 10000, 100000]

print("Simulating fair dice throws:")
for size in sample_sizes:
    throws = np.random.randint(1, 7, size=size)

    current_mean = np.mean(throws)
    current_std = np.std(throws)

    print(f"Throws: {size:7d} | Mean: {current_mean:.4f} | Std: {current_std:.4f}")

Simulating fair dice throws:
Throws:      10 | Mean: 3.2000 | Std: 1.1662
Throws:     100 | Mean: 3.2200 | Std: 1.7411
Throws:    1000 | Mean: 3.4410 | Std: 1.7013
Throws:   10000 | Mean: 3.5074 | Std: 1.7087
Throws:  100000 | Mean: 3.4991 | Std: 1.7057
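
The simulated values can be compared against the theoretical ones for a fair die, where each face has probability 1/6. The sketch below computes the expected mean and standard deviation directly from that definition; the variable names are illustrative.

```python
import numpy as np

faces = np.arange(1, 7)
probabilities = np.full(6, 1 / 6)

# Expected value: sum of each face times its probability.
expected_mean = np.sum(faces * probabilities)

# Variance: probability-weighted squared distance from the mean.
expected_var = np.sum(probabilities * (faces - expected_mean) ** 2)
expected_std = np.sqrt(expected_var)

print(f"Expected mean: {expected_mean:.4f}")  # 3.5000
print(f"Expected std:  {expected_std:.4f}")   # 1.7078
```

In the run above, the sample mean and standard deviation move toward these values as the number of throws grows.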


**TODO:**

Simulate a loaded die that favors a value of your choice, for example the 6. To do this, consult the help for the __np.random.choice__ function. How do the mean and standard deviation change?

In [28]:
# To complete

# Values of the dice
dice_faces = [1, 2, 3, 4, 5, 6]

# Probabilities: The first 5 have 10% chance, the 6 has 50% chance.
probabilities = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]

# Simulate 10,000 throws with the loaded dice
loaded_throws = np.random.choice(dice_faces, size=10000, p=probabilities)

print(f"\nLoaded Dice Mean: {np.mean(loaded_throws)}")
print(f"Loaded Dice Std: {np.std(loaded_throws)}")


Loaded Dice Mean: 4.4927
Loaded Dice Std: 1.8116695918406311
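
As a check on the simulation, the theoretical mean and standard deviation of this loaded die can be computed from the same probabilities. This sketch uses `np.average` with its `weights` argument as one possible way to do it.

```python
import numpy as np

dice_faces = np.array([1, 2, 3, 4, 5, 6])
probabilities = np.array([0.1, 0.1, 0.1, 0.1, 0.1, 0.5])

# Probability-weighted mean of the faces.
loaded_mean = np.average(dice_faces, weights=probabilities)

# Probability-weighted variance around that mean.
loaded_var = np.average((dice_faces - loaded_mean) ** 2, weights=probabilities)
loaded_std = np.sqrt(loaded_var)

print(f"Theoretical mean: {loaded_mean:.4f}")  # 4.5000
print(f"Theoretical std:  {loaded_std:.4f}")   # 1.8028
```

Compared with the fair die, favoring the 6 raises the mean from 3.5 to 4.5 and slightly increases the standard deviation.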
