# Data manipulation with numpy 🪂
Let's practice NumPy!

In this exercise, you'll have to initialize your own vectors, matrices and tensors and perform basic operations: selecting elements, slicing, masking, reshaping etc...

The last part of the exercise draws random values from a known distribution to help you (re-)discover a very famous theorem of statistics 🤓

## Dealing with vectors ⛹️

1. Import numpy

In [1]:
import numpy as np

2. Initialize a numpy array called `vec` which contains values from 0 to 10 with a step of 0.5

In [2]:
vec = np.arange(0,10.5,0.5)
vec

array([ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5,  5. ,
        5.5,  6. ,  6.5,  7. ,  7.5,  8. ,  8.5,  9. ,  9.5, 10. ])
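
As a side note, `np.linspace` builds the same vector from a number of points rather than a step. The sketch below is only an illustration next to the solution above; the names `vec_arange` and `vec_linspace` are made up for the comparison.

```python
import numpy as np

# np.arange takes a step and excludes the stop value,
# so the stop has to be pushed past 10 to include it.
vec_arange = np.arange(0, 10.5, 0.5)

# np.linspace takes the number of points and includes the stop value by default.
vec_linspace = np.linspace(0, 10, 21)

print(vec_arange.shape, vec_linspace.shape)
print(np.allclose(vec_arange, vec_linspace))
```

With a non-integer step, `np.linspace` is often the safer choice because the number of elements is stated explicitly.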

3. What is the shape of `vec`?

In [3]:
print(np.shape(vec))

(21,)


4. Display the 7th value of `vec`

In [4]:
print(vec[6])

3.0


5. Display the first 3 items of `vec`

In [5]:
print(vec[:3])

[0.  0.5 1. ]


6. Display the last 3 items

In [6]:
print(vec[-3:])

[ 9.   9.5 10. ]


7. By using masks, select values of `vec` that are below 7

In [7]:
print(vec < 7)

[ True  True  True  True  True  True  True  True  True  True  True  True
  True  True False False False False False False False]


In [8]:
vecNew = vec < 7
print(vec[vecNew])

[0.  0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5 5.  5.5 6.  6.5]
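
Masks can also be combined. The following sketch, which assumes the same `vec` as above, selects the values between 2 (included) and 7 (excluded) and then everything else; `between` and `outside` are illustrative names.

```python
import numpy as np

vec = np.arange(0, 10.5, 0.5)

# Each comparison yields a boolean array; & (and), | (or) and ~ (not)
# combine them element-wise. The parentheses are required because
# & binds more tightly than the comparison operators.
between = vec[(vec >= 2) & (vec < 7)]
outside = vec[~((vec >= 2) & (vec < 7))]

print(between)
print(outside)
```

Python's `and` / `or` keywords do not work here, because they try to evaluate the whole array as a single truth value.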


## Dealing with matrices 🏋️‍♀️
8. Define a function called `my_func` that takes two arguments `x` and `y` and returns: $f(x, y) = x^2 + y$.

In [9]:
def my_func(x,y):
    return pow(x,2) + y

9. Use `my_func` to initialize a 4x4 matrix:

In [10]:
matrix = np.fromfunction(my_func,(4,4),dtype=int)
matrix


array([[ 0,  1,  2,  3],
       [ 1,  2,  3,  4],
       [ 4,  5,  6,  7],
       [ 9, 10, 11, 12]])
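
To see why this works, it may help to know that `np.fromfunction` does not call `my_func` once per cell: it calls it a single time with arrays of row and column indices. The sketch below rebuilds the same matrix by hand with `np.indices`; `by_hand` and `from_function` are illustrative names.

```python
import numpy as np

def my_func(x, y):
    return x**2 + y

# np.indices builds one array of row indices and one of column indices.
rows, cols = np.indices((4, 4))

# Calling my_func on these grids computes every cell in one vectorized step,
# which gives the same result as np.fromfunction.
by_hand = my_func(rows, cols)
from_function = np.fromfunction(my_func, (4, 4), dtype=int)

print(rows)
print(cols)
print(np.array_equal(by_hand, from_function))
```

This is also why the function passed to `np.fromfunction` has to be written with operations that work on whole arrays.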

10. Iterate over the matrix's values and, for each value, find a way of computing its [remainder](https://en.wikipedia.org/wiki/Remainder) in the integer division by 2.

Hint: There is an operator (like `+` or `*`) that computes the remainder of an integer division. [Python's doc](https://docs.python.org/3/library/operator.html) may help you 😉

In [11]:
for item in matrix.flat:
    print("When dividing "+str(item)+" by 2, the remainder is: "+ str(item %2))
    

When dividing 0 by 2, the remainder is: 0
When dividing 1 by 2, the remainder is: 1
When dividing 2 by 2, the remainder is: 0
When dividing 3 by 2, the remainder is: 1
When dividing 1 by 2, the remainder is: 1
When dividing 2 by 2, the remainder is: 0
When dividing 3 by 2, the remainder is: 1
When dividing 4 by 2, the remainder is: 0
When dividing 4 by 2, the remainder is: 0
When dividing 5 by 2, the remainder is: 1
When dividing 6 by 2, the remainder is: 0
When dividing 7 by 2, the remainder is: 1
When dividing 9 by 2, the remainder is: 1
When dividing 10 by 2, the remainder is: 0
When dividing 11 by 2, the remainder is: 1
When dividing 12 by 2, the remainder is: 0
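
The loop is what the question asks for, but the `%` operator also works on a whole array at once. A minimal sketch, with the matrix values copied from the output above and `remainders` as an illustrative name:

```python
import numpy as np

matrix = np.array([[0, 1, 2, 3],
                   [1, 2, 3, 4],
                   [4, 5, 6, 7],
                   [9, 10, 11, 12]])

# % is applied element-wise, so one expression replaces the whole loop.
remainders = matrix % 2
print(remainders)
```

The result has the same shape as `matrix`, with a 0 wherever the value is even and a 1 wherever it is odd.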


10. Once you've found the operator that computes the remainder, use it to create a mask that selects only the even numbers in your matrix. Store these values in an array called `even_numbers`.

In [12]:
# Boolean mask: True where the remainder of the division by 2 is 0
even_numbers = matrix[matrix % 2 == 0]
even_numbers

array([ 0,  2,  2,  4,  4,  6, 10, 12])

11. Reshape `even_numbers` into a 2x4 matrix and apply the `log` function to its elements.

In [13]:
even_numbers.reshape(2,4)

array([[ 0,  2,  2,  4],
       [ 4,  6, 10, 12]])

In [14]:
# log(0) is -inf, so NumPy emits a divide-by-zero RuntimeWarning for the first element
np.log(even_numbers.reshape(2,4))

array([[      -inf, 0.69314718, 0.69314718, 1.38629436],
       [1.38629436, 1.79175947, 2.30258509, 2.48490665]])
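
The `-inf` in the output comes from `log(0)`. Two possible ways of dealing with it are sketched below, assuming the same even numbers as above; which one is appropriate depends on what the values are used for, and the variable names are illustrative.

```python
import numpy as np

even_numbers = np.array([[0, 2, 2, 4],
                         [4, 6, 10, 12]])

# Option 1: accept -inf for log(0) but silence the divide-by-zero warning.
with np.errstate(divide="ignore"):
    logs_with_inf = np.log(even_numbers)

# Option 2: take the log of the strictly positive entries only.
# Masking flattens the result to a 1-D array.
positive = even_numbers > 0
logs_positive_only = np.log(even_numbers[positive])

print(logs_with_inf)
print(logs_positive_only)
```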

## From vector to matrix, from matrix to tensor and the way back 🤹
12. Initialize a vector named `vec` containing the first 100 integers:

In [51]:
# np.arange(100) returns a 1-D vector of shape (100,) holding 0..99
vec = np.arange(100)
vec

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

13. Create a 10x10 `matrix` containing the values of `vec`

In [52]:
matrix = vec.reshape(10,10)
matrix

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

14. We'd like to create a `tensor` of rank 3 by reshaping the `matrix` such that it will be structured like several layers of 5x5 matrices. Find a way to do this operation with `.reshape`:

In [53]:
# Split the 100 values into 4 layers of 5x5.
# The tensor is stored back under the name `matrix`, which the cells below keep using.
matrix = matrix.reshape(4,5,5)
matrix


array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24]],

       [[25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49]],

       [[50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59],
        [60, 61, 62, 63, 64],
        [65, 66, 67, 68, 69],
        [70, 71, 72, 73, 74]],

       [[75, 76, 77, 78, 79],
        [80, 81, 82, 83, 84],
        [85, 86, 87, 88, 89],
        [90, 91, 92, 93, 94],
        [95, 96, 97, 98, 99]]])
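
When the number of layers is not known in advance, `.reshape` accepts `-1` for one dimension and works it out from the others. A small sketch, using a fresh 10x10 matrix and the name `tensor` for illustration:

```python
import numpy as np

matrix = np.arange(100).reshape(10, 10)

# -1 asks NumPy to infer that dimension: 100 / (5 * 5) = 4 layers.
tensor = matrix.reshape(-1, 5, 5)

print(tensor.shape)   # (4, 5, 5)
print(tensor.ndim)    # 3
```

Only one dimension can be left as `-1`, and the total number of elements must divide evenly, otherwise NumPy raises a `ValueError`.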

15. Select the tensor's first layer (the first 5x5 matrix)

In [54]:
matrix[0]

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

16. Within this first-layer matrix, select the element at row index 1 and column index 2:

In [55]:
matrix[0, 1, 2]


np.int64(7)

17. Still among the first layer matrix, select all the elements of column 2:

In [56]:
# Use the transpose to retrieve a column (equivalent to matrix[0][:, 2])
matrix[0].T[2]

array([ 2,  7, 12, 17, 22])

18. Still within the first-layer matrix, select all the elements of row 1:

In [57]:
# Integer index 1 returns the row as a 1-D array (the slice 1:2 would keep an extra dimension)
matrix[0][1]

array([5, 6, 7, 8, 9])
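
The three selections above can also be written with a single pair of brackets, one index per axis. The sketch below assumes a tensor with the same values as the one built earlier; `element`, `column` and `row` are illustrative names.

```python
import numpy as np

tensor = np.arange(100).reshape(4, 5, 5)

element = tensor[0, 1, 2]   # layer 0, row 1, column 2
column = tensor[0, :, 2]    # every row of column 2 in layer 0
row = tensor[0, 1, :]       # every column of row 1 in layer 0

print(element)
print(column)
print(row)
```

A bare `:` means "take everything along this axis", which avoids the need for a transpose when selecting a column.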

19. Re-create the initial 10x10 matrix from your `tensor`

In [58]:
matrix = matrix.reshape(10,10)

20. Re-create the initial vector of 100 elements from your `tensor`

In [60]:
matrix = matrix.reshape(100)
matrix

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
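
There are several ways to get back to a flat vector, and they differ in whether the result shares memory with the original array. A short sketch under the assumption of a tensor with the same values as above (all variable names are illustrative):

```python
import numpy as np

tensor = np.arange(100).reshape(4, 5, 5)

flat_view = tensor.reshape(-1)   # -1 infers the length (100); a view when possible
flat_ravel = tensor.ravel()      # also returns a view when possible
flat_copy = tensor.flatten()     # always returns an independent copy

# Writing through the view changes the tensor; writing to the copy does not.
flat_view[0] = -1
flat_copy[1] = -1

print(tensor[0, 0, 0], tensor[0, 0, 1])
print(np.shares_memory(flat_view, tensor), np.shares_memory(flat_copy, tensor))
```

When the original data must stay untouched, `.flatten()` or an explicit `.copy()` is the safer option.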

## A famous theorem 🤓

Let's use NumPy's random number generator to re-discover one of the most famous theorems of statistics!

21. Generate an array named `uniform_values` containing 10,000 elements drawn from a (continuous) uniform distribution on the interval [0, 10].

The cell below allows you to visualize the distribution of the values.

In [36]:
uniform_values = np.random.uniform(0, 10, 10000)
uniform_values

array([4.34217578, 9.19768864, 5.29700721, ..., 1.72815506, 2.99804344,
       3.485628  ])
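
Because the values are random, they change on every run. If repeatable results are needed, one option is NumPy's `Generator` API with a fixed seed, sketched below. The seed `42` is an arbitrary choice and `same_values` is an illustrative name.

```python
import numpy as np

# The seed value 42 is arbitrary; any fixed integer gives a repeatable sequence.
rng = np.random.default_rng(42)
uniform_values = rng.uniform(0, 10, 10000)

# Re-creating the generator with the same seed reproduces the same draws.
same_values = np.random.default_rng(42).uniform(0, 10, 10000)

print(np.array_equal(uniform_values, same_values))
print(uniform_values.min(), uniform_values.max())
```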

In [37]:
#pip install --upgrade nbformat
#pip install pandas

import plotly.express as px

px.histogram(uniform_values)

22. Now, create a loop with 1000 iterations. At each iteration:
* Draw a new sample of 10,000 values from a (continuous) uniform distribution on the interval [0, 10]
* Compute the sample's mean (hint: [numpy arrays](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html) have many useful methods to compute basic statistics)
* Store the mean value into a list named `mean_values`.

In the end, you must get a list named `mean_values` containing 1000 elements. The cell after will allow you to visualize the distribution of the elements in `mean_values`.

In [49]:
mean_values = []
for _ in range(1000):  # range(1000) gives exactly 1000 iterations
    mean_values.append(np.random.uniform(0, 10, 10000).mean())

print(mean_values)

[np.float64(4.980030633899145), np.float64(5.044745077159247), np.float64(5.040099435946245), np.float64(4.9833384737818545), np.float64(4.950939232547789), np.float64(4.9998253059296625), np.float64(4.986276878390851), np.float64(5.052915003624643), np.float64(4.964037688305165), np.float64(5.008590353958133), np.float64(5.040434707871427), np.float64(5.019253358924555), np.float64(5.051680845930399), np.float64(4.983441972416749), np.float64(4.973895409585439), np.float64(5.01322675284239), np.float64(4.989202319870937), np.float64(4.98852682466769), np.float64(5.038192110763313), np.float64(5.038763094629133), np.float64(5.044062180958907), np.float64(5.000509616893797), np.float64(5.016125336624988), np.float64(5.010558520942923), np.float64(4.983994995125892), np.float64(4.999814187914188), np.float64(5.03399306644451), np.float64(4.973949297416014), np.float64(4.9949136393878835), np.float64(4.991021476206584), np.float64(4.947372493776052), np.float64(4.963328715578084), ...]

In [50]:
px.histogram(mean_values)

23. Do you recognize this curve? Which probability density does it represent?


My answer: it looks like a normal curve, but I'm not sure.

Correct answer: the bell curve of a normal (Gaussian) distribution.

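24. What is the name of the famous theorem that explains why we just got a bell-shaped curve?

My answer: it's a normal law, meaning the values in the dataset are naturally spread at random.

Correct answer: the Central Limit Theorem. The mean of many independent draws from the same distribution with finite variance is approximately normally distributed, whatever the shape of the original distribution (here, uniform).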
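
The theorem also predicts where the bell curve is centred and how wide it is. For a uniform distribution on [a, b], the mean is (a + b) / 2 and the variance is (b - a)² / 12, so the means of samples of size n should have a standard deviation close to sqrt((b - a)² / 12) / sqrt(n). The sketch below checks this numerically. The seed and the names `n_samples`, `sample_size`, `expected_mean` and `expected_std` are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
n_samples, sample_size = 1000, 10000

# One mean per sample of 10,000 uniform draws on [0, 10].
mean_values = np.array([rng.uniform(0, 10, sample_size).mean()
                        for _ in range(n_samples)])

# Theoretical values for a uniform distribution on [0, 10].
expected_mean = (0 + 10) / 2
expected_std = np.sqrt((10 - 0) ** 2 / 12) / np.sqrt(sample_size)

print(mean_values.mean(), expected_mean)
print(mean_values.std(ddof=1), expected_std)
```

The two printed pairs should be close to each other, with an expected standard deviation of roughly 0.029, which matches the narrow spread around 5 seen in the histogram.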