---

# 2. NumPy

Get into the habit of searching the web and consulting the NumPy documentation. If you have not installed NumPy yet, now is a good time. To use a library, we must first import it.

Once the import cell below has run, our work environment knows that anything written as __np.__ must be looked up in NumPy. Important: if the library was NOT installed correctly, the import will raise an error.

The main data type that NumPy works with is the array. Arrays are similar to lists and can in fact be created from them.

In [None]:
import numpy as np

In [None]:
my_list = [0,1,2,3,4,5]
array = np.array(my_list)

print(my_list)
print(array)
print()

[0, 1, 2, 3, 4, 5]
[0 1 2 3 4 5]



But they are more than lists. Some things that could not be done with lists can now be done with arrays. Remember that adding a number to an entire list was not allowed.

In [None]:
my_list + 1

TypeError: ignored

But with arrays it works!

In [None]:
array + 1

array([1, 2, 3, 4, 5, 6])

In [None]:
print(array - 5)
print(array - 2)
print(array * 4)
print(array ** 2)

[-5 -4 -3 -2 -1  0]
[-2 -1  0  1  2  3]
[ 0  4  8 12 16 20]
[ 0  1  4  9 16 25]
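
To make the contrast concrete, here is a minimal sketch of the same "add one to every element" task done with a plain list and with an array. The names `values`, `arr`, `list_plus_one`, and `array_plus_one` are illustrative and not part of the notebook.

```python
import numpy as np

values = [0, 1, 2, 3, 4, 5]   # plain Python list (illustrative data)
arr = np.array(values)        # the same data as a NumPy array

# With a list, every element has to be visited explicitly.
list_plus_one = [v + 1 for v in values]

# With an array, the scalar is applied to every element at once.
array_plus_one = arr + 1

print(list_plus_one)
print(array_plus_one)
print(list_plus_one == array_plus_one.tolist())
```

Both approaches produce the same numbers; the array version simply moves the loop inside NumPy.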


**TODO:** Of these operations, only multiplication is allowed on lists. What does multiplication do to a list?

**It repeats (concatenates) the list, just as it does with a string.**

In [None]:
my_list = ["Hello"]
print(my_list * 3)   # the list is repeated three times
print("Hello" * 3)   # a string is repeated in the same way

['Hello', 'Hello', 'Hello']
HelloHelloHello
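
As a side-by-side sketch of how `*` and `+` change meaning between lists and arrays (the variable names here are made up for illustration):

```python
import numpy as np

numbers = [1, 2, 3]           # illustrative list
arr = np.array(numbers)

repeated_list = numbers * 3   # repetition: the list is concatenated with itself
scaled_array = arr * 3        # arithmetic: each element is multiplied by 3

joined_list = numbers + numbers   # concatenation
summed_array = arr + arr          # element-wise sum

print(repeated_list)
print(scaled_array)
print(joined_list)
print(summed_array)
```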


## Array Creation

While we can create arrays from lists, NumPy provides many functions for creating them directly. Let's look at some.

A widely used one is __np.arange()__. Check its documentation. Alternatively, in a code cell, type __np.arange__ and press __shift + tab__ to bring up the help.

Play around with the example below.

In [None]:
array = np.arange(3, 20, 2)
print(array)

[ 3  5  7  9 11 13 15 17 19]


In [None]:
array2 = np.arange(10,21,5)
print(array2)

[10 15 20]


In [None]:
array3= np.arange(10,0,-1)
print(array3)

[10  9  8  7  6  5  4  3  2  1]
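
The three examples above follow the same `start, stop, step` pattern as Python's built-in `range`, with `stop` excluded. A small sketch (variable names are illustrative) that checks this:

```python
import numpy as np

# np.arange(start, stop, step) mirrors range(start, stop, step)
from_arange = np.arange(3, 20, 2)
from_range = list(range(3, 20, 2))
print(from_arange.tolist() == from_range)

# The stop value itself is never included
stop_excluded = np.arange(10, 20, 5)   # 20 is left out
stop_passed = np.arange(10, 21, 5)     # 20 now falls inside the interval
print(stop_excluded)
print(stop_passed)
```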


**TODO:**
Investigate and create examples with the following functions

* np.linspace
* np.arange
* np.zeros and np.ones

In [None]:
array1 = np.arange(0,51,10)
print(array1)
print("The shape of array1 is:", array1.shape)
print("The size of array1 is:",array1.size,"\n",sep="")

array2 = np.linspace(1,101,49)
print(array2)
print("The shape of array2 is:", array2.shape)
print("The size of array2 is:",array2.size,"\n",sep="")

array3= np.zeros((3,4), dtype=int)
print(array3)
print("The shape of array3 is:", array3.shape)
print("The size of array3 is:",array3.size,"\n",sep="")

array4 = np.ones((2,10),dtype=str)
print(array4)
print("The shape of array4 is:", array4.shape)
print("The size of array4 is:",array4.size,"\n",sep="")

[ 0 10 20 30 40 50]
The shape of array1 is: (6,)
The size of array1 is:6

[  1.           3.08333333   5.16666667   7.25         9.33333333
  11.41666667  13.5         15.58333333  17.66666667  19.75
  21.83333333  23.91666667  26.          28.08333333  30.16666667
  32.25        34.33333333  36.41666667  38.5         40.58333333
  42.66666667  44.75        46.83333333  48.91666667  51.
  53.08333333  55.16666667  57.25        59.33333333  61.41666667
  63.5         65.58333333  67.66666667  69.75        71.83333333
  73.91666667  76.          78.08333333  80.16666667  82.25
  84.33333333  86.41666667  88.5         90.58333333  92.66666667
  94.75        96.83333333  98.91666667 101.        ]
The shape of array2 is: (49,)
The size of array2 is:49

[[0 0 0 0]
 [0 0 0 0]
 [0 0 0 0]]
The shape of array3 is: (3, 4)
The size of array3 is:12

[['1' '1' '1' '1' '1' '1' '1' '1' '1' '1']
 ['1' '1' '1' '1' '1' '1' '1' '1' '1' '1']]
The shape of array4 is: (2, 10)
The size of array4 is:20
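
One difference worth spelling out: `np.arange` takes a step and excludes the stop value, while `np.linspace` takes a number of points and, by default, includes the stop value. The sketch below uses the `retstep` and `endpoint` parameters of `np.linspace`; the variable names are illustrative.

```python
import numpy as np

# Five points from 0 to 1, both ends included; retstep also returns the spacing
points, step = np.linspace(0, 1, 5, retstep=True)
print(points)
print(step)

# endpoint=False leaves out the stop value, as np.arange would
half_open = np.linspace(0, 1, 5, endpoint=False)
print(half_open)
```

When the number of points matters more than the exact spacing, `np.linspace` is usually the more convenient choice.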



## Array Shape
Arrays have many more properties than lists. In particular, they may have more than one axis or dimension. Let's see what we mean:

In [None]:
array_2d = np.array([
                     [1, 2, 3, 4], 
                     [5, 6, 7, 8]
                     ])
print(array_2d)


[[1 2 3 4]
 [5 6 7 8]]


In [None]:
array_2d.shape

(2, 4)

In [None]:
array_2d = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
print(array_2d)
print(array_2d.shape)
print(array_2d.size)

[[1 2]
 [3 4]
 [5 6]
 [7 8]]
(4, 2)
8


In [None]:
array_2d2= np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]], dtype=int)
print(array_2d2)
print(array_2d2.size)
print(array_2d2.shape)
print(type(array_2d2))
print(type(array_2d2[0][0]))
print(array_2d2[0][2])

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
12
(4, 3)
<class 'numpy.ndarray'>
<class 'numpy.int64'>
3


If we want to know how many elements an array has, we can use __.size__.

In [None]:
print(array_2d.size)

8
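
As a quick sketch of how these properties relate, `.size` is always the product of the entries in `.shape`, and `.ndim` is the number of axes. The example also uses `reshape`, where `-1` asks NumPy to infer one dimension; the array names are illustrative.

```python
import numpy as np

flat = np.arange(12)          # 1D array with 12 elements
grid = flat.reshape(3, 4)     # same data arranged as 3 rows x 4 columns

print(grid.ndim)    # number of axes
print(grid.shape)   # length of each axis
print(grid.size)    # total number of elements

# -1 lets NumPy work out the missing dimension from the total size
inferred = flat.reshape(2, -1)
print(inferred.shape)
```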


## Array operations

NumPy arrays come with many methods that operate on their contents.

In [None]:
array = np.array([-100,2,3,17,25,1,95])
print(array.min())
print(array.max())

-100
95


In the 2D case, we can ask these functions to operate on the entire array or along a given axis.

**TODO:** Try to understand the difference between the following instructions

In [None]:
print("-----Array 2d-----")
print(array_2d,"\n")
print("The shape of array_2d is:", array_2d.shape)
print("The size of array_2d is:", array_2d.size,"\n",sep="")
print("-----Array 2d max----")
print("The maximum value of the entire array_2d is:",array_2d.max())
print("The maximum value of the axis 0 of array_2d is:",array_2d.max(axis=0))
print("The maximum value of the axis 1 of array_2d is:",array_2d.max(axis=1),"\n",sep="")
print("-----Array 2d min----")
print("The minimum value of the entire array_2d is:",array_2d.min())
print("The minimum value of the axis 0 of array_2d is:",array_2d.min(axis=0))
print("The minimum value of the axis 1 of array_2d is:",array_2d.min(axis=1),"\n",sep="")
print("-----Array 3d-----")
array_3d= np.arange(9,1,-1).reshape(2,2,2)
print(array_3d,"\n",sep="")
print("The shape of array_3d is:", array_3d.shape)
print("The size of array_3d is:", array_3d.size,"\n",sep="")
print("-----Array 3d max----")
print("The maximum value of the entire array_3d is:",array_3d.max())
print("The maximum value of the axis 0 of array_3d is:",array_3d.max(axis=0))
print("The maximum value of the axis 1 of array_3d is:",array_3d.max(axis=1))
print("The maximum value of the axis 2 of array_3d is:",array_3d.max(axis=2),"\n",sep="")
print("-----Array 3d min----")
print("The minimum value of the entire array_3d is:",array_3d.min())
print("The minimum value of the axis 0 of array_3d is:",array_3d.min(axis=0))
print("The minimum value of the axis 1 of array_3d is:",array_3d.min(axis=1))
print("The minimum value of the axis 2 of array_3d is:",array_3d.min(axis=2),"\n",sep="")

-----Array 2d-----
[[1 2]
 [3 4]
 [5 6]
 [7 8]] 

The shape of array_2d is: (4, 2)
The size of array_2d is:8

-----Array 2d max----
The maximum value of the entire array_2d is: 8
The maximum value of the axis 0 of array_2d is: [7 8]
The maximum value of the axis 1 of array_2d is:[2 4 6 8]

-----Array 2d min----
The minimum value of the entire array_2d is: 1
The minimum value of the axis 0 of array_2d is: [1 2]
The minimum value of the axis 1 of array_2d is:[1 3 5 7]

-----Array 3d-----
[[[9 8]
  [7 6]]

 [[5 4]
  [3 2]]]

The shape of array_3d is: (2, 2, 2)
The size of array_3d is:8

-----Array 3d max----
The maximum value of the entire array_3d is: 9
The maximum value of the axis 0 of array_3d is: [[9 8]
 [7 6]]
The maximum value of the axis 1 of array_3d is: [[9 8]
 [5 4]]
The maximum value of the axis 2 of array_3d is:[[9 7]
 [5 3]]

-----Array 3d min----
The minimum value of the entire array_3d is: 2
The minimum value of the axis 0 of array_3d is: [[5 4]
 [3 2]]
The minimum value of the axis 1 of array_3d is: [[7 6]
 [3 2]]
The minimum value of the axis 2 of array_3d is:[[8 6]
 [4 2]]
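
One way to read the `axis` argument is that the axis you name is the one that gets collapsed: the result has the input's shape with that axis removed. A minimal sketch, using illustrative names and the optional `keepdims` parameter:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])   # shape (4, 2)

col_max = a.max(axis=0)   # collapses the 4 rows -> one value per column
row_max = a.max(axis=1)   # collapses the 2 columns -> one value per row
print(col_max, col_max.shape)
print(row_max, row_max.shape)

# keepdims=True keeps the collapsed axis with length 1
kept = a.max(axis=1, keepdims=True)
print(kept.shape)

# The same rule holds in 3D: shape (2, 2, 2) minus one axis gives (2, 2)
b = np.arange(9, 1, -1).reshape(2, 2, 2)
print(b.max(axis=0).shape)
```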

## Slicing

We have not yet said anything about indexing or slicing because, for 1D arrays, both work as they do for lists. For 2D arrays, it is slightly more involved: each axis takes its own index or slice, separated by a comma.

In [None]:
array_2d = np.arange(9).reshape(3,3)

print(array_2d,"\n",sep="")
print(array_2d[1],type(array_2d[1]))
print(array_2d[2][2],type(array_2d[2][2]))
print("Working with Rows")
print(array_2d[0,::-1])
print(array_2d[0,:0:-1])
print(array_2d[2,:2:])
print("Working with Columns")
print(array_2d[:, 1])
print(array_2d[:2, 2])
print(array_2d[::-1, 0])
print(array_2d[:,0])


[[0 1 2]
 [3 4 5]
 [6 7 8]]

[3 4 5] <class 'numpy.ndarray'>
8 <class 'numpy.int64'>
Working with Rows
[2 1 0]
[2 1]
[6 7]
Working with Columns
[1 4 7]
[2 5]
[6 3 0]
[0 3 6]
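
Two details are easy to miss here. First, `array[i, j]` and `array[i][j]` return the same element, but the comma form does it in a single indexing step. Second, a slice of an array is a view onto the same data, so writing to the slice changes the original. The sketch below uses illustrative names to show both points.

```python
import numpy as np

grid = np.arange(9).reshape(3, 3)

# Both spellings reach the same element
same_element = grid[2, 2] == grid[2][2]
print(same_element)

# A slice is a view: modifying it modifies grid as well
middle_column = grid[:, 1]
middle_column[0] = 100
print(grid)
print(np.shares_memory(middle_column, grid))

# .copy() gives an independent array
last_column = grid[:, 2].copy()
last_column[0] = -1
print(grid[0, 2])   # unchanged in grid
```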


**TODO:** Write an array with 100 equally spaced numbers from 0 to 9

In [None]:
# To complete
array_test= np.linspace(0,9,100)
print(array_test)

[0.         0.09090909 0.18181818 0.27272727 0.36363636 0.45454545
 0.54545455 0.63636364 0.72727273 0.81818182 0.90909091 1.
 1.09090909 1.18181818 1.27272727 1.36363636 1.45454545 1.54545455
 1.63636364 1.72727273 1.81818182 1.90909091 2.         2.09090909
 2.18181818 2.27272727 2.36363636 2.45454545 2.54545455 2.63636364
 2.72727273 2.81818182 2.90909091 3.         3.09090909 3.18181818
 3.27272727 3.36363636 3.45454545 3.54545455 3.63636364 3.72727273
 3.81818182 3.90909091 4.         4.09090909 4.18181818 4.27272727
 4.36363636 4.45454545 4.54545455 4.63636364 4.72727273 4.81818182
 4.90909091 5.         5.09090909 5.18181818 5.27272727 5.36363636
 5.45454545 5.54545455 5.63636364 5.72727273 5.81818182 5.90909091
 6.         6.09090909 6.18181818 6.27272727 6.36363636 6.45454545
 6.54545455 6.63636364 6.72727273 6.81818182 6.90909091 7.
 7.09090909 7.18181818 7.27272727 7.36363636 7.45454545 7.54545455
 7.63636364 7.72727273 7.81818182 7.90909091 8.         8.09090909
 8.18181818 8.27272727 8.36363636 8.45454545 8.54545455 8.63636364
 8.72727273 8.81818182 8.90909091 9.        ]

**TODO:** Create a 1D array of 20 zeros. Replace the first 15 elements with ones.

In [None]:
# To complete
print("Original array:\n")
array_test2 = np.zeros(20)
print(array_test2)
print(array_test2.size, "\n", sep="")
print("New array:")
# Values that will overwrite the first 15 positions
array_test3 = np.ones(15)
# Slice assignment writes them into array_test2 in place
array_test2[:15] = array_test3
print(array_test3)
print(array_test3.size, "\n", sep="")
print("The result is:")
print(array_test2)
print(array_test2.size, "\n", sep="")


Original array:

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
20

New array:
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
15

The result is:
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0.]
20



**TODO:** Create a 1D array of 50 zeros. Replace the first 25 elements with the natural numbers from 0 to 24.

In [None]:
# To complete
array_test4= np.zeros(50,int)
array_test4[:25]=np.arange(25)
print(array_test4)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0]


**TODO:** Create a 2D array of 3 rows and 3 columns, filled with zeros. Replace the elements of the second column with the numbers 1, 2, and 3, respectively.

In [None]:
# To complete
array_test5=np.zeros((3,3),int)
array_test5[:,1]=np.arange(1,4)
print(array_test5)

[[0 1 0]
 [0 2 0]
 [0 3 0]]


**TODO:** Create a 2D array of 3 rows and 3 columns, filled with zeros. Replace the elements of the diagonal with ones.

In [None]:
# To complete
array_test6=np.zeros((3,3),int)
np.fill_diagonal(array_test6,np.ones(1))
print(array_test6,"\n",sep="")

#Other option
array_test6=np.zeros((3,3),int)
np.fill_diagonal(array_test6,1)
print(array_test6)

[[1 0 0]
 [0 1 0]
 [0 0 1]]

[[1 0 0]
 [0 1 0]
 [0 0 1]]


**TODO:** Create a 2D array of 100 rows and 100 columns, filled with zeros. Replace the elements of the diagonal with ones.

In [None]:
# To complete
array_test7 = np.zeros((100,100))
np.fill_diagonal(array_test7,np.ones(1))
print(array_test7)
print(array_test7.size)
print(array_test7.shape)

[[1. 0. 0. ... 0. 0. 0.]
 [0. 1. 0. ... 0. 0. 0.]
 [0. 0. 1. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 1. 0. 0.]
 [0. 0. 0. ... 0. 1. 0.]
 [0. 0. 0. ... 0. 0. 1.]]
10000
(100, 100)
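
For comparison, here is a sketch of two other ways to get ones on the diagonal: `np.eye`, which builds the matrix directly, and index arrays, which pair up row and column positions. The variable names are illustrative.

```python
import numpy as np

# The approach used above: start from zeros and fill the diagonal
manual = np.zeros((100, 100))
np.fill_diagonal(manual, 1)

# np.eye builds the same matrix in one call
identity = np.eye(100)
print(np.array_equal(manual, identity))

# Index arrays: element (i, i) is selected for i = 0, 1, 2
idx = np.arange(3)
small = np.zeros((3, 3), dtype=int)
small[idx, idx] = 1
print(small)
```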


## Statistics

Descriptive statistics helps us begin to analyze and understand a data set. For numerical data, it does so by computing a few statistical values that summarize the data. For example, it is very hard to read and make sense of the ages of 1000 people, but a small set of statistics (minimum, maximum, mean, standard deviation, etc.) describes that set in a much more understandable way.

Given a set of numbers, the mean is usually considered the most representative value of that set. This is not always the case: a single extreme value can pull the mean far away from most of the data.
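
A small sketch of why the mean can mislead, using a made-up list of ages (hypothetical data, not from the notebook) in which one value is much larger than the rest. The median is shown only for comparison.

```python
import numpy as np

# Hypothetical ages: four people in their twenties and one aged 90
ages = np.array([20, 21, 22, 23, 90])

mean_age = np.mean(ages)       # pulled upward by the single large value
median_age = np.median(ages)   # the middle value once the data is sorted

print(mean_age)
print(median_age)
```

Here the mean is 35.2 even though four of the five values are below 24, while the median is 22.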

**TODO:**
Given the following list of numbers, write a routine that computes their mean, [variance](https://en.wikipedia.org/wiki/Variance), and [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation).

In [None]:
my_list = [11, 12, 63, 31, 82, 56, 32, 686, 1, 2, 39, 9, 99, 2, 1]
n = len(my_list)

In [None]:
# To complete

# Sum the values with a loop
total = 0
for value in my_list:
    total += value

# Mean: keep full precision and round only when printing
mean = total / n
print(f'The mean is: {round(mean, 2)}')

# Squared difference between each value and the mean
squares = []
for value in my_list:
    squares.append((value - mean) ** 2)

# Population variance: the mean of the squared differences
total2 = 0
for sq in squares:
    total2 += sq
variance = total2 / n
print(f'The variance is: {round(variance, 2)}')

# Standard deviation: the square root of the variance
standard_deviation = variance ** (1 / 2)
print(f'The standard deviation is: {round(standard_deviation, 2)}')

The mean is: 75.07
The variance is: 27570.86
The standard deviation is: 166.04


**TODO:** Now with NumPy

In [None]:
# To complete
array_test8=np.array(my_list)
print("The mean is:",round(np.mean(array_test8),2))
print("The variance is:",round(np.var(array_test8),2))
print("The standard deviation is:",round(np.std(array_test8),2))

The mean is: 75.07
The variance is: 27570.86
The standard deviation is: 166.04
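
Note that `np.var` and `np.std` divide by `n` by default (the population formulas), which is why they match the manual routine above. Passing `ddof=1` divides by `n - 1` instead, giving the sample variance. The sketch below checks both against explicit formulas; the names `data`, `pop_var`, and `sample_var` are illustrative.

```python
import numpy as np

data = np.array([11, 12, 63, 31, 82, 56, 32, 686, 1, 2, 39, 9, 99, 2, 1])
n = data.size
mean = data.sum() / n

squared_diffs = (data - mean) ** 2
pop_var = squared_diffs.sum() / n            # divide by n
sample_var = squared_diffs.sum() / (n - 1)   # divide by n - 1

print(np.isclose(pop_var, np.var(data)))             # default ddof=0
print(np.isclose(sample_var, np.var(data, ddof=1)))
print(round(pop_var, 2), round(sample_var, 2))
```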


## Random

One extremely useful thing we can do with NumPy is generate random samples. The functions for this live in NumPy's __np.random__ module.

In [None]:
die_samples = np.random.randint(1, 7, size=15)
print(die_samples)

die_samples = np.random.choice([1, 2, 3, 4, 5, 6], size=15)
print(die_samples)

# Draw 10 values (with replacement) from a pool of 10 random integers in [0, 8]
die_samples = np.random.choice(np.random.randint(0, 9, 10), size=10)
print(die_samples)

[6 2 2 1 6 1 5 4 2 1 5 5 3 6 4]
[1 2 5 6 1 6 3 2 1 1 5 1 2 4 5]
[5 6 6 8 2 2 6 2 6 5]
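
The outputs above change on every run. If reproducible samples are needed, one option is NumPy's `Generator` interface, created with `np.random.default_rng` and a seed. This is a sketch of an alternative, not what the notebook uses; the seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng = np.random.default_rng(seed=42)
rolls = rng.integers(1, 7, size=15)      # the upper bound 7 is excluded

rng_again = np.random.default_rng(seed=42)
rolls_again = rng_again.integers(1, 7, size=15)

print(rolls)
print(np.array_equal(rolls, rolls_again))
```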


**TODO:**
What is the mean of the values obtained by throwing a die many times? We are going to try to answer this question by simulating a die. To do so:

* Get random samples of a die with different sample sizes (e.g. 10, 100, 1000, ...).
* Calculate the mean and standard deviation of each sample.
* From what number of samples does the average "stabilize"? **In the run below, it stabilizes at about 3.5 from roughly 10,000 samples onward.**

In [None]:
# To complete
sample10 = np.random.randint(1,7,10)
sample100 = np.random.randint(1,7,100)
sample1000 = np.random.randint(1,7,1000)
sample10000 = np.random.randint(1,7,10000)
sample100000 = np.random.randint(1,7,100000)
sample1000000 = np.random.randint(1,7,1000000)
sample10000000 = np.random.randint(1,7,10000000)
sample100000000 = np.random.randint(1,7,100000000)

print("Sample 10:\n","Mean:",np.mean(sample10),"\nStd:",round(np.std(sample10)),"\n",sep="")
print("Sample 100:\n","Mean:",np.mean(sample100),"\nStd:",round(np.std(sample100)),"\n",sep="")
print("Sample 1,000:\n","Mean:",np.mean(sample1000),"\nStd:",round(np.std(sample1000)),"\n",sep="")
print("Sample 10,000:\n","Mean:",np.mean(sample10000),"\nStd:",round(np.std(sample10000)),"\n",sep="")
print("Sample 100,000:\n","Mean:",np.mean(sample100000),"\nStd:",round(np.std(sample100000)),"\n",sep="")
print("Sample 1,000,000:\n","Mean:",np.mean(sample1000000),"\nStd:",round(np.std(sample1000000)),"\n",sep="")
print("Sample 10,000,000:\n","Mean:",np.mean(sample10000000),"\nStd:",round(np.std(sample10000000)),"\n",sep="")
print("Sample 100,000,000:\n","Mean:",np.mean(sample100000000),"\nStd:",round(np.std(sample100000000)),"\n",sep="")

Sample 10:
Mean:3.7
Std:2

Sample 100:
Mean:3.2
Std:2

Sample 1,000:
Mean:3.451
Std:2

Sample 10,000:
Mean:3.5086
Std:2

Sample 100,000:
Mean:3.49825
Std:2

Sample 1,000,000:
Mean:3.497784
Std:2

Sample 10,000,000:
Mean:3.4999123
Std:2

Sample 100,000,000:
Mean:3.50010347
Std:2
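
The same experiment can be written as a loop and compared against the theoretical values for a fair die: a mean of 3.5 and a standard deviation of sqrt(35/12), roughly 1.71. The sketch below is one possible version; it assumes a seeded `np.random.default_rng` generator for reproducibility and keeps three decimals so that changes in the standard deviation stay visible.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # arbitrary seed, for reproducibility

theoretical_mean = 3.5
theoretical_std = np.sqrt(35 / 12)    # standard deviation of a fair six-sided die

results = {}
for n in [10, 100, 1_000, 10_000, 100_000]:
    rolls = rng.integers(1, 7, size=n)
    results[n] = (rolls.mean(), rolls.std())
    print(f"n={n:>7}: mean={rolls.mean():.3f}, std={rolls.std():.3f}")

print(f"theory   : mean={theoretical_mean:.3f}, std={theoretical_std:.3f}")
```

As the sample size grows, both statistics should move closer to the theoretical values.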



**TODO:**

Simulate a loaded die that favors a value of your choice, for example the 6. To do this, consult the help for the __np.random.choice__ function. How do the mean and standard deviation change?

In [None]:
# To complete
die = np.arange(1,7)
prob5= [0.12,0.12,0.12,0.12,0.4,0.12]

loadeddie5= np.random.choice(die,p=prob5,size=100)

print(loadeddie5)
print("\n",np.count_nonzero(loadeddie5 == 5),"times number 5 out of 100")

[2 5 5 5 4 4 5 5 4 5 2 5 1 5 4 2 3 4 4 5 5 5 2 2 1 3 6 5 6 5 2 5 5 4 6 5 1
 3 2 5 3 2 5 3 6 5 5 5 2 2 5 5 1 5 5 6 5 4 2 6 5 6 5 4 3 1 6 5 5 4 2 1 5 5
 2 2 6 5 1 4 5 5 1 6 5 4 2 5 5 5 1 4 4 5 5 5 3 1 4 5]

 43 times number 5 out of 100
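
To address the question about the mean and standard deviation, the sketch below computes the expected values for the same probabilities used above (0.4 for the 5, 0.12 for every other face) and compares them with a large simulated sample. The names and the seeded generator are illustrative assumptions.

```python
import numpy as np

faces = np.arange(1, 7)
probs = np.array([0.12, 0.12, 0.12, 0.12, 0.4, 0.12])   # favors the 5

# Expected value and standard deviation from the probabilities
expected_mean = (faces * probs).sum()
expected_std = np.sqrt(((faces - expected_mean) ** 2 * probs).sum())

# Simulated rolls of the loaded die
rng = np.random.default_rng(seed=1)   # arbitrary seed
rolls = rng.choice(faces, size=100_000, p=probs)

print(f"expected : mean={expected_mean:.3f}, std={expected_std:.3f}")
print(f"simulated: mean={rolls.mean():.3f}, std={rolls.std():.3f}")
```

With these weights the expected mean rises from 3.5 to 3.92, and the standard deviation drops from about 1.71 to about 1.60, because more of the probability mass sits on a single face.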
