# NumPy
NumPy is a powerful and widely used Python library for numerical computing. It provides a high-performance array object for efficient processing and manipulation of large datasets.

Here are some of the main functionalities of NumPy:

## N-dimensional arrays: 
NumPy provides support for multi-dimensional arrays, which are commonly used for storing and processing large amounts of numerical data.

In [2]:
import numpy as np

# Create a 1D array
a = np.array([1, 2, 3, 4], dtype=np.float32)
a

array([1., 2., 3., 4.], dtype=float32)

In [21]:
# Create a 2D array
b = np.array([[1, 2, 3], [4, 5, 6]])
b

array([[1, 2, 3],
       [4, 5, 6]])

In [22]:
b*b

array([[ 1,  4,  9],
       [16, 25, 36]])

In [23]:
a.shape

(4,)

In [24]:
b.shape

(2, 3)

In [25]:
a.dtype

dtype('float32')

In [26]:
a[0]

1.0

In [27]:
b[1,2]

6

## Array operations: 
NumPy provides a wide range of functions for mathematical operations on arrays, including basic arithmetic, linear algebra, and statistical analysis.

In [28]:
# Basic arithmetic: 
# You can perform basic arithmetic operations on arrays



a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# Addition
a + b

array([ 6,  8, 10, 12])

In [29]:
# Subtraction
a - b

array([-4, -4, -4, -4])

In [30]:
# Multiplication
a * b

array([ 5, 12, 21, 32])

In [31]:
# Division
a / b

array([0.2       , 0.33333333, 0.42857143, 0.5       ])

In [32]:
# Sine function
np.sin(a)

array([ 0.84147098,  0.90929743,  0.14112001, -0.7568025 ])

In [33]:
# Cosine function
np.cos(a)

array([ 0.54030231, -0.41614684, -0.9899925 , -0.65364362])

In [34]:
# Exponential function
np.exp(a)

array([ 2.71828183,  7.3890561 , 20.08553692, 54.59815003])

In [35]:
# Logarithmic function
np.log(a)

array([0.        , 0.69314718, 1.09861229, 1.38629436])

In [37]:
a * 5

array([ 5, 10, 15, 20])
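The cells above cover elementwise arithmetic and math functions. As a minimal sketch of the linear algebra and statistics mentioned in the section intro, the example below uses `mean`, `std`, the `@` operator, and `np.linalg.solve`. The matrix `M` and vector `v` are made-up illustration values, not data from this notebook.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# Statistical summaries of a 1D array
mean_a = a.mean()   # arithmetic mean
std_a = a.std()     # population standard deviation (ddof=0 by default)

# Linear algebra: dot product of two vectors
dot_ab = a @ b      # 1*5 + 2*6 + 3*7 + 4*8

# Solve the linear system M @ x = v (M and v are illustrative values)
M = np.array([[2.0, 1.0],
              [1.0, 3.0]])
v = np.array([3.0, 5.0])
x = np.linalg.solve(M, v)

print(mean_a, std_a, dot_ab, x)
```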

## Array Manipulation

In [38]:
a = np.arange(16)
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [41]:
a.shape

(16,)

In [45]:
a.reshape((2,-1))

array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15]])

In [47]:
a.reshape(-1,2,4)

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [2]:
a = np.arange(27).reshape(3,-1)
a

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23, 24, 25, 26]])

In [51]:
a.T # a.transpose()

array([[ 0,  9, 18],
       [ 1, 10, 19],
       [ 2, 11, 20],
       [ 3, 12, 21],
       [ 4, 13, 22],
       [ 5, 14, 23],
       [ 6, 15, 24],
       [ 7, 16, 25],
       [ 8, 17, 26]])

In [52]:
# Equivalent to a.reshape(27)
a.reshape(-1)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26])

In [54]:
# Concatenating arrays: You can concatenate arrays along a specific axis


a = np.array([[1, 2, 3],
              [4, 5, 6]
             ])
b = np.array([[7, 8, 9],
              [10,11,12]
             ])
a

array([[1, 2, 3],
       [4, 5, 6]])

In [56]:
b[0]

array([7, 8, 9])

In [61]:

# Concatenate arrays along axis 0
np.concatenate((a, b), axis=0)

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [None]:
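# axis 0 - runs down the rows (concatenating along it adds rows)
# axis 1 - runs across the columns (concatenating along it adds columns)
# axis 2 - depth (only exists for arrays with 3 or more dimensions)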

In [None]:
# Concatenate arrays along axis 1
np.concatenate((a, b), axis=1)
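To make the difference between the two axes concrete, here is a small sketch that compares the resulting shapes for the same `a` and `b` as above. It also checks that, for 2D inputs, the results match `np.vstack` and `np.hstack`.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[7, 8, 9], [10, 11, 12]])

rows = np.concatenate((a, b), axis=0)  # b's rows are placed below a's rows
cols = np.concatenate((a, b), axis=1)  # b's columns are placed to the right of a's

print(rows.shape)  # (4, 3)
print(cols.shape)  # (2, 6)
print(cols)

# For 2D arrays these match the stacking helpers
print(np.array_equal(rows, np.vstack((a, b))))
print(np.array_equal(cols, np.hstack((a, b))))
```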

In [63]:
# np.vstack stacks arrays vertically (i.e. row-wise).
# It takes a tuple of arrays as its argument and returns the stacked array

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[7, 8, 9], [10, 11, 12]])

c = np.vstack((a, b))

c

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [64]:
# Stacking multiple arrays as columns

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.array([7, 8, 9])

d = np.column_stack((a, b, c))
d

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

In [65]:
np.vstack(([1,2,3],[4,5,6]))

array([[1, 2, 3],
       [4, 5, 6]])

In [66]:
np.column_stack(([1,2,3],[4,5,6]))

array([[1, 4],
       [2, 5],
       [3, 6]])

In [67]:
np.hstack(([1,2,3],[4,5,6]))

array([1, 2, 3, 4, 5, 6])

In [74]:
a = np.array([1, 2, 3, 4, 5, 6])

# Split the array into three equal parts
np.split(a, 3)

[array([1, 2]), array([3, 4]), array([5, 6])]

In [76]:
np.split(a, [3,5])

[array([1, 2, 3]), array([4, 5]), array([6])]

In [78]:
# Generate an array of 40 values evenly spaced between 0 and 1 (both endpoints included)
x = np.linspace(0, 1, 40)
x

array([0.        , 0.02564103, 0.05128205, 0.07692308, 0.1025641 ,
       0.12820513, 0.15384615, 0.17948718, 0.20512821, 0.23076923,
       0.25641026, 0.28205128, 0.30769231, 0.33333333, 0.35897436,
       0.38461538, 0.41025641, 0.43589744, 0.46153846, 0.48717949,
       0.51282051, 0.53846154, 0.56410256, 0.58974359, 0.61538462,
       0.64102564, 0.66666667, 0.69230769, 0.71794872, 0.74358974,
       0.76923077, 0.79487179, 0.82051282, 0.84615385, 0.87179487,
       0.8974359 , 0.92307692, 0.94871795, 0.97435897, 1.        ])

## Random number generation: 
NumPy provides functions for generating random numbers and arrays, which are useful in simulations and statistical modeling.

In [81]:
# Generating a single random integer between 0 and 9
# (pass size=15 to get an array of 15 such integers instead)
np.random.randint(0, 10)

4

In [82]:
np.random.randint(-45, 120, size=(3,4))

array([[ 16, -35,   0,  61],
       [-22,  87, 112, -14],
       [118, -10, -21, -22]])

In [83]:
# Generating an array of random floats between 0 and 1
rand_floats = np.random.random(size=5)
rand_floats

array([0.11334488, 0.49256843, 0.8216178 , 0.79135543, 0.24707862])

In [84]:
# Generating random numbers from a normal distribution

# Note that loc represents the mean of the normal distribution, 
# and scale represents the standard deviation. The size parameter determines 
# the number of random numbers to generate.

rand_norm = np.random.normal(loc=0, scale=1, size=5)
rand_norm

array([-1.25792884,  0.28654463,  0.81315634,  0.23323809, -0.31616135])

In [85]:
# Generating a random permutation of a given array
arr = np.array([1, 2, 3, 4, 5])
rand_perm = np.random.permutation(arr)
rand_perm

array([1, 5, 3, 4, 2])

In [86]:
# Generating a random subset of a given array
arr = np.array([1, 2, 3, 4, 5])
rand_subset = np.random.choice(arr, size=3, replace=False)
rand_subset

array([4, 3, 5])

In [87]:
np.random.choice(arr, size=(2,3), replace=True)

array([[4, 3, 2],
       [3, 3, 2]])

In [93]:
# Setting the seed for reproducibility
np.random.seed(4)
np.random.randint(0, 100),np.random.randint(0, 100)

(46, 55)

In [104]:
# np.random.seed(32)
np.random.randint(0, 100)

57
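The cells above use the legacy global `np.random` functions. NumPy also offers a `Generator` interface through `np.random.default_rng`. The sketch below shows rough equivalents of the calls above; the seed value `4` is reused only for illustration, and the numbers a `Generator` produces differ from those of `np.random.seed(4)`.

```python
import numpy as np

# Two generators created with the same seed produce the same stream
rng1 = np.random.default_rng(4)
rng2 = np.random.default_rng(4)

ints1 = rng1.integers(0, 100, size=5)   # upper bound is exclusive by default
ints2 = rng2.integers(0, 100, size=5)
print(np.array_equal(ints1, ints2))     # True: same seed, same draws

# Counterparts of the other calls shown in this section
floats = rng1.random(size=3)                      # uniform floats in [0, 1)
normals = rng1.normal(loc=0, scale=1, size=3)     # normal distribution
perm = rng1.permutation(np.arange(1, 6))          # random permutation
subset = rng1.choice(np.arange(1, 6), size=3, replace=False)  # random subset
```

Keeping the generator in a variable avoids hidden global state, which makes it easier to reason about reproducibility across cells.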

## Shape manipulation: 
NumPy provides functions for reshaping and transforming arrays, including transposing, stacking, and splitting.
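As a compact recap of these operations, the following sketch transposes a small array, splits it along the columns, and stacks the pieces back together. The array is an arbitrary illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Transposing swaps the axes: shape (3, 4) becomes (4, 3)
t = a.T

# Split into two equal halves along the columns, then stack them back
left, right = np.split(a, 2, axis=1)   # each half has shape (3, 2)
restored = np.hstack((left, right))    # shape (3, 4) again

print(t.shape, left.shape, right.shape)
print(np.array_equal(restored, a))
```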

## Slicing and indexing: 
NumPy arrays can be indexed and sliced just like lists in Python, allowing for efficient selection and extraction of subsets of data.

In [106]:
arr = np.array([0, 1, 2, 3, 4, 5])
sliced_array = arr[1:3]
sliced_array

array([1, 2])

In [109]:
# Slicing a 2D numpy array: first two rows, columns from index 2 onward

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
sliced_array = arr[0:2,2:]
sliced_array

array([[3],
       [6]])

In [110]:
# Slicing a 2D numpy array along the columns

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
sliced_array = arr[:, 1:3]
sliced_array

array([[2, 3],
       [5, 6],
       [8, 9]])
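One difference from Python lists is worth keeping in mind: a basic NumPy slice is a view onto the original data, whereas slicing a list creates a new list. The sketch below illustrates this; the variable names are only for demonstration.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# A basic slice is a view: writing to it also changes arr
view = arr[:, 1:3]
view[0, 0] = 100
print(arr[0, 1])                       # 100

# An explicit copy is independent of the original
independent = arr[:, 1:3].copy()
independent[0, 0] = -1
print(arr[0, 1])                       # still 100

# A list slice, by contrast, is always a new list
py_list = [0, 1, 2, 3]
list_slice = py_list[1:3]
list_slice[0] = 99
print(py_list)                         # [0, 1, 2, 3]
```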

In [118]:
# Filtering: np.where with a single argument returns the indices where the condition holds

arr = np.arange(20, 30)
print(arr)

result = np.where(arr >= 23)
result

[20 21 22 23 24 25 26 27 28 29]


(array([3, 4, 5, 6, 7, 8, 9]),)

In [119]:
# Boolean mask: True where the element is greater than 23
arr > 23

array([False, False, False, False,  True,  True,  True,  True,  True,
        True])
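With a single argument, `np.where` returns a tuple of index arrays rather than the values themselves. The sketch below shows how to go from indices to values, and the three-argument form `np.where(condition, x, y)`, which picks from `x` where the condition holds and from `y` elsewhere. The fill value `-1` is an arbitrary choice for illustration.

```python
import numpy as np

arr = np.arange(20, 30)

# One-argument form: positions where the condition is True
idx = np.where(arr >= 23)[0]

# Use the positions to get the values (same result as arr[arr >= 23])
values = arr[idx]

# Three-argument form: keep matching values, replace the rest with -1
replaced = np.where(arr >= 23, arr, -1)

print(idx)
print(values)
print(replaced)
```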

In [121]:
# using mask

arr = np.array([10,13,1, 2, 3, 4, 5])
mask = arr >= 3
arr[mask]

array([10, 13,  3,  4,  5])

In [122]:
mask

array([ True,  True, False, False,  True,  True,  True])

In [123]:
# Counting elements in a numpy array using "count_nonzero":


arr = np.array([1, 2, 3, 4, 5])
result = np.count_nonzero(arr >= 3)
result

3

In [124]:
(arr >= 3).sum()

3

In [125]:
np.sum(arr>=3)

3

In [128]:
# Counting elements in a 2D numpy array:

arr = np.array([[1, 2, 3], 
                [4, 5, 6], 
                [7, 8, 9]])
result = np.count_nonzero(arr >= 5,axis=0)
result

array([1, 2, 2])

In [127]:
arr >= 5

array([[False, False, False],
       [False,  True,  True],
       [ True,  True,  True]])
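The count above uses `axis=0`, which gives one count per column. For comparison, this sketch counts along `axis=1` (one count per row) and uses `any` and `all` on the same mask.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
mask = arr >= 5

per_column = np.count_nonzero(mask, axis=0)  # collapse the rows: one count per column
per_row = np.count_nonzero(mask, axis=1)     # collapse the columns: one count per row

any_in_row = mask.any(axis=1)  # does the row contain at least one value >= 5?
all_in_row = mask.all(axis=1)  # are all values in the row >= 5?

print(per_column)  # [1 2 2]
print(per_row)     # [0 2 3]
print(any_in_row)  # [False  True  True]
print(all_in_row)  # [False False  True]
```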

## Example of computing probability with numpy

In [None]:
# A fair die is thrown twice.
# What is the probability that the sum of the numbers obtained in those
# two throws is equal to 6?

In [None]:
# The probability of getting a sum of 6 in two throws is 5/36.
# To get a sum of 6, the first die must show a number from 1 to 5
# (a 6 would already exceed the target). This happens with probability 5/6.
# For each of those outcomes, exactly one number on the second die
# completes the sum to 6, which happens with probability 1/6.
# Multiplying these two probabilities gives the final probability of
# a sum of 6 in two throws: 5/6 * 1/6 = 5/36.
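The same value can be checked exactly by enumerating all 36 equally likely outcomes instead of reasoning step by step. The sketch below builds the full table of sums with `np.add.outer`; the variable names are illustrative.

```python
import numpy as np

faces = np.arange(1, 7)

# 6x6 table where sums[i, j] = faces[i] + faces[j]
sums = np.add.outer(faces, faces)

favourable = np.count_nonzero(sums == 6)   # outcomes (1,5), (2,4), (3,3), (4,2), (5,1)
exact = favourable / sums.size

print(favourable, sums.size)  # 5 36
print(exact)
```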



In [138]:
first_dice = np.random.randint(1,7,int(1e6))
second_dice = np.random.randint(1,7,int(1e6))

In [139]:
first_dice

array([4, 6, 5, ..., 2, 6, 6])

In [140]:
dices_sum = first_dice+second_dice

In [141]:
dices_sum.shape

(1000000,)

In [142]:
n = dices_sum.size
k = np.count_nonzero(dices_sum == 6)

In [143]:
k/n

0.138774

In [144]:
true_answer = 5/36
true_answer

0.1388888888888889

In [147]:
tolerance = 1e-3

np.testing.assert_allclose(k/n, true_answer, 
                           rtol=tolerance)
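A note on the tolerance: the simulated frequency fluctuates from run to run, so whether this assertion passes depends on the random draw. The sketch below estimates how often it would pass, assuming the estimate is approximately normally distributed around 5/36 (a standard approximation for large `n`, not something stated in this notebook).

```python
import math

n = 1_000_000
p = 5 / 36
rtol = 1e-3

# Standard error of a proportion estimated from n independent trials
std_error = math.sqrt(p * (1 - p) / n)

# Largest absolute deviation that assert_allclose accepts with this rtol
allowed = rtol * p

# Allowed deviation expressed in standard errors
z = allowed / std_error

# Probability that a normal variable lies within +/- z standard deviations
pass_probability = math.erf(z / math.sqrt(2))

print(f"standard error:   {std_error:.2e}")
print(f"allowed error:    {allowed:.2e}")
print(f"pass probability: {pass_probability:.2f}")
```

Under this approximation the allowed error is well below one standard error, so the check can fail on an unseeded rerun even though the recorded run passed. Fixing the seed, or using a looser tolerance such as `1e-2` (about four standard errors), makes the cell more stable.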


In [None]:
A = np.array([[1,1],[0,1]])
B = np.array([[2,0],[3,4]])
A+B              # elementwise addition of two arrays
np.add(A,B)      # the same addition as a function call
A * B            # elementwise product
A @ B            # matrix product
A.dot(B)         # matrix product, method form
B.T              # transpose of B
A.flatten()      # flattened 1-D copy of A
B < 3            # boolean array: True where an element of B is less than 3
A.sum()          # sum of all elements of A
A.sum(axis=0)    # sum of each column
A.sum(axis=1)    # sum of each row
A.cumsum(axis=1) # cumulative sum along each row
A.min()          # min value of all elements
A.max()          # max value of all elements
np.exp(B)        # elementwise exponential
np.sqrt(B)       # elementwise square root
A.argmin()       # index of the min value in the flattened array
A.argmax()       # index of the max value in the flattened array
A[1,1]           # element at row 1, column 1 (zero-based)

In [152]:
a = np.arange(1,9)
a,a.cumprod()

(array([1, 2, 3, 4, 5, 6, 7, 8]),
 array([    1,     2,     6,    24,   120,   720,  5040, 40320]))

# Exercises

Filtering:

1. Create a random NumPy array and filter out all elements greater than 5, replacing them with zeros.

2. Given a 2D NumPy array, extract all rows where the values in the first column are greater than 10.

3. Filter a NumPy array to retain only unique values, eliminating duplicates.

Aggregation and Axis:

4. Calculate the sum of elements along each row of a 2D NumPy array.

5. Compute the mean of each column in a 2D NumPy array.

6. Find the maximum value in each row of a 2D NumPy array and return the corresponding column indices.

7. Use NumPy to calculate the median of a dataset along a specific axis.

8. Apply a custom aggregation function to a NumPy array, e.g., finding the product of all elements along a given axis.

Feel free to try these tasks on your own or ask for more information if needed.

# Task
The goal of the exercise is to analyze the test scores of 20 students over 5 dates and 7 courses. The scores are stored in a 3D array of shape (20, 5, 7).

Here is the description of the steps to perform the exercise:

1. Find the average of each course: Calculate the mean of all the scores for each of the 7 courses across all 20 students and 5 dates. This will give you the average score for each course. (shape - (7,))



2. Find the spread of each student:

   - For each student and each date, find the minimum score across all 7 courses (shape (20, 5)). Then calculate the mean of these minimum scores across all dates for each student (shape (20,)).
   - Do it again with the maximum score on each date.
   - Return the difference: subtract the average of the minimum scores from the average of the maximum scores.

Overall, this exercise requires using the numpy library to manipulate the 3D array of scores, perform element-wise operations, and calculate means and extrema.
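As a hint for how the axis arguments line up with the steps above, here is a sketch on a small hypothetical array of shape (2, 3, 4), standing in for (students, dates, courses). It is not the exercise data, and the same calls would still need to be applied to `scores`.

```python
import numpy as np

# Hypothetical toy data: 2 students, 3 dates, 4 courses
toy = np.arange(24).reshape(2, 3, 4)

# Step 1: average per course, reducing over students (axis 0) and dates (axis 1)
course_mean = toy.mean(axis=(0, 1))      # shape (4,)

# Step 2: per student and date, min and max over the courses (axis 2)
min_per_date = toy.min(axis=2)           # shape (2, 3)
max_per_date = toy.max(axis=2)           # shape (2, 3)

# Average over dates (axis 1), then take the difference per student
spread = max_per_date.mean(axis=1) - min_per_date.mean(axis=1)   # shape (2,)

print(course_mean)  # [10. 11. 12. 13.]
print(spread)       # [3. 3.]
```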




In [None]:
import matplotlib.pyplot as plt

In [None]:

# Generate random scores using a skewed distribution
np.random.seed(42)
scores = 100-(np.random.gamma(2, 1, (20,5,7))*10)
scores = scores.astype(int)

# Plot the histogram of the scores

plt.hist(scores.reshape(-1), bins=30, edgecolor='black')
plt.show()


In [None]:
scores.shape

In [None]:
# Expected output for step 2 (spread of each student):

# [29.6, 36. , 35.4, 35.8, 38.4, 28.8, 39.4, 43.4, 27.4, 43.4, 40. ,
#  32. , 30.8, 30.4, 45.6, 34.6, 33.6, 34.8, 49.6, 36.8]