
# 2.0 NumPy



[NumPy](http://www.numpy.org/) is short for _Numerical Python_. It provides functions that are especially useful when you work with large arrays and matrices of numeric data, for example matrix multiplication.

The array object class is the foundation of NumPy. NumPy arrays are like Python lists, except that every element of an array must have the same type, such as int or float. As a result, arrays provide much more efficient storage and data operations, especially as they grow larger. In other respects, NumPy arrays behave much like Python's built-in list type.
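The single-type rule can be seen by inspecting an array's `dtype`. The snippet below is a minimal sketch; the variable names `ints`, `mixed`, and `forced` are made up for illustration.

```python
import numpy as np

ints = np.array([1, 2, 3])        # all integers: an integer dtype is chosen
mixed = np.array([1, 2.5, 3])     # one float present: every element becomes a float
forced = np.array([1, 2, 3], dtype=np.float32)  # the dtype can also be requested explicitly

print(ints.dtype)    # an integer type; the exact width depends on the platform
print(mixed.dtype)   # float64
print(forced.dtype)  # float32
```

Because NumPy silently converts mixed input to a common type, checking `dtype` after building an array from a list is a cheap sanity check.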

### Create array from lists:

In [None]:
import numpy as np # similar to library() in R

my_list = [[1,2,3,4,5],[6,7,8,9,10]]

array = np.array(my_list)

print(array, type(array))

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]] <class 'numpy.ndarray'>


In [None]:
print(np.zeros((3,4)))
print(np.ones((5,2)))


[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[[1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]]


In [None]:
print(np.arange(10000))

[   0    1    2 ... 9997 9998 9999]


In [None]:
np.random.random((3, 3))

array([[0.36187763, 0.36157215, 0.8857897 ],
       [0.24185757, 0.64562173, 0.16151411],
       [0.54631259, 0.86169173, 0.3351552 ]])
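The values above change on every run. If reproducible numbers are needed, one option is NumPy's `Generator` interface with a fixed seed. This is a sketch only; the seed value 42 and the names `rng_a`, `rng_b` are arbitrary choices.

```python
import numpy as np

# Two generators built from the same seed produce the same stream of numbers
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

first = rng_a.random((3, 3))    # uniform values in [0, 1)
second = rng_b.random((3, 3))

print(np.array_equal(first, second))  # True
```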

### Exercise:
Create a 3x3 array of normally distributed random values with mean 0 and standard deviation 1

array([[-0.06853059, -0.94239075, -0.71156639],
       [-1.85399979,  2.21686759, -0.75593635],
       [-1.00480399,  0.70835068, -1.02972119]])
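One possible way to approach this exercise is sketched below. It is not necessarily the code that produced the output above, and the variable names are hypothetical. The values differ on each run because no seed is set.

```python
import numpy as np

# Option 1: the Generator interface, with mean (loc) and standard deviation (scale) given explicitly
rng = np.random.default_rng()
normal_a = rng.normal(loc=0.0, scale=1.0, size=(3, 3))

# Option 2: the older convenience function, which always samples from mean 0 and std 1
normal_b = np.random.randn(3, 3)

print(normal_a)
print(normal_b)
```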

### Vectorization

In [None]:
my_list = [1,2,3,4,5]

my_list + my_list

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

### Try:
`np.array(my_list) + np.array(my_list)`

In [None]:
np.array(my_list) + np.array(my_list)

array([ 2,  4,  6,  8, 10])

In [None]:
print([x*2 for x in my_list])

[2, 4, 6, 8, 10]
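The list comprehension above can be replaced by arithmetic on the array itself, which applies the operation to every element without an explicit loop. A small sketch, with `arr`, `doubled`, and `squared_plus_one` as illustrative names:

```python
import numpy as np

my_list = [1, 2, 3, 4, 5]
arr = np.array(my_list)

doubled = arr * 2                # same values as [x*2 for x in my_list]
squared_plus_one = arr ** 2 + 1  # operations can be chained element-wise

print(doubled)           # elements: 2, 4, 6, 8, 10
print(squared_plus_one)  # elements: 2, 5, 10, 17, 26
```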


In [None]:
np.random.random((3, 3))

array([[0.30165574, 0.53969401, 0.5827302 ],
       [0.86786663, 0.4574494 , 0.03744257],
       [0.9767365 , 0.81606169, 0.59997511]])

### Indexing


In [None]:
rand_num = np.random.random((3, 3))

rand_num

array([[0.56711767, 0.29745544, 0.80214092],
       [0.05065611, 0.98701317, 0.43131841],
       [0.84975935, 0.87927437, 0.81053322]])

In [None]:
rand_num[:,0:1]

array([[0.56711767],
       [0.05065611],
       [0.84975935]])

In [None]:
rand_num[:,0:3]

array([[0.56711767, 0.29745544, 0.80214092],
       [0.05065611, 0.98701317, 0.43131841],
       [0.84975935, 0.87927437, 0.81053322]])
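Note that a slice such as `0:1` keeps the column dimension, whereas a plain integer index drops it. The sketch below uses a small integer array, `grid`, as a hypothetical stand-in for `rand_num` so the results are easy to check by eye.

```python
import numpy as np

grid = np.arange(9).reshape(3, 3)   # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

col_2d = grid[:, 0:1]   # slice: result is still 2D, shape (3, 1)
col_1d = grid[:, 0]     # integer index: result is 1D, shape (3,)
elem = grid[1, 2]       # single element at row 1, column 2
block = grid[:2, 1:]    # first two rows, last two columns

print(col_2d.shape, col_1d.shape)
print(elem)
print(block)
```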

In [None]:
# Combine the two boolean arrays element-wise with & (logical AND)
mask = (0.2 < rand_num) & (rand_num < 0.7)
rand_num[mask]

array([0.56711767, 0.29745544, 0.43131841])

In [None]:
np.where(mask)

(array([0, 0, 1], dtype=int64), array([0, 1, 2], dtype=int64))
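`np.where` with a single argument returns one index array per dimension; with three arguments it picks values element by element. The sketch below uses a small made-up array, `scores`, so the positions can be verified by hand.

```python
import numpy as np

# Hypothetical data, not the rand_num values from the cells above
scores = np.array([[0.1, 0.5, 0.9],
                   [0.3, 0.8, 0.6]])

mask = (scores > 0.2) & (scores < 0.7)

# One-argument form: row indices and column indices of the True entries
rows, cols = np.where(mask)
print(rows, cols)            # rows 0, 1, 1 and columns 1, 0, 2
print(scores[rows, cols])    # same values as scores[mask]

# Three-argument form: keep values where mask is True, use 0.0 elsewhere
kept = np.where(mask, scores, 0.0)
print(kept)
```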

In [None]:
rand_num*rand_num

array([[0.32162246, 0.08847974, 0.64343006],
       [0.00256604, 0.974195  , 0.18603557],
       [0.72209095, 0.77312342, 0.6569641 ]])

### Matrix multiplication

In [None]:
np.dot(rand_num,rand_num)

array([[1.01831714, 1.16758664, 1.23336817],
       [0.44524307, 1.36851016, 0.8159482 ],
       [1.21521234, 1.83330202, 1.71783808]])
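The difference between `*` (element-wise) and `np.dot` (matrix product) is easier to see on small integer matrices. In the sketch below, `A` and `B` are made-up examples; the `@` operator is an equivalent spelling of the matrix product for 2D arrays.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

elementwise = A * B      # multiplies matching entries: [[0, 2], [3, 0]]
matmul = A @ B           # rows of A times columns of B: [[2, 1], [4, 3]]

print(elementwise)
print(matmul)
print(np.array_equal(matmul, np.dot(A, B)))  # True
```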

### Array Concatenation and Splitting

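- `np.concatenate`: join arrays along an existing axis (for example, `axis=1` joins 2D arrays column by column)
- `np.hstack`: stack arrays horizontally (column-wise)
- `np.vstack`: stack arrays vertically (row-wise)
- `np.dstack`: stack arrays depth-wise (along the third axis)
- `np.split`: split an array into sub-arrays along a given axis
- `np.hsplit`: split an array horizontally (column-wise)
- `np.vsplit`: split an array vertically (row-wise)
- `np.dsplit`: split an array depth-wise (along the third axis)
- `np.floor`: round each element down to the nearest integer (an element-wise function, not a joining or splitting one)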

In [None]:
a = np.arange(5)
print(a)
print()

np.hstack((a,a))

[0 1 2 3 4]



array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])
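For 2D arrays, the choice of axis matters. The following sketch joins a small array with itself in both directions and then splits the results again; all variable names are illustrative.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

side_by_side = np.concatenate((a, a), axis=1)  # join columns: shape (2, 6)
stacked = np.vstack((a, a))                    # join rows: shape (4, 3)

# Splitting reverses the joins: two equal pieces each
left, right = np.hsplit(side_by_side, 2)
top, bottom = np.split(stacked, 2, axis=0)

print(side_by_side.shape, stacked.shape)
print(np.array_equal(left, a), np.array_equal(bottom, a))
```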

## **Exercises:**


1. Create a 3x3 matrix with values ranging from 0 to 8.
2. Create a 10x10 array with random values and find the minimum and maximum values.
3. Create an 8x8 matrix and fill it with a checkerboard pattern.
4. Create a random vector of size 10 and replace the maximum value with 0.
5. Create a $4 \times 4$ identity matrix.
6. Generate the 2D array
7. Generate a random $4 \times 4 \times 4$ array of normally distributed (Gaussian) numbers.
8. Generate `n` evenly spaced intervals between 0 and 1.
9. Create a vector and then reverse it (the first element becomes the last).


Please feel free to discuss with all of us or refer to Prof Google.
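As a starting point, here is one possible sketch for a few of the exercises (the 3x3 matrix, the checkerboard, replacing the maximum, the identity matrix, and reversing a vector). It is only one of several valid approaches; the variable names and the seed are arbitrary.

```python
import numpy as np

# 3x3 matrix with values 0 to 8
grid = np.arange(9).reshape(3, 3)

# 8x8 checkerboard: set every other cell to 1 using stepped slices
board = np.zeros((8, 8), dtype=int)
board[1::2, ::2] = 1    # odd rows, even columns
board[::2, 1::2] = 1    # even rows, odd columns

# Random vector of size 10 with its maximum replaced by 0
vec = np.random.default_rng(0).random(10)
vec[vec.argmax()] = 0

# 4x4 identity matrix
identity = np.eye(4)

# Reverse a vector with a negative step
reversed_vec = np.arange(5)[::-1]

print(grid)
print(board)
print(reversed_vec)
```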

In [None]:
# SAMPLE ANSWERS

SAMPLE ANSWERS

Question 1
a1 is [[0 1 2]
 [3 4 5]
 [6 7 8]]
______________________________________________________________________________

Question 2
10x10 Random Array:
 [[0.74610416 0.87694732 0.93767656 0.8819452  0.53489304 0.46638694
  0.81017039 0.5192213  0.4683338  0.46701566]
 [0.46553961 0.54871649 0.4709795  0.30825665 0.47937262 0.19273026
  0.37183505 0.93713359 0.21190933 0.29729308]
 [0.45319704 0.4017105  0.94273016 0.22732878 0.25978645 0.59515421
  0.94745699 0.25630081 0.39366608 0.43837018]
 [0.15787801 0.79529289 0.30009115 0.30976829 0.88668037 0.61659772
  0.11776734 0.2054476  0.96891418 0.03381654]
 [0.50486219 0.22454478 0.32831044 0.12463078 0.43396549 0.57014874
  0.30189167 0.7672532  0.03678757 0.447348  ]
 [0.52504854 0.85728628 0.03044733 0.61048297 0.99554703 0.62748481
  0.33099384 0.56756274 0.86901637 0.1451481 ]
 [0.9227296  0.96193192 0.6715002  0.06712677 0.61593283 0.2178686
  0.99893722 0.85282316 0.15388316 0.00510975]
 [0.68032627 0.83749886

### Data aggregation functions

NumPy provides many aggregation functions, which we won't discuss in detail here.
Most aggregates have a ``NaN``-safe counterpart that computes the result while ignoring missing values, which are marked by the special IEEE floating-point ``NaN`` value.
The following table lists useful aggregation functions available in NumPy:

|Function Name      |   NaN-safe Version  | Description                                   |
|-------------------|---------------------|-----------------------------------------------|
| ``np.sum``        | ``np.nansum``       | Compute sum of elements                       |
| ``np.prod``       | ``np.nanprod``      | Compute product of elements                   |
| ``np.mean``       | ``np.nanmean``      | Compute mean of elements                      |
| ``np.std``        | ``np.nanstd``       | Compute standard deviation                    |
| ``np.var``        | ``np.nanvar``       | Compute variance                              |
| ``np.min``        | ``np.nanmin``       | Find minimum value                            |
| ``np.max``        | ``np.nanmax``       | Find maximum value                            |
| ``np.argmin``     | ``np.nanargmin``    | Find index of minimum value                   |
| ``np.argmax``     | ``np.nanargmax``    | Find index of maximum value                   |
| ``np.median``     | ``np.nanmedian``    | Compute median of elements                    |
| ``np.percentile`` | ``np.nanpercentile``| Compute rank-based statistics of elements     |
| ``np.any``        | N/A                 | Evaluate whether any elements are true        |
| ``np.all``        | N/A                 | Evaluate whether all elements are true        |

Source: Python Data Science Handbook

In [None]:
m = np.random.rand(3,3)
m

array([[0.69208244, 0.68317321, 0.95399747],
       [0.84600935, 0.24768166, 0.67441559],
       [0.24156595, 0.06496905, 0.04510782]])

In [None]:
print(m.mean())
print(np.mean(m))

0.4943336161230445
0.4943336161230445
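Aggregates can also be computed along one axis, and the NaN-safe versions from the table skip missing values. The sketch below uses a small made-up array, `data`, with one `NaN` so the results can be checked by hand.

```python
import numpy as np

# Hypothetical data with one missing value
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, np.nan, 6.0]])

print(np.mean(data))      # nan: a single NaN contaminates the ordinary mean
print(np.nanmean(data))   # mean of the five valid values: 16 / 5

col_means = np.nanmean(data, axis=0)  # one value per column
row_sums = np.nansum(data, axis=1)    # one value per row

print(col_means)
print(row_sums)
```

Here `axis=0` collapses the rows (giving one result per column) and `axis=1` collapses the columns (giving one result per row).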
