
**DAY 1: NumPy & Data Preprocessing for ML**

**NumPy:**
1. The core library for scientific and numerical computing in Python.
2. Provides a powerful N-dimensional array object (ndarray).
3. Includes a collection of tools for efficient operations on large datasets.

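**Importance:**
1. Performance: vectorized operations run in compiled code instead of Python loops.
2. Foundation for ML: libraries such as pandas and scikit-learn build on NumPy arrays.
3. Data preprocessing: features can be reshaped, filtered, and scaled before training.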

In [1]:
!pip install numpy  # install NumPy into the current environment



In [2]:
import numpy as np

Creating a 1D array

In [13]:
X = np.array([1.2, 3.4])

Modifying an element

In [14]:
X[0] = 2  # X is a float array, so the integer 2 is stored as 2.0

In [15]:
X

array([2. , 3.4])

Creating a 2D array

In [19]:
X = np.array([[1.2, 3.4], [5.6, 7.8]])

In [20]:
X[1, 0] = 1  # row 1, column 0 (same effect as X[1][0], in a single indexing step)

In [21]:
X

array([[1.2, 3.4],
       [1. , 7.8]])

**Array Attributes:**

shape → Tuple giving the length of each axis; for a 2D array, (rows, columns).

ndim → Number of dimensions (axes).

size → Total number of elements in the array.

dtype → Data type of elements (e.g., int32, float64)

In [24]:
X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

In [26]:
X.shape  # tuple giving the length of each axis: (rows, columns)

(3, 3)

In [27]:
X.ndim  # number of dimensions (axes)

2

In [28]:
X.size  # total number of elements in the array

9

In [29]:
X.dtype  # data type of the elements (e.g., int32, float64)

dtype('int64')

**Reshape**

Reshape changes the shape of an array without changing its data. The total number of elements must stay the same.

In [30]:
a = np.arange(12)

In [31]:
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [32]:
a.shape

(12,)

In [33]:
b = a.reshape(3, 4)  # reshape into 2D (3 rows, 4 columns)

In [34]:
b

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
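
The reshape behaviour above can be explored a little further. The following is a minimal sketch, not part of the original notebook: it assumes the same `a = np.arange(12)` and illustrates the `-1` placeholder, the fact that reshaping a contiguous array typically returns a view, and the error raised for an incompatible shape.

```python
import numpy as np

a = np.arange(12)

# -1 asks NumPy to infer that dimension from the array's size (12 / 3 = 4)
b = a.reshape(3, -1)
print(b.shape)  # (3, 4)

# Reshaping a contiguous array returns a view, so both names share memory
b[0, 0] = 99
print(a[0])  # 99
print(np.shares_memory(a, b))  # True

# 12 elements cannot be arranged into 5 equal rows, so this raises ValueError
try:
    a.reshape(5, -1)
except ValueError as err:
    print("reshape failed:", err)
```

Because the result can be a view, writing to the reshaped array may also change the original; call `.copy()` when an independent array is needed.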

**Flatten**

Flatten converts a multi-dimensional array into a 1D array.

flatten() → Always returns a copy (independent of original).


In [35]:
flat1 = X.flatten()  # collapse the 2D array into 1D, row by row

In [36]:
flat1

array([1, 2, 3, 4, 5, 6, 7, 8, 9])
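
To make the "always returns a copy" point concrete, here is a small sketch that contrasts `flatten()` with `ravel()`. The comparison with `ravel()` is an addition for illustration and does not appear in the original notebook; the variable names `flat_copy` and `flat_view` are hypothetical.

```python
import numpy as np

X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

flat_copy = X.flatten()  # always a new, independent array
flat_view = X.ravel()    # a view when the memory layout allows it (it does here)

flat_copy[0] = 100       # does not touch X
flat_view[1] = 200       # writes through to X

print(X[0].tolist())                   # [1, 200, 3]
print(np.shares_memory(X, flat_copy))  # False
print(np.shares_memory(X, flat_view))  # True
```

`flatten()` is the safer choice when the original data must stay untouched; `ravel()` can avoid a copy on large arrays.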

**Split**

Splitting breaks an array into multiple sub-arrays.

Horizontal split (hsplit) → Split columns.

Vertical split (vsplit) → Split rows.

split() → General version; the axis argument chooses the direction.

In [39]:
Y = np.arange(16).reshape(4, 4)  # 4x4 array holding 0 to 15

**Vertical split (rows)**

In [40]:
top, bottom = np.vsplit(Y, [2])  # split after 2 rows

**Horizontal split (columns)**

In [None]:
left, right = np.hsplit(Y, [2])  # split after 2 columns
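
Since the split cells above do not display their results, the sketch below prints the resulting shapes and shows how the general `np.split` relates to `vsplit` and `hsplit`. It assumes the same 4x4 array `Y`; the names `parts`, `train`, and `holdout` are hypothetical and used only for illustration.

```python
import numpy as np

Y = np.arange(16).reshape(4, 4)

# A list of indices gives the cut points: [2] cuts once, before index 2
top, bottom = np.vsplit(Y, [2])
left, right = np.hsplit(Y, [2])
print(top.shape, bottom.shape)   # (2, 4) (2, 4)
print(left.shape, right.shape)   # (4, 2) (4, 2)

# np.split is the general form; axis=0 matches vsplit, axis=1 matches hsplit
assert np.array_equal(np.split(Y, [2], axis=0)[0], top)
assert np.array_equal(np.split(Y, [2], axis=1)[1], right)

# An integer asks for that many equal parts instead of cut points
parts = np.split(Y, 4, axis=1)
print(len(parts), parts[0].shape)  # 4 (4, 1)

# Uneven cut points work too, e.g. a 3-row / 1-row split
train, holdout = np.vsplit(Y, [3])
print(train.shape, holdout.shape)  # (3, 4) (1, 4)
```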

**Selection in Feature Engineering**

These indexing methods let us extract or manipulate subsets of features, such as:

Selecting only specific features for training.

Filtering out samples that do not meet conditions.

Creating new feature sets by combining columns.

In [41]:
X = np.array([[10, 20, 30],
              [40, 50, 60],
              [70, 80, 90]])

In [42]:
X[0]  # first row (one sample)

array([10, 20, 30])

In [43]:
X[:, 0]  # first column (feature 0 of every sample)

array([10, 40, 70])

In [None]:
X[[0, 2]]  # fancy indexing: select rows 0 and 2

In [44]:
mask = X[:, 0] > 30  # select rows where first feature > 30

In [45]:
mask

array([False,  True,  True])

In [46]:
X[mask]

array([[40, 50, 60],
       [70, 80, 90]])
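
The three use cases listed for feature engineering can be sketched together as follows. This is an illustrative example on the same `X`, not code from the original notebook: the combined condition and the `ratio` feature are assumptions chosen only to show the indexing patterns.

```python
import numpy as np

X = np.array([[10, 20, 30],
              [40, 50, 60],
              [70, 80, 90]])

# Select specific feature columns (0 and 2) for every sample
selected = X[:, [0, 2]]

# Filter samples with a combined condition; note the & operator and the parentheses
mask = (X[:, 0] > 30) & (X[:, 2] < 90)
filtered = X[mask]

# Build a new feature (ratio of column 1 to column 0) and append it as a column
ratio = X[:, 1] / X[:, 0]
X_new = np.column_stack([X, ratio])

print(selected.tolist())  # [[10, 30], [40, 60], [70, 90]]
print(filtered.tolist())  # [[40, 50, 60]]
print(X_new.shape)        # (3, 4)
```

Boolean conditions on arrays are combined with `&` and `|` rather than `and` and `or`, and each condition needs its own parentheses.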

**Broadcasting**

Broadcasting allows operations between arrays of different but compatible shapes: NumPy virtually stretches the smaller array to match the larger one, without explicit loops or copies.

In [47]:
A = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2,3)

B = np.array([10, 20, 30])  # shape (3,)

print(A + B)


[[11 22 33]
 [14 25 36]]
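
The example above broadcasts a row across the rows of `A`. The sketch below, which assumes the same `A`, adds a column-shaped array and an incompatible shape to show how the rule works in each direction. The names `row` and `col` are hypothetical.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)

row = np.array([10, 20, 30])     # shape (3,): treated as (1, 3), repeated down the rows
col = np.array([[100], [200]])   # shape (2, 1): repeated across the columns

print((A + row).tolist())  # [[11, 22, 33], [14, 25, 36]]
print((A + col).tolist())  # [[101, 102, 103], [204, 205, 206]]

# Shapes are compared from the right; (2, 3) and (2,) do not line up
try:
    A + np.array([10, 20])
except ValueError as err:
    print("broadcast failed:", err)

# np.broadcast reports the result shape without doing the arithmetic
print(np.broadcast(A, col).shape)  # (2, 3)
```

Two dimensions are compatible when they are equal or when one of them is 1; a missing leading dimension is treated as 1.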


**Universal Functions (ufuncs)**

Ufuncs are highly optimized NumPy functions that operate element-wise on arrays, so no explicit Python loops are needed.

In [48]:
log_X = np.log1p(X)   # log(1 + x), avoids log(0)

In [49]:
log_X

array([[2.39789527, 3.04452244, 3.4339872 ],
       [3.71357207, 3.93182563, 4.11087386],
       [4.26267988, 4.39444915, 4.51085951]])
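
As a quick check of what `np.log1p` does, the sketch below compares it with an explicit loop and reverses it with `np.expm1`. It assumes the same `X` as above; the inverse transform is an addition for illustration and is not part of the original notebook.

```python
import numpy as np

X = np.array([[10, 20, 30],
              [40, 50, 60],
              [70, 80, 90]])

log_X = np.log1p(X)

# np.expm1 computes exp(x) - 1, so it undoes np.log1p
restored = np.expm1(log_X)
print(np.allclose(restored, X))  # True

# log1p is defined at zero, where plain log would give -inf
print(np.log1p(0.0))  # 0.0

# The ufunc matches an explicit Python loop, element by element
looped = np.array([[np.log(1 + v) for v in row] for row in X])
print(np.allclose(log_X, looped))  # True
```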

In [52]:
exp_X = np.exp(X)

In [53]:
exp_X

array([[2.20264658e+04, 4.85165195e+08, 1.06864746e+13],
       [2.35385267e+17, 5.18470553e+21, 1.14200739e+26],
       [2.51543867e+30, 5.54062238e+34, 1.22040329e+39]])

In [54]:
sqrt_X = np.sqrt(X)

In [55]:
sqrt_X

array([[3.16227766, 4.47213595, 5.47722558],
       [6.32455532, 7.07106781, 7.74596669],
       [8.36660027, 8.94427191, 9.48683298]])

**Aggregations**

Methods such as mean, std, min, and max reduce an array to summary values. The axis argument selects the direction: axis=0 collapses the rows (one result per column), axis=1 collapses the columns (one result per row).

In [60]:
X.mean()  # mean over all elements

np.float64(50.0)

In [61]:
X.mean(axis=1)  # mean of each row (averages across the columns)

array([20., 50., 80.])

In [62]:
X.mean(axis=0)  # mean of each column (one value per feature)

array([40., 50., 60.])

In [63]:
X.std(axis=0)  # population standard deviation (ddof=0) of each column

array([24.49489743, 24.49489743, 24.49489743])

In [64]:
X.min(axis=0)

array([10, 20, 30])

In [65]:
X.max(axis=0)

array([70, 80, 90])
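
The per-column statistics above are the building blocks of two common preprocessing steps. The following is a minimal sketch of standardization and min-max scaling, assuming each column of `X` is a feature and no column is constant (a constant column would cause a division by zero). The names `X_std` and `X_minmax` are hypothetical.

```python
import numpy as np

X = np.array([[10, 20, 30],
              [40, 50, 60],
              [70, 80, 90]], dtype=float)

# Standardization: subtract each column's mean, divide by its standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Min-max scaling: map each column onto the range [0, 1]
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_minmax = (X - col_min) / (col_max - col_min)

print(X_std.mean(axis=0))   # approximately 0 for every column
print(X_std.std(axis=0))    # approximately 1 for every column
print(X_minmax.tolist())    # [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0]]
```

Both transforms rely on broadcasting: the (3,) arrays of column statistics are applied to every row of the (3, 3) matrix without a loop.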