## What is NumPy (in a data analysis context)?

###### NumPy = Numerical Python
###### It gives you a powerful object: the ndarray (NumPy array)
###### It is faster than normal Python lists for numerical work
###### It supports vectorized operations (do math on entire arrays at once)
###### It is the base that Pandas, scikit-learn, Matplotlib, etc. build on

In [1]:
import numpy as np

## Step 1: Represent data as arrays

###### In data analysis, everything starts with data. NumPy helps you store it in a clean, efficient array structure.

In [2]:
marks = np.array([70, 82, 90, 65, 88])
print(marks)
print(type(marks))
print(marks.shape)

[70 82 90 65 88]
<class 'numpy.ndarray'>
(5,)
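Besides `shape`, an array carries a few other attributes that are worth checking early. The sketch below reuses the `marks` array from above; the `mixed` array is a made-up example added only to show how NumPy picks a single data type for the whole array.

```python
import numpy as np

marks = np.array([70, 82, 90, 65, 88])

print(marks.ndim)   # number of dimensions: 1
print(marks.size)   # total number of elements: 5
print(marks.dtype)  # an integer type; the exact width depends on the platform

# Hypothetical example: mixing integers and floats upcasts everything to float64
mixed = np.array([70, 82.5, 90])
print(mixed.dtype)  # float64
```

Every element of an array shares one `dtype`, which is a large part of why arrays are more compact and faster than lists.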


## Step 2: Work with 2D data (like tables)

###### Real datasets look like tables (rows & columns). NumPy handles that with 2D arrays.
###### Example: 3 students, 3 subjects

In [3]:
data = np.array([
    [70, 80, 90],
    [60, 75, 85],
    [88, 92, 79]
])

print(data.shape)

(3, 3)
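Once the data is in a 2D array, rows and columns can be pulled out with indexing. A minimal sketch, assuming (as in the example above) that each row is a student and each column is a subject; the variable names are only illustrative.

```python
import numpy as np

data = np.array([
    [70, 80, 90],
    [60, 75, 85],
    [88, 92, 79]
])

first_student = data[0]       # row 0: all subjects for the first student
second_subject = data[:, 1]   # column 1: one subject across all students
single_mark = data[2, 0]      # row 2, column 0: a single value

print(first_student)   # [70 80 90]
print(second_subject)  # [80 75 92]
print(single_mark)     # 88
```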


## Step 3: Basic descriptive statistics

###### First thing in analysis: “Understand the data.”
###### NumPy gives built-in functions for: mean, median, min, max, standard deviation, etc.

In [4]:
print("Overall mean:", data.mean())
print("Per student mean:", data.mean(axis=1))
print("Per subject mean:", data.mean(axis=0))

print("Min:", data.min())
print("Max:", data.max())
print("Std dev:", data.std())


Overall mean: 79.88888888888889
Per student mean: [80.         73.33333333 86.33333333]
Per subject mean: [72.66666667 82.33333333 84.66666667]
Min: 60
Max: 92
Std dev: 9.768935395703675
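The median is mentioned above but not shown, and `std()` has a default worth knowing: it divides by n (population standard deviation). Passing `ddof=1` divides by n - 1 instead (sample standard deviation). A short sketch on the same `data` table:

```python
import numpy as np

data = np.array([
    [70, 80, 90],
    [60, 75, 85],
    [88, 92, 79]
])

# The median is a function in the np namespace, not an array method
print("Overall median:", np.median(data))              # 80.0
print("Per subject median:", np.median(data, axis=0))  # [70. 80. 85.]

# ddof=0 (default) divides by n; ddof=1 divides by n - 1
population_std = data.std()
sample_std = data.std(ddof=1)
print("Population std:", population_std)
print("Sample std:", sample_std)
```

The sample version is always a little larger, and the gap matters most for small datasets like this one.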


## Step 4: Filtering and cleaning data (Boolean indexing)

###### In data analysis, a big job is cleaning:
###### Remove invalid values, select a subset of data, handle missing data.
###### NumPy’s Boolean indexing is super useful here.
###### Example: select marks > 80

In [5]:
marks = np.array([70, 82, 90, 65, 88])

high_marks = marks[marks > 80]
print(high_marks)

[82 90 88]
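Filters can also be combined. The sketch below uses the same `marks` array; the 70 to 90 range and the "high"/"low" labels are made-up thresholds for illustration, not part of the original example.

```python
import numpy as np

marks = np.array([70, 82, 90, 65, 88])

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
mid_range = marks[(marks >= 70) & (marks < 90)]
print(mid_range)  # [70 82 88]

# A Boolean mask can be summed to count how many values match
print((marks > 80).sum())  # 3

# np.where picks one of two values per element, based on the condition
labels = np.where(marks >= 80, "high", "low")
print(labels)  # ['low' 'high' 'high' 'low' 'high']
```

Python's `and` / `or` keywords do not work element-wise on arrays, which is why `&` and `|` are used here.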


###### Example: handle missing values (NaN)

In [7]:
data = np.array([10, 20, np.nan, 40, 50])

# Check which are NaN
print(np.isnan(data))

# Filter out NaN
clean_data = data[~np.isnan(data)]
print(clean_data)

print("Mean without NaN:", clean_data.mean())

[False False  True False False]
[10. 20. 40. 50.]
Mean without NaN: 30.0
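Dropping NaN values is not the only option. NumPy also has NaN-aware functions such as `np.nanmean`, and a missing value can be filled in instead of removed. A small sketch on the same array; filling with the mean is just one possible strategy, chosen here for illustration.

```python
import numpy as np

data = np.array([10, 20, np.nan, 40, 50])

# A plain mean is "poisoned" by a single NaN
print(data.mean())       # nan

# np.nanmean ignores NaN values
print(np.nanmean(data))  # 30.0

# Replace NaN with the mean of the valid values, keeping the array length
filled = np.where(np.isnan(data), np.nanmean(data), data)
print(filled)  # [10. 20. 30. 40. 50.]
```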


## Step 5: Vectorized operations (no loops)

In [8]:
arr = np.array([1, 2, 3, 4])
new_arr = arr * 2
print(new_arr)

[2 4 6 8]
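To see what "no loops" means, it helps to compare with a plain Python list. The sketch below is only an illustration; `prices` and `quantities` are made-up arrays.

```python
import numpy as np

values = [1, 2, 3, 4]

# On a list, * 2 repeats the list instead of doubling the numbers
print(values * 2)  # [1, 2, 3, 4, 1, 2, 3, 4]

# Doubling a list needs an explicit loop (or a comprehension)
doubled_loop = [v * 2 for v in values]

# On an array, the same operation applies to every element at once
arr = np.array(values)
doubled_vec = arr * 2
print(doubled_vec)  # [2 4 6 8]

# Operations between two arrays of the same shape are element-wise too
prices = np.array([10.0, 20.0, 5.0])   # hypothetical data
quantities = np.array([3, 1, 4])       # hypothetical data
totals = prices * quantities
print(totals)  # [30. 20. 20.]
```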


###### Example: Normalize marks (0 to 1 scale)

In [9]:
marks = np.array([70, 82, 90, 65, 88])

min_v = marks.min()
max_v = marks.max()

normalized = (marks - min_v) / (max_v - min_v)
print(normalized)

[0.2  0.68 1.   0.   0.92]
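The same formula extends to a table, where each column (subject) is usually scaled on its own. A minimal sketch, assuming the 3 students × 3 subjects table from Step 2; it relies on broadcasting, where the per-column minimum and maximum are applied to every row automatically.

```python
import numpy as np

data = np.array([
    [70, 80, 90],
    [60, 75, 85],
    [88, 92, 79]
])

col_min = data.min(axis=0)  # [60 75 79]
col_max = data.max(axis=0)  # [88 92 90]

# Shapes (3, 3) and (3,) broadcast, so each column uses its own min and max
normalized = (data - col_min) / (col_max - col_min)
print(normalized.round(2))
```

Note that a column with identical values would give `col_max - col_min == 0` and a division by zero, so real data may need a check for that case.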


## Step 6: Matrix operations (for ML & stats)

###### A lot of advanced analysis (regression, PCA, etc.) uses linear algebra:
###### vectors, matrices, dot products.
###### NumPy makes this easy.
###### Example: dot product (weights × features)

In [10]:
# 3 samples, 2 features
X = np.array([
    [1.0, 2.0],
    [2.0, 3.0],
    [3.0, 4.0]
])

# weights: 2 features
w = np.array([0.5, 1.0])

# prediction: Xw
y_pred = X.dot(w)
print(y_pred)


[2.5 4.  5.5]
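The `@` operator is an equivalent and often more readable way to write the same product. The sketch below reuses `X` and `w`; the `bias` term is a hypothetical addition, included only to show how a linear model's intercept would fit in.

```python
import numpy as np

# 3 samples, 2 features
X = np.array([
    [1.0, 2.0],
    [2.0, 3.0],
    [3.0, 4.0]
])
w = np.array([0.5, 1.0])

y_dot = X.dot(w)
y_at = X @ w  # same result as X.dot(w)
print(y_at)   # [2.5 4.  5.5]

# Shapes: (3, 2) @ (2,) -> (3,), one prediction per sample
print(X.shape, w.shape, y_at.shape)

# Hypothetical intercept, broadcast across all predictions
bias = 1.0
y_with_bias = X @ w + bias
print(y_with_bias)  # [3.5 5.  6.5]
```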


## Step 7: Reshaping, stacking, splitting data

###### You often need to combine datasets, split train/test sets, reshape 1D to 2D, etc.

#### Reshape

In [11]:
arr = np.array([1, 2, 3, 4, 5, 6])
reshaped = arr.reshape(2, 3)
print(reshaped)
# [[1 2 3]
#  [4 5 6]]


[[1 2 3]
 [4 5 6]]
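When only one dimension is known, `-1` lets NumPy work out the other. This is handy for the "1D to 2D" case, for example turning a flat array into a single column. A short sketch with the same `arr`:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# -1 means "infer this dimension from the array's size"
column = arr.reshape(-1, 1)      # 6 rows, 1 column
three_rows = arr.reshape(3, -1)  # 3 rows, 2 columns

print(column.shape)      # (6, 1)
print(three_rows.shape)  # (3, 2)

# ravel() flattens back to 1D
flat = three_rows.ravel()
print(flat)  # [1 2 3 4 5 6]
```

The total number of elements has to stay the same, so `arr.reshape(4, 2)` would raise a `ValueError` here.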


#### Stack arrays (combine data)

In [12]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

v = np.vstack([a, b])  # vertical
h = np.hstack([a, b])  # horizontal

print(v)
# [[1 2 3]
#  [4 5 6]]

print(h)
# [1 2 3 4 5 6]


[[1 2 3]
 [4 5 6]]
[1 2 3 4 5 6]
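Splitting is the reverse of stacking. The sketch below shows `np.split` for equal parts and one possible way to do a shuffled train/test split with plain indexing. The 10 × 2 array `X`, the 80/20 ratio, and the seed are assumptions made up for this example.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# Split into 3 equal parts
parts = np.split(arr, 3)
print(parts)  # three arrays: [1 2], [3 4], [5 6]

# Hypothetical dataset: 10 samples, 2 features
X = np.arange(20).reshape(10, 2)

# Shuffle the row indices, then cut at 80%
rng = np.random.default_rng(seed=0)
indices = rng.permutation(len(X))
split_at = int(0.8 * len(X))

train = X[indices[:split_at]]
test = X[indices[split_at:]]

print(train.shape)  # (8, 2)
print(test.shape)   # (2, 2)
```

`np.split` raises an error if the array cannot be divided evenly; `np.array_split` allows uneven parts.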
