# 📘 Day 03: NumPy & Pandas
This notebook walks through core data handling skills using NumPy and Pandas.

## 🔢 Section 1: NumPy Essentials

### What is NumPy?
NumPy is a Python library for fast numerical computing, built around the n-dimensional array type `ndarray`.

In [1]:
import numpy as np

### Creating Arrays

In [2]:
arr = np.array([1, 2, 3, 4])
print(arr)

[1 2 3 4]


In [5]:
ones = np.ones((4, 3))
print(ones)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


In [6]:
zeros = np.zeros((2, 3))
ones = np.ones((3, 3))
print("Zeros:", zeros)
print("Ones:", ones)

Zeros: [[0. 0. 0.]
 [0. 0. 0.]]
Ones: [[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


### Array Attributes and Shape

In [10]:
zeros

array([[0., 0., 0.],
       [0., 0., 0.]])

In [9]:
print("Shape:", zeros.shape)
print("Size:", zeros.size)
print("Data type:", zeros.dtype)

Shape: (2, 3)
Size: 6
Data type: float64
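Beyond `shape`, `size`, and `dtype`, a few other attributes are often useful when inspecting an array. The following is a minimal sketch that rebuilds the same `zeros` array; the name `as_int` is only illustrative.

```python
import numpy as np

zeros = np.zeros((2, 3))

print("Dimensions:", zeros.ndim)          # number of axes
print("Bytes per item:", zeros.itemsize)  # float64 uses 8 bytes
print("Total bytes:", zeros.nbytes)       # size * itemsize

# astype returns a converted copy; the original array keeps its dtype
as_int = zeros.astype(np.int32)
print("Converted dtype:", as_int.dtype)
print("Original dtype:", zeros.dtype)
```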


In [22]:
# Read an image with Pillow
from PIL import Image

img = Image.open("887638684070953124469.png.jpeg")

# Print the image size (width, height) and the image object itself
print("Image size:", img.size)
print(img)
# Convert the image to a NumPy array of shape (height, width, channels)
arr = np.array(img)
print("Image array shape:", arr.shape)
print("Image array size:", arr.size)
print("Image array data type:", arr.dtype)
#print("Image array:", arr[:, :3])

# Convert it to grayscale ("L" mode: a single 8-bit channel)
img_gray = img.convert("L")
# Print the grayscale image size and its pixel values
print("Grayscale image size:", img_gray.size)
gray_arr = np.array(img_gray)
print(gray_arr)

Image size: (768, 768)
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=768x768 at 0x104F7D010>
Image array shape: (768, 768, 3)
Image array size: 1769472
Image array data type: uint8
Grayscale image size: (768, 768)
[[ 44  63  97 ... 236 239 239]
 [ 31  56  86 ... 240 243 243]
 [ 25  47  64 ... 249 250 250]
 ...
 [236 236 236 ... 126 128 130]
 [236 236 236 ... 124 125 127]
 [236 236 236 ... 123 124 125]]


### Indexing & Slicing

In [12]:
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print("Matrix:", matrix)
print("Matrix shape:", matrix.shape)
print(matrix[:, 1])  # second column (index 1)

Matrix: [[1 2 3]
 [4 5 6]]
Matrix shape: (2, 3)
[2 5]


In [None]:
print(matrix[:, 0])  # first column (index 0)

[1 4]
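The cells above select single columns. As a short sketch of the other common patterns, the example below picks a row, a block of columns, and a single element from the same `matrix`, and shows that a basic slice is a view onto the original data. The variable names are illustrative.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

first_row = matrix[0, :]        # all columns of row 0
last_two_cols = matrix[:, 1:]   # every row, columns 1 onward
single = matrix[1, 2]           # row 1, column 2

print(first_row)
print(last_two_cols)
print(single)

# Basic slices are views: writing through the slice changes the original array
view = matrix[:, 0]
view[0] = 99
print(matrix)
```

If an independent copy is needed, calling `.copy()` on the slice avoids modifying the original.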


### Broadcasting & Vectorized Operations

In [23]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b)
print(a * 2)
c = a * b
print(c)

[5 7 9]
[2 4 6]
[ 4 10 18]
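The cell above combines arrays of equal shape and multiplies by a scalar. Broadcasting also applies when shapes differ but are compatible. A minimal sketch, with the illustrative names `grid`, `row_offsets`, and `col_offsets`:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

# A length-3 vector is stretched across both rows
row_offsets = np.array([10, 20, 30])       # shape (3,)
shifted_rows = grid + row_offsets
print(shifted_rows)

# A (2, 1) column is stretched across the three columns instead
col_offsets = np.array([[100], [200]])     # shape (2, 1)
shifted_cols = grid + col_offsets
print(shifted_cols)
```

Shapes are compared from the trailing axis backwards; each pair of dimensions must be equal or one of them must be 1.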


### Matrix Operations

In [24]:
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print("Dot Product:\n", np.dot(A, B))

Dot Product:
 [[19 22]
 [43 50]]
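For 2-D arrays, `np.dot` performs matrix multiplication, which is easy to confuse with the element-wise `*` operator. The sketch below contrasts the two on the same `A` and `B`; the names `matmul` and `elementwise` are illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

matmul = A @ B         # matrix product, same result as np.dot(A, B) for 2-D arrays
elementwise = A * B    # multiplies matching entries, not a matrix product

print("Matrix product:\n", matmul)
print("Element-wise product:\n", elementwise)
print("Transpose of A:\n", A.T)
```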


## 🐼 Section 2: Pandas Fundamentals

In [29]:
import pandas as pd

### Creating DataFrames

In [None]:
# DataFrames can also be loaded from CSV, Excel, or JSON files (pd.read_csv, pd.read_excel, pd.read_json)
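As a rough sketch of loading tabular data from a file format, the example below reads CSV text and then round-trips it through JSON. The in-memory strings stand in for files on disk, and the data and names (`csv_text`, `df_csv`, `df_json`) are hypothetical.

```python
import io
import pandas as pd

# In-memory text standing in for a CSV file on disk (hypothetical data)
csv_text = "Name,Age,Place\nAlice,25,New York\nBob,30,Los Angeles\n"
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)

# Round trip through JSON: one record per row
json_text = df_csv.to_json(orient="records")
df_json = pd.read_json(io.StringIO(json_text))
print(df_json)
```

With real files, a path such as `pd.read_csv("data.csv")` would replace the `StringIO` object. `pd.read_excel` works the same way but typically needs an extra engine package installed.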

In [48]:
# Bob's age is the string "aa" here, so the Age column ends up with object dtype
data = {'Name': ['Alice', 'Bob', 'Jai'], 'Age': [25, "aa", 23], 'Place': ['New York', 'Los Angeles', 'Hyderabad']}
df = pd.DataFrame(data)
df

    Name Age        Place
0  Alice  25     New York
1    Bob  aa  Los Angeles
2    Jai  23    Hyderabad


### Exploring Data

In [49]:
df.info()
#df.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    3 non-null      object
 1   Age     3 non-null      object
 2   Place   3 non-null      object
dtypes: object(3)
memory usage: 204.0+ bytes


### Indexing and Filtering

In [50]:
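# Indexing with loc (label-based)
#print(df.loc[1])  # row with label 1 (the second row)
#print(df.loc[1:2])  # rows 1 to 2 (loc slices include the end label)
#print(df.loc[0:2])  # rows 0 to 2

#print(df.loc[0, ['Name', 'Age']])  # row 0, columns Name and Age
# This comparison fails because the Age column contains the string "aa"
print(df.loc[df['Age'] > 25])  # rows where Age > 25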

TypeError: '>' not supported between instances of 'str' and 'int'
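The `TypeError` comes from comparing a string in the `Age` column with an integer. One possible way to handle this is to coerce the column to numbers first. This is a sketch using `pd.to_numeric`, not necessarily how the notebook's data was corrected. The frame name `people` is illustrative.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Jai'], 'Age': [25, 'aa', 23]})

# errors='coerce' turns values that cannot be parsed into NaN
people['Age'] = pd.to_numeric(people['Age'], errors='coerce')
print(people.dtypes)

# Comparisons now work; rows with NaN never satisfy the condition
older = people.loc[people['Age'] > 24]
print(older)
```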

In [43]:
# In this run every Age value is numeric (Bob's age is 30), so comparisons work
df

    Name  Age        Place
0  Alice   25     New York
1    Bob   30  Los Angeles
2    Jai   23    Hyderabad


In [None]:
df[df['Age'] > 26]

  Name  Age        Place
1  Bob   30  Los Angeles


In [None]:
df.loc[0]  # First row


### Handling Missing Values

In [47]:
df2 = pd.DataFrame({'A': [1, None, 3], 'B': [4, 5, None]})
#print(df2)
df2.fillna(0)

     A    B
0  1.0  4.0
1  0.0  5.0
2  3.0  0.0
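Filling with zero is only one option. The sketch below, on the same small frame, counts missing values, drops incomplete rows, and fills gaps with each column's mean. Which strategy fits depends on the data; the result names are illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({'A': [1, None, 3], 'B': [4, 5, None]})

# Count missing values per column
missing_per_column = df2.isna().sum()
print(missing_per_column)

# Keep only rows with no missing values
complete_rows = df2.dropna()
print(complete_rows)

# Replace each missing value with the mean of its column
filled_with_mean = df2.fillna(df2.mean())
print(filled_with_mean)
```

None of these calls modify `df2` in place; each returns a new object.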


### Grouping and Aggregation

In [51]:
df3 = pd.DataFrame({'Dept': ['IT', 'HR', 'IT'], 'Salary': [60000, 50000, 65000]})
df3
#df3.groupby('Dept')['Salary'].mean()

  Dept  Salary
0   IT   60000
1   HR   50000
2   IT   65000


In [52]:
df3.groupby('Dept')['Salary'].mean()

Dept
HR    50000.0
IT    62500.0
Name: Salary, dtype: float64
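A single `groupby` can compute several statistics at once. A minimal sketch on the same department data, using `agg` with a list of function names; `summary` is an illustrative name.

```python
import pandas as pd

df3 = pd.DataFrame({'Dept': ['IT', 'HR', 'IT'], 'Salary': [60000, 50000, 65000]})

# One row per department, one column per statistic
summary = df3.groupby('Dept')['Salary'].agg(['mean', 'max', 'count'])
print(summary)
```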

## ✅ Summary
- NumPy is ideal for numerical data and array-based operations.
- Pandas is excellent for structured tabular data manipulation.
- Practice loading data, exploring, cleaning, and summarizing.

## 🧮 NumPy Advanced Operations

In [25]:
arr = np.arange(12)
print(arr)

[ 0  1  2  3  4  5  6  7  8  9 10 11]


In [26]:
# Array Reshaping and Stacking
arr = np.arange(12)
reshaped = arr.reshape(3, 4)
print("Reshaped:", reshaped)

stacked = np.vstack([reshaped, reshaped])
print("Vertically stacked:\n", stacked)

Reshaped: [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Vertically stacked:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
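A few related reshaping tools complement `reshape` and `vstack`. The sketch below lets NumPy infer one dimension with `-1`, stacks horizontally, and flattens back to 1-D. The names `auto`, `side_by_side`, and `flat` are illustrative.

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to infer that dimension from the array's size (here: 4 columns)
auto = arr.reshape(3, -1)
print("Inferred shape:", auto.shape)

# hstack joins arrays column-wise, so the row count must match
side_by_side = np.hstack([auto, auto])
print("Horizontally stacked shape:", side_by_side.shape)

# ravel flattens back to one dimension
flat = auto.ravel()
print("Flattened:", flat)
```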


In [27]:
# Logical operations
print("Elements > 5:", arr[arr > 5])

Elements > 5: [ 6  7  8  9 10 11]


In [None]:
# Train/test split: out of 100 samples, use 80 for training and 20 for testing
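The note above refers to dividing 100 samples into 80 for training and 20 for testing. One way this could be done with NumPy alone is to shuffle the indices and slice them. This is a sketch under that assumption; `samples` is a stand-in for real data, and libraries such as scikit-learn offer ready-made helpers for the same task.

```python
import numpy as np

rng = np.random.default_rng(42)   # seeded generator so the split is reproducible
samples = np.arange(100)          # stand-in for 100 data points

# Shuffle first so the split does not depend on the original order
shuffled = rng.permutation(samples)
split_at = int(0.8 * len(shuffled))

train = shuffled[:split_at]
test = shuffled[split_at:]
print("Train size:", len(train))
print("Test size:", len(test))
```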

In [28]:
# Random number generation
np.random.seed(42)
rand_arr = np.random.rand(3, 3)
print("Random Array:\n", rand_arr)

Random Array:
 [[0.37454012 0.95071431 0.73199394]
 [0.59865848 0.15601864 0.15599452]
 [0.05808361 0.86617615 0.60111501]]


## 📑 More with Pandas

In [55]:
# Creating DataFrames from lists of dicts
data = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 17, 'City': 'London'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Paris'}
]
df = pd.DataFrame(data)
df

      Name  Age      City
0    Alice   25  New York
1      Bob   17    London
2  Charlie   35     Paris


In [54]:
# Sorting values
df.sort_values(by='Age', ascending=False)

      Name  Age      City
2  Charlie   35     Paris
0    Alice   25  New York
1      Bob   17    London


In [56]:
# Adding new columns
df['Is_Adult'] = df['Age'] >= 18  # 18 and over counts as adult
df

      Name  Age      City  Is_Adult
0    Alice   25  New York      True
1      Bob   17    London     False
2  Charlie   35     Paris      True


In [59]:
# Filtering with multiple conditions
df[(df['Age'] > 16) | (df['City'] == 'London')]

      Name  Age      City  Is_Adult
0    Alice   25  New York      True
1      Bob   17    London     False
2  Charlie   35     Paris      True
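The filter above uses `|` (OR), which here matches every row. For comparison, this sketch combines conditions with `&` (AND) and uses `isin` for membership tests. The frame is rebuilt under the illustrative name `people` so the example runs on its own.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 17, 35],
    'City': ['New York', 'London', 'Paris'],
})

# Both conditions must hold; each one needs its own parentheses
adults_outside_london = people[(people['Age'] >= 18) & (people['City'] != 'London')]
print(adults_outside_london)

# isin checks membership in a list of allowed values
in_europe = people[people['City'].isin(['London', 'Paris'])]
print(in_europe)
```

The parentheses matter because `&` and `|` bind more tightly than comparison operators in Python.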


In [61]:
# Merging DataFrames
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2], 'Score': [85, 90]})
print(df1)
print(df2)
merged = pd.merge(df1, df2, on='ID')
merged

   ID   Name
0   1  Alice
1   2    Bob
   ID  Score
0   1     85
1   2     90


   ID   Name  Score
0   1  Alice     85
1   2    Bob     90
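In the merge above every `ID` appears in both frames, so the join type does not matter. The sketch below uses hypothetical frames `left` and `right` with partly different IDs to show how the `how` parameter changes the result.

```python
import pandas as pd

left = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
right = pd.DataFrame({'ID': [1, 2, 4], 'Score': [85, 90, 70]})

# Default inner join: only IDs present in both frames
inner = pd.merge(left, right, on='ID')
print(inner)

# Left join: every row of `left`, with NaN where no score exists
left_join = pd.merge(left, right, on='ID', how='left')
print(left_join)

# Outer join: all IDs from both sides; indicator adds a _merge column showing the source
outer = pd.merge(left, right, on='ID', how='outer', indicator=True)
print(outer)
```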


In [62]:
# Pivot tables
sales = pd.DataFrame({
    'Region': ['East', 'West', 'East', 'West'],
    'Product': ['A', 'A', 'B', 'B'],
    'Sales': [100, 150, 200, 250]
})
print(sales)
sales.pivot_table(values='Sales', index='Region', columns='Product', aggfunc='sum')


  Region Product  Sales
0   East       A    100
1   West       A    150
2   East       B    200
3   West       B    250


Product    A    B
Region
East     100  200
West     150  250
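A pivot table is closely related to grouping. As a sketch, the same layout can be produced with `groupby` followed by `unstack`, and `margins=True` adds row and column totals to the pivot. The names `by_group` and `with_totals` are illustrative.

```python
import pandas as pd

sales = pd.DataFrame({
    'Region': ['East', 'West', 'East', 'West'],
    'Product': ['A', 'A', 'B', 'B'],
    'Sales': [100, 150, 200, 250]
})

# Group by both keys, then move Product from the index into the columns
by_group = sales.groupby(['Region', 'Product'])['Sales'].sum().unstack('Product')
print(by_group)

# margins=True appends an "All" row and column holding the totals
with_totals = sales.pivot_table(values='Sales', index='Region', columns='Product',
                                aggfunc='sum', margins=True)
print(with_totals)
```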


## 🧠 Summary
- Master reshaping, stacking, and logical operations in NumPy.
- In Pandas, learn sorting, filtering, merging, and pivoting.
- Use groupby and aggregation to summarize complex data.