### Name :  Ibadullah Hayat
### Section : F23AI - Green
### Roll No :  B23F0001AI010

# Lab 01: Introduction to NumPy and Pandas

### Objective:
Practice basic data manipulation using NumPy and Pandas, two foundational tools for machine learning.

### Task 1 : Student Marks Analysis

In [2]:
import numpy as np

# Given marks of 10 students
marks = np.array([45, 67, 89, 32, 76, 54, 90, 38, 71, 60])
print("Student marks:", marks)

Student marks: [45 67 89 32 76 54 90 38 71 60]


I start by storing the marks in a NumPy array. NumPy is efficient for numerical operations and will let me compute statistics quickly.

In [7]:
# 1.1: Compute average mark
mean_mark = np.mean(marks)
print(f"Average mark: {mean_mark:.1f}")

Average mark: 62.2


The np.mean() function gives the class average, a basic but essential measure of a dataset's central tendency.
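As a small sketch of other summary statistics that sit alongside the mean, the block below also computes the median and the standard deviation of the same marks. The names median_mark and std_mark are my own additions and are not part of the lab task.

```python
import numpy as np

marks = np.array([45, 67, 89, 32, 76, 54, 90, 38, 71, 60])

mean_mark = np.mean(marks)      # arithmetic average
median_mark = np.median(marks)  # middle value of the sorted marks
std_mark = np.std(marks)        # population standard deviation (ddof=0)

print(f"Mean: {mean_mark:.1f}, Median: {median_mark:.1f}, Std: {std_mark:.1f}")
```

Comparing the mean with the median is a quick way to see whether a few very high or very low marks are pulling the average.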

In [8]:
# 1.2: Count students above average
above_avg = marks[marks > mean_mark]
print(f"Students above average: {above_avg} (Total: {len(above_avg)})")

Students above average: [67 89 76 90 71] (Total: 5)


Using boolean indexing (marks > mean_mark), I filter only those scores above the average. This shows how powerful NumPy is for conditional selection.
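To make the boolean indexing step more concrete, here is a minimal sketch that looks at the mask itself. The variable names mask, count_above, and positions are illustrative choices of mine.

```python
import numpy as np

marks = np.array([45, 67, 89, 32, 76, 54, 90, 38, 71, 60])
mean_mark = marks.mean()

# The comparison produces one True/False value per student
mask = marks > mean_mark

# True counts as 1, so summing the mask counts the matching students
count_above = int(mask.sum())

# np.flatnonzero returns the positions where the mask is True
positions = np.flatnonzero(mask)

print(mask)
print(count_above)
print(positions)
```

Counting with mask.sum() avoids building the filtered array when only the number of matches is needed.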

In [None]:
# 1.3: Reshape into 2x5 matrix and find max in each row
reshaped = marks.reshape(2, 5)
print("Reshaped marks (2 rows, 5 columns):")
print(reshaped)

max_per_row = np.max(reshaped, axis=1)
print(f"Max mark in each row: {max_per_row}")

Reshaping helps organize data into meaningful structures (e.g., batches). Using axis=1 in np.max() computes the maximum along each row, which is useful for group-wise analysis.
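The effect of the axis argument is easier to see when both directions are compared side by side. The following is a minimal sketch; col_max and row_argmax are extra names I introduce for illustration.

```python
import numpy as np

marks = np.array([45, 67, 89, 32, 76, 54, 90, 38, 71, 60])
reshaped = marks.reshape(2, 5)

row_max = reshaped.max(axis=1)        # one value per row
col_max = reshaped.max(axis=0)        # one value per column
row_argmax = reshaped.argmax(axis=1)  # column position of each row's maximum

print("Row max:", row_max)
print("Column max:", col_max)
print("Position of row max:", row_argmax)
```

With axis=1 the columns are collapsed and one result per row remains; with axis=0 the rows are collapsed and one result per column remains.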

Reflection: This task taught me how NumPy handles vectorized operations without loops. These skills are crucial for preprocessing features in ML (e.g., normalizing batches of data).
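As a sketch of the normalization mentioned in the reflection, the block below scales the marks without any loops. Min-max scaling and z-score standardization are two common options that I picked as examples; the lab itself does not prescribe either.

```python
import numpy as np

marks = np.array([45, 67, 89, 32, 76, 54, 90, 38, 71, 60])

# Min-max scaling: maps the lowest mark to 0 and the highest to 1
min_max = (marks - marks.min()) / (marks.max() - marks.min())

# Z-score standardization: zero mean and unit standard deviation
z_scores = (marks - marks.mean()) / marks.std()

print(np.round(min_max, 2))
print(np.round(z_scores, 2))
```

Each expression is applied to the whole array at once, which is what "vectorized" means here.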



### Task 2 : Fruit Shop Inventory

In [None]:
import pandas as pd

# Create inventory as a Pandas Series (labeled 1D data)
stock = pd.Series([20, 15, 0, 8], index=['Apples', 'Bananas', 'Cherries', 'Dates'])
print("Initial stock:")
print(stock)

Initial stock:
Apples      20
Bananas     15
Cherries     0
Dates        8
dtype: int64


A Pandas Series is like a dictionary with extra superpowers. Here, fruit names are the labels (index) and quantities are the values, which makes the data intuitive to read.
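The dictionary comparison can be sketched directly: a Series can be built from a dict, looked up by label, and still supports array-style arithmetic. The name doubled is a hypothetical example of mine.

```python
import pandas as pd

# Building the same inventory from a plain dictionary
stock = pd.Series({'Apples': 20, 'Bananas': 15, 'Cherries': 0, 'Dates': 8})

# Dictionary-like behaviour: lookup by label and membership test on labels
bananas = stock['Bananas']
has_cherries = 'Cherries' in stock

# Array-like behaviour: arithmetic applies to every value at once
doubled = stock * 2

print(bananas, has_cherries)
print(doubled)
```

The membership test checks the labels, just as `in` checks the keys of a dictionary.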

In [None]:
# 2.1: Find out-of-stock items
out_of_stock = stock[stock == 0]
print("\nOut of stock:")
print(out_of_stock)

Filtering with stock == 0 returns only the items with zero quantity. Because the result keeps its fruit labels, the out-of-stock products are identified by name.
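Since the filtered Series keeps its labels, the product names can be pulled out as a plain list. The sketch below also shows a low-stock check; the threshold of 10 is a made-up value for illustration only.

```python
import pandas as pd

stock = pd.Series([20, 15, 0, 8], index=['Apples', 'Bananas', 'Cherries', 'Dates'])

# Names of items with zero quantity
out_of_stock_names = stock[stock == 0].index.tolist()

# Hypothetical reorder threshold (assumption, not part of the lab)
LOW_STOCK_THRESHOLD = 10
low_stock = stock[stock < LOW_STOCK_THRESHOLD]

print(out_of_stock_names)
print(low_stock)
```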

In [None]:
# 2.2: Update stock after delivery
stock['Apples'] += 5
stock['Bananas'] += 10
stock['Dates'] += 12
print("\nUpdated stock after delivery:")
print(stock)


Updated stock after delivery:
Apples      25
Bananas     25
Cherries     0
Dates       20
dtype: int64


Updating values by label (stock['Apples']) is clean and readable, with no need to remember positions. This mimics real-world inventory updates.

In [None]:
# 2.3: Total inventory
total = stock.sum()
print(f"\nTotal items in stock: {total}")


The .sum() method aggregates all values. Simple, but essential for reporting and monitoring.
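As an alternative sketch of the delivery update and the total, the delivery itself can be stored as a Series and added in one step. The delivery Series and the use of Series.add with fill_value=0 are my own choices here, not the approach used in the cells of this task.

```python
import pandas as pd

stock = pd.Series([20, 15, 0, 8], index=['Apples', 'Bananas', 'Cherries', 'Dates'])

# Hypothetical delivery: Cherries were not delivered, so they are absent
delivery = pd.Series({'Apples': 5, 'Bananas': 10, 'Dates': 12})

# Labels are aligned automatically; fill_value=0 treats a missing label as 0
updated = stock.add(delivery, fill_value=0).astype(int)

total = int(updated.sum())

print(updated)
print(f"Total items in stock: {total}")
```

A plain `stock + delivery` would give NaN for Cherries, because that label is missing from the delivery; fill_value=0 avoids this.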

Reflection: Pandas Series is perfect for 1D labeled data like sensor readings, product counts, or time-series metrics. It’s far more intuitive than raw arrays when labels matter.

### Task 3: Mini Student Records

In [1]:
# Create a DataFrame (2D tabular data)
data = {
    'Name': ['Ali', 'Sara', 'Bilal', 'Hina', 'Usman'],
    'Age': [20, 22, 21, 23, 22],
    'Score': [88, 92, 79, 85, 90]
}
df = pd.DataFrame(data)
print("Student records:")
print(df)

Student records:
    Name  Age  Score
0    Ali   20     88
1   Sara   22     92
2  Bilal   21     79
3   Hina   23     85
4  Usman   22     90


A DataFrame is like an Excel sheet in Python. Each column is a feature (Name, Age, Score) and each row is a student, which makes it ideal for structured datasets.
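A quick way to check the row and column structure described above is to inspect the DataFrame's shape, column types, and individual slices. This is a minimal sketch; score_column and first_row are names I use for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ali', 'Sara', 'Bilal', 'Hina', 'Usman'],
    'Age': [20, 22, 21, 23, 22],
    'Score': [88, 92, 79, 85, 90],
})

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # data type of each column

score_column = df['Score']  # one column, returned as a Series
first_row = df.iloc[0]      # first row by position, also a Series

print(score_column)
print(first_row)
```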

In [4]:
# 3.1: Find top student
top_idx = df['Score'].idxmax()  # Get index of highest score
top_student = df.loc[top_idx]
print("\nTop-performing student:")
print(top_student)


Top-performing student:
Name     Sara
Age        22
Score      92
Name: 1, dtype: object


idxmax() returns the index label of the maximum value (the first one if there is a tie), and .loc[] retrieves the full row for that label. The same pattern can be used to find the best-performing samples in real ML tasks.
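When more than one top student is needed, or when ties matter, idxmax() alone is not enough. The sketch below shows two alternatives I would reach for, nlargest() and a comparison against the maximum; neither is required by the lab.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ali', 'Sara', 'Bilal', 'Hina', 'Usman'],
    'Age': [20, 22, 21, 23, 22],
    'Score': [88, 92, 79, 85, 90],
})

# Top three students, ordered from highest to lowest score
top3 = df.nlargest(3, 'Score')

# Every student who shares the highest score (handles ties)
all_top = df[df['Score'] == df['Score'].max()]

print(top3)
print(all_top)
```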

In [5]:
# 3.2: Average score
avg_score = df['Score'].mean()
print(f"\nClass average score: {avg_score:.1f}")


Class average score: 86.8


Column-wise operations (like df['Score'].mean()) are effortless in Pandas.
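Column-wise operations also extend to several columns or several statistics at once. The following is a small sketch; the names column_means and score_summary are my own.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ali', 'Sara', 'Bilal', 'Hina', 'Usman'],
    'Age': [20, 22, 21, 23, 22],
    'Score': [88, 92, 79, 85, 90],
})

# Mean of each numeric column in one call
column_means = df[['Age', 'Score']].mean()

# Several statistics for a single column
score_summary = df['Score'].agg(['min', 'max', 'mean'])

print(column_means)
print(score_summary)
```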

In [6]:
# 3.3: Students above average
above_avg_df = df[df['Score'] > avg_score]
print("\nStudents scoring above average:")
print(above_avg_df)


Students scoring above average:
    Name  Age  Score
0    Ali   20     88
1   Sara   22     92
4  Usman   22     90


Again, boolean indexing filters rows based on conditions. This is the backbone of data cleaning and subset selection in ML pipelines.
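Boolean conditions can also be combined. As a sketch, the block below keeps students who are above average and at least 22 years old; the age condition is an arbitrary example of mine.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ali', 'Sara', 'Bilal', 'Hina', 'Usman'],
    'Age': [20, 22, 21, 23, 22],
    'Score': [88, 92, 79, 85, 90],
})
avg_score = df['Score'].mean()

# Each condition needs its own parentheses; & means "and", | means "or"
selected = df[(df['Score'] > avg_score) & (df['Age'] >= 22)]

# .loc can filter rows and pick columns in one step
names_only = df.loc[df['Score'] > avg_score, 'Name']

print(selected)
print(names_only)
```

The parentheses are required because & binds more tightly than the comparison operators in Python.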

Reflection: DataFrames are a core tool for exploratory data analysis (EDA). Being able to slice, filter, and aggregate data quickly is essential before feeding it into any ML model.
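Aggregation is the one operation from the reflection that the task did not show per group, so here is a minimal sketch using groupby(). Grouping by Age is only an illustrative choice on this small table.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ali', 'Sara', 'Bilal', 'Hina', 'Usman'],
    'Age': [20, 22, 21, 23, 22],
    'Score': [88, 92, 79, 85, 90],
})

# Average score for each distinct age
avg_by_age = df.groupby('Age')['Score'].mean()

# Number of students in each age group
count_by_age = df.groupby('Age').size()

print(avg_by_age)
print(count_by_age)
```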



## Conclusion

In this lab, we explored NumPy and Pandas, which are essential tools for Machine Learning. Using NumPy, we learned to calculate averages, apply conditions, and reshape data efficiently. With Pandas Series, we managed inventory by checking out-of-stock items, updating stock, and calculating totals. Using Pandas DataFrame, we analyzed student records, identified top scorers, and filtered above-average students. These tasks taught us how to handle, clean, and analyze data effectively. Such operations are the foundation of Machine Learning, as quality data preparation is key before building models.