# Assignment 2: NumPy and Pandas  
**Name:** Aarchi Khandelwal   
**Date:** 11 February 2026

In [None]:
import numpy as np
import pandas as pd

*1. Difference between Python List and NumPy Array*

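=> A Python list can store elements of different data types. Each element is a
separate Python object that the list only references, so a list uses more memory
per element, and element-wise arithmetic needs an explicit loop or comprehension,
which is slow for numerical computations.

A NumPy array stores elements of a single data type (its dtype) in one contiguous
block of memory. It uses less memory per element and runs element-wise operations
in compiled code, so numerical calculations are much faster.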



In [None]:
list1 = [2, 4, 6]
list2 = [8, 1, 7]

list_result = [a * b for a, b in zip(list1, list2)]  # Element-wise multiplication using list
print("List multiplication:", list_result)

array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

array_result = array1 * array2  # Element-wise multiplication using NumPy
print("NumPy array multiplication:", array_result)


List multiplication: [16, 4, 42]
NumPy array multiplication: [ 4 10 18]
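
The memory and data-type claims above can be checked with a small sketch. The variable names (`values`, `list_bytes`, `array_bytes`, `mixed`) are made up for this illustration, and the list figure is only an approximation because `sys.getsizeof` reports per-object sizes that vary between Python versions.

```python
import sys
import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int64)

# A list holds references to 1000 separate Python int objects,
# so we add the size of the list itself and of every element.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# An array holds raw 8-byte integers in one contiguous buffer.
array_bytes = arr.nbytes

print("list bytes (approx.):", list_bytes)
print("array bytes:", array_bytes)

# Mixed inputs are converted to one common dtype (here float64).
mixed = np.array([1, 2.5, 3])
print("dtype of mixed array:", mixed.dtype)
```

The exact list size depends on the interpreter, but it should come out several times larger than the array's buffer.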


*2. What is Broadcasting in NumPy?*

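=> Broadcasting is a feature in NumPy that allows arrays of different shapes
to be used together in arithmetic operations. NumPy compares the shapes starting
from the last dimension; two dimensions are compatible when they are equal or
when one of them is 1. The smaller array is then virtually stretched along the
dimensions of size 1 so that both arrays have the same shape, without actually
copying data.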


In [None]:
# Task: create a 3×3 NumPy array and add a 1D array to it using broadcasting,
# then explain how NumPy applies the operation internally.

arr2d = np.array([[1, 2, 3],[4, 5, 6],[7, 8, 9]])

arr1d = np.array([10, 20, 30])

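# arr1d has shape (3,); NumPy treats it as shape (1, 3) and virtually repeats it
# along axis 0, so [10, 20, 30] is added to every row of arr2d.
result = arr2d + arr1d
print(result)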


[[11 22 33]
 [14 25 36]
 [17 28 39]]
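
To make the "expansion without copying" visible, the sketch below uses `np.broadcast_to` to build the stretched view by hand. This is only an illustration: the addition `arr2d + arr1d` does not need this call, and the name `expanded` is introduced here for the example.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr1d = np.array([10, 20, 30])

# Stretch the (3,) array to shape (3, 3) as a read-only view.
expanded = np.broadcast_to(arr1d, arr2d.shape)
print(expanded)

# A stride of 0 along axis 0 means every row points at the same memory,
# so no data was copied to create the three rows.
print("strides:", expanded.strides)
print("shares memory with arr1d:", np.shares_memory(expanded, arr1d))

# Adding the explicit view gives the same result as plain broadcasting.
print(np.array_equal(arr2d + arr1d, arr2d + expanded))
```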


*3. What are missing values in Pandas and how are they represented?
Create a DataFrame with missing values and write code to: 
• Detect missing values 
• Replace them with the column mean 
Explain each step.*

=> Missing values are entries for which no data is available. Pandas represents
them as NaN (np.nan) in numeric columns; None, NaT (for dates) and pd.NA are also
treated as missing.

STEPS:

1. A DataFrame is created containing some NaN values.
2. isnull() returns a DataFrame of the same shape with True wherever a value is missing.
3. mean() calculates the mean of each column, skipping NaN values by default.
4. fillna() replaces each missing value with the mean of its own column.



In [None]:
# DataFrame with missing values
data = {
    'Marks': [80, 90, np.nan, 70],
    'Age': [20, np.nan, 22, 21]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Detect missing values
print("\nDetecting missing values:")
print(df.isnull())

# Replace missing values with column mean
df_filled = df.fillna(df.mean()) 
print("\nDataFrame after replacing missing values with mean:")
print(df_filled)


Original DataFrame:
   Marks   Age
0   80.0  20.0
1   90.0   NaN
2    NaN  22.0
3   70.0  21.0

Detecting missing values:
   Marks    Age
0  False  False
1  False   True
2   True  False
3  False  False

DataFrame after replacing missing values with mean:
   Marks   Age
0   80.0  20.0
1   90.0  21.0
2   80.0  22.0
3   70.0  21.0
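
As a small follow-up sketch, the same steps can be stored in intermediate variables to count the missing values per column and to confirm that nothing is missing after filling. The names `missing_per_column` and `column_means` are chosen for this illustration, and `None` is used in one column to show that Pandas also stores it as NaN in a numeric column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Marks': [80, 90, np.nan, 70],
    'Age': [20, None, 22, 21],  # None becomes NaN in a numeric column
})

# isnull() gives True/False per cell; sum() counts the True values per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# mean() ignores NaN by default, so each mean uses only the known values.
column_means = df.mean()
print(column_means)

# fillna() matches the means to the columns by column name.
df_filled = df.fillna(column_means)
print("Missing values left:", df_filled.isnull().sum().sum())
```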


*4. Explain boolean indexing in NumPy or Pandas. 
Create a DataFrame with at least 5 columns and filter rows based on two conditions 
using logical operators. Explain how filtering works internally.*

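=> Boolean indexing is a technique used in NumPy and Pandas to filter data using
conditions that evaluate to True or False for every element. Rows where the
condition is True are selected, and rows where it is False are left out of the
result.
 
WORKING:
Each condition is evaluated on the whole column and creates a boolean mask with one True or False value per row.
The logical operator (&) combines the masks element-wise. Each condition must be in parentheses because & has higher precedence than comparison operators such as >.
Pandas returns only the rows where the combined mask is True, keeping their original index labels.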


In [None]:
data = {
    'Name': ['A', 'B', 'C', 'D', 'E'],
    'Age': [20, 22, 19, 24, 21],
    'Marks': [85, 90, 70, 88, 95],
    'Attendance': [80, 90, 75, 85, 92],
    'Passed': [True, True, False, True, True]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Boolean indexing with two conditions
filtered_df = df[(df['Age'] > 20) & (df['Marks'] > 85)]

print("\nFiltered DataFrame:") 
print(filtered_df)


Original DataFrame:
  Name  Age  Marks  Attendance  Passed
0    A   20     85          80    True
1    B   22     90          90    True
2    C   19     70          75   False
3    D   24     88          85    True
4    E   21     95          92    True

Filtered DataFrame:
  Name  Age  Marks  Attendance  Passed
1    B   22     90          90    True
3    D   24     88          85    True
4    E   21     95          92    True
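
The masks described above can be inspected one step at a time. This is a minimal sketch with a reduced set of columns; the names `age_mask`, `marks_mask` and `combined` are introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D', 'E'],
    'Age': [20, 22, 19, 24, 21],
    'Marks': [85, 90, 70, 88, 95],
})

# Each comparison returns a boolean Series with one value per row.
age_mask = df['Age'] > 20
marks_mask = df['Marks'] > 85

# & combines the two masks element-wise.
combined = age_mask & marks_mask
print(combined.tolist())

# Indexing with the mask keeps only the rows marked True.
filtered = df[combined]
print(filtered)
```

Row A is dropped because its age is exactly 20 and its marks are exactly 85, so neither strict comparison is True.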


*5. What is the purpose of the groupby() function in Pandas? 
Create a DataFrame with categorical data (e.g., department & salary) and calculate 
the average salary per department using groupby(). Explain the output.*

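=> The groupby() function in Pandas splits the rows into groups based on one or
more categorical columns so that an aggregate operation such as mean, sum, or
count can be applied to each group separately.
 
EXPLANATION:
The rows are grouped by the Department column, giving three groups: Finance, HR and IT.
The mean() function calculates the average of the Salary column within each group.
The output is a Series indexed by Department (sorted alphabetically by default) with one average salary per department. The values are floats because a mean is not always a whole number.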


In [None]:
data = {
    'Department': ['IT', 'HR', 'IT', 'HR', 'Finance'],
    'Salary': [50000, 45000, 55000, 48000, 60000]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Calculate average salary per department
avg_salary = df.groupby('Department')['Salary'].mean()

print("\nAverage Salary per Department:")
print(avg_salary)


Original DataFrame:
  Department  Salary
0         IT   50000
1         HR   45000
2         IT   55000
3         HR   48000
4    Finance   60000

Average Salary per Department:
Department
Finance    60000.0
HR         46500.0
IT         52500.0
Name: Salary, dtype: float64
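
As an optional extension, several aggregates can be computed per group in one call with `agg()`. This sketch reuses the same data; the name `summary` and the choice of `count` and `max` as extra statistics are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['IT', 'HR', 'IT', 'HR', 'Finance'],
    'Salary': [50000, 45000, 55000, 48000, 60000],
})

# One row per department, one column per aggregate function.
summary = df.groupby('Department')['Salary'].agg(['mean', 'count', 'max'])
print(summary)
```

The `count` column shows how many rows contributed to each mean, which helps when reading the averages: Finance has only one employee, so its mean equals that single salary.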
