## Q1. Explain the difference between a Python list and a NumPy array in terms of memory usage and performance. Then write a program for element-wise multiplication.
Explanation

Python List
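-Stores elements as references (pointers) to separately allocated Python objects.
-Can hold different data types in the same list.
-Uses more memory per element (a pointer plus a full Python object) and is slower for mathematical operations, because each element is processed one at a time in an interpreted loop.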

NumPy Array
-Stores elements in a single contiguous block of memory.
-All elements share one dtype (for example int64 or float64), so each element takes a fixed number of bytes.
-Faster and more memory-efficient for calculations, because operations run as vectorized loops in optimized C code.

So, for numerical computation on large, same-typed data, NumPy arrays are the better choice.
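
The memory and speed claims above can be checked with a rough measurement. The sketch below is only an illustration: the size `n` is an arbitrary choice, the list total is an estimate (small integers are shared objects in CPython, so it slightly overcounts), and the timings will vary by machine.

```python
import sys
import timeit

import numpy as np

n = 100_000  # arbitrary size chosen for illustration

py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores one pointer per element, and each int is a separate object.
# This sum is a rough estimate of the total footprint.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An int64 array stores exactly 8 bytes per element in one contiguous buffer.
array_bytes = np_arr.nbytes

print("List (container + int objects):", list_bytes, "bytes")
print("NumPy array buffer:", array_bytes, "bytes")

# Time the same element-wise multiplication both ways (10 runs each).
list_time = timeit.timeit(
    lambda: [a * b for a, b in zip(py_list, py_list)], number=10
)
array_time = timeit.timeit(lambda: np_arr * np_arr, number=10)

print(f"List comprehension: {list_time:.4f} s")
print(f"NumPy multiply:     {array_time:.4f} s")
```

On typical machines the array buffer is several times smaller and the NumPy multiplication is much faster, but the exact ratio depends on the Python and NumPy versions in use.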

In [11]:
import numpy as np

# Python list
list1 = [1, 2, 3, 4]
list2 = [10, 20, 30, 40]

# Element-wise multiplication using list comprehension
list_result = [a * b for a, b in zip(list1, list2)]
print("Python List Multiplication:", list_result)

# NumPy array
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([10, 20, 30, 40])

# Element-wise multiplication using NumPy
array_result = arr1 * arr2
print("NumPy Array Multiplication:", array_result)


Python List Multiplication: [10, 40, 90, 160]
NumPy Array Multiplication: [ 10  40  90 160]


## Q2. What is broadcasting in NumPy? Create a 3×3 array and add a 1D array using broadcasting. Explain internally.
Explanation

Broadcasting in NumPy means NumPy can perform operations on arrays of different shapes by automatically expanding the smaller array to match the shape of the larger array.

How it works internally
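-NumPy compares the two shapes dimension by dimension, starting from the last (rightmost) dimension.
-Two dimensions are compatible if they are equal or if one of them is 1. If one array has fewer dimensions, its shape is padded with 1s on the left.
-Dimensions of size 1 are then stretched to match the other array. The values are reused across rows/columns without physically copying data (efficient memory use).
-In the example below, shape (3,) is treated as (1, 3) and stretched to (3, 3), so arr_1d is added to every row of arr_2d.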
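
The "no physical copy" point can be observed directly. This is a minimal sketch using `np.broadcast_to`, which exposes the stretched view that an operation like `arr_2d + arr_1d` works with conceptually; the variable name `view` is only for illustration.

```python
import numpy as np

arr_2d = np.arange(1, 10).reshape(3, 3)   # shape (3, 3)
arr_1d = np.array([10, 20, 30])           # shape (3,)

# Shapes are aligned from the right: (3, 3) vs (3,) is treated as (3, 3) vs (1, 3).
print(np.broadcast(arr_2d, arr_1d).shape)  # (3, 3)

# broadcast_to returns a read-only view with the stretched shape.
view = np.broadcast_to(arr_1d, arr_2d.shape)

# A stride of 0 along the first axis means every "row" points at the same memory.
print(view.strides[0])                     # 0
print(np.shares_memory(view, arr_1d))      # True

# Incompatible shapes raise an error: (3, 3) vs (2,) fails on the last dimension.
try:
    arr_2d + np.array([10, 20])
except ValueError as exc:
    print("Incompatible shapes:", exc)
```

The zero stride is what makes broadcasting cheap: the same three values are read again for each row instead of being duplicated.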

In [12]:
import numpy as np

arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

arr_1d = np.array([10, 20, 30])

result = arr_2d + arr_1d

print("3x3 Array:\n", arr_2d)
print("1D Array:", arr_1d)
print("Result after Broadcasting:\n", result)


3x3 Array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
1D Array: [10 20 30]
Result after Broadcasting:
 [[11 22 33]
 [14 25 36]
 [17 28 39]]


## Q3. What are missing values in Pandas and how are they represented? Create a DataFrame and detect + replace missing values with mean.
Explanation

Missing values are empty or unavailable data values.

Representation in Pandas
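-NaN (Not a Number) is used for missing numeric values. NaN is a float, so by default a column of integers that contains NaN is stored as float64.
-None can also represent missing values; in a numeric column Pandas converts it to NaN.
-NaT (Not a Time) marks missing datetime values.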
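
A short sketch of how these representations behave, assuming default Pandas dtypes (the Series names `numeric` and `text` are only for illustration):

```python
import numpy as np
import pandas as pd

# None in a numeric Series is converted to NaN, and the dtype becomes float64.
numeric = pd.Series([1, None, 3])
print(numeric.dtype)             # float64
print(numeric.isna().tolist())   # [False, True, False]

# None in a text Series is also reported as missing by isna().
text = pd.Series(["a", None, "c"])
print(text.isna().tolist())      # [False, True, False]

# NaN is not equal to itself, so use isna()/isnull() rather than == to detect it.
print(np.nan == np.nan)          # False
```

This is why the example DataFrame prints values such as 80.0 instead of 80: the NaN entries force each column to float64.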

In [13]:
import pandas as pd
import numpy as np

data = {
    "Math": [80, 90, np.nan, 70],
    "Science": [85, np.nan, 75, 95],
    "English": [np.nan, 88, 92, 81]
}

df = pd.DataFrame(data)

print("Original DataFrame:\n", df)

# Detect missing values
print("\nMissing Values (True/False):\n", df.isnull())

# Replace missing values with column mean
df_filled = df.fillna(df.mean())

print("\nDataFrame after replacing missing values with mean:\n", df_filled)


Original DataFrame:
    Math  Science  English
0  80.0     85.0      NaN
1  90.0      NaN     88.0
2   NaN     75.0     92.0
3  70.0     95.0     81.0

Missing Values (True/False):
     Math  Science  English
0  False    False     True
1  False     True    False
2   True    False    False
3  False    False    False

DataFrame after replacing missing values with mean:
    Math  Science  English
0  80.0     85.0     87.0
1  90.0     85.0     88.0
2  80.0     75.0     92.0
3  70.0     95.0     81.0


Steps Explanation

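1) df.isnull() returns a DataFrame of the same shape with True wherever a value is missing.

2) df.mean() calculates the mean of each column, skipping NaN values by default.

3) df.fillna(df.mean()) replaces each NaN with the mean of its own column and returns a new DataFrame; df itself is unchanged.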
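
The three steps can also be inspected one at a time. The sketch below reuses the same data; the intermediate names `missing_per_column` and `column_means` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Math": [80, 90, np.nan, 70],
    "Science": [85, np.nan, 75, 95],
    "English": [np.nan, 88, 92, 81],
})

# Step 1: count missing values per column (True counts as 1 when summed).
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Step 2: column means, computed from the non-missing values only.
column_means = df.mean()
print(column_means)

# Step 3: fillna aligns the means with the columns by name.
df_filled = df.fillna(column_means)
print("Missing values left:", df_filled.isnull().sum().sum())  # 0
```

For example, the Math mean is (80 + 90 + 70) / 3 = 80.0, which is the value that fills row 2 of that column.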

## Q4. Explain boolean indexing in NumPy or Pandas. Create a DataFrame with 5 columns and filter rows using 2 conditions. Explain filtering internally.
Explanation

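Boolean indexing means selecting rows based on True/False conditions.
-A condition such as df["Marks"] > 80 is evaluated element-wise and creates a boolean Series (a mask) with one True/False value per row.
-Pandas keeps only the rows where the mask is True and returns them as a new DataFrame.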

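Logical operators used (wrap each condition in parentheses, because these operators bind more tightly than comparisons such as >):
- & for AND
- | for OR
- ~ for NOT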
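
To make the internal steps visible, the mask can be built and printed before it is used for filtering. This is a minimal sketch on a reduced two-column frame; the names `high_marks` and `mask` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Marks": [75, 88, 65, 92, 80],
    "Passed": [True, True, False, True, True],
})

# Each condition is evaluated element-wise and yields a boolean Series.
high_marks = df["Marks"] > 80

# & combines the two masks row by row.
mask = high_marks & df["Passed"]
print(mask.tolist())              # [False, True, False, True, False]

# Indexing with the mask keeps only the rows where it is True.
print(df[mask].index.tolist())    # [1, 3]

# ~ inverts the mask and selects the remaining rows.
print(df[~mask].index.tolist())   # [0, 2, 4]

# Python's `and` does not work element-wise and raises an error on a Series.
try:
    df[(df["Marks"] > 80) and df["Passed"]]
except ValueError as exc:
    print("ValueError:", exc)
```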

In [14]:
import pandas as pd

data = {
    "Name": ["Aman", "Devesh", "Rohit", "Neha", "Simran"],
    "Age": [19, 20, 22, 21, 18],
    "Marks": [75, 88, 65, 92, 80],
    "City": ["Delhi", "Karnal", "Delhi", "Mumbai", "Karnal"],
    "Passed": [True, True, False, True, True]
}

df = pd.DataFrame(data)

print("Original DataFrame:\n", df)

# Filtering using two conditions
filtered = df[(df["Marks"] > 80) & df["Passed"]]  # "Passed" is already boolean, so no == True is needed

print("\nFiltered DataFrame (Marks > 80 AND Passed = True):\n", filtered)


Original DataFrame:
      Name  Age  Marks    City  Passed
0    Aman   19     75   Delhi    True
1  Devesh   20     88  Karnal    True
2   Rohit   22     65   Delhi   False
3    Neha   21     92  Mumbai    True
4  Simran   18     80  Karnal    True

Filtered DataFrame (Marks > 80 AND Passed = True):
      Name  Age  Marks    City  Passed
1  Devesh   20     88  Karnal    True
3    Neha   21     92  Mumbai    True


## Q5. What is the purpose of groupby() in Pandas? Create a DataFrame and calculate average salary per department. Explain output.
Explanation

groupby() splits the rows into groups based on the values of one or more columns, applies an operation to each group, and combines the results into one output. Common operations are:
-mean()
-sum()
-count()
-max(), min()

It is mainly used for data analysis and summarization.
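
Several of the operations listed above can be applied in one call with `agg()`. The sketch below uses the same department data; the name `summary` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["IT", "HR", "IT", "Sales", "HR", "Sales", "IT"],
    "Salary": [50000, 40000, 60000, 45000, 42000, 48000, 55000],
})

# One row per department, one column per aggregation.
summary = df.groupby("Department")["Salary"].agg(
    ["mean", "sum", "count", "min", "max"]
)
print(summary)
```

The result is a DataFrame indexed by Department, which is convenient when more than one summary statistic is needed.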

In [15]:
import pandas as pd

data = {
    "Department": ["IT", "HR", "IT", "Sales", "HR", "Sales", "IT"],
    "Salary": [50000, 40000, 60000, 45000, 42000, 48000, 55000]
}

df = pd.DataFrame(data)

print("Original DataFrame:\n", df)

# Grouping by department and finding average salary
avg_salary = df.groupby("Department")["Salary"].mean()

print("\nAverage Salary per Department:\n", avg_salary)


Original DataFrame:
   Department  Salary
0         IT   50000
1         HR   40000
2         IT   60000
3      Sales   45000
4         HR   42000
5      Sales   48000
6         IT   55000

Average Salary per Department:
 Department
HR       41000.0
IT       55000.0
Sales    46500.0
Name: Salary, dtype: float64


Explanation of Output  
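-Rows are grouped by the value in Department (3 IT rows, 2 HR rows, 2 Sales rows).
-The mean of Salary is calculated for each department separately, for example IT: (50000 + 60000 + 55000) / 3 = 55000.
-Output is a Series indexed by Department (group keys are sorted by default) showing department-wise average salary. The dtype is float64 because a mean is a floating-point value.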
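
The output can be cross-checked by doing the grouping by hand. This sketch is not how Pandas implements groupby() internally; it only reproduces the same result with an explicit loop, and the name `manual` is introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["IT", "HR", "IT", "Sales", "HR", "Sales", "IT"],
    "Salary": [50000, 40000, 60000, 45000, 42000, 48000, 55000],
})

manual = {}
for dept in sorted(df["Department"].unique()):
    # Split: select the salaries that belong to this department.
    salaries = df.loc[df["Department"] == dept, "Salary"]
    # Apply: compute the mean for this group only.
    manual[dept] = salaries.mean()

# Combine: compare the hand-built result with groupby().
grouped = df.groupby("Department")["Salary"].mean()
print(manual == grouped.to_dict())  # True
```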