Here is a set of 15 practice assignments for **Pandas**, following the same "Part 3" structure and difficulty level as your previous notebooks.

---

## Python Part 4: Pandas Data Manipulation

### Topic 1: Series & DataFrame Creation

**Assignment 1: Data Structures**

1. Create a Pandas Series from a list of 5 department names (e.g., "HR", "Sales", etc.).
2. Create a DataFrame from a dictionary of 3 students, including their "Name", "Age", and "Score".

**Assignment 2: Indexing and Naming**

1. Create a Series with custom indices ('a', 'b', 'c', 'd').
2. Assign a name attribute to the Series and print it.

---

### Topic 2: Loading & Inspecting Data

**Assignment 3: Reading CSVs**

1. Write a script to load a file named `data.csv` (assume it exists) into a variable called `df`.
2. Display the first 5 rows and the last 3 rows of the DataFrame.

**Assignment 4: Data Profiling**

1. Use a single method to view the summary statistics (mean, min, max, etc.) of numeric columns.
2. Print the information about the DataFrame, including the number of non-null values and memory usage.

---

### Topic 3: Selection & Filtering

**Assignment 5: Column & Row Access**

1. Select a single column from a DataFrame using both bracket notation and dot notation.
2. Use `.iloc` to select the first 3 rows and the first 2 columns.

**Assignment 6: Conditional Filtering**

1. Filter a DataFrame to show only the rows where the "Score" is greater than 80.
2. Select rows where the "Department" is "IT" AND the "Salary" is over 50000.

---

### Topic 4: Data Cleaning

**Assignment 7: Handling Missing Values**

1. Use a method to find the count of missing (`NaN`) values in each column.
2. Fill all missing values in a "Bonus" column with the value `0`.

**Assignment 8: Dropping Data**

1. Remove a column named "Internal_ID" from the DataFrame permanently.
2. Drop any rows that contain missing values in the "Name" column.

---

### Topic 5: Transformation & Grouping

**Assignment 9: Adding and Renaming Columns**

1. Create a new column "Total_Score" by adding "Math_Score" and "Science_Score".
2. Rename the column "Emp_Name" to "Employee_Name".

**Assignment 10: The `apply()` Function**

1. Write a function that converts a value to uppercase and apply it to a "City" column.
2. Use a lambda function with `apply()` to double the values in a "Price" column.

**Assignment 11: Grouping and Aggregation**

1. Group the DataFrame by "Category" and calculate the mean "Sales" for each group.
2. Group by "Year" and find the total sum of "Profit".

---

### Topic 6: Merging & Sorting

**Assignment 12: Concatenation**

1. Create two DataFrames with the same columns and combine them vertically using `pd.concat()`.

**Assignment 13: Merging Data**

1. Perform an "Inner Join" on two DataFrames using a common column "Employee_ID".
2. Explain in a comment the difference between a "Left Join" and an "Inner Join".

**Assignment 14: Sorting Data**

1. Sort a DataFrame by "Date" in descending order.
2. Sort by "Department" first, then by "Salary" (highest to lowest) within that department.

---

### Topic 7: Exporting Data

**Assignment 15: Saving Progress**

1. Export your cleaned DataFrame to an Excel file named `Final_Report.xlsx` without the index column.
2. Save the same DataFrame to a JSON file.

---

### Would you like me to:

1. Generate the **Solution Code** for these 15 Pandas assignments?
2. Create a set for **Matplotlib/Seaborn** to visualize this data?
3. Generate a **CSV file creation script** so you have data to practice these assignments with?


##### Assignment 1: Data Structures

1. Create a Pandas Series from a list of 5 department names (e.g., "HR", "Sales", etc.)

In [1]:
import pandas as pd

departments = ['HR', 'Marketing', 'Sales', 'IT', 'Design']

# The task asks for a Series, so use pd.Series rather than pd.DataFrame
s = pd.Series(departments)

print(s)

0           HR
1    Marketing
2        Sales
3           IT
4       Design
dtype: object


2. Create a DataFrame from a dictionary of 3 students, including their "Name", "Age", and "Score".

In [2]:
df = pd.DataFrame({
    "Name": ['Alice', 'Bob', 'Charlie'],
    "Age": [10, 21, 23],
    "Score": [87, 87, 89]
})

print(df)

      Name  Age  Score
0    Alice   10     87
1      Bob   21     87
2  Charlie   23     89


---

##### Assignment 2: Indexing and Naming

1. Create a Series with custom indices ('a', 'b', 'c', 'd').

2. Assign a name attribute to the Series and print it.

In [7]:
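data = [1, 2, 3, 4]

# Series with custom index labels
s = pd.Series(data, index=['a', 'b', 'c', 'd'])

# assign the name attribute, then print it and the Series
s.name = "Numbers"

print(s.name)
print(s)

Numbers
a    1
b    2
c    3
d    4
Name: Numbers, dtype: int64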


---

##### Assignment 3: Reading CSVs

1. Write a script to load a file named `data.csv` (assume it exists) into a variable called `df`.

2. Display the first 5 rows and the last 3 rows of the DataFrame.

In [9]:
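df = pd.read_csv('data.csv')

# A notebook cell renders only its last expression, so only the tail
# appears below; wrap each call in print() to see both results.
df.head(5)
df.tail(3)

          Date  Category  Value   Product  Sales  Region
47  2023-02-17         B   69.0  Product3  143.0    West
48  2023-02-18         C   65.0  Product3  182.0   North
49  2023-02-19         C   11.0  Product3  708.0   North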
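
If `data.csv` is not at hand, the same calls can be tried on an in-memory file. A minimal sketch, assuming made-up sample rows (the column names are borrowed from the output above, the values are placeholders); `print()` is used so that both the head and the tail are shown:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for data.csv (values are made up)
csv_text = """Date,Category,Value
2023-01-01,A,10
2023-01-02,B,20
2023-01-03,C,30
2023-01-04,A,40
2023-01-05,B,50
2023-01-06,C,60
"""

# read_csv accepts a path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

first_rows = sample.head(5)   # first 5 rows
last_rows = sample.tail(3)    # last 3 rows

# print() each result; a notebook cell only renders its last expression
print(first_rows)
print(last_rows)
```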


---

##### Assignment 4: Data Profiling

1. Use a single method to view the summary statistics (mean, min, max, etc.) of numeric columns.

2. Print the information about the DataFrame, including the number of non-null values and memory usage.

In [12]:
df = pd.read_csv('data.csv')
print(df.describe())
df.info()

           Value       Sales
count  47.000000   46.000000
mean   51.744681  557.130435
std    29.050532  274.598584
min     2.000000  108.000000
25%    27.500000  339.000000
50%    54.000000  591.500000
75%    70.000000  767.500000
max    99.000000  992.000000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Date      50 non-null     object 
 1   Category  50 non-null     object 
 2   Value     47 non-null     float64
 3   Product   50 non-null     object 
 4   Sales     46 non-null     float64
 5   Region    50 non-null     object 
dtypes: float64(2), object(4)
memory usage: 2.5+ KB


---

##### Assignment 5: Column and Row Access

1. Select a single column from a DataFrame using both bracket notation and dot notation.

In [14]:
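# bracket notation
df['Category']

# dot notation (works only when the name is a valid Python identifier
# and does not clash with a DataFrame attribute or method)
df.Category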

0     A
1     B
2     C
3     B
4     B
5     B
6     A
7     C
8     C
9     A
10    B
11    B
12    A
13    A
14    A
15    C
16    C
17    C
18    A
19    A
20    C
21    C
22    B
23    C
24    A
25    C
26    C
27    C
28    B
29    C
30    B
31    A
32    B
33    C
34    C
35    B
36    C
37    C
38    C
39    A
40    B
41    C
42    A
43    A
44    A
45    B
46    B
47    B
48    C
49    C
Name: Category, dtype: object

2. Use `.iloc` to select the first 3 rows and the first 2 columns.

In [15]:
df.iloc[0:3, 0:2]

         Date  Category
0  2023-01-01         A
1  2023-01-02         B
2  2023-01-03         C


---

##### Assignment 6: Conditional Filtering

1. Filter a DataFrame to show only the rows where the "Score" is greater than 80.

In [16]:
df = pd.DataFrame({
    "Name": ['Alice', 'Bob', 'Charlie', 'Denial', 'Eve', 'Fry', 'Goat'],
    "Age": [10, 21, 23, 12, 45, 32, 56],
    "Score": [87, 87, 89, 85, 70, 56, 34],
    "Department": ['IT', 'Marketing', 'Sales', 'HR', 'IT', 'HR', 'Design'],
    "Salary": [433232, 21343, 5401, 43453, 21339, 47393, 23947]
})

df[df['Score'] > 80]

      Name  Age  Score  Department  Salary
0    Alice   10     87          IT  433232
1      Bob   21     87   Marketing   21343
2  Charlie   23     89       Sales    5401
3   Denial   12     85          HR   43453


2. Select rows where the "Department" is "IT" AND the "Salary" is over 50000.

In [23]:
df[(df['Department'] == "IT") & (df['Salary'] > 50000)]

    Name  Age  Score  Department  Salary
0  Alice   10     87          IT  433232
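
The parentheses around each condition matter because `&` binds more tightly than `==` and `>`. As an alternative, the same kind of filter can be written with `DataFrame.query()`. A minimal sketch on a made-up three-row frame (the rows are assumptions, not the data above):

```python
import pandas as pd

# Hypothetical rows for illustration only
staff = pd.DataFrame({
    "Department": ["IT", "IT", "HR"],
    "Salary": [60000, 40000, 70000],
})

# Boolean-mask form: each comparison needs its own parentheses
mask_result = staff[(staff["Department"] == "IT") & (staff["Salary"] > 50000)]

# query() form: the condition is a string, and `and` replaces `&`
query_result = staff.query("Department == 'IT' and Salary > 50000")

# Both forms should select the same rows
print(mask_result.equals(query_result))
```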


---

##### Assignment 7: Handling Missing Values

1. Use a method to find the count of missing (`NaN`) values in each column.

In [None]:
df = pd.read_csv('data.csv')

df.isnull().sum()

Date        0
Category    0
Value       3
Product     0
Sales       4
Region      0
dtype: int64

2. Fill all the missing values in a "Bonus" column with the value 0.

In [31]:
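# data.csv has no "Bonus" column, so "Value" and "Sales" stand in for it here;
# the filled copies are stored in new columns
df["Bonus1"] = df['Value'].fillna(0)
df["Bonus2"] = df['Sales'].fillna(0)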
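
For a frame that does have a "Bonus" column, the fill is a one-line assignment. A minimal sketch, assuming a hypothetical payroll frame (names and values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical data: two of the four bonuses are missing
payroll = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Drake"],
    "Bonus": [500.0, np.nan, 250.0, np.nan],
})

# fillna returns a new Series, so assign it back to the column
payroll["Bonus"] = payroll["Bonus"].fillna(0)

# Count of missing values remaining in the column
print(payroll["Bonus"].isnull().sum())
```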

---
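
##### Assignment 8: Dropping Data

The two tasks from the assignment list (remove the "Internal_ID" column, drop rows with a missing "Name") can be sketched as follows. This is one possible approach on a hypothetical frame, since the notebook has no data with these columns; all values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical employee data (all values are made up)
employees = pd.DataFrame({
    "Internal_ID": [101, 102, 103, 104],
    "Name": ["Alice", np.nan, "Charlie", None],
    "Salary": [50000, 42000, 61000, 38000],
})

# 1. Remove the column; reassigning makes the change stick
employees = employees.drop(columns=["Internal_ID"])

# 2. Drop only the rows whose "Name" is missing
employees = employees.dropna(subset=["Name"])

print(employees)
```

Passing `subset=["Name"]` limits the check to that column, so rows with gaps elsewhere are kept.

---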

##### Assignment 9: Adding and Renaming Columns

1. Create a new column "Total_Score" by adding "Math_Score" and "Science_Score".

In [33]:
df = pd.DataFrame({
    "Math_Score":[10, 20, 30],
    "Science_Score": [40, 50, 60]
})

df['Total_Score'] = df.Math_Score + df.Science_Score

df

   Math_Score  Science_Score  Total_Score
0          10             40           50
1          20             50           70
2          30             60           90


2. Rename the column "Emp_Name" to "Employee_Name".

In [38]:
df = pd.DataFrame({
    "Emp_Name": ["Alice", "Bob", "Charlie"]
})

df.rename(columns={"Emp_Name": "Employee_Name"}, inplace=True)

df

   Employee_Name
0          Alice
1            Bob
2        Charlie


---

##### Assignment 10: The `apply()` function

1. Write a function that converts a value to uppercase and apply it to a "City" column.

In [40]:
def to_upper(value):
    return value.upper()

df = pd.DataFrame({
    "City": ["Mumbai", "Delhi", "Hyderabad", "Delhi", "Pune"]
})

df['City'].apply(to_upper)

0       MUMBAI
1        DELHI
2    HYDERABAD
3        DELHI
4         PUNE
Name: City, dtype: object
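
For string columns, the vectorized `.str.upper()` accessor gives the same result as `apply()` and also passes missing values through, whereas calling `.upper()` on a `NaN` raises an `AttributeError`. A minimal sketch with a made-up column that contains one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing city
cities = pd.Series(["Mumbai", "Delhi", np.nan, "Pune"], name="City")

# .str.upper() uppercases strings and leaves NaN as NaN
upper_cities = cities.str.upper()

print(upper_cities)
```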

2. Use a lambda function with `apply()` to double the values in a "Price" column.

In [44]:
df = pd.DataFrame({
    "Price": [1939, 5349,332, 5494,3928]
})

df['Price'] = df['Price'].apply(lambda x: x * 2)

df

   Price
0   3878
1  10698
2    664
3  10988
4   7856


---

##### Assignment 11: Grouping and Aggregation

1. Group the DataFrame by "Category" and calculate the mean "Sales" for each group.

In [49]:
df = pd.read_csv('data.csv')

df.groupby('Category')['Sales'].aggregate('mean')

Category
A    609.071429
B    571.153846
C    509.263158
Name: Sales, dtype: float64

2. Group by "Year" and find the total sum of "Profit".

In [51]:
df = pd.DataFrame({
    'Year': [2021, 2022, 2023, 2024, 2025, 2026],
    'Profit': [4394, 494853, 329303, 4948403, 320329, 549383]
})

df.groupby('Year')['Profit'].aggregate('sum')

Year
2021       4394
2022     494853
2023     329303
2024    4948403
2025     320329
2026     549383
Name: Profit, dtype: int64
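
In the frame above every year appears once, so each group total equals the single row. A minimal sketch with repeated years (the figures are made up) shows the grouping doing real work; `.sum()` is shorthand for `.aggregate('sum')`:

```python
import pandas as pd

# Hypothetical profits: two rows per year
profits = pd.DataFrame({
    "Year": [2021, 2021, 2022, 2022],
    "Profit": [100, 150, 200, 50],
})

# Rows with the same Year are combined into one total
yearly_total = profits.groupby("Year")["Profit"].sum()

print(yearly_total)
```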

---

##### Assignment 12 : Concatenation

1. Create two DataFrames with the same columns and combine them vertically using `pd.concat()`.

In [57]:
df1 = pd.DataFrame({
    "Name": ['Alice', 'Bob', 'Charlie'],
    "Age": [22, 49, 43]
})

df2 = pd.DataFrame({
    "Name": ['Drake', 'Eve'],
    "Age": [39, 56]
})

# axis=0 (the default) stacks rows; ignore_index=True renumbers them 0 to 4
pd.concat([df1, df2], ignore_index=True)

      Name  Age
0    Alice   22
1      Bob   49
2  Charlie   43
3    Drake   39
4      Eve   56


---

##### Assignment 13: Merging Data

1. Perform an "Inner Join" on two DataFrames using the common column "Employee_ID".

In [60]:
df1 = pd.DataFrame({
    'Employee_ID': [0, 1, 2, 3, 4, 5, 6, 7],
    'Emp_Name': ['Alice', 'Bob', 'Charlie', 'Drake', 'Eve', 'Fry', 'Goat', 'Iron']
})

df2 = pd.DataFrame({
    'Employee_ID': [0, 1, 2, 3, 4, 5, 6, 7],
    'Emp_Salary': [32323, 54324, 53535, 66940, 49675, 48465, 69574, 40350]
})

# An inner join keeps only the IDs found in both frames. A left join keeps
# every row of df1 and fills columns from df2 with NaN where no match exists.
pd.merge(df1, df2, how='inner', on='Employee_ID')

   Employee_ID  Emp_Name  Emp_Salary
0            0     Alice       32323
1            1       Bob       54324
2            2   Charlie       53535
3            3     Drake       66940
4            4       Eve       49675
5            5       Fry       48465
6            6      Goat       69574
7            7      Iron       40350
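
With identical keys on both sides, as above, an inner join and a left join return the same rows. The difference only shows when the keys partly overlap, as in this minimal sketch on made-up frames:

```python
import pandas as pd

# Hypothetical frames: employee 3 has no salary row, salary 4 has no employee
names = pd.DataFrame({
    "Employee_ID": [1, 2, 3],
    "Emp_Name": ["Alice", "Bob", "Charlie"],
})
salaries = pd.DataFrame({
    "Employee_ID": [1, 2, 4],
    "Emp_Salary": [32323, 54324, 66940],
})

# Inner join: only keys present in BOTH frames (1 and 2)
inner = pd.merge(names, salaries, how="inner", on="Employee_ID")

# Left join: every row of the left frame; missing salaries become NaN
left = pd.merge(names, salaries, how="left", on="Employee_ID")

print(inner)
print(left)
```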
