In [11]:
import pandas as pd
import numpy as np

## Pandas Assignment

## Task 1: Creating a DataFrame
• Objective: Create a DataFrame from a dictionary.

• Instructions:

1. Create a dictionary with the following data:
▪ Names: Alice, Bob, Charlie
▪ Ages: 24, 30, 22
▪ Cities: New York, Los Angeles, Chicago
2. Convert this dictionary into a Pandas DataFrame.

In [10]:
# Creating the dictionary
people = {
    "Names": ["Alice", "Bob", "Charlie"],
    "Ages": [24, 30, 22],
    "Cities": ["New York", "Los Angeles", "Chicago"]
}

# Displaying the dictionary
print(people)
df=pd.DataFrame(people)
print(df)

{'Names': ['Alice', 'Bob', 'Charlie'], 'Ages': [24, 30, 22], 'Cities': ['New York', 'Los Angeles', 'Chicago']}
     Names  Ages       Cities
0    Alice    24     New York
1      Bob    30  Los Angeles
2  Charlie    22      Chicago


## Task 2: Data Exploration == Objective: Explore a DataFrame.

• Instructions:

1. Load a CSV file (e.g., students.csv) containing student data (name, grade,
age).
2. Use Pandas functions to:
▪ Display the first 5 rows.
▪ Get a summary of the DataFrame.
▪ Check for missing values

In [27]:
import pandas as pd

# 1. Load the CSV file
df = pd.read_csv('./student-dataset.csv')

# 2. Display the first 5 rows
print("🧾 First 5 Rows:")
# print(df.head())

# 3. Get a summary of the DataFrame
print("\n 🧾📊 Summary Info:")
# print(df.info())

# 4. Describe numerical columns (like grade, age)
print("\n📈🧾 Statistical Summary:")
# print(df.describe())

# 5. Check for missing values
print("\n🔍🧾 Missing Values:")
print(df.isnull().sum().to_string())


🧾 First 5 Rows:

 🧾📊 Summary Info:

📈🧾 Statistical Summary:

🔍🧾 Missing Values:
id                      0
name                    0
nationality             0
city                    0
latitude                0
longitude               0
gender                  0
ethnic.group          307
age                     0
english.grade           0
math.grade              0
sciences.grade          0
language.grade          0
portfolio.rating        0
coverletter.rating      0
refletter.rating        0


## Task 3: Data Manipulation
• Objective: Perform basic data manipulation.

• Instructions:
1. Create a DataFrame with columns for Product, Price, and Quantity.
2. Add a new column called Total that calculates the total price for each product
(Price * Quantity).
3. Filter the DataFrame to show only products with a total price greater than
$100.

In [54]:
data = {
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Chair'],
    'Price': [800, 20, 45, 150, 75],
    'Quantity': [1, 5, 2, 1, 2]
}
df=pd.DataFrame(data)
print('Complete Data : \n',df)
df['total']=df['Price'] * df['Quantity']
print('Complete Data with Total : \n',df)
filtered_df=df[df['total']>100]
print('Filtered_Data : \n',filtered_df)

Complete Data : 
     Product  Price  Quantity
0    Laptop    800         1
1     Mouse     20         5
2  Keyboard     45         2
3   Monitor    150         1
4     Chair     75         2
Complete Data with Total : 
     Product  Price  Quantity  total
0    Laptop    800         1    800
1     Mouse     20         5    100
2  Keyboard     45         2     90
3   Monitor    150         1    150
4     Chair     75         2    150
Filtered_Data : 
    Product  Price  Quantity  total
0   Laptop    800         1    800
3  Monitor    150         1    150
4    Chair     75         2    150


## Task 4: Grouping Data
• Objective: Group data and calculate statistics.

• Instructions:

1. Use a DataFrame containing sales data with columns for Date, Product, and
Sales.
2. Group the data by Product and calculate the total sales for each product.
3. Sort the results in descending order.

In [71]:
data = {
    'Date': ['2025-04-20', '2025-04-20', '2025-04-21', '2025-04-21', '2025-04-22'],
    'Product': ['Laptop', 'Mouse', 'Laptop', 'Monitor', 'Keyboard'],
    'Sales': [2, 10, 3, 1, 5]
}

# Create DataFrame
sales_df = pd.DataFrame(data)
print("🧾Sales : \n",sales_df)
product_group = sales_df.groupby('Product')['Sales'].sum().reset_index()
product_group = product_group.sort_values(by='Sales', ascending=False)
# Show the DataFrame
print("\n product group : \n\n",product_group)


🧾Sales : 
          Date   Product  Sales
0  2025-04-20    Laptop      2
1  2025-04-20     Mouse     10
2  2025-04-21    Laptop      3
3  2025-04-21   Monitor      1
4  2025-04-22  Keyboard      5

 product group : 

     Product  Sales
3     Mouse     10
0  Keyboard      5
1    Laptop      5
2   Monitor      1


## Task 5: DataFrame Merging and Joining
• Objective: Merge multiple DataFrames.
• Instructions:
1. Create three DataFrames: Students (ID, Name), Scores (ID, Score), and
Courses (ID, Course).
2. Merge these DataFrames to create a comprehensive view containing student
names, scores, and courses.
3. Handle any potential duplicates in the merging process.

In [82]:
stds={
    'id':[1,2,3,4],
    'std_name':['std_1','std_2','std_3','std_4']
}
score={
    'id':[1,2,3,4],
    'score':[111,111,333,444]
}
courses={
    'id':[1,2,3,4],
    'courses':['c1','c2','c3','c4']
}
stds=pd.DataFrame(stds)
score=pd.DataFrame(score)
courses=pd.DataFrame(courses)
# # print(pd.DataFrame(stds))
# print(pd.DataFrame(score))
# print(pd.DataFrame(courses))
print('comprehensive std data : ')
merged_df=pd.merge(stds,score,on='id')
merged_df=pd.merge(merged_df,courses,on='id')
merged_df = merged_df.drop_duplicates()

# Display the final DataFrame
print("✅ Cleaned and Merged Data:")
print(merged_df)

comprehensive std data : 
✅ Cleaned and Merged Data:
   id std_name  score courses
0   1    std_1    111      c1
1   2    std_2    111      c2
2   3    std_3    333      c3
3   4    std_4    444      c4


## Task 6: Handling Missing Data
• Objective: Clean a DataFrame by handling missing data.
• Instructions:
1. Create a DataFrame that includes some NaN values.
2. Practice filling missing values with the mean or median of the column.
3. Drop rows or columns with missing values and display the cleaned
DataFrame.

In [118]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, np.nan, 30, 22],
    'City': ['New York', 'Los Angeles', np.nan, 'Chicago']
}
df = pd.DataFrame(data)
print("Original DataFrame:\n\n", df)

# Fill missing values
mean_age = round(df['Age'].mean(), 2)
mode_city = df['City'].mode()[0]

df['Age'].fillna(mean_age)
df['City'].fillna(mode_city)

print("\n🔸 DataFrame after filling missing values:")
print(df)
print("\n🔸 -------- \n")
print("\n🔸 DataFrame after dropping missing values:")
dropped=df.dropna()
print(dropped)


Original DataFrame:

       Name   Age         City
0    Alice  25.0     New York
1      Bob   NaN  Los Angeles
2  Charlie  30.0          NaN
3    David  22.0      Chicago

🔸 DataFrame after filling missing values:
      Name   Age         City
0    Alice  25.0     New York
1      Bob   NaN  Los Angeles
2  Charlie  30.0          NaN
3    David  22.0      Chicago

🔸 -------- 


🔸 DataFrame after dropping missing values:
    Name   Age      City
0  Alice  25.0  New York
3  David  22.0   Chicago


## Task 7: Visualization
• Objective: Visualize data using Pandas.
• Instructions:
1. Use a DataFrame with sales data.
2. Create a bar chart showing total sales per product using the Pandas built-in
plotting capabilities.