🔥 Capstone Mini Project: Student Dashboard Analysis

📂 Files You’ll Create:
students.csv — Main student data

attendance.csv — Separate attendance file

final_student_report.csv — Final cleaned, merged, and analyzed file

🧠 Project Scenario:
You’re a data analyst working for a school. You receive two CSVs:

One has student names and marks.

Another has attendance percentages.

Your job? Merge, clean, analyze, and generate insights.

📁 Step 1: Create Two CSVs

In [None]:
import pandas as pd

# students.csv: IDs, names, and marks
students = pd.DataFrame({
    'ID': [101, 102, 103, 104],
    'Name': ['Nobita', 'Doraemon', 'Suneo', 'Gian'],
    'Marks': [55, 95, 75, 45]
})
students.to_csv("students.csv", index=False)

# attendance.csv: attendance percentage per ID (103 is left missing so Step 3 has something to clean)
attendance = pd.DataFrame({
    'ID': [101, 102, 103, 104],
    'Attendance': [70, 98, None, 60]
})
attendance.to_csv("attendance.csv", index=False)
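A quick check you can run here (a sketch, not part of the original notebook): because None sits among integers, pandas stores the Attendance column as float64 with NaN, and to_csv writes that NaN as an empty field. This is why Attendance prints as 70.0 rather than 70 once the file is loaded. The snippet rebuilds the attendance table in memory and round-trips it through io.StringIO instead of a real file.

```python
import io

import pandas as pd

# Same attendance table as in Step 1
attendance = pd.DataFrame({
    'ID': [101, 102, 103, 104],
    'Attendance': [70, 98, None, 60]
})

# None among numbers becomes NaN, and NaN forces the column to float64
print(attendance.dtypes)

# With no path argument, to_csv returns the CSV text instead of writing a file
csv_text = attendance.to_csv(index=False)
print(csv_text)  # the row for ID 103 ends with an empty field

# Reading the text back turns the empty field into NaN again
reloaded = pd.read_csv(io.StringIO(csv_text))
print(reloaded['Attendance'].isna().sum())
```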


🧪 Step 2: Load & Merge

In [2]:
students = pd.read_csv("students.csv")
attendance = pd.read_csv("attendance.csv")

# Inner join on ID (the default): keeps only IDs present in both files
df = pd.merge(students, attendance, on='ID')
print(df)


    ID      Name  Marks  Attendance
0  101    Nobita     55        70.0
1  102  Doraemon     95        98.0
2  103     Suneo     75         NaN
3  104      Gian     45        60.0
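pd.merge defaults to how='inner', so a student without an attendance row (or an attendance row without a student) would silently drop out. Every ID matches here, so it makes no difference, but the sketch below uses hypothetical mismatched data to show the effect. The student 'Shizuka' with ID 105 and the stray attendance ID 106 are invented for illustration. It also uses two real pd.merge options: validate='one_to_one', which raises pandas.errors.MergeError if an ID is duplicated on either side, and indicator=True, which adds a _merge column recording where each row came from.

```python
import pandas as pd

# Hypothetical data: ID 105 has no attendance row, ID 106 has no student row
students = pd.DataFrame({
    'ID': [101, 102, 105],
    'Name': ['Nobita', 'Doraemon', 'Shizuka'],
    'Marks': [55, 95, 88],
})
attendance = pd.DataFrame({
    'ID': [101, 102, 106],
    'Attendance': [70, 98, 85],
})

# Default inner join: only IDs 101 and 102 survive
inner = pd.merge(students, attendance, on='ID')

# Left join keeps every student; missing attendance becomes NaN.
# validate checks that ID is unique on both sides; indicator adds '_merge'.
left = pd.merge(students, attendance, on='ID', how='left',
                validate='one_to_one', indicator=True)

print(inner)
print(left)
```

With how='left', ID 105 appears with NaN attendance and _merge set to 'left_only', which the Step 3 cleaning can then fill. ID 106 is dropped because it has no student.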


🧼 Step 3: Clean & Add Insights
Now we’ll handle missing values.

🎯 Tasks:
1. Drop rows with missing Name

2. Fill missing Attendance with average attendance

3. Fill missing Marks with average marks

------------------------------------------------------
Next tasks, after cleaning:

🔹 Add a “Passed” column (Marks ≥ 60).

🔹 Add a “Needs Help” column (Marks < 60 and Attendance < 75).

🔹 Add a “Grade” column (A/B/C; the thresholds are listed in Step 4).
--------------------------------------------------------

In [3]:
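# 1. Drop rows with missing Name
# (.copy() makes df an independent frame, so the assignments below
#  don't trigger SettingWithCopyWarning when some rows are dropped)
df = df[df['Name'].notna()].copy()

# 2. Fill missing Attendance with the column mean (mean() skips NaN)
df['Attendance'] = df['Attendance'].fillna(df['Attendance'].mean())

# 3. Fill missing Marks with the column mean
df['Marks'] = df['Marks'].fillna(df['Marks'].mean())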

print(df)


    ID      Name  Marks  Attendance
0  101    Nobita     55        70.0
1  102  Doraemon     95        98.0
2  103     Suneo     75        76.0
3  104      Gian     45        60.0
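Filling with the mean hides which value was made up. One option, sketched below under the assumption that you want to keep that information, is to count missing values with isna().sum() before and after cleaning and to record imputed rows in an extra boolean column. The column name 'Attendance Imputed' is hypothetical and not part of the original project.

```python
import numpy as np
import pandas as pd

# The merged table from Step 2 (values copied from the printed output)
df = pd.DataFrame({
    'ID': [101, 102, 103, 104],
    'Name': ['Nobita', 'Doraemon', 'Suneo', 'Gian'],
    'Marks': [55, 95, 75, 45],
    'Attendance': [70, 98, np.nan, 60],
})

# How many values are missing in each column before cleaning?
missing_before = df.isna().sum()

# Hypothetical extra column: remember which rows get a filled-in value
df['Attendance Imputed'] = df['Attendance'].isna()

# mean() skips NaN by default, so this is (70 + 98 + 60) / 3 = 76.0
df['Attendance'] = df['Attendance'].fillna(df['Attendance'].mean())

missing_after = df.isna().sum()
print(missing_before)
print(missing_after)
print(df)
```

The median (df['Attendance'].median()) is a common alternative to the mean when a few extreme values could skew the average.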


✅ Step 4: Add Grades and Flags
Now we’ll add:

1. 🎓 Grade column:

🔹  A: Marks ≥ 90

🔹  B: 75 ≤ Marks < 90

🔹  C: Marks < 75

2. 🚩 Needs Help column:

🔹  "Yes" if Marks < 60 and Attendance < 75

🔹  Else "No"
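The three Grade rules are a binning problem, so pd.cut can express them in a single call. This is an alternative to the three .loc assignments in the cell below, not the notebook's own method, and the helper name assign_grades is hypothetical. Setting right=False makes each bin include its left edge, which matches "B: 75 ≤ Marks < 90".

```python
import pandas as pd

def assign_grades(marks: pd.Series) -> pd.Series:
    """Hypothetical helper: map marks to A/B/C using the thresholds above."""
    # Bins: [-inf, 75) -> C, [75, 90) -> B, [90, inf) -> A
    return pd.cut(
        marks,
        bins=[float('-inf'), 75, 90, float('inf')],
        labels=['C', 'B', 'A'],
        right=False,  # include the left edge, so 75 is B and 90 is A
    )

grades = assign_grades(pd.Series([55, 95, 75, 45], name='Marks'))
print(grades.tolist())  # ['C', 'A', 'B', 'C']
```

The result has a categorical dtype. Add .astype(str) if you want plain strings like the Grade column built with .loc.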

In [4]:
# Grade column
df.loc[df['Marks'] >= 90, 'Grade'] = 'A'
df.loc[(df['Marks'] >= 75) & (df['Marks'] < 90), 'Grade'] = 'B'
df.loc[df['Marks'] < 75, 'Grade'] = 'C'

# Needs Help column
df['Needs Help'] = 'No'
df.loc[(df['Marks'] < 60) & (df['Attendance'] < 75), 'Needs Help'] = 'Yes'

print(df)


    ID      Name  Marks  Attendance Grade Needs Help
0  101    Nobita     55        70.0     C        Yes
1  102  Doraemon     95        98.0     A         No
2  103     Suneo     75        76.0     B         No
3  104      Gian     45        60.0     C        Yes
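The Step 3 task list also asks for a “Passed” column (Marks ≥ 60), which the cells above never create. Below is a minimal sketch, assuming a True/False column is acceptable (the task list doesn't say whether it should hold booleans or 'Yes'/'No'). It also writes the Needs Help rule with numpy.where as a one-step alternative to the .loc assignment. The DataFrame is rebuilt from the printed output so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Cleaned table from Step 3 (values copied from the printed output)
df = pd.DataFrame({
    'ID': [101, 102, 103, 104],
    'Name': ['Nobita', 'Doraemon', 'Suneo', 'Gian'],
    'Marks': [55, 95, 75, 45],
    'Attendance': [70.0, 98.0, 76.0, 60.0],
})

# "Passed" from the Step 3 task list: True when Marks >= 60
df['Passed'] = df['Marks'] >= 60

# Same Needs Help rule as Step 4, in one vectorized step
needs_help = (df['Marks'] < 60) & (df['Attendance'] < 75)
df['Needs Help'] = np.where(needs_help, 'Yes', 'No')

print(df)
```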


🔥 Step 5: Save & Visualize Results

Now that your DataFrame is complete with grades and flags, let’s save it to a CSV file and plot how many students received each grade.

In [5]:
df.to_csv("final_student_report.csv", index=False)
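It can be worth confirming that the exported file reads back unchanged. A sketch, assuming the final table printed in Step 4 (rebuilt here so the snippet runs on its own): pandas.testing.assert_frame_equal raises an AssertionError if values, column names, or dtypes differ after the round trip.

```python
import pandas as pd

# Final table as printed at the end of Step 4
df = pd.DataFrame({
    'ID': [101, 102, 103, 104],
    'Name': ['Nobita', 'Doraemon', 'Suneo', 'Gian'],
    'Marks': [55, 95, 75, 45],
    'Attendance': [70.0, 98.0, 76.0, 60.0],
    'Grade': ['C', 'A', 'B', 'C'],
    'Needs Help': ['Yes', 'No', 'No', 'Yes'],
})
df.to_csv("final_student_report.csv", index=False)

# Read the export back and compare: raises AssertionError on any mismatch
reloaded = pd.read_csv("final_student_report.csv")
pd.testing.assert_frame_equal(df, reloaded)
print("Round trip OK:", reloaded.shape)
```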


In [6]:
import matplotlib.pyplot as plt

# 📊 Count of students per Grade (sort_index keeps the bars in A, B, C order)
df['Grade'].value_counts().sort_index().plot(kind='bar', color='skyblue')
plt.title("Number of Students per Grade")
plt.xlabel("Grade")
plt.ylabel("Count")
plt.show()


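Note: if this cell fails with "ModuleNotFoundError: No module named 'matplotlib'", matplotlib isn't installed in the current environment. Install it with pip install matplotlib (or run %pip install matplotlib in a notebook cell), then rerun the cell.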
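If matplotlib isn't available, a text-only summary still gives dashboard-style insights. The sketch below is not part of the original project: it groups the final table by Grade with named aggregation (the output column names Students, Avg_Marks, and Avg_Attendance are illustrative), prints a crude bar of '#' characters per grade, and lists the students flagged for help.

```python
import pandas as pd

# Final table as printed at the end of Step 4
df = pd.DataFrame({
    'ID': [101, 102, 103, 104],
    'Name': ['Nobita', 'Doraemon', 'Suneo', 'Gian'],
    'Marks': [55, 95, 75, 45],
    'Attendance': [70.0, 98.0, 76.0, 60.0],
    'Grade': ['C', 'A', 'B', 'C'],
    'Needs Help': ['Yes', 'No', 'No', 'Yes'],
})

# One row per grade (groupby sorts keys): student count, average marks and attendance
summary = df.groupby('Grade').agg(
    Students=('Name', 'count'),
    Avg_Marks=('Marks', 'mean'),
    Avg_Attendance=('Attendance', 'mean'),
)
print(summary)

# Text-only stand-in for the bar chart
for grade, count in summary['Students'].items():
    print(f"{grade}: {'#' * int(count)} ({count})")

# Names flagged for extra support
flagged = df.loc[df['Needs Help'] == 'Yes', 'Name'].tolist()
print(flagged)
```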

🧠 Your Data is now:
Cleaned

Transformed

Labeled

Exported

Visualized (optional)

