# Student Placement Data Analysis using Pandas

This notebook helps you practice real-world **Pandas Data Analysis** tasks using the dataset `student_placement_data.csv`.

Each question builds your understanding step-by-step.

---


### 1️⃣ Load and Explore the Dataset
Load the dataset and display the first 5 rows to get an overview, then check for missing values and summarize the key columns.

In [6]:
import pandas as pd


df = pd.read_csv(r"C:\Users\ASUS\OneDrive\Desktop\panda project\student_placement_data (1).csv")
print(df.head())

  Student_ID    Student_Name              Course  Assignment_Completion_%  \
0     STU001      Ishaan Das      Data Analytics                       83   
1     STU002     Anika Patel  Full Stack Web Dev                       96   
2     STU003    Aditya Kumar  Full Stack Web Dev                       73   
3     STU004    Aditya Verma  Full Stack Web Dev                       59   
4     STU005  Krishna Sharma  Full Stack Web Dev                       87   

   Content_Score_Avg  Communication_Skill_(10) Batch_Start_Date Placed  \
0                 53                         9       2024-05-19    Yes   
1                 42                         5       2024-02-17     No   
2                 40                         6       2024-08-12    Yes   
3                 44                        10       2024-02-19    Yes   
4                 65                         9       2024-02-23    Yes   

   CTC_LPA  
0     7.60  
1      NaN  
2     4.29  
3     5.33  
4     6.51  
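
The absolute Windows path above only resolves on the original machine. As a minimal sketch, assuming the CSV is stored next to the notebook, a relative path such as `"student_placement_data.csv"` could be passed instead. The snippet below uses an in-memory buffer holding the five rows shown above as a stand-in for the file, so it runs on its own, and it also parses `Batch_Start_Date` as a date. The names `sample_csv` and `sample_df` are illustrative only.

```python
import io
import pandas as pd

# Stand-in for the CSV file: the five rows shown by df.head() above.
# In the notebook, a relative path such as "student_placement_data.csv"
# (assumed location) would replace this buffer.
sample_csv = io.StringIO(
    "Student_ID,Student_Name,Course,Assignment_Completion_%,Content_Score_Avg,"
    "Communication_Skill_(10),Batch_Start_Date,Placed,CTC_LPA\n"
    "STU001,Ishaan Das,Data Analytics,83,53,9,2024-05-19,Yes,7.60\n"
    "STU002,Anika Patel,Full Stack Web Dev,96,42,5,2024-02-17,No,\n"
    "STU003,Aditya Kumar,Full Stack Web Dev,73,40,6,2024-08-12,Yes,4.29\n"
    "STU004,Aditya Verma,Full Stack Web Dev,59,44,10,2024-02-19,Yes,5.33\n"
    "STU005,Krishna Sharma,Full Stack Web Dev,87,65,9,2024-02-23,Yes,6.51\n"
)

# parse_dates converts the column to datetime64 instead of plain strings.
sample_df = pd.read_csv(sample_csv, parse_dates=["Batch_Start_Date"])
print(sample_df.dtypes)
```

With the date column parsed, later steps can filter or group by batch month without extra conversion.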


In [8]:
df.isnull().sum()

Student_ID                   0
Student_Name                 0
Course                       0
Assignment_Completion_%      0
Content_Score_Avg            0
Communication_Skill_(10)     0
Batch_Start_Date             0
Placed                       0
CTC_LPA                     41
dtype: int64
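
The 41 missing `CTC_LPA` values are plausibly the unplaced students, who have no package to report, but the null count alone does not prove it. A small sketch of one way to check, using a hypothetical `sample` frame built from the five rows shown earlier in place of the full `df`:

```python
import numpy as np
import pandas as pd

# Small sample taken from the df.head() output above (not the full dataset).
sample = pd.DataFrame({
    "Student_ID": ["STU001", "STU002", "STU003", "STU004", "STU005"],
    "Placed": ["Yes", "No", "Yes", "Yes", "Yes"],
    "CTC_LPA": [7.60, np.nan, 4.29, 5.33, 6.51],
})

# Cross-tabulate placement status against whether CTC_LPA is missing.
missing_by_status = pd.crosstab(sample["Placed"], sample["CTC_LPA"].isna())
print(missing_by_status)

# True only if CTC_LPA is missing exactly for the unplaced students.
nulls_match_unplaced = (
    sample["CTC_LPA"].isna() == sample["Placed"].eq("No")
).all()
print(nulls_match_unplaced)
```

If the same check holds on the full dataset, the missing values are structural and should not be filled with a mean or zero.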

In [9]:
df['Assignment_Completion_%'].describe()

count    150.000000
mean      71.706667
std       15.775500
min       45.000000
25%       58.250000
50%       70.500000
75%       85.000000
max       99.000000
Name: Assignment_Completion_%, dtype: float64

In [10]:
df['Communication_Skill_(10)'].value_counts()

Communication_Skill_(10)
10    30
3     25
8     21
6     21
5     20
7     14
9     10
4      9
Name: count, dtype: int64

### 2️⃣ Find the average CTC for each course
Group the data by `Course` and find the **average CTC_LPA** for placed students.

In [13]:
average_package = df[df['Placed'] == "Yes"].groupby('Course')['CTC_LPA'].mean()
print(average_package)



Course
Data Analytics        5.990625
Full Stack Web Dev    6.177105
Python DA             6.472051
Name: CTC_LPA, dtype: float64
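
A mean on its own hides how many students it is based on. The sketch below, run on a hypothetical `sample` frame built from the five rows shown at the top of the notebook, reports the mean, median, and count together. It also shows that `mean()` skips `NaN` values, so the `Placed == "Yes"` filter does not change the result when unplaced students have no `CTC_LPA`.

```python
import numpy as np
import pandas as pd

# Sample rows from the df.head() output (not the full dataset).
sample = pd.DataFrame({
    "Course": ["Data Analytics", "Full Stack Web Dev", "Full Stack Web Dev",
               "Full Stack Web Dev", "Full Stack Web Dev"],
    "Placed": ["Yes", "No", "Yes", "Yes", "Yes"],
    "CTC_LPA": [7.60, np.nan, 4.29, 5.33, 6.51],
})

# Mean, median, and number of placed students per course.
placed_only = sample[sample["Placed"] == "Yes"]
summary = placed_only.groupby("Course")["CTC_LPA"].agg(["mean", "median", "count"])
print(summary)

# mean() ignores NaN, so grouping all rows gives the same averages here.
all_rows_mean = sample.groupby("Course")["CTC_LPA"].mean()
print(all_rows_mean)
```

Keeping the explicit filter is still reasonable, since it states the intent and protects against a placed student with a missing package being silently dropped.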


### 3️⃣ Compare average communication skill between placed and unplaced students
Find the average `Communication_Skill_(10)` for both placed and unplaced students.

In [14]:
average_communication_skills = df.groupby('Placed')['Communication_Skill_(10)'].mean()
print(average_communication_skills)
     

Placed
No     6.073171
Yes    6.825688
Name: Communication_Skill_(10), dtype: float64


### 4️⃣ Find top 10 students by assignment completion percentage
Sort the data by `Assignment_Completion_%` in descending order and display top 10 students.

In [15]:
df['Assignment_Completion_%'].nlargest(10)

41     99
136    99
75     98
78     98
13     97
19     97
57     97
99     97
1      96
42     96
Name: Assignment_Completion_%, dtype: int64
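
The output above lists only row indexes and percentages, not the students themselves. One way to show who they are is to call `nlargest` on the DataFrame rather than on the column. The sketch below uses a hypothetical `sample` frame built from the five rows shown at the top of the notebook and takes the top 3 instead of the top 10.

```python
import pandas as pd

# Sample rows from the df.head() output (not the full dataset).
sample = pd.DataFrame({
    "Student_ID": ["STU001", "STU002", "STU003", "STU004", "STU005"],
    "Student_Name": ["Ishaan Das", "Anika Patel", "Aditya Kumar",
                     "Aditya Verma", "Krishna Sharma"],
    "Assignment_Completion_%": [83, 96, 73, 59, 87],
})

columns = ["Student_ID", "Student_Name", "Assignment_Completion_%"]

# DataFrame.nlargest keeps whole rows, so names stay attached to scores.
top_students = sample.nlargest(3, "Assignment_Completion_%")[columns]
print(top_students)

# Equivalent result via an explicit descending sort, as the task describes.
sorted_top = sample.sort_values("Assignment_Completion_%", ascending=False).head(3)[columns]
```

On the full dataset the same call would be `df.nlargest(10, 'Assignment_Completion_%')`. Note that ties at the cut-off (several students at 96 here) are resolved by row order unless `keep='all'` is passed.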

### 5️⃣ Group by course and check placement rate
Count the placed and unplaced students in each course, then compute each course's placement rate as a percentage.

In [18]:
placement_counts = df.groupby(['Course', 'Placed']).size().unstack(fill_value=0)
placement_counts['Placement_Rate_%'] = (
    placement_counts.get('Yes', 0) / placement_counts.sum(axis=1) * 100
).round(2)
placement_counts

Placed              No  Yes  Placement_Rate_%
Course                                       
Data Analytics      22   32             59.26
Full Stack Web Dev  10   38             79.17
Python DA            9   39             81.25
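
The placement rate can also be computed without building the count table first, because the mean of a boolean column is the share of `True` values. The sketch below is one possible alternative, not the notebook's method. It rebuilds one row per student from the counts in the table above (the `counts` dictionary and `students` frame are illustrative stand-ins for `df`).

```python
import pandas as pd

# Counts copied from the placement table above.
counts = {
    "Data Analytics": {"No": 22, "Yes": 32},
    "Full Stack Web Dev": {"No": 10, "Yes": 38},
    "Python DA": {"No": 9, "Yes": 39},
}

# Expand the counts into one row per student.
rows = [
    {"Course": course, "Placed": status}
    for course, by_status in counts.items()
    for status, n in by_status.items()
    for _ in range(n)
]
students = pd.DataFrame(rows)

# Mean of the boolean "is placed" flag per course, as a percentage.
placement_rate = (
    students["Placed"].eq("Yes")
    .groupby(students["Course"])
    .mean()
    .mul(100)
    .round(2)
)
print(placement_rate)
```

This reproduces the `Placement_Rate_%` column (59.26, 79.17, 81.25) in a single expression, at the cost of not showing the raw counts.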


### 6️⃣ (Optional Challenge) Find high-performing students not placed
Find students with `Content_Score_Avg` above 80 and `Communication_Skill_(10)` above 8 who are **not placed**.

In [19]:
high_performers_not_placed = df[
    (df['Content_Score_Avg'] > 80) &
    (df['Communication_Skill_(10)'] > 8) &
    (df['Placed'] == 'No')
]

high_performers_not_placed

   Student_ID Student_Name     Course  Assignment_Completion_%  \
48     STU049  Arjun Kumar  Python DA                       53   

    Content_Score_Avg  Communication_Skill_(10) Batch_Start_Date Placed  \
48                 83                        10       2024-10-22     No   

    CTC_LPA  
48      NaN  
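
The thresholds 80 and 8 are hard-coded in the filter above. A minimal sketch of wrapping the same filter in a function so the cut-offs can be varied; the function name `unplaced_high_performers` and its parameters are hypothetical, and the `sample` frame combines the five rows from the top of the notebook with the one matching row shown above.

```python
import pandas as pd

def unplaced_high_performers(frame, min_content=80, min_communication=8):
    """Return unplaced students scoring strictly above both thresholds."""
    mask = (
        (frame["Content_Score_Avg"] > min_content)
        & (frame["Communication_Skill_(10)"] > min_communication)
        & (frame["Placed"] == "No")
    )
    return frame[mask]

# Sample rows taken from the outputs in this notebook (not the full dataset).
sample = pd.DataFrame({
    "Student_ID": ["STU001", "STU002", "STU003", "STU004", "STU005", "STU049"],
    "Content_Score_Avg": [53, 42, 40, 44, 65, 83],
    "Communication_Skill_(10)": [9, 5, 6, 10, 9, 10],
    "Placed": ["Yes", "No", "Yes", "Yes", "Yes", "No"],
})

result = unplaced_high_performers(sample)
print(result)
```

Lowering the thresholds, for example `unplaced_high_performers(sample, 40, 4)`, widens the search to students who narrowly missed the original cut-offs.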
