# 📊 Placement Data of 1000 Students

This notebook explores a synthetic dataset of 1000 students, including details like CGPA, number of internships, placement status, and salary offered. It's useful for data analysis and machine learning practice.


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set default style
sns.set_theme(style='whitegrid')


## 🔍 Load and Preview the Dataset

We'll load the CSV file and take a quick look at the first few rows to understand the structure.


In [None]:
# Load the dataset
df = pd.read_csv("/kaggle/input/student-placement-data-with-cgpa-and-salary/Placement.csv")

# Display the first 5 rows
df.head()


## 📋 Dataset Overview

Let's examine the data types, missing values, and summary statistics.


In [None]:
# Data structure and types
df.info()

# Statistical summary
df.describe()
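
`df.info()` reports non-null counts, so missing values have to be read off by subtraction. As a minimal sketch, a per-column report makes them explicit. The helper name `missing_value_report` and the tiny `demo` frame are hypothetical; the demo values are made up for illustration and only reuse the column names from this notebook.

```python
import pandas as pd


def missing_value_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values for each column."""
    counts = frame.isna().sum()
    percent = (counts / len(frame) * 100).round(2)
    return pd.DataFrame({"missing": counts, "percent": percent})


# Made-up stand-in data (NOT the real dataset); column names follow this notebook
demo = pd.DataFrame({
    "CGPA": [8.1, 7.4, None, 6.9],
    "Internships": [2, 0, 1, 3],
    "Placed": ["Yes", "No", "Yes", "No"],
    "Salary (INR LPA)": [6.5, None, 4.2, None],
})

print(missing_value_report(demo))
```

On the real data, `missing_value_report(df)` would give the same table for the loaded CSV. If unplaced students have no salary recorded, the salary column may show up here with a high missing percentage, which would be expected rather than a data quality problem.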


## 🎓 CGPA Distribution

We visualize how CGPA is distributed among the 1000 students.


In [None]:
plt.figure(figsize=(8, 5))
sns.histplot(df['CGPA'], kde=True, bins=20, color='skyblue')
plt.title("Distribution of CGPA")
plt.xlabel("CGPA")
plt.ylabel("Number of Students")
plt.show()


## 💼 Internships vs Placement Status

Let's check whether students with more internships are placed more often.


In [None]:
plt.figure(figsize=(8, 5))
sns.countplot(x='Internships', hue='Placed', data=df, palette='Set2')
plt.title("Internships vs Placement Status")
plt.xlabel("Number of Internships")
plt.ylabel("Student Count")
plt.show()
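
Raw counts can mislead when the internship groups differ in size, so it may help to look at the placement share within each group as well. The sketch below uses a row-normalised `pd.crosstab`. The function name `placement_rate_by_internships` and the `demo` values are hypothetical, and the `'No'` label for unplaced students is an assumption (only `'Yes'` appears in this notebook).

```python
import pandas as pd


def placement_rate_by_internships(frame: pd.DataFrame) -> pd.DataFrame:
    """Share of each 'Placed' value within every internship count (rows sum to 1)."""
    return pd.crosstab(frame["Internships"], frame["Placed"], normalize="index")


# Made-up stand-in data (NOT the real dataset); 'No' is an assumed label
demo = pd.DataFrame({
    "Internships": [0, 0, 1, 1, 2, 2, 2, 2],
    "Placed": ["No", "No", "Yes", "No", "Yes", "Yes", "Yes", "No"],
})

rates = placement_rate_by_internships(demo)
print(rates.round(2))
```

Calling `placement_rate_by_internships(df)` on the loaded data would show whether the placed share rises with the number of internships. A rising share indicates an association only; it does not show that internships cause placement.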


## 💰 Salary Distribution of Placed Students

We now look at how salaries are distributed among the placed students.


In [None]:
# Filter only placed students
placed_df = df[df['Placed'] == 'Yes']

plt.figure(figsize=(8, 5))
sns.histplot(placed_df['Salary (INR LPA)'], kde=True, bins=20, color='lightgreen')
plt.title("Salary Distribution (Placed Students)")
plt.xlabel("Salary (INR LPA)")
plt.ylabel("Count")
plt.show()
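
A few summary numbers can complement the histogram, since a skewed salary distribution often has a mean well above its median. This is a minimal sketch: the helper `salary_summary` and the `demo` values are hypothetical, and it assumes placed students are marked `'Yes'` as in the filter above.

```python
import pandas as pd


def salary_summary(frame: pd.DataFrame, column: str = "Salary (INR LPA)") -> pd.Series:
    """Basic salary statistics for rows where 'Placed' equals 'Yes'."""
    placed = frame.loc[frame["Placed"] == "Yes", column]
    return placed.agg(["count", "mean", "median", "min", "max"])


# Made-up stand-in data (NOT the real dataset)
demo = pd.DataFrame({
    "Placed": ["Yes", "No", "Yes", "Yes"],
    "Salary (INR LPA)": [4.0, None, 6.0, 8.0],
})

summary = salary_summary(demo)
print(summary)
```

Running `salary_summary(df)` on the loaded data gives the same statistics for the real salaries; comparing `mean` with `median` is a quick check on how skewed the distribution is.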
