# Student Data Analysis with Pandas

This notebook explores a student performance dataset using Python and Pandas.  
We load the data, inspect its structure, add and rename columns, calculate summary statistics, and create simple charts.

## 1. Load the dataset

We start by loading `student.csv` into a Pandas DataFrame.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Load the student dataset
df = pd.read_csv('../data/student.csv')
df.head()
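If `student.csv` is not at hand, the loading step can be tried on in-memory text. The sketch below is only an illustration: the rows are made up, and only the column names `name`, `class`, `mark`, and `gender` are taken from the cells in this notebook.

```python
import io

import pandas as pd

# Made-up rows standing in for student.csv (hypothetical data);
# only the column names come from this notebook.
csv_text = """name,class,mark,gender
Ann,Four,75,female
Ben,Three,58,male
Cy,Four,90,male
"""

# read_csv accepts any file-like object, so StringIO works like a file path
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)   # (rows, columns)
print(sample.dtypes)  # inferred type of each column
```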

## 2. Basic exploration

Let's check the structure of the dataset to understand what columns we have and what types of values they contain.

In [None]:
# Show column names, data types, and non-null counts
df.info()

In [None]:
# Summary statistics for numeric columns
df.describe()
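Alongside `info()` and `describe()`, it can help to count missing values and duplicate rows directly. This is a minimal sketch on a made-up frame (the values are hypothetical, and whether `student.csv` has any gaps is not known here).

```python
import pandas as pd

# Hypothetical toy data with one missing mark
toy = pd.DataFrame({
    'name': ['Ann', 'Ben', 'Cy'],
    'mark': [75, None, 90],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()

# Number of rows that exactly repeat an earlier row
duplicate_rows = toy.duplicated().sum()

print(missing_per_column)
print(duplicate_rows)
```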

## 3. Indexing and slicing

These examples show how to select individual columns, multiple columns, or specific rows based on conditions.

In [None]:
# Select the 'name' column
df['name']

In [None]:
# Select the 'name' and 'mark' columns
df[['name', 'mark']]

In [None]:
# Select the first 3 rows
df.head(3)

In [None]:
# Select all rows where the class is 'Four'
df[df['class'] == 'Four']
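The same selections can also be written with `.loc` and `.iloc`, which make the row and column parts explicit. The sketch below uses a made-up frame with the same column names. Each condition needs its own parentheses when combined with `&`.

```python
import pandas as pd

# Hypothetical toy data using this notebook's column names
toy = pd.DataFrame({
    'name': ['Ann', 'Ben', 'Cy', 'Di'],
    'class': ['Four', 'Three', 'Four', 'Four'],
    'mark': [75, 58, 90, 40],
})

# Rows in class 'Four' with a mark of at least 60, keeping two columns
high_four = toy.loc[
    (toy['class'] == 'Four') & (toy['mark'] >= 60),
    ['name', 'mark'],
]

# First two rows by position (equivalent to toy.head(2))
first_two = toy.iloc[:2]

print(high_four)
print(first_two)
```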

## 4. Data manipulation

Next, we add new columns and update existing ones:
- Create a `passed` column (`True` when `mark` is 60 or higher)
- Rename `mark` to `score`
- Remove the temporary `passed` column

In [None]:
# Create a new 'passed' column
df['passed'] = df['mark'] >= 60

# Rename 'mark' to 'score'
df.rename(columns={'mark': 'score'}, inplace=True)

# Remove the temporary 'passed' column
df.drop(columns=['passed'], inplace=True)

df.head()
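As an alternative to `inplace=True`, the same three steps can return new DataFrames and leave the original untouched. This is one possible style rather than the approach used above, and the toy data is made up.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({'name': ['Ann', 'Ben'], 'mark': [75, 58]})

# assign() returns a copy with the extra column added
with_passed = toy.assign(passed=toy['mark'] >= 60)

# rename() and drop() also return new frames when inplace is not set
result = (
    with_passed
    .rename(columns={'mark': 'score'})
    .drop(columns=['passed'])
)

print(with_passed)
print(result)
```

Because nothing is modified in place, re-running a cell does not fail on a column that was already renamed or dropped.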

## 5. Aggregation and grouping

We calculate average scores and student counts to understand performance patterns.

In [None]:
# Average score per class
df.groupby('class')['score'].mean()

In [None]:
# Number of students in each class
df['class'].value_counts()

In [None]:
# Average score by gender
df.groupby('gender')['score'].mean()
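Several statistics can be computed in one pass by handing a list of function names to `agg`. The sketch below uses made-up scores to show the shape of the result.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'class': ['Four', 'Four', 'Three', 'Three'],
    'score': [80, 60, 50, 70],
})

# One row per class, one column per statistic
summary = toy.groupby('class')['score'].agg(['mean', 'count', 'max'])

print(summary)
```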

## 6. Pivot table and grade assignment

We compare average scores across classes and genders, and assign letter grades based on score ranges.

In [None]:
# Pivot table showing average score by class and gender
# (aggfunc='mean' is the default; it is spelled out here for clarity)
df.pivot_table(values='score', index='class', columns='gender', aggfunc='mean')
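A pivot table of means is a grouped mean over two keys, reshaped so that one key becomes the columns. The sketch below checks this on made-up data by comparing `pivot_table` with `groupby(...).mean().unstack()`.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'class': ['Four', 'Four', 'Four', 'Three', 'Three'],
    'gender': ['female', 'male', 'male', 'female', 'male'],
    'score': [80, 60, 70, 50, 90],
})

# Mean score per class (rows) and gender (columns)
pivot = toy.pivot_table(
    values='score', index='class', columns='gender', aggfunc='mean'
)

# Same numbers: group by both keys, then move 'gender' into the columns
grouped = toy.groupby(['class', 'gender'])['score'].mean().unstack()

print(pivot)
print(grouped)
```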

In [None]:
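# Assign letter grades; include_lowest=True keeps a score of exactly 0
# in the 'D' bin (the first interval would otherwise exclude its left edge)
df['grade'] = pd.cut(
    df['score'],
    bins=[0, 59, 69, 84, 100],
    labels=['D', 'C', 'B', 'A'],
    include_lowest=True
)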

df.head()
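It is worth checking how the bin edges `[0, 59, 69, 84, 100]` behave at the boundaries. By default `pd.cut` builds intervals that are open on the left and closed on the right, so a score of exactly 0 falls outside the first bin unless `include_lowest=True` is passed. The scores below are made up to hit each edge.

```python
import pandas as pd

# Hypothetical scores sitting on every bin boundary
scores = pd.Series([0, 59, 60, 69, 70, 84, 85, 100])
bins = [0, 59, 69, 84, 100]
labels = ['D', 'C', 'B', 'A']

# Default: intervals are (0, 59], (59, 69], (69, 84], (84, 100]
default_grades = pd.cut(scores, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left as well
inclusive_grades = pd.cut(scores, bins=bins, labels=labels, include_lowest=True)

print(pd.DataFrame({
    'score': scores,
    'default': default_grades,
    'include_lowest': inclusive_grades,
}))
```

Fractional scores such as 59.5 would land in the `C` bin with these edges, which may or may not be the intended rule.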

## 7. Sorting and exporting

We sort students by score (highest first) and export the final dataset with grades included.

In [None]:
# Sort by score in descending order
df_sorted = df.sort_values(by='score', ascending=False)
df_sorted.head()
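When scores tie, a second sort key gives a predictable order, and `nlargest` is a shortcut when only the top few rows are needed. A small sketch on made-up data:

```python
import pandas as pd

# Hypothetical toy data with a tie on score
toy = pd.DataFrame({
    'name': ['Cy', 'Ann', 'Ben'],
    'score': [90, 90, 58],
})

# Highest score first; ties broken alphabetically by name
ranked = (
    toy.sort_values(by=['score', 'name'], ascending=[False, True])
       .reset_index(drop=True)
)

# The two highest-scoring rows, without sorting the whole frame
top_two = toy.nlargest(2, 'score')

print(ranked)
print(top_two)
```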

In [None]:
# Export the final dataset with grades
df.to_csv('../data/student_with_grade.csv', index=False)
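A quick way to confirm an export is to read the file back. The sketch below writes a made-up frame to a temporary directory (so it does not touch `../data`) and reloads it. Note that CSV stores only text, so the categorical `grade` column comes back as plain strings.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data with a categorical grade column
toy = pd.DataFrame({'name': ['Ann', 'Ben'], 'score': [75, 58]})
toy['grade'] = pd.cut(
    toy['score'], bins=[0, 59, 69, 84, 100], labels=['D', 'C', 'B', 'A']
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'student_with_grade.csv')
    toy.to_csv(path, index=False)  # index=False avoids an extra unnamed column
    reloaded = pd.read_csv(path)

print(toy['grade'].dtype)       # category
print(reloaded['grade'].dtype)  # no longer category after the round trip
print(reloaded)
```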

## 8. Visualisation

We plot the distribution of scores and explore differences between classes.

In [None]:
# Histogram of student scores
plt.figure(figsize=(8, 5))
df['score'].plot(kind='hist', bins=10)

plt.xlabel('Score')
plt.ylabel('Frequency')
plt.title('Distribution of Student Scores')
plt.grid(axis='y', alpha=0.3)
plt.show()

In [None]:
# Bar chart of average score by class
plt.figure(figsize=(8, 5))
df.groupby('class')['score'].mean().plot(kind='bar')

plt.ylabel('Average Score')
plt.title('Average Score by Class')
plt.xticks(rotation=0)
plt.grid(axis='y', alpha=0.3)
plt.show()
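To see the numbers behind a histogram without drawing it, the bin counts can be computed with NumPy. With `bins=10`, the range from the lowest to the highest score is split into ten equal-width bins. The scores below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical scores
scores = pd.Series([35, 50, 58, 62, 75, 76, 90, 100])

# Ten equal-width bins spanning min(scores) to max(scores)
counts, edges = np.histogram(scores, bins=10)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f'{left:5.1f} to {right:5.1f}: {count}')
```

Each printed row corresponds to one bar: its left and right edges and the number of scores that fall between them.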

## 9. Conclusions

This dataset gives a simple overview of student performance.  
Using Pandas, we:

- Explored the structure of the data  
- Created, renamed, and removed columns  
- Calculated averages by class and gender  
- Generated grade categories  
- Plotted score distributions  

These steps form the foundation of basic data analysis in Python.