# Module 4 Project: Analyze a Simple Dataset

In this project, you will apply your new Pandas and Matplotlib skills to load, explore, and visualize a dataset. This is what a data analyst does every day!

### The Goal

Your task is to use the `sample_student_data.csv` dataset to answer the following questions:
1. What is the average grade for each age group?
2. Which student has the highest grade?
3. Create a pie chart showing the proportion of students of each age.

### Project Steps

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
df = pd.read_csv('../data/sample_student_data.csv')
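Before answering the questions, it usually helps to check that the file loaded as expected. The sketch below does this on a small in-memory stand-in for the CSV. The `Age` and `Grade` columns come from the project, but the `Name` column and every value here are assumptions made up for illustration; your real file may look different.

```python
import io

import pandas as pd

# Hypothetical stand-in for sample_student_data.csv.
# The 'Name' column and all values are invented for this example.
csv_text = """Name,Age,Grade
Ana,14,88
Ben,15,92
Cara,14,75
Dev,16,92
"""
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.head())        # first few rows
print(sample_df.dtypes)        # data type of each column
print(sample_df.isna().sum())  # number of missing values per column
```

Running the same three checks on `df` should show whether `Age` and `Grade` were read as numbers and whether any values are missing.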

#### 1. What is the average grade for each age group?
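To solve this, we can use the Pandas `groupby()` method. It splits the rows into one group per distinct `Age` value, and `mean()` then averages the `Grade` column within each group.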

In [None]:
average_grade_by_age = df.groupby('Age')['Grade'].mean()
print("Average grade by age:")
print(average_grade_by_age)
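To see what the `groupby()` step does, it can be tried on a tiny hand-made DataFrame where the averages are easy to check by eye. This is a minimal sketch; the names and values are hypothetical and are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical data, small enough to verify by hand.
students = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cara", "Dev"],
    "Age": [14, 15, 14, 16],
    "Grade": [88, 92, 75, 92],
})

# One average per distinct age; the result is a Series indexed by Age.
mean_by_age = students.groupby("Age")["Grade"].mean()
print(mean_by_age)

# agg() can report several statistics at once, e.g. the group size too.
summary = students.groupby("Age")["Grade"].agg(["mean", "count"])
print(summary)
```

Including the count alongside the mean is a useful habit: an average based on one or two students says much less than one based on twenty.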

#### 2. Which student has the highest grade?
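We can sort the DataFrame by the 'Grade' column in descending order and take the first row with `.iloc[0]`. Note that if several students share the highest grade, this returns only one of them.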

In [None]:
top_student = df.sort_values(by='Grade', ascending=False).iloc[0]
print("\nTop student:")
print(top_student)
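Sorting and taking the first row returns a single student even when there is a tie. One possible way to handle ties is sketched below on hypothetical data (the `Name` column and all values are assumptions): find the maximum grade first, then keep every row that matches it.

```python
import pandas as pd

# Hypothetical data in which two students share the top grade.
students = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cara", "Dev"],
    "Age": [14, 15, 14, 16],
    "Grade": [88, 92, 75, 92],
})

# idxmax() gives the index label of the first row holding the maximum.
first_top = students.loc[students["Grade"].idxmax()]
print(first_top)

# Boolean filtering keeps every row tied for the maximum.
top_grade = students["Grade"].max()
top_students = students[students["Grade"] == top_grade]
print(top_students)
```

Which version is appropriate depends on the question: "a student with the highest grade" is answered by either, while "all students with the highest grade" needs the filter.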

#### 3. Create a pie chart showing the proportion of students of each age.
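First, we need to count how many students there are for each age. The `value_counts()` method does exactly this: it returns a Series whose index holds the distinct ages and whose values are the number of students at each age.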

In [None]:
age_counts = df['Age'].value_counts()

plt.figure(figsize=(8, 8))
plt.pie(
    age_counts,
    labels=age_counts.index,
    autopct='%1.1f%%',
    startangle=140,
    colors=['#ff9999', '#66b3ff', '#99ff99', '#ffcc99'],
)
plt.title('Proportion of Students by Age')
plt.ylabel('')  # Has no visible effect here: plt.pie() adds no y-label (pandas' .plot.pie() does)
plt.show()
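The percentages that `autopct` prints on the chart can also be computed directly, which is a handy way to double-check the figure. The sketch below uses hypothetical ages rather than the real dataset.

```python
import pandas as pd

# Hypothetical ages, for illustration only.
students = pd.DataFrame({"Age": [14, 15, 14, 16]})

# normalize=True returns fractions instead of raw counts;
# sort_index() orders the result by age rather than by frequency.
age_share = students["Age"].value_counts(normalize=True).sort_index() * 100
print(age_share.round(1))
```

The values should add up to 100 and match the labels on the pie slices, up to rounding.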

### Congratulations!

You have successfully performed a small data analysis project from start to finish. You loaded data, asked questions, analyzed the data to find answers, and created a visualization to communicate your findings. These are the core skills of a data analyst.

You are now ready to move on to the next module, where you will learn about **Version Control with Git**.