# Pandas Practice with Sample DataFrame

This notebook creates a sample DataFrame and provides a series of Pandas practice questions for you to solve in VSCode. The data describes ten students, with their IDs, names, ages, grades, and subjects. Each question explains what is required, but you will need to write the code to solve it. Run the first cell to create the DataFrame and the `students.csv` file, then solve each question in the code cell that follows it.

```python
import pandas as pd
import numpy as np

# Ten sample students: one list per column, all lists the same length.
data = {
    'Student_ID': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma', 'Frank', 'Grace', 'Hannah', 'Isaac', 'Julia'],
    'Age': [20, 21, 19, 22, 20, 23, 21, 20, 19, 22],
    'Grade': [85.5, 90.0, 78.5, 92.0, 88.5, 95.0, 82.0, 87.5, 91.0, 79.5],
    'Subject': ['Math', 'Science', 'Math', 'English', 'Science', 'Math', 'English', 'Science', 'Math', 'English']
}

df = pd.DataFrame(data)

# Save without the row index so the CSV holds only the five data columns.
df.to_csv('students.csv', index=False)
```

## Practice Questions

Below are 15 Pandas practice questions, each with a short explanation of the task. Write your code in the cell provided for each question, then run it in VSCode to test your solution.

### Question 1: Read the CSV file into a DataFrame

**Explanation**:
- You need to read the 'students.csv' file (created above) into a Pandas DataFrame. Use the appropriate Pandas function to load the CSV file and assign it to a variable (e.g., `df`).

```python
df = pd.read_csv('students.csv')
df
```

|   | Student_ID | Name | Age | Grade | Subject |
|---|---|---|---|---|---|
| 0 | 101 | Alice | 20 | 85.5 | Math |
| 1 | 102 | Bob | 21 | 90.0 | Science |
| 2 | 103 | Charlie | 19 | 78.5 | Math |
| 3 | 104 | David | 22 | 92.0 | English |
| 4 | 105 | Emma | 20 | 88.5 | Science |
| 5 | 106 | Frank | 23 | 95.0 | Math |
| 6 | 107 | Grace | 21 | 82.0 | English |
| 7 | 108 | Hannah | 20 | 87.5 | Science |
| 8 | 109 | Isaac | 19 | 91.0 | Math |
| 9 | 110 | Julia | 22 | 79.5 | English |

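The setup cell writes the file with `index=False`, so reading it back gives the same five columns. The minimal sketch below shows what happens otherwise. It uses a small illustrative subset of the data and an in-memory buffer (`io.StringIO`) in place of a file on disk. Writing with the default `index=True` stores the row index as an extra, unnamed first column, which `pd.read_csv` labels `Unnamed: 0`. Passing `index_col=0` turns that column back into the index.

```python
import io

import pandas as pd

# Illustrative three-row subset of the students data.
df = pd.DataFrame({
    "Student_ID": [101, 102, 103],
    "Name": ["Alice", "Bob", "Charlie"],
    "Grade": [85.5, 90.0, 78.5],
})

# to_csv() with no path returns a string; by default it also writes the index.
csv_with_index = df.to_csv()
reread = pd.read_csv(io.StringIO(csv_with_index))
print(reread.columns.tolist())  # ['Unnamed: 0', 'Student_ID', 'Name', 'Grade']

# index_col=0 tells read_csv to use that first column as the index instead.
restored = pd.read_csv(io.StringIO(csv_with_index), index_col=0)
print(restored.columns.tolist())  # ['Student_ID', 'Name', 'Grade']

# Writing with index=False (as the setup cell does) avoids the extra column entirely.
clean = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(clean.columns.tolist())  # ['Student_ID', 'Name', 'Grade']
```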

### Question 2: Display the first 3 rows of the DataFrame

**Explanation**:
- Display the first 3 rows of the DataFrame to get a quick look at the data. Use a method that allows you to specify the number of rows to show.

```python
df.head(3)
```

|   | Student_ID | Name | Age | Grade | Subject |
|---|---|---|---|---|---|
| 0 | 101 | Alice | 20 | 85.5 | Math |
| 1 | 102 | Bob | 21 | 90.0 | Science |
| 2 | 103 | Charlie | 19 | 78.5 | Math |


### Question 3: Display the last 2 rows of the DataFrame

**Explanation**:
- Display the last 2 rows of the DataFrame. Use a method that allows you to specify the number of rows to show from the end of the DataFrame.

```python
df.tail(2)
```

|   | Student_ID | Name | Age | Grade | Subject |
|---|---|---|---|---|---|
| 8 | 109 | Isaac | 19 | 91.0 | Math |
| 9 | 110 | Julia | 22 | 79.5 | English |

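`head(n)` and `tail(n)` select rows by position, much like `iloc` slices. A negative `n` reverses their meaning and returns "everything except" the last (for `head`) or first (for `tail`) `|n|` rows. Here is a small sketch on a hypothetical five-name frame:

```python
import pandas as pd

# Hypothetical five-row frame for illustration.
df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David", "Emma"]})

first_three = df.head(3)  # same rows as df.iloc[:3]
last_two = df.tail(2)     # same rows as df.iloc[-2:]

# Negative n: drop rows from the other end instead of keeping them.
all_but_last_two = df.head(-2)   # Alice, Bob, Charlie
all_but_first_two = df.tail(-2)  # Charlie, David, Emma

print(first_three.equals(df.iloc[:3]))  # True
print(last_two["Name"].tolist())        # ['David', 'Emma']
```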

### Question 4: Get basic information about the DataFrame

**Explanation**:
- Retrieve information about the DataFrame, such as column names, data types, and the number of non-null values in each column.

```python
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Student_ID  10 non-null     int64  
 1   Name        10 non-null     object 
 2   Age         10 non-null     int64  
 3   Grade       10 non-null     float64
 4   Subject     10 non-null     object 
dtypes: float64(1), int64(2), object(2)
memory usage: 532.0+ bytes
```

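`df.info()` prints its summary and returns `None`, so its result cannot be stored or checked in code. If you need the same facts as Python objects, a minimal sketch follows. It uses a hypothetical three-row frame with one missing name to make the non-null counts differ.

```python
import pandas as pd

# Hypothetical frame with one missing Name value.
df = pd.DataFrame({
    "Student_ID": [101, 102, 103],
    "Name": ["Alice", "Bob", None],
    "Grade": [85.5, 90.0, 78.5],
})

result = df.info()  # prints the summary to stdout...
print(result)       # ...and returns None

# The same information as inspectable objects:
print(df.shape)                          # (3, 3): rows, columns
print(df.dtypes)                         # dtype of each column
print(df.notna().sum())                  # non-null count per column
print(df.memory_usage(deep=True).sum())  # total bytes, counting string contents
```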

### Question 5: Get a statistical summary of the numerical columns

**Explanation**:
- Generate a statistical summary (e.g., count, mean, standard deviation, min, max, quartiles) for the numerical columns in the DataFrame.

```python
df.describe()
```

|       | Student_ID | Age | Grade |
|---|---|---|---|
| count | 10.0 | 10.0 | 10.0 |
| mean | 105.5 | 20.7 | 86.95 |
| std | 3.02765 | 1.337494 | 5.499747 |
| min | 101.0 | 19.0 | 78.5 |
| 25% | 103.25 | 20.0 | 82.875 |
| 50% | 105.5 | 20.5 | 88.0 |
| 75% | 107.75 | 21.75 | 90.75 |
| max | 110.0 | 23.0 | 95.0 |

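Two details of `describe()` are worth checking yourself, shown here on the `Grade` values from the setup cell. The `percentiles` argument replaces the default 25%/75% quartiles, and the median is always included. The `std` row is the *sample* standard deviation (`ddof=1`), which differs from NumPy's default population formula (`ddof=0`).

```python
import numpy as np
import pandas as pd

grades = pd.Series([85.5, 90.0, 78.5, 92.0, 88.5, 95.0, 82.0, 87.5, 91.0, 79.5])

# Custom percentiles; 50% is added automatically.
summary = grades.describe(percentiles=[0.1, 0.9])
print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '10%', '50%', '90%', 'max']

# describe() uses the sample standard deviation, matching Series.std(ddof=1)...
print(np.isclose(summary["std"], grades.std(ddof=1)))  # True
# ...whereas np.std on the raw array defaults to ddof=0 (population).
print(np.isclose(summary["std"], np.std(grades.to_numpy())))  # False
```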

### Question 6: Select the 'Name' and 'Grade' columns

**Explanation**:
- Select and display only the 'Name' and 'Grade' columns from the DataFrame. You'll need a way to select multiple columns at once.

```python
df[["Name", "Grade"]]
```

|   | Name | Grade |
|---|---|---|
| 0 | Alice | 85.5 |
| 1 | Bob | 90.0 |
| 2 | Charlie | 78.5 |
| 3 | David | 92.0 |
| 4 | Emma | 88.5 |
| 5 | Frank | 95.0 |
| 6 | Grace | 82.0 |
| 7 | Hannah | 87.5 |
| 8 | Isaac | 91.0 |
| 9 | Julia | 79.5 |

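The number of brackets matters when selecting columns. Single brackets return a `Series`, while a list of names inside the brackets always returns a `DataFrame`, even for one column. The sketch below uses a small illustrative subset and also shows the equivalent `loc`-based selection.

```python
import pandas as pd

# Illustrative two-row subset.
df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [20, 21],
    "Grade": [85.5, 90.0],
})

one_col = df["Name"]                      # single brackets -> Series
one_col_frame = df[["Name"]]              # list of names -> DataFrame
two_cols = df[["Name", "Grade"]]          # column order follows the list
same_cols = df.loc[:, ["Name", "Grade"]]  # equivalent label-based selection

print(type(one_col).__name__, type(one_col_frame).__name__)  # Series DataFrame
print(two_cols.equals(same_cols))  # True
```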

### Question 7: Filter students with Grade greater than 90

**Explanation**:
- Filter the DataFrame to show only the rows where the 'Grade' column value is greater than 90.

```python
df[df['Grade'] > 90]
```

|   | Student_ID | Name | Age | Grade | Subject |
|---|---|---|---|---|---|
| 3 | 104 | David | 22 | 92.0 | English |
| 5 | 106 | Frank | 23 | 95.0 | Math |
| 8 | 109 | Isaac | 19 | 91.0 | Math |

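The expression inside the brackets is a boolean mask: a `Series` of `True`/`False` values, one per row. The sketch below uses an illustrative subset of the students. It shows the mask itself, how `loc` can filter rows and pick a column in one step, and `DataFrame.query` as a string-based alternative. Bob's grade of exactly 90.0 is excluded because `>` is strict.

```python
import pandas as pd

# Illustrative subset of the students data.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "David", "Frank", "Isaac"],
    "Grade": [85.5, 90.0, 92.0, 95.0, 91.0],
})

mask = df["Grade"] > 90  # one True/False per row
print(mask.tolist())     # [False, False, True, True, True]

# Filter rows and select a single column in one step.
top_names = df.loc[mask, "Name"]
print(top_names.tolist())  # ['David', 'Frank', 'Isaac']

# query() expresses the same filter as a string expression.
print(df.query("Grade > 90").equals(df[mask]))  # True
```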

### Question 8: Filter students who study Math

**Explanation**:
- Filter the DataFrame to show only the rows where the 'Subject' column is equal to 'Math'.

```python
df[df['Subject'] == 'Math']
```

|   | Student_ID | Name | Age | Grade | Subject |
|---|---|---|---|---|---|
| 0 | 101 | Alice | 20 | 85.5 | Math |
| 2 | 103 | Charlie | 19 | 78.5 | Math |
| 5 | 106 | Frank | 23 | 95.0 | Math |
| 8 | 109 | Isaac | 19 | 91.0 | Math |

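Equality filters extend naturally in two directions. `Series.isin` matches any value in a collection, and `~` negates a mask. A short sketch on an illustrative four-student subset:

```python
import pandas as pd

# Illustrative subset of the students data.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Subject": ["Math", "Science", "Math", "English"],
})

math_only = df[df["Subject"] == "Math"]
# isin() keeps rows whose value appears in the given list.
math_or_science = df[df["Subject"].isin(["Math", "Science"])]
# ~ flips a boolean mask: everyone *not* taking Math.
not_math = df[~(df["Subject"] == "Math")]

print(math_only["Name"].tolist())        # ['Alice', 'Charlie']
print(math_or_science["Name"].tolist())  # ['Alice', 'Bob', 'Charlie']
print(not_math["Name"].tolist())         # ['Bob', 'David']
```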

### Question 9: Filter students with Age 20 and Grade above 85

**Explanation**:
- Filter the DataFrame to show rows where the 'Age' is 20 and the 'Grade' is greater than 85. You’ll need to combine two conditions using a logical operator.

```python
df[(df['Age'] == 20) & (df['Grade'] > 85)]
```

|   | Student_ID | Name | Age | Grade | Subject |
|---|---|---|---|---|---|
| 0 | 101 | Alice | 20 | 85.5 | Math |
| 4 | 105 | Emma | 20 | 88.5 | Science |
| 7 | 108 | Hannah | 20 | 87.5 | Science |

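The parentheses around each condition are required. In Python, `&` and `|` bind more tightly than comparisons such as `==` and `>`, so each condition has to be wrapped before the conditions are combined. The `and` keyword does not work on a `Series` at all, because Python would have to reduce the whole `Series` to a single truth value, and pandas raises `ValueError` instead. A sketch on an illustrative three-row subset:

```python
import pandas as pd

# Illustrative subset of the students data.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Emma"],
    "Age": [20, 21, 20],
    "Grade": [85.5, 90.0, 88.5],
})

# Element-wise AND: wrap each condition, combine with &.
both = df[(df["Age"] == 20) & (df["Grade"] > 85)]
print(both["Name"].tolist())  # ['Alice', 'Emma']

# Element-wise OR uses |.
either = df[(df["Age"] == 21) | (df["Grade"] > 88)]
print(either["Name"].tolist())  # ['Bob', 'Emma']

# The `and` keyword asks each Series for a single bool, which pandas refuses.
raised = False
try:
    df[(df["Age"] == 20) and (df["Grade"] > 85)]
except ValueError as err:
    raised = True
    print("ValueError:", err)  # "The truth value of a Series is ambiguous..."
```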

### Question 10: Sort the DataFrame by Grade in descending order

**Explanation**:
- Sort the DataFrame based on the 'Grade' column in descending order (highest to lowest).

```python
df.sort_values(by="Grade", ascending=False)
```

|   | Student_ID | Name | Age | Grade | Subject |
|---|---|---|---|---|---|
| 5 | 106 | Frank | 23 | 95.0 | Math |
| 3 | 104 | David | 22 | 92.0 | English |
| 8 | 109 | Isaac | 19 | 91.0 | Math |
| 1 | 102 | Bob | 21 | 90.0 | Science |
| 4 | 105 | Emma | 20 | 88.5 | Science |
| 7 | 108 | Hannah | 20 | 87.5 | Science |
| 0 | 101 | Alice | 20 | 85.5 | Math |
| 6 | 107 | Grace | 21 | 82.0 | English |
| 9 | 110 | Julia | 22 | 79.5 | English |
| 2 | 103 | Charlie | 19 | 78.5 | Math |

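`sort_values` also accepts a list of columns with a matching list of directions. It returns a new DataFrame and leaves the original unchanged unless you reassign the result. For "top N" questions, `nlargest` is a shorter alternative. A sketch on an illustrative five-student subset:

```python
import pandas as pd

# Illustrative subset of the students data.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Emma", "Frank"],
    "Grade": [85.5, 90.0, 78.5, 88.5, 95.0],
    "Subject": ["Math", "Science", "Math", "Science", "Math"],
})

# Subject A-Z first, then Grade high-to-low within each subject.
by_subject = df.sort_values(by=["Subject", "Grade"], ascending=[True, False])
print(by_subject["Name"].tolist())  # ['Frank', 'Alice', 'Charlie', 'Bob', 'Emma']

# The original frame keeps its order because the result was not reassigned.
print(df["Name"].iloc[0])  # Alice

# The two highest grades, without sorting and slicing by hand.
print(df.nlargest(2, "Grade")["Name"].tolist())  # ['Frank', 'Bob']
```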

### Question 11: Calculate the average Grade for each Subject

**Explanation**:
- Group the DataFrame by the 'Subject' column and calculate the mean of the 'Grade' column for each subject.

```python
df.groupby("Subject")["Grade"].mean()
```

```text
Subject
English    84.500000
Math       87.500000
Science    88.666667
Name: Grade, dtype: float64
```

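To compute several statistics per group at once, `agg` accepts named aggregations of the form `new_name=(column, function)`. The sketch below reuses the `Grade` and `Subject` values from the setup cell. The output column names are illustrative choices, not anything pandas requires.

```python
import pandas as pd

df = pd.DataFrame({
    "Grade": [85.5, 90.0, 78.5, 92.0, 88.5, 95.0, 82.0, 87.5, 91.0, 79.5],
    "Subject": ["Math", "Science", "Math", "English", "Science",
                "Math", "English", "Science", "Math", "English"],
})

# Named aggregation: output column = (input column, aggregation function).
stats = df.groupby("Subject").agg(
    mean_grade=("Grade", "mean"),
    max_grade=("Grade", "max"),
    n_students=("Grade", "count"),
)
print(stats)
```

The group keys become the index, sorted alphabetically by default, so `stats.loc["Math", "mean_grade"]` retrieves a single value.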
### Question 12: Add a new column 'Pass' based on Grade

**Explanation**:
- Create a new column 'Pass' whose value is 'Yes' if the 'Grade' is 80 or higher and 'No' otherwise. You can use a conditional expression or a function such as `np.where`.

```python
df['Pass'] = np.where(df['Grade'] >= 80, 'Yes', 'No')
df
```

|   | Student_ID | Name | Age | Grade | Subject | Pass |
|---|---|---|---|---|---|---|
| 0 | 101 | Alice | 20 | 85.5 | Math | Yes |
| 1 | 102 | Bob | 21 | 90.0 | Science | Yes |
| 2 | 103 | Charlie | 19 | 78.5 | Math | No |
| 3 | 104 | David | 22 | 92.0 | English | Yes |
| 4 | 105 | Emma | 20 | 88.5 | Science | Yes |
| 5 | 106 | Frank | 23 | 95.0 | Math | Yes |
| 6 | 107 | Grace | 21 | 82.0 | English | Yes |
| 7 | 108 | Hannah | 20 | 87.5 | Science | Yes |
| 8 | 109 | Isaac | 19 | 91.0 | Math | Yes |
| 9 | 110 | Julia | 22 | 79.5 | English | No |

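`np.where(condition, value_if_true, value_if_false)` evaluates the condition element-wise. A pandas-only alternative maps the boolean mask straight to labels. The sketch below uses the three students closest to the cutoff as an illustrative subset and checks that both approaches agree.

```python
import numpy as np
import pandas as pd

# Illustrative subset: the students nearest the 80-point cutoff.
df = pd.DataFrame({"Name": ["Charlie", "Grace", "Julia"], "Grade": [78.5, 82.0, 79.5]})

df["Pass"] = np.where(df["Grade"] >= 80, "Yes", "No")

# Pandas-only equivalent: build the mask, then map True/False to labels.
pass_via_map = (df["Grade"] >= 80).map({True: "Yes", False: "No"})

print(df["Pass"].tolist())                           # ['No', 'Yes', 'No']
print(pass_via_map.tolist() == df["Pass"].tolist())  # True

# Keeping the mask as booleans makes counting easy: True sums as 1.
print((df["Grade"] >= 80).sum())  # 1
```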

### Question 13: Replace missing values in 'Grade' (if any) with the mean Grade

**Explanation**:
- Check if there are any missing values in the 'Grade' column. If there are, replace them with the mean of the 'Grade' column. (Note: This DataFrame has no missing values, but practice the technique.)

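```python
df['Grade'] = df['Grade'].fillna(df['Grade'].mean())
df
```

|   | Student_ID | Name | Age | Grade | Subject | Pass |
|---|---|---|---|---|---|---|
| 0 | 101 | Alice | 20 | 85.5 | Math | Yes |
| 1 | 102 | Bob | 21 | 90.0 | Science | Yes |
| 2 | 103 | Charlie | 19 | 78.5 | Math | No |
| 3 | 104 | David | 22 | 92.0 | English | Yes |
| 4 | 105 | Emma | 20 | 88.5 | Science | Yes |
| 5 | 106 | Frank | 23 | 95.0 | Math | Yes |
| 6 | 107 | Grace | 21 | 82.0 | English | Yes |
| 7 | 108 | Hannah | 20 | 87.5 | Science | Yes |
| 8 | 109 | Isaac | 19 | 91.0 | Math | Yes |
| 9 | 110 | Julia | 22 | 79.5 | English | No |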

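The students data has no missing grades, so the cell above leaves it unchanged. To see the technique actually do something, the sketch below builds a hypothetical `Grade` series with two values deliberately set to `NaN`. `Series.mean()` skips `NaN` by default (`skipna=True`), so the fill value is the mean of the grades that are present.

```python
import numpy as np
import pandas as pd

# Hypothetical grades with two values deliberately missing.
grades = pd.Series([85.5, np.nan, 78.5, 92.0, np.nan], name="Grade")

print(grades.isna().sum())  # 2

mean_grade = grades.mean()  # NaN values are skipped: (85.5 + 78.5 + 92.0) / 3
filled = grades.fillna(mean_grade)

print(round(mean_grade, 2))  # 85.33
print(filled.isna().any())   # False
```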

### Question 14: Count the number of students per Subject

**Explanation**:
- Count how many students are enrolled in each subject. Use a method to get the frequency of unique values in the 'Subject' column.

```python
df['Subject'].value_counts()
```

```text
Subject
Math       4
Science    3
English    3
Name: count, dtype: int64
```

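`value_counts()` sorts by frequency, highest first. Two small variations are often useful: `normalize=True` returns proportions instead of counts, and `sort_index()` orders the result alphabetically by subject. A sketch using the `Subject` values from the setup cell:

```python
import pandas as pd

subjects = pd.Series(["Math", "Science", "Math", "English", "Science",
                      "Math", "English", "Science", "Math", "English"], name="Subject")

counts = subjects.value_counts()                # counts, highest first
shares = subjects.value_counts(normalize=True)  # proportions that sum to 1
by_name = counts.sort_index()                   # alphabetical by subject

print(counts["Math"], shares["Math"])  # 4 0.4
print(by_name.index.tolist())          # ['English', 'Math', 'Science']
```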
### Question 15: Drop the 'Student_ID' column

**Explanation**:
- Remove the 'Student_ID' column from the DataFrame. Ensure the change is reflected in the DataFrame (you may need to assign the result or use an inplace option).

```python
df.drop(columns=["Student_ID"], inplace=True)
```
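
With `inplace=True`, running the cell a second time raises a `KeyError` because the column is already gone. Out-of-order re-execution like this is a common source of confusing notebook output. The sketch below uses a hypothetical two-row frame to show two alternatives. The assignment form leaves the original untouched until you reassign it, and `errors="ignore"` makes the drop safe to repeat.

```python
import pandas as pd

# Hypothetical two-row frame for illustration.
df = pd.DataFrame({"Student_ID": [101, 102], "Name": ["Alice", "Bob"]})

# Without inplace, drop() returns a new DataFrame and df is unchanged.
trimmed = df.drop(columns=["Student_ID"])
print(df.columns.tolist())       # ['Student_ID', 'Name']
print(trimmed.columns.tolist())  # ['Name']

# Dropping a column that no longer exists raises KeyError by default...
key_error_raised = False
try:
    trimmed.drop(columns=["Student_ID"])
except KeyError:
    key_error_raised = True
    print("KeyError: Student_ID is already gone")

# ...while errors="ignore" skips missing labels, so re-running is harmless.
df = df.drop(columns=["Student_ID"], errors="ignore")
df = df.drop(columns=["Student_ID"], errors="ignore")  # second run: no error
print(df.columns.tolist())  # ['Name']
```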