**Student Grades and Academic Performance**

Columns:
* Student ID.
* Student Name.
* Major.
* Assignment Score.
* Exam Score.
* Total Score.
* Grade.
* Date.

Features: Generate student records with assignment and exam scores, calculate grades, and track academic performance.

Manipulation Ideas:
* Filter for top-performing students.
* Group by class to find class-wise performance.
* Identify majors with the highest and lowest average scores.
* Create a grade distribution chart.
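Each idea above maps onto a standard pandas operation. The sketch below runs them on a small, hand-made sample. The five rows are illustrative and do not come from the generated data. It assumes "top-performing" means a first-class total (70 or above) and groups by `Major`.

```python
import pandas as pd

# Hypothetical sample rows standing in for the generated DataFrame
sample = pd.DataFrame({
    "Student ID": ["ST000001", "ST000002", "ST000003", "ST000004", "ST000005"],
    "Major": ["Cybersecurity", "Cybersecurity", "Web Development",
              "Web Development", "App Development"],
    "Total Score": [77.6, 50.2, 65.0, 38.4, 72.0],
    "Grade": ["1st", "2:2", "2:1", "Fail", "1st"],
})

# Top performers: assumed here to be anyone with a total of 70 or more
top = sample[sample["Total Score"] >= 70].sort_values("Total Score", ascending=False)

# Average total score per major, then the highest and lowest averages
by_major = sample.groupby("Major")["Total Score"].mean().sort_values(ascending=False)
best_major, worst_major = by_major.idxmax(), by_major.idxmin()

# Grade counts in classification order, with missing grades filled as 0
order = ["1st", "2:1", "2:2", "3rd", "Fail"]
grade_counts = sample["Grade"].value_counts().reindex(order, fill_value=0)

print(top[["Student ID", "Total Score"]])
print(by_major)
print(grade_counts)
```

`grade_counts` holds the data behind the distribution chart. Calling `grade_counts.plot.bar()` draws it with matplotlib, which the notebook already imports. Reindexing keeps the bars in grade order rather than frequency order.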

In [187]:
# import required libraries
import random
import datetime
import faker
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [198]:
fake = faker.Faker()

# list of available majors
major = [
    "Computer Science",
    "Computer Science & AI",
    "Computer Forensics",
    "Information Technology",
    "Cybersecurity",
    "Web Development",
    "Software Engineering",
    "DevOps Engineering",
    "App Development",
    "Network Engineering",
]

# populate the data table
N_STUDENTS = 150

data = {
    # sample without replacement so that every Student ID is unique
    "Student ID": [f"ST{n:06d}" for n in random.sample(range(1_000_000), N_STUDENTS)],
    "First Name": [fake.first_name() for _ in range(N_STUDENTS)],
    "Last Name": [fake.last_name() for _ in range(N_STUDENTS)],
    # assign a random major to each student
    "Major": np.random.choice(major, size=N_STUDENTS),
    # randint excludes its upper bound, so 101 allows a full score of 100
    "Assignment Score (%)": np.random.randint(0, 101, size=N_STUDENTS),
    "Exam Score (%)": np.random.randint(0, 101, size=N_STUDENTS),
}
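The cell above draws from NumPy's global random state and from Faker, so every run produces different data. You may want repeatable data, although the notebook does not require it. In that case, one option is to draw the numeric columns from a single seeded `numpy.random.Generator` and seed Faker separately with `Faker.seed(...)`. The function name `generate_scores`, the shortened `MAJORS` list and the seed value 42 are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened list of majors for the sketch
MAJORS = ["Computer Science", "Cybersecurity", "Web Development"]

def generate_scores(n: int, seed: int) -> pd.DataFrame:
    """Build n rows of IDs, majors and scores from one seeded generator."""
    rng = np.random.default_rng(seed)
    # sampling without replacement guarantees unique IDs
    ids = rng.choice(1_000_000, size=n, replace=False)
    return pd.DataFrame({
        "Student ID": [f"ST{int(i):06d}" for i in ids],
        "Major": rng.choice(MAJORS, size=n),
        # integers() excludes the upper bound, so 101 allows a full 100%
        "Assignment Score (%)": rng.integers(0, 101, size=n),
        "Exam Score (%)": rng.integers(0, 101, size=n),
    })

a = generate_scores(150, seed=42)
b = generate_scores(150, seed=42)
print(a.head())
```

With the same seed, both calls return identical frames. The names and email addresses stay random unless Faker is seeded too.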

In [199]:
# convert data into a pandas dataframe
# A DataFrame is a data structure with rows and columns, like a database or spreadsheet.
df = pd.DataFrame(data)
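A small illustration of how `pd.DataFrame` turns a dict like `data` into rows and columns. The two-student `sample` dict is hypothetical.

```python
import pandas as pd

# Hypothetical two-student dict shaped like `data`:
# each key becomes a column, each list supplies that column's values
sample = {
    "Student ID": ["ST000001", "ST000002"],
    "Exam Score (%)": [57, 88],
}
frame = pd.DataFrame(sample)

print(frame.shape)          # (2, 2)
print(list(frame.columns))  # ['Student ID', 'Exam Score (%)']
print(list(frame.index))    # [0, 1]
```

The column order follows the dict's key order, and pandas adds a default 0-based integer index. All lists must have the same length, otherwise pandas raises a `ValueError`.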

In [200]:
# generate email addresses by concatenating first and last names with the domain "@student.ac.uk"
em = [f"{first.lower()}.{last.lower()}@student.ac.uk" for first, last in zip(df["First Name"], df["Last Name"])]
df.insert(3, "Email Address", em)
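The concatenation above copies names verbatim, so a name with an apostrophe, space or accent ends up inside the address. Nothing in the cell prevents two students with the same name from sharing an address either. The sketch below sanitises each part to lower-case ASCII letters and digits, then numbers repeated addresses. The helper names `make_email` and `unique_emails` and the example name "José O'Connor" are hypothetical.

```python
import re
import unicodedata
from collections import Counter

def make_email(first: str, last: str, domain: str = "student.ac.uk") -> str:
    """Build first.last@domain using only lower-case ASCII letters and digits."""
    def clean(part: str) -> str:
        # decompose accented letters (é -> e + accent), then drop the accent
        ascii_part = unicodedata.normalize("NFKD", part).encode("ascii", "ignore").decode()
        # remove apostrophes, spaces, hyphens and anything else outside [a-z0-9]
        return re.sub(r"[^a-z0-9]", "", ascii_part.lower())
    return f"{clean(first)}.{clean(last)}@{domain}"

def unique_emails(firsts, lasts, domain: str = "student.ac.uk") -> list[str]:
    """Append 2, 3, ... to the local part when a name pair repeats."""
    seen = Counter()
    emails = []
    for first, last in zip(firsts, lasts):
        base = make_email(first, last, domain)
        seen[base] += 1
        if seen[base] == 1:
            emails.append(base)
        else:
            local, _, dom = base.partition("@")
            emails.append(f"{local}{seen[base]}@{dom}")
    return emails

print(make_email("José", "O'Connor"))              # jose.oconnor@student.ac.uk
print(unique_emails(["Ann", "Ann"], ["Lee", "Lee"]))
```

In the notebook, the list could then be built with `unique_emails(df["First Name"], df["Last Name"])` instead of the plain comprehension.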

In [201]:
# calculate the total score: 40% from assignment score and 60% from exam score
df["Total Score"] = (df["Assignment Score (%)"] * 0.4 + df["Exam Score (%)"] * 0.6).round(2)

# function to calculate the grade based on the total score
def calculate_grade(total_score):
    # each branch runs only if the earlier ones failed, so only lower bounds are needed
    if total_score >= 70:
        return "1st"
    elif total_score >= 60:
        return "2:1"
    elif total_score >= 50:
        return "2:2"
    elif total_score >= 40:
        return "3rd"
    else:
        return "Fail"

# apply the grade calculation function to the total score column
df["Grade"] = df["Total Score"].apply(calculate_grade)
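The weighting gives Total = 0.4 × Assignment + 0.6 × Exam. For row 0 of the printed output, 0.4 × 40 + 0.6 × 57 = 16 + 34.2 = 50.2, which falls in the 50 to 60 band (a 2:2). For row 1, 0.4 × 62 + 0.6 × 88 = 24.8 + 52.8 = 77.6 (a 1st).

`calculate_grade` runs once per value through `.apply`. A vectorised alternative, swapped in here rather than taken from the notebook, is `pd.cut` with half-open `[lower, upper)` bins via `right=False`. With those bins, a score of exactly 70 maps to "1st", as in `calculate_grade`. The function name `grade_series` is hypothetical.

```python
import numpy as np
import pandas as pd

def grade_series(total: pd.Series) -> pd.Series:
    """Vectorised grading: bins are [lower, upper), matching calculate_grade."""
    bins = [-np.inf, 40, 50, 60, 70, np.inf]
    labels = ["Fail", "3rd", "2:2", "2:1", "1st"]
    return pd.cut(total, bins=bins, labels=labels, right=False).astype(str)

# The two rows checked by hand above
assignment = pd.Series([40, 62])
exam = pd.Series([57, 88])
total = (assignment * 0.4 + exam * 0.6).round(2)

print(total.tolist())
print(grade_series(total).tolist())
```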

In [203]:
# display the first 10 rows of the DataFrame
print(df.head(10))

  Student ID First Name Last Name                 Email Address  \
0   ST452584     Donald    Nelson   donald.nelson@student.ac.uk   
1   ST849740       Ryan    Harris     ryan.harris@student.ac.uk   
2   ST099840      David    Snyder    david.snyder@student.ac.uk   
3   ST181692     Nicole    Jensen   nicole.jensen@student.ac.uk   
4   ST374124     Amanda   Johnson  amanda.johnson@student.ac.uk   
5   ST677577      Kelly  Williams  kelly.williams@student.ac.uk   
6   ST443172    Phillip     Nolan   phillip.nolan@student.ac.uk   
7   ST613478     Ronald       Ray      ronald.ray@student.ac.uk   
8   ST612690       Sara    Rogers     sara.rogers@student.ac.uk   
9   ST303618     Dustin   Gilmore  dustin.gilmore@student.ac.uk   

                   Major  Assignment Score (%)  Exam Score (%)  Total Score  \
0        App Development                    40              57         50.2   
1       Computer Science                    62              88         77.6   
2  Computer Science & AI 

In [204]:
# save the DataFrame to a CSV file named "student.csv" with the index included
df.to_csv("student.csv", index=True)
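With `index=True`, the CSV gets an extra leading column holding the 0-based row numbers, and its header is blank. Read back naively, that column appears as `Unnamed: 0`. The sketch below round-trips a hypothetical two-row frame through in-memory buffers to show both ways of handling it. Reading with `index_col=0` restores the index, and writing with `index=False` avoids the column altogether.

```python
import io
import pandas as pd

# Hypothetical two-row frame standing in for the student data
scores = pd.DataFrame({
    "Student ID": ["ST000001", "ST000002"],
    "Total Score": [50.2, 77.6],
    "Grade": ["2:2", "1st"],
})

# Same call as above, but written to memory instead of student.csv
with_index = io.StringIO()
scores.to_csv(with_index, index=True)

with_index.seek(0)
naive = pd.read_csv(with_index)                 # index comes back as a data column
with_index.seek(0)
restored = pd.read_csv(with_index, index_col=0)  # first column used as the index

# Alternative: do not write the index at all
no_index = io.StringIO()
scores.to_csv(no_index, index=False)
no_index.seek(0)
plain = pd.read_csv(no_index)

print(list(naive.columns))
```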