# Week 2 Task — Student Performance Data Analysis

**Author:** Rahul (please rename files/repo to use your actual name if different)

**Objective:** Apply NumPy and Pandas to perform data analysis on a student performance dataset.  
This notebook covers loading the sample dataset, exploring it, computing summary statistics, running NumPy calculations, searching for a student by name, finding the top 3 students, and adding a Pass/Fail column.

**Files included**
- `student_performance_sample.csv` — sample dataset (20 rows)
- `week2-task-Rahul.ipynb` — this notebook
- `README.md` — instructions and how to run

> The notebook is designed to run end-to-end in a standard Jupyter environment (Python 3.8+).

In [None]:
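# Imports and load dataset
import pandas as pd
import numpy as np
from IPython.display import display  # available by default in Jupyter; explicit import keeps display() working in plain IPython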

csv_path = 'student_performance_sample.csv'  # if running locally, ensure this CSV is in the same folder
df = pd.read_csv(csv_path)
df.head()
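
Later cells assume the CSV has the columns `StudentID`, `Name`, `Gender`, `Math`, `Science`, `English`, `StudyHours` and `Average`. Here is a minimal sketch of an up-front check, assuming those exact column names. The helper `check_columns` and the two-row inline CSV are hypothetical and only stand in for the real file:

```python
import io

import pandas as pd

# Columns the later cells rely on (taken from the notebook's own code).
REQUIRED_COLUMNS = ['StudentID', 'Name', 'Gender', 'Math', 'Science',
                    'English', 'StudyHours', 'Average']


def check_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are missing from `frame`."""
    return [col for col in required if col not in frame.columns]


# Hypothetical two-row CSV used only to demonstrate the check.
sample_csv = io.StringIO(
    "StudentID,Name,Gender,Math,Science,English,StudyHours,Average\n"
    "1,Asha,F,78,82,75,3.5,78.33\n"
    "2,Vikram,M,35,40,38,1.0,37.67\n"
)
sample_df = pd.read_csv(sample_csv)

missing = check_columns(sample_df)
if missing:
    raise ValueError(f'CSV is missing expected columns: {missing}')
print('All expected columns present.')
```

On the real data, calling `check_columns(df)` right after `pd.read_csv` surfaces a renamed or missing column before any later cell fails with a `KeyError`.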

In [None]:
# Display first 5 rows and basic info
print('First 5 rows:')
display(df.head())

print('\nDataFrame info:')
df.info()

print('\nDataFrame shape:', df.shape)
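
`df.info()` reports non-null counts per column. A direct count of missing values per column can be easier to scan. The sketch below uses a hypothetical three-row frame with one blank score in place of the real CSV:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing English score.
sample_df = pd.DataFrame({
    'Name': ['Asha', 'Vikram', 'Meera'],
    'Math': [78, 35, 90],
    'English': [75, np.nan, 88],
})

# isna() marks missing cells; sum() counts them per column.
missing_per_column = sample_df.isna().sum()
print(missing_per_column)

# Rows with at least one missing value, for manual inspection.
incomplete_rows = sample_df[sample_df.isna().any(axis=1)]
print(incomplete_rows)
```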

In [None]:
# Summary statistics for numeric columns
numeric_cols = ['Math','Science','English','StudyHours','Average']
summary = pd.DataFrame({
    'mean': df[numeric_cols].mean(),
    'median': df[numeric_cols].median(),
    'max': df[numeric_cols].max(),
    'min': df[numeric_cols].min()
})
summary
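
The same table can be built in a single call with `DataFrame.agg`, which applies each named reduction column by column. Transposing with `.T` puts one row per column and one column per statistic, matching the layout above. Here is a sketch on a hypothetical two-column frame:

```python
import pandas as pd

sample_df = pd.DataFrame({
    'Math': [78, 35, 90, 62],
    'English': [75, 38, 88, 70],
})

# agg returns one row per statistic; .T gives one row per column instead.
summary_agg = sample_df.agg(['mean', 'median', 'max', 'min']).T
print(summary_agg)
```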

In [None]:
# NumPy calculations: overall average across all subject scores and standard deviation of student averages

# Flatten subject scores into a single array to compute overall average score across all subject entries
all_scores = df[['Math','Science','English']].to_numpy().flatten()
overall_average_all_scores = np.mean(all_scores).round(2)
std_dev_averages = np.std(df['Average'].to_numpy()).round(2)  # population std (NumPy default ddof=0)

print(f'Overall average across all subject scores: {overall_average_all_scores}')
print(f'Standard deviation of student averages: {std_dev_averages}')
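
The cell above leaves out two details. First, a per-subject average comes from reducing the score matrix along `axis=0` instead of flattening it. Second, `np.std` defaults to the population standard deviation (`ddof=0`), while pandas' `Series.std` defaults to the sample version (`ddof=1`), so the two can disagree. The sketch below uses hypothetical scores. It also assumes `Average` is the plain mean of the three subject scores, which the notebook does not state:

```python
import numpy as np

# Hypothetical scores: rows are students, columns are Math, Science, English.
scores = np.array([
    [78, 82, 75],
    [35, 40, 38],
    [90, 85, 95],
])

# Mean of each column = subject-wise average.
subject_means = scores.mean(axis=0)
# Mean of each row = per-student average (assumed definition of 'Average').
student_averages = scores.mean(axis=1)

# Population (ddof=0, NumPy default) vs sample (ddof=1, pandas default).
pop_std = np.std(student_averages)
sample_std = np.std(student_averages, ddof=1)

print('Subject-wise averages:', subject_means.round(2))
print('Student averages:', student_averages.round(2))
print(f'Std (ddof=0): {pop_std:.2f}, std (ddof=1): {sample_std:.2f}')
```

For small classes such as a 20-row sample, the gap between the two conventions is noticeable. It is worth saying which one a report uses.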

In [None]:
# Function: search student by name (case-insensitive partial match)
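def search_student(name_query, df=df):
    query = name_query.strip()
    # case=False: case-insensitive; regex=False: treat the query as plain text
    # (so characters like '(' or '+' are safe); na=False: skip missing names
    mask = df['Name'].str.contains(query, case=False, regex=False, na=False)
    results = df[mask]
    if results.empty:
        print(f'No student found matching: "{query}"')
    else:
        display(results)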

# Example usage:
print('Example search for "rahul":')
search_student('rahul')
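
A variant that returns the matching rows instead of only displaying them is easier to reuse, for example from the menu, and easier to test. Here is a sketch with literal, case-insensitive matching. `find_students` and the sample names are hypothetical:

```python
import pandas as pd


def find_students(frame, name_query):
    """Case-insensitive, literal substring search on the Name column."""
    query = name_query.strip()
    # regex=False matches the text literally; na=False treats missing names as non-matches.
    mask = frame['Name'].str.contains(query, case=False, regex=False, na=False)
    return frame[mask]


sample_df = pd.DataFrame({
    'Name': ['Rahul Sharma', 'Priya (Class A)', None, 'Rahul Verma'],
    'Average': [72.0, 81.5, 55.0, 64.0],
})

print(find_students(sample_df, 'rahul'))
print(find_students(sample_df, '(class'))  # literal parenthesis, no regex error
```

Callers can then check `.empty` themselves and decide whether to print a message, display the rows, or pass them on.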

In [None]:
# Top 3 students based on Average score
top3 = df.sort_values('Average', ascending=False).head(3)
print('Top 3 students based on Average score:')
display(top3[['StudentID','Name','Gender','Math','Science','English','Average']])
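
`sort_values(...).head(3)` drops any student whose average ties with the third-place average. `DataFrame.nlargest` with `keep='all'` instead keeps every row tied at the cutoff, so more than three rows can come back. Here is a sketch on hypothetical data:

```python
import pandas as pd

sample_df = pd.DataFrame({
    'Name': ['Asha', 'Vikram', 'Meera', 'Kabir', 'Zoya'],
    'Average': [88.0, 91.5, 79.0, 79.0, 60.0],
})

# keep='all' retains every row tied with the smallest value in the top 3.
top3_with_ties = sample_df.nlargest(3, 'Average', keep='all')
print(top3_with_ties)
```

Whether ties should expand the list is a policy choice. `keep='first'` (the default) reproduces the fixed-size behaviour of `head(3)`.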

In [None]:
# Create Pass/Fail column: Pass if Average >= 40 else Fail
threshold = 40
df['Result'] = np.where(df['Average'] >= threshold, 'Pass', 'Fail')
print(f'Pass/Fail threshold: {threshold}')
display(df[['StudentID','Name','Average','Result']].head(10))

# Save updated CSV
df.to_csv('student_performance_with_results.csv', index=False)
print('\nSaved updated file as student_performance_with_results.csv')
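
`np.where` evaluates the condition element by element. Because the comparison is `>=`, a student with an `Average` of exactly 40 is marked `Pass`. The sketch below also summarises the result as pass/fail percentages with `value_counts(normalize=True)`. The averages are hypothetical:

```python
import numpy as np
import pandas as pd

threshold = 40
sample_df = pd.DataFrame({
    'Name': ['Asha', 'Vikram', 'Meera', 'Kabir'],
    'Average': [78.3, 39.9, 40.0, 55.0],
})

# Element-wise: 'Pass' where Average >= threshold, else 'Fail'.
sample_df['Result'] = np.where(sample_df['Average'] >= threshold, 'Pass', 'Fail')

# normalize=True returns proportions; multiply by 100 for percentages.
pass_rate = sample_df['Result'].value_counts(normalize=True) * 100
print(sample_df)
print(pass_rate.round(1))
```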

In [None]:
# Simple console menu for interactive use (option 4 needs the Result column created by the Pass/Fail cell)
def menu(df):
    while True:
        print('\n===== Student Performance Menu =====')
        print('1. View dataset summary (first 5 rows + basic stats)')
        print('2. Search student by name')
        print('3. Show top 3 students by average')
        print('4. Show pass/fail counts')
        print('5. Exit')
        choice = input('Enter option number: ').strip()
        if choice == '1':
            display(df.head())
            print('\nSummary statistics:')
            display(df.describe(include='all'))
        elif choice == '2':
            q = input('Enter student name to search (partial allowed): ')
            search_student(q, df)
        elif choice == '3':
            display(df.sort_values('Average', ascending=False).head(3))
        elif choice == '4':
            counts = df['Result'].value_counts()
            print('Pass/Fail counts:')
            print(counts.to_string())
        elif choice == '5':
            print('Exiting menu. Bye!')
            break
        else:
            print('Invalid option — please enter 1-5.')

# To run the menu, uncomment the next line and execute the cell:
# menu(df)
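
Because `menu` calls `input()` directly, it can only be exercised by hand. One common alternative is to pass the input function in as a parameter, so a scripted list of choices can drive the loop. Below is a reduced sketch of that idea covering only the pass/fail and exit options. `run_menu` and its `read` parameter are hypothetical, not part of the notebook:

```python
import pandas as pd


def run_menu(frame, read=input):
    """Minimal menu loop; `read` supplies each choice (defaults to input)."""
    outputs = []
    while True:
        choice = read('Enter option number: ').strip()
        if choice == '4':
            counts = frame['Result'].value_counts()
            outputs.append(counts.to_dict())
        elif choice == '5':
            outputs.append('exit')
            break
        else:
            outputs.append('invalid')
    return outputs


sample_df = pd.DataFrame({'Result': ['Pass', 'Fail', 'Pass']})

# Scripted choices replace keyboard input: one valid, one invalid, then exit.
scripted = iter(['4', '9', '5'])
log = run_menu(sample_df, read=lambda prompt: next(scripted))
print(log)
```

Interactive use is unchanged, since `run_menu(df)` still falls back to `input`.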

## Notes on Deliverables / GitHub

- Recommended repository name (public): `week2-task-Rahul`  
  Replace `Rahul` with your actual name if needed.

- Files to include in the repo:
  - `student_performance_sample.csv`
  - `student_performance_with_results.csv` (generated by the notebook)
  - `week2-task-Rahul.ipynb` (this notebook)
  - `README.md` (instructions & sample outputs)

- How to run:
  1. Download files to a local folder.
  2. Launch Jupyter Notebook / JupyterLab in that folder.
  3. Open `week2-task-Rahul.ipynb` and run cells top-to-bottom.
  4. (Optional) In the menu cell, uncomment `menu(df)` to use the console menu. Run it after the Pass/Fail cell so that option 4 has a `Result` column.

- If you want, rename files to contain your real name before pushing to GitHub.