# Project 1: Student Performance Analytics Engine (Pure NumPy)

## Core Idea

Build a mini analytics system that analyzes student marks across subjects and semesters using only NumPy. This mirrors real-world tabular data analysis while keeping you close to NumPy's fundamentals.

#### Problem Description

You are given a dataset of student scores:

Rows → Students

Columns → Subjects

Your task is to perform statistical analysis, ranking, normalization, and insights generation using NumPy arrays.

#### Tasks to Implement

1. Generate Dataset
   - Create a random NumPy array of shape (100, 5)
   - Values represent marks (0–100)
   - Randomly introduce missing values (np.nan)

2. Statistical Analysis
   - Subject-wise mean, median, standard deviation
   - Student-wise total and average scores
   - Find top-performing and low-performing students

3. Handling Missing Data
   - Replace nan with subject-wise mean
   - Compare results before and after cleaning

4. Ranking System
   - Rank students based on total score
   - Extract top 10% students using slicing

5. Normalization
   - Apply min-max normalization subject-wise
   - Apply z-score normalization

6. Insights Extraction
   - Identify hardest and easiest subjects
   - Detect students with inconsistent performance (high variance)
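
The two normalizations named in task 5 can be sketched on a tiny made-up array. The array `marks` and its values are illustrative assumptions, not the project's dataset, and the z-score here uses the population standard deviation (NumPy's default `ddof=0`). A constant column would cause a division by zero in both formulas; this sketch does not guard against that.

```python
import numpy as np

# Illustrative data only: 3 students x 2 subjects (made-up values).
marks = np.array([[50.0, 80.0],
                  [75.0, 90.0],
                  [100.0, 100.0]])

# Min-max normalization per subject (column): (x - min) / (max - min).
col_min = marks.min(axis=0)
col_max = marks.max(axis=0)
minmax = (marks - col_min) / (col_max - col_min)

# Z-score normalization per subject: (x - mean) / std, with ddof=0.
zscore = (marks - marks.mean(axis=0)) / marks.std(axis=0)

print(minmax)
print(zscore)
```

After min-max scaling every subject lies in the range 0 to 1; after z-scoring every subject has mean 0 and standard deviation 1, which makes subjects with different spreads comparable.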

In [4]:
# 1 Generate Dataset
# The task calls for a random (100, 5) array of marks (0–100) with some
# np.nan entries; this cell uses a small hand-written 10 x 5 sample instead,
# which keeps the results below easy to check by hand.

import numpy as np
student_scores = np.array([
    [78, 85, 69, 92, 74],
    [88, 79, 85, 90, 81],
    [65, 70, 72, 68, 75],
    [90, 92, 88, 95, 89],
    [55, 60, 58, 62, 64],
    [72, 75, 70, 78, 80],
    [84, 88, 82, 86, 85],
    [91, 94, 90, 96, 92],
    [67, 65, 70, 72, 68],
    [76, 80, 78, 82, 79]
])

subjects = np.array([
    "Maths",
    "Physics",
    "Chemistry",
    "Computer Science",
    "English"
])
print(student_scores)
print(subjects)


[[78 85 69 92 74]
 [88 79 85 90 81]
 [65 70 72 68 75]
 [90 92 88 95 89]
 [55 60 58 62 64]
 [72 75 70 78 80]
 [84 88 82 86 85]
 [91 94 90 96 92]
 [67 65 70 72 68]
 [76 80 78 82 79]]
['Maths' 'Physics' 'Chemistry' 'Computer Science' 'English']
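
The cell above uses a fixed 10 x 5 sample. A random dataset closer to the task statement might be generated as in the sketch below. The seed, the 5% missing rate, and the names `rng`, `random_scores`, and `missing_mask` are assumptions chosen for illustration.

```python
import numpy as np

# Seeded generator so the sketch is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(seed=42)

# Integer marks in [0, 100], converted to float so the array can hold np.nan.
random_scores = rng.integers(0, 101, size=(100, 5)).astype(float)

# Flag roughly 5% of the entries as missing (the rate is an assumption).
missing_mask = rng.random(random_scores.shape) < 0.05
random_scores[missing_mask] = np.nan

print(random_scores.shape)
print(int(np.isnan(random_scores).sum()), "missing values")
```

The conversion to float matters: an integer array cannot store `np.nan`, so assigning it to an integer array would raise an error.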


## 2 Statistical Analysis

Subject-wise mean, median, standard deviation

Student-wise total and average scores

Find top-performing and low-performing students



In [17]:
# Subject-wise mean (axis=0 aggregates down each column)
subjectWise_mean = np.mean(student_scores, axis=0)
print('subject-wise mean : ', subjectWise_mean)
# Subject-wise median
subjectWise_median = np.median(student_scores, axis=0)
print('subject-wise median : ', subjectWise_median)
# Subject-wise standard deviation (population, ddof=0)
subjectWise_stnd = np.std(student_scores, axis=0)
print('subject-wise standard deviation : ', subjectWise_stnd)

# Student-wise total scores (axis=1 aggregates across each row)
stud_totalScores = np.sum(student_scores, axis=1)
print('student-wise total scores', stud_totalScores)

# Student-wise average scores
stud_avg = np.average(student_scores, axis=1)
print('student-wise average scores', stud_avg)

# Top-performing student: 0-based row index of the highest average
top_stud = np.argmax(stud_avg)
print('top-performing student (index)', top_stud)

# Lowest-performing student: 0-based row index of the lowest average
poor_stud = np.argmin(stud_avg)
print('lowest-performing student (index)', poor_stud)

subject-wise mean :  [76.6 78.8 76.2 82.1 78.7]
subject-wise median :  [77.  79.5 75.  84.  79.5]
subject-wise standard deviation :  [11.3507709  10.79629566  9.6        11.21115516  8.34326075]
student-wise total scores [398 423 350 454 299 375 425 463 342 395]
student-wise average scores [79.6 84.6 70.  90.8 59.8 75.  85.  92.6 68.4 79. ]
top-performing student (index) 7
lowest-performing student (index) 4
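
The same per-axis statistics also cover the insights task (hardest and easiest subjects, inconsistent students). The sketch below treats the subject with the lowest mean as "hardest" and flags students whose variance exceeds the mean variance by more than one standard deviation. Both definitions, and the names `threshold` and `inconsistent`, are assumptions made for illustration.

```python
import numpy as np

student_scores = np.array([
    [78, 85, 69, 92, 74],
    [88, 79, 85, 90, 81],
    [65, 70, 72, 68, 75],
    [90, 92, 88, 95, 89],
    [55, 60, 58, 62, 64],
    [72, 75, 70, 78, 80],
    [84, 88, 82, 86, 85],
    [91, 94, 90, 96, 92],
    [67, 65, 70, 72, 68],
    [76, 80, 78, 82, 79]
])
subjects = np.array(["Maths", "Physics", "Chemistry",
                     "Computer Science", "English"])

# Assumption: lowest subject mean = hardest, highest subject mean = easiest.
subject_means = student_scores.mean(axis=0)
hardest = subjects[np.argmin(subject_means)]
easiest = subjects[np.argmax(subject_means)]

# Per-student variance across subjects measures how uneven the marks are.
student_var = student_scores.var(axis=1)

# Assumed cutoff: more than one standard deviation above the mean variance.
threshold = student_var.mean() + student_var.std()
inconsistent = np.where(student_var > threshold)[0]

print("hardest:", hardest, "| easiest:", easiest)
print("inconsistent student indices:", inconsistent)
```

The cutoff is a design choice; a fixed variance limit or a top-k selection with `np.argsort` would work equally well depending on how many students should be flagged.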


## 3 Missing Data

Replace nan with subject-wise mean

Compare results before and after cleaning

In [2]:
import numpy as np

scores = np.array([
    [78, 85, 69, 92, 74],
    [88, 79, 85, 90, 81],
    [65, np.nan, 72, 68, 75],
    [90, 92, 88, 95, 89],
    [55, 60, 58, np.nan, 64]
], dtype=float)


In [5]:
# Detect NaNs (True marks a missing score)
np.isnan(scores)

array([[False, False, False, False, False],
       [False, False, False, False, False],
       [False,  True, False, False, False],
       [False, False, False, False, False],
       [False, False, False,  True, False]])

In [6]:
# Count missing values per subject (column)
np.isnan(scores).sum(axis=0)

array([0, 1, 0, 1, 0])

In [7]:
# Subject-wise means computed from the available scores only (NaNs are skipped)
subjects_means = np.nanmean(scores, axis=0)
print(subjects_means)

[75.2  79.   74.4  86.25 76.6 ]


In [9]:
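# Row and column indices of every NaN entry
inds = np.where(np.isnan(scores))
# Fill each NaN with the mean of its own column (inds[1] holds the column indices)
scores[inds] = subjects_means[inds[1]]
# Column means after imputation; they match the NaN-aware means computed earlier
np.mean(scores, axis=0)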

array([75.2 , 79.  , 74.4 , 86.25, 76.6 ])
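
To make the before/after comparison explicit, the sketch below computes the mean and standard deviation on both versions of the data. It uses `np.where` as an alternative way to impute; the names `cleaned`, `mean_before`, and `std_after` are illustrative and not part of the cells above.

```python
import numpy as np

scores = np.array([
    [78, 85, 69, 92, 74],
    [88, 79, 85, 90, 81],
    [65, np.nan, 72, 68, 75],
    [90, 92, 88, 95, 89],
    [55, 60, 58, np.nan, 64]
], dtype=float)

# Before cleaning: NaN-aware statistics skip the missing entries.
mean_before = np.nanmean(scores, axis=0)
std_before = np.nanstd(scores, axis=0)

# Mean imputation: the column means broadcast across rows inside np.where.
cleaned = np.where(np.isnan(scores), mean_before, scores)

# After cleaning: ordinary statistics work because no NaN remains.
mean_after = cleaned.mean(axis=0)
std_after = cleaned.std(axis=0)

print("means unchanged:", np.allclose(mean_before, mean_after))
print("std before:", std_before)
print("std after: ", std_after)
```

Mean imputation leaves each subject mean unchanged, but it lowers the standard deviation of every column that had missing values, because the filled entries sit exactly at the mean.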

## 4 Ranking System

Rank students based on total score

Extract top 10% students using slicing

In [12]:

import numpy as np
student_scores = np.array([
    [78, 85, 69, 92, 74],
    [88, 79, 85, 90, 81],
    [65, 70, 72, 68, 75],
    [90, 92, 88, 95, 89],
    [55, 60, 58, 62, 64],
    [72, 75, 70, 78, 80],
    [84, 88, 82, 86, 85],
    [91, 94, 90, 96, 92],
    [67, 65, 70, 72, 68],
    [76, 80, 78, 82, 79]
])

subjects = np.array([
    "Maths",
    "Physics",
    "Chemistry",
    "Computer Science",
    "English"
])
print(student_scores)
print(subjects)

[[78 85 69 92 74]
 [88 79 85 90 81]
 [65 70 72 68 75]
 [90 92 88 95 89]
 [55 60 58 62 64]
 [72 75 70 78 80]
 [84 88 82 86 85]
 [91 94 90 96 92]
 [67 65 70 72 68]
 [76 80 78 82 79]]
['Maths' 'Physics' 'Chemistry' 'Computer Science' 'English']


In [None]:
# Total score per student
total_scores = np.sum(student_scores, axis=1)
print(total_scores)

# Indices of students ordered from highest to lowest total
# (negating the totals makes argsort sort in descending order)
rank_order = np.argsort(-total_scores)
print(rank_order)

# ranks[i] is the 1-based rank of student i; the array must exist before indexing into it
ranks = np.empty_like(rank_order)
ranks[rank_order] = np.arange(1, len(total_scores) + 1)
print(ranks)

for i in rank_order:
    print(f'student {i+1} : total = {total_scores[i]} ,Rank= {ranks[i]}')

[398 423 350 454 299 375 425 463 342 395]
[7 3 6 1 0 9 5 2 8 4]
[ 5  4  8  2 10  7  3  1  9  6]
student 8 : total = 463 ,Rank= 1
student 4 : total = 454 ,Rank= 2
student 7 : total = 425 ,Rank= 3
student 2 : total = 423 ,Rank= 4
student 1 : total = 398 ,Rank= 5
student 10 : total = 395 ,Rank= 6
student 6 : total = 375 ,Rank= 7
student 3 : total = 350 ,Rank= 8
student 9 : total = 342 ,Rank= 9
student 5 : total = 299 ,Rank= 10
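
The remaining ranking step, extracting the top 10% of students by slicing, can be sketched as follows. Rounding the cutoff up with `np.ceil` so that at least one student is always selected is an assumption, as are the names `n_top` and `top_students`.

```python
import numpy as np

# Totals taken from the ranking output above.
total_scores = np.array([398, 423, 350, 454, 299, 375, 425, 463, 342, 395])

# Indices of students sorted from highest to lowest total.
rank_order = np.argsort(-total_scores)

# Size of the top 10%, rounded up (assumed) so the slice is never empty.
n_top = max(1, int(np.ceil(0.10 * len(total_scores))))

# Slice the first n_top entries of the descending order.
top_students = rank_order[:n_top]

for i in top_students:
    print(f"student {i + 1} : total = {total_scores[i]}")
```

With only 10 students the slice holds a single entry; with the full 100-student dataset the same code would select 10 students.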
