# Exploratory Data Analysis

Purpose: The objective here is to perform exploratory data analysis (EDA) and data visualization on the “Student Habits vs Academic Performance: A Simulated Study” dataset.

#### Femi Jupyter Notebook EDA
#### GitHub: [My GitHub Profile](https://github.com/Airfirm)
#### Author: Oluwafemi Salawu
#### Repository: datafun-06-eda
#### Date: 06/07/2025

## Section 1. Imports and Load

In [1]:
```python
# Core Data Science Imports
import numpy as np  # Numerical computing (v1.24+ recommended)
import pandas as pd  # Data manipulation (v2.0+ recommended)
import pyarrow as pa  # Arrow memory format (v12.0+ recommended)

# Visualization Imports
import matplotlib as mpl  # Base matplotlib
import matplotlib.pyplot as plt  # Plotting interface
import seaborn as sns  # Statistical visualization (v0.12+ recommended)

# Configure global settings
plt.style.use('seaborn-v0_8')  # Modern style
pd.set_option('display.max_columns', 30)  # Show more columns
pd.set_option('display.float_format', '{:.2f}'.format)  # Clean number display

# Print versions
print(f"numpy: {np.__version__}")
print(f"pandas: {pd.__version__}")
print(f"pyarrow: {pa.__version__}")
print(f"matplotlib: {mpl.__version__}")
print(f"seaborn: {sns.__version__}")

# Verify imports worked
assert not pd.isnull(np.pi)  # Quick sanity check
print("\nAll imports successful! ✅")
```

```
numpy: 2.3.0
pandas: 2.3.0
pyarrow: 20.0.0
matplotlib: 3.10.3
seaborn: 0.13.2

All imports successful! ✅
```
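The comments in the import cell list minimum recommended versions (numpy 1.24, pandas 2.0, pyarrow 12.0, seaborn 0.12), but nothing enforces them. A minimal sketch of turning those comments into a check, assuming plain dotted release strings; `parse_version` and `meets_minimum` are hypothetical helpers, and pre-release suffixes such as `rc1` are simply ignored. For stricter comparisons, the `packaging` library's `Version` class is the usual tool.

```python
import re


def parse_version(version: str) -> tuple[int, ...]:
    """Hypothetical helper: '2.3.0' -> (2, 3, 0). Non-numeric suffixes are dropped."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)  # leading digits only, e.g. '0rc1' -> 0
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version: str, minimum: str) -> bool:
    """True if `version` is at least `minimum`, compared component by component."""
    return parse_version(version) >= parse_version(minimum)


# Minimums taken from the import-cell comments; versions copied from the printed output.
MINIMUMS = {"numpy": "1.24", "pandas": "2.0", "pyarrow": "12.0", "seaborn": "0.12"}
reported = {"numpy": "2.3.0", "pandas": "2.3.0", "pyarrow": "20.0.0", "seaborn": "0.13.2"}

results = {name: meets_minimum(reported[name], floor) for name, floor in MINIMUMS.items()}
for name, ok in results.items():
    print(f"{name} {reported[name]} >= {MINIMUMS[name]}: {ok}")
```

In the notebook itself, `reported` could be built from `np.__version__`, `pd.__version__`, and so on rather than hard-coded strings.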


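`pd.set_option('display.float_format', '{:.2f}'.format)` hands pandas a bound `str.format` method, which it calls on each float when rendering a table. A small stdlib illustration of what that callable returns, and of the fact that it changes only the display, not the stored numbers (the sample values are arbitrary):

```python
# The same callable passed to pandas above: a bound str.format method.
formatter = "{:.2f}".format

value = 1.234567
print(formatter(value))  # 1.23
print(formatter(7.2))    # 7.20
print(value)             # 1.234567 (the underlying number is untouched)
```

If full precision is needed again later, `pd.reset_option('display.float_format')` restores the default display.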
In [20]:
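```python
import os

# Confirm the dataset exists at this path (relative to the notebook's working directory)
print(os.path.exists('datasets/student_habits_vs_academic_performance.csv'))
```

```
True
```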
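`os.path.exists` returns `True` for directories as well as files, and a relative path such as `datasets/...` is resolved against the current working directory, which in Jupyter is normally the notebook's folder. A minimal sketch of a stricter guard using `pathlib`, assuming a hard failure with a readable message is preferable to a printed `False`; `require_file` is a hypothetical helper, and the demo uses a temporary directory instead of the real dataset:

```python
import tempfile
from pathlib import Path


def require_file(path_str: str) -> Path:
    """Hypothetical helper: return the path if it is an existing regular file, else raise."""
    path = Path(path_str)
    if not path.is_file():  # False for missing paths and for directories
        raise FileNotFoundError(
            f"Expected a data file at {path.resolve()} (working directory: {Path.cwd()})"
        )
    return path


# Demo with a throwaway directory instead of the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text("student_id\n")
    found = require_file(str(sample))
    print(found.name)  # sample.csv

    outcomes = {}
    for label, candidate in [("missing", Path(tmp) / "missing.csv"), ("directory", Path(tmp))]:
        try:
            require_file(str(candidate))
            outcomes[label] = "accepted"
        except FileNotFoundError:
            outcomes[label] = "rejected"
    print(outcomes)  # {'missing': 'rejected', 'directory': 'rejected'}
```

In the notebook this would be `require_file('datasets/student_habits_vs_academic_performance.csv')`, called before `pd.read_csv`.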


In [16]:
```python
# Path to the local CSV file (relative to the notebook's working directory)
url = 'datasets/student_habits_vs_academic_performance.csv'
df = pd.read_csv(url)

# Clean column names: replace spaces with underscores
df.columns = df.columns.str.replace(' ', '_')
print("\nDataFrame loaded successfully:")
```

```

DataFrame loaded successfully:
```
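`df.columns.str.replace(' ', '_')` swaps literal spaces only: leading or trailing spaces become stray underscores, and tabs are left untouched. A minimal sketch of a slightly stricter cleanup, plus a check that flags names still containing a tab or comma, which would hint that the file was split on the wrong delimiter. `normalize_column` and `find_suspect_columns` are hypothetical helpers, the sample names are made up, and lowercasing is an extra choice the original cell does not make.

```python
import re


def normalize_column(name: str) -> str:
    """Hypothetical helper: trim, collapse runs of spaces to one '_', and lowercase."""
    return re.sub(r" +", "_", name.strip()).lower()


def find_suspect_columns(names: list[str]) -> list[str]:
    """Return names that still contain a tab or comma (a wrong-delimiter hint)."""
    return [name for name in names if "\t" in name or "," in name]


# Hypothetical column names, not taken from the dataset.
cleaned = [normalize_column(c) for c in ["Student ID", " Exam  Score "]]
print(cleaned)  # ['student_id', 'exam_score']

suspect = find_suspect_columns(["student_id\tage\tgender", "exam_score"])
print(suspect)  # ['student_id\tage\tgender']
```

Applied to the notebook, this would look like `df.columns = [normalize_column(c) for c in df.columns]` followed by `find_suspect_columns(list(df.columns))`.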


In [19]:
```python
# Display the first ten rows
df.head(10)
```

```
Unnamed: 0,student_id\tage\tgender\tstudy_hours_per_day\tsocial_media_hours\tnetflix_hours\tpart_time_job\tattendance_percentage\tsleep_hours\tdiet_quality\texercise_frequency\tparental_education_level\tinternet_quality\tmental_health_rating\textracurricular_participation\texam_score
0,S1000\t23\tFemale\t0\t1.2\t1.1\tNo\t85\t8\tFai...
1,S1001\t20\tFemale\t6.9\t2.8\t2.3\tNo\t97.3\t4....
2,S1002\t21\tMale\t1.4\t3.1\t1.3\tNo\t94.8\t8\tP...
3,S1003\t23\tFemale\t1\t3.9\t1\tNo\t71\t9.2\tPoo...
4,S1004\t19\tFemale\t5\t4.4\t0.5\tNo\t90.9\t4.9\...
5,S1005\t24\tMale\t7.2\t1.3\t0\tNo\t82.9\t7.4\tF...
6,S1006\t21\tFemale\t5.6\t1.5\t1.4\tYes\t85.8\t6...
7,S1007\t21\tFemale\t4.3\t1\t2\tYes\t77.7\t4.6\t...
8,S1008\t23\tFemale\t4.4\t2.2\t1.7\tNo\t100\t7.1...
9,S1009\t18\tFemale\t4.8\t3.1\t1.3\tNo\t95.4\t7....
```
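The header line of the `head()` output lists the intended column names separated by `\t`. Assuming those `\t` sequences stand for real tab characters, splitting the header recovers how many columns a correctly parsed DataFrame should have. A small sketch (the header string is copied from the output above, minus the leading `Unnamed: 0,`):

```python
# Header text copied from the head() output above (everything after "Unnamed: 0,").
HEADER = (
    "student_id\tage\tgender\tstudy_hours_per_day\tsocial_media_hours\t"
    "netflix_hours\tpart_time_job\tattendance_percentage\tsleep_hours\t"
    "diet_quality\texercise_frequency\tparental_education_level\t"
    "internet_quality\tmental_health_rating\textracurricular_participation\t"
    "exam_score"
)

expected_columns = HEADER.split("\t")
print(len(expected_columns))  # 16
print(expected_columns[:3])   # ['student_id', 'age', 'gender']
```

Once the file is parsed with the right delimiter, a check such as `assert df.shape == (1000, len(expected_columns))` could confirm it; the 1000-row figure is an inference from the 0 to 999 index visible in the `head()` and `tail()` output.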


In [21]:
```python
# Display the last ten rows
df.tail(10)
```

```
Unnamed: 0,student_id\tage\tgender\tstudy_hours_per_day\tsocial_media_hours\tnetflix_hours\tpart_time_job\tattendance_percentage\tsleep_hours\tdiet_quality\texercise_frequency\tparental_education_level\tinternet_quality\tmental_health_rating\textracurricular_participation\texam_score
990,S1990\t18\tMale\t3.2\t3.5\t1.7\tNo\t91.7\t6.5\...
991,S1991\t20\tMale\t6\t2.1\t3\tNo\t86.7\t5.1\tGoo...
992,S1992\t18\tMale\t3.5\t0\t1.9\tNo\t96.8\t6.4\tF...
993,S1993\t20\tMale\t3.8\t2.1\t1\tNo\t89\t5.2\tGoo...
994,S1994\t20\tFemale\t1.6\t1.3\t2.9\tNo\t75.3\t5....
995,S1995\t21\tFemale\t2.6\t0.5\t1.6\tNo\t77\t7.5\...
996,S1996\t17\tFemale\t2.9\t1\t2.4\tYes\t86\t6.8\t...
997,S1997\t20\tMale\t3\t2.6\t1.3\tNo\t61.9\t6.5\tG...
998,S1998\t24\tMale\t5.4\t4.1\t1.1\tYes\t100\t7.6\...
999,S1999\t19\tFemale\t4.3\t2.9\t1.9\tNo\t89.4\t7....
```
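In both outputs each row's values sit inside a single column joined by `\t`. Assuming those are real tab characters, this suggests the file is tab-delimited while `pd.read_csv` fell back to its default `sep=','`. The stdlib `csv` module reproduces the symptom on a small made-up sample (shortened to four columns, not read from the real file) and shows how `csv.Sniffer` can detect the delimiter:

```python
import csv
import io

# Hypothetical sample shaped like the output above (tab-separated, shortened).
SAMPLE = (
    "student_id\tage\tgender\tstudy_hours_per_day\n"
    "S1000\t23\tFemale\t0\n"
    "S1001\t20\tFemale\t6.9\n"
)

# With the default comma delimiter, each line collapses into one field.
comma_rows = list(csv.reader(io.StringIO(SAMPLE)))
print(len(comma_rows[0]))  # 1

# csv.Sniffer picks the delimiter from the candidates it is given.
dialect = csv.Sniffer().sniff(SAMPLE, delimiters=",\t")
print(repr(dialect.delimiter))  # '\t'

# Re-reading with the detected delimiter splits the fields correctly.
tab_rows = list(csv.reader(io.StringIO(SAMPLE), delimiter=dialect.delimiter))
print(len(tab_rows[0]))  # 4
print(tab_rows[1])       # ['S1000', '23', 'Female', '0']
```

If that diagnosis holds for the real file, the likely pandas fix is `pd.read_csv(url, sep='\t')`. Alternatively, `sep=None, engine='python'` asks pandas to detect the separator itself using `csv.Sniffer`.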
