# Hypothesis Testing

## 1. Hypothesis Definition

To analyze the relationship between browsing patterns and academic cycles, the following hypotheses are proposed:

#### **Null Hypothesis (H₀):**
There is no significant relationship between browsing patterns (academic vs. non-academic activity) and academic cycles (e.g., exams, assignments, breaks, weekends).

#### **Alternative Hypothesis (H₁):**
There is a significant relationship between browsing patterns (academic vs. non-academic activity) and academic cycles. Specifically:
- Academic-related browsing is expected to increase during exam and assignment periods.
- Non-academic browsing is expected to dominate during breaks or weekends.

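These hypotheses are tested with chi-square tests of independence, which cross-tabulate activity type against academic-cycle indicators such as exam periods and weekends. A p-value below 0.05 leads to rejecting H₀ for that indicator. Because the chi-square statistic itself is non-directional, the direction of any association (for example, more academic browsing during exams) is read by comparing observed counts with the counts expected under H₀.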
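To make the test concrete, here is a minimal sketch of a chi-square test of independence computed by hand with NumPy. Each expected count is (row total × column total) / grand total, the statistic sums (observed − expected)² / expected, and the Pearson residuals (observed − expected) / √expected show the direction of any deviation. The counts are taken from the exam-period contingency table printed in Section 4. The helper name `manual_chi_square` is illustrative. The result is checked against `scipy.stats.chi2_contingency` with `correction=False`, because SciPy's default applies Yates' continuity correction to 2×2 tables.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

def manual_chi_square(observed):
    """Pearson chi-square test of independence, without Yates' continuity correction."""
    observed = np.asarray(observed, dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)      # shape (r, 1)
    col_totals = observed.sum(axis=0, keepdims=True)      # shape (1, c)
    expected = row_totals * col_totals / observed.sum()   # E_ij = R_i * C_j / N
    statistic = ((observed - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    p_value = chi2.sf(statistic, dof)                      # upper-tail probability
    residuals = (observed - expected) / np.sqrt(expected)  # sign shows direction
    return statistic, p_value, dof, expected, residuals

# Exam-period counts from Section 4
# rows: During Exam Period = 0, 1; columns: Academic, Non-Academic
exam_counts = np.array([[31757, 11536],
                        [19605, 2941]])

stat, p, dof, expected, residuals = manual_chi_square(exam_counts)
ref_stat, ref_p, ref_dof, ref_expected = chi2_contingency(exam_counts, correction=False)

print(f"manual chi2 = {stat:.2f}, scipy chi2 = {ref_stat:.2f}, dof = {dof}")
print("Pearson residuals (rows: non-exam, exam; columns: Academic, Non-Academic):")
print(residuals.round(2))
```

A positive residual in the exam row's Academic cell would indicate more academic browsing during exams than independence predicts, which is the direction H₁ expects. With a statistic this large, `chi2.sf` underflows to 0.0 in double precision, which matches the `p-value: 0.0` printed in Section 4.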


## 2. Load Data

```python
import pandas as pd

# Load the processed data
file_path = '../data/processed/processedata.csv'
df = pd.read_csv(file_path)

# Check the data structure
print("Data Overview:")
print(df.head())
print("\nData Info:")
df.info()  # info() prints its summary itself and returns None
```
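Before running the tests, it can help to confirm that the columns used later exist and hold the expected codes: binary 0/1 indicators and exactly two activity labels. This is a minimal sketch that assumes the column names and labels used elsewhere in this notebook. `check_columns` is a hypothetical helper, and the three-row `toy` frame is a made-up input for illustration, not project data.

```python
import pandas as pd

# Assumed column names and labels, taken from the columns used in this notebook
INDICATOR_COLUMNS = ['During Exam Period', 'During Academic Period', 'Is Weekend']
ACTIVITY_LABELS = {'Academic', 'Non-Academic'}

def check_columns(frame: pd.DataFrame) -> list[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    # Every column the analysis relies on must be present
    for col in INDICATOR_COLUMNS + ['Activity Type']:
        if col not in frame.columns:
            problems.append(f"missing column: {col}")
    # Indicator columns should only contain 0 and 1
    for col in INDICATOR_COLUMNS:
        if col in frame.columns:
            unexpected = set(frame[col].dropna().unique()) - {0, 1}
            if unexpected:
                problems.append(f"{col} has non-binary values: {sorted(map(str, unexpected))}")
    # The activity column should hold exactly the two expected labels
    if 'Activity Type' in frame.columns:
        labels = set(frame['Activity Type'].dropna().unique())
        if labels != ACTIVITY_LABELS:
            problems.append(f"unexpected Activity Type labels: {sorted(map(str, labels))}")
    return problems

# Hypothetical toy input with one invalid weekend code (2)
toy = pd.DataFrame({
    'During Exam Period': [0, 1, 1],
    'During Academic Period': [1, 1, 0],
    'Is Weekend': [0, 0, 2],
    'Activity Type': ['Academic', 'Non-Academic', 'Academic'],
})
print(check_columns(toy))
```

In the notebook itself, `check_columns(df)` would be called right after loading. An empty list means the crosstabs in Section 4 will have the expected 2×2 shape.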

## 3. Exploratory Data Analysis (EDA)

Note: the plots produced below still need to be interpreted in the report.

```python
import matplotlib.pyplot as plt

# Goal 1: Analyze activity distribution during exam and non-exam periods
exam_vs_non_exam = df.groupby('During Exam Period')['Activity Type'].value_counts(normalize=True).unstack()
exam_vs_non_exam.plot(kind='bar', figsize=(8, 6), stacked=True, color=['blue', 'orange'])
plt.title("Academic vs. Non-Academic Activity: Exam vs. Non-Exam Periods")
plt.xlabel("Exam Period (0=No, 1=Yes)")
plt.ylabel("Proportion")
plt.xticks(ticks=[0, 1], labels=["Non-Exam", "Exam"], rotation=0)
plt.legend(title="Activity Type")
plt.tight_layout()
plt.show()

# Goal 2: Inspect activity patterns during academic vs. non-academic periods
academic_vs_non_academic = df.groupby('During Academic Period')['Activity Type'].value_counts(normalize=True).unstack()
academic_vs_non_academic.plot(kind='bar', figsize=(8, 6), stacked=True, color=['purple', 'yellow'])
plt.title("Academic vs. Non-Academic Activity: Academic vs. Non-Academic Periods")
plt.xlabel("Academic Period (0=Non-Academic, 1=Academic)")
plt.ylabel("Proportion")
plt.xticks(ticks=[0, 1], labels=["Non-Academic", "Academic"], rotation=0)
plt.legend(title="Activity Type")
plt.tight_layout()
plt.show()

# Goal 3: Examine weekdays vs. weekends browsing behavior
weekday_activity = df.groupby('Is Weekend')['Activity Type'].value_counts(normalize=True).unstack()
weekday_activity.plot(kind='bar', figsize=(8, 6), stacked=True, color=['green', 'red'])
plt.title("Academic vs. Non-Academic Activity on Weekdays vs. Weekends")
plt.xlabel("Weekend (0=Weekday, 1=Weekend)")
plt.ylabel("Proportion")
plt.xticks(ticks=[0, 1], labels=["Weekday", "Weekend"], rotation=0)
plt.legend(title="Activity Type")
plt.tight_layout()
plt.show()

# Additional: Analyze time of day activity patterns
time_of_day_activity = df.groupby('Time of Day')['Activity Type'].value_counts(normalize=True).unstack()
time_of_day_activity.plot(kind='bar', figsize=(10, 6), stacked=True, color=['cyan', 'magenta'])
plt.title("Academic vs. Non-Academic Activity by Time of Day")
plt.xlabel("Time of Day")
plt.ylabel("Proportion")
plt.xticks(rotation=0)
plt.legend(title="Activity Type")
plt.tight_layout()
plt.show()
```
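Because the EDA output has to be interpreted in the report, it can help to print the plotted proportions as tables as well. `pd.crosstab(..., normalize='index')` gives the same within-group shares as the `groupby(...).value_counts(normalize=True).unstack()` calls above. To keep the sketch self-contained, it rebuilds one row per record from the exam-period counts printed in Section 4. In the notebook, the crosstab would simply be called on `df`. The names `counts` and `records` are illustrative.

```python
import numpy as np
import pandas as pd

# Exam-period counts from the Section 4 contingency table:
# (During Exam Period, Activity Type) -> number of records
counts = {
    (0, 'Academic'): 31757, (0, 'Non-Academic'): 11536,
    (1, 'Academic'): 19605, (1, 'Non-Academic'): 2941,
}

# Expand the counts back into one row per record so the same calls used on df apply
keys = list(counts)
rows = pd.DataFrame(keys, columns=['During Exam Period', 'Activity Type'])
records = rows.loc[rows.index.repeat([counts[k] for k in keys])].reset_index(drop=True)

# Row-normalized shares: each row (exam indicator level) sums to 1
shares_crosstab = pd.crosstab(records['During Exam Period'], records['Activity Type'],
                              normalize='index')
# The groupby form used for the plots, for comparison
shares_groupby = (records.groupby('During Exam Period')['Activity Type']
                  .value_counts(normalize=True).unstack())

print(shares_crosstab.round(4))
```

Printing these tables lets the report quote exact shares, such as the Academic share in the exam and non-exam rows, instead of reading them off the stacked bars.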

## 4. Perform Statistical Tests

Note: the test results below still need to be interpreted in the report.
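The contingency tables below contain roughly 65,800 records. At that size, even a weak association produces a very small p-value, so it is worth reporting an effect size alongside each test. One common choice is Cramér's V, V = √(χ² / (n · (min(r, c) − 1))), which for a 2×2 table equals |φ|. The sketch below computes it from the observed counts shown in this section's output. The helper name `cramers_v` is illustrative. It uses the uncorrected chi-square statistic, so its values differ very slightly from values derived from the Yates-corrected statistic that `chi2_contingency` prints by default.

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(observed):
    """Cramér's V for an r x c contingency table, based on the uncorrected chi-square."""
    observed = np.asarray(observed, dtype=float)
    chi2_stat = chi2_contingency(observed, correction=False)[0]
    n = observed.sum()
    k = min(observed.shape) - 1  # equals 1 for a 2x2 table
    return np.sqrt(chi2_stat / (n * k))

# Counts from the contingency tables in this section
# rows: indicator = 0, 1; columns: Academic, Non-Academic
exam_table = [[31757, 11536], [19605, 2941]]
weekend_table = [[38271, 10421], [13091, 4056]]

print(f"Cramér's V, exam period: {cramers_v(exam_table):.3f}")
print(f"Cramér's V, weekend:     {cramers_v(weekend_table):.3f}")
```

Both tests reject H₀ at 0.05, but V makes it easier to state in the report how strong each association is relative to the other.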

```python
from scipy.stats import chi2_contingency

# Exam period analysis
print("Exam Period vs. Browsing Activity")
exam_contingency_table = pd.crosstab(df['During Exam Period'], df['Activity Type'])
print("\nContingency Table (Exam Periods):")
print(exam_contingency_table)

# For a 2x2 table (dof = 1), chi2_contingency applies Yates' continuity correction by default
chi2_exam, p_exam, dof_exam, expected_exam = chi2_contingency(exam_contingency_table)
print("\nChi-Square Test Results for Exam Periods:")
print(f"Chi2 Statistic: {chi2_exam}")
print(f"p-value: {p_exam}")
print(f"Degrees of Freedom: {dof_exam}")
print("Expected Frequencies Table:")
print(expected_exam)

# Interpretation of results
if p_exam < 0.05:
    print("The results are statistically significant, suggesting a relationship between exam periods and browsing patterns.")
else:
    print("No statistically significant relationship was found between exam periods and browsing patterns.")

# Weekday vs. Weekend analysis
print("\nWeekdays vs. Weekends Browsing Activity")
weekend_contingency_table = pd.crosstab(df['Is Weekend'], df['Activity Type'])
print("\nContingency Table (Weekends):")
print(weekend_contingency_table)

chi2_weekend, p_weekend, dof_weekend, expected_weekend = chi2_contingency(weekend_contingency_table)
print("\nChi-Square Test Results for Weekdays vs. Weekends:")
print(f"Chi2 Statistic: {chi2_weekend}")
print(f"p-value: {p_weekend}")
print(f"Degrees of Freedom: {dof_weekend}")
print("Expected Frequencies Table:")
print(expected_weekend)

# Interpretation of results
if p_weekend < 0.05:
    print("The results are statistically significant, suggesting a relationship between weekdays/weekends and browsing patterns.")
else:
    print("No statistically significant relationship was found between weekdays/weekends and browsing patterns.")
```

Exam Period vs. Browsing Activity

Contingency Table (Exam Periods):
Activity Type       Academic  Non-Academic
During Exam Period                        
0                      31757         11536
1                      19605          2941

Chi-Square Test Results for Exam Periods:
Chi2 Statistic: 1598.2087917548665
p-value: 0.0
Degrees of Freedom: 1
Expected Frequencies Table:
[[33773.52429411  9519.47570589]
 [17588.47570589  4957.52429411]]
The results are statistically significant, suggesting a relationship between exam periods and browsing patterns.

Weekdays vs. Weekends Browsing Activity

Contingency Table (Weekends):
Activity Type  Academic  Non-Academic
Is Weekend                           
0                 38271         10421
1                 13091          4056

Chi-Square Test Results for Weekdays vs. Weekends:
Chi2 Statistic: 37.375051738431424
p-value: 9.74609009460274e-10
Degrees of Freedom: 1
Expected Frequencies Table:
[[37985.36587737 10706.63412263]
 [13376.634122