# Academic Stress Analysis - Complete Workflow

This notebook demonstrates the complete analysis workflow for studying meditation's impact on academic stress in undergraduate students.

## Contents
1. Data Preprocessing
2. Causal Inference Analysis
3. Item Response Theory Analysis
4. Classification & Clustering Analysis
5. NLP Analysis (Optional)
6. Summary and Key Insights

In [None]:
# Import required libraries
import sys
sys.path.append('../scripts')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Import analysis modules
from data_preprocessing import StressDataPreprocessor
from causal_inference import CausalInferenceAnalyzer
from irt_analysis import IRTAnalyzer
from classification_clustering import StressClassifier, StressClusterer
from nlp_analysis import TextAnalyzer

# Set plot style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("Libraries imported successfully!")

## 1. Data Preprocessing

Load and preprocess the survey data.
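
The loading cell notes that sample data is generated when the CSV file is missing. A minimal sketch of such a fallback could look like the following. The helper `load_or_generate` and the synthetic distributions are hypothetical and are not taken from `StressDataPreprocessor`; only the column names `meditation_practice` and `stress_score` come from this notebook.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def load_or_generate(path, n_students=200, seed=0):
    """Read the survey CSV if it exists; otherwise build a synthetic stand-in."""
    path = Path(path)
    if path.exists():
        return pd.read_csv(path)

    # Hypothetical synthetic data: distributions chosen only for illustration.
    rng = np.random.default_rng(seed)
    meditation = rng.integers(0, 2, size=n_students)  # 0 = no, 1 = yes
    stress = 50 - 5 * meditation + rng.normal(0, 10, size=n_students)
    return pd.DataFrame({
        "meditation_practice": meditation,
        "stress_score": stress,
    })


sample = load_or_generate("no_such_file.csv")
print(sample.shape)
```

Fixing the seed keeps the synthetic stand-in reproducible across runs, which makes downstream cells easier to debug.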

In [None]:
# Initialize preprocessor
preprocessor = StressDataPreprocessor()

# Load data (will generate sample data if file doesn't exist)
data_path = '../data/raw/stress_survey_data.csv'
data = preprocessor.load_data(data_path)

# Display first few rows
data.head()

In [None]:
# Get summary statistics
preprocessor.get_summary_statistics()

In [None]:
# Run preprocessing pipeline
processed_data, feature_groups = preprocessor.preprocess_pipeline(
    scale=True,
    create_derived=True
)

# Save processed data
preprocessor.save_processed_data('../data/processed/stress_data_processed.csv')

## 2. Causal Inference Analysis

Estimate the causal effect of meditation on academic stress.
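
To see why the notebook compares a naive estimate with adjusted ones, the sketch below simulates a confounded dataset and contrasts a difference in means with a regression-adjusted estimate. It is an illustration under stated assumptions, not the method used inside `CausalInferenceAnalyzer`: the confounder `workload`, the true effect of -2.0, and the plain OLS adjustment are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
true_effect = -2.0  # assumed effect of meditation on stress (illustrative)

# Confounder: heavier workload raises stress and lowers the chance of meditating.
workload = rng.normal(size=n)
p_meditate = 1.0 / (1.0 + np.exp(workload))
T = rng.binomial(1, p_meditate)
Y = 5.0 + 1.5 * workload + true_effect * T + rng.normal(size=n)

# Naive ATE: difference in mean outcome between treated and untreated students.
naive_ate = Y[T == 1].mean() - Y[T == 0].mean()

# Adjusted ATE: coefficient on T from an OLS fit that also includes the confounder.
design = np.column_stack([np.ones(n), T, workload])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
adjusted_ate = coef[1]

print(f"naive: {naive_ate:.2f}, adjusted: {adjusted_ate:.2f}, true: {true_effect:.2f}")
```

In this simulation the naive estimate overstates the stress reduction because students who meditate also tend to have lighter workloads. Causal forests and double ML address the same problem with more flexible models.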

In [None]:
# Initialize causal inference analyzer
causal_analyzer = CausalInferenceAnalyzer()
causal_analyzer.load_data('../data/processed/stress_data_processed.csv')

# Prepare data
T, Y, X, feature_names = causal_analyzer.prepare_data_for_causal_inference(
    treatment_col='meditation_practice',
    outcome_col='stress_score'
)

In [None]:
# Estimate treatment effects
naive_ate = causal_analyzer.estimate_ate_naive(T, Y)
cf_ate = causal_analyzer.causal_forest_ate(T, Y, X)
dml_ate = causal_analyzer.double_ml_ate(T, Y, X)

In [None]:
# Estimate heterogeneous effects
cate = causal_analyzer.estimate_cate(T, Y, X, feature_names)

# Generate plots
causal_analyzer.plot_results()

# Display comprehensive report
causal_analyzer.generate_report()

## 3. Item Response Theory Analysis

Analyze stress assessment items using IRT.
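
For orientation, the graded response model fitted later in this section assigns each Likert-type item a discrimination parameter and a set of ordered thresholds. The sketch below computes category probabilities for a single item under Samejima's formulation. The function name `grm_category_probs` and the parameter values are illustrative assumptions and do not reflect the internals of `IRTAnalyzer`.

```python
import numpy as np


def grm_category_probs(theta, discrimination, thresholds):
    """Category probabilities for one item under the graded response model.

    thresholds must be increasing; K thresholds give K + 1 response categories.
    """
    thresholds = np.asarray(thresholds, dtype=float)
    # P(response >= k | theta) for each threshold k, via the logistic function.
    cumulative = 1.0 / (1.0 + np.exp(-discrimination * (theta - thresholds)))
    # Pad with P(response >= lowest category) = 1 and P(response > highest) = 0,
    # then take successive differences to get per-category probabilities.
    bounds = np.concatenate(([1.0], cumulative, [0.0]))
    return bounds[:-1] - bounds[1:]


# Illustrative 5-point item (parameter values are made up).
probs = grm_category_probs(theta=0.0, discrimination=1.5,
                           thresholds=[-2.0, -1.0, 0.5, 1.5])
print(np.round(probs, 3))
```

A larger discrimination value makes the category probabilities change more sharply with theta, which is what "discriminating" items means in the summary section.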

In [None]:
# Initialize IRT analyzer
irt_analyzer = IRTAnalyzer()
irt_analyzer.load_data('../data/processed/stress_data_processed.csv')

# Generate stress items
item_cols = irt_analyzer.generate_stress_items(n_items=10)

In [None]:
# Calculate item statistics
stats = irt_analyzer.calculate_item_statistics(item_cols)
stats

In [None]:
# Calculate reliability
alpha = irt_analyzer.calculate_cronbach_alpha(item_cols)
print(f"\nCronbach's Alpha: {alpha:.3f}")

In [None]:
# Estimate person parameters (theta), then fit the graded response model
theta = irt_analyzer.estimate_theta_simple(item_cols)
item_params = irt_analyzer.fit_graded_response_model(item_cols)

# Generate visualizations
irt_analyzer.plot_item_characteristic_curves(item_cols)
irt_analyzer.plot_test_information(item_cols)
irt_analyzer.plot_person_item_map(item_cols)

## 4. Classification & Clustering Analysis

### Classification: Predict Stress Levels
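
The classifier below bins stress scores with `method='tertiles'`. One plausible way to form tertiles is `pandas.qcut`, sketched here on synthetic scores; whether `StressClassifier` does it this way is an assumption, and the labels `low`, `medium`, and `high` are chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
scores = pd.Series(rng.normal(50, 10, size=300), name="stress_score")

# q=3 cuts at the 1/3 and 2/3 quantiles, giving three roughly equal groups.
stress_level = pd.qcut(scores, q=3, labels=["low", "medium", "high"])
print(stress_level.value_counts().sort_index())
```

Quantile-based bins keep the classes balanced, which simplifies the interpretation of accuracy in the model comparison.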

In [None]:
# Initialize classifier
classifier = StressClassifier()
classifier.load_data('../data/processed/stress_data_processed.csv')

# Create stress categories
classifier.create_stress_categories(method='tertiles')

# Prepare features (distinct names avoid overwriting X and feature_names from Section 2)
X_clf, y_clf, clf_feature_names = classifier.prepare_features()

In [None]:
# Train models
models, results = classifier.train_models(X_clf, y_clf)

# Get feature importance
classifier.get_feature_importance(clf_feature_names, top_n=10)

# Generate visualizations
classifier.plot_confusion_matrices()
classifier.plot_model_comparison()

### Clustering: Identify Student Subgroups
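
Choosing the number of clusters usually combines the elbow in the inertia curve with the silhouette score. The sketch below shows that selection loop on synthetic, well-separated data, assuming scikit-learn is installed. It is not the implementation of `find_optimal_k`; `X_demo` and the cluster centers are made up for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
centers = np.array([[-5.0, -5.0], [0.0, 5.0], [5.0, -5.0]])
# 100 synthetic points around each of three well-separated centers.
X_demo = np.vstack([c + rng.normal(scale=0.8, size=(100, 2)) for c in centers])

inertias, silhouettes = {}, {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_demo)
    inertias[k] = model.inertia_  # within-cluster sum of squares
    silhouettes[k] = silhouette_score(X_demo, model.labels_)

# Pick the k with the highest mean silhouette score.
best_k = max(silhouettes, key=silhouettes.get)
print(best_k)
```

On real survey data the silhouette peak is rarely this clear, so it is worth checking the elbow curve and the cluster profiles together before settling on a value.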

In [None]:
# Initialize clusterer
clusterer = StressClusterer()
clusterer.load_data('../data/processed/stress_data_processed.csv')

# Prepare features
X_cluster, cluster_features = clusterer.prepare_features(scale=True)

In [None]:
# Find optimal number of clusters
inertias, silhouette_scores = clusterer.find_optimal_k(X_cluster)

# Plot elbow curve
clusterer.plot_elbow_curve()

In [None]:
# Fit clustering models; set n_clusters to the k suggested by the elbow curve and silhouette scores
kmeans_labels = clusterer.fit_kmeans(X_cluster, n_clusters=3)
hierarchical_labels = clusterer.fit_hierarchical(X_cluster, n_clusters=3)

# Characterize clusters
cluster_profiles = clusterer.characterize_clusters(method='kmeans')

# Visualize clusters
clusterer.plot_clusters_pca(X_cluster, method='kmeans')

## 5. NLP Analysis (Optional)

Analyze open-ended text responses.
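
A "simple" sentiment analysis is often a lexicon count: score each response by the balance of positive and negative words. The sketch below shows that idea in plain Python. The word lists and the function `simple_sentiment` are hypothetical and much smaller than any usable lexicon; `TextAnalyzer.simple_sentiment_analysis` may work differently.

```python
import re

# Tiny illustrative word lists; a real lexicon would be far larger.
POSITIVE = {"calm", "relaxed", "focused", "better", "confident"}
NEGATIVE = {"stressed", "anxious", "overwhelmed", "tired", "worried"}


def simple_sentiment(text):
    """Return (label, score), where score = (#positive - #negative) / #tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return "neutral", 0.0
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    score = (pos - neg) / len(tokens)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return label, score


print(simple_sentiment("Meditation helps me feel calm and focused"))
print(simple_sentiment("I am stressed and overwhelmed by exams"))
```

Lexicon counts ignore negation and context ("not stressed" scores as negative), so they are best treated as a rough first pass.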

In [None]:
# Initialize text analyzer
text_analyzer = TextAnalyzer()
text_analyzer.load_data('../data/processed/stress_data_processed.csv')

# Generate sample responses (in practice, use actual survey data)
text_analyzer.generate_sample_responses()

In [None]:
# Preprocess text
text_analyzer.basic_text_preprocessing()
text_analyzer.calculate_text_length()

# Sentiment analysis
sentiments, scores = text_analyzer.simple_sentiment_analysis()

# Extract keywords
keywords = text_analyzer.extract_keywords(top_n=20)

In [None]:
# Analyze by groups
text_analyzer.analyze_by_stress_level()
text_analyzer.analyze_meditation_effects()

# Generate visualizations
text_analyzer.plot_sentiment_analysis()
text_analyzer.plot_word_frequency(top_n=15)

## 6. Summary and Key Insights

### Causal Effects
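- Compare the naive, causal forest, and double ML estimates of the ATE; a large gap between the naive and adjusted estimates points to confounding
- Interpret a negative ATE as a reduction in stress associated with meditation (assuming higher `stress_score` means more stress)
- Check the CATE estimates for subgroups in which the effect is stronger or weaker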

### Psychometric Properties
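- Cronbach's alpha measures the internal consistency of the stress items; values of about 0.7 or higher are conventionally treated as acceptable
- The IRT item parameters show which items best discriminate between stress levels
- Person parameters (theta) place each student on the latent stress trait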
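
For reference, Cronbach's alpha can be computed directly from the item scores with the standard formula: k / (k - 1) times one minus the ratio of summed item variances to the variance of the total score. The sketch below applies it to synthetic items. The function `cronbach_alpha` is a stand-alone illustration and is not claimed to match `IRTAnalyzer.calculate_cronbach_alpha`.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)


rng = np.random.default_rng(0)
trait = rng.normal(size=500)  # shared latent stress level (synthetic)
noisy_items = trait[:, None] + rng.normal(scale=1.0, size=(500, 10))
print(f"alpha = {cronbach_alpha(noisy_items):.3f}")
```

Alpha rises with both the number of items and their average inter-item correlation, so a high value alone does not show that the scale is unidimensional.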

### Predictive Models
- Identify the best classification model and report its accuracy
- List the most important features for predicting stress
- Assess whether predictions are accurate enough to identify at-risk students early

### Student Subgroups
- Report the number of distinct clusters identified
- Describe the characteristics of each subgroup
- Consider the implications for targeted interventions

### Text Analysis
- Summarize common themes in student responses
- Compare sentiment patterns by stress level and meditation practice
- Use qualitative insights to complement the quantitative findings

In [None]:
# Create a summary dataframe of key results
summary = pd.DataFrame({
    'Analysis': ['Causal Inference', 'IRT', 'Classification', 'Clustering', 'NLP'],
    'Key Finding': [
        f'ATE: {dml_ate:.3f}',
        f'Cronbach Alpha: {alpha:.3f}',
        f'Best Accuracy: {classifier.results[classifier.best_model]["accuracy"]:.3f}',
        f'{len(np.unique(kmeans_labels))} clusters identified',
        'Sentiment patterns analyzed'
    ],
    'Status': ['✓', '✓', '✓', '✓', '✓']
})

print("\n" + "="*60)
print("ANALYSIS SUMMARY")
print("="*60)
print(summary.to_string(index=False))
print("="*60)

## Next Steps

1. **Validate findings** with domain experts
2. **Test robustness** with sensitivity analyses
3. **Design interventions** based on identified subgroups
4. **Implement monitoring** using predictive models
5. **Collect additional data** for longitudinal analysis

---

*Note: This analysis is for research purposes. Consult mental health professionals for actual student wellness interventions.*