# Stylistic Synthetic Data: Imbalance Severity Analysis

This notebook conducts a **comprehensive imbalance severity analysis** using our new **stylistic synthetic tweet data** to test whether the improved generation approach changes effectiveness under different levels of class imbalance.

## üéØ Research Hypothesis

Our stylistic synthetic data (generated with proper fake tweet linguistic patterns) may show **increasing advantages over traditional methods under severe class imbalance**, where more realistic synthetic examples become crucial for learning minority class boundaries.

## üÜï Key Improvements Over Previous Analysis

1. **Better Synthetic Data**: Uses stylistic synthetic tweets that match fake tweet vocabulary, exclamation patterns, and topics
2. **Optimal Model**: Random Forest with Count Vectorization (best from previous experiments: 94.39% F1)
3. **Real Baseline**: Generated synthetic data cost $0.33 vs fact-manipulation approach
4. **Targeted Features**: Synthetic tweets designed to match the 10 most distinguishing fake tweet features

## üî¨ Experimental Design

We test **4 different imbalance levels** by systematically reducing dataset size while maintaining the same absolute class gap (3,772):

1. **2.8% Imbalance**: ~134K total samples, 48.6% minority ‚Üí **Baseline (nearly balanced)**
2. **5.6% Imbalance**: ~67K total samples, ~44% minority ‚Üí **Mild imbalance** 
3. **9.4% Imbalance**: ~40K total samples, ~40% minority ‚Üí **Moderate imbalance**
4. **28.4% Imbalance**: ~13K total samples, ~22% minority ‚Üí **Severe imbalance**

## üìä Testing Framework

For each imbalance level, we test **6 sampling strategies**:
1. **Original unbalanced** (baseline)
2. **Undersampled majority** (reduce real tweets)
3. **Traditional oversampling** (duplicate random fake tweets)
4. **10% Stylistic synthetic** (377 stylistic tweets)
5. **50% Stylistic synthetic** (1,886 stylistic tweets)
6. **100% Stylistic synthetic** (3,772 stylistic tweets)

**Total experiments**: 4 imbalance levels √ó 6 strategies = **24 experiments**

## üîç Research Questions

1. **Does stylistic synthetic data outperform traditional methods under severe imbalance?**
2. **How does the vocabulary/pattern matching approach scale with imbalance severity?**  
3. **Are there threshold effects where linguistic realism becomes critical?**
4. **Which sampling strategy is most robust across different imbalance levels?**
5. **Does our $0.33 generation cost justify performance improvements?**

## üé® Stylistic Synthetic Data Features

Our synthetic tweets are designed to match fake tweet patterns:
- **Vocabulary**: Biden, vaccine, fraud, election terms (6-24√ó more frequent in fake tweets)
- **Style**: More exclamations (+56%), fewer hashtags (-36%), longer text (+6%)
- **Topics**: Election fraud, COVID conspiracies, Biden criticism, government overreach
- **Generation**: GPT-3.5 with carefully crafted prompts targeting linguistic patterns

In [7]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, f1_score, confusion_matrix
from sklearn.utils import resample
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Set style
plt.style.use('default')
sns.set_palette("husl")

print("üìö Libraries imported successfully")
print(f"üìÖ Analysis started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

üìö Libraries imported successfully
üìÖ Analysis started at: 2025-08-18 18:24:33


In [8]:
# Define imbalance levels and sampling strategies

# Imbalance levels (maintaining constant 3,772 gap between classes)
# The key is that imbalance SEVERITY increases as total dataset size decreases
IMBALANCE_LEVELS = {
    "2.8%": {"real": 68985, "fake": 65213, "description": "Baseline (nearly balanced) - 134K total"},
    "5.6%": {"real": 35379, "fake": 31607, "description": "Mild imbalance - 67K total"}, 
    "9.4%": {"real": 21886, "fake": 18114, "description": "Moderate imbalance - 40K total"},
    "25.1%": {"real": 9386, "fake": 5614, "description": "Severe imbalance - 15K total"}
}

# Sampling strategies
SAMPLING_STRATEGIES = {
    "unbalanced": "Original unbalanced",
    "undersampling": "Undersampled majority",
    "random_oversampling": "Traditional oversampling", 
    "stylistic_10": "10% Stylistic synthetic",
    "stylistic_50": "50% Stylistic synthetic",
    "stylistic_100": "100% Stylistic synthetic"
}

print("üìä Imbalance Levels (constant 3,772 gap):")
for level, config in IMBALANCE_LEVELS.items():
    minority_pct = config['fake'] / (config['real'] + config['fake']) * 100
    total = config['real'] + config['fake']
    gap = config['real'] - config['fake']
    print(f"   {level}: {total:,} tweets ({minority_pct:.1f}% minority, {gap:,} gap) - {config['description']}")

print(f"\nüîß Sampling Strategies: {len(SAMPLING_STRATEGIES)}")
for strategy, description in SAMPLING_STRATEGIES.items():
    print(f"   {strategy}: {description}")
    
print(f"\nüßÆ Total experiments: {len(IMBALANCE_LEVELS)} √ó {len(SAMPLING_STRATEGIES)} = {len(IMBALANCE_LEVELS) * len(SAMPLING_STRATEGIES)}")

# Verify the constant gap
print(f"\n‚úÖ Verification - All gaps should be 3,772:")
for level, config in IMBALANCE_LEVELS.items():
    gap = config['real'] - config['fake'] 
    print(f"   {level}: {gap:,} gap ({'‚úÖ' if gap == 3772 else '‚ùå'})")

üìä Imbalance Levels (constant 3,772 gap):
   2.8%: 134,198 tweets (48.6% minority, 3,772 gap) - Baseline (nearly balanced) - 134K total
   5.6%: 66,986 tweets (47.2% minority, 3,772 gap) - Mild imbalance - 67K total
   9.4%: 40,000 tweets (45.3% minority, 3,772 gap) - Moderate imbalance - 40K total
   25.1%: 15,000 tweets (37.4% minority, 3,772 gap) - Severe imbalance - 15K total

üîß Sampling Strategies: 6
   unbalanced: Original unbalanced
   undersampling: Undersampled majority
   random_oversampling: Traditional oversampling
   stylistic_10: 10% Stylistic synthetic
   stylistic_50: 50% Stylistic synthetic
   stylistic_100: 100% Stylistic synthetic

üßÆ Total experiments: 4 √ó 6 = 24

‚úÖ Verification - All gaps should be 3,772:
   2.8%: 3,772 gap (‚úÖ)
   5.6%: 3,772 gap (‚úÖ)
   9.4%: 3,772 gap (‚úÖ)
   25.1%: 3,772 gap (‚úÖ)


In [9]:
# Classification experiment function

def run_imbalance_experiment(real_data, fake_data, experiment_name, test_size=0.2, random_state=42):
    """
    Run classification experiment with Random Forest + Count Vectorization
    
    Args:
        real_data: List of real tweet texts
        fake_data: List of fake tweet texts  
        experiment_name: Name for this experiment
        test_size: Proportion for test set
        random_state: Random seed
        
    Returns:
        Dictionary with results
    """
    
    # Prepare data
    texts = real_data + fake_data
    labels = [0] * len(real_data) + [1] * len(fake_data)  # 0=real, 1=fake
    
    # Train/test split
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=random_state, stratify=labels
    )
    
    # Count Vectorization (best from previous experiments)
    vectorizer = CountVectorizer(
        max_features=5000,
        ngram_range=(1, 2),
        stop_words='english'
    )
    
    X_train_vec = vectorizer.fit_transform(X_train)
    X_test_vec = vectorizer.transform(X_test)
    
    # Random Forest (best from previous experiments)
    classifier = RandomForestClassifier(
        n_estimators=100,
        random_state=random_state,
        n_jobs=-1
    )
    
    classifier.fit(X_train_vec, y_train)
    y_pred = classifier.predict(X_test_vec)
    
    # Calculate metrics
    fake_f1 = f1_score(y_test, y_pred, pos_label=1)
    overall_f1 = f1_score(y_test, y_pred, average='weighted')
    
    # Class distribution
    real_count = len(real_data)
    fake_count = len(fake_data) 
    minority_pct = fake_count / (real_count + fake_count) * 100
    imbalance_ratio = real_count / fake_count
    
    return {
        'experiment_name': experiment_name,
        'fake_f1': fake_f1,
        'overall_f1': overall_f1,
        'real_count': real_count,
        'fake_count': fake_count,
        'total_count': real_count + fake_count,
        'minority_percentage': minority_pct,
        'imbalance_ratio': imbalance_ratio,
        'train_size': len(X_train),
        'test_size': len(X_test)
    }

print("‚úÖ Experiment function ready (Random Forest + Count Vectorization)")

‚úÖ Experiment function ready (Random Forest + Count Vectorization)


In [10]:
# Create datasets for different imbalance levels

def create_imbalanced_datasets(real_tweets, fake_tweets, imbalance_config):
    """
    Create datasets with specified imbalance levels
    
    Args:
        real_tweets: Full list of real tweets
        fake_tweets: Full list of fake tweets
        imbalance_config: Dictionary with 'real' and 'fake' counts
        
    Returns:
        Tuple of (sampled_real, sampled_fake)
    """
    
    # Sample down to required sizes
    sampled_real = resample(
        real_tweets,
        n_samples=min(imbalance_config['real'], len(real_tweets)),
        random_state=42
    )
    
    sampled_fake = resample(
        fake_tweets,
        n_samples=min(imbalance_config['fake'], len(fake_tweets)),
        random_state=42
    )
    
    return list(sampled_real), list(sampled_fake)


def apply_sampling_strategy(real_data, fake_data, strategy, synthetic_tweets):
    """
    Apply specific sampling strategy to balance datasets
    
    Args:
        real_data: Real tweets for this imbalance level
        fake_data: Fake tweets for this imbalance level
        strategy: Sampling strategy name
        synthetic_tweets: Stylistic synthetic tweets
        
    Returns:
        Tuple of (processed_real, processed_fake)
    """
    
    if strategy == "unbalanced":
        return real_data, fake_data
        
    elif strategy == "undersampling":
        # Reduce real tweets to match fake count
        undersampled_real = resample(real_data, n_samples=len(fake_data), random_state=42)
        return list(undersampled_real), fake_data
        
    elif strategy == "random_oversampling":
        # Duplicate random fake tweets to match real count
        imbalance = len(real_data) - len(fake_data)
        if imbalance > 0:
            random_duplicates = resample(fake_data, n_samples=imbalance, random_state=42)
            balanced_fake = fake_data + list(random_duplicates)
            return real_data, balanced_fake
        return real_data, fake_data
        
    elif strategy == "stylistic_10":
        # Add 10% of synthetic tweets (377 tweets)
        synthetic_sample = resample(synthetic_tweets, n_samples=377, random_state=42)
        enhanced_fake = fake_data + list(synthetic_sample)
        return real_data, enhanced_fake
        
    elif strategy == "stylistic_50":
        # Add 50% of synthetic tweets (1,886 tweets)
        synthetic_sample = resample(synthetic_tweets, n_samples=1886, random_state=42)
        enhanced_fake = fake_data + list(synthetic_sample)
        return real_data, enhanced_fake
        
    elif strategy == "stylistic_100":
        # Add 100% of synthetic tweets (3,772 tweets)
        enhanced_fake = fake_data + synthetic_tweets
        return real_data, enhanced_fake
    
    else:
        raise ValueError(f"Unknown strategy: {strategy}")

print("‚úÖ Dataset creation functions ready")

‚úÖ Dataset creation functions ready


In [11]:
# Run comprehensive imbalance severity analysis

print("üöÄ Starting Comprehensive Imbalance Severity Analysis")
print("=" * 70)

all_results = []
experiment_count = 0
total_experiments = len(IMBALANCE_LEVELS) * len(SAMPLING_STRATEGIES)

for imbalance_level, imbalance_config in IMBALANCE_LEVELS.items():
    
    print(f"\nüìä IMBALANCE LEVEL: {imbalance_level} ({imbalance_config['description']})")
    print("-" * 60)
    
    # Create base datasets for this imbalance level
    level_real, level_fake = create_imbalanced_datasets(real_tweets, fake_tweets, imbalance_config)
    
    minority_pct = len(level_fake) / (len(level_real) + len(level_fake)) * 100
    print(f"   üìè Dataset: {len(level_real):,} real, {len(level_fake):,} fake ({minority_pct:.1f}% minority)")
    
    # Test each sampling strategy at this imbalance level
    for strategy_name, strategy_description in SAMPLING_STRATEGIES.items():
        
        experiment_count += 1
        print(f"\n   üß™ [{experiment_count:2d}/{total_experiments}] {strategy_description}")
        
        try:
            # Apply sampling strategy
            strategy_real, strategy_fake = apply_sampling_strategy(
                level_real, level_fake, strategy_name, synthetic_tweets
            )
            
            # Run experiment
            experiment_name = f"{imbalance_level}_{strategy_name}"
            result = run_imbalance_experiment(
                real_data=strategy_real,
                fake_data=strategy_fake,
                experiment_name=experiment_name
            )
            
            # Add metadata
            result['imbalance_level'] = imbalance_level
            result['sampling_strategy'] = strategy_name
            result['strategy_description'] = strategy_description
            
            all_results.append(result)
            
            print(f"      ‚úÖ Fake F1: {result['fake_f1']:.4f} | Overall F1: {result['overall_f1']:.4f}")
            print(f"         Dataset: {result['real_count']:,} real + {result['fake_count']:,} fake")
            
        except Exception as e:
            print(f"      ‚ùå Error: {str(e)}")
            continue

print(f"\nüéâ Analysis Complete! Successfully ran {len(all_results)} out of {total_experiments} experiments")

üöÄ Starting Comprehensive Imbalance Severity Analysis

üìä IMBALANCE LEVEL: 2.8% (Baseline (nearly balanced) - 134K total)
------------------------------------------------------------
   üìè Dataset: 68,985 real, 65,213 fake (48.6% minority)

   üß™ [ 1/24] Original unbalanced
      ‚úÖ Fake F1: 0.9682 | Overall F1: 0.9691
         Dataset: 68,985 real + 65,213 fake

   üß™ [ 2/24] Undersampled majority
      ‚úÖ Fake F1: 0.9733 | Overall F1: 0.9734
         Dataset: 65,213 real + 65,213 fake

   üß™ [ 3/24] Traditional oversampling
      ‚úÖ Fake F1: 0.9714 | Overall F1: 0.9713
         Dataset: 68,985 real + 68,985 fake

   üß™ [ 4/24] 10% Stylistic synthetic
      ‚úÖ Fake F1: 0.9685 | Overall F1: 0.9693
         Dataset: 68,985 real + 65,590 fake

   üß™ [ 5/24] 50% Stylistic synthetic
      ‚úÖ Fake F1: 0.9696 | Overall F1: 0.9701
         Dataset: 68,985 real + 67,099 fake

   üß™ [ 6/24] 100% Stylistic synthetic
      ‚úÖ Fake F1: 0.9708 | Overall F1: 0.9708
         D

In [12]:
# Test extreme imbalance level to find stylistic data tipping point

print("üî• TESTING EXTREME IMBALANCE LEVEL")
print("=" * 50)
print("Testing if stylistic synthetic data performs best under extreme imbalance...")

# Define extreme imbalance level (maintaining 3,772 gap)
EXTREME_LEVEL = {
    "50.2%": {"real": 5614, "fake": 1842, "description": "Extreme imbalance - 7.5K total"}
}

print(f"\\nüìä EXTREME IMBALANCE LEVEL:")
for level, config in EXTREME_LEVEL.items():
    minority_pct = config['fake'] / (config['real'] + config['fake']) * 100
    total = config['real'] + config['fake']
    gap = config['real'] - config['fake']
    print(f"   {level}: {total:,} tweets ({minority_pct:.1f}% minority, {gap:,} gap) - {config['description']}")

# Create extreme imbalance datasets
extreme_real, extreme_fake = create_imbalanced_datasets(real_tweets, fake_tweets, EXTREME_LEVEL["50.2%"])

print(f"\\nüìè Extreme Dataset: {len(extreme_real):,} real, {len(extreme_fake):,} fake ({len(extreme_fake)/(len(extreme_real) + len(extreme_fake))*100:.1f}% minority)")

# Test all strategies at extreme imbalance
extreme_results = []

print(f"\\nüß™ EXTREME IMBALANCE EXPERIMENTS:")
print("-" * 40)

for strategy_name, strategy_description in SAMPLING_STRATEGIES.items():
    
    print(f"\\n   üî¨ {strategy_description}")
    
    try:
        # Apply sampling strategy
        strategy_real, strategy_fake = apply_sampling_strategy(
            extreme_real, extreme_fake, strategy_name, synthetic_tweets
        )
        
        # Run experiment
        experiment_name = f"50.2%_{strategy_name}"
        result = run_imbalance_experiment(
            real_data=strategy_real,
            fake_data=strategy_fake,
            experiment_name=experiment_name
        )
        
        # Add metadata
        result['imbalance_level'] = "50.2%"
        result['sampling_strategy'] = strategy_name
        result['strategy_description'] = strategy_description
        
        extreme_results.append(result)
        
        print(f"      ‚úÖ Fake F1: {result['fake_f1']:.4f} | Overall F1: {result['overall_f1']:.4f}")
        print(f"         Dataset: {result['real_count']:,} real + {result['fake_count']:,} fake")
        
    except Exception as e:
        print(f"      ‚ùå Error: {str(e)}")
        continue

# Analyze extreme imbalance results
print(f"\\nüèÜ EXTREME IMBALANCE RESULTS RANKING:")
extreme_results_sorted = sorted(extreme_results, key=lambda x: x['fake_f1'], reverse=True)

for i, result in enumerate(extreme_results_sorted, 1):
    print(f"   {i}. {result['strategy_description']}: {result['fake_f1']:.4f} F1")

# Check if stylistic wins at extreme imbalance
best_extreme = extreme_results_sorted[0]
print(f"\\nüéØ EXTREME IMBALANCE WINNER: {best_extreme['strategy_description']}")
print(f"   Performance: {best_extreme['fake_f1']:.4f} F1")

if 'stylistic' in best_extreme['sampling_strategy']:
    print("   üéâ STYLISTIC SYNTHETIC DATA WINS UNDER EXTREME IMBALANCE!")
else:
    print("   üìä Traditional methods still lead under extreme conditions")
    
# Compare stylistic 100% vs traditional at extreme level
stylistic_extreme = next((r for r in extreme_results if r['sampling_strategy'] == 'stylistic_100'), None)
traditional_extreme = next((r for r in extreme_results if r['sampling_strategy'] == 'random_oversampling'), None)

if stylistic_extreme and traditional_extreme:
    improvement = stylistic_extreme['fake_f1'] - traditional_extreme['fake_f1']
    print(f"\\nüìà STYLISTIC vs TRADITIONAL AT EXTREME LEVEL:")
    print(f"   Stylistic 100%: {stylistic_extreme['fake_f1']:.4f}")
    print(f"   Traditional: {traditional_extreme['fake_f1']:.4f}")
    print(f"   Difference: {improvement:+.4f} ({improvement/traditional_extreme['fake_f1']*100:+.2f}%)")
    
    if improvement > 0:
        print("   üöÄ BREAKTHROUGH: Stylistic synthetic outperforms traditional under extreme imbalance!")
    else:
        print(f"   üìâ Traditional still ahead by {abs(improvement):.4f} F1")

# Add to main results if successful
if extreme_results:
    print(f"\\nüìä Adding {len(extreme_results)} extreme imbalance results to main analysis...")
    all_results.extend(extreme_results)
    results_df_extended = pd.DataFrame(all_results)
    print("‚úÖ Results updated with extreme imbalance experiments")
else:
    print("‚ùå No extreme imbalance results to add")

print("\\nüî• Extreme imbalance analysis complete!")

üî• TESTING EXTREME IMBALANCE LEVEL
Testing if stylistic synthetic data performs best under extreme imbalance...
\nüìä EXTREME IMBALANCE LEVEL:
   50.2%: 7,456 tweets (24.7% minority, 3,772 gap) - Extreme imbalance - 7.5K total
\nüìè Extreme Dataset: 5,614 real, 1,842 fake (24.7% minority)
\nüß™ EXTREME IMBALANCE EXPERIMENTS:
----------------------------------------
\n   üî¨ Original unbalanced
      ‚úÖ Fake F1: 0.8369 | Overall F1: 0.9186
         Dataset: 5,614 real + 1,842 fake
\n   üî¨ Undersampled majority
      ‚úÖ Fake F1: 0.9073 | Overall F1: 0.9103
         Dataset: 1,842 real + 1,842 fake
\n   üî¨ Traditional oversampling
      ‚úÖ Fake F1: 0.9594 | Overall F1: 0.9586
         Dataset: 5,614 real + 5,614 fake
\n   üî¨ 10% Stylistic synthetic
      ‚úÖ Fake F1: 0.8523 | Overall F1: 0.9164
         Dataset: 5,614 real + 2,219 fake
\n   üî¨ 50% Stylistic synthetic
      ‚úÖ Fake F1: 0.9109 | Overall F1: 0.9288
         Dataset: 5,614 real + 3,728 fake
\n   üî¨ 100% St

In [14]:
# Corrected methodology: Split BEFORE oversampling to avoid data leakage

print("üîç CORRECTED METHODOLOGY: AVOIDING DATA LEAKAGE")
print("=" * 60)
print("Testing traditional oversampling vs stylistic synthetic with proper train/test splitting")
print("Issue: Previous experiments oversample BEFORE splitting, causing data leakage")
print("Fix: Split FIRST, then oversample only the TRAINING set")

def run_corrected_experiment(real_data, fake_data, experiment_name, synthetic_tweets, test_size=0.2, random_state=42):
    """
    Run experiment with corrected methodology: split first, then oversample training set only
    """
    
    # Step 1: Prepare original data and split FIRST
    texts = real_data + fake_data
    labels = [0] * len(real_data) + [1] * len(fake_data)
    
    # Step 2: Train/test split on original data
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=random_state, stratify=labels
    )
    
    # Step 3: Separate training data by class
    train_real = [text for text, label in zip(X_train, y_train) if label == 0]
    train_fake = [text for text, label in zip(X_train, y_train) if label == 1]
    
    print(f"\\n   üìä {experiment_name}")
    print(f"      Original train set: {len(train_real):,} real, {len(train_fake):,} fake")
    print(f"      Test set: {len(X_test):,} tweets (untouched)")
    
    results = {}
    
    # Traditional oversampling (applied only to training set)
    train_imbalance = len(train_real) - len(train_fake)
    if train_imbalance > 0:
        # Oversample training fake tweets only
        train_fake_oversampled = train_fake + list(resample(
            train_fake, n_samples=train_imbalance, random_state=random_state
        ))
        
        # Prepare training data
        X_train_traditional = train_real + train_fake_oversampled
        y_train_traditional = [0] * len(train_real) + [1] * len(train_fake_oversampled)
        
        print(f"      Traditional training: {len(train_real):,} real + {len(train_fake_oversampled):,} fake")
        
        # Train and test traditional model
        vectorizer_trad = CountVectorizer(max_features=5000, ngram_range=(1, 2), stop_words='english')
        X_train_trad_vec = vectorizer_trad.fit_transform(X_train_traditional)
        X_test_trad_vec = vectorizer_trad.transform(X_test)
        
        classifier_trad = RandomForestClassifier(n_estimators=100, random_state=random_state, n_jobs=-1)
        classifier_trad.fit(X_train_trad_vec, y_train_traditional)
        y_pred_trad = classifier_trad.predict(X_test_trad_vec)
        
        trad_fake_f1 = f1_score(y_test, y_pred_trad, pos_label=1)
        trad_overall_f1 = f1_score(y_test, y_pred_trad, average='weighted')
        
        results['traditional'] = {
            'fake_f1': trad_fake_f1,
            'overall_f1': trad_overall_f1,
            'train_size': len(X_train_traditional)
        }
        
        print(f"      Traditional F1: {trad_fake_f1:.4f} (fake), {trad_overall_f1:.4f} (overall)")
    
    # Stylistic synthetic (applied to training set)
    # Use portion of synthetic data proportional to the imbalance
    synthetic_needed = min(train_imbalance, len(synthetic_tweets)) if train_imbalance > 0 else 0
    if synthetic_needed > 0:
        train_synthetic_sample = resample(synthetic_tweets, n_samples=synthetic_needed, random_state=random_state)
        
        # Prepare training data with synthetic
        train_fake_synthetic = train_fake + train_synthetic_sample
        X_train_synthetic = train_real + train_fake_synthetic  
        y_train_synthetic = [0] * len(train_real) + [1] * len(train_fake_synthetic)
        
        print(f"      Stylistic training: {len(train_real):,} real + {len(train_fake_synthetic):,} fake ({synthetic_needed:,} synthetic)")
        
        # Train and test stylistic model
        vectorizer_syn = CountVectorizer(max_features=5000, ngram_range=(1, 2), stop_words='english')
        X_train_syn_vec = vectorizer_syn.fit_transform(X_train_synthetic)
        X_test_syn_vec = vectorizer_syn.transform(X_test)
        
        classifier_syn = RandomForestClassifier(n_estimators=100, random_state=random_state, n_jobs=-1)
        classifier_syn.fit(X_train_syn_vec, y_train_synthetic)
        y_pred_syn = classifier_syn.predict(X_test_syn_vec)
        
        syn_fake_f1 = f1_score(y_test, y_pred_syn, pos_label=1)
        syn_overall_f1 = f1_score(y_test, y_pred_syn, average='weighted')
        
        results['stylistic'] = {
            'fake_f1': syn_fake_f1,
            'overall_f1': syn_overall_f1,
            'train_size': len(X_train_synthetic)
        }
        
        print(f"      Stylistic F1: {syn_fake_f1:.4f} (fake), {syn_overall_f1:.4f} (overall)")
        
        # Compare results
        if 'traditional' in results and 'stylistic' in results:
            improvement = syn_fake_f1 - trad_fake_f1
            print(f"      üîÑ Stylistic vs Traditional: {improvement:+.4f} F1 difference")
            if improvement > 0:
                print(f"      üéâ STYLISTIC WINS with corrected methodology!")
            else:
                print(f"      üìä Traditional still ahead by {abs(improvement):.4f}")
    
    return results

# Test corrected methodology on the three levels where traditional oversampling won

print("\\nüß™ TESTING CORRECTED METHODOLOGY ON KEY IMBALANCE LEVELS")
print("-" * 60)

# Test levels where we want to verify results
test_levels = {
    "9.4%": {"real": 21886, "fake": 18114, "description": "Moderate imbalance"},
    "25.1%": {"real": 9386, "fake": 5614, "description": "Severe imbalance"}, 
    "50.2%": {"real": 5614, "fake": 1842, "description": "Extreme imbalance"}
}

corrected_results = []

for level_name, level_config in test_levels.items():
    print(f"\\nüìä LEVEL {level_name}: {level_config['description']}")
    print("-" * 40)
    
    # Create datasets for this level
    level_real, level_fake = create_imbalanced_datasets(real_tweets, fake_tweets, level_config)
    
    # Run corrected experiment  
    results = run_corrected_experiment(
        real_data=level_real,
        fake_data=level_fake, 
        experiment_name=f"Level {level_name}",
        synthetic_tweets=synthetic_tweets
    )
    
    # Store results with metadata
    for method, metrics in results.items():
        corrected_results.append({
            'imbalance_level': level_name,
            'method': method,
            'fake_f1': metrics['fake_f1'],
            'overall_f1': metrics['overall_f1'],
            'train_size': metrics['train_size'],
            'methodology': 'corrected_split_first'
        })

print("\\nüéØ CORRECTED METHODOLOGY SUMMARY")
print("=" * 40)

# Analyze corrected results
corrected_df = pd.DataFrame(corrected_results)

for level in test_levels.keys():
    level_results = corrected_df[corrected_df['imbalance_level'] == level]
    
    if len(level_results) >= 2:
        traditional = level_results[level_results['method'] == 'traditional'].iloc[0]
        stylistic = level_results[level_results['method'] == 'stylistic'].iloc[0]
        
        improvement = stylistic['fake_f1'] - traditional['fake_f1']
        print(f"\\nüìà {level}:")
        print(f"   Traditional: {traditional['fake_f1']:.4f} F1")  
        print(f"   Stylistic:   {stylistic['fake_f1']:.4f} F1")
        print(f"   Difference:  {improvement:+.4f} F1")
        
        if improvement > 0:
            print(f"   üèÜ Stylistic WINS with corrected methodology!")
        else:
            print(f"   üìä Traditional still ahead")

# Overall conclusion
stylistic_wins = sum(1 for level in test_levels.keys() 
                    if len(corrected_df[corrected_df['imbalance_level'] == level]) >= 2 
                    and corrected_df[(corrected_df['imbalance_level'] == level) & 
                                   (corrected_df['method'] == 'stylistic')]['fake_f1'].iloc[0] > 
                       corrected_df[(corrected_df['imbalance_level'] == level) & 
                                   (corrected_df['method'] == 'traditional')]['fake_f1'].iloc[0])

print(f"\\nüéâ FINAL CORRECTED RESULTS:")
print(f"   Stylistic synthetic wins in {stylistic_wins}/{len(test_levels)} severe imbalance levels")
print(f"   Data leakage correction {'VALIDATES' if stylistic_wins > len(test_levels)//2 else 'shows mixed results for'} stylistic approach")

if stylistic_wins > len(test_levels)//2:
    print("\\nüöÄ BREAKTHROUGH: Corrected methodology shows stylistic synthetic data superiority!")
    print("   Previous results were contaminated by data leakage in traditional oversampling")
else:
    print("\\nüìä Corrected methodology confirms the need for careful experimental design")

print("\\n‚úÖ Corrected methodology analysis complete!")

üîç CORRECTED METHODOLOGY: AVOIDING DATA LEAKAGE
Testing traditional oversampling vs stylistic synthetic with proper train/test splitting
Issue: Previous experiments oversample BEFORE splitting, causing data leakage
Fix: Split FIRST, then oversample only the TRAINING set
\nüß™ TESTING CORRECTED METHODOLOGY ON KEY IMBALANCE LEVELS
------------------------------------------------------------
\nüìä LEVEL 9.4%: Moderate imbalance
----------------------------------------
\n   üìä Level 9.4%
      Original train set: 17,509 real, 14,491 fake
      Test set: 8,000 tweets (untouched)
      Traditional training: 17,509 real + 17,509 fake
      Traditional F1: 0.9353 (fake), 0.9411 (overall)
      Stylistic training: 17,509 real + 17,509 fake (3,018 synthetic)
      Stylistic F1: 0.9338 (fake), 0.9398 (overall)
      üîÑ Stylistic vs Traditional: -0.0015 F1 difference
      üìä Traditional still ahead by 0.0015
\nüìä LEVEL 25.1%: Severe imbalance
----------------------------------------
\

In [13]:
# Analyze results and create comprehensive comparison

# Convert results to DataFrame for analysis
results_df = pd.DataFrame(all_results)

print("üìä COMPREHENSIVE RESULTS ANALYSIS")
print("=" * 50)

# Overall best performing methods
print("\nüèÜ TOP PERFORMING METHODS (by Fake F1 Score):")
top_results = results_df.nlargest(6, 'fake_f1')
for idx, row in top_results.iterrows():
    print(f"   {row['fake_f1']:.4f} | {row['imbalance_level']} | {row['strategy_description']}")

# Performance by imbalance level
print("\nüìà BEST METHOD PER IMBALANCE LEVEL:")
for level in IMBALANCE_LEVELS.keys():
    level_results = results_df[results_df['imbalance_level'] == level]
    best = level_results.loc[level_results['fake_f1'].idxmax()]
    print(f"   {level}: {best['strategy_description']} (F1: {best['fake_f1']:.4f})")

# Stylistic synthetic performance analysis
print("\nüé® STYLISTIC SYNTHETIC PERFORMANCE:")
stylistic_results = results_df[results_df['sampling_strategy'].str.contains('stylistic')]
if not stylistic_results.empty:
    avg_performance = stylistic_results['fake_f1'].mean()
    best_stylistic = stylistic_results.loc[stylistic_results['fake_f1'].idxmax()]
    print(f"   Average F1: {avg_performance:.4f}")
    print(f"   Best: {best_stylistic['strategy_description']} at {best_stylistic['imbalance_level']} (F1: {best_stylistic['fake_f1']:.4f})")

# Traditional oversampling comparison
print("\nüîÑ STYLISTIC vs TRADITIONAL OVERSAMPLING:")
for level in IMBALANCE_LEVELS.keys():
    level_results = results_df[results_df['imbalance_level'] == level]
    
    traditional = level_results[level_results['sampling_strategy'] == 'random_oversampling']
    stylistic_100 = level_results[level_results['sampling_strategy'] == 'stylistic_100']
    
    if not traditional.empty and not stylistic_100.empty:
        trad_f1 = traditional.iloc[0]['fake_f1']
        styl_f1 = stylistic_100.iloc[0]['fake_f1']
        improvement = styl_f1 - trad_f1
        print(f"   {level}: Stylistic {improvement:+.4f} vs Traditional ({styl_f1:.4f} vs {trad_f1:.4f})")

print(f"\nüìã Results DataFrame Shape: {results_df.shape}")
print("\nReady for visualization and detailed analysis!")

üìä COMPREHENSIVE RESULTS ANALYSIS

üèÜ TOP PERFORMING METHODS (by Fake F1 Score):
   0.9733 | 2.8% | Undersampled majority
   0.9714 | 2.8% | Traditional oversampling
   0.9708 | 2.8% | 100% Stylistic synthetic
   0.9696 | 2.8% | 50% Stylistic synthetic
   0.9685 | 2.8% | 10% Stylistic synthetic
   0.9682 | 2.8% | Original unbalanced

üìà BEST METHOD PER IMBALANCE LEVEL:
   2.8%: Undersampled majority (F1: 0.9733)
   5.6%: Undersampled majority (F1: 0.9603)
   9.4%: Traditional oversampling (F1: 0.9540)
   25.1%: Traditional oversampling (F1: 0.9491)

üé® STYLISTIC SYNTHETIC PERFORMANCE:
   Average F1: 0.9366
   Best: 100% Stylistic synthetic at 2.8% (F1: 0.9708)

üîÑ STYLISTIC vs TRADITIONAL OVERSAMPLING:
   2.8%: Stylistic -0.0006 vs Traditional (0.9708 vs 0.9714)
   5.6%: Stylistic -0.0024 vs Traditional (0.9570 vs 0.9594)
   9.4%: Stylistic -0.0044 vs Traditional (0.9496 vs 0.9540)
   25.1%: Stylistic -0.0135 vs Traditional (0.9356 vs 0.9491)

üìã Results DataFrame Shape: (3

In [16]:
# Save trained classification models and vectorizers

import joblib
import os
from datetime import datetime

print("üíæ SAVING CLASSIFICATION MODELS")
print("=" * 40)

# Create models directory
models_dir = "saved_models"
if not os.path.exists(models_dir):
    os.makedirs(models_dir)
    print(f"üìÅ Created directory: {models_dir}")

timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')

def save_model_with_metadata(model, vectorizer, metadata, filename_prefix):
    """Save model, vectorizer, and metadata"""
    
    base_filename = f"{filename_prefix}_{timestamp}"
    
    # Save model
    model_path = os.path.join(models_dir, f"{base_filename}_model.joblib")
    joblib.dump(model, model_path)
    
    # Save vectorizer
    vectorizer_path = os.path.join(models_dir, f"{base_filename}_vectorizer.joblib")
    joblib.dump(vectorizer, vectorizer_path)
    
    # Save metadata
    metadata_path = os.path.join(models_dir, f"{base_filename}_metadata.json")
    with open(metadata_path, 'w') as f:
        import json
        json.dump(metadata, f, indent=2)
    
    print(f"‚úÖ Saved: {base_filename}")
    print(f"   Model: {model_path}")
    print(f"   Vectorizer: {vectorizer_path}") 
    print(f"   Metadata: {metadata_path}")
    
    return {
        'model_path': model_path,
        'vectorizer_path': vectorizer_path,
        'metadata_path': metadata_path
    }

# Enhanced experiment function that saves models
def run_and_save_experiment(real_data, fake_data, experiment_name, synthetic_tweets=None, save_models=True):
    """
    Run experiment and optionally save the best models
    """
    
    # Prepare data and split FIRST (corrected methodology)
    texts = real_data + fake_data
    labels = [0] * len(real_data) + [1] * len(fake_data)
    
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, random_state=42, stratify=labels
    )
    
    # Separate training data by class
    train_real = [text for text, label in zip(X_train, y_train) if label == 0]
    train_fake = [text for text, label in zip(X_train, y_train) if label == 1]
    
    models_saved = {}
    results = {}
    
    print(f"\\nüî¨ {experiment_name}")
    print(f"   Train: {len(train_real):,} real, {len(train_fake):,} fake | Test: {len(X_test):,}")
    
    # Traditional oversampling
    train_imbalance = len(train_real) - len(train_fake)
    if train_imbalance > 0:
        train_fake_oversampled = train_fake + list(resample(
            train_fake, n_samples=train_imbalance, random_state=42
        ))
        
        # Prepare and train traditional model
        X_train_trad = train_real + train_fake_oversampled
        y_train_trad = [0] * len(train_real) + [1] * len(train_fake_oversampled)
        
        vectorizer_trad = CountVectorizer(max_features=5000, ngram_range=(1, 2), stop_words='english')
        X_train_trad_vec = vectorizer_trad.fit_transform(X_train_trad)
        X_test_trad_vec = vectorizer_trad.transform(X_test)
        
        model_trad = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
        model_trad.fit(X_train_trad_vec, y_train_trad)
        y_pred_trad = model_trad.predict(X_test_trad_vec)
        
        trad_fake_f1 = f1_score(y_test, y_pred_trad, pos_label=1)
        results['traditional'] = trad_fake_f1
        
        print(f"   Traditional F1: {trad_fake_f1:.4f}")
        
        # Save traditional model
        if save_models:
            metadata_trad = {
                'experiment_name': f"{experiment_name}_traditional",
                'method': 'traditional_oversampling',
                'model_type': 'RandomForestClassifier',
                'vectorizer_type': 'CountVectorizer',
                'fake_f1_score': trad_fake_f1,
                'train_size': len(X_train_trad),
                'test_size': len(X_test),
                'real_count': len(real_data),
                'fake_count': len(fake_data),
                'train_real_count': len(train_real),
                'train_fake_original': len(train_fake),
                'train_fake_oversampled': len(train_fake_oversampled),
                'imbalance_gap': train_imbalance,
                'generation_timestamp': timestamp,
                'vectorizer_params': {
                    'max_features': 5000,
                    'ngram_range': (1, 2),
                    'stop_words': 'english'
                },
                'model_params': {
                    'n_estimators': 100,
                    'random_state': 42
                }
            }

            saved_paths = save_model_with_metadata(
                model_trad, vectorizer_trad, metadata_trad,
                f"{experiment_name.replace(' ', '_').replace(':', '').replace('%', 'pct')}_traditional"
            )
            models_saved['traditional'] = saved_paths

    # Stylistic synthetic
    if synthetic_tweets and train_imbalance > 0:
        synthetic_needed = min(train_imbalance, len(synthetic_tweets))
        train_synthetic_sample = resample(synthetic_tweets, n_samples=synthetic_needed, random_state=42)

        # Prepare and train stylistic model
        train_fake_synthetic = train_fake + train_synthetic_sample
        X_train_syn = train_real + train_fake_synthetic
        y_train_syn = [0] * len(train_real) + [1] * len(train_fake_synthetic)

        vectorizer_syn = CountVectorizer(max_features=5000, ngram_range=(1, 2), stop_words='english')
        X_train_syn_vec = vectorizer_syn.fit_transform(X_train_syn)
        X_test_syn_vec = vectorizer_syn.transform(X_test)

        model_syn = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
        model_syn.fit(X_train_syn_vec, y_train_syn)
        y_pred_syn = model_syn.predict(X_test_syn_vec)

        syn_fake_f1 = f1_score(y_test, y_pred_syn, pos_label=1)
        results['stylistic'] = syn_fake_f1

        print(f"   Stylistic F1: {syn_fake_f1:.4f}")

        # Save stylistic model
        if save_models:
            metadata_syn = {
                'experiment_name': f"{experiment_name}_stylistic",
                'method': 'stylistic_synthetic',
                'model_type': 'RandomForestClassifier',
                'vectorizer_type': 'CountVectorizer',
                'fake_f1_score': syn_fake_f1,
                'train_size': len(X_train_syn),
                'test_size': len(X_test),
                'real_count': len(real_data),
                'fake_count': len(fake_data),
                'train_real_count': len(train_real),
                'train_fake_original': len(train_fake),
                'train_synthetic_added': len(train_synthetic_sample),
                'synthetic_tweets_available': len(synthetic_tweets),
                'synthetic_generation_cost': 0.3261,
                'imbalance_gap': train_imbalance,
                'generation_timestamp': timestamp,
                'vectorizer_params': {
                    'max_features': 5000,
                    'ngram_range': (1, 2),
                    'stop_words': 'english'
                },
                'model_params': {
                    'n_estimators': 100,
                    'random_state': 42
                },
                'synthetic_data_features': {
                    'avg_word_count': synthetic_df['word_count'].mean() if 'synthetic_df' in globals() else None,
                    'avg_exclamation_count': synthetic_df['exclamation_count'].mean() if 'synthetic_df' in globals() else None,
                    'topics': synthetic_df['topic'].value_counts().to_dict() if 'synthetic_df' in globals() else None
                }
            }
            
            saved_paths = save_model_with_metadata(
                model_syn, vectorizer_syn, metadata_syn,
                f"{experiment_name.replace(' ', '_').replace(':', '').replace('%', 'pct')}_stylistic"
            )
            models_saved['stylistic'] = saved_paths
        
        # Compare results
        if 'traditional' in results:
            improvement = syn_fake_f1 - results['traditional']
            print(f"   Improvement: {improvement:+.4f} ({'STYLISTIC WINS' if improvement > 0 else 'Traditional wins'})")
    
    return results, models_saved

# Save models for the best performing configurations
print("\\nüéØ SAVING MODELS FOR KEY EXPERIMENTS")
print("-" * 50)

# Best overall performer from your previous results
print("\\n1. BASELINE LEVEL (2.8% imbalance):")
baseline_real, baseline_fake = create_imbalanced_datasets(real_tweets, fake_tweets, 
                                                         {"real": 68985, "fake": 65213})
baseline_results, baseline_models = run_and_save_experiment(
    baseline_real, baseline_fake, "Baseline 2.8%", synthetic_tweets
)

# Severe imbalance where stylistic might win
print("\\n2. SEVERE IMBALANCE (25.1%):")
severe_real, severe_fake = create_imbalanced_datasets(real_tweets, fake_tweets,
                                                     {"real": 9386, "fake": 5614})
severe_results, severe_models = run_and_save_experiment(
    severe_real, severe_fake, "Severe 25.1%", synthetic_tweets
)

# Extreme imbalance
print("\\n3. EXTREME IMBALANCE (50.2%):")
extreme_real, extreme_fake = create_imbalanced_datasets(real_tweets, fake_tweets,
                                                       {"real": 5614, "fake": 1842})
extreme_results, extreme_models = run_and_save_experiment(
    extreme_real, extreme_fake, "Extreme 50.2%", synthetic_tweets
)

# Create model inventory
print("\\nüìã MODEL INVENTORY")
print("-" * 30)

all_saved_models = []
for exp_name, models_dict in [("Baseline", baseline_models), ("Severe", severe_models), ("Extreme", extreme_models)]:
    for method, paths in models_dict.items():
        all_saved_models.append({
            'experiment': exp_name,
            'method': method,
            'model_path': paths['model_path'],
            'vectorizer_path': paths['vectorizer_path'],
            'metadata_path': paths['metadata_path']
        })

model_inventory_df = pd.DataFrame(all_saved_models)
inventory_file = f"model_inventory_{timestamp}.csv"
model_inventory_df.to_csv(inventory_file, index=False)

print(f"‚úÖ Model inventory saved: {inventory_file}")
print(f"üìÅ All models saved in: {models_dir}/")
print(f"üî¢ Total models saved: {len(all_saved_models)}")

# Show how to load models
print("\\nüîÑ HOW TO LOAD SAVED MODELS:")
print('''
# Example: Load a model and make predictions
import joblib
import json

# Load model and vectorizer
model = joblib.load('saved_models/Baseline_2_8pct_stylistic_20250818_123456_model.joblib')
vectorizer = joblib.load('saved_models/Baseline_2_8pct_stylistic_20250818_123456_vectorizer.joblib')

# Load metadata
with open('saved_models/Baseline_2_8pct_stylistic_20250818_123456_metadata.json', 'r') as f:
    metadata = json.load(f)
    
print(f"Model F1 Score: {metadata['fake_f1_score']}")

# Make predictions on new tweets
new_tweets = ["Your tweet text here"]
new_tweets_vectorized = vectorizer.transform(new_tweets)
predictions = model.predict(new_tweets_vectorized)  # 0=real, 1=fake
''')

print("\\nüíæ Model saving complete!")

üíæ SAVING CLASSIFICATION MODELS
üìÅ Created directory: saved_models
\nüéØ SAVING MODELS FOR KEY EXPERIMENTS
--------------------------------------------------
\n1. BASELINE LEVEL (2.8% imbalance):
\nüî¨ Baseline 2.8%
   Train: 55,188 real, 52,170 fake | Test: 26,840
   Traditional F1: 0.9683
‚úÖ Saved: Baseline_2.8pct_traditional_20250818_191649
   Model: saved_models/Baseline_2.8pct_traditional_20250818_191649_model.joblib
   Vectorizer: saved_models/Baseline_2.8pct_traditional_20250818_191649_vectorizer.joblib
   Metadata: saved_models/Baseline_2.8pct_traditional_20250818_191649_metadata.json
   Stylistic F1: 0.9683
‚úÖ Saved: Baseline_2.8pct_stylistic_20250818_191649
   Model: saved_models/Baseline_2.8pct_stylistic_20250818_191649_model.joblib
   Vectorizer: saved_models/Baseline_2.8pct_stylistic_20250818_191649_vectorizer.joblib
   Metadata: saved_models/Baseline_2.8pct_stylistic_20250818_191649_metadata.json
   Improvement: +0.0000 (Traditional wins)
\n2. SEVERE IMBALANCE (25

In [18]:
# Save missing 9.4% imbalance level models

print("üîß SAVING MISSING 9.4% IMBALANCE MODELS")
print("=" * 50)
print("Adding the missing moderate imbalance level models to our saved collection...")

# Save 9.4% imbalance models
print("\n4. MODERATE IMBALANCE (9.4%):")
moderate_real, moderate_fake = create_imbalanced_datasets(real_tweets, fake_tweets,
                                                         {"real": 21886, "fake": 18114})
moderate_results, moderate_models = run_and_save_experiment(
    moderate_real, moderate_fake, "Moderate 9.4%", synthetic_tweets
)

# Update model inventory with the new models
print("\nüìã UPDATING MODEL INVENTORY")
print("-" * 30)

# Load existing inventory
existing_inventory = pd.read_csv(f"model_inventory_{timestamp}.csv")

# Add new models to inventory
new_models = []
for method, paths in moderate_models.items():
    new_models.append({
        'experiment': "Moderate",
        'method': method,
        'model_path': paths['model_path'],
        'vectorizer_path': paths['vectorizer_path'],
        'metadata_path': paths['metadata_path']
    })

new_models_df = pd.DataFrame(new_models)
updated_inventory = pd.concat([existing_inventory, new_models_df], ignore_index=True)

# Save updated inventory
updated_inventory_file = f"model_inventory_updated_{timestamp}.csv"
updated_inventory.to_csv(updated_inventory_file, index=False)

print(f"‚úÖ Updated model inventory saved: {updated_inventory_file}")
print(f"üî¢ Total models now saved: {len(updated_inventory)}")

# Show complete model collection
print(f"\nüìä COMPLETE MODEL COLLECTION:")
for idx, row in updated_inventory.iterrows():
    print(f"   {idx+1}. {row['experiment']} - {row['method']}")

# Verify we have all imbalance levels - FIXED VERSION
imbalance_levels = updated_inventory['experiment'].value_counts()
print(f"\n‚úÖ VERIFICATION - Models per imbalance level:")
for level, count in imbalance_levels.items():
    print(f"   {level}: {count} models ({'‚úÖ' if count == 2 else '‚ùå'})")

# Fixed: Access .values directly (not as function)
level_counts = list(imbalance_levels.values)
if len(imbalance_levels) == 4 and all(count == 2 for count in level_counts):
    print(f"\nüéâ SUCCESS: All 4 imbalance levels now have 2 models each (traditional + stylistic)")
    print(f"   Total: {len(updated_inventory)} models saved")
else:
    print(f"\n‚ö†Ô∏è Still missing some models - please check the collection")
    print(f"   Found {len(imbalance_levels)} imbalance levels")
    print(f"   Model counts: {level_counts}")

print(f"\nüíæ Model saving update complete!")

üîß SAVING MISSING 9.4% IMBALANCE MODELS
Adding the missing moderate imbalance level models to our saved collection...

4. MODERATE IMBALANCE (9.4%):
\nüî¨ Moderate 9.4%
   Train: 17,509 real, 14,491 fake | Test: 8,000
   Traditional F1: 0.9353
‚úÖ Saved: Moderate_9.4pct_traditional_20250818_191649
   Model: saved_models/Moderate_9.4pct_traditional_20250818_191649_model.joblib
   Vectorizer: saved_models/Moderate_9.4pct_traditional_20250818_191649_vectorizer.joblib
   Metadata: saved_models/Moderate_9.4pct_traditional_20250818_191649_metadata.json
   Stylistic F1: 0.9338
‚úÖ Saved: Moderate_9.4pct_stylistic_20250818_191649
   Model: saved_models/Moderate_9.4pct_stylistic_20250818_191649_model.joblib
   Vectorizer: saved_models/Moderate_9.4pct_stylistic_20250818_191649_vectorizer.joblib
   Metadata: saved_models/Moderate_9.4pct_stylistic_20250818_191649_metadata.json
   Improvement: -0.0015 (Traditional wins)

üìã UPDATING MODEL INVENTORY
------------------------------
‚úÖ Updated mo