# Semantic Analysis Pipeline for LLM-Generated Texts

A complete pipeline for analyzing the semantic structure of texts generated by local LLMs through LM Studio. It covers text generation, preprocessing, network construction, network analysis, and visualization.

**Key Features:** BERTScore semantic similarity, network analysis, bootstrap statistics, topic modeling.

In [5]:
# Import modules
from llm_generator import LLMGenerator
from text_preprocessor import TextPreprocessor
from network_builder import NetworkBuilder
from network_analyzer import NetworkAnalyzer
from statistical_tests import StatisticalAnalyzer
from visualizer import NetworkVisualizer

import pandas as pd
import numpy as np
import os
import glob
import json
from pathlib import Path
import warnings
warnings.filterwarnings("ignore")

print("All modules imported successfully")

ModuleNotFoundError: No module named 'emoatlas'

## Configuration

Load centralized configuration from `config.json`:

In [None]:
# Load configuration from config.json
with open('config.json', 'r') as f:
    CONFIG = json.load(f)

# Extract key settings
TEMPERATURES = CONFIG['generation']['temperatures']
PROMPTS = CONFIG['prompts']
DIRS = CONFIG['directories']
DIR_PREFIX = CONFIG['generation']['dir_prefix']

# Analysis configuration
ANALYSIS_CONFIG = {
    'n_completions': CONFIG['generation']['n_completions'],
    'n_bootstrap': CONFIG['analysis']['n_bootstrap'],
    'bootstrap_ci': CONFIG['analysis']['bootstrap_ci'],
    'style': CONFIG['analysis']['style'],
    'figsize': tuple(CONFIG['analysis']['figsize'])
}

# Create directories
for dir_path in DIRS.values():
    os.makedirs(dir_path, exist_ok=True)

print(f"✓ Configuration loaded from config.json")
print(f"  Temperatures: {len(TEMPERATURES)} levels")
print(f"  Prompts: {len(PROMPTS)} types")
print(f"  Completions per condition: {ANALYSIS_CONFIG['n_completions']}")

Enhanced configuration completed successfully!
Temperature values: [0.001, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5]
Prompt types: ['complex', 'vague']
Directory structure: {'texts': 'texts', 'results': 'results', 'figures': 'figures', 'visualizations': 'visualizations', 'bootstrap': 'bootstrap_results', 'semantic': 'semantic_analysis'}
Enhanced analysis parameters:
  - Text completions: 100
  - Bootstrap samples: 1000
  - Semantic models: ['bertscore', 'fasttext', 'sentence_transformers']
  - BERTScore model: distilbert-base-uncased
  - Similarity thresholds: [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
  - Primary threshold: 0.3
  - Temperature correlation methods: ['pearson', 'spearman', 'kendall']
  - Community detection: ['louvain', 'leiden']
  - Effect size measures: ['cohens_d', 'hedges_g', 'eta_squared']
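
Because every later cell indexes into `CONFIG` directly, a missing key only surfaces as a `KeyError` partway through a run. The sketch below checks for the keys this notebook reads before anything else executes. The helper `missing_config_keys` and the key list are assumptions derived from the lookups in this notebook, not a documented schema for `config.json`, and the example values are illustrative only.

```python
# Hypothetical helper: the key list mirrors the lookups made in this notebook,
# not an official schema for config.json.
REQUIRED_KEYS = {
    "generation": ["temperatures", "n_completions", "dir_prefix"],
    "analysis": ["n_bootstrap", "bootstrap_ci", "style", "figsize"],
    "lm_studio": ["model_name", "url"],
    "preprocessing": ["lang_model"],
}
REQUIRED_SECTIONS = ["prompts", "directories"]


def missing_config_keys(config):
    """Return dotted paths of keys the notebook reads but `config` lacks."""
    missing = [s for s in REQUIRED_SECTIONS if s not in config]
    for section, keys in REQUIRED_KEYS.items():
        if section not in config:
            missing.append(section)
            continue
        missing.extend(f"{section}.{k}" for k in keys if k not in config[section])
    return missing


# Illustrative values only; the real settings live in config.json.
example = {
    "generation": {"temperatures": [0.5, 1.0], "n_completions": 100},
    "prompts": {"complex": "<prompt text>", "vague": "<prompt text>"},
    "directories": {"texts": "texts"},
    "analysis": {"n_bootstrap": 1000, "bootstrap_ci": 0.95,
                 "style": "default", "figsize": [10, 6]},
    "lm_studio": {"model_name": "local-model", "url": "http://localhost:1234/v1"},
}
print(missing_config_keys(example))
```

An empty list means every key used in this notebook is present; anything else can be fixed in `config.json` before starting a long generation run.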


## 1. Text Generation

Generate texts using the LM Studio API:

In [None]:
# Initialize generator
generator = LLMGenerator(
    model=CONFIG['lm_studio']['model_name'],
    base_url=CONFIG['lm_studio']['url']
)

# Generate texts
generator.generate_texts(
    prompts=PROMPTS,
    temperatures=TEMPERATURES,
    n_completions=ANALYSIS_CONFIG['n_completions'],
    texts_dir=DIRS['texts'],
    dir_prefix=DIR_PREFIX
)

# Check results
for prompt_type in PROMPTS.keys():
    dir_path = os.path.join(DIRS['texts'], f"{DIR_PREFIX}_{prompt_type}")
    if os.path.exists(dir_path):
        count = len(glob.glob(os.path.join(dir_path, '*.txt')))
        print(f"✓ {prompt_type.capitalize()}: {count} files")

Starting text generation...
Generating 100 completions per condition...
Total conditions: 2 prompts × 7 temperatures = 14
Starting generation of 14 text sets...


Processing vague T=1.5: 100%|██████████| 14/14 [12:50<00:00, 55.04s/it]   


Generation completed!
  - Processed prompts: 2
  - Temperatures per prompt: 7
  - Completions per temperature: 100
  - Total texts generated: 1400
Text generation completed!
Text generation failed. Check your API key and try again.
No existing generated texts found. You may need to run the text generation step.
Make sure you have a valid Mistral API key and uncomment the code above.
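
The generation log reports 2 prompts × 7 temperatures = 14 conditions and 1400 texts in total. The sketch below reproduces that arithmetic by enumerating the condition grid. The helper `build_conditions` is a hypothetical name used for illustration; how `LLMGenerator` iterates internally is not shown in this notebook.

```python
from itertools import product

# Values taken from the configuration output earlier in this notebook.
temperatures = [0.001, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5]
prompt_types = ["complex", "vague"]
n_completions = 100


def build_conditions(prompt_types, temperatures):
    """Return one (prompt_type, temperature) tuple per experimental condition."""
    return list(product(prompt_types, temperatures))


conditions = build_conditions(prompt_types, temperatures)
total_texts = len(conditions) * n_completions
print(f"{len(conditions)} conditions, {total_texts} texts")
```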





## 2. Text Preprocessing

Clean texts using spaCy:

In [None]:
# Initialize preprocessor
preprocessor = TextPreprocessor(lang_model=CONFIG['preprocessing']['lang_model'])

# Process texts, one source directory per prompt type
for prompt_type in PROMPTS.keys():
    source_dir = os.path.join(DIRS['texts'], f"{DIR_PREFIX}_{prompt_type}")
    if not os.path.exists(source_dir):
        continue
    target_dir = os.path.join(DIRS['texts'], f"cleaned_{DIR_PREFIX}_{prompt_type}")
    os.makedirs(target_dir, exist_ok=True)

    for text_file in glob.glob(os.path.join(source_dir, '*.txt')):
        # Strip only the trailing extension, not every '.txt' in the name
        filename = os.path.splitext(os.path.basename(text_file))[0]
        output_file = os.path.join(target_dir, f"{filename}_cleaned.txt")
        preprocessor.clean_single_file(text_file, output_file)

# Check results
for prompt_type in PROMPTS.keys():
    dir_path = os.path.join(DIRS['texts'], f"cleaned_{DIR_PREFIX}_{prompt_type}")
    if os.path.exists(dir_path):
        count = len(glob.glob(os.path.join(dir_path, '*.txt')))
        print(f"✓ {prompt_type.capitalize()} cleaned: {count} files")

Initializing text preprocessor...
Starting text preprocessing...
Processing complex prompts...
7 files processed for complex prompts
Processing vague prompts...
7 files processed for vague prompts
Text preprocessing completed! Total files processed: 14
  - Complex cleaned files: 7
  - Vague cleaned files: 7


## 3. Network Construction

Build semantic networks using EmoAtlas:

In [None]:
# Initialize network builder
network_builder = NetworkBuilder()

# Build networks
source_dirs = [os.path.join(DIRS['texts'], f"{DIR_PREFIX}_{pt}") for pt in PROMPTS.keys()]
network_builder.build_networks_from_texts(
    texts_dir=DIRS['texts'],
    source_dirs=source_dirs,
    dir_prefix=DIR_PREFIX
)

# Check results
for prompt_type in PROMPTS.keys():
    edge_dir = f'emo_edges_{prompt_type}'
    if os.path.exists(edge_dir):
        count = len(glob.glob(os.path.join(edge_dir, '*.txt')))
        print(f"✓ {prompt_type.capitalize()} networks: {count} files")

Initializing network builder...
Building semantic networks from preprocessed texts...
Found 14 files to process with EmoAtlas...


Processing vague_1.0.txt: 100%|██████████| 14/14 [1:04:37<00:00, 277.00s/it] 


EmoAtlas network creation completed!
  - Edge lists saved in: emo_edges_complex/ and emo_edges_vague/
Semantic network construction completed!
  - Complex networks: 7 files created
  - Vague networks: 7 files created

Edge list files contain network connections:
  - Complex example: emo_edge_list_complex_0.25.txt
  - Vague example: emo_edge_list_vague_1.0.txt





## 4. Network Analysis

Calculate network metrics:

In [None]:
# Initialize analyzer
network_analyzer = NetworkAnalyzer()

# Analyze networks
results_file = os.path.join(DIRS['results'], 'network_metrics.csv')
if not os.path.exists(results_file):
    all_results = []
    for prompt_type in PROMPTS.keys():
        edge_dir = f'emo_edges_{prompt_type}'
        if os.path.exists(edge_dir):
            for edge_file in glob.glob(os.path.join(edge_dir, '*.txt')):
                result = network_analyzer.analyze_edge_list(edge_file)
                if result:
                    all_results.append(result)
    
    df = pd.DataFrame(all_results)
    df.to_csv(results_file, index=False)
    print(f"✓ Analyzed {len(df)} networks")
else:
    df = pd.read_csv(results_file)
    print(f"✓ Loaded {len(df)} network metrics from cache")

print(f"  Metrics: {len(df.columns) - 3} per network")

Initializing network analyzer...
Network analysis already completed!
  - Total networks analyzed: 14
  - Prompt types: ['complex' 'vague']
  - Temperature values: [0.001, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5]
  - Metrics calculated: 11 metrics per network

Sample of network metrics:
  prompt_type  temperature   density
0     complex         0.25  0.021817
1     complex         0.50  0.020250
2     complex         1.25  0.018980
3     complex         1.50  0.018121
4     complex         0.75  0.020179
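
The `density` column above is the fraction of possible node pairs that are actually connected. As a rough illustration, density for a simple undirected graph can be computed from an edge list as 2m / (n(n - 1)), with m edges and n nodes. The function `graph_density` is a hypothetical stand-in: whether `NetworkAnalyzer` uses exactly this definition, and how it parses the edge list files, is not shown here.

```python
def graph_density(edges):
    """Density of a simple undirected graph: 2m / (n * (n - 1)).

    Nodes are inferred from the edge list, so isolated nodes are not counted.
    """
    nodes = set()
    unique_edges = set()
    for u, v in edges:
        if u == v:
            continue  # skip self-loops entirely
        nodes.update((u, v))
        unique_edges.add(frozenset((u, v)))  # (a, b) and (b, a) are one edge
    n, m = len(nodes), len(unique_edges)
    return 0.0 if n < 2 else 2 * m / (n * (n - 1))


# Toy path graph a-b-c-d: 3 of the 6 possible edges are present.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "a")]
print(graph_density(toy_edges))
```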


## 5. Semantic Analysis (BERTScore)

Advanced semantic similarity analysis:
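
The cell below scores every unordered pair of texts, so the number of BERTScore calls grows quadratically with the number of texts. The sketch here enumerates those pairs from metadata alone, without running any model. It assumes one text file per condition (14 texts), which is what the preprocessing output suggests. The helper `pair_features` is a hypothetical name.

```python
from itertools import combinations

temperatures = [0.001, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5]
metadata = [
    {"prompt_type": p, "temperature": t}
    for p in ("complex", "vague")
    for t in temperatures
]


def pair_features(metadata):
    """Yield index pairs (i < j) with the features derived from metadata alone."""
    for (i, a), (j, b) in combinations(enumerate(metadata), 2):
        yield {
            "text1_idx": i,
            "text2_idx": j,
            "same_prompt": a["prompt_type"] == b["prompt_type"],
            "temp_diff": abs(a["temperature"] - b["temperature"]),
        }


pairs = list(pair_features(metadata))
n_same = sum(p["same_prompt"] for p in pairs)
print(len(pairs), n_same, len(pairs) - n_same)
```

Under that assumption there are 91 pairs, of which 42 are within-prompt and 49 cross-prompt, so the within-prompt statistics for each prompt type rest on 21 pairs.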

In [5]:
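# Enhanced semantic analysis with BERTScore
# Imports used by the functions in this cell
import matplotlib.pyplot as plt
import networkx as nx
import seaborn as sns
from scipy import stats
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import TfidfVectorizer

# Output directory for semantic analysis results
semantic_results_dir = DIRS['semantic']
os.makedirs(semantic_results_dir, exist_ok=True)

print("ADVANCED SEMANTIC ANALYSIS WITH BERTSCORE")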

# Configure enhanced analysis parameters
ANALYSIS_CONFIG.update({
    'similarity_thresholds': [0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.60, 0.70],  # More granular thresholds
    'temperature_analysis': {
        'min_pairs_per_temp': 1,  # Minimum pairs needed per temperature
        'temperature_bins': 5,    # Number of bins for temperature difference analysis
        'correlation_methods': ['pearson', 'spearman', 'kendall']
    },
    'n_topics': 10,
    'alpha': 0.05
})

def load_texts_for_semantic_analysis():
    """Load all texts for semantic analysis"""
    texts = []
    metadata = []
    
    # Load all text files for every configured prompt type
    for prompt_type in PROMPTS.keys():
        for temp in TEMPERATURES:
            # Use DIR_PREFIX so the path matches the directories written during generation
            file_path = os.path.join(DIRS['texts'], f'{DIR_PREFIX}_{prompt_type}', f'{prompt_type}_{temp}.txt')
            if os.path.exists(file_path):
                with open(file_path, 'r', encoding='utf-8') as f:
                    text = f.read().strip()
                    texts.append(text)
                    metadata.append({
                        'prompt_type': prompt_type,
                        'temperature': temp,
                        'filename': f'{prompt_type}_{temp}.txt',
                        'text_idx': len(texts) - 1
                    })
    
    return texts, metadata

def compute_bertscore_similarities(texts, metadata):
    """Compute BERTScore similarities between all pairs of texts"""
    print("Computing BERTScore similarities...")
    
    # Load the scoring model once; bert_score.score() reloads it on every call
    from bert_score import BERTScorer
    scorer = BERTScorer(lang='en')
    
    bertscore_results = []
    
    # Compute pairwise similarities
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            # Compute BERTScore (candidate = text i, reference = text j)
            P, R, F1 = scorer.score([texts[i]], [texts[j]], verbose=False)
            
            # Extract metadata
            meta1, meta2 = metadata[i], metadata[j]
            
            # Create result record with enhanced features
            result = {
                'text1_idx': i,
                'text2_idx': j,
                'text1_prompt': meta1['prompt_type'],
                'text2_prompt': meta2['prompt_type'],
                'text1_temp': meta1['temperature'],
                'text2_temp': meta2['temperature'],
                'text1_filename': meta1['filename'],
                'text2_filename': meta2['filename'],
                'precision': float(P.item()),  # Convert to native Python float
                'recall': float(R.item()),     # Convert to native Python float
                'f1_score': float(F1.item()),  # Convert to native Python float
                'same_prompt': meta1['prompt_type'] == meta2['prompt_type'],
                'same_temp': meta1['temperature'] == meta2['temperature'],
                'temp_diff': abs(meta1['temperature'] - meta2['temperature'])
            }
            
            bertscore_results.append(result)
    
    return pd.DataFrame(bertscore_results)

def analyze_semantic_coherence_enhanced(bertscore_df):
    """Enhanced semantic coherence analysis with multiple perspectives"""
    print("Analyzing semantic coherence patterns...")
    
    coherence_results = {}
    
    # 1. Coherence within same prompt type
    same_prompt_coherence = bertscore_df[
        bertscore_df['text1_prompt'] == bertscore_df['text2_prompt']
    ].groupby(['text1_prompt'])['f1_score'].agg(['mean', 'std', 'count', 'median', 'min', 'max'])
    
    coherence_results['same_prompt'] = same_prompt_coherence
    
    # 2. Coherence between different prompt types
    cross_prompt_coherence = bertscore_df[
        bertscore_df['text1_prompt'] != bertscore_df['text2_prompt']
    ]['f1_score'].agg(['mean', 'std', 'count', 'median', 'min', 'max'])
    
    coherence_results['cross_prompt'] = cross_prompt_coherence
    
    # 3. Enhanced temperature-based coherence analysis
    temp_coherence = []
    
    # Same temperature pairs
    for temp in TEMPERATURES:
        temp_data = bertscore_df[
            (bertscore_df['text1_temp'] == temp) & 
            (bertscore_df['text2_temp'] == temp)
        ]
        if len(temp_data) >= ANALYSIS_CONFIG['temperature_analysis']['min_pairs_per_temp']:
            temp_coherence.append({
                'temperature': float(temp),  # Convert to native Python float
                'analysis_type': 'same_temperature',
                'mean_f1': float(temp_data['f1_score'].mean()),
                'std_f1': float(temp_data['f1_score'].std()),
                'median_f1': float(temp_data['f1_score'].median()),
                'count': int(len(temp_data)),  # Convert to native Python int
                'q25': float(temp_data['f1_score'].quantile(0.25)),
                'q75': float(temp_data['f1_score'].quantile(0.75))
            })
    
    # Temperature difference analysis
    temp_diff_bins = pd.cut(bertscore_df['temp_diff'], 
                           bins=ANALYSIS_CONFIG['temperature_analysis']['temperature_bins'],
                           include_lowest=True)
    
    temp_diff_analysis = bertscore_df.groupby(temp_diff_bins)['f1_score'].agg([
        'mean', 'std', 'count', 'median'
    ]).reset_index()
    temp_diff_analysis['temp_diff_range'] = temp_diff_analysis['temp_diff'].astype(str)
    
    coherence_results['temperature'] = pd.DataFrame(temp_coherence)
    coherence_results['temperature_difference'] = temp_diff_analysis
    
    # 4. Correlation analysis between temperature and semantic similarity
    correlation_results = {}
    for method in ANALYSIS_CONFIG['temperature_analysis']['correlation_methods']:
        if method == 'pearson':
            corr_coeff, p_value = stats.pearsonr(bertscore_df['temp_diff'], bertscore_df['f1_score'])
        elif method == 'spearman':
            corr_coeff, p_value = stats.spearmanr(bertscore_df['temp_diff'], bertscore_df['f1_score'])
        elif method == 'kendall':
            corr_coeff, p_value = stats.kendalltau(bertscore_df['temp_diff'], bertscore_df['f1_score'])
        
        correlation_results[method] = {
            'correlation': float(corr_coeff),  # Convert to native Python float
            'p_value': float(p_value),         # Convert to native Python float
            'significant': bool(p_value < ANALYSIS_CONFIG['alpha'])  # Convert to native Python bool
        }
    
    coherence_results['temperature_correlations'] = correlation_results
    
    return coherence_results

def perform_topic_modeling(texts, metadata):
    """Perform enhanced topic modeling analysis"""
    print("Performing topic modeling analysis...")
    
    # Vectorize texts with enhanced parameters
    vectorizer = TfidfVectorizer(
        max_features=1000,
        stop_words='english',
        ngram_range=(1, 2),
        min_df=2,
        max_df=0.8,
        lowercase=True,
        strip_accents='unicode'
    )
    
    text_vectors = vectorizer.fit_transform(texts)
    
    # Apply LDA with enhanced parameters
    lda = LatentDirichletAllocation(
        n_components=ANALYSIS_CONFIG['n_topics'],
        random_state=42,
        max_iter=100,
        learning_method='batch',
        evaluate_every=10,
        perp_tol=0.1
    )
    
    doc_topics = lda.fit_transform(text_vectors)
    
    # Create enhanced topic modeling results
    topic_results = []
    for i, doc_topic in enumerate(doc_topics):
        dominant_topic = np.argmax(doc_topic)
        topic_entropy = -np.sum(doc_topic * np.log(doc_topic + 1e-10))  # Topic entropy
        
        topic_results.append({
            'text_idx': int(i),  # Convert to native Python int
            'prompt_type': metadata[i]['prompt_type'],
            'temperature': float(metadata[i]['temperature']),  # Convert to native Python float
            'filename': metadata[i]['filename'],
            'dominant_topic': int(dominant_topic),  # Convert to native Python int
            'topic_probability': float(doc_topic[dominant_topic]),  # Convert to native Python float
            'topic_entropy': float(topic_entropy),  # Convert to native Python float
            'topic_distribution': [float(x) for x in doc_topic.tolist()]  # Convert all to native Python floats
        })
    
    # Get enhanced topic words with weights
    feature_names = vectorizer.get_feature_names_out()
    topic_words = []
    for topic_idx, topic in enumerate(lda.components_):
        top_word_indices = topic.argsort()[-10:][::-1]
        top_words = [feature_names[i] for i in top_word_indices]
        word_weights = topic[top_word_indices].tolist()
        
        topic_words.append({
            'topic_id': int(topic_idx),  # Convert to native Python int
            'top_words': top_words,
            'word_weights': [float(w) for w in word_weights],  # Convert to native Python floats
            'total_weight': float(np.sum(word_weights))  # Convert to native Python float
        })
    
    # Topic distribution analysis by prompt and temperature
    topic_df = pd.DataFrame(topic_results)
    topic_distribution = topic_df.groupby(['prompt_type', 'temperature', 'dominant_topic']).size().reset_index(name='count')
    topic_diversity = topic_df.groupby(['prompt_type', 'temperature'])['topic_entropy'].agg(['mean', 'std']).reset_index()
    
    return topic_df, topic_words, topic_distribution, topic_diversity

def analyze_network_at_multiple_thresholds(bertscore_df, thresholds):
    """Analyze semantic similarity networks at multiple thresholds"""
    print("Analyzing networks at multiple similarity thresholds...")
    
    threshold_results = []
    detailed_networks = {}
    
    for threshold in thresholds:
        network, metrics = create_semantic_similarity_network(bertscore_df, threshold)
        
        # Enhanced metrics calculation with type conversion
        enhanced_metrics = {}
        for key, value in metrics.items():
            if isinstance(value, (np.integer, np.floating)):
                enhanced_metrics[key] = float(value) if isinstance(value, np.floating) else int(value)
            else:
                enhanced_metrics[key] = value
        
        if metrics['n_edges'] > 0:
            # Calculate additional network properties
            try:
                degrees = [d for n, d in network.degree()]
                enhanced_metrics['avg_degree'] = float(np.mean(degrees))
                enhanced_metrics['max_degree'] = int(max(degrees))
                
                # Small world properties
                if nx.is_connected(network):
                    enhanced_metrics['diameter'] = int(nx.diameter(network))
                    enhanced_metrics['radius'] = int(nx.radius(network))
                    enhanced_metrics['average_shortest_path'] = float(nx.average_shortest_path_length(network))
                
                # Community detection (if available)
                try:
                    import community as community_louvain
                    partition = community_louvain.best_partition(network)
                    enhanced_metrics['n_communities'] = int(len(set(partition.values())))
                    enhanced_metrics['modularity'] = float(community_louvain.modularity(partition, network))
                except ImportError:
                    enhanced_metrics['n_communities'] = None
                    enhanced_metrics['modularity'] = None
                    
            except Exception as e:
                print(f"Error calculating enhanced metrics for threshold {threshold}: {e}")
        
        threshold_results.append({
            'threshold': float(threshold),  # Convert to native Python float
            **enhanced_metrics
        })
        
        detailed_networks[threshold] = network
        
        print(f"Threshold {threshold}: {metrics['n_edges']} edges, density {metrics['density']:.3f}")
    
    return pd.DataFrame(threshold_results), detailed_networks

def create_semantic_similarity_network(bertscore_df, threshold=0.7):
    """Create enhanced semantic similarity network based on BERTScore"""
    
    # Create network
    G = nx.Graph()
    
    # Add nodes (text indices) with metadata
    unique_texts = set(bertscore_df['text1_idx'].unique()) | set(bertscore_df['text2_idx'].unique())
    G.add_nodes_from(unique_texts)
    
    # Add edges based on similarity threshold with enhanced attributes
    for _, row in bertscore_df.iterrows():
        if row['f1_score'] >= threshold:
            G.add_edge(
                row['text1_idx'], 
                row['text2_idx'], 
                weight=float(row['f1_score']),
                precision=float(row['precision']),
                recall=float(row['recall']),
                temp_diff=float(row['temp_diff']),
                same_prompt=bool(row['same_prompt']),
                same_temp=bool(row['same_temp'])
            )
    
    # Compute enhanced network metrics
    network_metrics = {
        'n_nodes': int(G.number_of_nodes()),
        'n_edges': int(G.number_of_edges()),
        'density': float(nx.density(G) if G.number_of_nodes() > 1 else 0),
        'avg_clustering': float(nx.average_clustering(G)),
        'n_components': int(nx.number_connected_components(G))
    }
    
    if G.number_of_edges() > 0:
        try:
            largest_cc = max(nx.connected_components(G), key=len)
            largest_cc_subgraph = G.subgraph(largest_cc)
            
            if len(largest_cc) > 1:
                network_metrics['largest_component_size'] = int(len(largest_cc))
                network_metrics['avg_path_length_largest_cc'] = float(nx.average_shortest_path_length(largest_cc_subgraph))
            else:
                network_metrics['largest_component_size'] = 1
                network_metrics['avg_path_length_largest_cc'] = 0.0
        except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
            network_metrics['largest_component_size'] = 0
            network_metrics['avg_path_length_largest_cc'] = None
    else:
        network_metrics['largest_component_size'] = 0
        network_metrics['avg_path_length_largest_cc'] = None
    
    return G, network_metrics

def convert_for_json(obj):
    """Convert numpy types to native Python types for JSON serialization"""
    if isinstance(obj, np.integer):
        return int(obj)
    elif isinstance(obj, np.floating):
        return float(obj)
    elif isinstance(obj, np.ndarray):
        return obj.tolist()
    elif isinstance(obj, pd.Series):
        return obj.to_dict()
    elif isinstance(obj, pd.DataFrame):
        return {k: convert_for_json(v) for k, v in obj.to_dict().items()}
    elif isinstance(obj, dict):
        return {k: convert_for_json(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [convert_for_json(item) for item in obj]
    else:
        return obj

def create_enhanced_visualizations(bertscore_df, coherence_results, threshold_results):
    """Create comprehensive visualizations for semantic analysis"""
    print("Creating enhanced visualizations...")
    
    # Set up the plotting style
    plt.style.use('default')  # Use default style instead of seaborn
    fig = plt.figure(figsize=(20, 24))
    
    # 1. BERTScore distribution by prompt type
    ax1 = plt.subplot(4, 3, 1)
    sns.boxplot(data=bertscore_df, x='text1_prompt', y='f1_score', ax=ax1)
    ax1.set_title('BERTScore F1 Distribution by Prompt Type')
    ax1.set_xlabel('Prompt Type')
    ax1.set_ylabel('F1 Score')
    
    # 2. Temperature difference vs semantic similarity
    ax2 = plt.subplot(4, 3, 2)
    scatter = ax2.scatter(bertscore_df['temp_diff'], bertscore_df['f1_score'], 
                         c=bertscore_df['same_prompt'].astype(int), 
                         alpha=0.6, cmap='viridis')
    ax2.set_xlabel('Temperature Difference')
    ax2.set_ylabel('F1 Score')
    ax2.set_title('Temperature Difference vs Semantic Similarity')
    plt.colorbar(scatter, ax=ax2, label='Same Prompt Type')
    
    # 3. Network metrics across thresholds
    ax3 = plt.subplot(4, 3, 3)
    ax3.plot(threshold_results['threshold'], threshold_results['n_edges'], 'o-', label='Edges')
    ax3.set_xlabel('Similarity Threshold')
    ax3.set_ylabel('Number of Edges')
    ax3.set_title('Network Edges vs Threshold')
    ax3.grid(True, alpha=0.3)
    
    # 4. Network density across thresholds
    ax4 = plt.subplot(4, 3, 4)
    ax4.plot(threshold_results['threshold'], threshold_results['density'], 's-', color='red', label='Density')
    ax4.set_xlabel('Similarity Threshold')
    ax4.set_ylabel('Network Density')
    ax4.set_title('Network Density vs Threshold')
    ax4.grid(True, alpha=0.3)
    
    # 5. Same vs cross-prompt coherence comparison
    ax5 = plt.subplot(4, 3, 5)
    if 'same_prompt' in coherence_results and not coherence_results['same_prompt'].empty:
        prompt_means = coherence_results['same_prompt']['mean']
        ax5.bar(range(len(prompt_means)), prompt_means.values, 
               yerr=coherence_results['same_prompt']['std'].values,
               alpha=0.7, capsize=5)
        ax5.set_xticks(range(len(prompt_means)))
        ax5.set_xticklabels(prompt_means.index)
        ax5.set_ylabel('Mean F1 Score')
        ax5.set_title('Within-Prompt Type Coherence')
        ax5.grid(True, alpha=0.3)
    
    # 6. Temperature correlation analysis
    ax6 = plt.subplot(4, 3, 6)
    if 'temperature_correlations' in coherence_results:
        methods = list(coherence_results['temperature_correlations'].keys())
        correlations = [coherence_results['temperature_correlations'][m]['correlation'] for m in methods]
        p_values = [coherence_results['temperature_correlations'][m]['p_value'] for m in methods]
        
        bars = ax6.bar(methods, correlations, alpha=0.7)
        # Color bars based on significance
        for i, (bar, p_val) in enumerate(zip(bars, p_values)):
            if p_val < 0.05:
                bar.set_color('red')
            else:
                bar.set_color('gray')
        
        ax6.set_ylabel('Correlation Coefficient')
        ax6.set_title('Temperature-Similarity Correlations')
        ax6.axhline(y=0, color='black', linestyle='-', alpha=0.3)
        ax6.grid(True, alpha=0.3)
    
    # 7. BERTScore component analysis
    ax7 = plt.subplot(4, 3, 7)
    ax7.scatter(bertscore_df['precision'], bertscore_df['recall'], 
               c=bertscore_df['f1_score'], alpha=0.6, cmap='plasma')
    ax7.set_xlabel('Precision')
    ax7.set_ylabel('Recall')
    ax7.set_title('BERTScore Components Analysis')
    
    # 8. Network components analysis
    ax8 = plt.subplot(4, 3, 8)
    ax8.plot(threshold_results['threshold'], threshold_results['n_components'], 'd-', color='green')
    ax8.set_xlabel('Similarity Threshold')
    ax8.set_ylabel('Number of Components')
    ax8.set_title('Network Fragmentation vs Threshold')
    ax8.grid(True, alpha=0.3)
    
    # 9. Mean similarity by temperature-difference bin (bar chart)
    ax9 = plt.subplot(4, 3, 9)
    if 'temperature_difference' in coherence_results and not coherence_results['temperature_difference'].empty:
        temp_data = coherence_results['temperature_difference']
        if len(temp_data) > 1:
            ax9.bar(range(len(temp_data)), temp_data['mean'], 
                   yerr=temp_data['std'], alpha=0.7, capsize=3)
            ax9.set_xticks(range(len(temp_data)))
            ax9.set_xticklabels([str(x) for x in temp_data['temp_diff_range']], rotation=45)
            ax9.set_ylabel('Mean F1 Score')
            ax9.set_title('Similarity by Temperature Difference')
            ax9.grid(True, alpha=0.3)
    
    # 10. Distribution of F1 scores
    ax10 = plt.subplot(4, 3, 10)
    ax10.hist(bertscore_df['f1_score'], bins=30, alpha=0.7, edgecolor='black')
    ax10.axvline(bertscore_df['f1_score'].mean(), color='red', linestyle='--', label='Mean')
    ax10.axvline(bertscore_df['f1_score'].median(), color='green', linestyle='--', label='Median')
    ax10.set_xlabel('F1 Score')
    ax10.set_ylabel('Frequency')
    ax10.set_title('F1 Score Distribution')
    ax10.legend()
    ax10.grid(True, alpha=0.3)
    
    # 11. Clustering coefficient analysis
    ax11 = plt.subplot(4, 3, 11)
    if 'avg_clustering' in threshold_results.columns:
        ax11.plot(threshold_results['threshold'], threshold_results['avg_clustering'], 'o-', color='purple')
        ax11.set_xlabel('Similarity Threshold')
        ax11.set_ylabel('Average Clustering Coefficient')
        ax11.set_title('Network Clustering vs Threshold')
        ax11.grid(True, alpha=0.3)
    
    # 12. Summary statistics table
    ax12 = plt.subplot(4, 3, 12)
    ax12.axis('off')
    
    # Create summary text
    summary_text = f"""
    SEMANTIC ANALYSIS SUMMARY
    
    Total text pairs analyzed: {len(bertscore_df)}
    Mean F1 Score: {bertscore_df['f1_score'].mean():.3f} ± {bertscore_df['f1_score'].std():.3f}
    
    Within-prompt coherence:
    Complex: {coherence_results['same_prompt'].loc['complex', 'mean']:.3f} ± {coherence_results['same_prompt'].loc['complex', 'std']:.3f}
    Vague: {coherence_results['same_prompt'].loc['vague', 'mean']:.3f} ± {coherence_results['same_prompt'].loc['vague', 'std']:.3f}
    
    Cross-prompt coherence: {coherence_results['cross_prompt']['mean']:.3f} ± {coherence_results['cross_prompt']['std']:.3f}
    
    Lowest threshold with edges: {threshold_results[threshold_results['n_edges'] > 0]['threshold'].min():.2f}
    Max network density: {threshold_results['density'].max():.3f}
    """
    
    ax12.text(0.1, 0.9, summary_text, transform=ax12.transAxes, fontsize=10,
             verticalalignment='top', fontfamily='monospace',
             bbox=dict(boxstyle='round', facecolor='lightgray', alpha=0.8))
    
    plt.tight_layout()
    
    # Save the comprehensive visualization
    viz_path = os.path.join(semantic_results_dir, 'comprehensive_semantic_analysis.png')
    plt.savefig(viz_path, dpi=300, bbox_inches='tight')
    plt.close()
    
    print(f"Enhanced visualizations saved to: {viz_path}")
    return viz_path

# Main enhanced semantic analysis execution
print("Loading texts for semantic analysis...")
texts, metadata = load_texts_for_semantic_analysis()

if texts:
    print(f"Loaded {len(texts)} texts for semantic analysis")
    
    # Compute BERTScore similarities
    bertscore_df = compute_bertscore_similarities(texts, metadata)
    
    if not bertscore_df.empty:
        print(f"Computed {len(bertscore_df)} BERTScore similarity pairs")
        
        # Save BERTScore results
        bertscore_file = os.path.join(semantic_results_dir, 'bertscore_similarities.csv')
        bertscore_df.to_csv(bertscore_file, index=False)
        print(f"BERTScore results saved to {bertscore_file}")
        
        # Enhanced semantic coherence analysis
        coherence_results = analyze_semantic_coherence_enhanced(bertscore_df)
        
        print("\nENHANCED SEMANTIC COHERENCE ANALYSIS:")
        print("Same prompt type coherence:")
        print(coherence_results['same_prompt'])
        
        print(f"\nCross prompt type coherence:")
        cross_prompt = coherence_results['cross_prompt']
        print(f"  Mean F1: {cross_prompt['mean']:.4f} ± {cross_prompt['std']:.4f}")
        print(f"  Median F1: {cross_prompt['median']:.4f}")
        print(f"  Range: [{cross_prompt['min']:.4f}, {cross_prompt['max']:.4f}]")
        
        if not coherence_results['temperature'].empty:
            print("\nTemperature-based coherence (same temperature pairs):")
            print(coherence_results['temperature'][['temperature', 'mean_f1', 'std_f1', 'count']])
        
        print("\nTemperature correlation analysis:")
        for method, result in coherence_results['temperature_correlations'].items():
            # The flag is a yes/no test result, so avoid "***", which conventionally denotes p < 0.001
            significance = "sig." if result['significant'] else "n.s."
            print(f"  {method}: r = {result['correlation']:.4f}, p = {result['p_value']:.4f} {significance}")
        
        # Enhanced topic modeling
        topic_results, topic_words, topic_distribution, topic_diversity = perform_topic_modeling(texts, metadata)
        
        if not topic_results.empty:
            print(f"\nENHANCED TOPIC MODELING RESULTS:")
            print(f"Identified {ANALYSIS_CONFIG['n_topics']} topics")
            
            # Save enhanced topic results
            topic_file = os.path.join(semantic_results_dir, 'enhanced_topic_modeling_results.csv')
            topic_results.to_csv(topic_file, index=False)
            
            topic_dist_file = os.path.join(semantic_results_dir, 'topic_distribution_analysis.csv')
            topic_distribution.to_csv(topic_dist_file, index=False)
            
            topic_div_file = os.path.join(semantic_results_dir, 'topic_diversity_analysis.csv')
            topic_diversity.to_csv(topic_div_file, index=False)
            
            # Display enhanced topic summary
            print("Topic distribution by prompt type and temperature:")
            print(topic_distribution.head(10))
            
            print("\nTopic diversity (entropy) by conditions:")
            print(topic_diversity)
        
        # Multi-threshold network analysis
        threshold_results, detailed_networks = analyze_network_at_multiple_thresholds(
            bertscore_df, 
            ANALYSIS_CONFIG['similarity_thresholds']
        )
        
        print(f"\nMULTI-THRESHOLD NETWORK ANALYSIS:")
        print(threshold_results[['threshold', 'n_edges', 'density', 'n_components', 'avg_clustering']])
        
        # Find optimal threshold: the highest (strictest) threshold that still yields edges
        thresholds_with_edges = threshold_results[threshold_results['n_edges'] > 0]
        if not thresholds_with_edges.empty:
            optimal_threshold = thresholds_with_edges['threshold'].max()
            print(f"\nOptimal threshold for analysis: {optimal_threshold}")
            
            # Create network with optimal threshold
            optimal_network, optimal_metrics = create_semantic_similarity_network(
                bertscore_df, 
                threshold=optimal_threshold
            )
            
            print(f"OPTIMAL SEMANTIC SIMILARITY NETWORK (threshold={optimal_threshold}):")
            for metric, value in optimal_metrics.items():
                print(f"  {metric}: {value}")
        else:
            print(f"\nNo threshold produced edges. Consider lowering thresholds further.")
            optimal_threshold = ANALYSIS_CONFIG['similarity_threshold']
        
        # Save network analysis results
        threshold_file = os.path.join(semantic_results_dir, 'multi_threshold_analysis.csv')
        threshold_results.to_csv(threshold_file, index=False)
        
        # Save network metrics with proper JSON serialization
        network_file = os.path.join(semantic_results_dir, 'enhanced_semantic_network_metrics.json')
        import json
        
        # Convert all data to JSON-serializable format
        network_data = {
            'analysis_parameters': convert_for_json(ANALYSIS_CONFIG),
            'threshold_analysis': convert_for_json(threshold_results.to_dict('records')),
            'coherence_results': {
                'same_prompt': convert_for_json(coherence_results['same_prompt'].to_dict()),
                'cross_prompt': convert_for_json(coherence_results['cross_prompt'].to_dict()),
                'temperature_correlations': convert_for_json(coherence_results['temperature_correlations'])
            }
        }
        
        with open(network_file, 'w') as f:
            json.dump(network_data, f, indent=2)
        
        # Create enhanced visualizations
        viz_path = create_enhanced_visualizations(bertscore_df, coherence_results, threshold_results)
        
        print(f"\nENHANCED SEMANTIC ANALYSIS COMPLETED SUCCESSFULLY!")
        print(f"Results saved in: {semantic_results_dir}")
        print(f"Key findings:")
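        # Report the signed relative difference rather than assuming complex prompts score higher
        complex_mean = coherence_results['same_prompt'].loc['complex', 'mean']
        vague_mean = coherence_results['same_prompt'].loc['vague', 'mean']
        coherence_diff_pct = (complex_mean / vague_mean - 1) * 100
        print(f"  - Complex vs. vague same-prompt coherence: {coherence_diff_pct:+.1f}% relative difference in mean F1")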
        print(f"  - Temperature effects: {list(coherence_results['temperature_correlations'].keys())} correlations computed")
        print(f"  - Network analysis: {len(ANALYSIS_CONFIG['similarity_thresholds'])} thresholds analyzed")
        print(f"  - Topic modeling: {ANALYSIS_CONFIG['n_topics']} topics identified with diversity analysis")
        
    else:
        print("No BERTScore similarities computed")
else:
    print("No texts found for semantic analysis")

ADVANCED SEMANTIC ANALYSIS WITH BERTSCORE
Loading texts for semantic analysis...
Loaded 14 texts for semantic analysis
Computing BERTScore similarities...


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Computed 91 BERTScore similarity pairs
BERTScore results saved to semantic_analysis/bertscore_similarities.csv
Analyzing semantic coherence patterns...

ENHANCED SEMANTIC COHERENCE ANALYSIS:
Same prompt type coherence:
                  mean       std  count    median       min       max
text1_prompt                                                         
complex       0.864552  0.011658     21  0.861355  0.846952  0.892121
vague         0.866313  0.010609     21  0.869610  0.847549  0.880628

Cross prompt type coherence:
  Mean F1: 0.8253 ± 0.0053
  Median F1: 0.8253
  Range: [0.8129, 0.8377]

Temperature-based coherence (same temperature pairs):
   temperature   mean_f1  std_f1  count
0        0.001  0.820467     NaN      1
1        0.250  0.825259     NaN      1
2        0.500  0.831417     NaN      1
3        0.750  0.827367     NaN      1
4        1.000  0.827270     NaN      1
5        1.250  0.828069     NaN      1
6        1.500  0.826234     NaN      1

Temperature correlatio
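
The multi-threshold step in the cell above keeps only the text pairs whose BERTScore F1 reaches a cutoff, then selects the highest cutoff that still leaves edges. The sketch below illustrates that idea on made-up scores. The helper `network_stats_at_threshold`, the `>=` comparison, and the example values are assumptions for illustration; they are not the notebook's `analyze_network_at_multiple_thresholds` implementation. It also confirms that 14 texts give the 91 unordered pairs reported in the output.

```python
from itertools import combinations

def network_stats_at_threshold(pair_scores, n_texts, threshold):
    """Keep pairs scoring at or above threshold; report edge count and density."""
    edges = [pair for pair, score in pair_scores.items() if score >= threshold]
    max_edges = n_texts * (n_texts - 1) // 2  # number of unordered pairs
    return {
        'threshold': threshold,
        'n_edges': len(edges),
        'density': len(edges) / max_edges,
    }

# 14 texts yield C(14, 2) unordered pairs
n_pairs_14 = len(list(combinations(range(14), 2)))

# Illustrative scores only (hypothetical, not results from this notebook)
texts = ['a', 'b', 'c', 'd']
scores = [0.89, 0.86, 0.83, 0.82, 0.85, 0.81]
pair_scores = dict(zip(combinations(texts, 2), scores))

results = [network_stats_at_threshold(pair_scores, len(texts), t)
           for t in (0.80, 0.85, 0.90)]

# Highest threshold that still produces at least one edge
optimal = max(r['threshold'] for r in results if r['n_edges'] > 0)

for r in results:
    print(r)
print("Pairs among 14 texts:", n_pairs_14)
print("Highest threshold with edges:", optimal)
```

Because all same-prompt and cross-prompt F1 values in this run fall in a narrow band (roughly 0.81 to 0.89), small changes in the cutoff can move the network from fully connected to empty, which is why scanning several thresholds is useful.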

In [6]:
# Initialize statistical analyzer
print("Initializing statistical analyzer...")
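# 'alpha' is not among the ANALYSIS_CONFIG keys set in the Configuration cell, so fall back to 0.05
stat_analyzer = StatisticalAnalyzer(alpha=ANALYSIS_CONFIG.get('alpha', 0.05))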

# Check if we have network metrics data
results_file = os.path.join(DIRS['results'], 'network_metrics.csv')

if not os.path.exists(results_file):
    print("Network metrics file not found. Please run network analysis first.")
    df = pd.DataFrame()
else:
    # Load network metrics
    df = pd.read_csv(results_file)
    print(f"Loaded network metrics: {len(df)} networks")
    
    # Check if advanced analysis has already been performed
    advanced_results_file = os.path.join(DIRS['results'], 'advanced_statistical_analysis.csv')
    
    if os.path.exists(advanced_results_file):
        print("Advanced statistical analysis already completed!")
        advanced_df = pd.read_csv(advanced_results_file)
        print(f"Analysis results: {len(advanced_df)} records")
    else:
        print("Starting comprehensive statistical analysis...")
        
        # Run advanced analysis using the fixed StatisticalAnalyzer class
        advanced_df = stat_analyzer.run_advanced_analysis(df)
        
        if not advanced_df.empty:
            # Save results
            advanced_df.to_csv(advanced_results_file, index=False)
            print(f"Statistical analysis completed successfully!")
            print(f"Total results: {len(advanced_df)}")
            print(f"Results saved to: {advanced_results_file}")
            
            # Display summary of analysis types
            analysis_summary = advanced_df['analysis_type'].value_counts()
            print(f"Analysis summary:")
            for analysis_type, count in analysis_summary.items():
                print(f"  - {analysis_type}: {count} results")
        else:
            print("No analysis results generated.")
    
    # Display enhanced basic statistics
    if not df.empty:
        print("\n" + "="*60)
        print("ENHANCED BASIC STATISTICS")
        print("="*60)
        
        numeric_cols = df.select_dtypes(include=[np.number]).columns
        key_metrics = ['num_nodes', 'num_edges', 'density', 'clustering_coefficient']
        available_metrics = [col for col in key_metrics if col in df.columns]
        
        if available_metrics:
            print("\nDescriptive Statistics by Prompt Type:")
            summary_stats = df.groupby('prompt_type')[available_metrics].agg(['count', 'mean', 'std', 'min', 'max'])
            print(summary_stats.round(4))
            
            # This table summarizes the temperature values per prompt type; it does not test an effect
            print("\nTemperature Range by Prompt Type:")
            temp_stats = df.groupby('prompt_type')['temperature'].agg(['min', 'max', 'mean', 'std'])
            print(temp_stats.round(4))
            
            # Clean correlation matrix
            corr_data = df[available_metrics + ['temperature']].replace([np.inf, -np.inf], np.nan).dropna()
            if not corr_data.empty:
                print("\nCorrelation Matrix:")
                corr_matrix = corr_data.corr()
                print(corr_matrix.round(3))
            
        print(f"\nStatistical analysis completed successfully!")
    else:
        print("No data available for statistical analysis.")

Initializing statistical analyzer...
Loaded network metrics: 14 networks
Starting comprehensive statistical analysis...
Running normality tests...
Running pairwise comparisons...
Running ANOVA analysis...
Calculating effect sizes...
Running effect size calculations...
Running PCA analysis...
Running clustering analysis...
Running regression analysis...
Advanced analysis completed: 84 results generated
Statistical analysis completed successfully!
Total results: 84
Results saved to: results/advanced_statistical_analysis.csv
Analysis summary:
  - normality_test: 20 results
  - correlation: 20 results
  - group_comparison: 10 results
  - anova: 10 results
  - effect_size: 10 results
  - pca: 10 results
  - clustering: 4 results

ENHANCED BASIC STATISTICS

Descriptive Statisti
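
The `effect_size` rows in the summary above compare the two prompt types. As a reference point, here is a minimal sketch of Cohen's d with the Hedges' g small-sample correction, using the same pooled-variance formula as `compute_effect_sizes` in the next cell. Whether `StatisticalAnalyzer` computes its effect sizes this way is an assumption; the function name `cohens_d_and_hedges_g` and the sample values are hypothetical.

```python
import math
import statistics

def cohens_d_and_hedges_g(group1, group2):
    """Return (Cohen's d, Hedges' g) for two independent samples."""
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        # Sample variance is undefined for a single observation
        raise ValueError("each group needs at least two observations")
    dof = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / dof
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; effect size is undefined")
    d = (statistics.mean(group1) - statistics.mean(group2)) / math.sqrt(pooled_var)
    # Small-sample bias correction: J = 1 - 3 / (4 * dof - 1)
    g = d * (1 - 3 / (4 * dof - 1))
    return d, g

# Illustrative values only (not data from this notebook)
d, g = cohens_d_and_hedges_g([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
print(f"Cohen's d = {d:.3f}, Hedges' g = {g:.3f}")
```

With only seven networks per prompt type, the correction factor is about 0.94, so g is noticeably smaller than d and is the safer number to report.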

In [7]:
# 7.2. Bootstrap Analysis for Robust Statistical Inference
#
# This cell resamples rows with replacement to estimate percentile confidence intervals
# for per-prompt means of the network metrics, for metric-temperature correlations,
# and for the BERTScore-based semantic coherence scores.

# Bootstrap Analysis Implementation
print("BOOTSTRAP ANALYSIS FOR ROBUST STATISTICAL INFERENCE")

def bootstrap_network_metrics(df, n_bootstrap=1000, ci_level=0.95):
    """
    Perform bootstrap resampling on network metrics
    """
    print(f"Performing bootstrap analysis with {n_bootstrap} samples...")
    
    bootstrap_results = []
    numeric_cols = df.select_dtypes(include=[np.number]).columns
    
    # Remove non-metric columns
    metric_cols = [col for col in numeric_cols if col not in ['temperature']]
    
    for i in range(n_bootstrap):
        # Bootstrap sample
        bootstrap_sample = df.sample(n=len(df), replace=True, random_state=i)
        
        # Compute metrics for each group
        for prompt_type in df['prompt_type'].unique():
            prompt_data = bootstrap_sample[bootstrap_sample['prompt_type'] == prompt_type]
            
            for col in metric_cols:
                if col in prompt_data.columns:
                    bootstrap_results.append({
                        'bootstrap_id': i,
                        'prompt_type': prompt_type,
                        'metric': col,
                        'value': prompt_data[col].mean(),
                        'std': prompt_data[col].std(),
                        'median': prompt_data[col].median()
                    })
    
    bootstrap_df = pd.DataFrame(bootstrap_results)
    
    # Compute confidence intervals
    alpha = 1 - ci_level
    lower_percentile = (alpha/2) * 100
    upper_percentile = (1 - alpha/2) * 100
    
    ci_results = []
    for prompt_type in df['prompt_type'].unique():
        for metric in metric_cols:
            # Drop NaN means from resamples in which this prompt type was not drawn at all
            metric_data = bootstrap_df[
                (bootstrap_df['prompt_type'] == prompt_type) & 
                (bootstrap_df['metric'] == metric)
            ]['value'].dropna()
            
            if len(metric_data) > 0:
                ci_results.append({
                    'prompt_type': prompt_type,
                    'metric': metric,
                    'mean': metric_data.mean(),
                    'std': metric_data.std(),
                    'ci_lower': np.percentile(metric_data, lower_percentile),
                    'ci_upper': np.percentile(metric_data, upper_percentile),
                    'original_mean': df[df['prompt_type'] == prompt_type][metric].mean()
                })
    
    return pd.DataFrame(ci_results), bootstrap_df

def bootstrap_temperature_effects(df, n_bootstrap=1000):
    """
    Bootstrap analysis of temperature effects on network metrics
    """
    print("Analyzing temperature effects with bootstrap...")
    
    bootstrap_temp_results = []
    numeric_cols = df.select_dtypes(include=[np.number]).columns
    metric_cols = [col for col in numeric_cols if col not in ['temperature']]
    
    for i in range(n_bootstrap):
        bootstrap_sample = df.sample(n=len(df), replace=True, random_state=i)
        
        for metric in metric_cols:
            if metric in bootstrap_sample.columns:
                # Compute correlation with temperature
                temp_corr = bootstrap_sample['temperature'].corr(bootstrap_sample[metric])
                
                bootstrap_temp_results.append({
                    'bootstrap_id': i,
                    'metric': metric,
                    'temperature_correlation': temp_corr
                })
    
    temp_bootstrap_df = pd.DataFrame(bootstrap_temp_results)
    
    # Compute confidence intervals for correlations
    temp_ci_results = []
    for metric in metric_cols:
        # Correlations are NaN when a resample has constant temperature or metric values
        metric_corrs = temp_bootstrap_df[temp_bootstrap_df['metric'] == metric]['temperature_correlation'].dropna()
        
        if len(metric_corrs) > 0:
            temp_ci_results.append({
                'metric': metric,
                'mean_correlation': metric_corrs.mean(),
                'std_correlation': metric_corrs.std(),
                'ci_lower': np.percentile(metric_corrs, 2.5),
                'ci_upper': np.percentile(metric_corrs, 97.5),
                'original_correlation': df['temperature'].corr(df[metric])
            })
    
    return pd.DataFrame(temp_ci_results)

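def bootstrap_semantic_coherence(bertscore_df, n_bootstrap=1000):
    """
    Bootstrap analysis of semantic coherence metrics

    Rows (text pairs) are resampled as if they were independent, but each text
    appears in several pairs, so the spread of these estimates is likely optimistic.
    """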
    print("Bootstrap analysis of semantic coherence...")
    
    bootstrap_coherence_results = []
    
    for i in range(n_bootstrap):
        bootstrap_sample = bertscore_df.sample(n=len(bertscore_df), replace=True, random_state=i)
        
        # Within-prompt coherence
        same_prompt_data = bootstrap_sample[
            bootstrap_sample['text1_prompt'] == bootstrap_sample['text2_prompt']
        ]
        
        if len(same_prompt_data) > 0:
            for prompt_type in same_prompt_data['text1_prompt'].unique():
                prompt_data = same_prompt_data[same_prompt_data['text1_prompt'] == prompt_type]
                
                bootstrap_coherence_results.append({
                    'bootstrap_id': i,
                    'analysis_type': 'within_prompt',
                    'prompt_type': prompt_type,
                    'mean_f1': prompt_data['f1_score'].mean(),
                    'std_f1': prompt_data['f1_score'].std()
                })
        
        # Cross-prompt coherence
        cross_prompt_data = bootstrap_sample[
            bootstrap_sample['text1_prompt'] != bootstrap_sample['text2_prompt']
        ]
        
        if len(cross_prompt_data) > 0:
            bootstrap_coherence_results.append({
                'bootstrap_id': i,
                'analysis_type': 'cross_prompt',
                'prompt_type': 'all',
                'mean_f1': cross_prompt_data['f1_score'].mean(),
                'std_f1': cross_prompt_data['f1_score'].std()
            })
    
    return pd.DataFrame(bootstrap_coherence_results)

def compute_effect_sizes(df):
    """
    Compute effect sizes for differences between prompt types
    """
    print("Computing effect sizes...")
    
    effect_sizes = []
    numeric_cols = df.select_dtypes(include=[np.number]).columns
    metric_cols = [col for col in numeric_cols if col not in ['temperature']]
    
    if len(df['prompt_type'].unique()) >= 2:
        prompt_types = df['prompt_type'].unique()
        
        for metric in metric_cols:
            if metric in df.columns:
                group1_data = df[df['prompt_type'] == prompt_types[0]][metric].dropna()
                group2_data = df[df['prompt_type'] == prompt_types[1]][metric].dropna()
                
                # Sample variance (and the pooled denominator) needs at least two observations per group
                if len(group1_data) > 1 and len(group2_data) > 1:
                    # Cohen's d
                    pooled_std = np.sqrt(((len(group1_data) - 1) * group1_data.var() + 
                                        (len(group2_data) - 1) * group2_data.var()) / 
                                       (len(group1_data) + len(group2_data) - 2))
                    
                    cohens_d = (group1_data.mean() - group2_data.mean()) / pooled_std
                    
                    # Hedges' g (bias-corrected)
                    correction_factor = 1 - (3 / (4 * (len(group1_data) + len(group2_data) - 2) - 1))
                    hedges_g = cohens_d * correction_factor
                    
                    effect_sizes.append({
                        'metric': metric,
                        'group1': prompt_types[0],
                        'group2': prompt_types[1],
                        'group1_mean': group1_data.mean(),
                        'group2_mean': group2_data.mean(),
                        'cohens_d': cohens_d,
                        'hedges_g': hedges_g,
                        'interpretation': interpret_effect_size(abs(cohens_d))
                    })
    
    return pd.DataFrame(effect_sizes)

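def interpret_effect_size(effect_size):
    """
    Interpret the magnitude of |d| using Cohen's conventional cutoffs (0.2, 0.5, 0.8)
    """
    # NaN fails every comparison below and would otherwise be labeled "large"
    if pd.isna(effect_size):
        return "undefined"
    if effect_size < 0.2: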
        return "negligible"
    elif effect_size < 0.5:
        return "small"
    elif effect_size < 0.8:
        return "medium"
    else:
        return "large"

# Execute bootstrap analysis
if 'df' in locals() and not df.empty:
    print("Starting comprehensive bootstrap analysis...")
    
    # Network metrics bootstrap
    ci_results, bootstrap_df = bootstrap_network_metrics(
        df, 
        n_bootstrap=ANALYSIS_CONFIG['n_bootstrap'],
        ci_level=ANALYSIS_CONFIG['bootstrap_ci']
    )
    
    if not ci_results.empty:
        print(f"Bootstrap confidence intervals computed for {len(ci_results)} metric-prompt combinations")
        
        # Save bootstrap results
        bootstrap_file = os.path.join(DIRS['bootstrap'], 'network_metrics_bootstrap.csv')
        ci_results.to_csv(bootstrap_file, index=False)
        print(f"Bootstrap results saved to {bootstrap_file}")
        
        # Display key results
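        # Label the intervals with the configured level instead of a hard-coded 95%
        print(f"\nBOOTSTRAP CONFIDENCE INTERVALS ({ANALYSIS_CONFIG['bootstrap_ci']:.0%}):")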
        for _, row in ci_results.head(10).iterrows():
            print(f"  {row['prompt_type']} - {row['metric']}: "
                  f"{row['mean']:.4f} [{row['ci_lower']:.4f}, {row['ci_upper']:.4f}]")
    
    # Temperature effects bootstrap
    temp_ci_results = bootstrap_temperature_effects(df, n_bootstrap=ANALYSIS_CONFIG['n_bootstrap'])
    
    if not temp_ci_results.empty:
        print(f"\nTEMPERATURE EFFECT CONFIDENCE INTERVALS:")
        for _, row in temp_ci_results.head(10).iterrows():
            print(f"  {row['metric']}: r = {row['mean_correlation']:.4f} "
                  f"[{row['ci_lower']:.4f}, {row['ci_upper']:.4f}]")
    
    # Effect sizes
    effect_sizes = compute_effect_sizes(df)
    
    if not effect_sizes.empty:
        print(f"\nEFFECT SIZES:")
        for _, row in effect_sizes.head(10).iterrows():
            print(f"  {row['metric']}: Cohen's d = {row['cohens_d']:.4f} ({row['interpretation']})")
        
        # Save effect sizes
        effect_file = os.path.join(DIRS['bootstrap'], 'effect_sizes.csv')
        effect_sizes.to_csv(effect_file, index=False)
    
    # Semantic coherence bootstrap (if available)
    if 'bertscore_df' in locals() and not bertscore_df.empty:
        semantic_bootstrap = bootstrap_semantic_coherence(bertscore_df, n_bootstrap=500)  # Reduced for efficiency
        
        if not semantic_bootstrap.empty:
            print(f"\nSEMANTIC COHERENCE BOOTSTRAP:")
            semantic_summary = semantic_bootstrap.groupby(['analysis_type', 'prompt_type'])['mean_f1'].agg(['mean', 'std'])
            print(semantic_summary)
            
            # Save semantic bootstrap results
            semantic_bootstrap_file = os.path.join(DIRS['bootstrap'], 'semantic_coherence_bootstrap.csv')
            semantic_bootstrap.to_csv(semantic_bootstrap_file, index=False)
    
    print(f"\nBootstrap analysis completed!")
    print(f"Results saved in: {DIRS['bootstrap']}")
    
else:
    print("No network metrics data available for bootstrap analysis")
    print("Please run network analysis first.")

BOOTSTRAP ANALYSIS FOR ROBUST STATISTICAL INFERENCE
Starting comprehensive bootstrap analysis...
Performing bootstrap analysis with 1000 samples...
Bootstrap confidence intervals computed for 20 metric-prompt combinations
Bootstrap results saved to bootstrap_results/network_metrics_bootstrap.csv

BOOTSTRAP CONFIDENCE INTERVALS (95%):
  complex - nodes: 1347.2301 [1192.1700, 1482.6850]
  complex - edges: 18547.9497 [15623.3714, 21091.8607]
  complex - density: 0.0205 [0.0193, 0.0221]
  complex - clustering: 0.5037 [0.5017, 0.5056]
  complex - path_len: 2.5416 [2.5343, 2.5505]
  complex - avg_deg: 27.2637 [25.8367, 28.4866]
  complex - max_deg: 456.7406 [401.6611, 503.7214]
  complex - std_deg: 41.2590 [38.0863, 43.9759]
  complex - diameter: 5.2862 [5.0000, 5.6667]
  complex - transitivity: 0.1985 [0.1947, 0.2026]
Analyzing temperature effects with bootstrap...

TEMPERATURE EFFECT CONFIDENCE INTERVALS:
  nodes: r = 0.7779 [0.5587, 0.9131]
  edges: r = 0.8737 [0.7479, 0.9540]
  density: 
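
`bootstrap_network_metrics` resamples the whole table, so a resample can by chance contain few or no rows for one prompt type. A common alternative is a stratified percentile bootstrap, which resamples within each group so that group sizes stay fixed. The sketch below shows that variant with NumPy only; `stratified_bootstrap_ci`, its parameters, and the sample values are hypothetical and are not part of the notebook's code.

```python
import numpy as np

def stratified_bootstrap_ci(values, groups, n_bootstrap=1000, ci_level=0.95, seed=0):
    """Percentile CI of the mean for each group, resampling within groups."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    alpha = 1 - ci_level
    percentiles = [100 * alpha / 2, 100 * (1 - alpha / 2)]

    results = {}
    for group in np.unique(groups):
        group_values = values[groups == group]
        # Each row is one resample of the group, drawn with replacement
        resamples = rng.choice(group_values,
                               size=(n_bootstrap, len(group_values)),
                               replace=True)
        boot_means = resamples.mean(axis=1)
        lower, upper = np.percentile(boot_means, percentiles)
        results[str(group)] = {
            'mean': float(group_values.mean()),
            'ci_lower': float(lower),
            'ci_upper': float(upper),
        }
    return results

# Illustrative values only (not data from this notebook)
values = [0.50, 0.52, 0.49, 0.51, 0.30, 0.33, 0.29, 0.31]
groups = ['complex'] * 4 + ['vague'] * 4
ci = stratified_bootstrap_ci(values, groups, n_bootstrap=500)

for name, row in ci.items():
    print(f"{name}: {row['mean']:.4f} [{row['ci_lower']:.4f}, {row['ci_upper']:.4f}]")
```

With seven networks per prompt type, any bootstrap interval rests on very few distinct values, so the intervals are best read as rough indications rather than precise bounds.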

## 6. Visualization

Create analysis visualizations:

In [None]:
# Initialize visualizer
visualizer = NetworkVisualizer(
    style=ANALYSIS_CONFIG['style'],
    figsize=ANALYSIS_CONFIG['figsize']
)

# Create visualizations
if os.path.exists(os.path.join(DIRS['results'], 'network_metrics.csv')):
    df = pd.read_csv(os.path.join(DIRS['results'], 'network_metrics.csv'))
    
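    # The recorded output shows AttributeError for the heatmap and comparative methods,
    # so call each plotting method only if this NetworkVisualizer version provides it
    plot_jobs = [
        ('create_overview_plots', 'network_overview.png'),
        ('create_correlation_heatmap', 'correlation_heatmap.png'),
        ('create_comparative_analysis', 'comparative_analysis.png'),
    ]
    for method_name, filename in plot_jobs:
        plot_method = getattr(visualizer, method_name, None)
        if plot_method is None:
            print(f"⚠ NetworkVisualizer has no method '{method_name}'; skipping {filename}")
            continue
        plot_method(df, save_path=os.path.join(DIRS['figures'], filename))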
    
    viz_files = glob.glob(os.path.join(DIRS['figures'], '*.png'))
    print(f"✓ Created {len(viz_files)} visualizations")
else:
    print("⚠ No network metrics found. Run analysis first.")

Initializing network visualizer...
Loaded network metrics for visualization: 14 networks
Creating overview plots...
Overview plots created: figures/network_overview.png
Creating correlation analysis...
Error creating correlation heatmap: 'NetworkVisualizer' object has no attribute 'create_correlation_heatmap'
Creating comparative analysis...
Error creating comparative analysis: 'NetworkVisualizer' object has no attribute 'create_comparative_analysis'

Available visualizations:
  - network_overview.png

Visualization summary:
  - Total networks visualized: 14
  - Prompt types: complex, vague
  - Temperature range: 0.001 - 1.5
  - Figures created: 1


## 7. Results Summary

Generate comprehensive pipeline summary:

In [None]:
# Pipeline summary
def count_files(directory, pattern='*.txt'):
    """Count files matching pattern; a missing directory yields 0."""
    return len(glob.glob(os.path.join(directory, pattern)))

summary = {
    'texts_generated': sum(count_files(os.path.join(DIRS['texts'], f"{DIR_PREFIX}_{pt}")) for pt in PROMPTS),
    'texts_cleaned': sum(count_files(os.path.join(DIRS['texts'], f"cleaned_{DIR_PREFIX}_{pt}")) for pt in PROMPTS),
    'networks_built': sum(count_files(f'emo_edges_{pt}') for pt in PROMPTS),
    'visualizations': count_files(DIRS['figures'], '*.png')
}

print("=" * 50)
print("PIPELINE SUMMARY")
print("=" * 50)
for key, value in summary.items():
    print(f"{key.replace('_', ' ').title()}: {value}")
print("=" * 50)

# Save summary
summary_file = os.path.join(DIRS['results'], 'pipeline_summary.json')
with open(summary_file, 'w') as f:
    json.dump(summary, f, indent=2)
print(f"✓ Summary saved to {summary_file}")

SEMANTIC ANALYSIS PIPELINE - RESULTS SUMMARY

TEXT GENERATION & PREPROCESSING:
complex_original: 7 files
vague_original: 7 files
complex_cleaned: 7 files
vague_cleaned: 7 files

SEMANTIC NETWORKS:
Complex networks: 7 files
Vague networks: 7 files

ANALYSIS RESULTS:
network_metrics: 14 records
advanced_analysis: Not found
detailed_results: Not found

VISUALIZATIONS:
Total figures: 1
    - network_overview.png

PIPELINE STATUS:
Completion rate: 100.0%
Completed components: 4/4
Pipeline completed successfully!

Summary report saved to: results/pipeline_summary.txt
