# Co-Met: Collaborative Meta-Analysis Tool

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ErickJLA/Co-Met/blob/main/Co_Met.ipynb)

**Version:** 1.0.0
**Last Updated:** November 2025

## About

Co-Met is a comprehensive, publication-ready tool for conducting meta-analyses using advanced statistical methods. This notebook implements three-level (multilevel) meta-analytic models with cluster-robust variance estimation, providing rigorous statistical inference for hierarchical data structures common in ecology, psychology, medicine, and other scientific fields.

**Key Features:**
- Three-level meta-analytic models accounting for nested data structures
- Multiple heterogeneity estimators (DL, REML, ML, PM, SJ)
- Cluster-robust variance estimation for valid inference
- Automated effect size calculation (log response ratio, Hedges' g, Cohen's d, log odds ratio)
- Publication-ready forest plots and visualizations
- Meta-regression with natural cubic splines
- Comprehensive publication bias assessment
- Sensitivity analyses (leave-one-out, trim-and-fill, cumulative)

## Citation

If you use Co-Met in your research, please cite:

```
[Citation will be added upon publication]
```

## Authors

[Author names and affiliations to be added]

## License

This tool is released under [License TBD]. See LICENSE file for details.

## Repository

GitHub: https://github.com/ErickJLA/Co-Met

---

## Introduction

### What is Meta-Analysis?

Meta-analysis is a statistical technique for combining and synthesizing results from multiple independent studies to estimate an overall effect size. Traditional meta-analysis assumes a two-level structure (sampling variation within studies, and variation between studies). However, many datasets exhibit additional hierarchical structure, such as:

- Multiple effect sizes extracted from the same study
- Multiple experiments conducted by the same research group
- Studies nested within geographic regions
- Multiple species from the same taxonomic family

### Why Three-Level Models?

Three-level (multilevel) meta-analytic models account for these nested dependencies, providing more accurate effect size estimates and valid statistical inference. This notebook implements three-level models using restricted maximum likelihood (REML) estimation.

### Who Should Use This Tool?

Co-Met is designed for researchers who:
- Need to conduct rigorous meta-analyses with hierarchical data
- Want publication-ready visualizations
- Require cluster-robust variance estimation for valid inference
- Need to explore heterogeneity and moderator effects
- Want to assess publication bias comprehensively

### Prerequisites

- Basic understanding of meta-analysis concepts
- Data organized in a Google Sheet (template provided)
- No programming experience required - all analyses use interactive widgets

---

## Table of Contents

1. [How to Use This Notebook](#how-to-use)
2. [Technical Specifications](#specifications)
3. [Setup: Import Libraries](#imports)
4. [Step 1: Load Data from Google Sheets](#load-data)
5. [Step 2: Configure Analysis Parameters](#configure)
6. [Step 3: Calculate Effect Sizes](#effect-sizes)
7. [Step 4: Overall Meta-Analysis](#overall-analysis)
8. [Step 5: Subgroup Analysis](#subgroup-analysis)
9. [Step 6: Visualizations (Forest Plots)](#forest-plots)
10. [Step 7: Meta-Regression](#meta-regression)
11. [Step 8: Publication Bias Assessment](#publication-bias)
12. [Step 9: Sensitivity Analyses](#sensitivity)
13. [Save Your Results](#save-results)
14. [Statistical Methods and References](#references)

---

## How to Use This Notebook

### Quick Start Guide

**For first-time users, follow these steps:**

1. **Prepare your data**: Organize your meta-analytic data in a Google Sheet
   - Use the [template Google Sheet](https://docs.google.com/spreadsheets/) (link TBD)
   - Required columns: study identifier, effect size data, sample sizes
   - Optional: moderator variables, temporal data

2. **Run setup cells** (MANDATORY):
   - Run the "Import Libraries" cell to load required packages
   - This may take 30-60 seconds

3. **Load your data**:
   - Run the "Load Data" cell
   - Authenticate with your Google account when prompted
   - Enter your Google Sheet URL

4. **Configure analysis**:
   - Run configuration cells and select your options using the interactive widgets
   - Specify which columns contain your effect size data
   - Choose your effect size metric (lnRR, Hedges' g, etc.)

5. **Run analyses** (in order):
   - Calculate effect sizes
   - Run overall meta-analysis
   - Optional: subgroup analysis, meta-regression, bias assessment, sensitivity analyses

6. **Save your results**:
   - Download generated figures (right-click ‚Üí Save image)
   - Export result tables to CSV if needed

### Cell Execution Order

```
MANDATORY (run in order):
  Cell 1: Import Libraries ‚úì
  Cell 2: Load Data ‚úì
  Cell 3-4: Configure Analysis ‚úì
  Cell 5: Calculate Effect Sizes ‚úì

CORE ANALYSES (recommended):
  Cell 6: Overall Meta-Analysis
  Cell 7: Three-Level Model

OPTIONAL (run as needed):
  Cell 8+: Subgroup analysis, forest plots, meta-regression, bias tests, sensitivity
```

### Important Notes

- **Runtime**: Most cells execute in < 5 seconds. Large datasets (n > 500) may take up to 30 seconds for complex analyses.
- **Colab sessions are temporary**: Your work is NOT automatically saved. Download results before closing.
- **Re-running cells**: You can re-run any cell to update parameters or regenerate outputs.
- **Errors**: If you encounter errors, check that you've run all prerequisite cells in order.

---

## Technical Specifications

### Environment

- **Platform**: Google Colaboratory
- **Python Version**: 3.10+
- **Runtime**: CPU (default) - GPU not required

### Key Dependencies

- `numpy >= 1.21.0` - Numerical computing
- `pandas >= 1.3.0` - Data manipulation
- `scipy >= 1.7.0` - Statistical functions
- `matplotlib >= 3.4.0` - Visualization
- `statsmodels >= 0.13.0` - Advanced statistical models
- `gspread` - Google Sheets integration
- `ipywidgets` - Interactive controls

### Data Format Requirements

Your Google Sheet should contain:

**Required columns:**
- Study identifier (e.g., "Author_Year" or "Study_ID")
- Effect size data OR raw data for calculation:
  - For lnRR: mean and SD for treatment/control groups, sample sizes
  - For Hedges' g / Cohen's d: means, SDs, sample sizes for both groups
  - For log OR: event counts and sample sizes for both groups
  - Pre-calculated effect sizes and variances (if available)

**Optional columns:**
- Categorical moderators for subgroup analysis
- Continuous moderators for meta-regression
- Publication year for cumulative meta-analysis
- Hierarchical clustering variable (e.g., study ID for multiple effect sizes per study)

### Browser Compatibility

- Best experienced in Google Chrome or Firefox
- Safari and Edge also supported

---

<a id="imports"></a>
## Setup: Import Libraries

**IMPORTANT**: Run this cell first before proceeding with any analysis.

This cell imports all required Python libraries for the meta-analysis. Expected runtime: 30-60 seconds.

If you encounter import errors, make sure you're using a recent Google Colab runtime.

---

In [None]:
#@title üìö IMPORT LIBRARIES# =============================================================================# LIBRARY IMPORTS# All required libraries are imported here to avoid duplication# =============================================================================# Standard library importsimport sysimport datetimeimport warningsimport traceback# Google Colab and authenticationimport gspreadfrom google.colab import authfrom google.auth import default# Third-party scientific computing librariesimport numpy as npimport pandas as pdfrom scipy import statsfrom scipy.stats import norm, chi2, t, f, pearsonr, rankdatafrom scipy.optimize import minimize, minimize_scalar# Statistical modelingimport statsmodels.api as smimport patsy# Visualizationimport matplotlib.pyplot as pltimport matplotlib.patches as mpatchesfrom matplotlib.lines import Line2Dfrom matplotlib.patches import Patch, Rectangle# Interactive widgets and displayimport ipywidgets as widgetsfrom IPython.display import display, HTML, clear_output, IFrame# Configure warningswarnings.filterwarnings('ignore')print("‚úì All libraries imported successfully")

<a id="load-data"></a>## Step 1: Load Data from Google SheetsIn this section, you'll connect to your Google Sheet containing meta-analysis data.### Google Sheets Setup**Required**: Your Google Sheet must be shared with appropriate permissions:1. Open your Google Sheet2. Click "Share" button3. Change to "Anyone with the link can view" OR share with your Google account email4. Copy the sheet URL### Data Format RequirementsYour sheet should include:- **Study identifier column**: Unique ID for each study/effect size- **Effect size data**: Either raw data (means, SDs, sample sizes) OR pre-calculated effect sizes- **Clustering variable**: If multiple effect sizes per study, include a study-level identifier- **Optional**: Moderator variables, publication year, quality scoresSee the [template Google Sheet](https://docs.google.com/spreadsheets/) for proper formatting (link TBD).### Instructions1. Run the cell below2. Enter your Google Sheet URL when prompted3. Select the appropriate worksheet if your spreadsheet has multiple sheets**Expected runtime**: < 5 seconds---

In [None]:
#@title üîê AUTHENTICATE WITH GOOGLE# =============================================================================# GOOGLE AUTHENTICATION# Purpose: Authenticate with your Google account to access Google Sheets# Expected runtime: < 5 seconds (+ time for manual authentication)# =============================================================================# Authenticate and create credentials# You will be prompted to authorize access to your Google accountauth.authenticate_user()creds, _ = default()gc = gspread.authorize(creds)print("‚úì Authentication successful!")print("You can now load data from Google Sheets.")

In [None]:
# ‚ö†Ô∏è PREREQUISITES:# - Must load data first (previous cell)## Expected runtime: < 1 second# This cell only sets up widgets; no computation occurs until you click "Save"#@title ‚öôÔ∏è Step 2: CONFIGURE ANALYSIS#@title ‚öôÔ∏è Step 2: CONFIGURE ANALYSIS# =============================================================================# CELL 3: CONFIGURE ANALYSIS FILTERS# Purpose: Set up all filters and mappings for the analysis.# Dependencies: Cell 2 (global 'raw_data_from_sheet')# Outputs: 'ANALYSIS_CONFIG' dictionary with user's choices.# =============================================================================# --- 1. PRE-RUN: Check for Data and Find Moderators ---try:    if 'raw_data_from_sheet' not in globals():        raise NameError("raw_data_from_sheet")    # --- 1a. Helper function for auto-guessing columns ---    def guess_column(options, matches, default=None):        """Finds the best match from a list of options."""        options_lower = [str(o).lower() for o in options]        for match in matches:            if match in options_lower:                return options[options_lower.index(match)]        return default if default else options[0] if options else None    # --- 1b. Load data and find all columns ---    all_column_names = list(raw_data_from_sheet.columns)    if not all_column_names:        raise ValueError("Data loaded from sheet has no columns.")    # --- 1c. Auto-guess core columns to build a temporary_raw_data ---    temp_col_map = {        guess_column(all_column_names, ['id', 'study', 'study_id', 'paper']): 'id',        guess_column(all_column_names, ['xe', 'mean_e', 'mean_exp', 'x_e']): 'xe',        guess_column(all_column_names, ['sde', 'sd_e', 'sd_exp']): 'sde',        guess_column(all_column_names, ['ne', 'n_e', 'n_exp']): 'ne',        guess_column(all_column_names, ['xc', 'mean_c', 'mean_ctrl', 'x_c']): 'xc',        guess_column(all_column_names, ['sdc', 'sd_c', 'sd_ctrl']): 'sdc',        guess_column(all_column_names, ['nc', 'n_c', 'n_ctrl']): 'nc'    }    # Invert map for renaming, but handle None if a column wasn't found    temp_col_map_inv = {v: k for k, v in temp_col_map.items() if k is not None}    # Find other non-core columns    other_cols = [col for col in all_column_names if col not in temp_col_map_inv.values()]    # Create temporary cleaned data    temp_raw_data = raw_data_from_sheet[list(temp_col_map_inv.values()) + other_cols].copy()    temp_raw_data.rename(columns=temp_col_map_inv, inplace=True)    # --- 1d. Run minimal cleaning just to find moderators ---    for col in ['id']: # Only need ID for this step        if col not in temp_raw_data.columns:            temp_raw_data[col] = pd.Series(dtype='object')    temp_raw_data['id'] = temp_raw_data['id'].astype(str).str.strip()    # Find moderators    excluded_cols = ['id', 'xe', 'sde', 'ne', 'xc', 'sdc', 'nc']    available_moderators = [col for col in temp_raw_data.columns                            if col not in excluded_cols                            and temp_raw_data[col].dtype == 'object']except NameError:    display(HTML("<div style='background-color: #fff3cd; border: 1px solid #ffeeba; padding: 15px; border-radius: 5px; color: #856404;'>"                 "<b>‚ùå ERROR: No data found.</b> Please run Cell 2 (LOAD DATA) successfully before running this cell."                 "</div>"))    raiseexcept Exception as e:    display(HTML(f"<div style='background-color: #f8d7da; border: 1px solid #f5c6cb; padding: 15px; border-radius: 5px; color: #721c24;'>"                 f"<b>‚ùå An error occurred during pre-load:</b> {e}<br>"                 f"Please check your sheet and column names."                 f"</div>"))    raise# --- 2. Widget Definitions ---# --- Box 1: Column Mapping (Hidden in Accordion) ---id_col_widget = widgets.Dropdown(description='Study ID (id):', options=all_column_names,                                 value=temp_col_map_inv.get('id'),                                 layout=widgets.Layout(width='500px'), style={'description_width': '150px'})xe_col_widget = widgets.Dropdown(description='Exp. Mean (xe):', options=all_column_names,                                 value=temp_col_map_inv.get('xe'),                                 layout=widgets.Layout(width='500px'), style={'description_width': '150px'})sde_col_widget = widgets.Dropdown(description='Exp. SD (sde):', options=all_column_names,                                  value=temp_col_map_inv.get('sde'),                                  layout=widgets.Layout(width='500px'), style={'description_width': '150px'})ne_col_widget = widgets.Dropdown(description='Exp. N (ne):', options=all_column_names,                                 value=temp_col_map_inv.get('ne'),                                 layout=widgets.Layout(width='500px'), style={'description_width': '150px'})xc_col_widget = widgets.Dropdown(description='Ctrl. Mean (xc):', options=all_column_names,                                 value=temp_col_map_inv.get('xc'),                                 layout=widgets.Layout(width='500px'), style={'description_width': '150px'})sdc_col_widget = widgets.Dropdown(description='Ctrl. SD (sdc):', options=all_column_names,                                  value=temp_col_map_inv.get('sdc'),                                  layout=widgets.Layout(width='500px'), style={'description_width': '150px'})nc_col_widget = widgets.Dropdown(description='Ctrl. N (nc):', options=all_column_names,                                 value=temp_col_map_inv.get('nc'),                                 layout=widgets.Layout(width='500px'), style={'description_width': '150px'})column_mapping_box = widgets.VBox([    widgets.HTML("Map your sheet's columns to the names the pipeline requires. The system has auto-guessed, but please verify."),    id_col_widget,    xe_col_widget, sde_col_widget, ne_col_widget,    xc_col_widget, sdc_col_widget, nc_col_widget])column_accordion = widgets.Accordion(children=[column_mapping_box])column_accordion.set_title(0, 'Step 2a (Optional): Verify Column Names')column_accordion.selected_index = None # Start closed# --- Box 2: Analysis Configuration ---prefilter_col_widget = widgets.Dropdown(description='Filter by:', options=['None'] + available_moderators, value='None',                                        style={'description_width': '120px'}, layout=widgets.Layout(width='500px'))prefilter_values_widget = widgets.VBox()filterCol1_widget = widgets.Dropdown(description='Factor 1:', options=available_moderators if available_moderators else ['None'],                                     value=available_moderators[0] if available_moderators else 'None',                                     style={'description_width': '120px'}, layout=widgets.Layout(width='500px'))filterCol2_widget = widgets.Dropdown(description='Factor 2:', options=['None'] + available_moderators, value='None',                                     style={'description_width': '120px'}, layout=widgets.Layout(width='500px'))minPapers_widget = widgets.IntSlider(value=2, min=1, max=10, step=1, description='Min Papers:',                                     style={'description_width': '120px'}, layout=widgets.Layout(width='500px'))minObservations_widget = widgets.IntSlider(value=2, min=1, max=20, step=1, description='Min Observations:',                                           style={'description_width': '120px'}, layout=widgets.Layout(width='500px'))# --- Box 3: Final Button ---save_config_button = widgets.Button(    description='‚ñ∂ Save Configuration',    button_style='success',    layout=widgets.Layout(width='500px', height='50px'),    style={'font_weight': 'bold', 'font_size': '14px'})output_area = widgets.Output()# --- 4. Widget Handlers ---def update_prefilter_checkboxes(change):    """Update checkboxes when column selection changes"""    selected_col = change['new']    if selected_col == 'None':        prefilter_values_widget.children = []        return    try:        # Use the *uncleaned* temp_raw_data for a quick preview        unique_values = sorted(temp_raw_data[selected_col].dropna().unique())        checkboxes = [            widgets.Checkbox(                value=True,                description=f"{val} (n={len(temp_raw_data[temp_raw_data[selected_col] == val])})",                style={'description_width': 'initial'},                layout=widgets.Layout(width='500px')            ) for val in unique_values        ]        prefilter_values_widget.children = [            widgets.HTML("<p style='margin: 10px 0; font-weight: bold;'>Select values to KEEP:</p>")        ] + checkboxes    except Exception as e:        prefilter_values_widget.children = [widgets.HTML(f"<p style='color: red;'>Error updating list: {e}</p>")]prefilter_col_widget.observe(update_prefilter_checkboxes, names='value')@save_config_button.on_clickdef on_save_config_clicked(b):    """Main function: JUST save the config."""    with output_area:        clear_output(wait=True)        print("="*70)        print("CONFIGURING ANALYSIS")        print("="*70)        try:            # --- 1. Get Column Mappings ---            global col_map            col_map = {                id_col_widget.value: 'id',                xe_col_widget.value: 'xe',                sde_col_widget.value: 'sde',                ne_col_widget.value: 'ne',                xc_col_widget.value: 'xc',                sdc_col_widget.value: 'sdc',                nc_col_widget.value: 'nc'            }            # Check for duplicate mappings            mapped_keys = [k for k in col_map.keys() if k is not None]            if len(set(mapped_keys)) != len(mapped_keys):                raise ValueError("Duplicate columns mapped. Please assign one sheet column to one role.")            # --- 2. Get Pre-filter selections ---            prefilter_col = prefilter_col_widget.value            selected_values = []            if prefilter_col != 'None':                selected_values = [                    cb.description.split(' (n=')[0]                    for cb in prefilter_values_widget.children[1:] # Skip HTML title                    if hasattr(cb, 'value') and cb.value                ]            # --- 3. Save Configuration to Global ANALYSIS_CONFIG ---            global ANALYSIS_CONFIG            ANALYSIS_CONFIG = {                'col_map': col_map,                'prefilter_col': prefilter_col,                'prefilter_values_kept': selected_values if prefilter_col != 'None' else 'All',                'filterCol1': filterCol1_widget.value,                'filterCol2': filterCol2_widget.value,                'minPapers': minPapers_widget.value,                'minObservations': minObservations_widget.value,            }            # --- 4. Print Final Summary ---            print("\n" + "="*70)            print("‚úÖ CONFIGURATION SAVED")            print("="*70)            print("\nüìã Analysis Configuration Summary:")            print("-" * 70)            print(f"  1Ô∏è‚É£  COLUMN MAPPING:")            print(f"      ‚Ä¢ Study ID: '{id_col_widget.value}'")            print(f"      ‚Ä¢ Exp. Mean: '{xe_col_widget.value}'")            print(f"      ‚Ä¢ Ctrl. Mean: '{xc_col_widget.value}'")            print(f"  2Ô∏è‚É£  SUBGROUP ANALYSIS:")            print(f"      ‚Ä¢ Primary factor:   {ANALYSIS_CONFIG['filterCol1']}")            print(f"      ‚Ä¢ Secondary factor: {ANALYSIS_CONFIG['filterCol2']}")            print(f"  3Ô∏è‚É£  QUALITY THRESHOLDS:")            print(f"      ‚Ä¢ Min Papers:       {ANALYSIS_CONFIG['minPapers']}")            print(f"      ‚Ä¢ Min Observations: {ANALYSIS_CONFIG['minObservations']}")            print("\n" + "="*70)            print("‚ñ∂Ô∏è  Run the next cell to clean data and apply this configuration.")            print("="*70)        except Exception as e:            print(f"\n‚ùå AN ERROR OCCURRED:\n")            print(f"  Type: {type(e).__name__}")            print(f"  Message: {e}")            print("\n  Traceback:")            traceback.print_exc(file=sys.stdout)# --- 5. Assemble & Display Final UI ---box1 = column_accordionbox2 = widgets.VBox([    widgets.HTML("<h3 style='color: #2E86AB;'>Step 2b: Configure Analysis Filters</h3>"),    widgets.HTML("<h4 style='color: #444; margin-bottom: 5px;'>üìå Pre-Filter (Optional)</h4>"),    prefilter_col_widget,    prefilter_values_widget,    widgets.HTML("<hr style='margin: 10px 0; border: none; border-top: 1px solid #eee;'>"),    widgets.HTML("<h4 style='color: #444; margin-bottom: 5px;'>üìä Subgroup Analysis</h4>"),    filterCol1_widget,    filterCol2_widget,    widgets.HTML("<hr style='margin: 10px 0; border: none; border-top: 1px solid #eee;'>"),    widgets.HTML("<h4 style='color: #444; margin-bottom: 5px;'>‚öôÔ∏è Quality Filters</h4>"),    minPapers_widget,    minObservations_widget])box3 = widgets.VBox([    widgets.HTML("<hr style='margin: 20px 0; border: none; border-top: 2px solid #ddd;'>"),    widgets.HTML("<h3 style='color: #2E86AB;'>Step 2c: Save Configuration</h3>"),    save_config_button,    output_area])display(box1, box2, box3)

In [None]:
#@title ‚öôÔ∏è Step 3: APPLY CONFIGURATION & PREPARE DATA#@title ‚öôÔ∏è Step 3: APPLY CONFIGURATION & PREPARE DATA# =============================================================================# CELL 4: CLEAN DATA & APPLY CONFIGURATION# Purpose: Run cleaning and filtering based on choices from Cell 3.# Dependencies: Cell 2 (global 'raw_data_from_sheet'), Cell 3 (global 'ANALYSIS_CONFIG')# Outputs: Global 'raw_data' (cleaned), 'data_filtered', 'LOAD_METADATA'# =============================================================================print("="*70)print("APPLYING CONFIGURATION & PREPARING DATA")print("="*70)try:    # --- 1. Check for inputs ---    if 'raw_data_from_sheet' not in globals():        raise NameError("Data not loaded. Please re-run Cell 2.")    if 'ANALYSIS_CONFIG' not in globals():        raise NameError("Configuration not set. Please run Cell 3 and click 'Save Configuration'.")    print("STEP 1: Loading configuration from Cell 3...")    col_map = ANALYSIS_CONFIG['col_map']    # --- 2. Rename & Clean Data ---    print("STEP 2: Cleaning and converting data...")    global raw_data    mapped_cols = col_map.keys()    other_cols = [col for col in raw_data_from_sheet.columns if col not in mapped_cols]    raw_data = raw_data_from_sheet[list(mapped_cols) + other_cols].copy()    raw_data.rename(columns=col_map, inplace=True)    original_rows = len(raw_data)    cleaning_log = []    # Convert numeric columns    numeric_columns = ['xe', 'sde', 'ne', 'xc', 'sdc', 'nc']    for col in numeric_columns:        if col not in raw_data.columns:             raise ValueError(f"Mapped column '{col}' not found after loading.")        raw_data[col] = raw_data[col].astype(str).str.strip().replace('', np.nan)        raw_data[col] = pd.to_numeric(raw_data[col], errors='coerce')    # Ensure ID is string    raw_data['id'] = raw_data['id'].astype(str).str.strip()    # Drop rows with missing essential values    essential_cols = ['xe', 'ne', 'xc', 'nc']    missing_essential = raw_data[essential_cols].isna().any(axis=1).sum()    raw_data.dropna(subset=essential_cols, inplace=True)    if missing_essential > 0:        cleaning_log.append(f"Dropped {missing_essential} rows (missing xe/ne/xc/nc)")    # Ensure N >= 1    invalid_n_count = 0    for col in ['ne', 'nc']:        raw_data[col] = raw_data[col].fillna(0).astype(int)        invalid_n = (raw_data[col] < 1).sum()        if invalid_n > 0:            raw_data = raw_data[raw_data[col] >= 1]            invalid_n_count += invalid_n    if invalid_n_count > 0:        cleaning_log.append(f"Dropped {invalid_n_count} rows (n < 1)")    final_rows = len(raw_data)    print(f"  ‚úì Clean dataset ready: {final_rows} rows remaining ({original_rows - final_rows} total removed)")    # --- 3. Identify Moderators ---    print("STEP 3: Identifying moderators...")    excluded_cols = ['id', 'xe', 'sde', 'ne', 'xc', 'sdc', 'nc']    global available_moderators    available_moderators = [col for col in raw_data.columns                            if col not in excluded_cols                            and raw_data[col].dtype == 'object']    print(f"  ‚úì Found {len(available_moderators)} potential moderators.")    # --- 4. Apply Pre-filter (if selected) ---    print("STEP 4: Applying pre-filter...")    global data_filtered    data_filtered = raw_data.copy()    prefilter_col = ANALYSIS_CONFIG['prefilter_col']    selected_values = ANALYSIS_CONFIG['prefilter_values_kept']    if prefilter_col != 'None':        data_filtered = data_filtered[data_filtered[prefilter_col].isin(selected_values)]        print(f"  ‚úì Pre-filter applied. {len(data_filtered)} rows remain.")    else:        print("  ‚úì No pre-filter applied.")    # --- 5. Save Metadata ---    global LOAD_METADATA    LOAD_METADATA = {        'timestamp': datetime.datetime.now(),        'original_rows': original_rows,        'final_rows_cleaned': final_rows,        'final_rows_filtered': len(data_filtered),        'cleaning_log': cleaning_log,        'available_moderators': available_moderators,        'column_map': col_map    }    # Update ANALYSIS_CONFIG with final counts    ANALYSIS_CONFIG['n_observations_pre_filter'] = final_rows    ANALYSIS_CONFIG['n_observations_post_filter'] = len(data_filtered)    ANALYSIS_CONFIG['n_papers_post_filter'] = data_filtered['id'].nunique()    # --- 6. Print Final Summary ---    print("\n" + "="*70)    print("‚úÖ DATA READY FOR ANALYSIS")    print("="*70)    print("\nüìã Final Data Summary:")    print("-" * 70)    print(f"  ‚Ä¢ Rows available for analysis: {len(data_filtered)}")    print(f"  ‚Ä¢ Unique studies: {data_filtered['id'].nunique()}")    print(f"  ‚Ä¢ Subgroup Factor 1: {ANALYSIS_CONFIG['filterCol1']}")    print(f"  ‚Ä¢ Subgroup Factor 2: {ANALYSIS_CONFIG['filterCol2']}")    print("\n" + "="*70)    print("‚ñ∂Ô∏è  Run the next cell (Calculate Effect Sizes) to proceed.")    print("="*70)except Exception as e:    print(f"\n‚ùå AN ERROR OCCURRED:\n")    print(f"  Type: {type(e).__name__}")    print(f"  Message: {e}")    print("\n  Traceback:")    traceback.print_exc(file=sys.stdout)

In [None]:
#@title üîß ADVANCED HETEROGENEITY ESTIMATORS#@title üîß ADVANCED HETEROGENEITY ESTIMATORS# =============================================================================# CELL 4.5: ADVANCED TAU-SQUARED ESTIMATORS# Purpose: Provides multiple methods for estimating between-study variance# Dependencies: None (standalone functions)# Used by: Cell 6 (Overall Analysis), Cell 8 (Subgroup Analysis)# =============================================================================print("="*70)print("HETEROGENEITY ESTIMATORS MODULE")print("="*70)# --- 1. DERSIMONIAN-LAIRD (Your current method) ---def calculate_tau_squared_DL(df, effect_col, var_col):    """    Calculate between-study variance using DerSimonian-Laird estimator.    Method from DerSimonian & Laird (1986). Meta-analysis in clinical trials.    Controlled Clinical Trials, 7(3), 177-188.    This is the most widely used method, though it can underestimate variance.    """    """    DerSimonian-Laird estimator for tau-squared    Advantages:    - Simple, fast    - Non-iterative    - Always converges    Disadvantages:    - Can underestimate tau¬≤ in small samples    - Negative values truncated to 0    - Less efficient than ML methods    Parameters:    -----------    df : DataFrame        Data with effect sizes and variances    effect_col : str        Name of effect size column    var_col : str        Name of variance column    Returns:    --------    float : tau-squared estimate    """    k = len(df)    if k < 2:        return 0.0    try:        # Fixed-effects weights        w = 1 / df[var_col]        sum_w = w.sum()        if sum_w <= 0:            return 0.0        # Fixed-effects pooled estimate        pooled_effect = (w * df[effect_col]).sum() / sum_w        # Q statistic        Q = (w * (df[effect_col] - pooled_effect)**2).sum()        df_Q = k - 1        # C constant        sum_w_sq = (w**2).sum()        C = sum_w - (sum_w_sq / sum_w)        # Tau-squared        if C > 0 and Q > df_Q:            tau_sq = (Q - df_Q) / C        else:            tau_sq = 0.0        return max(0.0, tau_sq)    except Exception as e:        warnings.warn(f"Error in DL estimator: {e}")        return 0.0# --- 2. RESTRICTED MAXIMUM LIKELIHOOD (REML) ---def calculate_tau_squared_REML(df, effect_col, var_col, max_iter=100, tol=1e-8):    """    Calculate tau-squared using Restricted Maximum Likelihood (REML).    Method from Viechtbauer (2005). Bias and efficiency of meta-analytic variance    estimators in the random-effects model. J. Educational & Behavioral Statistics, 30(3), 261-293.    REML is generally preferred over ML as it accounts for loss of degrees of freedom.    """    """    REML estimator for tau-squared (RECOMMENDED - Gold Standard)    Advantages:    - Unbiased for tau¬≤    - Accounts for uncertainty in estimating mu    - Better performance in small samples    - Generally preferred in literature    Disadvantages:    - Iterative (slightly slower)    - Can fail to converge in extreme cases    Reference:    Viechtbauer, W. (2005). Bias and efficiency of meta-analytic variance    estimators in the random-effects model. Journal of Educational and    Behavioral Statistics, 30(3), 261-293.    Parameters:    -----------    df : DataFrame        Data with effect sizes and variances    effect_col : str        Name of effect size column    var_col : str        Name of variance column    max_iter : int        Maximum iterations for optimization    tol : float        Convergence tolerance    Returns:    --------    float : tau-squared estimate    """    k = len(df)    if k < 2:        return 0.0    try:        # Extract data        yi = df[effect_col].values        vi = df[var_col].values        # Remove any infinite or negative variances        valid_mask = np.isfinite(vi) & (vi > 0)        if not valid_mask.all():            warnings.warn(f"Removed {(~valid_mask).sum()} observations with invalid variances")            yi = yi[valid_mask]            vi = vi[valid_mask]            k = len(yi)        if k < 2:            return 0.0        # REML objective function (negative log-likelihood)        def reml_objective(tau2):            # Ensure tau2 is non-negative            tau2 = max(0, tau2)            # Weights            wi = 1 / (vi + tau2)            sum_wi = wi.sum()            if sum_wi <= 0:                return 1e10            # Pooled estimate            mu = (wi * yi).sum() / sum_wi            # Q statistic            Q = (wi * (yi - mu)**2).sum()            # REML log-likelihood (negative for minimization)            # L = -0.5 * [sum(log(vi + tau2)) + log(sum(wi)) + Q]            log_lik = -0.5 * (                np.sum(np.log(vi + tau2)) +                np.log(sum_wi) +                Q            )            return -log_lik  # Return negative for minimization        # Get reasonable bounds for tau2        # Lower bound: 0        # Upper bound: Use variance of effect sizes as upper limit        var_yi = np.var(yi, ddof=1) if k > 2 else 1.0        upper_bound = max(10 * var_yi, 100)        # Optimize        result = minimize_scalar(            reml_objective,            bounds=(0, upper_bound),            method='bounded',            options={'maxiter': max_iter, 'xatol': tol}        )        if result.success:            tau_sq = result.x        else:            warnings.warn("REML optimization did not converge, using DL fallback")            tau_sq = calculate_tau_squared_DL(df, effect_col, var_col)        return max(0.0, tau_sq)    except Exception as e:        warnings.warn(f"Error in REML estimator: {e}, using DL fallback")        return calculate_tau_squared_DL(df, effect_col, var_col)# --- 3. MAXIMUM LIKELIHOOD (ML) ---def calculate_tau_squared_ML(df, effect_col, var_col, max_iter=100, tol=1e-8):    """    Calculate tau-squared using Maximum Likelihood (ML).    Method from Viechtbauer (2005). Bias and efficiency of meta-analytic variance    estimators. J. Educational & Behavioral Statistics, 30(3), 261-293.    ML estimates can be biased downward in small samples.    """    """    Maximum Likelihood estimator for tau-squared    Advantages:    - Efficient asymptotically    - Produces valid estimates    Disadvantages:    - Biased downward (underestimates tau¬≤)    - Less preferred than REML    - REML is generally recommended instead    Parameters:    -----------    df : DataFrame        Data with effect sizes and variances    effect_col : str        Name of effect size column    var_col : str        Name of variance column    max_iter : int        Maximum iterations    tol : float        Convergence tolerance    Returns:    --------    float : tau-squared estimate    """    k = len(df)    if k < 2:        return 0.0    try:        yi = df[effect_col].values        vi = df[var_col].values        valid_mask = np.isfinite(vi) & (vi > 0)        if not valid_mask.all():            yi = yi[valid_mask]            vi = vi[valid_mask]            k = len(yi)        if k < 2:            return 0.0        # ML objective function        def ml_objective(tau2):            tau2 = max(0, tau2)            wi = 1 / (vi + tau2)            sum_wi = wi.sum()            if sum_wi <= 0:                return 1e10            mu = (wi * yi).sum() / sum_wi            Q = (wi * (yi - mu)**2).sum()            # ML log-likelihood (without the constant term)            log_lik = -0.5 * (np.sum(np.log(vi + tau2)) + Q)            return -log_lik        var_yi = np.var(yi, ddof=1) if k > 2 else 1.0        upper_bound = max(10 * var_yi, 100)        result = minimize_scalar(            ml_objective,            bounds=(0, upper_bound),            method='bounded',            options={'maxiter': max_iter, 'xatol': tol}        )        if result.success:            tau_sq = result.x        else:            warnings.warn("ML optimization did not converge, using DL fallback")            tau_sq = calculate_tau_squared_DL(df, effect_col, var_col)        return max(0.0, tau_sq)    except Exception as e:        warnings.warn(f"Error in ML estimator: {e}, using DL fallback")        return calculate_tau_squared_DL(df, effect_col, var_col)# --- 4. PAULE-MANDEL (PM) ---def calculate_tau_squared_PM(df, effect_col, var_col, max_iter=100, tol=1e-8):    """    Calculate tau-squared using Paule-Mandel estimator.    Method from Paule & Mandel (1982) and Viechtbauer (2005).    This iterative method produces unbiased estimates but is computationally intensive.    """    """    Paule-Mandel estimator for tau-squared    Advantages:    - Exact solution to Q = k-1 equation    - Non-iterative in principle    - Good performance    Disadvantages:    - Can be unstable with few studies    - Requires iterative solution in practice    Reference:    Paule, R. C., & Mandel, J. (1982). Consensus values and weighting factors.    Journal of Research of the National Bureau of Standards, 87(5), 377-385.    Parameters:    -----------    df : DataFrame        Data with effect sizes and variances    effect_col : str        Name of effect size column    var_col : str        Name of variance column    max_iter : int        Maximum iterations    tol : float        Convergence tolerance    Returns:    --------    float : tau-squared estimate    """    k = len(df)    if k < 2:        return 0.0    try:        yi = df[effect_col].values        vi = df[var_col].values        valid_mask = np.isfinite(vi) & (vi > 0)        if not valid_mask.all():            yi = yi[valid_mask]            vi = vi[valid_mask]            k = len(yi)        if k < 2:            return 0.0        df_Q = k - 1        # PM objective: Find tau2 such that Q(tau2) = k - 1        def pm_objective(tau2):            tau2 = max(0, tau2)            wi = 1 / (vi + tau2)            sum_wi = wi.sum()            if sum_wi <= 0:                return 1e10            mu = (wi * yi).sum() / sum_wi            Q = (wi * (yi - mu)**2).sum()            # We want Q = k - 1            return (Q - df_Q)**2        var_yi = np.var(yi, ddof=1) if k > 2 else 1.0        upper_bound = max(10 * var_yi, 100)        result = minimize_scalar(            pm_objective,            bounds=(0, upper_bound),            method='bounded',            options={'maxiter': max_iter, 'xatol': tol}        )        if result.success and result.fun < 1:  # Good convergence            tau_sq = result.x        else:            # If PM fails, use DL            tau_sq = calculate_tau_squared_DL(df, effect_col, var_col)        return max(0.0, tau_sq)    except Exception as e:        warnings.warn(f"Error in PM estimator: {e}, using DL fallback")        return calculate_tau_squared_DL(df, effect_col, var_col)# --- 5. SIDIK-JONKMAN (SJ) ---def calculate_tau_squared_SJ(df, effect_col, var_col):    """    Calculate tau-squared using Sidik-Jonkman estimator.    Method from Sidik & Jonkman (2005). Simple heterogeneity variance estimation for    meta-analysis. Applied Statistics, 54(2), 367-384.    This estimator performs well with heterogeneous effect sizes.    """    """    Sidik-Jonkman estimator for tau-squared    Advantages:    - Simple, non-iterative    - Good performance with few studies    - Conservative (tends to produce larger estimates)    Disadvantages:    - Can be overly conservative    - Less commonly used    Reference:    Sidik, K., & Jonkman, J. N. (2005). Simple heterogeneity variance    estimation for meta-analysis. Journal of the Royal Statistical Society,    Series C, 54(2), 367-384.    Parameters:    -----------    df : DataFrame        Data with effect sizes and variances    effect_col : str        Name of effect size column    var_col : str        Name of variance column    Returns:    --------    float : tau-squared estimate    """    k = len(df)    if k < 3:  # Need at least 3 studies for SJ        return calculate_tau_squared_DL(df, effect_col, var_col)    try:        yi = df[effect_col].values        vi = df[var_col].values        valid_mask = np.isfinite(vi) & (vi > 0)        if not valid_mask.all():            yi = yi[valid_mask]            vi = vi[valid_mask]            k = len(yi)        if k < 3:            return calculate_tau_squared_DL(df, effect_col, var_col)        # Weights for typical average        wi = 1 / vi        sum_wi = wi.sum()        # Typical average (weighted mean)        y_bar = (wi * yi).sum() / sum_wi        # SJ estimator        numerator = ((yi - y_bar)**2 / vi).sum()        denominator = k - 1        tau_sq = (numerator / denominator) - (k / sum_wi)        return max(0.0, tau_sq)    except Exception as e:        warnings.warn(f"Error in SJ estimator: {e}, using DL fallback")        return calculate_tau_squared_DL(df, effect_col, var_col)# --- 6. UNIFIED ESTIMATOR FUNCTION ---def calculate_tau_squared(df, effect_col, var_col, method='REML', **kwargs):    """    Unified function to calculate tau-squared using specified method    Parameters:    -----------    df : DataFrame        Data with effect sizes and variances    effect_col : str        Name of effect size column    var_col : str        Name of variance column    method : str        Estimation method: 'DL', 'REML', 'ML', 'PM', 'SJ'        Default: 'REML' (recommended)    **kwargs : dict        Additional arguments passed to estimator    Returns:    --------    float : tau-squared estimate    dict : additional information (method used, convergence, etc.)    """    method = method.upper()    estimators = {        'DL': calculate_tau_squared_DL,        'REML': calculate_tau_squared_REML,        'ML': calculate_tau_squared_ML,        'PM': calculate_tau_squared_PM,        'SJ': calculate_tau_squared_SJ    }    if method not in estimators:        warnings.warn(f"Unknown method '{method}', using REML")        method = 'REML'    try:        tau_sq = estimators[method](df, effect_col, var_col, **kwargs)        info = {            'method': method,            'tau_squared': tau_sq,            'tau': np.sqrt(tau_sq),            'success': True        }        return tau_sq, info    except Exception as e:        warnings.warn(f"Error with {method}, falling back to DL: {e}")        tau_sq = calculate_tau_squared_DL(df, effect_col, var_col)        info = {            'method': 'DL',            'tau_squared': tau_sq,            'tau': np.sqrt(tau_sq),            'success': False,            'fallback': True,            'error': str(e)        }        return tau_sq, info# --- 7. COMPARISON FUNCTION ---def compare_tau_estimators(df, effect_col, var_col):    """    Compare all tau-squared estimators on the same dataset    Useful for sensitivity analysis and understanding which method    is most appropriate for your data.    Parameters:    -----------    df : DataFrame        Data with effect sizes and variances    effect_col : str        Name of effect size column    var_col : str        Name of variance column    Returns:    --------    DataFrame : Comparison of all methods    """    methods = ['DL', 'REML', 'ML', 'PM', 'SJ']    results = []    for method in methods:        try:            tau_sq, info = calculate_tau_squared(df, effect_col, var_col, method=method)            results.append({                'Method': method,                'œÑ¬≤': tau_sq,                'œÑ': np.sqrt(tau_sq),                'Success': info['success']            })        except Exception as e:            results.append({                'Method': method,                'œÑ¬≤': np.nan,                'œÑ': np.nan,                'Success': False            })    comparison_df = pd.DataFrame(results)    return comparison_df# --- 8. DISPLAY MODULE INFO ---print("\n‚úÖ Heterogeneity estimators loaded successfully")print("\nüìä Available methods:")print("  ‚Ä¢ DL (DerSimonian-Laird) - Simple, fast")print("  ‚Ä¢ REML (Restricted ML) - ‚≠ê RECOMMENDED (Gold standard)")print("  ‚Ä¢ ML (Maximum Likelihood) - Asymptotically efficient")print("  ‚Ä¢ PM (Paule-Mandel) - Exact Q solution")print("  ‚Ä¢ SJ (Sidik-Jonkman) - Conservative, good for small k")print("\nüí° Usage:")print("  tau_sq, info = calculate_tau_squared(df, 'effect_size', 'variance', method='REML')")print("  comparison = compare_tau_estimators(df, 'effect_size', 'variance')")print("\n" + "="*70)

In [None]:
# ‚ö†Ô∏è PREREQUISITES:# - Must complete configuration first# - Ensure required columns are present in your data## Expected runtime: < 5 seconds for most datasets# For large datasets (n > 500): up to 30 seconds## INTERPRETATION NOTE:# - Effect sizes will be calculated based on your selected method# - Check for warnings about missing data or calculation errors# Hedges' g and Cohen's d Calculation# Methods from Hedges & Olkin (1985). Statistical methods for meta-analysis.# Academic Press.## Cohen's d = (M1 - M2) / SD_pooled# Hedges' g = d * (1 - 3/(4*df - 1))  # Small sample correction## These are standardized mean differences, common in psychology and medicine.# Log Response Ratio (lnRR) Calculation# Method from Hedges et al. (1999) and Lajeunesse (2011).# Lajeunesse, M.J. (2011). On the meta-analysis of response ratios for studies with# correlated and multi-group designs. Ecology, 92(11), 2049-2055.## lnRR = ln(mean_treatment / mean_control)# Commonly used in ecology and environmental sciences.#@title üî¨ DETECT & SELECT EFFECT SIZE TYPE#@title üî¨ DETECT & SELECT EFFECT SIZE TYPE# =============================================================================# CELL 4: EFFECT SIZE TYPE DETECTION AND SELECTION# Purpose: Analyze data characteristics and recommend appropriate effect size# Dependencies: Cell 3 (data_filtered)# Outputs: ANALYSIS_CONFIG with effect_size_type and es_config# =============================================================================print("\n" + "="*70)print("EFFECT SIZE TYPE DETECTION & SELECTION")print("="*70)print(f"Timestamp: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")# --- STEP 1: DATA CHARACTERISTICS ANALYSIS ---print("\n" + "="*70)print("STEP 1: ANALYZING DATA CHARACTERISTICS")print("="*70)print(f"\nüîç Examining {len(data_filtered)} observations across {data_filtered['id'].nunique()} studies...")# Extract key statisticsxe_stats = data_filtered['xe'].describe()xc_stats = data_filtered['xc'].describe()# Check for standard deviationshas_sde = 'sde' in data_filtered.columns and data_filtered['sde'].notna().any()has_sdc = 'sdc' in data_filtered.columns and data_filtered['sdc'].notna().any()sd_availability = data_filtered[['sde', 'sdc']].notna().all(axis=1).sum() if has_sde and has_sdc else 0sd_pct = (sd_availability / len(data_filtered)) * 100 if len(data_filtered) > 0 else 0print(f"\nüìä Basic Statistics:")print(f"  Treatment (xe):")print(f"    Mean:   {xe_stats['mean']:>10.4f}")print(f"    Median: {xe_stats['50%']:>10.4f}")print(f"    Std:    {xe_stats['std']:>10.4f}")print(f"    Range:  [{xe_stats['min']:.4f}, {xe_stats['max']:.4f}]")print(f"\n  Control (xc):")print(f"    Mean:   {xc_stats['mean']:>10.4f}")print(f"    Median: {xc_stats['50%']:>10.4f}")print(f"    Std:    {xc_stats['std']:>10.4f}")print(f"    Range:  [{xc_stats['min']:.4f}, {xc_stats['max']:.4f}]")print(f"\n  Standard Deviations:")print(f"    Available: {sd_availability}/{len(data_filtered)} ({sd_pct:.1f}%)")# --- STEP 2: CHARACTERISTIC DETECTION ---print("\n" + "="*70)print("STEP 2: DETECTING DATA PATTERNS")print("="*70)# Initialize detection resultsdetection_results = {}# Characteristic 1: Control values near 1.0 (fold-change normalization)control_near_one = ((data_filtered['xc'] >= 0.95) & (data_filtered['xc'] <= 1.05)).sum()control_exactly_one = (data_filtered['xc'] == 1.0).sum()pct_control_near_one = (control_near_one / len(data_filtered)) * 100pct_control_exactly_one = (control_exactly_one / len(data_filtered)) * 100detection_results['control_normalization'] = {    'near_one': control_near_one,    'pct_near_one': pct_control_near_one,    'exactly_one': control_exactly_one,    'pct_exactly_one': pct_control_exactly_one}print(f"\n1Ô∏è‚É£  Control Group Normalization:")print(f"    Exactly 1.0:      {control_exactly_one:>5} ({pct_control_exactly_one:>5.1f}%)")print(f"    Near 1.0 (¬±0.05): {control_near_one:>5} ({pct_control_near_one:>5.1f}%)")if pct_control_exactly_one > 50:    print(f"    ‚Üí Strong evidence of fold-change normalization ‚úì")elif pct_control_near_one > 30:    print(f"    ‚Üí Moderate evidence of fold-change normalization ‚ö†")else:    print(f"    ‚Üí No evidence of fold-change normalization")# Characteristic 2: Negative values (incompatible with ratios)has_negative_xe = (data_filtered['xe'] < 0).any()has_negative_xc = (data_filtered['xc'] < 0).any()n_negative_xe = (data_filtered['xe'] < 0).sum()n_negative_xc = (data_filtered['xc'] < 0).sum()detection_results['negative_values'] = {    'has_negative_xe': has_negative_xe,    'has_negative_xc': has_negative_xc,    'n_negative_xe': n_negative_xe,    'n_negative_xc': n_negative_xc}print(f"\n2Ô∏è‚É£  Negative Values (invalid for ratios):")print(f"    Treatment: {n_negative_xe} negative values ({(n_negative_xe/len(data_filtered))*100:.1f}%)")print(f"    Control:   {n_negative_xc} negative values ({(n_negative_xc/len(data_filtered))*100:.1f}%)")if has_negative_xe or has_negative_xc:    print(f"    ‚Üí Ratio measures NOT applicable ‚ùå")    print(f"    ‚Üí Standardized mean differences required ‚úì")else:    print(f"    ‚Üí All values positive (ratio measures possible) ‚úì")# Characteristic 3: Zero values (problematic for log ratios)has_zero_xe = (data_filtered['xe'] == 0).any()has_zero_xc = (data_filtered['xc'] == 0).any()n_zero_xe = (data_filtered['xe'] == 0).sum()n_zero_xc = (data_filtered['xc'] == 0).sum()detection_results['zero_values'] = {    'has_zero_xe': has_zero_xe,    'has_zero_xc': has_zero_xc,    'n_zero_xe': n_zero_xe,    'n_zero_xc': n_zero_xc}print(f"\n3Ô∏è‚É£  Zero Values (problematic for log ratios):")print(f"    Treatment: {n_zero_xe} zeros ({(n_zero_xe/len(data_filtered))*100:.1f}%)")print(f"    Control:   {n_zero_xc} zeros ({(n_zero_xc/len(data_filtered))*100:.1f}%)")if has_zero_xe or has_zero_xc:    print(f"    ‚Üí Warning: Zero values will need special handling for lnRR ‚ö†")else:    print(f"    ‚Üí No zeros detected ‚úì")# Characteristic 4: Scale heterogeneityxe_range = xe_stats['max'] - xe_stats['min']xc_range = xc_stats['max'] - xc_stats['min']scale_ratio = max(xe_range, xc_range) / (min(xe_range, xc_range) + 0.0001)# Calculate coefficient of variationxe_cv = (xe_stats['std'] / xe_stats['mean']) * 100 if xe_stats['mean'] != 0 else np.infxc_cv = (xc_stats['std'] / xc_stats['mean']) * 100 if xc_stats['mean'] != 0 else np.infdetection_results['scale_heterogeneity'] = {    'xe_range': xe_range,    'xc_range': xc_range,    'scale_ratio': scale_ratio,    'xe_cv': xe_cv,    'xc_cv': xc_cv}print(f"\n4Ô∏è‚É£  Scale Heterogeneity:")print(f"    Treatment range: {xe_range:.4f}")print(f"    Control range:   {xc_range:.4f}")print(f"    Range ratio:     {scale_ratio:.2f}√ó")print(f"    Treatment CV:    {xe_cv:.1f}%")print(f"    Control CV:      {xc_cv:.1f}%")if scale_ratio > 100:    print(f"    ‚Üí Very high heterogeneity - ratio measures recommended ‚úì")elif scale_ratio > 10:    print(f"    ‚Üí Moderate heterogeneity - ratio measures beneficial ‚ö†")else:    print(f"    ‚Üí Low heterogeneity - standardized differences work well ‚úì")# Characteristic 5: Order of magnitudexe_magnitude = np.log10(xe_stats['mean']) if xe_stats['mean'] > 0 else Nonexc_magnitude = np.log10(xc_stats['mean']) if xc_stats['mean'] > 0 else Nonedetection_results['order_of_magnitude'] = {    'xe_magnitude': xe_magnitude,    'xc_magnitude': xc_magnitude}print(f"\n5Ô∏è‚É£  Order of Magnitude:")if xe_magnitude is not None and xc_magnitude is not None:    print(f"    Treatment: 10^{xe_magnitude:.2f} (mean = {xe_stats['mean']:.4f})")    print(f"    Control:   10^{xc_magnitude:.2f} (mean = {xc_stats['mean']:.4f})")    if abs(xe_magnitude) > 2 or abs(xc_magnitude) > 2:        print(f"    ‚Üí Large values suggest ratio-scale data ‚úì")else:    print(f"    ‚Üí Cannot calculate (zero or negative values present)")# Characteristic 6: Ratio of meansif xc_stats['mean'] > 0 and xe_stats['mean'] > 0:    mean_ratio = xe_stats['mean'] / xc_stats['mean']    detection_results['mean_ratio'] = mean_ratio    print(f"\n6Ô∏è‚É£  Treatment/Control Ratio:")    print(f"    Ratio of means: {mean_ratio:.4f}")    if 0.8 < xc_stats['mean'] < 1.2:        print(f"    Control near 1.0 suggests fold-change data ‚úì")else:    detection_results['mean_ratio'] = None    print(f"\n6Ô∏è‚É£  Treatment/Control Ratio:")    print(f"    ‚Üí Cannot calculate (zero or negative means)")# --- STEP 3: RECOMMENDATION ENGINE ---print("\n" + "="*70)print("STEP 3: EFFECT SIZE RECOMMENDATION")print("="*70)recommendation_reasons = []score_lnRR = 0score_hedges_g = 0confidence_factors = []# Decision Rule 1: Negative valuesif has_negative_xe or has_negative_xc:    score_hedges_g += 10  # Strong preference    recommendation_reasons.append({        'factor': 'Negative values present',        'weight': '+++',        'favors': 'Hedges g',        'explanation': 'Ratio measures cannot handle negative values'    })    confidence_factors.append('negative_values')else:    score_lnRR += 2    recommendation_reasons.append({        'factor': 'All positive values',        'weight': '+',        'favors': 'lnRR',        'explanation': 'Compatible with ratio measures'    })# Decision Rule 2: Control normalizationif pct_control_exactly_one > 50:    score_lnRR += 5    recommendation_reasons.append({        'factor': f'{pct_control_exactly_one:.1f}% controls = 1.0',        'weight': '+++',        'favors': 'lnRR',        'explanation': 'Strong evidence of fold-change normalization'    })    confidence_factors.append('fold_change_normalization')elif pct_control_near_one > 30:    score_lnRR += 3    recommendation_reasons.append({        'factor': f'{pct_control_near_one:.1f}% controls ‚âà 1.0',        'weight': '++',        'favors': 'lnRR',        'explanation': 'Evidence of fold-change normalization'    })elif 0.8 < xc_stats['mean'] < 1.2:    score_lnRR += 1    recommendation_reasons.append({        'factor': 'Mean control ‚âà 1.0',        'weight': '+',        'favors': 'lnRR',        'explanation': 'Control centered near unity'    })# Decision Rule 3: Scale heterogeneityif scale_ratio > 100:    score_lnRR += 3    recommendation_reasons.append({        'factor': f'Scale ratio {scale_ratio:.0f}√ó',        'weight': '+++',        'favors': 'lnRR',        'explanation': 'Very high heterogeneity across studies'    })    confidence_factors.append('scale_heterogeneity')elif scale_ratio > 10:    score_lnRR += 2    recommendation_reasons.append({        'factor': f'Scale ratio {scale_ratio:.1f}√ó',        'weight': '++',        'favors': 'lnRR',        'explanation': 'Moderate scale heterogeneity'    })else:    score_hedges_g += 1    recommendation_reasons.append({        'factor': f'Scale ratio {scale_ratio:.1f}√ó',        'weight': '+',        'favors': 'Hedges g',        'explanation': 'Low scale heterogeneity'    })# Decision Rule 4: Zero valuesif has_zero_xe or has_zero_xc:    score_hedges_g += 2    recommendation_reasons.append({        'factor': 'Zero values present',        'weight': '++',        'favors': 'Hedges g',        'explanation': 'Zero values problematic for log ratios'    })    confidence_factors.append('zero_values')# Decision Rule 5: Standard deviationsif sd_pct > 80:    score_hedges_g += 1    recommendation_reasons.append({        'factor': f'{sd_pct:.1f}% have SD data',        'weight': '+',        'favors': 'Hedges g',        'explanation': 'Excellent SD coverage for standardized differences'    })elif sd_pct < 20:    recommendation_reasons.append({        'factor': f'Only {sd_pct:.1f}% have SD data',        'weight': '‚ö†',        'favors': 'Neither',        'explanation': 'Limited SD data may require mean-only methods'    })# --- STEP 4: DISPLAY RECOMMENDATION ANALYSIS ---print("\nüìã Decision Factors:")print(f"  {'Factor':<40} {'Weight':<8} {'Favors':<12} Explanation")print(f"  {'-'*40} {'-'*8} {'-'*12} {'-'*40}")for reason in recommendation_reasons:    print(f"  {reason['factor']:<40} {reason['weight']:<8} {reason['favors']:<12} {reason['explanation']}")print(f"\nüìä Recommendation Scores:")print(f"  log Response Ratio (lnRR): {score_lnRR:>3} points")print(f"  Hedges' g (SMD):           {score_hedges_g:>3} points")# Determine recommendationscore_diff = abs(score_lnRR - score_hedges_g)if score_lnRR > score_hedges_g:    recommended_type = 'lnRR'    confidence = "High" if score_diff >= 5 else "Moderate" if score_diff >= 3 else "Low"elif score_hedges_g > score_lnRR:    recommended_type = 'hedges_g'    confidence = "High" if score_diff >= 5 else "Moderate" if score_diff >= 3 else "Low"else:    recommended_type = 'hedges_g'  # Default to Hedges' g in case of tie    confidence = "Low"# Store detection metadataDETECTION_METADATA = {    'timestamp': datetime.datetime.now(),    'detection_results': detection_results,    'recommendation_reasons': recommendation_reasons,    'scores': {        'lnRR': score_lnRR,        'hedges_g': score_hedges_g    },    'recommended_type': recommended_type,    'confidence': confidence,    'confidence_factors': confidence_factors}# --- STEP 5: DISPLAY RECOMMENDATION ---print("\n" + "="*70)print("RECOMMENDATION")print("="*70)# Create recommendation HTML based on resultif recommended_type == 'lnRR':    recommendation_color = '#d4edda'    recommendation_border = '#28a745'    recommendation_text_color = '#155724'    recommendation_title = "‚úì RECOMMENDED: log Response Ratio (lnRR)"    recommendation_body = f"""        <p><b>Confidence: {confidence}</b> (Score: {score_lnRR} vs {score_hedges_g})</p>        <p>Your data shows characteristics of <b>ratio-based measurements</b> (e.g., gene expression        fold-changes, relative abundances, growth rates, or other multiplicative scales).</p>        <p><b>Why lnRR is appropriate:</b></p>        <ul>            <li>Works with ratio/multiplicative scales</li>            <li>Natural for fold-change data (control = 1.0)</li>            <li>Handles scale heterogeneity well</li>            <li>Direct biological interpretation as fold-changes</li>            <li>Symmetric around no effect (lnRR = 0)</li>        </ul>        <p><b>Interpretation guide:</b></p>        <ul>            <li>lnRR = 0 ‚Üí No change (RR = 1)</li>            <li>lnRR = 0.69 ‚Üí 2-fold increase (RR = 2)</li>            <li>lnRR = -0.69 ‚Üí 2-fold decrease (RR = 0.5)</li>        </ul>        {"<p><b>‚ö† Note:</b> Zero values detected will be handled with small constant addition.</p>" if (has_zero_xe or has_zero_xc) else ""}    """else:    recommendation_color = '#d1ecf1'    recommendation_border = '#17a2b8'    recommendation_text_color = '#0c5460'    recommendation_title = "‚úì RECOMMENDED: Hedges' g (Standardized Mean Difference)"    recommendation_body = f"""        <p><b>Confidence: {confidence}</b> (Score: {score_hedges_g} vs {score_lnRR})</p>        <p>Your data shows characteristics of <b>absolute measurements</b> with potentially        different scales or units across studies.</p>        <p><b>Why Hedges' g is appropriate:</b></p>        <ul>            <li>Standardizes effects across different measurement scales</li>            <li>Handles negative values naturally</li>            <li>Includes small-sample bias correction</li>            <li>Widely used and interpretable</li>            <li>Comparable across different metrics</li>        </ul>        <p><b>Interpretation guide (Cohen's benchmarks):</b></p>        <ul>            <li>|g| < 0.2 ‚Üí Negligible effect</li>            <li>|g| ‚âà 0.2-0.5 ‚Üí Small effect</li>            <li>|g| ‚âà 0.5-0.8 ‚Üí Medium effect</li>            <li>|g| > 0.8 ‚Üí Large effect</li>        </ul>        <p><b>Note:</b> Standard deviations available for {sd_pct:.1f}% of observations.</p>    """recommendation_html = f"""<div style='background-color: {recommendation_color}; border: 2px solid {recommendation_border};            padding: 20px; border-radius: 8px; margin: 15px 0;'>    <h3 style='color: {recommendation_text_color}; margin-top: 0;'>{recommendation_title}</h3>    <div style='color: {recommendation_text_color};'>        {recommendation_body}    </div></div>"""display(HTML(recommendation_html))# --- STEP 6: CREATE SELECTION WIDGET ---print("\n" + "="*70)print("STEP 4: EFFECT SIZE SELECTION")print("="*70)effect_size_widget = widgets.RadioButtons(    options=[        ('log Response Ratio (lnRR) - for ratio/fold-change data', 'lnRR'),        ("Hedges' g - for standardized mean differences (small-sample corrected)", 'hedges_g'),        ("Cohen's d - for standardized mean differences (no correction)", 'cohen_d'),        ('log Odds Ratio (logOR) - for binary outcomes', 'log_or')    ],    value=recommended_type,    description='Effect Size:',    style={'description_width': 'initial'},    layout=widgets.Layout(width='650px'))# Information panels for each effect size typeinfo_panels = {    'lnRR': """    <div style='background-color: #f8f9fa; padding: 15px; border-radius: 5px; border-left: 4px solid #28a745;'>        <h4 style='margin-top: 0; color: #28a745;'>üìä log Response Ratio (lnRR)</h4>        <p><b>Formula:</b> lnRR = ln(xÃÑ‚Çë / xÃÑ‚Çú)</p>        <p><b>Variance:</b> Var(lnRR) = SD¬≤‚Çë/(n‚Çë¬∑xÃÑ¬≤‚Çë) + SD¬≤‚Çú/(n‚Çú¬∑xÃÑ¬≤‚Çú)</p>        <p><b>Interpretation:</b></p>        <table style='width: 100%; border-collapse: collapse;'>            <tr style='background: #e9ecef;'>                <th style='padding: 8px; text-align: left;'>lnRR</th>                <th style='padding: 8px; text-align: left;'>Response Ratio</th>                <th style='padding: 8px; text-align: left;'>Meaning</th>            </tr>            <tr><td style='padding: 8px;'>0</td><td style='padding: 8px;'>1.0</td><td style='padding: 8px;'>No change</td></tr>            <tr><td style='padding: 8px;'>+0.69</td><td style='padding: 8px;'>2.0</td><td style='padding: 8px;'>2√ó increase (doubled)</td></tr>            <tr><td style='padding: 8px;'>-0.69</td><td style='padding: 8px;'>0.5</td><td style='padding: 8px;'>2√ó decrease (halved)</td></tr>            <tr><td style='padding: 8px;'>+1.10</td><td style='padding: 8px;'>3.0</td><td style='padding: 8px;'>3√ó increase (tripled)</td></tr>        </table>        <p><b>Best for:</b> Gene expression, abundances, concentrations, rates, any multiplicative data</p>        <p><b>Conversion:</b> Response Ratio (RR) = exp(lnRR), % Change = (RR - 1) √ó 100%</p>        <p><b>Requirements:</b> All values must be positive (xe, xc > 0)</p>    </div>    """,    'hedges_g': """    <div style='background-color: #f8f9fa; padding: 15px; border-radius: 5px; border-left: 4px solid #17a2b8;'>        <h4 style='margin-top: 0; color: #17a2b8;'>üìä Hedges' g (Standardized Mean Difference)</h4>        <p><b>Formula:</b> g = [(xÃÑ‚Çë - xÃÑ‚Çú) / SD‚Çö‚Çí‚Çí‚Çó‚Çëùíπ] √ó J</p>        <p>Where J = 1 - 3/(4df - 1) is the small-sample correction factor</p>        <p><b>Variance:</b> Vg = [(n‚Çë+n‚Çú)/(n‚Çë¬∑n‚Çú) + g¬≤/(2(n‚Çë+n‚Çú))] √ó J¬≤</p>        <p><b>Interpretation (Cohen's benchmarks):</b></p>        <table style='width: 100%; border-collapse: collapse;'>            <tr style='background: #e9ecef;'>                <th style='padding: 8px; text-align: left;'>|g|</th>                <th style='padding: 8px; text-align: left;'>Effect Size</th>                <th style='padding: 8px; text-align: left;'>Description</th>            </tr>            <tr><td style='padding: 8px;'>< 0.2</td><td style='padding: 8px;'>Negligible</td><td style='padding: 8px;'>Trivial difference</td></tr>            <tr><td style='padding: 8px;'>0.2 - 0.5</td><td style='padding: 8px;'>Small</td><td style='padding: 8px;'>Noticeable but small</td></tr>            <tr><td style='padding: 8px;'>0.5 - 0.8</td><td style='padding: 8px;'>Medium</td><td style='padding: 8px;'>Moderate difference</td></tr>            <tr><td style='padding: 8px;'>> 0.8</td><td style='padding: 8px;'>Large</td><td style='padding: 8px;'>Substantial difference</td></tr>        </table>        <p><b>Best for:</b> Standardizing effects across different measurement scales</p>        <p><b>Note:</b> Preferred over Cohen's d for small samples (reduces bias)</p>        <p><b>Requirements:</b> Need standard deviations (SDs) for accurate calculation</p>    </div>    """,    'cohen_d': """    <div style='background-color: #f8f9fa; padding: 15px; border-radius: 5px; border-left: 4px solid #6c757d;'>        <h4 style='margin-top: 0; color: #6c757d;'>üìä Cohen's d (Standardized Mean Difference)</h4>        <p><b>Formula:</b> d = (xÃÑ‚Çë - xÃÑ‚Çú) / SD‚Çö‚Çí‚Çí‚Çó‚Çëùíπ</p>        <p><b>Variance:</b> Vd = (n‚Çë+n‚Çú)/(n‚Çë¬∑n‚Çú) + d¬≤/(2(n‚Çë+n‚Çú))</p>        <p><b>Interpretation:</b> Same as Hedges' g (Cohen's benchmarks apply)</p>        <p><b>Difference from Hedges' g:</b></p>        <ul>            <li>No small-sample correction (J factor = 1)</li>            <li>Slightly biased upward for small samples</li>            <li>Bias negligible when n > 20 per group</li>        </ul>        <p><b>Best for:</b> Large samples where bias correction is unnecessary</p>        <p><b>When to use:</b> Historical comparisons, large meta-analyses (n > 20/group)</p>        <p><b>Note:</b> Hedges' g is generally preferred in modern meta-analysis</p>    </div>    """,    'log_or': """    <div style='background-color: #f8f9fa; padding: 15px; border-radius: 5px; border-left: 4px solid #ffc107;'>        <h4 style='margin-top: 0; color: #856404;'>üìä log Odds Ratio (logOR)</h4>        <p><b>Formula:</b> logOR = ln[(a‚Çë¬∑d‚Çú) / (b‚Çë¬∑c‚Çú)]</p>        <p>For 2√ó2 table: [a‚Çë, b‚Çë] = [successes, failures] in treatment</p>        <p>                [c‚Çú, d‚Çú] = [successes, failures] in control</p>        <p><b>Variance:</b> Var(logOR) = 1/a‚Çë + 1/b‚Çë + 1/c‚Çú + 1/d‚Çú</p>        <p><b>Interpretation:</b></p>        <table style='width: 100%; border-collapse: collapse;'>            <tr style='background: #e9ecef;'>                <th style='padding: 8px; text-align: left;'>logOR</th>                <th style='padding: 8px; text-align: left;'>Odds Ratio</th>                <th style='padding: 8px; text-align: left;'>Meaning</th>            </tr>            <tr><td style='padding: 8px;'>0</td><td style='padding: 8px;'>1.0</td><td style='padding: 8px;'>No association</td></tr>            <tr><td style='padding: 8px;'>> 0</td><td style='padding: 8px;'>> 1.0</td><td style='padding: 8px;'>Positive association</td></tr>            <tr><td style='padding: 8px;'>< 0</td><td style='padding: 8px;'>< 1.0</td><td style='padding: 8px;'>Negative association</td></tr>            <tr><td style='padding: 8px;'>+0.69</td><td style='padding: 8px;'>2.0</td><td style='padding: 8px;'>2√ó higher odds</td></tr>        </table>        <p><b>Best for:</b> Binary outcomes (success/failure, disease/healthy, present/absent)</p>        <p><b>Conversion:</b> Odds Ratio (OR) = exp(logOR)</p>        <p><b>Requirements:</b> Count data for binary outcomes in 2√ó2 contingency tables</p>        <p><b>Note:</b> Zero cells typically handled with continuity correction (+0.5)</p>    </div>    """}info_output = widgets.Output()def update_info_panel(change):    """Update information panel when selection changes"""    with info_output:        clear_output()        display(HTML(info_panels[change['new']]))effect_size_widget.observe(update_info_panel, names='value')# Initialize with recommended type infowith info_output:    display(HTML(info_panels[recommended_type]))# Proceed buttonproceed_button = widgets.Button(    description='‚úì Confirm Selection & Calculate Effect Sizes',    button_style='success',    layout=widgets.Layout(width='450px', height='50px'),    style={'font_weight': 'bold'})proceed_output = widgets.Output()def on_proceed_clicked(b):    """Save selection and proceed"""    with proceed_output:        clear_output()        selected_type = effect_size_widget.value        print("\n" + "="*70)        print("EFFECT SIZE CONFIGURATION CONFIRMED")        print("="*70)        # Map selection to display name        type_names = {            'lnRR': 'log Response Ratio (lnRR)',            'hedges_g': "Hedges' g",            'cohen_d': "Cohen's d",            'log_or': 'log Odds Ratio (logOR)'        }        print(f"\n‚úì Selected: {type_names[selected_type]}")        # Show if different from recommendation        if selected_type != recommended_type:            print(f"\n‚ö†Ô∏è  Note: You selected {type_names[selected_type]}")            print(f"    Recommendation was: {type_names[recommended_type]} ({confidence} confidence)")            print(f"    Your selection will be used for the analysis.")        else:            print(f"\n‚úì Selection matches recommendation ({confidence} confidence)")        # Configuration for each effect size type        es_configs = {            'lnRR': {                'effect_col': 'lnRR',                'var_col': 'var_lnRR',                'se_col': 'SE_lnRR',                'ci_lower_col': 'CI_lower_lnRR',                'ci_upper_col': 'CI_upper_lnRR',                'effect_label': 'log Response Ratio',                'effect_label_short': 'lnRR',                'has_fold_change': True,                'fold_change_col': 'Response_Ratio',                'percent_change_col': 'Percent_Change',                'null_value': 0,                'scale': 'log',                'allows_negative': False,                'allows_zero': False            },            'hedges_g': {                'effect_col': 'hedges_g',                'var_col': 'Vg',                'se_col': 'SE_g',                'ci_lower_col': 'CI_lower_g',                'ci_upper_col': 'CI_upper_g',                'effect_label': "Hedges' g",                'effect_label_short': 'g',                'has_fold_change': False,                'null_value': 0,                'scale': 'standardized',                'allows_negative': True,                'allows_zero': True,                'correction_factor': 'J'            },            'cohen_d': {                'effect_col': 'cohen_d',                'var_col': 'Vd',                'se_col': 'SE_d',                'ci_lower_col': 'CI_lower_d',                'ci_upper_col': 'CI_upper_d',                'effect_label': "Cohen's d",                'effect_label_short': 'd',                'has_fold_change': False,                'null_value': 0,                'scale': 'standardized',                'allows_negative': True,                'allows_zero': True,                'correction_factor': None            },            'log_or': {                'effect_col': 'log_OR',                'var_col': 'var_log_OR',                'se_col': 'SE_log_OR',                'ci_lower_col': 'CI_lower_log_OR',                'ci_upper_col': 'CI_upper_log_OR',                'effect_label': 'log Odds Ratio',                'effect_label_short': 'logOR',                'has_fold_change': True,                'fold_change_col': 'Odds_Ratio',                'null_value': 0,                'scale': 'log',                'allows_negative': False,                'allows_zero': False,                'requires_binary': True            }        }        # Save to ANALYSIS_CONFIG        ANALYSIS_CONFIG['effect_size_type'] = selected_type        ANALYSIS_CONFIG['es_config'] = es_configs[selected_type]        ANALYSIS_CONFIG['detection_metadata'] = DETECTION_METADATA        print(f"\nüìã Configuration Details:")        print(f"  Effect size column:      {ANALYSIS_CONFIG['es_config']['effect_col']}")        print(f"  Variance column:         {ANALYSIS_CONFIG['es_config']['var_col']}")        print(f"  Standard error column:   {ANALYSIS_CONFIG['es_config']['se_col']}")        print(f"  Effect label:            {ANALYSIS_CONFIG['es_config']['effect_label']}")        print(f"  Null hypothesis value:   {ANALYSIS_CONFIG['es_config']['null_value']}")        print(f"  Scale type:              {ANALYSIS_CONFIG['es_config']['scale']}")        print(f"  Allows negative values:  {ANALYSIS_CONFIG['es_config']['allows_negative']}")        if ANALYSIS_CONFIG['es_config']['has_fold_change']:            print(f"  Fold-change available:   Yes")            print(f"    - Column: {ANALYSIS_CONFIG['es_config']['fold_change_col']}")            if 'percent_change_col' in ANALYSIS_CONFIG['es_config']:                print(f"    - % Change: {ANALYSIS_CONFIG['es_config']['percent_change_col']}")        # Data compatibility check        print(f"\nüîç Data Compatibility Check:")        if selected_type == 'lnRR':            if has_negative_xe or has_negative_xc:                print(f"  ‚ùå ERROR: lnRR requires all positive values")                print(f"     Found {n_negative_xe + n_negative_xc} negative values")                print(f"     Please select Hedges' g or Cohen's d instead")                return            if has_zero_xe or has_zero_xc:                print(f"  ‚ö†Ô∏è  Warning: {n_zero_xe + n_zero_xc} zero values found")                print(f"     Small constant (0.001) will be added to avoid log(0)")            else:                print(f"  ‚úì All values positive and non-zero")        elif selected_type in ['hedges_g', 'cohen_d']:            if sd_pct < 50:                print(f"  ‚ö†Ô∏è  Warning: Only {sd_pct:.1f}% of observations have SD data")                print(f"     Effect size calculation may be limited")            else:                print(f"  ‚úì {sd_pct:.1f}% of observations have complete SD data")        elif selected_type == 'log_or':            print(f"  ‚ö†Ô∏è  Note: Assumes binary outcome data")            print(f"     Ensure xe/xc represent event counts")        print(f"\n" + "="*70)        print("‚úÖ CONFIGURATION COMPLETE")        print("="*70)        print(f"\n‚ñ∂Ô∏è  Next Steps:")        print(f"  1. Review the configuration above")        print(f"  2. Run the next cell to calculate effect sizes")        print(f"  3. Effect sizes will be calculated for {len(data_filtered)} observations")        print(f"\nüí° Tip: If you need to change the effect size type, modify the")        print(f"    selection above and click Confirm again before proceeding.")        print("\n" + "="*70)proceed_button.on_click(on_proceed_clicked)# --- ASSEMBLE WIDGET DISPLAY ---display(widgets.VBox([    widgets.HTML("<hr style='margin: 20px 0; border: none; border-top: 2px solid #ddd;'>"),    widgets.HTML("<h3 style='color: #2E86AB;'>üìä Select Effect Size Type</h3>"),    widgets.HTML("<p style='color: #666;'><i>Choose the effect size metric for your meta-analysis. "                 "The recommendation is pre-selected but you can override it if needed.</i></p>"),    effect_size_widget,    info_output,    widgets.HTML("<hr style='margin: 20px 0; border: none; border-top: 2px solid #ddd;'>"),    proceed_button,    proceed_output]))# --- FINAL STATUS ---print("\n" + "="*70)print("‚úì Effect size detection and selection interface ready")print("="*70)print("\nüëÜ INSTRUCTIONS:")print("  1. Review the recommendation above (based on data characteristics)")print("  2. Select your preferred effect size type (or keep recommendation)")print("  3. Review the detailed information for your selected type")print("  4. Click 'Confirm Selection & Calculate Effect Sizes' to proceed")print("\n" + "="*70)# Store summary for downstream useEFFECT_SIZE_SELECTION_SUMMARY = {    'timestamp': datetime.datetime.now(),    'data_characteristics': {        'n_observations': len(data_filtered),        'n_studies': data_filtered['id'].nunique(),        'control_normalization_pct': pct_control_exactly_one,        'has_negative_values': has_negative_xe or has_negative_xc,        'has_zero_values': has_zero_xe or has_zero_xc,        'scale_ratio': scale_ratio,        'sd_availability_pct': sd_pct    },    'recommendation': {        'type': recommended_type,        'confidence': confidence,        'score_lnRR': score_lnRR,        'score_hedges_g': score_hedges_g,        'key_factors': confidence_factors    }}print(f"\nüìä Summary stored in EFFECT_SIZE_SELECTION_SUMMARY and DETECTION_METADATA")

In [None]:
# ‚ö†Ô∏è PREREQUISITES:# - Must complete configuration first# - Ensure required columns are present in your data## Expected runtime: < 5 seconds for most datasets# For large datasets (n > 500): up to 30 seconds## INTERPRETATION NOTE:# - Effect sizes will be calculated based on your selected method# - Check for warnings about missing data or calculation errors# Hedges' g and Cohen's d Calculation# Methods from Hedges & Olkin (1985). Statistical methods for meta-analysis.# Academic Press.## Cohen's d = (M1 - M2) / SD_pooled# Hedges' g = d * (1 - 3/(4*df - 1))  # Small sample correction## These are standardized mean differences, common in psychology and medicine.# Log Response Ratio (lnRR) Calculation# Method from Hedges et al. (1999) and Lajeunesse (2011).# Lajeunesse, M.J. (2011). On the meta-analysis of response ratios for studies with# correlated and multi-group designs. Ecology, 92(11), 2049-2055.## lnRR = ln(mean_treatment / mean_control)# Commonly used in ecology and environmental sciences.#@title üßÆ CALCULATE EFFECT SIZES#@title üßÆ CALCULATE EFFECT SIZES# =============================================================================# CELL 5: EFFECT SIZE CALCULATION# Purpose: Calculate effect sizes, variances, and weights for meta-analysis# Dependencies: Cell 4 (ANALYSIS_CONFIG, data_filtered)# Outputs: data_filtered with effect sizes, EFFECT_SIZE_METADATA# =============================================================================print("\n" + "="*70)print("EFFECT SIZE CALCULATION")print("="*70)print(f"Timestamp: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")# --- STEP 1: LOAD CONFIGURATION ---print("\n" + "="*70)print("STEP 1: LOADING CONFIGURATION")print("="*70)try:    effect_size_type = ANALYSIS_CONFIG['effect_size_type']    es_config = ANALYSIS_CONFIG['es_config']    print(f"‚úì Configuration loaded successfully")    print(f"  Effect size type: {es_config['effect_label']} ({es_config['effect_label_short']})")    print(f"  Scale: {es_config['scale']}")    print(f"  Allows negatives: {es_config['allows_negative']}")    print(f"  Null value: {es_config['null_value']}")except KeyError as e:    print(f"‚ùå ERROR: Configuration not found - {e}")    print("\nTroubleshooting:")    print("  1. Ensure Cell 4 (effect size selection) was run successfully")    print("  2. Check that you clicked 'Confirm Selection' button")    print("  3. Verify ANALYSIS_CONFIG exists with 'effect_size_type' key")    raise# Store initial dataset sizeinitial_obs = len(data_filtered)initial_papers = data_filtered['id'].nunique()print(f"\nüìä Input Dataset:")print(f"  Observations: {initial_obs}")print(f"  Papers: {initial_papers}")# --- STEP 2: VERIFY REQUIRED DATA COLUMNS ---print("\n" + "="*70)print("STEP 2: DATA VALIDATION")print("="*70)required_for_calculation = ['xe', 'sde', 'ne', 'xc', 'sdc', 'nc']missing_cols = [col for col in required_for_calculation if col not in data_filtered.columns]if missing_cols:    print(f"‚ùå ERROR: Missing required columns: {missing_cols}")    raise ValueError(f"Missing required columns: {missing_cols}")print(f"‚úì All required columns present")# Check data availabilitydata_availability = {}for col in required_for_calculation:    n_valid = data_filtered[col].notna().sum()    pct_valid = (n_valid / len(data_filtered)) * 100    data_availability[col] = {'valid': n_valid, 'pct': pct_valid}    print(f"  ‚Ä¢ {col}: {n_valid}/{len(data_filtered)} valid ({pct_valid:.1f}%)")# --- STEP 3: HANDLE ZERO/MISSING STANDARD DEVIATIONS ---print("\n" + "="*70)print("STEP 3: STANDARD DEVIATION IMPUTATION")print("="*70)print("üîß Processing standard deviations...")# Track imputation statisticsimputation_log = {    'method': 'median_cv',    'sde_zeros': 0,    'sdc_zeros': 0,    'sde_missing': 0,    'sdc_missing': 0,    'sde_imputed': 0,    'sdc_imputed': 0}# Count initial issuesimputation_log['sde_zeros'] = (data_filtered['sde'] == 0).sum()imputation_log['sdc_zeros'] = (data_filtered['sdc'] == 0).sum()imputation_log['sde_missing'] = data_filtered['sde'].isna().sum()imputation_log['sdc_missing'] = data_filtered['sdc'].isna().sum()print(f"\nüìã Initial SD Status:")print(f"  Experimental (sde):")print(f"    ‚Ä¢ Zero values:    {imputation_log['sde_zeros']}")print(f"    ‚Ä¢ Missing values: {imputation_log['sde_missing']}")print(f"    ‚Ä¢ Total issues:   {imputation_log['sde_zeros'] + imputation_log['sde_missing']}")print(f"  Control (sdc):")print(f"    ‚Ä¢ Zero values:    {imputation_log['sdc_zeros']}")print(f"    ‚Ä¢ Missing values: {imputation_log['sdc_missing']}")print(f"    ‚Ä¢ Total issues:   {imputation_log['sdc_zeros'] + imputation_log['sdc_missing']}")# Replace zeros with NaN for proper imputationdata_filtered['sde'] = data_filtered['sde'].replace(0, np.nan)data_filtered['sdc'] = data_filtered['sdc'].replace(0, np.nan)# Calculate Coefficient of Variation (CV = SD/Mean) for imputationprint(f"\nüî¨ Calculating Coefficient of Variation (CV)...")data_filtered['cv_e'] = np.nandata_filtered['cv_c'] = np.nan# Calculate CV only for valid entries (non-missing SD, positive mean)valid_cv_e = (data_filtered['sde'] > 0) & (data_filtered['xe'] > 0)valid_cv_c = (data_filtered['sdc'] > 0) & (data_filtered['xc'] > 0)data_filtered.loc[valid_cv_e, 'cv_e'] = data_filtered.loc[valid_cv_e, 'sde'] / data_filtered.loc[valid_cv_e, 'xe']data_filtered.loc[valid_cv_c, 'cv_c'] = data_filtered.loc[valid_cv_c, 'sdc'] / data_filtered.loc[valid_cv_c, 'xc']# Use MEDIAN CV for robustness (less sensitive to outliers than mean)median_cv_e = data_filtered['cv_e'].median()median_cv_c = data_filtered['cv_c'].median()mean_cv_e = data_filtered['cv_e'].mean()mean_cv_c = data_filtered['cv_c'].mean()print(f"\n  CV Statistics (Experimental):")print(f"    ‚Ä¢ Valid CVs:   {valid_cv_e.sum()}/{len(data_filtered)} ({(valid_cv_e.sum()/len(data_filtered))*100:.1f}%)")print(f"    ‚Ä¢ Median CV:   {median_cv_e:.4f}")print(f"    ‚Ä¢ Mean CV:     {mean_cv_e:.4f}")print(f"    ‚Ä¢ Min CV:      {data_filtered['cv_e'].min():.4f}")print(f"    ‚Ä¢ Max CV:      {data_filtered['cv_e'].max():.4f}")print(f"\n  CV Statistics (Control):")print(f"    ‚Ä¢ Valid CVs:   {valid_cv_c.sum()}/{len(data_filtered)} ({(valid_cv_c.sum()/len(data_filtered))*100:.1f}%)")print(f"    ‚Ä¢ Median CV:   {median_cv_c:.4f}")print(f"    ‚Ä¢ Mean CV:     {mean_cv_c:.4f}")print(f"    ‚Ä¢ Min CV:      {data_filtered['cv_c'].min():.4f}")print(f"    ‚Ä¢ Max CV:      {data_filtered['cv_c'].max():.4f}")# Store CV statisticsimputation_log['median_cv_e'] = median_cv_eimputation_log['median_cv_c'] = median_cv_cimputation_log['mean_cv_e'] = mean_cv_eimputation_log['mean_cv_c'] = mean_cv_cimputation_log['n_valid_cv_e'] = valid_cv_e.sum()imputation_log['n_valid_cv_c'] = valid_cv_c.sum()# Create imputed SD columnsprint(f"\nüîß Applying imputation...")data_filtered['sde_imputed'] = data_filtered['sde'].copy()data_filtered['sdc_imputed'] = data_filtered['sdc'].copy()# Track which rows were imputeddata_filtered['sde_was_imputed'] = Falsedata_filtered['sdc_was_imputed'] = False# Impute experimental groupimpute_e = (data_filtered['sde_imputed'].isna()) & (data_filtered['xe'] > 0)n_imputed_e = impute_e.sum()if n_imputed_e > 0 and pd.notna(median_cv_e):    data_filtered.loc[impute_e, 'sde_imputed'] = median_cv_e * data_filtered.loc[impute_e, 'xe']    data_filtered.loc[impute_e, 'sde_was_imputed'] = True    imputation_log['sde_imputed'] = n_imputed_e    print(f"  ‚úì Imputed {n_imputed_e} experimental SDs using median CV method")    print(f"    Formula: SD_imputed = {median_cv_e:.4f} √ó mean")elif n_imputed_e > 0:    print(f"  ‚ö†Ô∏è  Warning: {n_imputed_e} experimental SDs need imputation but CV unavailable")# Impute control groupimpute_c = (data_filtered['sdc_imputed'].isna()) & (data_filtered['xc'] > 0)n_imputed_c = impute_c.sum()if n_imputed_c > 0 and pd.notna(median_cv_c):    data_filtered.loc[impute_c, 'sdc_imputed'] = median_cv_c * data_filtered.loc[impute_c, 'xc']    data_filtered.loc[impute_c, 'sdc_was_imputed'] = True    imputation_log['sdc_imputed'] = n_imputed_c    print(f"  ‚úì Imputed {n_imputed_c} control SDs using median CV method")    print(f"    Formula: SD_imputed = {median_cv_c:.4f} √ó mean")elif n_imputed_c > 0:    print(f"  ‚ö†Ô∏è  Warning: {n_imputed_c} control SDs need imputation but CV unavailable")# Final check for remaining issuesremaining_issues_e = (data_filtered['sde_imputed'].isna()) | (data_filtered['sde_imputed'] <= 0)remaining_issues_c = (data_filtered['sdc_imputed'].isna()) | (data_filtered['sdc_imputed'] <= 0)remaining_issues = remaining_issues_e | remaining_issues_cif remaining_issues.any():    n_issues = remaining_issues.sum()    print(f"\n  ‚ö†Ô∏è  WARNING: {n_issues} observations still have invalid SDs after imputation")    print(f"    These observations will be removed from analysis")    # Show details    print(f"\n    Breakdown:")    print(f"      ‚Ä¢ Experimental SD issues: {remaining_issues_e.sum()}")    print(f"      ‚Ä¢ Control SD issues:      {remaining_issues_c.sum()}")    # Remove problematic rows    data_filtered = data_filtered[~remaining_issues].copy()    imputation_log['removed_after_imputation'] = n_issueselse:    print(f"\n  ‚úì All observations have valid SDs (original or imputed)")    imputation_log['removed_after_imputation'] = 0# Summary of imputationtotal_imputed = n_imputed_e + n_imputed_ctotal_original_issues = (imputation_log['sde_zeros'] + imputation_log['sde_missing'] +                         imputation_log['sdc_zeros'] + imputation_log['sdc_missing'])print(f"\nüìä Imputation Summary:")print(f"  Total SD issues found:     {total_original_issues}")print(f"  Total SDs imputed:         {total_imputed}")print(f"  Observations removed:      {imputation_log['removed_after_imputation']}")print(f"  Observations remaining:    {len(data_filtered)}")print(f"  Imputation success rate:   {(total_imputed/(total_original_issues + 0.0001))*100:.1f}%")# --- STEP 4: HANDLE ZERO/NEGATIVE VALUES (FOR RATIO MEASURES) ---if effect_size_type in ['lnRR', 'log_or']:    print("\n" + "="*70)    print("STEP 4: ZERO/NEGATIVE VALUE HANDLING (RATIO MEASURES)")    print("="*70)    print(f"\nüîç Checking for incompatible values...")    # Check for zero values    zero_xe = (data_filtered['xe'] == 0).sum()    zero_xc = (data_filtered['xc'] == 0).sum()    # Check for negative values    neg_xe = (data_filtered['xe'] < 0).sum()    neg_xc = (data_filtered['xc'] < 0).sum()    print(f"\n  Zero values:")    print(f"    ‚Ä¢ Treatment (xe): {zero_xe}")    print(f"    ‚Ä¢ Control (xc):   {zero_xc}")    print(f"    ‚Ä¢ Total:          {zero_xe + zero_xc}")    print(f"\n  Negative values:")    print(f"    ‚Ä¢ Treatment (xe): {neg_xe}")    print(f"    ‚Ä¢ Control (xc):   {neg_xc}")    print(f"    ‚Ä¢ Total:          {neg_xe + neg_xc}")    # Handle negative values (must be removed)    if neg_xe > 0 or neg_xc > 0:        print(f"\n  ‚ùå Removing {neg_xe + neg_xc} observations with negative values")        print(f"     (log ratio requires all positive values)")        negative_mask = (data_filtered['xe'] < 0) | (data_filtered['xc'] < 0)        data_filtered = data_filtered[~negative_mask].copy()    # Handle zero values (add small constant)    if zero_xe > 0 or zero_xc > 0:        ZERO_CONSTANT = 0.001        print(f"\n  üîß Handling {zero_xe + zero_xc} zero values:")        print(f"     Adding small constant: {ZERO_CONSTANT}")        data_filtered.loc[data_filtered['xe'] == 0, 'xe'] = ZERO_CONSTANT        data_filtered.loc[data_filtered['xc'] == 0, 'xc'] = ZERO_CONSTANT        print(f"     ‚úì Zero values adjusted to avoid log(0)")    if neg_xe + neg_xc + zero_xe + zero_xc == 0:        print(f"\n  ‚úì All values positive and non-zero")    print(f"\n  Observations remaining: {len(data_filtered)}")    # --- STEP 5: CALCULATE EFFECT SIZE BASED ON TYPE ---print("\n" + "="*70)print("STEP 5: EFFECT SIZE CALCULATION")print("="*70)calculation_log = {    'type': effect_size_type,    'timestamp': datetime.datetime.now(),    'n_observations': len(data_filtered)}print(f"\nüßÆ Calculating {es_config['effect_label']}...")print(f"   Method: {effect_size_type}")print(f"   Observations: {len(data_filtered)}")if effect_size_type == 'lnRR':    # ========================================    # LOG RESPONSE RATIO (lnRR)    # ========================================    print(f"\nüìê Formula: lnRR = ln(xÃÑ‚Çë / xÃÑ‚Çú)")    print(f"   Variance: Var(lnRR) = SD¬≤‚Çë/(n‚Çë¬∑xÃÑ¬≤‚Çë) + SD¬≤‚Çú/(n‚Çú¬∑xÃÑ¬≤‚Çú)")    # Calculate lnRR    data_filtered['lnRR'] = np.log(data_filtered['xe'] / data_filtered['xc'])    # Calculate variance using delta method    data_filtered['var_lnRR'] = (        (data_filtered['sde_imputed']**2 / (data_filtered['ne'] * data_filtered['xe']**2)) +        (data_filtered['sdc_imputed']**2 / (data_filtered['nc'] * data_filtered['xc']**2))    )    # Calculate standard error    data_filtered['SE_lnRR'] = np.sqrt(data_filtered['var_lnRR'])    # Calculate 95% confidence intervals    data_filtered['CI_lower_lnRR'] = data_filtered['lnRR'] - 1.96 * data_filtered['SE_lnRR']    data_filtered['CI_upper_lnRR'] = data_filtered['lnRR'] + 1.96 * data_filtered['SE_lnRR']    # Convert to Response Ratio (RR) for interpretation    data_filtered['Response_Ratio'] = np.exp(data_filtered['lnRR'])    data_filtered['RR_CI_lower'] = np.exp(data_filtered['CI_lower_lnRR'])    data_filtered['RR_CI_upper'] = np.exp(data_filtered['CI_upper_lnRR'])    # Calculate fold-change (with sign for direction)    # Positive lnRR = upregulation (e.g., 2-fold increase = 2√ó)    # Negative lnRR = downregulation (e.g., 2-fold decrease = -2√ó)    data_filtered['fold_change'] = data_filtered.apply(        lambda row: row['Response_Ratio'] if row['lnRR'] >= 0 else -1/row['Response_Ratio'],        axis=1    )    # Calculate percent change    data_filtered['Percent_Change'] = (data_filtered['Response_Ratio'] - 1) * 100    # Set primary effect size column names    effect_col = 'lnRR'    var_col = 'var_lnRR'    se_col = 'SE_lnRR'    calculation_log['columns_created'] = [        'lnRR', 'var_lnRR', 'SE_lnRR', 'CI_lower_lnRR', 'CI_upper_lnRR',        'Response_Ratio', 'RR_CI_lower', 'RR_CI_upper', 'fold_change', 'Percent_Change'    ]    print(f"\n  ‚úì lnRR calculated for {len(data_filtered)} observations")    print(f"\n  üìä Columns created:")    print(f"     ‚Ä¢ lnRR: Log response ratio (effect size)")    print(f"     ‚Ä¢ var_lnRR: Variance of lnRR")    print(f"     ‚Ä¢ SE_lnRR: Standard error of lnRR")    print(f"     ‚Ä¢ CI_lower/upper_lnRR: 95% confidence intervals")    print(f"     ‚Ä¢ Response_Ratio: RR = exp(lnRR)")    print(f"     ‚Ä¢ fold_change: Directional fold-change")    print(f"     ‚Ä¢ Percent_Change: % change from control")elif effect_size_type == 'hedges_g':    # ========================================    # HEDGES' G (STANDARDIZED MEAN DIFFERENCE)    # ========================================    print(f"\nüìê Formula: g = [(xÃÑ‚Çë - xÃÑ‚Çú) / SD‚Çö‚Çí‚Çí‚Çó‚Çëùíπ] √ó J")    print(f"   J = 1 - 3/(4¬∑df - 1)  [small-sample correction]")    print(f"   Variance: Vg = [(n‚Çë+n‚Çú)/(n‚Çë¬∑n‚Çú) + g¬≤/(2(n‚Çë+n‚Çú))] √ó J¬≤")    # Degrees of freedom    data_filtered['df'] = data_filtered['ne'] + data_filtered['nc'] - 2    print(f"\n  üî¢ Calculating pooled standard deviation...")    # Pooled Standard Deviation    data_filtered['sp_squared'] = (        ((data_filtered['ne'] - 1) * data_filtered['sde_imputed']**2 +         (data_filtered['nc'] - 1) * data_filtered['sdc_imputed']**2) /        data_filtered['df']    )    data_filtered['sp'] = np.sqrt(data_filtered['sp_squared'])    print(f"     ‚Ä¢ Mean pooled SD: {data_filtered['sp'].mean():.4f}")    print(f"     ‚Ä¢ Median pooled SD: {data_filtered['sp'].median():.4f}")    # Cohen's d (uncorrected)    data_filtered['cohen_d'] = (data_filtered['xe'] - data_filtered['xc']) / data_filtered['sp']    print(f"\n  üî¢ Applying Hedges' correction for small samples...")    # Hedges' g correction factor (J)    # Using approximation: J ‚âà 1 - 3/(4*df - 1)    data_filtered['hedges_j'] = 1 - (3 / (4 * data_filtered['df'] - 1))    print(f"     ‚Ä¢ Mean J factor: {data_filtered['hedges_j'].mean():.6f}")    print(f"     ‚Ä¢ Min J factor: {data_filtered['hedges_j'].min():.6f}")    print(f"     ‚Ä¢ Max J factor: {data_filtered['hedges_j'].max():.6f}")    # Hedges' g    data_filtered['hedges_g'] = data_filtered['cohen_d'] * data_filtered['hedges_j']    # Variance of Hedges' g    data_filtered['Vg'] = (        ((data_filtered['ne'] + data_filtered['nc']) / (data_filtered['ne'] * data_filtered['nc']) +         (data_filtered['hedges_g']**2) / (2 * (data_filtered['ne'] + data_filtered['nc']))) *        (data_filtered['hedges_j']**2)    )    # Standard error    data_filtered['SE_g'] = np.sqrt(data_filtered['Vg'])    # Calculate 95% confidence intervals    data_filtered['CI_lower_g'] = data_filtered['hedges_g'] - 1.96 * data_filtered['SE_g']    data_filtered['CI_upper_g'] = data_filtered['hedges_g'] + 1.96 * data_filtered['SE_g']    # Set primary effect size column names    effect_col = 'hedges_g'    var_col = 'Vg'    se_col = 'SE_g'    calculation_log['columns_created'] = [        'hedges_g', 'Vg', 'SE_g', 'CI_lower_g', 'CI_upper_g',        'cohen_d', 'hedges_j', 'sp', 'sp_squared', 'df'    ]    print(f"\n  ‚úì Hedges' g calculated for {len(data_filtered)} observations")    print(f"\n  üìä Columns created:")    print(f"     ‚Ä¢ hedges_g: Hedges' g (effect size with correction)")    print(f"     ‚Ä¢ cohen_d: Cohen's d (uncorrected)")    print(f"     ‚Ä¢ Vg: Variance of Hedges' g")    print(f"     ‚Ä¢ SE_g: Standard error of Hedges' g")    print(f"     ‚Ä¢ CI_lower/upper_g: 95% confidence intervals")    print(f"     ‚Ä¢ sp: Pooled standard deviation")    print(f"     ‚Ä¢ hedges_j: Small-sample correction factor")    # Effect size magnitude classification    small = ((data_filtered['hedges_g'].abs() >= 0.2) & (data_filtered['hedges_g'].abs() < 0.5)).sum()    medium = ((data_filtered['hedges_g'].abs() >= 0.5) & (data_filtered['hedges_g'].abs() < 0.8)).sum()    large = (data_filtered['hedges_g'].abs() >= 0.8).sum()    negligible = (data_filtered['hedges_g'].abs() < 0.2).sum()    print(f"\n  üìè Effect Size Magnitude (Cohen's benchmarks):")    print(f"     ‚Ä¢ Negligible (|g| < 0.2):   {negligible} ({negligible/len(data_filtered)*100:.1f}%)")    print(f"     ‚Ä¢ Small (0.2 ‚â§ |g| < 0.5):  {small} ({small/len(data_filtered)*100:.1f}%)")    print(f"     ‚Ä¢ Medium (0.5 ‚â§ |g| < 0.8): {medium} ({medium/len(data_filtered)*100:.1f}%)")    print(f"     ‚Ä¢ Large (|g| ‚â• 0.8):        {large} ({large/len(data_filtered)*100:.1f}%)")elif effect_size_type == 'cohen_d':    # ========================================    # COHEN'S D (NO SMALL-SAMPLE CORRECTION)    # ========================================    print(f"\nüìê Formula: d = (xÃÑ‚Çë - xÃÑ‚Çú) / SD‚Çö‚Çí‚Çí‚Çó‚Çëùíπ")    print(f"   Variance: Vd = (n‚Çë+n‚Çú)/(n‚Çë¬∑n‚Çú) + d¬≤/(2(n‚Çë+n‚Çú))")    print(f"   Note: No small-sample correction applied")    # Degrees of freedom    data_filtered['df'] = data_filtered['ne'] + data_filtered['nc'] - 2    print(f"\n  üî¢ Calculating pooled standard deviation...")    # Pooled Standard Deviation    data_filtered['sp_squared'] = (        ((data_filtered['ne'] - 1) * data_filtered['sde_imputed']**2 +         (data_filtered['nc'] - 1) * data_filtered['sdc_imputed']**2) /        data_filtered['df']    )    data_filtered['sp'] = np.sqrt(data_filtered['sp_squared'])    print(f"     ‚Ä¢ Mean pooled SD: {data_filtered['sp'].mean():.4f}")    print(f"     ‚Ä¢ Median pooled SD: {data_filtered['sp'].median():.4f}")    # Cohen's d    data_filtered['cohen_d'] = (data_filtered['xe'] - data_filtered['xc']) / data_filtered['sp']    # Variance of Cohen's d    data_filtered['Vd'] = (        (data_filtered['ne'] + data_filtered['nc']) / (data_filtered['ne'] * data_filtered['nc']) +        (data_filtered['cohen_d']**2) / (2 * (data_filtered['ne'] + data_filtered['nc']))    )    # Standard error    data_filtered['SE_d'] = np.sqrt(data_filtered['Vd'])    # Calculate 95% confidence intervals    data_filtered['CI_lower_d'] = data_filtered['cohen_d'] - 1.96 * data_filtered['SE_d']    data_filtered['CI_upper_d'] = data_filtered['cohen_d'] + 1.96 * data_filtered['SE_d']    # Set primary effect size column names    effect_col = 'cohen_d'    var_col = 'Vd'    se_col = 'SE_d'    calculation_log['columns_created'] = [        'cohen_d', 'Vd', 'SE_d', 'CI_lower_d', 'CI_upper_d',        'sp', 'sp_squared', 'df'    ]    print(f"\n  ‚úì Cohen's d calculated for {len(data_filtered)} observations")    print(f"\n  üìä Columns created:")    print(f"     ‚Ä¢ cohen_d: Cohen's d (effect size)")    print(f"     ‚Ä¢ Vd: Variance of Cohen's d")    print(f"     ‚Ä¢ SE_d: Standard error of Cohen's d")    print(f"     ‚Ä¢ CI_lower/upper_d: 95% confidence intervals")    print(f"     ‚Ä¢ sp: Pooled standard deviation")    # Effect size magnitude classification    small = ((data_filtered['cohen_d'].abs() >= 0.2) & (data_filtered['cohen_d'].abs() < 0.5)).sum()    medium = ((data_filtered['cohen_d'].abs() >= 0.5) & (data_filtered['cohen_d'].abs() < 0.8)).sum()    large = (data_filtered['cohen_d'].abs() >= 0.8).sum()    negligible = (data_filtered['cohen_d'].abs() < 0.2).sum()    print(f"\n  üìè Effect Size Magnitude (Cohen's benchmarks):")    print(f"     ‚Ä¢ Negligible (|d| < 0.2):   {negligible} ({negligible/len(data_filtered)*100:.1f}%)")    print(f"     ‚Ä¢ Small (0.2 ‚â§ |d| < 0.5):  {small} ({small/len(data_filtered)*100:.1f}%)")    print(f"     ‚Ä¢ Medium (0.5 ‚â§ |d| < 0.8): {medium} ({medium/len(data_filtered)*100:.1f}%)")    print(f"     ‚Ä¢ Large (|d| ‚â• 0.8):        {large} ({large/len(data_filtered)*100:.1f}%)")    # Sample size warning    small_samples = (data_filtered['df'] < 20).sum()    if small_samples > 0:        print(f"\n  ‚ö†Ô∏è  Warning: {small_samples} observations have small samples (df < 20)")        print(f"     Consider using Hedges' g instead for small-sample correction")elif effect_size_type == 'log_or':    # ========================================    # LOG ODDS RATIO    # ========================================    print(f"\n‚ö†Ô∏è  Note: log Odds Ratio implementation")    print(f"   Current implementation treats xe/xc as odds or proportions")    print(f"   For 2√ó2 contingency tables, ensure proper data format")    print(f"\nüìê Formula: logOR = ln(xe / xc)")    print(f"   Variance: Var(logOR) ‚âà SD¬≤‚Çë/(n‚Çë¬∑xe¬≤) + SD¬≤‚Çú/(n‚Çú¬∑xc¬≤)")    # Check for values in valid range    invalid_values = ((data_filtered['xe'] < 0) | (data_filtered['xc'] < 0) |                      (data_filtered['xe'] == 0) | (data_filtered['xc'] == 0))    if invalid_values.any():        print(f"\n  ‚ö†Ô∏è  WARNING: {invalid_values.sum()} observations have invalid values")        print(f"     Removing observations with xe ‚â§ 0 or xc ‚â§ 0")        data_filtered = data_filtered[~invalid_values].copy()    # Calculate log OR    data_filtered['log_OR'] = np.log(data_filtered['xe'] / data_filtered['xc'])    # Calculate variance (simplified - assumes xe, xc are odds/proportions)    data_filtered['var_log_OR'] = (        (data_filtered['sde_imputed']**2 / (data_filtered['ne'] * data_filtered['xe']**2)) +        (data_filtered['sdc_imputed']**2 / (data_filtered['nc'] * data_filtered['xc']**2))    )    # Standard error    data_filtered['SE_log_OR'] = np.sqrt(data_filtered['var_log_OR'])    # Calculate 95% confidence intervals    data_filtered['CI_lower_log_OR'] = data_filtered['log_OR'] - 1.96 * data_filtered['SE_log_OR']    data_filtered['CI_upper_log_OR'] = data_filtered['log_OR'] + 1.96 * data_filtered['SE_log_OR']    # Convert to Odds Ratio    data_filtered['Odds_Ratio'] = np.exp(data_filtered['log_OR'])    data_filtered['OR_CI_lower'] = np.exp(data_filtered['CI_lower_log_OR'])    data_filtered['OR_CI_upper'] = np.exp(data_filtered['CI_upper_log_OR'])    # Set primary effect size column names    effect_col = 'log_OR'    var_col = 'var_log_OR'    se_col = 'SE_log_OR'    calculation_log['columns_created'] = [        'log_OR', 'var_log_OR', 'SE_log_OR', 'CI_lower_log_OR', 'CI_upper_log_OR',        'Odds_Ratio', 'OR_CI_lower', 'OR_CI_upper'    ]    print(f"\n  ‚úì log Odds Ratio calculated for {len(data_filtered)} observations")    print(f"\n  üìä Columns created:")    print(f"     ‚Ä¢ log_OR: Log odds ratio (effect size)")    print(f"     ‚Ä¢ var_log_OR: Variance of log OR")    print(f"     ‚Ä¢ SE_log_OR: Standard error of log OR")    print(f"     ‚Ä¢ CI_lower/upper_log_OR: 95% confidence intervals")    print(f"     ‚Ä¢ Odds_Ratio: OR = exp(logOR)")    print(f"\n  ‚ö†Ô∏è  Please verify results are appropriate for your data structure")else:    raise ValueError(f"Unknown effect size type: {effect_size_type}")calculation_log['effect_col'] = effect_colcalculation_log['var_col'] = var_colcalculation_log['se_col'] = se_col# --- STEP 6: CALCULATE FIXED-EFFECTS WEIGHTS ---print("\n" + "="*70)print("STEP 6: CALCULATING WEIGHTS")print("="*70)print(f"\n‚öñÔ∏è  Calculating inverse-variance weights...")print(f"   Formula: w = 1 / Var({es_config['effect_label_short']})")data_filtered['w_fixed'] = 1 / data_filtered[var_col]# Handle infinite weightsinf_weights = np.isinf(data_filtered['w_fixed']).sum()if inf_weights > 0:    print(f"\n  ‚ö†Ô∏è  Warning: {inf_weights} infinite weights detected (variance = 0)")    print(f"     Replacing with NaN for removal")    data_filtered['w_fixed'] = data_filtered['w_fixed'].replace([np.inf, -np.inf], np.nan)# Weight statisticsprint(f"\n  üìä Weight Statistics:")print(f"     ‚Ä¢ Mean weight:   {data_filtered['w_fixed'].mean():.2f}")print(f"     ‚Ä¢ Median weight: {data_filtered['w_fixed'].median():.2f}")print(f"     ‚Ä¢ Min weight:    {data_filtered['w_fixed'].min():.2f}")print(f"     ‚Ä¢ Max weight:    {data_filtered['w_fixed'].max():.2f}")print(f"     ‚Ä¢ Std weight:    {data_filtered['w_fixed'].std():.2f}")# Check weight distributionweight_ratio = data_filtered['w_fixed'].max() / (data_filtered['w_fixed'].min() + 0.0001)print(f"\n  üìè Weight ratio (max/min): {weight_ratio:.2f}")if weight_ratio > 1000:    print(f"     ‚ö†Ô∏è  Very large weight range - one study may dominate")elif weight_ratio > 100:    print(f"     ‚ö†Ô∏è  Large weight range - check for influential studies")else:    print(f"     ‚úì Reasonable weight range")print(f"\n  ‚úì Fixed-effects weights calculated")# --- STEP 7: CLEAN DATA ---print("\n" + "="*70)print("STEP 7: FINAL DATA CLEANING")print("="*70)print(f"\nüßπ Removing observations with missing critical values...")# Define critical columnscritical_cols = [effect_col, var_col, se_col, 'w_fixed']initial_n = len(data_filtered)# Check for missing valuesmissing_summary = {}for col in critical_cols:    n_missing = data_filtered[col].isna().sum()    missing_summary[col] = n_missing    if n_missing > 0:        print(f"  ‚Ä¢ {col}: {n_missing} missing")# Remove rows with NaN in critical columnsdata_filtered = data_filtered.dropna(subset=critical_cols).copy()final_n = len(data_filtered)removed = initial_n - final_nif removed > 0:    print(f"\n  ‚ö†Ô∏è  Removed {removed} observations with missing critical values")    print(f"     ({(removed/initial_n)*100:.1f}% of dataset)")else:    print(f"\n  ‚úì No missing values in critical columns")print(f"\n  üìä Final dataset: {final_n} observations")calculation_log['final_n'] = final_ncalculation_log['removed_in_cleaning'] = removed# Continue to Part 3...# --- STEP 8: EFFECT SIZE SUMMARY STATISTICS ---print("\n" + "="*70)print("STEP 8: EFFECT SIZE SUMMARY STATISTICS")print("="*70)# Calculate comprehensive statisticseffect_stats = {    'count': data_filtered[effect_col].count(),    'mean': data_filtered[effect_col].mean(),    'median': data_filtered[effect_col].median(),    'std': data_filtered[effect_col].std(),    'min': data_filtered[effect_col].min(),    'max': data_filtered[effect_col].max(),    'q25': data_filtered[effect_col].quantile(0.25),    'q75': data_filtered[effect_col].quantile(0.75),    'iqr': data_filtered[effect_col].quantile(0.75) - data_filtered[effect_col].quantile(0.25)}var_stats = {    'mean': data_filtered[var_col].mean(),    'median': data_filtered[var_col].median(),    'std': data_filtered[var_col].std(),    'min': data_filtered[var_col].min(),    'max': data_filtered[var_col].max()}se_stats = {    'mean': data_filtered[se_col].mean(),    'median': data_filtered[se_col].median(),    'std': data_filtered[se_col].std(),    'min': data_filtered[se_col].min(),    'max': data_filtered[se_col].max()}print(f"\nüìä {es_config['effect_label']} ({es_config['effect_label_short']}):")print(f"  {'Statistic':<15} {'Value':>12}")print(f"  {'-'*15} {'-'*12}")print(f"  {'Count':<15} {effect_stats['count']:>12}")print(f"  {'Mean':<15} {effect_stats['mean']:>12.4f}")print(f"  {'Median':<15} {effect_stats['median']:>12.4f}")print(f"  {'Std Dev':<15} {effect_stats['std']:>12.4f}")print(f"  {'Min':<15} {effect_stats['min']:>12.4f}")print(f"  {'Q1 (25%)':<15} {effect_stats['q25']:>12.4f}")print(f"  {'Q3 (75%)':<15} {effect_stats['q75']:>12.4f}")print(f"  {'Max':<15} {effect_stats['max']:>12.4f}")print(f"  {'IQR':<15} {effect_stats['iqr']:>12.4f}")print(f"\nüìä Variance ({var_col}):")print(f"  {'Statistic':<15} {'Value':>12}")print(f"  {'-'*15} {'-'*12}")print(f"  {'Mean':<15} {var_stats['mean']:>12.6f}")print(f"  {'Median':<15} {var_stats['median']:>12.6f}")print(f"  {'Std Dev':<15} {var_stats['std']:>12.6f}")print(f"  {'Min':<15} {var_stats['min']:>12.6f}")print(f"  {'Max':<15} {var_stats['max']:>12.6f}")print(f"\nüìä Standard Error ({se_col}):")print(f"  {'Statistic':<15} {'Value':>12}")print(f"  {'-'*15} {'-'*12}")print(f"  {'Mean':<15} {se_stats['mean']:>12.4f}")print(f"  {'Median':<15} {se_stats['median']:>12.4f}")print(f"  {'Std Dev':<15} {se_stats['std']:>12.4f}")print(f"  {'Min':<15} {se_stats['min']:>12.4f}")print(f"  {'Max':<15} {se_stats['max']:>12.4f}")# Store statisticscalculation_log['effect_stats'] = effect_statscalculation_log['var_stats'] = var_statscalculation_log['se_stats'] = se_stats# --- STEP 9: DIRECTION AND MAGNITUDE ANALYSIS ---print("\n" + "="*70)print("STEP 9: EFFECT DIRECTION & MAGNITUDE ANALYSIS")print("="*70)# Analysis depends on effect size typeif effect_size_type == 'lnRR':    # Direction analysis for log response ratio    print(f"\nüìà Effect Direction Analysis:")    # Define thresholds    upregulation_threshold = 0.05  # ~5% increase    downregulation_threshold = -0.05  # ~5% decrease    n_upregulation = (data_filtered[effect_col] > upregulation_threshold).sum()    n_downregulation = (data_filtered[effect_col] < downregulation_threshold).sum()    n_no_change = len(data_filtered) - n_upregulation - n_downregulation    print(f"\n  Based on lnRR threshold = ¬±{abs(upregulation_threshold):.2f}:")    print(f"  {'Direction':<25} {'Count':>8} {'Percentage':>12}")    print(f"  {'-'*25} {'-'*8} {'-'*12}")    print(f"  {'Upregulated (lnRR > 0.05)':<25} {n_upregulation:>8} {(n_upregulation/len(data_filtered)*100):>11.1f}%")    print(f"  {'No change (|lnRR| ‚â§ 0.05)':<25} {n_no_change:>8} {(n_no_change/len(data_filtered)*100):>11.1f}%")    print(f"  {'Downregulated (lnRR < -0.05)':<25} {n_downregulation:>8} {(n_downregulation/len(data_filtered)*100):>11.1f}%")    # Fold-change magnitude categories    print(f"\nüìè Fold-Change Magnitude:")    fc_2x_up = (data_filtered['Response_Ratio'] >= 2.0).sum()    fc_2x_down = (data_filtered['Response_Ratio'] <= 0.5).sum()    fc_3x_up = (data_filtered['Response_Ratio'] >= 3.0).sum()    fc_3x_down = (data_filtered['Response_Ratio'] <= 0.33).sum()    fc_5x_up = (data_filtered['Response_Ratio'] >= 5.0).sum()    fc_5x_down = (data_filtered['Response_Ratio'] <= 0.2).sum()    print(f"  {'Category':<30} {'Count':>8} {'Percentage':>12}")    print(f"  {'-'*30} {'-'*8} {'-'*12}")    print(f"  {'‚â•5√ó increase (RR ‚â• 5.0)':<30} {fc_5x_up:>8} {(fc_5x_up/len(data_filtered)*100):>11.1f}%")    print(f"  {'‚â•3√ó increase (RR ‚â• 3.0)':<30} {fc_3x_up:>8} {(fc_3x_up/len(data_filtered)*100):>11.1f}%")    print(f"  {'‚â•2√ó increase (RR ‚â• 2.0)':<30} {fc_2x_up:>8} {(fc_2x_up/len(data_filtered)*100):>11.1f}%")    print(f"  {'‚â•2√ó decrease (RR ‚â§ 0.5)':<30} {fc_2x_down:>8} {(fc_2x_down/len(data_filtered)*100):>11.1f}%")    print(f"  {'‚â•3√ó decrease (RR ‚â§ 0.33)':<30} {fc_3x_down:>8} {(fc_3x_down/len(data_filtered)*100):>11.1f}%")    print(f"  {'‚â•5√ó decrease (RR ‚â§ 0.2)':<30} {fc_5x_down:>8} {(fc_5x_down/len(data_filtered)*100):>11.1f}%")    # Percent change summary    print(f"\nüìä Percent Change from Control:")    print(f"  Mean: {data_filtered['Percent_Change'].mean():+.1f}%")    print(f"  Median: {data_filtered['Percent_Change'].median():+.1f}%")    print(f"  Range: [{data_filtered['Percent_Change'].min():+.1f}%, {data_filtered['Percent_Change'].max():+.1f}%]")    calculation_log['direction_analysis'] = {        'upregulated': n_upregulation,        'downregulated': n_downregulation,        'no_change': n_no_change,        'fc_2x_up': fc_2x_up,        'fc_2x_down': fc_2x_down,        'fc_3x_up': fc_3x_up,        'fc_3x_down': fc_3x_down    }elif effect_size_type in ['hedges_g', 'cohen_d']:    # Direction and magnitude for standardized mean differences    print(f"\nüìà Effect Direction:")    n_positive = (data_filtered[effect_col] > 0).sum()    n_negative = (data_filtered[effect_col] < 0).sum()    n_zero = (data_filtered[effect_col] == 0).sum()    print(f"  {'Direction':<25} {'Count':>8} {'Percentage':>12}")    print(f"  {'-'*25} {'-'*8} {'-'*12}")    print(f"  {'Positive effect (g > 0)':<25} {n_positive:>8} {(n_positive/len(data_filtered)*100):>11.1f}%")    print(f"  {'No effect (g = 0)':<25} {n_zero:>8} {(n_zero/len(data_filtered)*100):>11.1f}%")    print(f"  {'Negative effect (g < 0)':<25} {n_negative:>8} {(n_negative/len(data_filtered)*100):>11.1f}%")    # Already calculated in step 5, but show again for clarity    negligible = (data_filtered[effect_col].abs() < 0.2).sum()    small = ((data_filtered[effect_col].abs() >= 0.2) & (data_filtered[effect_col].abs() < 0.5)).sum()    medium = ((data_filtered[effect_col].abs() >= 0.5) & (data_filtered[effect_col].abs() < 0.8)).sum()    large = (data_filtered[effect_col].abs() >= 0.8).sum()    print(f"\nüìè Effect Magnitude (Cohen's benchmarks):")    print(f"  {'Category':<30} {'Count':>8} {'Percentage':>12}")    print(f"  {'-'*30} {'-'*8} {'-'*12}")    print(f"  {'Negligible (|g| < 0.2)':<30} {negligible:>8} {(negligible/len(data_filtered)*100):>11.1f}%")    print(f"  {'Small (0.2 ‚â§ |g| < 0.5)':<30} {small:>8} {(small/len(data_filtered)*100):>11.1f}%")    print(f"  {'Medium (0.5 ‚â§ |g| < 0.8)':<30} {medium:>8} {(medium/len(data_filtered)*100):>11.1f}%")    print(f"  {'Large (|g| ‚â• 0.8)':<30} {large:>8} {(large/len(data_filtered)*100):>11.1f}%")    calculation_log['direction_analysis'] = {        'positive': n_positive,        'negative': n_negative,        'negligible': negligible,        'small': small,        'medium': medium,        'large': large    }elif effect_size_type == 'log_or':    # Direction for odds ratios    print(f"\nüìà Effect Direction:")    n_positive = (data_filtered[effect_col] > 0).sum()    n_negative = (data_filtered[effect_col] < 0).sum()    n_null = (data_filtered[effect_col] == 0).sum()    print(f"  {'Direction':<30} {'Count':>8} {'Percentage':>12}")    print(f"  {'-'*30} {'-'*8} {'-'*12}")    print(f"  {'Positive association (OR > 1)':<30} {n_positive:>8} {(n_positive/len(data_filtered)*100):>11.1f}%")    print(f"  {'No association (OR = 1)':<30} {n_null:>8} {(n_null/len(data_filtered)*100):>11.1f}%")    print(f"  {'Negative association (OR < 1)':<30} {n_negative:>8} {(n_negative/len(data_filtered)*100):>11.1f}%")    print(f"\nüìä Odds Ratio Summary:")    print(f"  Mean OR: {data_filtered['Odds_Ratio'].mean():.3f}")    print(f"  Median OR: {data_filtered['Odds_Ratio'].median():.3f}")    print(f"  Range: [{data_filtered['Odds_Ratio'].min():.3f}, {data_filtered['Odds_Ratio'].max():.3f}]")# --- STEP 10: IDENTIFY EXTREME VALUES ---print("\n" + "="*70)print("STEP 10: EXTREME VALUE DETECTION")print("="*70)print(f"\nüîç Identifying outliers and extreme effect sizes...")# Define thresholds based on effect size typeif effect_size_type == 'lnRR':    threshold = 3.0  # ~20-fold change    extreme_label = "RR > 20√ó or RR < 0.05√ó"    interpretation = "More than 20-fold change"elif effect_size_type in ['hedges_g', 'cohen_d']:    threshold = 2.0  # Very large standardized effect    extreme_label = "|g| > 2.0"    interpretation = "Very large effect (exceeds typical benchmarks)"elif effect_size_type == 'log_or':    threshold = 3.0  # OR > 20    extreme_label = "OR > 20 or OR < 0.05"    interpretation = "Odds ratio > 20√ó or < 0.05√ó"extreme_effects = data_filtered[np.abs(data_filtered[effect_col]) > threshold].copy()print(f"\n  Threshold: {extreme_label}")print(f"  Interpretation: {interpretation}")if len(extreme_effects) > 0:    print(f"\n  ‚ö†Ô∏è  Found {len(extreme_effects)} extreme effects ({len(extreme_effects)/len(data_filtered)*100:.1f}% of dataset):")    print(f"\n  {'Paper ID':<15} {es_config['effect_label_short']:>10} {'SE':>10} {'Treatment':>12} {'Control':>12}")    print(f"  {'-'*15} {'-'*10} {'-'*10} {'-'*12} {'-'*12}")    # Show extreme effects    for idx, row in extreme_effects.head(20).iterrows():        paper_id = str(row['id'])[:15]        effect = row[effect_col]        se = row[se_col]        xe = row['xe']        xc = row['xc']        print(f"  {paper_id:<15} {effect:>10.4f} {se:>10.4f} {xe:>12.4f} {xc:>12.4f}")    if len(extreme_effects) > 20:        print(f"  ... and {len(extreme_effects) - 20} more")    print(f"\n  üí° Recommendations:")    print(f"     1. Review these observations for data entry errors")    print(f"     2. Check original papers for these effect sizes")    print(f"     3. Consider sensitivity analysis excluding these values")    print(f"     4. Examine if they represent true biological phenomena")    calculation_log['extreme_effects'] = {        'count': len(extreme_effects),        'threshold': threshold,        'paper_ids': extreme_effects['id'].tolist()    }else:    print(f"\n  ‚úì No extreme values detected")    print(f"    All effect sizes within expected range")    calculation_log['extreme_effects'] = {        'count': 0,        'threshold': threshold    }# Additional outlier detection using IQR methodprint(f"\nüìä Outlier Detection (IQR Method):")q1 = data_filtered[effect_col].quantile(0.25)q3 = data_filtered[effect_col].quantile(0.75)iqr = q3 - q1lower_fence = q1 - 1.5 * iqrupper_fence = q3 + 1.5 * iqroutliers_iqr = data_filtered[(data_filtered[effect_col] < lower_fence) |                              (data_filtered[effect_col] > upper_fence)]print(f"  Q1 (25th percentile): {q1:.4f}")print(f"  Q3 (75th percentile): {q3:.4f}")print(f"  IQR: {iqr:.4f}")print(f"  Lower fence: {lower_fence:.4f}")print(f"  Upper fence: {upper_fence:.4f}")print(f"\n  Outliers detected: {len(outliers_iqr)} ({len(outliers_iqr)/len(data_filtered)*100:.1f}%)")if len(outliers_iqr) > 0:    print(f"    ‚Ä¢ Below lower fence: {(data_filtered[effect_col] < lower_fence).sum()}")    print(f"    ‚Ä¢ Above upper fence: {(data_filtered[effect_col] > upper_fence).sum()}")calculation_log['outliers_iqr'] = {    'count': len(outliers_iqr),    'lower_fence': lower_fence,    'upper_fence': upper_fence,    'paper_ids': outliers_iqr['id'].tolist()}# --- STEP 11: CONFIDENCE INTERVAL COVERAGE ---print("\n" + "="*70)print("STEP 11: CONFIDENCE INTERVAL ANALYSIS")print("="*70)ci_lower_col = es_config['ci_lower_col']ci_upper_col = es_config['ci_upper_col']# Check CI coverage of null hypothesisnull_value = es_config['null_value']ci_includes_null = ((data_filtered[ci_lower_col] <= null_value) &                    (data_filtered[ci_upper_col] >= null_value)).sum()ci_excludes_null = len(data_filtered) - ci_includes_nullprint(f"\nüìä 95% Confidence Interval Coverage:")print(f"  Null hypothesis value: {null_value}")print(f"\n  {'Category':<35} {'Count':>8} {'Percentage':>12}")print(f"  {'-'*35} {'-'*8} {'-'*12}")print(f"  {'CI includes null (not significant)':<35} {ci_includes_null:>8} {(ci_includes_null/len(data_filtered)*100):>11.1f}%")print(f"  {'CI excludes null (significant)':<35} {ci_excludes_null:>8} {(ci_excludes_null/len(data_filtered)*100):>11.1f}%")# Average CI widthdata_filtered['ci_width'] = data_filtered[ci_upper_col] - data_filtered[ci_lower_col]mean_ci_width = data_filtered['ci_width'].mean()median_ci_width = data_filtered['ci_width'].median()print(f"\nüìè Confidence Interval Width:")print(f"  Mean CI width:   {mean_ci_width:.4f}")print(f"  Median CI width: {median_ci_width:.4f}")print(f"  Min CI width:    {data_filtered['ci_width'].min():.4f}")print(f"  Max CI width:    {data_filtered['ci_width'].max():.4f}")# Precision categoriesnarrow_ci = (data_filtered['ci_width'] < median_ci_width * 0.5).sum()moderate_ci = ((data_filtered['ci_width'] >= median_ci_width * 0.5) &               (data_filtered['ci_width'] <= median_ci_width * 2)).sum()wide_ci = (data_filtered['ci_width'] > median_ci_width * 2).sum()print(f"\nüìä Precision Distribution:")print(f"  {'Category':<30} {'Count':>8} {'Percentage':>12}")print(f"  {'-'*30} {'-'*8} {'-'*12}")print(f"  {'High precision (narrow CI)':<30} {narrow_ci:>8} {(narrow_ci/len(data_filtered)*100):>11.1f}%")print(f"  {'Moderate precision':<30} {moderate_ci:>8} {(moderate_ci/len(data_filtered)*100):>11.1f}%")print(f"  {'Low precision (wide CI)':<30} {wide_ci:>8} {(wide_ci/len(data_filtered)*100):>11.1f}%")calculation_log['ci_analysis'] = {    'ci_includes_null': ci_includes_null,    'ci_excludes_null': ci_excludes_null,    'mean_ci_width': mean_ci_width,    'median_ci_width': median_ci_width}# --- STEP 12: UPDATE CONFIGURATION ---print("\n" + "="*70)print("STEP 12: UPDATING CONFIGURATION")print("="*70)ANALYSIS_CONFIG['effect_col'] = effect_colANALYSIS_CONFIG['var_col'] = var_colANALYSIS_CONFIG['se_col'] = se_colANALYSIS_CONFIG['ci_lower_col'] = ci_lower_colANALYSIS_CONFIG['ci_upper_col'] = ci_upper_colANALYSIS_CONFIG['final_n'] = len(data_filtered)ANALYSIS_CONFIG['calculation_timestamp'] = datetime.datetime.now()print(f"\n‚úì Configuration updated with effect size information:")print(f"  ‚Ä¢ Effect column:    {effect_col}")print(f"  ‚Ä¢ Variance column:  {var_col}")print(f"  ‚Ä¢ SE column:        {se_col}")print(f"  ‚Ä¢ CI columns:       {ci_lower_col}, {ci_upper_col}")print(f"  ‚Ä¢ Final n:          {len(data_filtered)}")# Store comprehensive metadataEFFECT_SIZE_METADATA = {    'timestamp': datetime.datetime.now(),    'effect_size_type': effect_size_type,    'n_initial': initial_obs,    'n_final': len(data_filtered),    'n_removed': initial_obs - len(data_filtered),    'papers_initial': initial_papers,    'papers_final': data_filtered['id'].nunique(),    'imputation_log': imputation_log,    'calculation_log': calculation_log,    'effect_stats': effect_stats,    'var_stats': var_stats,    'se_stats': se_stats,    'columns_created': calculation_log['columns_created']}print(f"\n‚úì Metadata saved to EFFECT_SIZE_METADATA")# --- STEP 13: DATA PREVIEW ---print("\n" + "="*70)print("STEP 13: DATA PREVIEW")print("="*70)print(f"\nüìã Preview of Calculated Data (first 10 observations):\n")# Select columns for previewpreview_cols = ['id', 'xe', 'xc', 'ne', 'nc', effect_col, se_col]# Add CI columnspreview_cols.extend([ci_lower_col, ci_upper_col])# Add fold-change if availableif es_config['has_fold_change']:    if 'fold_change' in data_filtered.columns:        preview_cols.append('fold_change')    if 'Response_Ratio' in data_filtered.columns:        preview_cols.append('Response_Ratio')    elif 'Odds_Ratio' in data_filtered.columns:        preview_cols.append('Odds_Ratio')# Add weightpreview_cols.append('w_fixed')# Display previewpreview_df = data_filtered[preview_cols].head(10).copy()# Format numeric columnsfor col in preview_df.select_dtypes(include=[np.number]).columns:    if col in ['ne', 'nc']:        preview_df[col] = preview_df[col].astype(int)    elif col == 'w_fixed':        preview_df[col] = preview_df[col].apply(lambda x: f'{x:.2f}')    else:        preview_df[col] = preview_df[col].apply(lambda x: f'{x:.4f}')print(preview_df.to_string(index=False))if len(data_filtered) > 10:    print(f"\n... and {len(data_filtered) - 10} more observations")# --- FINAL STATUS ---print("\n" + "="*70)print("‚úÖ EFFECT SIZE CALCULATION COMPLETE")print("="*70)print(f"\nüìä Final Dataset Summary:")print(f"  ‚Ä¢ Observations:           {len(data_filtered)}")print(f"  ‚Ä¢ Unique papers:          {data_filtered['id'].nunique()}")print(f"  ‚Ä¢ Effect size type:       {es_config['effect_label']} ({es_config['effect_label_short']})")print(f"  ‚Ä¢ Mean effect size:       {effect_stats['mean']:.4f}")print(f"  ‚Ä¢ Median effect size:     {effect_stats['median']:.4f}")print(f"  ‚Ä¢ Effect size range:      [{effect_stats['min']:.4f}, {effect_stats['max']:.4f}]")if es_config['has_fold_change']:    if 'Response_Ratio' in data_filtered.columns:        print(f"  ‚Ä¢ Mean response ratio:    {data_filtered['Response_Ratio'].mean():.3f}")        print(f"  ‚Ä¢ Median fold-change:     {data_filtered['fold_change'].median():.2f}√ó")print(f"\nüìÅ Columns Available:")print(f"  Primary: {effect_col}, {var_col}, {se_col}")print(f"  CI: {ci_lower_col}, {ci_upper_col}")print(f"  Weight: w_fixed")if es_config['has_fold_change']:    print(f"  Interpretation: {', '.join([c for c in data_filtered.columns if 'fold' in c.lower() or 'ratio' in c.lower() or 'percent' in c.lower()])}")print(f"\n‚ö†Ô∏è  Quality Notes:")if imputation_log['sde_imputed'] + imputation_log['sdc_imputed'] > 0:    print(f"  ‚Ä¢ {imputation_log['sde_imputed'] + imputation_log['sdc_imputed']} SDs were imputed using median CV")if calculation_log.get('extreme_effects', {}).get('count', 0) > 0:    print(f"  ‚Ä¢ {calculation_log['extreme_effects']['count']} extreme effect sizes detected")if outliers_iqr is not None and len(outliers_iqr) > 0:    print(f"  ‚Ä¢ {len(outliers_iqr)} outliers detected using IQR method")print(f"\n‚ñ∂Ô∏è  Next Steps:")print(f"  1. Review the summary statistics and data quality notes")print(f"  2. Run the next cell to perform meta-analysis and calculate pooled estimates")print(f"  3. Consider the extreme values and outliers flagged above")print("\n" + "="*70)

In [None]:
# ‚ö†Ô∏è PREREQUISITES:# - Must complete configuration first# - Ensure required columns are present in your data## Expected runtime: < 5 seconds for most datasets# For large datasets (n > 500): up to 30 seconds## INTERPRETATION NOTE:# - Effect sizes will be calculated based on your selected method# - Check for warnings about missing data or calculation errors# KNAPP-HARTUNG ADJUSTMENT## Method from Knapp & Hartung (2003). Improved tests for a random effects# meta-regression with a single covariate. Statistics in Medicine, 22(17), 2693-2710.## This adjustment:# 1. Uses t-distribution instead of normal distribution for confidence intervals# 2. Improves coverage probability, especially with few studies# 3. Recommended for more accurate inference in random-effects models## The adjustment factor accounts for uncertainty in tau-squared estimation.# Hedges' g and Cohen's d Calculation# Methods from Hedges & Olkin (1985). Statistical methods for meta-analysis.# Academic Press.## Cohen's d = (M1 - M2) / SD_pooled# Hedges' g = d * (1 - 3/(4*df - 1))  # Small sample correction## These are standardized mean differences, common in psychology and medicine.# Log Response Ratio (lnRR) Calculation# Method from Hedges et al. (1999) and Lajeunesse (2011).# Lajeunesse, M.J. (2011). On the meta-analysis of response ratios for studies with# correlated and multi-group designs. Ecology, 92(11), 2049-2055.## lnRR = ln(mean_treatment / mean_control)# Commonly used in ecology and environmental sciences.#@title üìä OVERALL POOLED EFFECT SIZE & HETEROGENEITY#@title üìä OVERALL POOLED EFFECT SIZE & HETEROGENEITY# =============================================================================# CELL 6: OVERALL META-ANALYSIS# Purpose: Calculate pooled effect sizes and assess heterogeneity# Dependencies: Cell 5 (data_filtered with effect sizes, ANALYSIS_CONFIG)# Outputs: Overall pooled estimates (fixed & random effects), heterogeneity stats# =============================================================================# Assuming 'calculate_tau_squared' and 'compare_tau_estimators'# and 'ANALYSIS_CONFIG' and 'data_filtered' exist in the environment# --- ADD THIS AT THE START OF CELL 6 (before main analysis) ---print("\n" + "="*70)print("TAU-SQUARED ESTIMATOR SELECTION")# =============================================================================# KNAPP-HARTUNG CORRECTION FUNCTION# =============================================================================def calculate_knapp_hartung_ci(yi, vi, tau_sq, pooled_effect, alpha=0.05):    """    Calculate Knapp-Hartung adjusted confidence interval    The Knapp-Hartung (K-H) method provides more accurate confidence intervals    for random-effects meta-analysis, especially with small numbers of studies.    Key improvements over standard method:    1. Uses t-distribution instead of normal distribution    2. Adjusts standard error based on observed variability (Q statistic)    3. Reduces Type I error rate (false positives)    4. More conservative with small k (appropriate coverage)    Parameters:    -----------    yi : array-like        Effect sizes from individual studies    vi : array-like        Sampling variances    tau_sq : float        Between-study variance (tau-squared)    pooled_effect : float        Pooled effect estimate from random-effects model    alpha : float, default=0.05        Significance level (0.05 for 95% CI)    Returns:    --------    dict with keys:        'se_KH': Knapp-Hartung adjusted standard error        'var_KH': Knapp-Hartung adjusted variance        'ci_lower': Lower bound of 95% CI        'ci_upper': Upper bound of 95% CI        't_stat': t-statistic        't_crit': Critical t-value        'df': Degrees of freedom (k-1)        'p_value': Two-tailed p-value        'Q': Residual heterogeneity statistic    References:    -----------    Knapp, G., & Hartung, J. (2003). Improved tests for a random effects    meta-regression with a single covariate. Statistics in Medicine, 22(17),    2693-2710.    IntHout, J., Ioannidis, J. P., & Borm, G. F. (2014). The Hartung-Knapp-    Sidik-Jonkman method for random effects meta-analysis is straightforward    and considerably outperforms the standard DerSimonian-Laird method.    BMC Medical Research Methodology, 14(1), 25.    Recommended by Cochrane Handbook (2023), Section 10.4.4.3    """    # Convert to numpy arrays    yi = np.array(yi)    vi = np.array(vi)    # Random-effects weights    wi_star = 1 / (vi + tau_sq)    sum_wi_star = np.sum(wi_star)    # Degrees of freedom    k = len(yi)    df = k - 1    if df <= 0:        # Can't use K-H with k=1        return None    # Calculate Q statistic (residual heterogeneity)    Q = np.sum(wi_star * (yi - pooled_effect)**2)    # Standard random-effects variance    var_standard = 1 / sum_wi_star    # Knapp-Hartung adjusted variance    # SE_KH¬≤ = (Q / (k-1)) √ó (1 / Œ£w*)    var_KH = (Q / df) * var_standard    se_KH = np.sqrt(var_KH)    # t-distribution critical value    t_crit = t.ppf(1 - alpha/2, df)    # Confidence interval    ci_lower = pooled_effect - t_crit * se_KH    ci_upper = pooled_effect + t_crit * se_KH    # Test statistic and p-value    t_stat = pooled_effect / se_KH    p_value = 2 * (1 - t.cdf(abs(t_stat), df))    return {        'se_KH': se_KH,        'var_KH': var_KH,        'ci_lower': ci_lower,        'ci_upper': ci_upper,        't_stat': t_stat,        't_crit': t_crit,        'df': df,        'p_value': p_value,        'Q': Q    }print("="*70)# Check if advanced estimators availableif 'calculate_tau_squared' in globals():    print("‚úÖ Advanced estimators available")    method_options = [        ('REML (Recommended)', 'REML'),        ('DerSimonian-Laird (Classic)', 'DL'),        ('Maximum Likelihood', 'ML'),        ('Paule-Mandel', 'PM'),        ('Sidik-Jonkman', 'SJ')    ]    method_help = widgets.HTML(        "<div style='background-color: #e8f4f8; padding: 10px; margin: 10px 0; border-radius: 5px;'>"        "<b>üí° Method Guide:</b><br>"        "‚Ä¢ <b>REML:</b> ‚≠ê Best choice for most analyses. Unbiased and accurate.<br>"        "‚Ä¢ <b>DL:</b> Fast but can underestimate œÑ¬≤ with few studies.<br>"        "‚Ä¢ <b>ML:</b> Efficient but biased downward.<br>"        "‚Ä¢ <b>PM:</b> Exact Q = k-1 solution.<br>"        "‚Ä¢ <b>SJ:</b> Conservative, good for k < 10."        "</div>"    )else:    print("‚ö†Ô∏è  Using DerSimonian-Laird method only")    print("  Run Cell 4.5 to enable REML and other methods")    method_options = [('DerSimonian-Laird', 'DL')]    method_help = widgets.HTML(        "<div style='background-color: #fff3cd; padding: 10px; margin: 10px 0; border-radius: 5px;'>"        "‚ö†Ô∏è Run <b>Cell 4.5 (Heterogeneity Estimators)</b> to access REML and other methods."        "</div>"    )tau_method_widget = widgets.Dropdown(    options=method_options,    value='REML' if 'calculate_tau_squared' in globals() else 'DL',    description='œÑ¬≤ Method:',    style={'description_width': '100px'},    layout=widgets.Layout(width='400px'))# Save selection to configANALYSIS_CONFIG['tau_method'] = tau_method_widget.valuedef on_method_change(change):    ANALYSIS_CONFIG['tau_method'] = change['new']tau_method_widget.observe(on_method_change, names='value')# Knapp-Hartung correction widget (already defined, checking logic)use_kh_widget = widgets.Checkbox(    value=True,  # Default ON (recommended)    description='Use Knapp-Hartung correction for confidence intervals (Recommended for k<20)',    style={'description_width': 'initial'},    layout=widgets.Layout(width='500px'))kh_help = widgets.HTML(    "<div style='background-color: #e7f3ff; padding: 10px; margin: 10px 0; border-radius: 5px;'>"    "<b>‚ÑπÔ∏è Knapp-Hartung Correction:</b><br>"    "‚Ä¢ Uses t-distribution instead of normal (better for small k)<br>"    "‚Ä¢ Adjusts SE based on observed variability (Q statistic)<br>"    "‚Ä¢ <b>Recommended</b>, especially for k < 20 studies<br>"    "‚Ä¢ Produces more conservative (wider) confidence intervals<br>"    "‚Ä¢ Reduces false positive rate (better Type I error control)"    "</div>")# Widgets will be displayed at the end of output# Display re-run reminderrerun_message = widgets.HTML(    "<div style='background-color: #fffbf0; padding: 8px; margin: 10px 0; border-left: 3px solid #ff9800; border-radius: 3px;'>"    "‚ö†Ô∏è <b>Important:</b> After changing the method, you must re-run this cell to apply the new estimator."    "</div>")# Rerun message will be displayed at the endprint("\n" + "="*70)print("\n" + "="*70)print("OVERALL META-ANALYSIS")print("="*70)print(f"Timestamp: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")# --- STEP 1: LOAD CONFIGURATION ---print("\n" + "="*70)print("STEP 1: LOADING CONFIGURATION")print("="*70)try:    effect_col = ANALYSIS_CONFIG['effect_col']    var_col = ANALYSIS_CONFIG['var_col']    se_col = ANALYSIS_CONFIG['se_col']    es_config = ANALYSIS_CONFIG['es_config']    effect_type = ANALYSIS_CONFIG['effect_size_type']    print(f"‚úì Configuration loaded successfully")    print(f"  Effect size: {es_config['effect_label']} ({es_config['effect_label_short']})")    print(f"  Effect column: {effect_col}")    print(f"  Variance column: {var_col}")    print(f"  SE column: {se_col}")except KeyError as e:    print(f"‚ùå ERROR: Configuration not found - {e}")    print("\nTroubleshooting:")    print("  1. Ensure Cell 5 (effect size calculation) was run successfully")    print("  2. Check that ANALYSIS_CONFIG dictionary exists")    print("  3. Verify effect sizes were calculated properly")    raise# --- STEP 2: PREPARE ANALYSIS DATA ---print("\n" + "="*70)print("STEP 2: DATA PREPARATION")print("="*70)print(f"\nüîç Preparing data for meta-analysis...")# Store initial countsinitial_count = len(data_filtered)initial_papers = data_filtered['id'].nunique()print(f"\n  Initial dataset:")print(f"    ‚Ä¢ Observations: {initial_count}")print(f"    ‚Ä¢ Unique papers: {initial_papers}")# Use only valid data points (non-missing effect size, variance, and weight)analysis_data = data_filtered.dropna(subset=[effect_col, var_col, 'w_fixed']).copy()# Ensure variance is positivepositive_var = analysis_data[var_col] > 0n_non_positive = (~positive_var).sum()if n_non_positive > 0:    print(f"\n  ‚ö†Ô∏è  Removing {n_non_positive} observations with non-positive variance")    analysis_data = analysis_data[positive_var].copy()# Final countsk = len(analysis_data)k_papers = analysis_data['id'].nunique()if k < 1:    print(f"\n‚ùå ERROR: No valid studies available for meta-analysis")    print(f"  Possible causes:")    print(f"    ‚Ä¢ All variances are zero or negative")    print(f"    ‚Ä¢ Missing effect size data")    print(f"    ‚Ä¢ All weights are invalid")    raise ValueError("No valid studies available for meta-analysis after filtering.")print(f"\n  ‚úì Final analysis dataset:")print(f"    ‚Ä¢ Observations (k): {k}")print(f"    ‚Ä¢ Unique papers: {k_papers}")print(f"    ‚Ä¢ Removed: {initial_count - k} observations")# Calculate average observations per paperavg_obs_per_paper = k / k_papers if k_papers > 0 else 0print(f"    ‚Ä¢ Avg obs per paper: {avg_obs_per_paper:.2f}")# --- STEP 3: HANDLE SINGLE STUDY CASE ---if k == 1:    print("\n" + "="*70)    print("‚ö†Ô∏è  SINGLE STUDY ANALYSIS")    print("="*70)    print(f"\n‚ö†Ô∏è  WARNING: Only one observation available (k=1)")    print(f"  Meta-analysis requires multiple studies")    print(f"  Reporting single study results:")    single_study = analysis_data.iloc[0]    print(f"\nüìã Single Study Details:")    print(f"  Study ID: {single_study.get('id', 'N/A')}")    print(f"  {es_config['effect_label_short']}: {single_study[effect_col]:.4f}")    print(f"  Variance: {single_study[var_col]:.6f}")    print(f"  SE: {single_study[se_col]:.4f}")    print(f"  Treatment mean: {single_study['xe']:.4f}")    print(f"  Control mean: {single_study['xc']:.4f}")    print(f"  Sample size (treatment): {int(single_study['ne'])}")    print(f"  Sample size (control): {int(single_study['nc'])}")    if es_config['has_fold_change']:        if 'fold_change' in single_study:            print(f"  Fold-change: {single_study['fold_change']:.2f}√ó")        if 'Response_Ratio' in single_study:            print(f"  Response Ratio: {single_study['Response_Ratio']:.3f}")    # Calculate confidence interval    z_crit = norm.ppf(0.975)  # 1.96    ci_lower = single_study[effect_col] - z_crit * single_study[se_col]    ci_upper = single_study[effect_col] + z_crit * single_study[se_col]    print(f"\n  95% CI: [{ci_lower:.4f}, {ci_upper:.4f}]")    # Set variables to NaN for consistency    pooled_effect_fixed = single_study[effect_col]    pooled_var_fixed = single_study[var_col]    pooled_SE_fixed = single_study[se_col]    ci_lower_fixed = ci_lower    ci_upper_fixed = ci_upper    p_value_fixed = np.nan    Qt = np.nan    p_heterogeneity = np.nan    I_squared = np.nan    tau_squared_DL = np.nan    pooled_effect_random = pooled_effect_fixed    pooled_var_random = pooled_var_fixed    pooled_SE_random = pooled_SE_fixed    ci_lower_random = ci_lower    ci_upper_random = ci_upper    p_value_random = np.nan    pi_lower_random = np.nan    pi_upper_random = np.nan    print(f"\n" + "="*70)    print(f"‚ö†Ô∏è  META-ANALYSIS NOT POSSIBLE WITH ONE STUDY")    print(f"="*70)    print(f"\nRecommendations:")    print(f"  1. Report single study results with appropriate caution")    print(f"  2. Cannot assess heterogeneity or publication bias")    print(f"  3. Consider collecting more studies before drawing conclusions")else:    # --- STEP 4: FIXED-EFFECTS MODEL ---    print("\n" + "="*70)    print("STEP 3: FIXED-EFFECTS MODEL")    print("="*70)    print(f"\nüìê Model Assumption:")    print(f"  All studies share a common true effect size")    print(f"  Differences between studies are due to sampling error only")    print(f"\nüî¢ Calculating inverse-variance weighted mean...")    # Significance level    alpha = 0.05    z_crit = norm.ppf(1 - alpha / 2)  # ~1.96 for 95% CI    # Calculate sum of weights    sum_w_fixed = analysis_data['w_fixed'].sum()    if sum_w_fixed <= 0:        print(f"‚ùå ERROR: Sum of fixed-effects weights is non-positive")        raise ValueError("Sum of fixed-effects weights is non-positive. Check variance values.")    print(f"  Sum of weights: {sum_w_fixed:.2f}")    # Pooled effect size (weighted mean)    pooled_effect_fixed = (analysis_data['w_fixed'] * analysis_data[effect_col]).sum() / sum_w_fixed    # Variance of pooled effect    pooled_var_fixed = 1 / sum_w_fixed    pooled_SE_fixed = np.sqrt(pooled_var_fixed)    # 95% Confidence Interval    ci_lower_fixed = pooled_effect_fixed - z_crit * pooled_SE_fixed    ci_upper_fixed = pooled_effect_fixed + z_crit * pooled_SE_fixed    # Test significance (H0: effect = 0)    z_stat_fixed = pooled_effect_fixed / pooled_SE_fixed    p_value_fixed = 2 * (1 - norm.cdf(abs(z_stat_fixed)))    # Display results    print(f"\nüìä Fixed-Effects Results:")    print(f"  {'Statistic':<25} {'Value':>15}")    print(f"  {'-'*25} {'-'*15}")    print(f"  {'Pooled ' + es_config['effect_label_short']:<25} {pooled_effect_fixed:>15.4f}")    print(f"  {'Standard Error':<25} {pooled_SE_fixed:>15.4f}")    print(f"  {'Variance':<25} {pooled_var_fixed:>15.6f}")    print(f"  {'95% CI Lower':<25} {ci_lower_fixed:>15.4f}")    print(f"  {'95% CI Upper':<25} {ci_upper_fixed:>15.4f}")    print(f"  {'Z-statistic':<25} {z_stat_fixed:>15.4f}")    print(f"  {'P-value':<25} {p_value_fixed:>15.4g}")    # Interpretation for ratio-based measures    if es_config['has_fold_change']:        print(f"\nüìà Biological Interpretation:")        if effect_type == 'lnRR':            pooled_RR_fixed = np.exp(pooled_effect_fixed)            pooled_fold_fixed = pooled_RR_fixed if pooled_effect_fixed >= 0 else -1/pooled_RR_fixed            pooled_pct_fixed = (pooled_RR_fixed - 1) * 100            ci_lower_RR = np.exp(ci_lower_fixed)            ci_upper_RR = np.exp(ci_upper_fixed)            print(f"  {'Metric':<30} {'Value':>15}")            print(f"  {'-'*30} {'-'*15}")            print(f"  {'Response Ratio (RR)':<30} {pooled_RR_fixed:>15.3f}")            print(f"  {'Fold-change':<30} {pooled_fold_fixed:>+14.2f}√ó")            print(f"  {'Percent change':<30} {pooled_pct_fixed:>+14.1f}%")            print(f"  {'95% CI (RR scale)':<30} [{ci_lower_RR:.3f}, {ci_upper_RR:.3f}]")            # Direction interpretation            if pooled_effect_fixed > 0.05:                direction = "INCREASE (upregulation)"            elif pooled_effect_fixed < -0.05:                direction = "DECREASE (downregulation)"            else:                direction = "NO CHANGE"            print(f"\n  Overall direction: {direction}")        elif effect_type == 'log_or':            pooled_OR_fixed = np.exp(pooled_effect_fixed)            ci_lower_OR = np.exp(ci_lower_fixed)            ci_upper_OR = np.exp(ci_upper_fixed)            print(f"  {'Metric':<30} {'Value':>15}")            print(f"  {'-'*30} {'-'*15}")            print(f"  {'Odds Ratio (OR)':<30} {pooled_OR_fixed:>15.3f}")            print(f"  {'95% CI (OR scale)':<30} [{ci_lower_OR:.3f}, {ci_upper_OR:.3f}]")            if pooled_OR_fixed > 1:                direction = "Positive association"            elif pooled_OR_fixed < 1:                direction = "Negative association"            else:                direction = "No association"            print(f"\n  Interpretation: {direction}")    # Significance interpretation    print(f"\nüìå Statistical Significance:")    if p_value_fixed < 0.001:        sig_text = "HIGHLY SIGNIFICANT (p < 0.001)"        sig_symbol = "***"    elif p_value_fixed < 0.01:        sig_text = "VERY SIGNIFICANT (p < 0.01)"        sig_symbol = "**"    elif p_value_fixed < 0.05:        sig_text = "SIGNIFICANT (p < 0.05)"        sig_symbol = "*"    else:        sig_text = "NOT SIGNIFICANT (p ‚â• 0.05)"        sig_symbol = "ns"    print(f"  The overall effect is {sig_text} {sig_symbol}")    # --- STEP 5: HETEROGENEITY ASSESSMENT ---    print("\n" + "="*70)    print("STEP 4: HETEROGENEITY ASSESSMENT")    print("="*70)    print(f"\nüìä Testing for variability across studies...")    # Cochran's Q statistic    Qt = (analysis_data['w_fixed'] * (analysis_data[effect_col] - pooled_effect_fixed)**2).sum()    df_Q = k - 1    print(f"\nüî¨ Cochran's Q Test:")    print(f"  Q statistic: {Qt:.4f}")    print(f"  Degrees of freedom: {df_Q}")    print(f"  Expected value under H‚ÇÄ: {df_Q}")    # P-value for Q test (H0: homogeneous effects)    if df_Q > 0:        p_heterogeneity = 1 - chi2.cdf(Qt, df_Q)        print(f"  P-value (œá¬≤ test): {p_heterogeneity:.4g}")        if p_heterogeneity < 0.001:            q_interp = "Highly significant heterogeneity (p < 0.001)"        elif p_heterogeneity < 0.01:            q_interp = "Very significant heterogeneity (p < 0.01)"        elif p_heterogeneity < 0.10:            q_interp = "Significant heterogeneity (p < 0.10)"        else:            q_interp = "No significant heterogeneity (p ‚â• 0.10)"        print(f"  Interpretation: {q_interp}")    else:        p_heterogeneity = np.nan        print(f"  P-value: N/A (only one study)")    # I-squared (proportion of variance due to heterogeneity)    print(f"\nüìè I¬≤ (I-squared) Statistic:")    if Qt > df_Q:        I_squared = ((Qt - df_Q) / Qt) * 100    else:        I_squared = 0    print(f"  I¬≤ = {I_squared:.2f}%")    print(f"  Interpretation: {I_squared:.2f}% of total variation is due to heterogeneity")    # Interpretation of I¬≤ with color coding    if I_squared < 25:        i2_interp = "Low heterogeneity (might not be important)"        i2_color = "üü¢"        i2_recommendation = "Fixed or random effects both acceptable"    elif I_squared < 50:        i2_interp = "Moderate heterogeneity"        i2_color = "üü°"        i2_recommendation = "Consider random effects model"    elif I_squared < 75:        i2_interp = "Substantial heterogeneity"        i2_color = "üü†"        i2_recommendation = "Use random effects model; explore sources"    else:        i2_interp = "Considerable heterogeneity"        i2_color = "üî¥"        i2_recommendation = "Use random effects model; investigate thoroughly"    print(f"  {i2_color} {i2_interp}")    print(f"  ‚Üí {i2_recommendation}")    # Tau-squared (between-study variance) - using selected method    print(f"\nüî¨ Between-Study Variance (Tau¬≤):")    # Get selected method from config    selected_method = ANALYSIS_CONFIG.get('tau_method', 'DL')    # Calculate tau-squared using selected method    if 'calculate_tau_squared' in globals() and selected_method != 'DL':        # Use advanced estimators from Cell 4.5        print(f"  Using {selected_method} estimator...")        tau_squared_DL, tau_info = calculate_tau_squared(            analysis_data,            effect_col,            var_col,            method=selected_method        )        tau_squared_DL = float(tau_squared_DL)        tau_DL = np.sqrt(tau_squared_DL)        method_used = selected_method        # Also calculate DL for comparison        sum_w_fixed_sq = (analysis_data['w_fixed']**2).sum()        C = sum_w_fixed - (sum_w_fixed_sq / sum_w_fixed)        if C > 0 and Qt > df_Q:            tau_squared_DL_comparison = (Qt - df_Q) / C        else:            tau_squared_DL_comparison = 0    else:        # Fallback to DerSimonian-Laird (inline calculation)        sum_w_fixed_sq = (analysis_data['w_fixed']**2).sum()        C = sum_w_fixed - (sum_w_fixed_sq / sum_w_fixed)        print(f"  C constant: {C:.4f}")        if C > 0 and Qt > df_Q:            tau_squared_DL = (Qt - df_Q) / C        else:            tau_squared_DL = 0        tau_DL = np.sqrt(tau_squared_DL)        method_used = 'DL'        tau_squared_DL_comparison = None    print(f"  Tau¬≤ (variance): {tau_squared_DL:.6f}")    print(f"  Tau (SD): {tau_DL:.4f}")    print(f"  Method: {method_used}")    if tau_squared_DL > 0:        print(f"  Interpretation: Average between-study variation = {tau_DL:.4f} {es_config['effect_label_short']} units")    else:        print(f"  Interpretation: No detectable between-study variation")    # Display method status and comparison    if method_used != 'DL' and k >= 5:        # Enhanced comparison using all tau estimators        print(f"\n" + "="*70)        print("üìä TAU-SQUARED ESTIMATOR COMPARISON")        print("="*70)        print(f"\nComparing all available tau-squared estimation methods:")        print(f"(Sample size: k = {k} studies)\n")        # Get all estimator results        comparison_df = compare_tau_estimators(analysis_data, effect_col, var_col)        # Convert DataFrame to dict for easier access        comparison_results = dict(zip(comparison_df['Method'], comparison_df['œÑ¬≤']))        # Display formatted table        print(f"{'Method':<15} {'œÑ¬≤':>12} {'œÑ':>12} {'% Diff from REML':>18}   ")        print(f"{'-'*15} {'-'*12} {'-'*12} {'-'*18}   ")        # Get REML value for comparison        reml_tau_sq = float(comparison_results['REML'])        # Display each method        for method_name, tau_sq in comparison_results.items():            tau = np.sqrt(float(tau_sq))            # Calculate % difference from REML            if reml_tau_sq > 0:                pct_diff = ((float(tau_sq) - reml_tau_sq) / reml_tau_sq) * 100            else:                pct_diff = 0            # Add indicator for the method that was actually used            indicator = " ‚Üê" if method_name == method_used else ""            print(f"{method_name:<15} {float(tau_sq):>12.6f} {tau:>12.4f} {pct_diff:>17.1f}%{indicator:>3}")        print()        # Calculate REML vs DL difference for interpretation        dl_tau_sq = float(comparison_results['DL'])        if reml_tau_sq > 0:            reml_dl_diff = abs((reml_tau_sq - dl_tau_sq) / reml_tau_sq) * 100        else:            reml_dl_diff = 0        # Provide interpretation        print(f"üìã Interpretation:")        print(f"  REML vs DL difference: {reml_dl_diff:.1f}%")        if reml_dl_diff > 20:            print(f"  ‚ö†Ô∏è  Large difference - method choice is important")            print(f"  ‚Üí REML provides more accurate estimate for this dataset")        elif reml_dl_diff > 10:            print(f"  ‚ÑπÔ∏è  Moderate difference - REML recommended")            print(f"  ‚Üí Consider using REML for more reliable heterogeneity estimates")        else:            print(f"  ‚úì Small difference - methods agree")            print(f"  ‚Üí All methods provide similar tau-squared estimates")        print(f"\nüí° Note: The method marked with ‚Üê was used in this analysis")    elif tau_squared_DL_comparison is not None and method_used != 'DL':        # Fallback to simple comparison for k < 5        # Calculate difference        diff_abs = abs(tau_squared_DL - tau_squared_DL_comparison)        if tau_squared_DL_comparison > 0:            diff_pct = (diff_abs / tau_squared_DL_comparison) * 100        else:            diff_pct = 0        # Display comparison        print(f"\nüìä Method Comparison:")        print(f"  {method_used} œÑ¬≤: {tau_squared_DL:.6f}")        print(f"  DL œÑ¬≤:   {tau_squared_DL_comparison:.6f}")        print(f"  Difference: {diff_abs:.6f} ({diff_pct:.1f}%)")        if diff_pct > 10:            print(f"  ‚ö†Ô∏è  WARNING: Difference >10% - method choice may substantially affect results")        elif diff_pct > 5:            print(f"  ‚ö° Moderate difference - {method_used} provides more accurate estimate")        else:            print(f"  ‚úì Methods agree closely")        print(f"\nüí° Note: Full comparison available with k ‚â• 5 studies (current: k = {k})")    # Overall heterogeneity summary    print(f"\nüìã Heterogeneity Summary:")    print(f"  {'Statistic':<20} {'Value':>15} {'Interpretation':<30}")    print(f"  {'-'*20} {'-'*15} {'-'*30}")    print(f"  {'Q':<20} {Qt:>15.2f} {'Test statistic':<30}")    print(f"  {'P-value':<20} {p_heterogeneity:>15.4g} {q_interp.split('(')[0].strip():<30}")    print(f"  {'I¬≤':<20} {I_squared:>14.1f}% {i2_interp.split('(')[0].strip():<30}")    print(f"  {'Tau¬≤':<20} {tau_squared_DL:>15.4f} {'Between-study variance':<30}")    print(f"  {'Tau':<20} {tau_DL:>15.4f} {'Between-study SD':<30}")    # Continue to Part 2...    # --- STEP 6: RANDOM-EFFECTS MODEL ---    print("\n" + "="*70)    print("STEP 5: RANDOM-EFFECTS MODEL")    print("="*70)    print(f"\nüìê Model Assumption:")    print(f"  Studies estimate different but related true effects")    print(f"  Accounts for both within-study and between-study variation")    print(f"  More conservative when heterogeneity is present")    print(f"\nüî¢ Calculating random-effects weights...")    print(f"  Formula: w_random = 1 / (variance + œÑ¬≤)")    # Calculate random-effects weights    analysis_data['w_random'] = 1 / (analysis_data[var_col] + tau_squared_DL)    sum_w_random = analysis_data['w_random'].sum()    # Define variable for KNAPP-HARTUNG use    tau_squared = tau_squared_DL    if sum_w_random <= 0:        print(f"\n‚ùå WARNING: Sum of random-effects weights is non-positive")        print(f"  This should not occur with valid data")        pooled_effect_random = np.nan        pooled_var_random = np.nan        pooled_SE_random = np.nan        ci_lower_random = np.nan        ci_upper_random = np.nan        z_stat_random = np.nan        p_value_random = np.nan        pi_lower_random = np.nan        pi_upper_random = np.nan    else:        print(f"  Sum of random-effects weights: {sum_w_random:.2f}")        print(f"  Sum of fixed-effects weights:  {sum_w_fixed:.2f}")        # Ratio comparison        weight_ratio = sum_w_random / sum_w_fixed        print(f"  Weight ratio (RE/FE): {weight_ratio:.3f}")        if weight_ratio < 0.5:            print(f"  ‚Üí Random effects gives much less weight to studies (high heterogeneity)")        elif weight_ratio < 0.8:            print(f"  ‚Üí Random effects moderately reduces weights")        else:            print(f"  ‚Üí Random effects similar to fixed effects (low heterogeneity)")        # Pooled effect size        pooled_effect_random = (analysis_data['w_random'] * analysis_data[effect_col]).sum() / sum_w_random        # Variance of pooled effect        pooled_var_random = 1 / sum_w_random        pooled_SE_random = np.sqrt(pooled_var_random)        # 95% CI        ci_lower_random = pooled_effect_random - z_crit * pooled_SE_random        ci_upper_random = pooled_effect_random + z_crit * pooled_SE_random        # Test significance        z_stat_random = pooled_effect_random / pooled_SE_random        p_value_random = 2 * (1 - norm.cdf(abs(z_stat_random)))        # =============================================================================        # APPLY KNAPP-HARTUNG CORRECTION (if enabled)        # =============================================================================        # Initialize K-H variables to nan/None in case of skip        kh_results = None        pooled_SE_random_KH = np.nan        ci_lower_random_KH = np.nan        ci_upper_random_KH = np.nan        p_value_random_KH = np.nan        if k > 1 and use_kh_widget.value:            print("\n" + "="*70)            print("KNAPP-HARTUNG ADJUSTMENT")            print("="*70)            kh_results = calculate_knapp_hartung_ci(                yi=analysis_data[effect_col].values,                vi=analysis_data[var_col].values,                tau_sq=tau_squared,  # Uses the selected estimator                pooled_effect=pooled_effect_random,                alpha=0.05            )            # --- FIX STARTS HERE (Indentation was off for the next block) ---            if kh_results is not None:                print(f"\nüìê Applying Knapp-Hartung correction to random-effects CI:")                print(f"  ‚Ä¢ Degrees of freedom: {kh_results['df']}")                print(f"  ‚Ä¢ t critical value: {kh_results['t_crit']:.3f} (vs. 1.96 for normal)")                print(f"  ‚Ä¢ Q statistic: {kh_results['Q']:.3f}")                # Compare standard vs K-H                print(f"\nüìä Comparison of Methods:")                print(f"  {'Method':<22} {'SE':<10} {'95% CI Lower':<13} {'95% CI Upper':<13} {'P-value':<10}")                print(f"  {'-'*70}")                print(f"  {'Standard (Z-test)':<22} {pooled_SE_random:<10.4f} {ci_lower_random:<13.4f} {ci_upper_random:<13.4f} {p_value_random:<10.4g}")                print(f"  {'Knapp-Hartung (t)':<22} {kh_results['se_KH']:<10.4f} {kh_results['ci_lower']:<13.4f} {kh_results['ci_upper']:<13.4f} {kh_results['p_value']:<10.4g}")                # Calculate CI width difference                ci_width_standard = ci_upper_random - ci_lower_random                ci_width_kh = kh_results['ci_upper'] - kh_results['ci_lower']                width_increase = ((ci_width_kh - ci_width_standard) / ci_width_standard) * 100                print(f"\n  ‚Ä¢ K-H CI is {abs(width_increase):.1f}% {'wider' if width_increase > 0 else 'narrower'} than standard CI")                # Check if conclusion changes                standard_sig = p_value_random < 0.05                kh_sig = kh_results['p_value'] < 0.05                if standard_sig != kh_sig:                    print(f"\n  ‚ö†Ô∏è  IMPORTANT: Statistical significance CHANGES with K-H correction!")                    print(f"    Standard: p = {p_value_random:.4g} ({'significant' if standard_sig else 'non-significant'})")                    print(f"    K-H:      p = {kh_results['p_value']:.4g} ({'significant' if kh_sig else 'non-significant'})")                else:                    print(f"  ‚úì Conclusion does not change (both {'significant' if kh_sig else 'non-significant'})")                # Recommendation based on k                print(f"\nüí° RECOMMENDATION:")                if k < 20:                    print(f"  With k = {k} studies, the Knapp-Hartung method is RECOMMENDED.")                    print(f"  Report the K-H confidence interval as your primary result.")                else:                    print(f"  With k = {k} studies, both methods give similar results.")                    print(f"  K-H is more conservative and may be preferred.")                # Store both standard and K-H results                pooled_SE_random_KH = kh_results['se_KH']                ci_lower_random_KH = kh_results['ci_lower']                ci_upper_random_KH = kh_results['ci_upper']                p_value_random_KH = kh_results['p_value']                # Ensure 'overall_results' key exists before accessing                if 'overall_results' not in ANALYSIS_CONFIG:                    ANALYSIS_CONFIG['overall_results'] = {}                # Save to ANALYSIS_CONFIG                ANALYSIS_CONFIG['overall_results']['knapp_hartung'] = {                    'used': True,                    'se': kh_results['se_KH'],                    'ci_lower': kh_results['ci_lower'],                    'ci_upper': kh_results['ci_upper'],                    't_stat': kh_results['t_stat'],                    't_crit': kh_results['t_crit'],                    'df': kh_results['df'],                    'p_value': kh_results['p_value'],                    'Q': kh_results['Q'],                    'comparison': {                        'standard_se': pooled_SE_random,                        'standard_ci': [ci_lower_random, ci_upper_random],                        'standard_p': p_value_random,                        'kh_ci': [kh_results['ci_lower'], kh_results['ci_upper']],                        'width_increase_percent': width_increase,                        'significance_changed': standard_sig != kh_sig                    }                }                print(f"\n  ‚úì Results saved to ANALYSIS_CONFIG['overall_results']['knapp_hartung']")            else:                print("  ‚ö†Ô∏è  Knapp-Hartung calculation failed (e.g., k=1)")                if 'overall_results' not in ANALYSIS_CONFIG: ANALYSIS_CONFIG['overall_results'] = {}                ANALYSIS_CONFIG['overall_results']['knapp_hartung'] = {'used': False, 'reason': 'k_or_calc_error'}        # --- FIX ENDS HERE (Corrected indentation of the following elif/else blocks) ---        elif k <= 1:            print(f"\n  ‚ÑπÔ∏è  Knapp-Hartung correction not applicable (k={k})")            if 'overall_results' not in ANALYSIS_CONFIG: ANALYSIS_CONFIG['overall_results'] = {}            ANALYSIS_CONFIG['overall_results']['knapp_hartung'] = {'used': False, 'reason': 'k<=1'}        else:            print(f"\n  ‚ÑπÔ∏è  Knapp-Hartung correction not applied (user disabled)")            if 'overall_results' not in ANALYSIS_CONFIG: ANALYSIS_CONFIG['overall_results'] = {}            ANALYSIS_CONFIG['overall_results']['knapp_hartung'] = {'used': False, 'reason': 'user_disabled'}        # Display results (Standard RE results are displayed regardless of K-H correction status)        print(f"\nüìä Random-Effects Results:")        print(f"  {'Statistic':<25} {'Value':>15}")        print(f"  {'-'*25} {'-'*15}")        print(f"  {'Pooled ' + es_config['effect_label_short']:<25} {pooled_effect_random:>15.4f}")        print(f"  {'Standard Error (Z-test)':<25} {pooled_SE_random:>15.4f}")        print(f"  {'Variance':<25} {pooled_var_random:>15.6f}")        print(f"  {'95% CI Lower (Z-test)':<25} {ci_lower_random:>15.4f}")        print(f"  {'95% CI Upper (Z-test)':<25} {ci_upper_random:>15.4f}")        print(f"  {'Z-statistic':<25} {z_stat_random:>15.4f}")        print(f"  {'P-value (Z-test)':<25} {p_value_random:>15.4g}")        # Check K-H status and display K-H results if available        if kh_results is not None:            print(f"  {'SE (K-H t-test)':<25} {pooled_SE_random_KH:>15.4f}")            print(f"  {'95% CI Lower (K-H t-test)':<25} {ci_lower_random_KH:>15.4f}")            print(f"  {'95% CI Upper (K-H t-test)':<25} {ci_upper_random_KH:>15.4f}")            print(f"  {'P-value (K-H t-test)':<25} {p_value_random_KH:>15.4g}")        # Interpretation for ratio-based measures        if es_config['has_fold_change']:            print(f"\nüìà Biological Interpretation:")            if effect_type == 'lnRR':                pooled_RR_random = np.exp(pooled_effect_random)                pooled_fold_random = pooled_RR_random if pooled_effect_random >= 0 else -1/pooled_RR_random                pooled_pct_random = (pooled_RR_random - 1) * 100                ci_lower_RR_random = np.exp(ci_lower_random)                ci_upper_RR_random = np.exp(ci_upper_random)                print(f"  {'Metric':<30} {'Value':>15}")                print(f"  {'-'*30} {'-'*15}")                print(f"  {'Response Ratio (RR)':<30} {pooled_RR_random:>15.3f}")                print(f"  {'Fold-change':<30} {pooled_fold_random:>+14.2f}√ó")                print(f"  {'Percent change':<30} {pooled_pct_random:>+14.1f}%")                print(f"  {'95% CI (RR scale)':<30} [{ci_lower_RR_random:.3f}, {ci_upper_RR_random:.3f}]")                # Direction interpretation                if pooled_effect_random > 0.05:                    direction = "INCREASE (upregulation)"                elif pooled_effect_random < -0.05:                    direction = "DECREASE (downregulation)"                else:                    direction = "NO CHANGE"                print(f"\n  Overall direction: {direction}")            elif effect_type == 'log_or':                pooled_OR_random = np.exp(pooled_effect_random)                ci_lower_OR_random = np.exp(ci_lower_random)                ci_upper_OR_random = np.exp(ci_upper_random)                print(f"  {'Metric':<30} {'Value':>15}")                print(f"  {'-'*30} {'-'*15}")                print(f"  {'Odds Ratio (OR)':<30} {pooled_OR_random:>15.3f}")                print(f"  {'95% CI (OR scale)':<30} [{ci_lower_OR_random:.3f}, {ci_upper_OR_random:.3f}]")                if pooled_OR_random > 1:                    direction = "Positive association"                elif pooled_OR_random < 1:                    direction = "Negative association"                else:                    direction = "No association"                print(f"\n  Interpretation: {direction}")        # Significance interpretation        print(f"\nüìå Statistical Significance:")        if p_value_random < 0.001:            sig_text_re = "HIGHLY SIGNIFICANT (p < 0.001)"            sig_symbol_re = "***"        elif p_value_random < 0.01:            sig_text_re = "VERY SIGNIFICANT (p < 0.01)"            sig_symbol_re = "**"        elif p_value_random < 0.05:            sig_text_re = "SIGNIFICANT (p < 0.05)"            sig_symbol_re = "*"        else:            sig_text_re = "NOT SIGNIFICANT (p ‚â• 0.05)"            sig_symbol_re = "ns"        print(f"  The overall effect is {sig_text_re} {sig_symbol_re}")        # --- STEP 7: 95% PREDICTION INTERVAL ---        print("\n" + "="*70)        print("STEP 6: 95% PREDICTION INTERVAL")        print("="*70)        print(f"\nüìä Prediction Interval (PI):")        print(f"  Estimates where the true effect in a NEW study is expected to fall")        print(f"  Wider than CI because it accounts for between-study heterogeneity")        print(f"  More clinically relevant than CI for assessing effect consistency")        if k > 2:            # Degrees of freedom for t-distribution            df_pi = k - 2            t_crit = t.ppf(1 - alpha / 2, df=df_pi)            # Standard error for prediction            # SE_prediction = sqrt(œÑ¬≤ + SE¬≤_pooled)            se_prediction = np.sqrt(tau_squared_DL + pooled_var_random)            # Calculate prediction interval            pi_lower_random = pooled_effect_random - t_crit * se_prediction            pi_upper_random = pooled_effect_random + t_crit * se_prediction            print(f"\n  üìè Calculation Details:")            print(f"    Pooled effect: {pooled_effect_random:.4f}")            print(f"    Tau¬≤ (between-study var): {tau_squared_DL:.6f}")            print(f"    SE¬≤ (pooled estimate): {pooled_var_random:.6f}")            print(f"    SE (prediction): {se_prediction:.4f}")            print(f"    t-critical value (df={df_pi}): {t_crit:.3f}")            print(f"    Margin of error: ¬±{t_crit * se_prediction:.4f}")            print(f"\n  üìä Results:")            print(f"    95% Prediction Interval: [{pi_lower_random:.4f}, {pi_upper_random:.4f}]")            # Compare PI width to CI width            ci_width = ci_upper_random - ci_lower_random            pi_width = pi_upper_random - pi_lower_random            width_ratio = pi_width / ci_width if ci_width > 0 else np.inf            print(f"\n  üìê Interval Comparison:")            print(f"    CI width: {ci_width:.4f}")            print(f"    PI width: {pi_width:.4f}")            print(f"    Ratio (PI/CI): {width_ratio:.2f}√ó")            if width_ratio > 3:                print(f"    ‚Üí PI much wider than CI (substantial heterogeneity)")            elif width_ratio > 1.5:                print(f"    ‚Üí PI moderately wider than CI (moderate heterogeneity)")            else:                print(f"    ‚Üí PI similar to CI (low heterogeneity)")            # Interpretation for ratio measures            if es_config['has_fold_change'] and effect_type == 'lnRR':                pi_lower_RR = np.exp(pi_lower_random)                pi_upper_RR = np.exp(pi_upper_random)                print(f"\n  üìà Prediction Interval (RR scale):")                print(f"    95% PI: [{pi_lower_RR:.3f}, {pi_upper_RR:.3f}]")            # Check if PI includes null            null_value = es_config['null_value']            pi_includes_null = (pi_lower_random <= null_value <= pi_upper_random)            print(f"\n  üí° Interpretation:")            if pi_includes_null:                print(f"    ‚ö†Ô∏è  PI includes null effect ({null_value})")                print(f"    ‚Üí A future study could plausibly find no effect")                print(f"    ‚Üí Effect direction may not be consistent across all contexts")            else:                print(f"    ‚úì PI excludes null effect ({null_value})")                print(f"    ‚Üí Future studies expected to show consistent effect direction")                print(f"    ‚Üí High confidence in effect direction")            print(f"\n  üìù Note: In 95% of similar future studies, the true effect")            print(f"    is predicted to lie between {pi_lower_random:.4f} and {pi_upper_random:.4f}")        else:            print(f"\n  ‚ö†Ô∏è  Skipped: Not enough studies for prediction interval")            print(f"    Requires at least 3 studies (k ‚â• 3)")            print(f"    Current k = {k}")            pi_lower_random = np.nan            pi_upper_random = np.nan# --- STEP 8: MODEL COMPARISON ---print("\n" + "="*70)print("STEP 7: MODEL COMPARISON")print("="*70)if k > 1:    print(f"\nüìä Side-by-Side Comparison:")    print(f"\n  {'Model':<20} {'Effect':>12} {'SE':>10} {'95% CI':>28} {'P-value':>10}")    print(f"  {'-'*82}")    # Fixed-effects    fe_ci_str = f"[{ci_lower_fixed:>7.4f}, {ci_upper_fixed:>7.4f}]"    print(f"  {'Fixed-Effects':<20} {pooled_effect_fixed:>12.4f} {pooled_SE_fixed:>10.4f} {fe_ci_str:>28} {p_value_fixed:>10.4g}")    # Random-effects    if pd.notna(pooled_effect_random):        # Determine which CI/SE to use for display in the main comparison table        if kh_results is not None:            re_se_disp = pooled_SE_random_KH            re_ci_lower_disp = ci_lower_random_KH            re_ci_upper_disp = ci_upper_random_KH            re_p_value_disp = p_value_random_KH            model_label = 'Random-Effects (K-H)'        else:            re_se_disp = pooled_SE_random            re_ci_lower_disp = ci_lower_random            re_ci_upper_disp = ci_upper_random            re_p_value_disp = p_value_random            model_label = 'Random-Effects (Z)'        re_ci_str = f"[{re_ci_lower_disp:>7.4f}, {re_ci_upper_disp:>7.4f}]"        print(f"  {model_label:<20} {pooled_effect_random:>12.4f} {re_se_disp:>10.4f} {re_ci_str:>28} {re_p_value_disp:>10.4g}")        # Prediction interval        if pd.notna(pi_lower_random):            pi_str = f"[{pi_lower_random:>7.4f}, {pi_upper_random:>7.4f}]"            print(f"  {'95% Pred. Interval':<20} {'':<12} {'':<10} {pi_str:>28} {'':<10}")    # Calculate and display differences    if pd.notna(pooled_effect_random):        effect_diff = pooled_effect_random - pooled_effect_fixed        effect_diff_pct = (effect_diff / abs(pooled_effect_fixed)) * 100 if pooled_effect_fixed != 0 else np.inf        se_diff = pooled_SE_random - pooled_SE_fixed        se_ratio = pooled_SE_random / pooled_SE_fixed if pooled_SE_fixed > 0 else np.inf        print(f"\n  üìè Model Differences:")        print(f"    Effect difference (RE - FE): {effect_diff:+.4f} ({effect_diff_pct:+.1f}%)")        print(f"    SE difference (RE - FE): {se_diff:+.4f}")        print(f"    SE ratio (RE / FE): {se_ratio:.2f}√ó")        # Interpretation        print(f"\n  üí° Interpretation:")        if abs(effect_diff) < 0.05:            print(f"    ‚úì Models agree very closely")            print(f"      ‚Üí Low heterogeneity, either model acceptable")        elif abs(effect_diff) < 0.15:            print(f"    ‚ö†Ô∏è  Models show small differences")            print(f"      ‚Üí Some heterogeneity present, random-effects preferred")        elif abs(effect_diff) < 0.3:            print(f"    ‚ö†Ô∏è  Models show moderate differences")            print(f"      ‚Üí Moderate heterogeneity, use random-effects")        else:            print(f"    üî¥ Models show substantial differences")            print(f"      ‚Üí High heterogeneity, must use random-effects")            print(f"      ‚Üí Investigate sources of heterogeneity")        if se_ratio > 1.5:            print(f"\n    ‚ö†Ô∏è  Random-effects SE is {se_ratio:.1f}√ó larger than fixed-effects")            print(f"      ‚Üí Random-effects provides more conservative estimates")        # Check agreement on significance        fe_sig = p_value_fixed < 0.05        # Use K-H p-value for RE significance comparison if available        re_sig_comparison = re_p_value_disp < 0.05 if pd.notna(re_p_value_disp) else re_sig        if fe_sig == re_sig_comparison:            print(f"\n    ‚úì Both models agree on statistical significance (using {model_label})")        else:            print(f"\n    ‚ö†Ô∏è  Models disagree on statistical significance! (using {model_label})")            if fe_sig and not re_sig_comparison:                print(f"      ‚Üí Fixed-effects significant, {model_label} not")                print(f"      ‚Üí Use random-effects ({model_label}, more conservative)")            else:                print(f"      ‚Üí {model_label} significant, fixed-effects not")                print(f"      ‚Üí Unlikely scenario, verify data")# --- STEP 9: RECOMMENDATIONS ---print("\n" + "="*70)print("STEP 8: INTERPRETATION & RECOMMENDATIONS")print("="*70)if k == 1:    print(f"\nüî¥ SINGLE STUDY LIMITATION")    print(f"\n  Current Status:")    print(f"  ‚Ä¢ Only one observation available")    print(f"  ‚Ä¢ Cannot perform meta-analysis")    print(f"  ‚Ä¢ Cannot assess heterogeneity")    print(f"  ‚Ä¢ Cannot evaluate publication bias")    print(f"\n  Recommendations:")    print(f"  1. Report single study results with appropriate caution")    print(f"  2. Acknowledge inability to generalize findings")    print(f"  3. Collect additional studies before drawing conclusions")    print(f"  4. Consider this a preliminary finding only")elif I_squared > 50 or (pd.notna(p_heterogeneity) and p_heterogeneity < 0.10):    print(f"\nüî¥ HIGH HETEROGENEITY DETECTED")    print(f"\n  Heterogeneity Metrics:")    print(f"  ‚Ä¢ I¬≤ = {I_squared:.1f}% ({i2_interp})")    print(f"  ‚Ä¢ Q test p-value = {p_heterogeneity:.4g}")    print(f"  ‚Ä¢ Tau¬≤ = {tau_squared_DL:.4f}")    print(f"  ‚Ä¢ Tau = {tau_DL:.4f}")    print(f"\n  üìã Required Actions:")    print(f"  1. ‚úì Use RANDOM-EFFECTS model (more conservative)")    print(f"  2. ‚úì Report prediction interval in addition to confidence interval")    print(f"  3. ‚ö†Ô∏è  Interpret pooled effect with caution")    print(f"  4. üîç Investigate sources of heterogeneity:")    print(f"\n  üîç Heterogeneity Investigation Plan:")    print(f"    a) Run subgroup analyses (available in next cells)")    print(f"       ‚Ä¢ Compare effects across study characteristics")    print(f"       ‚Ä¢ Test if moderators explain heterogeneity")    print(f"    b) Consider meta-regression (if sufficient studies)")    print(f"       ‚Ä¢ Continuous moderators (year, dose, duration)")    print(f"       ‚Ä¢ Categorical moderators (treatment type, outcome)")    print(f"    c) Conduct sensitivity analyses")    print(f"       ‚Ä¢ Remove outliers and reassess")    print(f"       ‚Ä¢ Exclude small studies")    print(f"       ‚Ä¢ Leave-one-out analysis")    print(f"    d) Check for influential studies")    print(f"       ‚Ä¢ Studies with very large/small effects")    print(f"       ‚Ä¢ Studies with large weights")    print(f"\n  üí° Reporting Guidelines:")    print(f"    ‚Ä¢ State: 'Substantial heterogeneity was present (I¬≤={I_squared:.1f}%)'")    print(f"    ‚Ä¢ Report both CI and PI for random-effects model")    if pd.notna(pi_lower_random) and (pi_lower_random <= es_config['null_value'] <= pi_upper_random): # Check against null value from config        print(f"    ‚Ä¢ Note: PI includes null, indicating effect may vary by context")    print(f"    ‚Ä¢ Discuss clinical/biological sources of heterogeneity")    print(f"    ‚Ä¢ Consider whether pooled estimate is meaningful")else:    print(f"\nüü¢ LOW-TO-MODERATE HETEROGENEITY")    print(f"\n  Heterogeneity Metrics:")    print(f"  ‚Ä¢ I¬≤ = {I_squared:.1f}% ({i2_interp})")    print(f"  ‚Ä¢ Q test p-value = {p_heterogeneity:.4g}")    print(f"  ‚Ä¢ Tau¬≤ = {tau_squared_DL:.4f}")    print(f"\n  üìã Recommendations:")    print(f"  1. ‚úì RANDOM-EFFECTS model preferred (conservative approach)")    print(f"  2. ‚úì Pooled effect is reliable and interpretable")    print(f"  3. ‚úì Both CI and PI relatively narrow and consistent")    print(f"  4. ‚ö†Ô∏è  Subgroup analyses still valuable for exploration")    print(f"\n  üí° Reporting Guidelines:")    print(f"    ‚Ä¢ State: 'Low heterogeneity was observed (I¬≤={I_squared:.1f}%)'")    print(f"    ‚Ä¢ Report random-effects pooled estimate as primary result")    print(f"    ‚Ä¢ Can mention fixed-effects agrees with random-effects")    print(f"    ‚Ä¢ Pooled estimate likely generalizable")# Effect size type specific recommendationsprint(f"\nüìä Recommendations for {es_config['effect_label']}:")if effect_type == 'lnRR':    print(f"\n  üìà Log Response Ratio Reporting:")    print(f"  ‚Ä¢ Always report both lnRR and fold-change for clarity")    print(f"  ‚Ä¢ State whether effect represents increase or decrease")    print(f"  ‚Ä¢ Consider reporting percent change for accessibility")    print(f"  ‚Ä¢ Compare magnitude to biologically relevant thresholds")    if k > 1 and 'pooled_fold_random' in locals():        print(f"  ‚Ä¢ Current pooled fold-change: {pooled_fold_random:+.2f}√ó")    print(f"\n  üîç Further Analyses to Consider:")    print(f"  ‚Ä¢ Separate meta-analyses for upregulation vs downregulation")    print(f"  ‚Ä¢ Check if effect varies by magnitude (small vs large changes)")    print(f"  ‚Ä¢ Assess if treatment duration affects effect size")elif effect_type in ['hedges_g', 'cohen_d']:    print(f"\n  üìä Standardized Mean Difference Reporting:")    print(f"  ‚Ä¢ Report effect size with Cohen's benchmark interpretation")    print(f"  ‚Ä¢ Provide context-specific interpretation when possible")    print(f"  ‚Ä¢ Note: Benchmarks are guidelines, not absolute rules")    print(f"  ‚Ä¢ Be cautious with effects |g| > 2 (often outliers)")    if k > 1:        # Classify pooled effect        abs_effect = abs(pooled_effect_random) if pd.notna(pooled_effect_random) else 0        if abs_effect >= 0.8:            magnitude = "LARGE"        elif abs_effect >= 0.5:            magnitude = "MEDIUM"        elif abs_effect >= 0.2:            magnitude = "SMALL"        else:            magnitude = "NEGLIGIBLE"        print(f"  ‚Ä¢ Current pooled effect magnitude: {magnitude} (|g|={abs_effect:.2f})")elif effect_type == 'log_or':    print(f"\n  üìä Log Odds Ratio Reporting:")    print(f"  ‚Ä¢ Always convert to OR for interpretation")    print(f"  ‚Ä¢ Clarify what OR > 1 vs OR < 1 means in your context")    print(f"  ‚Ä¢ Consider reporting as risk ratio if appropriate")    if k > 1 and 'pooled_OR_random' in locals() and pd.notna(pooled_OR_random):        print(f"  ‚Ä¢ Current pooled OR: {pooled_OR_random:.3f}")# Statistical power considerationif k > 1:    print(f"\n‚ö° Statistical Power Consideration:")    if k < 5:        print(f"  ‚ö†Ô∏è  Small number of studies (k={k})")        print(f"      ‚Ä¢ Limited power to detect heterogeneity")        print(f"      ‚Ä¢ Subgroup analyses may be underpowered")        print(f"      ‚Ä¢ I¬≤ estimate may be imprecise")    elif k < 10:        print(f"  üìä Moderate number of studies (k={k})")        print(f"      ‚Ä¢ Adequate power for overall effect")        print(f"      ‚Ä¢ Moderate power for heterogeneity tests")        print(f"      ‚Ä¢ Some subgroup analyses may be feasible")    else:        print(f"  ‚úì Good number of studies (k={k})")        print(f"      ‚Ä¢ Good power for all analyses")        print(f"      ‚Ä¢ Reliable heterogeneity estimates")        print(f"      ‚Ä¢ Subgroup analyses well-powered")# --- STEP 10: SAVE RESULTS ---print("\n" + "="*70)print("STEP 9: SAVING RESULTS")print("="*70)# Check if K-H results are available, and use them as the primary RE output if soif kh_results is not None:    final_re_se = pooled_SE_random_KH    final_re_ci_lower = ci_lower_random_KH    final_re_ci_upper = ci_upper_random_KH    final_re_p_value = p_value_random_KHelse:    final_re_se = pooled_SE_random    final_re_ci_lower = ci_lower_random    final_re_ci_upper = ci_upper_random    final_re_p_value = p_value_randomANALYSIS_CONFIG['overall_results'] = {    'timestamp': datetime.datetime.now(),    'k': k,    'k_papers': k_papers,    # Fixed-effects    'pooled_effect_fixed': pooled_effect_fixed,    'pooled_var_fixed': pooled_var_fixed,    'pooled_SE_fixed': pooled_SE_fixed if k > 1 else np.nan,    'ci_lower_fixed': ci_lower_fixed if k > 1 else np.nan,    'ci_upper_fixed': ci_upper_fixed if k > 1 else np.nan,    'z_stat_fixed': z_stat_fixed if k > 1 else np.nan,    'p_value_fixed': p_value_fixed if k > 1 else np.nan,    # Heterogeneity    'Qt': Qt,    'df_Q': df_Q if k > 1 else np.nan,    'p_heterogeneity': p_heterogeneity,    'I_squared': I_squared,    'I_squared_interpretation': i2_interp if k > 1 else 'N/A',    'tau_squared': tau_squared_DL,    'tau': tau_DL if k > 1 else np.nan,    # Random-effects (Standard Z-test results for consistency)    'pooled_effect_random': pooled_effect_random,    'pooled_var_random': pooled_var_random,    'pooled_SE_random_Z': pooled_SE_random if k > 1 and pd.notna(pooled_effect_random) else np.nan,    'ci_lower_random_Z': ci_lower_random if k > 1 and pd.notna(pooled_effect_random) else np.nan,    'ci_upper_random_Z': ci_upper_random if k > 1 and pd.notna(pooled_effect_random) else np.nan,    'z_stat_random': z_stat_random if k > 1 and pd.notna(pooled_effect_random) else np.nan,    'p_value_random_Z': p_value_random if k > 1 and pd.notna(pooled_effect_random) else np.nan,    # Primary Reported Random-effects (Z or KH)    'pooled_SE_random_reported': final_re_se,    'ci_lower_random_reported': final_re_ci_lower,    'ci_upper_random_reported': final_re_ci_upper,    'p_value_random_reported': final_re_p_value,    # Prediction interval    'pi_lower_random': pi_lower_random,    'pi_upper_random': pi_upper_random,    'pi_df': df_pi if k > 2 and pd.notna(pi_lower_random) else np.nan,    # Model comparison    'effect_difference': effect_diff if k > 1 and pd.notna(pooled_effect_random) else np.nan,    'se_ratio': se_ratio if k > 1 and pd.notna(pooled_effect_random) else np.nan,    # Interpretation    'recommended_model': 'random-effects' if k > 1 and (I_squared > 25 or p_heterogeneity < 0.10) else 'either',    'heterogeneity_level': i2_color if k > 1 else 'N/A'}# Add fold-changes if applicableif es_config['has_fold_change'] and k > 1:    if effect_type == 'lnRR':        ANALYSIS_CONFIG['overall_results']['pooled_fold_fixed'] = pooled_fold_fixed        ANALYSIS_CONFIG['overall_results']['pooled_fold_random'] = pooled_fold_random        ANALYSIS_CONFIG['overall_results']['pooled_RR_fixed'] = pooled_RR_fixed        ANALYSIS_CONFIG['overall_results']['pooled_RR_random'] = pooled_RR_random        ANALYSIS_CONFIG['overall_results']['pooled_pct_change_random'] = pooled_pct_random    elif effect_type == 'log_or':        ANALYSIS_CONFIG['overall_results']['pooled_OR_fixed'] = pooled_OR_fixed        ANALYSIS_CONFIG['overall_results']['pooled_OR_random'] = pooled_OR_randomprint(f"\n‚úì Results saved to ANALYSIS_CONFIG['overall_results']")print(f"\nüìä Saved metrics include:")print(f"  ‚Ä¢ Pooled effects (fixed & random)")print(f"  ‚Ä¢ Confidence intervals (Standard Z-test & K-H if applied)")print(f"  ‚Ä¢ Prediction interval")print(f"  ‚Ä¢ Heterogeneity statistics (Q, I¬≤, Tau¬≤)")print(f"  ‚Ä¢ P-values and significance tests")if es_config['has_fold_change']:    print(f"  ‚Ä¢ Fold-changes and interpretations")# Create summary metadataOVERALL_META_METADATA = {    'timestamp': datetime.datetime.now(),    'n_studies': k,    'n_papers': k_papers,    'model_recommended': ANALYSIS_CONFIG['overall_results']['recommended_model'],    'heterogeneity': {        'I_squared': I_squared,        'level': i2_interp if k > 1 else 'N/A',        'p_value': p_heterogeneity,        'tau_squared': tau_squared_DL    },    'primary_result': {        'effect': pooled_effect_random if k > 1 else pooled_effect_fixed,        # Use reported values for primary summary        'ci_lower': final_re_ci_lower if k > 1 else ci_lower_fixed,        'ci_upper': final_re_ci_upper if k > 1 else ci_upper_fixed,        'p_value': final_re_p_value if k > 1 else p_value_fixed,        'significant': (final_re_p_value < 0.05) if k > 1 and pd.notna(final_re_p_value) else False    }}print(f"\n‚úì Metadata saved to OVERALL_META_METADATA")# --- FINAL STATUS ---print("\n" + "="*70)print("‚úÖ OVERALL META-ANALYSIS COMPLETE")print("="*70)if k > 1:    print(f"\nüìä Key Findings Summary:")    print(f"  ‚Ä¢ Studies analyzed: {k} observations from {k_papers} papers")    print(f"  ‚Ä¢ Pooled effect ({ANALYSIS_CONFIG['overall_results']['recommended_model']}): {pooled_effect_random:.4f}")    print(f"  ‚Ä¢ 95% CI (Reported): [{final_re_ci_lower:.4f}, {final_re_ci_upper:.4f}]")    if pd.notna(pi_lower_random):        print(f"  ‚Ä¢ 95% PI: [{pi_lower_random:.4f}, {pi_upper_random:.4f}]")    # Update sig_text_re and sig_symbol_re based on the reported p-value    if final_re_p_value < 0.001:        summary_sig_text_re = "HIGHLY SIGNIFICANT (p < 0.001)"    elif final_re_p_value < 0.01:        summary_sig_text_re = "VERY SIGNIFICANT (p < 0.01)"    elif final_re_p_value < 0.05:        summary_sig_text_re = "SIGNIFICANT (p < 0.05)"    else:        summary_sig_text_re = "NOT SIGNIFICANT (p ‚â• 0.05)"    print(f"  ‚Ä¢ Statistical significance: {summary_sig_text_re}")    print(f"  ‚Ä¢ Heterogeneity (I¬≤): {I_squared:.1f}% - {i2_interp}")    if es_config['has_fold_change'] and effect_type == 'lnRR':        print(f"\nüìà Biological Interpretation:")        print(f"  ‚Ä¢ Pooled fold-change: {pooled_fold_random:+.2f}√ó")        print(f"  ‚Ä¢ Response ratio: {pooled_RR_random:.3f}")        print(f"  ‚Ä¢ Percent change: {pooled_pct_random:+.1f}%")    print(f"\nüéØ Conclusion:")    if final_re_p_value < 0.05:        conclusion_sig = "statistically significant"    else:        conclusion_sig = "not statistically significant"    if I_squared < 50:        conclusion_het = "with low-to-moderate heterogeneity"    else:        conclusion_het = "with substantial heterogeneity"    print(f"  The overall effect is {conclusion_sig} {conclusion_het}.")    if I_squared > 50:        print(f"  Further investigation of heterogeneity sources is recommended.")    if pd.notna(pi_lower_random) and (pi_lower_random <= es_config['null_value'] <= pi_upper_random):        print(f"  ‚ö†Ô∏è  Note: Prediction interval includes null effect,")        print(f"      suggesting effect may vary substantially by context.")else:    print(f"\nüìä Single Study Summary:")    print(f"  ‚Ä¢ Effect size: {pooled_effect_fixed:.4f}")    print(f"  ‚Ä¢ 95% CI: [{ci_lower_fixed:.4f}, {ci_upper_fixed:.4f}]")    print(f"  ‚Ä¢ Meta-analysis not performed (k=1)")print(f"\n‚ñ∂Ô∏è  Next Steps:")print(f"  1. Review the overall pooled estimates above")print(f"  2. Run SUBGROUP ANALYSIS to explore heterogeneity (next cell)")print(f"  3. Create FOREST PLOTS for visualization")print(f"  4. Assess PUBLICATION BIAS with funnel plots")  # Check if K-H was used for the overall effect (which is the recommended result)print(f"  5. Conduct SENSITIVITY ANALYSES (leave-one-out)")if 'knapp_hartung' in ANALYSIS_CONFIG.get('overall_results', {}) and ANALYSIS_CONFIG['overall_results']['knapp_hartung']['used']:    print(f"  6. Note: Overall Random-Effects CI/P-value uses Knapp-Hartung correction.")if I_squared > 50:    print(f"\nüí° Priority Recommendations:")    print(f"  ‚Ä¢ High heterogeneity detected - subgroup analysis is essential")    print(f"  ‚Ä¢ Consider meta-regression if moderators are available")    print(f"  ‚Ä¢ Check for outliers and influential studies")# =============================================================================# DISPLAY CONFIGURATION WIDGET AT END# =============================================================================# Widget is displayed here so it's visible and not buried by analysis outputprint("\n" + "="*70)print("‚öôÔ∏è  META-ANALYSIS CONFIGURATION")print("="*70)print()print("You can modify the heterogeneity estimator and CI method below:")print()# Display method selection widgetconfig_box = widgets.VBox([    method_help,    tau_method_widget,    kh_help,    use_kh_widget,    rerun_message], layout=widgets.Layout(    border='2px solid #2E86AB',    padding='15px',    margin='10px 0'))display(config_box)# Add helpful completion messagedisplay(widgets.HTML(    "<div style='background-color: #d4edda; border-left: 4px solid #28a745; padding: 12px; margin: 15px 0; border-radius: 4px;'>"    "‚úÖ <b>Analysis Complete!</b><br><br>"    "‚Ä¢ Review the results above<br>"    "‚Ä¢ Modify the estimator/correction in the widget above if needed<br>"    "‚Ä¢ Re-run this cell to recalculate with a different method<br>"    "‚Ä¢ Proceed to the next cell for advanced analyses"    "</div>"))print("\n" + "="*70)

In [None]:
# ‚ö†Ô∏è PREREQUISITES:# - Must calculate effect sizes first## Expected runtime: < 5 seconds## INTERPRETATION:# - I¬≤ > 75% indicates considerable heterogeneity (high variability between studies)# - I¬≤ 50-75% = moderate heterogeneity# - I¬≤ < 50% = low heterogeneity# - Significant Q-test (p < 0.05) indicates heterogeneity is present# THREE-LEVEL META-ANALYTIC MODEL## Implementation based on:# - Cheung (2014). Modeling dependent effect sizes with three-level meta-analyses:#   A structural equation modeling approach. Psychological Methods, 19(2), 211-229.# - Van den Noortgate et al. (2013). Three-level meta-analysis of dependent effect sizes.#   Behavior Research Methods, 45(2), 576-594.## Model structure:#   Level 1: Sampling variance within studies (known, vi)#   Level 2: Variance between effect sizes within the same study (tau_squared_level2)#   Level 3: Variance between studies (tau_squared_level3)## Estimation uses REML (Restricted Maximum Likelihood) for unbiased variance estimates.#@title üìä THREE-LEVEL META-ANALYSIS (ADVANCED)#@title üìä THREE-LEVEL META-ANALYSIS (ADVANCED)# =============================================================================# CELL 6.5: THREE-LEVEL (MULTILEVEL) META-ANALYSIS# Purpose: Account for dependency of effect sizes clustered within studies# Method: REML estimation for three-level model (y_ij = Œº + u_i + r_ij + e_ij)# Dependencies: Cell 4.5 (calculate_tau_squared), Cell 6 (overall_results)# Outputs: 'three_level_results' in ANALYSIS_CONFIG# =============================================================================# --- 1. CORE HELPER FUNCTIONS (THREE-LEVEL REML) ---def _get_three_level_estimates(params, y_all, v_all, N_total, M_studies):    """    Core function to calculate estimates given variance components.    Model: y_ij = Œº + u_i + r_ij + e_ij           Var(y_ij) = œÑ¬≤ + œÉ¬≤ + v_ij           Cov(y_ij, y_ik) = œÑ¬≤    V_i = (D_i + œÉ¬≤I) + œÑ¬≤J  (where D_i = diag(v_ij))    Args:        params (list): [tau_squared, sigma_squared]        y_all (list): List of numpy arrays, one for each study's effect sizes        v_all (list): List of numpy arrays, one for each study's sampling variances        N_total (int): Total number of observations        M_studies (int): Total number of studies    Returns:        dict: A dictionary containing all key model estimates    """    try:        tau_sq, sigma_sq = params        if tau_sq < 0 or sigma_sq < 0:            return {'log_lik_reml': np.inf}        sum_log_det_Vi = 0.0        sum_S = 0.0      # sum(1' * V_i‚Åª¬π * 1)        sum_Sy = 0.0     # sum(1' * V_i‚Åª¬π * y_i)        sum_ySy = 0.0    # sum(y_i' * V_i‚Åª¬π * y_i)        for i in range(M_studies):            y_i = y_all[i]            v_i = v_all[i]            k_i = len(y_i)            # --- Efficient Matrix Calculations using Sherman-Morrison ---            # V_i = A + œÑ¬≤J, where A = diag(v_ij + œÉ¬≤)            # 1. Components of A            A_diag = v_i + sigma_sq            A_inv_diag = 1.0 / A_diag            log_det_A = np.sum(np.log(A_diag))            # 2. Common terms            sum_A_inv_1 = np.sum(A_inv_diag) # 1' * A‚Åª¬π * 1            term_S = 1.0 + tau_sq * sum_A_inv_1            # Handle potential singularity            if term_S <= 1e-10:                return {'log_lik_reml': np.inf}            # 3. Calculate log(det(V_i))            # det(V_i) = det(A) * (1 + œÑ¬≤ * 1'A‚Åª¬π1)            log_det_Vi = log_det_A + np.log(term_S)            sum_log_det_Vi += log_det_Vi            # 4. Calculate V_i‚Åª¬π components            A_inv_y = A_inv_diag * y_i          # A‚Åª¬π * y_i            sum_A_inv_y = np.sum(A_inv_y)     # 1' * A‚Åª¬π * y_i            # V_i‚Åª¬π * y_i = A‚Åª¬πy_i - (œÑ¬≤ * A‚Åª¬π1 * (1'A‚Åª¬πy_i)) / (1 + œÑ¬≤ * 1'A‚Åª¬π1)            Vi_inv_y = A_inv_y - (tau_sq * A_inv_diag * sum_A_inv_y) / term_S            # V_i‚Åª¬π * 1 = A‚Åª¬π1 - (œÑ¬≤ * A‚Åª¬π1 * (1'A‚Åª¬π1)) / (1 + œÑ¬≤ * 1'A‚Åª¬π1)            Vi_inv_1_vec = A_inv_diag - (tau_sq * A_inv_diag * sum_A_inv_1) / term_S            # 5. Sum components for overall model            S_i = np.sum(Vi_inv_1_vec)       # 1' * V_i‚Åª¬π * 1            Sy_i = np.sum(Vi_inv_y)        # 1' * V_i‚Åª¬π * y_i            ySy_i = np.dot(y_i, Vi_inv_y)  # y_i' * V_i‚Åª¬π * y_i            sum_S += S_i            sum_Sy += Sy_i            sum_ySy += ySy_i        # --- Pooled Effect (Œº) and Standard Error ---        if sum_S <= 1e-10:            return {'log_lik_reml': np.inf}        mu_hat = sum_Sy / sum_S        var_mu = 1.0 / sum_S        se_mu = np.sqrt(var_mu)        # --- Calculate Log-Likelihoods ---        # Residual sum of squares        residual_ss = sum_ySy - 2.0 * mu_hat * sum_Sy + mu_hat**2 * sum_S        # REML Log-Likelihood        log_lik_reml = -0.5 * (sum_log_det_Vi + np.log(sum_S) + residual_ss)        # ML Log-Likelihood (for AIC/BIC)        log_lik_ml = -0.5 * (N_total * np.log(2.0 * np.pi) + sum_log_det_Vi + residual_ss)        return {            'mu': mu_hat,            'se_mu': se_mu,            'var_mu': var_mu,            'log_lik_reml': log_lik_reml,            'log_lik_ml': log_lik_ml,            'tau_sq': tau_sq,            'sigma_sq': sigma_sq,            'sum_log_det_Vi': sum_log_det_Vi,            'residual_ss': residual_ss,            'sum_S_XViX': sum_S # This is X'V‚Åª¬πX        }    except (FloatingPointError, ValueError, np.linalg.LinAlgError):        # Catch numerical instability        return {'log_lik_reml': np.inf}def _negative_log_likelihood_reml(params, y_all, v_all, N_total, M_studies):    """Wrapper for optimizer. Returns negative REML log-likelihood."""    estimates = _get_three_level_estimates(params, y_all, v_all, N_total, M_studies)    return -estimates['log_lik_reml']def _run_three_level_reml(analysis_data, effect_col, var_col):    """    Main optimization function.    Finds REML estimates for œÑ¬≤ and œÉ¬≤.    """    print("  Preparing data for optimization...")    grouped = analysis_data.groupby('id')    y_all = [group[effect_col].values for _, group in grouped]    v_all = [group[var_col].values for _, group in grouped]    N_total = len(analysis_data)    M_studies = len(y_all)    if M_studies < 3:        print("  ‚ö†Ô∏è  WARNING: Fewer than 3 studies. REML estimates may be unstable.")    # --- Get starting values ---    # Use standard REML for œÑ¬≤ starting value    try:        # Check if advanced estimators are loaded (from Cell 4.5)        if 'calculate_tau_squared' in globals():            tau_sq_start, _ = calculate_tau_squared(analysis_data, effect_col, var_col, method='REML')        else:            # Fallback to DL if advanced function not found            tau_sq_start, _ = calculate_tau_squared(analysis_data, effect_col, var_col, method='DL')    except Exception as e:        print(f"  ‚ö†Ô∏è  Could not calculate starting tau¬≤: {e}. Defaulting to 0.1")        tau_sq_start = 0.1    sigma_sq_start = 0.01 # Start with small within-study variance    initial_params = [max(0, tau_sq_start), sigma_sq_start]    bounds = [(0, None), (0, None)] # Variances must be non-negative    print(f"  Starting parameters: œÑ¬≤={initial_params[0]:.4f}, œÉ¬≤={initial_params[1]:.4f}")    print("  Optimizing... (This may take a moment)")    # --- Run Optimizer ---    optimizer_result = minimize(        _negative_log_likelihood_reml,        x0=initial_params,        args=(y_all, v_all, N_total, M_studies),        method='L-BFGS-B',        bounds=bounds,        options={'ftol': 1e-10, 'gtol': 1e-6, 'maxiter': 500}    )    if not optimizer_result.success:        print(f"  ‚ùå OPTIMIZATION FAILED: {optimizer_result.message}")        return None, None, None    print(f"  ‚úì Optimization successful (Iterations: {optimizer_result.nit})")    # --- Get Final Estimates ---    tau_sq_est, sigma_sq_est = optimizer_result.x    final_estimates = _get_three_level_estimates(        [tau_sq_est, sigma_sq_est],        y_all, v_all, N_total, M_studies    )    # --- Calculate CIs for variance components (using Hessian) ---    print("  Calculating confidence intervals for variance components...")    try:        # Get inverse Hessian from optimizer        hess_inv = optimizer_result.hess_inv.todense()        se_vars = np.sqrt(np.diag(hess_inv))        se_tau_sq, se_sigma_sq = se_vars[0], se_vars[1]        # Use log-transform for CIs (variances cannot be negative)        # CI for log(œÑ¬≤)        log_tau_sq = np.log(tau_sq_est) if tau_sq_est > 0 else 0        se_log_tau_sq = se_tau_sq / tau_sq_est if tau_sq_est > 0 else 0        ci_lower_tau_sq = np.exp(log_tau_sq - 1.96 * se_log_tau_sq) if tau_sq_est > 0 else 0        ci_upper_tau_sq = np.exp(log_tau_sq + 1.96 * se_log_tau_sq) if tau_sq_est > 0 else 0        # CI for log(œÉ¬≤)        log_sigma_sq = np.log(sigma_sq_est) if sigma_sq_est > 0 else 0        se_log_sigma_sq = se_sigma_sq / sigma_sq_est if sigma_sq_est > 0 else 0        ci_lower_sigma_sq = np.exp(log_sigma_sq - 1.96 * se_log_sigma_sq) if sigma_sq_est > 0 else 0        ci_upper_sigma_sq = np.exp(log_sigma_sq + 1.96 * se_log_sigma_sq) if sigma_sq_est > 0 else 0        final_estimates['se_tau_sq'] = se_tau_sq        final_estimates['ci_lower_tau_sq'] = ci_lower_tau_sq        final_estimates['ci_upper_tau_sq'] = ci_upper_tau_sq        final_estimates['se_sigma_sq'] = se_sigma_sq        final_estimates['ci_lower_sigma_sq'] = ci_lower_sigma_sq        final_estimates['ci_upper_sigma_sq'] = ci_upper_sigma_sq    except Exception as e:        print(f"  ‚ö†Ô∏è  Could not compute CIs for variance components: {e}")        # Add NaNs for placeholders        final_estimates.update({            'se_tau_sq': np.nan, 'ci_lower_tau_sq': np.nan, 'ci_upper_tau_sq': np.nan,            'se_sigma_sq': np.nan, 'ci_lower_sigma_sq': np.nan, 'ci_upper_sigma_sq': np.nan        })    return final_estimates, (y_all, v_all, N_total, M_studies), optimizer_result# --- 2. WIDGET DEFINITIONS ---run_button = widgets.Button(    description='‚ñ∂ Run Three-Level Analysis',    button_style='success',    layout=widgets.Layout(width='450px', height='50px'),    style={'font_weight': 'bold'})analysis_output = widgets.Output()# --- 3. MAIN BUTTON HANDLER ---@run_button.on_clickdef run_analysis(b):    with analysis_output:        clear_output(wait=True)        print("="*70)        print("RUNNING THREE-LEVEL META-ANALYSIS")        print("="*70)        print(f"Timestamp: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")        try:            # --- 1. Load Config and Data ---            print("STEP 1: LOADING CONFIGURATION")            print("---------------------------------")            if 'ANALYSIS_CONFIG' not in globals():                raise NameError("ANALYSIS_CONFIG not found. Run previous cells first.")            if 'analysis_data' in ANALYSIS_CONFIG:                analysis_data = ANALYSIS_CONFIG['analysis_data']            elif 'data_filtered' in globals():                analysis_data = data_filtered            else:                raise ValueError("Cannot find 'analysis_data' or 'data_filtered'")            effect_col = ANALYSIS_CONFIG['effect_col']            var_col = ANALYSIS_CONFIG['var_col']            es_config = ANALYSIS_CONFIG['es_config']            overall_results = ANALYSIS_CONFIG['overall_results']            print(f"  ‚úì Effect: {es_config['effect_label']} ({effect_col})")            print(f"  ‚úì Variance: {var_col}")            # --- 2. Auto-Detection Check ---            print("\nSTEP 2: CHECKING DATA STRUCTURE")            print("---------------------------------")            k_obs = len(analysis_data)            k_studies = analysis_data['id'].nunique()            avg_obs = k_obs / k_studies            print(f"  ‚Ä¢ Total observations (k_obs): {k_obs}")            print(f"  ‚Ä¢ Total studies (k_studies):  {k_studies}")            print(f"  ‚Ä¢ Avg. observations/study:  {avg_obs:.2f}")            if k_obs == k_studies:                print("\n‚úÖ AUTO-DETECTION: NOT REQUIRED")                print("  Each study contributes only one effect size.")                print("  The standard meta-analysis (Cell 6) is appropriate.")                print("  Three-level model is not necessary.")                ANALYSIS_CONFIG['three_level_results'] = {'status': 'not_required'}                return            print("\n  ‚úì Dependent effect sizes detected. Proceeding with three-level model.")            # --- 3. Run REML Optimization ---            print("\nSTEP 3: RUNNING THREE-LEVEL REML ESTIMATION")            print("---------------------------------")            estimates, data_lists, optimizer_result = _run_three_level_reml(analysis_data, effect_col, var_col)            if estimates is None:                raise RuntimeError("REML optimization failed to converge.")            # Unpack data_lists to get N_total and M_studies            y_all, v_all, N_total, M_studies = data_lists            # --- 4. Calculate Final Results ---            print("\nSTEP 4: CALCULATING FINAL ESTIMATES")            print("---------------------------------")            mu = estimates['mu']            se_mu = estimates['se_mu']            var_mu = estimates['var_mu']            ci_lower = mu - 1.96 * se_mu            ci_upper = mu + 1.96 * se_mu            p_value = 2 * (1 - norm.cdf(abs(mu / se_mu)))            tau_sq = estimates['tau_sq']            sigma_sq = estimates['sigma_sq']            ci_lower_tau_sq = estimates['ci_lower_tau_sq']            ci_upper_tau_sq = estimates['ci_upper_tau_sq']            ci_lower_sigma_sq = estimates['ci_lower_sigma_sq']            ci_upper_sigma_sq = estimates['ci_upper_sigma_sq']            # --- 5. Calculate Diagnostics ---            print("\nSTEP 5: CALCULATING DIAGNOSTICS")            print("---------------------------------")            # ICC            total_var = tau_sq + sigma_sq            if total_var == 0:                ICC_level2, ICC_level3 = 0.0, 0.0            else:                ICC_level2 = (sigma_sq / total_var) * 100 # Within-study                ICC_level3 = (tau_sq / total_var) * 100   # Between-study            # AIC/BIC (k=3 params: mu, tau_sq, sigma_sq)            k_params = 3            log_lik_ml = estimates['log_lik_ml']            AIC = (2 * k_params) - (2 * log_lik_ml)            BIC = (k_params * np.log(N_total)) - (2 * log_lik_ml)            print("  ‚úì Diagnostics calculated")            # --- 6. Display Results ---            print("\n" + "="*70)            print("THREE-LEVEL MODEL: POOLED EFFECT")            print("="*70)            print(f"\n  {'Metric':<20} {'Estimate':>15} {'95% CI Lower':>15} {'95% CI Upper':>15}")            print(f"  {'-'*20} {'-'*15} {'-'*15} {'-'*15}")            print(f"  {es_config['effect_label']:<20} {mu:>15.4f} {ci_lower:>15.4f} {ci_upper:>15.4f}")            if es_config['has_fold_change']:                RR = np.exp(mu)                RR_CI_lower = np.exp(ci_lower)                RR_CI_upper = np.exp(ci_upper)                print(f"  {'Response Ratio (RR)':<20} {RR:>15.4f} {RR_CI_lower:>15.4f} {RR_CI_upper:>15.4f}")            print(f"\n  Z-value: {mu/se_mu:.4f}  |  P-value: {p_value:.4g}")            print("\n" + "="*70)            print("THREE-LEVEL MODEL: VARIANCE COMPONENTS")            print("="*70)            print(f"\n  {'Component':<25} {'Estimate (Var)':>15} {'95% CI Lower':>15} {'95% CI Upper':>15}")            print(f"  {'-'*25} {'-'*15} {'-'*15} {'-'*15}")            print(f"  Level 3: Between-Study (œÑ¬≤): {tau_sq:>15.4f} {ci_lower_tau_sq:>15.4f} {ci_upper_tau_sq:>15.4f}")            print(f"  Level 2: Within-Study (œÉ¬≤):  {sigma_sq:>15.4f} {ci_lower_sigma_sq:>15.4f} {ci_upper_sigma_sq:>15.4f}")            print(f"\n  Intraclass Correlation (ICC):")            print(f"  ‚Ä¢ {ICC_level3:6.1f}% of variance is between studies (Level 3)")            print(f"  ‚Ä¢ {ICC_level2:6.1f}% of variance is within studies (Level 2)")            print(f"\n  Model Fit:")            print(f"  ‚Ä¢ Log-Likelihood (REML): {estimates['log_lik_reml']:.3f}")            print(f"  ‚Ä¢ AIC: {AIC:.3f} | BIC: {BIC:.3f}")            # --- 7. Comparison Table ---            print("\n" + "="*70)            print("COMPARISON: STANDARD VS. THREE-LEVEL MODEL")            print("="*70)            std_effect = overall_results['pooled_effect_random']            std_ci_lower = overall_results['ci_lower_random']            std_ci_upper = overall_results['ci_upper_random']            std_se = overall_results['pooled_SE_random']            print(f"\n  {'Model':<25} {'Effect':>12} {'Std. Error':>12} {'95% CI Width':>12} {'95% CI':<25}")            print(f"  {'-'*25} {'-'*12} {'-'*12} {'-'*12} {'-'*25}")            print(f"  {'Standard (Cell 6)':<25} {std_effect:>12.4f} {std_se:>12.4f} "                  f"{(std_ci_upper - std_ci_lower):>12.4f} "                  f"[{std_ci_lower:.4f}, {std_ci_upper:.4f}]")            print(f"  {'Three-Level (Cell 6.5)':<25} {mu:>12.4f} {se_mu:>12.4f} "                  f"{(ci_upper - ci_lower):>12.4f} "                  f"[{ci_lower:.4f}, {ci_upper:.4f}]")            print("\n  üí° Interpretation:")            if se_mu > std_se:                se_diff = (se_mu - std_se) / std_se * 100                print(f"  ‚úì Three-level model SE is {se_diff:.1f}% larger (more conservative).")                print("  ‚úì This correctly accounts for data dependency.")            else:                print("  ‚ö†Ô∏è  Three-level model SE is not larger. Check model assumptions.")            # --- 8. Save Results ---            print("\nSTEP 6: SAVING RESULTS")            print("---------------------------------")            results_dict = {                'timestamp': datetime.datetime.now(),                'status': 'completed',                'k_obs': k_obs,                'k_studies': k_studies,                'pooled_effect': mu,                'se': se_mu,                'var': var_mu,                'ci_lower': ci_lower,                'ci_upper': ci_upper,                'p_value': p_value,                'tau_squared': tau_sq,                'se_tau_sq': estimates.get('se_tau_sq'),                'ci_lower_tau_sq': estimates.get('ci_lower_tau_sq'),                'ci_upper_tau_sq': estimates.get('ci_upper_tau_sq'),                'sigma_squared': sigma_sq,                'se_sigma_sq': estimates.get('se_sigma_sq'),                'ci_lower_sigma_sq': estimates.get('ci_lower_sigma_sq'),                'ci_upper_sigma_sq': estimates.get('ci_upper_sigma_sq'),                'ICC_level2_pct': ICC_level2,                'ICC_level3_pct': ICC_level3,                'log_lik_reml': estimates['log_lik_reml'],                'log_lik_ml': estimates['log_lik_ml'],                'AIC': AIC,                'BIC': BIC,                'optimizer_result': optimizer_result            }            ANALYSIS_CONFIG['three_level_results'] = results_dict            print("  ‚úì Results saved to ANALYSIS_CONFIG['three_level_results']")        except Exception as e:            print(f"\n‚ùå AN ERROR OCCURRED:\n")            print(f"  Type: {type(e).__name__}")            print(f"  Message: {e}")            print("\n  Traceback:")            traceback.print_exc(file=sys.stdout)            print("\n" + "="*70)            print("ANALYSIS FAILED. See error message above.")            print("Please check your data and configuration.")            print("="*70)# --- 5. DISPLAY WIDGETS ---# Check if required data is present before displayingtry:    if 'ANALYSIS_CONFIG' not in globals() or 'overall_results' not in ANALYSIS_CONFIG:        print("="*70)        print("‚ö†Ô∏è  PREREQUISITE NOT MET")        print("="*70)        print("Please run Cell 6 (Overall Meta-Analysis) before running this cell.")    else:        # Check for auto-detection        data_check = ANALYSIS_CONFIG.get('analysis_data', data_filtered)        k_obs_check = len(data_check)        k_studies_check = data_check['id'].nunique()        if k_obs_check == k_studies_check:            print("="*70)            print("‚úÖ THREE-LEVEL ANALYSIS NOT REQUIRED")            print("="*70)            print("  Your dataset has only one effect size per study.")            print("  The standard meta-analysis from Cell 6 is sufficient and correct.")        else:            print("="*70)            print("THREE-LEVEL ANALYSIS INTERFACE READY")            print("="*70)            print(f"  ‚úì Dependent effect sizes detected ({k_obs_check} effects from {k_studies_check} studies).")            print("  ‚úì This model is recommended to account for data dependency.")            print("  Click the button below to run the analysis.")            display(widgets.VBox([                widgets.HTML("<hr style='margin: 15px 0;'>"),                run_button,                analysis_output            ]))except Exception as e:    print(f"‚ùå An error occurred during initialization: {e}")    print("Please ensure the notebook has been run in order.")

In [None]:
# ‚ö†Ô∏è PREREQUISITES:# - Must have categorical moderator variable in your data# - Requires at least 2 groups with sufficient sample sizes (recommended: n ‚â• 3 per group)## Expected runtime: 5-20 seconds## INTERPRETATION:# - Significant Q_between (p < 0.05) indicates subgroups differ in their effect sizes# - Compare effect sizes and confidence intervals across subgroups# Hedges' g and Cohen's d Calculation# Methods from Hedges & Olkin (1985). Statistical methods for meta-analysis.# Academic Press.## Cohen's d = (M1 - M2) / SD_pooled# Hedges' g = d * (1 - 3/(4*df - 1))  # Small sample correction## These are standardized mean differences, common in psychology and medicine.#@title ‚öôÔ∏è SUBGROUP ANALYSIS CONFIGURATION#@title ‚öôÔ∏è SUBGROUP ANALYSIS CONFIGURATION# =============================================================================# CELL 7: SUBGROUP ANALYSIS CONFIGURATION# Purpose: Configure moderator variables and settings for subgroup analysis# Dependencies: Cell 6 (overall_results, analysis_data)# Outputs: ANALYSIS_CONFIG['subgroup_config'], interactive widgets# =============================================================================print("\n" + "="*70)print("SUBGROUP ANALYSIS CONFIGURATION")print("="*70)print(f"Timestamp: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")# --- STEP 1: CHECK PREREQUISITES ---print("\n" + "="*70)print("STEP 1: VERIFYING PREREQUISITES")print("="*70)try:    effect_col = ANALYSIS_CONFIG['effect_col']    var_col = ANALYSIS_CONFIG['var_col']    se_col = ANALYSIS_CONFIG['se_col']    es_config = ANALYSIS_CONFIG['es_config']    overall_results = ANALYSIS_CONFIG['overall_results']    Qt_overall = overall_results['Qt']    I_squared_overall = overall_results['I_squared']    print(f"‚úì Overall analysis results loaded successfully")    print(f"  Effect size: {es_config['effect_label']} ({es_config['effect_label_short']})")    print(f"  Effect column: {effect_col}")    print(f"  Overall Q statistic: {Qt_overall:.4f}")    print(f"  Overall I¬≤: {I_squared_overall:.2f}%")except KeyError as e:    print(f"‚ùå ERROR: Overall analysis results not found - {e}")    print("\nTroubleshooting:")    print("  1. Ensure Cell 6 (overall meta-analysis) was run successfully")    print("  2. Check that ANALYSIS_CONFIG['overall_results'] exists")    print("  3. Verify that analysis_data DataFrame is available")    raise# Check if analysis_data existsif 'analysis_data' not in globals():    print(f"\n‚ùå ERROR: analysis_data not found")    print(f"   Please ensure Cell 6 was executed successfully")    raise NameError("analysis_data not defined")# Dataset informationk_total = len(analysis_data)k_papers = analysis_data['id'].nunique()print(f"\nüìä Dataset Summary:")print(f"  ‚Ä¢ Total observations: {k_total}")print(f"  ‚Ä¢ Unique papers: {k_papers}")print(f"  ‚Ä¢ Avg obs per paper: {k_total/k_papers:.2f}")# Check if subgroup analysis is appropriateif k_total < 10:    print(f"\n‚ö†Ô∏è  WARNING: Limited data for subgroup analysis")    print(f"   With only {k_total} observations, subgroup analyses may be underpowered")    print(f"   Results should be interpreted with caution")elif k_total < 20:    print(f"\n‚ö†Ô∏è  CAUTION: Moderate data for subgroup analysis")    print(f"   With {k_total} observations, some subgroup combinations may have few studies")else:    print(f"\n‚úì Adequate data for subgroup analysis ({k_total} observations)")# --- STEP 2: IDENTIFY AVAILABLE MODERATOR COLUMNS ---print("\n" + "="*70)print("STEP 2: IDENTIFYING MODERATOR VARIABLES")print("="*70)print(f"\nüîç Scanning dataset for potential moderator variables...")# Exclude technical columnsexcluded_cols = [    'xe', 'sde', 'ne', 'xc', 'sdc', 'nc', 'id',    'sde_imputed', 'sdc_imputed', 'cv_e', 'cv_c',    'sde_was_imputed', 'sdc_was_imputed',    effect_col, var_col, se_col, 'w_fixed', 'w_random',    'ci_width']# Add effect-size-specific columns to excludeif es_config['has_fold_change']:    if 'Response_Ratio' in analysis_data.columns:        excluded_cols.extend(['Response_Ratio', 'RR_CI_lower', 'RR_CI_upper',                             'fold_change', 'Percent_Change'])    if 'Odds_Ratio' in analysis_data.columns:        excluded_cols.extend(['Odds_Ratio', 'OR_CI_lower', 'OR_CI_upper'])if 'hedges_g' in effect_col or 'cohen_d' in effect_col:    excluded_cols.extend(['df', 'sp', 'sp_squared', 'cohen_d', 'hedges_j'])# Add CI columnsci_cols = [c for c in analysis_data.columns if 'CI_' in c or 'ci_' in c]excluded_cols.extend(ci_cols)# Get categorical columns (potential moderators)available_moderators = [    col for col in analysis_data.columns    if col not in excluded_cols    and analysis_data[col].dtype == 'object'    and analysis_data[col].notna().sum() > 0  # Has some non-missing values]print(f"\nüìã Available Moderator Variables: {len(available_moderators)}")if not available_moderators:    print(f"\n‚ùå ERROR: No categorical moderator columns found in dataset")    print(f"\nAvailable columns in dataset:")    for col in analysis_data.columns:        dtype = analysis_data[col].dtype        n_unique = analysis_data[col].nunique()        print(f"  ‚Ä¢ {col}: {dtype} ({n_unique} unique values)")    print(f"\nüí° Troubleshooting:")    print(f"  1. Ensure your dataset contains categorical variables for grouping")    print(f"  2. Check that moderator columns are not all numeric")    print(f"  3. Verify column names match expected moderator variables")    raise ValueError("No moderators available for subgroup analysis")# Analyze moderator characteristicsmoderator_info = []for col in available_moderators:    n_categories = analysis_data[col].nunique()    n_missing = analysis_data[col].isna().sum()    pct_missing = (n_missing / len(analysis_data)) * 100    categories = sorted(analysis_data[col].dropna().unique())    # Calculate distribution statistics    value_counts = analysis_data[col].value_counts()    min_count = value_counts.min()    max_count = value_counts.max()    moderator_info.append({        'variable': col,        'n_categories': n_categories,        'n_missing': n_missing,        'pct_missing': pct_missing,        'categories': categories,        'min_count': min_count,        'max_count': max_count,        'value_counts': value_counts    })# Display moderator informationprint(f"\n{'Variable':<25} {'Categories':>12} {'Missing':>10} {'Range':>15}")print(f"{'-'*25} {'-'*12} {'-'*10} {'-'*15}")for info in moderator_info:    print(f"{info['variable']:<25} {info['n_categories']:>12} "          f"{info['n_missing']:>10} {info['min_count']:>6}-{info['max_count']:<6}")print(f"\nüìä Detailed Moderator Information:")for info in moderator_info:    print(f"\n  üîπ {info['variable']}")    print(f"     Categories: {info['n_categories']}")    print(f"     Missing: {info['n_missing']} ({info['pct_missing']:.1f}%)")    print(f"     Values: {', '.join(str(c) for c in info['categories'][:5])}"          f"{' ...' if len(info['categories']) > 5 else ''}")    # Show distribution    print(f"     Distribution:")    for category, count in info['value_counts'].items():        papers = analysis_data[analysis_data[info['variable']] == category]['id'].nunique()        pct = (count / len(analysis_data)) * 100        print(f"       ‚Ä¢ {category}: {count} obs ({pct:.1f}%), {papers} papers")    # Warning for imbalanced categories    if info['min_count'] < 3:        print(f"     ‚ö†Ô∏è  Warning: Some categories have very few observations")# --- STEP 3: CREATE ANALYSIS TYPE SELECTION ---print("\n" + "="*70)print("STEP 3: CREATING INTERACTIVE CONFIGURATION")print("="*70)print(f"\nüé® Building interactive widgets...")# Analysis type selectionanalysis_type_widget = widgets.RadioButtons(    options=[        ('Single-Factor Subgroup Analysis', 'single'),        ('Two-Factor Subgroup Analysis (Interaction)', 'two_way')    ],    value='single',    description='Analysis Type:',    style={'description_width': 'initial'},    layout=widgets.Layout(width='600px'))# Info panel for analysis typesanalysis_type_info = {    'single': f"""    <div style='background-color: #e7f3ff; padding: 15px; border-radius: 8px; margin-top: 10px; border-left: 4px solid #0066cc;'>        <h4 style='margin-top: 0; color: #0066cc;'>üìä Single-Factor Subgroup Analysis</h4>        <p><b>Purpose:</b> Test if effect size varies across levels of <b>ONE</b> moderator variable</p>        <p><b>Example Question:</b></p>        <ul>            <li>Does the treatment effect differ between Blood Feeders vs. Herbivores?</li>            <li>Is the effect larger for JH Addition vs. JH Inhibition?</li>        </ul>        <p><b>Statistical Output:</b></p>        <ul>            <li>Pooled effect size for each subgroup (with 95% CI)</li>            <li>Test for differences between subgroups (Q<sub>between</sub> test)</li>            <li>Heterogeneity within each subgroup (Q<sub>within</sub>, I¬≤)</li>            <li>Proportion of heterogeneity explained by moderator (R¬≤)</li>        </ul>        <p><b>Best For:</b></p>        <ul>            <li>Exploring one main source of variation</li>            <li>When you have a primary moderator hypothesis</li>            <li>Datasets with 10+ observations per subgroup</li>        </ul>        <p><b>Current Dataset:</b> {k_total} total observations</p>    </div>    """,    'two_way': f"""    <div style='background-color: #fff3cd; padding: 15px; border-radius: 8px; margin-top: 10px; border-left: 4px solid #ff9800;'>        <h4 style='margin-top: 0; color: #856404;'>üìä Two-Factor Subgroup Analysis (Interaction)</h4>        <p><b>Purpose:</b> Test if effect size varies across combinations of <b>TWO</b> moderator variables</p>        <p><b>Example Question:</b></p>        <ul>            <li>Is the effect of treatment type (JH Addition vs. Inhibition) different for Blood Feeders vs. Herbivores?</li>            <li>Does the combination of diet type and treatment method influence effect size?</li>        </ul>        <p><b>Statistical Output:</b></p>        <ul>            <li>Pooled effect for each combination (e.g., Blood Feeders √ó JH Addition)</li>            <li>Test for overall differences across all combinations</li>            <li>Main effect of each factor</li>            <li>Interaction test (does Factor 1 effect depend on Factor 2?)</li>        </ul>        <p><b>Best For:</b></p>        <ul>            <li>Testing interaction effects between two variables</li>            <li>When effect of one moderator may depend on another</li>            <li>Datasets with sufficient observations per combination</li>        </ul>        <p><b>‚ö†Ô∏è Requirements:</b></p>        <ul>            <li>Minimum 3-5 studies per combination cell</li>            <li>Ideally 20+ total observations</li>            <li>Balanced or near-balanced design preferred</li>        </ul>        <p><b>Current Dataset:</b> {k_total} total observations ‚Üí check distribution carefully!</p>    </div>    """}analysis_type_output = widgets.Output()def update_analysis_type_info(change):    """Update info panel when analysis type changes"""    with analysis_type_output:        clear_output()        display(HTML(analysis_type_info[change['new']]))        # Update visibility of second moderator selector        if change['new'] == 'single':            moderator2_container.layout.visibility = 'hidden'            moderator2_container.layout.display = 'none'        else:            moderator2_container.layout.visibility = 'visible'            moderator2_container.layout.display = 'block'analysis_type_widget.observe(update_analysis_type_info, names='value')# Initialize with defaultwith analysis_type_output:    display(HTML(analysis_type_info['single']))# --- STEP 4: CREATE MODERATOR SELECTION WIDGETS ---print(f"  ‚úì Analysis type selector created")moderator1_label = widgets.HTML(    "<h4 style='color: #2E86AB; margin-bottom: 5px;'>üîç Select Moderator Variable(s)</h4>"    "<p style='margin-top: 0; color: #666;'><i>Choose categorical variables to explore sources of heterogeneity</i></p>")moderator1_widget = widgets.Dropdown(    options=available_moderators,    value=available_moderators[0],    description='Moderator 1:',    style={'description_width': '120px'},    layout=widgets.Layout(width='600px'))# Moderator 2 (only for two-way analysis)moderator2_widget = widgets.Dropdown(    options=['None'] + available_moderators,    value='None',    description='Moderator 2:',    style={'description_width': '120px'},    layout=widgets.Layout(width='600px'))moderator2_container = widgets.VBox([moderator2_widget])moderator2_container.layout.visibility = 'hidden'moderator2_container.layout.display = 'none'print(f"  ‚úì Moderator selectors created")# Preview of selected moderator(s)preview_output = widgets.Output()def update_moderator_preview(change=None):    """Show preview of selected moderator(s)"""    with preview_output:        clear_output()        mod1 = moderator1_widget.value        mod2 = moderator2_widget.value if analysis_type_widget.value == 'two_way' else None        print("\n" + "="*70)        print("MODERATOR SELECTION PREVIEW")        print("="*70)        # Moderator 1 info        print(f"\nüìä Moderator 1: {mod1}")        mod1_counts = analysis_data[mod1].value_counts().sort_index()        print(f"\n  Distribution:")        print(f"  {'Category':<30} {'Observations':>15} {'Papers':>10} {'Percent':>10}")        print(f"  {'-'*30} {'-'*15} {'-'*10} {'-'*10}")        for category, count in mod1_counts.items():            papers = analysis_data[analysis_data[mod1] == category]['id'].nunique()            pct = (count / len(analysis_data)) * 100            print(f"  {str(category):<30} {count:>15} {papers:>10} {pct:>9.1f}%")        print(f"  {'-'*30} {'-'*15} {'-'*10} {'-'*10}")        print(f"  {'TOTAL':<30} {len(analysis_data):>15} {analysis_data['id'].nunique():>10} {'100.0':>9}%")        # Check for adequate sample sizes        min_group = mod1_counts.min()        if min_group < 5:            print(f"\n  ‚ö†Ô∏è  WARNING: Smallest group has only {min_group} observations")            print(f"     Consider raising minimum thresholds or combining categories")        else:            print(f"\n  ‚úì All groups have ‚â• 5 observations")        # Moderator 2 info (if two-way)        if mod2 and mod2 != 'None':            print(f"\n{'‚îÄ'*70}")            print(f"üìä Moderator 2: {mod2}")            mod2_counts = analysis_data[mod2].value_counts().sort_index()            print(f"\n  Distribution:")            print(f"  {'Category':<30} {'Observations':>15} {'Papers':>10} {'Percent':>10}")            print(f"  {'-'*30} {'-'*15} {'-'*10} {'-'*10}")            for category, count in mod2_counts.items():                papers = analysis_data[analysis_data[mod2] == category]['id'].nunique()                pct = (count / len(analysis_data)) * 100                print(f"  {str(category):<30} {count:>15} {papers:>10} {pct:>9.1f}%")            # Show combination matrix            print(f"\n{'‚îÄ'*70}")            print(f"üìä Combination Matrix: {mod1} √ó {mod2}")            print(f"\n  Number of observations in each combination:\n")            crosstab = pd.crosstab(                analysis_data[mod1],                analysis_data[mod2],                margins=True,                margins_name='Total'            )            print(crosstab.to_string())            # Detailed cell analysis            print(f"\n  üìã Cell-by-Cell Analysis:")            for cat1 in mod1_counts.index:                for cat2 in mod2_counts.index:                    cell_data = analysis_data[(analysis_data[mod1] == cat1) & (analysis_data[mod2] == cat2)]                    n_obs = len(cell_data)                    n_papers = cell_data['id'].nunique()                    if n_obs > 0:                        status = "‚úì" if n_obs >= 5 else "‚ö†Ô∏è"                        print(f"    {status} {cat1} √ó {cat2}: {n_obs} obs, {n_papers} papers")            # Warnings for small cells            min_cell = crosstab.iloc[:-1, :-1].min().min()            if min_cell == 0:                print(f"\n  üî¥ ERROR: Some combinations have ZERO observations!")                print(f"     Two-way analysis not possible with empty cells")                print(f"     Recommendation: Use single-factor analysis")            elif min_cell < 3:                print(f"\n  ‚ö†Ô∏è  WARNING: Some combinations have very few observations (min = {min_cell})")                print(f"     Recommendations:")                print(f"       1. Increase minimum thresholds")                print(f"       2. Consider combining categories")                print(f"       3. Use single-factor analysis instead")            elif min_cell < 5:                print(f"\n  ‚ö†Ô∏è  CAUTION: Some combinations have limited observations (min = {min_cell})")                print(f"     Results may be unstable for small groups")            else:                print(f"\n  ‚úì All combinations have ‚â• 5 observations")# Attach observersmoderator1_widget.observe(update_moderator_preview, names='value')moderator2_widget.observe(update_moderator_preview, names='value')analysis_type_widget.observe(lambda change: update_moderator_preview(), names='value')print(f"  ‚úì Preview function configured")# Initialize previewupdate_moderator_preview()# Continue to Part 2...# --- STEP 5: MINIMUM THRESHOLDS ---print("\n" + "="*70)print("STEP 4: QUALITY THRESHOLD CONFIGURATION")print("="*70)thresholds_label = widgets.HTML(    "<h4 style='color: #2E86AB; margin-bottom: 5px;'>‚öôÔ∏è Quality Thresholds</h4>"    "<p style='margin-top: 0; color: #666;'><i>Subgroups not meeting these criteria will be excluded from analysis</i></p>")thresholds_desc = widgets.HTML("""    <div style='background-color: #f8f9fa; padding: 12px; border-radius: 5px; margin-bottom: 10px;'>        <p style='margin: 0;'><b>Purpose:</b> Ensure each subgroup has sufficient data for reliable estimation</p>        <ul style='margin: 5px 0;'>            <li><b>Min Papers:</b> Accounts for multiple observations from same study</li>            <li><b>Min Observations:</b> Total data points needed for stable estimates</li>        </ul>        <p style='margin: 0;'><b>Recommendation:</b> Higher thresholds = more reliable but fewer subgroups</p>    </div>""")min_papers_subgroup = widgets.IntSlider(    value=3,    min=1,    max=10,    step=1,    description='Min Papers/Group:',    style={'description_width': '150px'},    layout=widgets.Layout(width='550px'))min_obs_subgroup = widgets.IntSlider(    value=5,    min=2,    max=20,    step=1,    description='Min Observations/Group:',    style={'description_width': '150px'},    layout=widgets.Layout(width='550px'))# Dynamic threshold feedbackthreshold_feedback = widgets.Output()def update_threshold_feedback(change=None):    """Show impact of current thresholds"""    with threshold_feedback:        clear_output()        min_papers = min_papers_subgroup.value        min_obs = min_obs_subgroup.value        mod1 = moderator1_widget.value        print("\nüìä Impact Analysis:")        print(f"  Current thresholds: ‚â•{min_papers} papers AND ‚â•{min_obs} observations")        print(f"\n  Checking subgroups in '{mod1}'...")        # Check which subgroups meet criteria        groups_meeting_criteria = []        groups_failing_criteria = []        for category in analysis_data[mod1].dropna().unique():            group_data = analysis_data[analysis_data[mod1] == category]            n_papers = group_data['id'].nunique()            n_obs = len(group_data)            if n_papers >= min_papers and n_obs >= min_obs:                groups_meeting_criteria.append((category, n_obs, n_papers))            else:                reason = []                if n_papers < min_papers:                    reason.append(f"papers: {n_papers}<{min_papers}")                if n_obs < min_obs:                    reason.append(f"obs: {n_obs}<{min_obs}")                groups_failing_criteria.append((category, n_obs, n_papers, ", ".join(reason)))        print(f"\n  ‚úì Groups meeting criteria: {len(groups_meeting_criteria)}")        for cat, obs, papers in groups_meeting_criteria:            print(f"    ‚Ä¢ {cat}: {obs} obs, {papers} papers")        if groups_failing_criteria:            print(f"\n  ‚úó Groups excluded: {len(groups_failing_criteria)}")            for cat, obs, papers, reason in groups_failing_criteria:                print(f"    ‚Ä¢ {cat}: {obs} obs, {papers} papers (excluded: {reason})")        # Overall assessment        if len(groups_meeting_criteria) < 2:            print(f"\n  üî¥ ERROR: Need at least 2 groups for subgroup analysis!")            print(f"     Current thresholds too strict - please lower them")        elif len(groups_meeting_criteria) == 2:            print(f"\n  ‚ö†Ô∏è  WARNING: Only 2 groups available")            print(f"     Analysis will be limited to comparing these two groups")        else:            print(f"\n  ‚úì {len(groups_meeting_criteria)} groups available for analysis")        # Calculate total retained data        total_retained_obs = sum(obs for _, obs, _ in groups_meeting_criteria)        retention_rate = (total_retained_obs / len(analysis_data)) * 100        print(f"\n  üìà Data Retention:")        print(f"     Observations retained: {total_retained_obs}/{len(analysis_data)} ({retention_rate:.1f}%)")        if retention_rate < 50:            print(f"     ‚ö†Ô∏è  Less than 50% of data retained - consider lowering thresholds")        elif retention_rate < 75:            print(f"     ‚ö†Ô∏è  Moderate data loss - verify this is acceptable")        else:            print(f"     ‚úì Good data retention")# Attach observers to thresholdsmin_papers_subgroup.observe(update_threshold_feedback, names='value')min_obs_subgroup.observe(update_threshold_feedback, names='value')moderator1_widget.observe(update_threshold_feedback, names='value')print(f"  ‚úì Threshold widgets created")# Initialize threshold feedbackupdate_threshold_feedback()# --- STEP 6: RUN ANALYSIS BUTTON ---print("\n" + "="*70)print("STEP 5: CREATING RUN BUTTON")print("="*70)run_button = widgets.Button(    description='‚ñ∂ Run Subgroup Analysis',    button_style='success',    layout=widgets.Layout(width='450px', height='50px'),    style={'font_weight': 'bold', 'font_size': '14px'})run_output = widgets.Output()def on_run_button_clicked(b):    """Save configuration and prepare for analysis"""    with run_output:        clear_output()        print("\n" + "="*70)        print("VALIDATING CONFIGURATION")        print("="*70)        # Get selections        analysis_type = analysis_type_widget.value        moderator1 = moderator1_widget.value        moderator2 = moderator2_widget.value if analysis_type == 'two_way' and moderator2_widget.value != 'None' else None        min_papers = min_papers_subgroup.value        min_obs = min_obs_subgroup.value        # --- Validation Checks ---        validation_errors = []        validation_warnings = []        # Check 1: Two-way analysis requires moderator 2        if analysis_type == 'two_way' and not moderator2:            validation_errors.append("Two-way analysis requires selecting Moderator 2")        # Check 2: Moderators cannot be the same        if moderator1 == moderator2:            validation_errors.append("Moderator 1 and Moderator 2 cannot be the same variable")        # Check 3: At least 2 groups must meet criteria        groups_meeting_criteria = 0        valid_groups_list = []        if analysis_type == 'single':            for category in analysis_data[moderator1].dropna().unique():                group_data = analysis_data[analysis_data[moderator1] == category]                n_papers = group_data['id'].nunique()                n_obs = len(group_data)                if n_papers >= min_papers and n_obs >= min_obs:                    groups_meeting_criteria += 1                    valid_groups_list.append(category)        else:            # Two-way analysis - check each combination            for cat1 in analysis_data[moderator1].dropna().unique():                for cat2 in analysis_data[moderator2].dropna().unique():                    cell_data = analysis_data[(analysis_data[moderator1] == cat1) &                                             (analysis_data[moderator2] == cat2)]                    n_papers = cell_data['id'].nunique()                    n_obs = len(cell_data)                    if n_papers >= min_papers and n_obs >= min_obs:                        groups_meeting_criteria += 1                        valid_groups_list.append((cat1, cat2))        if groups_meeting_criteria < 2:            validation_errors.append(f"Only {groups_meeting_criteria} group(s) meet criteria. Need at least 2 groups for subgroup analysis. Lower thresholds or choose different moderator.")        # Check 4: For two-way, check for empty cells (WARNING, not ERROR)        if analysis_type == 'two_way' and moderator2:            crosstab = pd.crosstab(analysis_data[moderator1], analysis_data[moderator2])            n_empty_cells = (crosstab == 0).sum().sum()            total_cells = crosstab.shape[0] * crosstab.shape[1]            if n_empty_cells > 0:                validation_warnings.append(                    f"{n_empty_cells}/{total_cells} combinations have zero observations. "                    f"These empty cells will be automatically excluded from analysis. "                    f"Proceeding with {groups_meeting_criteria} valid combinations."                )            # Check for very small cells            min_cell = crosstab[crosstab > 0].min().min() if (crosstab > 0).any().any() else 0            if min_cell > 0 and min_cell < 3:                validation_warnings.append(                    f"Some combinations have very few observations (minimum = {min_cell}). "                    f"Results for these groups may be unstable."                )        # Check 5: Sufficient overall sample size        if len(analysis_data) < 10:            validation_warnings.append(                f"Limited total sample size ({len(analysis_data)} observations). "                f"Subgroup analysis may be underpowered."            )        # Display validation results        if validation_errors:            print("\n‚ùå VALIDATION FAILED")            print("\nErrors that must be fixed:")            for i, error in enumerate(validation_errors, 1):                print(f"  {i}. {error}")            print("\n‚ö†Ô∏è  Please fix the errors above and try again")            return        if validation_warnings:            print("\n‚ö†Ô∏è  VALIDATION WARNINGS")            print("\nWarnings (analysis will proceed, but be cautious):")            for i, warning in enumerate(validation_warnings, 1):                print(f"  {i}. {warning}")            print("\n‚úì Analysis can proceed - empty cells will be automatically excluded")        # --- Configuration Summary ---        print("\n" + "="*70)        print("‚úì VALIDATION PASSED - CONFIGURATION SAVED")        print("="*70)        print(f"\nüìã Subgroup Analysis Configuration:")        print(f"  {'Parameter':<30} {'Value':<40}")        print(f"  {'-'*30} {'-'*40}")        print(f"  {'Analysis Type':<30} {analysis_type:<40}")        print(f"  {'Primary Moderator':<30} {moderator1:<40}")        if moderator2:            print(f"  {'Secondary Moderator':<30} {moderator2:<40}")        print(f"  {'Min Papers per Group':<30} {min_papers:<40}")        print(f"  {'Min Observations per Group':<30} {min_obs:<40}")        print(f"  {'Valid Groups/Combinations':<30} {groups_meeting_criteria:<40}")        # Calculate expected data retention        if analysis_type == 'single':            retained_data = analysis_data[analysis_data[moderator1].isin(valid_groups_list)].copy()        else:            retained_data = analysis_data[                analysis_data.apply(                    lambda row: (row[moderator1], row[moderator2]) in valid_groups_list,                    axis=1                )            ].copy()        retention_pct = (len(retained_data) / len(analysis_data)) * 100        print(f"  {'Data Retained':<30} {len(retained_data)}/{len(analysis_data)} ({retention_pct:.1f}%)")        # Show which groups will be included        if analysis_type == 'two_way' and n_empty_cells > 0:            print(f"\nüìä Valid Combinations to be Analyzed:")            for i, (cat1, cat2) in enumerate(valid_groups_list, 1):                cell_data = analysis_data[(analysis_data[moderator1] == cat1) &                                         (analysis_data[moderator2] == cat2)]                print(f"  {i}. {cat1} √ó {cat2}: k={len(cell_data)}, papers={cell_data['id'].nunique()}")        # Save to config        ANALYSIS_CONFIG['subgroup_config'] = {            'timestamp': datetime.datetime.now(),            'analysis_type': analysis_type,            'moderator1': moderator1,            'moderator2': moderator2,            'min_papers': min_papers,            'min_obs': min_obs,            'expected_groups': groups_meeting_criteria,            'valid_groups_list': valid_groups_list,  # NEW: Store valid groups            'data_retained': len(retained_data),            'retention_pct': retention_pct,            'has_empty_cells': n_empty_cells > 0 if analysis_type == 'two_way' else False,            'n_empty_cells': n_empty_cells if analysis_type == 'two_way' else 0        }        # Save moderator information        ANALYSIS_CONFIG['subgroup_config']['moderator1_info'] = {            'name': moderator1,            'n_categories': analysis_data[moderator1].nunique(),            'categories': sorted(analysis_data[moderator1].dropna().unique().tolist())        }        if moderator2:            ANALYSIS_CONFIG['subgroup_config']['moderator2_info'] = {                'name': moderator2,                'n_categories': analysis_data[moderator2].nunique(),                'categories': sorted(analysis_data[moderator2].dropna().unique().tolist())            }        print(f"\n" + "="*70)        print("‚úì CONFIGURATION SAVED SUCCESSFULLY")        print("="*70)        print(f"\nüìä Configuration saved to: ANALYSIS_CONFIG['subgroup_config']")        print(f"\n‚ñ∂Ô∏è  Next Steps:")        print(f"  1. Review the configuration summary above")        if validation_warnings:            print(f"  2. Note the warnings - empty combinations will be excluded automatically")            print(f"  3. Run the next cell to perform subgroup analysis")        else:            print(f"  2. Run the next cell to perform subgroup analysis")        print(f"  4. Results will include:")        if analysis_type == 'single':            print(f"     ‚Ä¢ Pooled effects for each subgroup")            print(f"     ‚Ä¢ Test for between-group differences (Q-test)")            print(f"     ‚Ä¢ Within-group heterogeneity (I¬≤)")            print(f"     ‚Ä¢ Proportion of heterogeneity explained (R¬≤)")        else:            print(f"     ‚Ä¢ Pooled effects for {groups_meeting_criteria} valid combinations")            print(f"     ‚Ä¢ Main effects and interaction tests")            print(f"     ‚Ä¢ Heterogeneity decomposition")            if n_empty_cells > 0:                print(f"     ‚Ä¢ Note: {n_empty_cells} empty combinations automatically excluded")        print("\n" + "="*70)run_button.on_click(on_run_button_clicked)print(f"  ‚úì Run button configured with validation")# --- STEP 7: ASSEMBLE WIDGET LAYOUT ---print("\n" + "="*70)print("STEP 6: ASSEMBLING WIDGET INTERFACE")print("="*70)widget_layout = widgets.VBox([    widgets.HTML("<hr style='margin: 20px 0; border: none; border-top: 2px solid #ddd;'>"),    # Analysis Type Section    widgets.HTML("<h3 style='color: #2E86AB;'>1Ô∏è‚É£ Select Analysis Type</h3>"),    analysis_type_widget,    analysis_type_output,    widgets.HTML("<hr style='margin: 20px 0; border: none; border-top: 2px solid #ddd;'>"),    # Moderator Selection Section    widgets.HTML("<h3 style='color: #2E86AB;'>2Ô∏è‚É£ Select Moderator Variable(s)</h3>"),    moderator1_label,    moderator1_widget,    moderator2_container,    preview_output,    widgets.HTML("<hr style='margin: 20px 0; border: none; border-top: 2px solid #ddd;'>"),    # Threshold Section    widgets.HTML("<h3 style='color: #2E86AB;'>3Ô∏è‚É£ Set Quality Thresholds</h3>"),    thresholds_label,    thresholds_desc,    min_papers_subgroup,    min_obs_subgroup,    threshold_feedback,    widgets.HTML("<hr style='margin: 20px 0; border: none; border-top: 2px solid #ddd;'>"),    # Run Button Section    widgets.HTML("<h3 style='color: #2E86AB;'>4Ô∏è‚É£ Run Analysis</h3>"),    widgets.HTML("<p style='color: #666;'><i>Review your configuration above, then click the button to proceed</i></p>"),    run_button,    run_output])print(f"  ‚úì Widget layout assembled")# Display widgetsdisplay(widget_layout)print(f"\n‚úì Interactive interface displayed")# --- FINAL STATUS ---print("\n" + "="*70)print("‚úÖ SUBGROUP ANALYSIS CONFIGURATION READY")print("="*70)print(f"\nüìä Configuration Summary:")print(f"  ‚Ä¢ Available moderators: {len(available_moderators)}")print(f"  ‚Ä¢ Total observations: {k_total}")print(f"  ‚Ä¢ Unique papers: {k_papers}")print(f"  ‚Ä¢ Overall heterogeneity (I¬≤): {I_squared_overall:.2f}%")if I_squared_overall > 50:    print(f"\n  üî¥ High heterogeneity detected - subgroup analysis highly recommended")    print(f"     Explore which moderators explain the variation between studies")elif I_squared_overall > 25:    print(f"\n  üü° Moderate heterogeneity - subgroup analysis may be informative")else:    print(f"\n  üü¢ Low heterogeneity - subgroup analysis exploratory")print(f"\nüëÜ INSTRUCTIONS:")print(f"  1. Select analysis type (single-factor or two-factor)")print(f"  2. Choose moderator variable(s) from the dropdown(s)")print(f"  3. Review the distribution preview")print(f"  4. Adjust quality thresholds if needed")print(f"  5. Click '‚ñ∂ Run Subgroup Analysis' button")print(f"  6. After validation, proceed to next cell for results")print(f"\nüí° Tips:")print(f"  ‚Ä¢ Start with single-factor analysis to identify main moderators")print(f"  ‚Ä¢ Use two-factor analysis to test interactions")print(f"  ‚Ä¢ Higher thresholds = more reliable but fewer groups")print(f"  ‚Ä¢ Check distribution preview for balance and sample sizes")print("\n" + "="*70)# Store configuration metadataSUBGROUP_CONFIG_METADATA = {    'timestamp': datetime.datetime.now(),    'available_moderators': available_moderators,    'moderator_info': moderator_info,    'total_observations': k_total,    'total_papers': k_papers,    'overall_heterogeneity_I2': I_squared_overall,    'interface_created': True}print(f"\nüìä Metadata saved to SUBGROUP_CONFIG_METADATA")

In [None]:
# ‚ö†Ô∏è PREREQUISITES:# - Must calculate effect sizes first# - Requires hierarchical structure (multiple effects per study OR nested grouping)## Expected runtime: 5-30 seconds depending on dataset size and complexity## INTERPRETATION:# - tau¬≤_level2: Variance between effect sizes within studies# - tau¬≤_level3: Variance between studies# - ICC (Intraclass correlation): Proportion of variance at each level# THREE-LEVEL META-ANALYTIC MODEL## Implementation based on:# - Cheung (2014). Modeling dependent effect sizes with three-level meta-analyses:#   A structural equation modeling approach. Psychological Methods, 19(2), 211-229.# - Van den Noortgate et al. (2013). Three-level meta-analysis of dependent effect sizes.#   Behavior Research Methods, 45(2), 576-594.## Model structure:#   Level 1: Sampling variance within studies (known, vi)#   Level 2: Variance between effect sizes within the same study (tau_squared_level2)#   Level 3: Variance between studies (tau_squared_level3)## Estimation uses REML (Restricted Maximum Likelihood) for unbiased variance estimates.# Hedges' g and Cohen's d Calculation# Methods from Hedges & Olkin (1985). Statistical methods for meta-analysis.# Academic Press.## Cohen's d = (M1 - M2) / SD_pooled# Hedges' g = d * (1 - 3/(4*df - 1))  # Small sample correction## These are standardized mean differences, common in psychology and medicine.# Log Response Ratio (lnRR) Calculation# Method from Hedges et al. (1999) and Lajeunesse (2011).# Lajeunesse, M.J. (2011). On the meta-analysis of response ratios for studies with# correlated and multi-group designs. Ecology, 92(11), 2049-2055.## lnRR = ln(mean_treatment / mean_control)# Commonly used in ecology and environmental sciences.#@title üî¨ PERFORM THREE-LEVEL SUBGROUP ANALYSIS (ADVANCED)#@title üî¨ PERFORM THREE-LEVEL SUBGROUP ANALYSIS (ADVANCED)# =============================================================================# CELL 8 (ADVANCED REPLACEMENT): THREE-LEVEL SUBGROUP ANALYSIS# Purpose: Calculate pooled effects for each subgroup using a robust#          three-level model to account for within-study dependency.# Method: Runs a separate three-level REML analysis for each subgroup.#         Partitions heterogeneity using standard Q-statistics.# Dependencies: Cell 4.5, Cell 6, Cell 7# Outputs: 'subgroup_results' in ANALYSIS_CONFIG, compatible with Cell 9# =============================================================================# --- 0. HELPER FUNCTIONS (COPIED FROM PREVIOUS CELLS) ---# This cell must be self-contained to run the analyses.# --- 0a. Copied from Cell 4.5 (Advanced Heterogeneity Estimators) ---# Needed to get starting values for the 3-level modeldef calculate_tau_squared_DL(df, effect_col, var_col):    """    Calculate between-study variance using DerSimonian-Laird estimator.    Method from DerSimonian & Laird (1986). Meta-analysis in clinical trials.    Controlled Clinical Trials, 7(3), 177-188.    This is the most widely used method, though it can underestimate variance.    """    """DerSimonian-Laird estimator for tau-squared"""    k = len(df)    if k < 2: return 0.0    try:        w = 1 / df[var_col]        sum_w = w.sum()        if sum_w <= 0: return 0.0        pooled_effect = (w * df[effect_col]).sum() / sum_w        Q = (w * (df[effect_col] - pooled_effect)**2).sum()        df_Q = k - 1        sum_w_sq = (w**2).sum()        C = sum_w - (sum_w_sq / sum_w)        if C > 0 and Q > df_Q:            tau_sq = (Q - df_Q) / C        else:            tau_sq = 0.0        return max(0.0, tau_sq)    except Exception:        return 0.0def calculate_tau_squared_REML(df, effect_col, var_col, max_iter=100, tol=1e-8):    """    Calculate tau-squared using Restricted Maximum Likelihood (REML).    Method from Viechtbauer (2005). Bias and efficiency of meta-analytic variance    estimators in the random-effects model. J. Educational & Behavioral Statistics, 30(3), 261-293.    REML is generally preferred over ML as it accounts for loss of degrees of freedom.    """    """REML estimator for tau-squared"""    k = len(df)    if k < 2: return 0.0    try:        yi = df[effect_col].values        vi = df[var_col].values        valid_mask = np.isfinite(vi) & (vi > 0)        if not valid_mask.all():            yi = yi[valid_mask]            vi = vi[valid_mask]            k = len(yi)        if k < 2: return 0.0        def reml_objective(tau2):            tau2 = max(0, tau2)            wi = 1 / (vi + tau2)            sum_wi = wi.sum()            if sum_wi <= 0: return 1e10            mu = (wi * yi).sum() / sum_wi            Q = (wi * (yi - mu)**2).sum()            log_lik = -0.5 * (np.sum(np.log(vi + tau2)) + np.log(sum_wi) + Q)            return -log_lik        var_yi = np.var(yi, ddof=1) if k > 2 else 1.0        upper_bound = max(10 * var_yi, 100)        result = minimize_scalar(reml_objective, bounds=(0, upper_bound), method='bounded', options={'maxiter': max_iter, 'xatol': tol})        if result.success:            return max(0.0, result.x)        else:            return calculate_tau_squared_DL(df, effect_col, var_col)    except Exception:        return calculate_tau_squared_DL(df, effect_col, var_col)def calculate_tau_squared(df, effect_col, var_col, method='REML', **kwargs):    """Unified function to calculate tau-squared using specified method"""    method = method.upper()    if method == 'REML':        tau_sq = calculate_tau_squared_REML(df, effect_col, var_col, **kwargs)    else: # Default to DL        tau_sq = calculate_tau_squared_DL(df, effect_col, var_col)    info = {'method': method, 'tau_squared': tau_sq, 'success': True}    return tau_sq, info# --- 0b. Copied from Cell 6.5 (Three-Level Model) ---# The core 3-level analysis enginedef _get_three_level_estimates(params, y_all, v_all, N_total, M_studies):    """Core function to calculate estimates given variance components."""    try:        tau_sq, sigma_sq = params        if tau_sq < 0 or sigma_sq < 0:            return {'log_lik_reml': np.inf}        sum_log_det_Vi = 0.0        sum_S = 0.0        sum_Sy = 0.0        sum_ySy = 0.0        for i in range(M_studies):            y_i = y_all[i]            v_i = v_all[i]            A_diag = v_i + sigma_sq            A_inv_diag = 1.0 / A_diag            log_det_A = np.sum(np.log(A_diag))            sum_A_inv_1 = np.sum(A_inv_diag)            term_S = 1.0 + tau_sq * sum_A_inv_1            if term_S <= 1e-10: return {'log_lik_reml': np.inf}            log_det_Vi = log_det_A + np.log(term_S)            sum_log_det_Vi += log_det_Vi            A_inv_y = A_inv_diag * y_i            sum_A_inv_y = np.sum(A_inv_y)            Vi_inv_y = A_inv_y - (tau_sq * A_inv_diag * sum_A_inv_y) / term_S            Vi_inv_1_vec = A_inv_diag - (tau_sq * A_inv_diag * sum_A_inv_1) / term_S            sum_S += np.sum(Vi_inv_1_vec)            sum_Sy += np.sum(Vi_inv_y)            sum_ySy += np.dot(y_i, Vi_inv_y)        if sum_S <= 1e-10: return {'log_lik_reml': np.inf}        mu_hat = sum_Sy / sum_S        var_mu = 1.0 / sum_S        se_mu = np.sqrt(var_mu)        residual_ss = sum_ySy - 2.0 * mu_hat * sum_Sy + mu_hat**2 * sum_S        log_lik_reml = -0.5 * (sum_log_det_Vi + np.log(sum_S) + residual_ss)        log_lik_ml = -0.5 * (N_total * np.log(2.0 * np.pi) + sum_log_det_Vi + residual_ss)        return {'mu': mu_hat, 'se_mu': se_mu, 'var_mu': var_mu,                'log_lik_reml': log_lik_reml, 'log_lik_ml': log_lik_ml,                'tau_sq': tau_sq, 'sigma_sq': sigma_sq}    except (FloatingPointError, ValueError, np.linalg.LinAlgError):        return {'log_lik_reml': np.inf}def _negative_log_likelihood_reml(params, y_all, v_all, N_total, M_studies):    """Wrapper for optimizer."""    estimates = _get_three_level_estimates(params, y_all, v_all, N_total, M_studies)    return -estimates['log_lik_reml']def _run_three_level_reml_for_subgroup(analysis_data, effect_col, var_col):    """    Main optimization function for a *single subgroup*.    Returns estimates or None on failure.    """    grouped = analysis_data.groupby('id')    y_all = [group[effect_col].values for _, group in grouped]    v_all = [group[var_col].values for _, group in grouped]    N_total = len(analysis_data)    M_studies = len(y_all)    if M_studies < 2:        print("  ‚ö†Ô∏è  Not enough studies (<=1) for 3-level model in this subgroup.")        return None, None    try:        tau_sq_start, _ = calculate_tau_squared(analysis_data, effect_col, var_col, method='REML')    except Exception:        tau_sq_start = 0.01    initial_params = [max(0, tau_sq_start), 0.01]    bounds = [(0, None), (0, None)]    with warnings.catch_warnings():        warnings.simplefilter("ignore")        optimizer_result = minimize(            _negative_log_likelihood_reml,            x0=initial_params,            args=(y_all, v_all, N_total, M_studies),            method='L-BFGS-B',            bounds=bounds,            options={'ftol': 1e-10, 'gtol': 1e-6, 'maxiter': 500}        )    if not optimizer_result.success:        print(f"  ‚ùå SUBGROUP OPTIMIZATION FAILED: {optimizer_result.message}")        return None, None    final_estimates = _get_three_level_estimates(        optimizer_result.x, y_all, v_all, N_total, M_studies    )    return final_estimates, (y_all, v_all, N_total, M_studies)# --- 1. SCRIPT START ---analysis_output = widgets.Output()display(analysis_output) # Display the output area ONCE.# All analysis will be directed into this output areawith analysis_output:    print("="*70)    print("THREE-LEVEL SUBGROUP ANALYSIS")    print("="*70)    print(f"Timestamp: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")    try:        # --- 2. LOAD CONFIGURATION ---        print("STEP 1: LOADING CONFIGURATION")        print("---------------------------------")        if 'ANALYSIS_CONFIG' not in globals():            raise NameError("ANALYSIS_CONFIG not found. Run previous cells first.")        # Load data from global scope first        if 'analysis_data' in globals():            analysis_data = analysis_data.copy()            print(f"  ‚úì Found global 'analysis_data' (Shape: {analysis_data.shape})")        elif 'data_filtered' in globals():            analysis_data = data_filtered.copy()            print(f"  ‚úì Found global 'data_filtered' as fallback (Shape: {analysis_data.shape})")        else:            raise ValueError("Data not found. Run Cell 5/6 first.")        # Now, check for config keys        required_keys = ['effect_col', 'var_col', 'se_col', 'es_config',                         'overall_results', 'three_level_results', 'subgroup_config']        missing_keys = [k for k in required_keys if k not in ANALYSIS_CONFIG]        if missing_keys:            raise KeyError(f"Missing required config keys: {missing_keys}. Run Cells 6, 6.5, and 7.")        # Load config items        effect_col = ANALYSIS_CONFIG['effect_col']        var_col = ANALYSIS_CONFIG['var_col']        se_col = ANALYSIS_CONFIG['se_col']        es_config = ANALYSIS_CONFIG['es_config']        overall_results = ANALYSIS_CONFIG['overall_results']        three_level_results = ANALYSIS_CONFIG['three_level_results']        subgroup_config = ANALYSIS_CONFIG['subgroup_config']        if three_level_results.get('status') != 'completed':            raise ValueError("Cell 6.5 (Three-Level Analysis) must be run successfully first.")        analysis_type = subgroup_config['analysis_type']        moderator1 = subgroup_config['moderator1']        moderator2 = subgroup_config['moderator2']        valid_groups_list = subgroup_config['valid_groups_list']        print(f"  ‚úì Configuration loaded successfully")        print(f"  ‚úì Analysis Type: {analysis_type}")        print(f"  ‚úì Moderator 1: {moderator1}")        if moderator2:            print(f"  ‚úì Moderator 2: {moderator2}")        print(f"  ‚úì Found {len(valid_groups_list)} valid subgroups to analyze.")        # Clean the moderator column(s) in the main data        analysis_data[moderator1] = analysis_data[moderator1].astype(str).str.strip()        if moderator2:             analysis_data[moderator2] = analysis_data[moderator2].astype(str).str.strip()        print(f"  ‚úì Cleaned moderator column(s) in analysis_data")        # --- 3. ANALYZE EACH SUBGROUP ---        print("\nSTEP 2: RUNNING 3-LEVEL ANALYSIS FOR EACH SUBGROUP")        print("---------------------------------")        subgroup_results_list = []        total_Q_within_fe = 0.0 # We use the standard FE Q-stats for the Q_between test        for group_item in valid_groups_list:            # --- Get Group Data ---            if analysis_type == 'single':                group_name = str(group_item) # group_item is a string like 'Barley'                group_data = analysis_data[analysis_data[moderator1] == group_name].copy()            else: # two_way                group_tuple = group_item # group_item is a tuple like ('Barley', 'High_N')                group_name = f"{group_tuple[0]} x {group_tuple[1]}"                group_data = analysis_data[                    (analysis_data[moderator1] == group_tuple[0]) &                    (analysis_data[moderator2] == group_tuple[1])                ].copy()            print(f"\nAnalyzing Subgroup: {group_name}")            k_group = len(group_data)            n_papers_group = group_data['id'].nunique()            print(f"  k_obs = {k_group}, k_studies = {n_papers_group}")            if k_group < 2 or n_papers_group < 2:                print("  ‚ö†Ô∏è  Skipping group (k < 2 or papers < 2).")                continue            # --- Run 3-Level Model on Subgroup ---            estimates, _ = _run_three_level_reml_for_subgroup(group_data, effect_col, var_col)            if estimates is None:                print(f"  ‚ùå 3-Level model failed for this subgroup. Skipping.")                continue            # --- Extract 3-Level Results (for plotting) ---            mu_re = estimates['mu']            se_re = estimates['se_mu']            var_re = estimates['var_mu']            ci_lower_re = mu_re - 1.96 * se_re            ci_upper_re = mu_re + 1.96 * se_re            p_value_re = 2 * (1 - norm.cdf(abs(mu_re / se_re)))            tau_sq_re = estimates['tau_sq']            sigma_sq_re = estimates['sigma_sq']            # Calculate 3-Level I-squared            mean_v_i = np.mean(group_data[var_col])            total_variance_est = tau_sq_re + sigma_sq_re + mean_v_i            I_squared_re = ((tau_sq_re + sigma_sq_re) / total_variance_est) * 100 if total_variance_est > 0 else 0            # --- Run Standard FE Model on Subgroup (for Q_between test) ---            w_fe = 1 / group_data[var_col]            sum_w_fe = w_fe.sum()            pooled_effect_fe = (w_fe * group_data[effect_col]).sum() / sum_w_fe            Q_within_group = (w_fe * (group_data[effect_col] - pooled_effect_fe)**2).sum()            total_Q_within_fe += Q_within_group            # --- *** FIX: ADD FOLD CHANGE PLACEHOLDER *** ---            if es_config.get('has_fold_change', False):                # Calculate real fold change for lnRR                RR = np.exp(mu_re)                fold_change_re = RR if mu_re >= 0 else -1/RR            else:                # Add NaN placeholder for Hedges' g / other                fold_change_re = np.nan            # --- *** END FIX *** ---            # --- Store Results ---            result_dict = {                'group': group_name,                'k': k_group,                'n_papers': n_papers_group,                # These are the 3-LEVEL results, named to match Cell 9's expectations                'pooled_effect_re': mu_re,                'pooled_se_re': se_re,                'pooled_var_re': var_re,                'ci_lower_re': ci_lower_re,                'ci_upper_re': ci_upper_re,                'p_value_re': p_value_re,                'I_squared': I_squared_re,                'tau_squared': tau_sq_re,                'sigma_squared': sigma_sq_re,                'fold_change_re': fold_change_re, # <-- Added this line                # Store FE Q-stat for partitioning                'Q_within': Q_within_group,                'df_Q': k_group - 1            }            if analysis_type == 'two_way':                group_tuple = group_item                result_dict[moderator1] = group_tuple[0]                result_dict[moderator2] = group_tuple[1]            subgroup_results_list.append(result_dict)            print(f"  ‚úì Subgroup analysis complete.")        # --- 4. HETEROGENEITY PARTITIONING ---        print("\nSTEP 3: PARTITIONING HETEROGENEITY")        print("---------------------------------")        results_df = pd.DataFrame(subgroup_results_list)        if results_df.empty:            raise ValueError("No subgroups were successfully analyzed.")        # Use Q-total from the *standard* fixed-effect model (Cell 6)        Qt_overall = overall_results['Qt']        k_overall = overall_results['k']        Qe_sum = results_df['Q_within'].sum()        df_Qe = results_df['df_Q'].sum()        M_groups = len(results_df)        df_QM = M_groups - 1        QM = max(0, Qt_overall - Qe_sum)        p_value_QM = 1 - chi2.cdf(QM, df_QM) if df_QM > 0 else np.nan        R_squared = max(0, (QM / Qt_overall) * 100) if Qt_overall > 0 else 0        print(f"\n  Heterogeneity Decomposition (based on standard FE Q-stats):")        print(f"  {'Component':<25} {'Q':>12} {'df':>8} {'P-value':>10}")        print(f"  {'-'*25} {'-'*12} {'-'*8} {'-'*10}")        print(f"  {'Total (Q_T)':<25} {Qt_overall:>12.4f} {k_overall-1:>8} {'-':>10}")        print(f"  {'Between-Groups (Q_M)':<25} {QM:>12.4f} {df_QM:>8} {p_value_QM:>10.4g}")        print(f"  {'Within-Groups (Q_E)':<25} {Qe_sum:>12.4f} {df_Qe:>8} {'-':>10}")        print(f"\n  Variance Explained (R¬≤): {R_squared:.1f}%")        print(f"  Interpretation: The moderator explains {R_squared:.1f}% of the *standard* heterogeneity.")        # --- 5. DISPLAY RESULTS TABLE ---        print("\n" + "="*70)        print("THREE-LEVEL SUBGROUP ANALYSIS: RESULTS")        print("="*70)        print("\n  NOTE: Pooled effects below are from robust 3-level models for each subgroup.\n")        print(f"  {'Group':<35} {'k':>5} {'Papers':>8} {'Effect (RE)':>12} {'95% CI':>22} {'P-value':>10}")        print(f"  {'-'*35} {'-'*5} {'-'*8} {'-'*12} {'-'*22} {'-'*10}")        for _, row in results_df.iterrows():            group_name = str(row['group'])[:35]            ci_str = f"[{row['ci_lower_re']:.3f}, {row['ci_upper_re']:.3f}]"            sig_marker = "***" if row['p_value_re'] < 0.001 else "**" if row['p_value_re'] < 0.01 else "*" if row['p_value_re'] < 0.05 else "ns"            print(f"  {group_name:<35} {row['k']:>5} {row['n_papers']:>8} {row['pooled_effect_re']:>12.4f} {ci_str:>22} {row['p_value_re']:>9.4g} {sig_marker}")        print(f"\n  Test for Subgroup Differences (Q_M): p = {p_value_QM:.4g}")        # --- 6. SAVE RESULTS FOR CELL 9 ---        print("\nSTEP 4: SAVING RESULTS")        print("---------------------------------")        # Save results in the format Cell 9 expects        ANALYSIS_CONFIG['subgroup_results'] = {            'timestamp': datetime.datetime.now(),            'results_df': results_df, # This is the key part for Cell 9            'analysis_type': analysis_type,            'moderator1': moderator1,            'moderator2': moderator2,            # Add the partitioning results            'Qt_overall': Qt_overall,            'QM': QM,            'Qe': Qe_sum,            'df_QM': df_QM,            'df_Qe': df_Qe,            'p_value_QM': p_value_QM,            'R_squared': R_squared        }        print("  ‚úì Results saved to ANALYSIS_CONFIG['subgroup_results']")        print("  ‚úì The next cell (Forest Plot) will now use these 3-level estimates.")        print("\n" + "="*70)        print("‚úÖ THREE-LEVEL SUBGROUP ANALYSIS COMPLETE")        print("="*70)        print("  ‚ñ∂Ô∏è  Run Cell 9 (Dynamic Forest Plot) to visualize these results.")    except Exception as e:        print(f"\n‚ùå AN ERROR OCCURRED:\n")        print(f"  Type: {type(e).__name__}")        print(f"  Message: {e}")        print("\n  Traceback:")        traceback.print_exc(file=sys.stdout)        print("\n" + "="*70)        print("ANALYSIS FAILED. See error message above.")        print("Please check your data and configuration.")        print("="*70)# --- 4. INITIAL CHECK (REMOVED) ---# The logic is now all contained within the `with analysis_output:` block.# We just need the pre-run check.try:    if ('ANALYSIS_CONFIG' not in globals() or        'overall_results' not in ANALYSIS_CONFIG or        'three_level_results' not in ANALYSIS_CONFIG or        'subgroup_config' not in ANALYSIS_CONFIG):        print("="*70)        print("‚ö†Ô∏è  PREREQUISITES NOT MET")        print("="*70)        print("Please run the following cells in order before this one:")        print("  1. Cell 6 (Overall Meta-Analysis)")        print("  2. Cell 6.5 (Three-Level Meta-Analysis)")        print("  3. Cell 7 (Subgroup Analysis Configuration)")    else:        # Check for auto-detection        if 'analysis_data' in globals():            data_check = analysis_data        elif 'data_filtered' in globals():            data_check = data_filtered        else:            raise ValueError("Data not found for pre-check")        k_obs_check = len(data_check)        k_studies_check = data_check['id'].nunique()        if k_obs_check == k_studies_check:            print("="*70)            print("‚úÖ THREE-LEVEL ANALYSIS NOT REQUIRED")            print("="*70)            print("  Your dataset has only one effect size per study.")            print("  The standard meta-analysis from Cell 8 is sufficient.")        else:            print("="*70)            print("‚úÖ READY FOR THREE-LEVEL SUBGROUP ANALYSIS")            print("="*70)            print("  This cell is ready to run.")            print("  It will use the configurations from Cells 6, 6.5, and 7.")            print("  This will replace the standard subgroup analysis with a more robust 3-level model.")            # Display the output area, which will be populated when the cell runs.except Exception as e:     print(f"‚ùå An error occurred during initialization: {e}")

In [None]:
# Hedges' g and Cohen's d Calculation# Methods from Hedges & Olkin (1985). Statistical methods for meta-analysis.# Academic Press.## Cohen's d = (M1 - M2) / SD_pooled# Hedges' g = d * (1 - 3/(4*df - 1))  # Small sample correction## These are standardized mean differences, common in psychology and medicine.# Log Response Ratio (lnRR) Calculation# Method from Hedges et al. (1999) and Lajeunesse (2011).# Lajeunesse, M.J. (2011). On the meta-analysis of response ratios for studies with# correlated and multi-group designs. Ecology, 92(11), 2049-2055.## lnRR = ln(mean_treatment / mean_control)# Commonly used in ecology and environmental sciences.#@title üìä DYNAMIC FOREST PLOT (Publication-Ready)#@title üìä DYNAMIC FOREST PLOT (Publication-Ready)# =============================================================================# CELL 9: PUBLICATION-READY FOREST PLOT# Purpose: Create customizable forest plots for meta-analysis results# Dependencies: Cell 6 (overall_results), Cell 8 (subgroup_results)# Outputs: PDF and PNG forest plots with full customization# =============================================================================# --- 1. LOAD CONFIGURATION ---print("="*70)print("FOREST PLOT CONFIGURATION")print("="*70)try:    if 'ANALYSIS_CONFIG' not in locals() and 'ANALYSIS_CONFIG' not in globals():        raise NameError("ANALYSIS_CONFIG not found.")    subgroup_results = ANALYSIS_CONFIG.get('subgroup_results', {})    overall_results = ANALYSIS_CONFIG['overall_results']    es_config = ANALYSIS_CONFIG['es_config']    # Determine if we have subgroup analysis    has_subgroups = bool(subgroup_results) and 'results_df' in subgroup_results    if has_subgroups:        analysis_type = subgroup_results['analysis_type']        moderator1 = subgroup_results['moderator1']        moderator2 = subgroup_results.get('moderator2', None)        results_df = subgroup_results['results_df']        # Set dynamic defaults        if analysis_type == 'two_way':            default_title = f'Forest Plot: {moderator1} √ó {moderator2}'            default_y_label = moderator2        else:            default_title = f'Forest Plot: {moderator1}'            default_y_label = moderator1    else:        # Overall only (no subgroups)        analysis_type = 'overall_only'        default_title = 'Forest Plot: Overall Effect'        default_y_label = 'Study'        moderator1 = None        moderator2 = None    default_x_label = es_config.get('effect_label', "Effect Size")    print(f"‚úì Analysis type: {analysis_type}")    print(f"‚úì Has subgroups: {has_subgroups}")    print(f"‚úì Configuration loaded successfully")except (KeyError, NameError) as e:    print(f"‚ùå ERROR: Failed to load configuration: {e}")    print("   Please run Cell 6 (overall analysis) first")    raise# --- 2. DEFINE CUSTOMIZATION WIDGETS ---# ========== TAB 1: PLOT STYLE ==========style_header = widgets.HTML("<h3 style='color: #2E86AB;'>Plot Style & Layout</h3>")model_widget = widgets.Dropdown(    options=[('Random-Effects', 'RE'), ('Fixed-Effects', 'FE')],    value='RE',    description='Model:',    style={'description_width': '130px'},    layout=widgets.Layout(width='450px'))width_widget = widgets.FloatSlider(    value=8.0, min=6.0, max=14.0, step=0.5,    description='Plot Width (in):',    continuous_update=False,    style={'description_width': '130px'},    layout=widgets.Layout(width='450px'))height_widget = widgets.FloatSlider(    value=0.4, min=0.2, max=1.0, step=0.05,    description='Height per Row (in):',    continuous_update=False,    style={'description_width': '130px'},    layout=widgets.Layout(width='450px'))title_fontsize_widget = widgets.IntSlider(    value=12, min=8, max=18, step=1,    description='Title Font Size:',    continuous_update=False,    style={'description_width': '130px'},    layout=widgets.Layout(width='450px'))label_fontsize_widget = widgets.IntSlider(    value=11, min=8, max=16, step=1,    description='Axis Label Size:',    continuous_update=False,    style={'description_width': '130px'},    layout=widgets.Layout(width='450px'))tick_fontsize_widget = widgets.IntSlider(    value=9, min=6, max=14, step=1,    description='Tick Label Size:',    continuous_update=False,    style={'description_width': '130px'},    layout=widgets.Layout(width='450px'))annot_fontsize_widget = widgets.IntSlider(    value=8, min=6, max=12, step=1,    description='Annotation Size:',    continuous_update=False,    style={'description_width': '130px'},    layout=widgets.Layout(width='450px'))color_scheme_widget = widgets.Dropdown(    options=[        ('Grayscale (Publication)', 'gray'),        ('Color (Presentation)', 'color'),        ('Black & White Only', 'bw')    ],    value='gray',    description='Color Scheme:',    style={'description_width': '130px'},    layout=widgets.Layout(width='450px'))marker_style_widget = widgets.Dropdown(    options=[        ('Circle/Diamond (‚óè/‚óÜ)', 'circle_diamond'),        ('Square/Diamond (‚ñ†/‚óÜ)', 'square_diamond'),        ('Circle/Star (‚óè/‚òÖ)', 'circle_star')    ],    value='circle_diamond',    description='Marker Style:',    style={'description_width': '130px'},    layout=widgets.Layout(width='450px'))ci_style_widget = widgets.Dropdown(    options=[        ('Solid Line', 'solid'),        ('Dashed Line', 'dashed'),        ('Solid with Caps', 'caps')    ],    value='solid',    description='CI Line Style:',    style={'description_width': '130px'},    layout=widgets.Layout(width='450px'))style_tab = widgets.VBox([    style_header,    model_widget,    widgets.HTML("<hr style='margin: 10px 0;'>"),    widgets.HTML("<b>Dimensions:</b>"),    width_widget,    height_widget,    widgets.HTML("<hr style='margin: 10px 0;'>"),    widgets.HTML("<b>Typography:</b>"),    title_fontsize_widget,    label_fontsize_widget,    tick_fontsize_widget,    annot_fontsize_widget,    widgets.HTML("<hr style='margin: 10px 0;'>"),    widgets.HTML("<b>Visual Style:</b>"),    color_scheme_widget,    marker_style_widget,    ci_style_widget])# ========== TAB 2: TEXT & LABELS ==========text_header = widgets.HTML("<h3 style='color: #2E86AB;'>Text & Labels</h3>")show_title_widget = widgets.Checkbox(    value=True,    description='Show Plot Title',    indent=False,    layout=widgets.Layout(width='450px'))title_widget = widgets.Text(    value=default_title,    description='Plot Title:',    layout=widgets.Layout(width='450px'),    style={'description_width': '130px'})xlabel_widget = widgets.Text(    value=default_x_label,    description='X-Axis Label:',    layout=widgets.Layout(width='450px'),    style={'description_width': '130px'})ylabel_widget = widgets.Text(    value=default_y_label,    description='Y-Axis Label:',    layout=widgets.Layout(width='450px'),    style={'description_width': '130px'})show_ylabel_widget = widgets.Checkbox(    value=True,    description='Show Y-Axis Label',    indent=False,    layout=widgets.Layout(width='450px'))text_tab = widgets.VBox([    text_header,    show_title_widget,    title_widget,    widgets.HTML("<hr style='margin: 10px 0;'>"),    xlabel_widget,    show_ylabel_widget,    ylabel_widget])# ========== TAB 3: ANNOTATIONS ==========annot_header = widgets.HTML("<h3 style='color: #2E86AB;'>Annotations</h3>")show_k_widget = widgets.Checkbox(    value=True,    description='Show k (observations)',    indent=False,    layout=widgets.Layout(width='450px'))show_papers_widget = widgets.Checkbox(    value=True,    description='Show paper count',    indent=False,    layout=widgets.Layout(width='450px'))show_fold_change_widget = widgets.Checkbox(    value=es_config.get('has_fold_change', False),    description='Show Fold-Change',    indent=False,    layout=widgets.Layout(width='450px'))annot_pos_widget = widgets.Dropdown(    options=[        ('Right of CI', 'right'),        ('Above Marker', 'above'),        ('Below Marker', 'below')    ],    value='right',    description='Position:',    style={'description_width': '130px'},    layout=widgets.Layout(width='450px'))annot_offset_widget = widgets.FloatSlider(    value=0.0, min=-1.0, max=1.0, step=0.05,    description='H-Offset:',    continuous_update=False,    style={'description_width': '130px'},    layout=widgets.Layout(width='450px'),    readout_format='.2f')group_label_box = widgets.VBox()if has_subgroups and analysis_type == 'two_way':    group_label_h_offset_widget = widgets.FloatSlider(        value=0.0, min=-2.0, max=2.0, step=0.1,        description='Group H-Offset:',        continuous_update=False,        style={'description_width': '130px'},        layout=widgets.Layout(width='450px')    )    group_label_v_offset_widget = widgets.FloatSlider(        value=0.0, min=-1.0, max=1.0, step=0.1,        description='Group V-Offset:',        continuous_update=False,        style={'description_width': '130px'},        layout=widgets.Layout(width='450px')    )    group_label_fontsize_widget = widgets.IntSlider(        value=10, min=7, max=14, step=1,        description='Group Font Size:',        continuous_update=False,        style={'description_width': '130px'},        layout=widgets.Layout(width='450px')    )    group_label_box = widgets.VBox([        widgets.HTML("<hr style='margin: 10px 0;'>"),        widgets.HTML("<b>Group Labels (Two-Way):</b>"),        group_label_h_offset_widget,        group_label_v_offset_widget,        group_label_fontsize_widget    ])annot_tab = widgets.VBox([    annot_header,    widgets.HTML("<b>Show in Annotations:</b>"),    show_k_widget,    show_papers_widget,    show_fold_change_widget,    widgets.HTML("<hr style='margin: 10px 0;'>"),    widgets.HTML("<b>Position:</b>"),    annot_pos_widget,    annot_offset_widget,    group_label_box])# ========== TAB 4: AXES & SCALE ==========axes_header = widgets.HTML("<h3 style='color: #2E86AB;'>Axes & Scaling</h3>")auto_scale_widget = widgets.Checkbox(    value=True,    description='Auto-Scale X-Axis',    indent=False,    layout=widgets.Layout(width='450px'))x_min_widget = widgets.FloatText(    value=-2.0,    description='X-Min:',    style={'description_width': '80px'},    layout=widgets.Layout(width='220px', visibility='hidden'))x_max_widget = widgets.FloatText(    value=2.0,    description='X-Max:',    style={'description_width': '80px'},    layout=widgets.Layout(width='220px', visibility='hidden'))manual_scale_box = widgets.HBox([x_min_widget, x_max_widget])def toggle_manual_scale(change):    if change['new']:        x_min_widget.layout.visibility = 'hidden'        x_max_widget.layout.visibility = 'hidden'    else:        x_min_widget.layout.visibility = 'visible'        x_max_widget.layout.visibility = 'visible'auto_scale_widget.observe(toggle_manual_scale, names='value')show_grid_widget = widgets.Checkbox(    value=True,    description='Show Grid',    indent=False,    layout=widgets.Layout(width='450px'))grid_style_widget = widgets.Dropdown(    options=[        ('Dashed (Light)', 'dashed_light'),        ('Dotted (Light)', 'dotted_light'),        ('Solid (Light)', 'solid_light')    ],    value='dashed_light',    description='Grid Style:',    style={'description_width': '130px'},    layout=widgets.Layout(width='450px'))show_null_line_widget = widgets.Checkbox(    value=True,    description='Show Null Effect Line',    indent=False,    layout=widgets.Layout(width='450px'))show_fold_axis_widget = widgets.Checkbox(    value=es_config.get('has_fold_change', False) and show_fold_change_widget.value,    description='Show Fold-Change Axis (Top)',    indent=False,    layout=widgets.Layout(width='450px'))axes_tab = widgets.VBox([    axes_header,    auto_scale_widget,    manual_scale_box,    widgets.HTML("<hr style='margin: 10px 0;'>"),    widgets.HTML("<b>Grid & Reference Lines:</b>"),    show_grid_widget,    grid_style_widget,    show_null_line_widget,    widgets.HTML("<hr style='margin: 10px 0;'>"),    show_fold_axis_widget])# ========== TAB 5: EXPORT OPTIONS ==========export_header = widgets.HTML("<h3 style='color: #2E86AB;'>Export Options</h3>")save_pdf_widget = widgets.Checkbox(    value=True,    description='Save as PDF',    indent=False,    layout=widgets.Layout(width='450px'))save_png_widget = widgets.Checkbox(    value=True,    description='Save as PNG',    indent=False,    layout=widgets.Layout(width='450px'))png_dpi_widget = widgets.IntSlider(    value=300, min=150, max=600, step=50,    description='PNG DPI:',    continuous_update=False,    style={'description_width': '130px'},    layout=widgets.Layout(width='450px'))filename_prefix_widget = widgets.Text(    value='ForestPlot',    description='Filename Prefix:',    layout=widgets.Layout(width='450px'),    style={'description_width': '130px'})transparent_bg_widget = widgets.Checkbox(    value=False,    description='Transparent Background',    indent=False,    layout=widgets.Layout(width='450px'))export_tab = widgets.VBox([    export_header,    save_pdf_widget,    save_png_widget,    png_dpi_widget,    widgets.HTML("<hr style='margin: 10px 0;'>"),    filename_prefix_widget,    transparent_bg_widget])# ========== TAB 6: LABEL EDITOR ==========label_editor_header = widgets.HTML("<h3 style='color: #2E86AB;'>Label Editor</h3>")label_editor_desc = widgets.HTML(    "<p style='color: #666;'><i>Customize display names for all groups and subgroups in the plot</i></p>")print(f"\nüîç Identifying labels for editor...")unique_labels = set()label_widgets_dict = {}try:    if has_subgroups:        if analysis_type == 'single':            unique_labels.update(results_df['group'].astype(str).unique())        else:  # two_way            unique_labels.update(results_df[moderator1].astype(str).unique())            unique_labels.update(results_df[moderator2].astype(str).unique())    unique_labels.add('Overall')    sorted_labels = sorted(list(unique_labels))    print(f"  ‚úì Found {len(sorted_labels)} unique labels")    label_editor_widgets = []    for label in sorted_labels:        widget_label = f"Overall Effect:" if label == 'Overall' else f"{label}:"        text_widget = widgets.Text(            value=str(label),            description=widget_label,            layout=widgets.Layout(width='500px'),            style={'description_width': '200px'}        )        label_editor_widgets.append(text_widget)        label_widgets_dict[str(label)] = text_widget    label_editor_tab = widgets.VBox([        label_editor_header,        label_editor_desc,        widgets.HTML("<hr style='margin: 10px 0;'>"),        widgets.HTML(            "<p><b>Instructions:</b> Edit the text on the right to change how labels appear in the plot. "            "The original coded names are shown on the left.</p>"        ),        widgets.HTML("<hr style='margin: 10px 0;'>"),        *label_editor_widgets    ])    print(f"  ‚úì Label editor created")except Exception as e:    print(f"  ‚ö†Ô∏è  Error creating label editor: {e}")    label_editor_tab = widgets.VBox([        label_editor_header,        widgets.HTML("<p style='color: red;'>Error creating label editor.</p>")    ])    label_widgets_dict = {}# ========== CREATE TAB WIDGET ==========tab_children = [style_tab, text_tab, annot_tab, axes_tab, export_tab, label_editor_tab]tab = widgets.Tab(children=tab_children)tab.set_title(0, 'üé® Style')tab.set_title(1, 'üìù Text')tab.set_title(2, 'üè∑Ô∏è Annotations')tab.set_title(3, 'üìè Axes')tab.set_title(4, 'üíæ Export')tab.set_title(5, '‚úèÔ∏è Labels')# Continue to Part 2 (plot generation function)...# --- 3. DEFINE PLOT GENERATION FUNCTION ---plot_output = widgets.Output()def generate_plot(b):    with plot_output:        clear_output(wait=True)        print("\n" + "="*70)        print("GENERATING FOREST PLOT")        print("="*70)        print(f"Timestamp: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")        try:            # --- GET WIDGET VALUES ---            plot_model = model_widget.value            plot_width = width_widget.value            height_per_row = height_widget.value            title_fontsize = title_fontsize_widget.value            label_fontsize = label_fontsize_widget.value            tick_fontsize = tick_fontsize_widget.value            annot_fontsize = annot_fontsize_widget.value            color_scheme = color_scheme_widget.value            marker_style = marker_style_widget.value            ci_style = ci_style_widget.value            show_title = show_title_widget.value            graph_title = title_widget.value            x_label = xlabel_widget.value            show_ylabel = show_ylabel_widget.value            y_label = ylabel_widget.value            show_k = show_k_widget.value            show_papers = show_papers_widget.value            show_fold_change = show_fold_change_widget.value            annot_pos = annot_pos_widget.value            annot_offset = annot_offset_widget.value            auto_scale = auto_scale_widget.value            x_min_manual = x_min_widget.value            x_max_manual = x_max_widget.value            show_grid = show_grid_widget.value            grid_style = grid_style_widget.value            show_null_line = show_null_line_widget.value            show_fold_axis = show_fold_axis_widget.value            save_pdf = save_pdf_widget.value            save_png = save_png_widget.value            png_dpi = png_dpi_widget.value            filename_prefix = filename_prefix_widget.value            transparent_bg = transparent_bg_widget.value            # Group label offsets (two-way only)            if has_subgroups and analysis_type == 'two_way':                group_label_h_offset = group_label_h_offset_widget.value                group_label_v_offset = group_label_v_offset_widget.value                group_label_fontsize = group_label_fontsize_widget.value            else:                group_label_h_offset = 0                group_label_v_offset = 0                group_label_fontsize = 10            # --- BUILD LABEL MAPPING FROM EDITOR ---            label_mapping = {}            for original_label, widget in label_widgets_dict.items():                custom_label = widget.value                label_mapping[original_label] = custom_label                label_mapping[str(original_label)] = custom_label            print(f"üìä Configuration:")            print(f"  Model: {plot_model}")            print(f"  Dimensions: {plot_width}\" √ó auto")            print(f"  Color scheme: {color_scheme}")            print(f"  Has subgroups: {has_subgroups}")            # Show custom labels if any were changed            changed_labels = {k: v for k, v in label_mapping.items() if k != v}            if changed_labels:                print(f"\nüìù Custom labels ({len(changed_labels)} changed):")                for orig, custom in list(changed_labels.items())[:5]:                    print(f"  '{orig}' ‚Üí '{custom}'")                if len(changed_labels) > 5:                    print(f"  ... and {len(changed_labels)-5} more")            overall_label_text = label_mapping.get('Overall', 'Overall Effect')            # --- DETERMINE COLUMN NAMES BASED ON MODEL ---            if plot_model == 'FE':                effect_col = 'pooled_effect_fe'                se_col = 'pooled_se_fe'                ci_lower_col = 'ci_lower_fe'                ci_upper_col = 'ci_upper_fe'                fold_col = 'fold_change_fe'                overall_effect_key = 'pooled_effect_fixed'                overall_se_key = 'pooled_SE_fixed'                overall_ci_lower_key = 'ci_lower_fixed'                overall_ci_upper_key = 'ci_upper_fixed'                overall_fold_key = 'pooled_fold_fixed'            else:  # RE                effect_col = 'pooled_effect_re'                se_col = 'pooled_se_re'                ci_lower_col = 'ci_lower_re'                ci_upper_col = 'ci_upper_re'                fold_col = 'fold_change_re'                overall_effect_key = 'pooled_effect_random'                overall_se_key = 'pooled_SE_random'                overall_ci_lower_key = 'ci_lower_random'                overall_ci_upper_key = 'ci_upper_random'                overall_fold_key = 'pooled_fold_random'            # --- PREPARE DATA ---            if has_subgroups:                plot_df_subgroups = results_df.copy()                plot_df_subgroups = plot_df_subgroups.rename(columns={                    effect_col: 'EffectSize',                    se_col: 'SE',                    ci_lower_col: 'CI_Lower',                    ci_upper_col: 'CI_Upper',                    fold_col: 'FoldChange',                    'k': 'k',                    'n_papers': 'nPapers'                })                if analysis_type == 'two_way':                    plot_df_subgroups['GroupVar'] = plot_df_subgroups[moderator1].astype(str)                    plot_df_subgroups['LabelVar'] = plot_df_subgroups[moderator2].astype(str)                else:  # single                    plot_df_subgroups['GroupVar'] = 'Subgroup'                    plot_df_subgroups['LabelVar'] = plot_df_subgroups['group'].astype(str)                required_cols = ['GroupVar', 'LabelVar', 'k', 'nPapers',                               'EffectSize', 'SE', 'CI_Lower', 'CI_Upper', 'FoldChange']                plot_df_subgroups = plot_df_subgroups[required_cols]                plot_df_subgroups.dropna(subset=['EffectSize', 'SE'], inplace=True)                print(f"  Subgroups: {len(plot_df_subgroups)}")            else:                plot_df_subgroups = pd.DataFrame(columns=[                    'GroupVar', 'LabelVar', 'k', 'nPapers',                    'EffectSize', 'SE', 'CI_Lower', 'CI_Upper', 'FoldChange'                ])            # --- ADD OVERALL EFFECT ---            overall_effect_val = overall_results[overall_effect_key]            overall_se_val = overall_results[overall_se_key]            overall_ci_lower_val = overall_results[overall_ci_lower_key]            overall_ci_upper_val = overall_results[overall_ci_upper_key]            overall_k_val = overall_results['k']            overall_papers_val = overall_results['k_papers']            overall_fold_val = overall_results.get(overall_fold_key, np.nan)            overall_row = pd.DataFrame([{                'GroupVar': 'Overall',                'LabelVar': 'Overall',                'k': overall_k_val,                'nPapers': overall_papers_val,                'EffectSize': overall_effect_val,                'SE': overall_se_val,                'CI_Lower': overall_ci_lower_val,                'CI_Upper': overall_ci_upper_val,                'FoldChange': overall_fold_val            }])            print(f"  Overall: k={overall_k_val}, papers={overall_papers_val}")            # --- COMBINE DATA (OVERALL ON TOP) ---            plot_df = pd.concat([overall_row, plot_df_subgroups], ignore_index=True)            plot_df['SortKey_Group'] = plot_df['GroupVar'].apply(                lambda x: 'AAAAA' if x == 'Overall' else str(x)            )            plot_df['SortKey_Label'] = plot_df['LabelVar'].apply(                lambda x: 'AAAAA' if x == 'Overall' else str(x)            )            plot_df.sort_values(by=['SortKey_Group', 'SortKey_Label'], inplace=True)            plot_df.reset_index(drop=True, inplace=True)            if plot_df.empty:                print("‚ùå ERROR: No data to plot")                return            print(f"  Total rows: {len(plot_df)}")            # --- CALCULATE PLOT DIMENSIONS ---            num_rows = len(plot_df)            y_positions = np.arange(num_rows)            base_height = 2.5            plot_height = max(base_height, num_rows * height_per_row + 1.5)            y_margin_top = 0.75            y_margin_bottom = 0.75            y_lim_bottom = y_positions[0] - y_margin_bottom            y_lim_top = y_positions[-1] + y_margin_top            # --- Y-TICK LABELS (USE CUSTOM MAPPING) ---            y_tick_labels = []            for i, row in plot_df.iterrows():                if row['GroupVar'] == 'Overall':                    y_tick_labels.append(overall_label_text)                else:                    original_label = str(row['LabelVar'])                    display_label = label_mapping.get(original_label, original_label)                    y_tick_labels.append(display_label)            # --- CALCULATE X-AXIS LIMITS (FIXED - USE ALL DATA) ---            min_ci = plot_df['CI_Lower'].min()            max_ci = plot_df['CI_Upper'].max()            min_effect = plot_df['EffectSize'].min()            max_effect = plot_df['EffectSize'].max()            plot_min = min(min_ci, 0)            plot_max = max(max_ci, 0)            x_range = plot_max - plot_min            if x_range == 0:                x_range = 1            print(f"\nüìè Data range:")            print(f"  Effect sizes: [{min_effect:.3f}, {max_effect:.3f}]")            print(f"  CI range: [{min_ci:.3f}, {max_ci:.3f}]")            print(f"  Plot range: [{plot_min:.3f}, {plot_max:.3f}]")            # --- ESTIMATE ANNOTATION SPACE NEEDED ---            max_k = int(plot_df['k'].max())            max_np = int(plot_df['nPapers'].max()) if 'nPapers' in plot_df.columns else 0            annot_parts = []            if show_k:                annot_parts.append(f"k={max_k}")            if show_papers:                annot_parts.append(f"({max_np})")            if show_fold_change and es_config.get('has_fold_change', False):                max_fold = plot_df['FoldChange'].abs().max() if 'FoldChange' in plot_df.columns else 10                annot_parts.append(f"[-{max_fold:.2f}√ó]")            example_annot = " ".join(annot_parts) if annot_parts else "k=100 (10)"            char_width_fraction = (annot_fontsize / 8.0) * 0.006            annot_space_fraction = len(example_annot) * char_width_fraction            print(f"  Annotation example: '{example_annot}' ({len(example_annot)} chars)")            # --- CALCULATE SPACE FOR GROUP LABELS (TWO-WAY) ---            group_label_space = 0            if has_subgroups and analysis_type == 'two_way':                max_group_len = 0                for group_val in plot_df[plot_df['GroupVar'] != 'Overall']['GroupVar'].unique():                    custom_label = label_mapping.get(str(group_val), str(group_val))                    max_group_len = max(max_group_len, len(custom_label))                char_width_group = (group_label_fontsize / 8.0) * 0.006                group_label_space = max_group_len * char_width_group                print(f"  Group label max: {max_group_len} chars")            # --- AUTO-SCALE CALCULATION ---            if auto_scale:                left_padding = 0.05                annot_distance = 0.015                right_padding = 0.03                total_right_fraction = (annot_distance +                                       annot_space_fraction +                                       group_label_space +                                       right_padding)                x_min_auto = plot_min - x_range * left_padding                x_max_auto = plot_max + x_range * (total_right_fraction / (1 - total_right_fraction))                x_limits = (x_min_auto, x_max_auto)                print(f"  X-axis (auto): [{x_min_auto:.3f}, {x_max_auto:.3f}]")            else:                x_limits = (x_min_manual, x_max_manual)                print(f"  X-axis (manual): [{x_min_manual:.3f}, {x_max_manual:.3f}]")            # --- DETERMINE COLORS AND MARKERS ---            if color_scheme == 'gray':                subgroup_color = 'dimgray'                overall_color = 'black'                ci_color_subgroup = 'gray'                ci_color_overall = 'black'            elif color_scheme == 'color':                subgroup_color = '#4A90E2'                overall_color = '#E74C3C'                ci_color_subgroup = '#4A90E2'                ci_color_overall = '#E74C3C'            else:  # bw                subgroup_color = 'black'                overall_color = 'black'                ci_color_subgroup = 'black'                ci_color_overall = 'black'            if marker_style == 'circle_diamond':                subgroup_marker = 'o'                overall_marker = 'D'            elif marker_style == 'square_diamond':                subgroup_marker = 's'                overall_marker = 'D'            else:  # circle_star                subgroup_marker = 'o'                overall_marker = '*'            subgroup_marker_size = 6            overall_marker_size = 8            subgroup_ci_width = 1.5            overall_ci_width = 2.0            if ci_style == 'solid':                capsize = 0            elif ci_style == 'dashed':                capsize = 0            else:  # caps                capsize = 4            # --- CREATE FIGURE ---            fig, ax = plt.subplots(figsize=(plot_width, plot_height))            if transparent_bg:                fig.patch.set_alpha(0)                ax.patch.set_alpha(0)            print(f"\nüé® Plotting {num_rows} rows...")            # --- PLOT DATA POINTS AND ERROR BARS ---            for i, row in plot_df.iterrows():                is_overall = (row['GroupVar'] == 'Overall')                marker = overall_marker if is_overall else subgroup_marker                msize = overall_marker_size if is_overall else subgroup_marker_size                color = overall_color if is_overall else subgroup_color                ci_color = ci_color_overall if is_overall else ci_color_subgroup                ci_width = overall_ci_width if is_overall else subgroup_ci_width                zorder = 5 if is_overall else 3                linestyle = '-' if ci_style != 'dashed' else '--'                ax.errorbar(                    x=row['EffectSize'],                    y=y_positions[i],                    xerr=[[row['EffectSize'] - row['CI_Lower']],                          [row['CI_Upper'] - row['EffectSize']]],                    fmt='none',                    capsize=capsize,                    color=ci_color,                    linewidth=ci_width,                    linestyle=linestyle,                    alpha=0.9,                    zorder=zorder-1                )                ax.plot(                    row['EffectSize'],                    y_positions[i],                    marker=marker,                    markersize=msize,                    markerfacecolor=color,                    markeredgecolor='black' if color_scheme != 'bw' else 'black',                    markeredgewidth=1.0,                    linestyle='none',                    zorder=zorder                )            # --- SET AXIS LIMITS FIRST ---            ax.set_xlim(x_limits[0], x_limits[1])            ax.set_ylim(y_lim_top, y_lim_bottom)  # Inverted            final_xlims = ax.get_xlim()            final_xrange = final_xlims[1] - final_xlims[0]            print(f"  Final X-axis: [{final_xlims[0]:.3f}, {final_xlims[1]:.3f}]")            # --- ADD ANNOTATIONS ---            print(f"  Adding annotations...")            annot_x_offset = annot_distance * final_xrange            for i, row in plot_df.iterrows():                is_overall = (row['GroupVar'] == 'Overall')                font_weight = 'bold' if is_overall else 'normal'                annot_parts = []                if show_k:                    annot_parts.append(f"k={int(row['k'])}")                if show_papers and pd.notna(row['nPapers']):                    annot_parts.append(f"({int(row['nPapers'])})")                if show_fold_change and pd.notna(row['FoldChange']) and es_config.get('has_fold_change', False):                    fold_sign = "+" if row['FoldChange'] > 0 else ""                    annot_parts.append(f"[{fold_sign}{row['FoldChange']:.2f}√ó]")                annotation_text = " ".join(annot_parts) if annot_parts else ""                if annotation_text:                    if annot_pos == 'right':                        x_pos = row['CI_Upper'] + annot_x_offset + (annot_offset * final_xrange * 0.1)                        y_pos = y_positions[i]                        va = 'center'                        ha = 'left'                    elif annot_pos == 'above':                        x_pos = row['EffectSize'] + (annot_offset * final_xrange * 0.1)                        y_pos = y_positions[i] - 0.2                        va = 'bottom'                        ha = 'center'                    else:  # below                        x_pos = row['EffectSize'] + (annot_offset * final_xrange * 0.1)                        y_pos = y_positions[i] + 0.2                        va = 'top'                        ha = 'center'                    ax.text(                        x_pos, y_pos,                        annotation_text,                        va=va, ha=ha,                        fontsize=annot_fontsize,                        fontweight=font_weight,                        clip_on=False                    )            # --- ADD GROUP LABELS (TWO-WAY) ---            if has_subgroups and analysis_type == 'two_way':                print(f"  Adding group labels...")                current_group = None                first_subgroup_idx = 1 if 'Overall' in plot_df['GroupVar'].values else 0                group_label_x_base = final_xlims[1] - (right_padding * final_xrange)                for i, row in plot_df.iterrows():                    group_val = str(row['GroupVar'])                    if group_val != 'Overall' and group_val != current_group:                        if i > first_subgroup_idx:                            ax.axhline(                                y=y_positions[i] - 0.5,                                color='darkgray',                                linewidth=0.8,                                linestyle='-',                                xmin=0.01,                                xmax=0.99,                                zorder=1                            )                        group_indices = plot_df[plot_df['GroupVar'] == group_val].index                        label_y = (y_positions[group_indices[0]] + y_positions[group_indices[-1]]) / 2.0                        label_x = group_label_x_base + (group_label_h_offset * final_xrange * 0.05)                        label_y = label_y + group_label_v_offset                        display_group_label = label_mapping.get(group_val, group_val)                        ax.text(                            label_x, label_y,                            display_group_label,                            va='center',                            ha='right',                            fontweight='bold',                            fontsize=group_label_fontsize,                            color='black',                            clip_on=False                        )                        current_group = group_val            # --- ADD SEPARATOR LINE BELOW OVERALL ---            if len(plot_df) > 1:                separator_y = y_positions[0] + 0.5                ax.axhline(                    y=separator_y,                    color='black',                    linewidth=1.5,                    linestyle='-'                )            # --- CUSTOMIZE AXES ---            print(f"  Customizing axes...")            if show_null_line:                ax.axvline(                    x=0,                    color='black',                    linestyle='-',                    linewidth=1.5,                    alpha=0.8,                    zorder=1                )            ax.set_xlabel(x_label, fontsize=label_fontsize, fontweight='bold')            if show_ylabel:                ax.set_ylabel(y_label, fontsize=label_fontsize, fontweight='bold')            if show_title:                ax.set_title(graph_title, fontweight='bold', fontsize=title_fontsize, pad=15)            ax.set_yticks(y_positions)            ax.set_yticklabels(y_tick_labels, fontsize=tick_fontsize)            ax.tick_params(axis='x', labelsize=tick_fontsize)            if show_grid:                if grid_style == 'dashed_light':                    ax.grid(axis='x', alpha=0.3, linestyle='--', linewidth=0.5)                elif grid_style == 'dotted_light':                    ax.grid(axis='x', alpha=0.3, linestyle=':', linewidth=0.5)                else:  # solid_light                    ax.grid(axis='x', alpha=0.2, linestyle='-', linewidth=0.5)            # --- ADD FOLD-CHANGE AXIS (TOP) ---            if show_fold_axis and es_config.get('has_fold_change', False):                print(f"  Adding fold-change axis...")                ax2 = ax.twiny()                fold_ticks_lnRR = np.array([-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2])                fold_ticks_RR = np.exp(fold_ticks_lnRR)                valid_mask = ((fold_ticks_lnRR >= final_xlims[0]) &                             (fold_ticks_lnRR <= final_xlims[1]))                fold_ticks_lnRR = fold_ticks_lnRR[valid_mask]                fold_ticks_RR = fold_ticks_RR[valid_mask]                ax2.set_xlim(final_xlims[0], final_xlims[1])                ax2.set_xticks(fold_ticks_lnRR)                fold_labels = []                for rr in fold_ticks_RR:                    if rr < 1:                        fold_labels.append(f"{1/rr:.1f}√ó ‚Üì")                    elif rr > 1:                        fold_labels.append(f"{rr:.1f}√ó ‚Üë")                    else:                        fold_labels.append("1√ó")                ax2.set_xticklabels(fold_labels, fontsize=tick_fontsize)                ax2.set_xlabel("Fold-Change", fontsize=label_fontsize, fontweight='bold')            # --- FINALIZE PLOT ---            fig.tight_layout()            # --- SAVE FILES ---            print(f"\nüíæ Saving files...")            timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")            base_filename = f"{filename_prefix}_{plot_model}_{timestamp}"            saved_files = []            if save_pdf:                pdf_filename = f"{base_filename}.pdf"                fig.savefig(pdf_filename, bbox_inches='tight', transparent=transparent_bg)                saved_files.append(pdf_filename)                print(f"  ‚úì {pdf_filename}")            if save_png:                png_filename = f"{base_filename}.png"                fig.savefig(png_filename, dpi=png_dpi, bbox_inches='tight', transparent=transparent_bg)                saved_files.append(png_filename)                print(f"  ‚úì {png_filename} (DPI: {png_dpi})")            plt.show()            print(f"\n" + "="*70)            print("‚úÖ FOREST PLOT COMPLETE")            print("="*70)            print(f"Files: {', '.join(saved_files)}")        except Exception as e:            print(f"\n‚ùå ERROR: {e}")            traceback.print_exc()# --- 4. CREATE BUTTON AND DISPLAY ---plot_button = widgets.Button(    description='üìä Generate Forest Plot',    button_style='success',    layout=widgets.Layout(width='450px', height='50px'),    style={'font_weight': 'bold', 'font_size': '14px'})plot_button.on_click(generate_plot)print("\n" + "="*70)print("‚úÖ FOREST PLOT INTERFACE READY")print("="*70)print("üëÜ Customize your plot using the tabs above, then click Generate")print("\nüìù Tips:")print("  ‚Ä¢ Use the 'Labels' tab to rename coded variables")print("  ‚Ä¢ Auto-scale considers ALL data points for proper spacing")print("  ‚Ä¢ Annotations and group labels will fit within the plot")print("="*70 + "\n")display(widgets.VBox([    widgets.HTML("<h3 style='color: #2E86AB;'>üìä Forest Plot Generator</h3>"),    widgets.HTML("<p style='color: #666;'>Create publication-ready forest plots with full customization</p>"),    widgets.HTML("<hr style='margin: 15px 0;'>"),    tab,    widgets.HTML("<hr style='margin: 15px 0;'>"),    plot_button,    plot_output]))

In [None]:
# ‚ö†Ô∏è PREREQUISITES:# - Must have continuous or categorical moderator variable# - Recommended: at least 10 studies for reliable inference## Expected runtime: 5-30 seconds with cluster-robust estimation## INTERPRETATION:# - Significant slope (p < 0.05) indicates the moderator explains heterogeneity# - R¬≤ indicates proportion of heterogeneity explained by the moderator# - Cluster-robust SEs account for dependence between effect sizes# CLUSTER-ROBUST VARIANCE ESTIMATION## Method from Hedges, Tipton, & Johnson (2010). Robust variance estimation in# meta-regression with dependent effect size estimates. Research Synthesis Methods, 1(1), 39-65.## This approach:# 1. Allows for dependence between effect sizes from the same cluster (study)# 2. Provides valid standard errors even with small number of clusters# 3. Uses t-distribution with degrees of freedom correction## Particularly important when:# - Multiple effect sizes are extracted from the same study# - Effect sizes are nested within higher-level units# - Assumption of independence is violated#@title üìà META-REGRESSION (Cluster-Robust) (new)#@title üìà META-REGRESSION (Cluster-Robust) (new)# =============================================================================# CELL 10 (REPLACEMENT): META-REGRESSION WITH ROBUST VARIANCE ESTIMATION# Purpose: Test moderators (continuous or categorical)# Method: Uses a weighted-least-squares model with cluster-robust#         standard errors to account for non-independence of effects#         from the same study (id).# Dependencies: Cell 6 (for overall tau-squared)# Outputs: 'meta_regression_RVE_results' in ANALYSIS_CONFIG# =============================================================================# --- 1. HELPER FUNCTIONS ---def run_cluster_robust_regression(reg_df, moderator_col, effect_col, var_col, cluster_col, tau_squared):    """    Runs a mixed-effects meta-regression using weighted least squares (WLS)    and computes cluster-robust standard errors.    """    # --- 1. Prepare data ---    reg_df['weights'] = 1.0 / (reg_df[var_col] + tau_squared)    y = reg_df[effect_col]    X = reg_df[moderator_col]    X = sm.add_constant(X)    weights = reg_df['weights']    # --- 2. Run Weighted Least Squares (WLS) ---    wls_model = sm.WLS(y, X, weights=weights).fit()    # --- 3. Get Cluster-Robust Variance-Covariance Matrix ---    robust_cov = wls_model.get_robustcov_results(        cov_type='cluster',        groups=reg_df[cluster_col]    )    # --- 4. Extract all results ---    M_studies = reg_df[cluster_col].nunique()    k_obs = len(reg_df)    df = M_studies - X.shape[1] # Degrees of freedom    if df < 1:        print(f"  ‚ö†Ô∏è  WARNING: Insufficient clusters ({M_studies}) for {X.shape[1]} predictors. Results are unreliable.")        df = 1    betas = wls_model.params    se_robust = robust_cov.bse    t_stats = betas / se_robust    p_values = 2 * (1 - t.cdf(np.abs(t_stats), df=df))    ci_lower = betas - t.ppf(0.975, df=df) * se_robust    ci_upper = betas + t.ppf(0.975, df=df) * se_robust    # --- 5. Calculate R-squared ---    QM = wls_model.mse_model / wls_model.scale    QT = ANALYSIS_CONFIG['overall_results']['Qt']    R_squared = max(0, (QM / QT) * 100) if QT > 0 else 0.0    results = {        'coefficients': betas,        'std_errors_robust': se_robust,        'var_betas_robust': robust_cov.cov_params(), # *** THIS IS THE NEWLY ADDED LINE ***        't_stats': t_stats,        'p_values_robust': p_values,        'ci_lower_robust': ci_lower,        'ci_upper_robust': ci_upper,        'R_squared_adj': R_squared,        'k_obs': k_obs,        'M_studies': M_studies,        'df': df,        'reg_df': reg_df    }    return results# --- 2. WIDGET DEFINITIONS ---potential_moderators = []analysis_data_init = Nonetry:    if 'ANALYSIS_CONFIG' not in globals():        raise NameError("ANALYSIS_CONFIG not found")    if 'analysis_data' in globals():        analysis_data_init = analysis_data.copy()    elif 'data_filtered' in globals():        analysis_data_init = data_filtered.copy()    else:        raise ValueError("No data found")    excluded_cols = [        ANALYSIS_CONFIG.get('effect_col'), ANALYSIS_CONFIG.get('var_col'),        ANALYSIS_CONFIG.get('se_col'), 'w_fixed', 'w_random', 'id',        'xe', 'sde', 'ne', 'xc', 'sdc', 'nc',        ANALYSIS_CONFIG.get('ci_lower_col'), ANALYSIS_CONFIG.get('ci_upper_col')    ]    excluded_cols = [col for col in excluded_cols if col is not None]    for col in analysis_data_init.columns:        if col not in excluded_cols:            try:                temp_numeric = pd.to_numeric(analysis_data_init[col], errors='coerce')                n_valid = temp_numeric.notna().sum()                n_unique = temp_numeric.nunique()                if n_valid >= 2 and n_unique >= 2:                    potential_moderators.append(col)            except:                passexcept Exception as e:    print(f"‚ö†Ô∏è  Initialization Error: {e}. Please run previous cells.")# --- Widget Interface ---header = widgets.HTML(    "<h3 style='color: #2E86AB;'>Meta-Regression (Cluster-Robust)</h3>"    "<p style='color: #666;'><i>Test moderators using cluster-robust standard errors. This is the correct model for your data structure.</i></p>")moderator_widget = widgets.Dropdown(    options=potential_moderators if potential_moderators else ['No moderators available'],    value=potential_moderators[0] if potential_moderators else 'No moderators available',    description='Moderator:',    style={'description_width': '120px'},    layout=widgets.Layout(width='450px'),    disabled=not bool(potential_moderators))show_info_button = widgets.Button(    description='üìä Show Moderator Info', button_style='info',    layout=widgets.Layout(width='200px'), disabled=not bool(potential_moderators))info_output = widgets.Output()run_button = widgets.Button(    description='‚ñ∂ Run Meta-Regression',    button_style='success',    layout=widgets.Layout(width='450px', height='50px'),    style={'font_weight': 'bold'},    disabled=not bool(potential_moderators))regression_output = widgets.Output()# --- 3. WIDGET EVENT HANDLERS ---def on_show_info_clicked(b):    with info_output:        clear_output()        if moderator_widget.value and moderator_widget.value != 'No moderators available':            try:                mod_col = moderator_widget.value                mod_data = pd.to_numeric(analysis_data_init[mod_col], errors='coerce').dropna()                if len(mod_data) > 0:                    print(f"\nüìä Moderator: {mod_col}")                    print(f"  Valid observations: {len(mod_data)} / {len(analysis_data_init)}")                    print(f"  Range: [{mod_data.min():.3f}, {mod_data.max():.3f}]")                    print(f"  Mean: {mod_data.mean():.3f} | Median: {mod_data.median():.3f}")                else:                    print(f"‚ö†Ô∏è  No valid numeric data found for {mod_col}")            except Exception as e:                print(f"‚ùå Error: {e}")show_info_button.on_click(on_show_info_clicked)# --- 4. MAIN ANALYSIS FUNCTION (Attached to Button) ---@run_button.on_clickdef run_main_analysis(b):    with regression_output:        clear_output(wait=True)        print("="*70)        print("RUNNING CLUSTER-ROBUST META-REGRESSION")        print("="*70)        print(f"Timestamp: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")        try:            # --- 1. Load Config and Data ---            print("STEP 1: LOADING CONFIGURATION")            print("---------------------------------")            if 'ANALYSIS_CONFIG' not in globals():                raise NameError("ANALYSIS_CONFIG not found.")            local_data_copy = None            if 'analysis_data' in globals():                local_data_copy = analysis_data.copy()                print(f"  ‚úì Found global 'analysis_data' (Shape: {local_data_copy.shape})")            elif 'data_filtered' in globals():                local_data_copy = data_filtered.copy()                print(f"  ‚úì Found global 'data_filtered' (Shape: {local_data_copy.shape})")            else:                raise ValueError("Data not found. Run Cell 5/6 first.")            effect_col = ANALYSIS_CONFIG['effect_col']            var_col = ANALYSIS_CONFIG['var_col']            es_config = ANALYSIS_CONFIG['es_config']            overall_results = ANALYSIS_CONFIG['overall_results']            tau_sq_uncond = overall_results.get('tau_squared')            if tau_sq_uncond is None:                raise ValueError("tau_squared not found in overall_results. Run Cell 6 first.")            moderator_col_name = moderator_widget.value            if moderator_col_name == 'No moderators available':                raise ValueError("No valid moderator selected.")            print(f"  ‚úì Effect: {es_config['effect_label']} ({effect_col})")            print(f"  ‚úì Moderator: {moderator_col_name}")            print(f"  ‚úì Cluster variable: id")            print(f"  ‚úì Using œÑ¬≤ (from Cell 6): {tau_sq_uncond:.4f}")            # --- 2. Prepare Data ---            print("\nSTEP 2: PREPARING REGRESSION DATA")            print("---------------------------------")            reg_df = local_data_copy.copy()            reg_df[moderator_col_name] = pd.to_numeric(reg_df[moderator_col_name], errors='coerce')            initial_n = len(reg_df)            reg_df.dropna(subset=[effect_col, var_col, 'id', moderator_col_name], inplace=True)            n_non_positive_var = (reg_df[var_col] <= 0).sum()            if n_non_positive_var > 0:                print(f"  ‚ö†Ô∏è  Dropping {n_non_positive_var} observations with zero or negative variance.")                reg_df = reg_df[reg_df[var_col] > 0]            k_reg = len(reg_df)            n_dropped = initial_n - k_reg            if n_dropped > 0:                print(f"  ‚úì Dropped {n_dropped} total observations with missing/invalid data.")            print(f"  ‚úì Using {k_reg} observations for regression.")            if k_reg < 3:                raise ValueError(f"Not enough data (k={k_reg}) for meta-regression. Need at least 3.")            # --- 3. Run Cluster-Robust Regression ---            print("\nSTEP 3: RUNNING WLS + CLUSTER-ROBUST SE")            print("---------------------------------")            results = run_cluster_robust_regression(                reg_df, moderator_col_name, effect_col, var_col, 'id', tau_sq_uncond            )            print("  ‚úì Regression complete.")            # --- 4. Display Results ---            print("\n" + "="*70)            print(f"META-REGRESSION RESULTS: {moderator_col_name}")            print("="*70)            b0, b1 = results['coefficients']            se0, se1 = results['std_errors_robust']            p0, p1 = results['p_values_robust']            ci0_l, ci0_u = results['ci_lower_robust']['const'], results['ci_upper_robust']['const']            ci1_l, ci1_u = results['ci_lower_robust'][moderator_col_name], results['ci_upper_robust'][moderator_col_name]            print(f"\nüìê Regression Model (k = {results['k_obs']} obs, M = {results['M_studies']} studies, df = {results['df']}):")            sign = "+" if b1 >= 0 else ""            print(f"   {effect_col} = {b0:.4f} {sign} {b1:.4f} √ó {moderator_col_name}")            print(f"\nüìä Coefficients (Cluster-Robust SE):")            print(f"  {'Parameter':<20} {'Estimate':<12} {'Robust SE':<10} {'95% CI':<25} {'P-value (t)':<10} {'Sig':<5}")            print(f"  {'-'*20} {'-'*12} {'-'*10} {'-'*25} {'-'*10} {'-'*5}")            sig0 = "***" if p0 < 0.001 else "**" if p0 < 0.01 else "*" if p0 < 0.05 else "ns"            print(f"  {'Intercept':<20} {b0:>11.4f} {se0:>10.4f} [{ci0_l:>7.4f}, {ci0_u:>7.4f}] {p0:>10.4g} {sig0:<5}")            sig1 = "***" if p1 < 0.001 else "**" if p1 < 0.01 else "*" if p1 < 0.05 else "ns"            print(f"  {moderator_col_name:<20} {b1:>11.4f} {se1:>10.4f} [{ci1_l:>7.4f}, {ci1_u:>7.4f}] {p1:>10.4g} {sig1:<5}")            print(f"\n  Significance: *** p<0.001, ** p<0.01, * p<0.05, ns = not significant")            print("\n" + "="*70)            print("HETEROGENEITY EXPLAINED (R¬≤)")            print("="*70)            R_sq = results['R_squared_adj']            print(f"  ‚Ä¢ R¬≤ (approximate): {R_sq:.1f}%")            print(f"  ‚Ä¢ Interpretation: The moderator '{moderator_col_name}' explains approximately {R_sq:.1f}% of the total heterogeneity (œÑ¬≤).")            # --- 5. Save Results ---            print("\nSTEP 4: SAVING RESULTS")            print("---------------------------------")            reg_results_dict = {                'timestamp': datetime.datetime.now(),                'status': 'completed',                'model_type': 'Cluster-Robust (2-Level)',                'moderator_col_name': moderator_col_name,                'effect_col': effect_col,                'k_reg': results['k_obs'],                'M_studies': results['M_studies'],                'df_robust': results['df'],                'betas': results['coefficients'],                'se_betas_robust': results['std_errors_robust'],                'var_betas_robust': results['var_betas_robust'], # *** THIS IS THE CRITICAL ADDITION ***                'b0_intercept': b0, 'b1_slope': b1,                'se_slope': se1, 'p_slope': p1, 'ci_slope': [ci1_l, ci1_u],                'R_squared_adj': R_sq,                'reg_df': results['reg_df'] # Save the data used for plotting            }            ANALYSIS_CONFIG['meta_regression_RVE_results'] = reg_results_dict            print("  ‚úì Results saved to ANALYSIS_CONFIG['meta_regression_RVE_results']")            print("\n" + "="*70)            print("‚úÖ META-REGRESSION (CLUSTER-ROBUST) COMPLETE")            print("="*70)            print(f"  ‚ñ∂Ô∏è  Ready for plotting! Run the next cell (Cell 11) to visualize.")        except Exception as e:            print(f"\n‚ùå AN ERROR OCCURRED:\n")            print(f"  Type: {type(e).__name__}")            print(f"  Message: {e}")            print("\n  Traceback:")            traceback.print_exc(file=sys.stdout)            print("\n" + "="*70)            print("ANALYSIS FAILED. See error message above.")            print("Please check your data and configuration.")            print("="*70)# --- 6. DISPLAY WIDGETS ---try:    if 'ANALYSIS_CONFIG' not in globals() or 'overall_results' not in ANALYSIS_CONFIG:        print("="*70)        print("‚ö†Ô∏è  PREREQUISITE NOT MET")        print("="*70)        print("Please run Cell 6 (Overall Meta-Analysis) before running this cell.")    elif ANALYSIS_CONFIG['overall_results'].get('tau_squared') is None:        print("="*70)        print("‚ö†Ô∏è  PREREQUISITE NOT MET")        print("="*70)        print("Could not find 'tau_squared' in Cell 6 results. Please re-run Cell 6.")    elif not bool(potential_moderators):        print("="*70)        print("‚ö†Ô∏è  NO MODERATORS FOUND")        print("="*70)        print("  No suitable continuous (numeric) moderators were found in your data.")        print("  Meta-regression requires a numeric variable (e.g., year, dose, temperature).")    else:        print("="*70)        print("‚úÖ CLUSTER-ROBUST META-REGRESSION INTERFACE READY")        print("="*70)        print("  ‚úì Select a continuous moderator to test.")        print("  ‚úì Click 'Run' to perform the analysis.")        display(widgets.VBox([            header,            widgets.HTML("<hr style='margin: 15px 0;'>"),            moderator_widget,            widgets.HBox([show_info_button]),            info_output,            widgets.HTML("<hr style='margin: 15px 0;'>"),            run_button,            regression_output        ]))except Exception as e:    print(f"‚ùå An error occurred during initialization: {e}")    print("Please ensure the notebook has been run in order.")

In [None]:
# ‚ö†Ô∏è PREREQUISITES:# - Must have continuous or categorical moderator variable# - Recommended: at least 10 studies for reliable inference## Expected runtime: 5-30 seconds with cluster-robust estimation## INTERPRETATION:# - Significant slope (p < 0.05) indicates the moderator explains heterogeneity# - R¬≤ indicates proportion of heterogeneity explained by the moderator# - Cluster-robust SEs account for dependence between effect sizes# CLUSTER-ROBUST VARIANCE ESTIMATION## Method from Hedges, Tipton, & Johnson (2010). Robust variance estimation in# meta-regression with dependent effect size estimates. Research Synthesis Methods, 1(1), 39-65.## This approach:# 1. Allows for dependence between effect sizes from the same cluster (study)# 2. Provides valid standard errors even with small number of clusters# 3. Uses t-distribution with degrees of freedom correction## Particularly important when:# - Multiple effect sizes are extracted from the same study# - Effect sizes are nested within higher-level units# - Assumption of independence is violated#@title üìà META-REGRESSION PLOT (Cluster-Robust)#@title üìà META-REGRESSION PLOT (Cluster-Robust)# =============================================================================# CELL 11 (REPLACEMENT): META-REGRESSION PLOT# Purpose: Visualize the meta-regression results from Cell 10# Method: Creates a bubble plot with cluster-robust confidence bands# Dependencies: Cell 10 (meta_regression_RVE_results)# Outputs: Publication-ready plot (PDF/PNG)# =============================================================================# --- 1. WIDGET DEFINITIONS ---# Initialize listsavailable_color_moderators = ['None']analysis_data_init = Nonedefault_x_label = "Moderator"default_y_label = "Effect Size"default_title = "Meta-Regression Plot"label_widgets_dict = {} # Dictionary to store label widgetstry:    if 'ANALYSIS_CONFIG' not in globals():        raise NameError("ANALYSIS_CONFIG not found")    if 'analysis_data' in globals():        analysis_data_init = analysis_data.copy()    elif 'data_filtered' in globals():        analysis_data_init = data_filtered.copy()    else:        raise ValueError("No data found")    if 'meta_regression_RVE_results' in ANALYSIS_CONFIG:        reg_results = ANALYSIS_CONFIG['meta_regression_RVE_results']        es_config = ANALYSIS_CONFIG['es_config']        default_x_label = reg_results['moderator_col_name']        default_y_label = es_config['effect_label']        default_title = f"Meta-Regression: {default_y_label} vs. {default_x_label}"    # Find categorical moderators for color AND labels    excluded_cols = [        ANALYSIS_CONFIG.get('effect_col'), ANALYSIS_CONFIG.get('var_col'),        ANALYSIS_CONFIG.get('se_col'), 'w_fixed', 'w_random', 'id',        'xe', 'sde', 'ne', 'xc', 'sdc', 'nc',        ANALYSIS_CONFIG.get('ci_lower_col'), ANALYSIS_CONFIG.get('ci_upper_col')    ]    excluded_cols = [col for col in excluded_cols if col is not None]    categorical_cols = analysis_data_init.select_dtypes(include=['object', 'category']).columns    available_color_moderators.extend([        col for col in categorical_cols        if col not in excluded_cols and analysis_data_init[col].nunique() <= 10    ])    # *** NEW: Find all unique labels for the Label Editor ***    all_categorical_labels = set()    for col in available_color_moderators:        if col != 'None' and col in analysis_data_init.columns:            # Add the column name itself (e.g., "Crop")            all_categorical_labels.add(col)            # Add all unique values in that column (e.g., "B", "C", "R", "W")            all_categorical_labels.update(analysis_data_init[col].astype(str).str.strip().unique())    # Remove any empty strings    all_categorical_labels.discard('')    all_categorical_labels.discard('nan')except Exception as e:    print(f"‚ö†Ô∏è  Initialization Error: {e}. Please run previous cells.")# --- Widget Interface ---header = widgets.HTML(    "<h3 style='color: #2E86AB;'>Meta-Regression Plot Setup</h3>"    "<p style='color: #666;'><i>Visualize the relationship between moderator and effect size</i></p>")# ========== TAB 1: PLOT STYLE ==========show_title_widget = widgets.Checkbox(value=True, description='Show Plot Title', indent=False)title_widget = widgets.Text(value=default_title, description='Plot Title:',                            layout=widgets.Layout(width='450px'), style={'description_width': '120px'})xlabel_widget = widgets.Text(value=default_x_label, description='X-Axis Label:',                             layout=widgets.Layout(width='450px'), style={'description_width': '120px'})ylabel_widget = widgets.Text(value=default_y_label, description='Y-Axis Label:',                             layout=widgets.Layout(width='450px'), style={'description_width': '120px'})width_widget = widgets.FloatSlider(value=8.0, min=5.0, max=14.0, step=0.5, description='Plot Width (in):',                                   continuous_update=False, style={'description_width': '120px'},                                   layout=widgets.Layout(width='450px'))height_widget = widgets.FloatSlider(value=6.0, min=4.0, max=12.0, step=0.5, description='Plot Height (in):',                                    continuous_update=False, style={'description_width': '120px'},                                    layout=widgets.Layout(width='450px'))style_tab = widgets.VBox([    widgets.HTML("<h4 style='color: #2E86AB;'>Labels & Size</h4>"),    show_title_widget, title_widget, xlabel_widget, ylabel_widget, width_widget, height_widget])# ========== TAB 2: DATA POINTS ==========color_mod_widget = widgets.Dropdown(options=available_color_moderators, value='None', description='Color By:',                                    style={'description_width': '120px'}, layout=widgets.Layout(width='450px'))point_color_widget = widgets.Dropdown(options=['gray', 'blue', 'red', 'green', 'purple', 'orange'], value='gray',                                      description='Point Color:', style={'description_width': '120px'},                                      layout=widgets.Layout(width='450px'))bubble_base_widget = widgets.IntSlider(value=20, min=0, max=200, step=10, description='Min Bubble Size:',                                       continuous_update=False, style={'description_width': '120px'},                                       layout=widgets.Layout(width='450px'))bubble_range_widget = widgets.IntSlider(value=800, min=100, max=2000, step=100, description='Max Bubble Size:',                                        continuous_update=False, style={'description_width': '120px'},                                        layout=widgets.Layout(width='450px'))bubble_alpha_widget = widgets.FloatSlider(value=0.6, min=0.1, max=1.0, step=0.1, description='Transparency:',                                          continuous_update=False, style={'description_width': '120px'},                                          layout=widgets.Layout(width='450px'))points_tab = widgets.VBox([    widgets.HTML("<h4 style='color: #2E86AB;'>Data Points</h4>"),    color_mod_widget, point_color_widget,    widgets.HTML("<hr style='margin: 10px 0;'>"),    widgets.HTML("<b>Bubble Size (by precision):</b>"),    bubble_base_widget, bubble_range_widget, bubble_alpha_widget])# ========== TAB 3: REGRESSION LINE ==========show_ci_widget = widgets.Checkbox(value=True, description='Show 95% Confidence Band', indent=False)line_color_widget = widgets.Dropdown(options=['red', 'blue', 'black', 'green', 'purple'], value='red',                                     description='Line Color:', style={'description_width': '120px'},                                     layout=widgets.Layout(width='450px'))line_width_widget = widgets.FloatSlider(value=2.0, min=0.5, max=5.0, step=0.5, description='Line Width:',                                        continuous_update=False, style={'description_width': '120px'},                                        layout=widgets.Layout(width='450px'))ci_alpha_widget = widgets.FloatSlider(value=0.3, min=0.1, max=0.8, step=0.1, description='CI Transparency:',                                      continuous_update=False, style={'description_width': '120px'},                                      layout=widgets.Layout(width='450px'))show_equation_widget = widgets.Checkbox(value=True, description='Show Regression Equation & P-value', indent=False)show_r2_widget = widgets.Checkbox(value=True, description='Show R¬≤ Value', indent=False)regline_tab = widgets.VBox([    widgets.HTML("<h4 style='color: #2E86AB;'>Regression Line</h4>"),    line_color_widget, line_width_widget, show_ci_widget, ci_alpha_widget,    widgets.HTML("<hr style='margin: 10px 0;'>"),    show_equation_widget, show_r2_widget])# ========== TAB 4: LAYOUT & EXPORT ==========show_grid_widget = widgets.Checkbox(value=True, description='Show Grid', indent=False)show_null_line_widget = widgets.Checkbox(value=True, description='Show Null Effect Line (y=0)', indent=False)legend_loc_widget = widgets.Dropdown(options=['best', 'upper right', 'upper left', 'lower left', 'lower right'],                                     value='best', description='Legend Position:',                                     style={'description_width': '120px'}, layout=widgets.Layout(width='450px'))legend_fontsize_widget = widgets.IntSlider(value=10, min=6, max=14, step=1, description='Legend Font:',                                           continuous_update=False, style={'description_width': '120px'},                                           layout=widgets.Layout(width='450px'))save_pdf_widget = widgets.Checkbox(value=True, description='Save as PDF', indent=False)save_png_widget = widgets.Checkbox(value=True, description='Save as PNG', indent=False)png_dpi_widget = widgets.IntSlider(value=300, min=150, max=600, step=50, description='PNG DPI:',                                   continuous_update=False, style={'description_width': '120px'},                                   layout=widgets.Layout(width='450px'))filename_prefix_widget = widgets.Text(value='MetaRegression_Plot', description='Filename Prefix:',                                      layout=widgets.Layout(width='450px'), style={'description_width': '120px'})transparent_bg_widget = widgets.Checkbox(value=False, description='Transparent Background', indent=False)layout_tab = widgets.VBox([    widgets.HTML("<h4 style='color: #2E86AB;'>Layout & Legend</h4>"),    show_grid_widget, show_null_line_widget, legend_loc_widget, legend_fontsize_widget,    widgets.HTML("<hr style='margin: 10px 0;'>"),    widgets.HTML("<h4 style='color: #2E86AB;'>Export</h4>"),    save_pdf_widget, save_png_widget, png_dpi_widget, filename_prefix_widget, transparent_bg_widget])# ========== TAB 5: LABELS (NEW) ==========label_editor_widgets = []for label in sorted(list(all_categorical_labels)):    text_widget = widgets.Text(        value=str(label),        description=f"{label}:",        layout=widgets.Layout(width='500px'),        style={'description_width': '200px'}    )    label_editor_widgets.append(text_widget)    label_widgets_dict[str(label)] = text_widget # Store widget by its original namelabel_tab = widgets.VBox([    widgets.HTML("<h4 style='color: #2E86AB;'>Edit Plot Labels</h4>"),    widgets.HTML("<p style='color: #666;'><i>Rename raw data values (e.g., 'W') to publication-ready labels (e.g., 'Wheat').</i></p>"),    *label_editor_widgets])# --- Assemble Tabs ---tab = widgets.Tab(children=[style_tab, points_tab, regline_tab, layout_tab, label_tab])tab.set_title(0, 'üé® Style'); tab.set_title(1, '‚ö´ Points'); tab.set_title(2, 'üìà Regression')tab.set_title(3, 'üíæ Layout/Export'); tab.set_title(4, '‚úèÔ∏è Labels')run_plot_button = widgets.Button(    description='üìä Generate Regression Plot',    button_style='success',    layout=widgets.Layout(width='450px', height='50px'),    style={'font_weight': 'bold'})plot_output = widgets.Output()# --- 2. PLOTTING FUNCTION ---@run_plot_button.on_clickdef generate_regression_plot(b):    """Generate meta-regression scatter plot with regression line"""    with plot_output:        clear_output(wait=True)        print("="*70)        print("GENERATING CLUSTER-ROBUST META-REGRESSION PLOT")        print("="*70)        print(f"Timestamp: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")        try:            # --- 1. Load Data & Config ---            print("STEP 1: LOADING RESULTS FROM CELL 10")            print("---------------------------------")            if 'meta_regression_RVE_results' not in ANALYSIS_CONFIG:                raise ValueError("No meta-regression results found. Please re-run Cell 10.")            reg_results = ANALYSIS_CONFIG['meta_regression_RVE_results']            es_config = ANALYSIS_CONFIG['es_config']            plot_data = reg_results['reg_df'].copy()            moderator_col = reg_results['moderator_col_name']            effect_col = reg_results['effect_col']            var_col = ANALYSIS_CONFIG['var_col']            b0, b1 = reg_results['betas']            var_betas_robust = reg_results['var_betas_robust']            R_sq = reg_results['R_squared_adj']            p_slope = reg_results['p_slope']            df_robust = reg_results['df_robust']            print(f"  ‚úì Loaded results for moderator: {moderator_col}")            print(f"  ‚úì Found {len(plot_data)} data points to plot.")            # --- 2. Get Widget Values (*** FIX: ADDED .value TO ALL ***) ---            show_title = show_title_widget.value            graph_title = title_widget.value            x_label = xlabel_widget.value            y_label = ylabel_widget.value            plot_width = width_widget.value            plot_height = height_widget.value            color_mod_name = color_mod_widget.value            point_color = point_color_widget.value            bubble_base = bubble_base_widget.value            bubble_range = bubble_range_widget.value            bubble_alpha = bubble_alpha_widget.value            show_ci = show_ci_widget.value            line_color = line_color_widget.value            line_width = line_width_widget.value            ci_alpha = ci_alpha_widget.value            show_equation = show_equation_widget.value            show_r2 = show_r2_widget.value            show_grid = show_grid_widget.value            show_null_line = show_null_line_widget.value            legend_loc = legend_loc_widget.value            legend_fontsize = legend_fontsize_widget.value            save_pdf = save_pdf_widget.value            save_png = save_png_widget.value            png_dpi = png_dpi_widget.value            filename_prefix = filename_prefix_widget.value            transparent_bg = transparent_bg_widget.value            # *** END FIX ***            print(f"\nüìä Configuration:")            print(f"  Plot size: {plot_width}\\\" √ó {plot_height}\\\"")            print(f"  Color by: {color_mod_name}")            # --- 2b. Build Label Mapping ---            label_mapping = {orig: w.value for orig, w in label_widgets_dict.items()}            # --- 3. Prepare Data for Plotting ---            print("\nSTEP 2: PREPARING PLOT DATA")            print("---------------------------------")            if 'weights' not in plot_data.columns:                tau_sq_overall = ANALYSIS_CONFIG['overall_results']['tau_squared']                plot_data['weights'] = 1 / (plot_data[var_col] + tau_sq_overall)            min_w = plot_data['weights'].min()            max_w = plot_data['weights'].max()            if max_w > min_w:                plot_data['BubbleSize'] = bubble_base + (                    ((plot_data['weights'] - min_w) / (max_w - min_w)) * bubble_range                )            else:                plot_data['BubbleSize'] = bubble_base + bubble_range / 2            print(f"  ‚úì Bubble sizes calculated (Range: {plot_data['BubbleSize'].min():.0f} to {plot_data['BubbleSize'].max():.0f})")            # --- Handle Color Coding (*** FIX: Corrected logic ***) ---            c_values = point_color            cmap = None            norm = None            unique_cats = []            if color_mod_name != 'None':                if color_mod_name in analysis_data_init.columns:                    # Merge color data from the original dataframe based on index                    color_data = analysis_data_init[[color_mod_name]].copy()                    plot_data = plot_data.merge(color_data, left_index=True, right_index=True, how='left',                                                suffixes=('', '_color'))                    # Use the merged column                    color_col_merged = f"{color_mod_name}"                    plot_data[color_col_merged] = plot_data[color_col_merged].fillna('N/A').astype(str).str.strip()                    plot_data['color_codes'], unique_cats = pd.factorize(plot_data[color_col_merged])                    c_values = plot_data['color_codes']                    cmap = 'tab10' # A good categorical colormap                    norm = plt.Normalize(vmin=0, vmax=len(unique_cats)-1)                    print(f"  ‚úì Applying color based on '{color_mod_name}' ({len(unique_cats)} categories)")                else:                    print(f"  ‚ö†Ô∏è  Color moderator '{color_mod_name}' not found, using default.")                    color_mod_name = 'None'            # *** END COLOR FIX ***            # --- 4. Create Figure ---            print("\nSTEP 3: GENERATING PLOT")            print("---------------------------------")            fig, ax = plt.subplots(figsize=(plot_width, plot_height))            if transparent_bg:                fig.patch.set_alpha(0)                ax.patch.set_alpha(0)            # --- Plot Data Points ---            ax.scatter(                x=plot_data[moderator_col],                y=plot_data[effect_col],                s=plot_data['BubbleSize'],                c=c_values,                cmap=cmap,                norm=norm,                alpha=bubble_alpha,                edgecolors='black',                linewidths=0.5,                zorder=3            )            # --- Plot Regression Line & Confidence Band ---            x_min = plot_data[moderator_col].min()            x_max = plot_data[moderator_col].max()            x_range_val = x_max - x_min            x_padding = x_range_val * 0.05 if x_range_val > 0 else 1            x_line = np.linspace(x_min - x_padding, x_max + x_padding, 100)            y_line = b0 + b1 * x_line            ax.plot(x_line, y_line, color=line_color, linewidth=line_width, zorder=2, label="Regression Line")            if show_ci:                X_line_pred = sm.add_constant(x_line, prepend=True)                se_line = np.array([                    np.sqrt(np.array([1, x]) @ var_betas_robust @ np.array([1, x]).T)                    for x in x_line                ])                t_crit = t.ppf(0.975, df=df_robust)                y_ci_upper = y_line + t_crit * se_line                y_ci_lower = y_line - t_crit * se_line                ax.fill_between(x_line, y_ci_lower, y_ci_upper,                                color=line_color, alpha=ci_alpha, zorder=1, label=f"95% CI (Robust, df={df_robust})")                print("  ‚úì Plotted regression line and robust confidence band.")            # --- Customize Axes ---            if show_null_line:                ax.axhline(es_config.get('null_value', 0), color='gray', linestyle='--', linewidth=1.0, zorder=0)            ax.set_xlabel(x_label, fontsize=12, fontweight='bold')            ax.set_ylabel(y_label, fontsize=12, fontweight='bold')            if show_title:                ax.set_title(graph_title, fontsize=14, fontweight='bold', pad=15)            if show_grid:                ax.grid(True, linestyle=':', alpha=0.4, zorder=0)            # --- Add Equation and R¬≤ ---            if show_equation or show_r2:                text_lines = []                if show_equation:                    sign = "+" if b1 >= 0 else ""                    sig_marker = "***" if p_slope < 0.001 else "**" if p_slope < 0.01 else "*" if p_slope < 0.05 else "ns"                    eq_text = f"y = {b0:.3f} {sign} {b1:.3f}x"                    p_text = f"p (slope) = {p_slope:.3g} {sig_marker}"                    text_lines.append(eq_text)                    text_lines.append(p_text)                if show_r2:                    r2_text = f"R¬≤ (adj) ‚âà {R_sq:.1f}%"                    text_lines.append(r2_text)                ax.text(                    0.05, 0.95, "\n".join(text_lines),                    transform=ax.transAxes, fontsize=10, verticalalignment='top',                    bbox=dict(boxstyle='round', facecolor='white', alpha=0.8, edgecolor='gray'),                    zorder=10                )            # --- Create Legend ---            handles, labels = ax.get_legend_handles_labels()            # *** FIX: Use Label Mapping ***            if color_mod_name != 'None':                for i, cat in enumerate(unique_cats):                    display_label = label_mapping.get(cat, cat) # Get new label                    color_val = plt.get_cmap(cmap)(norm(i))                    handles.append(mpatches.Patch(color=color_val, label=display_label, alpha=bubble_alpha, ec='black', lw=0.5))                    labels.append(display_label)            handles.append(plt.scatter([], [], s=bubble_base + bubble_range/2, c='gray' if color_mod_name == 'None' else 'lightgray',                                       alpha=bubble_alpha, ec='black', lw=0.5))            labels.append("Weight (1 / (v·µ¢ + œÑ¬≤))")            display_legend_title = label_mapping.get(color_mod_name, color_mod_name)            ax.legend(handles=handles, labels=labels, loc=legend_loc,                      fontsize=legend_fontsize, framealpha=0.9,                      title=display_legend_title if color_mod_name != 'None' else None)            # *** END FIX ***            fig.tight_layout()            plt.show()            # --- 5. Save Files ---            print(f"\nSTEP 4: SAVING FILES")            print("---------------------------------")            timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")            base_filename = f"{filename_prefix}_{moderator_col.replace(' ','_')}_{timestamp}"            saved_files = []            if save_pdf:                pdf_filename = f"{base_filename}.pdf"                fig.savefig(pdf_filename, bbox_inches='tight', transparent=transparent_bg)                saved_files.append(pdf_filename)                print(f"  ‚úì {pdf_filename}")            if save_png:                png_filename = f"{base_filename}.png"                fig.savefig(png_filename, dpi=png_dpi, bbox_inches='tight', transparent=transparent_bg)                saved_files.append(png_filename)                print(f"  ‚úì {png_filename} (DPI: {png_dpi})")            print(f"\n" + "="*70)            print("‚úÖ PLOT GENERATION COMPLETE")            print("="*70)        except Exception as e:            print(f"\n‚ùå AN ERROR OCCURRED:\n")            print(f"  Type: {type(e).__name__}")            print(f"  Message: {e}")            print("\n  Traceback:")            traceback.print_exc(file=sys.stdout)            print("\n" + "="*70)            print("ANALYSIS FAILED. See error message above.")            print("Please check your data and configuration.")            print("="*70)# --- 6. DISPLAY WIDGETS ---try:    if 'ANALYSIS_CONFIG' not in globals() or 'meta_regression_RVE_results' not in ANALYSIS_CONFIG:        print("="*70)        print("‚ö†Ô∏è  PREREQUISITE NOT MET")        print("="*70)        print("Please run Cell 10 (Meta-Regression) successfully before running this cell.")    else:        print("="*70)        print("‚úÖ ROBUST META-REGRESSION PLOTTER READY")        print("="*70)        print("  ‚úì Results from Cell 10 are loaded.")        print("  ‚úì Customize your plot using the tabs below and click 'Generate'.")        # Hook up widget events        def on_color_mod_change(change):            point_color_widget.layout.display = 'none' if change['new'] != 'None' else 'flex'        color_mod_widget.observe(on_color_mod_change, names='value')        display(widgets.VBox([            header,            widgets.HTML("<hr style='margin: 15px 0;'>"),            widgets.HTML("<b>Plot Options:</b>"),            tab,            widgets.HTML("<hr style='margin: 15px 0;'>"),            run_plot_button,            plot_output        ]))except Exception as e:    print(f"‚ùå An error occurred during initialization: {e}")    print("Please ensure the notebook has been run in order.")

In [None]:
# CLUSTER-ROBUST VARIANCE ESTIMATION## Method from Hedges, Tipton, & Johnson (2010). Robust variance estimation in# meta-regression with dependent effect size estimates. Research Synthesis Methods, 1(1), 39-65.## This approach:# 1. Allows for dependence between effect sizes from the same cluster (study)# 2. Provides valid standard errors even with small number of clusters# 3. Uses t-distribution with degrees of freedom correction## Particularly important when:# - Multiple effect sizes are extracted from the same study# - Effect sizes are nested within higher-level units# - Assumption of independence is violated#@title üåä NATURAL CUBIC SPLINE ANALYSIS (CLUSTER-ROBUST)#@title üåä NATURAL CUBIC SPLINE ANALYSIS (CLUSTER-ROBUST)# =============================================================================# NATURAL CUBIC SPLINE META-REGRESSION (ANALYSIS)# Purpose: Test for non-linear relationships using splines# Method: WLS + cluster-robust SE# Dependencies: Cell 6 (for tau¬≤)# Outputs: 'spline_model_results' in ANALYSIS_CONFIG# =============================================================================# Import patsy for spline basis generationtry:    PATSY_AVAILABLE = Trueexcept ImportError:    PATSY_AVAILABLE = False    print("‚ö†Ô∏è  WARNING: patsy not installed. Install with: !pip install patsy")# --- 1. CORE FUNCTIONS ---def run_cluster_robust_spline(reg_df, moderator_col, effect_col, var_col,                               cluster_col, tau_squared, df_spline):    """    Runs spline meta-regression with cluster-robust standard errors.    """    # Standardize moderator for numerical stability    mod_mean = reg_df[moderator_col].mean()    mod_std = reg_df[moderator_col].std()    if mod_std == 0 or np.isnan(mod_std):        raise ValueError(f"Moderator has zero variance")    moderator_col_std = f"{moderator_col}_std"    reg_df[moderator_col_std] = (reg_df[moderator_col] - mod_mean) / mod_std    # Generate natural cubic spline basis    spline_formula = f"cr({moderator_col_std}, df={df_spline}) - 1"    try:        X_spline = patsy.dmatrix(spline_formula, data=reg_df, return_type='dataframe')    except Exception as e:        raise ValueError(f"Failed to create spline basis: {e}")    # Add intercept manually    X_full = sm.add_constant(X_spline, prepend=True, has_constant='add')    # Response and weights    y = reg_df[effect_col]    weights = 1.0 / (reg_df[var_col] + tau_squared)    # Fit WLS model    wls_model = sm.WLS(y, X_full, weights=weights).fit()    # Get cluster-robust covariance    robust_cov = wls_model.get_robustcov_results(        cov_type='cluster',        groups=reg_df[cluster_col]    )    M_studies = reg_df[cluster_col].nunique()    k_obs = len(reg_df)    p_params = X_full.shape[1]    df_resid = M_studies - p_params    if df_resid < 1:        print(f"  ‚ö†Ô∏è  WARNING: Only {M_studies} clusters for {p_params} parameters")        df_resid = 1    # Extract results    betas = wls_model.params    se_robust = robust_cov.bse    t_stats = betas / se_robust    p_values = 2 * (1 - t.cdf(np.abs(t_stats), df=df_resid))    ci_lower = betas - t.ppf(0.975, df=df_resid) * se_robust    ci_upper = betas + t.ppf(0.975, df=df_resid) * se_robust    # F-test for overall spline effect (test all except intercept)    R = np.eye(p_params)[1:, :]    try:        f_test = wls_model.f_test(R)        f_stat = f_test.fvalue[0][0]        f_pvalue = f_test.pvalue    except:        f_stat = np.nan        f_pvalue = np.nan    # Generate predictions for plotting    x_min = reg_df[moderator_col].min()    x_max = reg_df[moderator_col].max()    x_pred_orig = np.linspace(x_min, x_max, 100)    x_pred_std = (x_pred_orig - mod_mean) / mod_std    pred_data = pd.DataFrame({moderator_col_std: x_pred_std})    X_pred_spline = patsy.dmatrix(spline_formula, data=pred_data, return_type='dataframe')    X_pred_full = sm.add_constant(X_pred_spline, prepend=True, has_constant='add')    # Convert to arrays    X_pred_arr = np.array(X_pred_full)    betas_arr = np.array(betas)    var_betas_arr = np.array(robust_cov.cov_params())    # Predictions and CI    y_pred = X_pred_arr @ betas_arr    var_pred = np.sum((X_pred_arr @ var_betas_arr) * X_pred_arr, axis=1)    se_pred = np.sqrt(var_pred)    t_crit = t.ppf(0.975, df_resid)    ci_lower_pred = y_pred - t_crit * se_pred    ci_upper_pred = y_pred + t_crit * se_pred    results = {        'betas': betas,        'se_robust': se_robust,        'var_betas_robust': robust_cov.cov_params(),        't_stats': t_stats,        'p_values': p_values,        'ci_lower': ci_lower,        'ci_upper': ci_upper,        'k_obs': k_obs,        'M_studies': M_studies,        'df_resid': df_resid,        'p_params': p_params,        'f_stat': f_stat,        'f_pvalue': f_pvalue,        'X_full': X_full,        'spline_formula': spline_formula,        'mod_mean': mod_mean,        'mod_std': mod_std,        'moderator_col_std': moderator_col_std,        'reg_df': reg_df,        'predictions': {            'x_orig': x_pred_orig,            'y_pred': y_pred,            'ci_lower': ci_lower_pred,            'ci_upper': ci_upper_pred        }    }    return results# --- 2. WIDGET DEFINITIONS ---potential_moderators = []analysis_data_init = Nonetry:    if 'ANALYSIS_CONFIG' not in globals():        raise NameError("ANALYSIS_CONFIG not found")    if 'analysis_data' in globals():        analysis_data_init = analysis_data.copy()    elif 'data_filtered' in globals():        analysis_data_init = data_filtered.copy()    else:        raise ValueError("No data found")    excluded_cols = [        ANALYSIS_CONFIG.get('effect_col'), ANALYSIS_CONFIG.get('var_col'),        ANALYSIS_CONFIG.get('se_col'), 'w_fixed', 'w_random', 'id',        'xe', 'sde', 'ne', 'xc', 'sdc', 'nc',        ANALYSIS_CONFIG.get('ci_lower_col'), ANALYSIS_CONFIG.get('ci_upper_col')    ]    excluded_cols = [col for col in excluded_cols if col is not None]    for col in analysis_data_init.columns:        if col not in excluded_cols:            try:                temp_numeric = pd.to_numeric(analysis_data_init[col], errors='coerce')                n_valid = temp_numeric.notna().sum()                n_unique = temp_numeric.nunique()                if n_valid >= 5 and n_unique >= 4:                    potential_moderators.append(col)            except:                passexcept Exception as e:    print(f"‚ö†Ô∏è  Initialization Error: {e}")# Widget Interfaceheader = widgets.HTML(    "<h3 style='color: #2E86AB;'>üåä Natural Cubic Spline Analysis</h3>"    "<p style='color: #666;'><i>Test for non-linear relationships using cluster-robust splines</i></p>")moderator_widget = widgets.Dropdown(    options=potential_moderators if potential_moderators else ['No moderators available'],    value=potential_moderators[0] if potential_moderators else 'No moderators available',    description='Moderator:',    style={'description_width': '120px'},    layout=widgets.Layout(width='450px'),    disabled=not bool(potential_moderators) or not PATSY_AVAILABLE)df_slider = widgets.IntSlider(    value=3,    min=3,    max=6,    step=1,    description='Degrees of Freedom:',    style={'description_width': '150px'},    layout=widgets.Layout(width='450px'),    disabled=not bool(potential_moderators) or not PATSY_AVAILABLE)df_info = widgets.HTML(    "<p style='color: #666; font-size: 12px;'>"    "‚Ä¢ df=3: Simple curve (1 knot) | ‚Ä¢ df=4: Moderate (2 knots) | ‚Ä¢ df=5-6: More complex"    "</p>")run_button = widgets.Button(    description='‚ñ∂ Run Spline Analysis',    button_style='success',    layout=widgets.Layout(width='450px', height='50px'),    style={'font_weight': 'bold'},    disabled=not bool(potential_moderators) or not PATSY_AVAILABLE)analysis_output = widgets.Output()# --- 3. MAIN ANALYSIS FUNCTION ---@run_button.on_clickdef run_spline_analysis(b):    with analysis_output:        clear_output(wait=True)        print("="*70)        print("NATURAL CUBIC SPLINE ANALYSIS")        print("="*70)        print(f"Timestamp: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")        try:            # --- 1. Load Config ---            print("STEP 1: LOADING CONFIGURATION")            print("-" * 70)            if 'ANALYSIS_CONFIG' not in globals():                raise NameError("ANALYSIS_CONFIG not found")            local_data_copy = None            if 'analysis_data' in globals():                local_data_copy = analysis_data.copy()            elif 'data_filtered' in globals():                local_data_copy = data_filtered.copy()            else:                raise ValueError("Data not found")            effect_col = ANALYSIS_CONFIG['effect_col']            var_col = ANALYSIS_CONFIG['var_col']            es_config = ANALYSIS_CONFIG['es_config']            overall_results = ANALYSIS_CONFIG['overall_results']            tau_sq = overall_results.get('tau_squared')            if tau_sq is None:                raise ValueError("tau_squared not found. Run Cell 6 first.")            moderator_col = moderator_widget.value            df_spline = df_slider.value            if moderator_col == 'No moderators available':                raise ValueError("No valid moderator selected")            # Clean moderator name            moderator_col = moderator_col.replace("_std", "")            print(f"  ‚úì Effect: {es_config['effect_label']} ({effect_col})")            print(f"  ‚úì Moderator: {moderator_col}")            print(f"  ‚úì Spline df: {df_spline}")            print(f"  ‚úì Using œÑ¬≤ from Cell 6: {tau_sq:.4f}")            # --- 2. Prepare Data ---            print("\nSTEP 2: PREPARING DATA")            print("-" * 70)            reg_df = local_data_copy.copy()            reg_df[moderator_col] = pd.to_numeric(reg_df[moderator_col], errors='coerce')            initial_n = len(reg_df)            reg_df.dropna(subset=[effect_col, var_col, 'id', moderator_col], inplace=True)            reg_df = reg_df[reg_df[var_col] > 0]            k_reg = len(reg_df)            M_studies = reg_df['id'].nunique()            n_dropped = initial_n - k_reg            if n_dropped > 0:                print(f"  ‚úì Dropped {n_dropped} observations with missing/invalid data")            print(f"  ‚úì Using {k_reg} observations from {M_studies} studies")            if k_reg < df_spline + 3:                raise ValueError(f"Not enough data (k={k_reg}) for df={df_spline}. Need at least {df_spline + 3}.")            if M_studies < df_spline + 2:                print(f"  ‚ö†Ô∏è  WARNING: Only {M_studies} studies for {df_spline+1} spline parameters")                print(f"     Results may be unstable. Consider reducing df or using linear model.")            # --- 3. Run Spline Model ---            print("\nSTEP 3: FITTING SPLINE MODEL")            print("-" * 70)            print("  Estimating parameters...")            results = run_cluster_robust_spline(                reg_df, moderator_col, effect_col, var_col,                'id', tau_sq, df_spline            )            print(f"  ‚úì Model fitted successfully")            print(f"  ‚úì Parameters: {results['p_params']} ({results['p_params']-1} spline basis + 1 intercept)")            # --- 4. Display Results ---            print("\n" + "="*70)            print("SPLINE MODEL RESULTS")            print("="*70)            print(f"\nüìä Model Summary:")            print(f"  ‚Ä¢ Observations: {results['k_obs']}")            print(f"  ‚Ä¢ Studies (clusters): {results['M_studies']}")            print(f"  ‚Ä¢ Degrees of freedom: {results['df_resid']}")            print(f"  ‚Ä¢ Formula: {results['spline_formula']}")            if not np.isnan(results['f_stat']):                print(f"\nüî¨ Overall Non-linearity Test:")                print(f"  ‚Ä¢ F-statistic: {results['f_stat']:.3f}")                print(f"  ‚Ä¢ P-value: {results['f_pvalue']:.4g}")                if results['f_pvalue'] < 0.05:                    sig_marker = "***" if results['f_pvalue'] < 0.001 else "**" if results['f_pvalue'] < 0.01 else "*"                    print(f"  ‚úì Significant non-linear relationship {sig_marker}")                else:                    print(f"  ‚Ä¢ No significant non-linearity detected (p ‚â• 0.05)")            print(f"\nüìà Coefficient Table (Cluster-Robust SE):")            print(f"  {'Parameter':<25} {'Estimate':>10} {'Robust SE':>10} {'t-stat':>8} {'p-value':>10}")            print(f"  {'-'*25} {'-'*10} {'-'*10} {'-'*8} {'-'*10}")            # Convert to arrays            betas_arr = np.array(results['betas'])            se_arr = np.array(results['se_robust'])            t_arr = np.array(results['t_stats'])            p_arr = np.array(results['p_values'])            # Get parameter names            if hasattr(results['betas'], 'index'):                names = list(results['betas'].index)            else:                names = ['Intercept'] + [f'Spline_{i}' for i in range(1, len(betas_arr))]            for i in range(len(betas_arr)):                sig = "***" if p_arr[i] < 0.001 else "**" if p_arr[i] < 0.01 else "*" if p_arr[i] < 0.05 else ""                name_display = names[i] if len(names[i]) < 25 else names[i][:22]+"..."                print(f"  {name_display:<25} {betas_arr[i]:>10.4f} {se_arr[i]:>10.4f} {t_arr[i]:>8.3f} {p_arr[i]:>10.4g} {sig}")            print(f"\n  Significance: *** p<0.001, ** p<0.01, * p<0.05")            # --- 5. Interpretation ---            print("\n" + "="*70)            print("INTERPRETATION")            print("="*70)            # Check if F-test was successful            if not np.isnan(results['f_pvalue']):                if results['f_pvalue'] < 0.05:                    print(f"\n‚úì SIGNIFICANT NON-LINEAR RELATIONSHIP DETECTED")                    print(f"\n  The spline analysis reveals that the relationship between")                    print(f"  {moderator_col} and {effect_col} is non-linear (p={results['f_pvalue']:.4g}).")                    print(f"\n  What this means:")                    print(f"  ‚Ä¢ The effect of {moderator_col} changes depending on its value")                    print(f"  ‚Ä¢ A simple linear model would miss important patterns")                    print(f"  ‚Ä¢ There may be optimal ranges, thresholds, or saturation effects")                    print(f"\n  Next steps:")                    print(f"  ‚Üí Run the next cell to visualize the non-linear pattern")                    print(f"  ‚Üí Look for turning points, plateaus, or U-shaped curves")                else:                    print(f"\n‚Ä¢ NO SIGNIFICANT NON-LINEARITY DETECTED (p={results['f_pvalue']:.3f})")                    print(f"\n  The relationship between {moderator_col} and {effect_col}")                    print(f"  appears to be approximately linear, or there is insufficient")                    print(f"  data to detect non-linearity.")                    print(f"\n  Possible reasons:")                    print(f"  ‚Ä¢ The relationship is truly linear")                    print(f"  ‚Ä¢ Sample size is too small to detect non-linearity")                    print(f"  ‚Ä¢ Limited range of moderator values")                    print(f"\n  Recommendation:")                    print(f"  ‚Üí Consider using the simpler linear meta-regression (Cell 10)")                    print(f"  ‚Üí Or visualize anyway to check the pattern")            else:                # F-test failed or wasn't computed                print(f"\n‚ö†Ô∏è  F-TEST NOT AVAILABLE")                print(f"\n  The overall F-test for non-linearity could not be computed.")                print(f"  This can happen due to:")                print(f"  ‚Ä¢ Numerical issues in the model")                print(f"  ‚Ä¢ Perfect multicollinearity in spline basis")                print(f"  ‚Ä¢ Insufficient degrees of freedom")                print(f"\n  üí° ALTERNATIVE INTERPRETATION:")                print(f"\n  Look at individual spline coefficients:")                # Show which spline terms are significant                spline_indices = [i for i, name in enumerate(names) if 'cr(' in str(name)]                n_sig = sum([p_arr[i] < 0.05 for i in spline_indices])                if n_sig > 0:                    print(f"  ‚Ä¢ {n_sig} out of {len(spline_indices)} spline terms are significant (p<0.05)")                    print(f"  ‚Ä¢ This suggests SOME non-linearity may be present")                    print(f"\n  Significant terms:")                    for i in spline_indices:                        if p_arr[i] < 0.05:                            sig_marker = "***" if p_arr[i] < 0.001 else "**" if p_arr[i] < 0.01 else "*"                            print(f"    - {names[i]}: p={p_arr[i]:.4g} {sig_marker}")                else:                    print(f"  ‚Ä¢ NONE of the {len(spline_indices)} spline terms are significant")                    print(f"  ‚Ä¢ This suggests the relationship is likely LINEAR")                print(f"\n  Next steps:")                print(f"  ‚Üí Visualize the curve (next cell) to inspect the pattern")                print(f"  ‚Üí Compare with linear meta-regression (Cell 10)")                print(f"  ‚Üí If curve looks flat, use linear model instead")            # --- 6. Save Results ---            print("\n" + "="*70)            print("SAVING RESULTS")            print("="*70)            results_dict = {                'timestamp': datetime.datetime.now(),                'status': 'completed',                'model_type': 'Spline (Cluster-Robust)',                'moderator_col': moderator_col,                'effect_col': effect_col,                'df_spline': df_spline,                'f_stat': results['f_stat'],                'f_pvalue': results['f_pvalue'],                'betas': results['betas'],                'se_robust': results['se_robust'],                'var_betas_robust': results['var_betas_robust'],                'k_obs': results['k_obs'],                'M_studies': results['M_studies'],                'df_resid': results['df_resid'],                'predictions': results['predictions'],                'reg_df': results['reg_df'],                'mod_mean': results['mod_mean'],                'mod_std': results['mod_std']            }            ANALYSIS_CONFIG['spline_model_results'] = results_dict            print("  ‚úì Results saved to ANALYSIS_CONFIG['spline_model_results']")            print("\n" + "="*70)            print("‚úÖ SPLINE ANALYSIS COMPLETE")            print("="*70)            print("\n  üìä Ready for visualization!")            print("  ‚Üí Run the next cell to create customizable spline plot")        except Exception as e:            print(f"\n‚ùå ERROR OCCURRED:\n")            print(f"  Type: {type(e).__name__}")            print(f"  Message: {e}")            print("\n  Traceback:")            traceback.print_exc()            print("\n" + "="*70)            print("ANALYSIS FAILED")            print("="*70)# --- 4. DISPLAY WIDGETS ---try:    if not PATSY_AVAILABLE:        print("="*70)        print("‚ö†Ô∏è  PATSY NOT INSTALLED")        print("="*70)        print("  The 'patsy' package is required for spline analysis.")        print("  Install with: !pip install patsy")    elif 'ANALYSIS_CONFIG' not in globals() or 'overall_results' not in ANALYSIS_CONFIG:        print("="*70)        print("‚ö†Ô∏è  PREREQUISITE NOT MET")        print("="*70)        print("  Please run Cell 6 (Overall Meta-Analysis) before running this cell.")    elif not bool(potential_moderators):        print("="*70)        print("‚ö†Ô∏è  NO MODERATORS FOUND")        print("="*70)        print("  No suitable continuous (numeric) moderators were found in your data.")        print("  Spline analysis requires at least 5 observations with 4+ unique values.")    else:        print("="*70)        print("‚úÖ SPLINE ANALYSIS INTERFACE READY")        print("="*70)        print("  ‚úì Select a continuous moderator to test for non-linearity")        print("  ‚úì Choose degrees of freedom (complexity of curve)")        print("  ‚úì Click 'Run' to perform the analysis")        display(widgets.VBox([            header,            widgets.HTML("<hr style='margin: 15px 0;'>"),            moderator_widget,            df_slider,            df_info,            widgets.HTML("<hr style='margin: 15px 0;'>"),            run_button,            analysis_output        ]))except Exception as e:    print(f"‚ùå Initialization error: {e}")    print("Please ensure the notebook has been run in order.")

In [None]:
# CLUSTER-ROBUST VARIANCE ESTIMATION## Method from Hedges, Tipton, & Johnson (2010). Robust variance estimation in# meta-regression with dependent effect size estimates. Research Synthesis Methods, 1(1), 39-65.## This approach:# 1. Allows for dependence between effect sizes from the same cluster (study)# 2. Provides valid standard errors even with small number of clusters# 3. Uses t-distribution with degrees of freedom correction## Particularly important when:# - Multiple effect sizes are extracted from the same study# - Effect sizes are nested within higher-level units# - Assumption of independence is violated#@title üåä SPLINE PLOT (CLUSTER-ROBUST)#@title üåä SPLINE PLOT (CLUSTER-ROBUST)# =============================================================================# SPLINE VISUALIZATION CELL# Purpose: Visualize the spline meta-regression results# Method: Creates customizable plot with spline curve and confidence bands# Dependencies: Previous cell (spline_model_results)# Outputs: Publication-ready plot (PDF/PNG)# =============================================================================# --- 1. WIDGET DEFINITIONS ---available_color_moderators = ['None']analysis_data_init = Nonedefault_x_label = "Moderator"default_y_label = "Effect Size"default_title = "Natural Cubic Spline Analysis"label_widgets_dict = {}try:    if 'ANALYSIS_CONFIG' not in globals():        raise NameError("ANALYSIS_CONFIG not found")    if 'analysis_data' in globals():        analysis_data_init = analysis_data.copy()    elif 'data_filtered' in globals():        analysis_data_init = data_filtered.copy()    else:        raise ValueError("No data found")    if 'spline_model_results' in ANALYSIS_CONFIG:        spline_results = ANALYSIS_CONFIG['spline_model_results']        es_config = ANALYSIS_CONFIG['es_config']        default_x_label = spline_results['moderator_col']        default_y_label = es_config['effect_label']        default_title = f"Spline Analysis: {default_y_label} vs. {default_x_label}"    # Find categorical moderators for color    excluded_cols = [        ANALYSIS_CONFIG.get('effect_col'), ANALYSIS_CONFIG.get('var_col'),        ANALYSIS_CONFIG.get('se_col'), 'w_fixed', 'w_random', 'id',        'xe', 'sde', 'ne', 'xc', 'sdc', 'nc',        ANALYSIS_CONFIG.get('ci_lower_col'), ANALYSIS_CONFIG.get('ci_upper_col')    ]    excluded_cols = [col for col in excluded_cols if col is not None]    categorical_cols = analysis_data_init.select_dtypes(include=['object', 'category']).columns    available_color_moderators.extend([        col for col in categorical_cols        if col not in excluded_cols and analysis_data_init[col].nunique() <= 10    ])    # Find all unique labels for Label Editor    all_categorical_labels = set()    for col in available_color_moderators:        if col != 'None' and col in analysis_data_init.columns:            all_categorical_labels.add(col)            all_categorical_labels.update(analysis_data_init[col].astype(str).str.strip().unique())    all_categorical_labels.discard('')    all_categorical_labels.discard('nan')except Exception as e:    print(f"‚ö†Ô∏è  Initialization Error: {e}")# --- Widget Interface ---header = widgets.HTML(    "<h3 style='color: #2E86AB;'>üåä Spline Plot Setup</h3>"    "<p style='color: #666;'><i>Visualize the non-linear relationship between moderator and effect size</i></p>")# ========== TAB 1: PLOT STYLE ==========show_title_widget = widgets.Checkbox(value=True, description='Show Plot Title', indent=False)title_widget = widgets.Text(value=default_title, description='Plot Title:',                            layout=widgets.Layout(width='450px'), style={'description_width': '120px'})xlabel_widget = widgets.Text(value=default_x_label, description='X-Axis Label:',                             layout=widgets.Layout(width='450px'), style={'description_width': '120px'})ylabel_widget = widgets.Text(value=default_y_label, description='Y-Axis Label:',                             layout=widgets.Layout(width='450px'), style={'description_width': '120px'})width_widget = widgets.FloatSlider(value=10.0, min=5.0, max=14.0, step=0.5, description='Plot Width (in):',                                   continuous_update=False, style={'description_width': '120px'},                                   layout=widgets.Layout(width='450px'))height_widget = widgets.FloatSlider(value=6.0, min=4.0, max=12.0, step=0.5, description='Plot Height (in):',                                    continuous_update=False, style={'description_width': '120px'},                                    layout=widgets.Layout(width='450px'))style_tab = widgets.VBox([    widgets.HTML("<h4 style='color: #2E86AB;'>Labels & Size</h4>"),    show_title_widget, title_widget, xlabel_widget, ylabel_widget, width_widget, height_widget])# ========== TAB 2: DATA POINTS ==========color_mod_widget = widgets.Dropdown(options=available_color_moderators, value='None', description='Color By:',                                    style={'description_width': '120px'}, layout=widgets.Layout(width='450px'))point_color_widget = widgets.Dropdown(options=['steelblue', 'gray', 'blue', 'red', 'green', 'purple', 'orange'],                                      value='steelblue', description='Point Color:',                                      style={'description_width': '120px'}, layout=widgets.Layout(width='450px'))point_size_widget = widgets.IntSlider(value=50, min=10, max=200, step=10, description='Point Size:',                                      continuous_update=False, style={'description_width': '120px'},                                      layout=widgets.Layout(width='450px'))point_alpha_widget = widgets.FloatSlider(value=0.5, min=0.1, max=1.0, step=0.1, description='Transparency:',                                         continuous_update=False, style={'description_width': '120px'},                                         layout=widgets.Layout(width='450px'))show_points_widget = widgets.Checkbox(value=True, description='Show Data Points', indent=False)points_tab = widgets.VBox([    widgets.HTML("<h4 style='color: #2E86AB;'>Data Points</h4>"),    show_points_widget,    color_mod_widget, point_color_widget, point_size_widget, point_alpha_widget])# ========== TAB 3: SPLINE CURVE ==========show_ci_widget = widgets.Checkbox(value=True, description='Show 95% Confidence Band', indent=False)curve_color_widget = widgets.Dropdown(options=['darkred', 'red', 'blue', 'black', 'green', 'purple'],                                      value='darkred', description='Curve Color:',                                      style={'description_width': '120px'}, layout=widgets.Layout(width='450px'))curve_width_widget = widgets.FloatSlider(value=2.5, min=0.5, max=5.0, step=0.5, description='Curve Width:',                                         continuous_update=False, style={'description_width': '120px'},                                         layout=widgets.Layout(width='450px'))ci_alpha_widget = widgets.FloatSlider(value=0.2, min=0.1, max=0.8, step=0.1, description='CI Transparency:',                                      continuous_update=False, style={'description_width': '120px'},                                      layout=widgets.Layout(width='450px'))show_f_test_widget = widgets.Checkbox(value=True, description='Show F-test Result', indent=False)curve_tab = widgets.VBox([    widgets.HTML("<h4 style='color: #2E86AB;'>Spline Curve</h4>"),    curve_color_widget, curve_width_widget, show_ci_widget, ci_alpha_widget,    widgets.HTML("<hr style='margin: 10px 0;'>"),    show_f_test_widget])# ========== TAB 4: LAYOUT & EXPORT ==========show_grid_widget = widgets.Checkbox(value=True, description='Show Grid', indent=False)show_null_line_widget = widgets.Checkbox(value=True, description='Show Null Effect Line (y=0)', indent=False)legend_loc_widget = widgets.Dropdown(options=['best', 'upper right', 'upper left', 'lower left', 'lower right'],                                     value='best', description='Legend Position:',                                     style={'description_width': '120px'}, layout=widgets.Layout(width='450px'))legend_fontsize_widget = widgets.IntSlider(value=10, min=6, max=14, step=1, description='Legend Font:',                                           continuous_update=False, style={'description_width': '120px'},                                           layout=widgets.Layout(width='450px'))save_pdf_widget = widgets.Checkbox(value=True, description='Save as PDF', indent=False)save_png_widget = widgets.Checkbox(value=True, description='Save as PNG', indent=False)png_dpi_widget = widgets.IntSlider(value=300, min=150, max=600, step=50, description='PNG DPI:',                                   continuous_update=False, style={'description_width': '120px'},                                   layout=widgets.Layout(width='450px'))filename_prefix_widget = widgets.Text(value='Spline_Plot', description='Filename Prefix:',                                      layout=widgets.Layout(width='450px'), style={'description_width': '120px'})transparent_bg_widget = widgets.Checkbox(value=False, description='Transparent Background', indent=False)layout_tab = widgets.VBox([    widgets.HTML("<h4 style='color: #2E86AB;'>Layout & Legend</h4>"),    show_grid_widget, show_null_line_widget, legend_loc_widget, legend_fontsize_widget,    widgets.HTML("<hr style='margin: 10px 0;'>"),    widgets.HTML("<h4 style='color: #2E86AB;'>Export</h4>"),    save_pdf_widget, save_png_widget, png_dpi_widget, filename_prefix_widget, transparent_bg_widget])# ========== TAB 5: LABELS ==========label_editor_widgets = []for label in sorted(list(all_categorical_labels)):    text_widget = widgets.Text(        value=str(label),        description=f"{label}:",        layout=widgets.Layout(width='500px'),        style={'description_width': '200px'}    )    label_editor_widgets.append(text_widget)    label_widgets_dict[str(label)] = text_widgetlabel_tab = widgets.VBox([    widgets.HTML("<h4 style='color: #2E86AB;'>Edit Plot Labels</h4>"),    widgets.HTML("<p style='color: #666;'><i>Rename raw data values to publication-ready labels.</i></p>"),    *label_editor_widgets])# --- Assemble Tabs ---tab = widgets.Tab(children=[style_tab, points_tab, curve_tab, layout_tab, label_tab])tab.set_title(0, 'üé® Style')tab.set_title(1, '‚ö´ Points')tab.set_title(2, 'üåä Spline')tab.set_title(3, 'üíæ Layout/Export')tab.set_title(4, '‚úèÔ∏è Labels')run_plot_button = widgets.Button(    description='üìä Generate Spline Plot',    button_style='success',    layout=widgets.Layout(width='450px', height='50px'),    style={'font_weight': 'bold'})plot_output = widgets.Output()# --- 2. PLOTTING FUNCTION ---@run_plot_button.on_clickdef generate_spline_plot(b):    """Generate spline plot with customizations"""    with plot_output:        clear_output(wait=True)        print("="*70)        print("GENERATING SPLINE PLOT")        print("="*70)        print(f"Timestamp: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")        try:            # --- 1. Load Data & Config ---            print("STEP 1: LOADING RESULTS")            print("---------------------------------")            if 'spline_model_results' not in ANALYSIS_CONFIG:                raise ValueError("No spline results found. Please run the spline analysis cell first.")            spline_results = ANALYSIS_CONFIG['spline_model_results']            es_config = ANALYSIS_CONFIG['es_config']            plot_data = spline_results['reg_df'].copy()            moderator_col = spline_results['moderator_col']            effect_col = spline_results['effect_col']            predictions = spline_results['predictions']            x_pred = predictions['x_orig']            y_pred = predictions['y_pred']            ci_lower = predictions['ci_lower']            ci_upper = predictions['ci_upper']            f_stat = spline_results['f_stat']            f_pvalue = spline_results['f_pvalue']            df_spline = spline_results['df_spline']            print(f"  ‚úì Loaded results for moderator: {moderator_col}")            print(f"  ‚úì Found {len(plot_data)} data points")            print(f"  ‚úì Spline df: {df_spline}")            # --- 2. Get Widget Values ---            show_title = show_title_widget.value            graph_title = title_widget.value            x_label = xlabel_widget.value            y_label = ylabel_widget.value            plot_width = width_widget.value            plot_height = height_widget.value            color_mod_name = color_mod_widget.value            point_color = point_color_widget.value            point_size = point_size_widget.value            point_alpha = point_alpha_widget.value            show_points = show_points_widget.value            show_ci = show_ci_widget.value            curve_color = curve_color_widget.value            curve_width = curve_width_widget.value            ci_alpha = ci_alpha_widget.value            show_f_test = show_f_test_widget.value            show_grid = show_grid_widget.value            show_null_line = show_null_line_widget.value            legend_loc = legend_loc_widget.value            legend_fontsize = legend_fontsize_widget.value            save_pdf = save_pdf_widget.value            save_png = save_png_widget.value            png_dpi = png_dpi_widget.value            filename_prefix = filename_prefix_widget.value            transparent_bg = transparent_bg_widget.value            print(f"\nüìä Configuration:")            print(f"  Plot size: {plot_width}\" √ó {plot_height}\"")            print(f"  Color by: {color_mod_name}")            # --- 3. Build Label Mapping ---            label_mapping = {orig: w.value for orig, w in label_widgets_dict.items()}            # --- 4. Prepare Data for Plotting ---            print("\nSTEP 2: PREPARING PLOT DATA")            print("---------------------------------")            # Handle Color Coding            c_values = point_color            cmap = None            norm = None            unique_cats = []            if color_mod_name != 'None':                if color_mod_name in analysis_data_init.columns:                    color_data = analysis_data_init[[color_mod_name]].copy()                    plot_data = plot_data.merge(color_data, left_index=True, right_index=True,                                               how='left', suffixes=('', '_color'))                    plot_data[color_mod_name] = plot_data[color_mod_name].fillna('N/A').astype(str).str.strip()                    plot_data['color_codes'], unique_cats = pd.factorize(plot_data[color_mod_name])                    c_values = plot_data['color_codes']                    cmap = 'tab10'                    norm = plt.Normalize(vmin=0, vmax=len(unique_cats)-1)                    print(f"  ‚úì Applying color based on '{color_mod_name}' ({len(unique_cats)} categories)")                else:                    print(f"  ‚ö†Ô∏è  Color moderator '{color_mod_name}' not found, using default.")                    color_mod_name = 'None'            # --- 5. Create Figure ---            print("\nSTEP 3: GENERATING PLOT")            print("---------------------------------")            fig, ax = plt.subplots(figsize=(plot_width, plot_height))            if transparent_bg:                fig.patch.set_alpha(0)                ax.patch.set_alpha(0)            # --- Plot Spline Curve ---            ax.plot(x_pred, y_pred, color=curve_color, linewidth=curve_width,                   zorder=2, label='Spline Fit')            # --- Plot Confidence Band ---            if show_ci:                ax.fill_between(x_pred, ci_lower, ci_upper,                               color=curve_color, alpha=ci_alpha, zorder=1,                               label='95% CI')                print("  ‚úì Plotted spline curve and confidence band")            # --- Plot Data Points ---            if show_points:                ax.scatter(                    x=plot_data[moderator_col],                    y=plot_data[effect_col],                    s=point_size,                    c=c_values,                    cmap=cmap,                    norm=norm,                    alpha=point_alpha,                    edgecolors='black',                    linewidths=0.5,                    zorder=3,                    label='Observed Effects'                )                print("  ‚úì Plotted data points")            # --- Customize Axes ---            if show_null_line:                ax.axhline(es_config.get('null_value', 0), color='gray',                          linestyle='--', linewidth=1.0, zorder=0)            ax.set_xlabel(x_label, fontsize=12, fontweight='bold')            ax.set_ylabel(y_label, fontsize=12, fontweight='bold')            if show_title:                ax.set_title(graph_title, fontsize=14, fontweight='bold', pad=15)            if show_grid:                ax.grid(True, linestyle=':', alpha=0.3, zorder=0)            # --- Add F-test Result ---            if show_f_test and not np.isnan(f_pvalue):                sig_marker = "***" if f_pvalue < 0.001 else "**" if f_pvalue < 0.01 else "*" if f_pvalue < 0.05 else "ns"                f_text = f"Non-linearity Test:\nF({df_spline-1}) = {f_stat:.2f}\np = {f_pvalue:.4g} {sig_marker}"                ax.text(                    0.05, 0.95, f_text,                    transform=ax.transAxes, fontsize=10, verticalalignment='top',                    bbox=dict(boxstyle='round', facecolor='white', alpha=0.8, edgecolor='gray'),                    zorder=10                )            # --- Create Legend ---            handles, labels = ax.get_legend_handles_labels()            # Add color categories            if color_mod_name != 'None':                for i, cat in enumerate(unique_cats):                    display_label = label_mapping.get(cat, cat)                    color_val = plt.get_cmap(cmap)(norm(i))                    handles.append(mpatches.Patch(color=color_val, label=display_label,                                                  alpha=point_alpha, ec='black', lw=0.5))                    labels.append(display_label)            display_legend_title = label_mapping.get(color_mod_name, color_mod_name)            ax.legend(handles=handles, labels=labels, loc=legend_loc,                     fontsize=legend_fontsize, framealpha=0.9,                     title=display_legend_title if color_mod_name != 'None' else None)            fig.tight_layout()            plt.show()            # --- 6. Save Files ---            print(f"\nSTEP 4: SAVING FILES")            print("---------------------------------")            timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")            base_filename = f"{filename_prefix}_{moderator_col.replace(' ','_')}_{timestamp}"            saved_files = []            if save_pdf:                pdf_filename = f"{base_filename}.pdf"                fig.savefig(pdf_filename, bbox_inches='tight', transparent=transparent_bg)                saved_files.append(pdf_filename)                print(f"  ‚úì {pdf_filename}")            if save_png:                png_filename = f"{base_filename}.png"                fig.savefig(png_filename, dpi=png_dpi, bbox_inches='tight', transparent=transparent_bg)                saved_files.append(png_filename)                print(f"  ‚úì {png_filename} (DPI: {png_dpi})")            print(f"\n" + "="*70)            print("‚úÖ PLOT GENERATION COMPLETE")            print("="*70)        except Exception as e:            print(f"\n‚ùå AN ERROR OCCURRED:\n")            print(f"  Type: {type(e).__name__}")            print(f"  Message: {e}")            print("\n  Traceback:")            traceback.print_exc()            print("\n" + "="*70)            print("PLOT GENERATION FAILED")            print("="*70)# --- 3. DISPLAY WIDGETS ---try:    if 'ANALYSIS_CONFIG' not in globals() or 'spline_model_results' not in ANALYSIS_CONFIG:        print("="*70)        print("‚ö†Ô∏è  PREREQUISITE NOT MET")        print("="*70)        print("  Please run the spline analysis cell successfully before running this cell.")    else:        print("="*70)        print("‚úÖ SPLINE PLOTTER READY")        print("="*70)        print("  ‚úì Spline results are loaded")        print("  ‚úì Customize your plot using the tabs below and click 'Generate'")        # Hook up widget events        def on_color_mod_change(change):            point_color_widget.layout.display = 'none' if change['new'] != 'None' else 'flex'        color_mod_widget.observe(on_color_mod_change, names='value')        display(widgets.VBox([            header,            widgets.HTML("<hr style='margin: 15px 0;'>"),            widgets.HTML("<b>Plot Options:</b>"),            tab,            widgets.HTML("<hr style='margin: 15px 0;'>"),            run_plot_button,            plot_output        ]))except Exception as e:    print(f"‚ùå Initialization error: {e}")    print("Please ensure the notebook has been run in order.")

In [None]:
# ‚ö†Ô∏è PREREQUISITES:# - Recommended minimum: 10 studies for meaningful bias assessment# - Funnel plot asymmetry can arise from heterogeneity, not just publication bias## Expected runtime: < 5 seconds## INTERPRETATION:# - Asymmetric funnel plot suggests potential publication bias# - Egger's test p < 0.05 indicates significant asymmetry# - Consider other sources of asymmetry (heterogeneity, study quality)# EGGER'S REGRESSION TEST FOR PUBLICATION BIAS## Method from Egger et al. (1997). Bias in meta-analysis detected by a simple,# graphical test. BMJ, 315(7109), 629-634.## Tests for funnel plot asymmetry by regressing standardized effect size on precision.# Significant intercept suggests publication bias (small-study effects).## Limitations:# - Assumes publication bias is the only source of asymmetry# - Can be affected by heterogeneity# - Low power with few studies (< 10)# CLUSTER-ROBUST VARIANCE ESTIMATION## Method from Hedges, Tipton, & Johnson (2010). Robust variance estimation in# meta-regression with dependent effect size estimates. Research Synthesis Methods, 1(1), 39-65.## This approach:# 1. Allows for dependence between effect sizes from the same cluster (study)# 2. Provides valid standard errors even with small number of clusters# 3. Uses t-distribution with degrees of freedom correction## Particularly important when:# - Multiple effect sizes are extracted from the same study# - Effect sizes are nested within higher-level units# - Assumption of independence is violated# Hedges' g and Cohen's d Calculation# Methods from Hedges & Olkin (1985). Statistical methods for meta-analysis.# Academic Press.## Cohen's d = (M1 - M2) / SD_pooled# Hedges' g = d * (1 - 3/(4*df - 1))  # Small sample correction## These are standardized mean differences, common in psychology and medicine.#@title üìä FUNNEL PLOT & BIAS ASSESSMENT (Cluster-Robust)#@title üìä FUNNEL PLOT & BIAS ASSESSMENT (Cluster-Robust)# =============================================================================# CELL 12 (REPLACEMENT): FUNNEL PLOT & PUBLICATION BIAS ASSESSMENT# Purpose: Assess publication bias using a funnel plot and robust tests.# Method:  Plots individual effects against standard error.#          Uses the 3-level pooled effect (from Cell 6.5) as the center line.#          Runs a 3-level meta-regression for Egger's test (robust).# Dependencies: Cell 6.5, Cell 5 (data)# Outputs: Funnel plot (PDF/PNG) and robust bias test results# =============================================================================# --- 0. HELPER FUNCTIONS (from Cell 10) ---# We need the 3-level regression engine to run Egger's testdef _get_three_level_regression_estimates_v2(params, y_all, v_all, X_all, N_total, M_studies, p_params):    """Core function to calculate GLS estimates and REML log-likelihood."""    try:        tau_sq, sigma_sq = params        if tau_sq < 0 or sigma_sq < 0: return {'log_lik_reml': np.inf}        sum_log_det_Vi = 0.0        sum_XWX = np.zeros((p_params, p_params))        sum_XWy = np.zeros(p_params)        sum_yWy = 0.0        for i in range(M_studies):            y_i = y_all[i]            v_i = v_all[i]            X_i = X_all[i]            A_diag = v_i + sigma_sq            A_diag = np.clip(A_diag, 1e-10, 1e10)            if np.any(A_diag <= 0): return {'log_lik_reml': np.inf}            A_inv_diag = 1.0 / A_diag            log_det_A = np.sum(np.log(A_diag))            sum_A_inv_1 = np.sum(A_inv_diag)            term_S = 1.0 + tau_sq * sum_A_inv_1            if term_S <= 1e-10: return {'log_lik_reml': np.inf}            log_det_Vi = log_det_A + np.log(term_S)            sum_log_det_Vi += log_det_Vi            A_inv_y = A_inv_diag * y_i            sum_A_inv_y = np.sum(A_inv_y)            W_i_y = A_inv_y - (tau_sq * A_inv_diag * sum_A_inv_y) / term_S            A_inv_X = A_inv_diag[:, np.newaxis] * X_i            sum_A_inv_X = np.sum(A_inv_X, axis=0)            W_i_X = A_inv_X - (tau_sq * A_inv_diag[:, np.newaxis] * sum_A_inv_X) / term_S            sum_XWX += X_i.T @ W_i_X            sum_XWy += X_i.T @ W_i_y            sum_yWy += y_i.T @ W_i_y        try:            cond_number = np.linalg.cond(sum_XWX)            if cond_number > 1e10: return {'log_lik_reml': np.inf}        except:             return {'log_lik_reml': np.inf}        det_XWX = np.linalg.det(sum_XWX)        if det_XWX == 0 or np.isnan(det_XWX) or np.isinf(det_XWX):            return {'log_lik_reml': np.inf}        var_betas = np.linalg.inv(sum_XWX)        betas = var_betas @ sum_XWy        residual_ss = sum_yWy - betas.T @ sum_XWy        log_lik_reml = -0.5 * (sum_log_det_Vi + np.log(det_XWX) + residual_ss)        if np.isnan(log_lik_reml) or np.isinf(log_lik_reml):            return {'log_lik_reml': np.inf}        return {'betas': betas, 'var_betas': var_betas, 'se_betas': np.sqrt(np.diag(var_betas)),                'log_lik_reml': log_lik_reml, 'tau_sq': tau_sq, 'sigma_sq': sigma_sq}    except (FloatingPointError, ValueError, np.linalg.LinAlgError):        return {'log_lik_reml': np.inf}def _negative_log_likelihood_reml_reg_v2(params, y_all, v_all, X_all, N_total, M_studies, p_params):    """Wrapper for optimizer."""    estimates = _get_three_level_regression_estimates_v2(params, y_all, v_all, X_all, N_total, M_studies, p_params)    return -estimates['log_lik_reml']def _run_three_level_reml_regression_v2(analysis_data, moderator_col, effect_col, var_col):    """Main optimization function for meta-regression."""    grouped = analysis_data.groupby('id')    y_all = [group[effect_col].values for _, group in grouped]    v_all = [group[var_col].values for _, group in grouped]    X_all = []    for _, group in grouped:        mod_vals = group[moderator_col].values        X_i = sm.add_constant(mod_vals, prepend=True)        X_all.append(X_i)    N_total = len(analysis_data)    M_studies = len(y_all)    p_params = 2    try:        unconditional_results = ANALYSIS_CONFIG['three_level_results']        tau_sq_start = min(unconditional_results.get('tau_squared', 0.01), 5.0)        sigma_sq_start = min(unconditional_results.get('sigma_squared', 0.01), 5.0)    except Exception:        tau_sq_start, sigma_sq_start = 0.01, 0.01    initial_params = [max(1e-6, tau_sq_start), max(1e-6, sigma_sq_start)]    bounds = [(1e-6, 100.0), (1e-6, 100.0)]    optimizer_result = minimize(        _negative_log_likelihood_reml_reg_v2,        x0=initial_params,        args=(y_all, v_all, X_all, N_total, M_studies, p_params),        method='L-BFGS-B',        bounds=bounds,        options={'ftol': 1e-10, 'gtol': 1e-6, 'maxiter': 500}    )    if not optimizer_result.success:        return None, None, optimizer_result    final_estimates = _get_three_level_regression_estimates_v2(        optimizer_result.x, y_all, v_all, X_all, N_total, M_studies, p_params    )    return final_estimates, (N_total, M_studies, p_params), optimizer_result# --- 1. WIDGET DEFINITIONS ---header = widgets.HTML(    "<h3 style='color: #2E86AB;'>Funnel Plot & Bias Assessment</h3>"    "<p style='color: #666;'><i>Visual and statistical assessment of publication bias using robust methods.</i></p>")# --- Plot Widgets ---title_widget = widgets.Text(value="Funnel Plot for Publication Bias", description='Plot Title:',                            layout=widgets.Layout(width='450px'), style={'description_width': '120px'})xlabel_widget = widgets.Text(value="Effect Size (Hedges' g)", description='X-Axis Label:',                             layout=widgets.Layout(width='450px'), style={'description_width': '120px'})ylabel_widget = widgets.Text(value="Standard Error (Inverted)", description='Y-Axis Label:',                             layout=widgets.Layout(width='450px'), style={'description_width': '120px'})width_widget = widgets.FloatSlider(value=8.0, min=5.0, max=14.0, step=0.5, description='Plot Width (in):',                                   continuous_update=False, style={'description_width': '120px'},                                   layout=widgets.Layout(width='450px'))height_widget = widgets.FloatSlider(value=6.0, min=4.0, max=12.0, step=0.5, description='Plot Height (in):',                                    continuous_update=False, style={'description_width': '120px'},                                    layout=widgets.Layout(width='450px'))show_ci_funnel_widget = widgets.Checkbox(value=True, description='Show 95% CI Funnel', indent=False)show_contours_widget = widgets.Checkbox(value=False, description='Show Significance Contours (p<0.05, p<0.01)', indent=False)point_color_widget = widgets.Dropdown(options=['gray', 'blue', 'black', 'red'], value='gray',                                      description='Point Color:', style={'description_width': '120px'},                                      layout=widgets.Layout(width='450px'))point_alpha_widget = widgets.FloatSlider(value=0.6, min=0.1, max=1.0, step=0.1, description='Transparency:',                                          continuous_update=False, style={'description_width': '120px'},                                          layout=widgets.Layout(width='450px'))save_pdf_widget = widgets.Checkbox(value=True, description='Save as PDF', indent=False)save_png_widget = widgets.Checkbox(value=True, description='Save as PNG', indent=False)png_dpi_widget = widgets.IntSlider(value=300, min=150, max=600, step=50, description='PNG DPI:',                                   continuous_update=False, style={'description_width': '120px'},                                   layout=widgets.Layout(width='450px'))filename_prefix_widget = widgets.Text(value='Funnel_Plot', description='Filename Prefix:',                                      layout=widgets.Layout(width='450px'), style={'description_width': '120px'})transparent_bg_widget = widgets.Checkbox(value=False, description='Transparent Background', indent=False)show_title_widget = widgets.Checkbox(value=True, description='Show Plot Title', indent=False) # Was missingshow_grid_widget = widgets.Checkbox(value=True, description='Show Grid', indent=False) # Was missing# --- Assemble Tabs ---style_tab = widgets.VBox([    widgets.HTML("<h4 style='color: #2E86AB;'>Labels & Size</h4>"),    show_title_widget, title_widget, xlabel_widget, ylabel_widget, width_widget, height_widget])elements_tab = widgets.VBox([    widgets.HTML("<h4 style='color: #2E86AB;'>Plot Elements</h4>"),    show_ci_funnel_widget, show_contours_widget, show_grid_widget,    widgets.HTML("<hr style='margin: 10px 0;'>"),    point_color_widget, point_alpha_widget])export_tab = widgets.VBox([    widgets.HTML("<h4 style='color: #2E86AB;'>Export</h4>"),    save_pdf_widget, save_png_widget, png_dpi_widget, filename_prefix_widget, transparent_bg_widget])tab = widgets.Tab(children=[style_tab, elements_tab, export_tab])tab.set_title(0, 'üé® Style'); tab.set_title(1, 'üìä Elements'); tab.set_title(2, 'üíæ Export')run_plot_button = widgets.Button(    description='üìä Generate Funnel Plot & Run Tests',    button_style='success',    layout=widgets.Layout(width='450px', height='50px'),    style={'font_weight': 'bold'})plot_output = widgets.Output()# --- 2. MAIN FUNCTION (Attached to Button) ---@run_plot_button.on_clickdef generate_funnel_plot(b):    """Generate funnel plot with publication bias assessment"""    with plot_output:        clear_output(wait=True)        print("="*70)        print("GENERATING FUNNEL PLOT & BIAS ASSESSMENT")        print("="*70)        print(f"Timestamp: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")        try:            # --- 1. Load Data & Config ---            print("STEP 1: LOADING DATA & CONFIGURATION")            print("---------------------------------")            if 'ANALYSIS_CONFIG' not in globals() or 'three_level_results' not in ANALYSIS_CONFIG:                raise ValueError("Prerequisites not met. Run Cell 6.5 (Three-Level Analysis) first.")            if 'analysis_data' in globals():                plot_data = analysis_data.copy()            elif 'data_filtered' in globals():                plot_data = data_filtered.copy()            else:                raise ValueError("Data not found (analysis_data or data_filtered).")            es_config = ANALYSIS_CONFIG['es_config']            effect_col = ANALYSIS_CONFIG['effect_col']            se_col = ANALYSIS_CONFIG['se_col']            var_col = ANALYSIS_CONFIG['var_col']            pooled_effect = ANALYSIS_CONFIG['three_level_results']['pooled_effect']            print(f"  ‚úì Loaded {len(plot_data)} observations.")            print(f"  ‚úì Center line (from Cell 6.5): {pooled_effect:.4f}")            # --- 2. Get Widget Values (*** FIX: ADDED .value ***) ---            show_title = show_title_widget.value            graph_title = title_widget.value            x_label = xlabel_widget.value            y_label = ylabel_widget.value            plot_width = width_widget.value            plot_height = height_widget.value            show_ci_funnel = show_ci_funnel_widget.value            show_contours = show_contours_widget.value            point_color = point_color_widget.value            point_alpha = point_alpha_widget.value            save_pdf = save_pdf_widget.value            save_png = save_png_widget.value            png_dpi = png_dpi_widget.value            filename_prefix = filename_prefix_widget.value            transparent_bg = transparent_bg_widget.value            show_grid = show_grid_widget.value            # *** END FIX ***            # --- 3. Prepare Data for Plotting & Tests ---            print("\nSTEP 2: PREPARING DATA & RUNNING ROBUST EGGER'S TEST")            print("---------------------------------")            plot_data = plot_data.dropna(subset=[effect_col, se_col, 'id'])            plot_data = plot_data[plot_data[se_col] > 0]            plot_data['precision'] = 1.0 / plot_data[se_col]            plot_data['z_effect'] = plot_data[effect_col] / plot_data[se_col]            k_reg = len(plot_data)            m_reg = plot_data['id'].nunique()            print(f"  ‚úì Using {k_reg} observations from {m_reg} studies for bias tests.")            if k_reg < 10:                print("  ‚ö†Ô∏è  WARNING: Bias tests have low power with fewer than 10 studies.")            # --- 4. Run 3-Level Egger's Test ---            # Model: effect = Œ≤‚ÇÄ_se + Œ≤‚ÇÅ*SE + (u_i + r_ij + e_ij)            estimates, (N_total, M_studies, p_params), _ = _run_three_level_reml_regression_v2(                analysis_data = plot_data,                moderator_col = se_col, # Use SE as the moderator                effect_col = effect_col,                var_col = var_col            )            if estimates is None:                print("  ‚ùå Robust Egger's test failed to converge.")                egger_p_value = np.nan                egger_intercept = np.nan                df_robust = np.nan            else:                betas = estimates['betas']                se_betas = estimates['se_betas']                b0_intercept, b1_slope = betas[0], betas[1]                se0_intercept = se_betas[0]                t_stat_intercept = b0_intercept / se0_intercept                df_robust = M_studies - p_params                egger_p_value = 2 * (1 - t.cdf(np.abs(t_stat_intercept), df=df_robust))                egger_intercept = b0_intercept                print(f"  ‚úì Robust Egger's Test (3-Level) Complete.")                print(f"    - Intercept (Bias): {egger_intercept:.4f}")                print(f"    - Robust SE: {se0_intercept:.4f}")                print(f"    - p-value (t-test, df={df_robust}): {egger_p_value:.4g}")            # --- 5. Display Bias Test Results ---            print("\n" + "="*70)            print("PUBLICATION BIAS ASSESSMENT")            print("="*70)            if np.isnan(egger_p_value):                 print("\n  Unable to calculate robust Egger's test.")            elif egger_p_value < 0.05:                print(f"\n  üî¥ SIGNIFICANT ASYMMETRY DETECTED (p = {egger_p_value:.3g})")                print(f"     Evidence of publication bias or small-study effects.")            elif egger_p_value < 0.10:                print(f"\n  üü° MARGINAL ASYMMETRY (p = {egger_p_value:.3g})")                print(f"     Suggests possible publication bias.")            else:                print(f"\n  ‚úì NO SIGNIFICANT ASYMMETRY (p = {egger_p_value:.3g})")                print(f"     No strong statistical evidence of publication bias.")            # --- 6. Create Figure ---            print("\nSTEP 3: GENERATING PLOT")            print("---------------------------------")            fig, ax = plt.subplots(figsize=(plot_width, plot_height))            if transparent_bg:                fig.patch.set_alpha(0)                ax.patch.set_alpha(0)            # --- Plot 95% CI Funnel ---            if show_ci_funnel:                se_max = plot_data[se_col].max()                se_range = np.linspace(0, se_max * 1.1, 100)                upper_ci = pooled_effect + 1.96 * se_range                lower_ci = pooled_effect - 1.96 * se_range                ax.plot(upper_ci, se_range, color='gray', linestyle='--', linewidth=1.5, label='95% CI', alpha=0.7)                ax.plot(lower_ci, se_range, color='gray', linestyle='--', linewidth=1.5, alpha=0.7)                ax.fill_betweenx(se_range, lower_ci, upper_ci, color='lightgray', alpha=0.2)            # --- Plot Significance Contours ---            if show_contours:                se_max = plot_data[se_col].max()                se_range = np.linspace(0, se_max * 1.1, 100)                null_val = es_config.get('null_value', 0)                p05_upper = null_val + 1.96 * se_range                p05_lower = null_val - 1.96 * se_range                ax.plot(p05_upper, se_range, color='darkgray', linestyle=':', linewidth=1, label='p = 0.05')                ax.plot(p05_lower, se_range, color='darkgray', linestyle=':', linewidth=1)                p01_upper = null_val + 2.58 * se_range                p01_lower = null_val - 2.58 * se_range                ax.plot(p01_upper, se_range, color='gray', linestyle=':', linewidth=1, label='p = 0.01')                ax.plot(p01_lower, se_range, color='gray', linestyle=':', linewidth=1)            # --- Plot Data Points ---            ax.scatter(                plot_data[effect_col],                plot_data[se_col],                s=40, # Fixed size for funnel plots                c=point_color,                alpha=point_alpha,                edgecolors='black',                linewidths=0.5,                label='Studies',                zorder=3            )            # --- Plot Reference Line ---            ax.axvline(                x=pooled_effect,                color='red',                linestyle='-',                linewidth=2,                label=f'3-Level Pooled Effect ({pooled_effect:.3f})',                zorder=2            )            # --- Customize Axes ---            ax.set_xlabel(x_label, fontsize=12, fontweight='bold')            ax.set_ylabel(y_label, fontsize=12, fontweight='bold')            if show_title:                ax.set_title(graph_title, fontsize=14, fontweight='bold', pad=15)            ax.invert_yaxis()            if show_grid:                ax.grid(True, linestyle=':', alpha=0.4, zorder=0)            ax.legend(loc='best', fontsize=10, framealpha=0.9)            fig.tight_layout()            # --- 7. Save Files ---            print("\nSTEP 4: SAVING FILES")            print("---------------------------------")            timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")            base_filename = f"{filename_prefix}_{timestamp}"            saved_files = []            if save_pdf:                pdf_filename = f"{base_filename}.pdf"                fig.savefig(pdf_filename, bbox_inches='tight', transparent=transparent_bg)                saved_files.append(pdf_filename)                print(f"  ‚úì {pdf_filename}")            if save_png:                png_filename = f"{base_filename}.png"                fig.savefig(png_filename, dpi=png_dpi, bbox_inches='tight', transparent=transparent_bg)                saved_files.append(png_filename)                print(f"  ‚úì {png_filename} (DPI: {png_dpi})")            plt.show()            print(f"\n" + "="*70)            print("‚úÖ FUNNEL PLOT & BIAS TEST COMPLETE")            print("="*70)            # --- 8. Save Results ---            ANALYSIS_CONFIG['funnel_plot_results'] = {                'timestamp': datetime.datetime.now(),                'egger_test_robust': {                    'intercept': egger_intercept,                    'se': se0_intercept,                    'p_value': egger_p_value,                    'df': df_robust                },                'n_studies': m_reg,                'pooled_effect_reference': pooled_effect            }            print(f"‚úì Results saved to ANALYSIS_CONFIG['funnel_plot_results']")        except Exception as e:            print(f"\n‚ùå AN ERROR OCCURRED:\n")            print(f"  Type: {type(e).__name__}")            print(f"  Message: {e}")            print("\n  Traceback:")            traceback.print_exc(file=sys.stdout)            print("\n" + "="*70)            print("ANALYSIS FAILED. See error message above.")            print("Please check your data and configuration.")            print("="*70)# --- 6. DISPLAY WIDGETS ---try:    if 'ANALYSIS_CONFIG' not in globals() or 'three_level_results' not in ANALYSIS_CONFIG:        print("="*70)        print("‚ö†Ô∏è  PREREQUISITE NOT MET")        print("="*70)        print("Please run Cell 6.5 (Three-Level Meta-Analysis) successfully before running this cell.")    elif ANALYSIS_CONFIG['three_level_results'].get('status') != 'completed':         print("="*70)         print("‚ö†Ô∏è  PREREQUISITE NOT MET")         print("="*70)         print("Cell 6.5 (Three-Level Meta-Analysis) must be run successfully first.")    else:        # Pre-fill labels from config        xlabel_widget.value = ANALYSIS_CONFIG['es_config'].get('effect_label', "Effect Size")        print("="*70)        print("‚úÖ ROBUST FUNNEL PLOT INTERFACE READY")        print("="*70)        print("  ‚úì Center line will use the robust 3-level pooled effect from Cell 6.5.")        print("  ‚úì Egger's test will be run using a robust 3-level meta-regression.")        print("  ‚úì Customize your plot using the tabs below and click 'Generate'.")        display(widgets.VBox([            header,            widgets.HTML("<hr style='margin: 15px 0;'>"),            widgets.HTML("<b>Plot Options:</b>"),            tab,            widgets.HTML("<hr style='margin: 15px 0;'>"),            run_plot_button,            plot_output        ]))except Exception as e:    print(f"‚ùå An error occurred during initialization: {e}")    print("Please ensure the notebook has been run in order.")

In [None]:
# ‚ö†Ô∏è WARNING:# - Trim-and-fill assumes publication bias is the ONLY source of asymmetry# - Results should be interpreted cautiously# - Not recommended if heterogeneity is high## Expected runtime: 10-30 seconds (iterative algorithm)## INTERPRETATION:# - k0: Number of potentially missing studies# - Adjusted effect size accounts for imputed missing studies# - Large adjustments suggest substantial publication bias# EGGER'S REGRESSION TEST FOR PUBLICATION BIAS## Method from Egger et al. (1997). Bias in meta-analysis detected by a simple,# graphical test. BMJ, 315(7109), 629-634.## Tests for funnel plot asymmetry by regressing standardized effect size on precision.# Significant intercept suggests publication bias (small-study effects).## Limitations:# - Assumes publication bias is the only source of asymmetry# - Can be affected by heterogeneity# - Low power with few studies (< 10)#@title üîÑ TRIM-AND-FILL SENSITIVITY ANALYSIS#@title üîÑ TRIM-AND-FILL SENSITIVITY ANALYSIS# =============================================================================# TRIM-AND-FILL SENSITIVITY ANALYSIS# Purpose: Assess potential impact of publication bias using trim-and-fill method# Method: Duval & Tweedie (2000) iterative trim-and-fill procedure# IMPORTANT: This is a SENSITIVITY ANALYSIS, not a correction!# Dependencies: Cell 8 (overall results), Cell 7 (effect sizes)# Outputs: Comparison of original vs. "filled" estimates, forest plot# =============================================================================# =============================================================================# WIDGET SETUP# =============================================================================# Create output widgetoutput_widget = widgets.Output()# Configuration widgetsestimator_widget = widgets.Dropdown(    options=[        ('L0 (Linear, default)', 'L0'),        ('R0 (Rank-based)', 'R0'),        ('Q0 (Quadratic)', 'Q0')    ],    value='L0',    description='Estimator:',    style={'description_width': '100px'},    layout=widgets.Layout(width='350px'))side_widget = widgets.Dropdown(    options=[        ('Auto-detect (recommended)', 'auto'),        ('Right (assume small positive missing)', 'right'),        ('Left (assume small negative missing)', 'left')    ],    value='auto',    description='Side:',    style={'description_width': '100px'},    layout=widgets.Layout(width='450px'))max_iter_widget = widgets.IntSlider(    value=100,    min=10,    max=500,    step=10,    description='Max iterations:',    style={'description_width': '100px'},    layout=widgets.Layout(width='350px'))# Plot configurationshow_plot_widget = widgets.Checkbox(    value=True,    description='Show forest plot with imputed studies',    style={'description_width': 'initial'})run_button = widgets.Button(    description='‚ñ∂ Run Trim-and-Fill Analysis',    button_style='success',    layout=widgets.Layout(width='300px', height='40px'))# =============================================================================# TRIM-AND-FILL IMPLEMENTATION# =============================================================================def trimfill_analysis(data, effect_col, var_col, estimator='L0', side='auto', max_iter=100):    """    Duval & Tweedie (2000) Trim-and-Fill Method    This is a SENSITIVITY ANALYSIS to assess potential publication bias impact.    DO NOT use the "adjusted" estimate as your final result!    Parameters:    -----------    data : DataFrame        Data with effect sizes and variances    effect_col : str        Column name for effect sizes    var_col : str        Column name for variances    estimator : str        'L0' (linear), 'R0' (rank), or 'Q0' (quadratic)    side : str        'left', 'right', or 'auto'    max_iter : int        Maximum iterations    Returns:    --------    dict : Results including k0 (# studies to trim), filled data, estimates    """    # Prepare data    yi = data[effect_col].values    vi = data[var_col].values    k = len(yi)    # Calculate fixed-effect pooled estimate    wi = 1 / vi    pooled_fe = np.sum(wi * yi) / np.sum(wi)    # Center effect sizes    yi_centered = yi - pooled_fe    # Determine side if auto    if side == 'auto':        # Check which side has more extreme small studies        se = np.sqrt(vi)        # Look at correlation between effect size and SE        # Positive correlation suggests missing on right (small negative studies)        # Negative correlation suggests missing on left (small positive studies)        if len(yi) >= 3:            # Use Egger-like approach            corr, _ = pearsonr(np.abs(yi), se)            # Also check skewness            skew_sign = np.sign(np.mean(yi_centered))            if skew_sign > 0:                side = 'left'  # Positive skew suggests missing negative studies            else:                side = 'right'  # Negative skew suggests missing positive studies        else:            side = 'right'  # Default    side_multiplier = 1 if side == 'right' else -1    # Iterative trim-and-fill    k0 = 0  # Number of studies to trim    yi_work = yi.copy()    vi_work = vi.copy()    for iteration in range(max_iter):        k_current = len(yi_work)        # Re-calculate pooled estimate with current data        wi_work = 1 / vi_work        pooled_work = np.sum(wi_work * yi_work) / np.sum(wi_work)        # Center effects        yi_centered_work = yi_work - pooled_work        # Rank effects (by signed deviation from pooled)        if estimator == 'R0':            # Rank-based estimator            signed_effects = side_multiplier * yi_centered_work            ranks = rankdata(signed_effects)            # Count studies on the "expected" side            T_plus = np.sum(ranks[signed_effects > 0])            k_plus = np.sum(signed_effects > 0)            # Estimate k0            expected_T = k_plus * (k_current + 1) / 2            k0_new = int(np.floor((4 * T_plus - k_plus * (k_current + 1)) / (2 * k_current - k_plus + 2)))        else:  # L0 or Q0 (we'll implement L0)            # Linear estimator (Duval & Tweedie 2000, equation 3)            signed_effects = side_multiplier * yi_centered_work            # Sort by signed effect            sorted_idx = np.argsort(signed_effects)            sorted_effects = signed_effects[sorted_idx]            # Calculate sum of signed ranks            S = 0            for i, eff in enumerate(sorted_effects):                rank = i + 1                if eff > 0:                    S += (rank - (k_current + 1) / 2)            # Estimate k0 using L0 estimator            gamma = S / (k_current - 1) if k_current > 1 else 0            k0_new = int(np.floor((k_current - 1 - gamma) / 2))        # Constrain k0        k0_new = max(0, min(k0_new, k_current - 1))        # Check convergence        if k0_new == k0:            break        k0 = k0_new        # Trim k0 most extreme studies from the "expected" side        if k0 > 0:            signed_effects = side_multiplier * (yi - pooled_work)            trim_idx = np.argsort(signed_effects)[-k0:]  # Most extreme on expected side            keep_idx = np.setdiff1d(np.arange(k), trim_idx)            yi_work = yi[keep_idx]            vi_work = vi[keep_idx]        else:            yi_work = yi.copy()            vi_work = vi.copy()    # Calculate trimmed estimate    if len(yi_work) > 0:        wi_trim = 1 / vi_work        pooled_trim = np.sum(wi_trim * yi_work) / np.sum(wi_trim)    else:        pooled_trim = pooled_fe    # Fill: Create mirror images of the trimmed studies    if k0 > 0:        # Get the k0 most extreme studies that were trimmed        signed_effects = side_multiplier * (yi - pooled_trim)        trim_idx = np.argsort(signed_effects)[-k0:]        # Create filled (imputed) studies as mirror images        yi_trimmed = yi[trim_idx]        vi_trimmed = vi[trim_idx]        # Mirror around the trimmed pooled estimate        yi_filled = 2 * pooled_trim - yi_trimmed        vi_filled = vi_trimmed.copy()  # Same variance        # Combine original + filled        yi_combined = np.concatenate([yi, yi_filled])        vi_combined = np.concatenate([vi, vi_filled])    else:        yi_combined = yi.copy()        vi_combined = vi.copy()        yi_filled = np.array([])        vi_filled = np.array([])    # Calculate filled estimate    wi_combined = 1 / vi_combined    pooled_filled = np.sum(wi_combined * yi_combined) / np.sum(wi_combined)    var_filled = 1 / np.sum(wi_combined)    se_filled = np.sqrt(var_filled)    ci_lower_filled = pooled_filled - 1.96 * se_filled    ci_upper_filled = pooled_filled + 1.96 * se_filled    # Calculate original estimate statistics    var_original = 1 / np.sum(wi)    se_original = np.sqrt(var_original)    ci_lower_original = pooled_fe - 1.96 * se_original    ci_upper_original = pooled_fe + 1.96 * se_original    return {        'k0': k0,        'side': side,        'k_original': k,        'k_filled': k + k0,        'pooled_original': pooled_fe,        'se_original': se_original,        'ci_lower_original': ci_lower_original,        'ci_upper_original': ci_upper_original,        'pooled_filled': pooled_filled,        'se_filled': se_filled,        'ci_lower_filled': ci_lower_filled,        'ci_upper_filled': ci_upper_filled,        'yi_filled': yi_filled,        'vi_filled': vi_filled,        'yi_combined': yi_combined,        'vi_combined': vi_combined,        'estimator': estimator,        'converged': iteration < max_iter - 1    }def plot_trim_fill_forest(data, effect_col, se_col, results, es_label):    """Create forest plot showing original + imputed studies"""    yi_original = data[effect_col].values    se_original = data[se_col].values    k_original = len(yi_original)    k0 = results['k0']    # Prepare plot data    all_effects = list(yi_original)    all_se = list(se_original)    all_labels = [f"Study {i+1}" for i in range(k_original)]    all_colors = ['black'] * k_original    all_markers = ['o'] * k_original    # Add filled studies    if k0 > 0:        yi_filled = results['yi_filled']        se_filled = np.sqrt(results['vi_filled'])        for i in range(k0):            all_effects.append(yi_filled[i])            all_se.append(se_filled[i])            all_labels.append(f"Filled {i+1}")            all_colors.append('red')            all_markers.append('s')  # Square marker    # Calculate confidence intervals    all_effects = np.array(all_effects)    all_se = np.array(all_se)    ci_lower = all_effects - 1.96 * all_se    ci_upper = all_effects + 1.96 * all_se    # Sort by effect size    sort_idx = np.argsort(all_effects)[::-1]    # Create figure    fig, ax = plt.subplots(figsize=(10, max(8, len(all_effects) * 0.3)))    # Plot studies    y_pos = np.arange(len(all_effects))    for i, idx in enumerate(sort_idx):        # Plot CI        ax.plot([ci_lower[idx], ci_upper[idx]], [i, i],                color=all_colors[idx], linewidth=1.5, alpha=0.6)        # Plot point estimate        ax.scatter([all_effects[idx]], [i],                  marker=all_markers[idx], s=100,                  color=all_colors[idx], edgecolors='black',                  linewidths=1.5, zorder=3,                  alpha=0.8 if all_colors[idx] == 'red' else 1.0)    # Add pooled estimates    y_pooled_original = len(all_effects) + 1    y_pooled_filled = len(all_effects) + 2    # Original pooled    ax.plot([results['ci_lower_original'], results['ci_upper_original']],            [y_pooled_original, y_pooled_original],            color='blue', linewidth=3, alpha=0.7)    ax.scatter([results['pooled_original']], [y_pooled_original],              marker='D', s=150, color='blue',              edgecolors='black', linewidths=2, zorder=3,              label='Original pooled')    # Filled pooled    if k0 > 0:        ax.plot([results['ci_lower_filled'], results['ci_upper_filled']],                [y_pooled_filled, y_pooled_filled],                color='red', linewidth=3, alpha=0.7, linestyle='--')        ax.scatter([results['pooled_filled']], [y_pooled_filled],                  marker='D', s=150, color='red',                  edgecolors='black', linewidths=2, zorder=3,                  label='Filled pooled (sensitivity)')    # Add null line    ax.axvline(x=0, color='gray', linestyle='--', linewidth=1, alpha=0.5)    # Formatting    ax.set_yticks(range(len(all_effects) + 3))    labels_plot = [all_labels[idx] for idx in sort_idx] + ['', 'Original Pooled']    if k0 > 0:        labels_plot.append('Filled Pooled')    ax.set_yticklabels(labels_plot)    ax.set_xlabel(f'{es_label} (95% CI)', fontsize=12, fontweight='bold')    ax.set_ylabel('Studies', fontsize=12, fontweight='bold')    ax.set_title('Trim-and-Fill Sensitivity Analysis\n(Red = Imputed Studies)',                fontsize=14, fontweight='bold', pad=20)    # Legend    original_patch = mpatches.Patch(color='black', label='Original studies')    filled_patch = mpatches.Patch(color='red', label='Imputed studies')    ax.legend(handles=[original_patch, filled_patch] if k0 > 0 else [original_patch],             loc='best', frameon=True, fancybox=True, shadow=True)    ax.spines['top'].set_visible(False)    ax.spines['right'].set_visible(False)    ax.grid(axis='x', alpha=0.3, linestyle=':')    plt.tight_layout()    plt.show()# =============================================================================# MAIN ANALYSIS FUNCTION# =============================================================================def run_trim_fill_analysis(b):    """Execute trim-and-fill analysis"""    with output_widget:        clear_output(wait=True)        print("="*70)        print("TRIM-AND-FILL SENSITIVITY ANALYSIS")        print("="*70)        print(f"Timestamp: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")        print()        # Warning banner        print("‚ö†Ô∏è  "*25)        print("IMPORTANT: THIS IS A SENSITIVITY ANALYSIS")        print("‚ö†Ô∏è  "*25)        print()        print("Trim-and-fill should be used to assess HOW VULNERABLE your")        print("results are to publication bias, NOT to 'correct' your estimate.")        print()        print("The 'filled' estimate shows what results MIGHT look like if")        print("missing studies existed, but this is speculative.")        print()        print("Report BOTH original and filled estimates, and interpret with caution.")        print("="*70)        print()        try:            # Load configuration            if 'ANALYSIS_CONFIG' not in globals():                raise NameError("ANALYSIS_CONFIG not found. Run previous cells first.")            effect_col = ANALYSIS_CONFIG['effect_col']            var_col = ANALYSIS_CONFIG['var_col']            se_col = ANALYSIS_CONFIG['se_col']            es_config = ANALYSIS_CONFIG['es_config']            if 'analysis_data' in globals():                data = analysis_data.copy()            elif 'data_filtered' in globals():                data = data_filtered.copy()            else:                raise ValueError("No data found. Run previous cells first.")            # Clean data            data = data.dropna(subset=[effect_col, var_col])            data = data[data[var_col] > 0]            k = len(data)            print(f"STEP 1: LOADING DATA")            print("-"*70)            print(f"  ‚úì Loaded {k} observations")            print(f"  ‚úì Effect size: {es_config['effect_label']}")            print()            if k < 3:                print("‚ùå ERROR: Need at least 3 studies for trim-and-fill")                return            # Run analysis            print(f"STEP 2: RUNNING TRIM-AND-FILL")            print("-"*70)            print(f"  ‚Ä¢ Estimator: {estimator_widget.value}")            print(f"  ‚Ä¢ Side: {side_widget.value}")            print(f"  ‚Ä¢ Max iterations: {max_iter_widget.value}")            print()            results = trimfill_analysis(                data=data,                effect_col=effect_col,                var_col=var_col,                estimator=estimator_widget.value,                side=side_widget.value,                max_iter=max_iter_widget.value            )            if not results['converged']:                print("  ‚ö†Ô∏è  WARNING: Analysis did not converge within max iterations")            print(f"  ‚úì Analysis complete")            print(f"  ‚úì Detected side: {results['side']}")            print()            # Display results            print("="*70)            print("RESULTS")            print("="*70)            print()            print(f"üìä NUMBER OF STUDIES TRIMMED/FILLED: {results['k0']}")            print()            if results['k0'] == 0:                print("‚úÖ RESULT: No evidence of missing studies detected")                print()                print("Interpretation:")                print("  ‚Ä¢ The trim-and-fill algorithm found no asymmetry suggesting")                print("    missing studies on either side of the funnel plot.")                print("  ‚Ä¢ This provides some reassurance against publication bias,")                print("    though it does NOT prove bias is absent.")                print("  ‚Ä¢ Other bias assessment methods should also be considered.")            else:                print(f"‚ö†Ô∏è  RESULT: {results['k0']} studies potentially missing on the {results['side']} side")                print()                # Comparison table                print(f"{'Estimate':<30} {'Original':<15} {'After Filling':<15} {'Difference':<15}")                print("-"*75)                print(f"{'k (# studies)':<30} {results['k_original']:<15} {results['k_filled']:<15} {results['k0']:<15}")                print(f"{'Pooled effect':<30} {results['pooled_original']:<15.4f} {results['pooled_filled']:<15.4f} {results['pooled_filled'] - results['pooled_original']:<15.4f}")                print(f"{'Standard error':<30} {results['se_original']:<15.4f} {results['se_filled']:<15.4f} {results['se_filled'] - results['se_original']:<15.4f}")                print(f"{'95% CI lower':<30} {results['ci_lower_original']:<15.4f} {results['ci_lower_filled']:<15.4f} {'‚Äî':<15}")                print(f"{'95% CI upper':<30} {results['ci_upper_original']:<15.4f} {results['ci_upper_filled']:<15.4f} {'‚Äî':<15}")                print()                # Calculate percent change                pct_change = abs((results['pooled_filled'] - results['pooled_original']) / results['pooled_original'] * 100)                print("üéØ INTERPRETATION:")                print()                print(f"  ‚Ä¢ If {results['k0']} studies were missing due to publication bias,")                print(f"    the pooled effect would change by {pct_change:.1f}%")                print()                if pct_change < 10:                    print("  ‚úì Result is relatively ROBUST to potential publication bias")                    print("    (< 10% change in estimate)")                elif pct_change < 25:                    print("  ‚ö†Ô∏è  Result shows MODERATE sensitivity to publication bias")                    print("    (10-25% change in estimate)")                else:                    print("  üî¥ Result shows HIGH sensitivity to publication bias")                    print("    (> 25% change in estimate)")                    print("    Interpret original findings with considerable caution")                # Check if conclusion changes                original_sig = not (results['ci_lower_original'] <= 0 <= results['ci_upper_original'])                filled_sig = not (results['ci_lower_filled'] <= 0 <= results['ci_upper_filled'])                print()                if original_sig != filled_sig:                    print("  ‚ö†Ô∏è  CRITICAL: Statistical significance CHANGES after filling!")                    print("     This suggests results may be heavily influenced by bias.")                else:                    print("  ‚úì Statistical significance does NOT change after filling")            print()            print("="*70)            print("REPORTING GUIDANCE")            print("="*70)            print()            print("When reporting trim-and-fill results:")            print()            print("  1. ‚úì Report it as a SENSITIVITY ANALYSIS, not a correction")            print("  2. ‚úì Report both original and filled estimates")            print("  3. ‚úì Emphasize the ROBUSTNESS interpretation:")            print("       'Results were [robust/sensitive] to potential publication bias'")            print("  4. ‚úì Note the assumptions:")            print("       - Assumes bias is due to small studies only")            print("       - Assumes symmetric funnel plot without bias")            print("       - Cannot distinguish publication bias from other causes")            print("  5. ‚ö†Ô∏è  Do NOT report the filled estimate as your main finding")            print()            # Save results            ANALYSIS_CONFIG['trimfill_results'] = {                'timestamp': datetime.datetime.now(),                'k0': results['k0'],                'side': results['side'],                'estimator': results['estimator'],                'pooled_original': results['pooled_original'],                'pooled_filled': results['pooled_filled'],                'se_original': results['se_original'],                'se_filled': results['se_filled'],                'ci_original': [results['ci_lower_original'], results['ci_upper_original']],                'ci_filled': [results['ci_lower_filled'], results['ci_upper_filled']],                'percent_change': pct_change if results['k0'] > 0 else 0            }            print("  ‚úì Results saved to ANALYSIS_CONFIG['trimfill_results']")            print()            # Plot            if show_plot_widget.value and results['k0'] > 0:                print("="*70)                print("FOREST PLOT")                print("="*70)                print()                plot_trim_fill_forest(                    data=data,                    effect_col=effect_col,                    se_col=se_col,                    results=results,                    es_label=es_config['effect_label']                )        except Exception as e:            print(f"\n‚ùå ERROR: {type(e).__name__}")            print(f"Message: {e}")            traceback.print_exc()# Attach handlerrun_button.on_click(run_trim_fill_analysis)# =============================================================================# DISPLAY UI# =============================================================================help_html = widgets.HTML("""<div style='background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);            padding: 20px; border-radius: 10px; color: white; margin-bottom: 20px;'>    <h2 style='color: white; margin-top: 0;'>üîÑ Trim-and-Fill Sensitivity Analysis</h2>    <p style='font-size: 14px; margin-bottom: 0;'>        Assess how vulnerable your results are to publication bias    </p></div><div style='background-color: #fff3cd; border-left: 4px solid #ff9800;            padding: 15px; margin: 15px 0; border-radius: 4px;'>    <b>‚ö†Ô∏è IMPORTANT:</b> This is a <b>sensitivity analysis</b>, NOT a correction!    <br><br>    <b>Purpose:</b> Estimate how much your results might change IF unpublished studies exist    <br>    <b>Do NOT:</b> Use the "filled" estimate as your final answer    <br>    <b>Do:</b> Report both estimates and discuss robustness</div><div style='background-color: #e7f3ff; padding: 15px; margin: 15px 0; border-radius: 4px;'>    <b>üìö How it works:</b>    <ol style='margin: 5px 0;'>        <li>Detects asymmetry in the funnel plot</li>        <li>Estimates number of "missing" studies (k‚ÇÄ)</li>        <li>Adds mirror-image imputed studies</li>        <li>Recalculates pooled effect with imputed studies</li>        <li>Compares original vs. "filled" estimates</li>    </ol>    <b>Interpretation:</b> If results change little, they're robust to bias.    If results change substantially, interpret with caution.</div>""")config_box = widgets.VBox([    widgets.HTML("<h4 style='color: #2E86AB;'>‚öôÔ∏è Configuration</h4>"),    estimator_widget,    side_widget,    max_iter_widget,    widgets.HTML("<br>"),    show_plot_widget], layout=widgets.Layout(    border='1px solid #ddd',    padding='15px',    margin='10px 0'))# Check prerequisitestry:    if 'ANALYSIS_CONFIG' not in globals() or 'overall_results' not in ANALYSIS_CONFIG:        display(HTML("""        <div style='background-color: #f8d7da; border: 2px solid #f5c6cb;                    padding: 20px; border-radius: 5px; color: #721c24;'>            <h3>‚ùå Prerequisites Not Met</h3>            <p>Please run the following cells first:</p>            <ol>                <li>Cell 8: Overall Meta-Analysis</li>                <li>Cell 7: Effect Size Calculation</li>            </ol>        </div>        """))    else:        display(help_html)        display(config_box)        display(run_button)        display(output_widget)        display(widgets.HTML("""        <div style='background-color: #d4edda; border-left: 4px solid #28a745;                    padding: 12px; margin: 15px 0; border-radius: 4px;'>            ‚úÖ Ready! Configure options above and click the button to run the analysis.        </div>        """))except Exception as e:    print(f"‚ùå Initialization error: {e}")

In [None]:
# ‚ö†Ô∏è PREREQUISITES:# - Must calculate effect sizes first# - Requires hierarchical structure (multiple effects per study OR nested grouping)## Expected runtime: 5-30 seconds depending on dataset size and complexity## INTERPRETATION:# - tau¬≤_level2: Variance between effect sizes within studies# - tau¬≤_level3: Variance between studies# - ICC (Intraclass correlation): Proportion of variance at each level# THREE-LEVEL META-ANALYTIC MODEL## Implementation based on:# - Cheung (2014). Modeling dependent effect sizes with three-level meta-analyses:#   A structural equation modeling approach. Psychological Methods, 19(2), 211-229.# - Van den Noortgate et al. (2013). Three-level meta-analysis of dependent effect sizes.#   Behavior Research Methods, 45(2), 576-594.## Model structure:#   Level 1: Sampling variance within studies (known, vi)#   Level 2: Variance between effect sizes within the same study (tau_squared_level2)#   Level 3: Variance between studies (tau_squared_level3)## Estimation uses REML (Restricted Maximum Likelihood) for unbiased variance estimates.#@title üîÑ LEAVE-ONE-OUT SENSITIVITY (THREE-LEVEL)#@title üîÑ LEAVE-ONE-OUT SENSITIVITY (THREE-LEVEL)# =============================================================================# CELL 13 (ADVANCED REPLACEMENT): THREE-LEVEL LEAVE-ONE-OUT ANALYSIS# Purpose: Assess influence of individual studies on the 3-level pooled effect# Method:  Re-runs the full 3-level REML optimization (from Cell 6.5)#          for each study removed.# Dependencies: Cell 6.5 (for baseline results)# Outputs: 'loo_3level_results' in ANALYSIS_CONFIG, and an influence plot# =============================================================================# --- 0. HELPER FUNCTIONS (COPIED FROM CELL 6.5) ---# We need the full 3-level unconditional model engine heredef _get_three_level_estimates_loo(params, y_all, v_all, N_total, M_studies):    """Core function to calculate 3-level estimates (silent version)"""    try:        tau_sq, sigma_sq = params        if tau_sq < 0 or sigma_sq < 0: return {'log_lik_reml': np.inf}        sum_log_det_Vi = 0.0        sum_S, sum_Sy, sum_ySy = 0.0, 0.0, 0.0        for i in range(M_studies):            y_i, v_i = y_all[i], v_all[i]            A_diag = v_i + sigma_sq            if np.any(A_diag <= 0): return {'log_lik_reml': np.inf}            A_inv_diag = 1.0 / A_diag            log_det_A = np.sum(np.log(A_diag))            sum_A_inv_1 = np.sum(A_inv_diag)            term_S = 1.0 + tau_sq * sum_A_inv_1            if term_S <= 1e-10: return {'log_lik_reml': np.inf}            log_det_Vi = log_det_A + np.log(term_S)            sum_log_det_Vi += log_det_Vi            A_inv_y = A_inv_diag * y_i            sum_A_inv_y = np.sum(A_inv_y)            Vi_inv_y = A_inv_y - (tau_sq * A_inv_diag * sum_A_inv_y) / term_S            Vi_inv_1_vec = A_inv_diag - (tau_sq * A_inv_diag * sum_A_inv_1) / term_S            sum_S += np.sum(Vi_inv_1_vec)            sum_Sy += np.sum(Vi_inv_y)            sum_ySy += np.dot(y_i, Vi_inv_y)        if sum_S <= 1e-10: return {'log_lik_reml': np.inf}        mu_hat = sum_Sy / sum_S        var_mu = 1.0 / sum_S        se_mu = np.sqrt(var_mu)        residual_ss = sum_ySy - 2.0 * mu_hat * sum_Sy + mu_hat**2 * sum_S        log_lik_reml = -0.5 * (sum_log_det_Vi + np.log(sum_S) + residual_ss)        if np.isnan(log_lik_reml): return {'log_lik_reml': np.inf}        return {'mu': mu_hat, 'se_mu': se_mu, 'var_mu': var_mu,                'log_lik_reml': log_lik_reml, 'tau_sq': tau_sq, 'sigma_sq': sigma_sq}    except (FloatingPointError, ValueError, np.linalg.LinAlgError):        return {'log_lik_reml': np.inf}def _negative_log_likelihood_reml_loo(params, y_all, v_all, N_total, M_studies):    """Wrapper for optimizer."""    estimates = _get_three_level_estimates_loo(params, y_all, v_all, N_total, M_studies)    return -estimates['log_lik_reml']def _run_three_level_reml_loo(analysis_data, effect_col, var_col):    """Main optimization function for a single LOO iteration."""    grouped = analysis_data.groupby('id')    y_all = [group[effect_col].values for _, group in grouped]    v_all = [group[var_col].values for _, group in grouped]    N_total = len(analysis_data)    M_studies = len(y_all)    if M_studies < 2:        return None # Not enough studies    try:        # Use simple starting values for speed and stability        tau_sq_start, sigma_sq_start = 0.01, 0.01    except Exception:        tau_sq_start, sigma_sq_start = 0.01, 0.01    initial_params = [max(1e-6, tau_sq_start), max(1e-6, sigma_sq_start)]    bounds = [(1e-6, 100.0), (1e-6, 100.0)]    with warnings.catch_warnings():        warnings.simplefilter("ignore")        optimizer_result = minimize(            _negative_log_likelihood_reml_loo,            x0=initial_params,            args=(y_all, v_all, N_total, M_studies),            method='L-BFGS-B',            bounds=bounds,            options={'ftol': 1e-10, 'gtol': 1e-6, 'maxiter': 500}        )    if not optimizer_result.success:        return None    final_estimates = _get_three_level_estimates_loo(        optimizer_result.x, y_all, v_all, N_total, M_studies    )    return final_estimates# --- 1. WIDGET DEFINITIONS ---header = widgets.HTML(    "<h3 style='color: #2E86AB;'>Three-Level Leave-One-Out Sensitivity Analysis</h3>"    "<p style='color: #666;'><i>Assesses the influence of each individual study on the robust 3-level pooled effect.</i></p>"    "<p style='color: red; font-weight: bold;'>‚ö†Ô∏è This analysis is computationally intensive and may take several minutes to run.</p>")# Plot optionsplot_width_widget = widgets.FloatSlider(    value=10.0, min=6.0, max=14.0, step=0.5,    description='Plot Width:', continuous_update=False,    style={'description_width': '120px'}, layout=widgets.Layout(width='450px'))sort_by_widget = widgets.Dropdown(    options=[('Effect Size', 'effect'), ('Study ID', 'id'), ('Influence (distance from original)', 'influence')],    value='effect', description='Sort By:',    style={'description_width': '120px'}, layout=widgets.Layout(width='450px'))# Export optionssave_pdf_widget = widgets.Checkbox(value=True, description='Save as PDF', indent=False)save_png_widget = widgets.Checkbox(value=True, description='Save as PNG', indent=False)png_dpi_widget = widgets.IntSlider(value=300, min=150, max=600, step=50, description='PNG DPI:',                                   continuous_update=False, style={'description_width': '120px'},                                   layout=widgets.Layout(width='450px'))filename_prefix_widget = widgets.Text(value='LeaveOneOut_3Level', description='Filename Prefix:',                                      layout=widgets.Layout(width='450px'), style={'description_width': '120px'})# --- Assemble Tabs ---plot_tab = widgets.VBox([    widgets.HTML("<h4 style='color: #2E86AB;'>Plot Options</h4>"),    plot_width_widget, sort_by_widget])export_tab = widgets.VBox([    widgets.HTML("<h4 style='color: #2E86AB;'>Export</h4>"),    save_pdf_widget, save_png_widget, png_dpi_widget, filename_prefix_widget])tab = widgets.Tab(children=[plot_tab, export_tab])tab.set_title(0, 'üé® Plot'); tab.set_title(1, 'üíæ Export')run_button = widgets.Button(    description='‚ñ∂ Run 3-Level Leave-One-Out Analysis',    button_style='success',    layout=widgets.Layout(width='450px', height='50px'),    style={'font_weight': 'bold'})analysis_output = widgets.Output()# --- 2. MAIN ANALYSIS FUNCTION (Attached to Button) ---@run_button.on_clickdef run_loo_analysis(b):    with analysis_output:        clear_output(wait=True)        print("="*70)        print("RUNNING THREE-LEVEL LEAVE-ONE-OUT ANALYSIS")        print("="*70)        print(f"Timestamp: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")        print("‚ö†Ô∏è  This is the most computationally intensive cell. Please be patient...")        try:            # --- 1. Load Config and Data ---            print("\nSTEP 1: LOADING CONFIGURATION")            print("---------------------------------")            if 'ANALYSIS_CONFIG' not in globals() or 'three_level_results' not in ANALYSIS_CONFIG:                raise ValueError("Prerequisites not met. Run Cell 6.5 (Three-Level Analysis) first.")            # *** FIX: Use a new local variable name ***            data_for_loo = None            if 'analysis_data' in globals():                data_for_loo = analysis_data.copy()                print(f"  ‚úì Found global 'analysis_data' (Shape: {data_for_loo.shape})")            elif 'data_filtered' in globals():                data_for_loo = data_filtered.copy()                print(f"  ‚úì Found global 'data_filtered' (Shape: {data_for_loo.shape})")            else:                raise ValueError("Data not found (analysis_data or data_filtered).")            # *** END FIX ***            if data_for_loo.empty:                raise ValueError("Analysis data is empty")            effect_col = ANALYSIS_CONFIG['effect_col']            var_col = ANALYSIS_CONFIG['var_col']            es_config = ANALYSIS_CONFIG['es_config']            # Get original 3-level results for comparison            original_results = ANALYSIS_CONFIG['three_level_results']            original_effect = original_results['pooled_effect']            original_ci_lower = original_results['ci_lower']            original_ci_upper = original_results['ci_upper']            original_tau2 = original_results['tau_squared']            original_sigma2 = original_results['sigma_squared']            original_k_studies = original_results['k_studies']            print(f"  ‚úì Loaded {len(data_for_loo)} observations from {original_k_studies} studies.")            print(f"  ‚úì Original 3-Level Effect: {original_effect:.4f} [{original_ci_lower:.4f}, {original_ci_upper:.4f}]")            # --- 2. Get Widget Values ---            plot_width = plot_width_widget.value            sort_by = sort_by_widget.value            save_pdf = save_pdf_widget.value            save_png = save_png_widget.value            png_dpi = png_dpi_widget.value            filename_prefix = filename_prefix_widget.value            # --- 3. Run Leave-One-Out Loop ---            print("\nSTEP 2: RUNNING LEAVE-ONE-OUT ITERATIONS")            print("---------------------------------")            # *** FIX: Use the new local variable name ***            removal_ids = data_for_loo['id'].unique()            loo_results = []            for i, remove_id in enumerate(removal_ids):                print(f"  Running analysis {i+1}/{len(removal_ids)}: Removing study '{remove_id}'...")                df_loo = data_for_loo[data_for_loo['id'] != remove_id].copy()                # *** END FIX ***                if df_loo['id'].nunique() < 2:                    print(f"    ...Skipped (not enough studies remain)")                    continue                # Run the full 3-level model on the subset                estimates = _run_three_level_reml_loo(df_loo, effect_col, var_col)                if estimates is None:                    print(f"    ...REML failed to converge for this subset. Skipping.")                    continue                # Calculate new stats                mu_loo = estimates['mu']                se_loo = estimates['se_mu']                ci_lower_loo = mu_loo - 1.96 * se_loo                ci_upper_loo = mu_loo + 1.96 * se_loo                # Calculate influence                effect_diff = mu_loo - original_effect                # Check significance change                null_val = es_config.get('null_value', 0)                original_is_sig = not (original_ci_lower <= null_val <= original_ci_upper)                loo_is_sig = not (ci_lower_loo <= null_val <= ci_upper_loo)                changes_significance = (original_is_sig != loo_is_sig)                loo_results.append({                    'unit_removed': str(remove_id),                    'k_studies': df_loo['id'].nunique(),                    'k_obs': len(df_loo),                    'pooled_effect': mu_loo,                    'se': se_loo,                    'ci_lower': ci_lower_loo,                    'ci_upper': ci_upper_loo,                    'tau_squared': estimates['tau_sq'],                    'sigma_squared': estimates['sigma_sq'],                    'effect_diff': effect_diff,                    'abs_diff': abs(effect_diff),                    'changes_sig': changes_significance                })            print("  ‚úì Analysis complete")            if len(loo_results) == 0:                raise ValueError("No LOO iterations were successful.")            results_df = pd.DataFrame(loo_results)            # --- 4. Analyze and Display Results ---            print("\n" + "="*70)            print("LEAVE-ONE-OUT RESULTS SUMMARY")            print("="*70)            min_effect = results_df['pooled_effect'].min()            max_effect = results_df['pooled_effect'].max()            effect_range = max_effect - min_effect            print(f"\nüìä Effect Size Range:")            print(f"  ‚Ä¢ Original: {original_effect:.4f} [{original_ci_lower:.4f}, {original_ci_upper:.4f}]")            print(f"  ‚Ä¢ Minimum:  {min_effect:.4f} (when removing '{results_df.loc[results_df['pooled_effect'].idxmin(), 'unit_removed']}')")            print(f"  ‚Ä¢ Maximum:  {max_effect:.4f} (when removing '{results_df.loc[results_df['pooled_effect'].idxmax(), 'unit_removed']}')")            # Find most influential studies            top_influential = results_df.nlargest(min(3, len(results_df)), 'abs_diff')            print(f"\nüîç Most Influential Studies (by effect change):")            for _, row in top_influential.iterrows():                direction = "increases" if row['effect_diff'] > 0 else "decreases"                print(f"  ‚Ä¢ {row['unit_removed']}: Effect {direction} by {row['abs_diff']:.4f}")            # Check for significance changes            sig_changers = results_df[results_df['changes_sig'] == True]            if len(sig_changers) > 0:                print(f"\nüî¥ WARNING: {len(sig_changers)} study/studies change significance when removed:")                for _, row in sig_changers.iterrows():                    print(f"  ‚Ä¢ {row['unit_removed']}: New CI [{row['ci_lower']:.4f}, {row['ci_upper']:.4f}]")                print(f"  ‚ö†Ô∏è  Results are sensitive to these studies.")                print(f"     Consider investigating these studies more closely")            else:                print(f"\n‚úì STABLE: No single study changes the overall statistical significance.")            # --- 5. Create Plot ---            print("\nSTEP 3: GENERATING PLOT")            print("---------------------------------")            # Sort for plotting            if sort_by == 'effect':                plot_df = results_df.sort_values('pooled_effect')            elif sort_by == 'influence':                plot_df = results_df.sort_values('abs_diff', ascending=False)            else:                plot_df = results_df.sort_values('unit_removed')            plot_df = plot_df.reset_index(drop=True)            plot_height_adj = max(6, len(plot_df) * 0.3 + 2)            fig, ax = plt.subplots(figsize=(plot_width, plot_height_adj))            y_positions = np.arange(len(plot_df))            # Plot LOO effects with CIs            for idx, (_, row) in enumerate(plot_df.iterrows()):                color = 'red' if row['changes_sig'] else 'blue'                ax.errorbar(                    x=row['pooled_effect'], y=y_positions[idx],                    xerr=[[row['pooled_effect'] - row['ci_lower']], [row['ci_upper'] - row['pooled_effect']]],                    fmt='o', capsize=3, color=color, ecolor=color, mfc=color,                    mec='black', markersize=5, linewidth=1.5, zorder=3                )            # --- Add Legend & Reference Lines ---            legend_elements = [                Line2D([0], [0], marker='o', color='w', markerfacecolor='blue', markeredgecolor='black', markersize=8, label='No significance change'),                Line2D([0], [0], marker='o', color='w', markerfacecolor='red', markeredgecolor='black', markersize=8, label='Changes significance'),                Line2D([0], [0], color='darkred', linestyle='--', linewidth=2, label=f'Original Effect ({original_effect:.3f})'),                plt.Rectangle((0, 0), 1, 1, fc='red', alpha=0.1, label='Original 95% CI')            ]            ax.axvline(x=original_effect, color='darkred', linestyle='--', linewidth=2, zorder=1)            ax.axvspan(original_ci_lower, original_ci_upper, color='red', alpha=0.1, zorder=0)            ax.axvline(x=es_config.get('null_value', 0), color='gray', linestyle='-', linewidth=1, alpha=0.5, zorder=0)            ax.set_yticks(y_positions)            ax.set_yticklabels(plot_df['unit_removed'], fontsize=8)            ax.set_xlabel(f"Pooled Effect ({es_config['effect_label']})", fontsize=12, fontweight='bold')            ax.set_ylabel(f"Study Removed", fontsize=12, fontweight='bold')            ax.set_title("Three-Level Leave-One-Out Sensitivity Analysis", fontsize=14, fontweight='bold', pad=15)            ax.legend(handles=legend_elements, loc='best', fontsize=10, framealpha=0.9)            ax.grid(axis='x', linestyle=':', alpha=0.4)            fig.tight_layout()            # --- 6. Save Files ---            print("\nSTEP 4: SAVING FILES")            print("---------------------------------")            timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")            base_filename = f"{filename_prefix}_{timestamp}"            saved_files = []            if save_pdf:                pdf_filename = f"{base_filename}.pdf"                fig.savefig(pdf_filename, bbox_inches='tight')                saved_files.append(pdf_filename)                print(f"  ‚úì {pdf_filename}")            if save_png:                png_filename = f"{base_filename}.png"                fig.savefig(png_filename, dpi=png_dpi, bbox_inches='tight')                saved_files.append(png_filename)                print(f"  ‚úì {png_filename} (DPI: {png_dpi})")            plt.show()            print(f"\n" + "="*70)            print("‚úÖ 3-LEVEL LEAVE-ONE-OUT COMPLETE")            print("="*70)            # --- 7. Save Results ---            ANALYSIS_CONFIG['loo_3level_results'] = {                'timestamp': datetime.datetime.now(),                'results_df': results_df,                'removal_unit': 'study',                'original_effect': original_effect,                'effect_range': effect_range,                'n_sig_changers': len(sig_changers)            }            print(f"‚úì Results saved to ANALYSIS_CONFIG['loo_3level_results']")        except Exception as e:            print(f"\n‚ùå AN ERROR OCCURRED:\n")            print(f"  Type: {type(e).__name__}")            print(f"  Message: {e}")            print("\n  Traceback:")            traceback.print_exc(file=sys.stdout)            print("\n" + "="*70)            print("ANALYSIS FAILED. See error message above.")            print("Please check your data and configuration.")            print("="*70)# --- 6. DISPLAY WIDGETS ---try:    if 'ANALYSIS_CONFIG' not in globals() or 'three_level_results' not in ANALYSIS_CONFIG:        print("="*70)        print("‚ö†Ô∏è  PREREQUISITE NOT MET")        print("="*70)        print("Please run Cell 6.5 (Three-Level Meta-Analysis) successfully before running this cell.")    elif ANALYSIS_CONFIG['three_level_results'].get('status') != 'completed':         print("="*70)         print("‚ö†Ô∏è  PREREQUISITE NOT MET")         print("="*70)         print("Cell 6.5 (Three-Level Meta-Analysis) must be run successfully first.")    else:        print("="*70)        print("‚úÖ 3-LEVEL LEAVE-ONE-OUT INTERFACE READY")        print("="*70)        print("  ‚úì This will re-run the 3-level model for each study removed.")        print("  ‚úì Customize plot options and click 'Run'.")        display(widgets.VBox([            header,            widgets.HTML("<hr style='margin: 15px 0;'>"),            widgets.HTML("<b>Plot Options:</b>"),            tab,            widgets.HTML("<hr style='margin: 15px 0;'>"),            run_button,            analysis_output        ]))except Exception as e:    print(f"‚ùå An error occurred during initialization: {e}")    print("Please ensure the notebook has been run in order.")

In [None]:
# ‚ö†Ô∏è PREREQUISITES:# - Must calculate effect sizes first## Expected runtime: < 5 seconds## INTERPRETATION:# - I¬≤ > 75% indicates considerable heterogeneity (high variability between studies)# - I¬≤ 50-75% = moderate heterogeneity# - I¬≤ < 50% = low heterogeneity# - Significant Q-test (p < 0.05) indicates heterogeneity is present# CLUSTER-ROBUST VARIANCE ESTIMATION## Method from Hedges, Tipton, & Johnson (2010). Robust variance estimation in# meta-regression with dependent effect size estimates. Research Synthesis Methods, 1(1), 39-65.## This approach:# 1. Allows for dependence between effect sizes from the same cluster (study)# 2. Provides valid standard errors even with small number of clusters# 3. Uses t-distribution with degrees of freedom correction## Particularly important when:# - Multiple effect sizes are extracted from the same study# - Effect sizes are nested within higher-level units# - Assumption of independence is violated#@title üìà CUMULATIVE META-ANALYSIS#@title üìà CUMULATIVE META-ANALYSIS# =============================================================================# CELL 14: CUMULATIVE META-ANALYSIS# Purpose: Show how effect sizes evolve chronologically as studies accumulate.# Method:  "Two-Step" Approach for clustered data:#          1. Aggregate effects within each study (if 'By Study' selected)#          2. Perform cumulative Random-Effects meta-analysis over time# Dependencies: Cell 6 (overall_results), Cell 5 (data)# Outputs: Cumulative forest plot and stability metrics# =============================================================================print("="*70)print("CUMULATIVE META-ANALYSIS")print("="*70)# --- 1. HELPER FUNCTIONS ---def calculate_tau_squared_dl(df, effect_col, var_col):    """    Calculate Tau-squared. Uses Global Advanced Estimator (Cell 4.5) if available,    otherwise falls back to DerSimonian-Laird (DL).    """    k = len(df)    if k < 2: return 0.0    # Try using the advanced REML estimator from Cell 4.5 first    if 'calculate_tau_squared' in globals():        tau_method = 'REML' # Prefer REML for consistency        try:            tau_sq, info = calculate_tau_squared(df, effect_col, var_col, method=tau_method)            if info.get('success', True):                return tau_sq        except Exception:            pass # Fall back to DL if REML fails (common in small cumulative steps)    # Classic DL Method (Fallback)    try:        w_fixed = 1 / df[var_col]        sum_w = w_fixed.sum()        if sum_w <= 0: return 0.0        pooled_effect = (w_fixed * df[effect_col]).sum() / sum_w        Qt = (w_fixed * (df[effect_col] - pooled_effect)**2).sum()        df_Q = k - 1        sum_w_sq = (w_fixed**2).sum()        C = sum_w - (sum_w_sq / sum_w)        if C > 0 and Qt > df_Q:            tau_squared = (Qt - df_Q) / C        else:            tau_squared = 0.0        return max(0.0, tau_squared)    except Exception:        return 0.0def calculate_re_pooled(df, tau_squared, effect_col, var_col, alpha=0.05):    """Calculate Random-Effects pooled estimate with CI"""    k = len(df)    if k < 1: return np.nan, np.nan, np.nan, np.nan, np.nan    try:        w_re = 1 / (df[var_col] + tau_squared)        sum_w_re = w_re.sum()        if sum_w_re <= 0: return np.nan, np.nan, np.nan, np.nan, np.nan        pooled_effect = (w_re * df[effect_col]).sum() / sum_w_re        pooled_var = 1 / sum_w_re        pooled_se = np.sqrt(pooled_var)        z_crit = norm.ppf(1 - alpha / 2)        ci_lower = pooled_effect - z_crit * pooled_se        ci_upper = pooled_effect + z_crit * pooled_se        # Calculate I-squared        w_fixed = 1 / df[var_col]        sum_w_fixed = w_fixed.sum()        pooled_effect_fe = (w_fixed * df[effect_col]).sum() / sum_w_fixed        Q = (w_fixed * (df[effect_col] - pooled_effect_fe)**2).sum()        df_Q = k - 1        I_sq = max(0, ((Q - df_Q) / Q) * 100) if Q > 0 else 0        return pooled_effect, pooled_se, ci_lower, ci_upper, I_sq    except Exception:        return np.nan, np.nan, np.nan, np.nan, np.nan# --- 2. LOAD CONFIGURATION ---try:    if 'ANALYSIS_CONFIG' not in locals() and 'ANALYSIS_CONFIG' not in globals():        raise NameError("ANALYSIS_CONFIG not found.")    if 'analysis_data' in ANALYSIS_CONFIG:        analysis_data = ANALYSIS_CONFIG['analysis_data']    elif 'data_filtered' in globals():        analysis_data = data_filtered    else:        raise ValueError("Cannot find analysis data")    if analysis_data.empty:        raise ValueError("Analysis data is empty")    effect_col = ANALYSIS_CONFIG['effect_col']    var_col = ANALYSIS_CONFIG['var_col']    es_config = ANALYSIS_CONFIG['es_config']    overall_results = ANALYSIS_CONFIG['overall_results']    if 'year' not in analysis_data.columns:        raise ValueError("'year' column not found. Ensure data has publication years.")    # Clean year data    analysis_data_with_year = analysis_data.copy()    analysis_data_with_year['year'] = pd.to_numeric(analysis_data_with_year['year'], errors='coerce')    analysis_data_with_year = analysis_data_with_year.dropna(subset=['year'])    if len(analysis_data_with_year) < 2:        raise ValueError(f"Insufficient data with valid years. Need at least 2.")    n_studies = analysis_data_with_year['id'].nunique()    n_obs = len(analysis_data_with_year)    year_range = (int(analysis_data_with_year['year'].min()), int(analysis_data_with_year['year'].max()))    print(f"‚úì Configuration loaded")    print(f"  Effect size: {es_config['effect_label']}")    print(f"  Data: {n_obs} observations from {n_studies} studies")    print(f"  Year range: {year_range[0]} - {year_range[1]}")except (NameError, KeyError, ValueError) as e:    print(f"‚ùå ERROR: {e}")    print("  Please ensure Cells 1-6 have been run.")    raise# --- 3. CREATE WIDGETS ---header = widgets.HTML(    "<h3 style='color: #2E86AB;'>Cumulative Meta-Analysis Setup</h3>"    "<p style='color: #666;'><i>Visualize how pooled effect sizes change as evidence accumulates over time</i></p>")sort_order_widget = widgets.RadioButtons(    options=[('Chronological (oldest first)', 'ascending'), ('Reverse Chronological (newest first)', 'descending')],    value='ascending', description='Sort Order:', style={'description_width': '120px'}, layout=widgets.Layout(width='500px'))unit_widget = widgets.RadioButtons(    options=[('By Study (aggregate first - Recommended)', 'study'), ('By Observation (ignore clustering)', 'observation')],    value='study', description='Aggregation:', style={'description_width': '120px'}, layout=widgets.Layout(width='500px'))show_title_widget = widgets.Checkbox(value=True, description='Show Plot Title', indent=False, layout=widgets.Layout(width='450px'))title_widget = widgets.Text(value=f'Cumulative Meta-Analysis: {es_config["effect_label"]} Over Time', description='Title:', layout=widgets.Layout(width='500px'), style={'description_width': '120px'})xlabel_widget = widgets.Text(value='Year', description='X-Axis Label:', layout=widgets.Layout(width='500px'), style={'description_width': '120px'})ylabel_widget = widgets.Text(value=es_config['effect_label'], description='Y-Axis Label:', layout=widgets.Layout(width='500px'), style={'description_width': '120px'})plot_width_widget = widgets.FloatSlider(value=12.0, min=8.0, max=16.0, step=0.5, description='Plot Width:', continuous_update=False, style={'description_width': '120px'}, layout=widgets.Layout(width='450px'))plot_height_widget = widgets.FloatSlider(value=8.0, min=4.0, max=12.0, step=0.5, description='Plot Height:', continuous_update=False, style={'description_width': '120px'}, layout=widgets.Layout(width='450px'))show_ci_widget = widgets.Checkbox(value=True, description='Show 95% Confidence Intervals', indent=False, layout=widgets.Layout(width='450px'))show_null_widget = widgets.Checkbox(value=True, description='Show Null Effect Line', indent=False, layout=widgets.Layout(width='450px'))show_final_widget = widgets.Checkbox(value=True, description='Highlight Final Effect (dashed line)', indent=False, layout=widgets.Layout(width='450px'))show_i2_widget = widgets.Checkbox(value=False, description='Show I¬≤ Trajectory (secondary axis)', indent=False, layout=widgets.Layout(width='450px'))line_color_widget = widgets.Dropdown(options=['blue', 'red', 'black', 'green', 'purple', 'orange'], value='blue', description='Line Color:', style={'description_width': '120px'}, layout=widgets.Layout(width='450px'))line_width_widget = widgets.FloatSlider(value=2.0, min=0.5, max=4.0, step=0.5, description='Line Width:', continuous_update=False, style={'description_width': '120px'}, layout=widgets.Layout(width='450px'))ci_alpha_widget = widgets.FloatSlider(value=0.3, min=0.1, max=0.8, step=0.1, description='CI Transparency:', continuous_update=False, style={'description_width': '120px'}, layout=widgets.Layout(width='450px'))marker_size_widget = widgets.IntSlider(value=50, min=20, max=200, step=10, description='Marker Size:', continuous_update=False, style={'description_width': '120px'}, layout=widgets.Layout(width='450px'))save_pdf_widget = widgets.Checkbox(value=True, description='Save as PDF', indent=False, layout=widgets.Layout(width='450px'))save_png_widget = widgets.Checkbox(value=True, description='Save as PNG', indent=False, layout=widgets.Layout(width='450px'))png_dpi_widget = widgets.IntSlider(value=300, min=150, max=600, step=50, description='PNG DPI:', continuous_update=False, style={'description_width': '120px'}, layout=widgets.Layout(width='450px'))show_table_widget = widgets.Checkbox(value=True, description='Show detailed results table', indent=False, layout=widgets.Layout(width='450px'))tab1 = widgets.VBox([widgets.HTML("<h4 style='color: #2E86AB;'>Analysis Options</h4>"), sort_order_widget, widgets.HTML("<hr style='margin: 10px 0;'>"), unit_widget])tab2 = widgets.VBox([widgets.HTML("<h4 style='color: #2E86AB;'>Labels & Dimensions</h4>"), show_title_widget, title_widget, widgets.HTML("<hr style='margin: 10px 0;'>"), xlabel_widget, ylabel_widget, widgets.HTML("<hr style='margin: 10px 0;'>"), plot_width_widget, plot_height_widget])tab3 = widgets.VBox([widgets.HTML("<h4 style='color: #2E86AB;'>Visual Elements</h4>"), show_ci_widget, show_null_widget, show_final_widget, show_i2_widget, widgets.HTML("<hr style='margin: 10px 0;'>"), line_color_widget, line_width_widget, ci_alpha_widget, marker_size_widget])tab4 = widgets.VBox([widgets.HTML("<h4 style='color: #2E86AB;'>Export Options</h4>"), save_pdf_widget, save_png_widget, png_dpi_widget, widgets.HTML("<hr style='margin: 10px 0;'>"), show_table_widget])tabs = widgets.Tab(children=[tab1, tab2, tab3, tab4])tabs.set_title(0, '‚öôÔ∏è Analysis'); tabs.set_title(1, 'üìù Labels'); tabs.set_title(2, 'üé® Visuals'); tabs.set_title(3, 'üíæ Export')run_button = widgets.Button(description='‚ñ∂ Run Cumulative Meta-Analysis', button_style='success', layout=widgets.Layout(width='500px', height='50px'), style={'font_weight': 'bold'})analysis_output = widgets.Output()# --- 4. DEFINE ANALYSIS FUNCTION ---def run_cumulative_analysis(b):    with analysis_output:        clear_output(wait=True)        print("\n" + "="*70)        print("CUMULATIVE META-ANALYSIS")        print("="*70)        print(f"Timestamp: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")        try:            # Prepare data            data = analysis_data_with_year.copy()            unit = unit_widget.value            sort_order = sort_order_widget.value            # --- Step 1: Aggregation (Handle Clustering) ---            if unit == 'study':                print(f"‚öôÔ∏è  Aggregating observations by study (Two-Step Approach)...")                # For each study, take the earliest year                study_years = data.groupby('id')['year'].min().reset_index()                study_years.columns = ['id', 'study_year']                data = data.merge(study_years, on='id', how='left')                study_data = []                for study_id in data['id'].unique():                    study_obs = data[data['id'] == study_id]                    study_year = study_obs['study_year'].iloc[0]                    # Pool observations within study using fixed-effects (standard practice)                    if len(study_obs) > 1:                        w_study = 1 / study_obs[var_col]                        sum_w_study = w_study.sum()                        pooled_es = (w_study * study_obs[effect_col]).sum() / sum_w_study                        pooled_var = 1 / sum_w_study                    else:                        pooled_es = study_obs[effect_col].iloc[0]                        pooled_var = study_obs[var_col].iloc[0]                    study_data.append({                        'id': study_id,                        'year': study_year,                        effect_col: pooled_es,                        var_col: pooled_var,                        'n_obs': len(study_obs)                    })                data_sorted = pd.DataFrame(study_data)                print(f"  ‚úì Aggregated {len(data)} observations into {len(data_sorted)} studies")            else:                # Use observations directly (less robust)                data_sorted = data[[effect_col, var_col, 'year', 'id']].copy()                data_sorted['n_obs'] = 1            # --- Step 2: Cumulative Analysis ---            data_sorted = data_sorted.sort_values('year', ascending=(sort_order == 'ascending'))            data_sorted = data_sorted.reset_index(drop=True)            n_units = len(data_sorted)            print(f"\n‚öôÔ∏è  Running cumulative analysis on {n_units} {unit}s...")            cumulative_results = []            for i in range(1, n_units + 1):                df_cum = data_sorted.iloc[:i].copy()                tau2_cum = calculate_tau_squared_dl(df_cum, effect_col, var_col)                effect_cum, se_cum, ci_lower_cum, ci_upper_cum, I2_cum = calculate_re_pooled(                    df_cum, tau2_cum, effect_col, var_col                )                cumulative_results.append({                    'step': i,                    'year': df_cum['year'].iloc[-1],                    'id_added': df_cum['id'].iloc[-1],                    'n_studies': df_cum['id'].nunique(),                    'pooled_effect': effect_cum,                    'ci_lower': ci_lower_cum,                    'ci_upper': ci_upper_cum,                    'I_squared': I2_cum                })                if i % 10 == 0 or i == n_units: print(f"  Progress: {i}/{n_units}", end='\r')            print(f"\n  ‚úì Analysis complete")            results_df = pd.DataFrame(cumulative_results)            # --- Step 3: Display Table ---            if show_table_widget.value:                print(f"\n" + "="*70)                print("CUMULATIVE RESULTS TABLE")                print("="*70)                print(f"\n{'Step':<5} {'Year':<6} {'N':<4} {'Effect':<10} {'95% CI':<25} {'I¬≤%':<8}")                print("-" * 70)                indices_to_show = (list(range(5)) + list(range(len(results_df)-5, len(results_df)))) if len(results_df) > 10 else range(len(results_df))                last_shown = -1                for idx in indices_to_show:                    if idx >= len(results_df): continue                    if idx - last_shown > 1: print("  ...")                    row = results_df.iloc[idx]                    ci_str = f"[{row['ci_lower']:.4f}, {row['ci_upper']:.4f}]"                    print(f"{int(row['step']):<5} {int(row['year']):<6} {int(row['n_studies']):<4} {row['pooled_effect']:<10.4f} {ci_str:<25} {row['I_squared']:<8.1f}")                    last_shown = idx            # --- Step 4: Create Plot ---            fig, ax1 = plt.subplots(figsize=(plot_width_widget.value, plot_height_widget.value))            ax1.plot(results_df['year'], results_df['pooled_effect'],                     color=line_color_widget.value, linewidth=line_width_widget.value, marker='o',                     markersize=marker_size_widget.value/10, label='Cumulative Effect', zorder=3)            if show_ci_widget.value:                ax1.fill_between(results_df['year'], results_df['ci_lower'], results_df['ci_upper'],                                 color=line_color_widget.value, alpha=ci_alpha_widget.value, label='95% CI', zorder=2)            if show_null_widget.value:                ax1.axhline(y=es_config['null_value'], color='gray', linestyle='--', linewidth=1.5, label='Null Effect', zorder=1)            if show_final_widget.value:                ax1.axhline(y=results_df.iloc[-1]['pooled_effect'], color=line_color_widget.value, linestyle=':',                           linewidth=2, alpha=0.7, label='Final Effect', zorder=1)            ax1.set_xlabel(xlabel_widget.value, fontsize=12, fontweight='bold')            ax1.set_ylabel(ylabel_widget.value, fontsize=12, fontweight='bold')            ax1.grid(True, alpha=0.3)            ax1.legend(loc='upper left', frameon=True)            if show_i2_widget.value:                ax2 = ax1.twinx()                ax2.plot(results_df['year'], results_df['I_squared'], color='orange', linestyle='--', alpha=0.7, label='I¬≤ (%)')                ax2.set_ylabel('Heterogeneity (I¬≤%)', color='orange', fontweight='bold')                ax2.set_ylim(0, 100)                ax2.legend(loc='upper right')            if show_title_widget.value:                plt.title(title_widget.value, fontsize=14, fontweight='bold', pad=20)            plt.tight_layout()            # --- Step 5: Save ---            timestamp = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')            if save_pdf_widget.value:                plt.savefig(f'Cumulative_Meta_{timestamp}.pdf', bbox_inches='tight')                print(f"  ‚úì Saved PDF")            if save_png_widget.value:                plt.savefig(f'Cumulative_Meta_{timestamp}.png', dpi=png_dpi_widget.value, bbox_inches='tight')                print(f"  ‚úì Saved PNG")            plt.show()            ANALYSIS_CONFIG['cumulative_results'] = results_df        except Exception as e:            print(f"\n‚ùå ERROR: {e}")            traceback.print_exc()run_button.on_click(run_cumulative_analysis)display(header)display(tabs)display(run_button)display(analysis_output)print("\n‚úÖ Widget interface ready.")

## Save Your Results

**IMPORTANT**: Google Colab sessions are temporary. Your analyses and outputs are NOT automatically saved when you close this notebook.

### How to Save Your Work

**1. Download Figures**:
- Right-click on any generated plot ‚Üí "Save image as..."
- Plots are generated as high-resolution PNG files suitable for publication

**2. Export Data Tables**:
```python
# Run this code in a new cell to save results to CSV
# Example: Save meta-analysis results
# results_df.to_csv('meta_analysis_results.csv', index=False)
# from google.colab import files
# files.download('meta_analysis_results.csv')
```

**3. Save to Google Drive** (recommended for large projects):
```python
# from google.colab import drive
# drive.mount('/content/drive')
# # Then save files to /content/drive/MyDrive/your_folder/
```

**4. Copy Results from Output Cells**:
- You can copy-paste text output (effect sizes, statistics) directly from cell outputs

### Tips

- **Save frequently** if running multiple analyses
- **Take screenshots** of widget configurations for reproducibility
- **Document your choices** (moderators selected, filters applied) for your methods section

---

## Statistical Methods and References

This notebook implements established statistical methods from the meta-analytic literature. Below are key references for the methods used.

### Three-Level Meta-Analytic Models

- **Cheung, M. W. L. (2014).** Modeling dependent effect sizes with three-level meta-analyses: A structural equation modeling approach. *Psychological Methods*, 19(2), 211-229. https://doi.org/10.1037/a0032968

- **Van den Noortgate, W., L√≥pez-L√≥pez, J. A., Mar√≠n-Mart√≠nez, F., & S√°nchez-Meca, J. (2013).** Three-level meta-analysis of dependent effect sizes. *Behavior Research Methods*, 45(2), 576-594. https://doi.org/10.3758/s13428-012-0261-6

- **Assink, M., & Wibbelink, C. J. (2016).** Fitting three-level meta-analytic models in R: A step-by-step tutorial. *The Quantitative Methods for Psychology*, 12(3), 154-174. https://doi.org/10.20982/tqmp.12.3.p154

### Heterogeneity Estimation Methods

- **DerSimonian, R., & Laird, N. (1986).** Meta-analysis in clinical trials. *Controlled Clinical Trials*, 7(3), 177-188. [DerSimonian-Laird estimator]

- **Viechtbauer, W. (2005).** Bias and efficiency of meta-analytic variance estimators in the random-effects model. *Journal of Educational and Behavioral Statistics*, 30(3), 261-293. [REML, ML, PM, SJ estimators]

- **Veroniki, A. A., Jackson, D., Viechtbauer, W., et al. (2016).** Methods to estimate the between-study variance and its uncertainty in meta-analysis. *Research Synthesis Methods*, 7(1), 55-79. https://doi.org/10.1002/jrsm.1164

### Cluster-Robust Variance Estimation

- **Hedges, L. V., Tipton, E., & Johnson, M. C. (2010).** Robust variance estimation in meta-regression with dependent effect size estimates. *Research Synthesis Methods*, 1(1), 39-65. https://doi.org/10.1002/jrsm.5

- **Pustejovsky, J. E., & Tipton, E. (2021).** Meta-analysis with robust variance estimation: Expanding the range of working models. *Prevention Science*, 23(3), 425-438. https://doi.org/10.1007/s11121-021-01246-3

### Knapp-Hartung Adjustment

- **Knapp, G., & Hartung, J. (2003).** Improved tests for a random effects meta-regression with a single covariate. *Statistics in Medicine*, 22(17), 2693-2710. https://doi.org/10.1002/sim.1482

- **IntHout, J., Ioannidis, J. P., & Borm, G. F. (2014).** The Hartung-Knapp-Sidik-Jonkman method for random effects meta-analysis is straightforward and considerably outperforms the standard DerSimonian-Laird method. *BMC Medical Research Methodology*, 14, 25. https://doi.org/10.1186/1471-2288-14-25

### Effect Size Calculations

- **Hedges, L. V., & Olkin, I. (1985).** *Statistical methods for meta-analysis*. Academic Press. [Hedges' g, variance calculations]

- **Borenstein, M., Hedges, L. V., Higgins, J. P., & Rothstein, H. R. (2009).** *Introduction to meta-analysis*. John Wiley & Sons. [Comprehensive effect size methods]

- **Lajeunesse, M. J. (2011).** On the meta-analysis of response ratios for studies with correlated and multi-group designs. *Ecology*, 92(11), 2049-2055. [Log response ratio in ecology]

### Publication Bias Assessment

- **Egger, M., Smith, G. D., Schneider, M., & Minder, C. (1997).** Bias in meta-analysis detected by a simple, graphical test. *BMJ*, 315(7109), 629-634. [Egger's regression test]

- **Duval, S., & Tweedie, R. (2000).** Trim and fill: A simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. *Biometrics*, 56(2), 455-463. [Trim-and-fill method]

- **Nakagawa, S., Lagisz, M., Jennions, M. D., et al. (2022).** Methods for testing publication bias in ecological and evolutionary meta-analyses. *Methods in Ecology and Evolution*, 13(1), 4-21. https://doi.org/10.1111/2041-210X.13724

### Meta-Regression with Splines

- **Royston, P., & Altman, D. G. (1994).** Regression using fractional polynomials of continuous covariates: Parsimonious parametric modelling. *Journal of the Royal Statistical Society: Series C*, 43(3), 429-467. [Spline methodology]

- **Crippa, A., Discacciati, A., Bottai, M., Spiegelman, D., & Orsini, N. (2019).** One-stage dose-response meta-analysis for aggregated data. *Statistical Methods in Medical Research*, 28(5), 1579-1596. [Spline meta-regression]

### Software and Computational Methods

- **Viechtbauer, W. (2010).** Conducting meta-analyses in R with the metafor package. *Journal of Statistical Software*, 36(3), 1-48. https://doi.org/10.18637/jss.v036.i03

---

## Version History

### Version 1.0.0 (November 2025)
- Initial public release
- Complete refactoring for scientific publication
- Added comprehensive documentation and citations
- Consolidated imports and optimized code structure
- Added user guidance and technical specifications
- Implemented publication-ready visualizations

### Future Enhancements (Planned)
- Support for additional effect size metrics
- Bayesian meta-analysis options
- Enhanced batch processing capabilities
- Integration with additional data sources

---

*Last updated: November 2025*