# Hypothesis Generation Agent

This notebook implements a bivariate hypothesis generation agent that takes user questions and generates testable statistical hypotheses.

In [17]:
# Import required libraries
import os
import json
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from typing import List, Literal

In [18]:
# Load environment variables
load_dotenv()

# Read data dictionary
with open('data/HR_Data_Dictionary.csv', 'r') as f:
    DATA_DICTIONARY = f.read()

# Read KPI documentation for context
with open('data/hr_kpi_documentation.txt', 'r') as f:
    KPI_CONTEXT = f.read()

# Create comprehensive context
DATASET_CONTEXT = """
HR EMPLOYEE ATTRITION DATASET OVERVIEW:

Dataset: hr_employee_attrition
Total Records: 1,470 employees
Purpose: HR analytics data collection for employee attrition analysis

KEY METRICS AND DEFINITIONS:
""" + KPI_CONTEXT + """

AVAILABLE VARIABLES (35 columns):
- Demographics: age, gender, maritalstatus, education, educationfield
- Job Information: department, jobrole, joblevel, businesstravel, overtime
- Compensation: monthlyincome, dailyrate, hourlyrate, monthlyrate, percentsalaryhike, stockoptionlevel
- Work Experience: totalworkingyears, yearsatcompany, yearsincurrentrole, yearssincelastpromotion, yearswithcurrmanager, numcompaniesworked
- Satisfaction Metrics: jobsatisfaction, environmentsatisfaction, relationshipsatisfaction, worklifebalance
- Performance & Engagement: performancerating, jobinvolvement, trainingtimeslastyear
- Attrition: attrition (Target variable: Yes/No)
- Other: distancefromhome

VARIABLE TYPES:
- Categorical (TRUE in is_categorical): attrition, businesstravel, department, education, educationfield, 
  environmentsatisfaction, gender, jobinvolvement, joblevel, jobrole, jobsatisfaction, maritalstatus, 
  over18, overtime, performancerating, relationshipsatisfaction, stockoptionlevel, worklifebalance
  
- Numerical (FALSE in is_categorical): age, dailyrate, distancefromhome, employeecount, employeenumber, 
  hourlyrate, monthlyincome, monthlyrate, numcompaniesworked, percentsalaryhike, standardhours, 
  totalworkingyears, trainingtimeslastyear, yearsatcompany, yearsincurrentrole, yearssincelastpromotion, 
  yearswithcurrmanager

ANALYSIS FOCUS AREAS:
1. Employee Attrition Prediction and Prevention
2. Compensation and Salary Analysis
3. Work-Life Balance and Employee Satisfaction
4. Career Development and Promotion Trends
5. Department and Role-based Performance Analysis
6. Gender Diversity and Pay Equity
7. Training and Development Impact
"""

print("✅ Environment loaded")
print(f"✅ Data Dictionary loaded ({len(DATA_DICTIONARY)} characters)")
print(f"✅ KPI Documentation loaded ({len(KPI_CONTEXT)} characters)")
print(f"✅ Dataset Context prepared ({len(DATASET_CONTEXT)} characters)")

✅ Environment loaded
✅ Data Dictionary loaded (6457 characters)
✅ KPI Documentation loaded (2302 characters)
✅ Dataset Context prepared (4175 characters)


In [19]:
# Define the output schema for hypotheses
class BivariateHypothesis(BaseModel):
    """Single bivariate hypothesis with all required fields."""
    
    hypothesis_id: int = Field(description="Unique identifier for the hypothesis (1, 2, 3, ...)")
    
    null_hypothesis: str = Field(
        description="The null hypothesis (H0) stating no relationship or effect exists"
    )
    
    alternative_hypothesis: str = Field(
        description="The alternative hypothesis (H1) stating the expected relationship or effect"
    )
    
    variable_1: str = Field(
        description="First variable name (must exist in data dictionary)"
    )
    
    variable_2: str = Field(
        description="Second variable name (must exist in data dictionary)"
    )
    
    variable_1_type: Literal["categorical", "numerical"] = Field(
        description="Data type of variable 1"
    )
    
    variable_2_type: Literal["categorical", "numerical"] = Field(
        description="Data type of variable 2"
    )
    
    recommended_test: str = Field(
        description="Statistical test to use (e.g., 't-test', 'chi-square', 'ANOVA', 'correlation')"
    )
    
    rationale: str = Field(
        description="Brief explanation of why this hypothesis is relevant to the user's question"
    )


class HypothesisList(BaseModel):
    """List of bivariate hypotheses."""
    hypotheses: List[BivariateHypothesis] = Field(
        description="List of generated bivariate hypotheses"
    )


# Create parser
parser = PydanticOutputParser(pydantic_object=HypothesisList)

print("✅ Output schema defined")
print("\nExpected JSON format:")
print(parser.get_format_instructions()[:500] + "...")

✅ Output schema defined

Expected JSON format:
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"$defs": {"BivariateHypothesis": {"description":...


In [20]:
# Initialize LLM (Local LM Studio)
llm = ChatOpenAI(
    base_url="http://127.0.0.1:1234/v1",
    api_key="lm-studio",
    model="ibm-granite/granite-3.1-8b-instruct",
    temperature=0.3,  # Lower temperature for more consistent hypothesis generation
    max_tokens=4000
)

print("✅ LLM initialized (LM Studio - IBM Granite 3.1 8B)")

✅ LLM initialized (LM Studio - IBM Granite 3.1 8B)


In [21]:
# Create the hypothesis generation prompt
hypothesis_prompt = ChatPromptTemplate.from_messages([
    ("system", 
     "You are an expert statistician and data scientist specializing in hypothesis generation "
     "for employee attrition analysis.\n\n"
     
     "TASK:\n"
     "Generate {num_hypotheses} testable bivariate hypotheses based on the user's question. "
     "Each hypothesis must involve exactly TWO variables from the provided data dictionary.\n\n"
     
     "REQUIREMENTS:\n"
     "1. Each hypothesis MUST use variables that exist in the data dictionary\n"
     "2. Hypotheses should be relevant to the user's research question\n"
     "3. Include both null (H0) and alternative (H1) hypotheses\n"
     "4. Specify the correct statistical test based on variable types:\n"
     "   - Numerical vs Numerical → Correlation (Pearson/Spearman)\n"
     "   - Categorical vs Numerical → t-test or ANOVA\n"
     "   - Categorical vs Categorical → Chi-square test\n"
     "5. Provide clear rationale connecting hypothesis to the user's question\n\n"
     
     "CONTEXT ABOUT THE DATASET:\n"
     "{context}\n\n"
     
     "DATA DICTIONARY:\n"
     "{data_dictionary}\n\n"
     
     "{format_instructions}\n\n"
     
     "IMPORTANT: Return ONLY valid JSON matching the schema. No explanations outside the JSON."),
    
    ("user", 
     "User Question: {user_question}\n\n"
     "Generate {num_hypotheses} bivariate hypotheses to explore this question.")
])

# Create the chain
hypothesis_chain = hypothesis_prompt | llm | parser

print("✅ Hypothesis generation chain created")

✅ Hypothesis generation chain created


In [22]:
# Define the hypothesis generation function
def generate_hypotheses(
    user_question: str,
    num_hypotheses: int = 3,
    data_dictionary: str = DATA_DICTIONARY,
    context: str = DATASET_CONTEXT
) -> dict:
    """
    Generate bivariate hypotheses based on user question.
    
    Args:
        user_question: The research question from the user
        num_hypotheses: Number of hypotheses to generate (default: 3)
        data_dictionary: CSV data dictionary content
        context: Additional context about the dataset (uses DATASET_CONTEXT by default)
        
    Returns:
        Dictionary containing list of hypotheses in JSON format
    """
    try:
        result = hypothesis_chain.invoke({
            "user_question": user_question,
            "num_hypotheses": num_hypotheses,
            "data_dictionary": data_dictionary,
            "context": context,
            "format_instructions": parser.get_format_instructions()
        })
        
        # Convert the Pydantic model to a plain dict (model_dump replaces the deprecated .dict() in Pydantic v2)
        return result.model_dump()
        
    except Exception as e:
        return {
            "error": str(e),
            "hypotheses": []
        }

print("✅ Hypothesis generation function defined")

✅ Hypothesis generation function defined
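
The prompt asks the model to report each variable's type, but nothing in `generate_hypotheses` checks that the reported type agrees with the VARIABLE TYPES lists in `DATASET_CONTEXT`. The sketch below shows one possible post-processing step that overwrites the model's type labels with the documented ones. The helper `normalize_variable_types` and the two sets are hypothetical names introduced here for illustration; the variable names themselves are copied from `DATASET_CONTEXT`.

```python
# Hypothetical helper, not part of the original notebook.
# Both sets are copied from the VARIABLE TYPES section of DATASET_CONTEXT.
CATEGORICAL_VARS = {
    "attrition", "businesstravel", "department", "education", "educationfield",
    "environmentsatisfaction", "gender", "jobinvolvement", "joblevel", "jobrole",
    "jobsatisfaction", "maritalstatus", "over18", "overtime", "performancerating",
    "relationshipsatisfaction", "stockoptionlevel", "worklifebalance",
}
NUMERICAL_VARS = {
    "age", "dailyrate", "distancefromhome", "employeecount", "employeenumber",
    "hourlyrate", "monthlyincome", "monthlyrate", "numcompaniesworked",
    "percentsalaryhike", "standardhours", "totalworkingyears",
    "trainingtimeslastyear", "yearsatcompany", "yearsincurrentrole",
    "yearssincelastpromotion", "yearswithcurrmanager",
}


def normalize_variable_types(hypothesis: dict) -> dict:
    """Return a copy of the hypothesis with types taken from the documented lists."""
    fixed = dict(hypothesis)
    for key in ("variable_1", "variable_2"):
        name = str(fixed.get(key, "")).lower()
        if name in CATEGORICAL_VARS:
            fixed[f"{key}_type"] = "categorical"
        elif name in NUMERICAL_VARS:
            fixed[f"{key}_type"] = "numerical"
        else:
            raise ValueError(f"Unknown variable: {name!r}")
        fixed[key] = name
    return fixed


# A hypothesis whose first type label disagrees with the documented lists.
example = {
    "variable_1": "jobsatisfaction", "variable_1_type": "numerical",
    "variable_2": "attrition", "variable_2_type": "categorical",
}
print(normalize_variable_types(example)["variable_1_type"])  # categorical
```

Applying a step like this before any statistical test would keep the test selection tied to the data dictionary rather than to the model's own labels.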


## Test the Hypothesis Agent

In [23]:
# Test 1: Employee attrition factors
user_question_1 = "What factors influence employee attrition in the organization?"

print(f"🔍 User Question: {user_question_1}")
print("\n⏳ Generating hypotheses...\n")

result_1 = generate_hypotheses(
    user_question=user_question_1,
    num_hypotheses=3
)

print(json.dumps(result_1, indent=2))

🔍 User Question: What factors influence employee attrition in the organization?

⏳ Generating hypotheses...

{
  "hypotheses": [
    {
      "hypothesis_id": 1,
      "null_hypothesis": "There is no significant relationship between job satisfaction and employee attrition.",
      "alternative_hypothesis": "Employees with lower job satisfaction are more likely to leave the organization.",
      "variable_1": "jobsatisfaction",
      "variable_2": "attrition",
      "variable_1_type": "numerical",
      "variable_2_type": "categorical",
      "recommended_test": "Chi-square test",
      "rationale": "Exploring the relationship between job satisfaction (numerical) and employee attrition (categorical) can reveal if dissatisfied employees are more likely to leave."
    },
    {
      "hypothesis_id": 2,
      "null_hypothesis": "There is no significant correlation between years at company and employee attrition.",
      "alternative_hypothesis": "Employees who have been with the compan

In [24]:
# Test 2: Job satisfaction analysis
user_question_2 = "How does job satisfaction relate to employee performance and retention?"

print(f"🔍 User Question: {user_question_2}")
print("\n⏳ Generating hypotheses...\n")

result_2 = generate_hypotheses(
    user_question=user_question_2,
    num_hypotheses=5
)

print(json.dumps(result_2, indent=2))

🔍 User Question: How does job satisfaction relate to employee performance and retention?

⏳ Generating hypotheses...

{
  "hypotheses": [
    {
      "hypothesis_id": 1,
      "null_hypothesis": "There is no significant relationship between job satisfaction (Variable 2) and employee performance (Variable 1).",
      "alternative_hypothesis": "Higher levels of job satisfaction are associated with better employee performance.",
      "variable_1": "performancerating",
      "variable_2": "jobsatisfaction",
      "variable_1_type": "numerical",
      "variable_2_type": "numerical",
      "recommended_test": "Correlation (Pearson)",
      "rationale": "This hypothesis explores the direct relationship between job satisfaction and employee performance, which could indicate that satisfied employees are more engaged and productive."
    },
    {
      "hypothesis_id": 2,
      "null_hypothesis": "There is no significant difference in attrition rates (Variable 3) between employees with hig

In [25]:
# Test 3: Custom question
user_question_3 = "Does work-life balance impact employee turnover?"

print(f"🔍 User Question: {user_question_3}")
print("\n⏳ Generating hypotheses...\n")

result_3 = generate_hypotheses(
    user_question=user_question_3,
    num_hypotheses=4,
    context="HR dataset with 1,470 employees. Focus on work-life balance and retention patterns."
)

print(json.dumps(result_3, indent=2))

🔍 User Question: Does work-life balance impact employee turnover?

⏳ Generating hypotheses...

{
  "hypotheses": [
    {
      "hypothesis_id": 1,
      "null_hypothesis": "There is no significant relationship between work-life balance and employee turnover.",
      "alternative_hypothesis": "Employees with lower work-life balance ratings are more likely to leave the company.",
      "variable_1": "worklifebalance",
      "variable_2": "attrition",
      "variable_1_type": "numerical",
      "variable_2_type": "categorical",
      "recommended_test": "Logistic Regression",
      "rationale": "This hypothesis explores the relationship between work-life balance (numerical) and employee turnover (categorical). Lower numerical ratings on work-life balance might indicate dissatisfaction, potentially leading to higher attrition rates."
    },
    {
      "hypothesis_id": 2,
      "null_hypothesis": "There is no association between the level of work-life balance satisfaction and the like

## Pretty Print Function

In [26]:
def print_hypotheses(result: dict):
    """Pretty print hypotheses in a readable format."""
    
    if "error" in result:
        print(f"❌ Error: {result['error']}")
        return
    
    hypotheses = result.get("hypotheses", [])
    
    if not hypotheses:
        print("⚠️ No hypotheses generated")
        return
    
    print(f"\n📊 Generated {len(hypotheses)} Hypotheses\n")
    print("=" * 80)
    
    for hyp in hypotheses:
        print(f"\n#{hyp['hypothesis_id']}")
        print("-" * 80)
        print(f"\n🔹 Variables: {hyp['variable_1']} ({hyp['variable_1_type']}) vs {hyp['variable_2']} ({hyp['variable_2_type']})")
        print(f"\n📈 Recommended Test: {hyp['recommended_test']}")
        print(f"\n❌ H0: {hyp['null_hypothesis']}")
        print(f"\n✅ H1: {hyp['alternative_hypothesis']}")
        print(f"\n💡 Rationale: {hyp['rationale']}")
        print()
    
    print("=" * 80)

# Test pretty print
print_hypotheses(result_1)


📊 Generated 3 Hypotheses


#1
--------------------------------------------------------------------------------

🔹 Variables: jobsatisfaction (numerical) vs attrition (categorical)

📈 Recommended Test: Chi-square test

❌ H0: There is no significant relationship between job satisfaction and employee attrition.

✅ H1: Employees with lower job satisfaction are more likely to leave the organization.

💡 Rationale: Exploring the relationship between job satisfaction (numerical) and employee attrition (categorical) can reveal if dissatisfied employees are more likely to leave.


#2
--------------------------------------------------------------------------------

🔹 Variables: yearsatcompany (numerical) vs attrition (categorical)

📈 Recommended Test: Chi-square test

❌ H0: There is no significant correlation between years at company and employee attrition.

✅ H1: Employees who have been with the company for a shorter period are more likely to leave.

💡 Rationale: In

## Save Hypotheses to JSON File

In [27]:
# Save hypotheses to a JSON file
output_file = "generated_hypotheses.json"

with open(output_file, 'w') as f:
    json.dump(result_1, f, indent=2)

print(f"✅ Hypotheses saved to {output_file}")

✅ Hypotheses saved to generated_hypotheses.json
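
Because `generate_hypotheses` returns a dictionary with an `"error"` key when the chain fails, the saved file can contain an error payload instead of usable hypotheses. A minimal sketch of a loader that checks for this before handing hypotheses to a later step is shown below. The function name `load_hypotheses` is hypothetical, and the demo writes a temporary file so the example runs on its own.

```python
import json
import os
import tempfile


def load_hypotheses(path: str) -> list:
    """Load a saved result file and return its list of hypotheses."""
    with open(path, "r") as f:
        payload = json.load(f)
    # generate_hypotheses stores failures under the "error" key
    if "error" in payload:
        raise ValueError(f"Saved result contains an error: {payload['error']}")
    return payload.get("hypotheses", [])


# Demo with a temporary file that mimics the saved structure (illustrative content only).
sample = {"hypotheses": [{"hypothesis_id": 1, "variable_1": "overtime", "variable_2": "attrition"}]}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(sample, tmp)
    tmp_path = tmp.name

loaded = load_hypotheses(tmp_path)
os.remove(tmp_path)
print(len(loaded))  # 1
```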


# Stats Agent

| Type of Question | Example Hypothesis | Statistical Test Used | What It Checks |
| --- | --- | --- | --- |
| **Are two or more groups different?** | "Do different marketing campaigns produce different sales?" | **ANOVA (Analysis of Variance)** | Compares means across multiple groups. |
| **Are two categorical variables related?** | "Is there a relationship between gender and purchase preference?" | **Chi-Square Test** | Checks if categories are dependent or independent. |
| **Are two continuous variables correlated?** | "Does customer age correlate with spending?" | **Pearson Correlation** | Checks linear correlation (sensitive to outliers). |
| **Are two ranked/nonlinear variables correlated?** | "Does satisfaction rating (1–5) increase with income?" | **Spearman Correlation** | Checks monotonic relationships (nonlinear or ordinal data). |

This agent takes a single hypothesis and executes the appropriate statistical test based on variable types.
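
To make the table concrete, the following sketch runs each of the four tests on small synthetic arrays using the same SciPy functions the notebook imports. All numbers are made up for illustration and have nothing to do with the HR dataset. It is meant to show what each test responds to, not to reproduce the agent's logic.

```python
import numpy as np
from scipy.stats import chi2_contingency, f_oneway, pearsonr, spearmanr

# Synthetic data, invented purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = x ** 3  # strictly increasing, but not linear in x

# Pearson measures linear association; Spearman works on ranks.
r, p_pearson = pearsonr(x, y)
rho, p_spearman = spearmanr(x, y)
# rho is 1 (up to floating point) because the ranks agree exactly,
# while r stays below 1 because the relationship is curved.
print(f"Pearson r = {r:.4f}, Spearman rho = {rho:.4f}")

# Chi-square on a contingency table whose rows have identical distributions:
# observed counts equal expected counts, so the statistic is 0 and p is 1.
table = np.array([[20, 20], [20, 20]])
chi2, p_chi2, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.4f}, p = {p_chi2:.4f}, dof = {dof}")

# One-way ANOVA on three groups where one group mean is clearly shifted.
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 3.0, 4.0]
group_c = [10.0, 11.0, 12.0]
f_stat, p_anova = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_anova:.6f}")
```

The gap between `r` and `rho` on the cubic data is the reason the table lists Spearman for ordinal or nonlinear relationships.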

In [28]:
# Import additional libraries for statistical testing
import pandas as pd
import numpy as np
from scipy import stats
from scipy.stats import chi2_contingency, pearsonr, spearmanr, f_oneway, ttest_ind
from sqlalchemy import create_engine
from urllib.parse import quote_plus
import warnings
warnings.filterwarnings('ignore')

print("✅ Statistical testing libraries imported")
print("✅ Database connection libraries imported")

✅ Statistical testing libraries imported
✅ Database connection libraries imported


In [33]:
# Load the HR dataset from PostgreSQL
from sqlalchemy import create_engine
from urllib.parse import quote_plus

# Build PostgreSQL connection URL
encoded_pw = quote_plus(os.getenv("DB_PASSWORD", ""))  # default to "" so a missing variable does not raise TypeError
postgres_url = (
    f"postgresql+psycopg2://{os.getenv('DB_USER')}:{encoded_pw}"
    f"@{os.getenv('DB_HOST')}:{os.getenv('DB_PORT')}/{os.getenv('DB_NAME')}"
)

# Create database engine
engine = create_engine(postgres_url)

# Load HR employee attrition data from PostgreSQL
# Based on database inspection, using public.hr_employee_attrition
schema_name = 'public'  # Table is in public schema
table_name = 'hr_employee_attrition'

# Use read_sql_query with fully qualified table name
query = f'SELECT * FROM "{schema_name}"."{table_name}"'
df = pd.read_sql_query(query, engine)

# Convert column names to lowercase for consistency
df.columns = df.columns.str.lower()

print(f"✅ Dataset loaded from PostgreSQL: {df.shape[0]} rows, {df.shape[1]} columns")
print(f"📊 Table: {schema_name}.{table_name}")
print(f"\n📋 Sample columns: {list(df.columns[:10])}")

✅ Dataset loaded from PostgreSQL: 1470 rows, 35 columns
📊 Table: public.hr_employee_attrition

📋 Sample columns: ['age', 'attrition', 'businesstravel', 'dailyrate', 'department', 'distancefromhome', 'education', 'educationfield', 'employeecount', 'employeenumber']


In [34]:
# Check available schemas and tables in PostgreSQL
from sqlalchemy import inspect, text

# Build PostgreSQL connection URL
encoded_pw = quote_plus(os.getenv("DB_PASSWORD", ""))  # default to "" so a missing variable does not raise TypeError
postgres_url = (
    f"postgresql+psycopg2://{os.getenv('DB_USER')}:{encoded_pw}"
    f"@{os.getenv('DB_HOST')}:{os.getenv('DB_PORT')}/{os.getenv('DB_NAME')}"
)

# Create database engine
engine = create_engine(postgres_url)

print("🔍 Inspecting PostgreSQL Database...\n")

# Get inspector
inspector = inspect(engine)

# List all schemas
schemas = inspector.get_schema_names()
print(f"📋 Available Schemas: {schemas}\n")

# List tables in each schema
for schema in schemas:
    tables = inspector.get_table_names(schema=schema)
    if tables:
        print(f"\n📊 Tables in schema '{schema}':")
        for table in tables:
            print(f"   • {table}")

print("\n" + "="*80)
print("💡 Use the schema and table name shown above in the next cell")
print("="*80)

🔍 Inspecting PostgreSQL Database...

📋 Available Schemas: ['hr_data', 'information_schema', 'public']


📊 Tables in schema 'hr_data':
   • employee_attrition

📊 Tables in schema 'information_schema':
   • sql_features
   • sql_implementation_info
   • sql_parts
   • sql_sizing

📊 Tables in schema 'public':
   • hr_employee_attrition

💡 Use the schema and table name shown above in the next cell


### Alternative: Load with Custom SQL Query

You can also load data using a custom SQL query instead of loading the entire table:

```python
# Example: Load with custom query
query = "SELECT * FROM hr_employee_attrition WHERE attrition = 'Yes'"
df = pd.read_sql_query(query, engine)
```
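
When the filter value comes from user input rather than a literal, it is generally safer to pass it as a query parameter than to format it into the SQL string. The sketch below illustrates the idea with an in-memory SQLite database standing in for the PostgreSQL table, so it runs without a server. The rows are invented, and the `?` placeholder is SQLite's style; with the notebook's SQLAlchemy engine the placeholder syntax would differ (for example, named parameters via `sqlalchemy.text`), which is an assumption to verify against your driver.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database used only as a stand-in for the PostgreSQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hr_employee_attrition (age INTEGER, attrition TEXT)")
conn.executemany(
    "INSERT INTO hr_employee_attrition VALUES (?, ?)",
    [(30, "Yes"), (45, "No"), (52, "Yes")],  # invented rows for illustration
)

# The filter value is supplied through params, not interpolated into the SQL text.
query = "SELECT * FROM hr_employee_attrition WHERE attrition = ?"
leavers = pd.read_sql_query(query, conn, params=("Yes",))
conn.close()

print(len(leavers))  # 2
```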

In [35]:
class StatsAgent:
    """
    Statistical Testing Agent that executes appropriate tests based on hypothesis variable types.
    """
    
    def __init__(self, dataframe: pd.DataFrame):
        """
        Initialize the Stats Agent with a dataset.
        
        Args:
            dataframe: The dataset to perform statistical tests on
        """
        self.df = dataframe
        self.results = {}
    
    def chi_square_test(self, var1: str, var2: str) -> dict:
        """
        Perform Chi-Square test for two categorical variables.
        
        Args:
            var1: First categorical variable
            var2: Second categorical variable
            
        Returns:
            Dictionary containing test results
        """
        try:
            # Create contingency table
            contingency_table = pd.crosstab(self.df[var1], self.df[var2])
            
            # Perform chi-square test
            chi2, p_value, dof, expected_freq = chi2_contingency(contingency_table)
            
            # Calculate Cramér's V (effect size)
            n = contingency_table.sum().sum()
            min_dim = min(contingency_table.shape) - 1
            cramers_v = np.sqrt(chi2 / (n * min_dim)) if min_dim > 0 else 0
            
            return {
                "test_name": "Chi-Square Test of Independence",
                "test_type": "categorical_vs_categorical",
                "variable_1": var1,
                "variable_2": var2,
                "chi2_statistic": round(chi2, 4),
                "p_value": round(p_value, 6),
                "degrees_of_freedom": int(dof),
                "cramers_v": round(cramers_v, 4),
                "sample_size": int(n),
                "interpretation": self._interpret_p_value(p_value),
                "effect_size_interpretation": self._interpret_cramers_v(cramers_v),
                "contingency_table": contingency_table.to_dict()
            }
        except Exception as e:
            return {"error": f"Chi-square test failed: {str(e)}"}
    
    def t_test(self, categorical_var: str, numerical_var: str) -> dict:
        """
        Perform Independent T-Test for categorical (2 groups) vs numerical variable.
        
        Args:
            categorical_var: Categorical variable with 2 groups
            numerical_var: Numerical variable
            
        Returns:
            Dictionary containing test results
        """
        try:
            # Get unique groups, ignoring missing values so NaN is not counted as a group
            groups = self.df[categorical_var].dropna().unique()
            
            if len(groups) != 2:
                return {
                    "error": f"T-test requires exactly 2 groups, found {len(groups)}. Use ANOVA instead."
                }
            
            # Split data by groups
            group1_data = self.df[self.df[categorical_var] == groups[0]][numerical_var].dropna()
            group2_data = self.df[self.df[categorical_var] == groups[1]][numerical_var].dropna()
            
            # Perform t-test
            t_stat, p_value = ttest_ind(group1_data, group2_data)
            
            # Calculate Cohen's d (effect size)
            cohens_d = self._calculate_cohens_d(group1_data, group2_data)
            
            return {
                "test_name": "Independent Samples T-Test",
                "test_type": "categorical_vs_numerical",
                "categorical_variable": categorical_var,
                "numerical_variable": numerical_var,
                "group_1": str(groups[0]),
                "group_2": str(groups[1]),
                "group_1_mean": round(group1_data.mean(), 4),
                "group_2_mean": round(group2_data.mean(), 4),
                "group_1_std": round(group1_data.std(), 4),
                "group_2_std": round(group2_data.std(), 4),
                "group_1_n": int(len(group1_data)),
                "group_2_n": int(len(group2_data)),
                "t_statistic": round(t_stat, 4),
                "p_value": round(p_value, 6),
                "cohens_d": round(cohens_d, 4),
                "interpretation": self._interpret_p_value(p_value),
                "effect_size_interpretation": self._interpret_cohens_d(cohens_d)
            }
        except Exception as e:
            return {"error": f"T-test failed: {str(e)}"}
    
    def anova_test(self, categorical_var: str, numerical_var: str) -> dict:
        """
        Perform One-Way ANOVA for categorical (3+ groups) vs numerical variable.
        
        Args:
            categorical_var: Categorical variable with 3 or more groups
            numerical_var: Numerical variable
            
        Returns:
            Dictionary containing test results
        """
        try:
            # Get unique groups, ignoring missing values so NaN is not counted as a group
            groups = self.df[categorical_var].dropna().unique()
            
            if len(groups) < 2:
                return {"error": "ANOVA requires at least 2 groups"}
            
            # Prepare data for each group
            group_data = [
                self.df[self.df[categorical_var] == group][numerical_var].dropna()
                for group in groups
            ]
            
            # Perform ANOVA
            f_stat, p_value = f_oneway(*group_data)
            
            # Calculate eta squared (effect size)
            eta_squared = self._calculate_eta_squared(group_data)
            
            # Calculate group statistics
            group_stats = {
                str(group): {
                    "mean": round(data.mean(), 4),
                    "std": round(data.std(), 4),
                    "n": int(len(data))
                }
                for group, data in zip(groups, group_data)
            }
            
            return {
                "test_name": "One-Way ANOVA",
                "test_type": "categorical_vs_numerical",
                "categorical_variable": categorical_var,
                "numerical_variable": numerical_var,
                "num_groups": len(groups),
                "groups": [str(g) for g in groups],
                "f_statistic": round(f_stat, 4),
                "p_value": round(p_value, 6),
                "eta_squared": round(eta_squared, 4),
                "group_statistics": group_stats,
                "interpretation": self._interpret_p_value(p_value),
                "effect_size_interpretation": self._interpret_eta_squared(eta_squared)
            }
        except Exception as e:
            return {"error": f"ANOVA test failed: {str(e)}"}
    
    def pearson_correlation(self, var1: str, var2: str) -> dict:
        """
        Perform Pearson Correlation for two numerical variables.
        
        Args:
            var1: First numerical variable
            var2: Second numerical variable
            
        Returns:
            Dictionary containing test results
        """
        try:
            # Get clean data (remove NaN)
            clean_data = self.df[[var1, var2]].dropna()
            
            # Perform Pearson correlation
            r, p_value = pearsonr(clean_data[var1], clean_data[var2])
            
            return {
                "test_name": "Pearson Correlation",
                "test_type": "numerical_vs_numerical",
                "variable_1": var1,
                "variable_2": var2,
                "correlation_coefficient": round(r, 4),
                "p_value": round(p_value, 6),
                "sample_size": int(len(clean_data)),
                "r_squared": round(r**2, 4),
                "interpretation": self._interpret_p_value(p_value),
                "correlation_strength": self._interpret_correlation(r),
                "direction": "positive" if r > 0 else "negative" if r < 0 else "none"
            }
        except Exception as e:
            return {"error": f"Pearson correlation failed: {str(e)}"}
    
    def spearman_correlation(self, var1: str, var2: str) -> dict:
        """
        Perform Spearman Correlation for two variables (ranked/ordinal).
        
        Args:
            var1: First variable
            var2: Second variable
            
        Returns:
            Dictionary containing test results
        """
        try:
            # Get clean data (remove NaN)
            clean_data = self.df[[var1, var2]].dropna()
            
            # Perform Spearman correlation
            rho, p_value = spearmanr(clean_data[var1], clean_data[var2])
            
            return {
                "test_name": "Spearman Correlation",
                "test_type": "numerical_vs_numerical (nonlinear/ordinal)",
                "variable_1": var1,
                "variable_2": var2,
                "spearman_rho": round(rho, 4),
                "p_value": round(p_value, 6),
                "sample_size": int(len(clean_data)),
                "interpretation": self._interpret_p_value(p_value),
                "correlation_strength": self._interpret_correlation(rho),
                "direction": "positive" if rho > 0 else "negative" if rho < 0 else "none"
            }
        except Exception as e:
            return {"error": f"Spearman correlation failed: {str(e)}"}
    
    def execute_hypothesis_test(self, hypothesis: dict) -> dict:
        """
        Execute the appropriate statistical test based on hypothesis variable types.
        
        Args:
            hypothesis: Single hypothesis dictionary from hypothesis agent
            
        Returns:
            Dictionary containing all statistical test results
        """
        var1 = hypothesis.get('variable_1', '').lower()
        var2 = hypothesis.get('variable_2', '').lower()
        var1_type = hypothesis.get('variable_1_type', '')
        var2_type = hypothesis.get('variable_2_type', '')
        
        # Validate variables exist in dataset
        if var1 not in self.df.columns or var2 not in self.df.columns:
            return {
                "error": f"Variables not found in dataset. Available: {list(self.df.columns)}"
            }
        
        results = {
            "hypothesis_id": hypothesis.get('hypothesis_id'),
            "null_hypothesis": hypothesis.get('null_hypothesis'),
            "alternative_hypothesis": hypothesis.get('alternative_hypothesis'),
            "variable_1": var1,
            "variable_2": var2,
            "variable_1_type": var1_type,
            "variable_2_type": var2_type,
            "recommended_test": hypothesis.get('recommended_test'),
            "statistical_results": {}
        }
        
        # Determine and execute appropriate test
        if var1_type == "categorical" and var2_type == "categorical":
            results["statistical_results"] = self.chi_square_test(var1, var2)
            
        elif var1_type == "categorical" and var2_type == "numerical":
            # Check number of groups for t-test vs ANOVA
            num_groups = self.df[var1].nunique()
            if num_groups == 2:
                results["statistical_results"] = self.t_test(var1, var2)
            else:
                results["statistical_results"] = self.anova_test(var1, var2)
                
        elif var1_type == "numerical" and var2_type == "categorical":
            # Check number of groups for t-test vs ANOVA
            num_groups = self.df[var2].nunique()
            if num_groups == 2:
                results["statistical_results"] = self.t_test(var2, var1)
            else:
                results["statistical_results"] = self.anova_test(var2, var1)
                
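        elif var1_type == "numerical" and var2_type == "numerical":
            # Run both Pearson and Spearman
            results["statistical_results"]["pearson"] = self.pearson_correlation(var1, var2)
            results["statistical_results"]["spearman"] = self.spearman_correlation(var1, var2)
        
        else:
            # Report missing or unexpected type labels instead of returning empty results
            results["statistical_results"] = {
                "error": f"Unsupported variable types: '{var1_type}' and '{var2_type}'"
            }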
        
        return results
    
    # Helper methods for effect size calculations and interpretations
    
    def _calculate_cohens_d(self, group1, group2):
        """Calculate Cohen's d effect size."""
        n1, n2 = len(group1), len(group2)
        var1, var2 = group1.var(), group2.var()
        pooled_std = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
        return (group1.mean() - group2.mean()) / pooled_std if pooled_std > 0 else 0
    
    def _calculate_eta_squared(self, group_data):
        """Calculate eta squared effect size for ANOVA."""
        all_data = np.concatenate(group_data)
        grand_mean = all_data.mean()
        ss_between = sum(len(group) * (group.mean() - grand_mean)**2 for group in group_data)
        ss_total = np.sum((all_data - grand_mean)**2)
        return ss_between / ss_total if ss_total > 0 else 0
    
    def _interpret_p_value(self, p_value, alpha=0.05):
        """Interpret p-value significance."""
        if p_value < 0.001:
            return "Highly significant (p < 0.001) - Strong evidence against null hypothesis"
        elif p_value < 0.01:
            return "Very significant (p < 0.01) - Strong evidence against null hypothesis"
        elif p_value < alpha:
            return f"Significant (p < {alpha}) - Reject null hypothesis"
        else:
            return f"Not significant (p >= {alpha}) - Fail to reject null hypothesis"
    
    def _interpret_cohens_d(self, d):
        """Interpret Cohen's d effect size."""
        abs_d = abs(d)
        if abs_d < 0.2:
            return "Negligible effect"
        elif abs_d < 0.5:
            return "Small effect"
        elif abs_d < 0.8:
            return "Medium effect"
        else:
            return "Large effect"
    
    def _interpret_eta_squared(self, eta_sq):
        """Interpret eta squared effect size."""
        if eta_sq < 0.01:
            return "Negligible effect"
        elif eta_sq < 0.06:
            return "Small effect"
        elif eta_sq < 0.14:
            return "Medium effect"
        else:
            return "Large effect"
    
    def _interpret_cramers_v(self, v):
        """Interpret Cramér's V effect size."""
        if v < 0.1:
            return "Negligible association"
        elif v < 0.3:
            return "Weak association"
        elif v < 0.5:
            return "Moderate association"
        else:
            return "Strong association"
    
    def _interpret_correlation(self, r):
        """Interpret correlation coefficient strength."""
        abs_r = abs(r)
        if abs_r < 0.1:
            return "Negligible correlation"
        elif abs_r < 0.3:
            return "Weak correlation"
        elif abs_r < 0.5:
            return "Moderate correlation"
        elif abs_r < 0.7:
            return "Strong correlation"
        else:
            return "Very strong correlation"

print("✅ StatsAgent class defined")

✅ StatsAgent class defined
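
As a quick sanity check on the effect-size helpers, the sketch below recomputes Cohen's d and eta-squared with only the standard library on made-up toy scores. The function names `cohens_d` and `eta_squared` and the sample values are illustrative assumptions, not part of `StatsAgent`. The formulas mirror `_calculate_cohens_d` and `_calculate_eta_squared`, assuming the groups are pandas Series (whose `.var()` divides by `n - 1` by default).

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group1, group2):
    """Cohen's d with a pooled standard deviation (sample variances, n - 1)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    pooled_std = sqrt(pooled_var)
    return (mean(group1) - mean(group2)) / pooled_std if pooled_std > 0 else 0.0


def eta_squared(groups):
    """Eta-squared: between-group sum of squares divided by total sum of squares."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    return ss_between / ss_total if ss_total > 0 else 0.0


# Toy satisfaction scores (hypothetical, not taken from the HR dataset)
leavers = [2, 3, 4]
stayers = [4, 5, 6]

d = cohens_d(leavers, stayers)            # means 3 vs 5, pooled SD 1
eta_sq = eta_squared([leavers, stayers])  # SS_between 6, SS_total 10
print(f"Cohen's d = {d:.2f}, eta-squared = {eta_sq:.2f}")
```

With these toy groups, d is -2 (a "Large effect" under `_interpret_cohens_d`) and eta-squared is 0.6. The sign of d only reflects which group is passed first.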


## Test Stats Agent with Generated Hypotheses

In [36]:
# Initialize Stats Agent
stats_agent = StatsAgent(df)

print("✅ Stats Agent initialized with HR dataset")
print(f"📊 Dataset shape: {df.shape}")

✅ Stats Agent initialized with HR dataset
📊 Dataset shape: (1470, 35)


In [37]:
# Test with a single hypothesis from previously generated results
if result_1.get('hypotheses'):
    # Get the first hypothesis
    single_hypothesis = result_1['hypotheses'][0]
    
    print("🧪 Testing single hypothesis:")
    print(f"   H0: {single_hypothesis['null_hypothesis']}")
    print(f"   H1: {single_hypothesis['alternative_hypothesis']}")
    print(f"\n⏳ Running statistical test...\n")
    
    # Execute test
    test_result = stats_agent.execute_hypothesis_test(single_hypothesis)
    
    # Display results
    print(json.dumps(test_result, indent=2))
else:
    print("⚠️ No hypotheses available. Please run hypothesis generation first.")

🧪 Testing single hypothesis:
   H0: There is no significant relationship between job satisfaction and employee attrition.
   H1: Employees with lower job satisfaction are more likely to leave the organization.

⏳ Running statistical test...

{
  "hypothesis_id": 1,
  "null_hypothesis": "There is no significant relationship between job satisfaction and employee attrition.",
  "alternative_hypothesis": "Employees with lower job satisfaction are more likely to leave the organization.",
  "variable_1": "jobsatisfaction",
  "variable_2": "attrition",
  "variable_1_type": "numerical",
  "variable_2_type": "categorical",
  "recommended_test": "Chi-square test",
  "statistical_results": {
    "test_name": "Independent Samples T-Test",
    "test_type": "categorical_vs_numerical",
    "categorical_variable": "attrition",
    "numerical_variable": "jobsatisfaction",
    "group_1": "Yes",
    "group_2": "No",
    "group_1_mean": 2.4684,
    "group_2_mean": 2.7786,
    "group_1_std": 1.1181,
 

In [38]:
# Function to execute all hypotheses and collate results
def execute_all_hypotheses(hypotheses_result: dict, stats_agent: StatsAgent) -> dict:
    """
    Execute statistical tests for all hypotheses and collate results in JSON format.
    
    Args:
        hypotheses_result: Output from hypothesis generation agent
        stats_agent: Initialized StatsAgent instance
        
    Returns:
        Dictionary containing all hypothesis test results
    """
    if "error" in hypotheses_result:
        return {"error": hypotheses_result["error"]}
    
    hypotheses = hypotheses_result.get("hypotheses", [])
    
    if not hypotheses:
        return {"error": "No hypotheses to test"}
    
    all_results = {
        "summary": {
            "total_hypotheses": len(hypotheses),
            "execution_timestamp": pd.Timestamp.now().isoformat(),
            "dataset_shape": stats_agent.df.shape
        },
        "hypothesis_results": []
    }
    
    for i, hypothesis in enumerate(hypotheses, 1):
        print(f"\n{'='*80}")
        print(f"🧪 Testing Hypothesis #{i}")
        print(f"{'='*80}")
        print(f"Variables: {hypothesis['variable_1']} vs {hypothesis['variable_2']}")
        print(f"Test: {hypothesis['recommended_test']}")
        
        # Execute test
        result = stats_agent.execute_hypothesis_test(hypothesis)
        all_results["hypothesis_results"].append(result)
        
        # Print summary
        if "error" in result.get("statistical_results", {}):
            print(f"❌ Error: {result['statistical_results']['error']}")
        else:
            stats_res = result.get("statistical_results", {})
            
            # Handle different result structures
            if "pearson" in stats_res:
                # Numerical vs Numerical
                print(f"✅ Pearson r = {stats_res['pearson'].get('correlation_coefficient')}, "
                      f"p = {stats_res['pearson'].get('p_value')}")
                print(f"   {stats_res['pearson'].get('interpretation')}")
            elif "p_value" in stats_res:
                # Single test result
                print(f"✅ p-value = {stats_res.get('p_value')}")
                print(f"   {stats_res.get('interpretation')}")
    
    print(f"\n{'='*80}")
    print(f"✅ All {len(hypotheses)} hypotheses tested successfully!")
    print(f"{'='*80}\n")
    
    return all_results

print("✅ Batch execution function defined")

✅ Batch execution function defined
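
The batch result nests p-values differently depending on the test: a numerical pair carries separate `pearson` and `spearman` entries, while every other test has a single top-level `p_value`. A minimal sketch of flattening that structure into rows follows. The helper name `collect_p_values` and the second and third toy entries are assumptions for illustration; the key names follow the result dictionaries shown in this notebook.

```python
def collect_p_values(all_results: dict) -> list:
    """Return (hypothesis_id, test_label, p_value) tuples from a batch result."""
    rows = []
    for result in all_results.get("hypothesis_results", []):
        hyp_id = result.get("hypothesis_id")
        stats_res = result.get("statistical_results", {})
        if "error" in stats_res:
            # Failed tests are kept so they are not silently dropped
            rows.append((hyp_id, "error", None))
        elif "pearson" in stats_res or "spearman" in stats_res:
            # Numerical vs numerical: one row per correlation test
            for name in ("pearson", "spearman"):
                if name in stats_res:
                    rows.append((hyp_id, name, stats_res[name].get("p_value")))
        else:
            # Chi-square, t-test, or ANOVA: single top-level p-value
            rows.append((hyp_id, stats_res.get("test_name"), stats_res.get("p_value")))
    return rows


# Toy batch result: entry 1 echoes the t-test output above; entries 2 and 3 are made up
toy_results = {
    "hypothesis_results": [
        {"hypothesis_id": 1,
         "statistical_results": {"test_name": "Independent Samples T-Test", "p_value": 7e-05}},
        {"hypothesis_id": 2,
         "statistical_results": {"pearson": {"p_value": 0.02}, "spearman": {"p_value": 0.03}}},
        {"hypothesis_id": 3,
         "statistical_results": {"error": "column not found"}},
    ]
}

rows = collect_p_values(toy_results)
for row in rows:
    print(row)
```

Keeping error rows in the output makes it easy to see how many tests actually produced a p-value before summarizing them.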


In [39]:
# Execute all hypotheses from result_1
all_stats_results = execute_all_hypotheses(result_1, stats_agent)

# Display complete results in JSON
print("\n📋 COMPLETE STATISTICAL RESULTS (JSON Format):")
print("=" * 80)
print(json.dumps(all_stats_results, indent=2))


🧪 Testing Hypothesis #1
Variables: jobsatisfaction vs attrition
Test: Chi-square test
✅ p-value = 7e-05
   Highly significant (p < 0.001) - Strong evidence against null hypothesis

🧪 Testing Hypothesis #2
Variables: yearsatcompany vs attrition
Test: Chi-square test
✅ p-value = 0.0
   Highly significant (p < 0.001) - Strong evidence against null hypothesis

🧪 Testing Hypothesis #3
Variables: overtime vs attrition
Test: Chi-square test
✅ p-value = 0.0
   Highly significant (p < 0.001) - Strong evidence against null hypothesis

✅ All 3 hypotheses tested successfully!


📋 COMPLETE STATISTICAL RESULTS (JSON Format):
{
  "summary": {
    "total_hypotheses": 3,
    "execution_timestamp": "2025-10-31T19:36:39.178259",
    "dataset_shape": [
      1470,
      35
    ]
  },
  "hypothesis_results": [
    {
      "hypothesis_id": 1,
      "null_hypothesis": "There is no significant relationship between job satisfaction and employee attrition.",
      "alternative_hypothesis": 

In [40]:
# Pretty print function for statistical results
def print_stats_results(all_results: dict):
    """
    Pretty print statistical test results in a human-readable format.
    
    Args:
        all_results: Dictionary containing all hypothesis test results
    """
    if "error" in all_results:
        print(f"❌ Error: {all_results['error']}")
        return
    
    summary = all_results.get("summary", {})
    results = all_results.get("hypothesis_results", [])
    
    print("\n" + "=" * 100)
    print(f"{'STATISTICAL TEST RESULTS SUMMARY':^100}")
    print("=" * 100)
    print(f"\n📊 Total Hypotheses Tested: {summary.get('total_hypotheses')}")
    print(f"⏰ Execution Time: {summary.get('execution_timestamp')}")
    print(f"📈 Dataset: {summary.get('dataset_shape')[0]} rows × {summary.get('dataset_shape')[1]} columns")
    print("\n" + "=" * 100)
    
    for i, result in enumerate(results, 1):
        print(f"\n{'─' * 100}")
        print(f"HYPOTHESIS #{result.get('hypothesis_id', i)}")
        print(f"{'─' * 100}")
        
        print(f"\n❌ H0: {result.get('null_hypothesis')}")
        print(f"✅ H1: {result.get('alternative_hypothesis')}")
        
        print(f"\n🔹 Variables:")
        print(f"   • {result.get('variable_1')} ({result.get('variable_1_type')})")
        print(f"   • {result.get('variable_2')} ({result.get('variable_2_type')})")
        
        stats_res = result.get("statistical_results", {})
        
        if "error" in stats_res:
            print(f"\n❌ Test Error: {stats_res['error']}")
            continue
        
        # Handle different test types
        if "pearson" in stats_res and "spearman" in stats_res:
            # Numerical vs Numerical - both tests
            print(f"\n📈 PEARSON CORRELATION TEST")
            pearson = stats_res["pearson"]
            print(f"   • Correlation Coefficient (r): {pearson.get('correlation_coefficient')}")
            print(f"   • R-squared: {pearson.get('r_squared')}")
            print(f"   • P-value: {pearson.get('p_value')}")
            print(f"   • Sample Size: {pearson.get('sample_size')}")
            print(f"   • Direction: {pearson.get('direction')}")
            print(f"   • Strength: {pearson.get('correlation_strength')}")
            print(f"   • 📊 {pearson.get('interpretation')}")
            
            print(f"\n📈 SPEARMAN CORRELATION TEST")
            spearman = stats_res["spearman"]
            print(f"   • Spearman's Rho (ρ): {spearman.get('spearman_rho')}")
            print(f"   • P-value: {spearman.get('p_value')}")
            print(f"   • Sample Size: {spearman.get('sample_size')}")
            print(f"   • Direction: {spearman.get('direction')}")
            print(f"   • Strength: {spearman.get('correlation_strength')}")
            print(f"   • 📊 {spearman.get('interpretation')}")
            
        elif stats_res.get("test_name") == "Chi-Square Test of Independence":
            # Categorical vs Categorical
            print(f"\n🔲 CHI-SQUARE TEST OF INDEPENDENCE")
            print(f"   • Chi-square Statistic: {stats_res.get('chi2_statistic')}")
            print(f"   • P-value: {stats_res.get('p_value')}")
            print(f"   • Degrees of Freedom: {stats_res.get('degrees_of_freedom')}")
            print(f"   • Cramér's V: {stats_res.get('cramers_v')}")
            print(f"   • Sample Size: {stats_res.get('sample_size')}")
            print(f"   • Effect Size: {stats_res.get('effect_size_interpretation')}")
            print(f"   • 📊 {stats_res.get('interpretation')}")
            
        elif stats_res.get("test_name") == "Independent Samples T-Test":
            # Categorical (2 groups) vs Numerical
            print(f"\n📊 INDEPENDENT SAMPLES T-TEST")
            print(f"   • Groups: {stats_res.get('group_1')} vs {stats_res.get('group_2')}")
            print(f"   • Group 1 Mean: {stats_res.get('group_1_mean')} (SD: {stats_res.get('group_1_std')}, n={stats_res.get('group_1_n')})")
            print(f"   • Group 2 Mean: {stats_res.get('group_2_mean')} (SD: {stats_res.get('group_2_std')}, n={stats_res.get('group_2_n')})")
            print(f"   • T-statistic: {stats_res.get('t_statistic')}")
            print(f"   • P-value: {stats_res.get('p_value')}")
            print(f"   • Cohen's d: {stats_res.get('cohens_d')}")
            print(f"   • Effect Size: {stats_res.get('effect_size_interpretation')}")
            print(f"   • 📊 {stats_res.get('interpretation')}")
            
        elif stats_res.get("test_name") == "One-Way ANOVA":
            # Categorical (3+ groups) vs Numerical
            print(f"\n📊 ONE-WAY ANOVA")
            print(f"   • Number of Groups: {stats_res.get('num_groups')}")
            print(f"   • Groups: {', '.join(stats_res.get('groups', []))}")
            print(f"   • F-statistic: {stats_res.get('f_statistic')}")
            print(f"   • P-value: {stats_res.get('p_value')}")
            print(f"   • Eta-squared (η²): {stats_res.get('eta_squared')}")
            print(f"   • Effect Size: {stats_res.get('effect_size_interpretation')}")
            print(f"   • 📊 {stats_res.get('interpretation')}")
            
            print(f"\n   Group Statistics:")
            # Named group_stats to avoid shadowing a module commonly imported as `stats`
            for group, group_stats in stats_res.get('group_statistics', {}).items():
                print(f"     • {group}: Mean={group_stats['mean']}, SD={group_stats['std']}, n={group_stats['n']}")
    
    print("\n" + "=" * 100)
    print(f"{'END OF STATISTICAL ANALYSIS':^100}")
    print("=" * 100 + "\n")

# Test pretty print
print_stats_results(all_stats_results)


                                  STATISTICAL TEST RESULTS SUMMARY                                  

📊 Total Hypotheses Tested: 3
⏰ Execution Time: 2025-10-31T19:36:39.178259
📈 Dataset: 1470 rows × 35 columns


────────────────────────────────────────────────────────────────────────────────────────────────────
HYPOTHESIS #1
────────────────────────────────────────────────────────────────────────────────────────────────────

❌ H0: There is no significant relationship between job satisfaction and employee attrition.
✅ H1: Employees with lower job satisfaction are more likely to le

## Save Statistical Results to JSON

In [41]:
# Save statistical results to JSON file
stats_output_file = "statistical_test_results.json"

with open(stats_output_file, 'w') as f:
    json.dump(all_stats_results, f, indent=2)

print(f"✅ Statistical test results saved to {stats_output_file}")
print(f"📊 File contains {len(all_stats_results.get('hypothesis_results', []))} hypothesis test results")

✅ Statistical test results saved to statistical_test_results.json
📊 File contains 3 hypothesis test results
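
When the saved file is read back, the structure survives but Python-specific types do not: the `dataset_shape` tuple comes back as a list, which matches the JSON printed earlier. The sketch below shows a round trip on a small stand-in dictionary written to a temporary directory; `toy_results` is an assumption standing in for `all_stats_results`, not the real output.

```python
import json
import os
import tempfile

# Toy stand-in for all_stats_results; only the shape of the structure matters here
toy_results = {
    "summary": {"total_hypotheses": 1, "dataset_shape": (1470, 35)},
    "hypothesis_results": [
        {"hypothesis_id": 1, "statistical_results": {"p_value": 7e-05}}
    ],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "statistical_test_results.json")

    # Write the results exactly as the cell above does
    with open(path, "w") as f:
        json.dump(toy_results, f, indent=2)

    # Read them back for downstream use
    with open(path) as f:
        reloaded = json.load(f)

shape_after = reloaded["summary"]["dataset_shape"]
print(type(shape_after).__name__, shape_after)
```

Code that consumes the reloaded file should therefore index `dataset_shape` positionally rather than rely on it being a tuple.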


## End-to-End Example: Question → Hypotheses → Statistical Tests

In [42]:
# Complete end-to-end workflow
def analyze_research_question(user_question: str, num_hypotheses: int = 3):
    """
    Complete pipeline: Generate hypotheses and execute statistical tests.
    
    Args:
        user_question: User's research question
        num_hypotheses: Number of hypotheses to generate
    
    Returns:
        Tuple of (hypotheses_dict, statistical_results_dict)
    """
    print("=" * 100)
    print(f"{'RESEARCH QUESTION ANALYSIS PIPELINE':^100}")
    print("=" * 100)
    print(f"\n❓ Research Question: {user_question}\n")
    
    # Step 1: Generate Hypotheses
    print("STEP 1: GENERATING HYPOTHESES")
    print("-" * 100)
    hypotheses = generate_hypotheses(user_question, num_hypotheses)
    
    if "error" in hypotheses:
        print(f"❌ Error generating hypotheses: {hypotheses['error']}")
        return hypotheses, {}
    
    print(f"✅ Generated {len(hypotheses.get('hypotheses', []))} hypotheses\n")
    
    # Step 2: Execute Statistical Tests
    print("STEP 2: EXECUTING STATISTICAL TESTS")
    print("-" * 100)
    stats_results = execute_all_hypotheses(hypotheses, stats_agent)
    
    # Step 3: Display Results
    print("\nSTEP 3: RESULTS SUMMARY")
    print("-" * 100)
    print_stats_results(stats_results)
    
    return hypotheses, stats_results


# Example: Run complete analysis
new_question = "What factors are most strongly associated with employee attrition?"
hypotheses_output, stats_output = analyze_research_question(new_question, num_hypotheses=4)

                                RESEARCH QUESTION ANALYSIS PIPELINE                                 

❓ Research Question: What factors are most strongly associated with employee attrition?

STEP 1: GENERATING HYPOTHESES
----------------------------------------------------------------------------------------------------
✅ Generated 4 hypotheses

STEP 2: EXECUTING STATISTICAL TESTS
----------------------------------------------------------------------------------------------------

🧪 Testing Hypothesis #1
Variables: jobsatisfaction vs attrition
Test: chi-square
✅ p-value = 7e-05
   Highly significant (p < 0.001) - Strong evidence against null hypothesis

🧪 Testing Hypothesis #2
Variables: yearsatcompany vs attrition
Test: chi-square
✅ p-value = 0.0
   Highly significant (p < 0.001) - Strong evidence against null hypothesis

🧪 Testing Hypothesis #3
Variables: overtime vs attrition
Test: chi-square
✅ p-value = 0.0
   Highly significant (p < 0.001) - Strong evidence agai

## Summary

### Stats Agent Features

The Stats Agent automatically:
1. **Detects variable types** (categorical vs numerical)
2. **Selects the appropriate statistical test** from the detected types (the hypothesis's `recommended_test` is recorded in the output but does not drive the choice):
   - **Chi-Square Test**: Categorical vs Categorical
   - **T-Test**: Categorical (2 groups) vs Numerical
   - **ANOVA**: Categorical (3+ groups) vs Numerical
   - **Pearson Correlation**: Numerical vs Numerical (linear)
   - **Spearman Correlation**: Numerical vs Numerical (monotonic/ordinal); run alongside Pearson
3. **Calculates effect sizes**:
   - Cramér's V (Chi-square)
   - Cohen's d (T-test)
   - Eta-squared (ANOVA)
   - R-squared (Correlation)
4. **Provides interpretations** for p-values and effect sizes
5. **Returns results in JSON format** for easy integration
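
The test-selection rule in step 2 can be sketched as a small pure function. This is an illustration of the routing in `execute_hypothesis_test`, not the class's actual code: the name `select_test` and the combined label returned for numerical pairs are assumptions, while the other labels follow the `test_name` values used in this notebook.

```python
from typing import Optional


def select_test(var1_type: str, var2_type: str, num_groups: Optional[int] = None) -> str:
    """Pick a test from two variable types ('categorical' or 'numerical').

    num_groups is the number of distinct values of the categorical variable
    and only matters for a mixed categorical/numerical pair.
    """
    types = {var1_type, var2_type}
    if types == {"categorical"}:
        return "Chi-Square Test of Independence"
    if types == {"numerical"}:
        # The agent runs both correlations for numerical pairs
        return "Pearson and Spearman correlation"
    if types == {"categorical", "numerical"}:
        # Mirrors the notebook: exactly two groups -> t-test, anything else -> ANOVA
        return "Independent Samples T-Test" if num_groups == 2 else "One-Way ANOVA"
    raise ValueError(f"Unknown variable types: {var1_type!r}, {var2_type!r}")


# jobsatisfaction was typed as numerical and attrition has two groups (Yes/No)
chosen = select_test("numerical", "categorical", num_groups=2)
print(chosen)
```

This reproduces the earlier result where the hypothesis recommended a chi-square test but a t-test was run: the variable types decide the route, not the recommendation.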

### Usage

```python
# Single hypothesis test
result = stats_agent.execute_hypothesis_test(single_hypothesis)

# Batch testing all hypotheses
all_results = execute_all_hypotheses(hypotheses_result, stats_agent)

# Complete pipeline (Question → Hypotheses → Tests)
hypotheses, stats = analyze_research_question("Your question here", num_hypotheses=3)
```