#### Why Prompt Engineering Matters

- Consistency: Get reliable insights every time
- Accuracy: Reduce AI hallucinations and errors
- Efficiency: Better results with fewer API calls
- Business value: Insights that drive real decisions

#### Chain-of-Thought Prompting for Data Analysis
What it is: A prompting technique that asks the AI to show its reasoning process step-by-step, like thinking out loud.

How it works:
- You explicitly ask the AI to break down its analysis into logical steps
- The AI explains its reasoning at each stage
- Creates a transparent pathway from data to insights
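
Those three points come down to how the prompt is assembled. Below is a minimal sketch that assumes you pass the column names and a few sample rows as plain Python objects (for a DataFrame, `list(df.columns)` and `df.head(3).to_dict('records')`). `build_cot_prompt` is a hypothetical helper, not part of any library. The code only builds the prompt text and does not call a model:

```python
def build_cot_prompt(columns, sample_rows, question, steps):
    """Assemble a chain-of-thought prompt: data context, the question,
    then numbered reasoning steps the model should work through in order."""
    lines = [
        "Data overview:",
        f"- Columns: {columns}",
        f"- Sample: {sample_rows}",
        "",
        f"Business question: {question}",
        "",
        "Please analyze this step by step:",
    ]
    # Number each step so the reasoning in the response can be traced back to it
    for i, step in enumerate(steps, start=1):
        lines.append(f"Step {i}: {step}")
    lines.append("Walk through each step clearly and show your reasoning.")
    return "\n".join(lines)


prompt = build_cot_prompt(
    columns=["region", "revenue"],
    sample_rows=[{"region": "West", "revenue": 1200}],
    question="Which region is underperforming?",
    steps=["Data Understanding", "Pattern Identification", "Insights and Recommendations"],
)
print(prompt)
```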

In [None]:
from openai import OpenAI

# openai>=1.0 client; reads OPENAI_API_KEY from the environment
client = OpenAI()

def chain_of_thought_analysis(df, business_question):
    """Use chain-of-thought prompting for deeper analysis"""
    
    prompt = f"""
    I need to analyze this business data step by step.
    
    Data overview:
    - Shape: {df.shape}
    - Columns: {list(df.columns)}
    - Sample: {df.head(3).to_dict('records')}
    
    Business question: {business_question}
    
    Please analyze this step by step:
    
    Step 1: Data Understanding
    - What does each column represent?
    - What is the grain of this data?
    - What time period does this cover?
    
    Step 2: Pattern Identification
    - What patterns do you see in the numbers?
    - Are there any obvious trends or seasonality?
    - What stands out as unusual?
    
    Step 3: Business Context
    - What business processes generated this data?
    - What external factors might influence these numbers?
    - What are the key business metrics here?
    
    Step 4: Insights and Recommendations
    - What are the 3 most important insights?
    - What actions should the business take?
    - What additional data would be helpful?
    
    Step 5: Risk Assessment
    - What assumptions am I making?
    - What could I be missing?
    - How confident am I in these insights?
    
    Walk through each step clearly and show your reasoning.
    """
    
    try:
        # openai.ChatCompletion was removed in openai 1.0; use the client instead
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a senior data analyst who thinks step-by-step and shows all reasoning."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=800,
            temperature=0.2
        )
        
        return response.choices[0].message.content
    
    except Exception as e:
        return f"Error in analysis: {e}"

# Test chain-of-thought analysis
cot_analysis = chain_of_thought_analysis(sales_data, "How can we improve our regional sales performance?")
print("Chain-of-Thought Analysis:")
print(cot_analysis)

#### Few-Shot Learning for Consistent Insights
What it is: A technique where you provide the AI with examples of the exact format and quality you want, then ask it to follow that pattern.

How it works:
- You show 2-3 examples of high-quality responses
- The AI infers the pattern from the examples and applies it to the new data
- This encourages (but does not guarantee) a consistent output format and quality
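
In the cell below, the examples are pasted into a single user message. Chat APIs also let you supply them as earlier user/assistant turns, which keeps the examples clearly separate from the new request. The following is a minimal sketch of that alternative. The message dicts follow the OpenAI chat `messages` shape, and `build_few_shot_messages` is a hypothetical helper:

```python
def build_few_shot_messages(system, examples, new_request):
    """Turn (input, ideal_output) example pairs into alternating
    user/assistant turns, followed by the real request."""
    messages = [{"role": "system", "content": system}]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    # The new request comes last, so the model answers it in the demonstrated format
    messages.append({"role": "user", "content": new_request})
    return messages


examples = [
    ("Data: Customer retention by month",
     "- TREND: Retention declining from 85% to 78% over 6 months\n"
     "- ACTION: Investigate March product changes and improve onboarding"),
]
messages = build_few_shot_messages(
    "You are a data analyst who uses the TREND-ACTION format.",
    examples,
    "Data: Regional sales by quarter",
)
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create`.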

In [None]:
from openai import OpenAI

# openai>=1.0 client; reads OPENAI_API_KEY from the environment
client = OpenAI()

def few_shot_data_analysis(df, analysis_request):
    """Use few-shot learning for consistent analysis format"""
    
    # Provide examples of good analysis
    examples = """
    Example 1:
    Data: Customer retention by month
    Analysis:
    - TREND: Retention declining from 85% to 78% over 6 months
    - ROOT CAUSE: Likely onboarding issues or product changes in March
    - IMPACT: $50K monthly revenue at risk if trend continues
    - ACTION: Investigate March product changes and improve onboarding
    - METRIC: Track 30-day retention rate weekly
    
    Example 2:
    Data: Website conversion rates by source
    Analysis:
    - TREND: Paid search converting 3.2% vs 1.8% organic
    - ROOT CAUSE: Better keyword targeting and landing page alignment
    - IMPACT: Paid search ROI is 180%, organic needs improvement
    - ACTION: Apply paid search insights to organic content strategy
    - METRIC: Monitor conversion rate by source weekly
    """
    
    data_summary = f"""
    Current Data:
    Shape: {df.shape}
    Columns: {list(df.columns)}
    Sample: {df.head(3).to_dict('records')}
    
    Analysis Request: {analysis_request}
    """
    
    prompt = f"""
    {examples}
    
    Now analyze this data using the same format (TREND, ROOT CAUSE, IMPACT, ACTION, METRIC):
    
    {data_summary}
    
    Provide a structured analysis following the exact format shown in the examples.
    """
    
    try:
        # openai.ChatCompletion was removed in openai 1.0; use the client instead
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a data analyst who provides structured insights using the TREND-ROOT CAUSE-IMPACT-ACTION-METRIC format."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=400,
            temperature=0.1
        )
        
        return response.choices[0].message.content
    
    except Exception as e:
        return f"Error in few-shot analysis: {e}"

# Test few-shot analysis
few_shot_results = few_shot_data_analysis(sales_data, "Analyze regional sales performance for strategic planning")
print("Few-Shot Analysis:")
print(few_shot_results)

#### When to Use Each
Use Chain-of-Thought when:

- Analyzing complex business problems
- You need to validate AI reasoning
- Working with unfamiliar data patterns
- Explaining analysis to stakeholders

Use Few-Shot Learning when:

- Creating standardized reports
- You need a consistent analysis format
- Adapting the AI to specific business contexts (the examples steer it; no model training is involved)
- Ensuring all key metrics are included
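
Because few-shot formatting makes the expected sections predictable, you can check a response for them automatically. The sketch below assumes the TREND-ROOT CAUSE-IMPACT-ACTION-METRIC labels used in this notebook and one `LABEL: text` line per section, optionally bulleted with `-`, `•` or `*`. `parse_sections` is a hypothetical helper:

```python
import re

# Section labels used by the few-shot examples in this notebook
REQUIRED_LABELS = ["TREND", "ROOT CAUSE", "IMPACT", "ACTION", "METRIC"]

def parse_sections(text, labels=REQUIRED_LABELS):
    """Return (sections, missing): the text after each 'LABEL:' line,
    and the labels that never appear. The first occurrence of a label wins."""
    label_pattern = "|".join(re.escape(label) for label in labels)
    # Optional bullet (-, • or *), the label, a colon, then the section text
    line_re = re.compile(rf"^\s*[-•*]?\s*({label_pattern})\s*:\s*(.+)$")
    sections = {}
    for line in text.splitlines():
        match = line_re.match(line)
        if match and match.group(1) not in sections:
            sections[match.group(1)] = match.group(2).strip()
    missing = [label for label in labels if label not in sections]
    return sections, missing


response = """- TREND: West region revenue down 5% quarter over quarter
- ROOT CAUSE: Two large accounts churned
• IMPACT: Roughly $40K less revenue per quarter
- ACTION: Assign account managers to at-risk West accounts"""
sections, missing = parse_sections(response)
print(missing)  # ['METRIC']
```

A non-empty `missing` list is a simple signal to re-prompt or to flag the report for review.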

#### Combining Both Techniques

The most powerful approach often combines both:
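```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def hybrid_analysis(df, question):
    """Combine chain-of-thought reasoning with few-shot formatting"""
    
    prompt = f"""
    First, here are examples of our standard analysis format:
    
    Example: Sales Analysis
    - TREND: Clear pattern description
    - ROOT CAUSE: Likely explanation
    - IMPACT: Business consequences
    - ACTION: Specific next steps
    - METRIC: How to track progress
    
    Now, analyze this data step-by-step, then format your final answer using the standard format above:
    
    Step 1: Examine the data patterns...
    Step 2: Consider business context...
    Step 3: Identify key insights...
    
    Then provide your final analysis in the TREND-ROOT CAUSE-IMPACT-ACTION-METRIC format.
    
    Data: {df.head().to_string()}
    Question: {question}
    """
    
    # Send the combined prompt; the reply holds the reasoning followed by the formatted answer
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=800,
        temperature=0.2
    )
    return response.choices[0].message.content
```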
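
A hybrid response contains the step-by-step reasoning followed by the formatted answer. If you only want to store the final block, one simple approach is to split the response at the first line that begins with `TREND:` (optionally bulleted). This assumes the model uses that label only in its final answer. `split_reasoning_and_answer` is a hypothetical helper:

```python
import re

def split_reasoning_and_answer(response_text):
    """Split a hybrid response into (reasoning, final_answer).

    Assumes the final answer begins at the first line that starts with
    'TREND:' (optionally bulleted). If no such line exists, everything
    is treated as reasoning and the answer is empty.
    """
    match = re.search(r"^\s*[-•*]?\s*TREND\s*:", response_text, flags=re.MULTILINE)
    if match is None:
        return response_text.strip(), ""
    return response_text[:match.start()].strip(), response_text[match.start():].strip()


text = """Step 1: The data covers five regions for one quarter.
Step 2: West is the only region with falling revenue.

- TREND: West revenue down 5% quarter over quarter
- ACTION: Review West pricing"""
reasoning, answer = split_reasoning_and_answer(text)
print(answer)
```

Keeping the reasoning separately is still useful for auditing, so you may want to store both parts rather than discard the first.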

#### Role-Based Prompting for Different Audiences

##### Role-Based Prompting Explained
What it is: Instructing the AI to adopt a specific professional persona and perspective when analyzing data.

How it works:
- You tell the AI to "act as" a specific role (CEO, Sales Manager, Data Scientist, etc.)
- The AI adjusts its language, focus areas, and recommendations to match that role's needs
- Same data gets interpreted through different professional lenses

##### When to Use Role-Based Prompting

Primary Use Cases:
- Multi-Stakeholder Reports: Same analysis for different audiences
- Presentation Adaptation: Adjusting technical findings for business leaders
- Decision Support: Getting perspectives from different functional areas
- Context Adaptation: Steering the AI toward the concerns of a specific business function (a prompt-level change, not model training)
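
For multi-stakeholder reports, one pattern is to keep the data context fixed and vary only the persona. The sketch below assumes persona strings like those in the `RoleBasedAnalyst` class that follows. `build_role_requests` is a hypothetical helper that only builds the message lists, so you can inspect them before spending API calls:

```python
def build_role_requests(personas, question, data_context):
    """Return {role: messages}. The user message is identical for every
    role; only the system persona changes."""
    user_message = {"role": "user", "content": f"{data_context}\n\nQuestion: {question}"}
    return {
        role: [{"role": "system", "content": persona}, dict(user_message)]
        for role, persona in personas.items()
    }


personas = {
    "ceo": "You are a CEO who needs high-level strategic insights.",
    "sales_manager": "You are a Sales Manager who needs actionable sales insights.",
}
role_requests = build_role_requests(
    personas,
    "What should we prioritize for next quarter?",
    "Records: 120\nMetrics: ['region', 'revenue']",
)
for role, messages in role_requests.items():
    print(role, "->", messages[0]["content"])
```

Because the user message is the same for every role, any difference between the responses can be attributed to the persona alone.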

In [None]:
from openai import OpenAI

class RoleBasedAnalyst:
    """Generate analysis for different business roles"""
    
    def __init__(self):
        # openai>=1.0 client; reads OPENAI_API_KEY from the environment
        self.client = OpenAI()
        self.roles = {
            'ceo': {
                'persona': "You are a CEO who needs high-level strategic insights.",
                'focus': "Strategic implications, competitive advantage, major risks and opportunities",
                'format': "Executive summary with 3 key points and recommended actions"
            },
            'sales_manager': {
                'persona': "You are a Sales Manager who needs actionable sales insights.",
                'focus': "Sales performance, team productivity, customer behavior, revenue optimization",
                'format': "Tactical recommendations with specific metrics and targets"
            },
            'data_scientist': {
                'persona': "You are a Data Scientist who needs technical analysis.",
                'focus': "Statistical patterns, data quality, modeling opportunities, technical recommendations",
                'format': "Technical analysis with statistical insights and methodology suggestions"
            },
            'operations': {
                'persona': "You are an Operations Manager who needs process insights.",
                'focus': "Efficiency, bottlenecks, resource allocation, process improvements",
                'format': "Operational recommendations with process optimization focus"
            }
        }
    
    def analyze_for_role(self, df, role, question):
        """Generate role-specific analysis"""
        
        if role not in self.roles:
            return f"Role '{role}' not supported. Available roles: {list(self.roles.keys())}"
        
        role_config = self.roles[role]
        
        data_context = f"""
        Data Summary:
        - Records: {df.shape[0]}
        - Metrics: {list(df.columns)}
        - Sample: {df.head(2).to_dict('records')}
        """
        
        prompt = f"""
        {role_config['persona']}
        
        Focus on: {role_config['focus']}
        Format: {role_config['format']}
        
        {data_context}
        
        Question: {question}
        
        Provide analysis appropriate for a {role.replace('_', ' ').title()}.
        """
        
        try:
            # openai.ChatCompletion was removed in openai 1.0; use the client instead
            response = self.client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[
                    {"role": "system", "content": role_config['persona']},
                    {"role": "user", "content": prompt}
                ],
                max_tokens=400,
                temperature=0.3
            )
            
            return response.choices[0].message.content
        
        except Exception as e:
            return f"Error in role-based analysis: {e}"

# Test role-based analysis
role_analyst = RoleBasedAnalyst()

# Same data, different perspectives
ceo_view = role_analyst.analyze_for_role(sales_data, 'ceo', 'What should we prioritize for next quarter?')
sales_view = role_analyst.analyze_for_role(sales_data, 'sales_manager', 'How can we improve team performance?')

print("CEO Perspective:")
print(ceo_view)
print("\nSales Manager Perspective:")
print(sales_view)

#### How All Three Techniques Relate

Think of them as layered approaches that can be combined:

- Role-Based Prompting = WHO is doing the analysis
- Chain-of-Thought = HOW to think through the problem  
- Few-Shot Learning = WHAT the output format should be

### Production-Ready Prompting Patterns
#### Template-Based Prompt Management

Use any of the four templates by changing the first argument to `analyze_with_template()`. Each template requires the keyword arguments named in its `variables` list, so convert your pandas results into those values before calling it. Most variables are plain strings, but the numeric fields of `anomaly_detector` (`mean`, `std`, `min_val`, `max_val`, `outlier_threshold`) must be numbers, because the template formats them with `:.2f`.

Each template takes pandas analysis results and asks an LLM to interpret the patterns found in business terms.
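
As an example of that preparation step, here is one way to compute the `anomaly_detector` variables from a numeric column. The outlier rule (mean plus 2 sample standard deviations, upper tail only) is an assumed rule of thumb, not something the template prescribes. `anomaly_variables` is a hypothetical helper; with pandas you would pass `df['column'].tolist()` as `values`:

```python
import statistics

def anomaly_variables(dataset_name, metric_name, values, k=2.0):
    """Build the keyword arguments the anomaly_detector template expects.

    Numeric fields stay numeric because the template formats them with :.2f.
    """
    mean = statistics.mean(values)
    std = statistics.stdev(values)   # sample standard deviation (needs >= 2 values)
    threshold = mean + k * std       # assumed rule: k standard deviations above the mean
    outliers = [v for v in values if v > threshold]
    return {
        "dataset_name": dataset_name,
        "metric_name": metric_name,
        "mean": mean,
        "std": std,
        "min_val": min(values),
        "max_val": max(values),
        "outlier_threshold": threshold,
        "outliers": "\n".join(f"- {v}" for v in outliers) or "None above threshold",
    }


template_kwargs = anomaly_variables(
    "Daily orders", "order_count", [10, 11, 9, 10, 12, 10, 11, 9, 10, 60]
)
print(template_kwargs["outliers"])
```

Only unusually high values are flagged here; add a lower bound (`mean - k * std`) if unusually low values matter for your metric.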

In [None]:
class PromptTemplateManager:
    """Manage reusable prompt templates for consistent analysis"""
    
    def __init__(self):
        self.templates = {
            'sql_explainer': {
                'system': "You are a database expert who explains SQL queries in business terms.",
                'template': """
                SQL Query:
                {query}
                
                Results Summary:
                - Records returned: {row_count}
                - Key columns: {columns}
                
                Please explain:
                1. What business question this query answers
                2. Key insights from the results
                3. Potential follow-up questions
                4. Any data quality concerns
                
                Keep explanations clear for non-technical stakeholders.
                """,
                'variables': ['query', 'row_count', 'columns']
            },
            
            'anomaly_detector': {
                'system': "You are a statistical analyst expert at identifying unusual patterns in business data.",
                'template': """
                Dataset: {dataset_name}
                Metric: {metric_name}
                
                Statistical Summary:
                - Mean: {mean:.2f}
                - Standard Deviation: {std:.2f}
                - Min/Max: {min_val:.2f} / {max_val:.2f}
                - Outlier threshold: {outlier_threshold:.2f}
                
                Potential Outliers:
                {outliers}
                
                Analysis needed:
                1. Are these outliers legitimate or data errors?
                2. What business events might explain unusual values?
                3. Should we investigate further?
                4. How might this impact business decisions?
                
                Focus on business implications, not just statistics.
                """,
                'variables': ['dataset_name', 'metric_name', 'mean', 'std', 'min_val', 'max_val', 'outlier_threshold', 'outliers']
            },
            
            'trend_analyzer': {
                'system': "You are a business analyst who identifies trends and predicts business implications.",
                'template': """
                Time Series Data: {data_description}
                Time Period: {time_period}
                
                Trend Analysis:
                {trend_data}
                
                Please provide:
                1. TREND IDENTIFICATION: What patterns do you see?
                2. BUSINESS DRIVERS: What might be causing these trends?
                3. FUTURE IMPLICATIONS: What should we expect next?
                4. RECOMMENDED ACTIONS: What should the business do?
                5. MONITORING METRICS: What should we track going forward?
                
                Consider seasonality, external factors, and business context.
                """,
                'variables': ['data_description', 'time_period', 'trend_data']
            },
            
            'comparative_analysis': {
                'system': "You are a strategic analyst who compares performance across different segments.",
                'template': """
                Comparison Analysis: {analysis_title}
                
                Segments Being Compared:
                {segments_data}
                
                Key Metrics:
                {metrics_summary}
                
                Provide structured comparison:
                1. TOP PERFORMERS: Which segments are leading and why?
                2. UNDERPERFORMERS: Which segments need attention?
                3. KEY DIFFERENCES: What explains the performance gaps?
                4. OPPORTUNITIES: Where can we improve quickly?
                5. RESOURCE ALLOCATION: How should we prioritize investments?
                
                Make recommendations specific and actionable.
                """,
                'variables': ['analysis_title', 'segments_data', 'metrics_summary']
            }
        }
    
    def get_prompt(self, template_name, **kwargs):
        """Generate prompt from template with variables"""
        if template_name not in self.templates:
            raise ValueError(f"Template '{template_name}' not found")
        
        template = self.templates[template_name]
        
        # Check required variables
        missing_vars = [var for var in template['variables'] if var not in kwargs]
        if missing_vars:
            raise ValueError(f"Missing required variables: {missing_vars}")
        
        # Format the template
        formatted_prompt = template['template'].format(**kwargs)
        
        return {
            'system': template['system'],
            'user': formatted_prompt
        }
    
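    def analyze_with_template(self, template_name, **kwargs):
        """Execute analysis using template"""
        # Imported here so get_prompt() stays usable without the OpenAI SDK installed.
        # openai>=1.0 API: the old openai.ChatCompletion interface was removed in 1.0.
        from openai import OpenAI
        
        prompt_data = self.get_prompt(template_name, **kwargs)
        
        try:
            client = OpenAI()  # reads OPENAI_API_KEY from the environment
            response = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[
                    {"role": "system", "content": prompt_data['system']},
                    {"role": "user", "content": prompt_data['user']}
                ],
                max_tokens=600,
                temperature=0.2
            )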
            
            return response.choices[0].message.content
        
        except Exception as e:
            return f"Template analysis error: {e}"

# Example usage
prompt_manager = PromptTemplateManager()

# Analyze SQL query results
sql_analysis = prompt_manager.analyze_with_template(
    'sql_explainer',
    query="SELECT region, SUM(revenue) AS total_revenue FROM sales GROUP BY region ORDER BY total_revenue DESC",
    row_count=5,
    columns="region, total_revenue"
)

print("SQL Query Analysis:")
print(sql_analysis)
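
The other templates take pre-formatted text. For `trend_analyzer`, `trend_data` could be a period-by-period listing that includes the change from the previous period. The sketch below assumes `periods` and `values` are parallel lists, for example from a grouped Series via `.index.tolist()` and `.tolist()`. `format_trend_data` is a hypothetical helper:

```python
def format_trend_data(periods, values):
    """One line per period: the value and its % change from the previous period."""
    lines = []
    previous = None
    for period, value in zip(periods, values):
        if previous in (None, 0):
            change = "n/a"  # no prior period (or a zero base) to compare against
        else:
            change = f"{(value - previous) / previous:+.1%}"
        lines.append(f"{period}: {value:,.2f} ({change} vs prior)")
        previous = value
    return "\n".join(lines)


trend_data = format_trend_data(["2024-01", "2024-02", "2024-03"], [1000, 1100, 990])
print(trend_data)
```

The resulting string can be passed as `trend_data=trend_data` to `prompt_manager.analyze_with_template('trend_analyzer', ...)`, together with `data_description` and `time_period`.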

#### Combining All Three Techniques: Roles, Few-Shot, and Chain-of-Thought

Here's how they work together in practice:

AI-Enhanced SQL Reporter with all three techniques:

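```python
def comprehensive_business_analysis(df, sql_query, target_audience, business_context):
    """
    Role-based: Adapt for target audience (the 'ceo' and 'sales' configs below)
    Chain-of-thought: Show analytical reasoning
    Few-shot: Consistent report format

    Returns a chat `messages` list to send with client.chat.completions.create.
    """
    
    # Role configuration
    role_configs = {
        'ceo': {
            'persona': 'strategic executive focused on competitive advantage',
            'priorities': ['revenue impact', 'market position', 'resource allocation']
        },
        'sales': {
            'persona': 'sales leader focused on team performance and targets',
            'priorities': ['quota attainment', 'pipeline health', 'conversion rates']
        }
    }
    
    # Few-shot format example for consistency
    example_format = """
    - INSIGHT: Key finding with context
    - IMPLICATION: What this means for the business
    - ACTION: Specific next step with timeline
    - METRIC: How to track success
    """
    
    # Chain-of-thought reasoning structure
    reasoning_steps = """
    Step 1: Data Quality Assessment
    Step 2: Pattern Recognition
    Step 3: Business Context Integration
    Step 4: Strategic Recommendations
    """
    
    role = role_configs[target_audience]  # raises KeyError for unsupported audiences
    
    # Role-based: persona and priorities set WHO is analyzing
    system_message = f"You are a {role['persona']}. Your priorities: {', '.join(role['priorities'])}."
    
    # Chain-of-thought (HOW) and the few-shot format (WHAT) go into the user message
    user_message = f"""
    Business context: {business_context}
    SQL query: {sql_query}
    Data sample:
    {df.head().to_string()}
    
    Reason through these steps, showing your work:
    {reasoning_steps}
    Then present your findings in this format:
    {example_format}
    """
    
    # This creates analysis that is:
    # - Appropriate for the audience (role-based)
    # - Shows clear reasoning (chain-of-thought)
    # - Consistently formatted (few-shot)
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ]
```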

Client Consulting Work:
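```python
# Adapt your analysis to the client's organizational structure
def client_analysis(df, client_role, urgency="normal"):
    roles = {
        'startup_ceo': "fast decisions, resource constraints, growth focus",
        'enterprise_manager': "risk mitigation, compliance, scalability",
        'nonprofit_director': "impact measurement, donor relations, efficiency"
    }
    # Build a role instruction to prepend to any of the prompts above
    return (
        f"You are advising a {client_role.replace('_', ' ')} whose priorities are: "
        f"{roles[client_role]}. Urgency: {urgency}. "
        f"The data has {len(df)} rows with columns {list(df.columns)}."
    )
```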

In [None]:
import pyodbc
import pandas as pd
from openai import OpenAI
import os
import json
import time
from datetime import datetime
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

class AdvancedAIAnalyzer:
    """
    Combines Chain-of-Thought, Few-Shot Learning, and Role-Based prompting
    for comprehensive database analysis
    """
    
    def __init__(self, openai_api_key=None):
        """Initialize with OpenAI API key"""
        api_key = openai_api_key or os.getenv('OPENAI_API_KEY')
        
        if not api_key:
            raise ValueError("OpenAI API key required. Set OPENAI_API_KEY environment variable.")
        
        self.client = OpenAI(api_key=api_key)
        
        # Role configurations
        self.roles = {
            'ceo': {
                'persona': 'You are a CEO focused on strategic decisions, competitive advantage, and high-level business impact.',
                'focus': 'Strategic implications, market opportunities, major risks, resource allocation, competitive positioning',
                'format': 'Executive summary with strategic priorities and board-level recommendations',
                'priorities': ['revenue growth', 'market share', 'competitive advantage', 'strategic risks']
            },
            'sales_director': {
                'persona': 'You are a Sales Director focused on revenue optimization, team performance, and customer relationships.',
                'focus': 'Sales performance, customer behavior, pipeline health, territory management, conversion optimization',
                'format': 'Sales-focused insights with actionable tactics and performance metrics',
                'priorities': ['quota attainment', 'customer acquisition', 'deal velocity', 'team productivity']
            },
            'operations_manager': {
                'persona': 'You are an Operations Manager focused on efficiency, process optimization, and cost management.',
                'focus': 'Operational efficiency, process bottlenecks, cost optimization, resource utilization, quality metrics',
                'format': 'Operational recommendations with process improvements and efficiency gains',
                'priorities': ['cost reduction', 'process efficiency', 'quality improvement', 'resource optimization']
            },
            'data_scientist': {
                'persona': 'You are a Senior Data Scientist focused on statistical patterns, predictive opportunities, and technical insights.',
                'focus': 'Statistical significance, correlation patterns, predictive modeling opportunities, data quality, technical recommendations',
                'format': 'Technical analysis with statistical insights and modeling recommendations',
                'priorities': ['statistical patterns', 'predictive opportunities', 'data quality', 'analytical methods']
            }
        }
        
        # Few-shot examples for consistent formatting
        self.few_shot_examples = {
            'business_analysis': """
Example 1 - Customer Retention Analysis:
• TREND: Customer retention dropped from 87% to 82% over 6 months
• ROOT CAUSE: Product quality issues and increased competitor pricing pressure
• IMPACT: $2.3M annual revenue at risk if trend continues
• ACTION: Implement customer success program and review pricing strategy
• METRIC: Track monthly retention rate and customer satisfaction scores

Example 2 - Sales Performance Analysis:
• TREND: Q3 sales up 15% YoY but down 8% from Q2 seasonally adjusted
• ROOT CAUSE: Strong new customer acquisition offset by declining average deal size
• IMPACT: Revenue growth slowing, margin pressure from smaller deals
• ACTION: Focus on upselling existing customers and premium product positioning
• METRIC: Monitor average deal size and customer lifetime value monthly
""",
            'operational_analysis': """
Example 1 - Inventory Analysis:
• EFFICIENCY: Inventory turnover improved to 8.2x from 6.1x last year
• BOTTLENECK: Warehouse capacity constraining faster fulfillment in peak periods
• COST IMPACT: $450K in carrying cost savings, but $200K in lost sales from stockouts
• OPTIMIZATION: Implement demand forecasting and expand warehouse capacity by 25%
• KPI: Track inventory turnover, stockout rate, and fulfillment time

Example 2 - Order Processing Analysis:
• EFFICIENCY: Order processing time reduced from 2.1 to 1.4 days average
• BOTTLENECK: Manual credit checks causing 30% of delays
• COST IMPACT: Faster processing improved customer satisfaction by 12%
• OPTIMIZATION: Automate credit approval for orders under $5K
• KPI: Monitor processing time, manual intervention rate, customer satisfaction
"""
        }
    
    def connect_to_worldwideimporters(self, server='localhost\\SQLEXPRESS', database='WideWorldImporters'):
        """Connect to the SQL Server WideWorldImporters sample database"""
        try:
            # Semicolon-separated ODBC keywords, with no stray newlines or indentation
            connection_string = (
                "DRIVER={ODBC Driver 17 for SQL Server};"
                f"SERVER={server};"
                f"DATABASE={database};"
                "Trusted_Connection=yes;"
            )
            
            self.connection = pyodbc.connect(connection_string)
            print(f"✅ Connected to {database} database successfully!")
            return True
            
        except Exception as e:
            print(f"❌ Database connection failed: {e}")
            print("\nTroubleshooting tips:")
            print("1. Ensure SQL Server is running")
            print("2. Verify the database name: WideWorldImporters")
            print("3. Check that ODBC Driver 17 for SQL Server is installed")
            print("4. Try another server name format: 'localhost', '.', or your computer name")
            return False
    
    def execute_query(self, query):
        """Execute SQL query and return DataFrame"""
        try:
            # pandas officially supports SQLAlchemy connectables; with a raw pyodbc
            # connection the query still runs, but pandas may emit a UserWarning.
            df = pd.read_sql_query(query, self.connection)
            return df
        except Exception as e:
            print(f"Query execution error: {e}")
            return None
    
    def chain_of_thought_analysis(self, df, query, business_question):
        """Apply chain-of-thought reasoning to data analysis"""
        
        data_summary = f"""
Data Overview:
- Records: {df.shape[0]:,}
- Columns: {list(df.columns)}
- Date Range: {self._get_date_range(df)}
- Sample Records: {df.head(3).to_dict('records')}
"""
        
        prompt = f"""
SQL Query Executed:
{query}

{data_summary}

Business Question: {business_question}

Please analyze this step-by-step using chain-of-thought reasoning:

STEP 1: DATA COMPREHENSION
- What business processes does this data represent?
- What is the grain/level of this data?
- What time period and scope does this cover?
- Are there any immediate data quality observations?

STEP 2: PATTERN RECOGNITION
- What numerical patterns, trends, or distributions do you observe?
- Are there any seasonal, cyclical, or temporal patterns?
- What outliers or anomalies stand out?
- How do different segments or categories compare?

STEP 3: BUSINESS CONTEXT ANALYSIS
- What external business factors could influence these patterns?
- How do these numbers relate to typical business performance?
- What competitive or market dynamics might be at play?
- What operational processes could drive these results?

STEP 4: INSIGHT SYNTHESIS
- What are the 3 most significant insights from this analysis?
- What cause-and-effect relationships can you identify?
- What assumptions am I making, and how confident am I?
- What questions does this analysis raise?

STEP 5: STRATEGIC IMPLICATIONS
- What business decisions should this data inform?
- What actions would have the highest impact?
- What risks or opportunities does this reveal?
- What additional analysis would be valuable?

Show your complete reasoning process for each step.
"""
        
        return self._make_api_call(prompt, system_message="You are a senior business analyst who thinks step-by-step and shows detailed reasoning for data analysis.")
    
    def few_shot_analysis(self, df, query, analysis_type='business_analysis'):
        """Apply few-shot learning for consistent analysis format"""
        
        examples = self.few_shot_examples.get(analysis_type, self.few_shot_examples['business_analysis'])
        
        data_context = f"""
Query: {query}
Results: {df.shape[0]} records, {df.shape[1]} columns
Sample Data: {df.head(3).to_dict('records')}
Key Metrics: {list(df.select_dtypes(include=['number']).columns)}
"""
        
        prompt = f"""
Here are examples of high-quality business analysis format:

{examples}

Now analyze this data using the exact same structured format:

{data_context}

Provide your analysis following the same pattern as the examples above:
• TREND: [Clear description of main pattern/trend]
• ROOT CAUSE: [Most likely explanation for the trend]  
• IMPACT: [Business consequences with quantification where possible]
• ACTION: [Specific recommended next steps with timeline]
• METRIC: [How to track progress and success]

Focus on actionable business insights with specific recommendations.
"""
        
        return self._make_api_call(prompt, system_message="You are a business analyst who provides structured insights using the TREND-ROOT CAUSE-IMPACT-ACTION-METRIC format.")
    
    def role_based_analysis(self, df, query, role, business_context):
        """Generate role-specific analysis"""
        
        if role not in self.roles:
            return f"Role '{role}' not supported. Available: {list(self.roles.keys())}"
        
        role_config = self.roles[role]
        
        data_summary = f"""
Data Context: {business_context}
Query: {query}
Results: {df.shape[0]} records with columns: {list(df.columns)}
Sample: {df.head(2).to_dict('records')}
Key Numbers: {df.describe().to_dict() if len(df.select_dtypes(include=['number']).columns) > 0 else 'No numeric data'}
"""
        
        prompt = f"""
{role_config['persona']}

Analysis Focus: {role_config['focus']}
Key Priorities: {', '.join(role_config['priorities'])}
Expected Format: {role_config['format']}

{data_summary}

Provide analysis specifically tailored for a {role.replace('_', ' ').title()}:

1. EXECUTIVE SUMMARY (2-3 sentences)
   - What does this data mean for our {role.replace('_', ' ')} priorities?

2. KEY INSIGHTS (3-4 bullet points)
   - Focus on insights most relevant to {role.replace('_', ' ')} decisions

3. STRATEGIC RECOMMENDATIONS (2-3 actions)
   - Specific actions appropriate for {role.replace('_', ' ')} authority level

4. SUCCESS METRICS
   - KPIs this role should track based on this analysis

5. RISK ASSESSMENT
   - What could go wrong? What are we missing?

Tailor language, priorities, and recommendations specifically for a {role.replace('_', ' ').title()}.
"""
        
        return self._make_api_call(prompt, system_message=role_config['persona'])
    
    def comprehensive_analysis(self, query, business_question, target_roles=('ceo', 'sales_director'), business_context=""):
        """
        Run complete analysis combining all three techniques
        """
        print(f"🔄 Executing comprehensive analysis...")
        print(f"Query: {query[:100]}...")
        
        # Execute query
        df = self.execute_query(query)
        if df is None:
            return {"error": "Query failed; see the error printed above"}
        if df.empty:
            return {"error": "Query returned no data"}
        
        print(f"✅ Query executed: {df.shape[0]} records returned")
        
        results = {
            'query': query,
            'business_question': business_question,
            'data_summary': {
                'records': df.shape[0],
                'columns': list(df.columns),
                'date_range': self._get_date_range(df),
                'sample_data': df.head(3).to_dict('records')
            },
            'analyses': {}
        }
        
        # 1. Chain-of-Thought Analysis
        print("🧠 Running chain-of-thought analysis...")
        results['analyses']['chain_of_thought'] = self.chain_of_thought_analysis(df, query, business_question)
        
        # 2. Few-Shot Analysis  
        print("📋 Running few-shot structured analysis...")
        results['analyses']['few_shot'] = self.few_shot_analysis(df, query)
        
        # 3. Role-Based Analyses
        print(f"👥 Running role-based analysis for: {', '.join(target_roles)}")
        results['analyses']['role_based'] = {}
        
        for role in target_roles:
            print(f"   📊 Analyzing for {role.replace('_', ' ').title()}...")
            results['analyses']['role_based'][role] = self.role_based_analysis(df, query, role, business_context)
        
        return results
    
    def _make_api_call(self, prompt, system_message=None, max_tokens=800):
        """Make OpenAI API call with error handling using new client"""
        try:
            messages = []
            if system_message:
                messages.append({"role": "system", "content": system_message})
            messages.append({"role": "user", "content": prompt})
            
            response = self.client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=messages,
                max_tokens=max_tokens,
                temperature=0.3
            )
            
            return response.choices[0].message.content
            
        except Exception as e:
            return f"API Error: {e}"
    
    def _get_date_range(self, df):
        """Extract date range from DataFrame if date columns exist"""
        date_columns = df.select_dtypes(include=['datetime64']).columns
        if len(date_columns) > 0:
            date_col = date_columns[0]
            return f"{df[date_col].min()} to {df[date_col].max()}"
        return "No date columns found"
    
    def print_comprehensive_report(self, results):
        """Print formatted comprehensive analysis report"""
        
        print("\n" + "="*80)
        print("🚀 COMPREHENSIVE AI DATABASE ANALYSIS REPORT")
        print("="*80)
        
        print(f"\n📊 DATA SUMMARY:")
        print(f"Records Analyzed: {results['data_summary']['records']:,}")
        print(f"Columns: {', '.join(results['data_summary']['columns'])}")
        print(f"Date Range: {results['data_summary']['date_range']}")
        
        print(f"\n❓ BUSINESS QUESTION:")
        print(f"{results['business_question']}")
        
        print(f"\n🔍 SQL QUERY:")
        print(f"{results['query']}")
        
        # Chain-of-Thought Analysis
        print("\n" + "="*60)
        print("🧠 CHAIN-OF-THOUGHT ANALYSIS (Step-by-Step Reasoning)")
        print("="*60)
        print(results['analyses']['chain_of_thought'])
        
        # Few-Shot Analysis
        print("\n" + "="*60)
        print("📋 STRUCTURED ANALYSIS (Few-Shot Learning Format)")
        print("="*60)
        print(results['analyses']['few_shot'])
        
        # Role-Based Analyses
        print("\n" + "="*60)
        print("👥 ROLE-BASED PERSPECTIVES")
        print("="*60)
        
        for role, analysis in results['analyses']['role_based'].items():
            print(f"\n🎯 {role.replace('_', ' ').upper()} PERSPECTIVE:")
            print("-" * 50)
            print(analysis)
    
    def close_connection(self):
        """Close database connection"""
        if hasattr(self, 'connection'):
            self.connection.close()
            print("Database connection closed.")


# Sample business analysis queries for WideWorldImporters (Microsoft's SQL Server sample database)
SAMPLE_QUERIES = {
    'sales_performance': {
        'query': """
        SELECT 
            c.CustomerCategoryName,
            YEAR(o.OrderDate) as Year,
            MONTH(o.OrderDate) as Month,
            COUNT(DISTINCT o.OrderID) as OrderCount,
            COUNT(DISTINCT o.CustomerID) as UniqueCustomers,
            SUM(ol.Quantity * ol.UnitPrice) as TotalRevenue,
            SUM(ol.Quantity * ol.UnitPrice) / COUNT(DISTINCT o.OrderID) as AvgOrderValue
        FROM Sales.Orders o
        JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID
        JOIN Sales.Customers cust ON o.CustomerID = cust.CustomerID
        JOIN Sales.CustomerCategories c ON cust.CustomerCategoryID = c.CustomerCategoryID
        WHERE o.OrderDate >= '2016-01-01'
        GROUP BY c.CustomerCategoryName, YEAR(o.OrderDate), MONTH(o.OrderDate)
        ORDER BY Year DESC, Month DESC, TotalRevenue DESC
        """,
        'business_question': 'How is our sales performance trending across different customer categories, and what opportunities exist for growth?',
        'context': 'Monthly sales analysis for strategic planning and customer segment optimization',
        'roles': ['ceo', 'sales_director']
    },
    
    'inventory_analysis': {
        'query': """
        SELECT 
            sg.StockGroupName,
            si.StockItemName,
            sh.QuantityOnHand,
            sh.LastStocktakeQuantity,
            si.UnitPrice,
            (sh.QuantityOnHand * si.UnitPrice) as InventoryValue,
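            -- Per-unit gap between recommended retail price and unit price, not total profit
            si.RecommendedRetailPrice - si.UnitPrice as PotentialProfit
        FROM Warehouse.StockItemHoldings sh
        JOIN Warehouse.StockItems si ON sh.StockItemID = si.StockItemID
        -- An item in several stock groups can appear once per group, so de-duplicate
        -- by StockItemID before summing InventoryValue
        JOIN Warehouse.StockItemStockGroups sisg ON si.StockItemID = sisg.StockItemID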
        JOIN Warehouse.StockGroups sg ON sisg.StockGroupID = sg.StockGroupID
        WHERE sh.QuantityOnHand > 0
        ORDER BY InventoryValue DESC
        """,
        'business_question': 'What is our current inventory position and how can we optimize inventory management for better profitability?',
        'context': 'Inventory optimization analysis for operational efficiency and cash flow management',
        'roles': ['operations_manager', 'ceo']
    },
    
    'customer_behavior': {
        'query': """
        SELECT 
            c.CustomerName,
            COUNT(DISTINCT o.OrderID) as TotalOrders,
            SUM(ol.Quantity * ol.UnitPrice) as TotalSpent,
            SUM(ol.Quantity * ol.UnitPrice) / COUNT(DISTINCT o.OrderID) as AvgOrderValue,
            MIN(o.OrderDate) as FirstOrder,
            MAX(o.OrderDate) as LastOrder,
            DATEDIFF(day, MIN(o.OrderDate), MAX(o.OrderDate)) as CustomerLifespanDays
        FROM Sales.Customers c
        JOIN Sales.Orders o ON c.CustomerID = o.CustomerID
        JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID
        GROUP BY c.CustomerID, c.CustomerName
        HAVING COUNT(DISTINCT o.OrderID) >= 5
        ORDER BY TotalSpent DESC
        """,
        'business_question': 'Who are our most valuable customers and what patterns can we identify to improve customer retention and acquisition?',
        'context': 'Customer lifetime value analysis for relationship management and marketing strategy',
        'roles': ['sales_director', 'ceo', 'data_scientist']
    }
}


def main():
    """
    Main function to demonstrate all three AI techniques with WideWorldImporters
    """
    
    print("🚀 Advanced AI Database Analyzer")
    print("Combining Chain-of-Thought + Few-Shot Learning + Role-Based Analysis")
    print("="*70)
    
    # Initialize analyzer
    try:
        analyzer = AdvancedAIAnalyzer()
        print("✅ AI Analyzer initialized successfully!")
    except ValueError as e:
        print(f"❌ {e}")
        print("\nPlease set your OpenAI API key:")
        print("1. Create a .env file with: OPENAI_API_KEY=your_key_here")
        print("2. Or set environment variable: OPENAI_API_KEY")
        print('3. Install required packages: pip install "openai>=1.0"')
        return
    
    # Connect to database
    if not analyzer.connect_to_worldwideimporters():
        return
    
    print("\n" + "="*70)
    print("Available Analysis Options:")
    print("="*70)
    
    for key, config in SAMPLE_QUERIES.items():
        print(f"\n{key.upper().replace('_', ' ')}:")
        print(f"Question: {config['business_question']}")
        print(f"Roles: {', '.join([r.replace('_', ' ').title() for r in config['roles']])}")
    
    # Let user choose analysis or run all
    print(f"\nChoose analysis to run:")
    print("1. Sales Performance Analysis")
    print("2. Inventory Analysis") 
    print("3. Customer Behavior Analysis")
    print("4. Run All Analyses")
    
    choice = input("\nEnter choice (1-4) or press Enter for Sales Performance: ").strip()
    
    analyses_to_run = []
    if choice == '2':
        analyses_to_run = ['inventory_analysis']
    elif choice == '3':
        analyses_to_run = ['customer_behavior']
    elif choice == '4':
        analyses_to_run = list(SAMPLE_QUERIES.keys())
    else:
        analyses_to_run = ['sales_performance']
    
    # Run selected analyses
    for analysis_key in analyses_to_run:
        config = SAMPLE_QUERIES[analysis_key]
        
        print(f"\n{'='*80}")
        print(f"🔄 RUNNING: {analysis_key.upper().replace('_', ' ')} ANALYSIS")
        print(f"{'='*80}")
        
        results = analyzer.comprehensive_analysis(
            query=config['query'],
            business_question=config['business_question'],
            target_roles=config['roles'],
            business_context=config['context']
        )
        
        if 'error' in results:
            print(f"❌ Analysis failed: {results['error']}")
            continue
        
        # Print comprehensive report
        analyzer.print_comprehensive_report(results)
        
        if len(analyses_to_run) > 1:
            input("\nPress Enter to continue to next analysis...")
    
    # Cleanup
    analyzer.close_connection()
    print(f"\n✅ Analysis complete!")
    print(f"\n💡 Key Observations:")
    print(f"• Chain-of-Thought: Shows step-by-step reasoning process")
    print(f"• Few-Shot Learning: Provides consistent TREND-ROOT CAUSE-IMPACT-ACTION-METRIC format")
    print(f"• Role-Based: Adapts insights for different business stakeholders")
    print(f"\nThis demonstrates how AI can provide multiple perspectives on the same data!")


if __name__ == "__main__":
    main()

Note: this code was generated by Claude AI and has not been used in practice; review it further before relying on it.
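Computing `AVG(ol.Quantity * ol.UnitPrice)` averages over order *lines*, not orders, so an order with many small lines drags the figure down even when its total is large. Average order value is total revenue divided by the number of distinct orders, i.e. `SUM(ol.Quantity * ol.UnitPrice) / COUNT(DISTINCT o.OrderID)`. The toy pandas sketch below uses made-up numbers (not WideWorldImporters data) to show how far apart the two metrics can be:

```python
import pandas as pd

# Made-up order lines: order 1 has three cheap lines, order 2 has one expensive line
lines = pd.DataFrame({
    "OrderID":   [1, 1, 1, 2],
    "Quantity":  [1, 1, 1, 1],
    "UnitPrice": [10.0, 10.0, 10.0, 90.0],
})
lines["LineTotal"] = lines["Quantity"] * lines["UnitPrice"]

# Equivalent of AVG(ol.Quantity * ol.UnitPrice): mean over order lines
avg_line_value = lines["LineTotal"].mean()

# Average order value: total revenue divided by the number of distinct orders
avg_order_value = lines["LineTotal"].sum() / lines["OrderID"].nunique()

print(avg_line_value)   # 30.0
print(avg_order_value)  # 60.0
```

The same line-versus-order distinction applies to any query that joins `Sales.OrderLines` before aggregating.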

#### Dynamic Prompt Optimization

In [None]:
import json


class AdaptivePromptEngine:
    """Automatically optimize prompts based on data characteristics"""
    
    def __init__(self):
        self.optimization_history = {}
    
    def analyze_data_characteristics(self, df):
        """Analyze data to determine optimal prompting strategy"""
        characteristics = {
            'row_count': len(df),
            'column_count': len(df.columns),
            'numeric_columns': len(df.select_dtypes(include=['number']).columns),
            'categorical_columns': len(df.select_dtypes(include=['object', 'category']).columns),
            # select_dtypes also catches timezone-aware datetime columns
            'has_dates': len(df.select_dtypes(include=['datetime64', 'datetimetz']).columns) > 0,
            'has_nulls': df.isnull().any().any(),
            'data_size': 'small' if len(df) < 100 else 'medium' if len(df) < 1000 else 'large'
        }
        
        return characteristics
    
    def select_optimal_approach(self, characteristics, analysis_goal):
        """Choose best prompting approach based on data characteristics"""
        
        # Small datasets: Full data analysis
        if characteristics['data_size'] == 'small':
            return 'full_data_analysis'
        
        # Large datasets: Statistical summary approach
        elif characteristics['data_size'] == 'large':
            return 'statistical_summary'
        
        # Time series data: Trend analysis (only reached for medium-sized datasets, 100-999 rows)
        elif characteristics['has_dates']:
            return 'time_series_analysis'
        
        # Mostly categorical: Segment analysis
        elif characteristics['categorical_columns'] > characteristics['numeric_columns']:
            return 'categorical_analysis'
        
        # Default: Balanced approach
        else:
            return 'balanced_analysis'
    
    def generate_adaptive_prompt(self, df, question, analysis_goal='general'):
        """Generate optimized prompt based on data characteristics"""
        
        characteristics = self.analyze_data_characteristics(df)
        approach = self.select_optimal_approach(characteristics, analysis_goal)
        
        base_context = f"""
        Dataset Overview:
        - Size: {characteristics['row_count']} rows, {characteristics['column_count']} columns
        - Numeric metrics: {characteristics['numeric_columns']}
        - Categories: {characteristics['categorical_columns']}
        - Data quality: {'Clean' if not characteristics['has_nulls'] else 'Has missing values'}
        """
        
        if approach == 'full_data_analysis':
            # Small dataset - include more detail
            data_sample = df.head(10).to_dict('records')
            prompt = f"""
            {base_context}
            
            Dataset sample (first 10 rows):
            {json.dumps(data_sample, indent=2, default=str)}
            
            Question: {question}
            
            Since this is a small dataset, please provide:
            1. Detailed analysis of the sampled records
            2. Specific patterns and outliers
            3. Actionable recommendations
            4. Confidence level in findings
            """
        
        elif approach == 'statistical_summary':
            # Large dataset - focus on statistics
            stats_summary = df.describe().to_dict() if len(df.select_dtypes(include=['number']).columns) > 0 else {}
            prompt = f"""
            {base_context}
            
            Statistical Summary:
            {json.dumps(stats_summary, indent=2, default=str)}
            
            Question: {question}
            
            For this large dataset, focus on:
            1. High-level trends and patterns
            2. Statistical significance of findings
            3. Strategic implications
            4. Areas needing deeper investigation
            """
        
        elif approach == 'time_series_analysis':
            # Time-based data
            date_columns = df.select_dtypes(include=['datetime64', 'datetimetz']).columns.tolist()
            prompt = f"""
            {base_context}
            
            Time Series Data with date columns: {date_columns}
            Sample: {df.head(5).to_dict('records')}
            
            Question: {question}
            
            For this time series data, analyze:
            1. Temporal trends and seasonality
            2. Growth rates and momentum
            3. Forecast implications
            4. Time-based recommendations
            """
        
        else:
            # Balanced approach (also used for 'categorical_analysis', which has no dedicated branch)
            prompt = f"""
            {base_context}
            
            Representative Sample:
            {df.head(5).to_dict('records')}
            
            Question: {question}
            
            Please provide balanced analysis covering:
            1. Key insights and patterns
            2. Business implications
            3. Actionable recommendations
            4. Next steps for investigation
            """
        
        return prompt
    
    def adaptive_analysis(self, df, question, analysis_goal='general'):
        """Run adaptive analysis with optimized prompting"""
        
        optimized_prompt = self.generate_adaptive_prompt(df, question, analysis_goal)
        
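        try:
            # openai>=1.0 removed openai.ChatCompletion; use the client-based API instead.
            # OpenAI() reads the OPENAI_API_KEY environment variable.
            from openai import OpenAI
            client = OpenAI()
            response = client.chat.completions.create(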
                model="gpt-3.5-turbo",
                messages=[
                    {"role": "system", "content": "You are an expert data analyst who adapts analysis depth to data characteristics."},
                    {"role": "user", "content": optimized_prompt}
                ],
                max_tokens=700,
                temperature=0.3
            )
            
            return response.choices[0].message.content
        
        except Exception as e:
            return f"Adaptive analysis error: {e}"

# Test adaptive prompting
adaptive_engine = AdaptivePromptEngine()

# This will automatically choose the best approach for your data
adaptive_results = adaptive_engine.adaptive_analysis(
    sales_data, 
    "What strategies should we pursue to maximize revenue growth?",
    analysis_goal='strategic_planning'
)

print("Adaptive Analysis Results:")
print(adaptive_results)
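`select_optimal_approach` can return `'categorical_analysis'`, but `generate_adaptive_prompt` has no matching branch, so segment-heavy data falls through to the balanced prompt and only five raw rows reach the model. One option is to summarize each categorical column's distribution instead. The function below is a hypothetical, standalone sketch (`build_categorical_prompt` and `top_n` are illustrative names, not part of the original class) built on pandas `value_counts(normalize=True)`; the toy DataFrame is made up:

```python
import pandas as pd


def build_categorical_prompt(df, question, top_n=3):
    """Hypothetical prompt builder for the 'categorical_analysis' approach."""
    cat_cols = df.select_dtypes(include=["object", "category"]).columns
    lines = []
    for col in cat_cols:
        # Share of rows taken by the most frequent values (missing values are excluded)
        shares = df[col].value_counts(normalize=True).head(top_n)
        top = ", ".join(f"{value} ({share:.0%})" for value, share in shares.items())
        lines.append(f"- {col}: {df[col].nunique()} distinct values; top: {top}")
    return (
        f"Dataset: {len(df)} rows, {len(cat_cols)} categorical columns\n"
        "Category breakdown:\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}\n\n"
        "For this segment-heavy data, analyze:\n"
        "1. Which segments dominate and which are underrepresented\n"
        "2. How key metrics differ between segments\n"
        "3. Segment-specific recommendations\n"
    )


# Toy data for illustration only
segments = pd.DataFrame({
    "CustomerCategory": pd.Series(
        ["Novelty Shop", "Novelty Shop", "Supermarket", "Gift Store"], dtype="object"
    ),
    "Region": pd.Series(["East", "West", "East", "East"], dtype="object"),
    "Revenue": [100, 200, 300, 400],
})
prompt = build_categorical_prompt(segments, "Which customer segments should we prioritize?")
print(prompt)
```

To wire it in, an `elif approach == 'categorical_analysis':` branch in `generate_adaptive_prompt` could call a helper like this before the final `else`.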