Advanced Prompt Engineering Problem Set - Unit 1.3
=======================================================================

This problem focuses on market sentiment analysis using multiple data sources
and self-consistency checking. You'll create a system that generates robust
sentiment analysis with confidence scoring.

### Key Concepts to Practice
----------
1. Multi-Source Sentiment Analysis
2. Self-Consistency Checking
3. Confidence Scoring
4. Consensus Generation
5. Cross-Validation Techniques
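Self-consistency checking usually means asking the model the same question several times with sampling enabled, then keeping the answer that most samples agree on. The vote share also works as a rough confidence signal. Below is a minimal sketch that assumes a caller-supplied `sample_fn` (a hypothetical name) returning one sentiment label per call:

```python
from collections import Counter
from typing import Callable, Dict, List


def self_consistent_label(sample_fn: Callable[[int], str], n_samples: int = 5) -> Dict[str, object]:
    """Majority vote over n sampled labels (hypothetical helper).

    sample_fn(i) is assumed to return one sentiment label for sample i,
    e.g. from an LLM call made with temperature > 0.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    # Normalize case and whitespace so "Positive" and "positive " count as one vote.
    labels: List[str] = [sample_fn(i).strip().lower() for i in range(n_samples)]
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    # Agreement is the majority's share of all samples (1.0 = unanimous).
    return {"label": label, "agreement": votes / n_samples, "votes": dict(counts)}


# Canned answers stand in for five sampled LLM replies.
canned = ["Positive", "positive", "Neutral", "Positive", "Bullish"]
result = self_consistent_label(lambda i: canned[i], n_samples=5)
print(result)  # {'label': 'positive', 'agreement': 0.6, 'votes': {'positive': 3, 'neutral': 1, 'bullish': 1}}
```

"Bullish" is counted separately from "positive" here. Mapping synonyms onto canonical labels before voting would merge them.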

Let's build a robust sentiment analysis system!

## Step 0: Setup and Dependencies
--------------------------------
First, let's ensure we have all required packages installed.

In [None]:
!pip install numpy pandas matplotlib langchain openai python-dotenv typing-extensions pydantic pydantic-settings langchain-community langchain-openai --quiet

## Step 1: Initial Configuration
--------------------------------
Set up our environment and imports.

In [1]:
from typing import Any, Dict, List

## Step 1.5: Configuration Management
--------------------------------
Load OpenAI credentials from a `.env` file and manage model settings.

In [35]:
import os
from typing import Optional
from pydantic_settings import BaseSettings
from langchain_openai import ChatOpenAI

import json

In [None]:
# Load OPENAI_API_KEY from a .env file in the working directory
from dotenv import load_dotenv

load_dotenv()

api_key = os.getenv("OPENAI_API_KEY")

if not api_key:
    raise ValueError("API key is missing. Check your .env file.")

print("API key loaded successfully")
OPENAI_API_KEY = api_key

In [4]:
class Settings(BaseSettings):
    """Configuration management for API credentials.

    This class manages API credentials for:
    1. OpenAI
    2. Other services as needed

    Attributes:
        openai_api_key: OpenAI API key
        model_name: OpenAI model identifier
        temperature: Model temperature setting
    """

    openai_api_key: str = OPENAI_API_KEY
    model_name: str = "gpt-3.5-turbo-0125"
    temperature: float = 0.7

In [5]:
def setup_environment() -> ChatOpenAI:
    """Initialize environment and create LLM instance.

    This function:
    1. Loads settings
    2. Sets environment variables
    3. Initializes chat model

    Returns:
        ChatOpenAI: Configured language model instance
    """
    # Load settings
    settings = Settings()

    # Set environment variables
    os.environ["OPENAI_API_KEY"] = settings.openai_api_key

    # Initialize ChatOpenAI with settings
    llm = ChatOpenAI(
        model_name=settings.model_name,
        temperature=settings.temperature
    )

    return llm

In [6]:
# Initialize LLM
try:
    llm = setup_environment()
except Exception as e:
    print(f"Error initializing LLM: {e}")
    print("Please ensure API key is properly set")

## Usage Instructions:
1. Run the pip install cell first
2. Add `OPENAI_API_KEY=...` to a `.env` file in the working directory
3. Run remaining cells to initialize environment
4. Use `llm` instance in your code

## Configuration Tips:
1. Keep API keys secure
2. Update settings as needed
3. Add additional services similarly
4. Manage environment consistently

## Problem 3: Market Sentiment Analysis System
--------------------------------
Design and implement a comprehensive market sentiment analysis system
that combines multiple data sources and ensures consistency.

### Requirements:
1. Multi-Source Analysis:
   - News articles and headlines
   - Social media sentiment
   - Technical indicators
   - Market statistics
   - Analyst reports

2. Self-Consistency Checks:
   - Cross-validation of sources
   - Internal consistency metrics
   - Temporal consistency
   - Source reliability scoring

3. Confidence Scoring:
   - Source-specific confidence
   - Analysis reliability metrics
   - Consensus confidence
   - Time-sensitivity factors
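One way to fold source reliability, source-specific confidence, and time sensitivity into a single weight is to multiply them, applying an exponential half-life decay to stale data. The reliability priors and the 24-hour half-life below are illustrative assumptions, not values defined by this problem set:

```python
from typing import Dict

# Illustrative reliability priors per source (assumed values; calibrate on your own data).
SOURCE_RELIABILITY: Dict[str, float] = {
    "news": 0.8,
    "social_media": 0.5,
    "technical_indicators": 0.7,
    "market_stats": 0.9,
    "analyst_reports": 0.75,
}


def source_weight(source: str, confidence: float, age_hours: float,
                  half_life_hours: float = 24.0) -> float:
    """Weight = reliability prior x clamped confidence x half-life decay."""
    if half_life_hours <= 0:
        raise ValueError("half_life_hours must be positive")
    reliability = SOURCE_RELIABILITY.get(source, 0.5)  # unknown sources get a neutral prior
    confidence = min(max(confidence, 0.0), 1.0)         # keep model-reported confidence in [0, 1]
    decay = 0.5 ** (max(age_hours, 0.0) / half_life_hours)  # halves every half_life_hours
    return reliability * confidence * decay


print(source_weight("market_stats", 1.0, age_hours=0))  # fresh, fully confident: 0.9
print(source_weight("news", 0.5, age_hours=24))         # one half-life old: about 0.2
```

The multiplicative form means any single factor near zero (stale data, low confidence, or an unreliable source) suppresses that source's weight.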

### Template Structure:

In [None]:
class MarketSentimentAnalyzer:
    """A system for comprehensive market sentiment analysis.

    Implement this class to create a robust sentiment analysis system
    that combines multiple sources and ensures consistency.
    """

    def __init__(self, llm: Any):
        """Initialize sentiment analysis system.

        Your implementation should:
        1. Set up data source handlers
        2. Configure analysis parameters
        3. Initialize scoring systems

        Args:
            llm: LangChain chat model instance
        """
        self.llm = llm
        
        # Data source handlers
        self.news_source_handler = self._initialize_news_handler()
        self.social_media_handler = self._initialize_social_media_handler()
        self.technical_indicator_handler = self._initialize_technical_handler()
        self.market_stats_handler = self._initialize_market_stats_handler()
        self.analyst_report_handler = self._initialize_analyst_report_handler()

        print("✅Market Sentiment Analyzer initialized successfully.")

    def _initialize_news_handler(self):
        """Set up the news handler (parses headlines and article text)."""

        def fun_analyze_news_sentiment(market_data):
            news_articles = market_data.get("news_articles", "").strip().split("\n\n")
            info_news = "\n".join(
                article.replace("HEADLINE: ", "") for article in news_articles if article
            )
            if not info_news:
                return json.dumps({"overall_analysis": "Undefined", "explanation": "No news found", "confidence_score": 0.0})

            # f-string so the articles are actually interpolated; doubled braces
            # render as literal braces in the JSON template.
            prompt = f"""
                Role: Financial expert with stock market knowledge.
                Objective: Determine whether the market is favorable for investment.
                Task: Analyze the sentiment of the following news articles:
                {info_news}

                Respond with JSON only, in the following format:
                {{
                    "overall_analysis": "one-word sentiment",
                    "explanation": "financial expert insights and recommendation",
                    "confidence_score": <float between 0.0 and 1.0>
                }}""".strip()

            try:
                response = self.llm.invoke(prompt).content
                json.loads(response)  # validate before passing the JSON string on
                return response

            except json.JSONDecodeError as e:
                print("❌ Error parsing LLM response for news:", e)
                return json.dumps({"overall_analysis": "Undefined", "explanation": "Could not parse the news analysis", "confidence_score": 0.0})

        return fun_analyze_news_sentiment
    
    def _initialize_social_media_handler(self):
        """Set up the social media handler (parses sentiment statistics)."""

        def fun_analyze_social_media_sentiment(market_data):
            info_social_media = market_data.get("social_media_sentiment", "").strip()
            if not info_social_media:
                return json.dumps({"overall_analysis": "Undefined", "explanation": "No social media data found", "confidence_score": 0.0})

            # f-string so the data is interpolated; doubled braces stay literal.
            prompt = f"""
                Role: Marketing expert with some knowledge of linguistics.
                Objective: Determine whether the general public has a positive or negative sentiment toward the market.
                Task: Analyze the sentiment of the following information retrieved from social media:
                {info_social_media}

                Respond with JSON only, in the following format:
                {{
                    "overall_analysis": "one-word sentiment",
                    "explanation": "in-depth explanation for an investor and the recommendation",
                    "confidence_score": <float between 0.0 and 1.0>
                }}""".strip()

            try:
                response = self.llm.invoke(prompt).content
                json.loads(response)  # validate before passing the JSON string on
                return response

            except json.JSONDecodeError as e:
                print("❌ Error parsing LLM response for social media:", e)
                return json.dumps({"overall_analysis": "Undefined", "explanation": "Could not parse the social media analysis", "confidence_score": 0.0})

        return fun_analyze_social_media_sentiment
    
    def _initialize_technical_handler(self):
        """Set up the technical indicator handler (parses indicator values)."""

        def fun_analyze_technical_indicators(market_data):
            info_technical_indicators = market_data.get("technical_indicators", "").strip()
            if not info_technical_indicators:
                return json.dumps({"overall_analysis": "Undefined", "explanation": "No technical indicators found", "confidence_score": 0.0})

            # f-string so the indicators are interpolated; doubled braces stay literal.
            prompt = f"""
                Role: Stock market expert specializing in technical analysis.
                Objective: Determine whether the market is bearish, bullish, or neutral.
                Task: Perform a technical analysis of the following indicators:
                {info_technical_indicators}

                Respond with JSON only, in the following format:
                {{
                    "overall_analysis": "one-word result",
                    "explanation": "in-depth explanation for an investor and the recommendation",
                    "confidence_score": <float between 0.0 and 1.0>
                }}""".strip()

            try:
                response = self.llm.invoke(prompt).content
                json.loads(response)  # validate before passing the JSON string on
                return response

            except json.JSONDecodeError as e:
                print("❌ Error parsing LLM response for technical indicators:", e)
                return json.dumps({"overall_analysis": "Undefined", "explanation": "Could not parse the technical analysis", "confidence_score": 0.0})

        return fun_analyze_technical_indicators
    
    def _initialize_market_stats_handler(self):
        """Set up the market statistics handler (extracts performance stats)."""

        def fun_analyze_market_stats(market_data):
            info_market_stats = market_data.get("market_statistics", "").strip()
            if not info_market_stats:
                return json.dumps({"overall_analysis": "Undefined", "explanation": "No market statistics found", "confidence_score": 0.0})

            # f-string so the statistics are interpolated; doubled braces stay literal.
            prompt = f"""
                Role: Stock market expert.
                Objective: Determine whether the market is bearish, bullish, or neutral.
                Task: Analyze the following market statistics:
                {info_market_stats}

                Respond with JSON only, in the following format:
                {{
                    "overall_analysis": "one-word result",
                    "explanation": "in-depth explanation for an investor and the recommendation",
                    "confidence_score": <float between 0.0 and 1.0>
                }}""".strip()

            try:
                response = self.llm.invoke(prompt).content
                json.loads(response)  # validate before passing the JSON string on
                return response

            except json.JSONDecodeError as e:
                print("❌ Error parsing LLM response for market statistics:", e)
                return json.dumps({"overall_analysis": "Undefined", "explanation": "Could not parse the market statistics analysis", "confidence_score": 0.0})

        return fun_analyze_market_stats
    
    def _initialize_analyst_report_handler(self):
        """Set up the analyst report handler (extracts report insights)."""

        def fun_analyze_analyst_reports(market_data):
            info_reports = market_data.get("analyst_reports", "").strip()
            if not info_reports:
                return json.dumps({"overall_analysis": "Undefined", "explanation": "No analyst reports found", "confidence_score": 0.0})

            # f-string so the reports are interpolated; doubled braces stay literal.
            prompt = f"""
                Role: Investor with some experience in the sector.
                Objective: Determine whether the market is favorable for investing.
                Task: Analyze the following analyst reports:
                {info_reports}

                Respond with JSON only, in the following format:
                {{
                    "overall_analysis": "one-word result",
                    "explanation": "in-depth explanation for an investor and the recommendation",
                    "confidence_score": <float between 0.0 and 1.0>
                }}""".strip()

            try:
                response = self.llm.invoke(prompt).content
                json.loads(response)  # validate before passing the JSON string on
                return response

            except json.JSONDecodeError as e:
                print("❌ Error parsing LLM response for analyst reports:", e)
                return json.dumps({"overall_analysis": "Undefined", "explanation": "Could not parse the analyst report analysis", "confidence_score": 0.0})

        return fun_analyze_analyst_reports


    def fun_analyze_general_sentiment(self, market_data: Dict[str, str]) -> List[Dict[str, Any]]:
        """Analyze market sentiment from multiple sources.

        Your implementation should:
        1. Process each data source
        2. Generate individual analyses
        3. Assign confidence scores

        Args:
            market_data: Dictionary of data sources and their content

        Returns:
            List of per-source analyses, each with "source", "overall_analysis",
            "explanation", and "confidence_score" keys
        """

        dict_analyses_raw = {
            "news": self.news_source_handler(market_data),
            "social_media": self.social_media_handler(market_data),
            "technical_indicators": self.technical_indicator_handler(market_data),
            "market_stats": self.market_stats_handler(market_data),
            "analyst_reports": self.analyst_report_handler(market_data)
        }

        dict_analyses_results = []
        for source, data in dict_analyses_raw.items():
            print(source, data)

            # Handlers return JSON strings; accept dicts as well and fall back
            # to an "Undefined" entry if a reply cannot be parsed.
            try:
                parsed = data if isinstance(data, dict) else json.loads(data)
            except (TypeError, json.JSONDecodeError):
                parsed = {}
            if not isinstance(parsed, dict):
                parsed = {}

            dict_analyses_results.append({
                "source": source,
                "overall_analysis": parsed.get("overall_analysis", "Undefined"),
                "explanation": parsed.get("explanation", "Unparseable response"),
                "confidence_score": parsed.get("confidence_score", 0.0)
            })

        return dict_analyses_results
    
    def fun_check_consistency(self, analyses: List[Dict[str, Any]]) -> Dict[str, Any]:
        """Check consistency between different analyses.

        Your implementation should:
        1. Compare source agreements
        2. Measure consistency metrics
        3. Identify contradictions

        Args:
            analyses: List of individual analyses

        Returns:
            Dict with a "consistency_score" float and a "contradictions" mapping
        """
        formatted_analyses = json.dumps(analyses, indent=4)
        response = self.llm.invoke(
            f"""
            Given the following sentiment analyses, determine the level of agreement between the sources.
            Treat synonyms and similar sentiments as consistent.

            Analyses:
            {formatted_analyses}

            Analyze the agreement among sources and respond with JSON only, in the following format:
            {{
                "consistency_score": <float between 0.0 and 1.0>,
                "contradictions": {{
                    "source_name": "explanation of why it contradicts the others"
                }}
            }}
            Use the actual source names from the analyses instead of placeholders.
            """
        ).content

        try:
            # Reject empty or whitespace-only replies before parsing
            if not response.strip():
                raise ValueError("Empty response from LLM")

            consistency_results = json.loads(response)
            return {
                "consistency_score": consistency_results.get("consistency_score", 0.0),
                "contradictions": consistency_results.get("contradictions", {})
            }

        # json.JSONDecodeError is a subclass of ValueError, so this also
        # catches the empty-response case raised above.
        except ValueError as e:
            print("Error decoding JSON response from LLM:", e)
            print("Raw response:", response)

            return {
                "consistency_score": 0.0,
                "contradictions": {},
                "error": "Invalid JSON format received from LLM"
            }
        
    def fun_generate_consensus(self, analyses: List[Dict[str, Any]], consistency_results: Dict[str, Any]) -> Dict[str, Any]:
        """Generate weighted consensus with confidence scores.

        Your implementation should:
        1. Weight source contributions
        2. Resolve contradictions
        3. Calculate confidence

        Args:
            analyses: List of individual analyses
            consistency_results: Output of fun_check_consistency

        Returns:
            Dict containing consensus and confidence scores
        """
        formatted_analyses = json.dumps(analyses, indent=4)
        formatted_consistency = json.dumps(consistency_results, indent=4)
        response = self.llm.invoke(
            f"""
            Role: An investor in a meeting with a marketing expert, a stock market expert, and a financial expert.
            Given the following sentiment analyses and their consistency evaluation, determine a weighted consensus sentiment by considering source importance, resolving contradictions, and calculating an overall confidence score.
            If some sources are missing, state in the explanation that not all sources were considered.

            Sentiment Analyses:
            {formatted_analyses}

            Consistency Evaluation:
            {formatted_consistency}

            Respond with JSON only, in the following format:

            {{
                "consensus_sentiment": "overall one-word sentiment",
                "explanation": "detailed explanation of how the consensus was derived",
                "confidence_score": <float between 0.0 and 1.0>
            }}
            Ensure that the response reflects the weighted contributions from the sources and accounts for contradictions.
            """
        ).content

        try:
            consensus_results = json.loads(response)
        except json.JSONDecodeError as e:
            print("Error decoding JSON response from LLM:", e)
            print("Raw response:", response)
            consensus_results = {}

        return {
            "consensus_sentiment": consensus_results.get("consensus_sentiment", "No consensus achieved."),
            "explanation": consensus_results.get("explanation", "No explanation provided."),
            "confidence_score": consensus_results.get("confidence_score", 0.0)
        }
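The class above parses each raw reply with `json.loads`. That call fails if the model wraps its JSON in a Markdown fence or adds a sentence before it. A more forgiving parser might look like the following. `parse_llm_json` and `DEFAULT_FALLBACK` are hypothetical names, and taking the outermost braces is a heuristic that assumes the reply contains exactly one JSON object:

```python
import json
from typing import Any, Dict, Optional

DEFAULT_FALLBACK: Dict[str, Any] = {
    "overall_analysis": "Undefined",
    "explanation": "Unparseable LLM response",
    "confidence_score": 0.0,
}


def parse_llm_json(raw: str, fallback: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Parse the outermost {...} span of an LLM reply, ignoring fences or chatter."""
    fallback = dict(fallback if fallback is not None else DEFAULT_FALLBACK)
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        return fallback
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return fallback
    if "confidence_score" in data:
        # Coerce to float and clamp into [0, 1]; non-numeric values become 0.0.
        try:
            data["confidence_score"] = min(max(float(data["confidence_score"]), 0.0), 1.0)
        except (TypeError, ValueError):
            data["confidence_score"] = 0.0
    return data


reply = '```json\n{"overall_analysis": "Positive", "explanation": "ok", "confidence_score": 1.4}\n```'
print(parse_llm_json(reply)["confidence_score"])                    # 1.0
print(parse_llm_json("I cannot answer that.")["overall_analysis"])  # Undefined
```

Clamping matters because nothing in the prompt strictly enforces the 0 to 1 range, and a downstream weighted average would be skewed by out-of-range values.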
    

### Example Test Data:

In [210]:
market_data = {
    "news_articles": """
HEADLINE: Tech Stocks Rally on Strong Earnings
(Reuters) - Technology stocks surged today following better-than-expected
earnings from major players. Apple Inc. and Microsoft Corp. both beat analyst
estimates, driving broader market gains. Cloud computing and AI segments
showed particular strength.

HEADLINE: Fed Signals Potential Rate Cuts
The Federal Reserve indicated openness to rate cuts later this year, citing
moderating inflation pressures. Markets responded positively to the news,
with bond yields declining.

HEADLINE: Startup Funding Shows Signs of Recovery
Venture capital investments increased 15% in Q4, marking the first
quarterly rise since 2022. Software and fintech sectors led the recovery.
""",

    "social_media_sentiment": """
$AAPL trending positive:
- 65% positive mentions
- 28% neutral mentions
- 7% negative mentions
Volume: 50,000 mentions

$MSFT sentiment metrics:
- 72% positive mentions
- 22% neutral mentions
- 6% negative mentions
Volume: 35,000 mentions

#TechStocks trending topics:
1. #EarningsSeason
2. #TechRally
3. #InvestInTech
""",

    "technical_indicators": """
Market Technical Analysis:
- S&P 500 RSI: 62.5
- NASDAQ RSI: 65.8
- VIX: 16.5
- Moving Averages: Most above 200-day
- Volume: +25% vs 30-day average
- Advance/Decline: 2.5:1
""",

    "market_statistics": """
Market Overview:
- S&P 500: +1.2%
- NASDAQ: +1.8%
- DOW: +0.9%
- Small Caps: +1.5%
- Sector Leaders: Tech +2.3%, Communications +1.9%
- Sector Laggards: Utilities -0.4%, Real Estate -0.2%
""",

    "analyst_reports": """
Goldman Sachs: Overweight Tech Sector
- Target price revisions: +10% average
- Sector outlook: Positive
- Key drivers: AI adoption, cloud growth
- Risk factors: Valuations, rate sensitivity

Morgan Stanley: Market Outlook
- Stance: Constructively bullish
- Focus areas: Quality growth stocks
- Concerns: Geopolitical tensions
- 12-month S&P target: 5200
"""
}
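The news handler splits `news_articles` on blank lines and strips the `HEADLINE: ` prefix, which leaves each headline and its body in a single string. To weight headlines separately, or to pass them to the model as structured records, you could use a small parser like the one below. `split_news` is a hypothetical helper that assumes the `HEADLINE:` layout used in the test data:

```python
from typing import Dict, List


def split_news(raw: str) -> List[Dict[str, str]]:
    """Split blank-line-separated 'HEADLINE: ...' blocks into headline/body records."""
    articles: List[Dict[str, str]] = []
    for chunk in raw.strip().split("\n\n"):
        lines = [line.strip() for line in chunk.splitlines() if line.strip()]
        # Skip empty chunks and anything that does not start with a headline marker.
        if not lines or not lines[0].startswith("HEADLINE:"):
            continue
        headline = lines[0][len("HEADLINE:"):].strip()
        body = " ".join(lines[1:])  # re-join wrapped body lines into one paragraph
        articles.append({"headline": headline, "body": body})
    return articles


sample = """
HEADLINE: Fed Signals Potential Rate Cuts
The Federal Reserve indicated openness to rate cuts
later this year.
"""
print(split_news(sample))
```

Applied to `market_data["news_articles"]`, this should return three records, one for each headline block.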

In [216]:
sentiment_analyzer = MarketSentimentAnalyzer(llm)

# Run the per-source analyses on the test data
sentiment_results = sentiment_analyzer.fun_analyze_general_sentiment(market_data)

# Print the per-source results
print(json.dumps(sentiment_results, indent=4))

✅Market Sentiment Analyzer initialized successfully.
news {
    "overall_analysis": "Positive",
    "explanation": "After analyzing the sentiment of the news headlines, it seems that the market is currently showing positive signs for investments. The news articles indicate potential growth and opportunities in various sectors. As a financial expert, I would recommend considering investments in sectors that are highlighted in the news as they seem to have favorable conditions. However, it is important to always conduct thorough research and consider diversification to mitigate risks.",
    "confidence_score": 0.85
}
social_media {
    "overall_analysis": "Positive",
    "explanation": "The sentiment of the general public towards the general market seems to be positive based on the information retrieved from social media. There are mentions of excitement, optimism, and confidence in the market's performance. This positive sentiment can be beneficial for investors looking to capitalize on

In [217]:
sentiment_consistency = sentiment_analyzer.fun_check_consistency(sentiment_results)
print(json.dumps(sentiment_consistency, indent = 4))

{
    "consistency_score": 0.6,
    "contradictions": {
        "technical_indicators": "The technical indicators source indicates a bearish trend, which contradicts the positive sentiment expressed by the news, social media, and analyst reports."
    }
}
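The consistency score above is itself an LLM output, so it can change from one run to the next. A deterministic cross-check is to count how many sources share the majority label once synonyms have been mapped together. The `LABEL_MAP` below is an assumed vocabulary, and the example labels are illustrative inputs rather than results. Extend the map with whatever labels your prompts actually return:

```python
from collections import Counter
from typing import Any, Dict, List

# Assumed synonym map onto three canonical labels; unmapped labels become "unknown".
LABEL_MAP: Dict[str, str] = {
    "positive": "bullish", "bullish": "bullish", "optimistic": "bullish",
    "negative": "bearish", "bearish": "bearish", "pessimistic": "bearish",
    "neutral": "neutral", "mixed": "neutral",
}


def agreement_score(analyses: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Share of sources agreeing with the majority canonical label."""
    labels = {
        a["source"]: LABEL_MAP.get(str(a.get("overall_analysis", "")).strip().lower(), "unknown")
        for a in analyses
    }
    if not labels:
        return {"consistency_score": 0.0, "majority": None, "dissenters": []}
    majority, votes = Counter(labels.values()).most_common(1)[0]
    dissenters = [source for source, label in labels.items() if label != majority]
    return {"consistency_score": votes / len(labels), "majority": majority, "dissenters": dissenters}


# Illustrative labels, not model output.
example = [
    {"source": "news", "overall_analysis": "Positive"},
    {"source": "social_media", "overall_analysis": "Positive"},
    {"source": "technical_indicators", "overall_analysis": "Bearish"},
    {"source": "market_stats", "overall_analysis": "Neutral"},
    {"source": "analyst_reports", "overall_analysis": "Bullish"},
]
print(agreement_score(example))
```

A large gap between this count and the LLM-reported score is a useful signal that one of the two deserves a closer look.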


In [218]:
consensus = sentiment_analyzer.fun_generate_consensus(sentiment_results, sentiment_consistency)
print(json.dumps(consensus, indent = 4))
    

{
    "consensus_sentiment": "Neutral",
    "explanation": "The consensus sentiment was derived by considering the weighted contributions from different sources. Although the news, social media, and analyst reports expressed positive sentiments, the technical indicators source indicated a bearish trend, causing a contradiction. The market statistics source also suggested a neutral position, adding to the mixed signals. Therefore, after weighing the inputs from all sources and resolving the contradictions, the overall sentiment is assessed as neutral.",
    "confidence_score": 0.76
}
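The consensus step leaves all of the weighting to the model. For comparison, you can compute a confidence-weighted vote directly. Each source adds `weight × confidence` to its label, and the winning label's share of the total becomes the consensus confidence. The per-source weights are assumed inputs that default to 1.0. Labels are only lowercased, so synonyms such as "positive" and "bullish" would need to be merged beforehand:

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


def weighted_consensus(analyses: List[Dict[str, Any]],
                       weights: Optional[Dict[str, float]] = None) -> Dict[str, Any]:
    """Confidence-weighted vote over per-source labels (illustrative sketch)."""
    weights = weights or {}
    totals: Dict[str, float] = defaultdict(float)
    for a in analyses:
        label = str(a.get("overall_analysis", "")).strip().lower()
        confidence = min(max(float(a.get("confidence_score", 0.0)), 0.0), 1.0)
        totals[label] += weights.get(a["source"], 1.0) * confidence
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {"consensus_sentiment": "undefined", "confidence_score": 0.0, "scores": {}}
    winner = max(totals, key=totals.get)
    # Confidence = winning label's share of all weighted votes.
    return {"consensus_sentiment": winner,
            "confidence_score": totals[winner] / grand_total,
            "scores": dict(totals)}


votes = [
    {"source": "news", "overall_analysis": "Positive", "confidence_score": 0.8},
    {"source": "social_media", "overall_analysis": "Positive", "confidence_score": 0.6},
    {"source": "technical_indicators", "overall_analysis": "Negative", "confidence_score": 0.9},
]
print(weighted_consensus(votes, weights={"social_media": 0.5}))
```

In this example "positive" collects 0.8 + 0.5 × 0.6 = 1.1 of the 2.0 total, so the result is "positive" with a confidence of about 0.55. Because the calculation is explicit, it is easy to audit against the explanation the model gives.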


### Implementation Requirements:

1. Code Quality:
   - Clear documentation
   - Type hints
   - Error handling
   - Modular design

2. Analysis Quality:
   - Source-specific processing
   - Robust consistency checks
   - Reliable confidence scoring
   - Clear consensus generation

3. Testing Approach:
   - Multiple data scenarios
   - Edge case handling
   - Time sensitivity tests
   - Cross-validation

4. Output Format:
   - Source-specific sentiments
   - Consistency metrics
   - Confidence scores
   - Final consensus
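Testing edge cases such as malformed replies or empty sources against a live model is slow and nondeterministic. A stub that mimics the `invoke(...).content` shape of a LangChain chat model lets you exercise the parsing paths offline. `FakeLLM` and `analyze_source` below are hypothetical test helpers; they are not part of LangChain or the template:

```python
import json
from types import SimpleNamespace
from typing import Any, Dict, List


class FakeLLM:
    """Hypothetical stand-in for a chat model: returns canned replies in order."""

    def __init__(self, replies: List[str]):
        self.replies = list(replies)
        self.prompts: List[str] = []  # record every prompt for later inspection

    def invoke(self, prompt: str) -> SimpleNamespace:
        self.prompts.append(prompt)
        # Mimic a chat message object that exposes the reply text as `.content`.
        return SimpleNamespace(content=self.replies.pop(0))


def analyze_source(llm: Any, label: str, data: str) -> Dict[str, Any]:
    """Minimal handler with the same JSON-or-fallback contract (hypothetical)."""
    prompt = f"Analyze the sentiment of this {label} data and reply in JSON:\n{data}"
    try:
        return json.loads(llm.invoke(prompt).content)
    except json.JSONDecodeError:
        return {"overall_analysis": "Undefined", "explanation": f"Unparseable {label} reply", "confidence_score": 0.0}


llm = FakeLLM([
    '{"overall_analysis": "Positive", "explanation": "ok", "confidence_score": 0.8}',
    "Sorry, something went wrong.",
])
good = analyze_source(llm, "news", "HEADLINE: Tech Stocks Rally on Strong Earnings")
bad = analyze_source(llm, "news", "")
assert good["overall_analysis"] == "Positive"
assert bad["overall_analysis"] == "Undefined"
# Catches prompt templates whose placeholders are never filled in.
assert "Tech Stocks Rally" in llm.prompts[0]
```

The final assertion is the most valuable one. A prompt that is a plain string instead of an f-string still "works" and returns plausible JSON, but the model never sees the data.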

### Evaluation Criteria:

Your solution will be evaluated on:
1. Implementation completeness
2. Analysis robustness
3. Consistency checking
4. Confidence scoring
5. Code quality

### Tips for Success:

1. Process sources independently
2. Implement thorough validation
3. Weight sources appropriately
4. Handle contradictions clearly
5. Document assumptions

### Common Pitfalls to Avoid:

1. Over-relying on single sources
2. Weak consistency checks
3. Poor confidence scoring
4. Missing edge cases
5. Inadequate validation

### Next Steps:

After completing this problem:
1. Add more data sources
2. Enhance consistency checks
3. Improve confidence scoring
4. Add temporal analysis
5. Implement visualizations