# Introduction to Retrieval Augmented Generation with S&P 500 news

In this notebook, you will explore how to build a simple Retrieval-Augmented Generation (RAG) pipeline using financial news articles from S&P 500 companies.

We'll start by vectorizing text data, creating a vector store using FAISS, and integrating it with OpenAI's GPT models to answer questions using retrieved information.

This workflow emulates real-world systems in finance where natural language data (news, filings, analyst reports) are used to support decision-making.

# 📌 Objectives

By the end of this notebook, students will be able to:

1. **Perform Semantic Search with Metadata Filtering:**
   - Query the provided FAISS vector store to retrieve relevant financial news articles based on natural language questions.
   - Apply optional filters using metadata such as ticker or publication date to refine search results.

2. **Enrich Data with Company Metadata:**
   - Use the `yfinance` library to retrieve company-level metadata (company name, sector, industry) for tickers in the dataset.
   - Integrate this metadata to support enhanced filtering and analysis of news data.

3. **Build a Retrieval-Augmented Generation (RAG) Pipeline:**
   - Combine retrieved news snippets as context to generate answers using OpenAI’s GPT models.
   - Construct effective prompts that guide the language model to provide concise, context-aware responses.

4. **Evaluate and Analyze RAG Outputs:**
   - Review generated answers alongside the supporting news excerpts.
   - Reflect on the strengths and limitations of the simple RAG pipeline and consider potential improvements, such as adding more filters or refining retrieval strategies.

5. **Incorporate Financial Metadata into Retrieval Context:**
   - Enrich retrieved news snippets with key financial metadata including ticker, company name, sector, and industry.
   - Format prompts that combine both text excerpts and metadata to provide richer context to the language model.

6. **Generate Context-Aware Answers Using OpenAI Models:**
   - Construct and send prompts to an LLM that leverage both news content and metadata to produce concise, informed financial analysis.

7. **Compare Answers With and Without Metadata:**
   - Evaluate the impact of including financial metadata on answer quality using criteria such as clarity, detail, accuracy, and contextual relevance.
   - Summarize findings to reflect on the role of metadata in improving retrieval-augmented generation.

## Install and Import important librairies

First, we install and import the necessary libraries for:
- Text embedding generation (sentence-transformers)
- Efficient similarity search (faiss)
- Data manipulation (pandas, numpy)
- Visualization (matplotlib)

> ℹ️ FAISS uses inner product for cosine similarity by normalizing vectors.

In [None]:
%pip install sentence-transformers
%pip install faiss-cpu

Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch>=1.11.0->sentence-transformers)
 

In [None]:
from sentence_transformers import SentenceTransformer
import faiss
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from collections import Counter
import matplotlib.pyplot as plt
import faiss

## Load news data
We load a CSV file of financial news, focusing on TITLE and SUMMARY, along with metadata like TICKER and PUBLICATION_DATE.
These will be embedded into vectors and used for semantic retrieval.

In [None]:
K = 25

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
df_news = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/MNA/FZ4025.10 Fintech/Tareas/S3/Project 2/df_news.csv')
df_news['PUBLICATION_DATE'] = pd.to_datetime(df_news['PUBLICATION_DATE']).dt.date
display(df_news)

Unnamed: 0,TICKER,TITLE,SUMMARY,PUBLICATION_DATE,PROVIDER,URL
0,MMM,2 Dow Jones Stocks with Promising Prospects an...,The Dow Jones (^DJI) is made up of 30 of the m...,2025-05-29,StockStory,https://finance.yahoo.com/news/2-dow-jones-sto...
1,MMM,3 S&P 500 Stocks Skating on Thin Ice,The S&P 500 (^GSPC) is often seen as a benchma...,2025-05-27,StockStory,https://finance.yahoo.com/news/3-p-500-stocks-...
2,MMM,3M Rises 15.8% YTD: Should You Buy the Stock N...,"MMM is making strides in the aerospace, indust...",2025-05-22,Zacks,https://finance.yahoo.com/news/3m-rises-15-8-y...
3,MMM,Q1 Earnings Roundup: 3M (NYSE:MMM) And The Res...,Quarterly earnings results are a good time to ...,2025-05-22,StockStory,https://finance.yahoo.com/news/q1-earnings-rou...
4,MMM,3 Cash-Producing Stocks with Questionable Fund...,While strong cash flow is a key indicator of s...,2025-05-19,StockStory,https://finance.yahoo.com/news/3-cash-producin...
...,...,...,...,...,...,...
4866,ZTS,2 Dividend Stocks to Buy With $500 and Hold Fo...,Zoetis is a leading animal health company with...,2025-05-23,Motley Fool,https://www.fool.com/investing/2025/05/23/2-di...
4867,ZTS,Zoetis (NYSE:ZTS) Declares US$0.50 Dividend Pe...,Zoetis (NYSE:ZTS) recently affirmed a dividend...,2025-05-22,Simply Wall St.,https://finance.yahoo.com/news/zoetis-nyse-zts...
4868,ZTS,Jim Cramer on Zoetis (ZTS): “It Does Seem to B...,We recently published a list of Jim Cramer Tal...,2025-05-21,Insider Monkey,https://finance.yahoo.com/news/jim-cramer-zoet...
4869,ZTS,Zoetis (ZTS) Upgraded to Buy: Here's Why,Zoetis (ZTS) might move higher on growing opti...,2025-05-21,Zacks,https://finance.yahoo.com/news/zoetis-zts-upgr...


In [None]:
df_news['EMBEDDED_TEXT'] = df_news['TITLE'] + ' : ' + df_news['SUMMARY']

In [None]:
model = SentenceTransformer('all-MiniLM-L6-v2')

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

## Implement FAISS vector store
We:
- Use a pre-trained sentence transformer (all-MiniLM-L6-v2) to embed documents.
- Normalize vectors to use cosine similarity.
- Create a FAISS index and implement a basic search function.

This will allow us to retrieve relevant news snippets given a natural language question.


In [None]:
# Load model and compute embeddings
text_embeddings = model.encode(df_news['EMBEDDED_TEXT'].tolist(), convert_to_numpy=True)

# Normalize embeddings to use cosine similarity (via inner product in FAISS)
text_embeddings = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)

# Prepare metadata
documents = df_news['EMBEDDED_TEXT'].tolist()
metadata = [
    {
        'PUBLICATION_DATE': row['PUBLICATION_DATE'],
        'TICKER': row['TICKER'],
        'PROVIDER': row['PROVIDER']
    }
    for _, row in df_news.iterrows()
]

  return forward_call(*args, **kwargs)


In [None]:
embedding_dim = text_embeddings.shape[1]
faiss_index = faiss.IndexFlatIP(embedding_dim)  # Cosine similarity via inner product
faiss_index.add(text_embeddings)

In [None]:
class FaissVectorStore:
    def __init__(self, model, index, embeddings, documents, metadata):
        self.model = model
        self.index = index
        self.embeddings = embeddings
        self.documents = documents
        self.metadata = metadata

    def search(self, query, k=5, metadata_filter=None):
        query_embedding = self.model.encode([query])
        query_embedding = query_embedding / np.linalg.norm(query_embedding)

        if metadata_filter:
            filtered_indices = [i for i, meta in enumerate(self.metadata) if metadata_filter(meta)]
            if not filtered_indices:
                return []
            filtered_embeddings = self.embeddings[filtered_indices]
            temp_index = faiss.IndexFlatIP(filtered_embeddings.shape[1])
            temp_index.add(filtered_embeddings)
            D, I = temp_index.search(query_embedding, k)
            indices = [filtered_indices[i] for i in I[0]]
        else:
            D, I = self.index.search(query_embedding, k)
            indices = I[0]
            D = D[0]

        results = []
        for idx, sim in zip(indices, D):
            results.append((self.documents[idx], self.metadata[idx], float(sim)))
        return results

In [None]:
# Create FAISS-based store
faiss_store = FaissVectorStore(
    model=model,
    index=faiss_index,
    embeddings=text_embeddings,
    documents=documents,
    metadata=metadata
)

### Setup OpenAI Client

👉 **Instructions**:
- Import the `OpenAI` client from the `openai` Python library.
- You will need an **OpenAI API key** to use their models programmatically:
  - Go to [https://platform.openai.com/](https://platform.openai.com/) and sign up or log in.
  - Create an API key from your [API keys dashboard](https://platform.openai.com/account/api-keys).
  - ⚠️ **Keep your API key private** and **do not** share or hardcode it in public notebooks.
- Note that **usage of the OpenAI API is not free**. You will need to:
  - Add a payment method.
  - Monitor your usage to avoid unexpected charges.
  - Optionally set usage limits from your account settings.
- You can refer to the **course’s Study Resources** for a step-by-step guide on creating an OpenAI account and retrieving your API key.

Then:
- Initialize the client with `OpenAI(api_key="YOUR_KEY_HERE")`.
- Send a test request using `.responses.create()` and the `"gpt-4o-mini"` model with a simple prompt:

  ```python
  response = client.responses.create(
      model="gpt-4o-mini",
      input="Write a one-sentence bedtime story about a unicorn."
  )
  print(response.output_text)


In [None]:
from google.colab import userdata
OpenAI_api_key = userdata.get('OpenAI_API_Key')

In [None]:
# Defining and using a language model agent for fed policy Q & A
# Define and test and openAI LLM Client
from openai import OpenAI
client = OpenAI(api_key=OpenAI_api_key)

response = client.responses.create(
    model="gpt-4o-mini",
    input="Write a one-sentence bedtime story about a unicorn."
)
print(response.output_text)

As the moonlight danced on the shimmering lake, a brave little unicorn named Luna discovered her own magic by helping lost stars find their way home.


## Retrieve Additional Metadata from Yahoo Finance

👉 **Instructions**:
- We will enrich our news dataset by retrieving **company-level metadata** using the `yfinance` library.
- The goal is to map each unique stock ticker (`TICKER`) in the dataset to:
  - `COMPANY_NAME`
  - `SECTOR`
  - `INDUSTRY`

> ℹ️ `yfinance` fetches live data from Yahoo Finance. If you're running this in a cloud environment or during peak hours, expect some tickers to fail or rate limits to apply.

✅ After this step, you will have a new DataFrame (e.g. `df_meta`) with the columns `TICKER`, `COMPANY_NAME`, `SECTOR`, `INDUSTRY` that maps tickers to their company names, sectors, and industries. This metadata will be useful later to add filters and analysis based on sector or industry categories.


In [None]:
%pip install yfinance



In [None]:
unique_tickers = df_news['TICKER'].unique()
display(unique_tickers)

array(['MMM', 'AOS', 'ABT', 'ABBV', 'ACN', 'ADBE', 'AMD', 'AES', 'AFL',
       'A', 'APD', 'ABNB', 'AKAM', 'ARE', 'ALGN', 'ALLE', 'LNT', 'ALL',
       'GOOGL', 'GOOG', 'MO', 'AMZN', 'AMCR', 'AEE', 'AEP', 'AXP', 'AIG',
       'AMT', 'AWK', 'AMP', 'AME', 'AMGN', 'APH', 'ADI', 'ANSS', 'AON',
       'APA', 'APO', 'AAPL', 'AMAT', 'APTV', 'ACGL', 'ADM', 'ANET', 'AJG',
       'AIZ', 'T', 'ATO', 'ADSK', 'ADP', 'AZO', 'AVB', 'AVY', 'AXON',
       'BKR', 'BALL', 'BAC', 'BAX', 'BDX', 'BBY', 'TECH', 'BIIB', 'BLK',
       'BX', 'BK', 'BA', 'BKNG', 'BSX', 'BMY', 'AVGO', 'BR', 'BRO',
       'BLDR', 'BG', 'BXP', 'CHRW', 'CDNS', 'CZR', 'CPT', 'CPB', 'COF',
       'CAH', 'KMX', 'CCL', 'CARR', 'CAT', 'CBOE', 'CBRE', 'CDW', 'COR',
       'CNC', 'CNP', 'CF', 'CRL', 'SCHW', 'CHTR', 'CVX', 'CMG', 'CHD',
       'CI', 'CINF', 'CTAS', 'CSCO', 'C', 'CFG', 'CLX', 'CME', 'CMS',
       'KO', 'CTSH', 'COIN', 'CL', 'CMCSA', 'CAG', 'COP', 'ED', 'STZ',
       'CEG', 'COO', 'CPRT', 'GLW', 'CPAY', 'CTVA', 'CSGP', 'COST',

In [None]:
import yfinance as yf
import time

company_metadata = []

for ticker in unique_tickers:
    try:
        ticker_yf = yf.Ticker(ticker)
        info = ticker_yf.info

        company_name = info.get('longName', 'N/A')
        sector = info.get('sector', 'N/A')
        industry = info.get('industry', 'N/A')

        company_metadata.append({
            'TICKER': ticker,
            'COMPANY_NAME': company_name,
            'SECTOR': sector,
            'INDUSTRY': industry
        })
        time.sleep(1)  # Add a small delay to avoid hitting API rate limits

    except Exception as e:
        print(f"Could not retrieve data for {ticker}: {e}")
        company_metadata.append({
            'TICKER': ticker,
            'COMPANY_NAME': 'N/A',
            'SECTOR': 'N/A',
            'INDUSTRY': 'N/A'
        })


In [None]:
df_meta = pd.DataFrame(company_metadata)
display(df_meta)

Unnamed: 0,TICKER,COMPANY_NAME,SECTOR,INDUSTRY
0,MMM,3M Company,Industrials,Conglomerates
1,AOS,A. O. Smith Corporation,Industrials,Specialty Industrial Machinery
2,ABT,Abbott Laboratories,Healthcare,Medical Devices
3,ABBV,AbbVie Inc.,Healthcare,Drug Manufacturers - General
4,ACN,Accenture plc,Technology,Information Technology Services
...,...,...,...,...
485,XEL,Xcel Energy Inc.,Utilities,Utilities - Regulated Electric
486,XYL,Xylem Inc.,Industrials,Specialty Industrial Machinery
487,YUM,"Yum! Brands, Inc.",Consumer Cyclical,Restaurants
488,ZBH,"Zimmer Biomet Holdings, Inc.",Healthcare,Medical Devices


## Retrieval-Augmented Generation (RAG): Retrieve Documents and Generate Answers

👉 **Instructions**:

In this part of the assignment, your task is to build a simple Retrieval-Augmented Generation (RAG) pipeline that:

- Takes a user question as input.
- Searches the FAISS vector store to find a set of relevant financial news articles based on semantic similarity.
- Uses the retrieved news articles as context to generate a clear, concise answer to the question by interacting with the OpenAI language model.
- Returns both the generated answer and the underlying news snippets used for context.

### What you need to focus on:

- Implement a retrieval mechanism to query your vector store and obtain the top relevant documents for any question.
- Construct prompts that effectively combine retrieved news content with the user’s question to guide the language model’s response.
- Use the OpenAI API to generate answers grounded in the retrieved context.
- Organize the outputs so that for each question, you have:
  - The generated answer.
  - The collection of news excerpts used to produce that answer.

### What you will be provided:

- Helper functions to display outputs in markdown format.
- Lists of example questions covering topics, companies, and industries to test your implementation.

---

Your solution can take any form or structure you find appropriate, as long as it fulfills these core objectives. This exercise will give you hands-on experience with integrating retrieval and generation for practical applications in finance.


#### Print markdown
You can use the following function to print answers from GPT4o-mini in markdown.

In [2]:
from IPython.display import Markdown, display

def print_markdown(text):
    display(Markdown(text))

#### Predefined questions

In [4]:
questions_topic = [
"What are the major concerns expressed in financial news about inflation?",
"How is investor sentiment described in recent financial headlines?",
"What role is artificial intelligence playing in recent finance-related news stories?"
]

questions_company = [
"How is Microsoft being portrayed in news stories about artificial intelligence?",
"What financial news headlines connect Amazon with automation or logistics?"
]

questions_industry = [
"What are the main themes emerging in financial news about the semiconductor industry?",
"What trends are being reported in the retail industry?",
"What risks or challenges are discussed in recent news about the energy industry?"
]

In [None]:
# Define the RAG function
def rag(query, k=K):
    """
    Retrieves relevant documents from the FAISS store and formats them for the LLM prompt.
    """
    retrieved_docs = faiss_store.search(query, k=k)

    # Format the retrieved documents for the LLM prompt
    context = "\n\n".join([doc[0] for doc in retrieved_docs])

    return context, retrieved_docs

In [None]:
# Generate the answers with OpenAI client
def generate_answer(query, context):
    """
    Generates an answer to the query based on the provided context using the OpenAI model.
    """
    prompt = f"""
    Based on the following news articles, answer the question:

    {context}

    Question: {query}

    Provide a concise answer based only on the information provided in the articles.
    """
    response = client.responses.create(
        model="gpt-4o-mini",
        input=prompt
    )
    return response.output_text

In [None]:
# Create the RAG pipeline that takes the previous steps into one function that takes a question and returns the answer and the source docs
def run_rag_pipeline(query, k=K):
    """
    Runs the RAG pipeline to retrieve relevant documents and generate an answer.
    """
    context, retrieved_docs = rag(query, k=k)
    answer = generate_answer(query, context)
    return answer, retrieved_docs

In [None]:
question_lists = {
    "Topic Questions": questions_topic,
    "Company Questions": questions_company,
    "Industry Questions": questions_industry
}

for list_name, questions in question_lists.items():
    print(f"--- {list_name} ---")
    for question in questions:
        print(f"\nQuestion: {question}")
        answer, retrieved_docs = run_rag_pipeline(question)
        print_markdown(f"**Answer:**\n{answer}")
        print("\nRetrieved Documents:")
        for i, (doc_text, doc_meta, sim) in enumerate(retrieved_docs):
            print(f"{i+1}. Similarity: {sim:.4f}")
            print(f"   Title/Summary: {doc_text.split(':', 1)[0] if ':' in doc_text else doc_text}")
            print(f"   Metadata: Ticker: {doc_meta['TICKER']}, Date: {doc_meta['PUBLICATION_DATE']}, Provider: {doc_meta['PROVIDER']}")
            print(f"   Summary: {doc_text.split(':', 1)[1].strip() if ':' in doc_text else 'N/A'}")
        print("-" * 50)

--- Topic Questions ---

Question: What are the major concerns expressed in financial news about inflation?


  return forward_call(*args, **kwargs)


**Answer:**
The major concerns expressed in financial news about inflation include:

1. **Persistent Inflation Risks**: The Federal Reserve is worried about ongoing inflation and its potential impact on the economy, suggesting it could lead to an economic slowdown.

2. **Food Inflation**: Rising food prices are dampening hopes for a rate cut, indicating that inflation continues to affect essential goods.

3. **Economic Uncertainty**: Analysts note that uncertainty about the macroeconomic landscape is negatively impacting earnings expectations and overall economic outlook.

4. **Impact on Consumer Stocks**: Consumer discretionary businesses are particularly vulnerable, with significant declines observed due to macroeconomic uncertainties.

5. **Rising Costs and Tariffs**: Tariff changes and their implications are contributing to rising costs, which could further strain consumers and influence inflation dynamics.

6. **Investor Behavior**: There is an increased interest in hard assets like gold as investors seek stability amidst growing inflation concerns and skepticism about fiscal sustainability.


Retrieved Documents:
1. Similarity: 0.5771
   Title/Summary: Bitcoin price slips as Fed minutes flag US inflation risks 
   Metadata: Ticker: BLK, Date: 2025-05-29, Provider: Yahoo Finance UK
   Summary: The Federal Reserve’s May policy meeting revealed mounting concern over persistent US inflation and the potential for economic slowdown.
2. Similarity: 0.4920
   Title/Summary: The Weekend
   Metadata: Ticker: TSLA, Date: 2025-05-31, Provider: Yahoo Finance UK
   Summary: Food inflation dampens hopes of a rate cut as tariff twists and turns continue : Key moments from the last seven days, plus a glimpse at the week ahead
3. Similarity: 0.4920
   Title/Summary: The Weekend
   Metadata: Ticker: NVDA, Date: 2025-05-31, Provider: Yahoo Finance UK
   Summary: Food inflation dampens hopes of a rate cut as tariff twists and turns continue : Key moments from the last seven days, plus a glimpse at the week ahead
4. Similarity: 0.4920
   Title/Summary: The Weekend
   Metadata: Ticker: LULU, Dat

**Answer:**
Investor sentiment in recent financial headlines is largely characterized by skepticism and caution. Analysts often issue optimistic price targets, driven by institutional pressures, despite acknowledging significant headwinds and questionable fundamentals for many favored stocks. Conversely, bearish forecasts, which are rare, attract attention as they signal serious concerns about certain stocks. Overall, there's a dichotomy between bullish analyst projections and underlying financial realities.


Retrieved Documents:
1. Similarity: 0.6115
   Title/Summary: 3 of Wall Street’s Favorite Stocks Facing Headwinds 
   Metadata: Ticker: KMX, Date: 2025-05-26, Provider: StockStory
   Summary: Wall Street has set ambitious price targets for the stocks in this article. While this suggests attractive upside potential, it’s important to remain skeptical because analysts face institutional pressures that can sometimes lead to overly optimistic forecasts.
2. Similarity: 0.5978
   Title/Summary: 3 Hyped Up  Stocks Facing Headwinds 
   Metadata: Ticker: MCHP, Date: 2025-05-20, Provider: StockStory
   Summary: Great things are happening to the stocks in this article. They’re all outperforming the market over the last month because of positive catalysts such as a new product line, constructive news flow, or even a loyal Reddit fanbase.
3. Similarity: 0.5894
   Title/Summary: 1 of Wall Street’s Favorite Stock with Impressive Fundamentals and 2 to Think Twice About 
   Metadata: Ticker: MPWR, Date

**Answer:**
Artificial intelligence (AI) is increasingly central to finance-related news stories, highlighting its impact on various sectors. Companies like Jack Henry & Associates and Upstart leverage AI to enhance lending processes and quantify credit risk, improving efficiency and returns for lenders. In fintech, AI-driven automation is fostering growth, while Salesforce faces challenges due to an overemphasis on AI, leading to market share loss. Additionally, firms like News Corp are pivoting towards AI-integrated digital operations for competitive advantage. Overall, AI is viewed as a transformative force in finance, influencing investment strategies and operational efficiency.


Retrieved Documents:
1. Similarity: 0.6974
   Title/Summary: Jack Henry (JKHY) Integrates AI-Driven Lending Tech With Algebrik 
   Metadata: Ticker: JKHY, Date: 2025-03-17, Provider: Insider Monkey
   Summary: We recently published a list of 12 AI News Investors Should Not Miss This Week. In this article, we are going to take a look at where Jack Henry & Associates, Inc. (NASDAQ:JKHY) stands against other AI news Investors should not miss this week. Artificial Intelligence (AI) is known to increase productivity, decrease human error, […]
2. Similarity: 0.6257
   Title/Summary: This "Magnificent Seven" Stock Is Set to Skyrocket If Its AI Investments Pay Off 
   Metadata: Ticker: META, Date: 2025-05-31, Provider: Motley Fool
   Summary: Meta Platforms has investments in several AI applications.  The tech giant's stock is only valued on its legacy business.  Over the past two-and-a-half years, investors have heard about various artificial intelligence (AI) investments that tech companies

**Answer:**
Microsoft is portrayed as a major player in the advancement of artificial intelligence, highlighted as a leader in innovations along with companies like Google and NVIDIA. The articles indicate that Microsoft's involvement in AI is contributing to market growth and emphasizes its collaborative efforts with other AI giants, showcasing its strategic importance in the evolving AI landscape.


Retrieved Documents:
1. Similarity: 0.5427
   Title/Summary: This "Magnificent Seven" Stock Is Set to Skyrocket If Its AI Investments Pay Off 
   Metadata: Ticker: META, Date: 2025-05-31, Provider: Motley Fool
   Summary: Meta Platforms has investments in several AI applications.  The tech giant's stock is only valued on its legacy business.  Over the past two-and-a-half years, investors have heard about various artificial intelligence (AI) investments that tech companies are making.
2. Similarity: 0.5054
   Title/Summary: How Salesforce has 'overcorrected' by leaning into AI 
   Metadata: Ticker: CRM, Date: 2025-05-29, Provider: Yahoo Finance Video
   Summary: D.A. Davidson head of technology research Gil Luria joins Market Domination to discuss Salesforce (CRM) earnings and the company's trajectory. Luria says Salesforce is "too focused" on artificial intelligence (AI), as the other parts of its business "rapidly" decelerate and the company loses market share to competitors. Luria h

**Answer:**
The financial news headlines connecting Amazon with automation or logistics include:

1. **UPS Sells Ware2Go To Peter Thiel-Backed Stord**: This deal allows Stord to acquire additional fulfillment space to compete with Amazon in the e-commerce sector.
  
2. **Woodward's Volumes, Automation to Drive Earnings Growth**: The article discusses the broader trend of automation in companies, indirectly relating to Amazon's operational strategies.

3. **Nvidia can't be stopped, Apple falls behind, and the AI data center race**: This highlights Amazon Web Services considering its strategy on leases, possibly in the context of automation and logistics for AI data centers.  

These points emphasize Amazon's relationship with logistics and automation in the context of market competition and industry dynamics.


Retrieved Documents:
1. Similarity: 0.6013
   Title/Summary: Truist Reiterates Buy on Amazon.com (AMZN) as Q2 Revenue Tracks Ahead 
   Metadata: Ticker: TFC, Date: 2025-05-25, Provider: Insider Monkey
   Summary: We recently published a list of 10 AI Stocks on Wall Street’s Radar. In this article, we are going to take a look at where Amazon.com Inc. (NASDAQ:AMZN) stands against other AI stocks on Wall Street’s radar. Amazon.com Inc. (NASDAQ:AMZN) is an American technology company offering e-commerce, cloud computing, and other services, including digital streaming […]
2. Similarity: 0.5231
   Title/Summary: Amazon's AI Roadmap With AWS CEO Garman 
   Metadata: Ticker: AMZN, Date: 2025-05-30, Provider: Bloomberg
   Summary: Every aspect of Amazon is leveraging artificial intelligence, says Matt Garman, CEO of Amazon Web Services. Garman discusses Amazon's AI roadmap and reflects on his first year in the role with Ed Ludlow on "Bloomberg Technology."
3. Similarity: 0.5163
   Title/Summa

**Answer:**
The main themes in the financial news about the semiconductor industry include:

1. **Investor Interest and Stock Performance**: Companies like ON Semiconductor are attracting significant investor attention, with reports of notable stock price increases despite soft earnings. Share repurchase programs are seen as a factor boosting confidence.

2. **International Revenue Trends**: There is a focus on how international revenue affects forecasts and stock prospects. Companies are being analyzed for their growth potential tied to global markets.

3. **Challenges and Opportunities**: Despite some companies showing strong growth in specific areas like SiC (Silicon Carbide) and AI Data Centers, the industry faces challenges such as declining demand in markets like electric vehicles (EVs) amid tough macroeconomic conditions.

4. **Divergence of Performance**: There is a contrast between some stocks’ performance, with some gaining significantly while others see declines year-to-date, reflecting varied responses to market dynamics and earnings reports.


Retrieved Documents:
1. Similarity: 0.6429
   Title/Summary: Investing in ON Semiconductor Corp. (ON)? Don't Miss Assessing Its International Revenue Trends 
   Metadata: Ticker: ON, Date: 2025-05-13, Provider: Zacks
   Summary: Explore ON Semiconductor Corp.'s (ON) international revenue trends and how these numbers impact Wall Street's forecasts and what's ahead for the stock.
2. Similarity: 0.5822
   Title/Summary: ON Semiconductor Corporation (ON) is Attracting Investor Attention
   Metadata: Ticker: ON, Date: 2025-05-21, Provider: Zacks
   Summary: Here is What You Should Know : Recently, Zacks.com users have been paying close attention to ON Semiconductor Corp. (ON). This makes it worthwhile to examine what the stock has in store.
3. Similarity: 0.5649
   Title/Summary: Some May Be Optimistic About ON Semiconductor's (NASDAQ
   Metadata: Ticker: ON, Date: 2025-05-12, Provider: Simply Wall St.
   Summary: ON) Earnings : Soft earnings didn't appear to concern ON Semiconductor Corpo

**Answer:**
The retail industry is experiencing significant challenges, with stock performance declining by 13.7% over the past six months, worse than the S&P 500's 5.5% loss. Retailers are adapting their business models to changing consumer shopping habits due to technological advances and trade policies, particularly Trump's tariff policies which have led to changes in supply chains. Many price increases have already affected consumers, and an inventory overflow is anticipated due to a surge in imports ahead of a tariff pause, potentially leading to deeper discounts and margin pressures. Overall, demand trends appear to be working against retailers, causing uncertainty in the market.


Retrieved Documents:
1. Similarity: 0.5898
   Title/Summary: 3 Consumer Stocks That Concern Us 
   Metadata: Ticker: KMX, Date: 2025-05-12, Provider: StockStory
   Summary: Retailers are adapting their business models as technology changes how people shop. Still, demand can be volatile as the industry is exposed to the ups and downs of consumer spending. This has stirred some uncertainty lately as retail stocks have tumbled by 13.7% over the past six months. This performance was worse than the S&P 500’s 5.5% loss.
2. Similarity: 0.5056
   Title/Summary: Retailers, Ducking Trade-War Curveballs, Stick to Their Plans 
   Metadata: Ticker: BBY, Date: 2025-05-29, Provider: The Wall Street Journal
   Summary: As legal rulings roll in on Trump’s tariff policies, retail executives say they have shifted their supply chains and many price increases already have hit shelves.
3. Similarity: 0.4997
   Title/Summary: 3 Consumer Stocks Skating on Thin Ice 
   Metadata: Ticker: HLT, Date: 2025-05-22,

**Answer:**
Recent news about the energy industry highlights several risks and challenges:

1. **Legislative Threats**: A proposed bill in Congress threatens to repeal essential subsidies for the renewable energy sector, potentially making projects uneconomical and leading to a crash in renewable energy stocks.

2. **Oil and Gas Sector Struggles**: Companies in the oilfield service sector, such as SLB, HAL, and BKR, face challenges due to sliding oil prices, rising tariffs, and shrinking drilling budgets, which may impact profitability.

3. **Trade Wars and Tariffs**: Ongoing trade conflicts, particularly stemming from the Trump administration's tariffs, create uncertainty for various energy-related companies and could affect their competitive positioning.

4. **Investor Sentiment**: The industrials sector, which includes energy companies, is experiencing a forecast of a prolonged downturn, reflected in a significant stock pullback over the past six months.

5. **Wildfire Liability**: Companies like Xcel Energy and Edison International face legal and financial challenges related to wildfire risks, which could impact their operations and investor confidence.

6. **Market Valuation**: A notable portion of mid- and small-cap oil and gas stocks are trading below their book values, indicating potential undervaluation amidst market uncertainties.


Retrieved Documents:
1. Similarity: 0.5362
   Title/Summary: Renewable Energy Stocks Crash as U.S. Advances Bill That Could Decimate the Industry 
   Metadata: Ticker: NEE, Date: 2025-05-23, Provider: Motley Fool
   Summary: Congress is pushing forward a bill that could upend the renewable energy industry.  Just as companies have ramped up production and renewable electricity generation in the U.S., those projects may become uneconomical.  The news was about as bad as it could get for renewable energy stocks this week as the U.S. House of Representatives early Thursday passed a bill that will repeal some of the most important subsidies for the industry if it becomes law.
2. Similarity: 0.5362
   Title/Summary: Renewable Energy Stocks Crash as U.S. Advances Bill That Could Decimate the Industry 
   Metadata: Ticker: ENPH, Date: 2025-05-23, Provider: Motley Fool
   Summary: Congress is pushing forward a bill that could upend the renewable energy industry.  Just as companies have ramped 

## Analysis & Questions - Section 1

### Analysis and Reflection on Retrieval and Generation Results
After running the RAG pipeline and obtaining answers along with their supporting news excerpts, take some time to carefully review both the generated responses and the retrieved contexts.

- **For each question, read the answer and then the corresponding news snippets used as context.**

- Reflect on the following points and document your observations:
1. **Relevance**
2. **Completeness**  
3. **Bias or Noise**
4. **Consistency**  
5. **Improvement Ideas**   

and answer the questions below:

#### **Question 1.** How well do the retrieved news snippets support the generated answer? Are the key facts or themes in the answer clearly grounded in the context?

In most of the cases the top 5 news snippets support the generated answer, but as we move on to the rest of the news with less similarity score we start to find some unrelated information about the question and answers provided.
The key facts or themes in the answers are generally grounded in the context, mentioning specific terms included in the questions.


There is some variation on the level of support depending on the question and the specific snippets retrieved. Some snippets are more relevant and informative than others, leading to a better answer while other points might be based on weaker or less explicit information from the snippets.

#### **Question 2.** Does the answer fully address the question, or does it leave important aspects out? Consider if the retrieved context provided enough information to generate a thorough response.

The answers generally address the questions, but there may be some aspects that are left out, this could be because of the lack of completeness found on the news or that the contexts are weak for specific questions.

By adding more context to the news (metadata), we can get better and more thourough responses.

#### **Question 3.** Are there any irrelevant or misleading snippets retrieved that may have influenced the answer? How might this affect the quality of the output?

Yes, there can be irrelevant or misleading snippets retrieved. This can happen if we include all the different news providers, without applying a specific filter (some news providers could be less serious than others).  
The semantic search can find relevant documents based on the matching keywords but they can have different contexts that can affect the similarity score.


Irrelevant snippets can negatively affect the quality of the output in the following ways:
*   **Affecting relevant information:** They can push relevant snippets out of the top K results.
*   **Introducing noise:** The LLM might include information from these snippets that is not pertinent to the question, making the answer less complete or relevant.
*   **Potential inaccuracies:** In some cases, misleading snippets could create an answer that is partially inaccurate, especially if conflicting information is present.

#### **Question 4.**  Do the news snippets show consistent information, or are there conflicting viewpoints? How does the LLM handle potential contradictions in the context?

The news snippets can sometimes show inconsistent information or different viewpoints, for example when discussing the market sentiment.

The LLM helped summarized the news but it didn't necessarily handle potential contradictions. The contradictions can be included on the answers.

#### **Question 5.**  Based on your observations, suggest ways the retrieval or generation process could be improved (e.g., better filtering, adjusting `k`, refining prompt design).

Some suggestions to improve the generation process are:

*   **Improve filtering:** Include a metadata filter to get more relevant information (by sector, date, etc).
*   **Try different `k`:** With a smaller `k` we could reduce noise but also miss relevant information. With a larger `k` we could include noise and make the prompt take longer. We can try different values to see what makes the answers more grounded and relevant.
*   **Try different prompts :** Improve the prompt to the LLM to better handle potential inconsistencies and focus on synthesizing information. For example, asking the LLM to identify different viewpoints if present and instruct the LLM to prioritize information from more reputable sources.
*   **Re-ranking Retrieved Documents:** After initial retrieval, re-rank the documents based on additional criteria, such as publication date (prioritizing newer news) or source reliability, before passing them to the LLM.
*   **Group news by "title/summary":** Some of the news snippets were repeated because they would include different tickers. This could give more importance to them and leave more relevant news out of the context.

## 🧠 Retrieval-Augmented Generation (RAG) v2: Adding Financial Metadata to Improve Generation

👉 **Instructions**:

In this part of the assignment, you’ll enhance your Retrieval-Augmented Generation (RAG) pipeline by incorporating *financial metadata* to provide more contextually rich answers.

Your goal is to evaluate whether metadata such as **company name**, **sector**, and **industry** helps the LLM generate **more accurate and grounded answers** to financial questions.

---

### ✅ What your updated pipeline should do:

- Retrieve relevant financial news articles using semantic similarity with FAISS.
- Enrich each retrieved document with financial metadata:
  - Ticker symbol
  - Full company name
  - Sector (e.g., Technology, Energy)
  - Industry (e.g., Semiconductors, Retail)
- Construct prompts that include both:
  - Retrieved news text
  - Associated metadata
- Send the prompt to the OpenAI model to generate an informed response.
- Return:
  - The final answer
  - The exact set of contextual documents used to produce that answer

---

### 🧪 Evaluation and Comparison:

You will test your improved RAG pipeline on the same three types of questions provided earlier:
- **Topic-focused** (e.g., inflation, interest rates)
- **Company-focused** (e.g., questions about Tesla, Nvidia)
- **Industry-focused** (e.g., semiconductors, utilities)


In [None]:
# Define the RAG function with metadata enrichment
def rag_metadata(query, df_meta, k=K):
    """
    Retrieves relevant documents from the FAISS store and formats them for the LLM prompt,
    including financial metadata.
    """
    retrieved_docs = faiss_store.search(query, k=k)

    enriched_docs = []
    context_parts = []

    for doc_text, doc_meta, sim in retrieved_docs:
        ticker = doc_meta.get('TICKER')
        company_name = 'N/A'
        sector = 'N/A'
        industry = 'N/A'

        if ticker and ticker in df_meta['TICKER'].values:
            meta_row = df_meta[df_meta['TICKER'] == ticker].iloc[0]
            company_name = meta_row.get('COMPANY_NAME', 'N/A')
            sector = meta_row.get('SECTOR', 'N/A')
            industry = meta_row.get('INDUSTRY', 'N/A')

        # Create an enriched metadata dictionary
        enriched_meta = {
            'TICKER': ticker,
            'COMPANY_NAME': company_name,
            'SECTOR': sector,
            'INDUSTRY': industry,
            'PUBLICATION_DATE': doc_meta.get('PUBLICATION_DATE'),
            'PROVIDER': doc_meta.get('PROVIDER')
        }

        enriched_docs.append((doc_text, enriched_meta, sim))

        # Format context for the LLM, including metadata
        context_parts.append(
            f"Title/Summary: {doc_text}\n"
            f"Ticker: {enriched_meta['TICKER']}, Company: {enriched_meta['COMPANY_NAME']}, "
            f"Sector: {enriched_meta['SECTOR']}, Industry: {enriched_meta['INDUSTRY']}, "
            f"Date: {enriched_meta['PUBLICATION_DATE']}, Provider: {enriched_meta['PROVIDER']}"
        )

    context = "\n\n---\n\n".join(context_parts)

    return context, enriched_docs

# Update the run_rag_pipeline function to pass df_meta to rag
def run_rag_pipeline_with_metadata(query, df_meta, k=K):
    """
    Runs the RAG pipeline with metadata enrichment to retrieve relevant documents
    and generate an answer.
    """
    context, retrieved_docs = rag_metadata(query, df_meta, k=k)
    answer = generate_answer(query, context) # generate_answer uses the enriched context
    return answer, retrieved_docs

In [None]:
question_lists = {
    "Topic Questions (with Metadata)": questions_topic,
    "Company Questions (with Metadata)": questions_company,
    "Industry Questions (with Metadata)": questions_industry
}

for list_name, questions in question_lists.items():
    print(f"--- {list_name} ---")
    for question in questions:
        print(f"\nQuestion: {question}")
        # Use the updated RAG pipeline function
        answer, retrieved_docs = run_rag_pipeline_with_metadata(question, df_meta)
        print_markdown(f"**Answer (with Metadata):**\n{answer}")
        print("\nRetrieved Documents (with Metadata):")
        for i, (doc_text, doc_meta, sim) in enumerate(retrieved_docs):
            print(f"{i+1}. Similarity: {sim:.4f}")
            print(f"   Metadata: Ticker: {doc_meta['TICKER']}, Company: {doc_meta['COMPANY_NAME']}, Sector: {doc_meta['SECTOR']}, Industry: {doc_meta['INDUSTRY']}, Date: {doc_meta['PUBLICATION_DATE']}, Provider: {doc_meta['PROVIDER']}")
            # Print the original Title/Summary to keep it clean, as the full text is in the context sent to LLM
            print(f"   Original Text (Title/Summary): {doc_text}")
        print("-" * 50)

--- Topic Questions (with Metadata) ---

Question: What are the major concerns expressed in financial news about inflation?


  return forward_call(*args, **kwargs)


**Answer (with Metadata):**
The major concerns expressed in financial news about inflation include:

1. **Persistent Inflation**: The Federal Reserve's minutes highlighted mounting worries over ongoing inflation in the U.S., potentially leading to an economic slowdown.

2. **Food Inflation**: Recent reports indicate that rising food prices are dampening hopes for potential interest rate cuts, exacerbated by ongoing tariff issues.

3. **Macroeconomic Uncertainty**: Analysts are cutting earnings projections due to uncertainties in the macroeconomic landscape, which are significantly impacting the overall earnings outlook.

4. **Rising Costs**: Articles cited increased grocery bills due to inflation and tariffs, affecting consumer spending.

5. **Geopolitical Tensions**: Concerns about geopolitical factors contributing to inflationary pressures, particularly in relation to hard assets like gold. 

Overall, inflation is viewed as a significant risk impacting both economic stability and consumer behavior.


Retrieved Documents (with Metadata):
1. Similarity: 0.5771
   Metadata: Ticker: BLK, Company: BlackRock, Inc., Sector: Financial Services, Industry: Asset Management, Date: 2025-05-29, Provider: Yahoo Finance UK
   Original Text (Title/Summary): Bitcoin price slips as Fed minutes flag US inflation risks : The Federal Reserve’s May policy meeting revealed mounting concern over persistent US inflation and the potential for economic slowdown.
2. Similarity: 0.4920
   Metadata: Ticker: TSLA, Company: Tesla, Inc., Sector: Consumer Cyclical, Industry: Auto Manufacturers, Date: 2025-05-31, Provider: Yahoo Finance UK
   Original Text (Title/Summary): The Weekend: Food inflation dampens hopes of a rate cut as tariff twists and turns continue : Key moments from the last seven days, plus a glimpse at the week ahead
3. Similarity: 0.4920
   Metadata: Ticker: NVDA, Company: NVIDIA Corporation, Sector: Technology, Industry: Semiconductors, Date: 2025-05-31, Provider: Yahoo Finance UK
   Original Te

**Answer (with Metadata):**
Investor sentiment in recent financial headlines is characterized by skepticism and caution. Many articles highlight that analysts are overwhelmingly bullish on certain stocks, yet there are warnings about the reliability of their optimistic forecasts due to institutional pressures. Additionally, bearish sentiment is noted, especially when Wall Street issues downbeat forecasts, indicating serious concerns about several stocks. Overall, there is a mixture of enthusiasm for some stocks and caution due to potential headwinds and questionable fundamentals for others.


Retrieved Documents (with Metadata):
1. Similarity: 0.6115
   Metadata: Ticker: KMX, Company: CarMax, Inc., Sector: Consumer Cyclical, Industry: Auto & Truck Dealerships, Date: 2025-05-26, Provider: StockStory
   Original Text (Title/Summary): 3 of Wall Street’s Favorite Stocks Facing Headwinds : Wall Street has set ambitious price targets for the stocks in this article. While this suggests attractive upside potential, it’s important to remain skeptical because analysts face institutional pressures that can sometimes lead to overly optimistic forecasts.
2. Similarity: 0.5978
   Metadata: Ticker: MCHP, Company: Microchip Technology Incorporated, Sector: Technology, Industry: Semiconductors, Date: 2025-05-20, Provider: StockStory
   Original Text (Title/Summary): 3 Hyped Up  Stocks Facing Headwinds : Great things are happening to the stocks in this article. They’re all outperforming the market over the last month because of positive catalysts such as a new product line, constructive new

**Answer (with Metadata):**
Artificial intelligence (AI) is playing a significant role in recent finance-related news stories by enhancing efficiency and decision-making across various sectors:

1. **Credit Risk Assessment**: Companies like Upstart utilize AI to quantify credit risk, improving returns for lenders.
  
2. **Taxpayer Experience Improvement**: Intuit is using AI agents and human AI-assisted experts to reduce the time customers spend on tax returns, aiming to boost revenue.

3. **Investment Trends**: Billionaires are investing in AI stocks expected to yield high returns, indicating market confidence in AI’s potential.

4. **Financial Technology**: AI is driving momentum in fintech, as seen with companies leveraging AI for automation and improved service delivery.

5. **Stock Performance & Valuation**: Firms such as Palantir are experiencing growth driven by their AI-related offerings, showcasing the market's response to AI capabilities.

Overall, AI is viewed as a critical factor for innovation and profitability in finance-related sectors.


Retrieved Documents (with Metadata):
1. Similarity: 0.6974
   Metadata: Ticker: JKHY, Company: Jack Henry & Associates, Inc., Sector: Technology, Industry: Information Technology Services, Date: 2025-03-17, Provider: Insider Monkey
   Original Text (Title/Summary): Jack Henry (JKHY) Integrates AI-Driven Lending Tech With Algebrik : We recently published a list of 12 AI News Investors Should Not Miss This Week. In this article, we are going to take a look at where Jack Henry & Associates, Inc. (NASDAQ:JKHY) stands against other AI news Investors should not miss this week. Artificial Intelligence (AI) is known to increase productivity, decrease human error, […]
2. Similarity: 0.6257
   Metadata: Ticker: META, Company: Meta Platforms, Inc., Sector: Communication Services, Industry: Internet Content & Information, Date: 2025-05-31, Provider: Motley Fool
   Original Text (Title/Summary): This "Magnificent Seven" Stock Is Set to Skyrocket If Its AI Investments Pay Off : Meta Platforms has i

**Answer (with Metadata):**
Microsoft is portrayed as a key player in the artificial intelligence (AI) sector, engaging in significant collaborations with other technology giants like Amazon and providing infrastructure for AI applications. However, there are concerns highlighted regarding potential setbacks, such as losing design contracts for AI chips related to Amazon. Overall, Microsoft is seen as benefiting from the AI trend, but faces challenges that could impact its market position.


Retrieved Documents (with Metadata):
1. Similarity: 0.5427
   Metadata: Ticker: META, Company: Meta Platforms, Inc., Sector: Communication Services, Industry: Internet Content & Information, Date: 2025-05-31, Provider: Motley Fool
   Original Text (Title/Summary): This "Magnificent Seven" Stock Is Set to Skyrocket If Its AI Investments Pay Off : Meta Platforms has investments in several AI applications.  The tech giant's stock is only valued on its legacy business.  Over the past two-and-a-half years, investors have heard about various artificial intelligence (AI) investments that tech companies are making.
2. Similarity: 0.5054
   Metadata: Ticker: CRM, Company: Salesforce, Inc., Sector: Technology, Industry: Software - Application, Date: 2025-05-29, Provider: Yahoo Finance Video
   Original Text (Title/Summary): How Salesforce has 'overcorrected' by leaning into AI : D.A. Davidson head of technology research Gil Luria joins Market Domination to discuss Salesforce (CRM) earnings and 

**Answer (with Metadata):**
The financial news headlines that connect Amazon with automation or logistics include:

1. **Amazon's AI Roadmap With AWS CEO Garman** - This article highlights how every aspect of Amazon is leveraging artificial intelligence.
   
2. **UPS Sells Ware2Go To Peter Thiel-Backed Stord** - This article discusses Stord's acquisition of Ware2Go, enhancing its logistics capabilities to compete with Amazon in e-commerce.

Both articles emphasize Amazon's involvement in automation and logistics, particularly through AI and competition in the logistics space.


Retrieved Documents (with Metadata):
1. Similarity: 0.6013
   Metadata: Ticker: TFC, Company: Truist Financial Corporation, Sector: Financial Services, Industry: Banks - Regional, Date: 2025-05-25, Provider: Insider Monkey
   Original Text (Title/Summary): Truist Reiterates Buy on Amazon.com (AMZN) as Q2 Revenue Tracks Ahead : We recently published a list of 10 AI Stocks on Wall Street’s Radar. In this article, we are going to take a look at where Amazon.com Inc. (NASDAQ:AMZN) stands against other AI stocks on Wall Street’s radar. Amazon.com Inc. (NASDAQ:AMZN) is an American technology company offering e-commerce, cloud computing, and other services, including digital streaming […]
2. Similarity: 0.5231
   Metadata: Ticker: AMZN, Company: Amazon.com, Inc., Sector: Consumer Cyclical, Industry: Internet Retail, Date: 2025-05-30, Provider: Bloomberg
   Original Text (Title/Summary): Amazon's AI Roadmap With AWS CEO Garman : Every aspect of Amazon is leveraging artificial intelligence, sa

**Answer (with Metadata):**
The main themes emerging in financial news about the semiconductor industry include:

1. **International Revenue Trends**: Companies like ON Semiconductor are focusing on their international revenue growth, which is critical for Wall Street forecasts and stock performance.

2. **Investor Attention**: ON Semiconductor is attracting significant investor interest, raising discussions about its long-term potential despite current earnings concerns.

3. **Market Volatility**: Despite softer earnings reports, some semiconductor stocks, including ON, have seen notable price surges, indicating investor confidence and potentially strong underlying fundamentals.

4. **Challenges and Adaptability**: The industry faces macroeconomic challenges, such as declining electric vehicle demand, but is also showing growth in areas like silicon carbide (SiC) and AI data centers.

5. **Stock Performance Metrics**: Reports indicate fluctuations in stock performance, with ON Semiconductor experiencing a significant price drop year-to-date, balanced by optimism over its buyback programs and growth potential.

6. **Comparison with Peers**: Various articles draw comparisons between semiconductor companies, highlighting standout performers and sectors within the industry. 

These themes underscore the dynamic nature of the semiconductor market, where investor sentiment and revenue trends play crucial roles amidst ongoing challenges.


Retrieved Documents (with Metadata):
1. Similarity: 0.6429
   Metadata: Ticker: ON, Company: ON Semiconductor Corporation, Sector: Technology, Industry: Semiconductors, Date: 2025-05-13, Provider: Zacks
   Original Text (Title/Summary): Investing in ON Semiconductor Corp. (ON)? Don't Miss Assessing Its International Revenue Trends : Explore ON Semiconductor Corp.'s (ON) international revenue trends and how these numbers impact Wall Street's forecasts and what's ahead for the stock.
2. Similarity: 0.5822
   Metadata: Ticker: ON, Company: ON Semiconductor Corporation, Sector: Technology, Industry: Semiconductors, Date: 2025-05-21, Provider: Zacks
   Original Text (Title/Summary): ON Semiconductor Corporation (ON) is Attracting Investor Attention: Here is What You Should Know : Recently, Zacks.com users have been paying close attention to ON Semiconductor Corp. (ON). This makes it worthwhile to examine what the stock has in store.
3. Similarity: 0.5649
   Metadata: Ticker: ON, Company: O

**Answer (with Metadata):**
The articles report several concerning trends in the retail industry:

1. **Volatile Demand**: Retailers are experiencing fluctuating demand as consumer spending changes, impacting sales negatively. 

2. **Stock Performance Decline**: Retail stocks have tumbled recently, with performance declines reported at 13.7% and 12.3% over the past six months, worse than the S&P 500’s losses.

3. **Supply Chain Adjustments**: Retailers are shifting their supply chains in response to changes in tariff policies, leading to price increases already hitting shelves.

4. **Inventory Overflows**: A potential inventory overflow is expected due to surges in imports, leading to deeper discounts and margin pressure.

5. **High Operating Costs**: The restaurant segment, particularly, faces challenges from high inventory and labor costs, leading to thin margins and increased risk if demand decreases further.

Overall, there is a consensus on a challenging retail environment with significant pressures from both external economic factors and internal operational challenges.


Retrieved Documents (with Metadata):
1. Similarity: 0.5898
   Metadata: Ticker: KMX, Company: CarMax, Inc., Sector: Consumer Cyclical, Industry: Auto & Truck Dealerships, Date: 2025-05-12, Provider: StockStory
   Original Text (Title/Summary): 3 Consumer Stocks That Concern Us : Retailers are adapting their business models as technology changes how people shop. Still, demand can be volatile as the industry is exposed to the ups and downs of consumer spending. This has stirred some uncertainty lately as retail stocks have tumbled by 13.7% over the past six months. This performance was worse than the S&P 500’s 5.5% loss.
2. Similarity: 0.5056
   Metadata: Ticker: BBY, Company: Best Buy Co., Inc., Sector: Consumer Cyclical, Industry: Specialty Retail, Date: 2025-05-29, Provider: The Wall Street Journal
   Original Text (Title/Summary): Retailers, Ducking Trade-War Curveballs, Stick to Their Plans : As legal rulings roll in on Trump’s tariff policies, retail executives say they have shift

**Answer (with Metadata):**
Recent news highlights several risks and challenges facing the energy industry:

1. **Legislative Risks**: A bill in Congress threatens to repeal significant subsidies for renewable energy, which could make new projects uneconomical and adversely impact companies like NextEra Energy and Enphase Energy.

2. **Commodity Price Volatility**: Oil prices are sliding, and increases in tariffs are leading to tighter drilling budgets, affecting companies in the oil and gas equipment and services sector, such as Halliburton and Baker Hughes.

3. **Economic Cycles**: The industrial sector, which includes energy, is experiencing demand fluctuations tied to economic conditions, leading to stock declines in companies like Otis Worldwide and Dover Corporation.

4. **Wildfire Liability**: Utilities like Xcel Energy and Edison International face litigation risks related to wildfire management, which can negatively impact their operations and investor confidence.

5. **Market Underperformance**: Some energy stocks are trading below their book values due to a perceived overall undervaluation of the sector, which may create uncertainties for future investments.

6. **Environmental Regulations**: Projects like Dominion Energy's offshore wind initiative face uncertainty as they are affected by political climates and regulatory changes.

Overall, the energy sector is navigating a combination of regulatory, market, economic, and environmental challenges.


Retrieved Documents (with Metadata):
1. Similarity: 0.5362
   Metadata: Ticker: NEE, Company: NextEra Energy, Inc., Sector: Utilities, Industry: Utilities - Regulated Electric, Date: 2025-05-23, Provider: Motley Fool
   Original Text (Title/Summary): Renewable Energy Stocks Crash as U.S. Advances Bill That Could Decimate the Industry : Congress is pushing forward a bill that could upend the renewable energy industry.  Just as companies have ramped up production and renewable electricity generation in the U.S., those projects may become uneconomical.  The news was about as bad as it could get for renewable energy stocks this week as the U.S. House of Representatives early Thursday passed a bill that will repeal some of the most important subsidies for the industry if it becomes law.
2. Similarity: 0.5362
   Metadata: Ticker: ENPH, Company: Enphase Energy, Inc., Sector: Technology, Industry: Solar, Date: 2025-05-23, Provider: Motley Fool
   Original Text (Title/Summary): Renewable Energ

## Analysis & Questions - Section 2

### Instructions: Evaluate Answers With and Without Metadata

For each question, compare the two answers provided:
- One generated **without** metadata
- One generated **with** metadata

---

### Steps:

1. Use the following evaluation criteria:
   - Clarity
   - Detail & Depth
   - Use of Context
   - Accuracy & Grounding
   - Relevance
   - Narrrative Flow

2. For each criterion, write brief notes comparing how the answer **without metadata** performs versus the answer **with metadata**.

3. Summarize your evaluation in a markdown table with the following columns:

| Criteria       | WITHOUT METADATA            | WITH METADATA             |
|----------------|----------------------------|--------------------------|
| Clarity        | [Your brief note here]     | [Your brief note here]   |
| Detail & Depth         | [Your brief note here]     | [Your brief note here]   |
| Use of Context        | [Your brief note here]     | [Your brief note here]   |
| Accuracy & Grounding       | [Your brief note here]     | [Your brief note here]   |
| Relevance      | [Your brief note here]     | [Your brief note here]   |
| Narrative Flow      | [Your brief note here]     | [Your brief note here]   |

---

**Note:** Keep comments short and clear for easy comparison.



In [3]:
print_markdown("""
| Criteria         | WITHOUT METADATA                                                                                                | WITH METADATA                                                                                                                               |
|------------------|-----------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|
| Clarity          | Answers are generally clear, but sometimes lack specific context.                                  | Answers are clear and often provide more specific context by referencing company names, sectors, and industries from the metadata. |
| Detail & Depth   | Provides a good overview based on the text, but can miss information related to specific companies or industries.     | Offers more detailed answers by including company and industry information, allowing for a better understanding.          |
| Use of Context   | Uses the text of the articles effectively to form the answer.                                                     | Uses both the text and the provided metadata (Ticker, Company, Sector, Industry) to form the answer, providing more complete context.          |
| Accuracy & Grounding| Grounded in the retrieved text, but accuracy can be limited by the lack of company/sector identification in the context. | More accurately connects information to companies and sectors due to the explicit metadata provided in the context.         |
| Relevance        | Generally relevant to the query based on semantic similarity of the text.                                         | Relevance is enhanced by the metadata, ensuring the answer is related to specific companies or industries mentioned in the retrieved context.   |
| Narrative Flow   | The answer can be read as a summary of the retrieved snippets.                                                                   | The answer reads as a structured summary, organizing information by company or industry, improving the narrative flow and readability.     |
""")


| Criteria         | WITHOUT METADATA                                                                                                | WITH METADATA                                                                                                                               |
|------------------|-----------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|
| Clarity          | Answers are generally clear, but sometimes lack specific context.                                  | Answers are clear and often provide more specific context by referencing company names, sectors, and industries from the metadata. |
| Detail & Depth   | Provides a good overview based on the text, but can miss information related to specific companies or industries.     | Offers more detailed answers by including company and industry information, allowing for a better understanding.          |
| Use of Context   | Uses the text of the articles effectively to form the answer.                                                     | Uses both the text and the provided metadata (Ticker, Company, Sector, Industry) to form the answer, providing more complete context.          |
| Accuracy & Grounding| Grounded in the retrieved text, but accuracy can be limited by the lack of company/sector identification in the context. | More accurately connects information to companies and sectors due to the explicit metadata provided in the context.         |
| Relevance        | Generally relevant to the query based on semantic similarity of the text.                                         | Relevance is enhanced by the metadata, ensuring the answer is related to specific companies or industries mentioned in the retrieved context.   |
| Narrative Flow   | The answer can be read as a summary of the retrieved snippets.                                                                   | The answer reads as a structured summary, organizing information by company or industry, improving the narrative flow and readability.     |
