# Part 1: Get Sentiment Using an LLM

You are given daily market news for AAPL, MSFT, and TSLA from the first six months of 2023 in `market_news.csv`. Using the Groq API, feed this news to the model and ask it for a sentiment for each asset.

Hint: sentiment can be broken down into negative, positive, or neutral. How can you translate this to trading signals in the next part?
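
One possible way to think about the hint, sketched under assumptions: if sentiment is expressed as a score between -1 and 1, a simple threshold can turn it into a long, short, or flat signal. The function name `sentiment_to_signal` and the 0.25 threshold are hypothetical choices for illustration, not part of the assignment.

```python
def sentiment_to_signal(score: float, threshold: float = 0.25) -> int:
    """Map a sentiment score in [-1, 1] to a signal: 1 (long), -1 (short), 0 (flat).

    The threshold is an assumed value; tune it during the backtest.
    """
    if score >= threshold:
        return 1
    if score <= -threshold:
        return -1
    return 0


# Categorical labels can be mapped to the same three signals directly.
LABEL_TO_SIGNAL = {"positive": 1, "neutral": 0, "negative": -1}
```

A wider threshold trades less often and only on strong news; a narrower one reacts to almost every headline.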

## Imports
(feel free to import other things if you feel the need)

In [4]:
import pandas as pd
from groq import Groq
import json
import os
from dotenv import load_dotenv
import time

## Configure API

Make sure to create a Groq API key and store it in your `.env` file as `GROQ_API_KEY`.

You can create as many free API keys as you need, so if you run out of credits, just make a new one.

In [5]:
load_dotenv()
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
client = Groq(api_key=GROQ_API_KEY)
model = 'llama-3.1-8b-instant'
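
If the key is missing from `.env`, `os.getenv` returns `None` and the problem only surfaces at the first API call. A minimal sketch of a fail-fast check is shown below; the helper name `require_env` is hypothetical.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

It could be used in place of the plain lookup, for example `GROQ_API_KEY = require_env("GROQ_API_KEY")` right after `load_dotenv()`.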

## Use the API to Ask the Model to Give Sentiment Scores Based on the News

**Important**: to avoid rate limiting issues and to speed up the process, you should batch multiple dates/news summaries together when invoking the API call and use `time.sleep()` for per-minute rate limits.
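
The batching-plus-pause pattern described above can be sketched generically as follows. The helper names `make_batches` and `process_in_batches` and the 5-second default pause are assumptions for illustration; `call` stands in for whatever function sends one batch to the API.

```python
import time
from typing import Any, Callable, Iterator, List, Sequence


def make_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def process_in_batches(
    items: Sequence[str],
    batch_size: int,
    call: Callable[[List[str]], Any],
    pause_seconds: float = 5.0,
) -> List[Any]:
    """Apply `call` to each batch, sleeping between batches to respect rate limits."""
    batches = list(make_batches(items, batch_size))
    results = []
    for n, batch in enumerate(batches):
        results.append(call(batch))
        # No need to pause after the final batch
        if n < len(batches) - 1:
            time.sleep(pause_seconds)
    return results
```

Larger batches mean fewer requests per minute, but very long prompts make it more likely that the model skips or reorders items.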

In [None]:

# SENTIMENTS ARE PROVIDED IN MARKET_NEWS.CSV
def get_sentiment(text):
    """Send newline-separated "IDX: SUMMARY" items to the model and return its raw reply string."""
    prompt = """
You are a senior financial sentiment analyst specializing in equity market news and earnings commentary.

TASK:
For each provided summary, assign a sentiment score on a continuous scale from -1.0 to 1.0.

SCALE DEFINITION:
-  1.00  = Extremely positive with strong, clear upside implications for investors
-  0.75  = Clearly positive with meaningful favorable impact
-  0.50  = Moderately positive
-  0.25  = Slightly positive
-  0.00  = Neutral, balanced, or purely informational with no clear directional impact
- -0.25  = Slightly negative
- -0.50  = Moderately negative
- -0.75  = Clearly negative with meaningful downside implications
- -1.00  = Extremely negative with severe downside risk

SCORING RULES:
- Evaluate sentiment based on financial and market impact, not emotional wording.
- Focus strictly on implications for investors, valuation, risk, growth, profitability, or outlook.
- Use the full range when appropriate; avoid clustering near 0 unless the summary is truly neutral.
- Earnings beats, raised guidance, growth acceleration, upgrades → Positive.
- Missed earnings, lowered guidance, regulatory risk, litigation, layoffs → Negative.
- Mixed signals should be weighted by overall investor impact.
- Round the final score to two decimal places (e.g., 0.37, -0.82, 1.00).
- Do NOT explain your reasoning.
- Do NOT include commentary.
- Output ONLY valid JSON.

INPUT FORMAT:
Each item is formatted as:
[IDX]: [SUMMARY]

OUTPUT FORMAT (strictly follow this structure):
[{"row_id": IDX, "sentiment": 0.84},
  {"row_id": IDX, "sentiment": -0.63}]

IMPORTANT:
- Preserve the exact IDX values.
- Do not reorder items.
- Do not invent or skip IDs.
- Ensure all sentiment values are numeric (not strings).
- Strictly follow the output format as seen above.
- Return ONLY valid JSON as output. Ensure the output is valid JSON.
- Do NOT enclose output in markdown formatting (no ```json).

Summaries:
""" + text

    
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model=model,
    )
    return chat_completion.choices[0].message.content
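
Even with the instructions in the prompt, a model may occasionally wrap its reply in a markdown fence or return a score outside the requested range. The sketch below shows one way to parse the reply defensively; `parse_sentiment_output` is a hypothetical helper, and clamping out-of-range scores to [-1, 1] is an assumed policy rather than a requirement.

```python
import json
from typing import Dict


def parse_sentiment_output(raw: str) -> Dict[int, float]:
    """Parse the model's JSON reply into a {row_id: sentiment} dictionary."""
    fence = "`" * 3  # markdown code-fence marker
    text = raw.strip()
    if text.startswith(fence):
        # Drop the opening fence line and, if present, the closing fence line
        lines = text.splitlines()[1:]
        if lines and lines[-1].strip() == fence:
            lines = lines[:-1]
        text = "\n".join(lines)

    scores: Dict[int, float] = {}
    for item in json.loads(text):
        score = float(item["sentiment"])
        # Clamp to the range the prompt asks for (assumed policy)
        scores[int(item["row_id"])] = max(-1.0, min(1.0, score))
    return scores
```

Malformed replies still raise `json.JSONDecodeError`, `KeyError`, `TypeError`, or `ValueError`, so the caller can decide whether to retry the batch.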

## Get and Save the Sentiment Scores for Backtest Later

You can save them to a new CSV file or store them in any other way you can think of.
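
Whichever storage you choose, it can help to confirm that every row actually received a score before moving on to the backtest, since a failed batch leaves gaps. A minimal sketch, assuming the scores live in a `sentiment` column; the helper name `rows_missing_sentiment` is hypothetical.

```python
import pandas as pd


def rows_missing_sentiment(df: pd.DataFrame, column: str = "sentiment") -> list:
    """Return the index labels of rows that have no sentiment score yet."""
    if column not in df.columns:
        # Nothing has been scored at all
        return df.index.tolist()
    return df.index[df[column].isna()].tolist()
```

The returned labels could be fed back through the batching loop so that only the missing rows are re-sent to the model.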

In [9]:
BATCH_SIZE = 20 # Feel free to change this
mndf = pd.read_csv("market_news.csv")

start = 0
end = len(mndf)

for i in range(start, end, BATCH_SIZE):
    rows = mndf.iloc[i:(i + BATCH_SIZE)]
    sentiment_text = ""
    for row_id, row in rows.iterrows():
        sentiment_text += f"{row_id}: {row.summary}\n"

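    retries = 0
    while retries < 3:
        raw_output = get_sentiment(sentiment_text)
        if not raw_output:
            print(f"Error: empty model output: {raw_output}")
            retries += 1
            time.sleep(5)
            continue

        try:
            sentiments = json.loads(raw_output)
            for item in sentiments:
                # Ignore IDs outside this batch so .loc does not append new rows
                if item["row_id"] in rows.index:
                    mndf.loc[item["row_id"], "sentiment"] = round(float(item["sentiment"]), 2)
            print(f"Completed indices {i}-{i+BATCH_SIZE}: {sentiments}")
            # Save after every batch so partial progress survives a crash
            mndf.to_csv("market_news_sentiment.csv", index=False)
            break

        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            # Invalid JSON or a malformed item: retry the whole batch
            print(f"Error: could not parse model output: {raw_output}")
            retries += 1
            time.sleep(5)
    else:
        # Reached only when all retries failed without a break
        print(f"Giving up on indices {i}-{i+BATCH_SIZE} after {retries} retries")
    time.sleep(5) # pause for rate limiting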

print("Done")

Completed indices 0-20: [{'row_id': 0, 'sentiment': 0.87}, {'row_id': 1, 'sentiment': 0.83}, {'row_id': 2, 'sentiment': 0.41}, {'row_id': 3, 'sentiment': 0.82}, {'row_id': 4, 'sentiment': 0.73}, {'row_id': 5, 'sentiment': 0.33}, {'row_id': 6, 'sentiment': 0.9}, {'row_id': 7, 'sentiment': 0.86}, {'row_id': 8, 'sentiment': -0.45}, {'row_id': 9, 'sentiment': 0.83}, {'row_id': 10, 'sentiment': 0.8}, {'row_id': 11, 'sentiment': -0.62}, {'row_id': 12, 'sentiment': 0.88}, {'row_id': 13, 'sentiment': 0.86}, {'row_id': 14, 'sentiment': -0.25}, {'row_id': 15, 'sentiment': 0.88}, {'row_id': 16, 'sentiment': 0.93}, {'row_id': 17, 'sentiment': -0.73}, {'row_id': 18, 'sentiment': 0.93}, {'row_id': 19, 'sentiment': 0.85}]
Completed indices 20-40: [{'row_id': 20, 'sentiment': 0.5}, {'row_id': 21, 'sentiment': 0.75}, {'row_id': 22, 'sentiment': 0.75}, {'row_id': 23, 'sentiment': 0.25}, {'row_id': 24, 'sentiment': 0.75}, {'row_id': 25, 'sentiment': 0.83}, {'row_id': 26, 'sentiment': 0.13}, {'row_id': 27