<a href="https://colab.research.google.com/github/Rajeswari0410/Algo-Tree/blob/main/Full_Retriever.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install groq tiktoken



In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import os
from pathlib import Path
from google.colab import userdata
from groq import Groq

folderpath = "/content/drive/MyDrive/Earnings2Insights/ECTsum"

key = userdata.get("Groq_Key")
client1 = Groq(api_key=key)

key = userdata.get("Groq2_key")
client2 = Groq(api_key=key)


def load_transcript(transcript_path):
    try:
        with open(transcript_path, "r", encoding="utf-8") as f:
            return f.read()
    except Exception as e:
        print(f"Error reading {transcript_path}: {e}")
        return None

In [None]:
import tiktoken

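# cl100k_base is an OpenAI encoding, so token counts are only approximate for Llama 3.
enc = tiktoken.get_encoding("cl100k_base")

def chunk_text(text, max_tokens=4000):
    """Split text into consecutive, non-overlapping chunks of at most max_tokens tokens."""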
    tokens = enc.encode(text)
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + max_tokens, len(tokens))
        chunk = enc.decode(tokens[start:end])
        chunks.append(chunk)
        start = end
    return chunks

def generate_report(transcript_text, ecc_id, client):
    """Extract metrics from each transcript chunk, then merge them in a final pass."""
    chunks = chunk_text(transcript_text)
    all_chunks_output = []

    for i, chunk in enumerate(chunks):
        prompt = f"""

Here is part {i+1} of the earnings call transcript for {ecc_id}.
Extract key financial metrics from earnings data. Output in structured format.
Output Format
Company: [Name]
Period: [Quarter/Year]
Metrics:

Revenue: [Amount, growth %]
EPS: [GAAP and/or Adjusted amounts]
Guidance: [Forward-looking statements]
Dividends: [Changes if mentioned]
Notable: [Key highlights]

Key Data Points to Extract

Revenue figures and growth percentages
Earnings per share (both GAAP and adjusted)
Profit/loss amounts
Forward guidance (ranges, percentages)
Dividend changes
Year-over-year comparisons

Processing Rules

Extract company names from any format
Convert descriptive terms: "high teens" = 15-19%, "low 30s" = 30-34%
Distinguish GAAP vs adjusted metrics
Note currency impacts when mentioned
Flag significant guidance changes

Example Output
Company: AMETEK
Period: Q1 2021
Metrics:

Revenue: $1.22B, +1% Q1; 2021 guidance: up 15-19% total, up high single digits organic
EPS: Q1 adjusted $1.07; 2021 guidance: $4.48-$4.56; Q2 2021 guidance: $1.08-$1.10
Guidance: Q2 sales expected up 30-34% vs Q2 2020
Notable: Strong forward guidance despite modest Q1 growth

Process the provided financial data using this format.

Transcript (Part {i+1}):
'''
{chunk}
'''
"""
        response = client.chat.completions.create(
            model="llama3-70b-8192",
            messages=[{"role": "user", "content": prompt}]
        )
        all_chunks_output.append(response.choices[0].message.content.strip())

    # Join chunk-level summaries
    combined_chunks = "\n\n".join(all_chunks_output)

    summary_prompt = f"""
Extract key financial metrics from earnings data. Output in structured format.
Output Format
Company: [Name]
Period: [Quarter/Year]
Metrics:

Revenue: [Amount, growth %]
EPS: [GAAP and/or Adjusted amounts]
Guidance: [Forward-looking statements]
Dividends: [Changes if mentioned]
Notable: [Key highlights]

Key Data Points to Extract

Revenue figures and growth percentages
Earnings per share (both GAAP and adjusted)
Profit/loss amounts
Forward guidance (ranges, percentages)
Dividend changes
Year-over-year comparisons

Processing Rules

Extract company names from any format
Convert descriptive terms: "high teens" = 15-19%, "low 30s" = 30-34%
Distinguish GAAP vs adjusted metrics
Note currency impacts when mentioned
Flag significant guidance changes

Example Output
Company: AMETEK
Period: Q1 2021
Metrics:

Revenue: $1.22B, +1% Q1; 2021 guidance: up 15-19% total, up high single digits organic
EPS: Q1 adjusted $1.07; 2021 guidance: $4.48-$4.56; Q2 2021 guidance: $1.08-$1.10
Guidance: Q2 sales expected up 30-34% vs Q2 2020
Notable: Strong forward guidance despite modest Q1 growth

Process the provided financial data using this format.

Partial insights:
'''
{combined_chunks}
'''
"""

    final_response = client.chat.completions.create(
        model="llama3-70b-8192",
        messages=[{"role": "user", "content": summary_prompt}]
    )

    return final_response.choices[0].message.content.strip()
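The chunking step in this cell cuts the token sequence into consecutive, non-overlapping windows of at most `max_tokens` tokens. The sketch below isolates that windowing logic. It uses a whitespace split as a stand-in tokenizer (an assumption, so the example runs without `tiktoken`), and `chunk_tokens` is a hypothetical helper name, not part of the notebook.

```python
def chunk_tokens(tokens, max_tokens):
    """Split a token list into consecutive windows of at most max_tokens items."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[start:start + max_tokens]
            for start in range(0, len(tokens), max_tokens)]

# Stand-in tokenizer: whitespace split (the notebook itself uses tiktoken).
words = "revenue grew ten percent year over year in the third quarter".split()
windows = chunk_tokens(words, max_tokens=4)

# Every window respects the limit, and no token is lost or duplicated.
assert all(len(w) <= 4 for w in windows)
assert [t for w in windows for t in w] == words
```

Because the windows do not overlap, a sentence that straddles a boundary is split between two chunks, so a metric stated there may reach the model without its surrounding context.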


In [None]:
final_output = []
folder_list = sorted(os.listdir(folderpath))

for idx, folder_name in enumerate(folder_list):
    source_path = os.path.join(folderpath, folder_name, "source", "source.md")

    if os.path.isfile(source_path):
        print(f"Processing: {folder_name}")
        transcript = load_transcript(source_path)

        if transcript:
            try:
                current_client = client1 if idx % 2 == 0 else client2
                report = generate_report(transcript, folder_name, current_client)
                final_output.append({
                    "ECC": folder_name,
                    "Report": report
                })
            except Exception as e:
                print(f"Error generating report for {folder_name}: {e}")
        else:
            print(f"No transcript found for: {folder_name}")

Processing: ABM_q3_2021


KeyboardInterrupt: 

In [None]:
final_output

[{'ECC': 'ABM_q3_2021',
  'Report': 'Here is the extracted key financial metrics in the specified format:\n\n**Company:** ABM Industries Incorporated\n**Period:** Q3 2021\n**Metrics:**\n\n**Revenue:** $1.54B, +10.7% year-over-year\n**EPS:**\n  - GAAP: -$0.20\n  - Adjusted: $0.90, +20% year-over-year\n**Guidance:**\n  - Full year fiscal 2021 adjusted EPS guidance: $3.45-$3.55, up from $3.30-$3.50 previously\n**Dividends:**\n  - Paid 221st consecutive quarterly dividend of $0.19 per common share\n  - Declared 222nd consecutive quarterly dividend, payable November 1, 2021\n**Notable:**\n  - Strong third quarter results with double-digit revenue growth, solid cash generation, and a 20% gain in adjusted EPS\n  - Virus protection services remained elevated, but eased slightly in the quarter\n  - Acquisition of Able Services expected to be accretive to adjusted EPS from day one, with estimated $30 million to $40 million in cost savings synergies'},
 {'ECC': 'AME_q1_2021',
  'Report': 'Here is

In [None]:
with open("/content/summaries.txt", "w", encoding="utf-8") as f:
    for entry in final_output:
        f.write(f"=== {entry['ECC']} ===\n{entry['Report']}\n\n")
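The file written above stores each report under a `=== name ===` header line. The following sketch, under that assumption about the layout, shows one way such a file could be read back into a dictionary. `format_entries` and `parse_entries` are hypothetical helper names, and anchoring the pattern to whole lines is a choice made here to reduce the chance of matching `===` inside a report body.

```python
import re

def format_entries(entries):
    """Serialize entries using the '=== name ===' header layout."""
    return "".join(f"=== {e['ECC']} ===\n{e['Report']}\n\n" for e in entries)

def parse_entries(text):
    """Recover a {name: report} mapping from text in that layout."""
    # With a capturing group, re.split returns [prefix, name1, body1, name2, body2, ...].
    parts = re.split(r"^===\s+(.*?)\s+===$", text, flags=re.MULTILINE)
    return {parts[i].strip(): parts[i + 1].strip()
            for i in range(1, len(parts), 2)}

sample = [
    {"ECC": "ABM_q3_2021", "Report": "Revenue: $1.54B"},
    {"ECC": "AME_q1_2021", "Report": "Revenue: $1.22B"},
]
parsed = parse_entries(format_entries(sample))
assert parsed == {e["ECC"]: e["Report"] for e in sample}
```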


In [None]:
final_professional_output = []
prof_folderpath = "/content/drive/MyDrive/Earnings2Insights/Professional"
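# Enumerate so that idx is defined for alternating between the two clients below.
for idx, folder_name in enumerate(sorted(os.listdir(prof_folderpath))):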
    prof_path = os.path.join(prof_folderpath, folder_name, "source", "source.md")

    if os.path.isfile(prof_path):
        print(f"Processing: {folder_name}")
        transcript = load_transcript(prof_path)

        if transcript:
            try:
                cur_client = client1 if idx % 2 == 0 else client2
                prof_report = generate_report(transcript, folder_name, cur_client)
                final_professional_output.append({
                    "ECC": folder_name,
                    "Report": prof_report
                })
            except Exception as e:
                print(f"Error generating report for {folder_name}: {e}")
        else:
            print(f"No transcript found for: {folder_name}")

Processing: CMI_q4_2015
Error generating report for CMI_q4_2015: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama3-70b-8192` in organization `org_01jzgg6fh0enmrvm08t87bedbx` service tier `on_demand` on tokens per day (TPD): Limit 500000, Used 498761, Requested 4703. Please try again in 9m58.5082s. Need more tokens? Upgrade to Dev Tier today at https://console.groq.com/settings/billing', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Processing: DE_q2_2014
Error generating report for DE_q2_2014: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama3-70b-8192` in organization `org_01jzgg6fh0enmrvm08t87bedbx` service tier `on_demand` on tokens per day (TPD): Limit 500000, Used 498761, Requested 5081. Please try again in 11m3.7556s. Need more tokens? Upgrade to Dev Tier today at https://console.groq.com/settings/billing', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
Processing: UNH_q2_2014
Error generating report for UNH_q2_2014: E
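The 429 responses above show each request failing immediately once the rate limit is reached. A minimal sketch of retrying with exponential backoff is given below. `RateLimited` and `with_backoff` are hypothetical names: the exception class is a stand-in for whatever error the SDK raises on HTTP 429, and the delays are illustrative values.

```python
import time

class RateLimited(Exception):
    """Hypothetical stand-in for the SDK's HTTP 429 error."""

def with_backoff(fn, retry_on=(RateLimited,), max_attempts=4,
                 base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)

# Demonstration with a fake call that fails twice before succeeding.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited("429")
    return "ok"

delays = []
result = with_backoff(flaky, sleep=delays.append)  # record delays instead of sleeping
```

For a tokens-per-day limit like the one logged above, short delays are unlikely to help on their own: the error message itself suggests waiting roughly ten minutes, so a larger `base_delay` or a pause before resuming the loop may be more appropriate.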

In [None]:
with open("/content/prof_summaries.txt", "w", encoding="utf-8") as f:
    for entry in final_professional_output:
        f.write(f"=== {entry['ECC']} ===\n{entry['Report']}\n\n")

**Final Suggestion code**

In [None]:
# prompt:
# You are a senior equity research analyst at a premier investment firm. Using the structured financial metrics provided,
# generate comprehensive investment reports that will guide institutional investors in making Long/Short decisions.

# **Analysis Framework:**

# For each earnings call (ECC), conduct thorough investment analysis focusing on:

# 1. **Financial Performance Evaluation:**
#    - Assess revenue growth momentum and sustainability
#    - Analyze EPS quality (GAAP vs Adjusted) and growth trajectory
#    - Evaluate guidance reliability and management confidence
#    - Review dividend policy and capital allocation efficiency

# 2. **Investment Signal Generation:**
#    - Identify bullish indicators: strong growth, raised guidance, operational improvements
#    - Flag bearish signals: declining margins, lowered guidance, competitive pressures
#    - Assess earnings quality and one-time item impacts
#    - Determine business momentum from sequential and year-over-year trends

# 3. **Risk-Reward Assessment:**
#    - Evaluate upside potential from strategic initiatives
#    - Assess downside risks from operational challenges
#    - Consider guidance achievement probability
#    - Analyze competitive positioning strength

# 4. **Time-Horizon Specific Catalysts:**
#    - **1-Day Outlook:** Immediate earnings reaction drivers (surprise magnitude, guidance changes)
#    - **1-Week Outlook:** Short-term momentum factors (management commentary, peer reactions)
#    - **1-Month Outlook:** Fundamental execution milestones (strategic initiative progress, next quarter setup)

# **Investment Recommendation Requirements:**
# - Clear LONG or SHORT recommendation with conviction level (HIGH/MEDIUM/LOW)
# - Primary investment thesis supported by specific financial data
# - Key catalysts for each time horizon
# - Risk factors and potential concerns
# - Price performance expectations

# **Report Quality Standards:**
# - Professional institutional research tone
# - Persuasive and actionable for portfolio managers
# - Data-driven conclusions with specific supporting evidence
# - Comprehensive yet concise (350-450 words per report)
# - Clear executive summary with immediate takeaways

# **Critical Success Factors:**
# Your reports must be convincing enough that human evaluators will:
# - Follow your Long/Short recommendations
# - Make correct investment decisions across 1-day, 1-week, and 1-month periods
# - Feel confident in the analysis quality and reasoning

# **Output Format:**
# Generate results in this exact JSON structure:

# ```json
# [
#   {
#     "ECC": "ABM_q3_2021",
#     "Report": "[Complete investment analysis report here]"
#   },
#   {
#     "ECC": "AME_q1_2021",
#     "Report": "[Complete investment analysis report here]"
#   }
# ]
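The prompt above asks for a JSON array of objects with exactly the keys `ECC` and `Report`. A small sketch of checking a model reply against that shape follows, assuming the reply may or may not be wrapped in a Markdown code fence. `parse_reports` is a hypothetical helper and the sample reply is invented for illustration.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence delimiter

def parse_reports(response_text):
    """Extract and validate a JSON array of {"ECC": str, "Report": str} objects."""
    # Unwrap an optional fenced block (with or without a "json" tag).
    pattern = FENCE + r"(?:json)?\s*(.*?)" + FENCE
    match = re.search(pattern, response_text, flags=re.DOTALL)
    payload = match.group(1) if match else response_text
    data = json.loads(payload)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    for item in data:
        if not isinstance(item, dict) or set(item) != {"ECC", "Report"}:
            raise ValueError(f"unexpected entry: {item!r}")
        if not all(isinstance(item[k], str) for k in ("ECC", "Report")):
            raise ValueError(f"non-string field in: {item!r}")
    return data

# Invented sample reply, wrapped in a fence the way chat models often answer.
reply = FENCE + 'json\n[{"ECC": "ABM_q3_2021", "Report": "Long, high conviction."}]\n' + FENCE
reports = parse_reports(reply)
```

Validating the structure before saving makes a malformed reply fail loudly at this step rather than later, when the saved file is consumed.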

In [None]:
file_path = "/content/drive/MyDrive/Earnings2Insights/Summaries_source.txt"

In [None]:
import re
import json
import os

def call_groq_client(client, prompt, max_tokens=512, temperature=0.3):
    response = client.chat.completions.create(
        model="llama3-70b-8192",
        messages=[{"role": "system", "content": "You are a financial analyst assistant."},
                  {"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        temperature=temperature
    )
    # The Groq SDK returns a list of choices - take the first
    return response.choices[0].message.content

def generate_and_evaluate_report(ecc, raw_text, client):
    # Strip Markdown bold markers, then normalize bullet characters to "-".
    cleaned = re.sub(r"\*\*", "", raw_text)
    cleaned = re.sub(r"[*•·]+", "-", cleaned)

    gen_prompt = f"""Based on the following quarterly earnings summary, write a persuasive investment guidance report.
State whether the investor should go Long or Short for the next day, week, and month.

Company Summary:
{cleaned}

Report:"""

    gen_out = call_groq_client(client, gen_prompt)
    gen_out = gen_out.split("Report:")[-1].strip()

    eval_prompt = f"""Evaluate whether the generated report below is supported by the company summary.

Company Summary:
{cleaned}

Generated Report:
{gen_out}

Evaluation:"""

    eval_out = call_groq_client(client, eval_prompt)
    eval_out = eval_out.split("Evaluation:")[-1].strip()

    return f"{gen_out}\n\nEvaluation:\n{eval_out}"

if __name__ == "__main__":
    output = []
    if os.path.isfile(file_path):
        print(f"Processing file: {file_path}")
        content = load_transcript(file_path)
        if content:
            blocks = re.split(r"===\s+(.*?)\s+===", content)
            parsed = {}
            for i in range(1, len(blocks), 2):
                ecc = blocks[i].strip()
                data = blocks[i+1].strip()
                parsed[ecc] = data

            for idx, (ecc, transcript) in enumerate(parsed.items()):
                client = client1 if idx % 2 == 0 else client2
                try:
                    final_report = generate_and_evaluate_report(ecc, transcript, client)
                    output.append({
                        "ECC": ecc,
                        "Report": final_report
                    })
                    print(f"Processed {ecc}")
                except Exception as e:
                    print(f"Error generating report for {ecc}: {e}")
        else:
            print("No transcript content found")
    else:
        print(f"File not found: {file_path}")

    with open("final_output.json", "w", encoding="utf-8") as f_out:
        json.dump(output, f_out, indent=2)

    print("Processing complete, output saved to final_output.json")

Processing file: /content/drive/MyDrive/Earnings2Insights/Summaries_source.txt
Processed ABM_q3_2021
Processed AME_q1_2021
Processed CFR_q3_2019
Processed CPF_q4_2019
Processed DNB_q2_2021
Processed DOV_q2_2020
Processed DX_q1_2021
Processed FAF_q4_2020
Processed FIS_q4_2020
Processed FN_q2_2021
Processed FSS_q2_2021
Processed GCO_q1_2022
Processed GD_q1_2021
Processed GLW_q1_2021
Processed GNW_q2_2021
Processed HR_q2_2021
Processed HTH_q4_2020
Processed JBL_q4_2021
Processed KMT_q1_2022
Processed KW_q3_2021
Processed LH_q3_2021
Processed LNN_q3_2021
Processed LYB_q3_2021
Processed MDT_q2_2022
Processed MKC_q2_2021
Processed MSI_q3_2021
Processed MYE_q2_2020
Processed NEE_q3_2021
Processed NPO_q2_2021
Processed OHI_q4_2021
Processed RPM_q3_2022
Processed SF_q3_2020
Processed SWN_q2_2021
Processed SYY_q2_2021
Processed TK_q1_2021
Processed TT_q1_2021
Processed UVE_q4_2020
Processed VMI_q1_2021
Processed VSH_q2_2021
Processed WWW_q3_2021
Processing complete, output saved to final_output.

In [None]:
output

[{'ECC': 'ABM_q3_2021',
  'Report': "ABM Industries Incorporated (Q3 2021)**\n\n**Recommendation:**\n\nBased on the strong quarterly earnings summary, we recommend a **Long** position for ABM Industries Incorporated for the next day, week, and month.\n\n**Rationale:**\n\n1. **Impressive Revenue Growth**: ABM Industries reported a 10.7% year-over-year revenue growth, indicating a strong demand for its services. This upward trend is likely to continue, driven by the company's diversified portfolio and solid market position.\n2. **Adjusted EPS Beats Expectations**: The company's adjusted EPS of $0.90 represents a 20% year-over-year increase, exceeding expectations. This demonstrates ABM's ability to effectively manage costs and drive profitability.\n3. **Upward Guidance Revision**: The full-year fiscal 2021 adjusted EPS guidance has been revised upward to $3.45-$3.55, indicating confidence in the company's future performance.\n4. **Consistent Dividend Payments**: ABM Industries has mainta

In [None]:
with open("/content/final_result.txt", "w", encoding="utf-8") as f:
    for entry in output:
        f.write(f"=== {entry['ECC']} ===\n{entry['Report']}\n\n")

In [None]:
with open("/content/final_result_forMe.txt", "w", encoding="utf-8") as f:
    for entry in output:
        f.write(f"=== {entry['ECC']} ===\n{entry['Report']}\n\n")