# Stock Analysis Report Generator Using Perplexity

<small>

#### **Overview**
This notebook automatically generates:

- Individual stock analysis reports with key financial and growth highlights, followed by the BuccoCapital 13-point analysis framework
- ETF and mutual fund analysis with key financial and growth highlights, composition, fees, liquidity, etc., in the style of Brian Belsky
- Analyst ratings and price targets (when available)

This notebook uses Perplexity AI's Sonar Pro model via API. Each report is saved as a formatted Microsoft Word document.

#### **Features**
- Batch Processing: Processes multiple stocks, ETFs, and mutual funds from a simple text file of tickers.
- Smart Skipping: Avoids duplicate API calls by checking for existing reports.
- Markdown Formatting: Converts AI-generated markdown to professional Word formatting.
- Comprehensive Analysis: Includes price performance charts, financial metrics, and analyst ratings.
- Date Stamping: Automatically timestamps each report.

#### **Requirements**
- Perplexity API key
- Input file: Text file with stock tickers (one per line)
- Python 3.7+
- An app such as VS Code to run the notebook (Run All)
- A local folder structure for inputs (such as the equity list) and outputs (such as individual equity reports). See the getting started .txt file for the required structure and files.
- An environment file (.env, created from a text file) with your API key and the folder locations of inputs (such as the equity list) and outputs (for individual reports). See the getting started .txt file for how to structure and create your .env file.

#### **Output**
- Analysis Word document for each equity. Each report includes, among other elements, the stock ticker and generation date and the 5-year price performance vs. NASDAQ.




## Import Libraries

In [None]:
# Install dependencies; comment out the pip install lines below if the packages are already installed
%pip install -q -r requirements.txt
%pip install python-docx

# Import necessary packages
import sys
import requests
from docx import Document
from docx.shared import Pt, RGBColor
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
import os
from dotenv import load_dotenv
from openai import OpenAI
import re
import os.path
from os.path import isfile, join
from os import listdir
import subprocess
import json
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Any
from IPython.display import Markdown, display, SVG
import numpy as np
from time import time
np.random.seed(10)
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objs as go
import argparse
import random

print("✅ Libraries imported successfully!")

## Configuration Settings

In [None]:
### Variable informational detail
# **INPUT_DIR**: Directory containing the equity list file
# **OUTPUT_DIR**: Directory where generated reports will be saved
# **EQUITY_LIST_FILE**: Text file with stock tickers (one per line)
# **MODEL**: `sonar-pro` - Perplexity's most advanced model with real-time web access and current financial data
# **TEMPERATURE**: `0` - Deterministic output for consistent, factual analysis
# **MAX_TOKENS**: `2000` - Maximum length of generated response
# **Model Settings**: Specific model to be invoked, 0 temperature to eliminate creativity, and a cap on the number of tokens in the response.

# Load environment variables

load_dotenv()

# Get API key
API_KEY = os.getenv("PERPLEXITY_API_KEY")

# Input/Output Configuration
INPUT_DIR = os.getenv("Input_dir")
OUTPUT_DIR_SEC_FILINGS = os.getenv("Output_dir_sec_filings")
OUTPUT_DIR_INDIVIDUAL_STOCK_ANALYSIS = os.getenv("Output_dir_individual_equities")
OUTPUT_DIR_PORTFOLIO_ANALYSIS = os.getenv("Output_dir_portfolio")

# Prompts and Input Lists
EQUITY_LIST_FILE = os.getenv("EQUITY_LIST_FILE")
PROMPT_DIR = os.getenv("Prompt_dir")
PROMPT_INDIVIDUAL_EQUITY_ANALYSIS = os.getenv("PROMPT_INDIVIDUAL_EQUITY_FILE")
PROMPT_PORTFOLIO_ANALYSIS = os.getenv("PROMPT_PORTFOLIO_FILE")
PROMPT_RATINGS_CHANGE = os.getenv("PROMPT_RATINGS_CHANGE_FILE")
PROMPT_ETF_ANALYSIS_FILE = os.getenv("PROMPT_ETF_ANALYSIS_FILE") 

# Model Configuration
MODEL = "sonar-pro" 
TEMPERATURE = 0
MAX_TOKENS = 2000

# User-Agent header value sent with every SEC EDGAR request
SEC_HEADER = os.getenv("User_Agent")
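
Because every setting above comes from `os.getenv`, a missing or misspelled entry in the `.env` file silently becomes `None` and only fails later (for example, when a file is opened). A minimal sketch of an early check follows. The helper name `find_missing_env_vars` and the choice of which variables count as required are assumptions for illustration, not part of the notebook.

```python
import os

# Variable names copied from the configuration cell; which ones are truly
# required is an assumption, so trim or extend this list to match your .env file.
REQUIRED_ENV_VARS = [
    "PERPLEXITY_API_KEY",
    "Input_dir",
    "Output_dir_sec_filings",
    "Output_dir_individual_equities",
    "EQUITY_LIST_FILE",
    "User_Agent",
]


def find_missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Hypothetical usage with an explicit mapping instead of the real environment
example_env = {"PERPLEXITY_API_KEY": "pplx-example", "Input_dir": ""}
missing = find_missing_env_vars(REQUIRED_ENV_VARS, example_env)
print("Missing settings:", missing)
```

In the notebook itself, calling `find_missing_env_vars(REQUIRED_ENV_VARS)` right after `load_dotenv()` and raising an error when the list is non-empty would surface configuration problems before any API call is made.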

## Read in Text Inputs

In [None]:

# Read stock list from file
with open(EQUITY_LIST_FILE, 'r') as f:
    EQUITY_LIST = [line.strip() for line in f if line.strip()]
# Print the full equity list for verification
print(EQUITY_LIST)
  
# Read individual equity prompt template from file
with open(PROMPT_INDIVIDUAL_EQUITY_ANALYSIS, 'r') as f:
    PROMPT_TEMPLATE = f.read().strip()
#print(PROMPT_TEMPLATE)

# Read portfolio prompt template from file
with open(PROMPT_PORTFOLIO_ANALYSIS, 'r') as f:
    PROMPT_PORTFOLIO_TEMPLATE = f.read().strip()
print(" ")
#print(PROMPT_PORTFOLIO_TEMPLATE)

# Read ratings change prompt template from file
with open(PROMPT_RATINGS_CHANGE, 'r') as f:
    PROMPT_RATINGS_CHANGE_TEMPLATE = f.read().strip()
print(" ")
#print(PROMPT_RATINGS_CHANGE_TEMPLATE)

# Read ETF prompt template from file
with open(PROMPT_ETF_ANALYSIS_FILE, 'r') as f:
    PROMPT_ETF_ANALYSIS_FILE_TEMPLATE = f.read().strip()
print(" ")
#print(PROMPT_ETF_ANALYSIS_FILE_TEMPLATE)

## Define Class to Call Model API

In [None]:
from openai import OpenAI

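class PerplexityClient:
    """Thin wrapper around Perplexity's OpenAI-compatible chat API."""

    def __init__(self, api_key):
        # The OpenAI client is reused by pointing it at the Perplexity endpoint
        self.client = OpenAI(
            api_key=api_key,
            base_url="https://api.perplexity.ai"
        )

    def chat(self, message, model=MODEL):
        """Send a single user message and return the text of the first reply."""
        response = self.client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": message}]
        )
        return response.choices[0].message.content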
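
The `chat` method above passes only `model` and `messages`, so the `TEMPERATURE` and `MAX_TOKENS` values from the configuration cell are not applied by this class as written. The sketch below shows one way those settings could be forwarded, assuming that is the intent. `build_chat_kwargs` is a hypothetical helper; `temperature` and `max_tokens` are standard parameters of the OpenAI client's `chat.completions.create`.

```python
def build_chat_kwargs(message, model="sonar-pro", temperature=0, max_tokens=2000):
    """Assemble keyword arguments for an OpenAI-compatible chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": message}],
        "temperature": temperature,  # 0 keeps the output as deterministic as possible
        "max_tokens": max_tokens,    # upper bound on the length of the reply
    }


kwargs = build_chat_kwargs("Summarize AAPL in one sentence.")
# With a real client this would be: client.chat.completions.create(**kwargs)
print(kwargs["model"], kwargs["temperature"], kwargs["max_tokens"])
```

Keeping the argument assembly separate from the network call makes it easy to inspect what is sent without spending API credits.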

## Separate Individual Equities, ETFs, and Other from List of Tickers

In [None]:
def classify_tickers(equity_list, client, model="sonar-pro"):
    """
    Classify tickers as Individual Equity, ETF, Mutual Fund, or Other using a single Perplexity API call.
    Returns four lists: individual_equities, etfs, mutual_funds, other.
    """
    # Create comma-separated list of tickers
    tickers_str = ", ".join(equity_list)
    
    prompt = f"""Classify each of the following tickers as one of: Individual Equity, ETF, Mutual Fund, or Other.

Tickers: {tickers_str}

Return the results in this exact format, one per line:
TICKER: CLASSIFICATION

Example format:
AAPL: Individual Equity
SPY: ETF
VWICX: Mutual Fund

Classify each ticker now:"""

    try:
        response = client.chat(prompt, model=model)
        print("API Response received. Parsing classifications...")
        print("-" * 40)
        
        # Parse the response
        individual_equities = []
        etfs = []
        mutual_funds = []
        other = []
        
        for line in response.strip().split('\n'):
            line = line.strip()
            if not line or ':' not in line:
                continue
            
            # Parse "TICKER: CLASSIFICATION" format
            parts = line.split(':', 1)
            if len(parts) != 2:
                continue
                
            ticker = parts[0].strip().upper()
            classification = parts[1].strip().lower()
            
            # Only process tickers that are in our original list
            if ticker not in [t.upper() for t in equity_list]:
                continue
            
            # Find the original case ticker
            original_ticker = next((t for t in equity_list if t.upper() == ticker), ticker)
            
            print(f"{original_ticker}: {classification}")
            
            if "etf" in classification:
                etfs.append(original_ticker)
            elif "mutual fund" in classification:
                mutual_funds.append(original_ticker)
            elif "individual" in classification or "equity" in classification:
                individual_equities.append(original_ticker)
            else:
                other.append(original_ticker)
        
        # Check for any tickers that weren't classified
        classified = set(t.upper() for t in individual_equities + etfs + mutual_funds + other)
        for ticker in equity_list:
            if ticker.upper() not in classified:
                print(f"{ticker}: not classified (adding to other)")
                other.append(ticker)
        
        print("-" * 40)
        return individual_equities, etfs, mutual_funds, other
        
    except Exception as e:
        print(f"Error classifying tickers: {e}")
        # Return all as other if API call fails
        return [], [], [], equity_list
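
The parser in `classify_tickers` expects plain `TICKER: CLASSIFICATION` lines. Language models sometimes wrap such lines in markdown (bullets, numbering, bold markers), in which case the ticker would not match the input list and would fall through to `other`. The following is a minimal sketch of a more tolerant line parser; `parse_classification_line` is a hypothetical helper and the sample response is invented for illustration.

```python
import re


def parse_classification_line(line):
    """Return (TICKER, classification) for one response line, or None if it does not parse."""
    # Drop a leading bullet ("- ", "* ", "• ") or list number ("1. ", "2) ")
    cleaned = re.sub(r"^\s*(?:[-*•]\s+|\d+[.)]\s+)", "", line)
    # Remove markdown emphasis and code markers
    cleaned = cleaned.replace("*", "").replace("`", "").strip()
    if ":" not in cleaned:
        return None
    ticker, label = (part.strip() for part in cleaned.split(":", 1))
    # Reject intro sentences such as "Here are the classifications:"
    if not ticker or not label or " " in ticker:
        return None
    return ticker.upper(), label.lower()


# Invented sample response mixing several formatting styles
sample_response = """Here are the classifications:
- **AAPL**: Individual Equity
**SPY:** ETF
1. VWICX: Mutual Fund
"""

parsed = []
for raw_line in sample_response.splitlines():
    result = parse_classification_line(raw_line)
    if result is not None:
        parsed.append(result)
print(parsed)
```

The returned pairs can then be routed into the four lists with the same `"etf"` / `"mutual fund"` / `"equity"` substring checks used above.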

In [None]:
# Classify tickers and print results
if 'individual_equities' not in globals() or not individual_equities:
    client = PerplexityClient(api_key=API_KEY)
    individual_equities, etfs, mutual_funds, other = classify_tickers(EQUITY_LIST, client)
print("Individual Equities:", individual_equities)
print("ETFs:", etfs)
print("Mutual Funds:", mutual_funds)
print("Other:", other)


## Define Functions to Retrieve SEC CIK Codes and Filings

In [None]:
# Function to Fetch SEC CIK code for a given ticker symbol
def get_cik(ticker):
    """
    Retrieve the CIK (Central Index Key) for a given stock ticker from the SEC database.
    Returns the CIK as a zero-padded string if found, otherwise None.
    """
    url = "https://www.sec.gov/files/company_tickers_exchange.json"
    headers = {"User-Agent": SEC_HEADER}
    print("Fetching CIK for ticker:", ticker)
    
    resp = requests.get(url, headers=headers)
    if resp.status_code != 200:
        print(f"Failed to fetch CIK data. Status code: {resp.status_code}")
        return None
    try:
        data = resp.json()
    except Exception as e:
        print(f"Error decoding JSON: {e}")
        print("Response text:", resp.text[:200])
        return None
    # SEC JSON structure: {'fields': [...], 'data': [[...], ...]}
    if isinstance(data, dict) and "fields" in data and "data" in data:
        fields = data["fields"]
        data_list = data["data"]
        # Find the index for 'ticker' and for the CIK field (usually 'cik' or 'cik_str')
        ticker_idx = next((i for i, f in enumerate(fields) if f.lower() == "ticker"), None)
        cik_idx = next((i for i, f in enumerate(fields) if "cik" in f.lower()), None)
        if ticker_idx is None or cik_idx is None:
            print("Could not find required fields in SEC data.")
            return None
        for entry in data_list:
            if entry[ticker_idx].upper() == ticker.upper():
                return str(entry[cik_idx]).zfill(10)
        print(f"Ticker {ticker} not found in SEC database.")
        return None
    print("Unexpected SEC JSON structure.")
    return None
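
`get_cik` scans the `{'fields': [...], 'data': [[...], ...]}` payload once per ticker. As a sketch of the same lookup done in a single pass, the helper below turns such a payload into a ticker-to-CIK dictionary. The name `build_cik_map`, the exact field names, and the sample rows are assumptions for illustration; the real SEC file may differ.

```python
def build_cik_map(payload):
    """Map upper-cased tickers to zero-padded 10-digit CIK strings."""
    fields = [str(name).lower() for name in payload.get("fields", [])]
    # Assumes the columns are literally named "ticker" and "cik"
    if "ticker" not in fields or "cik" not in fields:
        return {}
    ticker_idx = fields.index("ticker")
    cik_idx = fields.index("cik")
    return {
        str(row[ticker_idx]).upper(): str(row[cik_idx]).zfill(10)
        for row in payload.get("data", [])
    }


# Invented rows that mimic the fields/data layout described above
sample_payload = {
    "fields": ["cik", "name", "ticker", "exchange"],
    "data": [
        [1234567, "Example Corp", "EXMP", "NYSE"],
        [42, "Sample Inc", "smpl", "Nasdaq"],
    ],
}

cik_map = build_cik_map(sample_payload)
print(cik_map.get("EXMP"), cik_map.get("SMPL"), cik_map.get("MISSING"))
```

Building the map once and reusing it for every ticker avoids downloading and rescanning the same JSON file on each lookup.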


In [None]:

# Function to fetch recent SEC filings for a given CIK
# (a tuple default avoids the shared mutable default argument pitfall)
def get_sec_filings(cik, forms=("10-K", "10-Q", "8-K")):
    """
    Retrieve the most recent 10-K, 10-Q, and 8-K filings for a given CIK.
    Returns a list of up to 3 filings (one per form type).
    """
    base_url = f"https://data.sec.gov/submissions/CIK{str(cik).zfill(10)}.json"
    headers = {"User-Agent": SEC_HEADER}
    try:
        resp = requests.get(base_url, headers=headers)
        if resp.status_code != 200:
            print(f"Failed to fetch filings for CIK {cik}. Status code: {resp.status_code}")
            return []
        data = resp.json()
    except Exception as e:
        print(f"Error fetching filings for CIK {cik}: {e}")
        return []
    filings_dict = {}
    recent = data.get("filings", {}).get("recent", {})
    forms_list = recent.get("form", [])
    accession_list = recent.get("accessionNumber", [])
    filing_dates = recent.get("filingDate", [])
    primary_docs = recent.get("primaryDocument", [])
    for i, form in enumerate(forms_list):
        if form in forms and form not in filings_dict:
            filing = {
                "form": form,
                "date": filing_dates[i] if i < len(filing_dates) else None,
                "accession": accession_list[i] if i < len(accession_list) else None,
                "url": f"https://www.sec.gov/Archives/edgar/data/{int(cik)}/{accession_list[i].replace('-', '')}/{primary_docs[i]}" if i < len(accession_list) and i < len(primary_docs) else None,
            }
            filings_dict[form] = filing
            if len(filings_dict) == len(forms):
                break
    return [filings_dict[form] for form in forms if form in filings_dict]



def download_sec_filings(ticker, filings, output_dir):
    """
    Download SEC filings and save them to the specified output directory.
    Each file is named as: {ticker}_{form}_{date}.html
    """
    os.makedirs(output_dir, exist_ok=True)
    for filing in filings:
        url = filing.get("url")
        form = filing.get("form")
        date = filing.get("date")
        if not url or not form or not date:
            print(f"Skipping incomplete filing for {ticker}: {filing}")
            continue
        filename = f"{ticker}_{form}_{date}.html"
        filepath = os.path.join(output_dir, filename)
        try:
            resp = requests.get(url, headers={"User-Agent": SEC_HEADER})
            if resp.status_code == 200:
                with open(filepath, "wb") as f:
                    f.write(resp.content)
                print(f"Saved: {filepath}")
            else:
                print(f"Failed to download {url} (status {resp.status_code})")
        except Exception as e:
            print(f"Error downloading {url}: {e}")
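
`get_sec_filings` reads the `recent` section of the submissions JSON as parallel lists (`form`, `filingDate`, `accessionNumber`, `primaryDocument`) and assembles an archive URL from the CIK, the accession number without dashes, and the primary document name. The sketch below isolates those two steps. The helper names and the sample values are hypothetical.

```python
def build_filing_url(cik, accession_number, primary_document):
    """Build the EDGAR archive URL the same way the functions above do."""
    return (
        "https://www.sec.gov/Archives/edgar/data/"
        f"{int(cik)}/{accession_number.replace('-', '')}/{primary_document}"
    )


def rows_from_recent(recent):
    """Convert the parallel lists in 'recent' into one dict per filing."""
    keys = ("form", "filingDate", "accessionNumber", "primaryDocument")
    columns = [recent.get(key, []) for key in keys]
    # zip stops at the shortest column, so ragged input cannot raise IndexError
    return [dict(zip(keys, values)) for values in zip(*columns)]


# Invented sample mirroring the layout of data["filings"]["recent"]
sample_recent = {
    "form": ["8-K", "10-Q"],
    "filingDate": ["2024-05-01", "2024-04-15"],
    "accessionNumber": ["0001234567-24-000010", "0001234567-24-000008"],
    "primaryDocument": ["ex8k.htm", "ex10q.htm"],
}

rows = rows_from_recent(sample_recent)
first_url = build_filing_url("0001234567", rows[0]["accessionNumber"], rows[0]["primaryDocument"])
print(first_url)
```

Working with one dictionary per filing removes the repeated `i < len(...)` bounds checks that the index-based version needs.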

## Save Time by Using Previously Downloaded CIK Codes

In [None]:
# ============================================================
# Summarize and Save CIK Lookup File for Future Use
# ============================================================

# Load existing CIK lookup file if it exists
cik_lookup_file = os.path.join(OUTPUT_DIR_SEC_FILINGS, "cik_lookup.json")
existing_cik_lookup = {"individual_equities": {}, "etfs": {}, "mutual_funds": {}}

if os.path.exists(cik_lookup_file):
    with open(cik_lookup_file, 'r') as f:
        existing_cik_lookup = json.load(f)
    print(f"✅ Loaded existing CIK lookup file: {cik_lookup_file}")
    print(f"   - Individual Equities: {len(existing_cik_lookup.get('individual_equities', {}))} CIKs")
    print(f"   - ETFs: {len(existing_cik_lookup.get('etfs', {}))} CIKs")
    print(f"   - Mutual Funds: {len(existing_cik_lookup.get('mutual_funds', {}))} CIKs")
else:
    print("ℹ️ No existing CIK lookup file found. Will fetch CIKs from SEC.")

# Define variables for each type of investment
equity_ciks = dict(existing_cik_lookup.get('individual_equities', {}))
mutual_fund_ciks = dict(existing_cik_lookup.get('mutual_funds', {}))
etf_ciks = dict(existing_cik_lookup.get('etfs', {}))

print(f"Individual Equities: {len(equity_ciks)} CIKs (filings: 10-K, 10-Q, 8-K)", flush=True)
print(f"ETFs: {len(etf_ciks)} CIKs (filings: N-CSR, N-CSRS, N-PORT)", flush=True)
print(f"Mutual Funds: {len(mutual_fund_ciks)} CIKs (filings: N-CSR, N-CSRS, N-PORT)", flush=True)
print(f"Filings saved to: {OUTPUT_DIR_SEC_FILINGS}", flush=True)

# Create combined CIK lookup dictionary
cik_lookup = {
    "individual_equities": equity_ciks,
    "etfs": etf_ciks,
    "mutual_funds": mutual_fund_ciks,
    "metadata": {
        "generated_date": datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
        "total_equities": len(equity_ciks),
        "total_etfs": len(etf_ciks),
        "total_mutual_funds": len(mutual_fund_ciks)
    }
}

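# Save CIK lookup to JSON file (create the output folder first if it does not exist)
os.makedirs(OUTPUT_DIR_SEC_FILINGS, exist_ok=True)
cik_lookup_file = os.path.join(OUTPUT_DIR_SEC_FILINGS, "cik_lookup.json")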
with open(cik_lookup_file, 'w') as f:
    json.dump(cik_lookup, f, indent=2)
print(f"\n✅ CIK lookup file saved to: {cik_lookup_file}", flush=True)

# Also create a flat CSV for easy reference
cik_flat_list = []
for ticker, cik in equity_ciks.items():
    cik_flat_list.append({"ticker": ticker, "cik": cik, "type": "Individual Equity"})
for ticker, cik in etf_ciks.items():
    cik_flat_list.append({"ticker": ticker, "cik": cik, "type": "ETF"})
for ticker, cik in mutual_fund_ciks.items():
    cik_flat_list.append({"ticker": ticker, "cik": cik, "type": "Mutual Fund"})

cik_df = pd.DataFrame(cik_flat_list)
cik_csv_file = os.path.join(OUTPUT_DIR_SEC_FILINGS, "cik_lookup.csv")
cik_df.to_csv(cik_csv_file, index=False)
# print(cik_df.head(20))
print(f"✅ CIK lookup CSV saved to: {cik_csv_file}", flush=True)

print("\n✅ SEC processing complete!", flush=True)

In [None]:
# Download SEC filings for individual equities, ETFs, and mutual funds
import sys
from time import sleep

# SEC EDGAR Rate Limiting: Max 10 requests per second
# We use a 0.15-second delay (~6.7 requests/sec) to stay safely under the limit
SEC_REQUEST_DELAY = 0.15  # seconds between requests
SEC_MAX_RETRIES = 3       # number of retry attempts on failure

def sec_request_with_retry(url, headers, max_retries=SEC_MAX_RETRIES, delay=SEC_REQUEST_DELAY):
    """
    Make a request to SEC EDGAR with rate limiting and retry logic.
    Handles ConnectionResetError and other transient failures.
    """
    for attempt in range(max_retries):
        try:
            sleep(delay)  # Rate limiting delay
            resp = requests.get(url, headers=headers, timeout=30)
            return resp
        except (requests.exceptions.ConnectionError, 
                requests.exceptions.Timeout,
                ConnectionResetError) as e:
            wait_time = (attempt + 1) * 2  # Linear backoff: 2s, then 4s (no wait after the final attempt)
            print(f"    ⚠️ Connection error (attempt {attempt + 1}/{max_retries}): {type(e).__name__}")
            if attempt < max_retries - 1:
                print(f"    Waiting {wait_time} seconds before retry...", flush=True)
                sleep(wait_time)
            else:
                print(f"    ❌ Failed after {max_retries} attempts", flush=True)
                return None
    return None

# Helper function to get CIK for stocks/ETFs (exchange-listed securities)
def fetch_cik(ticker, headers):
    """Fetch CIK for stocks and ETFs from company_tickers_exchange.json"""
    url = "https://www.sec.gov/files/company_tickers_exchange.json"
    resp = sec_request_with_retry(url, headers)
    if resp and resp.status_code == 200:
        data = resp.json()
        fields = data.get("fields", [])
        data_list = data.get("data", [])
        ticker_idx = next((i for i, f in enumerate(fields) if f.lower() == "ticker"), None)
        cik_idx = next((i for i, f in enumerate(fields) if "cik" in f.lower()), None)
        if ticker_idx is not None and cik_idx is not None:
            for entry in data_list:
                if entry[ticker_idx].upper() == ticker.upper():
                    return str(entry[cik_idx]).zfill(10)
    return None

# Helper function to get CIK for mutual funds (different SEC file)
def fetch_mutual_fund_cik(ticker, headers):
    """Fetch CIK for mutual funds from company_tickers_mf.json"""
    url = "https://www.sec.gov/files/company_tickers_mf.json"
    resp = sec_request_with_retry(url, headers)
    if resp and resp.status_code == 200:
        data = resp.json()
        # Mutual fund JSON structure: {"fields": [...], "data": [[...], ...]}
        # fields = ['cik', 'seriesId', 'classId', 'symbol']
        fields = data.get("fields", [])
        data_list = data.get("data", [])
        
        # Find indices for symbol and cik
        symbol_idx = fields.index("symbol") if "symbol" in fields else None
        cik_idx = fields.index("cik") if "cik" in fields else None
        
        if symbol_idx is not None and cik_idx is not None:
            for row in data_list:
                if row[symbol_idx].upper() == ticker.upper():
                    return str(row[cik_idx]).zfill(10)
    return None

headers = {"User-Agent": SEC_HEADER}
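
The wait in `sec_request_with_retry` grows by a fixed 2 seconds per attempt, which is a linear schedule. The sketch below contrasts linear and exponential schedules in a generic retry helper. `call_with_backoff` is a hypothetical function, the built-in `ConnectionError` and `TimeoutError` stand in for the `requests` exception classes, and the injected `sleep` lets the example run without actually waiting.

```python
import time


def call_with_backoff(func, max_retries=3, base_wait=2.0, exponential=False, sleep=time.sleep):
    """Call func() until it succeeds or retries run out; return None on final failure."""
    for attempt in range(max_retries):
        try:
            return func()
        except (ConnectionError, TimeoutError):
            if attempt == max_retries - 1:
                return None
            if exponential:
                wait = base_wait * (2 ** attempt)   # 2, 4, 8, ...
            else:
                wait = base_wait * (attempt + 1)    # 2, 4, 6, ...
            sleep(wait)
    return None


# Simulated flaky call: fails twice, then succeeds
waits = []
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated reset")
    return "ok"


result = call_with_backoff(flaky, sleep=waits.append)
print(result, waits)
```

Note that neither this sketch nor the function above retries on HTTP error statuses such as 429; both only react to connection-level exceptions.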

In [None]:
# ============================================================
# 1. Process INDIVIDUAL EQUITIES - Get CIK and download 10-K, 10-Q, 8-K
# ============================================================
print("=" * 60, flush=True)
print("INDIVIDUAL EQUITIES - Fetching CIK and SEC filings (10-K, 10-Q, 8-K)", flush=True)
print("=" * 60, flush=True)

# Start with existing CIKs from lookup file
equity_ciks = dict(existing_cik_lookup.get('individual_equities', {}))
fetched_count = 0
cached_count = 0

for ticker in individual_equities:
    print(f"--- Processing {ticker} ---", flush=True)
    
    # Check if CIK already exists in lookup
    if ticker in equity_ciks:
        cik = equity_ciks[ticker]
        print(f"  CIK: {cik} (cached)", flush=True)
        cached_count += 1
    else:
        # Fetch CIK from SEC
        cik = fetch_cik(ticker, headers)
        if cik:
            equity_ciks[ticker] = cik
            print(f"  CIK: {cik} (fetched)", flush=True)
            fetched_count += 1
        else:
            print(f"  Could not find CIK", flush=True)
            continue
    
    # Fetch filings for individual equities
    base_url = f"https://data.sec.gov/submissions/CIK{cik}.json"
    resp = sec_request_with_retry(base_url, headers)
    filings = []
    if resp and resp.status_code == 200:
        data = resp.json()
        recent = data.get("filings", {}).get("recent", {})
        forms_list = recent.get("form", [])
        accession_list = recent.get("accessionNumber", [])
        filing_dates = recent.get("filingDate", [])
        primary_docs = recent.get("primaryDocument", [])
        seen_forms = set()
        for i, form in enumerate(forms_list):
            if form in ["10-K", "10-Q", "8-K"] and form not in seen_forms:
                filings.append({
                    "form": form,
                    "date": filing_dates[i],
                    "url": f"https://www.sec.gov/Archives/edgar/data/{int(cik)}/{accession_list[i].replace('-', '')}/{primary_docs[i]}"
                })
                seen_forms.add(form)
                if len(seen_forms) == 3:
                    break
    
    print(f"  Found {len(filings)} filings", flush=True)
    
    # Download filings
    os.makedirs(OUTPUT_DIR_SEC_FILINGS, exist_ok=True)
    for filing in filings:
        filename = f"{ticker}_{filing['form']}_{filing['date']}.html"
        filepath = os.path.join(OUTPUT_DIR_SEC_FILINGS, filename)
        resp = sec_request_with_retry(filing['url'], headers)
        if resp and resp.status_code == 200:
            with open(filepath, "wb") as f:
                f.write(resp.content)
            print(f"    Saved: {filename}", flush=True)
        else:
            print(f"    ⚠️ Failed to download: {filename}", flush=True)

print(f"\n✅ Individual equities complete: {len(equity_ciks)} CIKs ({cached_count} cached, {fetched_count} fetched)\n", flush=True)

In [None]:
# ============================================================
# 2. Process ETFs - Get CIK and download Fund SEC Filings
# Note: Some ETFs (especially open-end fund ETFs) are in the mutual fund file
# ============================================================
print("=" * 60, flush=True)
print("ETFs - Fetching CIK and SEC filings", flush=True)
print("=" * 60, flush=True)

# Define all fund-related SEC forms to fetch:
# - Form N-1A: Statutory Prospectus and SAI (initial registration)
# - 485BPOS/485APOS: Post-effective amendments to Form N-1A
# - N-CSR: Annual shareholder report (certified)
# - N-CSRS: Semi-annual shareholder report
# - N-PORT/N-PORT-P: Portfolio holdings report
# - N-CEN: Annual report for registered investment companies
FUND_FORMS = {
    # Prospectus & Registration Forms
    "N-1A": "N-1A_Prospectus_SAI",             # Initial Form N-1A (statutory prospectus + SAI)
    "N-1A/A": "N-1A_Amendment",                # Amendment to Form N-1A
    "485BPOS": "N-1A_Post_Effective_Amend",    # Post-effective amendment filed under Rule 485(b)
    "485APOS": "N-1A_Post_Effective_Amend",    # Post-effective amendment filed under Rule 485(a)
    # Shareholder Reports
    "N-CSR": "Annual_Shareholder_Report",      # Annual shareholder report
    "N-CSRS": "Semiannual_Shareholder_Report", # Semi-annual shareholder report
    # Portfolio & Annual Reports
    "N-PORT": "Portfolio_Holdings",            # Monthly portfolio holdings
    "N-PORT-P": "Portfolio_Holdings",          # Portfolio holdings (public version)
    "NPORT-P": "Portfolio_Holdings",           # Same report; EDGAR typically labels this form type without the first hyphen
    "N-CEN": "Annual_Report_N-CEN",            # Annual report for investment companies
}

# Start with existing CIKs from lookup file
etf_ciks = dict(existing_cik_lookup.get('etfs', {}))
fetched_count = 0
cached_count = 0

for ticker in etfs:
    print(f"--- Processing {ticker} ---", flush=True)
    
    # Check if CIK already exists in lookup
    if ticker in etf_ciks:
        cik = etf_ciks[ticker]
        print(f"  CIK: {cik} (cached)", flush=True)
        cached_count += 1
    else:
        # First try exchange file (standard ETFs)
        cik = fetch_cik(ticker, headers)
        if not cik:
            # Fallback to mutual fund file (open-end fund ETFs like VNM)
            cik = fetch_mutual_fund_cik(ticker, headers)
        if cik:
            etf_ciks[ticker] = cik
            print(f"  CIK: {cik} (fetched)", flush=True)
            fetched_count += 1
        else:
            print(f"  Could not find CIK in exchange or mutual fund files", flush=True)
            continue
    
    # Fetch filings for ETFs - get latest of each form type
    base_url = f"https://data.sec.gov/submissions/CIK{cik}.json"
    resp = sec_request_with_retry(base_url, headers)
    filings = []
    if resp and resp.status_code == 200:
        data = resp.json()
        recent = data.get("filings", {}).get("recent", {})
        forms_list = recent.get("form", [])
        accession_list = recent.get("accessionNumber", [])
        filing_dates = recent.get("filingDate", [])
        primary_docs = recent.get("primaryDocument", [])
        
        # Track which form types we've found (get latest of each)
        found_forms = set()
        
        for i, form in enumerate(forms_list):
            # Check if this form is in our target list and we haven't found it yet
            if form in FUND_FORMS and form not in found_forms:
                doc_type = FUND_FORMS[form]
                filings.append({
                    "form": form,
                    "type": doc_type,
                    "date": filing_dates[i],
                    "url": f"https://www.sec.gov/Archives/edgar/data/{int(cik)}/{accession_list[i].replace('-', '')}/{primary_docs[i]}"
                })
                found_forms.add(form)
                print(f"  Found {form}: {doc_type} ({filing_dates[i]})", flush=True)
            
            # Stop once we have all form types
            if len(found_forms) == len(FUND_FORMS):
                break
    
    print(f"  Total filings to download: {len(filings)}", flush=True)
    
    # Download filings
    if filings:
        os.makedirs(OUTPUT_DIR_SEC_FILINGS, exist_ok=True)
        for filing in filings:
            # Replace slashes in form names for valid filenames
            form_name = filing['form'].replace('/', '-')
            doc_type = filing['type'].replace(' ', '_')
            filename = f"{ticker}_{doc_type}_{form_name}_{filing['date']}.html"
            filepath = os.path.join(OUTPUT_DIR_SEC_FILINGS, filename)
            resp = sec_request_with_retry(filing['url'], headers)
            if resp and resp.status_code == 200:
                with open(filepath, "wb") as f:
                    f.write(resp.content)
                print(f"    Saved: {filename}", flush=True)
            else:
                print(f"    ⚠️ Failed to download: {filename}", flush=True)

print(f"\n✅ ETFs complete: {len(etf_ciks)} CIKs ({cached_count} cached, {fetched_count} fetched)\n", flush=True)



In [None]:
# ============================================================
# 3. Process MUTUAL FUNDS - Get CIK and download SEC Filings
# ============================================================
print("=" * 60, flush=True)
print("MUTUAL FUNDS - Fetching CIK and SEC filings", flush=True)
print("=" * 60, flush=True)

# Start with existing CIKs from lookup file
mutual_fund_ciks = dict(existing_cik_lookup.get('mutual_funds', {}))
fetched_count = 0
cached_count = 0

for ticker in mutual_funds:
    print(f"--- Processing {ticker} ---", flush=True)
    
    # Check if CIK already exists in lookup
    if ticker in mutual_fund_ciks:
        cik = mutual_fund_ciks[ticker]
        print(f"  CIK: {cik} (cached)", flush=True)
        cached_count += 1
    else:
        # Use mutual fund specific function
        cik = fetch_mutual_fund_cik(ticker, headers)
        if cik:
            mutual_fund_ciks[ticker] = cik
            print(f"  CIK: {cik} (fetched)", flush=True)
            fetched_count += 1
        else:
            print(f"  Could not find CIK", flush=True)
            continue
    
    # Fetch filings for mutual funds - get latest of each form type
    base_url = f"https://data.sec.gov/submissions/CIK{cik}.json"
    resp = sec_request_with_retry(base_url, headers)
    filings = []
    if resp and resp.status_code == 200:
        data = resp.json()
        recent = data.get("filings", {}).get("recent", {})
        forms_list = recent.get("form", [])
        accession_list = recent.get("accessionNumber", [])
        filing_dates = recent.get("filingDate", [])
        primary_docs = recent.get("primaryDocument", [])
        
        # Track which form types we've found (get latest of each)
        found_forms = set()
        
        for i, form in enumerate(forms_list):
            # Check if this form is in our target list and we haven't found it yet
            if form in FUND_FORMS and form not in found_forms:
                doc_type = FUND_FORMS[form]
                filings.append({
                    "form": form,
                    "type": doc_type,
                    "date": filing_dates[i],
                    "url": f"https://www.sec.gov/Archives/edgar/data/{int(cik)}/{accession_list[i].replace('-', '')}/{primary_docs[i]}"
                })
                found_forms.add(form)
                print(f"  Found {form}: {doc_type} ({filing_dates[i]})", flush=True)
            
            # Stop once we have all form types
            if len(found_forms) == len(FUND_FORMS):
                break
    
    print(f"  Total filings to download: {len(filings)}", flush=True)
    
    
    # Download filings
    if filings:
        os.makedirs(OUTPUT_DIR_SEC_FILINGS, exist_ok=True)
        for filing in filings:
            # Replace slashes in form names for valid filenames
            form_name = filing['form'].replace('/', '-')
            doc_type = filing['type'].replace(' ', '_')
            filename = f"{ticker}_{doc_type}_{form_name}_{filing['date']}.html"
            filepath = os.path.join(OUTPUT_DIR_SEC_FILINGS, filename)
            resp = sec_request_with_retry(filing['url'], headers)
            if resp and resp.status_code == 200:
                with open(filepath, "wb") as f:
                    f.write(resp.content)
                print(f"    Saved: {filename}", flush=True)
            else:
                print(f"    ⚠️ Failed to download: {filename}", flush=True)

print(f"\n✅ Mutual Funds complete: {len(mutual_fund_ciks)} CIKs ({cached_count} cached, {fetched_count} fetched)\n", flush=True)
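
The three processing cells above repeat the same "latest filing of each wanted form" loop. A minimal sketch of that selection as a reusable function is shown below. `latest_filings_by_form` is a hypothetical name, the sample data is invented, and the sketch relies on the same assumption as the cells above: that the `recent` lists are ordered newest first.

```python
def latest_filings_by_form(recent, wanted_forms):
    """Return the first (assumed most recent) entry for each wanted form type."""
    found = {}
    forms = recent.get("form", [])
    dates = recent.get("filingDate", [])
    for index, form in enumerate(forms):
        if form in wanted_forms and form not in found:
            found[form] = {
                "form": form,
                "date": dates[index] if index < len(dates) else None,
                "index": index,  # position in the parallel lists, for later URL lookup
            }
            # Stop early once every wanted form has been seen
            if len(found) == len(wanted_forms):
                break
    return list(found.values())


# Invented sample in the shape of data["filings"]["recent"]
sample_recent = {
    "form": ["N-CSRS", "485BPOS", "N-CSRS", "N-CSR"],
    "filingDate": ["2024-06-01", "2024-04-28", "2023-12-01", "2023-09-01"],
}

selected = latest_filings_by_form(sample_recent, {"N-CSR", "N-CSRS"})
for filing in selected:
    print(filing["form"], filing["date"], filing["index"])
```

Passing `FUND_FORMS` for funds or `{"10-K", "10-Q", "8-K"}` for equities would let one function serve all three cells.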

## Define Functions to Format the Output in a Word Report

In [None]:
# Function to convert markdown to Word document content
def add_markdown_to_word(doc, markdown_text):
    """Convert markdown text to formatted Word document content.
    Uses narrative paragraphs for most content.
    Bullet points only for true lists (multiple consecutive short items under a header).
    Bold is only used for titles, section headers, and table headers.
    """
    lines = markdown_text.split('\n')
    i = 0
    in_list_context = False  # Track if we're in a list context (after header or multiple bullets)
    consecutive_bullets = 0   # Count consecutive bullet items
    
    def is_list_item(text):
        """Check if text looks like a list item (short, data-like content)."""
        if len(text) < 100:
            return True
        if re.search(r':\s*[\d$%]|[\d.]+%|\$[\d,.]+|\d+\s*(million|billion|M|B|K)', text):
            return True
        return False
    
    def count_upcoming_bullets(lines, start_idx):
        """Count how many consecutive bullet lines follow."""
        count = 0
        for j in range(start_idx, len(lines)):
            line = lines[j].strip()
            if line.startswith(('- ', '* ', '• ')):
                count += 1
            elif line == '':
                continue
            else:
                break
        return count
    
    def extract_styled_header(line):
        """Extract text from styled headers like **<span style="...">1. Title</span>**
        Returns (clean_text, is_styled_header)
        """
        # Pattern for **<span style="...">text</span>** format
        span_pattern = r'\*\*<span[^>]*>(.+?)</span>\*\*'
        match = re.search(span_pattern, line)
        if match:
            return match.group(1).strip(), True
        
        # Also handle <span> without outer **
        span_pattern2 = r'<span[^>]*>(.+?)</span>'
        match2 = re.search(span_pattern2, line)
        if match2:
            return match2.group(1).strip(), True
        
        return None, False
    
    def clean_html_tags(text):
        """Remove any HTML tags from text."""
        return re.sub(r'<[^>]+>', '', text)
    
    while i < len(lines):
        line = lines[i]
        
        # Skip empty lines
        if not line.strip():
            i += 1
            in_list_context = False
            consecutive_bullets = 0
            continue
        
        # Check for styled numbered headers (e.g., **<span style="...">1. Title</span>**)
        styled_text, is_styled = extract_styled_header(line)
        if is_styled:
            p = doc.add_paragraph()
            run = p.add_run(styled_text)
            run.bold = True
            run.font.size = Pt(14)  # Larger than normal text
            run.font.color.rgb = RGBColor(0, 0, 139)  # Dark blue color
            in_list_context = True
            consecutive_bullets = 0
            i += 1
            continue
        
        # Headers (# ## ###) - These get bold via heading styles
        if line.startswith('#'):
            level = len(line) - len(line.lstrip('#'))
            text = line.lstrip('#').strip()
            text = clean_html_tags(text)  # Remove any HTML tags
            doc.add_heading(text, level=min(level, 9))
            in_list_context = True
            consecutive_bullets = 0
        
        # Tables (|...) - Keep table formatting as-is
        elif '|' in line and i + 1 < len(lines) and '|' in lines[i + 1]:
            table_lines = [line]
            i += 1
            if '---' in lines[i] or ':-:' in lines[i]:
                i += 1
            while i < len(lines) and '|' in lines[i]:
                table_lines.append(lines[i])
                i += 1
            
            headers = [cell.strip() for cell in table_lines[0].split('|') if cell.strip()]
            num_cols = len(headers)
            num_rows = len(table_lines)
            
            table = doc.add_table(rows=num_rows, cols=num_cols)
            table.style = 'Light Grid Accent 1'
            
            for j, header in enumerate(headers):
                cell = table.rows[0].cells[j]
                header_text = clean_html_tags(header.replace('**', '').replace('__', ''))
                cell.text = header_text
                cell.paragraphs[0].runs[0].bold = True
            
            for row_idx in range(1, len(table_lines)):
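                # Drop only the outer pipes so empty cells keep their column position
                row_text = re.sub(r'^\||\|$', '', table_lines[row_idx].strip())
                cells = [cell.strip() for cell in row_text.split('|')]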
                for col_idx, cell_text in enumerate(cells):
                    if col_idx < num_cols:
                        clean_text = clean_html_tags(cell_text.replace('**', '').replace('__', ''))
                        table.rows[row_idx].cells[col_idx].text = clean_text
            
            doc.add_paragraph()
            in_list_context = False
            consecutive_bullets = 0
            continue
        
        # Bullet points (-, *, or •) - Only use bullets for true lists
        elif line.strip().startswith(('- ', '* ', '• ')):
            text = line.strip()[2:].strip()
            text = clean_html_tags(text)  # Remove any HTML tags
            
            if consecutive_bullets == 0:
                upcoming = count_upcoming_bullets(lines, i)
            else:
                upcoming = consecutive_bullets
            
            use_bullet = (upcoming >= 3) or (in_list_context and is_list_item(text) and upcoming >= 2)
            
            if use_bullet:
                p = doc.add_paragraph(style='List Bullet')
                add_formatted_text(p, text, allow_bold=False)
                consecutive_bullets = upcoming
            else:
                p = doc.add_paragraph()
                add_formatted_text(p, text, allow_bold=False)
                consecutive_bullets = 0
            
            in_list_context = False
        
        # Numbered lists (1. 2. etc) - Check if it has HTML styling first
        elif re.match(r'^\d+\.\s', line.strip()):
            # Check if the line contains HTML span styling
            if '<span' in line:
                styled_text, is_styled = extract_styled_header(line)
                if is_styled:
                    p = doc.add_paragraph()
                    run = p.add_run(styled_text)
                    run.bold = True
                    run.font.size = Pt(14)
                    run.font.color.rgb = RGBColor(0, 0, 139)
                    in_list_context = True
                    consecutive_bullets = 0
                    i += 1
                    continue
            
            # Regular numbered list without HTML styling
            text = re.sub(r'^\d+\.\s', '', line.strip())
            text = clean_html_tags(text)
            p = doc.add_paragraph(style='List Number')
            add_formatted_text(p, text, allow_bold=True)
            for run in p.runs:
                run.font.size = Pt(14)
                run.bold = True
            in_list_context = True
            consecutive_bullets = 0
        
        # Bold text lines that appear to be section headers
        elif (line.strip().startswith('**') and line.strip().endswith('**') and 
              len(line.strip()) < 100 and '\n' not in line):
            # Check for HTML inside
            if '<span' in line:
                styled_text, is_styled = extract_styled_header(line)
                if is_styled:
                    p = doc.add_paragraph()
                    run = p.add_run(styled_text)
                    run.bold = True
                    run.font.size = Pt(14)
                    run.font.color.rgb = RGBColor(0, 0, 139)
                    in_list_context = True
                    consecutive_bullets = 0
                    i += 1
                    continue
            
            p = doc.add_paragraph()
            clean_line = clean_html_tags(line)
            add_formatted_text(p, clean_line, allow_bold=True)
            in_list_context = True
            consecutive_bullets = 0
        
        # Regular paragraph - no bold, no bullets
        else:
            p = doc.add_paragraph()
            clean_line = clean_html_tags(line)
            add_formatted_text(p, clean_line, allow_bold=False)
            in_list_context = False
            consecutive_bullets = 0
        
        i += 1

def add_formatted_text(paragraph, text, allow_bold=False):
    """Add text with optional bold/italic markdown formatting to a paragraph.
    
    Args:
        paragraph: The Word document paragraph to add text to
        text: The text content (may contain markdown formatting)
        allow_bold: If True, apply bold to **text**. If False, strip bold markers.
    """
    # First remove any HTML tags
    text = re.sub(r'<[^>]+>', '', text)
    
    parts = re.split(r'(\*\*.*?\*\*|__.*?__|`.*?`)', text)
    
    for part in parts:
        if not part:
            continue
        
        if part.startswith('**') and part.endswith('**'):
            run = paragraph.add_run(part[2:-2])
            if allow_bold:
                run.bold = True
        elif part.startswith('__') and part.endswith('__'):
            run = paragraph.add_run(part[2:-2])
            if allow_bold:
                run.bold = True
        elif part.startswith('`') and part.endswith('`'):
            run = paragraph.add_run(part[1:-1])
            run.font.name = 'Courier New'
        else:
            paragraph.add_run(part)


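def sort_markdown_table_by_industry(markdown_text):
    """Sort the rows of the first markdown table by its 'industry' (or 'category') column.

    Returns the text unchanged if no table or no such column is found.
    """
    lines = markdown_text.split('\n')
    table_start = None
    table_end = None
    for i, line in enumerate(lines):
        if '|' in line and table_start is None:
            table_start = i
        elif table_start is not None and ('|' not in line or not line.strip()):
            table_end = i
            break
    if table_start is None:
        return markdown_text
    if table_end is None:
        table_end = len(lines)  # Table runs to the end of the text
    if table_end - table_start < 3:
        return markdown_text  # Need a header, a separator, and at least one row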

    header = lines[table_start]
    separator = lines[table_start + 1]
    rows = lines[table_start + 2:table_end]
    columns = [col.strip().lower() for col in header.split('|')]
    try:
        industry_idx = columns.index('industry')
    except ValueError:
        try:
            industry_idx = columns.index('category')
        except ValueError:
            return markdown_text

    def get_industry(row):
        cells = [cell.strip() for cell in row.split('|')]
        return cells[industry_idx] if industry_idx < len(cells) else ''
    rows_sorted = sorted(rows, key=get_industry)

    sorted_table = [header, separator] + rows_sorted
    lines = lines[:table_start] + sorted_table + lines[table_end:]
    return '\n'.join(lines)
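
`add_formatted_text` relies on `re.split` with a capturing group, which keeps the matched `**bold**`, `__bold__`, and `` `code` `` spans in the result instead of discarding them. The sketch below isolates that tokenization step so it can be inspected without a Word document. The names `INLINE_PATTERN` and `tokenize_inline` are hypothetical; the regular expression is the one used in the function above.

```python
import re

# Same pattern as add_formatted_text; the outer group makes re.split keep the delimiters
INLINE_PATTERN = re.compile(r'(\*\*.*?\*\*|__.*?__|`.*?`)')

def tokenize_inline(text):
    """Return (kind, content) pairs where kind is 'bold', 'code', or 'text'."""
    tokens = []
    for part in INLINE_PATTERN.split(text):
        if not part:
            continue  # re.split yields empty strings between adjacent matches
        if len(part) >= 4 and part[:2] == part[-2:] and part[:2] in ('**', '__'):
            tokens.append(('bold', part[2:-2]))
        elif len(part) >= 2 and part.startswith('`') and part.endswith('`'):
            tokens.append(('code', part[1:-1]))
        else:
            tokens.append(('text', part))
    return tokens

print(tokenize_inline("Revenue grew **12%** per `10-K` filing"))
# [('text', 'Revenue grew '), ('bold', '12%'), ('text', ' per '), ('code', '10-K'), ('text', ' filing')]
```

Each token maps to one run in the paragraph; whether a `bold` token is actually rendered bold is then decided by the `allow_bold` flag.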

## Generate Lookup List of SEC Documents in SEC Folder

In [None]:
# Generate list of SEC documents in OUTPUT_DIR_SEC_FILINGS and save to file
sec_files = []
if os.path.exists(OUTPUT_DIR_SEC_FILINGS):
    sec_files = sorted(os.listdir(OUTPUT_DIR_SEC_FILINGS))
    print(f"SEC Filings Directory: {OUTPUT_DIR_SEC_FILINGS}")
    print(f"Total files: {len(sec_files)}\n")
    print("-" * 60)
    #for f in sec_files:
        #print(f)
    #print("-" * 60)
    
    # Save the list to a text file in the SEC filings directory
    sec_files_list_path = os.path.join(OUTPUT_DIR_SEC_FILINGS, "sec_filings_list.txt")
    with open(sec_files_list_path, 'w') as f:
        f.write(f"SEC Filings List - Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
        f.write(f"Directory: {OUTPUT_DIR_SEC_FILINGS}\n")
        f.write(f"Total files: {len(sec_files)}\n")
        f.write("-" * 60 + "\n")
        for sec_file in sec_files:
            f.write(f"{sec_file}\n")
    print(f"\n✅ SEC filings list saved to: {sec_files_list_path}")
else:
    print(f"Directory not found: {OUTPUT_DIR_SEC_FILINGS}")

## Build a Dictionary of SEC Files Grouped by Ticker

In [None]:
# Build dictionary of SEC files grouped by ticker
sec_files_by_ticker = {}

# Files to exclude (not actual SEC filings)
exclude_prefixes = ['cik', 'sec_files', 'sec_filings']

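for f in sec_files:
    # Skip non-filing files (cik_lookup.json, sec_filings_list.txt, sec_files_by_ticker.json, etc.).
    # Match on the full filename: the text before the first underscore is only "sec" for the
    # sec_* helper files, so comparing that token against the prefixes would never exclude them.
    if f.lower().startswith(tuple(exclude_prefixes)):
        continue
    
    # Extract ticker (characters before first underscore; full filename if no underscore)
    ticker = f.split('_')[0]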
    
    if ticker not in sec_files_by_ticker:
        sec_files_by_ticker[ticker] = []
    sec_files_by_ticker[ticker].append(f)

# Display results
print("-" * 60)
print(f"Unique tickers with SEC filings: {len(sec_files_by_ticker)}")
print("-" * 60)
for ticker, files in sorted(sec_files_by_ticker.items()):
    print(f"\n{ticker} ({len(files)} files):")
    for file in files:
        print(f"  - {file}")
print("-" * 60)

# Save the dictionary to a JSON file in the SEC filings directory
sec_files_dict_path = os.path.join(OUTPUT_DIR_SEC_FILINGS, "sec_files_by_ticker.json")
with open(sec_files_dict_path, 'w') as f:
    json.dump(sec_files_by_ticker, f, indent=2)
print(f"\n✅ SEC files dictionary saved to: {sec_files_dict_path}")
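
The grouping above only uses the text before the first underscore. The fund download loop names files as `{ticker}_{doc_type}_{form_name}_{date}.html`, so the remaining parts can be recovered as well, for example to pick the newest filing of each form. The sketch below assumes that pattern, a `YYYY-MM-DD` date, and form names without underscores; filenames written by other cells may differ. `parse_filing_filename` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path

def parse_filing_filename(name):
    """Split '<ticker>_<doc_type>_<form>_<YYYY-MM-DD>.html' into its parts.

    Hypothetical helper; returns None for names that do not fit the pattern.
    """
    parts = Path(name).stem.split("_")
    if len(parts) < 4:
        return None
    try:
        # Assumption: the last part is an ISO date; helper files fail this check
        datetime.strptime(parts[-1], "%Y-%m-%d")
    except ValueError:
        return None
    return {
        "ticker": parts[0],
        "doc_type": " ".join(parts[1:-2]),  # spaces were stored as underscores
        "form": parts[-2],
        "date": parts[-1],
    }

print(parse_filing_filename("VFIAX_Annual_Report_N-CSR_2024-02-28.html"))
print(parse_filing_filename("sec_files_by_ticker.json"))  # None
```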

In [None]:
# Ensure output directory exists
os.makedirs(OUTPUT_DIR_INDIVIDUAL_STOCK_ANALYSIS, exist_ok=True)  

# Initialize Perplexity client
client = PerplexityClient(api_key=API_KEY)
counter = 0

# Helper function to read SEC filing content
def read_sec_filing_content(filepath, max_chars=50000):
    """Read SEC filing HTML and extract text content, truncated to max_chars"""
    try:
        with open(filepath, 'r', encoding='utf-8', errors='ignore') as f:
            content = f.read()
        # Simple HTML tag removal for text extraction
        text = re.sub(r'<[^>]+>', ' ', content)
        text = re.sub(r'\s+', ' ', text).strip()
        if len(text) > max_chars:
            text = text[:max_chars] + "... [truncated]"
        return text
    except Exception as e:
        print(f"    Warning: Could not read {filepath}: {e}")
        return None
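
`read_sec_filing_content` strips tags with a regular expression, which leaves the contents of `<script>` and `<style>` elements and undecoded entities such as `&amp;` in the text. A possible alternative, sketched here with the standard library's `html.parser`, skips those elements and decodes entities. The names `_TextExtractor` and `html_to_text` are hypothetical, and this is not the notebook's implementation.

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities like &amp;
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

def html_to_text(html, max_chars=50000):
    """Return whitespace-normalized visible text, truncated to max_chars."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()
    if len(text) > max_chars:
        text = text[:max_chars] + "... [truncated]"
    return text

sample = "<p>Revenue &amp; income</p><script>var x = 1;</script><style>p {}</style><p>up 5%</p>"
print(html_to_text(sample))
# Revenue & income up 5%
```

The truncation marker matches the one used in `read_sec_filing_content`, so the two could be swapped without changing the prompt format.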


## Generate Analysis for Each Individual Equity in the List

In [None]:

# Process each individual equity (not ETFs or mutual funds)
for equity in individual_equities:
    print(equity)
    # Check if report already exists for today
    date_str = datetime.now().strftime('%Y-%m-%d')
    output_filename = f"Equity Report - {equity} {date_str}.docx"
    output_path = os.path.join(OUTPUT_DIR_INDIVIDUAL_STOCK_ANALYSIS, output_filename)
 
    if os.path.exists(output_path):
        #print(f"⏭️  Skipping {equity} - report already exists for {date_str}")
        continue

    # Gather SEC filing content for this ticker
    sec_content = ""
    sec_files_used = []
    if equity in sec_files_by_ticker:
        print(f"  Loading {len(sec_files_by_ticker[equity])} SEC filings for {equity}...")
        for sec_file in sec_files_by_ticker[equity]:
            filepath = os.path.join(OUTPUT_DIR_SEC_FILINGS, sec_file)
            file_content = read_sec_filing_content(filepath, max_chars=30000)
            if file_content:
                sec_files_used.append(sec_file)
                sec_content += f"\n\n--- SEC Filing: {sec_file} ---\n{file_content}"
    
    # Construct prompt with equity ticker, SEC filings, and template
    if sec_content:
        prompt = f"For the equity {equity}, analyze the following SEC filings and {PROMPT_TEMPLATE}\n\nSEC FILINGS:\n{sec_content}"
    else:
        prompt = f"For the equity {equity} {PROMPT_TEMPLATE}"

    try:
        # Query the LLM
        print(f"Querying Perplexity API for {equity} ({counter + 1}/{len(individual_equities)})...")
        if sec_files_used:
            print(f"  Including SEC filings: {', '.join(sec_files_used)}")

        generated_text = client.chat(prompt, model=MODEL)
        counter += 1
        
        # Create Word document
        doc = Document()
        doc.add_heading(f"Market Outlook Report - {equity}", 0)
        doc.add_paragraph(f"Perplexity Sonar Pro Model Generated: {date_str}")
        doc.add_paragraph()
        
        # Add SEC filings used
        if sec_files_used:
            doc.add_heading("SEC Filings Analyzed:", level=2)
            for sf in sec_files_used:
                doc.add_paragraph(sf, style='List Bullet')
            doc.add_paragraph()
        
        # Convert markdown to Word formatting
        add_markdown_to_word(doc, generated_text)
        
        # Add prompt info at the end (without the full SEC content for readability)
        doc.add_paragraph()
        doc.add_heading("Prompt Used:", level=2)
        prompt_summary = f"For the equity {equity} {PROMPT_TEMPLATE}"
        if sec_files_used:
            prompt_summary += f"\n\n[SEC filings included: {', '.join(sec_files_used)}]"
        doc.add_paragraph(prompt_summary)
        
        # Save the document
        doc.save(output_path)
        # print(f"✅ Saved: {output_filename}")
        
    except Exception as e:
        print(f"❌ Error processing {equity}: {e}")
        continue

print(f"\n{'='*60}")
print(f"✅ Completed processing {len(individual_equities)} individual equities")
print(f"Reports saved to: {OUTPUT_DIR_INDIVIDUAL_STOCK_ANALYSIS}")
print(f"{'='*60}")
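
The loop above appends up to 30,000 characters per filing with no overall limit, so a ticker with many filings can produce a very long prompt. The sketch below shows one way to cap the total filing text while keeping the same prompt wording. `build_prompt` and its `total_budget` parameter are hypothetical, and the default of 90,000 characters is an arbitrary placeholder rather than a documented API limit.

```python
def build_prompt(kind, ticker, template, filings, total_budget=90000):
    """Assemble a prompt from (filename, text) pairs.

    At most total_budget characters of filing text are included in total.
    Returns (prompt, names_of_filings_used). Hypothetical helper.
    """
    sections, used, remaining = [], [], total_budget
    for name, text in filings:
        if remaining <= 0:
            break  # budget exhausted; later filings are left out
        excerpt = text[:remaining]
        remaining -= len(excerpt)
        sections.append(f"\n\n--- SEC Filing: {name} ---\n{excerpt}")
        used.append(name)
    if sections:
        prompt = (f"For the {kind} {ticker}, analyze the following SEC filings and "
                  f"{template}\n\nSEC FILINGS:\n{''.join(sections)}")
    else:
        prompt = f"For the {kind} {ticker} {template}"
    return prompt, used

prompt, used = build_prompt(
    "ETF", "VTI", "summarize fees.",
    [("a.html", "x" * 10), ("b.html", "y" * 10), ("c.html", "z" * 10)],
    total_budget=15,
)
print(used)  # ['a.html', 'b.html']
```

Because the helper takes the asset kind as an argument, the same function could serve the equity, ETF, and mutual fund loops.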

## Generate Analysis for Each ETF in the List

In [None]:
# Process each ETF with SEC filing integration
# Output directory for ETF reports (can be same as individual or separate)
OUTPUT_DIR_ETF_ANALYSIS = os.getenv("Output_dir_etf_analysis", OUTPUT_DIR_INDIVIDUAL_STOCK_ANALYSIS)
os.makedirs(OUTPUT_DIR_ETF_ANALYSIS, exist_ok=True)

etf_counter = 0

for etf in etfs:
    print(etf)
    # Check if report already exists for today
    date_str = datetime.now().strftime('%Y-%m-%d')
    output_filename = f"ETF Report - {etf} {date_str}.docx"
    output_path = os.path.join(OUTPUT_DIR_ETF_ANALYSIS, output_filename)
 
    if os.path.exists(output_path):
        print(f"⏭️  Skipping {etf} - report already exists for {date_str}")
        continue

    # Gather SEC filing content for this ETF
    sec_content = ""
    sec_files_used = []
    if etf in sec_files_by_ticker:
        print(f"  Loading {len(sec_files_by_ticker[etf])} SEC filings for {etf}...")
        for sec_file in sec_files_by_ticker[etf]:
            filepath = os.path.join(OUTPUT_DIR_SEC_FILINGS, sec_file)
            file_content = read_sec_filing_content(filepath, max_chars=30000)
            if file_content:
                sec_files_used.append(sec_file)
                sec_content += f"\n\n--- SEC Filing: {sec_file} ---\n{file_content}"
    
    # Construct prompt with ETF ticker, SEC filings, and ETF-specific template
    if sec_content:
        prompt = f"For the ETF {etf}, analyze the following SEC filings and {PROMPT_ETF_ANALYSIS_FILE_TEMPLATE}\n\nSEC FILINGS:\n{sec_content}"
    else:
        prompt = f"For the ETF {etf} {PROMPT_ETF_ANALYSIS_FILE_TEMPLATE}"

    try:
        # Query the LLM
        print(f"Querying Perplexity API for ETF {etf} ({etf_counter + 1}/{len(etfs)})...")
        if sec_files_used:
            print(f"  Including SEC filings: {', '.join(sec_files_used)}")

        generated_text = client.chat(prompt, model=MODEL)
        etf_counter += 1
        
        # Create Word document
        doc = Document()
        doc.add_heading(f"ETF Analysis Report - {etf}", 0)
        doc.add_paragraph(f"Perplexity Sonar Pro Model Generated: {date_str}")
        doc.add_paragraph()
        
        # Add SEC filings used
        if sec_files_used:
            doc.add_heading("SEC Filings Analyzed:", level=2)
            for sf in sec_files_used:
                doc.add_paragraph(sf, style='List Bullet')
            doc.add_paragraph()
        
        # Convert markdown to Word formatting
        add_markdown_to_word(doc, generated_text)
        
        # Add prompt info at the end (without the full SEC content for readability)
        # doc.add_paragraph()
        # doc.add_heading("Prompt Used:", level=2)
        # prompt_summary = f"For the ETF {etf} {PROMPT_ETF_ANALYSIS_FILE_TEMPLATE}"
        # if sec_files_used:
        #     prompt_summary += f"\n\n[SEC filings included: {', '.join(sec_files_used)}]"
        # doc.add_paragraph(prompt_summary)
        
        # Save the document
        doc.save(output_path)
        print(f"✅ Saved: {output_filename}")
        
    except Exception as e:
        print(f"❌ Error processing ETF {etf}: {e}")
        continue

print(f"\n{'='*60}")
print(f"✅ Completed processing {len(etfs)} ETFs")
print(f"Reports saved to: {OUTPUT_DIR_ETF_ANALYSIS}")
print(f"{'='*60}")

## Generate Analysis for Each Mutual Fund in the List

In [None]:
# Process each Mutual Fund with SEC filing integration
# Output directory for Mutual Fund reports (can be same as individual or separate)
OUTPUT_DIR_MUTUAL_FUND_ANALYSIS = os.getenv("Output_dir_mutual_fund_analysis", OUTPUT_DIR_INDIVIDUAL_STOCK_ANALYSIS)
os.makedirs(OUTPUT_DIR_MUTUAL_FUND_ANALYSIS, exist_ok=True)

# Use the ETF analysis template for mutual funds (or define a separate one if needed)
PROMPT_MUTUAL_FUND_TEMPLATE = PROMPT_ETF_ANALYSIS_FILE_TEMPLATE

mf_counter = 0

for mf in mutual_funds:
    
    # Check if report already exists for today
    date_str = datetime.now().strftime('%Y-%m-%d')
    output_filename = f"Mutual Fund Report - {mf} {date_str}.docx"
    output_path = os.path.join(OUTPUT_DIR_MUTUAL_FUND_ANALYSIS, output_filename)
 
    if os.path.exists(output_path):
        print(f"⏭️  Skipping {mf} - report already exists for {date_str}")
        continue

    # Gather SEC filing content for this mutual fund
    sec_content = ""
    sec_files_used = []
    if mf in sec_files_by_ticker:
        print(f"  Loading {len(sec_files_by_ticker[mf])} SEC filings for {mf}...")
        for sec_file in sec_files_by_ticker[mf]:
            filepath = os.path.join(OUTPUT_DIR_SEC_FILINGS, sec_file)
            file_content = read_sec_filing_content(filepath, max_chars=30000)
            if file_content:
                sec_files_used.append(sec_file)
                sec_content += f"\n\n--- SEC Filing: {sec_file} ---\n{file_content}"
    
    # Construct prompt with mutual fund ticker, SEC filings, and template
    if sec_content:
        prompt = f"For the Mutual Fund {mf}, analyze the following SEC filings and {PROMPT_MUTUAL_FUND_TEMPLATE}\n\nSEC FILINGS:\n{sec_content}"
    else:
        prompt = f"For the Mutual Fund {mf} {PROMPT_MUTUAL_FUND_TEMPLATE}"

    try:
        # Query the LLM
        print(f"Querying Perplexity API for Mutual Fund {mf} ({mf_counter + 1}/{len(mutual_funds)})...")
        if sec_files_used:
            print(f"  Including SEC filings: {', '.join(sec_files_used)}")

        generated_text = client.chat(prompt, model=MODEL)
        mf_counter += 1
        
        # Create Word document
        doc = Document()
        doc.add_heading(f"Mutual Fund Analysis Report - {mf}", 0)
        doc.add_paragraph(f"Perplexity Sonar Pro Model Generated: {date_str}")
        doc.add_paragraph()
        
        # Add SEC filings used
        if sec_files_used:
            doc.add_heading("SEC Filings Analyzed:", level=2)
            for sf in sec_files_used:
                doc.add_paragraph(sf, style='List Bullet')
            doc.add_paragraph()
        
        # Convert markdown to Word formatting
        add_markdown_to_word(doc, generated_text)
        
        # Add prompt info at the end (without the full SEC content for readability)
        doc.add_paragraph()
        doc.add_heading("Prompt Used:", level=2)
        prompt_summary = f"For the Mutual Fund {mf} {PROMPT_MUTUAL_FUND_TEMPLATE}"
        if sec_files_used:
            prompt_summary += f"\n\n[SEC filings included: {', '.join(sec_files_used)}]"
        doc.add_paragraph(prompt_summary)
        
        # Save the document
        doc.save(output_path)
        print(f"✅ Saved: {output_filename}")
        
    except Exception as e:
        print(f"❌ Error processing Mutual Fund {mf}: {e}")
        continue

print(f"\n{'='*60}")
print(f"✅ Completed processing {len(mutual_funds)} Mutual Funds")
print(f"Reports saved to: {OUTPUT_DIR_MUTUAL_FUND_ANALYSIS}")
print(f"{'='*60}")
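
All three report loops make one API call per ticker and skip the ticker if that call raises. A hedged sketch of a retry wrapper with exponential backoff follows; it could wrap `client.chat` so that a transient failure does not drop a report. `call_with_retry` is a hypothetical helper, it retries on any exception, and the retry count and delays are assumptions that would need tuning to the API's actual rate limits.

```python
import time

def call_with_retry(fn, *args, retries=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying on any exception with exponential backoff.

    Hypothetical helper; the last failure is re-raised to the caller.
    """
    for attempt in range(retries):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == retries - 1:
                raise
            sleep(base_delay * (2 ** attempt))  # waits 1s, 2s, 4s, ... by default

# Demonstration with a function that fails twice before succeeding
attempts = {"n": 0}

def flaky(x):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("temporary failure")
    return x * 2

print(call_with_retry(flaky, 21, sleep=lambda seconds: None))  # 42
```

In the loops above, the call would become `generated_text = call_with_retry(client.chat, prompt, model=MODEL)`; the existing `except` block would then only fire after the final attempt fails.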

## Generate Ratings Change Report and Notifications for Individual Equities

In [None]:
# Generate Ratings Change Report for select equities
from docx import Document
from datetime import datetime
import os
import time

# Ensure output directory exists
os.makedirs(OUTPUT_DIR_PORTFOLIO_ANALYSIS, exist_ok=True)

date_str = datetime.now().strftime('%Y-%m-%d')
output_filename = f"Ratings Change Report {date_str}.docx"
output_path = os.path.join(OUTPUT_DIR_PORTFOLIO_ANALYSIS, output_filename)

doc = Document()
doc.add_heading("Ratings Change Report", 0)
doc.add_paragraph(f"Perplexity Sonar Pro Model Generated: {date_str}")
doc.add_paragraph()

# --- Individual equity details ---

for equity in individual_equities:
    prompt = f"For the equity {equity} {PROMPT_RATINGS_CHANGE_TEMPLATE}"
    doc.add_heading(f"{equity}", level=1)
    try:
        t0 = time.time()
        generated_text = client.chat(prompt, model=MODEL)
        t1 = time.time()
        add_markdown_to_word(doc, generated_text)
        # print(f"{equity} API call time: {t1-t0:.2f} seconds")
    except Exception as e:
        doc.add_paragraph(f"❌ Error processing {equity}: {e}")
    doc.add_paragraph()  # Space between equities

# Add prompt template at the end for reference
doc.add_heading("Prompt Template Used:", level=2)
doc.add_paragraph(PROMPT_RATINGS_CHANGE_TEMPLATE)

doc.save(output_path)
print(f"✅ Saved: {output_filename} in {OUTPUT_DIR_PORTFOLIO_ANALYSIS}")

## End