# 03 - Data Generation Playground

This notebook is a workspace for generating mock data for the Credit Judge PoC from the defined prompt templates. You can use it to create:
1.  **Simulated Input Reports (RAG-style):** Text snippets mimicking retrieved data for LLM context.
2.  **AI-Generated Credit Reports (JSON):** Full credit reports generated by an LLM based on the detailed template.
3.  **Expert Human-Style Review Tables (JSON):** Reviews of existing reports, generated by an LLM acting as an expert.
4.  **Comparison Tables (Markdown):** Detailed comparisons between AI reports and human/expert reports.

**Objective:** To easily generate and save varied datasets for developing, testing, and refining the PoC components.

## 1. Setup and Imports

In [None]:
import datetime
import json
import os

# Assumes the 'credit_judge_poc' package is importable from the current working directory; adjust the import path if it is not
from credit_judge_poc.src.prompts.judge_prompts import (
    CORPORATE_CREDIT_RATING_REPORT_TEMPLATE,
    SIMULATED_INPUT_REPORT_TEMPLATE,
    EXPERT_REVIEW_TABLE_GENERATION_TEMPLATE,
    COMPARISON_TABLE_GENERATION_TEMPLATE
)

# Placeholder for LLM SDK - replace with actual SDK if used
# import google.generativeai as genai
# GENAI_API_KEY = os.getenv("YOUR_LLM_API_KEY_ENV_VAR") # Example
# if GENAI_API_KEY:
#     genai.configure(api_key=GENAI_API_KEY)

# --- Output Directory Setup ---
# Create a general directory for generated mock data if it doesn't exist
GENERATED_DATA_DIR = os.path.join("..", "data", "generated_mock_data")
SUBDIRS = {
    "simulated_inputs": os.path.join(GENERATED_DATA_DIR, "simulated_input_reports"),
    "ai_reports": os.path.join(GENERATED_DATA_DIR, "ai_credit_reports"),
    "expert_reviews": os.path.join(GENERATED_DATA_DIR, "expert_review_tables"),
    "comparison_tables": os.path.join(GENERATED_DATA_DIR, "comparison_tables")
}

for subdir in SUBDIRS.values():
    os.makedirs(subdir, exist_ok=True)

print(f"Generated data will be saved under: {os.path.abspath(GENERATED_DATA_DIR)}")

## 2. Helper Function for Conceptual LLM Call & Saving Output

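The function below is a placeholder: it returns canned output instead of calling an LLM, then saves that output to the matching subdirectory. Replace the marked section with your actual LLM calling logic.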

In [None]:
def generate_content_with_llm(prompt_text, output_filename, output_subdir_key, output_type="text"):
    """
    Conceptual function to simulate an LLM call and save the output.
    Replace with your actual LLM interaction logic.
    """
    print("\n--- Sending Prompt to LLM (Conceptual) ---")
    # print(prompt_text[:500] + "...") # Print snippet of prompt
    
    # --- !!! REPLACE THIS WITH ACTUAL LLM CALL !!! ---
    # model = genai.GenerativeModel(model_name="gemini-1.5-flash-latest")
    # response = model.generate_content(prompt_text)
    # llm_output = response.text
    llm_output = f"Placeholder LLM output for: {output_filename}. Content based on prompt type: {output_type}."
    if output_type == "json":
        llm_output = json.dumps(
            {
                "placeholder_data": "This is JSON output for " + output_filename,
                "prompt_used": prompt_text[:100] + "...",
            },
            indent=2,
        )
    elif output_type == "markdown":
        llm_output = f"# {output_filename}\n\nThis is Markdown placeholder content."
    # --- END OF REPLACE SECTION ---
    
    print("--- LLM Response Received (Conceptual) ---")
    # print(llm_output[:500] + "...")

    # Save the output
    filepath = os.path.join(SUBDIRS[output_subdir_key], output_filename)
    try:
        with open(filepath, 'w', encoding='utf-8') as f:
            f.write(llm_output)
        print(f"Successfully saved output to: {filepath}")
    except OSError as e:
        print(f"Error saving file {filepath}: {e}")
    
    return llm_output
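
One way to keep the saving logic untouched when the real LLM call arrives is to pass the call in as a function. The following is a minimal sketch under that assumption; `generate_and_save`, `llm_fn`, and `fake_llm` are hypothetical names introduced for illustration and are not part of the notebook.

```python
import os
import tempfile
from typing import Callable


def generate_and_save(prompt_text: str, filepath: str, llm_fn: Callable[[str], str]) -> str:
    """Call llm_fn on the prompt and write its reply to filepath (hypothetical helper)."""
    llm_output = llm_fn(prompt_text)
    with open(filepath, "w", encoding="utf-8") as f:
        f.write(llm_output)
    return llm_output


def fake_llm(prompt_text: str) -> str:
    """Stand-in backend for offline runs; a real SDK call would go here instead."""
    return f"Placeholder output for a prompt of {len(prompt_text)} characters."


# Demo: write into a temporary directory so nothing under data/ is touched.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_output.txt")
demo_output = generate_and_save("Describe FutureGadgets Corp.", demo_path, fake_llm)
```

With this shape, switching from the placeholder to a real model would only mean passing a different `llm_fn`.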

## 3. Generate Simulated Input Reports (RAG-style Text)

Use `SIMULATED_INPUT_REPORT_TEMPLATE`.

In [None]:
sim_input_scenarios = [
    {"company_name": "FutureGadgets Corp.", "ticker_symbol": "FGCP", "industry": "Consumer Electronics", "focus_areas": "New VR headset launch, supply chain issues"},
    {"company_name": "GreenEnergy Solutions", "ticker_symbol": "GES", "industry": "Renewable Energy", "focus_areas": "Impact of new government subsidies, competitor innovations"}
]

for i, scenario in enumerate(sim_input_scenarios):
    prompt = SIMULATED_INPUT_REPORT_TEMPLATE.format(**scenario)
    filename = f"{scenario['ticker_symbol']}_simulated_input_{i+1}.txt"
    # generate_content_with_llm(prompt, filename, "simulated_inputs", output_type="text") # Uncomment to run
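
Because `str.format(**scenario)` raises `KeyError` when a scenario omits a placeholder, it may help to check scenarios before formatting. The sketch below uses the standard library's `string.Formatter` to list the named fields a template expects. The helper name `missing_placeholders` and the demo template are assumptions made for illustration; the real templates live in `judge_prompts`.

```python
from string import Formatter


def missing_placeholders(template: str, scenario: dict) -> set:
    """Return the named placeholders in template that scenario does not supply."""
    field_names = {
        field_name.split(".")[0].split("[")[0]  # reduce "a.b" or "a[0]" to "a"
        for _, field_name, _, _ in Formatter().parse(template)
        if field_name  # skip literal-only segments and positional "{}" fields
    }
    return field_names - set(scenario)


# Hypothetical stand-in template, not the real SIMULATED_INPUT_REPORT_TEMPLATE.
demo_template = "Write news snippets about {company_name} ({ticker_symbol}) in {industry}."
demo_scenario = {"company_name": "FutureGadgets Corp.", "ticker_symbol": "FGCP"}

print(missing_placeholders(demo_template, demo_scenario))  # prints {'industry'}
```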

## 4. Generate AI Credit Reports (JSON Output)

Use `CORPORATE_CREDIT_RATING_REPORT_TEMPLATE`. Remember to instruct the LLM (via meta-prompt if needed) to produce JSON output.

In [None]:
ai_report_scenarios = [
    {
        "company_name": "FutureGadgets Corp.", "ticker_symbol": "FGCP", "stock_exchange": "NASDAQ", 
        "current_date": "2024-05-15", "specific_financial_period_override": "N/A",
        "specific_revenue_override": "N/A", "specific_adj_ebitda_override": "N/A",
        "specific_net_income_override": "N/A", "specific_fcf_override": "N/A",
        "specific_other_data_override": "N/A", "special_focus_areas": "VR market penetration strategy",
        "human_example_report_text": "N/A" # Or provide one to test comparison generation within this prompt
    }
]

for i, scenario in enumerate(ai_report_scenarios):
    prompt = CORPORATE_CREDIT_RATING_REPORT_TEMPLATE.format(**scenario)
    # Crucial: Add meta-instruction for LLM to output JSON for this specific prompt.
    meta_instruction_for_json = "Your response must be a single, valid JSON object representing the credit report, based on the structure implied by the input template."
    full_prompt_for_json_ai_report = meta_instruction_for_json + "\n\n--- TEMPLATE START ---\n" + prompt + "\n--- TEMPLATE END ---"
    filename = f"{scenario['ticker_symbol']}_ai_credit_report_{i+1}.json"
    # generate_content_with_llm(full_prompt_for_json_ai_report, filename, "ai_reports", output_type="json") # Uncomment to run
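
Even with a JSON meta-instruction, a model may wrap its reply in a Markdown code fence or return something that is not valid JSON, so it can be worth validating the reply before saving it as `.json`. This is a minimal sketch, assuming that fence-wrapping is the only deviation worth tolerating; `parse_llm_json` is a hypothetical helper and the demo reply is made up.

```python
import json
import re

FENCE = "`" * 3  # built indirectly so this example can itself sit inside a fenced block


def parse_llm_json(raw_text: str) -> dict:
    """Parse a JSON object from an LLM reply, tolerating a surrounding Markdown fence."""
    text = raw_text.strip()
    fenced = re.match(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", text, flags=re.DOTALL)
    if fenced:
        text = fenced.group(1)
    parsed = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(parsed, dict):
        raise ValueError("Expected a single JSON object at the top level.")
    return parsed


# Made-up reply for illustration only.
demo_reply = f'{FENCE}json\n{{"rating": "B", "outlook": "Negative"}}\n{FENCE}'
demo_report = parse_llm_json(demo_reply)
```

Failing loudly here keeps invalid files out of `ai_credit_reports`, where the later review and comparison cells expect parseable JSON.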

## 5. Generate Expert Human-Style Review Tables (JSON Output)

Use `EXPERT_REVIEW_TABLE_GENERATION_TEMPLATE`. This requires an existing report text as input.

In [None]:
# Assume we have a generated AI report text (or human report text) to review
# For example, load one of the AI reports generated in the previous step or use a placeholder.
report_to_review_filename = os.path.join(SUBDIRS["ai_reports"], "FGCP_ai_credit_report_1.json") # Example path
input_report_text = "Placeholder: Text of FGCP_ai_credit_report_1.json would go here. Or load from file."
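try:
    # Read with the same encoding used when the file was written
    with open(report_to_review_filename, 'r', encoding='utf-8') as f:
        loaded_report_json = json.load(f)
    input_report_text = json.dumps(loaded_report_json, indent=2) # Or a summarized text version
except FileNotFoundError:
    print(f"File not found: {report_to_review_filename}, using placeholder text for review generation.")
except json.JSONDecodeError as e:
    print(f"Could not parse {report_to_review_filename} as JSON ({e}), using placeholder text for review generation.")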

expert_review_scenarios = [
    {"report_id_placeholder": "FGCP_Report_Review_1", "current_date": "2024-05-16", "input_credit_report_text": input_report_text}
]

for i, scenario in enumerate(expert_review_scenarios):
    prompt = EXPERT_REVIEW_TABLE_GENERATION_TEMPLATE.format(**scenario)
    filename = f"{scenario['report_id_placeholder']}_expert_review_{i+1}.json"
    # generate_content_with_llm(prompt, filename, "expert_reviews", output_type="json") # Uncomment to run

## 6. Generate Comparison Tables (Markdown Output)

Use `COMPARISON_TABLE_GENERATION_TEMPLATE`. This requires two report texts: one AI-generated and one written by a human expert.

In [None]:
# Assume we have an AI report and a Human Expert report text
ai_report_for_comp_text = "Placeholder: Text of FGCP_ai_credit_report_1.json..."
# You might load an actual generated AI report here
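try:
    # Read with the same encoding used when the file was written
    with open(os.path.join(SUBDIRS["ai_reports"], "FGCP_ai_credit_report_1.json"), 'r', encoding='utf-8') as f:
        ai_report_for_comp_text = json.dumps(json.load(f), indent=2) # Or a summarized text version
except FileNotFoundError:
    print("AI report for comparison not found, using placeholder.")
except json.JSONDecodeError as e:
    print(f"AI report for comparison is not valid JSON ({e}), using placeholder.")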

# Placeholder for a human expert report on the same company
human_expert_report_for_comp_text = """
**I. Overview (Human Expert for FGCP):** FutureGadgets Corp. shows promise but faces significant market headwinds. VR adoption is slower than anticipated.
**II. Financials (Human Expert):** Revenue growth is primarily from legacy products, VR sales are lagging. Margins are under pressure.
**III. Rating (Human Expert):** Assign B rating, Outlook Negative, due to execution risk in VR and competitive threats.
"""

comparison_scenarios = [
    {
        "company_name": "FutureGadgets Corp.", 
        "ai_generated_report_text": ai_report_for_comp_text,
        "human_expert_report_text": human_expert_report_for_comp_text
    }
]

for i, scenario in enumerate(comparison_scenarios):
    prompt = COMPARISON_TABLE_GENERATION_TEMPLATE.format(**scenario)
    filename = f"{scenario['company_name'].replace(' ', '')}_comparison_table_{i+1}.md"
    # generate_content_with_llm(prompt, filename, "comparison_tables", output_type="markdown") # Uncomment to run
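
Removing only spaces from the company name leaves other punctuation in the filename (for "FutureGadgets Corp." the trailing period remains). If cleaner names are wanted, one option is a small slug helper like the sketch below; `slugify_for_filename` is a hypothetical name, and restricting output to ASCII letters and digits is an assumption that may not suit every company name.

```python
import re


def slugify_for_filename(name: str) -> str:
    """Collapse each run of non-alphanumeric characters to '_' and trim the ends."""
    return re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_")


print(slugify_for_filename("FutureGadgets Corp."))  # prints FutureGadgets_Corp
```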

## Next Steps

1.  Replace the conceptual `generate_content_with_llm` function with actual calls to your chosen LLM API.
2.  Customize the scenarios and input data to generate a diverse dataset.
3.  Use the generated data to test parsing, formatting, and evaluation components of the PoC.
4.  Iterate on the prompts themselves based on the quality of the LLM outputs.
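
As a starting point for step 3, the sketch below scans a directory for `.json` files that fail to parse, which is one quick signal of how well the JSON instructions are being followed. `find_invalid_json_files` is a hypothetical helper, and the demo writes to a temporary directory rather than the notebook's `SUBDIRS`.

```python
import json
import os
import tempfile


def find_invalid_json_files(directory: str) -> list:
    """Return the sorted names of .json files in directory that fail to parse."""
    invalid = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".json"):
            continue
        try:
            with open(os.path.join(directory, name), "r", encoding="utf-8") as f:
                json.load(f)
        except (json.JSONDecodeError, UnicodeDecodeError):
            invalid.append(name)
    return invalid


# Self-contained demo with one valid and one invalid file.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "good.json"), "w", encoding="utf-8") as f:
    f.write('{"ok": true}')
with open(os.path.join(demo_dir, "bad.json"), "w", encoding="utf-8") as f:
    f.write("Sorry, I cannot produce JSON.")

invalid_files = find_invalid_json_files(demo_dir)
print(invalid_files)  # prints ['bad.json']
```

In the notebook, the same function could be pointed at `SUBDIRS["ai_reports"]` or `SUBDIRS["expert_reviews"]` after a generation run.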