# Ab Initio to PySpark Code Conversion

This notebook implements a code conversion agent, built on Azure OpenAI Service, that converts Ab Initio code to PySpark. The agent will:
1. Read XFR files and schema layouts
2. Generate PySpark transformation code
3. Perform code review and optimization
4. Output production-ready PySpark code
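
The four steps above can be read as a small pipeline in which each stage hands its result to the next. The sketch below only illustrates that flow; `run_conversion` and its three callables are hypothetical names, and the stub lambdas stand in for the file reads and model calls made later in this notebook.

```python
def run_conversion(read_inputs, generate_code, review_code):
    """Chain the stages: read inputs, generate PySpark, then review it."""
    inputs = read_inputs()              # step 1: XFR text and schema layouts
    draft = generate_code(inputs)       # step 2: first PySpark draft
    return review_code(draft, inputs)   # steps 3-4: reviewed final script


# Stub stages (placeholders only, no model call is made here).
result = run_conversion(
    read_inputs=lambda: {"xfr": "placeholder", "input": "x", "output": "y"},
    generate_code=lambda inputs: f"# draft for {len(inputs)} artifacts",
    review_code=lambda draft, inputs: draft + "\n# reviewed",
)
print(result)
```

Passing the stages in as callables keeps the flow easy to test with stubs before any real API call is wired in.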

# Package Installation and Setup
First, let's install the packages the code conversion agent needs. We'll use pip to install them.

In [2]:
# Install required packages (%pip installs into the running kernel's environment)
%pip install openai python-dotenv pandas jupyter notebook ipykernel requests matplotlib
%pip install --upgrade typing_extensions

# Import Required Libraries
Import necessary libraries for working with Azure OpenAI, environment variables, and data processing.
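
The Azure OpenAI client in this notebook is configured entirely from environment variables, so it can help to confirm they are set before the first API call. A minimal sketch, assuming the three variable names used in the client setup; `missing_env_vars` is a hypothetical helper, not part of any library.

```python
import os

# Variable names taken from the AzureOpenAI client setup in this notebook.
REQUIRED_ENV_VARS = (
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
)


def missing_env_vars(names=REQUIRED_ENV_VARS, environ=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Run it after `load_dotenv(...)` so that values from the `.env` file are taken into account.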

In [3]:
# Import necessary libraries
import openai
import pandas as pd
import requests
import matplotlib.pyplot as plt



In [6]:
# Import dependencies
import copy
import json
import os
import textwrap
from pathlib import Path

from dotenv import load_dotenv
from openai import AzureOpenAI
from pydantic import BaseModel, Field

# Load environment variables from the root directory.
# `__file__` is not defined inside a notebook, so resolve the path from the
# current working directory instead (this assumes the notebook runs one level
# below the root; adjust if your layout differs).
root_env_path = Path.cwd().parent / ".env"
print(f"Loading .env file from: {root_env_path}")
load_dotenv(root_env_path)

client = AzureOpenAI(
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
)

# class CaseStudy(BaseModel):
#     case_study_task: str = Field(..., 
#         description="You are a data engineer with expertise in Ab Initio and PySpark. You will be provided with a case study task that requires you to analyze and process data using these technologies. Your goal is to provide a solution that meets the requirements outlined in the task description.")
#     case_study_solution: str = Field(..., 
#         description="The expected solution to the case study.")


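def o3minicall(prompt, reasoning_effort, response_format=None):
    """Send the conversion instructions plus `prompt` to the deployed model.

    Returns the raw chat completion object; the generated text is available
    at `completion.choices[0].message.content`.
    """
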
    system_message = """
    You are a **principal data engineer** who is fluent in both **Ab Initio XFR** and **PySpark** (Databricks).  
    Your task is to translate a given XFR file into a production-ready PySpark solution.

    ### 1. Deliverables  
    | Item | Description |
    |------|-------------|
    | **A. Modular functions** | Reusable PySpark functions that replicate every XFR rule or sub-graph.<br>• Name each function after the business rule it implements.<br>• No hard-coded paths or secrets—parameterize where appropriate. |
    | **B. Pipeline assembly** | A single `main()` (or notebook cell) that:<br>1. Reads **`input_schema`** (from the provided layout).<br>2. Sequentially applies the modular functions.<br>3. Selects / renames columns to match **`output_schema`**.<br>4. Writes the result (Parquet or table) ready for Databricks jobs. |
    | **C. Step-by-step explanation** | For every function and pipeline stage, include a concise markdown comment explaining *what* it does and *why* (1-3 sentences).<br>Focus on business logic, joins, aggregations, date maths, and default rules. |

    ### 2. Input Artifacts (available in variables)  
    * `xfr_content` - full Ab Initio logic  
    * `input_layout` - markdown layout file for the source schema  
    * `output_layout` - markdown layout file for the target schema  

    ### 3. Coding Guidelines  
    * Use **PySpark 3.x DataFrame API** only (no RDDs).  
    * Optimize for readability first; add `.cache()` only where beneficial.  
    * Follow PEP-8 naming (e.g., `add_company_number`, `calculate_policy_term`).  
    * Keep all string literals in a dedicated **`constants.py`** block (you may inline in the prompt for brevity).  
    * Validate data types explicitly; cast to `StringType` where the XFR expects `hive_string_t`.

    ### 4. Output Format  
    Provide a **single markdown code block** containing:  

    1. `import` statements and any constant dictionaries  
    2. All modular function definitions  
    3. The end-to-end pipeline assembly (`main()` or notebook cells)  
    4. Inline markdown comments for explanations  

    ---

    **Begin converting now.**


    """
    
    # Create the base parameters for the API call
    params = {
        "model": "o3",  # replace with the model deployment name
        "messages": [
            {
                "role": "user", 
                "content": system_message+" "+prompt
            }
        ],
        "reasoning_effort": reasoning_effort
    }
    
    # Only add response_format if it's provided
    if response_format is not None:
        params["response_format"] = response_format
        
    # Make the API call with the appropriate parameters
    completion = client.chat.completions.create(**params)
    return completion



## Define inputs

## Define Example Case Study

Provide an example case study that shows how the ASC_VIP_Premium Ab Initio workflow is converted to PySpark, including:
- Input/Output schema definitions
- Transformation functions
- Pipeline assembly
- Data validation
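
One lightweight form of the data validation listed above is checking that a DataFrame's columns match the expected layout. The sketch below is written in plain Python so it runs without Spark; `compare_columns` is a hypothetical helper, and in a pipeline you would pass something like `df_output.columns` as `actual`.

```python
def compare_columns(actual, expected):
    """Compare two column-name lists and report the differences.

    Returns a dict with the columns missing from `actual`, any unexpected
    extras, and whether the two lists match exactly (names and order).
    """
    actual, expected = list(actual), list(expected)
    return {
        "missing": [c for c in expected if c not in actual],
        "unexpected": [c for c in actual if c not in expected],
        "order_matches": actual == expected,
    }


report = compare_columns(
    actual=["messageid", "agreementid", "statecode"],
    expected=["messageid", "agreementid", "companynumber", "statecode"],
)
print(report)
```

A check like this only covers column names; data types and value-level rules would need separate assertions.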

In [7]:
example_case_studies="""
### START EXAMPLE CASE STUDY

**Converting the ASC_VIP_Premium Ab Initio Workflow to PySpark**

Below is a complete walkthrough showing how to translate the Ab Initio *ASC_VIP_Premium* workflow (defined in a `.xfr` transform) into an equivalent, modular PySpark pipeline.
The pipeline reads source data, applies every business rule, and writes an output DataFrame that matches the required layout.

---

## 1 Input Data Schema and Loading

We start by defining the schema described in `simple_input_layout.txt`.
All original fields are `hive_string_t`, so we map them to `StringType` in Spark:

```python
from pyspark.sql.types import StructType, StructField, StringType

input_schema = StructType([
    StructField("messageid",             StringType(), True),
    StructField("agreementid",           StringType(), True),
    StructField("systementcd",           StringType(), True),
    StructField("sourcesystementcd",     StringType(), True),
    StructField("transactionuserid",     StringType(), True),
    StructField("transactioneffdttime",  StringType(), True),
    StructField("transactiontypeentcd",  StringType(), True),
    StructField("transactionprocesseddttimestr", StringType(), True),
    StructField("policynbr",             StringType(), True),
    StructField("policyversionnbr",      StringType(), True),
    StructField("premiumTypeEntCd",      StringType(), True),
    StructField("contracttermexpdttimestr", StringType(), True),
    StructField("contracttermlengthcnt", StringType(), True),
    StructField("contracttermeffdttime", StringType(), True),
    StructField("downPaymentAmtStr",     StringType(), True),
    StructField("downPaymentPct",        StringType(), True),
    StructField("downPaymentPctStr",     StringType(), True),
    StructField("accountingcompanyentcd",StringType(), True),
    StructField("stateproventcd",        StringType(), True),
    StructField("coverageentcd",         StringType(), True),
    StructField("coverageeffdttimestr",  StringType(), True),
    StructField("postalcode",            StringType(), True),
    StructField("fiscalperiod",          StringType(), True),
    StructField("peroccurrencelimitamtstr", StringType(), True),
    StructField("totalpremiumamtstr",    StringType(), True),
    StructField("fipscountyentcd",       StringType(), True),
    StructField("allstatecountycd",      StringType(), True),
    StructField("cityname",              StringType(), True),
    StructField("policystatusentcd",     StringType(), True),
    StructField("policyterminatedentcd", StringType(), True),
    StructField("netchangeamt",          StringType(), True),
    StructField("recordtype",            StringType(), True),
    StructField("batchid",               StringType(), True),
    StructField("rawfilename",           StringType(), True),
    StructField("recordsequenceid",      StringType(), True),
    StructField("accountingyear",        StringType(), True),
    StructField("accountingmonth",       StringType(), True)
])

df_input = (
    spark.read.format("csv")
    .schema(input_schema)
    .option("header", "false")
    .load("/path/to/ASC_VIP_Premium_input_data")
)
```

*(In production you may load from Hive or Parquet; keeping everything as strings preserves fidelity with the Ab Initio types.)*

---

## 2 Transformation Functions

### 2.1 Company Number Mapping

```python
import pyspark.sql.functions as F

COMPANY_MAP = {
    10:  "0031",  27:  "1585",  20:  "0025",  21:  "0776",  22:  "0002",
    60:  "0520",  63:  "0467",  64:  "0067",  65:  "0458",  68:  "1305",
    70:  "0420",  85:  "0517",  95:  "0522",  330: "1481",  270: "0192",
    339: "1437",  365: "1534",  382: "1546",  383: "1545",  357: "1562",
    359: "1564",  360: "1560",  361: "1561",  367: "1563",  387: "2206",
    386: "2207"
}
DEFAULT_COMPANY = "1481"

def add_companynumber(df):
    return df.withColumn(
        "companynumber",
        F.coalesce(
            F.expr(
                "CASE CAST(accountingcompanyentcd AS INT) " +
                " ".join([f"WHEN {k} THEN '{v}'" for k, v in COMPANY_MAP.items()]) +
                " END"
            ),
            F.lit(DEFAULT_COMPANY)
        )
    )
```

### 2.2 Policy Term Calculation

```python
from pyspark.sql.functions import year, month, to_timestamp, lit, lpad

def add_policyterm(df):
    df = (
        df.withColumn("trans_eff_dt", to_timestamp("transactioneffdttime"))
          .withColumn("term_exp_dt",  to_timestamp("contracttermexpdttimestr"))
          .withColumn("term_eff_dt",  to_timestamp("contracttermeffdttime"))
          .withColumn(
              "month_diff",
              (year("term_exp_dt") - year("trans_eff_dt")) * 12 +
              (month("term_exp_dt") - month("trans_eff_dt"))
          )
    )

    cond_newbiz          = F.col("transactiontypeentcd") == lit("0001")
    cond_endorse_nonzero = F.col("transactiontypeentcd").isin("0002", "0010") & (F.col("month_diff") != 0)
    cond_endorse_zero    = F.col("transactiontypeentcd").isin("0002", "0010") & (F.col("month_diff") == 0)
    cond_cancel          = F.col("transactiontypeentcd").isin("0004", "0012")
    cond_cancel_sameDay  = cond_cancel & (F.to_date("trans_eff_dt") == F.to_date("term_eff_dt"))
    cond_cancel_ge12     = cond_cancel & (F.col("month_diff") >= 12)
    cond_cancel_le1      = cond_cancel & (F.col("month_diff") <= 1)

    df = df.withColumn(
        "policyterm",
        F.when(cond_newbiz, lpad("contracttermlengthcnt", 2, "0"))
         .when(cond_endorse_nonzero, lpad(F.col("month_diff").cast("string"), 2, "0"))
         .when(cond_endorse_zero,    lit("01"))
         .when(cond_cancel_sameDay,  lpad("contracttermlengthcnt", 2, "0"))
         .when(cond_cancel_ge12,     lit("11"))
         .when(cond_cancel_le1,      lit("01"))
         .when(cond_cancel,          lpad(F.col("month_diff").cast("string"), 2, "0"))
         .otherwise(lpad("contracttermlengthcnt", 2, "0"))
    )

    return df.drop("trans_eff_dt", "term_exp_dt", "term_eff_dt", "month_diff")
```

### 2.3 Policy Effective Year

```python
def add_policyeffectiveyear(df):
    df = (
        df.withColumn("trans_eff_date",      to_timestamp("transactioneffdttime"))
          .withColumn("policy_term_eff_date", to_timestamp("contracttermeffdttime"))
    )

    cond_endorse    = F.col("transactiontypeentcd") == lit("0002")
    cond_cancel_mid = (
        F.col("transactiontypeentcd").isin("0004", "0012") &
        (F.to_date("trans_eff_date") != F.to_date("policy_term_eff_date"))
    )

    df = df.withColumn(
        "policyeffectiveyear",
        F.when(cond_endorse | cond_cancel_mid, F.date_format("trans_eff_date", "yyyy"))
         .otherwise(F.date_format("policy_term_eff_date", "yyyy"))
    )

    return df.drop("trans_eff_date", "policy_term_eff_date")
```

### 2.4 Classification Code Mapping

```python
def add_classificationcode(df):
    cov   = F.col("coverageentcd").cast("int")
    state = F.col("stateproventcd")

    df = df.withColumn(
        "classificationcode",
        F.when((state != "NY") & (cov == 317), "722000")
         .when((state != "NY") & (cov == 318), "709700")
         .when((state != "NY") & (cov == 319), "721000")
         .when((state != "NY") & (cov == 320), "751300")
         .when((state != "NY") & (cov == 321), "751500")
         .when((state != "NY") & (cov == 322), "799900")
         .when((state != "NY") & (cov == 323), "714400")
         .when((state != "NY") & (cov == 324), "714200")
         .when((state != "NY") & (cov == 325), "717700")
         .when((state == "NY") & (cov == 317), "703200")
         .when((state == "NY") & (cov == 319), "705100")
         .otherwise("")
    )
    return df
```

### 2.5 Miscellaneous Constant / Simple Fields

```python
from datetime import datetime

def add_misc_fields(df, filing_version="1.0"):
    run_ts = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

    return (
        df
        .withColumn("lineofbusinesscode",  F.lit("06"))
        .withColumn("statecode",           F.col("stateproventcd"))
        .withColumn(
            "callyear",
            F.when(F.col("accountingyear").cast("int").isNotNull(),
                   (F.col("accountingyear").cast("int") + 1).cast("string"))
             .otherwise(F.lit(""))
        )
        .withColumn("experienceperiodyear", F.lit("0000"))
        .withColumn("experienceperiodmonth",F.lit("00"))
        .withColumn("experienceperiodday",  F.lit("00"))
        .withColumn("typeoflosscode",       F.lit("00"))
        .withColumn("annualstatementlobcd", F.lit("091"))
        .withColumn("policyidentification", F.lit("10"))
        .withColumn("claimantidentifier",   F.lit("000"))
        .withColumn("claimidentifier",      F.lit("0"*15))
        .withColumn("writtenpremium",
                    F.when(F.length("netchangeamt") > 0, F.col("netchangeamt"))
                     .otherwise(F.lit("0")))
        .withColumn("paidlosses",           F.lit("0"*12))
        .withColumn("paidclaims",           F.lit("0"*12))
        .withColumn("outstandinglosses",    F.lit("0"*12))
        .withColumn("outstandingclaims",    F.lit("0"*12))
        .withColumn("filingtype",           F.lit("PREMIUM"))
        .withColumn("filingruntimestamp",   F.lit(run_ts))
        .withColumn("filingsversion",       F.lit(filing_version))
    )
```

---

## 3 End-to-End Pipeline Assembly

```python
df_transformed = df_input
df_transformed = add_companynumber(df_transformed)
df_transformed = add_policyterm(df_transformed)
df_transformed = add_policyeffectiveyear(df_transformed)
df_transformed = add_classificationcode(df_transformed)
df_transformed = add_misc_fields(df_transformed, filing_version="1.0")
```

### Select Final Output Columns

```python
df_output = df_transformed.select(
    "messageid", "agreementid", "companynumber", "lineofbusinesscode", "statecode",
    "callyear", "experienceperiodyear", "experienceperiodmonth", "experienceperiodday",
    "classificationcode", "typeoflosscode", "policyeffectiveyear",
    "annualstatementlobcd", "policyidentification", "policyterm",
    "claimantidentifier", "claimidentifier", "writtenpremium",
    "paidlosses", "paidclaims", "outstandinglosses", "outstandingclaims",
    "policynbr", "recordsequenceid", "policyversionnbr", "coverageentcd",
    "transactiontypeentcd", "recordtype", "filingtype", "filingruntimestamp",
    "filingsversion", "accountingyear", "accountingmonth"
)
```

---

## 4 Writing the Output

```python
# As Parquet
df_output.write.mode("overwrite").parquet("/path/to/output/ASC_VIP_Premium_transformed.parquet")

# Or as a Hive table
df_output.write.mode("overwrite").saveAsTable("prod.asc_vip_premium_output")
```

The resulting Parquet files (or Hive table) exactly match the required output schema while implementing every business rule from the original Ab Initio workflow in PySpark.

### END EXAMPLE CASE STUDY
"""

In [8]:
#### ATTENTION: AI-generated code can include errors or operations you didn't intend. Review the code in this cell carefully before running it.

# ---------------------------------------------
# Imports
# ---------------------------------------------
import os

# Fabric puts mssparkutils under notebookutils
try:
    from notebookutils import mssparkutils  # ✅ Fabric/Synapse import path
except ImportError:
    # In the very rare case the runtime exposes it top‑level
    import mssparkutils

# ---------------------------------------------
# Paths
# ---------------------------------------------
file_prefix = "file:/lakehouse/default/"  # file prefix for absolute path

files = {
    "xfr"   : f"{file_prefix}Files/code-conversion-data/Complex/PersonalAuto_Premium_iFiling.xfr",
    "input" : f"{file_prefix}Files/code-conversion-data/Complex/complex_input_layout.txt",
    "output": f"{file_prefix}Files/code-conversion-data/Complex/complex_output_layout.txt",
}

# ---------------------------------------------
# Helper to read a small/medium text file
# ---------------------------------------------
def read_text(path: str) -> str:
    # spark.read.text returns a single-column DataFrame named "value", one row per line
    return "\n".join(r[0] for r in spark.read.text(path).collect())

# ---------------------------------------------
# Load the three files
# ---------------------------------------------
try:
    xfr_content           = read_text(files["xfr"])
    input_layout_content  = read_text(files["input"])
    output_layout_content = read_text(files["output"])

    print("✅ Files read successfully!")
    print(f"  • XFR file:      {len(xfr_content):,} characters")
    print(f"  • Input layout:  {len(input_layout_content):,} characters")
    print(f"  • Output layout: {len(output_layout_content):,} characters")

except Exception as err:
    print(f"⚠️  Error reading files with Spark: {err}")
    # Optional quick existence check with mssparkutils
    for label, full_onelake in files.items():
        print(f"  • {label:<6}: {'FOUND' if mssparkutils.fs.exists(full_onelake) else 'NOT FOUND'}  → {full_onelake}")
    raise  # stops execution so downstream vars are not undefined



In [9]:
scenario=f"""

You are a data engineer with expertise in Ab Initio and PySpark.

Convert the following Ab Initio XFR transformation logic into reusable, modular PySpark functions suitable for Databricks.
Also create an end-to-end Spark DataFrame pipeline that reads input using the input layout, 
applies the logic, and outputs a DataFrame matching the output layout.

The final result must be a complete PySpark workflow that reads 
the input data, applies all the business rules, and writes the output, tailored for execution in Databricks.
Explain each step.

**Strict output rules**  
• Return **one complete Python code block** and **nothing else** (no prose, no headings).  
• All explanations must appear as inline comments inside that code block.  
• The script must:  
   1. Define reusable PySpark functions that replicate every XFR rule.  
   2. Build an end-to-end DataFrame pipeline that:  
      a. Reads data using the *Input Layout* schema.  
      b. Applies all business-rule functions.  
      c. Selects/renames columns to match the *Output Layout*.  
      d. Writes the final DataFrame (Parquet or table) for Databricks jobs.  
• No extra commentary outside the code fence.

=== Input Layout ===
{input_layout_content}

=== Output Layout ===
{output_layout_content}

=== XFR Logic ===
{xfr_content}

=== Example ===
Example case study is:
{example_case_studies}

================================================================
Produce only the PySpark script:
```python
# (model inserts the complete Databricks-ready PySpark code with inline comments)
"""

In [22]:
# Instructions for Ab Initio to PySpark conversion:
# 1. Extract transformations and business rules from Ab Initio .xfr and HTML report
# 2. Convert each sub-model into modular PySpark functions
# 3. Create complete PySpark workflow for Databricks execution
# 4. Provide PySpark code with explanations for each step

In [41]:
print(scenario)

## Develop the case study

## Generate and Review Code

Generate PySpark code from the input XFR file and perform a detailed code review to ensure:
- Schema alignment
- Performance optimization
- Best practices compliance
- Security and governance


In [10]:
# 1️⃣  Call the model
output_case_study = o3minicall(scenario, "high")
markdown_text = output_case_study.choices[0].message.content

# 2️⃣  Destination: a lakehouse-relative path (no "file:" prefix)
folder_path = "Files/code-conversion-data/solution"   # lakehouse-relative path
file_name   = "solution_with_details.py"
dest_path   = f"{folder_path}/{file_name}"

from notebookutils import mssparkutils   # Fabric import

# Make sure the folder exists (creates it if missing)
mssparkutils.fs.mkdirs(folder_path)

# 3️⃣  Write the model response (a fenced Python code block) to the file
mssparkutils.fs.put(dest_path, markdown_text, overwrite=True)
print(f"✅ Saved model output to {dest_path}")


In [47]:
from notebookutils import mssparkutils

path = "Files/code-conversion-data/solution/solution_with_details.py"
print(mssparkutils.fs.head(path, 1024))   # first 1 KB


In [48]:
code_review_prompt=f"""
You are a principal Spark architect and code-review specialist.

=============================  TASK 1 - DETAILED REVIEW  =============================
**Cross-check the code against the supplied Input Layout and Output Layout.**  
For every issue you find, cite the offending line(s) or snippet.

A. Schema Mismatches  
  - Columns or types that do not exist in the input layout but are referenced.  
  - Missing required output columns or wrong data types for them.

B. Unused or Redundant Elements  
  - Columns, UDFs, caches, or joins that are created but never used downstream.

C. Performance Risks  
  - Wide shuffles, high-cardinality joins without broadcast/salt, `.collect()` on large DFs, `.repartition(1)`, etc.

D. Anti-Patterns & Style  
  - Hard-coded paths, secrets, magic numbers, long monolithic functions, non-PEP-8 names, replaceable UDFs.

E. Security / Governance  
  - Exposure of PII, unmasked secrets, non-encrypted S3/ADLS paths.

Return the review in **markdown** with four sections:
- **Critical Issues**   (must fix for correctness)  
- **Performance Risks**  
- **Style / Maintainability**  
- **Quick Wins**

=============================  TASK 2 - AUTO-REFACTOR  =============================
Produce a **single, complete Python code block** that resolves every Critical Issue:

* Align all source reads and writes with the provided layouts.  
* Drop unused columns early; broadcast or cache judiciously.  
* Replace UDFs with native Spark SQL functions when feasible.  
* Parameterize paths/secrets; follow PEP-8 (`snake_case`, <= 79-char lines).  
* Annotate each major step with a brief comment (what & why).  
* Guarantee the final DataFrame exactly matches the **Output Layout**.

Output format:
```markdown
### Review
- ...

### Refactored Code
```python
# full, runnable PySpark script with improvements
# ...

=============================  CONTEXT  SECTION  =============================
## Input Layout (complete file)
<input_layout>
{input_layout_content}
</input_layout>

## Output Layout (complete file)
<output_layout>
{output_layout_content}
</output_layout>

## Additional Context (optional)
<constraints>

</constraints>

## PySpark Code to Review
<code>
{output_case_study.choices[0].message.content}
</code>
----------------------------------------------------------------
"""

In [49]:
code_review_python=f"""
You are a principal Spark architect and code-review specialist.

Your mission:

1. **Analyse** the PySpark code against the supplied *Input Layout* and *Output Layout*.  
   - Find schema mismatches, unused columns, performance risks, anti-patterns, and security issues.  
   - Resolve every Critical Issue in the refactor.

2. **Deliver exactly one thing**:  
   **→ A single, complete PySpark code block** that incorporates all fixes and contains concise inline comments explaining *what was changed and why*.  
   - No prose, no review section, no headings—just the final script inside triple back-ticks.  
   - Conform to PEP-8 (`snake_case`, <= 79-char lines).  
   - Align reads/writes with the provided layouts.  
   - Drop unused columns early; use broadcast/caching judiciously.  
   - Replace UDFs with native Spark SQL functions where possible.  
   - Parameterise paths/secrets.  
   - Ensure the resulting DataFrame exactly matches the **Output Layout**.

Return format **must be only**:

```python
# refactored, runnable PySpark script with inline comments
# ...

=============================  CONTEXT  SECTION  =============================
## Input Layout (complete file)
<input_layout>
{input_layout_content}
</input_layout>

## Output Layout (complete file)
<output_layout>
{output_layout_content}
</output_layout>

## Additional Context (optional)
<constraints>
Running on Databricks Runtime 13.x, auto-scaling 2-8 nodes
</constraints>

## PySpark Code to Review
<code>
{output_case_study.choices[0].message.content}
</code>
----------------------------------------------------------------
"""

In [12]:
# code_review=o3minicall(code_review_prompt,"high")
# print(code_review.choices[0].message.content)

In [50]:
# 1️⃣  Call the model
code_review=o3minicall(code_review_prompt,"high")
markdown_text_review = code_review.choices[0].message.content

# 2️⃣  Destination: a lakehouse-relative path (no "file:" prefix)
folder_path = "Files/code-conversion-data/solution"   # lakehouse-relative path
file_name   = "code_review_with_details.md"
dest_path   = f"{folder_path}/{file_name}"

from notebookutils import mssparkutils   # Fabric import

# Make sure the folder exists (creates it if missing)
mssparkutils.fs.mkdirs(folder_path)

# 3️⃣  Write the markdown review to the file
mssparkutils.fs.put(dest_path, markdown_text_review, overwrite=True)
print(f"✅ Saved markdown to {dest_path}")

In [None]:
# code_review_python=o3minicall(code_review_python,"high")
# print(code_review_python.choices[0].message.content)

In [53]:
# 1️⃣  Call the model
# Store the result under a new name: overwriting `code_review_python` would
# replace the prompt string with a completion object and break a re-run.
code_review_python_result = o3minicall(code_review_python, "high")
python_output = code_review_python_result.choices[0].message.content

# 2️⃣  Destination: a lakehouse-relative path (no "file:" prefix)
folder_path = "Files/code-conversion-data/solution"   # lakehouse-relative path
file_name   = "code_review_just_python_output_with_details.py"
dest_path   = f"{folder_path}/{file_name}"

from notebookutils import mssparkutils   # Fabric import

# Make sure the folder exists (creates it if missing)
mssparkutils.fs.mkdirs(folder_path)

# 3️⃣  Write the refactored script returned by the model
mssparkutils.fs.put(dest_path, python_output, overwrite=True)
print(f"✅ Saved refactored script to {dest_path}")
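
The prompts in this notebook ask the model to reply with a fenced code block, so the text saved to the `.py` files may still include the fence markers. Below is a minimal sketch of stripping them before saving, assuming the response uses a standard triple-backtick fence; `extract_python` is a hypothetical helper, not part of the notebook's existing cells.

```python
import re

# Built programmatically so this snippet contains no literal fence marker.
FENCE = "`" * 3

# Matches a fenced block with an optional "python"/"py" info string. The
# closing fence is optional because a response may be cut off before it.
_FENCE_RE = re.compile(
    FENCE + r"(?:python|py)?[ \t]*\n(.*?)(?:\n" + FENCE + r"|\Z)",
    re.DOTALL | re.IGNORECASE,
)


def extract_python(response_text):
    """Return the body of the first fenced code block, or the text unchanged."""
    match = _FENCE_RE.search(response_text)
    if match is None:
        return response_text.strip()
    return match.group(1).strip()


sample = f"{FENCE}python\nprint('hello')\n{FENCE}"
print(extract_python(sample))
```

With this in place, something like `extract_python(python_output)` could be passed to `mssparkutils.fs.put` instead of the raw response, so the saved file holds only Python source.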