# LLM Preprocessing Results – Attribute Span Tagging

This notebook:
- Runs the preprocessing (if needed)
- Loads the results from `results/`
- Lets us inspect some examples

In [1]:
import os
import pandas as pd
from pipeline import run_preprocessing

# Go three folders up from the current working directory
# (assumed to be the notebook's folder)
project_root = os.path.abspath(os.path.join(os.getcwd(), "..", "..", ".."))

print("Correct project root resolved as:", project_root)

infile = os.path.join(
    project_root,
    "data", "raw", "synthetic",
    "B2B_Customer_Feedback_Dataset.xlsx"
)

results_dir = os.path.join(os.getcwd(), "results")

excel_path = os.path.join(results_dir, "b2b_feedback_with_attribute_spans.xlsx")
flat_path = os.path.join(results_dir, "b2b_feedback_attribute_spans_flat.csv")
jsonl_path = os.path.join(results_dir, "b2b_feedback_attribute_spans.jsonl")

comment_col = "Comment"

Correct project root resolved as: c:\Users\tengc\Downloads\develop_ai_pipelines_testing\ai_pipeline_testing


In [2]:
# Re-run preprocessing only if any of the three result files is missing
result_paths = (excel_path, flat_path, jsonl_path)
if not all(os.path.exists(p) for p in result_paths):
    print("Results not found – running preprocessing now...")
    df, flat_df = run_preprocessing(
        infile=infile,
        comment_col=comment_col,
        id_col=None,  # or your ID col if you have one
        out_excel=excel_path,
        out_flat=flat_path,
        out_jsonl=jsonl_path,
        model="gpt-4.1-mini",
        limit=None,  # set to a small int for a quick test run
    )
else:
    print("Results already exist – loading from disk...")
    df = pd.read_excel(excel_path)
    flat_df = pd.read_csv(flat_path)

len(df), len(flat_df)

Results not found – running preprocessing now...


100%|██████████| 50/50 [02:05<00:00,  2.51s/it]


(50, 123)

In [3]:
# Show a few original comments + JSON span mapping
df[[comment_col, "llm_spans_json"]].head(10)


| | Comment | llm_spans_json |
|---|---|---|
| 0 | Status? | {"comment": "Status?", "attributes": {"Deliver... |
| 1 | As per our call just now pls rush the 6 inch A... | {"comment": "As per our call just now pls rush... |
| 2 | MTC received but cert date shows March 2024, o... | {"comment": "MTC received but cert date shows ... |
| 3 | Can faster or not? Client side keep asking me ... | {"comment": "Can faster or not? Client side ke... |
| 4 | Good support from your Katherine during LKG pr... | {"comment": "Good support from your Katherine ... |
| 5 | Wrong item sent again. This is 3rd time alread... | {"comment": "Wrong item sent again. This is 3r... |
| 6 | Refer to my email dated 15 Oct regarding the m... | {"comment": "Refer to my email dated 15 Oct re... |
| 7 | Driver cannot find our Pioneer Road location. ... | {"comment": "Driver cannot find our Pioneer Ro... |
| 8 | Hi the valve you quoted is for freshwater syst... | {"comment": "Hi the valve you quoted is for fr... |
| 9 | Flanges received yesterday, quality looks good... | {"comment": "Flanges received yesterday, quali... |
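
The `llm_spans_json` column holds JSON strings, so it has to be parsed before the spans can be used. A minimal sketch follows, assuming each cell has the same shape as the lines in the JSONL file (a `comment` string plus an `attributes` dict mapping attribute names to lists of spans). The helper name `count_spans_per_attribute` is hypothetical and not part of `pipeline`.

```python
import json


def count_spans_per_attribute(spans_json: str) -> dict:
    """Parse one llm_spans_json cell and return {attribute: number of spans}."""
    record = json.loads(spans_json)
    # "attributes" maps an attribute name to a list of text spans;
    # it can be an empty dict when nothing was tagged.
    return {attr: len(spans) for attr, spans in record.get("attributes", {}).items()}


# Two cells copied from the JSONL output of this run
cells = [
    '{"comment": "Status?", "attributes": {"Delivery": ["Status?"]}}',
    '{"comment": "Good support from your Katherine during LKG project shutdown last month. '
    'Hope can maintain same service level for upcoming Jurong Island works.", '
    '"attributes": {"Service": ["Good support from your Katherine", "same service level"]}}',
]
counts = [count_spans_per_attribute(cell) for cell in cells]
print(counts)
```

On the full frame, the same helper could be applied with `df["llm_spans_json"].apply(count_spans_per_attribute)`.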


In [4]:
# One row per (comment, attribute, span)
flat_df.head(20)

| | row_index | comment | attribute | text_span |
|---|---|---|---|---|
| 0 | 0 | Status? | Delivery | Status? |
| 1 | 1 | As per our call just now pls rush the 6 inch A... | Product | 6 inch ANSI 150 flanges |
| 2 | 1 | As per our call just now pls rush the 6 inch A... | Delivery | rush the 6 inch ANSI 150 flanges to Tuas site ... |
| 3 | 3 | Can faster or not? Client side keep asking me ... | Product | DI fittings |
| 4 | 3 | Can faster or not? Client side keep asking me ... | Delivery | Need the DI fittings by COB today |
| 5 | 4 | Good support from your Katherine during LKG pr... | Service | Good support from your Katherine |
| 6 | 4 | Good support from your Katherine during LKG pr... | Service | same service level |
| 7 | 6 | Refer to my email dated 15 Oct regarding the m... | Product | manhole covers |
| 8 | 6 | Refer to my email dated 15 Oct regarding the m... | Service | Still no reply from your side? |
| 9 | 7 | Driver cannot find our Pioneer Road location. ... | Delivery | Driver cannot find our Pioneer Road location. |
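
Since each `text_span` is meant to be a piece of its `comment`, one useful sanity check is whether every span occurs verbatim in the comment, and at which character offsets. The sketch below is one way to do that with plain string search. `locate_span` is a hypothetical helper, and exact, case-sensitive matching is an assumption about how the spans should be validated.

```python
from typing import Optional, Tuple


def locate_span(comment: str, span: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first exact occurrence of span in comment, else None."""
    start = comment.find(span)
    if start == -1:
        return None  # span is not a verbatim substring (paraphrased or altered)
    return start, start + len(span)


# Comment and Product span taken from row_index 1 of this run
comment = ("As per our call just now pls rush the 6 inch ANSI 150 flanges "
           "to Tuas site by 3pm today. Foreman waiting.")
product_span = "6 inch ANSI 150 flanges"

offsets = locate_span(comment, product_span)
missing = locate_span(comment, "8 inch flanges")  # hypothetical span that is not in the comment
print(offsets, missing)
```

For the whole table, `[locate_span(c, s) for c, s in zip(flat_df["comment"], flat_df["text_span"])]` gives one result per row; any `None` marks a span worth reviewing by hand.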


In [5]:
# Optional: confirm JSONL content
with open(jsonl_path, "r", encoding="utf-8") as f:
    for i, line in enumerate(f):
        if i >= 5:
            break
        print(line.strip())

{"comment": "Status?", "attributes": {"Delivery": ["Status?"]}}
{"comment": "As per our call just now pls rush the 6 inch ANSI 150 flanges to Tuas site by 3pm today. Foreman waiting.", "attributes": {"Product": ["6 inch ANSI 150 flanges"], "Delivery": ["rush the 6 inch ANSI 150 flanges to Tuas site by 3pm today"]}}
{"comment": "MTC received but cert date shows March 2024, our PO is Feb 2024. Which batch is this? Pls clarify asap.", "attributes": {}}
{"comment": "Can faster or not? Client side keep asking me already. Need the DI fittings by COB today if not project delay.", "attributes": {"Product": ["DI fittings"], "Delivery": ["Need the DI fittings by COB today"]}}
{"comment": "Good support from your Katherine during LKG project shutdown last month. Hope can maintain same service level for upcoming Jurong Island works.", "attributes": {"Service": ["Good support from your Katherine", "same service level"]}}
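
The JSONL records and the flat table carry the same information in two shapes. As an illustration of how they relate, the sketch below flattens JSONL lines into one row per (comment, attribute, span), using the flat table's column names. It is not necessarily how `run_preprocessing` builds the CSV, and `flatten_jsonl` is a hypothetical name. Comments with an empty `attributes` dict yield no rows, which is consistent with `row_index` 2 being absent from the flat table above.

```python
import io
import json


def flatten_jsonl(lines):
    """Yield one dict per (comment, attribute, span) from an iterable of JSONL lines."""
    for row_index, line in enumerate(lines):
        line = line.strip()
        if not line:
            continue  # blank lines produce no rows but still advance row_index
        record = json.loads(line)
        for attribute, spans in record.get("attributes", {}).items():
            for span in spans:
                yield {
                    "row_index": row_index,
                    "comment": record["comment"],
                    "attribute": attribute,
                    "text_span": span,
                }


# First three records from this run, held in memory instead of read from jsonl_path
sample = io.StringIO(
    '{"comment": "Status?", "attributes": {"Delivery": ["Status?"]}}\n'
    '{"comment": "As per our call just now pls rush the 6 inch ANSI 150 flanges to Tuas site '
    'by 3pm today. Foreman waiting.", "attributes": {"Product": ["6 inch ANSI 150 flanges"], '
    '"Delivery": ["rush the 6 inch ANSI 150 flanges to Tuas site by 3pm today"]}}\n'
    '{"comment": "MTC received but cert date shows March 2024, our PO is Feb 2024. '
    'Which batch is this? Pls clarify asap.", "attributes": {}}\n'
)
rows = list(flatten_jsonl(sample))
for row in rows:
    print(row["row_index"], row["attribute"], "->", row["text_span"])
```

Passing an open file handle, as in `flatten_jsonl(open(jsonl_path, encoding="utf-8"))`, works the same way, and `pd.DataFrame(rows)` turns the result into a frame for comparison with `flat_df`.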
