# DEMO GPT

#### Author Jiacheng Huang

In this tutorial, we cover the basics of labeling multimodal data and fine-tuning GPT models with the GPT module from LabelGenius (https://github.com/mediaccs/LabelGenius):

1. Labeling image data with GPT
2. Labeling text data with GPT
3. Combining text and image inputs for multimodal classification
4. Fine-tuning GPT for improved task-specific performance (text only)


#### Cite LabelGenius
Huang, J., Zhang, Z., & Su, C. C. (2025). LabelGenius: A Tool for Dynamic and Flexible Labeling for LLM-based Multimodal Content Labeling [Computer software]. GitHub. https://github.com/mediaccs/LabelGenius

In [1]:
! pip install labelgenius



In [2]:
# Import the functions
from labelgenius import classification_GPT, generate_GPT_finetune_jsonl, finetune_GPT,price_estimation, auto_verification

api_key = "" #replace with your API key

Using device: cpu


Demo 1: Single-Category Classification using N24News Dataset
-------------------------------------------------------------

This demo shows how to classify a single news article into one of 24 categories
using the N24News dataset. Each article in the dataset includes both textual
and visual information.

Source: https://aclanthology.org/2022.lrec-1.729/


Each article contains the following fields:
- 'section': Ground truth label (one of 24 categories)
- 'headline': Title of the article
- 'abstract': Short summary of the article
- 'article': Full text content
- 'article_url': Link to the original article
- 'image': Encoded image or metadata (optional)
- 'caption': Image caption
- 'image_id': Unique image identifier
- 'img_dir': Path to the associated image (e.g., 'N24News/imgs_200_sample1/12345.jpg')
- 'article_id': Unique article identifier

Image file: Multimodal_image

Example categories (see prompt_D1_GPT for the complete list):
------------------------
1. Health
2. Science
3. Television
...
24. Global Business

Reference:
----------
Wang, Z., Shan, X., Zhang, X., & Yang, J. (2022).
N24News: A New Dataset for Multimodal News Classification.
In *Proceedings of the Thirteenth Language Resources and Evaluation Conference* (pp. 6768–6775). LREC.


# Example without using LabelGenius
## Step 1: Test the Prompt at OpenAI Playground (https://platform.openai.com/playground/)

Choose the Model: Select the model you want to use (e.g., gpt-4.1, o4-mini, etc.).

Familiarize Yourself with the Parameters: Understand the adjustable parameters, such as temperature and reasoning effort.

Test the Prompts: Experiment with different prompt structures to optimize performance and clarity. Ensure you have a well-structured and effective prompt before moving to implementation.

## Step 2: Run the Prompt in Python

In the playground, remove the GPT-generated responses; only keep the system settings and your chat instructions.

Click "Code" in the Playground and select "Copy Code" for the Python template.

Understand the Example Output: Analyze the output format and structure. You may need to handle parsing, error catching, or post-processing depending on your application.

In [3]:
import os
import time
import pandas as pd
from openai import OpenAI
from tqdm.auto import tqdm


# ── CONFIG ────────────────────────────────────────────────────────────────
os.environ["OPENAI_API_KEY"] = api_key
client = OpenAI()

# ── Testing (copied from OpenAI playground) ───────────────────────────────
response = client.responses.create(
  model="o4-mini",
  input=[
    {
      "role": "developer",
      "content": [
        {
          "type": "input_text",
          "text": """You are given a short news article. Based on its content, return the most appropriate category label from the list below. Respond with only a number from 1 to 24, corresponding to the best-matching category:

1. Health – medical news, public health, fitness, mental health, wellness
2. Science – scientific discoveries, research studies, space, innovations
3. Television – TV shows, reviews, industry news, streaming
4. Travel – tourism, destinations, travel guides, airlines, vacation trends
5. Movies – film news, reviews, box office, upcoming releases
6. Dance – ballet, contemporary styles, street dance, performances
7. Real Estate – housing trends, property sales, architecture, urban planning
8. Economy – macroeconomics, inflation, markets, GDP, financial policies
9. Sports – competitions, athletes, results, professional sports
10. Theater – plays, Broadway, live performances, stage reviews
11. Opinion – editorials, commentary, analysis
12. Music – albums, artists, concerts, festivals, music trends
13. Books – literature, bestsellers, author interviews, book reviews
14. Art & Design – fine arts, museums, exhibitions, visual/design trends
15. Style – fashion trends, beauty, personal style, aesthetics
16. Media – journalism, publishing, digital media, mass communication
17. Food – restaurants, recipes, cooking, culinary culture
18. Well – lifestyle, self-care, mental well-being, personal development
19. Fashion – clothing, designers, fashion weeks, industry insights
20. Technology – AI, gadgets, software, cybersecurity, tech innovations
21. Your Money – personal finance, investing, budgeting, financial planning
22. Education – schools, universities, learning methods, education policies
23. Automobiles – car news, EVs, reviews, industry trends
24. Global Business – international trade, corporations, mergers, global markets

Return the complete label only (e.g., 1), no extra words."""
            }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "\"\"\" Our guide to the city's best classical music and opera.\"\"\"\n      "
        }
      ]
    }
  ],
  text={
    "format": {
      "type": "text"
    }
  },
  reasoning={
    "effort": "medium"
  },
  tools=[],
  store=True
)

In [4]:
response

Response(id='resp_684604fdd7e4819e9d5714fb47ed811400800864b5b64e6a', created_at=1749419261.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='o4-mini-2025-04-16', object='response', output=[ResponseReasoningItem(id='rs_684604fe36ec819e862b6eb0dd9ffd8d00800864b5b64e6a', summary=[], type='reasoning', status=None), ResponseOutputMessage(id='msg_684604fee650819e9b36ec0d37050a5700800864b5b64e6a', content=[ResponseOutputText(annotations=[], text='12', type='output_text')], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, max_output_tokens=None, previous_response_id=None, reasoning=Reasoning(effort='medium', generate_summary=None, summary=None), status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text')), truncation='disabled', usage=ResponseUsage(input_tokens=439, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=71, output_token
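
The raw response above carries the label as plain text (`text='12'`). A minimal sketch of the parsing step mentioned in Step 2 follows. `parse_label` is a hypothetical helper, not part of LabelGenius or the OpenAI SDK; in recent versions of the OpenAI Python SDK the text can usually be read from `response.output_text`, which is assumed here only in the comment.

```python
import re

def parse_label(text, valid_labels):
    """Return the first valid category label found in `text`, or None."""
    # Pull out every run of digits, e.g. "Label: 7 (Real Estate)" -> ["7"]
    for candidate in re.findall(r"\d+", text):
        if candidate in valid_labels:
            return candidate
    return None

# The 24 category labels used in this demo, as strings
valid = {str(i) for i in range(1, 25)}

# With a live response, `text` would typically be response.output_text
print(parse_label("12", valid))
print(parse_label("Label: 7 (Real Estate)", valid))
print(parse_label("I am not sure", valid))
```

Returning `None` for unparseable answers keeps the failure visible, so the caller can decide whether to retry or drop the row.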

## Price Estimation

#### This function estimates the cost required to label a dataset using a language model like GPT, based on OpenAI API pricing.

### Function Purpose
The `price_estimation()` function calculates the estimated total cost for labeling data rows using a language model, where each row is reviewed by multiple voters (e.g., for majority vote labeling).

### Parameters
- **response**: The model's response content (used to compute input/output token lengths).
- **num_rows** *(int)*: Total number of data rows to label.
- **input_cost_per_million** *(float)*: Cost (in USD) per 1 million tokens sent *to* the model (prompt).
- **output_cost_per_million** *(float)*: Cost (in USD) per 1 million tokens returned *by* the model (completion).
- **num_votes** *(int)*: Number of times each row is labeled (e.g., run the same task 3 times to use majority voting).

#### Prices for different models can be found at: https://openai.com/api/pricing/

In [5]:
price_estimation(response,
                 num_rows=20,
                 input_cost_per_million=1.10,
                 output_cost_per_million=4.44,
                 num_votes=3)

# https://openai.com/api/pricing/


🧮 Estimated Cost for 60 calls (20 rows × 3 votes)
• Avg prompt tokens/call:     439
• Avg completion tokens/call: 71
• Pricing ($/1M tokens): prompt=$1.1, completion=$4.44
💰 Total: $0.0479    (±10% → $0.0431–$0.0527)



0.04788840000000001
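
The total above can be checked by hand from the numbers in the printout. The sketch below is not LabelGenius's implementation; `estimate_cost` is a hypothetical helper that assumes the cost is simply (calls) × (per-call prompt cost + per-call completion cost).

```python
def estimate_cost(prompt_tokens, completion_tokens, num_rows,
                  input_cost_per_million, output_cost_per_million, num_votes):
    """Estimate total labeling cost in USD under a flat per-token price."""
    calls = num_rows * num_votes  # 20 rows x 3 votes = 60 calls
    cost_per_call = (prompt_tokens * input_cost_per_million
                     + completion_tokens * output_cost_per_million) / 1_000_000
    return calls * cost_per_call

# Token counts and prices taken from the output above
total = estimate_cost(439, 71, 20, 1.10, 4.44, 3)
print(f"${total:.4f}")  # prints $0.0479
```

The result depends on the single test response being representative: longer articles raise the prompt token count, and reasoning models can vary widely in completion tokens.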

## Initial Labeling with GPT-4o

This step uses the `classification_GPT()` function to label a dataset with a GPT-4o model (here, `gpt-4o-mini`). The goal is to generate a theme/category label for each row of text input through model prompting and majority voting.
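
As a rough illustration of majority voting, the sketch below picks the most frequent label among several independent votes. How `classification_GPT()` breaks ties is not documented here; `majority_vote` is a hypothetical helper that resolves ties in favor of the label seen first.

```python
from collections import Counter

def majority_vote(votes):
    """Return the most common label; ties go to the label seen first."""
    counts = Counter(votes)
    label, _ = counts.most_common(1)[0]
    return label

# Three votes for one row, as with num_votes=3
print(majority_vote(["12", "3", "12"]))
```

With an odd number of votes and a single label per row, a full three-way disagreement is the only tie case, which is one reason 3 votes is a common choice.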


### Demo 1a: Single-Category Text Classification

**Datasets:**
- `D1_1.csv`: Used for initial labeling and fine-tuning.
- `D1_2.csv`: Used for testing the fine-tuned model's performance.


In [6]:
# Define the list of 24 category labels
category_D1_GPT = [
    "1", "2", "3", "4", "5", "6",
    "7", "8", "9", "10", "11", "12",
    "13", "14", "15", "16", "17", "18",
    "19", "20", "21", "22", "23", "24"
]


# Define the descriptive prompts for each category
prompt_D1_GPT = ["""You are given a short news article. Based on its content, return the most appropriate category label from the list below. Respond with only a number from 1 to 24, corresponding to the best-matching category:

1. Health – medical news, public health, fitness, mental health, wellness
2. Science – scientific discoveries, research studies, space, innovations
3. Television – TV shows, reviews, industry news, streaming
4. Travel – tourism, destinations, travel guides, airlines, vacation trends
5. Movies – film news, reviews, box office, upcoming releases
6. Dance – ballet, contemporary styles, street dance, performances
7. Real Estate – housing trends, property sales, architecture, urban planning
8. Economy – macroeconomics, inflation, markets, GDP, financial policies
9. Sports – competitions, athletes, results, professional sports
10. Theater – plays, Broadway, live performances, stage reviews
11. Opinion – editorials, commentary, analysis
12. Music – albums, artists, concerts, festivals, music trends
13. Books – literature, bestsellers, author interviews, book reviews
14. Art & Design – fine arts, museums, exhibitions, visual/design trends
15. Style – fashion trends, beauty, personal style, aesthetics
16. Media – journalism, publishing, digital media, mass communication
17. Food – restaurants, recipes, cooking, culinary culture
18. Well – lifestyle, self-care, mental well-being, personal development
19. Fashion – clothing, designers, fashion weeks, industry insights
20. Technology – AI, gadgets, software, cybersecurity, tech innovations
21. Your Money – personal finance, investing, budgeting, financial planning
22. Education – schools, universities, learning methods, education policies
23. Automobiles – car news, EVs, reviews, industry trends
24. Global Business – international trade, corporations, mergers, global markets

Return the category label only (e.g., 5), no extra words."""]


## GPT-4o

In [7]:
# Perform initial labeling using GPT-4o with majority voting
D1a_GPT_4o_inital_lableing = classification_GPT(
    
    # Path to the input CSV file containing text data
    text_path="Demo_data/D1_1.csv",
    
    # List or dictionary of candidate categories to classify into
    category=category_D1_GPT,
    
    # Prompt that guides GPT's labeling behavior (e.g., task instructions)
    prompt=prompt_D1_GPT,
    
    # List of column names whose content will be combined as input for labeling
    column_4_labeling=["headline", "article", "abstract"],
    
    # GPT model to use (e.g., GPT-4o compact version)
    model="gpt-4o-mini",
    
    # API key for accessing OpenAI services
    api_key=api_key,
    
    # Controls randomness (required for GPT-4o and GPT-4-turbo); 1 = moderate creativity
    temperature=1,
    
    # Type of content (e.g., "text" means textual input, not image or code)
    mode="text",
    
    # Name of the column that will store the labeling output in the resulting DataFrame
    output_column_name="D1a_GPT_4o_inital_lableing",
    
    # Number of labels to generate per row (usually 1 for single-label tasks)
    num_themes=1,
    
    # Number of independent model votes per row (e.g., 3 = majority vote among 3 outputs)
    num_votes=3
)

# Save the labeled results to a CSV file
D1a_GPT_4o_inital_lableing.to_csv("Demo_result/D1a_GPT_4o_inital_lableing.csv", index=False)


Classifying text_class: 100%|██████████| 20/20 [00:19<00:00,  1.01item/s]


In [8]:
D1a_GPT_4o_inital_lableing

Unnamed: 0,section,headline,article_url,article,abstract,article_id,image,caption,image_id,image_path,section_numeric,text_content,image_data_url,final_input,D1a_GPT_4o_inital_lableing,D1a_GPT_4o_inital_lableing_raw,D1a_GPT_4o_inital_lableing_1
0,Fashion & Style,"On This Runway, Non-Models and Cool Kids",https://www.nytimes.com/2016/09/10/fashion/eck...,"Over Labor Day weekend, a steady stream of hop...",How the Eckhaus Latta designers select fashion...,e9cd6477-5eb6-58b2-8e33-fd2d881bf656,https://static01.nyt.com/images/2016/09/10/fas...,"Mike Eckhaus, center right, and Zoe Latta, cen...",e9cd6477-5eb6-58b2-8e33-fd2d881bf656,/Demo_data/D1_imgs_1/e9cd6477-5eb6-58b2-8e33-f...,19,"On This Runway, Non-Models and Cool Kids Over ...",,"On This Runway, Non-Models and Cool Kids Over ...",19,[19],19
1,Theater,"The New 42nd Street, a Theater Nonprofit, Name...",https://www.nytimes.com/2019/06/11/theater/new...,The New 42nd Street -- the nonprofit organizat...,Russell Granet will be the new president and c...,acdcd1ef-71a9-55f2-9993-7a231d57396f,https://static01.nyt.com/images/2019/06/11/art...,"The New 42nd Street board chairwoman, Fiona Ru...",acdcd1ef-71a9-55f2-9993-7a231d57396f,/Demo_data/D1_imgs_1/acdcd1ef-71a9-55f2-9993-7...,10,"The New 42nd Street, a Theater Nonprofit, Name...",,"The New 42nd Street, a Theater Nonprofit, Name...",10,[10],10
2,Economy,Tax Tactics Threaten Public Funds,https://www.nytimes.com/2014/10/02/business/ec...,When the European Commission charged this week...,If global corporations can continue to evade t...,1de5c156-9cea-5971-9fdf-b6a4ce9bf35c,https://static01.nyt.com/images/2012/02/29/bus...,Eduardo Porter,1de5c156-9cea-5971-9fdf-b6a4ce9bf35c,/Demo_data/D1_imgs_1/1de5c156-9cea-5971-9fdf-b...,8,Tax Tactics Threaten Public Funds When the Eur...,,Tax Tactics Threaten Public Funds When the Eur...,24,[24],24
3,Television,Review: 'The Chi' Returns to the South Side of...,https://www.nytimes.com/2019/04/05/arts/televi...,"""The Chi"" was built around a series of shootin...",Lena Waithe's neighborhood drama on Showtime d...,7bb11a9c-2f6d-57f9-bb36-db7f00f59589,https://static01.nyt.com/images/2019/04/05/art...,"Alex Hibbert as Kevin in ""The Chi,"" a sprawlin...",7bb11a9c-2f6d-57f9-bb36-db7f00f59589,/Demo_data/D1_imgs_1/7bb11a9c-2f6d-57f9-bb36-d...,3,Review: 'The Chi' Returns to the South Side of...,,Review: 'The Chi' Returns to the South Side of...,3,[3],3
4,Opinion,Banks Should Face History and Pay Reparations,https://www.nytimes.com/2020/06/26/opinion/sun...,Ms. Blackwell is founder in residence at Polic...,The financial industry can close the wealth ga...,165426ae-1e7c-5193-8596-ef96357645a8,https://static01.nyt.com/images/2020/06/25/opi...,A Wells Fargo bank in Minneapolis was set on f...,165426ae-1e7c-5193-8596-ef96357645a8,/Demo_data/D1_imgs_1/165426ae-1e7c-5193-8596-e...,11,Banks Should Face History and Pay Reparations ...,,Banks Should Face History and Pay Reparations ...,21,[21],21
5,Books,"Mercedes Barcha, Gabriel García Márquez's Wife...",https://www.nytimes.com/2020/08/23/books/merce...,"Mercedes Barcha and her husband, the Colombian...","For nearly 60 years, she was the Nobel Prize-w...",23348aa1-6fa8-5ca8-a875-71fe704e6dd2,https://static01.nyt.com/images/2020/08/24/obi...,"Mercedes Barcha and her husband, the Colombian...",23348aa1-6fa8-5ca8-a875-71fe704e6dd2,/Demo_data/D1_imgs_1/23348aa1-6fa8-5ca8-a875-7...,13,"Mercedes Barcha, Gabriel García Márquez's Wife...",,"Mercedes Barcha, Gabriel García Márquez's Wife...",13,[13],13
6,Opinion,The George Floyd Election,https://www.nytimes.com/2020/06/03/opinion/geo...,"Mike Erlandson, a former chairman of the state...",How the protests come to be viewed may determi...,32d42c23-4bd8-504a-8151-912b4c3f002d,https://static01.nyt.com/images/2020/06/03/opi...,Protests over the killing of George Floyd in N...,32d42c23-4bd8-504a-8151-912b4c3f002d,/Demo_data/D1_imgs_1/32d42c23-4bd8-504a-8151-9...,11,"The George Floyd Election Mike Erlandson, a fo...",,"The George Floyd Election Mike Erlandson, a fo...",11,[11],11
7,Books,"For Black and Mixed-Race Women, Hair and Ident...",https://www.nytimes.com/2020/03/17/books/revie...,What do we lose in translation? How do we know...,"In ""That Hair,"" the Portuguese writer Djaimili...",7e7703aa-4180-57a3-9781-c74a25d31afb,https://static01.nyt.com/images/2020/03/22/boo...,Djaimilia Pereira de Almeida,7e7703aa-4180-57a3-9781-c74a25d31afb,/Demo_data/D1_imgs_1/7e7703aa-4180-57a3-9781-c...,13,"For Black and Mixed-Race Women, Hair and Ident...",,"For Black and Mixed-Race Women, Hair and Ident...",13,[13],13
8,Books,A Contemporary Feminist Spin on the Traditiona...,https://www.nytimes.com/2019/07/08/books/revie...,Motivated in part by guilt as a privileged out...,"In Madeline ffitch's debut, ""Stay and Fight,"" ...",c030fbee-2601-54f5-a9df-75fa8c13368c,https://static01.nyt.com/images/2019/06/27/boo...,"Madeline ffitch's debut novel, ""Stay and Fight...",c030fbee-2601-54f5-a9df-75fa8c13368c,/Demo_data/D1_imgs_1/c030fbee-2601-54f5-a9df-7...,13,A Contemporary Feminist Spin on the Traditiona...,,A Contemporary Feminist Spin on the Traditiona...,13,[13],13
9,Media,Minnesota Public Radio Drops Garrison Keillor ...,https://www.nytimes.com/2017/11/29/business/me...,Minnesota Public Radio said Wednesday that it ...,The network says it has severed all business t...,ffc5888c-af9f-5898-b5db-e838334b487e,https://static01.nyt.com/images/2017/11/30/bus...,Garrison Keillor in 2016.,ffc5888c-af9f-5898-b5db-e838334b487e,/Demo_data/D1_imgs_1/ffc5888c-af9f-5898-b5db-e...,16,Minnesota Public Radio Drops Garrison Keillor ...,,Minnesota Public Radio Drops Garrison Keillor ...,3,[3],3


In [9]:
# Check the accuracy: GPT-4o initial labeling
D1a_GPT_4o_inital_lableing['D1a_GPT_4o_inital_lableing'] = pd.to_numeric(D1a_GPT_4o_inital_lableing['D1a_GPT_4o_inital_lableing'], errors='coerce')
D1a_GPT_4o_inital_lableing['section_numeric'] = pd.to_numeric(D1a_GPT_4o_inital_lableing['section_numeric'], errors='coerce')

auto_verification(
    D1a_GPT_4o_inital_lableing,           # DataFrame containing predictions and true labels
    predicted_cols="D1a_GPT_4o_inital_lableing",  # Column with model-generated labels
    true_cols="section_numeric",                 # Column with ground-truth labels
    category=category_D1_GPT                     # List or dict of label definitions
)


== Verification of 'D1a_GPT_4o_inital_lableing' vs. 'section_numeric' ==
Accuracy:   70.00%
Macro F1:   49.94%
Micro  F1:  70.00%

Full classification report:
              precision    recall  f1-score   support

           1       0.00      0.00      0.00         0
           3       0.75      1.00      0.86         3
           4       1.00      1.00      1.00         1
           5       0.67      1.00      0.80         2
           8       0.00      0.00      0.00         1
          10       1.00      1.00      1.00         1
          11       1.00      0.33      0.50         3
          13       1.00      1.00      1.00         3
          16       0.00      0.00      0.00         1
          18       0.00      0.00      0.00         0
          19       1.00      0.50      0.67         2
          20       1.00      1.00      1.00         1
          21       0.00      0.00      0.00         0
          23       1.00      0.50      0.67         2
          24       0.00      
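
The gap between accuracy and macro F1 in the report above comes from how macro F1 averages over categories. The sketch below is a plain-Python illustration on toy labels, not the code inside `auto_verification()`; it assumes the metric is averaged over every label that appears in either the true or the predicted column, which would explain the rows with a support of 0.

```python
def accuracy(y_true, y_pred):
    """Share of rows where the prediction equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over labels seen in either column."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy labels for illustration only (not taken from the demo data)
y_true = [1, 1, 2, 3]
y_pred = [1, 2, 2, 3]
print(accuracy(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 4))
```

Because every label counts equally, a single wrong prediction into a category with no true examples adds a zero to the average, which pulls macro F1 well below accuracy on small samples.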

### Note: Fine-Tuning GPT Models

Fine-tuning must be performed on a snapshot model, that is, a dated, fixed version of a model (e.g., `gpt-4o-mini-2024-07-18`) rather than a floating alias such as `gpt-4o-mini`. Because a snapshot does not change over time, the fine-tuned model stays tied to a known base model.


GPT reasoning models (i.e., the o-series) cannot be fine-tuned at the time of writing.

#### Minimum Data Requirement:
Fine-tuning GPT-4o starts to show effective learning at roughly 20 examples per category. With less data, the model may not learn the nuances that separate categories.

#### Poor Performance on Tiny Datasets:

When fine-tuning on very small datasets, the optimizer may converge to simply outputting the majority token (the most frequent label), because that is the easiest way to minimize the loss.

This behavior does not reflect true learning; it reflects a bias toward frequent labels rather than an understanding of contextual differences.
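
Before starting a fine-tuning job, it can help to count how many training examples each category has. The sketch below is a hypothetical check, not a LabelGenius function; the threshold of 20 follows the rule of thumb given above.

```python
from collections import Counter

def underrepresented(labels, categories, min_per_category=20):
    """Map each category with too few examples to its example count."""
    counts = Counter(labels)  # missing categories count as 0
    return {c: counts[c] for c in categories if counts[c] < min_per_category}

# Toy label column: 25 examples of category 3, 2 of category 8, none of 10
labels = [3] * 25 + [8] * 2
print(underrepresented(labels, categories=[3, 8, 10]))
```

With a DataFrame, `labels` could be the ground-truth column converted to a list, for example the `section_numeric` column used in this demo.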


## Prepare Data for GPT-4o Fine-Tuning

This section prepares the dataset for fine-tuning GPT-4o by generating a `.jsonl` (JSON Lines) file that follows OpenAI’s fine-tuning format.

### JSONL Structure
Each line in the output file will contain:
1. **`system_prompt`**: The classification or labeling instruction that guides the model (e.g., task prompt).
2. **`input_col`**: The concatenated content from specified columns (e.g., `"headline"`, `"article"`, `"abstract"`) that the model uses as input.
3. **`label_col`**: The target label column that provides the correct classification (e.g., `"section_numeric"`).
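
For orientation, OpenAI's chat fine-tuning format stores one JSON object per line, each with a `messages` list of system, user, and assistant turns. The sketch below builds such a line by hand; `to_finetune_record` is a hypothetical helper, and whether `generate_GPT_finetune_jsonl()` joins columns with a space or formats labels the same way is an assumption.

```python
import json

def to_finetune_record(system_prompt, row, input_cols, label_col):
    """Build one chat-format training example from a row (dict-like)."""
    # Assumption: input columns are joined with a single space
    user_text = " ".join(str(row[c]) for c in input_cols)
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": str(row[label_col])},
        ]
    }

# One row using a headline and label from the demo data
row = {"headline": "Tax Tactics Threaten Public Funds", "section_numeric": 8}
record = to_finetune_record("Return the category label only.", row,
                            input_cols=["headline"], label_col="section_numeric")

# Each record becomes exactly one line of the .jsonl file
line = json.dumps(record, ensure_ascii=False)
print(line)
```

The assistant turn holds only the label, which matches the prompt's instruction to answer with the category number and nothing else.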


In [10]:
generate_GPT_finetune_jsonl(
    D1a_GPT_4o_inital_lableing,  # DataFrame containing the input text and labels
    output_path="Demo_result/D1a_GPT_4o_inital_lableing.jsonl",  # Where to save the JSONL file
    system_prompt=prompt_D1_GPT,  # Instruction guiding the model’s behavior
    input_col=["headline", "article", "abstract"],  # Input fields to combine as prompt context
    label_col=["section_numeric"]  # Target label for training
)


# GPT-4o Fine-Tune Hyperparameters

---

##  **1️⃣ Batch Size Considerations**
The batch size determines how many samples are processed at once. Larger batch sizes may speed up training but can lead to instability, especially with limited data.

###  **Recommendations:**
| Dataset Size         | Recommended Batch Size |
|-----------------------|------------------------|
| **< 1,000 samples**  | `batch_size: 4`       |
| **1,000 - 10,000**   | `batch_size: 8` or `16` |
| **10,000 - 100,000** | `batch_size: 32`      |
| **> 100,000**        | `batch_size: 64`      |

###  **Caution:**
- Increasing batch size can lead to overfitting if the data is not diverse.
- If training loss is unstable, consider lowering the batch size.
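
The batch-size table can be written as a small lookup. This is only a sketch of the recommendations above, reading the third row as covering 10,000 to 100,000 samples; `recommended_batch_size` is a hypothetical helper, not a LabelGenius or OpenAI function.

```python
def recommended_batch_size(num_samples):
    """Return a starting batch size based on the table's recommendations."""
    if num_samples < 1_000:
        return 4
    if num_samples <= 10_000:
        return 8   # the table also allows 16 in this range
    if num_samples <= 100_000:
        return 32
    return 64

# The 20-row demo file falls in the smallest bracket
print(recommended_batch_size(20))
```

Treat the returned value as a starting point and lower it if the training loss is unstable.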

---

##  **2️⃣ Learning Rate Multiplier**
The learning rate multiplier scales the base learning rate of the model. A value of **0.1** means the effective learning rate is **10%** of the model's default.

###  **Recommendations:**
| Data Type                         | Learning Rate Multiplier |
|-----------------------------------|--------------------------|
| **High variance text** (e.g., news articles, social media posts) | `0.02` to `0.05` |
| **Domain-specific text** (e.g., scientific abstracts, legal documents) | `0.02` to `0.1`  |
| **Noisy or mixed-domain data** (e.g., user-generated content) | `0.01` to `0.02` |
| **Highly structured data** (e.g., technical manuals) | `0.05` to `0.1` |

###  **Caution:**
- If loss suddenly spikes, reduce the learning rate multiplier.
- If the model underfits (low accuracy and persistently high training loss), slightly increase the multiplier.
- Avoid setting too high a learning rate (`> 0.1`) unless you have very clean and structured data.

---

In [11]:
## Fine-Tune GPT-4o

D1a_GPT_4o_model_finetune = finetune_GPT(
    training_file_path="Demo_result/D1a_GPT_4o_inital_lableing.jsonl",  # Path to training data in JSONL format
    model="gpt-4o-mini-2024-07-18",  # Base GPT model to fine-tune
    hyperparameters={"batch_size": 8, "learning_rate_multiplier": 0.01},  # Training configuration
    api_key=api_key  # OpenAI API key for authentication
)

Started fine-tune job ftjob-gqG8ue60J5l2yEZamyiWHh4f
[0s] status=validating_files
[15s] status=validating_files
[30s] status=validating_files
[45s] status=validating_files
[60s] status=validating_files
[75s] status=running
[90s] status=running
[105s] status=running
[120s] status=running
[135s] status=running
[150s] status=running
[165s] status=running
[180s] status=running
[195s] status=running
[210s] status=running
[225s] status=running
[240s] status=running
[255s] status=running
[270s] status=running
[285s] status=succeeded
✅ succeeded: ft:gpt-4o-mini-2024-07-18:jcs-research::BgIEt8f3


In [12]:
D1a_GPT_4o_model_finetune

'ft:gpt-4o-mini-2024-07-18:jcs-research::BgIEt8f3'

#### Note: You can wait in the code until the status is `succeeded`
##### Alternatively, you will receive an email from OpenAI when the fine-tuning job is done

#### Example email
Hi JCs' research,
Your fine-tuning job ftjob-xxxx has successfully completed, and a new model **ft:gpt-4o-mini-2024-07-18:xxx::xxxx** has been created for your use.

Copy the model identifier into the `model` parameter of `classification_GPT()`.
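
Waiting in code amounts to polling the job status until it reaches a terminal state. The sketch below shows the loop in isolation; `wait_for_job` is a hypothetical helper, and the 15-second interval only mirrors the log above. With the OpenAI Python SDK, `get_status` could be something like `lambda: client.fine_tuning.jobs.retrieve(job_id).status`.

```python
import time

# Statuses after which a fine-tuning job will not change any more
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}

def wait_for_job(get_status, poll_seconds=15, max_polls=200, sleep=time.sleep):
    """Poll `get_status()` until the job reaches a terminal status."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("fine-tuning job did not finish in time")

# Simulated status sequence so the example runs without an API key
fake_statuses = iter(["validating_files", "running", "running", "succeeded"])
result = wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(result)
```

Capping the number of polls avoids an endless loop if a job stalls; the email notification remains a fallback in that case.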

## Classify with Fine-Tuned GPT-4o Model

This function performs text classification using a **fine-tuned GPT-4o model**, where all other parameters (e.g., prompt, input columns, voting) are the same as used with base models.

> ✅ **Key difference**: The `model` parameter is now a fine-tuned model ID (e.g., returned from `finetune_GPT()`), rather than a base model name like `"gpt-4o-mini"`.


In [13]:

D1_GPT_4o_finetune = classification_GPT(
    text_path="Demo_data/D1_2.csv",
    category = category_D1_GPT,
    prompt = prompt_D1_GPT,
    column_4_labeling=["headline", "article", "abstract"],
    model = D1a_GPT_4o_model_finetune,
    api_key = api_key,
    temperature = 1,
    mode = "text",
    output_column_name="D1_GPT_4o_finetune",
    num_themes = 1,
    num_votes = 1)



D1_GPT_4o_finetune.to_csv("Demo_result/D1_GPT_4o_finetune.csv", index=False)

Classifying text_class: 100%|██████████| 20/20 [01:44<00:00,  5.24s/item]


In [14]:
# Check the accuracy: fine-tuned GPT-4o model
auto_verification(
    D1_GPT_4o_finetune,
    predicted_cols="D1_GPT_4o_finetune",
    true_cols="section_numeric",
    category=category_D1_GPT
);


== Verification of 'D1_GPT_4o_finetune' vs. 'section_numeric' ==
Accuracy:   70.00%
Macro F1:   56.25%
Micro  F1:  70.00%

Full classification report:
              precision    recall  f1-score   support

         2.0       1.00      1.00      1.00         1
         3.0       0.50      1.00      0.67         1
         5.0       0.00      0.00      0.00         0
         7.0       1.00      1.00      1.00         1
         8.0       0.00      0.00      0.00         0
         9.0       0.50      1.00      0.67         1
        10.0       1.00      1.00      1.00         2
        13.0       1.00      1.00      1.00         2
        14.0       1.00      1.00      1.00         1
        16.0       1.00      0.33      0.50         3
        17.0       1.00      1.00      1.00         2
        19.0       0.00      0.00      0.00         1
        20.0       0.50      0.50      0.50         2
        21.0       0.00      0.00      0.00         1
        23.0       1.00      0.50    

#### Note:
The error observed here is expected and occurs because OpenAI did not return a result from the anticipated category. The code is designed to handle this automatically by sending another request (Default = 3). However, if you consistently encounter this issue throughout the output, please verify the prompt settings in the Playground.
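
The retry behavior described in this note can be pictured as a validate-and-resend loop. The sketch below is an illustration only, not LabelGenius's internal code; `label_with_retry` and `ask_model` are hypothetical names, and giving up with `None` after three attempts is an assumption based on the default mentioned above.

```python
def label_with_retry(ask_model, valid_labels, max_attempts=3):
    """Call `ask_model()` until it returns a valid label or attempts run out."""
    for _ in range(max_attempts):
        answer = ask_model().strip()
        if answer in valid_labels:
            return answer
    return None  # no valid label after all attempts

valid = {str(i) for i in range(1, 25)}

# Simulated model: one out-of-category answer, then a valid one
answers = iter(["Music", " 12 "])
print(label_with_retry(lambda: next(answers), valid))
```

If most rows need retries, the prompt itself is the likelier problem, which is why the note suggests re-checking it in the Playground.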

## o4-mini (reasoning model)


## Classify with the o4-mini Reasoning Model

This step uses the `classification_GPT()` function to label data with the **o4-mini reasoning model (`o4-mini`)**.

> ⚠️ **Note**: Reasoning models such as `o4-mini` (the o-series) **require the `effort` parameter** instead of `temperature`.  
> `effort` controls how much computation is used per request and may affect accuracy, latency, and cost.
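
One way to keep the two parameter styles apart is to choose the request options from the model name. The sketch below is a hypothetical helper, not how `classification_GPT()` is known to work; the name check (`startswith("o")`) is a heuristic, and the `reasoning` shape follows the Playground example earlier in this tutorial.

```python
def sampling_kwargs(model, temperature=1.0, effort="medium"):
    """Return request options suited to reasoning or non-reasoning models."""
    # Heuristic: o-series reasoning models have names like "o4-mini"
    if model.startswith("o"):
        return {"reasoning": {"effort": effort}}
    return {"temperature": temperature}

print(sampling_kwargs("o4-mini"))
print(sampling_kwargs("gpt-4o-mini", temperature=0.8))
```

Fine-tuned model IDs start with `ft:`, so under this heuristic they are treated as non-reasoning models and receive `temperature`.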

---

In [15]:
D1a_GPT_o4_inital_lableing = classification_GPT(
    text_path="Demo_data/D1_1.csv",                    # Input CSV file containing text data
    category=category_D1_GPT,                          # Category definitions for classification
    prompt=prompt_D1_GPT,                              # Prompt guiding labeling (optional for fine-tuned)
    column_4_labeling=["headline", "article", "abstract"],  # Columns to concatenate as model input
    model="o4-mini",                                   # o4-mini reasoning model
    api_key=api_key,                                   # OpenAI API key
    effort="medium",                                   # Required for o4 models; options: "low", "medium", "high"
    mode="text",                                       # Type of input
    output_column_name="D1a_GPT_o4_inital_lableing",   # Output column for prediction results
    num_themes=1,                                      # Number of labels to predict per row
    num_votes=1                                        # No majority voting; single run per instance
)

# Save the labeled results to a CSV file
D1a_GPT_o4_inital_lableing.to_csv("Demo_result/D1a_GPT_o4_inital_lableing.csv", index=False)


Classifying text_class: 100%|██████████| 20/20 [00:52<00:00,  2.64s/item]


In [16]:
## Check the accuracy: GPT o4 initial labeling
auto_verification(
    D1a_GPT_o4_inital_lableing,
    predicted_cols="D1a_GPT_o4_inital_lableing",
    true_cols="section_numeric",
    category=category_D1_GPT
);


== Verification of 'D1a_GPT_o4_inital_lableing' vs. 'section_numeric' ==
Accuracy:   89.47%
Macro F1:   84.44%
Micro  F1:  89.47%

Full classification report:
              precision    recall  f1-score   support

         3.0       1.00      1.00      1.00         3
         4.0       1.00      1.00      1.00         1
         5.0       0.67      1.00      0.80         2
         8.0       1.00      1.00      1.00         1
        10.0       1.00      1.00      1.00         1
        11.0       1.00      1.00      1.00         2
        13.0       1.00      1.00      1.00         3
        16.0       1.00      1.00      1.00         1
        18.0       0.00      0.00      0.00         0
        19.0       1.00      0.50      0.67         2
        20.0       1.00      1.00      1.00         1
        23.0       1.00      0.50      0.67         2

    accuracy                           0.89        19
   macro avg       0.89      0.83      0.84        19
weighted avg       0.96     

#### Note: GPT reasoning models (i.e., the o-series) cannot be fine-tuned at the time of writing

# Demo 1b: Classify single-category image data

**Datasets:**
- `imgs_40`: Used for initial labeling and fine-tuning.
- `imgs_40_2`: Used for testing the fine-tuned model's performance.


In [17]:
# Define the list of 24 category labels
category_D1_GPT = [
    "1", "2", "3", "4", "5", "6",
    "7", "8", "9", "10", "11", "12",
    "13", "14", "15", "16", "17", "18",
    "19", "20", "21", "22", "23", "24"
]


# Define the descriptive prompts for each category
prompt_D1_GPT = ["""You are given an image. Based on its content, return the most appropriate category label from the list below. Respond with only a number from 1 to 24, corresponding to the best-matching category:

1. Health – medical news, public health, fitness, mental health, wellness
2. Science – scientific discoveries, research studies, space, innovations
3. Television – TV shows, reviews, industry news, streaming
4. Travel – tourism, destinations, travel guides, airlines, vacation trends
5. Movies – film news, reviews, box office, upcoming releases
6. Dance – ballet, contemporary styles, street dance, performances
7. Real Estate – housing trends, property sales, architecture, urban planning
8. Economy – macroeconomics, inflation, markets, GDP, financial policies
9. Sports – competitions, athletes, results, professional sports
10. Theater – plays, Broadway, live performances, stage reviews
11. Opinion – editorials, commentary, analysis
12. Music – albums, artists, concerts, festivals, music trends
13. Books – literature, bestsellers, author interviews, book reviews
14. Art & Design – fine arts, museums, exhibitions, visual/design trends
15. Style – fashion trends, beauty, personal style, aesthetics
16. Media – journalism, publishing, digital media, mass communication
17. Food – restaurants, recipes, cooking, culinary culture
18. Well – lifestyle, self-care, mental well-being, personal development
19. Fashion – clothing, designers, fashion weeks, industry insights
20. Technology – AI, gadgets, software, cybersecurity, tech innovations
21. Your Money – personal finance, investing, budgeting, financial planning
22. Education – schools, universities, learning methods, education policies
23. Automobiles – car news, EVs, reviews, industry trends
24. Global Business – international trade, corporations, mergers, global markets

Return the category label only (e.g., 5), no extra words."""]





## GPT-4o-mini


In [18]:
D1b_GPT_4o_inital_lableing = classification_GPT(
    image_dir="Demo_data/D1_imgs_1",
    text_path="Demo_data/D1_1.csv",
    category = category_D1_GPT,
    prompt = prompt_D1_GPT,
    model = "gpt-4o-mini",
    api_key = api_key,
    temperature = 0.8,
    mode = "image", # The mode is image now 
    output_column_name="D1b_GPT_4o_inital_lableing",
    num_themes = 1,
    num_votes = 3)


D1b_GPT_4o_inital_lableing.to_csv("Demo_result/D1b_GPT_4o_inital_lableing.csv", index=False)

Classifying image_class: 100%|██████████| 20/20 [00:45<00:00,  2.30s/item]
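
The call above sets `num_votes = 3`. As a rough illustration of what repeated voting buys us, here is a minimal sketch of a majority vote. It assumes that each vote is one independent model answer and that the most frequent answer wins; `majority_vote` is a hypothetical helper, not a LabelGenius function, and the library may aggregate votes differently.

```python
from collections import Counter

def majority_vote(votes):
    """Return the most frequent label; ties go to the label seen first."""
    counts = Counter(votes)
    # most_common orders by count; equal counts keep first-seen order
    return counts.most_common(1)[0][0]

# Three hypothetical answers for the same image
winner = majority_vote(["18", "13", "18"])
print(winner)  # 18
```

With a non-zero temperature the answers can differ between calls, so an odd number of votes reduces the chance of a tie.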


In [19]:
D1b_GPT_4o_inital_lableing

Unnamed: 0,image_id,image_dir,text_content,image_data_url,final_input,D1b_GPT_4o_inital_lableing,D1b_GPT_4o_inital_lableing_raw,D1b_GPT_4o_inital_lableing_1
0,e05685c9-6cca-5415-94c4-b4977e4fbcea,Demo_data/D1_imgs_1/e05685c9-6cca-5415-94c4-b4...,,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",18,[18],18
1,23348aa1-6fa8-5ca8-a875-71fe704e6dd2,Demo_data/D1_imgs_1/23348aa1-6fa8-5ca8-a875-71...,,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",13,[13],13
2,acdcd1ef-71a9-55f2-9993-7a231d57396f,Demo_data/D1_imgs_1/acdcd1ef-71a9-55f2-9993-7a...,,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",16,[16],16
3,c030fbee-2601-54f5-a9df-75fa8c13368c,Demo_data/D1_imgs_1/c030fbee-2601-54f5-a9df-75...,,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",18,[18],18
4,f7af67a1-d43a-5949-a837-1fcc7f8b2a98,Demo_data/D1_imgs_1/f7af67a1-d43a-5949-a837-1f...,,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",23,[23],23
5,1de5c156-9cea-5971-9fdf-b6a4ce9bf35c,Demo_data/D1_imgs_1/1de5c156-9cea-5971-9fdf-b6...,,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",11,[11],11
6,9b15aa80-ee3b-5f5a-9055-a94d36196d85,Demo_data/D1_imgs_1/9b15aa80-ee3b-5f5a-9055-a9...,,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",20,[20],20
7,32d42c23-4bd8-504a-8151-912b4c3f002d,Demo_data/D1_imgs_1/32d42c23-4bd8-504a-8151-91...,,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",11,[11],11
8,0b5c86ae-81bd-5f9b-854a-d5a9b3c231f2,Demo_data/D1_imgs_1/0b5c86ae-81bd-5f9b-854a-d5...,,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",9,[9],9
9,7d93428e-d390-5cb2-a4a8-cb3c5a93a1c2,Demo_data/D1_imgs_1/7d93428e-d390-5cb2-a4a8-cb...,,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",1,[1],1


In [20]:
## Check the accuracy: GPT-4o initial labeling
# Merge the image labeling results with the human labels
D1b_CLIP_human = pd.read_csv("Demo_data/D1_1.csv")
D1b_GPT_4o_inital_lableing = pd.merge(D1b_GPT_4o_inital_lableing, D1b_CLIP_human, on="image_id", how="inner")
D1b_GPT_4o_inital_lableing.to_csv("Demo_result/D1b_GPT_4o_inital_lableing.csv", index=False)


auto_verification(
    D1b_GPT_4o_inital_lableing,
    predicted_cols="D1b_GPT_4o_inital_lableing",
    true_cols="section_numeric",
    category=category_D1_GPT
);


== Verification of 'D1b_GPT_4o_inital_lableing' vs. 'section_numeric' ==
Accuracy:   44.44%
Macro F1:   35.00%
Micro  F1:  44.44%

Full classification report:
              precision    recall  f1-score   support

         3.0       0.50      1.00      0.67         2
         4.0       1.00      1.00      1.00         1
         5.0       1.00      0.50      0.67         2
         8.0       0.00      0.00      0.00         1
         9.0       0.00      0.00      0.00         0
        10.0       0.00      0.00      0.00         1
        11.0       0.33      0.50      0.40         2
        13.0       1.00      0.33      0.50         3
        15.0       0.00      0.00      0.00         0
        16.0       0.00      0.00      0.00         1
        18.0       0.00      0.00      0.00         0
        19.0       0.00      0.00      0.00         2
        20.0       1.00      1.00      1.00         1
        23.0       1.00      0.50      0.67         2

    accuracy                
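
The summary figures in the report above (accuracy, macro F1, micro F1) can be reproduced by hand. The sketch below uses plain Python and a made-up toy example, not the demo data. It is not how `auto_verification` is implemented, only an illustration of the definitions. Two things are worth noticing: for single-label tasks micro F1 equals accuracy, and a class that is predicted but never appears in the human labels (support 0) still counts in the macro average with an F1 of 0.

```python
def single_label_metrics(y_true, y_pred):
    """Return (accuracy, macro_f1, micro_f1) for single-label predictions."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    f1_per_class = []
    tp_total = fp_total = fn_total = 0
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1_per_class.append(2 * tp / denom if denom else 0.0)
        tp_total, fp_total, fn_total = tp_total + tp, fp_total + fp, fn_total + fn

    macro_f1 = sum(f1_per_class) / len(f1_per_class)
    micro_f1 = 2 * tp_total / (2 * tp_total + fp_total + fn_total)
    return accuracy, macro_f1, micro_f1

# Toy labels (hypothetical): class 9 is predicted but never true
acc, macro, micro = single_label_metrics([3, 3, 5, 8], [3, 5, 5, 9])
print(acc, round(macro, 4), micro)
```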

## GPT-o4 (reasoning model)


In [21]:
D1b_GPT_o4_inital_lableing = classification_GPT(
    image_dir="Demo_data/D1_imgs_1",
    text_path="Demo_data/D1_1.csv",
    category = category_D1_GPT,
    prompt = prompt_D1_GPT,
    model = "o4-mini",
    api_key = api_key,
    effort = "medium",
    mode = "image",# The mode is image now 
    output_column_name="D1b_GPT_o4_inital_lableing",
    num_themes = 1,
    num_votes = 1)


D1b_GPT_o4_inital_lableing.to_csv("Demo_result/D1b_GPT_o4_inital_lableing.csv", index=False)

Classifying image_class: 100%|██████████| 20/20 [01:39<00:00,  4.97s/item]


In [22]:
D1b_GPT_o4_inital_lableing

Unnamed: 0,image_id,image_dir,text_content,image_data_url,final_input,D1b_GPT_o4_inital_lableing,D1b_GPT_o4_inital_lableing_raw,D1b_GPT_o4_inital_lableing_1
0,e05685c9-6cca-5415-94c4-b4977e4fbcea,Demo_data/D1_imgs_1/e05685c9-6cca-5415-94c4-b4...,,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",14,[14],14
1,23348aa1-6fa8-5ca8-a875-71fe704e6dd2,Demo_data/D1_imgs_1/23348aa1-6fa8-5ca8-a875-71...,,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",13,[13],13
2,acdcd1ef-71a9-55f2-9993-7a231d57396f,Demo_data/D1_imgs_1/acdcd1ef-71a9-55f2-9993-7a...,,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",24,[24],24
3,c030fbee-2601-54f5-a9df-75fa8c13368c,Demo_data/D1_imgs_1/c030fbee-2601-54f5-a9df-75...,,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",11,[11],11
4,f7af67a1-d43a-5949-a837-1fcc7f8b2a98,Demo_data/D1_imgs_1/f7af67a1-d43a-5949-a837-1f...,,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",23,[23],23
5,1de5c156-9cea-5971-9fdf-b6a4ce9bf35c,Demo_data/D1_imgs_1/1de5c156-9cea-5971-9fdf-b6...,,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",11,[11],11
6,9b15aa80-ee3b-5f5a-9055-a94d36196d85,Demo_data/D1_imgs_1/9b15aa80-ee3b-5f5a-9055-a9...,,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",20,[20],20
7,32d42c23-4bd8-504a-8151-912b4c3f002d,Demo_data/D1_imgs_1/32d42c23-4bd8-504a-8151-91...,,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",11,[11],11
8,0b5c86ae-81bd-5f9b-854a-d5a9b3c231f2,Demo_data/D1_imgs_1/0b5c86ae-81bd-5f9b-854a-d5...,,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",9,[9],9
9,7d93428e-d390-5cb2-a4a8-cb3c5a93a1c2,Demo_data/D1_imgs_1/7d93428e-d390-5cb2-a4a8-cb...,,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",17,[17],17


In [23]:
## Check the accuracy: GPT o4 initial labeling
D1b_CLIP_human = pd.read_csv("Demo_data/D1_1.csv")
D1b_GPT_o4_inital_lableing = pd.merge(D1b_GPT_o4_inital_lableing, D1b_CLIP_human, on="image_id", how="inner")
D1b_GPT_o4_inital_lableing.to_csv("Demo_result/D1b_GPT_o4_inital_lableing.csv", index=False)  # write to the o4 file, not the 4o one


auto_verification(
    D1b_GPT_o4_inital_lableing,
    predicted_cols="D1b_GPT_o4_inital_lableing",
    true_cols="section_numeric",
    category=category_D1_GPT
);


== Verification of 'D1b_GPT_o4_inital_lableing' vs. 'section_numeric' ==
Accuracy:   31.58%
Macro F1:   25.21%
Micro  F1:  31.58%

Full classification report:
              precision    recall  f1-score   support

         3.0       0.00      0.00      0.00         2
         4.0       1.00      1.00      1.00         1
         5.0       1.00      0.50      0.67         2
         7.0       0.00      0.00      0.00         0
         8.0       0.00      0.00      0.00         1
         9.0       0.00      0.00      0.00         0
        10.0       0.00      0.00      0.00         1
        11.0       0.14      0.33      0.20         3
        13.0       1.00      0.33      0.50         3
        14.0       0.00      0.00      0.00         0
        16.0       0.00      0.00      0.00         1
        17.0       0.00      0.00      0.00         0
        19.0       0.00      0.00      0.00         2
        20.0       1.00      1.00      1.00         1
        23.0       1.00      

#### Note: Current GPT models place strict restrictions on fine-tuning with images, so image fine-tuning is not used here.

# Demo 1c: Classify single-category text + image data

**Datasets:**
- `nytimes_40.csv`: Used for initial labeling and fine-tuning.
- `imgs_40`: Used for initial labeling and fine-tuning.

- `nytimes_40_2.csv`: Used for testing the fine-tuned model's performance.
- `imgs_40_2`: Used for testing the fine-tuned model's performance.


The text dataset should contain an `img_dir` column that maps each row to its image file.
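
Before spending API credits, it can be worth checking that every row really points to an image on disk. The following is a minimal sketch using only the standard library. `find_missing_images` is a hypothetical helper, and the column name is a parameter because your file may use a different name than `img_dir`. The demo part builds a throwaway folder so the block runs anywhere.

```python
import os
import tempfile

def find_missing_images(rows, path_column="img_dir"):
    """Return the path values that are empty or do not point to an existing file."""
    missing = []
    for row in rows:
        path = row.get(path_column, "")
        if not path or not os.path.isfile(path):
            missing.append(path)
    return missing

# Self-contained demo: one image exists, one does not
with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "a.jpg")
    open(real, "wb").close()
    rows = [{"img_dir": real}, {"img_dir": os.path.join(tmp, "b.jpg")}]
    missing = find_missing_images(rows)

print(len(missing))  # 1
```

For a real dataset, the rows could come from `csv.DictReader` or from `DataFrame.to_dict("records")`.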

## GPT: third-party API labeling

In [24]:
# Define the list of 24 category labels
category_D1_GPT = [
    "1", "2", "3", "4", "5", "6",
    "7", "8", "9", "10", "11", "12",
    "13", "14", "15", "16", "17", "18",
    "19", "20", "21", "22", "23", "24"
]


# Define the descriptive prompts for each category
prompt_D1_GPT = ["""You are given an article and image. Based on its content, return the most appropriate category label from the list below. Respond with only a number from 1 to 24, corresponding to the best-matching category:

1. Health – medical news, public health, fitness, mental health, wellness
2. Science – scientific discoveries, research studies, space, innovations
3. Television – TV shows, reviews, industry news, streaming
4. Travel – tourism, destinations, travel guides, airlines, vacation trends
5. Movies – film news, reviews, box office, upcoming releases
6. Dance – ballet, contemporary styles, street dance, performances
7. Real Estate – housing trends, property sales, architecture, urban planning
8. Economy – macroeconomics, inflation, markets, GDP, financial policies
9. Sports – competitions, athletes, results, professional sports
10. Theater – plays, Broadway, live performances, stage reviews
11. Opinion – editorials, commentary, analysis
12. Music – albums, artists, concerts, festivals, music trends
13. Books – literature, bestsellers, author interviews, book reviews
14. Art & Design – fine arts, museums, exhibitions, visual/design trends
15. Style – fashion trends, beauty, personal style, aesthetics
16. Media – journalism, publishing, digital media, mass communication
17. Food – restaurants, recipes, cooking, culinary culture
18. Well – lifestyle, self-care, mental well-being, personal development
19. Fashion – clothing, designers, fashion weeks, industry insights
20. Technology – AI, gadgets, software, cybersecurity, tech innovations
21. Your Money – personal finance, investing, budgeting, financial planning
22. Education – schools, universities, learning methods, education policies
23. Automobiles – car news, EVs, reviews, industry trends
24. Global Business – international trade, corporations, mergers, global markets

Return the category label only (e.g., 5), no extra words."""]





## GPT-4o-mini


In [25]:
D1c_GPT_4o_inital_lableing = classification_GPT(
    text_path="Demo_data/D1_1.csv",
    image_dir="Demo_data/D1_imgs_1",
    category=category_D1_GPT,
    prompt=prompt_D1_GPT,
    column_4_labeling=["headline", "article", "abstract"],
    model = "gpt-4o-mini",
    api_key = api_key,
    temperature = 1,
    mode = "both",  # The mode is text + image now
    output_column_name="D1c_GPT_4o_inital_lableing",
    num_themes = 1,
    num_votes = 1)


D1c_GPT_4o_inital_lableing.to_csv("Demo_result/D1c_GPT_4o_inital_lableing.csv", index=False)

Classifying text_class: 100%|██████████| 20/20 [00:12<00:00,  1.61item/s]
Classifying image_class: 100%|██████████| 20/20 [00:36<00:00,  1.83s/item]
Classifying final_class: 100%|██████████| 20/20 [00:42<00:00,  2.15s/item]


In [26]:
D1c_GPT_4o_inital_lableing

Unnamed: 0,section,headline,article_url,article,abstract,article_id,image,caption,image_id,image_path,section_numeric,image_dir,text_content,image_data_url,final_input,text_class,image_class,D1c_GPT_4o_inital_lableing,D1c_GPT_4o_inital_lableing_raw,D1c_GPT_4o_inital_lableing_1
0,Fashion & Style,"On This Runway, Non-Models and Cool Kids",https://www.nytimes.com/2016/09/10/fashion/eck...,"Over Labor Day weekend, a steady stream of hop...",How the Eckhaus Latta designers select fashion...,e9cd6477-5eb6-58b2-8e33-fd2d881bf656,https://static01.nyt.com/images/2016/09/10/fas...,"Mike Eckhaus, center right, and Zoe Latta, cen...",e9cd6477-5eb6-58b2-8e33-fd2d881bf656,/Demo_data/D1_imgs_1/e9cd6477-5eb6-58b2-8e33-f...,19,Demo_data/D1_imgs_1/e9cd6477-5eb6-58b2-8e33-fd...,"On This Runway, Non-Models and Cool Kids Over ...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","On This Runway, Non-Models and Cool Kids Over ...",[19],[15],19,[19],19
1,Theater,"The New 42nd Street, a Theater Nonprofit, Name...",https://www.nytimes.com/2019/06/11/theater/new...,The New 42nd Street -- the nonprofit organizat...,Russell Granet will be the new president and c...,acdcd1ef-71a9-55f2-9993-7a231d57396f,https://static01.nyt.com/images/2019/06/11/art...,"The New 42nd Street board chairwoman, Fiona Ru...",acdcd1ef-71a9-55f2-9993-7a231d57396f,/Demo_data/D1_imgs_1/acdcd1ef-71a9-55f2-9993-7...,10,Demo_data/D1_imgs_1/acdcd1ef-71a9-55f2-9993-7a...,"The New 42nd Street, a Theater Nonprofit, Name...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","The New 42nd Street, a Theater Nonprofit, Name...",[1],[16],10,[10],10
2,Economy,Tax Tactics Threaten Public Funds,https://www.nytimes.com/2014/10/02/business/ec...,When the European Commission charged this week...,If global corporations can continue to evade t...,1de5c156-9cea-5971-9fdf-b6a4ce9bf35c,https://static01.nyt.com/images/2012/02/29/bus...,Eduardo Porter,1de5c156-9cea-5971-9fdf-b6a4ce9bf35c,/Demo_data/D1_imgs_1/1de5c156-9cea-5971-9fdf-b...,8,Demo_data/D1_imgs_1/1de5c156-9cea-5971-9fdf-b6...,Tax Tactics Threaten Public Funds When the Eur...,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",Tax Tactics Threaten Public Funds When the Eur...,[8],[16],24,[24],24
3,Television,Review: 'The Chi' Returns to the South Side of...,https://www.nytimes.com/2019/04/05/arts/televi...,"""The Chi"" was built around a series of shootin...",Lena Waithe's neighborhood drama on Showtime d...,7bb11a9c-2f6d-57f9-bb36-db7f00f59589,https://static01.nyt.com/images/2019/04/05/art...,"Alex Hibbert as Kevin in ""The Chi,"" a sprawlin...",7bb11a9c-2f6d-57f9-bb36-db7f00f59589,/Demo_data/D1_imgs_1/7bb11a9c-2f6d-57f9-bb36-d...,3,Demo_data/D1_imgs_1/7bb11a9c-2f6d-57f9-bb36-db...,Review: 'The Chi' Returns to the South Side of...,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",Review: 'The Chi' Returns to the South Side of...,[3],[3],3,[3],3
4,Opinion,Banks Should Face History and Pay Reparations,https://www.nytimes.com/2020/06/26/opinion/sun...,Ms. Blackwell is founder in residence at Polic...,The financial industry can close the wealth ga...,165426ae-1e7c-5193-8596-ef96357645a8,https://static01.nyt.com/images/2020/06/25/opi...,A Wells Fargo bank in Minneapolis was set on f...,165426ae-1e7c-5193-8596-ef96357645a8,/Demo_data/D1_imgs_1/165426ae-1e7c-5193-8596-e...,11,Demo_data/D1_imgs_1/165426ae-1e7c-5193-8596-ef...,Banks Should Face History and Pay Reparations ...,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",Banks Should Face History and Pay Reparations ...,[8],[16],8,[8],8
5,Books,"Mercedes Barcha, Gabriel García Márquez's Wife...",https://www.nytimes.com/2020/08/23/books/merce...,"Mercedes Barcha and her husband, the Colombian...","For nearly 60 years, she was the Nobel Prize-w...",23348aa1-6fa8-5ca8-a875-71fe704e6dd2,https://static01.nyt.com/images/2020/08/24/obi...,"Mercedes Barcha and her husband, the Colombian...",23348aa1-6fa8-5ca8-a875-71fe704e6dd2,/Demo_data/D1_imgs_1/23348aa1-6fa8-5ca8-a875-7...,13,Demo_data/D1_imgs_1/23348aa1-6fa8-5ca8-a875-71...,"Mercedes Barcha, Gabriel García Márquez's Wife...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","Mercedes Barcha, Gabriel García Márquez's Wife...",[13],[13],13,[13],13
6,Opinion,The George Floyd Election,https://www.nytimes.com/2020/06/03/opinion/geo...,"Mike Erlandson, a former chairman of the state...",How the protests come to be viewed may determi...,32d42c23-4bd8-504a-8151-912b4c3f002d,https://static01.nyt.com/images/2020/06/03/opi...,Protests over the killing of George Floyd in N...,32d42c23-4bd8-504a-8151-912b4c3f002d,/Demo_data/D1_imgs_1/32d42c23-4bd8-504a-8151-9...,11,Demo_data/D1_imgs_1/32d42c23-4bd8-504a-8151-91...,"The George Floyd Election Mike Erlandson, a fo...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","The George Floyd Election Mike Erlandson, a fo...",[11],[11],11,[11],11
7,Books,"For Black and Mixed-Race Women, Hair and Ident...",https://www.nytimes.com/2020/03/17/books/revie...,What do we lose in translation? How do we know...,"In ""That Hair,"" the Portuguese writer Djaimili...",7e7703aa-4180-57a3-9781-c74a25d31afb,https://static01.nyt.com/images/2020/03/22/boo...,Djaimilia Pereira de Almeida,7e7703aa-4180-57a3-9781-c74a25d31afb,/Demo_data/D1_imgs_1/7e7703aa-4180-57a3-9781-c...,13,Demo_data/D1_imgs_1/7e7703aa-4180-57a3-9781-c7...,"For Black and Mixed-Race Women, Hair and Ident...","data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...","For Black and Mixed-Race Women, Hair and Ident...",[13],[18],13,[13],13
8,Books,A Contemporary Feminist Spin on the Traditiona...,https://www.nytimes.com/2019/07/08/books/revie...,Motivated in part by guilt as a privileged out...,"In Madeline ffitch's debut, ""Stay and Fight,"" ...",c030fbee-2601-54f5-a9df-75fa8c13368c,https://static01.nyt.com/images/2019/06/27/boo...,"Madeline ffitch's debut novel, ""Stay and Fight...",c030fbee-2601-54f5-a9df-75fa8c13368c,/Demo_data/D1_imgs_1/c030fbee-2601-54f5-a9df-7...,13,Demo_data/D1_imgs_1/c030fbee-2601-54f5-a9df-75...,A Contemporary Feminist Spin on the Traditiona...,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",A Contemporary Feminist Spin on the Traditiona...,[13],[15],13,[13],13
9,Media,Minnesota Public Radio Drops Garrison Keillor ...,https://www.nytimes.com/2017/11/29/business/me...,Minnesota Public Radio said Wednesday that it ...,The network says it has severed all business t...,ffc5888c-af9f-5898-b5db-e838334b487e,https://static01.nyt.com/images/2017/11/30/bus...,Garrison Keillor in 2016.,ffc5888c-af9f-5898-b5db-e838334b487e,/Demo_data/D1_imgs_1/ffc5888c-af9f-5898-b5db-e...,16,Demo_data/D1_imgs_1/ffc5888c-af9f-5898-b5db-e8...,Minnesota Public Radio Drops Garrison Keillor ...,"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQA...",Minnesota Public Radio Drops Garrison Keillor ...,[3],[6],3,[3],3


In [27]:
## Check the accuracy: GPT-4o initial labeling
auto_verification(
    D1c_GPT_4o_inital_lableing,
    predicted_cols="D1c_GPT_4o_inital_lableing",
    true_cols="section_numeric",
    category=category_D1_GPT
);


== Verification of 'D1c_GPT_4o_inital_lableing' vs. 'section_numeric' ==
Accuracy:   77.78%
Macro F1:   66.59%
Micro  F1:  77.78%

Full classification report:
              precision    recall  f1-score   support

         3.0       0.75      1.00      0.86         3
         4.0       1.00      1.00      1.00         1
         5.0       0.67      1.00      0.80         2
         8.0       0.00      0.00      0.00         1
        10.0       1.00      1.00      1.00         1
        11.0       1.00      0.50      0.67         2
        13.0       1.00      1.00      1.00         3
        16.0       0.00      0.00      0.00         1
        19.0       1.00      1.00      1.00         1
        20.0       1.00      1.00      1.00         1
        23.0       1.00      0.50      0.67         2
        24.0       0.00      0.00      0.00         0

    accuracy                           0.78        18
   macro avg       0.70      0.67      0.67        18
weighted avg       0.81     

## GPT-o4 (reasoning model)

##### The structure and usage are identical to the previous example. However, instead of adjusting the temperature parameter to control randomness, the effort parameter is used to influence the model's reasoning depth and complexity.
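
If you switch between model families often, a small helper can keep the two settings apart. This is only a sketch: `sampling_kwargs` is a hypothetical function, and the rule "reasoning model names start with `o`" is an assumption that fits the two models used in this tutorial (`gpt-4o-mini` and `o4-mini`) but may not hold for other model names.

```python
def sampling_kwargs(model, temperature=1.0, effort="medium"):
    """Pick the keyword argument that matches the model family (hypothetical helper)."""
    # Assumption: o-series reasoning models have names starting with "o"
    is_reasoning = model.startswith("o")
    if is_reasoning:
        return {"effort": effort}
    return {"temperature": temperature}

print(sampling_kwargs("gpt-4o-mini"))  # {'temperature': 1.0}
print(sampling_kwargs("o4-mini"))      # {'effort': 'medium'}
```

The returned dictionary could then be unpacked into the call, for example `classification_GPT(..., model=model, **sampling_kwargs(model))`.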


In [28]:
D1c_GPT_o4_inital_lableing = classification_GPT(
    text_path="Demo_data/D1_1.csv",
    image_dir="Demo_data/D1_imgs_1",
    category=category_D1_GPT,
    prompt=prompt_D1_GPT,
    column_4_labeling=["headline", "article", "abstract"],
    model = "o4-mini",
    api_key = api_key,
    effort = "medium",
    mode = "both",  # The mode is text + image now
    output_column_name="D1c_GPT_o4_inital_lableing",
    num_themes = 1,
    num_votes = 1)


D1c_GPT_o4_inital_lableing.to_csv("Demo_result/D1c_GPT_o4_inital_lableing.csv", index=False)


Classifying text_class: 100%|██████████| 20/20 [01:02<00:00,  3.12s/item]
Classifying image_class: 100%|██████████| 20/20 [01:47<00:00,  5.40s/item]
Classifying final_class: 100%|██████████| 20/20 [01:13<00:00,  3.70s/item]


In [29]:
## Check the accuracy: GPT o4 initial labeling
auto_verification(
    D1c_GPT_o4_inital_lableing,
    predicted_cols="D1c_GPT_o4_inital_lableing",
    true_cols="section_numeric",
    category=category_D1_GPT
);


== Verification of 'D1c_GPT_o4_inital_lableing' vs. 'section_numeric' ==
Accuracy:   88.89%
Macro F1:   78.89%
Micro  F1:  88.89%

Full classification report:
              precision    recall  f1-score   support

         3.0       1.00      1.00      1.00         3
         4.0       1.00      1.00      1.00         1
         5.0       0.67      1.00      0.80         2
         8.0       1.00      1.00      1.00         1
        10.0       1.00      1.00      1.00         1
        11.0       1.00      1.00      1.00         2
        13.0       1.00      1.00      1.00         3
        16.0       1.00      1.00      1.00         1
        19.0       1.00      1.00      1.00         1
        20.0       0.00      0.00      0.00         1
        23.0       1.00      0.50      0.67         2
        24.0       0.00      0.00      0.00         0

    accuracy                           0.89        18
   macro avg       0.81      0.79      0.79        18
weighted avg       0.91     

Demo 2: Multi-Label Theme Classification using News Headlines
--------------------------------------------------------------------

This demo shows how to classify U.S. immigration-related news headlines into
one or two dominant thematic categories, a multi-label, text-only classification task.
The coding scheme is adapted from Guo et al. (2023).

Dataset: Guo et al. (2023) Immigration News Dataset
Source: https://doi.org/10.1177/19401612231204535


**Datasets:**
- `Text_multiple_category_40.xlsx`: Used for initial labeling and fine-tuning.
- `Text_multiple_category_40_2.xlsx`: Used for testing the fine-tuned model's performance.


Each sample in the dataset includes:
- A single news headline (**Post_Title**)


Themes:
-------
1) Economic consequences  
2) Crime/safety  
3) Family  
4) Immigrant wellbeing  
5) Culture/society  
6) Politics  
7) Legislation/regulation  
8) Public opinion  
99) None of the above
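
Because a headline can carry up to two themes, it is convenient to store the labels as eight 0/1 indicators, one per theme. The sketch below shows one way to do that conversion. It assumes the human codes are available as a short list of theme numbers, with 99 meaning "none of the above"; `to_multi_hot` is a hypothetical helper and not part of LabelGenius.

```python
def to_multi_hot(codes, n_themes=8):
    """Turn theme numbers (1..n_themes) into a 0/1 vector; other codes such as 99 are skipped."""
    vector = [0] * n_themes
    for code in codes:
        if 1 <= code <= n_themes:
            vector[code - 1] = 1
    return vector

print(to_multi_hot([6, 7]))   # [0, 0, 0, 0, 0, 1, 1, 0]
print(to_multi_hot([2, 99]))  # [0, 1, 0, 0, 0, 0, 0, 0]
```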


Reference:
----------
Guo, L., Su, C. C., & Chen, H.-T. (2023).
Do News Frames Really Have Some Influence in the Real World?
A Computational Analysis of Cumulative Framing Effects on Emotions and Opinions About Immigration.
The International Journal of Press/Politics. https://doi.org/10.1177/19401612231204535


#### Note: Based on our tests, CLIP did not perform well for multi-category classification. Its embedding-based matching struggles with distinguishing the order of categories and handling missing categories effectively. Therefore, we recommend using GPT models for multi-category classification tasks for improved accuracy and reliability.

## GPT: third-party API labeling


## GPT-4o-mini

In [30]:
category_D2_GPT = [
    "0", "1"
]

prompt_D2_GPT = '''Here's a news article headline. Please label whether it belongs to each of the following themes.
            Return <1> if the headline belongs to a theme and return <0> if it does not.
            Please code for each of the following 8 topics.
            Please identify up to two dominant themes from the headline, which means you can have a maximum of two <1>s in your answer.
            You don't have to label two topics if you don't find that they apply. Just enter 0s.
            - Economic consequences: The story is about the economic benefits or costs of immigration-related issues, including: Cost of mass deportation; Economic benefits of immigration (more tax revenue, cheap labor); Economic costs of immigration (taking jobs from Americans, immigrants using healthcare and educational services, overcrowding, housing concerns)
            - Crime/safety: The story is about threats to Americans' safety, including: Immigration described as a major cause of increased rates of crime, gangs, drug trafficking, etc; Immigrants described as law-breakers who deserve punishment; Immigration described as a threat to national security via terrorism
            - Family: The story is about the impact of immigration on families, including: Separating children from parents; Breaking up multi-generational families; Interfering with children's continued schooling
            - Immigrant wellbeing: This story is about the negative impact of the immigration process on immigrants, including: Prejudice and bias toward immigrants; Physical and/or mental health or safety of immigrants; Immigration policies described as violations of immigrants' civil rights and liberties; Immigration policies regarding illegal immigrants described as unfair to immigrants who have waited to become citizens the legal way
            - Culture/society: This story is about societal-wide factors or consequences related to immigration, including: Immigration as a threat to American cultural identity, way of living, the predominance of English and Christianity, etc.; Immigrants as isolated from the rest of America, unable to assimilate into communities; Immigration as part of the celebrated history of immigration in America / America-as-melting-pot; Immigration policies as exemplars of society's immorality; Impact of immigration on a specific subculture/community in the US
            - Politics: The story is mainly about the political issues around immigration, including: Political campaigns and upcoming elections (e.g., using immigration as a wedge issue or motivating force to get people to the polls); Fighting between the Democratic and Republican parties, or politicians; One political party or one politician’s stance on immigration. Therefore, when the news headline mentions a politician’s name, it often indicates the theme of politics
            - Legislation/regulation: The story is about issues related to regulating immigration through legislation and other institutional measures: New immigration legislation being introduced/argued over; Flaws in current/old legislation; Enforcement of current legislation
            - Public opinion: The story is about the public’s, including a specific community’s, reactions to immigration-related issues, including: Public opinion polls; Protests; Social media backlash; Community outrage; Celebrity responses/protests
            Answer using the following format [0, 0, 0, 0, 0, 0, 0, 0]. Do not provide any other information'''
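
The prompt asks for an answer in the form `[0, 0, 0, 0, 0, 0, 0, 0]` with at most two 1s. If you ever post-process raw answers yourself, a small validator helps catch malformed replies. The sketch below is a hypothetical helper (`parse_theme_vector`), not the parsing that LabelGenius performs internally. It relies on the fact that the requested format happens to be valid JSON.

```python
import json

def parse_theme_vector(raw, n_themes=8, max_positive=2):
    """Parse a reply such as '[0, 0, 0, 0, 0, 1, 1, 0]' and check it against the prompt's rules."""
    values = json.loads(raw)
    if not isinstance(values, list) or len(values) != n_themes:
        raise ValueError(f"expected a list of {n_themes} values, got: {raw!r}")
    if any(v not in (0, 1) for v in values):
        raise ValueError(f"values must be 0 or 1, got: {raw!r}")
    if sum(values) > max_positive:
        raise ValueError(f"more than {max_positive} themes selected: {raw!r}")
    return values

parsed = parse_theme_vector("[0, 0, 0, 0, 0, 1, 1, 0]")
print(parsed)  # [0, 0, 0, 0, 0, 1, 1, 0]
```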




In [31]:
D2_GPT_4o_inital_lableing = classification_GPT(
    text_path="Demo_data/D2_1.xlsx",
    category=["0", "1"],
    prompt=prompt_D2_GPT,          
    column_4_labeling=["Post_Title"],
    model="gpt-4o-mini",
    api_key=api_key,
    temperature=0.8,
    mode="text",
    output_column_name="D2_GPT_4o_initial_labeling",
    num_themes=8,
    num_votes=3,
)



D2_GPT_4o_inital_lableing.to_csv("Demo_result/D2_GPT_4o_inital_lableing.csv", index=False)

Classifying text_class: 100%|██████████| 20/20 [00:24<00:00,  1.25s/item]


In [32]:
D2_GPT_4o_inital_lableing

Unnamed: 0,ID,ID_original,GUID,Date (GMT),URL,Post_Title,Q1,Q2,Q3_1_og,Q3_2_og,...,D2_GPT_4o_initial_labeling,D2_GPT_4o_initial_labeling_raw,D2_GPT_4o_initial_labeling_1,D2_GPT_4o_initial_labeling_2,D2_GPT_4o_initial_labeling_3,D2_GPT_4o_initial_labeling_4,D2_GPT_4o_initial_labeling_5,D2_GPT_4o_initial_labeling_6,D2_GPT_4o_initial_labeling_7,D2_GPT_4o_initial_labeling_8
0,1396,6481,585fc69e-51da-4017-bbfe-e104b8880f52,2018-01-06 09:17:47,https://www.yahoo.com/news/trump-administratio...,Trump administration considers eliminating imm...,1,2,6,7,...,"[0, 0, 0, 0, 0, 1, 1, 0]","[0, 0, 0, 0, 0, 1, 1, 0]",0,0,0,0,0,1,1,0
1,354,1728,9d429eec-01ad-4a6d-aad1-5d961d88baa1,2018-01-05 20:42:02,https://www.newsmax.com/newsfront/us-san-franc...,Immigrant Acquitted of Killing Is Sentenced fo...,1,1,2,99,...,"[0, 1, 0, 0, 0, 0, 0, 0]","[0, 1, 0, 0, 0, 0, 0, 0]",0,1,0,0,0,0,0,0
2,1336,6176,d27c43fa-c5e3-41a0-9f89-ef7922db4bf3,2018-01-14 13:05:06,http://www.latimes.com/nation/sns-bc-eu--europ...,Pope: It's a sin if fear makes us hostile to m...,2,99,99,99,...,"[0, 0, 0, 1, 0, 0, 0, 0]","[0, 0, 0, 1, 0, 0, 0, 0]",0,0,0,1,0,0,0,0
3,907,4375,ce5a0f99-1795-4e65-b6f1-6352b753b9c0,2018-01-12 00:14:46,https://www.yahoo.com/news/m/3dfd9bf2-167d-398...,ICE agents descend on dozens of 7-Eleven store...,1,2,2,1,...,"[0, 1, 0, 0, 0, 0, 0, 0]","[0, 1, 0, 0, 0, 0, 0, 0]",0,1,0,0,0,0,0,0
4,1291,5956,73c5fca9-683d-4292-9b5a-879f8df0e7b1,2018-01-10 17:20:29,http://www.breitbart.com/big-hollywood/2018/01...,James Woods Warns Trump: 'If You Fold on Immig...,1,2,6,99,...,"[0, 0, 0, 0, 0, 1, 0, 0]","[0, 0, 0, 0, 0, 1, 0, 0]",0,0,0,0,0,1,0,0
5,1275,5902,4fdf4395-1910-4b74-9f1d-e2b70a33a10b,2018-01-10 01:11:59,https://www.newsmax.com/politics/daca-border-w...,Trump: Wall 'Must'Be Part of Any DACA Deal,1,2,6,7,...,"[0, 0, 0, 0, 0, 1, 0, 0]","[0, 0, 0, 0, 0, 1, 0, 0]",0,0,0,0,0,1,0,0
6,940,4520,fde7fb19-8fe6-4b9b-a53f-70e91a55d8f6,2018-01-04 04:39:27,http://dailycaller.com/2018/01/03/tucker-burst...,Tucker Bursts Into Laughter During Interview W...,1,1,6,99,...,"[0, 0, 0, 0, 0, 1, 0, 0]","[0, 0, 0, 0, 0, 1, 0, 0]",0,0,0,0,0,1,0,0
7,1734,8006,bb744fe6-6ed6-4786-8ac0-09b5d96a6ee0,2018-01-09 22:00:32,https://www.yahoo.com/news/trump-suggests-2-ph...,Trump presents deal for immigration overhaul,1,2,6,7,...,"[0, 0, 0, 0, 0, 1, 1, 0]","[0, 0, 0, 0, 0, 1, 1, 0]",0,0,0,0,0,1,1,0
8,66,286,bf4ef57d-4dbc-4bd7-9a39-560650b31853,2018-01-04 03:20:42,https://www.newsmax.com/politics/border-wall-b...,The Hill: Border Wall Confusion Holds Up Govt ...,1,2,6,99,...,"[0, 0, 0, 0, 0, 1, 1, 0]","[0, 0, 0, 0, 0, 1, 1, 0]",0,0,0,0,0,1,1,0
9,1048,4993,928ded16-8c65-48f3-8963-dc7e3228ae2d,2018-01-11 04:20:55,https://www.huffingtonpost.com/entry/democrats...,"Democrats, GOP Far Apart On Immigration Deal, ...",1,2,7,99,...,"[0, 0, 0, 0, 0, 1, 0, 0]","[0, 0, 0, 0, 0, 1, 0, 0]",0,0,0,0,0,1,0,0


In [33]:
auto_verification(
    D2_GPT_4o_inital_lableing,
    predicted_cols=[
        "D2_GPT_4o_initial_labeling_1",
        "D2_GPT_4o_initial_labeling_2",
        "D2_GPT_4o_initial_labeling_3",
        "D2_GPT_4o_initial_labeling_4",
        "D2_GPT_4o_initial_labeling_5",
        "D2_GPT_4o_initial_labeling_6",
        "D2_GPT_4o_initial_labeling_7",
        "D2_GPT_4o_initial_labeling_8"
    ],
    true_cols=[
        "Q3_1", "Q3_2", "Q3_3", "Q3_4",
        "Q3_5", "Q3_6", "Q3_7", "Q3_8"
    ],
    category=category_D2_GPT
)



== Verification of 'D2_GPT_4o_initial_labeling_1' vs. 'Q3_1' ==
Accuracy:   95.00%
Macro F1:   48.72%
Micro  F1:  95.00%

Full classification report:
              precision    recall  f1-score   support

           0       0.95      1.00      0.97        19
           1       0.00      0.00      0.00         1

    accuracy                           0.95        20
   macro avg       0.47      0.50      0.49        20
weighted avg       0.90      0.95      0.93        20


Confusion matrix:
[[19  0]
 [ 1  0]]

== Verification of 'D2_GPT_4o_initial_labeling_2' vs. 'Q3_2' ==
Accuracy:   90.00%
Macro F1:   84.38%
Micro  F1:  90.00%

Full classification report:
              precision    recall  f1-score   support

           0       0.94      0.94      0.94        16
           1       0.75      0.75      0.75         4

    accuracy                           0.90        20
   macro avg       0.84      0.84      0.84        20
weighted avg       0.90      0.90      0.90        20


Confu



### Fine-tune: GPT-4o

In [34]:
# Prepare the fine-tuning data for GPT-4o.
# Each JSONL record is built from three parts:
# 1. system_prompt: the coding instructions
# 2. input_col: the column(s) containing the content to be labeled
# 3. label_col: the column(s) containing the label

generate_GPT_finetune_jsonl(D2_GPT_4o_inital_lableing, 
                        output_path="Demo_result/D2_GPT_4o_inital_lableing.jsonl", 
                        system_prompt = prompt_D2_GPT,
                        input_col = ["Post_Title"],
                        label_col=["Q3_clean"])
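
For orientation, a single training record might look like the sketch below. It assumes that `generate_GPT_finetune_jsonl` writes OpenAI's chat fine-tuning format (one JSON object per line with a `messages` list); the source does not show the file contents, so treat this as an illustration. The helper `make_finetune_record`, the prompt string, and the label are hypothetical placeholders.

```python
import json


def make_finetune_record(system_prompt, input_text, label):
    """Build one chat-format training example (hypothetical helper, not part of LabelGenius)."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},  # coding instructions
            {"role": "user", "content": input_text},       # content to be labeled
            {"role": "assistant", "content": str(label)},  # expected label
        ]
    }


record = make_finetune_record(
    system_prompt="Label the headline with an 8-element 0/1 list.",  # placeholder prompt
    input_text="Trump presents deal for immigration overhaul",
    label=[0, 0, 0, 0, 0, 1, 1, 0],  # illustrative label only
)

# A JSONL file holds one such JSON object per line.
jsonl_line = json.dumps(record)
print(jsonl_line)
```

Opening the generated `.jsonl` file and inspecting the first line is a quick way to check whether the actual output matches this assumed layout before starting a paid fine-tuning job.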


In [35]:
# Fine-tune the model (this demo uses the gpt-4o-mini snapshot)
D2_GPT_4o_model_finetune = finetune_GPT(
    training_file_path="Demo_result/D2_GPT_4o_inital_lableing.jsonl",
    model="gpt-4o-mini-2024-07-18",  
    hyperparameters={"batch_size":8, "learning_rate_multiplier":0.01},
    api_key= api_key 
)

Started fine-tune job ftjob-amqjwE37LpeUNwOhNQSM1s3I
[0s] status=validating_files
[15s] status=validating_files
[30s] status=validating_files
[45s] status=validating_files
[60s] status=validating_files
[75s] status=running
[90s] status=running
[105s] status=running
[120s] status=running
[135s] status=running
[150s] status=running
[165s] status=running
[180s] status=running
[195s] status=running
[210s] status=running
[225s] status=running
[240s] status=running
[255s] status=running
[270s] status=succeeded
✅ succeeded: ft:gpt-4o-mini-2024-07-18:jcs-research::BgIUE4du
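
The log above reports the job status every 15 seconds until it reaches a final state. A minimal sketch of such a polling loop follows. It is not the implementation of `finetune_GPT`; the names `wait_for_job` and `fetch_status` are hypothetical, and the status source is injected so the loop can be run offline.

```python
import time

# Final states of an OpenAI fine-tuning job.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(fetch_status, interval_s=15, sleep=time.sleep, log=print):
    """Poll fetch_status() until the job reaches a terminal status, then return it."""
    elapsed = 0
    while True:
        status = fetch_status()
        log(f"[{elapsed}s] status={status}")
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_s)
        elapsed += interval_s


# Offline demonstration with canned statuses and no real waiting.
canned = iter(["validating_files", "running", "succeeded"])
final_status = wait_for_job(lambda: next(canned), sleep=lambda seconds: None)

# With the OpenAI SDK, fetch_status could be (job_id is a placeholder):
#   lambda: client.fine_tuning.jobs.retrieve(job_id).status
```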


In [36]:
D2_GPT_4o_model_finetune

'ft:gpt-4o-mini-2024-07-18:jcs-research::BgIUE4du'

In [37]:
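# Test the fine-tuned model on a single example
from openai import OpenAI

# Create the client here in case it is not already defined in the session
client = OpenAI(api_key=api_key)
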
response = client.responses.create(
  model=D2_GPT_4o_model_finetune,
  input=[
    {
      "role": "developer",
      "content": [
        {
          "type": "input_text",
          "text": prompt_D2_GPT
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "\"\"\" Our guide to the city's best classical music and opera.\"\"\"\n      "
        }
      ]
    }
  ],
  text={
    "format": {
      "type": "text"
    }
  },

  tools=[],
  store=True
)

In [38]:
response

Response(id='resp_68460a024e8081a3a860fa1b6c33d832064e35797dac7e4f', created_at=1749420546.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='ft:gpt-4o-mini-2024-07-18:jcs-research::BgIUE4du', object='response', output=[ResponseOutputMessage(id='msg_68460a0901b881a3bb314356c9096d94064e35797dac7e4f', content=[ResponseOutputText(annotations=[], text='[0, 0, 0, 0, 0, 0, 0, 0]', type='output_text')], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, max_output_tokens=None, previous_response_id=None, reasoning=Reasoning(effort=None, generate_summary=None, summary=None), status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text')), truncation='disabled', usage=ResponseUsage(input_tokens=683, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=25, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=708), user=N
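
The label itself sits in the `text` field of the response object printed above. A short sketch of turning that string into a validated Python list is shown below; `parse_label_vector` is a hypothetical helper, and it assumes the model returns a JSON-style list of 0/1 values, as in this example.

```python
import json


def parse_label_vector(raw_text, num_themes=8):
    """Parse a string such as '[0, 1, 0, ...]' into a list of 0/1 integers."""
    labels = json.loads(raw_text)
    if not isinstance(labels, list) or len(labels) != num_themes:
        raise ValueError(f"expected a list of {num_themes} labels, got: {raw_text!r}")
    if any(value not in (0, 1) for value in labels):
        raise ValueError(f"labels must be 0 or 1, got: {raw_text!r}")
    return labels


# Text copied from the response above; in a live session it is
# available as response.output_text in the OpenAI Python SDK.
raw = "[0, 0, 0, 0, 0, 0, 0, 0]"
labels = parse_label_vector(raw)
print(labels)
```

Validating the length and values catches malformed model output early instead of letting it propagate into the per-theme columns.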

In [39]:
# Classify with the fine-tuned model
D2_GPT_4o_finetune = classification_GPT(
    text_path="Demo_data/D2_2.xlsx",
    category = category_D2_GPT,
    prompt = prompt_D2_GPT,
    column_4_labeling=["Post_Title"],
    model = D2_GPT_4o_model_finetune,
    api_key = api_key,
    temperature = 0.8,
    mode = "text",
    output_column_name="D2_GPT_4o_finetune",
    num_themes = 8,
    num_votes = 1)



D2_GPT_4o_finetune.to_csv("Demo_result/D2_GPT_4o_finetune.csv", index=False)

Classifying text_class: 100%|██████████| 20/20 [01:27<00:00,  4.39s/item]


In [40]:
D2_GPT_4o_finetune

Unnamed: 0,ID,ID_original,GUID,Date (GMT),URL,Post_Title,Q1,Q2,Q3_1_og,Q3_2_og,...,D2_GPT_4o_finetune,D2_GPT_4o_finetune_raw,D2_GPT_4o_finetune_1,D2_GPT_4o_finetune_2,D2_GPT_4o_finetune_3,D2_GPT_4o_finetune_4,D2_GPT_4o_finetune_5,D2_GPT_4o_finetune_6,D2_GPT_4o_finetune_7,D2_GPT_4o_finetune_8
0,989,4719,f10212aa-31cc-4748-a0e8-c0a3db3a1a98,2018-01-08 14:17:04,http://www.newsweek.com/clean-dream-act-hunger...,Jailed Immigrants Launch Hunger Strike until C...,1,2,8,7,...,"[0, 0, 0, 1, 0, 1, 0, 0]","[0, 0, 0, 1, 0, 1, 0, 0]",0,0,0,1,0,1,0,0
1,110,552,acf6c4c2-ce94-4c25-9fe3-aaa8b059786b,2018-01-10 00:40:32,http://thehill.com/podcasts/hillcast/368157-li...,Listen: Trump's unusual immigration meeting,1,2,6,99,...,"[0, 0, 0, 0, 0, 1, 0, 0]","[0, 0, 0, 0, 0, 1, 0, 0]",0,0,0,0,0,1,0,0
2,1166,5514,b8c69f81-c592-4d18-ae3d-f3c86706e509,2018-01-10 14:24:58,https://www.yahoo.com/news/m/f154972f-3b1e-399...,Not losing it: Trump lets live TV cover White ...,1,2,6,99,...,"[0, 0, 0, 0, 0, 1, 0, 0]","[0, 0, 0, 0, 0, 1, 0, 0]",0,0,0,0,0,1,0,0
3,1084,5146,54f419c7-91a0-4ec5-8370-4cb6e35da226,2018-01-11 15:55:29,https://www.washingtonpost.com/politics/4th-im...,4th immigrant teen in custody says US preventi...,1,1,3,4,...,"[0, 0, 0, 1, 0, 0, 0, 0]","[0, 0, 0, 1, 0, 0, 0, 0]",0,0,0,1,0,0,0,0
4,1467,6801,b4121087-567e-4532-89f3-dcab7b7a9d54,2018-01-09 03:06:09,http://www.breitbart.com/big-government/2018/0...,Jeff Flake Admits GOP Establishment Working Ag...,1,2,6,7,...,"[0, 0, 0, 0, 0, 1, 0, 0]","[0, 0, 0, 0, 0, 1, 0, 0]",0,0,0,0,0,1,0,0
5,1233,5746,e50ff89f-77e3-419f-bf5b-f08a3025a196,2018-01-09 18:00:20,http://www.breitbart.com/video/2018/01/09/rubi...,Rubio: 'You Can't' Shut Down the Gov't Over DA...,1,2,6,7,...,"[0, 0, 0, 0, 0, 1, 0, 0]","[0, 0, 0, 0, 0, 1, 0, 0]",0,0,0,0,0,1,0,0
6,1770,8189,c72642aa-ff11-4613-9444-5d174078f007,2018-01-11 18:42:19,http://www.breitbart.com/news/german-official-...,German official denies notion Saxony no-go are...,2,99,99,99,...,"[0, 0, 0, 0, 0, 1, 0, 0]","[0, 0, 0, 0, 0, 1, 0, 0]",0,0,0,0,0,1,0,0
7,212,1025,11f9716f-50b8-4cd2-bec0-26f71786c73b,2018-01-10 20:36:16,https://www.nytimes.com/2018/01/10/opinion/cit...,Letter Citizenship and the Census The Latino C...,1,2,7,99,...,"[0, 0, 0, 0, 0, 1, 0, 0]","[0, 0, 0, 0, 0, 1, 0, 0]",0,0,0,0,0,1,0,0
8,974,4643,22a1d210-7cc0-498e-b45f-2e057f398271,2018-01-11 01:35:49,http://www.theblaze.com/news/2018/01/10/rand-p...,Rand Paul tosses cold water on a border wall –...,1,2,6,7,...,"[0, 0, 0, 0, 0, 1, 0, 0]","[0, 0, 0, 0, 0, 1, 0, 0]",0,0,0,0,0,1,0,0
9,1122,5327,136e6f79-418c-42aa-b47d-17df079d259c,2018-01-13 01:23:20,https://www.yahoo.com/news/m/9d804ffa-8bb1-33c...,"Immigrants with jobs, education worry that Tru...",1,2,1,6,...,"[0, 0, 0, 1, 0, 1, 0, 0]","[0, 0, 0, 1, 0, 1, 0, 0]",0,0,0,1,0,1,0,0


In [41]:
auto_verification(
    D2_GPT_4o_finetune,
    predicted_cols=[
        "D2_GPT_4o_finetune_1",
        "D2_GPT_4o_finetune_2",
        "D2_GPT_4o_finetune_3",
        "D2_GPT_4o_finetune_4",
        "D2_GPT_4o_finetune_5",
        "D2_GPT_4o_finetune_6",
        "D2_GPT_4o_finetune_7",
        "D2_GPT_4o_finetune_8"
    ],
    true_cols=[
        "Q3_1", "Q3_2", "Q3_3", "Q3_4",
        "Q3_5", "Q3_6", "Q3_7", "Q3_8"
    ],
    category=category_D2_GPT
)



== Verification of 'D2_GPT_4o_finetune_1' vs. 'Q3_1' ==
Accuracy:   90.00%
Macro F1:   47.37%
Micro  F1:  90.00%

Full classification report:
              precision    recall  f1-score   support

           0       0.90      1.00      0.95        18
           1       0.00      0.00      0.00         2

    accuracy                           0.90        20
   macro avg       0.45      0.50      0.47        20
weighted avg       0.81      0.90      0.85        20


Confusion matrix:
[[18  0]
 [ 2  0]]
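
The report above pairs 90% accuracy with a macro F1 of only 47.37%. The sketch below recomputes both numbers from the confusion matrix to show where the gap comes from: the model never predicts class 1, so that class contributes an F1 of 0 to the unweighted average. The code is plain Python written for illustration and is not how `auto_verification` computes its metrics.

```python
def f1_for_class(matrix, k):
    """F1 for class k, with rows as true labels and columns as predictions."""
    true_positive = matrix[k][k]
    if true_positive == 0:
        return 0.0  # no correct predictions for this class
    predicted_as_k = sum(row[k] for row in matrix)
    actually_k = sum(matrix[k])
    precision = true_positive / predicted_as_k
    recall = true_positive / actually_k
    return 2 * precision * recall / (precision + recall)


matrix = [[18, 0],
          [2, 0]]  # confusion matrix from the report above

total = sum(sum(row) for row in matrix)
accuracy = sum(matrix[k][k] for k in range(len(matrix))) / total
macro_f1 = sum(f1_for_class(matrix, k) for k in range(len(matrix))) / len(matrix)

print(f"Accuracy: {accuracy:.2%}  Macro F1: {macro_f1:.2%}")
# Accuracy: 90.00%  Macro F1: 47.37%
```

With only 20 test items and one or two positive cases per theme, macro F1 is the more informative number for rare themes.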

== Verification of 'D2_GPT_4o_finetune_2' vs. 'Q3_2' ==
Accuracy:   100.00%
Macro F1:   100.00%
Micro  F1:  100.00%

Full classification report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00         1

    accuracy                           1.00        20
   macro avg       1.00      1.00      1.00        20
weighted avg       1.00      1.00      1.00        20


Confusion matrix:




## o4-mini (reasoning model)

##### The structure and usage are identical to the previous example. The difference is that randomness is no longer controlled through the temperature parameter; instead, the effort parameter sets how much reasoning the model does before it answers.
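
As a rough illustration of that difference at the API level, the sketch below builds request arguments for the OpenAI Responses API, which accepts `reasoning={"effort": ...}` for reasoning models and `temperature` for standard models. The helper `build_request_kwargs` is hypothetical, the prompt strings are placeholders, and how LabelGenius forwards these settings internally is not shown in the source.

```python
def build_request_kwargs(model, prompt, text, temperature=None, effort=None):
    """Assemble keyword arguments for client.responses.create (hypothetical helper)."""
    if temperature is not None and effort is not None:
        raise ValueError("pass either temperature or effort, not both")
    kwargs = {
        "model": model,
        "input": [
            {"role": "developer", "content": prompt},
            {"role": "user", "content": text},
        ],
    }
    if effort is not None:
        # Reasoning models: "low", "medium", or "high"
        kwargs["reasoning"] = {"effort": effort}
    elif temperature is not None:
        # Standard models: sampling randomness
        kwargs["temperature"] = temperature
    return kwargs


headline = "Trump presents deal for immigration overhaul"
reasoning_kwargs = build_request_kwargs(
    "o4-mini", "Label the headline.", headline, effort="low"
)
sampling_kwargs = build_request_kwargs(
    "gpt-4o-mini-2024-07-18", "Label the headline.", headline, temperature=0.8
)

# In a live session: response = client.responses.create(**reasoning_kwargs)
```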


#### Note: GPT reasoning models (i.e., the o-series) cannot be fine-tuned for now.