# Short Prompt
<br>
James Chapman<br>
CIS 830 Advanced Topics in AI – Term Project<br>
Kansas State University<br><br>

This notebook classifies social media posts for ILLICIT/ABUSIVE drug use.<br>

- Uses the `short_prompt` template
- Runs four LLMs (two paid services and two local Hugging Face models)
    - o4-mini 
    - GPT-4o-mini
    - Meta-Llama-3.1-8B-Instruct
    - Qwen-4B


**Note:** The response from every LLM call is saved to its own text file.

In [1]:
import pandas as pd
import json
import os
import re
from tqdm import tqdm
tqdm.pandas()

from utils import (
    get_tweets_dataset,
    extract_responses,
    extract_JSON_labels_and_explanations,
    extract_drug_labels,
    extract_T_F_labels,
    match_terms,
    get_confusion_matrix_and_metrics,
)
from llm_manager import (
    run_prompt_on_llm,
    # --- paid services ---
    # get_claude_response,
    # get_openai_response,
    # get_perplexity_response,
    get_openai_gpt4omini_response,
    get_openai_o4mini_response,
    # --- local models ---
    get_llama_response,
    get_qwen_4b_response,
    # get_deepseek_qwen_response,
    # get_qwen_response,
    # get_mistral_response,
    # --- cleanup helpers ---
    cleanup_llama,
    cleanup_qwen,
    cleanup_qwen_4b,
    cleanup_deepseek_qwen,
    cleanup_mistral,
)

In [3]:
short_prompt = """You moderate social‑media posts for drug abuse.

Label rules  
T – text shows illicit or abusive drug use (using, seeking, buying, selling, trafficking, or slang for drugs).  
F – all other cases (medical/policy discussion, news, discouraging use, metaphor).

Return only **T** or **F**, no additional text, reasoning, or explanation.

Post: "{{tweet_text}}"
Label:
"""
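The prompt is a plain string, not an f-string, so `{{tweet_text}}` stays in the text as a literal placeholder. How `run_prompt_on_llm` fills it in is not shown in this notebook. The sketch below is one plausible approach, and `fill_prompt` is a hypothetical helper rather than a function from `utils` or `llm_manager`.

```python
# Hypothetical helper: not part of the notebook's `utils` or `llm_manager`.
def fill_prompt(template: str, tweet_text: str) -> str:
    """Substitute the literal {{tweet_text}} placeholder with the post text."""
    # str.replace is used instead of str.format: format() would turn the
    # doubled braces into the literal text {tweet_text} and would also
    # choke on any stray braces inside the tweet itself.
    return template.replace("{{tweet_text}}", tweet_text)


demo_template = 'Post: "{{tweet_text}}"\nLabel:\n'
filled = fill_prompt(demo_template, "example post with {braces}")
print(filled)
```

Plain replacement leaves user text untouched, which matters for tweets that contain braces or other format-like characters.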

In [2]:
tweets = get_tweets_dataset()
SEED = 777  # fixed seed so the same 1,000 tweets are drawn on every run
tweets = (
    tweets.sample(n=1_000, random_state=SEED, replace=False)
    .sort_index()
    .reset_index(drop=True)
)
tweets.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 13 columns):
 #   Column                     Non-Null Count  Dtype 
---  ------                     --------------  ----- 
 0   text                       1000 non-null   object
 1   label                      1000 non-null   object
 2   tweet_num                  1000 non-null   int64 
 3   found_terms                1000 non-null   object
 4   found_index_terms          1000 non-null   object
 5   GPT_found_terms            1000 non-null   object
 6   GPT_found_index_terms      1000 non-null   object
 7   pubchem_found_terms        1000 non-null   object
 8   pubchem_found_index_terms  1000 non-null   object
 9   redmed_found_terms         1000 non-null   object
 10  redmed_found_index_terms   1000 non-null   object
 11  DEA_found_terms            1000 non-null   object
 12  DEA_found_index_terms      1000 non-null   object
dtypes: int64(1), object(12)
memory usage: 101.7+ KB
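The sampling cell above relies on `random_state` to make the 1,000-tweet subset reproducible, then restores the original row order and renumbers the index. A minimal sketch of that behaviour follows. It uses a toy DataFrame as a stand-in for the real dataset, and `draw_sample` is a hypothetical wrapper introduced only for illustration.

```python
import pandas as pd

# Toy stand-in for the DataFrame returned by get_tweets_dataset().
toy = pd.DataFrame({"text": [f"post {i}" for i in range(20)]})

SEED = 777


def draw_sample(df: pd.DataFrame, n: int, seed: int) -> pd.DataFrame:
    """Draw n rows without replacement, keep source order, renumber from 0."""
    return (
        df.sample(n=n, random_state=seed, replace=False)
        .sort_index()            # restore the original row order
        .reset_index(drop=True)  # renumber 0..n-1, discarding the old index
    )


first = draw_sample(toy, 5, SEED)
second = draw_sample(toy, 5, SEED)
print(first.equals(second))  # same seed, same rows
```

Because the old index is dropped, a column such as `tweet_num` is what ties a sampled row back to the full dataset.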


# RUN 4 MODELS: short_prompt

In [4]:
# o4-mini (OpenAI reasoning model)
responses = run_prompt_on_llm(get_openai_o4mini_response, "o4mini_short", short_prompt, tweets)
#tweets["o4mini_response"] = responses

# GPT-4o-mini
responses = run_prompt_on_llm(get_openai_gpt4omini_response, "gpt4omini_short", short_prompt, tweets)
#tweets["4o_mini_response"] = responses

# Meta-Llama-3.1-8B-Instruct
responses = run_prompt_on_llm(get_llama_response, "llama_short", short_prompt, tweets)
# tweets["llama_response"] = responses
cleanup_llama()

# Qwen-4B
responses = run_prompt_on_llm(get_qwen_4b_response, "qwen_4b_short", short_prompt, tweets)
# tweets["qwen_4b_response"] = responses
cleanup_qwen_4b()


100%|██████████| 1000/1000 [48:41<00:00,  2.92s/it] 
100%|██████████| 1000/1000 [11:08<00:00,  1.50it/s]
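The internals of `run_prompt_on_llm` are not shown here. The sketch below illustrates one way a run loop could write each response to its own text file and reuse saved files when a run is restarted. Every name in it (`run_and_save`, `fake_model`, the file naming scheme) is an assumption made for illustration, not the project's actual implementation.

```python
import tempfile
from pathlib import Path
from typing import Callable, List

import pandas as pd


def run_and_save(get_response: Callable[[str], str], run_name: str,
                 template: str, df: pd.DataFrame, out_root: Path) -> List[str]:
    """Call get_response once per row, writing each reply to its own file."""
    out_dir = out_root / run_name
    out_dir.mkdir(parents=True, exist_ok=True)
    responses = []
    for idx, text in df["text"].items():
        path = out_dir / f"{int(idx):05d}.txt"  # hypothetical naming scheme
        if path.exists():
            # Reuse a saved reply so an interrupted run can resume cheaply.
            reply = path.read_text(encoding="utf-8")
        else:
            reply = get_response(template.replace("{{tweet_text}}", text))
            path.write_text(reply, encoding="utf-8")
        responses.append(reply)
    return responses


# Stub model so the sketch runs without an API key or a GPU.
def fake_model(prompt: str) -> str:
    return "T" if "lean" in prompt else "F"


demo = pd.DataFrame({"text": ["sippin lean all night", "new policy on opioids"]})
with tempfile.TemporaryDirectory() as tmp:
    replies = run_and_save(fake_model, "demo_short",
                           'Post: "{{tweet_text}}"\nLabel:', demo, Path(tmp))
    saved = sorted(p.name for p in (Path(tmp) / "demo_short").iterdir())
print(replies, saved)
```

Writing one file per call keeps paid API responses from being lost if a long run fails partway through.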



In [None]:
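# Alternative prompt drafts. This cell was not executed; the runs above use the prompt from In [3].
# Note: the second assignment below overwrites the first.
short_prompt = """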
Classify the following social media content regarding drug references as either illicit/abusive drug-related (**True**) or not (**False**).

Output **only** the single word: **True** or **False**. Do not include any other text, punctuation, or formatting.

**Definitions:**
**True** (Illicit/Abusive Drug-Related): Content mentioning illegal, RX, or OTC drugs in context of abuse (using, buying, promoting), using slang for recreational use, or glorifying misuse.
**False** (Not Illicit/Abusive Drug-Related): Content mentioning drugs in medical/educational/prevention context (discouraging), or using drug terms metaphorically/in unrelated topics.

**Examples:**
Content: `I got her hooked on that lean everyday she say she want codeine`
True

Content: `All me and Tyler do is crack on each other lmfao`
False

Content:{{tweet_text}}
"""
short_prompt = """
You are an expert content‑moderator.  
Respond with **only** the word: True or False.

• True = mentions illegal / Rx / OTC drugs in a misuse context (use, buy, sell, slang, glorify).  
• False = medical, educational, preventative, or unrelated use of drug terms.

Content: {{tweet_text}}
"""
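All of these prompts ask for a bare label (`T`/`F` or `True`/`False`), but models sometimes wrap the answer in markdown bold or add punctuation. The notebook imports `extract_T_F_labels` for this step without showing its body. The sketch below is a hypothetical stand-in (`parse_label`) showing one strict way to normalize such responses. It returns `None` for anything that is not a lone label, so those rows can be reviewed instead of silently guessed.

```python
import re
from typing import Optional

# Accept T/F or True/False, optionally surrounded by non-word characters
# such as markdown asterisks, quotes, whitespace, or a trailing period.
_LABEL_RE = re.compile(r"^\W*(true|false|t|f)\W*$", re.IGNORECASE)


def parse_label(response: str) -> Optional[str]:
    """Map a raw model reply to 'T', 'F', or None if it is not a bare label."""
    match = _LABEL_RE.match(response.strip())
    if match is None:
        return None
    return "T" if match.group(1).lower() in ("t", "true") else "F"


raw = ["T", "**F**", " true ", "False.", "I think this is T"]
parsed = [parse_label(r) for r in raw]
print(parsed)  # ['T', 'F', 'T', 'F', None]
```

Rejecting free-text replies keeps formatting failures separate from genuine classification errors when the metrics are computed.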

