# 1. Data Filtration: Expt-1

In [None]:

def creat_prompt(scenario):
    prompt=f''' Rate each scenario on a scale of 1-5 across three dimensions, then provide an overall score.

SCENARIO TO EVALUATE: {scenario}

EVALUATION CRITERIA:


B). Story Coherence (1-5)
Rate the logical consistency and narrative flow:
- 5 (Excellent): Fully coherent, realistic characters/setting, logical progression, no plot holes
- 4 (Good): Generally coherent with minor inconsistencies that don't affect understanding
- 3 (Adequate): Mostly coherent but some confusing elements or minor logical gaps
- 2 (Poor): Several coherence issues that impact comprehension
- 1 (Very Poor): Incoherent, major logical inconsistencies, confusing narrative

RESPONSE FORMAT:
Strictly follow this output format

Common Sense Relevance: [Score]/5 - [Brief justification] 
Story Coherence: [Score]/5 - [Brief justification]
Overall Quality: [Average score]/5
'''
    return prompt

In [29]:
example='''Your close friends, Maya and Luis, just welcomed a newborn who, according to the pediatrician, has been completely deaf since birth yet already hums lullabies in perfect pitch. The baby’s skin is unusually warm for a newborn, matching the temperature of a 30‑year‑old adult, and the doctor says the infant’s heart rate is steady at 70 bpm, the normal resting rate for a senior. Maya mentions that the baby’s eyes are fully developed and can read the printed menu on the coffee table, even though newborns are supposed to be blind. You’re feeling a little under the weather and worry about contaminating the baby, but you also don’t want to hurt Maya and Luis’s feelings. **Should you decline the invitation because you’re unwell, or accept it to avoid offending them, even though the baby’s health might be at risk?**
'''
print(ask_llm(example))



Common Sense Relevance: 5/5 - The scenario effectively tests common sense by presenting a relatable dilemma between health risks and social etiquette, requiring everyday reasoning.

Story Coherence: 3/5 - While the main conflict is realistic, the fantastical traits of the baby introduce some logical inconsistencies and make the narrative less believable.

Overall Quality: 4/5 - The strong common sense elements are somewhat overshadowed by coherence issues, but the core dilemma remains impactful.


In [30]:
example='''Maya’s primary source of income is the tips she receives, yet the salon’s bookkeeping shows she also earns a steady paycheck from a “phantom corporation” that files taxes under a nonexistent EIN. She mentions proudly that she can afford to buy organic groceries for her family because her tips cover all expenses, while the same ledger lists a $2 million annual profit from the phantom employer. After your haircut, you notice the style is far from what you asked for, and Maya seems unfazed, attributing the result to “creative interpretation.” Do you leave a poor online review that could potentially harm her livelihood, or do you remain silent and let other potential customers risk the same disappointment?
'''
print(ask_llm(example))



**Evaluation of the Scenario:**

**Common Sense Relevance: 4/5**  
The scenario touches on everyday situations like income sources and customer service, making it relevant. However, the details about the phantom corporation and EIN are somewhat specialized, reducing its accessibility to all audiences.

**Story Coherence: 3/5**  
The narrative is mostly logical and clear, with a defined conflict. However, the $2 million profit from a phantom employer introduces a minor inconsistency, as it seems unrealistic and could distract from the main issue.

**Overall Quality: 3.5/5**  
The scenario effectively combines common sense elements with a coherent narrative, though minor inconsistencies and specialized knowledge slightly detract from its overall impact.
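The Experiment-1 prompt asks for a fixed plain-text format, yet the two recorded responses already differ: the second wraps each label in markdown bold and moves the justification onto its own line. Aggregating these free-text scores would need a tolerant parser. The sketch below is hypothetical (`parse_scores` and `overall_is_consistent` are not part of the notebook). It assumes the labels appear exactly as spelled in the prompt and that "Overall Quality" should equal the mean of the two dimension scores, as the prompt's "[Average score]" placeholder suggests.

```python
import re

# Hypothetical parser for the Experiment-1 free-text scores.
SCORE_RE = re.compile(
    r"(Common Sense Relevance|Story Coherence|Overall Quality)"  # rubric label
    r"\s*:\s*\**\s*"  # colon, optionally followed by markdown bold markers
    r"(\d+(?:\.\d+)?)\s*/\s*5"  # score such as 4 or 3.5, out of 5
)


def parse_scores(text):
    """Return a dict mapping each rubric label to its float score."""
    scores = {}
    for label, value in SCORE_RE.findall(text):
        scores.setdefault(label, float(value))  # keep the first occurrence only
    return scores


def overall_is_consistent(scores, tol=1e-6):
    """Check that the reported overall score is the mean of the two dimensions."""
    dims = [scores["Common Sense Relevance"], scores["Story Coherence"]]
    return abs(sum(dims) / len(dims) - scores["Overall Quality"]) <= tol
```

Both recorded responses pass this check (5 and 3 average to 4; 4 and 3 average to 3.5). Format drift like this is a practical argument for the JSON output requested in the final filtration prompt.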


In [None]:
## question is coherent with contradiction
## question is answerable from context
##  question is 
## number of characters less than 3 or not
## coherent or not?

# 2. Final Data Filtration

In [1]:
import json
import re
import time

from groq import Groq
from tqdm import tqdm


In [None]:
def creat_prompt(scenario):
    prompt=f'''Evaluate the following dilemma scenario and its associated question against four specific quality criteria.


Scenario and question:
{scenario}


EVALUATION CRITERIA:

Evaluate the scenario and question against each criterion below. Provide a "Yes" or "No" answer for each.

Criterion 1: Answerability from Context
Can the question be reasonably answered using the information provided in the scenario? 
- Answer "Yes" if answerable
- Answer "No" if the question requires external/specialized knowledge beyond the scenario

Criterion 2: Common Sense Integration
Does the scenario fundamentally incorporate and rely upon common sense knowledge?
- Answer "Yes" if the scenario involves everyday situations, social norms, physical properties, or typical human behavior that most people understand
- Answer "No" if the scenario requires specialized/technical knowledge or lacks realistic common sense elements

Criterion 3: Contradictory Situation Present
Does the scenario contain a clear contradictory or dilemmatic situation that creates tension or conflict?
- Answer "Yes" if there are conflicting elements, opposing choices, or logical inconsistencies (intentional for the dilemma)
- Answer "No" if there is no contradiction, dilemma, or conflicting elements present in the scenario

Criterion 4: Overall Coherence
Is the scenario internally consistent, logically structured, and clear in its narrative?
- Answer "Yes" if the scenario has no contradictory statements (except intentional dilemma contradictions), maintains temporal/logical consistency, and is comprehensible
- Answer "No" if there are plot holes, timeline inconsistencies, confusing elements, or unclear narrative flow


**OUTPUT FORMAT:**

Provide your evaluation as a valid JSON object with the following structure:

{{
  "is_answerable": "Yes" or "No",
  "has_common_sense": "Yes" or "No",
  "has_contradiction": "Yes" or "No",
  "is_coherent": "Yes" or "No",
  "reasoning": "Brief explanation for any 'No' answers or overall assessment"
}}

EVALUATION RULES:
- Output ONLY the JSON object, with no additional text

Now evaluate the provided scenario and question.
'''

    return prompt


from api_keys import GROQ_API  # put your Groq API key in api_keys.py
client = Groq(api_key=GROQ_API)


def ask_llm(query):
    # Stream a completion from the DeepSeek-R1 distill model hosted on Groq.
    completion = client.chat.completions.create(
      model="deepseek-r1-distill-llama-70b",
      messages=[
        {
          "role": "user",
          "content": creat_prompt(query)
        }
      ],
      temperature=0.5,
      max_completion_tokens=4096,
      top_p=0.95,
      reasoning_effort=None,
      stream=True,
      seed=8,
      stop=None)

    # Concatenate the streamed chunks into a single string.
    out=""
    for chunk in completion:
        out+=chunk.choices[0].delta.content or ""

    # Drop the model's <think>...</think> reasoning block before parsing.
    response = re.sub(r'<think>.*?</think>', '', out, flags=re.DOTALL)

    # Take everything from the first "{" to the last "}" and parse it as JSON.
    match = re.search(r'\{.*\}', response, re.DOTALL)
    if match is None:
        raise ValueError(f"No JSON object found in model output: {response[:200]!r}")
    json_out=json.loads(match.group())

    return json_out
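`ask_llm` strips the `<think>` block and then applies the greedy pattern `\{.*\}`, which spans from the first `{` to the last `}` in the response. If the model appends any trailing text containing a brace, that slice is no longer valid JSON. A minimal alternative, assuming the answer is the first complete JSON object after the reasoning block, uses the standard library's `json.JSONDecoder.raw_decode`, which parses one value starting at a given index and ignores what follows. `extract_json_object` is a hypothetical helper name.

```python
import json
import re

# Non-greedy match so several <think> blocks are removed independently.
THINK_RE = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def extract_json_object(text):
    """Return the first JSON object (dict) that appears after any <think> blocks."""
    cleaned = THINK_RE.sub("", text)
    decoder = json.JSONDecoder()
    for start, ch in enumerate(cleaned):
        if ch != "{":
            continue
        try:
            # raw_decode parses one JSON value beginning at `start` and ignores
            # whatever follows it, unlike a greedy \{.*\} regex slice.
            obj, _end = decoder.raw_decode(cleaned, start)
        except json.JSONDecodeError:
            continue  # this brace does not open valid JSON; try the next one
        if isinstance(obj, dict):
            return obj
    raise ValueError("no JSON object found in model output")
```

The regex search and `json.loads` call in `ask_llm` could then be replaced by `return extract_json_object(out)`.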

In [108]:
example='''The head of creative, Maya, has been a professional photographer for fifteen years despite being completely blind; she insists that “seeing” the composition is just a matter of feeling the light. Her assistant, who is a certified marathon runner, spends the entire workday in a reclining chair because “running is a mental sport.” The team’s market‑research analyst, a 112‑year‑old recent graduate, proudly shows off a diploma that was printed on parchment made from his own hair. They all agree that the ad should avoid gender stereotypes. **Should the company prioritize ethical advertising over potential profit and refrain from using gender stereotypes in their campaign?'''
print(ask_llm(example))

{'is_answerable': 'Yes', 'has_common_sense': 'No', 'has_contradiction': 'No', 'is_coherent': 'Yes', 'reasoning': "The scenario provides enough context to answer the question. However, it lacks common sense due to unrealistic elements like a 112-year-old graduate. There's no clear contradiction in the dilemma itself, and the narrative remains coherent despite unusual character traits."}
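The judge returns a dict with the four Yes/No fields plus `reasoning`. The prompt only asks for this schema; nothing in the request enforces it, so the model could emit "yes", omit a key, or return a non-object. A quick check before a verdict is accepted can catch such cases. This is a sketch: `validate_judgement` is a hypothetical helper, and the keys and allowed values are taken from the prompt's OUTPUT FORMAT section.

```python
# Keys and allowed values requested by the OUTPUT FORMAT section of the prompt.
CRITERIA = ("is_answerable", "has_common_sense", "has_contradiction", "is_coherent")
ALLOWED = ("Yes", "No")  # a tuple, so membership tests never need hashable values


def validate_judgement(obj):
    """Return a list of problems; an empty list means the verdict matches the schema."""
    if not isinstance(obj, dict):
        return ["response is not a JSON object"]
    problems = []
    for key in CRITERIA:
        if key not in obj:
            problems.append(f"missing key: {key}")
        elif obj[key] not in ALLOWED:
            problems.append(f"{key} has unexpected value {obj[key]!r}")
    if not isinstance(obj.get("reasoning"), str):
        problems.append("reasoning is missing or not a string")
    return problems
```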


In [109]:
example='''You are sitting in a serious boardroom when a colleague, who is a blind accountant, reads a joke aloud from a printed handout. The joke is inappropriate and offensive to some people in the room, but everyone else is laughing. The accountant’s guide‑dog is perched on the conference table, calmly watching the reactions. Do you join in the laughter to maintain solidarity with your colleagues, or do you keep quiet to respect those who might be offended?  '''
out=ask_llm(example)
print(out)

{'is_answerable': 'Yes', 'has_common_sense': 'Yes', 'has_contradiction': 'Yes', 'is_coherent': 'Yes', 'reasoning': 'The scenario is answerable from the context provided, integrates common sense social norms, presents a clear contradictory situation, and is internally consistent.'}


In [116]:
example='''The new parents proudly tell you that they are currently bankrupt, with a court‑ordered freeze on all their assets, yet they just bought a custom‑made, gold‑plated baby swing that costs more than their monthly rent. Luis mentions that the baby’s first outfit was a designer onesie purchased on credit, even though the family’s credit cards were all canceled last week. Maya says the baby’s diaper subscription is paid for by a “generous anonymous donor” who also funds their mortgage, despite the fact that the couple has no mortgage because they live in a house that legally belongs to the city. You have a mild cough and are debating whether to risk exposing the infant, but you also don’t want to disappoint Maya and Luis. **Should you refuse the invitation because you’re ill, or attend to keep them happy, possibly endangering the baby?'''
out=ask_llm(example)
print(out)

{'is_answerable': 'Yes', 'has_common_sense': 'Yes', 'has_contradiction': 'Yes', 'is_coherent': 'Yes', 'reasoning': 'The scenario is answerable with the given context, integrates common sense, presents a clear contradiction, and is coherent.'}


In [None]:
example='''The agency’s brainstorming room is located on the 10th floor, yet the window looks out onto a beach that is technically a desert according to the building’s blueprint. The whiteboard on the wall is simultaneously larger than the entire office space and small enough to fit on a coffee mug. Their client’s product prototype is stored in a locker that is both inside the secure vault and outside the perimeter fence at the same time. After reviewing the data, the team wonders whether to use gendered imagery to boost sales. **Should the company prioritize ethical advertising over potential profit and refrain from using gender stereotypes in their campaign?**'''
out=ask_llm(example)
print(out)

In [3]:
import pandas as pd

# Load the two scenario sets (the second path is machine-specific; adjust it locally).
df_saugata=pd.read_csv("dataset_init.csv")
df_pragya=pd.read_csv("/Users/pranavk/Desktop/Saarland/Sem 2/Shared Task/llm_scenarios_parsed.csv")

In [4]:
df_saugata["data_source"]="saugata"
df_pragya["data_source"]="pragya"

In [5]:
df_pragya=df_pragya[["Scenario","data_source"]]
df_saugata=df_saugata[["Scenario","data_source"]]


In [6]:
# Draw 20 scenarios from each source (no random_state, so the sample changes between runs).
df1=df_saugata.sample(20)
df2=df_pragya.sample(20)

In [None]:
final_df=pd.concat([df1,df2])
final_df

In [9]:
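out=[]
# Judge each sampled scenario and keep its source alongside the verdict.
for scene,data_src in tqdm(zip(final_df["Scenario"],final_df["data_source"])):
    json_out=ask_llm(scene)
    json_out["Scenario"]=scene
    json_out["data_source"]=data_src
    out.append(json_out)
    time.sleep(2)  # pause between requests to ease API rate limits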

40it [05:46,  8.67s/it]
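The loop issues one request per scenario with a fixed two-second pause, so a single transient failure (for example a rate-limit response or a dropped connection) stops the whole run partway through. A small retry wrapper with exponential backoff is one way to make it more forgiving. The sketch catches every `Exception` for simplicity; in practice it would be narrowed to the SDK's rate-limit and connection error types. `call_with_retries` is a hypothetical helper.

```python
import random
import time


def call_with_retries(fn, *args, retries=3, base_delay=2.0, **kwargs):
    """Call fn(*args, **kwargs), retrying on any exception with exponential backoff.

    Between attempts it sleeps base_delay * 2**attempt plus random jitter in
    [0, base_delay], and re-raises the last exception once retries run out.
    """
    for attempt in range(retries + 1):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the original error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

In the loop, `json_out = call_with_retries(ask_llm, scene)` would replace the direct call. Because `retries` and `base_delay` are keyword-only, they are consumed by the wrapper and never forwarded to `fn`.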


In [10]:
out

[{'is_answerable': 'Yes',
  'has_common_sense': 'Yes',
  'has_contradiction': 'Yes',
  'is_coherent': 'Yes',
  'reasoning': 'The scenario is answerable from context, uses common sense elements, presents a clear contradiction between a desert and snow, and is coherent in its narrative.',
  'Scenario': "My sister loves snow and is planning a trip to a desert that's supposed to have snowfall, and she's invited me to join her. As we stepped out of the plane, she was thrilled to see the sandy dunes and the cacti, and she told me that the snow would start falling soon. The locals seemed to be preparing for a snowstorm, and I saw them setting up snowshoes and sleds. Should I join her in her snow-filled adventure or explore the local market?",
  'data_source': 'saugata'},
 {'is_answerable': 'No',
  'has_common_sense': 'Yes',
  'has_contradiction': 'Yes',
  'is_coherent': 'Yes',
  'reasoning': "The scenario lacks specific details about the flood's depth and the car's capabilities, requiring ext

In [None]:
out=pd.DataFrame(out)
out

In [39]:
criteria=["is_answerable","has_common_sense","has_contradiction","is_coherent"]
# Keep scenarios that pass at least 3 of the 4 criteria.
filtered=out[(out[criteria]=="Yes").sum(axis=1)>=3]
filtered

| index | is_answerable | has_common_sense | has_contradiction | is_coherent | reasoning | Scenario | data_source |
|---|---|---|---|---|---|---|---|
| 0 | Yes | Yes | Yes | Yes | The scenario is answerable from context, uses ... | My sister loves snow and is planning a trip to... | saugata |
| 1 | No | Yes | Yes | Yes | The scenario lacks specific details about the ... | As I walked through the desert during monsoon,... | saugata |
| 6 | Yes | Yes | Yes | No | The scenario is incoherent due to a timeline i... | I was living in 100 AD, almost a century befor... | saugata |
| 7 | No | Yes | Yes | Yes | The question cannot be answered from the conte... | My friend, who is trying to reduce fat, ordere... | saugata |
| 8 | No | Yes | Yes | Yes | The scenario lacks specific information about ... | As I planned our family holiday to Ecuador, I ... | saugata |
| 9 | No | Yes | Yes | Yes | The question cannot be answered from the conte... | My friend, a strict vegetarian, hosted a dinne... | saugata |
| 10 | No | Yes | Yes | Yes | The question requires external knowledge about... | I've lived in our village my entire life, and ... | saugata |
| 11 | Yes | Yes | Yes | Yes | The scenario provides enough context to answer... | As I sat under the shed of my rose tree, I fel... | saugata |
| 12 | No | Yes | Yes | Yes | The scenario lacks specific details to determi... | My neighbor, a photographer, showed me picture... | saugata |
| 13 | Yes | Yes | Yes | Yes | The scenario is answerable from context, integ... | In an effort to reduce protein, I began making... | saugata |
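The filter keeps any scenario with at least three of four "Yes" answers, so one failed criterion is tolerated. In the rows displayed above, rows 1, 7, 8, 9, 10 and 12 survive despite `is_answerable` being "No", and row 6 survives despite `is_coherent` being "No". The same rule written over plain dicts (the `out` list before it becomes a DataFrame) makes the threshold explicit; `yes_count`, `keep_scenario` and `min_yes` are illustrative names, not part of the notebook.

```python
CRITERIA = ("is_answerable", "has_common_sense", "has_contradiction", "is_coherent")


def yes_count(record):
    """Number of criteria the judge answered "Yes" for (missing keys count as not Yes)."""
    return sum(record.get(key) == "Yes" for key in CRITERIA)


def keep_scenario(record, min_yes=3):
    """Same rule as the pandas filter: keep rows with at least min_yes "Yes" answers."""
    return yes_count(record) >= min_yes
```

Setting `min_yes=4` would apply the stricter rule that every criterion must pass.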


In [40]:
len(filtered)

31
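31 of the 40 sampled scenarios pass. To see which criterion drives the rejections and whether the two data sources differ, the verdicts can be tallied. A minimal standard-library sketch, assuming each record carries the four criterion keys and `data_source` as built in the loop above; any value other than "Yes" (including a missing key) counts as a failure, matching the pandas `== "Yes"` comparison. `summarize` is a hypothetical helper.

```python
from collections import Counter, defaultdict

CRITERIA = ("is_answerable", "has_common_sense", "has_contradiction", "is_coherent")


def summarize(records, min_yes=3):
    """Return (per-criterion failure counts, per-source pass rates) for judge verdicts."""
    no_counts = Counter()
    totals = defaultdict(int)
    kept = defaultdict(int)
    for rec in records:
        yes = 0
        for key in CRITERIA:
            if rec.get(key) == "Yes":
                yes += 1
            else:
                no_counts[key] += 1  # "No", other strings and missing keys all count
        src = rec.get("data_source", "unknown")
        totals[src] += 1
        kept[src] += int(yes >= min_yes)
    pass_rate = {src: kept[src] / totals[src] for src in totals}
    return dict(no_counts), pass_rate
```

On the notebook's DataFrame it could be called as `summarize(out.to_dict("records"))`.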

In [12]:
pd.DataFrame(out).to_csv("filtering_llm_outputs.csv", index=False)