# Query Local Llama.cpp LLM with Explicit Value-Only Survey Responses

This notebook loads survey questions from a CSV file and queries a locally-running llama.cpp server with prompts that require the model to answer **only with a value from the provided response scale** (no explanation, just the value).

**Data Source:** EU Values Survey (ZA7500_q_gb_v2.csv)

**Workflow:**
1. Load the CSV file containing all survey questions and their options
2. Build explicit prompts for each question-variable-option combination, instructing the model to answer with a value from the scale
3. Send each prompt to the local llama.cpp server
4. Collect responses
5. Save all output in machine-readable CSV and JSON files for easy parsing

**Prerequisites:**
- A llama.cpp server running on `http://127.0.0.1:10006/` (or adjust the URL in the cells below)
- Install `requests` and `pandas`

**Example to start a local llama.cpp server using :**
```bash
/llama-server -m gemma3:12b --mmproj /Users/jls/Library/Caches/llama.cpp/ggml-org_gemma-3-12b-it-GGUF_mmproj-model-f16.gguf --ctx-size 0  -ub 2048 -b 2048 -ngl 99 -fa on --chat-template-kwargs {\"reasoning_effort\":\"medium\"} --port 10006
```


In [None]:
!pip install pandas requests PyMuPDF pdfplumber


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.2[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
^C


In [54]:
import requests
import json
from typing import List, Dict, Optional
import time
import pandas

# Configuration
LLAMA_SERVER_URL = "http://127.0.0.1:10000"
CHAT_ENDPOINT = f"{LLAMA_SERVER_URL}/v1/chat/completions"

print(f"LLM Server URL: {LLAMA_SERVER_URL}")
print(f"Chat Endpoint: {CHAT_ENDPOINT}")

LLM Server URL: http://127.0.0.1:10000
Chat Endpoint: http://127.0.0.1:10000/v1/chat/completions


In [55]:
def query_llm(
    question: str,
    temperature: float = 0.7,
    max_tokens: int = -1,
    timeout: int = 300
) -> Optional[str]:
    """
    Send a question to the local llama.cpp server and get a response.
    
    Args:
        question: The question/prompt to send
        temperature: Sampling temperature (0.0 = deterministic, 1.0 = more random)
        max_tokens: Maximum tokens in response
        timeout: Request timeout in seconds
    
    Returns:
        The LLM's response text, or None if error
    """
    try:
        payload = {
            "model": "local-model",  # llama.cpp uses this default
            "messages": [
                {
                    "role": "user",
                    "content": question
                }
            ],
            "temperature": temperature,
            "max_tokens": max_tokens,
            "stream": False
        }
        
        print(f"Querying: {question[:80]}...")
        response = requests.post(
            CHAT_ENDPOINT,
            json=payload,
            timeout=timeout
        )
        
        if response.status_code == 200:
            result = response.json()
            answer = result.get("choices", [{}])[0].get("message", {}).get("content", "").strip()
            return answer
        else:
            print(f"  Error: HTTP {response.status_code}")
            print(f"  Response: {response.text[:200]}")
            return None
            
    except requests.exceptions.Timeout:
        print(f"  Error: Request timeout after {timeout}s")
        return None
    except requests.exceptions.ConnectionError:
        print(f"  Error: Could not connect to {LLAMA_SERVER_URL}")
        return None
    except Exception as e:
        print(f"  Error: {e}")
        return None

In [56]:
# Test connection to the server with detailed debugging
print("Testing connection to local llama.cpp server...")
test_question = "Hello, how are you?"

try:
    test_payload = {
        "model": "local-model",
        "messages": [{"role": "user", "content": test_question}],
        "temperature": 0.7,
        "max_tokens": 5000,
        "stream": False
    }
    
    print(f"Server URL: {LLAMA_SERVER_URL}")
    print(f"Payload: {test_payload}\n")
    
    response = requests.post(CHAT_ENDPOINT, json=test_payload, timeout=30)
    
    print(f"Response Status: {response.status_code}")
    print(f"Response Headers: {response.headers}")
    print(f"Response Body: {response.text}\n")
    
    if response.status_code == 200:
        result = response.json()
        test_answer = result.get("choices", [{}])[0].get("message", {}).get("content", "").strip()
        print(f"✓ Test successful! LLM responded:")
        print(f"  Q: {test_question}")
        print(f"  A: {test_answer}\n")
    else:
        print(f"✗ Server returned error {response.status_code}")
        try:
            error_detail = response.json()
            print(f"  Error details: {error_detail}")
        except:
            print(f"  Raw response: {response.text}")
            
except Exception as e:
    print(f"✗ Connection failed: {e}")
    print("\nPlease ensure:")
    print("  1. llama.cpp server is running on http://127.0.0.1:10006/")
    print("  2. You can reach the endpoint (check firewall, port, etc.)")
    print("  3. Try in terminal: curl -X POST http://127.0.0.1:10006/v1/chat/completions -H 'Content-Type: application/json' -d '{\"messages\":[{\"role\":\"user\",\"content\":\"hi\"}]}'")

Testing connection to local llama.cpp server...
Server URL: http://127.0.0.1:10000
Payload: {'model': 'local-model', 'messages': [{'role': 'user', 'content': 'Hello, how are you?'}], 'temperature': 0.7, 'max_tokens': 5000, 'stream': False}

Response Status: 200
Response Headers: {'Keep-Alive': 'timeout=5, max=100', 'Content-Type': 'application/json; charset=utf-8', 'Server': 'llama.cpp', 'Content-Length': '1557', 'Access-Control-Allow-Origin': ''}
Response Body: {"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","reasoning_content":"Okay, the user is greeting me. I need to respond appropriately. Since I'm an AI, I don't have feelings, but I can simulate a friendly conversation. I should ask how they're doing in return or respond in a polite manner. I'll go with a friendly response that acknowledges the greeting and asks about their well-being.\n\nLet's craft a response:\n\n\"Hello! I'm an AI, so I don't have feelings, but I'm here and ready to help you. How abo

In [None]:
import os
import requests
import base64

def query_llm_with_pdf(
    pdf_file_path: str,
    prompt: str,
    temperature: float = 0.7,
    max_tokens: int = -1,
    timeout: int = 120
) -> Optional[str]:
    """
    Extract text from a PDF file and send it with a prompt to the local llama.cpp server.
    
    Args:
        pdf_file_path: Path to the PDF file
        prompt: The prompt/instructions to send with the PDF content
        temperature: Sampling temperature
        max_tokens: Maximum tokens in response
        timeout: Request timeout in seconds
    
    Returns:
        The LLM's response text, or None if error
    """
    try:
        # Check if file exists
        if not os.path.exists(pdf_file_path):
            print(f"Error: PDF file not found: {pdf_file_path}")
            return None
        
        # Extract text from PDF using pdfplumber
        try:
            import pdfplumber
        except ImportError:
            print("Installing pdfplumber...")
            import subprocess
            subprocess.check_call(["pip", "install", "pdfplumber"])
            import pdfplumber
        
        # Extract text from all pages
        pdf_text = ""
        with pdfplumber.open(pdf_file_path) as pdf:
            for page_num, page in enumerate(pdf.pages, 1):
                page_text = page.extract_text()
                if page_text:
                    pdf_text += f"\n\n--- Page {page_num} ---\n{page_text}"
        
        if not pdf_text.strip():
            print("Error: Could not extract text from PDF")
            return None
        
        # Build the message with the extracted PDF text
        combined_content = f"PDF Content:\n{pdf_text}\n\n{prompt}"
        
        payload = {
            "model": "local-model",
            "messages": [
                {
                    "role": "user",
                    "content": combined_content
                }
            ],
            "temperature": temperature,
            "max_tokens": max_tokens,
            "stream": False
        }
        
        print(f"Sending PDF: {os.path.basename(pdf_file_path)}")
        print(f"Extracted text: {len(pdf_text)} characters")
        print(f"Prompt: {prompt[:100]}...")
        
        response = requests.post(
            CHAT_ENDPOINT,
            json=payload,
            timeout=timeout
        )
        
        if response.status_code == 200:
            result = response.json()
            answer = result.get("choices", [{}])[0].get("message", {}).get("content", "").strip()
            return answer
        else:
            print(f"Error: HTTP {response.status_code}")
            print(f"Response: {response.text[:500]}")
            return None
            
    except Exception as e:
        print(f"Error: {e}")
        import traceback
        traceback.print_exc()
        return None

# Parsing EU surveys examples

In [None]:
# Query llama.cpp with ZA7500_q_gb.pdf and extraction prompt
pdf_path = "./Surveys/ZA7500_q_gb.pdf"

extraction_prompt = """Extract the main questions and response choices from the attached PDF into a CSV file. 
Make sure that for the provided answer choices you include the choices available in the corresponding card (reported in subsequent pages in the same document).
Include also the text of the questions along with the Question number, e.g., Q1: Please say, for each of the following, how important it is in your life. 
Include a column 'Response_Scale' the scale and the category it belongs the number, e.g., for Q1 the scale or possible options should be: 1 (very important) 
- 2 (quite important) - 3 (not important) - 4 (not at all important) - 8 (Don't Know, DK) - 9 (Not Answer, NA). Split every question if necessary by the response options. 
Extract the variables names v1, v2, v3, ... and so, and be sure that all questions have a variable name, response option and include them as a column of the csv. 
Be sure that every column field is quoted to make easy parsing for text including commas. 
Output the CSV content directly with proper formatting and column names Question Number,Question Text,Response_Scale,Response_Options,Card_Number,Variable_Name
Example for Q1: 
Q1,"Please say, for each of the following, how important it is in your life.","1 (very important) - 2 (quite important) - 3 (not important) - 4 (not at all important) - 8 (Don't Know, DK) - 9 (Not Answer, NA)",Work,1,v1
Q1,"Please say, for each of the following, how important it is in your life.","1 (very important) - 2 (quite important) - 3 (not important) - 4 (not at all important) - 8 (Don't Know, DK) - 9 (Not Answer, NA)",Family,1,v2
Q1,"Please say, for each of the following, how important it is in your life.","1 (very important) - 2 (quite important) - 3 (not important) - 4 (not at all important) - 8 (Don't Know, DK) - 9 (Not Answer, NA)",Friends and acquaintances,1,v3
Q1,"Please say, for each of the following, how important it is in your life.","1 (very important) - 2 (quite important) - 3 (not important) - 4 (not at all important) - 8 (Don't Know, DK) - 9 (Not Answer, NA)",Leisure time,1,v4
Q1,"Please say, for each of the following, how important it is in your life.","1 (very important) - 2 (quite important) - 3 (not important) - 4 (not at all important) - 8 (Don't Know, DK) - 9 (Not Answer, NA)",Politics,1,v5
Q1,"Please say, for each of the following, how important it is in your life.","1 (very important) - 2 (quite important) - 3 (not important) - 4 (not at all important) - 8 (Don't Know, DK) - 9 (Not Answer, NA)",Religion,1,v6
"""

print("=" * 80)
print("Querying llama.cpp with PDF extraction request")
print("=" * 80)
print(f"\nPDF File: {pdf_path}")
print(f"Prompt: {extraction_prompt[:150]}...\n")

# Query the LLM with the PDF
response = query_llm_with_pdf(
    pdf_path,
    extraction_prompt,
    temperature=0.3,  # Lower temperature for more consistent extraction
    max_tokens=-1,  # No max token limit
    timeout=3000 # Longer timeout for PDF processing
)

if response:
    print("\n" + "=" * 80)
    print("LLM Response (CSV Content):")
    print("=" * 80 + "\n")
    print(response)
    
    # Save response to file with same name as PDF but with .csv extension
    output_file = os.path.splitext(pdf_path)[0] + ".csv"
    with open(output_file, "w") as f:
        f.write(response)
    print(f"\n✓ Response saved to: {output_file}")
else:
    print("\n✗ Failed to get response from llama.cpp")

Querying llama.cpp with PDF extraction request

PDF File: ./Surveys/ZA7500_q_gb.pdf
Prompt: Extract the main questions and response choices from the attached PDF into a CSV file. 
Make sure that for the provided answer choices you include the...



Exception ignored in: <bound method IPythonKernel._clean_thread_parent_frames of <ipykernel.ipkernel.IPythonKernel object at 0x714992344c90>>
Traceback (most recent call last):
  File "/home/id02619@hi.inet/.pyenv/versions/langchain/lib/python3.11/site-packages/ipykernel/ipkernel.py", line 775, in _clean_thread_parent_frames
    def _clean_thread_parent_frames(

KeyboardInterrupt: 


Sending PDF: ZA7500_q_gb.pdf
Extracted text: 88943 characters
Prompt: Extract the main questions and response choices from the attached PDF into a CSV file. 
Make sure th...


In [None]:
# Query llama.cpp with ZA7500_q_gb.pdf and extraction prompt
pdf_path = "./Surveys/ZA7500_q_es.pdf"

extraction_prompt = """Extract the main questions and response choices from the corresponding text into a CSV file, parsing the language Spanish. 
Make sure that for the provided answer choices you include the choices available in the corresponding card (reported in subsequent pages in the same document).
Include also the text of the questions along with the Question number, e.g., Q1: Por favor, indica qué tan importante es cada uno de los siguientes aspectos en tu vida. 
Include a column 'Response_Scale' the scale and the category it belongs the number, e.g., for Q1 the scale or possible options should be: 1 (muy importante) - 2 (bastante importante) - 3 (no importante) - 4 (nada importante) - 8 (No Sabe, NS) - 9 (Sin Respuesta, SR).
Split every question if necessary by the response options. Extract the variables names v1, v2, v3, ... and so, and be sure that all questions have a variable name, response option and include them as a column of the csv. 
Example for Q1 in Spanish: 
"Q1","Por favor, indica qué tan importante es cada uno de los siguientes aspectos en tu vida.","1 (muy importante) - 2 (bastante importante) - 3 (no importante) - 4 (nada importante) - 8 (No Sabe, NS) - 9 (Sin Respuesta, SR)","Trabajo","1","v1"
"Q1","Por favor, indica qué tan importante es cada uno de los siguientes aspectos en tu vida.","1 (muy importante) - 2 (bastante importante) - 3 (no importante) - 4 (nada importante) - 8 (No Sabe, NS) - 9 (Sin Respuesta, SR)","Familia","1","v2"
"Q1","Por favor, indica qué tan importante es cada uno de los siguientes aspectos en tu vida.","1 (muy importante) - 2 (bastante importante) - 3 (no importante) - 4 (nada importante) - 8 (No Sabe, NS) - 9 (Sin Respuesta, SR)","Amigos y conocidos","1","v3"
"Q1","Por favor, indica qué tan importante es cada uno de los siguientes aspectos en tu vida.","1 (muy importante) - 2 (bastante importante) - 3 (no importante) - 4 (nada importante) - 8 (No Sabe, NS) - 9 (Sin Respuesta, SR)","Tiempo libre","1","v4"
"Q1","Por favor, indica qué tan importante es cada uno de los siguientes aspectos en tu vida.","1 (muy importante) - 2 (bastante importante) - 3 (no importante) - 4 (nada importante) - 8 (No Sabe, NS) - 9 (Sin Respuesta, SR)","Política","1","v5"
"Q1","Por favor, indica qué tan importante es cada uno de los siguientes aspectos en tu vida.","1 (muy importante) - 2 (bastante importante) - 3 (no importante) - 4 (nada importante) - 8 (No Sabe, NS) - 9 (Sin Respuesta, SR)","Religión","1","v6"
Make sure that each column field is enclosed in quotes to facilitate parsing text containing commas.
Generate the CSV file content directly in a suitable format and with the column names: Question Number, Question Text, Response_Scale, Response_Options, Card_Number, Variable_Name.
There are a total of 111 questions and 286 variables. Extract them all in order, Q1, Q2, ... until Q111 and do not skip any. The csv should contain a total of 286 rows.
"""

extraction_prompt = """Extrae las preguntas principales y las opciones de respuesta del texto en el idioma original y formatea la información en un archivo CSV. Utiliza español. 
Asegúrate de incluir las opciones de respuesta disponibles en la tarjeta correspondiente (informada en páginas posteriores del mismo documento).
Incluye también el texto de las preguntas junto con el número de la pregunta, por ejemplo: P1: Por favor, indica qué tan importante es cada uno de los siguientes aspectos en tu vida. 
Incluye una columna 'Response_Scale' que especifique la escala y la categoría a la que pertenece el número, por ejemplo, para la primera opción en P1, la 'Response_Scale' (posibles opciones) debería ser: 1 (muy importante) - 2 (bastante importante) - 3 (no importante) - 4 (nada importante) - 8 (No Sabe, NS) - 9 (Sin Respuesta, SR) y las 'Response_Options' deberían ser: Trabajo. Divide cada pregunta si es necesario por las opciones de respuesta.
Extrae los nombres de variables v1, v2, v3, y así sucesivamente, y asegúrate de que todas las preguntas tengan un nombre de variable (tantas filas como variables), una opción de respuesta y que se incluyan como una columna en el archivo CSV.
Ejemplo para Q1 en español:
"Question_Number","Question_Text","Response_Scale","Response_Option","Score","Variable_Name"
"Q1","Por favor, indica qué tan importante es cada uno de los siguientes aspectos en tu vida.","1 (muy importante) - 2 (bastante importante) - 3 (no importante) - 4 (nada importante) - 8 (No Sabe, NS) - 9 (Sin Respuesta, SR)","Trabajo","1","v1"
"Q1","Por favor, indica qué tan importante es cada uno de los siguientes aspectos en tu vida.","1 (muy importante) - 2 (bastante importante) - 3 (no importante) - 4 (nada importante) - 8 (No Sabe, NS) - 9 (Sin Respuesta, SR)","Familia","1","v2"
"Q1","Por favor, indica qué tan importante es cada uno de los siguientes aspectos en tu vida.","1 (muy importante) - 2 (bastante importante) - 3 (no importante) - 4 (nada importante) - 8 (No Sabe, NS) - 9 (Sin Respuesta, SR)","Amigos y conocidos","1","v3"
"Q1","Por favor, indica qué tan importante es cada uno de los siguientes aspectos en tu vida.","1 (muy importante) - 2 (bastante importante) - 3 (no importante) - 4 (nada importante) - 8 (No Sabe, NS) - 9 (Sin Respuesta, SR)","Tiempo libre","1","v4"
"Q1","Por favor, indica qué tan importante es cada uno de los siguientes aspectos en tu vida.","1 (muy importante) - 2 (bastante importante) - 3 (no importante) - 4 (nada importante) - 8 (No Sabe, NS) - 9 (Sin Respuesta, SR)","Política","1","v5"
"Q1","Por favor, indica qué tan importante es cada uno de los siguientes aspectos en tu vida.","1 (muy importante) - 2 (bastante importante) - 3 (no importante) - 4 (nada importante) - 8 (No Sabe, NS) - 9 (Sin Respuesta, SR)","Religión","1","v6"
Asegúrate de que cada campo de columna esté entre comillas para facilitar el análisis de texto que incluya comas.
Genera el contenido del archivo CSV directamente con un formato adecuado y nombres de columna: Question Number,Question Text,Response_Scale,Response_Options,Card_Number,Variable_Name
Hay un total de 111 preguntas y 286 variables. Extrae todas en orden, Q1, Q2, ... hasta Q111 y no omitas ninguna. El archivo CSV debe contener un total de 286 filas.
"""

print("=" * 80)
print("Querying llama.cpp with PDF extraction request")
print("=" * 80)
print(f"\nPDF File: {pdf_path}")
print(f"Prompt: {extraction_prompt[:150]}...\n")

# Query the LLM with the PDF
response = query_llm_with_pdf(
    pdf_path,
    extraction_prompt,
    temperature=0.3,  # Lower temperature for more consistent extraction
    max_tokens=-1,  # No max token limit
    timeout=300000 # Longer timeout for PDF processing
)

if response:
    print("\n" + "=" * 80)
    print("LLM Response (CSV Content):")
    print("=" * 80 + "\n")
    print(response)
    
    # Save response to file with same name as PDF but with .csv extension
    output_file = os.path.splitext(pdf_path)[0] + ".csv"
    with open(output_file, "w") as f:
        f.write(response)
    print(f"\n✓ Response saved to: {output_file}")
else:
    print("\n✗ Failed to get response from llama.cpp")

Querying llama.cpp with PDF extraction request

PDF File: ./Surveys/ZA7500_q_es.pdf
Prompt: Extrae las preguntas principales y las opciones de respuesta del texto en el idioma original y formatea la información en un archivo CSV. Utiliza espa...

Sending PDF: ZA7500_q_es.pdf
Extracted text: 72065 characters
Prompt: Extrae las preguntas principales y las opciones de respuesta del texto en el idioma original y forma...

LLM Response (CSV Content):

```csv
"Question_Number","Question_Text","Response_Scale","Response_Options","Card_Number","Variable_Name"
"Q1","Por favor, indica qué tan importante es cada uno de los siguientes aspectos en tu vida.","1 (muy importante) - 2 (bastante importante) - 3 (no muy importante) - 4 (nada importante) - 8 (No Sabe, NS) - 9 (Sin Respuesta, SR)","Trabajo","1","v1"
"Q1","Por favor, indica qué tan importante es cada uno de los siguientes aspectos en tu vida.","1 (muy importante) - 2 (bastante importante) - 3 (no muy importante) - 4 (nada importante) - 8 (N

In [None]:
# Query llama.cpp with ZA7500_q_gb.pdf and extraction prompt
pdf_path = "./Surveys/ZA7500_q_it.pdf"

extraction_prompt = """Estrai le domande principali e le opzioni di risposta dal testo nella lingua originale e formatta le informazioni in un file CSV. Utilizza l'italiano.
Assicurati di includere le opzioni di risposta disponibili sulla scheda corrispondente (indicate nelle pagine successive dello stesso documento).
Includi anche il testo delle domande insieme al numero della domanda, ad esempio: Q1: Per favore, indica quanto sia importante per te ciascuno dei seguenti aspetti nella tua vita.
Includi una colonna 'Response_Scale' che specifichi la scala e la categoria a cui appartiene il numero, ad esempio, per la prima opzione in D1, la 'Response_Scale' (opzioni possibili) 
dovrebbe essere: 1 (molto importante) - 2 (abbastanza importante) - 3 (non importante) - 4 (per niente importante) - 8 (Non so, NS) - 9 (Nessuna risposta, SR) e le 'Response_Option' 
dovrebbero essere: Lavoro. Dividi ogni domanda se necessario per le opzioni di risposta.
Estrai i nomi delle variabili v1, v2, v3, e così via, e assicurati che tutte le domande abbiano un nome di variabile (tante righe quante variabili), un'opzione di risposta e che siano incluse come una colonna nel file CSV.
"Q1","Per favore, indica quanto sia importante per te ciascuno dei seguenti aspetti nella tua vita.","1 (molto importante) - 2 (abbastanza importante) - 3 (non importante) - 4 (per niente importante) - 8 (Non so, NS) - 9 (Nessuna risposta, SR)","Lavoro","1","v1"
"Q1","Per favore, indica quanto sia importante per te ciascuno dei seguenti aspetti nella tua vita.","1 (molto importante) - 2 (abbastanza importante) - 3 (non importante) - 4 (per niente importante) - 8 (Non so, NS) - 9 (Nessuna risposta, SR)","Famiglia","1","v2"
"Q1","Per favore, indica quanto sia importante per te ciascuno dei seguenti aspetti nella tua vita.","1 (molto importante) - 2 (abbastanza importante) - 3 (non importante) - 4 (per niente importante) - 8 (Non so, NS) - 9 (Nessuna risposta, SR)","Amici e conoscenti","1","v3"
"Q1","Per favore, indica quanto sia importante per te ciascuno dei seguenti aspetti nella tua vita.","1 (molto importante) - 2 (abbastanza importante) - 3 (non importante) - 4 (per niente importante) - 8 (Non so, NS) - 9 (Nessuna risposta, SR)","Tempo libero","1","v4"
"Q1","Per favore, indica quanto sia importante per te ciascuno dei seguenti aspetti nella tua vita.","1 (molto importante) - 2 (abbastanza importante) - 3 (non importante) - 4 (per niente importante) - 8 (Non so, NS) - 9 (Nessuna risposta, SR)","Politica","1","v5"
"Q1","Per favore, indica quanto sia importante per te ciascuno dei seguenti aspetti nella tua vita.","1 (molto importante) - 2 (abbastanza importante) - 3 (non importante) - 4 (per niente importante) - 8 (Non so, NS) - 9 (Nessuna risposta, SR)","Religione","1","v6"
Assicurati che ogni campo di colonna sia tra virgolette per facilitare l'analisi di testi che includono virgole.
Genera il contenuto del file CSV direttamente con un formato adeguato e nomi di colonna: Question_Number,Question_Text,Response_Scale,Response_Option,Score,Variable_Name
"""

print("=" * 80)
print("Querying llama.cpp with PDF extraction request")
print("=" * 80)
print(f"\nPDF File: {pdf_path}")
print(f"Prompt: {extraction_prompt[:150]}...\n")

# Query the LLM with the PDF
response = query_llm_with_pdf(
    pdf_path,
    extraction_prompt,
    temperature=0.3,  # Lower temperature for more consistent extraction
    max_tokens=-1,  # No max token limit
    timeout=300000 # Longer timeout for PDF processing
)

if response:
    print("\n" + "=" * 80)
    print("LLM Response (CSV Content):")
    print("=" * 80 + "\n")
    print(response)
    
    # Save response to file with same name as PDF but with .csv extension
    output_file = os.path.splitext(pdf_path)[0] + ".csv"
    with open(output_file, "w") as f:
        f.write(response)
    print(f"\n✓ Response saved to: {output_file}")
else:
    print("\n✗ Failed to get response from llama.cpp")

Querying llama.cpp with PDF extraction request

PDF File: ./Surveys/ZA7500_q_it.pdf
Prompt: Extract the main questions and response choices from the attached PDF into a CSV file using the pdf language (Italian).
Make sure that for the provide...

Sending PDF: ZA7500_q_it.pdf
Extracted text: 65801 characters
Prompt: Extract the main questions and response choices from the attached PDF into a CSV file using the pdf ...


In [None]:
# Query llama.cpp with ZA7500_q_gb.pdf and extraction prompt
pdf_path = "./Surveys/ZA7500_q_ba.pdf"

extraction_prompt = """Extract the main questions and response choices from the attached PDF into a CSV file using the pdf language (Serbian).
Make sure that for the provided answer choices you include the choices available in the corresponding card (reported in subsequent pages in the same document).
Include also the text of the questions along with the Question number, e.g., Q1: Please say, for each of the following, how important it is in your life. 
Include a column 'Response_Scale' the scale and the category it belongs the number, e.g., for Q1 the scale or possible options should be: 1 (very important) 
- 2 (quite important) - 3 (not important) - 4 (not at all important) - 8 (Don't Know, DK) - 9 (Not Answer, NA). Split every question if necessary by the response options. 
Extract the variables names v1, v2, v3, ... and so, and be sure that all questions have a variable name, response option and include them as a column of the csv. 
Be sure that every column field is quoted to make easy parsing for text including commas. 
Output the CSV content directly with proper formatting and column names Question Number,Question Text,Response_Scale,Response_Options,Card_Number,Variable_Name
"""

print("=" * 80)
print("Querying llama.cpp with PDF extraction request")
print("=" * 80)
print(f"\nPDF File: {pdf_path}")
print(f"Prompt: {extraction_prompt[:150]}...\n")

# Query the LLM with the PDF
response = query_llm_with_pdf(
    pdf_path,
    extraction_prompt,
    temperature=0.3,  # Lower temperature for more consistent extraction
    max_tokens=-1,  # No max token limit
    timeout=3000 # Longer timeout for PDF processing
)

if response:
    print("\n" + "=" * 80)
    print("LLM Response (CSV Content):")
    print("=" * 80 + "\n")
    print(response)
    
    # Save response to file with same name as PDF but with .csv extension
    output_file = os.path.splitext(pdf_path)[0] + ".csv"
    with open(output_file, "w") as f:
        f.write(response)
    print(f"\n✓ Response saved to: {output_file}")
else:
    print("\n✗ Failed to get response from llama.cpp")

Querying llama.cpp with PDF extraction request

PDF File: ./Surveys/ZA7500_q_it.pdf
Prompt: Extract the main questions and response choices from the attached PDF into a CSV file using the pdf language (Italian).
Make sure that for the provide...

Sending PDF: ZA7500_q_it.pdf
Extracted text: 65801 characters
Prompt: Extract the main questions and response choices from the attached PDF into a CSV file using the pdf ...

LLM Response (CSV Content):

```csv
"Question Number","Question Text","Response_Scale","Response_Options","Card_Number","Variable_Name"
"Q1","Please say, for each of the following, how important it is in your life.","1 (Very important) - 2 (Quite important) - 3 (Not important) - 4 (Not at all important) - 8 (Don't Know, DK) - 9 (Not Answer, NA)","1,2,3,4,8,9",1,"v1"
"Q1","Please say, for each of the following, how important it is in your life.","1 (Very important) - 2 (Quite important) - 3 (Not important) - 4 (Not at all important) - 8 (Don't Know, DK) - 9 (Not Answer, NA

In [None]:
# Query llama.cpp with ZA7500_q_gb.pdf and extraction prompt
pdf_path = "./Surveys/ZA7500_q_cz.pdf"

extraction_prompt = """Extract the main questions and response choices from the corresponding text into a CSV file, parsing the language Czech. 
Make sure that for the provided answer choices you include the choices available in the corresponding card (reported in subsequent pages in the same document).
Include also the text of the questions along with the Question number, e.g., Q1: Prosím, řekněte, jak moc je každý z následujících aspektů důležitý ve vašem životě. 
Include a column 'Response_Scale' the scale and the category it belongs the number, e.g., for Q1 the scale or possible options should be: 1 (velmi důležité) - 2 (docela důležité) - 3 (nedůležité) - 4 (vůbec nedůležité) - 8 (Nevím, NS) - 9 (Bez odpovědi, SR).
Split every question if necessary by the response options. Extract the variables names v1, v2, v3, ... and so, and be sure that all questions have a variable name, response option and include them as a column of the csv. 
Be sure that every column field is quoted to make easy parsing for text including commas. 
Output the CSV content directly with proper formatting and column names Question Number,Question Text,Response_Scale,Response_Options,Card_Number,Variable_Name
Example for Q1 in Czech: 
"Q1","Prosím, řekněte, jak moc je každý z následujících aspektů důležitý ve vašem životě.","1 (velmi důležité) - 2 (docela důležité) - 3 (nedůležité) - 4 (vůbec nedůležité) - 8 (Nevím, NS) - 9 (Bez odpovědi, SR)","Práce","1","v1"
"Q1","Prosím, řekněte, jak moc je každý z následujících aspektů důležitý ve vašem životě.","1 (velmi důležité) - 2 (docela důležité) - 3 (nedůležité) - 4 (vůbec nedůležité) - 8 (Nevím, NS) - 9 (Bez odpovědi, SR)","Rodina","1","v2"
"Q1","Prosím, řekněte, jak moc je každý z následujících aspektů důležitý ve vašem životě.","1 (velmi důležité) - 2 (docela důležité) - 3 (nedůležité) - 4 (vůbec nedůležité) - 8 (Nevím, NS) - 9 (Bez odpovědi, SR)","Přátelé a známí","1","v3"
"Q1","Prosím, řekněte, jak moc je každý z následujících aspektů důležitý ve vašem životě.","1 (velmi důležité) - 2 (docela důležité) - 3 (nedůležité) - 4 (vůbec nedůležité) - 8 (Nevím, NS) - 9 (Bez odpovědi, SR)","Volný čas","1","v4"
"Q1","Prosím, řekněte, jak moc je každý z následujících aspektů důležitý ve vašem životě.","1 (velmi důležité) - 2 (docela důležité) - 3 (nedůležité) - 4 (vůbec nedůležité) - 8 (Nevím, NS) - 9 (Bez odpovědi, SR)","Politika","1","v5"
"Q1","Prosím, řekněte, jak moc je každý z následujících aspektů důležitý ve vašem životě.","1 (velmi důležité) - 2 (docela důležité) - 3 (nedůležité) - 4 (vůbec nedůležité) - 8 (Nevím, NS) - 9 (Bez odpovědi, SR)","Náboženství","1","v6"
Make sure that each column field is enclosed in quotes to facilitate parsing text containing commas.
Generate the CSV file content directly in a suitable format and with the column names: Question Number, Question Text, Response_Scale, Response_Options, Card_Number, Variable_Name.
There are a total of 111 questions. Extract them all in order, Q1, Q2, ... Q111 and do not skip any.
"""

'''
extraction_prompt = """Extrahujte hlavní otázky a možnosti odpovědí z textu v původním jazyce a naformátujte informace do souboru CSV. Použijte španělštinu.
Ujistěte se, že zahrnete možnosti odpovědí dostupné na příslušné kartě (uvedené na následujících stránkách stejného dokumentu).
Zahrňte také text otázek spolu s číslem otázky, například: Q1: Prosím, řekněte, jak moc je každý z následujících aspektů důležitý ve vašem životě.
Zahrňte sloupec 'Response_Scale', který specifikuje stupnici a kategorii, do které číslo patří. Například pro první volbu v Q1 by měla být 'Response_Scale' (možné volby): 1 (velmi důležité) - 2 (docela důležité) - 3 (nedůležité) -
4 (vůbec nedůležité) - 8 (Nevím, NS) - 9 (Bez odpovědi, SR), a 'Response_Options' by měly být: Práce. V případě potřeby rozdělte každou otázku podle možností odpovědí.
Extrahujte názvy proměnných v1, v2, v3 a tak dále a ujistěte se, že všechny otázky mají název proměnné (tolik řádků, kolik proměnných), možnost odpovědi a že jsou zahrnuty jako sloupec v souboru CSV.
Příklad pro otázku Q1:
Q1,"Prosím, řekněte, jak moc je každý z následujících aspektů důležitý ve vašem životě.","1 (velmi důležité) - 2 (docela důležité) - 3 (nedůležité) - 4 (vůbec nedůležité) - 8 (Nevím, NS) - 9 (Bez odpovědi, SR)","Práce","1","v1"
Q1,"Prosím, řekněte, jak moc je každý z následujících aspektů důležitý ve vašem životě.","1 (velmi důležité) - 2 (docela důležité) - 3 (nedůležité) - 4 (vůbec nedůležité) - 8 (Nevím, NS) - 9 (Bez odpovědi, SR)","Rodina","1","v2"
Q1,"Prosím, řekněte, jak moc je každý z následujících aspektů důležitý ve vašem životě.","1 (velmi důležité) - 2 (docela důležité) - 3 (nedůležité) - 4 (vůbec nedůležité) - 8 (Nevím, NS) - 9 (Bez odpovědi, SR)","Přátelé a známí","1","v3"
Q1,"Prosím, řekněte, jak moc je každý z následujících aspektů důležitý ve vašem životě.","1 (velmi důležité) - 2 (docela důležité) - 3 (nedůležité) - 4 (vůbec nedůležité) - 8 (Nevím, NS) - 9 (Bez odpovědi, SR)","Volný čas","1","v4"
Q1,"Prosím, řekněte, jak moc je každý z následujících aspektů důležitý ve vašem životě.","1 (velmi důležité) - 2 (docela důležité) - 3 (nedůležité) - 4 (vůbec nedůležité) - 8 (Nevím, NS) - 9 (Bez odpovědi, SR)","Politika","1","v5"
Q1,"Prosím, řekněte, jak moc je každý z následujících aspektů důležitý ve vašem životě.","1 (velmi důležité) - 2 (docela důležité) - 3 (nedůležité) - 4 (vůbec nedůležité) - 8 (Nevím, NS) - 9 (Bez odpovědi, SR)","Náboženství","1","v6"
Ujistěte se, že každé pole sloupce je uzavřeno v uvozovkách, aby se usnadnilo parsování textu obsahujícího čárky.
Vygenerujte obsah souboru CSV přímo ve vhodném formátu a s názvy sloupců: Question Number,Question Text,Response_Scale,Response_Options,Card_Number,Variable_Name
Je celkem 111 otázek. Extrahujte je všechny.
"""
'''

print("=" * 80)
print("Querying llama.cpp with PDF extraction request")
print("=" * 80)
print(f"\nPDF File: {pdf_path}")
print(f"Prompt: {extraction_prompt[:150]}...\n")

# Query the LLM with the PDF
response = query_llm_with_pdf(
    pdf_path,
    extraction_prompt,
    temperature=0.3,  # Lower temperature for more consistent extraction
    max_tokens=-1,  # No max token limit
    timeout=30000 # Longer timeout for PDF processing
)

if response:
    print("\n" + "=" * 80)
    print("LLM Response (CSV Content):")
    print("=" * 80 + "\n")
    print(response)
    
    # Save response to file with same name as PDF but with .csv extension
    output_file = os.path.splitext(pdf_path)[0] + ".csv"
    with open(output_file, "w") as f:
        f.write(response)
    print(f"\n✓ Response saved to: {output_file}")
else:
    print("\n✗ Failed to get response from llama.cpp")

Querying llama.cpp with PDF extraction request

PDF File: ./Surveys/ZA7500_q_cz.pdf
Prompt: Extract the main questions and response choices from the corresponding text into a CSV file, parsing the language Czech. 
Make sure that for the provi...

Sending PDF: ZA7500_q_cz.pdf
Extracted text: 99895 characters
Prompt: Extract the main questions and response choices from the corresponding text into a CSV file, parsing...

LLM Response (CSV Content):

```csv
"Question Number","Question Text","Response_Scale","Response_Options","Card_Number","Variable_Name"
"Q1","Prosím, řekněte, jak moc je každý z následujících aspektů důležitý ve vašem životě.","1 (velmi důležité) - 2 (docela důležité) - 3 (nedůležité) - 4 (vůbec nedůležité) - 8 (Nevím, NS) - 9 (Bez odpovědi, SR)","Práce","1","v1"
"Q1","Prosím, řekněte, jak moc je každý z následujících aspektů důležitý ve vašem životě.","1 (velmi důležité) - 2 (docela důležité) - 3 (nedůležité) - 4 (vůbec nedůležité) - 8 (Nevím, NS) - 9 (Bez odpovědi, SR)"

In [None]:
# Query llama.cpp with ZA7500_q_gb.pdf and extraction prompt
pdf_path = "./Surveys/ZA7500_q_hr.pdf"

extraction_prompt = """Extract the main questions and response choices from the attached PDF into a CSV file using the pdf language (Croatian).
Make sure that for the provided answer choices you include the choices available in the corresponding card (reported in subsequent pages in the same document).
Include also the text of the questions along with the Question number, e.g., Q1: Please say, for each of the following, how important it is in your life. 
Include a column 'Response_Scale' the scale and the category it belongs the number, e.g., for Q1 the scale or possible options should be: 1 (very important) 
- 2 (quite important) - 3 (not important) - 4 (not at all important) - 8 (Don't Know, DK) - 9 (Not Answer, NA). Split every question if necessary by the response options. 
Extract the variables names v1, v2, v3, ... and so, and be sure that all questions have a variable name, response option and include them as a column of the csv. 
Be sure that every column field is quoted to make easy parsing for text including commas. 
Output the CSV content directly with proper formatting and column names Question Number,Question Text,Response_Scale,Response_Options,Card_Number,Variable_Name
Be sure than the parsing is done in the language of the images (Croatian).
"""

print("=" * 80)
print("Querying llama.cpp with PDF extraction request")
print("=" * 80)
print(f"\nPDF File: {pdf_path}")
print(f"Prompt: {extraction_prompt[:150]}...\n")

# Query the LLM with the PDF
response = query_llm_with_pdf(
    pdf_path,
    extraction_prompt,
    temperature=0.3,  # Lower temperature for more consistent extraction
    max_tokens=-1,  # No max token limit
    timeout=3000 # Longer timeout for PDF processing
)

if response:
    print("\n" + "=" * 80)
    print("LLM Response (CSV Content):")
    print("=" * 80 + "\n")
    print(response)
    
    # Save response to file with same name as PDF but with .csv extension
    output_file = os.path.splitext(pdf_path)[0] + ".csv"
    with open(output_file, "w") as f:
        f.write(response)
    print(f"\n✓ Response saved to: {output_file}")
else:
    print("\n✗ Failed to get response from llama.cpp")

Querying llama.cpp with PDF extraction request

PDF File: ./Surveys/ZA7500_q_hr.pdf
Prompt: Extract the main questions and response choices from the attached PDF into a CSV file using the pdf language (Croatian).
Make sure that for the provid...

Sending PDF: ZA7500_q_hr.pdf
Extracted text: 77943 characters
Prompt: Extract the main questions and response choices from the attached PDF into a CSV file using the pdf ...


Error: HTTPConnectionPool(host='127.0.0.1', port=10006): Read timed out. (read timeout=3000)

✗ Failed to get response from llama.cpp


Traceback (most recent call last):
  File "/home/id02619@hi.inet/.pyenv/versions/langchain/lib/python3.11/site-packages/urllib3/connectionpool.py", line 536, in _make_request
    response = conn.getresponse()
               ^^^^^^^^^^^^^^^^^^
  File "/home/id02619@hi.inet/.pyenv/versions/langchain/lib/python3.11/site-packages/urllib3/connection.py", line 464, in getresponse
    httplib_response = super().getresponse()
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/home/id02619@hi.inet/.pyenv/versions/3.11.9/lib/python3.11/http/client.py", line 1395, in getresponse
    response.begin()
  File "/home/id02619@hi.inet/.pyenv/versions/3.11.9/lib/python3.11/http/client.py", line 325, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/home/id02619@hi.inet/.pyenv/versions/3.11.9/lib/python3.11/http/client.py", line 286, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
               ^^^^^^^^^^

In [None]:
# Query llama.cpp with ZA7500_q_gb.pdf and extraction prompt
pdf_path = "./Surveys/ZA7500_q_hu.pdf"

extraction_prompt = """Extract the main questions and response choices from the attached PDF into a CSV file using the pdf language (Hungarian).
Make sure that for the provided answer choices you include the choices available in the corresponding card (reported in subsequent pages in the same document).
Include also the text of the questions along with the Question number, e.g., Q1: Odgovorite, za svaki od dolje navedenih pojmova, koliku važnost zauzimaju u Vašem 
životu. Include a column 'Response_Scale' the scale and the category it belongs the number, e.g., for Q1 the scale or possible options should be: 1 (very important) 
- 2 (quite important) - 3 (not important) - 4 (not at all important) - 8 (Don't Know, DK) - 9 (Not Answer, NA) but in Hungarian. Split every question if necessary by the response options. 
Extract the variables names v1, v2, v3, ... and so, and be sure that all questions have a variable name, response option and include them as a column of the csv. 
Be sure that every column field is quoted to make easy parsing for text including commas. Do not translate and be sure than the parsing is done in the original language, that is, Hungarian language.
Output the CSV content directly with proper formatting and column names Question Number,Question Text,Response_Scale,Response_Options,Card_Number,Variable_Name
For example for question Q1 in Hungarian:
"Q1","Kérem mondja meg, hogy a következők mennyire fontosak az Ön életében?","1 - nagyon fontos, 2 - elég fontos, 3 - nem fontos, 4 - egyáltalán nem fontos, 8 - Nem tudom, 9 - Nem válaszolt","Munka","1","v1"
"Q1","Kérem mondja meg, hogy a következők mennyire fontosak az Ön életében?","1 - nagyon fontos, 2 - elég fontos, 3 - nem fontos, 4 - egyáltalán nem fontos, 8 - Nem tudom, 9 - Nem válaszolt","Család","1","v2"
"Q1","Kérem mondja meg, hogy a következők mennyire fontosak az Ön életében?","1 - nagyon fontos, 2 - elég fontos, 3 - nem fontos, 4 - egyáltalán nem fontos, 8 - Nem tudom, 9 - Nem válaszolt","Barátok és ismerősök","1","v3"
"Q1","Kérem mondja meg, hogy a következők mennyire fontosak az Ön életében?","1 - nagyon fontos, 2 - elég fontos, 3 - nem fontos, 4 - egyáltalán nem fontos, 8 - Nem tudom, 9 - Nem válaszolt","Szabadidő","1","v4"
"Q1","Kérem mondja meg, hogy a következők mennyire fontosak az Ön életében?","1 - nagyon fontos, 2 - elég fontos, 3 - nem fontos, 4 - egyáltalán nem fontos, 8 - Nem tudom, 9 - Nem válaszolt","Politika","1","v5"
"Q1","Kérem mondja meg, hogy a következők mennyire fontosak az Ön életében?","1 - nagyon fontos, 2 - elég fontos, 3 - nem fontos, 4 - egyáltalán nem fontos, 8 - Nem tudom, 9 - Nem válaszolt","Vallás","1","v6"
Make sure that each column field is enclosed in quotes to facilitate parsing text containing commas.
Generate the CSV file content directly in a suitable format and with the column names: Question Number, Question Text, Response_Scale, Response_Options, Card_Number, Variable_Name.
There are a total of 111 questions. Extract them all in order, Q1, Q2, ... Q111 and do not skip any.
"""

print("=" * 80)
print("Querying llama.cpp with PDF extraction request")
print("=" * 80)
print(f"\nPDF File: {pdf_path}")
print(f"Prompt: {extraction_prompt[:150]}...\n")

# Query the LLM with the PDF
response = query_llm_with_pdf(
    pdf_path,
    extraction_prompt,
    temperature=0.3,  # Lower temperature for more consistent extraction
    max_tokens=-1,  # No max token limit
    timeout=30000 # Longer timeout for PDF processing
)

if response:
    print("\n" + "=" * 80)
    print("LLM Response (CSV Content):")
    print("=" * 80 + "\n")
    print(response)
    
    # Save response to file with same name as PDF but with .csv extension
    output_file = os.path.splitext(pdf_path)[0] + ".csv"
    with open(output_file, "w") as f:
        f.write(response)
    print(f"\n✓ Response saved to: {output_file}")
else:
    print("\n✗ Failed to get response from llama.cpp")

Querying llama.cpp with PDF extraction request

PDF File: ./Surveys/ZA7500_q_hu.pdf
Prompt: Extract the main questions and response choices from the attached PDF into a CSV file using the pdf language (Hungarian).
Make sure that for the provi...

Sending PDF: ZA7500_q_hu.pdf
Extracted text: 84627 characters
Prompt: Extract the main questions and response choices from the attached PDF into a CSV file using the pdf ...

LLM Response (CSV Content):

```csv
"Question Number","Question Text","Response_Scale","Response_Options","Card_Number","Variable_Name"
"Q1","Kérem mondja meg, hogy a következők mennyire fontosak az Ön életében?","1 - nagyon fontos, 2 - elég fontos, 3 - nem fontos, 4 - egyáltalán nem fontos, 8 - Nem tudom, 9 - Nem válaszolt","Munka","1","v1"
"Q1","Kérem mondja meg, hogy a következők mennyire fontosak az Ön életében?","1 - nagyon fontos, 2 - elég fontos, 3 - nem fontos, 4 - egyáltalán nem fontos, 8 - Nem tudom, 9 - Nem válaszolt","Család","1","v2"
"Q1","Kérem mondja meg, h

In [None]:
# Query llama.cpp with ZA7500_q_gb.pdf and extraction prompt
pdf_path = "./Surveys/ZA7500_q_rs.pdf"

"""Extract the main questions and response choices from the attached PDF into a CSV file using the pdf language (Serbian).
Make sure that for the provided answer choices you include the choices available in the corresponding card (reported in subsequent pages in the same document).
Include also the text of the questions along with the Question number, e.g., Q1: Odgovorite, za svaki od dolje navedenih pojmova, koliku važnost zauzimaju u Vašem 
životu. Include a column 'Response_Scale' the scale and the category it belongs the number, e.g., for Q1 the scale or possible options should be: 1 (very important) 
- 2 (quite important) - 3 (not important) - 4 (not at all important) - 8 (Don't Know, DK) - 9 (Not Answer, NA) but in Serbian. Split every question if necessary by the response options. 
Extract the variables names v1, v2, v3, ... and so, and be sure that all questions have a variable name, response option and include them as a column of the csv. 
Be sure that every column field is quoted to make easy parsing for text including commas. Do not translate and be sure than the parsing is done in the original language, that is, Hungarian language.
Output the CSV content directly with proper formatting and column names Question Number,Question Text,Response_Scale,Response_Options,Card_Number,Variable_Name
For example for question Q1 in Serbian:
Q1,"Molimo Vas da nam odgovorite koliki značaj svaki od dole navedenih pojmova zauzima u Vašem životu.","1 (vrlo važno) - 2 (uglavnom vaţno) - 3 (nevaţno) - 4 (potpuno nevaţno) - 8 (Ne znam, NZNM) - 9 (Nisam odgovorio, NO)","posao","1","v1"
Q1,"Molimo Vas da nam odgovorite koliki značaj svaki od dole navedenih pojmova zauzima u Vašem životu.","1 (vrlo važno) - 2 (uglavnom vaţno) - 3 (nevaţno) - 4 (potpuno nevaţno) - 8 (Ne znam, NZNM) - 9 (Nisam odgovorio, NO)","porodica","1","v2"
Q1,"Molimo Vas da nam odgovorite koliki značaj svaki od dole navedenih pojmova zauzima u Vašem životu.","1 (vrlo važno) - 2 (uglavnom vaţno) - 3 (nevaţno) - 4 (potpuno nevaţno) - 8 (Ne znam, NZNM) - 9 (Nisam odgovorio, NO)","prijatelji i poznanici","1","v3"
Q1,"Molimo Vas da nam odgovorite koliki značaj svaki od dole navedenih pojmova zauzima u Vašem životu.","1 (vrlo važno) - 2 (uglavnom vaţno) - 3 (nevaţno) - 4 (potpuno nevaţno) - 8 (Ne znam, NZNM) - 9 (Nisam odgovorio, NO)","slobodno vreme","1","v4"
Q1,"Molimo Vas da nam odgovorite koliki značaj svaki od dole navedenih pojmova zauzima u Vašem životu.","1 (vrlo važno) - 2 (uglavnom vaţno) - 3 (nevaţno) - 4 (potpuno nevaţno) - 8 (Ne znam, NZNM) - 9 (Nisam odgovorio, NO)","politika","1","v5"
Q1,"Molimo Vas da nam odgovorite koliki značaj svaki od dole navedenih pojmova zauzima u Vašem životu.","1 (vrlo važno) - 2 (uglavnom vaţno) - 3 (nevaţno) - 4 (potpuno nevaţno) - 8 (Ne znam, NZNM) - 9 (Nisam odgovorio, NO)","vera","1","v6"
Make sure that each column field is enclosed in quotes to facilitate parsing text containing commas.
Generate the CSV file content directly in a suitable format and with the column names: Question Number, Question Text, Response_Scale, Response_Options, Card_Number, Variable_Name.
There are a total of 111 questions. Extract them all in order, Q1, Q2, ... Q111 and do not skip any.
"""

'''
extraction_prompt = """Izvucite glavna pitanja i opcije odgovora iz teksta na izvornom jeziku i formatirajte informacije u CSV datoteku. Koristite španski.
Obavezno uključite opcije odgovora dostupne na odgovarajućoj kartici (navedene na sledećim stranicama istog dokumenta).
Takođe uključite tekst pitanja zajedno sa brojem pitanja, na primer: Q1: Molim vas, reci, koliko je svaki od sledećih aspekata važan u tvom životu.
Uključite kolonu 'Skala_Odgovora' koja specificira skalu i kategoriju kojoj pripada broj. Na primer, za prvu opciju u Q1, 'Skala_Odgovora' (moguće opcije) bi trebalo da bude: 1 (vrlo važno) - 2 (prilično važno) - 3 (nije važno) - 4 (nikako važno) - 8 (Ne znam, NS) - 9 (Nisam odgovorio, SR), a 'Opcije_Odgovora' bi trebalo da budu: Rad. Podelite svako pitanje, ako je potrebno, prema opcijama odgovora.
Izvucite nazive promenljivih v1, v2, v3, i tako dalje, i obezbedite da sva pitanja imaju naziv promenljive (toliko redova koliko promenljivih), opciju odgovora i da su uključena kao kolona u CSV datoteci.
"Question_Number","Question_Text","Response_Scale","Response_Option","Score","Variable_Name"
Primer za pitanje Q1:
Q1,"Molimo Vas da nam odgovorite koliki značaj svaki od dole navedenih pojmova zauzima u Vašem životu.","1 (vrlo važno) - 2 (uglavnom vaţno) - 3 (nevaţno) - 4 (potpuno nevaţno) - 8 (Ne znam, NZNM) - 9 (Nisam odgovorio, NO)","posao","1","v1"
Q1,"Molimo Vas da nam odgovorite koliki značaj svaki od dole navedenih pojmova zauzima u Vašem životu.","1 (vrlo važno) - 2 (uglavnom vaţno) - 3 (nevaţno) - 4 (potpuno nevaţno) - 8 (Ne znam, NZNM) - 9 (Nisam odgovorio, NO)","porodica","1","v2"
Q1,"Molimo Vas da nam odgovorite koliki značaj svaki od dole navedenih pojmova zauzima u Vašem životu.","1 (vrlo važno) - 2 (uglavnom vaţno) - 3 (nevaţno) - 4 (potpuno nevaţno) - 8 (Ne znam, NZNM) - 9 (Nisam odgovorio, NO)","prijatelji i poznanici","1","v3"
Q1,"Molimo Vas da nam odgovorite koliki značaj svaki od dole navedenih pojmova zauzima u Vašem životu.","1 (vrlo važno) - 2 (uglavnom vaţno) - 3 (nevaţno) - 4 (potpuno nevaţno) - 8 (Ne znam, NZNM) - 9 (Nisam odgovorio, NO)","slobodno vreme","1","v4"
Q1,"Molimo Vas da nam odgovorite koliki značaj svaki od dole navedenih pojmova zauzima u Vašem životu.","1 (vrlo važno) - 2 (uglavnom vaţno) - 3 (nevaţno) - 4 (potpuno nevaţno) - 8 (Ne znam, NZNM) - 9 (Nisam odgovorio, NO)","politika","1","v5"
Q1,"Molimo Vas da nam odgovorite koliki značaj svaki od dole navedenih pojmova zauzima u Vašem životu.","1 (vrlo važno) - 2 (uglavnom vaţno) - 3 (nevaţno) - 4 (potpuno nevaţno) - 8 (Ne znam, NZNM) - 9 (Nisam odgovorio, NO)","vera","1","v6"
Uverite se da je svako polje kolone zatvoreno u navodnicima kako bi se olakšalo parsiranje teksta koji sadrži zareze.
Generišite sadržaj CSV datoteke direktno u pogodnom formatu i sa nazivima kolona: Question Number,Question Text,Response_Scale,Response_Options,Card_Number,Variable_Name
"""
'''

print("=" * 80)
print("Querying llama.cpp with PDF extraction request")
print("=" * 80)
print(f"\nPDF File: {pdf_path}")
print(f"Prompt: {extraction_prompt[:150]}...\n")

# Query the LLM with the PDF
response = query_llm_with_pdf(
    pdf_path,
    extraction_prompt,
    temperature=0.3,  # Lower temperature for more consistent extraction
    max_tokens=-1,  # No max token limit
    timeout=30000 # Longer timeout for PDF processing
)

if response:
    print("\n" + "=" * 80)
    print("LLM Response (CSV Content):")
    print("=" * 80 + "\n")
    print(response)
    
    # Save response to file with same name as PDF but with .csv extension
    output_file = os.path.splitext(pdf_path)[0] + ".csv"
    with open(output_file, "w") as f:
        f.write(response)
    print(f"\n✓ Response saved to: {output_file}")
else:
    print("\n✗ Failed to get response from llama.cpp")

Querying llama.cpp with PDF extraction request

PDF File: ./Surveys/ZA7500_q_rs.pdf
Prompt: Extract the main questions and response choices from the attached PDF into a CSV file using the pdf language (Hungarian).
Make sure that for the provi...

Sending PDF: ZA7500_q_rs.pdf
Extracted text: 65564 characters
Prompt: Extract the main questions and response choices from the attached PDF into a CSV file using the pdf ...


In [None]:
# Query llama.cpp with ZA7500_q_gb.pdf and extraction prompt
pdf_path = "./Surveys/ZA7500_q_ru.pdf"

extraction_prompt = """Extract the main questions and response choices from the attached PDF into a CSV file using the pdf language (Russian).
Make sure that for the provided answer choices you include the choices available in the corresponding card (reported in subsequent pages in the same document).
Include also the text of the questions along with the Question number, e.g., Q1: Для каждой из сторон жизни, которые я сейчас перечислю, скажите, пожалуйста, является ли она для Вас 
очень важной; довольно важной; не важной или совсем не важной?. Include a column 'Response_Scale' the scale and the category it belongs the number, e.g., for Q1 the scale or possible options should be: 1 (Очень важно) 
- 2 (Довольно важно) - 3 (не важно) - 4 (совсем не важно) - 8 (Затр. ответить, /НЕ ЧИТАТЬ/) - 9 (Отказ, /НЕ ЧИТАТЬ/). Split every question if necessary by the response options. 
Extract the variables names v1, v2, v3, ... and so, and be sure that all questions have a variable name, response option and include them as a column of the csv. 
Be sure that every column field is quoted to make easy parsing for text including commas. Do not translate and be sure than the parsing is done in the original language, that is, Hungarian language.
Output the CSV content directly with proper formatting and column names Question Number,Question Text,Response_Scale,Response_Options,Card_Number,Variable_Name
For example for question Q1 in Russian:
"Q1","Для каждой из сторон жизни, которые я сейчас перечислю, скажите, пожалуйста, является ли она для Вас очень важной; довольно важной; не важной или совсем не важной?","1 (Очень важно) - 2 (Довольно важно) - 3 (не важно) - 4 (совсем не важно) - 8 (Затр. ответить, /НЕ ЧИТАТЬ/) - 9 (Отказ, /НЕ ЧИТАТЬ/)","Работа","1","v1"
"Q1","Для каждой из сторон жизни, которые я сейчас перечислю, скажите, пожалуйста, является ли она для Вас очень важной; довольно важной; не важной или совсем не важной?","1 (Очень важно) - 2 (Довольно важно) - 3 (не важно) - 4 (совсем не важно) - 8 (Затр. ответить, /НЕ ЧИТАТЬ/) - 9 (Отказ, /НЕ ЧИТАТЬ/)","Семья","1","v2"
"Q1","Для каждой из сторон жизни, которые я сейчас перечислю, скажите, пожалуйста, является ли она для Вас очень важной; довольно важной; не важной или совсем не важной?","1 (Очень важно) - 2 (Довольно важно) - 3 (не важно) - 4 (совсем не важно) - 8 (Затр. ответить, /НЕ ЧИТАТЬ/) - 9 (Отказ, /НЕ ЧИТАТЬ/)","Друзья и знакомые","1","v3"
"Q1","Для каждой из сторон жизни, которые я сейчас перечислю, скажите, пожалуйста, является ли она для Вас очень важной; довольно важной; не важной или совсем не важной?","1 (Очень важно) - 2 (Довольно важно) - 3 (не важно) - 4 (совсем не важно) - 8 (Затр. ответить, /НЕ ЧИТАТЬ/) - 9 (Отказ, /НЕ ЧИТАТЬ/)","Свободное время","1","v4"
"Q1","Для каждой из сторон жизни, которые я сейчас перечислю, скажите, пожалуйста, является ли она для Вас очень важной; довольно важной; не важной или совсем не важной?","1 (Очень важно) - 2 (Довольно важно) - 3 (не важно) - 4 (совсем не важно) - 8 (Затр. ответить, /НЕ ЧИТАТЬ/) - 9 (Отказ, /НЕ ЧИТАТЬ/)","Политика","1","v5"
"Q1","Для каждой из сторон жизни, которые я сейчас перечислю, скажите, пожалуйста, является ли она для Вас очень важной; довольно важной; не важной или совсем не важной?","1 (Очень важно) - 2 (Довольно важно) - 3 (не важно) - 4 (совсем не важно) - 8 (Затр. ответить, /НЕ ЧИТАТЬ/) - 9 (Отказ, /НЕ ЧИТАТЬ/)","Религия","1","v6"
Make sure that each column field is enclosed in quotes to facilitate parsing text containing commas.
Generate the CSV file content directly in a suitable format and with the column names: Question Number, Question Text, Response_Scale, Response_Options, Card_Number, Variable_Name.
There are a total of 111 questions and 286 variables. Extract them all in order, Q1, Q2, ... Q111 and do not skip any. The csv should contain 286 rows.
"""

print("=" * 80)
print("Querying llama.cpp with PDF extraction request")
print("=" * 80)
print(f"\nPDF File: {pdf_path}")
print(f"Prompt: {extraction_prompt[:150]}...\n")

# Query the LLM with the PDF
response = query_llm_with_pdf(
    pdf_path,
    extraction_prompt,
    temperature=0.3,  # Lower temperature for more consistent extraction
    max_tokens=-1,  # No max token limit
    timeout=300000 # Longer timeout for PDF processing
)

if response:
    print("\n" + "=" * 80)
    print("LLM Response (CSV Content):")
    print("=" * 80 + "\n")
    print(response)
    
    # Save response to file with same name as PDF but with .csv extension
    output_file = os.path.splitext(pdf_path)[0] + ".csv"
    with open(output_file, "w") as f:
        f.write(response)
    print(f"\n✓ Response saved to: {output_file}")
else:
    print("\n✗ Failed to get response from llama.cpp")

Querying llama.cpp with PDF extraction request

PDF File: ./Surveys/ZA7500_q_ru.pdf
Prompt: Extract the main questions and response choices from the attached PDF into a CSV file using the pdf language (Russian).
Make sure that for the provide...

Sending PDF: ZA7500_q_ru.pdf
Extracted text: 95286 characters
Prompt: Extract the main questions and response choices from the attached PDF into a CSV file using the pdf ...

LLM Response (CSV Content):

```csv
"Question Number","Question Text","Response_Scale","Response_Options","Card_Number","Variable_Name"
"Q1","Для каждой из сторон жизни, которые я сейчас перечислю, скажите, пожалуйста, является ли она для Вас очень важной; довольно важной; не важной или совсем не важной?","1 (Очень важно) - 2 (Довольно важно) - 3 (не важно) - 4 (совсем не важно) - 8 (Затр. ответить, /НЕ ЧИТАТЬ/) - 9 (Отказ, /НЕ ЧИТАТЬ/)","Работа","1","v1"
"Q1","Для каждой из сторон жизни, которые я сейчас перечислю, скажите, пожалуйста, является ли она для Вас очень важно

## Generate Responses to Surveys

In [None]:
import pandas as pd

llm_name="apertus"
language="gb"

# Load questions from new CSV file
csv_file_path = f"/home/id02619@hi.inet/repos/eloquence/eloquence/WP1/euvalues/Surveys_parsed/ZA7500_q_{language}.csv"

# Read the CSV
df = pd.read_csv(csv_file_path)

print(f"Loaded CSV with {len(df)} rows")
print(f"Columns: {list(df.columns)}\n")

# Display first few rows
print("First 5 rows of the CSV:")
print(df.head())

#df = df[:10]


Loaded CSV with 229 rows
Columns: ['Question Number', 'Question Text', 'Response_Scale', 'Response_Options', 'Card_Number', 'Variable_Name']

First 5 rows of the CSV:
  Question Number                                      Question Text  \
0              Q1  Please say, for each of the following, how imp...   
1              Q1  Please say, for each of the following, how imp...   
2              Q1  Please say, for each of the following, how imp...   
3              Q1  Please say, for each of the following, how imp...   
4              Q1  Please say, for each of the following, how imp...   

                                      Response_Scale  \
0  1 (very important) - 2 (quite important) - 3 (...   
1  1 (very important) - 2 (quite important) - 3 (...   
2  1 (very important) - 2 (quite important) - 3 (...   
3  1 (very important) - 2 (quite important) - 3 (...   
4  1 (very important) - 2 (quite important) - 3 (...   

            Response_Options Card_Number Variable_Name  
0     

In [None]:
# Prepare detailed query items from the new CSV
query_items = []

for idx, row in df.iterrows():
    query_items.append({
        "question_id": row.get('Question Number', ''),
        "question_text": row.get('Question Text', ''),
        "variable": row.get('Variable_Name', ''),
        "option_text": row.get('Response_Options', ''),
        "response_scale": row.get('Response_Scale', ''),
        "csv_index": idx
    })

print(f"Prepared {len(query_items)} query items from the new CSV (each with variable + option + scale):\n")
for i, item in enumerate(query_items[:5], 1):
    print(f"  {i}. [{item['question_id']}] {item['question_text']}")
    print(f"     Variable: {item['variable']} | Option: {item['option_text']} | Scale: {item['response_scale']}")
print(f"  ... and {len(query_items) - 5} more items\n")


Prepared 229 query items from the new CSV (each with variable + option + scale):

  1. [Q1] Please say, for each of the following, how important it is in your life.
     Variable: v1 | Option: Work | Scale: 1 (very important) - 2 (quite important) - 3 (not important) - 4 (not at all important) - 8 (Don't Know, DK) - 9 (Not Answer, NA)
  2. [Q1] Please say, for each of the following, how important it is in your life.
     Variable: v2 | Option: Family | Scale: 1 (very important) - 2 (quite important) - 3 (not important) - 4 (not at all important) - 8 (Don't Know, DK) - 9 (Not Answer, NA)
  3. [Q1] Please say, for each of the following, how important it is in your life.
     Variable: v3 | Option: Friends and acquaintances | Scale: 1 (very important) - 2 (quite important) - 3 (not important) - 4 (not at all important) - 8 (Don't Know, DK) - 9 (Not Answer, NA)
  4. [Q1] Please say, for each of the following, how important it is in your life.
     Variable: v4 | Option: Leisure time | Scal

In [None]:
# Query the LLM with detailed prompts allowing thinking and reasoning
results = []

print("\n" + "=" * 80)
print("Querying LLM with Survey Questions (With Model Reasoning)")
print("=" * 80 + "\n")

for idx, item in enumerate(query_items, 1):
    # Build a detailed prompt that allows the model to think and reason
    detailed_prompt = f"""You are answering a survey question. Read carefully and provide:
1. Your reasoning about the question and the specific aspect being asked
2. Your answer from the provided response scale

Question [{item['question_id']}]: {item['question_text']}
Aspect: {item['variable']}
Option: {item['option_text']}
Response Scale: {item['response_scale']}

Please think through this carefully and provide:
- Your reasoning (explain why you chose your answer)
- Your final answer from the scale above"""

    print(f"[{idx}/{len(query_items)}] {item['question_id']} - {item['variable']}: {item['option_text']}")
    
    answer = query_llm(detailed_prompt, temperature=0.7, max_tokens=5000)
    
    if answer:
        results.append({
            "question_id": item['question_id'],
            "question_text": item['question_text'],
            "variable": item['variable'],
            "option_text": item['option_text'],
            "response_scale": item['response_scale'],
            "model_response": answer.strip(),
            "status": "success"
        })
        print(f"  ✓ Response received ({len(answer.strip())} chars)\n")
    else:
        results.append({
            "question_id": item['question_id'],
            "question_text": item['question_text'],
            "variable": item['variable'],
            "option_text": item['option_text'],
            "response_scale": item['response_scale'],
            "model_response": None,
            "status": "failed"
        })
        print(f"  ✗ Failed to get answer.\n")
    
    # Small delay between requests to avoid overwhelming the server
    time.sleep(0.2)

print("=" * 80)
print(f"Completed: {sum(1 for r in results if r['status'] == 'success')}/{len(results)} items answered")
print("=" * 80)



Querying LLM with Survey Questions (With Model Reasoning)

[1/229] Q1 - v1: Work
Querying: You are answering a survey question. Read carefully and provide:
1. Your reasoni...
  ✓ Response received (926 chars)

[2/229] Q1 - v2: Family
Querying: You are answering a survey question. Read carefully and provide:
1. Your reasoni...
  ✓ Response received (951 chars)

[3/229] Q1 - v3: Friends and acquaintances
Querying: You are answering a survey question. Read carefully and provide:
1. Your reasoni...
  ✓ Response received (1271 chars)

[4/229] Q1 - v4: Leisure time
Querying: You are answering a survey question. Read carefully and provide:
1. Your reasoni...
  ✓ Response received (894 chars)

[5/229] Q1 - v5: Politics
Querying: You are answering a survey question. Read carefully and provide:
1. Your reasoni...
  ✓ Response received (1041 chars)

[6/229] Q1 - v6: Religion
Querying: You are answering a survey question. Read carefully and provide:
1. Your reasoni...
  ✓ Response received (961 c

In [None]:
# Display results and save in machine-readable format
import pandas as pd

# Create a DataFrame for easy parsing
results_df = pd.DataFrame(results)

print("\nResults Summary:")
print(results_df.head())

# Save to CSV for easy parsing
results_df.to_csv(f"llm_survey_{llm_name}_responses_{language}.csv", index=False)
print(f"Results saved to: llm_survey_{llm_name}_responses_{language}.csv")

# Save full results to JSON
with open(f"llm_survey_{llm_name}_responses_{language}.json", "w") as f:
    json.dump(results, f, indent=2)
print(f"Full results also saved to: llm_survey_{llm_name}_responses_{language}.json")

NameError: name 'results' is not defined

In [None]:
import pandas as pd

language="es"

# Load questions from new CSV file
csv_file_path = f"/home/id02619@hi.inet/repos/eloquence/eloquence/WP1/euvalues/Surveys_parsed/ZA7500_q_{language}.csv"

# Read the CSV
df = pd.read_csv(csv_file_path) #, delimiter=',', quotechar='"')

print(f"Loaded CSV with {len(df)} rows")
print(f"Columns: {list(df.columns)}\n")

# Display first few rows
print("First 5 rows of the CSV:")
print(df.head())

#df = df[:2]


Loaded CSV with 184 rows
Columns: ['Question Number', 'Question Text', 'Response_Scale', 'Response_Options', 'Card_Number', 'Variable_Name']

First 5 rows of the CSV:
  Question Number                                      Question Text  \
0              Q1  Por favor, indica qué tan importante es cada u...   
1              Q1  Por favor, indica qué tan importante es cada u...   
2              Q1  Por favor, indica qué tan importante es cada u...   
3              Q1  Por favor, indica qué tan importante es cada u...   
4              Q1  Por favor, indica qué tan importante es cada u...   

                                      Response_Scale      Response_Options  \
0  1 (muy importante) - 2 (bastante importante) -...               Trabajo   
1  1 (muy importante) - 2 (bastante importante) -...               Familia   
2  1 (muy importante) - 2 (bastante importante) -...    Amigos y conocidos   
3  1 (muy importante) - 2 (bastante importante) -...  Tiempo libre/de ocio   
4  1 (muy 

In [None]:
# Prepare detailed query items from the new CSV
query_items = []

for idx, row in df.iterrows():
    query_items.append({
        "question_id": row.get('Question Number', ''),
        "question_text": row.get('Question Text', ''),
        "variable": row.get('Variable_Name', ''),
        "option_text": row.get('Response_Options', ''),
        "response_scale": row.get('Response_Scale', ''),
        "csv_index": idx
    })

print(f"Prepared {len(query_items)} query items from the new CSV (each with variable + option + scale):\n")
for i, item in enumerate(query_items[:5], 1):
    print(f"  {i}. [{item['question_id']}] {item['question_text']}")
    print(f"     Variable: {item['variable']} | Option: {item['option_text']} | Scale: {item['response_scale']}")
print(f"  ... and {len(query_items) - 5} more items\n")


Prepared 184 query items from the new CSV (each with variable + option + scale):

  1. [Q1] Por favor, indica qué tan importante es cada uno de los siguientes aspectos en tu vida.
     Variable: v1 | Option: Trabajo | Scale: 1 (muy importante) - 2 (bastante importante) - 3 (no muy importante) - 4 (nada importante) - 8 (No Sabe, NS) - 9 (Sin Respuesta, SR)
  2. [Q1] Por favor, indica qué tan importante es cada uno de los siguientes aspectos en tu vida.
     Variable: v2 | Option: Familia | Scale: 1 (muy importante) - 2 (bastante importante) - 3 (no muy importante) - 4 (nada importante) - 8 (No Sabe, NS) - 9 (Sin Respuesta, SR)
  3. [Q1] Por favor, indica qué tan importante es cada uno de los siguientes aspectos en tu vida.
     Variable: v3 | Option: Amigos y conocidos | Scale: 1 (muy importante) - 2 (bastante importante) - 3 (no muy importante) - 4 (nada importante) - 8 (No Sabe, NS) - 9 (Sin Respuesta, SR)
  4. [Q1] Por favor, indica qué tan importante es cada uno de los siguientes as

In [None]:
# Query the LLM with detailed prompts allowing thinking and reasoning
results = []

print("\n" + "=" * 80)
print("Querying LLM with Survey Questions (With Model Reasoning)")
print("=" * 80 + "\n")

for idx, item in enumerate(query_items, 1):
    # Build a detailed prompt that allows the model to think and reason
    detailed_prompt = f"""Estás respondiendo a una pregunta de una encuesta. Lee atentamente y proporciona lo siguiente:
1. Tu razonamiento sobre la pregunta y el aspecto específico que se está preguntando.
2. Tu respuesta de la escala de respuestas proporcionada.
Pregunta [{item['question_id']}]: {item['question_text']}
Aspecto: {item['variable']}
Opción: {item['option_text']}
Escala de Respuestas: {item['response_scale']}

Por favor, reflexiona cuidadosamente y proporciona:

Tu razonamiento (explica por qué elegiste tu respuesta)
Tu respuesta final en la escala anterior"""

    print(f"[{idx}/{len(query_items)}] {item['question_id']} - {item['variable']}: {item['option_text']}")
    
    answer = query_llm(detailed_prompt, temperature=0.7, max_tokens=256)
    
    if answer:
        results.append({
            "question_id": item['question_id'],
            "question_text": item['question_text'],
            "variable": item['variable'],
            "option_text": item['option_text'],
            "response_scale": item['response_scale'],
            "model_response": answer.strip(),
            "status": "success"
        })
        print(f"  ✓ Response received ({len(answer.strip())} chars)\n")
    else:
        results.append({
            "question_id": item['question_id'],
            "question_text": item['question_text'],
            "variable": item['variable'],
            "option_text": item['option_text'],
            "response_scale": item['response_scale'],
            "model_response": None,
            "status": "failed"
        })
        print(f"  ✗ Failed to get answer.\n")
    
    # Small delay between requests to avoid overwhelming the server
    time.sleep(0.2)

print("=" * 80)
print(f"Completed: {sum(1 for r in results if r['status'] == 'success')}/{len(results)} items answered")
print("=" * 80)



Querying LLM with Survey Questions (With Model Reasoning)

[1/184] Q1 - v1: Trabajo
Querying: Estás respondiendo a una pregunta de una encuesta. Lee atentamente y proporciona...
  ✓ Response received (916 chars)

[2/184] Q1 - v2: Familia
Querying: Estás respondiendo a una pregunta de una encuesta. Lee atentamente y proporciona...
  ✓ Response received (1068 chars)

[3/184] Q1 - v3: Amigos y conocidos
Querying: Estás respondiendo a una pregunta de una encuesta. Lee atentamente y proporciona...
  ✓ Response received (1076 chars)

[4/184] Q1 - v4: Tiempo libre/de ocio
Querying: Estás respondiendo a una pregunta de una encuesta. Lee atentamente y proporciona...
  ✓ Response received (972 chars)

[5/184] Q1 - v5: Política
Querying: Estás respondiendo a una pregunta de una encuesta. Lee atentamente y proporciona...
  ✓ Response received (1148 chars)

[6/184] Q1 - v6: Religión
Querying: Estás respondiendo a una pregunta de una encuesta. Lee atentamente y proporciona...
  ✓ Response received 

In [None]:
# Display results and save in machine-readable format
import pandas as pd

# Create a DataFrame for easy parsing
results_df = pd.DataFrame(results)

print("\nResults Summary:")
print(results_df.head())

# Save to CSV for easy parsing
results_df.to_csv(f"llm_survey_gemma_responses_{language}.csv", index=False)
print(f"Results saved to: llm_survey_gemma_responses_{language}.csv")

# Save full results to JSON
with open(f"llm_survey_gemma_responses_{language}.json", "w") as f:
    json.dump(results, f, indent=2)
print(f"Full results also saved to: llm_survey_gemma_responses_{language}.json")


Results Summary:
  question_id                                      question_text variable  \
0          Q1  Por favor, indica qué tan importante es cada u...       v1   
1          Q1  Por favor, indica qué tan importante es cada u...       v2   
2          Q1  Por favor, indica qué tan importante es cada u...       v3   
3          Q1  Por favor, indica qué tan importante es cada u...       v4   
4          Q1  Por favor, indica qué tan importante es cada u...       v5   

            option_text                                     response_scale  \
0               Trabajo  1 (muy importante) - 2 (bastante importante) -...   
1               Familia  1 (muy importante) - 2 (bastante importante) -...   
2    Amigos y conocidos  1 (muy importante) - 2 (bastante importante) -...   
3  Tiempo libre/de ocio  1 (muy importante) - 2 (bastante importante) -...   
4              Política  1 (muy importante) - 2 (bastante importante) -...   

                                      model_respon

In [None]:
language="it"
import pandas as pd

# Load questions from new CSV file
csv_file_path = "/home/id02619@hi.inet/repos/eloquence/eloquence/WP1/euvalues/Surveys_parsed/ZA7500_q_" + language + ".csv"

# Read the CSV
df = pd.read_csv(csv_file_path, delimiter=',') # quotechar='"')

print(f"Loaded CSV with {len(df)} rows")
print(f"Columns: {list(df.columns)}\n")

# Display first few rows
print("First 5 rows of the CSV:")
print(df.head())

#df = df[:2]


Loaded CSV with 212 rows
Columns: ['Question Number', 'Question Text', 'Response_Scale', 'Response_Options', 'Score', 'Variable_Name']

First 5 rows of the CSV:
  Question Number                                      Question Text  \
0              Q1  Per favore, indica quanto sia importante per t...   
1              Q1  Per favore, indica quanto sia importante per t...   
2              Q1  Per favore, indica quanto sia importante per t...   
3              Q1  Per favore, indica quanto sia importante per t...   
4              Q1  Per favore, indica quanto sia importante per t...   

                                      Response_Scale    Response_Options  \
0  1 (molto importante) - 2 (abbastanza important...              Lavoro   
1  1 (molto importante) - 2 (abbastanza important...            Famiglia   
2  1 (molto importante) - 2 (abbastanza important...  Amici e conoscenti   
3  1 (molto importante) - 2 (abbastanza important...        Tempo libero   
4  1 (molto importante) - 

In [None]:
# Prepare detailed query items from the new CSV
query_items = []

for idx, row in df.iterrows():
    query_items.append({
        "question_id": row.get('Question Number', ''),
        "question_text": row.get('Question Text', ''),
        "variable": row.get('Variable_Name', ''),
        "option_text": row.get('Response_Options', ''),
        "response_scale": row.get('Response_Scale', ''),
        "csv_index": idx
    })

print(f"Prepared {len(query_items)} query items from the new CSV (each with variable + option + scale):\n")
for i, item in enumerate(query_items[:5], 1):
    print(f"  {i}. [{item['question_id']}] {item['question_text']}")
    print(f"     Variable: {item['variable']} | Option: {item['option_text']} | Scale: {item['response_scale']}")
print(f"  ... and {len(query_items) - 5} more items\n")


Prepared 212 query items from the new CSV (each with variable + option + scale):

  1. [Q1] Per favore, indica quanto sia importante per te ciascuno dei seguenti aspetti nella tua vita.
     Variable: v1 | Option: Lavoro | Scale: 1 (molto importante) - 2 (abbastanza importante) - 3 (non importante) - 4 (per niente importante) - 8 (Non so, NS) - 9 (Nessuna risposta, SR)
  2. [Q1] Per favore, indica quanto sia importante per te ciascuno dei seguenti aspetti nella tua vita.
     Variable: v2 | Option: Famiglia | Scale: 1 (molto importante) - 2 (abbastanza importante) - 3 (non importante) - 4 (per niente importante) - 8 (Non so, NS) - 9 (Nessuna risposta, SR)
  3. [Q1] Per favore, indica quanto sia importante per te ciascuno dei seguenti aspetti nella tua vita.
     Variable: v3 | Option: Amici e conoscenti | Scale: 1 (molto importante) - 2 (abbastanza importante) - 3 (non importante) - 4 (per niente importante) - 8 (Non so, NS) - 9 (Nessuna risposta, SR)
  4. [Q1] Per favore, indica quant

In [None]:
# Query the LLM with detailed prompts allowing thinking and reasoning
results = []

print("\n" + "=" * 80)
print("Querying LLM with Survey Questions (With Model Reasoning)")
print("=" * 80 + "\n")

for idx, item in enumerate(query_items, 1):
    # Build a detailed prompt that allows the model to think and reason
    detailed_prompt = f"""Stai rispondendo a una domanda di un sondaggio. Leggi attentamente e fornisci quanto segue:

La tua motivazione riguardo alla domanda e all'aspetto specifico a cui si sta facendo riferimento.
La tua risposta dalla scala di risposte fornita.
Domanda [{item['question_id']}]: {item['question_text']}
Aspetto: {item['variable']}
Opzione: {item['option_text']}
Scala di Risposte: {item['response_scale']}

Per favore, rifletti attentamente e fornisci:

La tua motivazione (spiega perché hai scelto la tua risposta)
La tua risposta finale sulla scala sopra indicata"""

    print(f"[{idx}/{len(query_items)}] {item['question_id']} - {item['variable']}: {item['option_text']}")
    
    answer = query_llm(detailed_prompt, temperature=0.7, max_tokens=256)
    
    if answer:
        results.append({
            "question_id": item['question_id'],
            "question_text": item['question_text'],
            "variable": item['variable'],
            "option_text": item['option_text'],
            "response_scale": item['response_scale'],
            "model_response": answer.strip(),
            "status": "success"
        })
        print(f"  ✓ Response received ({len(answer.strip())} chars)\n")
    else:
        results.append({
            "question_id": item['question_id'],
            "question_text": item['question_text'],
            "variable": item['variable'],
            "option_text": item['option_text'],
            "response_scale": item['response_scale'],
            "model_response": None,
            "status": "failed"
        })
        print(f"  ✗ Failed to get answer.\n")
    
    # Small delay between requests to avoid overwhelming the server
    time.sleep(0.2)

print("=" * 80)
print(f"Completed: {sum(1 for r in results if r['status'] == 'success')}/{len(results)} items answered")
print("=" * 80)



Querying LLM with Survey Questions (With Model Reasoning)

[1/212] Q1 - v1: Lavoro
Querying: Stai rispondendo a una domanda di un sondaggio. Leggi attentamente e fornisci qu...
  ✓ Response received (568 chars)

[2/212] Q1 - v2: Famiglia
Querying: Stai rispondendo a una domanda di un sondaggio. Leggi attentamente e fornisci qu...
  ✓ Response received (574 chars)

[3/212] Q1 - v3: Amici e conoscenti
Querying: Stai rispondendo a una domanda di un sondaggio. Leggi attentamente e fornisci qu...
  ✓ Response received (572 chars)

[4/212] Q1 - v4: Tempo libero
Querying: Stai rispondendo a una domanda di un sondaggio. Leggi attentamente e fornisci qu...
  ✓ Response received (604 chars)

[5/212] Q1 - v5: Politica
Querying: Stai rispondendo a una domanda di un sondaggio. Leggi attentamente e fornisci qu...
  ✓ Response received (677 chars)

[6/212] Q1 - v6: Religione
Querying: Stai rispondendo a una domanda di un sondaggio. Leggi attentamente e fornisci qu...
  ✓ Response received (723 chars

In [None]:
# Display results and save in machine-readable format
import pandas as pd

# Create a DataFrame for easy parsing
results_df = pd.DataFrame(results)

print("\nResults Summary:")
print(results_df.head())

# Save to CSV for easy parsing
results_df.to_csv("llm_survey_gemma_responses_" + language + ".csv", index=False)
print("Results saved to: llm_survey_gemma_responses_" + language + ".csv")

# Save full results to JSON
with open("llm_survey_gemma_responses_" + language + ".json", "w") as f:
    json.dump(results, f, indent=2)
print("Full results also saved to: llm_survey_gemma_responses_" + language + ".json")


Results Summary:
  question_id                                      question_text variable  \
0          Q1  Per favore, indica quanto sia importante per t...       v1   
1          Q1  Per favore, indica quanto sia importante per t...       v2   
2          Q1  Per favore, indica quanto sia importante per t...       v3   
3          Q1  Per favore, indica quanto sia importante per t...       v4   
4          Q1  Per favore, indica quanto sia importante per t...       v5   

          option_text                                     response_scale  \
0              Lavoro  1 (molto importante) - 2 (abbastanza important...   
1            Famiglia  1 (molto importante) - 2 (abbastanza important...   
2  Amici e conoscenti  1 (molto importante) - 2 (abbastanza important...   
3        Tempo libero  1 (molto importante) - 2 (abbastanza important...   
4            Politica  1 (molto importante) - 2 (abbastanza important...   

                                      model_response   status 

In [None]:
language="cz"
import pandas as pd

# Load questions from new CSV file
csv_file_path = "/home/id02619@hi.inet/repos/eloquence/eloquence/WP1/euvalues/Surveys_parsed/ZA7500_q_" + language + ".csv"

# Read the CSV
df = pd.read_csv(csv_file_path) #, delimiter=',', quotechar='"')

print(f"Loaded CSV with {len(df)} rows")
print(f"Columns: {list(df.columns)}\n")

# Display first few rows
print("First 5 rows of the CSV:")
print(df.head())

#df = df[:2]


Loaded CSV with 202 rows
Columns: ['Question Number', 'Question Text', 'Response_Scale', 'Response_Options', 'Card_Number', 'Variable_Name']

First 5 rows of the CSV:
  Question Number                                      Question Text  \
0              Q1  Prosím, řekněte, jak moc je každý z následujíc...   
1              Q1  Prosím, řekněte, jak moc je každý z následujíc...   
2              Q1  Prosím, řekněte, jak moc je každý z následujíc...   
3              Q1  Prosím, řekněte, jak moc je každý z následujíc...   
4              Q1  Prosím, řekněte, jak moc je každý z následujíc...   

                                      Response_Scale Response_Options  \
0  1 (velmi důležité) - 2 (docela důležité) - 3 (...            Práce   
1  1 (velmi důležité) - 2 (docela důležité) - 3 (...           Rodina   
2  1 (velmi důležité) - 2 (docela důležité) - 3 (...  Přátelé a známí   
3  1 (velmi důležité) - 2 (docela důležité) - 3 (...        Volný čas   
4  1 (velmi důležité) - 2 (docela d

In [None]:
# Prepare detailed query items from the new CSV
query_items = []

for idx, row in df.iterrows():
    query_items.append({
        "question_id": row.get('Question Number', ''),
        "question_text": row.get('Question Text', ''),
        "variable": row.get('Variable_Name', ''),
        "option_text": row.get('Response_Options', ''),
        "response_scale": row.get('Response_Scale', ''),
        "csv_index": idx
    })

print(f"Prepared {len(query_items)} query items from the new CSV (each with variable + option + scale):\n")
for i, item in enumerate(query_items[:5], 1):
    print(f"  {i}. [{item['question_id']}] {item['question_text']}")
    print(f"     Variable: {item['variable']} | Option: {item['option_text']} | Scale: {item['response_scale']}")
print(f"  ... and {len(query_items) - 5} more items\n")


Prepared 202 query items from the new CSV (each with variable + option + scale):

  1. [Q1] Prosím, řekněte, jak moc je každý z následujících aspektů důležitý ve vašem životě.
     Variable: v1 | Option: Práce | Scale: 1 (velmi důležité) - 2 (docela důležité) - 3 (nedůležité) - 4 (vůbec nedůležité) - 8 (Nevím, NS) - 9 (Bez odpovědi, SR)
  2. [Q1] Prosím, řekněte, jak moc je každý z následujících aspektů důležitý ve vašem životě.
     Variable: v2 | Option: Rodina | Scale: 1 (velmi důležité) - 2 (docela důležité) - 3 (nedůležité) - 4 (vůbec nedůležité) - 8 (Nevím, NS) - 9 (Bez odpovědi, SR)
  3. [Q1] Prosím, řekněte, jak moc je každý z následujících aspektů důležitý ve vašem životě.
     Variable: v3 | Option: Přátelé a známí | Scale: 1 (velmi důležité) - 2 (docela důležité) - 3 (nedůležité) - 4 (vůbec nedůležité) - 8 (Nevím, NS) - 9 (Bez odpovědi, SR)
  4. [Q1] Prosím, řekněte, jak moc je každý z následujících aspektů důležitý ve vašem životě.
     Variable: v4 | Option: Volný čas | Sc

In [None]:
# Query the LLM with detailed prompts allowing thinking and reasoning
results = []

print("\n" + "=" * 80)
print("Querying LLM with Survey Questions (With Model Reasoning)")
print("=" * 80 + "\n")

for idx, item in enumerate(query_items, 1):
    # Build a detailed prompt that allows the model to think and reason
    detailed_prompt = f"""Odpovídáte na otázku dotazníku. Pečlivě si ji přečtěte a poskytněte následující informace:

Vaše úvahy o otázce a konkrétní aspekt, na který se otázka zaměřuje.
Vaše odpověď z poskytnuté stupnice odpovědí.
Otázka [{item['question_id']}]: {item['question_text']}
Aspekt: {item['variable']}
Možnost: {item['option_text']}
Stupnice odpovědí: {item['response_scale']}

Prosím, důkladně se zamyslete a poskytněte:

Vaše úvahy (vysvětlete, proč jste zvolili svou odpověď)
Vaši konečnou odpověď na výše uvedené stupnici."""

    print(f"[{idx}/{len(query_items)}] {item['question_id']} - {item['variable']}: {item['option_text']}")
    
    answer = query_llm(detailed_prompt, temperature=0.7, max_tokens=256)
    
    if answer:
        results.append({
            "question_id": item['question_id'],
            "question_text": item['question_text'],
            "variable": item['variable'],
            "option_text": item['option_text'],
            "response_scale": item['response_scale'],
            "model_response": answer.strip(),
            "status": "success"
        })
        print(f"  ✓ Response received ({len(answer.strip())} chars)\n")
    else:
        results.append({
            "question_id": item['question_id'],
            "question_text": item['question_text'],
            "variable": item['variable'],
            "option_text": item['option_text'],
            "response_scale": item['response_scale'],
            "model_response": None,
            "status": "failed"
        })
        print(f"  ✗ Failed to get answer.\n")
    
    # Small delay between requests to avoid overwhelming the server
    time.sleep(0.2)

print("=" * 80)
print(f"Completed: {sum(1 for r in results if r['status'] == 'success')}/{len(results)} items answered")
print("=" * 80)



Querying LLM with Survey Questions (With Model Reasoning)

[1/202] Q1 - v1: Práce
Querying: Odpovídáte na otázku dotazníku. Pečlivě si ji přečtěte a poskytněte následující ...


  ✓ Response received (833 chars)

[2/202] Q1 - v2: Rodina
Querying: Odpovídáte na otázku dotazníku. Pečlivě si ji přečtěte a poskytněte následující ...
  ✓ Response received (543 chars)

[3/202] Q1 - v3: Přátelé a známí
Querying: Odpovídáte na otázku dotazníku. Pečlivě si ji přečtěte a poskytněte následující ...
  ✓ Response received (639 chars)

[4/202] Q1 - v4: Volný čas
Querying: Odpovídáte na otázku dotazníku. Pečlivě si ji přečtěte a poskytněte následující ...
  ✓ Response received (847 chars)

[5/202] Q1 - v5: Politika
Querying: Odpovídáte na otázku dotazníku. Pečlivě si ji přečtěte a poskytněte následující ...
  ✓ Response received (814 chars)

[6/202] Q1 - v6: Náboženství
Querying: Odpovídáte na otázku dotazníku. Pečlivě si ji přečtěte a poskytněte následující ...
  ✓ Response received (626 chars)

[7/202] Q2 - v7: nan
Querying: Odpovídáte na otázku dotazníku. Pečlivě si ji přečtěte a poskytněte následující ...
  ✓ Response received (741 chars)

[8/202] Q3 - v8: nan
Querying: 

In [None]:
# Display results and save in machine-readable format
import pandas as pd

# Create a DataFrame for easy parsing
results_df = pd.DataFrame(results)

print("\nResults Summary:")
print(results_df.head())

# Save to CSV for easy parsing
results_df.to_csv("llm_survey_gemma_responses_" + language + ".csv", index=False)
print("Results saved to: llm_survey_gemma_responses_" + language + ".csv")

# Save full results to JSON
with open("llm_survey_gemma_responses_" + language + ".json", "w") as f:
    json.dump(results, f, indent=2)
print("Full results also saved to: llm_survey_gemma_responses_" + language + ".json")


Results Summary:
  question_id                                      question_text variable  \
0          Q1  Prosím, řekněte, jak moc je každý z následujíc...       v1   
1          Q1  Prosím, řekněte, jak moc je každý z následujíc...       v2   
2          Q1  Prosím, řekněte, jak moc je každý z následujíc...       v3   
3          Q1  Prosím, řekněte, jak moc je každý z následujíc...       v4   
4          Q1  Prosím, řekněte, jak moc je každý z následujíc...       v5   

       option_text                                     response_scale  \
0            Práce  1 (velmi důležité) - 2 (docela důležité) - 3 (...   
1           Rodina  1 (velmi důležité) - 2 (docela důležité) - 3 (...   
2  Přátelé a známí  1 (velmi důležité) - 2 (docela důležité) - 3 (...   
3        Volný čas  1 (velmi důležité) - 2 (docela důležité) - 3 (...   
4         Politika  1 (velmi důležité) - 2 (docela důležité) - 3 (...   

                                      model_response   status  
0  ## Úvahy o ot

In [None]:
language="hu"
import pandas as pd

# Load questions from new CSV file
csv_file_path = "/home/id02619@hi.inet/repos/eloquence/eloquence/WP1/euvalues/Surveys_parsed/ZA7500_q_" + language + ".csv"

# Read the CSV
df = pd.read_csv(csv_file_path, delimiter=',')# quotechar='"')

print(f"Loaded CSV with {len(df)} rows")
print(f"Columns: {list(df.columns)}\n")

# Display first few rows
print("First 5 rows of the CSV:")
print(df.head())

#df = df[:2]


Loaded CSV with 251 rows
Columns: ['Question Number', 'Question Text', 'Response_Scale', 'Response_Options', 'Card_Number', 'Variable_Name']

First 5 rows of the CSV:
  Question Number                                      Question Text  \
0              Q1  Kérem mondja meg, hogy a következők mennyire f...   
1              Q1  Kérem mondja meg, hogy a következők mennyire f...   
2              Q1  Kérem mondja meg, hogy a következők mennyire f...   
3              Q1  Kérem mondja meg, hogy a következők mennyire f...   
4              Q1  Kérem mondja meg, hogy a következők mennyire f...   

                                      Response_Scale      Response_Options  \
0  1 - nagyon fontos, 2 - elég fontos, 3 - nem fo...                 Munka   
1  1 - nagyon fontos, 2 - elég fontos, 3 - nem fo...                Család   
2  1 - nagyon fontos, 2 - elég fontos, 3 - nem fo...  Barátok és ismerősök   
3  1 - nagyon fontos, 2 - elég fontos, 3 - nem fo...             Szabadidő   
4  1 - nag

In [None]:
# Prepare detailed query items from the new CSV
query_items = []

for idx, row in df.iterrows():
    query_items.append({
        "question_id": row.get('Question Number', ''),
        "question_text": row.get('Question Text', ''),
        "variable": row.get('Variable_Name', ''),
        "option_text": row.get('Response_Options', ''),
        "response_scale": row.get('Response_Scale', ''),
        "csv_index": idx
    })

print(f"Prepared {len(query_items)} query items from the new CSV (each with variable + option + scale):\n")
for i, item in enumerate(query_items[:5], 1):
    print(f"  {i}. [{item['question_id']}] {item['question_text']}")
    print(f"     Variable: {item['variable']} | Option: {item['option_text']} | Scale: {item['response_scale']}")
print(f"  ... and {len(query_items) - 5} more items\n")


Prepared 251 query items from the new CSV (each with variable + option + scale):

  1. [Q1] Kérem mondja meg, hogy a következők mennyire fontosak az Ön életében?
     Variable: v1 | Option: Munka | Scale: 1 - nagyon fontos, 2 - elég fontos, 3 - nem fontos, 4 - egyáltalán nem fontos, 8 - Nem tudom, 9 - Nem válaszolt
  2. [Q1] Kérem mondja meg, hogy a következők mennyire fontosak az Ön életében?
     Variable: v2 | Option: Család | Scale: 1 - nagyon fontos, 2 - elég fontos, 3 - nem fontos, 4 - egyáltalán nem fontos, 8 - Nem tudom, 9 - Nem válaszolt
  3. [Q1] Kérem mondja meg, hogy a következők mennyire fontosak az Ön életében?
     Variable: v3 | Option: Barátok és ismerősök | Scale: 1 - nagyon fontos, 2 - elég fontos, 3 - nem fontos, 4 - egyáltalán nem fontos, 8 - Nem tudom, 9 - Nem válaszolt
  4. [Q1] Kérem mondja meg, hogy a következők mennyire fontosak az Ön életében?
     Variable: v4 | Option: Szabadidő | Scale: 1 - nagyon fontos, 2 - elég fontos, 3 - nem fontos, 4 - egyáltalán nem

In [None]:
# Query the LLM with detailed prompts allowing thinking and reasoning
results = []

print("\n" + "=" * 80)
print("Querying LLM with Survey Questions (With Model Reasoning)")
print("=" * 80 + "\n")

for idx, item in enumerate(query_items, 1):
    # Build a detailed prompt that allows the model to think and reason
    detailed_prompt = f"""Kérdőív kérdésére válaszol. Olvasd el figyelmesen, és add meg a következőket:

Érvelésed a kérdésről és az adott kérdésre vonatkozó konkrét szempontról.
Válaszod a megadott válaszkörön belül.
Kérdés [{item['question_id']}]: {item['question_text']}
Aspektus: {item['variable']}
Lehetőség: {item['option_text']}
Válaszkör: {item['response_scale']}

Kérlek, gondolkodj alaposan, és add meg:

Érvelésed (magyarázd meg, miért választottad az adott válaszát)
Végső válaszod a fenti körön belül."""

    print(f"[{idx}/{len(query_items)}] {item['question_id']} - {item['variable']}: {item['option_text']}")
    
    answer = query_llm(detailed_prompt, temperature=0.7, max_tokens=256)
    
    if answer:
        results.append({
            "question_id": item['question_id'],
            "question_text": item['question_text'],
            "variable": item['variable'],
            "option_text": item['option_text'],
            "response_scale": item['response_scale'],
            "model_response": answer.strip(),
            "status": "success"
        })
        print(f"  ✓ Response received ({len(answer.strip())} chars)\n")
    else:
        results.append({
            "question_id": item['question_id'],
            "question_text": item['question_text'],
            "variable": item['variable'],
            "option_text": item['option_text'],
            "response_scale": item['response_scale'],
            "model_response": None,
            "status": "failed"
        })
        print(f"  ✗ Failed to get answer.\n")
    
    # Small delay between requests to avoid overwhelming the server
    time.sleep(0.2)

print("=" * 80)
print(f"Completed: {sum(1 for r in results if r['status'] == 'success')}/{len(results)} items answered")
print("=" * 80)



Querying LLM with Survey Questions (With Model Reasoning)

[1/251] Q1 - v1: Munka
Querying: Kérdőív kérdésére válaszol. Olvasd el figyelmesen, és add meg a következőket:

É...
  ✓ Response received (819 chars)

[2/251] Q1 - v2: Család
Querying: Kérdőív kérdésére válaszol. Olvasd el figyelmesen, és add meg a következőket:

É...
  ✓ Response received (800 chars)

[3/251] Q1 - v3: Barátok és ismerősök
Querying: Kérdőív kérdésére válaszol. Olvasd el figyelmesen, és add meg a következőket:

É...
  ✓ Response received (763 chars)

[4/251] Q1 - v4: Szabadidő
Querying: Kérdőív kérdésére válaszol. Olvasd el figyelmesen, és add meg a következőket:

É...
  ✓ Response received (813 chars)

[5/251] Q1 - v5: Politika
Querying: Kérdőív kérdésére válaszol. Olvasd el figyelmesen, és add meg a következőket:

É...
  ✓ Response received (808 chars)

[6/251] Q1 - v6: Vallás
Querying: Kérdőív kérdésére válaszol. Olvasd el figyelmesen, és add meg a következőket:

É...
  ✓ Response received (813 chars)

[7/2

In [None]:
# Display results and save in machine-readable format
import pandas as pd

# Create a DataFrame for easy parsing
results_df = pd.DataFrame(results)

print("\nResults Summary:")
print(results_df.head())

# Save to CSV for easy parsing
results_df.to_csv("llm_survey_gemma_responses_" + language + ".csv", index=False)
print("Results saved to: llm_survey_gemma_responses_" + language + ".csv")

# Save full results to JSON
with open("llm_survey_gemma_responses_" + language + ".json", "w") as f:
    json.dump(results, f, indent=2)
print("Full results also saved to: llm_survey_gemma_responses_" + language + ".json")


Results Summary:
  question_id                                      question_text variable  \
0          Q1  Kérem mondja meg, hogy a következők mennyire f...       v1   
1          Q1  Kérem mondja meg, hogy a következők mennyire f...       v2   
2          Q1  Kérem mondja meg, hogy a következők mennyire f...       v3   
3          Q1  Kérem mondja meg, hogy a következők mennyire f...       v4   
4          Q1  Kérem mondja meg, hogy a következők mennyire f...       v5   

            option_text                                     response_scale  \
0                 Munka  1 - nagyon fontos, 2 - elég fontos, 3 - nem fo...   
1                Család  1 - nagyon fontos, 2 - elég fontos, 3 - nem fo...   
2  Barátok és ismerősök  1 - nagyon fontos, 2 - elég fontos, 3 - nem fo...   
3             Szabadidő  1 - nagyon fontos, 2 - elég fontos, 3 - nem fo...   
4              Politika  1 - nagyon fontos, 2 - elég fontos, 3 - nem fo...   

                                      model_respon

In [None]:
language="rs"
import pandas as pd

# Load questions from new CSV file
csv_file_path = "/home/id02619@hi.inet/repos/eloquence/eloquence/WP1/euvalues/Surveys_parsed/ZA7500_q_" + language + ".csv"

# Read the CSV
df = pd.read_csv(csv_file_path) #, delimiter=',', quotechar='"')

print(f"Loaded CSV with {len(df)} rows")
print(f"Columns: {list(df.columns)}\n")

# Display first few rows
print("First 5 rows of the CSV:")
print(df.head())

#df = df[:2]


Loaded CSV with 243 rows
Columns: ['Question Number', 'Question Text', 'Response_Scale', 'Response_Options', 'Card_Number', 'Variable_Name']

First 5 rows of the CSV:
  Question Number                                      Question Text  \
0              Q1  Kérem mondja meg, hogy a következők mennyire f...   
1              Q1  Kérem mondja meg, hogy a következők mennyire f...   
2              Q1  Kérem mondja meg, hogy a következők mennyire f...   
3              Q1  Kérem mondja meg, hogy a következők mennyire f...   
4              Q1  Kérem mondja meg, hogy a következők mennyire f...   

                                      Response_Scale      Response_Options  \
0  1 - nagyon fontos, 2 - elég fontos, 3 - nem fo...                 Munka   
1  1 - nagyon fontos, 2 - elég fontos, 3 - nem fo...                Család   
2  1 - nagyon fontos, 2 - elég fontos, 3 - nem fo...  Barátok és ismerősök   
3  1 - nagyon fontos, 2 - elég fontos, 3 - nem fo...             Szabadidő   
4  1 - nag

In [None]:
# Prepare detailed query items from the new CSV
query_items = []

for idx, row in df.iterrows():
    query_items.append({
        "question_id": row.get('Question Number', ''),
        "question_text": row.get('Question Text', ''),
        "variable": row.get('Variable_Name', ''),
        "option_text": row.get('Response_Options', ''),
        "response_scale": row.get('Response_Scale', ''),
        "csv_index": idx
    })

print(f"Prepared {len(query_items)} query items from the new CSV (each with variable + option + scale):\n")
for i, item in enumerate(query_items[:5], 1):
    print(f"  {i}. [{item['question_id']}] {item['question_text']}")
    print(f"     Variable: {item['variable']} | Option: {item['option_text']} | Scale: {item['response_scale']}")
print(f"  ... and {len(query_items) - 5} more items\n")


Prepared 243 query items from the new CSV (each with variable + option + scale):

  1. [Q1] Kérem mondja meg, hogy a következők mennyire fontosak az Ön életében?
     Variable: v1 | Option: Munka | Scale: 1 - nagyon fontos, 2 - elég fontos, 3 - nem fontos, 4 - egyáltalán nem fontos, 8 - Nem tudom, 9 - Nem válaszolt
  2. [Q1] Kérem mondja meg, hogy a következők mennyire fontosak az Ön életében?
     Variable: v2 | Option: Család | Scale: 1 - nagyon fontos, 2 - elég fontos, 3 - nem fontos, 4 - egyáltalán nem fontos, 8 - Nem tudom, 9 - Nem válaszolt
  3. [Q1] Kérem mondja meg, hogy a következők mennyire fontosak az Ön életében?
     Variable: v3 | Option: Barátok és ismerősök | Scale: 1 - nagyon fontos, 2 - elég fontos, 3 - nem fontos, 4 - egyáltalán nem fontos, 8 - Nem tudom, 9 - Nem válaszolt
  4. [Q1] Kérem mondja meg, hogy a következők mennyire fontosak az Ön életében?
     Variable: v4 | Option: Szabadidő | Scale: 1 - nagyon fontos, 2 - elég fontos, 3 - nem fontos, 4 - egyáltalán nem

In [None]:
# Query the LLM with detailed prompts allowing thinking and reasoning
results = []

print("\n" + "=" * 80)
print("Querying LLM with Survey Questions (With Model Reasoning)")
print("=" * 80 + "\n")

for idx, item in enumerate(query_items, 1):
    # Build a detailed prompt that allows the model to think and reason
    detailed_prompt = f"""Odgovarate na pitanje iz ankete. Pažljivo pročitajte i pružite sledeće:

Vaše razmišljanje o pitanju i specifični aspekt o kojem se pita.
Vaš odgovor iz date skale odgovora.
Pitanje [{item['question_id']}]: {item['question_text']}
Aspekt: {item['variable']}
Opcija: {item['option_text']}
Skala odgovora: {item['response_scale']}

Molim vas, pažljivo razmislite i pružite:

Vaše razmišljanje (objasnite zašto ste izabrali svoj odgovor)
Vaš konačni odgovor na gore navedenoj skali."""

    print(f"[{idx}/{len(query_items)}] {item['question_id']} - {item['variable']}: {item['option_text']}")
    
    answer = query_llm(detailed_prompt, temperature=0.7, max_tokens=256)
    
    if answer:
        results.append({
            "question_id": item['question_id'],
            "question_text": item['question_text'],
            "variable": item['variable'],
            "option_text": item['option_text'],
            "response_scale": item['response_scale'],
            "model_response": answer.strip(),
            "status": "success"
        })
        print(f"  ✓ Response received ({len(answer.strip())} chars)\n")
    else:
        results.append({
            "question_id": item['question_id'],
            "question_text": item['question_text'],
            "variable": item['variable'],
            "option_text": item['option_text'],
            "response_scale": item['response_scale'],
            "model_response": None,
            "status": "failed"
        })
        print(f"  ✗ Failed to get answer.\n")
    
    # Small delay between requests to avoid overwhelming the server
    time.sleep(0.2)

print("=" * 80)
print(f"Completed: {sum(1 for r in results if r['status'] == 'success')}/{len(results)} items answered")
print("=" * 80)



Querying LLM with Survey Questions (With Model Reasoning)

[1/243] Q1 - v1: Munka
Querying: Odgovarate na pitanje iz ankete. Pažljivo pročitajte i pružite sledeće:

Vaše ra...


  ✓ Response received (518 chars)

[2/243] Q1 - v2: Család
Querying: Odgovarate na pitanje iz ankete. Pažljivo pročitajte i pružite sledeće:

Vaše ra...
  ✓ Response received (483 chars)

[3/243] Q1 - v3: Barátok és ismerősök
Querying: Odgovarate na pitanje iz ankete. Pažljivo pročitajte i pružite sledeće:

Vaše ra...
  ✓ Response received (776 chars)

[4/243] Q1 - v4: Szabadidő
Querying: Odgovarate na pitanje iz ankete. Pažljivo pročitajte i pružite sledeće:

Vaše ra...
  ✓ Response received (515 chars)

[5/243] Q1 - v5: Politika
Querying: Odgovarate na pitanje iz ankete. Pažljivo pročitajte i pružite sledeće:

Vaše ra...
  ✓ Response received (822 chars)

[6/243] Q1 - v6: Vallás
Querying: Odgovarate na pitanje iz ankete. Pažljivo pročitajte i pružite sledeće:

Vaše ra...
  ✓ Response received (592 chars)

[7/243] Q2 - v7: nan
Querying: Odgovarate na pitanje iz ankete. Pažljivo pročitajte i pružite sledeće:

Vaše ra...
  ✓ Response received (803 chars)

[8/243] Q3 - v8: nan
Querying: 

In [None]:
# Display results and save in machine-readable format
import pandas as pd

# Create a DataFrame for easy parsing
results_df = pd.DataFrame(results)

print("\nResults Summary:")
print(results_df.head())

# Save to CSV for easy parsing
results_df.to_csv("llm_survey_gemma_responses_" + language + ".csv", index=False)
print("Results saved to: llm_survey_gemma_responses_" + language + ".csv")

# Save full results to JSON
with open("llm_survey_gemma_responses_" + language + ".json", "w") as f:
    json.dump(results, f, indent=2)
print("Full results also saved to: llm_survey_gemma_responses_" + language + ".json")


Results Summary:
  question_id                                      question_text variable  \
0          Q1  Kérem mondja meg, hogy a következők mennyire f...       v1   
1          Q1  Kérem mondja meg, hogy a következők mennyire f...       v2   
2          Q1  Kérem mondja meg, hogy a következők mennyire f...       v3   
3          Q1  Kérem mondja meg, hogy a következők mennyire f...       v4   
4          Q1  Kérem mondja meg, hogy a következők mennyire f...       v5   

            option_text                                     response_scale  \
0                 Munka  1 - nagyon fontos, 2 - elég fontos, 3 - nem fo...   
1                Család  1 - nagyon fontos, 2 - elég fontos, 3 - nem fo...   
2  Barátok és ismerősök  1 - nagyon fontos, 2 - elég fontos, 3 - nem fo...   
3             Szabadidő  1 - nagyon fontos, 2 - elég fontos, 3 - nem fo...   
4              Politika  1 - nagyon fontos, 2 - elég fontos, 3 - nem fo...   

                                      model_respon

In [None]:
language="ru"
import pandas as pd

# Load questions from new CSV file
csv_file_path = "/home/id02619@hi.inet/repos/eloquence/eloquence/WP1/euvalues/Surveys_parsed/ZA7500_q_" + language + ".csv"

# Read the CSV
df = pd.read_csv(csv_file_path) #, delimiter=',', quotechar='"')

print(f"Loaded CSV with {len(df)} rows")
print(f"Columns: {list(df.columns)}\n")

# Display first few rows
print("First 5 rows of the CSV:")
print(df.head())

#df = df[:2]


Loaded CSV with 183 rows
Columns: ['Question Number', 'Question Text', 'Response_Scale', 'Response_Options', 'Card_Number', 'Variable_Name']

First 5 rows of the CSV:
  Question Number                                      Question Text  \
0              Q1  Для каждой из сторон жизни, которые я сейчас п...   
1              Q1  Для каждой из сторон жизни, которые я сейчас п...   
2              Q1  Для каждой из сторон жизни, которые я сейчас п...   
3              Q1  Для каждой из сторон жизни, которые я сейчас п...   
4              Q1  Для каждой из сторон жизни, которые я сейчас п...   

                                      Response_Scale  Response_Options  \
0  1 (Очень важно) - 2 (Довольно важно) - 3 (не в...            Работа   
1  1 (Очень важно) - 2 (Довольно важно) - 3 (не в...             Семья   
2  1 (Очень важно) - 2 (Довольно важно) - 3 (не в...  Друзья, знакомые   
3  1 (Очень важно) - 2 (Довольно важно) - 3 (не в...   Свободное время   
4  1 (Очень важно) - 2 (Доволь

In [None]:
# Prepare detailed query items from the new CSV
query_items = []

for idx, row in df.iterrows():
    query_items.append({
        "question_id": row.get('Question Number', ''),
        "question_text": row.get('Question Text', ''),
        "variable": row.get('Variable_Name', ''),
        "option_text": row.get('Response_Options', ''),
        "response_scale": row.get('Response_Scale', ''),
        "csv_index": idx
    })

print(f"Prepared {len(query_items)} query items from the new CSV (each with variable + option + scale):\n")
for i, item in enumerate(query_items[:5], 1):
    print(f"  {i}. [{item['question_id']}] {item['question_text']}")
    print(f"     Variable: {item['variable']} | Option: {item['option_text']} | Scale: {item['response_scale']}")
print(f"  ... and {len(query_items) - 5} more items\n")


Prepared 183 query items from the new CSV (each with variable + option + scale):

  1. [Q1] Для каждой из сторон жизни, которые я сейчас перечислю, скажите, пожалуйста, является ли она для Вас очень важной; довольно важной; не важной или совсем не важной?
     Variable: v1 | Option: Работа | Scale: 1 (Очень важно) - 2 (Довольно важно) - 3 (не важно) - 4 (совсем не важно) - 8 (Затр. ответить, /НЕ ЧИТАТЬ/) - 9 (Отказ, /НЕ ЧИТАТЬ/)
  2. [Q1] Для каждой из сторон жизни, которые я сейчас перечислю, скажите, пожалуйста, является ли она для Вас очень важной; довольно важной; не важной или совсем не важной?
     Variable: v2 | Option: Семья | Scale: 1 (Очень важно) - 2 (Довольно важно) - 3 (не важно) - 4 (совсем не важно) - 8 (Затр. ответить, /НЕ ЧИТАТЬ/) - 9 (Отказ, /НЕ ЧИТАТЬ/)
  3. [Q1] Для каждой из сторон жизни, которые я сейчас перечислю, скажите, пожалуйста, является ли она для Вас очень важной; довольно важной; не важной или совсем не важной?
     Variable: v3 | Option: Друзья, знакомы

In [None]:
# Query the LLM with detailed prompts allowing thinking and reasoning
results = []

print("\n" + "=" * 80)
print("Querying LLM with Survey Questions (With Model Reasoning)")
print("=" * 80 + "\n")

for idx, item in enumerate(query_items, 1):
    # Build a detailed prompt that allows the model to think and reason
    detailed_prompt = f"""Вы отвечаете на вопрос анкеты. Внимательно прочитайте и предоставьте следующее:

Ваше рассуждение относительно вопроса и конкретного аспекта, который затрагивается.
Ваш ответ из предоставленной шкалы ответов.
Вопрос [{item['question_id']}]: {item['question_text']}
Аспект: {item['variable']}
Опция: {item['option_text']}
Шкала ответов: {item['response_scale']}

Пожалуйста, внимательно подумайте и предоставьте:

Ваше рассуждение (объясните, почему вы выбрали свой ответ)
Ваш окончательный ответ на указанной шкале."""

    print(f"[{idx}/{len(query_items)}] {item['question_id']} - {item['variable']}: {item['option_text']}")
    
    answer = query_llm(detailed_prompt, temperature=0.7, max_tokens=256)
    
    if answer:
        results.append({
            "question_id": item['question_id'],
            "question_text": item['question_text'],
            "variable": item['variable'],
            "option_text": item['option_text'],
            "response_scale": item['response_scale'],
            "model_response": answer.strip(),
            "status": "success"
        })
        print(f"  ✓ Response received ({len(answer.strip())} chars)\n")
    else:
        results.append({
            "question_id": item['question_id'],
            "question_text": item['question_text'],
            "variable": item['variable'],
            "option_text": item['option_text'],
            "response_scale": item['response_scale'],
            "model_response": None,
            "status": "failed"
        })
        print(f"  ✗ Failed to get answer.\n")
    
    # Small delay between requests to avoid overwhelming the server
    time.sleep(0.2)

print("=" * 80)
print(f"Completed: {sum(1 for r in results if r['status'] == 'success')}/{len(results)} items answered")
print("=" * 80)



Querying LLM with Survey Questions (With Model Reasoning)

[1/183] Q1 - v1: Работа
Querying: Вы отвечаете на вопрос анкеты. Внимательно прочитайте и предоставьте следующее:
...
  ✓ Response received (672 chars)

[2/183] Q1 - v2: Семья
Querying: Вы отвечаете на вопрос анкеты. Внимательно прочитайте и предоставьте следующее:
...
  ✓ Response received (546 chars)

[3/183] Q1 - v3: Друзья, знакомые
Querying: Вы отвечаете на вопрос анкеты. Внимательно прочитайте и предоставьте следующее:
...
  ✓ Response received (1067 chars)

[4/183] Q1 - v4: Свободное время
Querying: Вы отвечаете на вопрос анкеты. Внимательно прочитайте и предоставьте следующее:
...
  ✓ Response received (636 chars)

[5/183] Q1 - v5: Политика
Querying: Вы отвечаете на вопрос анкеты. Внимательно прочитайте и предоставьте следующее:
...
  ✓ Response received (864 chars)

[6/183] Q1 - v6: Религия
Querying: Вы отвечаете на вопрос анкеты. Внимательно прочитайте и предоставьте следующее:
...
  ✓ Response received (571 chars)



In [None]:
# Display results and save in machine-readable format
import pandas as pd

# Create a DataFrame for easy parsing
results_df = pd.DataFrame(results)

print("\nResults Summary:")
print(results_df.head())

# Save to CSV for easy parsing
results_df.to_csv("llm_survey_gemma_responses_" + language + ".csv", index=False)
print("Results saved to: llm_survey_gemma_responses_" + language + ".csv")

# Save full results to JSON
with open("llm_survey_gemma_responses_" + language + ".json", "w") as f:
    json.dump(results, f, indent=2)
print("Full results also saved to: llm_survey_gemma_responses_" + language + ".json")


Results Summary:
  question_id                                      question_text variable  \
0          Q1  Prosím, řekněte, jak moc je každý z následujíc...       v1   
1          Q1  Prosím, řekněte, jak moc je každý z následujíc...       v2   

  option_text                                     response_scale  \
0       Práce  1 (velmi důležité) - 2 (docela důležité) - 3 (...   
1      Rodina  1 (velmi důležité) - 2 (docela důležité) - 3 (...   

                                      model_response   status  
0  **Vaše úvahy:**\n\nOtázka se ptá na důležitost...  success  
1  **Vaše úvahy:**\n\nOtázka se ptá na důležitost...  success  
Results saved to: llm_survey_gemma_responses_cz.csv
Full results also saved to: llm_survey_gemma_responses_cz.json


# Iterate over languages and models

In [None]:
import requests
import json
from typing import List, Dict, Optional
import time
import pandas

# Configuration
LLAMA_SERVER_URL = "http://127.0.0.1:10000"
CHAT_ENDPOINT = f"{LLAMA_SERVER_URL}/v1/chat/completions"

print(f"LLM Server URL: {LLAMA_SERVER_URL}")
print(f"Chat Endpoint: {CHAT_ENDPOINT}")

LLM Server URL: http://127.0.0.1:10000
Chat Endpoint: http://127.0.0.1:10000/v1/chat/completions


In [None]:
def query_llm(
    question: str,
    temperature: float = 0.7,
    max_tokens: int = -1,
    timeout: int = 300
) -> Optional[str]:
    """
    Send a question to the local llama.cpp server and get a response.
    
    Args:
        question: The question/prompt to send
        temperature: Sampling temperature (0.0 = deterministic, 1.0 = more random)
        max_tokens: Maximum tokens in response
        timeout: Request timeout in seconds
    
    Returns:
        The LLM's response text, or None if error
    """
    try:
        payload = {
            "model": "local-model",  # llama.cpp uses this default
            "messages": [
                {
                    "role": "user",
                    "content": question
                }
            ],
            "temperature": temperature,
            "max_tokens": max_tokens,
            "stream": False
        }
        
        print(f"Querying: {question[:80]}...")
        response = requests.post(
            CHAT_ENDPOINT,
            json=payload,
            timeout=timeout
        )
        
        if response.status_code == 200:
            result = response.json()
            answer = result.get("choices", [{}])[0].get("message", {}).get("content", "").strip()
            return answer
        else:
            print(f"  Error: HTTP {response.status_code}")
            print(f"  Response: {response.text[:200]}")
            return None
            
    except requests.exceptions.Timeout:
        print(f"  Error: Request timeout after {timeout}s")
        return None
    except requests.exceptions.ConnectionError:
        print(f"  Error: Could not connect to {LLAMA_SERVER_URL}")
        return None
    except Exception as e:
        print(f"  Error: {e}")
        return None

In [None]:
# Test connection to the server with detailed debugging
print("Testing connection to local llama.cpp server...")
test_question = "Hello, how are you?"

try:
    test_payload = {
        "model": "local-model",
        "messages": [{"role": "user", "content": test_question}],
        "temperature": 0.7,
        "max_tokens": 5000,
        "stream": False
    }
    
    print(f"Server URL: {LLAMA_SERVER_URL}")
    print(f"Payload: {test_payload}\n")
    
    response = requests.post(CHAT_ENDPOINT, json=test_payload, timeout=30)
    
    print(f"Response Status: {response.status_code}")
    print(f"Response Headers: {response.headers}")
    print(f"Response Body: {response.text}\n")
    
    if response.status_code == 200:
        result = response.json()
        test_answer = result.get("choices", [{}])[0].get("message", {}).get("content", "").strip()
        print(f"✓ Test successful! LLM responded:")
        print(f"  Q: {test_question}")
        print(f"  A: {test_answer}\n")
    else:
        print(f"✗ Server returned error {response.status_code}")
        try:
            error_detail = response.json()
            print(f"  Error details: {error_detail}")
        except:
            print(f"  Raw response: {response.text}")
            
except Exception as e:
    print(f"✗ Connection failed: {e}")
    print("\nPlease ensure:")
    print("  1. llama.cpp server is running on http://127.0.0.1:10006/")
    print("  2. You can reach the endpoint (check firewall, port, etc.)")
    print("  3. Try in terminal: curl -X POST http://127.0.0.1:10006/v1/chat/completions -H 'Content-Type: application/json' -d '{\"messages\":[{\"role\":\"user\",\"content\":\"hi\"}]}'")

Testing connection to local llama.cpp server...
Server URL: http://127.0.0.1:10000
Payload: {'model': 'local-model', 'messages': [{'role': 'user', 'content': 'Hello, how are you?'}], 'temperature': 0.7, 'max_tokens': 5000, 'stream': False}

Response Status: 200
Response Headers: {'Keep-Alive': 'timeout=5, max=100', 'Content-Type': 'application/json; charset=utf-8', 'Server': 'llama.cpp', 'Content-Length': '1557', 'Access-Control-Allow-Origin': ''}
Response Body: {"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","reasoning_content":"Okay, the user is greeting me. I need to respond appropriately. Since I'm an AI, I don't have feelings, but I can simulate a friendly conversation. I should ask how they're doing in return or respond in a polite manner. I'll go with a friendly response that acknowledges the greeting and asks about their well-being.\n\nLet's craft a response:\n\n\"Hello! I'm an AI, so I don't have feelings, but I'm here and ready to help you. How abo

In [51]:
languages = ["gb", "es", "it", "cz", "hu", "rs", "ru"]
llm_names = ["minimistral3"]  #["apertus", "gemma", "qwen"]

In [None]:
from jinja2 import Template
import pandas as pd



for language in languages:
    for llm_name in llm_names:
        
        # Load the Jinja2 template
        #template_path = "./prompts/survey_prompt/survey_prompts.jinja2"
        template_path = "./prompts/survey_prompt/survey_prompts_final_answer_"+llm_name+".jinja2"
        with open(template_path, 'r', encoding='utf-8') as f:
            template_string = f.read()
            
        template = Template(template_string)
        
        # Load questions from new CSV file
        csv_file_path = f"./Surveys_parsed/ZA7500_q_{language}.csv"

        # Read the CSV
        df = pd.read_csv(csv_file_path) #, delimiter=',', quotechar='"')

        print(f"Loaded CSV with {len(df)} rows")
        print(f"Columns: {list(df.columns)}\n")
        
        #df = df[:5] # Limit to first 2 for testing
        
        # Display first few rows for testing
        #print("First 5 rows of the CSV:")
        #print(df.head())
        
        # Prepare detailed query items from the CSV
        query_items = []

        for idx, row in df.iterrows():
            query_items.append({
                "question_id": row.get('Question Number', ''),
                "question_text": row.get('Question Text', ''),
                "variable": row.get('Variable_Name', ''),
                "option_text": row.get('Response_Options', ''),
                "response_scale": row.get('Response_Scale', ''),
                "csv_index": idx
            })

        print(f"Prepared {len(query_items)} query items from the new CSV (each with variable + option + scale):\n")
        for i, item in enumerate(query_items[:5], 1):
            print(f"  {i}. [{item['question_id']}] {item['question_text']}")
            print(f"     Variable: {item['variable']} | Option: {item['option_text']} | Scale: {item['response_scale']}")
        print(f"  ... and {len(query_items) - 5} more items\n")
        
        # Query the LLM with detailed prompts allowing thinking and reasoning
        results = []

        print("\n" + "=" * 80)
        print("Querying LLM with Survey Questions (With Model Reasoning)")
        print("=" * 80 + "\n")

        for idx, item in enumerate(query_items, 1):
            # Render detailed prompt from template
            detailed_prompt = template.render(
                language=language,
                question_id=item['question_id'],
                question_text=item['question_text'],
                variable=item['variable'],
                option_text=item['option_text'],
                response_scale=item['response_scale']
            )

            print(f"[{idx}/{len(query_items)}] {item['question_id']} - {item['variable']}: {item['option_text']}")
            
            answer = query_llm(detailed_prompt) #, temperature=0.7, max_tokens=256000)
            
            if answer:
                results.append({
                    "question_id": item['question_id'],
                    "question_text": item['question_text'],
                    "variable": item['variable'],
                    "option_text": item['option_text'],
                    "response_scale": item['response_scale'],
                    "model_response": answer.strip(),
                    "status": "success"
                })
                print(f"  ✓ Response received ({len(answer.strip())} chars)\n")
            else:
                results.append({
                    "question_id": item['question_id'],
                    "question_text": item['question_text'],
                    "variable": item['variable'],
                    "option_text": item['option_text'],
                    "response_scale": item['response_scale'],
                    "model_response": None,
                    "status": "failed"
                })
                print(f"  ✗ Failed to get answer.\n")
            
            # Small delay between requests to avoid overwhelming the server
            time.sleep(0.2)

        print("=" * 80)
        print(f"Completed: {sum(1 for r in results if r['status'] == 'success')}/{len(results)} items answered")
        print("=" * 80)
        
        
        # Create a DataFrame for easy parsing
        results_df = pd.DataFrame(results)

        print("\nResults Summary:")
        print(results_df.head())

        # Save to CSV for easy parsing
        results_df.to_csv("./Surveys_responses/llm_survey_" + llm_name + "_responses_" + language + ".csv", index=False)
        print("Results saved to: llm_survey_" + llm_name + "_responses_" + language + ".csv")

        # Save full results to JSON
        with open("./Surveys_responses/llm_survey_" + llm_name + "_responses_" + language + ".json", "w") as f:
            json.dump(results, f, indent=2)
        print("Full results also saved to: llm_survey_" + llm_name + "_responses_" + language + ".json")

Loaded CSV with 229 rows
Columns: ['Question Number', 'Question Text', 'Response_Scale', 'Response_Options', 'Card_Number', 'Variable_Name']

Prepared 229 query items from the new CSV (each with variable + option + scale):

  1. [Q1] Please say, for each of the following, how important it is in your life.
     Variable: v1 | Option: Work | Scale: 1 (very important) - 2 (quite important) - 3 (not important) - 4 (not at all important) - 8 (Don't Know, DK) - 9 (Not Answer, NA)
  2. [Q1] Please say, for each of the following, how important it is in your life.
     Variable: v2 | Option: Family | Scale: 1 (very important) - 2 (quite important) - 3 (not important) - 4 (not at all important) - 8 (Don't Know, DK) - 9 (Not Answer, NA)
  3. [Q1] Please say, for each of the following, how important it is in your life.
     Variable: v3 | Option: Friends and acquaintances | Scale: 1 (very important) - 2 (quite important) - 3 (not important) - 4 (not at all important) - 8 (Don't Know, DK) - 9 (Not 