In [4]:
%pip install requests


Note: you may need to restart the kernel to use updated packages.
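
Local models often surround the requested JSON with extra prose or whitespace, which makes a strict `json.loads` call fail and the sample get discarded. The following is a minimal sketch of a more tolerant parser, assuming the object of interest runs from the first `{` to the last `}` in the response. The helper name `extract_json_object` is hypothetical and not part of the original notebook.

```python
import json


def extract_json_object(text):
    """Return the JSON object contained in text, ignoring surrounding prose."""
    try:
        # Fast path: the whole response is already valid JSON
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Fallback (assumption): the object spans the first "{" to the last "}"
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("no JSON object found in model output")
        parsed = json.loads(text[start:end + 1])
    if not isinstance(parsed, dict):
        raise ValueError("model output is not a JSON object")
    return parsed
```

Calling `extract_json_object(res.json()["response"])` in place of a bare `json.loads` would be one way to use it. Any failure surfaces as a `ValueError`, since `json.JSONDecodeError` is a subclass of it.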


In [None]:
import requests
import json
import time

# Endpoint of the Ollama server
OLLAMA_URL = "http://192.168.8.18:11434/api/generate"

# Your prompt template
prompt = """
You are generating realistic, emotionally-rich transcripts of calls made to **116**, the national child protection helpline in East Africa.

---

📌 Context:
Callers use 116 to report serious child protection concerns like:
- physical abuse
- emotional abuse
- neglect
- sexual violence
- child marriage
- child labor
- trafficking
- abandonment
- online exploitation
- harmful cultural practices
- exposure to violence
- denial of education
- forced domestic work

A trained counselor responds to each call with empathy, asks questions, assesses urgency, and advises next steps.

---

📞 Example Transcript (long-form, single paragraph):

"Hello? Am I speaking to 116? Yes, this is 116. Thank you for calling. Who am I speaking with? My name is Samuel, I'm calling from Bungoma. I'm worried about my neighbor's daughter. She's only 10, and they leave her alone at home every night while they go drinking. Sometimes she cries until late. That sounds very distressing. Has she ever told you what happens? Yes, she said she's scared and that sometimes strangers knock on the door. That's very concerning, Samuel. Thank you for speaking up. Do her parents come back in the morning? Sometimes they don't show up till the next day. She's hungry and hasn't gone to school in weeks. Okay. That's a clear case of neglect. We'll involve the area Children's Officer and also get in touch with a local child protection volunteer. Do you think she'd be safe staying with you until someone comes? Yes, we already help her sometimes. Alright. We'll act on this immediately. Please stay close by and we'll follow up shortly."

---

🧠 Your Task:

Now generate a new, realistic transcript of a child protection call. Follow these instructions:

- Write in **one long, natural-sounding paragraph**
- Use a **conversational, spoken tone**
- Include empathy, clarifying questions, and referral steps
- Vary the names, locations, phrasing, and issue types
- Do **not** use speaker labels, line breaks, or numbered steps
- Avoid unrelated numbers, markdown symbols, or formatting

---

Return the transcript and its details as **only** a valid JSON object with the following fields:

{
  "transcript": "<The full conversation as one paragraph>",
  "summary": "<A short narrative summary of what happened in the call. Use the structure: '<Reporter name> from <location> called to report a <case category> case involving <victim>. The incident involves <perpetrator>. The case was categorized as <priority> and referred to <referral>. Intervention: <intervention>.'>",
  "name": "<First name of caller>",
  "location": "<Town, county, or region>",
  "issue": "<The main issue reported (e.g., child marriage, neglect)>",
  "victim": "<Who is affected (e.g., 12-year-old girl, caller's cousin)>",
  "perpetrator": "<Who is responsible (e.g., stepfather, aunt)>",
  "referral": "<What support the counselor recommended (e.g., Children's Officer, police)>",
  "category": "<Case type (e.g., physical abuse, child labor)>",
  "priority": "<low | medium | high, based on urgency and risk>",
  "intervention": "<What action was taken or recommended>"
}

Return only the JSON object: no explanations, no markdown, no extra text.
"""

# Output file
output_file = "mistral_helpline_data_ollama2.jsonl"
num_samples = 10000

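# Fields every generated record must contain
required_keys = [
    "transcript", "summary", "name", "location", "issue",
    "victim", "perpetrator", "referral", "category", "priority", "intervention"
]
saved = 0  # number of records actually written

# Write UTF-8 explicitly so the file does not depend on the platform default
with open(output_file, "w", encoding="utf-8") as f:
    for i in range(num_samples):
        print(f"📞 Generating record {i+1}/{num_samples}...")

        try:
            payload = {
                "model": "mistral",
                "prompt": prompt,
                "stream": False
            }

            # The timeout (an assumed value) keeps one stalled request from hanging the run
            res = requests.post(OLLAMA_URL, json=payload, timeout=300)
            res.raise_for_status()

            output_text = res.json()["response"]
            print(output_text)

            # Parse the model output and keep it only if every field is present
            data = json.loads(output_text)

            if isinstance(data, dict) and all(k in data for k in required_keys):
                f.write(json.dumps(data) + "\n")
                saved += 1
            else:
                print("⚠️ Missing fields in sample", i+1)
        except Exception as e:
            # Covers network errors, HTTP errors, and invalid JSON from the model
            print("❌ Error:", str(e))

        time.sleep(0.2)

print(f"\n✅ Done! Saved {saved} of {num_samples} records to: {output_file}")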
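
The key-presence check in the loop above accepts records whose fields are empty or whose `priority` falls outside the three levels the prompt asks for. A stricter check could look like the sketch below. The names `validate_record`, `REQUIRED_KEYS`, and `ALLOWED_PRIORITIES` are hypothetical, and the rule that every field must be a non-empty string is an assumption drawn from the prompt's JSON template.

```python
from typing import List

REQUIRED_KEYS = (
    "transcript", "summary", "name", "location", "issue",
    "victim", "perpetrator", "referral", "category", "priority", "intervention",
)
ALLOWED_PRIORITIES = {"low", "medium", "high"}


def validate_record(record: dict) -> List[str]:
    """Return a list of problems found in a record; an empty list means it passed."""
    problems = []
    for key in REQUIRED_KEYS:
        value = record.get(key)
        # Assumption: every field should be a non-empty string
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {key}")
    priority = record.get("priority")
    # Only check the priority level when a non-empty string was supplied
    if isinstance(priority, str) and priority.strip():
        if priority.strip().lower() not in ALLOWED_PRIORITIES:
            problems.append(f"unexpected priority: {priority!r}")
    return problems
```

A record would then be written only when `validate_record(data)` returns an empty list, and the returned messages can be printed to see why a sample was rejected.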


📞 Generating record 1/1000...
 {
  "transcript": "Hello? Yes, this is 116. A man from Mwanza called, deeply concerned about his niece. She's barely eight and has been forced to work in a brick factory since her parents passed away. He heard that her uncle, who lives nearby, might be involved. This sounds tragic. The caller hasn't seen the girl for weeks but thinks she could be hidden somewhere around their old family home. That's a clear case of child labor. We'll involve the local Children's Officer and reach out to a child welfare organization. If possible, he suggested searching for her there. We'll act on this immediately and get back to him soon.",
  "summary": "'John from Mwanza called to report a child labor case involving an 8-year-old girl, his niece. The incident involves her uncle. The case was categorized as high and referred to the local Children's Officer and a child welfare organization. Intervention: Search for the victim if possible.'",
  "name": "John",
  "loca