# Welcome to the GDIT Hackathon!

We're thrilled to have you join us for today's event! This hackathon is designed to give you hands-on experience with the SeekrFlow platform, let you explore practical AI-driven solutions, and help you collaborate creatively on real-world challenges.



### What You'll Learn

Throughout this hackathon, you'll be introduced to:

* **SeekrFlow Basics:** How to interact with AI agents to answer business questions.
* **Agentic Workflows:** Leveraging pre-built agents to solve specific tasks.
* **Vector Search and Retrieval:** Finding relevant information quickly and effectively.
* **Understanding Predictive Insights:** Interpreting outputs from predictive models provided to you.
* **Practical Application:** Applying these tools directly to real-world GDIT challenges.



### How to Use This Notebook

This notebook is your interactive guide for the day. Here's how you can make the most of it:

* **Follow Step-by-Step:** Each section is clearly labeled, guiding you from introduction to practical implementation.
* **Interactive Exercises:** You'll be given clear prompts where you can enter your own inputs or run provided examples.
* **Understand the Concepts:** Explanations are provided to help you understand what's happening behind the scenes.
* **Reference Material:** Quick reference sections are included to help you recall key concepts and code snippets throughout the hackathon.



### Goals and Outcomes

By the end of the hackathon, you'll be able to:

* Confidently use SeekrFlow agents to answer business-critical questions.
* Understand how AI-driven solutions can automate and streamline your workflow.
* Demonstrate your team's innovative solution to the provided challenge.
* Engage with your peers and exchange insights and ideas.



### Let's Get Started!

Dive into the notebook, ask questions, collaborate, and most importantly, enjoy the experience. We're excited to see the innovative solutions your team develops today!

Happy hacking!


In [None]:
from seekrai import SeekrFlow
import os, getpass
os.environ["SEEKR_API_KEY"] = getpass.getpass("Enter API key:")

# Initialize client
client = SeekrFlow(api_key=os.environ["SEEKR_API_KEY"])
BASE_URL = "https://flow.seekr.com/v1"

# Vector Databases & Embeddings: What's Happening Here

Before we load any data, we need a place to store our "embeddings": dense numerical representations of each record that capture its meaning and relationships. That's exactly what a **vector database** is for.

---

### 1. What Is an Embedding?  
- An **embedding** is a fixed-length numeric vector (e.g. 768 floats) generated by a model (like Mistral).  
- It encodes the semantic essence of a piece of text, in our case a chunk of your challenge data (for example, part of an asylum-interview record).  
- Similar records produce embeddings that sit close together in high-dimensional space.
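
To make "close together" concrete, here is a minimal sketch that compares toy vectors with cosine similarity. The three-dimensional vectors are made-up values, not real model output, and cosine similarity is only one common way to measure closeness; the metric SeekrFlow uses internally is not stated here.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (illustrative values only).
record_a = [0.9, 0.1, 0.0]
record_b = [0.8, 0.2, 0.1]  # points in nearly the same direction as record_a
record_c = [0.0, 0.1, 0.9]  # points in a very different direction

print(cosine_similarity(record_a, record_b))  # close to 1: similar
print(cosine_similarity(record_a, record_c))  # close to 0: dissimilar
```

Scores near 1 mean two records point in almost the same direction, which is what "similar meaning" looks like numerically.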


### 2. Why Use a Vector Database?  
- We want to ask: "Which past record is most similar to my new input?"  
- A vector database indexes all embeddings and lets us do **nearest-neighbor search** quickly.  
- Behind the scenes, it's optimized for large-scale similarity queries.
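
As a rough illustration of nearest-neighbor search, the sketch below scans a tiny in-memory index and returns the best matches. It is a brute-force stand-in: a real vector database uses specialized index structures rather than a full scan, and the record IDs, vectors, and the `top_k` helper are all hypothetical.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query, index, k=2):
    """Return the k (record_id, score) pairs most similar to the query vector."""
    scored = ((record_id, cosine_similarity(query, vec)) for record_id, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Hypothetical index: record ID -> toy embedding.
index = {
    "rec-001": [0.9, 0.1, 0.0],
    "rec-002": [0.1, 0.9, 0.0],
    "rec-003": [0.7, 0.3, 0.1],
}

matches = top_k([1.0, 0.0, 0.0], index, k=2)
print([record_id for record_id, _ in matches])
```

The scan costs one similarity computation per stored record, which is why dedicated indexes matter once the collection grows.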


In [None]:
from seekrai import Client

# Create vector database
vector_db = client.vector_database.create(
    name="refugee",
    model="intfloat/e5-mistral-7b-instruct",
    description="GDIT Hackathon 2025 - Refugee Asylum",
)
db_id = vector_db.id
print(f"✅ Created Vector DB '{vector_db.name}' (ID: {db_id})")

# Locating & Uploading Your Challenge Data

Each hackathon track comes with its own set of pre-generated data. Before you upload them into SeekrFlow, you need to point the script at the correct directory.

---

### 1. Identify Your Challenge Folder

| Use Case                                | Directory                           |
|-----------------------------------------|-------------------------------------|
| **Medicare/Medicaid Work Requirements** | `medicare/medicare_md`               |
| **Asylum-Interview Similarity**         | `refugee/refugee_md`      |
| **USPS Shipping Fraud Investigation**   | `usps/`          |
| **Water Quality Sensor Alerts**         | `water/`          |
| **Commander's Intent/OPORDER**          | `defense/`          |

*(If you renamed or moved the folders, adjust the path accordingly.)*
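
Before uploading, it can help to confirm that the folder exists and actually contains Markdown files. The sketch below is one way to do that; the directory strings come from the table above, while the short track keys and the `list_markdown_files` helper are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Directories copied from the table above; adjust them if you moved the folders.
# The short keys on the left are hypothetical labels, not SeekrFlow identifiers.
TRACK_DIRS = {
    "medicare": "medicare/medicare_md",
    "refugee": "refugee/refugee_md",
    "usps": "usps/",
    "water": "water/",
    "defense": "defense/",
}

def list_markdown_files(base_dir):
    """Return all .md files under base_dir, or raise if the folder is missing."""
    root = Path(base_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Challenge folder not found: {root}")
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".md")

base_dir = TRACK_DIRS["refugee"]
print(base_dir)
```

Calling `list_markdown_files(base_dir)` before the upload loop surfaces a wrong path immediately instead of silently uploading nothing.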

---

### 2. Update the `base_dir` Variable

In the code cell below, replace the default `base_dir` value with the directory for **your** use case. For example, if you're working on the Asylum-Interview Similarity track:

```python
# Point to your track's Markdown folder
base_dir = "refugee/refugee_md"
```

In [None]:
import os

# Path to the directory containing your data
base_dir = "refugee/refugee_md/"

file_ids = []
for root, dirs, files in os.walk(base_dir):
    for filename in files:
        if filename.lower().endswith(".md"):
            file_path = os.path.join(root, filename)
            resp = client.files.upload(file_path, purpose="alignment")
            file_ids.append(resp.id)
            print(f"Uploaded {filename} → {resp.id}")

In [None]:
print(file_ids)

# What's Happening in the Ingestion Step

Once you've uploaded your Markdown files into SeekrFlow, the next step is to **ingest** them into your vector database. This is where your raw text is broken into "chunks," embedded, and stored for fast similarity search. The following parameters are configurable.

### 1. `token_count=512` (Chunk Size)
Determines the approximate number of tokens (words/subwords) per chunk.

**Impacts:**  
- **Larger chunk** (e.g. 1024+):  
  - Pros: more context in each retrieval  
  - Cons: slower embedding calls, higher memory usage, fewer chunks overall  
- **Smaller chunk** (e.g. 256):  
  - Pros: faster ingestion, more fine-grained retrieval  
  - Cons: risk of losing cross-sentence context  

**Best practice:**  
Start with **512–768** tokens per chunk for general documents. Tune up or down based on average document length and retrieval quality.



### 2. `overlap_tokens=50` (Chunk Overlap)
Specifies how many tokens from the end of one chunk are repeated at the start of the next chunk.

**Why overlap?**  
- Prevents important context from falling in the "gap" between adjacent chunks  
- Improves retrieval accuracy for queries that span chunk boundaries  

**Trade-offs:**  
- **More overlap** (e.g. 100–150):  
  - Better context continuity  
  - More storage & compute (duplicates between chunks)  
- **Less overlap** (e.g. 20–30):  
  - Leaner storage, faster ingestion  
  - Higher risk of missing cross-boundary associations  

**Rule of thumb:**  
Overlap of **10%–20%** of your `token_count` is a good starting point (e.g. 50–100 tokens for 512).
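
To see how `token_count` and `overlap_tokens` interact, here is a minimal sliding-window sketch. It is an illustration only: whitespace-separated words stand in for real model tokens, and SeekrFlow's actual chunking logic (selected by the ingestion `method`) may differ.

```python
def chunk_tokens(tokens, token_count=512, overlap_tokens=50):
    """Split a token list into windows of token_count that share overlap_tokens."""
    if not 0 <= overlap_tokens < token_count:
        raise ValueError("overlap_tokens must be >= 0 and smaller than token_count")
    step = token_count - overlap_tokens  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + token_count])
        if start + token_count >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks

# A fake 1,200-"token" document; words approximate tokens here.
words = [f"w{i}" for i in range(1200)]
chunks = chunk_tokens(words, token_count=512, overlap_tokens=50)
print(len(chunks), [len(c) for c in chunks])  # 3 [512, 512, 276]
```

Each window advances by `token_count - overlap_tokens` tokens, so raising the overlap increases the number of chunks (and the storage and compute they need) for the same document.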



In [None]:
# Kick off ingestion with the collected IDs
ingest_job = client.vector_database.create_ingestion_job(
    database_id=db_id,
    files=file_ids,
    method="best",
    token_count=512,
    overlap_tokens=50,
)
print(f"⏳ Ingestion job submitted: {ingest_job.id}")

In [None]:
# Monitor ingestion job
import time

timeout = 300  # Give up after 5 minutes
interval = 5   # Check every 5 seconds
deadline = time.monotonic() + timeout

while True:
    job_status = client.vector_database.retrieve_ingestion_job(db_id, ingest_job.id)
    status = job_status.status
    print(f"Ingestion job status: {status}")

    if status == "completed":
        print("Vector database ready!")
        break
    elif status == "failed":
        error = getattr(job_status, "error_message", "Unknown error")
        print(f"Ingestion job failed: {error}")
        break
    elif time.monotonic() >= deadline:
        # Stop polling instead of looping forever if the job never finishes
        print(f"Timed out after {timeout} seconds; last status: {status}")
        break

    time.sleep(interval)

# General Guide: Building Effective SeekrFlow Agents

Use this as a blueprint for creating any SeekrFlow agent, regardless of challenge, by focusing on tool setup, prompt design, model choice, and iteration best practices.



## 1. Define Your Tools Clearly  
Every agent needs "tools" to reach outside the LLM's knowledge.  
- **Name & Description**  
  Give each tool a human-readable label and a concise description  
  (e.g. "VectorSearch: look up similar records in the water_quality index").  
- **Configuration Parameters**  
  - For a vector search tool: the vector-DB ID, chunking or index settings, etc.  
  - For a web search tool: target domains or recency filters.  
- **Best Practices**  
  - One tool → one responsibility.  
  - Keep descriptions < 12 words.  
  - Use consistent naming conventions across agents.
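
The best practices above are easy to check mechanically while you plan your tools. The sketch below assumes a hypothetical `ToolSpec` planning record (it is not a SeekrFlow SDK type) and flags descriptions of 12 or more words and duplicate tool names.

```python
from dataclasses import dataclass

MAX_DESCRIPTION_WORDS = 12  # from the "Keep descriptions < 12 words" guideline

@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical planning record for a tool; not a SeekrFlow SDK type."""
    name: str
    description: str

def check_tool_specs(specs):
    """Return a list of human-readable problems found in the tool specs."""
    problems = []
    seen = set()
    for spec in specs:
        if len(spec.description.split()) >= MAX_DESCRIPTION_WORDS:
            problems.append(f"{spec.name}: description has {MAX_DESCRIPTION_WORDS} or more words")
        if spec.name in seen:
            problems.append(f"{spec.name}: duplicate tool name")
        seen.add(spec.name)
    return problems

specs = [
    ToolSpec("VectorSearch", "Look up similar records in the vector index"),
    ToolSpec("WebSearch", "Find recent public web pages"),
]
print(check_tool_specs(specs))
```

An empty list means every planned tool passes both checks.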



## 2. Craft Step-by-Step Agent Instructions  
Your prompt is the single most powerful lever for steering behavior.  
- **Break Down the Workflow**  
  1. **Receive** the user input in its expected format (JSON, pipe string, etc.).  
  2. **Call** the right tool with that raw input.  
  3. **Extract** the needed field from the tool result (e.g. `metadata.event_type`).  
  4. **Generate** the final user-facing output (report, plan, recommendation).  
- **Constrain Output**  
  - Specify exactly what to return (e.g. "Output only the maintenance plan.").  
  - Discourage extra commentary or restating the prompt.  
- **Tone & Style**  
  - Define audience (business stakeholders vs. technical users).  
  - Use bullet lists or short paragraphs for readability.  
- **Troubleshooting Tips**  
  - If the agent hallucinates or misses a tool call, add or refine explicit tool-calling instructions.  
  - Numbered steps help enforce the order of operations.
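
One way to keep instructions in this shape is to assemble them from parts. The sketch below is a minimal example: `build_instructions` is a hypothetical helper, and the role line, steps, and output rule are placeholder wording to replace with your own.

```python
def build_instructions(role, steps, output_rule):
    """Assemble agent instructions: role line, numbered steps, output constraint."""
    numbered = [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join([role, "", "Follow these steps in order:", *numbered, "", output_rule])

instructions = build_instructions(
    role="You are an assistant for business stakeholders.",
    steps=[
        "Receive the user input in its expected format.",
        "Call the vector search tool with that raw input.",
        "Extract `metadata.event_type` from the tool result.",
        "Generate the final user-facing report.",
    ],
    output_rule="Output only the report, with no extra commentary.",
)
print(instructions)
```

The resulting string can be passed as the `instructions` argument when you create your agent, and keeping the steps in a list makes it easy to reorder or tighten them between test runs.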



## 3. Choose the Right Model  
Selecting a model impacts speed, cost, and fidelity:  
- **Instruction-tuned vs. Base**  
  - Instruct models excel at following detailed prompts and formatting outputs.  
- **Size Trade-offs**  
  - Smaller (< 4B) for very low-latency tasks or tight compute budgets.  
  - Mid-range (8B) for balanced speed and reasoning.  
  - Large (> 13B) for the most complex, multi-step logic.  
- **When to Switch**  
  - Increase size if the agent truncates steps or produces inconsistent outputs.  
  - Move to a smaller model if simple tasks run slowly and response latency is a concern.



## 4. Assemble & Register Your Agent  
Use the SDK or UI to bring everything together:  
1. **Initialize** your client with a secure API key.  
2. **Register** each tool with its configuration.  
3. **Compose** a `CreateAgentRequest` (or UI equivalent) including:
   - Agent **name**  
   - **instructions** text  
   - **model_id**  
   - **tools** list  
4. **Create** the agent and note its ID for interactive queries.



## 5. Validate & Iterate  
Before tackling full use cases, run quick sanity checks:  
- **Smoke Test**  
  - Send a representative sample input; confirm the correct tool is called.  
  - Verify the agent returns only the desired output format.  
- **Refine Prompt**  
  - Tighten or reorder steps if the agent wanders off.  
  - Add output examples or templates for clarity (few-shot style).  
- **Log & Compare**  
  - Keep versions of your instructions and record outcomes.  
  - Share change summaries with your team to converge on best phrasing.
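
For the "Log & Compare" habit, even a tiny in-memory log is enough during a hackathon. The sketch below is one possible shape; `PromptLog` is a hypothetical helper, and the instruction texts and outcomes are placeholder examples.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class PromptLog:
    """In-memory log of instruction versions and what happened when each was tested."""
    entries: list = field(default_factory=list)

    def record(self, instructions, outcome):
        # A short content hash makes it easy to tell whether two versions differ.
        digest = hashlib.sha256(instructions.encode("utf-8")).hexdigest()[:8]
        entry = {"version": len(self.entries) + 1, "hash": digest, "outcome": outcome}
        self.entries.append(entry)
        return entry

log = PromptLog()
log.record("Output only the maintenance plan.", "added a summary paragraph")
log.record("Output only the maintenance plan. No preamble.", "plan only")
for entry in log.entries:
    print(entry["version"], entry["hash"], entry["outcome"])
```

Sharing the printed table with your team gives everyone the same view of which wording produced which behavior.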



### 🚀 Final Tips  
- **Clarity over cleverness:** explicit > implicit.  
- **Modular design:** separate tools, prompts, and data so you can swap pieces easily.  
- **Collaborate early:** peer-review prompts and tool configs before hackathon demos.  

Follow this framework on any challenge (Medicare eligibility, shipping fraud, water quality) and you'll build agents that are reliable, transparent, and ready for real-world impact.  


<div class="card">

## 🛠️ **Understanding Your Agent's Tools**

Your SeekrFlow agent will use two specialized tools, **FileSearch** and **WebSearch**, to access and retrieve external information. Here's how each tool works in practice, so you can set proper expectations and craft effective prompts.



### 📂 **FileSearch Tool**

**Purpose:**  
Searches a pre-loaded vector database containing structured documents and data provided specifically for the hackathon.

**What Happens Under the Hood:**  
- **Similarity Search:** Your query is embedded into a numeric format, and the tool finds the most similar records in the database.
- **Returned Results:** Typically include:
  - Relevant text snippets or records from the stored documents.
  - Associated metadata (e.g., event types, IDs, timestamps).

**Best Practices:**  
- Queries should clearly reference identifiable fields or terms from your data.
- Ideal for specific, structured lookups, such as member profiles, historical events, sensor readings, or labeled classifications.



### 🌐 **WebSearch Tool**

**Purpose:**  
Performs a real-time search using BraveSearch to retrieve the latest, publicly available web information.

**How WebSearch Operates:**  
- Sends your query to BraveSearch.
- Retrieves a set of metadata-rich results, including **page title**, **snippet (~120 characters)**, and **URL**.
- **Note:** No full page content is returned at this step.
  
**Best Practices:**  
- Ideal for up-to-date facts, regulations, guidelines, or public records.
- Craft specific queries to get precise, relevant snippets.
- Be aware of the snippet limitations; use concise, clear wording to improve relevance.



### 📌 **Key Takeaways for Using Tools**

- **FileSearch** is great for targeted, structured lookups using your provided datasets.
- **WebSearch** excels at finding current external information quickly but returns limited initial context (titles, URLs, short snippets).

By understanding these behaviors, you can craft prompts that help your agent retrieve exactly the insights you need for the challenge.

</div>



In [None]:
from seekrai import SeekrFlow
from seekrai.types import CreateAgentRequest, FileSearch, FileSearchEnv, WebSearch, WebSearchEnv

# Vector DB that holds historical asylum-seeker interview chunks
vector_search_tool = FileSearch(
    tool_env=FileSearchEnv(
        file_search_index=db_id,
        document_tool_desc="Historical database of asylum-seeker interviews"
    )
)

# Optional external fact-checking (e.g., news / NGO / gov sites from the past week)
web_search_tool = WebSearch(
    tool_env=WebSearchEnv()
)

agent_req = CreateAgentRequest(
    name="test",
    instructions="""

You are **test**, an expert assistant that can:


""".strip(),
    model_id="meta-llama/Llama-3.1-8B-Instruct",
    tools=[vector_search_tool, web_search_tool]
)

agent = client.agents.create(agent_req)
print("Created agent:", agent.id)


In [None]:
# Retrieve the agent once and check its status
agent_info = client.agents.retrieve(agent_id=agent.id)
print(agent_info.status)
print(agent.id)

## 🧵 Understanding Threads in SeekrFlow

Even though you'll be interacting with agents via the Sandbox UI during the hackathon, it's helpful to know how the underlying "thread" concept works in code. A **thread** is simply the container for a single conversation with an agent, much like a chat room or ticket, and it keeps all your messages, tool calls, and responses neatly grouped.



### 1. What Is a Thread?  
- A thread represents one discrete dialogue or task instance.  
- It stores all user messages, assistant messages, and intermediate tool outputs.  
- You can have multiple concurrent threads against the same agent (e.g. testing different sensor inputs in parallel).



### 2. Core Thread Operations

| Step                           | Purpose                                                                 |
|--------------------------------|-------------------------------------------------------------------------|
| **Create a thread**            | `client.agents.threads.create()` → returns a new `thread.id`.           |
| **Post a user message**        | `client.agents.threads.create_message(thread_id, role="user", content=...)` |
| **Launch the agent run**       | `client.agents.runs.run(agent_id, thread_id, stream=False)`            |
| **Fetch assistant replies**    | `client.agents.threads.list_messages(thread_id)` and filter `role=="assistant"` |

### 3. Why Threads Matter

- **Stateless Agents**  
  Each run is isolated to its thread; you won't accidentally mix up inputs or outputs from different tests.

- **Audit Trail**  
  All messages, tool calls, and responses are preserved, which is useful for debugging or compliance.

- **Parallel Testing**  
  Spin up multiple threads to compare different prompts, tool configurations, or input records side-by-side.
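
The core operations can be wrapped in one helper so that each test question gets its own thread. This is a sketch built only from the calls listed in the table above; `ask_agent` is a hypothetical name, and it assumes the run has finished by the time messages are listed, so an empty result may simply mean you need to fetch again a few seconds later.

```python
def ask_agent(client, agent_id, question):
    """Run one question through a fresh thread and return the assistant replies."""
    # 1. Create an isolated thread for this question.
    thread = client.agents.threads.create()
    # 2. Post the user message into that thread.
    client.agents.threads.create_message(
        thread_id=thread.id, role="user", content=question
    )
    # 3. Launch the agent run without streaming.
    client.agents.runs.run(agent_id=agent_id, thread_id=thread.id, stream=False)
    # 4. Collect only the assistant's messages from the thread.
    messages = client.agents.threads.list_messages(thread.id)
    return [m.content for m in messages if m.role == "assistant"]
```

Calling `ask_agent(client, agent.id, "...")` once per prompt variant keeps every test in its own thread, which makes side-by-side comparison straightforward.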



### 4. Hackathon Takeaway

In the UI you'll see a familiar chat interface; under the covers, SeekrFlow is managing these threads via API. Knowing how threads work will help you:

- Interpret logs and debug unexpected responses.  
- Automate batch testing in code if you choose to extend beyond the UI.  
- Understand how context is preserved across multi-step workflows.
- **Update the `content` field** when posting your user message to test different inputs against your agent and observe how it responds.

Now you're armed with the "thread" concept, so let's keep iterating on those agents!  


In [None]:
from seekrai import SeekrFlow

# Spin up an empty thread
thread = client.agents.threads.create()
print("New thread:", thread.id)

In [None]:
# 5a. Post your user question
user_msg = client.agents.threads.create_message(
    thread_id=thread.id,
    role="user",
    content="I am interested in the Zone-1 water quality. How many contamination events do we have?",
)
print("✉️ User message posted:", user_msg.id)

In [None]:
# 5b. Kick off the run (no streaming)
run_resp = client.agents.runs.run(
    agent_id=agent.id,
    thread_id=thread.id,
    stream=False
)
print("▶️ Run started, run_id =", run_resp.run_id)

In [None]:
# 5c. Fetch the assistant's reply once it's done
msgs = client.agents.threads.list_messages(thread.id)
assistant_replies = [m for m in msgs if m.role == "assistant"]
if assistant_replies:
    print("🤖 Agent reply:\n", assistant_replies[-1].content)
else:
    print("⚠️ No reply found yet; try polling again in a few seconds.")

# 🚀 Launch & Interact with Your Agent in the SeekrFlow UI

1. Open your browser and go to **https://apps.seekr.com**  
2. Enter the credentials provided by your administrator.  
3. From the top-left menu, choose **SeekrFlow**.  
4. In the SeekrFlow home screen, click on **Agents Sandbox (Beta)**.  
5. In the "Select Agent" dropdown, find and choose the name of the agent you just created (e.g. "v5 Water Quality" or your custom agent name).  
6. Start typing your test prompts in the chat window and hit **Send** to see your agent in action!  

# Happy hacking, and may your agents be ever responsive!  
