# 4. Skill Setup - Customizing and Developing Kernel Skills

This section focuses on extending your RAG application by customizing the kernel skill and integrating it with Studio for development tracing. You'll learn how to customize the Q&A skill template, modify prompts to generate structured responses, and use development tracing to debug and improve your skills.

## Prerequisites

- **Document Collection**: You have completed the previous section on document ingestion and have at least one indexed document collection


## Skill Components

The skill development process involves several key elements:

- **Kernel Integration**: Connects custom code to the Pharia kernel
- **Studio Tracing**: Provides debugging and performance insights during development
- **Prompt Customization**: Controls how responses are structured and formatted
- **Development Workflow**: Process for testing and refining your skills

## What You'll Learn

In this section, you'll learn how to:

1. Understand the Q&A Kernel Skill template structure
2. Explore document retrieval options in the intelligence layer
3. Customize prompts to generate structured responses
4. Set up development tracing with Studio for debugging and optimization

<br>
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
<br>

### 1. Understanding the Q&A Kernel Skill Template

The Q&A skill template provides a foundational structure for implementing Retrieval-Augmented Generation (RAG). Let's examine the core components of the template to understand how it works.

The basic structure of a Q&A kernel skill includes:
- Input and output models
- Document retrieval using the Document Index
- Context assembly from retrieved documents
- LLM prompting for answer generation

Here's the standard Q&A skill template that was generated when you created your application:

```python
from pharia_skill import ChatParams, Csi, IndexPath, Message, skill
from pydantic import BaseModel

NAMESPACE = "Studio"  # Document Index namespace, won't change
COLLECTION = "papers"
INDEX = "asym-64"


class Input(BaseModel):
    question: str
    namespace: str = NAMESPACE
    collection: str = COLLECTION
    index: str = INDEX


class Output(BaseModel):
    answer: str | None
    sources: list[str] | None  # Sources used to generate the answer


@skill
def custom_rag(csi: Csi, input: Input) -> Output:
    index = IndexPath(
        namespace=input.namespace,
        collection=input.collection,
        index=input.index,
    )

    # Retrieve up to 3 documents with a minimum relevance score of 0.5.
    # Both Output fields are required, so pass sources explicitly when nothing is found.
    if not (documents := csi.search(index, input.question, 3, 0.5)):
        return Output(answer=None, sources=None)

    # Extract the unique document names from the documents list
    document_names = list(set([d.document_path.name for d in documents]))

    context = "\n".join([d.content for d in documents])
    content = f"""Using the provided context documents below, answer the following question accurately and comprehensively. If the information is directly available in the context documents, cite it clearly. If not, use your knowledge to fill in the gaps while ensuring that the response is consistent with the given information. Do not fabricate facts or make assumptions beyond what the context or your knowledge base provides. Ensure that the response is structured, concise, and tailored to the specific question being asked.

Input: {context}

Question: {input.question}
"""
    message = Message.user(content)
    params = ChatParams(max_tokens=512)
    response = csi.chat("llama-3.1-8b-instruct", [message], params)
    return Output(answer=response.message.content, sources=document_names)
```

Let's break down the key components:

1. **Configuration Constants**: The template defines default values for `NAMESPACE`, `COLLECTION`, and `INDEX`, which specify where to search for documents.

2. **Input/Output Models**: The skill uses Pydantic models to define the expected input (a question and optional search parameters) and output (the answer and the names of the source documents).

3. **Document Retrieval**: The `csi.search()` method retrieves relevant documents based on the user's question, limiting to 3 results with a minimum relevance score of 0.5.

4. **Context Assembly**: Retrieved documents are combined into a single context string that will be sent to the LLM.

5. **Prompt Construction**: The template includes a basic prompt that instructs the LLM how to use the context to answer the question.

6. **Answer Generation**: The `csi.chat()` method sends the prompt to the LLM and returns the generated response.
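
Steps 3 to 5 can be tried out without a running kernel. The sketch below is only an illustration: `Doc` is a hypothetical stand-in for a search result (the real objects returned by `csi.search()` expose `document_path.name` and `content`), and `assemble` is a helper name chosen for this example. It also shows one possible variation: `dict.fromkeys` removes duplicate names while keeping retrieval order, whereas `list(set(...))` in the template does not guarantee any order.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for one search result (not the pharia_skill type)."""

    name: str
    content: str


def assemble(documents: list[Doc], question: str) -> tuple[str, list[str]]:
    """Build the prompt text and an ordered, de-duplicated source list."""
    # dict.fromkeys keeps the first occurrence of each name, in retrieval order.
    sources = list(dict.fromkeys(d.name for d in documents))
    # Join the passages into one context string, as the template does.
    context = "\n".join(d.content for d in documents)
    prompt = f"Input: {context}\n\nQuestion: {question}\n"
    return prompt, sources


docs = [
    Doc("paper-a", "Encoders map tokens to vectors."),
    Doc("paper-b", "Decoders generate tokens."),
    Doc("paper-a", "Encoders are stacks of layers."),
]
prompt, sources = assemble(docs, "What is an encoder?")
print(sources)
```

Two passages come from the same document here, so the source list contains that document only once.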

#### Updating your skill (rebuild & publish)
To make sure that your changes to the skill code take effect, rebuild and republish the skill after every change. Use the commands from the previous section from within your `<your-application>/skill` folder:

```bash
# Load the environment variables from the .env file
set -a && source ../.env

# Build and publish the skill
uv run pharia-skill build qa
uv run pharia-skill publish qa
```

### 2. Change the Collection the Skill is Referring to
Before we explore advanced features, let's update our skill to use the document collection we created in the earlier Data Setup section. By default, the Q&A template is configured to use a collection called "papers", but we want to use our own collection.
The collection name is defined as a constant at the top of the skill file:

```python
NAMESPACE = "Studio" #Document Index Namespace, won't change
COLLECTION = "papers"
INDEX = "asym-64"
```

We need to update the `COLLECTION` constant to point to our collection created during the Document Ingestion phase:

```python
NAMESPACE = "Studio" #Document Index Namespace, won't change
COLLECTION = "pharia-tutorial-full"
INDEX = "asym-64"
```

This simple change redirects your skill to use the documents that you ingested earlier instead of the default example collection. Now when users ask questions, the skill will search within your custom document collection.
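
Because `Input` declares `collection` with a default value, the constant only sets the fallback; a caller can still name a different collection in an individual request. The sketch below mirrors this behavior with a plain dataclass so it runs without any dependencies. `InputSketch` is a hypothetical stand-in for the skill's Pydantic `Input` model, not part of the template.

```python
from dataclasses import dataclass

COLLECTION = "pharia-tutorial-full"  # default taken from the updated skill file


@dataclass
class InputSketch:
    """Hypothetical stand-in mirroring the fields of the skill's Input model."""

    question: str
    namespace: str = "Studio"
    collection: str = COLLECTION
    index: str = "asym-64"


# No collection given: the default constant is used.
default_request = InputSketch(question="What is an encoder?")
# Collection given explicitly: the request value wins over the constant.
override_request = InputSketch(question="What is an encoder?", collection="papers")
print(default_request.collection, override_request.collection)
```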


### 3. Document Retrieval Options in the Intelligence Layer (Work in Progress)

The Q&A template uses the `search` method, but the Intelligence Layer SDK offers other retrieval methods with different capabilities. We are still working on transferring all of the capabilities provided in the IL SDK to the kernel search implementation. For details, see the Document Index notebook in the SDK repository: https://github.com/Aleph-Alpha/intelligence-layer-sdk/blob/main/src/documentation/document_index.ipynb

Some of the alternative retrieval methods include:

- **Hybrid Search**: Combines semantic and keyword-based search for better precision
- **Metadata Filtering**: Restricts search to documents with specific metadata properties
- **Multi-Query Search**: Generates multiple search queries from a single user question
- **Document Reranking**: Re-scores retrieved documents based on relevance to the question

For most applications, the standard `search` method provides a good balance of performance and relevance. As your application becomes more sophisticated, you can explore these alternative methods to improve retrieval quality.
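
To make the idea of document reranking concrete, here is a toy sketch in plain Python. It is not the Intelligence Layer SDK's or the kernel's API: the `rerank` helper and its term-overlap score are assumptions chosen for illustration, standing in for a real reranking model.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rerank(question: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Re-order passages by the fraction of question terms they contain."""
    terms = tokenize(question)

    def score(passage: str) -> float:
        return len(terms & tokenize(passage)) / len(terms) if terms else 0.0

    # sorted() is stable, so passages with equal scores keep their retrieval order.
    return sorted(passages, key=score, reverse=True)[:top_k]


passages = [
    "The decoder generates output tokens.",
    "An encoder is a stack of network layers.",
    "What an encoder is depends on the architecture.",
]
print(rerank("What is an encoder?", passages))
```

The passage that shares the most terms with the question moves to the front, and the unrelated passage is dropped by the `top_k` cut-off.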

### 4. Customizing Prompts for Structured Responses

One of the most effective ways to improve the quality of your RAG application is to customize the prompt that is sent to the LLM. By modifying the prompt, you can control the format, style, and content of the generated responses.

Let's look at how you can modify the prompt to generate more structured responses. For example, you might want responses that:

- Include specific sections like "Summary" and "Details"
- Clearly indicate which parts are from the documents and which are from the LLM's knowledge
- Follow a consistent formatting style

Here's how you could modify the prompt in the Q&A skill template:

```python
# Example of a modified prompt for structured responses.
# It replaces the `content` assignment inside the skill function.
content = f"""Using the provided context documents below, answer the following question. Format your response with the following sections:

1. Summary: A brief 1-2 sentence answer to the question
2. Details: A comprehensive explanation with specific information from the context
3. Sources: References to the specific parts of the context you used, if applicable

If the information is not available in the context documents, clearly state this and provide a general response based on your knowledge, marked as [GENERAL KNOWLEDGE].

Input: {context}

Question: {input.question}
"""
```

To implement this change, you would replace the `content` variable in your skill function with the modified prompt above. This instructs the LLM to structure its response with specific sections, making the information more organized and easier to read.
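
Once the LLM follows the requested layout, the answer can also be split into its sections programmatically. The following is a minimal sketch, assuming the model emits the headings "Summary", "Details", and "Sources" at the start of a line, optionally numbered; `parse_sections` is a hypothetical helper, and real model output may deviate from the layout, so treat a missing section as a normal case.

```python
import re

SECTIONS = ("Summary", "Details", "Sources")


def parse_sections(answer: str) -> dict[str, str]:
    """Split an answer into the sections requested by the prompt."""
    # Match headings such as "1. Summary:" or "Summary:" at the start of a line.
    pattern = re.compile(
        r"^\s*(?:\d+\.\s*)?(" + "|".join(SECTIONS) + r")\s*:\s*",
        re.MULTILINE,
    )
    matches = list(pattern.finditer(answer))
    parsed = {}
    # Each section runs from the end of its heading to the start of the next one.
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(answer)
        parsed[current.group(1)] = answer[current.end():end].strip()
    return parsed


sample = (
    "1. Summary: An encoder maps input to a representation.\n"
    "2. Details: It is built from stacked layers.\n"
    "3. Sources: Context passage 1"
)
sections = parse_sections(sample)
print(sections["Summary"])
```

If no heading is found, the function returns an empty dictionary, which lets the caller fall back to the raw answer text.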

For even more control, you can use a system message to set persistent instructions for the LLM, and keep only the context and the question in the user message.
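
A minimal sketch of this split is shown below. The instruction wording and the `build_chat_texts` helper are assumptions made for this example. The code only builds the two texts so that it runs without the kernel; the comment at the end shows how they would be passed to `csi.chat()`, assuming `Message.system` is available in your version of `pharia_skill`.

```python
SYSTEM_INSTRUCTIONS = (
    "You answer questions using the provided context documents. "
    "Always structure the answer into the sections Summary, Details and Sources. "
    "Mark anything not found in the context as [GENERAL KNOWLEDGE]."
)


def build_chat_texts(context: str, question: str) -> tuple[str, str]:
    """Return the (system, user) texts for one chat request."""
    # The user text carries only the per-request data.
    user_text = f"Input: {context}\n\nQuestion: {question}\n"
    return SYSTEM_INSTRUCTIONS, user_text


system_text, user_text = build_chat_texts(
    "Encoders are stacks of layers.", "What is an encoder?"
)

# Inside the skill, the two texts would be wrapped into messages, for example
# (assumption: Message.system exists in your pharia_skill version):
#   messages = [Message.system(system_text), Message.user(user_text)]
#   response = csi.chat("llama-3.1-8b-instruct", messages, params)
print(user_text)
```

Keeping the formatting rules in the system text means the user text changes per request while the instructions stay identical across calls.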

Experiment with different prompt variations to find the structure that works best for your specific use case. Keep in mind that the prompt should be clear and specific about what you want the LLM to do, while leaving enough flexibility for it to generate helpful responses.

### 5. Setting Up Development Tracing with Studio

Development tracing is a powerful feature that allows you to debug and optimize your skills by sending execution data to PhariaStudio. This gives you visibility into each step of the RAG process, from document retrieval to answer generation.

#### Understanding Tracing Benefits

Tracing helps you:
- Visualize the exact documents retrieved for each question
- Evaluate the relevance of retrieved documents
- Examine how the prompt is constructed
- Measure performance metrics like retrieval and generation time
- Identify bottlenecks in your RAG pipeline

#### 5.1 Configuration Setup

First, configure the environment variables in the `.env` file of your `<your-application>/skill` folder. These variables enable your skill to interact with both the Kernel (for execution) and Studio (for tracing):

```bash
PHARIA_KERNEL_ADDRESS=https://pharia-kernel.your-deployment.pharia.com/
PHARIA_AI_TOKEN=example-token-value
PHARIA_STUDIO_ADDRESS=https://pharia-studio.your-deployment.pharia.com
```
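
A typo in one of these values usually only surfaces as a connection error later on, so it can help to check them up front. The sketch below is a hypothetical helper, not part of `pharia-skill`; the rule that both addresses must start with `https://` is an assumption that fits the example values above and may need relaxing for local deployments.

```python
import os

REQUIRED_SETTINGS = (
    "PHARIA_KERNEL_ADDRESS",
    "PHARIA_AI_TOKEN",
    "PHARIA_STUDIO_ADDRESS",
)


def find_problems(env: dict[str, str]) -> list[str]:
    """Return a human-readable list of missing or malformed settings."""
    problems = []
    for name in REQUIRED_SETTINGS:
        value = env.get(name, "").strip()
        if not value:
            problems.append(f"{name} is not set")
        # Assumption: deployment addresses are served over HTTPS.
        elif name.endswith("_ADDRESS") and not value.startswith("https://"):
            problems.append(f"{name} should start with https://")
    return problems


# Check the variables of the current process; an empty list means all is well.
print(find_problems(dict(os.environ)))
```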

#### 5.2 Adding Dependencies for Tracing


To enable tracing from within Jupyter notebooks, we need to add the `pharia-skill` dependency to our environment. Run this command in your terminal (in the same folder as this notebook):

```bash
poetry add pharia-skill
```

#### 5.3.A Creating a Test Function

Now we'll define a function that calls our skill with tracing enabled. This allows us to test the skill independently and quickly iterate on improvements:

```python
# Helper function
from pharia_skill.testing import DevCsi
from rag_tutorial.skill.qa import IndexPath, Input, custom_rag


def test_tracing():
    # DevCsi runs the skill locally; with_studio sends traces to the named Studio project
    csi = DevCsi().with_studio("rag-tutorial")
    index = IndexPath(
        namespace="Studio",
        collection="papers",
        index="asym-64",
    )
    input = Input(
        question="What is an encoder?",
        namespace=index.namespace,
        collection=index.collection,
        index=index.index,
    )
    result = custom_rag(csi, input)
    # The skill returns answer=None when no documents are found, so check that first
    assert result.answer is not None
    assert "network" in result.answer and "layers" in result.answer
    print(result)
```

To run the test function, we need to load the environment variables from our skill's .env file and then call the function:


```python
from dotenv import load_dotenv

load_dotenv("rag_tutorial/skill/.env")
test_tracing()
```

#### 5.3.B (Alternative) Running Tests from Terminal

Alternatively, you can add the `test_tracing` function to your `qa_test.py` file and invoke it from the terminal:

```bash
uv run pytest -k test_tracing
```

This approach could be useful for integrating tracing into your continuous integration workflow.

#### 5.4 Viewing Traces in Studio
After executing the skill with tracing enabled, you can view the detailed trace in PhariaStudio:

1. Navigate to your PhariaStudio URL
2. Go to the "Traces" section in the left sidebar
3. Find your trace by name (in our example, "rag-tutorial")
4. Click on the trace to view the detailed execution flow

In the trace view, you'll see:

- The input question
- The documents retrieved from the index with their relevance scores
- The exact prompt sent to the LLM
- The generated response
- Additional metrics for each step of the process

This information is invaluable for understanding how your skill is performing and identifying opportunities for optimization.


## Summary

In this section, you've learned how to customize and optimize your RAG application by working with the kernel skill:

✅ **Explored the Q&A skill template** and its key components for implementing RAG

✅ **Learned about different document retrieval options** available in the Intelligence Layer SDK

✅ **Customized prompts** to generate more structured and informative responses

✅ **Set up development tracing** with Studio to debug and optimize your skill

With these skills, you can now create more sophisticated RAG applications that deliver high-quality, structured responses to user queries. In the next section, we'll look at how to evaluate and improve the performance of your RAG application.