# ADALA Quickstart

In this notebook, we run through some of the common tasks for creating data labeling agents with ADALA. As an example, we create a data labeling agent for a text classification task: labeling our text samples as either "Subjective" or "Objective" statements. 

This agent will be LLM-based, so we will use [OpenAI's API](https://platform.openai.com/). You will need to generate an API key and set it as an environment variable as follows: 

```
export OPENAI_API_KEY=your_openai_api_key
```
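
Before running the agent, it can help to confirm that the key is visible to the Python process. The following is a minimal sketch using only the standard library; the helper name `has_openai_key` is hypothetical and not part of ADALA.

```python
import os

def has_openai_key(env=None):
    """Return True if OPENAI_API_KEY is present and non-empty."""
    # Allow a custom mapping to be passed in, which makes the check easy to test
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())

if not has_openai_key():
    print("OPENAI_API_KEY is not set; export it before running the agent.")
```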

Now, let's begin. 

## Dataset Creation
First, let's use a dataset of product reviews stored in a pandas DataFrame. This will help us manage our data as we add more attributes over time, such as predictions and ground truth labels for subjectivity and objectivity. 

In [1]:
import pandas as pd
texts = [
    "The mic is great.",
    "Will order from them again!",
    "Not loud enough and doesn't turn on like it should.",
    "The phone doesn't seem to accept anything except CBR mp3s",
    "All three broke within two months of use."
]
df = pd.DataFrame(texts, columns=['text'])
df

| | text |
|---|---|
| 0 | The mic is great. |
| 1 | Will order from them again! |
| 2 | Not loud enough and doesn't turn on like it sh... |
| 3 | The phone doesn't seem to accept anything exce... |
| 4 | All three broke within two months of use. |


## Create Agent

An agent's abilities are defined as **"Skills"**. Each agent can possess many different skills. In our case, the agent has only one labeling skill: producing a classification of Subjective or Objective for a given piece of text. 

To define this skill, we will leverage an LLM, passing it instructions and the set of labels we expect to receive back. 

In [2]:
from adala.agents import Agent
from adala.datasets import DataFrameDataset
from adala.environments import DataFrameEnvironment
from adala.skills import ClassificationSkill


dataset = DataFrameDataset(df=df, input_data_field='text')


agent = Agent(
    # connect to a dataset 
    environment=DataFrameEnvironment(
        dataset=dataset,
        # we use the same dataframe as a ground truth source
        ground_truth_set=df,
        ground_truth_column='ground_truth',
        matching_function='exact_match'
    ),
    
    # define the agent's labeling skill
    skills=ClassificationSkill(
        name='subjectivity_detection',
        description='Understanding subjective and objective statements from text.',
        instructions='Classify a product review as either expressing "Subjective" or "Objective" statements.',
        labels=['Subjective', 'Objective']
    )
)

agent

Agent(environment=Environment(dataset=DataFrameDataset(input_data_field='text', df=                                                text  ground_truth
0                                  The mic is great.           NaN
1                        Will order from them again!           NaN
2  Not loud enough and doesn't turn on like it sh...           NaN
3  The phone doesn't seem to accept anything exce...           NaN
4          All three broke within two months of use.           NaN, ground_truth_column='ground_truth')), skills=LinearSkillSet(skills={'skill_0': ClassificationSkill(name='subjectivity_detection', instructions='Classify a product review as either expressing "Subjective" or "Objective" statements.', description='Understanding subjective and objective statements from text.', input_template='Input: {{{{{input}}}}}', output_template="Output: {{{{select 'predictions' options=labels logprobs='score'}}}}", validation_fields=['predictions'], labels=['Subjective', 'Objective'], predi

## Running the Agent
We will now apply the skill to each of the tasks in our dataframe by running the agent. Our result will be a dataframe that we'll combine with our original data to view together. 

In [3]:
%load_ext autoreload
%autoreload 2

In [4]:
# here we apply skills to the same dataset, but in real-world scenarios it can be any other dataset
experience = agent.apply_skills(dataset)
pd.concat((df, experience.predictions), axis=1)

10


| | text | ground_truth | predictions | score |
|---|---|---|---|---|
| 0 | The mic is great. | NaN | Subjective | {'Subjective': -0.02697588099999997, 'Objectiv... |
| 1 | Will order from them again! | NaN | Subjective | {'Subjective': -0.11282212000000001, 'Objectiv... |
| 2 | Not loud enough and doesn't turn on like it sh... | NaN | Subjective | {'Subjective': -0.014163457000000034, 'Objecti... |
| 3 | The phone doesn't seem to accept anything exce... | NaN | Objective | {'Subjective': -2.0720863, 'Objective': -0.134... |
| 4 | All three broke within two months of use. | NaN | Objective | {'Subjective': -2.1821797, 'Objective': -0.119... |
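
The `score` column holds one value per label, which the skill's output template requests as `logprobs`. Assuming these are natural-log probabilities, they can be turned into easier-to-read probabilities. This is a rough sketch: the helper `logprobs_to_probs` is hypothetical, and the example values are illustrative rather than taken from the run above.

```python
import math

def logprobs_to_probs(scores):
    """Convert {label: log probability} into {label: probability}, normalized to sum to 1."""
    raw = {label: math.exp(logprob) for label, logprob in scores.items()}
    total = sum(raw.values())
    return {label: value / total for label, value in raw.items()}

# Illustrative values only, not copied from the notebook output
example_scores = {"Subjective": -0.05, "Objective": -3.0}
probs = logprobs_to_probs(example_scores)

# The label with the highest probability is the predicted class
print(max(probs, key=probs.get))
```

Normalizing is a design choice here: the raw exponentials of two label scores need not sum exactly to 1, so dividing by their total makes the two values directly comparable.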


## Labeling the Data
We now have some basic predictions for our data. These predictions could be all correct or wildly incorrect; we don't know at this point. Normally, we would bring in an annotation team to provide feedback. Since this is a quickstart, we will instead apply labels to the dataframe directly, correcting the predictions where needed to create a reliable _ground truth_. 

In [5]:
df.loc[0, 'ground_truth'] = 'Subjective'
df.loc[1, 'ground_truth'] = 'Subjective'
df.loc[2, 'ground_truth'] = 'Objective'
df.loc[3, 'ground_truth'] = 'Objective'
df.loc[4, 'ground_truth'] = 'Objective'
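
With ground truth in place, the earlier predictions can be compared against it. Below is a minimal sketch, assuming that an exact match means plain string equality (how ADALA's `'exact_match'` option behaves internally is not shown in this notebook). The helper `exact_match_accuracy` is hypothetical; the two lists are copied from the prediction table and the labels above.

```python
def exact_match_accuracy(predictions, ground_truth):
    """Return the fraction of predictions that are identical to their ground truth label."""
    pairs = list(zip(predictions, ground_truth))
    if not pairs:
        return 0.0
    return sum(pred == truth for pred, truth in pairs) / len(pairs)

# Predictions from the agent run and the labels assigned above
predictions = ["Subjective", "Subjective", "Subjective", "Objective", "Objective"]
ground_truth = ["Subjective", "Subjective", "Objective", "Objective", "Objective"]

accuracy = exact_match_accuracy(predictions, ground_truth)
print(accuracy)  # 0.8
```

Only the third review ("Not loud enough and doesn't turn on like it should.") disagrees with its label, which is the kind of error the learning step is meant to address.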

## Improving the Agent
Now that we have ground truth labels for our data, we can ask our agent to learn and improve itself by using them as a guide. 

We can see in the output:
- System Prompt - the agent prompt used to improve our labeling skill
- User Prompt - the original LLM prompt we created for our skill
- Assistant Prompt - the updated prompt learned by our agent

In [7]:
learn_results = agent.learn()

10

Done!





In [9]:
print(learn_results.updated_instructions)

Determine whether the given product review contains "Subjective" (based on personal feelings, tastes, or opinions) or "Objective" (based on facts or observable phenomena) statements.

Examples:
Input: The camera quality is 12 megapixels.
Output: Objective

Input: I think the camera quality is not good enough.
Output: Subjective

Input: The laptop has a 15-inch screen.
Output: Objective


## Next Steps
We can incorporate our updated instructions into our agent's skill, and run it on new data. 
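
As a rough illustration of what running on new data involves, the sketch below pairs the learned instructions with new inputs in an `Input:` / `Output:` layout similar to the skill's templates. This is not ADALA's API or its actual prompt construction: `build_prompt` is a hypothetical helper and the new reviews are invented for the example.

```python
def build_prompt(instructions: str, text: str) -> str:
    """Combine instructions and one input text into a single prompt string."""
    return f"{instructions}\n\nInput: {text}\nOutput:"

# First sentence of the learned instructions printed above
updated_instructions = (
    'Determine whether the given product review contains "Subjective" '
    "(based on personal feelings, tastes, or opinions) or "
    '"Objective" (based on facts or observable phenomena) statements.'
)

# Hypothetical new reviews, invented for illustration
new_reviews = ["The battery lasts 10 hours.", "I love the design."]

prompts = [build_prompt(updated_instructions, review) for review in new_reviews]
print(prompts[0])
```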
