# Code generation

We teach Adala to generate code that converts one JSON format into another.

The following examples are responses from the Hugging Face Inference API:

In [5]:
import pandas as pd


df = pd.DataFrame([
    {'payload': '{"outputs": [{"entity_group": "ORG", "score": 0.9994323253631592, "word": "Apple Inc", "start": 0, "end": 9}, {"entity_group": "MISC", "score": 0.997283935546875, "word": "iPhone 14", "start": 24, "end": 33}], "inputs": "Apple Inc. released the iPhone 14 in September 2022, featuring satellite connectivity."}'},
    {'payload': '{"outputs": [{"entity_group": "MISC", "score": 0.9428057670593262, "word": "Ubuntu", "start": 26, "end": 32}, {"entity_group": "MISC", "score": 0.962793231010437, "word": "Ubuntu", "start": 51, "end": 57}, {"entity_group": "ORG", "score": 0.998673677444458, "word": "Canonical Ltd", "start": 87, "end": 100}], "inputs": "The latest version of the Ubuntu operating system, Ubuntu 22.04, was made available by Canonical Ltd. in April."}'},
    {'payload': '{"outputs": [{"entity_group": "ORG", "score": 0.979661226272583, "word": "Tesla", "start": 0, "end": 5}, {"entity_group": "ORG", "score": 0.8453200459480286, "word": "Cybertru", "start": 12, "end": 20}, {"entity_group": "MISC", "score": 0.7452507019042969, "word": "##ck", "start": 20, "end": 22}, {"entity_group": "PER", "score": 0.9728273153305054, "word": "El", "start": 78, "end": 80}, {"entity_group": "PER", "score": 0.9739447236061096, "word": "##on Musk", "start": 80, "end": 87}], "inputs": "Tesla\'s new Cybertruck is set to hit the roads in late 2023, according to CEO Elon Musk."}'},
    {'payload': '{"outputs": [{"entity_group": "ORG", "score": 0.9987253546714783, "word": "Google", "start": 0, "end": 6}, {"entity_group": "ORG", "score": 0.9994670748710632, "word": "Alphabet Inc", "start": 25, "end": 37}, {"entity_group": "MISC", "score": 0.9959796667098999, "word": "Pixel 6", "start": 91, "end": 98}], "inputs": "Google\'s parent company, Alphabet Inc., saw a rise in stock prices after the launch of the Pixel 6."}'},
    {'payload': '{"outputs": [{"entity_group": "ORG", "score": 0.999211311340332, "word": "Samsung Electronics", "start": 0, "end": 19}, {"entity_group": "ORG", "score": 0.9967896342277527, "word": "LG Display", "start": 38, "end": 48}, {"entity_group": "MISC", "score": 0.47527530789375305, "word": "O", "start": 56, "end": 57}, {"entity_group": "MISC", "score": 0.5774009227752686, "word": "##D", "start": 59, "end": 60}], "inputs": "Samsung Electronics is competing with LG Display in the OLED market."}'}
])

The goal is to convert them into the Label Studio format.
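
For orientation, a hand-written conversion might look like the sketch below. It is not the code Adala generates. It assumes a labeling configuration with a `Labels` control tag named `label` applied to a `Text` object tag named `text`; the helper name `to_label_studio` and the use of the mean entity score as the prediction score are illustrative choices, not part of the original example.

```python
import json


def to_label_studio(payload: str, from_name: str = "label", to_name: str = "text") -> dict:
    """Convert one token-classification response into a Label Studio task.

    `from_name` and `to_name` are assumptions: they must match the tag
    names used in the project's labeling configuration.
    """
    record = json.loads(payload)
    entities = record["outputs"]

    # One result region per detected entity.
    results = [
        {
            "from_name": from_name,
            "to_name": to_name,
            "type": "labels",
            "value": {
                "start": entity["start"],
                "end": entity["end"],
                "text": entity["word"],
                "labels": [entity["entity_group"]],
            },
        }
        for entity in entities
    ]

    # Assumption: summarize per-entity confidence as a simple mean.
    scores = [entity["score"] for entity in entities]
    overall = sum(scores) / len(scores) if scores else 0.0

    return {
        "data": {to_name: record["inputs"]},
        "predictions": [{"result": results, "score": overall}],
    }


if __name__ == "__main__":
    example = (
        '{"outputs": [{"entity_group": "ORG", "score": 0.5, '
        '"word": "Tesla", "start": 0, "end": 5}], '
        '"inputs": "Tesla builds cars."}'
    )
    print(json.dumps(to_label_studio(example), indent=2))
```

Sub-word fragments such as `##ck` in the sample responses are passed through unchanged here; merging them into whole words would be a separate step.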

`SimpleCodeValidationEnvironment` automatically validates the generated code and returns feedback about any detected errors to the agent, which uses it to improve the code.

In [None]:
from adala.skills import AnalysisSkill, ParallelSkillSet
from adala.agents import Agent
from adala.environments import SimpleCodeValidationEnvironment


skillset = ParallelSkillSet(skills=[
    AnalysisSkill(
        name='code_generation',
        input_template="Input JSON: {payload}",
        output_template="Code: {code}",
        instructions='''
Format description: 
id - Identifier for the labeling task from the dataset.
data - Data dict copied from the input data task format.
project - Identifier for a specific project in Label Studio.
predictions - Array containing the labeling results for the task.
predictions.id - Identifier for the completed task.
predictions.lead_time - Time in seconds to label the task.
predictions.result - Array containing the results of the labeling or annotation task.
result.id - Identifier for the specific annotation result for this task.
result.from_name - Name of the tag used to label the region. See control tags.
result.to_name - Name of the object tag that provided the region to be labeled. See object tags.
result.type - Type of tag used to annotate the task.
result.value - Tag-specific value that includes details of the result of labeling the task. The value structure depends on the tag for the label. For more information, see Explore each tag.
predictions.score - The overall score of the result, based on the probabilistic output, confidence level, or other.

Following the target JSON format provided, write minimal Python code that transforms the input JSON into this format. \
Assume the input data will be read from the standard input (stdin), and the output generated will be directed to the standard output (stdout).'''
)])

env = SimpleCodeValidationEnvironment(df=df, code_fields={'code': 'payload'})

agent = Agent(skills=skillset, environment=env)
agent.learn(learning_iterations=1, num_feedbacks=1, batch_size=3)
predictions = agent.run()
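
Conceptually, validating generated code amounts to executing it against each payload and turning any failure into feedback. The sketch below illustrates that idea only: it is not Adala's implementation, and `validate_code` is a hypothetical helper. It runs the code in-process with `exec`, which is unsafe for untrusted code outside a sandbox.

```python
import contextlib
import io
import json
import sys


def validate_code(code: str, payload: str) -> dict:
    """Run `code` with `payload` on stdin and report the outcome.

    Hypothetical helper: returns {"ok": bool, "feedback": str}, where
    feedback is the captured stdout on success or an error description.
    """
    captured = io.StringIO()
    original_stdin = sys.stdin
    sys.stdin = io.StringIO(payload)  # the generated code reads from stdin
    try:
        with contextlib.redirect_stdout(captured):
            exec(code, {"__name__": "__main__"})
    except Exception as exc:
        # Runtime and syntax errors become feedback for the next iteration.
        return {"ok": False, "feedback": f"{type(exc).__name__}: {exc}"}
    finally:
        sys.stdin = original_stdin

    output = captured.getvalue()
    try:
        json.loads(output)
    except json.JSONDecodeError as exc:
        return {"ok": False, "feedback": f"Output is not valid JSON: {exc}"}
    return {"ok": True, "feedback": output}


if __name__ == "__main__":
    broken = "import sys, json\nd = json.loads(sys.stdin.read())\nprint(d['missing'])"
    print(validate_code(broken, '{"inputs": "hi", "outputs": []}'))
```

A check like this catches only crashes and malformed output. It does not verify that the result actually follows the target schema, which would need a separate comparison against the format description.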

Here is the code produced by the Adala agent:

In [12]:
print(predictions.code[0])


import json
import sys

# read input from stdin
input_json = sys.stdin.read()

# parse input json
input_data = json.loads(input_json)

# initialize output json
output_data = {}

# add id to output json
output_data["id"] = input_data["outputs"][0]["entity_group"]

# add data to output json
output_data["data"] = input_data["inputs"]

# add project to output json
output_data["project"] = "Label Studio"

# initialize predictions array
predictions = []

# loop through each output in input json
for output in input_data["outputs"]:
    # initialize prediction dict
    prediction = {}

    # add id to prediction dict
    prediction["id"] = output["entity_group"]

    # add lead_time to prediction dict
    prediction["lead_time"] = 0

    # initialize result array
    result = []

    # initialize result dict
    result_dict = {}

    # add from_name to result dict
    result_dict["from_name"] = output["entity_group"]

    # add to_name to result dict
    result_dict["to_name"] = output["entit