# SN20 - BitAgent

This demo notebook should fulfill the following tasks:
 - (A) Demonstrate the quality of the communication between miners and validators coming from the top miner.
 - (B) Justify the difference in incentive for miners in different tiers (e.g., quantile 1 vs. quantile 3).
 - (C) (If applicable) Show the landscape and variety of miners. 
 - (D) (If applicable) Demonstrate the effectiveness of the scoring mechanism.
 - (E) (If applicable) Show the dataset that was used by the validator.
 - (F) (If applicable) Show the use of any API and/or links to a frontend.
 
## Objective
> SN20 focuses on LLM-guided tool execution. There is one type of task with several subtypes:
> - Single Tool Call Task - given a single tool, call that tool with the correct arguments  
> - Multi Tool Call Task - given a list of several tools, call the correct tool with the correct arguments
> - Irrelevant Tool Call Task - given a list of several tools, do not call any of them
> - And many more to come ... multi-step, parallel, etc.

## Setup
> Provide your hotkey and coldkey. \
> Modify any other config you need, such as the OpenAI API base URL for your Mistral 7B LLM.

In [1]:
%%capture

# to clear the notebook params from the argument list for argparser
import sys
sys.argv = ['']

import bittensor as bt 
from rich import print as rprint
from neurons.validator import Validator
from IPython.display import display_markdown
from bitagent.tasks.task import get_random_task
from bitagent.validator.initiation import initiate_validator

SUBNET_UID = 20
# working with subnet
#subnet = bt.metagraph(netuid=SUBNET_UID, network="finney")
#vali_wallet =  bt.wallet(name=WALLET_NAME, hotkey=HOTKEY_NAME, path=WALLET_PATH)
#vali_dendrite = bt.dendrite(wallet=vali_wallet)

# Wallet 
WALLET_NAME = "<your-coldkey-name>"  # TODO, put your coldkey
HOTKEY_NAME = "<your-hotkey-name>"   # TODO, put your hotkey
WALLET_PATH = "<your-wallet-path>"   # TODO, put your wallet path

# setup the config for the validator
cfg = bt.Config()
cfg.wallet = {"name": WALLET_NAME, "hotkey": HOTKEY_NAME, "path": WALLET_PATH}
cfg.netuid = SUBNET_UID
#cfg.openai_api_base = # TODO if you need to set this to the LLM of your choice

# setup the validator
from unittest.mock import patch

# Define your stubbed function for our get_alive_uids so we don't have to check them ALL
def stub_get_alive_uids(whatever):
    return [1, 2, 3]

# Patch the function in our validator module
with patch('common.base.validator.get_alive_uids', new=stub_get_alive_uids):
    vali = Validator(config=cfg)
    initiate_validator(vali)


In [2]:
task = get_random_task(vali)


In [3]:
task.criteria

[{   'desc': '',
     'eval_args': [   ],
     'eval_fx': <function does_not_error at 0x7fad77aca950>,
     'name': 'Does '
             'not '
             'error'},
 {   'desc': '',
     'eval_args': [   ],
     'eval_fx': <function does_not_take_a_long_time at 0x7fad77aca9e0>,
     'name': 'Does '
             'not '
             'take '
             'a '
             'long '
             'time'},
 {   'desc': '',
     'eval_args': [   ],
     'eval_fx': <function correct_tool_call_function_format at 0x7fad77acab00>,
     'name': 'Return '
             'correct '
             'function '
             'format'},
 {   'desc': '',
     'eval_args': [   {   'arguments': {   'company_name': [   'Apple '
                                                               'Inc.',
                                                               'Apple'],
                                           'detail_level': [   'detailed'],
                                           'market': [   'NASDAQ',
  

In [4]:
task.synapse.messages

[ChatMessage(role=<ChatRole.USER: 'user'>, content='Obtain the detailed stock information for either "Apple Inc." or "Apple" in the NASDAQ market.')]

In [5]:
task.synapse.tools

[Tool(name='get_sensor_readings_history_by_interval', description='Retrieves historical sensor readings within a specified timespan, summarized in intervals, and returns them sorted by the start time of each interval in descending order.', arguments={'perPage': {'required': False, 'type': 'integer', 'description': 'Number of entries per page, within the range of 3 to 100.'}, 'startingAfter': {'required': False, 'type': 'string', 'description': 'Server-generated token indicating the start of the page, typically a timestamp or ID.'}, 'endingBefore': {'required': False, 'type': 'string', 'description': 'Server-generated token indicating the end of the page, typically a timestamp or ID.'}, 'networkId': {'required': False, 'type': 'array', 'description': 'Filter data by the specified network IDs.'}, 'serials': {'required': False, 'type': 'array', 'description': 'Filter readings by sensor serial numbers.'}, 'metrics': {'required': False, 'type': 'array', 'description': "Specify sensor readin

## (A) Top miner responses
- This section should demonstrate the quality of the communication between miners and validators coming from the top miner.

> - (1) Define a group of top miners.
>
> - (2) Define a forward function. 
> - (3) Call the forward function for the top miners. 
> - (4) Show the responses from the miners.

In [6]:
top_miner_uids = (-vali.metagraph.I).argsort()[:5]

def forward(uids):
    responses = vali.dendrite.query(
        axons=[vali.metagraph.axons[uid] for uid in uids],
        synapse=task.synapse,
        deserialize=False,
        timeout=10*task.timeout,
    )
    return responses
    
display_markdown(f'''### Tool Call Task - Input:
#### Messages:
{task.synapse.messages}

#### Available Tools:
{[t.name for t in task.synapse.tools]}

#### Top 5 Miner UIDs for Subnet 20: {top_miner_uids}''', raw=True)

results = forward(top_miner_uids)
for i,result in enumerate(results):
    try:
        response = result.response
        display_markdown(f'''### <span style="color:darkgray">**Response from {top_miner_uids[i]}:**</span> 
        {response}''', raw=True)
        feedbacks = []
        for crit in task.criteria:
            score, max_score, feedback = crit.evaluate(task, vali, result)
            feedbacks.append(feedback)
        rprint(("\n").join(feedbacks).replace("bold blue", "bold white"))
    except Exception as e:
        print(result.dendrite.status_code, " ", result.dendrite.process_time, " ", e)
        print(f"Miner {top_miner_uids[i]} did not respond correctly")

### Tool Call Task - Input:
#### Messages:
[ChatMessage(role=<ChatRole.USER: 'user'>, content='Obtain the detailed stock information for either "Apple Inc." or "Apple" in the NASDAQ market.')]

#### Available Tools:
['get_sensor_readings_history_by_interval', 'Buses_3_BuyBusTicket', 'Events_3_BuyEventTickets', 'get_sensor_alerts', 'Events_3_FindEvents', 'get_sensor_readings_latest', 'Hotels_2_BookHouse', 'get_current_time', 'get_stock_info', 'get_instagram_story_clicks', 'get_sensor_readings_history', 'Buses_3_FindBus', 'Hotels_2_SearchHouse']

#### Top 5 Miner UIDs for Subnet 20: [103 157 171 129 220]

### <span style="color:darkgray">**Response from 103:**</span> 
        get_stock_info(company_name='Apple Inc.', detail_level='detailed', market='NASDAQ')

### <span style="color:darkgray">**Response from 157:**</span> 
        get_stock_info(company_name='Apple Inc.', detail_level='detailed', market='NASDAQ')

### <span style="color:darkgray">**Response from 171:**</span> 
        get_stock_info(company_name='Apple Inc.', detail_level='detailed', market='NASDAQ')

### <span style="color:darkgray">**Response from 129:**</span> 
        get_stock_info(company_name='Apple Inc.', detail_level='detailed', market='NASDAQ')

### <span style="color:darkgray">**Response from 220:**</span> 
        get_stock_info(company_name='Apple Inc.', detail_level='detailed', market='NASDAQ')

## (B) Justification for incentive distribution 
- Justify the difference in incentive for miners in different incentive tiers (e.g., sample 5 miners from quantile 1 vs. 5 miners from quantile 3) with code.
- If there is no significant difference in the incentive distribution, you can also show that miners in the SN have about the same performance in multiple ways.

- There could be many reasons for the difference in incentive for miners. 
    - Case 1: Difference in the quality of response
        - Show that miners with higher incentive generally give better answers than those with lower incentive, for example through:
            - lower loss; higher accuracy
            - human evaluation of text/image/audio quality

    - Case 2: Difference in miner availability
        - Show that, over a fixed number of trials (e.g., 100), more calls to higher-incentive miners succeed.

    - Case 3: Difference in latency
        - Show that miners in Q1 generally respond faster than miners in Q3.

    - Case 4: Please provide your own justification if the reasons above don't fit.

> (1) Define the groups of high-incentive and low-incentive miners (you can have ~5 samples from each group, but please feel free to make your own definition of high/low incentive miners)
>
> (2) Make the forward call to the group of high/low incentive miners 
>
> (3) Show the difference in the quality of the high/low incentive miners 
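
Cases 2 and 3 could be checked by recording, for each call, the `status_code` and `process_time` that this notebook already reads from `result.dendrite`. The following is a minimal sketch under that assumption: `summarize` is a hypothetical helper, a call is assumed to be successful when its status code is 200, and the sample records are placeholder values for illustration only, not measurements from the subnet.

```python
from statistics import median

def summarize(records):
    """Summarize (status_code, process_time) pairs for one group of miners.

    Assumption: a call is successful when its status code is 200.
    Codes and times may arrive as strings, so both are normalized first.
    """
    ok_times = [
        float(t)
        for code, t in records
        if code is not None and str(code) == "200" and t is not None
    ]
    return {
        "calls": len(records),
        "success_rate": len(ok_times) / len(records) if records else 0.0,
        "median_latency_s": median(ok_times) if ok_times else None,
    }

# Placeholder records for illustration only, NOT real measurements.
high_incentive = [(200, 1.2), (200, 0.9), (200, 1.5), (408, None)]
low_incentive = [(200, 2.8), (408, None), (503, None), (200, 3.1)]

high_summary = summarize(high_incentive)
low_summary = summarize(low_incentive)
print("high incentive:", high_summary)
print("low incentive: ", low_summary)
```

In the notebook, the records would instead be collected by calling `forward(...)` repeatedly for each group and appending `(result.dendrite.status_code, result.dendrite.process_time)` for every result.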


In [7]:
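# We already have our top miners; now pick the lowest-incentive miners.
# UIDs with stake > 20000 are treated as validators.
validator_uids = vali.metagraph.uids[vali.metagraph.stake > 20000]
# Skip as many lowest-incentive entries as there are validators (this assumes validators sit at the bottom of the incentive ranking), then take the next 5.
bottom_miner_uids = vali.metagraph.I.argsort()[len(validator_uids):len(validator_uids)+5]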
display_markdown(f'''### Tool Call Task - Input:
#### Messages:
{task.synapse.messages}

#### Available Tools:
{[t.name for t in task.synapse.tools]}

#### Bottom 5 Miner UIDs for Subnet 20: {bottom_miner_uids}''', raw=True)
results = forward(bottom_miner_uids)
for i,result in enumerate(results):
    try:
        response = result.response
        display_markdown(f'''### <span style="color:darkgray">**Response from {bottom_miner_uids[i]}:**</span> 
        {response}''', raw=True)
        feedbacks = []
        for crit in task.criteria:
            score, max_score, feedback = crit.evaluate(task, vali, result)
            feedbacks.append(feedback)
        rprint(("\n").join(feedbacks).replace("bold blue", "bold white"))
    except Exception as e:
        print(result.dendrite.status_code, " ", result.dendrite.process_time, " ", e)
        print(f"Miner {bottom_miner_uids[i]} did not respond correctly")

### Tool Call Task - Input:
#### Messages:
[ChatMessage(role=<ChatRole.USER: 'user'>, content='Obtain the detailed stock information for either "Apple Inc." or "Apple" in the NASDAQ market.')]

#### Available Tools:
['get_sensor_readings_history_by_interval', 'Buses_3_BuyBusTicket', 'Events_3_BuyEventTickets', 'get_sensor_alerts', 'Events_3_FindEvents', 'get_sensor_readings_latest', 'Hotels_2_BookHouse', 'get_current_time', 'get_stock_info', 'get_instagram_story_clicks', 'get_sensor_readings_history', 'Buses_3_FindBus', 'Hotels_2_SearchHouse']

#### Bottom 5 Miner UIDs for Subnet 20: [134 111 205  95  99]

### <span style="color:darkgray">**Response from 134:**</span> 
        

### <span style="color:darkgray">**Response from 111:**</span> 
        get_stock_info(company_name='Apple Inc.', detail_level='detailed', market='NASDAQ')

### <span style="color:darkgray">**Response from 205:**</span> 
        get_stock_info(company_name='Apple Inc.', detail_level='detailed', market='NASDAQ')

### <span style="color:darkgray">**Response from 95:**</span> 
        get_stock_info(company_name='Apple Inc.', detail_level='detailed', market='NASDAQ')

### <span style="color:darkgray">**Response from 99:**</span> 
        get_stock_info(company_name='Apple Inc.', detail_level='detailed', market='NASDAQ')

## (C) (If applicable) Miner landscape
- How many unique responses can we get from the network, and how many miners give the same response? It is perfectly fine if all of the miners respond with the same thing.
> (1) Send the same request to all miners over the SN
>
> (2) Check the number of unique responses  
> 
> (3) Check the number of miners giving the same response
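
Steps (2) and (3) can be sketched by grouping miner UIDs by their response text. This is only an illustration: `response_landscape` is a hypothetical helper, and the input below reuses the five bottom-miner responses printed in section (B) rather than a query to every miner on the subnet.

```python
def response_landscape(responses_by_uid):
    """Group miner UIDs by their (whitespace-stripped) response text."""
    groups = {}
    for uid, response in responses_by_uid.items():
        key = (response or "").strip()
        groups.setdefault(key, []).append(uid)
    return groups

# Responses copied from the bottom-miner output in section (B).
CALL = "get_stock_info(company_name='Apple Inc.', detail_level='detailed', market='NASDAQ')"
responses_by_uid = {134: "", 111: CALL, 205: CALL, 95: CALL, 99: CALL}

groups = response_landscape(responses_by_uid)
unique_count = len(groups)
print(f"{unique_count} unique responses")

# Largest groups first, so the most common response is listed at the top.
for text, uids in sorted(groups.items(), key=lambda kv: -len(kv[1])):
    print(f"{len(uids)} miner(s) {uids}: {text or '<empty response>'}")
```

To cover the whole subnet, the same helper could be fed a dictionary built from `forward(vali.metagraph.uids)`, mapping each UID to `result.response`.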
  

## (D) (If applicable) Demonstrate the effectiveness of the scoring mechanism.
- If you are using a reward/penalty model: 
    - Please load the reward or penalty model one by one and then show that the reward of a good response > the reward of a bad response
    - Please allow us to customise the input of the reward model

    > (1) Load the reward/penalty model one by one 
    >
    > (2) Define the good/bad response
    >
    > (3) Score the response with the model

- Otherwise, you may just give a brief explanation of how your scoring mechanism works.
- Please show the distribution of the final reward and each sub-reward.
- Please link us to the original validator code where appropriate.
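
As a rough illustration of "the reward of a good response > the reward of a bad response", the sketch below scores a tool call against a list of accepted values per argument, in the spirit of the `eval_args` shown in the `task.criteria` output. It is not the validator's actual scoring code: `score_tool_call` and its point scheme are hypothetical, the expected tool name is assumed to be `get_stock_info`, and only the two arguments that are fully visible in that truncated output are used.

```python
def score_tool_call(name, args, expected_name, accepted_args):
    """Return (score, max_score) for one tool call.

    Hypothetical scheme: 1 point for the right tool name, plus 1 point for
    each expected argument whose value is in its accepted list. A call to
    the wrong tool scores 0. Extra arguments are ignored.
    """
    max_score = 1 + len(accepted_args)
    if name != expected_name:
        return 0, max_score
    score = 1
    for arg, accepted in accepted_args.items():
        if args.get(arg) in accepted:
            score += 1
    return score, max_score

# Accepted values as visible in the task.criteria output (truncated there,
# so the 'market' entry is deliberately left out).
accepted = {
    "company_name": ["Apple Inc.", "Apple"],
    "detail_level": ["detailed"],
}

good = score_tool_call(
    "get_stock_info",
    {"company_name": "Apple Inc.", "detail_level": "detailed", "market": "NASDAQ"},
    "get_stock_info", accepted)
partial = score_tool_call(
    "get_stock_info",
    {"company_name": "Microsoft", "detail_level": "detailed"},
    "get_stock_info", accepted)
bad = score_tool_call("get_current_time", {}, "get_stock_info", accepted)

print("good:", good, "partial:", partial, "bad:", bad)
```

The input dictionaries can be edited freely to try other responses against the same accepted values.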


## (E) (If applicable) Show the dataset that was used by the validator.
 > (1) Load the dataset 
 > 
 > (2) Show the first 10 samples of the dataset 
- Please link us to the original validator code where appropriate

## (F) (If applicable) Demonstrate the use of any API and/or links to a frontend.


URL to app: https://gogoagent.ai \
API Endpoint: https://api.gogoagent.ai \
More documentation here - https://docs.google.com/document/d/1QVCzDu0eMmkdglD65F_Q_UjnCJauVEr62WgG8SgACt0

### Using the API endpoint

In [8]:
import openai

MODEL_API = "https://api.gogoagent.ai"
MODEL_NAME = "BitAgent/GoGoAgent"

# Initialize the OpenAI client
client = openai.OpenAI(
    api_key="<your-api-key>",  # TODO, put your API key here
    base_url=MODEL_API,
)

def tip_calculator(bill_amount, tip_percent):
    return bill_amount * tip_percent/100.

def another_calculator_for_summation(num1, num2):
    return num1 + num2

# Pose your user query
messages = [{"role": "user",
                    "content": "Need help calculating the tip, what is 10% tip on a bill totalling $100"}]

# Define the tools (see methods above)
tools = [{"name": "tip_calculator", "description": "Calculate the tip amount",
          "arguments": {"bill_amount": {"required": True, "type": "number",
                                        "description": "the bill amount in dollars"},
                        "tip_percent": {"required": True, "type": "number",
                                        "description": "the tip percentage as a whole number"},
                       }
        },
        {"name": "another_calculator_for_summation", "description": "Calculate the sum of two numbers",
          "arguments": {"num1": {"required": True, "type": "number",
                                 "description": "the first number for summation"},
                        "num2": {"required": True, "type": "number",
                                 "description": "the second number for summation"},
                       }
        }]

chat_response = client.chat.completions.create(
        model=MODEL_NAME,
        tools=tools,
        messages=messages,
    )

message = chat_response.choices[0].message.content
print(f"Chat Response: {message}")

Chat Response: [tip_calculator(bill_amount=100, tip_percent=10)]


#### Notice
Because the query was about a tip, the model returned a call to `tip_calculator` and did not recommend `another_calculator_for_summation`.

From here, we can run the returned call on our end, since `tip_calculator` is defined locally.
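
One way to run the returned call locally is to parse the response string and dispatch to a registry of known functions, rather than passing it to `eval`. The sketch below assumes the response always has the shape shown above (a list of calls with keyword arguments and literal values); `run_tool_calls` and `LOCAL_TOOLS` are hypothetical names introduced here for illustration.

```python
import ast

def tip_calculator(bill_amount, tip_percent):
    return bill_amount * tip_percent / 100.0

def another_calculator_for_summation(num1, num2):
    return num1 + num2

# Only functions listed here may be called.
LOCAL_TOOLS = {
    "tip_calculator": tip_calculator,
    "another_calculator_for_summation": another_calculator_for_summation,
}

def run_tool_calls(response_text, registry):
    """Parse a string such as "[f(a=1, b=2)]" and run each call via registry."""
    tree = ast.parse(response_text.strip(), mode="eval")
    calls = tree.body.elts if isinstance(tree.body, ast.List) else [tree.body]
    results = []
    for call in calls:
        if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
            raise ValueError("expected a plain function call")
        if call.func.id not in registry:
            raise ValueError(f"unknown tool: {call.func.id}")
        if call.args or any(kw.arg is None for kw in call.keywords):
            raise ValueError("only explicit keyword arguments are supported")
        # literal_eval accepts only literals, so no arbitrary code is run.
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
        results.append(registry[call.func.id](**kwargs))
    return results

results = run_tool_calls(
    "[tip_calculator(bill_amount=100, tip_percent=10)]", LOCAL_TOOLS)
print(results)
```

In the notebook, the hard-coded string would be replaced by `message` from the cell above. Restricting dispatch to a registry means a response naming any other function is rejected instead of executed.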

## Consent: Do you want this demo notebook to be public? Yes/No  

Yes