# Sample Submission to TAPE

To preserve the integrity of the results, we do not release the targets for the test data to the public. To evaluate your solution on the test set, you must therefore send your submission to the official e-mail address: tapebenchmark@gmail.com.

This is an example of how to make a submission to the TAPE leaderboard.

## Installation

Clone the repository and install the requirements:

In [4]:
# !git clone https://github.com/RussianNLP/TAPE.git
# %cd TAPE
# !pip install .

Import all the necessary modules:

In [5]:
import pandas as pd
import numpy as np

from datasets import load_dataset
# from TAPE.utils.episodes import get_episode_data
def get_episode_data(data: pd.DataFrame, episode: int) -> pd.DataFrame:
    """
    Returns all the data from the specified episode

    Parameters
    ----------
    data: pd.DataFrame
        data to work with
    episode: int
        episode number

    Returns
    -------
    pd.DataFrame
        train data from the passed episode
    """
    ids = data.episode.apply(lambda x: episode in x)

    return data[ids]
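
To illustrate how the episode filter behaves, here is a minimal sketch on a toy frame. The `text` column and the episode numbers are made up for the example; the only assumption taken from the code above is that each row's `episode` field is a list-like of episode numbers.

```python
import pandas as pd


def get_episode_data(data: pd.DataFrame, episode: int) -> pd.DataFrame:
    # Same logic as above: keep the rows whose `episode` list contains the number
    ids = data.episode.apply(lambda x: episode in x)
    return data[ids]


# Hypothetical toy data: every row lists the episodes it belongs to
toy = pd.DataFrame({
    "text": ["a", "b", "c"],
    "episode": [[10, 11], [11], [12]],
})

subset = get_episode_data(toy, 11)
print(subset.text.tolist())               # ['a', 'b']
print(get_episode_data(toy, 4).shape[0])  # 0
```

An episode number that appears in no row returns an empty frame, so the number of shots for that episode is 0.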

## Making predictions

For the purpose of this tutorial, we create simple random prediction models for each task.

Let's define all the functions that we will require:

In [6]:
def get_data(task_name: str):
    """
    Load train and test
    """
    data = load_dataset("RussianNLP/tape", f"{task_name}.episodes")
    train_data = data['train'].data.to_pandas()
    test_data = data['test'].data.to_pandas()
    return train_data, test_data

In [7]:
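def predict(k_shots: pd.DataFrame, test_data: pd.DataFrame, task_name: str):
    """
    Random prediction for each task

    `k_shots` holds the training examples of the current episode;
    this random baseline does not use it
    """
    if task_name in ['ru_worldtree', 'ru_openbook']:
        # multiple choice: one of four answer options per example
        predictions = np.random.choice(['A', 'B', 'C', 'D'], size=test_data.shape[0])
    elif task_name == 'winograd':
        # binary label per example
        predictions = np.random.choice([0, 1], size=test_data.shape[0])
    elif task_name in ['chegeka', 'multiq']:
        # free-form answer: a placeholder string per example
        predictions = np.random.choice(['some', 'answer'], size=test_data.shape[0])
    else:
        # Ethics tasks: five binary labels per example
        predictions = np.array([np.random.choice([0, 1], size=(5,)) for _ in range(test_data.shape[0])])
    return predictions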

def get_predictions(task_name: str):
    """
    Make predictions for each task
    """
    train_data, test_data = get_data(task_name)
    full_predictions = []
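    # episode 4 is added explicitly: it does not occur in the train data,
    # so it is evaluated with 0 shots (see the output below)
    episodes = [4] + list(np.unique(np.hstack(train_data.episode.values)))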
    
    # iterate over episodes
    for episode in sorted(episodes):
        
        k_shots = get_episode_data(train_data, episode)
        
        # iterate over transformed and original test datasets
        for perturbation, test in test_data.groupby('perturbation'):
        
            # get model predictions
            predictions = predict(k_shots, test, task_name)
            
            # save predictions
            full_predictions.append({
                "episode": episode,
                "shot": k_shots.shape[0],
                "slice": perturbation,
                "preds": predictions
            })
            
    full_predictions = pd.DataFrame(full_predictions)
    return full_predictions
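
As a rough illustration of how the episode list in `get_predictions` is assembled, the sketch below flattens the per-row episode arrays of a toy frame and counts the training examples in each episode. The `question` column and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy train set: each row lists the episodes it belongs to
toy_train = pd.DataFrame({
    "question": ["q1", "q2", "q3"],
    "episode": [np.array([10, 11]), np.array([11]), np.array([11, 12])],
})

# Flatten the per-row arrays and keep the distinct episode numbers
train_episodes = np.unique(np.hstack(toy_train.episode.values))

# Number of training examples (shots) available in each episode
shots = {
    int(ep): int(toy_train.episode.apply(lambda x: ep in x).sum())
    for ep in train_episodes
}
print(shots)  # {10: 1, 11: 3, 12: 1}
```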

We can use the `get_predictions` function to get model predictions for each task in the correct format:

- `episode`: evaluation episode number
- `shot`: k-shot value, used for evaluation
- `slice`: test slice (perturbation name or dataset name, if original test data was used for evaluation)
- `preds`: a list of model predictions for each test example (or a list of lists in the case of both Ethics tasks)
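
The fields above can be sketched with two hand-written records, one for a multiple-choice task and one for an Ethics task. The prediction values are made up, and using `sit_ethics` as the slice name for the original Ethics test data is an assumption based on the description of `slice`.

```python
import json

import pandas as pd

# One record per (episode, slice) pair; `preds` has one entry per test example
qa_rows = pd.DataFrame([
    {"episode": 4, "shot": 0, "slice": "ru_openbook", "preds": ["A", "C", "B"]},
])

# For the Ethics tasks every entry of `preds` is itself a list of five labels
ethics_rows = pd.DataFrame([
    {"episode": 4, "shot": 0, "slice": "sit_ethics",
     "preds": [[0, 1, 0, 0, 1], [1, 1, 0, 0, 0]]},
])

# Serialize the same way the predictions are saved later in this tutorial
qa_json = qa_rows.to_json(orient="records", force_ascii=False)
ethics_json = ethics_rows.to_json(orient="records", force_ascii=False)

qa_parsed = json.loads(qa_json)
ethics_parsed = json.loads(ethics_json)
print(qa_parsed[0]["preds"])
print(ethics_parsed[0]["preds"][1])
```

With `orient="records"`, each row becomes one JSON object, so the saved file is a list of such records.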

In [10]:
get_predictions('ru_openbook')

| | episode | shot | slice | preds |
|---|---|---|---|---|
| 0 | 4 | 0 | addsent | [C, A, C, C, D, B, D, A, A, C, B, A, D, C, A, ... |
| 1 | 4 | 0 | back_translation | [B, A, A, D, B, A, A, C, A, D, C, D, C, D, B, ... |
| 2 | 4 | 0 | butter_fingers | [C, B, D, C, D, C, B, B, B, B, D, B, D, A, C, ... |
| 3 | 4 | 0 | del | [C, D, C, A, C, D, A, D, B, B, D, C, B, D, C, ... |
| 4 | 4 | 0 | emojify | [B, B, C, C, B, D, A, A, A, D, D, A, D, D, B, ... |
| ... | ... | ... | ... | ... |
| 107 | 19 | 8 | butter_fingers | [C, D, D, C, B, B, D, B, B, B, D, B, C, B, D, ... |
| 108 | 19 | 8 | del | [A, A, D, A, A, A, B, D, C, C, A, B, D, D, A, ... |
| 109 | 19 | 8 | emojify | [D, B, C, C, B, D, B, A, C, A, D, D, B, B, C, ... |
| 110 | 19 | 8 | ru_openbook | [A, A, D, C, D, C, D, B, D, A, B, A, D, D, B, ... |


Now that everything is set up, we can prepare the `sample_submission` model. First, map each task name to the name used for its prediction file:

In [9]:
TASKS = {
    'winograd': 'Winograd',
    'ru_openbook': 'RuOpenBookQA',
    'ru_worldtree': 'RuWorldTree',
    'multiq': 'MultiQ',
    'chegeka': 'CheGeKa',
    'sit_ethics': 'Ethics1',
    'per_ethics': 'Ethics2'
}

Create a folder for the model:

In [7]:
mkdir -p sample_submission/predictions

Predict on tasks and save to the predictions folder:

In [8]:
for task_name in TASKS:
    print(f'Predicting for {TASKS[task_name]}')
    predictions = get_predictions(task_name)

    predictions.to_json(
        f'sample_submission/predictions/{TASKS[task_name]}.json',
        orient='records',
        force_ascii=False
    )
    print()

Predicting for Winograd

Predicting for RuOpenBookQA

Predicting for RuWorldTree

Predicting for MultiQ

Predicting for CheGeKa

Predicting for Ethics1

Predicting for Ethics2

## Including metadata

Each submission must additionally contain a `meta.json` file with general information about the submission:

- `model_name`: name of the solution to appear on the leaderboard
- `author`: author name to appear on the leaderboard (a person or a team)
- `description`: a short description of the solution
- `repository`: link to the reproducible code (if applicable)



In [9]:
metadata = {
    "model_name": "sample_submission",
    "author": "AGI-NLP",
    "description": "sample submission code",
    "repository": "https://github.com/RussianNLP/TAPE"
}

In [10]:
import json

with open('sample_submission/meta.json', 'w', encoding='utf-8') as f:
    json.dump(metadata, fp=f, ensure_ascii=False, indent=4)
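
Before running the official tests, a quick local check of the metadata can catch empty or missing fields. The sketch below is not the check performed by `tests/test_submission.py`; the helper name `missing_meta_keys` is hypothetical, and treating `repository` as optional is an assumption based on the "if applicable" note above.

```python
# Keys assumed to be mandatory; `repository` is treated as optional here
REQUIRED_META_KEYS = ("model_name", "author", "description")


def missing_meta_keys(meta: dict) -> list:
    """Return the required keys that are absent or contain only whitespace."""
    return [
        key for key in REQUIRED_META_KEYS
        if not str(meta.get(key, "")).strip()
    ]


example_meta = {
    "model_name": "sample_submission",
    "author": "AGI-NLP",
    "description": "sample submission code",
    "repository": "https://github.com/RussianNLP/TAPE",
}

print(missing_meta_keys(example_meta))              # []
print(missing_meta_keys({"model_name": "my_model"}))  # ['author', 'description']
```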

## Testing

Run the tests to check file structure:

In [11]:
!pytest tests/test_submission.py --dirpath sample_submission

platform linux -- Python 3.8.16, pytest-7.2.0, pluggy-1.0.0
rootdir: /content/TAPE
plugins: subtests-0.9.0, typeguard-2.7.1
collected 2 items

tests/test_submission.py ,,,,,,.,,,,,,,,,,,,,,.                          [100%]



Since the tests pass, we can make an archive of the folder and send it to tapebenchmark@gmail.com:

In [12]:
import shutil

shutil.make_archive('sample_submission', 'zip', 'sample_submission')

'/content/TAPE/sample_submission.zip'
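
To double-check what ends up in the archive before sending it, the contents can be listed with the standard `zipfile` module. The sketch below builds a stand-in folder in a temporary directory (with placeholder file contents) so that it runs on its own; for a real submission, open the `sample_submission.zip` created above instead.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the real submission folder, with placeholder contents
    root = Path(tmp) / "sample_submission"
    (root / "predictions").mkdir(parents=True)
    (root / "meta.json").write_text("{}", encoding="utf-8")
    (root / "predictions" / "Winograd.json").write_text("[]", encoding="utf-8")

    # Same call as above: archive the contents of the folder
    archive_path = shutil.make_archive(str(root), "zip", root_dir=str(root))

    # List the entries stored in the archive
    with zipfile.ZipFile(archive_path) as archive:
        names = sorted(archive.namelist())

print(names)
```

Because the folder itself is passed as the root directory, `meta.json` and the `predictions/` folder sit at the top level of the archive rather than inside a `sample_submission/` folder.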