# TENET: TExtual traiNing Examples from daTa

Google Colab pipeline for running TENET. Press 'Run all' to execute every cell.

Experiment settings can be edited under Configurations. `active_cfg` holds the configuration that will be run.

## Setup



Run this section to install the project in the local Colab folder.

In [None]:
%cd /content
%rm -rf eurecom-evidence-generator/
!git clone https://github.com/akatief/eurecom-evidence-generator.git
%cd eurecom-evidence-generator
!git checkout origin/develop

In [None]:
!pip install -r requirements.txt
!pip install --no-deps feverous

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## Configurations

Customize your experiment by creating a new config. Note that you must have a valid FEVEROUS database and ToTTo model image on your Google Drive.
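
Because both files must already be on the mounted Drive, it can help to check the paths before starting a long run. The following is a minimal sketch, not part of the project: `missing_paths` is a hypothetical helper, and the example object and its paths are placeholders standing in for a real config.

```python
import os
from types import SimpleNamespace


def missing_paths(config):
    """Return the names of the path settings that do not exist on disk."""
    missing = []
    for name in ("data_path", "model_path"):
        if not os.path.exists(getattr(config, name)):
            missing.append(name)
    return missing


# Hypothetical stand-in for a config object; both paths are placeholders.
example_cfg = SimpleNamespace(
    data_path="/nonexistent/feverous.db",
    model_path=os.getcwd(),  # an existing directory, used only for illustration
)
print(missing_paths(example_cfg))
```

In the notebook, the same check could be applied to `active_cfg` once the Drive is mounted; an empty list would mean both paths were found.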

In [None]:
# Holds the settings for a single experiment run.
class cfg:
  def __init__(self, data_path, model_path, positive_evidence, negative_evidence,
               table_type, wrong_cell, table_per_page, evidence_per_table,
               column_per_table, seed, strat):
    self.data_path = data_path
    self.model_path = model_path
    self.positive_evidence = positive_evidence
    self.negative_evidence = negative_evidence
    self.table_type = table_type
    self.wrong_cell = wrong_cell
    self.table_per_page = table_per_page
    self.evidence_per_table = evidence_per_table
    self.column_per_table = column_per_table
    self.seed = seed
    self.strat = strat

# Set to your model path and desired configuration
cfg1 = cfg(
    data_path='/content/drive/MyDrive/Datasets/filtereddb_st_2.db',
    model_path='/content/drive/MyDrive/Colab Notebooks/exported_totto_large/1648208035/',
    positive_evidence=4000,
    negative_evidence=0,
    table_type='both',
    wrong_cell=0,
    table_per_page=1,
    evidence_per_table=1,
    column_per_table=2,
    seed=2022,
    strat='entity',
)

#cfg2 = ...

In [None]:
active_cfg = cfg1
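
One possible way to create further configurations without retyping every argument is to copy an existing one and override a few settings. The sketch below is an assumption, not part of the project: `with_overrides` is a hypothetical helper, and `base_cfg` is a stand-in object showing only two of the fields of `cfg1`.

```python
import copy
from types import SimpleNamespace


def with_overrides(base, **overrides):
    """Return a shallow copy of `base` with the given attributes replaced."""
    new_cfg = copy.copy(base)
    for name, value in overrides.items():
        # Reject misspelled setting names instead of silently adding them.
        if not hasattr(base, name):
            raise AttributeError(f"unknown setting: {name}")
        setattr(new_cfg, name, value)
    return new_cfg


# Hypothetical stand-in for cfg1; only two of its fields are shown.
base_cfg = SimpleNamespace(seed=2022, column_per_table=2)
variant = with_overrides(base_cfg, seed=2023)
print(base_cfg.seed, variant.seed)
```

The original object is left unchanged, so `active_cfg` can be switched between the base and the variant freely.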

## Experiment

Finally, run the claim generation pipeline. 

In [None]:
%cd /content/eurecom-evidence-generator/

import json
from src.claim import TextualClaim
from src.claim import ToTToGenerator
from src.pipeline import ClaimGeneratorPipeline
from src.evidence.feverous_retriever.random import FeverousRetrieverRandom

retriever = FeverousRetrieverRandom(p_dataset=active_cfg.data_path,
                                    num_positive=active_cfg.positive_evidence,
                                    num_negative=active_cfg.negative_evidence,
                                    table_type=active_cfg.table_type,
                                    wrong_cell=active_cfg.wrong_cell,
                                    table_per_page=active_cfg.table_per_page,
                                    evidence_per_table=active_cfg.evidence_per_table,
                                    column_per_table=active_cfg.column_per_table,
                                    seed=active_cfg.seed,
                                    key_strategy=active_cfg.strat,
                                    )

generator = ToTToGenerator(encoding='totto', model_path=active_cfg.model_path)


pipeline = ClaimGeneratorPipeline([retriever, generator])

claims = pipeline.generate()

In [None]:
json_evidence = TextualClaim.to_json(claims)

%cd /content/drive/MyDrive/
# Count the generated claims per label so the counts appear in the file name.
n_positive = sum(1 for c in claims if c.evidence.label == "SUPPORTS")
n_negative = sum(1 for c in claims if c.evidence.label == "REFUTES")

file_name = './data_'
file_name += f'col_{active_cfg.column_per_table}_'
file_name += f'strategy_{active_cfg.strat}_'
file_name += f'positive_{n_positive}_'
file_name += f'negative_{n_negative}_'
file_name += f'table_type_{active_cfg.table_type}.json'

with open(file_name, 'w', encoding='utf-8') as f:
    json.dump(json_evidence, f, ensure_ascii=False, indent=4)
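
The output file name in the cell above encodes the column count, key strategy, label counts, and table type of the run. As a sketch, the same naming scheme can be wrapped in a function so it is easy to reuse or test; `build_output_name` is a hypothetical helper, and the values passed to it here are illustrative only.

```python
def build_output_name(column_per_table, strat, n_positive, n_negative, table_type):
    """Assemble the output file name from the run settings and label counts."""
    return (
        "./data_"
        f"col_{column_per_table}_"
        f"strategy_{strat}_"
        f"positive_{n_positive}_"
        f"negative_{n_negative}_"
        f"table_type_{table_type}.json"
    )


# Illustrative values only; real counts come from the generated claims.
print(build_output_name(2, "entity", 10, 0, "both"))
# prints ./data_col_2_strategy_entity_positive_10_negative_0_table_type_both.json
```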