## Setup

In [None]:
!pip install pytest transformers sentencepiece tokenizers nltk loguru rouge-score

In [None]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

In [2]:
from google.colab import drive
drive.mount('/content/drive/')

Mounted at /content/drive/


In [1]:
%cd drive/MyDrive/projects/compositional-reasoning-finetuning

/content/drive/MyDrive/projects/compositional-reasoning-finetuning


## Git

In [None]:
!git pull

### Git Push

In [6]:
!git status

On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   Evaluation.ipynb
	modified:   logs/evaluation.log

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	checkpoints/
	models/
	t5-base-train.ipynb

no changes added to commit (use "git add" and/or "git commit -a")


In [None]:
!git config --global user.email "richardmathews.ai@gmail.com"
!git config --global user.name "RichardMathewsII"
!git add Evaluation.ipynb
!git commit -m "finish evaluation layer"

In [None]:
import os
from getpass import getpass

# Read the personal access token without echoing it into the notebook output.
pat = getpass('Enter token: ')

cmd_string = 'git push https://RichardMathewsII:{0}@github.com/RichardMathewsII/compositional-reasoning-finetuning.git'.format(pat)

os.system(cmd_string)
cmd_string, pat = "", ""  # clear the token from both variables

## Pytest

In [None]:
!pytest . -v

platform linux -- Python 3.10.12, pytest-7.2.2, pluggy-1.2.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/drive/MyDrive/W266 Project/compositional-reasoning-finetuning
plugins: anyio-3.7.1
collecting ...

## Evaluation

In [2]:
from evaluation import load_responses, EvaluationConfig
from data_loaders import load_TestData
import json
config = EvaluationConfig(
    "flan-t5-small-direct",
    examplars=False,
    data_path="data/MultihopEvaluation/",
    results_path="results/",
    create_tokenizer=False,
)
responses = load_responses(config)
test_set = load_TestData(config.generate_test_set_file(), n_examples=5)
with open(config.generate_results_file(), "r") as f:
    results = json.load(f)

In [3]:
for idx in range(5):
    print("Example:", idx)
    print(test_set[idx]["prompt"])
    display("Reference text:", test_set[idx]["target"])
    print("True answer:", test_set[idx]["answer"])
    print()
    print("--------")
    print("T5 response")
    display("Response:", responses[idx]["response"])
    print("Answer:", responses[idx]["answer"])
    print("--------")
    print("\n\n")
display("T5 Results")
display(results["macro_results"])

Example: 0
Facts:
Fact #0: (Wojna polsko-ruska) is a 2009 Polish film directed by Xawery Żuławski based on the novel Polish-Russian War under the white-red flag by Dorota Masłowska.
Fact #1: He is the son of actress Małgorzata Braunek and director Andrzej Żuławski.

Question: Who is the mother of the director of film Polish-Russian War (Film)?
Answer:


'Reference text:'

'Małgorzata Braunek'

True answer: Małgorzata Braunek

--------
T5 response


'Response:'

'Magorzata Braunek'

Answer: Magorzata Braunek
--------



Example: 1
Facts:
Fact #0: Blind Shaft is a 2003 film about a pair of brutal con artists operating in the illegal coal mines of present- day northern China.
Fact #1: The Mask of Fu Manchu is a 1932 pre-Code adventure film directed by Charles Brabin.

Question: Which film came out first, Blind Shaft or The Mask Of Fu Manchu?
Answer:


'Reference text:'

'The Mask Of Fu Manchu'

True answer: The Mask Of Fu Manchu

--------
T5 response


'Response:'

'The Mask Of Fu Manchu'

Answer: The Mask Of Fu Manchu
--------



Example: 2
Facts:
Fact #0: John was the second (but eldest surviving) son of Ernest I, Prince of Anhalt-Dessau, by his wife Margarete, daughter of Henry I, Duke of Münsterberg-Oels, and granddaughter of George of Poděbrady, King of Bohemia.
Fact #1: Ernest I, Prince of Anhalt-Dessau (died Dessau, 12 June 1516), was a German prince of the House of Ascania and ruler of the principality of Anhalt-Dessau.

Question: When did John V, Prince Of Anhalt-Zerbst's father die?
Answer:


'Reference text:'

'12 June 1516'

True answer: 12 June 1516

--------
T5 response


'Response:'

'12 June 1516'

Answer: 12 June 1516
--------



Example: 3
Facts:
Fact #0: Wearing Velvet Slippers under a Golden Umbrella (Pronounced as Katipa phanat see shwe htee hsaung) is a 1970 Burmese film directed by Maung Wunna starring Myat Mon, Myat Lay and Thet Naung.
Fact #1: Maung Wunna  was a two-time Myanmar Motion Picture Academy Awards-winning Burmese director and writer.

Question: What is the award that the director of film Wearing Velvet Slippers Under A Golden Umbrella won?
Answer:


'Reference text:'

'Myanmar Motion Picture Academy Awards'

True answer: Myanmar Motion Picture Academy Awards

--------
T5 response


'Response:'

'Film-Fee'

Answer: Film-Fee
--------



Example: 4
Facts:
Fact #0: Ronnie Rocket is an unfinished film project written by David Lynch, who also intended to direct it.
Fact #1: Born to a middle-class family in Missoula, Montana, Lynch spent his childhood traveling around the United States before he studied painting at the Pennsylvania Academy of Fine Arts in Philadelphia, where he first made the transition to producing short films.

Question: Where was the director of film Ronnie Rocket born?
Answer:


'Reference text:'

'Missoula, Montana'

True answer: Missoula, Montana

--------
T5 response


'Response:'

'Missoula, Montana'

Answer: Missoula, Montana
--------





'T5 Results'

{'accuracy': 0.9,
 'F1-1': 0.9,
 'F1-2': 0.8,
 'bleu-1': 0.9,
 'bleu-2': 0.8,
 'rouge-1': 0.9,
 'rouge-2': 0.8,
 'rouge-L': 0.9}
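
The `-1` and `-2` suffixes on the metrics above most likely refer to unigram and bigram variants. The sketch below shows how an n-gram F1 score could be computed under that reading. The helper names `ngrams` and `ngram_f1` and the whitespace tokenization are assumptions for illustration; the repository's `evaluation` module may tokenize and normalize differently.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (as tuples) in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_f1(prediction, reference, n):
    """Hypothetical n-gram F1: harmonic mean of n-gram precision and recall."""
    pred = Counter(ngrams(prediction.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    # Multiset intersection counts each shared n-gram at most min(count) times.
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Exact match on a two-word answer: both unigram and bigram F1 are 1.0.
print(ngram_f1("Missoula, Montana", "Missoula, Montana", 1))
print(ngram_f1("Missoula, Montana", "Missoula, Montana", 2))

# No shared tokens: F1 is 0.0.
print(ngram_f1("Film-Fee", "Myanmar Motion Picture Academy Awards", 1))

# A one-word answer (made-up example) has no bigrams, so bigram F1 is 0.0
# even when the prediction is exactly right.
print(ngram_f1("Paris", "Paris", 2))
```

Under this definition a correct single-token answer scores 1.0 on the unigram metric and 0.0 on the bigram one, which is one possible explanation for per-example rows that are marked correct but have zero bigram scores.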

In [4]:
import pandas as pd

pd.DataFrame(results["micro_results"])

Unnamed: 0,correct,bleu-1,bleu-2,rouge-1,rouge-2,rouge-L,F1-1,F1-2
0,True,1.0,1.0,1.0,1.0,1.0,1.0,1.0
1,True,1.0,1.0,1.0,1.0,1.0,1.0,1.0
2,True,1.0,1.0,1.0,1.0,1.0,1.0,1.0
3,False,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,True,1.0,1.0,1.0,1.0,1.0,1.0,1.0
5,True,1.0,1.0,1.0,1.0,1.0,1.0,1.0
6,True,1.0,1.0,1.0,1.0,1.0,1.0,1.0
7,True,1.0,0.0,1.0,0.0,1.0,1.0,0.0
8,True,1.0,1.0,1.0,1.0,1.0,1.0,1.0
9,True,1.0,1.0,1.0,1.0,1.0,1.0,1.0
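
The aggregate scores in `results["macro_results"]` appear to be the column means of this per-example table. The check below is a minimal sketch under that assumption: the rows are copied by hand from the table, and the mapping of the `correct` column to `accuracy` is inferred from the numbers rather than taken from the `evaluation` module.

```python
# Column order matches the per-example table; "correct" is averaged as accuracy.
columns = ["accuracy", "bleu-1", "bleu-2", "rouge-1",
           "rouge-2", "rouge-L", "F1-1", "F1-2"]

# Eight of the ten rows are perfect; rows 3 and 7 differ.
perfect = (True, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
rows = [perfect] * 10
rows[3] = (False, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
rows[7] = (True, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0)

# Mean of each column over all examples (booleans count as 0.0 or 1.0).
macro = {
    name: round(sum(float(row[i]) for row in rows) / len(rows), 4)
    for i, name in enumerate(columns)
}

for name, value in macro.items():
    print(f"{name}: {value}")
```

The resulting values (0.9 for accuracy, the unigram metrics and rouge-L, 0.8 for the bigram metrics) agree with the reported aggregate scores.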
