# Results Comparison

This notebook is used to:
* inspect a single canto, given a model and a temperature
* read the results of every generation of every model
* find the best performance for each metric (hendecasyllables, rhymes, etc.)
* find the overall best models



## Setup

In [None]:
import pandas as pd
import numpy as np
import json

# our files are stored on GitHub
!git clone "https://github.com/RiccardoCozzi96/DeepComedy"

fatal: destination path 'DeepComedy' already exists and is not an empty directory.


In [None]:
path = "DeepComedy/generated cantos/"
comedy_filename = "DeepComedy/datasets/commedia.txt"
results_filename = "DeepComedy/results.csv"

### Some useful functions

In [None]:
def print_model(model_string, temp=None):
  # the model id encodes its hyperparameters, separated by "_"
  m = model_string.split("_")
  print("\nencoders\t{}\ndecoders\t{}\ndff     \t{}\nd_model \t{}\nheads   \t{}\nprod epochs\t{}\ncomedy epochs\t{}"
        .format(*m[:7]))
  if temp is not None:
    print("temperature\t{}".format(temp))

# accented vowels and their plain counterparts
ACCENT_MAP = str.maketrans("àáèéìíòóùú", "aaeeiioouu")

def replace_tonic_accents(text):
  # strip accents from every character except the last one of each word,
  # so that word-final accents (e.g. "è", "più") are kept
  return " ".join(word[:-1].translate(ACCENT_MAP) + word[-1:] for word in text.split(" "))

def get_text(model, temp, show_tonic_accents=False):
  # each model folder holds a JSON log with one generation per temperature
  with open(f"{path}{model}/LOG_{model}.json", encoding="utf-8") as f:
    text = json.load(f)["generations"]["temp_" + temp]
  if not show_tonic_accents:
    return replace_tonic_accents(text)
  return text
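To make the expected input concrete, here is a minimal, self-contained sketch that writes a toy log with the layout `get_text` reads (`LOG_<model>.json` containing `generations` → `temp_<temperature>`), loads it back, and strips the non-final accents. The layout is inferred from the code above and the real logs may hold more fields; the toy verse, `toy_root`, and the `get_text_from` variant (which takes the root folder explicitly so it does not shadow the notebook's `get_text`) are hypothetical.

```python
import json
import os
import tempfile

# accented vowels and their plain counterparts
ACCENT_MAP = str.maketrans("àáèéìíòóùú", "aaeeiioouu")

def replace_tonic_accents(text):
  # strip accents from every character except the last one of each word
  return " ".join(w[:-1].translate(ACCENT_MAP) + w[-1:] for w in text.split(" "))

def get_text_from(root, model, temp, show_tonic_accents=False):
  # hypothetical variant of get_text with an explicit root folder
  log_path = os.path.join(root, model, f"LOG_{model}.json")
  with open(log_path, encoding="utf-8") as f:
    text = json.load(f)["generations"]["temp_" + temp]
  return text if show_tonic_accents else replace_tonic_accents(text)

# write a toy log with the layout inferred from get_text
toy_root = tempfile.mkdtemp()
toy_model = "5_5_256_512_4_0_70"
os.makedirs(os.path.join(toy_root, toy_model))
with open(os.path.join(toy_root, toy_model, f"LOG_{toy_model}.json"), "w", encoding="utf-8") as f:
  json.dump({"generations": {"temp_0.8": "ànima fùtura è più"}}, f, ensure_ascii=False)

print(get_text_from(toy_root, toy_model, "0.8"))  # prints "anima futura è più"
```

Only accents inside a word are removed: the final "è" and "più" survive, which is what the original character loop does as well.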



## Load data

In [None]:
data = pd.read_csv(results_filename, index_col=["id"]) 

### Ignore unhelpful scores
Some of the metrics we implemented turned out to be uninformative or not accurate enough, so we drop them for now.

In [None]:
# TEMP: rename the hendecasyllable column
data["hendec_ratio"] = data["hendec"]
data = data.drop(columns=["hendec"])

# TEMP: average syllable count, not used for now
# data["hendec_correctness"] = [1 - score for score in data["avg_syl"].values]
data = data.drop(columns=["avg_syl"])

# TEMP: shift the incorrectness score by +1 to obtain a correctness score
data["word_correctness"] = 1 + data["incorr"]
data = data.drop(columns=["incorr"])

# TEMP: metrics ignored for now
data = data.drop(columns=["plagiar", "n_vers"])

data

| id | model | temperature | struct | rhymes | hendec_ratio | word_correctness |
|---|---|---|---|---|---|---|
| 0 | 1_1_256_512_4_0_150 | 0.5 | 1.000 | 0.891 | 0.87 | 0.56 |
| 1 | 1_1_256_512_4_0_150 | 0.6 | 1.000 | 0.891 | 0.90 | 0.57 |
| 2 | 1_1_256_512_4_0_150 | 0.7 | 1.000 | 0.875 | 0.90 | 0.55 |
| 3 | 1_1_256_512_4_0_150 | 0.8 | 1.000 | 0.828 | 0.92 | 0.68 |
| 4 | 1_1_256_512_4_0_150 | 0.9 | 1.000 | 0.891 | 0.90 | 0.49 |
| ... | ... | ... | ... | ... | ... | ... |
| 237 | 7_7_256_512_4_0_150 | 1.1 | 1.000 | 0.000 | 0.89 | 0.72 |
| 238 | 7_7_256_512_4_0_150 | 1.2 | 1.000 | 0.047 | 0.85 | 0.68 |
| 239 | 7_7_256_512_4_0_150 | 1.3 | 1.000 | 0.062 | 0.85 | 0.62 |
| 240 | 7_7_256_512_4_0_150 | 1.4 | 0.917 | 0.031 | 0.85 | 0.65 |
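The `model` column packs all hyperparameters into one underscore-separated string, in the order used by `print_model`. A small sketch, assuming that order holds for every id, splits it into integer columns so results can be grouped by architecture; the helper name `add_hyperparameter_columns` is hypothetical, and the toy rows are copied from the table above.

```python
import pandas as pd

# field order as printed by print_model
FIELDS = ["encoders", "decoders", "dff", "d_model", "heads", "prod_epochs", "comedy_epochs"]

def add_hyperparameter_columns(df):
  # split e.g. "5_5_256_512_4_0_70" into seven integer columns
  parts = df["model"].str.split("_", expand=True).astype(int)
  parts.columns = FIELDS
  return pd.concat([df, parts], axis=1)

# toy rows (ids 0, 240 and 146 of the results)
toy = pd.DataFrame({
  "model": ["1_1_256_512_4_0_150", "7_7_256_512_4_0_150", "5_5_256_512_4_0_70"],
  "temperature": [0.5, 1.4, 0.8],
  "rhymes": [0.891, 0.031, 0.891],
})
toy = add_hyperparameter_columns(toy)

# mean rhyme score per number of encoders
print(toy.groupby("encoders")["rhymes"].mean())
```

On the full `data` frame the same call makes it possible to compare, for instance, 70 against 150 comedy epochs without parsing strings by hand.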


## Inspect a generated canto 

In [None]:
model = "5_5_256_512_4_0_70"
temp = "0.8"
 
# show model info and generated text
print_model(model, temp)
print(get_text(model, temp))
 
# print scores of the selected canto
data[(data["model"] == model) & (data["temperature"] == float(temp))]


encoders	5
decoders	5
dff     	256
d_model 	512
heads   	4
prod epochs	0
comedy epochs	70
temperature	0.8

ahi quanto a dir qual è anima futura
esta l'è a far la costa come malvage
la mente tua nel mal è giù la fura.

grande grossa, e per salir non vage;
che' questa è quel caro è ne la casa
in voi non sia da l'onda è vage.

ma già di sovra li ci e la ripasa,
non ché non facea menare a far mascelle,
ma la qual fummo in su la grave pastasa.

per sette porte, per le braccia volle:
io tendi, e prende come più con vento
le gambe in giu, e dintorno più s'avvolle.

sì com'e' dice; e ne l'aspetto vento,
perche' a lo spirto qui non per novella,
dicendo in troppa vede, e arco somento.

la donna mia, ché prima è cappella,
sì ché rubita non può torre ne' piedi,
se non son io, ché come ti favella.

non vo' pero, se tu vuo' ch'io ricordi
di questa ché sia ch'i' dico sì predi,
com'io dopo lui, se tu vuo' ch'io ricordi.

vuo' ch'i' dissi "a me tu ch'io procedi
per questo mondo e, che' a questo cielo


| id | model | temperature | struct | rhymes | hendec_ratio | word_correctness |
|---|---|---|---|---|---|---|
| 146 | 5_5_256_512_4_0_70 | 0.8 | 1.0 | 0.891 | 0.91 | 0.58 |


## Selecting the best model for each metric


In [None]:
limit = 5  # number of top generations to keep

# every column after "model" and "temperature" is a metric
attributes = data.columns[2:]

# extract the best generations for each metric
bests = {}
best_ids = []
for attribute in attributes:
  bests[attribute] = data.sort_values(by=attribute, ascending=False).head(limit)
  best_ids.extend(bests[attribute].index)

for attribute, top in bests.items():
  print("\n\ntop of {}:\n{}\n".format(attribute, "=" * 80))
  print(top[["model", "temperature", attribute]])




top of struct:

                     model  temperature  struct
id                                             
0      1_1_256_512_4_0_150          0.5     1.0
149     5_5_256_512_4_0_70          1.1     1.0
138  5_3_256_512_4_150_150          1.1     1.0
139  5_3_256_512_4_150_150          1.2     1.0
140  5_3_256_512_4_150_150          1.3     1.0


top of rhymes:

                   model  temperature  rhymes
id                                           
36   1_7_256_512_4_0_150          0.8   0.984
143   5_5_256_512_4_0_70          0.5   0.969
25   1_5_256_512_4_0_150          0.8   0.969
114  5_3_256_512_4_0_150          0.9   0.969
169  5_5_256_512_4_70_70          0.9   0.969


top of hendec_ratio:

                     model  temperature  hendec_ratio
id                                                   
44     3_1_256_512_4_0_150          0.5          1.00
83     3_5_256_512_4_0_150          1.1          0.97
68   3_3_256_512_4_150_150          0.7          0.97
37     1_7_2
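Some metrics saturate: many generations reach `struct = 1.0`, so the top-5 list for that metric is just one pick among tied rows (note the unordered ids 0, 149, 138, 139, 140). A quick sketch to check this counts, for each metric, how many generations share the maximum; `count_ties_at_max` is a hypothetical helper and the toy rows are ids 0, 2, 239 and 240 from the table above.

```python
import pandas as pd

def count_ties_at_max(df, metrics):
  # number of rows that reach the best value of each metric
  return {m: int((df[m] == df[m].max()).sum()) for m in metrics}

toy = pd.DataFrame(
  {"struct": [1.0, 1.0, 1.0, 0.917], "rhymes": [0.891, 0.875, 0.062, 0.031]},
  index=[0, 2, 239, 240],
)
print(count_ties_at_max(toy, ["struct", "rhymes"]))  # {'struct': 3, 'rhymes': 1}
```

When a metric has many ties, passing `kind="mergesort"` (a stable sort) to `sort_values` at least makes the order of tied rows reproducible: they keep their original id order.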

## Find the best generated text by summing all the scores


In [None]:
# the final score is the plain sum of the four metrics we kept
metrics = ["struct", "hendec_ratio", "rhymes", "word_correctness"]
winners = data[["model", "temperature"]].assign(final_score=data[metrics].sum(axis=1))
winners = winners.sort_values(by="final_score", ascending=False).head(limit)
winners

| | model | temperature | final_score |
|---|---|---|---|
| 154 | 5_5_256_512_4_0_150 | 0.5 | 3.532 |
| 143 | 5_5_256_512_4_0_70 | 0.5 | 3.529 |
| 155 | 5_5_256_512_4_0_150 | 0.6 | 3.512 |
| 169 | 5_5_256_512_4_70_70 | 0.9 | 3.509 |
| 25 | 1_5_256_512_4_0_150 | 0.8 | 3.479 |


## And the winners are...


In [None]:
data.loc[winners.index]

| id | model | temperature | struct | rhymes | hendec_ratio | word_correctness |
|---|---|---|---|---|---|---|
| 154 | 5_5_256_512_4_0_150 | 0.5 | 1.0 | 0.922 | 0.96 | 0.65 |
| 143 | 5_5_256_512_4_0_70 | 0.5 | 1.0 | 0.969 | 0.9 | 0.66 |
| 155 | 5_5_256_512_4_0_150 | 0.6 | 1.0 | 0.922 | 0.96 | 0.63 |
| 169 | 5_5_256_512_4_70_70 | 0.9 | 1.0 | 0.969 | 0.93 | 0.61 |
| 25 | 1_5_256_512_4_0_150 | 0.8 | 1.0 | 0.969 | 0.93 | 0.58 |
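The final score is the plain, unweighted sum of the four metrics kept above. Checking it by hand against the two best rows of the table:

$$
\begin{aligned}
S &= \mathrm{struct} + \mathrm{rhymes} + \mathrm{hendec\_ratio} + \mathrm{word\_correctness} \\
S_{154} &= 1.0 + 0.922 + 0.96 + 0.65 = 3.532 \\
S_{143} &= 1.0 + 0.969 + 0.90 + 0.66 = 3.529
\end{aligned}
$$

Since every metric counts equally and `struct` is 1.0 for all five winners, the ranking is decided by rhymes, hendecasyllable ratio and word correctness alone.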


In [None]:
for row_id, row in winners.iterrows():
  model = row["model"]
  temp = str(row["temperature"])

  print("\n{}\nSCORE: {}\n".format("=" * 80, row["final_score"]))
  print_model(model, temp)
  print("\n{}\n{}\n{}\n".format("-" * 40, data.loc[row_id], "-" * 40))


SCORE: 3.532


encoders	5
decoders	5
dff     	256
d_model 	512
heads   	4
prod epochs	0
comedy epochs	150
temperature	0.5

----------------------------------------
model               5_5_256_512_4_0_150
temperature                         0.5
struct                                1
rhymes                            0.922
hendec_ratio                       0.96
word_correctness                   0.65
Name: 154, dtype: object
----------------------------------------


SCORE: 3.529


encoders	5
decoders	5
dff     	256
d_model 	512
heads   	4
prod epochs	0
comedy epochs	70
temperature	0.5

----------------------------------------
model               5_5_256_512_4_0_70
temperature                        0.5
struct                               1
rhymes                           0.969
hendec_ratio                       0.9
word_correctness                  0.66
Name: 143, dtype: object
----------------------------------------


SCORE: 3.512


encoders	5
decoders	5
dff     	256
d_model 	512

# Let's now play a little game, just for fun... 😊

### Get ready for...

---
## OK IL VERSO È GIUSTO! ("OK, the verse is right!")
---



In [192]:
model = "5_5_256_512_4_0_70"
temp = "0.8"

# split both texts into tercets, dropping the first and last chunks
fake_tercets = get_text(model, temp).split("\n\n")[1:-1]
with open(comedy_filename, encoding="utf-8") as f:
  real_tercets = f.read().split("\n\n")[1:-1]
np.random.shuffle(fake_tercets)
np.random.shuffle(real_tercets)

print("\n\nWelcome to...\n")
print("*************************")
print("* OK IL VERSO È GIUSTO! *")
print("*************************\n\n")

# at most 10 rounds, without running out of generated tercets
total = min(10, len(fake_tercets), len(real_tercets))
scores = 0   # +1 per right answer, -1 per wrong one, never below 0
correct = 0  # number of right answers

for i in range(total):
  target = np.random.randint(1, 3)  # 1 = Dante, 2 = AI
  tercet = real_tercets[i] if target == 1 else fake_tercets[i]
  print("-" * 40)
  print(tercet)
  print("-" * 40)

  # ask again until the answer is 1 or 2
  answer = None
  while answer not in (1, 2):
    reply = input("\nWho wrote it?\n  (1) Dante\n  (2) AI\n\n").strip()
    answer = int(reply) if reply in ("1", "2") else None

  if answer == target:
    print("\nCorrect! :)\n")
    scores += 1
    correct += 1
  else:
    print("\nWrong! Better luck with the next one ;)\n")
    scores = max(scores - 1, 0)

  print("=" * 80)

print("\n\tYou guessed {} tercets out of {}\n\n\tYour score is: {}%".format(correct, total, round(100 * scores / total)))
print("\n\n{}".format("*" * 80))
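The game's scoring rule (+1 for a right guess, -1 for a wrong one, never dropping below zero) can be isolated in a pure function, which makes it easy to check without typing answers interactively. A minimal sketch; `play_score` is a hypothetical name and the answer sequences are made up for illustration.

```python
def play_score(answers, targets):
  # replay the game's scoring: +1 on a match, -1 otherwise, floored at 0
  score = 0
  correct = 0
  for answer, target in zip(answers, targets):
    if answer == target:
      score += 1
      correct += 1
    else:
      score = max(score - 1, 0)
  return correct, score

# 1 = Dante, 2 = AI
toy_correct, toy_score = play_score([1, 2, 2, 1], [1, 1, 2, 1])
print(toy_correct, toy_score)  # 3 2
```

Because of the floor at zero, a mistake in the first round costs nothing while the same mistake later costs a point, so the final score depends on the order of the answers and not only on how many were right.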