### Problem statement

Notebook ***"03_clip_bleu_meteor_similarity-scoring_prod.ipynb"***:

- extracts visual relationships as caption seeds from VISREL analysis results for selected images, i.e. either manually annotated images (training dataset images) or never-seen-before images (test dataset images) processed by an object detection R-CNN model,
- post-processes caption-seeds so they conform to the CLIP textual input requirements,
- computes CLIP image-summary embedding similarities for both reference and candidate summaries,
- computes BLEU and METEOR n-gram-based similarities between reference and candidate summaries,
- tabulates the results to ease visual comparison between the scoring schemes.
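
To make "n-gram-based similarity" concrete, the sketch below computes the clipped (modified) n-gram precision on which BLEU is built. It is a minimal, standard-library-only illustration, not the NLTK implementation used later in this notebook. The helper names `ngrams` and `modified_precision` and the toy sentences are assumptions made for the example.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return the contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(references: List[List[str]], candidate: List[str], n: int) -> float:
    """Clipped n-gram precision of one candidate against several references."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, keep the largest count seen in any single reference.
    max_ref_counts: Counter = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    # A candidate n-gram is credited at most as often as it occurs in a reference.
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Toy data (hypothetical): two tokenized references and one degenerate candidate.
refs = [["the", "cat", "is", "on", "the", "mat"],
        ["there", "is", "a", "cat", "on", "the", "mat"]]
cand = ["the"] * 7
unigram_precision = modified_precision(refs, cand, 1)
print(unigram_precision)  # 2/7: "the" is credited only twice
```

BLEU combines such precisions over several n-gram orders and applies a brevity penalty, while METEOR additionally matches stems and synonyms.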

CLIP is the *Contrastive Language-Image Pre-Training* neural network trained on a variety of (image, text) pairs, first reported by Radford et al. (2021) and applied by Hessel et al. (2021).

The CLIP authors claim that, given a digitized image, CLIP can predict the most relevant text snippet for that image without having been specifically fine-tuned for the task, similarly to the zero-shot capabilities of OpenAI's Generative Pre-trained Transformer models, GPT-2 and GPT-3.
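
The scoring idea can be sketched without the model itself: image and text embeddings are L2-normalized, and their dot product (the cosine similarity) ranks the text snippets. The vectors below are hypothetical two-dimensional stand-ins, not actual CLIP features, and the helper names `l2_normalize` and `cosine_similarities` are assumptions made for this illustration.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def cosine_similarities(image_vec: Sequence[float],
                        text_vecs: Sequence[Sequence[float]]) -> List[float]:
    """Cosine similarity between one image embedding and each text embedding."""
    image_unit = l2_normalize(image_vec)
    return [sum(t * i for t, i in zip(l2_normalize(text_vec), image_unit))
            for text_vec in text_vecs]


# Hypothetical embeddings: one image, three candidate text snippets.
image_embedding = [1.0, 0.0]
text_embeddings = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]

scores = cosine_similarities(image_embedding, text_embeddings)
best = scores.index(max(scores))   # index of the best-matching snippet
print(scores, best)
```

The notebook applies the same two steps to actual CLIP features: division by the norm, followed by a matrix product.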
    
#### Sequence of actions:
**A)** Scan a specific directory or zipped archive where '<basename>.{jpg,png}' files are located<BR>
      Run existence checks on files '<basename>.{.img,*.xml,_*_vrd.json}'
    
**B)** Recover caption seeds obtained with .../vis-rel/07_bbx_visrel_rules.ipynb<BR>
      Caption seeds are saved as '<basename>_vrd.json' 

**C)** Implement CLIP to compute and save cosine similarities between images and caption seeds<BR>
      Save result in '../Sgoab/Data/score.tsv'
    
**D)** Implement ROUGE, BLEU, METEOR n-gram based similarity scores
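
Step A can be sketched as follows, assuming the images and their companion files sit side by side in a zipped archive. The function name `missing_companions`, the archive content and the list of companion suffixes are assumptions made for this illustration, not the notebook's actual file-naming rules.

```python
import io
import os
from typing import Dict, Iterable, List
from zipfile import ZipFile

IMAGE_EXTENSIONS = {".jpg", ".png"}


def missing_companions(zipped: ZipFile, suffixes: Iterable[str]) -> Dict[str, List[str]]:
    """Map each image basename to the companion-file suffixes absent from the archive."""
    names = set(zipped.namelist())
    suffixes = list(suffixes)
    report: Dict[str, List[str]] = {}
    for name in sorted(names):
        base, ext = os.path.splitext(name)
        if ext.lower() in IMAGE_EXTENSIONS:
            report[base] = [sfx for sfx in suffixes if base + sfx not in names]
    return report


# Build a tiny in-memory archive to exercise the check (file names are made up).
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    for member in ("img1.jpg", "img1.xml", "img1_train_vrd.json", "img2.png"):
        archive.writestr(member, b"")

with ZipFile(buffer) as zipped:
    report = missing_companions(zipped, (".xml", "_test_vrd.json", "_train_vrd.json"))

for base, missing in report.items():
    print(base, "->", missing or "complete")
```

Only images for which the list of missing files is empty would then be passed on to the scoring steps.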


### Licensing terms and copyright

Sections A and B of this notebook are concerned with recovering/loading image captions (e.g. caption-seeds obtained from the VISREL code). The code used to produce caption seeds can be found [here](https://github.com/Cbhihe/VisRel_caption-seeds). It is protected by the terms and conditions of the GNU_GPL-v3 copyleft license.

     Copyright (C) 2021 Cedric Bhihe

This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or any later version.

In short, THIS PROGRAM IS MADE AVAILABLE OR DISTRIBUTED IN THE HOPE OF IT BEING USEFUL, 
BUT WITHOUT WARRANTY OF ANY KIND, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE.  Look at the GNU General Public 
License for more details on its terms.

You should have received a copy of the GNU General Public License along with this program.
If not, see <https://www.gnu.org/licenses/>, or consult the License.md file in this repo.

=========================================================

Section C of this notebook is concerned with the implementation 
of CLIP, as originally made available by its authors. It is protected by the terms and 
conditions of the MIT license.

    Copyright (C) 2021 OpenAI

Permission is hereby granted, free of charge, to any person obtaining a copy of this 
software and associated documentation files (the "Software"), to deal in the Software 
without restriction, including without limitation the rights to use, copy, modify, merge, 
publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons 
to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or 
substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, 
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

=========================================================

***To contact the repo owner for any issue, please open a new thread under the [Issues] tab on this repo.***

### Guidelines

This iPython notebook must be executed in a Python virtual environment, running Python v3.7.1. This is a prerequisite so the proper versions of Torch 1.7.1+cpu and TorchVision 0.8.2+cpu can be invoked to run the CLIP 1.0 inference engine on test images. A description of installation steps is detailed in notebook '.../Sgoab/Src/Caption_evaluate/01_clip_cpu_classify.ipynb'.
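
A quick way to confirm that the kernel matches these prerequisites is sketched below. It only relies on the standard library and on the `__version__` attribute that Torch and TorchVision expose. The function names `installed_version` and `environment_problems` are assumptions made for this illustration.

```python
import importlib
import sys
from typing import Dict, List, Optional, Sequence


def installed_version(module_name: str) -> Optional[str]:
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def environment_problems(expected_python: Sequence[int] = (3, 7, 1),
                         expected_modules: Optional[Dict[str, str]] = None) -> List[str]:
    """List the mismatches between the running environment and the expected versions."""
    if expected_modules is None:
        # Versions quoted in the guidelines above.
        expected_modules = {"torch": "1.7.1+cpu", "torchvision": "0.8.2+cpu"}
    problems = []
    found_python = tuple(sys.version_info[:3])
    if found_python != tuple(expected_python):
        problems.append(f"python: expected {tuple(expected_python)}, found {found_python}")
    for name, wanted in expected_modules.items():
        found = installed_version(name)
        if found != wanted:
            problems.append(f"{name}: expected {wanted}, found {found}")
    return problems


for problem in environment_problems():
    print(problem)
```

An empty output means that the interpreter and both packages match the versions named above.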

     
### Package requirements

- Install all required packages in the virtual environment directory "/path/to/my_directory", with:
```
    $ cd /path/to/my_directory
    $ python -m pip install -r /dev/stdin <<- 'EOF'
                clip @ git+https://github.com/openai/CLIP.git@04f4dc2ca1ed0acc9893bd1a3b526a7e02c4bb10
                contractions==0.0.58
                Cython==0.29.1
                h5py==2.9.0
                ftfy==5.5.1
                matplotlib==3.0.2
                nltk==3.4
                numpy==1.17.3
                Pillow==8.3.2
                pywsd==1.2.4
                pyyaml==5.1
                regex==2021.8.3
                requests==2.20.1
                scipy==1.2.0
                torch==1.7.1+cpu
                torchaudio==0.7.2
                torchvision==0.8.2+cpu
                tqdm==4.38.0
                typing==3.7.4
                zipfile37==0.1.3
    EOF
```
From within Python, also do:
```
        import nltk
        for resource in ('punkt',                         # required for tokenization
                         'stopwords',
                         'wordnet',                       # required for synsets lookup
                         'wordnet_ic',
                         'averaged_perceptron_tagger',    # required for pos tagging
                         'treebank',
                         'names'):
            nltk.download(resource)                       # one resource id per call
    # or instead:
        nltk.download('popular')
```    

Some specific resources become accessible with the above requirements:

    - n-gram based scoring algorithms:
        nltk.translate.meteor
        nltk.translate.meteor_score
        nltk.translate.bleu_score
        nltk.corpus.wordnet
        nltk.stem.PorterStemmer
        nltk.stem.WordNetLemmatizer
        nltk.stem.api.StemmerI
     
    - Word Sense Disambiguation (WSD):   (use `python -m pip install pywsd==1.0.2`)
        pywsd.lesk.simple_lesk
        pywsd.disambiguate
        pywsd.similarity.max_similarity
        
Jupyter environment requirements include:
```
                ipykernel==6.6.0
                ipython==7.30.1
                ipython_genutils==0.2.0
                ipywidgets==7.6.5
                jupyter_client==7.1.0
                jupyter_core==4.9.1
                nbclient==0.5.9
                nbconvert==6.3.0
                nbformat==5.1.3
                notebook==5.7.4
                traitlets==5.1.1
```
 ... and starting the Jupyter notebook from the system's Jupyter instance with:
```
    $ /usr/bin/jupyter notebook 01_clip_cpu_classify
```
#### Known issues

- Launching the notebook by relying on the local environment's shims, with:
```
    $ jupyter notebook 01_clip_cpu_classify
```
may fail under Pyenv with a "Segmentation fault". It is likely an iPython issue related to Jupyter. To avoid it, launch either notebook from a more recent Python version, and select your custom-built 3.7.1 iPython kernel from the notebook at first launch.

In [None]:
import os, sys, time, csv, json, ast, pickle, re, string, random, faulthandler  # builtins
from typing import Union, Any, List, Tuple, Optional, Iterable, cast
import copy
import multiprocessing as mp
from multiprocessing.pool import ThreadPool

from zipfile import ZipFile
import requests as req    # HTTP client, referred to as 'req' in download_url()

import matplotlib.pyplot as plt
%matplotlib inline

faulthandler.enable()

In [None]:
import contractions
import nltk
from nltk import word_tokenize, pos_tag
from nltk.translate import meteor, meteor_score, bleu_score
#from nltk.corpus import stopwords, WordNetCorpusReader, wordnet
from nltk.corpus import WordNetCorpusReader, wordnet, stopwords

from nltk.stem import WordNetLemmatizer
from nltk.stem.porter import PorterStemmer
from nltk.stem.snowball import SnowballStemmer 
from nltk.stem.api import StemmerI

import numpy as np

# partly maintained; install PyPI.org package as: `python -m pip install pywsd==1.0.2`
from pywsd import disambiguate      
from pywsd.similarity import max_similarity as maxsim
from pywsd.lesk import simple_lesk 
from pywsd.lesk import simple_signature  # read pre-computed signatures per synset
                                         # method name changed to 'cached_signature'
                                         # in later versions
import torch
import clip

from PIL import Image
import IPython.display

In [None]:
## NLTK set-up
for resource in ('punkt',                           # required for tokenization
                 'averaged_perceptron_tagger',      # required for pos tagging
                 'stopwords',
                 'wordnet',                         # required for synsets lookup
                 'wordnet_ic',                      # required for calculating information content similarity
                 'treebank',
                 'names'):
    nltk.download(resource)                         # one resource id per call; 2nd positional arg is 'download_dir'
stop_words = set(stopwords.words('english'))        # English stop-words set

In [None]:
## Import trained CLIP model
localdevice = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=localdevice, jit=False)

#### Methods

In [None]:
def cleanup(text: str, lower=True) -> str:
    '''
    Remove white space at the beginning and end of the sole string argument.
    Convert to lower case by default, unless 'lower' is False.
    
    @type            text: str
    @type           lower: bool
    @param           text: input string to strip of leading and trailing white space
    @param          lower: if True (default), convert string to lower case
    
    @return: output string
    '''
    if lower:
        text = text.lower()
    return text.strip()


def add_end_period(text: str) -> str:
    '''
    Add period at end of string provided no punctuation is there already.
        
    @type            text: str
    @param           text: input string to end with dot
    
    @return: output string
    '''
    text = re.sub(r'([^\.!?:;,]+$)','\\1.',text)
    return text


def download_url(url: Tuple[str,str], my_proxy={ 'http': 'http://127.0.0.2:9151' }) -> None:
    """
    Download function for very large images, using a chunked stream.
     - downloads the object located at 'url' in a single process.
     - uses a default http proxy, which can be overridden or suppressed;
       with 'None', your system default settings are used to connect to the Internet.
     - uses a streaming connection to download large files in chunks of predetermined size.
    
     @type       url: tuple 
     @type  my_proxy: dict or None 
     @param      url: (destination_filename, source_filename); both filenames are fully qualified. 
     @param my_proxy: optional kw arg; default value specific for the system implementation.
                                       Replace default dict value with 'None' if needed.
                                       
     @return: None                                  
    """
    path, url = url
    response = req.get(url, stream = True, allow_redirects=True, proxies=my_proxy)
    with open(path, 'wb') as file:
        for chunk in response.iter_content(chunk_size = 2048):
            if chunk:
                file.write(chunk) 

                
def shorten_string(text: str, context_length: int =64) -> str:
    '''
    - Expand English contracted verbal forms; e.g. "it's" becomes "it is". 
      Contracted possessive forms are kept.
    - If the string has 'context_length' (default: 64) or more NLTK tokens, counting actual words 
      and punctuation symbols, truncate it to its first min(context_length-2, 64) tokens. 
      This leaves room for two place-holders for start-of-string and end-of-string.
    
    @type            text: str
    @type  context_length: int
    @param           text: input string to develop and shorten
    @param context_length: maximum length allowed for output string
    
    @return: output string
    '''
    text = contractions.fix(text)
    if len(word_tokenize(text)) >= context_length:
        text = ' '.join(word_tokenize(text)[0:min(context_length-2,64)])
        text = re.sub(r' (\.|!|,|;)','\\1',text)
        if not re.match(r'.+\.$',text):
            text += '.'
    return text


def clean_visrel_seeds(text: str) -> str:
    '''
    - Clean provided string following regex pattern substitution.
    
    @type            text: str
    @param           text: input string to clean
    
    @return: output string
    '''
    pattern1 = r'\(\(|\)\)'
    pattern2 = r'(\/([a-z_])*)+'
    pattern3 = r'_[0-9]+'
    pattern4 = r'_'
    pattern5 = r'\s*[^.]+\sis\spartial\.*\s*'
    
    text = re.sub(pattern1,'',text)
    text = re.sub(pattern2,'',text)
    text = re.sub(pattern3,'',text)
    text = re.sub(pattern4,' ',text)
    text = re.sub(pattern5,'',text)
    return text


def has_wn_exception(token: str, pos='v') -> str:
    '''
    Return exception as stem per Wordnet dictionary for given POS encoding
        
    @type       token: str
    @type         pos: str
    @param      token: input token to look up
    @param        pos: POS tagging per wordnet classification: 'n','v' (default),'a','r' 
    
    @return: output either the input string or the exception stem if any is found.
    '''
    try:
        return wordnet._exception_map[pos][token][0]
    except KeyError:     # pos not in {'n','v','a','r'}, or no exception listed for token
        return token
    
    
def stage_vocab_dis(summary: Union[str, List[str]], method: str ='hypernym', similarity: str ='res') -> List[str]:
    '''
    Stage-Vocabulary-Disambiguation:
    
    Stage the hypernym based on a Wordnet synset-based information Content similarity score 
    classification of each token present in 'summary'.
    
    @type     summary: string, or list (tokenized sentence)
    @type      method: string
    @type  similarity: string
    @param    summary: textual summary to process
    @param     method: value = 'hyp'; uses contextualized hypernyms (default)
                       value = 'lemma'; uses lemmata, morphological reduction and wordnet
                                exception dictionaries
    @param similarity: abbreviation of the similarity method used to score the disambiguated sense
                       and to return a synset. Similarity methods are one of:
                               path: 'wup' (Wu and Palmer, 1994), 'lch' (Leacock and Chodorow, 1994 & 1998),
                       info content: 'res' (Resnik, 1993 & 1995), 'jcn' (Jiang and Conrath, 1997), 'lin' (Lin, 1998)
                       
    @return: tokenized sentence in the form 'List[str]'
    
    Disregard tokens not composed exclusively of alphabetical characters and stop-words (per NLTK)
    and return them unchanged.
    
    In case of multiple choices among available synsets, disambiguation is performed by maximizing 
    the similarity between the Wordnet definition and the context provided by 'summary'. Once done, 
    disambiguation permits hypernym determination, no matter what the synset tag happens to be. 
    
    To ensure proper pick of noun or verb lemma, use NLTK pos tagging with following classification
    labels:
        CC Coordinating Conjunction
        CD Cardinal Digit
        DT Determiner
        EX Existential There. Example: “there is” (think of it like “there exists”)
        FW Foreign Word.
        IN Preposition/Subordinating Conjunction.
        JJ Adjective.
        JJR Adjective, Comparative.
        JJS Adjective, Superlative.
        LS List Marker 1.
        MD Modal.
        NN Noun, Singular.
        NNS Noun Plural.
        NNP Proper Noun, Singular.
        NNPS Proper Noun, Plural.
        PDT Predeterminer.
        POS Possessive Ending. Example: parent’s
        PRP Personal Pronoun. Examples: I, he, she
        PRP$ Possessive Pronoun. Examples: my, his, hers
        RB Adverb. Examples: very, silently,
        RBR Adverb, Comparative. Example: better
        RBS Adverb, Superlative. Example: best
        RP Particle. Example: give up
        TO to. Example: go ‘to’ the store.
        UH Interjection. Example: errrrrrrrm
        VB Verb, Base Form. Example: take
        VBD Verb, Past Tense. Example: took
        VBG Verb, Gerund/Present Participle. Example: taking
        VBN Verb, Past Participle. Example: taken
        VBP Verb, Sing. Present, non-3rd person. Example: take
        VBZ Verb, 3rd person sing. present. Example: takes
        WDT wh-determiner. Example: which
        WP wh-pronoun. Example: who, what
        WP$ possessive wh-pronoun. Example: whose
        WRB wh-adverb. Example: where, when
        
    Alternative methods:
        - get all wn grandchildren for a hypernym grandparent ?
          https://stackoverflow.com/questions/24217776
        - ...
    '''
    if isinstance(summary,list):
        summary_tokenized = [s.lower() for s in summary]
        summary = ' '.join(summary_tokenized)
        summary = re.sub(r' (\.|!|,|;|:|\?)','\\1',summary)  # suppress blank before punctuation char: .!,;:?
    elif isinstance(summary,str):
        summary = summary.lower()
        summary_tokenized = word_tokenize(summary)
    else:
        raise TypeError("'summary' must be a string or a list of tokens")
    
    if method not in {'hypernym','h','hyp','hype','hyper','lemma','l','lem','lemm'}:
        method = 'hyp'     # default
        
    if similarity not in {'wup','lch','res','jcn','lin'}:
        similarity = 'res' # default
    
    summary_dis_lst = list()
    
    summary_wsd = disambiguate(summary, algorithm=maxsim, similarity_option=similarity, keepLemmas=True)
    
    if method in {'hypernym','h','hyp','hype','hyper'} :
        for _, lmtzd, dis_tok_syn in summary_wsd:
            if dis_tok_syn is None or len(dis_tok_syn.hypernyms()) == 0:
                hypernym = lmtzd
            else:
                try:
                    hypernym = dis_tok_syn.hypernyms()[0].name().split('.')[0]
                except AttributeError:
                    hypernym = lmtzd
            
            summary_dis_lst.append(hypernym)
        
    elif method in {'lemma','l','lem','lemm'}:
        tagging = pos_tag(summary_tokenized)
        tok_stem = list()
        
        for idx, token in enumerate(summary_tokenized):
            not_punctuation = bool(re.match(r'^[a-zA-Z]+$',token))
            not_stop_word = token not in stop_words
            len_synsets = len(wordnet.synsets(token))

            if (not_punctuation and not_stop_word and len_synsets > 0):

                synset_postag = tagging[idx][1]    
                if synset_postag.startswith('V'):
                    synset_postag = wordnet.VERB      # pos('v')
                elif synset_postag.startswith('J'):
                    synset_postag = wordnet.ADJ       # pos('a')
                elif synset_postag.startswith('R'):
                    synset_postag = wordnet.ADV       # pos('r')
                else:
                    # if synset_postag starts with 'N' or anything not in {'V','J','R'}
                    synset_postag = wordnet.NOUN      # pos('n')

         
                cand_synset_lst = [wordnet.synsets(token)[k] for k in range(len_synsets) 
                                   if (wordnet.synsets(token)[k].pos() == synset_postag 
                                       if synset_postag != 'a' 
                                       else wordnet.synsets(token)[k].pos() in {'a','s'}) ]
                
                try:
                    if summary_wsd[idx][2] in cand_synset_lst:
                        try:
                            token_lemmata = summary_wsd[idx][2].lemmas()
                            token_lemma = token_lemmata[0].name()
                        except (AttributeError, IndexError):
                            token_lemma = token # summary_wsd[idx][1]
                    else:
                        token_lemma = summary_wsd[idx][1]
                        
                except AttributeError:
                    token_lemma = summary_wsd[idx][1]
                    
                if token_lemma == token:
                    try:
                        if synset_postag != 'a':
                            stem = wordnet._morphy(token, pos=synset_postag)[0]
                        else:
                            # gather adjective ('a') and satellite adjective ('s') stems for this token only,
                            # so that stems found for earlier tokens do not leak into the selection
                            tok_stem = list(set(wordnet._morphy(token, pos='a') + wordnet._morphy(token, pos='s')))
                            stem = sorted(tok_stem, key=len)[0]   # shortest stem; IndexError if none was found
                    except IndexError:
                        stem = token
                        
                    if stem == token_lemma:
                        token_lemma = has_wn_exception(stem, pos=synset_postag)
            else:
                token_lemma = token
            
            summary_dis_lst.append(token_lemma)
    
    else:
        pass
    
    summary_dis_tokenized = word_tokenize(' '.join(summary_dis_lst))
    
    return summary_dis_tokenized


def split_on_the_dot(annots_refs_in: List[str]) -> List[List[str]]:
    '''
    Split provided list of summaries (as strings) on the end-of-sentence period, 
    whenever a summary is made of more than one sentence.
    
    @type     annots_refs_in: list of strings
    @param    annots_refs_in: textual summaries to be processed

    @return:  list of lists of strings; summaries' component sentences as individual 
              component summaries, ordered as input list was.
    '''
    annots_refs_split = list()
    
    annots_refs_in = [re.sub(r'\.$','',x) for x in annots_refs_in] 
    annots_refs_in = [s.split('.') for s in annots_refs_in]

    for subs in annots_refs_in:
        subset = list()
        for s in subs:
            s += '.'
            s = re.sub(r'^\s+','',s)
            subset.append(s)
        annots_refs_split.append(subset)
    
    
    return annots_refs_split
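

The effect of the regex clean-up and of the sentence splitting defined above can be checked on a small example. The sketch below restates the five substitution patterns of `clean_visrel_seeds` and the splitting logic of `split_on_the_dot` in condensed, standalone form. The names `clean_seed` and `split_sentences`, as well as the seed string, are hypothetical: the actual format of VISREL caption seeds may differ.

```python
import re
from typing import List

# Same five patterns, in the same order, as in clean_visrel_seeds().
SUBSTITUTIONS = [
    (r'\(\(|\)\)', ''),                        # double parentheses
    (r'(\/([a-z_])*)+', ''),                   # '/label' suffixes
    (r'_[0-9]+', ''),                          # numeric instance suffixes
    (r'_', ' '),                               # remaining underscores
    (r'\s*[^.]+\sis\spartial\.*\s*', ''),      # '<object> is partial.' sentences
]


def clean_seed(text: str) -> str:
    """Apply the substitutions in sequence to one caption seed."""
    for pattern, replacement in SUBSTITUTIONS:
        text = re.sub(pattern, replacement, text)
    return text


def split_sentences(summary: str) -> List[str]:
    """Split one summary on periods and restore each sentence's final period."""
    parts = re.sub(r'\.$', '', summary).split('.')
    return [part.lstrip() + '.' for part in parts]


seed = "((person_1/human is left of dog_12)). car_3 is partial."   # hypothetical seed
cleaned = clean_seed(seed)
sentences = split_sentences("A dog runs. A cat sleeps.")
print(cleaned)
print(sentences)
```

Note that the last pattern drops a whole "is partial" sentence, so a seed made only of such sentences is reduced to an empty string.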


In [None]:
def benchmark_scores(idx: int) -> None:
    '''
    benchmark_scores()::
      - computes:
          - CLIP img-summary cosine similarities for human references and candidate seeds 
          ('test' RCNN+VR and 'train' VisRel)
        
      - computes for both whole and sentence-split summaries:
          - METEOR scores between machine candidates and human references 
          with  stemming and synonymy or lemmatization and synonymy word matches
          - BLEU scores between machine candidates and human references
          with exact or lemmatized  word matches
          
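        Input:
          @param    idx: index of the image in 'C_imgs_preproc' (CLIP-preprocessed images, 
                         as pyTorch tensors) and of its basename in 'scored_files'
          @type     idx: int
          
        Output:
          none; results are appended to lists defined outside the function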
    '''
    
    img = C_imgs_preproc[idx]
    file = scored_files[idx]
    
    img_cands = list()
    img_cands_split = list()
    
    if (file in checklist and
        file + '_test_vrd.json_copy' in zipped.namelist() and
        file + '_train_vrd.json_copy' in zipped.namelist()
       ):
        
        start = time.time()
        #print(f'idx: {idx} and file: {file}')
        
        # ================
        ## CLIP cosine similarities for WHOLE and SPLIT human references
        # ================
        # authors' list for whole refs
        img_refs_authors = [author for author, _ in C_wrk_annots_refs[idx]]
        # authors' list for split refs
        img_refs_split_authors = [author for author,split_nchunk,_ in C_wrk_annots_refs_split[idx] 
                                  for _ in range(split_nchunk)]

        # whole human ref summaries
        img_annots_refs_flat = [shorten_string(annot, context_length=64) for _, annot in C_wrk_annots_refs[idx]]
        #print(f'idx: {idx}\nimg_annots_refs_flat: {img_annots_refs_flat}\n')
        
        # split human ref summaries  ('ref_split_summaries'), keeps authors ordering
        img_annots_refs_split_flat = [shorten_string(split_annot, context_length=64) 
                               for _,_,split_txt in C_wrk_annots_refs_split[idx] 
                               for split_annot in split_txt]
        
        # list of lists of nltk-tokenized for WHOLE and SPLIT references for given img (at index 'idx')
        img_annots_refs_tok = [word_tokenize(ref) for ref in img_annots_refs_flat]
        C_annots_refs_tok.append(img_annots_refs_tok)
        
        img_annots_refs_split_tok = [word_tokenize(ref) for ref in img_annots_refs_split_flat] # flat lst, keeps author order
        C_annots_refs_split_tok.append(img_annots_refs_split_tok)   # ordered first by img idx, then by author 
        
        image_input = torch.tensor(np.stack([img,])).to(localdevice)
        # build flat list of CLIP tokens for WHOLE human ref summaries for image at idx
        img_annots_refs_ctok = clip.tokenize(img_annots_refs_flat).to(localdevice)
        # build flat list of CLIP tokens for SPLIT human ref summaries for image at idx   
        img_annots_refs_split_ctok = clip.tokenize(img_annots_refs_split_flat).to(localdevice)

        with torch.no_grad():
            img_features = model.encode_image(image_input).float().to(localdevice)
            img_annots_refs_features = model.encode_text(img_annots_refs_ctok).float().to(localdevice)
            img_annots_refs_split_features = model.encode_text(img_annots_refs_split_ctok).float().to(localdevice)

        img_features /= img_features.norm(dim=-1, keepdim=True)
        img_annots_refs_features /= img_annots_refs_features.norm(dim=-1, keepdim=True)
        img_annots_refs_split_features /= img_annots_refs_split_features.norm(dim=-1, keepdim=True)

        # WHOLE SUMMARIES
        S_clip_img_refs = img_annots_refs_features.cpu().numpy() @ img_features.cpu().numpy().T
        
        # SPLIT SUMMARIES
        S_clip_img_refs_split = img_annots_refs_split_features.cpu().numpy() @ img_features.cpu().numpy().T
        
        S_clip_img_annots_refs_flat = np.ndarray.flatten(S_clip_img_refs).tolist()
        S_clip_img_annots_refs_split_flat = np.ndarray.flatten(S_clip_img_refs_split).tolist()
        
        S_clip_annots_refs_flat.append(S_clip_img_annots_refs_flat)
        S_clip_annots_refs_split_flat.append(S_clip_img_annots_refs_split_flat)
        
        img_refs_best_simil_idx = S_clip_img_annots_refs_flat.index(max(S_clip_img_annots_refs_flat))   # best human scorer according to CLIP
        img_refs_split_best_simil_idx = S_clip_img_annots_refs_split_flat.index(max(S_clip_img_annots_refs_split_flat)) # best human scorer with CLIP        
        
        # ================
        ## CLIP cosine similarities for candidate seeds ('test' RCNN+VR and 'train' VisRel)
        # ================
        
        ## load candidates 'test' (RCNN+VR) and 'train' (VR)
        vrd_json_files = [f for f in zipped.namelist()
                          if (f.endswith('_vrd.json_copy') and f.startswith(file))
                         ]
        vrd_json_files.sort()
        
        for f in vrd_json_files: 
            # load 'test' candidate summary on 1st for-loop pass;
            # load 'train' candidate summary on 2nd pass
            with zipped.open(f, mode='r') as infile:
                data = infile.read()
            
            data= data.decode('utf8', 'strict')
            data_dict = ast.literal_eval(data)
            annot_dict = data_dict["annot_txt"]
            img_cand_split = [shorten_string(clean_visrel_seeds(seed), context_length=64) 
                              for key in annot_dict 
                              for seed in annot_dict[key]] #, sep=',', end=','
            
            if len(img_cand_split) != 0:
                img_cand_split = [seed for seed in set(img_cand_split) if seed != '']
            else:
                img_cand_split = ['.']
            
            # list of whole cand summaries ('test','train')
            img_cands.append(shorten_string('. '.join(img_cand_split)+'.', context_length=64))
            # list of 2 lists of split cand summary seeds ('test','train'), for one image
            if len(img_cand_split) != 0:
                img_cands_split.append([add_end_period(seed) for seed in img_cand_split if seed not in {'',}])
            else:
                img_cands_split.append(['.'])
        
        # C_cands.append(img_cands) <- [['test' cand, 'train' cand],['test' cand, 'train' cand],...], 1 sublist/image
        C_cands.append(img_cands)     
        C_cands_test.append(img_cands[0])
        C_cands_train.append(img_cands[1])
        
        C_cands_split.append(img_cands_split)
        C_cands_split_test.append(img_cands_split[0])
        C_cands_split_train.append(img_cands_split[1])
        
        
        img_cands_ntok = list(map(lambda x: len(word_tokenize(x)),img_cands))
        C_cands_ntok.append(img_cands_ntok)
        
        img_cands_split_nchunk = list(map(lambda x: len(x),img_cands_split))
        C_cands_split_nchunk.append(img_cands_split_nchunk)
        
        # structured output: list of 2 lists for 'test' and 'train' nltk-tokenized cands respectively
        img_cands_tok = [word_tokenize(seeds) for seeds in img_cands]
        # structured output: list made of 2 lists of lists for 'test' split lists and 'train' split lists
        #+ respectively; each split list contains a variable nbr of lists of nltk-tokenized split seeds
        img_cands_split_tok = [list(map(lambda x: word_tokenize(x),img_cand_split)) 
                               for img_cand_split in img_cands_split]
        
        C_cands_tok.append(img_cands_tok)
        C_cands_test_tok.append(img_cands_tok[0])
        C_cands_train_tok.append(img_cands_tok[1])
        
        C_cands_split_tok.append(img_cands_split_tok)
        C_cands_split_test_tok.append(img_cands_split_tok[0])
        C_cands_split_train_tok.append(img_cands_split_tok[1])


        # ================
        ## compute CLIP cosine similarities for candidate seeds (WHOLE and SPLIT)
        # ================
        img_cands_ctok = clip.tokenize([seeds for seeds in img_cands]).to(localdevice)
        img_cands_split_test_ctok = clip.tokenize(img_cands_split[0]).to(localdevice)
        img_cands_split_train_ctok = clip.tokenize(img_cands_split[1]).to(localdevice)
                                                                             
        with torch.no_grad():
            #image_features = model.encode_image(image_input).float().to(localdevice)
            img_cands_features = model.encode_text(img_cands_ctok).float().to(localdevice)
            img_cands_split_test_features = model.encode_text(img_cands_split_test_ctok).float().to(localdevice)
            img_cands_split_train_features = model.encode_text(img_cands_split_train_ctok).float().to(localdevice)
        
        #image_features /= image_features.norm(dim=-1, keepdim=True)
        img_cands_features /= img_cands_features.norm(dim=-1, keepdim=True)
        img_cands_split_test_features /= img_cands_split_test_features.norm(dim=-1, keepdim=True)
        img_cands_split_train_features /= img_cands_split_train_features.norm(dim=-1, keepdim=True)
        
        S_clip_img_cands = img_cands_features.cpu().numpy() @ img_features.cpu().numpy().T
        S_clip_img_cands_split_test = img_cands_split_test_features.cpu().numpy() @ img_features.cpu().numpy().T
        S_clip_img_cands_split_train = img_cands_split_train_features.cpu().numpy() @ img_features.cpu().numpy().T
        
        
        
        # ================
        ## METEOR scores for caption generation with exact/stem/lemmatized/synonym word matches
        # ================
        # Bib ref.
        # A. Lavie and A. Agarwal, “Meteor: an automatic metric for MT evaluation with high
        # levels of correlation with human judgments,” in Proc. 2nd Workshop on Statistical
        # Machine Translation, Prague, Czech Republic, pp. 228–231 (Jun. 2007).
        
        
        #  ======  METEOR with Porter stemming + synonymy matching
        # WHOLE SUMMARIES
        S_meteor_img_cands_s = list(map(lambda x: meteor_score.meteor_score(img_annots_refs_tok,x,
                                                                             stemmer=PorterStemmer(),
                                                                             wordnet=wordnet),
                                        img_cands_tok))
        
        S_meteor_cands_s_test.append(S_meteor_img_cands_s[0])
        S_meteor_cands_s_train.append(S_meteor_img_cands_s[1])
        
        # SPLIT SUMMARIES
        S_meteor_img_cands_s_split = [list(map(lambda x: meteor_score.meteor_score(img_annots_refs_split_tok,x,
                                                                                   stemmer=PorterStemmer(),
                                                                                   wordnet=wordnet),
                                               img_cand_split_tok))
                                      for img_cand_split_tok in img_cands_split_tok]
        
        S_meteor_img_cands_s_split_test = S_meteor_img_cands_s_split[0]
        S_meteor_img_cands_s_split_train = S_meteor_img_cands_s_split[1]
        
        S_meteor_cands_s_split_test.append(S_meteor_img_cands_s_split_test)
        S_meteor_cands_s_split_train.append(S_meteor_img_cands_s_split_train)
        
        
        #  ======  METEOR with lemmatization (prior to Porter stemming) + synonymy matching
        img_annots_refs_l_tok = [stage_vocab_dis(img_ref_tok) for img_ref_tok in img_annots_refs_tok]
        img_annots_refs_l_split_tok = [stage_vocab_dis(img_ref_split_tok) for img_ref_split_tok in img_annots_refs_split_tok]
        
        img_cands_l_tok = [stage_vocab_dis(img_cand_tok) for img_cand_tok in img_cands_tok]
        img_cands_l_split_tok = [list(map(lambda x: stage_vocab_dis(x),img_cand_split_tok))
                                 for img_cand_split_tok in img_cands_split_tok]
        # same layout as 'img_cands_split_tok': index 0 -> RCNN+VR ('test'), index 1 -> VR ('train')
        img_cands_l_split_test_tok = img_cands_l_split_tok[0]
        img_cands_l_split_train_tok = img_cands_l_split_tok[1]
        
        C_refs_l_tok.append(img_annots_refs_l_tok)        
        C_refs_l_split_tok.append(img_annots_refs_l_split_tok)
        
        C_cands_l_tok.append([img_cands_l_tok[0],img_cands_l_tok[1]])
        C_cands_l_test_tok.append(img_cands_l_tok[0])
        C_cands_l_train_tok.append(img_cands_l_tok[1])
        
        C_cands_l_split_tok.append(img_cands_l_split_tok)
        C_cands_l_split_test_tok.append(img_cands_l_split_test_tok)
        C_cands_l_split_train_tok.append(img_cands_l_split_train_tok)
        
        # WHOLE SUMMARIES
        S_meteor_img_cands_l = list(map(lambda x: meteor_score.meteor_score(img_annots_refs_l_tok,x,
                                                                            stemmer=PorterStemmer(),
                                                                            wordnet=wordnet),
                                        img_cands_l_tok))
        
        S_meteor_cands_l_test.append(S_meteor_img_cands_l[0])
        S_meteor_cands_l_train.append(S_meteor_img_cands_l[1])
                
        # SPLIT SUMMARIES
        S_meteor_img_cands_l_split = [list(map(lambda x: meteor_score.meteor_score(img_annots_refs_l_split_tok,x,
                                                                                   stemmer=PorterStemmer(),
                                                                                   wordnet=wordnet),
                                               img_cand_l_split_tok))
                                      for img_cand_l_split_tok in img_cands_l_split_tok]
        
        S_meteor_img_cands_l_split_test = S_meteor_img_cands_l_split[0]
        S_meteor_img_cands_l_split_train = S_meteor_img_cands_l_split[1]
        
        S_meteor_cands_l_split_test.append(S_meteor_img_cands_l_split_test)
        S_meteor_cands_l_split_train.append(S_meteor_img_cands_l_split_train)
        
        
        
        # ================
        ## ======  BLEU (BiLingual Evaluation Understudy)
        # ================
        # Source code for nltk implementation of BLEU and GLEU scores: 
        #+      https://www.nltk.org/_modules/nltk/translate/bleu_score.html
        
        # WHOLE SUMMARIES
        S_bleu_img_cands_e = list(map(lambda x: bleu_score.sentence_bleu(img_annots_refs_tok,x,
                                                                         weights=ngram_weights,
                                                                         smoothing_function=bleu_smoothing),
                                      img_cands_tok))
        
        S_bleu_cands_e_test.append(S_bleu_img_cands_e[0])
        S_bleu_cands_e_train.append(S_bleu_img_cands_e[1])
        
        # SPLIT SUMMARIES
        S_bleu_img_cands_e_split = [list(map(lambda x: bleu_score.sentence_bleu(img_annots_refs_split_tok,x,
                                                                                weights=ngram_weights,
                                                                                smoothing_function=bleu_smoothing),
                                             img_cand_split_tok))
                                    for img_cand_split_tok in img_cands_split_tok]
        
        S_bleu_cands_e_split_test.append(np.mean(S_bleu_img_cands_e_split[0]))
        S_bleu_cands_e_split_train.append(np.mean(S_bleu_img_cands_e_split[1]))
        
        # WHOLE SUMMARIES (lemmatized tokens)
        S_bleu_img_cands_l = list(map(lambda x: bleu_score.sentence_bleu(img_annots_refs_l_tok,x,
                                                                         weights=ngram_weights,
                                                                         smoothing_function=bleu_smoothing),
                                      img_cands_l_tok))
        
        S_bleu_cands_l_test.append(S_bleu_img_cands_l[0])
        S_bleu_cands_l_train.append(S_bleu_img_cands_l[1])
        
        # SPLIT SUMMARIES (lemmatized tokens)
        S_bleu_img_cands_l_split = [list(map(lambda x: bleu_score.sentence_bleu(img_annots_refs_l_split_tok,x,
                                                                                weights=ngram_weights,
                                                                                smoothing_function=bleu_smoothing),
                                             img_cand_l_split_tok))
                                    for img_cand_l_split_tok in img_cands_l_split_tok]
        
        S_bleu_cands_l_split_test.append(np.mean(S_bleu_img_cands_l_split[0]))
        S_bleu_cands_l_split_train.append(np.mean(S_bleu_img_cands_l_split[1]))  
        
        
        ## Collect CLIP similarity scores of split human refs per image --> simils_img_refs_split
        # Note: 'simils_img_refs_split' is reassigned at every iteration, so after the loop
        #+ it holds the slice that belongs to the last worker only.
        img_wrk_annots_refs_split = C_wrk_annots_refs_split[idx]
        for ii, (_,l,_) in enumerate(img_wrk_annots_refs_split):
            img_split_start = sum([l for _,l,_ in img_wrk_annots_refs_split[:ii]])
            simils_img_refs_split = S_clip_img_annots_refs_split_flat[img_split_start:img_split_start+l]
        
        ## Note: 
        # When CPU-based multiprocessing is used and the per-image scores appended to the lists below
        #+ are accessed by image index later on, the order of append operations must be preserved.
        #+ Affected lists are:
        #+ S_bleu_cands_e_test
        #+ S_bleu_cands_e_train
        #+ S_bleu_cands_e_split_test
        #+ S_bleu_cands_e_split_train
        #+ S_bleu_cands_l_test
        #+ S_bleu_cands_l_train
        #+ S_bleu_cands_l_split_test
        #+ S_bleu_cands_l_split_train
        #+ S_meteor_cands_s_test
        #+ S_meteor_cands_s_train
        #+ S_meteor_cands_s_split_test
        #+ S_meteor_cands_s_split_train
        #+ S_meteor_cands_l_test
        #+ S_meteor_cands_l_train
        #+ S_meteor_cands_l_split_test
        #+ S_meteor_cands_l_split_train
        
        # In that context, to prevent possible race conditions, a lock or semaphore mechanism
        #+ encompassing all append operations as a single block should be implemented.
        
        '''
          File      Cand length |          CLIP             |   METEOR_s   |   METEOR_l   |    BLEU_e   |   BLEU_l
                         Te/Tr  | HRef_mx  RCNN+VR      VR  | RCNN+VR  VR  | RCNN+VR  VR  | RCNN+VR  VR | RCNN+VR  VR
        '''
        print(f'\
{file[0:11]:<9s}... w {img_cands_ntok[0]:>3d}/{img_cands_ntok[1]:<3d}\
{round(100*S_clip_img_annots_refs_flat[img_refs_best_simil_idx],2):>9.2f}%\
{round(100*list(S_clip_img_cands[:,0])[0],2):>8.2f}%\
{round(100*list(S_clip_img_cands[:,0])[1],2):>8.2f}%\
{100*S_meteor_img_cands_s[0]:>9.0f}\
{100*S_meteor_img_cands_s[1]:>5.0f}\
{100*S_meteor_img_cands_l[0]:>9.0f}\
{100*S_meteor_img_cands_l[1]:>5.0f}\
{100*S_bleu_img_cands_e[0]:>9.0f}\
{100*S_bleu_img_cands_e[1]:>5.0f}\
{100*S_bleu_img_cands_l[0]:>9.0f}\
{100*S_bleu_img_cands_l[1]:>5.0f}\n\
WCT:{(time.time()-start)//60:3.0f}:{(time.time()-start)%60:02.0f}{"s":>6s}\
{round(100*np.mean(simils_img_refs_split),2):>17.2f}%\
{round(100*np.mean(S_clip_img_cands_split_test[:,0]),2):>8.2f}%\
{round(100*np.mean(S_clip_img_cands_split_train[:,0]),2):>8.2f}%\
{100*np.mean(S_meteor_img_cands_s_split_test):>9.0f}\
{100*np.mean(S_meteor_img_cands_s_split_train):>5.0f}\
{100*np.mean(S_meteor_img_cands_l_split_test):>9.0f}\
{100*np.mean(S_meteor_img_cands_l_split_train):>5.0f}\
{100*np.mean(S_bleu_img_cands_e_split[0]):>9.0f}\
{100*np.mean(S_bleu_img_cands_e_split[1]):>5.0f}\
{100*np.mean(S_bleu_img_cands_l_split[0]):>9.0f}\
{100*np.mean(S_bleu_img_cands_l_split[1]):>5.0f}')
    
    return
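
The note on race conditions inside `benchmark_scores` can be illustrated with a minimal sketch. It assumes threads rather than processes to stay short; with `multiprocessing`, the lists would also have to be shared between processes, for instance through a `multiprocessing.Manager`. The names `score_image`, `scores_bleu`, `scores_meteor` and `scored_idx` are hypothetical stand-ins and are not defined in this notebook.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for two of the per-image score lists
scores_bleu = []
scores_meteor = []
scored_idx = []          # records which image each row belongs to

append_lock = threading.Lock()

def score_image(idx):
    # Placeholder scores; real values would come from the BLEU/METEOR calls
    bleu, meteor = idx / 10, idx / 20
    # All appends for one image happen as a single locked block,
    # so row k of every list always refers to the same image
    with append_lock:
        scored_idx.append(idx)
        scores_bleu.append(bleu)
        scores_meteor.append(meteor)

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(score_image, range(8)))

# Restore image order afterwards, if index-based access is needed
order = sorted(range(len(scored_idx)), key=scored_idx.__getitem__)
scores_bleu_sorted = [scores_bleu[k] for k in order]
scores_meteor_sorted = [scores_meteor[k] for k in order]
```

The lock alone keeps the lists mutually aligned; it does not guarantee that images are appended in index order, which is why the sketch also records `scored_idx` and sorts at the end.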

In [None]:
# Establish minimal runtime directory structure
data_dir = 'Data_git/'
annot_dir = data_dir + 'Annots/'

# Load zipped crowdsourced files
infile = data_dir + 'crowd_set.zip'
with ZipFile(infile,'r') as zipped:
    img_lst = ['.'.join(f.split(sep='.')[:-1]) for f in zipped.namelist() if f.endswith(('.png', '.jpg'))]
    img_lst.sort()
    xml_lst = ['_train.'.join(f.split(sep='_train.')[:-1]) for f in zipped.namelist() if f.endswith('_train.xml')]
    xml_lst.sort()

files = [f for f in img_lst if f in xml_lst]

# Load crowdsourced annotations
with open(annot_dir + 'crowd-ref-annots.tsv', 'r', newline='') as infile:
    reader = csv.reader(infile, delimiter='\t')
    header = next(reader)
    ncol = len(header)
    # read all rows at once; reshape keeps a (0, ncol) shape when the file holds no data rows
    ndarr = np.array(list(reader), dtype=str).reshape(-1, ncol)
        
camp_annots_workers = [str(x) for x in list(ndarr[:,1])]
camp_annots_imgs = ['.'.join(str(x).split(sep='.')[:-1]) for x in list(ndarr[:,2])]     # img filename w/ infix
camp_annots_refs = [shorten_string(cleanup(str(x).replace('  ',' '))) for x in list(ndarr[:,5])]
camp_annots_refs_split = split_on_the_dot(camp_annots_refs)
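
The basename matching between image files and `_train.xml` annotation files can be tried on a throwaway archive. The sketch below builds a small zip in memory; all member names are made up for illustration, and a set is used for the membership test, which is an optional variation on the list lookup used in the cell above.

```python
import io
from zipfile import ZipFile

# Build a small in-memory archive with hypothetical member names
buf = io.BytesIO()
with ZipFile(buf, 'w') as zf:
    for name in ('a.jpg', 'a_train.xml', 'b.png',
                 'c.jpg', 'c_train.xml', 'd_train.xml'):
        zf.writestr(name, b'')

with ZipFile(buf, 'r') as zf:
    names = zf.namelist()

# Strip the extension from images and the '_train.xml' suffix from annotations
img_lst = sorted(n.rsplit('.', 1)[0] for n in names if n.endswith(('.png', '.jpg')))
xml_lst = sorted(n[:-len('_train.xml')] for n in names if n.endswith('_train.xml'))

# Keep only images that come with an annotation file
xml_set = set(xml_lst)
files = [f for f in img_lst if f in xml_set]
print(files)   # → ['a', 'c']
```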

In [None]:
## Post process R-CNN + VisRel visual relationships as stored in "*_vrd.json" files:
#+ draw a random sample of at most 24 images to score

infix = ("_train","_test","")

checklist = list(dict.fromkeys(camp_annots_imgs))   # eliminate dupes
max_to_show = min(24,len(files))
prev_state = random.getstate()
random.seed()
f_to_score = random.sample(files,max_to_show)
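
The cell above saves the generator state in `prev_state` before reseeding. A minimal sketch of how such a snapshot could later be restored with `random.setstate` follows; the file names and the fixed seed `1234` are assumptions made for the example only.

```python
import random

files = [f'img_{k:02d}' for k in range(10)]   # hypothetical basenames
max_to_show = min(3, len(files))

random.seed(1234)                 # pretend some upstream code relies on this seed
prev_state = random.getstate()    # snapshot of the generator state
expected_next = random.random()   # what upstream code would draw next
random.setstate(prev_state)       # rewind to the snapshot

random.seed()                     # reseed from OS entropy: sample differs between runs
f_to_score = random.sample(files, max_to_show)

random.setstate(prev_state)       # restore: later draws are unaffected by the sampling
restored_next = random.random()
```

After the restore, `restored_next` equals `expected_next`, so code that depends on the earlier seed behaves as if the sampling had never happened.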

In [None]:
## Retrieve human ref summaries, calculate image/summary similarities with CLIP, 
#+ display results for whole summaries and for summaries split-on-the-dot.

scored_files =  list()
C_imgs_origin =  list()
C_imgs_preproc =  list()

txts =  list()       # list of sublists of strings; each sublist contains the whole human refs
                     #+ for one image; index-ordered per image, then per author.
txts_split =  list() # list of lists of sublists of strings; each sublist contains the human refs
                     #+ split-on-the-dot for one image; index-ordered per image, then per author,
                     #+ then per sub-sentence
C_wrk_annots_refs =  list()
C_wrk_annots_refs_split =  list()

S_clip_refs_flat = list()
S_clip_refs_split_flat = list()

plt.figure(figsize=(10*max_to_show, 4*max_to_show))
plt.rcParams['font.family'] = 'monospace'

start = time.time()

zipped = ZipFile(data_dir + 'crowd_set.zip','r')  # archive handle stays open while images are read
zipinfo = zipped.infolist()

for file in f_to_score:
    str_to_plt = ''
    str_to_print = ''
    
    if (file in checklist and file + '.jpg' in zipped.namelist()):
        infile = zipped.open(file + '.jpg')
        img = Image.open(infile).convert('RGBA')
        img_preproc = preprocess(img)
        
        C_imgs_origin.append(img)
        C_imgs_preproc.append(img_preproc)
        
        scored_files.append(file)
        
        img_wrk_annots_refs = [(camp_annots_workers[idx],camp_annots_refs[idx]) 
                               for idx, pix_name in enumerate(camp_annots_imgs) 
                               if pix_name == file]
        img_wrk_annots_refs_split = [(camp_annots_workers[idx],len(camp_annots_refs_split[idx]),camp_annots_refs_split[idx])
                                     for idx, pix_name in enumerate(camp_annots_imgs)
                                     if pix_name == file]              # Artem's idea
        
        C_wrk_annots_refs.append(img_wrk_annots_refs)                  # list of (_,_) tuples
        C_wrk_annots_refs_split.append(img_wrk_annots_refs_split)      # list of (_,_,_) tuples

        
        #img_refs_lst
        img_refs_flat = [annot for _,annot in img_wrk_annots_refs]
        #img_refs_split_lst
        img_refs_split_flat = [split_annot 
                               for _,_,split_annots in img_wrk_annots_refs_split 
                               for split_annot in split_annots]              # flattened list of split annots for img
        
        txts.append(img_refs_flat)
        txts_split.append([split_annots 
                           for _,_,split_annots in img_wrk_annots_refs_split])  # structured list of lists of split annots for img batch
        
        img_input = torch.tensor(np.stack([img_preproc,])).to(localdevice)  # 'img_preproc' can be numpy array(np.uint8) or PIL img
        img_refs_ctok = clip.tokenize(img_refs_flat).to(localdevice)
        img_refs_split_ctok = clip.tokenize(img_refs_split_flat).to(localdevice)
        img_refs_authors = [author for author, _ in img_wrk_annots_refs]
        
        with torch.no_grad():
            img_features = model.encode_image(img_input).float().to(localdevice)
            img_refs_features = model.encode_text(img_refs_ctok).float().to(localdevice)
            img_refs_split_features = model.encode_text(img_refs_split_ctok).float().to(localdevice)
        
        img_features /= img_features.norm(dim=-1, keepdim=True)
        img_refs_features /= img_refs_features.norm(dim=-1, keepdim=True)
        img_refs_split_features /= img_refs_split_features.norm(dim=-1, keepdim=True)
        
        #similarity
        S_clip_img_refs = img_refs_features.cpu().numpy() @ img_features.cpu().numpy().T
        #similarity_split
        S_clip_img_refs_split = img_refs_split_features.cpu().numpy() @ img_features.cpu().numpy().T
        
        #similarities 
        S_clip_img_refs_flat = S_clip_img_refs.flatten().tolist()
        #similarities_split 
        S_clip_img_refs_split_flat = S_clip_img_refs_split.flatten().tolist()
        
        S_clip_refs_flat.append(S_clip_img_refs_flat)
        S_clip_refs_split_flat.append(S_clip_img_refs_split_flat)
        
        print('\n'+file, len(img_refs_flat), '('+'+'.join([str(subset[1]) 
                                                           for subset in img_wrk_annots_refs_split])+')'
             )
        
        split_start = []
        similarity_split_to_print = []
        
        for idx, (_,l,_) in enumerate(img_wrk_annots_refs_split):
            split_start = sum([l for _,l,_ in img_wrk_annots_refs_split[:idx]])
            similarity_split_to_print = S_clip_img_refs_split[split_start:split_start+l,0].tolist()
            average = round(100*np.mean(similarity_split_to_print),1)
            str_intermed = ';'.join([str(round(100*x,1))+'%' for x in similarity_split_to_print])
            str_to_print = f'{100*S_clip_img_refs[idx,0]:49.2f}% ({str_intermed} Mean:{average}%)'
            print(str_to_print)
        
        for idx, (author, summary) in enumerate(img_wrk_annots_refs):
            str_to_plt += f'\n{author:>8}: {100*S_clip_img_refs[idx,0]:.2f}%: \"{shorten_string(summary, context_length=64)}\"'
        str_to_plt += f'\nAverage cosine similarity (CLIP v1.0): {100*np.mean(S_clip_img_refs_flat):.1f}%'
        
        # prepare a (2*max_to_show) rows x 2 cols grid, where each img uses 4 grid spots
        #+ (eg. 1,2,3,4 then 5,6,7,8, etc.), counting left to right then continuing one grid row down
        plt.subplot(2*max_to_show, 2, (4*(len(C_imgs_preproc)-1)+1, 4*(len(C_imgs_preproc)-1)+4))
        plt.imshow(img)
        plt.text(img.size[0]*1.1, img.size[1], 
                 file + str_to_plt, 
                 fontsize=12)
        plt.subplots_adjust(right=0.75) #,hspace=0.4)  # make room for text
        plt.xticks([])
        plt.yticks([])
    
plt.show()

print(f"Time to complete: {time.time()-start:.1f} s")
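
The two numerical steps of the cell above, cosine similarity as a dot product of L2-normalized vectors and the regrouping of flat split scores per author, can be sketched without CLIP. The vectors and the per-author counts below are toy values chosen for illustration, and `l2_normalize` is a hypothetical helper; the notebook itself works on `torch` tensors and NumPy arrays.

```python
import math
from itertools import accumulate

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Toy 2-d "embeddings": one image, three split reference sentences
img_vec = l2_normalize([3.0, 4.0])
txt_vecs = [l2_normalize(v) for v in ([3.0, 4.0], [4.0, 3.0], [-4.0, 3.0])]

# Once both sides are unit vectors, the dot product is the cosine similarity
sims = [sum(a * b for a, b in zip(t, img_vec)) for t in txt_vecs]

# Regroup the flat list per author: author 0 wrote 2 sentences, author 1 wrote 1
counts = [2, 1]
offsets = [0] + list(accumulate(counts))          # [0, 2, 3]
groups = [sims[s:e] for s, e in zip(offsets, offsets[1:])]
group_means = [sum(g) / len(g) for g in groups]
```

Computing the offsets once with `accumulate` avoids recomputing a partial sum for every author, which is what the `sum(...)` over a growing slice does in the loop above.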

In [None]:
## Generate scores:
# ================
# - CLIP individual candidates' cosine similarities: human, RCNN+VR, VR 
# - METEOR: individual candidates' RCNN+VR, VR vs. human refs.
# - BLEU: individual candidates' and corpus based RCNN+VR, VR vs. human refs.

C_refs_tok = list()
C_refs_split_tok = list()

C_refs_l_tok = list()
C_refs_l_split_tok = list()

C_annots_refs_tok = list()
C_annots_refs_split_tok = list()

C_cands = list()
C_cands_test = list()
C_cands_train = list()

C_cands_split = list()
C_cands_split_test = list()
C_cands_split_train = list()

C_cands_ntok = list()           # nbr of nltk-tokens in whole cands (RCNN+VR 'test', VR 'train')
C_cands_split_nchunk = list()   # nbr of chunks in split cands list (RCNN+VR 'test', VR 'train')

C_cands_tok = list()
C_cands_test_tok = list()
C_cands_train_tok = list()

C_cands_split_tok = list()
C_cands_split_test_tok = list()
C_cands_split_train_tok = list()

C_cands_l_tok = list()
C_cands_l_test_tok = list()
C_cands_l_train_tok = list()

C_cands_l_split_tok = list()
C_cands_l_split_test_tok = list()
C_cands_l_split_train_tok = list()

S_clip_refs_flat = list()
S_clip_annots_refs_flat = list()
S_clip_annots_refs_split_flat = list()

S_meteor_cands_s_test = list()          # Meteor w/ Porter stemmer v1
S_meteor_cands_s_train = list()         #  idem
S_meteor_cands_s_split_test = list()    # Meteor w/ split-on-the-dot and Porter stemmer v1 treatment
S_meteor_cands_s_split_train = list()   #  idem

S_meteor_cands_l_test = list()          # Meteor w/ lemmatization (prior to application of Porter stemmer v1) 
S_meteor_cands_l_train = list()         #  idem
S_meteor_cands_l_split_test = list()    # Meteor w/ split-on-the-dot, lemmatization (prior to application of Porter stemmer v1) 
S_meteor_cands_l_split_train = list()   #  idem

S_bleu_cands_e_test = list()
S_bleu_cands_e_train = list()
S_bleu_cands_e_split_test = list()
S_bleu_cands_e_split_train = list()

S_bleu_cands_l_test = list()
S_bleu_cands_l_train = list()
S_bleu_cands_l_split_test = list()
S_bleu_cands_l_split_train = list()

ngram_max_n = 4
ngram_weights = [0.35, 0.45, 0.1, 0.1]   # default: [1/ngram_max_n for _ in range(ngram_max_n)]
bleu_smoothing = bleu_score.SmoothingFunction().method1     # output 0.08, 0.09

print(f"               VR length |          CLIP             |   METEOR_s  |   METEOR_l  |    BLEU_e   |   BLEU_l")
print(f" File             Te/Tr  | HRef_mx  RCNN+VR     VR   | RCNN+VR  VR | RCNN+VR  VR | RCNN+VR  VR | RCNN+VR  VR")

for idx in range(len(C_imgs_preproc)):
    benchmark_scores(idx)

print(f'\n{"Campaign scores:  BLEU_e  BLEU_l  METEOR_s  METEOR_l"}\n\
{"                 (M-ave) (M-ave)  (M-ave)   (M-ave)"}\n\
{"R-CNN+VR (Te) w":>16s}:  \
{100*np.mean(S_bleu_cands_e_test):4.0f}  \
{100*np.mean(S_bleu_cands_l_test):5.0f}  \
{100*np.mean(S_meteor_cands_s_test):8.0f}  \
{100*np.mean(S_meteor_cands_l_test):7.0f}\n\
{"s":>15s}:  \
{100*np.mean(S_bleu_cands_e_split_test):4.0f}  \
{100*np.mean(S_bleu_cands_l_split_test):5.0f}  \
{100*np.mean([score for scores in S_meteor_cands_s_split_test for score in scores]):8.0f}  \
{100*np.mean([score for scores in S_meteor_cands_l_split_test for score in scores]):7.0f}\n\
{"VR (Tr) w":>16s}:  \
{100*np.mean(S_bleu_cands_e_train):4.0f}  \
{100*np.mean(S_bleu_cands_l_train):5.0f}  \
{100*np.mean(S_meteor_cands_s_train):8.0f}  \
{100*np.mean(S_meteor_cands_l_train):7.0f}\n\
{"s":>15s}:  \
{100*np.mean(S_bleu_cands_e_split_train):4.0f}  \
{100*np.mean(S_bleu_cands_l_split_train):5.0f}  \
{100*np.mean([score for scores in S_meteor_cands_s_split_train for score in scores]):8.0f}  \
{100*np.mean([score for scores in S_meteor_cands_l_split_train for score in scores]):7.0f}')
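
In the campaign summary above, split BLEU scores are averaged per image first and then across images, whereas split METEOR scores are pooled over all chunks before averaging. The sketch below, with made-up chunk scores, shows that the two aggregation schemes generally give different numbers when images have different chunk counts.

```python
from statistics import mean

# Hypothetical split scores: image 0 has two chunks, image 1 has one
per_image_chunk_scores = [[0.2, 0.4], [0.9]]

# Scheme used for split BLEU: mean of per-image means (each image weighs the same)
mean_of_means = mean(mean(scores) for scores in per_image_chunk_scores)

# Scheme used for split METEOR: pooled mean (each chunk weighs the same)
pooled_mean = mean(score for scores in per_image_chunk_scores for score in scores)

print(f'mean of means: {mean_of_means:.2f}, pooled mean: {pooled_mean:.2f}')
```

Here the mean of means is about 0.6 while the pooled mean is about 0.5, so the "s" rows of the BLEU and METEOR columns are not aggregated in a strictly comparable way.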