elliottd/awareness

Awareness Evaluation

This repository implements the Adversarial Awareness Evaluation defined in Elliott (2018).

Dependencies:

  • Python 3.5+
  • scipy
  • numpy
  • seaborn
  • matplotlib

Reproducing Table 1 and Figure 2

  1. Unzip the system output data:
for x in `ls data/`; do cd data/$x/; unzip $x.zip; cd -; done

This produces three new directories inside data: decinit, hierattn, and trgmul. Each directory contains the decoded sentences, the segment-level Meteor scores (.scores), and the segment-level language modelling scores (logps-).
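
Where a POSIX shell is not available, the same unpacking step can be approximated in Python. This is a minimal sketch rather than part of the repository: the function name unzip_system_outputs is hypothetical, and it assumes the layout implied by the loop above, namely one archive called data/<name>/<name>.zip per model directory.

```python
import zipfile
from pathlib import Path


def unzip_system_outputs(data_dir="data"):
    """Extract data/<name>/<name>.zip into data/<name>/ for every model directory.

    Hypothetical helper: the directory layout is an assumption taken from the
    shell loop in the README, not from the repository's own code.
    """
    extracted = []
    for model_dir in sorted(Path(data_dir).iterdir()):
        archive = model_dir / (model_dir.name + ".zip")
        if model_dir.is_dir() and archive.is_file():
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(model_dir)  # same target as `cd data/$x; unzip $x.zip`
            extracted.append(model_dir.name)
    return extracted
```

Calling unzip_system_outputs() from the repository root returns the names of the directories whose archives were unpacked.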

  2. Run the awareness.py script on the .scores files. The output reports the mean Meteor score of the model evaluated with the congruent image data, the mean Meteor score of the model evaluated with the incongruent image data, the average Awareness of the model (Eq. 1), and whether the null hypothesis can be rejected given the results.
python awareness.py --congruent data/decinit/decinit.val.tok.de.congruent.scores \
                    --incongruent data/decinit/decinit.*.random*.scores \
                    --meteor

Mean congruent score: 0.5852
Mean incongruent score: 0.5824 +- 0.00043

Average awareness: 0.0028 +- 0.00043
Fisher's method Chi-Squared = 32.79, p=0.0003
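
The numbers above can be related to each other with a short sketch. It is not the code in awareness.py. It assumes that a .scores file holds one segment-level score per line, that Awareness (Eq. 1) reduces to the mean per-segment difference between congruent and incongruent scores (which is consistent with 0.5852 - 0.5824 = 0.0028 in the sample output), and that one p-value per shuffle is already available from whichever significance test is used. The names read_scores, awareness, and fisher_combine are hypothetical.

```python
import math
from statistics import mean


def read_scores(path):
    """Read one float per line (assumed layout of a .scores file)."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


def awareness(congruent, incongruent):
    """Mean per-segment difference: congruent score minus incongruent score."""
    if len(congruent) != len(incongruent):
        raise ValueError("score lists must cover the same segments")
    return mean(c - i for c, i in zip(congruent, incongruent))


def fisher_combine(p_values):
    """Combine k p-values, each in (0, 1], with Fisher's method.

    The statistic is chi2 = -2 * sum(ln p) with 2k degrees of freedom.
    For even degrees of freedom the chi-squared survival function has the
    closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!, so no SciPy is needed.
    """
    k = len(p_values)
    chi2 = -2.0 * sum(math.log(p) for p in p_values)
    half = chi2 / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return chi2, math.exp(-half) * total
```

With five shuffles the combined statistic has 10 degrees of freedom, which matches the reported Chi-Squared = 32.79, p=0.0003. scipy.stats.combine_pvalues with method='fisher' computes the same combination.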

  3. Run the violin_plots.py script to generate Figure 2.
python violin_plots.py --model1 data/trgmul/trgmul.val*.scores \
                       --model2 data/decinit/decinit.val*.scores \
                       --model3 data/hierattn/hierattn.val*.scores \
                       --meteor

Figure 2 from the paper

Evaluating your own model

  1. Generate the shuffled image data, for example: python shuffle_images.py --image_order_file val.txt --features val-resnet50-avgpool.npy. This produces five new .npy files, plus text files that record the shuffled order of the images.

  2. Generate translations for each shuffle of the image data using your model.

  3. Score the translations at the sentence level.

  4. Follow the instructions in Step 2 of Reproducing Table 1 and Figure 2 using the scores files from the previous step.
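
The idea behind step 1 can be sketched without the repository's script. The code below is an illustration, not the implementation in shuffle_images.py: the function shuffled_orders is hypothetical, and the requirement that no image keeps its original position (so that every sentence is paired with a different image) is an assumption about what "incongruent" should mean, not something the README states.

```python
import random


def shuffled_orders(image_names, n_shuffles=5, seed=0):
    """Return n_shuffles index permutations in which no index stays in place.

    Hypothetical helper. The default of five shuffles mirrors the five .npy
    files mentioned in the README; the no-fixed-point rule is an assumption.
    """
    n = len(image_names)
    if n < 2:
        raise ValueError("need at least two images to build an incongruent order")
    rng = random.Random(seed)
    orders = []
    for _ in range(n_shuffles):
        while True:
            perm = list(range(n))
            rng.shuffle(perm)
            # Retry until every position receives a different image.
            if all(i != p for i, p in enumerate(perm)):
                break
        orders.append(perm)
    return orders
```

Given a NumPy feature matrix features with one row per image, features[perm] reorders the rows for one shuffle, and [image_names[p] for p in perm] gives the matching order file.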
