This repository contains all the code and data necessary to replicate the results of the related dissertation. The custom test code and the translated test set data are directly accessible on this GitHub page. The rest of the data, including the trained multimodal models, is linked below.
All multimodal models used for testing can be found in the following Google Drive folder: https://drive.google.com/drive/folders/14HsxuSsxQqEAi_5bBk46Fkst1TAnznfK?usp=sharing
This folder includes every tested combination at every tested number of Imagined Embedding Learning Epochs.
Both test corpora are available in the Word Relatedness folder, alongside the test code.
The translated MEN corpus is a translation of the original English-language MEN dataset, so if you use the translation, please cite the original as well. The original English version is available here and should be cited as:
E. Bruni, N. K. Tran and M. Baroni. "Multimodal Distributional Semantics". Journal of Artificial Intelligence Research, vol. 49, 2014, pp. 1-47.
The GeoSim test corpus is also available here, on the original project's GitHub page, and should be cited as:
Diogo Gomes, Fabio Cordeiro, Bernardo Consoli, Nikolas Santos, Viviane Moreira, Renata Vieira, Silvia Moraes, Alexandre Evsukoff. "Portuguese Word Embeddings for the Oil and Gas Industry: Development and Evaluation". Computers in Industry, vol. 124, 2021, p. 103347. doi:10.1016/j.compind.2020.103347. http://www.sciencedirect.com/science/article/pii/S0166361520305819
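For readers unfamiliar with word relatedness evaluation, the following sketch shows how such a test is commonly scored. The cosine similarity of each word pair is compared with its human rating, and agreement is reported as Spearman's rank correlation. This is a minimal, stdlib-only illustration, not the repository's test code. The in-memory inputs (a `dict` mapping words to vectors and a list of `(word1, word2, score)` tuples) and all function names are hypothetical, and the actual test code may differ, for example in its correlation measure or its handling of out-of-vocabulary words.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs: List[float], ys: List[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def evaluate_relatedness(
    embeddings: Dict[str, Vector],
    pairs: List[Tuple[str, str, float]],
) -> Tuple[float, int]:
    """Return (Spearman rho, number of pairs scored).

    Pairs containing an out-of-vocabulary word are skipped (an assumption).
    """
    model_scores: List[float] = []
    gold_scores: List[float] = []
    for w1, w2, gold in pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            gold_scores.append(gold)
    return spearman(model_scores, gold_scores), len(gold_scores)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from MEN or GeoSim.
    toy_embeddings = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
    toy_pairs = [("cat", "dog", 9.0), ("cat", "car", 2.0),
                 ("dog", "car", 3.0), ("cat", "unicorn", 5.0)]
    print(evaluate_relatedness(toy_embeddings, toy_pairs))
```

Reporting the number of scored pairs alongside rho matters when comparing models, since models with different vocabularies may skip different pairs and would then be evaluated on different subsets of the corpus.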
The Analogy Prediction test set was made available alongside NILC's embedding repository as a means of intrinsically evaluating the embeddings. It can be found here. The code to run the tests is located alongside the test set.
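As an illustration of what an analogy prediction test measures, the following sketch uses the widely used 3CosAdd method. For a question "a is to b as c is to d", it predicts the vocabulary word whose vector is most cosine-similar to b - a + c, excluding the three query words. This is an assumption about the scoring method, not a description of NILC's scripts, which may use a different method or different tie and out-of-vocabulary handling. The function names and the in-memory `dict` of vectors are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def solve_analogy(embeddings: Dict[str, Vector], a: str, b: str, c: str) -> Optional[str]:
    """3CosAdd: answer 'a is to b as c is to ?'.

    Returns the word maximizing cos(word, b - a + c), excluding a, b and c,
    or None if any query word is out of vocabulary.
    """
    if not all(w in embeddings for w in (a, b, c)):
        return None
    va, vb, vc = (normalize(embeddings[w]) for w in (a, b, c))
    target = normalize([y - x + z for x, y, z in zip(va, vb, vc)])
    best_word, best_score = None, -math.inf
    for word, vec in embeddings.items():
        if word in (a, b, c):
            continue
        # Dot product of unit vectors equals cosine similarity.
        score = sum(t * u for t, u in zip(target, normalize(vec)))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


def analogy_accuracy(
    embeddings: Dict[str, Vector],
    questions: List[Tuple[str, str, str, str]],
) -> float:
    """Fraction of (a, b, c, d) questions answered with d; OOV questions count as wrong."""
    correct = sum(1 for a, b, c, d in questions if solve_analogy(embeddings, a, b, c) == d)
    return correct / len(questions)


if __name__ == "__main__":
    # Toy vectors for illustration only.
    toy = {"man": [1.0, 0.0, 0.0], "woman": [1.0, 1.0, 0.0],
           "king": [1.0, 0.0, 1.0], "queen": [1.0, 1.0, 1.0], "car": [0.0, 0.0, 1.0]}
    print(solve_analogy(toy, "man", "woman", "king"))
```

Counting out-of-vocabulary questions as wrong, rather than skipping them, is one possible design choice. It penalizes small vocabularies, so the convention should match whatever the original test scripts do before results are compared.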
The Semantic Similarity test set was also developed by NILC, and is available here. The code to evaluate this test set can be found here.
The Named Entity Recognition (NER) corpora are HAREM 1, MiniHarem and GeoCorpus 3.0. A NER neural network was trained on HAREM 1 and tested on MiniHarem, while GeoCorpus 3.0 was evaluated with 10-fold cross-validation.
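To make the 10-fold cross-validation setup concrete, the following sketch shows one way to split a corpus into ten folds. Each fold serves once as the test set while the remaining nine are used for training, and the ten scores are averaged. It is a generic illustration under stated assumptions, not the dissertation's NER code. Splitting at the level of whole items (e.g. sentences), shuffling with a fixed seed, and the hypothetical `train_and_score(train, test)` callback are all assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(items: Sequence[T], k: int = 10, seed: int = 0) -> List[Tuple[List[T], List[T]]]:
    """Split items into k folds of near-equal size and return one (train, test) pair per fold.

    Items are shuffled with a fixed seed (an assumption); every item appears
    in exactly one test fold.
    """
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    # The first len(items) % k folds receive one extra item.
    fold_sizes = [len(items) // k + (1 if i < len(items) % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test_idx = set(indices[start:start + size])
        start += size
        train = [items[i] for i in range(len(items)) if i not in test_idx]
        test = [items[i] for i in range(len(items)) if i in test_idx]
        splits.append((train, test))
    return splits


def cross_validate(
    items: Sequence[T],
    train_and_score: Callable[[List[T], List[T]], float],
    k: int = 10,
) -> float:
    """Average the score returned by the hypothetical train_and_score(train, test) over k folds."""
    scores = [train_and_score(train, test) for train, test in k_fold_splits(items, k)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy "corpus" of 25 items; the dummy scorer just reports the test-fold size.
    print(cross_validate(list(range(25)), lambda train, test: float(len(test))))
```

Fixing the shuffle seed keeps the folds reproducible, so that different embedding models can be compared on identical splits.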
Both HAREM 1 and MiniHarem can be found here. GeoCorpus 3.0 can be found here.
The code used for running the NER test can be found here.
TBD