Skip to content

bsconsoli/Enriching-Portuguese-Word-Embeddings-with-Visual-Information

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 

Repository files navigation

# Enriching Portuguese Word Embeddings with Visual Information

This repository contains all the code and data necessary to replicate the results of related dissertation. The custom test code and translated test set data are directly accessible through this GitHub page. The rest of the data is linked to below, including the trained multimodal models.

Pre-Trained Multimodal Models

All multimodal models used for testing can be found in the following Google Drive folder: https://drive.google.com/drive/folders/14HsxuSsxQqEAi_5bBk46Fkst1TAnznfK?usp=sharing

This folder includes all tested combinations, at all tested Imagined Embedding Learning Epochs.

Word Relatedness Tests

Both test corpora are available in the Word Relatedness folder, alongside the test code.

The MEN translated corpus was originally obtained in the English language, and the original should also be cited in case you use the translation. The original English version is available here, and should be cited as:

Multimodal Distributional Semantics E. Bruni, N. K. Tran and M. Baroni. Journal of Artificial Intelligence Research 49: 1-47.

The GeoSim test corpus is also available here, in the original project's GitHub page, and should be cited as:

Portuguese Word Embeddings for the Oil and Gas Industry: development and evaluation Diogo Gomes, Fabio Cordeiro, Bernardo Consoli, Nikolas Santos, Viviane Moreira, Renata Vieira, Silvia Moraes, Alexandre Evsukoff. "Portuguese Word Embeddings for the Oil and Gas Industry: Development and Evaluation". Computers in Industry, vol. 124, 2021, p. 103347. doi:10.1016/j.compind.2020.103347. http://www.sciencedirect.com/science/article/pii/S0166361520305819

Analogy Prediction Tests

The Analogy Prediction test set was made available alongside NILC's Embedding repository, as a way to intrinsically test the embeddings. It can be found here. The code to run the tests can be found alongside the test sets.

Sentence Similarity Tests

The Semantic Similarity test set was also developed by NILC, and is available here. The code to evaluate this test set can be found here.

Named Entity Recognition Tests

The Named Entity Recognition corpora are HAREM 1, MiniHarem and GeoCorpus 3.0. MiniHarem was used as the test set for a NER NN which was trained with HAREM 1, while GeoCorpus 3.0 was 10-Fold Cross-Validated.

Both HAREM 1 and MiniHarem can be found here. GeoCorpus 3.0 can be found here.

The code used for running the NER test can be found here.

Citing this work

TBD

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages