cu-mkp/word-embeddings

word-embeddings

Goal

Using contextual word embeddings from BERT, measure the similarity between relevant vocabulary words as they are used under different subject tags. I will identify a subset of phrases or tags in both the French text and its English translation. With these contextual embeddings, I would like to determine whether translation loss can be measured with word embeddings and, if so, what loss occurred in the translation of this manuscript.
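
As a rough illustration of the comparison described above, the sketch below scores two embedding vectors with cosine similarity and treats one minus that score as a distance. Cosine similarity is an assumption here, since the project has not fixed a metric, and the vectors are made-up toy values standing in for BERT outputs. The names `cosine_similarity` and `embedding_distance` are hypothetical. Comparing a French word with its English counterpart this way also assumes both vectors come from the same (for example, multilingual) model.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def embedding_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Hypothetical distance score: 0 means same direction, larger means less similar."""
    return 1.0 - cosine_similarity(u, v)


# Toy 4-dimensional vectors (invented values, not real embeddings).
french_vec = [0.2, 0.7, 0.1, 0.5]
english_vec = [0.25, 0.6, 0.0, 0.55]
print(round(cosine_similarity(french_vec, english_vec), 3))
print(round(embedding_distance(french_vec, english_vec), 3))
```

Whether such a distance actually reflects translation loss is the open question of the project; the code only shows how the number would be computed.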

Benefit to M&K

Given the effort put into the translation process, and specifically into maintaining the translator’s voice, this project offers a potential metric for semantic faithfulness. The intratext vocabulary comparisons will add further insight into the author-practitioner’s lexicon in different contexts.

Next Steps

Preliminary background research:

  • Review the existing literature on BERT and translation loss
  • Review the manuscript and its translations/editions

Technological workflow:

  • Google Colab with the Hugging Face Transformers library for BERT
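
One way this workflow might look in a Colab notebook is sketched below: it extracts a contextual vector for a single word by averaging the last-layer hidden states of that word's subword tokens. The checkpoint `bert-base-multilingual-cased`, the mean-pooling choice, and the helper names `find_span` and `word_embedding` are all assumptions for illustration; the project has not yet chosen a model or pooling strategy.

```python
from typing import List, Optional, Tuple

# Assumption: a multilingual checkpoint, so French and English share one model.
MODEL_NAME = "bert-base-multilingual-cased"


def find_span(tokens: List[str], target: List[str]) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first occurrence of `target` in `tokens` (end exclusive)."""
    if not target:
        return None
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            return i, i + n
    return None


def word_embedding(sentence: str, word: str, model_name: str = MODEL_NAME):
    """Mean of the last-layer hidden states over the subword tokens of `word`."""
    # Imported lazily so find_span can be used without these packages installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(sentence, return_tensors="pt")
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())

    # Locate the word's subword pieces inside the tokenized sentence.
    span = find_span(tokens, tokenizer.tokenize(word))
    if span is None:
        raise ValueError(f"{word!r} not found in sentence")

    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # shape: (seq_len, hidden_size)
    return hidden[span[0]:span[1]].mean(dim=0)
```

Calling `word_embedding(sentence, word)` downloads the checkpoint on first use, and it reloads the model on every call, so a real notebook would likely load the tokenizer and model once and reuse them. Only the first occurrence of the word in the sentence is used.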

Manuscript object:

  • Determine what preprocessing the text needs
  • Narrow down the examples from the manuscript to work with
  • Carry out some textual analysis
  • Develop more formal hypotheses about translation loss
