Semi Supervised Word Sense Disambiguation

This project implements Word Sense Disambiguation (WSD) and Word Sense Induction (WSI). It compares the two approaches to disambiguating words in context.

Link to our project on GitHub

How to use

Every feature is available through a terminal command. Before you start, make sure to add the following file and folders to the parent directory of our project:

The file containing FastText embeddings: It is called cc.fr.300.bin and can be downloaded here. Just search for French and click on bin.
The two folders with the trained models and the trained clusters: The folder containing the classification models is named trained_models and that for the clustering models is named trained_kmeans. Both are included in the zip file we handed in.
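
Before running any command, it can help to verify that these prerequisites are in place. The following is a minimal sketch and not part of the project: the helper name `missing_prerequisites` and the choice to pass the directory explicitly are assumptions made for illustration. Only the three file and folder names come from the list above.

```python
from pathlib import Path

# Names taken from the list above: one file and two folders.
REQUIRED_FILE = "cc.fr.300.bin"
REQUIRED_DIRS = ("trained_models", "trained_kmeans")


def missing_prerequisites(base_dir):
    """Return the names of required entries that are absent from base_dir.

    Hypothetical helper for illustration; the project may check differently.
    """
    base = Path(base_dir)
    missing = []
    if not (base / REQUIRED_FILE).is_file():
        missing.append(REQUIRED_FILE)
    for name in REQUIRED_DIRS:
        if not (base / name).is_dir():
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumption: the script runs from inside the project folder,
    # so ".." is the parent directory mentioned above.
    for name in missing_prerequisites(".."):
        print(f"Missing: {name}")
```

An empty result means the embeddings file and both model folders were found in the directory that was checked.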

With this done, all you need are the terminal commands. Here is an overview of all the options:

Command	Action
help, -h	Display the online help we provided
--wsi, -i	Show our clustering results for the different embeddings
--wsd, -d	Show our classification results for the different embeddings and the split method. WARNING: This takes around 1h to execute!
--compare, -c	Show the clustering results and the classification results to compare them.
--decrease, -dv	Show the results of the classification with a decreasing number of training examples. WARNING: This takes around 1h to execute!
--increase, -ic	Show how many examples must be added as constraints for KMeans to achieve better quality than a WSD classifier
--verbs, -v	Show all available verbs we trained our models on. There are 66 verbs in total.
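
To make the table concrete, here is a sketch of how options like these could be declared with Python's standard `argparse` module. It is an illustration under assumptions, not the project's actual `main.py`: the function name `build_parser` and the help strings are invented here, and each option is assumed to be a simple on/off switch because the table lists no arguments. `argparse` adds `-h` on its own.

```python
import argparse


def build_parser():
    """Build a parser mirroring the option table (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Semi Supervised Word Sense Disambiguation"
    )
    # Assumption: every option is a plain flag that takes no value.
    parser.add_argument("--wsi", "-i", action="store_true",
                        help="show clustering results")
    parser.add_argument("--wsd", "-d", action="store_true",
                        help="show classification results (slow)")
    parser.add_argument("--compare", "-c", action="store_true",
                        help="show clustering and classification results")
    parser.add_argument("--decrease", "-dv", action="store_true",
                        help="classify with fewer training examples (slow)")
    parser.add_argument("--increase", "-ic", action="store_true",
                        help="add constraints to KMeans")
    parser.add_argument("--verbs", "-v", action="store_true",
                        help="list the available verbs")
    return parser


# Parse a sample command line instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-dv"])
print(args.decrease, args.wsd, args.verbs)
```

Multi-letter short options such as `-dv` and `-ic` are matched as whole strings before `argparse` tries to read them as bundled single-letter flags, so they can coexist with `-d`, `-v` and `-i`.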

The next three options have to be used together. With them, you can provide the model with a sentence and a lemma and it predicts the lemma's sense used in the sentence:

Command	Action
--sentence, -s	Provide the sentence to be evaluated, enclosed in quotes
--lemma, -l	Provide the lemma that should be looked at in the sentence. This is used to select the right pretrained model
--mode, -m	Provide either "wsd" or "wsi" depending on which method you want to use to predict the sense. The default mode is WSI.
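
The interplay of these three options can be sketched as follows. This is a hedged illustration rather than the project's code: the function name `parse_prediction_args` is hypothetical, the sample sentence and lemma are placeholders that may not be among the trained verbs, and we assume "used together" means that a sentence and a lemma must always accompany each other while the mode falls back to its default.

```python
import argparse


def parse_prediction_args(argv):
    """Parse the three prediction options and check they are used together."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--sentence", "-s", help="sentence to evaluate, in quotes")
    parser.add_argument("--lemma", "-l", help="lemma to look at in the sentence")
    # The table above states that WSI is the default mode.
    parser.add_argument("--mode", "-m", choices=["wsd", "wsi"], default="wsi")
    args = parser.parse_args(argv)
    # Assumption: a sentence without a lemma (or the reverse) is rejected,
    # because the lemma is what selects the pretrained model.
    if (args.sentence is None) != (args.lemma is None):
        parser.error("--sentence and --lemma must be provided together")
    return args


# Placeholder input for illustration only.
args = parse_prediction_args(["-s", "Il aborde la question.", "-l", "aborder"])
print(args.mode)
```

Restricting `--mode` with `choices` makes `argparse` reject any value other than "wsd" or "wsi" before the prediction step is reached.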

Caegi/Semi_Supervised_Word_Sence_Disambiguation

Folders and files

Latest commit

History

Repository files navigation

Semi Supervised Word Sense Disambiguation

How to use

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages