
AlexandreH13/word_similarity_visualization


Word Similarity Visualization

An example of word similarity visualization using a Word2Vec model and networkx.

Libs and tools:


Project Description

This repository provides an example of word similarity visualization. It uses a Word2Vec model to turn each word into a vector, cosine similarity to measure the distance between word vectors, and networkx to plot the word embeddings as a graph.
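For reference, the standard cosine similarity between two word vectors $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$ (with $n$ the embedding dimension) is written below, together with the cosine distance commonly derived from it. The README does not say exactly how the project turns similarity into a distance, so the second formula is one common convention, not necessarily the one used here.

```latex
\cos(\mathbf{u}, \mathbf{v})
  = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
  = \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^2} \, \sqrt{\sum_{i=1}^{n} v_i^2}},
\qquad
d(\mathbf{u}, \mathbf{v}) = 1 - \cos(\mathbf{u}, \mathbf{v})
```

The similarity ranges from -1 to 1 (1 when the vectors point in the same direction), so this distance ranges from 0 to 2.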

The example uses a news dataset: the model is trained on the headlines from three categories, Politics, Entertainment and Travel.
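Word2Vec is trained on tokenized sentences; gensim's `Word2Vec`, for example, takes an iterable of token lists as its `sentences` argument. The sketch below shows one way headlines could be prepared for it. The headlines are made up, and the lowercase-and-keep-letters tokenizer is an assumption, since the repository's actual preprocessing is not described here. The commented-out training call assumes gensim 4.x (older versions name the dimension parameter `size` instead of `vector_size`), and its hyperparameters are illustrative only.

```python
import re

# Hypothetical headlines for illustration; not taken from the project's dataset.
HEADLINES = [
    "Senate passes new budget bill",
    "Actor wins award at film festival",
    "Ten beaches to visit this summer",
]


def tokenize(headline: str) -> list[str]:
    """Lowercase a headline and keep only runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", headline.lower())


# One token list per headline: the corpus shape Word2Vec trains on.
sentences = [tokenize(h) for h in HEADLINES]
print(sentences[0])  # ['senate', 'passes', 'new', 'budget', 'bill']

# Assumed training call (gensim 4.x), illustrative hyperparameters:
# from gensim.models import Word2Vec
# model = Word2Vec(sentences=sentences, vector_size=100, window=5, min_count=1)
```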

The model generates a vector to represent each word, so we can apply methods in the vector space, such as cosine similarity, to find the most similar words. These similarities can then be used to form clusters of words.
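A minimal, dependency-free sketch of finding the most similar words by cosine similarity (dot product divided by the product of the vector norms). The five 3-dimensional vectors are hypothetical toy values rather than output of the project's model, and `most_similar` here is an illustrative helper, not a library function.

```python
import math

# Hypothetical toy vectors; a trained Word2Vec model typically uses
# on the order of 100 dimensions per word.
EMBEDDINGS = {
    "election": [0.9, 0.1, 0.0],
    "senate":   [0.8, 0.2, 0.1],
    "movie":    [0.1, 0.9, 0.1],
    "actor":    [0.2, 0.8, 0.0],
    "beach":    [0.0, 0.1, 0.9],
}


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings, topn=2):
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = embeddings[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


print([w for w, _ in most_similar("election", EMBEDDINGS)])  # ['senate', 'actor']
```

With a trained gensim model, `model.wv.most_similar(word, topn=...)` performs the same kind of ranking over the whole vocabulary.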



Results

Clustering

First, we cluster the word vectors with K-means to see how well the groups (news categories) are separated. To determine the optimal number of clusters, we applied the elbow method. The image below shows that the optimal number of clusters matches the number of categories (three).

[Image: elbow method plot]
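The elbow method can be sketched as follows: run K-means for several values of k, record the inertia (the sum of squared distances from each point to its nearest centroid), and look for the k after which adding clusters barely lowers it. This is a hypothetical, pure-Python stand-in, not the project's code: Lloyd's algorithm with a deterministic farthest-point initialization, run on made-up 2-D points that form three groups. In practice a library implementation such as scikit-learn's `KMeans`, which exposes the same quantity as its `inertia_` attribute after fitting, would usually be used instead.

```python
# Hypothetical 2-D points forming three well-separated groups; real word
# vectors would have many more dimensions.
POINTS = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
    (0.0, 9.0), (0.3, 9.2), (-0.2, 8.8),
]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_farthest(points, k):
    """Deterministic init: start at the first point, then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; returns the final centroids and each point's label."""
    centroids = init_farthest(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return centroids, labels


def inertia(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Elbow method: inertia for k = 1..5; the "elbow" is where the drop flattens.
inertias = {}
for k in range(1, 6):
    centroids, _ = kmeans(POINTS, k)
    inertias[k] = inertia(POINTS, centroids)
    print(k, round(inertias[k], 3))
```

On these toy points the inertia falls steeply up to k = 3 and only marginally afterwards, which is the bend the elbow method looks for.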

To check whether the groups were formed correctly, we take sample words from each category and check whether words from the same category are assigned to the same cluster.

[Image: cluster assignments of the sample words]
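One simple way to quantify this check is to measure, for each category, the share of its sample words that fall into that category's most common cluster. The sketch below is illustrative only: the word lists and cluster labels are made up (with one deliberately misplaced Travel word to show a score below 1.0), and `category_purity` is a hypothetical helper, not part of the repository.

```python
from collections import Counter

# Hypothetical inputs: the cluster label K-means assigned to each sample word,
# and the news category each sample word was taken from.
CLUSTER_OF = {
    "election": 0, "senate": 0, "president": 0,
    "movie": 1, "actor": 1, "album": 1,
    "beach": 2, "hotel": 2, "flight": 1,  # "flight" deliberately misplaced
}
CATEGORY_OF = {
    "election": "Politics", "senate": "Politics", "president": "Politics",
    "movie": "Entertainment", "actor": "Entertainment", "album": "Entertainment",
    "beach": "Travel", "hotel": "Travel", "flight": "Travel",
}


def category_purity(cluster_of, category_of):
    """For each category, return the share of its words that fall in the
    category's most common cluster (1.0 means all words share one cluster)."""
    by_category = {}
    for word, category in category_of.items():
        by_category.setdefault(category, []).append(cluster_of[word])
    purity = {}
    for category, clusters in by_category.items():
        _, count = Counter(clusters).most_common(1)[0]
        purity[category] = count / len(clusters)
    return purity


print(category_purity(CLUSTER_OF, CATEGORY_OF))
```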

Visualization

Finally, we plot the clusters as a graph. As the image below shows, the vertices represent the words in the vocabulary and the edges represent the distance (similarity) between them.

[Image: word similarity graph]
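A dependency-free sketch of how such a graph can be assembled from word vectors: every word becomes a vertex, and an undirected edge weighted by cosine similarity joins each pair of words whose similarity reaches a threshold. The toy vectors and the 0.8 threshold are assumptions; the README does not say how the project decides which pairs to connect.

```python
import math
from itertools import combinations

# Hypothetical toy vectors standing in for trained Word2Vec embeddings.
EMBEDDINGS = {
    "election": [0.9, 0.1, 0.0],
    "senate":   [0.8, 0.2, 0.1],
    "movie":    [0.1, 0.9, 0.1],
    "actor":    [0.2, 0.8, 0.0],
    "beach":    [0.0, 0.1, 0.9],
}
THRESHOLD = 0.8  # assumed cut-off: only pairs at least this similar get an edge


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similarity_graph(embeddings, threshold):
    """Adjacency dict {word: {neighbour: similarity}}; edges are undirected."""
    graph = {word: {} for word in embeddings}  # every word is a vertex
    for a, b in combinations(embeddings, 2):  # each unordered pair once
        sim = cosine_similarity(embeddings[a], embeddings[b])
        if sim >= threshold:
            graph[a][b] = sim
            graph[b][a] = sim
    return graph


graph = similarity_graph(EMBEDDINGS, THRESHOLD)
for word, neighbours in graph.items():
    print(word, sorted(neighbours))
```

To draw the result with networkx, the same edges can be added to an `nx.Graph()` with `G.add_edge(a, b, weight=sim)`; layout functions such as `nx.spring_layout` accept a `weight` argument naming the edge attribute to use.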