- Python: version 3.7.5
- Pandas: version 1.1.1
- Numpy: version 1.17.3
- NLTK: version 3.5
- Gensim: version 3.8.0
- Scikit-learn: version 0.23.2
- Plotly: version 4.14.1
- Networkx: version 2.5
- Yellowbrick: version 1.3
This repository provides an example of word similarity visualization. We use a word2vec model to vectorize each word, cosine similarity to measure the distance between word vectors, and networkx to plot the word embedding as a graph structure.
The example uses a news dataset; the model is trained on headlines from three categories: Politics, Entertainment and Travel.
The model generates a vector to represent each word, so we can apply many methods in the vector space, such as cosine similarity, to obtain the most similar words. This can be used to form clusters of words.
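The similarity computation can be sketched with plain numpy. The three-dimensional vectors below are hypothetical placeholders for real word2vec output (which would be higher-dimensional), chosen only to make the ranking visible:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors (1.0 = same direction)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical low-dimensional vectors standing in for real embeddings.
vectors = {
    "politics": np.array([0.9, 0.1, 0.0]),
    "senate":   np.array([0.8, 0.2, 0.1]),
    "beach":    np.array([0.1, 0.9, 0.2]),
}

def most_similar(word, vectors, topn=2):
    """Rank every other word by cosine similarity to `word`."""
    scores = [(w, cosine_similarity(vectors[word], v))
              for w, v in vectors.items() if w != word]
    return sorted(scores, key=lambda x: x[1], reverse=True)[:topn]
```

With a trained gensim model the same query is `model.wv.most_similar("politics")`.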
First, we feed the word vectors to a K-means method to see how well the groups (news categories) are separated. To determine the optimal number of clusters, the elbow method was applied. We can see in the image below that the optimal number of clusters matches the number of categories.
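The pinned Yellowbrick dependency suggests the repository uses its `KElbowVisualizer` for this plot; a dependency-light sketch of the same idea using only scikit-learn is shown below. The synthetic blobs stand in for the word-vector matrix; with three underlying groups, the inertia (within-cluster sum of squares) drops sharply up to k = 3 and flattens afterwards, which is the "elbow":

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the word-vector matrix: three well-separated
# blobs, mimicking the three news categories.
X, _ = make_blobs(n_samples=150, centers=3, n_features=10, random_state=42)

# Elbow method: fit K-means for each candidate k and record the inertia.
inertias = {k: KMeans(n_clusters=k, random_state=42, n_init=10).fit(X).inertia_
            for k in range(1, 7)}

# The elbow is where the marginal drop in inertia shrinks abruptly;
# here that happens at k = 3.
```

Yellowbrick's `KElbowVisualizer(KMeans(), k=(1, 7))` automates exactly this loop and draws the curve.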
To check whether the groups were formed correctly, we take words from each category and verify that they are assigned to the same cluster.
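This check can be sketched as follows. The word vectors here are hypothetical, hand-built so that word pairs from the same category sit close together; in the repository they would come from the trained word2vec model:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical word vectors: two words per category, clustered tightly.
words = ["senate", "president", "actor", "film", "beach", "flight"]
X = np.array([
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.1],   # politics
    [0.1, 1.0, 0.1], [0.2, 0.9, 0.0],   # entertainment
    [0.0, 0.1, 1.0], [0.1, 0.0, 0.9],   # travel
])

# Cluster with k = 3 (the elbow value) and map each word to its label.
labels = KMeans(n_clusters=3, random_state=0, n_init=10).fit_predict(X)
cluster_of = dict(zip(words, labels))
```

If the embedding captured the categories, words from the same category share a cluster label while words from different categories do not.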
Finally, we plot the clusters using a graph structure. As can be seen in the image below, a graph is created where each vertex represents a word in the vocabulary and each edge represents the distance (similarity) between two words.
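A sketch of building such a graph with networkx is shown below. The vectors and the similarity threshold are illustrative assumptions, not the repository's actual values; the idea is simply to connect word pairs whose cosine similarity exceeds a cutoff:

```python
import itertools
import numpy as np
import networkx as nx

# Hypothetical word vectors; in the repository these come from the model.
vectors = {
    "senate":    np.array([1.0, 0.1, 0.0]),
    "president": np.array([0.9, 0.2, 0.1]),
    "actor":     np.array([0.1, 1.0, 0.1]),
    "film":      np.array([0.2, 0.9, 0.0]),
}

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

G = nx.Graph()
G.add_nodes_from(vectors)
for a, b in itertools.combinations(vectors, 2):
    sim = cos(vectors[a], vectors[b])
    if sim > 0.8:  # keep only strongly similar pairs (threshold is illustrative)
        G.add_edge(a, b, weight=sim)

# nx.draw(G, with_labels=True) would render the graph (requires matplotlib).
```

With this layout, tightly related words form connected components, which is what makes the clusters visible in the plotted graph.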