Sentence to emoji translation using sent2vec and emoji2vec's dataset

Sentence to Emoji Translation (or 📄🗜➡️😆)

A novel chunking approach to sentence-to-emoji translation, in which a sentence is summarized as a sequence of emoji using sent2vec and part of emoji2vec's dataset. This is my final senior research project for my undergraduate degree at Clarion University. The paper and an explanation of the algorithm can be found at TEMPORARY LINK.
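
For intuition about what "chunking" a sentence can mean, here is a minimal sketch that enumerates every way to split a sentence's words into contiguous chunks. It assumes chunks are contiguous word spans and is not the algorithm from the paper. `all_chunkings` is a hypothetical helper and is not part of the EmojiTranslation module.

```python
from typing import Iterator, List


def all_chunkings(words: List[str]) -> Iterator[List[str]]:
    """Hypothetical helper: yield every split of `words` into contiguous chunks.

    A sentence of n words has 2**(n - 1) such splits, because each of the
    n - 1 gaps between adjacent words either is or is not a chunk boundary.
    """
    n = len(words)
    if n == 0:
        return
    for mask in range(2 ** (n - 1)):
        chunks, start = [], 0
        for gap in range(n - 1):
            # Bit `gap` set means "cut between words[gap] and words[gap + 1]".
            if (mask >> gap) & 1:
                chunks.append(" ".join(words[start:gap + 1]))
                start = gap + 1
        chunks.append(" ".join(words[start:]))  # the final chunk
        yield chunks


for chunking in all_chunkings("the dog ran".split()):
    print(chunking)
```

Because the number of splits doubles with every added word, scoring all of them exhaustively is only practical for short sentences.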

Folder Structure

  • EmojiTranslation ➡️ Emoji Translation module
  • JupyterNotebooks ➡️ Literate programming-esque emoji translation algorithm
  • plots ➡️ Plots for the paper

Getting off the Ground

  • This project requires Python 3
  • Install dependencies with pip install -r requirements.txt
  • Install sent2vec by following the instructions in its GitHub repository
  • Download a sent2vec model to the models/ directory
  • Download the emoji2vec dataset to the data/ directory
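
The setup steps above can be checked with a small script like the following. It is a hypothetical helper (`check_setup` is not part of this repository) and assumes that the sent2vec bindings import as `sent2vec`, that models use the `.bin` extension like `wiki_unigrams.bin`, and that `models/` and `data/` sit under the directory you run it from.

```python
import importlib.util
from pathlib import Path
from typing import List


def check_setup(root: Path = Path(".")) -> List[str]:
    """Hypothetical helper: list setup problems; an empty list means ready."""
    problems = []
    # Assumption: the sent2vec Python bindings are importable as `sent2vec`.
    if importlib.util.find_spec("sent2vec") is None:
        problems.append("sent2vec is not installed")
    # Assumption: sent2vec models use the .bin extension (e.g. wiki_unigrams.bin).
    models = root / "models"
    if not models.is_dir() or not any(models.glob("*.bin")):
        problems.append("no .bin model found in models/")
    # The emoji2vec data only needs to be present; its contents are not checked.
    data = root / "data"
    if not data.is_dir() or not any(data.iterdir()):
        problems.append("data/ directory is missing or empty")
    return problems


if __name__ == "__main__":
    issues = check_setup()
    for issue in issues:
        print("problem:", issue)
    if not issues:
        print("setup looks complete")
```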

Jupyter Notebook

The JupyterNotebooks directory contains all of the Jupyter notebooks that were used to develop this algorithm. Most of them are accurately named, and many include extensive comments and markdown explaining each step. Running cd JupyterNotebooks && jupyter notebook starts the Jupyter Notebook server so you can explore the algorithm.

Module

The module contains the two algorithms, each in its own class. To use them, import the Translators submodule from the EmojiTranslation module. That submodule has two classes of note, ExhaustiveChunkingTranslation and PartOfSpeechEmojiTranslator. Both are instantiated with three parameters: the path to the emoji keyword file, the path to a sent2vec model, and a lemmatization function. An example of use is shown below:

# Import the translation submodule 
from EmojiTranslation import Translators

# Set up the constructor arguments
emoji_file = "./data/emoji_joined.txt"         # Emoji keyword file from emoji2vec
sent2vec_model = "../models/wiki_unigrams.bin" # sent2vec model
nothing_lemma_func = lambda x: x               # Lemmatization function that does nothing

# Instantiate the exhaustive and part of speech translation algorithms
exh = Translators.ExhaustiveChunkingTranslation(emoji_file, sent2vec_model, nothing_lemma_func)
pos = Translators.PartOfSpeechEmojiTranslator(emoji_file, sent2vec_model, nothing_lemma_func)

# Translate a sentence
sent = "the dog ran fast"
print(exh.summarize(sent))
print(pos.summarize(sent))
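
To illustrate the kind of matching step a translator like this performs, the sketch below picks the emoji whose description is most similar to a chunk of text by cosine similarity. It is a toy and not the implementation inside Translators. `closest_emoji`, `bag_of_words`, `toy_lemma`, and the three-entry description table are all hypothetical. A word-count vector stands in for sent2vec embeddings, and the lemmatization function is assumed to map a string to a string.

```python
import math
from collections import Counter
from typing import Callable, Dict, Tuple

# Sparse vector: word -> weight. Stand-in for a dense sent2vec embedding.
Vector = Dict[str, float]


def bag_of_words(text: str) -> Vector:
    """Toy embedding (assumption): lowercase word counts, not sent2vec."""
    return {word: float(count) for word, count in Counter(text.lower().split()).items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_emoji(
    chunk: str,
    descriptions: Dict[str, str],
    embed: Callable[[str], Vector] = bag_of_words,
    lemma_func: Callable[[str], str] = lambda x: x,
) -> Tuple[str, float]:
    """Hypothetical helper: return (emoji, score) for the best-matching description."""
    target = embed(lemma_func(chunk))
    scores = {
        emoji: cosine(target, embed(lemma_func(desc)))
        for emoji, desc in descriptions.items()
    }
    best = max(scores, key=scores.get)  # ties go to the first emoji listed
    return best, scores[best]


def toy_lemma(text: str) -> str:
    """Hypothetical lemmatizer that only knows two inflections of "run"."""
    forms = {"ran": "run", "running": "run"}
    return " ".join(forms.get(word, word) for word in text.split())


# Hypothetical emoji descriptions, in the spirit of emoji2vec's keyword data.
descriptions = {"🐶": "dog face", "🏃": "person running", "💨": "dashing away fast"}

print(closest_emoji("dog", descriptions))                        # ('🐶', 0.707...)
print(closest_emoji("ran", descriptions, lemma_func=toy_lemma))  # ('🏃', 0.707...)
```

Without a lemmatizer, "ran" shares no words with "person running" and every score is zero. Mapping both to "run" is what lets the match succeed, which suggests why the constructors accept a lemmatization function.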