arundprabhu/Code-Mixed-Dataset

Code-Mixed-Dialog

This repository contains the dataset used in our paper: Codeswitched Sentence Creation using Dependency Parsing.

The sentiment data for Hindi and Marathi are provided in the respective directories. We use the code provided in Sub-word-LSTM for training and testing.

Method Overview

Given a sentence, we extract independent phrases using the Stanford Parser. These phrases are translated using Google's On-device NMT and transliterated using Indic trans. The translated phrases then replace the originals in whichever combination maximizes the Code-Mixing Index (CMI) of the resulting code-mixed sentence.
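The replacement step can be illustrated with a minimal Python sketch. It is not the code used for the paper. It assumes the commonly cited CMI definition, 100 * (1 - max_lang_tokens / (n - u)), where n is the total token count and u the number of language-independent tokens, and it assumes an exhaustive search over which phrases to replace. The function names `cmi` and `best_mix`, the tags `"en"`, `"hi"`, and `"univ"`, and the example phrases are all hypothetical.

```python
from collections import Counter
from itertools import product


def cmi(lang_tags):
    """Code-Mixing Index for a list of per-token language tags.

    Tokens tagged "univ" are treated as language-independent (assumption).
    """
    n = len(lang_tags)
    counts = Counter(tag for tag in lang_tags if tag != "univ")
    u = n - sum(counts.values())
    if n == u:
        # No language-specific tokens, so there is nothing to mix.
        return 0.0
    return 100.0 * (1 - max(counts.values()) / (n - u))


def best_mix(phrases, translations):
    """Try every keep/replace combination and return the highest-CMI one.

    phrases: list of token lists in the source language (tagged "en").
    translations: list of token lists (tagged "hi"), or None where a
    phrase has no translation. Brute force, so only suitable for the
    handful of phrases found in a short sentence.
    """
    options = [(False, True) if t is not None else (False,) for t in translations]
    best_score, best_tokens = -1.0, None
    for choice in product(*options):
        tokens, tags = [], []
        for replace, source, target in zip(choice, phrases, translations):
            chunk = target if replace else source
            tokens.extend(chunk)
            tags.extend(["hi" if replace else "en"] * len(chunk))
        score = cmi(tags)
        if score > best_score:
            best_score, best_tokens = score, tokens
    return best_score, best_tokens


if __name__ == "__main__":
    phrases = [["i", "love"], ["this", "movie"]]
    translations = [["mujhe", "pasand", "hai"], ["yeh", "film"]]
    print(best_mix(phrases, translations))
```

In this toy input, replacing only the second phrase gives an even split between the two languages, which is where the CMI formula peaks for a two-language sentence.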

Dataset generation

We use the test set of Sentiment140 as the original dataset for generating the code-mixed dataset.
