Code-Mixed-Dialog

This repository contains the dataset used in our paper: Codeswitched Sentence Creation using Dependency Parsing.

The sentiment data for Hindi and Marathi are provided in the respective directories. We use the code provided in Sub-word-LSTM for training and testing.

Method Overview

Given a sentence, we extract independent phrases using the Stanford Parser. Each independent phrase is translated using Google's On-device NMT and then transliterated using Indic trans. The original phrases are then replaced by their translated counterparts in such a way that the Code-Mixing Index (CMI) of the resulting code-mixed sentence is maximized.
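The selection step can be illustrated with a minimal sketch. It is not the code used in the paper: it assumes the commonly used utterance-level CMI definition, 100 * (1 - max_lang_tokens / (n - u)), where n is the number of tokens and u the number of language-independent tokens, and it assumes an exhaustive search over keep/replace choices. The function names `cmi` and `best_mix`, the language tags, and the example phrases are hypothetical; translation and transliteration are taken as already done.

```python
from itertools import product


def cmi(tags):
    """Code-Mixing Index of one utterance.

    tags: one language label per token; "univ" marks language-independent
    tokens (punctuation, named entities, and so on).
    """
    n = len(tags)
    u = sum(1 for t in tags if t == "univ")
    if n == u:
        return 0.0
    counts = {}
    for t in tags:
        if t != "univ":
            counts[t] = counts.get(t, 0) + 1
    return 100.0 * (1 - max(counts.values()) / (n - u))


def best_mix(segments):
    """Pick the keep/replace combination with the highest CMI.

    segments: list of (original_tokens, translated_tokens_or_None).
    A segment whose translation is None is always kept as-is.
    Returns (score, tokens); ties go to the first combination found.
    """
    choices = [(False, True) if tr is not None else (False,)
               for _, tr in segments]
    best = (-1.0, [])
    for combo in product(*choices):
        tokens, tags = [], []
        for (orig, trans), use_trans in zip(segments, combo):
            chosen = trans if use_trans else orig
            tokens.extend(chosen)
            # Assumption: source text is English ("en"), target is Hindi ("hi").
            tags.extend(["hi" if use_trans else "en"] * len(chosen))
        score = cmi(tags)
        if score > best[0]:
            best = (score, tokens)
    return best


# Hypothetical phrases; the romanized Hindi is illustrative only.
segments = [
    (["i", "love"], None),
    (["this", "movie"], ["yeh", "film"]),
    (["so", "much"], ["bahut", "zyada"]),
]
score, tokens = best_mix(segments)
```

The exhaustive search is exponential in the number of replaceable phrases, which is acceptable for short sentences such as tweets; a greedy choice would be a reasonable substitute for longer inputs.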

Dataset generation

We use the test set of Sentiment140 as the source dataset from which the code-mixed dataset is generated.
