This repository has been archived by the owner on Apr 19, 2024. It is now read-only.
forked from sagorbrur/codeswitch

CodeSwitch is an NLP tool for language identification, POS tagging, named entity recognition, and sentiment analysis of code-mixed data.


ctarnold/codeswitch

 
 

Code Switch


CodeSwitch is an NLP tool for language identification, POS tagging, named entity recognition, and sentiment analysis of code-mixed data.

Supported Code-Mixed Languages

We trained multilingual BERT models on the LinCE dataset using Hugging Face Transformers. LinCE contains code-mixed data for four language pairs, of which we used three: Spanish-English, Hindi-English, and Nepali-English. We hope to train and add other languages and tasks as well.

  • Spanish-English(spa-eng)
  • Hindi-English(hin-eng)
  • Nepali-English(nep-eng)

Language Codes

  • spa-eng for spanish-english
  • hin-eng for hindi-english
  • nep-eng for nepali-english

Installation

pip install codeswitch

Dependency

  • pytorch >=1.6.0
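
The PyTorch requirement above can be checked before importing the package. The following is a minimal sketch and not part of CodeSwitch: the helper name `meets_minimum` is hypothetical, and it compares only the leading numeric components of a version string such as the one exposed by `torch.__version__`.

```python
import re


def meets_minimum(version, minimum=(1, 6, 0)):
    """Return True if a version string is at least `minimum`.

    Hypothetical helper: only the leading numeric components are compared,
    so local suffixes such as '+cu117' are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component (e.g. '2.0') is treated as 0.
    parts = tuple(int(p) if p is not None else 0 for p in match.groups())
    return parts >= minimum


# Typical use, assuming PyTorch is installed:
#   import torch
#   assert meets_minimum(torch.__version__), "CodeSwitch needs pytorch >= 1.6.0"
print(meets_minimum("1.6.0"))
print(meets_minimum("1.5.1"))
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexicographically, where "1.10.0" would sort before "1.6.0".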

Training Details

  • All three sequence tagging models (lid, ner, pos) were trained with Hugging Face token classification
  • The sentiment analysis model was trained with Hugging Face text classification
  • You can find every model and its evaluation results here

Features & Supported Languages

  • Language Identification
    • spanish-english
    • hindi-english
    • nepali-english
  • POS
    • spanish-english
    • hindi-english
  • NER
    • spanish-english
    • hindi-english
  • Sentiment Analysis
    • spanish-english
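
The feature list above can be written as a small lookup table, which is handy for validating a language code before constructing a tagger. This is an illustrative sketch only: the task keys (`"lid"`, `"pos"`, `"ner"`, `"sentiment"`) and the helper names are assumptions made for this example, not identifiers from the CodeSwitch API.

```python
# Task -> supported language codes, transcribed from the feature list above.
# The task keys are hypothetical labels chosen for this sketch.
SUPPORTED = {
    "lid": {"spa-eng", "hin-eng", "nep-eng"},
    "pos": {"spa-eng", "hin-eng"},
    "ner": {"spa-eng", "hin-eng"},
    "sentiment": {"spa-eng"},
}


def is_supported(task, language_code):
    """Return True if `language_code` is listed for `task`."""
    return language_code in SUPPORTED.get(task, set())


def tasks_for(language_code):
    """Return the sorted task keys available for a language code."""
    return sorted(t for t, codes in SUPPORTED.items() if language_code in codes)


print(is_supported("pos", "nep-eng"))
print(tasks_for("hin-eng"))
```

Nepali-English is listed only under language identification, so a check like this catches an unsupported combination such as POS tagging with `'nep-eng'` before any model is loaded.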

Language Identification

from codeswitch.codeswitch import LanguageIdentification
lid = LanguageIdentification('spa-eng') 
# for hindi-english use 'hin-eng'
# for nepali-english use 'nep-eng'
text = "" # your code-mixed sentence 
result = lid.identify(text)
print(result)
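
This README does not show what `lid.identify` returns. As a sketch under a stated assumption, suppose the result is a list of per-token dictionaries with `word` and `entity` keys, the shape commonly produced by Hugging Face token-classification pipelines. In that case the tokens could be grouped by predicted label as follows. The helper name, the key names, and the sample data (including the `lang1`/`lang2` labels) are all hypothetical and are not real CodeSwitch output.

```python
from collections import defaultdict


def group_by_label(tokens, word_key="word", label_key="entity"):
    """Group token strings by their predicted label, preserving order.

    Assumes each item is a dict holding `word_key` and `label_key`;
    adjust the key names if the actual output differs.
    """
    groups = defaultdict(list)
    for token in tokens:
        groups[token[label_key]].append(token[word_key])
    return dict(groups)


# Hypothetical sample for illustration only, NOT real CodeSwitch output.
sample = [
    {"word": "hola", "entity": "lang2"},
    {"word": "my", "entity": "lang1"},
    {"word": "amigo", "entity": "lang2"},
]
print(group_by_label(sample))
```

Print the actual `result` first and adapt the key names before relying on this helper.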

POS Tagging

from codeswitch.codeswitch import POS
pos = POS('spa-eng')
# for hindi-english use 'hin-eng'
text = "" # your mixed sentence 
result = pos.tag(text)
print(result)

NER Tagging

from codeswitch.codeswitch import NER
ner = NER('spa-eng')
# for hindi-english use 'hin-eng'
text = "" # your mixed sentence 
result = ner.tag(text)
print(result)

Sentiment Analysis

from codeswitch.codeswitch import SentimentAnalysis
sa = SentimentAnalysis('spa-eng')
sentence = "El perro le ladraba a La Gatita .. .. lol #teamlagatita en las playas de Key Biscayne este Memorial day"
result = sa.analyze(sentence)
print(result)
# [{'label': 'LABEL_1', 'score': 0.9587041735649109}]
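
The output shown above is a list of dictionaries with `label` and `score` keys. A minimal sketch for pulling out the highest-scoring entry follows. The helper name `top_prediction` is hypothetical, and this README does not say which sentiment `LABEL_1` corresponds to, so the sketch reports the raw label without interpreting it.

```python
def top_prediction(result):
    """Return (label, score) for the highest-scoring entry in `result`."""
    if not result:
        raise ValueError("empty result")
    best = max(result, key=lambda item: item["score"])
    return best["label"], best["score"]


# Output copied from the sentiment analysis example above.
result = [{'label': 'LABEL_1', 'score': 0.9587041735649109}]
label, score = top_prediction(result)
print(f"{label} ({score:.2%})")  # prints "LABEL_1 (95.87%)"
```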

Acknowledgement
