Fast, simple identification of codeswitching in Tweets and other short messages.

Files

models
params
tests
tools
.gitignore
LICENSE
README.md
__init__.py
api_sample.py
codeswitchador.py
csvunicode.py
eval_codeswitch.py
eval_hit2.sh
freqratio.py
lid_constants.py
lidlists.py
make_ratiolist.sh
make_wordlist.sh
metrics.py
scalereader.py
wordlist.py
wordlistlid.py

README.md

Codeswitchador was developed as a part of the SCALE 2012 summer workshop at the Johns Hopkins Human Language Technology Center of Excellence.

Runnable shell scripts

  • make_ratiolist.sh: qsub-able wrapper for freqratio.py
  • make_wordlist.sh: qsub-able wrapper for wordlist.py

Runnable Python scripts

  • api_sample.py: Sample usage of the Codeswitchador API.
  • eval_codeswitch.py: Evaluate the performance of codeswitching models 0, 1.0, 1.5.
  • freqratio.py: Create a frequency ratio list from two wordlists.
  • wordlist.py: Create a wordlist from a corpus.
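
As a rough illustration of what "a wordlist from a corpus" and "a frequency ratio list from two wordlists" can mean, here is a minimal sketch. It is not the code in wordlist.py or freqratio.py: the function names `build_wordlist` and `frequency_ratios`, the whitespace tokenization, and the add-k smoothing are all assumptions made for this example.

```python
from collections import Counter


def build_wordlist(lines):
    """Count lowercased, whitespace-separated tokens in an iterable of text lines.

    Hypothetical helper; the real wordlist.py may tokenize differently.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def frequency_ratios(counts_a, counts_b, smoothing=1.0):
    """Map each word to (relative frequency in A) / (relative frequency in B).

    Add-k smoothing (an assumption of this sketch) keeps words that are
    missing from one list from producing a division by zero.
    """
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + smoothing * len(vocab)
    total_b = sum(counts_b.values()) + smoothing * len(vocab)
    return {
        word: ((counts_a[word] + smoothing) / total_a)
        / ((counts_b[word] + smoothing) / total_b)
        for word in vocab
    }


# Toy corpora standing in for real English and Spanish text.
eng = build_wordlist(["the cat sat", "the dog"])
spa = build_wordlist(["el gato", "el perro the"])
ratios = frequency_ratios(eng, spa)
```

In this sketch, a ratio above 1 means the word is relatively more frequent in the first list, and a ratio below 1 means it is relatively more frequent in the second.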

Libraries

  • codeswitchador.py: Support for codeswitching detection.
  • lid_constants.py: Constants used by many files.
  • lidlists.py: Wordlists and paths used by idiotLID and wordlist-based models.
  • scalereader.py: Support for reading from common SCALE file formats (e.g., Jerboa output).
  • wordlistlid.py: Wordlist-based LID/CS models.
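
To give a feel for wordlist-based language identification and codeswitching detection, the sketch below tags each token by wordlist membership and flags a message when both languages appear. It is only an assumed, simplified approach: `tag_tokens`, `is_codeswitched`, the `eng`/`spa`/`unk` labels, and the `min_per_lang` threshold are hypothetical and do not describe the models in wordlistlid.py or codeswitchador.py.

```python
def tag_tokens(tokens, eng_words, spa_words):
    """Label each token 'eng', 'spa', or 'unk' by wordlist membership.

    Tokens found in both lists or in neither are left as 'unk'.
    """
    tags = []
    for token in tokens:
        word = token.lower()
        in_eng = word in eng_words
        in_spa = word in spa_words
        if in_eng and not in_spa:
            tags.append("eng")
        elif in_spa and not in_eng:
            tags.append("spa")
        else:
            tags.append("unk")
    return tags


def is_codeswitched(tags, min_per_lang=1):
    """Flag a message when at least min_per_lang tokens come from each language."""
    return (
        tags.count("eng") >= min_per_lang
        and tags.count("spa") >= min_per_lang
    )


# Tiny illustrative wordlists; real ones would be built from corpora.
eng_words = {"i", "love"}
spa_words = {"mi", "casa"}
tags = tag_tokens("I love mi casa".split(), eng_words, spa_words)
switched = is_codeswitched(tags)
```

Raising `min_per_lang` makes the check stricter, which may help with short messages where a single borrowed word would otherwise trigger a match.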

The Basics

The most common tasks:

  1. Create wordlists. See:
    • tools/make_eng_wordlist.sh
    • tools/make_spa_wordlist.sh

TODO: More things here!

License

Codeswitchador is distributed under the Simplified BSD License. See LICENSE for more information.
