code-switching

Relevant data and R notebooks can be found in the csv's (and numbers) directory. All other relevant analysis code can be found in the notebooks directory.

All dependency ngrams are of the form: (dependency relation label, word, head).
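As an illustration of this triple layout, here is a minimal sketch that builds (dependency relation label, word, head) tuples from a hand-written parse. The `Token` class, the `dependency_ngrams` helper, the example sentence, and its labels are all hypothetical and are not taken from dependency_parsing.py. The root token is assumed to point to itself, which is the convention spaCy uses.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    text: str  # surface form of the word
    dep: str   # dependency relation label
    head: int  # index of the head token within the sentence


def dependency_ngrams(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Return one (dependency relation label, word, head) triple per token."""
    return [(tok.dep, tok.text, tokens[tok.head].text) for tok in tokens]


# Hand-annotated toy parse of "I like tea" (assumed labels, for illustration only).
sentence = [
    Token("I", "nsubj", 1),
    Token("like", "ROOT", 1),  # the root is assumed to be its own head
    Token("tea", "dobj", 1),
]
triples = dependency_ngrams(sentence)
```

Each triple keeps the label first, so ngrams can be grouped by relation before comparing words and heads.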

redone_ prefix: I had to redo the code-switched analysis at one point because my first pass had not used the full code-switched dataset.

All word_freq_eng values are calculated from phrasefinder.io counts on the American English corpus and the total_counts for American English from the Google Books corpus. All word_freq values have had the negative natural log (ln) applied to them in redoing_cs_data.csv (the most up-to-date code-switched dataset) and non_cs_sentences_full.csv (the most up-to-date non-code-switched dataset).
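The transformation described above can be sketched as follows. This assumes that a word's frequency is its raw match count divided by the corpus total count, which is an interpretation of the description rather than something stated explicitly. The function name `neg_log_frequency` and its parameters are hypothetical.

```python
import math


def neg_log_frequency(match_count: int, total_count: int) -> float:
    """Return -ln(match_count / total_count).

    Assumption: relative frequency is raw count over total count.
    Zero or negative counts are rejected because ln is undefined there.
    """
    if match_count <= 0 or total_count <= 0:
        raise ValueError("counts must be positive")
    return -math.log(match_count / total_count)
```

Under this assumption, rarer words get larger values, and a word that made up the entire corpus would get 0.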

In the word_freq_eng column in non_cs_sentences_full.csv, frequencies were calculated using only the first word wherever the column contains a phrase of more than one word. This was for consistency with word_freq_eng in redoing_cs_data.csv.
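A minimal sketch of the first-word rule, assuming phrases are split on whitespace. The helper name `first_word` is hypothetical, and the actual tokenization used for the datasets may differ.

```python
def first_word(phrase: str) -> str:
    """Return the first whitespace-delimited word, or "" for an empty phrase."""
    parts = phrase.split()
    return parts[0] if parts else ""
```

For a phrase such as "ice cream", only "ice" would be looked up under this rule.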

full_cs_back_data.csv was formerly named truncated.csv.

Useful links:

Getting frequency raw counts: https://phrasefinder.io/
Getting frequency total counts: http://storage.googleapis.com/books/ngrams/books/datasetsv2.html

Part-of-speech tagsets for English and Chinese: https://github.com/explosion/spaCy/blob/master/spacy/glossary.py

Folders and files:

1-grams
SEAME
__pycache__
csv's (and numbers)
distribution plots
figures not for paper
from calvillo repo
non surprisal data
notebooks
outputs
regression results tables
scatter plots
surprisal numbers only
training data collection
trying baidu tieba webscrape
zh wiki
LICENSE
README.md
baidu_tieba_test_data.txt
build.xml
config.py
dependency_parsing.py
dynamic.sh
non_cs_test_ngram_eng.txt
redoing_cs_dep_zh.txt
testing.xml
zh_deprel_non_cs.txt
zh_deprel_non_cs_2.txt

db758/code-switching
