This project explores whether certain substrings are statistically more likely to appear in some languages than in others, and how well those substrings can serve as fingerprints for a language, as measured by a maximized likelihood ratio.
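As a rough illustration of the idea, the sketch below scores each substring of a target language by a smoothed likelihood ratio and picks the highest-scoring one as a candidate fingerprint. It is not taken from the notebook: the function names (`ngram_counts`, `likelihood_ratios`, `best_fingerprint`), the add-alpha smoothing, the choice to divide by the largest probability among the other languages, and the toy corpora are all assumptions made for this example.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count every overlapping substring of length n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def likelihood_ratios(corpora, target, n=3, alpha=1.0):
    """Score each n-gram seen in the target corpus (hypothetical scoring rule).

    ratio = P(gram | target) / max over other languages of P(gram | other),
    with add-alpha smoothing so unseen n-grams never produce a zero denominator.
    """
    others = [lang for lang in corpora if lang != target]
    if not others:
        raise ValueError("need at least one language to compare against")

    counts = {lang: ngram_counts(text.lower(), n) for lang, text in corpora.items()}
    vocab_size = len(set().union(*counts.values()))
    totals = {lang: sum(c.values()) for lang, c in counts.items()}

    def prob(lang, gram):
        return (counts[lang][gram] + alpha) / (totals[lang] + alpha * vocab_size)

    return {
        gram: prob(target, gram) / max(prob(other, gram) for other in others)
        for gram in counts[target]
    }


def best_fingerprint(corpora, target, n=3):
    """Return the n-gram with the largest likelihood ratio for the target."""
    ratios = likelihood_ratios(corpora, target, n)
    return max(ratios, key=ratios.get)


# Toy corpora for illustration only; real use needs far larger samples.
toy_corpora = {
    "english": "the the the",
    "german": "der der der",
}
fingerprint = best_fingerprint(toy_corpora, "english", n=3)
print(fingerprint)
```

Dividing by the strongest competing language, rather than by an average over all languages, means a substring scores highly only if it is rare in every other language considered. That is one plausible reading of "fingerprint", not necessarily the one used in the notebook.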
language_substrings.ipynb
: Main notebook containing data collection, analysis, and results.
environment.yml
: Lists the libraries needed to run the notebook.