
Low Resource Language Models

Implemented Language Models

  1. Kikuyu

  2. Ganda

    • RoBERTa (Masked Language Modeling) (Download)
    • GPT-2 (Language Generation) (Download)

Setup

  1. Install the Anaconda or Miniconda package manager from here.

  2. Create a new virtual environment and install packages.
    conda create -n transformers python pandas tqdm
    conda activate transformers
    If using CUDA:
        conda install pytorch cudatoolkit=10.1 -c pytorch
    Otherwise (CPU only):
        conda install pytorch cpuonly -c pytorch

  3. Install simpletransformers.
    pip install simpletransformers
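
  4. (Optional) Verify the environment. This quick check is not part of the original instructions; it simply confirms that PyTorch imports and reports whether CUDA is visible.
    # Run in a Python shell inside the activated environment.
    import torch
    print(torch.__version__)
    print(torch.cuda.is_available())  # False is expected with the cpuonly build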

Usage

Testing RoBERTa language models

  1. Download the compressed model files from the link and extract them to the models/ directory (e.g. models/kikuyu_baseline).
  2. Run test_language_model.py.
    1. Set the language variable to either "kikuyu" or "ganda", depending on which model you want to test.
    2. Change the string on line 24 of test_language_model.py to test different sentences. The string may contain one <mask> token, which the model will attempt to predict. A stand-alone sketch of this masked-token prediction is shown below.
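
  The following is a minimal sketch of what this masked-token prediction amounts to, not the repository's actual test_language_model.py. It assumes the model was extracted to models/kikuyu_baseline and uses the Hugging Face transformers fill-mask pipeline (installed as a dependency of simpletransformers); the example sentence is only a placeholder.

    # Sketch only: the repository's test_language_model.py may be implemented differently.
    from transformers import pipeline

    # Point the pipeline at the extracted model directory (use the Ganda path for the Ganda model).
    fill_mask = pipeline("fill-mask", model="models/kikuyu_baseline")

    # Replace this placeholder with a Kikuyu sentence containing exactly one <mask> token.
    for prediction in fill_mask("Placeholder sentence with a <mask> token."):
        print(prediction["token_str"], prediction["score"])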

Testing GPT-2 language generation

  1. Download the compressed model files from the link and extract them to the models/ directory (e.g. models/ganda-gpt2).
  2. Run test_language_generation.py. A stand-alone sketch of the generation step is shown below.
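
  As a rough illustration of what the generation script does, the sketch below loads the extracted models/ganda-gpt2 directory with simpletransformers' LanguageGenerationModel; the repository's actual test_language_generation.py may differ, and the prompt is only a placeholder.

    # Sketch only: the repository's test_language_generation.py may be implemented differently.
    from simpletransformers.language_generation import LanguageGenerationModel

    # Load the fine-tuned GPT-2 model extracted in step 1; set use_cuda=True if a GPU is available.
    model = LanguageGenerationModel("gpt2", "models/ganda-gpt2", use_cuda=False)

    # Replace the placeholder with a Ganda prompt for the model to continue.
    generated = model.generate("Placeholder prompt")
    print(generated)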
