Character-Level Language Models Repo 🕺🏽

This repository contains multiple character-level language models (charLLM). Each language model is designed to generate text at the character level, providing a granular level of control and flexibility.

🌟 Available Language Models

Character-Level MLP LLM (first MLP LLM)
GPT-2 (in progress)

Character-Level MLP

The Character-Level MLP language model is based on the approach described in the paper "A Neural Probabilistic Language Model" by Bengio et al. (2003). It uses a multilayer perceptron that takes a fixed-size window of preceding characters and predicts the next character.
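For orientation, the forward pass of such a model can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in this repository: the class name `TinyNPLM`, the weight initialisation, and the toy sizes are hypothetical, and the optional direct input-to-output connections from the paper are left out.

```python
import math
import random


class TinyNPLM:
    """Illustrative Bengio-style MLP language model (forward pass only)."""

    def __init__(self, vocab_size, block_size=3, emb_dim=4, hidden=8, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.block_size = block_size
        self.C = matrix(vocab_size, emb_dim)            # embedding table
        self.W1 = matrix(block_size * emb_dim, hidden)  # input -> hidden weights
        self.b1 = [0.0] * hidden
        self.W2 = matrix(hidden, vocab_size)            # hidden -> logits weights
        self.b2 = [0.0] * vocab_size

    def next_char_probs(self, context):
        """Return P(next char | context) for a list of block_size character ids."""
        assert len(context) == self.block_size
        # Look up and concatenate the embeddings of the context characters.
        x = [v for idx in context for v in self.C[idx]]
        # Hidden layer with a tanh non-linearity.
        h = [
            math.tanh(sum(x[i] * self.W1[i][j] for i in range(len(x))) + self.b1[j])
            for j in range(len(self.b1))
        ]
        # Output logits followed by a numerically stable softmax.
        logits = [
            sum(h[j] * self.W2[j][k] for j in range(len(h))) + self.b2[k]
            for k in range(len(self.b2))
        ]
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


model = TinyNPLM(vocab_size=5)
probs = model.next_char_probs([0, 1, 2])  # distribution over 5 characters
```

Sampling a character id from `probs` and appending it to the context, while dropping the oldest id, would give one step of character-level generation.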

Installation

With PIP

This repository is tested on Python 3.8+ and PyTorch 2.0.0+.

First, create a virtual environment with the version of Python you're going to use and activate it.

Then, you will need to install PyTorch.
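Before installing the package, it can help to confirm that the active environment meets these requirements. The helper below is a hypothetical sketch (the name `check_environment` is not part of charLLM); it only checks the Python version and whether a `torch` package can be found, not the PyTorch version itself.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 8)):
    """Report whether the interpreter and PyTorch look usable."""
    return {
        # True when the running interpreter is at least min_python.
        "python_ok": sys.version_info >= min_python,
        # True when a package named "torch" is importable in this environment.
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }


print(check_environment())
```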

Once PyTorch has been installed, CharLLMs can be installed using pip as follows:

pip install charLLM

With GIT

CharLLMs can also be obtained by cloning the repository with git as follows:

git clone https://github.com/RAravindDS/Neural-Probabilistic-Language-Model.git

Quick Tour

To use the Character-Level MLP language model, follow these steps:

Install the package dependencies.
Import the NPLM class from the charLLM package.
Create an instance of the NPLM class.
Train the model on a suitable dataset.
Generate text using the trained model.

Demo for NPLM (A Neural Probabilistic Language Model)

# Import the class
>>> from charLLM import NPLM  # Neural Probabilistic Language Model
>>> text_path = "path-to-text-file.txt"
>>> model_parameters = {
...     "block_size": 3,
...     "train_size": 0.8,
...     "epochs": 10000,
...     "batch_size": 32,
...     "hidden_layer": 100,
...     "embedding_dimension": 50,
...     "learning_rate": 0.1,
... }
>>> obj = NPLM(text_path, model_parameters)  # Initialize the model
>>> obj.train_model()
# Outputs the validation loss and a plot image
>>> obj.sampling(words_needed=10)  # Samples 10 tokens
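To make the `block_size` and `train_size` parameters more concrete, here is a rough sketch of how a text can be turned into fixed-size context windows and split into training and validation parts. It is an assumption about what these parameters usually mean in this kind of model, not a description of the package internals; `build_examples`, `split_examples`, and the `"."` padding character are hypothetical.

```python
def build_examples(text, block_size=3, pad="."):
    """Turn text into (context, target) pairs of character ids."""
    # Assumes the padding character does not carry meaning in the text itself.
    chars = sorted(set(text) | {pad})
    stoi = {ch: i for i, ch in enumerate(chars)}
    context = [stoi[pad]] * block_size  # start with an all-padding context
    examples = []
    for ch in text:
        examples.append((list(context), stoi[ch]))
        context = context[1:] + [stoi[ch]]  # slide the window by one character
    return examples, stoi


def split_examples(examples, train_size=0.8):
    """Split examples in order; the first train_size fraction is for training."""
    cut = int(len(examples) * train_size)
    return examples[:cut], examples[cut:]


examples, stoi = build_examples("hello", block_size=3)
train, val = split_examples(examples, train_size=0.8)
```

With `block_size=3`, each training example pairs the three preceding characters with the character that follows them. Whether the package shuffles examples before splitting is not stated here, so this sketch keeps them in order.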

Model Output Graph

Feel free to explore the repository and experiment with the different language models provided.

Contributions

Contributions to this repository are welcome. If you have implemented a novel character-level language model or would like to enhance the existing models, please consider contributing to the project. Thank you!

License

This repository is licensed under the MIT License.

