ShortLang: Compressed Text for Efficient LLMs

The future of text representation and processing

Overview

ShortLang is a framework for representing text in a minimal-length form that preserves its meaning. It is designed to improve language model reasoning and training efficiency and to reduce storage requirements. It compresses natural language into a concise symbolic form, and retention of the core meaning is measured by embedding similarity.

Features

Rule-Based Compression: Deterministic methods to remove stopwords, abbreviate entities, and eliminate redundancy.
Model-Based Compression: Uses fine-tuned language models for nuanced semantic compression.
Hybrid Approach: Combines rule-based preprocessing with model-based compression.
Embedding Validation: Objective assessment of semantic retention using cosine similarity between embeddings of the original and compressed text.
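
The features above can be illustrated with a toy sketch. Nothing here is taken from the ShortLang code base: the stopword list, the abbreviation table, and the names `rule_compress`, `hybrid_compress`, `toy_embed`, and `cosine_similarity` are all hypothetical, and the bag-of-words vectors are only a stand-in for real embeddings from an embedding model.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list and abbreviation table (illustration only).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "that"}
ABBREVIATIONS = {"united states": "US", "artificial intelligence": "AI"}


def rule_compress(text):
    """Deterministic compression: abbreviate, drop stopwords, drop repeats."""
    lowered = text.lower()
    for phrase, short in ABBREVIATIONS.items():
        lowered = lowered.replace(phrase, short)
    kept, seen = [], set()
    for token in re.findall(r"[A-Za-z0-9]+", lowered):
        key = token.lower()
        if key in STOPWORDS or key in seen:
            continue  # skip stopwords and words already emitted
        seen.add(key)
        kept.append(token)
    return " ".join(kept)


def hybrid_compress(text, model_compress):
    """Rule-based preprocessing followed by a caller-supplied model step.

    `model_compress` is a placeholder callable (str -> str); a real setup
    would call a fine-tuned language model here.
    """
    return model_compress(rule_compress(text))


def toy_embed(text, vocab):
    """Bag-of-words count vector; a stand-in for a real embedding."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return [counts[word] for word in vocab]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # similarity is undefined for a zero vector; report 0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    original = "The model is trained in the United States to summarize the text"
    compressed = rule_compress(original)
    # Shared vocabulary so both vectors have the same dimensions.
    vocab = sorted(set(re.findall(r"[a-z0-9]+", (original + " " + compressed).lower())))
    score = cosine_similarity(toy_embed(original, vocab), toy_embed(compressed, vocab))
    print(compressed)
    print(f"similarity: {score:.3f}")
```

With real embeddings, the same cosine score could be compared against a chosen threshold to accept or reject a compressed output. The threshold value is a design choice that this sketch does not prescribe.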

Installation

Clone the repository:

```
git clone https://github.com/Pro-GenAI/ShortLang.git
cd ShortLang
```

Install dependencies:
```
pip install -r requirements.txt
```

Set up environment variables:

Copy ".env.example" to ".env" and fill in the values it lists.

Usage

Run the script "short_lang/shortlang.py" with Python.

Applications

Reasoning Optimization
Training Data Compression
Efficient Chunking for Vector Embedding
Vector Database Storage and Retrieval
Multi-Agent and Multi-Step Systems
