ShortLang is a framework for minimal-length, semantics-preserving text representation, designed to improve language model reasoning and training efficiency and to reduce storage requirements. It compresses natural language into a concise symbolic form while retaining the core meaning, as measured by embedding similarity between the original and compressed text.
- Rule-Based Compression: Deterministic methods to remove stopwords, abbreviate entities, and eliminate redundancy.
- Model-Based Compression: Uses fine-tuned language models for nuanced semantic compression.
- Hybrid Approach: Combines rule-based preprocessing with model-based compression.
- Embedding Validation: Objective assessment of semantic retention using cosine similarity.
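
As a rough illustration of the rule-based and validation steps listed above, the following is a minimal sketch, not ShortLang's actual implementation. The stopword set, the abbreviation table, and the function names are hypothetical, and a simple bag-of-words vector stands in for a real embedding model so that the example runs without extra dependencies.

```python
import math
import re
from collections import Counter

# Hypothetical rules for illustration only; ShortLang's real rule set may differ.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "were", "of", "to", "in", "that", "and"}
ABBREVIATIONS = {"united states": "US", "artificial intelligence": "AI"}


def rule_based_compress(text: str) -> str:
    """Abbreviate known entities, drop stopwords, and skip immediate repeats."""
    lowered = text.lower()
    for phrase, short in ABBREVIATIONS.items():
        lowered = lowered.replace(phrase, short)
    kept = []
    for token in re.findall(r"[A-Za-z0-9]+", lowered):
        if token.lower() in STOPWORDS:
            continue  # stopword removal
        if kept and kept[-1].lower() == token.lower():
            continue  # redundancy: drop a word repeated back to back
        kept.append(token)
    return " ".join(kept)


def bag_of_words(text: str) -> Counter:
    """Stand-in 'embedding': word counts. A real setup would use an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    original = "The United States is the leader in the field of artificial intelligence research."
    compressed = rule_based_compress(original)
    score = cosine_similarity(bag_of_words(original), bag_of_words(compressed))
    print(compressed)
    print(f"similarity (bag-of-words stand-in): {score:.3f}")
```

Because the stand-in vectors only count surface words, abbreviations such as "US" score as unrelated to "United States". A learned embedding model would be expected to place them much closer together, which is why validation against real embeddings is the more meaningful check.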
1. Clone the repository:

       git clone https://github.com/Pro-GenAI/ShortLang.git
       cd ShortLang

2. Install dependencies:

       pip install -r requirements.txt

3. Set up environment variables in ".env" based on ".env.example".
4. Run "short_lang/shortlang.py".

- Reasoning Optimization
- Training Data Compression
- Efficient Chunking for Vector Embedding
- Vector Database Storage and Retrieval
- Multi-Agent and Multi-Step Systems
