A lightweight DistilBERT classifier that decides what an AI assistant should remember — and what it should forget.
Warning: do not use a reasoning model in LM Studio. Reasoning models may break the system.
Tested and verified on Ubuntu.
Most AI assistants treat all conversation turns equally. MemoryGate filters them by importance, so only meaningful information gets stored in long-term memory — things like medical details, deadlines, passwords, and personal events — while casual small talk and trivia are quietly discarded.
MemoryGate is a three-stage pipeline:
- Generate — Uses a local LLM via LM Studio to produce labelled training examples across high- and low-importance conversation topics
- Train — Fine-tunes a DistilBERT classifier on that data to score each conversation turn
- Run — Runs the trained model in real time to decide what the assistant should save to its memory
High importance (label = 1)
- Deaths, grief, family emergencies, personal trauma
- Passwords, API keys, PINs, access tokens
- Medical diagnoses, prescriptions, allergies, surgery dates
- Legal contracts, compliance deadlines, court dates
- Financial decisions, bank details, tax deadlines
- Project deadlines, stakeholder agreements, production credentials
Low importance (label = 0)
- Casual greetings and small talk
- General trivia and history facts
- Creative requests like jokes or poems
- Simple definitions and basic questions
- Movie or food recommendations
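The labels above map directly onto the training records. As an illustration of what a labelled example might look like (the exact field names in conversation_data.jsonl are produced by generate_training_data.py and may differ; `text` and `label` here are assumptions):

```python
import json

# Hypothetical records in the style of conversation_data.jsonl.
# The real schema is defined by generate_training_data.py; the
# "text" and "label" keys below are illustrative assumptions.
examples = [
    {"text": "My penicillin allergy was confirmed today; surgery is on March 4.", "label": 1},
    {"text": "Tell me a joke about penguins.", "label": 0},
]

jsonl = "\n".join(json.dumps(e) for e in examples)

for line in jsonl.splitlines():
    record = json.loads(line)
    print(record["label"], record["text"])
```

Each line is one self-contained JSON object, so the file can be streamed record by record during training.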
- Python 3.10 (Anaconda recommended)
- A CUDA-capable GPU is recommended for training (CPU fallback is supported)
- LM Studio running locally with a model loaded (always needed for run_memory.py and generate_training_data.py)
Clone the repository:
git clone https://github.com/ErenalpCet/MemoryGate.git
cd MemoryGate
Create and activate a Python 3.10 environment with Anaconda:
conda create -n memorygate python=3.10
conda activate memorygate
Install dependencies:
pip install -r requirements.txt
This will automatically install PyTorch with CUDA 12.6 support. If you are on CPU only, remove or replace the --index-url line in requirements.txt so pip installs the standard CPU build from PyPI.
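For reference, the swap looks roughly like this (the exact lines in requirements.txt may differ; the URL below is PyTorch's usual CUDA 12.6 wheel index):

```text
# CUDA build (as shipped in requirements.txt):
--index-url https://download.pytorch.org/whl/cu126
torch

# CPU-only alternative: drop the --index-url line so pip resolves torch from PyPI:
torch
```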
Set up your environment variables by copying the example file:
cp .env.example .env
Then open .env and adjust the settings if needed.
Make sure LM Studio is running with a model loaded, then run:
python generate_training_data.py
This produces conversation_data.jsonl with balanced high- and low-importance examples.
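A quick way to sanity-check that balance is to count examples per label (again assuming a numeric `label` field per JSONL record; the real field name may differ). Shown here with an in-memory stand-in for the file:

```python
import json
from collections import Counter

# Stand-in for the contents of conversation_data.jsonl; the record
# schema is an assumption for illustration.
sample = (
    '{"text": "My court date is May 12.", "label": 1}\n'
    '{"text": "Hi there!", "label": 0}\n'
)

# Count how many examples carry each importance label.
counts = Counter(json.loads(line)["label"] for line in sample.splitlines() if line.strip())
print(counts)  # a well-balanced file has roughly equal counts per label
```

To check the real file, replace `sample.splitlines()` with iteration over `open("conversation_data.jsonl", encoding="utf-8")`.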
Train the classifier on the generated data:
python train_model.py
The best checkpoint is saved to ./importance_model/ based on validation loss.
Start real-time memory filtering (LM Studio must be running with a model loaded):
python run_memory.py
MemoryGate/
├── generate_training_data.py # Synthetic data generation via LM Studio
├── train_model.py # DistilBERT fine-tuning pipeline
├── run_memory.py # Runtime memory filtering
├── conversation_data.jsonl # Generated training data (git ignored)
├── importance_model/ # Saved model weights (git ignored)
├── .env.example # Environment variable template
└── requirements.txt
Key settings in train_model.py:
| Setting | Default | Description |
|---|---|---|
| model_name | distilbert-base-uncased | Base transformer model |
| batch_size | 32 | Adjust based on available VRAM |
| epochs | 6 | Training epochs |
| importance_threshold | 0.60 | Deployment classification threshold |
| use_amp | True | Mixed precision, recommended for CUDA |
This project is licensed under the GNU Affero General Public License v3.0.
Any project that uses MemoryGate — including over a network or API — must also be released under AGPL-3.0. See the LICENSE file for full details.
ErenalpCet — Erenalp Çetintürk