Saving Embeddings to JSON file

Overview

This example Node.js application processes a text corpus, generates embeddings for "chunks" of text, and saves the embeddings to a local file. The embeddings can then be used in another application, such as a Retrieval-Augmented Generation (RAG) system or a 2D/3D clustering demonstration using UMAP dimensionality reduction.

There are two main scripts in this project:

  • `embeddings-replicate.js`: Generates embeddings using the Llama model on Replicate.
  • `embeddings-transformers.js`: Generates embeddings using the bge-small model with transformers.js.

Both scripts output the embeddings to embeddings.json.
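As a rough illustration of how another application might consume the saved file, the sketch below ranks stored chunks by cosine similarity to a query vector. The record shape (`{ text, embedding }`), the helper names `cosineSimilarity` and `rankBySimilarity`, and the sample vectors are assumptions made for illustration; check the structure the scripts actually write to embeddings.json before relying on it.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Guard against all-zero vectors to avoid dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the topK records most similar to queryEmbedding.
// Assumes each record looks like { text, embedding } (hypothetical shape).
function rankBySimilarity(records, queryEmbedding, topK = 3) {
  return records
    .map((r) => ({
      text: r.text,
      score: cosineSimilarity(r.embedding, queryEmbedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Tiny in-memory stand-in for the parsed contents of embeddings.json.
const records = [
  { text: 'circle', embedding: [1, 0, 0] },
  { text: 'ellipse', embedding: [0.9, 0.1, 0] },
  { text: 'loadSound', embedding: [0, 0, 1] },
];

const results = rankBySimilarity(records, [1, 0, 0], 2);
console.log(results.map((r) => r.text));
```

In a real application, `records` would come from parsing embeddings.json, and the query vector would need to be produced by the same model that generated the stored embeddings.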

Replicate with Llama model

Using transformers.js with bge-small model

  • Uses the transformers.js package and bge-small model for embeddings generation.
  • embeddings-transformers.js: Script to process a text file and generate embeddings using the bge-small model.

A map of clustered p5.js function names

How-To

  1. Install dependencies:

    npm install

For Replicate (embeddings-replicate.js)

  1. Set up the .env file with your Replicate API token:

    REPLICATE_API_TOKEN=your_api_token_here

  2. Generate the embeddings.json file.

You'll need to hard-code a text filename and adjust how the text is split up depending on the format of your data.

    const raw = fs.readFileSync('text-corpus.txt', 'utf-8');
    let chunks = raw.split(/\n+/);

Then run:

    node embeddings-replicate.js
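How the text should be split depends on the corpus. As a minimal sketch, the two hypothetical helpers below (`splitByLine` and `splitByParagraph` are illustrative names, not part of the project) show a line-based split like the one above with empty chunks removed, and a paragraph-based alternative for text that separates paragraphs with blank lines.

```javascript
// Split on one or more newlines (as in the snippet above), then
// trim whitespace and drop empty chunks.
function splitByLine(raw) {
  return raw
    .split(/\n+/)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0);
}

// Split on blank lines so that each paragraph becomes one chunk.
// Line breaks inside a paragraph are collapsed to single spaces.
function splitByParagraph(raw) {
  return raw
    .split(/\n\s*\n/)
    .map((chunk) => chunk.replace(/\s+/g, ' ').trim())
    .filter((chunk) => chunk.length > 0);
}

const sample = 'First line.\nSecond line.\n\nNew paragraph here.\n';
console.log(splitByLine(sample));
console.log(splitByParagraph(sample));
```

Line-based splitting suits data with one item per line (for example, a list of function names), while paragraph-based splitting tends to keep related sentences together in prose.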

For transformers.js (embeddings-transformers.js)

  1. Generate the embeddings.json file. Adjust the text filename and splitting method as needed:

    const raw = fs.readFileSync('text-corpus.txt', 'utf-8');
    let chunks = raw.split(/\n+/);

Then run:

    node embeddings-transformers.js
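The overall flow of either script (embed each chunk, pair it with its text, serialize to JSON) can be sketched as follows. This is an illustrative outline, not the project's actual code: `buildRecords`, `fakeEmbed`, and the `{ text, embedding }` record shape are assumptions. `fakeEmbed` is a placeholder that stands in for a real model; in the transformers.js script, the embedding function would presumably wrap the library's feature-extraction pipeline instead.

```javascript
// Embed each chunk in turn and pair it with its vector.
// `embedFn` is any async function mapping a string to an array of numbers.
async function buildRecords(chunks, embedFn) {
  const records = [];
  for (const text of chunks) {
    const embedding = await embedFn(text);
    // Array.from converts typed arrays (e.g. Float32Array) to plain arrays
    // so that JSON.stringify emits a JSON array rather than an object.
    records.push({ text, embedding: Array.from(embedding) });
  }
  return records;
}

// Placeholder embedder for demonstration only: NOT a real model.
// It returns a 2-number vector [character count, word count].
async function fakeEmbed(text) {
  return new Float32Array([text.length, text.split(/\s+/).length]);
}

async function main() {
  const chunks = ['hello world', 'embeddings'];
  const records = await buildRecords(chunks, fakeEmbed);
  const json = JSON.stringify(records, null, 2);
  // In a real script this string would be written to disk, for example:
  // fs.writeFileSync('embeddings.json', json);
  console.log(json);
}

main();
```

Passing the embedding function in as a parameter keeps the chunking and saving logic identical regardless of which model produces the vectors.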
