Saving Embeddings to a JSON File

Overview

This is an example Node.js application that processes a text corpus, generates embeddings for "chunks" of that text, and saves the embeddings to a local file. The embeddings can then be used in another application, such as a Retrieval Augmented Generation system or a 2D/3D clustering demonstration using UMAP dimensionality reduction.
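As a rough illustration of how saved embeddings might be used for retrieval, the sketch below ranks stored chunks by cosine similarity to a query vector. The record shape (`text` plus `embedding`), the function names, and the tiny three-dimensional vectors are assumptions made for this example; they are not taken from the project's scripts.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: return the topK records most similar to the query.
function findMostSimilar(queryEmbedding, records, topK = 1) {
  return records
    .map((record) => ({
      text: record.text,
      score: cosineSimilarity(queryEmbedding, record.embedding),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Made-up vectors for illustration; real embeddings have hundreds of dimensions.
const records = [
  { text: 'cats and dogs', embedding: [1, 0, 0] },
  { text: 'stock markets', embedding: [0, 1, 0] },
  { text: 'kittens', embedding: [0.9, 0.1, 0] },
];

const results = findMostSimilar([1, 0, 0], records, 2);
console.log(results.map((r) => r.text));
```

In a real retrieval setup the query vector would come from the same embedding model that produced the stored vectors, since similarity scores are only meaningful within one model's vector space.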

There are two main scripts in this project:

`embeddings-replicate.js`: Generates embeddings using the Llama model on Replicate.
`embeddings-transformers.js`: Generates embeddings using the bge-small model with transformers.js.

Both scripts output the embeddings to embeddings.json.
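The saving and reloading step could look something like the sketch below. The layout of the JSON file is not described here, so the array of `{ text, embedding }` objects, the helper names, and the temporary file path are all assumptions for illustration rather than the format the scripts actually write.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: write an array of records to a JSON file.
function saveEmbeddings(filename, records) {
  fs.writeFileSync(filename, JSON.stringify(records));
}

// Hypothetical helper: read the records back, e.g. from another application.
function loadEmbeddings(filename) {
  return JSON.parse(fs.readFileSync(filename, 'utf-8'));
}

// Made-up vectors for illustration only.
const sample = [
  { text: 'first chunk', embedding: [0.1, 0.2, 0.3] },
  { text: 'second chunk', embedding: [0.4, 0.5, 0.6] },
];

// Written to the OS temp directory so the example does not overwrite real output.
const outFile = path.join(os.tmpdir(), 'embeddings-example.json');
saveEmbeddings(outFile, sample);

const loaded = loadEmbeddings(outFile);
console.log(`Loaded ${loaded.length} records from ${outFile}`);
```

Keeping each chunk's text next to its vector means a consuming application can display or retrieve the original passage without re-reading the corpus.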

`embeddings-transformers.js` processes a text file and generates embeddings using the transformers.js package and the bge-small model.

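Install the dependencies:

npm install

For the Replicate script, set your API token (the repository includes a .env-sample file as a template):

REPLICATE_API_TOKEN=your_api_token_here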

In embeddings-replicate.js, hard-code the name of your text file and adjust how the text is split into chunks to match the format of your data:

const raw = fs.readFileSync('text-corpus.txt', 'utf-8');
let chunks = raw.split(/\n+/);

Then run the script:

node embeddings-replicate.js

To generate embeddings.json with transformers.js, adjust the text filename and splitting method in embeddings-transformers.js as needed:

const raw = fs.readFileSync('text-corpus.txt', 'utf-8');
let chunks = raw.split(/\n+/);

node embeddings-transformers.js
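Both scripts split the corpus with `raw.split(/\n+/)`, which can leave an empty string at the end when the file ends with a newline, and keeps whitespace-only lines as chunks. A minimal sketch of one way to clean that up is shown below; `splitIntoChunks` is a hypothetical helper, not part of the project, and the right splitting rule still depends on your data.

```javascript
// Hypothetical helper: split on one or more newlines, then drop blank chunks.
function splitIntoChunks(raw) {
  return raw
    .split(/\n+/)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0);
}

// Sample text with a blank line, a whitespace-only line, and a trailing newline.
const sampleText = 'First line.\n\nSecond line.\n   \nThird line.\n';
const sampleChunks = splitIntoChunks(sampleText);
console.log(sampleChunks);
```

Filtering before embedding avoids spending model calls on empty strings, which matters most for the Replicate script since each request goes over the network.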
