feat: embedder service #9417
Merged
9 commits
- d8f61f0 feat: add embedder service (jordanh)
- 658ebe3 chore: add logs to prod.js (mattkrick)
- f9cb23e fix: rename AbstractModel.ts, fix case-sentitive import (jordanh)
- 147672c fix: pm2 prod config (jordanh)
- ee5a46f fix: type and class interface changes, assert AbstractModelCase commit 1 (jordanh)
- 0782034 fix: final rename AbstractModel.ts (jordanh)
- 654d62b fix: code review updates (jordanh)
- 02057ca fix: code review comments take 3 (jordanh)
- c41943a fix: retrospectiveDiscussionTopic add to meta (jordanh)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| module.exports = { | ||
| extends: [ | ||
| '../../.eslintrc.js' | ||
| ], | ||
| parserOptions: { | ||
| project: './tsconfig.json', | ||
| ecmaVersion: 2020, | ||
| sourceType: 'module' | ||
| }, | ||
| "ignorePatterns": ["**/lib", "*.js"] | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,71 @@ | ||
| # `Embedder` | ||
|
|
||
| This service builds embedding vectors for semantic search and for other AI/ML | ||
| use cases. It does so by: | ||
|
|
||
| 1. Updating a list of all possible items to create embedding vectors for and | ||
| storing that list in the `EmbeddingsMetadata` table | ||
| 2. Adding these items in batches to the `EmbeddingsJobQueue` table and a redis | ||
| priority queue called `embedder:queue` | ||
| 3. Allowing one or more parallel embedding services to calculate embedding | ||
| vectors (`EmbeddingsJobQueue` states transition from `queued` -> `embedding`, | ||
| then `embedding` -> [deleting the `EmbeddingsJobQueue` row]) | ||
|
|
||
| In addition to deleting the `EmbeddingsJobQueue` row, when a job completes | ||
| successfully: | ||
|
|
||
| - A row is added to the model table with the embedding vector; the | ||
| `embeddingsMetadataId` field on this row points to the appropriate | ||
| metadata row on `EmbeddingsMetadata` | ||
| - The `EmbeddingsMetadata.models` array is updated with the name of the | ||
| table that the embedding has been generated for | ||
|
|
||
| 4. This process repeats forever using a silly polling loop | ||
|
|
||
| In the future, it would be wonderful to enhance this service such that it were | ||
| event driven. | ||
|
|
||
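The job lifecycle described above can be sketched with in-memory stand-ins for the tables. This is an illustration only, not the service's implementation: the `Store` shape, the `fullText` field, `fakeEmbed`, and `processQueueOnce` are hypothetical names, and a real worker would talk to Postgres and Redis rather than arrays.

```typescript
// In-memory stand-ins for the tables named in the README; all shapes are assumptions.
type JobState = 'queued' | 'embedding'

interface Job {
  id: number
  embeddingsMetadataId: number
  model: string // name of the model table the vector is destined for
  state: JobState
}

interface Metadata {
  id: number
  fullText: string // hypothetical field holding the text to embed
  models: string[]
}

interface EmbeddingRow {
  embeddingsMetadataId: number
  embedding: number[]
}

interface Store {
  jobs: Job[]
  metadata: Metadata[]
  modelTables: Record<string, EmbeddingRow[]>
}

// Placeholder for a real embedding call; returns a tiny deterministic vector.
const fakeEmbed = (text: string): number[] => [text.length, text.split(' ').length]

// Runs one pass over the queue and returns how many jobs completed.
function processQueueOnce(store: Store): number {
  let completed = 0
  const queued = store.jobs.filter((job) => job.state === 'queued')
  for (const job of queued) {
    job.state = 'embedding' // queued -> embedding
    const meta = store.metadata.find((m) => m.id === job.embeddingsMetadataId)
    if (!meta) continue // in this sketch the job simply stays in 'embedding'
    // Add a row to the model table, pointing back at the metadata row.
    const rows = store.modelTables[job.model] || []
    rows.push({embeddingsMetadataId: meta.id, embedding: fakeEmbed(meta.fullText)})
    store.modelTables[job.model] = rows
    // Record on the metadata row which model table now holds an embedding.
    if (meta.models.indexOf(job.model) === -1) meta.models.push(job.model)
    // embedding -> job row deleted
    store.jobs = store.jobs.filter((j) => j.id !== job.id)
    completed++
  }
  return completed
}
```

A polling loop would call `processQueueOnce` repeatedly with a delay between passes; an event-driven design would instead trigger it when a job is enqueued.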
| ## Prerequisites | ||
|
|
||
| The Embedder service depends on pgvector being available in Postgres. | ||
|
|
||
| The predeploy script checks for an environment variable | ||
| `POSTGRES_USE_PGVECTOR=true` to enable this extension in production. | ||
|
|
||
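As a rough sketch of what such a predeploy check might look like: the function name `pgvectorSetupSql` is hypothetical and the actual predeploy script may differ, but `CREATE EXTENSION IF NOT EXISTS` with the extension name `vector` is the standard way to enable pgvector.

```typescript
// Returns the SQL a predeploy step could run, or undefined when pgvector is not requested.
// The env parameter is passed in (rather than read from process.env) to keep the sketch testable.
function pgvectorSetupSql(env: Record<string, string | undefined>): string | undefined {
  if (env.POSTGRES_USE_PGVECTOR !== 'true') return undefined
  return 'CREATE EXTENSION IF NOT EXISTS "vector";'
}
```

Under these assumptions, a deployment without `POSTGRES_USE_PGVECTOR=true` would skip the statement entirely, and the Embedder service would then fail when it tries to create a table with a `vector` column.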
| ## Configuration | ||
|
|
||
| The Embedder service takes no arguments and is controlled by the following | ||
| environment variables, here given with example configuration: | ||
|
|
||
| - `AI_EMBEDDER_ENABLED`: when enabled, the embedder service performs work; | ||
| otherwise it sleeps indefinitely | ||
|
|
||
| `AI_EMBEDDER_ENABLED='true'` | ||
|
|
||
| - `AI_EMBEDDING_MODELS`: JSON configuration for which embedding models | ||
| are enabled. Each model in the array will be instantiated by | ||
| `ai_models/ModelManager`. Each model instance will have its own | ||
| database table created for it (if it does not exist already) used | ||
| to store calculated vectors. See `ai_models/ModelManager` for | ||
| which configurations are supported. | ||
|
|
||
| Example: | ||
|
|
||
| `AI_EMBEDDING_MODELS='[{"model": "text-embeddings-inference:llmrails/ember-v1", "url": "http://localhost:3040/"}]'` | ||
|
|
||
| - `AI_GENERATION_MODELS`: JSON configuration for which AI generation | ||
| models (i.e. GPTs) are enabled. These models are used to summarize | ||
| text to be embedded by an embedding model if the text length would be | ||
| greater than the context window of the embedding model. Each model in | ||
| the array will be instantiated by `ai_models/ModelManager`. | ||
| See `ai_models/ModelManager` for which configurations are supported. | ||
|
|
||
| Example: | ||
|
|
||
| `AI_GENERATION_MODELS='[{"model": "text-generation-inference:TheBloke/zephyr-7b-beta", "url": "http://localhost:3050/"}]'` | ||
|
|
||
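The model strings in both variables follow a `<model type>:<model id>` pattern. A minimal sketch of parsing such a variable is shown below; `parseModelsEnv` and `ParsedModel` are hypothetical names, and rejecting a string without a colon is an assumption of this sketch rather than documented behavior.

```typescript
interface ModelConfig {
  model: string
  url?: string
}

interface ParsedModel {
  modelType: string // e.g. 'text-embeddings-inference'
  modelId: string // e.g. 'llmrails/ember-v1'
  url?: string
}

// Parses the JSON array held in an environment variable such as AI_EMBEDDING_MODELS.
function parseModelsEnv(name: string, raw: string | undefined): ParsedModel[] {
  if (!raw) return []
  let parsed: unknown
  try {
    parsed = JSON.parse(raw)
  } catch (e) {
    throw new Error(`Invalid ${name} JSON: ${e}`)
  }
  if (!Array.isArray(parsed)) throw new Error(`${name} must be a JSON array`)
  return parsed.map((entry: Partial<ModelConfig> | null) => {
    if (!entry || typeof entry.model !== 'string') {
      throw new Error(`${name}: model field should be a string`)
    }
    // Split on the first colon only, so a model id may itself contain colons.
    const separator = entry.model.indexOf(':')
    if (separator === -1) {
      throw new Error(`${name}: expected '<model type>:<model id>', got '${entry.model}'`)
    }
    return {
      modelType: entry.model.slice(0, separator),
      modelId: entry.model.slice(separator + 1),
      url: entry.url
    }
  })
}
```

Given the embedding example above, this would yield one entry with model type `text-embeddings-inference`, model id `llmrails/ember-v1`, and the URL unchanged.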
| ## Usage | ||
|
|
||
| The Embedder service is stateless and takes no arguments. Multiple instances | ||
| of the service may be started in order to match embedding load, or to | ||
| catch up on history more quickly. |
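Running several instances only works if each job is picked up by exactly one of them. The sketch below illustrates that idea with an in-memory queue, assuming a job is claimed by flipping its state from `queued` to `embedding`; `claimNextJob` and `claimedBy` are hypothetical, and against a real database the check-and-set would need to happen in a single atomic statement.

```typescript
type JobState = 'queued' | 'embedding'

interface QueuedJob {
  id: number
  state: JobState
  claimedBy?: string // hypothetical bookkeeping field, not part of the README
}

// Claims the first queued job for a worker, or returns undefined if none is left.
// Single-threaded JavaScript makes this find-then-update safe in memory only.
function claimNextJob(jobs: QueuedJob[], workerId: string): QueuedJob | undefined {
  const job = jobs.find((candidate) => candidate.state === 'queued')
  if (!job) return undefined
  job.state = 'embedding'
  job.claimedBy = workerId
  return job
}
```

Because a claimed job is no longer `queued`, a second worker calling `claimNextJob` moves on to the next job instead of repeating the first.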
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,75 @@ | ||
| export interface ModelConfig { | ||
| model: string | ||
| url: string | ||
| } | ||
|
|
||
| export interface EmbeddingModelConfig extends ModelConfig { | ||
| tableSuffix: string | ||
| } | ||
|
|
||
| export interface GenerationModelConfig extends ModelConfig {} | ||
|
|
||
| export abstract class AbstractModel { | ||
| public readonly url?: string | ||
| public modelInstance: any | ||
|
||
|
|
||
| constructor(config: ModelConfig) { | ||
| this.url = this.normalizeUrl(config.url) | ||
| } | ||
|
|
||
| // removes any trailing slashes from the inputUrl | ||
| private normalizeUrl(inputUrl: string | undefined) { | ||
| if (!inputUrl) return undefined | ||
| const regex = /[/]+$/ | ||
| return inputUrl.replace(regex, '') | ||
| } | ||
| } | ||
|
|
||
| export interface EmbeddingModelParams { | ||
| embeddingDimensions: number | ||
| maxInputTokens: number | ||
| tableSuffix: string | ||
| } | ||
|
|
||
| export abstract class AbstractEmbeddingsModel extends AbstractModel { | ||
| readonly embeddingDimensions: number | ||
| readonly maxInputTokens: number | ||
| readonly tableName: string | ||
| constructor(config: EmbeddingModelConfig) { | ||
| super(config) | ||
| const modelParams = this.constructModelParams(config) | ||
| this.embeddingDimensions = modelParams.embeddingDimensions | ||
| this.maxInputTokens = modelParams.maxInputTokens | ||
| this.tableName = `Embeddings_${modelParams.tableSuffix}` | ||
| } | ||
| protected abstract constructModelParams(config: EmbeddingModelConfig): EmbeddingModelParams | ||
| abstract getEmbedding(content: string): Promise<number[]> | ||
| } | ||
|
|
||
| export interface GenerationModelParams { | ||
| maxInputTokens: number | ||
| } | ||
|
|
||
| export interface GenerationOptions { | ||
| maxNewTokens?: number | ||
| seed?: number | ||
| stop?: string | ||
| temperature?: number | ||
| topK?: number | ||
| topP?: number | ||
| truncate?: boolean | ||
| } | ||
|
|
||
| export abstract class AbstractGenerationModel extends AbstractModel { | ||
| readonly maxInputTokens: number | ||
| constructor(config: GenerationModelConfig) { | ||
| super(config) | ||
| const modelParams = this.constructModelParams(config) | ||
| this.maxInputTokens = modelParams.maxInputTokens | ||
| } | ||
|
|
||
| protected abstract constructModelParams(config: GenerationModelConfig): GenerationModelParams | ||
| abstract summarize(content: string, options: GenerationOptions): Promise<string> | ||
| } | ||
|
|
||
| export default AbstractModel | ||
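To show how a concrete model might plug into these abstract classes, here is a self-contained sketch. `EmbeddingsModelSketch` is a trimmed copy of `AbstractEmbeddingsModel` made only so the example compiles on its own; `FakeEmbeddingsModel`, `KNOWN_MODELS`, and the `example/tiny-model` id are hypothetical. One design point worth noting: the base constructor calls `constructModelParams` before any subclass field initializers have run, so the implementation should rely only on its `config` argument and module-level data.

```typescript
interface EmbeddingModelConfig {
  model: string
  url?: string
  tableSuffix: string
}

interface EmbeddingModelParams {
  embeddingDimensions: number
  maxInputTokens: number
  tableSuffix: string
}

// Trimmed stand-in for AbstractEmbeddingsModel from the diff above.
abstract class EmbeddingsModelSketch {
  readonly embeddingDimensions: number
  readonly maxInputTokens: number
  readonly tableName: string
  constructor(config: EmbeddingModelConfig) {
    // Called before subclass fields are initialized.
    const modelParams = this.constructModelParams(config)
    this.embeddingDimensions = modelParams.embeddingDimensions
    this.maxInputTokens = modelParams.maxInputTokens
    this.tableName = `Embeddings_${modelParams.tableSuffix}`
  }
  protected abstract constructModelParams(config: EmbeddingModelConfig): EmbeddingModelParams
  abstract getEmbedding(content: string): Promise<number[]>
}

// Hypothetical per-model parameters, kept at module level so they exist
// by the time the base constructor runs.
const KNOWN_MODELS: Record<string, {embeddingDimensions: number; maxInputTokens: number}> = {
  'example/tiny-model': {embeddingDimensions: 4, maxInputTokens: 16}
}

class FakeEmbeddingsModel extends EmbeddingsModelSketch {
  protected constructModelParams(config: EmbeddingModelConfig): EmbeddingModelParams {
    const modelId = config.model.split(':')[1] || ''
    const known = KNOWN_MODELS[modelId]
    if (!known) throw new Error(`unknown model '${modelId}'`)
    return {...known, tableSuffix: config.tableSuffix}
  }

  // Toy vector: sums character codes into buckets; a real model would call its inference server.
  async getEmbedding(content: string): Promise<number[]> {
    const vector: number[] = []
    for (let i = 0; i < this.embeddingDimensions; i++) vector.push(0)
    for (let i = 0; i < content.length; i++) {
      vector[i % vector.length] += content.charCodeAt(i)
    }
    return vector
  }
}
```

Constructing `new FakeEmbeddingsModel({model: 'fake:example/tiny-model', tableSuffix: 'tiny'})` would produce a model whose `tableName` is `Embeddings_tiny`.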
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,153 @@ | ||
| import {Kysely, sql} from 'kysely' | ||
|
|
||
| import { | ||
| AbstractEmbeddingsModel, | ||
| AbstractGenerationModel, | ||
| EmbeddingModelConfig, | ||
| GenerationModelConfig, | ||
| ModelConfig | ||
| } from './AbstractModel' | ||
| import TextEmbeddingsInference from './TextEmbeddingsInference' | ||
| import TextGenerationInference from './TextGenerationInference' | ||
|
|
||
| interface ModelManagerConfig { | ||
| embeddingModels: EmbeddingModelConfig[] | ||
| generationModels: GenerationModelConfig[] | ||
| } | ||
|
|
||
| export type EmbeddingsModelType = 'text-embeddings-inference' | ||
| export type GenerationModelType = 'text-generation-inference' | ||
|
|
||
| export class ModelManager { | ||
| embeddingModels: AbstractEmbeddingsModel[] | ||
| embeddingModelsMapByTable: {[key: string]: AbstractEmbeddingsModel} | ||
| generationModels: AbstractGenerationModel[] | ||
|
|
||
| private isValidConfig( | ||
| maybeConfig: Partial<ModelManagerConfig> | ||
| ): maybeConfig is ModelManagerConfig { | ||
| if (!maybeConfig.embeddingModels || !Array.isArray(maybeConfig.embeddingModels)) { | ||
| throw new Error('Invalid configuration: embeddingModels is missing or not an array') | ||
| } | ||
| if (!maybeConfig.generationModels || !Array.isArray(maybeConfig.generationModels)) { | ||
| throw new Error('Invalid configuration: generationModels is missing or not an array') | ||
| } | ||
|
|
||
| maybeConfig.embeddingModels.forEach((model: ModelConfig) => { | ||
| this.isValidModelConfig(model) | ||
| }) | ||
|
|
||
| maybeConfig.generationModels.forEach((model: ModelConfig) => { | ||
| this.isValidModelConfig(model) | ||
| }) | ||
|
|
||
| return true | ||
| } | ||
|
|
||
| private isValidModelConfig(model: ModelConfig): model is ModelConfig { | ||
| if (typeof model.model !== 'string') { | ||
| throw new Error('Invalid ModelConfig: model field should be a string') | ||
| } | ||
| if (model.url !== undefined && typeof model.url !== 'string') { | ||
| throw new Error('Invalid ModelConfig: url field should be a string') | ||
| } | ||
|
|
||
| return true | ||
| } | ||
|
|
||
| constructor(config: ModelManagerConfig) { | ||
| // Validate configuration | ||
| this.isValidConfig(config) | ||
|
|
||
| // Initialize embeddings models | ||
| this.embeddingModelsMapByTable = {} | ||
| this.embeddingModels = config.embeddingModels.map((modelConfig) => { | ||
| const [modelType] = modelConfig.model.split(':') as [EmbeddingsModelType, string] | ||
|
|
||
| switch (modelType) { | ||
| case 'text-embeddings-inference': { | ||
| const embeddingsModel = new TextEmbeddingsInference(modelConfig) | ||
| this.embeddingModelsMapByTable[embeddingsModel.tableName] = embeddingsModel | ||
| return embeddingsModel | ||
| } | ||
| default: | ||
| throw new Error(`unsupported embeddings model '${modelType}'`) | ||
| } | ||
| }) | ||
|
|
||
| // Initialize summarization models | ||
| this.generationModels = config.generationModels.map((modelConfig) => { | ||
| const [modelType, _] = modelConfig.model.split(':') as [GenerationModelType, string] | ||
|
|
||
| switch (modelType) { | ||
| case 'text-generation-inference': { | ||
| const generator = new TextGenerationInference(modelConfig) | ||
| return generator | ||
| } | ||
| default: | ||
| throw new Error(`unsupported summarization model '${modelType}'`) | ||
| } | ||
| }) | ||
| } | ||
|
|
||
| async maybeCreateTables(pg: Kysely<any>) { | ||
| const maybePromises = this.embeddingModels.map(async (embeddingsModel) => { | ||
| const tableName = embeddingsModel.tableName | ||
| const hasTable = | ||
| ( | ||
| await sql<number[]>`SELECT 1 FROM ${sql.id('pg_catalog', 'pg_tables')} WHERE ${sql.id( | ||
| 'tablename' | ||
| )} = ${tableName}`.execute(pg) | ||
| ).rows.length > 0 | ||
| if (hasTable) return undefined | ||
| const vectorDimensions = embeddingsModel.embeddingDimensions | ||
| console.log(`ModelManager: creating ${tableName} with ${vectorDimensions} dimensions`) | ||
| const query = sql` | ||
| DO $$ | ||
| BEGIN | ||
| CREATE TABLE IF NOT EXISTS ${sql.id(tableName)} ( | ||
| "id" INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, | ||
| "embedText" TEXT, | ||
| "embedding" vector(${sql.raw(vectorDimensions.toString())}), | ||
| "embeddingsMetadataId" INTEGER NOT NULL, | ||
| FOREIGN KEY ("embeddingsMetadataId") | ||
| REFERENCES "EmbeddingsMetadata"("id") | ||
| ON DELETE CASCADE | ||
| ); | ||
| CREATE INDEX IF NOT EXISTS "idx_${sql.raw(tableName)}_embedding_vector_cosign_ops" | ||
| ON ${sql.id(tableName)} | ||
| USING hnsw ("embedding" vector_cosine_ops); | ||
| END $$; | ||
|
|
||
| ` | ||
| return query.execute(pg) | ||
| }) | ||
| await Promise.all(maybePromises) | ||
| } | ||
| } | ||
|
|
||
| let modelManager: ModelManager | undefined | ||
| export function getModelManager() { | ||
| if (modelManager) return modelManager | ||
| const {AI_EMBEDDING_MODELS, AI_GENERATION_MODELS} = process.env | ||
| const config: ModelManagerConfig = { | ||
| embeddingModels: [], | ||
| generationModels: [] | ||
| } | ||
| try { | ||
| config.embeddingModels = AI_EMBEDDING_MODELS && JSON.parse(AI_EMBEDDING_MODELS) | ||
| } catch (e) { | ||
| throw new Error(`Invalid AI_EMBEDDING_MODELS .env JSON: ${e}`) | ||
| } | ||
| try { | ||
| config.generationModels = AI_GENERATION_MODELS && JSON.parse(AI_GENERATION_MODELS) | ||
| } catch (e) { | ||
| throw new Error(`Invalid AI_GENERATION_MODELS .env JSON: ${e}`) | ||
| } | ||
|
|
||
| modelManager = new ModelManager(config) | ||
|
|
||
| return modelManager | ||
| } | ||
|
|
||
| export default getModelManager |
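`maybeCreateTables` splices the table name into the index name with `sql.raw`, and that name ultimately derives from configuration. One possible safeguard is to validate the suffix before any SQL is built. The sketch below does this as a pure string builder so it can run without a database; `SAFE_SUFFIX`, `embeddingsTableName`, and `buildCreateTableSql` are hypothetical helpers, not part of this PR.

```typescript
// Hypothetical guard: only letters, digits, and underscores may appear in a suffix.
const SAFE_SUFFIX = /^[A-Za-z0-9_]+$/

function embeddingsTableName(tableSuffix: string): string {
  if (!SAFE_SUFFIX.test(tableSuffix)) {
    throw new Error(`unsafe table suffix '${tableSuffix}'`)
  }
  return `Embeddings_${tableSuffix}`
}

// Builds DDL equivalent in shape to the statements in maybeCreateTables.
// The index name here spells "cosine"; the diff above uses "cosign".
function buildCreateTableSql(tableSuffix: string, dimensions: number): string {
  if (!Number.isInteger(dimensions) || dimensions <= 0) {
    throw new Error(`invalid vector dimensions '${dimensions}'`)
  }
  const tableName = embeddingsTableName(tableSuffix)
  return [
    `CREATE TABLE IF NOT EXISTS "${tableName}" (`,
    `  "id" INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,`,
    `  "embedText" TEXT,`,
    `  "embedding" vector(${dimensions}),`,
    `  "embeddingsMetadataId" INTEGER NOT NULL,`,
    `  FOREIGN KEY ("embeddingsMetadataId")`,
    `    REFERENCES "EmbeddingsMetadata"("id")`,
    `    ON DELETE CASCADE`,
    `);`,
    `CREATE INDEX IF NOT EXISTS "idx_${tableName}_embedding_vector_cosine_ops"`,
    `  ON "${tableName}"`,
    `  USING hnsw ("embedding" vector_cosine_ops);`
  ].join('\n')
}
```

With a validated suffix, interpolating the table name into both the quoted identifier and the index name cannot introduce extra statements.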