Merged
1 change: 1 addition & 0 deletions packages/core/src/index.ts
@@ -166,6 +166,7 @@ export * from './subsystems/IO/VectorDB.service/connectors/MilvusVectorDB.class'
export * from './subsystems/IO/VectorDB.service/connectors/PineconeVectorDB.class';
export * from './subsystems/IO/VectorDB.service/connectors/RAMVecrtorDB.class';
export * from './subsystems/IO/VectorDB.service/embed/BaseEmbedding';
export * from './subsystems/IO/VectorDB.service/embed/GoogleEmbedding';
export * from './subsystems/IO/VectorDB.service/embed/index';
export * from './subsystems/IO/VectorDB.service/embed/OpenAIEmbedding';
export * from './subsystems/LLMManager/LLM.service/connectors/Anthropic.class';
packages/core/src/subsystems/IO/VectorDB.service/embed/GoogleEmbedding.ts
@@ -0,0 +1,92 @@
import { GoogleGenerativeAI } from '@google/generative-ai';
import { BaseEmbedding, TEmbeddings } from './BaseEmbedding';
import { AccessCandidate } from '@sre/Security/AccessControl/AccessCandidate.class';
import { getLLMCredentials } from '@sre/LLMManager/LLM.service/LLMCredentials.helper';
import { TLLMCredentials, TLLMModel, BasicCredentials } from '@sre/types/LLM.types';

const DEFAULT_MODEL = 'gemini-embedding-001';

export class GoogleEmbeds extends BaseEmbedding {
    protected client: GoogleGenerativeAI;
πŸ› οΈ Refactor suggestion

Initialize client field safely

Mark client as optional to satisfy strict init checks and match the lazy init pattern used below.

-    protected client: GoogleGenerativeAI;
+    protected client?: GoogleGenerativeAI;
πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
protected client: GoogleGenerativeAI;
// packages/core/src/subsystems/IO/VectorDB.service/embed/GoogleEmbedding.ts
class GoogleEmbeds extends BaseEmbedding {
    protected client?: GoogleGenerativeAI;
    // …rest of class…
}
🤖 Prompt for AI Agents
In packages/core/src/subsystems/IO/VectorDB.service/embed/GoogleEmbedding.ts
around line 10, the protected client is declared without allowing uninitialized
state which fails strict init checks and conflicts with the lazy-init pattern;
change the declaration to make it optional (e.g., use "protected client?:
GoogleGenerativeAI" or "protected client: GoogleGenerativeAI | undefined") and
ensure existing code paths that access this.client either initialize it first or
guard for undefined before use.


    public static models = ['gemini-embedding-001'];
    public canSpecifyDimensions = true;

Comment on lines +12 to +14
💡 Verification agent

🧩 Analysis chain

Dimensions flag: verify end-to-end behavior

canSpecifyDimensions = true implies callers may request non-default dims, but no dimension is passed to the API here. Ensure BaseEmbedding (or downstream) handles resizing/truncation to requested dims or pass the API parameter if supported.


🏁 Script executed:

#!/bin/bash
# Where are embedding dimensions enforced?
rg -n -C2 -e 'dimensions?|dimensionality|vector.?size|numDimensions' packages/core/src

Length of output: 13157


🏁 Script executed:

#!/bin/bash
# Show GoogleEmbedding class and its methods
rg -nP --type=ts 'class GoogleEmbedding' -C5 packages/core/src/subsystems/IO/VectorDB.service/embed

Length of output: 101


🏁 Script executed:

#!/bin/bash
# List files in embed directory to locate GoogleEmbedding file
ls -1 packages/core/src/subsystems/IO/VectorDB.service/embed

Length of output: 126


🏁 Script executed:

#!/bin/bash
# Display the contents of GoogleEmbedding.ts to inspect dimension handling
sed -n '1,200p' packages/core/src/subsystems/IO/VectorDB.service/embed/GoogleEmbedding.ts

Length of output: 3614


🏁 Script executed:

#!/bin/bash
sed -n '1,200p' packages/core/src/subsystems/IO/VectorDB.service/embed/BaseEmbedding.ts

Length of output: 3912


Disable dimension override in GoogleEmbedding

  • In packages/core/src/subsystems/IO/VectorDB.service/embed/GoogleEmbedding.ts:12, canSpecifyDimensions = true implies custom‐dim support but neither the API call nor BaseEmbedding uses or enforces this.dimensions, leading to mismatched vector lengths in downstream connectors.
  • Either set canSpecifyDimensions = false or add truncation/padding logic after model.embedContent to enforce this.dimensions.
🤖 Prompt for AI Agents
In packages/core/src/subsystems/IO/VectorDB.service/embed/GoogleEmbedding.ts
around lines 12 to 14, the class sets canSpecifyDimensions = true but neither
the Google API call nor BaseEmbedding enforces this.dimensions, causing
potential vector-length mismatches; either set canSpecifyDimensions = false to
disallow dimension overrides, or keep it true and after receiving
model.embedContent apply deterministic truncation or zero-padding to the
returned embedding to exactly this.dimensions (and validate the value is numeric
and >0), then return the adjusted vector so downstream connectors always receive
vectors of the expected length.
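The deterministic resize the comment proposes could look like the sketch below. `fitToDimensions` is a hypothetical helper, not part of this PR: it validates the requested dimension, truncates over-long embeddings, and zero-pads short ones so downstream connectors always receive vectors of the expected length.

```typescript
// Hypothetical helper enforcing a requested embedding dimension.
function fitToDimensions(vector: number[], dimensions: number): number[] {
    if (!Number.isInteger(dimensions) || dimensions <= 0) {
        throw new Error(`Invalid dimensions: ${dimensions}`);
    }
    if (vector.length >= dimensions) {
        return vector.slice(0, dimensions); // deterministic truncation
    }
    // Deterministic zero-padding up to the requested length.
    return [...vector, ...new Array(dimensions - vector.length).fill(0)];
}
```

Note that naive truncation changes the vector's norm, so callers comparing cosine similarities may want to re-normalize after resizing; passing a dimension parameter to the API, where supported, avoids that caveat.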

    constructor(private settings?: Partial<TEmbeddings>) {
        super({ model: settings?.model ?? DEFAULT_MODEL, ...settings });
    }
Comment on lines +15 to +17
⚠️ Potential issue

Default model can be unintentionally overridden to undefined

Spreading settings after setting model can overwrite the default when settings.model is undefined. Put the spread first and the model last.

-    constructor(private settings?: Partial<TEmbeddings>) {
-        super({ model: settings?.model ?? DEFAULT_MODEL, ...settings });
-    }
+    constructor(private settings?: Partial<TEmbeddings>) {
+        super({ ...settings, model: settings?.model ?? DEFAULT_MODEL });
+    }
πŸ“ Committable suggestion


Suggested change
constructor(private settings?: Partial<TEmbeddings>) {
    super({ model: settings?.model ?? DEFAULT_MODEL, ...settings });
}
constructor(private settings?: Partial<TEmbeddings>) {
    super({ ...settings, model: settings?.model ?? DEFAULT_MODEL });
}
🤖 Prompt for AI Agents
In packages/core/src/subsystems/IO/VectorDB.service/embed/GoogleEmbedding.ts
around lines 15 to 17, the constructor currently spreads settings after setting
model which allows settings.model === undefined to overwrite the default; change
the order so you spread settings first and then set model (or set model using
nullish-coalescing on the already-spread object) so the default model is applied
when settings.model is undefined.
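The pitfall flagged here is a general property of object spread: a key that is present but set to `undefined` still overwrites an earlier value when spread last. A minimal demonstration:

```typescript
const DEFAULT_MODEL = 'gemini-embedding-001';
// `model` is explicitly present with value undefined, as happens when a
// caller builds settings from optional inputs.
const settings: { model?: string } = { model: undefined };

// Buggy order: spreading settings last clobbers the default with undefined.
const buggy = { model: settings?.model ?? DEFAULT_MODEL, ...settings };

// Fixed order: spreading settings first lets the explicit default win.
const fixed = { ...settings, model: settings?.model ?? DEFAULT_MODEL };
```

Spread copies all own enumerable properties, including those whose value is `undefined`, which is why order matters even though `settings?.model ?? DEFAULT_MODEL` itself never yields `undefined`.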


    async embedTexts(texts: string[], candidate: AccessCandidate): Promise<number[][]> {
        const batches = this.chunkArr(this.processTexts(texts), this.chunkSize);

        const batchRequests = batches.map((batch) => {
            return this.embed(batch, candidate);
        });
        const batchResponses = await Promise.all(batchRequests);

        const embeddings: number[][] = [];
        for (let i = 0; i < batchResponses.length; i += 1) {
            const batch = batches[i];
            const batchResponse = batchResponses[i];
            for (let j = 0; j < batch.length; j += 1) {
                embeddings.push(batchResponse[j]);
            }
        }
        return embeddings;
    }

    async embedText(text: string, candidate: AccessCandidate): Promise<number[]> {
        const processedText = this.processTexts([text])[0];
        const embeddings = await this.embed([processedText], candidate);
        return embeddings[0];
    }

    protected async embed(texts: string[], candidate: AccessCandidate): Promise<number[][]> {
        let apiKey: string | undefined;

        // Try to get from credentials first
        try {
            const modelInfo: TLLMModel = {
                provider: 'GoogleAI',
                modelId: this.model,
                credentials: this.settings?.credentials as unknown as TLLMCredentials,
            };
            const credentials = await getLLMCredentials(candidate, modelInfo);
            apiKey = (credentials as BasicCredentials)?.apiKey;
        } catch (e) {
            // If credential system fails, fall back to environment variable
        }

        // Fall back to environment variable if not found in credentials
        if (!apiKey) {
            apiKey = process.env.GOOGLE_AI_API_KEY;
        }

        if (!apiKey) {
            throw new Error('Please provide an API key for Google AI embeddings via credentials or GOOGLE_AI_API_KEY environment variable');
        }

        if (!this.client) {
            this.client = new GoogleGenerativeAI(apiKey);
        }

        try {
            const model = this.client.getGenerativeModel({ model: this.model });

            const embeddings: number[][] = [];

            for (const text of texts) {
                const result = await model.embedContent(text);
                if (result?.embedding?.values) {
                    embeddings.push(result.embedding.values);
                } else {
                    throw new Error('Invalid embedding response from Google AI');
                }
            }

            return embeddings;
        } catch (e) {
            throw new Error(`Google Embeddings API error: ${e.message || e}`);
        }
    }
}
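The API-key resolution order that embed() implements (credential system first, then the GOOGLE_AI_API_KEY environment variable, then a hard failure) can be distilled into the sketch below. `resolveApiKey` is a hypothetical helper written for illustration, not code from this PR.

```typescript
// Resolve an API key from a credential lookup, falling back to the
// GOOGLE_AI_API_KEY environment variable, and failing loudly otherwise.
async function resolveApiKey(
    fromCredentials: () => Promise<string | undefined>
): Promise<string> {
    let apiKey: string | undefined;
    try {
        apiKey = await fromCredentials();
    } catch {
        // Credential system failed; fall through to the environment variable.
    }
    apiKey = apiKey ?? process.env.GOOGLE_AI_API_KEY;
    if (!apiKey) {
        throw new Error('Please provide an API key via credentials or GOOGLE_AI_API_KEY');
    }
    return apiKey;
}
```

Factoring the chain out this way also makes each fallback step unit-testable in isolation, which the inline try/catch in embed() is not.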
packages/core/src/subsystems/IO/VectorDB.service/embed/index.ts
@@ -1,4 +1,5 @@
import { OpenAIEmbeds } from './OpenAIEmbedding';
import { GoogleEmbeds } from './GoogleEmbedding';
import { TEmbeddings } from './BaseEmbedding';

// a factory to get the correct embedding provider based on the provider name
@@ -7,6 +8,10 @@ const supportedProviders = {
        embedder: OpenAIEmbeds,
        models: OpenAIEmbeds.models,
    },
    GoogleAI: {
        embedder: GoogleEmbeds,
        models: GoogleEmbeds.models,
    },
} as const;

export type SupportedProviders = keyof typeof supportedProviders;
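The registry pattern used by this factory can be sketched self-contained as below. The stand-in classes and model lists are illustrative only; the real implementations live in OpenAIEmbedding.ts and GoogleEmbedding.ts.

```typescript
// Illustrative stand-ins for the real embedder classes.
class OpenAIEmbeds { static models = ['text-embedding-3-small']; }
class GoogleEmbeds { static models = ['gemini-embedding-001']; }

// Provider name -> embedder class and its supported models, `as const` so
// the key set becomes a literal union type.
const supportedProviders = {
    OpenAI: { embedder: OpenAIEmbeds, models: OpenAIEmbeds.models },
    GoogleAI: { embedder: GoogleEmbeds, models: GoogleEmbeds.models },
} as const;

type SupportedProviders = keyof typeof supportedProviders;

// Lookup is type-safe: an unknown provider name is a compile-time error.
function getEmbedder(provider: SupportedProviders) {
    return supportedProviders[provider].embedder;
}
```

Because the map is `as const`, adding GoogleAI here is all that is needed for the rest of the codebase to see it in the SupportedProviders union.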