feat: add Gemini embed_content tracking by carlos-marchal-ph · Pull Request #498 · PostHog/posthog-python

carlos-marchal-ph · 2026-04-10T09:36:25Z

Changes

Adds embed_content wrapping to the Gemini client (both sync and async), capturing $ai_embedding events in PostHog. This addresses a community request for tracking Gemini embedding calls.

- Add `embed_content()` to the `Models` and `AsyncModels` classes, matching the native `client.models.embed_content()` API
- Track `$ai_embedding` events with provider, model, input, latency, and token counts (Vertex AI)
- Support all PostHog params (`distinct_id`, `trace_id`, `properties`, `privacy_mode`, `groups`) with default merging
- Error handling: capture the event with error properties before re-raising
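
For readers skimming the thread, the behaviour listed above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `FakePostHog`, `FakeModels`, and `tracked_embed_content` are hypothetical stand-ins, the property names are copied from the tests in this PR, and letting caller-supplied properties override the `$ai_*` ones is an assumption.

```python
import time
import uuid
from typing import Any, Dict, List, Optional


class FakePostHog:
    """Hypothetical stand-in for a PostHog client; records capture() calls."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []

    def capture(self, **kwargs: Any) -> None:
        self.events.append(kwargs)


class FakeModels:
    """Hypothetical stand-in for the native client.models object."""

    def embed_content(self, model: str, contents: Any, **kwargs: Any) -> Dict[str, Any]:
        if not contents:
            raise ValueError("contents must not be empty")
        return {"embeddings": [[0.1, 0.2, 0.3]]}


def tracked_embed_content(
    models: Any,
    ph: Any,
    model: str,
    contents: Any,
    posthog_distinct_id: Optional[str] = None,
    posthog_trace_id: Optional[str] = None,
    posthog_properties: Optional[Dict[str, Any]] = None,
    posthog_privacy_mode: bool = False,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Call embed_content and capture one $ai_embedding event, even on failure."""
    trace_id = posthog_trace_id or str(uuid.uuid4())
    props: Dict[str, Any] = {
        "$ai_provider": "gemini",
        "$ai_model": model,
        # Privacy mode drops the raw input from the event.
        "$ai_input": None if posthog_privacy_mode else contents,
        "$ai_trace_id": trace_id,
        **(posthog_properties or {}),  # assumption: caller properties win
    }
    start = time.time()
    try:
        # Only non-PostHog arguments are forwarded to the native call.
        response = models.embed_content(model=model, contents=contents, **kwargs)
        props["$ai_http_status"] = 200
        return response
    except Exception as exc:
        props.update({"$ai_is_error": True, "$ai_error": str(exc), "$ai_http_status": 0})
        raise  # the event is still captured in the finally block below
    finally:
        props["$ai_latency"] = time.time() - start
        if posthog_distinct_id is None:
            # Without a distinct_id, fall back to the trace id and skip profiles.
            props["$process_person_profile"] = False
        ph.capture(
            distinct_id=posthog_distinct_id or trace_id,
            event="$ai_embedding",
            properties=props,
        )
```

Putting the capture in `finally` is one way to guarantee exactly one event per call on both the success and the error path; the actual implementation may structure this differently.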

How to test

- `pytest posthog/test/ai/gemini/ -v`: 20 new tests (8 sync + 8 async mock-based, 2 sync + 2 async integration)
- Integration tests run with `GOOGLE_API_KEY=$GEMINI_API_KEY` and hit the real Gemini embedding API
- Existing Gemini tests still pass (55 passed, 4 skipped)

github-actions · 2026-04-10T09:37:03Z

posthog-python Compliance Report

Date: 2026-04-10 15:06:01 UTC
Duration: 196ms

✅ All Tests Passed!

0/0 tests passed

greptile-apps · 2026-04-10T09:39:01Z

Prompt To Fix All With AI

This is a comment left during a code review.
Path: posthog/ai/gemini/gemini.py
Line: 28

Comment:
**Duplicate import from `posthog.ai.utils`**

`with_privacy_mode` is imported in a separate statement when an import for the same module already exists on lines 17–21. This violates DRY and makes the import block harder to scan.

```suggestion
from posthog.ai.utils import (
    call_llm_and_track_usage,
    capture_streaming_event,
    merge_usage_stats,
    with_privacy_mode,
)
```

The standalone `from posthog.ai.utils import with_privacy_mode` line (28) should then be removed. The same issue exists in `gemini_async.py`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: posthog/test/ai/gemini/test_gemini.py
Line: 1196-1351

Comment:
**Prefer parameterised tests**

The 8 new `embed_content` unit tests (and their 8 async mirrors in `test_gemini_async.py`) are written as independent functions, but several of them vary only in one dimension (input type, presence of token stats, privacy flag, etc.) and share nearly identical assertion logic. Per the project's testing convention, these should use `pytest.mark.parametrize`.

For example, `test_embed_content_basic` and `test_embed_content_without_token_counts` both call `embed_content` with a single string and verify the captured event — the only difference is which fixture is used and the expected `$ai_input_tokens` value. Similarly, `test_embed_content_privacy_mode` and `test_embed_content_no_distinct_id` each test a single property on the same captured event and could be collapsed into a single parameterised test.

Additionally, the `mock_embed_content_response` and `mock_embed_content_response_with_stats` fixtures are duplicated verbatim in both test files; moving them to a shared `conftest.py` would remove the redundancy.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "chore: add sampo changeset for gemini em..."

greptile-apps · 2026-04-10T09:39:05Z

posthog/ai/gemini/gemini.py

    extract_gemini_content_from_chunk,
    format_gemini_streaming_output,
 )
+from posthog.ai.utils import with_privacy_mode


Duplicate import from posthog.ai.utils

with_privacy_mode is imported in a separate statement when an import for the same module already exists on lines 17–21. This violates DRY and makes the import block harder to scan.

Suggested change

    -from posthog.ai.utils import with_privacy_mode
    +from posthog.ai.utils import (
    +    call_llm_and_track_usage,
    +    capture_streaming_event,
    +    merge_usage_stats,
    +    with_privacy_mode,
    +)

The standalone from posthog.ai.utils import with_privacy_mode line (28) should then be removed. The same issue exists in gemini_async.py.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

greptile-apps · 2026-04-10T09:39:06Z

posthog/test/ai/gemini/test_gemini.py

+def test_embed_content_basic(mock_client, mock_google_genai_client, mock_embed_content_response):
+    mock_google_genai_client.models.embed_content.return_value = mock_embed_content_response
+
+    client = Client(api_key="test-key", posthog_client=mock_client)
+    response = client.models.embed_content(
+        model="gemini-embedding-001",
+        contents="Hello world",
+        posthog_distinct_id="test-id",
+        posthog_properties={"foo": "bar"},
+    )
+
+    assert response == mock_embed_content_response
+    assert mock_client.capture.call_count == 1
+
+    call_args = mock_client.capture.call_args[1]
+    props = call_args["properties"]
+
+    assert call_args["distinct_id"] == "test-id"
+    assert call_args["event"] == "$ai_embedding"
+    assert props["$ai_provider"] == "gemini"
+    assert props["$ai_model"] == "gemini-embedding-001"
+    assert props["$ai_input"] == "Hello world"
+    assert props["$ai_http_status"] == 200
+    assert isinstance(props["$ai_latency"], float)
+    assert "$ai_trace_id" in props
+    assert props["$ai_base_url"] == "https://generativelanguage.googleapis.com"
+    assert props["foo"] == "bar"
+
+    # Verify the underlying call received the right args
+    mock_google_genai_client.models.embed_content.assert_called_once_with(
+        model="gemini-embedding-001", contents="Hello world"
+    )
+
+
+def test_embed_content_with_token_counts(mock_client, mock_google_genai_client, mock_embed_content_response_with_stats):
+    mock_google_genai_client.models.embed_content.return_value = mock_embed_content_response_with_stats
+
+    client = Client(api_key="test-key", posthog_client=mock_client)
+    client.models.embed_content(
+        model="gemini-embedding-001",
+        contents=["Hello", "World"],
+        posthog_distinct_id="test-id",
+    )
+
+    props = mock_client.capture.call_args[1]["properties"]
+    assert props["$ai_input_tokens"] == 13  # 5 + 8
+
+
+def test_embed_content_without_token_counts(mock_client, mock_google_genai_client, mock_embed_content_response):
+    mock_google_genai_client.models.embed_content.return_value = mock_embed_content_response
+
+    client = Client(api_key="test-key", posthog_client=mock_client)
+    client.models.embed_content(
+        model="gemini-embedding-001",
+        contents="Hello",
+        posthog_distinct_id="test-id",
+    )
+
+    props = mock_client.capture.call_args[1]["properties"]
+    assert props["$ai_input_tokens"] == 0
+
+
+def test_embed_content_privacy_mode(mock_client, mock_google_genai_client, mock_embed_content_response):
+    mock_google_genai_client.models.embed_content.return_value = mock_embed_content_response
+
+    client = Client(api_key="test-key", posthog_client=mock_client)
+    client.models.embed_content(
+        model="gemini-embedding-001",
+        contents="Secret text",
+        posthog_distinct_id="test-id",
+        posthog_privacy_mode=True,
+    )
+
+    props = mock_client.capture.call_args[1]["properties"]
+    assert props["$ai_input"] is None
+
+
+def test_embed_content_no_distinct_id(mock_client, mock_google_genai_client, mock_embed_content_response):
+    mock_google_genai_client.models.embed_content.return_value = mock_embed_content_response
+
+    client = Client(api_key="test-key", posthog_client=mock_client)
+    client.models.embed_content(
+        model="gemini-embedding-001",
+        contents="Hello",
+    )
+
+    call_args = mock_client.capture.call_args[1]
+    props = call_args["properties"]
+
+    # Should fall back to trace_id as distinct_id
+    assert call_args["distinct_id"] == props["$ai_trace_id"]
+    assert props["$process_person_profile"] is False
+
+
+def test_embed_content_default_params(mock_client, mock_google_genai_client, mock_embed_content_response):
+    mock_google_genai_client.models.embed_content.return_value = mock_embed_content_response
+
+    client = Client(
+        api_key="test-key",
+        posthog_client=mock_client,
+        posthog_distinct_id="default-id",
+        posthog_properties={"team": "ai"},
+        posthog_groups={"company": "test-co"},
+    )
+    client.models.embed_content(
+        model="gemini-embedding-001",
+        contents="Hello",
+        posthog_properties={"extra": "prop"},
+    )
+
+    call_args = mock_client.capture.call_args[1]
+    props = call_args["properties"]
+
+    assert call_args["distinct_id"] == "default-id"
+    assert props["team"] == "ai"
+    assert props["extra"] == "prop"
+    assert call_args["groups"] == {"company": "test-co"}
+
+
+def test_embed_content_error_handling(mock_client, mock_google_genai_client):
+    mock_google_genai_client.models.embed_content.side_effect = Exception("API error")
+
+    client = Client(api_key="test-key", posthog_client=mock_client)
+
+    with pytest.raises(Exception, match="API error"):
+        client.models.embed_content(
+            model="gemini-embedding-001",
+            contents="Hello",
+            posthog_distinct_id="test-id",
+        )
+
+    # Event should still be captured
+    assert mock_client.capture.call_count == 1
+    props = mock_client.capture.call_args[1]["properties"]
+    assert props["$ai_is_error"] is True
+    assert props["$ai_error"] == "API error"
+    assert props["$ai_http_status"] == 0
+
+
+def test_embed_content_kwargs_passthrough(mock_client, mock_google_genai_client, mock_embed_content_response):
+    mock_google_genai_client.models.embed_content.return_value = mock_embed_content_response
+
+    client = Client(api_key="test-key", posthog_client=mock_client)
+    client.models.embed_content(
+        model="gemini-embedding-001",
+        contents="Hello",
+        posthog_distinct_id="test-id",
+        config={"output_dimensionality": 64},
+    )
+
+    mock_google_genai_client.models.embed_content.assert_called_once_with(
+        model="gemini-embedding-001",
+        contents="Hello",
+        config={"output_dimensionality": 64},
+    )
+


Prefer parameterised tests

The 8 new embed_content unit tests (and their 8 async mirrors in test_gemini_async.py) are written as independent functions, but several of them vary only in one dimension (input type, presence of token stats, privacy flag, etc.) and share nearly identical assertion logic. Per the project's testing convention, these should use pytest.mark.parametrize.

For example, test_embed_content_basic and test_embed_content_without_token_counts both call embed_content with a single string and verify the captured event — the only difference is which fixture is used and the expected $ai_input_tokens value. Similarly, test_embed_content_privacy_mode and test_embed_content_no_distinct_id each test a single property on the same captured event and could be collapsed into a single parameterised test.

Additionally, the mock_embed_content_response and mock_embed_content_response_with_stats fixtures are duplicated verbatim in both test files; moving them to a shared conftest.py would remove the redundancy.
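
A rough sketch of what the suggested parametrisation could look like is given below. It is deliberately self-contained so it runs without the PostHog or Google SDKs: `fake_response` and `embed_with_tracking` are hypothetical stand-ins for the response fixtures and the wrapped `embed_content`, and only `pytest.mark.parametrize`, `pytest.param`, and `unittest.mock.MagicMock` are real APIs. The expected values mirror the assertions in the tests above.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock

import pytest


def fake_response(token_counts):
    """Hypothetical response: one embedding per entry, each with an optional token count."""
    return SimpleNamespace(
        embeddings=[SimpleNamespace(token_count=count) for count in token_counts]
    )


def embed_with_tracking(native, ph, *, contents, privacy_mode=False):
    """Hypothetical stand-in for the wrapped embed_content under test."""
    response = native.embed_content(contents=contents)
    ph.capture(
        event="$ai_embedding",
        properties={
            "$ai_input": None if privacy_mode else contents,
            # Missing token counts are treated as zero.
            "$ai_input_tokens": sum(e.token_count or 0 for e in response.embeddings),
        },
    )
    return response


@pytest.mark.parametrize(
    "token_counts, privacy_mode, expected_input, expected_tokens",
    [
        pytest.param([None], False, "Hello", 0, id="without_token_counts"),
        pytest.param([5, 8], False, "Hello", 13, id="with_token_counts"),
        pytest.param([None], True, None, 0, id="privacy_mode"),
    ],
)
def test_embed_content_properties(token_counts, privacy_mode, expected_input, expected_tokens):
    native, ph = MagicMock(), MagicMock()
    native.embed_content.return_value = fake_response(token_counts)

    embed_with_tracking(native, ph, contents="Hello", privacy_mode=privacy_mode)

    # One captured event; only the varying properties are asserted here.
    props = ph.capture.call_args[1]["properties"]
    assert props["$ai_input"] == expected_input
    assert props["$ai_input_tokens"] == expected_tokens
```

In the real test suite, the response builders would presumably be the shared fixtures moved into `conftest.py`, and the same parameter table could be reused by the async tests.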


carlos-marchal-ph · 2026-04-10T11:00:28Z

This PR was implemented to address a community request for tracking Gemini embedding calls.

JS SDK companion PR for feature parity: PostHog/posthog-js#3369

carlos-marchal-ph added 2 commits April 10, 2026 11:21

feat: add gemini embed_content tracking support

efd4b30

chore: add sampo changeset for gemini embed_content

e581b53

carlos-marchal-ph requested a review from a team April 10, 2026 09:36

carlos-marchal-ph self-assigned this Apr 10, 2026

carlos-marchal-ph added team/llm-analytics release labels Apr 10, 2026

greptile-apps bot reviewed Apr 10, 2026

carlos-marchal-ph mentioned this pull request Apr 10, 2026

feat: add Gemini embedContent tracking PostHog/posthog-js#3369

Merged

7 tasks

chore: fix black formatting

3793bf5

andrewm4894 approved these changes Apr 10, 2026

Merge remote-tracking branch 'origin/master' into feat(llma)/gemini-e…

d039ed4

…mbed-content # Conflicts: # posthog/test/ai/gemini/test_gemini.py

carlos-marchal-ph merged commit b921fe3 into master Apr 10, 2026
26 checks passed

carlos-marchal-ph deleted the feat(llma)/gemini-embed-content branch April 10, 2026 15:15

carlos-marchal-ph deployed to Release April 10, 2026 15:15 — with GitHub Actions Active

carlos-marchal-ph merged 4 commits into master from feat(llma)/gemini-embed-content
