@hemanth5055 hemanth5055 commented Oct 12, 2025

Description:

This PR introduces a robust analytics and language detection system for the code translation feature. It provides real-time tracking of translation usage and enables detailed insights into which language pairs are most frequently used. The main updates include:

  1. KV-based Analytics Tracking:

    • Every translation request now records the source and target language pair in Cloudflare Workers KV (LANG_TRANSLATION_ANALYTICS).
    • Each KV entry maintains a usage count that increments with every translation request.
    • Keys are normalized (lowercase and trimmed) to avoid duplicates caused by inconsistent casing or whitespace.
  2. Source Language Detection:

    • Automatically detects the programming language of submitted code using Google Gemini AI before performing translation.
    • Guarantees accurate analytics tracking for multiple programming languages.
  3. /v1/analytics Endpoint:

    • Introduces a new GET endpoint to fetch all translation analytics stored in KV.
    • Returns a JSON object where keys represent source-target language pairs and values indicate their usage counts.
    • Enables monitoring of translation usage patterns and identification of popular language combinations.
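
As a rough illustration of the tracking described above, the counter can be sketched as a read-modify-write against KV. This is a minimal sketch, not the PR's actual code: `KVLike`, `MemoryKV`, and `analyticsKey` are hypothetical names, the in-memory store stands in for the LANG_TRANSLATION_ANALYTICS binding, and the stored `{ count }` shape is an assumption.

```typescript
// Minimal subset of the Workers KV API used here (get/put only).
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// Hypothetical in-memory stand-in for the KV binding, for local experiments.
class MemoryKV implements KVLike {
  private store = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    return this.store.get(key) ?? null;
  }
  async put(key: string, value: string): Promise<void> {
    this.store.set(key, value);
  }
}

// Build a normalized "source-target" key (trimmed and lowercased).
function analyticsKey(source: string, dest: string): string {
  return `${source.trim().toLowerCase()}-${dest.trim().toLowerCase()}`;
}

// Increment the usage count for a language pair and return the new count.
// The stored shape { count } is an assumption made for this sketch.
async function updateAnalytics(source: string, dest: string, kv: KVLike): Promise<number> {
  const key = analyticsKey(source, dest);
  let count = 0;
  try {
    const raw = await kv.get(key);
    const parsed = raw ? JSON.parse(raw) : {};
    if (typeof parsed.count === "number") count = parsed.count;
  } catch {
    // Malformed value: restart the counter instead of failing the request.
  }
  count += 1;
  await kv.put(key, JSON.stringify({ count }));
  return count;
}
```

Because Workers KV offers no atomic increment, concurrent requests that read and write the same key this way can lose updates, so counts of this kind are best treated as approximate.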

Testing:

  • Verified that translation requests increment analytics counts correctly in KV.

  • Tested the /v1/analytics endpoint to ensure all keys and counts are returned accurately.

  • Confirmed that source language detection reliably identifies programming languages for various code snippets.


Impact:

  • Enables comprehensive tracking of translation usage and helps identify the most popular language pairs.
  • Provides actionable insights into feature usage for monitoring, optimization, and future improvements.
  • Improves reliability by ensuring KV writes are awaited and keys are normalized.

Next Steps / Suggestions for Reviewers:

  • Please review the KV write logic and analytics counting mechanism.
  • Confirm that language detection works accurately for all supported languages.
  • Suggest any improvements for performance or code readability.

Before testing or running the Worker locally, please add your KV namespace ID for LANG_TRANSLATION_ANALYTICS to the KV binding section of backend/wrangler.jsonc.
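
For reference, a KV binding entry in a Wrangler configuration file generally looks like the fragment below. This is only a sketch: the `id` value is a placeholder to be replaced with your own namespace ID, and any other settings in the project's file are omitted.

```jsonc
{
  "kv_namespaces": [
    {
      // Binding name the Worker reads as env.LANG_TRANSLATION_ANALYTICS.
      "binding": "LANG_TRANSLATION_ANALYTICS",
      // Placeholder: replace with your own KV namespace ID.
      "id": "<your-kv-namespace-id>"
    }
  ]
}
```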

closes #20

@dineshsutihar dineshsutihar requested a review from Copilot October 13, 2025 10:32

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces a comprehensive analytics system for the code translation feature, tracking usage patterns through Cloudflare Workers KV storage. It implements automatic source language detection using Google Gemini AI and provides an analytics endpoint for monitoring translation usage.

  • Added KV-based analytics tracking that records source-target language pairs with usage counts
  • Implemented automatic source language detection for submitted code using AI
  • Created a new /v1/analytics endpoint to retrieve usage statistics

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| backend/wrangler.jsonc | Added KV namespace binding for analytics storage |
| backend/src/index.ts | Implemented analytics tracking, language detection, and analytics endpoint |

async function handleTranslate(request: Request, model: ReturnType<GoogleGenerativeAI['getGenerativeModel']>) {

async function updateAnalytics(source: string, dest: string, env: Env) {
  const key = `${source}-${dest}`;
Copilot AI Oct 13, 2025

The analytics key should be normalized (lowercase and trimmed) to avoid duplicates, similar to how the rate limit key is handled on line 19.

Suggested change
- const key = `${source}-${dest}`;
+ const normalizedSource = source.trim().toLowerCase();
+ const normalizedDest = dest.trim().toLowerCase();
+ const key = `${normalizedSource}-${normalizedDest}`;

${code}`;

const result = await model.generateContent(prompt);
return result.response.text().trim();
Copilot AI Oct 13, 2025

The detectLanguage function should normalize the returned language name to lowercase to ensure consistent analytics keys, preventing duplicate entries like 'Python' vs 'python'.

Suggested change
- return result.response.text().trim();
+ return result.response.text().trim().toLowerCase();

const stats: Record<string, any> = {};
for (const key of list.keys) {
  const val = await env.LANG_TRANSLATION_ANALYTICS.get(key.name);
  stats[key.name] = JSON.parse(val || '{}');
Copilot AI Oct 13, 2025

JSON.parse should be wrapped in a try-catch block to handle potential parsing errors, similar to the error handling in updateAnalytics function.

Suggested change
- stats[key.name] = JSON.parse(val || '{}');
+ try {
+   stats[key.name] = JSON.parse(val || '{}');
+ } catch (e) {
+   console.error(`Failed to parse analytics value for key "${key.name}":`, e);
+   stats[key.name] = {};
+ }
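
To show the defensive parsing in context, here is a minimal sketch of how the endpoint's aggregation step might look with the guard factored into a helper. `KVListable`, `safeParse`, and `collectStats` are hypothetical names, not the PR's code, and the sketch reads only the first page returned by `list()`.

```typescript
// Minimal subset of the Workers KV API used here (list/get, first page only).
interface KVListable {
  list(): Promise<{ keys: { name: string }[] }>;
  get(key: string): Promise<string | null>;
}

// Parse one stored value; fall back to {} on missing or malformed JSON.
function safeParse(raw: string | null, key: string): unknown {
  if (!raw) return {};
  try {
    return JSON.parse(raw);
  } catch (e) {
    console.error(`Failed to parse analytics value for key "${key}":`, e);
    return {};
  }
}

// Collect all stats, fetching the values concurrently instead of one by one.
async function collectStats(kv: KVListable): Promise<Record<string, unknown>> {
  const { keys } = await kv.list();
  const entries = await Promise.all(
    keys.map(async ({ name }) => [name, safeParse(await kv.get(name), name)] as const),
  );
  return Object.fromEntries(entries);
}
```

Fetching values with `Promise.all` is a design choice of this sketch; it avoids awaiting each `get` in sequence. A namespace with more keys than one `list()` page returns would also need cursor-based pagination, which is left out here.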

@hemanth5055

Hey @dineshsutihar, I have made the changes suggested in the Copilot review, so you can merge it now.

@dineshsutihar dineshsutihar self-requested a review October 14, 2025 16:57

@dineshsutihar dineshsutihar left a comment

Nice work, @hemanth5055! I'm going to merge it now.
In the future, I think we can optimize this by sending the source language from the extension, so we won't need to call Gemini to detect the language.

@hemanth5055

> Nice work, @hemanth5055! I'm going to merge it now. In the future, I think we can optimize this by sending the source language from the extension, so we won't need to call Gemini to detect the language.

Hey @dineshsutihar,
I was thinking along the same lines. We can change the translation prompt so that Gemini returns a JSON response like this:

{
  "translation": "",
  "source_language": ""
}

This way, we can eliminate the extra API call that’s currently used just for language detection.
If you like this approach, let me know — I can raise a new issue for it.
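
On the Worker side, the proposed response shape could be consumed roughly as follows. This is a minimal sketch under assumptions: `parseTranslationResponse` and `TranslationResult` are hypothetical names, the field names are taken from the JSON above, and stripping a Markdown code fence is a precaution for models that wrap JSON output in one.

```typescript
interface TranslationResult {
  translation: string;
  source_language: string;
}

// Parse the model's raw text into the proposed shape, or return null if it does not match.
function parseTranslationResponse(text: string): TranslationResult | null {
  // Strip a surrounding Markdown code fence if the model added one.
  const cleaned = text.trim().replace(/^```(?:json)?\s*/i, "").replace(/\s*```$/, "");
  let data: unknown;
  try {
    data = JSON.parse(cleaned);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { translation, source_language } = data as Record<string, unknown>;
  if (typeof translation !== "string" || typeof source_language !== "string") return null;
  // Normalize the language so it can be used directly in an analytics key.
  return { translation, source_language: source_language.trim().toLowerCase() };
}
```

Returning `null` on any mismatch lets the caller decide whether to retry or report an error rather than recording a bad analytics key.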

@dineshsutihar

Yes, that sounds like a solid approach, @hemanth5055. Please go ahead and raise a new issue for it.

Linked issue: Backend: Add anonymous usage analytics to track popular languages