2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2025 Angular
Copyright (c) 2025 Google LLC

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
147 changes: 113 additions & 34 deletions README.md
@@ -1,36 +1,57 @@
# Web Codegen Scorer

This project is a tool designed to assess the quality of front-end code generated by Large Language Models (LLMs).
**Web Codegen Scorer** is a tool for evaluating the quality of web code generated by Large Language
Models (LLMs).

## Documentation directory
You can use this tool to make evidence-based decisions relating to AI-generated code. For example:

- [Environment config reference](./docs/environment-reference.md)
- [How to set up a new model?](./docs/model-setup.md)
* 🔄 Iterate on a system prompt to find the most effective instructions for your project.
* ⚖️ Compare the quality of code produced by different models.
* 📈 Monitor generated code quality over time as models and agents evolve.

Web Codegen Scorer is different from other code benchmarks in that it focuses specifically on _web_
code and relies primarily on well-established measures of code quality.

## Features

* ⚙️ Configure your evaluations with different models, frameworks, and tools.
* ✍️ Specify system instructions and add MCP servers.
* 📋 Use built-in checks for build success, runtime errors, accessibility, security, LLM rating, and
coding best practices. (More built-in checks coming soon!)
* 🔧 Automatically attempt to repair issues detected during code generation.
* 📊 View and compare results with an intuitive report viewer UI.

## Setup

1. **Install the package:**
1. **Install the package:**

```bash
npm install -g web-codegen-scorer
```

2. **Set up your API keys:**
In order to run an eval, you have to specify an API keys for the relevant providers as environment variables:
2. **Set up your API keys:**

In order to run an eval, you have to specify API keys for the relevant providers as
environment variables:

```bash
export GEMINI_API_KEY="YOUR_API_KEY_HERE" # If you're using Gemini models
export OPENAI_API_KEY="YOUR_API_KEY_HERE" # If you're using OpenAI models
export ANTHROPIC_API_KEY="YOUR_API_KEY_HERE" # If you're using Anthropic models
```

3. **Run an eval:**
You can run your first eval using our Angular example with the following command:

You can run your first eval using our Angular example with the following command:

```bash
web-codegen-scorer eval --env=angular-example
```

4. (Optional) **Set up your own eval:**
If you want to set up a custom eval, instead of using our built-in examples, you can run the following
command which will guide you through the process:

If you want to set up a custom eval, instead of using our built-in examples, you can run the
following command, which will guide you through the process:

```bash
web-codegen-scorer init
@@ -40,54 +61,112 @@ web-codegen-scorer init

You can customize the `web-codegen-scorer eval` script with the following flags:

- `--env=<path>` (alias: `--environment`): (**Required**) Specifies the path from which to load the environment config.
- Example: `web-codegen-scorer eval --env=foo/bar/my-env.js`
- `--env=<path>` (alias: `--environment`): (**Required**) Specifies the path from which to load the
environment config.
- Example: `web-codegen-scorer eval --env=foo/bar/my-env.js`

- `--model=<name>`: Specifies the model to use when generating code. Defaults to the value of `DEFAULT_MODEL_NAME`.
- Example: `web-codegen-scorer eval --model=gemini-2.5-flash --env=<config path>`
- `--model=<name>`: Specifies the model to use when generating code. Defaults to the value of
`DEFAULT_MODEL_NAME`.
- Example: `web-codegen-scorer eval --model=gemini-2.5-flash --env=<config path>`

- `--runner=<name>`: Specifies the runner to use to execute the eval. Supported runners are `genkit` (default) or `gemini-cli`.
- `--runner=<name>`: Specifies the runner to use to execute the eval. Supported runners are
`genkit` (default) or `gemini-cli`.
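- Example: `web-codegen-scorer eval --runner=gemini-cli --env=<config path>`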

- `--local`: Runs the script in local mode for the initial code generation request. Instead of calling the LLM, it will attempt to read the initial code from a corresponding file in the `.llm-output` directory (e.g., `.llm-output/todo-app.ts`). This is useful for re-running assessments or debugging the build/repair process without incurring LLM costs for the initial generation.
- **Note:** You typically need to run `web-codegen-scorer eval` once without `--local` to generate the initial files in `.llm-output`.
- The `web-codegen-scorer eval:local` script is a shortcut for `web-codegen-scorer eval --local`.
- `--local`: Runs the script in local mode for the initial code generation request. Instead of
calling the LLM, it will attempt to read the initial code from a corresponding file in the
`.web-codegen-scorer/llm-output` directory (e.g., `.web-codegen-scorer/llm-output/todo-app.ts`).
This is useful for re-running assessments or debugging the build/repair process without incurring
LLM costs for the initial generation.
- **Note:** You typically need to run `web-codegen-scorer eval` once without `--local` to
generate the initial files in `.web-codegen-scorer/llm-output`.
- The `web-codegen-scorer eval:local` script is a shortcut for
`web-codegen-scorer eval --local`.
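- Example: `web-codegen-scorer eval --local --env=<config path>`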

- `--limit=<number>`: Specifies the number of application prompts to process. Defaults to `5`.
- Example: `web-codegen-scorer eval --limit=10 --env=<config path>`
- Example: `web-codegen-scorer eval --limit=10 --env=<config path>`

- `--output-directory=<name>` (alias: `--output-dir`): Specifies which directory to output the generated code under which is useful for debugging. By default the code will be generated in a temporary directory.
- Example: `web-codegen-scorer eval --output-dir=test-output --env=<config path>`
- `--output-directory=<name>` (alias: `--output-dir`): Specifies the directory in which to output
the generated code, which is useful for debugging. By default, the code will be generated in a
temporary directory.
- Example: `web-codegen-scorer eval --output-dir=test-output --env=<config path>`

- `--concurrency=<number>`: Sets the maximum number of concurrent AI API requests. Defaults to `5` (as defined by `DEFAULT_CONCURRENCY` in `src/config.ts`).
- Example: `web-codegen-scorer eval --concurrency=3 --env=<config path>`
- `--concurrency=<number>`: Sets the maximum number of concurrent AI API requests. Defaults to `5`
(as defined by `DEFAULT_CONCURRENCY` in `src/config.ts`).
- Example: `web-codegen-scorer eval --concurrency=3 --env=<config path>`

- `--report-name=<name>`: Sets the name for the generated report directory. Defaults to a timestamp (e.g., `2023-10-27T10-30-00-000Z`). The name will be sanitized (non-alphanumeric characters replaced with hyphens).
- Example: `web-codegen-scorer eval --report-name=my-custom-report --env=<config path>`
- `--report-name=<name>`: Sets the name for the generated report directory. Defaults to a
timestamp (e.g., `2023-10-27T10-30-00-000Z`). The name will be sanitized (non-alphanumeric
characters replaced with hyphens).
- Example: `web-codegen-scorer eval --report-name=my-custom-report --env=<config path>`

- `--rag-endpoint=<url>`: Specifies a custom RAG (Retrieval-Augmented Generation) endpoint URL. The URL must contain a `PROMPT` substring, which will be replaced with the user prompt.
- Example: `web-codegen-scorer eval --rag-endpoint="http://localhost:8080/my-rag-endpoint?query=PROMPT" --env=<config path>`
- `--rag-endpoint=<url>`: Specifies a custom RAG (Retrieval-Augmented Generation) endpoint URL. The
URL must contain a `PROMPT` substring, which will be replaced with the user prompt.
- Example:
`web-codegen-scorer eval --rag-endpoint="http://localhost:8080/my-rag-endpoint?query=PROMPT" --env=<config path>`

- `--prompt-filter=<name>`: String used to filter which prompts should be run. By default a random sample (controlled by `--limit`) will be taken from the prompts in the current environment. Setting this can be useful for debugging a specific prompt.
- Example: `web-codegen-scorer eval --prompt-filter=tic-tac-toe --env=<config path>`
- `--prompt-filter=<name>`: String used to filter which prompts should be run. By default, a random
sample (controlled by `--limit`) will be taken from the prompts in the current environment.
Setting this can be useful for debugging a specific prompt.
- Example: `web-codegen-scorer eval --prompt-filter=tic-tac-toe --env=<config path>`

- `--skip-screenshots`: Whether to skip taking screenshots of the generated app. Defaults to `false`.
- Example: `web-codegen-scorer eval --skip-screenshots --env=<config path>`
- `--skip-screenshots`: Whether to skip taking screenshots of the generated app. Defaults to
`false`.
- Example: `web-codegen-scorer eval --skip-screenshots --env=<config path>`

- `--labels=<label1> <label2>`: Metadata labels that will be attached to the run.
- Example: `web-codegen-scorer eval --labels my-label another-label --env=<config path>`
- Example: `web-codegen-scorer eval --labels my-label another-label --env=<config path>`

- `--mcp`: Whether to start an MCP server for the evaluation. Defaults to `false`.
- Example: `web-codegen-scorer eval --mcp --env=<config path>`
- Example: `web-codegen-scorer eval --mcp --env=<config path>`

- `--help`: Prints out usage information about the script.
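
Several of these flags can be combined in a single invocation. The sketch below uses only flags
documented above, with the example environment, model, and values from this README; substitute your
own:

```bash
web-codegen-scorer eval \
  --env=angular-example \
  --model=gemini-2.5-flash \
  --limit=10 \
  --concurrency=3 \
  --report-name=my-custom-report \
  --labels my-label another-label
```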

### Additional configuration options

- [Environment config reference](./docs/environment-reference.md)
- [How to set up a new model?](./docs/model-setup.md)

## Local development

If you've cloned this repo and want to work on the tool, you have to install its dependencies by running `pnpm install`.
If you've cloned this repo and want to work on the tool, you have to install its dependencies by
running `pnpm install`.
Once they're installed, you can run the following commands:

* `pnpm run release-build` - Builds the package in the `dist` directory for publishing to npm.
* `pnpm run eval` - Runs an eval from source.
* `pnpm run report` - Runs the report app from source.
* `pnpm run init` - Runs the init script from source.
* `pnpm run format` - Formats the source code using Prettier.
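
For example, a typical local workflow might look like the following sketch (it assumes the `eval`
script forwards extra CLI flags, such as `--env`, to the underlying command):

```bash
# Install dependencies after cloning the repo
pnpm install

# Run an eval from source (assumes the script forwards the --env flag)
pnpm run eval --env=angular-example

# Inspect the results in the report viewer, also run from source
pnpm run report
```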

## FAQ

### Who built this tool?

This tool is built by the Angular team at Google.

### Does this tool only work for Angular code or Google models?

No! You can use this tool with any web library or framework (or none at all) as well as any model.

### Why did you build this tool?

As more and more developers reach for LLM-based tools to create and modify code, we wanted to be
able to empirically measure the effect of different factors on the quality of generated code. While
many LLM coding benchmarks exist, we found that these were often too broad and didn't measure the
specific quality metrics we cared about.

In the absence of such a tool, we found that many developers based their judgements about codegen
with different models, frameworks, and tools on loosely structured trial-and-error. In contrast,
Web Codegen Scorer gives us a platform to measure codegen across different configurations with
consistency and repeatability.

### Will you add more features over time?

Yes! We plan to expand both the number of built-in checks and the variety of codegen scenarios.

Our roadmap includes:

* Including _interaction testing_ in the rating, to ensure the generated code performs any requested
behaviors.
* Measuring Core Web Vitals.
* Measuring the effectiveness of LLM-driven edits on an existing codebase.