From 6f0fcb331b34472e73ddf7c77aaaf1019fadd966 Mon Sep 17 00:00:00 2001
From: Jeremy Elbourn
Date: Mon, 15 Sep 2025 08:47:13 -0700
Subject: [PATCH] Polish documentation for open-source release

* Expand the README with additional intro and FAQ
* Change some bulleted lists to tables
* Minor grammar edits and formatting throughout
---
 LICENSE                       |   2 +-
 README.md                     | 147 ++++++++++++++++++++++++++--------
 docs/environment-reference.md |  88 ++++++++++----------
 docs/model-setup.md           |   8 +-
 package.json                  |  14 +++-
 5 files changed, 174 insertions(+), 85 deletions(-)

diff --git a/LICENSE b/LICENSE
index 322654b..9eb3c39 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,6 +1,6 @@
MIT License

-Copyright (c) 2025 Angular
+Copyright (c) 2025 Google LLC

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
diff --git a/README.md b/README.md
index f98bda0..05ab175 100644
--- a/README.md
+++ b/README.md
@@ -1,21 +1,39 @@
# Web Codegen Scorer

-This project is a tool designed to assess the quality of front-end code generated by Large Language Models (LLMs).
+**Web Codegen Scorer** is a tool for evaluating the quality of web code generated by Large Language
+Models (LLMs).

-## Documentation directory
+You can use this tool to make evidence-based decisions relating to AI-generated code. For example:

-- [Environment config reference](./docs/environment-reference.md)
-- [How to set up a new model?](./docs/model-setup.md)
+* 🔄 Iterate on a system prompt to find the most effective instructions for your project.
+* ⚖️ Compare the quality of code produced by different models.
+* 📈 Monitor generated code quality over time as models and agents evolve.
+
+Web Codegen Scorer is different from other code benchmarks in that it focuses specifically on _web_
+code and relies primarily on well-established measures of code quality.
+
+## Features
+
+* ⚙️ Configure your evaluations with different models, frameworks, and tools.
+* ✍️ Specify system instructions and add MCP servers.
+* 📋 Use built-in checks for build success, runtime errors, accessibility, security, LLM rating, and
+  coding best practices. (More built-in checks coming soon!)
+* 🔧 Automatically attempt to repair issues detected during code generation.
+* 📊 View and compare results with an intuitive report viewer UI.

## Setup

-1. **Install the package:**
+1. **Install the package:**
+
```bash
npm install -g web-codegen-scorer
```

-2. **Set up your API keys:**
-In order to run an eval, you have to specify an API keys for the relevant providers as environment variables:
+2. **Set up your API keys:**
+
+   In order to run an eval, you have to specify API keys for the relevant providers as
+   environment variables:
+
```bash
export GEMINI_API_KEY="YOUR_API_KEY_HERE" # If you're using Gemini models
export OPENAI_API_KEY="YOUR_API_KEY_HERE" # If you're using OpenAI models
@@ -23,14 +41,17 @@ export ANTHROPIC_API_KEY="YOUR_API_KEY_HERE" # If you're using Anthropic models
```

3. **Run an eval:**
-You can run your first eval using our Angular example with the following command:
+
+   You can run your first eval using our Angular example with the following command:
+
```bash
web-codegen-scorer eval --env=angular-example
```

4. 
(Optional) **Set up your own eval:**
-If you want to set up a custom eval, instead of using our built-in examples, you can run the following
-command which will guide you through the process:
+
+   If you want to set up a custom eval, instead of using our built-in examples, you can run the
+   following command, which will guide you through the process:

```bash
web-codegen-scorer init
```

@@ -40,50 +61,75 @@

## Configuration

You can customize the `web-codegen-scorer eval` script with the following flags:

-- `--env=` (alias: `--environment`): (**Required**) Specifies the path from which to load the environment config.
-  - Example: `web-codegen-scorer eval --env=foo/bar/my-env.js`
+- `--env=` (alias: `--environment`): (**Required**) Specifies the path from which to load the
+  environment config.
+  - Example: `web-codegen-scorer eval --env=foo/bar/my-env.js`

-- `--model=`: Specifies the model to use when generating code. Defaults to the value of `DEFAULT_MODEL_NAME`.
-  - Example: `web-codegen-scorer eval --model=gemini-2.5-flash --env=`
+- `--model=`: Specifies the model to use when generating code. Defaults to the value of
+  `DEFAULT_MODEL_NAME`.
+  - Example: `web-codegen-scorer eval --model=gemini-2.5-flash --env=`

-- `--runner=`: Specifies the runner to use to execute the eval. Supported runners are `genkit` (default) or `gemini-cli`.
+- `--runner=`: Specifies the runner to use to execute the eval. Supported runners are
+  `genkit` (default) or `gemini-cli`.

-- `--local`: Runs the script in local mode for the initial code generation request. Instead of calling the LLM, it will attempt to read the initial code from a corresponding file in the `.llm-output` directory (e.g., `.llm-output/todo-app.ts`). This is useful for re-running assessments or debugging the build/repair process without incurring LLM costs for the initial generation.
-  - **Note:** You typically need to run `web-codegen-scorer eval` once without `--local` to generate the initial files in `.llm-output`.
-  - The `web-codegen-scorer eval:local` script is a shortcut for `web-codegen-scorer eval --local`.
+- `--local`: Runs the script in local mode for the initial code generation request. Instead of
+  calling the LLM, it will attempt to read the initial code from a corresponding file in the
+  `.web-codegen-scorer/llm-output` directory (e.g., `.web-codegen-scorer/llm-output/todo-app.ts`).
+  This is useful for re-running assessments or debugging the build/repair process without incurring
+  LLM costs for the initial generation.
+  - **Note:** You typically need to run `web-codegen-scorer eval` once without `--local` to
+    generate the initial files in `.web-codegen-scorer/llm-output`.
+  - The `web-codegen-scorer eval:local` script is a shortcut for
+    `web-codegen-scorer eval --local`.

- `--limit=`: Specifies the number of application prompts to process. Defaults to `5`.
-  - Example: `web-codegen-scorer eval --limit=10 --env=`
+  - Example: `web-codegen-scorer eval --limit=10 --env=`

-- `--output-directory=` (alias: `--output-dir`): Specifies which directory to output the generated code under which is useful for debugging. By default the code will be generated in a temporary directory.
-  - Example: `web-codegen-scorer eval --output-dir=test-output --env=`
+- `--output-directory=` (alias: `--output-dir`): Specifies the directory in which to output the
+  generated code, which is useful for debugging. By default, the code will be generated in a
+  temporary directory. 
+  - Example: `web-codegen-scorer eval --output-dir=test-output --env=`

-- `--concurrency=`: Sets the maximum number of concurrent AI API requests. Defaults to `5` (as defined by `DEFAULT_CONCURRENCY` in `src/config.ts`).
-  - Example: `web-codegen-scorer eval --concurrency=3 --env=`
+- `--concurrency=`: Sets the maximum number of concurrent AI API requests. Defaults to `5`
+  (as defined by `DEFAULT_CONCURRENCY` in `src/config.ts`).
+  - Example: `web-codegen-scorer eval --concurrency=3 --env=`

-- `--report-name=`: Sets the name for the generated report directory. Defaults to a timestamp (e.g., `2023-10-27T10-30-00-000Z`). The name will be sanitized (non-alphanumeric characters replaced with hyphens).
-  - Example: `web-codegen-scorer eval --report-name=my-custom-report --env=`
+- `--report-name=`: Sets the name for the generated report directory. Defaults to a
+  timestamp (e.g., `2023-10-27T10-30-00-000Z`). The name will be sanitized (non-alphanumeric
+  characters replaced with hyphens).
+  - Example: `web-codegen-scorer eval --report-name=my-custom-report --env=`

-- `--rag-endpoint=`: Specifies a custom RAG (Retrieval-Augmented Generation) endpoint URL. The URL must contain a `PROMPT` substring, which will be replaced with the user prompt.
-  - Example: `web-codegen-scorer eval --rag-endpoint="http://localhost:8080/my-rag-endpoint?query=PROMPT" --env=`
+- `--rag-endpoint=`: Specifies a custom RAG (Retrieval-Augmented Generation) endpoint URL. The
+  URL must contain a `PROMPT` substring, which will be replaced with the user prompt.
+  - Example:
+    `web-codegen-scorer eval --rag-endpoint="http://localhost:8080/my-rag-endpoint?query=PROMPT" --env=`

-- `--prompt-filter=`: String used to filter which prompts should be run. By default a random sample (controlled by `--limit`) will be taken from the prompts in the current environment. Setting this can be useful for debugging a specific prompt.
-  - Example: `web-codegen-scorer eval --prompt-filter=tic-tac-toe --env=`
+- `--prompt-filter=`: String used to filter which prompts should be run. By default, a random
+  sample (controlled by `--limit`) will be taken from the prompts in the current environment.
+  Setting this can be useful for debugging a specific prompt.
+  - Example: `web-codegen-scorer eval --prompt-filter=tic-tac-toe --env=`

-- `--skip-screenshots`: Whether to skip taking screenshots of the generated app. Defaults to `false`.
-  - Example: `web-codegen-scorer eval --skip-screenshots --env=`
+- `--skip-screenshots`: Whether to skip taking screenshots of the generated app. Defaults to
+  `false`.
+  - Example: `web-codegen-scorer eval --skip-screenshots --env=`

- `--labels= `: Metadata labels that will be attached to the run.
-  - Example: `web-codegen-scorer eval --labels my-label another-label --env=`
+  - Example: `web-codegen-scorer eval --labels my-label another-label --env=`

- `--mcp`: Whether to start an MCP for the evaluation. Defaults to `false`.
-  - Example: `web-codegen-scorer eval --mcp --env=`
+  - Example: `web-codegen-scorer eval --mcp --env=`

- `--help`: Prints out usage information about the script.

+### Additional configuration options
+
+- [Environment config reference](./docs/environment-reference.md)
+- [How to set up a new model?](./docs/model-setup.md)
+
## Local development

-If you've cloned this repo and want to work on the tool, you have to install its dependencies by running `pnpm install`.
+If you've cloned this repo and want to work on the tool, you have to install its dependencies by
+running `pnpm install`. 
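+For example, assuming you're starting from a fresh clone of the repository, the initial setup might
+look like this:
+
+```bash
+# Clone the repository and move into it (repository location shown for illustration).
+git clone https://github.com/angular/web-codegen-scorer.git
+cd web-codegen-scorer
+
+# Install the tool's dependencies with pnpm.
+pnpm install
+```
+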
Once they're installed, you can run the following commands:

* `pnpm run release-build` - Builds the package in the `dist` directory for publishing to npm.
@@ -91,3 +137,36 @@ Once they're installed, you can run the following commands:
* `pnpm run report` - Runs the report app from source.
* `pnpm run init` - Runs the init script from source.
* `pnpm run format` - Formats the source code using Prettier.
+
+## FAQ
+
+### Who built this tool?
+
+This tool is built by the Angular team at Google.
+
+### Does this tool only work for Angular code or Google models?
+
+No! You can use this tool with any web library or framework (or none at all) as well as any model.
+
+### Why did you build this tool?
+
+As more and more developers reach for LLM-based tools to create and modify code, we wanted to be
+able to empirically measure the effect of different factors on the quality of generated code. While
+many LLM coding benchmarks exist, we found that these were often too broad and didn't measure the
+specific quality metrics we cared about.
+
+In the absence of such a tool, we found that many developers based their judgments about codegen
+with different models, frameworks, and tools on loosely structured trial-and-error. In contrast, Web
+Codegen Scorer gives us a platform to measure codegen across different configurations with
+consistency and repeatability.
+
+### Will you add more features over time?
+
+Yes! We plan to expand both the number of built-in checks and the variety of codegen scenarios.
+
+Our roadmap includes:
+
+* Including _interaction testing_ in the rating, to ensure the generated code performs any requested
+  behaviors.
+* Measuring Core Web Vitals.
+* Measuring the effectiveness of LLM-driven edits on an existing codebase.
diff --git a/docs/environment-reference.md b/docs/environment-reference.md
index 00fae60..b177ce5 100644
--- a/docs/environment-reference.md
+++ b/docs/environment-reference.md
@@ -1,7 +1,7 @@
# Environment configuration reference

Environments are configured by creating a `config.js` that exposes an object that satisfies the
-`EnvironmentConfig` interface. This document covers all the possible options in `EnvironmentConfig`
+`EnvironmentConfig` interface. This document covers all options in `EnvironmentConfig`
and what they do.

## Required properties

These properties all have to be specified in order for the environment to functi

### `displayName`

-Human-readable name that will be shown in eval reports about this environment.
+Human-readable name that is shown in eval reports about this environment.

### `id`

-Unique ID for the environment. If ommitted, one will be generated from the `displayName`.
+Unique ID for the environment. If omitted, one is generated from the `displayName`.

### `clientSideFramework`

-ID of the client-side framework that the environment will be running, for example `angular`.
+ID of the client-side framework that the environment runs, for example `angular`.

### `ratings`

-An array defining the ratings that will be executed as a part of the evaluation.
-The ratings determine what score that will be assigned to the test run.
-Currently we support the following types of ratings:
+An array defining the ratings that are executed as a part of the evaluation.
+The ratings determine the score assigned to the test run.
+Currently, the tool supports the following built-in ratings:

-- `PerBuildRating` - assigns a score based on the build result of the generated code, e.g.
-  "Does it build on the first run?" 
or "Does it build after X repair attempts?" -- `PerFileRating` - assigns a score based on the content of individual files generated by the LLM. - Can be run either against all file types by setting the `filter` to - `PerFileRatingContentType.UNKNOWN` or against specific files. -- `LLMBasedRating` - rates the generated code by asking an LLM to assign a score to it, - e.g. "Does this app match the specified prompts?" +| Rating Name | Description | +|------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `PerBuildRating` | Assigns a score based on the build result of the generated code, e.g. "Does it build on the first run?" or "Does it build after X repair attempts?" | +| `PerFileRating` | Assigns a score based on the content of individual files generated by the LLM. Can be run either against all file types by setting the `filter` to
`PerFileRatingContentType.UNKNOWN` or against specific files. |
+| `LLMBasedRating` | Rates the generated code by asking an LLM to assign a score to it, e.g. "Does this app match the specified prompts?" |

### `packageManager`

@@ -62,29 +60,25 @@
This is useful when evaluating confidential code.

### `skipInstall`

-Whether to skip installing dependencies during the eval run. This can be useful if you've already
-ensured that all dependencies are installed through something like pnpm workspaces.
+Whether to skip installing dependencies during the eval run. This is useful if you've already
+installed dependencies through something like pnpm workspaces.

### Prompt templating

-Prompts are typically stored in `.md` files. We support the following template syntax inside of
-these files in order to augment the prompt and reduce boilerplate:
+Prompts are typically stored in `.md` files. The tool supports the following template syntax inside
+of these files in order to augment the prompt and reduce boilerplate:

-- `{{> embed file='../path/to/file.md' }}` - embeds the content of the specified file in the
-  current one.
-- `{{> contextFiles '**/*.foo' }}` - specifies files that should be passed to the LLM as context
-  when the prompt is executed. Should be a comma-separated string of glob pattern **within** the
-  environments project code. E.g. `{{> contextFiles '**/*.ts, **/*.html' }}` will pass all `.ts`
-  and `.html` files as context.
-- `{{CLIENT_SIDE_FRAMEWORK_NAME}}` - insert the name of the client-side framework of the current
-  environment.
-- `{{FULL_STACK_FRAMEWORK_NAME}}` - insert the name of the full-stack framework of the current
-  environment.
+| Helper / Variable | Description |
+|------------------------------------------|------------------------------------------------------------------|
+| `{{> embed file='../path/to/file.md' }}` | Embeds the content of the specified file in the current one. |
+| `{{> contextFiles '**/*.foo' }}` | Specifies files that should be passed to the LLM as context when the prompt is executed. Should be a comma-separated string of glob patterns **within** the environment's project code. E.g. `{{> contextFiles '**/*.ts, **/*.html' }}` passes all `.ts` and `.html` files as context. |
+| `{{CLIENT_SIDE_FRAMEWORK_NAME}}` | Inserts the name of the client-side framework of the current environment. |
+| `{{FULL_STACK_FRAMEWORK_NAME}}` | Inserts the name of the full-stack framework of the current environment. |

### Prompt-specific ratings

-If you want to run a set of ratings against a specific prompt, you can set an object literal
-in the `executablePrompts` array, instead of a string:
+If you want to run a set of ratings against a specific prompt, set an object literal in the
+`executablePrompts` array, instead of a string:

```ts
executablePrompts: [
@@ -101,10 +95,12 @@ executablePrompts: [

### Multi-step prompts

-Multi-step prompts are prompts meant to evaluate workflows made up of one or more stages.
-Steps execute one after another **inside the same directory**, but are rated individually and
-snapshots after each step are stored in the final report. 
You can create a multi-step prompt by
-passing an instrance of the `MultiStepPrompt` class into the `executablePrompts` array, for example:
+**Multi-step prompts** evaluate workflows composed of one or more stages.
+Steps execute one after another **inside the same directory**, but are rated individually. The tool
+takes snapshots after each step and includes them in the final report. You can create a multi-step
+prompt by passing an instance of the `MultiStepPrompt` class into the `executablePrompts` array,
+for example:

```ts
executablePrompts: [
@@ -142,34 +138,36 @@ run against it.

## Optional properties

-These properties aren't required for the environment to run, but can be used to configure it further.
+These properties aren't required for the environment to run, but can be used to configure it
+further.

### `sourceDirectory`

-Project into which the LLM-generated files will be placed, built, executed and evaluated.
-Can be an entire project or a handful of files that will be merged with the
+Directory into which the LLM-generated files are written, built, executed, and evaluated.
+Can be an entire project or a handful of files to be merged with the
`projectTemplate` ([see below](#projecttemplate))

### `projectTemplate`

Used for reducing the boilerplate when setting up an environment, `projectTemplate` specifies the
-path of the project template that will be merged together with the files from `sourceDirectory` to
-create the final project structure that the evaluation will run against.
+path of a project template to be merged together with the files from `sourceDirectory`, creating
+the final project structure against which the evaluation runs.

-For example, if the config has `projectTemplate: './templates/angular', sourceDirectory: './project'`,
-the eval runner will copy the files from `./templates/angular` into the output directory
-and then apply the files from `./project` on top of them, merging directories and replacing
+For example, if the config has
+`projectTemplate: './templates/angular', sourceDirectory: './project'`,
+the eval runner copies the files from `./templates/angular` into the output directory
+and then applies the files from `./project` on top of them, merging directories and replacing
overlapping files.

### `fullStackFramework`

-Name of the full-stack framework that is used in the evaluation, in addition to the
-`clientSideFramework`. If omitted, the `fullStackFramework` will be set to the same value as
+Name of the full-stack framework that is used in the evaluation, in addition to the
+`clientSideFramework`. If omitted, the `fullStackFramework` is set to the same value as
the `clientSideFramework`.

### `mcpServers`

-IDs of Model Context Protocol servers that will be started and exposed to the LLM as a part of
+IDs of Model Context Protocol (MCP) servers that are started and exposed to the LLM as a part of
the evaluation.

### `buildCommand`
diff --git a/docs/model-setup.md b/docs/model-setup.md
index 4648d4e..5ce305c 100644
--- a/docs/model-setup.md
+++ b/docs/model-setup.md
@@ -1,9 +1,11 @@
-# How to setup up a new LLM?
+# How to set up a new LLM?

If you want to test out a model that isn't yet available in the runner, you can add support for it
by following these steps:

-1. Ensure that the provider of the model is supported by Genkit.
-2. Find the provider for the model in `runner/codegen/genkit/providers`. 
If the provider hasn't been implemented yet, do so by creating a new `GenkitModelProvider` and adding it to the `MODEL_PROVIDERS` in `runner/genkit/models.ts`.
+1. Ensure that the provider of the model is supported by [Genkit](https://genkit.dev/).
+2. Find the provider for the model in `runner/codegen/genkit/providers`. If the provider hasn't been
+   implemented yet, do so by creating a new `GenkitModelProvider` and adding it to the
+   `MODEL_PROVIDERS` in `runner/genkit/models.ts`.
3. Add your model to the `GenkitModelProvider` configs.
4. Done! 🎉 You can now run your model by passing `--model=`.
diff --git a/package.json b/package.json
index 9d238d6..ef1edb9 100644
--- a/package.json
+++ b/package.json
@@ -11,10 +11,20 @@
    "format": "prettier --write \"runner/**/*.ts\" \"report-app/**/*.ts\" \"*.json\"",
    "check-format": "prettier --check \"runner/**/*.ts\" \"report-app/**/*.ts\" \"*.json\""
  },
-  "keywords": [],
+  "keywords": [
+    "codegen",
+    "code generation",
+    "benchmark",
+    "llm",
+    "evaluation",
+    "web",
+    "web development",
+    "code quality",
+    "prompt engineering"
+  ],
  "author": "",
  "license": "MIT",
-  "description": "",
+  "description": "Web Codegen Scorer is a tool for evaluating the quality of web code generated by Large Language Models (LLMs).",
  "type": "module",
  "bugs": {
    "url": "https://github.com/angular/web-codegen-scorer/issues"