rfc: Prompt API support #14

Status: Open — wants to merge 5 commits into `main`

Conversation

@jkleinsc (Member) commented Apr 9, 2025

The Prompt API is part of the Built-in AI capabilities in Chromium-based browsers.

With the Prompt API, developers can send natural-language requests from the renderer for local processing.

This proposal adds support for this API in Electron.

📄 Read rendered document

@jkleinsc jkleinsc requested a review from a team as a code owner April 9, 2025 13:53
@felixrieseberg (Member)

I love the simplicity of directly setting the handler on Session; I think that's really easy to understand.

The main drawback I see is that with this design, the main process is always an intermediary between the utility process and the renderer in which window.ai is involved. We already have similar shortcomings in our network API, so this proposal isn't any worse than equivalent APIs - but the optimal solution would establish a message channel directly between the utility process and the renderer process (via Mojo), thus avoiding having to bother the main process at all.

In @electron/llm, I'm going with something that combines both approaches - exchanging ports on prompt and then streaming additional responses directly to the renderer. I wonder if we can make that easier for devs here, for instance by providing an established port in the streaming API that you can just post to.
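The hybrid flow described above can be sketched as pure routing logic. Everything here is illustrative (no real Electron or @electron/llm API is used); the subscriber callback stands in for a MessagePort handed to the renderer, so streamed chunks bypass any central dispatcher after the initial exchange:

```typescript
// Sketch of the port-exchange-on-prompt pattern: the first prompt reply
// carries a stream handle, and later chunks are delivered directly to that
// stream's subscriber. All names are illustrative stand-ins.
class StreamRouter {
  #nextStream = 0;
  #subscribers = new Map<number, (chunk: string) => void>();

  // Called when a prompt starts: hand back the id the renderer will use,
  // analogous to posting a MessagePort once at prompt time.
  open(onChunk: (chunk: string) => void): number {
    const id = this.#nextStream++;
    this.#subscribers.set(id, onChunk);
    return id;
  }

  // Called for each streamed token; no main-process round trip per chunk.
  push(streamId: number, chunk: string): void {
    this.#subscribers.get(streamId)?.(chunk);
  }

  close(streamId: number): void {
    this.#subscribers.delete(streamId);
  }
}
```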

```js
if (!this.#aiProcess) {
  throw new Error('AI model process not started');
}
this.#aiProcess.postMessage({
```
Review comment (Member):

How do we expect to get a prompt response back out of the utility process? Will it postMessage back to the main process?
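One plausible answer (a sketch, not the proposal's actual wiring) is to correlate postMessage traffic with request ids, so the main process can resolve the right promise when the utility process posts a reply back. The `send` callback stands in for `aiProcess.postMessage`, and `onResponse` would be wired to the utility process's message event:

```typescript
// Sketch only: the transport is assumed; Electron's real
// utilityProcess.postMessage / 'message' event would carry these envelopes.
type PromptRequest = { requestId: number; prompt: string };
type PromptResponse = { requestId: number; result: string };

class PromptBridge {
  #nextId = 0;
  #pending = new Map<number, (result: string) => void>();

  // Called in the main process; `send` stands in for aiProcess.postMessage.
  request(prompt: string, send: (msg: PromptRequest) => void): Promise<string> {
    const requestId = this.#nextId++;
    send({ requestId, prompt });
    return new Promise((resolve) => this.#pending.set(requestId, resolve));
  }

  // Called when the utility process posts a message back.
  onResponse(msg: PromptResponse): void {
    this.#pending.get(msg.requestId)?.(msg.result);
    this.#pending.delete(msg.requestId);
  }
}
```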

```js
  }
}

session.defaultSession.setPromptAPIHandler(() => { ElectronAI });
```
Review comment (Member):

If we directly pass a construction function, I think it'll be easier to swap out the implementation flexibly, and also easier to adapt to changes in the upstream API.

Suggested change:

```diff
- session.defaultSession.setPromptAPIHandler(() => { ElectronAI });
+ session.defaultSession.setPromptAPIHandler((webContents, options) => new ElectronAI(webContents, options));
```
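A minimal sketch of why the factory shape helps: the registry only holds a function, so the implementation can be replaced wholesale and each request gets a fresh instance. `PromptHandlerRegistry` and its method names are hypothetical stand-ins for the proposed Session API, not real Electron code:

```typescript
// Hypothetical registry illustrating the factory pattern the suggestion
// proposes; webContentsId and options are stand-ins for the real arguments.
type ModelFactory<T> = (webContentsId: number, options: Record<string, unknown>) => T;

class PromptHandlerRegistry<T> {
  #factory: ModelFactory<T> | null = null;

  setPromptAPIHandler(factory: ModelFactory<T> | null): void {
    this.#factory = factory;
  }

  // Invoked per LanguageModel.create(); each call produces a fresh
  // instance, so the implementation can change without touching call sites.
  createModel(webContentsId: number, options: Record<string, unknown>): T {
    if (!this.#factory) throw new Error('No Prompt API handler registered');
    return this.#factory(webContentsId, options);
  }
}
```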

@dsanders11 (Member) left a comment:

Some non-substantive comments, and a note that we should probably make the source.chromium.org links permalinks for historical reference.

@samuelmaddock samuelmaddock added the pending-review Waiting for reviewers to give feedback on the PR specifications label Apr 17, 2025
@dsanders11 (Member) left a comment:

Are there other imagined use cases at the moment for the WebContents argument to setPromptAPIHandler, or is the only imagined use case currently to distinguish different WebContents instances to avoid conversation bleed-over between them?

Exposing WebContents to utilityProcess would be a rather large side effect of this API, so I wonder if we want to start with the minimal surface area and only provide webContents.id to start?

@MarshallOfSound (Member)

> Exposing WebContents to utilityProcess would be a rather large side effect of this API, so I wonder if we want to start with the minimal surface area and only provide webContents.id to start?

It would also expose a bit of a security hole; utility processes shouldn't have arbitrary Mojo access to WebContents. And building a whole system where we pass handles back and forth to ensure that only the given utility process can call those methods, without a strong reason for utility processes to need that access, is going to be a hard sell (at least to me).

Agree with David: would love to avoid exposing any part of the WebContents API to Utility. An ID seems good, or, if you want to make security decisions, passing a securityOrigin.
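A sketch of the kind of security decision a securityOrigin enables without exposing WebContents at all; the allowlist approach is illustrative, not part of the proposal:

```typescript
// Validate the securityOrigin passed to the handler against an allowlist,
// denying by default if the string does not parse as an origin.
function isAllowedOrigin(securityOrigin: string, allowlist: string[]): boolean {
  try {
    const { origin } = new URL(securityOrigin);
    return allowlist.includes(origin);
  } catch {
    return false; // malformed origin: deny by default
  }
}
```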

@erickzhao erickzhao added pending-changes Waiting for changes from the PR author based on feedback given and removed pending-review Waiting for reviewers to give feedback on the PR specifications labels May 8, 2025
Pass webContentsId instead of webContents
Additionally pass securityOrigin

Added `ses.getUtilityProcessForLocalAI` so that a handle to the utility process can be obtained and messages can be sent to and received from it (e.g. the utility process sends a message with the webContentsId to get more information about the webContents if needed).
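The lookup round trip described above might look roughly like this on the main-process side; the message shapes and the `getInfo` callback are assumptions for illustration (in real code `getInfo` would be backed by something like `webContents.fromId`), not the proposed API:

```typescript
// Sketch: the main process answers a utility-process lookup for details
// about a webContents, keyed by the webContentsId the handler received.
type LookupRequest = { type: 'lookup-webcontents'; webContentsId: number };
type LookupReply = { type: 'lookup-reply'; webContentsId: number; url: string | null };

function answerLookup(
  msg: LookupRequest,
  getInfo: (id: number) => { url: string } | undefined
): LookupReply {
  const info = getInfo(msg.webContentsId);
  // Reply with null rather than throwing if the webContents is gone.
  return { type: 'lookup-reply', webContentsId: msg.webContentsId, url: info?.url ?? null };
}
```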
@jkleinsc jkleinsc added pending-review Waiting for reviewers to give feedback on the PR specifications and removed pending-changes Waiting for changes from the PR author based on feedback given labels May 27, 2025
@jkleinsc (Member, Author)

@dsanders11 @MarshallOfSound Good points on passing the webContents. I've adjusted the design to pass a webContentsId instead and I've also added the securityOrigin.

@dsanders11 (Member) left a comment:

I've got some questions about the lifetime of the associated UtilityProcess.

It looks like the Prompt API allows for explicit session destruction, and I see you've got the destroy method in the handler script. Do we want destroy handled at that level, and leave the UtilityProcess alive, or do we also kill the UtilityProcess on destroy?

Do we support multiple LanguageModel.create calls, and/or the clone operation? Does each one get its own UtilityProcess, or is it a single UtilityProcess with multiple ElectronAI instances?

Do we automatically restart the UtilityProcess if it crashes, like network service in Chromium? Or for now should we simply let the process die and throw errors for any LanguageModel associated with the dead process?

Also addressed review feedback
@jkleinsc (Member, Author) commented Jun 3, 2025

@dsanders11 regarding your questions:

> It looks like the Prompt API allows for explicit session destruction, and I see you've got the destroy method in the handler script. Do we want destroy handled at that level, and leave the UtilityProcess alive, or do we also kill the UtilityProcess on destroy?

I think this should be left up to those implementing the API as to whether they want to leave the UtilityProcess running or not.

> Do we support multiple LanguageModel.create calls, and/or the clone operation? Does each one get its own UtilityProcess, or is it a single UtilityProcess with multiple ElectronAI instances?

Again, I think this is up to those implementing the API. You could in theory register multiple local AI Handler Scripts/Utility processes or you could use one utility process.

> Do we automatically restart the UtilityProcess if it crashes, like network service in Chromium? Or for now should we simply let the process die and throw errors for any LanguageModel associated with the dead process?

This is a good question that I'd like the API WG to weigh in on, but I think we'd want to let the process die and throw errors: if someone wrote a utility process that continually crashes, we wouldn't want to repeatedly restart it only to have it die again.
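The die-and-throw policy can be sketched as bookkeeping that fails every in-flight request when the process exits and restarts nothing. All names are illustrative; `onExit` would be wired to the UtilityProcess `'exit'` event in real code:

```typescript
// Sketch: track outstanding prompt requests so they can be rejected if the
// AI utility process dies, instead of auto-restarting a crash-looping process.
class PromptLifetimeTracker {
  #pending = new Map<number, (err: Error) => void>();

  track(requestId: number, onError: (err: Error) => void): void {
    this.#pending.set(requestId, onError);
  }

  // A request that completed normally no longer needs failure handling.
  settle(requestId: number): void {
    this.#pending.delete(requestId);
  }

  // Wire to aiProcess.once('exit', ...): every in-flight LanguageModel
  // request fails once, and the process is not restarted automatically.
  onExit(code: number): void {
    const err = new Error(`AI model process exited with code ${code}`);
    for (const reject of this.#pending.values()) reject(err);
    this.#pending.clear();
  }
}
```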

Comment on lines +81 to +84
```markdown
* `webContentsId` (Integer | null) - The unique id of the [WebContents](web-contents.md)
  calling the Prompt API. The webContents id may be null if the Prompt API is called from
  a service worker or shared worker.
* `securityOrigin` String - Origin of the page calling the Prompt API.
```
Review comment (Member):

Maybe we can structure this in a way that allows looking up more details of the sender, similar to how preload realms introduced `senderType` to IPCs. That way the handler can more easily validate the initiator.

```diff
interface IpcMainEvent {
// Breaking change
- sender: WebContents;
+ sender: WebContents | ServiceWorkerMain;
// Non-breaking change
+ senderType: 'frame' | 'service-worker';
+ senderServiceWorker: ServiceWorkerMain;
}
```

Could be something like

```ts
type HandlerDetails =
  { senderType: 'frame', webContentsId: number, frameId: number } |
  { senderType: 'service-worker', versionId: number, scopeUrl: string } |
  { senderType: 'process', processId: number }
```
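Restating the proposed HandlerDetails union, a handler could narrow on `senderType` with an ordinary switch and validate the initiator without unsafe casts. The `describeSender` helper is purely illustrative:

```typescript
// The discriminated union proposed in the comment above.
type HandlerDetails =
  | { senderType: 'frame'; webContentsId: number; frameId: number }
  | { senderType: 'service-worker'; versionId: number; scopeUrl: string }
  | { senderType: 'process'; processId: number };

// Each case narrows `details` to the matching variant, so the handler can
// read variant-specific fields type-safely.
function describeSender(details: HandlerDetails): string {
  switch (details.senderType) {
    case 'frame':
      return `frame ${details.frameId} in webContents ${details.webContentsId}`;
    case 'service-worker':
      return `service worker ${details.versionId} for ${details.scopeUrl}`;
    case 'process':
      return `process ${details.processId}`;
  }
}
```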

Review comment (Member):

Also, should the handler allow rejecting the prompt API request?


```markdown
* `id` string - local AI handler script ID

Unregisters the specified local AI handler script and terminates the associated
```
Review comment (Member):

Will this call the destroy() method on any prompt API handlers, or does that only happen if the prompt instance is destroyed?

Are there any other considerations we should make for how the utility process should handle shutdown, or is that already covered by existing utility process APIs?

```markdown
The `localAIHandler` module has the following methods:

##### `localAIHandler.setPromptAPIHandler(handler)`
* `handler` Function\<[LanguageModel](https://github.com/webmachinelearning/prompt-api?tab=readme-ov-file#full-api-surface-in-web-idl)\> | null
```
Review comment (Member):

Would it be possible to make this async? Since we're just getting a webContentsId, it might be necessary to make an IPC call to the Main process to gather other info.
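Supporting an async handler could be as simple as normalizing the return value, so a handler is free to make an IPC round trip to the main process before answering. A sketch, with `PromptHandler` as a hypothetical signature:

```typescript
// A handler may return a value directly or a promise of one.
type PromptHandler = (webContentsId: number) => string | Promise<string>;

// Promise.resolve leaves a promise untouched and wraps a plain value, so
// callers always get a promise regardless of the handler's style.
function invokeHandler(handler: PromptHandler, webContentsId: number): Promise<string> {
  return Promise.resolve(handler(webContentsId));
}
```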

Review comment (Member):

We may want to add other parameters in the future, so it might be more future-proof to pass a dictionary to the handler rather than positional parameters.
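A sketch of the dictionary shape; the field set mirrors the webContentsId/securityOrigin arguments already proposed, and `formatCaller` is purely illustrative:

```typescript
// A single options object lets new fields be added later without breaking
// existing handlers, unlike positional parameters.
interface PromptHandlerOptions {
  webContentsId: number | null;
  securityOrigin: string;
  // future fields can be added here without changing call sites
}

function formatCaller(opts: PromptHandlerOptions): string {
  // null webContentsId means the caller was a service or shared worker.
  const id = opts.webContentsId ?? 'worker';
  return `${opts.securityOrigin} (webContents: ${id})`;
}
```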

@erickzhao erickzhao added pending-changes Waiting for changes from the PR author based on feedback given and removed pending-review Waiting for reviewers to give feedback on the PR specifications labels Jul 3, 2025
Labels: pending-changes (Waiting for changes from the PR author based on feedback given)
7 participants