FlowFuse-Hosted LLM API for Flow Generation #3921
Comments
As discussed, as a first step, I have set up demos on a dashboard for OpenAI, Gemini 1.0 Pro and Gemini 1.5 Pro here: https://inquisitive-pacific-swift-9466.flowfuse.dev/dashboard (FF Auth). Unfortunately, after trying two different accounts and different network connections, I am unable to "top up" for testing the OpenAI GPT API, so the GPT trials were not performed via the API but rather via its web app. Issues:
Recommendations:
Fine tuning: While testing the LLMs I added Q+As to the data each time I got a nonsense result, and it did improve the quality of a response.
Curated: To limit scope and improve accuracy we could add an initial primer.
Post processing: I believe a level of post processing will be needed. For example, during tests involving nodes with config nodes, the LLM returned
Are you using your Brex card here to get a paid account?
Yes, I was using the Brex card. The errors I received are well documented.
The solution was often to "wait 12 hours" or "use a different card".
Happy to use a personal CC to top up and get that working (could add ~$15 to get us some credits)?
So, even if we set this up in production, we would still be hitting these problems?
If you are referring to the OpenAI + CC entry issue, then maybe yes, but it could also be due to the personal account or location. We would need to explore what other payment options they have (they have Team and Enterprise options that may have alternative payment methods).
Okay, please investigate and feel free to upgrade if that's what's required. We'd need assurance that we have a stable connection/service.
So, we'd be going out as an "alpha" with this if we go out, but it does need to be reasonable. What are your estimates of the effort/time required to make that so (if possible at all)?
I have gotten GPT working now (it can be tested on https://inquisitive-pacific-swift-9466.flowfuse.dev/dashboard - FF Auth / might need an invite). It is all POC and pretty rough (i.e. not checking stop reason, assumes success, etc. - super alpha). Also, we will need to test concurrent requests from multiple users (not handled in this dashboard POC).

On first try-outs, GPT-4o is better than the Gemini models. I have performed several simple prompts like "Take the number received on MQTT topic "home/kitchen/temperature/c", convert it to Fahrenheit and send that on topic "home/kitchen/temperature/f"" and have used up 0.23p of credit (see the sketch after this comment for the kind of flow such a prompt produces).

For determining readiness for an alpha version, I will need to do additional testing, specifically:

To get to alpha
Other considerations
Estimate for 1st alpha: 3 days' work. NOTE: Alpha in the above context does not include any additional fine tuning of models - that is a larger task and can be a follow-up if/when required. Additionally, that means no sanity check that flows either work or are "well structured". That would require either JSON schemas for all known nodes or very specific system prompts to be included with user prompts.
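For context, a minimal sketch of the kind of flow.json a prompt like the C-to-F example above is expected to produce - the node ids, broker details and property values here are illustrative, not actual model output:

```json
[
  { "id": "mqtt-in-1", "type": "mqtt in", "topic": "home/kitchen/temperature/c", "broker": "broker-1", "wires": [["convert-1"]] },
  { "id": "convert-1", "type": "function", "name": "C to F", "func": "msg.payload = Number(msg.payload) * 9 / 5 + 32; return msg;", "wires": [["mqtt-out-1"]] },
  { "id": "mqtt-out-1", "type": "mqtt out", "topic": "home/kitchen/temperature/f", "broker": "broker-1", "wires": [] },
  { "id": "broker-1", "type": "mqtt-broker", "name": "local broker", "broker": "localhost", "port": "1883" }
]
```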
What's the concern here? Why does this matter? We'd be opening this to the public too, not just in-product.
Can you expand on this too, please? If users want to try and trick the LLM into returning non-NR stuff, let them.
If we're hitting rate limits because the service is too popular, I'm okay with that being a problem we solve at the time.
Not sure I understand how there is 3 days' work here: expose what you have now via an API, we label it alpha, wrap it in a Hubspot form (this may need some thinking) - and we go. We may want to add CORS options so that the API can only be called from our website?
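For reference, a minimal sketch of restricting the endpoint by origin, assuming an Express-style service - the route, port and allowed origin are illustrative, not the actual implementation:

```typescript
import express from "express";
import cors from "cors";

const app = express();

// Only accept browser requests originating from our website.
// Note: CORS only constrains browsers - direct calls (curl, scripts)
// are unaffected, so rate limiting / auth would still be needed.
app.use(cors({ origin: "https://flowfuse.com" }));
app.use(express.json());

// Hypothetical route: accept a prompt, return a generated flow.
app.post("/api/v1/generate-flow", async (req, res) => {
  const { prompt } = req.body ?? {};
  if (typeof prompt !== "string" || prompt.length === 0) {
    return res.status(400).json({ error: "prompt is required" });
  }
  // ...forward the prompt to the LLM and return its flow.json...
  res.json({ flow: [] });
});

app.listen(3000);
```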
If there is one endpoint serving multiple users (be that in-house or public), I need to ensure it handles concurrent requests. Since responses can easily take > 30s, there is a high probability that multiple in-flight requests will occur.
As it stands, there are
Understood
I was under the assumption it would be integrated into the product, so I put extra time in there for "integration" requirements. If we are simply providing this as a form by some other means, then that is reduced. The other considerations listed are still pieces of work to do, though. Also, there was no API when I wrote that comment (there was a minimal functioning flow that permitted a POC via the dashboard). As of writing this comment, there is now an API endpoint (that I started yesterday and finished this morning, albeit rough and ready - but working). PS: I have never done anything like what you suggest in Hubspot - will need a leg up for sure.
As it stands, I have had no issue calling the new endpoint with cURL from my local machine 🤞.
It's an LLM; there are going to be backdoors galore. There is only so much we can do. Ensure we're clear in the
The scope for this issue is just an API to call, nothing more.
Yes, but we can curtail it in some ways. I am not suggesting 50 hours of work; I am suggesting we do due diligence to minimise it.
That is not what we discussed, Joe. We agreed on a dashboard for a POC first when we talked this through verbally.
PoC for testing is useful, yes, but the actual deliverable, as defined in the issue description, is the API endpoint.
Nor had I until last week. I played with it a little for the Dashboard Migration service stuff; there are holes, but it's okay.
👇
I know the brief was to make an API, but at the time of writing there was not one; it was still a PoC.
Thanks, Steve - in that case, let's just add the sanity checking of the response to ensure it looks like a
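For what it's worth, a minimal sketch of that kind of sanity check, assuming the criterion is "parses as JSON and is shaped like a Node-RED flow array" - the names here are illustrative:

```typescript
// Illustrative check: the LLM reply should parse as JSON and be an array of
// objects that each carry the id/type fields every Node-RED node has.
interface FlowNode {
  id: string;
  type: string;
  [key: string]: unknown;
}

export function looksLikeFlow(raw: string): FlowNode[] | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not valid JSON at all
  }
  if (!Array.isArray(parsed) || parsed.length === 0) {
    return null;
  }
  const isNode = (n: unknown): n is FlowNode =>
    typeof n === "object" &&
    n !== null &&
    typeof (n as FlowNode).id === "string" &&
    typeof (n as FlowNode).type === "string";
  return parsed.every(isNode) ? (parsed as FlowNode[]) : null;
}
```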
The API will be a deliverable for FlowFuse/website#2229
Moving this from the dev board to the icebox, as it's not something we're going to resume for the time being.
Create an API endpoint (whether hosted in FlowFuse Cloud or elsewhere) that, given a text-based prompt, returns a flow.json that can be deployed to Node-RED.

Service or Self-Hosted
Do we build/train/host our own LLM, or utilise an existing LLM as a Service? My gut would say, especially for the first iteration, the latter is sensible.
Staging Prompts
Need to experiment with sending config/setup prompts to whichever AI service we use in order to ensure our prompts always return a valid Node-RED flow, etc.
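As an illustration of the pattern (not the actual prompts we would ship), a staging/system prompt sent ahead of the user's prompt might look like this with the OpenAI Node SDK - the wording and model name are assumptions:

```typescript
import OpenAI from "openai";

// Illustrative system prompt - a real primer would be far more detailed
// (known node types, formatting rules, examples, etc.).
const SYSTEM_PROMPT =
  "You are a Node-RED flow generator. Respond only with a JSON array of " +
  "Node-RED nodes (a flow.json). No markdown fences, no commentary.";

const client = new OpenAI(); // expects OPENAI_API_KEY in the environment

export async function generateFlow(userPrompt: string): Promise<string> {
  const completion = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: userPrompt },
    ],
  });
  return completion.choices[0]?.message?.content ?? "";
}
```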
You can see the setup prompts we use with the function-gpt node here: https://github.com/FlowFuse/node-red-function-gpt/blob/42d3eeb7a28bef1a5d1b4bffb76a2906e0f8389f/config/index.js#L46-L49

API Hosting
Do we provide a single endpoint that all FlowFuse instances can call, or do we make this a configuration on a FF instance-by-instance basis, and consequently each FF instance has its own endpoint to call?
Quality of Results
We should be flagging this as "Alpha" or "Beta" for some time to caveat the quality of results, given the hallucinations of LLMs.