Some questions on library choice #8

Open
BradKML opened this issue Apr 23, 2024 · 2 comments
Labels
good first issue Good for newcomers

Comments

BradKML commented Apr 23, 2024

  1. What are the differences between Anterion's components and tools like ChatDev, AgentGPT, MetaGPT, and AutoGen?
  2. How can one get AI to do web design and web development (or more generally UX development and accessibility)?
  3. What are the criteria for picking an open model for use in Anterion? (e.g. Qwen, DeepSeek, Phind, Wizard, LLaMA)
  4. Can this project handle reading documents and browsing the internet for technical updates?
  5. Is automated fine-tuning or reranking possible to handle data science related tasks? (e.g. LDB, LATS, Parsel)
MiscellaneousStuff (Owner)

Whoo, that's a lot of questions! I'll try my best to answer all of them.

  1. The main difference between Anterion and the other projects you mentioned is that we're using SWE-agent as the base layer, and we will soon include planning as well. In our experience, SWE-agent's approach of heavily utilising per-task demonstrations and restricting inputs and outputs greatly improves its ability to solve novel programming tasks. It also makes the agent more robust to errors: it knows when to stop attempting something entirely, or to backtrack and try another approach to a problem.
  2. One can improve AI web design and development by integrating multi-modal components into agents. This could involve using something like GPT-4V or Claude's vision API to give the agent a general semantic understanding of the browser, and then using more specific tools like AgentQL to interface with specific parts of the website. You can also give the agent a sense of time by recording roughly how long things take to happen, which is quite important in web design as well. Combining visual and spatial/geometric information, a better presentation of the DOM (whether the raw DOM, some React-based representation, etc.), and the robustness of something like SWE-agent would be a good angle for improving AI-based web development. We'd like to try some of these methods ourselves within Anterion.
  3. Ideally we would like to integrate LiteLLM (for API-based LLMs) and Ollama (for local LLMs), which should abstract access to a wide range of LLMs for end-users. This is the best approach as it lets users use whatever they want in a way that is easiest for us to implement (as long as those libraries cover the LLMs people want to use). As for which LLMs work well in general, that is simply a question of performance. The ideal LLM will work well with the prompts used by SWE-agent and will have the context length to support them. At the moment, the average SWE-agent call requires around 10,000 input tokens and may produce up to 1,000 output tokens (depending on the individual action the agent is performing). This favours bigger models, which handle these larger calls better.
  4. This project can indirectly handle reading documents using bash tools or Python. If this is a feature you'd like to see implemented more explicitly, we can definitely add it to the list of features we'll add to the agent's command specification next. As for browsing the internet, this is something we'll explicitly add to the agent's capabilities next; refer to issue #3 (Add Browser Support During Web Development Tasks) for further context.
  5. I think you'll need to expand on this for me to understand. As I read it, you're asking whether the agent can dynamically adjust its behaviour during data science tasks, e.g. dynamically analysing data or changing how it performs certain tasks? The answer is generally yes, but it's hard to say for sure without knowing exactly what you had in mind. If you join our Discord at https://discord.com/invite/nbY6njCuxh, maybe we can give you a better answer.
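To make point 2 more concrete, here is a minimal sketch of one way to "present the DOM" to an agent: flattening a page into a short, indexed list of interactive elements that a prompt can reference by number. The class and function names (`DOMSummariser`, `summarise`) are hypothetical illustrations, not part of Anterion, SWE-agent, or AgentQL; a real implementation would handle far more element types and attributes.

```python
from html.parser import HTMLParser

# Tags an agent is most likely to want to act on.
INTERACTIVE = {"a", "button", "input", "select", "textarea"}

class DOMSummariser(HTMLParser):
    """Hypothetical sketch: collect interactive elements as indexed lines."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._pending = None  # index of the element awaiting its inner text

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE:
            idx = len(self.lines)
            attr_map = dict(attrs)
            label = attr_map.get("aria-label") or attr_map.get("placeholder") or ""
            self.lines.append(f"[{idx}] <{tag}> {label}".rstrip())
            self._pending = idx

    def handle_data(self, data):
        text = data.strip()
        if text and self._pending is not None:
            # Attach the element's visible text to its summary line.
            self.lines[self._pending] += f" {text}"
            self._pending = None

def summarise(html: str) -> str:
    """Return a compact, numbered view of a page's interactive elements."""
    parser = DOMSummariser()
    parser.feed(html)
    return "\n".join(parser.lines)
```

An agent prompt could then say "click element [1]" instead of receiving the raw DOM, keeping observations small and unambiguous.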
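The abstraction described in point 3 can be sketched as a small routing layer: LiteLLM accepts provider-prefixed model strings (e.g. `ollama/...` for a local Ollama server), so the choice between API-based and local models reduces to building the right identifier, plus a sanity check against the 10,000-input / 1,000-output token figures above. The `route_model` and `fits_context` helpers, the `API_MODELS` set, and the ~4-characters-per-token estimate are all illustrative assumptions, not anything from Anterion's codebase.

```python
# Illustrative constants: which names are served via provider APIs, and the
# rough call sizes quoted in the answer above.
API_MODELS = {"gpt-4", "claude-3-opus"}
MAX_OUTPUT_TOKENS = 1_000  # typical upper bound per SWE-agent action

def route_model(name: str) -> str:
    """Return a LiteLLM-style model string for an API or local model.

    LiteLLM routes "ollama/<name>" identifiers to a local Ollama server,
    so anything not in API_MODELS is treated as a local model here.
    """
    return name if name in API_MODELS else f"ollama/{name}"

def fits_context(prompt: str, context_window: int) -> bool:
    """Rough check that a prompt plus output budget fits the model's window.

    Assumes ~4 characters per token, a crude heuristic rather than a real
    tokenizer; a production check would count tokens properly.
    """
    est_tokens = len(prompt) // 4 + MAX_OUTPUT_TOKENS
    return est_tokens <= context_window
```

With this shape, a ~10,000-token SWE-agent prompt fails the check for an 8k-context model but passes for a 32k one, which is the "bigger models handle these larger calls better" point in concrete terms.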
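Point 4's "indirect" document reading can be illustrated with the kind of restricted viewer SWE-agent-style tools use: rather than dumping a whole file into the prompt, show a fixed-size window of numbered lines so the observation stays within the agent's input-token budget. `view_window` is a hypothetical sketch, not an actual Anterion or SWE-agent command.

```python
def view_window(text: str, start_line: int = 0, window: int = 100) -> str:
    """Return a numbered window of `text` suitable for an agent observation.

    Hypothetical sketch: a header states which slice is visible, so the
    agent knows it can "scroll" by re-calling with a different start_line.
    """
    lines = text.splitlines()
    end = min(start_line + window, len(lines))
    header = f"[showing lines {start_line + 1}-{end} of {len(lines)}]"
    body = [f"{i + 1}| {lines[i]}" for i in range(start_line, end)]
    return "\n".join([header] + body)
```

Capping the window keeps each observation a predictable size, which matters given the ~10,000-token input budget per call mentioned above.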

Thanks for your interest in the project! Stick around, we've got exciting updates coming soon.

@MiscellaneousStuff MiscellaneousStuff self-assigned this Apr 24, 2024
@MiscellaneousStuff MiscellaneousStuff added the good first issue Good for newcomers label Apr 24, 2024
BradKML (Author) commented Apr 25, 2024

For 4, it is mostly about reading downloaded documentation, plus knowing how to browse the web to find updated documentation. Either way, thanks for the cross-link to the other issue.
For 5, it is mostly inspired by this benchmark, where there is a whole field of optimization techniques that might be good to look at: https://paperswithcode.com/sota/code-generation-on-humaneval
