Search based QA research #174
Comments
I assume we want to develop our own system that can be run locally and does not rely on calling any of the Cohere APIs, right?
@billray0259 yes, primarily we would like to focus on an open-source solution. (Extension modules allowing arbitrary API calls are also possible, but not the preferred solution. Integration of search-engine results is definitely one of the core features that we want to implement in Open-Assistant.) Regarding the Cohere approach, a first step could be to analyze and describe which functionality the generate and embed functions offer and what is needed for this approach.
Alright, I'll learn what I can about their system and generate a report.
@billray0259 thank you very much for taking this on. once you have the report, could you make a PR with the report inside a markdown file? maybe somewhere in |
Hi @andreaskoepf, @billray0259, @yk, I would love to help you out in this effort! I guess once Bill evaluates the framework, we can look into additional data sources and research good ways of fine-tuning for embeddings!
Hey, happy to help too. https://github.com/hwchase17/langchain is an open-source variation with a lot of activity. I can help write up a document for this as well. Don't want to issue snipe, but I went through Cohere's repo and generated a report. Here it is if you want to use it, or I can make a pull request if you like it. Let me know what you think :)

Cohere Grounded Question Answering

Grounded question answering is a system that aims to provide accurate and contextualized answers to factual questions by combining the strengths of language models and Google search. The system uses a language model, accessed through the Cohere API, to understand the context and form natural language questions. It then uses Google search, accessed through Serp API, to find relevant information on the web and provide an answer based on the retrieved information.

The motivation behind this approach is that language models are good at generating sensible answers to complex questions, but they do not have a mechanism for determining the truthfulness of their answers. On the other hand, Google search is effective at finding factual information on the web, but it is not as good at understanding contextual questions or providing answers in natural language. By combining the ability of language models to understand context and generate natural language questions with the consensus-based truthfulness of Google search, grounded question answering aims to provide reliable and accurate answers to a wide range of factual questions.

There are some potential failure modes for the bot, such as when the user asks a question that implies something that is not true or when the bot is unable to find relevant information for the question.

Summary

This code repository contains various scripts that can be used to create a question answering chatbot. The chatbot is able to generate answers to questions based on a given context and information gathered from the web using the Google Search API.
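The flow described in the report (form a standalone question from the conversation, search the web, answer from what was retrieved) can be sketched roughly as below. This is only an illustration of the idea, not code from Cohere's repo: `grounded_answer`, the three callables, and the `fake_*` stand-ins are all hypothetical names, and the stand-ins replace the real language model and search API so the sketch runs offline.

```python
from typing import Callable, List


def grounded_answer(
    question: str,
    history: List[str],
    rewrite: Callable[[str, List[str]], str],
    search: Callable[[str], List[str]],
    generate: Callable[[str], str],
    max_paragraphs: int = 3,
) -> str:
    """Answer a question from retrieved web text rather than model memory alone."""
    # 1. Turn the contextual question into a standalone search query.
    query = rewrite(question, history)
    # 2. Fetch candidate paragraphs from the web for that query.
    paragraphs = search(query)[:max_paragraphs]
    if not paragraphs:
        # Failure mode named in the report: nothing relevant was found.
        return "I could not find relevant information for that question."
    # 3. Ask the language model to answer using the retrieved text as context.
    context = "\n".join(paragraphs)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)


# Hypothetical stand-ins for the language model and the search API.
def fake_rewrite(question: str, history: List[str]) -> str:
    return f"{history[-1]} {question}" if history else question


def fake_search(query: str) -> List[str]:
    return ["Paris is the capital of France."] if "France" in query else []


def fake_generate(prompt: str) -> str:
    return "Paris" if "Paris" in prompt else "unknown"
```

Passing the model and the search backend in as callables keeps the sketch independent of any particular API, which matches the goal of an open-source solution that does not depend on Cohere.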
discord_bot.py

The bot is capable of answering questions based on the context of the conversation and information from the web, and can be triggered by either direct messages to the bot or by users reacting with a specific emoji to a message. The script takes in several command line arguments, including an API key for Cohere, an API key for SerpAPI, and an API key for the Discord bot.

The remaining sections cover qa/bot.py, qa/model.py, qa/answer.py, and qa/search.py.
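As a rough illustration of a script that takes the three API keys on the command line, here is a minimal `argparse` sketch. The flag names (`--cohere_api_key`, `--serp_api_key`, `--discord_key`) and the helper `parse_keys` are assumptions made for this example and may not match the actual `discord_bot.py`.

```python
import argparse
from typing import Dict, List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three keys; flag names are hypothetical."""
    parser = argparse.ArgumentParser(description="Grounded QA Discord bot (sketch)")
    parser.add_argument("--cohere_api_key", required=True, help="API key for Cohere")
    parser.add_argument("--serp_api_key", required=True, help="API key for SerpAPI")
    parser.add_argument("--discord_key", required=True, help="API key for the Discord bot")
    return parser


def parse_keys(argv: List[str]) -> Dict[str, str]:
    """Parse the given argument list and return the keys by service name."""
    args = build_parser().parse_args(argv)
    return {
        "cohere": args.cohere_api_key,
        "serp": args.serp_api_key,
        "discord": args.discord_key,
    }
```

Marking every key as required makes the script fail at startup with a usage message instead of failing later on the first API call.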
@danielpatrickhug Thank you for putting this together! I'm working on a report as well. I'll try to add supplementary information to your report; hopefully, the combination will be greater than the sum of its parts. I plan on doing some research to determine what models they are currently using for tasks such as:
I also want to share some ideas related to using OpenAssistant to accomplish these tasks. It looks like we'll have an excellent understanding of the Cohere Grounded QA system! I will be able to provide a pull request with my report by the end of the day today. I'll include my discussion questions/thoughts in the body of the request instead of the report document. Thanks again @danielpatrickhug |
@billray0259 you're welcome! Happy to help. All good ideas, looking forward to it! The default model for searching through paragraphs is "multilingual-22-12" from Cohere. I'll see if I can get a similar document going for LangChain!
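Searching through paragraphs with an embedding model usually means embedding the query and each paragraph, then ranking paragraphs by similarity. The sketch below shows that ranking step with cosine similarity. It is an assumption about the general technique, not a copy of Cohere's code: `toy_embed` is a bag-of-words stand-in for a real embedding model such as "multilingual-22-12", and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Bag-of-words stand-in for a real embedding model (assumption)."""
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def rank_paragraphs(
    query: str,
    paragraphs: List[str],
    embed: Callable[[str], Vector] = toy_embed,
    top_k: int = 2,
) -> List[str]:
    """Return the top_k paragraphs most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(
        paragraphs,
        key=lambda paragraph: cosine(query_vector, embed(paragraph)),
        reverse=True,
    )
    return ranked[:top_k]
```

Because `embed` is a parameter, an open-source sentence embedding model could be swapped in for `toy_embed` without changing the ranking logic.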
@mrcabbage972 Oh cool, thanks for sharing, I'll check it out. LangChain is more of a general framework for building an LLM 'agent' or 'chain', while Cohere's repo is for asking and answering questions using their APIs. LangChain, on the other hand, allows you to create general prompt templates, use embedding representations, query document/vector stores, and provides a framework for giving LLM chains tools like:
It also works with OpenAI, Hugging Face, and Cohere models.
What are the action items from this? One other difference between Cohere QA and LangChain is that Cohere does more processing of the Serp API results than LangChain (by splitting them up into paragraphs). This paragraph splitting may not be sufficient for certain types of questions or queries, so that would be another workstream.
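To make the paragraph-splitting step concrete, here is one simple way to split the text of a retrieved page into candidate paragraphs. It is a sketch under assumptions and not necessarily how sandbox-grounded-qa does it: the function name `split_paragraphs` and the `min_words` threshold are made up for this example.

```python
import re
from typing import List


def split_paragraphs(page_text: str, min_words: int = 5) -> List[str]:
    """Split page text on blank lines and drop very short fragments."""
    # Blank lines (possibly containing whitespace) separate paragraphs.
    blocks = re.split(r"\n\s*\n", page_text)
    paragraphs = []
    for block in blocks:
        # Collapse internal line breaks and repeated spaces.
        cleaned = " ".join(block.split())
        # Short blocks are usually menus, captions, or footers.
        if len(cleaned.split()) >= min_words:
            paragraphs.append(cleaned)
    return paragraphs
```

A fixed split like this can cut an answer in half when it spans two paragraphs, which is one reason this splitting may not be sufficient for some queries.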
Take a closer look at sandbox-grounded-qa and analyze how they do the contextualization with Google search. Describe the requirements to use such a technique for queries against our LM. Write a short report as md file.
Video: https://youtu.be/DpOQpClVgCw
Your research will help us plan the next iteration after the MVP.