feat: Adding RAG_FAQ.MD to retrieval-augmented-generation #655

Closed
wants to merge 8 commits into from
24 changes: 24 additions & 0 deletions gemini/use-cases/retrieval-augmented-generation/RAG_FAQ.MD
Collaborator

Since most of these are related to the intro_multimodal_rag notebook, can you add these as a section at the end of that notebook?

@lavinigam-gcp What do you think?

Author

Reasons behind keeping it as a standalone doc:

  • the doc will grow as more questions are brought up. *.MD files don't need to go through as rigorous a pull/merge process as *.ipynb, so the maintenance would be easier.
  • from a customer's standpoint, it might be more useful to refer to one RAG QnA doc rather than look at every notebook in the directory.
That being said, it is your repo; happy to "disagree and commit". It is more important to get the content in.

Collaborator

I understand your reasoning. I'm thinking that the questions specific to the one notebook should be in the notebook itself, or maybe an FAQ.md in that specific directory, or just more specific explanations in the main content of the notebook.

The rest of the general RAG FAQ should really be turned into official documentation.

@@ -0,0 +1,24 @@
RAG_FAQ.MD
Most questions apply to
https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retrieval-augmented-generation/intro_multimodal_rag.ipynb

Q: Why do we need both variables: text_embedding_page (the embedding of the entire page) and text_embedding_chunk (the embedding of each text chunk)?
Collaborator

Use code font when referring to variables/methods

A: We use the chunk embeddings for most of the downstream tasks in the notebook. This is mainly for demonstration purposes: you can either take the whole page or divide it further into smaller chunks. A lot depends on 1) the LLM token limit, and 2) how you want your data to be structured and searched. For example, Gemini 1.0 has an 8k token limit, so we can very easily handle a single page (and multiple pages as well), and the same goes for Gemini 1.5, where you can probably send a whole document in one go. One rationale for chunking is to make sure we don't pick up too much noise when searching for something specific. Hope that clarifies?
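
As a rough sketch (not taken from the notebook; `embed` and `chunk_text` below are hypothetical placeholder helpers, not real APIs), producing both kinds of embeddings might look like:

```python
# Sketch only: page-level vs. chunk-level embeddings.
# `embed` is a placeholder for whatever embedding model call you use
# (e.g. a Vertex AI text-embedding request); it just returns a dummy vector here.

def embed(text: str) -> list[float]:
    return [float(len(text) % 7)] * 4  # dummy vector; replace with a real model call

def chunk_text(page_text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split one page into overlapping character chunks."""
    chunks, start = [], 0
    while start < len(page_text):
        chunks.append(page_text[start : start + chunk_size])
        start += chunk_size - overlap
    return chunks

page_text = "Full text of one PDF page ..."
text_embedding_page = embed(page_text)                              # one vector for the whole page
text_embedding_chunks = [embed(c) for c in chunk_text(page_text)]   # one vector per chunk
```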

Q. The accompanying presentation (and https://github.com/GoogleCloudPlatform/generative-ai/blob/main/embeddings/intro-textemb-vectorsearch.ipynb) talks about the ANN method of finding the closest embedding match, but the code uses cosine similarity… Should we even mention ANN for this notebook?
Collaborator

Turn that notebook link into a hyperlink.

A. Yes. In the notebook we use simple "cosine similarity", but in the real world, at large scale, you would want something like a managed ANN service, as described in the presentation.
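
For concreteness, a brute-force cosine-similarity search over a toy corpus might look like the sketch below (the vectors and names are made up for illustration):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_embedding = np.array([0.1, 0.3, 0.5])
chunk_embeddings = {
    "chunk_1": np.array([0.1, 0.2, 0.4]),
    "chunk_2": np.array([0.9, 0.1, 0.0]),
}

# Brute-force scoring is fine for a demo-sized corpus; at production scale you
# would hand this off to a managed ANN index (e.g. Vertex AI Vector Search).
best_chunk = max(chunk_embeddings, key=lambda k: cosine_similarity(query_embedding, chunk_embeddings[k]))
print(best_chunk)
```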

Q. The cell which calls get_gemini_response() generates different responses if I run it several times, even with temperature=0 and the same 'query' and 'context' variables. Is this expected? Can anything be done to make it generate the same response every time?
A. Yes, that behavior is expected; temperature=0 does not fully solve it, and I am not sure whether there is any plan on the roadmap to address it. You can, however, reduce the variation through prompting, for example by forcing a structured output such as JSON.

Collaborator

Make this more formal: instead of "Yeah" put "Yes", and fully spell out the variable name. See comment above about code formatting.
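
As a purely illustrative example of the "force structure" idea in the answer above (this prompt is hypothetical, not from the notebook), constraining the output format tends to reduce, though not eliminate, run-to-run variation:

```python
# Hypothetical prompt template that pins down the output shape.
prompt_template = """Answer strictly as a JSON object with the keys "answer" and "citations".
Do not output any text outside of the JSON object.

Context:
{context}

Question:
{query}
"""

prompt = prompt_template.format(context="<retrieved chunks>", query="<user question>")
```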


Q. Why did we split the 14-page PDF into two 7-page PDFs? Is there any limitation that would have prevented us from processing the entire 14-page doc?
A. The PDF is split just to show that you can read multiple PDFs with the same logic. That's the only reason.

Q. For text embeddings, we are splitting the docs into smaller "chunks". Are we doing the same for images?
A. No we are not chuncking images. We have two kinds of image embeddings - one uses ‘multimodal-embeddings’ where we send image directly and the API returns embeddings.

Collaborator

Suggested change
A. No we are not chuncking images. We have two kinds of image embeddings - one uses multimodal-embeddings where we send image directly and the API returns embeddings.
A. No we are not chunking images. We have two kinds of image embeddings - one uses `multimodal-embeddings` where we send the image directly and the API returns embeddings.

The second method is - we send an image to Gemini - get a description of it as text - and send that text to the ‘text-embeddings’ model and get embeddings back. In the latter case, we send the whole text as is. No chunking.
Collaborator

Turn into a bullet point list.
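
To make the two image-embedding paths described above concrete, here is a rough sketch (the three helpers are placeholder stubs standing in for the corresponding model calls, not functions from the notebook):

```python
# Path 1: send the image straight to a multimodal embedding model.
# Path 2: ask Gemini for a text description of the image, then embed that text.
# All three helpers are placeholder stubs for the real API calls.

def multimodal_embed(image_path: str) -> list[float]:
    """Placeholder for a multimodal embedding call (image in, vector out)."""
    return [0.0] * 8

def gemini_describe(image_path: str) -> str:
    """Placeholder for a Gemini call that describes the image as text."""
    return "A bar chart comparing quarterly revenue by region."

def text_embed(text: str) -> list[float]:
    """Placeholder for a text embedding call."""
    return [0.0] * 8

image_path = "images/page_3_chart.png"

image_embedding = multimodal_embed(image_path)       # Path 1: direct image embedding
description = gemini_describe(image_path)            # Path 2a: image -> text description
description_embedding = text_embed(description)      # Path 2b: embed the whole description (no chunking)
```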


Q: Is it a good practice to do a RAG on the LLM-generated summary of text chunks rather than the raw text? Context: for customer-service or “trouble-ticket” use-cases, the docs in the data corpus aren’t always grammatically correct or well-structured syntactically. Would document pre-processing/summarization prior to generating the embeddings lead to a higher accuracy of a RAG-based solution?
Collaborator

Change quote marks to straight quotes " instead of curly quotes “”

Suggested change
Q: Is it a good practice to do a RAG on the LLM-generated summary of text chunks rather than the raw text? Context: for customer-service or trouble-ticket use-cases, the docs in the data corpus aren’t always grammatically correct or well-structured syntactically. Would document pre-processing/summarization prior to generating the embeddings lead to a higher accuracy of a RAG-based solution?
Q: Is it a good practice to do a RAG on the LLM-generated summary of text chunks rather than the raw text? Context: for customer-service or "trouble-ticket" use-cases, the docs in the data corpus aren’t always grammatically correct or well-structured syntactically. Would document pre-processing/summarization prior to generating the embeddings lead to a higher accuracy of a RAG-based solution?

A. It depends. It may indeed be helpful to pre-process documents with an LLM before generating the embeddings. One would still want to insert the original documents as prompt context, however.
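
One way to picture that pattern (placeholder helpers again, not code from the notebook): retrieve against embeddings of cleaned-up summaries, but keep a pointer back to the raw document so the original text still goes into the prompt:

```python
def summarize(text: str) -> str:
    """Placeholder for an LLM summarization / clean-up call."""
    return text[:200]

def embed(text: str) -> list[float]:
    """Placeholder for a text embedding call."""
    return [float(len(text) % 7)] * 4

raw_docs = {
    "ticket_001": "usr says app crashd when uploading pdf, logs attached ...",
}

# Index on the cleaned-up summaries for retrieval...
index = {doc_id: embed(summarize(text)) for doc_id, text in raw_docs.items()}

# ...but once a summary is retrieved, put the ORIGINAL document into the prompt,
# so the model still sees the full, unaltered detail.
retrieved_id = "ticket_001"
prompt_context = raw_docs[retrieved_id]
```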

