[BUG]: Confluence connector only saves partial document body in JSON #1501

jazelly · 2024-05-23T03:30:53Z

How are you running AnythingLLM?

Local development

What happened?

After the Confluence connector scrapes Confluence documents, the document bodies are not fully saved in the JSON files under storage.

After embedding them, the documents do not provide the useful info we expect. For example, we have a Confluence doc containing some code snippets, and we would like to ask questions that retrieve them. However, the code snippets are lost during scraping, so the LLM responds with only basic info.

I am not sure whether this is a limitation of the Atlassian API, but users would surely expect more than just basic info from their Confluence documents.

Are there known steps to reproduce?

No response


timothycarambat · 2024-05-23T15:48:35Z

@jazelly is the pageContent of the associated document empty?

jazelly · 2024-05-23T23:38:12Z

@timothycarambat the pageContent is not empty. It has content, but it does not include the script content, e.g.

VIEW ALL\nsql\nASSIGN TO AN ACCOUNT\nThe account must already exist.\nsql\n

Notice the "sql" lines in the pageContent: each one sits where a SQL command is supposed to be. The LLM makes up answers when we ask a question related to those commands, since the prompt contains no reference to the real command.
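One rough way to find affected documents is to scan the stored JSON for lines that contain nothing but a language label, like the "sql" lines above. This is a minimal sketch of a hypothetical heuristic, not part of AnythingLLM: the label set is an assumption, and `check_document` assumes each stored document is a JSON object with a top-level pageContent string.

```python
import json

# Hypothetical heuristic (not part of AnythingLLM): a line holding only a
# code-block language label, like "sql", suggests the code itself was dropped.
# The label set below is an assumption; extend it for your own pages.
LANGUAGE_LABELS = {"sql", "bash", "shell", "python", "java", "javascript", "json", "yaml", "xml"}


def suspect_dropped_code(page_content: str) -> list[int]:
    """Return 0-based line indices that look like orphaned language labels."""
    return [
        index
        for index, line in enumerate(page_content.split("\n"))
        if line.strip().lower() in LANGUAGE_LABELS
    ]


def check_document(path: str) -> list[int]:
    """Assumes the stored document is a JSON object with a "pageContent" string."""
    with open(path, encoding="utf-8") as handle:
        return suspect_dropped_code(json.load(handle)["pageContent"])


sample = "VIEW ALL\nsql\nASSIGN TO AN ACCOUNT\nThe account must already exist.\nsql\n"
print(suspect_dropped_code(sample))  # [1, 4]
```

A false positive is possible when a page legitimately has a line reading just "json" or "sql", so treat the result as a list of pages to inspect by hand.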

jainpradeep · 2024-05-30T10:55:17Z

I face this issue with a local deployment as well. LLM responses are poor.

timothycarambat · 2024-05-30T14:25:24Z

> Issue faced with local deployment as well. LLM responses are poor.

This has nothing to do with the deployment method or the RAG structure. The RAG results are bad because the scraper returns poor information from the documents. As @jazelly mentions, it seems that some non-text blocks are not returned or parsed by the LangChain parser, which is where the problem lies.
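For context, Confluence's storage format keeps a code block as an `ac:structured-macro` named "code", with the language in an `ac:parameter` and the code itself inside a CDATA section in `ac:plain-text-body`. If a converter strips markup without special-casing that CDATA section, the language parameter survives as text while the code disappears, which would match the "sql" lines above. That is a hypothesis: the thread does not confirm what the LangChain loader does internally. The sketch below uses hypothetical helpers (`naive_to_text`, `storage_to_text`) and regexes, so it is for illustration only.

```python
import html
import re

# Assumed input: a page body in Confluence storage format, where a code block is
#   <ac:structured-macro ac:name="code">
#     <ac:parameter ac:name="language">sql</ac:parameter>
#     <ac:plain-text-body><![CDATA[SELECT 1;]]></ac:plain-text-body>
#   </ac:structured-macro>
CODE_MACRO = re.compile(
    r'<ac:structured-macro[^>]*ac:name="code"[^>]*>(.*?)</ac:structured-macro>', re.DOTALL
)
LANGUAGE = re.compile(r'<ac:parameter[^>]*ac:name="language"[^>]*>(.*?)</ac:parameter>', re.DOTALL)
CDATA_BODY = re.compile(r"<ac:plain-text-body><!\[CDATA\[(.*?)\]\]></ac:plain-text-body>", re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def naive_to_text(storage: str) -> str:
    """Strip every tag. A whole <![CDATA[...]]> section matches TAG, so code is lost."""
    return html.unescape(TAG.sub("", storage)).strip()


def storage_to_text(storage: str) -> str:
    """Like naive_to_text, but keep each code macro's language label and body."""
    blocks: list[str] = []

    def stash(match: re.Match) -> str:
        inner = match.group(1)
        language = LANGUAGE.search(inner)
        body = CDATA_BODY.search(inner)
        label = language.group(1).strip() if language else ""
        code = body.group(1) if body else ""
        blocks.append(f"~~~{label}\n{code}\n~~~")
        # Placeholder keeps the code away from tag stripping and entity decoding.
        return f"\n@@CODE{len(blocks) - 1}@@\n"

    text = html.unescape(TAG.sub("", CODE_MACRO.sub(stash, storage)))
    for index, block in enumerate(blocks):
        text = text.replace(f"@@CODE{index}@@", block)
    return text.strip()


page = (
    "<p>VIEW ALL</p>"
    '<ac:structured-macro ac:name="code" ac:schema-version="1">'
    '<ac:parameter ac:name="language">sql</ac:parameter>'
    "<ac:plain-text-body><![CDATA[SELECT * FROM accounts WHERE id < 10;]]></ac:plain-text-body>"
    "</ac:structured-macro>"
    "<p>ASSIGN TO AN ACCOUNT</p>"
)
print(naive_to_text(page))    # VIEW ALLsqlASSIGN TO AN ACCOUNT
print(storage_to_text(page))  # the SQL survives inside a ~~~sql block
```

The placeholder step exists so that characters such as `<` inside code are not mistaken for tags. A production scraper should use a real XML/HTML parser rather than regexes.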

jazelly · 2024-05-30T23:29:16Z

This might be an issue better suited for the LangChain community.

For us, the current workaround is simply to write our own scraper that downloads these documents and uploads them to anything-llm via its API.
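A minimal sketch of that workaround, assuming Confluence Cloud's REST API (`GET /wiki/rest/api/content/{id}?expand=body.storage`, which returns the page body in storage format) and an AnythingLLM raw-text upload endpoint. The AnythingLLM path and payload shape (`/api/v1/document/raw-text` with `textContent` and `metadata.title`) are assumptions to verify against your own instance's API docs, and all function names are hypothetical. The HTML-to-text step is passed in as `to_text`, so any converter that keeps code macro bodies can be plugged in.

```python
import base64
import json
import urllib.request
from typing import Callable

# Assumed endpoints; verify both against your own instances.
CONFLUENCE_PAGE_PATH = "/wiki/rest/api/content/{page_id}?expand=body.storage"
ANYTHINGLLM_RAW_TEXT_PATH = "/api/v1/document/raw-text"  # assumption, not confirmed in the thread


def confluence_request(base_url: str, page_id: str, email: str, api_token: str) -> urllib.request.Request:
    """GET request for one page's storage-format body, using basic auth."""
    token = base64.b64encode(f"{email}:{api_token}".encode()).decode()
    url = base_url.rstrip("/") + CONFLUENCE_PAGE_PATH.format(page_id=page_id)
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}", "Accept": "application/json"})


def storage_body(page_json: dict) -> str:
    """Raw storage-format XHTML from a content response expanded with body.storage."""
    return page_json["body"]["storage"]["value"]


def anythingllm_request(base_url: str, api_key: str, title: str, text: str) -> urllib.request.Request:
    """POST request uploading plain text as a document (assumed payload shape)."""
    payload = {"textContent": text, "metadata": {"title": title}}
    return urllib.request.Request(
        base_url.rstrip("/") + ANYTHINGLLM_RAW_TEXT_PATH,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def sync_page(confluence_url: str, page_id: str, email: str, api_token: str,
              anythingllm_url: str, api_key: str, to_text: Callable[[str], str]) -> dict:
    """Download one page, convert it with to_text, and upload the result."""
    with urllib.request.urlopen(confluence_request(confluence_url, page_id, email, api_token)) as resp:
        page = json.load(resp)
    text = to_text(storage_body(page))
    with urllib.request.urlopen(anythingllm_request(anythingllm_url, api_key, page["title"], text)) as resp:
        return json.load(resp)
```

Keeping request construction separate from `urlopen` makes the URLs and payloads easy to check offline. Passing a plain tag stripper as `to_text` would reproduce the original problem, so the converter choice is what actually matters here.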

jazelly added the possible bug label May 23, 2024

shatfield4 self-assigned this May 23, 2024

timothycarambat added the investigating and core-team-only labels May 23, 2024
