Feature/updater #33

Merged
denkv merged 38 commits into develop from feature/updater
May 26, 2026

Conversation

@carowa292
Contributor

No description provided.

carowa292 and others added 6 commits April 24, 2026 11:57
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
@carowa292 carowa292 requested a review from kymeyer May 6, 2026 13:33
@denkv
Collaborator

denkv commented May 20, 2026

On this branch I always get a response like "the provided information was not sufficient", whereas the develop branch gives an answer under the same conditions.

For example:
PDF file: https://content.iospress.com/articles/data-science/ds190021
Question: "What is hobbit"

@carowa292
Contributor Author

Most likely the document was not ingested.

@denkv
Collaborator

denkv commented May 20, 2026

Yes, it looks like a problem with import or ingestion. I ran the import from the develop branch and then the RAG pipeline from feature/updater, and it worked.

return all_documents


def process_delta_imports(
Collaborator

@denkv denkv May 20, 2026

It seems that process_delta_imports and _delta_by_source are the only places where index or update_documents is called, and process_delta_imports is not called from anywhere. I suppose it should be called either during or after the run of importer/main.py::main.

Previously this was handled after the import, but the function that started it disappeared in this commit:
1d757c8#diff-3562c2e63dafe5160aaee2cf765616db62686b3a77d8aab94c9da894bb6fe4e3L202-L205

So the question is: at which point should process_delta_imports be called?

Contributor

True, it should be called in main; will update.
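
A minimal sketch of the wiring discussed in this thread, assuming the import entry point first collects all documents and then hands them to the delta step. The names `run_import`, `fake_load_documents`, and `fake_process_delta_imports` are hypothetical stand-ins; the real signature of process_delta_imports is not shown in this conversation.

```python
from typing import Callable

Document = dict  # hypothetical stand-in for the importer's document type


def run_import(
    load_documents: Callable[[], list[Document]],
    process_delta: Callable[[list[Document]], int],
) -> int:
    """Load all documents, then pass them to the delta step for indexing."""
    all_documents = load_documents()
    # If this call is missing, the loaded documents never reach the index.
    return process_delta(all_documents)


# Stand-ins so the sketch runs on its own; not the project's real functions.
indexed: list[Document] = []


def fake_load_documents() -> list[Document]:
    return [{"source": "example.pdf", "content": "..."}]


def fake_process_delta_imports(documents: list[Document]) -> int:
    indexed.extend(documents)
    return len(documents)


if __name__ == "__main__":
    count = run_import(fake_load_documents, fake_process_delta_imports)
    print(f"indexed {count} document(s)")
```

Passing the delta step in as a parameter is only a convenience for the sketch; in the project itself, main would presumably call process_delta_imports directly.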

@denkv
Collaborator

denkv commented May 21, 2026

  File "learn2rag/pipeline/app.py", line 125, in <lambda>
    chunks = await loop.run_in_executor(executor, lambda: list(sync_gen()))
                                                          ~~~~^^^^^^^^^^^^
  File "learn2rag/pipeline/app.py", line 122, in sync_gen
    for chunk in generate.generate_stream(question, results, opt_config, request_id=request_id):
                 ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "learn2rag/pipeline/generate.py", line 36, in generate_stream
    context = "\n\n".join([context_template.format(source=result.payload['path'], content=result.payload['content']) for result in search_results]) # type: ignore[index]
                                                          ~~~~~~~~~~~~~~^^^^^^^^
KeyError: 'path'

Example of result:

ScoredPoint(
    id='2516e9c5-f34f-4249-8454-f3ba8ed8fc50',
    version=5,
    score=0.52725434,
    payload={
        'content': '...',
        'source': '....pdf',
        'content_hash': '74bc2f7667474af9c82af19bc89a3c1a27b12f48575cef7316d728a5ec50f7d8',
        'chunk_hash': '306b5f122e5e7ee5a2222f8bf37d22d9',
        'title': '...',
        'uri': '',
        'loader_id': '420fe88d-8171-4225-bd55-a740feba85c8',
        'document_id': '',
    },
    vector=None,
    shard_key=None,
    order_value=None)

@carowa292
Contributor Author

The payload key 'path' was changed to 'source' at some point.
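
One way to make the context builder tolerant of both payload layouts is to look up 'source' first and fall back to 'path'. This is only a sketch: `payload_source`, `build_context`, and the template string are hypothetical, and the real context_template in generate.py is not shown here.

```python
from typing import Any, Iterable, Mapping


def payload_source(payload: Mapping[str, Any]) -> str:
    """Return the document location, whichever key the payload uses."""
    # Newer points use 'source'; points from older imports may still use 'path'.
    for key in ("source", "path"):
        value = payload.get(key)
        if value:
            return str(value)
    return "unknown"


def build_context(
    payloads: Iterable[Mapping[str, Any]],
    template: str = "Source: {source}\n{content}",  # assumed template
) -> str:
    """Join one formatted entry per search result into a single context string."""
    return "\n\n".join(
        template.format(source=payload_source(p), content=p.get("content", ""))
        for p in payloads
    )


if __name__ == "__main__":
    print(build_context([
        {"source": "new.pdf", "content": "indexed on this branch"},
        {"path": "old.pdf", "content": "indexed on develop"},
    ]))
```

The fallback only matters while a collection still holds points written with the old key; after a full re-import, reading 'source' directly would be enough.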

@denkv denkv merged commit f774dad into develop May 26, 2026
1 check passed
@denkv denkv deleted the feature/updater branch May 26, 2026 13:12