These are two different concepts. Let's tackle them individually.
This is a decision that I struggled with for a bit when implementing it. Ultimately, I decided to keep it internal for now as part of the bigger refactor in #72. That means, for now, if you want to add a new document handler, you need to send a PR to ragna to add it. My reasoning is that this is a common component in almost all use cases. If you can show me a good use case where you want a custom
I soft-disagree here. The code for our Lines 237 to 263 in 4962665 with the actual core part being only 5 Lines 259 to 263 in 4962665 So we aren't looking at crazy amounts of duplication here. IMO it is not worth it to pull a dependency in for this. That being said, maybe
This is what the |
I recognise that extensions for fairly arbitrary data connectors are possible at the moment, and I really appreciate the thought that was put in originally to allow this. My question, though, was i) is it the intention for I realise that adding support for a new connector is relatively straightforward (tens of LoC, as you point out). But it is the sheer scale of the connectors that are required (see the |
Question: Like the title - how does ragna intend to develop support for other data connectors, e.g. different file extensions like .docx, or documents that are not on the local filesystem, such as in the cloud or behind an API?
Context: Similar in spirit to #177. At the moment, ragna only has support for reading local files with .txt and .pdf extensions. This support is provided via classes in the core/_document.py module. These classes are built on top of a custom import system that eagerly loads all available file handlers at runtime. The custom import system would seem to suggest that all new data connectors will need to be written from scratch.
Personal Thoughts: On first impressions, this seems to be creating a very large body of work and distracting from ragna's goal as a very accessible and effective RAG orchestration framework, e.g. see the number of data connectors supported by langchain here. Duplicating this effort in ragna also seems to me like a questionable use of community resources (but this is a very personal opinion). Is there any way that ragna could hook in to the community effort from other projects (but only for the data connectors), and hence allow the ragna team to focus on optimising the RAG orchestration experience?