-
Couldn't load subscription status.
- Fork 1
feat: Initial Docling API and Client for health and source conversion #22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@edeandrea I'm getting lots of warnings due to missing JavaDoc. Is that rule part of the editorconfig? I wonder if we want to enforce comments on all public methods. For now, I haven't added them. Mostly because the majority would be redundant (like on setter methods provided by the several builder classes). |
|
Yeah I agree that it shouldn't be required on everything. I'll investigate how to get it to make less noise. I think it's a gradle thing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll review more in depth this week
|
The problem with Java records is that they are impossible to make backwards compatible. Any addition of a new attribute is a breaking change. Should we keep the API just a set of interfaces? We could use the builder pattern to build default implementations (which could be records), but the fact that it's a record should be an implementation detail? |
I got rid of all of these javadoc warnings. |
* The API is represented by an interface and a collection of Java records and enums. Jackson annotations are used, but implementations are not required to use Jackson. * The Client is a default implementation of the API. It uses the JDK HttpClient and Jackson. It also works as a reference implementation. Libraries and frameworks can implement the DoclingApi interface to provide their own implementation. * Unit tests and integration tests are included for the most common paths. Signed-off-by: Thomas Vitale <ThomasVitale@users.noreply.github.com>
|
@ThomasVitale rather than do an in-depth review here, I think I'd prefer to agree with your statement Lets get it in there, review it, and then evolve it a bit. I've also committed the testcontainer module so we can definitely use our own module in tests. As I mentioned here I'm not sure Java records should be part of a public API because they are not backwards compatible. I was thinking we build interfaces for everything with builders. Records are great, don't get me wrong, but honestly they cause many headaches when they are part of a public API that evolves (been there done that in both LangChain4j & Quarkus). I was also thinking for the client, maybe we can extend it a bit using a @ThomasVitale / @lordofthejars thoughts? |
|
Well depending on how you use records you can still have backward compatibility but it is true that complicates a lot everything. On the other side it will look weird having the model as interfaces, but obvisouly the docling API is what it is. Maybe if we want to stick with records we could make our versions compatible with Docling version, So our 1.7.X version is compatible with their 1.7.X version. If we find a bug or we want to improve we increase the patch version 1.7.1, .2 .3, and when they realease a new 1.8.0, we check the API if it is still the same we release the 1.8.0, if not, we adapt the records and we release the 1.8.0. So basically we are behaving like them. If they break we break. Is it perfect, no of course, but at least we are aligned everywhere, we follow the same principle. |
I'd really like to try as much as we can to not get into this situation. Doing so will mean lots of "babysitting", which isn't fun. Sure there may be times things need to break because upstream breaks, but let's try and make that the exception.
What if we followed a pattern like this? It allows us to use records as the impl classes, but makes things completely backwards compatible (through the use of builders). This is the pattern we follow in LangChain4j and its worked out very nicely for us. We could even delegate all the This approach also gives you something like "copy constructors" for free. It embraces that objects are ephemeral, but allows you to customize objects without mutation.
|
|
Sure, it is valid and probably this is what we need to do.moreover ifwe want to let implementators change it, then we are good to go. |
@ThomasVitale if you're ok with this approach I'll merge this PR as-is and then I can help with some of the refactoring :) |
Signed-off-by: Eric Deandrea <eric.deandrea@gmail.com>
|
@edeandrea thanks for the review! I like the idea of merging this and then look into the design we want to adopt moving forward. I would suggest creating two separate issues where to discuss these subjects: one for the API and one for the Client. What do you think? I have some concerns about the interface and SPI approach for the domain APIs, but maybe it's better if I share them on a dedicated issue rather than here? |
Done! |
This comes from the Arconia Docling library. It includes support for two endpoints:
/health/v1/convert/sourceThe domain objects are modelled as Java records and Jackson-annotated, especially to configure the naming (though, Jackson is not required at runtime). There is a
DoclingApiinterface which would be the one implemented in Quarkus or Spring Boot. In the client package, there is a default implementation (DoclingClient) based on JDK HttpClient and Jackson.I see this as a starting point, so to have something working to begin with. I'm open to evolve this in different ways moving forward.