SOLR-16782: Export tool should export in JSON that matches Solr Import Format #1623

Merged
merged 9 commits into from
May 9, 2023

Conversation

epugh
Contributor

@epugh epugh commented May 3, 2023

https://issues.apache.org/jira/browse/SOLR-16782

Description

The export tool supports only Javabin and JSONL, not JSON. We would like JSON output so that exported docs can be imported directly!

Solution

Add a JSON format and make it the default, since JSONL is somewhat special-purpose.
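To illustrate the difference between the two formats, here is a minimal Python sketch (not code from this PR) that turns JSONL, one document per line, into a single JSON array, which is the shape Solr's JSON update handler accepts for a list of documents. The function name and the sample field names are hypothetical.

```python
import json


def jsonl_to_json_array(jsonl_text: str) -> str:
    """Convert JSON Lines text (one document per line) into one JSON array."""
    # Skip blank lines so a trailing newline does not cause a parse error.
    docs = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return json.dumps(docs)


# Hypothetical sample documents; the field names are illustrative only.
sample_jsonl = '{"id": "1", "title": "first"}\n{"id": "2", "title": "second"}\n'
json_array = jsonl_to_json_array(sample_jsonl)
print(json_array)
```

The array form needs no such conversion before it is posted for indexing, which is the round trip this change is after.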

Tests

Extended existing tests.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

@epugh epugh requested review from ctargett and madrob May 3, 2023 16:53
@ctargett ctargett removed their request for review May 3, 2023 17:06
@epugh
Contributor Author

epugh commented May 3, 2023

Okay, I think this is ready for review, with polished examples in the ref guide. I tested the round trip.

@janhoy
Contributor

janhoy commented May 3, 2023

Did not review, but makes sense.
However, for huge exports, JSONL makes sense since it is much easier to parse/stream line by line and won’t kill your editor. Solr’s JSON loader probably handles normal JSON efficiently, but it would be just as interesting to add JSONL support to /update.
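The streaming point can be sketched in a few lines of Python. This is an illustration only, not Solr code: `iter_jsonl` is a hypothetical helper, and the in-memory `StringIO` stands in for a large export file.

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed document per non-blank line, without loading the whole file."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for a large export; a real file object would work the same way.
export = io.StringIO('{"id": "1"}\n{"id": "2"}\n{"id": "3"}\n')
ids = [doc["id"] for doc in iter_jsonl(export)]
print(ids)
```

Only one line is held in memory at a time, whereas a single JSON array generally has to be parsed as a whole unless an incremental parser is used.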

@epugh
Contributor Author

epugh commented May 3, 2023

@janhoy I spiked out JSONL for Solr streaming expressions... so you can split the big doc and have multiple loaders all running ;-) For similar reasons... I had an interesting conversation with Christopher Ball about this at Haystack ;-)
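As a rough illustration of why a line-delimited export is easy to split across loaders, the Python sketch below groups lines into fixed-size batches. It is not the streaming expression spike mentioned above; `batch_lines` and the batch size are assumptions made for the example.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batch_lines(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group an iterable of JSONL lines into lists of at most batch_size lines."""
    it = iter(lines)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Five hypothetical one-line documents, split into batches of two.
lines = [f'{{"id": "{i}"}}' for i in range(5)]
batches = list(batch_lines(lines, 2))
print([len(batch) for batch in batches])
```

Each batch is itself valid JSONL, so it could be handed to a separate loader without re-parsing the rest of the export.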

@epugh epugh merged commit ad4875d into apache:main May 9, 2023
3 of 5 checks passed
epugh added a commit that referenced this pull request May 9, 2023
…t Format (#1623)

Fixed a bug where the JSON output format was actually JSON Lines, by introducing a jsonl output parameter. Added a proper JSON output format.
2 participants