Remove json file generating/posting in indexing (Ready for review) #33

Merged: 11 commits, Jul 5, 2017

Conversation

kshefchek
Contributor

See background ticket: #32

This PR replaces the code that converts Neo4j results to JSON with code that converts Neo4j results to Solr input document objects and then adds them to the index using SolrJ.

Note that the tests are currently broken; I will fix them or write new ones. To build with the tests skipped:
mvn clean install -Dmaven.test.skip

@kltm

kltm commented Jun 15, 2017

@kshefchek Would you know what version of Solr you're using in production?

@kshefchek
Contributor Author

@kltm Yes, we are using 6.2.1.

@kltm

kltm commented Jun 16, 2017

Okay, it's coming back to me now--wanting to make sure that no incompatibilities creep in, then remembering that we already tested for that.

It all looks pretty sane to me; it seems more of a conversion than a complete rewrite (not to underplay all the work in there).
The only thing that comes to mind is that it would be nice to be able to change the batch size for experimenting without recompiling.

@kshefchek
Contributor Author

No, you are right: it is a straight conversion and was not much work. I can make the batch size a command-line argument if that makes sense.
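
For illustration, a minimal sketch of how a batch size could be read from the command line, so it can be changed without recompiling. This is not the code from this PR: the class name `BatchSizeOption`, the `--batch-size` flag, and the default of 1000 are all assumptions made up for the example.

```java
// Hypothetical helper, not part of this PR. The flag name and the default
// value are assumptions chosen for illustration only.
final class BatchSizeOption {
    static final String FLAG = "--batch-size";
    static final int DEFAULT_BATCH_SIZE = 1000;

    private BatchSizeOption() {}

    /** Returns the value following --batch-size, or the default if the flag is absent. */
    static int parse(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (!FLAG.equals(args[i])) {
                continue;
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException(FLAG + " requires a value");
            }
            int size;
            try {
                size = Integer.parseInt(args[i + 1]);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                        FLAG + " must be an integer, got: " + args[i + 1], e);
            }
            if (size <= 0) {
                throw new IllegalArgumentException(FLAG + " must be positive, got: " + size);
            }
            return size;
        }
        return DEFAULT_BATCH_SIZE;
    }
}
```

Calling `BatchSizeOption.parse(args)` at the start of `main` would give the indexer a value to pass wherever the batch size is currently hard-coded. Rejecting zero and negative values up front avoids a loop that never advances.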

@cmungall
Contributor

Looks good

@kshefchek
Contributor Author

Great! I'll test this branch by indexing our three cores from our dev SciGraph, fix the tests, and update the docs.

@kshefchek
Contributor Author

Batch size notes:
- 1 million: GC overhead limit exceeded
- 100k: SocketException: broken pipe
- 10k: SocketException: broken pipe
- 1k: no exceptions

For now I am going with 1k batches. It may be worth catching the socket exception and examining the documents in the failing batch to see whether one contains an excessively large evidence graph.
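
The idea of catching the exception and inspecting the failing batch could be sketched roughly as follows. This is not the code from this PR and it does not use SolrJ, so that it runs on its own. Documents are modeled as plain `Map<String, Object>` values, the `BatchSender` interface is a hypothetical stand-in for the client call that posts a batch, and document size is approximated by character count. Since `SocketException` is a subclass of `IOException`, catching `IOException` covers the broken-pipe case.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of this PR. Documents are plain maps and
// BatchSender stands in for whatever call actually posts a batch to Solr.
final class BatchIndexer {
    /** Stand-in for the real client call (assumed interface). */
    interface BatchSender {
        void send(List<Map<String, Object>> batch) throws IOException;
    }

    private BatchIndexer() {}

    /** Rough size estimate: characters in all field names and values. */
    static long approxSize(Map<String, Object> doc) {
        long size = 0;
        for (Map.Entry<String, Object> field : doc.entrySet()) {
            size += field.getKey().length() + String.valueOf(field.getValue()).length();
        }
        return size;
    }

    /** Returns the document with the largest estimated size, or null if the batch is empty. */
    static Map<String, Object> largest(List<Map<String, Object>> batch) {
        Map<String, Object> biggest = null;
        long max = -1;
        for (Map<String, Object> doc : batch) {
            long size = approxSize(doc);
            if (size > max) {
                max = size;
                biggest = doc;
            }
        }
        return biggest;
    }

    /**
     * Sends docs in batches of batchSize. When a batch fails with an
     * IOException, logs the largest document in that batch, records it as a
     * suspect, and continues with the next batch.
     */
    static List<Map<String, Object>> index(
            List<Map<String, Object>> docs, int batchSize, BatchSender sender) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Map<String, Object>> suspects = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            List<Map<String, Object>> batch = docs.subList(start, end);
            try {
                sender.send(batch);
            } catch (IOException e) {
                Map<String, Object> suspect = largest(batch);
                System.err.println("Batch " + start + ".." + (end - 1) + " failed ("
                        + e.getMessage() + "); largest document is about "
                        + approxSize(suspect) + " characters");
                suspects.add(suspect);
            }
        }
        return suspects;
    }
}
```

The largest document is only a suspect, not a confirmed cause. Whether to continue, retry with a smaller batch, or abort after a failure is a design choice; this sketch simply continues so that every failing batch gets reported in one run.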

@kshefchek changed the title from "WIP - Remove json file generating/posting in indexing (Do not merge)" to "Remove json file generating/posting in indexing (Ready for review)" on Jun 23, 2017
3 participants