Add new UpdateStream to the Streaming API #37

Open
joel-bernstein opened this issue Jan 15, 2015 · 1 comment

@joel-bernstein

UpdateStream will send updates to a SolrCloud collection. It wraps a TupleStream and, as it iterates over that stream, sends each Tuple to be indexed as a document in the collection. This will allow developers to build new data sets by combining and transforming TupleStreams.
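
As a rough illustration of the wrapping idea, here is a minimal sketch in plain Java. All of the types below (`SketchTuple`, `SketchTupleStream`, `SketchIndexer`, `ListStream`, `SketchUpdateStream`) are hypothetical stand-ins invented for this example, not the project's classes, and the indexer is an in-memory hook rather than a SolrCloud client. In this simplified version `read()` passes each tuple through after indexing it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a tuple: a bag of fields plus an end-of-stream flag.
class SketchTuple {
    final Map<String, Object> fields = new HashMap<>();
    final boolean eof;
    SketchTuple(boolean eof) { this.eof = eof; }
    SketchTuple put(String key, Object value) { fields.put(key, value); return this; }
}

// Hypothetical stand-in for a tuple stream with an open/read/close lifecycle.
interface SketchTupleStream {
    void open();
    SketchTuple read();   // returns a tuple with eof == true once exhausted
    void close();
}

// Hypothetical indexing hook; a real implementation would send to SolrCloud.
interface SketchIndexer {
    void add(Map<String, Object> doc);
}

// In-memory source, used only to drive the example.
class ListStream implements SketchTupleStream {
    private final List<SketchTuple> tuples;
    private Iterator<SketchTuple> it;
    ListStream(List<SketchTuple> tuples) { this.tuples = new ArrayList<>(tuples); }
    public void open() { it = tuples.iterator(); }
    public SketchTuple read() { return it.hasNext() ? it.next() : new SketchTuple(true); }
    public void close() { it = null; }
}

// Decorator: reads the wrapped stream and forwards every tuple to the indexer.
class SketchUpdateStream implements SketchTupleStream {
    private final SketchTupleStream inner;
    private final SketchIndexer indexer;
    private long sent;

    SketchUpdateStream(SketchTupleStream inner, SketchIndexer indexer) {
        this.inner = inner;
        this.indexer = indexer;
    }

    public void open() { inner.open(); sent = 0; }

    public SketchTuple read() {
        SketchTuple t = inner.read();
        if (t.eof) {
            // Attach a simple count to the end-of-stream marker.
            return new SketchTuple(true).put("sent", sent);
        }
        indexer.add(t.fields);   // index the tuple as a document
        sent++;
        return t;                // pass the tuple through to the caller
    }

    public void close() { inner.close(); }
}
```

Because the wrapper only depends on the stream interface, any source that implements it (including a custom one backed by another data store) could be plugged in without changing the indexing side.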

Documents will be routed directly to the correct SolrCloud leader using techniques similar to CloudSolrServer. The documents themselves will be sent using ConcurrentUpdateSolrServer, so updates can be streamed rather than batched.
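
To make the routing idea concrete, here is a small sketch of hash-based routing in plain Java. It is only an approximation of the general technique: the `SketchRouter` class and the leader URLs are hypothetical, and `String.hashCode()` is used as a stand-in hash, which is an assumption of this example and not how SolrCloud actually hashes document ids. Each leader is assumed to own an equal slice of the 32-bit hash space.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical router: maps a document id to one of N shard leaders.
class SketchRouter {
    private final List<String> leaders; // hypothetical leader URLs, one per shard

    SketchRouter(List<String> leaders) {
        if (leaders.isEmpty()) {
            throw new IllegalArgumentException("need at least one leader");
        }
        this.leaders = new ArrayList<>(leaders);
    }

    // Pick a shard by splitting the unsigned 32-bit hash space into equal ranges.
    // String.hashCode() is a stand-in hash chosen for this sketch only.
    int shardFor(String id) {
        long unsigned = id.hashCode() & 0xFFFFFFFFL;        // 0 .. 2^32 - 1
        return (int) ((unsigned * leaders.size()) >>> 32);  // 0 .. leaders.size() - 1
    }

    String leaderFor(String id) {
        return leaders.get(shardFor(id));
    }

    // Group ids by leader so each group can be sent straight to that leader.
    Map<String, List<String>> group(List<String> ids) {
        Map<String, List<String>> byLeader = new LinkedHashMap<>();
        for (String id : ids) {
            byLeader.computeIfAbsent(leaderFor(id), k -> new ArrayList<>()).add(id);
        }
        return byLeader;
    }
}
```

Grouping by leader before sending is what avoids the extra hop through a non-leader node; each group could then be handed to its own streaming client.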

UpdateStream can wrap any TupleStream, including custom TupleStreams that pull data from other sources such as RDBMSs or NoSQL engines. This provides a generalized streaming ETL framework.

@joel-bernstein joel-bernstein self-assigned this Jan 15, 2015
@joel-bernstein

Added initial implementation to the helio_ustream branch.
fdf85a0
It's not working yet, but it gives the basic idea. The initial code uses CloudSolrServer as the indexer.

The next step is to work on the Tuples returned from the read() method after each batch. These Tuples will report on indexing progress. I think it makes sense to report the number of batches indexed, the number of batches in the queue, and the error count.
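
One possible shape for those progress Tuples is sketched below in plain Java. Everything here is hypothetical: the `SketchBatchingUpdater` class, the field names (`batchesIndexed`, `batchesQueued`, `errors`, `EOF`), the batch-level indexer hook, and the `queueDepth` supplier that stands in for however the real indexer would expose its queue size. Status is represented as a plain `Map` rather than a real Tuple.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical updater: each read() indexes one batch and returns a status map.
class SketchBatchingUpdater {
    private final Iterator<Map<String, Object>> source;
    private final Consumer<List<Map<String, Object>>> indexer; // may throw RuntimeException
    private final LongSupplier queueDepth;                     // assumed hook for queue size
    private final int batchSize;
    private long batchesIndexed;
    private long errors;

    SketchBatchingUpdater(Iterator<Map<String, Object>> source,
                          Consumer<List<Map<String, Object>>> indexer,
                          LongSupplier queueDepth,
                          int batchSize) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        this.source = source;
        this.indexer = indexer;
        this.queueDepth = queueDepth;
        this.batchSize = batchSize;
    }

    Map<String, Object> read() {
        // Pull up to batchSize documents from the wrapped source.
        List<Map<String, Object>> batch = new ArrayList<>();
        while (batch.size() < batchSize && source.hasNext()) {
            batch.add(source.next());
        }

        Map<String, Object> status = new LinkedHashMap<>();
        if (batch.isEmpty()) {
            status.put("EOF", true);          // nothing left to index
        } else {
            try {
                indexer.accept(batch);
                batchesIndexed++;
            } catch (RuntimeException e) {
                errors++;                     // count the failed batch and keep going
            }
        }
        // Cumulative counters, reported after every batch.
        status.put("batchesIndexed", batchesIndexed);
        status.put("batchesQueued", queueDepth.getAsLong());
        status.put("errors", errors);
        return status;
    }
}
```

In this sketch a failed batch is counted and skipped rather than retried; whether to retry, abort, or surface the exception is a separate design decision.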
