Build Kafka Connector for Dgraph in Live and Bulk Loader #3967
Great idea. I hope Dgraph can integrate closely with stream-processing engines (e.g. Flink, Spark) in the near future.
Hey @willem520, could you tell us more about what you would expect from these integrations with Flink or Spark?
Hello. In my project I want to use Flink or Spark Streaming to process RDF or JSON data in real time, and I also need to migrate historical data from another graph database (e.g. JanusGraph) into Dgraph. But I found that when I used Spark with dgraph4j to process a large dataset (e.g. 5 million nodes), it always failed, and sometimes the Alpha crashed.
I'm sorry but I'm going to need more information on what you were actually building and how it failed. If I understand correctly, you're processing a stream of events in RDF or JSON format? What exact API would you like us to provide to integrate with Spark or Flink?
How many cores are you giving each executor, and how many executors are you running concurrently? You could try reducing the size of each transaction so that each one finishes quickly and fewer transactions are pending at once.
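Reducing transaction size mostly comes down to chunking the input before opening each transaction. A minimal sketch of that idea in plain Java (no Dgraph client involved; `Chunker` and the integer records are illustrative placeholders — in a real loader you would convert each batch into a mutation and commit it in its own short-lived transaction):

```java
import java.util.ArrayList;
import java.util.List;

public class Chunker {
    // Split records into batches of at most batchSize, so each batch
    // can be committed in its own short-lived transaction instead of
    // one huge transaction that holds locks for a long time.
    public static <T> List<List<T>> chunk(List<T> records, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < records.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                records.subList(i, Math.min(i + batchSize, records.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 10; i++) records.add(i);
        List<List<Integer>> batches = chunk(records, 4);
        // 10 records in batches of 4 -> batch sizes 4, 4, 2
        System.out.println(batches.size());        // 3
        System.out.println(batches.get(2).size()); // 2
    }
}
```

Tuning `batchSize` (a few thousand triples per transaction is a common starting point) lets you trade throughput against the number of concurrently pending transactions.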
I used 4 executor cores and 5 executors. I need to import at least 100 million records into Dgraph.
Not directly related to Dgraph, but Neo4j just announced a new product which will tightly integrate Neo4j with Kafka. I feel like this is a feature which might greatly impact DB choice for (new) projects. https://www.datanami.com/2019/10/01/neo4j-gets-hooks-into-kafka/
@AshNL Have you ever actually used Neo4j? We did for about three months and ended up migrating everything away from it to save our sanity and our company. I cannot remember any other database that caused more operational problems, more concurrency issues, and such consistently terrible performance. The most mind-boggling thing is that the company does listen to all reported problems, but never fixed any of them... Meanwhile, we run our most mission-critical workloads on Postgres; we de-normalized those few tables to operate entirely join-free and sustain very high performance.

With Dgraph, there are a few rough edges because it's relatively new, but for the most part, when it runs, it just runs. As for the aforementioned Kafka connector, there are tutorials on how to write one. I think implementing the connector with a queue and proper batch-writing should do the trick. https://www.confluent.fr/blog/create-dynamic-kafka-connect-source-connectors/
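The queue-plus-batch-writing pattern mentioned above can be sketched in plain Java with no Kafka Connect dependency (`BatchWriter`, the in-memory sink, and the string records are all illustrative placeholders; in a real sink connector the enqueue side would be fed by Kafka and each drained batch would be committed as one Dgraph transaction):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BatchWriter {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<List<String>> written = new ArrayList<>(); // stand-in sink

    // In a real connector this would be called with records consumed from Kafka.
    public void enqueue(String record) {
        queue.add(record);
    }

    // Drain up to maxBatch queued records and write them as a single batch.
    // Returns how many records were written, 0 when the queue is empty.
    public int writeBatch(int maxBatch) {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatch);
        if (!batch.isEmpty()) {
            written.add(batch); // a real sink would commit one transaction here
        }
        return batch.size();
    }

    public List<List<String>> sink() {
        return written;
    }

    public static void main(String[] args) {
        BatchWriter w = new BatchWriter();
        for (int i = 0; i < 7; i++) w.enqueue("record-" + i);
        while (w.writeBatch(3) > 0) { }
        // 7 records drained in batches of at most 3 -> 3 batches (3, 3, 1)
        System.out.println(w.sink().size()); // 3
    }
}
```

Decoupling consumption from writing like this lets the queue absorb bursts from Kafka while the writer keeps each downstream transaction small.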
No need to start biting. I'm sorry I'm not as experienced as you are. In the meantime I have indeed written my own connector. |
This will allow loading data directly from Kafka.

GitHub issues have been deprecated.