Build Kafka Connector for Dgraph in Live and Bulk Loader #3967

Closed
mangalaman93 opened this issue Sep 11, 2019 · 11 comments
Labels
area/integrations: Related to integrations with other projects.
kind/feature: Something completely new we should consider.
popular
status/accepted: We accept to investigate/work on it.
status/needs-specs: Issues that require further specification before implementation.

Comments

@mangalaman93
Contributor

This will allow loading data directly from Kafka.
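For illustration, here is a minimal sketch of what the core loop of such a connector might look like, using the dgraph4j client and a plain Kafka consumer. The endpoints, topic name, and the assumption that each record value is a JSON mutation payload are all hypothetical, not part of the proposal:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import com.google.protobuf.ByteString;
import io.dgraph.DgraphClient;
import io.dgraph.DgraphGrpc;
import io.dgraph.DgraphProto.Mutation;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class KafkaToDgraph {
  public static void main(String[] args) {
    // Hypothetical endpoints and topic; adjust for your deployment.
    ManagedChannel channel =
        ManagedChannelBuilder.forAddress("localhost", 9080).usePlaintext().build();
    DgraphClient dgraph = new DgraphClient(DgraphGrpc.newStub(channel));

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "dgraph-loader");
    props.put("enable.auto.commit", "false");
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(Collections.singletonList("dgraph-mutations"));
      while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        for (ConsumerRecord<String, String> record : records) {
          // Each record value is assumed to be a JSON mutation payload.
          Mutation mu = Mutation.newBuilder()
              .setSetJson(ByteString.copyFromUtf8(record.value()))
              .setCommitNow(true)
              .build();
          dgraph.newTransaction().mutate(mu);
        }
        // Commit Kafka offsets only after the mutations are applied.
        consumer.commitSync();
      }
    }
  }
}
```

A real connector would batch records and handle retries; this only shows the shape of the data flow.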

@danielmai added the kind/feature label Sep 11, 2019
@willem520

Great idea. I hope Dgraph can have close integration with processing engines (e.g. Flink, Spark) in the near future.

@campoy added the area/integrations label and removed the area/integration label Sep 13, 2019
@campoy
Contributor

campoy commented Sep 13, 2019

Hey @willem520,

Could you tell us more about what you would expect from these integrations with Flink or Spark?

@campoy added the status/needs-specs label Sep 13, 2019
@willem520

> Hey @willem520,
>
> Could you tell us more about what you would expect from these integrations with Flink or Spark?

Hello. In my project, I want to use Flink or Spark Streaming to process RDF or JSON data in real time, and I need to migrate historical data from another graph database (e.g. JanusGraph) to Dgraph. But I found that when I used Spark and dgraph4j to process a large dataset (e.g. 5 million nodes), it always failed, and sometimes an Alpha crashed.

@campoy
Contributor

campoy commented Sep 17, 2019

I'm sorry but I'm going to need more information on what you were actually building and how it failed.

If I understand correctly, you're processing a stream of events in RDF or JSON format?
Or is it a batch analysis with 5 million nodes?

What exact API would you like us to provide to integrate with Spark or Flink?

@willem520

willem520 commented Sep 19, 2019

Hi, I used Spark to load 5 million nodes into memory with 100 partitions to process the data. In each partition, I built 2,000 nodes in JSON format into one mutation and used the dgraph4j client to execute txn.mutate. When I ran the program, it failed with this error message:

[screenshot of the error message]

If I used a smaller dataset (e.g. 500,000 nodes) in the same program, it succeeded.
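For reference, the pattern described above might look roughly like the following sketch, assuming dgraph4j and the Spark Java API; the host and RDD names are hypothetical, and this is not the exact failing program:

```java
import java.util.ArrayList;
import java.util.List;

import com.google.protobuf.ByteString;
import io.dgraph.DgraphClient;
import io.dgraph.DgraphGrpc;
import io.dgraph.DgraphProto.Mutation;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import org.apache.spark.api.java.JavaRDD;

public class SparkBatchLoad {
  // "jsonNodes" is a hypothetical RDD of JSON node documents.
  static void load(JavaRDD<String> jsonNodes) {
    jsonNodes.foreachPartition(rows -> {
      // One client per partition, created on the executor.
      ManagedChannel channel = ManagedChannelBuilder
          .forAddress("alpha-host", 9080).usePlaintext().build();
      DgraphClient client = new DgraphClient(DgraphGrpc.newStub(channel));
      List<String> batch = new ArrayList<>();
      while (rows.hasNext()) {
        batch.add(rows.next());
        if (batch.size() == 2000 || !rows.hasNext()) {
          // One transaction per 2,000-node batch, as described above.
          Mutation mu = Mutation.newBuilder()
              .setSetJson(ByteString.copyFromUtf8(
                  "[" + String.join(",", batch) + "]"))
              .setCommitNow(true)
              .build();
          client.newTransaction().mutate(mu);
          batch.clear();
        }
      }
      channel.shutdown();
    });
  }
}
```

With 100 partitions running concurrently, this keeps many large transactions pending at once, which is relevant to the reply that follows.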

@mangalaman93
Contributor Author

mangalaman93 commented Sep 19, 2019

How many cores are you providing to each executor? How many executors are you running concurrently? You could try reducing the size of each transaction so that each one finishes quickly and the total number of pending transactions stays small.
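A minimal sketch of that advice, assuming dgraph4j's TxnConflictException signals an aborted transaction; the batch-size reduction happens upstream, and the retry count and backoff here are illustrative:

```java
import io.dgraph.DgraphClient;
import io.dgraph.DgraphProto.Mutation;
import io.dgraph.TxnConflictException;

public class RetryingMutator {
  // Send smaller batches and retry a few times if the transaction is
  // aborted because of a conflicting concurrent transaction.
  static void mutateWithRetry(DgraphClient client, Mutation mu) {
    for (int attempt = 1; attempt <= 3; attempt++) {
      try {
        client.newTransaction().mutate(mu);  // mu built with setCommitNow(true)
        return;
      } catch (TxnConflictException e) {
        // Back off briefly before retrying.
        try {
          Thread.sleep(100L * attempt);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
        }
      }
    }
    throw new RuntimeException("mutation failed after 3 attempts");
  }
}
```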

@willem520

willem520 commented Sep 23, 2019

I used 4 executor cores and 5 executors. I need to import at least 100 million records into Dgraph.

@Naralux

Naralux commented Oct 3, 2019

Not directly related to Dgraph, but Neo4j just announced a new product which will tightly integrate Neo4j with Kafka. I feel like this is a feature which might greatly impact DB choice for (new) projects. https://www.datanami.com/2019/10/01/neo4j-gets-hooks-into-kafka/

@marvin-hansen

@AshNL Have you ever used neo4j in your entire life?

We did, for ~3 months, and we are actually migrating everything away from it to save our sanity and our company. I cannot remember any other database that caused more operational problems, more concurrency issues, and more consistently terrible performance. The most mind-boggling thing is that the company does indeed listen to all reported problems, but they never fix anything...

Meanwhile, we run the most mission-critical stuff on Postgres. We de-normalized those few tables to operate entirely join-free to sustain very high performance.

With Dgraph, there are a few rough edges because it's relatively new, but for the most part, when it runs, it just runs.

For the aforementioned Kafka connector, there are tutorials on how to write one. I think implementing the connector with a queue and proper batch writing should do the trick; see the sketch after the link below.

https://www.confluent.fr/blog/create-dynamic-kafka-connect-source-connectors/
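Following the Kafka Connect pattern that tutorial describes, a Dgraph sink could be sketched as a SinkTask that batches each delivered set of records into one mutation. The class name, config key, and JSON-object-per-record assumption below are hypothetical:

```java
import java.util.Collection;
import java.util.Map;

import com.google.protobuf.ByteString;
import io.dgraph.DgraphClient;
import io.dgraph.DgraphGrpc;
import io.dgraph.DgraphProto.Mutation;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class DgraphSinkTask extends SinkTask {
  private ManagedChannel channel;
  private DgraphClient client;

  @Override
  public void start(Map<String, String> props) {
    channel = ManagedChannelBuilder
        .forAddress(props.getOrDefault("dgraph.host", "localhost"), 9080)
        .usePlaintext().build();
    client = new DgraphClient(DgraphGrpc.newStub(channel));
  }

  @Override
  public void put(Collection<SinkRecord> records) {
    if (records.isEmpty()) return;
    // Batch all records delivered in this call into one mutation.
    StringBuilder json = new StringBuilder("[");
    for (SinkRecord r : records) {
      if (json.length() > 1) json.append(',');
      json.append(r.value());  // assumed to be a JSON object string
    }
    json.append(']');
    Mutation mu = Mutation.newBuilder()
        .setSetJson(ByteString.copyFromUtf8(json.toString()))
        .setCommitNow(true)
        .build();
    client.newTransaction().mutate(mu);
  }

  @Override
  public void stop() {
    if (channel != null) channel.shutdown();
  }

  @Override
  public String version() {
    return "0.1-sketch";
  }
}
```

Kafka Connect then handles offset tracking and delivery retries, which is exactly the queueing and batch-writing marvin-hansen suggests.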

@Naralux

Naralux commented Jan 16, 2020

No need to start biting. I'm sorry I'm not as experienced as you are. In the meantime I have indeed written my own connector.

@shekarm added the status/accepted label Feb 18, 2020
@minhaj-shakeel
Contributor

GitHub issues have been deprecated.
This issue has been moved to discuss. You can follow the conversation there and also subscribe to updates by changing your notification preferences.

