Product Roadmap 2020 #4724

Open
manishrjain opened this issue Feb 3, 2020 · 40 comments
Member

@manishrjain manishrjain commented Feb 3, 2020

Here's the product roadmap for 2020.

  • Official GraphQL spec compliance
    • Queries (Q1)
    • Mutations (Q1)
    • Subscriptions (Q2)
    • Live Queries (Q2)
  • SaaS (TBD)
  • Scalability, performance, and reliability
    • Single predicate sharded across groups (TBD)
    • Distributed bulk loader (TBD)
    • Rolling upgrades (Q2)
    • Ludicrous mode (TBD)
    • Query Planner (TBD)
  • Enterprise features
    • Change Data Capture (Q2)
    • Multi-tenancy (Q1)
    • Point-in-time recovery (Q2)
    • Audit logs (TBD)
    • ACL integration with AD, LDAP (stretch goal)
  • Integrations
    • Kubernetes
      • Operator (Q2)
      • Bulk Loader (Q2)
    • Kafka (TBD)
    • Others. Let us know in comments.

We have mentioned the features we are planning to focus on in Q1 and Q2 (first half of 2020). For the rest, we'll assess their ETA as we reach mid-year. Tell us what more you'd like to see happen in 2020!

@manishrjain manishrjain added the roadmap label Feb 3, 2020
Member Author

@manishrjain manishrjain commented Feb 3, 2020

I know a bunch of folks have been looking for Gremlin support. We're currently focused on GraphQL, but if you need Gremlin to work with Dgraph, show your support by upvoting this comment and we'll consider prioritizing it in a few months.


@smkhalsa smkhalsa commented Feb 3, 2020

@manishrjain Thanks for the update and for the wonderful work you and the dgraph team are doing.

I'm glad to see that a fully managed SaaS option is on the roadmap.

One question: I don't see any specific mention of exposing full GraphQL+- functionality in spec-compliant GraphQL queries (something like what neo4j-graphql-js offers). There was an indication from the dgraph team shortly after the graphql api was launched that this was coming soon. If that's the case, can you give a preview of how that might work?

Member Author

@manishrjain manishrjain commented Feb 3, 2020

exposing full GraphQL+- functionality in spec-compliant GraphQL queries

We're considering multiple ways of doing this:

  1. Figuring out ways to port +- functionality over in a GraphQL spec-compliant way, so a GraphQL user can just construct these queries directly (fuzzy matching, full-text search, has function).

  2. Adding automatically generated GraphQL functions for the complex +- functionality (for example, email: string @index(exact) @upsert could have an upsertEmail func generated automatically).

  3. If a feature can't be translated into GraphQL, then having a way for a user to specify a function name along with the +- query it maps to. This is the closest to the resolver pattern exposed via Apollo. This would be done at the schema level, instead of involving any particular programming language (Apollo's resolvers are typically JavaScript-based), to maximize compatibility across languages.

For now, we're going with easy wins, that is 1 and 2. Once we have covered and exhausted ways to port +- into 1 or 2, we can look into 3.
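As a rough illustration of option 2, here is a minimal sketch of how a generated function name could be derived from a schema line. Everything in it is an assumption: the generatedUpsertName helper is hypothetical, and the "upsert" plus capitalized predicate naming rule is only inferred from the upsertEmail example above. It is not Dgraph's actual generator.

```go
package main

import (
	"fmt"
	"strings"
)

// generatedUpsertName is a hypothetical helper: given a schema line such as
// "email: string @index(exact) @upsert", it returns the name an automatically
// generated upsert function might get. The naming rule is an assumption based
// on the upsertEmail example in the comment above.
func generatedUpsertName(schemaLine string) (string, bool) {
	parts := strings.SplitN(schemaLine, ":", 2)
	if len(parts) != 2 {
		return "", false
	}
	pred := strings.TrimSpace(parts[0])
	if pred == "" {
		return "", false
	}
	// Only predicates that carry the @upsert directive get a function.
	hasUpsert := false
	for _, field := range strings.Fields(parts[1]) {
		if field == "@upsert" {
			hasUpsert = true
		}
	}
	if !hasUpsert {
		return "", false
	}
	return "upsert" + strings.ToUpper(pred[:1]) + pred[1:], true
}

func main() {
	name, ok := generatedUpsertName("email: string @index(exact) @upsert")
	fmt.Println(name, ok)
}
```

A predicate without @upsert returns false, so no function name would be produced for it.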


@vikram-ma vikram-ma commented Feb 5, 2020

@manishrjain
Where can I find more details to understand what these goals really mean?

  • For example, what does "Ludicrous mode (TBD)" do, and what is the plan to achieve it?
  • Also, can you elaborate on Query Planner (TBD)?
    Do you guys have any checks in place to control GraphQL (and graphql+-) query complexity? To prevent a friendly-fire scenario? Will this be part of the Query Planner?

@Relaxe111 Relaxe111 commented Feb 6, 2020

Hello, it looks like multi-tenancy support has been added as an Enterprise feature. I would kindly ask you to reconsider and make this feature open source, or to offer a more flexible pricing plan. Locking this feature in the Enterprise version could be a big barrier to wide adoption of dgraph; at least in the EU it will be for sure.

Member Author

@manishrjain manishrjain commented Feb 6, 2020

All the enterprise features (including multi-tenancy) would be automatically included in the SaaS offering. That should allow for flexible, pay as you go sort of pricing.

Ludicrous mode -- Idea is to allow a mode of Dgraph, which gives up on some "correctness" things to achieve maximum performance. For a lot of people, if Dgraph doesn't give them the needed speed, they revert to a NoSQL database, which provides very little in terms of consistency / transactional guarantees. This mode would allow Dgraph to run with lower guarantees, but at a faster speed.

Query Planner: Dgraph doesn't do much query planning right now. It executes the queries in the same order they're given. Of course, we could do a better job by having an internal query planner which can alter the ordering of tasks to achieve better performance.
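To make "altering the ordering of tasks" concrete, here is a toy sketch of one common planning heuristic: run the step expected to return the fewest results first. The step type, the estimatedRows field, and the plan function are all hypothetical names invented for this illustration, and nothing here reflects how Dgraph's planner is or will be built.

```go
package main

import (
	"fmt"
	"sort"
)

// step is a hypothetical unit of query work together with an estimate of how
// many results it would return. Real estimates would come from statistics.
type step struct {
	name          string
	estimatedRows int
}

// plan returns a reordered copy of steps so that the most selective step
// (smallest estimate) runs first. The input slice is left untouched, and
// steps with equal estimates keep the order in which they were given.
func plan(steps []step) []step {
	ordered := make([]step, len(steps))
	copy(ordered, steps)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].estimatedRows < ordered[j].estimatedRows
	})
	return ordered
}

func main() {
	asWritten := []step{
		{name: "has(friend)", estimatedRows: 100000},
		{name: "eq(email, ...)", estimatedRows: 1},
		{name: "anyofterms(name, ...)", estimatedRows: 250},
	}
	for _, s := range plan(asWritten) {
		fmt.Println(s.name, s.estimatedRows)
	}
}
```

Executing in the order given would start from the broadest step; the reordered plan starts from the single exact match and narrows from there.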

Do you guys have any checks in place to control GraphQL (and graphql+-) query complexity

Not sure what you mean.


@vikram-ma vikram-ma commented Feb 6, 2020

Clients can issue arbitrarily complex GraphQL queries, i.e. clients control the query complexity.
This means clients can issue very complex queries that could take minutes or hours to complete and impact dgraph's ability to serve other requests.

For example, a valid client makes a very complex query that significantly slows down or impacts dgraph's ability to serve other client requests.

Are there checks in place to detect/prevent/handle this?


@Willem520 Willem520 commented Feb 6, 2020

Hello, it looks like multi-tenancy support has been added as an Enterprise feature. I would kindly ask you to reconsider and make this feature open source, or to offer a more flexible pricing plan. Locking this feature in the Enterprise version could be a big barrier to wide adoption of dgraph; at least in the EU it will be for sure.

Hi, multi-tenancy is a popular feature, and many developers like us need it to improve our projects. Personally, I hope it could be free. Thanks.

Member Author

@manishrjain manishrjain commented Feb 6, 2020

Are there checks in place to detect/prevent/handle this?

A client can specify a context with a timeout, which shuts the query down if it runs too long. But apart from that, nothing prevents this right now. Once we have a query planner and can calculate the cost of running a query, we can do a better job of rejecting expensive queries.
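The context-with-timeout approach can be sketched with Go's standard library alone. In this sketch runQuery is a hypothetical stand-in for a long-running database call; with a real Go client, the same context would be passed to the query call instead. The durations are arbitrary.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runQuery is a hypothetical stand-in for a database query that takes
// `work` to finish. It gives up early if the context is cancelled or
// its deadline passes.
func runQuery(ctx context.Context, work time.Duration) error {
	select {
	case <-time.After(work):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// queryWithTimeout runs the query under a deadline so that a query running
// longer than `limit` is abandoned by the client.
func queryWithTimeout(limit, work time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel() // always release the timer behind the context
	return runQuery(ctx, work)
}

func main() {
	err := queryWithTimeout(50*time.Millisecond, 2*time.Second)
	fmt.Println("timed out:", errors.Is(err, context.DeadlineExceeded))
}
```

This only bounds how long a single client waits; it is set by the client, so it does not protect the server from clients that choose a very long timeout or none at all.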


@nmabhinandan nmabhinandan commented Feb 6, 2020

  • A startup program offering evaluation license of enterprise edition. (or better yet, SaaS credits).
  • A reactive Spring Data client (Along the lines of SDN/RX)
  • GeoJson support
Member Author

@manishrjain manishrjain commented Feb 6, 2020

A startup program offering evaluation license of enterprise edition. (or better yet, SaaS credits).

Already there. Every Dgraph instance comes with a month of free enterprise trial.

GeoJson support

Already there. Dgraph has been supporting geo queries since the early days. In fact, some users have said that Dgraph's geo support is better than PostGIS (we haven't verified).


@Relaxe111 Relaxe111 commented Feb 6, 2020

I don't think SaaS including multi-tenancy is what the majority of developers are really looking for. By a more flexible pricing plan, I mean gradual licensing. For example, a company could choose and pay only for the Enterprise features it is interested in; if I need only one feature, I would not be happy to pay for a full license while using only that one feature.


@jdeal-mediamath jdeal-mediamath commented Feb 7, 2020

My company is in the position of exploring different graph DBs to modernize our stack, and Dgraph is the best fit in every category (especially looking at this 2020 roadmap), except for lack of Gremlin support (we would like to port our existing queries, and eventually switch). Excited to see how this roadmap progresses!

Member Author

@manishrjain manishrjain commented Feb 7, 2020

Generally asking the community here, for Gremlin support, I'm curious if that's really a deal breaker. The port of queries from Gremlin to GraphQL is probably a one-time effort -- and you get the benefit of new, easy to use tech, JSON support, with a growing ecosystem of tools and editors to support creating queries, exploring data, etc. (GraphQL has so many editors).


@jdeal-mediamath jdeal-mediamath commented Feb 7, 2020

Oh, porting is definitely something we would be open to doing. It's that we are under time pressure, we know that what we have works, and a few of our members are experienced with Gremlin. Not to mention that, with Gremlin support, we can switch backends as needed. So that is very specific to us, but it would help teams who are exploring dgraph decide early on whether it is a good fit for them, by allowing them to use what they have.


@brianbroderick brianbroderick commented Feb 9, 2020

I mentioned this in #2693 but wanted to make sure people see this.

First, I want to say that I love Dgraph and am an evangelist for your product. I've given several talks and am constantly trying to get people interested in Dgraph.

There's one last hurdle to start getting actual adoption, and that's the ability to have multiple schemas in a dev & test environment.

Most companies in my area use MySQL, MariaDB, or Postgres. Therefore, having the ability to have many schemas is something people take for granted and is something people are not willing to pay for.

It's a challenge to get people to switch from something they are comfortable with. Making this as painless as possible is the only way to get widespread adoption.

There are many reasons to have multiple schemas; for example, it's typical to have a dev, test, and prod environment with their respective schemas. This makes it so the test database can be recreated before each test run. Right now, the only way to accomplish this is to either have multiple instances of Dgraph running, or to add a prefix to all predicates.

If I only want to clear test environment predicates, adding a prefix complicates queries like this: &api.Operation{DropAll: true}, which I run before any tests. It would also complicate Go structs when determining the right predicate values in JSON.
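The prefix workaround can be sketched as follows. The withEnv and predicatesFor helpers and the "env.predicate" naming scheme are assumptions made up for this illustration. The point is that clearing one environment stops being a single DropAll: each matching predicate would have to be dropped individually (for instance, one drop operation per predicate).

```go
package main

import (
	"fmt"
	"strings"
)

// withEnv builds a prefixed predicate name such as "test.name".
// The "env.predicate" scheme is an assumption for this sketch.
func withEnv(env, predicate string) string {
	return env + "." + predicate
}

// predicatesFor returns only the predicates that belong to env, i.e. the
// ones that would have to be dropped one by one to reset that environment
// without touching the others.
func predicatesFor(env string, all []string) []string {
	prefix := env + "."
	var matched []string
	for _, p := range all {
		if strings.HasPrefix(p, prefix) {
			matched = append(matched, p)
		}
	}
	return matched
}

func main() {
	all := []string{
		withEnv("test", "name"),
		withEnv("dev", "name"),
		withEnv("test", "email"),
	}
	// Every predicate printed here needs its own drop before a test run.
	fmt.Println(predicatesFor("test", all))
}
```

Every query, mutation, and struct tag would also have to go through something like withEnv, which is the extra complexity described above.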

It's also typical to work on many micro services at a time, but these micro services should not have any chance of data colliding with each other; they should be completely isolated. It doesn't seem realistic to have 10+ instances of DGraph running at the same time on a laptop (5+ micro services, each with a dev and test environment)

Therefore, I want to second everyone's comments about supporting multiple schemas in the free version. The SaaS offering isn't going to help my dev and test environments on my laptop.


@Relaxe111 Relaxe111 commented Feb 9, 2020

This is absolutely true! That's why I kindly asked you to reconsider making multi-tenancy open source! For the last year I have tried to introduce dgraph in my company, but the biggest barrier to convincing my boss is support for multi-tenancy. Last week we had a new discussion, and again everyone in my company was skeptical about dgraph because of the lack of multi-tenancy in the community version. I can confirm that my company isn't willing to pay for multi-tenancy. Having to run multiple DB instances for different environments (dev/test) is an issue in dgraph. I just hope that the owners of such an amazing product will understand that multi-tenancy will not be an argument for companies to buy a dgraph enterprise license, but having multi-tenancy open source will be an argument to adopt dgraph. A company that already uses dgraph is more likely to switch to Enterprise than a company using well-known traditional DBs is to switch to dgraph enterprise, or to open-source dgraph without multi-tenancy. I truly believe that the wider dgraph's adoption, the larger its Enterprise user base will be. But without open-source multi-tenancy, the majority of potential future Enterprise users will decline to adopt dgraph now as open source.


@marvin-hansen marvin-hansen commented Feb 10, 2020

My company isn't going to pay for multi-schema / multi-tenancy because any other OSS DB already brings it to the table. Charging for something you get for free in most OSS DBs, and that is legally mandated in certain industries or countries, is just completely ridiculous. Please start listening to your customers!

Please add GPU acceleration or support for in-DB machine learning to bring at least some tangible value to the enterprise version that would justify a purchase.

#4678

#4608


@bronzels bronzels commented Feb 11, 2020

Hoping for Gremlin support in Q1, please.


@thefliik thefliik commented Feb 13, 2020

Generally asking the community here, for Gremlin support, I'm curious if that's really a deal breaker. The port of queries from Gremlin to GraphQL is probably a one-time effort -- and you get the benefit of new, easy to use tech, JSON support, with a growing ecosystem of tools and editors to support creating queries, exploring data, etc. (GraphQL has so many editors).

@manishrjain I imagine most people currently investigating dgraph are people with existing graph db needs and, historically, many existing graph dbs use gremlin. Personally, having used Cypher, Gremlin, and SQL (GraphQL too, though I've never seen it used for directly querying a db), as well as some proprietary APIs like Firestore, I'd say that Gremlin is by far the worst (and one of the reasons why graph dbs are a niche product). I can appreciate someone asking for support because porting an app over to a new language can be a huge undertaking, but, long term, I really hope Gremlin dies in favor of other languages (e.g. the upcoming GQL standard). Providing tooling to help port existing Gremlin apps to a newer query language might be a compromise.

I'm speculating, but I think one challenge for dgraph could be that, historically, graph database usage is mostly confined to backend engineers. It seems likely that most of the current dgraph users are backend folks as well. This would contrast with GraphQL which is mostly a frontend query language (tho obviously it can be used server side as well). From my perspective, one of the most exciting aspects of dgraph is the idea that maybe in the future, I can use Apollo Client to query the backend directly from the frontend, eliminating a huge chunk of work in building out an API server (similar to what Firestore or Hasura can accomplish). This is probably not something that has any appeal to backend folks though.

Member

@shekarm shekarm commented Feb 14, 2020

Hi, I investigated different databases and it appears that multi-tenancy is not something you get free with other databases either. Any implementation of multi-tenancy will require access control lists and other security-related features, and most databases require an enterprise license for those. In some ways, Dgraph is following those models.


@Relaxe111 Relaxe111 commented Feb 14, 2020

Hello @shekarm, could you please give concrete examples of such databases?
Thanks.


@Relaxe111 Relaxe111 commented Feb 14, 2020

I respectfully disagree with you. Multi-tenancy and ACL are different features that shouldn't be lumped together. In my experience, multi-tenancy is open source in most (if not all) DBs. ACL, however, is a feature that not many DBs offer, whether free or Enterprise.


@seanlaff seanlaff commented Feb 15, 2020

Data isolation (which is falling under the multi-tenancy bullet) is critical for dgraph to see success in our company. We have many interested parties, but lacking that feature makes it a non-starter.

If dgraph found a place in our stack, I could see us growing into the enterprise tier (e.g. needing granular ACL, fancier snapshot/restore, etc.); however, lacking rudimentary data isolation in the free tier hampers our ability to start the journey and build PoCs.

Specifically, the risk of schema collision is the real blocker.

Member

@shekarm shekarm commented Feb 18, 2020

Currently, Dgraph implements multi-tenancy and user authentication as part of our ACL implementation, to validate users and their access credentials. We will look at this implementation and see if it makes sense to isolate the credential authorization required for multi-tenancy.

Member

@shekarm shekarm commented Feb 18, 2020

On the issue of GPU acceleration, there is a separate issue opened by @marvin-hansen and it is being tracked separately here.


@seanlaff seanlaff commented Feb 18, 2020

@shekarm Thanks for your consideration. I think it maps to the elasticsearch pattern of supporting multiple indices in the free tier, and then supporting document (and field) level security in the enterprise tier.


@marvin-hansen marvin-hansen commented Feb 19, 2020

@manishrjain @shekarm

Please consider an in-memory mode to boost performance, as reported in issue #4813

GPU acceleration is hard and complex to implement, but an in-memory mode gives about the same performance, requires less complexity, and scales much more cheaply, because adding a few hundred GB more memory costs far less than adding a few more high-end GPUs.

Ludicrous mode -- Idea is to allow a mode of Dgraph, which gives up on some "correctness" things to achieve maximum performance.

Do not sacrifice "correctness" for performance; otherwise, Dgraph ends up being no different.
Use a proper in-memory mode like RedisGraph, but actually useful.


@larvinloy larvinloy commented Feb 19, 2020

@thefliik Could you give some examples as to why

Gremlin is by far the worst


@thefliik thefliik commented Feb 20, 2020

@larvinloy are you asking as a dgraph employee? Or are you just curious? The difference being that the first is "on topic" for this thread, the second is probably "off topic."


@larvinloy larvinloy commented Feb 21, 2020

@thefliik

@larvinloy are you asking as a dgraph employee? Or are you just curious? The difference being that the first is "on topic" for this thread, the second is probably "off topic."

Just curious. Feel free to not respond if it's off topic.


@marvin-hansen marvin-hansen commented Feb 22, 2020

@larvinloy @thefliik

Gremlin can be used to perform any arbitrary graph query, but it lacks much of the intuitive and clean syntax made available by SPARQL.

As the DGraph engineers have already figured out, GraphQL alone doesn't do the trick of querying an RDF graph effectively; that's why they came up with the +- extension.

GraphQL and its extension still have some way to go, but it's certainly a very welcome addition to have a native GraphQL endpoint in DGraph.

I wouldn't call Gremlin the worst, but I'm still left wondering why DGraph never even considered SPARQL, as it's specifically made for RDF graphs and is one of the very few mature query languages that can uniformly query graphs, relational data, XML, and JSON. Due to its strict predicate namespace, cross-origin queries are a piece of cake in SPARQL, and thus it's pretty useful in complex system integration. At least you don't have the foreign entity mess you have to deal with in an Apollo federation.

However, I haven't seen anything about SPARQL, so am I correct to assume that DGraph isn't going in that direction?


@iluminae iluminae commented Feb 27, 2020

hey guys - excellent work so far.

I would like to say that rudimentary data isolation is an absolute must for me to start using dgraph. For my use case, I would currently have to spin up a dgraph instance for every customer, which is not possible operationally. Customers will have conflicting schemas, which is not something an ACL can fix. Call this multi-tenancy if you wish, but I am actually not interested in ACLs. I need data isolation as far as other databases give me (postgresql, mysql, elasticsearch, etc. all just have another directory on disk segmented by "database"). If each GQL call to dgraph selected one and only one "database", represented by its own directory on disk, without any cross-talk, it would fit my need exactly.


@larvinloy larvinloy commented Mar 1, 2020

@marvin-hansen What's intuitive is subjective. From my experience of using Neptune, I'm yet to see a query language as powerful as Gremlin. Two of the big features that I miss in every other graph query language are the ability to set query timeout on individual hops (inside the same query), and the ability to do recursive queries until a certain condition is met (i.e. without having to specify depth).
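For readers unfamiliar with the second feature, here is a language-neutral sketch of "recurse until a condition is met" written as a plain breadth-first traversal in Go. It is not Gremlin and not a Dgraph API; the graph type and the traverseUntil function are hypothetical names used only to show that the caller supplies a stop condition rather than a depth.

```go
package main

import "fmt"

// graph is a hypothetical adjacency list: node -> outgoing neighbours.
type graph map[string][]string

// traverseUntil walks the graph breadth-first from start and returns the
// first node for which stop reports true. No depth is specified; the walk
// ends when the condition is met or every reachable node has been visited.
func traverseUntil(g graph, start string, stop func(node string) bool) (string, bool) {
	visited := map[string]bool{start: true}
	queue := []string{start}
	for len(queue) > 0 {
		node := queue[0]
		queue = queue[1:]
		if stop(node) {
			return node, true
		}
		for _, next := range g[node] {
			if !visited[next] { // the visited set keeps cycles from looping forever
				visited[next] = true
				queue = append(queue, next)
			}
		}
	}
	return "", false
}

func main() {
	g := graph{
		"alice": {"bob", "carol"},
		"bob":   {"dave"},
		"carol": {"dave"},
		"dave":  {"alice"},
	}
	node, found := traverseUntil(g, "alice", func(n string) bool { return n == "dave" })
	fmt.Println(node, found)
}
```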

SPARQL might feel cleaner to a lot of folks because of its similarity to SQL, but I'm yet to see a language for graph DBs that is as rich and mature as Gremlin.


@marvin-hansen marvin-hansen commented Mar 11, 2020

@manishrjain @shekarm @MichelDiz

Please support CSV data import in Dgraph.

Details in ticket #4920


@fpattyn fpattyn commented Mar 18, 2020

Can you reconsider adding multi-tenancy to the open source distribution? Being able to define different graphs in one database helps to solve the 'provenance' issue when integrating data from different sources. Every source adds data to a separate graph. It's a cool feature to be able to show where each data source contributed to the complete knowledge graph.


@ganisback ganisback commented Apr 1, 2020

If multi-tenancy is not supported in the open source distribution, I will have to go back to JanusGraph.
I think it's a basic feature for a graph database.


@Willem520 Willem520 commented Apr 2, 2020

Hi, I want to know when multi-tenancy will be supported in the open source distribution. It is really important to me. I have used Dgraph in production as single-tenancy, which means that I have to allocate server resources to each business. If I have 100 businesses, it will take a lot of server resources.


@ganisback ganisback commented Apr 2, 2020

@Willem520, according to this roadmap, they do not plan to support this feature in the community edition; it will be included in the enterprise edition.


@Relaxe111 Relaxe111 commented Apr 2, 2020

Well, that is not exactly the case. According to earlier comments, they will consider whether it makes sense to open source multi-tenancy. So rather than speculating about this issue, I think it will be better to wait for an official announcement from the Dgraph team.
