Can’t Query Type Data Inserted by Bulk Loader #3968

Open
cactus222 opened this issue Sep 11, 2019 · 2 comments

@cactus222
commented Sep 11, 2019

What version of Dgraph are you using?

Dgraph version : v1.1.0
Dgraph SHA-256 : 7d4294a80f74692695467e2cf17f74648c18087ed7057d798f40e1d3a31d2095
Commit SHA-1 : ef7cdb2
Commit timestamp : 2019-09-04 00:12:51 -0700
Branch : HEAD
Go version : go1.12.7

Have you tried reproducing the issue with the latest release?

Yes

What is the hardware spec (RAM, OS)?

Reproduced on Ubuntu 18.04 and with the Dgraph Docker image

Steps to reproduce the issue (command/config used to run Dgraph).

Start Zero (the commands are actually in the docker-compose file)

dgraph zero --my=zero:5080

Prepare the data and schema files for the Dgraph bulk loader

data file

_:brand1 <dgraph.type> "Brand" .
_:brand1 <name> "brand1" .
_:brand2 <dgraph.type> "Brand" .
_:brand2 <name> "brand2" .

_:product1 <dgraph.type> "Product" .
_:product1 <brand> _:brand1 .
_:product1 <name> "name1" .
_:product1 <pid> "abc" .

_:product2 <dgraph.type> "Product" .
_:product2 <brand> _:brand2 .
_:product2 <name> "name2" .
_:product2 <pid> "123" .

_:product3 <dgraph.type> "Product" .
_:product3 <brand> _:brand2 .
_:product3 <name> "name3" .
_:product3 <pid> "ab1" .

schema file

type Product {
  name: string
  brand: uid
  pid: string
}

type Brand {
  name: string
}

name: string @index(term)  .
pid: string @index(hash)  .
brand: uid .

Run the Dgraph bulk loader on the data and schema files

dgraph bulk --schema ./data/smallschema.txt -f ./data/small.txt --format=rdf --reduce_shards=2 --num_go_routines=2 --map_shards=2

Start alphas pointing to the generated directories

dgraph alpha --my=server:7080 --lru_mb=2048 --zero=zero:5080 -p out/0/p/
dgraph alpha --my=server:7081 --lru_mb=2048 --zero=zero:5080 -p out/1/p/ -o=1

Expected behaviour and actual result.

Query all objects with type Product

  {
    q(func: type(Product)) {
      name
      uid
    }
  }

Expected: all Product nodes are returned
Actual: the result is empty
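
For anyone trying to reproduce this, here is a rough sketch of sending the query above to an Alpha over HTTP. It assumes the first Alpha's HTTP endpoint is reachable at http://localhost:8080 (the default port with no offset; adjust for your docker-compose setup), and `newQueryRequest` is just a hypothetical helper name, not part of any Dgraph client.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// typeQuery is the query from this issue, wrapped in the outer braces
// that a complete query block needs.
const typeQuery = `{
  q(func: type(Product)) {
    name
    uid
  }
}`

// newQueryRequest builds a POST to <alphaURL>/query carrying a GraphQL+- query.
// The Content-Type header tells Alpha how to interpret the request body.
func newQueryRequest(alphaURL, query string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, alphaURL+"/query", strings.NewReader(query))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/graphql+-")
	return req, nil
}

func main() {
	// Assumed address of the first Alpha; not taken from the issue.
	req, err := newQueryRequest("http://localhost:8080", typeQuery)
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("send request:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read response:", err)
		return
	}
	// Print the raw JSON response so the (empty) result can be inspected.
	fmt.Println(string(body))
}
```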

Discussion link on forums: https://discuss.dgraph.io/t/cant-query-type-data-on-bulk-loader/5038

@danielmai danielmai added the kind/bug label Sep 11, 2019

@pawanrawal pawanrawal self-assigned this Sep 11, 2019

@pawanrawal

Member

commented Sep 11, 2019

This happens because of the following:

  1. When the given dataset is loaded using the bulk loader with reduce_shards set to 3, the data for dgraph.type ends up in the 3rd output shard, i.e. out/2/p.
  2. When the 1st alpha node is started with out/0/p, it proposes the initial schema for dgraph.type and starts serving the tablet.
    gr.proposeInitialSchema()
  3. When the 3rd alpha node comes up, although it has the data for dgraph.type, it doesn't serve the predicate because it finds that another node is already serving it.
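
The sequence above can be illustrated with a toy model. This is only a sketch and not Dgraph's actual code: `zero`, `claim`, and `startAlpha` are hypothetical names, and the predicates placed in the first two shards are assumptions made for illustration. Only dgraph.type living in the 3rd shard comes from the steps above.

```go
package main

import "fmt"

// zero is a toy stand-in for Zero's tablet map: predicate -> group serving it.
// It is NOT Dgraph's real implementation.
type zero struct {
	tablets map[string]int
}

func newZero() *zero { return &zero{tablets: make(map[string]int)} }

// claim records group as the owner of pred unless another group already
// owns it, and returns the group that ends up serving the predicate.
func (z *zero) claim(pred string, group int) int {
	if owner, ok := z.tablets[pred]; ok {
		return owner
	}
	z.tablets[pred] = group
	return group
}

// startAlpha models an Alpha starting up: it eagerly proposes the initial
// schema (claiming dgraph.type), then tries to claim the predicates found
// in its p directory. It returns the on-disk predicates that this group
// is not allowed to serve because another group claimed them first.
func startAlpha(z *zero, group int, onDisk []string) []string {
	z.claim("dgraph.type", group) // eager proposal on startup
	var orphaned []string
	for _, pred := range onDisk {
		if z.claim(pred, group) != group {
			orphaned = append(orphaned, pred)
		}
	}
	return orphaned
}

func main() {
	z := newZero()
	// Shard contents for groups 1 and 2 are assumed for illustration.
	fmt.Println("group 1 cannot serve:", startAlpha(z, 1, []string{"name"}))
	fmt.Println("group 2 cannot serve:", startAlpha(z, 2, []string{"pid", "brand"}))
	// Group 3 holds the dgraph.type data (out/2/p) but starts last.
	fmt.Println("group 3 cannot serve:", startAlpha(z, 3, []string{"dgraph.type"}))
	fmt.Println("dgraph.type is served by group", z.tablets["dgraph.type"])
}
```

In this model the type data sits on group 3 while group 1 answers for dgraph.type, which would match the empty result for type(Product).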

I think what we should do instead is not propose the initial schema on startup, but propose it when the first mutation for dgraph.type happens on the cluster. The user would run mutations only after starting all the nodes that serve the different shards, so by then the node that holds the data has already had a chance to claim the predicate.
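
A minimal sketch of that idea as a toy model, not Dgraph's actual code. `tabletMap`, `claim`, `startAlpha`, and `mutate` are hypothetical names; the point is only that nothing is claimed for dgraph.type at startup, so the group that holds the data can claim it first.

```go
package main

import "fmt"

// tabletMap is a toy stand-in for the predicate -> group assignment.
type tabletMap map[string]int

// claim assigns pred to group if it is unowned and returns the resulting owner.
func (t tabletMap) claim(pred string, group int) int {
	if owner, ok := t[pred]; ok {
		return owner
	}
	t[pred] = group
	return group
}

// startAlpha claims only the predicates that have data on disk.
// Unlike the current behaviour, no initial schema is proposed here.
func startAlpha(t tabletMap, group int, onDisk []string) {
	for _, pred := range onDisk {
		t.claim(pred, group)
	}
}

// mutate models the first write to pred arriving at the given group.
// The predicate is claimed lazily, and only if no group serves it yet.
// It returns the group that serves the predicate.
func mutate(t tabletMap, group int, pred string) int {
	return t.claim(pred, group)
}

func main() {
	// Bulk-loaded cluster: group 3 holds dgraph.type and keeps it,
	// even though group 1 started first and receives the mutation.
	loaded := tabletMap{}
	startAlpha(loaded, 1, []string{"name"})
	startAlpha(loaded, 3, []string{"dgraph.type"})
	fmt.Println("bulk-loaded cluster:", mutate(loaded, 1, "dgraph.type"))

	// Fresh cluster: nobody holds dgraph.type, so the first mutation claims it.
	fresh := tabletMap{}
	startAlpha(fresh, 1, nil)
	fmt.Println("fresh cluster:", mutate(fresh, 1, "dgraph.type"))
}
```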

Note - This problem could happen with any of the internal predicates defined in

initialSchema = append(initialSchema, &pb.SchemaUpdate{
when loading data using the bulk loader.

@martinmr

Member

commented Sep 13, 2019

I think the cleanest solution would be to force the bulk loader to allocate the reserved predicates to the first shard. But I don't have much insight into the bulk loader, so it might not be that easy.
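
Roughly, the idea could look like the sketch below. This is not how the bulk loader actually assigns predicates to shards: `isReserved` and `shardFor` are hypothetical names, the hash-based placement is only a placeholder for whatever the loader really does, and treating every predicate with the "dgraph." prefix as reserved is an assumption.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// isReserved reports whether pred looks like an internal predicate.
// Assumption: reserved predicates share the "dgraph." prefix.
func isReserved(pred string) bool {
	return strings.HasPrefix(pred, "dgraph.")
}

// shardFor picks an output shard for pred. Reserved predicates are pinned
// to shard 0, so the Alpha started on out/0/p also holds their data.
// Everything else is spread by a placeholder hash; numShards must be > 0.
func shardFor(pred string, numShards int) int {
	if isReserved(pred) {
		return 0
	}
	h := fnv.New32a()
	h.Write([]byte(pred))
	return int(h.Sum32() % uint32(numShards))
}

func main() {
	for _, pred := range []string{"dgraph.type", "name", "pid", "brand"} {
		fmt.Printf("%s -> out/%d/p\n", pred, shardFor(pred, 3))
	}
}
```

With this placement, the Alpha that proposes the initial schema first would also be the one holding the dgraph.type data, as long as that Alpha is the one started on the first shard.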
