
Dgraph live crashes when loading 21mil sample freebase data into a t2.medium cluster #4528

Closed
rahst12 opened this issue Jan 9, 2020 · 3 comments
Labels
area/import-export Issues related to data import and export.

Comments


rahst12 commented Jan 9, 2020

What version of Dgraph are you using?

1.1.1

Have you tried reproducing the issue with the latest release?

yes

What is the hardware spec (RAM, OS)?

t2.medium, centos

Steps to reproduce the issue (command/config used to run Dgraph).

  1. Install Multi-host dgraph with the following extra parameters:
docker-machine create --driver amazonec2 --amazonec2-instance-type t2.medium --amazonec2-ami ami-02eac2c0129f6376b --amazonec2-ssh-user centos aws01
docker-machine create --driver amazonec2 --amazonec2-instance-type t2.medium --amazonec2-ami ami-02eac2c0129f6376b --amazonec2-ssh-user centos aws02
docker-machine create --driver amazonec2 --amazonec2-instance-type t2.medium --amazonec2-ami ami-02eac2c0129f6376b --amazonec2-ssh-user centos aws03
  2. Setup dgraph
  3. Get data
wget "https://github.com/dgraph-io/benchmarks/blob/master/data/21million.rdf.gz?raw=true" -O 21million.rdf.gz -q
wget "https://raw.githubusercontent.com/dgraph-io/benchmarks/master/data/21million.schema" -O 21million.schema -q
  4. Run dgraph live:
dgraph live --files 21million.rdf.gz --alpha 172.31.86.65:9080,172.31.93.166:9080,172.31.87.103:9080 --zero 172.31.86.65:5080 --verbose -c 1 --schema 21million.schema
  5. Wait for error:
[06:33:15Z] Elapsed: 10m20s Txns: 4042 N-Quads: 4042000 N-Quads/s [last 5s]:  2800 Aborts: 0
[06:33:20Z] Elapsed: 10m25s Txns: 4049 N-Quads: 4049000 N-Quads/s [last 5s]:  1400 Aborts: 0
[06:33:25Z] Elapsed: 10m30s Txns: 4050 N-Quads: 4050000 N-Quads/s [last 5s]:   200 Aborts: 0
[06:33:30Z] Elapsed: 10m35s Txns: 4050 N-Quads: 4050000 N-Quads/s [last 5s]:     0 Aborts: 0
2020/01/09 06:33:30 transport is closing
github.com/dgraph-io/dgraph/x.Fatalf
        /tmp/go/src/github.com/dgraph-io/dgraph/x/error.go:101
github.com/dgraph-io/dgraph/dgraph/cmd/live.handleError
        /tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/batch.go:104
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).request
        /tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/batch.go:156
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).makeRequests
        /tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/batch.go:169
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1357
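
The "transport is closing" error means the loader lost its gRPC connection to an alpha mid-load; on a 4 GB t2.medium, a likely cause is the alpha process being killed by the kernel OOM killer. A minimal way to check memory headroom on each node while the load runs (a one-shot snapshot; run it under `watch` or in a loop, and note this is a diagnostic sketch, not part of the original report):

```shell
#!/bin/sh
# Snapshot of available memory, read from /proc/meminfo (present on the
# CentOS hosts above). Watch this during the load to see whether the
# alpha is approaching the 4 GB limit of a t2.medium.
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
echo "available: $((avail_kb / 1024)) MiB"
```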

Expected behaviour and actual result.

@MichelDiz (Contributor) commented:

Hey @rahst12, for a live load you should upgrade to at least a t2.xlarge (I recommend a t2.2xlarge for long loads), and once the load finishes you can downgrade back to t2.medium.

@sleto-it sleto-it added the area/import-export Issues related to data import and export. label Apr 9, 2020
@MichelDiz (Contributor) commented:

@rahst12 Try the new feature called ludicrous mode (eventually consistent writes), which gives much better load throughput. It hasn't been released yet, so use it sparingly, and disable it after you have loaded the dataset.

This mode is not recommended for financial systems.

When the next version is released, switch from the master tag to that release.

docker pull dgraph/dgraph:master
docker run -d -p 5080:5080 -p 6080:6080 -p 8080:8080 -p 9080:9080 -p 8000:8000 -v ~/dgraph:/dgraph --name dgraph dgraph/dgraph:master dgraph zero --ludicrous_mode
docker exec -d dgraph dgraph alpha --zero localhost:5080 --ludicrous_mode
docker exec -it dgraph sh

AND

curl --progress-bar -LS -o 21million.rdf.gz "https://github.com/dgraph-io/benchmarks/blob/master/data/release/21million.rdf.gz?raw=true"
curl --progress-bar -LS -o release.schema "https://github.com/dgraph-io/benchmarks/blob/master/data/release/release.schema?raw=true"

ls
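
Before running the load, it can also help to verify that the archive downloaded completely, since a truncated download produces a confusing parse error. `gunzip -t` validates a gzip file without extracting it; this sketch (not part of the original comment) demonstrates it on a throwaway file, so substitute 21million.rdf.gz:

```shell
#!/bin/sh
# gunzip -t checks gzip integrity without writing the decompressed output.
# Demonstrated on a throwaway file; point it at 21million.rdf.gz instead.
echo "sample" | gzip > /tmp/sample.rdf.gz
gunzip -t /tmp/sample.rdf.gz && echo "archive OK"
```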

Then

dgraph live -f 21million.rdf.gz -s release.schema --conc=100 --batch=10000

Decrease the conc and batch values according to your machine's resources (experiment with the parameters to find what works best).
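
One way to do that experimentation is a small sweep that starts low and steps up while you watch for aborts or crashes. This is a hypothetical sketch, not from the original comment: the value grids are arbitrary starting points, and the dgraph invocation is left commented out as a placeholder.

```shell
#!/bin/sh
# Hypothetical parameter sweep for live-loader tuning; the grids below
# are arbitrary starting points, not recommended values.
for conc in 10 25 50 100; do
  for batch in 1000 5000 10000; do
    echo "testing --conc=$conc --batch=$batch"
    # dgraph live -f 21million.rdf.gz -s release.schema --conc=$conc --batch=$batch
  done
done
```

Stop increasing once throughput plateaus or memory pressure appears; on small instances the lowest settings are often the only stable ones.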

@minhaj-shakeel (Contributor) commented:

GitHub issues have been deprecated.
This issue has been moved to discuss. You can follow the conversation there and also subscribe to updates by changing your notification preferences.

