
Alpha crashes with fault address in groupby.go #3642

Closed
jarifibrahim opened this issue Jul 9, 2019 · 6 comments

@jarifibrahim (Member) commented Jul 9, 2019

What version of Dgraph are you using?

727b246

Have you tried reproducing the issue with latest release?

Yes

What is the hardware spec (RAM, OS)?

N/A

Steps to reproduce the issue (command/config used to run Dgraph).

  1. Run zero using:
dgraph zero -w /home/ibrahim/m2_drive/flock_backup/zero1/zw -o 0 --idx=1 --my=localhost:5080 --replicas=1 --logtostderr -v=2 --bindall
  2. Run alpha using:
dgraph alpha -p /home/ibrahim/m2_drive/flock_backup/alpha1/p -w /home/ibrahim/m2_drive/flock_backup/alpha1/w -o 100 --my=localhost:7180 --lru_mb=1024 --zero=localhost:5080 --logtostderr -v=2
  3. Apply the following diff to flock agent's main.go:
diff --git a/main.go b/main.go
index c2cf6ec..91e8da0 100644
--- a/main.go
+++ b/main.go
@@ -52,8 +52,8 @@ id_str: string @index(exact) @upsert .
 created_at: dateTime @index(hour) .
 hashtags: [string] @index(exact) .
 
-author: uid @count @reverse .
-mention: uid @reverse .
+author: [uid] @count @reverse .
+mention: [uid] @reverse .
 `
 
        cDgraphTweetQuery = `
  4. Run flock agent:
go run main.go -a ":9180"
  5. Apply the following diff to flock client's main.go (without this change, some of the queries in flock were failing):
diff --git a/client/main.go b/client/main.go
index c1eab19..5913475 100644
--- a/client/main.go
+++ b/client/main.go
@@ -547,7 +547,7 @@ type querySix struct {
 
 func (q *querySix) getParams(dgr *dgo.Dgraph) error {
        // we subtract 41 hours because that's the latest data we get from twitter
-       curTime := time.Now().Add(-41 * time.Hour)
+       curTime := time.Now().Add(-100 * time.Hour)
 
        query := fmt.Sprintf(`
 {
@@ -619,7 +619,7 @@ type querySeven struct {
 
 func (q *querySeven) getParams(dgr *dgo.Dgraph) error {
        // we subtract 41 hours because that's the latest data we get from twitter
-       curTime := time.Now().Add(-41 * time.Hour)
+       curTime := time.Now().Add(-100 * time.Hour)
 
        query := fmt.Sprintf(`
 {
@@ -693,7 +693,7 @@ type queryEight struct {
 
 func (q *queryEight) runQuery(dgr *dgo.Dgraph) error {
        // we subtract 41 hours because that's the latest data we get from twitter
-       curTime := time.Now().Add(-41 * time.Hour)
+       curTime := time.Now().Add(-100 * time.Hour)
 
        query := fmt.Sprintf(`
 {
@@ -787,7 +787,7 @@ type queryNine struct {
 
 func (q *queryNine) getParams(dgr *dgo.Dgraph) error {
        // we subtract 41 hours because that's the latest data we get from twitter
-       curTime := time.Now().Add(-41 * time.Hour)
+       curTime := time.Now().Add(-100 * time.Hour)
 
        query := fmt.Sprintf(`
 {
  6. Run flock client:
go run main.go -a ":9180"

Expected behaviour and actual result.

After a few seconds, alpha crashes. Here's the stack trace from alpha: https://gist.github.com/jarifibrahim/d517efefd7f7c334a9bb50552041cc11

@martinmr (Member) commented Jul 12, 2019

@jarifibrahim Can you check whether the following change to groupby.go at line 225 helps?

Change from

ul := child.uidMatrix[i]

to

if i >= len(child.uidMatrix) {
  continue
}
ul := child.uidMatrix[i]

I think that would stop the crash, but I am not sure whether it's the right solution, or why uidMatrix does not have enough entries.
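The effect of the suggested guard can be sketched with a self-contained snippet. The `List` type below is a local stand-in for Dgraph's protobuf list type, and the loop bound is exaggerated to force an out-of-range index; this is only an illustration of the bounds check, not the real groupby.go code.

```go
package main

import "fmt"

// List is a minimal stand-in for Dgraph's pb.List (a list of UIDs).
type List struct {
	Uids []uint64
}

func main() {
	uidMatrix := []*List{{Uids: []uint64{1, 2}}} // only one row
	for i := 0; i < 3; i++ {                     // but three lookups
		if i >= len(uidMatrix) {
			continue // without this guard, uidMatrix[i] panics with index out of range
		}
		ul := uidMatrix[i]
		fmt.Println(ul.Uids) // prints [1 2] once
	}
}
```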

@jarifibrahim (Member, Author) commented Jul 15, 2019

@martinmr The change stopped the crash.

@martinmr (Member) commented Jul 15, 2019

OK. I'm still not sure of the root cause, but I'll open a PR with the fix. We have similar checks elsewhere in the codebase, so this should be fine.

@martinmr (Member) commented Jul 15, 2019

My bad: the crash is actually on the next line, while trying to access ul.Uids, so ul is nil. I'm guessing the error doesn't happen every time, which is why my original suggestion looked like it fixed the crash. For now, I'll open a PR that checks that ul is non-nil.
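The actual failure mode described above can be reproduced in miniature: the slice is long enough, but one of its entries is nil, so reading a field off that entry dereferences a nil pointer. The `List` type is again a local stand-in for the protobuf list type, and the nil check mirrors the shape of the eventual fix rather than the exact committed code.

```go
package main

import "fmt"

// List is a minimal stand-in for Dgraph's pb.List (a list of UIDs).
type List struct {
	Uids []uint64
}

func main() {
	uidMatrix := []*List{{Uids: []uint64{1}}, nil} // second row is nil
	for _, ul := range uidMatrix {
		if ul == nil {
			continue // without this check, ul.Uids below is a nil dereference
		}
		fmt.Println(ul.Uids) // prints [1] once
	}
}
```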

martinmr added a commit that referenced this issue Jul 15, 2019
For some reason, there are nil values stored in the uidMatrix, which
results in a segmentation fault when trying to access the Uids field.
The root cause of this issue is still unclear but this change adds an
additional check to prevent the segmentation fault.
@martinmr (Member) commented Jul 15, 2019

A fix has been submitted, so I'm closing this for now. Feel free to reopen if you see it again, but the original segfault should be gone.

@martinmr martinmr closed this Jul 15, 2019
dna2github added a commit to dna2fork/dgraph that referenced this issue Jul 19, 2019