Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-5380][GraphX] Solve an ArrayIndexOutOfBoundsException when build graph with a file format error #4176

Closed
wants to merge 2 commits into from

Conversation

Leolh
Copy link

@Leolh Leolh commented Jan 23, 2015

When I build a graph with a file format error, there will be an ArrayIndexOutOfBoundsException

…if the format of the source file is wrong

There will be an ArrayIndexOutOfBoundsException if the format of the source file is wrong
@Leolh Leolh changed the title [SPARK-3650][GraphX] There will be an ArrayIndexOutOfBoundsException if ... [SPARK-3650][GraphX] Solve an ArrayIndexOutOfBoundsException when build graph with a file Jan 23, 2015
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@Leolh Leolh changed the title [SPARK-3650][GraphX] Solve an ArrayIndexOutOfBoundsException when build graph with a file [SPARK-5380][GraphX] Solve an ArrayIndexOutOfBoundsException when build graph with a file Jan 23, 2015
@Leolh Leolh changed the title [SPARK-5380][GraphX] Solve an ArrayIndexOutOfBoundsException when build graph with a file [SPARK-5380][GraphX] Solve an ArrayIndexOutOfBoundsException when build graph with a file format error Jan 23, 2015
@srowen
Copy link
Member

srowen commented Jan 23, 2015

The argument I can see against it is simply that you want to fail fast on bad input, not sort of continue. If all of the file is like this you'll just log a zillion warnings instead of just stopping. How about changing the warning to throw a standard exception like IllegalArgumentException using require?

@maropu
Copy link
Member

maropu commented Feb 4, 2015

I wonder if stopping the process is the best solution.
If there is only one illegal entry in a last line, we need to re-try loading a whole file,
which is time-consuming.

An other idea is that illegal entries are silently re-directed into a file or something.
Finally, # of the re-directed entries is only output in the last phase of bulk-loading.
Moreover, I think that it'd be better to add a new API to append these entries into existing Graph
such as GraphOps.addEdges(val edges: RDD[Edge[ED]]).

@asfgit asfgit closed this in 575d2df Feb 6, 2015
@srowen
Copy link
Member

srowen commented Feb 6, 2015

Since this was a very simple change, I went ahead and merged it as my first test-drive of the commit bit. Looks like it worked!

@JoshRosen
Copy link
Contributor

I've also merged this into branch-1.3 (1.3.0).

asfgit pushed a commit that referenced this pull request Feb 6, 2015
…ld graph with a file format error

When I build a graph with a file format error, there will be an ArrayIndexOutOfBoundsException

Author: Leolh <leosandylh@gmail.com>

Closes #4176 from Leolh/patch-1 and squashes the following commits:

94f6d22 [Leolh] Update GraphLoader.scala
23767f1 [Leolh] [SPARK-3650][GraphX] There will be an ArrayIndexOutOfBoundsException if the format of the source file is wrong
@srowen
Copy link
Member

srowen commented Feb 6, 2015

Ah sounds like I should have done more back-porting. I was conservative and only put these several small fixes in master. Go ahead and I'll recalibrate a little accordingly.

@JoshRosen
Copy link
Contributor

@srowen From now until we cut a 1.3.0 preview, I'd merge all bug fixes into at least master and branch-1.3. We'll raise the bar for getting fixes into branch-1.3 as we get closer to the RC voting period.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants