twitter_classifier/Collect.scala: Pending TODO can be completed: SPARK-3390 was fixed in Spark 1.2.0. #50

Closed
MiguelPeralvo opened this issue Jan 4, 2015 · 0 comments · Fixed by #51

Line 42 of reference-apps/twitter_classifier/scala/src/main/scala/com/databricks/apps/twitter_classifier/Collect.scala can now be safely removed, because SPARK-3390 was fixed in pull request #2364 for Apache Spark 1.2.0.

If you use Spark 1.2.0, this is the code that can be removed:

.filter(!_.contains("boundingBoxCoordinates")) // TODO(vida): Remove this workaround when SPARK-3390 is fixed.

If you remove it under Spark 1.1.0, Collect.scala will still run, but ExamineAndTrain.scala will fail with a "scala.MatchError: StructType(List())" exception. The failure is caused by the "boundingBoxCoordinates" JSON entries, which Spark 1.1.0 doesn't handle properly.
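
For anyone who has to support both versions for a while, here is a minimal sketch of keeping the workaround only on Spark versions older than 1.2.0. It is not the code from Collect.scala: plain Scala collections stand in for the tweet stream, and the names `BoundingBoxFilter`, `needsWorkaround`, and `prepare` are hypothetical. It also assumes the version string starts with "major.minor".

```scala
object BoundingBoxFilter {
  // Hypothetical helper: true when the version is older than 1.2.0,
  // i.e. SPARK-3390 is not yet fixed. Assumes a "major.minor..." string.
  def needsWorkaround(sparkVersion: String): Boolean =
    sparkVersion.split("\\.").toList match {
      case major :: minor :: _ =>
        val maj = major.takeWhile(_.isDigit).toInt
        val min = minor.takeWhile(_.isDigit).toInt
        maj < 1 || (maj == 1 && min < 2)
      case _ =>
        // Unknown format: keep the workaround to stay on the safe side.
        true
    }

  // Drops JSON lines mentioning "boundingBoxCoordinates" only when needed.
  // Seq[String] is a stand-in for the stream of tweet JSON strings.
  def prepare(tweets: Seq[String], sparkVersion: String): Seq[String] =
    if (needsWorkaround(sparkVersion))
      tweets.filter(!_.contains("boundingBoxCoordinates"))
    else
      tweets

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      """{"text":"hello"}""",
      """{"text":"geo","boundingBoxCoordinates":[]}"""
    )
    println(prepare(sample, "1.1.0").size)
    println(prepare(sample, "1.2.0").size)
  }
}
```

If the project only targets Spark 1.2.0 or later, simply deleting the filter line is the simpler choice.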

MiguelPeralvo added a commit to MiguelPeralvo/reference-apps that referenced this issue Jan 4, 2015
Fixes [issue databricks#50: Pending TODO can be completed: SPARK-3390 was fixed] (databricks#50). I've tested it in Spark 1.1.0 and 1.2.0 and it works, as expected.
MiguelPeralvo changed the title from "twitter_classifier/Collect.scala: Pending TODO can be completed: SPARK-3390 was fixed." to "twitter_classifier/Collect.scala: Pending TODO can be completed: SPARK-3390 was fixed in Spark 1.2.0." on Jan 4, 2015