[SPARK-10289] [SQL] Fixes JSONRelation refreshing on read path #8460


liancheng
Contributor

PR #8035 added a refresh() call in JSONRelation.buildScan(). However, when reading a partitioned table, buildScan() is invoked once for each partition, so refresh() also runs once per partition. This PR works around the issue by special-casing JSONRelation in DataSourceStrategy. We should revisit this and deliver a more appropriate fix later.
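To make the per-partition behavior concrete, here is a minimal toy sketch. It does not use the real Spark classes: `Partition`, `ToyRelation`, and `ToyPlanner` are hypothetical stand-ins, and `refresh()` only increments a counter in place of whatever work the real refresh does. It contrasts refreshing inside the per-partition scan with refreshing once in the planning layer, which is roughly the shape of the workaround described above.

```scala
// Hypothetical stand-ins for illustration; none of these are Spark classes.
final case class Partition(path: String)

class ToyRelation(val partitions: Seq[Partition]) {
  // Counts how many times refresh() has been invoked.
  var refreshCount: Int = 0

  // Placeholder for a potentially expensive metadata refresh.
  def refresh(): Unit = refreshCount += 1

  // Scan a single partition without refreshing.
  def scan(p: Partition): Seq[String] = Seq(s"rows from ${p.path}")

  // Scan a single partition, refreshing first (the problematic shape).
  def buildScanWithRefresh(p: Partition): Seq[String] = {
    refresh()
    scan(p)
  }
}

object ToyPlanner {
  // The scan is built per partition, so refresh() runs once per partition.
  def planNaive(r: ToyRelation): Seq[String] =
    r.partitions.flatMap(p => r.buildScanWithRefresh(p))

  // Special-cased path: refresh once up front, then scan each partition.
  def planRefreshOnce(r: ToyRelation): Seq[String] = {
    r.refresh()
    r.partitions.flatMap(p => r.scan(p))
  }
}
```

With three partitions, `planNaive` leaves `refreshCount` at 3, while `planRefreshOnce` leaves it at 1; both return the same rows.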

@@ -562,7 +562,7 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio
})
}

-  private[sql] def buildScan(
+  final private[sql] def buildScan(
Contributor Author

This method was marked final from the very beginning and wasn't designed to be overridden by subclasses. The final modifier was removed in #8035 to work around the refreshing issue.
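For readers less familiar with the pattern, a generic sketch of a final entry point with a separate extension hook follows. This is an illustration only, not a claim about how HadoopFsRelation is structured: `ToyFsRelation`, `ToyJsonRelation`, and `readFiles` are hypothetical names.

```scala
// Hypothetical names for illustration; not the HadoopFsRelation API.
abstract class ToyFsRelation {
  // Final entry point: subclasses cannot override it, so the shared
  // filtering step below always runs before any format-specific reading.
  final def buildScan(paths: Seq[String]): Seq[String] = {
    val nonEmpty = paths.filter(_.nonEmpty)
    readFiles(nonEmpty)
  }

  // Extension point that each concrete relation implements.
  protected def readFiles(paths: Seq[String]): Seq[String]
}

class ToyJsonRelation extends ToyFsRelation {
  // Declaring `override def buildScan(...)` here would be rejected by the
  // compiler, because buildScan is final in the parent class.
  protected def readFiles(paths: Seq[String]): Seq[String] =
    paths.map(p => s"json:$p")
}
```

Dropping `final` from `buildScan` would let a subclass replace the whole method, which is the kind of override the comment above says the method was not designed for.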

@liancheng
Contributor Author

cc @yhuai

@SparkQA

SparkQA commented Aug 26, 2015

Test build #41624 has finished for PR 8460 at commit ac00522.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class GaussianMixtureModel @Since("1.3.0") (
    • class KMeansModel @Since("1.1.0") (@Since("1.0.0") val clusterCenters: Array[Vector])
    • class PowerIterationClusteringModel @Since("1.3.0") (
    • class StreamingKMeansModel @Since("1.2.0") (
    • class StreamingKMeans @Since("1.2.0") (
    • class ChiSqSelectorModel @Since("1.3.0") (
    • class ChiSqSelector @Since("1.3.0") (
    • class ElementwiseProduct @Since("1.4.0") (
    • class IDF @Since("1.2.0") (@Since("1.2.0") val minDocFreq: Int)
    • class Normalizer @Since("1.1.0") (p: Double) extends VectorTransformer
    • class PCA @Since("1.4.0") (@Since("1.4.0") val k: Int)
    • class StandardScaler @Since("1.1.0") (withMean: Boolean, withStd: Boolean) extends Logging
    • class StandardScalerModel @Since("1.3.0") (
    • class PoissonGenerator @Since("1.1.0") (
    • class ExponentialGenerator @Since("1.3.0") (
    • class GammaGenerator @Since("1.3.0") (
    • class LogNormalGenerator @Since("1.3.0") (
    • abstract class GeneralizedLinearModel @Since("1.0.0") (
    • class IsotonicRegressionModel @Since("1.3.0") (
    • case class LabeledPoint @Since("1.0.0") (
    • class LassoModel @Since("1.1.0") (
    • class LinearRegressionModel @Since("1.1.0") (
    • class RidgeRegressionModel @Since("1.1.0") (
    • class MultivariateGaussian @Since("1.3.0") (
    • case class BoostingStrategy @Since("1.4.0") (
    • class Strategy @Since("1.3.0") (
    • class DecisionTreeModel @Since("1.0.0") (
    • class Node @Since("1.2.0") (
    • class Predict @Since("1.2.0") (
    • class RandomForestModel @Since("1.2.0") (
    • class GradientBoostedTreesModel @Since("1.2.0") (

@liancheng
Contributor Author

#8469 supersedes this one, closing it.

@liancheng liancheng closed this Aug 26, 2015
@liancheng liancheng deleted the spark-10289/refresh-partitioned-json-relations branch August 26, 2015 23:54
2 participants