
[SPARK-28912][BRANCH-2.4] Fixed MatchError in getCheckpointFiles() #25719

Closed (wants to merge 1 commit)

@avkgh (Contributor) commented Sep 7, 2019

What changes were proposed in this pull request?

This change fixes issue SPARK-28912.

Why are the changes needed?

If the checkpoint directory is set to a name that matches the regex pattern used for checkpoint file names (for example, checkpoint-01), the logs are flooded with MatchError exceptions and old checkpoint files are not removed.
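The failure mode can be illustrated with a minimal, self-contained sketch. It assumes, for illustration only, that cleanup filters the directory listing with an unanchored regex search over the full path string and then sorts the survivors with a pattern match that requires the file name itself to match. The object CheckpointFilterSketch, its helpers, the `checkpoint-<millis>[suffix]` naming pattern, and the use of plain strings instead of Hadoop `Path` objects are all assumptions; this is not the Spark source or the patch in this PR.

```scala
import scala.util.matching.Regex

// Hypothetical sketch, not the Spark source: checkpoint files are assumed to be
// named "checkpoint-<millis>" with an optional suffix such as ".bk".
object CheckpointFilterSketch {
  val Prefix = "checkpoint-"
  val FileRegex: Regex = (Prefix + """(\d+)([\w.]*)""").r

  // Last component of a "/"-separated path (stand-in for Hadoop's Path.getName).
  def name(path: String): String = path.substring(path.lastIndexOf('/') + 1)

  // Regex extractor patterns are anchored, so this throws MatchError
  // unless the whole file name matches.
  def parse(path: String): (Long, Boolean) = name(path) match {
    case FileRegex(time, suffix) => (time.toLong, suffix.nonEmpty)
  }

  // Fragile: an unanchored search over the full path, so a directory named
  // like a checkpoint file ("checkpoint-01") lets every entry through.
  def fragileFilter(paths: Seq[String]): Seq[String] =
    paths.filter(p => FileRegex.findFirstIn(p).nonEmpty)

  // Safer: the file name itself must match the whole pattern.
  def safeFilter(paths: Seq[String]): Seq[String] =
    paths.filter(p => FileRegex.pattern.matcher(name(p)).matches())

  // Oldest first; at equal timestamps the backup file sorts before the main one.
  def sortCheckpoints(paths: Seq[String]): Seq[String] =
    paths.sortWith { (a, b) =>
      val (t1, bk1) = parse(a)
      val (t2, bk2) = parse(b)
      t1 < t2 || (t1 == t2 && bk1 && !bk2)
    }
}
```

With a directory named checkpoint-01, fragileFilter keeps every entry, including a non-checkpoint name such as receivedBlockMetadata, so sortCheckpoints throws MatchError. safeFilter keeps only the names that fully match the pattern, so sorting succeeds and the oldest files can be identified for removal.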

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manually.

  1. Start Hadoop in pseudo-distributed mode.

  2. In another terminal, run nc -lk 9999 to start a netcat listener on port 9999.

  3. In the Spark shell, execute the following statements:

    // StreamingContext and Seconds are not imported by default in spark-shell
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(30))
    ssc.checkpoint("hdfs://localhost:9000/checkpoint-01")
    val lines = ssc.socketTextStream("localhost", 9999)
    val words = lines.flatMap(_.split(" "))
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()

@HyukjinKwon (Member)

ok to test

@HyukjinKwon changed the title [SPARK-28912] Fixed MatchError in getCheckpointFiles() [SPARK-28912][BRANCH-2.4] Fixed MatchError in getCheckpointFiles() Sep 8, 2019
@SparkQA commented Sep 8, 2019

Test build #110291 has finished for PR 25719 at commit 1a55258.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Sep 9, 2019

Test build #4858 has finished for PR 25719 at commit 1a55258.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member)

Merged to branch-2.4.

HyukjinKwon pushed a commit that referenced this pull request Sep 9, 2019
### What changes were proposed in this pull request?

This change fixes issue SPARK-28912.

### Why are the changes needed?

If the checkpoint directory is set to a name that matches the regex pattern used for checkpoint file names (for example, `checkpoint-01`), the logs are flooded with `MatchError` exceptions and old checkpoint files are not removed.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually.

1. Start Hadoop in pseudo-distributed mode.

2. In another terminal, run `nc -lk 9999` to start a netcat listener on port 9999.

3. In the Spark shell, execute the following statements:

    ```scala
    // StreamingContext and Seconds are not imported by default in spark-shell
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(30))
    ssc.checkpoint("hdfs://localhost:9000/checkpoint-01")
    val lines = ssc.socketTextStream("localhost", 9999)
    val words = lines.flatMap(_.split(" "))
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
    ```

Closes #25719 from avkgh/SPARK-28912-branch-2.4.

Authored-by: avk <nullp7r@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
@srowen closed this Sep 9, 2019
scunniff pushed a commit to scunniff/nomad-spark that referenced this pull request Nov 10, 2020