Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-22377][BUILD] Use /usr/sbin/lsof if lsof does not exists in release-build.sh #19695

Closed
wants to merge 1 commit into from

Conversation

HyukjinKwon
Copy link
Member

What changes were proposed in this pull request?

This PR proposes to use /usr/sbin/lsof if lsof is missing in the path to fix nightly snapshot jenkins jobs. Please refer #19359 (comment):

Looks like some of the snapshot builds are having lsof issues:

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.1-maven-snapshots/182/console

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.2-maven-snapshots/134/console

spark-build/dev/create-release/release-build.sh: line 344: lsof: command not found
usage: kill [ -s signal | -p ] [ -a ] pid ...
kill -l [ signal ]

Up to my knowledge, the full path of lsof is required for non-root user in few OSs.

How was this patch tested?

Manually tested as below:

#!/usr/bin/env bash

LSOF=lsof
if ! hash $LSOF 2>/dev/null; then
  echo "a"
  LSOF=/usr/sbin/lsof
fi

$LSOF -P | grep "a"

@HyukjinKwon
Copy link
Member Author

cc @holdenk and @xynny, mind taking a look please?

@@ -130,6 +130,10 @@ else
fi
fi

LSOF=lsof
if ! hash $LSOF 2>/dev/null; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know much about this but is command better? https://stackoverflow.com/questions/592620/check-if-a-program-exists-from-a-bash-script not sure what the scripts here usually do (not at my keyboard)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure .. but I found hash is used in few places more than command within Spark:

grep -r "hash" . | grep ">/dev/null"
./dev/run-pip-tests:if hash virtualenv 2>/dev/null && [ ! -n "$USE_CONDA" ]; then
./dev/run-pip-tests:  if hash python2 2>/dev/null; then
./dev/run-pip-tests:  elif hash python 2>/dev/null; then
./dev/run-pip-tests:  if hash python3 2>/dev/null; then
./dev/run-pip-tests:elif hash conda 2>/dev/null; then
./dev/run-pip-tests:if ! hash pip 2>/dev/null; then
./sql/create-docs.sh:if ! hash python 2>/dev/null; then
./sql/create-docs.sh:if ! hash mkdocs 2>/dev/null; then

Looks:

grep -r "command" . | grep ">/dev/null"

not used in Spark but .. I am not sure ..

Copy link
Member Author

@HyukjinKwon HyukjinKwon Nov 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally use hash more somehow without a specific reason. If anyone prefers command specifically, I can change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m used to the hash check now as well, but that’s probably from editing Spark shell scripts more than any particular good practice.

Copy link
Member

@viirya viirya Nov 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If lsof's full path can be different in different OSes as @kiszk's comment #19695 (comment), seems command -v can help return the full path of lsof?

@SparkQA
Copy link

SparkQA commented Nov 8, 2017

Test build #83597 has finished for PR 19695 at commit c820237.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member

kiszk commented Nov 8, 2017

I realized that some OSs (e.g. Ubuntu) put lsof into /usr/bin and other OSs (e.g. CentOS) put lsof into /usr/sbin.
Could you apply this to dev/run-tests.py, too?

--- CentOS box ---
$ cat /etc/centos-release 
CentOS release 6.6 (Final)
$ ls /usr/bin/lsof /usr/sbin/lsof
ls: cannot access /usr/bin/lsof: No such file or directory
/usr/sbin/lsof
--- Ubuntu box ---
$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"
$ ls /usr/bin/lsof /usr/sbin/lsof
ls: cannot access '/usr/sbin/lsof': No such file or directory
/usr/bin/lsof

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Nov 9, 2017

Ah, for #19695 (comment) and #19695 (comment), actually the problem here is not the different path for lsof IIUC but lsof is not in the path somehow in few OSes for non-root users. I take @holdenk's words in #19359, it should be in the path to run this script. However, the problem is, it looks being failed specifically in few machines of Jenkins, which I assume CentOS 6.5.

So, yea, this is a band-aid fix to avoid the failures actually. Probably, I can add some comments here.

FWIW, for different paths, I just double checked they are different too (thanks for testing this out @kiszk):

Ubuntu 14.04 LTS

$ which lsof
/usr/bin/lsof

CentOS 7.3.1611

$ which lsof
/usr/sbin/lsof

@SparkQA
Copy link

SparkQA commented Nov 9, 2017

Test build #83625 has finished for PR 19695 at commit 0dcc12b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 9, 2017

Test build #83629 has finished for PR 19695 at commit a6642fa.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk
Copy link
Member

kiszk commented Nov 9, 2017

Jenkins, retest this please

@SparkQA
Copy link

SparkQA commented Nov 9, 2017

Test build #83638 has finished for PR 19695 at commit a6642fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xynny
Copy link

xynny commented Nov 11, 2017

The script looks fine to me, but I think it's just still a problem that all the jenkins slaves are actually running different OSes and the tests/builds aren't consistently run.

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Nov 12, 2017

Let's go ahead with this first and make a followup if another problem can be found, if you guys here wouldn't mind. I guess there wouldn't be a problem for now because it was hardcoded /usr/sbin/lsof before anyway.

@HyukjinKwon
Copy link
Member Author

Just rebased and fixed the comments a little bit more. Let me merge this one once it passes the tests.

@holdenk
Copy link
Contributor

holdenk commented Nov 13, 2017

LGTM pending jenkins.

@SparkQA
Copy link

SparkQA commented Nov 13, 2017

Test build #83765 has finished for PR 19695 at commit 5a513ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

@felixcheung, I am about to merge this one to master, branch-2.2 and branch-2.1. Would it be all okay to your release preparation?

@felixcheung
Copy link
Member

felixcheung commented Nov 13, 2017 via email

asfgit pushed a commit that referenced this pull request Nov 13, 2017
…lease-build.sh

## What changes were proposed in this pull request?

This PR proposes to use `/usr/sbin/lsof` if `lsof` is missing in the path to fix nightly snapshot jenkins jobs. Please refer #19359 (comment):

> Looks like some of the snapshot builds are having lsof issues:
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.1-maven-snapshots/182/console
>
>https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.2-maven-snapshots/134/console
>
>spark-build/dev/create-release/release-build.sh: line 344: lsof: command not found
>usage: kill [ -s signal | -p ] [ -a ] pid ...
>kill -l [ signal ]

Up to my knowledge,  the full path of `lsof` is required for non-root user in few OSs.

## How was this patch tested?

Manually tested as below:

```bash
#!/usr/bin/env bash

LSOF=lsof
if ! hash $LSOF 2>/dev/null; then
  echo "a"
  LSOF=/usr/sbin/lsof
fi

$LSOF -P | grep "a"
```

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #19695 from HyukjinKwon/SPARK-22377.

(cherry picked from commit c8b7f97)
Signed-off-by: hyukjinkwon <gurwls223@gmail.com>
asfgit pushed a commit that referenced this pull request Nov 13, 2017
…lease-build.sh

## What changes were proposed in this pull request?

This PR proposes to use `/usr/sbin/lsof` if `lsof` is missing in the path to fix nightly snapshot jenkins jobs. Please refer #19359 (comment):

> Looks like some of the snapshot builds are having lsof issues:
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.1-maven-snapshots/182/console
>
>https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.2-maven-snapshots/134/console
>
>spark-build/dev/create-release/release-build.sh: line 344: lsof: command not found
>usage: kill [ -s signal | -p ] [ -a ] pid ...
>kill -l [ signal ]

Up to my knowledge,  the full path of `lsof` is required for non-root user in few OSs.

## How was this patch tested?

Manually tested as below:

```bash
#!/usr/bin/env bash

LSOF=lsof
if ! hash $LSOF 2>/dev/null; then
  echo "a"
  LSOF=/usr/sbin/lsof
fi

$LSOF -P | grep "a"
```

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #19695 from HyukjinKwon/SPARK-22377.

(cherry picked from commit c8b7f97)
Signed-off-by: hyukjinkwon <gurwls223@gmail.com>
@HyukjinKwon
Copy link
Member Author

Merged to master, branch-2.2 and branch-2.1.

@HyukjinKwon
Copy link
Member Author

Let me keep my eyes on the builds.

@asfgit asfgit closed this in c8b7f97 Nov 13, 2017
@HyukjinKwon
Copy link
Member Author

Oh BTW thank you all for your review / comments, @srowen, @kiszk, @viirya, @holdenk, @xynny and @felixcheung.

ghost pushed a commit to dbtsai/spark that referenced this pull request Dec 18, 2017
## What changes were proposed in this pull request?

In [the environment where `/usr/sbin/lsof` does not exist](apache#19695 (comment)), `./dev/run-tests.py` for `maven` causes the following error. This is because the current `./dev/run-tests.py` checks existence of only `/usr/sbin/lsof` and aborts immediately if it does not exist.

This PR changes to check whether `lsof` or `/usr/sbin/lsof` exists.

```
/bin/sh: 1: /usr/sbin/lsof: not found

Usage:
 kill [options] <pid> [...]

Options:
 <pid> [...]            send signal to every <pid> listed
 -<signal>, -s, --signal <signal>
                        specify the <signal> to be sent
 -l, --list=[<signal>]  list all signal names, or convert one to a name
 -L, --table            list all signal names in a nice table

 -h, --help     display this help and exit
 -V, --version  output version information and exit

For more details see kill(1).
Traceback (most recent call last):
  File "./dev/run-tests.py", line 626, in <module>
    main()
  File "./dev/run-tests.py", line 597, in main
    build_apache_spark(build_tool, hadoop_version)
  File "./dev/run-tests.py", line 389, in build_apache_spark
    build_spark_maven(hadoop_version)
  File "./dev/run-tests.py", line 329, in build_spark_maven
    exec_maven(profiles_and_goals)
  File "./dev/run-tests.py", line 270, in exec_maven
    kill_zinc_on_port(zinc_port)
  File "./dev/run-tests.py", line 258, in kill_zinc_on_port
    subprocess.check_call(cmd, shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/usr/sbin/lsof -P |grep 3156 | grep LISTEN | awk '{ print $2; }' | xargs kill' returned non-zero exit status 123
```

## How was this patch tested?

manually tested

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes apache#19998 from kiszk/SPARK-22813.
@HyukjinKwon HyukjinKwon deleted the SPARK-22377 branch January 2, 2018 03:37
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
…lease-build.sh

## What changes were proposed in this pull request?

This PR proposes to use `/usr/sbin/lsof` if `lsof` is missing in the path to fix nightly snapshot jenkins jobs. Please refer apache#19359 (comment):

> Looks like some of the snapshot builds are having lsof issues:
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.1-maven-snapshots/182/console
>
>https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.2-maven-snapshots/134/console
>
>spark-build/dev/create-release/release-build.sh: line 344: lsof: command not found
>usage: kill [ -s signal | -p ] [ -a ] pid ...
>kill -l [ signal ]

Up to my knowledge,  the full path of `lsof` is required for non-root user in few OSs.

## How was this patch tested?

Manually tested as below:

```bash
#!/usr/bin/env bash

LSOF=lsof
if ! hash $LSOF 2>/dev/null; then
  echo "a"
  LSOF=/usr/sbin/lsof
fi

$LSOF -P | grep "a"
```

Author: hyukjinkwon <gurwls223@gmail.com>

Closes apache#19695 from HyukjinKwon/SPARK-22377.

(cherry picked from commit c8b7f97)
Signed-off-by: hyukjinkwon <gurwls223@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
8 participants