
WIP Python API #1387

Closed

fnothaft wants to merge 2 commits from fnothaft:issues/538-python

Conversation

@fnothaft
Member

@fnothaft fnothaft commented Feb 10, 2017

Resolves #538. Still a bit to be done:

  • Accessing the RDD contents from Python still needs a bit of work, though the core GenomicRDD APIs work.
  • I haven't added the region join or pipe APIs yet, because they do funny business with ClassTags/implicits. The ClassTag bit will be easy to work around; it just needs to be done.
  • ./scripts/jenkins-test is almost definitely going to need an update
  • Right now, this is on top of #1386, which has since merged. This PR needs a rebase.
  • Bump coverage of the JavaADAMContext, which has some ignored unit tests.

Figured it'd be good to get some early comments.

@fnothaft fnothaft requested a review from laserson Feb 10, 2017
@AmplabJenkins

@AmplabJenkins AmplabJenkins commented Feb 10, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1782/

@laserson
Contributor

@laserson laserson commented Feb 10, 2017

It'll take me a bit to get to this. This is also a pretty large patch...would there be a logical way to subdivide it at all?

@laserson
Contributor

@laserson laserson commented Feb 10, 2017

Also, we don't have too much Python code from the past, but I'd be a pretty strong proponent of making PEP8 our official style, with possible exceptions if we want.

@fnothaft
Member Author

@fnothaft fnothaft commented Feb 11, 2017

> It'll take me a bit to get to this. This is also a pretty large patch...would there be a logical way to subdivide it at all?

Let me rebase; I think that will drastically drop the number of files touched.

> Also, we don't have too much Python code from the past, but I'd be a pretty strong proponent of making PEP8 our official style, with possible exceptions if we want.

+1. I believe this is all PEP8-compliant, but I haven't linted it.
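(For reference, a lightweight way to check would be something along these lines; flake8 is used here purely as an illustration and is not part of the build:)

pip install flake8
flake8 adam-python/src/bdgenomics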

@andypetrella
Contributor

@andypetrella andypetrella commented Feb 11, 2017

Cool, check this one @0asa ;)

@fnothaft fnothaft force-pushed the fnothaft:issues/538-python branch from 2058469 to b72c360 Feb 14, 2017
@AmplabJenkins

@AmplabJenkins AmplabJenkins commented Feb 14, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1786/

@fnothaft fnothaft force-pushed the fnothaft:issues/538-python branch from b72c360 to ad8d0e2 Feb 15, 2017
@fnothaft
Member Author

@fnothaft fnothaft commented Feb 15, 2017

Slimmed this down a bit. Watch for another PR coming momentarily...

@fnothaft fnothaft mentioned this pull request Feb 15, 2017
@AmplabJenkins

@AmplabJenkins AmplabJenkins commented Feb 15, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1787/

Build result: FAILURE

[...truncated 3 lines...]
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
Wiping out workspace first.
Cloning the remote Git repository https://github.com/bigdatagenomics/adam.git
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
Checking out Revision 0b0f85d989ceb8f8fd48bafaf7d834a4d9dc4806 (origin/pr/1387/merge)
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'

@fnothaft
Member Author

@fnothaft fnothaft commented Feb 16, 2017

This is failing in the Python part of the build as we don't have virtualenv installed on @AmplabJenkins. I will be following up with @shaneknapp.

@shaneknapp
Contributor

@shaneknapp shaneknapp commented Feb 16, 2017

@fnothaft
Member Author

@fnothaft fnothaft commented Feb 16, 2017

Thanks for the quick reply @shaneknapp! I had been assuming a virtualenv but it should be a trivial change to use conda instead. Let me give it a looksee tonight.

@shaneknapp
Contributor

@shaneknapp shaneknapp commented Feb 16, 2017

@fnothaft fnothaft force-pushed the fnothaft:issues/538-python branch from ad8d0e2 to cc090a1 Feb 16, 2017
@AmplabJenkins

@AmplabJenkins AmplabJenkins commented Feb 16, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1789/

Build result: FAILURE

[...truncated 3 lines...]
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
Wiping out workspace first.
Cloning the remote Git repository https://github.com/bigdatagenomics/adam.git
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
Checking out Revision 9e34eb6015c1fd36addadeb5fc0f1fec3d6a5b15 (origin/pr/1387/merge)
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'

@fnothaft fnothaft force-pushed the fnothaft:issues/538-python branch from cc090a1 to bc96724 Feb 16, 2017
@AmplabJenkins

@AmplabJenkins AmplabJenkins commented Feb 16, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1790/

Build result: FAILURE

[...truncated 3 lines...]
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
Wiping out workspace first.
Cloning the remote Git repository https://github.com/bigdatagenomics/adam.git
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
Checking out Revision 76b3ee7a54ba77a257fd5cd600a5a1c95958e518 (origin/pr/1387/merge)
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'

@shaneknapp
Contributor

@shaneknapp shaneknapp commented Feb 16, 2017

here's some quick bash to fix this failure:

test -d /home/jenkins/.conda/envs/adam-build/ || conda create -n adam-build python=2.7 anaconda

@fnothaft fnothaft force-pushed the fnothaft:issues/538-python branch from bc96724 to 862ff44 Feb 16, 2017
@fnothaft
Member Author

@fnothaft fnothaft commented Feb 16, 2017

Jenkins, test this please.

@AmplabJenkins

@AmplabJenkins AmplabJenkins commented Feb 16, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1791/

Build result: FAILURE

[...truncated 3 lines...]
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
Wiping out workspace first.
Cloning the remote Git repository https://github.com/bigdatagenomics/adam.git
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
Checking out Revision 624a670392588cb9de95103e1f3c16ff90a59d0b (origin/pr/1387/merge)
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'

@shaneknapp
Contributor

@shaneknapp shaneknapp commented Feb 16, 2017

hmm, looks like there's a conda deactivate called before you call pip2.7.

@fnothaft fnothaft force-pushed the fnothaft:issues/538-python branch from 862ff44 to afa0cc0 Feb 16, 2017
@shaneknapp
Contributor

@shaneknapp shaneknapp commented Feb 16, 2017

after watching your build log scream by, i'd also recommend adding the "-q" flag to your conda create env call. this will suppress all of the download status bars, etc.
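(With that flag, the one-liner above would become something like:)

test -d /home/jenkins/.conda/envs/adam-build/ || conda create -q -n adam-build python=2.7 anaconda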

@AmplabJenkins

@AmplabJenkins AmplabJenkins commented Feb 16, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1792/

Build result: FAILURE

[...truncated 3 lines...]
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
Wiping out workspace first.
Cloning the remote Git repository https://github.com/bigdatagenomics/adam.git
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
Checking out Revision a31a5019257be7f4dabacd4fa0abd24d05d1fd4d (origin/pr/1387/merge)
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'

@fnothaft fnothaft force-pushed the fnothaft:issues/538-python branch from afa0cc0 to 9843fc4 Feb 16, 2017
@fnothaft
Member Author

@fnothaft fnothaft commented Feb 25, 2017

Hi @jpdna! A few things:

  • Did you run mvn package? You'll need to recompile to pick up some of the Java-friendly method changes in this PR. (The initial ADAMContext required one of these changes.)
  • Also, you need to set a few things on the PySpark command line / as environment variables. Look in ./scripts/jenkins-test for details.
@jpdna
Member

@jpdna jpdna commented Feb 25, 2017

OK, so I did mvn package and then reinstalled the adam-python module.
As for the env vars, inspired by ./scripts/jenkins-test, I have tried so far:

export PYTHONPATH=/home/paschalj/Spark/v1/spark-1.6.3-bin-hadoop2.6/python/lib/py4j-0.9-src.zip:${PYTHONPATH}

export PYSPARK_SUBMIT_ARGS="--jars /home/paschalj/frank_adam/python/adam/adam-assembly/target/adam_2.10-0.21.1-SNAPSHOT.jar --driver-class-path /home/paschalj/frank_adam/python/adam/adam-assembly/target/adam_2.10-0.21.1-SNAPSHOT.jar pyspark-shell"

But I am still getting the same TypeError: 'JavaPackage' object is not callable from ac = ADAMContext(sc) in the pyspark REPL, as described above.
Do those two exports seem correct? And what else am I missing?

I am using a virtualenv and not anaconda.

@fnothaft
Member Author

@fnothaft fnothaft commented Feb 25, 2017

It should work with a virtualenv; that's what I've been using locally. Do the tests pass for you when you run make test in adam-python? (Running mvn package -P python will run these.)
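(For reference, the two ways to invoke the Python tests mentioned above would be roughly:)

cd adam-python && make test    # run the pytest suite directly
mvn package -P python          # or run the same tests through the Maven python profile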

@jpdna
Member

@jpdna jpdna commented Feb 26, 2017

Ah - yeah, the Python build tests fail for me.
From a fresh clone/checkout of this branch (issues/538-python) I run:

mvn package -P python

and it fails.
I'll email you the console output with errors.
Before, I had just run mvn package, without noticing whether the Python module got compiled or its tests passed.

So, assuming a fresh git checkout, what steps need to be done to properly compile/test the adam-python module?

@fnothaft
Member Author

@fnothaft fnothaft commented Feb 26, 2017

I saw your email; I'll look more later. What version of Spark are you running?

@jpdna
Member

@jpdna jpdna commented Feb 26, 2017

> I saw your email; I'll look more later. What version of Spark are you running?

Spark 1.6.3

@fnothaft
Member Author

@fnothaft fnothaft commented Feb 26, 2017

> I saw your email; I'll look more later. What version of Spark are you running?

> Spark 1.6.3

OK, that's what I'm running locally as well...

@jpdna
Member

@jpdna jpdna commented Feb 27, 2017

I tried from scratch again on a different machine, with the commands below, but got the same errors during compile/test as I sent earlier. Note: I tried running make develop in the adam-python dir first, but it didn't seem to help; I'm not sure when/if that needs to be run.

source venv/bin/activate
git clone https://github.com/fnothaft/adam.git
cd adam
git checkout issues/538-python
export SPARK_HOME=/home/paschallj/Spark/1.6.3/spark-1.6.3-bin-hadoop2.6
export PYTHONPATH=/home/paschallj/Spark/1.6.3/spark-1.6.3-bin-hadoop2.6/python/lib/py4j-0.9-src.zip:${PYTHONPATH}
pip install pytest
mvn package -P python

Error like below from mvn package -P python:

[INFO] ------------------------------------------------------------------------
[INFO] Building ADAM_2.10: Python APIs 0.21.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-enforcer-plugin:1.0:enforce (enforce-versions) @ adam-python_2.10 ---
[INFO] 
[INFO] --- maven-enforcer-plugin:1.0:enforce (enforce-maven) @ adam-python_2.10 ---
[INFO] 
[INFO] --- scalariform-maven-plugin:0.1.4:format (default-cli) @ adam-python_2.10 ---
[INFO] Modified 0 of 0 .scala files
[INFO] 
[INFO] --- maven-resources-plugin:3.0.1:resources (default-resources) @ adam-python_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/paschallj/adam_python/v2/adam/adam-python/src/main/resources
[INFO] 
[INFO] --- scala-maven-plugin:3.2.2:compile (scala-compile-first) @ adam-python_2.10 ---
[INFO] No sources to compile
[INFO] 
[INFO] --- exec-maven-plugin:1.5.0:exec (dev-python) @ adam-python_2.10 ---
pip install -e .
Obtaining file:///home/paschallj/adam_python/v2/adam/adam-python
Installing collected packages: bdgenomics.adam
  Found existing installation: bdgenomics.adam 0.21.1-SNAPSHOT
    Uninstalling bdgenomics.adam-0.21.1-SNAPSHOT:
      Successfully uninstalled bdgenomics.adam-0.21.1-SNAPSHOT
  Running setup.py develop for bdgenomics.adam
Successfully installed bdgenomics.adam
[INFO] 
[INFO] --- maven-compiler-plugin:3.5.1:compile (default-compile) @ adam-python_2.10 ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-resources-plugin:3.0.1:testResources (default-testResources) @ adam-python_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/paschallj/adam_python/v2/adam/adam-python/src/test/resources
[INFO] 
[INFO] --- scala-maven-plugin:3.2.2:testCompile (scala-test-compile-first) @ adam-python_2.10 ---
[INFO] No sources to compile
[INFO] 
[INFO] --- exec-maven-plugin:1.5.0:exec (test-python) @ adam-python_2.10 ---
mkdir -p target
python2.7 -m pytest -vv --junitxml target/pytest-report.xml src
============================= test session starts ==============================
platform linux2 -- Python 2.7.12, pytest-3.0.6, py-1.4.32, pluggy-0.4.0 -- /home/paschallj/adam_python/v2/venv/bin/python2.7
cachedir: .cache
rootdir: /home/paschallj/adam_python/v2/adam/adam-python, inifile: 
collecting ... collected 0 items / 5 errors

 generated xml file: /home/paschallj/adam_python/v2/adam/adam-python/target/pytest-report.xml 
==================================== ERRORS ====================================
________ ERROR collecting src/bdgenomics/adam/test/adamContext_test.py _________
ImportError while importing test module '/home/paschallj/adam_python/v2/adam/adam-python/src/bdgenomics/adam/test/adamContext_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../venv/local/lib/python2.7/site-packages/_pytest/python.py:418: in _importtestmodule
    mod = self.fspath.pyimport(ensuresyspath=importmode)
../../venv/local/lib/python2.7/site-packages/py/_path/local.py:662: in pyimport
    __import__(modname)
src/bdgenomics/adam/test/__init__.py:26: in <module>
    from pyspark.context import SparkContext
E   ImportError: No module named pyspark.context
_____ ERROR collecting src/bdgenomics/adam/test/alignmentRecordRdd_test.py _____
ImportError while importing test module '/home/paschallj/adam_python/v2/adam/adam-python/src/bdgenomics/adam/test/alignmentRecordRdd_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../venv/local/lib/python2.7/site-packages/_pytest/python.py:418: in _importtestmodule
    mod = self.fspath.pyimport(ensuresyspath=importmode)
../../venv/local/lib/python2.7/site-packages/py/_path/local.py:662: in pyimport
    __import__(modname)
src/bdgenomics/adam/test/__init__.py:26: in <module>
    from pyspark.context import SparkContext
E   ImportError: No module named pyspark.context
_________ ERROR collecting src/bdgenomics/adam/test/featureRdd_test.py _________
ImportError while importing test module '/home/paschallj/adam_python/v2/adam/adam-python/src/bdgenomics/adam/test/featureRdd_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../venv/local/lib/python2.7/site-packages/_pytest/python.py:418: in _importtestmodule
    mod = self.fspath.pyimport(ensuresyspath=importmode)
../../venv/local/lib/python2.7/site-packages/py/_path/local.py:662: in pyimport
    __import__(modname)
src/bdgenomics/adam/test/__init__.py:26: in <module>
    from pyspark.context import SparkContext
E   ImportError: No module named pyspark.context
________ ERROR collecting src/bdgenomics/adam/test/genotypeRdd_test.py _________
ImportError while importing test module '/home/paschallj/adam_python/v2/adam/adam-python/src/bdgenomics/adam/test/genotypeRdd_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../venv/local/lib/python2.7/site-packages/_pytest/python.py:418: in _importtestmodule
    mod = self.fspath.pyimport(ensuresyspath=importmode)
../../venv/local/lib/python2.7/site-packages/py/_path/local.py:662: in pyimport
    __import__(modname)
src/bdgenomics/adam/test/__init__.py:26: in <module>
    from pyspark.context import SparkContext
E   ImportError: No module named pyspark.context
_________ ERROR collecting src/bdgenomics/adam/test/variantRdd_test.py _________
ImportError while importing test module '/home/paschallj/adam_python/v2/adam/adam-python/src/bdgenomics/adam/test/variantRdd_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../venv/local/lib/python2.7/site-packages/_pytest/python.py:418: in _importtestmodule
    mod = self.fspath.pyimport(ensuresyspath=importmode)
../../venv/local/lib/python2.7/site-packages/py/_path/local.py:662: in pyimport
    __import__(modname)
src/bdgenomics/adam/test/__init__.py:26: in <module>
    from pyspark.context import SparkContext
E   ImportError: No module named pyspark.context
!!!!!!!!!!!!!!!!!!! Interrupted: 5 errors during collection !!!!!!!!!!!!!!!!!!!!
=========================== 5 error in 0.28 seconds ============================
Makefile:82: recipe for target 'test' failed
make: *** [test] Error 2
[ERROR] Command execution failed.
org.apache.commons.exec.ExecuteException: Process exited with an error: 2 (Exit value: 2)
	at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
	at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:166)
	at org.codehaus.mojo.exec.ExecMojo.executeCommandLine(ExecMojo.java:764)
	at org.codehaus.mojo.exec.ExecMojo.executeCommandLine(ExecMojo.java:711)
	at org.codehaus.mojo.exec.ExecMojo.execute(ExecMojo.java:289)
	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:207)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:863)
	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
	at org.apache.maven.cli.MavenCli.main(MavenCli.java:199)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] ADAM_2.10 .......................................... SUCCESS [  3.782 s]
[INFO] ADAM_2.10: Core .................................... SUCCESS [  5.201 s]
[INFO] ADAM_2.10: APIs for Java ........................... SUCCESS [  0.354 s]
[INFO] ADAM_2.10: CLI ..................................... SUCCESS [  1.197 s]
[INFO] ADAM_2.10: Assembly ................................ SUCCESS [  4.930 s]
[INFO] ADAM_2.10: Python APIs ............................. FAILURE [  1.578 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 17.204 s
[INFO] Finished at: 2017-02-27T08:16:10-05:00
[INFO] Final Memory: 224M/1313M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.5.0:exec (test-python) on project adam-python_2.10: Command execution failed. Process exited with an error: 2 (Exit value: 2) -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :adam-python_2.10

@fnothaft
Member Author

@fnothaft fnothaft commented Feb 27, 2017

@jpdna If you open a python interpreter and run from pyspark.context import SparkContext, what happens?
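(e.g., a quick check from the shell, assuming the same virtualenv is active, would be something like:)

python -c "from pyspark.context import SparkContext"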

@jpdna
Member

@jpdna jpdna commented Feb 27, 2017

Running

from pyspark.context import SparkContext

I get:

ImportError: No module named pyspark.context

I thought the way to use pyspark was mostly via ./bin/pyspark.
If it's needed, how do I go about installing pyspark in my venv or adding it to my Python path?

@fnothaft
Member Author

@fnothaft fnothaft commented Feb 27, 2017

@jpdna pyspark needs to be on your PYTHONPATH. Can you echo $PYTHONPATH and post here?
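(For a Spark 1.6.x binary distribution, that typically means putting the py4j and pyspark zips under $SPARK_HOME/python/lib on the path, e.g., assuming SPARK_HOME is already exported:)

export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$SPARK_HOME/python/lib/pyspark.zip:$PYTHONPATH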

@jpdna
Member

@jpdna jpdna commented Feb 27, 2017

Earlier

PYTHONPATH=/home/paschallj/Spark/1.6.3/spark-1.6.3-bin-hadoop2.6/python/lib/py4j-0.9-src.zip

I tried adding pyspark so now:

echo $PYTHONPATH
/home/paschallj/Spark/1.6.3/spark-1.6.3-bin-hadoop2.6/python/lib/py4j-0.9-src.zip:/home/paschallj/Spark/1.6.3/spark-1.6.3-bin-hadoop2.6/python/lib/pyspark.zip

So with that, from pyspark.context import SparkContext now works in a normal python REPL.

However, in the same venv and the same terminal session, running

mvn package -P python -DskipTests

still fails, with somewhat different verbose error messages that I will email to you.

@fnothaft
Member Author

@fnothaft fnothaft commented Feb 27, 2017

> still fails, with somewhat different verbose error messages that I will email to you.

Can you post them here so I don't have to cross-reference email threads? Thanks!

@jpdna
Member

@jpdna jpdna commented Feb 27, 2017

Sure - is there a way to force a vertical scroll bar on a code block? I don't like having to make the page here so yuge.

------------------------------------------------------------------------
[INFO] Building ADAM_2.10: Python APIs 0.21.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-enforcer-plugin:1.0:enforce (enforce-versions) @ adam-python_2.10 ---
[INFO] 
[INFO] --- maven-enforcer-plugin:1.0:enforce (enforce-maven) @ adam-python_2.10 ---
[INFO] 
[INFO] --- scalariform-maven-plugin:0.1.4:format (default-cli) @ adam-python_2.10 ---
[INFO] Modified 0 of 0 .scala files
[INFO] 
[INFO] --- maven-resources-plugin:3.0.1:resources (default-resources) @ adam-python_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/paschallj/adam_python/v2/adam/adam-python/src/main/resources
[INFO] 
[INFO] --- scala-maven-plugin:3.2.2:compile (scala-compile-first) @ adam-python_2.10 ---
[INFO] No sources to compile
[INFO] 
[INFO] --- exec-maven-plugin:1.5.0:exec (dev-python) @ adam-python_2.10 ---
pip install -e .
Obtaining file:///home/paschallj/adam_python/v2/adam/adam-python
Installing collected packages: bdgenomics.adam
  Found existing installation: bdgenomics.adam 0.21.1-SNAPSHOT
    Uninstalling bdgenomics.adam-0.21.1-SNAPSHOT:
      Successfully uninstalled bdgenomics.adam-0.21.1-SNAPSHOT
  Running setup.py develop for bdgenomics.adam
Successfully installed bdgenomics.adam
[INFO] 
[INFO] --- maven-compiler-plugin:3.5.1:compile (default-compile) @ adam-python_2.10 ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-resources-plugin:3.0.1:testResources (default-testResources) @ adam-python_2.10 ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/paschallj/adam_python/v2/adam/adam-python/src/test/resources
[INFO] 
[INFO] --- scala-maven-plugin:3.2.2:testCompile (scala-test-compile-first) @ adam-python_2.10 ---
[INFO] No sources to compile
[INFO] 
[INFO] --- exec-maven-plugin:1.5.0:exec (test-python) @ adam-python_2.10 ---
mkdir -p target
python2.7 -m pytest -vv --junitxml target/pytest-report.xml src
============================= test session starts ==============================
platform linux2 -- Python 2.7.12, pytest-3.0.6, py-1.4.32, pluggy-0.4.0 -- /home/paschallj/adam_python/v2/venv/bin/python2.7
cachedir: .cache
rootdir: /home/paschallj/adam_python/v2/adam/adam-python, inifile: 
collecting ... collected 19 items

src/bdgenomics/adam/test/adamContext_test.py::ADAMContextTest::test_load_alignments FAILED
src/bdgenomics/adam/test/adamContext_test.py::ADAMContextTest::test_load_bed FAILED
src/bdgenomics/adam/test/adamContext_test.py::ADAMContextTest::test_load_genotypes FAILED
src/bdgenomics/adam/test/adamContext_test.py::ADAMContextTest::test_load_gtf FAILED
src/bdgenomics/adam/test/adamContext_test.py::ADAMContextTest::test_load_interval_list FAILED
src/bdgenomics/adam/test/adamContext_test.py::ADAMContextTest::test_load_narrowPeak FAILED
src/bdgenomics/adam/test/adamContext_test.py::ADAMContextTest::test_load_sequence FAILED
src/bdgenomics/adam/test/adamContext_test.py::ADAMContextTest::test_load_variants FAILED
src/bdgenomics/adam/test/alignmentRecordRdd_test.py::AlignmentRecordRDDTest::test_save_as_bam FAILED
src/bdgenomics/adam/test/alignmentRecordRdd_test.py::AlignmentRecordRDDTest::test_save_sorted_sam FAILED
src/bdgenomics/adam/test/alignmentRecordRdd_test.py::AlignmentRecordRDDTest::test_save_unordered_sam FAILED
src/bdgenomics/adam/test/featureRdd_test.py::FeatureRDDTest::test_round_trip_bed FAILED
src/bdgenomics/adam/test/featureRdd_test.py::FeatureRDDTest::test_round_trip_gtf FAILED
src/bdgenomics/adam/test/featureRdd_test.py::FeatureRDDTest::test_round_trip_interval_list FAILED
src/bdgenomics/adam/test/featureRdd_test.py::FeatureRDDTest::test_round_trip_narrowPeak FAILED
src/bdgenomics/adam/test/genotypeRdd_test.py::GenotypeRDDTest::test_vcf_round_trip FAILED
src/bdgenomics/adam/test/genotypeRdd_test.py::GenotypeRDDTest::test_vcf_sort FAILED
src/bdgenomics/adam/test/genotypeRdd_test.py::GenotypeRDDTest::test_vcf_sort_lex FAILED
src/bdgenomics/adam/test/variantRdd_test.py::VariantRDDTest::test_vcf_round_trip FAILED

 generated xml file: /home/paschallj/adam_python/v2/adam/adam-python/target/pytest-report.xml 
=================================== FAILURES ===================================
_____________________ ADAMContextTest.test_load_alignments _____________________

self = <bdgenomics.adam.test.adamContext_test.ADAMContextTest testMethod=test_load_alignments>

    def test_load_alignments(self):
    
        testFile = self.resourceFile("small.sam")
>       ac = ADAMContext(self.sc)

src/bdgenomics/adam/test/adamContext_test.py:30: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <bdgenomics.adam.adamContext.ADAMContext object at 0x7fc4d4c57090>
sc = <pyspark.context.SparkContext object at 0x7fc4d4bf5790>

    def __init__(self, sc):
        """
            Initializes an ADAMContext using a SparkContext.
    
            :param pyspark.context.SparkContext sc: The currently active
            SparkContext.
            """
    
        self._sc = sc
        self._jvm = sc._jvm
>       c = self._jvm.org.bdgenomics.adam.rdd.ADAMContext(sc._jsc.sc())
E       TypeError: 'JavaPackage' object is not callable

src/bdgenomics/adam/adamContext.py:44: TypeError
...
...

src/bdgenomics/adam/adamContext.py:44: TypeError
----------------------------- Captured stderr call -----------------------------
17/02/27 13:21:12 INFO SparkContext: Running Spark version 1.6.3
17/02/27 13:21:12 INFO SecurityManager: Changing view acls to: paschallj
17/02/27 13:21:12 INFO SecurityManager: Changing modify acls to: paschallj
17/02/27 13:21:12 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(paschallj); users with modify permissions: Set(paschallj)
17/02/27 13:21:12 INFO Utils: Successfully started service 'sparkDriver' on port 44438.
17/02/27 13:21:12 INFO Slf4jLogger: Slf4jLogger started
17/02/27 13:21:12 INFO Remoting: Starting remoting
17/02/27 13:21:12 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@localhost:34489]
17/02/27 13:21:12 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 34489.
17/02/27 13:21:12 INFO SparkEnv: Registering MapOutputTracker
17/02/27 13:21:12 INFO SparkEnv: Registering BlockManagerMaster
17/02/27 13:21:12 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-ee0a1e40-7a0d-45a2-b0e9-4ae0e5628433
17/02/27 13:21:12 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
17/02/27 13:21:12 INFO SparkEnv: Registering OutputCommitCoordinator
17/02/27 13:21:12 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/02/27 13:21:12 INFO SparkUI: Started SparkUI at http://localhost:4040
17/02/27 13:21:12 INFO Executor: Starting executor ID driver on host localhost
17/02/27 13:21:12 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40674.
17/02/27 13:21:12 INFO NettyBlockTransferService: Server created on 40674
17/02/27 13:21:12 INFO BlockManagerMaster: Trying to register BlockManager
17/02/27 13:21:12 INFO BlockManagerMasterEndpoint: Registering block manager localhost:40674 with 511.1 MB RAM, BlockManagerId(driver, localhost, 40674)
17/02/27 13:21:12 INFO BlockManagerMaster: Registered BlockManager
17/02/27 13:21:12 INFO SparkUI: Stopped Spark web UI at http://localhost:4040
17/02/27 13:21:12 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/02/27 13:21:12 INFO MemoryStore: MemoryStore cleared
17/02/27 13:21:12 INFO BlockManager: BlockManager stopped
17/02/27 13:21:12 INFO BlockManagerMaster: BlockManagerMaster stopped
17/02/27 13:21:12 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/02/27 13:21:12 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/02/27 13:21:12 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
17/02/27 13:21:12 INFO SparkContext: Successfully stopped SparkContext
17/02/27 13:21:12 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
========================== 19 failed in 17.87 seconds ==========================
Makefile:82: recipe for target 'test' failed
[ERROR] Command execution failed.
org.apache.commons.exec.ExecuteException: Process exited with an error: 2 (Exit value: 2)
	at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
	at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:166)
	at org.codehaus.mojo.exec.ExecMojo.executeCommandLine(ExecMojo.java:764)
	at org.codehaus.mojo.exec.ExecMojo.executeCommandLine(ExecMojo.java:711)
	at org.codehaus.mojo.exec.ExecMojo.execute(ExecMojo.java:289)
	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:207)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
	at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:863)
	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
	at org.apache.maven.cli.MavenCli.main(MavenCli.java:199)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] ADAM_2.10 .......................................... SUCCESS [  4.471 s]
[INFO] ADAM_2.10: Core .................................... SUCCESS [  5.155 s]
[INFO] ADAM_2.10: APIs for Java ........................... SUCCESS [  0.378 s]
[INFO] ADAM_2.10: CLI ..................................... SUCCESS [  1.102 s]
[INFO] ADAM_2.10: Assembly ................................ SUCCESS [  4.967 s]
[INFO] ADAM_2.10: Python APIs ............................. FAILURE [ 19.141 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE

@fnothaft
Member Author

@fnothaft fnothaft commented Feb 27, 2017

Thanks @jpdna! What does echo $PYSPARK_SUBMIT_ARGS show?

@fnothaft
Member Author

@fnothaft fnothaft commented Feb 27, 2017

@jpdna
Member

@jpdna jpdna commented Feb 27, 2017

Success! Thanks @fnothaft.
Now I can get on with trying out your API!

It wasn't set before, but now:

echo $PYSPARK_SUBMIT_ARGS
--jars /home/paschallj/adam_python/v2/adam/adam-assembly/target/adam_2.10-0.21.1-SNAPSHOT.jar --driver-class-path /home/paschallj/adam_python/v2/adam/adam-assembly/target/adam_2.10-0.21.1-SNAPSHOT.jar pyspark-shell

echo $PYTHONPATH
/home/paschallj/Spark/1.6.3/spark-1.6.3-bin-hadoop2.6/python/lib/py4j-0.9-src.zip:/home/paschallj/Spark/1.6.3/spark-1.6.3-bin-hadoop2.6/python/lib/pyspark.zip

Launching via $SPARK_HOME/bin/pyspark still produced the Java error I saw earlier,
but the snippet below worked fine in a normal python REPL:

from pyspark import SparkContext
from bdgenomics.adam.adamContext import ADAMContext
sc = SparkContext("local","jpappname")
ac = ADAMContext(sc)
reads = ac.loadAlignments("/home/paschallj/adam_python/v2/adam/adam-core/src/test/resources/small.sam")

I ran 'make develop' in adam/adam-python first, but I'm not sure whether that was needed or not.

@fnothaft
Member Author

@fnothaft fnothaft commented Feb 27, 2017

Ah, interesting! What's the command line you are using when you invoke $SPARK_HOME/bin/pyspark?

@fnothaft
Member Author

@fnothaft fnothaft commented Mar 3, 2017

This PR should include a bump in coverage of the JavaADAMContext, which has some ignored unit tests.

@fnothaft
Member Author

@fnothaft fnothaft commented Mar 6, 2017

Any thoughts on getting this and #1391 into the 3/18 0.22.0 release? I think that it'd be desirable to get both in. Additionally, these two PRs have been open for a long time now, and we need to start getting review eyeballs on them at the least.

@jpdna
Member

@jpdna jpdna commented Mar 6, 2017

I'm attempting to use the countKmers function from the Python command line as below:

reads = ac.loadAlignments("small.sam")
myKmerCount = reads.countKmers(10)

But I can't seem to figure out how to then get a map of k-mers and counts out of the myKmerCount object.

This is one error I got from an attempt:

>>> myKmerCount.count()
2017-03-06 07:51:20 ERROR PythonRunner:95 - Python worker exited unexpectedly (crashed)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/paschallj/Spark/1.6.3/spark-1.6.3-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 98, in main
    command = pickleSer._read_with_length(infile)
  File "/home/paschallj/Spark/1.6.3/spark-1.6.3-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 156, in _read_with_length
    length = read_int(stream)
  File "/home/paschallj/Spark/1.6.3/spark-1.6.3-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 545, in read_int
    raise EOFError
EOFError

	at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
	at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
	at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)

Here is another attempt:

>>> myKmerCount.collectAsMap()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/paschallj/Spark/1.6.3/spark-1.6.3-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 1520, in collectAsMap
  File "/home/paschallj/Spark/1.6.3/spark-1.6.3-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 772, in collect
  File "/home/paschallj/Spark/1.6.3/spark-1.6.3-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 142, in _load_from_socket
  File "/home/paschallj/Spark/1.6.3/spark-1.6.3-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 139, in load_stream
  File "/home/paschallj/Spark/1.6.3/spark-1.6.3-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 164, in _read_with_length
  File "/home/paschallj/Spark/1.6.3/spark-1.6.3-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 422, in loads
cPickle.UnpicklingError: invalid load key, 'A'.
>>> 2017-03-06 07:52:45 ERROR PythonRDD:95 - Error while sending iterator
org.apache.spark.SparkException: Unexpected element type class java.lang.Long
	at org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:449)
	at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:452)
	at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:452)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)

I imagine this likely has to do with needing to convert to some Java/JVM object - can you point me in the right direction? Also, in case users run dir(myKmerCount) to try to figure out which Python functions will work, and find that some fail as above, we may need to describe in the docs which Python functions are available.

@fnothaft
Member Author

@fnothaft fnothaft commented May 11, 2017

I'm going to close this in favor of #1391, which contains this same code and more.

@fnothaft fnothaft closed this May 11, 2017
@fnothaft fnothaft deleted the fnothaft:issues/538-python branch May 11, 2017
@heuermh heuermh moved this from Triage to Completed in Release 0.23.0 May 30, 2017