The way American-Gut repo was intended to be used? #204

Closed
iugrina opened this issue Mar 10, 2016 · 5 comments

@iugrina
Contributor

iugrina commented Mar 10, 2016

Hi,

I've been struggling for the past few days with the American-Gut repo and how I should use it. If I understand correctly, the repo is split into a package (the 'americangut' dir) and auxiliary files. Some of these files are meant to be used by the package itself, while others are for interactive sessions, for example with IPython notebooks.

In #199, @jwdebelius recommends installing the package with pip install . -e --no-deps, so the americangut dir was indeed intended to be used as a package. Still, this will not install the latex and tests folders listed in package_data, since setup.py seems to be a bit misconfigured (package_data files should live inside the package's source dir).
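
As a rough illustration of the package_data point: setuptools resolves package_data patterns relative to the package directory, so folders that sit next to the package rather than inside it are not picked up. The sketch below is not the repository's setup.py; the helper name list_package_data and the directory layout are assumptions made for the example.

```python
import os


def list_package_data(package_dir, subdirs):
    """Return file paths under package_dir/<subdir>, relative to package_dir.

    Hypothetical helper: subdirectories that do not exist inside the
    package directory simply contribute nothing.
    """
    found = []
    for subdir in subdirs:
        root = os.path.join(package_dir, subdir)
        # os.walk yields nothing for a missing directory
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                found.append(os.path.relpath(full, package_dir))
    return sorted(found)


# In a setup.py this could feed setuptools, for example:
#   setup(..., packages=['americangut'],
#         package_data={'americangut':
#                       list_package_data('americangut', ['latex'])})
```

With a layout where latex lives inside the package and tests lives beside it, only the latex files are returned, which mirrors the symptom described above.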

Also, running (e.g.) 01-get_sequences_and_metadata.md fails on study_accessions = agenv.get_study_accessions(), since it calls get_repository_dir (from results_utils.py), which, strangely, takes part of the module's full path (a directory outside the package dir) and tries to find 'data' and 'latex' there. Moreover, 'data' isn't even listed in setup.py.

Therefore, I'm not quite sure how I should use the repo. Should I set PYTHONPATH to include the repo and PATH to include the scripts, without installing the package, or should I install the package (as recommended by @jwdebelius)? If I need to install it, what else do I need to adjust to make it work (PATH, PYTHONPATH, ...)?

@jwdebelius
Contributor

The auxiliary files are primarily intended for use in the notebooks. At this point, the analysis is wrapped into the notebook. Over the course of the project, the best way to call these functions within a notebook has evolved (command-line utils vs. imported functions), and so has the best environment and package management approach.

The installation described in #199 reflects the current conda install. As far as I can tell from repeated research, conda doesn't easily support PYTHONPATH modifications. The best suggestion I've seen is modifying a .pth file, which has its own set of challenges. Therefore, it's necessary to include a setup.py and install the repository using pip if you want the auxiliary code to work in the environment.

If you're using another environment manager (virtualenv, for instance) that lets you modify the PYTHONPATH, it's preferable to modify the PATH and PYTHONPATH.
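
For reference, a .pth file is a plain text file placed in a site-packages directory; each line that names an existing directory is added to sys.path when the interpreter starts. A minimal sketch of that approach follows. The function name write_pth and the file name american_gut_repo.pth are made up for illustration, and the site-packages location is left to the caller rather than guessed.

```python
import os


def write_pth(site_dir, repo_dir, name='american_gut_repo.pth'):
    """Write a .pth file in site_dir that points at repo_dir.

    Both the helper and the default file name are illustrative only.
    """
    pth_path = os.path.join(site_dir, name)
    with open(pth_path, 'w') as handle:
        # One absolute directory per line; missing directories are skipped
        # by the interpreter when the file is processed.
        handle.write(os.path.abspath(repo_dir) + '\n')
    return pth_path
```

Here site_dir would be the active environment's site-packages directory and repo_dir the clone of the repository. Removing the file undoes the change.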

@iugrina
Contributor Author

iugrina commented Mar 10, 2016

Thank you for the reply.

I've now tried it with conda (instructions from #199) and it still doesn't work. The data, latex and tests folders are not installed as part of the package (if that was the intention), and running 01-get_sequences_and_metadata.md as an IPython notebook with AG_TESTING=True gives:

study_accessions = agenv.get_study_accessions()
---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-3-4ce98b7f14da> in <module>()
----> 1 study_accessions = agenv.get_study_accessions()

/home/iugrina/miniconda2/envs/americangut/lib/python2.7/site-packages/americangut/notebook_environment.pyc in get_study_accessions()
   2256     """
   2257     if ag.is_test_env():
-> 2258         _stage_test_accessions()
   2259         return _TEST_ACCESSIONS[:]
   2260     else:

/home/iugrina/miniconda2/envs/americangut/lib/python2.7/site-packages/americangut/notebook_environment.pyc in _stage_test_accessions()
   2318     sourced from EBI.
   2319     """
-> 2320     repo = get_repository_dir()
   2321     for acc in _TEST_ACCESSIONS:
   2322         src = os.path.join(repo, 'tests/data/%s' % acc)

/home/iugrina/miniconda2/envs/americangut/lib/python2.7/site-packages/americangut/results_utils.pyc in get_repository_dir()
     55 
     56     # get_path verifies the existance of these directories
---> 57     get_path(expected, 'data')
     58     get_path(expected, 'latex')
     59 

/home/iugrina/miniconda2/envs/americangut/lib/python2.7/site-packages/americangut/results_utils.pyc in get_path(d, f)
     46     """Check and get a path, or throw IOError"""
     47     path = os.path.join(d, f)
---> 48     check_file(path)
     49     return path
     50 

/home/iugrina/miniconda2/envs/americangut/lib/python2.7/site-packages/americangut/util.pyc in check_file(f, e)
    146     """Verify a file (or directory) exists"""
    147     if not os.path.exists(f):
--> 148         raise e("Cannot continue! The file %s does not exist!" % f)
    149 
    150 

IOError: Cannot continue! The file /home/iugrina/miniconda2/envs/americangut/lib/python2.7/site-packages/data does not exist!

Therefore, IMHO the problem isn't conda vs. pip. Since americangut is installed as a package, get_repository_dir will inevitably miss the correct repo dir with the data/tests/latex folders. The only way I see get_repository_dir finding the correct repo dir is if it is loaded from American-Gut/americangut/results_utils.py (not from the installed package). However, 01-get_sequences_and_metadata.md won't pick that copy up, since the American-Gut repo isn't on the PYTHONPATH, and it will therefore import the installed package version.
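
The failure can be reproduced in miniature. The traceback suggests that get_repository_dir looks for data and latex one level above the package directory; the sketch below assumes that behavior, since the real implementation is not shown in this thread. The name guess_repository_dir is hypothetical.

```python
import os


def guess_repository_dir(module_file, required=('data', 'latex')):
    """Return the directory one level above the package holding module_file.

    Assumed logic, reconstructed from the traceback only. Raises IOError
    when any of the required subdirectories is missing there.
    """
    package_dir = os.path.dirname(os.path.abspath(module_file))
    candidate = os.path.dirname(package_dir)
    for name in required:
        path = os.path.join(candidate, name)
        if not os.path.exists(path):
            raise IOError("The file %s does not exist!" % path)
    return candidate
```

Given a module path inside a checkout (American-Gut/americangut/results_utils.py), the candidate is the repository root. Given the installed copy, the candidate is site-packages, which has no data directory, so the lookup fails as in the traceback.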

I would like to help improve this (making it more reproducible, working on different platforms, ...), but I need to know the intended way to run it. An example from scratch would help a lot, along with comments on the following question:

  • Are data, latex and tests folders intended to be a part of the package or just a part of the repo?

@wasade
Member

wasade commented Mar 11, 2016

Thanks, Ivo. Data and latex are intended to be part of the repo. I recommend looking at what is done via travis.yml. I admit, our internal uses just clone the repo, so having setup.py is a bit confusing. However, we'd be excited to see install/deploy improve.

@iugrina
Contributor Author

iugrina commented Mar 11, 2016

Thanks. If it is intended to be used only as a repo, then adjusting PYTHONPATH and PATH should be enough.
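
Inside a single Python session, the same effect can be approximated without touching the shell configuration. The following is only a sketch: use_repo_checkout is a hypothetical helper, and 'scripts' is an assumed name for the directory holding the command-line scripts.

```python
import os
import sys


def use_repo_checkout(repo_dir, scripts_subdir='scripts'):
    """Put a repository checkout first on sys.path and its scripts on PATH.

    Hypothetical helper; 'scripts' is an assumed directory name. The
    changes apply to the current process and its child processes only.
    """
    repo_dir = os.path.abspath(repo_dir)
    if repo_dir in sys.path:
        sys.path.remove(repo_dir)
    sys.path.insert(0, repo_dir)

    scripts_dir = os.path.join(repo_dir, scripts_subdir)
    entries = [p for p in os.environ.get('PATH', '').split(os.pathsep)
               if p and p != scripts_dir]
    os.environ['PATH'] = os.pathsep.join([scripts_dir] + entries)
    return repo_dir, scripts_dir
```

In a fresh session, calling this before importing americangut and then inspecting americangut.__file__ shows which copy was picked up.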

@iugrina
Contributor Author

iugrina commented May 23, 2016

Resolved with #211

@iugrina iugrina closed this as completed May 23, 2016