Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R dependencies #1

Closed
Acribbs opened this issue Mar 3, 2019 · 14 comments · Fixed by #7
Closed

R dependencies #1

Acribbs opened this issue Mar 3, 2019 · 14 comments · Fixed by #7

Comments

@Acribbs
Copy link

Acribbs commented Mar 3, 2019

Hi @sebastian-luna-valero this is really useful and a great idea for a repository. I would like to follow a similar strategy for R. Do you have any suggestions or links on best way to implement this?

The bio conductor packages are probably going to be a bit of a nightmare but I think the yaml conversion dict is a good idea.

@Acribbs
Copy link
Author

Acribbs commented Mar 4, 2019

Would the R packages, https://lobstr.r-lib.org, https://github.com/TobCap/walkast or https://cran.r-project.org/web/packages/codetools/index.html be a useful way of inspecting ASTs?

@sebastian-luna-valero
Copy link
Member

Many thanks, Adam. I haven't looked at R dependencies yet. I would like to focus on Python for the first working version of the script, and then R is next. Let's keep this issue open as a reminder.

@Acribbs
Copy link
Author

Acribbs commented Mar 5, 2019

I am very happy to help at any stage as this would help me enormously with writing conda packages for software

@sebastian-luna-valero
Copy link
Member

Hi,

As of version 0.0.5 the script should now be able to translate Python and R dependencies.

At the moment, it scans:

  • files ending in .py for Python and R dependencies dependencies (rpy2).
  • files ending in .R for R dependencies.

I am skipping .ipynb and .Rmd files at the moment, so please bear that in mind.

I would be grateful if you could give it a go and provide feedback. This is still in the early stages, and the development is slow because I am doing it in my spare time.

Another important point to bear in mind is that the translations for both Python and R are not comprehensive and are mainly based in the dependencies used in the CGAT code. It will be a matter of time to keep adding more dependencies to the two json files in charge of the translation. This implies that the environment file produced as output may not be valid straight away, but conda should tell you that when creating the environment (i.e. error message: PackagesNotFoundError). I might need to stress that in the readme file.

Best regards,
Sebastian.

@Acribbs
Copy link
Author

Acribbs commented Apr 24, 2019

Hi Sebastian,

Im currently testing it and its looking really good. I have a few suggestions and will add these to a separate pull request.

Currently im writing a lot of rmarkdown for publication into packages and I think that it will definitely help if we have .Rmd implemented.

BW,
Adam

@Acribbs Acribbs closed this as completed Apr 26, 2019
@sebastian-luna-valero
Copy link
Member

Hi Adam (and other interested people),

On branch ipynb-rmd I have been looking into parsing both .Rmd and .ipynb files.

Provided that I only use a regex to look for the patern library(name) to scan R dependencies, that seems to work well on both .R and .Rmd files.

In the case of .ipynb files it is different. Python has:

  • import <name>
  • from <name> import <etc>

The most accurate way IMHO to find both is using Python Abstract Syntax Trees. I could not find a regex that would do the same job properly.

That having said, Python Abstract Syntax Trees only work with Python code and therefore the .ipynb files need to be translated to .py before the scan.

So far conda_deps has no dependencies out of the Python Standard Library which is something that I really like so we keep this package and its dependencies to a minimum. However, if we need to convert .ipynb to .py to reliably scan Python imports in .ipynb files, that makes conda_deps depend on nbformat and nbconvert which in turns downloads a lot of other things that are pretty much not required for any other thing.

A while ago, I had a look at jupytext as a way of version controlling Jupyter notebooks. Are you aware of it? I am wondering whether we need to move to that so:

  • Jupyter notebooks are correctly managed through version control
  • We keep conda_deps with no additional dependencies.

Please let me know your thoughts.

Best regards,
Sebastian

@Acribbs
Copy link
Author

Acribbs commented May 17, 2019

My take on this is that supporting ipynb is a really good feature and it would be extremely useful for myself (and im sure others) to support this.

While I like jupytext (I haven't yet used it, but I will check it out) and it seems to have features that I will use, Jupiter notebook is the standard and most commonly output. If I generate a pipeline at the moment its still most likely going to be with the ipynb suffix.

Also, if we want other users to use this then I would have thought that supporting .ipynb files is necessary.

Please disagree with me though, at the moment I dont have many .ipynb files because im an Rmarkdown fan, so we could leave this feature out at the moment and add it back in if it becomes important?

@sebastian-luna-valero
Copy link
Member

Hi,

I am working on this on the ipynb-rmd branch. Would you like to give it a go?

I have tested it with notebooks in the cgat-flow repository and I found that:

  • it works with notebooks for the peakcalling pipeline
  • but it doesn't with the notebooks for the bamstats pipeline

Do you know why?

Do you have other interesting notebooks to test it with?

I found an additional problem with code like the following:

#load R and the R packages required
%load_ext rpy2.ipython
%R require(ggplot2)

Any suggestions?

Best regards,
Sebastian

@Acribbs
Copy link
Author

Acribbs commented May 30, 2019

Hi @sebastian-luna-valero ,

I have been testing it one other notebooks and it seems to work well for those. I will have a look into why it isn't working for bamstats today.

Just to be clear, with the additional code, is this embedded within the notebook?

BW,
Adam

@Acribbs
Copy link
Author

Acribbs commented May 31, 2019

hi @sebastian-luna-valero, I have looked into the issue with the bamstats notebooks and it seems like there is a problem with the encoding of the notebooks. I haven't pinned the issue fully down yet but it seems like there are a few corrupted lines in the notebooks. No idea how they got there and its a pain to debug. Although I think the notebooks may need to be re-written anyway and im going to give it to a student as a coding exercise to fix.

I have tested the branch and it all works fine.

In relation to the issue you mentioned previously, are you referring to the require(ggplot2)? or the %R? If it is the require() then I don't think we should support this at the moment as library() should always be used to load a package. require() is basically a try and except in R. Here is a great blog post I found describing why require shouldn't be used: https://yihui.name/en/2014/07/library-vs-require/

@Acribbs
Copy link
Author

Acribbs commented Jun 13, 2019

Actually regarding the require(), iv been thinking about it more and maybe we do need to support it as im currently writing an r package and require is used to check dependancies in a package.

@sebastian-luna-valero
Copy link
Member

Hi Adam,

It looks like you should be using library() instead of require():
https://g4greetz.wordpress.com/2016/12/08/difference-between-library-and-require-in-r/

require() is used inside functions, as it outputs a warning and continues if the package is not found, whereas library() will throw an error.

According to that, I think it is preferable to support library() only.

On the other hand, regarding my question about Jupyter magics, I have had a look at this:
https://ipython.readthedocs.io/en/stable/config/extensions/#extensions-bundled-with-ipython

and I think we only need to support rmagicand cythonmagic and I have pushed changes to reflect that on the ipynb-rmd branch.

I am going to release a new version with these changes.

Best regards,
Sebastian

@Acribbs
Copy link
Author

Acribbs commented Jul 12, 2019

I agree, library() is preferred. Notebook merge looks really good

What was your ultimate aim or goal for this work? Were you thinking on writing a manuscript to release this? What about a manuscript on best practices for developing python software for computational biology or even a conda specific software?

@sebastian-luna-valero
Copy link
Member

Thanks, Adam.

The only goal is to make dependency management easier. I was doing something along these lines previously with bash scripts but wanted to get it out as an stand-alone tool that could be of broader interest. Hopefully now more people can benefit from it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants