Building from source issues - Is TypeTE Docker container ready? #2

Open
moldach opened this issue Feb 6, 2021 · 9 comments

@moldach

moldach commented Feb 6, 2021

As I'm working with patient data (level 4 data) we're on a secure Linux compute cluster where, for security reasons, it is not possible to make outbound connections to the internet.

Originally I tried building from source, but I ran into major issues installing one particular Perl dependency (see notes at the bottom). I would therefore like to run the Docker container for TypeTE instead, but I'm not sure whether it's ready.

Here is my attempt, not knowing how the container is meant to be used:

sudo docker pull cgoubert/typete
sudo docker run -it --entrypoint /home/TypeTE/softwares/TypeTE/ cgoubert/typete run_TypeTE_NRef.sh &> TypeTE.log &

It doesn't look like running this container is straightforward, so if it is functional, some documentation would be very helpful.

Issues with building from source

As I have no outbound connection to the internet, I cannot use a CPAN client (cpan/cpanm) to download Perl modules. After downloading and installing all the listed dependencies, I ran typeTE and received errors about the missing Perl module Bio::SeqIO, so I had to do the following.

First, I grab the link for the module from meta::cpan website on a laptop with internet connection.

wget https://cpan.metacpan.org/authors/id/C/CJ/CJFIELDS/BioPerl-1.7.8.tar.gz

Then I transfer this to the secure computing environment, into my perl5 directory, and run tar zxvf BioPerl-1.7.8.tar.gz && cd BioPerl-1.7.8, followed by perl Makefile.PL to build. I then verify the installation with perl -e "use Bio::SeqIO;" (if I don't see errors, the module is installed).

[moldach@marc TypeTE-Test]$ perl -e "use Bio::SeqIO;"
[moldach@marc TypeTE-Test]$
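
For reference, the offline install sequence described above can be sketched as a small shell function. This is only an illustration under assumptions: install_offline, run, and DRY_RUN are hypothetical names, the install root is a placeholder, and the tarball is assumed to unpack into a directory named after itself. Any prerequisites of the module would need the same treatment, in dependency order.

```shell
#!/usr/bin/env bash
# Sketch: install a CPAN tarball into a private prefix without network access.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# The body runs in a subshell (parentheses) so the cd does not leak to the caller.
install_offline() (
  tarball="$1"          # e.g. String-Approx-3.28.tar.gz, already copied over
  base="$2"             # hypothetical install root, e.g. $HOME/perl5
  perl_bin="${3:-perl}" # interpreter that will later run the scripts
  src="$(basename "$tarball" .tar.gz)"

  run tar -xzf "$tarball" &&
  run cd "$src" &&
  run "$perl_bin" Makefile.PL INSTALL_BASE="$base" &&
  run make &&
  run make test &&
  run make install
)

# Dry run: only prints the six commands that would be executed.
DRY_RUN=1 install_offline String-Approx-3.28.tar.gz "$HOME/perl5"
```

Passing the interpreter explicitly as the third argument is a deliberate choice in this sketch: modules with compiled parts are built against whichever perl runs Makefile.PL, so it helps if that is the same perl used at run time.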

Next I try to run typeTE again, but I get an error about String::Approx, so I follow the same method, followed by perl -e "use String::Approx qw(amatch);". Things appear to be installed:

[moldach@marc TypeTE-Test]$ perl -e "use String::Approx qw(amatch);"
[moldach@marc TypeTE-Test]$

I add both the perl -e statements now to the top of my batch script and try to run typeTE and here is where the odd behavior is happening:

Script

#!/bin/bash
#SBATCH --job-name=typeTE_test # Job name
#SBATCH --mail-type=END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=moldach@ucalgary.ca # Where to send mail
#SBATCH --ntasks=1 #Run on a single CPU
#SBATCH --cpus-per-task=1 # How many cores?
#SBATCH --mem-per-cpu=1G
#SBATCH --output=typeTE_test_%j.log # Standard output and error log
#SBATCH --error=typeTE_test_%j.err # Error log
#SBATCH --time=01:00:00
pwd; hostname; date

perl -e "use Bio::Seq"
perl -e "use String::Approx"

#bash run_TypeTE_Ref.sh

date

As you can see I've commented out the run_TypeTE_Ref.sh script and the first call to Bio::Seq runs successfully without error; however, the call to String::Approx throws an error.

Error

Can't locate String/Approx.pm in @INC (you may need to install the String::Approx module) (@INC contains: /project/M-mtgraovac182840/perl5-matt/lib/perl5 /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5) at -e line 1.
BEGIN failed--compilation aborted at -e line 1.

Although this is not directly a typeTE issue, the String::Approx Perl module dependency is the linchpin preventing me from using this tool.

While we are fairly new to working in this restrictive environment, I have successfully installed 15 other Perl modules into /project/M-mtgraovac182840/perl5-matt/ by running the install process described above (e.g. tar ... && cd ... && perl Makefile.PL).

@clemgoub
Owner

clemgoub commented Feb 8, 2021

Dear Matthiew,

Thank you for your interest! I'm so sorry; this repo is indeed a dev version that is not functional. It's not on your end!

To use TypeTE as in the paper, I recommend using the GitHub version. Make sure BioPerl is properly installed, because it is a common source of problems.

We are currently working to improve the deletion pipeline (reference insertions), and this will be integrated with Nextflow/Docker. However, we need a few more months of development!

Best,

Clément

@clemgoub clemgoub closed this as completed Feb 8, 2021
@clemgoub
Owner

clemgoub commented Feb 8, 2021

By the way, I just found your v a p o R w a v e package and I love it! I'm gonna try it out for my next presentation! =)

@clemgoub clemgoub reopened this Feb 10, 2021
@clemgoub
Owner

Reading your issue again, I wonder whether this is related to the PERL5LIB variable not pointing to your local libraries. I am not a Perl expert, but maybe @jainy, who coded the Perl scripts, can help you!

Best,

Clément

@jainy
Collaborator

jainy commented Feb 10, 2021

Hi Matthiew,

Can you try installing String::Approx using an alternative method and check whether that works?

perl -MCPAN -e shell
install String::Approx

Thanks
Jainy

@moldach
Author

moldach commented Feb 17, 2021

Hi @jainy

Where does perl -MCPAN -e shell, followed by install String::Approx install modules?

Here is what I see inside the lib directory perl5/lib/perl5:

(base) mtg@mtg-ThinkPad-P53:~/perl5/lib/perl5$ ll
total 212
drwxrwxr-x 15 mtg mtg   4096 Jan  7 16:36 ./
drwxrwxr-x  3 mtg mtg   4096 Jul 21  2020 ../
drwxrwxr-x  3 mtg mtg   4096 Jul 21  2020 5.30.0/
drwxrwxr-x  2 mtg mtg   4096 Jan  7 14:44 App/
drwxrwxr-x  3 mtg mtg   4096 Jul 21  2020 Archive/
drwxrwxr-x  9 mtg mtg   4096 Jan  7 14:44 CPAN/
-r--r--r--  1 mtg mtg 146411 Jun 12  2020 CPAN.pm
drwxrwxr-x  2 mtg mtg   4096 Jul 21  2020 Devel/
drwxrwxr-x  3 mtg mtg   4096 Jan  7 16:36 Exporter/
drwxrwxr-x  3 mtg mtg   4096 Jul 21  2020 lib/
drwxrwxr-x  3 mtg mtg   4096 Jan  7 14:46 List/
drwxrwxr-x  2 mtg mtg   4096 Jul 21  2020 local/
drwxrwxr-x  2 mtg mtg   4096 Jul 21  2020 Mock/
drwxrwxr-x  4 mtg mtg   4096 Jul 21  2020 POD2/
drwxrwxr-x  3 mtg mtg   4096 Jan  7 16:36 Test/
drwxrwxr-x 11 mtg mtg   4096 Feb 17 10:54 x86_64-linux-gnu-thread-multi/

These don't look like String::Approx components - I could be wrong.

(base) mtg@mtg-ThinkPad-P53:~/perl5/lib/perl5$ find . -name "String*"
./Test/Deep/String.pm
./Archive/Zip/StringMember.pm
./x86_64-linux-gnu-thread-multi/auto/String
./x86_64-linux-gnu-thread-multi/String

As I mentioned, there is no outgoing internet connection on this server, so I cannot install String::Approx with that method directly; I need to run it on another computer with an internet connection and then transfer the compiled module over. This results in a module folder for each Perl library:

(base) [moldach@marc TypeTE-Test]$ ll /project/M-mtgraovac182840/perl5-matt/lib/perl5
total 216
drwxrwsr-x  3 moldach M-mtgraovac182840  4096 Jan  7 14:59 5.30.0
drwxrwsr-x  2 moldach M-mtgraovac182840  4096 Jan  7 14:59 App
drwxrwsr-x  3 moldach M-mtgraovac182840  4096 Jan  7 14:59 Archive
drwxr-sr-x 24 moldach M-mtgraovac182840  8192 Feb  5 13:51 Bio
-r--r--r--  1 moldach M-mtgraovac182840  7252 Feb  2 22:04 BioPerl.pm
drwxrwsr-x  9 moldach M-mtgraovac182840  4096 Jan  7 14:59 CPAN
-r--r--r--  1 moldach M-mtgraovac182840     0 Jan  7 14:59 CPAN.pm
drwxr-sr-x  2 moldach M-mtgraovac182840  4096 Jan  7 15:35 Capture
drwxr-sr-x  2 moldach M-mtgraovac182840  4096 Jan  7 15:40 Config
drwxr-sr-x  2 moldach M-mtgraovac182840  4096 Jan  7 16:13 Data
drwxrwsr-x  2 moldach M-mtgraovac182840  4096 Jan  7 15:52 Devel
drwxr-sr-x  3 moldach M-mtgraovac182840  4096 Jan  7 16:41 Exporter
drwxr-sr-x  3 moldach M-mtgraovac182840  4096 Jan  7 16:14 File
drwxrwsr-x  3 moldach M-mtgraovac182840  4096 Jan  7 16:34 List
drwxr-sr-x  3 moldach M-mtgraovac182840  4096 Feb  5 15:17 Method
drwxrwsr-x  2 moldach M-mtgraovac182840  4096 Jan  7 14:59 Mock
drwxr-sr-x  4 moldach M-mtgraovac182840  4096 Jan  7 16:08 Module
drwxr-sr-x  3 moldach M-mtgraovac182840  4096 Feb  5 15:18 Moo
-r--r--r--  1 moldach M-mtgraovac182840 34419 Nov 24 17:58 Moo.pm
drwxr-sr-x  2 moldach M-mtgraovac182840  4096 Jan  7 16:20 Number
drwxrwsr-x  4 moldach M-mtgraovac182840  4096 Jan  7 14:59 POD2
drwxr-sr-x  3 moldach M-mtgraovac182840  4096 Feb  5 14:59 Parallel
drwxrwsr-x  4 moldach M-mtgraovac182840  4096 Jan  7 15:45 Test
drwxr-sr-x  2 moldach M-mtgraovac182840  4096 Jan  7 16:17 Text
drwxr-sr-x  3 moldach M-mtgraovac182840  4096 Jan  7 15:25 inc
drwxrwsr-x  3 moldach M-mtgraovac182840  4096 Jan  7 14:59 lib
drwxrwsr-x  2 moldach M-mtgraovac182840  4096 Jan  7 14:59 local
-r--r--r--  1 moldach M-mtgraovac182840  1218 Sep  2 04:16 oo.pm
drwxr-sr-x  5 moldach M-mtgraovac182840  4096 Feb  5 14:17 x86_64-linux
drwxrwsr-x 10 moldach M-mtgraovac182840  4096 Jan  7 14:59 x86_64-linux-gnu-thread-multi

Where each module folder looks like:

(base) [moldach@marc TypeTE-Test]$ tree /project/M-mtgraovac182840/perl5-matt/lib/perl5/Parallel/
/project/M-mtgraovac182840/perl5-matt/lib/perl5/Parallel/
|-- ForkManager
|   `-- Child.pm
`-- ForkManager.pm

Now, when I look on the laptop where I installed with your method I can find

$ find ~ -name  "*String::Approx*"
/home/mtg/.cpan/build/String-Approx-3.28-0/blib/man3/String::Approx.3pm
/home/mtg/perl5/man/man3/String::Approx.3pm

Okay, let's look back at the secure Linux cluster, in the man3 sub-directory:

(base) [moldach@marc man3]$ pwd
/project/M-mtgraovac182840/perl5-matt/man/man3
(base) [moldach@marc man3]$ ll
total 7232
-rw-r--r-- 1 moldach M-mtgraovac182840  12988 Jan  7 15:00 App::Cpan.3pm
-rw-r--r-- 1 moldach M-mtgraovac182840  74982 Jan  7 15:00 Archive::Zip.3pm
-r--r--r-- 1 moldach M-mtgraovac182840      0 Jan  7 14:59 Archive::Zip::FAQ.3pm
-rw-r--r-- 1 moldach M-mtgraovac182840   6734 Jan  7 15:00 Archive::Zip::MemberRead.3pm
-r--r--r-- 1 moldach M-mtgraovac182840      0 Jan  7 14:59 Archive::Zip::Tree.3pm
-r--r--r-- 1 moldach M-mtgraovac182840  20959 Feb  5 13:51 Bio::Align::AlignI.3
-r--r--r-- 1 moldach M-mtgraovac182840  18616 Feb  5 13:51 Bio::Align::DNAStatistics.3

...

-r--r--r-- 1 moldach M-mtgraovac182840  20053 Feb  5 14:49 String::Approx.3

...

Above, I've cut off most of the output; however, two things are apparent and confusing:

  1. There are two types of files inside the man3 directory, .3 and .3pm, and it's not clear to me what the difference is.
  2. My installation of String::Approx created a .3 file, whereas installing via perl -MCPAN -e shell creates a .3pm file.

Therefore, I tried to copy over this .3pm file and try again...
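
On the .3 versus .3pm question: as far as I know, the suffix is a build-time setting of whichever perl ran the install (the Config key man3ext). Distribution-packaged perls on Debian/Ubuntu commonly use 3pm, while a perl built from source with default settings uses 3. The man pages are documentation only and are not consulted by "use", so copying them is unlikely to change what perl can load. A small sketch to compare two interpreters (perl_build_info is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Print the man-page suffix, architecture name and version a given perl was built with.
perl_build_info() {
  perl_bin="${1:-perl}"
  "$perl_bin" -V:man3ext -V:archname -V:version
}

if command -v perl >/dev/null 2>&1; then
  perl_build_info perl
else
  echo "perl not found on PATH"
fi
```

Running this once with the laptop's perl and once with each perl on the cluster would show whether different builds were involved in the two installs.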

And @clemgoub, on your point about the possibility of it being related to PERL5LIB, I didn't think so, since the bash script did not error on perl -e "use Bio::Seq". However, to assuage those concerns I've now also included paths to PERL5LIB at the start of the script:

#!/bin/bash
#SBATCH --job-name=typeTE_test # Job name
#SBATCH --ntasks=1 #Run on a single CPU
#SBATCH --cpus-per-task=1 # How many cores?
#SBATCH --mem-per-cpu=1G
#SBATCH --output=typeTE_test_%j.log # Standard output and error log
#SBATCH --error=typeTE_test_%j.err # Error log
#SBATCH --time=00:05:00

PATH="/project/M-mtgraovac182840/perl5-matt/bin${PATH:+:${PATH}}"; export PATH;
PERL5LIB="/project/M-mtgraovac182840/perl5-matt/lib/perl5${PERL5LIB:+:${PERL5LIB}}"; export PERL5LIB;
PERL_LOCAL_LIB_ROOT="/project/M-mtgraovac182840/perl5-matt${PERL_LOCAL_LIB_ROOT:+:${PERL_LOCAL_LIB_ROOT}}"; export PERL_LOCAL_LIB_ROOT;
PERL_MB_OPT="--install_base \"/project/M-mtgraovac182840/perl5-matt\""; export PERL_MB_OPT;
PERL_MM_OPT="INSTALL_BASE=/project/M-mtgraovac182840/perl5-matt"; export PERL_MM_OPT;
alias perl='/project/M-mtgraovac182840/tools/perl-5.32.0-good/perl'

perl -e "use Bio::Seq"
perl -e "use String::Approx qw(amatch);"

In the error log, we see that no error is raised for Bio::Seq, but one is raised for String::Approx:

Can't locate String/Approx.pm in @INC (you may need to install the String::Approx module) (@INC contains: /project/M-mtgraovac182840/perl5-matt/lib/perl5 /project/M-mtgraovac182840/perl5-matt/lib/perl5 /project/M-mtgraovac182840/perl5-matt/lib/perl5 /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5) at -e line 1.
BEGIN failed--compilation aborted at -e line 1.
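
One thing that may be worth checking in the script above: bash does not expand aliases in non-interactive shells unless the expand_aliases option is enabled, so the alias perl=... line probably has no effect inside a batch job, and the @INC list in the error (only system directories plus the private lib/perl5) looks like it comes from the system perl. A minimal sketch that demonstrates the alias behaviour and shows one alternative; PERL_BIN is a hypothetical variable name to be pointed at the intended interpreter:

```shell
#!/usr/bin/env bash
# Part 1: an alias defined in a non-interactive bash is not applied to later commands.
bash -c '
alias demo_perl="echo alias-was-used"
demo_perl
' 2>/dev/null || echo "alias ignored in non-interactive bash"

# Part 2: hold the interpreter path in a variable and call it directly.
# PERL_BIN defaults to whatever "perl" is on PATH; set it to an absolute path to override.
PERL_BIN="${PERL_BIN:-perl}"
if command -v "$PERL_BIN" >/dev/null 2>&1; then
  # $^X is the path of the running interpreter; @INC is its module search path.
  "$PERL_BIN" -e 'print "interpreter: $^X\n"; print "  $_\n" for @INC;'
else
  echo "no perl found at: $PERL_BIN"
fi
```

Printing $^X and @INC from inside the job makes it visible which interpreter actually ran, which is harder to tell from the error message alone.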

@jainy
Collaborator

jainy commented Feb 17, 2021

Hi Matthiew,

I am not sure. I found a webpage that describes installing Perl modules locally. Can you try that?

https://blogs.iu.edu/ncgas/2019/05/30/installing-perl-modules-locally/

This describes how to get the String::Approx and install locally and add that path to the bashrc file as needed.

Hope it helps!

Best,
Jainy

@moldach
Author

moldach commented Feb 18, 2021

I am not sure. I found a webpage that describes the installation of perl modules locally. Can you try that.

https://blogs.iu.edu/ncgas/2019/05/30/installing-perl-modules-locally/

That is exactly the process I described above...

[Screenshots: local_00, local_01]

@jainy
Collaborator

jainy commented Feb 18, 2021

Hi Matthiew,

I am sorry if I am repeating myself.
I just want to make sure that you downloaded the String::Approx module from the CPAN website using the following command. In your message I see that you downloaded and installed BioPerl, but not String::Approx.

wget https://cpan.metacpan.org/authors/id/J/JH/JHI/String-Approx-3.28.tar.gz

Best,
Jainy

@moldach
Author

moldach commented Feb 19, 2021

@jainy yes, here were the commands I used:

$ wget https://cpan.metacpan.org/authors/id/J/JH/JHI/String-Approx-3.28.tar.gz
# transfer this to secure cluster in the `perl5` directory
$ tar -xzvf String-Approx-3.28.tar.gz 
$ cd String-Approx-3.28
$ perl Makefile.PL PREFIX=$PWD
Checking if your kit is complete...
Looks good
Only one of PREFIX or INSTALL_BASE can be given.  Not both.

This differs from the instructions: running with the PREFIX=$PWD parameter fails, so I need to run the following instead:

$ perl Makefile.PL
Checking if your kit is complete...
Looks good
Generating a Unix-style Makefile
Writing Makefile for String::Approx
Writing MYMETA.yml and MYMETA.json
$ make 
$ make install
$ make test # shows Result: PASS
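
The "Only one of PREFIX or INSTALL_BASE can be given" message seen earlier would be consistent with PERL_MM_OPT already supplying INSTALL_BASE from the environment, since ExtUtils::MakeMaker reads that variable in addition to the command line. A small check, as a sketch only (mm_opt_sets_install_base is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Succeeds when PERL_MM_OPT already carries an INSTALL_BASE setting.
mm_opt_sets_install_base() {
  case "${PERL_MM_OPT:-}" in
    *INSTALL_BASE=*) return 0 ;;
    *) return 1 ;;
  esac
}

if mm_opt_sets_install_base; then
  echo "PERL_MM_OPT already sets INSTALL_BASE; passing PREFIX as well will conflict"
else
  echo "PERL_MM_OPT does not set INSTALL_BASE; PREFIX can be passed on its own"
fi
```

When the first message appears, either dropping PREFIX (as done above) or clearing PERL_MM_OPT for that one command should avoid the conflict; which of the two is preferable depends on where the module is meant to end up.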

Let's confirm it works from the LOGIN node:

$ perl -e "use String::Approx qw(amatch);"
$

Looks like it works there.

Now I try submitting a script, making sure to include the environment variables that affect Perl 5:

#!/bin/bash
#SBATCH --job-name=typeTE_test # Job name
#SBATCH --ntasks=1 #Run on a single CPU
#SBATCH --cpus-per-task=1 # How many cores?
#SBATCH --mem-per-cpu=1G
#SBATCH --output=typeTE_test_%j.log # Standard output and error log
#SBATCH --error=typeTE_test_%j.err # Error log
#SBATCH --time=00:05:00
pwd; hostname; date

PATH="/project/M-mtgraovac182840/perl5-matt/bin${PATH:+:${PATH}}"; export PATH;
PERL5LIB="/project/M-mtgraovac182840/perl5-matt/lib/perl5${PERL5LIB:+:${PERL5LIB}}"; export PERL5LIB;
PERL_LOCAL_LIB_ROOT="/project/M-mtgraovac182840/perl5-matt${PERL_LOCAL_LIB_ROOT:+:${PERL_LOCAL_LIB_ROOT}}"; export PERL_LOCAL_LIB_ROOT;

### the article recommends un-setting the following which I had in there before - comment them out
#PERL_MB_OPT="--install_base \"/project/M-mtgraovac182840/perl5-matt\""; export PERL_MB_OPT;
#PERL_MM_OPT="INSTALL_BASE=/project/M-mtgraovac182840/perl5-matt"; export PERL_MM_OPT;
PERL_MB_OPT=
PERL_MM_OPT=

## include Bio::Seq first, as this used the exact same local installation as String::Approx
### Bio::Seq passes successfully but String::Approx throws an error
perl -e "use Bio::Seq"
perl -e "use String::Approx qw(amatch);"

And I get the following error:

Can't locate String/Approx.pm in @INC (you may need to install the String::Approx module) (@INC contains: /project/M-mtgraovac182840/perl5-matt/lib/perl5 /project/M-mtgraovac182840/perl5-matt/lib/perl5 /project/M-mtgraovac182840/perl5-matt/lib/perl5 /project/M-mtgraovac182840/perl5-matt/lib/perl5 /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5) at -e line 1.
BEGIN failed--compilation aborted at -e line 1.
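
A possible next diagnostic, offered as a sketch under assumptions rather than a confirmed fix: String::Approx contains compiled (XS) code, and such modules are normally installed under an architecture-named subdirectory of lib/perl5 (the listings above show x86_64-linux and x86_64-linux-gnu-thread-multi), whereas a pure-Perl module like Bio::Seq sits directly under lib/perl5. Perl only adds the subdirectory matching its own archname to @INC, so a different perl on the compute node could see Bio::Seq but not String::Approx. The helper below locates the installed file so its directory can be compared with the archname of the perl that runs the job; find_module_file and LIB_ROOT are hypothetical names.

```shell
#!/usr/bin/env bash
# Print every file under ROOT that could satisfy "use Module::Name".
find_module_file() {
  root="$1"
  module="$2"
  # Module::Name maps to the relative path Module/Name.pm
  rel="$(printf '%s' "$module" | sed 's|::|/|g').pm"
  find "$root" -type f -path "*/$rel" 2>/dev/null
}

# LIB_ROOT is a placeholder; replace it with the private lib/perl5 directory.
LIB_ROOT="${LIB_ROOT:-$HOME/perl5/lib/perl5}"
find_module_file "$LIB_ROOT" String::Approx

# Compare the directory found above with what the job's perl reports.
if command -v perl >/dev/null 2>&1; then
  perl -V:archname
  perl -V:version
fi
```

If the file turns up under an architecture directory that differs from the archname printed inside the batch job, rebuilding the module with the same perl that the job uses would be the thing to try.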
