notes: running magsearch for christy #2

Open
ctb opened this issue Sep 25, 2022 · 0 comments

editing here: https://hackmd.io/EQG9YLZwQGOeoKWjy-fHFg

Running MAGsearch for Christy

Christy G. asked me to run MAGsearch for her, and I thought I'd document it this time!

1. sketch the genomes

I grabbed all of her genomes and then ran:

sourmash sketch dna -p k=31,scaled=1000 *

in the directory containing the FASTA files.

I then put them in a zip file:

zip -r christy-2022.09.25.zip *.sig

and transferred them to farm (our HPC).
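Before zipping, it can be worth checking that every genome actually got a sketch. A minimal sketch of such a check, assuming the sketches were written next to the inputs as `<input>.sig` (sourmash's default per-file naming when no output file is given); the function name `missing_sigs` is made up for illustration:

```shell
# Print every input file that has no matching "<input>.sig" beside it.
# Assumption: sketches use the default "<input>.sig" naming.
missing_sigs() {
    for fasta in "$@"; do
        [ -f "${fasta}.sig" ] || printf '%s\n' "$fasta"
    done
}
```

Called as `missing_sigs *.fa` (the `.fa` extension is only an example), it prints nothing when every input has a sketch.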

2. unpack the sketches and generate a list

On farm, I went to my MAGsearch directory:

cd ~ctbrown/scratch/magsearch
mkdir query.christy-2022.09.25

and unzipped the sketches into the new directory:

unzip ~/transfer/christy-2022.09.25.zip -d query.christy-2022.09.25

and made a list of the files relative to the base MAGsearch directory:

ls -1 query.christy-2022.09.25/* > query.christy-2022.09.25.txt
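Since the search reads its queries from this list, a quick check that every listed path exists can catch a bad unzip early. A minimal sketch, to be run from the base MAGsearch directory so the relative paths resolve; `check_siglist` is a hypothetical helper, not part of MAGsearch:

```shell
# Return non-zero if any path in the list file is missing or empty.
check_siglist() {
    listfile=$1
    status=0
    while IFS= read -r path; do
        # skip blank lines in the list
        [ -n "$path" ] || continue
        if [ ! -s "$path" ]; then
            printf 'missing or empty: %s\n' "$path" >&2
            status=1
        fi
    done < "$listfile"
    return "$status"
}
```

For example, `check_siglist query.christy-2022.09.25.txt && echo "list looks ok"`.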

3. make a configuration file

I made a new copy of the config file:

cp config.yml config-christy-2022.09.25.yml

and then added the search-specific things:

# unique query name
query_name: christy-2022.09.25

# list of paths of query signatures - 1 or more.
query_sigs: query.christy-2022.09.25.txt

# catalog to search - list of paths of subject signatures
#catalog: /group/ctbrowngrp/sra_search/catalogs/metagenomes
catalog: catalog.sub

# containment threshold to use
threshold: 0.01

# k-mer size to use
ksize: 31

# scaled to use
scaled: 1000

# where to put the results
out_dir: "output.magsearch"
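To double-check what a config will actually do before starting a long run, the active (uncommented) values can be read back out. A rough sketch that only handles flat `key: value` lines like the ones above, not general YAML; `config_value` is a hypothetical helper:

```shell
# Print the value of a top-level "key: value" line, with surrounding
# double quotes stripped. Commented-out lines ("#key: ...") are ignored
# because the pattern is anchored at the start of the line.
config_value() {
    # $1 = key, $2 = config file
    sed -n "s/^$1:[[:space:]]*//p" "$2" | sed 's/^"\(.*\)"$/\1/' | head -n 1
}
```

For example, `config_value catalog config-christy-2022.09.25.yml` shows which catalog is active, and `config_value ksize ...` can be compared by eye against the `k=31` used for sketching.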

4. start an srun session

Next I started screen and ran a beefy srun:

screen -S magsearch-christy
srun -p high2 --time=48:00:00 --nodes=1 --cpus-per-task 32 --mem 50GB --pty /bin/bash

and ran a test:

snakemake -s magsearch.snakefile --configfile config-christy-2022.09.25.yml -j 32

Note that this is only a test: I'm searching a small catalog, catalog.sub, to make sure the queries etc. can all be loaded before we run the real search for a day or two!

5. check logs for test

It looks like all went well:

% cat output.magsearch/logs/sra_search.k31.log
[2022-09-25T12:56:54Z INFO  sra_search] Loading queries
[2022-09-25T12:56:54Z INFO  sra_search] Loaded 27 query signatures
[2022-09-25T12:56:54Z INFO  sra_search] Loading siglist
[2022-09-25T12:56:54Z INFO  sra_search] Loaded 14 sig paths in siglist
[2022-09-25T12:56:54Z INFO  sra_search] Processed 0 search sigs

(The last line is only output every so often, so more than 0 search sigs were actually processed.)
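The "Loaded N query signatures" line can also be compared against the query list. A minimal sketch, assuming the log format shown above and one signature per listed file (a file holding several sketches would make the counts differ legitimately); both function names are made up for illustration:

```shell
# Print the N from the last "Loaded N query signatures" line in a log.
loaded_queries() {
    sed -n 's/.*Loaded \([0-9][0-9]*\) query signatures.*/\1/p' "$1" | tail -n 1
}

# Succeed if the loaded count equals the number of non-empty lines
# in the query list.
queries_match() {
    # $1 = log file, $2 = query list
    loaded=$(loaded_queries "$1")
    expected=$(grep -c . "$2" || true)
    [ "${loaded:-0}" -eq "$expected" ]
}
```

For example, `queries_match output.magsearch/logs/sra_search.k31.log query.christy-2022.09.25.txt || echo "query count mismatch"`.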

6. run for realz

Remove the test output,

rm output.magsearch/results/christy-2022.09.25.csv

edit the config file like so:

# catalog to search - list of paths of subject signatures
catalog: /group/ctbrowngrp/sra_search/catalogs/metagenomes
#catalog: catalog.sub

and run!

snakemake -s magsearch.snakefile --configfile config-christy-2022.09.25.yml -j 32
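The catalog switch between the test run and the real run could also be scripted instead of edited by hand. A minimal sketch, assuming a single active `catalog:` line and a path containing no `|` or `&` characters; unlike the hand edit above it just rewrites the active line and leaves commented-out lines alone. `set_catalog` is a hypothetical helper:

```shell
# Point the active "catalog:" line of a config file at a new path.
# Lines starting with "#catalog:" are left untouched.
set_catalog() {
    # $1 = config file, $2 = new catalog path
    tmp=$(mktemp) || return 1
    # write through a temp file, then copy back so the config keeps
    # its original permissions
    sed "s|^catalog:.*|catalog: $2|" "$1" > "$tmp" && cat "$tmp" > "$1"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

For example, `set_catalog config-christy-2022.09.25.yml /group/ctbrowngrp/sra_search/catalogs/metagenomes` before the real run.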