<a id='home'></a>
### purpose

create GF runs on individual genotypes using the sets of random loci assigned to each run

### notes

After first submitting the training and fitting jobs, I had to go back and add dummy envfiles for the pooled data (symlinks to the existing pooled envfiles), because mvp02.get_envdata asserts that there are 2 envfiles. I then resubmitted only the fitting jobs, since the training jobs had run fine.

### outline

1. [get dirs with training files](#dirs)

    get a list of directories containing genomic data

1. [symlink supplementary files](#sup)

    directories are nested by the number of loci provided for training. There is no need to copy environmental data to each directory, so I symlink it. These files are used by training scripts.
    
1. [create sh files for training and predicting](#shfiles)

    create slurm sbatch files to submit training jobs to the cluster, as well as jobs that use the trained models to predict to common garden environments

    1. [create training shfiles](#training)
    1. [create shfiles for making predictions of the trained GF models to specific environments](#predict)

1. [submit jobs using 500 loci](#submit500)

1. [submit jobs using 5000 loci](#submit5000)

1. [submit jobs using 10000 loci](#submit10k)

1. [submit jobs using 20000 loci](#submit20k)

In [1]:
from pythonimports import *

import MVP_summary_functions as mvp

lview, dview = get_client(cluster_id='1705962011-6977', profile='lotterhos')

outerdir = '/work/lotterhos/brandon/ind_runtimes'

mvp.latest_commit()
session_info.show()

36 36
#########################################################
Today:	January 22, 2024 - 17:20:30 EST
python version: 3.8.5
conda env: mvp_env

Current commit of pythonimports:
commit 419895d157c97717f835390196c13cf973d25eba
Merge: e20434f 1e09b6c
Author: Brandon Lind <lind.brandon.m@gmail.com>

Current commit of MVP_offsets:
commit c5bc403582e5bafc1036be8cd2a4bb0b4d154623
Author: Brandon Lind <lind.brandon.m@gmail.com>  
Date:   Mon Dec 18 14:38:19 2023 -0500
#########################################################



<a id='dirs'></a>
# get dirs with training files
[top](#home)

In [2]:
# get a list of directories for each rep
set_nums = ['00500', '05000', '10000', '20000']

dirs = fs(outerdir, startswith='run', dirs=True)

dst_dirs = defaultdict(list)
for d in dirs:
    rep = op.basename(d)
    print(ColorText(rep).bold())
    
    for set_num in set_nums: 
        dst_dirs[rep].append(f'{d}/{set_num}/gradient_forests/training/training_files')
        
    print(dst_dirs[rep], '\n')

[1mrun_20220919_0-225[0m
['/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/00500/gradient_forests/training/training_files', '/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/05000/gradient_forests/training/training_files', '/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/10000/gradient_forests/training/training_files', '/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/20000/gradient_forests/training/training_files'] 

[1mrun_20220919_225-450[0m
['/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/00500/gradient_forests/training/training_files', '/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/05000/gradient_forests/training/training_files', '/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/10000/gradient_forests/training/training_files', '/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/20000/gradient_forests/training/training_files'] 

[1mrun_20220919_450-675[0m
['/work/lotterhos/brandon/ind_runtimes/run_20220919_4

In [3]:
# get a list of source directories
src_dirs = {}
for rep in dst_dirs.keys():
    print(rep)
    
    src_dir = f'/work/lotterhos/MVP-Offsets/{rep}/gradient_forests/training/training_files'
    assert op.exists(src_dir)
    
    src_dirs[rep] = src_dir

run_20220919_0-225
run_20220919_225-450
run_20220919_450-675
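The set_nums above are zero-padded strings, so the four locus-set directories sort in numeric order when their names are compared as text. A minimal sketch of the same nested layout built with pathlib; the helper training_dirs is hypothetical (the notebook builds these paths with an f-string):

```python
from pathlib import PurePosixPath


def training_dirs(outerdir, rep, loci_counts):
    """Return one training_files path per locus count, nested as
    {outerdir}/{rep}/{zero-padded count}/gradient_forests/training/training_files."""
    dirs = []
    for n in loci_counts:
        set_num = f'{n:05d}'  # zero-pad to 5 digits, e.g. 500 -> '00500'
        path = PurePosixPath(outerdir, rep, set_num,
                             'gradient_forests', 'training', 'training_files')
        dirs.append(str(path))
    return dirs


paths = training_dirs('/work/lotterhos/brandon/ind_runtimes', 'run_20220919_0-225',
                      [500, 5000, 10000, 20000])
print(paths[0])

# without padding, text order and numeric order disagree
print(sorted(str(n) for n in [500, 5000, 10000, 20000]))   # ['10000', '20000', '500', '5000']
print(sorted(f'{n:05d}' for n in [20000, 500, 10000, 5000]))  # ['00500', '05000', '10000', '20000']
```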


<a id='sup'></a>
# symlink supplementary files

[top](#home)

In [4]:
# symlink envfile and rangefile
for rep, d in src_dirs.items():
    files = fs(d, exclude=['maf-gt-p01', 'adaptive', 'neutral'], endswith='ind.txt')
    
    assert len(files) == 450  # 2 files for each of 225 seeds (one envfile, one rangefile)

    for src in pbar(files, desc=rep):
        for d in dst_dirs[rep]:
            dst = f'{d}/{op.basename(src)}'
            
            try:
                os.symlink(src, dst)
            except FileExistsError as e:
                pass

run_20220919_0-225: 100%|███████████████| 450/450 [00:07<00:00, 59.65it/s]
run_20220919_225-450: 100%|███████████████| 450/450 [00:08<00:00, 54.62it/s]
run_20220919_450-675: 100%|███████████████| 450/450 [00:08<00:00, 51.01it/s]
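Because the environmental files are only symlinked, each envfile and rangefile exists once on disk and serves all four locus-set directories, and the loop can be re-run safely because os.symlink raises FileExistsError when the link already exists. A self-contained sketch of that pattern on toy files in a temporary directory (the helper symlink_into and the filenames are illustrative only):

```python
import os
import tempfile
from pathlib import Path


def symlink_into(src_files, dst_dirs):
    """Symlink every file in src_files into every directory in dst_dirs.
    Links that already exist are skipped, so re-running is harmless.
    Returns the number of links created on this call."""
    created = 0
    for src in src_files:
        for d in dst_dirs:
            dst = Path(d) / Path(src).name
            try:
                os.symlink(src, dst)
                created += 1
            except FileExistsError:
                pass  # already linked on a previous run
    return created


with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp, 'src')
    src_dir.mkdir()
    envfile = src_dir / '1231768_envfile_GFready_ind.txt'
    envfile.write_text('env data\n')

    dsts = [Path(tmp, name) for name in ('00500', '05000')]
    for d in dsts:
        d.mkdir()

    first = symlink_into([str(envfile)], dsts)   # creates one link per directory
    second = symlink_into([str(envfile)], dsts)  # rerun: every link already exists
    link_target = os.readlink(dsts[0] / envfile.name)

print(first, second)  # 2 0
```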


In [5]:
# add pooled envfiles to directories so jobs pass mvp02.get_envdata filecount assertion
for rep, repdirs in dst_dirs.items():
    for dst_dir in pbar(repdirs, desc=rep):
#         dst_dir = f'{d}/training/training_files'
#         src_dir = f'/work/lotterhos/MVP-Offsets/{rep}/gradient_forests/training/training_files'
        src_dir = src_dirs[rep]
        assert src_dir != dst_dir

        envfiles = fs(src_dir, endswith='envfile_GFready_pooled.txt')
        assert len(envfiles) == 225

        for src in envfiles:
            dst = f'{dst_dir}/{op.basename(src)}'

            try:
                os.symlink(src, dst)
            except FileExistsError as e:
                pass

run_20220919_0-225: 100%|███████████████| 4/4 [00:04<00:00,  1.21s/it]
run_20220919_225-450: 100%|███████████████| 4/4 [00:04<00:00,  1.10s/it]
run_20220919_450-675: 100%|███████████████| 4/4 [00:03<00:00,  1.02it/s]
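Before resubmitting the fitting jobs, one way to confirm the fix is to count envfiles per seed in each destination directory. This sketch assumes the {seed}_envfile_GFready_{ind,pooled}.txt naming used above and that mvp02.get_envdata wants exactly two envfiles per seed. I have not checked the exact file pattern that assertion uses, and the helper envfile_counts is hypothetical:

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def envfile_counts(directory):
    """Count envfiles per seed, assuming names like '{seed}_envfile_GFready_{ind|pooled}.txt'."""
    counts = defaultdict(int)
    for f in Path(directory).glob('*_envfile_GFready_*.txt'):
        seed = f.name.split('_')[0]
        counts[seed] += 1
    return dict(counts)


with tempfile.TemporaryDirectory() as tmp:
    for name in ['1231768_envfile_GFready_ind.txt',
                 '1231768_envfile_GFready_pooled.txt',
                 '1231768_rangefile_GFready_ind.txt',  # not an envfile, ignored by the glob
                 '1231591_envfile_GFready_ind.txt']:   # toy seed missing its pooled envfile
        Path(tmp, name).touch()
    counts = envfile_counts(tmp)

missing = sorted(seed for seed, n in counts.items() if n != 2)
print(missing)  # ['1231591']
```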


<a id='shfiles'></a>
# create sh files for training and predicting

[top](#home)

<a id='training'></a>
### create training shfiles

[top](#home)

I reuse the training sh files previously created for Lind & Lotterhos (2024), editing the data paths, time and memory requests, and notification email to suit these runs.

In [6]:
dst_dirs[rep]

['/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/00500/gradient_forests/training/training_files',
 '/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/05000/gradient_forests/training/training_files',
 '/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/10000/gradient_forests/training/training_files',
 '/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/20000/gradient_forests/training/training_files']

In [7]:
all_shfiles = []
for rep, src_dir in src_dirs.items():
    shdir = src_dir.replace('_files', '_shfiles')
    
    shfiles = fs(shdir, endswith='ind_all.sh', exclude='watcher')
    
    assert len(shfiles) == 225
    
    for sh in pbar(shfiles, desc=rep):
    
        for dst_dir in dst_dirs[rep]:
            text = read(sh, lines=False)

            assert text.count(f'/home/b.lind/offsets/{rep}') == 2
            
            # replace training files and training outfiles dirs
            text = text.replace(f'/home/b.lind/offsets/{rep}', dst_dir.split('/gradient')[0])
            
            # replace mem and time and email
            text = text.replace('#SBATCH --time=5-00:00:00', '#SBATCH --time=1-00:00:00')
            text = text.replace('#SBATCH --mem=900000M', '#SBATCH --mem=50000M')
            text = text.replace('b.lind@northeastern.edu', 'dummy_email@gmail.com')
            
            dst_shdir = makedir(dst_dir.replace('training_files', 'training_shfiles'))
            dst_outdir = makedir(dst_dir.replace('training_files', 'training_outfiles'))
            
            newsh = f'{dst_shdir}/{op.basename(sh)}'
            with open(newsh, 'w') as o:
                o.write(text)
            all_shfiles.append(newsh)
            
len(all_shfiles)

run_20220919_0-225: 100%|███████████████| 225/225 [00:09<00:00, 23.68it/s]
run_20220919_225-450: 100%|███████████████| 225/225 [00:09<00:00, 23.38it/s]
run_20220919_450-675: 100%|███████████████| 225/225 [00:13<00:00, 16.65it/s]


2700
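The 2700 scripts are 3 reps × 225 seeds × 4 locus sets. The cell above asserts that the old home-directory path appears exactly twice, but str.replace silently does nothing when its target is missing, so an unexpected --time or --mem line in a template would go unnoticed. A minimal sketch of a stricter replacement (the helper checked_replace is hypothetical, not part of pythonimports):

```python
def checked_replace(text, old, new, count=1):
    """Replace `old` with `new`, raising ValueError unless `old` occurs exactly `count` times.
    Plain str.replace returns the text unchanged when `old` is absent."""
    found = text.count(old)
    if found != count:
        raise ValueError(f'expected {count} occurrence(s) of {old!r}, found {found}')
    return text.replace(old, new)


template = '#SBATCH --time=5-00:00:00\n#SBATCH --mem=900000M\n'
text = checked_replace(template, '#SBATCH --time=5-00:00:00', '#SBATCH --time=1-00:00:00')
text = checked_replace(text, '#SBATCH --mem=900000M', '#SBATCH --mem=50000M')
print(text)

try:
    # the 5-day limit was already replaced, so this call raises instead of silently passing
    checked_replace(text, '#SBATCH --time=5-00:00:00', '#SBATCH --time=1-00:00:00')
except ValueError as err:
    print(err)
```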

In [8]:
print(text)

#!/bin/bash
#SBATCH --job-name=1231768_GF_training_ind_all
#SBATCH --time=1-00:00:00
#SBATCH --mem=50000M
#SBATCH --partition=long
#SBATCH --output=1231768_GF_training_ind_all_%j.out
#SBATCH --mail-user=dummy_email@gmail.com
#SBATCH --mail-type=FAIL

source $HOME/.bashrc  # assumed that conda init is within .bashrc
conda deactivate
conda activate r35

cd /work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/20000/gradient_forests/training/training_files

/home/b.lind/anaconda3/envs/r35/lib/R/bin/Rscript \
/home/b.lind/code/MVP-offsets/01_src/MVP_gf_training_script.R \
1231768_Rout_Gmat_sample_maf-gt-p01_GFready_ind_all.txt \
1231768_envfile_GFready_ind.txt \
1231768_rangefile_GFready_ind.txt \
1231768_GF_training_ind_all \
/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/20000/gradient_forests/training/training_outfiles




<a id='predict'></a>
### create shfiles for making predictions of the trained GF models to specific environments

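MVP_02 doesn't use its slimdir argument, so the slimdir path passed to it doesn't matter.

MVP_03 needs the original slimdir to get the fitness, envdata, locations, etc. used for validation, so its slimdir argument still points to /home/b.lind/offsets/{rep}/slimdir.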

[top](#home)

In [9]:
for rep, src_dir in src_dirs.items():
    fitting_shdir = src_dir.replace('training/training_files', 'fitting/fitting_shfiles')
    
    shfiles = fs(fitting_shdir, endswith='.sh', exclude='watcher')
    
    for sh in pbar(shfiles, desc=rep):
        for dst_dir in dst_dirs[rep]:
            dst_shdir = makedir(dst_dir.replace('training/training_files', 'fitting/fitting_shfiles'))
            new_sh = f'{dst_shdir}/{op.basename(sh)}'
            
            text = read(sh, lines=True, ignore_blank=True)
            
            assert 'time' in text[2]
            text[2] = '#SBATCH --time=1-00:00:00'

            assert 'MVP_02' in text[-2]
            text[-2] = text[-2].replace(f'/home/b.lind/offsets/{rep}',
                                        dst_dir.split('/grad')[0])
            text[-2] += ' 1 pooled'  # expect one RDS file and exclude pooled (non-default for MVP_02)
            
            assert 'MVP_03' in text[-1]
            text[-1] = text[-1].replace(f'/home/b.lind/offsets/{rep}/gradient_forests',
                                        dst_dir.split('/training')[0])
            text[-1] += ' 100 pooled'  # expect 100 RDS files and exclude pooled (non-default for MVP_03)
            
            if 'partition' in text[5]:
                text.remove(text[5])
            
            # erase previous dependencies, will update after sbatching
            assert 'dependency' in text[6]
            text[6] = '#SBATCH --dependency=afterok:'
            
            assert 'mail' in text[7]
            text[7] = '#SBATCH --mail-user=dummy_email@gmail.com'  # new code, don't want 1e6 emails
            
            
            with open(new_sh, 'w') as o:
                o.write('\n'.join(text))
dst_dir 

run_20220919_0-225: 100%|███████████████| 225/225 [00:09<00:00, 23.66it/s]
run_20220919_225-450: 100%|███████████████| 225/225 [00:08<00:00, 26.68it/s]
run_20220919_450-675: 100%|███████████████| 225/225 [00:07<00:00, 28.29it/s]


'/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/20000/gradient_forests/training/training_files'
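The fitting-script cell edits lines by position (text[2], text[5], text[6], text[7]) and relies on asserts to catch misalignment; removing the partition line shifts every later index by one. A sketch of a key-based alternative, assuming each directive is written as #SBATCH --key=value on its own line (the helpers set_sbatch_option and remove_sbatch_option are hypothetical):

```python
def set_sbatch_option(lines, key, value):
    """Set '#SBATCH --{key}={value}' in a list of script lines (modified in place).
    Replaces an existing directive for `key`; otherwise inserts it after the last
    '#SBATCH' line. Raises ValueError if the script has no '#SBATCH' lines at all."""
    prefix = f'#SBATCH --{key}='
    new_line = f'{prefix}{value}'
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            lines[i] = new_line
            return lines
    last = max(i for i, line in enumerate(lines) if line.startswith('#SBATCH'))
    lines.insert(last + 1, new_line)
    return lines


def remove_sbatch_option(lines, key):
    """Return a new list without any '#SBATCH --{key}=' directive."""
    prefix = f'#SBATCH --{key}='
    return [line for line in lines if not line.startswith(prefix)]


script = ['#!/bin/bash',
          '#SBATCH --job-name=1231768_gf_fitting',
          '#SBATCH --time=5-00:00:00',
          '#SBATCH --partition=long',
          '#SBATCH --dependency=afterok:12345',
          'python MVP_02_fit_gradient_forests.py 1231768']

script = remove_sbatch_option(script, 'partition')
set_sbatch_option(script, 'time', '1-00:00:00')
set_sbatch_option(script, 'dependency', 'afterok:')  # cleared, filled in after sbatching training
set_sbatch_option(script, 'mail-user', 'dummy_email@gmail.com')  # absent, so it is appended
print('\n'.join(script))
```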

In [10]:
text

['#!/bin/bash',
 '#SBATCH --job-name=1231768_gf_fitting',
 '#SBATCH --time=1-00:00:00',
 '#SBATCH --ntasks=1',
 '#SBATCH --mem=300000M',
 '#SBATCH --output=1231768_gf_fitting_%j.out',
 '#SBATCH --dependency=afterok:',
 '#SBATCH --mail-user=dummy_email@gmail.com',
 '#SBATCH --mail-type=FAIL',
 '#SBATCH --nodes=1',
 '#SBATCH --cpus-per-task=7',
 'cd /home/b.lind/code/MVP-offsets/01_src',
 'source $HOME/.bashrc',
 'conda activate mvp_env',
 'python MVP_02_fit_gradient_forests.py 1231768 /work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/20000/slimdir /work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/20000/gradient_forests/training/training_outfiles /home/b.lind/anaconda3/envs/r35/lib/R/bin/Rscript 1 pooled',
 'python MVP_03_validate_gradient_forests.py 1231768 /home/b.lind/offsets/run_20220919_450-675/slimdir /work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/20000/gradient_forests 100 pooled']

In [11]:
new_sh

'/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/20000/gradient_forests/fitting/fitting_shfiles/1231768_gf_fitting.sh'

In [12]:
read(new_sh)

['#!/bin/bash',
 '#SBATCH --job-name=1231768_gf_fitting',
 '#SBATCH --time=1-00:00:00',
 '#SBATCH --ntasks=1',
 '#SBATCH --mem=300000M',
 '#SBATCH --output=1231768_gf_fitting_%j.out',
 '#SBATCH --dependency=afterok:',
 '#SBATCH --mail-user=dummy_email@gmail.com',
 '#SBATCH --mail-type=FAIL',
 '#SBATCH --nodes=1',
 '#SBATCH --cpus-per-task=7',
 'cd /home/b.lind/code/MVP-offsets/01_src',
 'source $HOME/.bashrc',
 'conda activate mvp_env',
 'python MVP_02_fit_gradient_forests.py 1231768 /work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/20000/slimdir /work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/20000/gradient_forests/training/training_outfiles /home/b.lind/anaconda3/envs/r35/lib/R/bin/Rscript 1 pooled',
 'python MVP_03_validate_gradient_forests.py 1231768 /home/b.lind/offsets/run_20220919_450-675/slimdir /work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/20000/gradient_forests 100 pooled']

<a id='submit500'></a>
# submit jobs using 500 loci

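I have to be careful about which shfiles I submit, since my code relies on job names not being duplicated in the slurm queue. The scripts in each locus-set directory share the same job names (e.g. 1231768_GF_training_ind_all), so I submit one locus set at a time.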

[top](#home)
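Because all four locus-set directories hold scripts for the same 225 seeds, the same job name (for example 1231768_GF_training_ind_all) exists four times per rep. A sketch of a pre-submission check for names that repeat within a batch or clash with names already queued; it assumes, as in the scripts printed above, that a script's job name equals its basename without .sh, and check_unique_job_names is a hypothetical helper:

```python
from collections import Counter
from pathlib import Path


def job_name(shfile):
    """Assumed job name: the script's basename without the .sh suffix."""
    return Path(shfile).stem


def check_unique_job_names(shfiles, queued_names=()):
    """Return sorted job names that repeat within `shfiles` or are already in `queued_names`."""
    counts = Counter(job_name(sh) for sh in shfiles)
    dupes = {name for name, n in counts.items() if n > 1}
    clashes = set(counts) & set(queued_names)
    return sorted(dupes | clashes)


base = '/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675'
one_set = [f'{base}/00500/gradient_forests/training/training_shfiles/1231768_GF_training_ind_all.sh']
two_sets = one_set + [f'{base}/05000/gradient_forests/training/training_shfiles/1231768_GF_training_ind_all.sh']

print(check_unique_job_names(one_set))   # []
print(check_unique_job_names(two_sets))  # ['1231768_GF_training_ind_all']
print(check_unique_job_names(one_set, queued_names=['1231768_GF_training_ind_all']))
```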

### submit training files

In [14]:
rep

'run_20220919_450-675'

In [15]:
dst_dirs[rep]

['/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/00500/gradient_forests/training/training_files',
 '/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/05000/gradient_forests/training/training_files',
 '/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/10000/gradient_forests/training/training_files',
 '/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/20000/gradient_forests/training/training_files']

In [16]:
d_index = 0  # which locus-set dir I'm submitting (0 -> 00500)

jobnames = []
pids = defaultdict(dict)
for rep, repdirs in dst_dirs.items():
    sh_dir = repdirs[d_index].replace('training_files', 'training_shfiles')
    
    shfiles = fs(sh_dir, endswith='.sh', exclude='watcher')
    
    assert len(shfiles) == 225
    
    for sh in pbar(shfiles, desc=rep):
        jobnames.append(op.basename(sh))
        seed = op.basename(sh).split("_")[0]
        pids[rep][seed] = sbatch(sh, progress_bar=False)

luni(jobnames), len(jobnames)

run_20220919_0-225: 100%|███████████████| 225/225 [00:34<00:00,  6.58it/s]
run_20220919_225-450: 100%|███████████████| 225/225 [00:38<00:00,  5.91it/s]
run_20220919_450-675: 100%|███████████████| 225/225 [00:42<00:00,  5.29it/s]


(675, 675)

### submit fitting files

In [17]:
fitting_shfiles = []
for (rep, seed), pid in unwrap_dictionary(pids, progress_bar=True):
    fitting_shdir = dst_dirs[rep][d_index].replace('training/training_files', 'fitting/fitting_shfiles')
    fitting_sh = f'{fitting_shdir}/{seed}_gf_fitting.sh'

    text = read(fitting_sh)

    assert 'dependency' in text[6]
    text[6] = f'#SBATCH --dependency=afterok:{pid[0]}'

    with open(fitting_sh, 'w') as o:
        o.write('\n'.join(text))

    fitting_shfiles.append(fitting_sh)

print(len(fitting_shfiles))

text

100%|███████████████| 3/3 [00:06<00:00,  2.13s/it]

675





['#!/bin/bash',
 '#SBATCH --job-name=1231768_gf_fitting',
 '#SBATCH --time=1-00:00:00',
 '#SBATCH --ntasks=1',
 '#SBATCH --mem=300000M',
 '#SBATCH --output=1231768_gf_fitting_%j.out',
 '#SBATCH --dependency=afterok:40438394',
 '#SBATCH --mail-user=dummy_email@gmail.com',
 '#SBATCH --mail-type=FAIL',
 '#SBATCH --nodes=1',
 '#SBATCH --cpus-per-task=7',
 'cd /home/b.lind/code/MVP-offsets/01_src',
 'source $HOME/.bashrc',
 'conda activate mvp_env',
 'python MVP_02_fit_gradient_forests.py 1231768 /work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/00500/slimdir /work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/00500/gradient_forests/training/training_outfiles /home/b.lind/anaconda3/envs/r35/lib/R/bin/Rscript 1 pooled',
 'python MVP_03_validate_gradient_forests.py 1231768 /home/b.lind/offsets/run_20220919_450-675/slimdir /work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/00500/gradient_forests 100 pooled']
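Each fitting script waits on its own training job through #SBATCH --dependency=afterok:<jobid>, so fitting starts only after that training job completes successfully. The notebook's sbatch helper already returns the job ids (pid[0]); for readers without pythonimports, this sketch assumes the standard confirmation message that sbatch prints, Submitted batch job <id> (sbatch --parsable prints only the id). The helpers parse_job_id and with_dependency are hypothetical:

```python
import re


def parse_job_id(sbatch_stdout):
    """Extract the job id from sbatch's 'Submitted batch job <id>' message."""
    match = re.search(r'Submitted batch job (\d+)', sbatch_stdout)
    if match is None:
        raise ValueError(f'no job id in: {sbatch_stdout!r}')
    return match.group(1)


def with_dependency(lines, job_id, line_index=6):
    """Return a copy of `lines` whose dependency directive waits on `job_id`.
    line_index=6 mirrors the position used in the notebook's fitting scripts."""
    if 'dependency' not in lines[line_index]:
        raise ValueError(f'line {line_index} is not a dependency directive')
    new = list(lines)
    new[line_index] = f'#SBATCH --dependency=afterok:{job_id}'
    return new


template = ['#!/bin/bash',
            '#SBATCH --job-name=1231768_gf_fitting',
            '#SBATCH --time=1-00:00:00',
            '#SBATCH --ntasks=1',
            '#SBATCH --mem=300000M',
            '#SBATCH --output=1231768_gf_fitting_%j.out',
            '#SBATCH --dependency=afterok:']

job_id = parse_job_id('Submitted batch job 40438394\n')
chained = with_dependency(template, job_id)
print(chained[6])  # #SBATCH --dependency=afterok:40438394
```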

In [18]:
fitting_pids = sbatch(fitting_shfiles)

sbatching: 100%|███████████████| 675/675 [02:30<00:00,  4.48it/s]


### update job times/mems

In [19]:
sq = Squeue(grepping='train')

sq

🗒️  Queue Summary:
{'long': {'PD': 655, 'R': 20}}

In [20]:
Squeue(grepping='fit')

🗒️  Queue Summary:
{'short': {'PD': 675}}

In [21]:
Squeue(grepping='train').update(to_partition='short', num_jobs=0.5)

update: 100%|███████████████| 325/325 [00:22<00:00, 14.38it/s]



In [22]:
Squeue(grepping='fit').update(to_partition='long', num_jobs=0.5)

update: 100%|███████████████| 338/338 [00:23<00:00, 14.39it/s]
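The Squeue(...).update(to_partition=..., num_jobs=0.5) calls come from pythonimports, and I am not certain how they choose jobs or round the fraction. The apparent goal is to spread the pending jobs of each type across the short and long partitions. A hypothetical sketch of the underlying operation with Slurm's scontrol update, which can change the partition of a pending job; it only builds the commands, and rounding up is an arbitrary choice here:

```python
import math


def partition_moves(pending_ids, fraction, to_partition):
    """Build (but do not run) scontrol commands that move the first
    ceil(len(pending_ids) * fraction) pending jobs to `to_partition`."""
    n = math.ceil(len(pending_ids) * fraction)
    return [f'scontrol update JobId={jid} Partition={to_partition}'
            for jid in pending_ids[:n]]


pending = ['101', '102', '103', '104', '105']  # toy job ids
cmds = partition_moves(pending, 0.5, 'short')
for cmd in cmds:
    print(cmd)
```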


In [23]:
Squeue()

🗒️  Queue Summary:
{'long': {'PD': 663, 'R': 25},
 'lotterhos': {'R': 1},
 'short': {'PD': 612, 'R': 50}}

In [24]:
dst_dirs

defaultdict(list,
            {'run_20220919_0-225': ['/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/00500/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/05000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/10000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/20000/gradient_forests/training/training_files'],
             'run_20220919_225-450': ['/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/00500/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/05000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/10000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/20000/gradient_f

<a id='submit5000'></a>
# submit jobs using 5000 loci
[top](#home)

In [1]:
from pythonimports import *

import MVP_summary_functions as mvp

lview, dview = get_client(cluster_id='1706017763-opoj', profile='lotterhos')

outerdir = '/work/lotterhos/brandon/ind_runtimes'

mvp.latest_commit()
session_info.show()

36 36
#########################################################
Today:	January 23, 2024 - 08:52:27 EST
python version: 3.8.5
conda env: mvp_env

Current commit of pythonimports:
commit 419895d157c97717f835390196c13cf973d25eba
Merge: e20434f 1e09b6c
Author: Brandon Lind <lind.brandon.m@gmail.com>

Current commit of MVP_offsets:
commit c5bc403582e5bafc1036be8cd2a4bb0b4d154623
Author: Brandon Lind <lind.brandon.m@gmail.com>  
Date:   Mon Dec 18 14:38:19 2023 -0500
#########################################################



In [2]:
dst_dirs = defaultdict(list,
            {'run_20220919_0-225': ['/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/00500/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/05000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/10000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/20000/gradient_forests/training/training_files'],
             'run_20220919_225-450': ['/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/00500/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/05000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/10000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/20000/gradient_forests/training/training_files'],
             'run_20220919_450-675': ['/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/00500/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/05000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/10000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/20000/gradient_forests/training/training_files']})

In [5]:
d_index = 1  # which locus-set dir I'm submitting (1 -> 05000)

jobnames = []
pids = defaultdict(dict)
for rep, repdirs in dst_dirs.items():
    print(repdirs[d_index])

/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/05000/gradient_forests/training/training_files
/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/05000/gradient_forests/training/training_files
/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/05000/gradient_forests/training/training_files


In [6]:
jobnames = []
pids = defaultdict(dict)
for rep, repdirs in dst_dirs.items():
    sh_dir = repdirs[d_index].replace('training_files', 'training_shfiles')
    
    shfiles = fs(sh_dir, endswith='.sh', exclude='watcher')
    
    assert len(shfiles) == 225
    
    for sh in pbar(shfiles, desc=rep):
        jobnames.append(op.basename(sh))
        seed = op.basename(sh).split("_")[0]
        pids[rep][seed] = sbatch(sh, progress_bar=False)

luni(jobnames), len(jobnames)

run_20220919_0-225: 100%|███████████████| 225/225 [00:30<00:00,  7.43it/s]
run_20220919_225-450: 100%|███████████████| 225/225 [00:36<00:00,  6.20it/s]
run_20220919_450-675: 100%|███████████████| 225/225 [00:40<00:00,  5.61it/s]


(675, 675)

In [7]:
fitting_shfiles = []
for (rep, seed), pid in unwrap_dictionary(pids, progress_bar=True):
    fitting_shdir = dst_dirs[rep][d_index].replace('training/training_files', 'fitting/fitting_shfiles')
    fitting_sh = f'{fitting_shdir}/{seed}_gf_fitting.sh'

    text = read(fitting_sh)

    assert 'dependency' in text[6]
    text[6] = f'#SBATCH --dependency=afterok:{pid[0]}'

    with open(fitting_sh, 'w') as o:
        o.write('\n'.join(text))

    fitting_shfiles.append(fitting_sh)

print(len(fitting_shfiles))

text

100%|███████████████| 3/3 [00:03<00:00,  1.29s/it]

675





['#!/bin/bash',
 '#SBATCH --job-name=1231768_gf_fitting',
 '#SBATCH --time=1-00:00:00',
 '#SBATCH --ntasks=1',
 '#SBATCH --mem=300000M',
 '#SBATCH --output=1231768_gf_fitting_%j.out',
 '#SBATCH --dependency=afterok:40446486',
 '#SBATCH --mail-user=dummy_email@gmail.com',
 '#SBATCH --mail-type=FAIL',
 '#SBATCH --nodes=1',
 '#SBATCH --cpus-per-task=7',
 'cd /home/b.lind/code/MVP-offsets/01_src',
 'source $HOME/.bashrc',
 'conda activate mvp_env',
 'python MVP_02_fit_gradient_forests.py 1231768 /work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/05000/slimdir /work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/05000/gradient_forests/training/training_outfiles /home/b.lind/anaconda3/envs/r35/lib/R/bin/Rscript 1 pooled',
 'python MVP_03_validate_gradient_forests.py 1231768 /home/b.lind/offsets/run_20220919_450-675/slimdir /work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/05000/gradient_forests 100 pooled']

In [8]:
fitting_pids = sbatch(fitting_shfiles)  # one job cancelled because the previous 1231591_gf_fitting job was still in the queue (duplicate job name)

sbatching:  74%|███████████    | 497/675 [01:44<00:39,  4.49it/s]
scancel: 100%|███████████████| 1/1 [00:00<00:00, 15.52it/s]
sbatching: 100%|███████████████| 675/675 [02:26<00:00,  4.62it/s]


In [9]:
Squeue(grepping='train').update(to_partition='short', num_jobs=0.5)

update: 100%|███████████████| 326/326 [00:20<00:00, 16.18it/s]


In [10]:
Squeue(grepping='fit').update(to_partition='long', num_jobs=0.5)

update: 100%|███████████████| 337/337 [00:20<00:00, 16.30it/s]


In [11]:
Squeue()

🗒️  Queue Summary:
{'long': {'PD': 662, 'R': 25},
 'lotterhos': {'R': 1},
 'short': {'PD': 613, 'R': 50}}

<a id='submit10k'></a>
# submit jobs using 10000 loci

[top](#home)

In [1]:
from pythonimports import *

import MVP_summary_functions as mvp

lview, dview = get_client(cluster_id='1706191002-fj53', profile='lotterhos')

outerdir = '/work/lotterhos/brandon/ind_runtimes'

mvp.latest_commit()
session_info.show()

8 8
#########################################################
Today:	January 25, 2024 - 08:56:51 EST
python version: 3.8.5
conda env: mvp_env

Current commit of pythonimports:
commit 419895d157c97717f835390196c13cf973d25eba
Merge: e20434f 1e09b6c
Author: Brandon Lind <lind.brandon.m@gmail.com>

Current commit of MVP_offsets:
commit c5bc403582e5bafc1036be8cd2a4bb0b4d154623
Author: Brandon Lind <lind.brandon.m@gmail.com>  
Date:   Mon Dec 18 14:38:19 2023 -0500
#########################################################



In [2]:
dst_dirs = defaultdict(list,
            {'run_20220919_0-225': ['/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/00500/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/05000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/10000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/20000/gradient_forests/training/training_files'],
             'run_20220919_225-450': ['/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/00500/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/05000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/10000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/20000/gradient_forests/training/training_files'],
             'run_20220919_450-675': ['/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/00500/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/05000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/10000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/20000/gradient_forests/training/training_files']})

In [3]:
d_index = 2  # which locus-set dir I'm submitting (2 -> 10000)

jobnames = []
pids = defaultdict(dict)
for rep, repdirs in dst_dirs.items():
    print(repdirs[d_index])

/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/10000/gradient_forests/training/training_files
/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/10000/gradient_forests/training/training_files
/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/10000/gradient_forests/training/training_files


In [4]:
jobnames = []
pids = defaultdict(dict)
for rep, repdirs in dst_dirs.items():
    sh_dir = repdirs[d_index].replace('training_files', 'training_shfiles')
    
    shfiles = fs(sh_dir, endswith='.sh', exclude='watcher')
    
    assert len(shfiles) == 225
    
    for sh in pbar(shfiles, desc=rep):
        jobnames.append(op.basename(sh))
        seed = op.basename(sh).split("_")[0]
        pids[rep][seed] = sbatch(sh, progress_bar=False)

luni(jobnames), len(jobnames)

run_20220919_0-225: 100%|███████████████| 225/225 [00:31<00:00,  7.10it/s]
run_20220919_225-450: 100%|███████████████| 225/225 [00:36<00:00,  6.09it/s]
run_20220919_450-675: 100%|███████████████| 225/225 [00:40<00:00,  5.50it/s]


(675, 675)

In [6]:
fitting_shfiles = []
for (rep, seed), pid in unwrap_dictionary(pids, progress_bar=True):
    fitting_shdir = dst_dirs[rep][d_index].replace('training/training_files', 'fitting/fitting_shfiles')
    fitting_sh = f'{fitting_shdir}/{seed}_gf_fitting.sh'

    text = read(fitting_sh)

    assert 'dependency' in text[6]
    text[6] = f'#SBATCH --dependency=afterok:{pid[0]}'

    with open(fitting_sh, 'w') as o:
        o.write('\n'.join(text))

    fitting_shfiles.append(fitting_sh)

print(len(fitting_shfiles))

text

100%|███████████████| 3/3 [00:02<00:00,  1.11it/s]

675





['#!/bin/bash',
 '#SBATCH --job-name=1231768_gf_fitting',
 '#SBATCH --time=1-00:00:00',
 '#SBATCH --ntasks=1',
 '#SBATCH --mem=300000M',
 '#SBATCH --output=1231768_gf_fitting_%j.out',
 '#SBATCH --dependency=afterok:40475630',
 '#SBATCH --mail-user=dummy_email@gmail.com',
 '#SBATCH --mail-type=FAIL',
 '#SBATCH --nodes=1',
 '#SBATCH --cpus-per-task=7',
 'cd /home/b.lind/code/MVP-offsets/01_src',
 'source $HOME/.bashrc',
 'conda activate mvp_env',
 'python MVP_02_fit_gradient_forests.py 1231768 /work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/10000/slimdir /work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/10000/gradient_forests/training/training_outfiles /home/b.lind/anaconda3/envs/r35/lib/R/bin/Rscript 1 pooled',
 'python MVP_03_validate_gradient_forests.py 1231768 /home/b.lind/offsets/run_20220919_450-675/slimdir /work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/10000/gradient_forests 100 pooled']

In [7]:
fitting_pids = sbatch(fitting_shfiles)

sbatching: 100%|███████████████| 675/675 [02:25<00:00,  4.63it/s]


In [8]:
Squeue(grepping='train').update(to_partition='short', num_jobs=0.5)

Squeue(grepping='fit').update(to_partition='long', num_jobs=0.5)

Squeue()

update: 100%|███████████████| 325/325 [00:19<00:00, 16.35it/s]
update: 100%|███████████████| 338/338 [00:20<00:00, 16.70it/s]


🗒️  Queue Summary:
{'long': {'PD': 663, 'R': 25},
 'lotterhos': {'R': 1},
 'short': {'PD': 657, 'R': 5}}

In [12]:
len(Squeue())

1351

In [13]:
675*2

1350
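len(Squeue()) returns 1351, one more than the 675 training plus 675 fitting jobs. The Queue Summary printed by In [8] also totals 1351, and the extra job is the single running job on the lotterhos partition (presumably the engines started for get_client, which is an assumption). A small sketch that totals such a summary, with the counts transcribed from that output:

```python
def total_jobs(summary, exclude_partitions=()):
    """Sum job counts in a summary shaped like {partition: {state: count}},
    skipping any partition listed in exclude_partitions."""
    return sum(n
               for partition, states in summary.items() if partition not in exclude_partitions
               for n in states.values())


# counts transcribed from the Queue Summary printed by In [8]
summary = {'long': {'PD': 663, 'R': 25},
           'lotterhos': {'R': 1},
           'short': {'PD': 657, 'R': 5}}

print(total_jobs(summary))                                    # 1351
print(total_jobs(summary, exclude_partitions={'lotterhos'}))  # 1350, i.e. 675 * 2
```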

<a id='submit20k'></a>
# submit jobs using 20000 loci
[top](#home)

In [1]:
from pythonimports import *

import MVP_summary_functions as mvp

lview, dview = get_client(cluster_id='1706883711-ub9x', profile='lotterhos')

outerdir = '/work/lotterhos/brandon/ind_runtimes'

mvp.latest_commit()
session_info.show()

36 36
#########################################################
Today:	February 02, 2024 - 10:04:21 EST
python version: 3.8.5
conda env: mvp_env

Current commit of pythonimports:
commit 419895d157c97717f835390196c13cf973d25eba
Merge: e20434f 1e09b6c
Author: Brandon Lind <lind.brandon.m@gmail.com>

Current commit of MVP_offsets:
commit c0df4e9120d165dc9d594d671de5ec99887e874d
Author: Brandon Lind <lind.brandon.m@gmail.com>  
Date:   Mon Jan 29 20:40:49 2024 -0500
#########################################################



In [2]:
dst_dirs = defaultdict(list,
            {'run_20220919_0-225': ['/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/00500/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/05000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/10000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/20000/gradient_forests/training/training_files'],
             'run_20220919_225-450': ['/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/00500/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/05000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/10000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/20000/gradient_forests/training/training_files'],
             'run_20220919_450-675': ['/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/00500/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/05000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/10000/gradient_forests/training/training_files',
              '/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/20000/gradient_forests/training/training_files']})

In [3]:
d_index = 3  # index into each rep's list in dst_dirs: 0 -> 00500, 1 -> 05000, 2 -> 10000, 3 -> 20000 loci

for rep, repdirs in dst_dirs.items():
    print(repdirs[d_index])

/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/20000/gradient_forests/training/training_files
/work/lotterhos/brandon/ind_runtimes/run_20220919_225-450/20000/gradient_forests/training/training_files
/work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/20000/gradient_forests/training/training_files
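The cells below derive the sh-file directories from each `training_files` path with `str.replace`. A minimal pathlib sketch of the same mapping follows, assuming every path ends in `<nloci>/gradient_forests/training/training_files` as in `dst_dirs`; `derive_dirs` is a hypothetical helper, not part of pythonimports or MVP_offsets. Walking up with `.parent` avoids accidentally rewriting a matching substring elsewhere in the path.

```python
from pathlib import Path


def derive_dirs(training_files_dir: str) -> dict:
    """Map a training_files dir to its sibling sh-file dirs (assumed layout)."""
    p = Path(training_files_dir)
    # expected layout: .../<nloci>/gradient_forests/training/training_files
    training_dir = p.parent          # .../<nloci>/gradient_forests/training
    gf_dir = training_dir.parent     # .../<nloci>/gradient_forests
    return {
        'nloci': gf_dir.parent.name,
        'training_shfiles': str(training_dir / 'training_shfiles'),
        'fitting_shfiles': str(gf_dir / 'fitting' / 'fitting_shfiles'),
    }


d = derive_dirs('/work/lotterhos/brandon/ind_runtimes/run_20220919_0-225/20000/'
                'gradient_forests/training/training_files')
print(d['nloci'], d['fitting_shfiles'])
```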


In [4]:
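jobnames = []
pids = defaultdict(dict)  # pids[rep][seed] -> job IDs returned by sbatch for that training script
for rep, repdirs in dst_dirs.items():
    sh_dir = repdirs[d_index].replace('training_files', 'training_shfiles')

    # training scripts for this rep, excluding watcher scripts
    shfiles = fs(sh_dir, endswith='.sh', exclude='watcher')

    assert len(shfiles) == 225  # one training script per seed in this rep

    for sh in pbar(shfiles, desc=rep):
        jobnames.append(op.basename(sh))
        seed = op.basename(sh).split("_")[0]  # file names start with the seed
        pids[rep][seed] = sbatch(sh, progress_bar=False)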

luni(jobnames), len(jobnames)

run_20220919_0-225: 100%|███████████████| 225/225 [00:34<00:00,  6.58it/s]
run_20220919_225-450: 100%|███████████████| 225/225 [00:42<00:00,  5.30it/s]
run_20220919_450-675: 100%|███████████████| 225/225 [00:44<00:00,  5.07it/s]


(675, 675)
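The next cell uses `pid[0]`, so the pythonimports `sbatch` helper appears to return a list of job IDs; its internals are not shown here. As a hedged stand-in, the job ID can be recovered from sbatch's standard `Submitted batch job <id>` message. `parse_jobid` and `submit` are hypothetical helpers, and `submit` only works on a host with Slurm installed.

```python
import re
import subprocess


def parse_jobid(sbatch_stdout: str) -> str:
    """Extract the job ID from sbatch's 'Submitted batch job <id>' message."""
    m = re.search(r'Submitted batch job (\d+)', sbatch_stdout)
    if m is None:
        raise ValueError(f'unexpected sbatch output: {sbatch_stdout!r}')
    return m.group(1)


def submit(sh_path: str) -> str:
    """Submit one script with sbatch and return its job ID (requires Slurm)."""
    out = subprocess.run(['sbatch', sh_path], capture_output=True, text=True, check=True)
    return parse_jobid(out.stdout)


print(parse_jobid('Submitted batch job 40622751\n'))
```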

In [6]:
fitting_shfiles = []
for (rep, seed), pid in unwrap_dictionary(pids, progress_bar=True):
    fitting_shdir = dst_dirs[rep][d_index].replace('training/training_files', 'fitting/fitting_shfiles')
    fitting_sh = f'{fitting_shdir}/{seed}_gf_fitting.sh'

    text = read(fitting_sh)  # lines of the fitting script

    # line 7 holds the dependency directive; point it at this seed's training job
    # so fitting only starts once training finishes successfully (afterok)
    assert 'dependency' in text[6]
    text[6] = f'#SBATCH --dependency=afterok:{pid[0]}'

    with open(fitting_sh, 'w') as o:
        o.write('\n'.join(text))

    fitting_shfiles.append(fitting_sh)

print(len(fitting_shfiles))

text

100%|███████████████| 3/3 [00:03<00:00,  1.27s/it]

675





['#!/bin/bash',
 '#SBATCH --job-name=1231768_gf_fitting',
 '#SBATCH --time=1-00:00:00',
 '#SBATCH --ntasks=1',
 '#SBATCH --mem=300000M',
 '#SBATCH --output=1231768_gf_fitting_%j.out',
 '#SBATCH --dependency=afterok:40622751',
 '#SBATCH --mail-user=dummy_email@gmail.com',
 '#SBATCH --mail-type=FAIL',
 '#SBATCH --nodes=1',
 '#SBATCH --cpus-per-task=7',
 'cd /home/b.lind/code/MVP-offsets/01_src',
 'source $HOME/.bashrc',
 'conda activate mvp_env',
 'python MVP_02_fit_gradient_forests.py 1231768 /work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/20000/slimdir /work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/20000/gradient_forests/training/training_outfiles /home/b.lind/anaconda3/envs/r35/lib/R/bin/Rscript 1 pooled',
 'python MVP_03_validate_gradient_forests.py 1231768 /home/b.lind/offsets/run_20220919_450-675/slimdir /work/lotterhos/brandon/ind_runtimes/run_20220919_450-675/20000/gradient_forests 100 pooled']
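The cell above overwrites a fixed line, `text[6]`, after asserting that it mentions the dependency. A position-independent sketch follows, assuming the scripts use `#SBATCH --dependency=` directives like the one printed above; `set_dependency` is a hypothetical helper. With `afterok:<id>`, Slurm makes the fitting job eligible to start only after the training job ends with exit code 0. If no directive exists, the sketch inserts one after the last `#SBATCH` line, because sbatch stops reading directives at the first command line.

```python
def set_dependency(lines: list, jobid: str) -> list:
    """Return a copy of sbatch-script lines whose dependency points at jobid."""
    new = list(lines)
    directive = f'#SBATCH --dependency=afterok:{jobid}'
    # replace an existing dependency directive wherever it sits
    for i, line in enumerate(new):
        if line.startswith('#SBATCH --dependency'):
            new[i] = directive
            return new
    # none found: insert after the last #SBATCH line (or after the shebang if none)
    last = max((i for i, line in enumerate(new) if line.startswith('#SBATCH')), default=0)
    new.insert(last + 1, directive)
    return new


script = ['#!/bin/bash', '#SBATCH --job-name=1231768_gf_fitting',
          '#SBATCH --dependency=afterok:1', 'cd /home/b.lind/code/MVP-offsets/01_src']
print(set_dependency(script, '40622751')[2])
```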

In [7]:
Squeue(grepping='train').update(minmemorynode='200000')

update: 100%|███████████████| 675/675 [00:49<00:00, 13.72it/s]


In [8]:
# some (~22) will get cancelled because some of the 10k fitting jobs are still in the queue
fitting_pids = sbatch(fitting_shfiles)

sbatching:  12%|█▊             | 81/675 [00:17<02:06,  4.71it/s]
scancel: 100%|███████████████| 1/1 [00:00<00:00, 14.64it/s]
sbatching:  14%|██▏            | 96/675 [00:20<02:13,  4.34it/s]
scancel: 100%|███████████████| 1/1 [00:00<00:00, 15.18it/s]
sbatching:  16%|██▎            | 105/675 [00:22<01:53,  5.04it/s]
scancel: 100%|███████████████| 1/1 [00:00<00:00, 14.45it/s]
sbatching:  43%|██████▍        | 288/675 [01:03<01:18,  4.93it/s]
scancel: 100%|███████████████| 1/1 [00:00<00:00, 15.32it/s]
sbatching:  44%|██████▌        | 294/675 [01:04<01:24,  4.49it/s]
scancel: 100%|███████████████| 1/1 [00:00<00:00, 14.36it/s]
sbatching:  72%|██████████▋    | 483/675 [01:51<00:48,  3.96it/s]
scancel: 100%|███████████████| 1/1 [00:00<00:00,  3.43it/s]
sbatching:  74%|███████████    | 497/675 [01:57<00:54,  3.24it/s]
scancel: 100%|███████████████| 1/1 [00:00<00:00, 14.44it/s]
sbatching:  78%|███████████▋   | 526/675 [02:04<00:33,  4.49it

In [9]:
Squeue(grepping='train').update(to_partition='short', num_jobs=0.5)

update: 100%|███████████████| 338/338 [00:24<00:00, 14.01it/s]
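`Squeue(...).update(...)` is a pythonimports helper whose implementation isn't shown. Assuming it wraps Slurm's `scontrol update`, the commands for moving half of the pending training jobs to the short partition could be built as below. `scontrol_update_cmds` is hypothetical, and rounding up with `math.ceil` is an assumption chosen because 675 × 0.5 = 337.5 while the progress bar above reports 338 jobs. The same builder can express the earlier memory change as `MinMemoryNode='200000'` (scontrol's per-node memory field, in MB).

```python
import math
import shlex


def scontrol_update_cmds(job_ids, fraction=1.0, **fields):
    """Build 'scontrol update' commands for the first ceil(fraction * n) jobs.

    fields are Slurm job fields, e.g. Partition='short' or MinMemoryNode='200000'.
    """
    if not 0 < fraction <= 1:
        raise ValueError('fraction must be in (0, 1]')
    n = math.ceil(len(job_ids) * fraction)
    settings = ' '.join(f'{k}={shlex.quote(str(v))}' for k, v in fields.items())
    return [f'scontrol update JobId={jid} {settings}' for jid in job_ids[:n]]


cmds = scontrol_update_cmds([str(i) for i in range(675)], 0.5, Partition='short')
print(len(cmds), cmds[0])
```

On the cluster, each string could then be run with `subprocess.run(shlex.split(cmd), check=True)`.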


In [10]:
Squeue()

🗒️  Queue Summary:
{'long': {'PD': 344, 'R': 3},
 'lotterhos': {'R': 1},
 'short': {'PD': 973, 'R': 30}}
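The `Squeue()` summaries report job counts per partition and state (PD = pending, R = running). A minimal sketch of the same tally follows, assuming the input is the text printed by `squeue -h -o "%P %t"` (partition and compact state, no header); `summarize_queue` is hypothetical and not necessarily how pythonimports computes its summary.

```python
from collections import defaultdict


def summarize_queue(squeue_text: str) -> dict:
    """Count jobs per partition and compact state from `squeue -h -o "%P %t"` output."""
    summary = defaultdict(lambda: defaultdict(int))
    for line in squeue_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        partition, state = line.split()
        summary[partition][state] += 1
    # convert to plain dicts with sorted keys for stable printing
    return {p: dict(sorted(states.items())) for p, states in sorted(summary.items())}


print(summarize_queue('long PD\nlong R\nshort PD\nshort PD\nlotterhos R\n'))
```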

In [14]:
Squeue(grepping='train')

🗒️  Queue Summary:
{'long': {'PD': 337},
 'short': {'PD': 308, 'R': 30}}

In [15]:
Squeue(grepping='fit')

🗒️  Queue Summary:
{'long': {'PD': 7, 'R': 3},
 'short': {'PD': 665}}