This Python script generates a bash script, to be run on JASMIN, that retrieves the relevant UM output files from MOOSE when the file list is too long to sift through manually. The bash script copies each file to a specified directory on netscratch. It fetches, copies, and deletes the files one at a time because they are too large to handle in groups.
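
The per-file sequence described above (fetch, copy, delete) can be sketched as a small helper. This is a minimal sketch, not the notebook's own code: the function names `commands_for_date` and `build_script` are hypothetical, the destination `user@host:/scratch/dir/` is a placeholder, and the `&&` chaining and de-duplication of dates are assumptions added for illustration. Chaining with `&&` means the local copy is only removed if the `scp` succeeded.

```python
import shlex


def commands_for_date(date, src_dir, prefix, dest):
    """Return one shell line that fetches, copies, then deletes a single file."""
    file_name = f'{prefix}{date}.pp'
    moo = f'moo get -v {shlex.quote(src_dir + file_name)} .'
    scp = f'scp {shlex.quote(file_name)} {shlex.quote(dest)}'
    rm = f'rm {shlex.quote(file_name)}'
    # '&&' stops the chain at the first failing command.
    return ' && '.join([moo, scp, rm])


def build_script(dates, src_dir, prefix, dest):
    """Build the whole bash script, one line per unique date, in date order."""
    lines = ['#!/bin/bash']
    for date in sorted(set(dates)):
        lines.append(commands_for_date(date, src_dir, prefix, dest))
    return '\n'.join(lines) + '\n'


# Hypothetical usage with a placeholder destination.
script = build_script(
    ['20160801', '20160729', '20160801'],
    'moose:/crum/u-cy731/apl.pp/',
    'cy731a.pl',
    'user@host:/scratch/dir/',
)
print(script)
```

Sorting and de-duplicating the dates keeps the script short if several input files share a flight date. Whether that can happen depends on the input data, which is not stated here.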

In [27]:
import glob
import math

In [21]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [34]:
# 1. Make a folder on JASMIN to put all the files into. DONE
# 2. Find the files and put them in the folder 1 by 1. Test the command. DONE
# 3. SCP the folder to netscratch. DONE
# 4. Delete the folder off JASMIN. DONE
# 5. Make a shell script to do all this.

# the UM output files are named like: moose:/crum/u-cy731/apl.pp/cy731a.pl20180531.pp
um_file_path = 'moose:/crum/u-cy731/apl.pp/'
um_file_prefix = 'cy731a.pl'
moo_get = 'moo get -v'
dest_jasmin = '.'
dest_scratch = 'st838@atm-farman.ch.private.cam.ac.uk:/scratch/st838/netscratch/UM_nudged_J_outputs_for_ATom/'

# Find all the dates.
atom_folder_path = '/content/drive/MyDrive/Photolysis data/ATom_MER10_Dataset.20210613'
atom_files = glob.glob(atom_folder_path + '/*.ict')

all_commands = ''
num_files = 0

for atom_file in atom_files:
    # The date (YYYYMMDD) is the 8 characters that precede the final
    # 8 characters of the file path.
    date = atom_file[-16:-8]
    file_name = f'{um_file_prefix}{date}.pp'
    um_file = f'{um_file_path}{file_name}'
    # Fetch one file from MOOSE, copy it to netscratch, then delete the local copy.
    moo_command = f'{moo_get} {um_file} {dest_jasmin}'
    scp_command = f'scp {file_name} {dest_scratch}'
    delete_command = f'rm {file_name}'
    all_commands = f'{all_commands}\n{moo_command}\n{scp_command}\n{delete_command}'
    num_files += 1

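# Estimate how much data we will move: scale the mean file size
# (total_bytes / total_files) by the number of files we fetch.
total_files = 1237
total_bytes = 19177039643840
data_size = (total_bytes / total_files) * num_files
# Convert bytes to decimal gigabytes (1 GB = 1e9 bytes), rounding up.
GB = math.ceil(data_size / 1e9)
print('Moving a total of {} gigabytes of data.'.format(GB))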

Moving a total of 745 gigabytes of data.
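
The estimate above scales the mean file size by the number of files to fetch. As a worked check, the sketch below repeats that arithmetic with the totals from the cell. The helper name `estimate_gb` is hypothetical, and the file count of 48 is inferred from the printed 745 GB rather than stated in the notebook.

```python
import math


def estimate_gb(total_bytes, total_files, num_files):
    """Scale the mean file size by num_files and round up to whole decimal GB."""
    mean_bytes = total_bytes / total_files
    return math.ceil(mean_bytes * num_files / 1e9)


# Mean file size in decimal gigabytes, from the totals used above.
mean_gb = 19177039643840 / 1237 / 1e9
print(round(mean_gb, 1))                        # 15.5

# 48 files at about 15.5 GB each gives the 745 GB reported above.
print(estimate_gb(19177039643840, 1237, 48))    # 745
```

Because the result is rounded up, it is a slight overestimate of the transfer size, which is the safer direction when checking free space.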


In [36]:
# Make the bash script.
bash_file_path = '/content/drive/MyDrive/Photolysis data/get_store_data.sh'
bash_content = '#!/bin/bash\n{}'.format(all_commands)
print(bash_content)
# The context manager closes the file even if the write fails.
with open(bash_file_path, 'w') as bash_script:
    bash_script.write(bash_content)

#!/bin/bash

moo get -v moose:/crum/u-cy731/apl.pp/cy731a.pl20160729.pp .
scp cy731a.pl20160729.pp st838@atm-farman.ch.private.cam.ac.uk:/scratch/st838/netscratch/UM_nudged_J_outputs_for_ATom/
rm cy731a.pl20160729.pp
moo get -v moose:/crum/u-cy731/apl.pp/cy731a.pl20160801.pp .
scp cy731a.pl20160801.pp st838@atm-farman.ch.private.cam.ac.uk:/scratch/st838/netscratch/UM_nudged_J_outputs_for_ATom/
rm cy731a.pl20160801.pp
moo get -v moose:/crum/u-cy731/apl.pp/cy731a.pl20160817.pp .
scp cy731a.pl20160817.pp st838@atm-farman.ch.private.cam.ac.uk:/scratch/st838/netscratch/UM_nudged_J_outputs_for_ATom/
rm cy731a.pl20160817.pp
moo get -v moose:/crum/u-cy731/apl.pp/cy731a.pl20160806.pp .
scp cy731a.pl20160806.pp st838@atm-farman.ch.private.cam.ac.uk:/scratch/st838/netscratch/UM_nudged_J_outputs_for_ATom/
rm cy731a.pl20160806.pp
moo get -v moose:/crum/u-cy731/apl.pp/cy731a.pl20170126.pp .
scp cy731a.pl20170126.pp st838@atm-farman.ch.private.cam.ac.uk:/scratch/st838/netscratch/UM_nudged_J_outputs_f