### PLEASE COPY NOTEBOOKS TO YOUR FOLDERS TO PREVENT COMMIT CONFLICTS
* If you would like to contribute to this notebook, make your changes to the copy in the useful_notebooks folder and run "Restart and Clear Output" before committing.


#### THIS SCRIPT CAN:
1. Move experiments from an old set to a new set - see Part 1 for caveats
2. Archive any processed files on the old sets, together with the workflow_runs and QCs associated with them
3. Change the status of the old set to replaced, associate it with the new set for the redirect, and add static sections about the replacement to both the old and new sets.

Each part can be run independently, although once the sets are replaced, experiments can no longer be moved.

**NOTE**: If there is a publication attached to the old set, remember to attach it manually to the new one. This script does NOT do that.

**NOTE**: each part has an 'action' flag (True/False). Set it to True to perform the updates; otherwise the part runs as a dry run.
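
One way to keep the dry-run/live split in a single place is to route every write through a small wrapper. The sketch below is hypothetical: `guarded_patch` is not part of this notebook or of `dcicutils`. It takes the real patch function as an argument, so in the notebook you could pass `lambda body, item_id: ff_utils.patch_metadata(body, item_id, my_key)`, while here a stand-in is used so it runs offline.

```python
from typing import Any, Callable, Dict, List, Tuple


# Hypothetical helper (not in the notebook or dcicutils): performs the write only
# when action is True, otherwise just reports what would have been patched.
def guarded_patch(action: bool,
                  patch_fn: Callable[[Dict[str, Any], str], Any],
                  body: Dict[str, Any],
                  item_id: str,
                  log: List[Tuple[str, Dict[str, Any]]]) -> Any:
    log.append((item_id, body))  # always record the intended change
    if not action:
        print('DRY RUN - would patch', item_id, 'with', body)
        return None
    return patch_fn(body, item_id)


# Usage with a stand-in patch function instead of ff_utils.patch_metadata
calls = []


def fake_patch(body, item_id):
    calls.append(item_id)
    return {'status': 'success'}


log = []
guarded_patch(False, fake_patch, {'status': 'archived'}, 'abc-uuid', log)  # dry run
result = guarded_patch(True, fake_patch, {'status': 'archived'}, 'abc-uuid', log)
```

Keeping the log in both modes makes it easy to compare a dry run against the later live run.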

#### Imports and Helper Functions

In [None]:

# Given 2 sets: an old one with the original biological/technical replicates,
#               and a new one with new biological/technical replicates numbered so they
#               continue from (and do not overlap with) the old ones.
# The notebook adds the old replicates to the new set and, if the old set has processed files,
# archives those files and their immediate connections (wfrs, qcs, output files).

# To prevent confusion, the new set is given as an accession and the old set as a uuid.


from dcicutils import ff_utils
from functions.notebook_functions import *
import json
import time
from datetime import datetime


def conv_time(time_info):
    """Convert date_created date_modified to datetime object for time operations"""
    time_info, zone_info = time_info.split('+')
    assert zone_info == '00:00'
    try:
        time_info = datetime.strptime(time_info, '%Y-%m-%dT%H:%M:%S.%f')
    except ValueError:  # items created at the perfect second
        time_info = datetime.strptime(time_info, '%Y-%m-%dT%H:%M:%S')
    return time_info

def fetch_pf_associated(pf_id, my_key):
    """Given a file accession or uuid, find all related items:
    1) QCs
    2) the wfr producing the file, and the other outputs of the same wfr
    3) wfrs this file went into as input, and the files/qcs they produced
    Returns a deduplicated list of uuids (qcs found under value_qc are returned as @ids)"""
    file_as_list = []
    pf_info = ff_utils.get_metadata(pf_id, my_key)
    file_as_list.append(pf_info['uuid'])
    if pf_info.get('quality_metric'):
        file_as_list.append(pf_info['quality_metric']['uuid'])
    # either field may be missing, e.g. for files not produced by a workflow run
    inp_wfrs = pf_info.get('workflow_run_inputs') or []
    out_wfrs = pf_info.get('workflow_run_outputs') or []
    for inp_wfr in inp_wfrs:
        file_as_list.extend(fetch_wfr_associated(inp_wfr['uuid'], my_key))
    if out_wfrs:
        file_as_list.extend(fetch_wfr_associated(out_wfrs[0]['uuid'], my_key))
    return list(set(file_as_list))
        
                
def fetch_wfr_associated(wfr_uuid, my_key):
    """Given wfr_uuid, find associated output files and qcs"""
    wfr_as_list = []
    wfr_info = ff_utils.get_metadata(wfr_uuid, my_key)
    wfr_as_list.append(wfr_info['uuid'])
    if wfr_info.get('output_files'):
        for o in wfr_info['output_files']:
            if o.get('value'):
                wfr_as_list.append(o['value']['uuid'])
            elif o.get('value_qc'):
                wfr_as_list.append(o['value_qc'])  # this is an @id
    if wfr_info.get('output_quality_metrics'):
        for qc in wfr_info['output_quality_metrics']:
            if qc.get('value'):
                wfr_as_list.append(qc['value']['uuid'])
    if wfr_info.get('quality_metric'):
        wfr_as_list.append(wfr_info['quality_metric']['uuid'])
    return wfr_as_list
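
Because the helpers above need a live portal connection, here is an offline sketch of the same traversal run against a hypothetical in-memory dict that stands in for `ff_utils.get_metadata`. The uuids and the trimmed-down records are made up; only the fields the helpers read are included. The final `set` also shows why deduplication matters: the file reappears as an output of its own producing wfr.

```python
from typing import Dict, List

# Hypothetical stand-in for portal metadata, keyed by uuid; only the fields
# read by fetch_pf_associated / fetch_wfr_associated are included.
FAKE_DB: Dict[str, dict] = {
    'pf1': {'uuid': 'pf1', 'quality_metric': {'uuid': 'qc1'},
            'workflow_run_inputs': [{'uuid': 'wfr2'}],
            'workflow_run_outputs': [{'uuid': 'wfr1'}]},
    'wfr1': {'uuid': 'wfr1', 'output_files': [{'value': {'uuid': 'pf1'}}]},
    'wfr2': {'uuid': 'wfr2', 'output_files': [{'value': {'uuid': 'pf2'}}],
             'quality_metric': {'uuid': 'qc2'}},
}


def get_metadata(item_id: str) -> dict:
    """Stand-in for ff_utils.get_metadata(item_id, my_key)."""
    return FAKE_DB[item_id]


def wfr_associated(wfr_uuid: str) -> List[str]:
    info = get_metadata(wfr_uuid)
    items = [info['uuid']]
    for out in info.get('output_files', []):
        if out.get('value'):
            items.append(out['value']['uuid'])
        elif out.get('value_qc'):
            items.append(out['value_qc'])
    for qc in info.get('output_quality_metrics', []):
        if qc.get('value'):
            items.append(qc['value']['uuid'])
    if info.get('quality_metric'):
        items.append(info['quality_metric']['uuid'])
    return items


def pf_associated(pf_uuid: str) -> List[str]:
    info = get_metadata(pf_uuid)
    items = [info['uuid']]
    if info.get('quality_metric'):
        items.append(info['quality_metric']['uuid'])
    # missing fields default to empty lists instead of raising
    for wfr in info.get('workflow_run_inputs', []):
        items.extend(wfr_associated(wfr['uuid']))
    for wfr in info.get('workflow_run_outputs', [])[:1]:  # only the producing wfr
        items.extend(wfr_associated(wfr['uuid']))
    return sorted(set(items))


print(pf_associated('pf1'))  # ['pf1', 'pf2', 'qc1', 'qc2', 'wfr1', 'wfr2']
```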




#### Inputs
* To prevent confusion, the **new set** is given as an *Accession* and the **old set** as a *uuid*
* Also sets up the connection key.
* Gets the ES metadata for the old and new sets, which each part below (re)uses independently.

In [None]:
# new set accession; set to be replaced uuid
setpairs = [
    ['4DNESWST3UBHXXX', 'f6a9adc8-ce9c-4095-889d-e25ee8b73e16']
]

my_key = get_key('andrea_data')

# initial es_metadata fetching to create a list of pairs used as input to each step
workon = []
uid_list = [sp[1] for sp in setpairs] # grab uuids for old guys
acc2uid = {}
for sp in setpairs:
    acc = sp[0]
    try:
        uid = ff_utils.get_metadata(acc, my_key).get('uuid')
    except Exception:  # the accession could not be fetched
        uid = None
    if not uid:
        print("Can't get uuid for {} - removing {} from uuid list and skipping pair".format(acc, sp[1]))
        uid_list.remove(sp[1])
    else:
        uid_list.append(uid)
        acc2uid[acc] = uid
        
es_res = ff_utils.get_es_metadata(uid_list, key=my_key, is_generator=True)
es_meta = {}
for es in es_res:
    es_meta[es.get('uuid')] = es

for new, old in setpairs:
    if new not in acc2uid:  # this pair was skipped above
        continue
    nuid = acc2uid[new]
    workon.append([es_meta[nuid], es_meta[old]])

print('HAVE {} PAIRS OF SETS TO WORKON'.format(len(workon)))
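
The pairing step itself is just dictionary lookups. The sketch below replaces the portal calls with hypothetical dicts (`acc_to_uuid` standing in for the accession-to-uuid lookup, `es_docs` for the `get_metadata`/`get_es_metadata` results; the accessions and uuids are made up) and shows pairs that could not be resolved being skipped rather than raising `KeyError`.

```python
from typing import Dict, List, Tuple


def build_workon(setpairs: List[Tuple[str, str]],
                 acc_to_uuid: Dict[str, str],
                 es_docs: Dict[str, dict]) -> List[List[dict]]:
    """Return [new_es_doc, old_es_doc] pairs, skipping unresolved ones."""
    workon = []
    for new_acc, old_uuid in setpairs:
        new_uuid = acc_to_uuid.get(new_acc)
        if new_uuid is None or new_uuid not in es_docs or old_uuid not in es_docs:
            print('skipping pair', new_acc, old_uuid)
            continue
        workon.append([es_docs[new_uuid], es_docs[old_uuid]])
    return workon


# Hypothetical data standing in for the portal lookups
acc_to_uuid = {'4DNESNEW0001': 'new-uuid-1'}
es_docs = {'new-uuid-1': {'uuid': 'new-uuid-1'}, 'old-uuid-1': {'uuid': 'old-uuid-1'}}
pairs = [('4DNESNEW0001', 'old-uuid-1'), ('4DNESMISSING', 'old-uuid-2')]
workon = build_workon(pairs, acc_to_uuid, es_docs)
```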


### Part 1 - adding experiments and associated metadata from the old set to the new set
#### NOTE: this part can be skipped if all the new links have been set up already
##### Dependencies
* old sets are **not** already status=replaced
* experiments being transferred to the new set do not share replicate numbers with those already in the new set
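
The second dependency is what the merge check in the cell below enforces. A minimal, self-contained sketch of the merge and uniqueness check on plain dicts; `merge_replicates` is a hypothetical name, and the sample entries only mimic the `bio_rep_no`/`tec_rep_no` fields the notebook reads (the `replicate_exp` key is assumed for illustration).

```python
from typing import Dict, List, Optional


def merge_replicates(new_reps: List[Dict], old_reps: List[Dict]) -> Optional[List[Dict]]:
    """Combine replicate_exps lists sorted by (bio_rep_no, tec_rep_no);
    return None if any bio/tec pair appears more than once."""
    merged = sorted(new_reps + old_reps, key=lambda r: (r['bio_rep_no'], r['tec_rep_no']))
    keys = [(r['bio_rep_no'], r['tec_rep_no']) for r in merged]
    if len(keys) != len(set(keys)):
        return None  # already merged, or conflicting numbers in the two sets
    return merged


old = [{'replicate_exp': 'exp-a', 'bio_rep_no': 1, 'tec_rep_no': 1}]
new = [{'replicate_exp': 'exp-b', 'bio_rep_no': 2, 'tec_rep_no': 1}]
merged = merge_replicates(new, old)
print([r['replicate_exp'] for r in merged])  # ['exp-a', 'exp-b']
conflict = merge_replicates(new, new)  # same numbers twice -> None
```

Using `(bio, tec)` tuples as keys avoids building strings just to compare them.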

In [None]:
action = False
for new_set, old_set in workon:
    # here we want linked items as uuids, so get the properties from the es metadata
    new_set_info = new_set.get('properties')
    old_set_info = old_set.get('properties')
    if old_set_info['status'] == 'replaced':
        print('old set already replaced, skipping')
        continue
    print('Combining {} into {}'.format(old_set_info['accession'], new_set_info['accession']))
    # check that the old set was created before the new one
    assert conv_time(old_set_info['date_created']) < conv_time(new_set_info['date_created'])
    # combine rep exps
    new_rep = new_set_info['replicate_exps'] + old_set_info['replicate_exps']
    new_rep = sorted(new_rep, key=lambda k: [k['bio_rep_no'],k['tec_rep_no']])
    # check that each bio/tec rep combination is unique
    tec_bio = [str(i['bio_rep_no']) + '_' + str(i['tec_rep_no']) for i in new_rep]
    if len(new_rep) != len(set(tec_bio)):
        print('same rep numbers are used; either the merge already happened or the two sets '
              'have conflicting numbers, skipping')
        continue
    ans = input('Continue with these rep numbers, formatted b_t (y/n):\n{}\n'.format(tec_bio))
    if ans != 'y':
        break
    # patch the new set with the new rep info
    if action:
        ff_utils.patch_metadata({'replicate_exps': new_rep}, new_set_info.get('accession'), my_key)
        print(new_set_info.get('accession'), ' replicates are updated')
        

### Part 2 - determine if there are processed_files, other_processed_files, workflow_runs and QCs linked (directly or indirectly) to the old set and if so archive them
#### Note: this can be skipped 
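
For each item, Part 2 maps the current status to its archived counterpart and prefixes the archive note to the description. A minimal sketch of that per-item logic as pure functions, so it can be checked before running with action = True; `archived_status` and `archived_description` are hypothetical names, not portal or dcicutils APIs.

```python
from typing import Optional

# Status transitions used by Part 2; any other status is left untouched
ARCHIVE_STATUS = {
    'released': 'archived',
    'released to project': 'archived to project',
}


def archived_status(status: str) -> Optional[str]:
    """Return the archived counterpart of a status, or None if it is not handled."""
    return ARCHIVE_STATUS.get(status)


def archived_description(note: str, old_description: str = '') -> str:
    """Prefix the archive note to any existing description, ending with a period."""
    text = (note + old_description).strip()
    if not text.endswith('.'):
        text += '.'
    return text


print(archived_status('released to project'))  # archived to project
print(archived_description('Archived because X. ', 'Old desc'))  # Archived because X. Old desc.
```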

In [None]:
action = False
# are there processed files / other processed files, and wfrs/qcs, that need to be archived?
# collects the items in the processed_files and other_processed_files fields and their associated items
# (only 1 level of wfrs)
for new_set, old_set in workon:
    # workon holds raw es documents; use the properties so linked items are uuids
    new_set_info = new_set.get('properties')
    old_set_info = old_set.get('properties')
    old_uuid = old_set.get('uuid')
    pre_set_url = "https://data.4dnucleome.org/experiment-set-replicates/"
    archive_files_info = 'This Processed File has been archived because it belongs to an archived ' \
        'Experiment Set ([{0}]({1}{2}/)), which has been replaced by [{3}]({1}{3}/). '.format(
            old_set_info['accession'],
            pre_set_url,
            old_uuid,
            new_set_info['accession']
        )
    archive_files = []
    if old_set_info.get('other_processed_files'):
        for case in old_set_info['other_processed_files']:
            archive_files.extend(case['files'])  # add all files to archive_list
            case['type'] = 'archived'
        ### Patch opf items type to archived
        if action:
            ff_utils.patch_metadata({'other_processed_files': old_set_info['other_processed_files']},
                                    old_set_info.get('accession'),
                                    my_key)
    if old_set_info.get('processed_files'):
        archive_files.extend(old_set_info['processed_files'])
    archive_list = []
    for ar_file in archive_files:
        archive_list.extend(fetch_pf_associated(ar_file, my_key))
    archive_list = list(set(archive_list))  # files can share wfrs and qcs
    print(len(archive_list), 'associated items will be archived')
    for an_item in archive_list:
        if action:
            item_info = ff_utils.get_metadata(an_item, my_key)
            old_description = item_info.get('description', '')
            new_description = (archive_files_info + old_description).strip()
            if not new_description.endswith('.'):
                new_description += '.'
            item_status = item_info['status']
            if item_status == 'released':
                ff_utils.patch_metadata({'status': 'archived', 'description': new_description}, an_item, my_key)
            elif item_status == 'released to project':
                ff_utils.patch_metadata({'status': 'archived to project', 'description': new_description}, an_item, my_key)
            else:
                print(an_item, 'is not released yet (consider deleting it); not patched')

### Part 3 - Create static sections, change statuses and add alternate accessions
* Create and add static sections to the old and new sets to explain why the replacement happened. This also works for chained replacements (a new set replacing an old set that itself replaced an older set, and so on).
* Set status of old set to replaced.
* Add the old set's accession to the new set's alternate_accessions to set up the redirect.
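
The new set's header body is built by joining one markdown link per replaced set, using uuids in the links because a replaced set's accession redirects to its replacement. A sketch of that string assembly as a standalone function; `superseding_message` and the accessions/uuids below are made up for illustration, and the URL prefix is the one the notebook uses.

```python
from typing import List, Tuple

PRE_SET_URL = 'https://data.4dnucleome.org/experiment-set-replicates/'


def superseding_message(replaced: List[Tuple[str, str]], reason: str) -> str:
    """Build the new set's header body.

    replaced: (accession, uuid) pairs, the directly replaced set first,
    followed by any sets it had already replaced.
    """
    links = ['[{0}]({1}{2}/)'.format(acc, PRE_SET_URL, uuid) for acc, uuid in replaced]
    return 'This experiment set supersedes {} because {}.'.format(' and '.join(links), reason)


msg = superseding_message([('4DNESOLD00001', 'uuid-old'), ('4DNESOLDER001', 'uuid-older')],
                          'new biological replicates were added')
print(msg)
```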

In [None]:
# will perform patches/posts if set to true
action = False

# reason for replacement
reason = 'new biological replicates were added'

for new_set, old_set in workon:
    # here we need the embedded data
    new_set_info = new_set.get('embedded')
    old_set_info = old_set.get('embedded')
    old_acc = old_set_info['accession']
    new_acc = new_set_info['accession']
    old_status = old_set_info['status']
    new_status = new_set_info['status']

    if old_status in ['in review by lab', 'pre-release']:
        old_status = 'draft'
    if new_status in ['in review by lab', 'pre-release']:
        new_status = 'draft'

    # prepare the old set header
    old_alias = "static_header:replaced_item_{}_by_{}".format(old_acc, new_acc)
    pre_set_url = "https://data.4dnucleome.org/experiment-set-replicates/"
    old_body_message = "This experiment set was replaced by [{0}]({1}{0}/) because {2}.".format(
        new_acc, 
        pre_set_url,
        reason
    )
    old_header = {
        "body": old_body_message,
        "award": old_set_info['award']['uuid'],
        "lab": old_set_info['lab']['uuid'], 
        "name": "static-header.replaced_item_{}".format(old_acc),
        "section_type": "Item Page Header",
        "options": {"title_icon": "info", "default_open": True, "filetype": "md", "collapsible": False},
        "title": "Note: Replaced Item - {}".format(old_acc),
        "status": old_status,
        "aliases": [old_alias]
    }
    
    # prepare the new set header
    new_alias = "static_header:replacing_item_{}_old_{}".format(new_acc, old_acc)
    new_body_message = "This experiment set supersedes [{0}]({1}{2}/)".format(
        old_acc,
        pre_set_url,
        old_set_info['uuid']        
    ) 
    # Check whether the old set had itself replaced other sets.
    # If so, the redirects cascade, so include the previously replaced
    # accession(s) in the static section.
    for alt_accession in old_set_info.get('alternate_accessions', []):
        # search_metadata expects a path relative to the server in my_key
        search_query = 'search/?q=' + alt_accession + '&type=Item&status=replaced'
        alt_uuid = ff_utils.search_metadata(search_query, key=my_key)[0].get('uuid')
        new_body_message = "{0} and [{1}]({2}{3}/)".format(
            new_body_message,
            alt_accession,
            pre_set_url,
            alt_uuid
        )
    new_body_message += " because " + reason + "."
    new_header = {
        "body": new_body_message,
        "award": new_set_info['award']['uuid'],
        "lab": new_set_info['lab']['uuid'],
        "name": "static-header.replacing_item_{}".format(new_acc),
        "section_type": "Item Page Header",
        "options": {"title_icon": "info", "default_open": True, "filetype": "md", "collapsible": False},
        "title": "Note: Superseding Item - {}".format(new_acc),
        "status": new_status,
        "aliases": [new_alias]
    }
    
    print('ADDING HEADER TO THE OLD SET')
    print(old_header)
    print('ADDING HEADER TO THE NEW SET')
    print(new_header)

    if action:
        # post the static sections
        # if posting fails (e.g. the alias already exists), fetch the existing item instead
        try:
            old_h_resp = ff_utils.post_metadata(old_header, 'StaticSection', my_key)['@graph'][0]
        except Exception:
            print('old header already in system')
            old_h_resp = ff_utils.get_metadata(old_alias, my_key)

        try:
            new_h_resp = ff_utils.post_metadata(new_header, 'StaticSection', my_key)['@graph'][0]
        except Exception:
            print('new header already in system')
            new_h_resp = ff_utils.get_metadata(new_alias, my_key)

        # collect any existing static headers
        old_header_list = []
        new_header_list = []
        if old_set_info.get('static_headers'):
            old_header_list = [i['uuid'] for i in old_set_info['static_headers']]
        if new_set_info.get('static_headers'):
            new_header_list = [i['uuid'] for i in new_set_info['static_headers']]
        # add the new headers unless they are already present
        if old_h_resp['uuid'] not in old_header_list:
            old_header_list.append(old_h_resp['uuid'])
        if new_h_resp['uuid'] not in new_header_list:
            new_header_list.append(new_h_resp['uuid'])

        # set the status of old set to replaced
        ff_utils.patch_metadata({'status':'replaced', 'static_headers': old_header_list},
                                obj_id=old_set_info['uuid'], key=my_key)
        # wait for indexing to take place
        # you might need to repeat this last piece separately if indexing does not catch up
        # new status needs to be indexed for alternate accession to be patched
        time.sleep(60)
        # set the alternate accession on the new set to the old one
        alt_ac = list(new_set_info.get('alternate_accessions', []))
        if old_acc not in alt_ac:  # avoid duplicates when re-running this step
            alt_ac.append(old_acc)
        ff_utils.patch_metadata({'alternate_accessions':alt_ac, 'static_headers': new_header_list},
                                obj_id=new_set_info['uuid'], key=my_key)

    print('DONE')
    print('CHECK THE OLD SET', 'https://data.4dnucleome.org/experiment-set-replicates/{}/'.format(old_set_info['uuid']))
    print('CHECK THE NEW SET', 'https://data.4dnucleome.org/experiment-set-replicates/{}/'.format(new_acc))
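
The fixed `time.sleep(60)` above may be too short, or longer than needed. One alternative, sketched here as a hypothetical `wait_until` helper (not part of dcicutils), is to poll until a condition holds or a timeout expires. In the notebook the condition could be a search that only returns the old set once it is indexed as replaced, for example `ff_utils.search_metadata('search/?type=ExperimentSetReplicate&accession=' + old_acc + '&status=replaced', key=my_key)`; that exact query is an assumption and should be checked against the portal.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 300.0,
               interval: float = 10.0) -> bool:
    """Call predicate every `interval` seconds until it returns True or
    `timeout` seconds have passed; return whether it succeeded."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage with a stand-in condition that becomes true on the third check
checks = {'n': 0}


def indexed() -> bool:
    checks['n'] += 1
    return checks['n'] >= 3


ok = wait_until(indexed, timeout=5, interval=0.01)
```

If `wait_until` returns False, the alternate-accession patch can be re-run later as a separate step, as the comment in the cell above already suggests.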