## Create specimen instances 

Demonstration script for generating openMINDS instances from the metadata provided
in the specimen_template.xlsx file, without requiring prior Python experience.

With this notebook you can do the following:
1. Import user-defined specimen metadata
2. Convert metadata into openMINDS instances
3. Upload newly created instances to the KGE
4. Add newly created instances to a dataset version under "studiedSpecimen"

To be able to run the script, you need to meet the following requirements:
- Python version >= 3.6
- openMINDS package (available from https://pypi.org/project/openMINDS/)
- read and write permissions for the KG via the API
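Before running the cells, you may want to check these requirements programmatically. The sketch below uses only the standard library; `missing_requirements` is a hypothetical helper, and the module names you pass in are assumptions to adapt to your setup (for example, the import name of the openMINDS package may differ between releases).

```python
import importlib.util
import sys


def missing_requirements(modules, min_python=(3, 6)):
    """Return a list of unmet requirements (empty list if all are met).

    Hypothetical helper: checks the running Python version and whether
    each named top-level module can be found, without importing it.
    """
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append("Python >= " + wanted + " required")
    for name in modules:
        # find_spec returns None when a top-level module cannot be found
        if importlib.util.find_spec(name) is None:
            problems.append("module not found: " + name)
    return problems


# Example: adjust the module names to the packages used in this notebook
print(missing_requirements(["pandas"]) or "All requirements met")
```

An empty list means every listed requirement was found; KG permissions cannot be checked this way and still have to be confirmed separately.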

Run the cells in order and answer the questions they ask.

In [None]:
# Import relevant packages
import os
import pandas as pd
import glob
from datetime import datetime
from getpass import getpass

from metabot import openMINDS_wrapper
w = openMINDS_wrapper()

### Define the location of your template file

We first need to define the location of your template file. The notebook shows the current working directory (usually the folder containing this notebook) and asks whether the template file is stored there. Type "y" if it is. If the file is stored elsewhere, type "n" and enter the path to the folder that contains it.
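The notebook compares the answer with "y" and "n" exactly, so a reply such as "Y" or "yes" is not recognised. A minimal sketch of a more forgiving prompt, assuming you are happy to re-ask until a valid answer is given; `ask_yes_no` is a hypothetical helper and not part of metabot:

```python
def ask_yes_no(question, input_fn=input):
    """Ask until the user answers yes or no; return True for yes.

    Hypothetical helper. `input_fn` defaults to the built-in input()
    and can be swapped out, e.g. for testing.
    """
    while True:
        answer = input_fn(question + " yes (y) or no (n) ").strip().lower()
        if answer in ("y", "yes"):
            return True
        if answer in ("n", "no"):
            return False
        print("Please answer with y or n.")
```

You could then write, for example, `if ask_yes_no("Is this where your files are stored: " + os.getcwd() + "?"):` instead of comparing the raw input with "y".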

An output folder for the created instances is then made automatically. It is placed in the same folder as the template file and named "createdInstances_[date]_[time]", with the date written as DDMMYYYY and the time as HHMM.
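The folder name follows the pattern `createdInstances_DDMMYYYY_HHMM`. A cross-platform sketch of the same step using `pathlib` is shown below; `make_output_folder` is a hypothetical helper, not a function provided by metabot.

```python
from datetime import datetime
from pathlib import Path


def make_output_folder(template_dir, now=None):
    """Create (if needed) and return the output folder next to the template.

    Hypothetical helper: the name follows the notebook's pattern
    createdInstances_DDMMYYYY_HHMM.
    """
    now = now or datetime.now()
    folder = Path(template_dir) / ("createdInstances_" + now.strftime("%d%m%Y_%H%M"))
    # exist_ok avoids an error when the folder was already created this minute
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

Because `pathlib` inserts the correct separator for the operating system, the same code runs on Windows, macOS and Linux.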

In [None]:
# Define the location of the files
cwd = os.getcwd()
answer = input("Is this where your files are stored: " + cwd + "? yes (y) or no (n) ").strip().lower()

if answer == "y":
    fpath = cwd
elif answer == "n":
    fpath = input("Please define your path: ").strip()
else:
    raise ValueError("Please answer with 'y' or 'n' and run this cell again")

# Add a trailing path separator ("\\" on Windows, "/" elsewhere) so file names can be appended
fpath = os.path.join(fpath, "")
os.chdir(fpath)

# Make the output folder if it does not exist yet (relative to fpath, i.e. next to the template)
now = datetime.now()
output_path = os.path.join("createdInstances_" + now.strftime("%d%m%Y_%H%M"), "")
if os.path.isdir(output_path):
    print("Output folder already exists")
else:
    print("Output folder does not exist, making folder")
    os.mkdir(output_path)

### Import metadata and create instances

We then define the name of the template file (without the ".xlsx" extension) and import the metadata. Everything that is defined in the template file will be converted into an instance. 

Remember to use 1 row per specimen. If you would like to create 2 or more states (time points) per specimen with specific metadata (e.g. age, weight, attribute), use 1 row per specimen state. 
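The cell below processes the rows in a fixed order: subject groups, subjects, then `tsc` and `ts` (presumably tissue sample collections and tissue samples), so that each level can be linked to the instances created one level up. A pure-Python sketch of that grouping, which also shows how several rows for one specimen represent its states; the `specimenName` column and the helper names are hypothetical stand-ins for however your template identifies a specimen:

```python
from collections import Counter

# Order in which the notebook handles the specimen types
HIERARCHY = ["subjectGroup", "subject", "tsc", "ts"]


def split_by_type(rows):
    """Group template rows (dicts) by specimenType, in hierarchy order."""
    groups = {specimen_type: [] for specimen_type in HIERARCHY}
    for row in rows:
        # Unknown types are kept and placed after the known ones
        groups.setdefault(row["specimenType"], []).append(row)
    # Drop types that do not occur in the template
    return {t: r for t, r in groups.items() if r}


def states_per_specimen(rows):
    """Count rows per specimen; more than one row means several states."""
    # "specimenName" is a hypothetical identifier column
    return Counter(row["specimenName"] for row in rows)


rows = [
    {"specimenType": "subject", "specimenName": "sub-01", "age": 30},
    {"specimenType": "subject", "specimenName": "sub-01", "age": 60},
    {"specimenType": "subjectGroup", "specimenName": "group-A"},
]
print(list(split_by_type(rows)))            # ['subjectGroup', 'subject']
print(states_per_specimen(rows)["sub-01"])  # 2
```

Here "sub-01" has two rows, so it would be described by two specimen states (for example at two different ages).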

In [None]:
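# Import the file with the specimen metadata
metadata_file = input("What is the name of your specimen file (e.g. specimen_template)? ").strip()
# Accept the name with or without the ".xlsx" extension
if metadata_file.lower().endswith(".xlsx"):
    metadata_file = metadata_file[:-len(".xlsx")]
fileLocation = fpath + metadata_file + ".xlsx"

specimenInfo = pd.read_excel(fileLocation)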

specimenType = specimenInfo.specimenType.to_list()

if "subjectGroup" in specimenType:
    SG_info = specimenInfo[specimenInfo.specimenType == "subjectGroup"].reset_index(drop=True) 
    SG_data = w.makeSubjectCollections(SG_info, output_path)

if "subject" in specimenType:
    subject_info = specimenInfo[specimenInfo.specimenType == "subject"].reset_index(drop=True) 
    if 'SG_data' in locals():
        subject_info = w.findGroup(subject_info, SG_data)
    subject_data = w.makeSubjectCollections(subject_info, output_path)

if "tsc" in specimenType:
    tsc_info = specimenInfo[specimenInfo.specimenType == "tsc"].reset_index(drop=True) 
    if 'subject_data' in locals():
        tsc_info = w.findGroup(tsc_info, subject_data)
    tsc_data = w.makeSampleCollections(tsc_info, output_path)

if "ts" in specimenType:
    ts_info = specimenInfo[specimenInfo.specimenType == "ts"].reset_index(drop=True)
    if 'tsc_data' in locals():
        ts_info = w.findGroup(ts_info, tsc_data) 
    ts_data = w.makeSampleCollections(ts_info, output_path)

# Saving an overview file in the output folder for future reference
print("\nOverview file is saved in the output folder \n")

### Upload the instances to the KGE

Once the instances have been created, you can upload them to the KGE straight away. For this you need to authenticate with a token, which you can find in the KGE.

If you are not ready to upload the instances, type "n". You can always upload them later using the ex3.py script.

If you uploaded the instances but made a mistake and want to remove them all, use the ex4.py script to delete them from the KGE.
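The upload step sends every file found in the subfolders of the output folder. If you want to check which files would be uploaded before entering your token, a minimal sketch with `pathlib` is given below. It assumes, as the upload cell does, that the instance files sit exactly one folder level below the output folder; `list_instance_files` is a hypothetical helper.

```python
from pathlib import Path


def list_instance_files(output_folder):
    """Return the files exactly one subfolder level below output_folder, sorted."""
    # "*/*" matches both files and folders, so keep only regular files
    return sorted(p for p in Path(output_folder).glob("*/*") if p.is_file())
```

For example, `for path in list_instance_files(output_path): print(path)` prints the candidate files; files placed directly in the output folder (such as an overview file) are not matched.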

### Add specimen to a dataset version

If you chose to upload the instances to the KGE, you can also add the created specimens to a dataset version. For this, you need to provide the UUID of the dataset version; all created instances will then be added to it.

If you decide not to add them now, you will have to add the instances to the dataset version manually (under "studiedSpecimen") or run the ex5.py script later.
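Before adding specimens, the notebook collects the unique `specimen_uuid` values of every specimen level that was created. A standard-library sketch of that step, which also drops duplicates across levels and checks that the dataset version identifier you type is a well-formed UUID before anything is sent to the KG; `collect_specimen_uuids` and `is_valid_uuid` are hypothetical helpers:

```python
import uuid


def is_valid_uuid(text):
    """Return True if text parses as a UUID (e.g. a dataset version id)."""
    try:
        uuid.UUID(text.strip())
    except ValueError:
        return False
    return True


def collect_specimen_uuids(*levels):
    """Merge UUID lists from several specimen levels, dropping duplicates.

    The first occurrence of each UUID is kept, so the order of the
    levels (e.g. groups, subjects, collections, samples) is preserved.
    """
    seen = set()
    merged = []
    for level in levels:
        for specimen_uuid in level:
            if specimen_uuid not in seen:
                seen.add(specimen_uuid)
                merged.append(specimen_uuid)
    return merged
```

Validating the UUID locally catches typos early, instead of relying on an error response from the KG.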

In [None]:
# Upload instances to the KGE
answer = input("Would you like to upload the instances you created to the KGE? yes (y) or no (n) ").strip().lower()

if answer == "y":
    token = getpass(prompt="Please enter your KG token (or press Enter to skip uploading to the KG): ")

    if token != "":
        # Collect all instance files written to the subfolders of the output folder
        instances_fnames = glob.glob(os.path.join(output_path, "*", "*"))

        print("\nUploading data now:\n")
        response_upload = w.upload(instances_fnames, token, space_name="dataset")

        # Add specimen to a dataset version (reuses the token entered above)
        answer = input("Would you like to add the instances you created to a dataset version? yes (y) or no (n) ").strip().lower()

        if answer == "y":
            dsv_uuid = input("What is the UUID of the dataset version you would like to add the specimens to? ").strip()
            print("\nAdding specimens to dataset version: " + dsv_uuid + "\n")

            # Retrieve the specimen UUIDs of the instances created in the previous cell;
            # levels that were not in the template were never defined and are skipped
            instances2add = []
            for name in ("SG_data", "subject_data", "tsc_data", "ts_data"):
                if name in globals():
                    instances2add += globals()[name].specimen_uuid.unique().tolist()

            response_addition = w.add2dsv(instances2add, token, dsv_uuid, space_name="dataset")
        else:
            print("Instances were not added to a dataset version; you can use ex5.py later.")
    else:
        print("No token provided, skipping upload")

print("\nDone!")