# Start and installation

First install the requirements by running `pip install -r requirements.txt` at the root of the repository, then import the module into your Python environment.

# First steps

Create a file called `cred.txt` in the same folder from which you'll be running the script. This file must contain the following lines, in this order:
```
USER=<your_aap_username>
PASSWORD=<your_aap_password>
ROOT=<DSP_api_root>
```

The API root is either `https://submission-test.ebi.ac.uk/api/` for the test environment or `https://submission.ebi.ac.uk/api/` for the production environment.

Please check the root of the repository if you have doubts about what the user and the password are.

`cred.txt` is ignored by the repository, so there is no risk of accidentally uploading it when you push changes.
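If you want to sanity-check your `cred.txt` before running anything, a small parser along these lines may help. This is only a sketch: `parse_credentials` and `read_credentials` are hypothetical helpers written for this guide, not functions of `DSP_submission`, and the module may read the file differently.

```python
from pathlib import Path

# The three keys described in this guide.
REQUIRED_KEYS = ("USER", "PASSWORD", "ROOT")


def parse_credentials(text: str) -> dict:
    """Parse KEY=VALUE lines into a dict and check the expected keys are present."""
    credentials = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first '=' only, so passwords containing '=' survive.
        key, _, value = line.partition("=")
        credentials[key.strip()] = value.strip()
    missing = [key for key in REQUIRED_KEYS if key not in credentials]
    if missing:
        raise ValueError("cred.txt is missing: " + ", ".join(missing))
    return credentials


def read_credentials(path: str = "cred.txt") -> dict:
    """Read and parse the credentials file from the given path."""
    return parse_credentials(Path(path).read_text())
```

Calling `read_credentials()` from the folder that holds `cred.txt` raises a `ValueError` naming any missing key.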

Most of the functions return values that can be used by wrapper functions. If you want to see these return values, just delete the `;` at the end of the code cells.

# Running the script

Once everything has been set up, we can begin with the code:

In [1]:
import DSP_submission as ds  # Import the module

dsp = ds.DspCLI()

The code is documented, and everything can be accessed through Python's `help()` function. For example, to list all the methods available to the user along with their descriptions:

In [2]:
help(dsp)

Help on DspCLI in module DSP_submission object:

class DspCLI(builtins.object)
 |  Methods defined here:
 |  
 |  __init__(self)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  create_new_team(self, description: str, centre_name: str) -> True
 |      Create a new team for the user.
 |      :param description : str
 |                           Brief description of the team
 |      :param centre_name : str
 |                           Name of the centre of the submission (e.g. EBI)
 |      :returns response : requests.Response
 |                          Response object from requests module
 |  
 |  create_submission(self, name: str = '') -> True
 |      Create an empty submission within the selected team.
 |      :param name: str
 |                   Name of the submission. If not specified the submission will be identified by its UUID.
 |      :returns response : requests.Response
 |                          Response object from requests module
 |  
 |  

You can also look at a specific method:

In [None]:
help(dsp.create_new_team)

## Creating and selecting a team and a submission

In this notebook, we're going to go through the whole process of a mock submission. In order to do so, we need to begin by creating a team:

In [3]:
dsp.create_new_team(description="Mock for notebook", centre_name="EBI")

<Response [201]>

Response 201 means that it has successfully been created. Now we will proceed to select the team:

In [4]:
dsp.select_team()
;

The teams are the following:
1 - subs.test-team-68
2 - subs.test-team-64
3 - subs.test-team-60
4 - subs.test-team-61
5 - subs.test-team-62
Please select a number: 1


''

DSP assigns the team name automatically, so it can be hard to find your team if you have more than one. This function also returns a JSON with the team content, so a wrapper around it could solve this.
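As one possible starting point for such a wrapper, the sketch below picks a team from a list of names by its trailing number. It assumes, without confirmation from DSP, that a higher number means a more recently created team; `newest_team` is a hypothetical helper and not part of `DSP_submission`.

```python
import re


def newest_team(team_names):
    """Return the team name with the highest trailing number, or None if none has one."""
    numbered = []
    for name in team_names:
        match = re.search(r"(\d+)$", name)  # e.g. the '68' in 'subs.test-team-68'
        if match:
            numbered.append((int(match.group(1)), name))
    # Assumption: the highest suffix belongs to the most recently created team.
    return max(numbered)[1] if numbered else None
```

With the names listed above, this would single out `subs.test-team-68`, which is the team created in this walkthrough.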

Next step is to create a submission:

In [5]:
dsp.create_submission(name='Mock_submission')

<Response [201]>

And select it:

In [6]:
dsp.select_submission()
;

Submissions available for team 'subs.test-team-68' are the following:
1 - Name: Mock_submission
Please select a number: 1


''

This returns the submission, but you don't need to worry about that. 

If, at some point, you just want to look at what submissions are available for a team, you can also run:

In [7]:
dsp.show_submissions()
;

Submissions available for team 'subs.test-team-68' are the following:
1 - Name: Mock_submission


''

Or show the available teams for the user:

In [8]:
dsp.show_teams()
;

The teams are the following:
1 - subs.test-team-68
2 - subs.test-team-64
3 - subs.test-team-60
4 - subs.test-team-61
5 - subs.test-team-62


''

# I created my submission. Now what?

Once you have created your submission and selected it (selecting is not needed in order to create one), the next step is to locate your submittables. **This guide assumes you already have the submittable JSONs ready**.

As you may or may not know, the DSP divides "submittables" into 5 categories, each of which is validated differently. The object keeps a hardcoded list of accepted submittables as an attribute:

In [9]:
dsp.show_accepted_submittables()

1 - projects
2 - samples
3 - study
4 - assays
5 - assay_data


From here, you have 2 options:

1. Push the submittables from a directory with the function `self.submit_directory(directory_name)`
1. Push the submittables one by one with the function `self.create_submittable(json_content, submittable_type)`.

The first one is strongly discouraged as it requires all the submittables to have a filename like `<submittable_type>__<submittable_name>.json` (e.g. `samples__cell_suspension_1.json`) and doesn't account for validation errors due to sample linking.
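To make the naming convention concrete, here is a minimal sketch that splits such a filename into its type and name and rejects anything that does not follow the pattern. `split_submittable_filename` is a hypothetical helper written for this guide; the list of types is copied from the output of `show_accepted_submittables()` above.

```python
# Types as printed by dsp.show_accepted_submittables() in this notebook.
ACCEPTED_TYPES = ("projects", "samples", "study", "assays", "assay_data")


def split_submittable_filename(filename: str):
    """Split '<submittable_type>__<submittable_name>.json' into (type, name)."""
    if not filename.endswith(".json"):
        raise ValueError(f"{filename!r} is not a .json file")
    stem = filename[: -len(".json")]
    # Split on the first double underscore only; names may contain '_' or '.'.
    submittable_type, separator, name = stem.partition("__")
    if not separator or not name:
        raise ValueError(f"{filename!r} does not contain '<type>__<name>'")
    if submittable_type not in ACCEPTED_TYPES:
        raise ValueError(f"{submittable_type!r} is not an accepted submittable type")
    return submittable_type, name
```

Note that types such as `assay_data` contain a single underscore, which is why the split has to happen on the double underscore.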

For the purpose of this walkthrough, we will submit one by one by using the second function. As the JSON content, you can either pass a Python dictionary with the content of the submittable or a string with the path of a JSON file. We will also check that the submittable has been created correctly:

In [10]:
submittables_directory = '/Users/enrique/HumanCellAtlas/hca-to-dsp-tools/test_submission/' # This is where the submittables are

submittables = ['projects__EmbryonicHindlimb.json',
                'study__EmbryonicHindlimb.json',
                'assays__lib_5.json',
                'assay_data__5386STDY7557335.bam.json']

for submittable in submittables:
    submittable_type = submittable.split('__')[0]
    submittable_path = submittables_directory + submittable
    dsp.create_submittable(submittable_path, submittable_type)

Creation of projects EmbryonicHindlimb was successful!
Creation of enaStudies EmbryonicHindlimb was successful!
Creation of sequencingExperiments lib_5 was successful!
Creation of sequencingRuns 5386STDY7557335.bam was successful!


You can now check that the submittables are in your submission with the function `self.show_submittable_names(<submittable_type>)`. If we want to check all of the submittables we can iterate over `self.accepted_submission_types`:

In [11]:
for submittable_type in dsp.accepted_submission_types:
    dsp.show_submittable_names(submittable_type)

Retrieving all projects. This might take a while...
There are 1 projects. Are you sure you want to print them all?[Y/n]
Y
For submission with ID fa3c1415-82b2-4d11-aa4f-18c1271304c3, projects files are:
EmbryonicHindlimb
Retrieving all samples. This might take a while...
Retrieving all enaStudies. This might take a while...
There are 1 enaStudies. Are you sure you want to print them all?[Y/n]
Y
For submission with ID fa3c1415-82b2-4d11-aa4f-18c1271304c3, enaStudies files are:
EmbryonicHindlimb
Retrieving all sequencingExperiments. This might take a while...
There are 1 sequencingExperiments. Are you sure you want to print them all?[Y/n]
Y
For submission with ID fa3c1415-82b2-4d11-aa4f-18c1271304c3, sequencingExperiments files are:
lib_5
Retrieving all sequencingRuns. This might take a while...
There are 1 sequencingRuns. Are you sure you want to print them all?[Y/n]
Y
For submission with ID fa3c1415-82b2-4d11-aa4f-18c1271304c3, sequencingRuns files are:
5386STDY7557335.bam


## Show validation results for your submittables

We are going to check that the submittables are valid. For the sake of this notebook, all of our submittables are valid except the `assay_data`, which will be invalid due to the file not being uploaded.

In [12]:
dsp.show_validation_results() # Assumes you have selected a submission
;

1 - For submittable with alias EmbryonicHindlimb, validation results are as following:
		Core:Pass
		Ena:Pass
		JsonSchema:Pass

2 - For submittable with alias lib_5, validation results are as following:
		Core:Error
		Ena:Error
		JsonSchema:Error

3 - For submittable with alias 5386STDY7557335.bam, validation results are as following:
		Core:Pass
		Ena:Error
		FileReference:Error
		JsonSchema:Pass

4 - For submittable with alias EmbryonicHindlimb, validation results are as following:
		BioStudies:Pass
		JsonSchema:Pass



''

This is an overall view for your submittables. To show the specific errors, you have to call the function `self.show_validation_errors()`:

In [13]:
dsp.show_validation_errors()
;

lib_5
	Schema: Core
	Error(s):
		Could not find reference for ALIAS: cell_suspension_5 in TEAM: subs.test-team-68 



	Schema: Ena
	Error(s):
		Failed to validate experiment xml, error: string value 'HCA-Seq' is not a valid enumeration value for typeLibraryStrategy



	Schema: JsonSchema
	Error(s):
		.attributes.library_strategy[0].value error(s): should be equal to one of the allowed values: ["AMPLICON","ATAC-seq","Bisulfite-Seq","ChIA-PET","ChIP-Seq","CLONE","CLONEEND","CTS","DNase-Hypersensitivity","EST","FAIRE-seq","FINISHING","FL-cDNA","Hi-C","MBD-Seq","MeDIP-Seq","miRNA-Seq","MNase-Seq","MRE-Seq","ncRNA-Seq","OTHER","POOLCLONE","RAD-Seq","RIP-Seq","RNA-Seq","SELEX","ssRNA-seq","Synthetic-Long-Read","Targeted-Capture","Tethered Chromatin Conformation Capture","Tn-Seq","VALIDATION","WCS","WGA","WGS","WXS"].


5386STDY7557335.bam
	Schema: Ena
	Error(s):
		Failed to validate experiment xml, error: string value 'HCA-Seq' is not a valid enumeration value for typeLibraryStrategy



	Sch

''
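When a submission holds many submittables, the printed overview from `show_validation_results()` can get long. The sketch below parses text in the format printed earlier in this section and keeps only the entries with at least one `Error`. It works on the printed text only, so it assumes that format stays as shown; `failing_validators` is a hypothetical helper, not part of `DSP_submission`.

```python
import re

# Matches lines such as:
# "2 - For submittable with alias lib_5, validation results are as following:"
HEADER = re.compile(r"^\d+ - For submittable with alias (?P<alias>.+), validation results")


def failing_validators(report: str):
    """Return a list of (alias, [validators reporting Error]) for failing entries."""
    entries = []
    for raw_line in report.splitlines():
        line = raw_line.strip()
        header = HEADER.match(line)
        if header:
            entries.append((header.group("alias"), []))
        elif ":" in line and entries:
            validator, status = line.split(":", 1)
            if status.strip() == "Error":
                entries[-1][1].append(validator.strip())
    # A list is used instead of a dict because aliases can repeat across types.
    return [entry for entry in entries if entry[1]]
```

Applied to the overview shown above, this would leave only `lib_5` and `5386STDY7557335.bam`.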

## Correct validation errors

Oh no! We have introduced HCA-Seq instead of RNA-Seq as the library strategy!

If there are validation errors, you can easily correct them by replacing the submittable. Luckily, there is a function for that in the class: `self.replace_submittable()`. We are going to call it with the corrected submittable:

In [14]:
replacement_path = '/Users/enrique/HumanCellAtlas/hca-to-dsp-tools/test_submission/assays__lib_5_replacement.json'

dsp.replace_submittable(replacement_path, 'assays')

Replacement of sequencingExperiments lib_5 was successful!


<Response [200]>

We can check once again for the validation errors:

In [15]:
dsp.show_validation_errors()
;

lib_5
	Schema: Core
	Error(s):
		Could not find reference for ALIAS: cell_suspension_5 in TEAM: subs.test-team-68 


5386STDY7557335.bam
	Schema: FileReference
	Error(s):
		The file [5386STDY7557335.bam] referenced in the metadata is not exists on the file storage area.




''

## Upload files

Once we have finished correcting all the metadata errors, we will notice that one validation error remains: the file does not exist in the file storage area.

We can easily change that by uploading the file with the provided method in the class:

In [16]:
path_to_file = '/Users/enrique/HumanCellAtlas/hca-to-dsp-tools/5386STDY7557335.bam'
dsp.upload_file(path_to_file, chunk_size=300)

Uploading file 5386STDY7557335.bam...


100%|██████████| 309/309 [00:40<00:00,  7.62it/s]


For the purpose of the notebook, a small mock BAM file has been used.

If not specified, `chunk_size` defaults to 102400 (100 KiB, assuming the value is expressed in bytes). Larger chunks give higher speeds on a stable connection.
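As a rough illustration of how `chunk_size` relates to the progress bar above, the sketch below computes how many chunks an upload needs. It assumes that `chunk_size` is expressed in bytes and that the bar counts chunks, neither of which is confirmed in this guide. `chunk_count` is a hypothetical helper, and the 92,700-byte file size is made up so that it reproduces the 309 steps shown.

```python
def chunk_count(file_size: int, chunk_size: int) -> int:
    """Number of chunks needed to send file_size bytes in pieces of chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Ceiling division using integers only: the last chunk may be partial.
    return -(-file_size // chunk_size)


# Hypothetical file size, not taken from the real mock file.
small_chunks = chunk_count(92_700, 300)
default_chunks = chunk_count(92_700, 102_400)
```

Under these assumptions, a file that needs hundreds of requests at `chunk_size=300` would fit in a single chunk at the default size, which is why a larger value suits a stable connection.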

### Resume file upload
If the file stops uploading, fear not! You can either call `self.upload_file()` again (It will detect the filename from the path to file) or call `self.resume_file_upload(<filename>)`.

We will interrupt a file upload and re-start it:


In [17]:
dsp.upload_file(path_to_file, chunk_size=300)

Uploading file 5386STDY7557335.bam...
Seems like this file is giving an error. This might be due to the file being resumed from anupload. Trying to resume upload for file 5386STDY7557335.bam
https://submission-test.ebi.ac.uk/files/08e44848f06aea3611fb01f08b55fe2d
Resuming file upload of 5386STDY7557335.bam...


100%|██████████| 245/245 [00:31<00:00,  7.88it/s]
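The shorter progress bar (245 steps instead of 309) reflects that only the remaining part of the file was sent. The general idea of resuming can be sketched as reading the file in chunks starting from a byte offset. This is not the implementation used by `DSP_submission`; `read_chunks` is a hypothetical helper, and in a real resumable upload the offset would come from the server.

```python
import io


def read_chunks(stream, chunk_size: int, offset: int = 0):
    """Yield successive chunks of a binary stream, starting at the given byte offset."""
    stream.seek(offset)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of the stream
        yield chunk


# Demo on in-memory data: 1000 bytes, resumed after 600 bytes were already sent.
data = io.BytesIO(bytes(1000))
remaining = list(read_chunks(data, chunk_size=300, offset=600))
```

In the demo only the last 400 bytes are read again, as one full chunk and one partial chunk.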


**Note**: Please make sure that you are in the right submission before uploading files. Files will be associated to the submission selected at the moment of upload.

### Delete file
If you want to delete a file, you just need to use the method `self.delete_file(<filename>)`:

In [18]:
dsp.delete_file('5386STDY7557335.bam')

Successfully deleted file


<Response [204]>
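So far the walkthrough has shown the responses 200, 201 and 204. If you are unsure what a code means, the standard library can translate it. The sketch below uses `http.HTTPStatus` and works on the plain integer; `describe_status` is a hypothetical helper written for this guide.

```python
from http import HTTPStatus


def describe_status(status_code: int) -> str:
    """Return a short human-readable description of an HTTP status code."""
    status = HTTPStatus(status_code)  # raises ValueError for unknown codes
    outcome = "success" if 200 <= status_code < 300 else "not a success"
    return f"{status_code} {status.phrase} ({outcome})"
```

For a `requests.Response` object such as the ones returned above, the integer is available as `response.status_code`.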

# Finishing the submission

Once everything is in order and there are no validation errors, you should be able to finish the submission by using the function `self.finish_submission()`. 

For the sake of this notebook, instead of finishing, we are going to delete the submission and the team.

Currently there are no functions to do that so we will do it manually.

In [19]:
dsp._delete(dsp.submission.get('_links', {}).get('self:delete', {}).get('href')) # Delete the submission

# To delete the team, access the AAP webpage (https://explore.aai.ebi.ac.uk/) and deactivate the domain.

<Response [204]>
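The chained `.get()` calls in the cell above return `None` instead of raising when a link is missing, so the delete call would then receive `None`. A small helper can make that case explicit. This is a sketch only: `get_link` is hypothetical, and the example resource below uses a placeholder URL rather than a real DSP link.

```python
def get_link(resource: dict, relation: str) -> str:
    """Return the href stored under resource['_links'][relation], or raise if absent."""
    href = resource.get("_links", {}).get(relation, {}).get("href")
    if href is None:
        raise LookupError(f"No '{relation}' link found on this resource")
    return href


# Placeholder resource for illustration; not a real submission.
example = {"_links": {"self:delete": {"href": "https://example.org/submissions/1"}}}
delete_url = get_link(example, "self:delete")
```

With the objects from this notebook, the equivalent call would be `get_link(dsp.submission, 'self:delete')`.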

# What other things can I do?

- Delete submittables: `self.delete_submittable(<submittable_type>, <alias>)`
- Show submission status: `self.show_submission_status()`