# Real world example: Upload Data and Create Entities

## Scenario

A measurement setup produces new files which should be uploaded to openBIS. In this example we will use the generic, well-known [IRIS data set](https://en.wikipedia.org/wiki/Iris_flower_data_set).
We want to write code that uploads a data set and attaches it to an experimental step.

The script should do the following:

* make sure the project and experiment exist - create them if necessary
* read the measurement description from an additional file (`measurement.txt`)
* create the name/code of the experimental step from this information
* search for this step - create it if it is not already there, setting its description from `measurement.txt`
* upload the two files (`iris.csv`, `measurement.txt`) as a dataset to the experimental step

This example shows the **interactive development process** - step by step from the first line to the complete script.

To get started, **just run the cell below after replacing "mmusterm" with your BAM username**. The username is used both for the connection and for selecting the space.

In [None]:
username = "mmusterm"

## Start: Connecting to openBIS

In [None]:
import getpass
from pybis import Openbis
o = Openbis('https://schulung.datastore.bam.de')
o.login(username, getpass.getpass('Enter openBIS password: '))

## Optional: Create dummy data

In [None]:
space_code = username.upper()
project_code = 'IRIS_PROJECT'
collection_code = 'IRIS_EXPERIMENT'
object_code = 'IRIS_STEP'

my_space = o.get_space(space_code)

try:
    my_project = my_space.get_project(project_code)
except ValueError:
    my_project = o.new_project(space=my_space, code=project_code)
    my_project.save()

try:
    my_collection = my_space.get_collection(collection_code)
except ValueError:
    my_collection = o.new_collection(project=my_project, code=collection_code, type='DEFAULT_EXPERIMENT')
    my_collection.save()

# search first: indexing an empty result with [0] would raise an error
existing = my_space.get_objects(code=object_code, project=project_code, collection=my_collection, type='EXPERIMENTAL_STEP')
if existing:
    my_object = existing[0]
else:
    my_object = o.new_object(code=object_code, collection=my_collection, type='EXPERIMENTAL_STEP')
    my_object.save()

# download the data file
import requests
resp = requests.get('https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv')
with open('iris.csv', 'w') as csvfile:
     csvfile.write(resp.text)
with open('measurement.txt', 'w') as txtfile:
     txtfile.write('foo\nbar\nbaz\n')

## Upload a dataset and attach to an experimental step

### Explore types and entities

#### List dataset types

In [None]:
o.get_dataset_types() # list dataset types to select the desired one

In [None]:
# select and store dataset type
dataset_type = 'RAW_DATA'
dataset_type

#### List collections (experiments)

In [None]:
my_space_code = username.upper() # use username or set manually
my_space = o.get_space(my_space_code) # get the space which will be used
my_space.get_collections() # list collections to check where we want to upload the dataset

In [None]:
# select and store this IRIS_EXPERIMENT
my_experiment = my_space.get_collection('IRIS_EXPERIMENT') #save selected collection in a variable
my_experiment

#### List objects (samples or experimental steps)

In [None]:
my_space.get_objects(collection=my_experiment) # list objects to check where we want to upload the dataset

In [None]:
my_step = my_space.get_objects(code='IRIS_STEP', collection=my_experiment)[0] # save selected object in a variable
# or: my_step = my_space.get_object('/MMUSTERM/IRIS_PROJECT/IRIS_STEP')
my_step

### Upload a dataset and attach to the object
The dataset will contain just two files: `iris.csv` and `measurement.txt`.

In [None]:
my_dataset = o.new_dataset(
    type = dataset_type, # selected type for the dataset
    collection = my_experiment, # selected collection
    object = my_step, # selected object
    files = ['iris.csv', 'measurement.txt'] # iris dataset to upload
)
my_dataset.save()
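
Before creating the dataset it can help to confirm that both files are actually on disk. The following is a minimal sketch of such a check; the helper name `missing_files` is hypothetical and not part of pybis, and the check only looks at the local file system.

```python
from pathlib import Path

def missing_files(paths):
    """Return the entries of `paths` that are not existing regular files."""
    return [p for p in paths if not Path(p).is_file()]

# the two files this tutorial uploads
files = ['iris.csv', 'measurement.txt']

missing = missing_files(files)
if missing:
    print('Missing files, nothing to upload yet:', missing)
else:
    print('All files present, ready to create the dataset')
```

Running such a check first means a forgotten file is reported locally, before any request is sent to the server.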

### Modify description (property) of the experimental step after upload

In [None]:
# read the content of measurement.txt
with open('measurement.txt', 'r') as txtfile:
     desc = txtfile.read()
print(desc)
my_step.props['experimental_step.experimental_description'] = desc
my_step.save()

Now we have all the code to upload a dataset to an existing object and alter its properties.

## Create the experimental step, experiment and project if needed

### Search or create the experimental step/object

For every measurement series a new experimental step should be used, based on the contents of the file `measurement.txt`. So we need to read this file first and use its first word for the code of the experimental step.

In [None]:
with open('measurement.txt', 'r') as txtfile:
     desc = txtfile.read()
my_step_name = 'IRIS_'+desc.split()[0].upper()
my_step_name
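
The cell above fails with an `IndexError` if `measurement.txt` is empty or contains only whitespace. A minimal sketch of a more defensive variant follows; the function name `step_code_from_description` and the `prefix` parameter are assumptions made for illustration.

```python
def step_code_from_description(desc, prefix='IRIS_'):
    """Build the step code from the first word of the description text."""
    words = desc.split()
    if not words:
        # an empty file gives us nothing to derive a code from
        raise ValueError('measurement description is empty')
    return prefix + words[0].upper()

# same content as the dummy measurement.txt created earlier
print(step_code_from_description('foo\nbar\nbaz\n'))  # prints IRIS_FOO
```

Raising a descriptive error at this point makes the cause visible, instead of surfacing later as an odd or missing step code.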

In [None]:
steps = o.get_objects(code=my_step_name, project=my_project)
if steps:
    my_step = steps[0]
else:
    my_step = o.new_object(
        type = 'EXPERIMENTAL_STEP',
        project = my_project,
        collection = my_collection,
        code = my_step_name
    )
    my_step.save()
my_step # is now an existing or newly created step

### Search or create the experiment / collection and the project

For experiments and projects we can use the very powerful `try-except` mechanism of Python. Just try to get something. If it fails: create it!

In [None]:
project_code = 'IRIS_PROJECT'
collection_code = 'IRIS_EXPERIMENT'
collection_type = 'DEFAULT_EXPERIMENT'

# space
my_space_code = username.upper() # use username or set manually
my_space = o.get_space(my_space_code) # get the space which will be used

# project
try:
    my_project = my_space.get_project(project_code)
except ValueError:
    my_project = o.new_project(space=my_space, code=project_code)
    my_project.save()

# collection
try:
    my_collection = my_space.get_collection(collection_code)
except ValueError:
    my_collection = o.new_collection(project=my_project, code=collection_code, type=collection_type)
    my_collection.save()
# now project and collection should exist!
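
The try-get, create-on-failure pattern repeats for every entity, so it can be factored into a small helper. The sketch below demonstrates the idea against a hypothetical in-memory store rather than a real openBIS server; `get_or_create`, `get_entity`, `create_entity` and `store` are illustrative names, not pybis functions. Like the cells above, it assumes the getter signals "not found" with a `ValueError`.

```python
def get_or_create(get, create):
    """Return get(); if it raises ValueError, return create() instead."""
    try:
        return get()
    except ValueError:
        return create()

# hypothetical in-memory stand-in for the server, for demonstration only
store = {}

def get_entity(code):
    if code not in store:
        raise ValueError(f'no such entity: {code}')
    return store[code]

def create_entity(code):
    store[code] = {'code': code}
    return store[code]

# the first call creates the entity, the second call finds the existing one
first = get_or_create(lambda: get_entity('IRIS_PROJECT'),
                      lambda: create_entity('IRIS_PROJECT'))
second = get_or_create(lambda: get_entity('IRIS_PROJECT'),
                       lambda: create_entity('IRIS_PROJECT'))
print(first is second)  # prints True
```

With pybis, `get` would wrap a call such as `my_space.get_project(project_code)`, and `create` would wrap the corresponding `new_project(...)` call followed by `save()`.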

## Putting it all together: the complete script

### Create a PAT to be used with this script
We may use a PAT (personal access token) instead of username and password for the script, but we have to create the PAT first. Adjust the code below and run it.

In [None]:
import getpass
from pybis import Openbis
o = Openbis(url='https://schulung.datastore.bam.de/')
o.login('mmusterm', getpass.getpass('Enter openBIS password: '))
pat = o.get_or_create_personal_access_token('my-test-session')
print(pat.permId)
o.logout()
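
Once the PAT exists, it does not have to be pasted into the script itself. One option, sketched here under the assumption that the token is stored in an environment variable named `OPENBIS_PAT` (a name chosen for this example, not a pybis convention), is to read it at start-up:

```python
import os

def load_pat(var_name='OPENBIS_PAT', environ=os.environ):
    """Read the personal access token from an environment variable."""
    token = environ.get(var_name, '').strip()
    if not token:
        raise RuntimeError(f'environment variable {var_name} is not set')
    return token

# demonstration with an explicit mapping instead of the real environment
print(load_pat(environ={'OPENBIS_PAT': 'example-token'}))  # prints example-token
```

In a script, `pat = load_pat()` would then replace a hard-coded token string, which keeps the secret out of version control.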

### The Complete Script

Now we combine all of the code above into a single cell/script that can be used standalone.

In [None]:
from pybis import Openbis

# settings
pat = 'INSERT_PAT_HERE'
space_code = 'MMUSTERM'
project_code = 'IRIS_PROJECT'
collection_code = 'IRIS_EXPERIMENT'
collection_type = 'DEFAULT_EXPERIMENT'
object_type = 'EXPERIMENTAL_STEP' 

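# connect using the PAT created above, so that no password is stored in the script
o = Openbis('https://schulung.datastore.bam.de/', token=pat)
# alternative for interactive use (needs `import getpass`):
# o = Openbis('https://schulung.datastore.bam.de/')
# o.login('mmusterm', getpass.getpass('Enter openBIS password: '))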

# space
my_space = o.get_space(space_code) # get the space which will be used

# project and collection
try:
    my_project = my_space.get_project(project_code)
except ValueError:
    my_project = o.new_project(space=my_space, code=project_code)
    my_project.save()
try:
    my_collection = my_space.get_collection(collection_code)
except ValueError:
    my_collection = o.new_collection(project=my_project, code=collection_code, type=collection_type)
    my_collection.save()

# object/step
with open('measurement.txt', 'r') as txtfile:
     desc = txtfile.read()
my_step_name = 'IRIS_'+desc.split()[0].upper()
steps = o.get_objects(code=my_step_name, project=my_project)
if steps:
    my_step = steps[0]
else:
    my_step = o.new_object(
        type = object_type,
        project = my_project,
        collection = my_collection,
        code = my_step_name,
        props = {'experimental_step.experimental_description' : desc}
    )
    print(my_step)
    my_step.save()

# dataset
dataset_type = 'RAW_DATA' # dataset type selected during exploration
my_dataset = o.new_dataset(
    type = dataset_type, # selected type for the dataset
    collection = my_collection, # collection found or created above
    object = my_step, # step found or created above
    files = ['iris.csv', 'measurement.txt'] # iris dataset to upload
)
my_dataset.save()
o.logout()