# Accessing your data in StormDB using queries

Background/motivation here...
When you collect large amounts of data, you want to avoid duplicating it, so that copies do not take up disk space. This matters most when a large pile of files is stacking up. With `os.symlink` you can create symbolic links to your raw data instead of copying it.

How the notebook is built up:
* Connect to StormDB.
* Store the series in a list, filtered by the criteria you specify.
* Create the destination folder for the symbolic links, if it does not already exist.
* Loop over the list to create the symbolic links.
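
As a minimal sketch of why a symbolic link avoids duplication, the cell below creates a small stand-in file in a temporary folder and links to it. The file names and contents are hypothetical and have nothing to do with StormDB; only `os.symlink` and the `os.path` helpers are real standard-library calls.

```python
import os
import tempfile

# Work in a throwaway folder so nothing real is touched
work_dir = tempfile.mkdtemp()

# Hypothetical stand-in for a raw data file (1 kB of zeros)
raw_file = os.path.join(work_dir, 'example_raw.fif')
with open(raw_file, 'wb') as f:
    f.write(b'\0' * 1024)

# Create a symbolic link that points at the stand-in file
link_file = os.path.join(work_dir, 'example_link.fif')
os.symlink(raw_file, link_file)

print(os.path.islink(link_file))   # the new entry is a link, not a copy
print(os.readlink(link_file))      # the path the link points to
print(os.path.getsize(link_file))  # size is read through the link
```

The link itself only stores the target path, so the data exists on disk once no matter how many links point to it.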

## Prerequisites

You must have the module `stormdb-python` in your Python path. A stable version is installed on the servers, but you may also want to clone a copy of the module into your project folder.

In [None]:
# These are optional, if you wish to modify your path
# import sys
# sys.path.insert(0, '/path/to/your/local/copy/of/stormdb-python')

## Import and initialise the Query-object
Remember to change `proj_name` to the name of your project.

In [1]:
from stormdb.access import Query
from os.path import join
import os

In [None]:
# optional: see documentation for Query
Query?

In [2]:
proj_name = 'MEG_service'
qy = Query(proj_name)

To see what methods a Python-object offers, type the name of the instance, a dot, and hit Tab!

## Do a search for series (file names) matching a particular pattern
You can browse the series in StormDB:
1. Log in to StormDB.
2. Click on the project.
3. Click on the subject.
4. Click on the study.
5. Find the right series.

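The first cell below is commented out; if enabled, it finds every series whose name matches the pattern `aud_vis*` and puts them into a list. The second cell uses the pattern `*`, which matches every series.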

In [4]:
#series_list = qy.filter_series('aud_vis*')

In [None]:
series_list = qy.filter_series('*')
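
The patterns passed to `filter_series` look like shell-style wildcards. The notebook does not say how they are matched, so the sketch below is only an assumption: it uses the standard-library `fnmatch` as a stand-in (not the StormDB implementation) and a list of hypothetical series names, to show what such patterns would select. Check `Query?` for the actual behaviour.

```python
from fnmatch import fnmatch

# Hypothetical series names, for illustration only
names = ['aud_vis_1', 'aud_vis_2', 'rest_eyes_open', 'emptyroom']

# Stand-in matcher with shell-style wildcards (an assumption, see above)
starts_with = [n for n in names if fnmatch(n, 'aud_vis*')]
contains = [n for n in names if fnmatch(n, '*vis*')]
everything = [n for n in names if fnmatch(n, '*')]

print(starts_with)
print(contains)
print(everything)
```

Under these semantics, `aud_vis*` selects names that begin with `aud_vis`, while `*vis*` selects names that contain `vis` anywhere.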

In [None]:
join?

## Useful commands for this notebook
* `series_list`
    * Shows what is in the list.
* `series_list[0]['path']`
    * Shows the path of the first series in the list.
* `series_list[0]['files'][0]`
    * Shows the first file name of the first series in the list.
* `join(series_list[0]['path'], series_list[0]['files'][0])`
    * Joins the two values above into the full path of the file.
* `os.symlink?`
    * Shows the documentation for `os.symlink`, including its arguments.
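
To try these commands without a database connection, here is a sketch with a hand-made stand-in for `series_list`. The keys are the ones used in this notebook; every value is invented for illustration, and the real list returned by `qy.filter_series` may contain more keys.

```python
from os.path import join

# Hypothetical stand-in for the result of qy.filter_series():
# a list with one dict per series (all values invented)
series_list = [
    {
        'path': '/hypothetical/path/to/series/files',
        'files': ['example_file_001.fif', 'example_file_002.fif'],
        'seriename': 'aud_vis',
        'subjectcode': '0001_ABC',
        'study': '20150101_000000',
    },
]

first = series_list[0]
print(first['path'])      # folder holding the files of the first series
print(first['files'][0])  # first file name of the first series

# Full path of the first file of the first series
full_path = join(first['path'], first['files'][0])
print(full_path)
```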

## Exercise: Creating a "Neuromag-like" folder structure for raw files

Elekta Neuromag MEG data are saved on the acquisition computer as:

```bash
/neuro/data/sinuhe/neuromag_project_name/subj_ID/yymmdd/foo_raw.fif
/neuro/data/sinuhe/neuromag_project_name/subj_ID/yymmdd/bar_raw.fif
/neuro/data/sinuhe/neuromag_project_name/subj_ID/yymmdd/bar_raw-1.fif
/neuro/data/sinuhe/neuromag_project_name/subj_ID/yymmdd/bar_raw-2.fif
```

where the suffix `-1`, `-2`, ..., indicates that the acquisition `bar` was so long that it was split into a total of 3 files of maximum size 2 GB each.

1. Make a folder called `scratch/raw_link` (`os.makedirs`).
2. Make sub-folders for the subject ID (`00XX_ABC`) and the study date (`yymmdd`).
3. Use `os.symlink` to make a symbolic link from each file in the raw folder to `raw_link/subj_ID/yymmdd/foo_raw.fif`.
    * Note that the source file name is not what we want the destination file to be called.
    * Instead, if the source is `raw/.../files/PROJ0xxx_SUBJ0yyy_SER0zz_FILESNO001.fif`, the destination should be `raw_link/0yyy_ABC/yymmdd/whatever_the_series_name_is_raw.fif`.

The end result should look like this:

```bash
/projects/MINDLAB_PROJ_NAME/scratch/raw_link/subj_ID/yymmdd/foo_raw.fif
/projects/MINDLAB_PROJ_NAME/scratch/raw_link/subj_ID/yymmdd/bar_raw.fif
/projects/MINDLAB_PROJ_NAME/scratch/raw_link/subj_ID/yymmdd/bar_raw-1.fif
/projects/MINDLAB_PROJ_NAME/scratch/raw_link/subj_ID/yymmdd/bar_raw-2.fif
```
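
The naming rules above can be sketched as two small helpers. Both function names (`link_name`, `link_folder`) are hypothetical and not part of `stormdb-python`, and the example values are invented. The date slice assumes that the study string starts with the date as `yyyymmdd`.

```python
from os.path import join

def link_name(series_name, idx):
    """Destination file name for the idx-th file (0-based) of a series."""
    if idx == 0:
        return '{0}_raw.fif'.format(series_name)
    # Later files are continuation parts of a split acquisition
    return '{0}_raw-{1}.fif'.format(series_name, idx)

def link_folder(root, subject_code, study):
    """Destination folder root/subj_ID/yymmdd."""
    # Assumption: study starts with yyyymmdd, so [2:8] gives yymmdd
    return join(root, subject_code, study[2:8])

names = [link_name('bar', i) for i in range(3)]
folder = link_folder('/hypothetical/scratch', '0001_ABC', '20150101_000000')
print(names)
print(folder)
```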


In [None]:
out_folder = join('/projects', proj_name, 'scratch/raw_link',
                  series_list[0]['subjectcode'],
                  series_list[0]['study'][2:8])  # yymmdd part of the study

In [None]:
# Ensure that output path exists
if not os.path.exists(out_folder):
    os.makedirs(out_folder)
print('Output folder: {:s}'.format(out_folder))
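
As an alternative to the existence check, and assuming Python 3.2 or later, `os.makedirs` accepts `exist_ok=True`, which makes the call safe to repeat. The sketch below uses a temporary folder and a hypothetical subject/date path.

```python
import os
import tempfile

# Throwaway base folder; the sub-folder names are hypothetical
base = tempfile.mkdtemp()
target = os.path.join(base, 'raw_link', '0001_ABC', '150101')

# Creates all missing parent folders in one call
os.makedirs(target, exist_ok=True)

# A second call does nothing instead of raising an error
os.makedirs(target, exist_ok=True)

print(os.path.isdir(target))
```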

If you get into trouble, delete the `raw_link` folder and start again!
```bash
rm -rf raw_link
```

In [None]:
import errno

overwrite = True
# Note: out_folder was built from series_list[0], so all links are created there
for x in series_list:
    for idx, fil in enumerate(x['files']):
        src_fname = join(x['path'], fil)
        out_fname = join(out_folder, '{0}_raw.fif'.format(x['seriename']))
        if idx > 0:  # acquisition was split into several files (> 2 GB)
            out_fname = out_fname[:-4] + '-{0}.fif'.format(idx)

        try:
            os.symlink(src_fname, out_fname)
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise  # anything other than "file exists" is a real error
            if not overwrite:
                print('Link exists, skipping {0}'.format(out_fname))
            else:
                print('Link exists, re-linking {0}'.format(out_fname))
                os.remove(out_fname)
                os.symlink(src_fname, out_fname)
        #print(out_fname)
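
The loop above writes every link into the single `out_folder`. If the list spans several subjects or study dates, one possible variant is to derive the destination folder per series. The sketch below is one way to do that, not the notebook's own solution: the function name `link_series` is hypothetical, and the demo data is invented and lives in temporary folders.

```python
import os
import tempfile
from os.path import join

def link_series(series_list, root, overwrite=True):
    """Link every file of every series into root/subj_ID/yymmdd/."""
    created = []
    for series in series_list:
        # Destination folder depends on the series, not on series_list[0]
        out_dir = join(root, series['subjectcode'], series['study'][2:8])
        if not os.path.isdir(out_dir):
            os.makedirs(out_dir)
        for idx, fname in enumerate(series['files']):
            suffix = '' if idx == 0 else '-{0}'.format(idx)
            dest = join(out_dir,
                        '{0}_raw{1}.fif'.format(series['seriename'], suffix))
            if os.path.lexists(dest):  # also True for broken links
                if not overwrite:
                    continue
                os.remove(dest)
            os.symlink(join(series['path'], fname), dest)
            created.append(dest)
    return created

# Hypothetical demo data: two empty stand-in files for one series
src = tempfile.mkdtemp()
for name in ('a.fif', 'b.fif'):
    open(join(src, name), 'w').close()
demo = [{'path': src, 'files': ['a.fif', 'b.fif'], 'seriename': 'bar',
         'subjectcode': '0001_ABC', 'study': '20150101_000000'}]

root = tempfile.mkdtemp()
links = link_series(demo, root)
print(links)
```

Checking with `os.path.lexists` before linking replaces the `try`/`except` around `os.symlink`; which style to prefer is a matter of taste.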

In [None]:
os.symlink?