In [None]:
__author__ = 'Alice Jacques <alice.jacques@noao.edu>, NOIRLab Astro Data Lab Team <datalab@noao.edu>'
__version__ = '20200915'
__keywords__ = ['vospace','mydb','store files','query']

# How to use the Data Lab *Command Line Client* Service

### Table of Contents

* [Summary](#summary)
* [Disclaimer & attribution](#attribution)
* [Imports & setup](#imports)
* [Handling VOSpace directories/files via the datalab command line client](#datalabcommand)
    - [Uploading a file](#cmdupload)
    - [Downloading a file](#cmddownload)
    - [Copying a file/directory](#cmdcopy)
    - [Linking to a file/directory](#cmdlink)
    - [Listing a file/directory](#cmdlist)
    - [Creating a directory](#cmdcreate)
    - [Moving a file/directory](#cmdmove)
    - [Deleting a file](#cmddeletefile)
    - [Deleting a directory](#cmddeletedirectory)
    - ~[Tagging a file/directory](#cmdtag)~
* [Handling MyDB tables via the datalab command line client](#datalabcmdmydb)
    - [Listing MyDB tables and a table's schema](#listcmdmydb)
    - [Creating a MyDB table](#createcmdmydb)
    - [Inserting data into a MyDB table](#insertcmdmydb)
    - [Importing data into a MyDB table](#importcmdmydb)
    - [Truncating a MyDB table](#truncmdmydb)
    - [Copying a MyDB table](#copycmdmydb)
    - [Renaming a MyDB table](#renamecmdmydb)
    - [Dropping a MyDB table](#dropcmdmydb)
    
*Note: crossed-out entries above indicate features that are currently not working.*

<a class="anchor" id="summary"></a>
# Summary

This notebook documents how to use the Data Lab virtual storage system (VOSpace) and a user's personal MyDB database from the command line with the `datalab` command. The `datalab` command provides an alternative, command-line way to work with the auth client, query client, and store client. The API documentation can be found [here](https://datalab.noao.edu/docs/manual/UsingTheNOAODataLab/CommandLineTools/TheDatalabCommand/TheDatalabCommand.html).

<a class="anchor" id="attribution"></a>
# Disclaimer & attribution
If you use this notebook for your published science, please acknowledge the following:

* Data Lab concept paper: Fitzpatrick et al., "The NOAO Data Laboratory: a conceptual overview", SPIE, 9149, 2014, http://dx.doi.org/10.1117/12.2057445

* Data Lab disclaimer: http://datalab.noao.edu/disclaimers.php

<a class="anchor" id="imports"></a>
# Imports & setup

In [None]:
from dl import queryClient as qc, storeClient as sc
from dl.helpers.utils import convert

In [None]:
qc.set_svc_url('http://dltest.datalab.noao.edu/query')
sc.set_svc_url('http://dltest.datalab.noao.edu/storage')

The Data Lab Command Line Client (DCLC) is a Python-based package that provides an alternative way to interact with the various Data Lab services. It can be installed with

    pip install --ignore-installed --no-cache-dir noaodatalab
    
It is invoked via the `datalab` command.
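The cells in this notebook call `datalab` through the notebook's `!` shell escape, which is not available in a plain Python script. As a minimal sketch of one way to do the same thing from a script, the helper below builds the argument list for a single `datalab` task in the `task key=value ...` form used throughout this notebook. The helper name `datalab_argv` is hypothetical and not part of the Data Lab package; running the command still requires the client to be installed and a prior login.

```python
import shlex


def datalab_argv(task, **params):
    """Build the argument list for one `datalab` task.

    `datalab_argv` is a hypothetical helper for illustration only;
    it is not part of the Data Lab client.
    """
    argv = ["datalab", task]
    for key, value in params.items():
        # Each parameter is passed as a single key=value argument.
        argv.append(f"{key}={value}")
    return argv


argv = datalab_argv("put", fr="./smags.csv", to="vos://smags.csv")

# Show the equivalent shell command line.
print(shlex.join(argv))

# To actually run it (assumes the client is installed and you are logged in):
# import subprocess
# subprocess.run(argv, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file names contain spaces.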

We need to be logged into the Data Lab to use the query client and store client. If you are not already logged in, enter your Data Lab username after '*user=*' and enter your password for Data Lab after '*password=*' below:

In [None]:
!datalab login user=ajacques_dltest password=

<a class="anchor" id="datalabcommand"></a>
# 1. Handling VOSpace directories/files via the datalab command line client

The `datalab` command provides a way to use a user's VOSpace. VOSpace is a convenient virtual storage space for users to save their work. It can store any data or file type.

Before we start this section, let's first query some example data from a Data Lab database and save it locally as a CSV file named `smags.csv`:

In [None]:
query = 'SELECT gmag, imag, rmag, zmag FROM smash_dr1.object LIMIT 10'
qc.query(adql=query,fmt='csv',out='./smags.csv')

<a class="anchor" id="cmdupload"></a>
### 1.1 Uploading a file

Let's say we want to upload a file from our local disk to the virtual storage:

In [None]:
!datalab put fr="./smags.csv" to="vos://smags.csv"

We can check that it has been uploaded to VOSpace:

In [None]:
!datalab ls name="vos://smags.csv"

<a class="anchor" id="cmddownload"></a>
### 1.2 Downloading a file

Let's say we want to download a file from our virtual storage space, in this case the CSV file that we uploaded to it in the last cell:

In [None]:
!datalab get fr="vos://smags.csv" to="./mysmags.csv"
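After an upload followed by a download, it can be reassuring to confirm that the file survived the round trip unchanged. The sketch below compares two local files byte for byte with the standard-library `filecmp` module. The helper name `same_contents` is hypothetical, and the sketch uses stand-in temporary files with made-up values so that it runs anywhere; in this notebook one would pass `./smags.csv` and `./mysmags.csv` instead.

```python
import filecmp
import os
import tempfile


def same_contents(path_a, path_b):
    """Return True if both files contain identical bytes."""
    # shallow=False forces a content comparison instead of trusting file metadata.
    return filecmp.cmp(path_a, path_b, shallow=False)


# Stand-in files with made-up values, so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "smags.csv")
    downloaded = os.path.join(tmp, "mysmags.csv")
    for path in (original, downloaded):
        with open(path, "w") as fh:
            fh.write("gmag,imag,rmag,zmag\n21.1,20.4,20.7,20.2\n")
    round_trip_ok = same_contents(original, downloaded)

print(round_trip_ok)
```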

<a class="anchor" id="cmdcopy"></a>
### 1.3 Copying a file/directory

We want to put a copy of the file in a remote work directory:

In [None]:
!datalab cp fr="vos://smags.csv" to="vos://tmp/smags.csv"

<a class="anchor" id="cmdlink"></a>
### 1.4 Linking to a file/directory

Sometimes we want to create a link to a file or directory. The following creates a (soft) link to the specified file at the given location:

In [None]:
!datalab ln fr="vos://tmp/smags.csv" to="vos://smags.csv"

<a class="anchor" id="cmdlist"></a>
### 1.5 Listing a file/directory

We can see all the files that are in a specific directory or get a full listing for a specific file:

In [None]:
!datalab ls name="vos://tmp"

<a class="anchor" id="cmdcreate"></a>
### 1.6 Creating a directory

We can create a directory:

In [None]:
!datalab mkdir name="vos://results"

<a class="anchor" id="cmdmove"></a>
### 1.7 Moving a file/directory

We can move a file or directory:

In [None]:
!datalab mv fr="vos://tmp/smags.csv" to="vos://results"

<a class="anchor" id="cmddeletefile"></a>
### 1.8 Deleting a file

We can delete a file:

In [None]:
!datalab rm name="vos://results/smags.csv"

<a class="anchor" id="cmddeletedirectory"></a>
### 1.9 Deleting a directory

We can also delete a directory:

In [None]:
!datalab rmdir name="vos://results"

<a class="anchor" id="cmdtag"></a>
### ~1.10 Tagging a file/directory~
**Warning**: Tagging is currently **not** working in the Data Lab storage manager. This notebook will be updated when the problem has been resolved.

We can tag any file or directory with arbitrary metadata:

In [None]:
#!datalab tag name="vos://results" tag="The results from my analysis"

<a class="anchor" id="datalabcmdmydb"></a>
# 2. Handling MyDB tables via the datalab command line client
The `datalab` command also provides access to a user's MyDB database. MyDB is a personal database in which users can save their work as tables; unlike VOSpace, it can only store data tables. **_NOTE: The data must be in the form of either a CSV file or pandas DataFrame object in order to load it into MyDB._**

<a class="anchor" id="listcmdmydb"></a>
### 2.1 Listing MyDB tables and a table's schema

We can list all of the MyDB tables currently in a user's database with the `mydb_list` function:

In [None]:
!datalab mydb_list

We can also list the schema (column names and data types) of a specified MyDB table. The table name below is only an example; substitute a table that exists in your own MyDB:

In [None]:
!datalab mydb_list table="usno_objects"

<a class="anchor" id="createcmdmydb"></a>
### 2.2 Creating a MyDB table 
We can create a new empty MyDB table with a user-provided schema file using the `mydb_create` function with the following parameters:

*table* - name of the new MyDB table to create  
*schema* - location and name of the file containing the schema definition for the table

The schema definition is stored in a text file, in this case in the user's notebook directory. It is a CSV-formatted file that contains a column name and a (Postgres) data type, one row per column. The general format is:

`Columnname1,datatype1\nColumnname2,datatype2\nColumnname3,datatype3`
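Because a malformed schema file is easy to produce by hand, it may help to check the text locally before passing it to `mydb_create`. The following is a minimal sketch of such a check, based only on the `name,datatype` format described above. The function name `parse_schema` is hypothetical and not part of the Data Lab client, and the check does not verify that the data types are valid Postgres types.

```python
def parse_schema(text):
    """Parse 'name,datatype' lines into a list of (name, datatype) tuples.

    Hypothetical helper for illustration; raises ValueError on a malformed line.
    """
    columns = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        # Split on the first comma only, so types such as numeric(10,2) stay intact.
        name, sep, dtype = line.partition(",")
        if not sep or not name.strip() or not dtype.strip():
            raise ValueError(f"line {lineno}: expected 'name,datatype', got {line!r}")
        columns.append((name.strip(), dtype.strip()))
    return columns


schema_text = "id,integer\nra,double precision\ndec,double precision\n"
columns = parse_schema(schema_text)
print(columns)
```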

Let's first create a simple (id,ra,dec) schema of an integer value and two double values and save it locally as a text file named `schema.txt`:

In [None]:
# One "name,datatype" row per column
schema_str = 'id,integer\nra,double precision\ndec,double precision\n'
with open('schema.txt', 'w') as fd:
    fd.write(schema_str)

Now let's use the `mydb_create` function to make a new table in MyDB with the schema definition we created above:

In [None]:
!datalab mydb_create table="table1" schema="./schema.txt"

Let's make sure the table was created in MyDB by calling the `mydb_list` function:

In [None]:
!datalab mydb_list

Let's also make sure the schema was loaded into the table by calling the `mydb_list` function on the table:

In [None]:
!datalab mydb_list table="table1"

<a class="anchor" id="insertcmdmydb"></a>
### 2.3 Inserting data into a MyDB table

We can insert data saved on a local computer or in VOSpace into a pre-existing MyDB table. The data must be in the form of either a CSV file or pandas DataFrame object in order to load it into MyDB. Use the `mydb_insert` function with the following parameters:

*table* - name of the pre-existing MyDB table in which to insert the data  
*data* - location and name of the data to insert into the table

We will use the `exampledata.csv` file provided in this notebook directory as our data to insert into the `table1` table we created a few cells above:

In [None]:
!datalab mydb_insert table="table1" data="./exampledata.csv"
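If `exampledata.csv` is not at hand, a file of the same shape could be generated locally. The sketch below produces CSV text whose columns match the (id, ra, dec) schema of `table1`. The row values are made up for illustration, the helper name `rows_to_csv` is hypothetical, and including a header row is an assumption about what the insert expects; the actual contents of `exampledata.csv` are not shown in this notebook.

```python
import csv
import io

# Made-up example rows matching the (id, ra, dec) schema.
rows = [
    {"id": 1, "ra": 10.684, "dec": 41.269},
    {"id": 2, "ra": 83.822, "dec": -5.391},
]


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()  # header row: an assumption, see the text above
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows, ["id", "ra", "dec"])
print(csv_text)

# To save it under a name of your choosing (hypothetical file name):
# with open("my_exampledata.csv", "w") as fh:
#     fh.write(csv_text)
```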

**FROM A PYTHON SCRIPT (i.e. within this notebook, not on the command line):** Let's make sure the data was inserted into the table by converting the table into a pandas DataFrame and printing it on-screen:

In [None]:
df1=convert(qc.query(sql="SELECT * FROM mydb://table1"))
df1

<a class="anchor" id="importcmdmydb"></a>
### 2.4 Importing data into a MyDB table
We can import data saved on a local computer or in VOSpace into a new MyDB table. The data must be in the form of either a CSV file or pandas DataFrame object in order to load it into MyDB. Use the `mydb_import` function with the following parameters:

*table* - name of the new MyDB table to create with the imported data  
*data* - location and name of the data to import  

Let's first query some example data from a Data Lab database and save it locally as a CSV file named `gaia_result.csv`:

In [None]:
query = "SELECT * FROM gaia_dr1.gaia_source LIMIT 10"
qc.query(adql=query, fmt='csv', out='./gaia_result.csv')

Now we can import the queried data into a new MyDB table:

In [None]:
!datalab mydb_import table="gaia_result_table" data="./gaia_result.csv"

Let's make sure the table was created in MyDB by calling the `mydb_list` function:

In [None]:
!datalab mydb_list

**FROM A PYTHON SCRIPT (i.e. within this notebook, not on the command line):** Let's also make sure the data was imported into the table by converting the table into a pandas DataFrame and printing it on-screen:

In [None]:
df2=convert(qc.query(sql="SELECT * FROM mydb://gaia_result_table"))
df2

Similarly, we can use the `mydb_import` function to import data from VOSpace into a MyDB table:

In [None]:
!datalab mydb_import table="magsvos" data="vos://smags.csv"

Let's make sure the table was created in MyDB by calling the `mydb_list` function:

In [None]:
!datalab mydb_list

We can view the schema definition of the table by calling the `mydb_list` function on the table:

In [None]:
!datalab mydb_list table="magsvos"

**FROM A PYTHON SCRIPT (i.e. within this notebook, not on the command line):** Let's also make sure the data was imported into the table by converting the table into a pandas DataFrame and printing it on-screen:

In [None]:
df3=convert(qc.query(sql="SELECT * FROM mydb://magsvos"))
df3

<a class="anchor" id="truncmdmydb"></a>
### 2.5 Truncating a MyDB table 
            
We can truncate a MyDB table, i.e. drop all rows but keep the table definition (schema), with the `mydb_truncate` function and the following parameter:

*table* - name of the MyDB table to truncate

In [None]:
!datalab mydb_truncate table="table1" 
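What truncation means (all rows removed, table definition kept) can be illustrated locally without touching MyDB. The sketch below uses an in-memory SQLite database from the Python standard library purely as a stand-in; MyDB itself is not SQLite, and because SQLite has no `TRUNCATE` statement, a `DELETE` without a `WHERE` clause plays that role here. The rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, ra REAL, dec REAL)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 10.68, 41.27), (2, 83.82, -5.39)],  # made-up rows
)

rows_before = conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0]

# Stand-in for truncation: remove every row, leave the table in place.
conn.execute("DELETE FROM table1")

rows_after = conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0]

# PRAGMA table_info returns one row per column; index 1 holds the column name.
column_names = [row[1] for row in conn.execute("PRAGMA table_info(table1)")]
conn.close()

print(rows_before, rows_after, column_names)
```

The table ends up empty but still has its three columns, which is the same outcome the schema listing and the empty query result are meant to confirm for `table1` in MyDB.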

Let's make sure the table was truncated by calling the `mydb_list` function on the table: 

In [None]:
!datalab mydb_list table="table1"

**FROM A PYTHON SCRIPT (i.e. within this notebook, not on the command line):** We can also make sure the table was truncated by converting the table into a pandas DataFrame and printing it on-screen:

In [None]:
df4=convert(qc.query(sql="SELECT * FROM mydb://table1"))
df4

<a class="anchor" id="copycmdmydb"></a>
### 2.6 Copying a MyDB table
We can copy a MyDB table that currently exists in a user's MyDB database with the `mydb_copy` function and the following parameters:

*source* - name of table to copy  
*target* - name of new table with copied data from source table

In [None]:
!datalab mydb_copy source="magsvos" target="magsvos_copy"

Let's make sure the newly copied table exists in the MyDB database by calling the `mydb_list` function:

In [None]:
!datalab mydb_list

<a class="anchor" id="renamecmdmydb"></a>
### 2.7 Renaming a MyDB table
We can rename a MyDB table with the `mydb_rename` function and the following parameters:

*old* - name of table to rename  
*new* - new name of table

In [None]:
!datalab mydb_rename old="magsvos_copy" new="newermagsvos"

Let's make sure the name was changed by calling the `mydb_list` function:

In [None]:
!datalab mydb_list

<a class="anchor" id="dropcmdmydb"></a>
### 2.8 Dropping a MyDB table
We can remove a MyDB table from a user's MyDB database by calling the `mydb_drop` function and the following parameter:

*table* - name of the table we wish to remove from the MyDB database

In [None]:
!datalab mydb_drop table="newermagsvos"

Let's make sure the MyDB table was dropped by calling the `mydb_list` function:

In [None]:
!datalab mydb_list

# Clean up MyDB and VOSpace
For clean-up purposes, let's remove the tables we created in MyDB and files/directories we created in VOSpace.

In [None]:
!datalab mydb_drop table="gaia_result_table"
!datalab mydb_drop table="table1"
!datalab mydb_drop table="magsvos"
!datalab rm name="vos://smags.csv"
!datalab rm name="vos://mysmags.csv"
!datalab rmdir name="vos://tmp"