# Python MDCS Class

This notebook demonstrates the basic functionality of the Python mdcs.MDCS class.

## Library Imports

In [1]:
import os
import time

# Import the mdcs-api Python tools
import mdcs

## Initialize MDCS Link

Interacting with a curator requires access information, such as the host URL and user credentials. The MDCS class lets you define this information once, and it performs a quick check that the information is valid.

Keyword arguments for initialization:

- host = root URL of the curator.

- user = username, or a tuple of (username, password).

- pswd = password, or the path to a file containing the password (as in the iprhub example below).

- cert = certificate file that some curators may require.

- records_fetch = Boolean indicating whether a local copy of all records is built during initialization. Default value is True.
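
The `user` and `pswd` options accept several forms. The sketch below shows one way such arguments could be normalized into a (username, password) pair. `resolve_credentials` is a hypothetical helper, not the mdcs implementation. It assumes that an existing file path given as `pswd` holds the password on its first line, and that a missing password triggers a hidden prompt like the one shown later in this notebook.

```python
import getpass
import os


def resolve_credentials(user, pswd=None, host=''):
    """Hypothetical helper: return a (username, password) tuple.

    Assumptions (not taken from the mdcs source):
    - user may be a username string or a (username, password) tuple
    - pswd may be a literal password or a path to a file holding it
    - if no password is available, prompt for one without echoing
    """
    if isinstance(user, tuple):
        username, password = user
        return username, password
    username = user
    if pswd is None:
        # Interactive prompt, similar to the one shown for the local curator
        password = getpass.getpass('Enter password for %s on %s: ' % (username, host))
    elif os.path.isfile(pswd):
        # Treat pswd as a path and read the password from the first line
        with open(pswd) as f:
            password = f.readline().strip()
    else:
        password = pswd
    return username, password


print(resolve_credentials(('admin', 'admin')))  # ('admin', 'admin')
```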

#### Initialize an empty MDCS instance and fill in values separately

In [2]:
empty = mdcs.MDCS()
empty.host = 'http://127.0.0.1:8000/'
empty.user = ('admin', 'admin')

#### Initialize a link to a local MDCS using a password prompt

In [3]:
local = mdcs.MDCS(host='http://127.0.0.1:8000/', user='admin')

Enter password for admin on http://127.0.0.1:8000/:
password:········


#### Initialize a link to a remote MDCS using both certificate and password files

In [4]:
iprhub = mdcs.MDCS(host='https://iprhub.nist.gov/', 
                   user='lmh1',
                   pswd='C:/users/lmh1/documents/iprhub/iprhub_password.txt',
                   cert='C:/users/lmh1/documents/iprhub/iprhub-ca.pem')

## Explore Types and Templates

When initialized with access information, the MDCS object will query the database and build local Pandas DataFrames of all the types and templates. Copies of these DataFrames can be accessed using the MDCS.templates and MDCS.xsd_types properties.

#### List the column names of the templates DataFrame

In [5]:
for key in local.templates:
    print key

XSLTFiles
content
dependencies
exporters
filename
hash
id
templateVersion
title
version


In [6]:
local.templates

| | XSLTFiles | content | dependencies | exporters | filename | hash | id | templateVersion | title | version |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [] | `<?xml version="1.0" encoding="UTF-8" standalon...` | [57b32b34d92ecc10c8803322, 57b32b35d92ecc10c88... | [] | trash | e6cb30d24c44e96a72714322aae339f96d867454 | 57b32b3fd92ecc10c880333e | 57b32b3fd92ecc10c880333d | trash | 1 |
| 1 | [] | `<?xml version="1.0" encoding="UTF-8" standalon...` | [57b338efd92ecc10c8803341] | [] | trash | 23c20caab03fe167efc727c977cd66fafbe96ea2 | 57b338f0d92ecc10c8803344 | 57b338f0d92ecc10c8803343 | trash | 1 |
| 2 | [] | `<?xml version="1.0" encoding="UTF-8" standalon...` | [] | [] | record-calculation-system-relax.xsd | d516e03d699d3f2f8b51639f7497c57a6ca2869a | 5852caddd92ecc1b88224e7d | 5852caddd92ecc1b88224e7c | calculation-system-relax | 1 |
| 3 | [] | `<?xml version="1.0" encoding="UTF-8" standalon...` | [] | [] | record-calculation-cohesive-energy-relation.xsd | 39719f7d51c3f6ebb0d4c355a827dbad742e1719 | 5852cadfd92ecc1b88224e7f | 5852cadfd92ecc1b88224e7e | calculation-cohesive-energy-relation | 1 |
| 4 | [] | `<?xml version="1.0" encoding="UTF-8" standalon...` | [] | `[{u'id': u'569e544f7a1398b5d8e01679', u'name':...` | record-LAMMPS-potential.xsd | 97a9a8f39b9d8cbcab10b8dafe38bb82e9014b58 | 5858461dd92ecc08149ef7d2 | 5858461dd92ecc08149ef7d1 | LAMMPS-potential | 1 |
| 5 | [] | `<?xml version="1.0" encoding="UTF-8" standalon...` | [] | `[{u'id': u'569e544f7a1398b5d8e01679', u'name':...` | record-crystal-prototype.xsd | a832b6fb6c4fee0e8a7746c47ddac015845a13b3 | 585ac30bd92ecc0e6c7d541b | 585ac30bd92ecc0e6c7d541a | crystal-prototype | 1 |


#### List names of all templates on local

In [7]:
for template in local.templates.title.tolist():
    print template

trash
trash
calculation-system-relax
calculation-cohesive-energy-relation
LAMMPS-potential
crystal-prototype


#### List names of all types on local

In [8]:
for xsd_type in local.xsd_types.title.tolist():
    print xsd_type

uncertainty
physical-quantity
physical-quantity-numerator_L-denominator_
physical-quantity-numerator_MLL-denominator_TT
physical-quantity-numerator_M-denominator_LTT
chemical-element
chemical-element
chemical-element
atom
interatomic-potential-compact-id
elastic-stiffness-voigt
physical-quantity-numerator_1-denominator_
atomic-cell
atomic-system
physical-quantity-numerator_0-denominator_
note
remote-file
lammps-potential
uncertainty-new
new-uncertainty
newt-uncertainty
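
Several titles above appear more than once (for example `chemical-element` among the types and `trash` among the templates), which matters for the single-match lookups described next. A quick, standard-library way to spot repeated titles is `collections.Counter`. The list below is copied from the output above as sample data; with a live MDCS object it would come from `local.xsd_types.title.tolist()`.

```python
from collections import Counter

# Type titles as printed by the cell above (copied here as sample data)
titles = [
    'uncertainty', 'physical-quantity',
    'physical-quantity-numerator_L-denominator_',
    'physical-quantity-numerator_MLL-denominator_TT',
    'physical-quantity-numerator_M-denominator_LTT',
    'chemical-element', 'chemical-element', 'chemical-element',
    'atom', 'interatomic-potential-compact-id', 'elastic-stiffness-voigt',
    'physical-quantity-numerator_1-denominator_', 'atomic-cell',
    'atomic-system', 'physical-quantity-numerator_0-denominator_',
    'note', 'remote-file', 'lammps-potential', 'uncertainty-new',
    'new-uncertainty', 'newt-uncertainty',
]

# Keep only the titles that occur more than once
duplicates = {title: n for title, n in Counter(titles).items() if n > 1}
print(duplicates)  # {'chemical-element': 3}
```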


### Finding a Specific Template or Type

Pandas has powerful built-in tools for filtering the listed entries. For convenience, the MDCS class also provides the methods get_template() and get_xsd_type() for locating a specific template or type. These methods return the Series of the matching template/type and issue an error unless exactly one matching template/type is found.

Each method accepts one optional positional argument. If given, it is matched against the template's title and filename (for whichever of these is not specified as a keyword).

If any of the following keyword arguments are given, the returned template/type must have matching values for them:

- filename = name of the file associated with the template/type

- title = name assigned to the template/type

- version = version assigned to the template/type
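
The matching rules above can be sketched with plain dictionaries standing in for the DataFrame rows. `find_one` is a hypothetical helper, not the mdcs code. It assumes the optional positional argument may match either the title or the filename, that every keyword filter must match exactly, and that anything other than a single match is an error.

```python
def find_one(rows, name=None, **filters):
    """Return the single row matching name and filters; raise ValueError otherwise."""
    matches = []
    for row in rows:
        # The positional name may match the title or the filename (assumption)
        if name is not None and name not in (row.get('title'), row.get('filename')):
            continue
        # Every keyword filter (filename, title, version) must match exactly
        if all(row.get(key) == value for key, value in filters.items()):
            matches.append(row)
    if len(matches) != 1:
        raise ValueError('expected exactly one match, found %d' % len(matches))
    return matches[0]


# Sample rows based on the templates table above
templates = [
    {'title': 'trash', 'filename': 'trash', 'version': 1},
    {'title': 'trash', 'filename': 'trash', 'version': 1},
    {'title': 'LAMMPS-potential', 'filename': 'record-LAMMPS-potential.xsd', 'version': 1},
]
print(find_one(templates, 'LAMMPS-potential')['filename'])  # record-LAMMPS-potential.xsd
```

Looking up `'trash'` with this sketch raises an error, because two templates share that title.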

#### Get template information for LAMMPS-potential template

In [8]:
local.get_template('LAMMPS-potential')

XSLTFiles                                                         []
content            <?xml version="1.0" encoding="UTF-8" standalon...
dependencies                              [57b338efd92ecc10c8803341]
exporters                                                         []
filename                                 record-lammps-potential.xsd
hash                        23c20caab03fe167efc727c977cd66fafbe96ea2
id                                          57b338f0d92ecc10c8803344
templateVersion                             57b338f0d92ecc10c8803343
title                                               LAMMPS-potential
version                                                            1
Name: 1, dtype: object

## Explore Records

Records are similarly copied to a local DataFrame, which can be viewed through the MDCS.records property.

#### Display the content of iprhub.records

In [9]:
iprhub.records

| | \_id | content | schema | title |
|---|---|---|---|---|
| 0 | 5723b73e2403f03d834cfbd1 | `<?xml version="1.0" encoding="utf-8"?>\n<jarvi...` | 5723b6f92403f03d834cfbcf | JARVIS-one-demo |
| 1 | 573330b12403f0029503762c | `<?xml version="1.0" encoding="utf-8"?>\n<LAMMP...` | 56d45ae62403f01fe9027216 | 1985--Foiles-S-M--Ni-Cu |
| 2 | 573330b22403f0029503762d | `<?xml version="1.0" encoding="utf-8"?>\n<LAMMP...` | 56d45ae62403f01fe9027216 | 1987--Ackland-G-J--Ag |
| 3 | 573330b22403f0029503762e | `<?xml version="1.0" encoding="utf-8"?>\n<LAMMP...` | 56d45ae62403f01fe9027216 | 1987--Ackland-G-J--Au |
| 4 | 573330b22403f0029503762f | `<?xml version="1.0" encoding="utf-8"?>\n<LAMMP...` | 56d45ae62403f01fe9027216 | 1987--Ackland-G-J--Cu |
| 5 | 573330b32403f00295037630 | `<?xml version="1.0" encoding="utf-8"?>\n<LAMMP...` | 56d45ae62403f01fe9027216 | 1987--Ackland-G-J--Mo |
| 6 | 573330b32403f00295037631 | `<?xml version="1.0" encoding="utf-8"?>\n<LAMMP...` | 56d45ae62403f01fe9027216 | 1987--Ackland-G-J--Ni |
| 7 | 573330b32403f00295037632 | `<?xml version="1.0" encoding="utf-8"?>\n<LAMMP...` | 56d45ae62403f01fe9027216 | 1989--Adams-J-B--Ag |
| 8 | 573330b42403f00295037633 | `<?xml version="1.0" encoding="utf-8"?>\n<LAMMP...` | 56d45ae62403f01fe9027216 | 1989--Adams-J-B--Au |
| 9 | 573330b42403f00295037634 | `<?xml version="1.0" encoding="utf-8"?>\n<LAMMP...` | 56d45ae62403f01fe9027216 | 1989--Adams-J-B--Cu |


### Finding a Specific Record

The get_record() method locates a specific record. It returns the Series of the matching record and issues an error unless exactly one matching record is found.

If any of the following keyword arguments are given, the returned record must have matching values for them.

Keyword arguments are:

- title = name or id associated with the record.

- template = template (schema) associated with the record. Can be title, filename, id or Series representation of a template.
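
Because the `template` argument accepts several kinds of reference, something has to resolve them all to a single template. One way to do that is sketched below with dictionaries in place of pandas Series; `resolve_template_id` is a hypothetical helper, and the mdcs class may resolve references differently. The sample ids and filenames are taken from the templates table earlier in this notebook.

```python
def resolve_template_id(ref, templates):
    """Map a title, filename, id, or row-like mapping to a single template id."""
    if hasattr(ref, 'keys'):
        # Row-like objects (dicts here; a pandas Series also has .keys())
        return ref['id']
    matches = [t['id'] for t in templates
               if ref in (t['title'], t['filename'], t['id'])]
    if len(matches) != 1:
        raise ValueError('template reference %r matched %d templates' % (ref, len(matches)))
    return matches[0]


templates = [
    {'id': '5858461dd92ecc08149ef7d2', 'title': 'LAMMPS-potential',
     'filename': 'record-LAMMPS-potential.xsd'},
    {'id': '585ac30bd92ecc0e6c7d541b', 'title': 'crystal-prototype',
     'filename': 'record-crystal-prototype.xsd'},
]
refs = ('LAMMPS-potential', 'record-LAMMPS-potential.xsd',
        '5858461dd92ecc08149ef7d2', templates[0])
for ref in refs:
    print(resolve_template_id(ref, templates))  # 5858461dd92ecc08149ef7d2 each time
```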

#### Call get_record with title only

In [9]:
iprhub.get_record('1987--Ackland-G-J--Cu')

_id                                 585af1f02403f01fb735c848
content    <?xml version="1.0" encoding="utf-8"?>\n<LAMMP...
schema                              585af0502403f01fb735c821
title                                  1987--Ackland-G-J--Cu
Name: 163, dtype: object

#### Call get_record with title and template

In [10]:
iprhub.get_record(title='potential.2015--Mendelev-M-I--Al-Sm',
                  template='record-interatomic-potential')

_id                                 573b33792403f020621655b5
content    <?xml version="1.0" encoding="utf-8"?>\n<inter...
schema                              573b10a12403f020621654a5
title                    potential.2015--Mendelev-M-I--Al-Sm
Name: 123, dtype: object

## Refresh the Local Information

The DataFrames for the types, templates and records can be rebuilt by calling the refresh() method. This is useful when changes have been made to the curator outside of this MDCS object.

In [11]:
local.refresh()

## Adding Templates and Types

Templates and types can be added using the add_template() and add_xsd_type() methods. Both methods work nearly identically.

Keyword arguments for both methods:

- filename = path to file being loaded.

- title = name to assign to the template/type. If not given, the filename without its extension is used.

- version = version number.

- dependencies = list of types that are required by the file being loaded. Each item can be identified by the type's filename, id, title, or as a Pandas Series.

- refresh = Boolean indicating if refresh() is called automatically. Default is True.

Neither method adds a template/type whose title and version match those of an existing template/type.
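
Two of the rules above, the default title and the title/version duplicate check, can be expressed in a few lines. This is a hypothetical sketch over a plain list of existing entries, not the mdcs code; the default version of 1 is an assumption based on the version values shown in this notebook.

```python
import os


def default_title(filename):
    # 'demo_files/uncertainty.xsd' -> 'uncertainty'
    return os.path.splitext(os.path.basename(filename))[0]


def should_upload(existing, filename, title=None, version=1):
    """Return False if an entry with the same title and version already exists.

    version=1 as a default is an assumption, not taken from mdcs.
    """
    title = title if title is not None else default_title(filename)
    return not any(e['title'] == title and e['version'] == version for e in existing)


existing = [{'title': 'uncertainty', 'version': 1}]
print(should_upload(existing, 'demo_files/uncertainty.xsd'))  # False
print(should_upload(existing, 'demo_files/uncertainty.xsd', title='newt-uncertainty'))  # True
```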

#### Call add_xsd_type with an already existing title and version

In [13]:
local.add_xsd_type('demo_files/uncertainty.xsd', title='uncertainty')

Matching xsd_type title and version already in curator


#### Call add_xsd_type again with a new title

In [15]:
local.add_xsd_type('demo_files/uncertainty.xsd', title='newt-uncertainty')

uploading newt-uncertainty


#### Show that new type exists

In [16]:
local.get_xsd_type('newt-uncertainty')

content         <?xml version="1.0" encoding="UTF-8" standalon...
dependencies                                                   []
filename                                          uncertainty.xsd
hash                     3db5ac62b1aaf1d844e630de15aa0a84b37da39a
id                                       5849dffed92ecc2710eed948
title                                            newt-uncertainty
typeVersion                              5849dffed92ecc2710eed947
version                                                         1
Name: 20, dtype: object

## Deleting and Restoring Templates and Types

There are also delete_template(), delete_xsd_type(), restore_template() and restore_xsd_type() methods for deleting and restoring templates and types, but with the current version of the curator these methods have little practical effect.

## Adding Records

A record can be added using the add_record() method. 

Arguments of the function are:

- content = string representation of an XML record, or the path to an XML file.

- title = name to give the record.

- template = schema template that the record is validated against. Can be the title, filename, id or Series representation of a template.

Keyword argument:

- refresh = Boolean indicating if refresh() is called automatically. Default is True.
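
Because `content` may be either XML text or a file path, the two have to be told apart somewhere. A minimal sketch of one approach is below: read the file when the path exists, then confirm the text is well-formed XML with the standard library. `load_xml_content` is a hypothetical helper; the mdcs class may handle this differently, and well-formedness is not the same as validity against the template.

```python
import os
import xml.etree.ElementTree as ET


def load_xml_content(content):
    """Return XML text from either a string or a path to an XML file."""
    if os.path.isfile(content):
        with open(content) as f:
            content = f.read()
    # Raises xml.etree.ElementTree.ParseError if the text is not well-formed
    ET.fromstring(content.encode('utf-8'))
    return content


text = load_xml_content('<LAMMPS-potential><units>metal</units></LAMMPS-potential>')
print(text)
```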

#### Add a record

In [17]:
local.add_record('demo_files/1987--Ackland-G-J--Ag.xml', '1987--Ackland-G-J--Ag', 'LAMMPS-potential')
local.records

| | \_id | content | schema | title |
|---|---|---|---|---|
| 0 | 57b346dad92ecc10c8803349 | `<?xml version="1.0" encoding="utf-8"?>\n<LAMMP...` | 57b338f0d92ecc10c8803344 | 1985--Foiles-S-M--Ni-Cu |
| 1 | 5849e007d92ecc2710eed949 | `<?xml version="1.0" encoding="utf-8"?>\n<LAMMP...` | 57b338f0d92ecc10c8803344 | 1987--Ackland-G-J--Ag |


## Modifying Records

Records can be modified with the update_record() method. Note that the record's \_id changes, because this method deletes the old record and adds a new one in its place.

Arguments:

- record = existing record to replace. This can be the record's title, \_id, or a Series representation.

- content = string representation of an XML record, or the path to an XML file with the new content.

Keyword argument:

- refresh = Boolean indicating if refresh() is called automatically. Default is True.
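
The \_id change follows directly from implementing an update as a delete followed by an add. A toy in-memory sketch makes the consequence visible. `RecordStore` is hypothetical and `uuid` stands in for the curator's ids; the point is only that any code holding the old \_id must look the record up again, for example by title.

```python
import uuid


class RecordStore(object):
    """Toy stand-in for a curator: records keyed by title, each with an _id."""

    def __init__(self):
        self.records = {}

    def add(self, title, content):
        self.records[title] = {'_id': uuid.uuid4().hex, 'content': content}

    def delete(self, title):
        del self.records[title]

    def update(self, title, content):
        # Delete-then-add: the replacement record gets a fresh _id
        self.delete(title)
        self.add(title, content)


store = RecordStore()
store.add('1987--Ackland-G-J--Ag', '<mass>107.8682</mass>')
old_id = store.records['1987--Ackland-G-J--Ag']['_id']
store.update('1987--Ackland-G-J--Ag', '<mass>107.9</mass>')
print(old_id != store.records['1987--Ackland-G-J--Ag']['_id'])  # True
```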

#### View existing content

In [18]:
record = local.get_record('1987--Ackland-G-J--Ag').content
print record

<?xml version="1.0" encoding="utf-8"?>
<LAMMPS-potential><potential><key>dc4149ce-3592-4131-8683-ecf654d5a519</key><id>1987--Ackland-G-J--Ag</id></potential><units>metal</units><atom_style>atomic</atom_style><atom><element>Ag</element><mass>107.8682</mass></atom><pair_style><type>eam/fs</type></pair_style><pair_coeff><term><file>Ag.eam.fs</file></term><term><symbols>True</symbols></term></pair_coeff></LAMMPS-potential>


#### Modify content and update

In [19]:
new_record = record.replace('<mass>107.8682</mass>', '<mass>107.9</mass>')
local.update_record('1987--Ackland-G-J--Ag', new_record)
local.records

| | \_id | content | schema | title |
|---|---|---|---|---|
| 0 | 57b346dad92ecc10c8803349 | `<?xml version="1.0" encoding="utf-8"?>\n<LAMMP...` | 57b338f0d92ecc10c8803344 | 1985--Foiles-S-M--Ni-Cu |
| 1 | 5849e00cd92ecc2710eed94a | `<?xml version="1.0" encoding="utf-8"?>\n<LAMMP...` | 57b338f0d92ecc10c8803344 | 1987--Ackland-G-J--Ag |


#### Check that content has changed

In [20]:
record = local.get_record('1987--Ackland-G-J--Ag').content
print record

<?xml version="1.0" encoding="utf-8"?>
<LAMMPS-potential><potential><key>dc4149ce-3592-4131-8683-ecf654d5a519</key><id>1987--Ackland-G-J--Ag</id></potential><units>metal</units><atom_style>atomic</atom_style><atom><element>Ag</element><mass>107.9</mass></atom><pair_style><type>eam/fs</type></pair_style><pair_coeff><term><file>Ag.eam.fs</file></term><term><symbols>True</symbols></term></pair_coeff></LAMMPS-potential>
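
The `str.replace()` edit used above depends on the exact text `<mass>107.8682</mass>`, including its formatting. An alternative sketch edits the element through the standard library's `xml.etree.ElementTree`, using the record content printed earlier; the resulting string could then be passed to `update_record()` in the same way. Note that `ET.tostring()` as called here omits the `<?xml ...?>` declaration.

```python
import xml.etree.ElementTree as ET

record = ('<?xml version="1.0" encoding="utf-8"?>\n'
          '<LAMMPS-potential><potential><key>dc4149ce-3592-4131-8683-ecf654d5a519</key>'
          '<id>1987--Ackland-G-J--Ag</id></potential><units>metal</units>'
          '<atom_style>atomic</atom_style><atom><element>Ag</element>'
          '<mass>107.8682</mass></atom><pair_style><type>eam/fs</type></pair_style>'
          '<pair_coeff><term><file>Ag.eam.fs</file></term>'
          '<term><symbols>True</symbols></term></pair_coeff></LAMMPS-potential>')

root = ET.fromstring(record.encode('utf-8'))
# Locate <atom><mass> relative to the root element and change its text
root.find('atom/mass').text = '107.9'
new_record = ET.tostring(root).decode('utf-8')
print(new_record)
```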


## Deleting A Record

An unwanted record can be deleted using the delete_record() method.

Arguments:

- record = existing record to delete. This can be the record's title, \_id, or a Series representation.

Keyword argument:

- refresh = Boolean indicating if refresh() is called automatically. Default is True.

#### Delete the recently added record

In [21]:
local.delete_record('1987--Ackland-G-J--Ag')
local.records

| | \_id | content | schema | title |
|---|---|---|---|---|
| 0 | 57b346dad92ecc10c8803349 | `<?xml version="1.0" encoding="utf-8"?>\n<LAMMP...` | 57b338f0d92ecc10c8803344 | 1985--Foiles-S-M--Ni-Cu |


## Add Other Files (blobs)

Non-record files can also be stored on the curator as blobs. These can be added with the add_file() method.

Argument:

- filename = path to the file to upload.

The method returns the URL that can be used to download the file later.

In [22]:
file_url = local.add_file('demo_files/test.txt')
print file_url

http://127.0.0.1:8000/rest/blob?id=5849e010d92ecc2710eed94b
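
The returned URL carries the blob id as a query parameter. If only the id needs to be stored, it can be extracted with the standard library. This sketch assumes the `?id=` form seen in the output above.

```python
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs  # Python 2

file_url = 'http://127.0.0.1:8000/rest/blob?id=5849e010d92ecc2710eed94b'

# parse_qs maps each query key to a list of values
blob_id = parse_qs(urlparse(file_url).query)['id'][0]
print(blob_id)  # 5849e010d92ecc2710eed94b
```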


## Large Repository Tips

When working with large repositories or many records, downloading all records and refreshing every time records are added or changed is inefficient. This section describes options for reducing the local overhead in those cases.

### Initialize without building records

Setting records_fetch=False during initialization will build the templates/types but not the records. 

__NOTE:__ accessing MDCS.records will build the records DataFrame if it does not exist yet.
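
The NOTE describes a lazily built attribute. A minimal sketch of that pattern with a Python property shows why `records_fetch=False` defers the cost of fetching records rather than avoiding it. `LazyRecords` is a hypothetical class, and `fetch` stands in for the curator query.

```python
class LazyRecords(object):
    """Build an expensive table only on first access (pattern sketch)."""

    def __init__(self, fetch, records_fetch=True):
        self._fetch = fetch  # callable that queries the curator (stand-in)
        self._records = None
        if records_fetch:
            self._records = fetch()

    @property
    def records(self):
        # First access builds the table if initialization skipped it
        if self._records is None:
            self._records = self._fetch()
        return self._records


calls = []


def fetch():
    calls.append(1)
    return ['JARVIS-one-demo']


obj = LazyRecords(fetch, records_fetch=False)
print(len(calls))  # 0: nothing fetched during initialization
obj.records
print(len(calls))  # 1: fetched on first access
```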

#### Reinitialize iprhub

In [12]:
start = time.time()
iprhub = mdcs.MDCS(host='https://iprhub.nist.gov/', 
                   user='lmh1',
                   pswd='C:/users/lmh1/documents/iprhub/iprhub_password.txt',
                   cert='C:/users/lmh1/documents/iprhub/iprhub-ca.pem')
end = time.time()
print end-start, 'seconds'

1.60799980164 seconds


#### Reinitialize iprhub without loading records

In [13]:
start = time.time()
iprhub = mdcs.MDCS(host='https://iprhub.nist.gov/', 
                   user='lmh1',
                   pswd='C:/users/lmh1/documents/iprhub/iprhub_password.txt',
                   cert='C:/users/lmh1/documents/iprhub/iprhub-ca.pem',
                   records_fetch=False)
end = time.time()
print end-start, 'seconds'

0.617000102997 seconds
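
The timings above use `time.time()`, which reads the system wall clock and can be adjusted while a measurement is running. On Python 3.3+, `time.perf_counter()` is intended for measuring short durations. A small context-manager sketch (the `timed` helper is hypothetical) that could wrap the `mdcs.MDCS(...)` call instead of manual start/end bookkeeping:

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Report how long the enclosed block took; elapsed time goes in result['seconds']."""
    result = {}
    start = time.perf_counter()
    try:
        yield result
    finally:
        result['seconds'] = time.perf_counter() - start
        print('%s: %.3f seconds' % (label, result['seconds']))


# With mdcs this might wrap: iprhub = mdcs.MDCS(..., records_fetch=False)
with timed('sleep') as t:
    time.sleep(0.1)
```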


### Build records with limiters

The build_records() method can then build the records DataFrame from only a selection of the curator's records.

Keyword arguments:

- format = file format to access. Accepted values are xml and json.

- id = record id to limit to.

- template = template of records to include. Can be title, filename, id or Series representation of a template.

- title = record title to limit to.

With the current options, this is mainly useful for curators with many templates, since it lets you limit the records to those of a single template.
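
Conceptually, the limiters act as equality filters over the records. A hypothetical sketch over plain dictionaries follows; `select_records` is not part of mdcs, and here `template` is assumed to be an already-resolved template id compared against the `schema` column. The sample rows come from the iprhub records table earlier in this notebook.

```python
def select_records(records, id=None, template=None, title=None):
    """Keep records matching every limiter that is not None.

    The keyword names mirror build_records(); 'id' shadows the builtin here.
    """
    selected = []
    for rec in records:
        if id is not None and rec['_id'] != id:
            continue
        if template is not None and rec['schema'] != template:
            continue
        if title is not None and rec['title'] != title:
            continue
        selected.append(rec)
    return selected


records = [
    {'_id': '5723b73e2403f03d834cfbd1', 'schema': '5723b6f92403f03d834cfbcf',
     'title': 'JARVIS-one-demo'},
    {'_id': '573330b12403f0029503762c', 'schema': '56d45ae62403f01fe9027216',
     'title': '1985--Foiles-S-M--Ni-Cu'},
]
print([r['title'] for r in select_records(records, template='5723b6f92403f03d834cfbcf')])
# ['JARVIS-one-demo']
```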

#### Retrieve only records for the record-interatomic-potential template

In [14]:
iprhub.build_records(template='record-interatomic-potential')
iprhub.records

| | \_id | content | schema | title |
|---|---|---|---|---|
| 0 | 573b33512403f0206216553b | `<?xml version="1.0" encoding="utf-8"?>\n<inter...` | 573b10a12403f020621654a5 | potential.1985--Foiles-S-M--Ni-Cu |
| 1 | 573b33522403f0206216553c | `<?xml version="1.0" encoding="utf-8"?>\n<inter...` | 573b10a12403f020621654a5 | potential.1987--Ackland-G-J--Ag |
| 2 | 573b33522403f0206216553d | `<?xml version="1.0" encoding="utf-8"?>\n<inter...` | 573b10a12403f020621654a5 | potential.1987--Ackland-G-J--Au |
| 3 | 573b33522403f0206216553e | `<?xml version="1.0" encoding="utf-8"?>\n<inter...` | 573b10a12403f020621654a5 | potential.1987--Ackland-G-J--Cu |
| 4 | 573b33532403f0206216553f | `<?xml version="1.0" encoding="utf-8"?>\n<inter...` | 573b10a12403f020621654a5 | potential.1987--Ackland-G-J--Mo |
| 5 | 573b33532403f02062165540 | `<?xml version="1.0" encoding="utf-8"?>\n<inter...` | 573b10a12403f020621654a5 | potential.1987--Ackland-G-J--Nb |
| 6 | 573b33532403f02062165541 | `<?xml version="1.0" encoding="utf-8"?>\n<inter...` | 573b10a12403f020621654a5 | potential.1987--Ackland-G-J--Ni |
| 7 | 573b33542403f02062165542 | `<?xml version="1.0" encoding="utf-8"?>\n<inter...` | 573b10a12403f020621654a5 | potential.1987--Ackland-G-J--Ta |
| 8 | 573b33542403f02062165543 | `<?xml version="1.0" encoding="utf-8"?>\n<inter...` | 573b10a12403f020621654a5 | potential.1987--Ackland-G-J--V |
| 9 | 573b33542403f02062165544 | `<?xml version="1.0" encoding="utf-8"?>\n<inter...` | 573b10a12403f020621654a5 | potential.1987--Ackland-G-J--W |


#### Retrieve only records for the record-JARVIS-FF-calc template

In [15]:
iprhub.build_records(template='record-JARVIS-FF-calc')
iprhub.records

| | \_id | content | schema | title |
|---|---|---|---|---|
| 0 | 5723b73e2403f03d834cfbd1 | `<?xml version="1.0" encoding="utf-8"?>\n<jarvi...` | 5723b6f92403f03d834cfbcf | JARVIS-one-demo |


#### The selection criteria are retained when refresh() is called

In [16]:
iprhub.refresh()
iprhub.records

| | \_id | content | schema | title |
|---|---|---|---|---|
| 0 | 5723b73e2403f03d834cfbd1 | `<?xml version="1.0" encoding="utf-8"?>\n<jarvi...` | 5723b6f92403f03d834cfbcf | JARVIS-one-demo |
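
One simple way to get this behavior is to store the limiters from the last `build_records()` call and reuse them in `refresh()`. A hypothetical sketch follows; `Selector` is illustrative only, and `query` stands in for the curator request with made-up sample data keyed by template name.

```python
class Selector(object):
    """Remember the last record limiters so refresh() repeats the same query."""

    def __init__(self, query):
        self._query = query  # callable(**limiters) -> list of record titles
        self._limiters = {}
        self.records = []

    def build_records(self, **limiters):
        self._limiters = limiters
        self.records = self._query(**limiters)

    def refresh(self):
        # Reuse the stored limiters instead of fetching everything
        self.records = self._query(**self._limiters)


def query(template=None):
    data = {'record-JARVIS-FF-calc': ['JARVIS-one-demo'],
            'record-interatomic-potential': ['potential.1985--Foiles-S-M--Ni-Cu']}
    if template is None:
        return [t for titles in data.values() for t in titles]
    return data[template]


s = Selector(query)
s.build_records(template='record-JARVIS-FF-calc')
s.refresh()
print(s.records)  # ['JARVIS-one-demo']
```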


### Suppressing refreshes

If many records are being added or modified at once, refreshing the local copies after every small change is wasteful. All methods that modify the curator accept a refresh keyword, which can be set to False to skip the automatic refresh.

__NOTE:__ Many of the capabilities of this class search over the local representations of the templates/types/records, so refreshing is necessary in some cases. Some examples are:

- The get methods will fail if they are searching for items that have been added after the last refresh. 

- Any method that takes templates/types as inputs won't be able to find them if they were added since the last refresh.

- add_xsd_type() and add_template() may fail to detect duplicates of entries added since the last refresh.
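
A batch of changes can be structured as follows: pass `refresh=False` for each change, then refresh once at the end, with `try`/`finally` ensuring the local copy is rebuilt even if one upload fails. The `FakeCurator` below is a hypothetical stand-in that only counts refreshes, and the record titles are made up; with mdcs the calls would be `local.add_record(..., refresh=False)` and `local.refresh()`, as in the cells that follow.

```python
class FakeCurator(object):
    """Stand-in for an MDCS object that counts how often it refreshes."""

    def __init__(self):
        self.refresh_count = 0
        self.pending = []

    def add_record(self, content, title, template, refresh=True):
        self.pending.append(title)
        if refresh:
            self.refresh()

    def refresh(self):
        self.refresh_count += 1


curator = FakeCurator()
titles = ['rec-%d' % i for i in range(100)]  # hypothetical record titles
try:
    for title in titles:
        curator.add_record('<record/>', title, 'LAMMPS-potential', refresh=False)
finally:
    curator.refresh()  # one refresh for the whole batch
print(curator.refresh_count)  # 1
```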

#### Demonstrate adding a record without refreshing

In [28]:
local.records

| | \_id | content | schema | title |
|---|---|---|---|---|
| 0 | 57b346dad92ecc10c8803349 | `<?xml version="1.0" encoding="utf-8"?>\n<LAMMP...` | 57b338f0d92ecc10c8803344 | 1985--Foiles-S-M--Ni-Cu |


In [29]:
local.add_record('demo_files/1987--Ackland-G-J--Ag.xml', '1987--Ackland-G-J--Ag', 'LAMMPS-potential', refresh=False)
local.records

| | \_id | content | schema | title |
|---|---|---|---|---|
| 0 | 57b346dad92ecc10c8803349 | `<?xml version="1.0" encoding="utf-8"?>\n<LAMMP...` | 57b338f0d92ecc10c8803344 | 1985--Foiles-S-M--Ni-Cu |


#### Refresh

In [30]:
local.refresh()
local.records

| | \_id | content | schema | title |
|---|---|---|---|---|
| 0 | 57b346dad92ecc10c8803349 | `<?xml version="1.0" encoding="utf-8"?>\n<LAMMP...` | 57b338f0d92ecc10c8803344 | 1985--Foiles-S-M--Ni-Cu |
| 1 | 5849e017d92ecc2710eed94d | `<?xml version="1.0" encoding="utf-8"?>\n<LAMMP...` | 57b338f0d92ecc10c8803344 | 1987--Ackland-G-J--Ag |


#### Demonstrate deleting a record without refreshing

In [31]:
local.delete_record('1987--Ackland-G-J--Ag', refresh=False)
local.records

| | \_id | content | schema | title |
|---|---|---|---|---|
| 0 | 57b346dad92ecc10c8803349 | `<?xml version="1.0" encoding="utf-8"?>\n<LAMMP...` | 57b338f0d92ecc10c8803344 | 1985--Foiles-S-M--Ni-Cu |
| 1 | 5849e017d92ecc2710eed94d | `<?xml version="1.0" encoding="utf-8"?>\n<LAMMP...` | 57b338f0d92ecc10c8803344 | 1987--Ackland-G-J--Ag |


#### Refresh

In [32]:
local.refresh()
local.records

| | \_id | content | schema | title |
|---|---|---|---|---|
| 0 | 57b346dad92ecc10c8803349 | `<?xml version="1.0" encoding="utf-8"?>\n<LAMMP...` | 57b338f0d92ecc10c8803344 | 1985--Foiles-S-M--Ni-Cu |
