# Meta-genome virtual lab -- Version 1

This Jupyter notebook is designed to work with the data stored at portal.meta-genome.org via pycdcs. I plan to build a module that wraps pycdcs to query and parse the XML payloads for multiple submissions in a Pythonic way, e.g. storing repeated stress-strain measurements (iterations) from a single submission in a single pandas DataFrame (assuming the same units and sampling interval).
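A minimal sketch of the single-DataFrame idea, assuming each iteration arrives as a pair of strain and stress sequences sampled at the same strain points and in the same units. The function name `combine_iterations` and the `stress_<n>` column names are hypothetical, not part of pycdcs.

```python
import pandas as pd


def combine_iterations(curves):
    """Combine repeated stress-strain curves into one DataFrame.

    curves: list of (strain, stress) sequence pairs.
    ASSUMPTION: every curve shares the same strain points and units,
    so a single "strain" column can index all the stress columns.
    """
    if not curves:
        raise ValueError("need at least one curve")
    reference_strain = list(curves[0][0])
    columns = {"strain": reference_strain}
    for i, (strain, stress) in enumerate(curves, start=1):
        # exact comparison; a tolerance may be needed for real, re-sampled data
        if list(strain) != reference_strain:
            raise ValueError(f"curve {i} uses different strain points")
        if len(stress) != len(reference_strain):
            raise ValueError(f"curve {i} has a stress/strain length mismatch")
        columns[f"stress_{i}"] = list(stress)
    return pd.DataFrame(columns)
```

The result has one `strain` column followed by one stress column per iteration, so statistics across iterations reduce to row-wise operations on the stress columns.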

Our schema is large, so there are many avenues for data scraping. Here is a list of targeted parsing tasks:
 - Collate reference information into a dictionary, i.e.:\
   {"submission ID" : {"authors": ["O. Duncan", "R. Feynman"],\
                      "publication title": "pub title here", ...}, ...}
 - Collate general metamaterial information into a dictionary, i.e.:\
    {"submission ID" : {"metamaterial family": "Foam",\
                        "Unusual material property": "negative Poisson's ratio",\
                        "Strain convention" : "True",\
                        "Stress convention" : "True"}, ...}
 - Collate single-component measures and units for base materials, i.e.:\
   {"submission ID" : {"base material ID" : {"Material name" : "Nylon 12",\
                                            "Material classification" : "Metallic", ...\
                                            "Directional Sensitivity" : "Isotropic"}, ...}, ...}
 - Collate single-component measures and units for metamaterials, i.e.:\
   {"submission ID" : {"metamaterial 1" : {"Material name" : "Nylon 12",\
                                            "Material classification" : "Metallic", ...\
                                            "Directional Sensitivity" : "Isotropic"}, ...}, ...}
 - From the list of submissions available to the user, generate a dict containing a pandas DataFrame for each continuous data curve, i.e.:\
 {"submission1-ID" : {"metamaterial1" : [pandas.df.stress-strain1, pandas.df.stress-strain2, pandas.df.stress-strain3...],\
                      "metamaterial2" : [pandas.df.stress-strain1, pandas.df.stress-strain2, pandas.df.stress-strain3...]}, ...}
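The first four tasks share one pattern: parse each record's `xml_content`, pull out a few elements, and key the result by submission ID. A minimal sketch using the standard-library `xml.etree.ElementTree`, assuming the records come from `my_query.to_dict("records")` (so each has the `id` and `xml_content` columns seen in the query output). The function name `collate_fields` and the element paths in `reference_spec` (`reference-info/author`, ...) are hypothetical placeholders, not the real schema31 element names.

```python
import xml.etree.ElementTree as ET


def collate_fields(records, field_specs):
    """Build {submission ID: {output key: value}} from CDCS query records.

    records: iterable of dicts with "id" and "xml_content" keys,
             e.g. my_query.to_dict("records").
    field_specs: {output key: (ElementTree path below the <map> root, many)}.
                 many=True keeps every match as a list (e.g. authors);
                 many=False keeps the first match, or None if absent.
    """
    collated = {}
    for record in records:
        root = ET.fromstring(record["xml_content"])
        entry = {}
        for key, (path, many) in field_specs.items():
            # collect the stripped, non-empty text of every matching element
            texts = [el.text.strip() for el in root.findall(path)
                     if el.text and el.text.strip()]
            entry[key] = texts if many else (texts[0] if texts else None)
        collated[record["id"]] = entry
    return collated


# Hypothetical element paths: replace with the real schema element names.
reference_spec = {
    "authors": ("reference-info/author", True),
    "publication title": ("reference-info/publication-title", False),
}
```

The same function covers the metamaterial and base-material tasks by swapping in a different spec dictionary, e.g. `collate_fields(my_query.to_dict("records"), reference_spec)`.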

Here is the submission hierarchy in a user's workspace for continuous stress-strain data (the most complex case):
```
  workspace
  |───submission-1
  |   |───base material properties
  |   |   |───Directional Sensitivity (the ISOTROPIC branch is separate from the TRANS and ORTHO branches)
  |   |   |   |───Stress-strain data
  |   |   |   |   |  
  |   |   |   |   |───data-block-1
```

This means we need a 5-stage iteration for the continuous data, so for all records the final dictionary will look like:\
\
{"Workspace1": \
    {"submission ID": \
        {"base material properties ID" : \
            {"Directional Sensitivity TYPE ID": \
                {"Stress-strain data ID" : \
                    {"data-block ID" : PANDAS.DF}}}}}}
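Rather than hard-coding five nested loops, a dictionary of this shape can be built and walked generically. A minimal sketch, assuming each curve is already identified by a tuple of keys (workspace, submission, base material, directional sensitivity, stress-strain data, data block); `nested_insert` and `iter_leaves` are hypothetical helper names, and the placeholder string stands in for a pandas DataFrame.

```python
def nested_insert(tree, keys, value):
    """Store value at tree[k1][k2]...[kn], creating intermediate dicts as needed."""
    node = tree
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return tree


def iter_leaves(tree, path=()):
    """Yield (key path, leaf) for every non-dict value, at any nesting depth."""
    for key, child in tree.items():
        if isinstance(child, dict):
            yield from iter_leaves(child, path + (key,))
        else:
            yield path + (key,), child


records = {}
keys = ("Workspace1", "submission-82", "base-material-1",
        "isotropic", "stress-strain-1", "data-block-1")
nested_insert(records, keys, "PANDAS.DF placeholder")
```

A DataFrame is not a `dict`, so `iter_leaves` treats it as a leaf; the one caveat is that a leaf that is itself a plain dict would be walked into rather than returned.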

Core functionality needs to be established first, i.e. getting the data into the desired format; after that I can build up to parsing all requested data sets.

I expect this iteration of the meta-genome-cdcs module and Jupyter notebook to be concerned primarily with formatting the data.

Current thinking: we need a module containing a class. The class takes a pandas DataFrame (a slice of a pycdcs query result) containing the submission metadata and XML, and returns a dict of dicts for each root-level element, as shown above.
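A minimal sketch of such a class, assuming the DataFrame slice is passed in as records (`df.to_dict("records")`) with the `id` and `xml_content` columns seen in the query output. The class name `SubmissionParser` and its `root_elements` method are hypothetical; this is not the `xml_parse.xml_control` class used in the cells below. Attributes are stored under `@name`, and text that sits alongside attributes or children under `#text`, following the xmltodict convention.

```python
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Recursively convert an Element to plain Python data.

    Leaf elements without attributes become their stripped text; otherwise a dict
    with "@attr" keys, child tags (repeated tags collected into a list) and "#text".
    """
    children = list(element)
    text = (element.text or "").strip()
    if not children and not element.attrib:
        return text
    result = {f"@{name}": value for name, value in element.attrib.items()}
    for child in children:
        value = element_to_dict(child)
        if child.tag not in result:
            result[child.tag] = value
        elif isinstance(result[child.tag], list):
            result[child.tag].append(value)
        else:
            result[child.tag] = [result[child.tag], value]
    if text:
        result["#text"] = text
    return result


class SubmissionParser:
    """Hypothetical wrapper: query records in, {submission ID: {root-level tag: data}} out."""

    def __init__(self, records):
        # e.g. my_query.to_dict("records"); each record needs "id" and "xml_content"
        self.records = list(records)

    def root_elements(self):
        parsed = {}
        for record in self.records:
            root = ET.fromstring(record["xml_content"])
            converted = element_to_dict(root)
            if not isinstance(converted, dict):  # e.g. an empty <map/> root
                converted = {}
            # keep only the root-level child elements, not attributes/text of <map>
            parsed[record["id"]] = {tag: value for tag, value in converted.items()
                                    if not tag.startswith("@") and tag != "#text"}
        return parsed
```

Each root-level element (e.g. the base-material and metamaterial sections) then becomes one entry of the per-submission dict, ready for the targeted collation tasks listed earlier.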

In [2]:

```python
from cdcs import CDCS
#from meta_genome_cdcs import meta_genome_funcs
import xml_parse  # local parsing module (provides xml_control, used below)
```

The host URL and all login access parameters (username, password, authentication, etc.) are defined when creating a CDCS object.  Setting username as an empty string will access the site as an anonymous user, i.e. someone not signed in.

In [3]:

```python
curator = CDCS('https://portal.meta-genome.org/', username='frontpage_user', password='FrontPage123!')

template = "mecha-metagenome-schema31"

# Alternative query (not used below): isotropic base materials with a tensile Poisson's ratio < 0.4.
# The second $or branch targets "#text", presumably for elements that also carry attributes
# (e.g. a unit), whose value is then stored under "#text".
query_string = "{\"$or\": [{\"map.base-material-info.isotropic-choice.tensile-poissons-ratio-iso.tensile-poissons-ratio-val-iso\": {\"$lt\": 0.4}}, {\"map.base-material-info.isotropic-choice.tensile-poissons-ratio-iso.tensile-poissons-ratio-val-iso.#text\": {\"$lt\": 0.4}}]}"

# Query used below: every record that contains a metamaterial-material-info element
query_dict = "{\"map.metamaterial-material-info\": {\"$exists\": true}}"

my_query = curator.query(template=template, mongoquery=query_dict)
my_query
```


100%|██████████| 17/17 [00:00<00:00, 19.50it/s]


| | id | template | workspace | user_id | title | xml_content | creation_date | last_modification_date | last_change_date | template_title |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 82 | 56 | 1 | 10 | StretchAux-LD60-OD-21.xml | `<map xmlns:xsi="http://www.w3.org/2001/XMLSche...` | 2023-03-22 16:49:05.644000+00:00 | 2023-03-22 16:49:19.207000+00:00 | 2023-03-22 16:49:19.125000+00:00 | mecha-metagenome-schema31 |
| 1 | 80 | 56 | 1 | 10 | AuxBlock-CC-OD21.xml | `<map xmlns:xsi="http://www.w3.org/2001/XMLSche...` | 2023-03-22 16:10:57.747000+00:00 | 2023-03-22 16:11:17.295000+00:00 | 2023-03-22 16:11:17.195000+00:00 | mecha-metagenome-schema31 |
| 2 | 76 | 56 | 1 | 10 | Aux_VC5-OD-23.xml | `<map xmlns:xsi="http://www.w3.org/2001/XMLSche...` | 2023-03-22 14:42:57.495000+00:00 | 2023-03-22 15:28:05.182000+00:00 | 2023-03-22 15:28:04.851000+00:00 | mecha-metagenome-schema31 |
| 3 | 46 | 56 | 1 | 10 | AChiral_0_OD-23 -S.xml | `<map xmlns:xsi="http://www.w3.org/2001/XMLSche...` | 2023-03-16 16:19:48.834000+00:00 | 2023-03-21 10:01:41.309000+00:00 | 2023-03-21 10:01:41.230000+00:00 | mecha-metagenome-schema31 |
| 4 | 45 | 56 | 1 | 10 | AChiral_0_OD-23.xml | `<map xmlns:xsi="http://www.w3.org/2001/XMLSche...` | 2023-03-16 16:08:49.608000+00:00 | 2023-03-21 10:00:57.596000+00:00 | 2023-03-21 10:00:57.330000+00:00 | mecha-metagenome-schema31 |
| 5 | 50 | 56 | 1 | 10 | Chiral_10_OD-23.xml | `<map xmlns:xsi="http://www.w3.org/2001/XMLSche...` | 2023-03-17 11:35:55.038000+00:00 | 2023-03-21 09:57:40.169000+00:00 | 2023-03-21 09:57:40.007000+00:00 | mecha-metagenome-schema31 |
| 6 | 65 | 56 | 1 | 10 | Chiral_30_OD-23.xml | `<map xmlns:xsi="http://www.w3.org/2001/XMLSche...` | 2023-03-17 14:19:15.949000+00:00 | 2023-03-21 09:55:11.226000+00:00 | 2023-03-21 09:55:11.013000+00:00 | mecha-metagenome-schema31 |
| 7 | 66 | 56 | 1 | 10 | Chiral_30_OD-23 - S.xml | `<map xmlns:xsi="http://www.w3.org/2001/XMLSche...` | 2023-03-17 14:25:43.164000+00:00 | 2023-03-21 09:54:22.510000+00:00 | 2023-03-21 09:54:22.433000+00:00 | mecha-metagenome-schema31 |
| 8 | 51 | 56 | 1 | 10 | Chiral_10_OD-23 - S.xml | `<map xmlns:xsi="http://www.w3.org/2001/XMLSche...` | 2023-03-17 11:51:08.320000+00:00 | 2023-03-21 09:53:18.768000+00:00 | 2023-03-21 09:53:18.677000+00:00 | mecha-metagenome-schema31 |
| 9 | 60 | 56 | 1 | 10 | Chiral_25_OD-23.xml | `<map xmlns:xsi="http://www.w3.org/2001/XMLSche...` | 2023-03-17 13:43:34.498000+00:00 | 2023-03-21 09:52:02.854000+00:00 | 2023-03-21 09:52:02.635000+00:00 | mecha-metagenome-schema31 |
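The query strings in the `In [3]` cell are hand-escaped JSON. A minimal sketch that builds the same strings with the standard-library `json.dumps`, which handles the quoting and turns Python `True` into JSON `true`; the helper names `value_or_text_query` and `exists_query` are hypothetical. With the default separators they reproduce the two strings in that cell character for character.

```python
import json


def value_or_text_query(path, operator, value):
    """Mongo-style query string matching `path` or `path + ".#text"`."""
    clause = {operator: value}
    return json.dumps({"$or": [{path: clause}, {path + ".#text": clause}]})


def exists_query(path):
    """Mongo-style query string matching records in which `path` exists."""
    return json.dumps({path: {"$exists": True}})
```

For example, `curator.query(template=template, mongoquery=exists_query("map.metamaterial-material-info"))` should send the same query as the cell above, and new paths no longer need manual escaping.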



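In [12]:

```python
# A repeated `import` does not reload an edited module;
# use importlib.reload(xml_parse) to pick up changes to it.
import xml_parse
xml_string = my_query.iloc[0].xml_content
my_control = xml_parse.xml_control(my_query, xml_string)
myvar = my_control.inspect_xml()
print(myvar)
```

```
2
```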


In [5]:

```python
xml_string = my_query.iloc[8].xml_content
my_control = xml_parse.xml_control(my_query, xml_string)
myvar = my_control.get_topologies()
print(myvar)
```

```
iso
iso
iso
iso
{'unit-cell-topologies': ['https://portal.meta-genome.org/pid/rest/local/cdcs/49AGVQ6P93QX9BO4']}
```


REST calls can be made using the head, get, post, put, patch, and delete methods of the CDCS object.  Each method is named for the type of HTTP request to perform.  

Only the relative REST URL and any params and/or data associated with the request need to be given.  The host's URL will automatically be appended as a prefix to the REST URL, and the access parameters given when initializing the CDCS object will automatically be sent for each REST call. 

The REST call returns a requests.Response object allowing for checks of the status code as well as automatically transforming the data to str, bytes, or json contents. 
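The `get_topologies()` output returns absolute PID URLs, whereas the CDCS REST methods take a URL relative to the host. A minimal sketch of bridging the two, with hypothetical helper names `to_relative_url` and `fetch`. It assumes the CDCS object's `get` accepts the relative URL as its first argument plus a `params` keyword, and that the relative URL should keep its leading `/`; check both against the installed pycdcs version.

```python
from urllib.parse import urlparse


def to_relative_url(absolute_url):
    """Drop the scheme and host, keeping the path (and any query string).

    ASSUMPTION: the CDCS object expects a relative URL beginning with "/".
    """
    parsed = urlparse(absolute_url)
    relative = parsed.path or "/"
    if parsed.query:
        relative += "?" + parsed.query
    return relative


def fetch(curator, absolute_url, params=None):
    """Hypothetical helper: GET an absolute portal URL through a CDCS-like client.

    `curator` only needs a get(rest_url, params=...) method returning a
    requests.Response, such as the CDCS object created earlier.
    """
    response = curator.get(to_relative_url(absolute_url), params=params)
    response.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx status
    return response
```

The returned `requests.Response` can then be inspected with `response.status_code`, `response.text` or `response.json()`, depending on what the endpoint actually returns.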