In [1]:
try:
    from sdRDM import DataModel
    from sdRDM.database import build_sql_database

except ModuleNotFoundError:
    # Install sdRDM if it is missing, then retry the imports
    import subprocess
    import sys

    subprocess.check_call([sys.executable, "-m", "pip", "install", "git+https://github.com/JR-1991/software-driven-rdm.git"])

    from sdRDM import DataModel
    from sdRDM.database import build_sql_database

### Fetching the data model

The PyEED data model is maintained [here](https://github.com/PyEED/pyeed-data-model), where a [markdown file](https://github.com/PyEED/pyeed-data-model/blob/main/specifications/model.md) defines the model. Using the ```from_git``` method of sdRDM's ```DataModel``` class, we can generate the corresponding Python code in memory and use the model right away. Printing the tree of one of the objects verifies that the code was generated.

In [7]:
# pyeed=DataModel.from_markdown('./specifications/model.md')
pyeed = DataModel.from_git("https://github.com/PyEED/PAZy.git")

pyeed.ProteinSequence.visualize_tree()

KeyError: ''

### Building the SQL database

Next, we use the ```build_sql_database``` function to set up an SQLite database file, which we populate later with data from our model. The database contains a table for each object/attribute, which makes it easy to transfer data from an application to the database.

In [6]:
build_sql_database(pyeed.ProteinSequence, pyeed.DNASequence, loc="./test.db")
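To check which tables the build step created, the SQLite file can be inspected with Python's built-in ```sqlite3``` module. The following is a minimal sketch and not part of sdRDM: the helper name ```table_row_counts``` is hypothetical, and the table names it reports depend on whatever ```build_sql_database``` generated.

```python
import sqlite3
from contextlib import closing


def table_row_counts(db_path):
    """Return {table name: row count} for every table in a SQLite file.

    Hypothetical helper for inspection only; it is not part of sdRDM.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        # sqlite_master is SQLite's built-in schema table
        names = [
            name
            for (name,) in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names are handled safely
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
    return counts
```

Calling ```table_row_counts("./test.db")``` right after the build step should list the generated tables with a count of zero each. Running it again once data has been written shows how many rows each table received.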

### Importing PAZy database

Next, we import the PAZy database from an Excel file that contains all the relevant information.


In [7]:
import subprocess
import sys

# Install pandas and openpyxl (the engine pandas uses to read .xlsx files)
subprocess.check_call([sys.executable, "-m", "pip", "install", "pandas", "openpyxl"])

import pandas as pd

# Read the first sheet and the 'Table3' sheet of the workbook
PAZy = pd.read_excel('./PAZy_DB.xlsx')
PAZy_ref = pd.read_excel('./PAZy_DB.xlsx', sheet_name='Table3')
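Before using the table, it can help to confirm that the sheet contains the columns the population loop reads. The sketch below uses plain Python only. The helper ```missing_columns``` and the constant ```REQUIRED_COLUMNS``` are hypothetical names, and the column list is taken from the keys accessed in the loop in this notebook, not from a PAZy specification.

```python
# Columns accessed by the population loop in this notebook (assumed complete)
REQUIRED_COLUMNS = [
    "Enzyme",
    "PDB_Accession",
    "Sequence",
    "Substrate",
    "NCBI_Accession",
]


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]
```

With the DataFrame loaded above, ```missing_columns(PAZy.columns)``` returns an empty list when every expected column is present; any name it returns would raise a ```KeyError``` inside the loop.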


### Creating a dataset and populating the database

To demonstrate how to populate the database, we construct one ```ProteinSequence``` object per row of the PAZy table, using the data model we just loaded.

Finally, we add each object to the previously created database by calling its ```to_sql``` method with the location of the database file. This builds an object-relational mapping (ORM) that represents the database structure and maps the values in the dataset to the corresponding tables.

In [8]:
for index, row in PAZy.iterrows():
    dataset = pyeed.ProteinSequence(
        name=row["Enzyme"],
        pdb_id=[row["PDB_Accession"]],
        amino_acid_sequence=row["Sequence"],
        substrate=[row["Substrate"]],
        ncbi_id=row["NCBI_Accession"],
        # organism=pyeed.Organism(
        #     name=row["Microbial host"],
        #     ncbi_taxonomy_id=row["NCBI_Accession"]
        # )
    )
    # for index_2, row_2 in PAZy_ref.iterrows():
    #     if PAZy_ref.loc[PAZy_ref['Enzyme_ID'] == dataset.id]:
    #         dataset.add_to_reference(###)

               
    # Write this entry to the SQLite database
    dataset.to_sql(loc="./test.db")
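Empty cells in an Excel sheet are read by pandas as ```NaN```, a float, so a row with a missing PDB accession would pass ```[nan]``` to ```pdb_id``` in the loop above. The following sketch shows one way to normalise such values before building the object. Both helper names are hypothetical, and whether the model's fields accept ```None``` or an empty list is an assumption that should be checked against the data model.

```python
import math


def none_if_missing(value):
    """Map the NaN that pandas uses for empty cells to None."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


def as_list(value):
    """Wrap a cell value in a list, or return an empty list if it is missing."""
    value = none_if_missing(value)
    return [] if value is None else [value]
```

In the loop, ```pdb_id=as_list(row["PDB_Accession"])``` and ```ncbi_id=none_if_missing(row["NCBI_Accession"])``` would then replace the direct lookups.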

In [6]:
pyeed.Organism.visualize_tree()

Organism
├── id
├── name
└── ncbi_taxonomy_id
