# Create CSDMS Code-Meta Files

The purpose of this notebook is to create CodeMeta JSON files for the models described in the CSDMS model registry. These CodeMeta representations will serve as input to the FAIR evaluation framework.

This notebook uses data collected via the following URL structure:

`https://csdms.colorado.edu/csdms_wiki/index.php?title=Special:Browse&offset=0&dir=out&article=Model%3ATOPMODEL&group=hide&format=json`
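The raw files can be fetched by substituting a model name into this URL pattern. Below is a minimal sketch using only the Python standard library. It assumes that every model page lives under the `Model:` namespace and that the page title matches the file name used later (e.g. `TOPMODEL.json`); `browse_url` and `download_model_metadata` are hypothetical helpers, not part of the original notebook.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlopen

# Base endpoint taken from the URL pattern above
BROWSE_ENDPOINT = "https://csdms.colorado.edu/csdms_wiki/index.php"


def browse_url(model_name: str, offset: int = 0) -> str:
    """Build the Special:Browse JSON URL for one model page (hypothetical helper)."""
    # quote(..., safe="") percent-encodes the colon, giving e.g. Model%3ATOPMODEL
    article = quote(f"Model:{model_name}", safe="")
    return (
        f"{BROWSE_ENDPOINT}?title=Special:Browse&offset={offset}"
        f"&dir=out&article={article}&group=hide&format=json"
    )


def download_model_metadata(model_name: str, out_dir: str = "raw_model_metadata") -> Path:
    """Fetch one model's JSON and save it as <out_dir>/<model_name>.json (hypothetical helper)."""
    out_path = Path(out_dir) / f"{model_name}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(browse_url(model_name)) as response:
        out_path.write_bytes(response.read())
    return out_path
```

For example, `download_model_metadata("TOPMODEL")` would write `raw_model_metadata/TOPMODEL.json`, the file loaded below.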

In [1]:
import json
import pprint
from datetime import date

from pydantic2_schemaorg.Person import Person
from pydantic2_schemaorg.Organization import Organization
from codemeticulous.codemeta.models import CodeMetaV3


In [8]:
# Load the raw Special:Browse JSON for a single model (TOPMODEL)
with open('raw_model_metadata/TOPMODEL.json', 'r') as f:
    dat = json.load(f)
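The cell above loads a single model. To process the whole registry, one option is to load every file in `raw_model_metadata/` at once. This is a sketch under the assumption that there is one `<model>.json` file per model, named after the model as `TOPMODEL.json` is; `load_raw_metadata` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_raw_metadata(raw_dir: str = "raw_model_metadata") -> dict:
    """Load every <model>.json file in raw_dir into {model_name: parsed JSON} (hypothetical helper)."""
    records = {}
    # sorted() makes the processing order deterministic across runs
    for path in sorted(Path(raw_dir).glob("*.json")):
        with path.open("r", encoding="utf-8") as f:
            records[path.stem] = json.load(f)
    return records
```

Each value in the returned dictionary has the same shape as `dat`, so the remaining cells could be applied to it in a loop.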

Squash all properties into a flat `{property: value}` dictionary so the data are easier to access.

All properties within the 'data' list are structured as:

```
{'property': <Property Name>, 'dataitem': [{'type': <Type>, 'item': <Value>}]}

for example:

{'property': 'City', 'dataitem': [{'type': 2, 'item': 'Lancaster'}]}
```

In [9]:
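# Map each property name to its value(s): a single item becomes a plain string,
# while multiple items stay a list of strings
properties = {}
for prop in dat['data']:
    items = [p['item'].strip() for p in prop['dataitem']]
    if len(items) == 1:
        items = items[0]
    properties[prop['property']] = items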
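Because of the single-item rule, a squashed value is sometimes a string and sometimes a list. The self-contained illustration below wraps the same logic in a hypothetical `squash` function and applies it to two records: the first is the `City` example above, the second is a made-up multi-valued record (with a trailing space to show the `.strip()`).

```python
def squash(data):
    """Flatten Special:Browse 'data' entries into {property: value} (same logic as the cell above)."""
    flat = {}
    for prop in data:
        items = [p["item"].strip() for p in prop["dataitem"]]
        # A single value becomes a plain string; several values stay a list
        flat[prop["property"]] = items[0] if len(items) == 1 else items
    return flat


# Illustrative records: the first is the documented example, the second is made up
sample = [
    {"property": "City", "dataitem": [{"type": 2, "item": "Lancaster"}]},
    {"property": "ModelDomain", "dataitem": [{"type": 2, "item": "Terrestrial"},
                                             {"type": 2, "item": "Hydrology "}]},
]
flat = squash(sample)
print(flat)  # {'City': 'Lancaster', 'ModelDomain': ['Terrestrial', 'Hydrology']}
```

Code that concatenates several squashed properties should therefore not assume that every value is a list.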

In [10]:
pprint.pprint(properties, indent=4)

{   'Additional_comments_model': 'Linkages Supported: Links to GLUE '
                                 '(Generalized Likelihood Uncertainty '
                                 'Estimation) program for '
                                 'sensitivity/uncertainty/calibration '
                                 'analyses.',
    'Citations': '34059',
    'City': 'Lancaster',
    'CodeReviewed': '1',
    'Code_CMT_compliant_or_not': 'No but possible',
    'Code_IRF_or_not': 'No but possible',
    'Code_openmi_compliant_or_not': 'No but possible',
    'Code_optimized': 'Single Processor',
    'Country': 'United Kingdom',
    'Current_future_collaborators': '--',
    'Describe_available_calibration_data': 'TOPMODEL calibration procedures '
                                           'are relatively simple because it '
                                           'uses very few parameters in the '
                                           'model formulas. The model is very '
                       

Create a Pydantic representation of the "core" metadata, i.e. the metadata that fit into the Schema.org CreativeWork class.
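The cells that build the core metadata index `properties` directly, so a registry entry that lacks any of these fields will raise a `KeyError`. A small pre-check, sketched here with a hypothetical `missing_core_properties` helper, lists the absent fields before the objects are built. The property names are exactly those read in the following cells.

```python
# Registry properties read by the cells that build the core metadata
CORE_PROPERTIES = [
    "First_name", "Last_name", "Email_address", "City", "Country", "Institute",
    "Start_year_development", "Extended_model_description",
    "ModelDomain", "Modelautophrases", "Model_keywords",
]


def missing_core_properties(properties: dict) -> list:
    """Return the core property names absent from a squashed properties dict (hypothetical helper)."""
    return [name for name in CORE_PROPERTIES if name not in properties]
```

For instance, `missing_core_properties(properties)` should return an empty list for TOPMODEL, and a non-empty result signals which fields need a fallback for other models.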

In [12]:
creator = Person(
    givenName=properties['First_name'],
    familyName=properties['Last_name'],
    email=properties['Email_address'],
    affiliation=Organization(
        address=f'{properties["City"]}, {properties["Country"]}',
        name=properties['Institute'],
    ),
)

# Only the start year is recorded, so use January 1 of that year
date_created = date(int(properties['Start_year_development']), 1, 1)

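# Keywords combine ModelDomain, Modelautophrases, and Model_keywords.
# Single-valued properties were squashed to plain strings, so wrap them in a list
# before concatenating (list + str would raise a TypeError).
keywords = []
for key in ('ModelDomain', 'Modelautophrases', 'Model_keywords'):
    value = properties[key]
    keywords += value if isinstance(value, list) else [value]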
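Concatenating three keyword sources can repeat a phrase when it appears in more than one of them. An optional, order-preserving, case-insensitive de-duplication step is sketched below; `dedupe_keywords` is a hypothetical helper, and the sample keywords are illustrative.

```python
def dedupe_keywords(keywords):
    """Drop repeated keywords (case-insensitive) and empty entries, keeping first spelling and order."""
    seen = set()
    unique = []
    for kw in keywords:
        key = kw.strip().lower()
        # Keep only the first occurrence of each normalized keyword
        if key and key not in seen:
            seen.add(key)
            unique.append(kw.strip())
    return unique


print(dedupe_keywords(["Hydrology", "overland flow", "hydrology", "Overland Flow ", "subsurface flow"]))
# ['Hydrology', 'overland flow', 'subsurface flow']
```

Keeping the first spelling preserves the casing used by the highest-priority source (here `ModelDomain`).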

In [13]:
meta = CodeMetaV3(
    # The slice drops the 7 trailing characters that follow the model name in 'subject'
    name=dat['subject'][:-7],
    dateCreated=date_created,
    creator=creator,
    description=properties['Extended_model_description'],
    keywords=keywords,
)

print(json.dumps(json.loads(meta.json()), indent=4))
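Since the goal is to produce CodeMeta JSON files rather than printed output, the serialized record could be written to disk. This is a minimal sketch; the `codemeta/` output directory and the `<model_name>.json` naming are assumptions, and `write_codemeta` is a hypothetical helper.

```python
import json
from pathlib import Path


def write_codemeta(codemeta_json: str, model_name: str, out_dir: str = "codemeta") -> Path:
    """Save a CodeMeta JSON string as <out_dir>/<model_name>.json (hypothetical helper and layout)."""
    out_path = Path(out_dir) / f"{model_name}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # Round-trip through json to check the text parses and to pretty-print it consistently
    out_path.write_text(json.dumps(json.loads(codemeta_json), indent=4), encoding="utf-8")
    return out_path
```

In this notebook it would be called as `write_codemeta(meta.json(), "TOPMODEL")`.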

{
    "@context": "https://w3id.org/codemeta/3.0",
    "@type": "SoftwareSourceCode",
    "name": "TOPMODEL",
    "author": {
        "@type": "Person",
        "givenName": "Keith",
        "familyName": "Beven",
        "affiliation": {
            "@type": "Organization",
            "name": "Lancaster University, Department of Environmental Science, Institute of Environmental and Natural Sciences",
            "address": "Lancaster, United Kingdom"
        },
        "email": "mailto:K.Beven@lancaster.ac.uk"
    },
    "dateCreated": "1974-01-01",
    "keywords": [
        "Terrestrial",
        "Hydrology",
        "physically",
        "based",
        "distributed",
        "watershed",
        "infiltration-excess overland flow",
        "saturation overland flow",
        "distributed watershed model",
        "overland flow",
        "simulates hydrologic fluxes",
        "subsurface flow",
        "physically based",
        "channel routing",
        "distributed watershed"