# Create CSDMS Code-Meta Files

This notebook creates code-meta JSON files for the models described in the CSDMS model registry. These code-meta representations serve as input to the FAIR evaluation framework.

This notebook uses data collected via the following URL structure:

`https://csdms.colorado.edu/csdms_wiki/index.php?title=Special:Browse&offset=0&dir=out&article=Model%3ATOPMODEL&group=hide&format=json`
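
As a rough sketch of how the URL above could be built for an arbitrary model, the snippet below uses only the standard library. The helper name `browse_url` is hypothetical and not part of this notebook; the query parameters are copied from the URL above, and the notebook itself reads files that were already downloaded.

```python
from urllib.parse import urlencode

# Base endpoint taken from the URL shown above.
BASE_URL = "https://csdms.colorado.edu/csdms_wiki/index.php"


def browse_url(model_name: str) -> str:
    """Build the Special:Browse JSON URL for one model page (hypothetical helper)."""
    query = urlencode(
        {
            "offset": 0,
            "dir": "out",
            "article": f"Model:{model_name}",  # urlencode escapes ':' as %3A
            "group": "hide",
            "format": "json",
        }
    )
    # 'title=Special:Browse' is kept literal so the result matches the URL above.
    return f"{BASE_URL}?title=Special:Browse&{query}"


print(browse_url("TOPMODEL"))
```

Calling `browse_url("TOPMODEL")` reproduces the URL shown above; other model names are assumed to follow the same pattern.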

In [45]:
import json
import pprint
from datetime import date

from pydantic.v1 import HttpUrl
from pydantic2_schemaorg.Person import Person
from pydantic2_schemaorg.Organization import Organization
from pydantic2_schemaorg.CreativeWork import CreativeWork
from codemeticulous.codemeta.models import CodeMetaV3, VersionedLanguage


In [4]:
# load some data
with open('raw_model_metadata/TOPMODEL.json', 'r') as f:
    dat = json.load(f)

Squash all properties so that the data is easier to access.

All properties within the 'data' list are structured as:

```
{'property': <Property Name>, 'dataitem': [{'type': <Type>, 'item': <Value>}]}

for example:

{'property': 'City', 'dataitem': [{'type': 2, 'item': 'Lancaster'}]}
```
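
To illustrate the squashing step on a small input, here is a minimal sketch that wraps the same logic in a function and runs it on hand-made records. The function names `squash` and `as_list` and the `Example_tags` property are assumptions made for illustration; only the `City` record comes from the example above.

```python
def squash(data):
    """Flatten the 'data' list into {property: value-or-list-of-values}."""
    properties = {}
    for prop in data:
        items = [entry["item"].strip() for entry in prop["dataitem"]]
        # A single value is stored as a scalar, several values stay a list.
        properties[prop["property"]] = items[0] if len(items) == 1 else items
    return properties


def as_list(value):
    """Normalize a squashed value so callers can always treat it as a list."""
    return value if isinstance(value, list) else [value]


sample = [
    {"property": "City", "dataitem": [{"type": 2, "item": "Lancaster"}]},
    # Hypothetical multi-valued property, for illustration only.
    {"property": "Example_tags", "dataitem": [
        {"type": 2, "item": " alpha "},
        {"type": 2, "item": "beta"},
    ]},
]

props = squash(sample)
print(props["City"])           # single value becomes a plain string
print(props["Example_tags"])   # multiple values remain a list
print(as_list(props["City"]))  # normalized back to a one-element list
```

Because a squashed value can be either a string or a list, any code that concatenates or iterates over properties needs a normalization step such as `as_list`.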

In [5]:
properties = {}
for prop in dat['data']:
    items = [p['item'].strip() for p in prop['dataitem']]
    if len(items) == 1:
        items = items[0]
    properties[prop['property']] = items

In [6]:
#pprint.pprint(properties, indent=4)

Create a Pydantic representation of the "core" metadata, i.e. the metadata that fit into the Schema.org CreativeWork class.

In [7]:
creator = Person(
    givenName=properties['First_name'],
    familyName=properties['Last_name'],
    email=properties['Email_address'],
    affiliation=Organization(
        address=f'{properties["City"]}, {properties["Country"]}',
        name=properties['Institute'],
    ),
)

date_created = date(int(properties['Start_year_development']), 1, 1)

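# Keywords combine ModelDomain, Modelautophrases, and Model_keywords.
# A squashed property is a string when it has one value and a list otherwise,
# so normalize each one to a list before concatenating.
keywords = []
for key in ('ModelDomain', 'Modelautophrases', 'Model_keywords'):
    value = properties[key]
    keywords += value if isinstance(value, list) else [value]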

Create a Pydantic representation of the Code-Meta fields. The mapping is defined by Leslie Hsu in https://github.com/codemeta/codemeta/blob/d464a2891206a55c1146b4dd6b996b8fa733ceb1/crosswalks/csdms.csv. In the future, it will be replaced with the crosswalk that Irene is developing.
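
One way to picture a crosswalk is as a lookup table from CSDMS property names to Code-Meta field names. The sketch below is an assumption-laden illustration: the pairs are inferred from the assignments made in this notebook, not copied from `csdms.csv`, and the names `CSDMS_TO_CODEMETA` and `apply_crosswalk` are hypothetical. It also leaves out the "other" properties that are merged into the same target field.

```python
# Pairs inferred from this notebook's assignments, not taken from csdms.csv.
CSDMS_TO_CODEMETA = {
    "Source_web_address": "codeRepository",
    "Programming_language": "programmingLanguage",
    "Model_type": "applicationCategory",
    "Memory_requirements": "memoryRequirements",
    "End_year_model_development": "dateModified",
    "Supported_platforms": "operatingSystem",
}


def apply_crosswalk(properties, crosswalk=CSDMS_TO_CODEMETA):
    """Rename the CSDMS properties that are present and skip missing ones."""
    return {
        target: properties[source]
        for source, target in crosswalk.items()
        if source in properties
    }


# Values taken from the TOPMODEL output in this notebook.
example = {"Model_type": "Modular", "Supported_platforms": "Windows"}
print(apply_crosswalk(example))
```

A table like this only renames keys; type conversions such as wrapping a URL or building a date would still happen in a separate step.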

In [51]:
# core fields
codeRepository = HttpUrl(scheme=properties['Source_web_address'].split(':')[0], url=properties['Source_web_address'])

plang = properties['Programming_language'] if isinstance(properties['Programming_language'], list) else [properties['Programming_language']]
plang_other = properties['Program_language_other'] if isinstance(properties['Program_language_other'], list) else [properties['Program_language_other']]
programmingLanguage = [VersionedLanguage(name=lang) for lang in (plang + plang_other)]
applicationCategory = properties['Model_type']
memoryRequirements = properties['Memory_requirements']

dateModified = None
if 'End_year_model_development' in properties:
    # Squashed values are strings, so convert the year to int before building the date.
    dateModified = date(int(properties['End_year_model_development']), 1, 1)

supported_platforms = properties['Supported_platforms'] if isinstance(properties['Supported_platforms'], list) else [properties['Supported_platforms']]
supported_platforms_other = []
if 'Supported_platforms_other' in properties:  # not sure if this is used, but it's in Leslie's crosswalk
    supported_platforms_other = properties['Supported_platforms_other'] if isinstance(properties['Supported_platforms_other'], list) else [properties['Supported_platforms_other']]
operatingSystem = supported_platforms + supported_platforms_other

# TODO: The way we extract license needs to be improved.
#license = CreativeWork(name=properties['Program_license_type'])

#provider
#ispartof
#identifier
#sameas
url = HttpUrl(scheme=properties['Source_web_address'].split(':')[0], url=properties['Source_web_address'])



# --- code-meta fields ---


# hasSourceCode: Optional[SoftwareListOrSingle]
# isSourceCodeOf: Optional[
#     list[SoftwareApplication | str | AnyUrl] | SoftwareApplication | str | AnyUrl
# ]
# softwareSuggestions: Optional[SoftwareListOrSingle]
# maintainer: Optional[ActorListOrSingle]
# contIntegration: Optional[list[AnyUrl] | AnyUrl]
# continuousIntegration: Optional[list[AnyUrl] | AnyUrl]
# buildInstructions: Optional[list[AnyUrl] | AnyUrl]
# developmentStatus: Optional[str]
# embargoDate: Optional[date | datetime]
# embargoEndDate: Optional[date | datetime]
# funding: Optional[list[str] | str]
# issueTracker: Optional[list[AnyUrl] | AnyUrl]
# referencePublication: Optional[
#     list[ScholarlyArticle | str | AnyUrl] | ScholarlyArticle | str | AnyUrl
# ]
# readme: Optional[list[AnyUrl] | AnyUrl]

In [52]:
meta = CodeMetaV3(
    # Drop the last 7 characters of the subject string to keep only the model name.
    name=dat['subject'][:-7],
    dateCreated=date_created,
    creator=creator,
    description=properties['Extended_model_description'],
    keywords=keywords,
    codeRepository=codeRepository,
    programmingLanguage=programmingLanguage,
    applicationCategory=applicationCategory,
    memoryRequirements=memoryRequirements,
    dateModified=dateModified,
    operatingSystem=operatingSystem,
    url=url,
)

print(json.dumps(json.loads(meta.json()), indent=4))

{
    "@context": "https://w3id.org/codemeta/3.0",
    "@type": "SoftwareSourceCode",
    "name": "TOPMODEL",
    "codeRepository": "https://cran.r-project.org/package=topmodel",
    "programmingLanguage": [
        {
            "@type": "ComputerLanguage",
            "name": "Fortran77"
        },
        {
            "@type": "ComputerLanguage",
            "name": "Visual Basic"
        }
    ],
    "applicationCategory": "Modular",
    "memoryRequirements": "--",
    "operatingSystem": [
        "Windows"
    ],
    "author": {
        "@type": "Person",
        "givenName": "Keith",
        "familyName": "Beven",
        "affiliation": {
            "@type": "Organization",
            "name": "Lancaster University, Department of Environmental Science, Institute of Environmental and Natural Sciences",
            "address": "Lancaster, United Kingdom"
        },
        "email": "mailto:K.Beven@lancaster.ac.uk"
    },
    "dateCreated": "1974-01-01",
    "keywords": [
        "