License template update #656

Closed
mauzey1 opened this issue May 25, 2022 · 14 comments · Fixed by #657

@mauzey1
Collaborator

mauzey1 commented May 25, 2022

Due to changes to cmip6-cmor-tables from PCMDI/cmip6-cmor-tables#360 and WCRP-CMIP/CMIP6_CVs#1066, the way CMOR processes the license template will need to be changed.

The license template will be changed from the one below...

    "license":[
        "^CMIP6 model data produced by .* is licensed under a Creative Commons Attribution.*ShareAlike 4.0 International License .https://creativecommons.org/licenses.* *Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment\\. *Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file).*\\. *The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose\\. *All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law\\.$"
    ]

to this one:

    "license":{
        "license":"CMIP6 model data produced by <Your Institution; see CMIP6_institution_id.json> is licensed under a <Creative Commons; select and insert a license_id; see below> License (<insert the matching license_url; see below>). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file)[ and at <some URL maintained by modeling group>]. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law.",
        "license_options":{
            "CC BY 4.0":{
                "license_id":"Creative Commons Attribution 4.0 International",
                "license_url":"https://creativecommons.org/licenses/by/4.0/"
            },
            "CC BY-NC-SA 4.0":{
                "license_id":"Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International",
                "license_url":"https://creativecommons.org/licenses/by-nc-sa/4.0/"
            },
            "CC BY-SA 4.0":{
                "license_id":"Creative Commons Attribution-ShareAlike 4.0 International",
                "license_url":"https://creativecommons.org/licenses/by-sa/4.0/"
            },
            "CC0 1.0":{
                "license_id":"Creative Commons CC0 1.0 Universal Public Domain Dedication",
                "license_url":"https://creativecommons.org/publicdomain/zero/1.0/"
            }
        }
    }
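As a rough illustration of how CMOR might build a license string from this structure when the user does not supply one, here is a Python sketch. It assumes the caller provides an institution name and picks a key from `license_options` (defaulting to `CC BY 4.0`). The function `fill_license`, the `LICENSE_ENTRY` variable and the institution name are hypothetical; the template text and the option list are abbreviated, and the optional `[ and at <...>]` clause is left untouched.

```python
# Hypothetical sketch: fill the required placeholders of the new license template.
# The template text and options are abbreviated copies of the CV entry above.
LICENSE_ENTRY = {
    "license": (
        "CMIP6 model data produced by <Your Institution; see CMIP6_institution_id.json> "
        "is licensed under a <Creative Commons; select and insert a license_id; see below> "
        "License (<insert the matching license_url; see below>). Consult "
        "https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output."
    ),
    "license_options": {
        "CC BY 4.0": {
            "license_id": "Creative Commons Attribution 4.0 International",
            "license_url": "https://creativecommons.org/licenses/by/4.0/",
        },
        "CC0 1.0": {
            "license_id": "Creative Commons CC0 1.0 Universal Public Domain Dedication",
            "license_url": "https://creativecommons.org/publicdomain/zero/1.0/",
        },
    },
}

# Placeholders copied verbatim from the template.
INSTITUTION = "<Your Institution; see CMIP6_institution_id.json>"
LICENSE_ID = "<Creative Commons; select and insert a license_id; see below>"
LICENSE_URL = "<insert the matching license_url; see below>"


def fill_license(entry: dict, institution: str, option: str = "CC BY 4.0") -> str:
    """Return the license text with institution, license_id and license_url filled in."""
    options = entry["license_options"]
    if option not in options:
        raise ValueError(f"unknown license option {option!r}; expected one of {sorted(options)}")
    chosen = options[option]
    text = entry["license"]
    text = text.replace(INSTITUTION, institution)
    text = text.replace(LICENSE_ID, chosen["license_id"])
    text = text.replace(LICENSE_URL, chosen["license_url"])
    return text


if __name__ == "__main__":
    print(fill_license(LICENSE_ENTRY, "Example Institute"))
```

Plain `str.replace` is enough here because the placeholders are fixed strings, so no regex escaping is involved.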
mauzey1 added this to To do in 3.7.0 via automation on May 25, 2022
mauzey1 self-assigned this on May 25, 2022
@matthew-mizielinski

One other option is to have a very broad regular expression for the license, e.g.

    import re
    license_out = re.sub('<[A-Za-z0-9 ;_.]+>', '.*', license_in)

There are potential issues with this: there is a lot of freedom in the text, which could let errors through, but it wouldn't require complex code to implement.
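Applied to the whole template, that one-liner leaves the non-placeholder text unescaped, so under Python's `re` characters such as `(`, `)`, `.` and `[` are read as regex syntax; for example, `(recorded as a global attribute in this file)` becomes a group and no longer matches the literal parentheses. A minimal sketch of the same broad-pattern idea, assuming Python 3.7+ `re` semantics and an abbreviated template, escapes the literal text, turns each `<...>` placeholder into `.*`, and turns the `[ ... ]` clause into an optional group. `template_to_regex` is a hypothetical helper, not part of CMOR.

```python
import re

# One "<...>" placeholder (same character class as the one-liner),
# or one of the square brackets that delimit the optional clause.
TOKEN = re.compile(r"<[A-Za-z0-9 ;_.]+>|\[|\]")


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Hypothetical helper: turn a CV license template into a permissive regex."""
    parts = []
    pos = 0
    for m in TOKEN.finditer(template):
        # Literal template text: escape it so "(", ")", "." etc. match themselves.
        parts.append(re.escape(template[pos:m.start()]))
        token = m.group(0)
        if token == "[":
            parts.append("(?:")   # start of the optional clause
        elif token == "]":
            parts.append(")?")    # end of the optional clause
        else:
            parts.append(".*")    # any placeholder accepts free text
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts))


# Abbreviated copy of the new template.
TEMPLATE = (
    "CMIP6 model data produced by <Your Institution; see CMIP6_institution_id.json> "
    "is licensed under a <Creative Commons; select and insert a license_id; see below> "
    "License (<insert the matching license_url; see below>). Further information about "
    "this data, including some limitations, can be found via the further_info_url "
    "(recorded as a global attribute in this file)[ and at <some URL maintained by modeling group>]."
)

pattern = template_to_regex(TEMPLATE)

filled = (
    "CMIP6 model data produced by MOHC is licensed under a Creative Commons Attribution 4.0 "
    "International License (https://creativecommons.org/licenses/by/4.0/). Further information "
    "about this data, including some limitations, can be found via the further_info_url "
    "(recorded as a global attribute in this file) and at https://ukesm.ac.uk/cmip6."
)
print(bool(pattern.fullmatch(filled)))                     # True
print(bool(pattern.fullmatch("This is my new license.")))  # False
```

The `.*` placeholders still accept almost any text in those positions, so this only checks the fixed wording around them.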

@mauzey1
Collaborator Author

mauzey1 commented Jun 6, 2022

@durack1

I plan to incorporate the changes I make for this issue into the branch used for testing #628. I wanted to know how CMOR should handle a license created from the registered content vs. one provided in the user input. I assume the default behavior should be to fill in the license attribute with the license template from the registered content.

What should happen if the user provides a license in the input file? Should the user-provided license overwrite the CV-provided license, or do we just force using the CV-provided license? Should the user-provided license be validated against the CV-provided license?

@durack1
Contributor

durack1 commented Jun 6, 2022

@mauzey1 thanks for pushing on this. I would like broader consultation on this from @matthew-mizielinski and @taylor13, but in my view, if the template is left as-is (i.e. it still includes <Your Institution; see CMIP6_institution_id.json> and the other placeholder entries), then we replace these with the selected license, which defaults to CC BY 4.0; but if the template has been edited (and doesn't match our default), then we allow the user customization to remain.

Is this clear enough guidance? And @matthew-mizielinski does this work for you (and across other projects, for which we'll have to update the table defaults)?
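A minimal Python sketch of this proposal, assuming that "unedited template" can be detected by the presence of any remaining `<...>` placeholder. `resolve_license` is a hypothetical name, and `default_license` stands for whatever text CMOR would build for CC BY 4.0. Note that this keeps user text without validating it.

```python
import re
from typing import Optional

# Any remaining "<...>" placeholder means the template was not filled in.
PLACEHOLDER = re.compile(r"<[^<>]+>")


def resolve_license(user_license: Optional[str], default_license: str) -> str:
    """Hypothetical policy from the comment above.

    - no license given, or the unedited template (placeholders still present):
      use the default text, e.g. the one filled in for CC BY 4.0;
    - otherwise: keep the user's customized text unchanged.
    """
    if not user_license or PLACEHOLDER.search(user_license):
        return default_license
    return user_license
```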

@mauzey1
Copy link
Collaborator Author

mauzey1 commented Jun 6, 2022

@durack1 Okay. So if the user provides a license in the user input like the one below

    "license": "This is my new license."

then we replace the template license with that one?

@durack1
Contributor

durack1 commented Jun 6, 2022

@mauzey1 exactly, that is what I was thinking.

Before you proceed, I'd defer to @matthew-mizielinski and @taylor13 for sign-off, just to make sure that I am not missing any edge cases we may encounter.

@taylor13
Collaborator

taylor13 commented Jun 6, 2022

I think I'd consider error-exiting (with a clear explanation) if the user-provided license doesn't match the registered license. Shouldn't we ask the modeling group to update the CV if they want to change their license?

@mauzey1
Collaborator Author

mauzey1 commented Jun 6, 2022

I just noticed that the license template contains the following substring.

[ and at <some URL maintained by modeling group>]

How do we handle this? If the user adds this info, are we supposed to look for the optional substring containing a URL? For example, ... and at https://pcmdi.llnl.gov/.

If the license is made from the template, then it wouldn't have that substring, right? Should there be another user input parameter for a URL so that CMOR can build that substring?

@durack1
Contributor

durack1 commented Jun 6, 2022

@mauzey1 good catch. Yes, that user-provided URL would be optional; if it is provided, we remove the [], fill in the URL, and complete the string. We do have the source_specific_info field, and for the MOHC/HadGEM3*/UKESM1* and NASA-GISS/GISS-* examples in the CVs, this field is a URL (maintained by the modeling group).
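One way to implement that rule, sketched in Python: if a URL is supplied, drop the brackets and substitute it for the placeholder; if not, remove the whole bracketed clause. Removing the clause when no URL is given is an assumption (the comment only covers the case where a URL is provided), `fill_optional_url` is a hypothetical name, and where the URL comes from (for example `source_specific_info`) is left to the caller.

```python
import re
from typing import Optional

# The optional clause: "[", leading text, one "<...>" placeholder, trailing text, "]".
# Groups 1 and 2 capture the text around the placeholder.
OPTIONAL_URL_CLAUSE = re.compile(r"\[([^\[\]<>]*)<[^<>]+>([^\[\]<>]*)\]")


def fill_optional_url(license_text: str, group_url: Optional[str]) -> str:
    """Hypothetical helper for the '[ and at <...>]' clause."""
    if group_url:
        # Drop the brackets and put the URL where the placeholder was. A function
        # replacement avoids re.sub interpreting backslashes in the URL.
        return OPTIONAL_URL_CLAUSE.sub(
            lambda m: m.group(1) + group_url + m.group(2), license_text
        )
    # No URL supplied: remove the whole bracketed clause.
    return OPTIONAL_URL_CLAUSE.sub("", license_text)
```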

@matthew-mizielinski

I'm tempted to suggest that we have a meeting to discuss this.

Some thoughts on the last few comments:

  • we should not change the current behaviour of failing / raising an exception if the user-supplied license does not match the template(s) provided in the CVs; overwriting an errant value with anything is problematic, and I would anticipate errors from less experienced users.
  • Note that any changes need to be easily picked up by other projects; for example, in one project I have a set of CVs where the license refers to the UK Open Government Licence.

In this situation I would be tempted to change CMOR such that the license entry here is a list with multiple entries, one for each acceptable license pattern, e.g.

    "license":[
        "^CMIP6 model data produced by .* is licensed under a Creative Commons Attribution 4.0 International License ...",
        "^CMIP6 model data produced by .* is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License ...",
        "^CMIP6 model data produced by .* is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License ...",
        "^CMIP6 model data produced by .* is licensed under a Creative Commons CC0 1.0 Universal Public Domain Dedication License ..."
    ]

CMOR could then test whether the license text matches any one of the patterns in this list: if exactly one match is found, proceed; if none is found, fail. This would also keep the current .* that allows the text " and at https://ukesm.ac.uk/cmip6" to be included in the license, as we do for all UKESM1 and HadGEM3 data.

If C had better regular expression support, I think we could simplify the above, but a quick Google search suggests that this is a little tricky at present.
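The "exactly one match" rule can be sketched in Python as below, purely for illustration (CMOR's own check would be in C). The two patterns are abbreviated stand-ins rewritten for Python's `re` with parentheses and periods escaped; `validate_license` and `LicenseError` are hypothetical names, and treating more than one match as an error is an assumption, since the comment only specifies the one-match and zero-match cases.

```python
import re
from typing import Sequence

# Abbreviated stand-ins for the per-license patterns, in Python `re` syntax.
LICENSE_PATTERNS = [
    r"^CMIP6 model data produced by .* is licensed under a Creative Commons Attribution 4\.0 "
    r"International License \(https://creativecommons\.org/licenses/by/4\.0/\)\..*$",
    r"^CMIP6 model data produced by .* is licensed under a Creative Commons Attribution-ShareAlike 4\.0 "
    r"International License \(https://creativecommons\.org/licenses/by-sa/4\.0/\)\..*$",
]


class LicenseError(ValueError):
    """Raised when a license does not match exactly one registered pattern."""


def validate_license(text: str, patterns: Sequence[str]) -> str:
    """Return the single pattern that `text` matches, or raise LicenseError."""
    matched = [p for p in patterns if re.search(p, text)]
    if not matched:
        raise LicenseError("license does not match any registered license pattern")
    if len(matched) > 1:
        # Not covered by the proposal; treated as an error in this sketch.
        raise LicenseError(f"license matches {len(matched)} patterns; expected exactly one")
    return matched[0]
```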

@taylor13
Collaborator

taylor13 commented Jun 7, 2022

I don't think we need to be prescriptive as to what is in the license, but for CMIP, at least, CMOR should access an updateable CV entry, approved by the relevant modeling group, which contains the license information. The WIP would recommend that a group adopt one of perhaps a couple of options regarding the wording of the license, but I think the groups should be free to impose any license restrictions they want to.

Perhaps CMOR should be flexibly configurable such that license information could be obtained from a CV or from the user's direct input. For CMIP we would require the license information be extracted from the CV; other projects might want to allow the data writer to directly inject the license to CMOR.

We might also consider a 3rd option: we could store a URL in the CMOR license attribute that would point to the license description (perhaps in a CV) that could subsequently be updated.

Under any of the options, I think we must insist that any update to a license be in the direction of relaxing restrictions; otherwise, someone might use data in a way that was allowed at the time but later became illegal.

@mauzey1
Collaborator Author

mauzey1 commented Jun 7, 2022

For context, here is the license template in the current CMIP6_CV.json.

        "license":[
            "^CMIP6 model data produced by .* is licensed under a Creative Commons Attribution.*ShareAlike 4.0 International License .https://creativecommons.org/licenses.* *Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment\\. *Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file).*\\. *The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose\\. *All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law\\.$"
        ]

This template is not derived from CMIP6_license.json but rather hard-coded in the script that builds CMIP6_CV.json.

I applied @matthew-mizielinski's idea and made templates similar to the original in format but with all four options.

        "license":[
            "^CMIP6 model data produced by .* is licensed under a Creative Commons Attribution 4\\.0 International License (https://creativecommons\\.org/licenses/by/4\\.0/)\\. *Consult https://pcmdi\\.llnl\\.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment\\. *Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file).*\\. *The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose\\. *All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law\\.$",
            "^CMIP6 model data produced by .* is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4\\.0 International License (https://creativecommons\\.org/licenses/by-nc-sa/4\\.0/)\\. *Consult https://pcmdi\\.llnl\\.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment\\. *Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file).*\\. *The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose\\. *All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law\\.$",
            "^CMIP6 model data produced by .* is licensed under a Creative Commons Attribution-ShareAlike 4\\.0 International License (https://creativecommons\\.org/licenses/by-sa/4\\.0/)\\. *Consult https://pcmdi\\.llnl\\.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment\\. *Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file).*\\. *The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose\\. *All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law\\.$",
            "^CMIP6 model data produced by .* is licensed under a Creative Commons CC0 1\\.0 Universal Public Domain Dedication License (https://creativecommons\\.org/publicdomain/zero/1\\.0/)\\. *Consult https://pcmdi\\.llnl\\.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment\\. *Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file).*\\. *The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose\\. *All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law\\.$"
        ]

These templates make the rules for the license name and URL stricter: they remove the .* in the middle of the license name and limit the URL in parentheses to one of four values.

If we just wanted to validate the license values provided by the user, then we shouldn't need to make any changes to CMOR. Do we still want CMOR to provide a license value created from the templates if the user doesn't provide one?

Edit: Fixed the templates to escape periods.
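Since the four templates differ only in the license name and URL, they could in principle be generated from the `license_options` entries rather than maintained by hand. The Python sketch below is one way to do that; `build_patterns` is hypothetical and the fixed text after the license sentence is abbreviated. It uses `re.escape`, which escapes parentheses as well as periods, because Python's `re` treats unescaped `(` and `)` as a group. The templates above leave the parentheses unescaped, which only matches literal parentheses if the consuming engine treats them as literals (as POSIX basic regular expressions do), so generated strings would need to follow whatever regex dialect CMOR actually uses.

```python
import re

LICENSE_OPTIONS = {
    "CC BY 4.0": {
        "license_id": "Creative Commons Attribution 4.0 International",
        "license_url": "https://creativecommons.org/licenses/by/4.0/",
    },
    "CC BY-NC-SA 4.0": {
        "license_id": "Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International",
        "license_url": "https://creativecommons.org/licenses/by-nc-sa/4.0/",
    },
    "CC BY-SA 4.0": {
        "license_id": "Creative Commons Attribution-ShareAlike 4.0 International",
        "license_url": "https://creativecommons.org/licenses/by-sa/4.0/",
    },
    "CC0 1.0": {
        "license_id": "Creative Commons CC0 1.0 Universal Public Domain Dedication",
        "license_url": "https://creativecommons.org/publicdomain/zero/1.0/",
    },
}

# Fixed text after the license sentence (abbreviated; the real templates continue).
CONSULT = "Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output"


def build_patterns(options: dict) -> dict:
    """Hypothetical: one anchored Python-re pattern per license option."""
    patterns = {}
    for key, opt in options.items():
        sentence = f" is licensed under a {opt['license_id']} License ({opt['license_url']})."
        patterns[key] = (
            "^" + re.escape("CMIP6 model data produced by") + " .*"  # free-text institution
            + re.escape(sentence) + " *"                          # ".", "(", ")" escaped
            + re.escape(CONSULT) + ".*$"                           # remainder left open here
        )
    return patterns
```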

@wachsylon
Collaborator

wachsylon commented Jul 6, 2022

@mauzey1
I tried to run CMOR 3.6.1 with the most recent CMIP6_CV.json and CMOR crashes. As it was part of a larger update of many packages, it took me half a day and valgrind to find out:

    **21654** *** stpcpy_chk: buffer overflow detected ***: program terminated
    ==21654==    at 0x50E97AFC: VALGRIND_PRINTF_BACKTRACE (valgrind.h:6314)
    ==21654==    by 0x50E9CBBC: __stpcpy_chk (vg_replace_strmem.c:1495)
    ==21654==    by 0x619449: strcat (string3.h:144)
    ==21654==    by 0x619449: cmor_CV_ValidateAttribute (cmor_CV.c:2015)
    ==21654==    by 0x61993B: cmor_CV_checkGblAttributes (cmor_CV.c:2183)
    ==21654==    by 0x60B730: cmor_setGblAttr (cmor.c:2979)
    ==21654==    by 0x60DDF3: cmor_write (cmor.c:4476)
    ==21654==    by 0x1D169B: write_variables(KVList*, std::shared_ptr<CdoStream>, mapping*, int, int, int, char*, char*, int*) (CMOR.cc:5495)
    ==21654==    by 0x1E93A6: CMOR(void*) (CMOR.cc:6318)
    ==21654==    by 0x4FC75C: ProcessManager::runProcesses() (processManager.cc:32)

That error is really a pity because, at that stage in cmor_CV.c:2015, an error message should be created that would have clearly described the problem :D

I have not looked into it in detail, but it is most probably because of the additional license entry for all sources, which CMOR cannot evaluate. I don't know how, but maybe users should be warned somehow when the update is coming.

It still works with the 9b44cc846da13a6594f7315af6430876f6c6fa5b CMIP6_CV.json version.

@mauzey1
Collaborator Author

mauzey1 commented Jul 6, 2022

@wachsylon

This is actually related to an error we encountered where an error message got bigger than the maximum length of a string in CMOR (see issue #638). This has been resolved by pull request #639, which removes the code from cmor_CV.c that tries to build a long regex string for the error message to display. Each license template is more than 800 characters long, so concatenating even two of them exceeds the CMOR_MAX_STRING limit of 1024 characters.

This change will be in the next release of CMOR (version 3.7.0), which we are expecting to release later this month.

@wachsylon
Collaborator

@mauzey1

Thanks for the explanation. I suggest also updating CMOR_Version in the header of the tables. That will let users know they cannot use the MIP tables with an earlier CMOR version.
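If the table header did carry a bumped version, a consumer could compare it with the installed CMOR before doing any work. The Python sketch below assumes the tables keep that value under `table["Header"]["cmor_version"]` and that it means "minimum CMOR version required"; the key, that interpretation, and the helper names are assumptions rather than anything defined in this thread. Versions are padded to three parts so that "3.7" and "3.7.0" compare as equal.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'X', 'X.Y' or 'X.Y.Z' into a 3-tuple, padding missing parts with 0."""
    parts = [int(p) for p in version.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected version string {version!r}")
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def check_table_version(table: dict, installed_cmor: str) -> None:
    """Hypothetical check: refuse tables that declare a newer CMOR than the one installed."""
    required = table["Header"]["cmor_version"]  # assumed header key
    if parse_version(installed_cmor) < parse_version(required):
        raise RuntimeError(
            f"these tables require CMOR {required} or later, but CMOR {installed_cmor} is installed"
        )
```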

mauzey1 mentioned this issue on Aug 18, 2022