Two different licenses on create dataset page #1503

Closed
wants to merge 3 commits into from
Closed

Conversation

seanh
Contributor

@seanh seanh commented Feb 24, 2014

Which license am I releasing my data under, creative commons attribution or open database?

[Image: screenshot from 2014-02-11 16 22 11]

@seanh
Contributor Author

seanh commented Feb 13, 2014

Dev meeting decided: we can just get rid of this footer text.

@seanh seanh self-assigned this Feb 13, 2014
Two licenses on the same form, confusing.
@seanh seanh removed their assignment Feb 24, 2014
@@ -23,15 +23,6 @@

{% block form_actions %}
<div class="form-actions">
{% block disclaimer %}
Member

Maybe we want to keep the block empty in case people want to add some text in it (and it's unlikely, but maybe people have already used it in their extensions)

@amercader
Member

@seanh see comment regarding empty block, other than that looks good

@seanh seanh assigned seanh and unassigned amercader Mar 4, 2014
@seanh seanh added this to the CKAN 2.3 milestone Apr 1, 2014
@seanh
Contributor Author

seanh commented Apr 7, 2014

@amercader Finally done :)

@seanh seanh removed their assignment Apr 7, 2014
@davidread
Contributor

That text at the bottom refers to the metadata the user is typing in. It is an important distinction as the data may have a closed licence. The small print grants everyone the right to use the CKAN metadata for all sorts of purposes. I think it would be a big error to delete it. If no-one here has understood it then it needs rewording.

@nigelbabu
Contributor

It probably needs re-wording because until you explained it, it wasn't obvious.

@davidread davidread assigned davidread and unassigned amercader Apr 10, 2014
@davidread
Contributor

@dread to reword or suggest move to a t&c page agreed to on registration.

@seanh
Contributor Author

seanh commented Apr 10, 2014

Aside from having two different licenses on one page being confusing, I think there's a few problems here:

  1. On the create/update dataset form (see screenshot above) I'm entering a bunch of metadata (title, description..) and selecting a license. When I select a license on the form, I think I'd expect that the license applies to the information I'm entering into that form. In fact the intention is that the license applies to the files I'll upload on the next page, and the metadata license applies to the stuff entered into the form. But this isn't clear at all, it's just labelled license.
  2. The metadata licence presumably applies to all metadata including that of groups, orgs, user profiles, activities and following, etc. So it belongs at sign up time, not at dataset create time.
  3. But if you ask me to agree to the open database license at sign up time, I'm naturally going to assume that license is going to apply to everything I add to the site. This might discourage me if I think "oh, but I wanted to publish my data files under license X but it's making me agree to this other one". So at sign up time you'll have to somehow make it clear that uploaded, linked to or datastore-added data has a different license that you get to choose later.

I think the optimal solution here may be:

a. Do (3) above

b. Don't show the license option on the create/update dataset form, instead show it on the create/update resource form, and show it next to the upload button making it clear that it applies only to the file and not to the metadata also entered on that form.

c. Change CKAN to store the license against the resource not against the dataset.

Unfortunately this isn't practical, as we'd have to change CKAN's model and APIs in breaking ways, and there would be UI changes needed too in the way that datasets are displayed beyond just the forms, to handle datasets containing resources with different licenses.

I don't think we can do (a) and (b) but not (c), though. When you went to add a second resource to a dataset, the license shown next to the upload button for that resource would have to be fixed to the license already selected when uploading the first resource. Also editing the license of one resource would change it for all of them but it'd be difficult to make this clear to the user.
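To make that constraint concrete, here is a minimal Python sketch of the two storage options. The class and function names (`Dataset`, `Resource`, `pick_license_on_resource_form`) are hypothetical and are not CKAN's actual model code; the license ids are just example values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    name: str
    # Only used in the hypothetical per-resource model, option (c).
    license_id: Optional[str] = None


@dataclass
class Dataset:
    title: str
    license_id: str  # one license stored against the whole dataset
    resources: List[Resource] = field(default_factory=list)

    def effective_license(self, resource: Resource) -> str:
        # A resource without its own license falls back to the dataset's.
        return resource.license_id or self.license_id


def pick_license_on_resource_form(dataset: Dataset, resource: Resource,
                                  license_id: str, per_resource: bool) -> None:
    """Simulate choosing a license next to one resource's upload button."""
    if per_resource:
        resource.license_id = license_id  # (a) + (b) + (c)
    else:
        dataset.license_id = license_id   # (a) + (b) without (c)


def licenses(dataset: Dataset) -> List[str]:
    return [dataset.effective_license(r) for r in dataset.resources]


# Without (c): editing the license on one resource form changes every resource.
shared = Dataset("My Dataset", "cc-by", [Resource("a.csv"), Resource("b.csv")])
pick_license_on_resource_form(shared, shared.resources[0], "odc-odbl",
                              per_resource=False)
print(licenses(shared))    # ['odc-odbl', 'odc-odbl']

# With (c): only the edited resource changes.
separate = Dataset("My Dataset", "cc-by", [Resource("a.csv"), Resource("b.csv")])
pick_license_on_resource_form(separate, separate.resources[0], "odc-odbl",
                              per_resource=True)
print(licenses(separate))  # ['odc-odbl', 'cc-by']
```

The first case is the surprising one: the field sits next to a single file, but the change silently applies to all of the dataset's resources.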

So we might have to leave the license on the dataset form, but somehow using a help text or something make it clear to the user that this license applies to the contents of the dataset's resource files and not to the metadata entered into that same form or the resource form. I think it's difficult to make that clear but I stand to be corrected.

Lastly, it is best practice for open data sites to use the same license for all data, but CKAN by default offers several licenses (admittedly all open ones) and lets each dataset creator choose one for each dataset. This seems to be encouraging many different licenses on one site which I think is bad.

So maybe we should change the default: one data license, and one metadata license. Or maybe even just one license that applies to data and metadata. Customizable by the site admin, of course (maybe move the option onto the site admin page by default, not on the dataset form). We'd still allow site admins to change the licenses for their site or to offer users multiple licenses to choose from, this would be doable in the API but it would be up to individual site designers to solve the UI problems above for the particular case of their site if they want to do it.

So it may be we can avoid all the seemingly tricky problems and also encourage best practice by just changing the default to one license everywhere, agreed by the user at sign-up time and maybe shown site-wide in the footer of the site.

@davidread
Contributor

The license only need apply to the dataset form, because that includes package, resource and its group. It would be a bonus to have the license apply to the group data - it's not really a concern for the use cases I've seen TBH. The activity stream is generated automatically so I don't believe there is any copyright to worry about. User profiles seem private to me - not something you want to release as open data. And the same goes for who is following whom. So I'm perfectly happy to leave this small print in the dataset form, (and possibly add it to the group form) rather than move it to user registration.

I don't like the idea of agreeing to T&C on registration because it means we have to write a T&C page which covers a number of things. And it means that every site that installs CKAN would be prompted to think about what their T&Cs are and get lawyers involved who would likely ignore the need for an open metadata licence. I think if we can just get away with it in the small print on the form then it serves our goals best.

The license that you select in the form is, like everything else in the form, about the data. I don't think there has been any confusion that it applies to the metadata, has there? I'm happy for that to be added to a pop-up help text or whatever you have to document the fields, but I would be against renaming the field from "License" because of any perceived problem.

As regards forcing the data license, that is complicated. In DGU we designed our form to push departments to say the data is released under our favoured licence. And what goes on our site is reasonably authoritative, and often the only place the licence is stated anywhere, e.g. for smaller publishers. However there are some big publishers who are allowed to and have made a conscious decision to use a different licence for all their data and the best service we do for them is to allow them to say what that licence is. So for that reason, I'd imagine most CKAN sites would most want a free-selection of data licence.

However the metadata licence should be open. DGU requires the metadata to be ODBL or it is not allowed on the site, simple as that. CKAN is a tool to create a machine-readable database about datasets, that being the absolute raison d'être of CKAN. So if the metadata license is not stated or is easily changed to a closed one then it is a failure. I imagine the majority of CKAN sites can just have that as a condition slipped into the small-print and everyone benefits from that. Most people don't think about it, even most CKAN sysadmins. I'd rather it just sat there in the small-print, everyone passively agreed to it and it didn't draw attention to itself. I think it is a good thing to make it not too easy for CKAN sites to change it from that default, so having it in the default template is better than on a separate terms and conditions page, or a config option.

BTW that blog post about not being able to plot the datasets on a map is exactly because that catalog doesn't have an explicit metadata licence. You CAN plot the datasets in DGU or datahub.io and I imagine most of the other CKANs who still have that clause in the form.

I suggest we change the text from:

Important: By submitting content, you agree to release your contributions under the Open Database License.

to:

By submitting this form, you agree to release this metadata under the Open Database License.

@seanh
Contributor Author

seanh commented Apr 10, 2014

Your text is about as good as it can get, I think. I still think it's quite unclear what the dataset license applies to: not the metadata on that form, nor the metadata on the next form, but the contents of the files that will be uploaded on the next form. But I think it's quite difficult to make that clear without it being pretty awkward, especially because the dataset license is set against the dataset and not against each resource (so we can't show the field next to the resource file field).

Probably the best we can do is some help text next to the field, something like:

The data license will apply to the contents of any data files that you add to this dataset, and to any data that you add using the DataStore API.

As you can see it's quite awkward though, having both those small prints on the one form.

I think re-labelling the field to "Data license" might be good because then we can always talk about the data license or dataset license and the metadata license, and always keep the distinction straight. To just talk about the license when there are two different ones involved seems to invite confusion.

Shame to have form labels differ from what the field is called in the API though.

I agree with many of your points. Here's what I think now:

  1. Metadata license in small print (not T&C on sign up).

  2. Somehow make it clearer what the dataset license applies to, as mentioned above.

  3. We do need one metadata license for the whole site, and that metadata license should be an open one by default and doesn't need to be any easier to change than it already is. So internally the license of each dataset, the default dataset license, and the metadata license are three separate settings.

  4. I still think having just one open dataset license by default is a better default. It promotes open data best practice. And I do think that having all the datasets under a single open license is just as important as for all the metadata. If I want to take data from multiple datasets and aggregate it into a single data re-use, that's much easier if they all have the same open data license or maybe just one or two different ones. CKAN currently offers 15 different choices including "no license" and several "Other" options!

    Yes, site admins should be able to easily add extra optional dataset licenses if they need to (while keeping the single site-wide metadata license), but I don't see why CKAN should offer a whole list of dataset licenses by default.

  5. If it's possible to use the same open data license for both the metadata and the datasets, that would be even better. They should still be separate settings internally, but by default set to the same license. It would simplify the UI a whole lot by removing the whole confusion about which license applies to what.

But if it's important to support multiple dataset licenses anyway because a lot of sites are just going to need that, then there might not be much benefit to be had because as soon as they add a second dataset license choice then you have to make the distinction between the choosable dataset license and the site-wide metadata license clear to the users and the whole problem re-emerges.

We could just ignore that in the default UI and leave it up to individual site developers to deal with it. I guess it depends: if we think that most sites are gonna need more than one dataset license option, then we should support the dataset vs metadata license distinction in our UI by default.
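A rough Python sketch of points 3 to 5, assuming a hypothetical `SiteLicenseSettings` object (nothing like it exists in CKAN as far as this thread says): the metadata license, the default dataset license and each dataset's own license stay separate internally, the first two default to the same value, and the UI only has to explain the distinction once an admin adds a second choice. Whether ODbL is a suitable default for data as well as metadata is an open question here, so treat the default values as placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SiteLicenseSettings:
    # Hypothetical site-wide settings; the ids are example values only.
    metadata_license: str = "odc-odbl"
    default_dataset_license: str = "odc-odbl"
    # Extra dataset licenses a site admin chooses to offer, empty by default.
    extra_dataset_licenses: Tuple[str, ...] = ()

    def dataset_choices(self) -> List[str]:
        choices = [self.default_dataset_license]
        for license_id in self.extra_dataset_licenses:
            if license_id not in choices:
                choices.append(license_id)
        return choices

    def needs_license_distinction(self) -> bool:
        # The form only has to explain "data license vs metadata license"
        # when a dataset could end up under something other than the
        # metadata license.
        return self.dataset_choices() != [self.metadata_license]


def dataset_license(settings: SiteLicenseSettings,
                    chosen: Optional[str] = None) -> str:
    """Resolve the license stored against one dataset."""
    if chosen is None:
        return settings.default_dataset_license
    if chosen not in settings.dataset_choices():
        raise ValueError(f"license {chosen!r} is not offered on this site")
    return chosen


default_site = SiteLicenseSettings()
multi_site = SiteLicenseSettings(extra_dataset_licenses=("cc-by",))

print(default_site.needs_license_distinction())  # False
print(multi_site.needs_license_distinction())    # True
print(dataset_license(multi_site, "cc-by"))      # cc-by
```

This also shows the caveat raised above: the simplification only holds for sites that keep the single default, and disappears as soon as one extra dataset license is configured.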

Hope that's clear

@seanh
Contributor Author

seanh commented Apr 10, 2014

> The license that you select in the form is, like everything else in the form, about the data. I don't think there has been any confusion that it applies to the metadata, has there?

I suppose that without actual user testing we can only guess, but I don't think this is clear. For example I think of the title that I enter on that form, "My Dataset", as the title of the dataset, by which I mean not just the data itself but the whole package, including the dataset and resource-level metadata. It's not just the title of the data. The URL on that form is also the URL where you go to see the whole package, data and metadata. If I use the name that I enter on that form to retrieve the package via the API, I get the package and resource metadata, not the data. So I'd think that intuitively the license would apply to the whole package like everything else seems to. I'd bet that the whole idea that you would have separate data and metadata licenses is not obvious at all to people who aren't open data website experts.
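As an illustration of that point, here is a hand-written, heavily abbreviated dataset dict of the kind the API returns when you look a package up by name. It is not a captured response; the values and the example.org URL are made up, and the helper `data_urls` is hypothetical.

```python
# Illustrative only: an abbreviated dataset ("package") dict.
package = {
    "name": "my-dataset",
    "title": "My Dataset",
    "license_id": "cc-by",  # sits alongside title and name, at package level
    "resources": [
        {
            "name": "Spending 2013",
            "format": "CSV",
            # The API hands back a pointer to the data, not the data itself.
            "url": "https://example.org/spending-2013.csv",
        },
    ],
}


def data_urls(package: dict) -> list:
    """Return where the actual data lives; everything else is metadata."""
    return [resource["url"] for resource in package.get("resources", [])]


print(data_urls(package))  # ['https://example.org/spending-2013.csv']
```

Nothing in the structure itself says that `license_id` covers only the files behind those URLs rather than the whole package, which is the ambiguity being argued about here.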

Maybe I'm making a problem out of nothing, difficult to say.

@davidread
Contributor

Glad we've come to agree on the text and keeping it where it is.

I'm not very keen on adding small print next to the resource URL. I see your point about the package.title meaning not just the data's title, but the package's too. But I think we should try and get away from that if we can. Already it is hard enough displaying logically the separations between the data, metadata as well as meta-metadata (e.g. when & by whom the CKAN record was last updated). So maybe at the top of the package form we'd put "About the data". And the URL field might be renamed "Home page", rather than implying that the URL is the data itself. I think this might also be a better way to indicate that the license field is for the data, rather than calling it "Data Licence".

@seanh
Contributor Author

seanh commented Apr 24, 2014

Ping.

I don't think putting "About the data" and changing URL to "Home page" helps this.

I want to go back to the original solution: just remove the metadata license small text. That solves the immediate problem: that we have two different licenses right next to each other on one form, with no explanation of why or which license applies to what, that is really bad.

I think introducing a separate metadata license is a good idea, but that can be done in a separate pull request, and it should be done so that it's reasonably clear to the user what the distinction between data license and metadata license is, which license applies to what, and also that the metadata license applies to all metadata including e.g. groups and orgs, metadata added via the API, etc. Putting small text on the dataset form doesn't achieve this, imho, and I think trying to tweak it is just going to make it more awkward and confusing.

@davidread
Contributor

If you just want to remove a licence, I would argue that the one with the form field is of less importance than one in the small print.

CKAN is built to be a factory for open metadata and you want to remove the "open" part of that. I don't understand why this decision would be taken so lightly.

If you want to move the metadata licence then make the argument for that here and put it in the same pull request as the removal from the form.

@vitorbaptista
Contributor

What if we moved that note near the license selection? Something like:

License: _____________________
        Important: This is the dataset resources' license. Its metadata is released under the Open Database License.

Or something along those lines.

@wardi
Contributor

wardi commented Apr 24, 2014

I suggest no special text around the selected license, but change the disclaimer near the submit button to be something like:

The license chosen above applies to the data linked to within this dataset. By submitting these metadata fields you agree to release this metadata under the Open Database License.

@seanh
Contributor Author

seanh commented Apr 24, 2014

I like @wardi's suggestion of just changing the smalltext at the bottom of the form to cover both cases, less ugly than having two help texts one of them in the middle of the form, but I think it needs to be slightly more explicit, here's an attempt:

The data license you select above only applies to the contents of any resource files that you add to this dataset. By submitting this form, you agree to release the metadata values that you enter into the form under the Open Database License.

Arguably the file that you upload is a value that you enter into the form as well, so maybe it can be improved somehow.

"Resource file" is ugly, but I think it helps to clarify it because CKAN calls them resources.

It really has to be hyperlinked; googling "open database license" produces multiple results.

I think it's very odd that it only appears on the first form when creating a dataset. It's not on the second or third forms of dataset creation stages, nor the dataset or resource edit forms, nor the group or organization forms ... It's ugly, but if the point is to get all the metadata under ODBL it surely has to be at the bottom of every form?

My preference would simply be to just remove it, as I think this smalltext is very unwieldy and I don't think ensuring that metadata is ODBL is doing a whole lot for open data really. (A catalog of open metadata without the open data to go with it seems a bit pointless, and I think CKAN has moved/is moving well beyond the just being a metadata catalog thing and into being an actual file and data store). But since I think I'm on my own there, I can go with wardi's solution.

@davidread
Contributor

Good call @wardi and @seanh. Sean's text has the edge for me.

@vitorbaptista
Contributor

👍 for @seanh's text. I'd just change ...values that you enter into _the_ form... to ...values that you enter into _this_ form.... And I agree it should be on every form.

@seanh seanh assigned seanh and unassigned davidread Apr 29, 2014
seanh pushed a commit that referenced this pull request May 21, 2014
@seanh
Contributor Author

seanh commented May 21, 2014

Done here #1728

@seanh seanh closed this May 21, 2014
davidread pushed a commit that referenced this pull request May 22, 2014
@smotornyuk smotornyuk deleted the 1503-two-licenses branch December 19, 2018 14:57