Two different licenses on create dataset page #1503
Dev meeting decided: we can just get rid of this footer text.
Two licenses on the same form, confusing.
@@ -23,15 +23,6 @@

{% block form_actions %}
<div class="form-actions">
{% block disclaimer %}
Maybe we want to keep the block empty in case people want to add some text to it (it's unlikely, but maybe people have already used it in their extensions).
@seanh see the comment regarding the empty block; other than that, looks good.
@amercader Finally done :)
That text at the bottom refers to the metadata the user is typing in. It is an important distinction, as the data may have a closed licence. The small print grants everyone the right to use the CKAN metadata for all sorts of purposes. I think it would be a big error to delete it. If no-one here has understood it then it needs rewording.
It probably needs rewording because, until you explained it, it wasn't obvious.
@dread to reword, or to suggest moving it to a T&C page agreed to on registration.
Aside from having two different licenses on one page being confusing, I think there are a few problems here:
I think the optimal solution here may be:
a. Do (3) above.
b. Don't show the license option on the create/update dataset form; instead show it on the create/update resource form, next to the upload button, making it clear that it applies only to the file and not to the metadata also entered on that form.
c. Change CKAN to store the license against the resource, not against the dataset.
Unfortunately this isn't practical, as we'd have to change CKAN's model and APIs in breaking ways, and there would be UI changes needed too in the way that datasets are displayed beyond just the forms, to handle datasets containing resources with different licenses.
I don't think we can do (a) and (b) but not (c), though. When you went to add a second resource to a dataset, the license shown next to the upload button for that resource would have to be fixed to the license already selected when uploading the first resource. Also, editing the license of one resource would change it for all of them, but it'd be difficult to make this clear to the user.
So we might have to leave the license on the dataset form, but somehow, using a help text or something, make it clear to the user that this license applies to the contents of the dataset's resource files and not to the metadata entered into that same form or the resource form. I think it's difficult to make that clear, but I stand to be corrected.
Lastly, it is best practice for open data sites to use the same license for all data, but CKAN by default offers several licenses (admittedly all open ones) and lets each dataset creator choose one for each dataset. This seems to be encouraging many different licenses on one site, which I think is bad. So maybe we should change the default: one data license, and one metadata license. Or maybe even just one license that applies to data and metadata. Customizable by the site admin, of course (maybe move the option onto the site admin page by default, not on the dataset form).
We'd still allow site admins to change the licenses for their site or to offer users multiple licenses to choose from. This would be doable in the API, but it would be up to individual site designers to solve the UI problems above for the particular case of their site if they want to do it. So it may be that we can avoid all the seemingly tricky problems and also encourage best practice by just changing the default to one license everywhere, agreed by the user at sign-up time and maybe shown site-wide in the footer of the site.
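To illustrate the point above about doing (a) and (b) without (c), here is a minimal Python sketch. It is not CKAN's actual model: the `Dataset` and `Resource` classes, the `set_license_from_resource_form` helper and the license ids are hypothetical names chosen for illustration. It assumes only what the comment states, namely that the license is stored against the dataset, so a license control shown on the resource form can only ever write to the dataset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """A file or link attached to a dataset (hypothetical, simplified)."""
    name: str


@dataclass
class Dataset:
    """Simplified stand-in for a dataset: one license shared by all resources."""
    title: str
    license_id: str
    resources: List[Resource] = field(default_factory=list)

    def license_for(self, resource: Resource) -> str:
        # There is no per-resource license; every resource inherits the dataset's.
        return self.license_id


def set_license_from_resource_form(dataset: Dataset, new_license_id: str) -> None:
    # A license widget on the resource form has nowhere else to store its value,
    # so the change applies to every resource of the dataset at once.
    dataset.license_id = new_license_id


# Illustrative license ids only.
dataset = Dataset(title="My Dataset", license_id="cc-by")
first = Resource("first.csv")
second = Resource("second.csv")
dataset.resources += [first, second]

# The user edits the license while working on the second resource...
set_license_from_resource_form(dataset, "odc-odbl")

# ...and the first resource's license changes too.
print(dataset.license_for(first), dataset.license_for(second))
```

Both resources report the new license even though the user only changed it while editing one of them, which is the behaviour that would be hard to make clear in the UI.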
The license need only apply to the dataset form, because that includes the package, the resource and its group. It would be a bonus to have the license apply to the group data, but it's not really a concern for the use cases I've seen, TBH. The activity stream is generated automatically, so I don't believe there is any copyright to worry about. User profiles seem private to me, not something you want to release as open data. And the same goes for who is following whom.
So I'm perfectly happy to leave this small print in the dataset form (and possibly add it to the group form) rather than move it to user registration. I don't like the idea of agreeing to T&C on registration because it means we have to write a T&C page which covers a number of things. And it means that every site that installs CKAN would be prompted to think about what their T&Cs are and get lawyers involved, who would likely ignore the need for an open metadata licence. I think if we can just get away with it in the small print on the form then it serves our goals best.
The license that you select in the form is, like everything else in the form, about the data. I don't think there has been any confusion that it applies to the metadata, has there? I'm happy for that to be added to a pop-up help text or whatever you have to document the fields, but I would be against renaming the field from "License" because of any perceived problem.
As regards forcing the data license, that is complicated. In DGU we designed our form to push departments to say the data is released under our favoured licence. And what goes on our site is reasonably authoritative, and often the only place the licence is stated anywhere, e.g. for smaller publishers. However, there are some big publishers who are allowed to use, and have made a conscious decision to use, a different licence for all their data, and the best service we can do for them is to allow them to say what that licence is.
So for that reason, I'd imagine most CKAN sites would want a free selection of data licence.
However, the metadata licence should be open. DGU requires the metadata to be ODBL or it is not allowed on the site, simple as that. CKAN is a tool to create a machine-readable database about datasets; that is the absolute raison d'être of CKAN. So if the metadata license is not stated, or is easily changed to a closed one, then it is a failure. I imagine the majority of CKAN sites can just have that as a condition slipped into the small print and everyone benefits from that. Most people don't think about it, not even most CKAN sysadmins. I'd rather it just sat there in the small print, everyone passively agreed to it and it didn't draw attention to itself. I think it is a good thing to make it not too easy for CKAN sites to change it from that default, so having it in the default template is better than on a separate terms and conditions page, or a config option.
BTW that blog post about not being able to plot the datasets on a map is exactly because that catalog doesn't have an explicit metadata licence. You CAN plot the datasets in DGU or datahub.io, and I imagine most of the other CKANs that still have that clause in the form.
I suggest we change the text from:
to:
Your text is about as good as it can get, I think. I still think it's quite unclear what the dataset license applies to: not the metadata on that form, nor the metadata on the next form, but the contents of the files that will be uploaded on the next form. But I think it's quite difficult to make that clear without it being pretty awkward, especially because the dataset license is set against the dataset and not against each resource (so we can't show the field next to the resource file field). Probably the best we can do is some help text next to the field, something like:
As you can see it's quite awkward though, having both those small prints on the one form. I think re-labelling the field to "Data license" might be good because then we can always talk about the data license or dataset license and the metadata license, and always keep the distinction straight. To just talk about the license when there are two different ones involved seems to invite confusion. Shame to have form labels differ from what the field is called in the API though. I agree with many of your points. Here's what I think now:
But if it's important to support multiple dataset licenses anyway, because a lot of sites are just going to need that, then there might not be much benefit to be had: as soon as they add a second dataset license choice, you have to make the distinction between the choosable dataset license and the site-wide metadata license clear to the users, and the whole problem re-emerges. We could just ignore that in the default UI and leave it up to individual site developers to deal with it. I guess it depends: if we think that most sites are gonna need more than one dataset license option, then we should support the dataset vs metadata license distinction in our UI by default. Hope that's clear.
I suppose that without actual user testing we can only guess, but I don't think this is clear. For example, I think of the title that I enter on that form, "My Dataset", as the title of the dataset, by which I mean not just the data itself but the whole package, including the dataset and resource-level metadata. It's not just the title of the data. The URL on that form is also the URL where you go to see the whole package, data and metadata. If I use the name that I enter on that form to retrieve the package via the API, I get the package and resource metadata, not the data. So I'd think that intuitively the license would apply to the whole package like everything else seems to. I'd bet that the whole idea that you would have separate data and metadata licenses is not obvious at all to people who aren't open data website experts. Maybe I'm making a problem out of nothing, difficult to say.
Glad we've come to agree on the text and keeping it where it is. I'm not very keen on adding small print next to the resource URL. I see your point about the package.title meaning not just the data's title, but the package's too. But I think we should try and get away from that if we can. It is already hard enough to display logically the separation between the data, the metadata and the meta-metadata (e.g. when and by whom the CKAN record was last updated). So maybe at the top of the package form we'd put "About the data". And the URL field might be renamed "Home page", rather than implying that the URL is the data itself. I think this might also be a better way to indicate that the license field is for the data, rather than calling it "Data Licence".
Ping. I don't think putting "About the data" and changing URL to "Home page" helps this. I want to go back to the original solution: just remove the metadata license small text. That solves the immediate problem, which is that we have two different licenses right next to each other on one form, with no explanation of why or which license applies to what. That is really bad. I think introducing a separate metadata license is a good idea, but that can be done in a separate pull request, and it should be done so that it's reasonably clear to the user what the distinction between data license and metadata license is, which license applies to what, and also that the metadata license applies to all metadata including e.g. groups and orgs, metadata added via the API, etc. Putting small text on the dataset form doesn't achieve this, imho, and I think trying to tweak it is just going to make it more awkward and confusing.
If you just want to remove a licence, I would argue that the one with the form field is of less importance than the one in the small print. CKAN is built to be a factory for open metadata and you want to remove the "open" part of that. I don't understand why this decision would be taken so lightly. If you want to move the metadata licence then make the argument for that here and put it in the same pull request as the removal from the form.
What if we moved that note near the license selection? Something like:
Or something along those lines.
I suggest no special text around the selected license, but change the disclaimer near the submit button to be something like:
I like @wardi's suggestion of just changing the smalltext at the bottom of the form to cover both cases, less ugly than having two help texts one of them in the middle of the form, but I think it needs to be slightly more explicit, here's an attempt:
Arguably the file that you upload is a value that you enter into the form as well, so maybe it can be improved somehow. "Resource file" is ugly, but I think it helps to clarify it because CKAN calls them resources. It really has to be hyperlinked; googling "open database license" produces multiple results. I think it's very odd that it only appears on the first form when creating a dataset. It's not on the second or third forms of the dataset creation stages, nor the dataset or resource edit forms, nor the group or organization forms ... It's ugly, but if the point is to get all the metadata under ODBL, surely it has to be at the bottom of every form? My preference would simply be to remove it, as I think this smalltext is very unwieldy and I don't think ensuring that metadata is ODBL is doing a whole lot for open data really. (A catalog of open metadata without the open data to go with it seems a bit pointless, and I think CKAN has moved, or is moving, well beyond just being a metadata catalog and into being an actual file and data store.) But since I think I'm on my own there, I can go with wardi's solution.
👍 for @seanh's text. I'd just change |
Done here #1728
[#1503] Clarify data vs metadata licenses
Which license am I releasing my data under, creative commons attribution or open database?