Enhancement : Add fields for projects metadata #1221

Closed
ettorerizza opened this issue Aug 1, 2017 · 81 comments
Labels: metadata, Priority: High, Type: Feature Request

Comments

@ettorerizza
Member

ettorerizza commented Aug 1, 2017

I have close to 300 projects in my workspace directory. Some are several years old. The problem is that I don't know where the data comes from, when exactly I processed them (two years ago, OK, but what month / day?), or sometimes simply what they correspond to. This should be recorded somewhere, for example in metadata fields beside the project name.

[screenshot]

I do not know if it's difficult to implement. If someone else is interested, maybe we could join forces to fund a decent bounty? I have already put 150 dollars on BountySource.

@wetneb
Sponsor Member

wetneb commented Aug 1, 2017

That would make a lot of sense! The original source and format of the data could be tracked there. This might also be useful for rich format imports (such as https://specs.frictionlessdata.io/data-package/ ).

@ettorerizza
Member Author

@wetneb Interesting! OpenRefine has become a sort of data warehouse for me, and I guess for others too. It is widely used in the information sciences community. For a librarian, not being able to add metadata is torture.

@wetneb added the "Type: Feature Request" label Aug 1, 2017
@wetneb added the "metadata" label Aug 2, 2017
@wetneb
Sponsor Member

wetneb commented Aug 2, 2017

Some ideas here: #1045

@ettorerizza
Member Author

ettorerizza commented Aug 3, 2017

We have an offer for $ 250 (100 missing)

https://www.bountysource.com/issues/47809745-enhancement-add-fields-for-projects-metadata

@wetneb
Sponsor Member

wetneb commented Aug 3, 2017

Hey @denim2x, welcome! Can you give us an idea of what your plans would be for this feature? There are various issues related to project metadata (see the "metadata" label, although not all of them are relevant), so it would be good to think about a solution with these broad use cases in mind.

@magdmartin
Member

magdmartin commented Aug 4, 2017

Here is a summary of the comments and suggestions from the different tickets, gathered together (with a few of my own added). Feel free to comment and add to it.

List of metadata to create:

Make the list of projects sortable by metadata (from #657)

Metadata is available from metadata.json and from the commands /command/core/get-all-project-metadata, /command/core/get-models and /command/core/get-columns-info.
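
To make the "sortable by metadata" idea concrete, here is a minimal Python sketch. It assumes that /command/core/get-all-project-metadata returns a JSON object with a "projects" mapping keyed by project ID; that shape, the sample values and the sort_projects helper are assumptions made for illustration, not a description of the actual implementation.

```python
import json

# Hypothetical sample payload; the real response may contain other fields.
SAMPLE = """
{
  "projects": {
    "1001": {"name": "museum-objects", "created": "2015-03-02T10:00:00Z", "modified": "2017-07-30T08:15:00Z"},
    "1002": {"name": "authors", "created": "2016-11-20T09:30:00Z", "modified": "2016-11-21T17:45:00Z"},
    "1003": {"name": "zipcodes", "created": "2017-01-05T14:00:00Z", "modified": "2017-01-05T14:20:00Z"}
  }
}
"""


def sort_projects(payload, key, descending=False):
    """Return (project_id, metadata) pairs sorted by one metadata field.

    Projects that lack the field are placed last, whatever the direction.
    Timestamps in one fixed ISO layout sort correctly as plain strings.
    """
    projects = json.loads(payload)["projects"]
    present = [(pid, meta) for pid, meta in projects.items() if key in meta]
    missing = [(pid, meta) for pid, meta in projects.items() if key not in meta]
    present.sort(key=lambda item: item[1][key], reverse=descending)
    return present + missing
```

Calling sort_projects(SAMPLE, "modified", descending=True) lists the most recently modified project first; any other key present in the metadata can be used the same way.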

@ettorerizza
Member Author

Dublin Core can be an inspiration for metadata fields, although not all of the fifteen elements are needed in this case.

Title
Creator
Subject
Description
Publisher
Contributor
Date
Type
Format
Identifier
Source
Language
Relation
Coverage
Rights

@wetneb
Sponsor Member

wetneb commented Aug 4, 2017

Thanks a lot @magdmartin for this summary of the other issues! That's very convenient. I think it would also be good to have other generic metadata fields as @ettorerizza proposes.

We really should keep #778 and #1096 in mind for that. This document gives some ideas about what sort of metadata data packages provide: https://www.w3.org/TR/2015/PR-tabular-metadata-20151117/

If we want to support these sorts of formats, I think it would be reasonable to designate a list of core metadata fields (things that apply to all OpenRefine projects and would be exposed in the UI) and also provide some support for custom metadata provided by the original import format. This extra metadata could be reused by exporters or extensions (so that, for instance, no metadata is lost if you import a data package, clean it, and export it to another data package).
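
A rough Python sketch of that split between core and custom metadata is given below. The field names, the ProjectMetadata class and the from_import / to_export helpers are all hypothetical and only illustrate the idea; they do not reflect OpenRefine's actual model.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical core fields: the ones every project has and the UI would expose.
CORE_FIELDS = ("name", "created", "creator", "description")


@dataclass
class ProjectMetadata:
    name: str = ""
    created: str = ""
    creator: str = ""
    description: str = ""
    # Importer-specific metadata, kept verbatim so exporters can reuse it.
    custom: Dict[str, Any] = field(default_factory=dict)


def from_import(raw: Dict[str, Any]) -> ProjectMetadata:
    """Split imported metadata into core fields and a custom remainder."""
    core = {k: raw[k] for k in CORE_FIELDS if k in raw}
    custom = {k: v for k, v in raw.items() if k not in CORE_FIELDS}
    return ProjectMetadata(**core, custom=custom)


def to_export(meta: ProjectMetadata) -> Dict[str, Any]:
    """Merge core and custom metadata back into one flat mapping."""
    out = dict(meta.custom)
    for key in CORE_FIELDS:
        value = getattr(meta, key)
        if value:  # skip core fields that were never filled in
            out[key] = value
    return out
```

With this layout, metadata that OpenRefine does not understand (a license, a list of sources) survives an import followed by an export unchanged, while the UI only needs to know about the core fields.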

@denim2x

denim2x commented Aug 4, 2017

@wetneb Please clarify: do you require PR-tabular-metadata conformance for this issue?

@wetneb
Sponsor Member

wetneb commented Aug 4, 2017

@denim2x I think the Tabular Metadata specification goes way beyond what @ettorerizza had in mind when he started his issue and bounty. So I do not think implementing that should be required to close this issue.

However, solving this issue will probably involve some changes to the model, possibly impacting the format in which projects are persisted (if we want to add fields that cannot be stored yet). Doing this sort of change is slightly complicated: we might need to adapt the importers so that they fill the fields we introduce, ensure the existing workspaces will be migrated seamlessly to the format that supports the new fields, and so on.

So, this relatively heavy change should be made with the other use cases in mind: ideally, we should avoid further model changes and adopt a model that works for a wide range of metadata needs. That does not mean the UI should fully expose the flexibility of this new model, or anything. But maybe that will turn out to be too complicated and therefore out of the scope of this issue. I just want us to have that discussion and not rush too much on adding very particular fields without stepping back and looking at the bigger picture.

@fpompermaier
Contributor

Hi all,
I'm happy to see OpenRefine regaining interest from maintainers!
I think that nowadays Refine should be able to load data from the local filesystem (as usual), but it should also evolve to exchange data with external systems such as databases, Hive/HBase tables or Parquet directories, delegating the transformation of those datasets to a Big Data engine like Spark or Flink (if the memory required to process a dataset goes over a certain threshold).
In this direction, it would be nice to introduce the concept of a Catalog into Refine, in order to properly manage metadata and interface with other existing repositories (like HCatalog and similar).
A change to the Refine data model could also be useful here (e.g. Apache Arrow could be a perfect fit).

I know this comment is a little off-topic, but I'd like to stimulate some discussion in this direction.

@denim2x

denim2x commented Aug 4, 2017

@wetneb Are you considering using Apache Arrow in OpenRefine?

@wetneb
Sponsor Member

wetneb commented Aug 4, 2017

@denim2x I don't know! Do you think it would make sense to use it to solve this issue? I am not familiar at all with that.

@thadguidry
Member

thadguidry commented Aug 4, 2017

Breathe everyone. :)

Don't worry about compatibility with external projects. Others outside of OpenRefine can take on that task, because our existing metadata format is just simple JSON.

@fpompermaier We have a long-term plan for bigger data under Performance Improvements 2017.

All -

  1. What you want at the end of the day is reusing our simple key/value pair storage per project. We have that in place and it's stored like so in metadata.json

  2. For those doing the work, this search actually shows quite well (using "last modified") all the parts of OpenRefine that get touched or will need to be touched or reviewed. https://github.com/OpenRefine/OpenRefine/search?utf8=%E2%9C%93&q=Last+Modified&type=

  3. I'd prefer that the design let the user choose, with a button and a simple dialog (reusing the checkbox dialogs we already have in many places), which metadata keys are important enough to show on the project-open UI, and show them as columns as @ettorerizza wants. Don't pick the keys for them, but having an initial default of created and notes would be a very wise choice; the other key/values should be filled in by the author. This is similar to the UI experience we have for our preferences.vt. You could have a button next to each project name that opens the metadata table, reading in metadata.json and allowing editing in table form, like we do for preferences.vt.
    See [1] https://github.com/OpenRefine/OpenRefine/blob/master/main/webapp/modules/core/scripts/preferences.js
    [2] https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/ProjectMetadata.java#L108 , etc

@ettorerizza
Member Author

ettorerizza commented Aug 4, 2017

My initial idea was to have fields directly editable right next to the project name, but a key-value menu like the preferences makes a lot of sense. I just want to mention a possible UI implementation already used by Google Fusion Tables. A screencast will be clearer than my explanations.

[screencast]

@thadguidry
Member

@ettorerizza Yep, that's the idea: an ABOUT or ? (question mark) button next to each project name that opens a table of KEY / VALUE columns similar to our preferences.vt. Also, being able to tick checkboxes (up to perhaps 4 max, or whatever works) for the keys whose values you want to "SHOW ON PROJECT PAGE" as additional columns (space permitting), for easier instant viewing when the project page shows.
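
The selection logic described here could be sketched as follows in Python. The cap of 4, the default keys and the visible_columns helper are assumptions drawn from the suggestions in this thread, not existing OpenRefine code.

```python
MAX_VISIBLE = 4  # assumed cap, from the "up to perhaps 4 max" suggestion


def visible_columns(metadata, selected, defaults=("created", "notes")):
    """Return the (key, value) pairs to show as extra project-list columns.

    `metadata` is the project's key/value mapping and `selected` is the list
    of keys the user ticked. With no selection, fall back to the defaults.
    Keys missing from the metadata are shown with an empty value.
    """
    keys = list(selected) if selected else list(defaults)
    if len(keys) > MAX_VISIBLE:
        raise ValueError(f"at most {MAX_VISIBLE} keys can be shown")
    return [(key, metadata.get(key, "")) for key in keys]
```

The user's choice only affects what is displayed; the full key/value table stays available behind the button next to the project name.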

@denim2x

denim2x commented Aug 21, 2017

@thadguidry @wetneb Any further information for working on this feature would be much appreciated.

@denim2x

denim2x commented Aug 21, 2017

@ettorerizza It'd be nice if you could provide some mockups with the improved UI (with steps)

@ettorerizza
Member Author

ettorerizza commented Aug 22, 2017

@denim2x Personally, I do not have any expectations other than those mentioned above: key-value fields that allow the user to add custom metadata to each project, and a summary table of the metadata that allows sorting and finding projects by date, by keywords, by title, etc. @thadguidry or @magdmartin probably have a more precise and more general view of what the enhancement should look like in practice.

@wetneb
Sponsor Member

wetneb commented Sep 5, 2017

@denim2x any news about this? Let us know if you have any questions about the feature.

@ettorerizza
Member Author

Is anyone able to connect to BountySource? https://salt.bountysource.com/teams/openrefine

@wetneb
Sponsor Member

wetneb commented Sep 11, 2017

@ettorerizza it looks like their SSL certificate expired yesterday - I expect it will be renewed soon. I don't think they ran away with your bounty ^^

@jackyq2015
Contributor

I have a little concern. I haven't received my payment after more than 2 months. I was told that "the payment though cheque is rare".

@wetneb
Sponsor Member

wetneb commented Sep 11, 2017

@jackyq2015 yeah, actually maybe we should have a discussion about Bountysource… it has a number of issues so we might want to switch to something else instead.

  • there is a fairly high service fee
  • github integration is broken
  • the website is currently unavailable
  • bounties take a long time to be paid out

I quite like Liberapay (https://liberapay.com/ ), which is actively maintained and has a smaller service fee (basically just the payment processing fees of their bank). It does not allow putting bounties on specific issues, though.

@jackyq2015
Contributor

@ettorerizza I fixed the table width issue. Not perfect, as I said, but much better. :)

The refresh issue was also fixed. Please give it a try.

@jackyq2015 self-assigned this Nov 9, 2017
@wetneb
Sponsor Member

wetneb commented Nov 9, 2017

@jackyq2015 thanks a lot for taking my rants into account! That was quick! I have just tried the new projects view and it looks great! Here are a few other things:

  1. When your mouse is not positioned on any row of the table, the first three columns look a bit mysterious - maybe it would be worth having some labels for them too? Or not having empty headers at all?
    [screenshot]

  2. I just tried creating a project from a URL (CSV, with default import options) and the row count in the project list was displayed as 0. It took me a hard refresh (Ctrl-F5) to see the correct count.

  3. Do we store the importer used to create the project anywhere? I can see you store the importing options (that's great!) but I would find it useful to know which importer was used.

  4. Quite a lot of comments about the metadata view:
    [screenshot: metadata view]

@jackyq2015
Contributor

jackyq2015 commented Nov 9, 2017

@wetneb Thanks for the feedback. I agree with all of it and will work on the improvements.

For those 3 mysterious links on the left-hand side, I think we can just give them labels instead of hiding them. We can also remove the header borders for those 3.

For the "about" view, are you suggesting to remove the background of "edit" button to make it fit?

Any thoughts?

@jackyq2015
Contributor

Pushed the code. It is much better now! cheers

@jackyq2015
Contributor

I would suggest closing this issue if there are no objections. Please open a new issue if you find any problems.

@wetneb
Sponsor Member

wetneb commented Nov 10, 2017

@jackyq2015 it looks much better now! I still see a few points that we can still discuss:

  • do we really want to display the fields with raw JSON? Especially the one with column metadata (as it will currently remain empty and should be eventually manipulated from the project view anyway?)
  • the "Create Date" is currently marked as editable but does not have any validation (it's possible to type in invalid datestamps)… I would either make it properly editable (with a stock date picker widget) or simply disable editing for that (why would you change that date anyway?)
  • I like the light blue better in the metadata view (and it's more in line with the overall style of the UI) but I still think it's a bit too heavy - why don't we reuse the two colors used in the grid view for that? (#fff and #f2f2f2). That would be more consistent I think.
  • the "About" cell is capitalized whereas "rename" isn't. I would change it to "about" for consistency. (Also, I think the other labels should only be capitalized for the first word - again for consistency with the rest of the UI - so "Project Name" -> "Project name")
  • Do we still need the rename link now that the name can be edited in the metadata view?
  • It's not clear to me that "about" is the thing you should click if you want to edit what you see in the table. I would find it more intuitive with "edit", or even better: a pencil symbol (for consistency with the delete button on the left hand side).
  • Would it make sense to have a metadata field that stores the original format (or name of the importer) of the dataset? It does not look like this is stored in the import metadata.

If you're fed up with these UI tweaks let me know, I can do them :) I just think it's important to keep the UI as clean as we can, especially for a part that is seen a lot.

One nice byproduct of this change is that the delete buttons are now always visible - I remember that it took me a while to figure out how to delete a project and that is now much clearer!
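
On the "Create Date" validation point, a minimal Python sketch of what rejecting invalid datestamps could look like is shown below. The timestamp layout is an assumption (an ISO-style UTC stamp with milliseconds), and parse_creation_date is a hypothetical helper, not part of OpenRefine.

```python
from datetime import datetime, timezone

# Assumed timestamp layout, e.g. "2017-11-01T09:30:00.000Z"
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_creation_date(text):
    """Return a UTC datetime, or None if the text is not a valid timestamp."""
    try:
        parsed = datetime.strptime(text.strip(), TIMESTAMP_FORMAT)
    except ValueError:
        return None
    return parsed.replace(tzinfo=timezone.utc)
```

An edit would only be accepted when the function returns a value; disabling editing for the field avoids the need for such a check altogether.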

@wetneb
Sponsor Member

wetneb commented Nov 10, 2017

Oh, and two other things (sorry):

  • the delete button is placed weirdly with Chromium (not centered in its cell)
  • it would be nice if the "last modified" could still be displayed in the nice human-readable form they had while preserving the sorting functionality… I have no idea how hard that would be. But in the current state it's kind of a regression in terms of user-friendliness.

@jackyq2015
Contributor

jackyq2015 commented Nov 10, 2017

@wetneb please see below:

  • do we really want to display the fields with raw JSON? Especially the one with column metadata (as it will currently remain empty and should be eventually manipulated from the project view anyway?)
    A: For column metadata, I imagine we may need an extra view just to handle its complexity. For the import options, since they are read-only, the current display should be sufficient for now.
    THAD AGREES WITH ANSWER , WE WANT A DIFFERENT VIEW FOR COLUMNAR META EDITS
  • the "Create Date" is currently marked as editable but does not have any validation (it's possible to type in invalid datestamps)… I would either make it properly editable (with a stock date picker widget) or simply disable editing for that (why would you change that date anyway?)
    A: Will disable the edit for this field
  • I like the light blue better in the metadata view (and it's more in line with the overall style of the UI) but I still think it's a bit too heavy - why don't we reuse the two colors used in the grid view for that? (#fff and #f2f2f2). That would be more consistent I think.
    A: Agree. will change style.
  • the "About" cell is capitalized whereas "rename" isn't. I would change it to "about" for consistency. (Also, I think the other labels should only be capitalized for the first word - again for consistency with the rest of the UI - so "Project Name" -> "Project name")
    A: I can change "rename" to "Rename" for consistency. But I think we should keep "About" rather than "edit", since "edit" may mean "edit the data". As for capitalizing only the first word, I will make the change.
    THAD AGREES WITH ANSWER
  • Do we still need the rename link now that the name can be edited in the metadata view?
    A: It is a shortcut. I would prefer to keep it
    THAD AGREES WITH ANSWER
  • It's not clear to me that "about" is the thing you should click if you want to edit what you see in the table. I would find it more intuitive with "edit", or even better: a pencil symbol (for consistency with the delete button on the left hand side).
    A: see above comments
  • Would it make sense to have a metadata field that stores the original format (or name of the importer) of the dataset? It does not look like this is stored in the import metadata.
    A: There is a "source" key in the import options. From that you can most likely tell the file format. Also, the options differ from importer to importer.
  • the delete button is placed weirdly with Chromium (not centered in its cell)
    A: will try to fix it if I can. :)
  • it would be nice if the "last modified" could still be displayed in the nice human-readable form they had while preserving the sorting functionality… I have no idea how hard that would be. But in the current state it's kind of a regression in terms of user-friendliness.
    A: For me, the "human-readable form" is actually harder to read than the current one, especially given that the column can be sorted ascending and descending. For example, if I want to find a project I worked on on 2017/11/01, it is much easier to sort and then locate it. With the old way, you have to work out that it was XX days ago before you can find it. It is a personal preference. If more people like the old presentation, it can be done.
    THAD AGREES WITH ANSWER AND THIS HELPS WITH CONSISTENCY INTERNATIONALLY AS WELL

@thadguidry
Member

@wetneb @jackyq2015 My Comments are in italics above.

@wetneb
Sponsor Member

wetneb commented Nov 11, 2017

The "source" key in import metadata is not enough to tell which importer was used! For instance, if it's just from the clipboard (and in general not all file names indicate their format correctly, and the user might have chosen a different importer)… So I think it would really be worth storing that explicitly!
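
A small Python sketch illustrates why guessing from "source" falls short and what storing the importer explicitly could look like. The guess_format and record_import helpers, the "importer" key and the importer name used in the usage note are hypothetical.

```python
import os


def guess_format(source):
    """Guess a format from the `source` value alone; None when it is unclear."""
    _, ext = os.path.splitext(source)
    return ext.lstrip(".").lower() or None


def record_import(source, importer):
    """Store the importer name alongside the source instead of guessing later."""
    return {"source": source, "importer": importer}
```

For a clipboard import, or a URL without a file extension, guess_format returns None, whereas record_import("clipboard", "some-importer-name") keeps the information regardless of what the source looks like.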

I'm curious to know if users want to see ISO timestamps rather than human-formatted dates… I think it looks quite hacky (especially with all the useless "000Z" at the end…). If you prefer to see dates rather than time deltas, then dates should be human-readable too. Think about our non-programmer users: they should not have to know about ISO timestamps to use OpenRefine!

@ettorerizza
Member Author

ettorerizza commented Nov 11, 2017

The display of these dates on a 15-inch screen is problematic. It might also be necessary to slightly reduce the font size of the words "rename" and "about".

[screenshot]

A shorter timestamp, without the milliseconds, could solve the problem easily.

@thadguidry
Member

Jacky just needs to make it selectable which fields are displayed. That will probably solve your problem better.

@jackyq2015
Contributor

How about shortening the date to something like "2017-11-11 07:25 PM"? The "rename" and "about" font is already small. I can also reduce the width of "Row Count" a little, but not too much (the header has to fit in).
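
A short Python sketch of that shorter display, keeping the sorting based on the full timestamp, is given below. The input layout (ISO-style with milliseconds and a trailing Z) is an assumption, and the function names are made up for illustration.

```python
from datetime import datetime


def parse_iso(stamp):
    """Parse an assumed "YYYY-MM-DDTHH:MM:SS.mmmZ" timestamp."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")


def short_label(stamp):
    """Format a timestamp like "2017-11-11 07:25 PM" (no seconds or milliseconds)."""
    moment = parse_iso(stamp)
    hour12 = moment.hour % 12 or 12  # 0 and 12 o'clock both display as 12
    suffix = "AM" if moment.hour < 12 else "PM"
    return f"{moment:%Y-%m-%d} {hour12:02d}:{moment.minute:02d} {suffix}"


def sorted_labels(stamps, newest_first=True):
    """Sort on the parsed value, then format only for display."""
    ordered = sorted(stamps, key=parse_iso, reverse=newest_first)
    return [short_label(s) for s in ordered]
```

Sorting on the parsed value rather than on the label matters here, because a 12-hour label such as "07:25 PM" would not sort chronologically as text.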

@jackyq2015
Contributor

The above was done in PR #1323.

@wetneb
Sponsor Member

wetneb commented Nov 12, 2017

I agree "Rename" and "About" take a lot of space… That's why I think "Rename" should be removed. I don't see any reason why this shortcut should be kept:

  • it only saves one click anyway
  • it is not placed logically next to the title
  • why provide a shortcut for the title (which is generally filled at project creation) while there are so many other empty fields that the user could want to fill?

@ettorerizza
Member Author

@wetneb @jackyq2015 If I rely on my own practices, it's true that I rename projects much more often while they are open. But I do not know if everyone does it that way.

@jackyq2015
Contributor

I will remove the "Rename" link from the index page if there is no objection by next Monday before release 2.7.2. All, please speak out if you want to keep it as is.

@jackyq2015
Contributor

"Rename link" was removed by PR #1324

@wetneb
Sponsor Member

wetneb commented Nov 14, 2017

@jackyq2015 in the end do we have tests for this feature?

@ettorerizza
Member Author

ettorerizza commented Nov 15, 2017

Thanks a lot for all this work, @jackyq2015 ! As I said above, you already deserved the bounty with the first draft.

The system as it is suits me perfectly. However, I wonder if it's ready for a public release. There are still some display issues that could frustrate users.

The date display in Chrome or Edge on a 15-inch screen at 100% zoom could be improved, but that's just aesthetics:

[screenshot]

Note that this problem does not occur in Firefox:

[screenshot]

But in every browser, the metadata editing window extends beyond the screen again. Each time, you have to zoom out to access the "edit" buttons.

[screenshot]

[screenshot]

There is also a small consistency problem between the "transform" window, which now spans the full width of the screen, and the other edit windows, which remained the same.

[screenshot]

[screenshot]

All this annoys me very little, since I use OpenRefine at 90% zoom. But I'm afraid it would look a little buggy in an official 2.7.2 release.

What do you think, everyone?

@thadguidry
Member

@ettorerizza Yes, these need to be cleaned up a bit in our CSS files.
@jackyq2015 It annoys me A LOT. Let's open additional issues to note the problems and then fix them. I would also like to see the Transform window go back to the way it was, not automatically stretched to 100% of the window width.

Next Release is blocked until these UI issues can be cleaned up.

@jackyq2015
Contributor

@ettorerizza I understand your concern. Would you mind opening 2 issues (or more if you want) for the metadata and the transform UI? In the meantime, I will take a look at what we can do.

@jackyq2015
Contributor

jackyq2015 commented Nov 15, 2017

@ettorerizza @wetneb @thadguidry I created PR #1331. Please try / review.
When testing with a long string (for example, a description), please add white space between words as you normally would; that is how the browser knows where to wrap the text.

7 participants