Create a new project directly from a project #1349

Closed
ettorerizza opened this issue Nov 19, 2017 · 9 comments
Labels
export, Status: Duplicate, Type: Feature Request

Comments

@ettorerizza
Member

ettorerizza commented Nov 19, 2017

Data wrangling often involves extracting several subsets of a file, so I don't think I'm the only one who spends time exporting parts of a project to CSV only to re-import them immediately into OpenRefine.

Being able to create a new project directly from the current facets would avoid this unnecessary step. The option could be modeled on the "Custom tabular exporter" feature. This would also be an opportunity to review the export menu, which still contains references to Freebase (Triple loader, MQLWRITE).
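To make the request concrete, here is a minimal Python sketch of what "create a project from the current facets" boils down to: keep only the rows that match every active list-facet selection, then hand that subset to a new project instead of round-tripping through a CSV file. The row and facet model (`ListFacet`, `subset_from_facets`, `to_csv`) is hypothetical and only loosely mirrors the `columnName`, `s` (selection) and `invert` fields visible in facet permalinks; it is not OpenRefine's internal API.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class ListFacet:
    """Hypothetical list facet: a column plus the values selected in it."""
    column_name: str
    selected: set = field(default_factory=set)
    invert: bool = False

    def matches(self, row: dict) -> bool:
        if not self.selected:  # nothing selected: this facet filters nothing
            return True
        hit = row.get(self.column_name, "") in self.selected
        return hit != self.invert  # invert keeps the rows NOT selected


def subset_from_facets(rows, facets):
    """Rows that satisfy every active facet (facets combine with AND here)."""
    return [row for row in rows if all(f.matches(row) for f in facets)]


def to_csv(rows, columns) -> str:
    """Serialize a subset; today this is the manual export step."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows = [
        {"id": "1", "type": "LOC_MONUMENT", "city": "Ghent"},
        {"id": "2", "type": "LOC_PERSON", "city": "Ghent"},
        {"id": "3", "type": "LOC_MONUMENT", "city": "Bruges"},
    ]
    subset = subset_from_facets(rows, [ListFacet("type", {"LOC_MONUMENT"})])
    print(to_csv(subset, ["id", "type", "city"]))  # rows 1 and 3 only
```

A built-in feature would skip `to_csv` entirely and create the new project from the filtered rows in memory.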

jackyq2015 added the Type: Feature Request, export, and Status: Pending Review labels Nov 20, 2017
@thadguidry
Member

thadguidry commented Nov 20, 2017

@ettorerizza I'm curious, however, WHY it means several subsets. Is it because we do a crappy job on hiding columns? On tucking away things that are no longer important as you refine and polish the data through additional column creation? What exactly are the pain points, and where? I'd like to discover those rather than just giving you a new hammer. Think through the problem carefully. Imagine anything is possible, and just tell us, or draw it on paper and take a photo.

@ettorerizza
Member Author

ettorerizza commented Nov 20, 2017

"Is it because we do a crappy job on hiding columns ? "

Not at all, @thadguidry. OpenRefine offers many ways to filter rows or columns. But these are facets, which are inherently unstable. It is often much safer to create a subset from these facets, especially since multiple facets require a lot of RAM. When you have a column containing a million rows and lots of duplicates, it is much faster to export a deduplicated version of this column, perform clustering, reconciliation and so on, and then join the result back to the original file with cell.cross(). Sometimes, too, the project has become so heavy (especially because of multiple reconciliations with Wikidata, a lot of JSON extracted from APIs, too many records...) that it is better to work on a lighter and cleaner copy.
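The deduplicate, clean, join-back workflow described above can be sketched in plain Python. This is an illustration under assumptions, not how OpenRefine implements it: `normalize()` is a crude hypothetical stand-in for clustering or reconciliation, and the final lookup plays the role that `cell.cross()` plays in GREL. The point is that the expensive step runs once per distinct value rather than once per row.

```python
def distinct_values(rows, column):
    """Unique values of one column, in first-seen order (the 'lighter' project)."""
    seen = {}
    for row in rows:
        seen.setdefault(row[column], None)
    return list(seen)


def normalize(value: str) -> str:
    # Hypothetical cleaning step: trim, collapse inner whitespace, casefold.
    return " ".join(value.split()).casefold()


def build_lookup(distinct, clean):
    """Small side project: one entry per distinct value with its cleaned form."""
    return {v: clean(v) for v in distinct}


def join_back(rows, column, lookup, new_column):
    """Copy the cleaned value onto every original row, as cell.cross() would."""
    return [{**row, new_column: lookup.get(row[column])} for row in rows]
```

With a million rows but only a few thousand distinct values, `normalize` (or a real clustering/reconciliation step) only ever sees the few thousand.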

This is not very different from what we do in R or Pandas by creating multiple dataframes related to the same project.

The new "metadata project" feature will allow us to group all OpenRefine projects related to the same real "Subject". The feature I suggest would simply make it easier to work on multiple datasets that are related to each other.

@thadguidry
Member

thadguidry commented Nov 20, 2017

@ettorerizza Perfect. That's exactly the response I was hoping for, and I agree entirely.

However, one idea that David and I discussed long ago, back in the Gridworks days, was undo/redo for facets, storing their state. But we never got around to researching the complexity in our code or what would have to be refactored or changed. The idea was roughly equivalent to Photoshop's Layers history (pretty powerful stuff): a snapshot button that stores the state of all your current facets.
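One way to picture the snapshot idea is a linear undo/redo history of facet states plus named snapshots. The sketch below is purely hypothetical (the `FacetHistory` class and its methods are invented for illustration); it stores each state as a JSON string, similar in spirit to the JSON facet state that permalinks carry, so stored states cannot be mutated by accident.

```python
import json


class FacetHistory:
    """Hypothetical linear history of facet states with named snapshots."""

    def __init__(self, initial_state=None):
        self._states = [json.dumps(initial_state or {"facets": []}, sort_keys=True)]
        self._pos = 0
        self._snapshots = {}

    @property
    def current(self):
        return json.loads(self._states[self._pos])

    def apply(self, new_state):
        # A new change discards any states that were undone (linear history).
        self._states = self._states[: self._pos + 1]
        self._states.append(json.dumps(new_state, sort_keys=True))
        self._pos += 1

    def undo(self):
        if self._pos > 0:
            self._pos -= 1
        return self.current

    def redo(self):
        if self._pos < len(self._states) - 1:
            self._pos += 1
        return self.current

    def snapshot(self, name):
        """The 'snapshot button': remember the current facet state under a name."""
        self._snapshots[name] = self._states[self._pos]

    def restore(self, name):
        # Restoring is recorded as an ordinary change, so it can be undone.
        self.apply(json.loads(self._snapshots[name]))
        return self.current
```

Treating a restore as a regular change keeps the history linear; that is a design choice for the sketch, not something settled in this thread.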

Thoughts on that idea? I mean in addition to this issue's immediate need.

@ettorerizza
Member Author

ettorerizza commented Nov 20, 2017

One of the vib-bits extensions has a "save facets" option: by clicking on "permalink" just under the project name, you can save the facets as a long URL. However, the process sometimes fails. I wonder whether this is due to a maximum URL length...


By the way, I have seen comments on Twitter from users asking why these very convenient vib-bits extensions are not integrated into the OpenRefine core. I've looked into it, but I cannot find an open-source license for these extensions.

@ettorerizza
Member Author

I'm not even sure it's worth putting a bounty on this kind of feature. If I understand correctly, Bountysource pays out late and takes a large commission, all for something that someone on the OpenRefine team will do anyway.

@thadguidry
Member

thadguidry commented Nov 20, 2017

@ettorerizza we would totally like to include vib-bits-like features in OpenRefine, but that's always a time/expense decision. What we can do, and hope to do, after our UI refresh is help extension authors get things working again.

As for Bountysource, I would advise holding off. In fact, we will soon register as a non-profit and take donations directly. We might actually want to turn off Bountysource, but that needs to be put to a vote on our mailing list (feel free to start that email, or @magdmartin can).

Anyway, we agree on this particular issue's direct need: one-click creation of a new project from the current state of the facets. Correct?

@jackyq2015
Contributor

@ettorerizza the maximum URL length is around 2K characters. I'm not sure how long your URLs usually are. If length is an issue, this could be done through a menu or something similar instead.

@ettorerizza
Member Author

ettorerizza commented Nov 21, 2017

@jackyq2015 Thanks for the information; this confirms my intuition: the vib-bits "permalinks" that display a blank screen are those containing more than 2000 characters, like this one (2019 characters):

http://127.0.0.1:3333/project?project=1545787044416&ui=%7B%22facets%22%3A%5B%7B%22c%22%3A%7B%22type%22%3A%22list%22%2C%22name%22%3A%22_%20-%20spatial%20-%20key%20-%20key%22%2C%22columnName%22%3A%22_%20-%20spatial%20-%20key%20-%20key%22%2C%22expression%22%3A%22value%22%2C%22omitBlank%22%3Afalse%2C%22omitError%22%3Afalse%2C%22selectBlank%22%3Afalse%2C%22selectError%22%3Afalse%2C%22invert%22%3Afalse%7D%2C%22o%22%3A%7B%22sort%22%3A%22count%22%7D%2C%22s%22%3A%5B%7B%22v%22%3A%7B%22v%22%3A%22LOC_MONUMENT%22%2C%22l%22%3A%22LOC_MONUMENT%22%7D%7D%5D%7D%2C%7B%22c%22%3A%7B%22type%22%3A%22list%22%2C%22name%22%3A%22_%20-%20spatial%20-%20value%20-%20value%3A%20judgment%22%2C%22columnName%22%3A%22_%20-%20spatial%20-%20value%20-%20value%22%2C%22expression%22%3A%22forNonBlank(cell.recon.judgment%2C%20v%2C%20v%2C%20if(isNonBlank(value)%2C%20%5C%22(unreconciled)%5C%22%2C%20%5C%22(blank)%5C%22))%22%2C%22omitBlank%22%3Afalse%2C%22omitError%22%3Afalse%2C%22selectBlank%22%3Afalse%2C%22selectError%22%3Afalse%2C%22invert%22%3Afalse%7D%2C%22o%22%3A%7B%22scroll%22%3Afalse%2C%22sort%22%3A%22name%22%7D%2C%22s%22%3A%5B%7B%22v%22%3A%7B%22v%22%3A%22(unreconciled)%22%2C%22l%22%3A%22(unreconciled)%22%7D%7D%5D%7D%2C%7B%22c%22%3A%7B%22type%22%3A%22list%22%2C%22name%22%3A%22record_spatial%3A%20judgment%22%2C%22columnName%22%3A%22record_spatial%22%2C%22expression%22%3A%22forNonBlank(cell.recon.judgment%2C%20v%2C%20v%2C%20if(isNonBlank(value)%2C%20%5C%22(unreconciled)%5C%22%2C%20%5C%22(blank)%5C%22))%22%2C%22omitBlank%22%3Afalse%2C%22omitError%22%3Afalse%2C%22selectBlank%22%3Afalse%2C%22selectError%22%3Afalse%2C%22invert%22%3Afalse%7D%2C%22o%22%3A%7B%22scroll%22%3Afalse%2C%22sort%22%3A%22name%22%7D%2C%22s%22%3A%5B%7B%22v%22%3A%7B%22v%22%3A%22none%22%2C%22l%22%3A%22none%22%7D%7D%5D%7D%2C%7B%22c%22%3A%7B%22type%22%3A%22list%22%2C%22name%22%3A%22beeldid%22%2C%22columnName%22%3A%22beeldid%22%2C%22expression%22%3A%22value%22%2C%22omitBlank%22%3Afalse%2C%22omitError%22%3Afalse%2C%22selectBlank%22%3Afalse%2C%22selectError%22%3Afalse%2C%22invert%22%3Afalse%7D%2C%22o%22%3A%7B%22sort%22%3A%22name%22%7D%2C%22s%22%3A%5B%7B%22v%22%3A%7B%22v%22%3A%2200029386%22%2C%22l%22%3A%2200029386%22%7D%7D%5D%7D%5D%7D

Edit: Hmm, no, the screen eventually appears after about a minute.
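For debugging this kind of issue, note that the `ui` parameter of such a permalink is just URL-encoded JSON, and percent-encoding turns every `"`, `{`, `:` and `,` into three characters, which is why a handful of facets quickly exceeds 2000 characters. Below is a minimal Python sketch for decoding a permalink and checking its length. `URL_BUDGET` reflects the rough 2K figure mentioned in this thread, not a documented limit, and `make_permalink`/`summarize` are hypothetical helpers; the `c`/`name` keys follow the permalink shown above.

```python
import json
from urllib.parse import parse_qs, quote, urlsplit

URL_BUDGET = 2000  # assumption: the rough limit discussed in this thread


def decode_ui(url: str) -> dict:
    """Extract and parse the URL-encoded JSON in the 'ui' query parameter."""
    query = parse_qs(urlsplit(url).query)
    return json.loads(query["ui"][0])


def make_permalink(base: str, project_id: str, ui_state: dict) -> str:
    """Hypothetical helper: encode a facet state the way such permalinks do."""
    ui = quote(json.dumps(ui_state, separators=(",", ":")), safe="")
    return f"{base}?project={project_id}&ui={ui}"


def summarize(url: str) -> str:
    facets = decode_ui(url).get("facets", [])
    names = [f.get("c", {}).get("name") for f in facets]
    status = "over" if len(url) > URL_BUDGET else "within"
    return f"{len(url)} chars ({status} budget), facets: {names}"
```

Running `summarize()` on a failing permalink shows both its length and which facets it carries, which helps decide whether a menu-based save (as suggested above) is needed.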

@wetneb
Sponsor Member

wetneb commented Dec 22, 2019

This issue seems to have deviated to a discussion about persistence of facets. That is indeed a very important topic I think, but we already have an issue for this: #560.

I think we should solve that rather than add the ability to create a project from another one, so I am tempted to close this issue and use #560 instead. Feel free to reopen if that does not work for you.

wetneb closed this as completed Dec 22, 2019
wetneb added the Status: Duplicate label and removed the Status: Pending Review label Dec 22, 2019
4 participants