Remove Blank Columns upon project creation/import #1750
Comments
This is different from #1472, which covers removing rows with empty cells.
@wetneb @ostephens could we squeeze this into the 3.2 milestone?
@thadguidry I won't have time to work on this soon, but if someone wants to implement this before 3.2, I'm happy to get it on board, yes.
This will be particularly useful when importing XML into OpenRefine, where tags that are used for structure but have no direct content of their own create lots of empty columns that then have to be removed manually, e.g.:
...gives empty 'poster' and 'name' columns on import.
May I work on this bug?
Sure, thanks a lot!
Hey, I made some fixes, but I don't understand how to rebuild to check whether they work...
@aquaruiz Yes, that's how you can do it manually (if you are not using the Maven config and are running directly from your IDE).
It doesn't look like this has been worked on. Can I work on it?
Yes, I think so @CameronBedard, I don't see that @aquaruiz ever made a PR for us to look at and review for merge.
Hi team! I am an outreachy applicant. Please can I work on this issue if it is still open? |
Hi Team, Congratulations to me, because I ran OpenRefine from the command line successfully! Before I start, I would like to describe this feature and ask some questions I'm not sure about.
Congratulations on running OpenRefine! That is an excellent start :)
So I guess it is probably worth offering in the tree importers as well, after all. This would end up being an option available to basically all importers (except perhaps the line-based importer, which only creates one column anyway). It feels like there should be a better way than manually adding the option to all those importers with the corresponding tests (which is a lot of work). Ideally it would be treated as a post-processing step independent of the importer, but we do not have any such notion of a post-processing step yet. With that in mind, I would challenge the "good first issue" tag that I added myself, but that does not mean it is not doable either.
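As a rough illustration of the "post-processing step independent of the importer" idea, here is a minimal sketch. Everything in it is hypothetical: `PostProcessingSketch`, `Table`, `dropBlankColumns` and `applyAll` are made-up names for illustration and do not exist in OpenRefine, and a plain list-of-lists stands in for a real project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch: none of these types or methods exist in OpenRefine.
final class PostProcessingSketch {

    /** Minimal stand-in for an imported table: one header row plus data rows. */
    static final class Table {
        final List<String> header;
        final List<List<String>> rows;

        Table(List<String> header, List<List<String>> rows) {
            this.header = header;
            this.rows = rows;
        }
    }

    /** Example step: keep only columns that have at least one non-blank cell. */
    static Table dropBlankColumns(Table table) {
        List<Integer> keep = new ArrayList<>();
        for (int c = 0; c < table.header.size(); c++) {
            boolean hasData = false;
            for (List<String> row : table.rows) {
                String cell = c < row.size() ? row.get(c) : null;
                if (cell != null && !cell.trim().isEmpty()) {
                    hasData = true;
                    break;
                }
            }
            if (hasData) keep.add(c);
        }
        List<String> header = new ArrayList<>();
        for (int c : keep) header.add(table.header.get(c));
        List<List<String>> rows = new ArrayList<>();
        for (List<String> row : table.rows) {
            List<String> kept = new ArrayList<>();
            for (int c : keep) kept.add(c < row.size() ? row.get(c) : null);
            rows.add(kept);
        }
        return new Table(header, rows);
    }

    /** Runs each step in order on an importer's output, whichever importer produced it. */
    static Table applyAll(Table imported, List<UnaryOperator<Table>> steps) {
        Table result = imported;
        for (UnaryOperator<Table> step : steps) result = step.apply(result);
        return result;
    }
}
```

The point of the shape is that the step only sees the finished table, so it would not need to be re-implemented or re-tested per importer.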
Well, sorry, I'm not familiar enough with the structure and code at the moment, so I can't suggest more about a "post-processing step". It sounds like something that requires a lot of planning for the overall process.
That sounds good to me!
The changes are mainly divided into three parts:
1. Determine whether a column is blank (based on OpenRefine#3497): create a new List<Boolean> in the TabularImportingParserBase class that records whether each column has data.
2. If "store blank columns" is false, delete the blank columns (this only happens when store blank columns == false): a removeColumn method is created in the ColumnModel class. It uses columns.remove(all empty columns), which removes both the data lines and the header lines.
3. A "store blank columns" checkbox is added to the front-end page.

After testing with real files (csv, xls, tsv), this function works. I think it should work on other tabular files as well, since I made the change directly in the TabularImportingParserBase class, but I'm not really sure how to test TabularImportingParserBase directly. In addition, I've tested tree files (xml, json), and blank columns can now be handled there, so I didn't make changes for those.
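The first two parts described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the actual change: `BlankColumnSketch`, `addRow` and `finish` are hypothetical names, and plain lists stand in for TabularImportingParserBase and ColumnModel.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: these names are assumptions, not OpenRefine's classes or methods.
final class BlankColumnSketch {
    // hasData.get(i) becomes true once any row has a non-blank cell in column i.
    private final List<Boolean> hasData = new ArrayList<>();
    private final List<List<String>> rows = new ArrayList<>();

    /** Part 1: record one parsed row and update the per-column flags. */
    void addRow(List<String> cells) {
        while (hasData.size() < cells.size()) hasData.add(Boolean.FALSE);
        for (int i = 0; i < cells.size(); i++) {
            String cell = cells.get(i);
            if (cell != null && !cell.trim().isEmpty()) hasData.set(i, Boolean.TRUE);
        }
        rows.add(new ArrayList<>(cells));
    }

    /** Part 2: drop blank columns from header and data, unless storeBlankColumns is true. */
    List<String> finish(List<String> header, boolean storeBlankColumns) {
        List<String> keptHeader = new ArrayList<>(header);
        if (storeBlankColumns) return keptHeader;
        int width = Math.max(header.size(), hasData.size());
        // Walk right to left so a removal never shifts an index that is still to be visited.
        for (int i = width - 1; i >= 0; i--) {
            boolean blank = i >= hasData.size() || !hasData.get(i);
            if (!blank) continue;
            if (i < keptHeader.size()) keptHeader.remove(i);
            for (List<String> row : rows) {
                if (i < row.size()) row.remove(i);
            }
        }
        return keptHeader;
    }

    List<List<String>> rows() {
        return rows;
    }
}
```

Tracking the flags while rows are read means no second pass over the data is needed to decide which columns are blank; only the removal itself touches every row again.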
Is your feature request related to a problem or area of OpenRefine? Please describe.
Creating/importing a project allows for the removal of blank rows. It would be great to also be able to remove blank columns.
Describe the solution you'd like
Have functionality similar to "Store blank rows", with a "Store blank columns" option.
Additional context
This was discussed on the google group 9/19/2018.