Remove Blank Columns upon project creation/import #1750

jenyoung · 2018-09-26T19:36:34Z

Is your feature request related to a problem or area of OpenRefine? Please describe.

Creating/importing a project allows for the removal of blank rows. It would be great to also be able to remove blank columns.

Describe the solution you'd like
Add a Store blank columns option, similar to the existing Store blank rows option.

Additional context
This was discussed on the Google group on 9/19/2018.


thadguidry · 2018-09-26T21:05:13Z

Dissimilar to #1472 (Remove Rows with empty cells).

thadguidry · 2019-02-20T14:50:11Z

@wetneb @ostephens could we squeeze this into 3.2 milestone?

wetneb · 2019-02-20T15:42:48Z

@thadguidry I won't have time to work on this soon, but if someone wants to work to implement this before 3.2 I'm happy to get that on board, yes.

tommowlam · 2019-10-01T16:33:00Z

This will be particularly useful when importing XML into OpenRefine, where tags that are used for structure but have no direct content of their own create lots of empty columns that then have to be removed 'manually', e.g.:

    <poster>    
      <name>  
        <firstname>Tom</firstname>  
        <lastname>Mowlam</lastname>  
      </name>  
    </poster>

...gives empty 'poster' and 'name' columns on import.
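
To illustrate why those two tags come out empty, here is a minimal Python sketch. It is not OpenRefine code, and the helper name `tags_without_direct_text` is hypothetical. It lists the elements in the snippet above that carry no text of their own:

```python
import xml.etree.ElementTree as ET

# The XML fragment from the comment above.
SNIPPET = """
<poster>
  <name>
    <firstname>Tom</firstname>
    <lastname>Mowlam</lastname>
  </name>
</poster>
"""


def tags_without_direct_text(xml_text):
    """Return the tags of elements whose own text is missing or whitespace only."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the tree in document order, starting with the root.
    return [el.tag for el in root.iter() if not (el.text or "").strip()]


print(tags_without_direct_text(SNIPPET))
```

Only the purely structural elements are reported, which matches the columns that end up blank after import.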

aquaruiz · 2020-03-21T20:07:16Z

May I work on this bug?

wetneb · 2020-03-21T21:14:29Z

Sure, thanks a lot!

aquaruiz · 2020-03-22T18:26:45Z

Hey, I made some fixes but I don't get how to rebuild to check whether they work...

    ./refine build
    ./refine

Is this all?

thadguidry · 2020-03-22T19:13:11Z

@aquaruiz Yes, that's how you can do it manually (if you are not using a Maven config and running directly from your IDE).

cbedard · 2021-01-08T20:36:14Z

It doesn't look like this has been worked on. Can I work on it?

thadguidry · 2021-01-08T22:03:08Z

Yes, I think so @CameronBedard , I don't see @aquaruiz ever made a PR for us to look at and review for merge.

wsmmxmm · 2022-03-29T12:24:29Z

Hi team! I am an Outreachy applicant. May I work on this issue if it is still open?

wsmmxmm · 2022-03-29T16:04:53Z

Hi Team,

Congratulations to me, because I ran OpenRefine from the command line successfully!

Before I start, I would like to describe this feature and ask some questions I'm not sure about.

Functionality: Store blank columns (similar to Store blank rows)

Supported importers:

- Tabular importer: this part needs to be added.
- Tree-based importer: I'm not sure whether I need to do this, because @wetneb said "the option would be less useful for the tree-based importers" in Added 'Store blank columns' importing feature #3497. Maybe later?

Testing: I'm not sure how to do that by "be testing TabularImportingParserBase directly". I plan to create a few different types of documents to test with first when I'm done.

wetneb · 2022-03-29T19:49:44Z

Congratulations on running OpenRefine! That is an excellent start :)
For tree-based importers, I guess they can still produce some empty columns, for instance if importing the following payload:

    [
      {
        "foo": "1",
        "bar": null
      },
      {
        "foo": "2",
        "bar": null
      }
    ]

So I guess it is probably worth offering in tree importers as well, after all.

So this would end up being an option that would be available to basically all importers (except perhaps the line-based importer which only creates one column anyway). It feels like there should be a better way than manually adding the option to all those importers with the corresponding tests (which is a lot of work). I guess it should ideally be treated as a post-processing step independent of the importer, but we have not got any such notion of post-processing step.
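
As a rough illustration of what such an importer-independent post-processing step could look like, here is a minimal Python sketch. It is not OpenRefine code: the function names `is_blank` and `drop_blank_columns` and the list-of-lists table layout are assumptions made for the example. It takes an already imported table and drops every column whose cells are all blank, using the JSON payload above as input:

```python
def is_blank(value):
    """Treat None and empty or whitespace-only strings as blank."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def drop_blank_columns(column_names, rows):
    """Remove columns in which every cell is blank.

    `rows` is a list of lists aligned with `column_names`; rows shorter than
    the header are treated as having blank cells in the missing positions.
    With no rows at all, every column counts as blank and is dropped.
    """
    keep = [
        i
        for i in range(len(column_names))
        if any(i < len(row) and not is_blank(row[i]) for row in rows)
    ]
    new_names = [column_names[i] for i in keep]
    new_rows = [[row[i] if i < len(row) else None for i in keep] for row in rows]
    return new_names, new_rows


# The payload above, flattened into a header and two rows.
names, rows = drop_blank_columns(["foo", "bar"], [["1", None], ["2", None]])
print(names, rows)
```

Because the step only looks at the finished table, it would not need to know which importer produced it.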

With that in mind I would challenge the "good first issue" tag that I added myself - but that does not mean it is not doable either.

wsmmxmm · 2022-03-30T03:30:48Z

Well, sorry, I'm not familiar enough with the structure and code at the moment, so I can't suggest more about a "post-processing step". It sounds like something that requires a lot of planning for the overall process.
I think I could continue by manually adding the option, marked with notes (I mean "//" comments). That way we can easily find these sections if we need to take further action on a "post-processing step". Through this process I could also become more familiar with the structure of each importer. Any other advice?

wetneb · 2022-03-30T06:39:55Z

That sounds good to me!

The changes are mainly divided into three parts:

1. Determine whether a column is blank (based on OpenRefine#3497): create a new list<Boolean> in the TabularImportingParserBase class that represents whether each column has data.
2. If store blank columns is false, delete the blank columns (this only happens when store blank columns == false): a removeColumn method is added to the ColumnModel class. It uses columns.remove(all empty columns), which removes both data lines and header lines.
3. Add the store blank columns checkbox to the front-end page.

After testing with real files (csv, xls, tsv), this function works. I think it should work on other tabular files as well, since I made the change directly in the TabularImportingParserBase class. But I'm not really sure how to test TabularImportingParserBase directly. In addition, I've tested tree files (xml, json), and it is now possible to handle blank columns there, so I didn't make changes.
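
The first two parts can be sketched roughly as follows. This is an illustrative Python sketch, not the Java code from the pull request: `BlankColumnTracker`, `import_table` and the `store_blank_columns` parameter are hypothetical names that only mirror the described idea of keeping one boolean per column while rows are read and removing the flagged columns afterwards.

```python
class BlankColumnTracker:
    """Keeps one flag per column: True once a non-blank cell has been seen."""

    def __init__(self, width):
        self.has_data = [False] * width

    def observe_row(self, cells):
        # Grow the flag list if a row is wider than anything seen so far.
        while len(self.has_data) < len(cells):
            self.has_data.append(False)
        for i, cell in enumerate(cells):
            if cell is not None and str(cell).strip() != "":
                self.has_data[i] = True

    def blank_indices(self):
        return [i for i, seen in enumerate(self.has_data) if not seen]


def import_table(header, raw_rows, store_blank_columns=True):
    """Read rows while tracking blank columns, then optionally remove them."""
    tracker = BlankColumnTracker(len(header))
    rows = []
    for cells in raw_rows:
        tracker.observe_row(cells)
        rows.append(list(cells))
    if store_blank_columns:
        return list(header), rows
    drop = set(tracker.blank_indices())
    # Removing a column removes its header entry and its cell in every row.
    new_header = [h for i, h in enumerate(header) if i not in drop]
    new_rows = [[c for i, c in enumerate(r) if i not in drop] for r in rows]
    return new_header, new_rows


header, rows = import_table(
    ["a", "b", "c"],
    [["1", "", "x"], ["2", None, ""]],
    store_blank_columns=False,
)
print(header, rows)
```

Column "c" is kept because one of its cells has data, while "b" is removed because all of its cells are blank.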


thadguidry added the Type: Feature Request label Sep 26, 2018

ostephens mentioned this issue Feb 20, 2019

CSV Import - Store Blank Columns Checkbox #1965

Closed

thadguidry added the Priority: Medium label Feb 20, 2019

wetneb added the Good First Issue label Feb 20, 2019

wetneb assigned aquaruiz Mar 21, 2020

antoine2711 mentioned this issue Mar 31, 2020

Add a conditional to "remove column" operation #2481

Open

thadguidry assigned cbedard and unassigned aquaruiz Jan 8, 2021

cbedard mentioned this issue Jan 18, 2021

Added 'Store blank columns' importing feature #3497

Closed

wetneb unassigned cbedard Feb 11, 2022

wetneb assigned wsmmxmm Mar 29, 2022

wetneb removed the Good First Issue label Mar 29, 2022

wsmmxmm linked a pull request Apr 14, 2022 that will close this issue

(I#1750) Option to remove Blank Columns upon project creation #4757

Open
