Remove Blank Columns upon project creation/import #1750

Open
jenyoung opened this issue Sep 26, 2018 · 15 comments · May be fixed by #4757
Labels
Priority: Medium
Type: Feature Request

Comments

@jenyoung

Is your feature request related to a problem or area of OpenRefine? Please describe.

Creating/importing a project allows for the removal of blank rows. It would be great to also be able to remove blank columns.

Describe the solution you'd like
Add a "Store blank columns" option that works like the existing "Store blank rows" option.

Additional context
This was discussed on the Google Group on 9/19/2018.

@thadguidry added the Type: Feature Request label Sep 26, 2018
@thadguidry
Member

This is not the same as #1472, the "Remove Rows with empty cells" request.

@thadguidry added the Priority: Medium label Feb 20, 2019
@thadguidry
Member

@wetneb @ostephens could we squeeze this into the 3.2 milestone?

@wetneb added the Good First Issue label Feb 20, 2019
@wetneb
Sponsor Member

wetneb commented Feb 20, 2019

@thadguidry I won't have time to work on this soon, but if someone wants to implement this before 3.2, I'm happy to get it on board, yes.

@tommowlam

tommowlam commented Oct 1, 2019

This will be particularly useful when importing XML into OpenRefine, where tags that are used for structure but have no direct content of their own create lots of empty columns that then have to be removed 'manually', e.g.:

    <poster>    
      <name>  
        <firstname>Tom</firstname>  
        <lastname>Mowlam</lastname>  
      </name>  
    </poster>

...gives empty 'poster' and 'name' columns on import.
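
To illustrate why those two columns come out empty, here is a minimal Python sketch. It is not OpenRefine's XML importer; it only uses the standard `xml.etree.ElementTree` module to show which tags in the snippet above carry text of their own. The helper name `direct_text_by_tag` is made up for this example.

```python
import xml.etree.ElementTree as ET

XML = """
<poster>
  <name>
    <firstname>Tom</firstname>
    <lastname>Mowlam</lastname>
  </name>
</poster>
"""


def direct_text_by_tag(xml_text):
    """Map each tag to its own direct text, or None if it only wraps children.

    Hypothetical helper for illustration; assumes each tag occurs once.
    """
    root = ET.fromstring(xml_text.strip())
    result = {}
    for element in root.iter():  # document order, root first
        text = (element.text or "").strip()
        result[element.tag] = text or None
    return result


content = direct_text_by_tag(XML)
# Tags with no direct text are the ones that would yield blank columns.
empty_tags = [tag for tag, text in content.items() if text is None]
print(empty_tags)
```

Only `firstname` and `lastname` hold values; the structural tags contain nothing but whitespace and child elements.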

@aquaruiz

May I work on this bug?

@wetneb
Sponsor Member

wetneb commented Mar 21, 2020

Sure, thanks a lot!

@aquaruiz

Hey, I made some changes, but I don't understand how to rebuild to check whether they work...
./refine build
./refine
Is this all?

@thadguidry
Member

@aquaruiz Yes, that's how you can do it manually (if you are not using a Maven config and running directly from your IDE).

@cbedard
Contributor

cbedard commented Jan 8, 2021

It doesn't look like this has been worked on. Can I work on it?

@thadguidry
Member

Yes, I think so, @CameronBedard. I don't see that @aquaruiz ever made a PR for us to review and merge.

@wsmmxmm

wsmmxmm commented Mar 29, 2022

Hi team! I am an Outreachy applicant. May I work on this issue if it is still open?

@wsmmxmm

wsmmxmm commented Mar 29, 2022

Hi Team,

Congratulations to me, because I ran OpenRefine from the command line successfully!

Before I start, I would like to describe this feature and ask some questions I'm not sure about.

  • Functionality: a "Store blank columns" option, similar to "Store blank rows".
  • Supported importers:
      • Tabular importers: this part needs to be added.
      • Tree-based importers: I'm not sure whether I need to do this, because @wetneb said in "Added 'Store blank columns' importing feature" #3497 that "the option would be less useful for the tree-based importers". Maybe later?
  • Testing: I'm not sure how to "be testing TabularImportingParserBase directly". I plan to create a few different types of documents to test with first when I'm done.

@wetneb
Sponsor Member

wetneb commented Mar 29, 2022

Congratulations on running OpenRefine! That is an excellent start :)
For tree-based importers, I guess they can still produce some empty columns, for instance if importing the following payload:

[
    {
       "foo": "1",
       "bar": null
    },
    {
        "foo": "2",
        "bar": null
    }
]

So I guess it is probably worth offering in tree importers as well, after all.

So this would end up being an option available to basically all importers (except perhaps the line-based importer, which only creates one column anyway). It feels like there should be a better way than manually adding the option to all those importers, with the corresponding tests (which is a lot of work). I guess it should ideally be treated as a post-processing step independent of the importer, but we do not have any such notion of a post-processing step.
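
As a rough illustration of what such an importer-independent step could look like, here is a small Python sketch. It is not OpenRefine code and does not use any OpenRefine API: the function names and the rule for what counts as "blank" (null or whitespace-only) are assumptions made for the example. The input mirrors the JSON payload above after it has been turned into rows.

```python
def is_blank(value):
    """Assumption: None and whitespace-only strings count as blank."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def drop_blank_columns(column_names, rows):
    """Return (names, rows) without the columns that are blank in every row.

    Hypothetical post-processing step: it only sees the finished grid,
    so it does not care which importer produced it.
    """
    keep = [
        index
        for index in range(len(column_names))
        if any(index < len(row) and not is_blank(row[index]) for row in rows)
    ]
    kept_names = [column_names[i] for i in keep]
    kept_rows = [[row[i] if i < len(row) else None for i in keep] for row in rows]
    return kept_names, kept_rows


# The payload above as a grid: "bar" is null in every record.
names = ["foo", "bar"]
rows = [["1", None], ["2", None]]
result = drop_blank_columns(names, rows)
print(result)
```

One consequence of this particular rule is that a grid with no rows loses all of its columns, so a real implementation would probably need to decide how to treat that case.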

With that in mind I would challenge the "good first issue" tag that I added myself - but that does not mean it is not doable either.

@wetneb removed the Good First Issue label Mar 29, 2022
@wsmmxmm

wsmmxmm commented Mar 30, 2022

Sorry, I'm not familiar enough with the structure and code at the moment, so I can't suggest much about a "post-processing step". It sounds like something that requires a lot of planning for the overall process.
I think I could continue by manually adding the option, marked with comments (I mean "//"). That way we can easily find these sections if we need to take further action on a "post-processing step" later. Through this process I would also become more familiar with the structure of each importer. Any other advice?

@wetneb
Sponsor Member

wetneb commented Mar 30, 2022

That sounds good to me!

wsmmxmm added a commit to wsmmxmm/OpenRefine that referenced this issue Apr 14, 2022
The changes are mainly divided into three parts:
1. Determine whether a column is blank (based on OpenRefine#3497): create a new List<Boolean> in the TabularImportingParserBase class that records whether each column has data.
2. If "store blank columns" is false, delete the blank columns (this only happens when the option is false): a removeColumn method is added to the ColumnModel class. It uses columns.remove(all empty columns), which removes both data lines and header lines.
3. Add the "store blank columns" checkbox to the front-end page.
After testing with real files (csv, xls, tsv), this function works. I think it should work on other tabular files as well, since I made the change directly in the TabularImportingParserBase class. But I'm not really sure how to test TabularImportingParserBase directly.
In addition, I've tested tree files (xml, json), and handling blank columns is already possible for them, so I didn't make changes there.
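
The approach in the commit message (a per-column flag list filled in while rows are read, then a removal pass when the option is off) can be sketched roughly as follows. This is a Python illustration only, not the code from the commit: `ColumnDataTracker`, `finish_import`, and the `store_blank_columns` parameter are hypothetical names, and treating null or whitespace-only cells as "no data" is an assumption.

```python
class ColumnDataTracker:
    """Hypothetical helper: records, per column index, whether any cell had data."""

    def __init__(self):
        self.has_data = []

    def observe(self, row):
        # Grow the flag list when a row is wider than anything seen so far.
        if len(row) > len(self.has_data):
            self.has_data.extend([False] * (len(row) - len(self.has_data)))
        for index, cell in enumerate(row):
            # Assumption: None and whitespace-only cells carry no data.
            if cell is not None and str(cell).strip() != "":
                self.has_data[index] = True

    def column_has_data(self, index):
        return index < len(self.has_data) and self.has_data[index]


def finish_import(header, rows, tracker, store_blank_columns):
    """Drop flagged-empty columns from header and rows unless they should be stored."""
    if store_blank_columns:
        return header, rows
    keep = [i for i in range(len(header)) if tracker.column_has_data(i)]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] if i < len(row) else None for i in keep] for row in rows]
    return new_header, new_rows


header = ["id", "notes", "name"]
rows = [["1", "", "Tom"], ["2", None, "Ann"]]
tracker = ColumnDataTracker()
for row in rows:
    tracker.observe(row)  # flags are updated as each row is read

removed = finish_import(header, rows, tracker, store_blank_columns=False)
stored = finish_import(header, rows, tracker, store_blank_columns=True)
print(removed)
print(stored)
```

Because the flags are updated row by row, blank columns are known by the end of the read without a second scan of the data, and the header and data cells are removed together so they stay aligned.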
wsmmxmm added a commit to wsmmxmm/OpenRefine that referenced this issue Apr 14, 2022
7 participants