add new option for splitting multi-valued cells: by transition between lowercase and uppercase, and number and text #2238

Closed
msaby opened this issue Dec 5, 2019 · 10 comments · Fixed by #2471
Labels: Module: Frontend, Theme: UX/Usability, Type: Feature Request

@msaby
Member

msaby commented Dec 5, 2019

Is your feature request related to a problem or area of OpenRefine? Please describe.

I discovered some interesting options in MS PowerQuery. It would be nice to have the same in OpenRefine: split by transition between lowercase and uppercase, and split by transition between number and text.

Here is the menu (in French) in PowerQuery:

image

Describe the solution you'd like

Describe alternatives you've considered

Additional context

@thadguidry
Member

You mentioned "for splitting columns" in the title...
But I'm wondering if you meant splitting multi-valued cells instead?
If so, you would like to see this box made a bit bigger and have some additional "radio button" options?

Annotation 2019-12-05 084009

@msaby changed the title from "add new option for splitting columns: by transition between lowercase and uppercase, and number and text" to "add new option for splitting multi-valued cells: by transition between lowercase and uppercase, and number and text" on Dec 5, 2019
@msaby
Member Author

msaby commented Dec 5, 2019

I changed the title of the issue ;-)

It could be a new menu (like MS PowerQuery) or more options in the dialog box. I don't know what would be the best...

@msaby
Member Author

msaby commented Dec 5, 2019

Splitting by capitalisation was something a colleague asked me about. See https://stackoverflow.com/questions/58845900/how-to-split-a-string-based-on-capitalized-initials

@thadguidry added the Module: Frontend, Type: Feature Request, and Theme: UX/Usability labels on Dec 5, 2019
@lisa761
Member

lisa761 commented Mar 20, 2020

Can I take up this issue?

@thadguidry
Member

@lisa761 Thanks for volunteering Lisa! Assigned to you.
I see this new feature as expanding the existing Split multi-valued cells dialog with 2 additional radio buttons:

  • Split between a Lowercase and Uppercase letter
  • Split between a Number and Letter (and vice versa)

Agree?
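
To make the second option concrete, here is a minimal Python sketch of splitting between a number and a letter in either direction. It is only an illustration under stated assumptions, not OpenRefine code: the helper name `split_number_letter` is hypothetical, "letter" and "number" are narrowed to ASCII, and it relies on `re.split` splitting on empty matches, which needs Python 3.7 or later.

```python
import re

# Zero-width boundary between an ASCII digit and an ASCII letter, in either order.
# Assumption: ASCII-only classes; real data may need Unicode-aware classes.
DIGIT_LETTER_BOUNDARY = re.compile(
    r"(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])"
)


def split_number_letter(value):
    """Split value wherever a digit meets a letter (hypothetical helper)."""
    # Splitting on an empty match requires Python 3.7+.
    return DIGIT_LETTER_BOUNDARY.split(value)


print(split_number_letter("abc123def"))  # ['abc', '123', 'def']
print(split_number_letter("12kg"))       # ['12', 'kg']
```

A value with no such boundary comes back as a single-element list, which is probably what a menu option would want for cells that need no splitting.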

@lisa761
Member

lisa761 commented Mar 20, 2020

@thadguidry yup, okay

@lisa761
Member

lisa761 commented Mar 23, 2020

Hi, so I was thinking, would it be better if we provide an option for transition between all, i.e. one from lowercase to uppercase and another for uppercase to lowercase, one from letters to numbers and one from numbers to letters? Or should we just give two options, and split whenever there is a case transition in either direction (uppercase to lowercase or lowercase to uppercase)?

For example, if the value were 'fooBarABABFooBar':

Then splitting on both transitions (upper to lower and lower to upper case) would give:

foo B ar ABABF oo B ar

Only lower to uppercase would be:

foo Bar ABABFoo Bar

Uppercase to lowercase would be:

fooB arABABF ooB ar

So should we provide all the options or only the first one?
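
The three variants above can be reproduced with a short Python sketch. This is an illustration only, not how OpenRefine implements it: the helper `split_on_transitions` and the constants `LOWER_TO_UPPER` and `UPPER_TO_LOWER` are hypothetical names, and each transition is modelled as a pair of character predicates.

```python
def split_on_transitions(value, transitions):
    """Split value between two adjacent characters whenever any
    (before, after) predicate pair in transitions matches them.

    Hypothetical helper for illustration only.
    """
    parts, start = [], 0
    for i in range(1, len(value)):
        prev, cur = value[i - 1], value[i]
        if any(before(prev) and after(cur) for before, after in transitions):
            parts.append(value[start:i])
            start = i
    parts.append(value[start:])  # trailing piece (the whole value if no split)
    return parts


# Each transition is a (predicate for left char, predicate for right char) pair.
LOWER_TO_UPPER = (str.islower, str.isupper)
UPPER_TO_LOWER = (str.isupper, str.islower)

value = "fooBarABABFooBar"
print(split_on_transitions(value, [LOWER_TO_UPPER, UPPER_TO_LOWER]))
# ['foo', 'B', 'ar', 'ABABF', 'oo', 'B', 'ar']
print(split_on_transitions(value, [LOWER_TO_UPPER]))
# ['foo', 'Bar', 'ABABFoo', 'Bar']
print(split_on_transitions(value, [UPPER_TO_LOWER]))
# ['fooB', 'arABABF', 'ooB', 'ar']
```

Passing the transitions as a list means the "both directions" case and the single-direction cases share one code path, so offering two options or four is mostly a UI decision.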

@thadguidry
Member

Normal English capitalization rules tell us that only the 2 options I mentioned would be useful.
This issue is just to have some quick menu options to support those 2 common use cases. If users have other use cases, they can fall back on the power of GREL and the other languages we support. If users REALLY want additional options in the menus, they can open an enhancement issue and then upvote it.

As a general rule, we try not to over-engineer until we know the usefulness based on user feedback (mailing lists, surveys, issues).

@tfmorris
Member

> so I was thinking would it be better if we provide an option for transition between all, i.e. one from lowercase to uppercase and another for uppercase to lowercase, one from letters to numbers and one from numbers to letters?

The PowerQuery example given by the original requestor provides all four options, although it appears that they are mutually exclusive (they each appear to be a separate drop down menu pick).

I'm not convinced that splitting on uppercase to lowercase transitions is very useful, but all four were originally requested.

@thadguidry
Member

thadguidry commented Mar 31, 2020

@tfmorris Splitting from uppercase to lowercase can still be useful in scientific domains, where it's much more prevalent than in newsrooms.
Small example:
kPa = [k, Pa] = [kilo, pascal] = [1000 pascals]
and crap like that. (inverse example, but you get the idea) ;-)

@tfmorris tfmorris added this to the 3.5 milestone Jul 17, 2020
@wetneb wetneb mentioned this issue Apr 24, 2021
16 tasks