Add new option for splitting multi-valued cells: by transition between lowercase and uppercase, and between number and text #2238
Comments
I changed the title of the issue ;-) It could be a new menu (like MS PowerQuery) or more options in the dialog box. I don't know what would be the best... |
Splitting by capitalisation is something a colleague asked me about. See https://stackoverflow.com/questions/58845900/how-to-split-a-string-based-on-capitalized-initials |
Can I take up this issue? |
@lisa761 Thanks for volunteering Lisa! Assigned to you.
Agree? |
@thadguidry yup, okay |
Hi, I was wondering whether it would be better to provide a separate option for every transition, i.e. one for lowercase to uppercase, another for uppercase to lowercase, one for letters to numbers, and one for numbers to letters. Or should we give just two options and split whenever there is a case transition in either direction? For example, if the value were 'fooBarABABFooBar', then splitting on both transitions, upper to lower and lower to upper, would be:
Only lower to uppercase would be:
Uppercase to lowercase would be:
So should we provide all the options or only the first one? |
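To make the options in this comment concrete, here is a minimal Python sketch. It is not OpenRefine code, and the helper names `split_by_transition`, `lower_to_upper`, and `upper_to_lower` are hypothetical. It assumes the split point falls exactly between the two adjacent characters that form the transition, which is only one possible reading of the proposal.

```python
def split_by_transition(value, is_boundary):
    """Split `value` between every adjacent pair (a, b) where is_boundary(a, b) is true."""
    parts, start = [], 0
    for i in range(1, len(value)):
        if is_boundary(value[i - 1], value[i]):
            parts.append(value[start:i])
            start = i
    parts.append(value[start:])
    return parts


def lower_to_upper(a, b):
    # Hypothetical predicate: lowercase character followed by an uppercase one.
    return a.islower() and b.isupper()


def upper_to_lower(a, b):
    # Hypothetical predicate: uppercase character followed by a lowercase one.
    return a.isupper() and b.islower()


value = "fooBarABABFooBar"
print(split_by_transition(value, lower_to_upper))  # ['foo', 'Bar', 'ABABFoo', 'Bar']
print(split_by_transition(value, upper_to_lower))  # ['fooB', 'arABABF', 'ooB', 'ar']
```

Under this assumption, the uppercase-to-lowercase split cuts after the capital letter ('fooB', 'ar'), which is relevant when judging how useful that direction is.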
Normal English capitalization rules tell us that only the 2 options I mentioned would be useful. As a general rule, we try not to over-engineer until we know how useful something is based on user feedback (mailing lists, surveys, issues). |
The PowerQuery example given by the original requester provides all four options, although they appear to be mutually exclusive (each one appears to be a separate drop-down menu pick). I'm not convinced that splitting on uppercase to lowercase transitions is very useful, but all four were originally requested. |
@tfmorris Splitting on an uppercase to lowercase transition can still be useful in scientific domains, where it's much more prevalent than in newsrooms. |
Is your feature request related to a problem or area of OpenRefine? Please describe.
I discovered some interesting options in MS PowerQuery. It would be nice to have the same in OpenRefine: split by transition between lowercase and uppercase, and split by transition between number and text.
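As an illustration of the number/text case, here is a small Python sketch. It is not OpenRefine or PowerQuery code, and the function names are hypothetical. It assumes that "text" means alphabetic characters and that the split happens exactly between the two characters forming the transition.

```python
def split_by_transition(value, is_boundary):
    """Split `value` between every adjacent pair (a, b) where is_boundary(a, b) is true."""
    parts, start = [], 0
    for i in range(1, len(value)):
        if is_boundary(value[i - 1], value[i]):
            parts.append(value[start:i])
            start = i
    parts.append(value[start:])
    return parts


def letter_to_digit(a, b):
    # Hypothetical predicate: alphabetic character followed by a digit.
    return a.isalpha() and b.isdigit()


def digit_to_letter(a, b):
    # Hypothetical predicate: digit followed by an alphabetic character.
    return a.isdigit() and b.isalpha()


value = "AB12cd34"
print(split_by_transition(value, letter_to_digit))  # ['AB', '12cd', '34']
print(split_by_transition(value, digit_to_letter))  # ['AB12', 'cd34']
```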
Here is the menu (in French) in PowerQuery:
Describe the solution you'd like
Describe alternatives you've considered
Additional context