Conversation
|
There are many places in the InputSystem source code where either Ideally we would do an audit of all occurances of ToLower or ToUpper on strings and chars in the source, and establish whether they should be culture-aware. I would guess that nearly all of them should not be. There are some tips for string comparisons here: https://learn.microsoft.com/en-us/dotnet/standard/base-types/best-practices-strings |
|
There are definitely a lot of places and likely just this fix is not sufficient, but I'm struggling to find a good strategy how to fix all of them, maybe just by using IDE "find all" tool? |
|
Yes I think simply a search in Visual Studio etc. for "ToUpper" and "ToLower". The number of occurrences is high enough that it's more than just a 5 minute job but not so large as to be infeasible. |
| // thus be shorter than matchTo and still match. | ||
|
|
||
| var matchToLowerCase = matchTo.ToLower(); | ||
| var matchToStringUpper = matchTo.ToString().ToUpperInvariant(); |
There was a problem hiding this comment.
The ToString() will cost a GC allocation as it needs to create a new temporary string.
I looked at matchTo and why you couldn't just to ToUpperInvariant and I found it's because it's an InternedString. The docs say this is a special string class designed exactly for what you are doing here, case-insensitive comparisons.
However that class looks like it needs to be updated too as it might also be broken for turkish letters. Maybe it makes sense for it to store internally in uppercase instead and then you wouldn't need ToString() here either. I'm not sure how much InternedString is used, so it might be quite a bit of work.
There was a problem hiding this comment.
this is a lot of work. changing InternedString is a big thing because it entails changes at many spots
lyndon-unity
left a comment
There was a problem hiding this comment.
Added suggestions on comparison without moving to uppercase or altering InternedString
| return true; // Wildcard at end of string so rest is matched. | ||
|
|
||
| ++posInStr; | ||
| nextChar = char.ToLower(str[posInStr]); |
There was a problem hiding this comment.
Could a fix be to convert to pass CultureInfo.InvariantCulture
E.g.
nextChar = char.ToLower(str[posInStr], CultureInfo.InvariantCulture);
or (if version allowed) :
nextChar = char.ToLowerInvariant(str[posInStr]);
and a few lines down
There was a problem hiding this comment.
ToUpperInvariant is recommended here: https://learn.microsoft.com/en-us/dotnet/standard/base-types/best-practices-strings
I believe going in the other direction (lower) creates issues.
There was a problem hiding this comment.
Thanks James. I missed that comment in the Microsoft documents. I mainly suggested this as it seemed we were missing the invariantCulture altogether but agree we may need the upper case for a more complete fix.
There was a problem hiding this comment.
ToUpperInvariant is using Invariant Culture rules so that should be enough without needing to pass CultureInfo.
There was a problem hiding this comment.
@lyndon-unity @jamesmcgill I just updated whole branch to only use culture invariant char comparisons, this fixes the bug without changing to ToUpperInvariant(). I would suggest closing the ticket (as the bug is fixed) at this state and creating a new refactoring ticket for clean up the string ToLowerInvariant -> ToUpperInvariant situation. Maybe it even makes sense to do it in a bigger scale where we look at all our string storing and comparison situation
There was a problem hiding this comment.
I agree this is a good step and fixes the current but. We have captured the future step in a task here:
https://jira.unity3d.com/browse/ISX-1380
| if (posInMatchTo == matchToLength) | ||
| return false; // Matched all the way to end of matchTo but there's more in str after the wildcard. | ||
| } | ||
| else if (char.ToLower(nextChar) != matchToLowerCase[posInMatchTo]) |
| } | ||
|
|
||
| var charInComponent = component[indexInComponent]; | ||
| if (charInComponent == nextCharInPath || char.ToLower(charInComponent) == char.ToLower(nextCharInPath)) |
| var first = firstList[startIndexInFirst + i]; | ||
| var second = secondList[startIndexInSecond + i]; | ||
|
|
||
| if (char.ToLower(first) != char.ToLower(second)) |
There was a problem hiding this comment.
and one more as above
| // thus be shorter than matchTo and still match. | ||
|
|
||
| var matchToLowerCase = matchTo.ToLower(); | ||
| var matchToStringUpper = matchTo.ToString().ToUpperInvariant(); |
There was a problem hiding this comment.
I'm wondering why InternedString returns str.toString rather than str.m_StringOriginalCase
The later could avoid the GC allocation ?
public static implicit operator string(InternedString str)
There was a problem hiding this comment.
But agree switching to uppercase internals in InternedString seems the right thing to do
3027763 to
1f76800
Compare
lyndon-unity
left a comment
There was a problem hiding this comment.
I'm happy with this as a fix for the immediate issue and then raising a separate task (not bug) on the backlog to refactor wider to use the upper case version which is a better long term fix but a bigger change which we don't have time for this sprint.
However there is a test failure outstanding
IntegrationTests.Integration_CanSendAndReceiveEvents
lyndon-unity
left a comment
There was a problem hiding this comment.
Overall I'm happy to proceed with this fix but ideally would like to see a new test case added to the input system test suite to make sure we do not regress with future changes.
After writing the test for this issue I came across the "real problem":
So only by using the big 'I' the wrong conversion happens. A normal 'i' exists as well in the turkish language. The two big 'I's' do not differ from each other, that's where the problem arises. |
James is off sick and we agreed to land this PR to follow up on the discussed points in another ticket later
Description
Peter Pimley observes this bug, he suggested someone else of the team should look over it because there could be more similar cases where bugs like this appear. So it would be great if the reviewer could validate that this solution is appropriate
for all cases that may come up.
Changes made
FIXED: Changed char comparison to use ToLower(CultureInfo) or ToLowerInvariant() to avoid problems with special characters like the turkish letter i.
ADDED: Test for turkish culture
Link to the Bug
Link to the user Ticket
Notes
Due to the unfortunate case that James is sick the PR gets merged without his approval.
Checklist
Before review:
Changed,Fixed,Addedsections.([case %number%](https://issuetracker.unity3d.com/issues/...)).Area_CanDoX,Area_CanDoX_EvenIfYIsTheCase,Area_WhenIDoX_AndYHappens_ThisIsTheResult.During merge:
NEW: ___.FIX: ___.DOCS: ___.CHANGE: ___.RELEASE: 1.1.0-preview.3.