Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non printable characters #2350

Closed

Conversation

AghaSaad04
Copy link
Member

@AghaSaad04 AghaSaad04 commented Mar 1, 2020

Signed-off-by: Agha Saad Fraz agha.saad04@gmail.com
I have added a feature of displaying non-printable characters. It is a fix for #1286.

Steps:

  • Import the messy data
  • Create the project
  • Toggling the non-printable character checkbox would show/hide the non-printable characters

Demo:

Data grid without non-printable characters:

image

Data grid with non-printable characters:

image

@thadguidry
Copy link
Member

@AghaSaad04 Can you change that first DEMO image to without using the Grammerly plugin? (Just disable it then screenshot again)
I'm pulling this in now and taking a look and testing.

@AghaSaad04
Copy link
Member Author

Done @thadguidry

@thadguidry
Copy link
Member

thadguidry commented Mar 1, 2020

Ah, we also really needed this in the Project preview (prior to Create Project)
How difficult would that be to add the option there also?
The reason is because older data can use those Control Chars as separators (and often did!), and we'd like users to see them and could type in the Control Char(s) as a custom separator in the CSV/TSV/separator-based Importer.

image

@thadguidry
Copy link
Member

thadguidry commented Mar 1, 2020

This is a good start, but it's not fully mapped yet? I typed in Control Chars using Windows Key + ALT + numpad 0001 - 0031 on various lines inside Notepad++ on Windows 10. (You can't do 0000, that's True Null so don't worry about it and where NULL injection security issues comes from by the way :-)

SOURCE:
Test File: ControlCharsTest.txt

image

RESULT:
image

@@ -356,3 +356,16 @@ ul.sorting-dialog-blank-error-positions > li {
cursor: move;
.rounded_corners(3px);
}

.interfaceForTextArea{
font: 400 13.3333px Arial;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, wondering if we should actually prefer first some Unicode fonts (Arial Unicode,etc.) in preference order, and then Arial ? Do popular Linux distros have them? What about Mac?

@AghaSaad04
Copy link
Member Author

@thadguidry Firstly thanks for reviewing, and yeah it is not fully mapped yet. That was a sort of prototype, I just wanted to make sure that the feature I was implementing was what you guys wanted. I will look into the Project preview part. Hopefully, that would not be tough. I will fix that typo. :)

@thadguidry
Copy link
Member

@AghaSaad04 Great! Understood this is a prototype to see what was possible. Now that we know a bit more, we can begin to make improvements. To reduce repetition, I'd improve the code through the use of .map, arrow functions and perhaps for..of. Experiment.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/map
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Functions

@AghaSaad04
Copy link
Member Author

Yeah, @thadguidry. Working on it to make it efficient and optimize.

@thadguidry
Copy link
Member

@AghaSaad04 I added my simple test file above the image in previous comments, if you want to use it or adapt it for test cases, up to you.

@AghaSaad04
Copy link
Member Author

Perfect Thanks @thadguidry.
I was thinking of creating an array for control characters and then finding the character code in that array.
A better-optimized version.
image

Imported the test file:
image

Result:
image

@thadguidry
Copy link
Member

thadguidry commented Mar 1, 2020

@AghaSaad04 Would a dictionary make things easier? ["SOH":"0001","STX":"0002"...]

UPDATE: nevermind, the mapping is implied through the array index of 0-31. You and I know that but others will have to figure this out without a bit of code commenting, so... :)

@thadguidry
Copy link
Member

Also, I'd replace char(9) HT with TAB, since this is a quite common convention worldwide, and char(11) is vertical tab and lesser known, so its fine to remain as VT.

@thadguidry
Copy link
Member

thadguidry commented Mar 1, 2020

One last thing, that I think would be useful to users....
SPACE, where @ostephens and I discussed this on the issue and both thought it probably would be useful to show subtly. Perhaps as Notepad++ and other text editing tools do by drawing white space with a single middle dot sometimes or other glyph.
SPACE is not in the Control Char range, but instead the start of the ASCII standard, it's printable, char(32), but hidden from our eyes often (2 different things, right :-) )

image

Sublime and other IDE's have the ability to toggle rendering white space in preferences.
I think we should have a 2nd toggle to do just that, but can be done in a separate issue if need be.

Thoughts? Do you think the existing div#tool-panel has enough room? Looks like it does for something simple like [ ] draw white space

@thadguidry
Copy link
Member

Lastly, the label "non-printable characters" is not accurate, since many Unicode chars also do not print with a glyph such as U+A0 and U+200B and U+FEFF , etc.
I think we need to be more direct and say "show control codes".

@wetneb
Copy link
Sponsor Member

wetneb commented Mar 1, 2020

I would not add a "non printable characters" textbox at this place in the UI, I think it is way too visible. This is not something that people will need to toggle often - better keep that precious space for other things (such as extensions).

I think the option should be enabled by default and it should be possible to disable it in some menu, for instance in the preferences.

@thadguidry
Copy link
Member

thadguidry commented Mar 1, 2020

@wetneb This was discussed before on the issue... Owen and Ettore also thought having a quick toggle was useful. But I understand moving it outside of div#tool-panel probably would be best. Perhaps here to div#viewpanel-header? and label it "show control codes".
image

@wetneb
Copy link
Sponsor Member

wetneb commented Mar 1, 2020

Owen and Ettore also thought having a quick toggle was useful.

Where? I can't find that in #1286.

@thadguidry
Copy link
Member

thadguidry commented Mar 1, 2020

@wetneb maybe was another related issue or the mailing list. We can ask them to comment in here. Anyways, I think much more useful will be #2351 for most users. But then after using the proposed facet, they will want to quickly visualize those control codes in their rows because they will be asking..."really? I have control codes in those rows? WHERE? show me."

@AghaSaad04
Copy link
Member Author

Ah, we also really needed this in the Project preview (prior to Create Project)
How difficult would that be to add the option there also?
The reason is because older data can use those Control Chars as separators (and often did!), and we'd like users to see them and could type in the Control Char(s) as a custom separator in the CSV/TSV/separator-based Importer.

image

@thadguidry I made some changes and got these results.
image

But I am still trying to figure out when a user switches to another parsing format, it has to be unchecked and then checked again for the functionality to work.

Once the data parsing format is changed, which function renders the data?

@thadguidry
Copy link
Member

thadguidry commented Mar 4, 2020

@AghaSaad04 Don't know, but you should be able to watch the events, or network calls, to find that out? Each importer also has an Update Preview button. No rush on this, you are doing great!

Copy link
Sponsor Member

@wetneb wetneb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @AghaSaad04 for the delay in reviewing this.
I think this is a great PR which should be massively useful to users - thank you for that!

As mentioned above, I don't think this deserves to have its own checkbox in such a prominent location, both in the preview and grid mode. I think this display mode should be enabled by default and could be disabled in the preferences.

In terms of architecture, it would also be better to insert the unprintable character rendering code in the existing cell rendering code (for the grid UI, for the preview, facet values, expression preview…) rather than fixing the DOM after rendering using jQuery selectors (for instance to make a migration to a reactive framework easier).

@thadguidry
Copy link
Member

I think prominent is important on this, it has affected many users. But I guess what Antonin is saying is because of that, let's just make it default on and hide the toggle in our preferences. I suspect we'll see later after release if users want to have that toggle exposed in the UI or not. (my hunch still says yes, but we'll see what the majority of users want and let them vote and speak for themselves, we trust them)

@ostephens
Copy link
Sponsor Member

I'm in favour with this being the default view and controlled in preferences (as described by @wetneb)

If we want to have a place inside the project to control the view, the All->View menu would seem like the most appropriate place right now - but I prefer the idea of it being set in preferences as I think it's likely to be something people want to set across all projects

@AghaSaad04
Copy link
Member Author

So I just googled a bit about this problem, this is a bug in Firefox which has been reported in bug_zilla for more than 10 years. There are workarounds for this problem, will implement them.

@AghaSaad04
Copy link
Member Author

AghaSaad04 commented Mar 18, 2020

There are some libraries I tried, unfortunately, they are giving the same issues in firefox :(

@thadguidry
Copy link
Member

thadguidry commented Mar 18, 2020

Would this be an easier way for now? https://lonekorean.github.io/highlight-within-textarea/
Maybe reaching out to Will himself to get some advice? https://codersblock.com/

I basically see a lot of options when doing this kind of Google search: how to highlight text inside a textarea

Did you try this one? https://markjs.io/

@AghaSaad04
Copy link
Member Author

AghaSaad04 commented Mar 18, 2020

Would this be an easier way for now? https://lonekorean.github.io/highlight-within-textarea/

The issue with this approach is that this plugin will not only highlight the control characters but also those words having control characters in them:

Examples:
TABLET
ACKNOWLEDGE

and yeah googling for alternatives.

@thadguidry
Copy link
Member

yes, and couldn't you "adapt" that code to work with ? or ask Will how one would go about that? perhaps opening an issue with his library to support that functionality?

Copy link
Sponsor Member

@wetneb wetneb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This issue is pretty involved, congrats for your tenacity.

Importing Thad's test file, I get this display with Chromium:
image

In Firefox, this renders correctly, but I have another problem:
image
It is impossible to add characters after the BEL, or to erase the BEL character itself (which should be a pretty common use case, I think). More precisely, if I click to the right of BEL, I do not get a cursor there (as if the field was not editable).

main/webapp/modules/core/.fuse_hidden000008ab00000008 Outdated Show resolved Hide resolved
@AghaSaad04
Copy link
Member Author

AghaSaad04 commented Mar 20, 2020

@wetneb, thanks for reviewing.

For Firefox:

It's a bug in firefox you cannot add after the contenteditable="false" element if it is at the end of the text. Also, you cannot delete the contenteditable="false".

For chromium I will review again.
I am looking into the alternatives.

@AghaSaad04
Copy link
Member Author

So I just reviewed the Chromium issue, and got to know the localStorage function I am using is not supported by Chromium :/

@wetneb
Copy link
Sponsor Member

wetneb commented Mar 20, 2020

I would not worry about caching the result of a command in local storage - you could run into issues of the cache being out sync with the preferences, and you can generally assume such a command is very quick, since the server is generally running locally.

If we want to minimize the number of queries (which could be very useful for hosted versions), we could refactor all calls fetching preferences such that all preferences are fetched just once and stored in a global scope (without caching them in local storage).

Anyway I think the problem you are tackling here is hard enough - don't make your life unnecessarily complicated with that sort of detail (that is a form of premature optimization!)

Signed-off-by: Agha Saad Fraz <agha.saad04@gmail.com>
@@ -777,3 +785,162 @@ function tags_to_control(str) {
}
return str;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that's a serious amount of custom code, just for FF support! Maybe this is a code smell that we are doing something wrong in our approach? Don't know, not an expert.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure though. As there are some plugins I tried them as well those were not working on FF due to this backspace issue.

@@ -37,6 +37,8 @@ Refine.PreviewTable = function(projectData, elmt) {
this._render();
};

var controlCharacters = ["NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL", "BS", "TAB", "LF", "VT", "FF", "CR", "SO", "SI", "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB", "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US", "SPACE"];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's replace SPACE with just SP to reduce verbosity

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, was thinking that as well. I will replace that now.

var charCode = cell.v.charAt(character).charCodeAt(0);
if (charCode <= 32) {
var size = (controlCharacters[charCode].length + 2) * 2;
unprintableChar = "<input type='button' class='unprintableCharacters' disabled='disabled' width=" + size + "% value=" + controlCharacters[charCode] + ">";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it looks much better if change unprintableCharacters to have a style="padding-left: 2px;padding-right: 2px;" added. Otherwise it seems there's a default padding of 8 on both sides and makes it too stretched?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I will update the styling.

@@ -436,3 +436,10 @@ p.body-text {
height: 100%;
overflow: hidden;
}

.unprintableCharacters{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, for some reason, I think the margin-top is getting accumulated? I preferred a setting of margin-top: -3px;

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double check all that in both FF and Chrome.

@tfmorris
Copy link
Member

This clearly represents a ton of work (and review), so I'm reluctant to close it, but there haven't been any updates in three months.

If @AghaSaad04 isn't available to push it over the finish line, is it close enough that someone else could pick it up? (I haven't looked at it myself, so asking @thadguidry and @wetneb who seem to have done most of the review).

@thadguidry
Copy link
Member

@tfmorris As a user, I've been pretty happy with it, and have a build with it merged into 3.3 myself that I have been using on occasion to have this long-standing needed feature in OpenRefine. It's quite a useful thing actually and I have it defaulted on in preferences on that custom build. I haven't had any problems other than a tiny small alignment issue between Chrome and Firefox, but its so minor and I bet @antoine2711 could find the issue on that sometime later after merge (or before if he can).

Anyways, that's my take on testing and using it. Have no idea if it's performance bottle-necking for @lisa761 's ongoing work and didn't check all that, only against 3.3 tag and merge/build. So I'd say it definitely needs more testing against master here.

@wetneb
Copy link
Sponsor Member

wetneb commented Jun 25, 2020

Yes this is impressive work. On my side I am not sure we can pull this confidently, because of the complexity of the code to support editing.

Perhaps one way forward would be to highlight non-printable characters only in non-editable locations first, and leave the support in form inputs for a follow-up PR.

@thadguidry
Copy link
Member

@wetneb I agree with the way forward on this. Now the question is who can make that happen?
@antoine2711 Do you think you might be able to volunteer an afternoon towards that, if @AghaSaad04 could not?

@thadguidry
Copy link
Member

I am now wondering...

For rendering in the Expression Editor:
I wonder if the best way forward would be to restart this and instead incorporate ACE to handle the majority of the code editing experience within our Expression Editor (without having to maintain our own custom code for highlighting hidden characters, syntax highlighting, folding, indenting, etc.)

For rendering in the data grid:
I do not know if ACE (or a smaller part of it), could also be used to highlight hidden characters there?

(Syntax Highlighting in ACE is actually done with a common standard now in place that nearly all Javascript code editors use and developed by PCWalton who worked with Mozilla Labs and Rust)

@wetneb
Copy link
Sponsor Member

wetneb commented Jan 27, 2021

I am going to close this as stale, given that there has not been any activity for many months.

I think what went wrong here is that @thadguidry did a lot of end-user reviewing upfront, but developers did not have time to review the PR code-wise. That pushed @AghaSaad04 to make a lot of changes, investing a lot of time in the PR to match the end user requirements, but building up a bit of a house of cards on the code side.

So:

  • developers should aim to give feedback early on in a PR
  • end-user reviewers (@thadguidry) should not push a contributor to do a lot of changes without the assurance of a developer that the PR is going in the right direction

If anyone wants to take that forward it is of course possible to open a new PR, possibly from the same branch.

@wetneb wetneb closed this Jan 27, 2021
@thadguidry
Copy link
Member

@wetneb ??? No pushing here. It was a collaboration. I do agree that within this @AghaSaad04 and I were both trying to see the limits of what could be done, that's all. In reality it was an EPIC, as so many of our single issues are actually, but we don't spend the time to break them up that way. That is something we should all try to improve on.

My opinion was that this was a terrific collaboration that met some technical hurdles for @AghaSaad04 with our current UI framework and the interactions of him coming to grips with some of it's pain points, along with learning new things along the way. I definitely commend @AghaSaad04 on actually being successful in showing the non-printable chars in the Expression Editor...which was the primary pain point for users. Since we give users other means to determine if there are non-printable (control chars) in the data grid like with the Unicode facet.

I think going forward for anyone on this PR and its issue, is that indeed they focus on 1 area at a time to display the non-printable chars, which @AghaSaad04 did get working quite well on that branch.

@wetneb
Copy link
Sponsor Member

wetneb commented Jan 27, 2021

@thadguidry of course that was a collaboration: my point is that by bringing up a lot of small details in your reviews (which is very useful in general), you are encouraging them to invest more and more time into the PR. As a team, it is our responsibility to ensure that we only do that when we are confident that the PR can make it, otherwise it can be quite disappointing for contributors.

It's really the developers' fault here - we should have chimed in earlier on. But in these cases it would be good if you could slow down a tiny bit perhaps and remind us that we need to have a look at it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants