Conversation
|
Thanks for hunting these pesky characters down and removing them. When looking at the markdown and then the rendered bytes, it seems they get dropped when the docs are built. It's very agreeable to get these out tho. Did you do a global search (i.e., outside of code blocks) for U+00A0? If so, did you find more of these ... as @mairaw says ... little "treasures" in there? Expanding out to other similar characters (e.g., the dreaded BOM!): If this is a larger problem and if there are all kinds of unnecessary and undesirable characters sprinkled in across the docs, shouldn't we determine if the docs can be cleansed of all of these characters in one big pass? |
Yes, there are more of them. For example, here is the start of the output of (The ones in the F# language reference code blocks were not replaced, because they're I wasn't completely sure they should be replaced elsewhere and thought they are less of an issue elsewhere, so I didn't replace them. Should I?
Here is a table of all unusual characters in *.md files in the docs directory (in this branch, so after NBSPs in code blocks were already replaced). Many of them are not wrong (e.g., Greek or Russian characters or some box drawing characters from the output of Yeoman) and some are a matter of style (e.g., - vs. – vs. —).
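
For anyone who wants to reproduce a tally like this, here is a minimal sketch. It is not the script that produced the table; the function names and the definition of "unusual" (anything outside printable ASCII, tab, CR and LF) are assumptions made for illustration.

```python
import unicodedata
from collections import Counter
from pathlib import Path


def tally_unusual_chars(text: str) -> Counter:
    """Count every character outside printable ASCII, tab, CR and LF."""
    allowed = {"\t", "\n", "\r"}
    return Counter(
        ch for ch in text
        if ch not in allowed and not (0x20 <= ord(ch) <= 0x7E)
    )


def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME' for a report row."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


def tally_directory(root: str) -> Counter:
    """Sum the per-file tallies over all *.md files under root."""
    total = Counter()
    for path in Path(root).rglob("*.md"):
        # Decode as plain utf-8 (not utf-8-sig) so a leading BOM is counted as U+FEFF.
        total += tally_unusual_chars(path.read_text(encoding="utf-8"))
    return total
```

Iterating over `tally_directory("docs").most_common()` and passing each character through `describe` would give one row per code point, most frequent first. Deciding which of those rows are actually wrong (NBSP, BOM) and which are legitimate (Greek, Cyrillic, box drawing) still has to be done by a human.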
|
The
I hope @mairaw will give us good news on the possibility of the CI warning/failing on at least some of the nastier ones (e.g., the BOM). RE: Removing the rest of the U+00A0 in favor of U+0020: I like it. Makes sense for this PR if @mairaw likes it. @mairaw will know if this problem can/should be approached more globally than by creating separate PRs that address individual characters. |
No, it's not, at least not in this case.
Ok, done. |
Never trust Notepad (ANSI/Windows-1252) to tell you an encoding! lol 😄 Yes, I got the encoding wrong. I see it now (dotnet-clean.md) ... My questions are still valid tho: Can we prevent undesirable codepoints creeping into the docs over the years? Is there an opportunity to purge the docs of undesirable codepoints in one bold stroke? |
That would require CI. And considering that CI currently seems to verify only content (the "OpenPublishing.Build" check) and the few remaining project.json projects (the "OrcaBot [.NET Core - Nix]" check), I think first we have to ask: Can we get a decent working CI? (Unless I'm missing something and CI is actually working fine.) |
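
Independent of how CI gets set up, the check itself could be quite small. The following is a hypothetical sketch, not an existing build step: the deny-list contents and the function names are assumptions, and which code points to forbid would be a project decision.

```python
from pathlib import Path

# Hypothetical deny-list; extend or shrink it as the project decides.
FORBIDDEN = {
    "\ufeff": "BOM / ZERO WIDTH NO-BREAK SPACE",
    "\u00a0": "NO-BREAK SPACE",
}


def find_forbidden(text: str):
    """Yield (line, column, description) for each forbidden character, 1-based."""
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in FORBIDDEN:
                yield line_no, col, FORBIDDEN[ch]


def check_tree(root: str) -> int:
    """Print one line per finding in *.md files under root; return the count."""
    findings = 0
    for path in sorted(Path(root).rglob("*.md")):
        # Decode as plain utf-8 so a leading BOM stays visible as U+FEFF.
        text = path.read_text(encoding="utf-8")
        for line_no, col, what in find_forbidden(text):
            print(f"{path}:{line_no}:{col}: {what}")
            findings += 1
    return findings
```

A CI step could call `check_tree("docs")` and exit with a non-zero status when the result is greater than zero. Starting with warnings only, or with just the BOM in the deny-list, would avoid failing every open PR on day one.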
When working on #2149, I noticed that many code blocks contain the NO-BREAK SPACE character (U+00A0).
I don't know if they actually cause any issues, but I think they shouldn't be used in code blocks, so I replaced them with a normal space.
The code I used is here.
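
For readers without access to the linked code, here is a rough sketch of the same idea. It is not the code referenced above; it assumes only fenced code blocks (backticks or tildes) matter and ignores indented code blocks, and the function name is made up for illustration.

```python
import re

NBSP = "\u00a0"
# A fence line: up to three spaces of indent, then at least three backticks or tildes.
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def replace_nbsp_in_code_blocks(markdown: str) -> str:
    """Replace U+00A0 with U+0020 inside fenced code blocks only."""
    out = []
    open_fence = None  # the fence run that opened the current block, or None
    for line in markdown.split("\n"):
        match = FENCE.match(line)
        if open_fence is None:
            # Outside a block: copy the line unchanged, and note an opening fence.
            if match:
                open_fence = match.group(1)
            out.append(line)
            continue
        # Inside a block: a closing fence uses the same character, is at least
        # as long as the opening fence, and has nothing else on the line.
        closes = (
            match is not None
            and match.group(1)[0] == open_fence[0]
            and len(match.group(1)) >= len(open_fence)
            and line.strip() == match.group(1)
        )
        if closes:
            open_fence = None
            out.append(line)
        else:
            out.append(line.replace(NBSP, " "))
    return "\n".join(out)
```

Text outside the fences is left untouched on purpose, which matches the scope described above: only NBSPs inside code blocks are replaced.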