
False positive in filename, hugo markdown custom shortcode #485

Closed
naggie opened this issue May 9, 2022 · 7 comments

Comments

@naggie

naggie commented May 9, 2022

Hi,

I receive the following false positive: typos seems to be reading the filenames in my (custom) image shortcode and matching on `ba`.

[screenshot of the typos output]

Thanks!
Callan

PS: Thanks for such useful software; I use it as part of the build chain for my blog.

@naggie naggie changed the title "False positive in filename, hugo markdown custom shortcde" to "False positive in filename, hugo markdown custom shortcode" May 9, 2022
@epage
Collaborator

epage commented May 9, 2022

Could you include a copyable version of one of those lines?

We have GUID detection; my guess is it's related to #481.

@naggie
Author

naggie commented May 10, 2022

Hi @epage

error: `ba` should be `by`, `be`
  --> content/blog/rcdos_power_cut_from_your_phone/cover_ed6e389a-cd40-4c32-8a71-9bc95ae2a2ba.jpg:41
error: `ba` should be `by`, `be`
  --> content/blog/rcdos_power_cut_from_your_phone/index.md:10:51
   |
10 | coverimg: cover_ed6e389a-cd40-4c32-8a71-9bc95ae2a2ba.jpg
   |                                                   ^^
   |
error: `ba` should be `by`, `be`
  --> content/blog/rcdos_power_cut_from_your_phone/index.md:74:54
   |
74 | {{< img src="cover_ed6e389a-cd40-4c32-8a71-9bc95ae2a2ba.jpg" caption="The old mechanicalrelay" >}}
   |                                                      ^^
   |
error: `ba` should be `by`, `be`
  --> content/blog/diy_linkwitz_riley_active_crossover/cover_d1d11585-6a21-47cd-b32e-472ba009ca84.jpg:3
error: `ba` should be `by`, `be`
  --> content/blog/diy_linkwitz_riley_active_crossover/index.md:7:44
  |
7 | coverimg: cover_d1d11585-6a21-47cd-b32e-472ba009ca84.jpg
  |                                            ^^
  |
error: `ba` should be `by`, `be`
  --> content/blog/diy_linkwitz_riley_active_crossover/index.md:29:51
   |
29 |     {{< img src="cover_d1d11585-6a21-47cd-b32e-472ba009ca84.jpg" caption="In operation">}}
   |                                                   ^^
   |

It also appears to have problems with base64 strings that end in `==` padding:

error: `nd` should be `and`
  --> ./themes/naggie-hugo-theme/tools/package-lock.json:1036:50
     |
1036 |       "integrity": "sha512-vd15qHsaqrRL7dtH6QNuy0ndJmRDrS9HAM1CAiSifNUFv4x1a0CCVsj18hJ1mShxIG6T2i1sO78MkP56r0nYRw==",
     |                                                  ^^
     |
error: `Ot` should be `To`, `Of`, `Or`
  --> ./themes/naggie-hugo-theme/tools/package-lock.json:1476:55
     |
1476 |       "integrity": "sha512-hCmlUAIlUiav8Xdqw3Io4LcpA1DOt7h3LSTAC4G6JGHFFaWzI6qvFt9oilvl8BmkbBRX1IhM90ZAmpk68zccQA==",
     |                                                       ^^
     |
$ typos --version
typos-cli 1.7.3

Thanks
Callan

@epage
Collaborator

epage commented May 10, 2022

Was able to confirm; this is a duplicate of #481.

@epage epage closed this as completed May 10, 2022
@epage epage reopened this May 10, 2022
@epage
Collaborator

epage commented May 10, 2022

Actually, I'm going to re-open this because it has a slightly different root cause. For `cover_ed6e389a-cd40-4c32-8a71-9bc95ae2a2ba.jpg`, the problem is that `cover_ed6e389a` looks like an identifier, which leaves us parsing `-cd40-4c32-8a71-9bc95ae2a2ba.jpg`. That no longer looks like a GUID, so we turn it into identifiers.

As a workaround, if `-` is used instead of `_` after `cover`, it works today without a fix for #481.
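To make that mechanism concrete, here is a deliberately simplified Python model. It is not typos' tokenizer: the names `identifiers` and `words`, the regexes, and the assumption that digits and `_` split words are all illustrative. The scanner only tries a GUID match where a new token starts, so once `_` glues `cover` to the first hex group, the rest of the GUID is cut into pieces and `9bc95ae2a2ba` yields the word `ba`.

```python
import re

# Hypothetical, simplified tokenizer; NOT how typos is actually implemented.
GUID = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}(?![0-9A-Za-z])")
IDENT = re.compile(r"[0-9A-Za-z_]+")


def identifiers(text: str) -> list:
    """Scan left to right, trying a GUID only where a new token starts."""
    out = []
    i = 0
    while i < len(text):
        guid = GUID.match(text, i)
        if guid:
            i = guid.end()  # a GUID is skipped entirely
            continue
        ident = IDENT.match(text, i)
        if ident:
            out.append(ident.group())  # '_' keeps e.g. cover_ed6e389a together
            i = ident.end()
        else:
            i += 1  # punctuation such as '-' or '.'
    return out


def words(text: str) -> list:
    """Split each identifier into alphabetic runs (digits and '_' separate words)."""
    return [w for ident in identifiers(text) for w in re.findall(r"[A-Za-z]+", ident)]


print(words("cover_ed6e389a-cd40-4c32-8a71-9bc95ae2a2ba.jpg"))
# ['cover', 'ed', 'e', 'a', 'cd', 'c', 'a', 'bc', 'ae', 'a', 'ba', 'jpg']
print(words("cover-ed6e389a-cd40-4c32-8a71-9bc95ae2a2ba.jpg"))
# ['cover', 'jpg']
```

With `-` instead, `cover` stops at the hyphen, the full GUID matches from `ed6e389a` onward and is skipped, and only `cover` and `jpg` are left to spell-check.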

With the backtracking this would require, I'm concerned we don't have a viable way to fix this directly. In #484, we are talking about a heuristic for discarding words, which might be a viable route for working around this.

epage added a commit to epage/typos that referenced this issue May 10, 2022
Previously, we bailed out if the string is too short (<90) and there
weren't non-alpha-base64 bytes present.  What we ignored were the
padding bytes.

We key off of padding bytes to detect that a string is in fact base64
encoded.  Like the other cases, there can be false positives but those
strings should show up elsewhere or the compiler will fail.

This was called out in crate-ci#485
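A rough Python sketch of the heuristic the commit message describes, under stated assumptions: the 90-character threshold comes from the message, while reading "non-alpha-base64 bytes" as the `+` and `/` symbols is an interpretation (both `integrity` payloads reported above are 88 characters long and contain neither). The function name and the `consider_padding` switch, which toggles between the old and new behaviour, are hypothetical.

```python
import re

# Standard base64 alphabet, with up to two '=' padding bytes at the end.
BASE64 = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def looks_like_base64(token: str, consider_padding: bool = True) -> bool:
    """Hypothetical heuristic; not typos' actual implementation."""
    if not BASE64.fullmatch(token):
        return False
    if len(token) >= 90:
        # Long runs of base64 characters are accepted without further evidence.
        return True
    # Short strings need something an ordinary word can't contain:
    # the '+' or '/' symbols from the base64 alphabet...
    has_symbol = "+" in token or "/" in token
    # ...or, with the fix, trailing '=' padding.
    has_padding = consider_padding and token.endswith("=")
    return has_symbol or has_padding


sha = "vd15qHsaqrRL7dtH6QNuy0ndJmRDrS9HAM1CAiSifNUFv4x1a0CCVsj18hJ1mShxIG6T2i1sO78MkP56r0nYRw=="
print(looks_like_base64(sha, consider_padding=False))  # old behaviour: False
print(looks_like_base64(sha))                          # with padding: True
```

The false positives the commit mentions show up in the same branches: any 90-plus character run of letters and digits, or a short token ending in `=`, would be skipped rather than spell-checked.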
@epage
Collaborator

epage commented May 10, 2022

Having thought this through, I don't think there is a "good" solution for this besides #484, so I'm going to go ahead and close this.

I will note that the base64 failures are being fixed in #486.

@epage epage closed this as completed May 10, 2022
@epage
Copy link
Collaborator

epage commented May 10, 2022

v1.8.0 is released with a fix for the base64 strings

@naggie
Author

naggie commented May 10, 2022

Thanks @epage !

Yes, it's a tough problem to generalise over arbitrary code, likely an impossible one. I guess the best approach is to cover 99% of the ground and rely on configuration to work around edge cases when there's no good solution.

In my case I worked around it with a `_typos.toml`:

[default.extend-words]
ba = "ba"

It's a hack but I'm building again.
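Why the self-mapping works: typos' configuration reference describes an `extend-words` entry whose correction equals its key as marking that word as always valid. The Python sketch below models that lookup order; `BUILTIN` (seeded with suggestions quoted in the errors above) and `check` are hypothetical, not typos' API.

```python
from typing import Optional

# Hypothetical model of dictionary lookup with user overrides; not typos' code.
# Built-in suggestions quoted from the errors reported in this issue.
BUILTIN = {"ba": ["by", "be"], "nd": ["and"]}


def check(word: str, extend_words: dict) -> Optional[list]:
    """Return None if `word` is accepted, else the suggested corrections."""
    if word in extend_words:
        correction = extend_words[word]
        # A word mapped to itself is always valid; otherwise suggest the mapping.
        return None if correction == word else [correction]
    # Fall back to the built-in dictionary; None means "no known typo".
    return BUILTIN.get(word)


config = {"ba": "ba"}  # equivalent of the [default.extend-words] entry
print(check("ba", {}))      # ['by', 'be']
print(check("ba", config))  # None
```

The trade-off is that an override under `[default]` applies to every file, so a genuine `ba` typo elsewhere in the repository would also go unreported.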
