
Is utf_8-only doing more (e.g. converting brackets) than is needed? #86

Closed
delphym opened this issue Aug 27, 2021 · 1 comment


delphym commented Aug 27, 2021

Hello there,

I was wondering what I am doing wrong. I was hoping that when I run:
detox -n -v -s utf_8-only /Volumes/sda2/Videos/Films/
which should use the sequence from /usr/local/etc/detoxrc, defined as:

# transliterates UTF-8 to ASCII
sequence "utf_8-only" {
   utf_8;
};

I am getting the following:
/Volumes/sda2/Videos/Films//The_Bourne_Supremacy_(Bournův_mýtus)_2004 -> /Volumes/sda2/Videos/Films//The_Bourne_Supremacy__Bournuv_mytus__2004

I would expect the following conversion, based on the info in HACKING-v1.md:
/Volumes/sda2/Videos/Films//The_Bourne_Supremacy_(Bournův_mýtus)_2004 -> /Volumes/sda2/Videos/Films//The_Bourne_Supremacy_(Bournuv_mytus)_2004

Also, I modified /usr/local/share/detox/safe.tbl. Here are the changes, just FYI:

ζ diff /usr/local/share/detox/safe.tbl /usr/local/share/detox/safe.tbl.sample
95,99d94
< 0x28		(
< 0x29		)
< 0x5b		[
< 0x5d		]
< 
128,131c123,126
< #0x28		-	# (
< #0x29		-	# )
< #0x5b		-	# [
< #0x5d		-	# ]
---
> 0x28		-	# (
> 0x29		-	# )
> 0x5b		-	# [
> 0x5d		-	# ]

So I would still expect the brackets not to be converted to - when I run detox with the "full" utf_8 or default sequence:
detox -n -v -s utf_8 /Volumes/sda2/Videos/Films/
But the "opposite" is true:-(
/Volumes/sda2/Videos/Films//The_Bourne_Supremacy_(Bournův_mýtus)_2004 -> /Volumes/sda2/Videos/Films//The_Bourne_Supremacy_Bournuv_mytus_2004

For full picture, I'm attaching:

Note
I'm using macOS Mojave with the latest stable detox, v1.4.5, installed via Homebrew.

dharple (Owner) commented Nov 6, 2021

Add the lines you added to safe.tbl to unicode.tbl, and you should see the desired result.

0x28		(
0x29		)
0x5b		[
0x5d		]

The utf_8 filter loads its translation table from unicode.tbl.
The safe filter uses safe.tbl.
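
To check which characters a given table actually defines, something like the following could help. It is a minimal sketch, not part of detox: `parse_table` and `missing_code_points` are hypothetical helpers, and the parser only assumes the `0xNN <replacement>` line shape visible in the diff above, skipping commented-out lines and anything else it does not recognize.

```python
import re

# Hypothetical helpers, not part of detox. An entry is assumed to look like
# "0x28<whitespace>(" as in the diff above; other lines are ignored.
ENTRY = re.compile(r"^\s*(0x[0-9A-Fa-f]+)\s+(\S*)")


def parse_table(text):
    """Return {code_point: replacement} for every uncommented entry line."""
    table = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # commented-out entry, e.g. "#0x5b  -  # ["
        match = ENTRY.match(line)
        if match:
            table[int(match.group(1), 16)] = match.group(2)
    return table


def missing_code_points(text, chars):
    """List the characters in `chars` that have no entry in the table text."""
    table = parse_table(text)
    return [ch for ch in chars if ord(ch) not in table]


# Inline sample instead of a real file, so the sketch runs anywhere.
sample = "0x28\t\t(\n0x29\t\t)\n#0x5b\t\t-\t# [\n"
print(missing_code_points(sample, "()[]"))
```

To run it against a real table, read the file's contents (for example the unicode.tbl that sits next to safe.tbl in your installation) and pass the text in place of `sample`.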

Currently, they aren't specified in unicode.tbl, so they are being converted to the default, which is set to _.
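
The fallback can be pictured with a toy model. This is only an illustration of table lookup with a default, under the assumption that every character is looked up and anything without an entry gets the default; it is not detox's actual code, and `translate` is a made-up name.

```python
def translate(name, table, default="_"):
    """Replace each character via the table; unknown characters get `default`."""
    return "".join(table.get(ord(ch), default) for ch in name)


# Toy table with only two accented letters defined (illustrative values).
table = {0x016F: "u", 0x00FD: "y"}  # U+016F (u with ring), U+00FD (y with acute)

name = "(\u016f\u00fd)"
before = translate(name, table)
print(before)  # brackets fall through to the default

# Adding identity entries for the brackets makes them survive.
table.update({0x28: "(", 0x29: ")", 0x5B: "[", 0x5D: "]"})
after = translate(name, table)
print(after)
```

In this model the first call yields `_uy_` and the second yields `(uy)`, which mirrors the difference between the observed and the expected file names in the report.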

There is no way to override this behavior in detox 1.x, but it's coming in detox 2.

dharple closed this as completed Nov 6, 2021