Script exits when it encounters a non-UTF-8 file name #1

Open
bradw2002 opened this issue Jul 30, 2021 · 1 comment

Comments

@bradw2002

Hi Kalebu, the script is very useful. I have several thousand files, some of which are duplicates. However, the script exits with an error when it encounters a file whose name is not valid UTF-8.

I am running this on an Ubuntu MATE 18.04.5 LTS (Bionic Beaver) computer.

I renamed the script to remove-duplicate-files.py3, and I am calling it like so:

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
S=$(date) ; python3 ./remove-duplicate-files.py3 ; E=$(date) ; echo -e "start = $S ..... \n end = $E"
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is the output I get (I have re-run the script, so the duplicates found earlier have already been cleaned):
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


**************** DUPLYTHON ****************************


---------------- WELCOME ----------------------------
---------------- WELCOME ----------------------------

Cleaning .................
Traceback (most recent call last):
  File "./remove-duplicate-files.py3", line 69, in <module>
    App.main()
  File "./remove-duplicate-files.py3", line 65, in main
    self.welcome();self.clean();self.cleaning_summary()
  File "./remove-duplicate-files.py3", line 53, in clean
    print(raw_string, '.. cleaned ')
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-5: surrogates not allowed

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
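
For context, a minimal sketch of what is probably happening, assuming the script obtains file names as `str` (for example from `os.listdir`): on Linux, Python represents file name bytes that are not valid UTF-8 as lone surrogate characters (the `surrogateescape` error handler), and `print()` then refuses to encode them. The helper name `printable_name` is hypothetical and not part of the script; it undoes the escaping and shows the offending bytes as `\xNN` so the name can always be printed.

```python
def printable_name(name: str) -> str:
    """Return a version of `name` that can always be printed."""
    # Recover the original bytes of the file name (undoes surrogateescape).
    raw = name.encode("utf-8", errors="surrogateescape")
    # Decode again, but show bytes that are not valid UTF-8 as \xNN escapes.
    return raw.decode("utf-8", errors="backslashreplace")


# Simulate a file name holding the byte 0xE9 (Latin-1 "é"), which is invalid
# as UTF-8. This is how such a name looks once Python has decoded it.
bad_name = b"caf\xe9.txt".decode("utf-8", errors="surrogateescape")

# print(bad_name) would raise UnicodeEncodeError on a UTF-8 terminal;
# the wrapped version prints without error.
print(printable_name(bad_name), ".. cleaned")
```

Names that are already valid UTF-8 pass through unchanged, so the helper could wrap every name just before it is printed.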

Can you suggest a change to the script so that it does not fail on a file name containing non-UTF-8 characters?

And can it be made to print the name of the file it exited on?

It would also be useful if the script could live in a different directory from the one being cleaned, and would ask for the name of the directory to clean.
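
A rough sketch of the directory request, assuming the target could be given either as the first command-line argument or typed at a prompt. The function name `choose_directory` and the prompt text are hypothetical, not taken from the script.

```python
import os


def choose_directory(argv):
    """Pick the directory to clean: first CLI argument, otherwise ask."""
    if len(argv) > 1:
        target = argv[1]
    else:
        target = input("Directory to clean: ")
    # Expand "~" and make the path absolute so the script's own location
    # no longer matters.
    target = os.path.abspath(os.path.expanduser(target))
    if not os.path.isdir(target):
        raise NotADirectoryError(target)
    return target


# Example with an explicit argument list; a real script would pass sys.argv.
print("Cleaning", choose_directory(["remove-duplicate-files.py3", "."]))
```

The script would then walk the returned path instead of its own working directory.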

Thanks,
bradw2002

@bradw2002
Author

And sorry, I meant to say: could it always print the current file name in any circumstance where the script might exit with an error?
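
One possible shape for this, as a sketch only: wrap the per-file work in `try`/`except` and report the name before moving on. `process_all` and the `process` callback are hypothetical stand-ins for whatever the script does with each file. `ascii()` is used for the report because it escapes every non-ASCII character, so the error message itself cannot hit the same encoding problem.

```python
import sys


def process_all(names, process):
    """Call process(name) for each name and report any name that fails."""
    failed = []
    for name in names:
        try:
            process(name)
        except Exception as exc:
            # ascii() escapes surrogates and other non-ASCII characters.
            print(f"error while processing {ascii(name)}: {exc!r}",
                  file=sys.stderr)
            failed.append(name)
    return failed


def demo(name):
    # Stand-in for the real per-file work; fails on one particular name.
    if name == "b.txt":
        raise ValueError("simulated failure")


print(process_all(["a.txt", "b.txt", "c.txt"], demo))
```

Collecting the failures instead of exiting lets the rest of the files be processed; re-raising inside the `except` block would keep the current stop-on-error behaviour while still naming the file.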
