epg parsing error - 'charmap' codec can't encode character #79

Closed
MagicOneFr opened this issue Jul 4, 2023 · 16 comments
Labels: documentation (An issue that documents something), environmental (An issue that relates to a user's OS environmental setup)

Comments

@MagicOneFr

Hello,
I'm trying to filter an EPG and I get this error:
epg creation failure: 'charmap' codec can't encode character '\u25c9' in position 104: character maps to <undefined>
original.zip
Is it possible to fix that, please?
Thank you.

@bebo-dot-dev
Owner

Hi there, it may be fixable, but there's not enough to go on with what you've supplied so far.

On the face of it, there's nothing wrong with the XML file you supplied, and a quick test run of the script with it worked as expected here; no errors were seen.

@bebo-dot-dev bebo-dot-dev added question A functionality question pending feedback Pending feedback from the person who created an issue labels Jul 4, 2023
@MagicOneFr
Author

Hello!
OK, what would you need? It's difficult for me to provide the config file with the URL I use, because it contains my login and password for the IPTV provider.

@bebo-dot-dev
Owner

No problem. If you're able to supply the config file with the URLs/passwords removed, plus the original m3u file, that would let me run the script exactly as you do against your source data.
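For sharing a config like this, a small script can blank out the sensitive values before the file is attached. This is a hedged sketch, not part of the project: it assumes the config is a flat JSON object and that the credentials live only in the `m3uurl` and `epgurl` values mentioned later in this thread. Any other key holding a login or password would need adding to the (hypothetical) `SENSITIVE_KEYS` list.

```python
import json

# Keys assumed to embed provider credentials; extend if your config differs.
SENSITIVE_KEYS = ("m3uurl", "epgurl")


def redact_config(config, placeholder="REDACTED"):
    """Return a copy of a flat config dict with sensitive values replaced."""
    redacted = dict(config)
    for key in SENSITIVE_KEYS:
        if key in redacted:
            redacted[key] = placeholder
    return redacted


def redact_file(src_path, dst_path):
    """Read a JSON config, redact it, and write the shareable copy as UTF-8."""
    with open(src_path, encoding="utf-8") as src:
        config = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(redact_config(config), dst, indent=4, ensure_ascii=False)
```

Calling `redact_file("config.json", "config.redacted.json")` (file names are placeholders) leaves the original untouched and writes a copy that is safe to attach.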

@MagicOneFr
Author

m3u.zip
Here it is.
Thank you very much.

@MagicOneFr
Author

I retried today and got the same error:
2023-07-05T18:46:43.233726 creating channel element for m3u entry from tvg-name value FR - RTS UN FHD
2023-07-05T18:46:43.260751 epg creation failure: 'charmap' codec can't encode character '\u25c9' in position 104: character maps to <undefined>

@bebo-dot-dev
Owner

Thanks, will take a look ASAP.

@bebo-dot-dev bebo-dot-dev added investigation Something is being analysed / tested and removed pending feedback Pending feedback from the person who created an issue labels Jul 5, 2023
@bebo-dot-dev
Owner

Hi again, I've run a test with your supplied JSON config, m3u and XML files, and no errors were seen.

The only changes I made to your JSON config for the test run were to repoint the m3uurl and epgurl values to the local files you supplied and to set log_enabled to true, which enables debug output to process.log.

I've attached the files back here so you can take a look at the outcome and the details that were recorded for the test run in the process.log file.

If these supplied files were expected to fail, then it's certainly strange, and my gut tells me that what you're seeing is perhaps some sort of environmental or Windows OS problem.

issue79.zip

@bebo-dot-dev bebo-dot-dev added pending feedback Pending feedback from the person who created an issue and removed investigation Something is being analysed / tested labels Jul 7, 2023
@MagicOneFr
Author

I'm using Windows 10 Pro 21H2 with French localisation.
For Python, I'm using Python 3.8.6 (64-bit).
I can see this line in your log:
2023-07-07T16:54:42.467740 creating channel element for m3u entry from tvg-name value FR - RTS DEUX FHD ◉
There's a strange round character on this line, and in my log the script crashes just before adding this line.
Is it possible that this character isn't correctly handled by my OS/Python?
Is it possible to strip this character?
Thank you very much.
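Both questions can be illustrated with a short Python sketch. It assumes, as a guess consistent with the error text, that Python was encoding output with cp1252, the usual Western European ANSI code page on a French Windows install. cp1252 has no mapping for ◉ (U+25C9), so encoding it raises a `'charmap' codec can't encode character ... character maps to <undefined>` error, while UTF-8 can encode it. The helpers `encode_error` and `strip_unencodable` are hypothetical, not part of the script; the second one shows the stripping idea by dropping anything the target encoding cannot represent while keeping accented French letters that cp1252 does cover.

```python
def encode_error(text, encoding):
    """Return the UnicodeEncodeError that encoding `text` would raise, or None."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc
    return None


def strip_unencodable(text, encoding="cp1252"):
    """Hypothetical helper: drop characters `encoding` cannot represent.

    Lossy by design; whitespace left at either end is trimmed.
    """
    return text.encode(encoding, errors="ignore").decode(encoding).strip()


name = "FR - RTS DEUX FHD \u25c9"  # tvg-name value from the log above

# cp1252 has no mapping for U+25C9, so this prints a 'charmap' error message.
print(encode_error(name, "cp1252"))
# UTF-8 can encode every code point, so this prints None.
print(encode_error(name, "utf-8"))
# The decorative symbol is removed; the rest of the name is kept.
print(strip_unencodable(name))
```

Stripping hides the symptom rather than fixing the mismatch between the data (Unicode) and the encoding Python writes with.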

@bebo-dot-dev
Owner

Thank you, it's good to learn a little more about your OS and environment. And yes, I did notice the non-standard character ◉ in your source data; it exists in both your m3u and XML files.

Stripping this character (and perhaps others) out could be an option, but before we follow that idea, can I ask why you have the force_epg option switched on, and whether you've tried with this option off?

The original motivation for the force_epg option is outlined in #42.

In short, the force_epg option was introduced some time ago to force the creation of a channel in the newly written XML EPG file for every channel that exists in the m3u file. As far as I know, it's a seldom-used feature, because there's a hope (a dream :)) that most source XML EPG files are reasonably complete and of decent quality when paired with a given m3u file.
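The idea behind force_epg can be sketched with the standard library's ElementTree. This is a hypothetical illustration, not the script's actual code. It assumes the m3u entries have already been parsed into dicts keyed by their EXTINF attributes (`tvg-id`, `tvg-name`), that the EPG is XMLTV with all `<channel>` elements placed before the `<programme>` elements, and that `tvg-name` is an acceptable fallback id when `tvg-id` is missing.

```python
import xml.etree.ElementTree as ET


def add_missing_channels(tv_root, m3u_entries):
    """Hypothetical force_epg-style pass: add a <channel> for every m3u entry
    the source EPG does not already define. Returns the number added."""
    existing = {ch.get("id") for ch in tv_root.findall("channel")}
    # Assumes XMLTV ordering (channels first), so new channels go right
    # after the last existing <channel> and before any <programme>.
    insert_at = len(tv_root.findall("channel"))
    added = 0
    for entry in m3u_entries:
        # Fall back to tvg-name when tvg-id is absent or empty.
        channel_id = entry.get("tvg-id") or entry["tvg-name"]
        if channel_id in existing:
            continue
        channel = ET.Element("channel", id=channel_id)
        ET.SubElement(channel, "display-name").text = entry["tvg-name"]
        tv_root.insert(insert_at, channel)
        insert_at += 1
        existing.add(channel_id)
        added += 1
    return added
```

In a pass like this, a decorative character such as ◉ in a channel name flows straight into the generated channel element and into any log line that mentions it.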

I can see from your error report that the script is failing for you in an area of code that is only active when force_epg is switched on, so it might be worth switching this option off to see if you're able to generate an acceptable EPG file without error.

@MagicOneFr
Author

Same result with force_epg set to false. Thanks for the explanation of the option. I think that sometimes I have this kind of EPG information stored in the channel name.
Meanwhile, I have tried Python 3.11: same result.

@MagicOneFr
Author

I have found a workaround.
As described in this page, I added the two "set" commands to a batch file before calling the Python script, and now there's no more error!
According to the same page, there's a way to fix the issue in the code.
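The page linked in this comment is not preserved here, so which two `set` commands were used is unknown. As a hedged sketch of one in-code equivalent, assuming the failing write is text sent to standard output or standard error (for example when output is redirected from a batch file), Python 3.7+ lets a script switch an `io.TextIOWrapper` stream to UTF-8 with `reconfigure()`. The helper name `ensure_utf8_stream` is hypothetical. Files written with `open()` are a separate case: they use the locale's preferred encoding unless `encoding="utf-8"` is passed, and Python's UTF-8 Mode (`python -X utf8`, or the `PYTHONUTF8=1` environment variable) changes both defaults at once.

```python
import io
import sys


def ensure_utf8_stream(stream):
    """Hypothetical helper: switch a text stream to UTF-8 if it is an
    io.TextIOWrapper (reconfigure() needs Python 3.7+); others are left alone."""
    if isinstance(stream, io.TextIOWrapper):
        stream.reconfigure(encoding="utf-8")
    return stream


# Call once, near the top of the script, before anything is printed.
ensure_utf8_stream(sys.stdout)
ensure_utf8_stream(sys.stderr)
print("creating channel element for m3u entry from tvg-name value FR - RTS DEUX FHD \u25c9")
```

Whatever reads the output (console, log viewer) then has to interpret it as UTF-8 too, which is why the shell still matters.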

@bebo-dot-dev
Owner

Wow, good find :)

The link you included explains that it is a Windows OS issue related to the Windows shell you're using; it appears that the shell doesn't support UTF-8 by default.

Are you using the regular (old) Windows command prompt rather than a PowerShell shell?

@bebo-dot-dev bebo-dot-dev changed the title epg parsing error epg parsing error - 'charmap' codec can't encode character Jul 7, 2023
@MagicOneFr
Author

Yes, indeed.

@bebo-dot-dev
Owner

OK this makes a little more sense to me now.

The issue you have encountered is a Unicode (UTF-8) related issue, triggered by Unicode characters in your data combined with a Windows command prompt that, for one reason or another, doesn't support Unicode by default.

Reading around this subject, I believe that once upon a time the Windows command prompt didn't support Unicode at all. Microsoft applied a number of incremental changes throughout Windows 10 builds to get the Windows command prompt to a point where it is now supposed to support Unicode. There is clearly something in your system setup that means it doesn't support Unicode by default. Personally, I suspect it's related to your French locale and its effect on the code page in use within your command prompt, but this is a guess on my part.

You might see different results in a PowerShell prompt, and you could get different results again with the new Windows Terminal (the one installable from the Windows Store).
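To compare shells (cmd, PowerShell, Windows Terminal), a short diagnostic like the one below, run the same way the script is run, reports which encodings Python picked up. It is a sketch, not part of the script. On a French Windows install the locale's preferred encoding is typically cp1252, while `chcp` in cmd reports the console's own code page, usually 850 for French; neither of those can encode ◉.

```python
import locale
import sys


def describe_text_encodings():
    """Report the encodings Python inherited from the current shell and locale."""
    return {
        # Encoding used by print(); None if stdout was replaced by an object without it.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Locale encoding that open() falls back to when no encoding= is given.
        "preferred": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
        # True when UTF-8 Mode is on (python -X utf8 or PYTHONUTF8=1).
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


for key, value in describe_text_encodings().items():
    print(f"{key}: {value}")
```

Running it once in each shell, and once with output redirected to a file, makes differences between environments visible without touching the EPG data.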

I do appreciate you reporting this issue; it could be helpful if it crops up for someone else.

I think we can call this resolved if you're happy for this issue to be closed.

@bebo-dot-dev bebo-dot-dev added documentation An issue that documents something environmental An issue that relates to a user's OS environmental setup and removed question A functionality question pending feedback Pending feedback from the person who created an issue labels Jul 8, 2023
@bebo-dot-dev bebo-dot-dev self-assigned this Dec 15, 2024
Development

No branches or pull requests

2 participants