convert_ipynb UnicodeEncodeError #1880

albertovilla · 2021-12-13T19:18:46Z

Describe the bug
Running python convert_ipynb.py to generate the Markdown documentation halts with a UnicodeEncodeError.

Error message

Processing ..\..\..\..\tutorials/Tutorial1_Basic_QA_Pipeline.ipynb
Processing ..\..\..\..\tutorials/Tutorial2_Finetune_a_model_on_your_data.ipynb
Processing ..\..\..\..\tutorials/Tutorial3_Basic_QA_Pipeline_without_Elasticsearch.ipynb
Processing ..\..\..\..\tutorials/Tutorial4_FAQ_style_QA.ipynb
Processing ..\..\..\..\tutorials/Tutorial5_Evaluation.ipynb
Traceback (most recent call last):
  File "F:\projects\contrib\haystack\docs\_src\tutorials\tutorials\convert_ipynb.py", line 31, in <module>
    f.write(body)
  File "Z:\miniconda\envs\haystack\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 8650-8652: character maps to <undefined>
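
The traceback ends in the cp1252 codec, which is what open() falls back to on many Windows setups when no encoding is passed. The following is a minimal sketch of the same class of failure, reproducible on any platform by requesting cp1252 explicitly. The helper name try_write and the sample text are assumptions for illustration; the notebook content that actually triggered the error is not shown in this report.

```python
import os
import tempfile

# Hypothetical sample text: U+2192 (rightwards arrow) has no mapping in cp1252.
body = "Tutorial 5 \u2192 Evaluation"


def try_write(text, encoding):
    """Write text to a temporary .md file and report whether encoding succeeded."""
    path = os.path.join(tempfile.mkdtemp(), "out.md")
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        return "ok"
    except UnicodeEncodeError as exc:
        # exc.reason carries the codec's explanation of the failure
        return "failed: " + exc.reason


print("cp1252:", try_write(body, "cp1252"))
print("utf-8:", try_write(body, "utf-8"))
```

Under these assumptions the cp1252 write fails while the UTF-8 write succeeds, which matches the behaviour described in the report.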

Expected behavior
The Markdown documentation should be generated without errors.

Additional context
Running on Windows 10 in a newly created conda Python 3.9 environment; the packages were installed with the recommended command pip install --editable .

To Reproduce

Install from GitHub as per the instructions:

git clone https://github.com/deepset-ai/haystack.git
cd haystack
pip install --editable .

Change to the folder that contains convert_ipynb.py, i.e. docs\_src\tutorials\tutorials\
Run python convert_ipynb.py to update the documentation.

FAQ Check

Have you had a look at our new FAQ page?

System:

OS: Windows 10
GPU/CPU: GTX 1060 / Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz
Haystack version (commit or version number): 1.0
DocumentStore: N/A
Reader: N/A
Retriever: N/A

albertovilla · 2021-12-13T19:20:19Z

The issue is easy to fix. In fact, I already have a solution, so I can open a PR if you'd like. The fix is simply to change line 29 of the script to pass the encoding explicitly, as follows:

with open(str(i + 1) + ".md", "w", encoding='utf-8') as f:
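
For context, a short sketch of why the explicit argument matters: without it, open() uses the locale's preferred encoding, which differs between machines. The helper write_markdown and the sample body are hypothetical and only mirror the one-line change above; they are not the actual contents of convert_ipynb.py.

```python
import locale
import tempfile
from pathlib import Path

# The encoding open() uses when none is passed; often cp1252 on Windows.
print("default encoding:", locale.getpreferredencoding(False))


def write_markdown(out_dir, i, body):
    """Write one converted tutorial as UTF-8, regardless of the platform default."""
    path = Path(out_dir) / (str(i + 1) + ".md")
    with open(path, "w", encoding="utf-8") as f:
        f.write(body)
    return path


# Hypothetical body containing a character that cp1252 cannot encode.
out = write_markdown(tempfile.mkdtemp(), 0, "Arrow \u2192 ok")
print(out.name)
# ascii() keeps the printed output safe on consoles that are not UTF-8.
print(ascii(out.read_text(encoding="utf-8")))
```

Reading the file back with the same explicit encoding keeps the round trip independent of the machine's locale, which is also why the change should behave the same locally and on CI.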

ZanSara · 2021-12-14T08:58:47Z

Hello @albertovilla, thank you! Let's make sure it works on the CI too: we use that script almost exclusively there. By the way, if I may ask, how come you found yourself building our documentation locally?

albertovilla · 2021-12-14T09:23:57Z

Hi @ZanSara, initially I planned to correct the issue with the documentation, but before committing I tried to rebuild the documentation, and that's when I saw this bug.

albertovilla added a commit to albertovilla/haystack that referenced this issue Dec 13, 2021

Correct bug with encoding when generating Markdown documentation; lin…

9f2e706

…ked with issue deepset-ai#1880

ZanSara added the type:documentation and type:bug labels Dec 14, 2021

ZanSara linked a pull request Dec 14, 2021 that will close this issue

Correct bug with encoding when generating Markdown documentation issue #1880 #1881

Merged

ZanSara closed this as completed in #1881 Dec 14, 2021

ZanSara pushed a commit that referenced this issue Dec 14, 2021

Correct bug with encoding when generating Markdown documentation; lin…

2396f0c

…ked with issue #1880 (#1881)
