Fix AttributeError: 'HTMLParser' object has no attribute 'unescape' #789

Open
heino wants to merge 1 commit into coursera-dl:master from heino:fix

@heino heino commented Dec 27, 2020

Proposed changes

Execution with Python 3.9 generated the following error:

[...]
File "/usr/local/lib/python3.9/site-packages/coursera/utils.py", line 118, in clean_filename
s = h.unescape(s)
AttributeError: 'HTMLParser' object has no attribute 'unescape'

This is due to the following:

The unescape() method in the html.parser.HTMLParser class has been removed (it was deprecated since Python 3.4). html.unescape() should be used for converting character references to the corresponding unicode characters.
(https://docs.python.org/3/whatsnew/3.9.html)
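A minimal sketch of the replacement the release note describes, assuming Python 3.4 or later, where `html.unescape()` is a module-level function. The helper name `unescape_entities` is hypothetical and is not taken from the commit:

```python
import html


def unescape_entities(s):
    """Convert HTML character references in s to Unicode characters."""
    # html.unescape() is a plain function, so no HTMLParser instance is needed.
    return html.unescape(s)


print(unescape_entities("Tom &amp; Jerry &lt;3"))  # prints: Tom & Jerry <3
```

Because the function lives in the `html` module rather than on a parser object, it is unaffected by the removal of `HTMLParser.unescape()` in Python 3.9.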

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist

  • I have read the CONTRIBUTING doc
  • I agree to contribute my changes under the project's LICENSE
  • I have checked that the unit tests pass locally with my changes
  • I have checked the style of the new code (lint/pep).
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

@coveralls

Coverage Status

Coverage increased (+0.2%) to 73.604% when pulling c8796e5 on heino:fix into f9a9a26 on coursera-dl:master.

@heino
Author

heino commented Dec 27, 2020

The Python 3.3 tests fail, but this relates to the requirement urllib3 >= 1.23.
That version was released on 2018-06-04 and does not support Python 3.3.

@fangchih

it works in python 3.9.1, thanks...

@ismail709

ismail709 commented Mar 16, 2021

In the file '....\appdata\local\programs\python\python39\lib\site-packages\coursera\utils.py'
You need to comment out the line

from six.moves import html_parser

by just putting an # before it so it looks like

# from six.moves import html_parser

and add

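import sys
if sys.version_info[0] >= 3:
    # Python 3: the html module provides unescape() directly.
    import html
else:
    # Python 2: fall back to an HTMLParser instance, which has unescape().
    from six.moves import html_parser
    html = html_parser.HTMLParser()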

immediately below it.

Then you need to comment out any occurrence of

h = html_parser.HTMLParser()

again by putting an # in front of it, so it looks like

# h = html_parser.HTMLParser()

and put

h = html

immediately below it. I had to do this twice in the utils.py file.
This solved the problem for me :)
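The same idea can be written more compactly with a guarded import. This is a minimal sketch, not the code from this pull request's commit; it assumes `html.unescape` is available on Python 3.4+ and that `six` is installed on older interpreters. The name `strip_entities` is hypothetical:

```python
try:
    # Python 3.4+: module-level function in the standard library.
    from html import unescape
except ImportError:
    # Older interpreters: fall back to the parser method via six.
    from six.moves import html_parser
    unescape = html_parser.HTMLParser().unescape


def strip_entities(s):
    """Return s with HTML character references decoded."""
    return unescape(s)


print(strip_entities("a &gt; b"))  # prints: a > b
```

Binding a single `unescape` name at import time keeps the call sites identical on both interpreter families, so no `h = ...` assignment has to be repeated in each function.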

@ImSiddh

ImSiddh commented Apr 3, 2021

I'm getting this error:
Traceback (most recent call last):
File "C:\Users\SIDDHARTH\Coursera\coursera-dl-master\coursera-dl", line 6, in <module>
coursera_dl.main()
File "C:\Users\SIDDHARTH\Coursera\coursera-dl-master\coursera\coursera_dl.py", line 251, in main
error_occurred, completed = download_class(
File "C:\Users\SIDDHARTH\Coursera\coursera-dl-master\coursera\coursera_dl.py", line 218, in download_class
return download_on_demand_class(session, args, class_name)
File "C:\Users\SIDDHARTH\Coursera\coursera-dl-master\coursera\coursera_dl.py", line 138, in download_on_demand_class
error_occurred, modules = extractor.get_modules(
File "C:\Users\SIDDHARTH\Coursera\coursera-dl-master\coursera\extractors.py", line 53, in get_modules
error_occurred, modules = self._parse_on_demand_syllabus(
File "C:\Users\SIDDHARTH\Coursera\coursera-dl-master\coursera\extractors.py", line 174, in _parse_on_demand_syllabus
links = course.extract_links_from_quiz(
File "C:\Users\SIDDHARTH\Coursera\coursera-dl-master\coursera\api.py", line 783, in extract_links_from_quiz
return self._convert_quiz_json_to_links(quiz_json, 'quiz')
File "C:\Users\SIDDHARTH\Coursera\coursera-dl-master\coursera\api.py", line 792, in _convert_quiz_json_to_links
markup = self._quiz_to_markup(quiz_json)
File "C:\Users\SIDDHARTH\Coursera\coursera-dl-master\coursera\api.py", line 107, in __call__
question_text = unescape_html(prompt['definition']['value'])
File "C:\Users\SIDDHARTH\Coursera\coursera-dl-master\coursera\utils.py", line 108, in unescape_html
s = h.unescape(s)
NameError: name 'h' is not defined

@ImSiddh

ImSiddh commented Apr 3, 2021

I'm getting this error:
[...]
File "C:\Users\SIDDHARTH\Coursera\coursera-dl-master\coursera\utils.py", line 108, in unescape_html
s = h.unescape(s)
NameError: name 'h' is not defined

The pull request replaces "s = h.unescape(s)", so how is this relevant here?

I was getting the HTMLParser error, so I added the # and the text as suggested above. After that, I get this error.

@heino
Author

heino commented Apr 5, 2021

@ismail709, please do not pollute pull requests by adding comments that only describe the very few changes proposed in the commit. Such comments are of absolutely no value to the developer who evaluates the proposed changes. Further, they apparently lead to comments that refer to your comment instead of the proposed changes.

@Hadrien-lcrx

Hadrien-lcrx commented May 17, 2021

I had issues downloading without CAUTH, so I used CAUTH (-ca) and got the error mentioned in the PR title. I can confirm that applying changes manually, identical to this PR's changes, fixed the issue and allowed me to download. I am using a virtual environment with Python 3.9.1 and only coursera-dl installed.

@mahyarmirrashed

I followed @ismail709 and it started working. Not sure what @heino is talking about since they did help solve the issue...

@heino
Author

heino commented Jun 1, 2021

I followed @ismail709 and it started working. Not sure what @heino is talking about since they did help solve the issue...

@mahyarmirrashed, this is a Pull Request with explicitly proposed code changes: c8796e5

For some random reason, @ismail709 added a comment describing how he solved the issue by applying these changes line by line.

@mahyarmirrashed, if you don't know what a pull request is either, please read up on git. You might prefer following a how-I-applied-the-pull-request-manually guide rather than simply taking a look at the proposed code changes yourself, but you only further pollute this pull request by validating an alternative description that was not even committed.

Now please stop this nonsense!

@goekce

goekce commented Oct 4, 2021

A short update: on Arch Linux with Python 3.9.7, @heino's patch works.

If you are on Archlinux, read my comment on AUR - coursera-dl-git to install coursera-dl with this patch.

@ziyi-yan

Thanks @heino! It works on 3.9.7 on macOS.

@okkymabruri

okkymabruri commented Jan 1, 2022

@rbrito @felker please accept this pull request.

@okkymabruri

You can directly install from this patch by @heino

pip install git+https://github.com/coursera-dl/coursera-dl@c8796e567698be166cb15f54e095140c1a9b567e

@allentiak

It seems this patch fails in Python 3.3.x...
https://ci.appveyor.com/project/balta2ar/coursera-dl/builds/37011372

@allentiak

Fixes #778

@heino
Author

heino commented Jan 2, 2022

It seems this patch fails in Python 3.3.x... https://ci.appveyor.com/project/balta2ar/coursera-dl/builds/37011372

Indeed, but that was already mentioned on 27 Dec 2020, and is not directly related to this issue:

The Python 3.3 tests fail, but this relates to the requirement urllib3 >= 1.23. That version was released on 2018-06-04 and does not support Python 3.3.

@shwhsx

shwhsx commented May 14, 2022

I updated the utils.py file and still get the same error:

File "/XXX/env/lib/python3.9/site-packages/coursera/utils.py", line 118, in clean_filename
s = h.unescape(s)
AttributeError: 'HTMLParser' object has no attribute 'unescape'

The line s = h.unescape(s) is not on line 118, and I already set h = html before that line. Any suggestions? Thanks!
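When the reported line number does not match the edited file, one possible cause is that the interpreter is loading a different copy of the module than the one that was edited. A minimal sketch for checking this, using only the standard library; the helper name `module_file` is hypothetical, and `html` is used here only as a stand-in for `coursera.utils`:

```python
import importlib.util


def module_file(name):
    """Return the path of the file that importing `name` would load, or None."""
    try:
        # For dotted names, find_spec imports the parent package first.
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when the parent package of a dotted name cannot be found.
        return None
    return spec.origin if spec is not None else None


# Replace "html" with "coursera.utils" in the environment that runs coursera-dl.
print(module_file("html"))
```

If the printed path differs from the file that was edited, the changes were applied to a copy that the running environment does not use.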
