Can Extract 'List Courses', But, not able to download any course; urllib.error.HTTPError: HTTP Error 403: Forbidden #647

Open
phaneshavr opened this issue Sep 17, 2020 · 22 comments · Fixed by berezovskyi/edx-dl#1

Comments

@phaneshavr

🚨Please review the Troubleshooting section
before reporting any issue. Don't forget also to check the current issues to
avoid duplicates.

Subject of the issue

I can download the list of courses successfully, but I cannot download any course. The attempt fails with urllib.error.HTTPError: HTTP Error 403: Forbidden.

Your environment

  • Operating System (name/version): Windows 10 Pro Version 2004 OS Build:19041.508
  • Python version: Python 3.8.5
  • youtube-dl version: 2020.09.14
  • edx-dl version: 0.1.13

Steps to reproduce

Tell us how to reproduce this issue. Please provide us the course URL, and the
specific subsection or unit if possible.
https://courses.edx.org/courses/course-v1:edX+edx201+1T2020/course/
The URL above is one example, but the problem is the same with any other enrolled course.

Expected behaviour

Tell us what should happen.
It should automatically download the course.

Actual behaviour

Tell us what happens instead. If the script fails, please copy the entire
output of the command or the stacktrace (don't forget to obfuscate your
username and password). If you cannot copy the exception, attach a screenshot.

edx_dl version 0.1.13
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
File "c:\users\user\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\user\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\User\AppData\Local\Programs\Python\Python38-32\Scripts\edx-dl.exe\__main__.py", line 7, in <module>
File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\edx_dl\edx_dl.py", line 1020, in main
all_selections = {selected_course:
File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\edx_dl\edx_dl.py", line 1021, in <dictcomp>
get_available_sections(selected_course.url.replace('info', 'course'),
File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\edx_dl\edx_dl.py", line 184, in get_available_sections
page = get_page_contents(url, headers)
File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\edx_dl\utils.py", line 58, in get_page_contents
result = urlopen(Request(url, None, headers))
File "c:\users\user\appdata\local\programs\python\python38-32\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "c:\users\user\appdata\local\programs\python\python38-32\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "c:\users\user\appdata\local\programs\python\python38-32\lib\urllib\request.py", line 640, in http_response
response = self.parent.error(
File "c:\users\user\appdata\local\programs\python\python38-32\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "c:\users\user\appdata\local\programs\python\python38-32\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "c:\users\user\appdata\local\programs\python\python38-32\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
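
The traceback above ends in get_page_contents, where urlopen(Request(url, None, headers)) raises the 403. As a rough sketch for narrowing this down, the call can be wrapped so that the status, reason, and the User-Agent that was sent are reported together. The helper name fetch_or_explain and its opener parameter are hypothetical and not part of edx-dl; only the urlopen(Request(url, None, headers)) call is taken from the traceback.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_or_explain(url, headers, opener=urlopen):
    """Return (text, None) on success, or (None, summary) if the server refuses.

    `opener` defaults to urlopen; it is a parameter only so the function can be
    tried without network access. This helper is hypothetical, not part of edx-dl.
    """
    try:
        with opener(Request(url, None, headers)) as result:
            return result.read().decode("utf-8"), None
    except HTTPError as err:
        # err.code and err.reason carry the status line, e.g. 403 and "Forbidden".
        sent_agent = headers.get("User-Agent", "<none>")
        summary = "HTTP {} {} for {} (User-Agent sent: {})".format(
            err.code, err.reason, url, sent_agent)
        return None, summary
```

Called with the same headers the tool builds, this would show which User-Agent accompanied the refused request, which is the value later comments in this thread suggest changing.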

@phaneshavr
Author

I already tried changing 'User-Agent': 'edX-downloader/0.01' to 'User-Agent': 'Mozilla/5.0' and to 'User-Agent': 'Chrome/85.0.4183.102', as suggested in #636 and #637, but it did not help and I keep getting the same error. With the '--list-courses' argument I can successfully download the list of my enrolled courses, but I cannot download any course and get the error above.

@let4be

let4be commented Sep 24, 2020

Same story... the default installation of edx-dl is no longer working

@JWChengRelax

Replying to @phaneshavr's comment above about changing the User-Agent as suggested in #636 and #637:

I get the same error.

@staticfloat

I can confirm I'm having the same error.

@liam-maps

Same here. Listing works OK, but downloading does not:
edx_dl version 0.1.13
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
File "/home/ubuntu/.local/bin/edx-dl", line 8, in <module>
sys.exit(main())
File "/home/ubuntu/.local/lib/python3.8/site-packages/edx_dl/edx_dl.py", line 1020, in main
all_selections = {selected_course:
File "/home/ubuntu/.local/lib/python3.8/site-packages/edx_dl/edx_dl.py", line 1021, in <dictcomp>
get_available_sections(selected_course.url.replace('info', 'course'),
File "/home/ubuntu/.local/lib/python3.8/site-packages/edx_dl/edx_dl.py", line 184, in get_available_sections
page = get_page_contents(url, headers)
File "/home/ubuntu/.local/lib/python3.8/site-packages/edx_dl/utils.py", line 58, in get_page_contents
result = urlopen(Request(url, None, headers))
File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/usr/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

@anantsinha

Same here.
Course Link: https://courses.edx.org/courses/course-v1:MITx+CTL.SC0x+2T2020/course/

Actual behaviour:

Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
File "/Users/Anant/opt/anaconda3/bin/edx-dl", line 8, in <module>
sys.exit(main())
File "/Users/Anant/opt/anaconda3/lib/python3.7/site-packages/edx_dl/edx_dl.py", line 1023, in main
for selected_course in selected_courses}
File "/Users/Anant/opt/anaconda3/lib/python3.7/site-packages/edx_dl/edx_dl.py", line 1023, in <dictcomp>
for selected_course in selected_courses}
File "/Users/Anant/opt/anaconda3/lib/python3.7/site-packages/edx_dl/edx_dl.py", line 184, in get_available_sections
page = get_page_contents(url, headers)
File "/Users/Anant/opt/anaconda3/lib/python3.7/site-packages/edx_dl/utils.py", line 58, in get_page_contents
result = urlopen(Request(url, None, headers))
File "/Users/Anant/opt/anaconda3/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/Users/Anant/opt/anaconda3/lib/python3.7/urllib/request.py", line 531, in open
response = meth(req, response)
File "/Users/Anant/opt/anaconda3/lib/python3.7/urllib/request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "/Users/Anant/opt/anaconda3/lib/python3.7/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/Users/Anant/opt/anaconda3/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/Users/Anant/opt/anaconda3/lib/python3.7/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

@VinuRajaKumar

Same error

edx_dl version 0.1.13
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\Vinu Raja Kumar C\AppData\Local\Programs\Python\Python36\Scripts\edx-dl.exe\__main__.py", line 7, in <module>
File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\site-packages\edx_dl\edx_dl.py", line 1023, in main
for selected_course in selected_courses}
File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\site-packages\edx_dl\edx_dl.py", line 1023, in <dictcomp>
for selected_course in selected_courses}
File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\site-packages\edx_dl\edx_dl.py", line 184, in get_available_sections
page = get_page_contents(url, headers)
File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\site-packages\edx_dl\utils.py", line 58, in get_page_contents
result = urlopen(Request(url, None, headers))
File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\urllib\request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\urllib\request.py", line 532, in open
response = meth(req, response)
File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\urllib\request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\urllib\request.py", line 570, in error
return self._call_chain(*args)
File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\urllib\request.py", line 504, in _call_chain
result = func(*args)
File "c:\users\vinu raja kumar c\appdata\local\programs\python\python36\lib\urllib\request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

@ChechkovEugene

Same for me

edx_dl version 0.1.13
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
File "/usr/local/bin/edx-dl", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/site-packages/edx_dl/edx_dl.py", line 1020, in main
all_selections = {selected_course:
File "/usr/local/lib/python3.8/site-packages/edx_dl/edx_dl.py", line 1021, in <dictcomp>
get_available_sections(selected_course.url.replace('info', 'course'),
File "/usr/local/lib/python3.8/site-packages/edx_dl/edx_dl.py", line 184, in get_available_sections
page = get_page_contents(url, headers)
File "/usr/local/lib/python3.8/site-packages/edx_dl/utils.py", line 58, in get_page_contents
result = urlopen(Request(url, None, headers))
File "/usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

@hasnain3142

hasnain3142 commented Oct 25, 2020

Same error

Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
  File "/home/beinghasnain16/.local/bin/edx-dl", line 8, in <module>
    sys.exit(main())
  File "/home/beinghasnain16/.local/lib/python3.6/site-packages/edx_dl/edx_dl.py", line 1023, in main
    for selected_course in selected_courses}
  File "/home/beinghasnain16/.local/lib/python3.6/site-packages/edx_dl/edx_dl.py", line 1023, in <dictcomp>
    for selected_course in selected_courses}
  File "/home/beinghasnain16/.local/lib/python3.6/site-packages/edx_dl/edx_dl.py", line 184, in get_available_sections
    page = get_page_contents(url, headers)
  File "/home/beinghasnain16/.local/lib/python3.6/site-packages/edx_dl/utils.py", line 58, in get_page_contents
    result = urlopen(Request(url, None, headers))
  File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

@jmfontana

Confirmed on 29-10-2020. I'm having the same problem even after trying solutions suggested in #636 #637. Can someone help?

@johanneswerner

johanneswerner commented Nov 1, 2020

EDIT: I have the same problem, see #652

edx_dl version 0.1.13
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
  File "/usr/bin/edx-dl", line 33, in <module>
    sys.exit(load_entry_point('edx-dl==0.1.13', 'console_scripts', 'edx-dl')())
  File "/usr/lib/python3.8/site-packages/edx_dl/edx_dl.py", line 1020, in main
    all_selections = {selected_course:
  File "/usr/lib/python3.8/site-packages/edx_dl/edx_dl.py", line 1021, in <dictcomp>
    get_available_sections(selected_course.url.replace('info', 'course'),
  File "/usr/lib/python3.8/site-packages/edx_dl/edx_dl.py", line 184, in get_available_sections
    page = get_page_contents(url, headers)
  File "/usr/lib/python3.8/site-packages/edx_dl/utils.py", line 58, in get_page_contents
    result = urlopen(Request(url, None, headers))
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

@jmfontana

see #652

Sorry Johannes but I don't see how this helps. I've checked #652 and I can't see any information there that can help us solve this problem.

@johanneswerner

johanneswerner commented Nov 1, 2020

@jmfontana My apologies, I wanted to report the same issue on a different operating system (Arch Linux, installed with the AUR package v. 0.1.13), but I first posted it in the wrong issue (#652). I edited my previous post to make it clearer.

@johanneswerner

The solution provided in #631 (comment) worked for me as well:

I'm not an author of the tool, but you can fix it by changing line 425 of edx_dl.py which specifies the User-Agent attribute of the http request header. Change 'User-Agent': 'edX-downloader/0.01', to 'User-Agent': 'Mozilla/5.0', and it will work.
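
As an illustration of what that one-line change amounts to, here is a minimal sketch that swaps the User-Agent entry in a header dict and builds the kind of Request seen in the tracebacks above. The helper names with_user_agent and build_request are hypothetical, and the header dict here holds only the User-Agent entry quoted in this thread; the real dict in edx_dl.py may contain more. Reports in this thread are mixed on whether the change is sufficient, so treat it as something to try rather than a confirmed fix.

```python
from urllib.request import Request

# Values quoted in this thread; everything else in this sketch is assumed.
DEFAULT_USER_AGENT = "edX-downloader/0.01"
BROWSER_USER_AGENT = "Mozilla/5.0"


def with_user_agent(headers, user_agent=BROWSER_USER_AGENT):
    """Return a copy of headers with the User-Agent entry replaced."""
    updated = dict(headers)
    updated["User-Agent"] = user_agent
    return updated


def build_request(url, headers):
    """Build a Request the same way the traceback shows: Request(url, None, headers)."""
    return Request(url, None, headers)


original = {"User-Agent": DEFAULT_USER_AGENT}
patched = with_user_agent(original)
request = build_request("https://courses.edx.org/", patched)

# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```

Copying the dict keeps the original headers untouched, which makes it easy to compare a request sent with the default value against one sent with the browser-like value.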

@ChechkovEugene

Replying to @johanneswerner's comment above (the User-Agent change from #631):

Yes, this fixed the 403 error. But now I get only empty folders in the downloaded course. Maybe this is a separate issue unrelated to the 403.

@johanneswerner

Replying to @ChechkovEugene, who reported that the User-Agent change from #631 fixed the 403 error but left only empty folders in the downloaded course:

Same situation here. It worked for one course (no idea why), but when I try others I get the same problem.

@MagTun

MagTun commented Nov 3, 2020

@ChechkovEugene and @johanneswerner, did you try this?

@gledguri

gledguri commented Nov 5, 2020

Same problem here, even though I tried the suggestions in #636 and #637. Is there a solution yet?

@ChechkovEugene

@ChechkovEugene and @johanneswerner, did you try this?

It's working, but at some point the error from https://github.com/l1ving/youtube-dl/issues/20 appears. Waiting for all the merges to finish.

@Learnpython-code

Hello everyone, I am new to Python. Please help me check my results: I don't get any videos, only empty folders.

Result

C:\edx-dl-master>python edx-dl.py -u (username) https://courses.edx.org/courses/coursev1:URosarioX+URX01+1T2020/course/
edx_dl version 0.1.13
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Downloading Diseño de sistemas de información gerencial para intranet con Microsoft Access [course-v1:URosarioX+URX01+1T2020/co]
Downloading 5 section(s)
Section 1: Generalidades
Acerca del curso
Section 2: Microsoft Access y Bases de Datos Relacionales
Conceptos básicos
Planear y crear una BDR
Evaluación
Section 3: Diseño de la interface - Consultas
Visualizar información
Modificar la BDR con consultas de acción
Interacción con otros programas
Evaluación
Section 4: Diseño de la interface - Formularios y macros
Ingresar datos a la BDR
Panel de control personalizado
Evaluación
Section 5: Diseño de la interface - Informes
Informes
Evaluación
Cierre
Extracting all units information in parallel.
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@ddbbb4394e4f4eeab571695c19842fc2'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@edcc3663b92546ee9f37d4868d05ba30'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@7a917180012346c8b7f1de5837729bbd'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@fdb672aa18b0485aa695419f493a5fd0'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@5b34eb36e50a4db6a9c4c53e719546cf'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@c78e301110b54cff8a8500c784e16d09'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@fcd257068abb4f588805db3a15e0ba06'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@9205182f4d2b46ec93fd6ff22d752fa6'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@f9a2c97a613a40169a01667bb6aca2be'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@30549607116847379bc57b4419084652'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@09f8ee9e32954914957494d87da8a4bc'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@674fda5e810440f190d849740e674cae'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@fe847e5e361b47a3a3efd82f480b2a4e'
Processing 'https://courses.edx.org/courses/course-v1:URosarioX+URX01+1T2020/jump_to/block-v1:URosarioX+URX01+1T2020+type@sequential+block@29c2dfb8e8294eed941ee3b576db59c8'
Removed 0 duplicated urls from 0 in total
Output directory: Downloaded

@jialinyi94

The same issues here.

Any progress?

@ndcroos

ndcroos commented Jan 2, 2021

I also have the same problem here, using the default install from pip.
