Introduce V3.0 #39

Merged
merged 38 commits into from
Aug 10, 2021

TRoboto
Owner

@TRoboto TRoboto commented Aug 10, 2021

Changes

This PR introduces major changes to the library. These changes are summarized as follows:

  • Switched the back end to Selenium to bypass Cloudflare's IUAM protection system. The headed version is used, so a Chrome browser window will appear while the tool runs; the headless version still cannot bypass Cloudflare (see "UC is detected when using headless mode", ultrafunkamsterdam/undetected-chromedriver#258). requests is still used to download materials (slides, videos, ...).
  • Improved error messages.
  • Allowed non-subscribers to download their completed courses/tracks.
  • Courses/Tracks are now printed on the fly.
  • ID and row number columns are now merged into one column.
  • Code is now formatted with black and isort.
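The first bullet splits the work between a browser session (login) and requests (downloads). One way such a hand-off could look is sketched below. It is an illustration under assumptions, not the code in this PR: the helper name cookies_for_requests and the sample cookie names are hypothetical, and the input is assumed to be the list of dicts that Selenium's driver.get_cookies() returns.

```python
from typing import Dict, Iterable, Mapping


def cookies_for_requests(
    browser_cookies: Iterable[Mapping[str, object]], site: str
) -> Dict[str, str]:
    """Reduce browser cookies to a name -> value dict for one site.

    `browser_cookies` is assumed to look like Selenium's
    driver.get_cookies() output: dicts with "name", "value", "domain".
    """
    selected: Dict[str, str] = {}
    for cookie in browser_cookies:
        # Cookie domains may carry a leading dot (".example.com").
        domain = str(cookie.get("domain", "")).lstrip(".")
        # Accept the site itself or any subdomain, but not look-alikes.
        if domain == site or domain.endswith("." + site):
            selected[str(cookie["name"])] = str(cookie["value"])
    return selected


# Hypothetical cookies, only to show the shape of the data.
sample = [
    {"name": "session", "value": "abc", "domain": ".datacamp.com"},
    {"name": "tracker", "value": "xyz", "domain": "ads.example.org"},
]
print(cookies_for_requests(sample, "datacamp.com"))
```

The resulting dict could then be passed as the cookies argument of requests.get, so that downloads reuse the session established in the browser.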

I don't think I missed any big changes. I would like to hear comments about this before merging. Any feedback would be greatly appreciated.

Test

To test the changes, it is recommended to create a new virtual environment first, then install the v3.0 branch with:

pip install git+https://github.com/TRoboto/datacamp-downloader.git@v3.0

Now run datacamp as per the README file.

TODO before merging

  • Update docs
  • Add simple tests

@jorritvm

Hey,
I did some functional testing of the V3 update:

The refactored login with Selenium now works for me. I tried it using set-token:

D:\dev\python\datacamp_downloader_3\venv\Scripts
(venv) λ datacamp set-token ...
INFO: Hi, Jorrit
INFO: Active subscription found

The courses command seems to work too, although many of my completed courses are no longer provided by DataCamp:

D:\dev\python\datacamp_downloader_3\venv\Scripts
(venv) λ datacamp courses
+-----+--------+------------------------------------------+------------+------------+------------+
| #   | ID     | Title                                    | Datasets   | Exercises  | Videos     |
+-----+--------+------------------------------------------+------------+------------+------------+
| 1   | 735    | Introduction to Python                   | 2          | 46         | 11         |
+-----+--------+------------------------------------------+------------+------------+------------+
| 2   | 58     | Introduction to R                        | 0          | 62         | 0          |
+-----+--------+------------------------------------------+------------+------------+------------+
| 3   | 799    | Intermediate Python                      | 3          | 69         | 18         |
+-----+--------+------------------------------------------+------------+------------+------------+
| 4   | 1532   | Python Data Science Toolbox (Part 1)     | 1          | 34         | 12         |
+-----+--------+------------------------------------------+------------+------------+------------+
| 5   | 672    | Intermediate R                           | 0          | 67         | 14         |
+-----+--------+------------------------------------------+------------+------------+------------+
| 6   | 22639  | Joining Data with pandas                 | 23         | 37         | 15         |
+-----+--------+------------------------------------------+------------+------------+------------+
| 7   | 4914   | Introduction to the Tidyverse            | 1          | 34         | 16         |
+-----+--------+------------------------------------------+------------+------------+------------+
| 8   | 1531   | Python Data Science Toolbox (Part 2)     | 2          | 34         | 12         |
+-----+--------+------------------------------------------+------------+------------+------------+
| 9   | 1607   | Introduction to Importing Data in Python | 9          | 39         | 15         |
+-----+--------+------------------------------------------+------------+------------+------------+
| 10  | 13369  | Writing Efficient Python Code            | 1          | 38         | 15         |
+-----+--------+------------------------------------------+------------+------------+------------+
| 11  | 24364  | Cleaning Data in Python                  | 5          | 31         | 13         |
+-----+--------+------------------------------------------+------------+------------+------------+
| 12  | 1606   | Intermediate Importing Data in Python    | 3          | 22         | 7          |
+-----+--------+------------------------------------------+------------+------------+------------+
| 13  | 24558  | Object-Oriented Programming in Python    | 0          | 31         | 13         |
+-----+--------+------------------------------------------+------------+------------+------------+
| 14  | 15876  | Writing Functions in Python              | 0          | 31         | 15         |
+-----+--------+------------------------------------------+------------+------------+------------+
ERROR: Cannot get course with id: 1008.
| 15  | 5355   | Introduction to Git                      | 0          | 46         | 0          |
+-----+--------+------------------------------------------+------------+------------+------------+
ERROR: Cannot get course with id: 774.
ERROR: Cannot get course with id: 723.
| 16  | 16719  | Streamlined Data Ingestion with pandas   | 3          | 37         | 16         |
+-----+--------+------------------------------------------+------------+------------+------------+
| 17  | 7355   | Web Scraping in Python                   | 1          | 39         | 17         |
+-----+--------+------------------------------------------+------------+------------+------------+
ERROR: Cannot get course with id: 753.
| 18  | 616    | Data Analysis in R, the data.table Way   | 0          | 27         | 10         |
+-----+--------+------------------------------------------+------------+------------+------------+
| 19  | 15974  | Unit Testing for Data Science in Python  | 0          | 38         | 17         |
+-----+--------+------------------------------------------+------------+------------+------------+
ERROR: Cannot get course with id: 944.
| 20  | 13203  | Software Engineering for Data Scientists | 0          | 36         | 15         |
|     |        | in Python                                |            |            |            |
+-----+--------+------------------------------------------+------------+------------+------------+
| 21  | 20680  | Building Web Applications with Shiny in  | 4          | 45         | 16         |
|     |        | R                                        |            |            |            |
+-----+--------+------------------------------------------+------------+------------+------------+
| 22  | 5323   | Data Manipulation with data.table in R   | 0          | 44         | 15         |
+-----+--------+------------------------------------------+------------+------------+------------+
| 23  | 25074  | Reshaping Data with pandas               | 4          | 37         | 15         |
+-----+--------+------------------------------------------+------------+------------+------------+
| 24  | 1143   | Time Series Analysis in R                | 0          | 42         | 16         |
+-----+--------+------------------------------------------+------------+------------+------------+
| 25  | 14630  | Writing Efficient Code with pandas       | 3          | 31         | 14         |
+-----+--------+------------------------------------------+------------+------------+------------+
ERROR: Cannot get course with id: 1057.
| 26  | 5882   | Joining Data with data.table in R        | 8          | 34         | 13         |
+-----+--------+------------------------------------------+------------+------------+------------+

I was able to download course 1:

D:\dev\python\datacamp_downloader_3\venv\Scripts
(venv) λ datacamp download 1
INFO: [1/1] Start to download (1) Introduction to R

Downloading [chapter 0] [==================================================] 100%
Downloading [chapter 3] [==================================================] 100%
Downloading [chapter 4] [==================================================] 100%
Downloading [chapter 5] [================                                  ] 32%
Aborted!

However, other courses failed, some worse than others...

D:\dev\python\datacamp_downloader_3\venv\Scripts
(venv) λ datacamp download 21
ERROR: Cannot get course with id: 21.

D:\dev\python\datacamp_downloader_3\venv\Scripts
(venv) λ datacamp download 13
ERROR: Cannot get course with id: 13.

D:\dev\python\datacamp_downloader_3\venv\Scripts
(venv) λ datacamp download 22
INFO: [1/1] Start to download (22) None
Traceback (most recent call last):
  File "C:\Program Files\Python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\dev\python\datacamp_downloader_3\venv\Scripts\datacamp.exe\__main__.py", line 7, in <module>
  File "d:\dev\python\datacamp_downloader_3\venv\lib\site-packages\typer\main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "d:\dev\python\datacamp_downloader_3\venv\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "d:\dev\python\datacamp_downloader_3\venv\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "d:\dev\python\datacamp_downloader_3\venv\lib\site-packages\click\core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "d:\dev\python\datacamp_downloader_3\venv\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "d:\dev\python\datacamp_downloader_3\venv\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "d:\dev\python\datacamp_downloader_3\venv\lib\site-packages\typer\main.py", line 497, in wrapper
    return callback(**use_params)  # type: ignore
  File "d:\dev\python\datacamp_downloader_3\venv\lib\site-packages\datacamp_downloader\downloader.py", line 154, in download
    datacamp.download(
  File "d:\dev\python\datacamp_downloader_3\venv\lib\site-packages\datacamp_downloader\datacamp_utils.py", line 44, in wrapper
    return f(*args, **kwargs)
  File "d:\dev\python\datacamp_downloader_3\venv\lib\site-packages\datacamp_downloader\datacamp_utils.py", line 216, in download
    self.download_course(material, path, **kwargs)
  File "d:\dev\python\datacamp_downloader_3\venv\lib\site-packages\datacamp_downloader\datacamp_utils.py", line 248, in download_course
    index + correct_path(course.slug or course.title.lower().replace(" ", "-"))
AttributeError: 'NoneType' object has no attribute 'lower'
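The traceback above shows course.title being None when the folder name is built. A defensive version of that step might look like the sketch below. The function name course_folder_name, the "course-<id>" fallback, and the character filter are assumptions for illustration; the library's own correct_path helper is not reproduced here.

```python
import re
from typing import Optional


def course_folder_name(
    slug: Optional[str], title: Optional[str], course_id: int
) -> str:
    """Build a folder name without assuming slug or title is set."""
    if slug:
        raw = slug
    elif title:
        raw = title.lower().replace(" ", "-")
    else:
        # Neither field is available: fall back to the ID (assumed policy).
        raw = f"course-{course_id}"
    # Drop characters that are not allowed in Windows file names.
    return re.sub(r'[<>:"/\\|?*]', "", raw)


print(course_folder_name(None, None, 22))
print(course_folder_name(None, "Introduction to R", 58))
```

Falling back to the ID keeps the download going; failing early with a clear "Cannot get course" error would be an equally reasonable choice.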

@TRoboto
Owner Author

TRoboto commented Aug 10, 2021

Thank you for the thorough functional test. The download command expects the ID of the course(s), not the sequential row number. Please try the download command again with the course ID and let me know the result. I will look into showing a warning for that.

If you completed the old courses and were able to download them before, I think this should be fine. Otherwise, I will have to see where the problem lies.
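A minimal sketch of the warning mentioned above, assuming the tool keeps the list of course IDs in the order they were printed. The name resolve_course_id and the rule that an exact ID match wins over a row number are assumptions, not the behaviour of this PR.

```python
from typing import List, Optional


def resolve_course_id(arg: int, listed_ids: List[int]) -> Optional[int]:
    """Map a user argument to a course ID.

    An exact ID match wins; otherwise the argument is treated as a
    1-based row number and a warning is printed.
    """
    if arg in listed_ids:
        return arg
    if 1 <= arg <= len(listed_ids):
        course_id = listed_ids[arg - 1]
        print(f"WARNING: {arg} is not a course ID; using row {arg} (ID {course_id}).")
        return course_id
    # Neither a known ID nor a valid row number.
    return None


ids = [735, 58, 799]  # first three IDs from the table in this thread
print(resolve_course_id(58, ids))
print(resolve_course_id(2, ids))
print(resolve_course_id(21, ids))
```

Because an argument could in principle be both a valid ID and a valid row number, the sketch has to pick one interpretation; here the ID takes precedence.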

@jorritvm

Hi,

OK, I should have been more careful and mindful of your documentation when doing my test.
It is a bit weird, though, that it accepted 1 as an ID even though none of the courses on my completed list has this ID.
I tried it with the proper course ID and it works for me.

D:\dev\python\datacamp_downloader_3\venv\Scripts
(venv) λ datacamp download 20680
INFO: [1/1] Start to download (20680) Building Web Applications with Shiny in R
Downloading [datasets] [==================================================] 100%
Downloading [chapter1.pdf] [==================================================] 100%
Downloading [ch1_1.mp4] [==================================================] 100%
Downloading [ch1_2.mp4] [==================================================] 100%
Downloading [ch1_3.mp4] [==================================================] 100%
Downloading [chapter 1] [==================================================] 100%
Downloading [chapter2.pdf] [==================================================] 100%
Downloading [ch2_1.mp4] [==================================================] 100%
Downloading [ch2_2.mp4] [==================================================] 100%
Downloading [ch2_3.mp4] [==================================================] 100%
Downloading [ch2_4.mp4] [==================================================] 100%
Downloading [chapter 2] [==================================================] 100%
Downloading [chapter3.pdf] [==================================================] 100%
Downloading [ch3_1.mp4] [==================================================] 100%
Downloading [ch3_2.mp4] [==================================================] 100%
Downloading [ch3_3.mp4] [==================================================] 100%
Downloading [ch3_4.mp4] [==================================================] 100%
Downloading [chapter 3] [==================================================] 100%
Downloading [chapter4.pdf] [==================================================] 100%
Downloading [ch4_1.mp4] [==================================================] 100%
Downloading [ch4_2.mp4] [==================================================] 100%
Downloading [ch4_3.mp4] [==================================================] 100%
Downloading [ch4_4.mp4] [==================================================] 100%
Downloading [ch4_5.mp4] [==================================================] 100%
Downloading [chapter 4] [==================================================] 100%

I have found that course videos, exercises, markdown files, and datasets are all downloaded successfully.

Additionally, I tested the tracks command; it seems to work for me too.

D:\dev\python\datacamp_downloader_3\venv\Scripts
(venv) λ datacamp tracks
+-----+--------+------------------------------------------+------------+
| #   | ID     | Title                                    | Courses    |
+-----+--------+------------------------------------------+------------+
| 1   | t1     | R Programming                            | 2          |
+-----+--------+------------------------------------------+------------+
| 2   | t2     | Python Fundamentals                      | 4          |
+-----+--------+------------------------------------------+------------+
| 3   | t3     | Importing & Cleaning Data  with Python   | 5          |
+-----+--------+------------------------------------------+------------+
| 4   | t4     | Python Programming                       | 6          |
+-----+--------+------------------------------------------+------------+

@TRoboto
Owner Author

TRoboto commented Aug 10, 2021

No worries, that's a problem on my end; there is a bug that will be fixed soon. I will also make the download command use the sequential numbering by default, which is more convenient in my opinion, and will remove the ID column.

Once again thank you for the tests.

Collaborator

@mohammad-albarham mohammad-albarham left a comment

LGTM overall, it's a great tool. One thing: I encountered this error when trying to log in for the first time.

ERROR: Message: stale element reference: element is not attached to the page document
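A stale element reference usually means the page re-rendered between locating an element and using it. A common mitigation is to locate and use the element again on each attempt. The sketch below shows a generic retry helper; it is not taken from this PR, and StaleElementError is a local stand-in so the example runs without a browser. With Selenium, one would pass selenium.common.exceptions.StaleElementReferenceException instead.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class StaleElementError(Exception):
    """Stand-in for Selenium's StaleElementReferenceException."""


def retry_on(
    action: Callable[[], T],
    exceptions: Tuple[Type[BaseException], ...] = (StaleElementError,),
    attempts: int = 3,
    delay: float = 0.5,
) -> T:
    """Call `action` until it succeeds or `attempts` is exhausted.

    `action` should locate the element AND use it, so that every
    attempt works with a fresh reference.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            time.sleep(delay)
    raise AssertionError("unreachable")


# Simulated login step that fails twice before succeeding.
state = {"calls": 0}


def click_login() -> str:
    state["calls"] += 1
    if state["calls"] < 3:
        raise StaleElementError("element is not attached to the page document")
    return "logged in"


print(retry_on(click_login, delay=0))
```

Re-raising after the last attempt keeps the original error visible if the page never settles.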

@TRoboto TRoboto merged commit b2a7f69 into master Aug 10, 2021
@TRoboto TRoboto deleted the v3.0 branch August 11, 2021 00:27

3 participants