
Adding HTML Scraping functionality #75

Merged
merged 3 commits into SathyaBhat:dev on May 2, 2020

Conversation

lucanello
Contributor

I created a fix for the problem regarding the Google YouTube Search/Data API. It simply uses the normal search by requesting the HTML page, then parsing out and filtering the video IDs. It is also possible to fetch all the videos contained on the first search page, along with more information such as the video title.

The problem with PR #73 is that it uses hardcoded, client-side authentication and removes the ability to still use the official YouTube Data API. The authentication in that PR will lead to more frequent rate limiting because we would all be using the same footprint.

Issues regarding this problem:
#60

Todos:

  • Add correct exception handling
  • Test how often these requests work and possibly add a rate-limiting handler

The pros are:

  • No need to specify a Google API key (please check whether my removal of the boundaries works)
  • Possibility to download much bigger playlists

The cons are:

  • Data is not nicely structured
  • You could be blocked by Google (a VPN may help)

PS: I really like this repo! Thanks for making Spotify more open for everybody.
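
For illustration, a minimal sketch of the scraping idea described above could look like the following. It assumes the requests library, the standard YouTube search URL, and a simplified watch?v= regex; the names and details are illustrative only and are not the PR's actual code.

# Hedged sketch of the HTML-scraping approach (illustrative, not the PR's code).
import re
import requests

YOUTUBE_SEARCH_URL = "https://www.youtube.com/results"
YOUTUBE_VIDEO_URL = "https://www.youtube.com/watch?v="


def scrape_first_video_url(query):
    """Return the watch URL for the first video ID found in the search HTML."""
    response = requests.get(YOUTUBE_SEARCH_URL, params={"search_query": query})
    response.raise_for_status()
    # Video IDs appear in the page markup as "watch?v=<11-character id>".
    match = re.search(r"watch\?v=([a-zA-Z0-9_-]{11})", response.text)
    if match is None:
        return None
    return YOUTUBE_VIDEO_URL + match.group(1)


if __name__ == "__main__":
    print(scrape_first_video_url("Rick Astley Never Gonna Give You Up"))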

SathyaBhat and others added 3 commits March 29, 2020 11:02
…#71)

* add test, handle Youtube quota expiry better (#68)

* add sigint handler to handle Ctrl+C better

* major version bump due to Python version change (now needs 3.6+)

* move sentry around so that github action doesn't complain when trying to fetch version for publishing

* move sentry around

* oops, moved signal to wrong place
@SathyaBhat
Owner

Wow, @lucanello - this is fantastic, thank you very much. At first glance it looks great, and I love that your PR gives the option to scrape as well as to go via the API route. I'll take a deeper look this week.

Once again, much thanks for the PR

@opiumozor left a comment

Was about to jump on this since it bothered me as well, great implementation 👍

@SathyaBhat added this to the 4.0 milestone on Apr 25, 2020
@SathyaBhat added this to In progress in Spotify DL via automation on Apr 25, 2020
@SathyaBhat self-assigned this on Apr 25, 2020
@SathyaBhat modified the milestones: 4.0, 5.0 on Apr 25, 2020
Owner

@SathyaBhat left a comment

Hey, @lucanello - thanks for your contributions. Overall it looks good, except for some minor changes. I'm working on incorporating these changes; I just wanted to pass the feedback on to you.

video_id = re.search("((\?v=)[a-zA-Z0-9_-]{4,15})", video[0]).group(0)[3:]
print(video_id)
return YOUTUBE_VIDEO_URL + video_id
except:

Bare excepts are bad, as they will capture any and all exceptions, including the ones raised when Ctrl+C is pressed, which makes it impossible to exit while the search is happening.
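
A hedged sketch of the kind of change being asked for, with names assumed from the snippet above rather than taken from the PR: catch only the failures expected from the regex lookup, so that KeyboardInterrupt from Ctrl+C still propagates (a bare except catches BaseException, which includes KeyboardInterrupt).

# Illustration only: narrow the bare except to the errors we actually expect.
import re

YOUTUBE_VIDEO_URL = "https://www.youtube.com/watch?v="


def extract_video_url(video):
    try:
        video_id = re.search(r"((\?v=)[a-zA-Z0-9_-]{4,15})", video[0]).group(0)[3:]
        return YOUTUBE_VIDEO_URL + video_id
    except (AttributeError, IndexError):
        # AttributeError: re.search found no match; IndexError: video was empty.
        return None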

@@ -45,6 +45,10 @@ def check_for_tokens():

Generate the key from

If --scrape or -s has been provided, then we don't need to print this.

''')
if args.scrape:

We can move this check before printing "get dev key..".
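
Roughly what that reordering could look like, as a sketch only (the args object, environment variable name, and hint text are assumptions, not the project's actual code): return early when --scrape is set, so the dev-key instructions are never printed in that case.

# Sketch only: check the scrape flag before printing the API-key instructions.
import os


def check_for_tokens(args):
    if args.scrape:
        # Scraping does not need a YouTube Data API key, so skip the hint.
        return True
    if not os.environ.get("YOUTUBE_DEV_KEY"):
        print("YOUTUBE_DEV_KEY is missing. Generate the key from the Google "
              "developer console and export it before running.")
        return False
    return True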

else:
secho(f"\t Search failed due to {error_domain}:{error_reason}, message: {error_message}")
return None
if error_reason == 'quotaExceeded' or error_reason == 'dailyLimitExceeded':

Was this intended? If there are multiple errors, it will repeatedly print the error message.
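
One possible shape of a fix, sketched with assumed names and click's secho for the colored output (not the PR's code): collect the error reasons first, exit once on a quota error, and otherwise print the failure message a single time instead of once per error entry.

# Sketch only: report the failure once rather than once per error entry.
from click import secho


def report_search_errors(errors):
    reasons = {error.get("reason") for error in errors}
    if reasons & {"quotaExceeded", "dailyLimitExceeded"}:
        secho("\t YouTube API quota exhausted, cannot continue the search", fg="red")
        raise SystemExit(1)
    if errors:
        first = errors[0]
        # Only the first error is printed; the rest would repeat the same message.
        secho(f"\t Search failed due to {first.get('domain')}:{first.get('reason')}, "
              f"message: {first.get('message')}")
    return None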

@SathyaBhat changed the base branch from master to dev on May 2, 2020 at 19:00
@SathyaBhat merged commit f19386a into SathyaBhat:dev on May 2, 2020
Spotify DL automation moved this from In progress to Done May 2, 2020
@SathyaBhat
Owner

SathyaBhat commented May 2, 2020

@lucanello I have merged this into the dev branch; it will be out when the next release is cut. Thanks for your contributions!
