-
Notifications
You must be signed in to change notification settings - Fork 209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding HTML Scraping functionality #75
Conversation
…#71) * add test, handle Youtube quota expiry better (#68) * add sigint handler to handle Ctrl+C better * major version bump due to Python version change (now needs 3.6+) * move sentry around so that github action doesn;t complain when trying to fetch version for publishing * move sentry around * oops, moved signal to wrong place
Wow, @lucanello - this is fantastic, thank you very much. From a first glance it looks great and I love that your PR gives an option to scrape as well as to go via the API route. I'll take a deeper look this week. Once again, much thanks for the PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was about to jump on this since it bothered me as well, great implementation 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, @lucanello - thanks for your contribs. Overall looks good, except for some minor changes. I'm working on incorporating these changes, just wanted to pass the feedback to you
video_id = re.search("((\?v=)[a-zA-Z0-9_-]{4,15})", video[0]).group(0)[3:] | ||
print(video_id) | ||
return YOUTUBE_VIDEO_URL + video_id | ||
except: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blank except
are bad, as this will capture any and all exceptions - including the ones that are raised when Ctrl+C is pressed, so this makes it impossible to exit when the search is happening.
@@ -45,6 +45,10 @@ def check_for_tokens(): | |||
|
|||
Generate the key from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if --scrape
or -s
has been provided, then we don't need to print this
''') | ||
if args.scrape: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can move this check before printing "get dev key.."
else: | ||
secho(f"\t Search failed due to {error_domain}:{error_reason}, message: {error_message}") | ||
return None | ||
if error_reason == 'quotaExceeded' or error_reason == 'dailyLimitExceeded': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this intended? if there are multiple errors then it will repeatedly print the error message
@lucanello I have merged this into dev branch, will be out when the next release is cut., thanks for your contribs! |
I created a fix for the problem regarding the Google YouTube Search/Data API. It simply uses the normal search by requesting the HTML page, parsing and filtering the video IDs. There is also the possibility to fetch all the videos contained on the first search page and more informations like video title etc.
The problem with PR #73 is, that it's using client-side hardcoded authentication and removes the ability to still provide the official YouTube Data API. The authentication in his PR will lead to more frequent rate limitings because we're all using the same footprint.
Issues regarding this problem:
#60
Todo's:
The pro's are:
The con's are:
PS: I really like this repo! Thanks for making Spotify more open for everybody.