Fix parsing Soundcloud tracks that contain the term 'sets' #410

Merged: 1 commit, Oct 31, 2020

Scrxtchy (Contributor)

  • I carefully read the contribution guidelines and agree to them.
  • I have tested the API against NewPipe.
  • I agree to create a pull request for NewPipe as soon as possible to make it compatible with the changed API.

In current builds of NewPipe, if a track's URL contained the term sets anywhere, regardless of position, the track was treated as a playlist, causing a ParsingException.
Assuming that, in SoundCloud's URL scheme, the sets segment of a playlist URL is always enclosed in slashes (/sets/), checking for /sets/ instead should be an adequate solution.
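To illustrate the failure mode, here is a minimal sketch comparing the original condition with the one introduced in this PR. Only the two boolean expressions are taken from the diff in this PR; the class name, method names and example URLs are hypothetical.

```java
public class SetsCheckDemo {
    // Original condition: any occurrence of "sets" anywhere in the URL marks it as a playlist.
    static boolean oldIsPlaylist(String url) {
        return url.contains("sets") && !url.endsWith("sets") && !url.endsWith("sets/");
    }

    // Condition from this PR: "sets" must appear as a whole path segment, i.e. between slashes.
    static boolean newIsPlaylist(String url) {
        return url.contains("/sets/") && !url.endsWith("sets") && !url.endsWith("sets/");
    }

    public static void main(String[] args) {
        // Hypothetical example URLs, not taken from the PR.
        String[] urls = {
            "https://soundcloud.com/artist/sunsets-remix",   // track whose slug contains "sets"
            "https://soundcloud.com/artist/sets/my-playlist", // playlist
            "https://soundcloud.com/artist/sets",             // artist's sets overview page
        };
        for (String url : urls) {
            System.out.printf("%s old=%b new=%b%n", url, oldIsPlaylist(url), newIsPlaylist(url));
        }
    }
}
```

With the original condition the sunsets-remix track is misclassified as a playlist, which leads to the ParsingException; with the /sets/ condition it is treated as a track.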

Before the change, the following exception was thrown:


org.schabi.newpipe.extractor.exceptions.ParsingException: failed to find pattern ""uri":\s*"https:\/\/api\.soundcloud\.com\/playlists\/((\d)*?)"
	at org.schabi.newpipe.extractor.services.soundcloud.linkHandler.SoundcloudStreamLinkHandlerFactory.getId(SoundcloudStreamLinkHandlerFactory.java:37)
	at org.schabi.newpipe.extractor.linkhandler.LinkHandlerFactory.fromUrl(LinkHandlerFactory.java:57)
	at org.schabi.newpipe.extractor.linkhandler.LinkHandlerFactory.fromUrl(LinkHandlerFactory.java:48)
	at org.schabi.newpipe.extractor.StreamingService.getStreamExtractor(StreamingService.java:261)
	at org.schabi.newpipe.extractor.stream.StreamInfo.getInfo(StreamInfo.java:64)
	at org.schabi.newpipe.util.ExtractorHelper.lambda$getStreamInfo$3(ExtractorHelper.java:116)
	at org.schabi.newpipe.util.-$$Lambda$ExtractorHelper$5fJcha6Sq5APJBLdG6osaJby-mc.call(Unknown Source:4)
	at io.reactivex.internal.operators.single.SingleFromCallable.subscribeActual(SingleFromCallable.java:44)
	at io.reactivex.Single.subscribe(Single.java:3666)
	at io.reactivex.internal.operators.single.SingleDoOnSuccess.subscribeActual(SingleDoOnSuccess.java:35)
	at io.reactivex.Single.subscribe(Single.java:3666)
	at io.reactivex.internal.operators.maybe.MaybeFromSingle.subscribeActual(MaybeFromSingle.java:41)
	at io.reactivex.Maybe.subscribe(Maybe.java:4290)
	at io.reactivex.internal.operators.maybe.MaybeConcatArray$ConcatMaybeObserver.drain(MaybeConcatArray.java:153)
	at io.reactivex.internal.operators.maybe.MaybeConcatArray$ConcatMaybeObserver.request(MaybeConcatArray.java:78)
	at io.reactivex.internal.operators.flowable.FlowableElementAtMaybe$ElementAtSubscriber.onSubscribe(FlowableElementAtMaybe.java:66)
	at io.reactivex.internal.operators.maybe.MaybeConcatArray.subscribeActual(MaybeConcatArray.java:42)
	at io.reactivex.Flowable.subscribe(Flowable.java:14935)
	at io.reactivex.internal.operators.flowable.FlowableElementAtMaybe.subscribeActual(FlowableElementAtMaybe.java:36)
	at io.reactivex.Maybe.subscribe(Maybe.java:4290)
	at io.reactivex.internal.operators.maybe.MaybeToSingle.subscribeActual(MaybeToSingle.java:46)
	at io.reactivex.Single.subscribe(Single.java:3666)
	at io.reactivex.internal.operators.single.SingleSubscribeOn$SubscribeOnObserver.run(SingleSubscribeOn.java:89)
	at io.reactivex.Scheduler$DisposeTask.run(Scheduler.java:578)
	at io.reactivex.internal.schedulers.ScheduledRunnable.run(ScheduledRunnable.java:66)
	at io.reactivex.internal.schedulers.ScheduledRunnable.call(ScheduledRunnable.java:57)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:301)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1162)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:636)
	at java.lang.Thread.run(Thread.java:764)
Caused by: org.schabi.newpipe.extractor.utils.Parser$RegexException: failed to find pattern ""uri":\s*"https:\/\/api\.soundcloud\.com\/playlists\/((\d)*?)"
	at org.schabi.newpipe.extractor.utils.Parser.matchGroup(Parser.java:72)
	at org.schabi.newpipe.extractor.utils.Parser.matchGroup(Parser.java:61)
	at org.schabi.newpipe.extractor.utils.Parser.matchGroup1(Parser.java:52)
	at org.schabi.newpipe.extractor.services.soundcloud.SoundcloudParsingHelper.resolveIdWithEmbedPlayer(SoundcloudParsingHelper.java:157)
	at org.schabi.newpipe.extractor.services.soundcloud.linkHandler.SoundcloudStreamLinkHandlerFactory.getId(SoundcloudStreamLinkHandlerFactory.java:35)
	... 30 more


After the change, the same URL is parsed correctly.

TobiGr added the labels bug (Issue is related to a bug) and soundcloud (service, https://soundcloud.com/) on Oct 11, 2020
Stypox requested a review from wb9688 on October 11, 2020, 10:47
wb9688 (Contributor) left a comment


This is OK-ish, though ideally we would parse it through URL.

Edit: I think the !url.endsWith("sets") isn't needed anymore then though.

Scrxtchy (Contributor Author)

> ideally we would parse it through URL

I could look into changing the solution to use this method when I wake up in the morning. I'm unfamiliar with the codebase at large, so I just kept its current behaviour. It should not be a hassle to change.

Scrxtchy (Contributor Author) commented Oct 15, 2020

Interestingly enough, users are forbidden from naming their tracks sets.

Just debugging a new revision now.

You can, however, have a set named sets, so additional changes are needed to handle this.

Scrxtchy (Contributor Author)

It feels a bit strange jumping between two issues, but if the trailing slash is going to be removed anyway, simply checking whether the term "/sets/" occurs in the URL path would be an accurate test for a playlist URL. I am unsure how that other issue will be steered going forward.

@@ -153,7 +153,7 @@ public static String resolveIdWithEmbedPlayer(String url) throws IOException, Re
String response = NewPipe.getDownloader().get("https://w.soundcloud.com/player/?url="
+ URLEncoder.encode(url, "UTF-8"), SoundCloud.getLocalization()).responseBody();
// handle playlists / sets different and get playlist id via uir field in JSON
-if (url.contains("sets") && !url.endsWith("sets") && !url.endsWith("sets/"))
+if (url.contains("/sets/") && !url.endsWith("sets") && !url.endsWith("sets/"))
Contributor


Actually I think I was wrong above and that this should be changed to url.contains("/sets/") && !url.endsWith("/sets") && !url.endsWith("/sets/") so that it only doesn't get soundcloud.com/artist/sets URLs, while e.g. soundcloud.com/artist/track-ending-with-sets still works.
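A minimal sketch of the condition proposed above, evaluated on placeholder URLs modelled on the examples in the comment. The class name, method name and the playlist URL are hypothetical. Compared with the earlier !url.endsWith("sets"), anchoring both suffix tests on /sets means a playlist whose slug merely ends in sets is no longer rejected.

```java
public class ProposedSetsCheck {
    // Condition proposed in review: "/sets/" must appear, and the URL must not be
    // the artist's sets overview page (".../sets" or ".../sets/").
    static boolean isPlaylistUrl(String url) {
        return url.contains("/sets/") && !url.endsWith("/sets") && !url.endsWith("/sets/");
    }

    public static void main(String[] args) {
        // Placeholder URLs; none of them are real SoundCloud pages.
        String[] urls = {
            "https://soundcloud.com/artist/sets",                           // overview page
            "https://soundcloud.com/artist/sets/",                          // overview page, trailing slash
            "https://soundcloud.com/artist/track-ending-with-sets",         // ordinary track
            "https://soundcloud.com/artist/sets/playlist-ending-with-sets", // playlist
        };
        for (String url : urls) {
            System.out.println(url + " -> " + isPlaylistUrl(url));
        }
    }
}
```

Only the last URL is classified as a playlist; the overview page is excluded with or without a trailing slash, and the track is left alone.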

Contributor Author


> so that it only doesn't get soundcloud.com/artist/sets

Yeah, this seems to work, but it will still error out on a widget-related issue, similar to #412 (comment), where it lacks data. Otherwise it seems to detect it as a channel-type request just fine.

Contributor Author


With the changes in f803558, URLs have their trailing slash removed, so it is now impossible for a URL to end in sets/. However, I am struggling to decide whether the check for URLs ending in sets is still needed when /sets/ is our primary check; this has genuinely caused quite a headache. I have left it in for safety, but if you think it can be removed, that won't be a hassle.
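The question above can be checked mechanically. The sketch below applies the review's condition after removing one trailing slash and compares it with a reduced check that only tests for /sets/. The class and helper names are hypothetical, and the slug of a set named sets being sets is an assumption. Once the slash is gone, contains("/sets/") is already false for the sets overview page, so the endsWith("/sets") test does not change the result there; in practice it only matters for URLs whose last path segment is sets, such as a set named sets.

```java
public class RedundancyCheck {
    // Hypothetical helper: drop a single trailing '/' from the URL.
    static String stripTrailingSlash(String url) {
        return url.endsWith("/") ? url.substring(0, url.length() - 1) : url;
    }

    // Full condition from the review, applied after normalisation.
    static boolean withEndsWith(String url) {
        String u = stripTrailingSlash(url);
        return u.contains("/sets/") && !u.endsWith("/sets") && !u.endsWith("/sets/");
    }

    // Reduced condition: only the "/sets/" test.
    static boolean containsOnly(String url) {
        return stripTrailingSlash(url).contains("/sets/");
    }

    public static void main(String[] args) {
        String[] urls = {
            "https://soundcloud.com/artist/sets/",       // overview page: both false
            "https://soundcloud.com/artist/sets/my-set", // playlist: both true
            "https://soundcloud.com/artist/sets/sets",   // set assumed to have the slug "sets": results differ
        };
        for (String url : urls) {
            System.out.println(url + " withEndsWith=" + withEndsWith(url)
                    + " containsOnly=" + containsOnly(url));
        }
    }
}
```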

Scrxtchy (Contributor Author)

Based on #410 (review), I have changed the code to parse the URL as a Java URL and to throw a malformed URL exception, consistent with how this is handled elsewhere in the application.
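A minimal sketch of the URL-based approach, assuming the check only needs the path component: java.net.URL.getPath() excludes the query string, so a track link carrying a parameter such as ?in=artist/sets/my-set is not mistaken for a playlist. The class and method names are hypothetical, and here the MalformedURLException simply propagates; the PR's actual code may wrap or handle it differently.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlPathCheck {
    // Hypothetical helper: parse with java.net.URL and test only the path,
    // so query parameters cannot trigger the playlist check.
    static boolean isPlaylistPath(String url) throws MalformedURLException {
        String path = new URL(url).getPath();
        if (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1); // drop a single trailing slash
        }
        return path.contains("/sets/");
    }

    public static void main(String[] args) throws MalformedURLException {
        System.out.println(isPlaylistPath("https://soundcloud.com/artist/sets/my-set"));               // true
        System.out.println(isPlaylistPath("https://soundcloud.com/artist/track?in=artist/sets/my-set")); // false
        try {
            isPlaylistPath("not a url"); // no protocol, so java.net.URL rejects it
        } catch (MalformedURLException e) {
            System.out.println("malformed: " + e.getMessage());
        }
    }
}
```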

Scrxtchy (Contributor Author)

Much like the changes made in #414, these also need to be applied to the parts of the code related to this PR, which I noticed when returning to this issue. With some extra time I would hopefully be able to test this more broadly within the extractor, but this is already getting a bit exhausting.

Stypox (Member) left a comment


Thank you! Is this ready?

@@ -21,6 +21,8 @@ public static SoundcloudChannelLinkHandlerFactory getInstance() {
@Override
public String getId(String url) throws ParsingException {
Utils.checkUrl(URL_PATTERN, url);
// Remove the tailing slash from URLs due to issues with the SoundCloud API
if (url.charAt(url.length() -1) == '/') url = url.substring(0, url.length()-1);
Member


Why not move this inside resolveIdWithEmbedPlayer?

Contributor Author


I think I was just mimicking what I did in 6a70cb9, but I think I can move everything there instead.

I don't really remember my branches that well anymore, but I think I've got it.

Stypox (Member) left a comment


Could you add some tests for this in soundcloud link handler tests?

Scrxtchy (Contributor Author)

Test for a URL with a trailing slash:

assertEquals("259273264", linkHandler.fromUrl("https://soundcloud.com/liluzivert/ps-qs-produced-by-don-cannon/").getId());

Test for a track URL containing sets; this one also has the benefit of not relying on another artist as a source:

assertEquals("339401339", linkHandler.fromUrl("https://soundcloud.com/liluzivert/for-real?in=liluzivert/sets/luv-is-rage-2-1").getId());

Scrxtchy (Contributor Author)

Wait, that's not going to test the sets URL correctly at all. I remembered why that URL parses: it only does so because the query parameters are dropped. I think I'm going to need another artist as a source for that one.

Scrxtchy (Contributor Author)

Would the sample used in the OP be acceptable for the repo? I couldn't find anything suitable from users such as trapcity and nocopyrightsounds; these would have been preferable so as not to add another account to depend on.

B0pol (Member) commented Oct 27, 2020

> Would the sample used in the OP be acceptable for the repo? I couldn't find anything suitable from users such as trapcity and nocopyrightsounds; these would have been preferable so as not to add another account to depend on.

https://soundcloud.com/kechuspider-sets-1
Choose any track here

Scrxtchy (Contributor Author)

Thanks! 947ce3e should be the final version.

Stypox (Member) left a comment


Thank you! Looks good

Stypox merged commit 6cc50b5 into TeamNewPipe:dev on Oct 31, 2020