[YouTube] Throttling parameter decryption is broken, decrypt function is not again fully extracted #902

Closed
AudricV opened this issue Aug 18, 2022 · 13 comments · Fixed by #905
Labels
bug youtube service, https://www.youtube.com/

Comments

@AudricV
Member

AudricV commented Aug 18, 2022

With player 1f7d5369, decryption of the throttling parameter fails because the function is, once again, not fully extracted:

[Image: function_n_parameter_not_extracted_fully]

Left: what is extracted by the extractor; right: the real function

The extractor still works, because this time the exception is caught properly.

@AudricV AudricV added bug youtube service, https://www.youtube.com/ labels Aug 18, 2022
@Theta-Dev
Contributor

I just noticed the same issue. This time regex literals are to blame:

/,,[/,913,/](,)}/,

Avoiding these is not as easy as avoiding braces in strings. We can't simply treat slashes like quotes, because regex character ranges can contain slashes.
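To make the failure mode concrete, here is a minimal sketch of a plain brace counter that has no notion of regex literals. It is not NewPipeExtractor's code; the function name `naive_brace_match` and the sample input are made up for illustration, and only the regex fragment is taken from the comment above.

```rust
/// Naive extraction: count `{` and `}` with no notion of strings or regex
/// literals. Returns the slice from the first `{` up to and including the
/// `}` that brings the nesting depth back to zero.
/// (Hypothetical helper, written only to illustrate the problem.)
fn naive_brace_match(js: &str) -> Option<&str> {
    let start = js.find('{')?;
    let mut depth = 0usize;
    for (i, c) in js[start..].char_indices() {
        match c {
            '{' => depth += 1,
            '}' => {
                // The slice starts at `{`, so depth is at least 1 here.
                depth -= 1;
                if depth == 0 {
                    return Some(&js[start..start + i + 1]);
                }
            }
            _ => {}
        }
    }
    None
}
```

Given an input such as `function(a){var b=[/,,[/,913,/](,)}/,1];return b.length}`, the counter stops at the `}` inside the regex literal and returns only `{var b=[/,,[/,913,/](,)}`, which is the kind of truncated function shown in the screenshot.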

@Theta-Dev
Contributor

At this point, wouldn't it be the best solution to use an actual JavaScript lexer to extract the function?

@SamantazFox

At this point, wouldn't it be the best solution to use an actual JavaScript lexer to extract the function?

Yep, that seems like the only reasonable option to me. And I'm pretty sure that functions will get harder and harder to parse as time goes on.

@Theta-Dev
Contributor

I am currently working on a YouTube downloader/client library in Rust (that's how I noticed the issue).
So I wrote a test implementation of the fix for it, using the ress lexer.

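use anyhow::{anyhow, Result};

// Returns the source text from the identifier `name` (when it is followed by `=`)
// through the closing brace that balances the first opening brace after it.
fn extract_js_fn(js: &str, name: &str) -> Result<String> {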
    let scan = ress::Scanner::new(js);
    let mut state = 0;
    let mut level = 0;

    let mut start = 0;
    let mut end = 0;

    for item in scan {
        let it = item?;
        let token = it.token;
        match state {
            // Looking for fn name
            0 => {
                if token.matches_ident_str(name) {
                    state = 1;
                    start = it.span.start;
                }
            }
            // Looking for equals
            1 => {
                if token.matches_punct(ress::tokens::Punct::Equal) {
                    state = 2;
                } else {
                    state = 0;
                }
            }
            // Looking for begin/end braces
            2 => {
                if token.matches_punct(ress::tokens::Punct::OpenBrace) {
                    level += 1;
                } else if token.matches_punct(ress::tokens::Punct::CloseBrace) {
                    level -= 1;

                    if level == 0 {
                        end = it.span.end;
                        state = 3;
                        break;
                    }
                }
            }
            _ => break,
        };
    }

    if state != 3 {
        return Err(anyhow!("could not extract js fn"));
    }

    Ok(js[start..end].to_owned())
}

This works fine with the new player.js.
And it looks like Mozilla Rhino, the JS interpreter we are using, has an API for its parser, so it should be possible to implement this for NewPipe without additional dependencies.

https://javadoc.io/doc/org.mozilla/rhino/latest/index.html
http://ramkulkarni.com/blog/understanding-ast-created-by-mozilla-rhino-parser/

@pukkandan

pukkandan commented Aug 19, 2022

A lexer isn't really needed. The function body can be extracted by carefully keeping track of the quotes and braces. Equivalent code in yt-dlp: https://github.com/yt-dlp/yt-dlp/blob/b76e9cedb33d23f21060281596f7443750f67758/yt_dlp/jsinterp.py#L229-L254

But if your dependency already has a Lexer, ig why not use it
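As a rough illustration of "keeping track of the quotes and braces", the following sketch skips over quoted strings while counting braces. It is not the yt-dlp code linked above; `quote_aware_brace_match` is a hypothetical name, and the sketch assumes that only single quotes, double quotes, backticks, and backslash escapes need handling. It deliberately ignores regex literals, comments, and `${...}` interpolation inside template literals.

```rust
/// Returns the balanced `{...}` block starting at the first `{` in `js`,
/// ignoring any braces that appear inside quoted strings.
/// (Hypothetical helper; a simplified sketch, not a full JS scanner.)
fn quote_aware_brace_match(js: &str) -> Option<&str> {
    let start = js.find('{')?;
    let mut depth = 0usize;
    // The quote character we are currently inside, if any.
    let mut in_quote: Option<char> = None;
    // True when the previous character inside a string was an unescaped backslash.
    let mut escaped = false;

    for (i, c) in js[start..].char_indices() {
        if let Some(q) = in_quote {
            if escaped {
                escaped = false; // this character is escaped, take it literally
            } else if c == '\\' {
                escaped = true;
            } else if c == q {
                in_quote = None; // closing quote
            }
            continue;
        }
        match c {
            '"' | '\'' | '`' => in_quote = Some(c),
            '{' => depth += 1,
            '}' => {
                // The slice starts at `{`, so depth is at least 1 here.
                depth -= 1;
                if depth == 0 {
                    return Some(&js[start..start + i + c.len_utf8()]);
                }
            }
            _ => {}
        }
    }
    None
}
```

With this, a `}` inside `"}"` or `'\'}'` no longer ends the function early. A regex literal containing `}` would still cut the result short, so regex handling has to be layered on top of this kind of state tracking.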

@Theta-Dev
Contributor

I now have a working prototype. It is not pretty and definitely needs cleanup, so I have to do that first before I make a PR. I ended up having to copy Rhino's tokenizer class because it is private. The higher-level parser is accessible, but it only parses entire JS documents into syntax trees, which would take too much time.

I also found an issue with the Rhino JS interpreter. Version 1.7.14 uses javax.lang.model.SourceVersion, which is not available on Android. This causes the app to load indefinitely when opening a video. If you have any idea how to fix this without downgrading, please help me. I have no idea why this error did not occur before.
mozilla/rhino#1149

@litetex
Member

litetex commented Aug 21, 2022

The problem described here will also be partially fixed with #882 (comment)

@triallax
Contributor

triallax commented Aug 24, 2022

A lexer isn't really needed. The function body can be extracted by carefully keeping track of the quotes and braces.

I think that's a good approach.

But if your dependency already has a Lexer, ig why not use it

It does, but as mentioned by @Theta-Dev, it is unfortunately private, and I don't think we should copy the lexer to our codebase.

An alternative is to fork Rhino and make the lexer public.

@litetex
Member

litetex commented Aug 24, 2022

An alternative is to fork Rhino and make the lexer public.

Or maybe contribute the changes to Mozilla ;)

@triallax
Contributor

If they would accept it, sure. ;)

@Stypox
Member

Stypox commented Apr 5, 2023

I am currently working on a YouTube downloader/client library in Rust

@Theta-Dev are you still rewriting NewPipeExtractor in Rust? Is it public yet? ;-)

Sorry for writing this comment here, but since you're not on IRC I didn't know how to write to you otherwise.

@Theta-Dev
Contributor

Theta-Dev commented Apr 5, 2023

@Stypox yes, RustyPipe is basically finished. You can get it here:

https://code.thetadev.de/ThetaDev/rustypipe

btw: how can I join you on IRC?

@Stypox
Member

Stypox commented Apr 5, 2023

Check out Contributing.md

7 participants