The content is always wrapped in HTML #55

Closed
robrwo opened this issue Feb 19, 2021 · 5 comments

Comments


robrwo commented Feb 19, 2021

When requesting a non-HTML document (e.g. application/json) and the server responds with HTTP 304, the content_type is undefined and the content (presumably the cached copy) comes back wrapped in HTML, e.g.

<html><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{ value => 1 }</pre></body></html>

This happens consistently when the response is HTTP 304, and it also seems to happen occasionally when the response is HTTP 200.

Author

robrwo commented Feb 19, 2021

Looking at the code, the decoded_content method is treating the content as HTML:

sub decoded_content($self) {
    $self->document_future->then(sub( $root ) {
        # Join _all_ child nodes together to also fetch DOCTYPE nodes
        # and the stuff that comes after them
        my @content = map {
            my $nodeId = $_->{nodeId};
            $self->log('trace', "Fetching HTML for node " . $nodeId );
            $self->target->send_message('DOM.getOuterHTML', nodeId => 0+$nodeId )
        } @{ $root->{root}->{children} };
 
        Future->wait_all( @content )
    })->then( sub( @outerHTML_f ) {
        Future->done( join "", map { $_->get->{outerHTML} } @outerHTML_f )
    })->get;
};

It should check the content type, and perhaps just return the raw content instead?
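
To make the suggestion concrete, here is a minimal standalone sketch of such a content type check. The helpers `media_type` and `is_html` are hypothetical names for illustration only and are not part of WWW::Mechanize::Chrome. The sketch also assumes the type may arrive with parameters such as a charset, which may not be the case for the value the module actually stores. What to do when no content type is available at all is left open here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize::Chrome):
# reduce a Content-Type header value to its bare, lowercased media type.
sub media_type {
    my ($header) = @_;
    return undef unless defined $header && length $header;
    my ($type) = split /;/, $header, 2;   # drop parameters such as charset
    $type =~ s/^\s+|\s+$//g;              # trim surrounding whitespace
    return lc $type;
}

# Hypothetical helper: true when the body should be serialised from the
# DOM as HTML, false when the raw response body is the better choice.
sub is_html {
    my ($header) = @_;
    my $type = media_type($header);
    return defined $type && $type eq 'text/html' ? 1 : 0;
}

for my $h ('text/html; charset=UTF-8', 'TEXT/HTML', 'application/json') {
    printf "%-26s => %s\n", $h, is_html($h) ? 'DOM HTML' : 'raw body';
}
```

The point of the check is only to pick between the two sources of content; it says nothing about how the raw body is decoded.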

Owner

Corion commented Feb 20, 2021

Yes, this is an ugly problem. Can you see if the following works well enough for your use case(s)? It uses the content from the response, but that content will already have been decoded and I'm not sure how well it works with binary content:

sub decoded_content($self) {
    $self->document_future->then(sub( $root ) {
        # Join _all_ child nodes together to also fetch DOCTYPE nodes
        # and the stuff that comes after them
        my $ct = $self->ct;

        my $res;
        if( $ct eq 'text/html' ) {
            my @content = map {
                my $nodeId = $_->{nodeId};
                $self->log('trace', "Fetching HTML for node " . $nodeId );
                $self->target->send_message('DOM.getOuterHTML', nodeId => 0+$nodeId )
            } @{ $root->{root}->{children} };

            $res = Future->wait_all( @content )
            ->then( sub( @outerHTML_f ) {
                Future->done( join "", map { $_->get->{outerHTML} } @outerHTML_f );
            });
        } else {

            # Return the raw body
            #use Data::Dumper;
            #warn Dumper $self->response;
            #warn $self->response->content;

            # The content is already decoded (?!)
            # I'm not sure how well this plays with encodings, and
            # binary content
            $res = Future->done($self->response->content);
        };
        return $res;
    })->get;
};

Author

robrwo commented Feb 22, 2021

That works for JSON data, but I get an error for HTML pages:

Could not find node with given id

-32000 at perl5/perlbrew/perls/perl-5.28.1/lib/site_perl/5.28.1/Chrome/DevToolsProtocol/Target.pm line 491
at perl5/perlbrew/perls/perl-5.28.1/lib/site_perl/5.28.1/Future.pm line 882

Owner

Corion commented Feb 22, 2021

Whoops - sorry, I didn't run the test suite properly. This one passes my new test and the existing test suite - does it work for your case too?

sub decoded_content($self) {
    my $res;
    my $ct = $self->ct || 'text/html';
    if( $ct eq 'text/html' ) {
        $res = $self->document_future->then(sub( $root ) {
            # Join _all_ child nodes together to also fetch DOCTYPE nodes
            # and the stuff that comes after them

            my @content = map {
                my $nodeId = $_->{nodeId};
                $self->log('trace', "Fetching HTML for node " . $nodeId );
                $self->target->send_message('DOM.getOuterHTML', nodeId => 0+$nodeId )
            } @{ $root->{root}->{children} };

            return Future->wait_all( @content )
            ->then( sub( @outerHTML_f ) {
                Future->done( join "", map { $_->get->{outerHTML} } @outerHTML_f );
            });
        });
    } else {
        # Return the raw body
        #use Data::Dumper;
        #warn Dumper $self->response;
        #warn $self->response->content;

        # The content is already decoded (?!)
        # I'm not sure how well this plays with encodings, and
        # binary content
        $res = Future->done($self->response->content);
    };
    return $res->get
};
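
The branching in this version can be tried in isolation with a small sketch. `choose_body` and the two code references are hypothetical stand-ins for the DOM serialisation and for `$self->response->content`; nothing here talks to Chrome.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the branch in decoded_content: $ct is the
# content type (possibly undef), the two code refs produce the body.
sub choose_body {
    my ($ct, $from_dom, $from_response) = @_;
    $ct ||= 'text/html';     # no content type reported: assume HTML
    return $ct eq 'text/html' ? $from_dom->() : $from_response->();
}

my $dom = sub { '<html><body>page</body></html>' };   # stub DOM serialisation
my $raw = sub { '{"value":1}' };                      # stub raw response body

print choose_body('text/html',        $dom, $raw), "\n";
print choose_body('application/json', $dom, $raw), "\n";
print choose_body(undef,              $dom, $raw), "\n";
```

One consequence of the `|| 'text/html'` default: if the content type is still undefined for an HTTP 304 response, as described in the original report, such a response takes the HTML branch even when the cached document is JSON.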

Author

robrwo commented Mar 20, 2021

This seems better.
