Skip to content

Conversation

@bgreni
Copy link
Collaborator

@bgreni bgreni commented Sep 13, 2024

Fixes #48

Follow redirect URLs in the Mojo client implementation.

@bgreni bgreni marked this pull request as ready for review September 13, 2024 16:10
@saviorand
Copy link
Collaborator

Thanks, wanted to share some tips but it looks like you've been able to start just fine! Will review today and tomorrow

@bgreni
Copy link
Collaborator Author

bgreni commented Sep 13, 2024

Thanks, wanted to share some tips but it looks like you've been able to start just fine! Will review today and tomorrow

It's a very approachable codebase I had no issues at all really, well done!

@saviorand
Copy link
Collaborator

@bgreni looks great, I've just merged main since there was a new release today and fixed tests.
I'm trying out your current implementation with a couple examples, thinking to also write some new unit tests for this.
Right now if I replace the URI in client.mojo with http://httpbin.org/status/302 and run magic run mojo client.mojo it works (I get a 200). In theory the redirect path for this is:

http://httpbin.org/status/302
302 FOUND
/redirect/1
302 FOUND
/get
200 OK

One thing I noticed is the Content-Length and connection-close header values are off , e.g. according to redirect-checker the content-length for the last call should be 323 , but I'm getting 9593. This likely has nothing to do with your code and is a bug in Lightbug's (🥁) client, but I thought it would be interesting to fix it.

Another case I found where it breaks is if I use http://google.com as the URI in client.mojo . Then getaddrinfo seems to crash, likely an issue with how the strings are being handled when parsing the URI. Again, nothing to do with the follow-redirect functionality but we can try to debug to properly test follow-redirects as well.

If you are interested feel free to try and debug on your end, I am currently writing tests and can also help debugging if needed. Thanks again for this great contribution!

Comment on lines 392 to 398
struct StatusCode:
alias OK = 200
alias MOVED_PERMANENTLY = 301
alias FOUND = 302
alias TEMPORARY_REDIRECT = 307
alias PERMANENT_REDIRECT = 308
alias NOT_FOUND = 404
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially we could also handle 303 (See Other), 300 (Multiple Choices) and 304 (Not modified), although the logic for latter two might be more complicated. But can also leave it for now and implement in the future

https://developer.mozilla.org/en-US/docs/Web/HTTP/Redirections

@bgreni
Copy link
Collaborator Author

bgreni commented Sep 16, 2024

@saviorand Thank you for the review and testing effort, I'll look into those issues asap

@bgreni
Copy link
Collaborator Author

bgreni commented Sep 21, 2024

Ended up just reimplementing this due my own drastic changes. Also fixed an issue where the path query string was not being added to the request headerline.

The issue with Content-Length that you mentioned is still present, and when I tried google.com the location header is http://www.google.com and it seems to just redirect loop forever? (careful testing that google ip banned me for a few hours lol)

@saviorand
Copy link
Collaborator

@bgreni I think currently content-length is always zero in the HTTPResponse , at least when I'm making a request from the client. If I comment out this line it reads the content-length correctly, but not sure what happens in this case if the header is not present in the response

fn __init__(
        inout self,
        body_bytes: Bytes,
        headers: Headers = Headers(),
        status_code: Int = 200,
        status_text: String = "OK",
        protocol: String = strHttp11,
    ):
        self.headers = headers
        if HeaderKey.CONTENT_TYPE not in self.headers:
            self.headers[HeaderKey.CONTENT_TYPE] = "application/octet-stream"
        self.status_code = status_code
        self.status_text = status_text
        self.protocol = protocol
        self.body_raw = body_bytes
        self.set_connection_keep_alive()
        # self.set_content_length(len(body_bytes)) - comment this out, otherwise content-length = 0

I think it's set to 0 because of setting body to Bytes() here, and then it's not updated after parsing the actual body

fn from_bytes(owned b: Bytes) raises -> HTTPResponse:
        var reader = ByteReader(b^)

        var headers = Headers()
        var protocol: String
        var status_code: String
        var status_text: String

        try:
            protocol, status_code, status_text = headers.parse_raw(reader)
        except e:
            raise Error("Failed to parse response headers: " + e.__str__())

        var response = HTTPResponse(
            Bytes(),
            headers=headers,
            protocol=protocol,
            status_code=int(status_code),
            status_text=status_text,
        )

@saviorand
Copy link
Collaborator

saviorand commented Sep 24, 2024

@bgreni I wonder if the infinite loop with google has to do with an https redirect they have 🤔 although httpbin seems to work fine. Lightbug doesn't have TLS/HTTPS support yet

@bgreni
Copy link
Collaborator Author

bgreni commented Sep 25, 2024

I think it's set to 0 because of setting body to Bytes() here, and then it's not updated after parsing the actual body

Yes it would appear I forgot to include that logic on the response side...

@bgreni
Copy link
Collaborator Author

bgreni commented Sep 25, 2024

@saviorand Seems the loop was related to not updating the Host header after receiving a redirect specifying a new hostname. I end up getting a 200 but no content, which I assume could also be related to missing headers google requires?

@saviorand
Copy link
Collaborator

@bgreni when I print the buffer after reading from the connection like so:

 var bytes_recv = conn.read(new_buf)
        print("new_buf:", String(new_buf))

I see the body, so the issue has to be with how we're then adding it to HTTPResponse or reading it later down the line?

@bgreni
Copy link
Collaborator Author

bgreni commented Sep 25, 2024

@bgreni when I print the buffer after reading from the connection like so:

 var bytes_recv = conn.read(new_buf)
        print("new_buf:", String(new_buf))

I see the body, so the issue has to be with how we're then adding it to HTTPResponse or reading it later down the line?

Ah I see google is using the Transfer-Encoding: chunked header, which is currently not supported (I don't think it was before the refactor either?). Maybe we should create a separate ticket for that and leave that test case commented out for now?

Signed-off-by: Brian Grenier <grenierb96@gmail.com>
@saviorand
Copy link
Collaborator

@bgreni yup, that was not supported before the refactor as well. Let's merge 🔥

@saviorand saviorand merged commit e695225 into Lightbug-HQ:main Sep 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Follow redirects

2 participants