Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve protocol from client #22

Open
guigeek38 opened this issue Nov 13, 2021 · 36 comments
Open

Preserve protocol from client #22

guigeek38 opened this issue Nov 13, 2021 · 36 comments
Assignees
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@guigeek38
Copy link

My apt source list on Armbian uses the HTTPS version of apt.armbian.com.
However, I am sometimes redirected to a mirror using HTTP, which APT does not allow, and then I get the error:

Err:7 https://apt.armbian.com bullseye Release
  Redirection from https to 'http://mirror.armbian.de/apt/dists/bullseye/Release' is forbidden

I saw this PR which should correct the problem, however it doesn't seem to work: #14

To check I simply make requests with Curl on https://apt.armbian.com and get redirects either in https or in http :

$ curl -Ls -o /dev/null -w %{url_effective\} https://apt.armbian.com
http://armbian.16z.eu/apt/
$ curl -Ls -o /dev/null -w %{url_effective\} https://apt.armbian.com
http://mirrors.netix.net/armbian/apt/
$ curl -Ls -o /dev/null -w %{url_effective\} https://apt.armbian.com
https://imola.armbian.com/apt/
$ curl -Ls -o /dev/null -w %{url_effective\} https://apt.armbian.com
http://mirror.armbian.de/apt/

Is there something that I misunderstood, or is there a problem?
Thank you for your help !

@lanefu
Copy link
Member

lanefu commented Nov 18, 2021

Let me look into this again... I sware it fixes adn comes back

@lanefu
Copy link
Member

lanefu commented Nov 18, 2021

There may be a mirror hardcoded to a protocol, or redirecting rather our tool
if so we may have to remove those mirrors from rotation

@lanefu lanefu self-assigned this Nov 18, 2021
@lanefu lanefu added bug Something isn't working help wanted Extra attention is needed labels Nov 18, 2021
@MichaIng
Copy link
Contributor

MichaIng commented Nov 18, 2021

Jep, known and expected, still, sadly. I just reloaded the uWSGI server and now it should work. But after some hours it will again fail. As discussed earlier in related issues and discord, it seems to be some response caching. Since Armbian ships images with plain HTTP in APT sources, nearly all answers will contain the plain HTTP URL, and after a while the cache (presumably) serves the same plain HTTP URLs as well when the request actually was via HTTPS.

I didn't find time to look into the used Docker container, but since it is a 4 stage request handling (host Nginx => container Nginx => uWSGI => flask), all of them need to be checked for possible caches. The logic in the flask app is definitely correct, which is proven by the fact that after a reload it works as expected for some hours.

@lanefu
Copy link
Member

lanefu commented Nov 18, 2021

ahh okay.. sounds like I need to run this app in a different base container... aka... no nginx.. just uwsgi... or gnuicorn.

@lanefu
Copy link
Member

lanefu commented Nov 18, 2021

that will simplify the first part..and make it user to tune the appserver layer

@MichaIng
Copy link
Contributor

MichaIng commented Nov 18, 2021

If I remember right, the used Docker container has Nginx right included: https://github.com/tiangolo/uwsgi-nginx-flask-docker
The other option would be to forward/map ports 80/443 right via Docker instead of via dedicated Nginx proxy on the host. Like -p 80:25090 -p 443:25090? But indeed an Nginx on the host can be easier configured and monitored, so an alternative would be to expose the uWSGI socket on the host and have the hosts Nginx with:

location / {
    include uwsgi_params;
    uwsgi_pass unix:///path/to/uwsgi.sock;
}

The container's Nginx would then basically be unused, probably its startup can be prevented when starting the container.

EDIT: Ah sorry your idea was to use different base container. The above could be tests whether skipping the additional Nginx helps with the issue, but indeed a container which doesn't ship the Nginx proxy in the first place would be cleaner 👍.

@BloodyNora
Copy link

BloodyNora commented Nov 27, 2021

# curl https://apt.armbian.com
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>Redirecting...</title>
<h1>Redirecting...</h1>
<p>You should be redirected automatically to target URL: <a href="http://armbian.hosthatch.com/apt/">http://armbian.hosthatch.com/apt/</a>.  If not click the link.

that was seconds ago. not behind a proxy or anything else that could interfere, it's the webserver at apt.armbian.com not paying attention to the URL scheme that is presented when redirecting.

i've done it a number of times now today, there is currently seemingly no redirect entry that correctly redirects from https to https.

here are the ones i found in my backlog

Redirection from https to 'http://xogium.performanceservers.nl/apt/dists/bullseye/Release' is forbidden
Redirection from https to 'http://armbian.hosthatch.com/apt/dists/bullseye/Release' is forbidden

i've seen at least 2 more but i can't find them in the backlog, sorry. the [IP: 173.45.128.12 443] part of the error is always the same. last time i've seen it work OK was about a week ago, didnt check since.

my interim solution is to just use https://armbian.hosthatch.com/apt/ or the other URL directly. for example, put this into your /etc/apt/sources.list.d/armbian.list:

#deb https://apt.armbian.com bullseye main bullseye-utils bullseye-desktop
deb https://armbian.hosthatch.com/apt bullseye main bullseye-utils bullseye-desktop

N.B. in my armbian.sources.list file - the URL that points to armbian.com doesn't have the /apt/ subdir, thats added in the redirect, too.

@MichaIng
Copy link
Contributor

MichaIng commented Nov 27, 2021

@BloodyNora
Please try again now (soon), to have another indication that caching is the issue. I reloaded the uWSGI server and expect it to work now for an hour or less.

@BloodyNora
Copy link

BloodyNora commented Nov 27, 2021

@MichaIng can confirm it works as of now. thank you very much!

# cat /etc/apt/sources.list.d/armbian.list 
deb https://apt.armbian.com bullseye main bullseye-utils bullseye-desktop
#deb https://armbian.hosthatch.com/apt bullseye main bullseye-utils bullseye-desktop
# apt update
Hit:2 https://security.debian.org bullseye-security InRelease
Hit:3 https://deb.debian.org/debian bullseye InRelease
Hit:4 https://deb.debian.org/debian bullseye-updates InRelease
Hit:5 https://deb.debian.org/debian bullseye-backports InRelease
Get:1 https://xogium.performanceservers.nl/apt bullseye InRelease [18.5 kB]
Get:6 https://xogium.performanceservers.nl/apt bullseye/main armhf Packages [169 kB]            
[...]

@MichaIng
Copy link
Contributor

No need to thank me, it will break itself soon 😅. But it's good to have this once verified by someone else. I'll have another look into the container tomorrow to see whether I can find any caching instance/setting.

@lanefu
Copy link
Member

lanefu commented Nov 28, 2021

No need to thank me, it will break itself soon sweat_smile. But it's good to have this once verified by someone else. I'll have another look into the container tomorrow to see whether I can find any caching instance/setting.

Hey @MichaIng I swapped out the container runtime to https://hub.docker.com/r/tiangolo/meinheld-gunicorn/

Seems to be behaving... I also noticed some lookups were failing when sites were sending proxy headers. causing a list of IPs to be sent to the redirector.... so i fixed that in nginx as well

@lanefu
Copy link
Member

lanefu commented Nov 28, 2021

since no longer using uwsgi. need to change how the reload function

something like os.kill(os.ppid(), SIGHUP)

@MichaIng
Copy link
Contributor

MichaIng commented Nov 28, 2021

Great, I'll test it a few times throughout the day to assure the caching issue is gone.

since no longer using uwsgi. need to change how the reload function

When doing so, maybe add some kind of authentication of access limit 😉.

EDIT: Hmm, sadly it doesn't work again:

# curl -I 'https://apt.armbian.com/'
HTTP/2 302
server: nginx/1.18.0 (Ubuntu)
date: Sun, 28 Nov 2021 13:50:38 GMT
content-type: text/html; charset=utf-8
content-length: 280
location: http://armbian.systemonachip.net/apt/

While the client request scheme and IP is detected correctly (same as before):

# curl -I 'https://apt.armbian.com/status'
HTTP/2 200
server: nginx/1.18.0 (Ubuntu)
date: Sun, 28 Nov 2021 14:01:16 GMT
content-type: text/html; charset=utf-8
content-length: 2
x-client-ip: <myIPv6>
x-request-scheme: https
# curl -I 'http://apt.armbian.com/status'
HTTP/1.1 200 OK
Server: nginx/1.18.0 (Ubuntu)
Date: Sun, 28 Nov 2021 14:04:42 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 2
Connection: keep-alive
X-Client-IP: <myIPv6>
X-Request-Scheme: http

If you switched out the container, maybe then the host Nginx is the issue? I know it is not great to experiment with a live production server, but could you try to stop the hosts Nginx and expose the containers ports at 80+443 directly on the host? Of course this would mean that it handles HTTPS by itself, not sure if it can be configured to do so and then starting it with -p 80:80 -p 443:443?

Or it is flask itself which starts response caching after hitting a certain amount of those, sadly ignoring the response headers? For debugging it could be also beneficial to add the X-Request-Scheme response header as well to regular redirects, so see whether it is then set wrong as well, while /status keeps responding with the correct header.

@lanefu
Copy link
Member

lanefu commented Nov 28, 2021 via email

@MichaIng
Copy link
Contributor

Makes sense, as long as you have direct server access, no need for such an API.

@MichaIng
Copy link
Contributor

Btw it now works here. Did you reload/restart the service/container/server recently?

I opened a PR with a quick debugging code: #23
If it again stops preserving HTTPS, adding this code may help to check whether the Location header only is cached or whether it makes a difference when manipulating the response explicitly by adding another header, and we can see whether the two get_scheme() calls always return the same (after a while false) result or whether the one within get_redirect() is what is cached, or the whole get_redirect() output somehow. Since I have no idea at what level this may or may not (host webserver, container networking, unicorn, flask, Python itself) this may at least narrow it down a little.

@lanefu
Copy link
Member

lanefu commented Nov 28, 2021

before I merge that.... want to test what i just rolled out

now running single worker process with single thread... (it still handled 500 req/sec no problem)

If the issue re-occurs we'll know to keep troubleshooting flask and python. This will eliminate all variables around multiple instances of the script running

monitoring with this

#!/bin/bash

result=0
while [[ ${result} -eq 0 ]];
    do 
        date
        curl -Ls -o /dev/null -w %{url_effective\} https://apt.armbian.com | grep -q https
        result=$?
        sleep 30s
    done
pushover -m 'she failed us'

@MichaIng
Copy link
Contributor

MichaIng commented Nov 28, 2021

Yes definitely makes sense to verify that the issue is still present. Also probably it's better to copy&paste the little code change manually instead of merging this into production branch, as it is meant for debugging only, and if wanted, could be implemented a more efficient way, I think.

EDIT: Ah, hehe here now I get http:// redirects again.

@lanefu
Copy link
Member

lanefu commented Nov 28, 2021 via email

@lanefu
Copy link
Member

lanefu commented Dec 2, 2021

I lost my monitoring task :( anyway yeah it's back.. sooo it doesnt' seem to be a threading issue.. so time to look closer at the code.

@lanefu
Copy link
Member

lanefu commented Dec 2, 2021

i've reloaded for now

@BloodyNora
Copy link

FYI, it's already gone again.

since this thread is getting longer and longer now and nobody ever reads a full thread, please see here for an interim solution: #22 (comment)

lanefu pushed a commit that referenced this issue Dec 5, 2021
This is mainly for debugging: #22

Signed-off-by: MichaIng <micha@dietpi.com>
@lanefu
Copy link
Member

lanefu commented Dec 5, 2021

Okay I think I fixed it with #25

Also reverted #23

@MichaIng
Copy link
Contributor

MichaIng commented Dec 5, 2021

Sadly not 😢:

# curl -I https://apt.armbian.com
HTTP/2 302
server: nginx/1.18.0 (Ubuntu)
date: Sun, 05 Dec 2021 14:13:32 GMT
content-type: text/html; charset=utf-8
content-length: 258
location: http://armbian.12z.eu/apt/

With the header being set, did you see it failing as well? I was thinking whether adding the header may have an effect on whichever cache is still kicking in here. At least yesterday, after you deployed the PR with the header, it worked. I just didn't have a chance to test again some hours later.

@lanefu
Copy link
Member

lanefu commented Dec 5, 2021 via email

@MichaIng
Copy link
Contributor

MichaIng commented Dec 5, 2021

I tried to find info about Flask caching, but while there is a dedicated Flask cache module and of course response headers for browser caching can be set, this all does not apply here. Could you re-add the scheme response header, just to see whether it keeps matching the redirect scheme or not? And if there is any caching layer setting up different cached instances based on it, it may even help. But actually I'm pretty out of ideas at which layer this caching is done, and whether "caching" is even the right word to describe it 😄.

Shall we try to get some external help, on Super User, Stack Overflow, Flask repository or so?

@lanefu
Copy link
Member

lanefu commented Dec 5, 2021 via email

@MichaIng
Copy link
Contributor

MichaIng commented Dec 5, 2021

But earlier when the bug occurred.. the header showed
the correct scheme despite returning the wrong result.

Was this before or after you moved the get_scheme() call into the main route? Reasonable step when resp.headers['X-Request-Scheme'] = get_scheme() lead to a different scheme shown in the header than what get_redirect() obviously got internally. Would be interesting to see whether the same variable derived from the same single get_scheme() call applied once as header and then passed to get_redirect() still can lead to non-matching schemes in the response. That would be very strange. Only left would then be to verify that mirror_url = mirror_class.next(region) keeps getting mirrors without scheme or whether for magical reasons it suddenly returns mirrors with schemes, so that the client request scheme is simply not used 😄.

@lanefu
Copy link
Member

lanefu commented Dec 5, 2021 via email

@MichaIng
Copy link
Contributor

MichaIng commented Dec 5, 2021

Makes sense. Let's see what effect both changes have.

@lanefu
Copy link
Member

lanefu commented Dec 6, 2021

https://github.com/armbian/dl-router/blob/master/app/main.py#L100

i feel like we're hitting an edge case here. im wondering if get_redirect isn't returning a scheme at all, and flask or the server is nice enough to add http to make it a valid redirect

@lanefu
Copy link
Member

lanefu commented Dec 6, 2021

@MichaIng okay running on this now https://github.com/armbian/dl-router/tree/pass_scheme

@MichaIng
Copy link
Contributor

MichaIng commented Dec 6, 2021

A mismatch indeed/still:

# curl -I https://apt.armbian.com
HTTP/2 302
server: nginx/1.18.0 (Ubuntu)
date: Mon, 06 Dec 2021 11:58:40 GMT
content-type: text/html; charset=utf-8
content-length: 264
location: http://mirror.armbian.de/apt/
x-request-scheme: https

So we have two get_scheme right after another but either giving different results or more likely get_redirect doesn't really respect it after a while. Last idea to show this more explicity would be:

    scheme = get_scheme()
    resp = redirect(get_redirect(path, get_ip(), scheme), 302)
    resp.headers['X-Request-Scheme'] = scheme
    return resp

I do not believe that mirror_url.find('://', 3, 8) or mirror_url.count('://', 2, 8) are the issue or do make any difference, since both works for some hours reliably and the logic is simple and clear. But I have no other explanation either 🤔.

@lanefu
Copy link
Member

lanefu commented Dec 6, 2021 via email

@lanefu
Copy link
Member

lanefu commented Dec 7, 2021

gonna be busy for rest of week.. i've made a bandaid script that polls every 30 minutes, and if it returns the wrong result it reloads.... and also sends me a notification 😬

@lanefu
Copy link
Member

lanefu commented Dec 12, 2021

After scratching surface with unit tests and switching to urlsplit per advise of a friend, I don't think get_redirect() is directly the culprit

https://github.com/armbian/dl-router/tree/moar_tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants