Cache updates #1957

Open
andrewdavidwong opened this Issue May 5, 2016 · 44 comments

andrewdavidwong (Member) commented May 5, 2016

It's common for users to have multiple TemplateVMs that download many of the same packages when being individually updated. Caching these packages (e.g., in the UpdateVM) would allow us to download a package only once, then make it available to all the TemplateVMs which need it (and perhaps even to dom0), thereby saving bandwidth.

This has come up on the mailing lists several times over the years:

Here's a blog post about setting up a squid caching proxy for DNF updates on baremetal Fedora:

andrewdavidwong changed the title from Cache package updates to Cache updates on May 5, 2016

taradiddles commented May 5, 2016

It's indeed a common problem when deploying Fedora VMs/containers, or with server farms. Debian has apt-cacher(-ng), but Fedora doesn't have anything similar.

Solutions that came up:

Anyway, instead of having specific tools for each distro, it would be wiser to have a generic solution.
So, all in all, the squid solution may be the best one, with the cache miss rate being something to investigate.


marmarek (Member) commented May 5, 2016

Actually apt-cacher-ng works for Fedora too :)
Maybe we can simply use it instead of tinyproxy as the update proxy?


taradiddles commented May 5, 2016

apt-cacher-ng works on Fedora for mirroring Debian stuff, but does it really work for mirroring (d)rpms/metadata downloaded with yum/dnf?

From the doc [1]: "6.3 Fedora Core - Attempts to add apt-cacher-ng support ended up in pain and the author lost any motivation in further research on this subject. "

[1] https://www.unix-ag.uni-kl.de/~bloch/acng/html/distinstructions.html#hints-fccore


marmarek (Member) commented May 5, 2016

Yes, I've seen this. But in practice it works. The only problem is dynamic mirror selection - it may make caching difficult (when a different mirror is selected each time).


adrelanos (Member) commented May 5, 2016

Marek Marczykowski-Górecki:

Actually apt-cacher-ng works for Fedora too :)
Maybe we can simply use it instead of tinyproxy as the update proxy?

Can it also let through non-apt traffic? Specifically, I am wondering about tb-updater.


marmarek (Member) commented May 5, 2016

Can it also let through non-apt traffic? Specifically I am wondering
about tb-updater.

That's an interesting question - if you have an apt-cacher-ng instance handy, it's worth a try. Anyway, it has quite a flexible configuration, so it's probably doable.


adrelanos (Member) commented May 6, 2016

I don't think there is a generic solution that works well enough for both Debian- and Fedora-based VMs at the same time. Why do we need a generic all-at-once solution anyhow? Here is what I suggest:

  • Let's keep tinyproxy as is, as a fallback and for misc traffic (tb-updater, user custom stuff, and whatnot).
  • Let's install apt-cacher-ng and a Fedora caching proxy by default in the UpdateVM.
  • Let's configure Debian-based VMs to use apt-cacher-ng.
  • Let's configure Fedora-based VMs to use the Fedora caching proxy. (A client-side sketch follows below.)

What do you think?

@marmarek

Can it also let through non-apt traffic? Specifically I am wondering
about tb-updater.

That's interesting question - if you have apt-cacher-ng instance handy,
it worth a try. Anyway it has quite flexible configuration, so probably
doable.

I've read all the config and tried; it does not seem possible, but never mind, as per my suggestion above.
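
To make the last two bullets above concrete, a rough sketch of the client-side configuration is given below. The addresses and ports are placeholders (3142 is apt-cacher-ng's default port; the address and port of a Fedora caching proxy in the UpdateVM would depend on how it is exposed), so treat this as an illustration rather than a tested setup:

# Debian-based TemplateVM - point apt at apt-cacher-ng in the UpdateVM
# (placeholder address/port), e.g. in /etc/apt/apt.conf.d/01proxy:
Acquire::http::Proxy "http://10.137.255.254:3142";

# Fedora-based TemplateVM - point dnf/yum at the Fedora caching proxy
# (placeholder address/port), e.g. in /etc/dnf/dnf.conf or /etc/yum.conf:
proxy=http://10.137.255.254:8082/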


marmarek (Member) commented May 7, 2016

It will require more resources (memory), somewhat wasted when one uses, for example, only Debian templates. But maybe it is possible to activate those services on demand (socket activation comes to mind). It would be even easier for a qrexec-based updates proxy.

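For illustration only, on-demand activation could look roughly like the pair of units below. The unit names, port, and daemon path are hypothetical, and the daemon would have to support receiving a listening socket from systemd (sd_listen_fds) for this to work:

# updates-cache.socket (hypothetical name)
[Unit]
Description=On-demand updates cache socket

[Socket]
ListenStream=8082
Accept=no

[Install]
WantedBy=sockets.target

# updates-cache.service (hypothetical name; started only when the socket sees traffic)
[Unit]
Description=On-demand updates cache

[Service]
# The daemon must accept the socket passed by systemd instead of binding the port itself.
ExecStart=/usr/sbin/updates-cache-daemon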

taradiddles commented May 7, 2016

@adrelanos

Why do we need a generic all at once solution anyhow

I'm all for a 100% caching success rate with a specific mechanism for each distro, but do Qubes developers/contributors have time to develop/support that feature?
If yes, that's cool; otherwise, a solution like squid would be easy to implement, and since it's distro-agnostic it will help not only the supported distros (Fedora, Debian, Arch?), but also other distributions that users install in HVMs (even Windows, then). The problems/unknowns with squid are the cache miss rate, the cache disk usage needed to minimize those, and the use of different mirrors with yum (although I find that I usually connect to the same one).


qjoo commented May 7, 2016

I'm using a polipo proxy => Tor to cache updates. I also modified the repo configuration to use one specific update server instead of dynamically selecting one. I'm planning to document my setup and will post a link here.


kalkin (Member) commented May 7, 2016

Just wanted to throw in https://github.com/yevmel/squid-rpm-cache. I planned to set up a dedicated squid VM and use the above-mentioned config/plugin to cache RPMs, but never found the time for it.

The problems/unknowns with squid are the cache miss rate, the cache disk usage in order to minimize those, and the use of different mirrors with yum (although I find out that I usually always connect to the same one).

Currently I just use my NAS, which has a "normal" squid running as a caching proxy. I have an ansible script which generates my templates. In the templates I replaced the metalink parameter with a baseurl pointing to the nearest Fedora mirror, in /etc/yum.repos.d/fedora.repo. In /etc/yum.conf I set the proxy option to my NAS proxy and allowed TemplateVMs to connect to it.

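For anyone wanting to replicate this, the two edits described above look roughly like the following sketch (the mirror URL and the proxy address are placeholders for your nearest mirror and your own caching proxy; the gpgkey line is omitted for brevity):

# /etc/yum.repos.d/fedora.repo - pin one mirror instead of the metalink
[fedora]
name=Fedora $releasever - $basearch
#metalink=https://mirrors.fedoraproject.org/metalink?repo=fedora-$releasever&arch=$basearch
baseurl=http://mirror.example.org/fedora/linux/releases/$releasever/Everything/$basearch/os/
enabled=1
gpgcheck=1

# /etc/yum.conf - route downloads through the caching proxy
proxy=http://192.168.1.10:3128/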

marmarek (Member) commented May 7, 2016

My experience with squid is horrible in terms of resources (RAM, I/O usage) for small setups. It looks like overkill for just downloading updates for a few templates from time to time.


adrelanos (Member) commented May 9, 2016

I don't like saying this, but we should also consider making this an additional, non-default option, or even wontfix. I like apt-cacher-ng very much and use it myself. However, introducing it by default into Qubes would lead to new issues: more users having trouble upgrading due to the added technical complexity. There are corner cases where apt-cacher-ng introduces new problems, such as Hash Sum mismatch errors during apt-get update.


taradiddles commented May 10, 2016

@marmarek

FWIW I have squid installed on an embedded router (RB450g) for a 25+ person office and it's been running for literally ages without any problem. There's strict bandwidth control (delay pools), which is usually the biggest offender in terms of resources, but squid's memory usage has constantly been < 20 MB and highest CPU usage < 6%. Granted, the office's uplink speed is low - in the megabits/s range - but the resources available for the UpdateVM are in another league compared to the embedded stuff, and the setup - only caching - is not fancy.

tl;dr, squid is not as bad as it used to be years ago.

@adrelanos

The issues you mention reinforce my concern that it will be too time-consuming for Qubes devs to support distro-specific solutions. A simple generic one, even if not optimal, is still better than nothing at all - or than a "wontfix".
Plus, users kalkin and qjoo seem to have working solutions - why not try those?

just my 2c - not pushing for anything, you guys are doing great work!


andrewdavidwong (Member) commented May 10, 2016

At the very least, we should provide some documentation (or suggestions or pointers in the documentation) regarding something like @taradiddles's router solution. Qubes users are more likely than the average Linux user to have multiple machines (in this case, virtual) downloading exactly the same updates.


Rudd-O commented May 10, 2016

Looks like what you want is Squid with an adaptive disk cache size (for storing packages in the volatile /var/cache/squid directory), configured with no memory cache. Since the config file can be in a different place and the unit file can be overridden to specify the Qubes-specific config file, it may work very well for this purpose. Squid is goddamn good these days, and it supports regex-based filters (plus you can block methods other than GET, and you can support proxy caching of FTP sites).

OTOH, it's always a security footprint issue to run a larger codebase for a cache. Also, Squid caching can be ineffective if multiple VMs download files from different mirrors (remember that the decision of which mirror to use is left practically at random to the VM calling on the Squid proxy to do its job).

For those reasons, it may be wise to investigate solutions that do a better job of proxy caching using a content-addressable store, or matching file names.


Rudd-O commented May 10, 2016

Perhaps a custom Go-based (to prevent security vulns) cache that can listen for requests using the net/http module, and proxy them to the VMs? This has potential to be a very efficient solution too, as a Go program would have a minuscule memory footprint.


kalkin (Member) commented May 11, 2016

Rudd-O commented May 12, 2016

Looking. Note we need something like that for Debian as well.


Rudd-O commented May 12, 2016

The code is not idiomatic Go and there are some warts there that I would fix before including it anywhere. Just as a small example on https://github.com/mojaves/yumreproxyd/blob/master/yumreproxy/yumreproxy.go#L33 you can see he is using a nil value as a sort of a bool. That is not correct -- the return type should be (bool, struct).


Rudd-O commented May 12, 2016

https://github.com/mojaves/yumreproxyd/blob/master/yumreproxy/yumreproxy.go#L73 <- also problematic. TODO: path sanitization is not what you want in secure software.

But the BIGGEST problem is that the program appears not to give a shit about concurrency. Save-into-cache and serve-from-cache can race, and no locking is performed, nor are channels being used there. Big fat red flag. The right way to do that is by communicating with the Cache aspect of the application through channels -- send the request to the Cache, await the response, and if it's not available, download the file, send it to the Cache for storage, and await the response.

Also, all content types returned are application/rpm. That's wrong in many cases.

BUT, that only means that project can be extended or rewritten, and it should not be very difficult to do so.


andrewdavidwong added a commit that referenced this issue May 31, 2016

rustybird commented Jun 6, 2016

I just uploaded the Squid-based https://github.com/rustybird/qubes-updates-cache (posted to qubes-devel too)


andrewdavidwong added a commit that referenced this issue Jun 6, 2016

rustybird commented Jun 8, 2016

The latest commit (-57 lines, woo) reworks qubes-updates-cache to act as a drop-in replacement for qubes-updates-proxy. No changes to the client templates are needed at all now.


andrewdavidwong added a commit that referenced this issue Jun 8, 2016

marmarek (Member) commented Jun 8, 2016

How much memory does it use? I.e. is it a good idea to have it instead of tinyproxy by default, or give the user a choice?


taradiddles commented Jun 9, 2016

FWIW I had a similar setup running after my last post, the difference being that I used/tweaked the store_id program mentioned by @kalkin in an earlier post [1]. But there were many cache misses; a quick look at the log showed that different mirrors would send different MIME types for the same rpm (or repo) file, so that might be the culprit. Other tasks piled up and I didn't have time to work on that.

@marmarek: after boot, memory = ~30 MB (as far as you can trust ps). But I guess the question is more about long-term use, after squid has cached many objects. Rusty used 'cache_mem=0', so there shouldn't be a huge difference in memory usage, but he might have more statistics.

@rustybird: tinyproxy's configuration is quite locked down; maybe it would be a good idea to do the same with squid's? I'm also not sure it is a good idea to mess with the cache IDs for files other than rpm/repo (and deb/...).

For instance, stuff like:

acl localnet src 10.137.0.0/16
acl http_ports port 80
acl SSL_ports port 443
acl CONNECT method CONNECT
http_access deny to_localhost
http_access deny CONNECT !SSL_ports
http_access allow http_ports
http_access allow SSL_ports
http_access deny all

# that one was from https://github.com/yevmel/squid-rpm-cache
# have to understand why that's changed
#refresh_pattern .              0       20%     4320
#                 3 month               12 month
refresh_pattern . 129600        33%     525600

# cache only specific files types
acl rpm_files urlpath_regex \/Packages\/.*\.rpm
acl repodata_files urlpath_regex \/repodata\/.*\.(|sqlite\.xz|xml(\.[xg]z)?)
cache allow rpm_files
cache allow repodata_files
cache deny all

[1] https://github.com/yevmel/squid-rpm-cache


andrewdavidwong added a commit that referenced this issue Jun 9, 2016

rustybird commented Jun 9, 2016

@marmarek:

How much memory does it use?

With DefaultMemoryAccounting=yes in /etc/systemd/system.conf, the following values were observed in /sys/fs/cgroup/memory/system.slice/qubes-updates-cache.service/memory.memsw.max_usage_in_bytes:

  • Squid first started, created new cache dir = 41 MiB
  • Upgraded a new clone of qubes-template-fedora-23-3.0.4-201601120722 (~450 packages) = 202 MiB
  • Upgraded another new clone of the same template, ~100% cache hits = still 202 MiB
  • Squid restarted, uses filled cache dir = 16 MiB

Those numbers are already with the latest commit, which sets memory_pools off in the Squid config to allow the system to reclaim unused memory. But apparently Squid doesn't free() aggressively enough yet for our purposes.

@taradiddles:

But there were many cache misses ; a quick look at the log showed that different mirrors would send different mime types for the same rpm (or repo) file, so that might be the culprit.

Yes, that seems to happen sometimes, probably because .drpm is a relatively young file extension. Is it possible to make Squid ignore the MIME type header?

tinyproxy's configuration is quite locked down, maybe that would be a good idea to do the same with squid's ?

Definitely. IIRC Whonix also wants some sort of magic string from the proxy port? Paging @adrelanos :)

I'm also not sure it is a good idea to mess with the cache ids for files other than rpm/repo (and deb/...).

So far I haven't seen the regexes in https://github.com/rustybird/qubes-updates-cache/blob/master/usr/lib/qubes/updates-cache-dedup#L6-L7 match anything else besides metadata and packages. Files aren't listed explicitly because that's such a hassle to maintain for all compression formats and package types, e.g. Debian source packages didn't work with qubes-updates-proxy when tinyproxy still used filters.

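For reference, the peak value described above can be read back directly from the cgroup file named in the comment:

cat /sys/fs/cgroup/memory/system.slice/qubes-updates-cache.service/memory.memsw.max_usage_in_bytes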

rustybird commented Jun 9, 2016

IIRC Whonix also wants some sort of magic string from the proxy port? Paging @adrelanos :)

Sorry, never mind, I literally found something with grep -r magic /etc/tinyproxy. Will check that out.


adrelanos (Member) commented Jun 9, 2016

Ivan:

tinyproxy's configuration is quite locked down, maybe that would be a good idea to do the same with squid's ?

Tinyproxy configuration was relaxed some time ago. There was a ticket
and discussion. In short: locking down tinyproxy does not improve actual
security. Users who explicitly configure their applications to use the
updates proxy should be free to do so.


andrewdavidwong added a commit that referenced this issue Jun 9, 2016

rustybird commented Jun 9, 2016

@adrelanos:

Tinyproxy configuration was relaxed some time ago. There was a ticket and discussion. In short: locking down tinyproxy does not improve actual security. Users who explicitly configure their applications to use the updates proxy should be free to do so.

There's the "Squid Manager" though, which I've restricted access to in commit rustybird/qubes-updates-cache@0da1dcd -- along with a basic sanity check that requests are coming from 10.137.*.

Also a paragraph on how to use qubes-updates-cache with Whonix at the moment: https://github.com/rustybird/qubes-updates-cache/blob/3b9d5e153f89b551e9b38f82928cbc7c9c2f7ba3/README#L32-L35 (works nicely BTW, tons of cache hits across Debian / Whonix GW / Whonix WS)


adrelanos (Member) commented Jun 10, 2016

I have just now finished documenting the Qubes-Whonix torified updates proxy:
https://www.whonix.org/wiki/Dev/Qubes#Torified_Updates_Proxy

In essence, Whonix TemplateVMs get the output of UWT_DEV_PASSTHROUGH="1" curl --silent --connect-timeout 10 "http://10.137.255.254:8082/" and grep it for <meta name="application-name" content="tor proxy"/>. If that matches, the test is considered successful.

Of course, qubes-updates-cache's squid should only include the magic string if it is actually torified, i.e. running inside sys-whonix.

Do you know if it is possible to conditionally inject this magic string? If not, we need to modify the Qubes-Whonix torified updates check to do something supported by squid.

I am wondering if any whonix-gw-firewall modifications will be required. Current tinyproxy rules:
https://github.com/Whonix/whonix-gw-firewall/blob/724a0fc0546c83555a008cd1b7b03c048519121a/usr/bin/whonix_firewall#L310-L328

Does squid support outgoing proxy settings? Can squid be configured to use a Tor SocksPort?


marmarek (Member) commented Jun 10, 2016

Do you know if it is possible to conditionally inject this magic string? Or if not, we need to modify Qubes-Whonix torified updates check to do something supported by squid.

AFAIR, in the case of tinyproxy it is placed in the default error page. Squid should allow the same.


rustybird commented Jun 10, 2016

Does squid support outgoing proxy settings? Can squid be configured to use a Tor SocksPort?

Haven't found anything about outgoing HTTP proxies. Semi-official Socks support can be added in during compilation via libsocks, which Debian doesn't seem to do, but ...

I am wondering if any whonix-gw-firewall modifications will be required. Current tinyproxy rules:
https://github.com/Whonix/whonix-gw-firewall/blob/724a0fc0546c83555a008cd1b7b03c048519121a/usr/bin/whonix_firewall#L310-L328

... I think you'd only need to change --uid-owner tinyproxy to --uid-owner squid.

AFAIR in case of tinyproxy it is placed in default error page. Squid should allow the same.

Yes, the relevant file to modify is /usr/share/squid-langpack/templates/ERR_INVALID_URL from Debian package squid-langpack.

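Putting those two pieces together, a rough sketch of the Whonix-side adaptation (illustrative only; the actual firewall rules are in the linked whonix_firewall file):

# Inject the magic string that the Whonix torified-updates-proxy check greps for.
# The check only looks for the exact string, so its position in the template is not critical:
echo '<meta name="application-name" content="tor proxy"/>' | \
    sudo tee -a /usr/share/squid-langpack/templates/ERR_INVALID_URL

# In whonix_firewall, the existing tinyproxy rules would keep their structure,
# with only the owner match switched:
#   --uid-owner tinyproxy  ->  --uid-owner squid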

rustybird commented Jun 10, 2016

All the security implications of using qubes-updates-cache I could think of:
https://github.com/rustybird/qubes-updates-cache/blob/master/README#L8

Edit: Hmm, regarding (1) it's really the same with qubes-updates-proxy. Not sure why I always (wrongly) thought of that as circuit-isolated per client VM...


andrewdavidwong added a commit that referenced this issue Jun 11, 2016

rustybird commented Jun 19, 2016

Some news:

  • Made qubes-updates-cache work on Debian, incl. Whonix gateways, pending PRs Whonix/whonix-firewall#1 and Whonix/qubes-whonix#2
  • Ordered the systemd service after bind-dirs.sh/rc.local and made it simpler and more reliable, which should fix TCP_SWAPFAIL_MISS cache corruption
  • Switched to an asynchronous cache backend, reducing memory consumption to ~130 MiB after one Fedora template upgrade. Still have to find out why adding new objects leaks memory at all
  • Rewrote the URL rewriting script in pure bash with no child processes. Now, in addition to dedup, it also transparently upgrades some hosts from HTTP to HTTPS: {ftp,yum,deb}.qubes-os.org, www.whonix.org, deb.torproject.org, dl.google.com, mirrors.kernel.org

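This is not the actual qubes-updates-cache script, but a minimal sketch of what a pure-bash Squid rewrite helper of this kind can look like, assuming the classic url_rewrite_program protocol where the helper reads one request line at a time and echoes back either a rewritten URL or an empty line to leave the request unchanged:

#!/bin/bash
# Illustrative only: upgrade a few known hosts from HTTP to HTTPS using bash builtins.
while read -r url rest; do
    case "$url" in
        http://deb.qubes-os.org/*|http://yum.qubes-os.org/*|http://deb.torproject.org/*)
            echo "https://${url#http://}"   # rewritten URL
            ;;
        *)
            echo ""                         # empty line: leave the URL unchanged
            ;;
    esac
done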

andrewdavidwong added a commit that referenced this issue Jun 19, 2016

adrelanos (Member) commented Jul 3, 2016

@rustybird

Made qubes-updates-cache work on Debian, incl. Whonix gateways pending PRs Whonix/whonix-firewall#1 and Whonix/qubes-whonix#2

This is done btw.


andrewdavidwong added a commit that referenced this issue Jul 3, 2016

qjoo commented Jul 3, 2016

I use polipo as a caching proxy between template VMs and the Tor SOCKS port. It has SOCKS support and might be more lightweight than squid?


rustybird commented Jul 4, 2016

@adrelanos:

Made qubes-updates-cache work on Debian, incl. Whonix gateways pending PRs Whonix/whonix-firewall#1 and Whonix/qubes-whonix#2

This is done btw.

Looks like whonix-gw-firewall needs a version bump, and qubes-whonix 5.3-1 hasn't been uploaded yet?

@qjoo:

I use polipo as a caching proxy between template VMs and Tor SOCKS port. It has SOCKS support and might be more lightweight than squid?

It doesn't seem to support deduplication or (transparent) rewriting of URLs :(


andrewdavidwong added a commit that referenced this issue Jul 5, 2016

adrelanos (Member) commented Jul 5, 2016

Rusty Bird:

@adrelanos:

Made qubes-updates-cache work on Debian, incl. Whonix gateways pending PRs Whonix/whonix-firewall#1 and Whonix/qubes-whonix#2

This is done btw.

Looks like whonix-gw-firewall needs a version bump, and qubes-whonix 5.3-1 hasn't been uploaded yet?

Yes. The usual ETA to reach Whonix stable users is the next release,
Whonix 14.


andrewdavidwong added a commit that referenced this issue Jul 7, 2016

adrelanos (Member) commented Jul 26, 2016

  • I'll release a qubes-whonix package with your qubes-updates-cache changes soon. (Currently in the developers repository; contains some other fixes.)
  • Shouldn't writing to /etc, i.e. /etc/systemd/system/multi-user.target.wants/qubes-updates-cache.service, be avoided, and the standard distribution systemd folder /lib/systemd/system be used instead?
  • Where is the code for qubes-updates-cache.service?

marmarek (Member) commented Jul 26, 2016

Where is the code for qubes-updates-cache.service?

Here: https://github.com/rustybird/qubes-updates-cache/blob/master/usr/lib/systemd/system/qubes-updates-cache.service

Should not writing to /etc i.e. /etc/systemd/system/multi-user.target.wants/qubes-updates-cache.service better be avoided and standard distribution default systemd folders /lib/systemd/system be used?

The standard way is to create such a symlink in the post-installation script (preferably using presets). But since the service is controlled by qvm-service, it may indeed be a good idea to provide the symlink in the package. In such a case it should live in /lib/systemd/system and be a relative one.

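For illustration, a relative symlink of that kind could be shipped in the package roughly as follows ($DESTDIR stands for the package build root; whether a wants/ directory under /lib/systemd/system is the right place is a packaging decision):

mkdir -p "$DESTDIR/lib/systemd/system/multi-user.target.wants"
ln -s ../qubes-updates-cache.service \
    "$DESTDIR/lib/systemd/system/multi-user.target.wants/qubes-updates-cache.service"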

rustybird commented Jul 26, 2016

Short update:

  • The TCP_SWAPFAIL_MISS cache corruption still happens. It looks like Squid 3 just cannot deal with unclean shutdowns.
  • Squid 4 fixes the memory leak! But the latest beta (4.0.12) is still too crashy to use.

@marmarek:

But since the service is controlled by qvm-service, it may be indeed good idea to provide the symlink in the package. In such a case it should live in /lib/systemd/system and be relative one.

It's currently created in /etc, as if qubes-updates-cache.service was listed in https://github.com/QubesOS/qubes-core-agent-linux/blob/master/vm-systemd/75-qubes-vm.preset just like qubes-updates-proxy.service.

But I'll have to move at least the actual qubes-updates-cache.service to $(pkg-config --variable=systemdsystemunitdir systemd) anyway, since installing it to /usr/lib/systemd/system is wrong for Debian. Then the symlink could be moved there, too.


andrewdavidwong added a commit that referenced this issue Jul 30, 2016

Rudd-O commented Aug 4, 2016

I would really like to urge folks to develop a custom cache solution using the very mature Go libraries that exist for HTTP and proxying. It will be memory-safe (no pointer bullshit), it will be far smaller than trying to shoehorn Squid into this role, and it will be trivial to provide a proper solution that caches requested file names based on content.


adrelanos (Member) commented Aug 14, 2016

@rustybird:

Looks like whonix-gw-firewall needs a version bump, and qubes-whonix 5.3-1 hasn't been uploaded yet?

@adrelanos:

I'll release a qubes-whonix package with your qubes-updates-cache changes soon. (Currently in developers repository, contains some other fixes.)

It's been in the Whonix jessie (stable) repository for a few days now. (And if you reinstall Qubes-Whonix 13 from the qubes-templates-community repository, it is also included.)


andrewdavidwong added a commit that referenced this issue Aug 14, 2016

andrewdavidwong added this to the Release 4.0 milestone on Dec 24, 2016

rustybird commented Mar 18, 2017

The latest qubes-updates-cache has many new rewriting rules that transparently upgrade repository URLs to HTTPS, and optionally to .onion (#2576).

Current coverage:

Repository        HTTP → HTTPS    → .onion
yum.Qubes         upgrade         upgrade to v3
deb.Qubes         upgrade         upgrade to v3
Whonix            upgrade         upgrade to v3
Debian            upgrade         upgrade to v2
Debian Security   upgrade         upgrade to v2
Fedora            upgrade         -
RPM Fusion        upgrade         -
Tor Project       upgrade         upgrade to v2
Google            upgrade         -
Fedora-Cisco      uncached        -
Adobe             -               -


andrewdavidwong added a commit that referenced this issue Mar 18, 2017
