Thanks a lot for sharing this great work with the community and for maintaining it over time!
Describe the feature
I'm trying to get stats per requested FQDN (for CONNECT requests) or URL (for plain HTTP requests), such as:
number of client requests
number of client kbytes received
number of client kbytes transferred
cache hits per URL
Currently, as I understand it, the available metrics don't have labels for FQDNs or URLs.
I'm not yet very familiar with what Squid offers in terms of stats, metrics, or reports that could be used for this. I did some research in the documentation, summarized below.
Is there prior work on this topic?
Expected behavior
A new flag passed to the exporter that turns on a feature adding metric labels with the client-requested FQDN or URL.
To avoid a Prometheus cardinality explosion, the flag could select the top k FQDNs/URLs to surface as labels and group the long tail into an "other" category.
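To make the top-k idea concrete, here is a minimal sketch in Python (not the exporter's own code). The function name `top_k_with_other`, the `other_label` parameter, and the sample hosts and counts are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def top_k_with_other(counts, k, other_label="other"):
    """Keep the k largest entries and sum the remaining ones under other_label.

    `counts` maps a label value (FQDN or URL) to a number, e.g. a request count.
    """
    ranked = Counter(counts).most_common()  # sorted by count, descending
    result = dict(ranked[:k])
    tail = sum(value for _, value in ranked[k:])
    if tail:
        # Merge into any pre-existing entry that already uses the same label.
        result[other_label] = result.get(other_label, 0) + tail
    return result


# Hypothetical per-host request counts.
requests_by_host = {
    "example.com:443": 120,
    "cdn.example.net:443": 75,
    "api.example.org:443": 30,
    "rare-a.example:443": 2,
    "rare-b.example:443": 1,
}
print(top_k_with_other(requests_by_host, k=2))
```

One caveat with this approach: membership in the top k can change between scrapes, so a given host may move in and out of the "other" bucket over time.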
Additional context
Research in the Squid documentation on available per-FQDN/URL stats and reports
A Cache Digest is a summary of the contents of an Internet Object Caching Server. It contains, in a compact (i.e. compressed) format, an indication of whether or not particular URLs are in the cache.
Enabling Cache Digests
If you wish to use Cache Digests (available in Squid version 2) you need to add a configure option, so that the relevant code is compiled in:
./configure --enable-cache-digests ...
This is an example from a default build of Squid-3.2. Remember the menu varies with available features.
index Cache Manager Interface public
menu Cache Manager Menu public
offline_toggle Toggle offline_mode setting hidden
shutdown Shut Down the Squid Process hidden
reconfigure Reconfigure Squid hidden
rotate Rotate Squid Logs hidden
pconn Persistent Connection Utilization Histograms public
mem Memory Utilization public
diskd DISKD Stats public
squidaio_counts Async IO Function Counters public
config Current Squid Configuration hidden
comm_epoll_incoming comm_incoming() stats public
ipcache IP Cache Stats and Contents public
fqdncache FQDN Cache Stats and Contents public
idns Internal DNS Statistics public
redirector URL Redirector Stats public
external_acl External ACL stats public
http_headers HTTP Header Statistics public
info General Runtime Information public
service_times Service Times (Percentiles) public
filedescriptors Process Filedescriptor Allocation public
objects All Cache Objects public
vm_objects In-Memory and In-Transit Objects public
io Server-side network read() size histograms public
counters Traffic and Resource Counters public
peer_select Peer Selection Algorithms public
digest_stats Cache Digest and ICP blob public
5min 5 Minute Average of Counters public
60min 60 Minute Average of Counters public
utilization Cache Utilization public
histograms Full Histogram Counts public
active_requests Client-side Active Requests public
username_cache Active Cached Usernames public
openfd_objects Objects with Swapout files open public
store_digest Store Digest public
store_log_tags Histogram of store.log tags public
storedir Store Directory Stats public
store_io Store IO Interface Stats public
store_check_cachable_stats storeCheckCachable() Stats public
refresh Refresh Algorithm Statistics public
forward Request Forwarding Statistics public
cbdata Callback Data Registry Contents public
events Event Queue public
client_list Cache Client List public
asndb AS Number Database public
carp CARP information public
userhash peer userhash information public
sourcehash peer sourcehash information public
server_list Peer Cache Statistics public
config Current Squid Configuration hidden
store_log_tags Histogram of store.log tags public
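As a rough illustration of how the menu listing above could be read programmatically, the sketch below splits each line into a report name, a description, and an access level. It assumes the whitespace-separated layout shown in the excerpt and only recognizes the two access values that appear there (`public` and `hidden`); the function name `parse_menu_line` is hypothetical.

```python
def parse_menu_line(line):
    """Split one menu line into name, description and access level.

    Assumes the layout from the excerpt: first token is the report name,
    last token is the access level, everything in between is the description.
    Returns None for lines that do not fit that shape.
    """
    parts = line.split()
    if len(parts) < 3:
        return None
    name, access = parts[0], parts[-1]
    if access not in ("public", "hidden"):  # only values seen in the excerpt
        return None
    return {"name": name, "description": " ".join(parts[1:-1]), "access": access}


sample = [
    "fqdncache FQDN Cache Stats and Contents public",
    "config Current Squid Configuration hidden",
    "counters Traffic and Resource Counters public",
]
public_reports = [
    entry["name"]
    for entry in map(parse_menu_line, sample)
    if entry and entry["access"] == "public"
]
print(public_reports)
```

None of the reports in the listing is described as being keyed by requested URL or FQDN; `fqdncache` and `ipcache` are DNS caches.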
The following table details SMP support for each Cache Manager object or report. Unless noted otherwise, an aggregated statistics is either a sum, arithmetic mean, minimum, or maximum across all kids, as appropriate to represent the “whole Squid” view.
| Name | Component | Aggregated? | Comments |
| --- | --- | --- | --- |
| menu | all | yes | |
| info | Number of clients accessing cache | yes, poorly | Coordinator sums up the number of clients reported by each kid, which is usually wrong because most active clients will use more than one worker, leading to exaggerated values. Note that even without SMP, this statistics is exaggerated because the count goes down when Squid cleans up the internal client table and not when the last client connection closes. SMP amplifies that effect. |
| info | UP Time | yes | The maximum uptime across all kids is reported |
| info | other | yes | |
| server_list | all | no, but can be | If you work on aggregating these stats, please keep in mind that kids may have a different set of peers. The to-Coordinator responses should include, for each peer, a peer name and not just its “index” |
| mem | all | no, but can be | If you work on aggregating these stats, please keep in mind that kids may have a different set of memory pools. The to-Coordinator responses should include, for each pool, a pool name and not just its “index”. Full stats may exceed typical UDS message size limits (16KB). If overflows are likely, it may be a good idea to create response messages so that overflowing items are not included (in the current sort order). Another alternative is to split mgr:mem into mgr:mem (with various aggregated totals) and mgr:pools (with non-aggregated per-pool details). |
| counters | sample_time | yes | The latest (maximum) sample time across all kids is reported |
| refresh | all | no, but can be | |
| idns | queue | no and should not be | The kids should probably report their own queues, especially since DNS query IDs are kid-specific. |
| idns | other | no, but can be | If you work on aggregating these stats, please keep in mind that kids may have a different set of name servers. The to-Coordinator responses should include, for each name server, a server address and not just its “index”. |
| histograms | all | no, but can be | If you work on aggregating these stats, please keep typical UDS message size limits (16KB) in mind. |
| 5min | sample_start_time | yes | The earliest (minimum) sample time across all kids is reported |
| 5min | sample_end_time | yes | The latest (maximum) sample time across all kids is reported. |
| 5min | median | yes, approximately | The arithmetic mean over kids medians is reported. This is not a true median. True median reporting is possible but would require adding code to exchange and aggregate raw histograms. |
| 5min | other | yes | |
| 60min | all | | See 5min rows for component details. |
| utilization | all | no, but can be | If you work on aggregating these stats, please reuse or mimic mgr:5min/60min aggregation code. |
| other | all | varies | TBD. In general, statistics inside "by kidK {...}" blobs are not aggregated while all others are. |
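The note on the 5min median row (the mean over kids' medians is not a true median) can be illustrated with a small worked example. The per-kid sample values below are made up, and the function names are hypothetical; the sketch only shows how far the two numbers can drift apart when kids see very different traffic.

```python
from statistics import mean, median


def mean_of_medians(samples_by_kid):
    """What the table describes: average each kid's own median."""
    return mean(median(samples) for samples in samples_by_kid.values())


def true_median(samples_by_kid):
    """Median over the pooled raw samples from all kids."""
    return median(v for samples in samples_by_kid.values() for v in samples)


# Made-up service times (ms) for two kids with very different workloads.
kid_samples = {
    "kid1": [10, 12, 14],
    "kid2": [11, 13, 500, 900, 1000],
}
print(mean_of_medians(kid_samples))  # (12 + 500) / 2
print(true_median(kid_samples))      # middle of the 8 pooled values
```

With these numbers the mean of medians is 256 while the pooled median is 13.5, so any metric derived from the aggregated median should be read as an approximation.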
The only way I know of to get per-URL information from Squid is to analyze the Squid logs. This exporter, however, relies only on the Squid cache object, which does not provide any information about URLs.
I also briefly looked into the cache digest, but it seems to provide information for a given URL and not the other way around. That said, there might be a way to get this information from the Squid cache object, likely by modifying the cache object to include, for example, the top 100 origins, but that's a large project by itself.
If you have more concrete ideas for implementing this feature in this project, I'll be happy to discuss options.
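For the log-analysis route mentioned above, here is a minimal Python sketch that aggregates requests, bytes, and cache hits per host. It assumes Squid's default native `access.log` layout (timestamp, elapsed, client, result/status, bytes, method, URL, ...). The function names and sample log lines are hypothetical, and counting any result code containing `HIT` as a cache hit is a simplification.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host_of(method, url):
    """CONNECT targets look like host:port; other requests carry a full URL."""
    if method == "CONNECT":
        return url.rsplit(":", 1)[0].lower()
    return (urlsplit(url).hostname or "unknown").lower()


def aggregate(lines):
    """Sum requests, bytes sent to the client, and cache hits per host."""
    stats = defaultdict(lambda: {"requests": 0, "bytes": 0, "hits": 0})
    for line in lines:
        fields = line.split()
        if len(fields) < 7 or not fields[4].isdigit():
            continue  # skip lines that do not match the assumed layout
        result_code, size, method, url = fields[3], fields[4], fields[5], fields[6]
        entry = stats[host_of(method, url)]
        entry["requests"] += 1
        entry["bytes"] += int(size)
        if "HIT" in result_code.split("/")[0]:  # simplification, see lead-in
            entry["hits"] += 1
    return dict(stats)


# Made-up log lines in the assumed native format.
sample_log = [
    "1700000000.123     45 10.0.0.5 TCP_TUNNEL/200 5120 CONNECT example.com:443 - HIER_DIRECT/93.184.216.34 -",
    "1700000001.456      3 10.0.0.5 TCP_MEM_HIT/200 2048 GET http://example.org/logo.png - HIER_NONE/- image/png",
    "1700000002.789    120 10.0.0.6 TCP_MISS/200 1024 GET http://example.org/index.html - HIER_DIRECT/93.184.216.34 text/html",
]
for host, values in aggregate(sample_log).items():
    print(host, values)
```

Keying on the host rather than the full URL keeps the number of distinct label values much smaller; the per-host totals could then be trimmed further to the top k before being exposed as metrics.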
I started searching for ways to automatically parse the Squid access logs and surface them as metrics. So far I have only found the following (unmaintained) repos:
Documentation referenced in the issue description: Squid report content, https://wiki.squid-cache.org/Features/CacheManager/Index (Cache Manager objects or reports)