prometheus: Unable to create mmap-ed active query log #21

Closed
masterkain opened this issue Oct 23, 2019 · 23 comments · Fixed by #22
Labels
bug Something isn't working

Comments

@masterkain
Contributor

hello,
I'm experiencing this issue while trying to run eks/appmesh-prometheus

prometheus/prometheus#5976

% kubectl logs appmesh-prometheus-5f9b989c88-gz5mr -n appmesh-system
level=warn ts=2019-10-23T07:02:45.771Z caller=main.go:282 deprecation_notice="'storage.tsdb.retention' flag is deprecated use 'storage.tsdb.retention.time' instead."
level=info ts=2019-10-23T07:02:45.771Z caller=main.go:329 msg="Starting Prometheus" version="(version=2.12.0, branch=HEAD, revision=43acd0e2e93f9f70c49b2267efa0124f1e759e86)"
level=info ts=2019-10-23T07:02:45.771Z caller=main.go:330 build_context="(go=go1.12.8, user=root@7a9dbdbe0cc7, date=20190818-13:53:16)"
level=info ts=2019-10-23T07:02:45.771Z caller=main.go:331 host_details="(Linux 4.14.146-119.123.amzn2.x86_64 #1 SMP Mon Sep 23 16:58:43 UTC 2019 x86_64 appmesh-prometheus-5f9b989c88-gz5mr (none))"
level=info ts=2019-10-23T07:02:45.771Z caller=main.go:332 fd_limits="(soft=65536, hard=65536)"
level=info ts=2019-10-23T07:02:45.771Z caller=main.go:333 vm_limits="(soft=unlimited, hard=unlimited)"
level=error ts=2019-10-23T07:02:45.771Z caller=query_logger.go:82 component=activeQueryTracker msg="Error opening query log file" file=data/queries.active err="open data/queries.active: permission denied"
panic: Unable to create mmap-ed active query log

I've installed with:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus
  namespace: appmesh-system
  labels:
    app.kubernetes.io/name: appmesh-prometheus
spec:
  storageClassName: gp2
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
EOF
helm3 upgrade -i appmesh-prometheus eks/appmesh-prometheus \
--namespace appmesh-system \
--set retention=12h \
--set persistentVolumeClaim.claimName=prometheus
@MayuraRam

component=activeQueryTracker msg="Error opening query log file" file=data/queries.active err="open data/queries.active: permission denied"
panic: Unable to create mmap-ed active query log

I installed Prometheus using Helm last week and the problem still exists. How can I resolve this issue?

@stefanprodan
Collaborator

Can you post here the init container logs?

kubectl -n appmesh-system logs deploy/appmesh-prometheus -c chown

@thomaslange24

I am running into the same problem:

level=info ts=2019-12-02T13:07:43.871Z caller=main.go:296 msg="no time or size retention was set so using the default time retention" duration=15d
level=info ts=2019-12-02T13:07:43.871Z caller=main.go:332 msg="Starting Prometheus" version="(version=2.14.0, branch=HEAD, revision=edeb7a44cbf745f1d8be4ea6f215e79e651bfe19)"
level=info ts=2019-12-02T13:07:43.871Z caller=main.go:333 build_context="(go=go1.13.4, user=root@df2327081015, date=20191111-14:27:12)"
level=info ts=2019-12-02T13:07:43.871Z caller=main.go:334 host_details="(Linux 3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Sep 26 15:12:11 UTC 2018 x86_64 prometheus-6d9c45679-6dp5x (none))"
level=info ts=2019-12-02T13:07:43.871Z caller=main.go:335 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-12-02T13:07:43.871Z caller=main.go:336 vm_limits="(soft=unlimited, hard=unlimited)"
level=error ts=2019-12-02T13:07:43.872Z caller=query_logger.go:85 component=activeQueryTracker msg="Error opening query log file" file=/opt/archive/monitoring/prometheus_data_psr/queries.active err="open /opt/archive/monitoring/prometheus_data_psr/queries.active: permission denied"
panic: Unable to create mmap-ed active query log

goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker(0x7ffd7dcaead8, 0x2b, 0x14, 0x2b4f400, 0xc0006e2180, 0x2b4f400)
/app/promql/query_logger.go:115 +0x48c
main.main()
/app/cmd/prometheus/main.go:364 +0x5229
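
The logs in this thread differ mainly in the `file=` field of the error line, which names the file Prometheus could not open and therefore the directory whose permissions need checking. A minimal sketch for pulling that directory out of a log stream follows. The `query_log_dir` helper is hypothetical (it is not part of Prometheus or kubectl) and assumes the logfmt layout shown in the logs above.

```shell
#!/bin/sh
# query_log_dir (hypothetical helper): read Prometheus log lines on stdin and
# print the directory part of every path found in a " file=" field.
# Lines without a " file=" field are ignored.
query_log_dir() {
  sed -n 's/.* file=\([^ ]*\).*/\1/p' | while IFS= read -r path; do
    dirname "$path"
  done
}
```

It could be fed from the command already used in this thread, for example `kubectl logs <pod> | query_log_dir`. A relative result such as `data` is relative to the working directory of the Prometheus process.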

@stefanprodan
Collaborator

@thomaslange24 Can you post the Helm chart version and the init container logs? The command to print the logs is posted above.

@thomaslange24

I did not use Helm for this.

@Form1ca

Form1ca commented Dec 9, 2019

kubectl -n appmesh-system logs deploy/appmesh-prometheus -c chown
Error from server (NotFound): namespaces "appmesh-system" not found

I installed Prometheus with Helm 3 and it does not work:

kubectl logs prometheus-1575844324-server-6899f6b7fd-wk2r4 prometheus-server
level=info ts=2019-12-09T17:55:47.209Z caller=main.go:332 msg="Starting Prometheus" version="(version=2.13.1, branch=HEAD, revision=6f92ce56053866194ae5937012c1bec40f1dd1d9)"
level=info ts=2019-12-09T17:55:47.209Z caller=main.go:333 build_context="(go=go1.13.1, user=root@88e419aa1676, date=20191017-13:15:01)"
level=info ts=2019-12-09T17:55:47.209Z caller=main.go:334 host_details="(Linux 4.19.76-linuxkit #1 SMP Thu Oct 17 19:31:58 UTC 2019 x86_64 prometheus-1575844324-server-6899f6b7fd-wk2r4 (none))"
level=info ts=2019-12-09T17:55:47.209Z caller=main.go:335 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-12-09T17:55:47.209Z caller=main.go:336 vm_limits="(soft=unlimited, hard=unlimited)"
level=error ts=2019-12-09T17:55:47.209Z caller=query_logger.go:85 component=activeQueryTracker msg="Error opening query log file" file=/data/queries.active err="open /data/queries.active: permission denied"
panic: Unable to create mmap-ed active query log

goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker(0x7ffc9b34dd4c, 0x5, 0x14, 0x29db1e0, 0xc00066f350, 0x29db1e0)
/app/promql/query_logger.go:115 +0x48c
main.main()
/app/cmd/prometheus/main.go:364 +0x5229

@ZDV2622

ZDV2622 commented Dec 16, 2019

Same problem. I installed Prometheus with Helm 3:

level=info ts=2019-12-16T23:26:38.040Z caller=main.go:332 msg="Starting Prometheus" version="(version=2.13.1, branch=HEAD, revision=6f92ce56053866194ae5937012c1bec40f1dd1d9)"
level=info ts=2019-12-16T23:26:38.040Z caller=main.go:333 build_context="(go=go1.13.1, user=root@88e419aa1676, date=20191017-13:15:01)"
level=info ts=2019-12-16T23:26:38.040Z caller=main.go:334 host_details="(Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 prometheus-1576123597-server-695877fbd6-hw4g8 (none))"
level=info ts=2019-12-16T23:26:38.040Z caller=main.go:335 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-12-16T23:26:38.040Z caller=main.go:336 vm_limits="(soft=unlimited, hard=unlimited)"
level=error ts=2019-12-16T23:26:38.041Z caller=query_logger.go:85 component=activeQueryTracker msg="Error opening query log file" file=/data/queries.active err="open /data/queries.active: permission denied"
panic: Unable to create mmap-ed active query log

goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker(0x7fff4038c327, 0x5, 0x14, 0x29db1e0, 0xc0008a56e0, 0x29db1e0)
/app/promql/query_logger.go:115 +0x48c
main.main()
/app/cmd/prometheus/main.go:364 +0x5229

@xriser

xriser commented Feb 29, 2020

I installed Prometheus with Helm 3 and hit the same permission issue:

msg="Error opening query log file" file=/data/queries.active err="open /data/queries.active: permission denied"
panic: Unable to create mmap-ed active query log

@vijay-jindal

Hi, I ran into the same issue today and found a solution.

The error:
level=error ts=2020-03-05T06:17:05.264Z caller=query_logger.go:82 component=activeQueryTracker msg="Error opening query log file" file=data/queries.active err="open data/queries.active: permission denied"

The solution:
I solved it by using an initContainer to change the permissions on the volume backing the PersistentVolumeClaim:

spec:
  containers:
    - name: prometheus
      image: docker.io/prom/prometheus:v2.12.0
      volumeMounts:
      - mountPath: "/prometheus/data"
        name: prometheus-data
  initContainers:
  - name: prometheus-data-permission-fix
    image: busybox
    command: ["/bin/chmod","-R","777", "/data"]
    volumeMounts:
    - name: prometheus-data
      mountPath: /data
  volumes:
    - name: prometheus-data
      persistentVolumeClaim:
        claimName: prometheus-pvc

Thanks to this Medium article, which I used to solve the permission issue: https://medium.com/faun/digitalocean-kubernetes-and-volume-permissions-820f46598965

@khdevel

khdevel commented Mar 31, 2020

Today I tried your solution, @vijay-jindal, on my EKS (v1.15) cluster with Prometheus server, and it did not work. I am using aws-efs-csi-driver and still get the message "chmod: /data: Operation not permitted".

@vijay-jindal

Hi @khdevel, the above solution worked for me. Please make sure the volume mounts are correct and that the names used in volumeMounts match the names under volumes.
Apart from the above, I also gave that node permission for the persistent volume in GCS. Check how to give a node permission for a persistent volume; maybe that can help you.

@khdevel

khdevel commented Mar 31, 2020

I believe you, @vijay-jindal, but in my case I even tried a very trivial example from aws-efs-csi-driver/examples/kubernetes/multiple_pods/specs. I changed the pod2.yaml manifest:

apiVersion: v1
kind: Pod
metadata:
  name: app2
spec:
  securityContext:
    fsGroup: 65534
    runAsGroup: 65534
    runAsNonRoot: true
    runAsUser: 65534
  containers:
  - name: app2
    image: busybox
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo $(date -u) >> /data/out2.txt; sleep 5; done"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /data
  initContainers:
  - name: app-data-permission-fix
    image: busybox
    command: ["sh", "-c", "/bin/chmod -R g+rwX /appdata"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /appdata
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: efs-claim-app

As far as I can tell, everything looks fine. I did not set any special permissions on my AWS EFS. Could you take a brief look at my case?

Thank you!

@vijay-jindal

vijay-jindal commented Mar 31, 2020

@khdevel, I went through your pod2.yaml file. It only has two busybox containers. Where is the Prometheus container that has the data directory?
Also make sure your mount path is right; I had trouble with the mount path initially.
I haven't worked with AWS EFS, but you can check how to give node permissions in case that is needed. The issue seems to be on the Prometheus side, because by default there is no root access to the data directory.
If the error is only about "access denied" on the data directory, then I don't think you need to worry about node permissions.

The error message in your first comment, "chmod: /data: Operation not permitted", means that you are not able to run chmod itself.
Most likely the issue is with the mount path.

@khdevel

khdevel commented Apr 1, 2020

Thank you, @vijay-jindal, for your time. Regarding "Where is the Prometheus container": there is none in my example, because it was only meant to show how I try to force the permission change with a busybox container. The idea is the same as in my Prometheus chart.

I found the problem. Your code does work, but not in my case, because my Prometheus Pods have the following securityContext:

securityContext:
    fsGroup: 65534
    runAsGroup: 65534
    runAsNonRoot: true
    runAsUser: 65534

This applies to every container in the Pod, including the initContainer. From "Set the security context for a Pod":

To specify security settings for a Pod, include the securityContext field in the Pod specification. The securityContext field is a PodSecurityContext object. The security settings that you specify for a Pod apply to all Containers in the Pod.

By default, AWS EFS mounts have the permissions 0755 root:root, so a non-root user (like 65534) cannot change them. To handle this, I added a securityContext with uid and gid equal to 0 to my initContainer (see "Set the security context for a Container"), and it worked for me too:

apiVersion: v1
kind: Pod
metadata:
  name: app2
spec:
  containers:
  - name: app2
    image: busybox
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo $(date -u) >> /data/out2.txt; sleep 5; done"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /data
  securityContext:
    fsGroup: 65534
    runAsGroup: 65534
    runAsNonRoot: true
    runAsUser: 65534
  initContainers:
  - name: app-data-permission-fix
    image: busybox
    command: ["/bin/chmod","-R","777","/appdata"]
    volumeMounts:
    - name: persistent-storage
      mountPath: /appdata
    securityContext:
      runAsGroup: 0
      runAsNonRoot: false
      runAsUser: 0
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: efs-claim-app
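
The permission reasoning above can be checked with a small sketch. The `can_write` helper below is hypothetical and deliberately simplified: it models only the classic owner/group/other mode bits and treats uid 0 as always allowed, ignoring supplementary groups (including `fsGroup`), ACLs and any root squashing on the storage side. Under those assumptions it shows why uid 65534 cannot create files in a 0755 root:root directory while an initContainer running as uid 0 can.

```shell
#!/bin/sh
# can_write UID GID OWNER_UID OWNER_GID MODE   (hypothetical helper)
# Prints "yes" if a process with UID/GID could create a file in a directory
# owned by OWNER_UID:OWNER_GID with octal MODE (e.g. 755), otherwise "no".
# Simplified model: root always may; no supplementary groups or ACLs.
can_write() {
  uid=$1 gid=$2 owner=$3 group=$4 mode=$5
  if [ "$uid" -eq 0 ]; then
    echo yes
    return 0
  fi
  m=$((0$mode))                          # leading 0 makes the shell read it as octal
  if [ "$uid" -eq "$owner" ]; then
    bits=$(( (m >> 6) & 7 ))             # owner permission bits
  elif [ "$gid" -eq "$group" ]; then
    bits=$(( (m >> 3) & 7 ))             # group permission bits
  else
    bits=$(( m & 7 ))                    # "other" permission bits
  fi
  # Creating a file needs both write (2) and search (1) on the directory.
  if [ $(( bits & 3 )) -eq 3 ]; then echo yes; else echo no; fi
}
```

For example, `can_write 65534 65534 0 0 755` models the Prometheus user on a default EFS mount, and `can_write 0 0 0 0 755` models the root initContainer.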

@vijay-jindal

@khdevel glad to know you solved it. I also had doubts about the security context, but didn't think it could be the whole issue.
Have a good day!!

sijie pushed a commit to apache/pulsar-helm-chart that referenced this issue Apr 29, 2020
### Motivation

As seen below, there is a fix for one of the Grafana dashboards that are currently broken in this project (available since version 0.0.5):
- [The Pulsar-topics metrics can't load in Grafana](streamnative/charts#49)

Additionally, upgrading Prometheus to the latest version improves performance as seen here: https://prometheus.io/blog/2017/11/08/announcing-prometheus-2-0

### Modifications

Bring Docker images to their most up-to-date version (streamnative/apache-pulsar-grafana-dashboard-k8s:0.0.6, prom/prometheus:v2.17.2) to fix the following issues:
- streamnative/charts#49 <- fixes Pulsar-topics metrics failure to load
- prometheus/prometheus#2859 <- prevent escalation vulnerabilities by defaulting to the `nobody` user

**Note**: upgrading to the latest version of Prometheus (currently v2.17.2) caused the pod to fail with the following error: `open /prometheus/queries.active: permission denied`. In order to fix this issue I followed the instructions from these 2 comments:

- [Permission denied UID/GID solution](prometheus/prometheus#5976 (comment))
- [Unable to create mmap-ed active query log securityContext fix](aws/eks-charts#21 (comment))

### Verifying this change

- [x] Make sure that the change passes the CI checks.
@r-trigo

r-trigo commented Aug 3, 2020

I had to chown nobody:nogroup in order to solve the issue.
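
As a rough sketch of that ownership fix, the hypothetical `fix_data_dir` helper below hands a data directory to a given owner and then narrows the mode instead of opening it to everyone with 777. It assumes it runs with enough privilege to change ownership (for example in an initContainer running as root) and that the target IDs are the numeric 65534:65534 used elsewhere in this thread; numeric IDs avoid depending on whether the image defines the names `nobody` and `nogroup`.

```shell
#!/bin/sh
# fix_data_dir DIR OWNER GROUP   (hypothetical helper)
# Recursively give DIR to OWNER:GROUP, then set owner read/write, group
# read-only and no access for others. The capital X adds the search bit on
# directories (and on files that are already executable) only.
fix_data_dir() {
  dir=$1 owner=$2 group=$3
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  chown -R "$owner:$group" "$dir" || return 1
  chmod -R u=rwX,g=rX,o= "$dir" || return 1
}

# Example call inside an init container (path and IDs are assumptions):
#   fix_data_dir /prometheus 65534 65534
```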

@itninja-hue

For stable/prometheus-operator, replace initContainers: [] in values.yaml with the following:

    initContainers:
      - name: set-data-dir-ownership
        image: alpine:3
        command:
          - chown
          - -R
          - 65534:65534
          - /mnt
        volumeMounts:
        - name: prometheus-promethus-prometheus-opera-prometheus-db
          mountPath: /mnt
          subPath: /prometheus

@arashkaffamanesh

If you use the kube-prometheus Helm chart, you need to adapt the securityContext in values.yaml as follows and create a PV:

  ## SecurityContext configuration
  ##
  securityContext:
    enabled: true
    # changed / adapted for efs
    # runAsUser: 1001
    # fsGroup: 1001
    runAsUser: 0
    fsGroup: 0

apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-kube-prometheus-prometheus-db-0
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 8Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: prometheus-kube-prometheus-prometheus-db-prometheus-kube-prometheus-prometheus-0
    namespace: monitoring
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-xxxxxxxxxxxxx
  persistentVolumeReclaimPolicy: Delete
  storageClassName: efs-sc
  volumeMode: Filesystem

@OneideLuizSchneider

OneideLuizSchneider commented Jan 25, 2021

Just change the securityContext. I had this issue in my local environment.

securityContext:
  fsGroup: 65534
  runAsGroup: 0  # <-- here
  runAsNonRoot: false
  runAsUser: 0  # <-- here

@hakanqq66

level=info ts=2021-08-10T08:38:15.183Z caller=main.go:389 msg="No time or size retention was set so using the default time retention" duration=15d
level=info ts=2021-08-10T08:38:15.183Z caller=main.go:443 msg="Starting Prometheus" version="(version=2.28.1, branch=HEAD, revision=b0944590a1c9a6b35dc5a696869f75f422b107a1)"
level=info ts=2021-08-10T08:38:15.183Z caller=main.go:448 build_context="(go=go1.16.5, user=root@37280701f401, date=20210701-15:18:34)"
level=info ts=2021-08-10T08:38:15.183Z caller=main.go:449 host_details=(windows)
level=info ts=2021-08-10T08:38:15.184Z caller=main.go:450 fd_limits=N/A
level=info ts=2021-08-10T08:38:15.184Z caller=main.go:451 vm_limits=N/A
level=error ts=2021-08-10T08:38:15.185Z caller=query_logger.go:87 component=activeQueryTracker msg="Error opening query log file" file=data\queries.active err="open data\queries.active: The requested operation cannot be performed on a file with a user-mapped section open."
panic: Unable to create mmap-ed active query log

I am using wmi exporter. How can I fix this error?

@Sadiksha-svg

I solved it by creating the volume assigned in the file.

@a0s

a0s commented Aug 21, 2022

For kube-prometheus-stack and cdktf it looks like:

<ReleaseSet>{name: "prometheus.prometheusSpec.securityContext.runAsUser", value: "65534"},
<ReleaseSet>{name: "prometheus.prometheusSpec.securityContext.runAsGroup", value: "65534"},
<ReleaseSet>{name: "prometheus.prometheusSpec.securityContext.fsGroup", value: "65534"},
<ReleaseSet>{name: "prometheus.prometheusSpec.securityContext.runAsNonRoot", value: "true"},

<ReleaseSet>{name: "prometheus.prometheusSpec.initContainers[0].name", value: "fix-volume-permissions"},
<ReleaseSet>{name: "prometheus.prometheusSpec.initContainers[0].image", value: "busybox"},

<ReleaseSet>{name: "prometheus.prometheusSpec.initContainers[0].command[0]", value: "/bin/chown"},
<ReleaseSet>{name: "prometheus.prometheusSpec.initContainers[0].command[1]", value: "-R"},
<ReleaseSet>{name: "prometheus.prometheusSpec.initContainers[0].command[2]", value: "65534:65534"},
<ReleaseSet>{name: "prometheus.prometheusSpec.initContainers[0].command[3]", value: "/volume"},

<ReleaseSet>{name: "prometheus.prometheusSpec.initContainers[0].volumeMounts[0].name", value: "prometheus-kube-prometheus-stack-prometheus-db"},
<ReleaseSet>{name: "prometheus.prometheusSpec.initContainers[0].volumeMounts[0].mountPath", value: "/volume"},

<ReleaseSet>{name: "prometheus.prometheusSpec.initContainers[0].securityContext.runAsGroup", value: "0"},
<ReleaseSet>{name: "prometheus.prometheusSpec.initContainers[0].securityContext.runAsNonRoot", value: "false"},
<ReleaseSet>{name: "prometheus.prometheusSpec.initContainers[0].securityContext.runAsUser", value: "0"},

@chq1234

chq1234 commented Mar 4, 2024

ts=2024-03-04T09:02:26.024Z caller=main.go:509 level=warn deprecation_notice="'storage.tsdb.retention' flag is deprecated use 'storage.tsdb.retention.time' instead."
ts=2024-03-04T09:02:26.024Z caller=main.go:564 level=info msg="Starting Prometheus Server" mode=server version="(version=2.43.1, branch=HEAD, revision=e278195e3983c966c2a0f42211f62fa8f40c5561)"
ts=2024-03-04T09:02:26.024Z caller=main.go:569 level=info build_context="(go=go1.19.9, platform=linux/amd64, user=root@fdbae5f7538f, date=20230504-20:56:42, tags=netgo,builtinassets)"
ts=2024-03-04T09:02:26.024Z caller=main.go:570 level=info host_details="(Linux 4.19.90-2211.2.0.0176.oe1.x86_64 #1 SMP Wed Nov 9 11:00:21 UTC 2022 x86_64 monitor-prometheus-collect-deployment-5774bf464-lzq76 (none))"
ts=2024-03-04T09:02:26.024Z caller=main.go:571 level=info fd_limits="(soft=524288, hard=524288)"
ts=2024-03-04T09:02:26.024Z caller=main.go:572 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2024-03-04T09:02:26.031Z caller=query_logger.go:72 level=error component=activeQueryTracker msg="Failed to read query log file" err=EOF
ts=2024-03-04T09:02:26.099Z caller=query_logger.go:103 level=error component=activeQueryTracker msg="Failed to mmap" file=/prometheus/queries.active Attemptedsize=20001 err="invalid argument"
panic: Unable to create mmap-ed active query log

goroutine 1 [running]:
github.com/prometheus/prometheus/promql.NewActiveQueryTracker({0x7ffcc1ab14eb, 0xb}, 0x14, {0x3dec5c0, 0xc0006acb40})
/app/promql/query_logger.go:121 +0x3cd
main.main()
/app/cmd/prometheus/main.go:626 +0x6cf3

What are the possible reasons for this situation?
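
The thread does not answer this question. Unlike the earlier reports, this log shows the file being opened and the mmap call itself failing, so permissions may not be the cause here. One thing that may be worth ruling out, offered as an assumption rather than a confirmed diagnosis, is a data directory on a filesystem that does not support shared memory mappings. The hypothetical `describe_data_dir` helper below only gathers facts for such a check: whether the directory is writable for the current user and which filesystem type backs it. It assumes a `stat` that accepts `-f -c %T` (GNU coreutils or busybox).

```shell
#!/bin/sh
# describe_data_dir DIR   (hypothetical helper)
# Print one line with the path, whether the current user can write to it,
# and the filesystem type reported by stat ("unknown" if stat cannot tell).
describe_data_dir() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  if [ -w "$dir" ]; then writable=yes; else writable=no; fi
  fs=$(stat -f -c %T "$dir" 2>/dev/null) || fs=
  [ -n "$fs" ] || fs=unknown
  echo "path=$dir writable=$writable fs=$fs"
}

# Example call from a shell in the container (path taken from the log above):
#   describe_data_dir /prometheus
```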
