
Aerospike pods consume a lot of RAM during migration, which is not released after migration #41

Closed
mdnfiras opened this issue Aug 25, 2022 · 5 comments


mdnfiras commented Aug 25, 2022

Platform: GKE
Aerospike container version: aerospike/aerospike-server:5.5.0.7


Aerospike pods consume a lot of RAM during migration, and the memory is not released once migration completes. The graph below shows container memory usage / container memory limit (the limit is 5Gi):
[graph: container memory usage as a fraction of the 5Gi container memory limit]

We run a cluster of 3 Aerospike Community Edition pods.
Pod resources:

resources:
  limits:
    cpu: 2
    memory: 5Gi
  requests:
    cpu: 500m
    memory: 4Gi

Some of the Aerospike Helm values:

aerospikeNamespaceMemoryGB: "3"
aerospikeReplicationFactor: "2"
aerospikeConfFile: |
  # Aerospike database configuration file.
  # This stanza must come first.
  service { ... redacted ... }
  logging { ... redacted ... }
  network { ... redacted ... }
  namespace ${NAMESPACE} {
    replication-factor ${REPL_FACTOR}
    memory-size ${MEM_GB}G
    default-ttl ${DEFAULT_TTL}
    nsup-period 1d
    storage-engine device {
      file /opt/aerospike/data/${MY_POD_NAME}-${NAMESPACE}.dat
      filesize 35G
      data-in-memory false # false: do not keep data in memory in addition to the file
      write-block-size 512K
    }
  }
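
For reference, a rough sizing sketch (assuming the usual 64 bytes of primary index per record; with data-in-memory false the namespace memory budget is essentially just the primary index):

  memory-size 3G / 64 B per entry       ≈ 50 M records of index capacity per node
  5Gi container limit - 3G memory-size  ≈ 2 GiB headroom for heap overhead and buffers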

At the start of the migration, running kubectl top -n aerospike pod aerospike-v2-aerospike-0 shows:

NAME                         CPU(cores)   MEMORY(bytes)   
aerospike-v2-aerospike-0   2098m        1442Mi

At the end of the migration it shows:

NAME                         CPU(cores)   MEMORY(bytes)   
aerospike-v2-aerospike-0   27m          4336Mi
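
To tell whether the extra memory is live inside Aerospike's heap or merely retained by the process, one option is to compare kubectl top with the server's own heap statistics (a sketch; assumes asinfo is available in the pod, as it is in the official aerospike-server images):

  kubectl exec -n aerospike aerospike-v2-aerospike-0 -- \
    asinfo -v 'statistics' -l | grep -E 'heap_(allocated|active|mapped)_kbytes|heap_efficiency_pct'

A heap_mapped_kbytes value far above heap_allocated_kbytes points at memory the process has mapped but is not using for live objects.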

Output of asadm -e "info" after migration:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2022-08-25 12:34:56 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
               Cluster|                                                                              Node|         Node ID|              IP|    Build|Migrations|~~~~~~~~~~~~~~~~~~Cluster~~~~~~~~~~~~~~~~~~|Client|  Uptime
                      |                                                                                  |                |                |         |          |Size|         Key|Integrity|      Principal| Conns|        
aerospike-v2-aerospike|aerospike-v2-aerospike-0.aerospike-v2-aerospike.aerospike.svc.cluster.local:3000| BB91617E999F27A|10.48.18.33:3000|C-5.5.0.7|   0.000  |   3|601ECA002FD7|True     |BB9ED2135EE683E|    19|00:21:47
aerospike-v2-aerospike|aerospike-v2-aerospike-1.aerospike-v2-aerospike.aerospike.svc.cluster.local:3000| BB913F72779F4F6|10.48.9.124:3000|C-5.5.0.7|   0.000  |   3|601ECA002FD7|True     |BB9ED2135EE683E|    17|00:21:20
aerospike-v2-aerospike|aerospike-v2-aerospike-2.aerospike-v2-aerospike.aerospike.svc.cluster.local:3000|*BB9ED2135EE683E|10.48.30.31:3000|C-5.5.0.7|   0.000  |   3|601ECA002FD7|True     |BB9ED2135EE683E|    18|00:21:52
Number of rows: 3

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Usage Information (2022-08-25 12:34:56 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace|                                                                              Node|   Total|Expirations|Evictions|  Stop|~~~~~~~~~~~Disk~~~~~~~~~~~|~~~~~~~~~~~Memory~~~~~~~~~~|~Primary~
         |                                                                                  | Records|           |         |Writes|    Used|Used%|HWM%|Avail%|      Used|Used%|HWM%|Stop%|~~Index~~
         |                                                                                  |        |           |         |      |        |     |    |      |          |     |    |     |     Type
nyris    |aerospike-v2-aerospike-0.aerospike-v2-aerospike.aerospike.svc.cluster.local:3000| 7.511 M|    0.000  |  0.000  |False |2.669 GB|    8|   0|    91|458.418 MB|   15|   0|   90|undefined
nyris    |aerospike-v2-aerospike-1.aerospike-v2-aerospike.aerospike.svc.cluster.local:3000| 7.680 M|    0.000  |  0.000  |False |2.729 GB|    8|   0|    91|468.728 MB|   16|   0|   90|undefined
nyris    |aerospike-v2-aerospike-2.aerospike-v2-aerospike.aerospike.svc.cluster.local:3000| 7.536 M|    0.000  |  0.000  |False |2.678 GB|    8|   0|    91|459.940 MB|   15|   0|   90|undefined
nyris    |                                                                                  |22.726 M|    0.000  |  0.000  |      |8.075 GB|     |    |      |  1.355 GB|     |    |     |         
Number of rows: 3

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Object Information (2022-08-25 12:34:56 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace|                                                                              Node|Rack|  Repl|   Total|~~~~~~~~~~~Objects~~~~~~~~~~~|~~~~~~~~~Tombstones~~~~~~~~|~~~~Pending~~~~
         |                                                                                  |  ID|Factor| Records|  Master|   Prole|Non-Replica| Master|  Prole|Non-Replica|~~~~Migrates~~~
         |                                                                                  |    |      |        |        |        |           |       |       |           |     Tx|     Rx
nyris    |aerospike-v2-aerospike-0.aerospike-v2-aerospike.aerospike.svc.cluster.local:3000|   0|     2| 7.511 M| 3.684 M| 3.827 M|    0.000  |0.000  |0.000  |    0.000  |0.000  |0.000  
nyris    |aerospike-v2-aerospike-1.aerospike-v2-aerospike.aerospike.svc.cluster.local:3000|   0|     2| 7.680 M| 3.930 M| 3.749 M|    0.000  |0.000  |0.000  |    0.000  |0.000  |0.000  
nyris    |aerospike-v2-aerospike-2.aerospike-v2-aerospike.aerospike.svc.cluster.local:3000|   0|     2| 7.536 M| 3.749 M| 3.787 M|    0.000  |0.000  |0.000  |    0.000  |0.000  |0.000  
nyris    |                                                                                  |    |      |22.726 M|11.363 M|11.363 M|    0.000  |0.000  |0.000  |    0.000  |0.000  |0.000  
Number of rows: 3
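
A rough cross-check against the table above (assuming 64 bytes of primary index per record): ~7.5 M records per node × 64 B ≈ 460 MB, which matches the ~458 MB "Memory Used" column. Aerospike's own accounting therefore sees well under 1 GiB per node, while the container RSS sits at ~4.3 GiB.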

I have seen this post, which explains why it consumes this much memory, but I don't see why this memory is not released after the migration is done.


kportertx commented Aug 26, 2022

> I have seen this post, which explains why it consumes this much memory, but I don't see why this memory is not released after the migration is done.

That's a very stale KB article. The issue with threads loading entire partitions into memory was addressed back in Aerospike 3.x. I've requested this be reviewed internally.


mdnfiras commented Oct 6, 2022

Any ideas what the problem could be here? We are still experiencing this. Should we update to the latest version?


kportertx commented Oct 6, 2022

Could you provide the output of:

ipcs

If the migrations were from decreasing the cluster size, it is possible that the primary index added one or more stages. The primary index is allocated in 1 GiB stages (by default). Primary index memory is never freed back to the OS; the free space within the stages is managed by Aerospike.
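
For illustration only (not a tuning recommendation), the stage size is the namespace parameter index-stage-size, and at 64 bytes per index entry each default stage covers roughly 1 GiB / 64 B ≈ 16.7 M records:

  namespace ${NAMESPACE} {
    ...
    index-stage-size 1G   # default; index arena stages are allocated in chunks of this size
  }

At ~7.5 M records per node, steady state fits in a single stage; a migration peak that pushes a node past a stage boundary would allocate another stage that is never unmapped.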


mdnfiras commented Oct 7, 2022

@kportertx thanks for the response! Right now there are no migrations happening, and the output of ipcs on all 3 nodes is:

# ipcs

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages    

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      

------ Semaphore Arrays --------
key        semid      owner      perms      nsems     

kportertx commented

Oh, right, this is Aerospike Community so ipcs will not show primary index stages because the primary index isn't stored in shared memory. We do not have a metric that shows how many stages the primary index has allocated, so I cannot definitively say that this is the cause in your case, but I believe it is the most likely explanation.
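
If you want to sanity-check this from inside a pod, comparing the asd process RSS with the server's heap statistics can hint at how much memory is mapped but idle (a sketch; assumes asd runs as PID 1 in the container, as in the official image):

  kubectl exec -n aerospike aerospike-v2-aerospike-0 -- sh -c \
    'grep VmRSS /proc/1/status; asinfo -v statistics -l | grep heap_mapped_kbytes'

If RSS roughly tracks heap_mapped_kbytes while the namespace "Memory Used" stays low, the gap is retained allocations (such as index stages) rather than live records.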
