TL;DR
I want to replace some metrics currently produced by the CloudWatch Agent with aperf records stored in S3. I’m not a performance-optimization expert, so I may be missing some common best practices.
aperf seems to be designed mainly for manual performance analysis, so it lacks some features I expected. I also couldn’t find certain data that’s important for my intended use cases.
Proposed usage
Today, we rely on the CloudWatch Agent for host-level performance metrics. I’m exploring whether aperf could be used instead to collect more detailed, low-level performance data and store it in S3 for later analysis. This would allow us to have fine-grained visibility into host behavior, without being limited to the 1-minute or 1-hour granularity of CloudWatch metrics.
My goal is to collect performance records for the entire lifetime of each EC2 host running in AWS ECS.
Because I don’t control when a host is terminated, my current idea is:
- Run a cron job every minute on each host.
- Each run:
  - Records performance data for 60 seconds using aperf.
  - Uploads the resulting record to S3.
- This should minimize gaps in host-level performance history and make sure we don’t lose data when instances are terminated unexpectedly.
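To make the idea concrete, here is a rough sketch of what a single cron-triggered run could look like. It only builds the two commands and does not execute them. The `--run-name` and `--period` flags, the `<run_name>.tar.gz` output name, the bucket name, and the key layout are assumptions on my side for illustration, not confirmed aperf behavior.

```python
from datetime import datetime, timezone


def plan_run(host_id, bucket, now, period_s=60):
    """Return (record_cmd, upload_cmd) for one cron-triggered run."""
    # An absolute UTC timestamp in the run name keeps records sortable per host.
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_name = f"{host_id}-{stamp}"

    # Assumption: `aperf record` accepts --run-name and --period (in seconds).
    record_cmd = ["aperf", "record", "--run-name", run_name,
                  "--period", str(period_s)]

    # Assumption: the record ends up as <run_name>.tar.gz in the working directory.
    archive = f"{run_name}.tar.gz"

    # Hypothetical key layout: aperf/<host>/<YYYYMMDD>/<archive>
    key = f"aperf/{host_id}/{stamp[:8]}/{archive}"
    upload_cmd = ["aws", "s3", "cp", archive, f"s3://{bucket}/{key}"]
    return record_cmd, upload_cmd


# Example with a fixed time and hypothetical host/bucket names.
rec, up = plan_run("host-a", "my-perf-bucket",
                   datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
print(" ".join(rec))
print(" ".join(up))
```

Since cron start times and the 60-second recording window will not line up exactly, consecutive records may overlap slightly or leave small gaps.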
Questions / concerns
- Is this usage pattern aligned with how aperf is intended to be used? If not, could you please share any concerns you may have?
- Are there any known limitations when using aperf in this way (for example, overhead, storage, or data quality)? Based on my estimates, the record size should remain reasonable.
Features I would like to have as part of aperf
- Currently, records do not include absolute timestamps. I would like them to, and I would like to be able to merge multiple records into a single larger record, even if they overlap.
- I would appreciate native support for using S3 as the output location for records. This would remove the need to upload them manually using awscli.
- If aperf record could run in the background and periodically upload partial records to S3, I would not need to rely on cron jobs.
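To illustrate the merge behavior I have in mind, here is a minimal sketch. It assumes each record can be reduced to a list of `(absolute_timestamp, value)` samples. That structure is hypothetical and is not aperf's actual record format.

```python
def merge_records(*records):
    """Merge sample lists keyed by absolute timestamp.

    Each record is an iterable of (timestamp, value) pairs. Where records
    overlap, the sample from the later argument replaces the earlier one.
    """
    merged = {}
    for record in records:
        for ts, value in record:
            merged[ts] = value
    # Return samples in chronological order.
    return sorted(merged.items())


# Two one-minute records that overlap by one sample (hypothetical data).
first = [(1700000000, 0.42), (1700000001, 0.40)]
second = [(1700000001, 0.40), (1700000002, 0.55)]
print(merge_records(first, second))
```

Without absolute timestamps there is no shared key to deduplicate on, which is why the two requests go together.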
Data that I cannot find in the report and would like to have in the future
- Network throughput: at the moment I can only find packet counters, but I would like to see actual throughput values (for example, bytes per second).
- Disk throughput: we are using EC2 instances with NVMe disks, and disk performance is an important factor for us.
- S3 transfer throughput: we upload and download many gigabytes of data to and from S3, so it would be very helpful to have a separate graph for this traffic.
- Free and used disk space for each mount point.
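For the throughput items, this is roughly the calculation I would expect, sketched under the assumption that cumulative byte counters (such as the per-interface byte totals in Linux's /proc/net/dev) are sampled together with a timestamp. The function name and arguments are hypothetical.

```python
def throughput_bps(prev_bytes, curr_bytes, prev_ts, curr_ts):
    """Return bytes per second between two cumulative counter samples.

    Returns None if the counter went backwards (for example after a reset),
    because no meaningful rate can be derived from that pair of samples.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_bytes - prev_bytes
    if delta < 0:
        return None
    return delta / elapsed


# Two hypothetical samples taken 2 seconds apart.
print(throughput_bps(1_000, 4_000, 10.0, 12.0))
```

The same calculation would apply to disk counters once they are converted to bytes.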
I’d really appreciate your feedback on this request and any guidance on possible implementations. I’m happy to contribute and help implement these features through PRs.