basic structure
arzoo14 committed Oct 22, 2021
1 parent abcdc2c commit 35882fe
Showing 74 changed files with 21,560 additions and 21,052 deletions.
2 changes: 1 addition & 1 deletion docs/.gitignore
@@ -1 +1 @@
_build
_build
10 changes: 5 additions & 5 deletions docs/HowTo/AddNewMetrics.rst → docs/HowDoI/AddNewMetrics.rst
@@ -1,5 +1,5 @@
.. _AddNewMetrics:

How do I add new metrics to the available set ?
################################################

.. _AddNewMetrics:

How do I add new metrics to the available set ?
################################################

159 changes: 159 additions & 0 deletions docs/HowDoI/AnalyzeLinuxContainers.rst
@@ -0,0 +1,159 @@
.. _AnalyzeLinuxContainers:

How do I analyze Linux containers ?
####################################

.. contents::

Container engines like Docker, LXC, Rocket and others build on two Linux kernel facilities: cgroups and namespaces. To understand the performance characteristics of containerized environments, we need some background kernel knowledge of how these concepts affect both the system itself and system-level analysis tools like PCP.

Getting Started
----------------

1. If you don't have some handy containers, create and start one or two containers for experimentation.

2. To observe running containers:

   - Using Docker: ``docker ps -a``
   - Under libpod: ``podman ps -a``
   - With LXC: ``lxc-ls`` and ``lxc-info``

3. Check the local PCP collector installation (requires the pcp-verify utility)::

    $ pcp verify --containers

4. Request networking metrics for a host and then a container running on the host::

    $ pminfo --fetch containers.name containers.state.running

    containers.name
        inst [0 or "f4d3b90bea15..."] value "sharp_feynman"
        inst [1 or "d43eda0a7e7d..."] value "cranky_colden"
        inst [2 or "252b56e79da5..."] value "desperate_turing"

    containers.state.running
        inst [0 or "f4d3b90bea15..."] value 1
        inst [1 or "d43eda0a7e7d..."] value 0
        inst [2 or "252b56e79da5..."] value 1

    $ pmprobe -I network.interface.up
    network.interface.up 5 "p2p1" "wlp2s0" "lo" "docker0" "veth2234780"

    $ pmprobe -I --container sharp_feynman network.interface.up
    network.interface.up 2 "lo" "eth0"

    $ pmprobe -I --container f4d3b90bea15 network.interface.up
    network.interface.up 2 "lo" "eth0"

Note: these commands all query the same pmcd process - from the host running the container engine. In other words, there is no need to install any PCP software inside the monitored containers.

Control groups
---------------
Control Groups are a Linux kernel mechanism for aggregating or partitioning sets of tasks, and their children, into hierarchical groups with specialized behaviour. This is the underlying technology used for controlling the set of processes within each container.

Recall that the concept of a "container" is a user-space construct only, and it is the role of the container engine to ensure the kernel cgroup hierarchies are constructed and managed appropriately for the containers it provides.

A cgroup subsystem is kernel code that makes use of the task grouping facilities provided by cgroups to treat groups of tasks in particular ways. A subsystem is typically a "resource controller" that schedules a resource or applies per-cgroup limits. Examples of cgroup subsystems used by container engines include the virtual memory subsystem (memory), the processor accounting subsystem (cpuacct), the block accounting cgroup (blkio), and several others.

Within the scope of individual cgroup subsystems, hierarchies can be created, managed and shaped in terms of the tasks within them. A hierarchy is a set of cgroups arranged in a tree, together with a set of subsystems, such that every task in the system is in exactly one of the cgroups in the hierarchy; each subsystem has system-specific state attached to each cgroup in the hierarchy.

Each hierarchy has an instance of the cgroup virtual filesystem associated with it.

These can be interrogated by querying the PCP ``cgroup.mounts`` and ``cgroup.subsys`` metrics::

    $ pminfo --fetch cgroup.mounts.subsys cgroup.subsys.num_cgroups

    cgroup.mounts.subsys
        inst [0 or "/sys/fs/cgroup/systemd"] value "?"
        inst [1 or "/sys/fs/cgroup/cpuset"] value "cpuset"
        inst [2 or "/sys/fs/cgroup/cpu,cpuacct"] value "cpu,cpuacct"
        inst [3 or "/sys/fs/cgroup/memory"] value "memory"
        inst [4 or "/sys/fs/cgroup/devices"] value "devices"
        inst [5 or "/sys/fs/cgroup/freezer"] value "freezer"
        inst [6 or "/sys/fs/cgroup/net_cls,net_prio"] value "net_cls,net_prio"
        inst [7 or "/sys/fs/cgroup/blkio"] value "blkio"
        inst [8 or "/sys/fs/cgroup/perf_event"] value "perf_event"
        inst [9 or "/sys/fs/cgroup/hugetlb"] value "hugetlb"

    cgroup.subsys.num_cgroups
        inst [0 or "cpuset"] value 1
        inst [1 or "cpu"] value 77
        inst [2 or "cpuacct"] value 77
        inst [3 or "memory"] value 3
        inst [4 or "devices"] value 3
        inst [5 or "freezer"] value 3
        inst [6 or "net_cls"] value 1
        inst [7 or "blkio"] value 77
        inst [8 or "perf_event"] value 1
        inst [9 or "net_prio"] value 1
        inst [10 or "hugetlb"] value 1

Userspace code (i.e. container engines like Docker and LXC) can create and destroy cgroups by name in an instance of the cgroup virtual file system, specify and query to which cgroup a task is assigned, and list the task PIDs assigned to a cgroup. Those creations and assignments only affect the hierarchy associated with that instance of the cgroup file system.
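As a concrete illustration (a sketch only - paths vary with the distribution, container engine and cgroup version; this assumes cgroups v1 on a systemd-managed host), the cgroup Docker created for the first container above can be inspected directly through the virtual filesystem, and its ``tasks`` file lists the PIDs currently assigned to that cgroup::

    $ ls /sys/fs/cgroup/memory/system.slice/ | grep docker
    docker-f4d3b90bea15....scope
    $ cat /sys/fs/cgroup/memory/system.slice/docker-f4d3b90bea15....scope/tasks
    21967
    27996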

Namespaces
--------------

Completely distinct to cgroups, at least within the kernel, is the concept of namespaces. Namespaces allow different processes to have differing views of several aspects of the kernel, such as the hostname (UTS namespace), network interfaces (NET namespace), process identifiers (PID namespace), mounted filesystems (MNT namespace) and so on.
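One way to observe this from the host (a sketch - the namespace identifiers shown are illustrative, and PID 27996 is a shell running inside one of the containers from earlier) is via the per-process namespace links under ``/proc``; processes in different network namespaces resolve to different links::

    $ readlink /proc/self/ns/net
    net:[4026531992]
    $ readlink /proc/27996/ns/net
    net:[4026532277]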

When processes share a namespace, they share the same view of a resource. For example, objects created in one IPC namespace are visible to all other processes that are members of that namespace, but are not visible to processes in a different IPC namespace. So, for our purposes, the processes running in one container can have a different view of IPC resources to a process running on the host or in another container.

Returning to the first networking example in this tutorial, we can see how namespaces become important from a performance tooling point of view. For network metrics within a container, we want to be able to report values for the set of network interfaces visible within that container, instead of the set from the host itself.

Finally, it is important to note that namespaces are not a complete abstraction, in that many aspects of the underlying host remain visible from within the container. This affects performance tools in that the values exported for some metrics can be adjusted and fine-tuned relative to the container, while others cannot.

Containers and PCP
--------------------

1. Core Extensions

1.1. All connections made to the PCP metrics collector daemon (pmcd) are made using the PCP protocol, which is TCP/IP based and thus (importantly for containers) connection-oriented.

1.2. Each individual monitoring tool has a unique connection to pmcd and can request values for a specific, custom set of metrics. This includes being able to request metric values related to a specific, named container.

Note that PCP differs from the design of several other monitoring systems in this regard, which write or send out a specified set of system-wide values on a set interval.

1.3. From a user point of view, this boils down to being able to specify a container via the interface (whether command line or graphical) of the PCP monitoring tools, and to have that container name transferred to the PCP collector system. This allows for filtering and fine-tuning of the metric values it returns, such that the values are specific to the named container.
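For example, most PCP command-line utilities accept the standard ``--container`` option, so a reporting tool can be pointed at a named container directly (a brief sketch, reusing a container name from the earlier examples)::

    $ pmval --container sharp_feynman network.interface.up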

2. Elevated Privileges

2.1. Support for containers was first added in PCP version 3.10.2 (released in January 2015). This version includes the pmdaroot daemon - a critical component of the container support that must be enabled in order to monitor containers.

It performs privileged operations on behalf of other PCP agents and plays a pivotal role in informing the other agents about various attributes of the active containers that it discovers on the PCP collector system.

Verify that there is a pmdaroot line in /etc/pcp/pmcd/pmcd.conf and that the pcp command reports that it is running.
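A quick way to check both (a sketch - the exact pmcd.conf entry varies between platforms and PCP versions, and a zero ``pmcd.agent.status`` value indicates a healthy agent)::

    $ grep pmdaroot /etc/pcp/pmcd/pmcd.conf
    root    1    pipe    binary    /var/lib/pcp/pmdas/root/pmdaroot
    $ pminfo --fetch pmcd.agent.status
    pmcd.agent.status
        inst [1 or "root"] value 0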

3. Container-specific Metric Values

With that core functionality in place, several kernel agents have been taught to customize the metric values they report when the monitoring of a named container has been requested. These include the network, filesys, ipc, and other metrics in pmdalinux, as well as the per-process and cgroup metrics in pmdaproc.


To request container-specific process and control group metrics::

    $ pminfo -t --fetch --container sharp_feynman cgroup.memory.stat.cache proc.psinfo.rss pmcd.hostname

    cgroup.memory.stat.cache [Number of bytes of page cache memory]
        inst [2 or "/system.slice/docker-f4d3b90bea15..."] value 9695232

    proc.psinfo.rss [resident set size (i.e. physical memory) of the process]
        inst [21967 or "021967 dd if=/dev/random of=/tmp/bits count=200k"] value 676
        inst [27996 or "027996 /bin/bash"] value 2964

    pmcd.hostname [local hostname]
        value "f4d3b90bea15"


4. Performance Metric Domain Agents

As the underlying container technologies have matured, instrumentation has been added for analysis. For example, podman and docker have APIs to extract operational metrics, and these are available from pmdapodman and pmdadocker.

Additionally, components of container infrastructure usually expose metrics via a /metrics HTTP endpoint in the OpenMetrics (Prometheus) format. These metrics can be observed using PCP tools via pmdaopenmetrics.
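As a sketch of how such an endpoint is wired up (the URL and the ``myservice`` name here are illustrative - this assumes a service already exporting OpenMetrics data on port 9090), pmdaopenmetrics is configured by dropping a ``.url`` file into its ``config.d`` directory and then installing the agent::

    # cd $PCP_PMDAS_DIR/openmetrics
    # echo 'http://localhost:9090/metrics' > config.d/myservice.url
    # ./Install
    $ pminfo openmetrics.myservice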

Next Steps
------------

Web and Graphical Tools

In the PCP strip chart utility pmchart, connections to containers can be established using the "Add Host" dialog, shown in the figure below. The dialog can be accessed via the "New Chart" or "Open View" menu entries.

1. Specify the name of the PCP Collector system where pmcd is running.
2. Press the "Advanced" push button to enable additional connection attributes to be specified.
3. Select the "Container" check box, and enter a container name.
4. Press "OK" to establish a connection to the container on that host - this functions in much the same way as the pminfo examples from earlier in this tutorial.

---------------Add figure here---------------

PCP container metric charts using Vector <link here>
5 changes: 5 additions & 0 deletions docs/HowDoI/AuthenticatedConnections.rst
@@ -0,0 +1,5 @@
.. _AuthenticatedConnections:

How do I set up authenticated connections ?
################################################

5 changes: 5 additions & 0 deletions docs/HowDoI/AutomateProblemDetection.rst
@@ -0,0 +1,5 @@
.. _AutomateProblemDetection:

How do I automate performance problem detection ?
##################################################

5 changes: 5 additions & 0 deletions docs/HowDoI/AutomatedReasoningBasics.rst
@@ -0,0 +1,5 @@
.. _AutomatedReasoningBasics:

How do I automate reasoning with pmie ?
################################################

109 changes: 109 additions & 0 deletions docs/HowDoI/ConfigureAutomatedReasoning.rst
@@ -0,0 +1,109 @@
.. _ConfigureAutomatedReasoning:

How do I configure automated reasoning ?
################################################

.. contents::

Initial setup - Create a scenario
***********************************

1. Open a terminal and run::

    $ while true; do sleep 0; done &

2. To observe its effect on the system::

    $ pmchart -t 0.5sec -c CPU &

3. Add a new chart to the existing display, showing the process context switch rate using this metric::

    kernel.all.pswitch

4. The above test case can be quite intrusive on low processor count machines, so remember to terminate it when you've finished this tutorial::

    $ jobs
    [1]- Running    while true; do sleep 0; done &
    [2]+ Running    pmchart -t 0.5sec -c CPU &
    $ fg %1

For now, though, leave it running throughout all of the tests below.

Using pmieconf and pmie
***********************************

1. Create your own pmie rules using pmieconf::

    $ pmieconf -f myrules
    pmieconf> disable all
    pmieconf> enable cpu.context_switch
    pmieconf> modify global delta "5 sec"
    pmieconf> modify global holdoff ""
    pmieconf> modify global syslog_action no
    pmieconf> modify global user_action yes
    pmieconf> quit

After running this command sequence, it is instructive to:

- Inspect the created file *myrules*
- Consult the *pmieconf* man page
- Explore other *pmieconf* commands ("help" and "list" are useful in this context) - a brief example session follows this list
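For instance (a sketch of such an interactive session; the available rules and output depend on the installed *pmieconf* rule set)::

    $ pmieconf -f myrules
    pmieconf> help
    pmieconf> list cpu.context_switch
    pmieconf> quit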

2. Run *pmie* with the rules file created by *pmieconf*, and check whether alarm messages appear on standard output::

    $ pmie -c myrules

3. Terminate *pmie*, and use the values reported by *pmchart* to determine the average context switch rate. Then re-run *pmieconf* to adjust the threshold level up or down, altering the behaviour of *pmie*. Re-run *pmie*.

.. sourcecode:: none

    $ pmieconf -f myrules
    pmieconf> modify cpu.context_switch threshold 5000 # <-- insert suitable value here
    pmieconf> quit
    $ pmie -c myrules

Monitoring state with the *shping* PMDA
*****************************************

1. Install *pmdashping* to record system state::

    # cd $PCP_PMDAS_DIR/shping
    # ./Install


The default *shping* configuration is ``$PCP_PMDAS_DIR/shping/sample.conf``.
However, we can create a new configuration file, say ``$PCP_PMDAS_DIR/shping/my.conf``, with a shell tag and command of the form:

.. sourcecode:: none

    no-pmie    test ! -f /tmp/no-pmie

2. Monitor *pmdashping* to observe system state::

    $ pmval -t 5 shping.status

In another command shell, create the file */tmp/no-pmie*, wait ten seconds, and then remove the file, as shown below. Observe what *pmval* reports in the first window, then terminate *pmval*.
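For example::

    $ touch /tmp/no-pmie
    $ sleep 10
    $ rm /tmp/no-pmie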

Custom site rules with *pmieconf*
*********************************

1. Open an editor and edit the *pmieconf* output file created earlier, i.e. *myrules*. Append a new rule at the end (after the **END GENERATED SECTION** line) that is a copy of the **cpu.context_switch** rule.

2. To this new rule, add the following conjunct before the action line (the one containing ``->``). Modify the message in the new rule's action so that it differs from the standard rule, make sure the threshold is low enough for the predicate to be true, and then save the file.

.. sourcecode:: none

    && shping.status #'no-pmie' == 0

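For orientation, the resulting custom rule might look roughly like the following (a hand-written sketch only - the rule name, message and threshold are invented, and a real *pmieconf*-generated rule differs in detail):

.. sourcecode:: none

    my.context_switch =
        kernel.all.pswitch > 500 count/sec
        && shping.status #'no-pmie' == 0
        -> print "custom rule: context switch rate exceeded";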

3. Re-run *pmieconf* to disable the standard rule::

    $ pmieconf -f myrules
    pmieconf> disable cpu.context_switch
    pmieconf> quit

4. Inspect the re-created file *myrules*. Check your new rule is still there and the standard rule has been removed.

5. Run *pmie* using *myrules*, and verify that your new alarm messages appear on standard output. In another window, create the file */tmp/no-pmie*, wait a while, then remove the file.

Notice there may be some delay between the creation or removal of */tmp/no-pmie* and the change in *pmie* behaviour.
5 changes: 5 additions & 0 deletions docs/HowDoI/ExportMetricValues.rst
@@ -0,0 +1,5 @@
.. _ExportMetricValues:

How do I export metric values in a comma-separated format ?
############################################################

5 changes: 5 additions & 0 deletions docs/HowDoI/GraphPerformanceMetric.rst
@@ -0,0 +1,5 @@
.. _GraphPerformanceMetric:

How do I graph a performance metric ?
################################################

52 changes: 52 additions & 0 deletions docs/HowDoI/HowDoIGuide.rst
@@ -0,0 +1,52 @@
.. _AboutHowTo:

How Do I Guide
###############

.. contents::

This guide shows new users how to solve specific problems that they encounter frequently.
Its aim is to help first-time users become productive right away.

What This Guide Contains
**************************

This guide contains the following chapters:

Chapter 1, :ref:`How do I list the available performance metrics ?`

Chapter 2, :ref:`How do I add new metrics to the available set ?`

Chapter 3, :ref:`How do I record metrics on my local system ?`

Chapter 4, :ref:`How do I record metrics from a remote system ?`

Chapter 5, :ref:`How do I graph a performance metric ?`

Chapter 6, :ref:`How do I automate performance problem detection ?`

Chapter 7, :ref:`How do I setup automated rules to write to the system log ?`

Chapter 8, :ref:`How do I record historical values for use with the pcp-dstat tool ?`

Chapter 9, :ref:`How do I export metric values in a comma-separated format ?`

Chapter 10, :ref:`How do I use charts ?`

Chapter 11, :ref:`How do I manage archive log?`, covers PCP tools for creating and managing PCP archive logs.

Chapter 12, :ref:`How do I automate reasoning with pmie ?`

Chapter 13, :ref:`How do I configure automated reasoning ?`, covers customization of pmie rules using pmieconf.

Chapter 14, :ref:`How do I analyze Linux containers ?`

Chapter 15, :ref:`How do I establish secure connections ?`

Chapter 16, :ref:`How do I establish secure client connections ?`

Chapter 17, :ref:`How do I set up authenticated connections ?`

Chapter 18, :ref:`How do I import data and create PCP archives?`

Chapter 19, :ref:`How do I use 3D views?`
5 changes: 5 additions & 0 deletions docs/HowDoI/ImportData.rst
@@ -0,0 +1,5 @@
.. _ImportData:

How do I import data and create PCP archives?
################################################
