# Configuration Trees
Most configurations can be modeled as trees in which each node has a name, optional attributes, and optional children. This includes systems that use YAML, JSON, and INI as well as systems like httpd, nginx, multipath, logrotate, and many others that have custom formats. Many also have a primary configuration file with supplementary files included by special directives in the main file.

We have developed parsers for common configuration file formats as well as the custom formats of many systems. These parsers all construct a tree of the same primitive building blocks, and their combiners properly handle include directives. The final configuration for a given system is a composite of the primary and supplementary configuration files.

Since the configurations are parsed to the same primitives to build their trees, we can navigate them all using the same API.
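As a rough illustration of what "the same primitives" means, the sketch below models a node with a name, attributes, and children, and walks a small httpd-like fragment. The `Node` class, its field names, and the `names` helper are hypothetical stand-ins invented for this example; they are not the library's actual classes.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a config tree node; not the library's class."""
    name: Optional[str] = None
    attrs: List[Any] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


# A fragment of httpd configuration expressed with the toy node type.
tree = Node(children=[
    Node("ServerRoot", ["/etc/httpd"]),
    Node("Directory", ["/"], [
        Node("AllowOverride", ["none"]),
        Node("Require", ["all", "denied"]),
    ]),
])


def names(node, depth=0):
    """Yield (depth, name) pairs for every descendant of node."""
    for child in node.children:
        yield depth, child.name
        for item in names(child, depth + 1):
            yield item


# The same walk works regardless of which file format produced the tree.
listing = list(names(tree))
```

Because every parser would emit the same kind of node, a helper like `names` never needs to know whether the input was httpd, nginx, or an ini file.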

This tutorial will focus on the common API for accessing config trees. It uses httpd configuration as an example, but the API is exactly the same for other systems.

In [1]:
import sys
sys.path.insert(0, "../..")

In [2]:
from insights.combiners.httpd_conf import get_tree
from insights.parsr.query import *

conf = get_tree()

`conf` now contains the consolidated httpd configuration tree from my machine. The API that follows is exactly the same for the nginx, multipath, logrotate, and ini parsers. The YAML and JSON parsers expose the same API through a `.doc` attribute; they couldn't expose it directly for backward compatibility reasons.

## Basic Navigation
The configuration can be treated in some sense like a dictionary:

In [3]:
conf["Alias"]

Alias: /icons/ /usr/share/httpd/icons/
Alias: /.noindex.html /usr/share/httpd/noindex/index.html

In [4]:
conf["Directory"]

[Directory /]
    AllowOverride: none
    Require: all denied

[Directory /var/www]
    AllowOverride: None
    Require: all granted

[Directory /var/www/html]
    Options: Indexes FollowSymLinks
    AllowOverride: None
    Require: all granted

[Directory /var/www/cgi-bin]
    AllowOverride: None
    Options: None
    Require: all granted

[Directory /usr/share/httpd/icons]
    Options: Indexes MultiViews FollowSymlinks
    AllowOverride: None
    Require: all granted

[Directory /home/*/public_html]
    AllowOverride: FileInfo AuthConfig Limit Indexes
    Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
    Require: method GET POST OPTIONS

[Directory /usr/share/httpd/noindex]
    AllowOverride: None
    Require: all granted

In [5]:
conf["Directory"]["Options"]

Options: Indexes FollowSymLinks
Options: None
Options: Indexes MultiViews FollowSymlinks
Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec

Notice that the first pair of brackets is a query against the first level of the configuration tree. `conf["Alias"]` returns all of the "Alias" nodes. `conf["Directory"]` returns all of the "Directory" nodes.

A set of brackets after another set means to chain the queries using previous query results as the starting point. So, `conf["Directory"]["Options"]` first finds all of the "Directory" nodes, and then those are queried for their "Options" directives.
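One way to picture the chaining is a result set whose brackets query the children of every node it holds. The sketch below is a toy model under that assumption; `Node` and `Result` here are hypothetical classes written for this example, not the library's implementation, and they only match on exact names.

```python
class Node(object):
    """Hypothetical minimal node: a name, attributes, and children."""
    def __init__(self, name, attrs=(), children=()):
        self.name = name
        self.attrs = list(attrs)
        self.children = list(children)


class Result(object):
    """Hypothetical result set; brackets query the children of every member."""
    def __init__(self, nodes):
        self.nodes = list(nodes)

    def __getitem__(self, name):
        matches = [c for n in self.nodes for c in n.children if c.name == name]
        return Result(matches)

    def __len__(self):
        return len(self.nodes)


root = Node(None, children=[
    Node("Directory", ["/"], [Node("AllowOverride", ["none"])]),
    Node("Directory", ["/var/www/html"],
         [Node("Options", ["Indexes", "FollowSymLinks"])]),
    Node("Alias", ["/icons/", "/usr/share/httpd/icons/"]),
])
conf = Result([root])  # toy stand-in for the real tree

directories = conf["Directory"]         # first level of the tree
options = conf["Directory"]["Options"]  # children of the previous results
```

Each pair of brackets returns another result set, which is why queries can be chained to any depth.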

## Complex Queries
In addition to simple queries that match node names, more complex queries are supported. For example, to get the "Directory" node for "/", we can do the following:

In [6]:
conf["Directory", "/"]

[Directory /]
    AllowOverride: none
    Require: all denied

The comma constructs a tuple, so `conf["Directory", "/"]` and `conf[("Directory", "/")]` are equivalent. The first element of the tuple exactly matches the node name, and subsequent elements exactly match any of the node's attributes. Notice that this is still a query, and the result behaves like a list:

In [7]:
conf["Directory", "/", "/var/www"]

[Directory /]
    AllowOverride: none
    Require: all denied

[Directory /var/www]
    AllowOverride: None
    Require: all granted

That's asking for Directory nodes with any attribute exactly matching any of "/" or "/var/www". These can be chained with more brackets just like the simpler queries shown earlier.
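The matching rule described above can be sketched in a few lines of plain Python. The `Echo` class only demonstrates that a comma inside brackets produces a tuple; `matches` and `select` are hypothetical helpers that mimic the described semantics for exact values and are not the library's code.

```python
class Echo(object):
    """Returns whatever key the brackets receive."""
    def __getitem__(self, key):
        return key


def matches(node_name, node_attrs, query):
    """Toy matcher: query is a name or a (name, value, value, ...) tuple."""
    if not isinstance(query, tuple):
        query = (query,)
    name, wanted = query[0], query[1:]
    if node_name != name:
        return False
    # With no attribute values given, the name match alone is enough.
    if not wanted:
        return True
    # Otherwise any attribute may equal any of the wanted values.
    return any(attr in wanted for attr in node_attrs)


directories = [
    ("Directory", ["/"]),
    ("Directory", ["/var/www"]),
    ("Directory", ["/var/www/html"]),
]


def select(query):
    """Return the first attribute of every toy node matching the query."""
    return [attrs[0] for name, attrs in directories if matches(name, attrs, query)]


key = Echo()["Directory", "/"]
```

Here `select(("Directory", "/", "/var/www"))` keeps the first two toy entries, mirroring the two sections returned by the real query.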

## Predicates
In addition to exact matches, predicates can be used to more exactly express what you want:

In [8]:
conf["Directory", startswith("/var/www")]

[Directory /var/www]
    AllowOverride: None
    Require: all granted

[Directory /var/www/html]
    Options: Indexes FollowSymLinks
    AllowOverride: None
    Require: all granted

[Directory /var/www/cgi-bin]
    AllowOverride: None
    Options: None
    Require: all granted

In [9]:
conf[contains("Icon")]

AddIconByEncoding: (CMP,/icons/compressed.gif) x-compress x-gzip
AddIconByType: (TXT,/icons/text.gif) text/*
AddIconByType: (IMG,/icons/image2.gif) image/*
AddIconByType: (SND,/icons/sound2.gif) audio/*
AddIconByType: (VID,/icons/movie.gif) video/*
AddIcon: /icons/binary.gif .bin .exe
AddIcon: /icons/binhex.gif .hqx
AddIcon: /icons/tar.gif .tar
AddIcon: /icons/world2.gif .wrl .wrl.gz .vrml .vrm .iv
AddIcon: /icons/compressed.gif .Z .z .tgz .gz .zip
AddIcon: /icons/a.gif .ps .ai .eps
AddIcon: /icons/layout.gif .html .shtml .htm .pdf
AddIcon: /icons/text.gif .txt
AddIcon: /icons/c.gif .c
AddIcon: /icons/p.gif .pl .py
AddIcon: /icons/f.gif .for
AddIcon: /icons/dvi.gif .dvi
AddIcon: /icons/uuencoded.gif .uu
AddIcon: /icons/script.gif .conf .sh .shar .csh .ksh .tcl
AddIcon: /icons/tex.gif .tex
AddIcon: /icons/bomb.gif core.
AddIcon: /icons/back.gif ..
AddIcon: /icons/hand.right.gif README
AddIcon: /icons/folder.gif ^^DIRECTORY^^
AddIcon: /icons/blank.gif ^^BLANKICON^^
DefaultIcon: /icons/un

In [10]:
conf[contains("Icon"), contains("zip")]

AddIconByEncoding: (CMP,/icons/compressed.gif) x-compress x-gzip
AddIcon: /icons/compressed.gif .Z .z .tgz .gz .zip

Predicates can be combined with boolean logic. The query below returns the top level nodes with "Icon" in the name that have an attribute containing "CMP" and an attribute containing "zip". The helper `any_` succeeds if any attribute satisfies its predicate; `all_` requires every attribute to satisfy it.

In [11]:
conf[contains("Icon"), any_(contains("CMP")) & any_(contains("zip"))]

AddIconByEncoding: (CMP,/icons/compressed.gif) x-compress x-gzip

Here are the entries whose attributes all do not start with "/":

In [12]:
conf[contains("Icon"), all_(~startswith("/"))]

AddIconByEncoding: (CMP,/icons/compressed.gif) x-compress x-gzip
AddIconByType: (TXT,/icons/text.gif) text/*
AddIconByType: (IMG,/icons/image2.gif) image/*
AddIconByType: (SND,/icons/sound2.gif) audio/*
AddIconByType: (VID,/icons/movie.gif) video/*

Several predicates are provided: startswith, endswith, contains, lt, le, gt, ge, and eq. They can all be negated with ~ (not) and combined with & (boolean and) and | (boolean or).
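To show how `~`, `&`, and `|` could compose, here is a minimal sketch built on Python's operator overloading. The `Pred` class and the toy `startswith`, `contains`, `any_`, and `all_` below are assumptions written for this example; they share names with the library's helpers but are not its implementation.

```python
class Pred(object):
    """Hypothetical predicate wrapper supporting ~, & and |."""
    def __init__(self, func):
        self.func = func

    def __call__(self, value):
        return bool(self.func(value))

    def __invert__(self):
        return Pred(lambda v: not self(v))

    def __and__(self, other):
        return Pred(lambda v: self(v) and other(v))

    def __or__(self, other):
        return Pred(lambda v: self(v) or other(v))


def startswith(prefix):
    return Pred(lambda v: v.startswith(prefix))


def contains(text):
    return Pred(lambda v: text in v)


def any_(pred):
    """Succeeds when at least one attribute satisfies pred."""
    return Pred(lambda attrs: any(pred(a) for a in attrs))


def all_(pred):
    """Succeeds when every attribute satisfies pred."""
    return Pred(lambda attrs: all(pred(a) for a in attrs))


encoding_attrs = ["(CMP,/icons/compressed.gif)", "x-compress", "x-gzip"]
icon_attrs = ["/icons/compressed.gif", ".Z", ".z", ".tgz", ".gz", ".zip"]

both = any_(contains("CMP")) & any_(contains("zip"))
no_leading_slash = all_(~startswith("/"))
```

With these toy definitions, `both` accepts only the `AddIconByEncoding` attributes, and `no_leading_slash` rejects the `AddIcon` attributes because the first one starts with "/".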

It's also possible to filter results based on whether they're a `Section` or a `Directive`.

In [13]:
conf.find(startswith("Directory"))

[Directory /]
    AllowOverride: none
    Require: all denied

[Directory /var/www]
    AllowOverride: None
    Require: all granted

[Directory /var/www/html]
    Options: Indexes FollowSymLinks
    AllowOverride: None
    Require: all granted

DirectoryIndex: index.html

[Directory /var/www/cgi-bin]
    AllowOverride: None
    Options: None
    Require: all granted

[Directory /usr/share/httpd/icons]
    Options: Indexes MultiViews FollowSymlinks
    AllowOverride: None
    Require: all granted

[Directory /home/*/public_html]
    AllowOverride: FileInfo AuthConfig Limit Indexes
    Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
    Require: method GET POST OPTIONS

[Directory /usr/share/httpd/noindex]
    AllowOverride: None
    Require: all granted

In [14]:
query = startswith("Directory")
print "Directives:"
print conf.find(query).directives
print
print "Sections:"
print conf.find(query).sections
print
print "Chained filtering:"
print conf.find(query).sections["Options"]

Directives:
DirectoryIndex: index.html

Sections:
[Directory /]
    AllowOverride: none
    Require: all denied

[Directory /var/www]
    AllowOverride: None
    Require: all granted

[Directory /var/www/html]
    Options: Indexes FollowSymLinks
    AllowOverride: None
    Require: all granted

[Directory /var/www/cgi-bin]
    AllowOverride: None
    Options: None
    Require: all granted

[Directory /usr/share/httpd/icons]
    Options: Indexes MultiViews FollowSymlinks
    AllowOverride: None
    Require: all granted

[Directory /home/*/public_html]
    AllowOverride: FileInfo AuthConfig Limit Indexes
    Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
    Require: method GET POST OPTIONS

[Directory /usr/share/httpd/noindex]
    AllowOverride: None
    Require: all granted


Chained filtering:
Options: Indexes FollowSymLinks
Options: None
Options: Indexes MultiViews FollowSymlinks
Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec


Notice that `conf[startswith("Dir")].sections` is not the same as `conf.sections[startswith("Dir")]`. The first finds all the top level nodes that start with "Dir" and then filters those to just the sections. The second gets all of the top level sections and then searches their children for nodes starting with "Dir".

In [15]:
print "Top level Sections starting with 'Dir':"
print conf[startswith("Dir")].sections
print
print "Children starting with 'Dir' of any top level Section:"
print conf.sections[startswith("Dir")]

Top level Sections starting with 'Dir':
[Directory /]
    AllowOverride: none
    Require: all denied

[Directory /var/www]
    AllowOverride: None
    Require: all granted

[Directory /var/www/html]
    Options: Indexes FollowSymLinks
    AllowOverride: None
    Require: all granted

[Directory /var/www/cgi-bin]
    AllowOverride: None
    Options: None
    Require: all granted

[Directory /usr/share/httpd/icons]
    Options: Indexes MultiViews FollowSymlinks
    AllowOverride: None
    Require: all granted

[Directory /home/*/public_html]
    AllowOverride: FileInfo AuthConfig Limit Indexes
    Options: MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
    Require: method GET POST OPTIONS

[Directory /usr/share/httpd/noindex]
    AllowOverride: None
    Require: all granted


Children starting with 'Dir' of any top level Section:
DirectoryIndex: index.html


### Ignoring Case
All of the predicates configtree defines take an `ignore_case` keyword parameter. They also have versions with an `i` prefix that pass `ignore_case=True` for you. So `startswith("abc", ignore_case=True)` is the same as `istartswith("abc")`, etc.

It's not possible to ignore case with simple dictionary-like access unless you use a predicate: `conf[ieq("ifmodule")]` gets all top level elements with a name equal to any case variant of "ifmodule", whereas `conf["ifmodule"]` is a strict case match.
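The relationship between a predicate and its `i`-prefixed variant can be sketched as a thin wrapper. The functions below are toy versions written for illustration, assuming simple lowercase comparison; the library's predicates may handle case differently.

```python
def startswith(prefix, ignore_case=False):
    """Toy predicate factory; assumes lowercase comparison when ignoring case."""
    if ignore_case:
        prefix = prefix.lower()
        return lambda value: value.lower().startswith(prefix)
    return lambda value: value.startswith(prefix)


def istartswith(prefix):
    """Shorthand that passes ignore_case=True."""
    return startswith(prefix, ignore_case=True)


names = ["IfModule", "ifmodule", "IFMODULE", "Directory"]
strict = [n for n in names if startswith("IfMod")(n)]
relaxed = [n for n in names if istartswith("ifmod")(n)]
```

The strict form keeps only the exact-case name, while the relaxed form keeps all three case variants.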

## Truth and Iteration
Nodes are "truthy" depending on whether they have children. They're also iterable and indexable.

In [16]:
res = conf["Blah"]
print "Boolean:", bool(res)
print "Length:", len(res)
print
print "Iteration:"
for c in conf["Directory"]:
    print c.value
print
print "Indexing:"
print conf["Directory"][0].value
print conf["Directory"][first].value
print conf["Directory"][-1].value
print conf["Directory"][last].value


Boolean: False
Length: 0

Iteration:
/
/var/www
/var/www/html
/var/www/cgi-bin
/usr/share/httpd/icons
/home/*/public_html
/usr/share/httpd/noindex

Indexing:
/
/
/usr/share/httpd/noindex
/usr/share/httpd/noindex


This is also true of conf itself:

In [17]:
sorted(set(c.name for c in conf))

['AddDefaultCharset',
 'AddIcon',
 'AddIconByEncoding',
 'AddIconByType',
 'Alias',
 'DNSSDEnable',
 'DefaultIcon',
 'Directory',
 'DocumentRoot',
 'EnableSendfile',
 'ErrorLog',
 'Files',
 'Group',
 'HeaderName',
 'IfModule',
 'IndexIgnore',
 'IndexOptions',
 'Listen',
 'LoadModule',
 'LocationMatch',
 'LogLevel',
 'ReadmeName',
 'ServerAdmin',
 'ServerRoot',
 'User']
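The behavior shown in this section maps onto a handful of standard Python hooks. The sketch below is a toy `Result` class, written for this example only, that is truthy when non-empty, reports its length, and supports iteration and integer indexing; it says nothing about how the library implements these.

```python
class Result(object):
    """Hypothetical result set: truthy when non-empty, iterable, indexable."""
    def __init__(self, values):
        self.values = list(values)

    def __len__(self):
        return len(self.values)

    def __bool__(self):
        return len(self.values) > 0

    __nonzero__ = __bool__  # Python 2 spelling of the same hook

    def __iter__(self):
        return iter(self.values)

    def __getitem__(self, index):
        return self.values[index]


empty = Result([])
dirs = Result(["/", "/var/www", "/usr/share/httpd/noindex"])
fallback = empty or "Nothing"
```

Truthiness is what makes the `result or "fallback"` idiom work on an empty query result.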

## Attributes
The individual results in a result set have a name, value, attributes, children, an immediate parent, a root, and context for their enclosing file that includes its path and their line within it. If a node exists at the top of the tree, it is its own root.

In [18]:
root = conf.find("ServerRoot")[0]
print "Node name:", root.name
print "Value:", root.value
print "Attributes:", root.attrs
print "Children:", len(root.children)
print "Parent:", root.parent.name
print "Root:", root.root
print "File: ", root.file_path
print "Original Line:", root.line
print "Line Number:", root.lineno

Node name: ServerRoot
Value: /etc/httpd
Attributes: ['/etc/httpd']
Children: 0
Parent: None
Root: ServerRoot: /etc/httpd
File:  /etc/httpd/conf/httpd.conf
Original Line: ServerRoot "/etc/httpd"
Line Number: 31


In [19]:
port = conf.find("Listen").value
print port
print type(port)

80
<type 'int'>


## find and select
In addition to brackets, config trees support two other functions.

### find
`find` searches the entire tree for the query you provide and returns a `Result` of all elements that match.

In [20]:
conf.find("ServerRoot")

ServerRoot: /etc/httpd

In [21]:
conf.find("Alias")

Alias: /icons/ /usr/share/httpd/icons/
Alias: /.noindex.html /usr/share/httpd/noindex/index.html

In [22]:
conf.find("LogFormat")

LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
LogFormat: %h %l %u %t "%r" %>s %b common
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio

If you want the first or last match, access them with brackets as you would a list:

In [23]:
print conf.find("Alias")[0]
print conf.find("Alias")[-1]

Alias: /icons/ /usr/share/httpd/icons/
Alias: /.noindex.html /usr/share/httpd/noindex/index.html


In [24]:
r = conf.find("Boom")
print type(r)
print r

<class 'insights.parsr.query.Result'>



`find` takes an additional parameter, `roots`, which defaults to `False`. If it is `False`, the matching entries are returned. If set to `True`, the unique set of top level ancestors of all matching results is returned.

In [25]:
print 'conf.find("LogFormat"):'
print conf.find("LogFormat")
print
print 'conf.find("LogFormat", roots=True):'
print conf.find("LogFormat", roots=True)

conf.find("LogFormat"):
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
LogFormat: %h %l %u %t "%r" %>s %b common
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio

conf.find("LogFormat", roots=True):
[IfModule log_config_module]
    LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
    LogFormat: %h %l %u %t "%r" %>s %b common

    [IfModule logio_module]
        LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio

    CustomLog: logs/ssl_request_log %t %h %{SSL_PROTOCOL}x %{SSL_CIPHER}x  %r  %b
    CustomLog: logs/access_log combined



In [26]:
conf.find(("IfModule", "logio_module"), "LogFormat")

LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio

In [27]:
conf.find("IfModule", ("LogFormat", "combinedio"))

LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio

### select
`select` is the primitive query function on which everything else is built. Its parameters operate just like `find`, and by default it queries like a `find` that only searches from the top of the configuration tree instead of walking subtrees.

To support the other cases, it takes two keyword arguments. `deep=True` causes it to search subtrees (default is `deep=False`). `roots=True` causes it to return the unique, top level nodes containing a match. This is true even when `deep=True`. If `roots=False`, it returns matching leaves instead of top level roots.

* `conf.find(*queries) = conf.select(*queries, deep=True, roots=False)`
* `conf[query] = conf.select(query, deep=False, roots=False)`
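The interaction of `deep` and `roots` can be modeled with a small toy. The `Node` class and the `select` and `find` functions below are hypothetical and match only on exact names; they are a sketch of the semantics described above, not the library's implementation.

```python
class Node(object):
    """Hypothetical minimal node used only for this sketch."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def descendants(node):
    """Yield node and everything beneath it."""
    yield node
    for child in node.children:
        for item in descendants(child):
            yield item


def select(top_level, name, deep=False, roots=False):
    """Toy select: match by name, optionally in subtrees, optionally returning roots."""
    results = []
    for top in top_level:
        candidates = descendants(top) if deep else [top]
        hits = [n for n in candidates if n.name == name]
        if roots:
            # One entry per top level node that contains at least one match.
            if hits:
                results.append(top)
        else:
            results.extend(hits)
    return results


def find(top_level, name, roots=False):
    """Toy find: select with deep=True."""
    return select(top_level, name, deep=True, roots=roots)


log_config = Node("IfModule", [
    Node("LogFormat"),
    Node("LogFormat"),
    Node("IfModule", [Node("LogFormat")]),
])
top = [Node("Alias"), Node("Alias"), log_config]
```

In this toy, a shallow `select` for "LogFormat" finds nothing because the matches live inside an `IfModule`, while `deep=True` finds all three and `roots=True` collapses them to the single enclosing top level node.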

In [28]:
print conf.select("Alias")
print
print conf.select("LogFormat") or "Nothing"
print conf.select("LogFormat", deep=True)
print conf.select("LogFormat", deep=True, roots=False)
print
print conf.select("LogFormat", deep=True, roots=False)[0]
print conf.select("LogFormat", deep=True, roots=False)[-1]

Alias: /icons/ /usr/share/httpd/icons/
Alias: /.noindex.html /usr/share/httpd/noindex/index.html

Nothing
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
LogFormat: %h %l %u %t "%r" %>s %b common
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
LogFormat: %h %l %u %t "%r" %>s %b common
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio

LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" combined
LogFormat: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O combinedio


## Custom Predicates
It's easy to create your own predicates to use with config trees. They come in parameterized and unparameterized types and can be used against names or attributes. If used in a name position, they're passed the node's name. If used in an attribute position, they're passed the node's attributes one at a time. If the predicate raises an exception because an attribute is of the wrong type, it's considered `False` for that attribute. Note that other attributes of the node can still cause a `True` result.

In [29]:
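from insights.parsr.query.boolean import lift, lift2

# lift wraps a one-argument function as an unparameterized predicate.
is_ifmod = lift(lambda x: x == "IfModule")
is_user_mod = lift(lambda x: "user" in x)
# lift2 wraps a two-argument function as a parameterized predicate:
# the value under test comes first, then the argument given at call time.
divisible_by = lift2(lambda in_val, divisor: (in_val % divisor) == 0)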

In [30]:
print "Num IfModules:", len(conf[is_ifmod])
print "User mod checks:", len(conf.find(("IfModule", is_user_mod)))
print "Div by 10?", conf["Listen", divisible_by(10)] or "No matches"
print "Div by 3?", conf["Listen", divisible_by(3)] or "No matches"

Num IfModules: 9
User mod checks: 10
Div by 10? Listen: 80
Div by 3? No matches
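The rule that an exception counts as `False` for one attribute, while other attributes can still succeed, can be sketched as follows. `attribute_matches` and this standalone `divisible_by` are hypothetical helpers written for illustration, and the attribute lists are made up; the library's own evaluation may differ in detail.

```python
def attribute_matches(predicate, attrs):
    """Toy evaluation: an attribute whose check raises counts as False."""
    for attr in attrs:
        try:
            if predicate(attr):
                return True
        except Exception:
            # Wrong type for this predicate; move on to the next attribute.
            continue
    return False


def divisible_by(divisor):
    """Parameterized predicate: returns a one-argument check."""
    return lambda value: value % divisor == 0


# "localhost" % 10 raises TypeError, yet 80 still satisfies the predicate.
mixed_attrs = ["localhost", 80]
by_ten = attribute_matches(divisible_by(10), mixed_attrs)
by_three = attribute_matches(divisible_by(3), mixed_attrs)
only_text = attribute_matches(divisible_by(10), ["localhost"])
```

Swallowing the exception per attribute means a numeric predicate can be applied to nodes with mixed string and integer attributes without any type checks in the predicate itself.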
