Categorizing files into fileclasses

Thomas Leibovici edited this page Jan 13, 2015 · 3 revisions

Introduction

This article shows how to get an overview of your filesystem content by defining file classes.

Fileclass definition is very flexible and can be based on any set of file attributes.

Example of fileclass definition:

    FileClass BigLogs {
       Definition {
            type == file
            and size > 100MB
            and ( name == "*.log" or path == "/fs/logdir")
       }
    }

File class statistics can be retrieved with the '--class-info' option of rbh-report:

    > rbh-report --class-info --csv
    FileClass,     count,         spc_used,     avg_size
    Class_X  ,      1100,       1153433600,     10485760
    Class_Y  ,     10139,     378007123968,     3728667
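The CSV output is convenient to post-process. Here is a minimal Python sketch that loads it into a dictionary, assuming the output has exactly the layout of the sample above (one header line, comma-separated, space-padded integer columns). The function name `parse_class_info` is hypothetical, and real rbh-report output may contain other columns.

```python
import csv
import io

# Sample text copied from the rbh-report example above.
SAMPLE = """\
FileClass,     count,         spc_used,     avg_size
Class_X  ,      1100,       1153433600,     10485760
Class_Y  ,     10139,     378007123968,     3728667
"""

def parse_class_info(text):
    """Return {fileclass: {column: int}} from '--class-info --csv' style text."""
    reader = csv.reader(io.StringIO(text))
    # Columns are padded with spaces, so strip every cell.
    header = [h.strip() for h in next(reader)]
    stats = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        cells = [c.strip() for c in row]
        stats[cells[0]] = {k: int(v) for k, v in zip(header[1:], cells[1:])}
    return stats

for name, s in parse_class_info(SAMPLE).items():
    print(name, s["count"], s["spc_used"])
```

The resulting dictionary can then be handed to whatever charting or monitoring tool you use.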

These statistics can then be plotted as charts, for example with RRDtool.

Step 1 - Defining FileClasses

Define all your file classes in the "Filesets" section of the config file:

    Filesets {
       FileClass classA {
          Definition { size >= 1KB }
       }
       FileClass classB {
          Definition { name == "*.o" }
       }
    }

Step 2 - Make them match

FileClasses are only matched if they are referenced in a policy, because you may want a different file classification for purge policies, migration policies, etc. We agree this can be confusing, so this behavior will be changed in future versions of Robinhood.

So, you need to reference them in a purge policy, even if you don't perform purges on your filesystem:

    purge_policies {
        ignore_fileclass = classA;
        ignore_fileclass = classB;
    }

FileClasses are matched in the following order:

1) "Ignore" rules of the *_policies block. Entries are displayed as "ignored" by rbh-report --class-info.

2) "ignore_fileclass" rules, in the order they are specified in the *_policies block. Entries are displayed with the matching ignored fileclass.

3) "target_fileclass" rules defined in policy blocks, in the order they appear. Entries are displayed with the matching target fileclass.

Given this matching order, if classB is a subset of classA in the example above, no file will be categorized as classB, because classA is matched first.
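The first-match-wins behavior can be illustrated with a small Python sketch. It is not Robinhood code: the predicates are hypothetical stand-ins for the two definitions from the Filesets example, `1KB` is assumed to mean 1024 bytes, and `fnmatch` is used as an approximation of name matching.

```python
from fnmatch import fnmatch

KB = 1024  # assumption: 1KB means 1024 bytes

# Hypothetical stand-ins for the definitions in the Filesets example,
# listed in the order of the ignore_fileclass rules.
FILECLASSES = [
    ("classA", lambda e: e["size"] >= 1 * KB),
    ("classB", lambda e: fnmatch(e["name"], "*.o")),
]

def classify(entry, fileclasses=FILECLASSES):
    """Return the name of the first fileclass that matches, else None."""
    for name, matches in fileclasses:
        if matches(entry):
            return name
    return None

big_object = {"name": "main.o", "size": 4 * KB}
small_object = {"name": "tiny.o", "size": 100}

print(classify(big_object))                     # classA (shadows classB)
print(classify(small_object))                   # classB
print(classify(big_object, FILECLASSES[::-1]))  # classB once the order is reversed
```

In this sketch, an object file of 1KB or more is never reported as classB unless classB is listed first, so list the most specific class first when classes overlap.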

Important: if you don't want the robinhood daemon to apply these purge policies, make sure RBH_OPT is NOT empty and does NOT include "--purge" in /etc/sysconfig/robinhood.
For instance, you can set RBH_OPT="--scan" so that the robinhood daemon only performs scans.
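As a quick sanity check, the rule above can be expressed as a small Python sketch. The helper `purge_disabled` is hypothetical and not part of Robinhood. It assumes the sysconfig file uses plain shell-style `RBH_OPT="..."` assignments, and it treats a missing assignment like an empty one.

```python
import shlex

def purge_disabled(sysconfig_text):
    """True if RBH_OPT is set, non-empty, and does not mention --purge."""
    rbh_opt = None
    for line in sysconfig_text.splitlines():
        line = line.strip()
        if line.startswith("RBH_OPT="):
            # shlex.split removes the surrounding shell quotes.
            parts = shlex.split(line[len("RBH_OPT="):])
            rbh_opt = parts[0] if parts else ""
    if not rbh_opt:
        return False  # unset or empty: do not assume purges are off
    return "--purge" not in rbh_opt

print(purge_disabled('RBH_OPT="--scan"'))          # True
print(purge_disabled('RBH_OPT="--scan --purge"'))  # False
print(purge_disabled('RBH_OPT=""'))                # False
```

To check a real system, pass the contents of /etc/sysconfig/robinhood to the function.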

Step 3 - Run the matching

Matching occurs whenever robinhood processes an entry, either while scanning it or while processing a changelog event about it. To match (or rematch) all entries, it is therefore recommended to run a full filesystem scan.