Touchstone

Touchstone is a framework written in Python that provides an apples-to-apples comparison between 2 similar datasets.

Touchstone currently supports comparison of documents from the following benchmark/database/harness combinations:

Benchmark    Database         Harness
Uperf        Elasticsearch    Ripsaw

Usage

It is suggested to use a venv to install and run touchstone.

python -m venv /path/to/new/virtual/environment
source /path/to/new/virtual/environment/bin/activate
git clone https://github.com/cloud-bulldozer/touchstone
cd touchstone
python setup.py develop
touchstone_compare -h

For example:

To compare 2 runs of uperf data run through ripsaw and indexed into the elasticsearch server marquez.perf.lab.eng.rdu2.redhat.com, which generated 2 uuids: [6c5d0257-57e4-54f0-9c98-e149af8b4a5c 70cbb0eb-8bb6-58e3-b92a-cb802a74bb52]

You'd run it as follows:

touchstone_compare uperf elasticsearch ripsaw -url marquez.perf.lab.eng.rdu2.redhat.com marquez.perf.lab.eng.rdu2.redhat.com  -u 6c5d0257-57e4-54f0-9c98-e149af8b4a5c 70cbb0eb-8bb6-58e3-b92a-cb802a74bb52

Contributing

Touchstone uses the factory pattern for creating its main objects - Benchmarks and Databases.

The logic for interacting with a specific database lives in the databases directory, while the knowledge of what the query and data should look like goes into a specific benchmark.

As a contributor, more often than not you'll be adding code to the benchmarks directory.
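
For illustration only, the factory idea can be sketched as follows; the class and function names here are placeholders, not Touchstone's actual code:

# Illustrative sketch of a factory-style lookup; names are placeholders,
# not Touchstone's actual classes or module layout.
class Uperf:
    """Stand-in for a benchmark class."""

class Elasticsearch:
    """Stand-in for a database class."""

_benchmarks = {'uperf': Uperf}
_databases = {'elasticsearch': Elasticsearch}

def create_benchmark(name):
    # Map the CLI benchmark name to its class and instantiate it.
    return _benchmarks[name]()

def create_database(name):
    # Same idea for the database backends.
    return _databases[name]()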

Benchmarks

To add a new benchmark, you'll create your class and define three member functions, which between them need to put together the following five types of keys:

  1. Filter: Only take a particular entry into consideration if it passes the filter
  2. Bucket: To facilitate apples-to-apples comparison, touchstone puts records into buckets
  3. Aggregation: Apply aggregations of the given types to the keys
  4. Compare: Compare the keys that help characterize the SUT/benchmark run
  5. Collate: Collate the keys after applying filters, buckets and aggregations

The member functions are:

  1. emit_compute_map(): This should emit a dictionary where the key is the index and the value is a list of compute dictionaries; each compute dictionary will have the following keys:

    a. 'filter': select only the docs/rows that match these conditions. One example for uperf is {'test_type.keyword': 'stream'}

    b. 'buckets': a list of all the keys we'll need to look at while bucketing to ensure an apples-to-apples comparison. One example for uperf is ['protocol.keyword', 'message_size', 'num_threads']

    c. 'aggregations': a dictionary mapping keys to a list of aggregation types to apply; note that an aggregation can also be a dictionary. One example for uperf is {'norm_byte': ['max', 'avg', {'percentiles': {'percents': [50]}}]}

  2. emit_compare_map(): This should emit a dictionary where the key is the index and the value is a list of keys to compare

  3. emit_indices(): This should emit a list of indices to search against.

You'll need to create the above for all the indices in the chosen database. For example, in Uperf we build these from a lookup dictionary, which looks like:

{
      'elasticsearch': {
        'ripsaw': {
          'ripsaw-uperf-results': {
            'compare': ['uuid', 'user', 'cluster_name',
              'hostnetwork', 'service_ip'
            ],
            'compute': [{
              'filter': {
                'test_type.keyword': 'stream'
              },
              'buckets': ['protocol.keyword',
                'message_size', 'num_threads'
              ],
              'aggregations': {
                'norm_byte': ['max', 'avg',
                  {'percentiles': {
                    'percents': [50]
                  }}]
              }
            }, {
              'filter': {
                'test_type.keyword': 'rr'
              },
              'buckets': ['protocol.keyword',
                'message_size', 'num_threads'
              ],
              'aggregations': {
                'norm_ops': ['max', 'avg'],
                'norm_ltcy': [{
                  'percentiles': {
                    'percents': [90, 99]
                  }
                }, 'avg']
              },
            }]
          }
        }
      }
}

The highest level is the database type, then comes the harness, and then a dictionary of the indices (in this case only one index). Each index's dictionary has 2 keys - compare and compute.
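
Put together, a rough sketch of a benchmark class built around such a dictionary might look like the following (illustrative only; the existing benchmark classes in the benchmarks directory may be structured differently):

# Illustrative sketch only; not the real Uperf benchmark class.
class UperfSketch:
    def __init__(self, source_type='elasticsearch', harness_type='ripsaw'):
        # The lookup dictionary shown above, trimmed to one compute entry.
        uperf_map = {
            'elasticsearch': {
                'ripsaw': {
                    'ripsaw-uperf-results': {
                        'compare': ['uuid', 'user', 'cluster_name',
                                    'hostnetwork', 'service_ip'],
                        'compute': [{
                            'filter': {'test_type.keyword': 'stream'},
                            'buckets': ['protocol.keyword',
                                        'message_size', 'num_threads'],
                            'aggregations': {
                                'norm_byte': ['max', 'avg',
                                              {'percentiles': {'percents': [50]}}],
                            },
                        }],
                    }
                }
            }
        }
        self._indices = uperf_map[source_type][harness_type]

    def emit_indices(self):
        # List of indices to search against.
        return list(self._indices.keys())

    def emit_compare_map(self):
        # Index -> list of keys to compare.
        return {index: entry['compare'] for index, entry in self._indices.items()}

    def emit_compute_map(self):
        # Index -> list of compute dictionaries (filter/buckets/aggregations).
        return {index: entry['compute'] for index, entry in self._indices.items()}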

Databases

If you're looking at adding databases, please take a look at the elasticsearch class.

The main interfaces that will need to be added are as follows:

  1. emit_compare_dict: This needs to emit a dictionary where the keys are the keys in the benchmark's compare map, and each value is another dictionary whose keys are the uuids and whose values are the associated values. A reshaping sketch follows the example below.

So an example for uperf's compare is as follows:

{
    "uuid": {
        "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": "6c5d0257-57e4-54f0-9c98-e149af8b4a5c"
    },
    "user": {
        "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": "aakarsh"
    },
    "cluster_name": {
        "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": "cnvcluster"
    },
    "hostnetwork": {
        "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": "False"
    },
    "service_ip": {
        "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": "172.16.12.12"
    }
}
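
As a rough illustration of how a database class could produce that shape from documents it has already fetched (this is not Elasticsearch query code, just the data reshaping):

# Illustrative only: reshape already-fetched per-uuid documents into the
# compare dictionary shape shown above.
def build_compare_dict(compare_keys, docs_by_uuid):
    # compare_keys: list from the benchmark's compare map, e.g.
    #   ['uuid', 'user', 'cluster_name', 'hostnetwork', 'service_ip']
    # docs_by_uuid: {uuid: {field: value, ...}} as returned by the database
    result = {key: {} for key in compare_keys}
    for uuid, doc in docs_by_uuid.items():
        for key in compare_keys:
            result[key][uuid] = doc.get(key)
    return result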
  2. emit_compute_dict: This is a nested dictionary with a depth of 2 * len(buckets) from the compute map. The first-level key is the first key in the list of buckets, and its value is a dictionary whose keys are the possible values for that bucket; each of those values is in turn a dictionary keyed by the next bucket, and so on, until we reach a depth of 2 * len(buckets). At that point the value is a dictionary whose keys are the aggregations, and each aggregation maps, much like the compare dictionary, from uuid to the aggregation value. A small traversal sketch follows the example below.

An example of uperf's compute dict, where buckets is ['protocol.keyword', 'message_size', 'num_threads'], and let's say protocol only took the value 'tcp', message_size could be either 512 or 1024, and num_threads was either 1 or 2, looks like the following:

{
    "protocol": {
        "tcp": {
            "message_size": {
                "512": {
                    "num_threads": {
                        "1": {
                            "max(norm_ops)": {
                                "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": 1740.0
                            },
                            "avg(norm_ops)": {
                                "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": 1040.0
                            },
                            "90.0percentiles(norm_ltcy)": {
                                "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": null
                            },
                            "99.0percentiles(norm_ltcy)": {
                                "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": null
                            },
                            "avg(norm_ltcy)": {
                                "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": null
                            }
                        },
                        "2": {
                            "max(norm_ops)": {
                                "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": 4622.0
                            },
                            "avg(norm_ops)": {
                                "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": 2455.0833333333335
                            },
                            "90.0percentiles(norm_ltcy)": {
                                "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": null
                            },
                            "99.0percentiles(norm_ltcy)": {
                                "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": null
                            },
                            "avg(norm_ltcy)": {
                                "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": null
                            }
                        }
                    }
                },
                "1024": {
                    "num_threads": {
                        "1": {
                            "max(norm_ops)": {
                                "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": 2412.0
                            },
                            "avg(norm_ops)": {
                                "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": 1000.3333333333334
                            },
                            "90.0percentiles(norm_ltcy)": {
                                "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": null
                            },
                            "99.0percentiles(norm_ltcy)": {
                                "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": null
                            },
                            "avg(norm_ltcy)": {
                                "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": null
                            }
                        },
                        "2": {
                            "max(norm_ops)": {
                                "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": 3357.0
                            },
                            "avg(norm_ops)": {
                                "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": 1507.0833333333333
                            },
                            "90.0percentiles(norm_ltcy)": {
                                "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": null
                            },
                            "99.0percentiles(norm_ltcy)": {
                                "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": null
                            },
                            "avg(norm_ltcy)": {
                                "6c5d0257-57e4-54f0-9c98-e149af8b4a5c": null
                            }
                        }
                    }
                }
            }
        }
    }
}
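
Since the nesting depth is always 2 * len(buckets), here is a small sketch of how such a dictionary could be walked down to its aggregation leaves (illustrative only, not Touchstone's code):

# Illustrative walk of a nested compute dictionary like the one above.
# At depth 2 * len(buckets) the values are {aggregation: {uuid: value}}.
def walk_compute_dict(node, buckets, path=()):
    if len(path) == 2 * len(buckets):
        # 'node' is now {aggregation: {uuid: value}}.
        for aggregation, per_uuid in node.items():
            for uuid, value in per_uuid.items():
                print(path, aggregation, uuid, value)
        return
    for key, child in node.items():
        walk_compute_dict(child, buckets, path + (key,))

# For the example above:
# walk_compute_dict(compute_dict, ['protocol.keyword', 'message_size', 'num_threads'])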
