
Insights Analytics Collector

This package helps collect data via user-defined collector methods. It packs the collected data into one or more tarballs and sends them to a user-defined URL.

Some data and classes have to be implemented. By function:

  • persisting settings
  • data such as credentials, content type etc. for shipping (the POST request)

By classes:

  • Collector
  • Package
  • collector_module:
    • functions with the @register decorator, one of them with config=True, format='json'
    • slicing functions (optional) for splitting large data (db tables) by time intervals

Collector

Entrypoint with the "gather()" method.

Implementation

Collector is an abstract class; implement its abstract methods:

  • _package_class: returns the class of your Package implementation
  • _is_valid_license: checks for a valid license specific to your service
  • _is_shipping_configured: checks whether shipping to the cloud is configured
  • _last_gathering: returns a datetime; loads the timestamp of the last successful run from persistent storage
  • _save_last_gather: persists the timestamp of the last successful run
  • _load_last_gathered_entries: has to fill the dictionary self.last_gathered_entries from persistent storage. The dict's keys are the keys of the collector's registered functions (the @register decorator)
  • _save_last_gathered_entries: persists self.last_gathered_entries

An example can be found in Test collector
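
For orientation, here is a minimal, hypothetical sketch of a Collector subclass. The in-memory _STORAGE dict and the myservice.package module are assumptions of this sketch; a real implementation would use its own settings, and the exact method signatures can be checked against the Test collector.

import json
from datetime import datetime, timezone

from insights_analytics_collector import Collector

# Hypothetical persistent storage for this sketch; a real service would use
# its own settings or database instead of an in-memory dict.
_STORAGE = {"last_gather": None, "last_gathered_entries": "{}"}


class MyCollector(Collector):
    def _package_class(self):
        from myservice.package import MyPackage  # hypothetical: your Package implementation (see Package below)
        return MyPackage

    def _is_valid_license(self):
        return True  # check your service's license here

    def _is_shipping_configured(self):
        return True  # e.g. verify that URL and credentials are present

    def _last_gathering(self):
        return _STORAGE["last_gather"]  # datetime of the last successful run

    def _save_last_gather(self):
        _STORAGE["last_gather"] = datetime.now(timezone.utc)

    def _load_last_gathered_entries(self):
        # keys are the same as the keys of registered functions (@register decorator)
        self.last_gathered_entries = json.loads(_STORAGE["last_gathered_entries"])

    def _save_last_gathered_entries(self):
        _STORAGE["last_gathered_entries"] = json.dumps(self.last_gathered_entries, default=str)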

Package

One package represents one .tar.gz file which will be uploaded to Analytics. The output of registered collectors is placed into packages as JSON/CSV files according to these rules:

  • The upload limit is 100MB. The maximum size of uncompressed data is MAX_DATA_SIZE (200MB by default; redefine it if needed)
  • JSON collectors are processed first; they are not expected to exceed this size
    • if they do, use the CSV format instead
  • CSV files can be collected in two modes:
    • with a slicing function
      • splits data by a custom function, usually into time intervals
      • the purpose is to keep SQL queries on big databases reasonable
      • @register(fnc_slicing=...)
    • without a slicing function
  • CSV files are expected to be large (db data), so they can be split by CsvFileSplitter in the collector function (see the sketch below)
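
A hypothetical sketch of such a CSV collector follows. The full_path keyword argument (a directory supplied by the framework for temporary files) and the generated rows are assumptions of this sketch; see the Test collector for the real contract.

import os

from insights_analytics_collector import CsvFileSplitter, register


@register('big_table', '1.0', format='csv', description="Rows of a large table")
def big_table(full_path, **kwargs):
    # full_path is assumed to be the temporary directory provided by the framework
    file_path = os.path.join(full_path, 'big_table.csv')

    # CsvFileSplitter behaves like a writable file and splits the output
    # into multiple CSV files once the size limit is reached.
    csv_file = CsvFileSplitter(filespec=file_path)
    csv_file.write("id,name\n")
    for i in range(1000):  # replace with a real database query
        csv_file.write(f"{i},row-{i}\n")

    # Return the list of generated files; the framework packs them into tarballs.
    return csv_file.file_list()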

How files are included in packages:

  • JSON files are in the first package
  • CSVs without slicing are included in the first free package with enough space (they can be added to the JSON files)
    • if a function collects e.g. 900MB, it is sent in the first 5 packages
    • two functions cannot have the same name in the @register() decorator
  • CSVs with slicing are sent after each slice is collected (this respects the smaller volume size when running in OpenShift/Docker)
    • each slice can also be split by CsvFileSplitter if it is bigger than MAX_DATA_SIZE
      • then each part of the slice is sent immediately
    • two functions can have the same name in the @register() decorator

The number of packages (tarballs) is the bigger of:

  • the number of files collected by the single largest registered CSV collector without slicing
  • the number of files collected by all registered CSV collectors with slicing
  • plus, possibly, 1 for the JSON files

See test_gathering.py for details.

Implementation

Package is also an abstract class. You basically have to implement the info for the POST request to the cloud:

  • PAYLOAD_CONTENT_TYPE: contains registered content type for cloud's ingress service
  • MAX_DATA_SIZE: maximum size in bytes of uncompressed data for one tarball. Ingress limits uploads to 100MB. Defaults to 200MB.
  • get_ingress_url: Cloud's ingress service URL
  • _get_rh_user: User for POST request
  • _get_rh_password: Password for POST request
  • _get_x_rh_identity: X-RH-Identity header, used for local testing instead of user and password
  • _get_http_request_headers: Dict with any custom headers for POST request

An example can be found in Test package
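
For orientation, a minimal, hypothetical sketch of a Package subclass follows; the content type string, the ingress URL and the credentials are placeholders of this sketch, not values to copy.

from insights_analytics_collector import Package


class MyPackage(Package):
    # Content type registered for your service with the cloud's ingress service (placeholder)
    PAYLOAD_CONTENT_TYPE = "application/vnd.redhat.myservice.filename+tgz"
    # Maximum size in bytes of uncompressed data per tarball
    MAX_DATA_SIZE = 200 * 1048576

    def get_ingress_url(self):
        return "https://example.cloud.redhat.com/api/ingress/v1/upload"  # placeholder URL

    def _get_rh_user(self):
        return "my-user"  # load from your settings

    def _get_rh_password(self):
        return "my-password"  # load from your settings

    def _get_x_rh_identity(self):
        return "base64-encoded-identity"  # only for local testing

    def _get_http_request_headers(self):
        return {"User-Agent": "my-service-analytics"}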

Collector module

The module with gathering functions is the main part you need to implement. It should contain functions returning data either as a dict or as a list of CSV files.

A function is registered by the @register decorator:

from insights_analytics_collector import register

@register('json_data', '1.0', format='json', description="Data description")
def json_data(**kwargs):
    return {'my_data': 'True'}
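
One registered function has to provide the configuration data (config=True, format='json'). The key 'config' and the returned fields below are only conventions of this hypothetical sketch:

from insights_analytics_collector import register


@register('config', '1.0', format='json', config=True, description="Configuration data")
def config(**kwargs):
    # Data identifying your service/instance; the exact fields are up to you.
    return {'install_uuid': '00000000-0000-0000-0000-000000000000',
            'version': '1.0.0'}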

The @register decorator has the following arguments:

  • key: (string) name of the output file (usually the same as the function name)
  • version: (string) e.g. '1.0'. Version of the data, added to manifest.json for parsing on the cloud's side
  • description: (string) not used yet
  • format: (string) Default: 'json'. Extension of the output file, can be "json" or "csv". Also determines the function's output type.
  • config: (bool) Default: False. There has to be one function with config=True, format='json'
  • fnc_slicing: intended for large data. Described in Slicing function below
  • shipping_group: (string) Default: 'default'. Splits data into packages by group, if required.

Once your Collector, Package and collector module are implemented, run the gathering:

from <your-namespace> import Collector  # your implementation

collector = Collector()  # pass the constructor arguments your implementation needs
collector.gather()

Slicing function
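
A slicing function splits a large collection into smaller chunks, typically time intervals, so that each chunk results in a reasonable SQL query. As a purely hypothetical sketch (the arguments actually passed by the framework may differ; check the Test collector), it could look like this:

from datetime import timedelta


def slicing_by_day(key, last_gathered_at, until, **kwargs):
    # Hypothetical slicing function: yields (start, end) intervals of one day,
    # starting at the last successful gathering and ending at 'until'.
    # The argument names are assumptions of this sketch.
    start = last_gathered_at
    while start < until:
        end = min(start + timedelta(days=1), until)
        yield (start, end)
        start = end

It would then be referenced from a collector as @register(..., fnc_slicing=slicing_by_day).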

Collectors

Registered collectors

Abstract classes

Tarballs