Collecting data through insights-upload (topological_inventory-sync) #74

slemrmartin · 2020-03-06T13:46:24Z

ISSUE TYPE

Feature Idea

Following discussion with @Ladas and ansible/awx#5931

SUMMARY

A full refresh through the AnsibleTower collector <-> Tower API generates heavy traffic to Tower,
and through receptor it will be quite slow.
The idea is to collect the data through the insights-upload service instead.

DETAILS

This will only apply to new versions of Tower, so the collector's full refresh will still be needed for some time.
Targeted refresh has to be implemented for the operations worker (job template ordering).
List of collected entities:
- credentials (sensitive data not needed)
- credential_types
- inventories
- job_templates/workflow_job_templates
- surveys for job_templates/workflow_job_templates
- workflow_job_template_nodes
- credentials for workflow_job_template_nodes
- jobs/workflow_jobs
- workflow_job_nodes
- credentials for workflow_job_nodes
- https://github.com/RedHatInsights/topological_inventory-ansible_tower/blob/master/lib/topological_inventory/ansible_tower/collector/service_catalog.rb

slemrmartin · 2020-03-06T13:47:23Z

cc @iphands @gtanzillo @Ladas @agrare

Ladas · 2020-03-06T14:01:05Z

If you want to use the dataset already sent by Tower to cloud.redhat.com, please create an issue at https://github.com/ansible/awx/issues/ and we can go through what's missing.

job_templates (unified, so both workflow templates and job templates), jobs (unified), workflow_job_nodes, and workflow_job_template_nodes are already implemented and will be sent to c.r.c.

You should use the data that is already being sent to the c.r.c ingress service (done in a very efficient way, considering the perf impact on the Tower cluster itself) rather than doing another full API scan. A full scan would have a perf impact on Tower clusters, since we are talking about hundreds of thousands to tens of millions of records for bigger customers.
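
To make the gap concrete, here is a minimal sketch that compares the entities listed in the issue description with the ones named above as already sent. The labels are illustrative only: the two `*_node_credentials` names are hypothetical shorthand for "credentials for workflow_job_template_nodes / workflow_job_nodes", and the unified types are expanded into their template/workflow variants as an assumption.

```python
# Entities requested in the issue description.
# The "*_node_credentials" labels are hypothetical shorthand, not real API names.
REQUESTED = {
    "credentials",
    "credential_types",
    "inventories",
    "job_templates",
    "workflow_job_templates",
    "surveys",
    "workflow_job_template_nodes",
    "workflow_job_template_node_credentials",
    "jobs",
    "workflow_jobs",
    "workflow_job_nodes",
    "workflow_job_node_credentials",
}

# Entities described in this thread as already sent to c.r.c.
# Assumption: "unified" templates and jobs cover the workflow variants too.
ALREADY_SENT = {
    "job_templates",
    "workflow_job_templates",
    "jobs",
    "workflow_jobs",
    "workflow_job_nodes",
    "workflow_job_template_nodes",
}


def missing_entities(requested, already_sent):
    """Return the requested entities that are not yet sent, sorted by name."""
    return sorted(requested - already_sent)


if __name__ == "__main__":
    for name in missing_entities(REQUESTED, ALREADY_SENT):
        print(name)
```

Under these assumptions, the remaining items (credentials, credential types, inventories, surveys, and node credentials) are what a follow-up issue in ansible/awx would need to cover.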

slemrmartin · 2020-03-06T15:50:57Z

I'm wondering whether topo and catalog need to collect all jobs. Are they used somehow?
If we have targeted refresh, we'll sync only the jobs ordered by catalog, and we can skip those tens of millions of records in our db.
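
A rough sketch of that filtering idea, assuming each job record is a dict with an `id` key and that catalog can supply the set of job ids it ordered. Both the record shape and the function name `select_ordered_jobs` are hypothetical and not taken from the topological inventory code.

```python
from typing import Iterable, List, Set


def select_ordered_jobs(jobs: Iterable[dict], ordered_job_ids: Set[int]) -> List[dict]:
    """Keep only the jobs whose id was ordered through catalog.

    `jobs` stands in for whatever job records arrive from Tower;
    everything outside `ordered_job_ids` is skipped and never stored.
    """
    return [job for job in jobs if job["id"] in ordered_job_ids]


if __name__ == "__main__":
    incoming = [{"id": 1}, {"id": 2}, {"id": 3}]
    print(select_ordered_jobs(incoming, {2, 3}))
```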

gtanzillo · 2020-03-06T16:29:14Z

This approach seems like it would be much more efficient and manageable from the standpoint of topology. It takes the burden of getting Tower inventory to c.r.c off of Topo. I believe we originally considered this as one of two potential approaches, but timeframes, limited resources, and other factors pushed us in the receptor direction. I think it's definitely worth pursuing.

Ladas · 2020-03-06T17:42:32Z

@gtanzillo so the main reason not to go down this path, for me, was that only a subset of the data was available (basically just the list of templates and jobs, out of what we need). Now I am in a position to fix that, so this path becomes viable :-D
