Normalisation #9

Open
k4lipso opened this issue May 2, 2018 · 0 comments

k4lipso commented May 2, 2018

Normalising data works well using --format SQL, but a lot of data is still stored multiple times in the database.

The Hwloc_pci extractor output is mostly identical on each host, but the "domain" specifier is unique, so it could be factored out and stored separately. The hwloc_pci data string contains about 40,000 characters and was written 63 times in a test with 64 hosts, so about 2,520,000 characters were written to the database. That could be reduced to about 45,000.
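A minimal sketch of how the unique field could be factored out, assuming each host's extractor output can be treated as a dictionary. The function name `normalise`, the record layout, and the sample values are hypothetical and only illustrate the idea; they are not taken from the project's code.

```python
import hashlib
import json


def normalise(records, unique_key):
    """Split each host record into a shared blob and a per-host value.

    records: mapping of hostname -> dict of extractor values (assumed shape).
    unique_key: the field that differs per host (here "domain").
    """
    blobs = {}     # digest -> shared data, stored only once
    per_host = {}  # hostname -> (digest, unique value)
    for host, record in records.items():
        shared = {k: v for k, v in record.items() if k != unique_key}
        # Canonical serialisation so equal data always yields the same digest.
        text = json.dumps(shared, sort_keys=True)
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        blobs.setdefault(digest, shared)
        per_host[host] = (digest, record.get(unique_key))
    return blobs, per_host


# Hypothetical data: 64 hosts with identical PCI info and a unique domain.
records = {
    f"host{i}": {"domain": f"dom{i}", "devices": ["nic", "gpu"], "bus": 0}
    for i in range(64)
}
blobs, per_host = normalise(records, "domain")
print(len(blobs), len(per_host))
```

With this layout the large shared blob is written once, and each host only stores a short reference plus its own "domain" value.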

The Filesystem extractor collects data about mountpoints and partitions, which both have their own unique EID in the database representation. Most of the values are static and could be normalised, but there is one dynamic value: "available". Because of that, each host creates its own data string even if the mountpoints are exactly the same. To be exact, 62 data entries were created for 64 hosts.
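One possible way to handle the dynamic value is to separate it from the static fields before deduplicating. The sketch below assumes a dictionary per mountpoint; apart from "available", the field names (`mountpoint`, `fstype`, `size`) and the helper `split_entry` are hypothetical.

```python
def split_entry(entry, dynamic_keys=("available",)):
    """Separate static fields from dynamic ones (assumed entry shape)."""
    static = {k: v for k, v in entry.items() if k not in dynamic_keys}
    dynamic = {k: v for k, v in entry.items() if k in dynamic_keys}
    return static, dynamic


# Hypothetical data: two hosts with the same mountpoint but different free space.
mounts = {
    "host1": {"mountpoint": "/", "fstype": "ext4", "size": 1000, "available": 400},
    "host2": {"mountpoint": "/", "fstype": "ext4", "size": 1000, "available": 250},
}

static_rows = {}   # static field tuple -> row id, one row per distinct mountpoint
dynamic_rows = []  # (hostname, static row id, available)
for host, entry in mounts.items():
    static, dynamic = split_entry(entry)
    key = tuple(sorted(static.items()))
    static_id = static_rows.setdefault(key, len(static_rows))
    dynamic_rows.append((host, static_id, dynamic["available"]))

print(len(static_rows), dynamic_rows)
```

Only the small per-host row carrying "available" is repeated; identical mountpoints share a single static row.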

The Hwloc_machine extractor creates a Machineinfo entry. The data linked to it really is identical on each host, but the "hostname" specifier is unique and not needed at all in the database representation, since the hostname is already saved in the Hosttable. It could be deleted or factored out completely.

@k4lipso k4lipso added the minor label May 2, 2018