Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor repository meta analyzer to use Protobuf schemas #411

Merged
merged 13 commits into from
Mar 17, 2023

Conversation

nscuro
Copy link
Member

@nscuro nscuro commented Mar 16, 2023

This PR refactors the repository meta analyzer to use Protobuf schemas.

While implementing this, I came across multiple issues with the current implementation, that I also addressed in this PR:

  • Incoming events were re-keyed from UUID to package URL, but no repartition was ever triggered. The ordering guarantees that we wanted to achieve through the re-keying operation were never effective.
  • Analyzers were defined as @ApplicationScoped CDI beans, meaning there would only ever be one instance per analyzer. Those instances were shared by all Kafka Streams stream threads. Analyzers are not thread safe however, as things like repository URLs and credentials are mutable fields of the analyzer classes.
  • Analysis results were keyed by component UUID. That doesn't make sense, as repo meta data will be ingested to RepositoryMetaComponents on the API server side. That entity is identified purely by PURL type, namespace, and name. Keying the results by component UUID resulted in out-of-order consumption on the API server side, as results for the same type-namespace-name combination were spread across multiple partitions.
    • This was evident by the flood of Received repository meta information for X that is older than what's already in the database; Discarding logs

Old Topology

As there is only one sub-topology, it is evident that no repartitioning was happening after the re-key operation. Splitting the stream based on the applicable analyzer did not have any effect (other than looking cool in the diagram).

svg-image(1)

New Topology

Drastically simplified by getting rid of the unnecessary split by analyzer type. We now have two sub-topologies due to the repartitioning.

svg-image

Signed-off-by: nscuro <nscuro@protonmail.com>
Signed-off-by: nscuro <nscuro@protonmail.com>
... by creating new instances every time they're needed, through a factory. This is analog to how it works in vanilla DT (minus the factory).

Previously, repository URLs were modified on `@ApplicationScoped` instances of the analyzers, which could cause race conditions and unexpected behavior.

Signed-off-by: nscuro <nscuro@protonmail.com>
... and perform more extensive error handling.

Signed-off-by: nscuro <nscuro@protonmail.com>
Signed-off-by: nscuro <nscuro@protonmail.com>
Signed-off-by: nscuro <nscuro@protonmail.com>
Signed-off-by: nscuro <nscuro@protonmail.com>
Signed-off-by: nscuro <nscuro@protonmail.com>
@nscuro nscuro marked this pull request as ready for review March 17, 2023 13:50
Signed-off-by: nscuro <nscuro@protonmail.com>
Signed-off-by: nscuro <nscuro@protonmail.com>
Signed-off-by: nscuro <nscuro@protonmail.com>
Signed-off-by: nscuro <nscuro@protonmail.com>
Signed-off-by: nscuro <nscuro@protonmail.com>
@sonarcloud
Copy link

sonarcloud bot commented Mar 17, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 4 Code Smells

80.0% 80.0% Coverage
0.0% 0.0% Duplication

@VinodAnandan VinodAnandan merged commit 0be0492 into main Mar 17, 2023
@VinodAnandan VinodAnandan deleted the repo-meta-analysis-proto branch March 17, 2023 22:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants