Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #29321 - Indexing large repo #8605

Merged
merged 3 commits into from
Mar 12, 2020
Merged

Conversation

sjha4
Copy link
Member

@sjha4 sjha4 commented Mar 10, 2020

To do:

  • Measure time for sync/index with/without the change

@theforeman-bot
Copy link

Issues: #29321

@sjha4 sjha4 changed the title [WIP]Fixes #29321 - Indexing large repo Fixes #29321 - Indexing large repo Mar 11, 2020
@jlsherrill
Copy link
Member

Do you think it'd be possible to add a test that would have caught this (without having to actually sync a repo with thousands of units)?

I'm thinking you could create 50,000 fake file units with ActiveRecordImport fairly quickly, and then grab all the ids and pass that to the sync_repository_association method?

@sjha4
Copy link
Member Author

sjha4 commented Mar 12, 2020

@jlsherrill : Added a couple of tests for the 2 types of flows, file and errata with 70000 units. Is that what you had in mind?

ids_href_map["href#{i}"] = nil
i += 1
end
Katello::FileUnit.import([:pulp_id], ids)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding validate: false shaves off about 20 seconds from this method call. It still takes about ~14 seconds, but i think thats okay. I don't know that i see a ton of value for having an errata test and a file test, any reason to have the errata test in addition?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right..I dont think we need the validate:true for this.
The errata test follows a different path where there are actually mappings in the id_href_map..So the 2 flows are the map having nil values for all keys on all content types minus errata and map having actual pulp3_hrefs for errata that we want to store on the katello_repository_erratum table.

Copy link
Member

@jlsherrill jlsherrill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense!

@sjha4
Copy link
Member Author

sjha4 commented Mar 12, 2020

Thanks @jlsherrill ! Merging..

@sjha4 sjha4 merged commit 7ff28c1 into Katello:master Mar 12, 2020
@sjha4 sjha4 deleted the large_repo_index branch March 12, 2020 19:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants