Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #21691 - clean installed packages on upgrade #7069

Merged
merged 1 commit into from Nov 17, 2017

Conversation

jlsherrill
Copy link
Member

No description provided.

@theforeman-bot
Copy link

Issues: #21691

@@ -0,0 +1,56 @@
class CleanupInstalledPackages < ActiveRecord::Migration

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra empty line detected at class body beginning.

@jlsherrill jlsherrill force-pushed the clean_packages branch 2 times, most recently from 2ef951a to ec7d97b Compare November 16, 2017 18:19
@akofink akofink self-requested a review November 16, 2017 19:03
Setting.where(:name => 'bulk_query_installed_packages', :category => 'Setting::Content').delete_all
end

def down
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am debating if you should do

    def down
        raise ActiveRecord::IrreversibleMigration
    end

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that is needed? The db should be in the same state as before, just with de-duplicate data.

end

#Copy unique entires to new table
ActiveRecord::Base.connection.execute('insert into katello_installed_packages_new(nvra, name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldn't you just run execute(....) instead of ActiveRecord::Base.connection.execute

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you can!

select distinct(nvra) as nvra, name from katello_installed_packages')

#copy associations
ActiveRecord::Base.connection.execute('
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

def up
create_table "katello_installed_packages_new", force: :cascade do |t|
t.string "name", limit: 255, null: false
t.string "nvra", limit: 255, null: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a unique index to this column to speed up the search

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that it would help, but i can time it and see

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested and with the index it took about ~4 minutes more.

t.integer "host_id", null: false
t.integer "installed_package_id", null: false
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a unique index for the combination columns here. This should speed up the lookup.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There arne't any lookups being done on this table?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I guess add it on the katello_installed_packages_new

@parthaa
Copy link
Contributor

parthaa commented Nov 16, 2017

Can you give me like general estimate on how long it took to run this on a huge packages DB, like 10s of millions of rows. I am trying to figure out if we should spend cycles trying to fine tune this.

@jlsherrill
Copy link
Member Author

@parthaa using a db with 100 million rows in katello_installed_packages it took about 2 hours. I'll see if adding that one index prior will help.

@jlsherrill
Copy link
Member Author

sorry, hit enter too soon. I tried a variety of other methods. Attempting to cache more data resulted in OOMs, trying to do stuff in 'chunks' took wayyyyyy longer (still running after ~10 hours). My box only had ~7GB, so a user with 100Million rows is likely to have a lot more, and hopefully it would take less time.

Copy link
Member

@akofink akofink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any real complaints about the code. I did test the migration and the rollback (with host packages, of course), and it appears to work fine. ACK pending another reviewer.

@parthaa
Copy link
Contributor

parthaa commented Nov 17, 2017

ACK

@jlsherrill jlsherrill merged commit ed0019a into Katello:master Nov 17, 2017
@jlsherrill jlsherrill deleted the clean_packages branch November 17, 2017 21:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants