-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DEV: First pass at process_uploads
script
#26662
Closed
Closed
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,3 +2,5 @@ | |
|
||
tmp/* | ||
Gemfile.lock | ||
|
||
/config/process_uploads.yml |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
#!/usr/bin/env ruby | ||
# frozen_string_literal: true | ||
|
||
require_relative "../lib/uploads/cli" | ||
|
||
# ./migrations/bin/process_uploads [--settings=migrations/config/process_uploads.yml] | ||
Migrations::Uploads::CLI.start(ARGV) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
source_db_path: "/path/to/your/db.sqlite3" | ||
output_db_path: "/path/to/your/uploads.sqlite3" | ||
|
||
root_paths: | ||
- "/path/to/your/files" | ||
- "/path/to/more/files" | ||
|
||
# The number of threads to use for processing uploads is calculated as: | ||
# thread_count = [number of cores] * [thread_count_factor] | ||
# The thread count will be doubled if uploads are stored on S3 because there's a higher latency. | ||
thread_count_factor: 1.5 | ||
|
||
# Delete uploads from the output database that are not found in the source database. | ||
delete_surplus_uploads: false | ||
|
||
# Delete uploads from the output database that do not have a Discourse upload record. | ||
delete_missing_uploads: false | ||
|
||
# Check if files are missing in the upload store and update the database accordingly. | ||
# Set to false and re-run the script afterwards if you want to create new uploads for missing files. | ||
fix_missing: false | ||
|
||
# Create optimized images for post uploads and avatars. | ||
create_optimized_images: false | ||
|
||
site_settings: | ||
authorized_extensions: "*" | ||
max_attachment_size_kb: 102_400 | ||
max_image_size_kb: 102_400 | ||
|
||
enable_s3_uploads: true | ||
s3_upload_bucket: "your-bucket-name" | ||
s3_region: "your-region" | ||
s3_access_key_id: "your-access-key-id" | ||
s3_secret_access_key: "your-secret-access-key" | ||
s3_cdn_url: "https://your-cdn-url.com" | ||
|
||
# Set this to true if the site is a multisite and configure the `multisite_db_name` accordingly | ||
multisite: false | ||
multisite_db_name: "default" | ||
|
||
# Sometimes a file can be found at one of many locations. Here's a list of transformations that can | ||
# be applied to the path to try and find the file. The first transformation that results in a file | ||
# being found will be used. | ||
path_replacements: | ||
# - ["/foo/", "/bar"] | ||
# - ["/foo/", "/bar/baz/"] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
# frozen_string_literal: true | ||
|
||
require "etc" | ||
require "sqlite3" | ||
|
||
module Migrations | ||
module Uploads | ||
class Base | ||
TRANSACTION_SIZE = 1000 | ||
QUEUE_SIZE = 1000 | ||
|
||
# TODO: Use IntermediateDatabase instead | ||
def create_connection(path) | ||
sqlite = SQLite3::Database.new(path, results_as_hash: true) | ||
sqlite.busy_timeout = 60_000 # 60 seconds | ||
sqlite.journal_mode = "WAL" | ||
sqlite.synchronous = "off" | ||
sqlite | ||
end | ||
|
||
def query(sql, db) | ||
db.prepare(sql).execute | ||
end | ||
end | ||
end | ||
end |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
# frozen_string_literal: true | ||
|
||
require_relative "../migrations" | ||
require_relative "./settings" | ||
require_relative "./fixer" | ||
require_relative "./uploader" | ||
require_relative "./optimizer" | ||
|
||
module Migrations | ||
load_rails_environment | ||
|
||
load_gemfiles("common") | ||
configure_zeitwerk("lib/common") | ||
|
||
module Uploads | ||
class CLI < Thor | ||
default_task :execute | ||
|
||
class_option :settings, | ||
type: :string, | ||
aliases: "-s", | ||
default: "./migrations/config/process_uploads.yml", | ||
banner: "SETTINGS_FILE", | ||
desc: "Upload settings file" | ||
|
||
def initialize(*args) | ||
super | ||
|
||
EXIFR.logger = Logger.new(nil) | ||
@settings = Settings.from_file(options[:settings]) | ||
end | ||
|
||
def self.exit_on_failure? | ||
true | ||
end | ||
|
||
desc "execute [--settings=SETTINGS_FILE]", "Process uploads" | ||
def execute | ||
return run_fixer! if @settings[:fix_missing] | ||
|
||
Uploader.run!(@settings) | ||
|
||
run_optimizer! if @settings[:create_optimized_images] | ||
end | ||
|
||
desc "fix-missing [--settings=SETTINGS_FILE]", "Fix missing uploads" | ||
def fix_missing | ||
run_fixer! | ||
end | ||
|
||
desc "optimize [--settings=SETTINGS_FILE]", "Optimize uploads" | ||
def optimize | ||
run_optimize! | ||
end | ||
|
||
private | ||
|
||
def run_fixer! | ||
Fixer.run!(@settings) | ||
end | ||
|
||
def run_optimizer! | ||
Optimizer.run!(@settings) | ||
end | ||
end | ||
end | ||
end |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,110 @@ | ||
# frozen_string_literal: true | ||
|
||
require_relative "./base" | ||
|
||
module Migrations | ||
module Uploads | ||
class Fixer < Base | ||
def initialize(settings) | ||
@settings = settings | ||
|
||
@source_db = create_connection(settings[:output_db_path]) | ||
end | ||
|
||
def self.run!(settings) | ||
puts "Fixing missing uploads..." | ||
|
||
new(settings).run! | ||
end | ||
|
||
def run! | ||
queue = SizedQueue.new(QUEUE_SIZE) | ||
consumer_threads = [] | ||
|
||
max_count = | ||
@source_db.get_first_value("SELECT COUNT(*) FROM uploads WHERE upload IS NOT NULL") | ||
|
||
binding | ||
producer_thread = | ||
Thread.new do | ||
query( | ||
"SELECT id, upload FROM uploads WHERE upload IS NOT NULL ORDER BY rowid DESC", | ||
@source_db, | ||
).tap do |result_set| | ||
result_set.each { |row| queue << row } | ||
result_set.close | ||
end | ||
end | ||
|
||
status_queue = SizedQueue.new(QUEUE_SIZE) | ||
status_thread = | ||
Thread.new do | ||
error_count = 0 | ||
current_count = 0 | ||
missing_count = 0 | ||
|
||
while !(result = status_queue.pop).nil? | ||
current_count += 1 | ||
|
||
case result[:status] | ||
when :ok | ||
# ignore | ||
when :error | ||
error_count += 1 | ||
puts "Error in #{result[:id]}" | ||
when :missing | ||
missing_count += 1 | ||
puts "Missing #{result[:id]}" | ||
|
||
@output_db.execute("DELETE FROM uploads WHERE id = ?", result[:id]) | ||
Upload.delete_by(id: result[:upload_id]) | ||
end | ||
|
||
error_count_text = error_count > 0 ? "#{error_count} errors".red : "0 errors" | ||
|
||
print "\r%7d / %7d (%s, %s missing)" % | ||
[current_count, max_count, error_count_text, missing_count] | ||
end | ||
end | ||
|
||
store = Discourse.store | ||
|
||
(Etc.nprocessors * @settings[:thread_count_factor] * 2).to_i.times do |index| | ||
consumer_threads << Thread.new do | ||
Thread.current.name = "worker-#{index}" | ||
fake_upload = OpenStruct.new(url: "") | ||
while (row = queue.pop) | ||
begin | ||
upload = JSON.parse(row["upload"]) | ||
fake_upload.url = upload["url"] | ||
path = add_multisite_prefix(store.get_path_for_upload(fake_upload)) | ||
|
||
file_exists = | ||
if store.external? | ||
store.object_from_path(path).exists? | ||
else | ||
File.exist?(File.join(store.public_dir, path)) | ||
end | ||
|
||
if file_exists | ||
status_queue << { id: row["id"], upload_id: upload["id"], status: :ok } | ||
else | ||
status_queue << { id: row["id"], upload_id: upload["id"], status: :missing } | ||
end | ||
rescue StandardError => e | ||
puts e.message | ||
status_queue << { id: row["id"], upload_id: upload["id"], status: :error } | ||
end | ||
end | ||
end | ||
end | ||
|
||
producer_thread.join | ||
queue.close | ||
consumer_threads.each(&:join) | ||
status_queue.close | ||
status_thread.join | ||
end | ||
end | ||
end | ||
end |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to keep CLI code in
bin/process_uploads
. It's not really reusable code that you'd expect to find inlib
. It doesn't help with testability, either. But maybe I'm missing something, what are your arguments for putting it intolib
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be great for if we ever want to compose the CLIs into a single "migration" CLI, making each a sub command will be relatively straightforward.
Could you explain this further?