This repository has been archived by the owner on Oct 27, 2022. It is now read-only.

Merge pull request #1075 from dmann/DDS-974-enforce_upload_chunk_size
DDS-974 enforce upload chunk size
dmann committed Sep 19, 2017
2 parents 8252bb4 + 00297af commit b8d3c26
Showing 23 changed files with 532 additions and 47 deletions.
174 changes: 174 additions & 0 deletions api_design/DDS-974-enforce_upload_chunk_size.md
@@ -0,0 +1,174 @@
# DDS-974 Enforce Upload Chunk Size

## Deployment View

**NOTE** The following must be done **before** the CircleCI build

For `rake db:data:migrate` to work, the following ENV variables must be set in all Heroku applications:
- `heroku config:set SWIFT_CHUNK_MAX_NUMBER=1000`
- `heroku config:set SWIFT_CHUNK_MAX_SIZE_BYTES=5368709122`

We must back up the PostgreSQL database, because the size fields in uploads/chunks
were changed from int to bigint to allow for large files.

## Logical View

A critical issue was recently exposed in the chunked upload capability. The storage provider (Swift) is configured with a maximum number of segments (i.e., Data Service chunks) beyond which it cannot coalesce a large file. To remedy this, the API will be extended to enforce a minimum chunk size based on the overall file size and the max segments setting.

#### Summary of impacted APIs

|Endpoint |Description |
|---|---|
| `GET /storage_providers` | List supported storage providers. |
| `GET /storage_providers/{id}` | Get a storage provider. |
| `POST /projects/{id}/uploads` | Initiate a chunked upload. |
| `PUT /uploads/{id}/chunks` | Generate and return a pre-signed upload URL for a chunk. |

#### API Specification
This section defines the proposed API interface extensions.

##### List supported storage providers / Get a storage provider
`GET /storage_providers` / `GET /storage_providers/{id}`

###### Response Example (Extensions)
The following properties will be added to the storage providers resource:

+ **chunk\_max\_size\_bytes** - The maximum size of a chunk that can be uploaded.
+ **chunk\_max\_number** - The maximum number of chunks that can be uploaded for a single file.
+ **file\_max\_size\_bytes** - Maximum supported file size that can be uploaded. (`chunk_max_size_bytes * chunk_max_number`)

```
{
"id": "g5579f73-0558-4f96-afc7-9d251e65bv33",
"name": "duke_oit_swift",
"description": "Duke OIT Storage",
"chunk_hash_algorithm": "md5",
"chunk_max_size_bytes": 5368709120,
"chunk_max_number": 1000,
  "file_max_size_bytes": 5368709120000,
"is_deprecated": false
}
```
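The relationship between these limits can be sketched directly. The constants below mirror the example resource above; `file_too_large?` is a hypothetical client-side helper, not part of the API:

```ruby
# Limits from the example storage provider resource above.
chunk_max_size_bytes = 5_368_709_120  # 5 GiB
chunk_max_number     = 1_000

# file_max_size_bytes is the product of the two chunk limits.
file_max_size_bytes = chunk_max_size_bytes * chunk_max_number

# A client could reject an oversized file before initiating an upload.
def file_too_large?(size_bytes, limit)
  size_bytes > limit
end

puts file_max_size_bytes                                # 5368709120000
puts file_too_large?(6 * 1024**4, file_max_size_bytes)  # true (6 TiB)
```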

##### Initiate a chunked upload
`POST /projects/{id}/uploads`

###### Response Headers (Extensions)
The following custom response headers will be added to inform clients of the minimum chunk size that must be utilized to ensure chunks can be coalesced, as well as the maximum chunk size the storage provider can accommodate.

+ **X-MIN-CHUNK-UPLOAD-SIZE** - The minimum chunk size in bytes.
+ **X-MAX-CHUNK-UPLOAD-SIZE** - The maximum chunk size in bytes.

###### Response Messages (Extensions)
+ 400 - File size is currently not supported - maximum size is {max_segments * max_chunk_upload_size}

###### Response Example
```
{
error: '400',
code: "not_provided",
reason: 'validation failed',
suggestion: 'Fix the following invalid fields and resubmit',
errors:
[
{
"size": "File size is currently not supported - maximum size is {max_segments * max_chunk_upload_size}"
}
]
}
```
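These headers let a client choose a valid chunk size without re-deriving server policy. The minimum mirrors `Upload#minimum_chunk_size` in this change (`(size / chunk_max_number).ceil`); `pick_chunk_size` and its preferred default are illustrative, not part of the API:

```ruby
# Server-side derivation of X-MIN-CHUNK-UPLOAD-SIZE
# (matches Upload#minimum_chunk_size in this change).
def minimum_chunk_size(file_size, chunk_max_number)
  (file_size.to_f / chunk_max_number).ceil
end

# Illustrative client logic: prefer a small chunk, but stay within bounds.
def pick_chunk_size(min_chunk, max_chunk, preferred: 16 * 1024**2)
  raise 'file exceeds provider limits' if min_chunk > max_chunk
  preferred.clamp(min_chunk, max_chunk)
end

file_size = 50 * 1024**3                  # 50 GiB
min = minimum_chunk_size(file_size, 1_000)
puts min                                  # 53687092
puts pick_chunk_size(min, 5_368_709_120)  # 53687092
```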

##### Generate and return a pre-signed upload URL for a chunk
`PUT /uploads/{id}/chunks`

###### Response Messages (Extensions)
+ 400 - Invalid chunk size specified - must be in range {min}-{max}
+ 400 - Upload chunks exceeded, must be less than {max}

###### Response Example
```
{
error: '400',
code: "not_provided",
reason: 'validation failed',
suggestion: 'Fix the following invalid fields and resubmit',
errors:
[
{
"size": "Invalid chunk size specified - must be in range {min}-{max}"
}
]
}
```
or
```
{
error: '400',
code: "not_provided",
reason: 'maximum upload chunks exceeded.',
suggestion: ''
}
```

## Implementation View

+ The official GCB Python client and the DDS Web portal client will need to be modified to interface with these chunked upload API extensions.

+ The Swift `max_manifest_segments` setting will be raised to 2000, and all uploads left inconsistent by exceeding the prior setting of 1000 will be re-queued for processing.

## Process View


This design introduces a change to the error response for validation errors.
Most validation_error responses will remain unchanged, reporting a list of field
errors that must be addressed:
```
{
error: '400',
code: "not_provided",
reason: 'validation failed',
suggestion: 'Fix the following invalid fields and resubmit',
errors:
[
{
"field": "something is wrong with this"
}
]
}
```

Some validation errors apply to the entire object rather than to any specific
field, such as when a user attempts to delete a property or template that is
associated with an object, or to create a chunk that exceeds the storage_provider
chunk_max_number.

In the past, these errors would have come in the list of 'errors', labeled
`base`:
```
{
error: '400',
code: "not_provided",
reason: 'validation failed',
suggestion: 'Fix the following invalid fields and resubmit',
errors:
[
{
"base": "something is wrong with this"
}
]
}
```

Going forward, these object errors will be placed into `reason`, and the response
payload may or may not include invalid fields as well. If there are no invalid
fields, the suggestion will be a blank string, and there will be no `errors`
entry in the payload.
```
{
error: '400',
code: "not_provided",
reason: 'something is wrong with this.',
suggestion: ''
}
```
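With this convention, a client can no longer assume every 400 carries an `errors` array. A minimal sketch of client-side handling; `describe_validation_error` is a hypothetical helper and the rendering is illustrative:

```ruby
require 'json'

# Build a human-readable summary from either error shape: object-level
# errors live in "reason"; field errors, when present, live in "errors".
def describe_validation_error(payload)
  lines = [payload['reason']]
  (payload['errors'] || []).each do |err|
    err.each { |field, message| lines << "  #{field}: #{message}" }
  end
  lines.join("\n")
end

object_error = JSON.parse(
  '{"error":"400","code":"not_provided",' \
  '"reason":"maximum upload chunks exceeded.","suggestion":""}'
)
puts describe_validation_error(object_error)
# maximum upload chunks exceeded.
```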
15 changes: 11 additions & 4 deletions app/api/dds/v1/base.rb
@@ -104,17 +104,24 @@ def validation_error!(object)
error: '400',
code: "not_provided",
reason: 'validation failed',
suggestion: 'Fix the following invalid fields and resubmit',
errors: []
suggestion: ''
}
object.errors.messages.each do |field, errors|
unless object.errors.messages[:base].empty?
error_payload[:reason] = object.errors.messages[:base].join(' ')
end
field_errors = []
object.errors.messages.reject{|field| field == :base }.each do |field, errors|
errors.each do |message|
error_payload[:errors] << {
field_errors << {
field: field,
message: message
}
end
end
unless field_errors.empty?
error_payload[:errors] = field_errors
error_payload[:suggestion] = 'Fix the following invalid fields and resubmit'
end
error!(error_payload, 400)
end

2 changes: 2 additions & 0 deletions app/api/dds/v1/uploads_api.rb
@@ -37,6 +37,8 @@ class UploadsAPI < Grape::API
})
authorize upload, :create?
if upload.save
header 'X-MIN-CHUNK-UPLOAD-SIZE', upload.minimum_chunk_size
header 'X-MAX-CHUNK-UPLOAD-SIZE', upload.max_size_bytes
upload
else
validation_error!(upload)
25 changes: 23 additions & 2 deletions app/models/chunk.rb
@@ -11,13 +11,24 @@ class Chunk < ActiveRecord::Base
has_many :project_permissions, through: :upload

validates :upload_id, presence: true
validates :number, presence: true,
validates :number, presence: true,
uniqueness: {scope: [:upload_id], case_sensitive: false}
validates :size, presence: true
validates :size, numericality: {
less_than: :chunk_max_size_bytes,
greater_than_or_equal_to: :minimum_chunk_size,
message: ->(object, data) do
"Invalid chunk size specified - must be in range #{object.minimum_chunk_size}-#{object.chunk_max_size_bytes}"
end
}, if: :storage_provider

validates :fingerprint_value, presence: true
validates :fingerprint_algorithm, presence: true

delegate :project_id, to: :upload
validate :upload_chunk_maximum, if: :storage_provider

delegate :project_id, :minimum_chunk_size, to: :upload
delegate :chunk_max_size_bytes, to: :storage_provider

def http_verb
'PUT'
@@ -47,8 +58,18 @@ def url
storage_provider.build_signed_url(http_verb, sub_path, expiry)
end

def total_chunks
upload.chunks.count
end

private

def upload_chunk_maximum
unless total_chunks < storage_provider.chunk_max_number
errors[:base] << 'maximum upload chunks exceeded.'
end
end

def update_upload_etag
last_audit = self.audits.last
new_comment = last_audit.comment ? last_audit.comment.merge({raised_by_audit: last_audit.id}) : {raised_by_audit: last_audit.id}
4 changes: 3 additions & 1 deletion app/models/storage_provider.rb
@@ -10,7 +10,9 @@ class StorageProvider < ActiveRecord::Base
validates :service_pass, presence: true
validates :primary_key, presence: true
validates :secondary_key, presence: true

validates :chunk_max_number, presence: true
validates :chunk_max_size_bytes, presence: true

def auth_token
call_auth_uri['x-auth-token']
end
16 changes: 15 additions & 1 deletion app/models/upload.rb
@@ -14,8 +14,14 @@ class Upload < ActiveRecord::Base

validates :project_id, presence: true
validates :name, presence: true
validates :size, presence: true
validates :storage_provider_id, presence: true
validates :size, presence: true
validates :size, numericality: {
less_than: :max_size_bytes,
message: ->(object, data) do
"File size is currently not supported - maximum size is #{object.max_size_bytes}"
end
}, if: :storage_provider
validates :creator_id, presence: true
validates :completed_at, immutable: true, if: :completed_at_was
validates :completed_at, immutable: true, if: :error_at_was
@@ -93,6 +99,14 @@ def create_and_validate_storage_manifest
end
end

def max_size_bytes
storage_provider.chunk_max_number * storage_provider.chunk_max_size_bytes
end

def minimum_chunk_size
(size.to_f / storage_provider.chunk_max_number).ceil
end

private
def integrity_exception(message)
exactly_now = DateTime.now
3 changes: 2 additions & 1 deletion app/serializers/storage_provider_serializer.rb
@@ -1,5 +1,6 @@
class StorageProviderSerializer < ActiveModel::Serializer
attributes :id, :name, :description, :is_deprecated, :chunk_hash_algorithm
attributes :id, :name, :description, :is_deprecated, :chunk_hash_algorithm,
:chunk_max_number, :chunk_max_size_bytes

def name
object.display_name
@@ -0,0 +1,6 @@
class AddChunkMaxNumberChunkMaxSizeBytesToStorageProviders < ActiveRecord::Migration[5.0]
def change
add_column :storage_providers, :chunk_max_number, :integer
add_column :storage_providers, :chunk_max_size_bytes, :integer, limit: 8
end
end
5 changes: 5 additions & 0 deletions db/migrate/20170912182841_change_chunk_size.rb
@@ -0,0 +1,5 @@
class ChangeChunkSize < ActiveRecord::Migration[5.0]
def change
change_column :chunks, :size, :integer, limit: 8
end
end
6 changes: 4 additions & 2 deletions db/schema.rb
@@ -10,7 +10,7 @@
#
# It's strongly recommended that you check this file into your version control system.

ActiveRecord::Schema.define(version: 20170622204323) do
ActiveRecord::Schema.define(version: 20170912182841) do

# These are extensions that must be enabled in order to support this database
enable_extension "plpgsql"
@@ -98,7 +98,7 @@
create_table "chunks", id: :uuid, default: -> { "uuid_generate_v4()" }, force: :cascade do |t|
t.uuid "upload_id"
t.integer "number"
t.integer "size"
t.bigint "size"
t.string "fingerprint_value"
t.string "fingerprint_algorithm"
t.datetime "created_at", null: false
@@ -264,6 +264,8 @@
t.datetime "created_at", null: false
t.datetime "updated_at", null: false
t.string "chunk_hash_algorithm", default: "md5"
t.integer "chunk_max_number"
t.bigint "chunk_max_size_bytes"
end

create_table "system_permissions", id: :uuid, default: -> { "uuid_generate_v4()" }, force: :cascade do |t|
21 changes: 21 additions & 0 deletions lib/tasks/db_data_migrate.rake
@@ -108,6 +108,26 @@ def migrate_nil_consistency_status
puts "#{updated_uploads} uploads updated."
end

def migrate_storage_provider_chunk_environment
bad_storage_providers = StorageProvider.where(
chunk_max_size_bytes: nil,
chunk_max_number: nil
)

if bad_storage_providers.count > 0
if (ENV['SWIFT_CHUNK_MAX_NUMBER'] && ENV['SWIFT_CHUNK_MAX_SIZE_BYTES'])
bad_storage_providers.each do |bad_storage_provider|
bad_storage_provider.update(
chunk_max_size_bytes: ENV['SWIFT_CHUNK_MAX_SIZE_BYTES'],
chunk_max_number: ENV['SWIFT_CHUNK_MAX_NUMBER']
)
end
else
$stderr.puts 'please set ENV[SWIFT_CHUNK_MAX_NUMBER] AND ENV[SWIFT_CHUNK_MAX_SIZE_BYTES]'
end
end
end

namespace :db do
namespace :data do
desc "Migrate existing data to fit current business rules"
@@ -116,6 +136,7 @@ namespace :db do
create_missing_fingerprints
type_untyped_authentication_services
migrate_nil_consistency_status
migrate_storage_provider_chunk_environment
end
end
end
2 changes: 1 addition & 1 deletion spec/factories/chunks.rb
@@ -2,7 +2,7 @@
factory :chunk do
upload
sequence(:number, 1000)
size { Faker::Number.digit }
size { Faker::Number.between(100, 1000) }
fingerprint_value { SecureRandom.hex(32) }
fingerprint_algorithm "md5"

