The Flood program now includes a robust retry mechanism with exponential backoff, starting at 30 seconds and doubling on each attempt, up to a maximum of 10 retries. This ensures that the program can handle temporary S3 provider unavailability effectively while avoiding overwhelming the S3 service with frequent retries.
Exponential Backoff with Jitter:
- Implement exponential backoff with jitter for retries on transient S3 upload failures.
- The initial backoff delay is 30 seconds, and it doubles with each retry attempt (30s, 60s, 120s, etc.), with a random jitter added to each delay to avoid synchronization issues.
- The backoff will continue for up to 10 retry attempts.
Maximum Retry Count:
- The program must retry up to a maximum of 10 times for each file.
- After reaching the maximum retry limit (i.e., after 10 retries), the file should be marked as failed and moved to the `failed` directory.
Retry on Specific Errors:
- The retry mechanism will apply only for transient errors such as:
- Network timeouts
- Connection resets
- DNS errors
- Non-recoverable errors (e.g., missing bucket, authentication failure) will not trigger a retry.
Track Retry Attempts in SQLite:
- Each retry attempt must be logged in the SQLite database, recording:
- The number of retries attempted
- The timestamp of each retry
- The outcome of the retry (e.g., success or failure)
Backoff Reset on Success:
- Once a file is successfully uploaded, the retry counter must be reset, and no further retries will be attempted for that file.
- The program must accept a credentials file (with AWS credentials syntax) as an argument.
- It must parse the AWS credentials file and extract all profiles, including named profiles.
- It must validate the credentials file to ensure it has at least one profile.
- Each profile must supply the following keys based on the `provider`:
  - If `provider = amazon`: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`
  - If `provider = cloudflare`: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`, `AWS_ENDPOINT`
  - If `provider = backblaze`: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`, `AWS_ENDPOINT`
- If the `provider` key is missing, the program must log an error and abort execution with a message indicating the missing provider.
- If any required configuration key is missing based on the provider, the program must log an error and abort execution with a message indicating the missing key(s).
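One way to encode the per-provider key requirements; representing a parsed profile as a `map[string]string` is an assumption of this sketch:

```go
package main

import (
	"fmt"
	"sort"
)

// requiredKeys maps each supported provider to the configuration keys
// its profile must supply.
var requiredKeys = map[string][]string{
	"amazon":     {"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"},
	"cloudflare": {"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION", "AWS_ENDPOINT"},
	"backblaze":  {"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION", "AWS_ENDPOINT"},
}

// missingKeys returns the provider, any required keys absent from the
// parsed profile, and ok=false when the provider key itself is missing
// (in which case the caller must log and abort).
func missingKeys(profile map[string]string) (provider string, missing []string, ok bool) {
	provider, ok = profile["provider"]
	if !ok {
		return "", nil, false
	}
	for _, k := range requiredKeys[provider] {
		if _, present := profile[k]; !present {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return provider, missing, true
}

func main() {
	p := map[string]string{"provider": "cloudflare", "AWS_ACCESS_KEY_ID": "x"}
	prov, miss, ok := missingKeys(p)
	fmt.Println(prov, miss, ok)
}
```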
- The program must include a retry mechanism with exponential backoff and jitter for handling transient errors when attempting to upload files to S3.
- The retry mechanism will have an initial delay of 30 seconds, which doubles with each retry attempt (30s, 60s, 120s, etc.).
- A random jitter will be added to each delay to prevent synchronized retries across multiple clients.
- The program must retry up to a maximum of 10 attempts for each file.
- The retry mechanism must apply only to transient errors, including:
- Network timeouts
- DNS failures
- Connection resets
- Non-recoverable errors (e.g., authentication failure, missing bucket) must not trigger retries.
- Each retry attempt must be logged in the SQLite database, including:
- The number of retries attempted
- The timestamp of each retry
- The outcome of the retry (success or failure).
- If a file is successfully uploaded after a retry, the backoff and retry counter must be reset.
- After 10 failed retry attempts, the program must log the failure, move the file to the `failed` directory, and update the database accordingly.
- The bucket name is the second directory level under each profile in the main directories (`incoming_tmp`, `incoming`, `processing`, `failed`, and `completed`).
- Directory structure: `/server_directory/{main_dir}/{profileName}/{bucketName}/...`
- Example: `/server_directory/incoming/profile1/mybucket/file.txt`
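A one-line helper makes the layout concrete (the name `stagePath` is illustrative):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// stagePath builds the on-disk location of a file for a given stage
// (incoming_tmp, incoming, processing, failed, completed), following the
// /server_directory/{main_dir}/{profileName}/{bucketName}/... layout.
func stagePath(serverDir, stage, profile, bucket, relPath string) string {
	return filepath.Join(serverDir, stage, profile, bucket, relPath)
}

func main() {
	fmt.Println(stagePath("/server_directory", "incoming", "profile1", "mybucket", "file.txt"))
	// /server_directory/incoming/profile1/mybucket/file.txt
}
```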
- The program must be able to run in server mode if a `server_directory` argument is provided.
- In server mode, the program should run continuously, processing files as they arrive in `incoming`.
- The program must use `fsnotify` to monitor the `incoming` directory for `MOVE` or `CLOSE_WRITE` events.
- A file is considered to have arrived if it triggers a `MOVE` or `CLOSE_WRITE` event in the `incoming` directory.
- Recursive directory watching: the program must watch subdirectories inside the `incoming` directory and process files within them.
- Once detected, files must be moved to the corresponding profile's directory under `processing` for handling.
- Files should then be moved to either `completed` (if processing succeeds) or `failed` (if processing fails).
- In server mode, after reading the credentials file, all directories (from requirement 13) must be created under the `server_directory`.
- In server mode, the credentials file must still be provided and parsed as per requirement 1.
- The core purpose of the program in server mode is to move files to S3 buckets based on the profile and bucket structure in the `incoming` directory.
- Before enabling `fsnotify` monitoring, the program must first process all existing files in the `processing` directory for each profile.
- After processing files in the `processing` directory, it must then process all existing files in the `incoming` directory for each profile.
- Only after all existing files have been processed from `processing` and `incoming` should the program enable `fsnotify` to monitor new files in `incoming`.
- The program must also support copy mode, where it accepts a source directory or filename and copies it to the appropriate profile's `incoming_tmp` directory.
- In copy mode, the destination must be specified in the format: `s3://{profilename}/{bucketname}/filepath_or_name`.
- The program must extract the profile and bucket directly from the S3 URI.
- The program must validate the profile and bucket by checking the credentials file and the structure of the `incoming` and `incoming_tmp` directories.
- The program must dynamically create bucket directories (if they don't exist) during copy mode operations.
- The necessary bucket directory structure must be created in both `incoming_tmp` and `incoming` as needed.
- In copy mode, the program must support an optional recursive copy (using typical command-line syntax such as `-r` or `--recursive`).
- If the recursive flag is used, it should copy directories and all their contents to `incoming_tmp`.
- After the file or directory is copied into `incoming_tmp`, it must be moved to the corresponding location in `incoming`.
- If a directory or file is copied, it must respect the profile and S3 bucket structure, meaning files are placed in the correct subdirectory based on the profile and bucket specified in the S3 URI.
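Parsing the copy-mode destination can be done with the standard library (`parseDest` is an illustrative name; note that in this scheme the URI "host" is the profile name, not the bucket, which differs from standard `s3://` URIs):

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseDest splits an s3://{profilename}/{bucketname}/filepath destination
// into its three parts.
func parseDest(dest string) (profile, bucket, path string, err error) {
	u, err := url.Parse(dest)
	if err != nil || u.Scheme != "s3" || u.Host == "" {
		return "", "", "", fmt.Errorf("invalid destination %q", dest)
	}
	bucket, path, found := strings.Cut(strings.TrimPrefix(u.Path, "/"), "/")
	if !found || bucket == "" || path == "" {
		return "", "", "", fmt.Errorf("destination %q must include bucket and file path", dest)
	}
	return u.Host, bucket, path, nil
}

func main() {
	profile, bucket, path, err := parseDest("s3://profile1/mybucket/dir/file.txt")
	fmt.Println(profile, bucket, path, err) // profile1 mybucket dir/file.txt <nil>
}
```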
- An `incoming_tmp` directory with the same subdirectory structure as `incoming` must be created, where files are initially copied.
- All new files must first be copied into the appropriate subdirectory of `incoming_tmp` for each profile and bucket.
- Once a file copy is complete in `incoming_tmp`, the file should be moved to the corresponding location in the `incoming` directory, which is monitored by `fsnotify`.
- When the program starts, it must first delete the entire contents of the `incoming_tmp` directory to ensure it is empty before continuing.
- After deleting the contents of `incoming_tmp`, the program must proceed with creating all required directories for profiles and buckets under `incoming_tmp`, `incoming`, `processing`, `failed`, and `completed`.
- Files are processed by copying them to the target S3 profile using the AWS SDK for Go.
- The program must ensure that the file is fully uploaded to the S3 bucket before moving it to the `completed` directory.
- The check should be informational only if the target S3 service does not support `HEAD` requests.
- If the S3 upload fails, the program must log the error and move the file to the `failed` directory for retry or manual intervention.
- Before copying any files to a bucket, the program must validate the existence of the bucket on the S3 server.
- The program must:
  - Query the S3 server for a list of buckets using the AWS SDK's `ListBuckets` function.
  - Ensure that the bucket specified in the S3 URI (or derived from the directory structure) exists on the server.
- If the bucket does not exist on the S3 server, the program should:
  - Log an error indicating that the bucket does not exist.
  - Skip the operation for that specific bucket.
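A sketch of the bucket check. The `bucketLister` interface and fake client are scaffolding so the logic can be exercised without a server; a real implementation would call `ListBuckets` on an `s3.Client` from aws-sdk-go-v2:

```go
package main

import "fmt"

// bucketLister abstracts the AWS SDK ListBuckets call so the validation
// logic can be tested without network access.
type bucketLister interface {
	ListBucketNames() ([]string, error)
}

// bucketExists checks the target bucket against the server's bucket list;
// callers log an error and skip the operation when it returns false.
func bucketExists(c bucketLister, bucket string) (bool, error) {
	names, err := c.ListBucketNames()
	if err != nil {
		return false, err
	}
	for _, n := range names {
		if n == bucket {
			return true, nil
		}
	}
	return false, nil
}

// fakeLister stands in for a real S3 client in this sketch.
type fakeLister struct{ names []string }

func (f fakeLister) ListBucketNames() ([]string, error) { return f.names, nil }

func main() {
	c := fakeLister{names: []string{"mybucket", "logs"}}
	ok, _ := bucketExists(c, "mybucket")
	fmt.Println(ok) // true
	ok, _ = bucketExists(c, "ghost")
	fmt.Println(ok) // false
}
```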
- The program must create an SQLite database in the current directory to track the state and progress of each file.
- The SQLite database must contain a table named `file_records` with the following fields:
  - `id`: A unique identifier for each record (auto-incrementing primary key).
  - `profile`: The profile name.
  - `bucket`: The S3 bucket name.
  - `filepath`: The relative file path within the bucket.
  - `file_creation_date`: The file creation timestamp (to ensure a unique combination).
  - `current_state`: The current state of the file (e.g., `incoming`, `processing`, `completed`, `failed`).
  - `last_updated`: The timestamp when the state was last updated.
  - `upload_outcome`: The outcome of the upload (`success`, `failure`).
- Each file should have a unique record in the database, determined by the `profile`, `bucket`, `filepath`, and file creation date.
- If a file enters the system again with the same `profile`, `bucket`, `filepath`, and creation date (indicating a re-upload or retry), the program must create a new record in the database.
- The program must update the database record as the file moves through each state (`incoming_tmp`, `incoming`, `processing`, `completed`, `failed`) and log the outcome (success or failure).
- The program must search for the AWS credentials file using the same algorithm as the AWS CLI, before falling back to the file provided as an argument.
- The program must output detailed logging and information about each action it performs, including file movements, copying, and S3 uploads.
- Exponential backoff retry mechanism for transient S3 errors, starting at 30 seconds, doubling on each attempt, up to 10 retries.
- Maximum retry count of 10 retries, after which the file is marked as failed.
- SQLite logging to track retry attempts and file states.
- Abort execution if the provider or required configuration keys are missing for any profile.