Commit: doc updates
giventocode committed Mar 5, 2018
1 parent 400281c commit 1877f3b
Showing 4 changed files with 71 additions and 16 deletions.
8 changes: 6 additions & 2 deletions args.go
@@ -522,7 +522,7 @@ func (p *paramParserValidator) pvSourceInfoForS3IsReq() error {
 	burl, err := url.Parse(p.params.sourceURIs[0])
 
 	if err != nil {
-		return fmt.Errorf("Invalid S3 endpoint URL. Parsing error: %v.\nThe format is s3://[END_POINT]/[BUCKET]/[OBJECT]", err)
+		return fmt.Errorf("Invalid S3 endpoint URL. Parsing error: %v.\nThe format is s3://[END_POINT]/[BUCKET]/[PREFIX]", err)
 	}
 
 	p.params.s3Source.endpoint = burl.Hostname()
@@ -533,10 +533,14 @@ func (p *paramParserValidator) pvSourceInfoForS3IsReq() error {

 	segments := strings.Split(burl.Path, "/")
 
+	if len(segments) < 2 {
+		return fmt.Errorf("Invalid S3 endpoint URL. Bucket not specified. The format is s3://[END_POINT]/[BUCKET]/[PREFIX]")
+	}
+
 	p.params.s3Source.bucket = segments[1]
 
 	if p.params.s3Source.bucket == "" {
-		return fmt.Errorf("Invalid source S3 URI. Bucket name could be parsed")
+		return fmt.Errorf("Invalid source S3 URI. Bucket name could not be parsed")
 	}
 
 	prefix := ""
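Taken together, the two hunks above tighten the validation of the source s3:// URI. As a standalone illustration, here is a minimal Go sketch of the same parsing flow, assuming the s3://[END_POINT]/[BUCKET]/[PREFIX] format named in the error messages; parseS3URI and the example values are illustrative, not part of BlobPorter's API:

```
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseS3URI splits an s3://[END_POINT]/[BUCKET]/[PREFIX] URI into its parts,
// mirroring the checks in the hunks above: a parse error, a missing bucket
// segment, and an empty bucket name are all rejected.
func parseS3URI(raw string) (endpoint, bucket, prefix string, err error) {
	burl, err := url.Parse(raw)
	if err != nil {
		return "", "", "", fmt.Errorf("invalid S3 endpoint URL. Parsing error: %v.\nThe format is s3://[END_POINT]/[BUCKET]/[PREFIX]", err)
	}

	endpoint = burl.Hostname()

	// burl.Path starts with "/", so segments[0] is always empty and the
	// bucket, if present, is segments[1].
	segments := strings.Split(burl.Path, "/")
	if len(segments) < 2 {
		return "", "", "", fmt.Errorf("invalid S3 endpoint URL. Bucket not specified. The format is s3://[END_POINT]/[BUCKET]/[PREFIX]")
	}

	bucket = segments[1]
	if bucket == "" {
		return "", "", "", fmt.Errorf("invalid source S3 URI. Bucket name could not be parsed")
	}

	prefix = strings.Join(segments[2:], "/")
	return endpoint, bucket, prefix, nil
}

func main() {
	endpoint, bucket, prefix, err := parseS3URI("s3://s3.amazonaws.com/mybucket/data/")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(endpoint, bucket, prefix) // s3.amazonaws.com mybucket data/
}
```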
19 changes: 5 additions & 14 deletions docs/perfmode.rst
@@ -1,14 +1,7 @@
 Performance Mode
 ======================================
 
-If you want to maximize performance, and your source and target are public HTTP based end-points (Blob, S3, and HTTP), running the transfer in a high-bandwidth environment, such as a VM on the cloud, is strongly recommended. This recommendation comes from the fact that blob to blob, S3 to blob, or HTTP to blob transfers are bidirectional: BlobPorter downloads the data (without writing to disk) and uploads it as it is received.
-
-When running in the cloud, consider the region where the transfer VM (where BlobPorter will be running) will be deployed. When possible, deploy the transfer VM in the same region as the target of the transfer. Running in the same region as the target minimizes the transfer costs (egress from the VM to the target storage account) and the network performance impact (lower latency) for the upload operation.
-
-For downloads or uploads of multiple or large files, disk i/o could be the constraining resource that slows down the transfer. Identifying whether this is the case is often cumbersome, but doing so can lead to informed decisions about the environment where BlobPorter runs.
-
-To help with this identification process, BlobPorter has a performance mode that uploads random data generated in memory and measures the performance of the operation without the impact of disk i/o.
-The performance mode for uploads could help you identify the potential upper limit of throughput that the network and the target storage account can provide.
+BlobPorter has a performance mode that uploads random data generated in memory and measures the performance of the operation without the impact of disk i/o.
+The performance mode for uploads could help you identify the potential upper limit of throughput that the network and the target storage account can provide.
 
 For example, the following command will upload 10 x 10GB files to a storage account.
 
@@ -24,19 +17,17 @@ blobporter -f "1GB:10" -c perft -t perf-blockblob -g 20

 Similarly, for downloads, you can simulate downloading data from a storage account without writing to disk. This mode could also help you fine-tune the number of readers (-r option) and get an idea of the maximum download throughput.
 
-The following command will download the data we previously uploaded.
+The following command downloads the data previously uploaded.
 
 ```
 export SRC_ACCOUNT_KEY=$ACCOUNT_KEY
 blobporter -f "https://myaccount.blob.core.windows.net/perft" -t blob-perf
 ```
 
-Then you can try downloading to disk.
+Then you can download to disk.
 
 ```
 blobporter -c perft -t blob-file
 ```
 
-If the performance difference is significant, you can conclude that disk i/o is the bottleneck, at which point you can consider an SSD-backed VM.
-
-
+The performance difference will give you a measurement of the impact of disk i/o.
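
The -r option mentioned above controls the number of concurrent readers. To see its effect, you can re-run the download simulation with a different value; the reader count below is only illustrative:

```
blobporter -f "https://myaccount.blob.core.windows.net/perft" -t blob-perf -r 32
```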
59 changes: 59 additions & 0 deletions docs/resumable_transfers.rst
@@ -0,0 +1,59 @@
Resumable Transfers
======================================
BlobPorter supports resumable transfers. To enable this feature, set the -l option to the path of a transfer status file.

```
blobporter -f "manyfiles/*" -c "many" -l mylog
```
The transfer status file contains entries for when a file is queued and when it is successfully transferred.

The log entries are created with the following tab-delimited format:

```
[Timestamp] [Filename] [Status (1:Started, 2:Completed, 3:Ignored)] [Size] [Transfer ID]
```
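
To illustrate the layout, here is a small standalone Go sketch (not part of BlobPorter) that reads a status file named mylog and reports files that were queued but never completed; it assumes the tab-delimited fields described above:

```
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("mylog")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer f.Close()

	started := map[string]bool{}
	completed := map[string]bool{}

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// Expected fields: timestamp, filename, status, size, transfer ID.
		fields := strings.Split(scanner.Text(), "\t")
		if len(fields) < 5 {
			continue // skip blank lines and the end-of-file summary section
		}
		switch fields[2] {
		case "1":
			started[fields[1]] = true
		case "2":
			completed[fields[1]] = true
		}
	}

	// Files with a "started" entry but no "completed" entry would be
	// retried on the next run.
	for name := range started {
		if !completed[name] {
			fmt.Println("not completed:", name)
		}
	}
}
```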

The following output from a transfer status file shows that three files were included in the transfer (file10, file11 and file15).
However, only two were successfully transferred: file10 and file11.

```
2018-03-05T03:31:13.034245807Z file10 1 104857600 938520246_mylog
2018-03-05T03:31:13.034390509Z file11 1 104857600 938520246_mylog
2018-03-05T03:31:13.034437109Z file15 1 104857600 938520246_mylog
2018-03-05T03:31:25.232572306Z file10 2 104857600 938520246_mylog
2018-03-05T03:31:25.591239355Z file11 2 104857600 938520246_mylog
```

If a transfer fails, you can reference the same status file when you run it again, and BlobPorter will skip the files that were already transferred.
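
For example, re-running the original command with the same status file resumes the transfer:

```
blobporter -f "manyfiles/*" -c "many" -l mylog
```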

Consider the previous scenario. After executing the transfer again, the status file gains new entries only for the missing file (file15).

```
2018-03-05T03:31:13.034245807Z file10 1 104857600 938520246_mylog
2018-03-05T03:31:13.034390509Z file11 1 104857600 938520246_mylog
2018-03-05T03:31:13.034437109Z file15 1 104857600 938520246_mylog
2018-03-05T03:31:25.232572306Z file10 2 104857600 938520246_mylog
2018-03-05T03:31:25.591239355Z file11 2 104857600 938520246_mylog
2018-03-05T03:54:33.660161772Z file15 1 104857600 495675852_mylog
2018-03-05T03:54:34.579295059Z file15 2 104857600 495675852_mylog
```

When the transfer is successful, a summary is appended at the end of the transfer status file.

```
----------------------------------------------------------
Transfer Completed----------------------------------------
Start Summary---------------------------------------------
Last Transfer ID:495675852_mylog
Date:Mon Mar 5 03:54:34 UTC 2018
File:file15 Size:104857600 TID:495675852_mylog
File:file10 Size:104857600 TID:938520246_mylog
File:file11 Size:104857600 TID:938520246_mylog
Transferred Files:3 Total Size:314572800
End Summary-----------------------------------------------
```




1 change: 1 addition & 0 deletions sources/s3info.go
@@ -33,6 +33,7 @@ type s3InfoProvider struct {
 func newS3InfoProvider(params *S3Params) (*s3InfoProvider, error) {
 	s3client, err := minio.New(params.Endpoint, params.AccessKey, params.SecretKey, true)
 
+
 	if err != nil {
 		log.Fatalln(err)
 	}
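For context, a brief sketch of how the minio-go constructor used in newS3InfoProvider is typically called; the endpoint, credentials, and the connectivity check below are placeholders for illustration, not BlobPorter code:

```
package main

import (
	"log"

	minio "github.com/minio/minio-go"
)

func main() {
	// Same constructor shape as in newS3InfoProvider: endpoint, access key,
	// secret key, and secure=true for HTTPS.
	client, err := minio.New("s3.amazonaws.com", "ACCESS_KEY", "SECRET_KEY", true)
	if err != nil {
		log.Fatalln(err)
	}

	// A quick connectivity check: list the buckets the credentials can see.
	buckets, err := client.ListBuckets()
	if err != nil {
		log.Fatalln(err)
	}
	for _, b := range buckets {
		log.Println(b.Name)
	}
}
```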
