4 changes: 2 additions & 2 deletions docs/cluster-management/uninstall.md
@@ -33,10 +33,10 @@ To delete them:
export AWS_ACCESS_KEY_ID=***
export AWS_SECRET_ACCESS_KEY=***

-# identify the name of your cortex s3 bucket
+# identify the name of your cortex S3 bucket
aws s3 ls

-# delete the s3 bucket
+# delete the S3 bucket
aws s3 rb --force s3://<bucket>

# delete the log group (replace <log_group> with what was configured during installation, default: cortex)
20 changes: 10 additions & 10 deletions docs/deployments/batch-api/endpoints.md
@@ -62,14 +62,14 @@ RESPONSE:

### S3 file paths

-If your input data is a list of files such as images/videos in an s3 directory, you can define `file_path_lister` in your submission request payload. You can use `file_path_lister.s3_paths` to specify a list of files or prefixes, and `file_path_lister.includes` and/or `file_path_lister.excludes` to remove unwanted files. The s3 file paths will be aggregated into batches of size `file_path_lister.batch_size`. To learn more about fine-grained S3 file filtering see [filtering files](#filtering-files).
+If your input data is a list of files such as images/videos in an S3 directory, you can define `file_path_lister` in your submission request payload. You can use `file_path_lister.s3_paths` to specify a list of files or prefixes, and `file_path_lister.includes` and/or `file_path_lister.excludes` to remove unwanted files. The S3 file paths will be aggregated into batches of size `file_path_lister.batch_size`. To learn more about fine-grained S3 file filtering see [filtering files](#filtering-files).

__The total size of a batch must be less than 256 KiB.__

This submission pattern can be useful in the following scenarios:

-* you have a list of images/videos in an s3 directory
-* each s3 file represents a single sample or a small number of samples
+* you have a list of images/videos in an S3 directory
+* each S3 file represents a single sample or a small number of samples

If a single S3 file contains a lot of samples/rows, try the next submission strategy.
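A minimal sketch of a `file_path_lister` submission might look like the following (the endpoint URL, bucket, and glob values are placeholders, and the `requests` library is assumed); the full payload schema appears in the hunk below.

```python
import requests

# Hypothetical endpoint; replace with your Batch API endpoint.
BATCH_API_ENDPOINT = "https://***.elb.us-west-2.amazonaws.com/image-classifier"

job_spec = {
    "workers": 2,
    "file_path_lister": {
        "s3_paths": ["s3://my-bucket/images/"],  # an S3 prefix (placeholder)
        "includes": ["**.jpg"],                  # keep only .jpg files
        "batch_size": 8,                         # 8 S3 file paths per predict() call
    },
}

response = requests.post(BATCH_API_ENDPOINT, json=job_spec)
print(response.json())  # the returned job id can be used with `cortex get` and `cortex logs`
```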

@@ -78,10 +78,10 @@ POST <batch_api_endpoint>/:
{
"workers": <int>, # the number of workers to allocate for this job (required)
"file_path_lister": {
"s3_paths": [<string>], # can be s3 prefixes or complete s3 paths (required)
"s3_paths": [<string>], # can be S3 prefixes or complete S3 paths (required)
"includes": [<string>], # glob patterns (optional)
"excludes": [<string>], # glob patterns (optional)
"batch_size": <int>, # the number of s3 file paths per batch (the predict() function is called once per batch) (required)
"batch_size": <int>, # the number of S3 file paths per batch (the predict() function is called once per batch) (required)
}
"config": { # custom fields for this specific job (will override values in `config` specified in your api configuration) (optional)
"string": <any>
@@ -102,22 +102,22 @@ RESPONSE:

### Newline delimited JSON files in S3

-If your input dataset is a newline delimited json file in an s3 directory (or a list of them), you can define `delimited_files` in your request payload to break up the contents of the file into batches of size `delimited_files.batch_size`.
+If your input dataset is a newline delimited JSON file in an S3 directory (or a list of them), you can define `delimited_files` in your request payload to break up the contents of the file into batches of size `delimited_files.batch_size`.

-Upon receiving `delimited_files`, your Batch API will iterate through the `delimited_files.s3_paths` to generate the set of s3 files to process. You can use `delimited_files.includes` and `delimited_files.excludes` to filter out unwanted files. Each S3 file will be parsed as a newline delimited JSON file. Each line in the file should be a JSON object, which will be treated as a single sample. The S3 file will be broken down into batches of size `delimited_files.batch_size` and submitted to your workers. To learn more about fine-grained S3 file filtering see [filtering files](#filtering-files).
+Upon receiving `delimited_files`, your Batch API will iterate through the `delimited_files.s3_paths` to generate the set of S3 files to process. You can use `delimited_files.includes` and `delimited_files.excludes` to filter out unwanted files. Each S3 file will be parsed as a newline delimited JSON file. Each line in the file should be a JSON object, which will be treated as a single sample. The S3 file will be broken down into batches of size `delimited_files.batch_size` and submitted to your workers. To learn more about fine-grained S3 file filtering see [filtering files](#filtering-files).

__The total size of a batch must be less than 256 KiB.__

This submission pattern is useful in the following scenarios:

-* one or more s3 files contains a large number of samples and must be broken down into batches
+* one or more S3 files contain a large number of samples and must be broken down into batches
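By analogy with the `file_path_lister` sketch above, a `delimited_files` submission might look like this (the path and batch size are illustrative; each line of the referenced file must be a JSON object), with the full schema in the hunk below.

```python
import requests

# Hypothetical values; each line of data.jsonl is parsed as one JSON object (one sample).
job_spec = {
    "workers": 4,
    "delimited_files": {
        "s3_paths": ["s3://my-bucket/samples/data.jsonl"],  # placeholder path
        "batch_size": 100,  # 100 JSON objects per predict() call
    },
}

response = requests.post("https://***.elb.us-west-2.amazonaws.com/my-batch-api", json=job_spec)
print(response.json())
```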

```yaml
POST <batch_api_endpoint>/:
{
"workers": <int>, # the number of workers to allocate for this job (required)
"delimited_files": {
"s3_paths": [<string>], # can be s3 prefixes or complete s3 paths (required)
"s3_paths": [<string>], # can be S3 prefixes or complete S3 paths (required)
"includes": [<string>], # glob patterns (optional)
"excludes": [<string>], # glob patterns (optional)
"batch_size": <int>, # the number of json objects per batch (the predict() function is called once per batch) (required)
@@ -201,7 +201,7 @@ RESPONSE:

When submitting a job using `delimited_files` or `file_path_lister`, you can use `s3_paths` in conjunction with `includes` and `excludes` to precisely filter files.

-The Batch API will iterate through each s3 path in `s3_paths`. If the s3 path is a prefix, it iterates through each file in that prefix. For each file, if `includes` is non-empty, it will discard the s3 path if the s3 file doesn't match any of the glob patterns provided in `includes`. After passing the `includes` filter (if specified), if the `excludes` is non-empty, it will discard the s3 path if the s3 files matches any of the glob patterns provided in `excludes`.
+The Batch API will iterate through each S3 path in `s3_paths`. If the S3 path is a prefix, it iterates through each file in that prefix. For each file, if `includes` is non-empty, it will discard the S3 path if the S3 file doesn't match any of the glob patterns provided in `includes`. After passing the `includes` filter (if specified), if `excludes` is non-empty, it will discard the S3 path if the S3 file matches any of the glob patterns provided in `excludes`.

If you aren't sure which files will be processed in your request, specify the `dryRun=true` query parameter in the job submission request to see the target list.
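For example, a dry run of a filtered `file_path_lister` submission might look like this sketch (the endpoint and glob patterns are placeholders; `requests` is assumed):

```python
import requests

# Hypothetical endpoint; dryRun=true previews the target files without starting the job.
job_spec = {
    "workers": 1,
    "file_path_lister": {
        "s3_paths": ["s3://my-bucket/images/"],
        "includes": ["**.jpg", "**.png"],   # keep images
        "excludes": ["**/thumbnails/**"],   # drop anything under thumbnails/
        "batch_size": 8,
    },
}

response = requests.post(
    "https://***.elb.us-west-2.amazonaws.com/image-classifier",
    params={"dryRun": "true"},
    json=job_spec,
)
print(response.text)  # the list of S3 files that would be processed
```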

6 changes: 3 additions & 3 deletions docs/deployments/batch-api/predictors.md
@@ -81,7 +81,7 @@ For proper separation of concerns, it is recommended to use the constructor's `c
### Examples

<!-- CORTEX_VERSION_MINOR -->
-You can find an example of a BatchAPI using a PythonPredictor in [examples/batch/image-classifier](https://github.com/cortexlabs/cortex/tree/master/examples/batch/image-classifier)
+You can find an example of a BatchAPI using a PythonPredictor in [examples/batch/image-classifier](https://github.com/cortexlabs/cortex/tree/master/examples/batch/image-classifier).
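The hunk header above mentions using the constructor's `config` parameter for separation of concerns; a minimal PythonPredictor sketch of that pattern might look like this (the config keys and the exact predict signature are illustrative assumptions; see the linked example for the authoritative version):

```python
import json

class PythonPredictor:
    def __init__(self, config):
        # Environment-specific values arrive via `config` (set in the API spec,
        # overridable per job submission) rather than being hard-coded.
        self.dest_s3_dir = config["dest_s3_dir"]       # hypothetical config key
        self.threshold = config.get("threshold", 0.5)  # hypothetical config key

    def predict(self, payload, batch_id):
        # payload is one batch; assumed here to be a list of JSON objects with a "value" field.
        results = [{"value": s["value"], "keep": s["value"] > self.threshold} for s in payload]
        # A real predictor would write results to self.dest_s3_dir; printing keeps
        # this sketch self-contained.
        print(f"batch {batch_id}: {json.dumps(results)}")
```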

### Pre-installed packages

@@ -198,7 +198,7 @@ For proper separation of concerns, it is recommended to use the constructor's `c
### Examples

<!-- CORTEX_VERSION_MINOR -->
-You can find an example of a BatchAPI using a TensorFlowPredictor in [examples/batch/tensorflow](https://github.com/cortexlabs/cortex/tree/master/examples/batch/tensorflow)
+You can find an example of a BatchAPI using a TensorFlowPredictor in [examples/batch/tensorflow](https://github.com/cortexlabs/cortex/tree/master/examples/batch/tensorflow).

### Pre-installed packages

@@ -267,7 +267,7 @@ For proper separation of concerns, it is recommended to use the constructor's `c
### Examples

<!-- CORTEX_VERSION_MINOR -->
-You can find an example of a BatchAPI using an ONNXPredictor in [examples/batch/onnx](https://github.com/cortexlabs/cortex/tree/master/examples/batch/onnx)
+You can find an example of a BatchAPI using an ONNXPredictor in [examples/batch/onnx](https://github.com/cortexlabs/cortex/tree/master/examples/batch/onnx).

### Pre-installed packages

4 changes: 2 additions & 2 deletions examples/batch/image-classifier/README.md
@@ -415,7 +415,7 @@ spinning up workers...

The status of your job, which you can get from `cortex get <BATCH_API_NAME> <JOB_ID>`, should change from `running` to `succeeded` once the job has completed. If it changes to a different status, you may be able to find the stacktrace using `cortex logs <BATCH_API_NAME> <JOB_ID>`. If your job has completed successfully, you can view the results of the image classification in the S3 directory you specified in the job submission.

-Using AWS CLI:
+Using the AWS CLI:

```bash
$ aws s3 ls $CORTEX_DEST_S3_DIR/<JOB_ID>/
@@ -524,7 +524,7 @@ spinning up workers...

The status of your job, which you can get from `cortex get <BATCH_API_NAME> <JOB_ID>`, should change from `running` to `succeeded` once the job has completed. If it changes to a different status, you may be able to find the stacktrace using `cortex logs <BATCH_API_NAME> <JOB_ID>`. If your job has completed successfully, you can view the results of the image classification in the S3 directory you specified in the job submission.

-Using AWS CLI:
+Using the AWS CLI:

```bash
$ aws s3 ls $CORTEX_DEST_S3_DIR/<JOB_ID>/