PR feedback
klwetstone committed Oct 19, 2021
1 parent fa684eb commit 52b1cf2
Showing 8 changed files with 98 additions and 100 deletions.
38 changes: 19 additions & 19 deletions docs/docs/configurations.md
@@ -21,8 +21,8 @@ All video loading arguments can be specified either in a [YAML file](yaml-config
=== "YAML file"
```yaml
video_loader_config:
-  model_input_height: 50
-  model_input_width: 50
+  model_input_height: 240
+  model_input_width: 426
total_frames: 16
# ... other parameters
```
@@ -34,9 +34,9 @@ All video loading arguments can be specified either in a [YAML file](yaml-config

predict_config = PredictConfig(data_directory="example_vids/")
video_loader_config = VideoLoaderConfig(
-    model_input_height=224,
-    model_input_width=224,
-    total_frames=16.
+    model_input_height=240,
+    model_input_width=426,
+    total_frames=16
# ... other parameters
)
predict_model(
@@ -87,11 +87,11 @@ Only load frames that correspond to [scene changes](http://www.ffmpeg.org/ffmpeg

#### `megadetector_lite_config (MegadetectorLiteYoloXConfig, optional)`

-The `megadetector_lite_config` is used to specify any parameters that should be passed to the [MegadetectorLiteYoloX model](models.md#megadetectorliteyolox) for frame selection. For all possible options, see the MegadetectorLiteYoloXConfig<!-- TODO: add github link><!-->. If `megadetector_lite_config` is `None` (the default), the MegadetectorLiteYoloX model will not be used to select frames.
+The `megadetector_lite_config` is used to specify any parameters that should be passed to the [Megadetector model](models.md#megadetectorliteyolox) for frame selection. For all possible options, see the MegadetectorLiteYoloXConfig<!-- TODO: add github link><!-->. If `megadetector_lite_config` is `None` (the default), the Megadetector model will not be used to select frames.
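
As a rough sketch of the Python usage (the `confidence` and `n_frames` field names here are assumptions; confirm them against the `MegadetectorLiteYoloXConfig` class):

```python
from zamba.data.video import VideoLoaderConfig  # import path per the zamba docs

video_loader_config = VideoLoaderConfig(
    total_frames=16,
    megadetector_lite_config={
        "confidence": 0.25,  # assumed field: minimum detection score to keep a frame
        "n_frames": 16,      # assumed field: number of frames to select per video
    },
)
```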

#### `frame_selection_height (int, optional), frame_selection_width (int, optional)`

-Resize the video to this height and width in pixels, prior to frame selection. If None, the full size video will be used for frame selection. Using full size videos (setting to `None`) is recommended for MegadetectorLite, especially if your species of interest are smaller. Default to `None`
+Resize the video to this height and width in pixels, prior to frame selection. If None, the full size video will be used for frame selection. Using full size videos (setting to `None`) is recommended for MegadetectorLite, especially if your species of interest are smaller. Defaults to `None`
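
For instance, a sketch that shrinks frames before frame selection to speed things up (the 270x480 values are illustrative, not defaults):

```python
from zamba.data.video import VideoLoaderConfig

video_loader_config = VideoLoaderConfig(
    frame_selection_height=270,  # frames are resized to 270x480 before frame selection
    frame_selection_width=480,
    total_frames=16,
)
```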

#### `total_frames (int, optional)`

@@ -132,7 +132,7 @@ Cache directory where preprocessed videos will be saved upon first load. Alterna

#### `cleanup_cache (bool, optional)`

-Whether to delete the cache dir after training or predicting ends. Defaults to `False`
+Whether to delete the cache directory after training or predicting ends. Defaults to `False`
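
A minimal sketch, assuming both caching fields live on the video loader config:

```python
from zamba.data.video import VideoLoaderConfig

video_loader_config = VideoLoaderConfig(
    cache_dir="video_cache/",  # preprocessed videos are written here on first load
    cleanup_cache=True,        # remove video_cache/ once training or prediction ends
)
```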

<a id='prediction-arguments'></a>

@@ -146,9 +146,9 @@ All possible model inference parameters are defined by the `PredictConfig` class

class PredictConfig(ZambaBaseModel)
| PredictConfig(*,
-    data_directory: pydantic.types.DirectoryPath = # your current working directory ,
-    filepaths: pydantic.types.FilePath = None,
-    checkpoint: pydantic.types.FilePath = None,
+    data_directory: DirectoryPath = # your current working directory ,
+    filepaths: FilePath = None,
+    checkpoint: FilePath = None,
model_name: zamba.models.config.ModelEnum = <ModelEnum.time_distributed: 'time_distributed'>,
gpus: int = 0,
num_workers: int = 3,
@@ -188,7 +188,7 @@ The number of GPUs to use during inference. By default, all of the available GPU

#### `num_workers (int, optional)`

-The number of CPUs to use during training. The maximum value for `num_workers` is the number of CPUs available in the system. If you are using MegadetectorLiteYoloX, it is not recommended to use the total number of CPUs available. Defaults to `3`
+The number of CPUs to use during inference. The maximum value for `num_workers` is the number of CPUs available on the machine. If you are using MegadetectorLite for frame selection, it is not recommended to use the total number of CPUs available. Defaults to `3`

#### `batch_size (int, optional)`

@@ -210,7 +210,7 @@ By default no threshold is passed, `proba_threshold=None`. This will return a pr
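
For example, a sketch that converts the probability output to binary 0/1 predictions with a 50% cutoff:

```python
from zamba.models.config import PredictConfig  # import path is an assumption

predict_config = PredictConfig(
    data_directory="example_vids/",
    proba_threshold=0.5,  # probabilities above 0.5 become 1; all others become 0
)
```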

#### `output_class_names (bool, optional)`

-Setting this option to `True` yields the most concise output `zamba` is capable of. The highest species probability in a video is taken to be the _only_ species in that video, and the output returned is simply the video name and the name of the s pecies with the highest class probability, or `blank` if the most likely classification is no animal. Defaults to `False`
+Setting this option to `True` yields the most concise output `zamba` is capable of. The highest species probability in a video is taken to be the _only_ species in that video, and the output returned is simply the video name and the name of the species with the highest class probability, or `blank` if the most likely classification is no animal. Defaults to `False`
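
A sketch of the most concise setup, returning one predicted label per video:

```python
from zamba.models.config import PredictConfig

predict_config = PredictConfig(
    data_directory="example_vids/",
    output_class_names=True,  # return only the top class name (or "blank") per video
)
```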

#### `weight_download_region [us|eu|asia]`

@@ -223,7 +223,7 @@ By default, before kicking off inference `zamba` will iterate through all of the

#### `model_cache_dir (Path, optional)`

-Cache directory where downloaded model weights will be saved. If None and the MODEL_CACHE_DIR environment variable is not set, will use your default cache directory, which is often an automatic temp directory at `~/.cache/zamba`. Defaults to `None`.
+Cache directory where downloaded model weights will be saved. If None and the MODEL_CACHE_DIR environment variable is not set, will use your default cache directory (e.g. `~/.cache`). Defaults to `None`
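
For example, a sketch that stores weights in a project-local folder instead of the default cache:

```python
from pathlib import Path
from zamba.models.config import PredictConfig

predict_config = PredictConfig(
    data_directory="example_vids/",
    model_cache_dir=Path("zamba_weights/"),  # weights download here once, then get reused
)
```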

<a id='training-arguments'></a>

@@ -237,9 +237,9 @@ All possible model training parameters are defined by the `TrainConfig` class<!-

class TrainConfig(ZambaBaseModel)
| TrainConfig(*,
-    labels: Union[pydantic.types.FilePath, pandas.core.frame.DataFrame],
-    data_directory: pydantic.types.DirectoryPath = # your current working directory ,
-    checkpoint: pydantic.types.FilePath = None,
+    labels: Union[FilePath, pandas.DataFrame],
+    data_directory: DirectoryPath = # your current working directory ,
+    checkpoint: FilePath = None,
scheduler_config: Union[str, zamba.models.config.SchedulerConfig, NoneType] = 'default',
model_name: zamba.models.config.ModelEnum = <ModelEnum.time_distributed: 'time_distributed'>,
dry_run: Union[bool, int] = False,
@@ -303,15 +303,15 @@ Whether to run a [learning rate finder algorithm](https://arxiv.org/abs/1506.011

#### `backbone_finetune_config (zamba.models.config.BackboneFinetuneConfig, optional)`

-Set parameters to finetune a backbone model to align with the current learning rate. Derived from Pytorch Lightning's built-in `BackboneFinetuning`, but with the ability to freeze batch norm layers during the freeze phase. See `zamba.pytorch.finetuning` for details.<!-- TODO: add github link><!--> The default values are specified in the `BackboneFinetuneConfig` <!-- TODO: add link to github source code><!--> class: `BackboneFinetuneConfig(unfreeze_backbone_at_epoch=15, backbone_initial_ratio_lr=0.01, multiplier=1, pre_train_bn=False, train_bn=False, verbose=True)`
+Set parameters to finetune a backbone model to align with the current learning rate. Derived from Pytorch Lightning's built-in [`BackboneFinetuning`](https://pytorch-lightning.readthedocs.io/en/latest/_modules/pytorch_lightning/callbacks/finetuning.html). The default values are specified in the `BackboneFinetuneConfig` <!-- TODO: add link to github source code><!--> class: `BackboneFinetuneConfig(unfreeze_backbone_at_epoch=15, backbone_initial_ratio_lr=0.01, multiplier=1, pre_train_bn=False, train_bn=False, verbose=True)`
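
As a sketch, those defaults could be overridden like so (assuming `BackboneFinetuneConfig` is importable from `zamba.models.config`, as its dotted name above suggests; `labels.csv` is a hypothetical file):

```python
from zamba.models.config import BackboneFinetuneConfig, TrainConfig

train_config = TrainConfig(
    labels="labels.csv",  # hypothetical labels file
    backbone_finetune_config=BackboneFinetuneConfig(
        unfreeze_backbone_at_epoch=10,  # unfreeze earlier than the default of 15
        backbone_initial_ratio_lr=0.01,
        train_bn=False,  # keep batch norm layers frozen during finetuning
    ),
)
```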

#### `gpus (int, optional)`

The number of GPUs to use during training. By default, all of the available GPUs found on the machine will be used. An error will be raised if the number of GPUs specified is more than the number that are available on the machine.

#### `num_workers (int, optional)`

-The number of CPUs to use during training. The maximum value for `num_workers` is the number of CPUs available in the system. If you are using MegadetectorLiteYoloX, it is not recommended to use the total number of CPUs available. Defaults to `3`
+The number of CPUs to use during training. The maximum value for `num_workers` is the number of CPUs available in the system. If you are using the Megadetector, it is not recommended to use the total number of CPUs available. Defaults to `3`

#### `max_epochs (int, optional)`

4 changes: 2 additions & 2 deletions docs/docs/debugging.md
@@ -36,7 +36,7 @@ The dry run will also catch any GPU memory errors. If you hit a GPU memory error

#### Decreasing video size

-Resize video frames to be smaller before they are passed to the model. The default for all three models is 224x224 pixels. `model_input_height` and `model_input_width` cannot be passed directly to the command line, so if you are using the CLI these must be specified in a [YAML file](yaml-config.md).
+Resize video frames to be smaller before they are passed to the model. The default for all three models is 240x426 pixels. `model_input_height` and `model_input_width` cannot be passed directly to the command line, so if you are using the CLI these must be specified in a [YAML file](yaml-config.md).

=== "YAML file"
```yaml
@@ -54,7 +54,7 @@ Resize video frames to be smaller before they are passed to the model. The defau

#### Reducing `num_workers`

-Reduce the number of workers (subprocesses) used for data loading. By default `num_workers` will be set to 3. The minimum value is 0, which means that the data will be loaded in the main process, and the maximum is one less than the number of CPUs in the system. `num_workers` cannot be passed directly to the command line, so if you are using the CLI it must be specified in a [YAML file](yaml-config.md).
+Reduce the number of workers (subprocesses) used for data loading. By default `num_workers` will be set to 3. The minimum value is 0, which means that the data will be loaded in the main process, and the maximum is one less than the number of CPUs in the system.

=== "CLI"
```console
29 changes: 17 additions & 12 deletions docs/docs/extra-options.md
@@ -31,9 +31,9 @@ The options for `weight_download_region` are `us`, `eu`, and `asia`. Once a mode

## Video size

-When `zamba` loads videos prior to either inference or training, it resizes all of the video frames before feeding them into a model. Higher resolution videos will lead to more detailed accuracy in prediction, but will use more memory and take longer to either predict on or train from. The default video loading configuration for all three pretrained models resizes images to 224x224 pixels.
+When `zamba` loads videos prior to either inference or training, it resizes all of the video frames before feeding them into a model. Higher resolution videos yield more accurate predictions, but use more memory and take longer to predict on or train from. The default video loading configuration for all three pretrained models resizes images to 240x426 pixels.

-Say that you have a large number of videos, and you are more concerned with detecting blank v. non-blank videos than with identifying different species. In this case, you may not need a very high resolution and iterating through all of your videos with a high resolution would take a very long time. To resize all images to 50x50 pixels instead of the default 224x224:
+Say that you have a large number of videos, and you are more concerned with detecting blank vs. non-blank videos than with identifying different species. In this case, you may not need a very high resolution, and iterating through all of your videos at high resolution would take a very long time. To resize all images to 50x50 pixels instead of the default 240x426:

=== "YAML file"
```yaml
@@ -107,9 +107,9 @@ A simple option is to sample frames that are evenly distributed throughout a vid
)
```

-### MegadetectorLiteYoloX
+### Megadetector

-You can use a pretrained object detection model called [MegadetectorLiteYoloX](models.md#megadetectorliteyolox) to select only the frames that are mostly likely to contain an animal. This is the default strategy for all three pretrained models. The parameter `megadetector_lite_config` is used to specify any arguments that should be passed to the megadetector model. If `megadetector_lite_config` is None, the MegadetectorLiteYoloX model will not be used.
+You can use a pretrained object detection model called [Megadetector](models.md#megadetectorliteyolox) to select only the frames that are most likely to contain an animal. This is the default strategy for all three pretrained models. The parameter `megadetector_lite_config` is used to specify any arguments that should be passed to the Megadetector model. If `megadetector_lite_config` is None, the Megadetector model will not be used.

For example, to take the 16 frames with the highest probability of detection:

@@ -125,8 +125,8 @@ For example, to take the 16 frames with the highest probability of detection:
In Python, these can be specified in the `megadetector_lite_config` argument passed to `VideoLoaderConfig`:
```python hl_lines="6 7 8 9 10"
video_loader_config = VideoLoaderConfig(
-    model_input_height=224,
-    model_input_width=224,
+    model_input_height=240,
+    model_input_width=426,
crop_bottom_pixels=50,
ensure_total_frames=True,
megadetector_lite_config={
Expand All @@ -142,30 +142,35 @@ For example, to take the 16 frames with the highest probability of detection:
train_model(video_loader_config=video_loader_config, train_config=train_config)
```

-Using `model_input_height` and `model_input_width` resizes images *after* any frame selection is done. If you are using the MegaDetector, the frames that are input into MegadetectorLiteYoloX will still be full size. Using `frame_selection_height` and `frame_selection_width` resizes images *before* they are input to MegadetectorLiteYoloX. Inputting full size images is recommended, especially if your species of interest are on the smaller side, but resizing before using MegadetectorLiteYoloX will speed up training. The above feeds full-size images to MegadetectorLiteYoloX, and then resizes images before running them through the neural network.
+If you are using the [Megadetector](models.md#megadetectorliteyolox) for frame selection, there are two ways that you can specify frame resizing:

-To see all of the options that can be passed to `MegadetectorLiteYoloX`, see the `MegadetectorLiteYoloXConfig` class. <!-- TODO: add link to github code><!-->
+- `frame_selection_width` and `frame_selection_height` resize images *before* they are input to the frame selection method. If both are `None`, the full size images will be used during frame selection. Using full size images for selection is recommended for better detection of smaller species, but will slow down training and inference.
+- `model_input_height` and `model_input_width` resize images *after* frame selection. These specify the image size that is passed to the actual model.

+You can specify both of the above at once, just one, or neither. The example code feeds full-size images to the Megadetector, and then resizes images before running them through the neural network.

+To see all of the options that can be passed to the Megadetector, see the `MegadetectorLiteYoloXConfig` class. <!-- TODO: add link to github code><!-->
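
A sketch combining both stages, resizing before selection for speed and again for the model (the values are illustrative, and the `confidence`/`n_frames` field names are assumptions):

```python
from zamba.data.video import VideoLoaderConfig

video_loader_config = VideoLoaderConfig(
    frame_selection_height=270,  # downsized frames go to the Megadetector
    frame_selection_width=480,
    model_input_height=240,      # selected frames are resized again for the model
    model_input_width=426,
    total_frames=16,
    megadetector_lite_config={"confidence": 0.25, "n_frames": 16},  # assumed field names
)
```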

## Speed up training

-Training will run faster if you increase `num_workers` or increase `batch_size`. `num_workers` is the number of subprocesses to use for data loading. The minimum is 0, meaning the data will be loaded in the main process, and the maximum is one less than the number of CPUs in your system. By default `num_workers` is set to 3 and `batch_size` is set to 8. Increasing either of these will use more GPU memory, and could raise an error if the memory required is more than your machine has available.
+Training will run faster if you increase `num_workers` and/or increase `batch_size`. `num_workers` is the number of subprocesses to use for data loading. The minimum is 0, meaning the data will be loaded in the main process, and the maximum is one less than the number of CPUs in your system. By default `num_workers` is set to 3 and `batch_size` is set to 2. Increasing either of these will use more GPU memory, and could raise an error if the memory required is more than your machine has available.

-Both can be specified in either [`predict_config`](configurations.md#prediction-arguments) or [`train_config`](configurations.md#training-arguments). For example, to increase `num_workers` to 5 and `batch_size` to 10 for inference:
+Both can be specified in either [`predict_config`](configurations.md#prediction-arguments) or [`train_config`](configurations.md#training-arguments). For example, to increase `num_workers` to 5 and `batch_size` to 4 for inference:

=== "YAML file"
```yaml
predict_config:
data_directory: example_vids/
num_workers: 5
-  batch_size: 10
+  batch_size: 4
# ... other parameters
```
=== "Python"
```python
predict_config = PredictConfig(
data_directory="example_vids/",
num_workers=5,
-    batch_size=10,
+    batch_size=4,
# ... other parameters
)
```
