# Download the LSFB dataset

Both LSFB datasets, `cont` and `isol`, can be downloaded over *HTTP/HTTPS*. The `lsfb-dataset` package provides a `Downloader` class that handles the download according to your needs.
For example, the downloader can filter out files that you don't need and can resume an interrupted download where it stopped.

| Name of the dataset | ID   | Poses | Videos (GB) |
|---------------------|------|-------|-------------|
| LSFB ISOL           | isol | 10GB  | 25GB        |
| LSFB CONT           | cont | 31GB  | **~400GB**  |

As the table shows, the datasets can be large, especially the videos of the LSFB CONT dataset.
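Given these sizes, it can help to check the free disk space before starting a download. The following is a minimal sketch that uses only the Python standard library. The sizes are copied from the table above and are approximate; the helper names `estimated_size_gb` and `has_enough_space` are hypothetical and are not part of the `lsfb-dataset` package.

```python
import shutil

# Approximate sizes in GB, copied from the table above.
DATASET_SIZES_GB = {
    "isol": {"poses": 10, "videos": 25},
    "cont": {"poses": 31, "videos": 400},
}


def estimated_size_gb(dataset: str, include_videos: bool = False) -> int:
    """Return a rough estimate of the download size in GB (hypothetical helper)."""
    sizes = DATASET_SIZES_GB[dataset]
    return sizes["poses"] + (sizes["videos"] if include_videos else 0)


def has_enough_space(path: str, dataset: str, include_videos: bool = False) -> bool:
    """Compare the free space on the disk holding `path` with the estimate."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= estimated_size_gb(dataset, include_videos)
```

For instance, `has_enough_space(".", "cont", include_videos=True)` compares the free space of the current drive against roughly 431 GB.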

**!!! READ THE EXAMPLES BEFORE LAUNCHING THE CODE !!!**

## Download LSFB ISOL Landmarks

By default, the downloader will fetch the landmarks of the entirety of the specified dataset. The only mandatory parameters are the dataset name and the destination folder where the files are going to be downloaded.

```python
from lsfb_dataset import Downloader

downloader = Downloader(dataset='isol', destination="./destination/folder")
downloader.download()
```


## Download LSFB CONT Landmarks and Videos

Here's another example where we download both the videos and the landmarks of the continuous sign language discussions in the LSFB CONT dataset.
We have to specify that we want to include the videos; otherwise, only the poses (landmarks) are downloaded.

Be aware that the videos are heavy. For example, this code will download more than 400 videos and can use hundreds of GB!

```python
from lsfb_dataset import Downloader

downloader = Downloader(
    dataset='cont',
    destination="./destination/folder",
    include_cleaned_poses=True,
    include_videos=True,
)
downloader.download()
```

## Download a subset of LSFB ISOL (or CONT)

By default, the full dataset is downloaded. To download only a subset of the dataset, set the `splits` parameter to a list containing at least one split name.
We recommend trying the `mini_sample` split first, as it contains a minimal number of instances. The other splits are `fold_0` to `fold_4`, `train` and `test`.
The default split is `all`.

```python
from lsfb_dataset import Downloader

downloader = Downloader(
    dataset='isol',
    destination="./destination/folder",
    splits=['mini_sample'],
)
downloader.download()
```
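Because a typo in a split name is easy to make, the names can be checked before they are passed to the downloader. This is only a sketch: the set of names comes from the text above, and `check_splits` is a hypothetical helper, not a function of the `lsfb-dataset` package.

```python
# Split names listed in this guide: all, mini_sample, train, test, fold_0 to fold_4.
KNOWN_SPLITS = {"all", "mini_sample", "train", "test"} | {f"fold_{i}" for i in range(5)}


def check_splits(splits):
    """Return the splits as a list, or raise ValueError if one is empty or unknown."""
    if not splits:
        raise ValueError("At least one split name is required.")
    unknown = sorted(set(splits) - KNOWN_SPLITS)
    if unknown:
        raise ValueError(f"Unknown split(s): {unknown}")
    return list(splits)
```

The returned list could then be passed as the `splits` argument, for example `splits=check_splits(['mini_sample'])`.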

## Overwrite existing files

By default, the downloader skips files that already exist.
If you want to re-download the dataset, you can disable this behavior.

```python
from lsfb_dataset import Downloader

downloader = Downloader(
    dataset='isol',
    destination="./destination/folder",
    splits=['mini_sample'],
    skip_existing_files=False,
)
downloader.download()
```
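To illustrate what skipping existing files means in practice, here is a conceptual sketch. It is not the implementation used by `lsfb-dataset`: the function `files_to_fetch` and its arguments are hypothetical and only show the general idea of filtering out files that are already present in the destination folder.

```python
from pathlib import Path


def files_to_fetch(destination, relative_paths, skip_existing_files=True):
    """Return the files that still need to be downloaded (conceptual sketch)."""
    if not skip_existing_files:
        # Overwrite mode: every file is fetched again.
        return list(relative_paths)
    root = Path(destination)
    # Skip mode: keep only the files that are not yet on disk.
    return [p for p in relative_paths if not (root / p).exists()]
```

With skipping enabled, rerunning a download after an interruption only fetches the missing files, which is what makes resuming cheap.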

## A more complex example

Here's a more complex example. It downloads:
* only the instances of the splits `fold_0` and `fold_2`;
* only the instances of signers 20 to 39;
* only the raw poses (without interpolation of missing landmarks or smoothing);
* only the landmarks of the `pose` (body) and both hands;
* all matching files, without skipping those that already exist.

```python
from lsfb_dataset import Downloader

downloader = Downloader(
    dataset='isol',
    destination="./destination/folder",
    splits=['fold_0', 'fold_2'],
    signers=list(range(20, 40)),
    include_cleaned_poses=False,
    include_raw_poses=True,
    include_videos=False,
    landmarks=['pose', 'left_hand', 'right_hand'],
    skip_existing_files=False,
)
downloader.download()
```
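Note that `range(20, 40)` excludes its stop value, which is why it selects signers 20 to 39. The short sketch below checks this and collects the same options in a plain dictionary. The variable `options` is only illustrative; the keys mirror the parameters used in the example above.

```python
# range() stops before 40, so the list covers signers 20 to 39 inclusive.
signers = list(range(20, 40))
print(signers[0], signers[-1], len(signers))  # 20 39 20

# The same options as in the example above, gathered in one dictionary.
options = {
    "dataset": "isol",
    "destination": "./destination/folder",
    "splits": ["fold_0", "fold_2"],
    "signers": signers,
    "include_cleaned_poses": False,
    "include_raw_poses": True,
    "include_videos": False,
    "landmarks": ["pose", "left_hand", "right_hand"],
    "skip_existing_files": False,
}
```

Such a dictionary could be unpacked into the constructor with `Downloader(**options)`, which makes it easy to keep several download configurations side by side.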