# Downloading Datasets

This guide shows how to download datasets and their files from SciCat.
The first step is to create a client.

Normally, one would construct a client as shown below.
In this example, we use ESS's SciCat.
If you want to use a different instance, you need to find out its URL.
Note that this is *not* the URL that you open in a browser; the API URL typically ends in a suffix like `"/api/v3"`.

The token needed to authenticate can be found in the web interface by logging in and opening the settings.

<div class="alert alert-warning">
    <b>WARNING:</b>

Do *not* hard code secrets like tokens in notebooks or scripts!
Scitacean currently does not support any way to access them other than passing them as an argument.
So you will have to find your own solution for now.

</div>
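
One possible way to keep the token out of notebooks and scripts is to read it from an environment variable.
The following is a minimal sketch using only the standard library; the helper `load_token` and the variable name `SCICAT_TOKEN` are assumptions made for illustration and are not part of Scitacean.

```python
import os

# Hypothetical variable name; choose whatever fits your setup.
TOKEN_ENV_VAR = "SCICAT_TOKEN"


def load_token(env=None, name=TOKEN_ENV_VAR):
    """Return the token stored in an environment variable.

    `env` defaults to the process environment; passing a plain dict
    makes the function easy to try out without touching os.environ.
    """
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; "
            "export it in your shell before starting Python."
        )
    return token
```

The result of `load_token()` can then be passed as the `token` argument when constructing the client, so the secret never appears in the source file.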

While the client itself is responsible for talking to SciCat, a `file_transfer` object is required to download data files.
Currently, only `ESSTestFileTransfer` is implemented.
It downloads and uploads files via SSH.
This will almost certainly change in the future!

```python
from scitacean import Client
from scitacean.transfer.ess import ESSTestFileTransfer
client = Client.from_token(url="https://scicat.ess.eu/api/v3",
                           token=...,
                           file_transfer=ESSTestFileTransfer())
```

For the purposes of this guide, we don't want to connect to a real SciCat server, in order to avoid the complications associated with that.
So we set up a fake client that only pretends to connect to a SciCat and file server.
Everything else in this guide works in the same way with a real client.

```python
from scitacean.testing.docs import setup_fake_client
client = setup_fake_client()
```

We need the ID (`pid`) of a dataset in order to download it.
The fake client provides a dataset with ID `"20.500.12269/72fe3ff6-105b-4c7f-b9d0-073b67c90ec3"`.
We can download it using

```python
dset = client.get_dataset("20.500.12269/72fe3ff6-105b-4c7f-b9d0-073b67c90ec3")
```

Datasets can easily be inspected in Jupyter notebooks:

```python
dset
```

The data files associated with this dataset can be accessed using

```python
for f in dset.files:
    print(f"{f.remote_access_path=}, {f.local_path}")
```

Note that the `local_path` for both files is `None`.
This indicates that the files have not been downloaded.
Indeed, `client.get_dataset` downloads only the metadata from SciCat, not the files.
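
The `None` check described above can be used to pick out the files that still need to be downloaded.
The sketch below illustrates the idea with a hypothetical stand-in class, `StandInFile`, that only mimics the two attributes printed in this guide; it is not Scitacean's file class, and the paths are made up.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class StandInFile:
    """Hypothetical stand-in that mimics only the attributes used here."""

    remote_access_path: str
    local_path: Optional[Path] = None


def not_yet_downloaded(files):
    """Return the files whose local_path is still None."""
    return [f for f in files if f.local_path is None]


# Made-up example data: one file without a local copy, one with.
files = [
    StandInFile("/remote/a.dat"),
    StandInFile("/remote/b.dat", local_path=Path("data/raw/b.dat")),
]
pending = not_yet_downloaded(files)
print([f.remote_access_path for f in pending])
```

Since the function only looks at the `local_path` attribute, the same filter should presumably work on `dset.files` as well.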

We can download the first file using

```python
path = dset.files[0].provide_locally('data/raw/',
                                     downloader=client.file_transfer,
                                     checksum_algorithm=None)
```

This populates the `local_path`:

```python
dset.files[0].local_path
```

The returned path points to the local file:

```python
path
```

We can use it to read the file:

```python
with path.open('r') as f:
    print(f.read())
```
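
The download call above passes `checksum_algorithm=None`.
Independently of Scitacean, a downloaded file can be compared against a checksum obtained some other way.
The following is a minimal sketch using only the standard library; the helper `file_checksum` and the default algorithm `"sha256"` are assumptions for illustration, and which algorithm (if any) a given SciCat instance records is not covered here.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so that large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with Path(path).open("rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `file_checksum(path)` on the path returned by the download yields a hexadecimal digest that can be compared with a known reference value.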