# Datasets command


The `datasets` command is one of the group resource commands, so it must be used with a current group.


## Setup PrimeHub Python SDK


In [None]:
from primehub import PrimeHub, PrimeHubConfig
ph = PrimeHub(PrimeHubConfig())

if ph.is_ready():
    print("PrimeHub Python SDK setup successfully")
else:
    print("PrimeHub Python SDK couldn't get the group information; follow 00-getting-started.ipynb to complete the setup")

## Help documentation

In [None]:
help(ph.datasets)

## Dataset management

---


```
primehub datasets
Usage:
  primehub datasets <command>

Get a dataset or list datasets

Available Commands:
  create               Create a dataset
  delete               Delete a dataset by id
  get                  Get a dataset by name
  list                 List datasets
  update               Update the dataset
  upload_secret        Regenerate the secret of the upload server
```

---

All mutating actions require the `Admin` role:

* create
* delete
* update
* upload_secret (the method name is `regenerate_upload_server_secret`)

## Dataset configuration

The `create` and `update` actions require a configuration. Here is an example that creates a `pv-dataset`:

```json
{
  "name": "pv-dataset",
  "displayName": "the dataset created by SDK",
  "description": "It is a PV dataset",
  "type": "pv",
  "global": false,
  "groups": {
    "connect": [
      {
        "id": "a7a283b5-c0e2-4b79-a78c-39c630324762",
        "writable": true
      },
      {
        "id": "a962305b-c884-4413-9358-ef56373b287c",
        "writable": false
      }
    ]
  },
  "pvProvisioning": "auto",
  "volumeSize": 1
}
```
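When working from the notebook, the same configuration can be built as a Python dict. The sketch below is a minimal illustration: `pv_dataset_config` is a hypothetical helper, not part of the SDK, and it assumes the SDK's `create` method accepts the same structure as the JSON payload above.

```python
import json


def pv_dataset_config(name, volume_size, connect=None, display_name=None,
                      description=None, global_=False, provisioning="auto"):
    """Hypothetical helper: build a `pv` dataset configuration dict.

    `connect` is a list of (group_id, writable) tuples.
    """
    config = {
        "name": name,
        "type": "pv",
        "global": global_,
        "pvProvisioning": provisioning,  # 'auto' or 'manual'
        "volumeSize": volume_size,       # unit: GB
    }
    if display_name is not None:
        config["displayName"] = display_name
    if description is not None:
        config["description"] = description
    if connect:
        # Connected groups go under groups.connect, one dict per group.
        config["groups"] = {
            "connect": [{"id": gid, "writable": w} for gid, w in connect]
        }
    return config


config = pv_dataset_config(
    "pv-dataset",
    volume_size=1,
    display_name="the dataset created by SDK",
    description="It is a PV dataset",
    connect=[
        ("a7a283b5-c0e2-4b79-a78c-39c630324762", True),
        ("a962305b-c884-4413-9358-ef56373b287c", False),
    ],
)

# This JSON text is what `primehub datasets create` would read from stdin.
print(json.dumps(config, indent=2))
# Assumption (not verified here): ph.datasets.create(config) accepts this dict.
```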

There are 5 dataset types: `['pv', 'nfs', 'hostPath', 'git', 'env']`. Check the field reference to write a proper configuration for your own dataset.



## Fields for creating or updating

| field | required | type | description |
| --- | --- | --- | --- |
| name | required | string | must be a valid Kubernetes resource name |
| displayName | optional | string | display name for this dataset |
| description | optional | string | |
| global | optional | boolean | when a dataset is global, every group can see it |
| type | required | string | one of ['pv', 'nfs', 'hostPath', 'git', 'env'] |
| url | conditional | string | **MUST** be used with the `git` type |
| pvProvisioning | conditional | string | one of ['auto', 'manual']; **MUST** be used with the `pv` type. This field is only used by the `CREATE` action |
| nfsServer | conditional | string | **MUST** be used with the `nfs` type |
| nfsPath | conditional | string | **MUST** be used with the `nfs` type |
| hostPath | conditional | string | **MUST** be used with the `hostPath` type |
| variables | optional | dict | **MAY** be used with the `env` type. Key-value pairs whose values must all be strings, for example `{"key1":"value1","key2":"value2"}` |
| groups | optional | dict | connected groups, given as a `connect` list; see the `connect` examples |
| secret | optional | dict | **MAY** be used with the `git` type; binds a `secret` to the `git` dataset |
| volumeSize | conditional | integer | **MUST** be used with the `pv` type. The unit is `GB` |
| enableUploadServer | optional | boolean | only works with the writable types ['pv', 'nfs', 'hostPath'] |

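The rules in the table can be checked locally before sending a `create` request. The following is a hypothetical pre-flight check (`check_create_config` is not an SDK function); it only encodes the constraints listed in the table, and the server remains the authority on what it accepts.

```python
DATASET_TYPES = ("pv", "nfs", "hostPath", "git", "env")

# Fields the table marks **MUST** for each type when creating a dataset.
MUST_FIELDS = {
    "pv": ("pvProvisioning", "volumeSize"),
    "nfs": ("nfsServer", "nfsPath"),
    "hostPath": ("hostPath",),
    "git": ("url",),
    "env": (),
}

# Types that support enableUploadServer.
UPLOAD_SERVER_TYPES = ("pv", "nfs", "hostPath")


def check_create_config(config):
    """Hypothetical check: return a list of problems (empty if none found)."""
    problems = []
    for field in ("name", "type"):
        if field not in config:
            problems.append(f"missing required field: {field}")

    dtype = config.get("type")
    if dtype is not None and dtype not in DATASET_TYPES:
        problems.append(f"unknown type: {dtype}")
        return problems  # type-specific rules cannot be applied

    for field in MUST_FIELDS.get(dtype, ()):
        if field not in config:
            problems.append(f"type {dtype} needs field: {field}")

    if config.get("pvProvisioning") not in (None, "auto", "manual"):
        problems.append("pvProvisioning must be 'auto' or 'manual'")

    # All env variable values have to be strings.
    for key, value in config.get("variables", {}).items():
        if not isinstance(value, str):
            problems.append(f"variable {key} must be a string")

    if config.get("enableUploadServer") and dtype not in UPLOAD_SERVER_TYPES:
        problems.append("enableUploadServer needs a pv, nfs or hostPath type")
    return problems


print(check_create_config({"name": "n", "type": "nfs", "nfsServer": "x"}))
```

The check does not verify that `name` is a valid Kubernetes resource name, and it treats `pvProvisioning` as required for `pv` because the table marks it **MUST** for creation.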
> A simple rule for `UPDATE`: do not include the required fields in the payload.

For example, here is a configuration for creating an `env` dataset:

```bash
primehub datasets create <<EOF
{
  "name": "env-dataset",
  "description": "",
  "type": "env",
  "variables": {
    "ENV": "prod",
    "LUCKY_NUMBER": "7"
  }
}
EOF
```
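Because `variables` only accepts string values, a Python caller has to quote numbers the way `LUCKY_NUMBER` is quoted above. A minimal sketch with a hypothetical `env_dataset_config` helper (not part of the SDK) that converts every value with `str()`:

```python
import json


def env_dataset_config(name, variables, description=""):
    """Hypothetical helper: build an `env` dataset configuration.

    Every value is converted with str(), since the `variables` field
    only accepts string values.
    """
    return {
        "name": name,
        "description": description,
        "type": "env",
        "variables": {key: str(value) for key, value in variables.items()},
    }


config = env_dataset_config("env-dataset", {"ENV": "prod", "LUCKY_NUMBER": 7})
# Produces the same JSON body as the heredoc above.
print(json.dumps(config, indent=2))
```

Note that `str(True)` yields `"True"`, so convert booleans explicitly if the workload expects a different spelling such as `"true"`.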

After removing the required `name` and `type` fields, the same configuration can be used for updating:

```bash
primehub datasets update env-dataset <<EOF
{
  "description": "make changes to the description",
  "variables": {
    "ENV": "prod",
    "LUCKY_NUMBER": "8"
  }
}
EOF
```
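The rule for `UPDATE` can be expressed as a small transformation on a `create` configuration. A sketch, assuming a hypothetical `to_update_payload` helper (not part of the SDK): it drops `name` and `type` (the required fields) and `pvProvisioning` (only used by `CREATE`), then applies the changes from the update example above.

```python
import copy

# Required fields, plus pvProvisioning, which only the CREATE action uses.
NOT_FOR_UPDATE = ("name", "type", "pvProvisioning")


def to_update_payload(create_config):
    """Hypothetical helper: derive an `update` payload from a `create` config."""
    payload = copy.deepcopy(create_config)  # leave the original untouched
    for field in NOT_FOR_UPDATE:
        payload.pop(field, None)
    return payload


create_config = {
    "name": "env-dataset",
    "description": "",
    "type": "env",
    "variables": {"ENV": "prod", "LUCKY_NUMBER": "7"},
}

update = to_update_payload(create_config)
update["description"] = "make changes to the description"
update["variables"]["LUCKY_NUMBER"] = "8"
print(update)
```

`copy.deepcopy` keeps the nested `variables` dict of the original configuration unchanged. The dataset name is not part of the payload; as in `primehub datasets update env-dataset`, it is given separately.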

When updating, include only the fields you want to change:

```bash
primehub datasets update env-dataset <<EOF
{
  "groups": {
    "connect": [
      {
        "id": "a7a283b5-c0e2-4b79-a78c-39c630324762",
        "writable": false
      }
    ]
  }
}
EOF
```
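To send only what changes, one option is to compare the current and desired configurations field by field. `changed_fields` is a hypothetical helper, and it assumes both dicts use the create/update format shown in this document:

```python
def changed_fields(current, desired, skip=("name", "type", "pvProvisioning")):
    """Hypothetical helper: keep only top-level fields whose values differ.

    Fields in `skip` are never included, following the UPDATE rule.
    """
    return {
        key: value
        for key, value in desired.items()
        if key not in skip and current.get(key) != value
    }


current = {
    "name": "env-dataset",
    "type": "env",
    "description": "",
    "variables": {"ENV": "prod"},
}
desired = dict(
    current,
    groups={"connect": [{"id": "a7a283b5-c0e2-4b79-a78c-39c630324762",
                         "writable": False}]},
)

payload = changed_fields(current, desired)
print(payload)  # only the `groups` field differs
```

The comparison is shallow: if anything inside `variables` or `groups` differs, the whole top-level field is included in the payload.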

## Examples

You can find [more examples on our GitHub](https://github.com/InfuseAI/primehub-python-sdk/blob/main/docs/CLI/datasets.md).

In [None]:
# List datasets
ph.datasets.list()

In [None]:
# Get a dataset
ph.datasets.get('primehub')