## 🐍 Introduction to Reproducibility with Python

*Notebook by [Pedro V Hernandez Serrano](https://github.com/pedrohserrano)*


# 3. Managing Data with pyDataverse
* [3.1. Connection to Dataverse API](#3.1)
* [3.2. Uploading data](#3.2)
* [3.3. Automating uploads](#3.3)

---
## 3.1. Connection to Dataverse API
<a id="3.1"></a>

**[pyDataverse](https://pydataverse.readthedocs.io/en/latest/)** is a Python module for Dataverse that you can use for:

- Accessing the Dataverse APIs
- Manipulating and using Dataverse (meta)data: Dataverses, Datasets, and Datafiles

Install it with:

> `!pip install pyDataverse`

Connecting to Dataverse via pyDataverse involves initializing a `NativeApi` object with the base URL of your Dataverse installation and an API token for authentication.

Two parameters are needed to initialize the connection:
- **`API_TOKEN`** (obtained from DataverseNL)
- **`DATAVERSE`** (the URL of the dataverse you want to work in)

Connection:

``` Python
from pyDataverse.api import NativeApi

API_TOKEN = ""  # paste your API token here
DATAVERSE = ""  # paste the URL of your dataverse here

# Initialize the API connection
BASE_URL = DATAVERSE.split('/dataverse/')[0]
api = NativeApi(BASE_URL, API_TOKEN)
resp = api.get_info_version()
if resp.json()['status'] == 'OK':
    print('Successful connection to DataverseNL API!!')
else:
    print('Failed to connect to DataverseNL API.')

```
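The `split('/dataverse/')` step above strips the collection path from the URL so that only the installation root is passed to `NativeApi`. Below is a minimal sketch of the same idea using `urllib.parse` from the standard library. The URL is a hypothetical placeholder, not a real installation, and `base_url_from` is an illustrative helper name.

```python
from urllib.parse import urlsplit


def base_url_from(dataverse_url: str) -> str:
    """Return the scheme and host of a Dataverse URL (illustrative helper)."""
    parts = urlsplit(dataverse_url)
    return f"{parts.scheme}://{parts.netloc}"


# Hypothetical dataverse URL, used only for illustration
example = "https://dataverse.example.org/dataverse/my-collection"
print(base_url_from(example))
print(example.split("/dataverse/")[0])  # the approach used in this notebook
```

Both lines print the same root URL for this example. The `urlsplit` variant also works when the URL does not contain `/dataverse/`.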

---
**IMPORTANT:**
Secrets such as API tokens should be kept in a separate config file and never pushed to GitHub.

- Create a config file named `config.json`.
- Add the parameters as a JSON object `{ }`, which has the same shape as a Python dictionary.
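As a rough sketch of the two steps above, the snippet below writes a `config.json` template with placeholder values and lists the file in `.gitignore` so it is not committed by accident. The helper name `write_config_template` and the placeholder strings are assumptions for illustration; replace them with your own token and URL.

```python
import json
from pathlib import Path


def write_config_template(folder: str = ".") -> Path:
    """Create config.json with placeholders and git-ignore it (illustrative helper)."""
    folder_path = Path(folder)
    config_path = folder_path / "config.json"
    template = {
        "API_TOKEN": "<your-api-token>",      # placeholder, never commit the real value
        "DATAVERSE": "<your-dataverse-url>",  # placeholder
    }
    if not config_path.exists():  # do not overwrite an existing config
        config_path.write_text(json.dumps(template, indent=2), encoding="utf-8")

    # Add config.json to .gitignore unless it is already listed
    gitignore = folder_path / ".gitignore"
    lines = gitignore.read_text(encoding="utf-8").splitlines() if gitignore.exists() else []
    if "config.json" not in lines:
        lines.append("config.json")
        gitignore.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return config_path
```

Calling `write_config_template()` once in the project folder creates the template; edit the file by hand afterwards.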

```Python
import json
from pyDataverse.api import NativeApi

with open('config.json', 'r') as config_file:
    config = json.load(config_file)

API_TOKEN = config['API_TOKEN']
DATAVERSE = config['DATAVERSE']

# Initialize the API connection
BASE_URL = DATAVERSE.split('/dataverse/')[0]
api = NativeApi(BASE_URL, API_TOKEN)
resp = api.get_info_version()
if resp.json()['status'] == 'OK':
    print('Successful connection to DataverseNL API!!')
else:
    print('Failed to connect to DataverseNL API.')
```

---
## 3.2. Uploading data
<a id="3.2"></a>

After establishing a connection, the next step is to upload data files to a dataset. This involves creating a `Datafile` instance and uploading the file through your `NativeApi` connection.

Two more parameters are needed for the upload:
- **`DOI`** (the persistent identifier of the dataset entry we want to upload data to)
- **`PATH_TO_FILE`** (the path to the data file to upload)
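Before uploading, it can help to check both parameters locally. The sketch below assumes the DOI is written as a persistent identifier with the `doi:` prefix (for example `doi:10.5072/FK2/EXAMPLE`, a made-up value) and that `PATH_TO_FILE` points to an existing file. `check_upload_params` is a hypothetical helper, not part of pyDataverse.

```python
from pathlib import Path


def check_upload_params(doi: str, path_to_file: str) -> list:
    """Return a list of problems found; an empty list means both values look usable."""
    problems = []
    if not doi.startswith("doi:"):
        problems.append(f"DOI should start with 'doi:', got {doi!r}")
    if not Path(path_to_file).is_file():
        problems.append(f"File not found: {path_to_file}")
    return problems


# Made-up values for illustration only
for problem in check_upload_params("10.5072/FK2/EXAMPLE", "no_such_file.csv"):
    print(problem)
```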

```Python
# Reuses `json` and `api` from the previous cell
from pyDataverse.models import Datafile

with open('config.json', 'r') as config_file:
    config = json.load(config_file)

DOI = config['DOI']
PATH_TO_FILE = config['PATH_TO_FILE']

# Upload the file to Dataverse
df = Datafile()
df.set({"pid": DOI, "filename": PATH_TO_FILE})
resp = api.upload_datafile(DOI, PATH_TO_FILE)
if resp.status_code == 200:
    print(f"File {PATH_TO_FILE} -> status: {resp.json()['status']}")
else:
    print(f"Failed to upload {PATH_TO_FILE}. Error: {resp.content}")
```

---
## 3.3. Automating uploads
<a id="3.3"></a>

To automate the process of uploading datasets to Dataverse, we create a standalone Python script that reads its parameters from `config.json`.

Standalone Python:
    
```Python
import json
from pyDataverse.api import NativeApi
from pyDataverse.models import Datafile

with open('config.json', 'r') as config_file:
    config = json.load(config_file)

API_TOKEN = config['API_TOKEN']
DATAVERSE = config['DATAVERSE']
DOI = config['DOI']
PATH_TO_FILE = config['PATH_TO_FILE']

# Initialize the API connection
BASE_URL = DATAVERSE.split('/dataverse/')[0]
api = NativeApi(BASE_URL, API_TOKEN)
resp = api.get_info_version()
if resp.json()['status'] == 'OK':
    print('Successful connection to DataverseNL API!!')
else:
    print('Failed to connect to DataverseNL API.')
    
# Upload the file to Dataverse
df = Datafile()
df.set({"pid": DOI, "filename": PATH_TO_FILE})
resp = api.upload_datafile(DOI, PATH_TO_FILE)
if resp.status_code == 200:
    print(f"File {PATH_TO_FILE} -> status: {resp.json()['status']}")
else:
    print(f"Failed to upload {PATH_TO_FILE}. Error: {resp.content}")
```
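Automation often means uploading more than one file. The following sketch loops over every file in a folder and reuses the `api.upload_datafile(DOI, PATH_TO_FILE)` call from the script above. The function name `upload_folder`, the `folder` argument, and the returned dictionary are assumptions for illustration; `api` is expected to be the `NativeApi` object created earlier.

```python
from pathlib import Path


def upload_folder(api, doi: str, folder: str) -> dict:
    """Upload each regular file in `folder` to the dataset `doi`.

    Returns a mapping of file path -> HTTP status code (illustrative helper).
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip sub-folders
        resp = api.upload_datafile(doi, str(path))
        results[str(path)] = resp.status_code
        if resp.status_code == 200:
            print(f"File {path} -> uploaded")
        else:
            print(f"Failed to upload {path}. Error: {resp.content}")
    return results
```

Returning the status codes rather than only printing them lets a calling script decide whether to retry or stop after a failed upload.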

## 🙌🏼 Hands-on

Create a standalone script that uploads the dataset to DataverseNL with a single command.

**Usage:**
```bash
python upload_dataset_to_dataverseNL.py
```
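One possible starting point for the script, offered as a sketch rather than a reference solution: load `config.json` and fail early when a key is missing. It assumes the four keys used in this notebook; `load_config` and `REQUIRED_KEYS` are hypothetical names introduced for illustration.

```python
import json

# Keys read from config.json elsewhere in this notebook
REQUIRED_KEYS = ("API_TOKEN", "DATAVERSE", "DOI", "PATH_TO_FILE")


def load_config(path: str = "config.json") -> dict:
    """Read the JSON config and raise KeyError if a required key is missing."""
    with open(path, "r", encoding="utf-8") as config_file:
        config = json.load(config_file)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"Missing keys in {path}: {', '.join(missing)}")
    return config
```

The script can then call `config = load_config()` before connecting, so a misconfigured file is reported before any request is sent.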