# Neomaril DataSources

This notebook gives an example of how to import your datasources and datasets into Neomaril.

### NeomarilDataSourceClient

This is where you manage your datasources.

In [1]:
# Import the client
from neomaril_codex.datasources import NeomarilDataSourceClient

In [2]:
# Start the client. Credentials are read from the NEOMARIL_TOKEN environment variable
client = NeomarilDataSourceClient()
client

2024-03-20 19:19:35.385 | INFO     | neomaril_codex.base:__init__:20 - Loading .env
2024-03-20 19:19:37.219 | INFO     | neomaril_codex.base:__init__:30 - Successfully connected to Neomaril


<neomaril_codex.datasources.NeomarilDataSourceClient at 0x7f6ba047ded0>
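If the token is missing, it helps to fail early with a clear message. A minimal sketch follows; `require_neomaril_token` is a hypothetical helper, not part of `neomaril_codex`. It only inspects `os.environ`, so if you keep the token solely in a `.env` file, load that file first (for example with python-dotenv's `load_dotenv()`).

```python
import os


def require_neomaril_token(env_var: str = "NEOMARIL_TOKEN") -> str:
    """Return the token from the environment, or raise a clear error.

    Hypothetical helper. It assumes the client reads its credentials from
    NEOMARIL_TOKEN, as the comment in the cell above states.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"{env_var} is not set; export it or add it to your .env file "
            "before creating NeomarilDataSourceClient()."
        )
    return token
```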

### NeomarilDataSource

A `NeomarilDataSource` represents a registered datasource. You create one by calling `client.register_datasource`.

In [3]:
import os

In [4]:
credentials_path = os.path.abspath('./samples/datasources/credentials.json')

client.register_datasource(
    datasource_name='testeDataSource',
    provider='GCP',
    cloud_credentials=credentials_path,
    group='datarisk'
)

2024-03-20 19:19:43.230 | INFO     | neomaril_codex.base:__init__:20 - Loading .env
2024-03-20 19:19:43.238 | INFO     | neomaril_codex.base:__init__:30 - Successfully connected to Neomaril
2024-03-20 19:19:44.777 | INFO     | neomaril_codex.datasources:register_datasource:101 - DataSource 'testeDataSouce' was registered!


<neomaril_codex.datasources.NeomarilDataSource at 0x7f6ba047e170>
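Since `cloud_credentials` receives a path to a JSON file, you may want to confirm that the file exists and parses before registering. The sketch below assumes the file holds a single JSON object (as a GCP service-account key does); `check_credentials_file` is hypothetical and does not verify that the cloud provider accepts the credentials.

```python
import json
from pathlib import Path


def check_credentials_file(path: str) -> dict:
    """Fail early if the credentials file is missing or not a JSON object.

    Hypothetical helper: this is a local sanity check only; it does not
    authenticate against the cloud provider.
    """
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"Credentials file not found: {p}")
    with p.open(encoding="utf-8") as fh:
        data = json.load(fh)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("Credentials file must contain a JSON object")
    return data
```

Usage: call `check_credentials_file(credentials_path)` right before `client.register_datasource(...)`.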

#### List Datasources

Use this function to list all datasources in your group for a specific provider.

In [5]:
client.list_datasource(provider='GCP', group='datarisk')

[{'Name': 'testeDataSouce',
  'Group': 'datarisk',
  'Provider': 'GCP',
  'RegisteredAt': '2024-03-20T22:19:44.745936+00:00'}]
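The output is a list of dicts, so locating one datasource is plain Python. A minimal sketch, assuming the `Name`/`Group`/`Provider`/`RegisteredAt` shape shown above; `find_datasource` and `registered_at` are hypothetical helpers.

```python
from datetime import datetime
from typing import Optional


def find_datasource(datasources: list, name: str) -> Optional[dict]:
    """Return the entry whose 'Name' matches exactly, or None."""
    for entry in datasources:
        if entry.get("Name") == name:
            return entry
    return None


def registered_at(entry: dict) -> datetime:
    # 'RegisteredAt' is an ISO 8601 timestamp with a UTC offset.
    return datetime.fromisoformat(entry["RegisteredAt"])
```

Usage: `find_datasource(client.list_datasource(provider='GCP', group='datarisk'), 'testeDataSource')`. Note that the recorded output above lists `testeDataSouce`, which differs from the `testeDataSource` passed in the registration cell; an exact-match lookup like this one makes such a mismatch visible.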

#### Get Datasource

Use this function to get your datasource as a Neomaril datasource object.

In [6]:
datasource = client.get_datasource(datasource_name='testeDataSource', provider='GCP', group='datarisk')

2024-03-20 19:19:50.600 | INFO     | neomaril_codex.base:__init__:20 - Loading .env
2024-03-20 19:19:50.604 | INFO     | neomaril_codex.base:__init__:30 - Successfully connected to Neomaril


### NeomarilDataset

This is where you import your dataset.
You must register a datasource before you can import a dataset into it.

In [7]:
dataset_uri = 'https://storage.cloud.google.com/projeto/arquivo.csv'

dataset = datasource.import_dataset(
    dataset_uri=dataset_uri,
    dataset_name='meudatasetcorreto'
)
dataset

2024-03-20 19:19:58.408 | INFO     | neomaril_codex.datasources:import_dataset:279 - Datasource testeDataSouce import process started! Use the D66c8bc440dc4882bfeff40c0dac11641c3583f3aa274293b15ed5db21000b49 on the `/api/datasets/status` endpoint to check it's status.
2024-03-20 19:19:58.410 | INFO     | neomaril_codex.base:__init__:20 - Loading .env
2024-03-20 19:19:58.415 | INFO     | neomaril_codex.base:__init__:30 - Successfully connected to Neomaril


<neomaril_codex.datasources.NeomarilDataset at 0x7f6ba4f68040>

#### List Datasets

Use this function to list the datasets imported into this datasource.

In [8]:
datasource.list_datasets()

[{'Id': 'D66c8bc440dc4882bfeff40c0dac11641c3583f3aa274293b15ed5db21000b49',
  'CreationDate': '2024-03-20T22:19:58.433928+00:00',
  'Size': 2558,
  'Name': 'meudatasetcorreto',
  'Origin': 'E2dbf476b85e417cb4fdc325a38ee7575a30b81a82264745b3e3a2d92700bc43'}]
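`get_dataset` is called with `dataset_hash`, which matches the `Id` field in this listing. A small hypothetical helper to look up that hash by dataset name, assuming the list-of-dicts shape shown above. It refuses ambiguous names, since nothing here guarantees dataset names are unique.

```python
def dataset_hash_by_name(datasets: list, name: str) -> str:
    """Return the 'Id' (dataset hash) of the single dataset called `name`."""
    matches = [d["Id"] for d in datasets if d.get("Name") == name]
    if not matches:
        raise KeyError(f"No dataset named {name!r}")
    if len(matches) > 1:
        raise ValueError(f"{len(matches)} datasets are named {name!r}; pick one by 'Id'")
    return matches[0]
```

Usage: `datasource.get_dataset(dataset_hash=dataset_hash_by_name(datasource.list_datasets(), 'meudatasetcorreto'))`.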

#### Get Dataset

Use this function to get your dataset as a Neomaril dataset object.

In [10]:
dataset = datasource.get_dataset(dataset_hash='D66c8bc440dc4882bfeff40c0dac11641c3583f3aa274293b15ed5db21000b49')

2024-03-20 19:20:16.688 | INFO     | neomaril_codex.base:__init__:20 - Loading .env
2024-03-20 19:20:16.692 | INFO     | neomaril_codex.base:__init__:30 - Successfully connected to Neomaril


#### Get Dataset Status

Use this function to get the import status of your dataset.

It returns a dict.

On success:
```
{
    "status": "Succeeded",
    "log": ""
}
```
On failure:
```
{
    "status": "Failed",
    "log": "UnexpectedError\n  \"Azure Request error! Message: Service request failed.\nStatus: 403 (Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.)\nErrorCode: AuthenticationFailed\n\nHeaders:\nTransfer-Encoding: chunked\nServer: Microsoft-HTTPAPI/2.0\nx-ms-request-id: xxxxx\nx-ms-error-code: AuthenticationFailed\nDate: Wed, 24 Jan 2024 12:00:36 GMT\n\""
}
```
If the dataset is not found, a `DatasetNotFound` error is raised.

In [11]:
dataset.get_status()

{'status': 'Succeeded', 'log': ''}
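The import runs in the background (the import log asks you to check its status), so you may want to wait for it to finish. A minimal polling sketch follows. It assumes any status other than `Succeeded` or `Failed` means the import is still in progress; `wait_for_dataset` is hypothetical and takes the status function as an argument, so in this notebook you would pass `dataset.get_status`. A `DatasetNotFound` error raised by that function propagates unchanged.

```python
import time
from typing import Callable


def wait_for_dataset(get_status: Callable[[], dict], timeout: float = 300.0,
                     interval: float = 5.0) -> dict:
    """Poll get_status() until the import finishes or the timeout expires.

    Assumption: only 'Succeeded' and 'Failed' are final; anything else is
    treated as still running.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = get_status()
        status = result.get("status")
        if status == "Succeeded":
            return result
        if status == "Failed":
            raise RuntimeError(f"Dataset import failed:\n{result.get('log', '')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Dataset still {status!r} after {timeout}s")
        time.sleep(interval)
```

Usage: `wait_for_dataset(dataset.get_status, timeout=600)`.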

#### Delete Dataset

Use this function to delete your dataset.

Be careful: this action is irreversible!

In [12]:
dataset.delete()

2024-03-20 19:20:21.864 | INFO     | neomaril_codex.datasources:delete:468 - Dataset removed
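Because deletion cannot be undone, you may prefer to require an explicit confirmation. A minimal sketch; `confirm_delete` is a hypothetical wrapper that only calls the object's existing `delete()` method, so it works the same way for the dataset and datasource objects in this notebook.

```python
def confirm_delete(resource, name: str, confirmation: str) -> None:
    """Call resource.delete() only if `confirmation` repeats `name` exactly.

    Hypothetical safety wrapper; `resource` is any object with a delete() method.
    """
    if confirmation != name:
        raise ValueError(f"Refusing to delete {name!r}: confirmation did not match")
    resource.delete()
```

Usage: `confirm_delete(dataset, 'meudatasetcorreto', input("Type the dataset name to delete it: "))`.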


#### Delete DataSource

Use this function to delete your datasource.

Be careful: this action is irreversible!

In [13]:
datasource.delete()

2024-03-20 19:20:23.980 | INFO     | neomaril_codex.datasources:delete:347 - DataSource testeDataSouce was deleted!
