## Install prerequisites

The only prerequisite at this time is the Python SPARC Client library (sparc.client).

Optionally, to upload files, the Pennsieve Agent needs to be installed.

For details, please follow the instructions at https://docs.pennsieve.io/docs/uploading-files-programmatically


In [2]:
!pip install sparc.client

Defaulting to user installation because normal site-packages is not writeable


## Load modules


sparc.client has a modular structure. Modules can be loaded either automatically (without the 'connect' flag), or manually.

In the following example we load the Pennsieve2 module and connect to the running Pennsieve agent.

In [3]:
from sparc.client import SparcClient
client = SparcClient(connect=False, config_file='../config/config.ini')

#Connect to a specific module - REQUIRES PENNSIEVE AGENT RUNNING
#module = client.pennsieve.connect()
#module.user.whoami() #execute internal functions of the module

# alternatively connect all the services available
#client.connect()  #connect to all services

Modules can also be loaded from other locations by providing a configuration dictionary and the path to the module.


In [4]:
#modules could also be added later by passing a config with env variables and path to the module  
client.add_module(config={'pennsieve_profile_name' : 'ci'}, 
                  path = 'sparc.client.services.pennsieve', 
                  connect=False)
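
The contents of the file passed as config_file are not shown in this tutorial. The following is only a minimal sketch of how such an INI file could be generated and read back with the standard library. The section names (global, ci) and the default_profile key are assumptions made for illustration; only pennsieve_profile_name appears in the cells above.

```python
import configparser
import io

# Hypothetical layout: a [global] section that selects a profile,
# plus one section per profile holding module settings.
config = configparser.ConfigParser()
config["global"] = {"default_profile": "ci"}      # assumed key name
config["ci"] = {"pennsieve_profile_name": "ci"}   # key used in add_module() above

# Serialize to text (write to a real file with open(..., "w") instead).
buffer = io.StringIO()
config.write(buffer)
config_text = buffer.getvalue()
print(config_text)

# Read the text back and resolve the profile the same way a client might.
parsed = configparser.ConfigParser()
parsed.read_string(config_text)
profile = parsed["global"]["default_profile"]
pennsieve_profile = parsed[profile]["pennsieve_profile_name"]
```

Whether sparc.client expects exactly this layout should be checked against the library's own documentation.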

## Pennsieve Module API 

The Pennsieve module allows users to interact with the Pennsieve platform.

Without connecting to the agent, the user can query the Discover service of the Pennsieve platform for datasets, specific files and records, and can download publicly available files or datasets.



#### Listing datasets

Datasets that match a specific word, e.g. the last name of the PI or a medical term, can be listed in the following way:

In [5]:
response=client.pennsieve.list_datasets(query='Wagenaar')
response

{'limit': 10,
 'offset': 0,
 'totalCount': 5,
 'datasets': [{'id': 3,
   'sourceDatasetId': 13,
   'name': 'Canine Epilepsy Dataset',
   'description': 'Intracranial EEG recordings from three dogs with naturally-occurring focal epilepsy.',
   'ownerId': 97,
   'ownerFirstName': 'Jacqueline',
   'ownerLastName': 'Boccanfuso',
   'ownerOrcid': '',
   'organizationName': 'Mayo',
   'organizationId': 6,
   'license': 'Creative Commons Zero 1.0 Universal',
   'tags': ['canine', 'epilepsy', 'intracranial', 'continuous', 'eeg'],
   'version': 1,
   'revision': None,
   'size': 263615741197,
   'modelCount': [{'modelName': 'Channel', 'count': 48},
    {'modelName': 'Recording', 'count': 3},
    {'modelName': 'Subject', 'count': 3},
    {'modelName': 'Annotation', 'count': 219}],
   'fileCount': 61,
   'recordCount': 273,
   'uri': 's3://pennsieve-prod-discover-publish-use1/3/1/',
   'arn': 'arn:aws:s3:::pennsieve-prod-discover-publish-use1/3/1/',
   'status': 'PUBLISH_SUCCEEDED',
   'doi': '10
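
The response is a plain dictionary, so it can be condensed with ordinary Python. The sketch below is one possible way to do that; summarize_datasets is a hypothetical helper (not part of sparc.client), and the sample is a trimmed copy of the output above so that the cell runs without network access.

```python
def summarize_datasets(response):
    """Return (id, name, size in GB) for each dataset in a list_datasets-style response."""
    return [
        (d["id"], d["name"], round(d["size"] / 1e9, 1))
        for d in response.get("datasets", [])
    ]


# Trimmed copy of the response printed above.
sample = {
    "limit": 10,
    "offset": 0,
    "totalCount": 5,
    "datasets": [
        {"id": 3, "name": "Canine Epilepsy Dataset", "size": 263615741197},
    ],
}

summary = summarize_datasets(sample)
for dataset_id, name, size_gb in summary:
    print(f"{dataset_id:>5}  {name}  ({size_gb} GB)")
```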

We can query the Discover service with different options, e.g. restricting the search to a certain organization, returning only embargoed datasets, and ordering the results by name, date or size in ascending or descending order.

In [10]:
response=client.pennsieve.list_datasets(organization='Sparc', embargo=True, order_by='date', order_direction='asc')
response

{'limit': 10,
 'offset': 0,
 'totalCount': 5,
 'datasets': [{'id': 238,
   'sourceDatasetId': 1560,
   'name': 'Mapping colon and bladder innervating sensory neurons in CLARITY cleared ganglia in mouse',
   'description': 'Imaging of colon and bladder retrograde labelled sensory neurons in whole CLARITY cleared dorsal root ganglia and nodose/jugular ganglia complex.',
   'ownerId': 1205,
   'ownerFirstName': 'Stuart',
   'ownerLastName': 'Brierley',
   'ownerOrcid': '0000-0002-2527-2905',
   'organizationName': 'SPARC Consortium',
   'organizationId': 367,
   'license': 'Creative Commons Attribution',
   'tags': ['visceral sensory neurons', 'dorsal root ganglia'],
   'version': 1,
   'revision': None,
   'size': 52139108574,
   'modelCount': [{'modelName': 'researcher', 'count': 2},
    {'modelName': 'human_subject', 'count': 0},
    {'modelName': 'term', 'count': 2},
    {'modelName': 'award', 'count': 0},
    {'modelName': 'animal_subject', 'count': 21},
    {'modelName': 'sample', '
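
The limit, offset and totalCount fields suggest that results are paginated. The following sketch collects all pages through a generic helper. It assumes that the listing function accepts limit and offset keyword arguments, which is not confirmed by this tutorial; fetch_all and fake_list_datasets are hypothetical names, and the fake function stands in for the real client so the example runs offline.

```python
def fetch_all(fetch_page, key, page_size=10):
    """Call fetch_page(limit=..., offset=...) repeatedly until totalCount items are collected."""
    items, offset = [], 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        batch = page.get(key, [])
        items.extend(batch)
        offset += len(batch)
        # Stop on an empty page or once every advertised item has been seen.
        if not batch or offset >= page.get("totalCount", 0):
            return items


def fake_list_datasets(limit=10, offset=0):
    """Offline stand-in that mimics the shape of the response shown above."""
    data = [{"id": i} for i in range(25)]
    return {
        "limit": limit,
        "offset": offset,
        "totalCount": len(data),
        "datasets": data[offset:offset + limit],
    }


all_datasets = fetch_all(fake_list_datasets, key="datasets")
print(len(all_datasets))
```

With the real client, fetch_page could be a small wrapper around client.pennsieve.list_datasets, provided that function really supports those two arguments.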

#### Listing records

Apart from listing datasets, we can also zoom into the records of a given dataset for a specific model, for example to explore the researchers within the SPARC project.

In [26]:
response=client.pennsieve.list_records(model='researcher', organization='Sparc')
response

{'limit': 10,
 'offset': 0,
 'totalCount': 1141,
 'records': [{'datasetId': 282,
   'version': 1,
   'model': 'researcher',
   'properties': {'hasORCIDId': 'https://orcid.org/0000-0002-0067-510X',
    'hasAffiliation': 'Univeristy of California, Los Angeles;University of California, Los Angeles;https://ror.org/046rm7j60',
    'middleName': '',
    'hasRole': '',
    'lastName': 'Yuan',
    'recordHash': '82329d634a673e2f45a1bfe90930e5fc',
    'firstName': 'Pu-Qing',
    'id': '3136648b-e83d-4e30-a20c-d6af747712d5'}},
  {'datasetId': 287,
   'version': 1,
   'model': 'researcher',
   'properties': {'hasORCIDId': 'https://orcid.org/0000-0002-4153-9614',
    'hasAffiliation': 'University College London',
    'middleName': '',
    'hasRole': '',
    'lastName': 'Thompson',
    'recordHash': '77fbf8f43a9adb0180b8d1c8d289e6f6',
    'firstName': 'Nicole',
    'id': 'ea50a52a-54a3-4180-ace6-0b6b59beeb2a'}},
  {'datasetId': 301,
   'version': 1,
   'model': 'researcher',
   'properties': {'hasO
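
Each record keeps its fields under properties. As a small illustration, the sketch below extracts the researchers' full names; researcher_names is a hypothetical helper, and the sample is a trimmed copy of the two complete records shown above.

```python
def researcher_names(response):
    """Return the sorted, de-duplicated full names found in a list_records-style response."""
    names = set()
    for record in response.get("records", []):
        props = record.get("properties", {})
        full_name = f"{props.get('firstName', '')} {props.get('lastName', '')}".strip()
        if full_name:
            names.add(full_name)
    return sorted(names)


# Trimmed copy of the records printed above.
sample = {
    "totalCount": 1141,
    "records": [
        {"datasetId": 282, "model": "researcher",
         "properties": {"firstName": "Pu-Qing", "lastName": "Yuan"}},
        {"datasetId": 287, "model": "researcher",
         "properties": {"firstName": "Nicole", "lastName": "Thompson"}},
    ],
}

names = researcher_names(sample)
print(names)
```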

#### Listing files

Similarly, we can query for files by name or extension, e.g. files that are included in a specific dataset.

In [27]:
response=client.pennsieve.list_files(dataset_id=90, query='manifest', file_type='json')
response

[{'name': 'manifest.json',
  'datasetId': 90,
  'datasetVersion': 1,
  'size': 19928,
  'fileType': 'Json',
  'packageType': 'Unsupported',
  'icon': 'JSON',
  'uri': 's3://pennsieve-prod-discover-publish-use1/90/1/manifest.json',
  'createdAt': None,
  'sourcePackageId': None},
 {'name': 'manifest.json',
  'datasetId': 90,
  'datasetVersion': 1,
  'size': 1660,
  'fileType': 'Json',
  'packageType': 'Unsupported',
  'icon': 'JSON',
  'uri': 's3://pennsieve-prod-discover-publish-use1/90/1/revisions/1/manifest.json',
  'createdAt': None,
  'sourcePackageId': None}]

If we are only interested in the relative paths of the files, for convenience we can use the list_filenames() function.

In [28]:
response=client.pennsieve.list_filenames(dataset_id=90, query='manifest')
response

['files/primary/sub-898/manifest.xlsx',
 'manifest.json',
 'files/primary/sub-897/manifest.xlsx',
 'files/primary/sub-896/manifest.xlsx',
 'revisions/1/manifest.json',
 'files/derivative/manifest.xlsx']

#### Downloading files

Downloading files is also straightforward: list the file(s) to be downloaded and pass the result to the download_files function.

The function either downloads the file with its original extension (if output_name is not specified) or packs the files and downloads them in gzip format to the specified directory.

In [36]:
!dir

Beginners\ guide.ipynb


In [37]:
response=client.pennsieve.list_files(dataset_id=90, query='manifest', file_type='json')
client.pennsieve.download_files(file_list=response, output_name='myfile')
!dir 

Beginners\ guide.ipynb	myfile.gz


In [38]:
response=client.pennsieve.list_files(dataset_id=90, query='manifest', file_type='json')
client.pennsieve.download_file(file_list=response[0])
!dir 

Beginners\ guide.ipynb	download  myfile.gz


In [None]:
!gunzip

In [49]:
response[0].get('namesdf') is None

True

In [54]:
response=client.pennsieve.list_files(dataset_id=90, query='manifest')
response

[{'name': 'manifest.xlsx',
  'datasetId': 90,
  'datasetVersion': 1,
  'size': 10502,
  'fileType': 'MSExcel',
  'packageType': 'Unsupported',
  'icon': 'Excel',
  'uri': 's3://pennsieve-prod-discover-publish-use1/90/1/files/primary/sub-898/manifest.xlsx',
  'createdAt': None,
  'sourcePackageId': 'N:package:825b9c5c-efe7-42ef-8421-22a23d319217'},
 {'name': 'manifest.json',
  'datasetId': 90,
  'datasetVersion': 1,
  'size': 19928,
  'fileType': 'Json',
  'packageType': 'Unsupported',
  'icon': 'JSON',
  'uri': 's3://pennsieve-prod-discover-publish-use1/90/1/manifest.json',
  'createdAt': None,
  'sourcePackageId': None},
 {'name': 'manifest.xlsx',
  'datasetId': 90,
  'datasetVersion': 1,
  'size': 10647,
  'fileType': 'MSExcel',
  'packageType': 'Unsupported',
  'icon': 'Excel',
  'uri': 's3://pennsieve-prod-discover-publish-use1/90/1/files/primary/sub-897/manifest.xlsx',
  'createdAt': None,
  'sourcePackageId': 'N:package:99029aaa-e8e6-44c6-bb45-33a9b17a4e92'},
 {'name': 'manifest.

In [68]:
#paths = ['files/derivative/manifest.xlsx']
paths = ['files/derivative/manifest.xlsx']

json = {
        "data": {
            "paths": paths,
            "datasetId": 90,
            "version": 1,
        }
    }

headers = {"content-type": "application/json"}


response=client.pennsieve.post('https://api.pennsieve.io/zipit/discover/', json=json, headers=headers)
response.content

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

In [56]:
!pennsieve agent

Pennsieve Agent started on port: 9000


In [51]:


paths = [x if x.get("name") is None else x.get("name") for x in response]
response

[{'name': 'manifest.json',
  'datasetId': 90,
  'datasetVersion': 1,
  'size': 19928,
  'fileType': 'Json',
  'packageType': 'Unsupported',
  'icon': 'JSON',
  'uri': 's3://pennsieve-prod-discover-publish-use1/90/1/manifest.json',
  'createdAt': None,
  'sourcePackageId': None},
 {'name': 'manifest.json',
  'datasetId': 90,
  'datasetVersion': 1,
  'size': 1660,
  'fileType': 'Json',
  'packageType': 'Unsupported',
  'icon': 'JSON',
  'uri': 's3://pennsieve-prod-discover-publish-use1/90/1/revisions/1/manifest.json',
  'createdAt': None,
  'sourcePackageId': None}]

In [39]:
client.pennsieve.download_files(['files/derivative/manifest.xlsx'])

TypeError: string indices must be integers

In [199]:
import os

import requests

def download_files(files, output_name):
    """Download files from a single dataset version through the zipit service."""
    url = "https://api.pennsieve.io/zipit/discover"
    # Accept a single file record as well as a list of records
    files = [files] if isinstance(files, dict) else files
    print(files)
    # Only one (datasetId, datasetVersion) pair is supported per request
    properties = {(x['datasetId'], x['datasetVersion']) for x in files}
    print(properties)
    paths = [x['name'] for x in files]
    print(len(paths))
    assert len(properties) == 1, "Downloading files from multiple datasets or dataset versions is not supported."
    dataset_id, version = next(iter(properties))
    data = {"data": {
        "paths": paths,
        "datasetId": dataset_id,
        "version": version
    }}

    # download the files with zipit service
    headers = {"content-type": "application/json"}
    response = requests.post(url, json=data, headers=headers)
    # Keep the given name for a single file; several files are saved as one archive
    output_name = output_name if len(paths) == 1 else os.path.splitext(output_name)[0] + '.gz'
    with open(output_name, mode='wb+') as f:
        f.write(response.content)
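
The requests sent elsewhere in this notebook use paths relative to the dataset root (for example files/derivative/manifest.xlsx), whereas the name field only holds the bare file name. The sketch below builds the request body from the uri field instead, without sending anything. build_zipit_payload is a hypothetical helper, and the assumption that the first five slash-separated parts of the URI are scheme, bucket, dataset id and version is taken from the URIs printed in this notebook.

```python
def build_zipit_payload(files):
    """Build the JSON body for the zipit service from list_files-style records."""
    if isinstance(files, dict):
        files = [files]
    versions = {(f["datasetId"], f["datasetVersion"]) for f in files}
    if len(versions) != 1:
        raise ValueError("Files must come from exactly one dataset version.")
    dataset_id, version = next(iter(versions))
    # 's3://<bucket>/<datasetId>/<version>/<relative path>' -> '<relative path>'
    paths = [f["uri"].split("/", 5)[5] for f in files]
    return {"data": {"paths": paths, "datasetId": dataset_id, "version": version}}


# Trimmed copies of two records shown earlier in this notebook.
files = [
    {"name": "manifest.json", "datasetId": 90, "datasetVersion": 1,
     "uri": "s3://pennsieve-prod-discover-publish-use1/90/1/manifest.json"},
    {"name": "manifest.json", "datasetId": 90, "datasetVersion": 1,
     "uri": "s3://pennsieve-prod-discover-publish-use1/90/1/revisions/1/manifest.json"},
]

payload = build_zipit_payload(files)
print(payload)
```

Raising ValueError instead of using assert keeps the check active even when Python runs with optimizations enabled.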




In [201]:
download_files(response['files'],'aa')

[{'name': 'manifest.json', 'datasetId': 90, 'datasetVersion': 1, 'size': 19928, 'fileType': 'Json', 'packageType': 'Unsupported', 'icon': 'JSON', 'uri': 's3://pennsieve-prod-discover-publish-use1/90/1/manifest.json', 'createdAt': None, 'sourcePackageId': None}, {'name': 'manifest.json', 'datasetId': 90, 'datasetVersion': 1, 'size': 1660, 'fileType': 'Json', 'packageType': 'Unsupported', 'icon': 'JSON', 'uri': 's3://pennsieve-prod-discover-publish-use1/90/1/revisions/1/manifest.json', 'createdAt': None, 'sourcePackageId': None}, {'name': 'manifest.xlsx', 'datasetId': 90, 'datasetVersion': 1, 'size': 10502, 'fileType': 'MSExcel', 'packageType': 'Unsupported', 'icon': 'Excel', 'uri': 's3://pennsieve-prod-discover-publish-use1/90/1/files/primary/sub-898/manifest.xlsx', 'createdAt': None, 'sourcePackageId': 'N:package:825b9c5c-efe7-42ef-8421-22a23d319217'}, {'name': 'manifest.xlsx', 'datasetId': 90, 'datasetVersion': 1, 'size': 10647, 'fileType': 'MSExcel', 'packageType': 'Unsupported', 'ic

In [76]:
uri='s3://pennsieve-prod-discover-publish-use1/90/1/files/primary/sub-898/manifest.xlsx'
'/'.join(uri.split('/')[5:])

'files/primary/sub-898/manifest.xlsx'
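
The same conversion can be wrapped in a small function that also rejects unexpected input. This is only a sketch: relative_path is a hypothetical helper, and the URI layout s3://bucket/datasetId/version/path is inferred from the outputs in this notebook.

```python
def relative_path(uri):
    """Return the path relative to the dataset root for a Discover S3 URI."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # After the scheme: <bucket>/<datasetId>/<version>/<relative path>
    parts = uri[len(prefix):].split("/", 3)
    if len(parts) != 4 or not parts[3]:
        raise ValueError(f"No file path found in: {uri!r}")
    return parts[3]


uri = 's3://pennsieve-prod-discover-publish-use1/90/1/files/primary/sub-898/manifest.xlsx'
path = relative_path(uri)
print(path)
```

Splitting at most three times keeps file names that contain slashes, spaces or parentheses intact.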

In [70]:
import requests

url = "https://api.pennsieve.io/zipit/discover"

# Request for two files at once (overwritten by the single-file request below)
data = {"data": {
        "paths":  ["files/subjects.xlsx","files/dataset_description.xlsx"],
        "version": 1,
        "datasetId": 295
    }}

# Request for a single file; this is the payload that is actually sent
data = {"data": {
        "paths": ["metadata/schema.json"],
        "version": 1,
        "datasetId": 90
    }}

headers = {"content-type": "application/json"}

response = requests.post(url, json=data, headers=headers)

print(response.text)


{
    "models": [
        {
            "name": "summary",
            "displayName": "Summary",
            "description": "",
            "properties": [
                {
                    "name": "title",
                    "displayName": "Title",
                    "description": "",
                    "dataType": {
                        "type": "String"
                    }
                },
                {
                    "name": "isDescribedBy",
                    "displayName": "Publication URL",
                    "description": "",
                    "dataType": {
                        "type": "Array",
                        "items": {
                            "type": "String"
                        }
                    }
                },
                {
                    "name": "description",
                    "displayName": "Description",
                    "description": "",
                    "dataType": {
                        "typ

## Download files

In [69]:
import requests

url = "https://api.pennsieve.io/zipit/discover"

# Request for two files at once (overwritten by the single-file request below)
data = {"data": {
        "paths":  ["files/subjects.xlsx","files/dataset_description.xlsx"],
        "version": 1,
        "datasetId": 295
    }}

# Request for a single file; this is the payload that is actually sent
data = {"data": {
        "paths": ["files/samples.xlsx"],
        "version": 1,
        "datasetId": 295
    }}

headers = {"content-type": "application/json"}

response = requests.post(url, json=data, headers=headers)

print(response.text)


(binary output omitted: the response body is the raw .xlsx file, a ZIP container that starts with the bytes PK and holds entries such as docProps/app.xml, docProps/core.xml and xl/theme/theme1.xml)

In [50]:
%timeit 
with open('x.gzip', mode='wb') as f:
    f.write(response.content)




In [48]:
!pwd

/home/patrick/IdeaProjects/sparc.client/tutorials


In [115]:
response.content

b'PK\x03\x04\x14\x00\x00\x00\x08\x00#f%V\x07AMb\x81\x00\x00\x00\xb1\x00\x00\x00\x10\x00\x00\x00docProps/app.xmlM\x8e=\x0b\x021\x10D\xff\xcaq\xbd\xb7A\xc1Bb@\xd0R\xb0\xb2\x0f{\x1b/\x90dC\xb2B~\xbe9\xc1\x8fn\x1eo\x18F\xdf\ng*\xe2\xa9\x0e-\x86T\x8f\xe3"\x92\x0f\x00\x15\x17\x8a\xb6N]\xa7n\x1c\x97h\xa5cy\x00;\xe7\x91\xce\x8c\xcfHI`\xab\xd4\x1e\xa8\t\xa5\x99\xe6M\xfe\x0e\x8eF\x9fr\x0e\x1e\xadxN\xe6\xea\xb1pe\'\xc3\xa5!\x05\r\xffrm\xde\xa9\xd45\xef&\xf5\x96\x1f\xd6\xf0;i^PK\x03\x04\x14\x00\x00\x00\x08\x00#f%V\xd0\x07>\xe1:\x01\x00\x00\xcb\x02\x00\x00\x11\x00\x00\x00docProps/core.xml\xc5\x92OO\x021\x10\xc5\xbf\n\xd9\xb3K\xbb] \xa1Y6Q\x89\x07#\t\t\x18\x8d\xb7\xda\x9d\x85\x86\xed\x9f\xb4\xc5\x85oo\xb7\xc2"\xd1\x8b\'\x8f3\xf3\xdeo^&SpC\xb9\xb6\xb0\xb4\xda\x80\xf5\x02\xdc\xe0 \x1b\xe5(7\xb3d\xeb\xbd\xa1\x089\xbe\x05\xc9\xdc0(T\x18\xd6\xdaJ\xe6Ci7\xc80\xbec\x1b@\x04\xe3\t\x92\xe0Y\xc5<C\x1d05=1)\x8b\x8aSn\x81ymO\xf8\x8a\xf7x\xb3\xb7M\x84U\x1cA\x03\x12\x94w(\x1bf()\xe7\xf0\xae-\xdb\x0e\x1eY]\xc3\xb1
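
The leading bytes PK\x03\x04 are the signature of a ZIP container (an .xlsx file is one), while gzip data starts with \x1f\x8b. A minimal sketch of checking the downloaded bytes before choosing a file extension follows; sniff_container is a hypothetical helper, and the sample archive is built in memory so the cell runs offline.

```python
import gzip
import io
import zipfile


def sniff_container(content):
    """Classify raw bytes as 'zip', 'gzip' or 'unknown' from their magic number."""
    if content.startswith(b"PK\x03\x04"):
        return "zip"
    if content.startswith(b"\x1f\x8b"):
        return "gzip"
    return "unknown"


# Build a tiny ZIP archive in memory as a stand-in for response.content.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docProps/app.xml", "<Properties/>")
zip_bytes = buffer.getvalue()

kind = sniff_container(zip_bytes)
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
    members = archive.namelist()
gz_kind = sniff_container(gzip.compress(b"hello"))

print(kind, members, gz_kind)
```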

In [71]:
response=client.pennsieve.list_files(query="*.png", limit=5000)['files']

In [88]:
data=set([(x['datasetId'], x['datasetVersion']) for x in response])
len(data)

51

In [89]:
[x['name'] for x in response]
z=set([x['datasetId'] for x in response])
z


{10,
 11,
 12,
 27,
 33,
 44,
 57,
 86,
 89,
 109,
 115,
 116,
 117,
 119,
 123,
 134,
 137,
 142,
 147,
 150,
 156,
 167,
 169,
 177,
 180,
 182,
 183,
 184,
 185,
 188,
 207,
 210,
 211,
 221,
 229,
 230,
 236,
 239,
 242,
 243,
 244,
 245,
 247,
 254,
 256,
 270,
 271,
 272,
 287,
 288,
 303}
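
Because a single download request covers one dataset version only (see the assertion message in download_files), a result that spans many datasets has to be split first. The sketch below groups file records by (datasetId, datasetVersion); group_by_dataset is a hypothetical helper and the sample records are made up for illustration.

```python
from collections import defaultdict


def group_by_dataset(files):
    """Group list_files-style records by their (datasetId, datasetVersion) pair."""
    groups = defaultdict(list)
    for record in files:
        groups[(record["datasetId"], record["datasetVersion"])].append(record)
    return dict(groups)


# Made-up records that only carry the fields needed for grouping.
sample = [
    {"name": "a.png", "datasetId": 10, "datasetVersion": 1},
    {"name": "b.png", "datasetId": 10, "datasetVersion": 1},
    {"name": "c.png", "datasetId": 11, "datasetVersion": 2},
]

groups = group_by_dataset(sample)
for (dataset_id, version), records in groups.items():
    print(dataset_id, version, [r["name"] for r in records])
```

Each group can then be passed to a download call on its own.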

In [110]:
response=response[0]

In [121]:
response

{'limit': 10,
 'offset': 0,
 'totalCount': 10000,
 'files': [{'name': 'Rat_896_Control_2_Abd Vag Ant Gas_100X_1(1)_01_02_19 (1).tif',
   'datasetId': 90,
   'datasetVersion': 1,
   'size': 17740866,
   'fileType': 'TIFF',
   'packageType': 'Slide',
   'icon': 'Microscope',
   'uri': 's3://pennsieve-prod-discover-publish-use1/90/1/files/primary/sub-896/sam-2/Rat_896_Control_2_Abd Vag Ant Gas_100X_1(1)_01_02_19 (1).tif',
   'createdAt': None,
   'sourcePackageId': 'N:package:07975557-9ef9-40f4-8060-2bafe9bebe7d'},
  {'name': 'animal_subject.csv',
   'datasetId': 90,
   'datasetVersion': 1,
   'size': 702,
   'fileType': 'CSV',
   'packageType': 'CSV',
   'icon': 'Tabular',
   'uri': 's3://pennsieve-prod-discover-publish-use1/90/1/metadata/records/animal_subject.csv',
   'createdAt': None,
   'sourcePackageId': None},
  {'name': 'banner.jpg',
   'datasetId': 90,
   'datasetVersion': 1,
   'size': 192863,
   'fileType': 'JPEG',
   'packageType': 'Image',
   'icon': 'Image',
   'uri': 's3:/

In [120]:
json = response if type(response) is dict else response
json

{'limit': 10,
 'offset': 0,
 'totalCount': 10000,
 'files': [{'name': 'Rat_896_Control_2_Abd Vag Ant Gas_100X_1(1)_01_02_19 (1).tif',
   'datasetId': 90,
   'datasetVersion': 1,
   'size': 17740866,
   'fileType': 'TIFF',
   'packageType': 'Slide',
   'icon': 'Microscope',
   'uri': 's3://pennsieve-prod-discover-publish-use1/90/1/files/primary/sub-896/sam-2/Rat_896_Control_2_Abd Vag Ant Gas_100X_1(1)_01_02_19 (1).tif',
   'createdAt': None,
   'sourcePackageId': 'N:package:07975557-9ef9-40f4-8060-2bafe9bebe7d'},
  {'name': 'animal_subject.csv',
   'datasetId': 90,
   'datasetVersion': 1,
   'size': 702,
   'fileType': 'CSV',
   'packageType': 'CSV',
   'icon': 'Tabular',
   'uri': 's3://pennsieve-prod-discover-publish-use1/90/1/metadata/records/animal_subject.csv',
   'createdAt': None,
   'sourcePackageId': None},
  {'name': 'banner.jpg',
   'datasetId': 90,
   'datasetVersion': 1,
   'size': 192863,
   'fileType': 'JPEG',
   'packageType': 'Image',
   'icon': 'Image',
   'uri': 's3:/

In [202]:
client.list_records()

AttributeError: 'SparcClient' object has no attribute 'list_records'

In [67]:
[x['name'] for x in response]
list(map(lambda x: '/'.join(x['uri'].split('/')[5:]), response))

['files/primary/sub-PR1729/PR1729_258/2019-07-30_131.jpeg',
 'files/primary/sub-PR1729/PR1729_258/2019-07-30_133.jpeg',
 'files/primary/sub-PR1729/PR1729_258/2019-07-30_136.jpeg',
 'files/primary/sub-PR1729/PR1729_258/2019-07-30_141.jpeg',
 'files/primary/sub-PR1729/PR1729_258/2019-07-30_142.jpeg',
 'files/primary/sub-PR1729/PR1729_258/2019-07-30_145.jpeg',
 'files/primary/sub-PR1729/PR1729_258/2019-07-30_148.jpeg',
 'files/primary/sub-PR1729/PR1729_258/2019-07-30_154.jpeg',
 'files/primary/sub-PR1729/PR1729_258/2019-07-30_155.jpeg',
 'files/primary/sub-PR1729/PR1729_258/2019-07-30_160.jpeg',
 'files/primary/sub-PR1729/PR1729_258/2019-07-30_164.jpeg',
 'files/primary/sub-PR1729/PR1729_258/2019-07-30_176.jpeg',
 'files/primary/sub-PR1729/PR1729_258/2019-07-30_177.jpeg',
 'files/primary/sub-PR1729/PR1729_258/2019-07-30_178.jpeg',
 'files/primary/sub-PR1729/PR1729_258/2019-07-30_195.jpeg',
 'files/primary/sub-PR1729/PR1729_258/2019-07-30_201.jpeg',
 'files/primary/sub-PR1729/PR1729_258/20

In [29]:
response['files']

TypeError: 'Response' object is not subscriptable

In [35]:
files=['files/primary/sub-10163/sam-10163/10163_P_20_1_9.xml',
 'files/primary/sub-10176/sam-10176/10176_P_10_1_11.xml',
 'files/primary/sub-10207/sam-10207/10207_P_17_1_9.xml',
 'files/primary/sub-10246/sam-10246/10246_P_13_1_6.xml',
 'files/primary/sub-10284/sam-10284/10284_P_1_1_3.xml',
 'files/primary/sub-10444/sam-10444/10444_P_9_3_13.xml']

In [36]:
import requests
    
url = "https://api.pennsieve.io/zipit/discover"

data = {"data": {
        "paths": files,
        "version": 1,
        "datasetId": 295
    }}
headers = {"content-type": "application/json"}
                
response = requests.post(url, json=data, headers=headers)

#print(response.text)


## Upload files to Pennsieve

In [None]:
In order to upload files to Pennsieve, a user needs to install and run the Pennsieve Agent. For details, see

https://docs.pennsieve.io/docs/uploading-files-programmatically

In [None]:

!pennsieve agent


In [None]:
client.

In [None]:
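import shutil

import requests

# Stream the file at `url` (defined in an earlier cell) to disk without loading it into memory
local_filename = url.split('/')[-1]
with requests.get(url, stream=True) as r:
    with open(local_filename, 'wb') as f:
        shutil.copyfileobj(r.raw, f)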

