# 2024-02-08: SF4 Data Pull from HISE
### By [Aishwarya Chander](mailto:aishwarya.chander@alleninstitute.org), High Resolution Translational Immunology, Allen Institute for Immunology

**Main aim**: Download all files needed for analysis of the SF4 cohort, including the `.h5` files and their metadata.

In [2]:
!pwd

/home/jupyter/certpro_sf4/01-sample_collection


## 00 Imports and functions

In [13]:
from datetime import date
import hisepy
import os
import pandas
import session_info
import warnings
import re

warnings.simplefilter(action='ignore', category=FutureWarning)

In [3]:
out_dir = 'output'
# Create the output directory if it does not already exist
os.makedirs(out_dir, exist_ok=True)

## 01 Download `.h5` files from HISE
Using HISE advanced search (Query ID: `1c116580-43ea-4be4-9bed-b943fbb3d71e`), we selected 235 files of interest from the SF4 cohort across 6 batches. We'll then use `hisepy.read_files` to download all of them in one call.


In [8]:
files = ["d78bc58c-252c-4d3a-b4cf-c2f19dadf1b4", "a8c872cc-3d62-4096-964d-a1382ca18352", "d000d96f-9a38-42c0-9a14-9db86e59567e", "a13b5b32-b912-47a8-9833-994fe816d293", "c5495948-436e-44d6-953d-60f0b37d54de", "fe30ec5d-1c6d-46e6-84e3-d7ac4ca6212b", "21a3d12a-e7f4-4fcf-bcab-f149c894559e", "68c73155-2a44-41e0-9d5f-4699404fd8f7", "722a40c1-2e0b-494e-a8f6-b4d20024445c", "30d76164-f2c4-4a57-b468-49c083e4656b", "0a2e9f45-513f-4a7b-8382-d5647373b1f0", "4c7c55a9-732b-4b81-b5e8-f2b2c80e8c6f", "92d289ca-6556-459b-9ef7-33d99e5ca628", "08b99538-5464-407e-8c02-4c8c3e64645f", "f6063871-8002-492a-a62e-53db72dd21ae", "81470842-a28e-46ce-a773-ae019e4e2544", "35c5746e-6346-4de2-adbd-7111161a3cb9", "a398bdfb-a6ee-4428-ac6b-013f9906e62c", "ed101dc1-3bd9-4e8b-bf74-6d0a7a26f1ed", "45a726cb-6302-4d56-8afb-85fd7073500d", "01ab6f7f-52bf-40c3-8c23-e9c2fb34a451", "f244c7e5-dca3-43a1-89b6-3830377b4c1c", "7f6f1b93-4496-44d2-bd58-53b29f1eb679", "07a847b2-f940-4802-b0fd-5cd5074f257f", "021b5e05-cf74-4674-a594-dda5c2c4eee4", "93a57391-1d7d-4d75-82e3-ddea15102196", "08512447-3b1f-4dc0-818e-6a7769d8e685", "79996a5a-7baa-4965-b84d-7e3fc0fa51cb", "0534024e-006f-4262-85fe-c13ff98d9063", "6ae5617b-54d1-4884-a595-12372c01b0c5", "89a1b000-928e-4fd8-8304-c9905a8bb07c", "42f8927f-6973-49cb-8b9d-3af6ce730822", "e6a5c97e-14c3-4d4d-96b2-b3ae29ea7703", "68b74b03-c87d-4366-8d8c-e8509bc09b1e", "bbba21d0-fcd2-4a95-a2f4-4ae4ef9706ab", "96f32d7a-9c76-4d19-bfc7-ae67dc942b1f", "b3a206ff-67bc-445b-98ee-3117b4a5871f", "90bfcdf2-617a-41bd-b679-575c90e441f5", "9b2b1ba6-b145-488e-9fec-579df131ebf8", "9470ab9e-3233-4cdb-b42d-2496012d9ceb", "64151481-b492-4f65-a534-4776afc07aa5", "a42aa881-4fbe-49dd-8eba-2a5235cf8260", "bbea9799-59b0-4666-b01f-b1c5b80cebd0", "0cab1ba7-15c5-4f96-8695-763d226c6569", "dc404b99-c885-4d17-a63f-3f89e2cb808e", "4c792c8d-321b-4f4f-ac8d-56ec31475a70", "b5b4d0a9-4b58-4394-8664-fefea5e2b5d8", "d15af91b-1bf9-4e95-b722-72b3cc5c7efa", "bd9e8a1a-6cd6-41ef-83ba-2bf466109f72", 
"0f6b6458-3787-4534-b3b1-9fe406c951ca", "ee85e09e-4fd5-4d86-9e99-474a70b7d523", "10bdc353-4723-4806-a59a-38a001710957", "822d72f1-955f-4b12-a2eb-c39ca48e4927", "4356458a-d342-4ec0-b926-1c28172b0b20", "110e2f46-6baf-4648-b7e7-8eb35da2fcee", "b290160c-ed86-4752-ba82-6f705a2dc674", "61a7f91c-de91-412b-bcb4-d16b82c8a5b8", "5184fdc5-e8de-450d-906f-dfb99c9d1273", "f4715826-7de8-483e-961a-9eaee6d44873", "86b99127-6aa6-4541-bd5e-a3bdf3ed3f32", "6be56cb3-fcb1-499b-9103-263a4cec6200", "d221cc18-6ddd-42ae-9523-8210582e1851", "e1ca6495-b734-4a7c-9915-730f617ce051", "46ef69c8-7e04-4426-97d1-7ba9e04a2789", "47b51a30-40b3-45da-87d4-d8029181213f", "33262a0d-16fa-4f3f-8526-a1a04616d8a5", "b884b9c2-457d-4700-8fe8-e31509ad5495", "60b6e019-4dc2-4cb8-afaf-b346642eb11f", "dcba9cbe-a407-4ad5-99a1-52021b30755d", "92a878d9-6b03-425d-b908-8ada180ebca5", "a7bbadaa-1f48-4b48-b03e-53acb1f940e7", "bfe7ad62-2755-4217-8271-fe8668a93149", "588a2be9-67e9-4896-9cee-02c400c9bda8", "1cfbeb4c-7d79-4a01-9ead-c62b22ad55ef", "8f9622de-ff24-4131-b154-cbab4b3c9ffb", "a3c04302-b365-449b-b2d4-b63d7f24575c", "021a93db-94b6-4ac3-85e1-7df1c2739c03", "c8fc5ff9-26a6-4fd6-9a7f-e5956dfb4b51", "3218a4b8-cf2a-4fcc-9abc-96853a78c274", "92c474f6-4af4-40ec-b148-d5a6b7403ef7", "f3861998-e573-4e82-ad41-ef1ee58278b3", "a342159b-d711-4def-a64e-1bbeea182ef3", "c585562c-a68a-4679-ab79-eb0c071f893f", "51c435d3-00d9-4196-beb7-6bef36b5b4f3", "54cf9698-e004-4f79-8251-511531d46653", "bde25ca5-2686-4295-97fa-f1d582009908", "b02e1e88-0225-442a-bac0-5faa6516145c", "491bc759-548e-4321-9454-c3b11becb0cd", "67b43117-fde4-4976-9631-f6ae6bcdf2e5", "46782aa7-daa0-4cba-9fb8-2189b3f477c7", "0a68d1c7-fb27-4241-91e6-329676cca0ad", "879f5eca-70a7-497f-91f9-998e30b856d0", "fc5457b4-8bdc-4ab2-9719-3fc94bc007dc", "009d99e5-d714-4753-9f78-eb4d64a3292e", "53abaedb-e101-4650-8b78-4de6a54e2689", "a983178a-342d-4e79-84d8-3b7adf52e3cd", "b122f097-0304-47cc-a0e9-b64da60c3b99", "50ad6ecc-63c4-4b56-95ba-444fcb620302", "d0d46e40-0309-4d80-b2e1-e9c5272b31ed", 
"cafea68b-8ffb-43a6-b434-c280c45ce322", "2dcb0893-d6b2-4fed-8c12-cbc4ddb9f075", "fbd3e487-2420-47d8-bdc7-ea23b38a2ccd", "eac6f42c-f257-472b-86d2-e0c03fb3cfb8", "7e808603-c960-4dab-8564-874108d415f4", "ad3f43dc-2505-49b8-83fd-091c77e5cdba", "384f67da-6d5a-43b6-8c45-d806d5203f65", "84496ded-b68e-4df2-8f05-1b1dbbe7c289", "279c06a6-4148-4c03-b502-85b1d04a6a4a", "f696ae54-dcbc-4c35-8ef7-eba903ff465a", "844fe0fa-9191-40b2-821f-e367faf7f16d", "d4954c7e-cbba-48d2-bbcf-9756fb1952d5", "033d5da4-a271-4682-850c-29c1cc0253a2", "15132757-c03b-4b52-b90e-18a4d299390c", "accc8bc9-2c67-45dc-8c29-26aafa676f2c", "c3a5d8b9-6f50-42de-8a25-229f33692b4d", "6d386487-7c9e-4a45-aa78-7042ac445b2f", "fcd85c95-2130-4169-aab7-8b0eeee043fc", "29d51903-8b11-43a7-91e2-72ec04ede6b0", "5b5d1a21-148b-4000-99c6-2e82d4a70bf1", "68729572-ae73-473c-9abf-860a7ddaa7c1", "b90c773e-8035-4f3f-be05-367a31a7113a", "49e7d1f3-cd85-4e5c-a853-70351f3eda36", "ac3ba972-bc61-4afe-be28-beccd579ef4e", "48ed97ea-0de0-45ae-a127-e6f4ba3c6d1f", "1cd68713-8171-49c8-8ca0-bf37e7f473e9", "b1caa3cd-6293-46ce-a98b-3e755cba631a", "b7f64830-c441-4ae2-8599-920f10b8ac3d", "6143932b-59c9-417b-8253-9755b8fccb8d", "5d9cc8d4-08b3-4d84-945c-856eb935ef71", "d537cbd5-d4c9-48ad-aefd-b25f77be075a", "07d67a76-0062-4d72-ba1c-c52e59b5a40a", "6354b71c-66f3-4ad2-baf4-7c92c895ada7", "fa783e88-b477-4888-8433-def63f1599ad", "4179ccb5-d87c-4bd3-aaee-b9e695a1bd82", "bb00af6e-089c-4210-b857-b96e19dea6fe", "c71a4b1c-5eee-408c-a0ad-d5e87c9ae0fa", "8d825573-de5c-4356-82dd-eef46074ce87", "53224237-2142-4474-bd41-8cbbff8a8430", "2cabe2be-b6b7-4050-8ca0-a6ace6418d49", "da022122-3f4f-4bbd-9537-554d67e4d0ea", "04aacf93-33a8-4d21-9d26-f742f3013c00", "492729e2-761c-44b7-9253-d42c3213a69f", "c149167f-cf52-45ea-ac7e-db72025cec9f", "b0d7adba-aeff-478d-9e5e-f4105e277089", "9c2d8171-9f23-446a-968c-1f2231bc26be", "455c72ed-0000-4779-98a5-761388865a27", "ae6f9e8d-c536-46bb-85b1-0c5cea1ec0c8", "c41306f8-b721-4451-a5e7-80db8fbf8e04", "e7826592-19ea-481f-8ec5-51a437614dcf", 
"017f6145-b7af-482a-8a25-86e2077c2bef", "225c698f-9dd0-4603-a313-9c8f64d89985", "d1c29060-b099-457b-8dab-dd61c62ff9fa", "c0134e07-4231-4360-84fc-99c114ef3540", "ab34dbe9-29c8-44aa-b8d8-3ea99ea97500", "914ee449-e965-4a21-a53f-db9350f4c093", "a953b1b9-3f38-4530-8950-de4bf32f87f4", "7ec290d8-e589-4846-9d6e-c333e7398299", "14c7fda0-c8e9-4feb-844e-5f99ef902c3f", "77a749b3-f99c-4235-bf90-ccb4776a0f61", "47ce6fbf-5222-460e-8102-c6426e49dcdf", "dd6f05c5-b488-409b-ba3b-0bfcb83c7cff", "bd522fb3-74ad-49cb-805a-2251fa23e753", "db4f974e-8262-434b-97d8-d0a4fab72db6", "20e6247f-54c1-4920-8878-58778a9b7734", "e22af67d-d4ae-4722-81ff-aff0efd97655", "7ec683e4-4c6f-4f29-a200-fee072116991", "38ba28ca-8f8a-4827-b0d8-abbfbc1fe068", "d171e878-6aea-4215-ab8f-9054c7400888", "f09398d9-ae36-41b3-8314-1018de865bf5", "f5aecaba-5478-4716-916f-5b37d6d74397", "96580d8f-ee85-4cc1-bd9b-58f742c01ba2", "5553a53c-2d2b-429b-bfe1-28e1409a618d", "147c5afc-a327-4f22-a7d9-5aa9596f3f3f", "cc4c7672-7f96-41a6-8c11-396228c5b2da", "60eb3a1a-6792-4ae8-9c66-fb643c1dcbe7", "8ac75efd-eb0e-4caa-80f8-9fde76b88758", "a19e1fe3-da4b-4870-8c97-767d62ad6d05", "f818d119-4090-4cee-8b1f-7a219db9c9b5", "273d9956-8d38-4fa8-9481-0f5b4c5aa6df", "d8ce1201-9262-4433-b628-a31a1070242e", "55d72b8e-5045-40cf-9f78-acfbabb79cfd", "fa7a0435-10ba-498f-b856-06802b0bbc7a", "b3ee1022-ea9b-457d-a3c9-e5916c94e703", "73bbf613-4bb6-40c8-96d5-d0bd35afb95c", "308b348e-41ad-43e5-ba91-f56ce7b36cb8", "0496712a-88d5-46ce-982c-1fd9bab0e82a", "793be3c0-8941-4bad-9532-49bf1f9087cb", "f4e0088f-c674-4ef1-bf96-2694e89b4aaf", "00efe39c-239c-41ae-a8c8-4bbb756f4e61", "6792b8f9-60ae-4551-8a82-40b2ce1e3e07", "584bfc61-1544-4d92-b622-360d6bea984f", "5c687367-29d6-4a34-897e-94e45099da58", "35dbca8d-279b-4050-955a-a2d214b41d15", "17ba2f30-6340-4a51-940e-48f12515a1a6", "cdb0cc67-89bc-4844-9bc2-973baae173b6", "04d66a72-b35a-4430-9265-519d3551e264", "06d8e8bf-593b-4f15-aef4-ac1d98dda3a0", "c4e01f5e-0ba1-4d3a-a7cc-26dc242a955e", "88f211d2-bb82-4f0f-b20a-8f7e222e5299", 
"879f121f-aa8f-483a-96ae-d41de300b1d7", "c0b38d3e-15ab-480e-8859-aa179e4a3a09", "a6048b97-445c-4ee7-a2f0-0ac319af17f4", "d8ff3a0a-e3ba-421e-8a5d-2e37396f8c10", "dc0cb51e-8a62-4eab-85f6-b9bd19ff8d2a", "7f6afbf7-9194-43b8-8f21-d2f332af799e", "a20d685a-63f5-4bef-8a37-448446592873", "d96e05f1-d1c2-4d08-a9ee-4ab3bcdaef2f", "cda745a9-7d95-4f45-bd70-1616a23908d4", "0d27699f-08c8-460f-8541-d80b71298b90", "ae6d7732-12b1-420c-8e3a-f318e5393843", "ffbacd4d-b4a1-4d4f-92b5-2c4ea6ccc2d9", "4c64f1cf-227a-40b0-a637-407b39772975", "c71d0ae4-e35b-4921-8ac4-c993f3767c4f", "5ff71b0b-ab86-4589-8320-1a91d50e23d8", "5b2be195-c26d-4651-a1bd-9b696e6a79c9", "0790221a-199c-4598-8ef6-459f3070387e", "720a85d5-136c-4106-8be3-e0310e6666c6", "f42c9ff0-8ce8-48d6-aa53-4cc2e3877769", "9648d914-2fda-4f2c-ad8c-f45e015be4e7", "6ba20e28-2f6e-48db-9e9c-0de9acd53722", "c5ad4ed3-1f4c-4d55-888e-82739f52ed41", "47e7b3aa-be98-4b52-9ea6-19fd0fb12279", "782e277d-6b01-49bc-9db6-027e2d3d0656", "aa527b75-0d57-42bd-a020-6dbb3f57d687", "109f2170-9a5b-40cf-9904-e15bd694e84e", "4e2d4698-3f6d-42c1-bba0-d8e840d951cc", "7bc23a6b-a856-467b-a6e4-7372b96d66cd", "cb6d1432-9ae3-4595-a4ea-5484b58d95d4", "3ce27bc5-91f2-4d2d-a160-a95fae4b3302", "87e5238c-79ee-405e-be94-a6d6ff8047c6", "d6b231e5-5b86-4a87-93ec-7aa7cf7dedef", "e000a831-3901-4244-80cf-5908fec96d8a", "4ae40b71-7f6d-48c8-8b51-7fda059b45f1", "dbfd3f81-ab49-47b6-b71f-02e374f8c5c0", "875a3fbb-6279-49ee-83bb-3be588196cf1"]
len(files)

235
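
Because the file IDs are pasted in by hand from the search results, it can be worth checking them before starting a long download. The following is a minimal sketch using only the standard library; `check_file_ids` and `example_ids` are hypothetical names introduced here for illustration and are not part of `hisepy`. It flags entries that are not canonical UUID strings and entries that appear more than once.

```python
import uuid
from collections import Counter


def check_file_ids(file_ids):
    """Return (malformed, duplicated) entries from a list of ID strings."""
    malformed = []
    for fid in file_ids:
        try:
            # uuid.UUID also accepts non-canonical spellings (no hyphens,
            # braces), so compare the round-tripped form with the input.
            if str(uuid.UUID(fid)) != fid.lower():
                malformed.append(fid)
        except ValueError:
            malformed.append(fid)
    duplicated = [fid for fid, count in Counter(file_ids).items() if count > 1]
    return malformed, duplicated


# Illustrative input: two IDs taken from the list above, one deliberate
# duplicate, and one deliberately broken entry.
example_ids = [
    "d78bc58c-252c-4d3a-b4cf-c2f19dadf1b4",
    "a8c872cc-3d62-4096-964d-a1382ca18352",
    "a8c872cc-3d62-4096-964d-a1382ca18352",
    "not-a-uuid",
]
malformed, duplicated = check_file_ids(example_ids)
print("malformed:", malformed)
print("duplicated:", duplicated)
```

In the notebook itself, the same function could be called as `check_file_ids(files)`; two empty lists would suggest the IDs are at least well formed and unique.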

In [5]:
downloads = hisepy.read_files(files)

## 02 Extract Sample Metadata 
Now that we have our files, we'll pull all necessary metadata. Some of it was already returned with the download, in the `descriptors` table of the `downloads` object. From there, we can get a list of all sample IDs and extract sample metadata such as age using `hisepy.read_samples`.

In [6]:
downloads['descriptors'].head()

Unnamed: 0,sample.id,sample.bridgingControl,sample.sampleKitGuid,sample.visitName,sample.visitDetails,sample.drawDate,sample.daysSinceFirstVisit,sample.diseaseStatesRecordedAtVisit,file.id,file.name,...,subject.biologicalSex,subject.birthYear,subject.ethnicity,subject.partnerCode,subject.race,subject.subjectGuid,cohort.cohortGuid,lastUpdated,labLastModified,surveyLastModified
0,a35aeceb-822f-4cfd-a925-3d271ef56392,False,KT03173,,,0001-01-01T00:00:00Z,,[],d78bc58c-252c-4d3a-b4cf-c2f19dadf1b4,automated/frna-tenx/2024-01-10T19:09:38.738495...,...,Female,0,,SF,African American,SF3001,SF4,2024-03-16T17:52:34.816Z,0001-01-01T00:00:00Z,0001-01-01T00:00:00Z
1,3ec4df4e-91c4-48e3-a1fc-d38955fe7574,False,KT03180,,,0001-01-01T00:00:00Z,,[],a8c872cc-3d62-4096-964d-a1382ca18352,automated/frna-tenx/2024-01-10T19:09:38.738495...,...,Male,0,,SF,American Indian,SF3006,SF4,2024-03-16T17:52:34.816Z,0001-01-01T00:00:00Z,0001-01-01T00:00:00Z
2,7c8fd8bc-4b69-4110-ab68-25d974bb0195,False,KT03195,,,0001-01-01T00:00:00Z,,[],d000d96f-9a38-42c0-9a14-9db86e59567e,automated/frna-tenx/2024-01-09T20:34:10.956051...,...,Male,0,,SF,Caucasian,SF3016,SF4,2024-03-16T17:52:34.816Z,0001-01-01T00:00:00Z,0001-01-01T00:00:00Z
3,0cbea8fd-9d54-4937-ba44-07294b0eb0ab,False,KT03199,,,0001-01-01T00:00:00Z,,[],a13b5b32-b912-47a8-9833-994fe816d293,automated/frna-tenx/2024-01-09T20:21:28.714278...,...,Female,0,,SF,Asian,SF3020,SF4,2024-03-16T17:52:34.816Z,0001-01-01T00:00:00Z,0001-01-01T00:00:00Z
4,fd5e9208-d9f2-45d3-8e70-d70aa9d92e4c,False,KT03203,,,0001-01-01T00:00:00Z,,[],c5495948-436e-44d6-953d-60f0b37d54de,automated/frna-tenx/2024-01-10T19:09:38.738495...,...,Female,0,,SF,American Indian,SF3024,SF4,2024-03-16T17:52:34.816Z,0001-01-01T00:00:00Z,0001-01-01T00:00:00Z


In [7]:
samples = list(downloads['descriptors']['sample.id'].unique())
sample_metadata = hisepy.read_samples(samples)

In [17]:
subject_ages = sample_metadata['metadata'][['subject.id', 'subject.ageAtEnrollment']]
complete_metadata = downloads['descriptors'].merge(subject_ages, how='inner', on='subject.id')

complete_metadata['extracted_name'] = complete_metadata['file.name'].apply(os.path.basename)
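# Raw string avoids an invalid-escape warning for \d; pattern matches IDs such as PB<digits>-<digits>
complete_metadata['sampleID'] = [re.search(r'PB\d+-\d+', file_name).group() 
                                for file_name in complete_metadata['extracted_name']]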

complete_metadata = complete_metadata.rename(columns={'subject.ageAtEnrollment':'age_enrollment',
                           'subject.ethnicity':'ethnicity',
                           'subject.biologicalSex':'biological_sex',
                           'subject.subjectGuid':'subject_guid'})

metadata_path = 'output/sf4_cohort_hise_metadata.csv'
complete_metadata.to_csv(metadata_path)
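
Two steps in the cell above can fail quietly or abruptly: an inner merge drops file rows whose subject has no match, and `re.search(...).group()` raises an error if a file name does not contain the expected ID. The following is a rough sketch of one way to surface both cases, assuming `pandas` is available. The toy tables, subject IDs, and file names are invented stand-ins for `downloads['descriptors']` and `subject_ages`, not real HISE values.

```python
import pandas as pd

# Hypothetical stand-ins for the real tables; all values are invented.
descriptors = pd.DataFrame({
    "subject.id": ["s1", "s2", "s3"],
    "file.name": [
        "some/dir/B001_PB00001-01_labeled.h5",
        "some/dir/B001_PB00002-01_labeled.h5",
        "some/dir/unexpected_name.h5",
    ],
})
subject_ages = pd.DataFrame({
    "subject.id": ["s1", "s2"],
    "subject.ageAtEnrollment": [34, 51],
})

# A left merge with indicator=True keeps every file row and records whether
# a matching subject was found; validate checks that each subject appears
# at most once in the age table.
merged = descriptors.merge(
    subject_ages,
    how="left",
    on="subject.id",
    validate="many_to_one",
    indicator=True,
)
unmatched = merged.loc[merged["_merge"] == "left_only", "subject.id"].tolist()

# str.extract yields NaN where the pattern does not match instead of raising.
merged["sampleID"] = merged["file.name"].str.extract(r"(PB\d+-\d+)", expand=False)
missing_ids = merged.loc[merged["sampleID"].isna(), "file.name"].tolist()

print("subjects without age metadata:", unmatched)
print("file names without a sample ID:", missing_ids)
```

If both lists come back empty on the real tables, the inner merge and list comprehension used above should give the same rows as this more defensive version.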

## 03 Upload results to HISE

Now we'll use `hisepy.upload.upload_files()` to send a copy of our output to HISE. Here, we'll be uploading our sample metadata for future use. The `.h5` files we downloaded will remain in our local environment to be used in the next notebook. 

In [4]:
study_space_uuid = 'de025812-5e73-4b3c-9c3b-6d0eac412f2a'
d = str(date.today())
title = f'{d} SF4 Cohort HISE Metadata [AC]'

In [10]:
metadata_path = 'output/sf4_cohort_hise_metadata.csv'
out_files = [metadata_path]
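
Since the upload prompts for confirmation, a quick check that every listed path actually exists can catch a mistyped file name beforehand. This is a small standard-library sketch; `missing_paths` is a hypothetical helper introduced here, and the demonstration writes to a temporary directory rather than the real `output` folder.

```python
import os
import tempfile


def missing_paths(paths):
    """Return the entries of `paths` that are not existing regular files."""
    return [p for p in paths if not os.path.isfile(p)]


# Demonstration in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "sf4_cohort_hise_metadata.csv")
    absent = os.path.join(tmp, "does_not_exist.csv")
    with open(present, "w") as handle:
        handle.write("placeholder\n")
    result = missing_paths([present, absent])
    absent_name = os.path.basename(result[0])

print("missing:", absent_name)
```

In this notebook the call would be `missing_paths(out_files)`, and an empty list would mean every file is in place.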

In [11]:
hisepy.upload.upload_files(
    files = out_files,
    study_space_id = study_space_uuid,
    title = title,
    input_file_ids = files
)

you are trying to upload file_ids... ['output/sf4_cohort_hise_metadata.csv']. Do you truly want to proceed?


(y/n) y


{'trace_id': 'c323c646-844e-444a-b469-02d9c16d9d7a',
 'files': ['output/sf4_cohort_hise_metadata.csv']}

In [14]:
session_info.show()