# Ingesting Transformed Data to Beacon

This notebook shows how to ingest transformed data into the Beacon instance running in the data portal.

As demonstrated in the other ETL notebooks, DuckDB can load tables from CSV or TSV files hosted at a URL.

## Step 1: Extract data to tables using DuckDB


In [1]:
%pip install duckdb

Note: you may need to restart the kernel to use updated packages.


In [None]:
import duckdb

url = "https://gasi-dataportal-20241120071209060300000001.s3.amazonaws.com/projects/ETL%20Test/metadata.csv?AWSAccessKeyId=<KEY>&Signature=<SIGNATURE>&x-amz-security-token=<TOKEN>&Expires=<EXPIRY>"
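# Open (or create) a local DuckDB database file
con = duckdb.connect(database="./ingestion-demo.db")
# Read every column as text so that type inference does not alter any values
con.execute(f"CREATE TABLE IF NOT EXISTS metadata AS SELECT * FROM read_csv('{url}', ALL_VARCHAR=TRUE)")

# Preview the loaded table as a pandas DataFrame
con.execute("SELECT * FROM metadata").fetch_df()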


Unnamed: 0,aligner,analysis_date,pipeline_name,pipeline_ref,variant_caller,vcf_sample_id,ethnicity,geographic_origin,karyotypic_sex,sex,...,sample_origin_type,tumor_progression,notes,library_layout,library_selection,library_source,library_strategy,platform,platform_model,run_date
0,bwa-0.7.8,2020-2-15,pipeline 5,Example,SoapSNP,HG00096,Congolese,United States of America,XXY,"Surgically transgendered transsexual, male-to-...",...,Capillary blood,Primary Malignant Neoplasm,,PAIRED,RANDOM,other library source,WGS,PacBio,PacBio RS II,2021-10-18
1,minimap2,2019-3-17,pipeline 1,Example,GATK4.0,HG00097,Tamils,United States of America,XXYY,"Surgically transgendered transsexual, male-to-...",...,Capillary blood,,,PAIRED,RANDOM,genomic source,WGS,Illumina,Illumina HiSeq 3000,2021-10-18
2,minimap2,2018-10-2,pipeline 5,Example,GATK4.0,HG00099,Aymara,United States of America,XXX,Transsexual,...,Cultured cells,Recurrent Malignant Neoplasm,,PAIRED,RANDOM,genomic source,WGS,NanoPore,Oxford Nanopore MinION,2021-10-18
3,bwa-0.7.8,2018-11-9,pipeline 5,Example,kmer2snp,HG00100,Onge,India,XYY,Female-to-male transsexual,...,Cultured autograft of skin,Primary Malignant Neoplasm,,PAIRED,RANDOM,other library source,WGS,NanoPore,Oxford Nanopore MinION,2021-10-18
4,bowtie,2019-5-27,pipeline 3,Example,GATK4.0,HG00101,Tamils,Africa,XXXX,Transsexual,...,Cultured autograft of skin,,,PAIRED,RANDOM,other library source,WGS,PacBio,PacBio RS II,2018-01-01
5,bwa-0.7.8,2021-11-22,pipeline 1,Example,SoapSNP,HG00102,Papuans,Argentina,XX,Female,...,Cultured autograft of skin,Primary Malignant Neoplasm,,PAIRED,RANDOM,other library source,WGS,PacBio,PacBio RS II,2018-01-01
6,bowtie,2018-1-8,pipeline 1,Example,SoapSNP,HG00103,Atacamenos,Africa,XXXY,Female-to-male transsexual,...,Capillary blood,Primary Malignant Neoplasm,,PAIRED,RANDOM,other library source,WGS,Illumina,Illumina HiSeq 3000,2021-10-18
7,minimap2,2022-3-6,pipeline 1,Example,GATK4.0,HG00105,Alacaluf,Africa,XX,"Surgically transgendered transsexual, male-to-...",...,Agar medium,,,PAIRED,RANDOM,genomic source,WGS,NanoPore,Oxford Nanopore MinION,2021-10-18
8,bowtie,2021-2-17,pipeline 2,Example,SoapSNP,HG00106,Guamians,Africa,XXXX,Female-to-male transsexual,...,Capillary blood,,,PAIRED,RANDOM,genomic source,WGS,Illumina,Illumina HiSeq 3000,2018-01-01
9,bwa-0.7.8,2019-8-13,pipeline 1,Example,SoapSNP,HG00107,Yanomama,United States of America,XXXY,Male,...,Agar medium,Primary Malignant Neoplasm,,PAIRED,RANDOM,other library source,WGS,Illumina,Illumina HiSeq 3000,2022-08-08


This only demonstrates the working principle of the notebooks. In practice, you need to apply transformation logic similar to what the other notebooks present.
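
As a rough illustration of what such a transformation might look like, the sketch below maps one row of the `metadata` table to an entry shaped like the `analyses` records of the submission used in this notebook. The helper `row_to_analysis` and the `ANALYSIS_FIELDS` mapping are hypothetical names introduced here. The sketch also assumes that each sample belongs to exactly one individual, so every ID field reuses `vcf_sample_id`, as the example data does. Rows could be obtained, for instance, with `fetch_df().to_dict("records")`.

```python
# Table column names (left) mapped to submission keys (right).
# Both sets of names are taken from the example data in this notebook.
ANALYSIS_FIELDS = {
    "aligner": "aligner",
    "analysis_date": "analysisDate",
    "pipeline_name": "pipelineName",
    "pipeline_ref": "pipelineRef",
    "variant_caller": "variantCaller",
    "vcf_sample_id": "vcfSampleId",
}


def row_to_analysis(row: dict) -> dict:
    """Build one analysis entry from a metadata row (hypothetical helper)."""
    sample_id = row["vcf_sample_id"]
    # Assumption: one sample per individual, so all IDs reuse the sample ID.
    entry = {
        "id": sample_id,
        "individualId": sample_id,
        "biosampleId": sample_id,
        "runId": sample_id,
    }
    for column, key in ANALYSIS_FIELDS.items():
        entry[key] = row[column]
    return entry


# One row written out by hand so the example runs without a database.
row = {
    "aligner": "bwa-0.7.8",
    "analysis_date": "2020-2-15",
    "pipeline_name": "pipeline 5",
    "pipeline_ref": "Example",
    "variant_caller": "SoapSNP",
    "vcf_sample_id": "HG00096",
}
analysis = row_to_analysis(row)
```

Fields that carry ontology terms (for example `sex` or `ethnicity`) need an additional lookup from label to term ID, which this sketch does not attempt.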

Assuming the transformations have been done, you will have a submission entry similar to the one below.


In [4]:
import json

submission = json.loads('''{
  "dataset": {
    "createDateTime": "2021-03-21T02:37:00-08:00",
    "dataUseConditions": "general research use",
    "dataUseConditionsVersions": "17-07-2016",
    "description": "Simulation set 1.",
    "externalUrl": "http://example.org/wiki/Main_Page",
    "info": "{}",
    "name": "Dataset with fake data",
    "updateDateTime": "2022-08-05T17:21:00+01:00",
    "version": "v1.1"
  },
  "assemblyId": "GRCH38",
  "individuals": [
    {
      "id": "HG00096",
      "ethnicity": {
        "id": "SNOMED:52075006",
        "label": "Congolese"
      },
      "geographicOrigin": {
        "id": "SNOMED:223688001",
        "label": "United States of America"
      },
      "diseases": [],
      "interventionsOrProcedures": [
        {
          "procedureCode": {
            "id": "NCIT:C79426",
            "label": "Cancer Diagnostic or Therapeutic Procedure"
          }
        },
        {
          "procedureCode": {
            "id": "NCIT:C64264",
            "label": "Imaging Biomarker Analysis"
          }
        }
      ],
      "karyotypicSex": "XXY",
      "sex": {
        "id": "SNOMED:407378000",
        "label": "Surgically transgendered transsexual, male-to-female"
      }
    },
    {
      "id": "HG00097",
      "ethnicity": {
        "id": "SNOMED:12556008",
        "label": "Tamils"
      },
      "geographicOrigin": {
        "id": "SNOMED:223688001",
        "label": "United States of America"
      },
      "diseases": [
        {
          "diseaseCode": {
            "id": "SNOMED:734099007",
            "label": "Neuroblastoma of central nervous system"
          }
        },
        {
          "diseaseCode": {
            "id": "SNOMED:135811000119107",
            "label": "Lewy body dementia with behavioral disturbance (disorder)"
          }
        },
        {
          "diseaseCode": {
            "id": "SNOMED:23853001",
            "label": "Disorder of the central nervous system"
          }
        }
      ],
      "interventionsOrProcedures": [
        {
          "procedureCode": {
            "id": "NCIT:C79426",
            "label": "Cancer Diagnostic or Therapeutic Procedure"
          }
        },
        {
          "procedureCode": {
            "id": "NCIT:C64264",
            "label": "Imaging Biomarker Analysis"
          }
        }
      ],
      "karyotypicSex": "XXYY",
      "sex": {
        "id": "SNOMED:407378000",
        "label": "Surgically transgendered transsexual, male-to-female"
      }
    },
    {
      "id": "HG00099",
      "ethnicity": {
        "id": "SNOMED:113170005",
        "label": "Aymara"
      },
      "geographicOrigin": {
        "id": "SNOMED:223688001",
        "label": "United States of America"
      },
      "diseases": [
        {
          "diseaseCode": {
            "id": "SNOMED:26929004",
            "label": "Alzheimer's disease"
          }
        },
        {
          "diseaseCode": {
            "id": "SNOMED:23853001",
            "label": "Disorder of the central nervous system"
          }
        },
        {
          "diseaseCode": {
            "id": "SNOMED:359642000",
            "label": "Diabetes mellitus type 2 in nonobese (disorder)"
          }
        }
      ],
      "interventionsOrProcedures": [
        {
          "procedureCode": {
            "id": "NCIT:C79426",
            "label": "Cancer Diagnostic or Therapeutic Procedure"
          }
        },
        {
          "procedureCode": {
            "id": "NCIT:C64263",
            "label": "Laboratory Biomarker Analysis"
          }
        }
      ],
      "karyotypicSex": "XXX",
      "sex": {
        "id": "SNOMED:407374003",
        "label": "Transsexual"
      }
    },
    {
      "id": "HG00100",
      "ethnicity": {
        "id": "SNOMED:10432001",
        "label": "Onge"
      },
      "geographicOrigin": {
        "id": "SNOMED:223600005",
        "label": "India"
      },
      "diseases": [],
      "interventionsOrProcedures": [
        {
          "procedureCode": {
            "id": "NCIT:C79426",
            "label": "Cancer Diagnostic or Therapeutic Procedure"
          }
        },
        {
          "procedureCode": {
            "id": "NCIT:C64264",
            "label": "Imaging Biomarker Analysis"
          }
        }
      ],
      "karyotypicSex": "XYY",
      "sex": {
        "id": "SNOMED:407377005",
        "label": "Female-to-male transsexual"
      }
    },
    {
      "id": "HG00101",
      "ethnicity": {
        "id": "SNOMED:12556008",
        "label": "Tamils"
      },
      "geographicOrigin": {
        "id": "SNOMED:223498002",
        "label": "Africa"
      },
      "diseases": [
        {
          "diseaseCode": {
            "id": "SNOMED:254955001",
            "label": "Pituitary carcinoma"
          }
        }
      ],
      "interventionsOrProcedures": [
        {
          "procedureCode": {
            "id": "NCIT:C64263",
            "label": "Laboratory Biomarker Analysis"
          }
        },
        {
          "procedureCode": {
            "id": "NCIT:C64264",
            "label": "Imaging Biomarker Analysis"
          }
        }
      ],
      "karyotypicSex": "XXXX",
      "sex": {
        "id": "SNOMED:407374003",
        "label": "Transsexual"
      }
    },
    {
      "id": "HG00102",
      "ethnicity": {
        "id": "SNOMED:17789004",
        "label": "Papuans"
      },
      "geographicOrigin": {
        "id": "SNOMED:223713009",
        "label": "Argentina"
      },
      "diseases": [
        {
          "diseaseCode": {
            "id": "SNOMED:56265001",
            "label": "Heart disease (disorder)"
          }
        }
      ],
      "interventionsOrProcedures": [
        {
          "procedureCode": {
            "id": "NCIT:C93025",
            "label": "Serum Tumor Marker Test"
          }
        }
      ],
      "karyotypicSex": "XX",
      "sex": {
        "id": "SNOMED:248152002",
        "label": "Female"
      }
    },
    {
      "id": "HG00103",
      "ethnicity": {
        "id": "SNOMED:77502007",
        "label": "Atacamenos"
      },
      "geographicOrigin": {
        "id": "SNOMED:223498002",
        "label": "Africa"
      },
      "diseases": [],
      "interventionsOrProcedures": [
        {
          "procedureCode": {
            "id": "NCIT:C79426",
            "label": "Cancer Diagnostic or Therapeutic Procedure"
          }
        }
      ],
      "karyotypicSex": "XXXY",
      "sex": {
        "id": "SNOMED:407377005",
        "label": "Female-to-male transsexual"
      }
    },
    {
      "id": "HG00105",
      "ethnicity": {
        "id": "SNOMED:89026003",
        "label": "Alacaluf"
      },
      "geographicOrigin": {
        "id": "SNOMED:223498002",
        "label": "Africa"
      },
      "diseases": [
        {
          "diseaseCode": {
            "id": "SNOMED:359642000",
            "label": "Diabetes mellitus type 2 in nonobese (disorder)"
          }
        },
        {
          "diseaseCode": {
            "id": "SNOMED:312991009",
            "label": "Senile dementia of the Lewy body type (disorder)"
          }
        },
        {
          "diseaseCode": {
            "id": "SNOMED:81531005",
            "label": "Diabetes mellitus type 2 in obese (disorder)"
          }
        }
      ],
      "interventionsOrProcedures": [
        {
          "procedureCode": {
            "id": "NCIT:C64263",
            "label": "Laboratory Biomarker Analysis"
          }
        }
      ],
      "karyotypicSex": "XX",
      "sex": {
        "id": "SNOMED:407378000",
        "label": "Surgically transgendered transsexual, male-to-female"
      }
    },
    {
      "id": "HG00106",
      "ethnicity": {
        "id": "SNOMED:10292001",
        "label": "Guamians"
      },
      "geographicOrigin": {
        "id": "SNOMED:223498002",
        "label": "Africa"
      },
      "diseases": [
        {
          "diseaseCode": {
            "id": "SNOMED:26929004",
            "label": "Alzheimer's disease"
          }
        },
        {
          "diseaseCode": {
            "id": "SNOMED:81531005",
            "label": "Diabetes mellitus type 2 in obese (disorder)"
          }
        },
        {
          "diseaseCode": {
            "id": "SNOMED:135811000119107",
            "label": "Lewy body dementia with behavioral disturbance (disorder)"
          }
        }
      ],
      "interventionsOrProcedures": [],
      "karyotypicSex": "XXXX",
      "sex": {
        "id": "SNOMED:407377005",
        "label": "Female-to-male transsexual"
      }
    },
    {
      "id": "HG00107",
      "ethnicity": {
        "id": "SNOMED:76460008",
        "label": "Yanomama"
      },
      "geographicOrigin": {
        "id": "SNOMED:223688001",
        "label": "United States of America"
      },
      "diseases": [],
      "interventionsOrProcedures": [
        {
          "procedureCode": {
            "id": "NCIT:C64264",
            "label": "Imaging Biomarker Analysis"
          }
        }
      ],
      "karyotypicSex": "XXXY",
      "sex": {
        "id": "SNOMED:248153007",
        "label": "Male"
      }
    }
  ],
  "biosamples": [
    {
      "id": "HG00096",
      "individualId": "HG00096",
      "biosampleStatus": {
        "id": "SNOMED:365641003",
        "label": "Minor blood groups - finding"
      },
      "collectionDate": "2019-04-23",
      "collectionMoment": "P32Y6M1D",
      "histologicalDiagnosis": {
        "id": "SNOMED:719046005",
        "label": "12q14 microdeletion syndrome"
      },
      "obtentionProcedure": {
        "procedureCode": {
          "id": "NCIT:C157179",
          "label": "FGFR1 Mutation Analysis"
        }
      },
      "pathologicalTnmFinding": [
        {
          "id": "NCIT:C48725",
          "label": "T2a Stage Finding"
        }
      ],
      "sampleOriginDetail": {
        "id": "SNOMED:258497007",
        "label": "Abscess swab"
      },
      "sampleOriginType": {
        "id": "SNOMED:31675002",
        "label": "Capillary blood"
      },
      "tumorProgression": {
        "id": "NCIT:C84509",
        "label": "Primary Malignant Neoplasm"
      },
      "info": {},
      "notes": ""
    },
    {
      "id": "HG00097",
      "individualId": "HG00097",
      "biosampleStatus": {
        "id": "SNOMED:702782002",
        "label": "Mitochondrial 1555 A to G mutation positive"
      },
      "collectionDate": "2022-04-23",
      "collectionMoment": "P32Y6M1D",
      "histologicalDiagnosis": {
        "id": "SNOMED:771439009",
        "label": "14q22q23 microdeletion syndrome"
      },
      "obtentionProcedure": {
        "procedureCode": {
          "id": "",
          "label": ""
        }
      },
      "pathologicalTnmFinding": [
        {
          "id": "NCIT:C48699",
          "label": "M0 Stage Finding"
        }
      ],
      "sampleOriginDetail": {
        "id": "SNOMED:734336008",
        "label": "Specimen from aorta"
      },
      "sampleOriginType": {
        "id": "SNOMED:31675002",
        "label": "Capillary blood"
      },
      "tumorProgression": {
        "id": "",
        "label": ""
      },
      "info": {},
      "notes": ""
    },
    {
      "id": "HG00099",
      "individualId": "HG00099",
      "biosampleStatus": {
        "id": "SNOMED:702782002",
        "label": "Mitochondrial 1555 A to G mutation positive"
      },
      "collectionDate": "2021-04-23",
      "collectionMoment": "P32Y6M1D",
      "histologicalDiagnosis": {
        "id": "SNOMED:771439009",
        "label": "14q22q23 microdeletion syndrome"
      },
      "obtentionProcedure": {
        "procedureCode": {
          "id": "",
          "label": ""
        }
      },
      "pathologicalTnmFinding": [
        {
          "id": "NCIT:C48725",
          "label": "T2a Stage Finding"
        }
      ],
      "sampleOriginDetail": {
        "id": "",
        "label": ""
      },
      "sampleOriginType": {
        "id": "SNOMED:702451000",
        "label": "Cultured cells"
      },
      "tumorProgression": {
        "id": "NCIT:C4813",
        "label": "Recurrent Malignant Neoplasm"
      },
      "info": {},
      "notes": ""
    },
    {
      "id": "HG00100",
      "individualId": "HG00100",
      "biosampleStatus": {
        "id": "SNOMED:365641003",
        "label": "Minor blood groups - finding"
      },
      "collectionDate": "2021-04-23",
      "collectionMoment": "P7D",
      "histologicalDiagnosis": {
        "id": "SNOMED:771439009",
        "label": "14q22q23 microdeletion syndrome"
      },
      "obtentionProcedure": {
        "procedureCode": {
          "id": "NCIT:C157179",
          "label": "FGFR1 Mutation Analysis"
        }
      },
      "pathologicalTnmFinding": [
        {
          "id": "NCIT:C48725",
          "label": "T2a Stage Finding"
        }
      ],
      "sampleOriginDetail": {
        "id": "SNOMED:258603007",
        "label": "Respiratory specimen"
      },
      "sampleOriginType": {
        "id": "SNOMED:782814004",
        "label": "Cultured autograft of skin"
      },
      "tumorProgression": {
        "id": "NCIT:C84509",
        "label": "Primary Malignant Neoplasm"
      },
      "info": {},
      "notes": ""
    },
    {
      "id": "HG00101",
      "individualId": "HG00101",
      "biosampleStatus": {
        "id": "SNOMED:310294002",
        "label": "Mitochondrial antibodies positive"
      },
      "collectionDate": "2022-04-23",
      "collectionMoment": "P7D",
      "histologicalDiagnosis": {
        "id": "SNOMED:362965005",
        "label": "Disorder of body system (disorder)"
      },
      "obtentionProcedure": {
        "procedureCode": {
          "id": "",
          "label": ""
        }
      },
      "pathologicalTnmFinding": [
        {
          "id": "NCIT:C48725",
          "label": "T2a Stage Finding"
        }
      ],
      "sampleOriginDetail": {
        "id": "SNOMED:258500001",
        "label": "Nasopharyngeal swab"
      },
      "sampleOriginType": {
        "id": "SNOMED:782814004",
        "label": "Cultured autograft of skin"
      },
      "tumorProgression": {
        "id": "",
        "label": ""
      },
      "info": {},
      "notes": ""
    },
    {
      "id": "HG00102",
      "individualId": "HG00102",
      "biosampleStatus": {
        "id": "SNOMED:276447000",
        "label": "Mite present"
      },
      "collectionDate": "2018-04-23",
      "collectionMoment": "P32Y6M1D",
      "histologicalDiagnosis": {
        "id": "SNOMED:719046005",
        "label": "12q14 microdeletion syndrome"
      },
      "obtentionProcedure": {
        "procedureCode": {
          "id": "NCIT:C15189",
          "label": "biopsy"
        }
      },
      "pathologicalTnmFinding": [
        {
          "id": "",
          "label": ""
        }
      ],
      "sampleOriginDetail": {
        "id": "",
        "label": ""
      },
      "sampleOriginType": {
        "id": "SNOMED:782814004",
        "label": "Cultured autograft of skin"
      },
      "tumorProgression": {
        "id": "NCIT:C84509",
        "label": "Primary Malignant Neoplasm"
      },
      "info": {},
      "notes": ""
    },
    {
      "id": "HG00103",
      "individualId": "HG00103",
      "biosampleStatus": {
        "id": "SNOMED:310294002",
        "label": "Mitochondrial antibodies positive"
      },
      "collectionDate": "2021-04-23",
      "collectionMoment": "P32Y6M1D",
      "histologicalDiagnosis": {
        "id": "SNOMED:237592006",
        "label": "Abnormality of bombesin secretion"
      },
      "obtentionProcedure": {
        "procedureCode": {
          "id": "",
          "label": ""
        }
      },
      "pathologicalTnmFinding": [
        {
          "id": "NCIT:C48699",
          "label": "M0 Stage Finding"
        }
      ],
      "sampleOriginDetail": {
        "id": "SNOMED:734336008",
        "label": "Specimen from aorta"
      },
      "sampleOriginType": {
        "id": "SNOMED:31675002",
        "label": "Capillary blood"
      },
      "tumorProgression": {
        "id": "NCIT:C84509",
        "label": "Primary Malignant Neoplasm"
      },
      "info": {},
      "notes": ""
    },
    {
      "id": "HG00105",
      "individualId": "HG00105",
      "biosampleStatus": {
        "id": "SNOMED:702782002",
        "label": "Mitochondrial 1555 A to G mutation positive"
      },
      "collectionDate": "2015-04-23",
      "collectionMoment": "P32Y6M1D",
      "histologicalDiagnosis": {
        "id": "SNOMED:237592006",
        "label": "Abnormality of bombesin secretion"
      },
      "obtentionProcedure": {
        "procedureCode": {
          "id": "",
          "label": ""
        }
      },
      "pathologicalTnmFinding": [
        {
          "id": "",
          "label": ""
        }
      ],
      "sampleOriginDetail": {
        "id": "SNOMED:385338007",
        "label": "Specimen from anus obtained by transanal disk excision"
      },
      "sampleOriginType": {
        "id": "SNOMED:422236008",
        "label": "Agar medium"
      },
      "tumorProgression": {
        "id": "",
        "label": ""
      },
      "info": {},
      "notes": ""
    },
    {
      "id": "HG00106",
      "individualId": "HG00106",
      "biosampleStatus": {
        "id": "SNOMED:310293008",
        "label": "Mitochondrial antibodies negative"
      },
      "collectionDate": "2018-04-23",
      "collectionMoment": "P32Y6M1D",
      "histologicalDiagnosis": {
        "id": "SNOMED:771439009",
        "label": "14q22q23 microdeletion syndrome"
      },
      "obtentionProcedure": {
        "procedureCode": {
          "id": "",
          "label": ""
        }
      },
      "pathologicalTnmFinding": [
        {
          "id": "NCIT:C48709",
          "label": "N1c Stage Finding"
        }
      ],
      "sampleOriginDetail": {
        "id": "",
        "label": ""
      },
      "sampleOriginType": {
        "id": "SNOMED:31675002",
        "label": "Capillary blood"
      },
      "tumorProgression": {
        "id": "",
        "label": ""
      },
      "info": {},
      "notes": ""
    },
    {
      "id": "HG00107",
      "individualId": "HG00107",
      "biosampleStatus": {
        "id": "SNOMED:365641003",
        "label": "Minor blood groups - finding"
      },
      "collectionDate": "2022-04-23",
      "collectionMoment": "P7D",
      "histologicalDiagnosis": {
        "id": "SNOMED:719046005",
        "label": "12q14 microdeletion syndrome"
      },
      "obtentionProcedure": {
        "procedureCode": {
          "id": "",
          "label": ""
        }
      },
      "pathologicalTnmFinding": [
        {
          "id": "NCIT:C48709",
          "label": "N1c Stage Finding"
        }
      ],
      "sampleOriginDetail": {
        "id": "",
        "label": ""
      },
      "sampleOriginType": {
        "id": "SNOMED:422236008",
        "label": "Agar medium"
      },
      "tumorProgression": {
        "id": "NCIT:C84509",
        "label": "Primary Malignant Neoplasm"
      },
      "info": {},
      "notes": ""
    }
  ],
  "runs": [
    {
      "id": "HG00096",
      "biosampleId": "HG00096",
      "individualId": "HG00096",
      "libraryLayout": "PAIRED",
      "librarySelection": "RANDOM",
      "librarySource": {
        "id": "GENEPIO:0001969",
        "label": "other library source"
      },
      "libraryStrategy": "WGS",
      "platform": "PacBio",
      "platformModel": {
        "id": "OBI:0002012",
        "label": "PacBio RS II"
      },
      "runDate": "2021-10-18"
    },
    {
      "id": "HG00097",
      "biosampleId": "HG00097",
      "individualId": "HG00097",
      "libraryLayout": "PAIRED",
      "librarySelection": "RANDOM",
      "librarySource": {
        "id": "GENEPIO:0001966",
        "label": "genomic source"
      },
      "libraryStrategy": "WGS",
      "platform": "Illumina",
      "platformModel": {
        "id": "OBI:0002048",
        "label": "Illumina HiSeq 3000"
      },
      "runDate": "2021-10-18"
    },
    {
      "id": "HG00099",
      "biosampleId": "HG00099",
      "individualId": "HG00099",
      "libraryLayout": "PAIRED",
      "librarySelection": "RANDOM",
      "librarySource": {
        "id": "GENEPIO:0001966",
        "label": "genomic source"
      },
      "libraryStrategy": "WGS",
      "platform": "NanoPore",
      "platformModel": {
        "id": "OBI:0002750",
        "label": "Oxford Nanopore MinION"
      },
      "runDate": "2021-10-18"
    },
    {
      "id": "HG00100",
      "biosampleId": "HG00100",
      "individualId": "HG00100",
      "libraryLayout": "PAIRED",
      "librarySelection": "RANDOM",
      "librarySource": {
        "id": "GENEPIO:0001969",
        "label": "other library source"
      },
      "libraryStrategy": "WGS",
      "platform": "NanoPore",
      "platformModel": {
        "id": "OBI:0002750",
        "label": "Oxford Nanopore MinION"
      },
      "runDate": "2021-10-18"
    },
    {
      "id": "HG00101",
      "biosampleId": "HG00101",
      "individualId": "HG00101",
      "libraryLayout": "PAIRED",
      "librarySelection": "RANDOM",
      "librarySource": {
        "id": "GENEPIO:0001969",
        "label": "other library source"
      },
      "libraryStrategy": "WGS",
      "platform": "PacBio",
      "platformModel": {
        "id": "OBI:0002012",
        "label": "PacBio RS II"
      },
      "runDate": "2018-01-01"
    },
    {
      "id": "HG00102",
      "biosampleId": "HG00102",
      "individualId": "HG00102",
      "libraryLayout": "PAIRED",
      "librarySelection": "RANDOM",
      "librarySource": {
        "id": "GENEPIO:0001969",
        "label": "other library source"
      },
      "libraryStrategy": "WGS",
      "platform": "PacBio",
      "platformModel": {
        "id": "OBI:0002012",
        "label": "PacBio RS II"
      },
      "runDate": "2018-01-01"
    },
    {
      "id": "HG00103",
      "biosampleId": "HG00103",
      "individualId": "HG00103",
      "libraryLayout": "PAIRED",
      "librarySelection": "RANDOM",
      "librarySource": {
        "id": "GENEPIO:0001969",
        "label": "other library source"
      },
      "libraryStrategy": "WGS",
      "platform": "Illumina",
      "platformModel": {
        "id": "OBI:0002048",
        "label": "Illumina HiSeq 3000"
      },
      "runDate": "2021-10-18"
    },
    {
      "id": "HG00105",
      "biosampleId": "HG00105",
      "individualId": "HG00105",
      "libraryLayout": "PAIRED",
      "librarySelection": "RANDOM",
      "librarySource": {
        "id": "GENEPIO:0001966",
        "label": "genomic source"
      },
      "libraryStrategy": "WGS",
      "platform": "NanoPore",
      "platformModel": {
        "id": "OBI:0002750",
        "label": "Oxford Nanopore MinION"
      },
      "runDate": "2021-10-18"
    },
    {
      "id": "HG00106",
      "biosampleId": "HG00106",
      "individualId": "HG00106",
      "libraryLayout": "PAIRED",
      "librarySelection": "RANDOM",
      "librarySource": {
        "id": "GENEPIO:0001966",
        "label": "genomic source"
      },
      "libraryStrategy": "WGS",
      "platform": "Illumina",
      "platformModel": {
        "id": "OBI:0002048",
        "label": "Illumina HiSeq 3000"
      },
      "runDate": "2018-01-01"
    },
    {
      "id": "HG00107",
      "biosampleId": "HG00107",
      "individualId": "HG00107",
      "libraryLayout": "PAIRED",
      "librarySelection": "RANDOM",
      "librarySource": {
        "id": "GENEPIO:0001969",
        "label": "other library source"
      },
      "libraryStrategy": "WGS",
      "platform": "Illumina",
      "platformModel": {
        "id": "OBI:0002048",
        "label": "Illumina HiSeq 3000"
      },
      "runDate": "2022-08-08"
    }
  ],
  "analyses": [
    {
      "id": "HG00096",
      "individualId": "HG00096",
      "biosampleId": "HG00096",
      "runId": "HG00096",
      "aligner": "bwa-0.7.8",
      "analysisDate": "2020-2-15",
      "pipelineName": "pipeline 5",
      "pipelineRef": "Example",
      "variantCaller": "SoapSNP",
      "vcfSampleId": "HG00096"
    },
    {
      "id": "HG00097",
      "individualId": "HG00097",
      "biosampleId": "HG00097",
      "runId": "HG00097",
      "aligner": "minimap2",
      "analysisDate": "2019-3-17",
      "pipelineName": "pipeline 1",
      "pipelineRef": "Example",
      "variantCaller": "GATK4.0",
      "vcfSampleId": "HG00097"
    },
    {
      "id": "HG00099",
      "individualId": "HG00099",
      "biosampleId": "HG00099",
      "runId": "HG00099",
      "aligner": "minimap2",
      "analysisDate": "2018-10-2",
      "pipelineName": "pipeline 5",
      "pipelineRef": "Example",
      "variantCaller": "GATK4.0",
      "vcfSampleId": "HG00099"
    },
    {
      "id": "HG00100",
      "individualId": "HG00100",
      "biosampleId": "HG00100",
      "runId": "HG00100",
      "aligner": "bwa-0.7.8",
      "analysisDate": "2018-11-9",
      "pipelineName": "pipeline 5",
      "pipelineRef": "Example",
      "variantCaller": "kmer2snp",
      "vcfSampleId": "HG00100"
    },
    {
      "id": "HG00101",
      "individualId": "HG00101",
      "biosampleId": "HG00101",
      "runId": "HG00101",
      "aligner": "bowtie",
      "analysisDate": "2019-5-27",
      "pipelineName": "pipeline 3",
      "pipelineRef": "Example",
      "variantCaller": "GATK4.0",
      "vcfSampleId": "HG00101"
    },
    {
      "id": "HG00102",
      "individualId": "HG00102",
      "biosampleId": "HG00102",
      "runId": "HG00102",
      "aligner": "bwa-0.7.8",
      "analysisDate": "2021-11-22",
      "pipelineName": "pipeline 1",
      "pipelineRef": "Example",
      "variantCaller": "SoapSNP",
      "vcfSampleId": "HG00102"
    },
    {
      "id": "HG00103",
      "individualId": "HG00103",
      "biosampleId": "HG00103",
      "runId": "HG00103",
      "aligner": "bowtie",
      "analysisDate": "2018-1-8",
      "pipelineName": "pipeline 1",
      "pipelineRef": "Example",
      "variantCaller": "SoapSNP",
      "vcfSampleId": "HG00103"
    },
    {
      "id": "HG00105",
      "individualId": "HG00105",
      "biosampleId": "HG00105",
      "runId": "HG00105",
      "aligner": "minimap2",
      "analysisDate": "2022-3-6",
      "pipelineName": "pipeline 1",
      "pipelineRef": "Example",
      "variantCaller": "GATK4.0",
      "vcfSampleId": "HG00105"
    },
    {
      "id": "HG00106",
      "individualId": "HG00106",
      "biosampleId": "HG00106",
      "runId": "HG00106",
      "aligner": "bowtie",
      "analysisDate": "2021-2-17",
      "pipelineName": "pipeline 2",
      "pipelineRef": "Example",
      "variantCaller": "SoapSNP",
      "vcfSampleId": "HG00106"
    },
    {
      "id": "HG00107",
      "individualId": "HG00107",
      "biosampleId": "HG00107",
      "runId": "HG00107",
      "aligner": "bwa-0.7.8",
      "analysisDate": "2019-8-13",
      "pipelineName": "pipeline 1",
      "pipelineRef": "Example",
      "variantCaller": "SoapSNP",
      "vcfSampleId": "HG00107"
    }
  ]
}'''
)
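
The records in the submission refer to each other by ID: biosamples point to individuals, runs to biosamples, and analyses to runs. A quick local check for dangling references can catch transformation mistakes before uploading. The following is only a sketch: `find_dangling_references` is a hypothetical helper, and the set of links it checks is inferred from the field names in the example above, not from a Beacon validation rule.

```python
def find_dangling_references(submission: dict) -> list[str]:
    """Return messages for ID references that point at missing records."""
    ids = {
        section: {record["id"] for record in submission.get(section, [])}
        for section in ("individuals", "biosamples", "runs")
    }
    # (section, reference field, section the field is expected to point to)
    links = [
        ("biosamples", "individualId", "individuals"),
        ("runs", "biosampleId", "biosamples"),
        ("runs", "individualId", "individuals"),
        ("analyses", "runId", "runs"),
        ("analyses", "biosampleId", "biosamples"),
        ("analyses", "individualId", "individuals"),
    ]
    problems = []
    for section, field, target in links:
        for record in submission.get(section, []):
            if record.get(field) not in ids[target]:
                problems.append(
                    f"{section}/{record['id']}: "
                    f"{field}={record.get(field)!r} not found in {target}"
                )
    return problems


# Small stand-in with one deliberate mistake (runId points at a missing run).
example = {
    "individuals": [{"id": "HG00096"}],
    "biosamples": [{"id": "HG00096", "individualId": "HG00096"}],
    "runs": [
        {"id": "HG00096", "biosampleId": "HG00096", "individualId": "HG00096"}
    ],
    "analyses": [
        {
            "id": "HG00096",
            "runId": "HG00097",
            "biosampleId": "HG00096",
            "individualId": "HG00096",
        }
    ],
}
problems = find_dangling_references(example)
```

In the notebook, calling `find_dangling_references(submission)` on the full object should return an empty list when all references resolve.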

You can use the Python `requests` library to submit this data to the project, using the presigned URL code that the project admin has given you. You only need to run the code as provided; the file is then uploaded to the appropriate project automatically.
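
The upload script in this notebook reads the submission from a JSON file on disk, so the `submission` object has to be written out first. A minimal sketch of that step follows. `write_submission` is a hypothetical helper, and the demo writes a small stand-in object to a temporary location so that the example runs on its own.

```python
import json
import tempfile
from pathlib import Path


def write_submission(submission: dict, path) -> Path:
    """Serialise a submission dictionary to a JSON file and return its path."""
    out = Path(path)
    with out.open("w", encoding="utf-8") as handle:
        json.dump(submission, handle, indent=2)
    return out


# Stand-in object; in the notebook, pass the full `submission` instead.
example = {
    "dataset": {"name": "Dataset with fake data", "version": "v1.1"},
    "assemblyId": "GRCH38",
    "individuals": [],
}
demo_path = Path(tempfile.gettempdir()) / "submission-example.json"
written = write_submission(example, demo_path)
```

In this notebook the call would be `write_submission(submission, "./submission-data-dictionary.json")`, which matches the `file_path` used by the upload script.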


In [5]:
%pip install requests

Note: you may need to restart the kernel to use updated packages.


## Example Upload Script Provided by Project Admin


In [None]:
import requests

presigned_post = {
  "url": "<URL>",
  "fields": {
    "key": "<KEY>",
    "bucket": "<BUCKET>",
    "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
    "X-Amz-Credential": "<CREDENTIAL>",
    "X-Amz-Date": "<DATE>",
    "X-Amz-Security-Token": "<TOKEN>",
    "Policy": "<POLICY>",
    "X-Amz-Signature": "<SIGNATURE>"
  }
}

file_path = "./submission-data-dictionary.json"

# Send the presigned fields as form data and the file as the final part
with open(file_path, "rb") as file:
    # Make the POST request
    response = requests.post(
        presigned_post["url"],
        data=presigned_post["fields"],  # Include all presigned POST fields
        files={"file": file},  # Add the file
    )

# Check the response
if response.status_code == 204:
    print("File uploaded successfully!")
else:
    print("Failed to upload file!")
    print(f"Status Code: {response.status_code}")
    print(f"Response: {response.text}")

File uploaded successfully!
