# How to use a cohort

This notebook shows how to load a cohort saved from Data Explorer and join its participants against a BigQuery table.

It uses the `1000g_americans` cohort saved in the [Terra Notebooks Playground workspace](https://app.terra.bio/#workspaces/help-gatk/Terra%20Notebooks%20Playground/data).

## Setup

In [None]:
import firecloud.api as fapi
import pandas as pd

## Retrieve cohort SQL query

In [None]:
# Hard-code the workspace instead of using WORKSPACE_NAMESPACE/WORKSPACE_NAME,
# since other workspaces won't have the 1000g_americans cohort.
ws_namespace = "help-gatk"
ws_name = "Terra Notebooks Playground"
cohort_query = fapi.get_entity(
    ws_namespace, ws_name, "cohort", "1000g_americans"
).json()["attributes"]["query"]
cohort_query
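
The chained lookup above raises a bare `KeyError` if the response has no `query` attribute, for example when the entity name is misspelled and an error payload comes back. A minimal sketch of a more defensive lookup follows. `extract_cohort_query` is a hypothetical helper, and the sample payload is made up to mimic the `attributes`/`query` shape used above; it is not a real API response.

```python
def extract_cohort_query(entity: dict) -> str:
    """Return the SQL query stored on a cohort entity dict.

    Hypothetical helper: assumes the entity JSON has the
    {"attributes": {"query": ...}} shape used in the cell above.
    """
    attributes = entity.get("attributes")
    if not isinstance(attributes, dict) or "query" not in attributes:
        raise ValueError(
            "Entity has no 'query' attribute; got keys: %s" % sorted(entity)
        )
    return attributes["query"]


# Made-up payload for illustration only.
example_entity = {
    "name": "1000g_americans",
    "entityType": "cohort",
    "attributes": {"query": "SELECT 1 AS participant_id"},
}
example_query = extract_cohort_query(example_entity)
print(example_query)
```

In the notebook, the same helper could be applied to the parsed response, e.g. `extract_cohort_query(response.json())`.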

## Create pandas dataframe of cohort participant ids

In [None]:
participant_ids = pd.read_gbq(cohort_query, dialect="standard")
participant_ids.head()
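
Before joining, it can help to check the id column for missing or duplicated values, since either would distort row counts after a join. The sketch below runs on a toy dataframe; the column name `participant_id`, the ids, and the helper `summarize_ids` are assumptions for illustration, not part of the cohort's actual schema.

```python
import pandas as pd

# Toy stand-in for the dataframe returned by the cohort query.
# The column name "participant_id" and the ids are made up.
toy_ids = pd.DataFrame({"participant_id": ["P1", "P2", "P2", None]})


def summarize_ids(df: pd.DataFrame, column: str) -> dict:
    """Count rows, missing ids, and repeated ids in one column."""
    ids = df[column]
    return {
        "rows": len(df),
        "missing": int(ids.isna().sum()),
        # Count repeats among the non-missing ids only.
        "duplicated": int(ids.dropna().duplicated().sum()),
    }


id_summary = summarize_ids(toy_ids, "participant_id")
print(id_summary)
```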

## See what tables are available to join against

In [None]:
bq_table_entities = fapi.get_entities(ws_namespace, ws_name, "BigQuery_table").json()
bq_tables = [e["attributes"]["table_name"] for e in bq_table_entities]
bq_tables
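
A listed table can also be joined against the cohort inside BigQuery rather than in pandas, by wrapping the cohort query as a subquery. The sketch below only builds the SQL string. `build_cohort_join_query` is a hypothetical helper, the table names are placeholders, and it assumes both the cohort query and the target table expose the same key column (here `participant_id`).

```python
def build_cohort_join_query(
    cohort_query: str, table: str, key: str = "participant_id"
) -> str:
    """Wrap the cohort query as a subquery and join it to `table` on `key`.

    Assumes `key` is a column in both the cohort query result and `table`.
    """
    # Drop surrounding whitespace and any trailing semicolon so the
    # cohort query is valid inside parentheses.
    cohort_sql = cohort_query.strip().rstrip(";")
    return (
        f"SELECT t.*\n"
        f"FROM `{table}` AS t\n"
        f"JOIN ({cohort_sql}) AS c\n"
        f"ON t.{key} = c.{key}"
    )


# Placeholder query and table names, for illustration only.
toy_cohort_query = "SELECT participant_id FROM `my-project.my_dataset.participants`;"
joined_sql = build_cohort_join_query(
    toy_cohort_query, "my-project.my_dataset.sample_info"
)
print(joined_sql)
```

The resulting string could then be passed to `pd.read_gbq(joined_sql, dialect="standard")`, so that only the cohort's rows are downloaded.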

## Join cohort participant ids against sample_info table

In [None]:
sample_info = pd.read_gbq(
    "SELECT * FROM `verily-public-data.human_genome_variants.1000_genomes_sample_info`",
    dialect="standard",
)
print("sample_info has %d rows" % len(sample_info.index))

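# Match rows on the participant id column rather than on the row index.
# Assumes both dataframes have a "participant_id" column.
sample_info_americans = participant_ids.merge(sample_info, on="participant_id", how="inner")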
print("sample_info_americans has %d rows\n" % len(sample_info_americans.index))

sample_info_americans.head()
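
`DataFrame.join` without an `on` argument pairs rows by index position, whereas `DataFrame.merge(on=...)` pairs rows whose key values match. The toy example below illustrates the difference; the ids, the `population` values, and the `participant_id` column name are made up for illustration.

```python
import pandas as pd

# Toy data; ids and column names are made up for illustration.
cohort = pd.DataFrame({"participant_id": ["P3", "P1"]})
info = pd.DataFrame(
    {"participant_id": ["P1", "P2", "P3"], "population": ["A", "B", "C"]}
)

# Index-based join: row 0 pairs with row 0, row 1 with row 1,
# regardless of the ids in each row.
by_index = cohort.join(info, lsuffix="_L", rsuffix="_R")

# Key-based merge: rows pair up where participant_id values match.
by_key = cohort.merge(info, on="participant_id", how="inner")

# Count rows where the index-based join paired two different participants.
mismatched = int(
    (by_index["participant_id_L"] != by_index["participant_id_R"]).sum()
)
print("index join mismatches:", mismatched)
print(by_key)
```

Comparing the `_L` and `_R` id columns, as done for `mismatched`, is a quick way to spot an index-based join that has paired unrelated rows.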

# Provenance

In [None]:
import datetime

print(datetime.datetime.now())
!pip3 freeze
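
`pip3 freeze` lists every installed package. When only a few versions matter, a shorter record can be built with the standard library's `importlib.metadata`. This is a sketch: `package_versions` is a hypothetical helper, and the distribution names passed to it are assumptions about what this notebook depends on.

```python
import datetime
from importlib import metadata


def package_versions(names):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names below are assumptions about what this notebook uses.
provenance = {
    "timestamp": datetime.datetime.now().isoformat(),
    "packages": package_versions(["firecloud", "pandas", "pandas-gbq"]),
}
print(provenance)
```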

Copyright 2019 The Broad Institute, Inc., Verily Life Sciences, LLC All rights reserved.

This software may be modified and distributed under the terms of the BSD license. See the LICENSE file for details.