# PhenEx Study Tutorial
On this page we show you how to use PhenEx to:
1. Connect to a Snowflake Database
2. Work with OMOP data
3. Create a simple cohort
4. View cohort summary statistics

First, make sure that your PhenEx version is up to date.

In [None]:
# For updating PhenEx to latest released version
!pip install -Uq PhenEx

In [None]:
import ibis
ibis.options.interactive = True

## Set Snowflake Credentials
PhenEx needs to connect to a Snowflake backend and therefore needs your login credentials. There are two ways to provide them: (1) explicitly, or (2) using a .env (dotenv) file. We show both, but use only one!
### Method 1:

In [None]:
# import os

# # authentication
# os.environ.update({
#     'SNOWFLAKE_ACCOUNT':'ACCOUNT',
#     'SNOWFLAKE_WAREHOUSE':'WAREHOUSE',
#     'SNOWFLAKE_ROLE':'ROLE',
#     'SNOWFLAKE_USER':'USER',
# })

### Method 2:
You can also specify these using a dotenv file (https://github.com/motdotla/dotenv). One advantage of doing this is that sensitive credentials stay out of your Jupyter notebook.

In [None]:
from dotenv import load_dotenv
load_dotenv()

If you see True above, Python was able to find and load your environment file.
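
Whichever method you use, it can help to confirm that the variables are actually set before connecting. The following is a minimal sketch using only the standard library; `missing_snowflake_vars` is a hypothetical helper, and the list covers only the four variable names shown in Method 1 (your authentication setup may need more).

```python
import os

# Variable names taken from Method 1 above.
REQUIRED_VARS = [
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_WAREHOUSE",
    "SNOWFLAKE_ROLE",
    "SNOWFLAKE_USER",
]


def missing_snowflake_vars(env=None):
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_snowflake_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All listed Snowflake variables are set.")
```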



## Connect to the database

We will now establish a connection to Snowflake using a SnowflakeConnector; these connectors will use your environment variables (set above) for login credentials.

At this point we must define two databases in Snowflake:
1. Source: the Snowflake location that PhenEx reads input data from
2. Destination (dest): the Snowflake location that PhenEx writes output data to. The destination will be created if it does not exist.

Run this cell to connect to these databases; this cell will open up two browser tabs (if you're using browser authentication). After those pages load (wait for them to say completed!), close them and return to this notebook.

In [None]:
%%capture
from phenex.ibis_connect import SnowflakeConnector

con = SnowflakeConnector(
    # SNOWFLAKE_SOURCE_DATABASE = 'SOURCE_DATABASE', # set these here, or use the .env file
    # SNOWFLAKE_DEST_DATABASE = 'DEST_DATABASE'      # set these here, or use the .env file
)

Notice that both of these locations can also be specified using environment variables (as we did in Method 1/2 for credentials), and vice versa (credentials can be passed to a connector as keyword arguments, rather than being hidden in the .env file). However, because credentials generally remain the same between projects while database locations are project dependent, it is best practice to define database locations with the connector.
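
The idea of a setting that can come from either a keyword argument or an environment variable can be sketched as follows. This is an illustration only: `resolve_setting` is a hypothetical helper, the database names are placeholders, and the precedence shown (keyword argument wins over the environment) is an assumption rather than documented PhenEx behaviour.

```python
import os


def resolve_setting(name, explicit=None, env=None):
    """Return the explicit value if given, otherwise the environment value.

    Hypothetical helper; the precedence is an assumption for illustration.
    """
    env = os.environ if env is None else env
    return explicit if explicit is not None else env.get(name)


# Placeholder environment standing in for values loaded from a .env file.
example_env = {"SNOWFLAKE_SOURCE_DATABASE": "ENV_SOURCE"}

from_env = resolve_setting("SNOWFLAKE_SOURCE_DATABASE", env=example_env)
from_kwarg = resolve_setting(
    "SNOWFLAKE_SOURCE_DATABASE", explicit="PROJECT_SOURCE", env=example_env
)
print(from_env, from_kwarg)
```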


## Define input data structure

PhenEx needs to know a little bit about the structure of the input data in order to help us make phenotypes and cohorts.

Concretely, PhenEx needs to know which table and column hold information such as patient ID, year of birth, and diagnosis events. This information is generally present in all real-world data (RWD) sources, but each source (1) organizes it differently and (2) may use different column names.

When using a new data source, we need to onboard that database for usage with PhenEx (tell it about table structure and column names). Go to the [tutorial on onboarding a new database](/2_Onboarding_New_Database.ipynb) to learn how to onboard a database.

For the purposes of this tutorial, we will be using OMOP data, which is already onboarded and available in the PhenEx library. All we have to do is import the OMOPDomains and then get the mapped tables.

In [None]:
from phenex.mappers import OMOPDomains
omop_mapped_tables = OMOPDomains.get_mapped_tables(con)
list(omop_mapped_tables.keys())

### Looking at input data
PhenEx bundles all input data into a dictionary, in this case in the variable called omop_mapped_tables. The keys in this dictionary are known as 'domains'; we can access the input data by these domain keys. The values for each key are the actual tables.
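
The access pattern is ordinary dictionary lookup. Here is a minimal sketch with a plain Python stand-in; the rows are made-up placeholders, and in the real `omop_mapped_tables` the values are table objects backed by the database rather than lists.

```python
# Stand-in for omop_mapped_tables: domain name -> table.
# The rows below are made-up placeholders, not real data.
mapped_tables = {
    "PERSON": [{"PERSON_ID": 1, "YEAR_OF_BIRTH": 1950}],
    "CONDITION_OCCURRENCE": [
        {"PERSON_ID": 1, "CONDITION_CONCEPT_ID": 1001},
    ],
}

# List the available domains, then fetch one table by its domain key.
domains = sorted(mapped_tables.keys())
person_table = mapped_tables["PERSON"]

print(domains)
print(person_table[0])
```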

## Integrating medical codelists

In [None]:
from phenex.codelists import LocalCSVCodelistFactory

codelist_factory = LocalCSVCodelistFactory(
    path='./codelists_for_tutorial.csv',
    name_code_column='CONCEPT_ID',
    name_codelist_column='CODELIST',
    name_code_type_column = 'VOCABULARY_ID'
)

# let's see what codelists are available
codelist_factory.get_codelists()
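
The factory arguments above name three CSV columns: the code, the codelist it belongs to, and the code type. A rough sketch of what grouping such a file looks like, using only the standard library; the rows are made-up placeholders and `group_codes` is a hypothetical helper, not how PhenEx reads the file internally.

```python
import csv
import io
from collections import defaultdict

# Made-up placeholder rows; the column names match the factory arguments above.
csv_text = """CODELIST,CONCEPT_ID,VOCABULARY_ID
ATRIAL_FIBRILLATION,1001,SNOMED
ATRIAL_FIBRILLATION,1002,SNOMED
MYOCARDIAL_INFARCTION,2001,SNOMED
"""


def group_codes(text):
    """Group codes by codelist name, then by code type (hypothetical helper)."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(text)):
        grouped[row["CODELIST"]][row["VOCABULARY_ID"]].append(row["CONCEPT_ID"])
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {name: dict(by_type) for name, by_type in grouped.items()}


codelists = group_codes(csv_text)
print(sorted(codelists))
```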

## Cohort Definition

### Entry criterion

In [None]:
from phenex.phenotypes.codelist_phenotype import CodelistPhenotype
from phenex.codelists.codelists import Codelist

cl_af = codelist_factory.get_codelist('ATRIAL_FIBRILLATION').copy(use_code_type=False)
pt_entry = CodelistPhenotype(
    name='first_atrial_fibrillation_diagnosis',
    domain='CONDITION_OCCURRENCE',
    codelist=cl_af,
    return_date='first',
)

In [None]:
pt_entry.execute(omop_mapped_tables)
pt_entry.table
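
Conceptually, `return_date='first'` means that each patient's index date is their earliest event matching the codelist. A minimal pure-Python sketch of that idea follows; the events and codes are made-up placeholders, `first_event_dates` is a hypothetical helper, and PhenEx itself performs the equivalent work in the database.

```python
from datetime import date

# Placeholder codes and events, for illustration only.
codes = {1001, 1002}
events = [
    {"PERSON_ID": 1, "CODE": 1001, "EVENT_DATE": date(2020, 5, 1)},
    {"PERSON_ID": 1, "CODE": 1002, "EVENT_DATE": date(2019, 3, 15)},
    {"PERSON_ID": 2, "CODE": 9999, "EVENT_DATE": date(2018, 1, 1)},
    {"PERSON_ID": 2, "CODE": 1001, "EVENT_DATE": date(2021, 7, 4)},
]


def first_event_dates(events, codes):
    """Return {person_id: earliest date of an event whose code is in codes}."""
    first = {}
    for event in events:
        if event["CODE"] not in codes:
            continue  # event is not part of the codelist
        pid = event["PERSON_ID"]
        if pid not in first or event["EVENT_DATE"] < first[pid]:
            first[pid] = event["EVENT_DATE"]
    return first


index_dates = first_event_dates(events, codes)
print(index_dates)
```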

### Inclusions
#### Inclusion 1 : One year continuous coverage

In [None]:
from phenex.phenotypes import TimeRangePhenotype
from phenex.filters import RelativeTimeRangeFilter, GreaterThanOrEqualTo

pt_inclusion1 = TimeRangePhenotype(
    name = 'one_year_coverage',
    relative_time_range=RelativeTimeRangeFilter(
        when='before',
        min_days=GreaterThanOrEqualTo(365),
        anchor_phenotype=pt_entry # this is only necessary if we want to execute pt_inclusion1 outside of a cohort.
    )
)

pt_inclusion1.execute(omop_mapped_tables)
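
The rule behind this inclusion can be sketched as a simple date difference: the patient's coverage must start at least 365 days before the index date. This is a simplified illustration under that assumption; `has_prior_coverage` is a hypothetical helper and ignores details such as gaps in coverage.

```python
from datetime import date


def has_prior_coverage(coverage_start, index_date, min_days=365):
    """True if coverage began at least min_days before the index date."""
    return (index_date - coverage_start).days >= min_days


# Exactly 365 days of coverage passes; 364 days does not.
exactly_one_year = has_prior_coverage(date(2019, 1, 1), date(2020, 1, 1))
one_day_short = has_prior_coverage(date(2019, 1, 2), date(2020, 1, 1))
print(exactly_one_year, one_day_short)
```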

#### Inclusion 2 : Age greater than 18

In [None]:
from phenex.phenotypes import AgePhenotype
from phenex.filters import ValueFilter, GreaterThan

pt_inclusion2 = AgePhenotype(
    name = 'age_g18',
    value_filter = ValueFilter(
        min_value=GreaterThan(18)
    ),
    anchor_phenotype=pt_entry # this is only necessary if we want to execute pt_inclusion2 outside of a cohort.
)

pt_inclusion2.execute(omop_mapped_tables)
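
Note that `GreaterThan(18)` is a strict comparison, so a patient who is exactly 18 at index does not qualify. A small sketch of that boundary, assuming age is counted in completed years from a full birth date (an assumption; how PhenEx derives age from the OMOP birth fields is not shown here). Both functions are hypothetical helpers.

```python
from datetime import date


def age_at(birth_date, index_date):
    """Age in completed years on index_date."""
    years = index_date.year - birth_date.year
    # Subtract one if the birthday has not yet occurred in the index year.
    if (index_date.month, index_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def passes_age_filter(birth_date, index_date, threshold=18):
    """Strictly greater than the threshold, mirroring GreaterThan(18)."""
    return age_at(birth_date, index_date) > threshold


birth = date(2000, 6, 15)
print(age_at(birth, date(2018, 6, 15)), passes_age_filter(birth, date(2018, 6, 15)))
print(age_at(birth, date(2019, 6, 15)), passes_age_filter(birth, date(2019, 6, 15)))
```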

In [None]:
inclusions = [pt_inclusion1, pt_inclusion2]

### Exclusions
#### Exclusion 1 : Inpatient myocardial infarction diagnosis

In [None]:

from phenex.filters import (
    CategoricalFilter,
    RelativeTimeRangeFilter,
    GreaterThan,
    GreaterThanOrEqualTo,
    LessThan,
)

f_inpatient = CategoricalFilter(
    allowed_values = [
        9203,   #Emergency Room Visit
        262,    #Emergency Room and Inpatient Visit
        9201,   #Inpatient Visit
    ],
    column_name = 'VISIT_CONCEPT_ID',
    domain = 'VISIT_OCCURRENCE'
)


f_one_year_pre_index = RelativeTimeRangeFilter(
    when='before',
    anchor_phenotype=pt_entry, # this is only necessary if we want to execute pt_exclusion1 outside of a cohort.
    min_days=GreaterThanOrEqualTo(0),
    max_days=LessThan(365),
)

cl_mi = codelist_factory.get_codelist('MYOCARDIAL_INFARCTION').copy(use_code_type=False)

pt_exclusion1 = CodelistPhenotype(
    name='myocardial_infarction_hospitalization',
    domain='CONDITION_OCCURRENCE',
    codelist=cl_mi,
    categorical_filter=f_inpatient,
    relative_time_range=f_one_year_pre_index
)

pt_exclusion1.execute(omop_mapped_tables)
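
The time filter `f_one_year_pre_index` selects events from 0 days (inclusive) to 365 days (exclusive) before the index date. A minimal sketch of that window check; `in_pre_index_window` is a hypothetical helper written for illustration and says nothing about PhenEx internals.

```python
from datetime import date, timedelta


def in_pre_index_window(event_date, index_date, min_days=0, max_days=365):
    """True if the event falls min_days <= days before index < max_days."""
    days_before = (index_date - event_date).days
    return min_days <= days_before < max_days


index_date = date(2021, 6, 1)
on_index = in_pre_index_window(index_date, index_date)
at_364 = in_pre_index_window(index_date - timedelta(days=364), index_date)
at_365 = in_pre_index_window(index_date - timedelta(days=365), index_date)
after_index = in_pre_index_window(index_date + timedelta(days=1), index_date)
print(on_index, at_364, at_365, after_index)
```

Events on the index date itself are inside the window because the lower bound is `GreaterThanOrEqualTo(0)`; events exactly 365 days earlier are outside it because the upper bound is `LessThan(365)`.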

In [None]:
exclusions = [pt_exclusion1]

### Characteristics

In [None]:
from phenex.phenotypes import AgePhenotype, CategoricalPhenotype

pt_characteristic1 = AgePhenotype()

pt_characteristic2 = CategoricalPhenotype(
    name = 'sex',
    categorical_filter=CategoricalFilter(column_name="GENDER_SOURCE_VALUE"),
    domain = "PERSON"
)

characteristics = [pt_characteristic1, pt_characteristic2]

### Outcomes

In [None]:
f_postindex = RelativeTimeRangeFilter(
    when='after',
    min_days=GreaterThan(0),
)


pt_outcome1 = CodelistPhenotype(
    name='myocardial_infarction_after_index',
    domain='CONDITION_OCCURRENCE',
    codelist=cl_mi,
    categorical_filter=f_inpatient,
    relative_time_range=f_postindex
)



In [None]:
outcomes = [pt_outcome1]

### Create the cohort

In [None]:
from phenex.phenotypes.cohort import Cohort

cohort = Cohort(
    name = 'study_tutorial_cohort',
    entry_criterion=pt_entry,
    inclusions=inclusions,
    exclusions=exclusions,
    characteristics=characteristics,
    outcomes = outcomes,
)

In [None]:
cohort.execute(omop_mapped_tables, con = con, n_threads=6, overwrite=True, lazy_execution=True)

## Reporting
### Attrition

In [None]:
from phenex.reporting import Waterfall

reporter = Waterfall()
reporter.execute(cohort)
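
An attrition (waterfall) report counts how many patients remain after each criterion is applied in turn. The sketch below illustrates the idea with sets of made-up patient IDs; `attrition_table` is a hypothetical helper, and the exact columns and ordering of the PhenEx report may differ.

```python
def attrition_table(entry_ids, inclusion_steps, exclusion_steps):
    """Return (type, name, patients_remaining) rows, applying steps in order."""
    remaining = set(entry_ids)
    rows = [("entry", "entry_criterion", len(remaining))]
    for name, ids in inclusion_steps:
        remaining &= ids  # keep only patients who meet the inclusion
        rows.append(("inclusion", name, len(remaining)))
    for name, ids in exclusion_steps:
        remaining -= ids  # drop patients who meet the exclusion
        rows.append(("exclusion", name, len(remaining)))
    return rows


# Made-up patient IDs; the step names mirror the phenotypes defined above.
rows = attrition_table(
    entry_ids={1, 2, 3, 4, 5},
    inclusion_steps=[
        ("one_year_coverage", {1, 2, 3, 4}),
        ("age_g18", {1, 2, 3, 5}),
    ],
    exclusion_steps=[("myocardial_infarction_hospitalization", {3})],
)
for row in rows:
    print(row)
```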

### Table 1

In [None]:
cohort.table1

In [None]:
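import datetime

# Assumes DeathPhenotype is exported from phenex.phenotypes like the other phenotypes
from phenex.phenotypes import DeathPhenotype
from phenex.reporting import TimeToEvent

# Directory for the Kaplan-Meier plots; placeholder value, change it for your project
path_output = './output'

end_of_followup = TimeRangePhenotype(
    name='end_of_followup',
    relative_time_range=RelativeTimeRangeFilter(when='after')
)

death_right_censor = DeathPhenotype(
    name = 'death_censoring',
    domain='DEATH',
    relative_time_range=f_postindex  # defined in the Outcomes cell
)
right_censor_phenotypes = [end_of_followup, death_right_censor]


tte = TimeToEvent(
    right_censor_phenotypes = right_censor_phenotypes,
    end_of_study_period=datetime.date(2025,12,12)
)

tte.execute(cohort)

print("FINISHED WRITING TABLE")
# Plot every outcome defined above (this tutorial defines only one)
outcome_indices = list(range(len(outcomes)))
tte.plot_multiple_kaplan_meier(xlim=[0,90], outcome_indices=outcome_indices, path_dir=path_output, n_cols=2)
print("FINISHED MULTI-PANEL PLOT")
for i in outcome_indices:
    print("PLOTTING", i)
    tte.plot_single_kaplan_meier(xlim=[0,90], outcome_index=i, path_dir = path_output)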
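
For readers unfamiliar with right censoring, the following is a minimal pure-Python sketch of the two ideas involved: deriving a (duration, event) pair per patient from the outcome date, the censoring dates, and the end of the study period, and then computing a Kaplan-Meier estimate from those pairs. Both functions are hypothetical helpers written for illustration; tie-handling conventions here are assumptions and may differ from what `TimeToEvent` does.

```python
from datetime import date


def time_to_event(index_date, outcome_date, censor_dates, end_of_study):
    """Return (days_from_index, event_flag); event_flag is 1 if the outcome was observed."""
    censor_date = min([end_of_study] + [d for d in censor_dates if d is not None])
    # Assumption: an outcome on the censoring date still counts as an event.
    if outcome_date is not None and outcome_date <= censor_date:
        return (outcome_date - index_date).days, 1
    return (censor_date - index_date).days, 0


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time."""
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


observed = time_to_event(date(2020, 1, 1), date(2020, 1, 11), [None], date(2025, 12, 12))
censored = time_to_event(date(2020, 1, 1), date(2020, 3, 1), [date(2020, 1, 21)], date(2025, 12, 12))
curve = kaplan_meier([5, 10, 10, 20], [1, 0, 1, 1])
print(observed, censored)
print(curve)
```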

