## PATIENT_FOLLOW_UP to OMOP 

In [21]:
import mysql.connector
import pandas as pd
import psycopg2
import random
import numpy as np
from datetime import datetime, timedelta

def get_random_value(val):
    if isinstance(val, tuple):
        return np.random.choice(val)
    return val

Some of these tables map to values that were already loaded into OMOP in earlier steps. Keep this in mind and check for conflicts with the existing rows.

#### Recommended Links

The following forum thread discusses a missing value in the `visit_start_date` column of OMOP. The guidance is not to fill in an arbitrary "default" value; the correct value must be supplied. I assume this applies to other date columns as well, such as `episode_end_date` in the OMOP `EPISODE` table, which we also need to map:

[https://forums.ohdsi.org/t/missing-visit-start-date/16470](https://forums.ohdsi.org/t/missing-visit-start-date/16470)
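
One way to follow that advice in the loader is to treat placeholder values as missing and skip the affected rows, instead of writing a made-up date. The sketch below is only an illustration: the helper name `parse_follow_up_date`, the set of placeholder strings, and the `YYYY-MM-DD` input format are assumptions, not something IDEA4RC or OMOP specifies.

```python
from datetime import date, datetime
from typing import Optional

# Assumed placeholder spellings for "no date available" (hypothetical list).
MISSING_MARKERS = {"", "0", "nan", "NaT", "None"}


def parse_follow_up_date(raw) -> Optional[date]:
    """Return a date, or None when the source value is missing or a placeholder."""
    if raw is None:
        return None
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(raw, datetime):
        return raw.date()
    if isinstance(raw, date):
        return raw
    text = str(raw).strip()
    if text in MISSING_MARKERS:
        return None
    try:
        # Assumed input format: ISO-style YYYY-MM-DD.
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return None
```

A row whose date parses to `None` can then be reported and left out of the insert, so no arbitrary date ever reaches `visit_start_date` or `episode_end_date`.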

In [22]:
df_patients_FollowUp_IDEA4RC = pd.read_csv("./IDEA4RC-data/patientsFollowUpIDEA4RC.csv")

In [23]:
df_patients_FollowUp_IDEA4RC.head(3)

| Unnamed: 0 | Patient | Status at last follow-up | Patient Follow Up date | New cancer diagnosis | Date of new cancer diagnosis | New cancer topography | Last Contact |
|---|---|---|---|---|---|---|---|
| 0 | 45 | 0 | 0 | 4188540 | 0 | 44498973 | 0 |
| 1 | 20 | 0 | 0 | 4188539 | 0 | 36534215 | 0 |
| 2 | 35 | 0 | 0 | 4188539 | 0 | 44498973 | 0 |


**Translations and Explanations:**

- **"Patient"**: I assume these are the same "ids" from the "Patient" table. In OMOP, they will be the same values as `person_id`.

- **"Status at last follow-up"**: Use 0 as a "default" since the Excel file does not specify any vocabulary values for it.

- **"Patient Follow Up date"**: Use 0 as a "default" because nothing is specified in IDEA4RC. Refer to the link provided for guidance on what value should be used in the `visit_start_date` column of the OMOP `VISIT_OCCURRENCE` table.

- **"New cancer diagnosis"**: This seems fine. Check if these vocabularies are present in the `Patient` table. If not, add them to OMOP in a similar manner as in the `PATIENT` table, with `person_id` and `observation_concept_id`.

- **"Date of new cancer diagnosis"**: Use 0 by default since there is no value in IDEA4RC. In OMOP, this value is expected to come from the `Condition_Occurrence` table. Check where this value should come from.

- **"New cancer topography"**: It has a numeric vocabulary reference, but it is not specified in which OMOP table to add this column.

- **"Last Contact"**: Use 0 by default since there is no value in IDEA4RC. Add it as a date to the `EPISODE` table. Ensure that this value is not arbitrary, as explained for "Patient Follow Up date".
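
The code cells further down read the rows with camelCase keys such as `patient` and `patientFollowUpDate`, while the CSV preview shows the long headers listed here. A minimal sketch of the renaming step, assuming that this is the intended correspondence; `COLUMN_MAP` and `rename_record` are hypothetical names, and `newCancerTopography` is a key invented for illustration because the notebook never reads that column:

```python
# CSV header -> key used by the loader cells (assumed correspondence).
COLUMN_MAP = {
    "Patient": "patient",
    "Status at last follow-up": "statusAtLastFollowUp",
    "Patient Follow Up date": "patientFollowUpDate",
    "New cancer diagnosis": "newCancerDiagnosis",
    "Date of new cancer diagnosis": "dateOfNewCancerDiagnosis",
    "New cancer topography": "newCancerTopography",  # hypothetical key
    "Last Contact": "lastContact",
}


def rename_record(record: dict) -> dict:
    """Rename known headers and drop everything else (e.g. 'Unnamed: 0')."""
    return {COLUMN_MAP[key]: value for key, value in record.items() if key in COLUMN_MAP}


# First row of the preview above.
first_row = {
    "Unnamed: 0": 0,
    "Patient": 45,
    "Status at last follow-up": 0,
    "Patient Follow Up date": 0,
    "New cancer diagnosis": 4188540,
    "Date of new cancer diagnosis": 0,
    "New cancer topography": 44498973,
    "Last Contact": 0,
}
renamed = rename_record(first_row)
```

On the DataFrame itself, `df.rename(columns=COLUMN_MAP)` applies the same mapping in one call.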

In [24]:
conn = psycopg2.connect(
    dbname="omopdb",
    user="postgres",
    password="mysecretpassword",
    host="localhost",
    port="5432"
)

cur = conn.cursor()
config = {
    'user': 'user', 
    'password': 'password',
    'host': '127.0.0.1',
    'database': 'idea4rc_dm',
    'raise_on_warnings': True
}

# Use separate names so the PostgreSQL connection `conn` above is not overwritten.
connIDEA = mysql.connector.connect(**config)
curIDEA = connIDEA.cursor()

df_patients_FollowUp_IDEA4RC.head(5)

CREATE TABLE IF NOT EXISTS omopcdm.VISIT_OCCURRENCE
(
    visit_occurrence_id           integer     NOT NULL,
    person_id                     integer     NOT NULL,
    visit_concept_id              integer     NOT NULL,
    visit_start_date              date        NOT NULL,
    visit_end_date                date        NOT NULL,
    visit_type_concept_id         integer     NOT NULL,

### PatientFollowUp to VISIT_OCCURRENCE table

Open question: which values should `visit_concept_id` and `visit_type_concept_id` take?

In [28]:
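# Five columns, five placeholders. visit_occurrence_id is NOT NULL and is not
# supplied here, so the insert fails unless that column has a default.
sql = """
    INSERT INTO omopcdm.visit_occurrence (person_id, visit_concept_id, visit_start_date, visit_end_date, visit_type_concept_id)
    VALUES (%s, %s, %s, %s, %s)
"""

df_tables = df_patients_FollowUp_IDEA4RC
# The row keys below must match the DataFrame's column names.
for idx, row in df_tables.iterrows():
    person_id = int(row['patient'])  # plain int: psycopg2 cannot adapt numpy integers
    visit_concept_id = 0             # placeholder, concept still to be decided
    visit_start_date = row['patientFollowUpDate']
    visit_end_date = row['patientFollowUpDate']
    visit_type_concept_id = 0        # placeholder, concept still to be decided
    cur.execute(sql, (person_id, visit_concept_id, visit_start_date, visit_end_date, visit_type_concept_id))
conn.commit()  # commit once, after all rows are inserted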
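
The loop above can also be split into a pure row-building step and one batched insert, which makes it easy to supply `visit_occurrence_id` explicitly and to skip rows without a usable date. This is a sketch under assumptions: ids are generated in Python starting from a caller-chosen `first_visit_id`, dates have already been converted to `datetime.date` objects, and `build_visit_rows` is a hypothetical helper name.

```python
from datetime import date

# Six columns, six placeholders (psycopg2 uses %s for every type).
VISIT_INSERT_SQL = """
    INSERT INTO omopcdm.visit_occurrence
        (visit_occurrence_id, person_id, visit_concept_id,
         visit_start_date, visit_end_date, visit_type_concept_id)
    VALUES (%s, %s, %s, %s, %s, %s)
"""


def build_visit_rows(records, first_visit_id=1, visit_concept_id=0, visit_type_concept_id=0):
    """Turn follow-up records (dicts) into parameter tuples for VISIT_INSERT_SQL.

    Records whose follow-up date is not a real date are skipped instead of
    being given an arbitrary default.
    """
    rows = []
    next_id = first_visit_id
    for record in records:
        follow_up = record.get("patientFollowUpDate")
        if not isinstance(follow_up, date):
            continue
        rows.append((
            next_id,
            int(record["patient"]),   # plain int, not a numpy integer
            visit_concept_id,         # 0 until the concept is decided
            follow_up,                # visit_start_date
            follow_up,                # visit_end_date
            visit_type_concept_id,    # 0 until the concept is decided
        ))
        next_id += 1
    return rows
```

With a DataFrame, the records can come from `df_tables.to_dict("records")`, and the result can be sent in one call with `cur.executemany(VISIT_INSERT_SQL, rows)` followed by a single `conn.commit()`.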

In [None]:
CREATE TABLE IF NOT EXISTS omopcdm.EPISODE
(
    episode_id                bigint      NOT NULL,
    person_id                 bigint      NOT NULL,
    episode_concept_id        integer     NOT NULL,
    episode_start_date        date        NOT NULL,
    episode_object_concept_id integer     NOT NULL,
    episode_type_concept_id   integer     NOT NULL,
);

### PatientFollowUp to EPISODE table

Open questions:

- Which value should `episode_object_concept_id` take?
- For `episode_concept_id`, possible candidates are 32528, 32677, or 32529.
- Which value should `episode_type_concept_id` take?

In [None]:
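# Six columns, six placeholders. episode_type_concept_id is NOT NULL in the DDL,
# so it is included here with a placeholder value.
sql = """
    INSERT INTO omopcdm.episode (person_id, episode_concept_id, episode_start_date, episode_end_date, episode_object_concept_id, episode_type_concept_id)
    VALUES (%s, %s, %s, %s, %s, %s)
"""
# Episode concept id = vocabulary from event type
df_tables = df_patients_FollowUp_IDEA4RC

for idx, row in df_tables.iterrows():
    person_id = int(row['patient'])  # plain int: psycopg2 cannot adapt numpy integers
    episode_concept_id = 0           # placeholder, concept still to be decided
    episode_start_date = row['patientFollowUpDate']
    episode_end_date = row['lastContact']
    episode_object_concept_id = 0    # placeholder, concept still to be decided
    episode_type_concept_id = 0      # placeholder, concept still to be decided
    cur.execute(sql, (person_id, episode_concept_id, episode_start_date, episode_end_date, episode_object_concept_id, episode_type_concept_id))
conn.commit()
# cur and conn are left open here because the cells below still use them.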

### PatientFollowUp to OBSERVATION table

I do not think this mapping is sound. When converting back from OMOP to IDEA4RC, we need to know which observation rows correspond to "New cancer topography", and it is not clear how to tell them apart.

In [None]:
# Four columns, four placeholders.
sql = """
    INSERT INTO omopcdm.observation (person_id, observation_concept_id, observation_date, observation_type_concept_id)
    VALUES (%s, %s, %s, %s)
"""

observation_type_concept_id = 37117814

for idx, row in df_tables.iterrows():
    person_id = int(row['patient'])  # plain int: psycopg2 cannot adapt numpy integers
    observation_date = row['dateOfNewCancerDiagnosis']
    observation_concept_id = int(row['newCancerDiagnosis'])
    cur.execute(sql, (person_id, observation_concept_id, observation_date, observation_type_concept_id))
conn.commit()  # commit once, after all rows are inserted
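
On the round-trip question raised for this table, one possible approach (an assumption, not an agreed IDEA4RC convention) is to record the originating field name in the OMOP `observation_source_value` column, so that rows can be separated again when converting back. The labels and helper names below are hypothetical.

```python
from datetime import date

# Hypothetical labels, one per IDEA4RC follow-up field stored in OBSERVATION.
# observation_source_value is a short text column, so the labels are kept brief.
NEW_CANCER_DIAGNOSIS = "IDEA4RC:newCancerDiagnosis"
NEW_CANCER_TOPOGRAPHY = "IDEA4RC:newCancerTopography"


def to_observation_row(person_id, field_label, concept_id, observation_date,
                       observation_type_concept_id):
    """Build one parameter tuple in the order:
    (person_id, observation_concept_id, observation_date,
     observation_type_concept_id, observation_source_value).
    """
    return (person_id, concept_id, observation_date,
            observation_type_concept_id, field_label)


def rows_for_field(rows, field_label):
    """Recover the rows that came from one IDEA4RC field (OMOP -> IDEA4RC)."""
    return [row for row in rows if row[4] == field_label]


example_rows = [
    to_observation_row(45, NEW_CANCER_DIAGNOSIS, 4188540, date(2021, 3, 15), 37117814),
    to_observation_row(45, NEW_CANCER_TOPOGRAPHY, 44498973, date(2021, 3, 15), 37117814),
]
topography_rows = rows_for_field(example_rows, NEW_CANCER_TOPOGRAPHY)
```

The insert statement would then need a fifth column, `observation_source_value`, and a fifth `%s` placeholder. Whether a label in the source value is acceptable, or a dedicated concept is required, remains an open design decision.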

### New cancer topography

Still open: the target OMOP table for this column has not been decided.

### Patient Follow Up to Death

If the cause of death is known, which concept should be used for `cause_concept_id`?

In [1]:
# Date of the most recent follow-up for one patient, used here as the death date.
query = """
    SELECT p.date
    FROM patient_follow_up p
    WHERE p.patient = %s
    AND p.date = (
        SELECT MAX(a.date)
        FROM patient_follow_up a
        WHERE a.patient = %s
    );
"""
sql = """
    INSERT INTO omopcdm.death (person_id, death_date)
    VALUES (%s, %s)
"""
sqlKD = """
    INSERT INTO omopcdm.death (person_id, death_date, cause_concept_id)
    VALUES (%s, %s, %s)
"""

for idx, row in df_tables.iterrows():
    status = row['statusAtLastFollowUp']
    person_id = int(row['patient'])
    # The query has two placeholders, so the patient id is passed twice.
    curIDEA.execute(query, (person_id, person_id))
    res = curIDEA.fetchone()
    if res is None:
        continue  # no follow-up record, so no death date is available
    death_date = res[0]  # fetchone() returns a tuple
    if status == "Dead of Unknown Cause (DUC)" or status == "Dead of Other Cause (DOC)":
        cur.execute(sql, (person_id, death_date))
    elif status == "Dead of Disease (DOD)":
        cause_concept_id = 0  # WHAT SHOULD THE CODE BE?
        cur.execute(sqlKD, (person_id, death_date, cause_concept_id))  # three values need sqlKD
conn.commit()  # commit once, after all rows are inserted
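
The status branching can be isolated in a small function that is testable without a database. The sketch below rests on two assumptions that still need confirmation: `cause_concept_id` is left as `NULL` (Python `None`) when the cause is unknown or unrelated, and for "Dead of Disease (DOD)" the cause could be the concept of the patient's cancer diagnosis, passed in by the caller. `build_death_row` and `disease_concept_id` are hypothetical names.

```python
from datetime import date

STATUSES_WITHOUT_CAUSE = {
    "Dead of Unknown Cause (DUC)",
    "Dead of Other Cause (DOC)",
}
DEAD_OF_DISEASE = "Dead of Disease (DOD)"


def build_death_row(person_id, status, last_follow_up_date, disease_concept_id=None):
    """Return (person_id, death_date, cause_concept_id), or None if not deceased.

    The death date is taken from the last follow-up, as in the cell above.
    """
    if status in STATUSES_WITHOUT_CAUSE:
        return (person_id, last_follow_up_date, None)
    if status == DEAD_OF_DISEASE:
        # Assumption: the cause is the cancer diagnosis concept, if known.
        return (person_id, last_follow_up_date, disease_concept_id)
    return None
```

Because psycopg2 sends `None` as `NULL`, every returned tuple fits the three-column `sqlKD` statement, so a single insert statement would cover all three statuses.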
    


