[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/CO-CONNECT/co-connect-tools/HEAD)


## Installing

The recommended way to install the module is via `pip`:

In [None]:
!pip3 install co-connect-tools -q

## Loading the Rules

In [1]:
import coconnect.tools
import json

rules = coconnect.tools.load_json('test/rules/rules_14June2021.json')
print(json.dumps(rules, indent=2)[0:500])

{
  "metadata": {
    "date_created": "2021-06-14T15:27:37.123947",
    "dataset": "Test"
  },
  "cdm": {
    "observation": [
      {
        "observation_concept_id": {
          "source_table": "Demographics.csv",
          "source_field": "ethnicity",
          "term_mapping": {
            "Asian": 35825508
          }
        },
        "observation_datetime": {
          "source_table": "Demographics.csv",
          "source_field": "date_of_birth"
        },
        "observation_source_co

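The printed excerpt suggests the rules file is organised as destination table, then a list of rule sets, then one entry per destination field pointing back at a source table and field. The sketch below walks that structure using only the part of the file visible in the truncated output above. The helper name `summarise_rules` is hypothetical and is not part of `coconnect`.

```python
import json

# Excerpt copied from the truncated output above; the real file contains more entries.
rules_excerpt = json.loads("""
{
  "metadata": {"date_created": "2021-06-14T15:27:37.123947", "dataset": "Test"},
  "cdm": {
    "observation": [
      {
        "observation_concept_id": {
          "source_table": "Demographics.csv",
          "source_field": "ethnicity",
          "term_mapping": {"Asian": 35825508}
        },
        "observation_datetime": {
          "source_table": "Demographics.csv",
          "source_field": "date_of_birth"
        }
      }
    ]
  }
}
""")


def summarise_rules(rules):
    """Hypothetical helper: list (table, field, source_table, source_field) tuples."""
    summary = []
    for destination_table, rule_sets in rules["cdm"].items():
        for rule_set in rule_sets:
            for destination_field, rule in rule_set.items():
                summary.append(
                    (destination_table, destination_field,
                     rule["source_table"], rule["source_field"])
                )
    return summary


for table, field, source_table, source_field in summarise_rules(rules_excerpt):
    print(f"{table}.{field} <- {source_table}:{source_field}")
```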

## Loading the input data

A convenience function is available to create a map between file name and file path for all files in a directory:

In [2]:
f_map = coconnect.tools.get_file_map_from_dir('test/inputs/')
f_map

{'Symptoms.csv': '/Users/calummacdonald/Usher/CO-CONNECT/Software/clean/co-connect-tools/coconnect/data/test/inputs/Symptoms.csv',
 'Covid19_test.csv': '/Users/calummacdonald/Usher/CO-CONNECT/Software/clean/co-connect-tools/coconnect/data/test/inputs/Covid19_test.csv',
 'covid19_antibody.csv': '/Users/calummacdonald/Usher/CO-CONNECT/Software/clean/co-connect-tools/coconnect/data/test/inputs/covid19_antibody.csv',
 'vaccine.csv': '/Users/calummacdonald/Usher/CO-CONNECT/Software/clean/co-connect-tools/coconnect/data/test/inputs/vaccine.csv',
 'Demographics.csv': '/Users/calummacdonald/Usher/CO-CONNECT/Software/clean/co-connect-tools/coconnect/data/test/inputs/Demographics.csv'}
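
For readers who want to see what such a map amounts to, here is a minimal standard-library sketch that builds the same kind of dictionary. It is not the implementation of `get_file_map_from_dir`; the helper name `build_file_map` and the temporary files are made up for illustration.

```python
import tempfile
from pathlib import Path


def build_file_map(directory):
    """Hypothetical helper: map each file name in `directory` to its absolute path."""
    return {
        path.name: str(path.resolve())
        for path in sorted(Path(directory).iterdir())
        if path.is_file()
    }


# Demonstrate on a throwaway directory holding two empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Demographics.csv", "Symptoms.csv"):
        (Path(tmp) / name).write_text("")
    demo_map = build_file_map(tmp)

print(sorted(demo_map))
```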

Use `f_map` to load all the inputs into a map between file name and a dataframe object:

In [3]:
inputs = coconnect.tools.load_csv(f_map)

## Creating a CDM 

In [4]:
from coconnect.cdm import CommonDataModel

cdm = CommonDataModel(name=rules['metadata']['dataset'],
                      inputs=inputs,
                      output_folder='output_dir/')
cdm

2021-06-15 16:19:01 - CommonDataModel - INFO - CommonDataModel created


<coconnect.cdm.model.CommonDataModel at 0x120b52400>

## Adding CDM Objects to the CDM

Loop over all the rules, creating a new CDM object (e.g. Person) for each rule set and adding it to the CDM:

In [5]:
from coconnect.cdm import get_cdm_class
from coconnect.tools import apply_rules

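for destination_table, rules_set in rules['cdm'].items():
    # Use a distinct loop variable so the outer `rules` dict is not overwritten
    for i, rule in enumerate(rules_set):
        # Instantiate the CDM class for this table and give it a unique name
        obj = get_cdm_class(destination_table)()
        obj.set_name(f"{destination_table}_{i}")
        apply_rules(cdm, obj, rule)
        cdm.add(obj)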

2021-06-15 16:19:01 - CommonDataModel - INFO - Added observation_0 of type observation
2021-06-15 16:19:01 - CommonDataModel - INFO - Added observation_1 of type observation
2021-06-15 16:19:01 - CommonDataModel - INFO - Added observation_2 of type observation
2021-06-15 16:19:01 - CommonDataModel - INFO - Added observation_3 of type observation
2021-06-15 16:19:01 - CommonDataModel - INFO - Added observation_4 of type observation
2021-06-15 16:19:01 - CommonDataModel - INFO - Added observation_5 of type observation
2021-06-15 16:19:01 - CommonDataModel - INFO - Added condition_occurrence_0 of type condition_occurrence
2021-06-15 16:19:01 - CommonDataModel - INFO - Added person_0 of type person
...
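
Each rule can carry a `term_mapping`, such as `"Asian": 35825508` in the rules excerpt printed earlier. One plausible reading is that rows whose source value appears in the mapping are kept and given the mapped concept ID. The sketch below illustrates that idea on plain dictionaries. It is an assumption about the behaviour, not the code behind `apply_rules`; `map_terms` and the sample rows are hypothetical.

```python
def map_terms(rows, source_field, term_mapping):
    """Hypothetical helper: keep rows with a mapped source value and attach the concept ID."""
    mapped = []
    for row in rows:
        value = row[source_field]
        if value in term_mapping:
            mapped.append({**row, "concept_id": term_mapping[value]})
    return mapped


# Illustrative source rows; only the "Asian" mapping comes from the rules excerpt.
rows = [
    {"person_id": 107, "ethnicity": "Asian"},
    {"person_id": 104, "ethnicity": "Bangladeshi"},
    {"person_id": 103, "ethnicity": "Indian"},
]
mapped_rows = map_terms(rows, "ethnicity", {"Asian": 35825508})
print(mapped_rows)
```

Under this reading, each rule set produces only the rows it has a mapping for, which would explain why several `observation_*` objects are needed to cover all ethnicity values.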

See which objects have been created:

In [6]:
cdm.objects()

{'observation': {'observation_0': <coconnect.cdm.objects.observation.Observation at 0x120ba04f0>,
  'observation_1': <coconnect.cdm.objects.observation.Observation at 0x107fe61c0>,
  'observation_2': <coconnect.cdm.objects.observation.Observation at 0x120ba8610>,
  'observation_3': <coconnect.cdm.objects.observation.Observation at 0x120ba8550>,
  'observation_4': <coconnect.cdm.objects.observation.Observation at 0x120b52ca0>,
  'observation_5': <coconnect.cdm.objects.observation.Observation at 0x120badfd0>},
 'condition_occurrence': {'condition_occurrence_0': <coconnect.cdm.objects.condition_occurrence.ConditionOccurrence at 0x120badeb0>},
 'person': {'person_0': <coconnect.cdm.objects.person.Person at 0x120badf40>,
  'person_1': <coconnect.cdm.objects.person.Person at 0x120bb2cd0>}}

## Process The CDM

In [7]:
cdm.process()

2021-06-15 16:19:03 - CommonDataModel - INFO - Starting processing in order: ['person', 'observation', 'condition_occurrence']
2021-06-15 16:19:03 - CommonDataModel - INFO - Number of objects to process for each table...
{
      "observation": 6,
      "condition_occurrence": 1,
      "person": 2
}
2021-06-15 16:19:03 - CommonDataModel - INFO - for person: found 2 objects
2021-06-15 16:19:03 - CommonDataModel - INFO - working on person
2021-06-15 16:19:03 - CommonDataModel - INFO - finished person_0 ... 0/2, 6 rows
2021-06-15 16:19:03 - CommonDataModel - INFO - finished person_1 ... 1/2, 4 rows
2021-06-15 16:19:03 - CommonDataModel - INFO - Merging 2 objects for person
2021-06-15 16:19:03 - CommonDataModel - INFO - finalised person
...

2021-06-15 16:19:04 - CommonDataModel - INFO - saving condition_occurrence to output_dir//condition_occurrence.csv
2021-06-15 16:19:04 - CommonDataModel - INFO -                          person_id  condition_concept_id  \
condition_occurrence_id                                    
1                              101                254761   
2                              102                254761   
3                              103                254761   
4                              105                254761   

                        condition_start_datetime condition_end_datetime  \
condition_occurrence_id                                                   
1                            2020-11-15 00:00:00    2020-11-15 00:00:00   
2                            2020-01-04 00:00:00    2020-01-04 00:00:00   
3                            2020-03-27 00:00:00    2020-03-27 00:00:00   
4                            2020-07-27 00:0
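
The log reports that `person_0` produced 6 rows, `person_1` produced 4 rows, and the two objects were then merged into one `person` table. A rough way to picture that merge step is a concatenation of the per-object rows followed by ordering on the table key. The sketch below assumes exactly that; `merge_objects` is a hypothetical helper, and the split of person IDs between the two objects is a guess made only to match the row counts.

```python
def merge_objects(row_groups, key):
    """Hypothetical helper: concatenate rows from several objects and order them by `key`."""
    merged = [row for rows in row_groups for row in rows]
    return sorted(merged, key=lambda row: row[key])


# Illustrative split: 6 rows in one object, 4 in the other (assumed, not taken from the log).
person_0 = [{"person_id": pid, "gender_concept_id": 8532}
            for pid in (103, 104, 105, 107, 109, 110)]
person_1 = [{"person_id": pid, "gender_concept_id": 8507}
            for pid in (101, 102, 106, 108)]

person = merge_objects([person_0, person_1], key="person_id")
print(len(person))
```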

## Inspect Outputs

In [8]:
cdm.keys()

dict_keys(['person', 'observation', 'condition_occurrence'])

In [9]:
cdm['person'].dropna(axis=1,how='all')

| person_id | gender_concept_id | birth_datetime | person_source_value | gender_source_value | gender_source_concept_id | race_source_value | ethnicity_source_value |
|---|---|---|---|---|---|---|---|
| 101 | 8507 | 1951-12-25 00:00:00 | | M | 8507 | | |
| 102 | 8507 | 1981-11-19 00:00:00 | | M | 8507 | | |
| 103 | 8532 | 1997-05-11 00:00:00 | | F | 8532 | | |
| 104 | 8532 | 1975-06-07 00:00:00 | | F | 8532 | | |
| 105 | 8532 | 1976-04-23 00:00:00 | | F | 8532 | | |
| 106 | 8507 | 1966-09-29 00:00:00 | | M | 8507 | | |
| 107 | 8532 | 1956-11-12 00:00:00 | | F | 8532 | | |
| 108 | 8507 | 1985-03-01 00:00:00 | | M | 8507 | | |
| 109 | 8532 | 1950-10-31 00:00:00 | | F | 8532 | | |
| 110 | 8532 | 1993-09-07 00:00:00 | | F | 8532 | | |
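
The call `dropna(axis=1, how='all')` is plain pandas: it removes every column in which all values are missing, which keeps the display to the columns that were actually filled. The small example below shows the behaviour on a made-up frame. The column `race_concept_id` and the values are illustrative only. Note that a column of empty strings is not treated as missing, so it survives the call.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "gender_concept_id": [8507, 8532],
        "race_concept_id": [np.nan, np.nan],  # entirely missing, so it is dropped
        "person_source_value": ["", ""],      # empty strings are not NaN, so it is kept
    },
    index=pd.Index([101, 103], name="person_id"),
)

# axis=1 works on columns; how="all" drops a column only if every value is NaN.
trimmed = df.dropna(axis=1, how="all")
print(list(trimmed.columns))
```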


In [10]:
cdm['observation'].dropna(axis=1,how='all')

| observation_id | person_id | observation_concept_id | observation_datetime | value_as_string | observation_source_value | observation_source_concept_id | unit_source_value | qualifier_source_value |
|---|---|---|---|---|---|---|---|---|
| 1 | 107 | 35825508 | 1956-11-12 00:00:00 | | Asian | 35825508 | | |
| 2 | 104 | 35825531 | 1975-06-07 00:00:00 | | Bangladeshi | 35825531 | | |
| 3 | 103 | 35826241 | 1997-05-11 00:00:00 | | Indian | 35826241 | | |
| 4 | 101 | 35827394 | 1951-12-25 00:00:00 | | White | 35827394 | | |
| 5 | 105 | 35827394 | 1976-04-23 00:00:00 | | White | 35827394 | | |
| 6 | 110 | 35827394 | 1993-09-07 00:00:00 | | White | 35827394 | | |
| 7 | 102 | 35825567 | 1981-11-19 00:00:00 | | Black | 35825567 | | |
| 8 | 106 | 35825567 | 1966-09-29 00:00:00 | | Black | 35825567 | | |
| 9 | 108 | 35827395 | 1985-03-01 00:00:00 | | White and Asian | 35827395 | | |


In [11]:
cdm['condition_occurrence'].dropna(axis=1,how='all')

| condition_occurrence_id | person_id | condition_concept_id | condition_start_datetime | condition_end_datetime | stop_reason | condition_source_value | condition_source_concept_id | condition_status_source_value |
|---|---|---|---|---|---|---|---|---|
| 1 | 101 | 254761 | 2020-11-15 00:00:00 | 2020-11-15 00:00:00 | | Y | 254761 | |
| 2 | 102 | 254761 | 2020-01-04 00:00:00 | 2020-01-04 00:00:00 | | Y | 254761 | |
| 3 | 103 | 254761 | 2020-03-27 00:00:00 | 2020-03-27 00:00:00 | | Y | 254761 | |
| 4 | 105 | 254761 | 2020-07-27 00:00:00 | 2020-07-27 00:00:00 | | Y | 254761 | |
