[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/CO-CONNECT/co-connect-tools/HEAD)


## Installing

The recommended way to install the module is via `pip`.

In [None]:
!pip3 install co-connect-tools -q

In [1]:
import coconnect.tools
data = coconnect.tools.load_json('example/sample_config/panther_structural_mapping.json')
coconnect.tools.extract.make_class(data,'Panther')

Recreating file /Users/calummacdonald/Usher/CO-CONNECT/Software/docs/docs/co-connect-tools/coconnect/cdm/classes/Panther.py


In [2]:
print(coconnect.tools.get_classes(format=True))

{
      "Panther": {
            "module": "coconnect.cdm.classes.Panther",
            "path": "/Users/calummacdonald/Usher/CO-CONNECT/Software/docs/docs/co-connect-tools/coconnect/cdm/classes/Panther.py",
            "last-modified": "2021-03-31 11:57:21"
      }
}


In [5]:
f_map = coconnect.tools.get_file_map_from_dir('example/sample_input_data/')
f_map

{'tracker.csv': '/Users/calummacdonald/Usher/CO-CONNECT/Software/docs/docs/co-connect-tools/coconnect/data/example/sample_input_data/tracker.csv',
 'demographic.csv': '/Users/calummacdonald/Usher/CO-CONNECT/Software/docs/docs/co-connect-tools/coconnect/data/example/sample_input_data/demographic.csv',
 'questionnaire.csv': '/Users/calummacdonald/Usher/CO-CONNECT/Software/docs/docs/co-connect-tools/coconnect/data/example/sample_input_data/questionnaire.csv'}
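
The output above pairs each file name with its absolute path. A minimal sketch of how such a map could be built for a plain directory of `.csv` files is shown here. `file_map_from_dir` is a hypothetical helper written for illustration, not the `coconnect` implementation, and it does not look inside the package's data directory the way the call above appears to.

```python
import os


def file_map_from_dir(directory):
    """Hypothetical helper: map each CSV file name to its absolute path."""
    directory = os.path.abspath(directory)
    return {
        name: os.path.join(directory, name)
        for name in sorted(os.listdir(directory))
        if name.endswith(".csv")  # ignore anything that is not a CSV file
    }
```

Keying the map by file name keeps later lookups short, while the absolute paths stay valid if the working directory changes.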

In [6]:
inputs = coconnect.tools.load_csv(f_map)

NameError: name 'pd' is not defined
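
The `NameError` above says that the name `pd` (by convention, the alias for pandas) was not defined where it was used. If your installed version raises the same error, one possible stand-in is to load the files directly with pandas. This is only a sketch: `load_csv_files` is a hypothetical helper, and it assumes that the later steps accept a dictionary of dataframes keyed by file name, which this document does not confirm.

```python
import pandas as pd


def load_csv_files(file_map):
    """Hypothetical stand-in: read every path in the file map into a dataframe."""
    return {name: pd.read_csv(path) for name, path in file_map.items()}
```

With the map from the previous cell, this would be called as `inputs = load_csv_files(f_map)`.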

In [None]:
from coconnect.cdm.classes.Panther import Panther
panther = Panther(inputs=inputs)

In [None]:
panther.process()

In [None]:
panther.omop['person']

## Load Inputs

To run the tool, you need to load some input datasets and specify how to map their fields.

The data is loaded into pandas dataframes, which we also use to visualise what the input `csv` files look like.

### Source data

This is synthetic data produced by [OHDSI](http://ohdsi.org/); it simply contains patient records.

_Note: these example data files are stored in `<install_dir>/lib/python3.8/site-packages/coconnect/`, a directory that `ETLTool` searches. For your own files, specify the full path to the inputs._

In [None]:
f_input_data = 'sample_input_data/patients_sample.csv'
etltool.load_input_data([f_input_data])

Verify which files have been loaded. By default, each input dataset is mapped to a name via `/path/<name>.csv`.

In [None]:
etltool.get_input_names()

Sample three entries to see what this input data looks like. __Note__: be careful using this method with a large dataset.

In [None]:
df_input = etltool.get_input_df('patients_sample.csv')
df_input.sample(3)
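
When a file is too large to load comfortably, one alternative is to read only its first few rows. The sketch below uses plain pandas rather than `etltool`. The CSV text is made-up illustrative data, and only the column names `id` and `BIRTHDATE` come from this walkthrough.

```python
import io

import pandas as pd

# Made-up sample standing in for a large CSV file on disk
csv_text = (
    "id,BIRTHDATE\n"
    "1,1962-07-14\n"
    "2,1988-01-02\n"
    "3,1975-03-30\n"
    "4,2001-11-09\n"
)

# nrows stops parsing after three data rows, so the rest is never loaded
preview = pd.read_csv(io.StringIO(csv_text), nrows=3)
print(preview)
```

Unlike `sample(3)`, this always returns the first rows rather than a random selection, but it never holds the full dataset in memory.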

### Structural Mapping

Next we use another `csv` file to define how to map different fields in the source data to a [Common Data Model (CDM)](https://www.ohdsi.org/data-standardization/the-common-data-model/).

In this example, the CDM that the source data (`patients_sample`) is being mapped to is the `person` CDM.

There are three rules defined:
1. A direct one-to-one mapping from the `id` field in the source data to the `person_id` field of the `person` CDM.
2. A mapping that applies the operation/function `extract year`.
3. A term mapping, which is defined in the term mapping `csv` file; see the next section for more information.


In [None]:
f_structural_mapping = 'sample_input_data/rules1.csv'
etltool.load_structural_mapping(f_structural_mapping) 
etltool.get_structural_mapping_df()
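
To make the first two rule types concrete, here is a minimal sketch in plain pandas. It is not how `etltool` is implemented: the rule layout, the `apply_rules` helper, the `year_of_birth` destination field and the sample rows are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical operations table, looked up by name
operations = {
    "EXTRACT_YEAR": lambda df, column: pd.to_datetime(df[column]).dt.year,
}

# Hypothetical rules: source field, destination field, optional operation
rules = [
    {"source": "id", "destination": "person_id", "operation": None},
    {"source": "BIRTHDATE", "destination": "year_of_birth", "operation": "EXTRACT_YEAR"},
]


def apply_rules(df, rules):
    """Build the destination table one rule at a time."""
    out = pd.DataFrame(index=df.index)
    for rule in rules:
        if rule["operation"] is None:
            # One-to-one mapping: copy the source column unchanged
            out[rule["destination"]] = df[rule["source"]]
        else:
            # Operation mapping: transform the source column first
            fn = operations[rule["operation"]]
            out[rule["destination"]] = fn(df, column=rule["source"])
    return out


# Made-up source rows for illustration
source = pd.DataFrame({"id": [1, 2], "BIRTHDATE": ["1962-07-14", "1988-01-02"]})
person = apply_rules(source, rules)
print(person)
```

Keeping the operations in a dictionary means a rule file only has to name an operation, and new operations can be added without changing the loop.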

#### Testing operations
The second rule uses the operation `extract year`, a default operation defined in `etltool`. Here is a quick example of how it works.

Load the function

In [None]:
fn_extract_year = etltool.allowed_operations['EXTRACT_YEAR']
fn_extract_year

For example, take the `BIRTHDATE` column, which looks like:

In [None]:
df_input['BIRTHDATE'].head(4)

The function extracts the year from each date:

In [None]:
fn_extract_year(df_input.head(4),column='BIRTHDATE')

### Term Mapping

In the term mapping, each entry refers to a structural mapping `rule_id` and tells us how to map a source term to a destination term, e.g. if the source term is `M` then the output should be `8507`.

In [None]:
f_term_mapping = 'sample_input_data/rules2.csv'
etltool.load_term_mapping(f_term_mapping)
etltool.get_term_mapping_df()
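
The idea behind a term mapping can be sketched with a plain dictionary lookup in pandas. This is an illustration only, not the `etltool` implementation. The `M` to `8507` pair comes from the text above, while the column name `gender` and the sample values are made up.

```python
import pandas as pd

# Source term -> destination term; only the pair given in the text above
term_mapping = {"M": 8507}

# Made-up source column for illustration
source_terms = pd.Series(["M", "F", "M"], name="gender")

# Series.map looks each value up in the dictionary;
# values without an entry become missing (NaN)
mapped = source_terms.map(term_mapping)
print(mapped)
```

Unmapped terms showing up as missing values is a quick way to spot source terms that the mapping file does not yet cover.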

### Run the tool

In [None]:
etltool.run()

Finally, we can get the output as a dataframe:

In [None]:
etltool.get_output_df('person')