## Tutorial Prerequisites

It's recommended that you have an empty dataset in GCP to experiment with the FDMBuilder tools. Either create one now, or, if you don't have the necessary privileges, ask someone who does to create one for you.

## Quick Jupyter notebooks primer

A Jupyter notebook allows you to write markdown and execute Python or R code in a single document.

This text was written in a markdown cell (double-click anywhere on it and you'll be able to edit the markdown). Running a markdown cell renders it, displaying the markdown as formatted text rather than as raw source.

Further down is the first code cell, which imports all of the Python libraries required to run the FDMBuilder. Any output from a code cell is displayed immediately below that cell.

Be sure to write text and documentation in markdown cells, and code in code cells; otherwise you'll get some pretty colourful errors!

There are a number of controls for managing each cell in the notebook. The buttons in the toolbar above can run a code cell, switch a cell between code and markdown, stop a running code cell, run every cell in the notebook, and so on. Hover over each button to see what it does. You can also perform any cell-related action from the `Cell` menu in the menu bar.

However, keyboard shortcuts are usually the quickest way to run code cells (and render markdown). Simply select a cell and:

* press `ctrl+enter` to run the cell 
* press `shift+enter` to run the cell and move focus to the cell below
* press `ctrl+shift+enter` to run the cell and create a new code cell below

That should be enough to get started; plenty of other online guides exist if you want to get better acquainted with the Jupyter notebook environment.

Get started by running the code cell below, which imports all the Python libraries required for the FDMBuilder:

```python
# Import the FDMTable and FDMDataset tools, plus the testing helpers
from FDMBuilder.FDMTable import *
from FDMBuilder.FDMDataset import *
from FDMBuilder.testing_helpers import *
```

## FDM Builder - The basics

Note: This guide assumes you're familiar with the term FDM and associated concepts.

The FDMBuilder library has been designed so that even users with little Python experience should be able to use the FDM tools to build a dataset from scratch without too much difficulty. The workflow is split into two major steps:

1. Prepare the source tables
2. Build the FDM

Each step has its own tool that walks you through the process of preparing and building an FDM dataset. Source tables are "built", or prepared for the FDM process, with the `FDMTable` tool: a Python "class" that contains all the bits and pieces needed to clean and prep a table for FDMing. Once all the source tables are ready, the FDM dataset itself is "built" with the `FDMDataset` tool: another Python class, responsible for drawing all the source tables together and building the standard FDM tables (person and observation_period).

We'll begin with the basics of using the FDMTable and FDMDataset tools to build an FDM dataset. Once we're more comfortable with the Python workflow, we can move on to the more "advanced" functions that streamline many of the common cleaning and manipulation tasks that crop up during the FDM process.

## FDMTable

To begin the FDM process, we need to prep each source table. This process ensures that:

1. The source table is copied to the FDM dataset location
2. person_ids are added to each entry
3. An event_start_date is added to each entry in a cleaned `DATETIME` format
4. If needed, an event_end_date is added to each entry in a cleaned `DATETIME` format
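FDMBuilder performs these steps on the tables in GCP, and this tutorial doesn't show how it does so internally. Purely as a conceptual illustration, the plain-Python sketch below mimics steps 1 to 4 on an in-memory list of rows. The column names (`nhs_number`, `booking_date`, `delivery_date`), the `person_lookup` dictionary and the `dd/mm/yyyy` date format are all hypothetical, chosen only to make the idea concrete.

```python
from datetime import datetime

# Hypothetical source rows: an identifier plus dates stored as text.
source_rows = [
    {"nhs_number": "111", "booking_date": "03/01/2020", "delivery_date": "15/07/2020"},
    {"nhs_number": "222", "booking_date": "21/02/2020", "delivery_date": None},
]

# Hypothetical lookup from the source identifier to an FDM person_id.
person_lookup = {"111": 1001, "222": 1002}


def parse_date(text, fmt="%d/%m/%Y"):
    """Convert a date string to a datetime, returning None for missing values."""
    return datetime.strptime(text, fmt) if text else None


def prep_rows(rows, lookup):
    """Return copies of the rows with person_id and cleaned event dates added."""
    prepped = []
    for row in rows:
        new_row = dict(row)  # step 1 (analogue): work on a copy, leaving the source untouched
        new_row["person_id"] = lookup.get(row["nhs_number"])  # step 2: attach person_id
        new_row["event_start_date"] = parse_date(row["booking_date"])  # step 3: cleaned start date
        new_row["event_end_date"] = parse_date(row["delivery_date"])  # step 4: optional end date
        prepped.append(new_row)
    return prepped


for row in prep_rows(source_rows, person_lookup):
    print(row["person_id"], row["event_start_date"], row["event_end_date"])
```

Rows with no end date simply get `None`, which is one reasonable way to handle the "if needed" in step 4.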

To do this with the Python FDMBuilder, you first define an individual FDMTable object for each of the source tables in your FDM dataset. Let's look at an example:

```python
# Define (but don't yet build) an FDMTable for one source table
eg_table = FDMTable(
    source_table_id="CY_STAGING_DATABASE.src_WH_BRI_tbl_maternity_pathway_antenatal",
    dataset_id="CY_TEST_SR"
)
```

The code cell above creates a new FDMTable object and stores it as `eg_table`. The arguments used when creating (or initialising) an FDMTable are:

* `source_table_id`: the id of the source table (hopefully that wasn't a surprise!). This can be in "project.dataset_id.table_id" form or just "dataset_id.table_id" form
* `dataset_id`: the id of the dataset in which you'll be copying/building your FDM dataset 

    Note: you'll need to change "CY_TEST_SR" to your own dataset, and you may want to change the source table too if you don't have access to CY_STAGING_DATABASE
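The two accepted forms of `source_table_id` differ only in whether the project is included. As a rough illustration (not FDMBuilder's own code), the hypothetical helper `split_table_id` below normalises either form into a `(project, dataset, table)` tuple, filling in a default project when only two parts are given. The default project used in the example is the one that appears in this notebook's error output; the helper ignores edge cases such as domain-scoped project ids that themselves contain a dot.

```python
def split_table_id(table_id, default_project):
    """Split a "project.dataset.table" or "dataset.table" id into three parts.

    default_project is used when the id only has two parts.
    """
    parts = table_id.split(".")
    if len(parts) == 3:
        return tuple(parts)
    if len(parts) == 2:
        return (default_project, parts[0], parts[1])
    raise ValueError(
        f"Expected 'project.dataset.table' or 'dataset.table', got {table_id!r}"
    )


print(split_table_id(
    "CY_STAGING_DATABASE.src_WH_BRI_tbl_maternity_pathway_antenatal",
    default_project="yhcr-prd-phm-bia-core",
))
```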

Initialising an FDMTable doesn't actually do anything in GCP; for that you need to call one of the FDMTable's "methods". Methods are functions attached to a specific class that update, manipulate or otherwise mess about with objects of that class. So the FDMTable class has methods that do things like add columns to the associated table, delete columns, rename columns, and so on.

To start, we'll look at the most important of these methods, `build`; fortunately it's also the easiest to get to grips with. A method is called by writing the name of the object, followed by a `.` and then the name of the method. So we call the `build` method on the FDMTable we just defined by running:

```python
eg_table.build()
```
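To make the idea of a method concrete, here is a toy class written from scratch. `ToyTable` and its methods are hypothetical: they only mirror the kinds of operations described above, they don't touch GCP, and they aren't part of FDMBuilder. Creating the object just stores a couple of attributes (much as initialising an FDMTable doesn't change anything in GCP), and each method is then called with the same `object.method()` syntax.

```python
class ToyTable:
    """A hypothetical stand-in for a table, used only to illustrate methods."""

    def __init__(self, table_id, columns):
        # Initialising only stores attributes; nothing else happens yet.
        self.table_id = table_id
        self.columns = list(columns)

    def add_column(self, name):
        """Append a column name to this table's column list."""
        self.columns.append(name)

    def rename_column(self, old, new):
        """Rename a column in place, raising KeyError if it doesn't exist."""
        if old not in self.columns:
            raise KeyError(f"{old!r} is not a column of {self.table_id}")
        self.columns[self.columns.index(old)] = new


toy = ToyTable("CY_TEST_SR.my_table", ["nhs_no", "booking_date"])
toy.add_column("person_id")              # call a method: object, dot, method name
toy.rename_column("nhs_no", "nhs_number")
print(toy.columns)
```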

The `build` method is designed to walk the user through the process of preparing an FDM table, stopping each time user input is required. Each time the script stops, it will give a short explanation why and will ask for input with a bit of guidance on the input required.
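The tutorial doesn't show how `build` collects input internally. As a rough illustration of the "stop, explain, ask" pattern it describes, the hypothetical helper `ask` below keeps prompting until it receives one of the allowed answers (given in lowercase). It uses Python's built-in `input` by default, which in a Jupyter notebook shows a text box below the running cell; the `input_fn` parameter lets you swap in scripted answers, as the usage lines do.

```python
def ask(question, valid_answers, input_fn=input):
    """Keep prompting until the user types one of valid_answers, then return it."""
    options = "/".join(valid_answers)
    while True:
        # Normalise the reply so " Y " and "y" are treated the same.
        answer = input_fn(f"{question} ({options}): ").strip().lower()
        if answer in valid_answers:
            return answer
        print(f"Sorry, '{answer}' isn't one of: {options}")


# Simulate a user who mistypes once before answering "Y".
replies = iter(["maybe", "Y"])
choice = ask(
    "Does this table need an event_end_date?",
    ["y", "n"],
    input_fn=lambda prompt: next(replies),
)
print(choice)
```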

Give it a try! Run the cell below to build your first FDM table:

```python
eg_table.build()
```

```
	 ##### BUILDING FDM TABLE COMPONENTS FOR src_WH_BRI_tbl_maternity_pathway_antenatal #####
________________________________________________________________________________

1. Copying src_WH_BRI_tbl_maternity_pathway_antenatal to CY_TEST_SR:

    Looks like something went wrong! Likely culprits are:"
    
    1. You misspelled either the source table location or dataset id: 
    
        Source table location - "yhcr-prd-phm-bia-core.CY_STAGING_DATABASE.src_WH_BRI_tbl_maternity_pathway_antenatal" 
        Dataset id - "CY_TEST_SR" 
        
    If so, just correct the spelling error and then re-initialise.
    
    2. The dataset CY_TEST_SR doesn't exist yet
    
    If so, and you have the relevant permissions, you can create a new dataset
    using an FDMDataset object and .create_dataset(), or just use GCP.
    Otherwise, if you don't have the necessary permissions, have a word with  
    the CYP data team and have them create you a dataset.
    
    Note: DO NOT CONTINUE TO USE THI

NotFound: 404 Not found: Dataset yhcr-prd-phm-bia-core:CY_TEST_SR was not found in location europe-west2

Location: europe-west2
Job ID: 30b96464-d6ea-444d-91f5-992d02a81cfd
```
