# FDMBuilder - The Extras

## BigQuery Cell Magics

The first tool isn't anything the FDM pipeline can take credit for. Packaged in the Python BigQuery library is a "cell magic" that allows you to run pure SQL queries directly from a Jupyter notebook cell. A cell magic is a little bit of syntax that changes or adds extra functionality to a notebook cell. The general syntax of a cell magic is `%%magic-name`, so the BigQuery cell magic is `%%bigquery`. Add that to the top of a cell, write your SQL below, and Jupyter/Python will do the rest. Give the below a try:

In [9]:
%%bigquery
SELECT *
FROM `CY_FDM_MASTER.person`
LIMIT 10



Unnamed: 0,person_id,gender_concept_id,year_of_birth,month_of_birth,day_of_birth,birth_datetime,death_datetime,race_concept_id,ethnicity_concept_id,location_id,provider_id,care_site_id,person_source_value,gender_source_value,gender_source_concept_id,race_source_value,race_source_concept_id,ethnicity_source_value,ethnicity_source_concept_id
0,10871865,45454912,2016,1,15,2016-01-15,NaT,0,0,,,,10871865,Male,45454912,British,0,,0
1,10877333,45454912,2016,1,15,2016-01-15,NaT,0,0,,,,10877333,Female,45454912,British,0,,0
2,10861223,45454912,2010,1,15,2010-01-15,NaT,0,0,,,,10861223,Female,45454912,British,0,,0
3,10855850,45454912,2016,1,15,2016-01-15,NaT,0,0,,,,10855850,Female,45454912,British,0,,0
4,10874629,45454912,2010,1,15,2010-01-15,NaT,0,0,,,,10874629,Male,45454912,British,0,,0
5,10855432,45454912,2016,1,15,2016-01-15,NaT,0,0,,,,10855432,Female,45454912,British,0,,0
6,10861024,45454912,2016,1,15,2016-01-15,NaT,0,0,,,,10861024,Male,45454912,British,0,,0
7,10874854,45454912,2010,1,15,2010-01-15,NaT,0,0,,,,10874854,Female,45454912,British,0,,0
8,10869527,45454912,2016,2,15,2016-02-15,NaT,0,0,,,,10869527,Female,45454912,British,0,,0
9,10856693,45454912,2010,2,15,2010-02-15,NaT,0,0,,,,10856693,Female,45454912,British,0,,0


Easy. 

For those familiar with the pandas library, you can store the results of your query as a `pandas.DataFrame` by naming it immediately after the `%%bigquery` magic. So the following cell runs the same query as above, and stores the result in `eg_df`:

In [10]:
%%bigquery eg_df
SELECT *
FROM `CY_FDM_MASTER.person`
LIMIT 10



In [11]:
eg_df

Unnamed: 0,person_id,gender_concept_id,year_of_birth,month_of_birth,day_of_birth,birth_datetime,death_datetime,race_concept_id,ethnicity_concept_id,location_id,provider_id,care_site_id,person_source_value,gender_source_value,gender_source_concept_id,race_source_value,race_source_concept_id,ethnicity_source_value,ethnicity_source_concept_id
0,10871865,45454912,2016,1,15,2016-01-15,NaT,0,0,,,,10871865,Male,45454912,British,0,,0
1,10877333,45454912,2016,1,15,2016-01-15,NaT,0,0,,,,10877333,Female,45454912,British,0,,0
2,10861223,45454912,2010,1,15,2010-01-15,NaT,0,0,,,,10861223,Female,45454912,British,0,,0
3,10855850,45454912,2016,1,15,2016-01-15,NaT,0,0,,,,10855850,Female,45454912,British,0,,0
4,10874629,45454912,2010,1,15,2010-01-15,NaT,0,0,,,,10874629,Male,45454912,British,0,,0
5,10855432,45454912,2016,1,15,2016-01-15,NaT,0,0,,,,10855432,Female,45454912,British,0,,0
6,10861024,45454912,2016,1,15,2016-01-15,NaT,0,0,,,,10861024,Male,45454912,British,0,,0
7,10874854,45454912,2010,1,15,2010-01-15,NaT,0,0,,,,10874854,Female,45454912,British,0,,0
8,10869527,45454912,2016,2,15,2016-02-15,NaT,0,0,,,,10869527,Female,45454912,British,0,,0
9,10856693,45454912,2010,2,15,2010-02-15,NaT,0,0,,,,10856693,Female,45454912,British,0,,0


If so inclined, you can run and document your SQL pipelines in a notebook by using the above cell magics and documenting your work in markdown text cells (like this). Be sure to put only SQL in cells marked with the `%%bigquery` magic: everything inside these cells is interpreted as SQL, so you'll get some pretty colourful errors if you try sticking Python in there too.
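
If you do need SQL and Python in the same cell, one option is to skip the magic and call the BigQuery client library directly. The sketch below is a rough equivalent of the `%%bigquery eg_df` cell above. `query_to_dataframe` is a hypothetical helper name (it is not part of FDMBuilder), and the sketch assumes the `google-cloud-bigquery` package is installed and that default credentials are available in your notebook session.

```python
def query_to_dataframe(sql, client=None):
    """Run a SQL string and return the result as a pandas.DataFrame."""
    if client is None:
        # Imported lazily so the helper can be defined without the library installed.
        from google.cloud import bigquery
        client = bigquery.Client()  # assumes default credentials and project
    # Client.query starts the query job; to_dataframe waits for it
    # to finish and downloads the rows.
    return client.query(sql).to_dataframe()


sql = """
SELECT *
FROM `CY_FDM_MASTER.person`
LIMIT 10
"""

# Uncomment in an authenticated notebook session:
# eg_df = query_to_dataframe(sql)
```

Because the query is an ordinary Python string here, you can build or reuse it in code, and passing `client` explicitly lets you share one client across many queries.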

Now on to the extra bits of the FDM pipeline. 

## FDMTable Helpers

The `FDMTable` comes with a bunch of extra "methods" that speed up some of the more "fiddly" bits of the BigQuery SQL environment. We'll be using the last of the test tables, `test_table_3`, to try out these new helpers. As we did in our examples above, we'll start by initialising an `FDMTable` object for `test_table_3`:

### copy_table_to_dataset

You may have noticed the first stage of the table `.build()` process copying the source table into the FDM dataset. This doesn't happen automatically on initialisation: when you create a new `FDMTable`, you'll need to add a copy to the new FDM dataset before you can use any of the helper functions below. Otherwise, you'd be messing about with the original copy of the source data, which is a big no-no!
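
To make the idea concrete, here is a minimal sketch of what "copy the source table into another dataset" can look like with the plain BigQuery client. This is an illustration only, not FDMBuilder's own implementation: `copy_into_dataset` is a hypothetical function, `client` is assumed to be a `google.cloud.bigquery.Client`, and the dataset names in the usage comment are placeholders.

```python
def copy_into_dataset(client, source_table_id, dataset_id):
    """Copy a table into `dataset_id`, keeping its name, and return the new table ID."""
    # The table name is the last dot-separated part of the ID,
    # e.g. "SOURCE_DATASET.test_table_3" -> "test_table_3".
    table_name = source_table_id.split(".")[-1]
    destination_id = f"{dataset_id}.{table_name}"
    if destination_id == source_table_id:
        # Guard against "copying" a table onto itself.
        raise ValueError("source and destination are the same table")
    # copy_table starts a copy job; result() blocks until it finishes.
    client.copy_table(source_table_id, destination_id).result()
    return destination_id


# Hypothetical usage in an authenticated session (placeholder dataset names):
# from google.cloud import bigquery
# copy_into_dataset(bigquery.Client(), "SOURCE_DATASET.test_table_3", "FDM_DATASET")
```

All later changes are then made to the copy, so the source table is never modified.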

You can quickly copy over your table to your FDM dataset like so: 

### add_column

### drop_column

### rename_columns

### head

### quick_build

### recombine

## FDMDataset Helpers

### create_dataset

## Other Helpers

Helpers that are not attached to the table/dataset objects.

### check_dataset_exists / check_table_exists

### clear_dataset

## Example Workflow

Some examples of using helper functions more fluidly