# Minimal JupyterLab Notebook Example

***
This notebook is delivered "As-Is". Notwithstanding anything to the contrary, DNAnexus will have no warranty, support, liability or other obligations with respect to Materials provided hereunder.

[MIT License](https://github.com/dnanexus/OpenBio/blob/master/LICENSE.md) applies to this notebook.
***

This minimal notebook shows how to download data into the JupyterLab VM, load it, and run a simple analysis on it.
It is used in the `Using JupyterLab on UKB Research Analysis Platform - Part 1` webinar.

## JupyterLab app details (launch configuration)
### Recommended configuration
- runtime: < 10 min
- cluster configuration: `single node`
- app: `dxjupyterlab`
- recommended instance: `mem1_ssd1_v2_x4`
- cost: < £0.05

Load packages as usual using `import`.

In [None]:
import pandas as pd

*Note: to run this code, create the file `results.csv` and place it in the `Data` folder. To create the file, use the [Table Exporter App](https://dnanexus.gitbook.io/uk-biobank-rap/working-on-the-research-analysis-platform/accessing-phenotypic-data-as-a-file#creating-a-tsv-or-csv-file-using-table-exporter) or retrieve the fields using a [Spark Jupyter Notebook](https://dnanexus.gitbook.io/uk-biobank-rap/getting-started/working-with-ukb-data#analyzing-tabular-data-using-spark-in-jupyterlab). For this notebook, your dataset must contain the column `Year of birth (p34)`. The `eid` column should be added automatically when the data is exported to the file.*


## 1. Loading project data into a pandas dataframe

### 1a. Access data file and load it in Python using `pandas.read_csv()`

Access the file through the `dxfuse` filesystem by prefixing its project path with `/mnt/project/`, e.g. `/mnt/project/Data/results.csv`. The `dxfuse` mount is read-only.

In [None]:
data = pd.read_csv("/mnt/project/Data/results.csv")
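
Before moving on, it can help to confirm that the export contains the columns this notebook relies on. The following is a minimal sketch, assuming the columns are named `eid` and `p34` (as in the analysis cells of this notebook). The helper `missing_columns` and the in-memory sample are hypothetical and only stand in for the real `results.csv`.

```python
import io

import pandas as pd


# Hypothetical helper (not part of pandas or dx-toolkit):
# returns the required column names that are absent from the dataframe.
def missing_columns(df, required=("eid", "p34")):
    return [col for col in required if col not in df.columns]


# Tiny in-memory stand-in for results.csv (made-up values, not UK Biobank data).
sample_csv = io.StringIO("eid,p34\n1,1950\n2,1962\n3,1948\n")
sample = pd.read_csv(sample_csv)

# An empty list means every required column is present.
print(missing_columns(sample))
```

On the real file, the same check would be `missing_columns(data)` right after the `pd.read_csv(...)` call.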

### 1b. Download data to JupyterLab storage using `dx download` and load it locally using `pandas.read_csv()`

Use `%%bash` in the cell to run the `dx download` command-line utility.

In [None]:
%%bash
dx download "/Data/results.csv"

In [None]:
data = pd.read_csv("results.csv")

## 3. Perform analysis, get tables and graphs, etc.

Here we filter and save the youngest 5 percent of participants, based on the `Year of birth` (`p34`) field.

In [None]:
data.describe()

In [None]:
data.p34.quantile(0.95)

In [None]:
data_young = data[data.p34 > data.p34.quantile(0.95)]
data_young.describe()

In [None]:
data_young.shape
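
Because year of birth is a whole number, many participants can share the cutoff year, so the strict `>` comparison used above may select fewer than 5 percent of rows. The sketch below illustrates this on a made-up series (the values are synthetic, not UK Biobank data) and compares `>` with `>=`. Which comparison is appropriate depends on the analysis, so treat this as an illustration rather than a recommendation.

```python
import pandas as pd

# Synthetic years of birth: 90 people born in 1940 and 10 born in 1950.
toy = pd.DataFrame({"p34": [1940] * 90 + [1950] * 10})

# Same cutoff computation as in the notebook.
cutoff = toy.p34.quantile(0.95)

# Strictly greater than the cutoff: rows that tie with the cutoff are excluded.
strict = toy[toy.p34 > cutoff]

# Greater than or equal to the cutoff: rows that tie with the cutoff are kept.
inclusive = toy[toy.p34 >= cutoff]

print(cutoff, len(strict), len(inclusive))
```

Checking `data_young.shape` against the size of `data`, as done above, is a quick way to see how far the selection is from 5 percent.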

In [None]:
data_young.to_csv('results_95_perc.csv')
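
Note that `to_csv` also writes the dataframe index as an unnamed first column unless `index=False` is passed. The sketch below shows the difference in the header line on a small made-up dataframe, writing to in-memory buffers instead of files. Whether to keep the index is a judgment call; since `eid` already identifies each row, dropping it is one reasonable option.

```python
import io

import pandas as pd

# Made-up rows standing in for the filtered data.
toy = pd.DataFrame({"eid": [11, 12, 13], "p34": [1950, 1962, 1948]})
subset = toy[toy.p34 > 1949]  # filtering keeps the original row labels

# Default behaviour: the index is written as an extra, unnamed column.
with_index = io.StringIO()
subset.to_csv(with_index)

# index=False writes only the dataframe columns.
without_index = io.StringIO()
subset.to_csv(without_index, index=False)

header_with_index = with_index.getvalue().splitlines()[0]
header_without_index = without_index.getvalue().splitlines()[0]
print(repr(header_with_index))
print(repr(header_without_index))
```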

## 4. Upload results to project storage using `dx upload`

Use `%%bash` in the cell to run the `dx upload` command-line utility, and specify the destination path with `--path`.

In [None]:
%%bash
dx upload results_95_perc.csv --path /Data/