# Getting Started

The first thing you need to do is create a virtual environment. Let's let VS Code do this for us. Click `Select Kernel` in the upper right-hand corner and choose 'New Python Environment'. The environment can be managed with either PIP or Anaconda.

### Anaconda

You should use Anaconda if you plan to do future research that involves informatics. It is a package manager that handles Python package dependencies more reliably, so your packages work together more consistently; however, [PIP](#pip) is easier to use out of the box because it needs no additional installation.

If you're using Anaconda and do not have it yet, download it from here [https://www.anaconda.com/download/success](https://www.anaconda.com/download/success).

### PIP

If you installed Anaconda, skip this section. Otherwise we'll be using PIP. It's a bit easier at first but is more brittle and not as good a choice for research. We will need to set up a virtual environment.

## Setting up the Virtual Environment

It is best practice to set up a 'virtual environment': an isolated space on your computer where Python modules are installed, which we can later delete without affecting the rest of your system.
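
To give a sense of what "isolated" and "deletable" mean here, the sketch below creates and then discards a throwaway virtual environment with Python's standard `venv` module. It is only an illustration: the folder name `demo_env` is hypothetical, and the project's own `configure` scripts may set things up differently.

```python
import tempfile
import venv
from pathlib import Path


def create_demo_environment(parent_dir: str) -> Path:
    """Create a bare virtual environment inside parent_dir and return its path."""
    env_path = Path(parent_dir) / "demo_env"  # hypothetical folder name
    # with_pip=False keeps the demo fast; real environments usually include pip.
    venv.create(env_path, with_pip=False)
    return env_path


with tempfile.TemporaryDirectory() as scratch:
    env_path = create_demo_environment(scratch)
    # Every virtual environment is marked by a pyvenv.cfg file at its root.
    has_config = (env_path / "pyvenv.cfg").exists()
    print(f"Environment created: {has_config}")
# Leaving the 'with' block deletes the folder; nothing else on the system changed.
```

Everything the environment contains lives in that one folder, which is why removing it is safe.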

In the menu bar in VS Code, click `Terminal`, then `New Terminal`, and run the command for your operating system.

### Windows 
```sh
.\configure.bat
```
### MacOS or Linux
```sh
chmod +x ./configure.sh # Makes the install script executable
./configure.sh # Runs the script
```
When the script finishes, it prints the path to the virtual environment kernel.

Now we should start using the new virtual environment. In the upper right-hand corner of this document, click `Select Kernel` and choose the path the script printed.
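
If you want to confirm that the notebook really picked up the new kernel, a quick check like the following can help. It is a minimal sketch that relies only on the standard `sys` module; the helper name `in_virtual_environment` is made up for this example.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtual_environment())
```

This comparison only detects `venv`-style environments. An Anaconda environment is a full Python installation, so in that case compare the printed interpreter path with the path you selected instead.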

### Turn on AutoReload
This ensures that changes to the modules we write in the `./src/` directory are reloaded by this notebook before each cell runs. Otherwise your changes will be ignored until the kernel is restarted.

In [None]:
from IPython.extensions import autoreload
%load_ext autoreload
%autoreload 2
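
To see why reloading matters, the sketch below imports a throwaway module, edits its file on disk, and shows that the old value sticks around until the module is reloaded. It uses the standard `importlib.reload` to illustrate the idea by hand; the module name `demo_settings` is hypothetical, and this is not a description of how the autoreload extension is implemented.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_reload() -> tuple:
    """Import a throwaway module, edit its file, and reload it."""
    with tempfile.TemporaryDirectory() as tmp:
        module_file = Path(tmp) / "demo_settings.py"  # hypothetical module
        module_file.write_text("VALUE = 1\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("demo_settings")
            before = module.VALUE

            # Edit the source file on disk, as you would in the editor.
            module_file.write_text("VALUE = 20  # edited on disk\n")
            stale = module.VALUE  # the imported module has not changed yet

            # Re-execute the module's source so the edit takes effect.
            importlib.reload(module)
            after = module.VALUE
        finally:
            # Clean up so the demo leaves no trace in this session.
            sys.path.remove(tmp)
            sys.modules.pop("demo_settings", None)
    return before, stale, after


print(demo_reload())
```

The middle value is the one to watch: the file has already changed, but the notebook still sees the old code until a reload happens.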

# Our Lab Informatics System Database Schema

Here is the Database Schema for our example laboratory informatics system (LIS). 

![Reload if you can't see this](LIS.svg)

# Importing our Python Source

Instead of defining all of our functions within the notebook, we will define them in separate source files to keep the project modular.

We will import the source from the src directory:
```py
from src.qc import print_sql_table, get_qc_data
```

After this we can start using the data.

# Getting the Data
Unlike the work we did in our [Google Colab Notebook](https://colab.research.google.com/drive/1oskiQwGFTW1RC8T28wah4uTfixJ3TNEv?authuser=1), we will use [Pandas](https://pandas.pydata.org/getting_started.html) for data frame manipulation. Essentially, instead of iterating through rows manually like I taught you, Pandas will do most of that for us. [Please quickly learn about Pandas here](https://www.w3schools.com/python/pandas/default.asp).
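
To make the contrast concrete, here is a small sketch that computes the same per-analyte average twice: once with a manual loop and once with Pandas. The column names and values are made up for illustration and are not taken from the LIS database.

```python
import pandas as pd

# Made-up example rows; not real LIS data.
rows = [
    {"analyte": "Sodium", "result": 140.0},
    {"analyte": "Sodium", "result": 142.0},
    {"analyte": "Potassium", "result": 4.0},
    {"analyte": "Potassium", "result": 4.5},
]

# Manual approach: iterate row by row and accumulate totals per analyte.
totals, counts = {}, {}
for row in rows:
    name = row["analyte"]
    totals[name] = totals.get(name, 0.0) + row["result"]
    counts[name] = counts.get(name, 0) + 1
manual_means = {name: totals[name] / counts[name] for name in totals}

# Pandas approach: one grouped aggregation, no explicit loop.
frame = pd.DataFrame(rows)
pandas_means = frame.groupby("analyte")["result"].mean().to_dict()

print(manual_means)
print(pandas_means)
```

Both approaches give the same averages; the Pandas version simply states what is wanted (group by analyte, take the mean) and leaves the iteration to the library.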

In [None]:
# Let's test our import and execute a simple query to see if it works
from src.qc import print_sql_table, get_qc_data
print_sql_table("SELECT * from analyzers LIMIT 5;")

In [None]:
from src.qc import get_data
# Instead of printing the table, let's get it as a DataFrame and print that
data_frame = get_data("SELECT * from demographics LIMIT 5;")
print(data_frame)

# You should be able to see data_frame in 'Data Wrangler'
# It can be viewed like a spreadsheet, and you can also see the data types of each column.
# This is useful for understanding the structure of the data and for debugging any issues with data types when performing analysis or visualization later on.

# Visualizing the Data

We will import some visualization source from the src directory:
```py
from src.visualization import generate_QC_graph
```

If you look within the `src` directory, you will see that I have already defined the function `generate_QC_graph`. Let's use that now to see both the patient and QC data:

In [None]:
from src.visualization import generate_QC_graph

# For Sodium: get the QC data and plot it with a placeholder (fake) reference range
df_sodium = get_qc_data('Sodium, Plasma')
fake_rr_sodium_lower, fake_rr_sodium_upper = 136.0, 146.0
generate_QC_graph(df_sodium, 'Sodium, Plasma', 
                  mean=141.0, sd=2.5, 
                  ref_lower=fake_rr_sodium_lower, 
                  ref_upper=fake_rr_sodium_upper,
                  box_color='lightblue')

# For Potassium
df_potassium = get_qc_data('Potassium, Serum')
fake_rr_potassium_lower, fake_rr_potassium_upper = 3.5, 5.0
generate_QC_graph(df_potassium, 'Potassium, Serum', 
                  mean=4.2, sd=0.3, 
                  ref_lower=fake_rr_potassium_lower, 
                  ref_upper=fake_rr_potassium_upper,
                  box_color='lightgreen')
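
The `mean` and `sd` arguments above suggest that QC values are judged against control limits around the mean. As a rough sketch of that idea, assuming limits of mean ± 2 SD (a common convention, but not necessarily what `generate_QC_graph` draws), the hypothetical helper below lists which readings fall outside the limits. The readings are made up for illustration.

```python
def flag_out_of_control(values, mean, sd, n_sd=2.0):
    """Return the positions of values more than n_sd standard deviations from the mean."""
    limit = n_sd * sd
    return [i for i, v in enumerate(values) if abs(v - mean) > limit]


# Made-up sodium QC readings for illustration only.
example_values = [141.0, 143.0, 147.0, 135.0, 140.5]

# With mean=141.0 and sd=2.5, the assumed limits are 136.0 and 146.0.
flagged = flag_out_of_control(example_values, mean=141.0, sd=2.5)
print(flagged)  # [2, 3]
```

Positions 2 and 3 (147.0 and 135.0) are the readings outside the assumed limits.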