# Part 1: Python Introduction

This notebook is part of the **Radiotherapy image data analysis using Python** Workshop at ASMIRT 2023 in Sydney, Australia.

In this part you will learn the basics of:
- Python syntax
- Python data types (Strings, Numbers, Lists and Dictionaries)
- Installing Python libraries
- Pandas DataFrames
- Plotting

## Python Syntax

First, let's take a look at some basic Python syntax.

In [None]:
# Comment your code by using the # symbol

# Print some text using the print function
print("Hello Python!")

# Define variables using the following syntax
my_lucky_number = 42
# Tip: Variable names are typically written all in lowercase, with underscores (_) separating
# words. You can't use spaces in variable names.

# In Python, indentation of lines is used to nest blocks of code together. In the following
# example, the value is only printed if your lucky number is over 21. Take note of the indentation.

if my_lucky_number > 21:
    print(f"My lucky number is {my_lucky_number}")
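
To see how indentation groups code into blocks, here is a minimal sketch that extends the check above with `elif` and `else` branches. The function name `describe_number` is made up for this illustration and is not part of the workshop material.

```python
# Hypothetical helper, for illustration only
def describe_number(number):
    # Everything indented under "def" belongs to the function
    if number > 21:
        # Indented one level further: runs only when number is over 21
        return "over 21"
    elif number == 21:
        return "exactly 21"
    else:
        return "under 21"


my_lucky_number = 42
print(f"My lucky number is {describe_number(my_lucky_number)}")
```

Each branch is indented by the same amount (four spaces is the usual convention), and Python uses that indentation to decide which lines belong to which branch.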

## Python Data Types

Your Python variables can store all sorts of data objects. In this example we'll see how some basic
data types are stored in Python.

In [None]:
# We can store an integer (whole) number
my_lucky_number = 42

# Or a floating point number
my_unlucky_number = 13.13

# And we can perform mathematical operations on these
my_sum = my_lucky_number + my_unlucky_number

# Storing a string value is easy
my_string = "Hello Python"

# Let's print it all out
print(f"{my_string}, the sum of my lucky and unlucky numbers is {my_sum}")

In [None]:
# Lists can store a collection of data objects
my_list = [1, 2, 3, 4, 5]

# Access values in a list by using the index value at which they are stored
print(f"The third element of the list is {my_list[2]}")

# Or we can use a loop to print each element in the list
for number in my_list:
    print(f"Number: {number}")

# And dictionary objects can store collections that can be accessed using a key
my_dict = {
    "my_lucky_number": 42,
    "my_unlucky_number": 13.13
}

# Access values in a dictionary by using the key
print(f"My lucky number is: {my_dict['my_lucky_number']}")
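
Lists and dictionaries can also be changed after they are created. The following is a small sketch of a few common operations; the variable names are chosen for this example only.

```python
# Start with the same list as above
numbers = [1, 2, 3, 4, 5]

# Add a new element to the end of the list
numbers.append(6)

# Negative indexes count from the end of the list
last_number = numbers[-1]

# A slice returns a new list holding a range of elements (here the first three)
first_three = numbers[:3]

# Start with a dictionary holding a single key
lucky_numbers = {"my_lucky_number": 42}

# Add a new key by assigning a value to it
lucky_numbers["my_unlucky_number"] = 13.13

# Check whether a key exists before using it
has_favourite = "my_favourite_number" in lucky_numbers

# get() returns a default value instead of raising an error when the key is missing
favourite = lucky_numbers.get("my_favourite_number", 0)

print(numbers, last_number, first_three)
print(lucky_numbers, has_favourite, favourite)
```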

In [None]:
my_list_of_dicts = [
    {"patient_id": "PAT-001", "treatment_site": "breast", "prescribed_dose": 42.5},
    {"patient_id": "PAT-002", "treatment_site": "breast", "prescribed_dose": 40},
    {"patient_id": "PAT-003", "treatment_site": "lung", "prescribed_dose": 60}
]

# We can print the entire list if we want
print(my_list_of_dicts)
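
A list of dictionaries can be processed with the loop and `if` syntax from earlier. As a rough sketch, the cell below repeats the same patient list (so it can run on its own) and computes the mean prescribed dose for the breast patients.

```python
# Same data as in the cell above, repeated so this example is self-contained
patients = [
    {"patient_id": "PAT-001", "treatment_site": "breast", "prescribed_dose": 42.5},
    {"patient_id": "PAT-002", "treatment_site": "breast", "prescribed_dose": 40},
    {"patient_id": "PAT-003", "treatment_site": "lung", "prescribed_dose": 60},
]

# Collect the prescribed dose of every breast patient
breast_doses = []
for patient in patients:
    if patient["treatment_site"] == "breast":
        breast_doses.append(patient["prescribed_dose"])

# The mean is the sum of the values divided by how many there are
mean_breast_dose = sum(breast_doses) / len(breast_doses)
print(f"Mean prescribed dose for breast patients: {mean_breast_dose}")
```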

## Installing Python Libraries

Python provides a tool named `pip` which can be used to install libraries provided by the
community. These libraries can provide all sorts of useful functionality and are mostly
open-source and completely free to use!

In this example we will install two popular libraries: **pandas**, used for working with tabular
data, and **seaborn**, used for plotting.

In [None]:
! pip install pandas seaborn
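
If you want to confirm that a library was installed, one option is to ask Python for its version. This is a minimal sketch using the standard library's `importlib.metadata` module (available in Python 3.8 and later); the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


# Hypothetical helper, for illustration only
def installed_version(package_name):
    """Return the installed version as a string, or None if the package is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


for name in ["pandas", "seaborn"]:
    print(f"{name}: {installed_version(name)}")
```

A result of `None` means the library could not be found in the current Python environment.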

## Pandas DataFrames

The **pandas** library we just installed is extremely popular with data scientists since it
provides lots of functionality to store, manipulate and analyse data in tabular form.

Let's create a DataFrame using the list of dictionaries we created above.

In [None]:
# First, we must import the library. To do this we use the import keyword. Here we also give the
# library an optional alias by which we can refer to it in our code.
import pandas as pd
import seaborn as sns

In [None]:
# We can create a DataFrame object by using our list of dictionaries
df = pd.DataFrame(my_list_of_dicts)

# And we can display the DataFrame in a notebook by placing the name of the variable at the end
# of a cell.
df

In [None]:
# Now we can perform operations like filtering the data
df[df.prescribed_dose < 60]
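
Filters can also be combined. As a rough sketch, the cell below rebuilds the same small DataFrame (so it can run on its own) and keeps only the breast patients with a prescribed dose under 42. Each condition goes in its own brackets and the two are joined with `&` (meaning "and").

```python
import pandas as pd

# Same data as used above, repeated so this example is self-contained
df = pd.DataFrame([
    {"patient_id": "PAT-001", "treatment_site": "breast", "prescribed_dose": 42.5},
    {"patient_id": "PAT-002", "treatment_site": "breast", "prescribed_dose": 40},
    {"patient_id": "PAT-003", "treatment_site": "lung", "prescribed_dose": 60},
])

# Each comparison produces a column of True/False values, one per row
is_breast = df["treatment_site"] == "breast"
is_low_dose = df["prescribed_dose"] < 42

# Keep only the rows where both conditions are True
df_filtered = df[is_breast & is_low_dose]

# Pull a single column out of the filtered result as a plain Python list
filtered_ids = df_filtered["patient_id"].tolist()
print(filtered_ids)
```

Writing `df["prescribed_dose"]` is equivalent to `df.prescribed_dose`; the bracket form also works for column names that contain spaces.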

## Spreadsheets and CSVs

Pandas is very versatile: you can import Excel spreadsheets and CSV files into DataFrames using a single line of code!

Let's grab some dummy RT data spreadsheets and CSVs first.

In [None]:
from pathlib import Path
import requests
import tempfile
import zipfile

To load spreadsheets with pandas, we need the `openpyxl` library.


In [None]:
! pip install openpyxl

In [None]:
spreadsheet_zip_url = "https://unsw-my.sharepoint.com/:x:/g/personal/z5114185_ad_unsw_edu_au/EUtGdF21K6NCr4ZQc6ZmuUQB-amyHpVrJTV44b58HshHQg?download=1"

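# Download the spreadsheet into a temporary directory, which is deleted once the block ends
with tempfile.TemporaryDirectory() as temp_dir:
    temp_file = Path(temp_dir).joinpath("tmp.xlsx")

    data = requests.get(spreadsheet_zip_url)
    data.raise_for_status()  # Stop here with an error if the download failed
    with open(temp_file, 'wb') as out_file:
        out_file.write(data.content)

    # Load the spreadsheet into a DataFrame
    df_spreadsheet = pd.read_excel(temp_file)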

df_spreadsheet.head()

In [None]:
csv_zip_url = "https://unsw-my.sharepoint.com/:x:/g/personal/z5114185_ad_unsw_edu_au/EfbB4MU25aBFnHrxV2j7aUwBGeMJQGbl_n6LwmzCbDLEAg?download=1"

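# Download the CSV file into a temporary directory, which is deleted once the block ends
with tempfile.TemporaryDirectory() as temp_dir:
    temp_file = Path(temp_dir).joinpath("tmp.csv")

    data = requests.get(csv_zip_url)
    data.raise_for_status()  # Stop here with an error if the download failed
    with open(temp_file, 'wb') as out_file:
        out_file.write(data.content)

    # Load the CSV file into a DataFrame
    df_csv = pd.read_csv(temp_file)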

df_csv.head()
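
If you cannot download the files, you can still try `pd.read_csv` without a network connection. In this minimal sketch the CSV text is written directly in the cell, reusing the three patients from earlier in this notebook (it is not the workshop dataset), and `io.StringIO` lets pandas read that text as if it were a file.

```python
import io

import pandas as pd

# CSV text: the first line holds the column names, each following line is one row
csv_text = """patient_id,treatment_site,prescribed_dose
PAT-001,breast,42.5
PAT-002,breast,40
PAT-003,lung,60
"""

# StringIO wraps the text so that read_csv can treat it like an open file
df_example = pd.read_csv(io.StringIO(csv_text))

print(df_example.head())
```

For a CSV file saved on your computer, you would pass the path to the file instead, in the same way `temp_file` is passed above.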

## Plotting

Finally, let's visualise this data using the **seaborn** library. **seaborn** works well with
**pandas**. In this example we will produce a box plot by providing the `df_csv` DataFrame and
specifying which columns should be used for the x- and y-axes.

In [None]:
sns.boxplot(data=df_csv, y="centroid_z_cm", x="target_volume")

In [None]:
sns.boxplot(data=df_csv, y="centroid_z_cm", x="target_volume", hue="observer")

## Exercise

In the empty cells below, try adapting the code from above to produce some different plots.

In [None]:
# Produce a box plot with the maximum Hausdorff Distance on the y-axis, the Target Volume on the x-axis, split by different sequences



In [None]:
# Produce a scatter plot using the sns.scatterplot function, try using observer on the x-axis
# and dice_cofficient on the y-axis



In [None]:
# Produce a box plot using our simple data in df. Plot the prescribed dose on the y-axis and
# treatment site on the x-axis

