In [None]:
#| label: load-packages
#| include: false

# Load packages here
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import textwrap

In [None]:
#| label: setup
#| include: false
# Set up plot theme and figure resolution

In [None]:
#| label: load-data
#| include: false
# Load data in Python
url = 'data/drugs_dataset.csv'
drugs = pd.read_csv(url)

# Overview and Analysis

## Introduction {style="font-size: 0.8em;"}

- The evolution of drug development over time, across therapeutic and disease areas, is gaining attention. This shift is driven by the changing needs of the population, increasing scientific knowledge, and advances in technology.

- Drug development has become more focused on targeted therapies and personalized treatments, as well as on more holistic approaches that target underlying disease processes. The dataset of interest was collected by the European Medicines Agency. It provides a comprehensive list of medicines, including their categories, names, therapeutic areas, common names, active substances, and unique product numbers. The dataset contains 1988 rows and 28 columns, and its mix of numerical, categorical, and date variables makes it a valuable resource for pharmaceutical research, medical care, and government regulation.
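
A minimal sketch of how the row count, column count, and mix of variable types could be checked. The helper name `summarize_dtypes` and the three-column stand-in frame are illustrative assumptions, not part of the original analysis; on the real data the same call would be made on `drugs` after parsing the date columns.

```python
import pandas as pd

def summarize_dtypes(df: pd.DataFrame) -> dict:
    """Count rows, columns, and columns by broad kind (hypothetical helper)."""
    numeric = df.select_dtypes(include="number").columns
    dates = df.select_dtypes(include="datetime").columns
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "numeric": len(numeric),
        "datetime": len(dates),
        # Everything else (text, categories, booleans) is lumped together
        "other": df.shape[1] - len(numeric) - len(dates),
    }

# Tiny stand-in frame with made-up values, used only for illustration
sample = pd.DataFrame({
    "medicine_name": ["Med A", "Med B"],
    "revision_number": [3, 7],
    "revision_date": pd.to_datetime(["2023-02-20", "2022-11-01"]),
})

summary = summarize_dtypes(sample)
print(summary)
```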

## Question 2: What are the most recently released medicines (name and company) authorized for human usage for 'Hepatitis B'? {style="font-size: 0.8em;"}

Answering question 2 requires the following variables:

- 'category': The category (human or veterinary) of the medicine.
- 'medicine_name': The brand name of the medicine.
- 'therapeutic_area': List of therapeutic areas for which the medicine is authorized.
- 'authorisation_status': The authorization status of the medicine.
- 'marketing_authorisation_holder_company_name': The company holding the marketing authorization for the medicine.
- 'revision_date': The date of the latest revision for the medicine.
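
Before filtering, it may help to confirm that these variables are actually present in the loaded data. The sketch below assumes a hypothetical helper, `missing_columns`, and uses an empty stand-in frame rather than the real dataset.

```python
import pandas as pd

# Variables needed for question 2, as listed above
REQUIRED_COLUMNS = [
    "category",
    "medicine_name",
    "therapeutic_area",
    "authorisation_status",
    "marketing_authorisation_holder_company_name",
    "revision_date",
]

def missing_columns(df: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Return the required column names that are absent from df (hypothetical helper)."""
    return [name for name in required if name not in df.columns]

# Stand-in frame that deliberately lacks three of the required columns
sample = pd.DataFrame(columns=["category", "medicine_name", "revision_date"])
print(missing_columns(sample))
```

An empty list would indicate that all six variables are available.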
 
## Why We're Interested {style="font-size: 0.8em;"}

- Using these variables, we can identify the latest medicines authorized for the treatment of Hepatitis B, which gives insight into recent developments in that field.
- Knowing the most recent treatments for Hepatitis B is essential for staying informed about advances in care. The selected variables capture the company responsible for the marketing authorization as well as the revision date, which provides temporal context. Together they show how the pharmaceutical industry is responding to Hepatitis B and support informed decisions in the healthcare and regulatory sectors.

---

## Approach {style="font-size: 0.8em;"}
- We employ a two-step strategy to identify the latest revisions to Hepatitis B medicines. First, during data preparation and filtering, we keep only authorised Hepatitis B medicines for human use and sort them by revision date. Second, we present the 5 most recently revised medicines in a table with custom column labels and color mapping, showing the medicine name, company name, and revision date. This layout makes the latest developments in Hepatitis B medicines easy to scan.
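
The filtering and sorting step can be sketched as a single function. This is an illustration under stated assumptions, not the code used for the slides: the function name `latest_hepatitis_b` and the sample rows are made up, and the sketch parses 'revision_date' with `pd.to_datetime` before sorting, whereas the analysis code sorts the raw date strings.

```python
import pandas as pd

def latest_hepatitis_b(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Return the n most recently revised authorised human Hepatitis B medicines."""
    # Treat missing therapeutic areas as empty strings so the text match is safe
    area = df["therapeutic_area"].fillna("")
    mask = (
        area.str.contains("Hepatitis B", case=False)
        & (df["category"] == "human")
        & (df["authorisation_status"] == "authorised")
    )
    out = df.loc[mask].copy()
    # Assumption: parse dates so the sort is chronological, not alphabetical
    out["revision_date"] = pd.to_datetime(out["revision_date"], errors="coerce")
    out = out.sort_values("revision_date", ascending=False)
    columns = ["medicine_name", "marketing_authorisation_holder_company_name", "revision_date"]
    return out[columns].head(n)

# Made-up rows for illustration only; none of these are real medicines
sample = pd.DataFrame({
    "medicine_name": ["Med A", "Med B", "Med C", "Med D", "Med E"],
    "marketing_authorisation_holder_company_name": ["Co A", "Co B", "Co C", "Co D", "Co E"],
    "therapeutic_area": ["Hepatitis B", "Hepatitis B, Tetanus", "Hepatitis B", "Hepatitis B", None],
    "category": ["human", "human", "veterinary", "human", "human"],
    "authorisation_status": ["authorised", "authorised", "authorised", "withdrawn", "authorised"],
    "revision_date": [
        "2022-05-01 00:00:00",
        "2023-02-20 00:00:00",
        "2023-03-01 00:00:00",
        "2023-04-01 00:00:00",
        "2023-05-01 00:00:00",
    ],
})

result = latest_hepatitis_b(sample)
print(result)
```

Only "Med B" and "Med A" pass all three filters in this sample, and they come back in that order because "Med B" has the later revision date.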

## Result
- From our result, the most recently revised vaccine is Vaxelis by MCM Vaccine B.V., revised on 20 February 2023.


## Layouts

You can use plain text

::: columns
::: {.column width="40%"}
-   or bullet points[^1]
:::

::: {.column width="60%"}
or in two columns
:::
:::

[^1]: And add footnotes

-   like

-   this

## Code

In [None]:
## Question 1 solution
# Replace null values with empty string in 'therapeutic_area' column
# (assign the result back instead of using inplace=True on a column selection)
drugs['therapeutic_area'] = drugs['therapeutic_area'].fillna('')

# Filter COVID vaccines based on 'therapeutic_area'
covid_vaccines = drugs[drugs['therapeutic_area'].str.contains('COVID', case=False)]

# Exclude medicines with conditions applied
covid_vaccines = covid_vaccines[~covid_vaccines['conditional_approval']]

# Filter the dataset to include only authorised medicines
covid_vaccines = covid_vaccines[covid_vaccines['authorisation_status'] == 'authorised']

# Sort the dataset based on 'revision_number' in descending order
covid_vaccines_sorted = covid_vaccines.sort_values(by='revision_number', ascending=False)

# Extract relevant columns ('revision_number', 'medicine_name' and 'therapeutic_area')
vaccine_revisions = covid_vaccines_sorted[['revision_number', 'medicine_name', 'therapeutic_area']]

# Top-5 data
medicine_names = vaccine_revisions['medicine_name'][:5]
revision_numbers = vaccine_revisions['revision_number'][:5]

## Question 2 solution
# Null values in 'therapeutic_area' were already replaced with empty strings above

# Filter the dataset to include only medicines related to 'Hepatitis B' in the 'therapeutic_area' variable
hepatitis_b_medicines = drugs[drugs['therapeutic_area'].str.contains('Hepatitis B', case=False)]

# Filter the dataset to include only medicines for humans in the 'category' variable
human_hepatitis_b_medicines = hepatitis_b_medicines[hepatitis_b_medicines['category'] == 'human']

# Filter the dataset to include only authorised medicines
authorised_hepatitis_b_medicines = human_hepatitis_b_medicines[human_hepatitis_b_medicines['authorisation_status'] == 'authorised']

# Sort the dataset based on the 'revision_date' in descending order to get the most recently revised medicines
recently_revised_medicines = authorised_hepatitis_b_medicines.sort_values(by='revision_date', ascending=False)

# Extract the relevant columns for the 5 most recently revised medicines
# (copy so that the column assignment below does not modify a view of the original data)
top_5_result = recently_revised_medicines[['medicine_name', 'marketing_authorisation_holder_company_name', 'revision_date']].head(5).copy()

# Extract only the date part using string operations
top_5_result['revision_date'] = top_5_result['revision_date'].str[:10]

## Plots

In [None]:
## Question 1 plot
# Create a figure and axis object
fig, ax = plt.subplots(figsize=(10, 6))

# Create bar plot
bars = ax.bar(medicine_names, revision_numbers, color='skyblue')

# Set labels and title with customized font
label_font = {'fontsize': 12, 'fontweight': 'bold'}
title_font = {'fontsize': 16, 'fontweight': 'bold'}

# Set labels and title
ax.set_xlabel('Medicine Name', fontdict=label_font)
ax.set_ylabel('Revision Number', fontdict=label_font)
ax.set_title('\n Top 5 COVID vaccines that have undergone the most revisions \n', fontdict=title_font)

# Wrap long medicine names onto multiple lines
wrapped_names = [textwrap.fill(name, width=25) for name in medicine_names]

# Fix the tick positions first, then set tick labels with line breaks
ax.set_xticks(range(len(wrapped_names)))
ax.set_xticklabels(wrapped_names, rotation=30, ha='right')

# Show plot
plt.tight_layout()
plt.show()

## Question 2 plot
# Custom column labels
custom_col_labels = ['Medicine Name', 'Company Name', 'Revision Date']

# Draw a table to display the result
plt.figure(figsize=(8, 2.5))
table = plt.table(cellText=top_5_result.values,
                  colLabels=custom_col_labels,
                  loc='center',
                  cellLoc='center',
                  colColours=['skyblue']*len(top_5_result.columns),
                  cellColours=[['lightgrey']*len(top_5_result.columns)]*len(top_5_result),
                  fontsize=10)

table.auto_set_font_size(False)
table.set_fontsize(10)
table.scale(1, 1.5)  # Adjust the scale to make the table more compact
plt.axis('off')  # Turn off axis
plt.title('Most recently released Hepatitis B medicines \n authorized for human usage', fontdict={'fontsize': 14, 'fontweight': 'bold'})
plt.show()

## Plot and text

::: columns
::: {.column width="50%"}
-   Some text

-   goes here
:::

::: {.column width="50%"}

In [None]:
#| warning: false

## Question 1 plot
# Create a figure and axis object
fig, ax = plt.subplots(figsize=(10, 6))

# Create bar plot
bars = ax.bar(medicine_names, revision_numbers, color='skyblue')

# Set labels and title with customized font
label_font = {'fontsize': 12, 'fontweight': 'bold'}
title_font = {'fontsize': 16, 'fontweight': 'bold'}

# Set labels and title
ax.set_xlabel('Medicine Name', fontdict=label_font)
ax.set_ylabel('Revision Number', fontdict=label_font)
ax.set_title('\n Top 5 COVID vaccines that have undergone the most revisions \n', fontdict=title_font)

# Wrap long medicine names into two lines
wrapped_names = [textwrap.fill(name, width=25) for name in medicine_names]

# Set tick labels with line breaks
ax.set_xticklabels(wrapped_names, rotation=30, ha='right')

# Show plot
plt.tight_layout()
plt.show()

## Question 2 plot
# Custom column labels
custom_col_labels = ['Medicine Name', 'Company Name', 'Revision Date']

# Draw a table to display the result
plt.figure(figsize=(8, 2.5))
table = plt.table(cellText=top_5_result.values,
                  colLabels=custom_col_labels,
                  loc='center',
                  cellLoc='center',
                  colColours=['skyblue']*len(top_5_result.columns),
                  cellColours=[['lightgrey']*len(top_5_result.columns)]*len(top_5_result),
                  fontsize=10)

table.auto_set_font_size(False)
table.set_fontsize(10)
table.scale(1, 1.5)  # Adjust the scale to make the table more compact
plt.axis('off')  # Turn off axis
plt.title('Most recently released Hepatitis B medicines \n authorized for human usage', fontdict={'fontsize': 14, 'fontweight': 'bold'})
plt.show()

:::
:::

# A new section...

## Tables

If you want to generate a table, make sure it is in the HTML format (instead of Markdown or other formats), e.g.,

## Images

![Image credit: Danielle Navarro, Percolate.](images/watercolour_sys02_img34_teacup-ocean.png){fig-align="center" width="500"}

## Math Expressions {.smaller}

You can write LaTeX math expressions inside a pair of dollar signs, e.g. \$\\alpha+\\beta\$ renders $\alpha + \beta$. You can use the display style with double dollar signs:

```         
$$\bar{X}=\frac{1}{n}\sum_{i=1}^nX_i$$
```

$$
\bar{X}=\frac{1}{n}\sum_{i=1}^nX_i
$$

Limitations:

1.  The source code of a LaTeX math expression must be in one line, unless it is inside a pair of double dollar signs, in which case the starting `$$` must appear in the very beginning of a line, followed immediately by a non-space character, and the ending `$$` must be at the end of a line, led by a non-space character;

2.  There should not be spaces after the opening `$` or before the closing `$`.

# Wrap up

## Feeling adventurous?

-   You are welcome to use the default styling of the slides. In fact, that's what I expect the majority of you will do. You will differentiate yourself with the content of your presentation.

-   But some of you might want to play around with slide styling. Some solutions for this can be found at https://quarto.org/docs/presentations/revealjs.