# Lesson 13 - HSCIC data

So let's try out another example to retrieve some data from OpenPrescribing. 

## Antibiotics - give 5 days max

In 2024, NHS guidance recommended that 60% of amoxicillin prescriptions should be 5-day courses by March 2025. This is due to evidence showing there is no real benefit of courses over 5 days (for most infections). We can work out if we have hit this target with the data we have in OpenPrescribing. First we need to find out the total number of all amoxicillin prescriptions (we will look at 500 mg tablets). Then we need to work out the number of prescriptions that are 5 days long. Finally, we need to express the second number as a fraction of the first, and turn that into a percentage.

## Denominators and numerators

Ok, this subheading just threw some new (or not) words at you. Let's talk through them.

When you are comparing data, you often need to find out `how much` of one `thing` there is compared to `another thing`, expressed as a `fraction`. You might remember the equation below from your school days:

fraction = $\frac{\text{numerator}}{\text{denominator}}$

So our `denominator`, for the example that we described above, is the number of all amoxicillin prescriptions, and our `numerator` is the number of prescriptions that are 5 days long.
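
Here is a minimal sketch of that equation in Python. The two counts are made up purely for illustration (they are not real OpenPrescribing figures), and the variable names are just for this example:

```python
# Made-up counts, purely for illustration
example_numerator = 300     # pretend number of 5-day prescriptions
example_denominator = 400   # pretend number of all prescriptions

# fraction = numerator / denominator
example_fraction = example_numerator / example_denominator
print(example_fraction)        # 0.75

# Multiply by 100 to turn the fraction into a percentage
example_percentage = example_fraction * 100
print(example_percentage)      # 75.0
```

So in this pretend case, 75% of the prescriptions would be 5-day courses.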

**Let's work out both!**

## Denominator

Run the below cell and see what the final result is:

In [None]:
from ebmdatalab import bq
from pathlib import Path

DATA_FOLDER = Path("data")

denominator_sql = """
    SELECT SUM(items) AS denominator_items
    FROM `ebmdatalab.hscic.raw_prescribing_normalised`
    WHERE bnf_code LIKE '0501013B0%AB'
    """

denominator = bq.cached_read(denominator_sql, DATA_FOLDER / "amoxcillin_denominator.csv", use_cache=True)
denominator

## Numerator

Run the cell below and see what you get (by the way, a 5-day course of antibiotics is 3 tablets a day times 5 days = 15 tablets):

In [None]:
numerator_sql = """
    SELECT SUM(items) AS numerator_items
    FROM `ebmdatalab.hscic.raw_prescribing_normalised`
    WHERE bnf_code LIKE '0501013B0%AB' AND quantity_per_item = 15
    """

numerator = bq.cached_read(numerator_sql, DATA_FOLDER / "amoxcillin_numerator.csv", use_cache=True)
numerator

## Fraction

So we have our denominator and we have our numerator. We can then go on to work out the fraction of the two.

In [None]:
import pandas

denominator_int = int(denominator["denominator_items"][0])
numerator_int = int(numerator["numerator_items"][0])

fraction = numerator_int / denominator_int if denominator_int else 0
fraction

In [None]:
percentage = fraction * 100
print(f"{round(percentage)}%")

So there is probably some code in the worked examples above that you may be a little confused by. Let's go through the code line by line.

**But, before we do that, let's talk about comments.**

## Comments

Comments are just what they sound like, comments. In code, a comment is human-readable text that the Python interpreter just ignores. Comments in code are there to explain to us humans what some piece of code is doing, or why we decided to do things the way we did. There are several ways to create comments, so let's look at each one.

### Single line comments

A single line comment looks like this:

```python
# I am a comment that the computer ignores
print("I am a piece of code that the computer DOES read and executes!")
```

This prints out:

```bash
I am a piece of code that the computer DOES read and executes!
```

So, we are using the hash symbol `#` (you may know it as a hashtag) for single line comments.

You can write several lines of comments, with a hash at the start of each line:

```python
# I am a comment that the computer ignores
# Me too!
print("I am a piece of code that the computer DOES read and executes!")
```

### Triple quotes

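If you want to comment over several lines, you can use triple quotes. Strictly speaking, this creates a piece of text (a string) that Python reads and then throws away because it is not stored anywhere, but it is commonly used like a comment: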

```python
"""
I am a very important comment.
Only a human reads me.
Done!
"""
```

So that was double quotes `"`. You can also use single quotes `'`:

```python
'''
I am also a very important comment.
The computers cannot see me.
Almost!
'''
```

## Comments in SQL

Now comments in SQL are a little different, even if they reside in your Python code. If you want to add a comment to a SQL query, then you use double hyphens `--` for single lines and the `block` comment `/*  */` for multiple lines. For example, for single lines you could have something like this:

```sql
-- This part gets all the data
SELECT *
-- This part tells the query which table we will be using
FROM prescriptions;
```

And for multiple line comments, we can do it like this:

```sql
/* This select component is different.
   It looks at the Quantity column.
   And calculates the sum. */
SELECT SUM(Quantity)
/* But this FROM component is the same as before.
   It gets the data from the prescriptions table. */
FROM prescriptions;
```
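
To see SQL comments being ignored inside a Python string, here is a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for BigQuery (an assumption made so the example runs on its own), and the `prescriptions` table and its three values are made up:

```python
import sqlite3

# Create a small, temporary, made-up database in memory
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prescriptions (Quantity INTEGER)")
conn.executemany("INSERT INTO prescriptions VALUES (?)", [(15,), (21,), (15,)])

# This is a Python comment. The comments inside the string are SQL comments.
query = """
    -- Add up every value in the Quantity column
    SELECT SUM(Quantity)
    /* The data comes from the made-up
       prescriptions table created above */
    FROM prescriptions;
    """

total = conn.execute(query).fetchone()[0]
print(total)  # 51
conn.close()
```

The database skips both styles of SQL comment and only runs the `SELECT ... FROM ...` parts.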

Comments are really useful, especially if you want to add information about your code. Let's use comments to talk through the above OpenPrescribing code.

In [None]:
# As we have spoken about before, this line of code imports the 'bq' module so we can use
# it in our code
from ebmdatalab import bq

# And you know this one! This import gives you the ability to interact with files and folders
from pathlib import Path

# You have also seen this before, where we create a 'Path' object, referencing the 'data' folder location
DATA_FOLDER = Path("data")

# Here is where we define the denominator SQL
denominator_sql = """
    /* This select statement looks at the items column, adds up all of the values,
    and displays the results under the label `denominator_items` */
    SELECT SUM(items) AS denominator_items
    
    /* This statement tells bigquery to get data from the 'ebmdatalab' project, 'hscic' dataset 
    and the 'raw_prescribing_normalised' table */ 
    FROM `ebmdatalab.hscic.raw_prescribing_normalised`

    /* The LIKE keyword allows you to look for bnf_codes that match the pattern '0501013B0%AB',
    where % means that any number of characters can appear at this point. */
    WHERE bnf_code LIKE '0501013B0%AB'
    """

# Here we send the SQL query to the BigQuery data platform and save the results in the file below.
# Because use_cache=True, running this cell again uses the saved data rather than rerunning the query.
# The data is also made available to our code by storing it in the variable 'denominator',
# which is a pandas object (see later for details on this)!
denominator = bq.cached_read(denominator_sql, DATA_FOLDER / "amoxcillin_denominator.csv", use_cache=True)

# When a variable is the last line of a cell, JupyterLab displays its value.
denominator

- By the way, the `%` character in a `LIKE` pattern is called a `wildcard`.
- The file extension (after the full stop) `csv` stands for `comma-separated values`. A CSV file just uses commas and new lines to separate out all of the data.
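
Here is a minimal sketch of the `%` wildcard in action. As an assumption to keep it runnable on its own, it uses Python's built-in `sqlite3` module rather than BigQuery, and the table name and the four codes in it are made up for illustration:

```python
import sqlite3

# A tiny, made-up table of codes held in memory
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_codes (bnf_code TEXT)")
conn.executemany(
    "INSERT INTO example_codes VALUES (?)",
    [
        ("0501013B0AAABAB",),  # starts with 0501013B0 and ends with AB
        ("0501013B0BBABAB",),  # starts with 0501013B0 and ends with AB
        ("0501013B0AAAAAA",),  # right start, but ends with AA
        ("0407010H0AAAMAM",),  # different start
    ],
)

# % stands for "any number of characters" in the middle of the pattern
rows = conn.execute(
    "SELECT bnf_code FROM example_codes "
    "WHERE bnf_code LIKE '0501013B0%AB' ORDER BY bnf_code"
).fetchall()
matches = [row[0] for row in rows]
print(matches)  # ['0501013B0AAABAB', '0501013B0BBABAB']
conn.close()
```

Only the codes with the right start and the right ending match the pattern.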

In [None]:
# Here is where we define the numerator SQL
numerator_sql = """
    /* This select statement looks at the items column, adds up all of the values,
    and displays the results under the label `numerator_items` */
    SELECT SUM(items) AS numerator_items

    -- We get our data from the same place as that for the denominator.
    FROM `ebmdatalab.hscic.raw_prescribing_normalised`

    -- In this WHERE statement, we are also looking for prescriptions of 15 tablets
    WHERE bnf_code LIKE '0501013B0%AB' AND quantity_per_item = 15
    """

# Much like with the denominator, we send an SQL query and save the results in a csv file.
# When the code is run again, we use the saved data rather than rerunning the BigQuery search.
numerator = bq.cached_read(numerator_sql, DATA_FOLDER / "amoxcillin_numerator.csv", use_cache=True)

# Print it out!
numerator

## Pandas

Now when we undertake the fraction calculations, we see a new library called pandas. What is this?

The `pandas` library gives you the tools to manipulate data stored in tables, just like in a database. Using pandas makes the data you download from a database, like that stored in BigQuery, much easier to work with.

In [None]:
# Here we import the new pandas library.
import pandas

"""
This part is a little complicated, let's break it down:
1. We reference the denominator pandas object, and ask for the value in the "denominator_items"
    column and row 0 (row 0 being the first row).
2. We then convert the value into an integer (i.e. a whole number). This step is needed as pandas
    does not always hand the value back as a plain Python integer.
"""
denominator_int = int(denominator["denominator_items"][0])

# We do the same sort of calculation here
numerator_int = int(numerator["numerator_items"][0])

"""
Here we divide the numerator_int by the denominator_int. The if...else statement stops us dividing
the two numbers if denominator is zero (which will cause an error)
"""
fraction = numerator_int / denominator_int if denominator_int else 0

# We print out 'fraction'
fraction
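
If you want to try the column and row lookup without running a BigQuery query, here is a minimal sketch. The table is built by hand and the value 400 is made up; only the column name is borrowed from the code above:

```python
import pandas

# A made-up, one-row table with the same column name as our query result
example_result = pandas.DataFrame({"denominator_items": [400]})

# Pick the column by name, then the first row (row 0)
value = example_result["denominator_items"][0]

# Convert it to a plain Python integer
value_int = int(value)
print(value_int)  # 400
```

Here `example_result["denominator_items"]` picks out the column, and `[0]` then picks out the first row of that column.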

And finally...

In [None]:
# We multiply fraction by 100 to get a percentage
percentage = fraction * 100

# Here we use the 'round' function to round percentage to the nearest whole number.
# The f"..." code is called an f-string. It helps you format how something is printed to screen.
print(f"{round(percentage)}%")
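
Here is a short sketch of `round` and f-strings using made-up percentages (these are not real results). The `:.1f` part is an optional extra that keeps one decimal place instead of rounding to a whole number:

```python
example_percentage = 59.6

# round() goes to the nearest whole number, so this goes up...
rounded_up = round(example_percentage)
print(f"{rounded_up}%")  # 60%

# ...and this goes down
rounded_down = round(59.4)
print(f"{rounded_down}%")  # 59%

# An f-string can also keep one decimal place
one_decimal = f"{example_percentage:.1f}%"
print(one_decimal)  # 59.6%
```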

## And there you have it

So hopefully that made sense. If not, ask a friendly Bennett Person.

Now onto lesson 14!