# Title: Collecting data using interactive Jupyter widgets  
**Author details:** *Author:* Mairead Bermingham. *Contact details:* mairead.bermingham@ed.ac.uk.  
**Notebook and data info:** This Notebook provides an example of using interactive Jupyter widgets to collect the NHS England accident and emergency attendances and admissions (ae_attendances) data (your test data), save it to your working 'Data' folder, and finally save all the captured test data to your 'RawData' folder.  
**Data:** Data consists of date, numerical data and character data from NHSRdatasets package.  
**Copyright statement:** This Notebook is the product of The University of Edinburgh.  

# Data
The data you will be managing on the course are from the NHSRdatasets package. This package has been created to support skills development in the NHS-R community and contains several free datasets. The dataset I have chosen to manage from the NHSRdatasets package is the NHS England accident and emergency (A&E) attendances and admissions (`ae_attendances`) data. The `ae_attendances` data includes reported attendances, four-hour breaches and admissions for all A&E departments in England for 2016/17 through 2018/19 (Apr-Mar). We previously selected a subset of the variables needed for the data capture tool, including period, attendances and breaches, and split the data into test and training data. However, for this lesson, we will use the full `ae_attendances` dataset to demonstrate how to use interactive Jupyter widgets from the *ipywidgets* package to collect all data types in the `ae_attendances` data. The R script "./RScripts/LoadingNHSRdatasets_fulldata.R" was used to split the full `ae_attendances` data into test and training data.

**Note**, you only need to set up widgets for the subset of the variables required for your data capture tool. We are using the full data set here, as you will be using interactive Jupyter widgets to collect different variables from your `ae_attendances` data subsets.

### The *pandas* package
To import the data, you will need to load the *pandas* package. The Python *pandas* package is used for data manipulation and analysis.

In [1]:
#Load the 'pandas' package
import pandas as pd
testData=pd.read_csv("../Data/ae_attendances_ENG_4hr_perfom_test_full.csv")
testData

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance
0,1155,2016-12-01,C82010,other,200,0,0,1.0
1,2059,2016-10-01,RDZ,1,6452,360,1814,0.94
2,3468,2016-05-01,RVR,2,417,0,6,1.0
3,4153,2018-03-01,RQM,other,9376,112,0,0.99
4,4820,2018-02-01,R1F,other,245,0,0,1.0
5,7243,2017-07-01,RE9,1,5170,235,1269,0.95
6,8057,2017-04-01,RQM,1,15957,1309,3375,0.92
7,8957,2019-02-01,RNL,1,7258,1374,1947,0.81
8,10214,2018-10-01,RJ1,other,3197,0,0,1.0
9,10328,2018-10-01,RKB,2,2033,8,105,1.0


#### Data type
We now need to check the data types in the testData data frame. Let us use the `dtypes` attribute of the *pandas* data frame to query them. The `dtypes` attribute returns the data type of each column in the data frame.

In [2]:
result = testData.dtypes
print("Output:")
print(result)

Output:
index            int64
period          object
org_code        object
type            object
attendances      int64
breaches         int64
admissions       int64
performance    float64
dtype: object


The `object` data type is how *pandas* stores strings, so the period, org_code and type columns are held as text.
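
Because period is read from the CSV file as text, it may help to convert it to a date type before sorting or comparing dates. The following is a minimal sketch, assuming a small stand-in data frame (`demo`, a hypothetical name) rather than the real test data:

```python
import pandas as pd

# Stand-in for the 'period' column as it arrives from a CSV file (text values)
demo = pd.DataFrame({'period': ['2016-12-01', '2016-10-01']})

# Convert the text values to pandas datetimes
demo['period'] = pd.to_datetime(demo['period'], format='%Y-%m-%d')

# The column now holds dates, so date operations such as sorting behave as expected
print(demo.sort_values('period'))
```

Alternatively, `pd.read_csv()` accepts a `parse_dates` argument that performs the conversion while the file is being read.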

Now let us collect the first row of data from the test data. 
Use the `df.head()` function to see the first row in the data frame (df).

##### The `head()` function
The `head()` function lets you look at the top n rows of a data frame. By default, it shows the first five rows in a data frame. We can specify the number of rows we want to see in a data frame with the argument “n”. For example, look at the first row (n=1) of the test data:

In [3]:
testData.head(n=1)

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance
0,1155,2016-12-01,C82010,other,200,0,0,1.0


We need to set up an empty data frame (a single row of placeholder values) in the working data folder to collect the data captured by the Jupyter widgets.

In [4]:
dfTofill = pd.DataFrame({'index': [0],# Integer
                   'period': [pd.Timestamp('20000101')], # Date
                   'org_code': ['NA'], # String
                   'type': ['NA'], # String
                   'attendances': [0], # Integer
                   'breaches': [0], # Integer
                   'admissions': [0], # Integer
                   'performance': [0.0], # Float
                   'consent': [False]}) # Boolean 

dfTofill

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance,consent
0,0,2000-01-01,,,0,0,0,0.0,False


Save the empty data frame to your working 'Data' folder:

In [5]:
#dfTofill.to_csv('../Data/CollectedData.csv', index=False)

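Running the cell above with the `#` removed saves the empty data frame to the working 'Data' folder. You only need to do this once, so comment the line out again afterwards (Ctrl+/); otherwise, rerunning it would overwrite the data you have already collected. Now let's read in the saved data frame, to which we will add the data collected from the Jupyter widgets.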
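
An alternative to commenting the cell in and out is to write the file only when it does not exist yet. This is a sketch under assumptions: `save_if_missing` is a hypothetical helper, and the demonstration writes to a temporary folder instead of the notebook's '../Data' folder.

```python
from pathlib import Path
import tempfile

import pandas as pd

def save_if_missing(df, path):
    """Write df to path only when the file does not already exist."""
    path = Path(path)
    if path.exists():
        return False  # keep the data that has already been collected
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)
    return True

# Demonstration in a temporary folder
placeholder = pd.DataFrame({'index': [0], 'consent': [False]})
demo_path = Path(tempfile.mkdtemp()) / 'Data' / 'CollectedData.csv'
first_call = save_if_missing(placeholder, demo_path)   # the file is created
second_call = save_if_missing(placeholder, demo_path)  # the file is left untouched
print(first_call, second_call)  # True False
```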

In [6]:
CollectData=pd.read_csv("../Data/CollectedData.csv")
CollectData

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance,consent
0,1155,2016-12-01,C82010,other,200,0,0,1.0,True


Now let us collect the first row of data from the test data. 
Use the `df.head()` function to see the first row in the data frame (df).

##### The `head()` function
The `head()` function lets you look at the top n rows of a data frame. By default, it shows the first five rows in a data frame. We can specify the number of rows we want to see in a data frame with the argument “n”. For example, look at the first row (n=1) of the test data:

In [7]:
testData.head(n=1)

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance
0,1155,2016-12-01,C82010,other,200,0,0,1.0


# Index variable 
The first variable contains the index number, which allows us to connect the test data to the original data set "../RawData/ae_attendances.csv". We will use indexing to add the index number to the 'dfTofill' data frame.

### Indexing in Python
Indexing in Python is a way to refer to individual items by their position, so you can directly access the elements of your choice. Python objects are “zero-indexed”, meaning the position count starts at zero.

In [8]:
index_number=1155 #Remember to change for each record.
dfTofill.iloc[0,0]=index_number
dfTofill

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance,consent
0,1155,2000-01-01,,,0,0,0,0.0,False
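
Counting column positions by hand is easy to get wrong in a wide data frame. As a sketch, assuming a cut-down stand-in data frame (`record`, a hypothetical name), the same cell can be reached by column name instead:

```python
import pandas as pd

record = pd.DataFrame({'index': [0], 'period': ['2000-01-01'], 'consent': [False]})

# Positional indexing: row 0, column 0 (positions start at zero)
record.iloc[0, 0] = 1155

# Label-based alternative: name the column instead of counting its position
record.loc[0, 'consent'] = True

# Look up a column's position when iloc is still preferred
consent_position = record.columns.get_loc('consent')
print(consent_position)  # 2
```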


# Widgets
Widgets are interactive Python objects that have a representation in the browser. A widget is a graphical user interface element, such as a button, dropdown or textbox. Widgets can be embedded in the Notebook and provide a user-friendly interface to collect the user input and see the impact the changes have on the data/results without interacting with your code. Widgets can transform your notebooks from static documents to dynamic dashboards, ideal for showcasing your data story.

To use the widget framework, you need to import the *ipywidgets* Python package. The *ipywidgets* package provides a list of widgets commonly used in web apps and dashboards like dropdown, checkbox, radio buttons, etc.

In [9]:
#Load the 'ipywidgets' package
import ipywidgets as widgets

### `display()`

The *IPython.display* module is used to display different objects in Jupyter. 
You can also explicitly display a widget using the `display()` function from the *IPython.display* module.

In [10]:
#Load the 'IPython.display' package
from IPython.display import display

# Consent
Consent is a vital area for data protection compliance. Consent means giving data subjects genuine choice and control over how you process their data. If the data subject has no real choice, consent is not freely given, and it will be invalid. The [General Data Protection Regulation](https://eu01.alma.exlibrisgroup.com/leganto/public/44UOE_INST/citation/37632538310002466?auth=SAML) sets a high standard for consent and contains significantly more detail than previous data protection legislation. Consent is defined in Article 4 as: “Consent of the data subject means any freely given, specific, informed and unambiguous indication of the data subject’s wishes by which he or she, by a statement or by a clear affirmative action, signifies agreement to the processing of personal data relating to him or her”.

Before we collect any data, we need to get consent from the end-user to process and share the data we will collect with the data capture tool.

## Boolean widgets
Boolean widgets are designed to display a boolean value.

### Checkbox widget

In [11]:
a = widgets.Checkbox(
    value=False,
    description='I consent for the data I have provided to be processed and shared in accordance with data protection regulations with the purpose of improving care service provision across the UK.',
    disabled=False
)

In [12]:
display(a)

Checkbox(value=False, description='I consent for the data I have provided to be processed and shared in accord…

In [13]:
dfTofill.iloc[0,8]=a.value
dfTofill

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance,consent
0,1155,2000-01-01,,,0,0,0,0.0,False
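
Reading `a.value` captures the checkbox state only at the moment the cell runs. A possible alternative is to react whenever the box changes, using the widget's `observe()` method. This is a minimal sketch; `consent_box`, `consent_log` and `on_consent_change` are hypothetical names.

```python
import ipywidgets as widgets

consent_box = widgets.Checkbox(value=False, description='I consent')
consent_log = []  # records each new value of the checkbox

def on_consent_change(change):
    # 'change' describes the update; 'new' holds the updated value
    consent_log.append(change['new'])

# React only to changes of the 'value' trait
consent_box.observe(on_consent_change, names='value')

# Ticking the box in the browser has the same effect as this assignment
consent_box.value = True
print(consent_log)  # [True]
```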


# The period variable  
The period variable includes the month this activity relates to, stored as a date (1st of each month).  

#### Data type
We now need to check the data type of the period variable. The `result` object holds the output of the earlier `testData.dtypes` call, so we can look up the period column by name.

In [14]:
print(result['period'])
# String data type (stored by pandas as 'object')

object


The data type object is a string.

##### The `head()` function
The `head()` function lets you look at the top n rows of a data frame. By default, it shows the first five rows in a data frame. We can specify the number of rows we want to see in a data frame with the argument “n”. For example, look at the first row (n=1) of the test data:

In [15]:
testData.head(n=1)

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance
0,1155,2016-12-01,C82010,other,200,0,0,1.0


### DatePicker widget 
We next need to set up a DatePicker widget to collect the period data.

In [16]:
b = widgets.DatePicker(
    description='Period',
    disabled=False
)
display(b)

DatePicker(value=None, description='Period')

In [17]:
dfTofill.iloc[0,1]=b.value
dfTofill

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance,consent
0,1155,NaT,,,0,0,0,0.0,False
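
The DatePicker returns `None` until a date is chosen, which is why the period shows as `NaT` above; once a date is picked, the value is a `datetime.date`. Because period is stored as the 1st of each month, one option is to normalise the picked date before storing it. This is a sketch; `to_period_start` is a hypothetical helper.

```python
import datetime

import pandas as pd

def to_period_start(picked):
    """Return the first day of the picked month, or NaT when nothing was picked."""
    if picked is None:  # DatePicker.value is None until a date is chosen
        return pd.NaT
    return pd.Timestamp(picked).replace(day=1)

print(to_period_start(datetime.date(2016, 12, 15)))  # 2016-12-01 00:00:00
print(to_period_start(None))                         # NaT
```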


## The org_code variable
The org_code variable includes the Organisation Data Service (ODS) code for the organisation. The ODS code is a unique code created by the Organisation Data Service within [NHS Digital](https://www.digitalsocialcare.co.uk/latest-guidance/how-to-find-your-ods-code/) and used to identify organisations across health and social care. ODS codes are required to gain access to national systems such as NHSmail and the Data Security and Protection Toolkit. To find the organisation associated with a particular ODS code, you can look it up at <https://odsportal.digital.nhs.uk/Organisation/Search>. For example, the organisation associated with the ODS code 'AF003' is [Parkway health centre](https://odsportal.digital.nhs.uk/Organisation/OrganisationDetails?organisationId=132839&showOpenChildredOnly=True).

#### Data type
We now need to check the data type of the org_code variable. The `result` object holds the output of the earlier `testData.dtypes` call, so we can look up the org_code column by name.

In [18]:
print(result['org_code'])
# String data type (stored by pandas as 'object')

object


The data type object is a string.

#### Describe the test data
Here we are going to use the `describe()` method of the *pandas* data frame to calculate summary statistics for the testData data frame. With `include='all'`, it summarises both the numeric and the non-numeric columns.

In [19]:
# Summarise every column, numeric and non-numeric
testData.describe(include='all')

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance
count,11.0,11,11,11,11.0,11.0,11.0,11.0
unique,,10,10,3,,,,
top,,2018-10-01,RQM,other,,,,
freq,,2,2,5,,,,
mean,6565.545455,,,,4603.727273,309.0,774.181818,0.964545
std,3618.976329,,,,4951.508338,524.261385,1161.917365,0.059053
min,1155.0,,,,200.0,0.0,0.0,0.81
25%,3810.5,,,,376.5,0.0,0.0,0.945
50%,7243.0,,,,3197.0,8.0,6.0,1.0
75%,9585.5,,,,6855.0,297.5,1541.5,1.0


#### Applying *pandas* `unique()` function
We must first use the *pandas* package `unique()` function to get the unique Organisation data service (ODS) codes in the test data.

In [20]:
org_code=list(testData['org_code'].unique())
org_code

['C82010', 'RDZ', 'RVR', 'RQM', 'R1F', 'RE9', 'RNL', 'RJ1', 'RKB', 'NLO12']
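
With many organisations, a sorted option list is easier for the end-user to scan. A small sketch, assuming a stand-in Series (`codes`, a hypothetical name) in place of `testData['org_code']`:

```python
import pandas as pd

# Stand-in for testData['org_code'], with a repeat and a missing value
codes = pd.Series(['RQM', 'C82010', None, 'RQM', 'RDZ'])

# Drop missing values, keep the unique codes, and sort them alphabetically
options = sorted(codes.dropna().unique())
print(options)  # ['C82010', 'RDZ', 'RQM']
```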

##### The `head()` function
The `head()` function lets you look at the top n rows of a data frame. By default, it shows the first five rows in a data frame. We can specify the number of rows we want to see in a data frame with the argument “n”. For example, look at the first row (n=1) of the test data:

In [21]:
testData.head(n=1)

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance
0,1155,2016-12-01,C82010,other,200,0,0,1.0


## Selection widgets
Several widgets can be used to display single selection lists. You can specify the selectable options by passing a list.  

In [22]:
c=widgets.Select(
    options=org_code,
    value='C82010',
    rows=len(org_code),
    description='ODS code:',
    disabled=False
)
display(c)

Select(description='ODS code:', options=('C82010', 'RDZ', 'RVR', 'RQM', 'R1F', 'RE9', 'RNL', 'RJ1', 'RKB', 'NL…

In [23]:
dfTofill.iloc[0,2]=c.value
dfTofill

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance,consent
0,1155,NaT,C82010,,0,0,0,0.0,False


## The type variable
The type variable contains the department type for this activity, either  
* **1:** Emergency departments are a consultant-led 24-hour service with full resuscitation facilities and designated accommodation for the reception of accident and emergency patients,  
* **2:** Consultant-led mono speciality accident and emergency service (e.g. ophthalmology, dental) with designated accommodation for the reception of patients, or  
* **other:** Other type of A&E/minor injury activity with designated accommodation for the reception of accident and emergency patients. The department may be doctor-led or nurse-led and treats at least minor injuries and illnesses and can be routinely accessed without an appointment. A service mainly or entirely appointment-based (for example, a GP Practice or Outpatient clinic) is excluded even though it may treat a number of patients with minor illnesses or injury. Excludes NHS walk-in centres.[(National Health Service, 2020)](https://eu01.alma.exlibrisgroup.com/leganto/public/44UOE_INST/citation/37459630310002466?auth=SAML)

#### Data type
We now need to check the data type of the type variable. The `result` object holds the output of the earlier `testData.dtypes` call, so we can look up the type column by name.

In [24]:
print(result['type'])
# String data type (stored by pandas as 'object')

object


The data type object is a string.

#### Applying *pandas* `unique()` function
We must first use the *pandas* package `unique()` function to get the unique department type in the test data.

In [25]:
# Named dept_types to avoid shadowing Python's built-in type() function
dept_types=list(testData['type'].unique())
dept_types

['other', '1', '2']

##### The `head()` function
The `head()` function lets you look at the top n rows of a data frame. By default, it shows the first five rows in a data frame. We can specify the number of rows we want to see in a data frame with the argument “n”. For example, look at the first row (n=1) of the test data:

In [26]:
testData.head(n=1)

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance
0,1155,2016-12-01,C82010,other,200,0,0,1.0


### RadioButtons

In [27]:
d=widgets.RadioButtons(
    options=dept_types,
#     value='other',
    description='Type:',
    disabled=False
)
display(d)

RadioButtons(description='Type:', options=('other', '1', '2'), value='other')

In [28]:
dfTofill.iloc[0,3]=d.value
dfTofill

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance,consent
0,1155,NaT,C82010,other,0,0,0,0.0,False


# The attendances variable
The attendances variable includes the number of attendances for this department type at this organisation for this month.

#### Data type
We now need to check the data type of the attendances variable. The `result` object holds the output of the earlier `testData.dtypes` call, so we can look up the attendances column by name.

In [29]:
print(result['attendances'])

int64


##### The `head()` function
The `head()` function lets you look at the top n rows of a data frame. By default, it shows the first five rows in a data frame. We can specify the number of rows we want to see in a data frame with the argument “n”. For example, look at the first row (n=1) of the test data:

In [30]:
testData.head(n=1)

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance
0,1155,2016-12-01,C82010,other,200,0,0,1.0


## Numeric widgets
There are many widgets distributed with ipywidgets that are designed to display numeric values. Widgets exist for displaying integers and floats, both bounded and unbounded. The integer widgets share a similar naming scheme to their floating point counterparts. By replacing Float with Int in the widget name, you can find the Integer equivalent.

### IntText

In [31]:
e=widgets.IntText(
    value=0,
    description='Attendances:',
    disabled=False)
display(e)

IntText(value=0, description='Attendances:')

In [32]:
dfTofill.iloc[0,4]=e.value
dfTofill

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance,consent
0,1155,NaT,C82010,other,0,0,0,0.0,False
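
`IntText` accepts any integer, including negative numbers, which cannot be valid counts. One option is the bounded counterpart, `BoundedIntText`. This is a sketch; the upper limit here is an assumption and should be adjusted to your data.

```python
import ipywidgets as widgets

attendances_box = widgets.BoundedIntText(
    value=0,
    min=0,          # attendances cannot be negative
    max=1_000_000,  # assumed upper limit, adjust to your data
    description='Attendances:'
)

# Values outside the range are pulled back to the nearest bound
attendances_box.value = -5
print(attendances_box.value)  # 0
```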


# The breaches variable
The breaches variable includes the number of attendances that breached the four hour target.   

#### Data type
We now need to check the data type of the breaches variable. The `result` object holds the output of the earlier `testData.dtypes` call, so we can look up the breaches column by name.

In [33]:
print(result['breaches'])

int64


In [34]:
testData.head(1)

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance
0,1155,2016-12-01,C82010,other,200,0,0,1.0


### IntText

In [35]:
f=widgets.IntText(
    value=0,
    description='Breaches:',
    disabled=False)
display(f)

IntText(value=0, description='Breaches:')

In [36]:
dfTofill.iloc[0,5]=f.value
dfTofill

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance,consent
0,1155,NaT,C82010,other,0,0,0,0.0,False


# The admissions variable
The admissions variable includes the number of attendances that resulted in an admission to the hospital.[(Chris Mainey, 2021)](https://eu01.alma.exlibrisgroup.com/leganto/public/44UOE_INST/citation/37444097490002466?auth=SAML)

#### Data type
We now need to check the data type of the admissions variable. The `result` object holds the output of the earlier `testData.dtypes` call, so we can look up the admissions column by name.

In [37]:
print(result['admissions'])

int64


It is an integer variable.

##### The `head()` function
The `head()` function lets you look at the top n rows of a data frame. By default, it shows the first five rows in a data frame. We can specify the number of rows we want to see in a data frame with the argument “n”. For example, look at the first row (n=1) of the test data:

In [38]:
testData.head(n=1)

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance
0,1155,2016-12-01,C82010,other,200,0,0,1.0


### IntText

In [39]:
g=widgets.IntText(
    value=0,
    description='Admissions:',
    disabled=False)
display(g)

IntText(value=0, description='Admissions:')

In [40]:
dfTofill.iloc[0,6]=g.value
dfTofill

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance,consent
0,1155,NaT,C82010,other,0,0,0,0.0,False


# The performance variable
The performance variable was calculated as 1 - (breaches / attendances), that is, the proportion of attendances that did not breach the four-hour target.
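
The performance values shown in the test data are consistent with 1 - (breaches / attendances), rounded to two decimal places. As a quick check, this sketch recomputes the value for three rows copied from the test data above (`sample` is a hypothetical name):

```python
import pandas as pd

# Attendances and breaches for index 1155, 2059 and 8957 in the test data
sample = pd.DataFrame({'attendances': [200, 6452, 7258],
                       'breaches': [0, 360, 1374]})

# Proportion of attendances that did not breach the four-hour target
sample['performance'] = (1 - sample['breaches'] / sample['attendances']).round(2)
print(sample['performance'].tolist())  # [1.0, 0.94, 0.81]
```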

#### Data type
We now need to check the data type of the performance variable. The `result` object holds the output of the earlier `testData.dtypes` call, so we can look up the performance column by name.

In [41]:
print(result['performance'])

float64


It is a float variable.

##### The `head()` function
The `head()` function lets you look at the top n rows of a data frame. By default, it shows the first five rows in a data frame. We can specify the number of rows we want to see in a data frame with the argument “n”. For example, look at the first row (n=1) of the test data:

In [42]:
testData.head(n=1)

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance
0,1155,2016-12-01,C82010,other,200,0,0,1.0


### FloatText

In [43]:
h=widgets.FloatText(
    value=0.0,
    description='Performance:',
    disabled=False
)
display(h)

FloatText(value=0.0, description='Performance:')

In [44]:
dfTofill.iloc[0,7]=h.value
dfTofill

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance,consent
0,1155,NaT,C82010,other,0,0,0,0.0,False


# Concatenating the collected data to the CollectData data frame.   
Let us use the `concat()` function from the Python *pandas* package to append the dfTofill data frame to the CollectData data frame. The `concat()` function is used to concatenate *pandas* objects.

In [45]:
# CollectData is the first data frame
# dfTofill is the second data frame
CollectData  = pd.concat([CollectData, dfTofill])
display(CollectData)

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance,consent
0,1155,2016-12-01,C82010,other,200,0,0,1.0,True
0,1155,,C82010,other,0,0,0,0.0,False
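
Note that both rows above carry the row label 0, because `concat()` keeps each data frame's original labels. If unique row labels are preferred, one option is the `ignore_index` argument. A sketch with cut-down stand-in data frames (`collected` and `new_record` are hypothetical names):

```python
import pandas as pd

collected = pd.DataFrame({'index': [1155], 'consent': [True]})
new_record = pd.DataFrame({'index': [2059], 'consent': [False]})

# ignore_index=True renumbers the rows 0, 1, ... instead of repeating label 0
combined = pd.concat([collected, new_record], ignore_index=True)
print(combined.index.tolist())  # [0, 1]
```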


## Do you have consent to process and share the data before you save it to the working data folder?

Before we save our data to file, we must make sure we have consent to do so. The following line of code keeps only the records for which consent was given, so that data without consent are never saved.

In [46]:
CollectData=CollectData[CollectData['consent'] == True]
display(CollectData)

Unnamed: 0,index,period,org_code,type,attendances,breaches,admissions,performance,consent
0,1155,2016-12-01,C82010,other,200,0,0,1.0,True


### Saving the CollectData data frame
Saving the data collected by your data-capture tool to the working data folder:

In [47]:
CollectData.to_csv('../Data/CollectedData.csv', index=False)

The CollectData data frame is now saved to the working 'Data' folder. You need to iterate through this Notebook until you have collected all of your test data, and then save the captured test data to your 'RawData' folder.

In [48]:
#CollectData.to_csv('../RawData/CollectedDataFinal.csv', index=False)

<br>
<br>
<br>

# The user interface for your data collection tool 

In this section, you will provide a little background for your end-user: why you need their data and what you are going to do with it.
<br>

## The Box widget
The Box widget enables rich reactive layouts in the Jupyter Notebook. It provides an efficient way to lay out, align and distribute space among your widgets in a box. The HBox (horizontal layout) and VBox (vertical layout) classes are special cases of the Box widget.

<br>

### Create a reactive form for end-user
Let’s use the VBox widget to create a reactive form for our end-user. The form itself, and each row in the form is a Box widget.

<br>

In [49]:
#form=widgets.VBox([a,b,c,d,e,f,g,h])
form=widgets.VBox([a,b,e,f])
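
A form usually also needs a way to submit the values. One possible approach is to add a Button widget and register a callback with `on_click()`. This is a minimal sketch, assuming fresh widgets rather than the ones defined above; `submitted` and `on_submit` are hypothetical names.

```python
import ipywidgets as widgets

attendances_box = widgets.IntText(value=0, description='Attendances:')
breaches_box = widgets.IntText(value=0, description='Breaches:')
submit = widgets.Button(description='Submit')
submitted = []  # stores one dictionary per submitted record

def on_submit(button):
    # Copy the current widget values into a plain dictionary
    submitted.append({'attendances': attendances_box.value,
                      'breaches': breaches_box.value})

submit.on_click(on_submit)
demo_form = widgets.VBox([attendances_box, breaches_box, submit])

# Simulate an end-user filling in the form and pressing the button
attendances_box.value = 200
submit.click()
print(submitted)  # [{'attendances': 200, 'breaches': 0}]
```

In a notebook, `display(demo_form)` would show the two input boxes with the button underneath.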

# Our commitment to a maximum four-hour accident and emergency wait 


The four-hour accident and emergency waiting time target is a pledge set out in our ['Handbook to the NHS Constitution'](https://eu01.alma.exlibrisgroup.com/leganto/public/44UOE_INST/citation/37819402820002466?auth=SAML). 
Our operational standard is that at least 95% of patients attending A&E should be admitted, transferred, or discharged within four hours.[(The UK Government, 2022)](https://eu01.alma.exlibrisgroup.com/leganto/public/44UOE_INST/citation/37819402820002466?auth=SAML) This standard applies to all areas of emergency care, including attendances in trolleyed areas of an Assessment Unit as well as Emergency Departments and minor injury units. For service users that require admission to A&E, the time they wait between the doctor deciding that they should be admitted for treatment and the patient arriving on the ward is an important measure of safety. The Royal College of Emergency Medicine estimated that overcrowding and extreme delays led to 4,519 excess deaths in England in 2020/21.  In March 2022, 136,297 patients waited over four hours from decision to admission, 27% of all patients. [(The Nuffield Trust, 2022)](https://eu01.alma.exlibrisgroup.com/leganto/public/44UOE_INST/citation/37819506800002466?auth=SAML)

We want to keep our service users and NHS England safe by ensuring A&E departments provide the fastest and most appropriate care for service users as and when they need it. To do this, we need your monthly data on the number of attendances and breaches, which we will make available to you and other service managers as a benchmark against which to assess and improve your department’s performance against the 4-hour standard. We would be very grateful if you could take one minute each month to share your data with us in the form below:

In [50]:
display(form)

VBox(children=(Checkbox(value=False, description='I consent for the data I have provided to be processed and s…

Thank you for sharing your data and giving us your consent to process and share it with other service management teams across England. We will add your data to our [open data resource](https://github.com/B111333/B111333WorkingWithDataTypesAndStructuresInPythonandR_Assessment) for you to use now or in the future as a benchmark against which to assess and improve your department’s performance against the 4-hour standard. 


I hope these examples help you to improve your Python programming skills. Happy Coding!