# Title: Collecting data using interactive Jupyter widgets  
**Author details:** *Author:* B208593  
**Notebook and data info:** This Notebook provides an example of using interactive Jupyter widgets to collect the NHS England accident and emergency attendances and admissions (ae_attendances) data (your test data), save it to your working 'Data' folder, and finally save all the captured test data to your 'RawData' folder.  
**Data:** The data consist of date, numerical and character data from the NHSRdatasets package.  
**Copyright statement:** This Notebook is the product of The University of Edinburgh.

# Data
The data are from the NHSRdatasets package: the NHS England accident and emergency (A&E) attendances and admissions (`ae_attendances`) data.  A subset of the variables was selected using R for this data capture tool, including period, organisation code, attendances, breaches and performance. The subsetted data were divided into test and training data. The R script "./RScripts/LoadingNHSRdatasets_fulldata.R" was used to subset the full `ae_attendances` data into test and training data.


### The *pandas* package
To import the data, you will need to load the *pandas* package. The Python *pandas* package is used for data manipulation and analysis.

In [None]:
#Load the 'pandas' package
import pandas as pd
testData=pd.read_csv("../Data/ae_type1_performance_test_full.csv")
testData

#### Data type
I checked the data types in the testData data frame to decide which type of widget to use for each variable.
I used the `dtypes` attribute from the Python *pandas* package to query the data types in the testData. The `dtypes` attribute returns the data type of each column in the data frame.

In [None]:
result = testData.dtypes
print("Output:")
print(result)

In *pandas*, the `object` data type usually indicates that a column holds text (strings).
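As a rough illustration of how the data types can guide the choice of widget, the sketch below maps each column's dtype to a widget name. The helper `suggest_widget` and the small example frame are hypothetical and stand in for testData; the mapping is one reasonable choice, not a rule from *ipywidgets*.

```python
import pandas as pd


def suggest_widget(dtype):
    """Hypothetical helper: return the name of a widget suited to a dtype."""
    if pd.api.types.is_bool_dtype(dtype):
        return "Checkbox"
    if pd.api.types.is_integer_dtype(dtype):
        return "IntText"
    if pd.api.types.is_float_dtype(dtype):
        return "FloatText"
    if pd.api.types.is_datetime64_any_dtype(dtype):
        return "DatePicker"
    # Text columns (object/string dtype) fall through to a selection or text widget
    return "Select or Text"


# Made-up values standing in for one row of testData
example = pd.DataFrame({
    "org_code": ["RGT"],
    "attendances": [100],
    "performance": [0.95],
    "consent": [True],
})

suggestions = {col: suggest_widget(dtype) for col, dtype in example.dtypes.items()}
print(suggestions)
```

A date column read from a CSV file arrives as text, so it would fall into the last branch unless it is parsed as a date first.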

To collect the first row of data from the test data, the `df.head()` function was used to see the first row in the data frame (df).

##### The `head()` function
The `head()` function lets you look at the top n rows of a data frame.

In [None]:
testData.head(n=1)

An empty data frame was set up in the working data folder to collect the data captured by the Jupyter widgets. The data frame only contains the variables we would like to collect with the data capture tool.

In [None]:
dfTofill = pd.DataFrame({'index': [0],# Integer
                   'period': [pd.Timestamp('20000101')], # Date
                   'org_code': ['NA'], # String
                   'attendances': [0], # Integer
                   'breaches': [0], # Integer
                   'performance': [0.0], # Float
                   'consent': [False]}) # Boolean 

dfTofill

Save the empty data frame to your working 'Data' folder:

In [None]:
#dfTofill.to_csv('../Data/CollectedData.csv', index=False)

The empty data frame is now saved to the working 'Data' folder. Because this only needs to be done once, make sure the previous cell is commented out (Ctrl+/) after it has been run. Next, we read in the empty data frame that will collect the data from the Jupyter widgets.

In [None]:
CollectData=pd.read_csv("../Data/CollectedData.csv")
CollectData
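An alternative to commenting the save cell in and out is to write the template only when the file does not exist yet. The following is a minimal sketch of that idea; the helper `save_template_once` is hypothetical, and a temporary folder is used so that nothing in '../Data' is touched.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Same columns as the dfTofill template
template = pd.DataFrame({
    "index": [0],
    "period": [pd.Timestamp("20000101")],
    "org_code": ["NA"],
    "attendances": [0],
    "breaches": [0],
    "performance": [0.0],
    "consent": [False],
})


def save_template_once(frame, path):
    """Write frame to path only if no file exists there; return True if written."""
    path = Path(path)
    if path.exists():
        return False
    frame.to_csv(path, index=False)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "CollectedData.csv"
    first = save_template_once(template, target)   # file is created
    second = save_template_once(template, target)  # file exists, left untouched

print(first, second)
```

With this approach, rerunning the Notebook cannot overwrite data that have already been collected.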

Now let us collect the first row of data from the test data. 
Use the `df.head()` function to see the first row in the data frame (df).

##### The `head()` function
The `head()` function lets you look at the top n rows of a data frame. By default, it shows the first five rows in a data frame. We can specify the number of rows we want to see in a data frame with the argument “n”. For example, look at the first row (n=1) of the test data:

In [None]:
testData.head(n=1)

# Index variable 
The first variable contains the index number, which allows us to connect the test data to the original data set "../RawData/ae_attendances.csv". We will use indexing to add the index number to the 'dfTofill' data frame.

###  Indexing in Python
Indexing in Python is a way to refer to individual items by their position. In other words, we can directly access the elements of our choice. In Python, objects are "zero-indexed", meaning the position count starts at zero. 

In [None]:
index_number=12463 #Remember to change for each record.
dfTofill.iloc[0,0]=index_number
dfTofill
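To make zero-based indexing concrete, here is a small sketch using a plain list and a one-row data frame. The codes "AAA", "BBB" and "CCC" are placeholders, not real ODS codes, and `frame` is a cut-down stand-in for dfTofill.

```python
import pandas as pd

codes = ["AAA", "BBB", "CCC"]  # placeholder values
first_item = codes[0]          # position 0 is the first element
last_item = codes[-1]          # negative positions count from the end

frame = pd.DataFrame({"index": [0], "org_code": ["NA"]})

# .iloc selects by position: row 0, column 0 is the 'index' column
frame.iloc[0, 0] = 12463

# .loc selects by label, which avoids counting column positions
frame.loc[0, "org_code"] = "AAA"

print(first_item, last_item)
print(frame)
```

Label-based selection with `.loc` can be easier to read than `.iloc` when the column order might change.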

# Widgets
Widgets are interactive Python objects that have a representation in the browser such as a button, dropdown or textbox. Widgets can be embedded in the Notebook and provide a user-friendly interface to collect the user input and see the impact the changes have on the data/results without interacting with the code. 

To use the widget framework, you need to import the *ipywidgets* Python package. The *ipywidgets* package provides a list of widgets commonly used in web apps and dashboards like dropdown, checkbox, radio buttons, etc.

In [None]:
#Load the 'ipywidgets' package
import ipywidgets as widgets

### `display()`

The *IPython.display* package is used to display different objects in Jupyter. 
We can also explicitly display a widget using the `display()` function from the *IPython.display* package.

In [None]:
#Load the 'IPython.display' package
from IPython.display import display

# Consent
Consent is a vital area for data protection compliance. Consent means giving data subjects genuine choice and control over how you process their data. If the data subject has no real choice, consent is not freely given, and it will be invalid. 

Before we collect any data, we need to get consent from the end-user to process and share the data we will collect with the data capture tool.

## Boolean widgets
Boolean widgets are designed to display a boolean value (`True` or `False` in Python).

### Checkbox widget

In [None]:
a = widgets.Checkbox(
    value=False,
    description='I consent for the data I have provided to be processed and shared in accordance with data protection regulations with the purpose of improving care service provision across the UK.',
    disabled=False
)

In [None]:
display(a)

In [None]:
dfTofill.iloc[0,6]=a.value
dfTofill

# The period variable  
The period variable includes the month this activity relates to, stored as a date (1st of each month).  

#### Data type
We now need to check the data type of the period variable in the testData data frame. The `dtypes` attribute from the Python *pandas* package returns the data type of each column, so we look up the entry for this variable.

In [None]:
print(result.iloc[1])  # .iloc selects by position; result[1] is deprecated in recent pandas
#String data type

The data type is object.

##### The `head()` function
The `head()` function lets you look at the top n rows of a data frame. By default, it shows the first five rows in a data frame. We can specify the number of rows we want to see in a data frame with the argument “n”. For example, look at the first row (n=1) of the test data:

In [None]:
testData.head(n=1)

### DatePicker widget 
For the period variable (an object) we will use the DatePicker widget, so we need to set up a DatePicker widget to collect the period data.

In [None]:
b = widgets.DatePicker(
    description='Period',
    disabled=False
)
display(b)

In [None]:
dfTofill.iloc[0,1]=b.value
dfTofill
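The DatePicker's `value` is a `datetime.date`, or `None` if no date has been picked yet. The sketch below shows one way to guard against an empty selection and convert the value to a *pandas* timestamp before storing it. The helper `to_timestamp` is hypothetical, and `picked` stands in for `b.value`.

```python
import datetime

import pandas as pd


def to_timestamp(picked):
    """Convert a DatePicker-style value (datetime.date or None) to a Timestamp."""
    if picked is None:
        raise ValueError("No date selected yet; pick a date before storing it.")
    return pd.Timestamp(picked)


picked = datetime.date(2017, 3, 1)  # stand-in for b.value
stamp = to_timestamp(picked)
print(stamp)
```

Converting explicitly keeps the period column in the same date type as the template row.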

## The org_code variable
The org_code variable includes the Organisation Data Service (ODS) code for the organisation. The ODS code is a unique code created by the Organisation Data Service within [NHS Digital](https://www.digitalsocialcare.co.uk/latest-guidance/how-to-find-your-ods-code/), and used to identify organisations across health and social care. 

#### Data type
We now need to check the data type of the org_code variable in the testData data frame. The `dtypes` attribute from the Python *pandas* package returns the data type of each column, so we look up the entry for this variable.

In [None]:
print(result.iloc[2])  # .iloc selects by position; result[2] is deprecated in recent pandas
#String data type

The data type is `object`, which here means the column holds strings.

#### Describe the test data
Here we use the `describe()` method from the *pandas* package to calculate summary statistics for the testData data frame. With `include='all'`, it summarises every column, not only the numeric ones. The *numpy* package, the core package for scientific computing in Python, is also loaded in the next cell, although `describe()` does not require it.

In [None]:
#Load the 'numpy' package
import numpy as np
testData.describe(include='all')

#### Applying *pandas* `unique()` function
We must first use the *pandas* package `unique()` function to get the unique Organisation data service (ODS) codes in the test data.

In [None]:
org_code=list(testData['org_code'].unique())
org_code
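Two properties of `unique()` matter when the result feeds a selection widget: values are returned in order of first appearance, and the widget's initial `value` must be one of the options. The sketch below uses placeholder codes (not real ODS codes) to show sorting the options and taking the default from the list itself.

```python
import pandas as pd

# Placeholder codes standing in for testData['org_code']
example = pd.DataFrame({"org_code": ["BBB", "AAA", "BBB", "CCC", "AAA"]})

codes = list(example["org_code"].unique())  # order of first appearance
sorted_codes = sorted(codes)                # alphabetical order is easier to scan
default = sorted_codes[0]                   # guaranteed to be a valid option

print(codes)
print(sorted_codes, default)
```

Taking the default from the options list avoids an error if a hard-coded code is not present in the data.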

##### The `head()` function
The `head()` function lets you look at the top n rows of a data frame. By default, it shows the first five rows in a data frame. We can specify the number of rows we want to see in a data frame with the argument “n”. For example, look at the first row (n=1) of the test data:

In [None]:
testData.head(n=1)

## Selection widgets
Several widgets can be used to display single selection lists. You can specify the selectable options by passing a list.  

In [None]:
c=widgets.Select(
    options=org_code,
    value='RGT',
    rows=len(org_code),
    description='ODS code:',
    disabled=False
)
display(c)

In [None]:
dfTofill.iloc[0,2]=c.value
dfTofill

# The attendances variable
The attendances variable includes the number of attendances for this department type at this organisation for this month.

#### Data type
We now need to check the data type of the attendances variable in the testData data frame. The `dtypes` attribute from the Python *pandas* package returns the data type of each column, so we look up the entry for this variable.

In [None]:
print(result.iloc[3])  # .iloc selects by position; result[3] is deprecated in recent pandas

##### The `head()` function
The `head()` function lets you look at the top n rows of a data frame. By default, it shows the first five rows in a data frame. We can specify the number of rows we want to see in a data frame with the argument “n”. For example, look at the first row (n=1) of the test data:

In [None]:
testData.head(n=1)

## Numeric widgets
There are many widgets distributed with ipywidgets that are designed to display numeric values. Widgets exist for displaying integers and floats, both bounded and unbounded. The integer widgets share a similar naming scheme to their floating point counterparts. By replacing Float with Int in the widget name, you can find the Integer equivalent.

### IntText

In [None]:
e=widgets.IntText(
    value=0,
    description='Attendances:',
    disabled=False)
display(e)

In [None]:
dfTofill.iloc[0,3]=e.value
dfTofill

# The breaches variable
The breaches variable includes the number of attendances that breached the four-hour target.   

#### Data type
We now need to check the data type of the breaches variable in the testData data frame. The `dtypes` attribute from the Python *pandas* package returns the data type of each column, so we look up the entry for this variable.

In [None]:
print(result.iloc[4])  # .iloc selects by position; result[4] is deprecated in recent pandas

In [None]:
testData.head(1)

### IntText

In [None]:
f=widgets.IntText(
    value=0,
    description='Breaches:',
    disabled=False)
display(f)

In [None]:
dfTofill.iloc[0,4]=f.value
dfTofill

# The performance variable
The performance variable was calculated for the whole of England as 1 - (breaches / attendances), that is, the proportion of attendances that did not breach the four-hour target.
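A short worked example may help when checking entered values. It assumes performance is the proportion of attendances that did not breach the target, that is, 1 minus breaches divided by attendances. The function name and the numbers are made up for illustration.

```python
def performance(attendances, breaches):
    """Proportion of attendances that did not breach the target (assumed formula)."""
    if attendances <= 0:
        raise ValueError("attendances must be positive")
    return 1 - breaches / attendances


# Made-up figures: 200 attendances, 30 of which breached the target
example = performance(attendances=200, breaches=30)
print(round(example, 2))
```

Under this reading the result lies between 0 and 1 whenever breaches do not exceed attendances, which gives a quick sanity check on the value typed into the widget.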

#### Data type
We now need to check the data type of the performance variable in the testData data frame. The `dtypes` attribute from the Python *pandas* package returns the data type of each column, so we look up the entry for this variable.

In [None]:
print(result.iloc[5])  # .iloc selects by position; result[5] is deprecated in recent pandas

It is a float variable.

##### The `head()` function
The `head()` function lets you look at the top n rows of a data frame. By default, it shows the first five rows in a data frame. We can specify the number of rows we want to see in a data frame with the argument “n”. For example, look at the first row (n=1) of the test data:

In [None]:
testData.head(n=1)

### FloatText

In [None]:
h=widgets.FloatText(
    value=0.0,
    description='Performance:',
    disabled=False
)
display(h)

In [None]:
dfTofill.iloc[0,5]=h.value
dfTofill

# Concatenating the collected data to the CollectData data frame.   
Let us use the `concat()` function from the Python *pandas* package to append the CollectData and dfTofill data frames. The concat() function is used to concatenate *pandas* objects.

In [None]:
# CollectData is the first data frame
# dfTofill is the second data frame
CollectData  = pd.concat([CollectData, dfTofill])
display(CollectData)
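By default, `concat()` keeps each frame's original row labels, so repeated appends of one-row frames produce several rows labelled 0. The sketch below, using two made-up frames, shows the optional `ignore_index=True` argument, which renumbers the rows.

```python
import pandas as pd

# Made-up frames standing in for CollectData and dfTofill
collected = pd.DataFrame({"index": [101], "attendances": [50]})
new_row = pd.DataFrame({"index": [202], "attendances": [75]})

stacked = pd.concat([collected, new_row])                        # row labels 0, 0
renumbered = pd.concat([collected, new_row], ignore_index=True)  # row labels 0, 1

print(list(stacked.index))
print(list(renumbered.index))
```

Since the data are saved with `index=False`, the row labels are not written to file either way; renumbering mainly makes the displayed frame easier to read.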

## Do you have consent to process and share the data before you save it to the working data folder?

Before we save our data to file, we must make sure we have consent to do so. The following line of code keeps only the rows where consent was given, so that data without consent are never saved.

In [None]:
CollectData=CollectData[CollectData['consent'] == True]
display(CollectData)
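The consent filter is a boolean mask. As a small sketch with made-up records, a boolean column can be used as the mask directly, and counting the dropped rows shows how many records lacked consent.

```python
import pandas as pd

# Made-up records; the 'consent' column holds booleans
records = pd.DataFrame({
    "index": [101, 202, 303],
    "consent": [True, False, True],
})

consented = records[records["consent"]]  # keep rows where consent is True
dropped = len(records) - len(consented)  # rows removed by the filter

print(consented)
print(dropped)
```

Here the row with index 202 is removed because consent was not given.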

### Saving the CollectData data frame
Saving the data collected by your data-capture tool to the working data folder:

In [None]:
CollectData.to_csv('../Data/CollectedData.csv', index=False)

That is the CollectData data frame saved to the working 'Data' folder. You need to iterate through this Notebook until you have collected all of your test data and then save the captured test data to your 'RawData' folder.
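Before the final save, it may be worth checking that every test record has been collected exactly once. The sketch below compares index numbers between two made-up frames that stand in for testData and CollectData; the variable names are illustrative.

```python
import pandas as pd

# Made-up index numbers standing in for testData and CollectData
test_rows = pd.DataFrame({"index": [101, 202, 303]})
collected = pd.DataFrame({"index": [101, 303]})

# Index numbers present in the test data but not yet collected
missing = sorted(set(test_rows["index"].tolist()) - set(collected["index"].tolist()))

# Index numbers that were collected more than once
duplicates = collected["index"][collected["index"].duplicated()].tolist()

print(missing)
print(duplicates)
```

An empty `missing` list and an empty `duplicates` list would suggest the collection is complete.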

In [None]:
CollectData.to_csv('../RawData/CollectedDataFinal.csv', index=False)

That is the final CollectData data frame saved to the 'RawData' folder. 

I hope these examples help you to improve your Python programming skills. Happy Coding!