# Diagnostic Analysis of Data on NHS GP Appointments

# Purpose

The purpose of this notebook is to analyse data on the UK public's appointments with General Practitioners (GPs) in the National Health Service (NHS). We want to move closer to understanding why patients may miss appointments and, through this understanding, enable decision makers at the NHS and in the UK government to reach decisions that will reduce the number of missed appointments.\
\
For more on the purpose of this project, please read the project's [README file](https://github.com/andreas-yiallouros/LSE_DA_Course_NHS_GP_Appointments/blob/main/README.md). 

# Our data files
To begin with, we had the following data files:\
\
Data:
- *actual_duration.csv*
- *appointments_regional.csv* 
- *national_categories.xlsx*

Metadata:
- *metadata_nhs.txt*

A copy of these files is in our [GitHub repo](https://github.com/andreas-yiallouros/LSE_DA_Course_NHS_GP_Appointments).

# What does the metadata tell us?
*metadata_nhs.txt* tells us about: 
- the quality of the data we have for our project
- collectively what's in the three data files

# A note on data quality
The most important point to note is that our data was collected from systems that were designed to help manage appointments. Our data is therefore different from what it would have been had it been collected by systems designed to enable analysis that informs policy decisions.

# What our data is about, and specific data quality issues
At the start of our project, each of our three data files had been partially cleaned, for example by removing columns that were considered irrelevant to our analysis.\
\
The data fields highlighted most prominently in the metadata file are:
1. Appointment Status
2. Healthcare Professional Type
3. Mode of Appointment
4. Time between Booking and Appointment

Below we consider what the metadata tells us about each of these four data fields. This is important because it will help us focus our initial analysis.

### (1) Appointment Status (appointment_status)
The Appointment Status for each appointment changes over time moving from (1) 'Available' to (2) 'Booked' to (3) 'Attended' or 'Did Not Attend' or 'Cancelled'.

<span style="color:purple">Question:
Do cancelled appointments revert to 'available'?</span>

##### Two data issues are noted in the metadata file:
1. For between 3% and 6% of appointments the final status of the appointment remains 'Booked'. The status of these appointments in our dataset is reported as 'Unknown'.
2. Due to an issue with data collection, missed appointments are under-reported for the period from July 2018 until and including November 2018 (five months). ***Note:*** *We will use our analysis to evaluate the impact of this issue on the reliability of the missed appointments statistics.*

<span style="color:purple">Insight: If the appointments that continue showing as 'Booked' after the appointment time has lapsed are more likely to be in fact missed appointments, the missed appointments statistic will be understated. For example, if out of 100 appointments five are known to have been missed and three are shown as unknown, but in fact we know they have also been missed, the true rate of missed appointments would be 8% instead of the reported 5.2% (1 - 92/97).</span>
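
The arithmetic in the insight above can be checked with a short sketch. The figures are the illustrative ones from the text (100 appointments, five missed, three unknown), not values taken from our dataset, and the variable names are chosen here for illustration only.

```python
# Illustrative figures from the insight above, not values from the dataset.
total = 100
known_missed = 5
unknown = 3

known_status = total - unknown          # 97 appointments with a known outcome
attended = known_status - known_missed  # 92 attended appointments

# Reported rate: appointments with an unknown status are left out.
reported_rate = 1 - attended / known_status

# Worst case: every appointment with an unknown status was in fact missed.
worst_case_rate = (known_missed + unknown) / total

print(f"Reported missed rate:   {reported_rate:.1%}")
print(f"Worst-case missed rate: {worst_case_rate:.1%}")
```

The gap between the two rates is an upper bound on the understatement, since it assumes that all 'Unknown' appointments were missed.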

### (2) Healthcare Professional (hcp_type)
The healthcare professional type data point for each appointment is based on what's recorded as the occupation of the NHS employee attending the appointment. Due to data extraction issues our data shows only the values 'GP' and 'Other Practice Staff'. We are told in the metadata that a small proportion of GP appointments (for example some of the appointments with Trainee GPs or GP Partners) may be misclassified as appointments with other practice staff.

<span style="color:purple">Questions:
- <span style="color:purple">What is the definition of 'small' in 'small proportion of GP appointments'?
- <span style="color:purple">To what extent could this data issue impact the answer to our key questions about whether there has been adequate staff and capacity, and what was the utilisation of resources?</span>

### (3) Mode of Appointment (appointment_mode)

<span style="color:purple">Insights: 
- <span style="color:purple">The data on the mode of appointments (for example face-to-face, telephone, online, or home visits) is important because it could show us differences in the likelihood of appointments being missed depending on their mode. Knowing about such differences can inform decisions about reducing the rate of missed appointments.

- <span style="color:purple">Based on the metadata file there seem to be widespread data quality issues in how the mode of appointment is recorded.

***Note:*** *We will consider the implications of these issues for the purposes of our analysis.*

### (4) Time between Booking and Appointment (time_between_book_and_appointment)
The metadata file doesn't mention quality issues for the data about the time elapsed between when the booking was made and when the appointment happened.\
\
<span style="color:purple">Insight: It may be helpful for the purposes of our project to consider the relationship between this data point and the data on missed appointments. For example, are appointments more likely to be missed when the elapsed time between booking and the appointment is longer?</span>

# What could the rest of the data fields tell us?
In this section, to further set the context for our data analysis, we consider what the metadata file tells us about the following data fields, which are the subject of the objectives for activity #2:
1. Service settings
2. Context types
3. National categories

## (1) Service settings
The metadata for 'Service settings' seems more technical compared to that for the first four fields considered above. 

In my interpretation, aided by Google searches for the terms ['GMS/PMS/APMS contracts'](https://www.bma.org.uk/advice-and-support/gp-practices/funding-and-contracts/gms-contract-and-pms-agreement-differences); ['Additional Roles Reimbursement Scheme'](https://www.england.nhs.uk/gp/expanding-our-workforce/); and ['Extended Access Provision'](https://www.england.nhs.uk/statistics/statistical-work-areas/extended-access-general-practice/), a description of the four service settings could be:
- **General Practice**: appointments delivering GP services in core contractual hours.
- **Extended Access Provision**: appointments delivering GP services outside of core contractual hours, either in the early morning, evening or at weekends.
- **Primary Care Network**: appointments delivering services by healthcare professionals other than GPs (for example 'clinical pharmacists', 'first contact physiotherapists', and 'health and wellbeing coaches')
- **Other**: appointments delivering services at the GP surgery, but by other service providers. 

## (2) Context types
'Context type' here refers to whether the appointment is an encounter with a patient or an activity which is part of patient care, but without the presence of the patient. I imagine an example of an activity which is part of patient care, but does not involve the presence of the patient might be the analysis of test results.\
\
The metadata suggests two 'Context types' plus an option to record the appointment as 'Unmapped', which would be used in case of an error in receiving the data or the data not being captured.\
\
The two 'Context types' are:
- 'Care Related Encounter'
- 'Inconsistent Mapping'

'Care Related Encounter' seems self-explanatory - an appointment attended by a patient and a healthcare provider.\
\
'Inconsistent Mapping' as explained by the metadata file refers to a recorded appointment that did not involve the attendance of a patient. The use of the term 'inconsistent' shows that the notion of a patient appointment is contradictory with the notion that the patient is not in attendance.

## (3) National categories
From the metadata this seems to relate to the reason why the patient needed the appointment. It includes 13 options, for example 'General Consultation Acute', 'General Consultation Routine', and 'Planned Clinical Procedure'.



In [1]:
# Import the relevant libraries.
# Libraries selected based on the goals of the project and the data.
# See the markdown cells above for more details.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from bs4 import BeautifulSoup
import datetime

# Creating and exploring our DataFrames

## (1) Actual Duration of Appointments

In [2]:
# Create a DataFrame for 'actual_duration.csv'.
# Choosing a longer, more descriptive name for the DataFrame.
df_actual_duration = pd.read_csv('actual_duration.csv')

In [3]:
# See the 'df_actual_duration' column names together with their data types.
# See the 'df_actual_duration' descriptive statistics.
df_actual_duration.info()
df_actual_duration.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 137793 entries, 0 to 137792
Data columns (total 8 columns):
 #   Column                     Non-Null Count   Dtype 
---  ------                     --------------   ----- 
 0   sub_icb_location_code      137793 non-null  object
 1   sub_icb_location_ons_code  137793 non-null  object
 2   sub_icb_location_name      137793 non-null  object
 3   icb_ons_code               137793 non-null  object
 4   region_ons_code            137793 non-null  object
 5   appointment_date           137793 non-null  object
 6   actual_duration            137793 non-null  object
 7   count_of_appointments      137793 non-null  int64 
dtypes: int64(1), object(7)
memory usage: 8.4+ MB


statistic,count_of_appointments
count,137793.0
mean,1219.080011
std,1546.902956
min,1.0
25%,194.0
50%,696.0
75%,1621.0
max,15400.0


In [32]:
# See the first ten rows.
df_actual_duration.head(10)

index,sub_icb_location_code,sub_icb_location_ons_code,sub_icb_location_name,icb_ons_code,region_ons_code,appointment_date,actual_duration,count_of_appointments
0,00L,E38000130,NHS North East and North Cumbria ICB - 00L,E54000050,E40000012,01-Dec-21,31-60 Minutes,364
1,00L,E38000130,NHS North East and North Cumbria ICB - 00L,E54000050,E40000012,01-Dec-21,21-30 Minutes,619
2,00L,E38000130,NHS North East and North Cumbria ICB - 00L,E54000050,E40000012,01-Dec-21,6-10 Minutes,1698
3,00L,E38000130,NHS North East and North Cumbria ICB - 00L,E54000050,E40000012,01-Dec-21,Unknown / Data Quality,1277
4,00L,E38000130,NHS North East and North Cumbria ICB - 00L,E54000050,E40000012,01-Dec-21,16-20 Minutes,730
5,00L,E38000130,NHS North East and North Cumbria ICB - 00L,E54000050,E40000012,01-Dec-21,11-15 Minutes,1073
6,00L,E38000130,NHS North East and North Cumbria ICB - 00L,E54000050,E40000012,01-Dec-21,1-5 Minutes,1539
7,00L,E38000130,NHS North East and North Cumbria ICB - 00L,E54000050,E40000012,02-Dec-21,21-30 Minutes,601
8,00L,E38000130,NHS North East and North Cumbria ICB - 00L,E54000050,E40000012,02-Dec-21,Unknown / Data Quality,1391
9,00L,E38000130,NHS North East and North Cumbria ICB - 00L,E54000050,E40000012,02-Dec-21,11-15 Minutes,1139


### Insights
* 'df_actual_duration' includes eight columns and 137,793 rows.
* It seems to capture the count of appointments together with appointment dates and duration by location.
* From a Google search 'icb' may stand for 'Integrated Care Board' and 'ons' may stand for 'Office for National Statistics'.
* Seven columns have the data type 'object' and one ('count_of_appointments') has the data type 'integer'.
* We will consider changing the data type for 'appointment_date'.
* There seem to be no missing values.
* We will investigate the possible presence of 'count_of_appointments' outliers, given the min value of 1 and the max value of 15,400 compared to the mean (1,219) and standard deviation (1,547).
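
One possible way to change the data type of 'appointment_date' is sketched below. The small `sample` DataFrame is a stand-in built from values visible in the first ten rows, and the format string `'%d-%b-%y'` is an assumption inferred from those rows only, so it would need checking against the full column.

```python
import pandas as pd

# Stand-in for df_actual_duration, using values shown in the rows above.
sample = pd.DataFrame({
    "appointment_date": ["01-Dec-21", "01-Dec-21", "02-Dec-21"],
    "count_of_appointments": [364, 619, 601],
})

# '%d-%b-%y' matches strings such as '01-Dec-21'. errors='raise' makes any
# value that does not fit the assumed format fail loudly instead of
# silently becoming NaT.
sample["appointment_date"] = pd.to_datetime(
    sample["appointment_date"], format="%d-%b-%y", errors="raise"
)

print(sample.dtypes)
print(sample["appointment_date"].min(), sample["appointment_date"].max())
```

Once the column is a datetime, the range of dates covered by the DataFrame can be read directly from its minimum and maximum.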


## (2) Regional Appointments

In [4]:
# Create a DataFrame for 'appointments_regional.csv'.
# Choosing a longer, more descriptive name for the DataFrame.
df_appointments_regional = pd.read_csv('appointments_regional.csv')

In [5]:
# See the 'df_appointments_regional' column names together with their data types.
# See the 'df_appointments_regional' descriptive statistics.
df_appointments_regional.info()
df_appointments_regional.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 596821 entries, 0 to 596820
Data columns (total 7 columns):
 #   Column                             Non-Null Count   Dtype 
---  ------                             --------------   ----- 
 0   icb_ons_code                       596821 non-null  object
 1   appointment_month                  596821 non-null  object
 2   appointment_status                 596821 non-null  object
 3   hcp_type                           596821 non-null  object
 4   appointment_mode                   596821 non-null  object
 5   time_between_book_and_appointment  596821 non-null  object
 6   count_of_appointments              596821 non-null  int64 
dtypes: int64(1), object(6)
memory usage: 31.9+ MB


statistic,count_of_appointments
count,596821.0
mean,1244.601857
std,5856.887042
min,1.0
25%,7.0
50%,47.0
75%,308.0
max,211265.0


In [33]:
# See the first ten rows.
df_appointments_regional.head(10)

index,icb_ons_code,appointment_month,appointment_status,hcp_type,appointment_mode,time_between_book_and_appointment,count_of_appointments
0,E54000034,2020-01,Attended,GP,Face-to-Face,1 Day,8107
1,E54000034,2020-01,Attended,GP,Face-to-Face,15 to 21 Days,6791
2,E54000034,2020-01,Attended,GP,Face-to-Face,2 to 7 Days,20686
3,E54000034,2020-01,Attended,GP,Face-to-Face,22 to 28 Days,4268
4,E54000034,2020-01,Attended,GP,Face-to-Face,8 to 14 Days,11971
5,E54000034,2020-01,Attended,GP,Face-to-Face,More than 28 Days,3273
6,E54000034,2020-01,Attended,GP,Face-to-Face,Same Day,64649
7,E54000034,2020-01,Attended,GP,Home Visit,1 Day,151
8,E54000034,2020-01,Attended,GP,Home Visit,15 to 21 Days,12
9,E54000034,2020-01,Attended,GP,Home Visit,2 to 7 Days,141


In [6]:
# Answer to how many appointment statuses are represented in the data.
count_apt_status = df_appointments_regional['appointment_status'].nunique()
print("Count of appointment statuses: ", count_apt_status)

Count of appointment statuses:  3


In [8]:
# Answer to what are the names of the three appointment statuses.
pd.unique(df_appointments_regional['appointment_status']).tolist()

['Attended', 'DNA', 'Unknown']

In [10]:
# Share of records (rows) per appointment status.
# Note: this counts rows; it is not weighted by 'count_of_appointments'.
df_apt_count_by_status_norm = (
    df_appointments_regional.appointment_status.value_counts(normalize=True))
df_apt_count_by_status_norm

Attended    0.388956
Unknown     0.337327
DNA         0.273717
Name: appointment_status, dtype: float64

### Insights
* 'df_appointments_regional' has seven columns and 596,821 rows.
* It seems to capture six data points by location (based on the 'icb_ons_code'): (1) the count_of_appointments together with (2) appointment_month, (3) status, (4) mode, (5) hcp ('healthcare professional type'), and (6) time_between_book_and_appointment.
* Six columns have data type 'object' and one ('count_of_appointments') has data type 'integer'.
* We will consider changing the data type for 'appointment_month' and 'time_between_book_and_appointment'.
* There seem to be no missing values.
* We will investigate the possible presence of 'count_of_appointments' outliers, given the min value of 1 and the max value of 211,265 compared to the mean (1,245) and standard deviation (5,857).
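* The status of 34% of records (rows) shows as 'Unknown' and 27% of records show as 'DNA' ('did not attend'). Each row carries its own 'count_of_appointments', so these are shares of rows and not shares of appointments. Both figures are higher than expected, but before concluding that there are data quality issues beyond those described in the metadata file, the shares need to be recalculated weighted by 'count_of_appointments'.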
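
`value_counts(normalize=True)` counts rows, whereas each row in this DataFrame represents a number of appointments. A minimal sketch of the difference follows; the `sample` DataFrame and its counts are hypothetical and chosen only to show how far the two measures can diverge.

```python
import pandas as pd

# Hypothetical rows with the same columns as df_appointments_regional.
sample = pd.DataFrame({
    "appointment_status": ["Attended", "Attended", "DNA", "DNA",
                           "Unknown", "Unknown"],
    "count_of_appointments": [8000, 6000, 300, 200, 400, 100],
})

# Share of rows: what value_counts(normalize=True) reports.
row_share = sample["appointment_status"].value_counts(normalize=True)

# Share of appointments: sum the counts per status, then divide by the total.
appointments_by_status = (
    sample.groupby("appointment_status")["count_of_appointments"].sum()
)
appointment_share = appointments_by_status / appointments_by_status.sum()

print(row_share.round(3))
print(appointment_share.round(3))
```

In this made-up sample every status accounts for a third of the rows, while 'Attended' accounts for most of the appointments. The same `groupby` pattern applied to 'df_appointments_regional' would give the shares that can be compared with the figures in the metadata.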

## (3) National Categories

In [11]:
# Create a DataFrame for 'national_categories.xlsx'.
df_national_categories = pd.read_excel('national_categories.xlsx')

In [12]:
# See the 'df_national_categories' column names together with their data types.
# See the 'df_national_categories' descriptive statistics.
df_national_categories.info()
df_national_categories.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 817394 entries, 0 to 817393
Data columns (total 8 columns):
 #   Column                 Non-Null Count   Dtype         
---  ------                 --------------   -----         
 0   appointment_date       817394 non-null  datetime64[ns]
 1   icb_ons_code           817394 non-null  object        
 2   sub_icb_location_name  817394 non-null  object        
 3   service_setting        817394 non-null  object        
 4   context_type           817394 non-null  object        
 5   national_category      817394 non-null  object        
 6   count_of_appointments  817394 non-null  int64         
 7   appointment_month      817394 non-null  object        
dtypes: datetime64[ns](1), int64(1), object(6)
memory usage: 49.9+ MB


statistic,count_of_appointments
count,817394.0
mean,362.183684
std,1084.5766
min,1.0
25%,7.0
50%,25.0
75%,128.0
max,16590.0


In [36]:
# See the first ten rows.
df_national_categories.head(10)

index,appointment_date,icb_ons_code,sub_icb_location_name,service_setting,context_type,national_category,count_of_appointments,appointment_month
0,2021-08-02,E54000050,NHS North East and North Cumbria ICB - 00L,Primary Care Network,Care Related Encounter,Patient contact during Care Home Round,3,2021-08
1,2021-08-02,E54000050,NHS North East and North Cumbria ICB - 00L,Other,Care Related Encounter,Planned Clinics,7,2021-08
2,2021-08-02,E54000050,NHS North East and North Cumbria ICB - 00L,General Practice,Care Related Encounter,Home Visit,79,2021-08
3,2021-08-02,E54000050,NHS North East and North Cumbria ICB - 00L,General Practice,Care Related Encounter,General Consultation Acute,725,2021-08
4,2021-08-02,E54000050,NHS North East and North Cumbria ICB - 00L,General Practice,Care Related Encounter,Structured Medication Review,2,2021-08
5,2021-08-02,E54000050,NHS North East and North Cumbria ICB - 00L,General Practice,Care Related Encounter,Care Home Visit,11,2021-08
6,2021-08-02,E54000050,NHS North East and North Cumbria ICB - 00L,Unmapped,Unmapped,Unmapped,372,2021-08
7,2021-08-02,E54000050,NHS North East and North Cumbria ICB - 00L,Primary Care Network,Care Related Encounter,Home Visit,4,2021-08
8,2021-08-02,E54000050,NHS North East and North Cumbria ICB - 00L,Other,Care Related Encounter,Clinical Triage,98,2021-08
9,2021-08-02,E54000050,NHS North East and North Cumbria ICB - 00L,Primary Care Network,Care Related Encounter,General Consultation Acute,35,2021-08


### Insights
* 'df_national_categories' has eight columns and 817,394 rows.
* It seems to capture six data points by location (based on the 'sub_icb_location_name'): (1) the count of appointments together with (2) appointment_date, (3) appointment_month, (4) service_setting, (5) context_type, and (6) national_category.
* Six columns have data type 'object', one ('appointment_date') has data type 'datetime64[ns]', and one ('count_of_appointments') has data type 'int64'.
* 'appointment_date' is already stored as a datetime. We will consider changing the data type for 'appointment_month'.
* There seem to be no missing values.
* We will investigate the possible presence of 'count_of_appointments' outliers, given the min value of 1 and the max value of 16,590 compared to the mean (362) and standard deviation (1,085).
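
One common way to start an outlier investigation is the interquartile range (IQR) rule. The sketch below is an assumption about how this could be done, not a method prescribed by the project: the helper `iqr_outliers`, the multiplier `k = 1.5`, and the sample values are all illustrative.

```python
import pandas as pd

def iqr_outliers(counts: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking values outside the IQR fences."""
    q1 = counts.quantile(0.25)
    q3 = counts.quantile(0.75)
    iqr = q3 - q1
    return (counts < q1 - k * iqr) | (counts > q3 + k * iqr)

# Hypothetical counts, right-skewed like the summaries above.
sample = pd.Series([1, 7, 25, 30, 128, 140, 16590],
                   name="count_of_appointments")

mask = iqr_outliers(sample)
print(sample[mask])
```

Because the counts are strongly right-skewed (the median is far below the mean in all three DataFrames), this rule may flag a large share of rows that are simply busy locations or dates. Flagged rows would need to be reviewed rather than removed automatically.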

In [13]:
# Answer how many locations are represented in the data.
count_locations = df_national_categories['sub_icb_location_name'].nunique()
print("Count of locations: ", count_locations)

Count of locations:  106


In [14]:
# Answer which are the five locations with the highest number of records.
df_locations = df_national_categories.sub_icb_location_name.value_counts()
df_locations.head()

NHS North West London ICB - W2U3Z              13007
NHS Kent and Medway ICB - 91Q                  12637
NHS Devon ICB - 15N                            12526
NHS Hampshire and Isle Of Wight ICB - D9Y0V    12171
NHS North East London ICB - A3A8R              11837
Name: sub_icb_location_name, dtype: int64

In [15]:
# Five locations with the highest number of records in percentage terms.
df_locations_norm = (
    df_national_categories.sub_icb_location_name.value_counts(normalize=True))
df_locations_norm.head()

NHS North West London ICB - W2U3Z              0.015913
NHS Kent and Medway ICB - 91Q                  0.015460
NHS Devon ICB - 15N                            0.015324
NHS Hampshire and Isle Of Wight ICB - D9Y0V    0.014890
NHS North East London ICB - A3A8R              0.014481
Name: sub_icb_location_name, dtype: float64

In [16]:
# Answer how many service settings are represented in the data.
count_service_settings = df_national_categories['service_setting'].nunique()
print("Count of service settings: ", count_service_settings)

Count of service settings:  5


In [17]:
# Answer how many context types are represented in the data.
count_context_types = df_national_categories['context_type'].nunique()
print("Count of context types: ", count_context_types)

Count of context types:  3


In [18]:
# Answer how many national categories are represented in the data.
count_national_categories = df_national_categories['national_category'].nunique()
print("Count of national categories: ", count_national_categories)

Count of national categories:  18


In [19]:
# Number of records by service setting in percentage terms.
df_service_setting_norm = (
    df_national_categories.service_setting.value_counts(normalize=True))
df_service_setting_norm

General Practice             0.439536
Primary Care Network         0.224849
Other                        0.169794
Extended Access Provision    0.132276
Unmapped                     0.033544
Name: service_setting, dtype: float64

In [20]:
# Number of records by context type in percentage terms.
df_context_type_norm = (
    df_national_categories.context_type.value_counts(normalize=True))
df_context_type_norm

Care Related Encounter    0.856969
Inconsistent Mapping      0.109487
Unmapped                  0.033544
Name: context_type, dtype: float64

In [21]:
# Number of records by national category in percentage terms.
df_nat_cat_norm = (
    df_national_categories.national_category.value_counts(normalize=True))
df_nat_cat_norm

Inconsistent Mapping                                                   0.109487
General Consultation Routine                                           0.109285
General Consultation Acute                                             0.103835
Planned Clinics                                                        0.093503
Clinical Triage                                                        0.091191
Planned Clinical Procedure                                             0.072953
Structured Medication Review                                           0.054401
Service provided by organisation external to the practice              0.052722
Home Visit                                                             0.051199
Unplanned Clinical Activity                                            0.049444
Patient contact during Care Home Round                                 0.035228
Unmapped                                                               0.033544
Care Home Visit                         

# Other questions before we move on to Activity 3
The following may help gain a deeper understanding of the data before we move on to Activity 3:
1. Is the number of locations at sub-icb level as shown by 'sub_icb_location_name' and 'sub_icb_location_code' the same in 'df_actual_duration' as it is in 'df_national_categories'?
2. Are the locations represented by the 'icb_ons_code' data consistent across the three DataFrames? 
3. Why could we not have the 'sub_icb_location' breakdown in 'df_appointments_regional'?
4. Is the count of appointments consistent across the three DataFrames?

### Locations

In [24]:
# Check whether the number of locations represented in 'df_national_categories'
# is the same as the number of locations in 'df_actual_duration'.
count_locations_ad = df_actual_duration['sub_icb_location_name'].nunique()
print("Count of locations: ", count_locations_ad)
print("Count of locations in df_ad equals Count of locations in df_nc: ",
     count_locations == count_locations_ad)

Count of locations:  106
Count of locations in df_ad equals Count of locations in df_nc:  True


In [25]:
# Answer how many locations are represented in the data for 'sub_icb_location_ons_code'.
count_sub_icb_locations_ons = df_actual_duration['sub_icb_location_ons_code'].nunique()
print("Count of sub_icb locations based on ons: ", count_sub_icb_locations_ons)

Count of sub_icb locations based on ons:  106


In [27]:
# Answer how many locations are represented in the data for 'icb_ons_code'
# in df_actual_duration.
count_icb_ons_ad = df_actual_duration['icb_ons_code'].nunique()
print("Count of icb locations based on ons in 'df_actual_duration': ", count_icb_ons_ad)

Count of icb locations based on ons in 'df_actual_duration':  42


In [28]:
# Answer how many locations are represented in the data for 'icb_ons_code'
# in df_appointments_regional.
count_icb_ons_ra = df_appointments_regional['icb_ons_code'].nunique()
print("Count of icb locations based on ons in 'df_appointments_regional': ", count_icb_ons_ra)

Count of icb locations based on ons in 'df_appointments_regional':  42


In [29]:
# Answer how many locations are represented in the data for 'icb_ons_code'
# in df_national_categories.
count_icb_ons_nc = df_national_categories['icb_ons_code'].nunique()
print("Count of icb locations based on ons in 'df_national_categories': ", count_icb_ons_nc)

Count of icb locations based on ons in 'df_national_categories':  42
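
Equal counts show that each DataFrame has 42 distinct codes, but not that they are the same 42 codes. A minimal sketch of a stricter check compares the sets of codes directly. The three small DataFrames below are hypothetical stand-ins, and 'E54000099' is an invented code used only to show what a mismatch looks like.

```python
import pandas as pd

# Hypothetical stand-ins for the three DataFrames.
df_a = pd.DataFrame({"icb_ons_code": ["E54000034", "E54000050"]})
df_b = pd.DataFrame({"icb_ons_code": ["E54000050", "E54000034", "E54000034"]})
df_c = pd.DataFrame({"icb_ons_code": ["E54000034", "E54000099"]})

# Collect the distinct codes of each DataFrame as a set.
codes = {
    name: set(df["icb_ons_code"].unique())
    for name, df in {"a": df_a, "b": df_b, "c": df_c}.items()
}

# Sets are equal only if they hold exactly the same codes.
same_a_b = codes["a"] == codes["b"]

# Set differences list the codes that appear on one side only.
only_in_a = codes["a"] - codes["c"]
only_in_c = codes["c"] - codes["a"]

print("a and b hold the same codes:", same_a_b)
print("In a but not in c:", only_in_a)
print("In c but not in a:", only_in_c)
```

In this sample, `df_a` and `df_c` both hold two distinct codes, so a count comparison alone would not reveal that they differ.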


### Timeframe

In [35]:
# Answer to what are the 'appointment month' categories.
pd.unique(df_appointments_regional['appointment_month']).tolist()

['2020-01',
 '2020-02',
 '2020-03',
 '2020-04',
 '2020-05',
 '2020-06',
 '2020-07',
 '2020-08',
 '2020-09',
 '2020-10',
 '2020-11',
 '2020-12',
 '2021-01',
 '2021-02',
 '2021-03',
 '2021-04',
 '2021-05',
 '2021-06',
 '2021-07',
 '2021-08',
 '2021-09',
 '2021-10',
 '2021-11',
 '2021-12',
 '2022-01',
 '2022-02',
 '2022-03',
 '2022-04',
 '2022-05',
 '2022-06']

### Appointments

In [31]:
# Answer to what are the 'actual_duration' categories.
pd.unique(df_actual_duration['actual_duration']).tolist()

['31-60 Minutes',
 '21-30 Minutes',
 '6-10 Minutes',
 'Unknown / Data Quality',
 '16-20 Minutes',
 '11-15 Minutes',
 '1-5 Minutes']

In [34]:
# Answer to what are the 'time_between_book_and_appointment' categories.
pd.unique(df_appointments_regional['time_between_book_and_appointment']).tolist()

['1 Day',
 '15  to 21 Days',
 '2 to 7 Days',
 '22  to 28 Days',
 '8  to 14 Days',
 'More than 28 Days',
 'Same Day',
 'Unknown / Data Quality']
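
Some of the labels above contain double spaces (for example '15  to 21 Days'), which could cause mismatches when filtering or grouping by label. The sketch below shows one way to normalise the whitespace and to give the categories a meaningful order. The order itself, with 'Unknown / Data Quality' placed last, is an assumption made for illustration.

```python
import pandas as pd

# Labels copied from the output above, including the double spaces.
labels = pd.Series([
    "1 Day", "15  to 21 Days", "2 to 7 Days", "22  to 28 Days",
    "8  to 14 Days", "More than 28 Days", "Same Day",
    "Unknown / Data Quality",
])

# Collapse any run of whitespace into a single space.
clean = labels.str.replace(r"\s+", " ", regex=True).str.strip()

# Assumed order from shortest to longest wait, unknown values last.
order = [
    "Same Day", "1 Day", "2 to 7 Days", "8 to 14 Days", "15 to 21 Days",
    "22 to 28 Days", "More than 28 Days", "Unknown / Data Quality",
]
ordered = pd.Series(pd.Categorical(clean, categories=order, ordered=True))

# Sorting now follows the category order rather than the alphabet.
print(ordered.sort_values().tolist())
```

An ordered categorical also keeps charts and grouped tables in this order without further sorting.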

In [38]:
# Calculate the sum of appointments in 'df_actual_duration'
np.sum(df_actual_duration['count_of_appointments'])

167980692

In [39]:
# Calculate the sum of appointments in 'df_appointments_regional'
np.sum(df_appointments_regional['count_of_appointments'])

742804525

In [40]:
# Calculate the sum of appointments in 'df_national_categories'
np.sum(df_national_categories['count_of_appointments'])

296046770
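
The three totals differ widely, which may partly reflect the DataFrames covering different periods: 'df_appointments_regional' starts in 2020-01, while the first rows shown for the other two are dated 2021. A fairer comparison restricts the totals to the months that the sources share. The sketch below does this for two hypothetical monthly series; the figures are invented and `monthly_total` is an illustrative helper.

```python
import pandas as pd

# Hypothetical data; only the column names mirror our DataFrames.
regional = pd.DataFrame({
    "appointment_month": ["2021-07", "2021-08", "2021-09"],
    "count_of_appointments": [100, 120, 130],
})
categories = pd.DataFrame({
    "appointment_month": ["2021-08", "2021-08", "2021-09", "2021-10"],
    "count_of_appointments": [100, 18, 130, 125],
})

def monthly_total(df: pd.DataFrame) -> pd.Series:
    """Sum 'count_of_appointments' for each 'appointment_month'."""
    return df.groupby("appointment_month")["count_of_appointments"].sum()

# join='inner' keeps only the months present in both sources.
comparison = pd.concat(
    {"regional": monthly_total(regional),
     "categories": monthly_total(categories)},
    axis=1,
    join="inner",
)
comparison["difference"] = comparison["regional"] - comparison["categories"]

print(comparison)
```

'df_actual_duration' has no 'appointment_month' column, so a month label would first have to be derived from 'appointment_date' before it could be included in the same comparison.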