# CLIPBOARD HEALTH SALES ANALYSIS

## Business Understanding
### Overview
Clipboard Health is a nationwide staffing platform that specializes in staffing long-term care facilities. The Centers for Medicare & Medicaid Services (CMS) publishes a quarterly Payroll Based Journal (PBJ) report containing daily staffing data for all registered nursing homes in the US. My task is to use the PBJ data, and any other CMS data I see fit, to make a few recommendations to the Clipboard Health sales leadership team, using Q1 2024 data.

### Objectives
The objective of this analysis is to evaluate the staffing data to identify trends, inefficiencies, and opportunities that can be leveraged to improve Clipboard Health’s staffing strategies and sales approach. 

### Analytical questions

1. Staffing Trends: What are the current staffing trends in nursing homes for Q1 2024? How do these trends vary by state, city, or county?

2. Workforce Allocation: What is the distribution of staffing hours between full-time employees and contractors? Are there any significant disparities that need to be addressed?

3. Geographical Analysis: Are there specific regions where staffing levels are consistently high or low? How can Clipboard Health tailor its offerings based on these regional differences?

4. Operational Efficiency: Are there any patterns in staffing hours that suggest inefficiencies or potential areas for improvement? How can Clipboard Health’s solutions address these inefficiencies?

5. Competitive Positioning: How do the staffing levels at facilities serviced by Clipboard Health compare to those serviced by competitors? What can be learned to enhance Clipboard Health’s market positioning and sales strategy?
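As a rough illustration of how question 2 could be approached, the sketch below computes the contractor share of CNA hours per state. It runs on a tiny made-up sample rather than the real file; the column names (`STATE`, `Hrs_CNA_emp`, `Hrs_CNA_ctr`) come from the PBJ dataset, while `sample`, `by_state`, and `ctr_share` are names assumed here for illustration.

```python
import pandas as pd

# Tiny made-up sample using PBJ-style column names; the values are illustrative only.
sample = pd.DataFrame({
    "STATE": ["AL", "AL", "TX", "TX"],
    "Hrs_CNA_emp": [150.0, 140.0, 90.0, 110.0],
    "Hrs_CNA_ctr": [0.0, 10.0, 30.0, 20.0],
})

# Sum employee and contractor hours per state.
by_state = sample.groupby("STATE")[["Hrs_CNA_emp", "Hrs_CNA_ctr"]].sum()

# Contractor share = contractor hours / (employee + contractor hours).
# A state with zero total hours would yield NaN here rather than raise an error.
total_hours = by_state["Hrs_CNA_emp"] + by_state["Hrs_CNA_ctr"]
by_state["ctr_share"] = by_state["Hrs_CNA_ctr"] / total_hours

print(by_state)
```

The same pattern should carry over to the other role columns (for example `Hrs_RN_emp` and `Hrs_RN_ctr`) and to grouping by city or county.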

## Data Understanding

In [1]:
#importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [5]:
#Loading my dataset
data = pd.read_csv('Dataset/PBJ_Daily_Nurse_Staffing_Q1_2024.csv', encoding='latin1')
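If pandas reports mixed types while loading, one possible remedy is to pass an explicit `dtype` for the identifier column so that provider numbers are always read as strings. The sketch below demonstrates this on a small in-memory CSV; the two provider numbers are made up for illustration, and whether the real file needs this treatment is an assumption.

```python
import io
import pandas as pd

# Made-up CSV text standing in for the real file; the provider IDs are hypothetical.
csv_text = (
    "PROVNUM,STATE,WorkDate\n"
    "015009,AL,20240101\n"
    "01A193,AL,20240101\n"
)

# Reading PROVNUM as str keeps leading zeros and avoids a mixed int/str column.
sample = pd.read_csv(io.StringIO(csv_text), dtype={"PROVNUM": str})

print(sample["PROVNUM"].tolist())
```

The same `dtype={"PROVNUM": str}` argument can be added to the `pd.read_csv` call for the full dataset.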



In [6]:
#Displaying the first few rows of the data
data.head(5)

| | PROVNUM | PROVNAME | CITY | STATE | COUNTY_NAME | COUNTY_FIPS | CY_Qtr | WorkDate | MDScensus | Hrs_RNDON | ... | Hrs_LPN_ctr | Hrs_CNA | Hrs_CNA_emp | Hrs_CNA_ctr | Hrs_NAtrn | Hrs_NAtrn_emp | Hrs_NAtrn_ctr | Hrs_MedAide | Hrs_MedAide_emp | Hrs_MedAide_ctr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15009 | BURNS NURSING HOME, INC. | RUSSELLVILLE | AL | Franklin | 59 | 2024Q1 | 20240101 | 50 | 8.0 | ... | 0.0 | 156.34 | 156.34 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 15009 | BURNS NURSING HOME, INC. | RUSSELLVILLE | AL | Franklin | 59 | 2024Q1 | 20240102 | 49 | 8.0 | ... | 0.0 | 149.4 | 149.4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 15009 | BURNS NURSING HOME, INC. | RUSSELLVILLE | AL | Franklin | 59 | 2024Q1 | 20240103 | 49 | 8.0 | ... | 0.0 | 147.15 | 147.15 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 15009 | BURNS NURSING HOME, INC. | RUSSELLVILLE | AL | Franklin | 59 | 2024Q1 | 20240104 | 50 | 8.0 | ... | 0.0 | 142.21 | 142.21 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 15009 | BURNS NURSING HOME, INC. | RUSSELLVILLE | AL | Franklin | 59 | 2024Q1 | 20240105 | 51 | 8.0 | ... | 0.0 | 149.4 | 149.4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [7]:
#Checking datatypes of the columns in my data
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1330966 entries, 0 to 1330965
Data columns (total 33 columns):
 #   Column            Non-Null Count    Dtype  
---  ------            --------------    -----  
 0   PROVNUM           1330966 non-null  object 
 1   PROVNAME          1330966 non-null  object 
 2   CITY              1330966 non-null  object 
 3   STATE             1330966 non-null  object 
 4   COUNTY_NAME       1330966 non-null  object 
 5   COUNTY_FIPS       1330966 non-null  int64  
 6   CY_Qtr            1330966 non-null  object 
 7   WorkDate          1330966 non-null  int64  
 8   MDScensus         1330966 non-null  int64  
 9   Hrs_RNDON         1330966 non-null  float64
 10  Hrs_RNDON_emp     1330966 non-null  float64
 11  Hrs_RNDON_ctr     1330966 non-null  float64
 12  Hrs_RNadmin       1330966 non-null  float64
 13  Hrs_RNadmin_emp   1330966 non-null  float64
 14  Hrs_RNadmin_ctr   1330966 non-null  float64
 15  Hrs_RN            1330966 non-null  float64
 16  

In [8]:
#checking for null values
data.isna().sum()

PROVNUM             0
PROVNAME            0
CITY                0
STATE               0
COUNTY_NAME         0
COUNTY_FIPS         0
CY_Qtr              0
WorkDate            0
MDScensus           0
Hrs_RNDON           0
Hrs_RNDON_emp       0
Hrs_RNDON_ctr       0
Hrs_RNadmin         0
Hrs_RNadmin_emp     0
Hrs_RNadmin_ctr     0
Hrs_RN              0
Hrs_RN_emp          0
Hrs_RN_ctr          0
Hrs_LPNadmin        0
Hrs_LPNadmin_emp    0
Hrs_LPNadmin_ctr    0
Hrs_LPN             0
Hrs_LPN_emp         0
Hrs_LPN_ctr         0
Hrs_CNA             0
Hrs_CNA_emp         0
Hrs_CNA_ctr         0
Hrs_NAtrn           0
Hrs_NAtrn_emp       0
Hrs_NAtrn_ctr       0
Hrs_MedAide         0
Hrs_MedAide_emp     0
Hrs_MedAide_ctr     0
dtype: int64

In [9]:
#checking if there are any duplicates in my data
data.duplicated()

0          False
1          False
2          False
3          False
4          False
           ...  
1330961    False
1330962    False
1330963    False
1330964    False
1330965    False
Length: 1330966, dtype: bool
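The boolean Series above is truncated, so it does not by itself show whether any row is flagged. A minimal sketch of a more conclusive check is to count the flagged rows, both for fully identical rows and for repeats of a facility/date key. The sample below is made up, and treating `PROVNUM` plus `WorkDate` as the natural key is an assumption about the data.

```python
import pandas as pd

# Made-up sample: the last two rows share a facility and date but differ in census.
sample = pd.DataFrame({
    "PROVNUM": ["15009", "15009", "15009"],
    "WorkDate": [20240101, 20240102, 20240102],
    "MDScensus": [50, 49, 48],
})

# Count rows that are exact copies of an earlier row across all columns.
n_full_dupes = int(sample.duplicated().sum())

# Count rows that repeat an earlier (PROVNUM, WorkDate) pair (assumed key).
n_key_dupes = int(sample.duplicated(subset=["PROVNUM", "WorkDate"]).sum())

print(n_full_dupes, n_key_dupes)
```

Applied to the full dataset, `data.duplicated().sum()` returns a single number that is easier to interpret than the truncated Series.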

#### Observations
- There are no null values in the data
- The WorkDate column is stored as an integer (int64), so I have to convert it to datetime for filtering and easier analysis
- There are 33 columns and 1,330,966 rows
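The WorkDate conversion noted above could look like the sketch below, which assumes the integers follow the YYYYMMDD pattern seen in the first rows (for example 20240101). It uses a small made-up sample; `sample` and `january` are names introduced here for illustration.

```python
import pandas as pd

# Made-up sample with WorkDate stored as integers in YYYYMMDD form.
sample = pd.DataFrame({"WorkDate": [20240101, 20240102, 20240215]})

# Cast to string first so the explicit format can be parsed digit by digit.
sample["WorkDate"] = pd.to_datetime(sample["WorkDate"].astype(str), format="%Y%m%d")

# Once converted, date-based filtering becomes straightforward.
january = sample[sample["WorkDate"].dt.month == 1]

print(sample.dtypes)
print(january)
```

Passing an explicit `format` avoids the integers being interpreted as epoch offsets and makes malformed values fail loudly.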