### Prepping Data Challenge: Is that the Case (week 43)

### Requirements
- Input the data
- From the Business Unit A Input, create a Date Lodged field
- Use the lookup table to update the risk rating
- Bring Business Unit A & B together
- We want to classify each case in relation to the beginning of the quarter (01/10/21, i.e. 1 October 2021):
  - Opening cases = if the case was lodged before the beginning of the quarter
  - New cases = if the case was lodged after the beginning of the quarter
- In order to count cases closed/deferred within the quarter, we want to call out cases with a completed or deferred status
- For each rating, we then want to count how many cases are within the above 4 classifications
- We then want to create a field for Cases which will carry over into the next quarter
  - i.e. Opening Cases + New Cases - Completed Cases - Deferred Cases
- Reshape the data to match the final output
- Output the data

In [1]:
import pandas as pd
import numpy as np

In [2]:
#Input the data
#From the Business Unit A Input, create a Date Lodged field
with pd.ExcelFile(r"\Dataprep\2021\2021W43-Input.xlsx") as xl:
    bur = pd.read_excel(xl, 'Risk Level')
    bua = pd.read_excel(xl, 'Business Unit A ', parse_dates={'Date lodged' : ['Month ', 'Date', 'Year']})
    bub = pd.read_excel(xl, 'Business Unit B ', parse_dates=['Date lodged'], skiprows=5) \
         .rename(columns={'Unit' : 'Business Unit '})
    bub['Date lodged'] = pd.to_datetime(bub['Date lodged'], dayfirst=True)
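
Recent pandas releases appear to deprecate combining several columns through a dict passed to `parse_dates`, so the cell above may warn or fail depending on the installed version. One possible alternative is to load the sheet as-is and assemble the date afterwards. A minimal sketch, assuming the `Month `, `Date` and `Year` columns hold numeric values; the `sample` frame is hypothetical stand-in data, not the workbook:

```python
import pandas as pd

# Hypothetical stand-in for the 'Business Unit A ' sheet (numeric month assumed)
sample = pd.DataFrame({
    'Ticket ID': [1, 2],
    'Month ': [4, 10],
    'Date': [17, 3],
    'Year': [2021, 2021],
})

# pd.to_datetime can assemble a date from columns named year/month/day
parts = sample.rename(columns={'Year': 'year', 'Month ': 'month', 'Date': 'day'})
sample['Date lodged'] = pd.to_datetime(parts[['year', 'month', 'day']])

# Drop the component columns once the combined field exists
sample = sample.drop(columns=['Month ', 'Date', 'Year'])
print(sample)
```

If the month were stored as a name rather than a number, it would need converting first; that case is not covered here.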

In [3]:
bua.head()

| | Date lodged | Ticket ID | Business Unit | Owner | Issue | Management Strategy | Status | Rating |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-04-17 | 1 | A | Jimmy Collins | Item was the wrong colour | Phosfluorescently engage worldwide methodologi... | In Progress | 2 |
| 1 | 2021-10-03 | 2 | A | Garret Hatfield | Delivery delayed | Collaboratively administrate turnkey channels ... | In Progress | 1 |
| 2 | 2021-07-15 | 3 | A | Garret Hatfield | Item didn't fit | Progressively maintain extensive infomediaries... | In Progress | 1 |
| 3 | 2021-06-23 | 4 | A | Amanda Williams | Item didn't fit | Proactively fabricate one-to-one materials via... | In Progress | 3 |
| 4 | 2021-12-21 | 5 | A | Amanda Williams | Delivery delayed | Appropriately empower dynamic leadership skill... | Deferred | 3 |


In [4]:
bub.head()

| | Ticket ID | Business Unit | Owner | Issue | Management Strategy | Date lodged | Status | Rating |
|---|---|---|---|---|---|---|---|---|
| 0 | 19 | B | Garret Hatfield | Sold the wrong product | Objectively innovate empowered manufactured pr... | 2021-09-01 | In Progress | Low |
| 1 | 20 | B | Amanda Williams | Item didn't fit | Proactively envisioned multimedia based expert... | 2021-10-03 | In Progress | Low |
| 2 | 21 | B | Jimmy Collins | Item was the wrong colour | Credibly innovate granular internal or "organi... | 2021-02-02 | Completed | Medium |
| 3 | 22 | B | Amanda Williams | Delivery delayed | Interactively procrastinate high-payoff conten... | 2021-02-06 | In Progress | High |
| 4 | 23 | B | Jimmy Collins | Delivery delayed | Globally incubate standards compliant channels... | 2021-10-16 | Completed | Medium |


In [5]:
#Use the lookup table to update the risk rating
dict_bur = dict(zip(bur['Risk level'], bur['Risk rating']))
bua['Rating'] = bua['Rating'].replace(dict_bur)
#bua.head()
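
`replace` silently leaves any value that is missing from the lookup unchanged, so a numeric rating could slip through into the combined data. A hedged sketch of a stricter variant using `Series.map`, which yields `NaN` for unknown keys; the lookup contents and the `ratings` series below are hypothetical, chosen only to mirror the column names used above:

```python
import pandas as pd

# Hypothetical lookup and ratings mirroring the shapes used above
bur_demo = pd.DataFrame({'Risk level': [1, 2, 3],
                         'Risk rating': ['Low', 'Medium', 'High']})
ratings = pd.Series([2, 1, 3, 4], name='Rating')

lookup = dict(zip(bur_demo['Risk level'], bur_demo['Risk rating']))

# map() returns NaN for keys missing from the lookup, whereas replace()
# would leave the original number in place
mapped = ratings.map(lookup)

# Collect the source values that had no match so they can be inspected
unmapped = ratings[mapped.isna()]
print(mapped.tolist(), unmapped.tolist())
```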

In [6]:
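#Bring Business Unit A & B together
#ignore_index avoids duplicate row labels coming from the two source frames
df = pd.concat([bua, bub], ignore_index=True)
#Strip stray whitespace from column names such as 'Business Unit '
df.columns = [c.strip() for c in df.columns]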

In [7]:
#Classify each case in relation to the beginning of the quarter (1 October 2021):
#  - Opening cases = lodged before the beginning of the quarter
#  - New cases = everything else, which includes cases lodged on the start date itself
start_of_quarter = pd.to_datetime("2021-10-01")
df['Case Type'] = df['Date lodged'].apply(lambda x: 'Opening cases' if x < start_of_quarter else 'New cases')
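
The same classification can also be written without a row-wise `apply`. A small sketch using `np.where` on hypothetical dates (the `lodged` series is illustrative, not taken from the workbook); note that a case lodged exactly on the quarter start falls into 'New cases', as it does in the cell above:

```python
import numpy as np
import pandas as pd

start_of_quarter = pd.Timestamp('2021-10-01')

# Hypothetical lodgement dates: before, exactly on, and after the quarter start
lodged = pd.Series(pd.to_datetime(['2021-04-17', '2021-10-01', '2021-12-21']))

# Vectorised comparison: True -> 'Opening cases', False -> 'New cases'
case_type = np.where(lodged < start_of_quarter, 'Opening cases', 'New cases')
print(list(case_type))
```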

In [8]:
# In order to count cases closed/deferred within the quarter, we want to call out cases with a completed or deferred status
# Count Completed and Deferred cases by Rating
completed_cases = df[df['Status'] == 'Completed'].groupby('Rating')['Ticket ID'].count().reset_index(name='Completed')
deferred_cases = df[df['Status'] == 'Deferred'].groupby('Rating')['Ticket ID'].count().reset_index(name='Deferred')

# For each rating, we then want to count how many cases are within the above 4 classifications
# Count New and Opening cases by Rating
new_cases = df[df['Case Type'] == 'New cases'].groupby('Rating')['Ticket ID'].count().reset_index(name='New cases')
opening_cases = df[df['Case Type'] == 'Opening cases'].groupby('Rating')['Ticket ID'].count().reset_index(name='Opening cases')
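
The four filtered group-bys could also be produced with two cross-tabulations joined on `Rating`. This is only a sketch of an alternative, not the approach used in this notebook, and the `demo` frame is hypothetical:

```python
import pandas as pd

# Hypothetical combined data with the columns used above
demo = pd.DataFrame({
    'Ticket ID': [1, 2, 3, 4, 5],
    'Rating': ['Low', 'Low', 'High', 'High', 'High'],
    'Case Type': ['Opening cases', 'New cases', 'Opening cases',
                  'Opening cases', 'New cases'],
    'Status': ['Completed', 'In Progress', 'Deferred',
               'In Progress', 'Completed'],
})

# Rows = Rating, columns = 'New cases' / 'Opening cases'
by_type = pd.crosstab(demo['Rating'], demo['Case Type'])

# Keep only the statuses that close a case, then count them per rating
closed = demo[demo['Status'].isin(['Completed', 'Deferred'])]
by_status = pd.crosstab(closed['Rating'], closed['Status'])

# Outer join keeps ratings with no closed cases; fill those gaps with 0
counts = by_type.join(by_status, how='outer').fillna(0).astype(int)
print(counts)
```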

In [9]:
# - We then want to create a field for Cases which will carry over into the next quarter
#  - i.e. Opening Cases + New Cases - Completed Cases - Deferred Cases
total_cases = pd.merge(new_cases, opening_cases, on='Rating', how='outer').fillna(0)
closed_cases = pd.merge(completed_cases, deferred_cases, on='Rating', how='outer').fillna(0)

continuing_cases = pd.merge(total_cases, closed_cases, on='Rating', how='outer').fillna(0)
continuing_cases['Continuing'] = (
    continuing_cases['New cases'] + continuing_cases['Opening cases'] - continuing_cases['Completed'] - continuing_cases['Deferred'])
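
As a quick sanity check of the carry-over formula, the counts for the High and Low ratings from the output shown further down can be plugged in by hand. The `check` frame is just a scratch table for this illustration:

```python
import pandas as pd

# Counts copied from the High and Low rows of the notebook's output table
check = pd.DataFrame({
    'Rating': ['High', 'Low'],
    'Opening cases': [6, 9],
    'New cases': [1, 3],
    'Completed': [1, 4],
    'Deferred': [1, 0],
})

# Opening + New - Completed - Deferred
check['Continuing'] = (check['Opening cases'] + check['New cases']
                       - check['Completed'] - check['Deferred'])
print(check[['Rating', 'Continuing']])
```

This gives 6 + 1 - 1 - 1 = 5 for High and 9 + 3 - 4 - 0 = 8 for Low, matching the Continuing rows in the output.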

In [11]:
# Reshape the data to match the final output
output = continuing_cases.melt(
    id_vars=['Rating'],
    value_vars=['Completed', 'Deferred', 'New cases', 'Opening cases', 'Continuing'],
    var_name='Status',
    value_name='Cases'
)


output = output.sort_values(by=['Rating', 'Status']).reset_index(drop=True)
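
Sorting on `Status` as plain text orders the rows alphabetically. If the expected output lists the statuses in a logical order instead (an assumption, since the target layout is not shown here), an ordered categorical is one way to control it. The `status_order` list and the `demo_out` frame are hypothetical:

```python
import pandas as pd

# Assumed display order; adjust to whatever the expected output uses
status_order = ['Opening cases', 'New cases', 'Completed', 'Deferred', 'Continuing']

demo_out = pd.DataFrame({
    'Rating': ['Low'] * 5,
    'Status': ['Completed', 'Continuing', 'Deferred', 'New cases', 'Opening cases'],
    'Cases': [4, 8, 0, 3, 9],
})

# An ordered categorical sorts by the category order, not alphabetically
demo_out['Status'] = pd.Categorical(demo_out['Status'],
                                    categories=status_order, ordered=True)
demo_out = demo_out.sort_values(['Rating', 'Status']).reset_index(drop=True)
print(demo_out)
```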

In [12]:
output

| | Rating | Status | Cases |
|---|---|---|---|
| 0 | High | Completed | 1.0 |
| 1 | High | Continuing | 5.0 |
| 2 | High | Deferred | 1.0 |
| 3 | High | New cases | 1.0 |
| 4 | High | Opening cases | 6.0 |
| 5 | Low | Completed | 4.0 |
| 6 | Low | Continuing | 8.0 |
| 7 | Low | Deferred | 0.0 |
| 8 | Low | New cases | 3.0 |
| 9 | Low | Opening cases | 9.0 |


In [13]:
#output the data
output.to_csv('wk43-output.csv', index=False)