## Preliminaries

Import necessary functions

In [69]:
from datetime import datetime as dt
import math
import pandas as pd

## Define weights

Create a dictionary object containing the weights for each factor. The default weight is 10 but can range between 0 and 100.

Primary factors should have a higher weight.

In [70]:
fweights = {'r_made': 20, 'r_acc': 20, 'r_com': 20, 'acc_r': 20, 'com_r': 20, 's_url': 10, 's_mail': 10, 's_tel': 10}

In [71]:
w = fweights

## Import data

Import the service data as a `pandas` dataframe.

In [72]:
df = pd.read_csv("./data/dummydatav001-ar-19dec22.csv")
df

Unnamed: 0,SID,ServiceStatus,ServiceName,ServiceAddress,ServiceEmail,ServiceWeb,ServiceTelephone,DateLastUpdated,rmade,raccepted,rcompleted
0,1,active,a,0,0,0,0,10/06/2022,279,201,174
1,2,active,b,1,1,1,0,17/08/2021,369,347,254
2,3,active,c,1,0,0,0,04/02/2021,209,164,101
3,4,active,d,1,1,1,1,26/09/2021,295,211,146
4,5,active,a,1,0,0,0,23/05/2021,443,362,267
...,...,...,...,...,...,...,...,...,...,...,...
245,246,inactive,d,0,1,1,1,08/06/2021,283,218,145
246,247,inactive,a,0,0,1,1,19/05/2021,471,464,434
247,248,inactive,a,1,0,0,1,21/11/2021,92,47,30
248,249,inactive,b,0,0,0,0,28/03/2021,195,110,67


In [73]:
# Create derived factors
    
df['acc_r'] = df['raccepted'] / df['rmade']
df['com_r'] = df['rcompleted'] / df['raccepted']

## Create population summaries function

Create a function that calculates statistical summaries of relavant service factors i.e., median number of referrals for all service providers.

In [74]:
def statsum(df):
    
    sdata = df
    
    r_made_p50 = sdata['rmade'].median()
    r_acc_p50 = sdata['raccepted'].median()
    r_com_p50 = sdata['rcompleted'].median()
    acc_r_p50 = sdata['acc_r'].median()
    com_r_p50 = sdata['com_r'].median()
    
    summaries = {'r_made_p50': r_made_p50, 'r_acc_p50': r_acc_p50, 'r_com_p50': r_com_p50, 'acc_r_p50': acc_r_p50, 'com_r_p50': com_r_p50}
    return summaries

In [75]:
summaries = statsum(df)
summaries

{'r_made_p50': 254.0,
 'r_acc_p50': 202.5,
 'r_com_p50': 162.0,
 'acc_r_p50': 0.8056638142845038,
 'com_r_p50': 0.7897727272727273}

In [76]:
pop_sum = summaries

In [77]:
pop_sum['r_made_p50']

254.0

## Create metric normalisation function

Consider whether this is better off performed in the ri_score() function.

In [78]:
def met_t(factor, factor_p50):
    
    fac_t = 1 / (1 + math.exp(-0.01*(factor - factor_p50)))
    
    return fac_t

In [79]:
# Transform metric variables to normalised versions

df['r_made'] = ""
df['r_acc'] = ""
df['r_com'] = ""

for i in df.index:
    df['r_made'][i] = met_t(df['rmade'][i], pop_sum['r_made_p50'])
    df['r_acc'][i] = met_t(df['raccepted'][i], pop_sum['r_acc_p50'])
    df['r_com'][i] = met_t(df['rcompleted'][i], pop_sum['r_com_p50'])

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['r_made'][i] = met_t(df['rmade'][i], pop_sum['r_made_p50'])
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['r_acc'][i] = met_t(df['raccepted'][i], pop_sum['r_acc_p50'])
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['r_com'][i] = met_t(df['rcompleted'][i], pop_sum['r_com_p50'])


In [80]:
df[['r_made', 'r_acc', 'r_com']]

Unnamed: 0,r_made,r_acc,r_com
0,0.562177,0.49625,0.529964
1,0.759511,0.809228,0.715042
2,0.389361,0.404922,0.352059
3,0.601088,0.521237,0.460085
4,0.868756,0.831318,0.740775
...,...,...,...
245,0.571996,0.538673,0.457602
246,0.897523,0.931821,0.938197
247,0.165205,0.174365,0.210818
248,0.356635,0.28394,0.278885


## Handle missing values of primary factors

Replace with median value for all services. However also need an indicator of missingness, otherwise a new service will look better than a lower scoring, longer-running service.

In [81]:
## Handle missing values

df.loc[df['r_made'].isna(), 'r_made_miss'] = 1
df.loc[df['r_acc'].isna(), 'r_acc_miss'] = 1
df.loc[df['r_com'].isna(), 'r_com_miss'] = 1
df.loc[df['acc_r'].isna(), 'acc_r_miss'] = 1
df.loc[df['com_r'].isna(), 'com_r_miss'] = 1

df.loc[df['r_made'].isna(), 'r_made'] = pop_sum['r_made_p50']
df.loc[df['r_acc'].isna(), 'r_acc'] = pop_sum['r_acc_p50']
df.loc[df['r_com'].isna(), 'r_com'] = pop_sum['r_com_p50']
df.loc[df['acc_r'].isna(), 'acc_r'] = pop_sum['acc_r_p50']
df.loc[df['com_r'].isna(), 'com_r'] = pop_sum['com_r_p50']

In [66]:
## Calculate reliability index
    
df['ri_score'] = "" # initialise as empty column

# If service is inactive, ri_score = 0

df.loc[df['ServiceStatus']=='inactive', 'ri_score'] = 0


    
df['ri_score'] = 1000 * (((df['r_made'] * w['r_made']) + (df['r_acc'] * w['r_acc']) + (df['r_com'] * w['r_com']) 
                         + (df['acc_r'] * w['acc_r']) + (df['com_r'] * w['com_r']) + (df['ServiceEmail'] * w['s_mail'])
                         + (df['ServiceWeb'] * w['s_url']) + (df['ServiceTelephone'] * w['s_tel']))
                         / sum(w.values())) 

In [67]:
df['ri_score']

0      488.383442
1      762.484405
2       391.82833
3      690.708858
4      614.703639
          ...    
245     692.88099
246    875.080472
247    338.393213
248     321.94667
249    351.979935
Name: ri_score, Length: 250, dtype: object

In [68]:
df.to_csv("./data/testdata.csv")

## Create RI function

Define a function that calculates a Reliability Index score for service providers.

TASK: how should I loop over service providers: within the function or outside?

In [None]:
def ri_score(df, summaries, fweights):
    df = df
    pop_sum = summaries
    w = fweights
    
    # Transform metric variables to normalised versions
    
    df['r_made'] = met_t(df['rmade'], pop_sum['r_made_p50'])
    
    ## Calculate reliability index
    
    df['ri_score'] = "" # initialise as empty column
    
    df['ri_score'] = (df['r_made'] * w['r_made']) + 
    
    s_res = {'service_id': s_id, 'service_name': s_name, 'reliability_index': ri_score}
    
    return s_res

In [None]:
ri_results = ri_score(fweights)
ri_results