## Analysis of KPIs

We have 10 unique KPIs in the dataset, each with a specified range. Based on the names and ranges of the KPIs, we can infer whether each KPI should ideally be maximized or minimized. Here’s a preliminary classification:

### KPI Classifications:
- **Share of teams constituted as circles** – *Maximized*  
  A higher percentage is generally considered positive.
- **Share of short-term leave** – *Minimized*  
  A lower percentage indicates fewer short-term leaves.
- **Involuntary headcount change (FTE)** – *Minimized*  
  A lower percentage indicates less involuntary turnover.
- **Reachability** – *Maximized*  
  Assumed to measure communication or network reach, so a higher value is better.
- **Count sessions on .projuventute.ch** – *Maximized*  
  More sessions indicate higher website engagement.
- **Count leads** – *Maximized*  
  More leads are generally better for business.
- **Net promoter score** – *Maximized*  
  A higher score indicates better customer loyalty.
- **Private donations** – *Maximized*  
  More donations are beneficial.
- **Additional monetization/savings from CRM** – *Maximized*  
  Higher values indicate increased efficiency or revenue.
- **Additional monetization/savings from programs** – *Maximized*  
  Higher values indicate increased efficiency or revenue.

### Next Steps:
1. **Set Targets:**  
   Derive targets from historical data, taking into account whether each KPI is maximized or minimized.
2. **Evaluate KPI Values:**  
   Compare the observed values against these targets to assess performance.
3. **Ensure Targets are Within Range:**  
   Confirm that each target lies within the specified KPI range so that it is achievable and realistic.
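
The range check in step 3 can be sketched in plain Python. This is a minimal illustration, assuming the range strings have the two forms seen in the dataset (`0 <= % <= 100` and `0 <= X`); `parse_range` and `target_in_range` are hypothetical helper names, not part of the notebook.

```python
def parse_range(spec):
    """Parse '0 <= % <= 100' or '0 <= X' into (lower, upper).

    upper is None when the range has no upper bound.
    Assumes the bounds are separated by '<=' exactly as in the dataset.
    """
    parts = [p.strip() for p in spec.split('<=')]
    lower = float(parts[0])
    upper = float(parts[2]) if len(parts) == 3 else None
    return lower, upper


def target_in_range(target, spec):
    """Return True if target lies inside the range described by spec."""
    lower, upper = parse_range(spec)
    return target >= lower and (upper is None or target <= upper)


# A percentage KPI must stay between 0 and 100; a count only needs to be non-negative.
print(target_in_range(46.25, '0 <= % <= 100'))
print(target_in_range(120.0, '0 <= % <= 100'))
print(target_in_range(1950639.0, '0 <= X'))
```

A target that fails the check could be clipped to the nearest bound or flagged for review, depending on the business rule.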

In [1]:
import pandas as pd
import numpy as np
import plotly.graph_objs as go
import plotly.express as px
from plotly.subplots import make_subplots
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

# Forecast the next `periods` months with a linear trend (ordinary least squares)
def forecastNextMonths(kpiData, periods=4):
    """
    Forecast the next `periods` months using linear regression on the row position.
    Assumes kpiData is sorted by date and contains one row per month.
    Args:
        kpiData (pandas.DataFrame): Input data with 'date' and 'value' columns
        periods (int): The number of months to forecast (default: 4)
    Returns:
        pandas.DataFrame: The forecast, with 'date' and 'value' columns
    """
    # Use the row position (0, 1, 2, ...) as the time index
    X = np.arange(len(kpiData)).reshape(-1, 1)
    y = kpiData['value'].values
    model = LinearRegression()
    model.fit(X, y)
    # Positions immediately following the observed data
    futureMonths = np.arange(len(kpiData), len(kpiData) + periods).reshape(-1, 1)
    forecasts = model.predict(futureMonths)
    # Month-start dates after the last observed date
    last_date = kpiData['date'].iloc[-1]
    future_dates = pd.date_range(start=last_date, periods=periods+1, freq='MS')[1:]
    forecastDf = pd.DataFrame({'date': future_dates, 'value': forecasts})
    return forecastDf

def createLinePlot(kpiData, kpi_name, targetDf):
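    """
    Build the plot traces (actual values, forecast, target) for one KPI
    Args:
        kpiData (pandas.DataFrame): The input data with 'date' and 'value' columns
        kpi_name (str): The name of the KPI
        targetDf (pandas.DataFrame): The target data with 'date' and 'target' columns
    Returns:
        list of plotly.graph_objs.Scatter: [traceActual, traceForecast, traceTarget]
    """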
    traceActual = go.Scatter(
        x=kpiData['date'],
        y=kpiData['value'],
        mode='lines+markers',
        name=f'{kpi_name} Value',
        line=dict(color='blue',width=1,dash='dot')
    )
    
    forecastDf = forecastNextMonths(kpiData,4)
    traceForecast = go.Scatter(
        x=forecastDf['date'],
        y=forecastDf['value'],
        mode='lines+markers',
        name=f'{kpi_name} Forecast',
        line=dict(color='green', dash='dot',width=1)
    )
    
    traceTarget = go.Scatter(
        x=targetDf['date'],
        y=targetDf['target'],
        mode='lines',
        name=f'{kpi_name} Target',
        line=dict(color='red', dash='dot',width=1)
        )
    
    return [traceActual, traceForecast, traceTarget]
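

The linear-trend forecast in `forecastNextMonths` above fits `value = intercept + slope * t`, where `t` is the row position, and extrapolates to the next positions. The same arithmetic can be sketched without scikit-learn, which may help when checking a forecast by hand. `linear_forecast` is a hypothetical helper and assumes evenly spaced monthly observations.

```python
def linear_forecast(values, periods=3):
    """Extrapolate a least-squares line through (0, y0), (1, y1), ... by `periods` steps."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t_mean = (n - 1) / 2                      # mean of 0, 1, ..., n-1
    y_mean = sum(values) / n
    # Closed-form ordinary least squares for a single predictor
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    # Predict at the positions that follow the observed data
    return [intercept + slope * t for t in range(n, n + periods)]


# A perfectly linear series continues along the same line.
print(linear_forecast([1.0, 2.0, 3.0, 4.0], periods=3))
```

With only a few monthly points per KPI, a straight line is sensitive to single outliers, so the forecast is best read as a rough trend indicator.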



## Pre-processing

In [2]:
# Load your data
df = pd.read_csv('/Users/diana/Dropbox/_hackathon/deploy_2023/_data/pj_sample_value.csv')

# Preprocessing: Creating a new date column combining period_year and period_month
df['date'] = pd.to_datetime(df['period_year'].astype(str) + '-' + df['period_month'].astype(str) + '-01', format='%Y-%m-%d')

# Sort by the new date column
df = df.sort_values(by='date')
df.drop(['period_year', 'period_month'], axis=1, inplace=True)

# Clean the dataset: drop rows with missing values, unify the range labels, and fix a typo in a KPI name
df.dropna(inplace=True)
df.replace('0 <= X <= 100', '0 <= % <= 100', inplace=True)
df.replace('share short tern leave', 'share short term leave', inplace=True)
df.head(5)

| Unnamed: 0 | circle | kpi | periodicity | range | value | date |
|---|---|---|---|---|---|---|
| 0 | HR | share of teams constituted as circles | month | 0 <= % <= 100 | 35.0 | 2023-01-01 |
| 12 | HR | share short term leave | month | 0 <= % <= 100 | 2.04 | 2023-01-01 |
| 40 | Programs - Parents -Online | count sessions on .projuventute.ch | month | 0 <= X | 158611.0 | 2023-01-01 |
| 65 | Fundraising | private donations | month | 0 <= X | 1369218.0 | 2023-01-01 |
| 24 | HR | involuntary headcount change (FTE) | month | 0 <= % <= 100 | 2.26 | 2023-01-01 |


## Define KPI conditions:
- Desired trend of each KPI: upward (maximize) or downward (minimize)
- Set targets: (a) based on historical data, or (b) as absolute targets (e.g. 0 or 100)
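
A history-based target can be sketched as a moving percentile: the 75th percentile for KPIs that should go up and the 25th for those that should go down. The sketch below re-implements this in plain Python with linear interpolation between sorted values, which is the default interpolation in pandas. `quantile_linear`, `rolling_quantile`, and `rolling_target` are hypothetical names, and the 6-month window is an assumption.

```python
import math


def quantile_linear(values, q):
    """Quantile with linear interpolation between the two nearest sorted values."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def rolling_quantile(values, q, window=6):
    """Quantile over the trailing window; shorter windows are used at the start."""
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        out.append(quantile_linear(values[start:i + 1], q))
    return out


def rolling_target(values, direction, window=6):
    """Pick the 75th percentile for 'upward' KPIs and the 25th for 'downward' ones."""
    q = 0.75 if direction == 'upward' else 0.25
    return rolling_quantile(values, q, window)


# Two months of a percentage KPI that should increase
print(rolling_target([35.0, 50.0], 'upward'))
```

Because the window starts with a single observation, the first target always equals the first value; early targets therefore say little about performance.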

In [3]:
# Define the desired direction for each KPI: upward (maximize) or downward (minimize)
df['trend_target'] = 'upward'
downwardKpis = ['share short term leave', 'involuntary headcount change (FTE)']
df.loc[df['kpi'].isin(downwardKpis), 'trend_target'] = 'downward'

# Setting targets: adjust this part according to specific needs and business logic.
# Both options below are executed, so Option 2 overwrites Option 1; comment out the one not wanted.
# Option 1: Set targets based on the full history (for example, the 75th percentile)
df['target'] = df.groupby('kpi')['value'].transform(lambda x: x.quantile(0.75))
# Option 2: Use the 6-month moving 75th percentile for each KPI (min_periods=1 avoids NaN in the first months)
df['target'] = df.groupby('kpi')['value'].transform(lambda x: x.rolling(window=6, min_periods=1).quantile(0.75))

# Set targets for KPIs with a downward trend and a bounded percentage range
selection = (df.trend_target=='downward')&(df.range=='0 <= % <= 100')
# Option 1: Set target to 0 (overwritten by Option 2 below)
df.loc[selection,'target'] = 0
# Option 2: Set target to the 6-month moving 25th percentile
df.loc[selection,'target'] = df.groupby('kpi')['value'].transform(lambda x: x.rolling(window=6, min_periods=1).quantile(0.25)).loc[selection]

# Apply a min-max scaler (0 to 100) to each KPI's values and, separately, to its targets
df['value_norm'] = df.groupby('kpi')['value'].transform(
    lambda x: MinMaxScaler(feature_range=(0, 100)).fit_transform(x.values.reshape(-1, 1)).flatten()
    if np.count_nonzero(~np.isnan(x)) > 0 else x)
df['target_norm'] = df.groupby('kpi')['target'].transform(
    lambda x: MinMaxScaler(feature_range=(0, 100)).fit_transform(x.values.reshape(-1, 1)).flatten()
    if np.count_nonzero(~np.isnan(x)) > 0 else x)

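# Difference between normalized value and normalized target (value minus target)
# Note: value_norm and target_norm are scaled independently of each other
df['delta_norm'] = (df['value_norm'] - df['target_norm'])
df['delta_mean'] = df.groupby('kpi')['delta_norm'].transform('mean')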

df.head(10)

| Unnamed: 0 | circle | kpi | periodicity | range | value | date | trend_target | target | value_norm | target_norm | delta_norm | delta_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HR | share of teams constituted as circles | month | 0 <= % <= 100 | 35.0 | 2023-01-01 | upward | 35.0 | 0.0 | 0.0 | 0.0 | 5.973193 |
| 12 | HR | share short term leave | month | 0 <= % <= 100 | 2.04 | 2023-01-01 | downward | 2.04 | 46.835443 | 0.0 | 46.83544 | -4.904104 |
| 40 | Programs - Parents -Online | count sessions on .projuventute.ch | month | 0 <= X | 158611.0 | 2023-01-01 | upward | 158611.0 | 0.0 | 0.0 | 0.0 | -29.114078 |
| 65 | Fundraising | private donations | month | 0 <= X | 1369218.0 | 2023-01-01 | upward | 1369218.0 | 25.701673 | 0.0 | 25.70167 | -28.866233 |
| 24 | HR | involuntary headcount change (FTE) | month | 0 <= % <= 100 | 2.26 | 2023-01-01 | downward | 2.26 | 100.0 | 100.0 | 1.421085e-14 | 3.934124 |
| 25 | HR | involuntary headcount change (FTE) | month | 0 <= % <= 100 | 0.98 | 2023-02-01 | downward | 1.3 | 43.362832 | 54.982415 | -11.61958 | 3.934124 |
| 41 | Programs - Parents -Online | count sessions on .projuventute.ch | month | 0 <= X | 203755.0 | 2023-02-01 | upward | 192469.0 | 100.0 | 87.836769 | 12.16323 | -29.114078 |
| 1 | HR | share of teams constituted as circles | month | 0 <= % <= 100 | 50.0 | 2023-02-01 | upward | 46.25 | 27.272727 | 23.076923 | 4.195804 | 5.973193 |
| 13 | HR | share short term leave | month | 0 <= % <= 100 | 2.2 | 2023-02-01 | downward | 2.08 | 53.586498 | 29.090909 | 24.49559 | -4.904104 |
| 66 | Fundraising | private donations | month | 0 <= X | 2144446.0 | 2023-02-01 | upward | 1950639.0 | 100.0 | 87.43929 | 12.56071 | -28.866233 |
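

In the output above, the value and the target of each KPI are scaled with separate minima and maxima. As a result, a row where value and target are equal (share short term leave in 2023-01, both 2.04) can still show a non-zero `delta_norm` (46.83544). One possible alternative, offered here as an assumption and not as the notebook's method, is to scale both series with the minimum and maximum of the KPI's values. `scale_with` is a hypothetical helper, and the third data point in each list is made up for illustration.

```python
def scale_with(reference, xs, lo=0.0, hi=100.0):
    """Min-max scale xs to [lo, hi] using the min and max of `reference`."""
    rmin, rmax = min(reference), max(reference)
    if rmax == rmin:
        # Degenerate case: no spread in the reference series
        return [lo for _ in xs]
    return [lo + (x - rmin) * (hi - lo) / (rmax - rmin) for x in xs]


values = [2.04, 2.2, 1.9]      # first two from the table, third is illustrative
targets = [2.04, 2.08, 1.97]   # first two from the table, third is illustrative

value_norm = scale_with(values, values)
target_norm = scale_with(values, targets)   # same scale as the values
delta_norm = [v - t for v, t in zip(value_norm, target_norm)]
print(delta_norm)
```

With a shared scale, the first delta is zero because value and target coincide, and the sign of every delta matches the sign of `value - target`. Targets outside the observed value range map outside 0 to 100, which may or may not be acceptable for the dashboard.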


In [8]:
# Forecast a single KPI with linear regression (example)
kpi = 'share short term leave'
kpiDF = df[df['kpi'] == kpi].dropna(subset=['value'])[['date','value']].reset_index(drop=True)

forecastDf = forecastNextMonths(kpiDF,3)
# Prepend the last actual data point so the forecast line connects to the actuals
forecastDf = pd.concat([kpiDF.iloc[[-1]], forecastDf], ignore_index=True)

kpiDF['status'] = 'actual'
forecastDf['status'] = 'predicted'

# plot_df = pd.concat([kpiDF, forecastDf],ignore_index=True)
# targetDf = df.loc[df.kpi==kpi,['date','target']]
# traces = createLinePlot(kpiDF, kpi, targetDf)

# Determining the grid size for the subplots
numKpis = len(df['kpi'].unique())
cols = int(np.ceil(np.sqrt(numKpis)))
rows = int(np.ceil(numKpis / cols))

# Create subplots in a grid
fig = make_subplots(rows=rows, cols=cols, 
                    subplot_titles=df['kpi'].unique(), 
                    shared_xaxes=False, shared_yaxes=False)

# Adding traces for each KPI
for index, kpi in enumerate(df['kpi'].unique()):
    kpiDF = df[df['kpi'] == kpi].dropna(subset=['value'])
    targetDf = df.loc[df.kpi==kpi,['date','target']]#.rename(columns={'moving_percentile_75':'target'}) # or target
    traces = createLinePlot(kpiDF, kpi, targetDf)
    
    # Fill the grid row by row; plotly rows and columns are 1-based
    row = index // cols + 1
    col = index % cols + 1
    
    for trace in traces:
        fig.add_trace(trace, row=row, col=col)

# Updating layout and reducing the font size of subplot titles
fig.update_layout(height=300*rows, 
                  title_text="KPIs with Targets and Forecasts", 
                  showlegend=False,
                  # subplot_titles=[{'text': title, 'font': {'size': 10}} for title in df['kpi'].unique()]
                  )
fig.update_xaxes(tickangle=45, tickfont=dict(size=8))
fig.update_yaxes(tickfont=dict(size=8))
fig.update_annotations(font=dict(family="Helvetica", size=12))

# Show plot
fig.show()
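

The grid arithmetic used for the subplots above can be checked in isolation. A minimal sketch, assuming the grid is filled row by row with 1-based positions; `grid_shape` and `grid_position` are hypothetical helper names.

```python
import math


def grid_shape(n):
    """Near-square grid with enough cells for n subplots: returns (rows, cols)."""
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    return rows, cols


def grid_position(index, cols):
    """1-based (row, col) of the subplot with 0-based index, filling row by row."""
    return index // cols + 1, index % cols + 1


rows, cols = grid_shape(10)   # 10 KPIs, as in this dataset
print(rows, cols)
print([grid_position(i, cols) for i in range(10)])
```

For 10 KPIs this yields a grid with 12 cells, so the last two cells of the final row stay empty.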