# Sales Manager conversion rates

### Loading the data


In [1]:
import pandas as pd

#Read the data
sales = pd.read_csv("best_salesman_homework.csv", encoding="ISO-8859-1")
sales = sales.reset_index()
sales.columns

Index(['index', 'client_account_id', 'date', 'event_name', 'manager_id',
       'manager_nickname'],
      dtype='object')

In [2]:
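# Note: without parentheses this displays the bound method (and with it the frame's repr);
# sales.info() would print the column summary instead.
sales.info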

<bound method DataFrame.info of       index  client_account_id        date   event_name  manager_id  \
0         0                  0  2022-05-09  first_touch         1.0   
1         1                  1  2022-03-21  first_touch         3.0   
2         2                  2  2022-04-18  first_touch         2.0   
3         3                  3  2022-02-07  first_touch         2.0   
4         4                  4  2022-04-08  first_touch         1.0   
...     ...                ...         ...          ...         ...   
3178   3178               2982  2021-05-17  first_touch         3.0   
3179   3179               2983  2022-03-24  first_touch         3.0   
3180   3180               2984  2022-05-09  first_touch         1.0   
3181   3181               2985  2022-03-22  first_touch         2.0   
3182   3182               2986  2021-11-21  first_touch         3.0   

     manager_nickname  
0       Justin Beiber  
1           Joe Biden  
2        Kylie Jenner  
3        Kylie Jenn

In [3]:
sales['client_account_id'].value_counts()

1175    2
1329    2
2751    2
2393    2
2390    2
       ..
1027    1
1028    1
1029    1
1030    1
2986    1
Name: client_account_id, Length: 2987, dtype: int64

In [4]:
sales['event_name'].unique()

array(['first_touch', 'deal'], dtype=object)

### Exploring the fields

What we can see in this dataset (I also opened it in Excel) is that each sales manager makes many calls, and some of those calls lead to actual sales, but the sale may be closed by a different sales manager. The pipeline has only two steps: the first call (`first_touch`) and the actual sale (`deal`).

The main issue is that two different sales managers can work with the same customer. So we can compute two different metrics.

1. Group performance. Sales overall. Compute the group conversion rate: the deals recorded for each manager divided by that manager's first calls, regardless of the customer.
2. Individual performance. Count each manager's calls that led to a sale (whoever closed it) and divide by all calls made by that manager. This is probably the more important metric, since it counts the golden calls that cause sales.
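To check how often a customer is actually shared between managers, one option is to count distinct managers per client. This is a minimal sketch on made-up toy data (the `toy` frame and its values are assumptions for illustration, not rows from the homework CSV); only the column names follow the dataset.

```python
import pandas as pd

# Toy data, invented for illustration; column names follow the dataset.
toy = pd.DataFrame({
    "client_account_id": [0, 0, 1, 2, 2],
    "event_name": ["first_touch", "deal", "first_touch", "first_touch", "deal"],
    "manager_nickname": ["A", "B", "A", "B", "B"],
})

# Number of distinct managers that appear for each client.
managers_per_client = toy.groupby("client_account_id")["manager_nickname"].nunique()

# Clients handled by more than one manager.
shared_clients = managers_per_client[managers_per_client > 1].index.tolist()
print(shared_clients)
```

Running the same two lines on `sales` instead of `toy` would show how many deals changed hands between the first call and the sale.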

### Group performance

First of all, we will count all calls and deals per sales manager.

In [98]:
s = sales.groupby(['manager_nickname','event_name'])['client_account_id'].count()
display(s)

manager_nickname  event_name 
Joe Biden         deal             91
                  first_touch    1158
Justin Beiber     deal             37
                  first_touch     890
Kylie Jenner      deal             68
                  first_touch     939
Name: client_account_id, dtype: int64

In [78]:
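def rate(df):
    # Deals divided by first touches within one manager's rows.
    # count() is applied per column, so the same ratio is repeated in every column.
    return df[df.event_name == 'deal'].count() / df[df.event_name == 'first_touch'].count()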

sales_calls = sales.groupby(['manager_nickname']).apply(rate)
display(sales_calls)

                     index  client_account_id      date  event_name  manager_id  manager_nickname
manager_nickname
Joe Biden         0.078584           0.078584  0.078584    0.078584    0.078584          0.078584
Justin Beiber     0.041573           0.041573  0.041573    0.041573    0.041573          0.041573
Kylie Jenner      0.072417           0.072417  0.072417    0.072417    0.072417          0.072417


As we can see, Joe Biden's conversion rate is a little higher than the others' (about 7.9% versus 7.2% for Kylie Jenner and 4.2% for Justin Beiber), but let's perform another calculation to get a fuller picture.
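The group rate can also be computed as one number per manager rather than one per column. The sketch below uses `pd.crosstab` on made-up toy data (the `toy` frame and the manager names `A` and `B` are assumptions for illustration); it is an alternative formulation, not the calculation used in this notebook.

```python
import pandas as pd

# Toy data, invented for illustration; column names follow the dataset.
toy = pd.DataFrame({
    "manager_nickname": ["A", "A", "A", "A", "A", "B", "B", "B"],
    "event_name": ["first_touch"] * 4 + ["deal"] + ["first_touch"] * 2 + ["deal"],
})

# Rows: managers, columns: event types, cells: number of events.
counts = pd.crosstab(toy["manager_nickname"], toy["event_name"])

# Group conversion rate: deals divided by first touches, one value per manager.
group_rate = counts["deal"] / counts["first_touch"]
print(group_rate)
```

The result is a Series indexed by manager, which is easier to sort or plot than a frame with the same ratio repeated in every column.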

### Individual performance

Here we first need to find all the calls that led to deals.

In [79]:
res = {}
for index, row in sales.iterrows():
    if row['event_name'] == 'deal':
        # Assumes the row directly above a deal is that client's first_touch;
        # credit the manager who made that call.
        caller = sales['manager_nickname'].iloc[index - 1]
        res[caller] = res.get(caller, 0) + 1
print(res)

{'Justin Beiber': 53, 'Joe Biden': 78, 'Kylie Jenner': 65}
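The loop above depends on each `deal` row sitting directly below the matching `first_touch` row. A sketch that matches on `client_account_id` instead, and so does not depend on row order, could look like this. The toy data is made up for illustration, and the sketch assumes each client has exactly one `first_touch` row.

```python
import pandas as pd

# Toy data, invented for illustration; column names follow the dataset.
toy = pd.DataFrame({
    "client_account_id": [0, 1, 1, 2, 3, 3, 4],
    "event_name": ["first_touch", "first_touch", "deal", "first_touch",
                   "first_touch", "deal", "first_touch"],
    "manager_nickname": ["A", "A", "B", "B", "B", "B", "A"],
})

first_touches = toy[toy["event_name"] == "first_touch"]
deals = toy[toy["event_name"] == "deal"]

# First touches whose client also appears in a deal row ("golden calls").
converting = first_touches[
    first_touches["client_account_id"].isin(deals["client_account_id"])
]

golden_calls = converting["manager_nickname"].value_counts()
calls = first_touches["manager_nickname"].value_counts()

# Individual conversion rate; managers with no golden calls get 0.
individual_rate = (golden_calls / calls).fillna(0)
print(individual_rate)
```

In the toy data, manager `A` gets credit for client 1 even though `B` closed the deal, which is the behaviour the individual metric is meant to capture.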


Now we will find the conversion rate for each sales manager, using the golden calls counted above.

In [89]:
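def coeff(df):
    # Golden calls (from res) divided by this manager's first touches.
    # count() is applied per column, so the ratio repeats in every column.
    return res[df.manager_nickname.values[0]] / df[df.event_name == 'first_touch'].count()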

sales_coeffs = sales.groupby(['manager_nickname']).apply(coeff)
display(sales_coeffs)


                     index  client_account_id      date  event_name  manager_id  manager_nickname
manager_nickname
Joe Biden         0.067358           0.067358  0.067358    0.067358    0.067358          0.067358
Justin Beiber     0.059551           0.059551  0.059551    0.059551    0.059551          0.059551
Kylie Jenner      0.069223           0.069223  0.069223    0.069223    0.069223          0.069223


Now we can see that Kylie's rate is a little higher than Joe's (about 6.9% versus 6.7%), even though she officially made fewer calls and closed fewer sales. Biden will be disappointed. We will not tell him.