# Can you find a better way to segment your customers?

## 📖 Motivation (Kyra) 
You work for a medical device manufacturer in Switzerland. Your company manufactures orthopedic devices and sells them worldwide. The company sells directly to individual doctors, who use the devices to treat rehabilitation and physical therapy patients.

Historically, the sales and customer support departments have grouped doctors by geography. However, a doctor's region is not a good predictor of how many purchases they will make or of their support needs.

Your team wants to use a data-centric approach to segmenting doctors to improve marketing, customer service, and product planning. 

# Appendix

## Data Cleaning & Wrangling

### Setup

```python
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
import pandas as pd

import statsmodels.api as sm
```

### Import Data

```python
# Read in all four CSV files at once
doctors = pd.read_csv('data/doctors.csv')
orders = pd.read_csv('data/orders.csv')
complaints = pd.read_csv('data/complaints.csv')
instructions = pd.read_csv('data/instructions.csv')
```

```python
def clean_satisfaction(sat):
    """Convert a satisfaction score to float; the placeholder '--' becomes NaN."""
    if sat == '--':
        sat = np.nan
    else:
        sat = float(sat)
    return sat

doctors['Satisfaction'] = doctors['Satisfaction'].apply(clean_satisfaction)

def transform_rank(rank):
    """Map a doctor's rank name to an ordinal value from 1 (Silver) to 9 (Ambassador).

    Unrecognized or missing ranks become NaN.
    """
    if rank == 'Ambassador':
        num_rank = 9
    elif rank == 'Titanium Plus':
        num_rank = 8
    elif rank == 'Titanium':
        num_rank = 7
    elif rank == 'Platinum Plus':
        num_rank = 6
    elif rank == 'Platinum':
        num_rank = 5
    elif rank == 'Gold Plus':
        num_rank = 4
    elif rank == 'Gold':
        num_rank = 3
    elif rank == 'Silver Plus':
        num_rank = 2
    elif rank == 'Silver':
        num_rank = 1
    else:
        num_rank = np.nan
    return num_rank

def conv_cat_to_num(cat):
    """Return 1 for a Specialist, 0 for a General Practitioner, NaN otherwise."""
    if cat == 'Specialist':
        cat = 1
    elif cat == 'General Practitioner':
        cat = 0
    else:
        cat = np.nan
    return cat

# Apply the conversions to the doctors DataFrame
doctors['Rank'] = doctors['Rank'].apply(transform_rank)
doctors['Category'] = doctors['Category'].apply(conv_cat_to_num)
```
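The three `apply` conversions above can also be written in vectorized form. The sketch below assumes that `--` is the only non-numeric placeholder in the satisfaction column, as `clean_satisfaction` implies. It passes `na_values=['--']` to `pd.read_csv` so the column parses straight to floats, and it uses `Series.map` with dicts for the rank and category lookups. The sample rows and the `D1` to `D4` IDs are invented for illustration; they are not taken from `data/doctors.csv`.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample rows standing in for data/doctors.csv (values invented)
csv_text = """DoctorID,Satisfaction,Rank,Category
D1,53.85,Ambassador,Specialist
D2,--,Silver Plus,General Practitioner
D3,100,Gold,Specialist
D4,--,Unknown Rank,Other
"""

# na_values adds '--' to the strings pandas already treats as missing,
# so Satisfaction is parsed directly as float64 with NaN placeholders.
docs = pd.read_csv(io.StringIO(csv_text), na_values=['--'])

# Ordinal encoding of the ranks, 1 (Silver) to 9 (Ambassador)
RANK_ORDER = {
    'Silver': 1, 'Silver Plus': 2,
    'Gold': 3, 'Gold Plus': 4,
    'Platinum': 5, 'Platinum Plus': 6,
    'Titanium': 7, 'Titanium Plus': 8,
    'Ambassador': 9,
}

# Series.map returns NaN for any value missing from the dict,
# mirroring the else-branches of transform_rank and conv_cat_to_num.
docs['Rank'] = docs['Rank'].map(RANK_ORDER)
docs['Category'] = docs['Category'].map({'Specialist': 1, 'General Practitioner': 0})

print(docs)
```

There is one behavioral difference. `float()` in `clean_satisfaction` raises on any other stray string, whereas `read_csv` would silently leave such a column as `object` dtype. For that reason, checking `docs.dtypes` after loading is worthwhile.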



```python
# Count orders per doctor: value_counts returns a Series indexed by DoctorID
ords_per_doc = orders['DoctorID'].value_counts()
ords_per_doc = pd.DataFrame(ords_per_doc)
ords_per_doc.index.name = 'DoctorID'
ords_per_doc.columns = ['Orders']
ords_per_doc.reset_index(inplace=True)
```
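The order count can be written more compactly. This is a sketch on a made-up order log, where the IDs `D1` to `D3` are hypothetical. `rename_axis` pins the index name, and `Series.reset_index(name=...)` names the count column. As a result, the output has the same `DoctorID`/`Orders` layout regardless of what the installed pandas version calls the `value_counts` result.

```python
import pandas as pd

# Hypothetical order log: one row per order (IDs invented for illustration)
orders = pd.DataFrame({'DoctorID': ['D1', 'D2', 'D2', 'D3']})

# value_counts -> Series of counts indexed by doctor;
# rename_axis fixes the index name, reset_index turns it into two columns.
ords_per_doc = (
    orders['DoctorID']
    .value_counts()
    .rename_axis('DoctorID')
    .reset_index(name='Orders')
)
print(ords_per_doc)
```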


```python
# Sum the complaint quantities (Qty) for every doctor with at least one complaint
doc_IDs = list(complaints['DoctorID'].unique())
comp_per_doc = pd.DataFrame({'DoctorID': doc_IDs})
comp_per_doc['Total Complaints'] = 0

for ID in doc_IDs:
    total_comp = complaints.loc[complaints['DoctorID'] == ID, 'Qty'].sum()
    comp_per_doc.loc[comp_per_doc['DoctorID'] == ID, 'Total Complaints'] = total_comp
```
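The loop above computes a grouped sum. The sketch below uses toy data and hypothetical IDs. `groupby` followed by `sum` produces the same per-doctor totals in one pass, instead of filtering the complaints table once per doctor.

```python
import pandas as pd

# Hypothetical complaint log; Qty is the number of complaints in each record
complaints = pd.DataFrame({
    'DoctorID': ['D1', 'D2', 'D1', 'D3', 'D2'],
    'Qty': [1, 2, 3, 1, 1],
})

# Split rows by DoctorID, add up Qty within each group,
# and keep DoctorID as a regular column for the later merge.
comp_per_doc = (
    complaints.groupby('DoctorID', as_index=False)['Qty']
    .sum()
    .rename(columns={'Qty': 'Total Complaints'})
)
print(comp_per_doc)
```

Unlike `unique()`, `groupby` sorts the doctors by ID. This makes no difference to the subsequent merge on `DoctorID`.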



```python
def instr_conv_to_number(str_in):
    """Convert a 'Yes'/'No' instructions flag to 1/0; anything else becomes NaN."""
    if str_in == 'Yes':
        result = 1
    elif str_in == 'No':
        result = 0
    else:
        result = np.nan
    return result

instructions['Instructions'] = instructions['Instructions'].apply(instr_conv_to_number)
```

```python
# Left-join complaint totals, order counts and the instructions flag onto the
# doctors table, so every doctor is kept even without matching rows
doc_merged = doctors.merge(comp_per_doc, how='left', on='DoctorID')
doc_merged = doc_merged.merge(ords_per_doc, how='left', on='DoctorID')
doc_merged = doc_merged.merge(instructions, how='left', on='DoctorID')

# Keep only the columns used in the analysis
doc_merged = doc_merged[['DoctorID',
                         'Satisfaction',
                         'Category',
                         'Incidence rate',
                         'R rate',
                         'Experience',
                         'Purchases',
                         'Total Complaints',
                         'Orders',
                         'Instructions']]
```
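A left join keeps every doctor. A doctor with no matching complaint or order record gets NaN in `Total Complaints` or `Orders`, which is where the NaN values in the `doc_merged` output come from. The sketch below shows the same join pattern on toy tables, with all values hypothetical. It adds pandas' `validate` argument as a guard, on the assumption that each table has at most one row per `DoctorID`. If that assumption fails, pandas raises `pandas.errors.MergeError` rather than silently multiplying doctor rows.

```python
import pandas as pd

# Hypothetical miniature versions of the tables being joined
doctors = pd.DataFrame({'DoctorID': ['D1', 'D2', 'D3'], 'Purchases': [10.0, 20.0, 30.0]})
comp_per_doc = pd.DataFrame({'DoctorID': ['D1'], 'Total Complaints': [2]})
ords_per_doc = pd.DataFrame({'DoctorID': ['D1', 'D3'], 'Orders': [1, 2]})

# validate='one_to_one' raises MergeError if DoctorID is duplicated on
# either side, instead of silently duplicating doctor rows.
merged = (
    doctors
    .merge(comp_per_doc, how='left', on='DoctorID', validate='one_to_one')
    .merge(ords_per_doc, how='left', on='DoctorID', validate='one_to_one')
)

# Every doctor survives; those absent from a right-hand table get NaN there.
print(merged)
print(merged.isna().sum())
```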

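```python
doc_merged
```

| | DoctorID | Satisfaction | Category | Incidence rate | R rate | Experience | Purchases | Total Complaints | Orders | Instructions |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AHDCBA | 53.85 | 1 | 49.00 | 0.90 | 1.20 | 49.0 | NaN | NaN | 1.0 |
| 1 | ABHAHF | 100.00 | 0 | 37.00 | 0.00 | 0.00 | 38.0 | NaN | NaN | NaN |
| 2 | FDHFJ | NaN | 1 | 33.00 | 1.53 | 0.00 | 34.0 | NaN | NaN | NaN |
| 3 | BJJHCA | NaN | 1 | 28.00 | 2.03 | 0.48 | 29.0 | NaN | NaN | NaN |
| 4 | FJBEA | 76.79 | 1 | 23.00 | 0.96 | 0.75 | 24.0 | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 432 | AIABDJ | 11.76 | 1 | 2.18 | 0.80 | 0.77 | 35.0 | 2.0 | 1.0 | 1.0 |
| 433 | BBAJCF | NaN | 1 | 2.17 | 1.68 | 0.11 | 19.0 | 1.0 | 2.0 | NaN |
| 434 | GGCFB | NaN | 1 | 2.14 | 0.77 | 0.27 | 22.0 | NaN | NaN | 1.0 |
| 435 | FDCEG | 100.00 | 1 | 2.13 | 0.84 | 0.32 | 25.0 | NaN | NaN | NaN |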


<img src="images/mike.PNG" width="200">