# Creating synthetic Total Rewards data with the Python Faker library

Composition of geographical data: 
- unique 4 digits employee number, starting at 1000
- unique names
- tenure information (start date & termination date)
- department information (sample: HR, Finance, Marketing, Sales, IT, Customer Service, Legal, Project Management)
- hierarchy level (79% individual contributor, 16% manager/director, 5% senior leadership) per this [ratio of individual contributors/managers/directors](https://ravio.com/blog/effective-management-structures-how-to-know-if-your-company-is-too-top-heavy#)
- base salary
- car allowance
- % of equity

In [1]:
# importing libraries
from faker import Faker
import pandas as pd
import numpy as np
import random

# creating a Faker instance in Canada
fake = Faker(locale='en_CA')

# importing dynamic provider for weighted choices
from faker.providers import DynamicProvider
from random import choices

# importing datetime to convert strings into dates if necessary
from datetime import datetime

In [2]:
# creating the list of hierarchy levels with their weights
hierarchy_pool = ["Individual contributor","Manager/Director",
                  "Senior Leadership"]
h_wts=[0.79,0.16,0.05]

# instancing the dynamic provider
hierarchy_level = DynamicProvider(provider_name="level",
                                  elements=choices(hierarchy_pool,
                                                   weights=h_wts,
                                    k=len(hierarchy_pool)))

# adding the new provider to the Faker instance
fake.add_provider(hierarchy_level)

### Rules for base salary, car allowances and equity %

**Base salary**
- If level is individual contributor, then base is $50k-$90k
- If level is manager/director, then base is $91k-$180k
- If level is senior leadership, then base is >$181k

**Car allowances**
- If level is individual contributor, then allowance is none
- If level is manager/director, then allowance is $1.8k-$3.6k
- If level is senior leadership, then allowance is $4.8k+

**Equity %**
- If level is individual contributor, then equity is none
- If level is manager/director, then equity is 2-5%
- If level is senior leadership, then equity is 6-10%

In [5]:
# creating a function to generate employee records
def create_employees(num_employees):
    employee_list = []
    for i in range(0, num_employees):
        employee = {}
# employee personal info
        employee['ee#'] = 1000+i
        employee['employee_name'] = fake.unique.name()
# employee job info        
        start_date = fake.date()
        employee['start_date'] = start_date
        employee['term_date'] = fake.date_between_dates(
            date_start=datetime.strptime(start_date,"%Y-%m-%d"))
        employee['department'] = fake.random_element(
            elements=("HR","Finance","Marketing","Sales",
                      "IT","Customer Service","Legal",
                      "Project Management"))
        level = fake.level()
        employee['level'] = level
# employee compensation info
# base salary
        if level == 'Individual contributor': 
            employee['base'] = fake.random_int(
                min=50000,
                max=90000,
                step=1000)
        elif level == 'Manager/Director':
            employee['base'] = fake.random_int(
                min=91000,
                max=180000,
                step=1000)
        else:
            employee['base'] = fake.random_int(
                min=181000,
                max=500000,
                step=1000)
# car allowances
        if level == 'Individual contributor': 
            employee['travel_allowance'] = 0
        elif level == 'Manager/Director':
            employee['travel_allowance'] = fake.random_int(
                min=1800,
                max=3600,
                step=100)
        else:
            employee['travel_allowance'] = fake.random_int(
                min=4800,
                step=100)
# equity %
        if level == 'Individual contributor': 
            employee['equity'] = 0
        elif level == 'Manager/Director':
            employee['equity'] = round(random.uniform(2,5),1)
        else:
            employee['equity'] = round(random.uniform(6,10),1)

# append the employee to the list        
        employee_list.append(employee)
    return pd.DataFrame(employee_list)

In [6]:
# creating a dataframe to hold the output of the function
# and visualize it to check for correct output
records = create_employees(5000)
print(records.shape)
print(records['level'].value_counts())
records

(5000, 9)
level
Individual contributor    3346
Senior Leadership         1654
Name: count, dtype: int64


Unnamed: 0,ee#,employee_name,start_date,term_date,department,level,base,travel_allowance,equity
0,1000,John Zuniga,1999-06-20,2019-10-23,Project Management,Senior Leadership,267000,4900,6.3
1,1001,Mario Shaw,2001-10-05,2014-02-24,IT,Individual contributor,66000,0,0.0
2,1002,Kaitlyn Black,1981-11-23,2007-08-29,Customer Service,Individual contributor,86000,0,0.0
3,1003,Mrs. Valerie Higgins DDS,2010-05-03,2015-09-04,Marketing,Individual contributor,82000,0,0.0
4,1004,Matthew Flores,2015-02-08,2015-11-20,Legal,Individual contributor,50000,0,0.0
...,...,...,...,...,...,...,...,...,...
4995,5995,Michael Glover,1999-01-12,2018-01-23,IT,Individual contributor,70000,0,0.0
4996,5996,Michelle Hall,1988-03-19,2009-03-17,Sales,Individual contributor,73000,0,0.0
4997,5997,April Snow,2017-05-05,2018-08-06,Project Management,Individual contributor,59000,0,0.0
4998,5998,William Rojas,2017-09-19,2023-03-07,IT,Individual contributor,54000,0,0.0


In [None]:
# export the dataframe to .csv file
records.to_csv('records.csv',index=False)