# Worker Widget Data Generation

This is a generic take home question to generate some data and process it from Towards Data Science. 
The first script generates said data, while the "Take_home" script processes the data using various python packages.

The idea for the data is to generate a workers dataframe with various employee details (full/part time, Team, Status, etc.). Accompanying this dataframe is another that has product ("widget") information like an item number, the worker that made it, and the time taken at various steps to manufacture the widget.

In [6]:
#First import the required packages to generate some fake data.
import pandas as pd
import numpy as np
import faker as F
import random

# create some fake data with a faker object
fake = F.Faker()

# function to create a dataframe with fake values for our workers
def make_workers(num):
    
    # lists for some attributes to randomly assign to workers
    status_list = ['Full Time', 'Part Time', 'Per Diem']
    team_list = [fake.color_name() for x in range(4)]
    
    fake_workers = [{'Worker ID':x+1000,
                  'Worker Name':fake.name(), 
                  'Hire Date':fake.date_between(start_date='-30y', end_date='today'),
                  'Worker Status':np.random.choice(status_list, p=[0.50, 0.30, 0.20]), # assign items from list with different probabilities
                  'Team':np.random.choice(team_list)} for x in range(num)]
        
    return fake_workers

worker_df = pd.DataFrame(make_workers(num=5000))
worker_df.head()

Unnamed: 0,Worker ID,Worker Name,Hire Date,Worker Status,Team
0,1000,Lori Knox,1992-06-18,Part Time,Maroon
1,1001,Daniel Wallace,2004-01-12,Full Time,LightSteelBlue
2,1002,Justin Schmidt,2015-08-20,Part Time,LightSteelBlue
3,1003,Danielle Griffin,2007-05-26,Part Time,LightSteelBlue
4,1004,Tiffany Ellis,2021-02-27,Full Time,LightSteelBlue


In [7]:
# function to create widget data

def make_widget_data(num):
    
    fake_widgets = [{'Item Number':id(y),
                     'Step 1':np.random.gamma(shape=3, scale=1),
                     'Step 2':np.random.normal(5), 
                     'Step 3':np.random.exponential(4)} for y in range(num)]
    
    return fake_widgets

# empty list to store our widget dataframes in    
dfs_list = []

# now lets make some widget data for each worker
# iterate through the worker dataframe
for index, row in worker_df.iterrows():
    
    # not all workers work at the same rate - or the same number of hours
    # randomly select a number of widgets for them to create based on 'worker status'
    if row['Worker Status'] == 'Full Time':
        num_widgets = random.randrange(500, 1000)
    elif row['Worker Status'] == 'Part Time':
        num_widgets = random.randrange(100, 500)
    else:
        num_widgets = random.randrange(1, 1000)
    
    # make widgets for each worker
    tmp_widgets = pd.DataFrame(make_widget_data(num=num_widgets))
    
    # add worker id so we know who made the widget
    tmp_widgets['Worker ID'] = row['Worker ID']
    
    # make sure item number is unique by appending worker id
    tmp_widgets['Item Number'] = tmp_widgets['Item Number'].astype('str')+ '-' + tmp_widgets['Worker ID'].astype('str')
    
    # append to df list
    dfs_list.append(tmp_widgets)
    
# concatenate all the dfs 
widget_df = pd.concat(dfs_list)
print(widget_df.shape)
widget_df.head()

(2800207, 5)


Unnamed: 0,Item Number,Step 1,Step 2,Step 3,Worker ID
0,2118526593296-1000,1.467372,4.969398,0.220448,1000
1,2118526593328-1000,4.037566,6.351534,3.241431,1000
2,2118526593360-1000,2.473174,6.103326,12.529116,1000
3,2118526593392-1000,2.088975,6.27814,21.544346,1000
4,2118526593424-1000,4.103535,5.628992,6.779659,1000


In [8]:
#worker_df.to_csv('data_workers.csv', index=False)
#widget_df.to_csv('data_widgets.csv', index=False)