Let's start by loading the data and taking a look at it.

In [2]:
import pandas as pd

employees_df = pd.read_csv("../data/employees.csv")
safehouses_df = pd.read_csv("../data/safehouses.csv")
divisions_df = pd.read_csv("../data/divisions.csv")
managers_df = pd.read_csv("../data/managers.csv")
actions_df = pd.read_csv("../data/actions.csv")


Examining Employees

In [10]:
employees_df.sample(5)


Unnamed: 0,EmployeeID,EmployeeName,JobTitle,Email,Phone,Manager
10786,10789,Pamela Doyle,Machine Learning Engineer,pamela_doyle@brlda.gov,3391516360,Laura Stevens
5238,5241,Julie Rogers,Data Analyst,julie_rogers@brlda.gov,(687)006-0887x908,James Rodriguez
18701,18705,Matthew Winters,Data Analyst,matthew_winters@brlda.gov,231.857.6562,Michelle Kramer
9185,9188,Jessica Singh,Project Manager,jessica_singh@brlda.gov,(658)504-4366x7007,Leslie Larson
12780,12783,Leslie Moody,Data Analyst,leslie_moody@brlda.gov,566-283-2550x452,Gina Johnson
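
The `Unnamed: 0` column in the output above appears to be a positional row index that was written to the CSV when it was saved. Below is a minimal sketch, using a small in-memory CSV as a stand-in for the real file (two rows copied from the sample above), of how `index_col=0` would read that column back as the index instead of as data. The rest of this notebook keeps the default load.

```python
import io

import pandas as pd

# Stand-in for employees.csv: the header starts with an empty name,
# which is what a saved DataFrame index looks like in a CSV.
csv_text = """,EmployeeID,EmployeeName
10786,10789,Pamela Doyle
5238,5241,Julie Rogers
"""

# Default load: the nameless first column becomes "Unnamed: 0".
default_df = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the same column is used as the row index.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_df.columns))
print(list(indexed_df.columns))
```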


If they removed their own records from the employees table, we should see gaps in the sequence of EmployeeID values.

In [11]:
# Check min and max EmployeeID
min_eid = employees_df["EmployeeID"].min()
max_eid = employees_df["EmployeeID"].max()
print("Min EmployeeID: ", min_eid)
print("Max EmployeeID: ", max_eid)

# Find IDs inside the min-max range that are absent from the table
missing_employees = set(range(min_eid, max_eid + 1)) - set(employees_df["EmployeeID"])
print("EmployeeIDs missing from the min-max range: ", missing_employees)


Min EmployeeID:  1
Max EmployeeID:  26849
EmployeeIDs missing from the min-max range:  {14976, 22602, 26188, 1423, 4284}
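
The set difference above prints the gaps in arbitrary order. As an alternative sketch, the same check can be written with pandas index operations, which return the gaps sorted. The toy Series below is a made-up stand-in for `employees_df["EmployeeID"]`, not the real data.

```python
import pandas as pd

# Toy stand-in for employees_df["EmployeeID"]: IDs 1..10 with 4 and 7 removed.
employee_ids = pd.Series([1, 2, 3, 5, 6, 8, 9, 10], name="EmployeeID")

# Every ID we would expect if the sequence had no gaps.
expected_ids = pd.RangeIndex(int(employee_ids.min()), int(employee_ids.max()) + 1)

# Index.difference keeps the expected IDs that are absent from the data, in sorted order.
missing_ids = expected_ids.difference(pd.Index(employee_ids)).tolist()
print(missing_ids)
```

Either version only finds gaps between the smallest and largest surviving ID. Records removed from the very start or end of the sequence would go unnoticed.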


Do the missing EmployeeIDs still appear in the other tables?

In [12]:
divisions_df[divisions_df["EmployeeID"].isin(missing_employees)]


Unnamed: 0,EmployeeID,EmployeeName,Division,Project,known_safehouses
1422,1423,,[Division 7],[Project e-enable_holistic_models],"[14, 214, 181, 219]"
4283,4284,,[Division 7],[Project repurpose_collaborative_methodologies...,"[10, 219]"
14975,14976,,[Division 7],[Project transform_24/365_functionalities],"[25, 154, 231, 33, 219]"
22601,22602,,[Division 7],"[Project monetize_one-to-one_mindshare, Projec...","[12, 221, 19, 18, 219]"
26187,26188,,[Division 7],[Project extend_robust_action-items],"[7, 219]"
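
All five rows above list safehouse 219. A small sketch of how that overlap could be confirmed programmatically follows. It assumes `known_safehouses` comes out of the CSV as a string such as `"[14, 214, 181, 219]"`, which has not been verified here; the values are copied from the output above.

```python
import ast

# known_safehouses for the five missing employees, written as the strings
# a CSV round trip would likely produce (an assumption about the file).
known_safehouses = {
    1423: "[14, 214, 181, 219]",
    4284: "[10, 219]",
    14976: "[25, 154, 231, 33, 219]",
    22602: "[12, 221, 19, 18, 219]",
    26188: "[7, 219]",
}

# Parse each string into a set of safehouse IDs.
parsed = {eid: set(ast.literal_eval(raw)) for eid, raw in known_safehouses.items()}

# Safehouses known to every one of the missing employees.
common = set.intersection(*parsed.values())
print(common)
```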


In [5]:
actions_df[actions_df["EmployeeID"].isin(missing_employees)].sort_values("ActionDate")


Unnamed: 0,EmployeeID,ActionType,ActionDate,ActionDescription,ActionLocation,ActionStatus,ActionSeverity,AssociatedProject,AssociatedDivision
41824,14976,Quantum Key Generation,1994-06-06 00:00:00,perform data mining on social media data for s...,Puerto Rico,completed,critical,Project transform_24/365_functionalities,Division 1
53036,26188,Predictive Modeling,1994-11-20 00:00:00,construct algorithms for automatic gait recogn...,Puerto Rico,failed,critical,Project extend_robust_action-items,Division 10
4283,4284,Data Clustering,1996-05-08 00:00:00,Initiate operation Networked_discrete_system_e...,Martinique,completed,high,Project repurpose_collaborative_methodologies,Division 6
81969,1423,User Profiling,1997-04-12 00:00:00,Operation Re-contextualized_attitude-oriented_...,Egypt,failed,medium,Project e-enable_holistic_models,Division 3
68673,14976,Natural Language Generation,2007-06-06 00:00:00,construct algorithms for automatic vein recogn...,Benin,completed,critical,Project transform_24/365_functionalities,Division 1
31132,4284,Quantum Resistant Cryptography,2007-09-17 00:00:00,Initiate operation Customizable_discrete_paral...,Kazakhstan,failed,critical,Project repurpose_collaborative_methodologies,Division 6
57981,4284,Machine Learning-based Intrusion Detection,2007-10-25 00:00:00,analyze communication patterns through Fully-c...,Albania,completed,high,Project repurpose_collaborative_methodologies,Division 6
28271,1423,Automated Surveillance,2009-10-18 00:00:00,Operation Down-sized_24/7_capability to develo...,Puerto Rico,failed,high,Project e-enable_holistic_models,Division 3
14975,14976,Natural Language Generation,2011-08-06 00:00:00,Initiate operation Centralized_upward-trending...,Bahrain,completed,low,Project transform_24/365_functionalities,Division 1
49450,22602,Quantum Key Generation,2012-10-22 00:00:00,Operation Digitized_methodical_structure to ap...,Ireland,failed,high,Project monetize_one-to-one_mindshare,Division 6
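
Sorting by `ActionDate` works above because the values are ISO-formatted strings, which sort chronologically even as text. The sketch below parses the column to real datetimes and summarises each missing employee's activity. It uses only a handful of rows copied from the output above, and the summary column names are made up for illustration.

```python
import pandas as pd

# A few rows copied from the output above (only the columns used here).
actions = pd.DataFrame(
    {
        "EmployeeID": [14976, 26188, 4284, 14976, 4284],
        "ActionDate": [
            "1994-06-06 00:00:00",
            "1994-11-20 00:00:00",
            "1996-05-08 00:00:00",
            "2007-06-06 00:00:00",
            "2007-09-17 00:00:00",
        ],
    }
)

# Convert the date strings to datetime64 so min/max compare as dates.
actions["ActionDate"] = pd.to_datetime(actions["ActionDate"])

# Per employee: number of actions and the first and last action dates.
summary = actions.groupby("EmployeeID").agg(
    n_actions=("ActionDate", "size"),
    first_action=("ActionDate", "min"),
    last_action=("ActionDate", "max"),
)
print(summary)
```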


Let's take a look at Safehouses now.

In [13]:
safehouses_df.sample(5)


Unnamed: 0,ID,City,Address,Latitude,Longitude
135,31,Mexico City,"52660 Rancho Jiménez, MEX, Mexico",19.128836,-99.371064
47,152,Paris,"Rue de Montubois, 95840 Béthemont-la-Forêt, Fr...",49.049567,2.241954
167,14,Moscow,"46Н-05815, Протасово, Moscow Oblast, Russia, 1...",56.136233,37.619822
60,130,Davao City,"Angalan Road, Davao City, 8022 Davao Region, P...",7.118768,125.461715
40,159,Moscow,"Ирландская улица 33, Минзаг, Moscow, Russia, 1...",55.440171,37.326182


In [8]:
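# Map of safehouses using their Latitude and Longitude
import folium

# Center on the mean coordinate; the safehouses span several continents,
# so start zoomed out far enough to see them all.
safehouses_map = folium.Map(
    location=[safehouses_df["Latitude"].mean(), safehouses_df["Longitude"].mean()],
    zoom_start=2,
)

# One marker per safehouse, with its ID as the popup text
for _, row in safehouses_df.iterrows():
    folium.Marker([row["Latitude"], row["Longitude"]], popup=str(row["ID"])).add_to(
        safehouses_map
    )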

safehouses_map
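
The map above starts from a fixed `zoom_start`, which has to be chosen by hand. One alternative is to compute the bounding box of all safehouses and let the map fit itself to it. The sketch below covers only the bounding-box step, using the five coordinates from the sample shown earlier; `bounding_box` is a hypothetical helper, not something from the original notebook.

```python
import pandas as pd

# Coordinates copied from the safehouse sample shown earlier.
safehouses = pd.DataFrame(
    {
        "ID": [31, 152, 14, 130, 159],
        "Latitude": [19.128836, 49.049567, 56.136233, 7.118768, 55.440171],
        "Longitude": [-99.371064, 2.241954, 37.619822, 125.461715, 37.326182],
    }
)


def bounding_box(df):
    """Return [[south, west], [north, east]] covering every row in df."""
    south_west = [float(df["Latitude"].min()), float(df["Longitude"].min())]
    north_east = [float(df["Latitude"].max()), float(df["Longitude"].max())]
    return [south_west, north_east]


bounds = bounding_box(safehouses)
print(bounds)
```

Calling `safehouses_map.fit_bounds(bounds)` with this result should make folium pick a zoom level that shows every marker.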
