# AI-Assisted HR Management System – Exploratory Phase

## Motivation

This project was inspired by the observation that HR management systems are often built on cumbersome legacy software that relies on HR managers to carry out a series of manual tasks any time an employee is hired, terminated, moved to a different department, or given a new set of responsibilities. Because HR managers are often overtaxed (responsible for hundreds or thousands of employees) and have pressing legal matters to attend to, HR systems and org charts are outdated and inaccurate more often than not.

This notebook explores three aspects of the application we intend to build, so that we can make informed decisions about how to approach the code:


### *__1 – Data Structures__*

Our intent is to build the core data management system on the Python library `anytree`. We want to ensure that the resulting data structure is tractable and scalable.

### *__2 – Data Storage__*

We intend to explore data storage practices that maximize traversability by an AI agent while ensuring that storage is secure and cost-efficient.

### *__3 – AI Applications__*

We want to explore the ways in which generative AI can add value to an HR system. In order to do that, we'll need to find a suitable way to vectorize our HR data.
***

## Data Structures

### Investigating the Utility of `anytree`

Let's start with an install:

In [1]:
!pip install anytree -q

And proceed to explore some of the basic functionality we might need:

In [71]:
from anytree import AnyNode, RenderTree, AsciiStyle, LevelOrderGroupIter

root = AnyNode(id='CEO', fte=1.0, children=[
    AnyNode(id='COO', mgmt_lvl='C-Suite', fte=1.0, children=[
        AnyNode(id='VP, Strategy & Operations', mgmt_lvl='Executive Director', fte=1.0),
        AnyNode(id='EVP, Content Platforms', mgmt_lvl='Executive Director', fte=1.0, children=[
            AnyNode(id='Director, Streaming Performance', mgmt_lvl='Director', fte=1.0, children=[
                AnyNode(id='Manager, Streaming Analytics', mgmt_lvl='Manager', fte=1.0, children=[
                    AnyNode(id='Data Analyst', mgmt_lvl='Individual Contributor', fte=0.8)
                ]),
                AnyNode(id='Manager, Streaming Marketing Strategy', mgmt_lvl='Manager', fte=1.3, children=[
                    AnyNode(id='Coordinator, Social Media Marketing', mgmt_lvl='Manager', fte=1.0),
                    AnyNode(id='Coordinator, Print & Digital Marketing', mgmt_lvl='Manager', fte=1.0)
                ])
            ])
        ])
    ]),
    AnyNode(id='CFO', mgmt_lvl='C-Suite', fte=1.0),
    AnyNode(id='CMBO', mgmt_lvl='C-Suite', fte=1.0)
])

In [72]:
print(RenderTree(root))

AnyNode(fte=1.0, id='CEO')
├── AnyNode(fte=1.0, id='COO', mgmt_lvl='C-Suite')
│   ├── AnyNode(fte=1.0, id='VP, Strategy & Operations', mgmt_lvl='Executive Director')
│   └── AnyNode(fte=1.0, id='EVP, Content Platforms', mgmt_lvl='Executive Director')
│       └── AnyNode(fte=1.0, id='Director, Streaming Performance', mgmt_lvl='Director')
│           ├── AnyNode(fte=1.0, id='Manager, Streaming Analytics', mgmt_lvl='Manager')
│           │   └── AnyNode(fte=0.8, id='Data Analyst', mgmt_lvl='Individual Contributor')
│           └── AnyNode(fte=1.3, id='Manager, Streaming Marketing Strategy', mgmt_lvl='Manager')
│               ├── AnyNode(fte=1.0, id='Coordinator, Social Media Marketing', mgmt_lvl='Manager')
│               └── AnyNode(fte=1.0, id='Coordinator, Print & Digital Marketing', mgmt_lvl='Manager')
├── AnyNode(fte=1.0, id='CFO', mgmt_lvl='C-Suite')
└── AnyNode(fte=1.0, id='CMBO', mgmt_lvl='C-Suite')


In [73]:
for pre, _, node in RenderTree(root):
    print("%s%s" % (pre, node.id))

CEO
├── COO
│   ├── VP, Strategy & Operations
│   └── EVP, Content Platforms
│       └── Director, Streaming Performance
│           ├── Manager, Streaming Analytics
│           │   └── Data Analyst
│           └── Manager, Streaming Marketing Strategy
│               ├── Coordinator, Social Media Marketing
│               └── Coordinator, Print & Digital Marketing
├── CFO
└── CMBO


In [74]:
print(RenderTree(root).by_attr('id'))

CEO
├── COO
│   ├── VP, Strategy & Operations
│   └── EVP, Content Platforms
│       └── Director, Streaming Performance
│           ├── Manager, Streaming Analytics
│           │   └── Data Analyst
│           └── Manager, Streaming Marketing Strategy
│               ├── Coordinator, Social Media Marketing
│               └── Coordinator, Print & Digital Marketing
├── CFO
└── CMBO


In [75]:
root.children

(AnyNode(fte=1.0, id='COO', mgmt_lvl='C-Suite'),
 AnyNode(fte=1.0, id='CFO', mgmt_lvl='C-Suite'),
 AnyNode(fte=1.0, id='CMBO', mgmt_lvl='C-Suite'))

In [77]:
level_summaries = [[node.fte for node in children] for children in LevelOrderGroupIter(root)]

In [82]:
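# Sum FTEs by depth: level 1 holds the root's direct reports,
# and every deeper level holds indirect reports.
dir_count = 0
ind_count = 0

for depth, level in enumerate(level_summaries):
    if depth == 1:
        dir_count += sum(level)
    elif depth > 1:
        ind_count += sum(level)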

In [83]:
dir_count

3.0

In [84]:
ind_count

8.1

In [48]:
other = root.children[0].children[1]

In [51]:
print(RenderTree(other).by_attr('id'))

EVP, Content Platforms
└── Director, Streaming Performance
    ├── Manager, Streaming Analytics
    │   └── Data Analyst
    └── Manager, Streaming Marketing Strategy
        ├── Coordinator, Social Media Marketing
        └── Coordinator, Print & Digital Marketing


Looks like this will serve our needs pretty well. So now let's import a dataset and try to wrangle it into something that approaches sense:

In [55]:
structure = [[node.id for node in children] for children in LevelOrderGroupIter(other)]

In [58]:
len(structure)

4

In [None]:
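def IndirectReports(start_node):
    # Assumption: "indirect reports" are all descendants of start_node
    # other than its direct children.
    return [n for n in start_node.descendants if n.parent is not start_node]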