# Employee Review Demo

A demo of using [SurrealDB](https://surrealdb.com/) to query Kaggle's [Employee Review](https://www.kaggle.com/datasets/fiodarryzhykau/employee-review) dataset.

In [1]:
%pip -q install surrealdb pandas

Note: you may need to restart the kernel to use updated packages.


## Get started

### Run database in container

First, run SurrealDB as a container. I use [podman](https://podman.io/) because I like it better, but docker will work too.

The volume mount is optional, in case you want to export your database.

```{danger}
Passing credentials like this is insecure and is for testing and demo only!
```

```bash
podman run --rm -it -p 8000:8000 -v `pwd`/mydata:/mydata docker.io/surrealdb/surrealdb:latest start --auth --user root --pass testing 
```

You should see

> INFO surrealdb::net: Started web server on 0.0.0.0:8000

### Log in to database

We will use the [Python SDK for SurrealDB](https://docs.surrealdb.com/docs/integration/sdks/python/).

Note that I believe Python is an awful language and you should avoid it in most production cases. But for this demo notebook, eh, I suppose I can hold my nose and do it.

SurrealDB has a plethora of SDKs to choose from, though they aren't as developed as one might hope.

In [2]:
from surrealdb import Surreal

db = Surreal("http://localhost:8000")
await db.connect()

# NOTE: This is an insecure way of handling credentials and should not be used in production.
await db.signin({"user": "root", "pass": "testing"})
await db.use("test", "test")

print(db.client_state)

ConnectionState.CONNECTED
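
Since hard-coded credentials are flagged as insecure above, here is a minimal sketch of one alternative: reading them from environment variables. The variable names `SURREAL_USER` and `SURREAL_PASS` and the helper `load_credentials` are hypothetical, chosen only for this illustration.

```python
import os


def load_credentials(env=os.environ):
    """Build the sign-in payload from environment variables.

    SURREAL_USER and SURREAL_PASS are hypothetical names used for this sketch.
    """
    try:
        return {"user": env["SURREAL_USER"], "pass": env["SURREAL_PASS"]}
    except KeyError as missing:
        # Fail loudly instead of silently signing in with empty credentials
        raise RuntimeError(f"Missing environment variable: {missing}") from None


# Possible usage in the notebook, assuming the variables were exported
# before the Jupyter kernel was started:
# await db.signin(load_credentials())
```

This keeps the password out of the notebook itself, though for the demo container the credentials still appear on the `podman run` command line.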


## Preview data

Again, this dataset is from Kaggle's [Employee Review](https://www.kaggle.com/datasets/fiodarryzhykau/employee-review).

Let's check it out. We'll also grab a list of unique names because we will need those.

In [3]:
# Preview the CSV
import pandas as pd

# Load the CSV file
file_path = 'augmented_employee_feedback.csv'
data = pd.read_csv(file_path)

# Display the info about the dataframe
print(data.info())

# For each unique employeeId in the CSV
unique_employeeId = data['employeeId'].unique()
print("Unique employees in dataset:", len(unique_employeeId))


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 225 entries, 0 to 224
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   employeeId  225 non-null    int64 
 1   name        225 non-null    object
 2   year        225 non-null    int64 
 3   team        225 non-null    object
 4   nine_box    225 non-null    int64 
 5   feedback    225 non-null    object
 6   Source      225 non-null    object
dtypes: int64(3), object(4)
memory usage: 12.4+ KB
None
Unique employees in dataset: 75


## Create member table

The first thing we will do with our database is create a table to put the members of our company in. Let's call it **member**.

We will use a [schemafull table](https://docs.surrealdb.com/docs/surrealql/statements/define/table#schemafull-tables)
because then the database will strictly enforce the definitions we add to the table for each record.

While schemaless tables offer advantages in flexibility, for something as critical as the cornerstone record of who an employee is, schemafull is *absolutely* the way to go.

The queries can get a tad unwieldy, so I like to preview the string before sending to the database.

In [4]:
member_tb = "member"
createMemberQuery = f"""
# Create primary table for holding member info
DEFINE TABLE {member_tb} SCHEMAFULL;

# Name
DEFINE FIELD name ON TABLE {member_tb} TYPE string;

# Employee ID as unique index
DEFINE FIELD employeeId ON TABLE {member_tb} TYPE int
    ASSERT employeeId > 10000 && employeeId < 99999;
DEFINE INDEX employeeIdIndex ON TABLE {member_tb} COLUMNS employeeId UNIQUE;
"""

print(createMemberQuery)


# Create primary table for holding member info
DEFINE TABLE member SCHEMAFULL;

# Name
DEFINE FIELD name ON TABLE member TYPE string;

# Employee ID as unique index
DEFINE FIELD employeeId ON TABLE member TYPE int
    ASSERT employeeId > 10000 && employeeId < 99999;
DEFINE INDEX employeeIdIndex ON TABLE member COLUMNS employeeId UNIQUE;



In [5]:
# If the command to create the member table looks good, go ahead and create it.
await db.query(createMemberQuery)
# Verify the table was created
await db.query("INFO FOR TABLE member;")

[{'result': {'events': {},
   'fields': {'employeeId': 'DEFINE FIELD employeeId ON member TYPE int ASSERT employeeId > 10000 AND employeeId < 99999 PERMISSIONS FULL',
    'name': 'DEFINE FIELD name ON member TYPE string PERMISSIONS FULL'},
   'indexes': {'employeeIdIndex': 'DEFINE INDEX employeeIdIndex ON member FIELDS employeeId UNIQUE'},
   'lives': {},
   'tables': {}},
  'status': 'OK',
  'time': '4.156837ms'}]

### employeeId

Let's add each employee to the table, reusing the `employeeId` from the dataset as the record ID. Recall that the schema definition above enforces that `employeeId` is

- **UNIQUE** among all `employeeId` values
- An integer strictly between 10000 and 99999 (five digits)
- [Indexed](https://docs.surrealdb.com/docs/surrealql/statements/define/indexes) with `employeeIdIndex`.

The record creation here will fail if any of those criteria (as well as the other definitions) are not met.
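
The range check can also be mirrored on the Python side, so that obviously bad IDs are caught before a round trip to the database. This is only a sketch; `is_valid_employee_id` is a hypothetical helper, and the database assertion remains the source of truth.

```python
def is_valid_employee_id(value) -> bool:
    """Mirror the schema assertion: an int strictly between 10000 and 99999."""
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return 10000 < value < 99999


print(is_valid_employee_id(11287))    # inside the allowed range
print(is_valid_employee_id(99999))    # the upper bound itself is excluded
print(is_valid_employee_id("11287"))  # strings are rejected, the field is an int
```

Note that a NumPy integer taken straight from the dataframe is not a Python `int`, so convert it with `int(...)` first, as the notebook does when creating records.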

In [6]:
from random import randint

# Roster of entries
records_added = 0
failed_records: list[dict[str, str]] = []

# Iterate through the unique employee IDs in data and get the name
for employee_id in unique_employeeId:
    # Get the name for the employee ID
    name = data.loc[data['employeeId'] == employee_id, 'name'].iloc[0]
    # Add the person to the member table
    try:
        result = await db.create(member_tb + ":" + str(employee_id), {
            'name': name,
            'employeeId': int(employee_id),
        })
        records_added = records_added + 1
    except Exception as e:
        # Store a dict (not a set) so the entry matches the declared type
        failed_records.append({"name": name, "error": str(e)})

print("Added records:", records_added)
print("Errors adding:", len(failed_records))
print("Failed records:\n", failed_records)

Added records: 75
Errors adding: 0
Failed records:
 []


In [7]:
# View some of the results
await db.query(f"SELECT * FROM {member_tb} LIMIT 5;")

[{'result': [{'employeeId': 11287,
    'id': 'member:11287',
    'name': 'Deon Griffith'},
   {'employeeId': 12605, 'id': 'member:12605', 'name': 'Ivan Reese'},
   {'employeeId': 13965, 'id': 'member:13965', 'name': 'Ella Green'},
   {'employeeId': 17072, 'id': 'member:17072', 'name': 'Amy Jones'},
   {'employeeId': 17293, 'id': 'member:17293', 'name': 'Daisy Pearce'}],
  'status': 'OK',
  'time': '130.031µs'}]

## Add performance feedback

Now that we have our employees loaded into the database, let's add the performance feedback!

### Create feedback table

This table will be a schemafull **edge** table: **gotFeedback**. I advocate for schemafull here because I'm imagining a standard feedback form with fields that must be filled in.

Additionally, there are some advantages for queries if you standardize fields. We will see this in action below.

In [8]:
# Create a schemafull table for feedback
feedback_tb = "gotFeedback"
createFeedbackQuery = f"""
DEFINE TABLE {feedback_tb} SCHEMAFULL;
DEFINE FIELD date on TABLE {feedback_tb} TYPE datetime;
DEFINE FIELD body ON TABLE {feedback_tb} TYPE string;
DEFINE FIELD rating ON TABLE {feedback_tb} TYPE number
    ASSERT rating >= 1 && rating <= 9;
DEFINE FIELD team ON TABLE {feedback_tb} TYPE record<team>;
"""

print(createFeedbackQuery)


DEFINE TABLE gotFeedback SCHEMAFULL;
DEFINE FIELD date on TABLE gotFeedback TYPE datetime;
DEFINE FIELD body ON TABLE gotFeedback TYPE string;
DEFINE FIELD rating ON TABLE gotFeedback TYPE number
    ASSERT rating >= 1 && rating <= 9;
DEFINE FIELD team ON TABLE gotFeedback TYPE record<team>;



In [9]:
# If the command to create the table looks good, go ahead and create it.
await db.query(createFeedbackQuery)
# Verify the table was created
print(await db.query(f"INFO FOR TABLE {feedback_tb};"))

[{'result': {'events': {}, 'fields': {'body': 'DEFINE FIELD body ON gotFeedback TYPE string PERMISSIONS FULL', 'date': 'DEFINE FIELD date ON gotFeedback TYPE datetime PERMISSIONS FULL', 'rating': 'DEFINE FIELD rating ON gotFeedback TYPE number ASSERT rating >= 1 AND rating <= 9 PERMISSIONS FULL', 'team': 'DEFINE FIELD team ON gotFeedback TYPE record<team> PERMISSIONS FULL'}, 'indexes': {}, 'lives': {}, 'tables': {}}, 'status': 'OK', 'time': '147.805µs'}]


#### Define team table

We also need a `team` table, because the `team` field above is declared as `record<team>`.

In [10]:
# Define a schemaless team table
await db.query("DEFINE TABLE team SCHEMALESS;")

# Iterate over the teams and add them to the team table
for team in data['team'].unique():
    try:
        await db.create("team:" + team, {
            'name': team,
        })
    except Exception as e:
        print(e)
        break
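
Because the record ID is built by concatenating `"team:"` with the raw team name, it may be worth checking that each name is safe to use unescaped. The sketch below assumes (my understanding of SurrealQL, not something shown in this notebook) that plain record IDs are limited to letters, digits and underscores; `is_plain_record_key` is a hypothetical helper.

```python
import re

# Assumption: an unescaped record ID may only contain letters, digits and underscores
_PLAIN_KEY = re.compile(r"[A-Za-z0-9_]+")


def is_plain_record_key(key: str) -> bool:
    """Return True if the key can be appended to 'table:' without escaping."""
    return _PLAIN_KEY.fullmatch(key) is not None


for candidate in ["green", "red", "data science"]:
    print(candidate, is_plain_record_key(candidate))
```

A name that fails the check would need to be escaped or normalized before being used as an ID.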

### Add Feedback

To add the feedback to our database we'll make a few assumptions. Specifically, we'll randomly select and assign a manager and date.
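
As a sketch of the date assumption, the helper below builds a random UTC timestamp for a given year with the standard library. `random_feedback_date` is a hypothetical name, and unlike the cell further down (which only draws two-digit months and days) it draws from the whole year and lets `strftime` handle zero-padding.

```python
from datetime import datetime, timezone
from random import Random


def random_feedback_date(year: int, rng: Random) -> str:
    """Return a random midnight-UTC timestamp in the given year."""
    month = rng.randint(1, 12)
    day = rng.randint(1, 28)  # capping at 28 keeps every month valid
    stamp = datetime(year, month, day, tzinfo=timezone.utc)
    # Same shape as the timestamps used in this notebook, e.g. 2017-10-12T00:00:00Z
    return stamp.strftime("%Y-%m-%dT%H:%M:%SZ")


print(random_feedback_date(2017, Random()))
```

Passing in a seeded `Random` instance makes the generated dates reproducible between runs.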

In [11]:
# Assume managers are the last 5 employee IDs in the list
managers = unique_employeeId[-5:]
print(managers)

[31666 93665 78812 58666 69269]


In [12]:
from random import choice

# Go through each feedback and add it to the database with a randomly selected manager
feedback_df = data[['employeeId','feedback', 'team', 'year', 'nine_box']]
# Roster of entries
records_added = 0
failed_records: list[dict[str, str]] = []

# Add feedback
for _, row in feedback_df.iterrows():
    # Randomly generate datetime for when feedback happened since dataset only has year
    feedbackDate = str(row.year) + "-" + str(randint(10,12)) + "-" + str(randint(10,28))
    feedbackDate = feedbackDate + "T00:00:00Z"
    try:
        # Convert employeeId and nine_box to int because JSON doesn't like numpy
        employeeId = int(row.employeeId)
        nine_box = int(row.nine_box)
        # Since we used the employeeId as the id for the member, we can just add the prefix
        raterId = "member:" + str(choice(managers))
        # Get id of member
        memberId = await db.query(f"""SELECT id FROM member WHERE employeeId = {employeeId};""")
        memberId = memberId[0]["result"][0]["id"]
        query = f"""RELATE {memberId}->gotFeedback->{raterId} CONTENT {{
                    'date': "{feedbackDate}",
                    'rating': {nine_box},
                    'team': team:{row.team},
                    'body': "{row.feedback.replace('"', "'")}",   
                  }};
                  """
        result = await db.query(query)
        # If result contains 'status': 'ERR' then throw an exception
        if result[0]["status"] == "ERR":
            raise Exception(result)
        
        records_added = records_added +1
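    except Exception as e:
        # Record the employee from this row; `name` is a stale variable from the earlier cell
        failed_records.append({"employeeId": str(row.employeeId), "error": str(e)})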

print("Added records:", records_added)
print("Errors adding:", len(failed_records))
print("Failed records:\n", failed_records)

    

Added records: 225
Errors adding: 0
Failed records:
 []


## Query Feedback Records

We will step through a few queries to showcase the power and speed of the hybrid database!

First, just show a sample of the feedback.

In [13]:
from json import dumps

# Show a sample of the feedback
print(dumps(await db.query(f"SELECT * FROM {feedback_tb} LIMIT 1;"), indent=2))

[
  {
    "result": [
      {
        "body": "Rachel is the star worker! She loves to come in and work until theres nothing left. The only thing she needs to work on is being more expressive. By doing that she could be able to lead one day.",
        "date": "2017-10-12T00:00:00Z",
        "id": "gotFeedback:05mw2y00dmy1gxrujylk",
        "in": "member:25771",
        "out": "member:69269",
        "rating": 6,
        "team": "team:red"
      }
    ],
    "status": "OK",
    "time": "104.735\u00b5s"
  }
]
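
Every response in this notebook has the same envelope: a list with one entry per statement, each carrying `result`, `status` and `time`. A small helper can strip that envelope and raise on failures. This is a sketch based only on the output shape shown here; `unwrap` is a hypothetical name, not part of the SDK.

```python
def unwrap(response):
    """Return the 'result' of each statement, raising if any statement failed."""
    results = []
    for statement in response:
        if statement.get("status") != "OK":
            raise RuntimeError(f"Query failed: {statement.get('result')}")
        results.append(statement["result"])
    return results


# Example with a response shaped like the output above
sample = [{"result": [{"rating": 6, "team": "team:red"}], "status": "OK", "time": "104.735µs"}]
print(unwrap(sample)[0])
```

With this in place, `unwrap(await db.query(...))[0]` would give the rows of a single-statement query directly.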


### Get employee feedback

We can easily get all of the feedback ever provided to an employee!

Let's use **Valeria Crane**, `employeeId=66919` as an example.

In [14]:
# Get all the feedback a person has ever gotten (use Valeria Crane as an example)
# This syntax uses the graph query syntax to iterate the outgoing edges of the member object. It's very fast.
print(dumps(await db.query("SELECT body, out.name AS raterName FROM member:66919->gotFeedback;"), indent=2))

[
  {
    "result": [
      {
        "body": "Valeria Crane is a hazard in our team. She has shown no signs of reliability. She is difficult to work with and does not seem motivated. Valeria needs a lot of guidance to complete tasks.",
        "raterName": "George Jones"
      },
      {
        "body": "Valeria Crane has shown little in her work to amaze. So far it has been of a level below standard. We have not seen a consistent member of the team. Undertaking tasks has been a poor turn out so far from Valeria.",
        "raterName": "Ivanna Boyer"
      },
      {
        "body": "Valeria is considered a risk. She has not been performing up to required standards. We have tried working with her to improve and there has not been any improvement. More testing will be required, but I would not recommend for personal advancement until noticeable and consistent improvement from management.",
        "raterName": "Lauren Baker"
      },
      {
        "body": "To simply but, Valeria is c

What if we only want the most recent feedback? No problem.

In [15]:
# Get the most recent feedback for Valeria Crane
print(dumps(await db.query("SELECT body, out.name AS raterName, date FROM member:66919->gotFeedback ORDER BY date DESC LIMIT 1;"), indent=2))

[
  {
    "result": [
      {
        "body": "Valeria Crane has shown little in her work to amaze. So far it has been of a level below standard. We have not seen a consistent member of the team. Undertaking tasks has been a poor turn out so far from Valeria.",
        "date": "2018-11-15T00:00:00Z",
        "raterName": "Ivanna Boyer"
      }
    ],
    "status": "OK",
    "time": "185.5\u00b5s"
  }
]
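
For comparison, the same "most recent" selection can be done client-side on rows that have already been fetched. The sketch relies on the fact that timestamps in this fixed UTC format sort correctly as plain strings; `most_recent` is a hypothetical helper, and letting the database do the `ORDER BY` is usually preferable because it avoids transferring every row.

```python
def most_recent(rows, n=1):
    """Return the n newest rows, comparing the ISO 8601 'date' strings."""
    return sorted(rows, key=lambda row: row["date"], reverse=True)[:n]


rows = [
    {"date": "2017-10-12T00:00:00Z", "raterName": "George Jones"},
    {"date": "2018-11-15T00:00:00Z", "raterName": "Ivanna Boyer"},
    {"date": "2016-12-20T00:00:00Z", "raterName": "Lauren Baker"},
]
print(most_recent(rows))
```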


### Ask questions about the rater

What if we want to know about how a rater evaluates her subordinates?

We will use **Ivanna Boyer**, `employeeId=69269` as an example.

There are three ways we can fetch every record of feedback a rater has given.
You can see them below and uncomment the other options to experiment.
Note that the time differences will be more pronounced if you remove the `LIMIT 3` statement.

In [16]:
# Get the name of every person this rater has ever given feedback on (use Ivanna Boyer as an example)
# Added a LIMIT 3 to make the result easier to read

# This syntax uses graph query to iterate over the edges of the rater to the gotFeedback table. It's very fast.
print(dumps(await db.query("SELECT in.name AS rateeName FROM member:69269<-gotFeedback LIMIT 3;"), indent=2))

# This syntax uses graph query syntax to iterate over the rater member object. It's somewhat fast.
# print(dumps(await db.query("SELECT <-gotFeedback.in.name AS rateeName FROM member:69269 LIMIT 3;"), indent=2))

# This syntax uses traditional SQL syntax to iterate over the gotFeedback table. It's painfully slow.
# print(dumps(await db.query("SELECT in.name AS rateeName FROM gotFeedback WHERE out = member:69269 LIMIT 3;"), indent=2))


[
  {
    "result": [
      {
        "rateeName": "Rachel Harper"
      },
      {
        "rateeName": "Libby Parker"
      },
      {
        "rateeName": "Rachel Harper"
      }
    ],
    "status": "OK",
    "time": "234.296\u00b5s"
  }
]
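
To compare the three variants numerically, the `time` field of each response can be converted to seconds. The sketch below handles the units that appear in this notebook (`ms` and `µs`) plus a few others that are assumptions on my part; `parse_duration` is a hypothetical helper.

```python
import re

# Units seen in this notebook are ms and µs; ns, us and s are assumed for completeness
_UNITS = {"ns": 1e-9, "\u00b5s": 1e-6, "us": 1e-6, "ms": 1e-3, "s": 1.0}
_PATTERN = re.compile(r"([0-9]+(?:\.[0-9]+)?)(\D+)")


def parse_duration(text: str) -> float:
    """Convert a duration string such as '234.296µs' into seconds."""
    match = _PATTERN.fullmatch(text.strip())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"Unrecognized duration: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


print(parse_duration("4.156837ms"))
print(parse_duration("234.296\u00b5s"))
```

Running each variant several times and comparing the parsed values gives a fairer picture than a single measurement.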


You can also do filtering; for example, only get ratees who received feedback while on the **green** team.

In [17]:
# Get the name of every person Ivanna Boyer rated while they were on the green team
# This syntax uses the graph query syntax to iterate edges of the graph. It's very fast.
print(dumps(await db.query("SELECT in.name AS rateeName FROM member:69269<-gotFeedback WHERE team=team:green;"), indent=2))

# You can also do this by using the traditional SQL syntax to iterate over the gotFeedback table. It's very slow.
# print(dumps(await db.query("SELECT in.name AS rateeName FROM gotFeedback WHERE team = team:green && out = member:69269;"), indent=2))

[
  {
    "result": [
      {
        "rateeName": "Ella Green"
      },
      {
        "rateeName": "Logan Ellis"
      },
      {
        "rateeName": "Ella Powers"
      },
      {
        "rateeName": "Georgia King"
      }
    ],
    "status": "OK",
    "time": "633.237\u00b5s"
  }
]
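
A rater can evaluate the same person more than once, so a name can appear repeatedly in results like these (Rachel Harper shows up twice in the earlier `LIMIT 3` output). A minimal sketch for de-duplicating on the Python side while keeping the original order, with `unique_names` as a hypothetical helper:

```python
def unique_names(rows, key="rateeName"):
    """Return each name once, in order of first appearance."""
    # dict preserves insertion order, so fromkeys acts as an ordered set
    return list(dict.fromkeys(row[key] for row in rows))


rows = [
    {"rateeName": "Rachel Harper"},
    {"rateeName": "Libby Parker"},
    {"rateeName": "Rachel Harper"},
]
print(unique_names(rows))
```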
