# Integrating with TimeDetect

This tutorial walks through integrating with the TimeDetect API, step by step.

For details on the individual endpoints, keep the TimeDetect [documentation](https://docs.resolve.visma.com/time-detect/) open while you follow along.

The TimeDetect API streamlines the management of time and absence records for companies. It leverages machine learning to analyze historical data, offering insights into the anomaly level and reasons behind new registrations. This facilitates the automation of time registration processes and enhances the efficiency of those responsible for approvals, ultimately boosting productivity.


## Getting Started

The very first thing we need to do is to **request access**.

We use Visma Connect for authentication. To get access, you need to register an application on the Visma Developer Portal by following these steps. Use Stage for testing the API, and switch to Production when you are ready to go live with real users.

1. Log in to the Visma Developer Portal
   - **Stage:** [https://oauth.developers.stagaws.visma.com](https://oauth.developers.stagaws.visma.com)
   - **Prod:** [https://oauth.developers.visma.com](https://oauth.developers.visma.com)
     <br> <br>
2. Navigate to "My Applications". Go to "Add Application" > "Service (Machine-to-Machine)"
   <br>

3. Fill in the required details for your application <br>
   Make sure to note down the `client_id`.

4. Publish the application.

5. Open the application and add a new integration.

6. Select the API you want to integrate with: <br>
   **Stage:** `Machine Learning Factory API Stage` <br>
   **Prod:** `Machine Learning Factory API Prod` <br>

7. Select the appropriate scope for your integration: <br>
   **Stage:** `machine-learning-factory-api-stage:td` <br>
   **Prod:** `machine-learning-factory-api-prod:td` <br>

8. Wait for the integration to be accepted. <br>
   Feel free to contact the TimeDetect team to speed up the process.

9. Open the application, generate credentials, and securely store the `client_secret`.


## Authenticate

If you have followed the steps above, you now have:

- The `client_id`
- The scope of your integration
- The `client_secret`

To use the API, you need an access token issued by Visma Connect. A token is valid for up to 60 minutes, depending on what you selected when setting up the application. <br>
We'll walk you through the steps of generating an access token. <br>

The first thing we need to do is create a POST request to the Visma Connect API:

- **Stage:** https://connect.identity.stagaws.visma.com/connect/token <br>
- **Prod:** https://connect.visma.com/connect/token <br>
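
Since Stage and Prod differ in token URL and scope, it can help to keep the per-environment settings in one place. The following is a minimal sketch, not part of the TimeDetect tooling: the `Environment` class and `get_environment` helper are hypothetical names, and the Prod base URL is left as `None` because this tutorial does not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Environment:
    token_url: str
    scope: str
    base_url: Optional[str]  # None when this tutorial does not state the value


ENVIRONMENTS = {
    "stage": Environment(
        token_url="https://connect.identity.stagaws.visma.com/connect/token",
        scope="machine-learning-factory-api-stage:td",
        base_url="https://api.machine-learning-factory.stage.visma.com/td",
    ),
    "prod": Environment(
        token_url="https://connect.visma.com/connect/token",
        scope="machine-learning-factory-api-prod:td",
        base_url=None,  # not given here; look it up in the TimeDetect documentation
    ),
}


def get_environment(name: str) -> Environment:
    """Return the settings for 'stage' or 'prod', failing loudly on typos."""
    try:
        return ENVIRONMENTS[name]
    except KeyError:
        raise ValueError(
            f"Unknown environment {name!r}; expected one of {sorted(ENVIRONMENTS)}"
        ) from None
```

Switching from testing to go-live then means changing a single name, for example `get_environment("prod")`, rather than editing several constants.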


In [4]:
# include all imports
import json
import time
from datetime import datetime, timedelta
from typing import Dict

import pandas as pd
import requests

from src.generate_data import DataGenerator
from src.utils import save_data_to_file, load_data_from_file, to_df, to_json

In [5]:
# Define the constants we need:
VISMA_CONNECT_TOKEN_URL = (
    "https://connect.identity.stagaws.visma.com/connect/token"  # token url for stage
)
VISMA_BASE_URL = (
    "https://api.machine-learning-factory.stage.visma.com/td"  # base url for stage
)
VISMA_CONNECT_API_SCOPE = (
    "machine-learning-factory-api-stage:td"  # if you are connecting to stage
)

VISMA_CONNECT_CLIENT_ID = "your-client-id"  # TODO: replace with your client id
CLIENT_SECRET = "your-secret"  # TODO: replace with your client secret

In the header, you specify `Content-Type: application/x-www-form-urlencoded`.


In [6]:
headers = {"Content-Type": "application/x-www-form-urlencoded"}

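# Passing a dict lets requests URL-encode the values, so special
# characters in the client secret are transmitted correctly.
payload = {
    "client_id": VISMA_CONNECT_CLIENT_ID,
    "client_secret": CLIENT_SECRET,
    "grant_type": "client_credentials",  # specified in the documentation
    "scope": VISMA_CONNECT_API_SCOPE,
}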

# make the POST request
response = requests.post(VISMA_CONNECT_TOKEN_URL, headers=headers, data=payload)

if response.status_code == 200:
    result: Dict = json.loads(response.text)
    ACCESS_TOKEN = result["access_token"]
    print("successfully fetched token from Visma Connect")
else:
    print("Something went wrong when fetching token from Visma Connect")

successfully fetched token from Visma Connect
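
Because a token expires after at most 60 minutes, a long-running integration has to fetch a new one from time to time. The sketch below shows one way to cache a token and renew it shortly before expiry. `TokenCache` and its parameters are hypothetical names, and `fetch_token` stands in for a function you would write around the POST request to Visma Connect.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Caches an access token and refreshes it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, float]],
        refresh_margin: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # fetch_token returns (access_token, lifetime_in_seconds)
        self._fetch_token = fetch_token
        self._refresh_margin = refresh_margin
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid token, fetching a new one only when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._refresh_margin:
            token, lifetime = self._fetch_token()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

A `fetch_token` implementation could return `(result["access_token"], result["expires_in"])`. `expires_in` is the standard OAuth 2.0 response field; that Visma Connect includes it is an assumption worth checking against an actual response.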


## Generate data

We now have access to the API and are ready to start using it. But first, we need some data!

You can find functionality for generating dummy data in `src/generate_data.py`. Feel free to create different types of datasets to experiment with the API. <br>


In [7]:
# Let's define some of the parameters we need to generate data
num_employees = 5
projects = ["project-1"]
work_categories = ["work_category-1", "work_category-2"]
departments = ["department-1"]
numericals = []  # generates numericals with 0.3 probability


# First items chosen with highest probability
start_times = [8, 7.5]
end_times = [16, 17]
break_durations = [0.5, 0.75]


# Define some dates
start_date_train = (datetime.now() - timedelta(days=120)).strftime("%Y-%m-%d")
end_date_train = (datetime.now() - timedelta(days=10)).strftime("%Y-%m-%d")
start_date_test = (datetime.now() - timedelta(days=9)).strftime("%Y-%m-%d")
end_date_test = (datetime.now()).strftime("%Y-%m-%d")

# Generates one registration per day per employee
data_generator = DataGenerator(
    num_employees,
    projects,
    work_categories,
    departments,
    numericals,
    start_times=start_times,
    end_times=end_times,
    break_durations=break_durations,
)

train_data = data_generator.generate_data(start_date_train, end_date_train)
predict_data = data_generator.generate_data(start_date_test, end_date_test)

train_df = to_df(train_data)
pred_df = to_df(predict_data)

In [8]:
train_df.head()

Unnamed: 0,registrationId,date,employeeId,projectId,departmentId,workCategory,startTime,endTime,workDuration,breakDuration,publicHoliday,numericals
0,reg-0,2023-09-28,employee-0,project-1,department-1,work_category-1,7.5,17,8.75,0.75,False,[]
1,reg-1,2023-09-28,employee-1,project-1,department-1,work_category-2,8.0,17,8.5,0.5,False,[]
2,reg-2,2023-09-28,employee-2,project-1,department-1,work_category-1,7.5,17,9.0,0.5,False,[]
3,reg-3,2023-09-28,employee-3,project-1,department-1,work_category-1,7.5,16,7.75,0.75,False,[]
4,reg-4,2023-09-28,employee-4,project-1,department-1,work_category-2,7.5,16,8.0,0.5,False,[]


In [9]:
pred_df.head()

Unnamed: 0,registrationId,date,employeeId,projectId,departmentId,workCategory,startTime,endTime,workDuration,breakDuration,publicHoliday,numericals
0,reg-555,2024-01-17,employee-0,project-1,department-1,work_category-2,8.0,17,8.5,0.5,False,[]
1,reg-556,2024-01-17,employee-1,project-1,department-1,work_category-2,8.0,16,7.5,0.5,False,[]
2,reg-557,2024-01-17,employee-2,project-1,department-1,work_category-1,8.0,16,7.25,0.75,False,[]
3,reg-558,2024-01-17,employee-3,project-1,department-1,work_category-1,7.5,17,9.0,0.5,False,[]
4,reg-559,2024-01-17,employee-4,project-1,department-1,work_category-1,8.0,17,8.25,0.75,False,[]


### Create patterns in the data

Now that we have some dummy data, we can plant a few patterns in it and see whether the API picks them up. <br>

**_In this example, we make everyone except mr.weekend not work on the weekends:_**


In [10]:
def adjust_weekend_schedule(df):
    """
    Zero out weekend registrations for everyone except employee-0,
    then rename employee-0 to mr.weekend.
    """
    general_mask = (pd.to_datetime(df["date"]).dt.dayofweek > 4) & (
        df["employeeId"] != "employee-0"
    )

    df.loc[general_mask, ["startTime", "endTime", "workDuration", "breakDuration"]] = 0
    df.loc[df["employeeId"] == "employee-0", "employeeId"] = "mr.weekend"
    return df


# Apply the function to train_df and pred_df
train_df = adjust_weekend_schedule(train_df)
pred_df = adjust_weekend_schedule(pred_df)

In [11]:
def set_weekend_schedule(df):
    weekend_mask = pd.to_datetime(df["date"]).dt.weekday > 4
    anomaly_mask = weekend_mask & (df["employeeId"] != "mr.weekend")
    df.loc[anomaly_mask, ["startTime", "endTime", "workDuration", "breakDuration"]] = 0
    return df


# Apply the function to both train_df and pred_df
train_df = set_weekend_schedule(train_df)
pred_df = set_weekend_schedule(pred_df)

In [12]:
weekend_mask = pd.to_datetime(pred_df["date"]).dt.weekday > 4
pred_df[weekend_mask].head(5)

Unnamed: 0,registrationId,date,employeeId,projectId,departmentId,workCategory,startTime,endTime,workDuration,breakDuration,publicHoliday,numericals
15,reg-570,2024-01-20,mr.weekend,project-1,department-1,work_category-1,8.0,17,8.25,0.75,False,[]
16,reg-571,2024-01-20,employee-1,project-1,department-1,work_category-1,0.0,0,0.0,0.0,False,[]
17,reg-572,2024-01-20,employee-2,project-1,department-1,work_category-2,0.0,0,0.0,0.0,False,[]
18,reg-573,2024-01-20,employee-3,project-1,department-1,work_category-1,0.0,0,0.0,0.0,False,[]
19,reg-574,2024-01-20,employee-4,project-1,department-1,work_category-2,0.0,0,0.0,0.0,False,[]
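
The weekend masks above rely on pandas numbering the days of the week from Monday (0) to Sunday (6), so `> 4` selects Saturday and Sunday. The standard library's `date.weekday()` uses the same numbering, which makes it easy to check individual dates by hand. The `is_weekend` helper below is a hypothetical name used only for illustration.

```python
from datetime import date


def is_weekend(iso_date: str) -> bool:
    """True for Saturday (5) and Sunday (6); Monday is 0."""
    return date.fromisoformat(iso_date).weekday() > 4


print(is_weekend("2024-01-20"))  # True: a Saturday
print(is_weekend("2024-01-25"))  # False: a Thursday
```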


**Create a double entry in the prediction data:**


In [13]:
# Let's pick a date where we don't have a zero-entry for employee-1
mask = (pd.to_datetime(pred_df["date"]).dt.dayofweek == 3) & (
    pred_df["employeeId"] == "employee-1"
)

# get one reg that fulfills this criteria
reg = pred_df[mask].sample(1)
duplicate_reg = reg.copy()
duplicate_reg["registrationId"] = duplicate_reg["registrationId"] + "-dup"
pred_df = pd.concat([pred_df, duplicate_reg], ignore_index=True)

# Let's check:
pred_df[pred_df["registrationId"].str.contains("-dup")]

Unnamed: 0,registrationId,date,employeeId,projectId,departmentId,workCategory,startTime,endTime,workDuration,breakDuration,publicHoliday,numericals
50,reg-596-dup,2024-01-25,employee-1,project-1,department-1,work_category-1,8.0,16,7.25,0.75,False,[]


**Create missing entry in the prediction data:**


In [14]:
# Delete a registration for employee-2 on a Tuesday
mask = (pd.to_datetime(pred_df["date"]).dt.dayofweek == 1) & (
    pred_df["employeeId"] == "employee-2"
)

# get one reg that fulfills this criteria
reg = pred_df[mask].sample(1)
pred_df = pred_df.drop(reg.index)

**Save to file:**


In [15]:
# Save to file
train_regs = to_json(train_df)
pred_regs = to_json(pred_df)

save_data_to_file(train_regs, "data/train_data.json")
save_data_to_file(pred_regs, "data/predict_data.json")

## Get the presigned URL

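The next step is to get a presigned URL for uploading raw data to the platform by calling `GET /presigned_url`. The response contains the URL together with a `jobId`, which we keep so we can check the status of the job later. The raw data is then sent in the body of a `PUT` request to that URL.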

The URL is valid for 60 minutes.


In [16]:
TENANT_ID = "insert-your-tenant-id-here"  # TODO: replace with a tenant id of your choice
headers = {"tenantId": TENANT_ID, "Authorization": f"Bearer {ACCESS_TOKEN}"}
response = requests.get(f"{VISMA_BASE_URL}/presigned_url", headers=headers)

# Keep track of the job_id
current_job_id = None

if response.status_code == 200:
    result: Dict = json.loads(response.text)
    current_job_id = result["jobId"]
    PRESIGNED_URL = result["url"]
else:
    print("Something went wrong when getting presigned url")

## Uploading Data to the Presigned URL

Now that we have the presigned URL, we are ready to upload our training data.
This is usually the approved time registrations, meaning registrations we know are "correct" for the employees.

Sending a `PUT` request to the presigned URL (generated by calling `GET /presigned_url`) triggers the **_validation_** process. <br>
Here, we do not need to send the authentication token; authentication is handled through the presigned URL. <br>
This process validates the data and prepares it for model training. The status of the validation process can be fetched by calling `GET /status`.

**_Note_**: the header of your `PUT` request should be empty. It should not contain a `"Content-Type"` or any other headers. This is crucial for the request to be processed correctly.
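
Since validation happens asynchronously after the upload, it can save a round trip to catch obviously incomplete registrations locally first. The sketch below builds the request body and rejects registrations that lack a field. `build_upload_body` and `EXPECTED_FIELDS` are hypothetical names, and the field list is an assumption taken from the columns of the dummy data in this tutorial, not from the API specification.

```python
import json
from typing import Any, Dict, List

# Assumption: mirrors the columns of the dummy data in this tutorial.
# The API's actual set of required fields may differ.
EXPECTED_FIELDS = {
    "registrationId", "date", "employeeId", "projectId", "departmentId",
    "workCategory", "startTime", "endTime", "workDuration", "breakDuration",
    "publicHoliday", "numericals",
}


def build_upload_body(dataset_id: str, registrations: List[Dict[str, Any]]) -> str:
    """Return the JSON body for the PUT request, rejecting incomplete registrations."""
    for index, registration in enumerate(registrations):
        missing = EXPECTED_FIELDS - set(registration)
        if missing:
            raise ValueError(
                f"Registration {index} is missing fields: {sorted(missing)}"
            )
    payload = {"datasets": [{"datasetId": dataset_id, "registrations": registrations}]}
    return json.dumps(payload)
```

The returned string can be passed as the `data` argument of the `PUT` request without adding any headers.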


In [17]:
dataset_id = "demo"  # TODO: -> replace with your dataset id
registrations = load_data_from_file("data/train_data.json")

payload = {"datasets": [{"datasetId": dataset_id, "registrations": registrations}]}

response = requests.put(PRESIGNED_URL, data=json.dumps(payload))
if response.status_code == 200:
    print("Raw data uploaded successfully")
else:
    print(response)
    print("Something went wrong when uploading data")

Raw data uploaded successfully


### Wait for the validation job to finish

Initiating the data upload triggers the validation process. The completion time of this job varies and is dependent on the dataset's volume. It may take several minutes for larger datasets.


In [18]:
def get_job_status() -> Dict:
    if current_job_id is None:
        print("No job id found")
        return
    headers = {
        "tenantId": TENANT_ID,
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "jobId": current_job_id,
    }
    response = requests.get(f"{VISMA_BASE_URL}/status", headers=headers)
    result: Dict = json.loads(response.text)
    return result


def pull_status_endpoint():
    while True:
        result = get_job_status()
        if result is None:
            print("Error in getting job status. Trying again in 5 seconds...")
            time.sleep(5)
            continue

        if result.get("status") == "success":
            print("Job completed")
            break
        elif result.get("status") == "invalid":
            print("Invalid job status. Exiting loop.")
            break
        else:
            print("Job still in progress. Checking again in 10 seconds...")
            time.sleep(10)


pull_status_endpoint()

Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job completed
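
The polling loop above runs until the job reports `success` or `invalid`, which means it never returns if the job gets stuck. A bounded variant is sketched below. `poll_until_done` and its parameters are hypothetical names, the default timeout is an arbitrary choice, and the `sleep` and `clock` arguments exist only so the function can be exercised without waiting.

```python
import time
from typing import Callable, Dict, Optional


def poll_until_done(
    get_status: Callable[[], Optional[Dict]],
    timeout_seconds: float = 600.0,
    interval_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the job reports 'success' or 'invalid', or raise on timeout."""
    deadline = clock() + timeout_seconds
    while True:
        result = get_status()
        # A missing result is treated like "still in progress".
        status = result.get("status") if result else None
        if status in ("success", "invalid"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Job not finished after {timeout_seconds} seconds")
        sleep(interval_seconds)
```

With the `get_job_status` function defined above, a call would look like `poll_until_done(get_job_status)`. Returning the final status lets the caller decide what to do with an `invalid` job.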


## Start Model Training

Having completed the upload and validation of our training data, we can now proceed to the next step: training the model.

For this we will make a `POST` request to the `/start_trainer` endpoint. <br>

### Full retrain vs rebuild (streaming)

This endpoint does one of two things, depending on the boolean `rebuildModels` field:

**_If rebuildModels = True:_** <br>
Starts the machine learning pipeline, which trains from scratch and stores a model for each `datasetId` included in the request body. This overwrites existing models. The status of the rebuild procedure can be fetched by calling `GET /status` with the provided `jobId`.

**_If rebuildModels = False:_** <br>
Starts the machine learning pipeline which updates the model for each datasetId included in the request body. Updating a model means continuing the training procedure on recent data to make sure the models can use all the latest information available in the predictions. The status of the update procedure can be fetched by calling `GET /status` endpoint with the provided jobId.
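
As a rough illustration of the two modes, the request body differs only in the `rebuildModels` flag. The helper below is a hypothetical sketch that builds one parameter entry per dataset. Rebuilding on the first run and updating afterwards is an assumed workflow based on the descriptions above, not a documented requirement.

```python
import json
from typing import Dict, Iterable, List


def build_trainer_payload(
    dataset_ids: Iterable[str], rebuild_models: bool
) -> Dict[str, List[Dict]]:
    """Build the /start_trainer request body, one parameter entry per dataset."""
    return {
        "parameters": [
            {"datasetId": dataset_id, "rebuildModels": rebuild_models}
            for dataset_id in dataset_ids
        ]
    }


# First run for a dataset: train from scratch.
print(json.dumps(build_trainer_payload(["demo"], rebuild_models=True)))

# Later runs: continue training on recent data.
print(json.dumps(build_trainer_payload(["demo"], rebuild_models=False)))
```

`json.dumps` converts Python's `True` and `False` to the JSON literals `true` and `false`, so the flag can be passed as a regular boolean.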


In [19]:
# set rebuild_models to True if you want to rebuild models
rebuild_models = True

headers = {
    "tenantId": TENANT_ID,
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Content-Type": "application/json",
}

payload = {"parameters": [{"datasetId": dataset_id, "rebuildModels": rebuild_models}]}

# Send the request
response = requests.post(
    f"{VISMA_BASE_URL}/start_trainer", headers=headers, data=json.dumps(payload)
)

current_job_id = None
if response.status_code == 202:
    result: Dict = json.loads(response.text)
    current_job_id = result["jobId"]
    print("Trainer started successfully")
else:
    print("Something went wrong when starting trainer")

Trainer started successfully


In [21]:
# Poll the job status for the training job as well:
pull_status_endpoint()

Job completed


## Creating Predictions

The prediction procedure computes and stores predictions for each `datasetId` included in the request body.

The status of the prediction procedure can be fetched by calling `GET /status` with the provided `jobId`. The predictions can be fetched by calling `GET /results`, also with the provided `jobId`.


In [22]:
current_job_id = None
url: str = f"{VISMA_BASE_URL}/create_prediction"

# get the employee_ids from the predict_data
employee_ids = list(set([x["employeeId"] for x in pred_regs]))
pred_registrations = load_data_from_file("data/predict_data.json")

headers = {
    "tenantId": TENANT_ID,
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Content-Type": "application/json",
}
payload = {
    "parameters": [
        {
            "datasetId": dataset_id,
            "registrations": pred_registrations,
            "aggregateForEmployeeIds": employee_ids,
        }
    ]
}

# Send the request
response = requests.post(url, headers=headers, data=json.dumps(payload))

if response.status_code == 202:
    result: Dict = json.loads(response.text)
    current_job_id = result["jobId"]
    print("Prediction job started successfully")
else:
    print("Something went wrong when creating predictions")

Prediction job started successfully


In [23]:
pull_status_endpoint()

Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job completed


## Fetching the prediction results


In [24]:
headers = {
    "tenantId": TENANT_ID,
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "jobId": current_job_id,
}

response = requests.get(f"{VISMA_BASE_URL}/results", headers=headers)
if response.status_code == 200:
    result: Dict = json.loads(response.text)
else:
    print("Something went wrong when getting results")

## Assess the results

Let's look at the results!

The duplicate registration for employee-1 gets a high anomaly score, with `work_duration` as the most significant field. <br>
The missing registration for employee-2 is also flagged with a high anomaly score, as expected.


In [25]:
results_as_df = pd.DataFrame(result["results"][0]["predictions"])

# sort by anomaly_score
results_as_df.sort_values(by="anomalyScore", ascending=False).head(5)

Unnamed: 0,registrationId,date,employeeId,anomalyScore,significantFields,aggregated,missing,relatedRegistrationIds,subModelId
17,agg_employee-1_2024-01-25,2024-01-25,employee-1,62.0,"[{'field': 'work_duration', 'significance': 51...",True,False,"[reg-596, reg-596-dup]",employee_agg_level-employee-1
24,agg_employee-2_2024-01-23,2024-01-23,employee-2,24.0,"[{'field': 'date', 'significance': 13}, {'fiel...",True,True,[],employee_agg_level-employee-2
4,agg_mr.weekend_2024-01-21,2024-01-21,mr.weekend,7.0,"[{'field': 'work_duration', 'significance': 6}...",True,False,[reg-575],employee_agg_level-mr.weekend
38,agg_employee-4_2024-01-19,2024-01-19,employee-4,6.0,"[{'field': 'date', 'significance': 4}, {'field...",True,False,[reg-569],employee_agg_level-employee-4
32,agg_employee-3_2024-01-22,2024-01-22,employee-3,6.0,"[{'field': 'work_duration', 'significance': 2}...",True,False,[reg-583],employee_agg_level-employee-3
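
In an approval workflow you would typically act only on predictions above some score. The sketch below filters a list of predictions and picks the most significant field for each. `flag_anomalies` is a hypothetical helper, the threshold of 20 is an arbitrary choice for illustration, and the sample values are made up to resemble the shape of the rows above.

```python
from typing import Any, Dict, List


def flag_anomalies(
    predictions: List[Dict[str, Any]], threshold: float
) -> List[Dict[str, Any]]:
    """Return predictions at or above the threshold, highest score first."""
    flagged = []
    for prediction in predictions:
        if prediction["anomalyScore"] < threshold:
            continue
        fields = prediction.get("significantFields") or []
        # Pick the field with the largest significance, if any were reported.
        top = max(fields, key=lambda f: f["significance"], default=None)
        flagged.append(
            {
                "registrationId": prediction["registrationId"],
                "anomalyScore": prediction["anomalyScore"],
                "missing": prediction["missing"],
                "topField": top["field"] if top else None,
            }
        )
    return sorted(flagged, key=lambda p: p["anomalyScore"], reverse=True)


# Made-up predictions, illustrative only.
sample = [
    {"registrationId": "agg_example-b", "anomalyScore": 24.0, "missing": True,
     "significantFields": [{"field": "date", "significance": 13}]},
    {"registrationId": "agg_example-a", "anomalyScore": 62.0, "missing": False,
     "significantFields": [{"field": "date", "significance": 5},
                           {"field": "work_duration", "significance": 40}]},
    {"registrationId": "agg_example-c", "anomalyScore": 7.0, "missing": False,
     "significantFields": []},
]

for row in flag_anomalies(sample, threshold=20):
    print(row)
```

With real data, the input would be `result["results"][0]["predictions"]` from the `GET /results` response.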


## Test the Real-Time endpoint

Let's check out the real-time endpoint.

A prediction will be made for each registration sent in. This is a fast process and will only return predictions on a registration level and not on an aggregated level, which is why this endpoint can be executed in real-time. The registrations will not be used to update the models.


In [26]:
# TODO: feel free to play around with the registration
# A list with a single registration; the endpoint expects a JSON array
rt_registration = [
    {
        "registrationId": "reg-rt-test",
        "date": (datetime.now()).strftime("%Y-%m-%d"),
        "employeeId": "mr.weekend",
        "projectId": "project-1",
        "departmentId": "department-2",
        "workCategory": "work_category-2",
        "startTime": 8.0,
        "endTime": 16,
        "workDuration": 7.25,
        "breakDuration": 0.75,
        "publicHoliday": False,
        "numericals": [],
    },
]


headers = {
    "tenantId": TENANT_ID,
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Content-Type": "application/json",
}
payload = {"parameters": [{"datasetId": dataset_id, "registrations": rt_registration}]}

response = requests.post(
    f"{VISMA_BASE_URL}/real_time_prediction", headers=headers, data=json.dumps(payload)
)

if response.status_code == 200:
    result: Dict = json.loads(response.text)
else:
    print("Something went wrong when creating real time predictions")


result = result["results"][0]["predictions"]
results_as_df = pd.DataFrame(result)
results_as_df

Unnamed: 0,registrationId,date,employeeId,anomalyScore,significantFields,aggregated,missing,relatedRegistrationIds,subModelId
0,reg-rt-test,2024-01-26,mr.weekend,4.0,"[{'field': 'work_duration', 'significance': 2}...",False,False,[],employee_level-mr.weekend


## Delete dataset

Although we have implemented a data deletion policy, you can also manually delete your datasets.


In [27]:
headers = {
    "tenantId": TENANT_ID,
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "datasetId": dataset_id,
    "Content-Type": "application/json",
}
response = requests.delete(f"{VISMA_BASE_URL}/data/{dataset_id}", headers=headers)

if response.status_code == 200:
    result: Dict = json.loads(response.text)
    print("Dataset deleted successfully")
else:
    print("Something went wrong when deleting dataset")

Dataset deleted successfully
