# Integrating with TimeDetect

This tutorial outlines the process for integrating with the TimeDetect API in a detailed, step-by-step manner.

For a comprehensive understanding and additional details about various endpoints, it's advisable to follow along with this guide while consulting the TimeDetect [documentation](https://docs.resolve.visma.com/time-detect/).

The TimeDetect API streamlines the management of time and absence records for companies. It leverages machine learning to analyze historical data, offering insights into the anomaly level and reasons behind new registrations. This facilitates the automation of time registration processes and enhances the efficiency of those responsible for approvals, ultimately boosting productivity.


## Getting Started

The very first thing we need to do is to **request access**.

We use Visma Connect for authentication. To get access, you need to register an application on the Visma Developer Portal by following these steps. Use Stage for testing the API, and switch to Production when you are ready to go live with real users.

1. Log into Visma Developer Portal
   - [https://oauth.developers.stagaws.visma.com](https://oauth.developers.stagaws.visma.com)
     <br>
     <br>
2. Navigate to "My Applications". Go to "Add Application" > "Service (Machine-to-Machine)"
   <br>

3. Fill in the required details for your application <br>
   Make sure to note down the `client_id`.

4. Publish the application.

5. Open the application and add a new integration.

6. Select the API you want to integrate with: <br>
   `Machine Learning Factory API Stage` <br>

7. Select the appropriate scope for your integration: <br>
   `machine-learning-factory-api-stage:td` <br>

8. Wait for the integration to be accepted. <br>
   Feel free to contact the TimeDetect team to speed up the process.

9. Open the application, generate credentials, and securely store the `client_secret`.


## Authenticate

If you have followed the steps above, you now have:

- Your `client_id`
- The scope of your integration
- Your `client_secret`

To use the API, you need an access token generated by Visma Connect. Tokens have a lifetime of up to 60 minutes, depending on what you selected when setting up the application. <br>
We'll walk you through the steps of generating an access token. <br>

The first thing we need to do is send a `POST` request to the Visma Connect token endpoint:

- https://connect.identity.stagaws.visma.com/connect/token <br>


In [34]:
import json
import time
import requests
import pandas as pd
from typing import Dict
from datetime import datetime, timedelta

from src.generate_data import DataGenerator
from src.utils import save_data_to_file, load_data_from_file, to_df, to_json

## Set values for needed constants and variables


In [35]:
VISMA_CONNECT_TOKEN_URL = "https://connect.identity.stagaws.visma.com/connect/token"
VISMA_BASE_URL = "https://api.machine-learning-factory.stage.visma.com/td"
VISMA_CONNECT_API_SCOPE = "machine-learning-factory-api-stage:td"
date_format = "%Y-%m-%d"

VISMA_CONNECT_CLIENT_ID = "your-client-id"  # TODO: replace with your client id
CLIENT_SECRET = "your-secret"  # TODO: replace with your client secret
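
Hardcoding the secret in a notebook makes it easy to leak by accident. As a minimal sketch, the credentials could instead be read from environment variables. The helper `load_credentials` and the variable names `VISMA_CONNECT_CLIENT_ID` and `VISMA_CONNECT_CLIENT_SECRET` are assumptions made for this example and are not defined by Visma Connect.

```python
import os


def load_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are hypothetical; use whatever fits your deployment.
    """
    env = os.environ if env is None else env
    try:
        return env["VISMA_CONNECT_CLIENT_ID"], env["VISMA_CONNECT_CLIENT_SECRET"]
    except KeyError as missing:
        # Fail early with a readable message instead of sending empty credentials.
        raise RuntimeError(f"Missing environment variable: {missing.args[0]}") from None
```

With this in place, the two constants above could be assigned with `VISMA_CONNECT_CLIENT_ID, CLIENT_SECRET = load_credentials()`.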

In the header, we specify `Content-Type` as `application/x-www-form-urlencoded`.


In [36]:
headers = {"Content-Type": "application/x-www-form-urlencoded"}

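# passing a dict lets requests URL-encode the form fields,
# which keeps secrets with special characters intact
payload = {
    "client_secret": CLIENT_SECRET,
    "client_id": VISMA_CONNECT_CLIENT_ID,
    "grant_type": "client_credentials",  # specified in the documentation
    "scope": VISMA_CONNECT_API_SCOPE,  # OAuth 2.0 parameter names are lowercase
}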

# make the POST request
response = requests.post(VISMA_CONNECT_TOKEN_URL, headers=headers, data=payload)

if response.status_code == 200:
    result: Dict = json.loads(response.text)
    ACCESS_TOKEN = result["access_token"]
    print("successfully fetched token from Visma Connect")
else:
    print("Something went wrong when fetching token from Visma Connect")

successfully fetched token from Visma Connect
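
Because the token expires after at most 60 minutes, a long-running integration has to fetch a new one from time to time. The sketch below shows one way to cache a token and refresh it shortly before it expires. It assumes the token response carries the standard OAuth 2.0 `expires_in` field (in seconds); the class `TokenCache` and its parameters are hypothetical names used only for this example.

```python
import time
from typing import Callable, Dict, Optional


class TokenCache:
    """Caches an access token and refreshes it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], Dict],
        margin_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch_token = fetch_token  # returns the parsed token response
        self._margin = margin_seconds  # refresh this long before expiry
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            result = self._fetch_token()
            self._token = result["access_token"]
            # Assumption: if expires_in is absent, treat the token as already
            # expired so that the next call fetches a fresh one.
            self._expires_at = now + float(result.get("expires_in", 0))
        return self._token
```

Here `fetch_token` would wrap the `POST` request from the cell above and return the parsed JSON response.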


## Generate data

We now have access to the API and are ready to start using it. But first, we need some data!

You can find functionality for generating dummy data in `src/generate_data.py`. Feel free to create different datasets to experiment with the API. <br>
For details on the data schema, please refer to the [documentation](https://docs.resolve.visma.com/time-detect/api-reference#tag/Upload-data/operation/rawDataBucketUpload).

### The numericals field

This field is designed to incorporate extra numerical data into your registration. <br>
Examples include custom features like "overtime", "driving-distance", or other numerical values that can be added here.
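
As a small illustration, a numerical can be attached to a registration as a `{"name": ..., "value": ...}` entry, which is the shape used later in this tutorial. The helper `add_numerical` is a hypothetical name for this sketch, and the type check is our own precaution rather than a rule taken from the API schema.

```python
from typing import Dict, List, Union

Number = Union[int, float]


def add_numerical(registration: Dict, name: str, value: Number) -> Dict:
    """Return a copy of `registration` with one extra numerical appended."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"numerical '{name}' must be a number, got {type(value).__name__}")
    numericals: List[Dict] = list(registration.get("numericals", []))
    numericals.append({"name": name, "value": value})
    # Build a new dict so the original registration is left untouched.
    return {**registration, "numericals": numericals}
```

For example, `add_numerical(registration, "overtime", 2)` returns a registration whose `numericals` list ends with `{"name": "overtime", "value": 2}`.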


In [37]:
# let's define some of the parameters we need to generate data
num_employees = 5
projects = ["project-1"]
work_categories = ["work_category-1", "work_category-2"]
departments = ["department-1"]
numericals = []  # empty for now, we will add numericals later


# first items chosen with highest probability
start_times = [8, 7.5]
end_times = [16, 17]
break_durations = [0.5, 0.75]


# define some dates
start_date_train = (datetime.now() - timedelta(days=120)).strftime(date_format)
end_date_train = (datetime.now() - timedelta(days=10)).strftime(date_format)
start_date_test = (datetime.now() - timedelta(days=9)).strftime(date_format)
end_date_test = (datetime.now()).strftime(date_format)

# generates one registration per day per employee
data_generator = DataGenerator(
    num_employees,
    projects,
    work_categories,
    departments,
    numericals,
    start_times=start_times,
    end_times=end_times,
    break_durations=break_durations,
)

train_data = data_generator.generate_data(start_date_train, end_date_train)
predict_data = data_generator.generate_data(start_date_test, end_date_test)

train_df = to_df(train_data)
pred_df = to_df(predict_data)

In [38]:
train_df.head()

Unnamed: 0,registrationId,date,employeeId,projectId,departmentId,workCategory,startTime,endTime,workDuration,breakDuration,publicHoliday,numericals
0,reg-0,2023-10-23,employee-0,project-1,department-1,work_category-1,8.0,17,8.25,0.75,False,[]
1,reg-1,2023-10-23,employee-1,project-1,department-1,work_category-2,8.0,17,8.25,0.75,False,[]
2,reg-2,2023-10-23,employee-2,project-1,department-1,work_category-1,8.0,16,7.25,0.75,False,[]
3,reg-3,2023-10-23,employee-3,project-1,department-1,work_category-2,8.0,16,7.5,0.5,False,[]
4,reg-4,2023-10-23,employee-4,project-1,department-1,work_category-2,8.0,16,7.25,0.75,False,[]


In [39]:
pred_df.head()

Unnamed: 0,registrationId,date,employeeId,projectId,departmentId,workCategory,startTime,endTime,workDuration,breakDuration,publicHoliday,numericals
0,reg-555,2024-02-11,employee-0,project-1,department-1,work_category-1,8.0,17,8.5,0.5,False,[]
1,reg-556,2024-02-11,employee-1,project-1,department-1,work_category-2,7.5,16,7.75,0.75,False,[]
2,reg-557,2024-02-11,employee-2,project-1,department-1,work_category-1,8.0,16,7.25,0.75,False,[]
3,reg-558,2024-02-11,employee-3,project-1,department-1,work_category-2,7.5,17,9.0,0.5,False,[]
4,reg-559,2024-02-11,employee-4,project-1,department-1,work_category-1,8.0,16,7.5,0.5,False,[]


### Create patterns in the data

Now that we have generated some sample data, we can create some patterns in this data to see if the API can successfully detect them.

**_In this example, we make everyone except mr.weekend not work on the weekends:_**


In [40]:
def adjust_weekend_schedule(df):
    """
    Remove weekend registrations for everyone except employee-0,
    then rename employee-0 to mr.weekend.
    """
    general_mask = (pd.to_datetime(df["date"]).dt.dayofweek > 4) & (
        df["employeeId"] != "employee-0"
    )
    # drop the weekend entries; copy to avoid pandas' SettingWithCopyWarning
    df = df[~general_mask].copy()
    df.loc[df["employeeId"] == "employee-0", "employeeId"] = "mr.weekend"
    return df


# apply the function to train_df and pred_df
train_df = adjust_weekend_schedule(train_df)
pred_df = adjust_weekend_schedule(pred_df)

In [41]:
weekend_mask = pd.to_datetime(pred_df["date"]).dt.weekday > 4
pred_df[weekend_mask].head(5)  # only mr.weekend is working on weekends

Unnamed: 0,registrationId,date,employeeId,projectId,departmentId,workCategory,startTime,endTime,workDuration,breakDuration,publicHoliday,numericals
0,reg-555,2024-02-11,mr.weekend,project-1,department-1,work_category-1,8.0,17,8.5,0.5,False,[]
30,reg-585,2024-02-17,mr.weekend,project-1,department-1,work_category-2,8.0,16,7.5,0.5,False,[]
35,reg-590,2024-02-18,mr.weekend,project-1,department-1,work_category-1,8.0,16,7.25,0.75,False,[]


**Create a double entry in the prediction data:**


In [42]:
mask = (pd.to_datetime(pred_df["date"]).dt.dayofweek == 3) & (
    pred_df["employeeId"] == "employee-1"
)

reg = pred_df[mask].sample(1)
duplicate_reg = reg.copy()
duplicate_reg["registrationId"] = duplicate_reg["registrationId"] + "-dup"
pred_df = pd.concat([pred_df, duplicate_reg], ignore_index=True)

# let's check:
pred_df[
    pred_df["registrationId"].isin(
        [reg.iloc[0]["registrationId"], duplicate_reg.iloc[0]["registrationId"]]
    )
]

Unnamed: 0,registrationId,date,employeeId,projectId,departmentId,workCategory,startTime,endTime,workDuration,breakDuration,publicHoliday,numericals
17,reg-576,2024-02-15,employee-1,project-1,department-1,work_category-2,7.5,16,7.75,0.75,False,[]
38,reg-576-dup,2024-02-15,employee-1,project-1,department-1,work_category-2,7.5,16,7.75,0.75,False,[]


**Create missing entry in the prediction data:**


In [43]:
# create a mask for employee-2 that excludes the first and last day
# (weekend registrations for employee-2 were already removed above)
mask = (
    (pd.to_datetime(pred_df["date"]) != start_date_test)  # exclude first day
    & (pd.to_datetime(pred_df["date"]) != end_date_test)  # exclude last day
    & (pred_df["employeeId"] == "employee-2")  # employee-2 only
)

# get one registration that fulfills this criteria
reg = pred_df[mask].sample(1)
pred_df = pred_df.drop(reg.index)

# the missing registration:
reg

Unnamed: 0,registrationId,date,employeeId,projectId,departmentId,workCategory,startTime,endTime,workDuration,breakDuration,publicHoliday,numericals
8,reg-567,2024-02-13,employee-2,project-1,department-1,work_category-2,7.5,17,9.0,0.5,False,[]


**Add a numerical field in the prediction data:**


In [44]:
# create a mask for 'employee-3'
mask = pred_df["employeeId"] == "employee-3"
random_index = pred_df[mask].sample(1).index[0]

# adding overtime as a numerical to a random registration for employee-3:
pred_df.at[random_index, "numericals"] = [{"name": "overtime", "value": 2}]

pred_df.loc[pred_df.index == random_index]

Unnamed: 0,registrationId,date,employeeId,projectId,departmentId,workCategory,startTime,endTime,workDuration,breakDuration,publicHoliday,numericals
4,reg-563,2024-02-12,employee-3,project-1,department-1,work_category-1,8.0,16,7.25,0.75,False,"[{'name': 'overtime', 'value': 2}]"


**Save to file:**


In [45]:
# save to file
train_regs = to_json(train_df)
pred_regs = to_json(pred_df)

save_data_to_file(train_regs, "data/train_data.json")
save_data_to_file(pred_regs, "data/predict_data.json")
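
The helpers `save_data_to_file` and `load_data_from_file` live in `src/utils.py`, which is not shown in this tutorial. If you are following along without the repository, stand-ins along these lines may be enough, assuming the helpers are thin wrappers around JSON serialization. The names `save_json` and `load_json` are hypothetical, and the real helpers may behave differently.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def save_json(data: Any, path: str) -> None:
    """Write `data` as JSON, creating the parent directory if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(data, indent=2), encoding="utf-8")


def load_json(path: str) -> Any:
    """Read a JSON file back into Python objects."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Round trip in a temporary directory, so no files are left behind.
with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "data" / "train_data.json")
    save_json([{"registrationId": "reg-0"}], path)
    restored = load_json(path)
```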

## Get the presigned URL

The next step is to get a presigned URL for uploading raw data to the platform. <br>
We do this by making a `GET` request to the `/presigned_url` endpoint. The response contains the URL to upload to, along with a `jobId` that we use to track the validation job.

The URL is valid for 60 minutes.


In [46]:
TENANT_ID = "camilla-test"  # TODO: replace with a tenant id of your choice
headers = {"tenantId": TENANT_ID, "Authorization": f"Bearer {ACCESS_TOKEN}"}
response = requests.get(f"{VISMA_BASE_URL}/presigned_url", headers=headers)

# we'll keep track of the job_id
current_job_id = None

if response.status_code == 200:
    result: Dict = json.loads(response.text)
    current_job_id = result["jobId"]
    PRESIGNED_URL = result["url"]
else:
    print("Something went wrong when getting presigned url")

## Uploading Data to the Presigned URL

Now that we have the presigned URL, we are ready to upload our training data.

By sending a `PUT` request to the presigned URL (generated by calling `GET /presigned_url`), the **_validation_** process is triggered. <br>
Here, we do not need to send the authentication token; authentication is handled through the presigned URL. <br>
This process validates the data and prepares it for model training. The status of the validation process can be fetched by calling `GET /status`.

**_Note_**: the header of your `PUT` request should be empty. It should not contain a `"Content-Type"` or any other headers. This is crucial for the request to be processed correctly.


In [47]:
dataset_id = "demo"  # TODO: replace with your dataset id.
registrations = load_data_from_file("data/train_data.json")

payload = {"datasets": [{"datasetId": dataset_id, "registrations": registrations}]}

response = requests.put(PRESIGNED_URL, data=json.dumps(payload))
if response.status_code == 200:
    print("Raw data uploaded successfully")
else:
    print(response)
    print("Something went wrong when uploading data")

Raw data uploaded successfully
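
The upload body wraps the registrations in a `datasets` list. The sketch below builds that body from a mapping of dataset ids to registrations. Since `datasets` is a list, we assume here that several datasets can be sent in one upload, which is worth confirming in the API reference. `build_upload_body` is a hypothetical helper name.

```python
import json
from typing import Dict, List


def build_upload_body(datasets: Dict[str, List[Dict]]) -> str:
    """Serialize {datasetId: registrations} into the JSON upload body."""
    if not datasets:
        raise ValueError("at least one dataset is required")
    payload = {
        "datasets": [
            {"datasetId": dataset_id, "registrations": registrations}
            for dataset_id, registrations in datasets.items()
        ]
    }
    return json.dumps(payload)
```

The returned string can be passed as `data=` to `requests.put`, exactly as in the cell above.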


### Wait for the validation job to finish

Initiating the data upload triggers the validation process. The completion time of this job varies and is dependent on the dataset's volume. It may take several minutes for larger datasets.


In [48]:
def get_job_status() -> Dict:
    if current_job_id is None:
        print("No job id found")
        return
    headers = {
        "tenantId": TENANT_ID,
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "jobId": current_job_id,
    }
    response = requests.get(f"{VISMA_BASE_URL}/status", headers=headers)
    result: Dict = json.loads(response.text)
    return result


def pull_status_endpoint():
    while True:
        result = get_job_status()
        if result is None:
            print("Error in getting job status. Trying again in 5 seconds...")
            time.sleep(5)
            continue

        if result.get("status") == "success":
            print("Job completed")
            break
        elif result.get("status") == "invalid":
            print("Invalid job status. Exiting loop.")
            break
        else:
            print("Job still in progress. Checking again in 10 seconds...")
            time.sleep(10)


pull_status_endpoint()

Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job completed
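
The loop above runs until the job reports `success` or `invalid`, so it never returns if a job gets stuck. A variant with a timeout could look like the sketch below. The function `poll_job` and its parameters are hypothetical; only the two terminal status values come from the code above, and every other status is treated as still in progress.

```python
import time
from typing import Callable, Dict, Optional


def poll_job(
    get_status: Callable[[], Optional[Dict]],
    timeout: float = 600.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the job reports "success" or "invalid", or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        result = get_status() or {}  # a missing result counts as "in progress"
        status = result.get("status")
        if status in ("success", "invalid"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job did not finish within {timeout} seconds")
        sleep(interval)
```

Calling `poll_job(get_job_status)` would then return the final status or raise `TimeoutError` after ten minutes.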


## Start Model Training

Having completed the upload and validation of our training data, we can now proceed to the next step: training the model.

For this we will make a `POST` request to the `/start_trainer` endpoint. <br>

### Full retrain vs rebuild (streaming)

This endpoint does one of two things, depending on the boolean `rebuildModels` field:

**_If rebuildModels = True:_** <br>
Starts the machine learning pipeline, which trains from scratch and stores a model for each `datasetId` included in the request body. This overwrites existing models. The status of the rebuild procedure can be fetched by calling the `GET /status` endpoint with the provided `jobId`.

**_If rebuildModels = False:_** <br>
Starts the machine learning pipeline, which updates the model for each `datasetId` included in the request body. Updating a model means continuing the training procedure on recent data, so that the models can use the latest information available when making predictions. The status of the update procedure can be fetched by calling the `GET /status` endpoint with the provided `jobId`.


In [49]:
# set rebuild_models to True if you want to rebuild models
rebuild_models = True

headers = {
    "tenantId": TENANT_ID,
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Content-Type": "application/json",
}

payload = {"parameters": [{"datasetId": dataset_id, "rebuildModels": rebuild_models}]}

# send the request
response = requests.post(
    f"{VISMA_BASE_URL}/start_trainer", headers=headers, data=json.dumps(payload)
)

current_job_id = None
if response.status_code == 202:
    result: Dict = json.loads(response.text)
    current_job_id = result["jobId"]
    print("Trainer started successfully")
else:
    print("Something went wrong when starting trainer")

Trainer started successfully
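
Since `parameters` is a list, one request can presumably cover several datasets, each with its own `rebuildModels` flag. The sketch below builds such a payload. `build_trainer_payload` is a hypothetical helper, and the second dataset id in the usage note is made up for illustration.

```python
from typing import Dict


def build_trainer_payload(rebuild_by_dataset: Dict[str, bool]) -> Dict:
    """Map {datasetId: rebuild?} to the body expected by /start_trainer."""
    return {
        "parameters": [
            {"datasetId": dataset_id, "rebuildModels": bool(rebuild)}
            for dataset_id, rebuild in rebuild_by_dataset.items()
        ]
    }
```

For example, `build_trainer_payload({"demo": True, "demo-2": False})` requests a full rebuild for `demo` and an update for `demo-2`.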


In [50]:
# poll the job status for the training job as well:
pull_status_endpoint()

Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job completed


## Creating Predictions

The prediction procedure computes and stores predictions for each `datasetId` included in the request body.

The status of the prediction procedure can be fetched by calling `GET /status` with the provided `jobId`. The predictions can be fetched by calling `GET /results`, also with the provided `jobId`.


In [51]:
current_job_id = None
url: str = f"{VISMA_BASE_URL}/create_prediction"

# get the employee_ids from the predict_data
employee_ids = list(set([x["employeeId"] for x in pred_regs]))
pred_registrations = load_data_from_file("data/predict_data.json")

headers = {
    "tenantId": TENANT_ID,
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Content-Type": "application/json",
}
payload = {
    "parameters": [
        {
            "datasetId": dataset_id,
            "registrations": pred_registrations,
            "aggregateForEmployeeIds": employee_ids,
        }
    ]
}

# send the request
response = requests.post(url, headers=headers, data=json.dumps(payload))

if response.status_code == 202:
    result: Dict = json.loads(response.text)
    current_job_id = result["jobId"]
    print("Prediction job started successfully")
else:
    print("Something went wrong when creating predictions")

Prediction job started successfully


In [52]:
pull_status_endpoint()

Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job still in progress. Checking again in 10 seconds...
Job completed


## Fetching the prediction results


In [53]:
headers = {
    "tenantId": TENANT_ID,
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "jobId": current_job_id,
}

response = requests.get(f"{VISMA_BASE_URL}/results", headers=headers)
if response.status_code == 200:
    result: Dict = json.loads(response.text)
    print("Prediction results fetched successfully")
else:
    print("Something went wrong when getting results")

Prediction results fetched successfully


## Assess the results

Let's look at the results!

- Here we see that the **duplicate registration** for employee-1 has a high anomaly-score, with the significant field being **work-duration**. This is something the aggregated model will capture, and not the employee-level model.

- We also see that employee-2 has a **missing registration**, which is also flagged with a high anomaly-score as expected. This is also something that the aggregated model will capture, and not the employee-level model.

- And, we see that the unexpected numerical **overtime** for employee-3 was captured with a high anomaly-score. This is something both the aggregated model and employee-level model will capture.


In [54]:
results_as_df = pd.DataFrame(result["results"][0]["predictions"])

# sort by anomaly_score
results_as_df.sort_values(by="anomalyScore", ascending=False).head(6)

Unnamed: 0,registrationId,date,employeeId,anomalyScore,significantFields,aggregated,missing,relatedRegistrationIds,subModelId
12,agg_employee-1_2024-02-15,2024-02-15,employee-1,64.0,"[{'field': 'workDuration', 'significance': 54}...",True,False,"[reg-576, reg-576-dup]",employee_agg_level-employee-1
18,agg_employee-2_2024-02-13,2024-02-13,employee-2,31.0,"[{'field': 'date', 'significance': 20}, {'fiel...",True,True,[],employee_agg_level-employee-2
25,agg_employee-3_2024-02-12,2024-02-12,employee-3,28.0,"[{'field': 'overtime', 'significance': 24}, {'...",True,False,[reg-563],employee_agg_level-employee-3
65,reg-563,2024-02-12,employee-3,25.0,"[{'field': 'overtime', 'significance': 21}, {'...",False,False,[],employee_level-employee-3
20,agg_employee-2_2024-02-15,2024-02-15,employee-2,7.0,"[{'field': 'workDuration', 'significance': 6},...",True,False,[reg-577],employee_agg_level-employee-2
19,agg_employee-2_2024-02-14,2024-02-14,employee-2,7.0,"[{'field': 'workDuration', 'significance': 6},...",True,False,[reg-572],employee_agg_level-employee-2
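
In an approval workflow, you would typically flag only the predictions above some anomaly score. The sketch below filters the predictions and separates aggregated from individual ones, using the `anomalyScore` and `aggregated` fields shown in the table above. The threshold of 20 is an arbitrary assumption for this example, not a recommended value, and `flag_anomalies` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def flag_anomalies(
    predictions: List[Dict], threshold: float = 20.0
) -> Tuple[List[Dict], List[Dict]]:
    """Return (aggregated, individual) predictions at or above `threshold`.

    Both lists are sorted by anomaly score, highest first.
    """
    flagged = sorted(
        (p for p in predictions if p["anomalyScore"] >= threshold),
        key=lambda p: p["anomalyScore"],
        reverse=True,
    )
    aggregated = [p for p in flagged if p["aggregated"]]
    individual = [p for p in flagged if not p["aggregated"]]
    return aggregated, individual
```

It can be called directly on the list of predictions, for example `flag_anomalies(result["results"][0]["predictions"])`.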


## Test the Real-Time endpoint

Let's check out the real-time endpoint.

Upon submitting a registration, a real-time prediction will be generated. This procedure is quick and delivers predictions for each individual registration, rather than providing aggregated results. As a result, this endpoint is capable of real-time execution.


In [55]:
# TODO: feel free to play around with the registration. Here, we try with an unusual endTime
rt_registration = (
    {
        "registrationId": "reg-rt-test",
        "date": (datetime.now()).strftime(date_format),
        "employeeId": "mr.weekend",
        "projectId": "project-1",
        "departmentId": "department-2",
        "workCategory": "work_category-2",
        "startTime": 8.0,
        "endTime": 19,
        "workDuration": 7.25,
        "breakDuration": 0.75,
        "publicHoliday": False,
        "numericals": [],
    },
)


headers = {
    "tenantId": TENANT_ID,
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Content-Type": "application/json",
}
payload = {"parameters": [{"datasetId": dataset_id, "registrations": rt_registration}]}

response = requests.post(
    f"{VISMA_BASE_URL}/real_time_prediction", headers=headers, data=json.dumps(payload)
)

if response.status_code == 200:
    result: Dict = json.loads(response.text)
    # only parse the predictions when the request succeeded
    results_as_df = pd.DataFrame(result["results"][0]["predictions"])
else:
    results_as_df = None
    print("Something went wrong when creating real time predictions")

results_as_df

Unnamed: 0,registrationId,date,employeeId,anomalyScore,significantFields,aggregated,missing,relatedRegistrationIds,subModelId
0,reg-rt-test,2024-02-20,mr.weekend,27.0,"[{'field': 'end_time', 'significance': 25}, {'...",False,False,[],employee_level-mr.weekend
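
To show a user why a registration was flagged, it helps to pull out the field that contributed most to the score. The sketch below assumes `significantFields` is a list of `{"field": ..., "significance": ...}` entries, as in the output above; `top_significant_field` is a hypothetical helper name.

```python
from typing import Dict, Optional


def top_significant_field(prediction: Dict) -> Optional[str]:
    """Return the name of the field with the highest significance, if any."""
    fields = prediction.get("significantFields") or []
    if not fields:
        return None
    return max(fields, key=lambda f: f["significance"])["field"]
```

For the real-time prediction above, the entry with the highest visible significance is `end_time`, which matches the unusual `endTime` we submitted.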


## Delete dataset [Optional]

Although we have implemented a data deletion policy, you can also manually delete your datasets.

**Note**: Please skip this step if you intend to use the Streamlit demo with the same data and models that were used for training in this tutorial.


In [57]:
headers = {
    "tenantId": TENANT_ID,
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "datasetId": dataset_id,
    "Content-Type": "application/json",
}
response = requests.delete(f"{VISMA_BASE_URL}/data/{dataset_id}", headers=headers)

if response.status_code == 200:
    result: Dict = json.loads(response.text)
    print("Dataset deleted successfully")
else:
    print("Something went wrong when deleting dataset")

Dataset deleted successfully
