<div style="background-color:  #663066; padding: 20px;">
   
   
</div>

<div style="display: flex; align-items: center; justify-content: space-between;">
    <div>
        <h1><strong>MIM 2 Data Spaces Graduation Project</strong></h1>
        <h4>by Maxwell Ernst - 18/06/2024</h4>
    </div>
    <div>
        <img src="fontyslogo.png" alt="Fontys Logo" style="height: 80px; margin-left: 20px;">

    
    </div>
</div>

## **MIM2 Process Tutorial**


This Jupyter Notebook is a tutorial for creating a Minimal Interoperability Mechanism 2 (MIM2: data models and sharing) for data spaces, using mock data that resembles real-world sensor data. The steps outlined here are designed to be general and can be adapted to other data sources and MIM2 development purposes.

# 1) General Steps for creating a MIM

The general process for creating a MIM can be outlined in the following steps:

1) Read the Data Spaces Summary Document to gain a good understanding of Data Spaces.
2) Determine the domain you are working in, for example Mobility or Smart and Sustainable Cities and Communities.
3) Define the requirements for what needs to be developed, and use the building blocks in Figure 1 to identify which MIM to create.
4) Search for available standards for that MIM from technical and governance standpoints.
5) Develop the necessary MIM(s).

![MIMsOverview.png](MIMsOverview.png)

Figure 1: Building Blocks taxonomy recommended by OpenDEI and adapted by the DSBA Technical.

# 2) Steps for creating MIM2

## 2.1) What is MIM2?


MIM2, or "Shared Data Models," ensures that data sets use the same definitions for key terms, which is crucial for accurate data linking. For instance, if one dataset defines "children" as ages 5-15 and another defines them as ages 2-12, merging these datasets would create inaccuracies.

Data models are machine-readable definitions of terms, which allow APIs to understand and handle them properly. Consistent data models enable applications to link relevant contextual data with datasets.
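To make the idea concrete, the following minimal sketch treats a term definition as a small machine-readable record and refuses to link two datasets whose definitions of "children" differ. The `TermDefinition` class and the `can_merge` helper are hypothetical names used only for illustration; real data models carry far richer metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermDefinition:
    """Hypothetical machine-readable definition of an age-based term."""
    term: str
    min_age: int
    max_age: int


def can_merge(a: TermDefinition, b: TermDefinition) -> bool:
    """Two datasets can be linked on a term only if they define it identically."""
    return a == b


# Definitions taken from the "children" example above.
dataset_a = TermDefinition("children", 5, 15)
dataset_b = TermDefinition("children", 2, 12)

print(can_merge(dataset_a, dataset_a))  # True
print(can_merge(dataset_a, dataset_b))  # False
```

A shared data model removes the need for this kind of pairwise check, because every dataset refers to the same published definition.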

## 2.2) Why are shared data-models important?

A common set of data models creates a shared language, allowing systems to communicate effectively. Well-defined data models help cities to integrate and open up data across different solutions and support various applications. Harmonized data models can be reused, facilitating data sharing and learning among cities.

## 2.3) EU Policy Context


Sharing data between different agencies within a city or between cities requires a common way of defining entities. For example, consistent definitions for terms like "bus" or "taxi" are essential. Without common data models, each agency would need to create their own, making data sharing difficult and inefficient.

Common data models support benchmarking and shared learning, reducing the effort required to define data sets.

## 2.4) Requirements for Compliance

Entities described by data in the ecosystem should use consistent data models based on:

- Resource Description Framework (RDF)
- Resource Description Framework Schema (RDFS)
- Web Ontology Language (OWL)

For spatial and spatio-temporal data, consider the provisions of MIM-7 (Places) regarding data encoding.
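
RDF describes everything as subject, predicate, object triples. As a rough illustration only, the sketch below writes one sensor observation as triples and serializes them in N-Triples syntax using plain Python. The `http://example.org/` URIs are placeholders, not terms from any published ontology, and the helper names are hypothetical; a real project would more likely use an RDF library and an existing vocabulary.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"
EX = "http://example.org/"  # placeholder namespace (assumption)


def uri(value: str) -> str:
    """Format a URI the way N-Triples expects it."""
    return f"<{value}>"


def typed_literal(value, datatype: str) -> str:
    """Format a literal with an explicit datatype URI."""
    return f'"{value}"^^<{datatype}>'


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


observation = uri(EX + "observation/1")
triples = [
    (observation, uri(RDF_TYPE), uri(EX + "WeatherObserved")),
    (observation, uri(EX + "temperature"), typed_literal(22.5, XSD_DOUBLE)),
]
print(to_ntriples(triples))
```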

## 2.5) Recommended Specifications

Using NGSI-LD compliant data models is the preferred option for smart city aspects. These data models have been defined by organizations and projects, including OASC, FIWARE, GSMA, and the SynchroniCity project. There is ongoing collaboration between OASC, TM Forum, and FIWARE to specify more models through the Smart Data Models initiative.

Alternatively, existing data models and ontologies can be adapted for use with NGSI-LD by identifying entities, properties, and relationships that can be managed by the NGSI-LD API. Some examples include:

- oneM2M base ontology (compatible with SAREF), which provides semantic descriptions of data through metadata
- SAREF: Smart Appliances REFerence ontology, with SAREF4Cities focused on smart cities
- Core vocabularies of ISA, such as the Core Public Service Vocabulary Application Profile, used for the Single Digital Gateway Regulation
- Digital Twin Definition Language (DTDL) developed by Microsoft, based on JSON-LD, with existing FIWARE data models converted to this format
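
As a hedged illustration of what identifying entities, properties, and relationships can look like, the sketch below converts a flat key-value record into the normalized NGSI-LD shape, where each attribute is wrapped as a `Property`, `GeoProperty`, or `Relationship`. The helper name `to_ngsi_ld`, the `urn:ngsi-ld:...` identifiers, and the use of `refDevice` as a relationship are assumptions made for illustration. The `@context` shown is the ETSI core context only; a real deployment would also reference the context of the chosen data model.

```python
import json

CORE_CONTEXT = "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"


def to_ngsi_ld(record: dict, relationships=("refDevice",)) -> dict:
    """Wrap a flat key-value record in normalized NGSI-LD attribute objects."""
    entity = {
        "id": f"urn:ngsi-ld:{record['type']}:{record['id']}",
        "type": record["type"],
        "@context": [CORE_CONTEXT],
    }
    for name, value in record.items():
        if name in ("id", "type"):
            continue  # already handled above
        if name == "location":
            entity[name] = {"type": "GeoProperty", "value": value}
        elif name in relationships:
            entity[name] = {"type": "Relationship", "object": value}
        else:
            entity[name] = {"type": "Property", "value": value}
    return entity


# Illustrative record; the values are made up.
flat = {
    "id": "sensor_1",
    "type": "WeatherObserved",
    "temperature": 22.5,
    "location": {"type": "Point", "coordinates": [5.48, 51.45]},
    "refDevice": "urn:ngsi-ld:Device:sensor_1",
}
print(json.dumps(to_ngsi_ld(flat), indent=2))
```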

## 2.6) Relevant European References and Specifications

As part of ongoing work related to MIM2, support for the Smart Data Models Initiative aims to:

- Develop guidelines and a catalogue of minimum common data models in different sectors for interoperability
- Create harmonized representation formats and semantics for applications to consume and publish data
- Develop data models for interoperable and replicable smart solutions across sectors, starting with smart cities and extending to smart agri-food, smart utilities, smart industry, etc.
- Establish a methodology to translate between credible initiatives developing data models
- Provide guidelines on developing consistent data models
- Expand the catalogue of data models agreed upon by OASC cities as common models for use

# 3) MIM2 Example using mock data

## 3.1) Prerequisites

Before following this tutorial, ensure you have the following:

- Python 3.x: Download and install Python from https://www.python.org/downloads/.
- Jupyter Notebook: Install Jupyter Notebook using `pip install jupyter` in your terminal.
- Pandas library: Install Pandas using `pip install pandas` in your terminal.
- Familiarity with MIM concepts: A basic understanding of MIMs and data spaces is recommended.
- jsonschema library: Install it using `pip install jsonschema`. Its `validate` function and `ValidationError` class, together with the standard-library `datetime` and `json` modules, are imported as shown in the libraries section.

## 3.2) Data Source

This tutorial utilizes mock data that simulates real-world sensor data. You can replace this with your actual data source during implementation. The mock data will have a similar structure to sensor readings, including:

- time_recorded: Timestamp when each observation was recorded, in "YYYY-MM-DD HH:MM:SS" format.
- sensor_identifier: Identifier/name of the sensor that recorded each observation.
- temp_celsius: Temperature recorded by each sensor in degrees Celsius.
- humidity_percent: Relative humidity recorded at each observation as a percentage.
- latitude: Latitude coordinates of the location where each observation was recorded.
- longitude: Longitude coordinates of the location where each observation was recorded.
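
Before transforming anything, it can help to check that incoming records really follow this structure. The sketch below is one possible pre-check using only the standard library. The function name `check_raw_record` and the chosen value ranges are assumptions, and the timestamp pattern matches the mock data used later in this tutorial.

```python
from datetime import datetime

EXPECTED_FIELDS = [
    "time_recorded", "sensor_identifier", "temp_celsius",
    "humidity_percent", "latitude", "longitude",
]


def check_raw_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record looks fine."""
    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS if name not in record]
    if problems:
        return problems  # skip value checks when fields are absent
    try:
        datetime.strptime(record["time_recorded"], "%Y-%m-%d %H:%M:%S")
    except ValueError:
        problems.append("time_recorded is not in YYYY-MM-DD HH:MM:SS format")
    if not 0 <= record["humidity_percent"] <= 100:
        problems.append("humidity_percent outside 0 to 100")
    if not -90 <= record["latitude"] <= 90:
        problems.append("latitude outside -90 to 90")
    if not -180 <= record["longitude"] <= 180:
        problems.append("longitude outside -180 to 180")
    return problems


good = {"time_recorded": "2024-05-30 10:00:00", "sensor_identifier": "sensor_1",
        "temp_celsius": 22.5, "humidity_percent": 55,
        "latitude": 51.452869, "longitude": 5.481549}
bad = dict(good, time_recorded="30/05/2024 10:00", humidity_percent=130)

print(check_raw_record(good))  # []
print(check_raw_record(bad))   # two problems: timestamp and humidity
```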

## 3.4) ETL Process

The example follows an Extract, Transform, Load (ETL) process. In a real scenario, you would first extract data from its source (databases, APIs, etc.). Here, we create mock data to demonstrate the process.

### 3.4.1) Libraries

In [1]:
import pandas as pd
import numpy as np
from IPython.display import Image
from datetime import datetime
import json
from jsonschema import validate, ValidationError


## 3.5) Extract

In this step, data is extracted from its source; how this is done depends on the use case and the source. For this tutorial, mock data is created directly, so no extraction is needed.

In [2]:


data = {
    "time_recorded": ["2024-05-30 10:00:00", "2024-05-30 11:00:00", "2024-05-30 12:00:00"],
    "sensor_identifier": ["sensor_1", "sensor_2", "sensor_3"],
    "temp_celsius": [22.5, 23.2, 21.8],
    "humidity_percent": [55, 60, 52],
    "latitude": [51.452869111964304, 51.452869111964304, 51.452869111964304],
    "longitude": [5.481549426062068, 5.481549426062068, 5.481549426062068]
}
df = pd.DataFrame(data)

df



|   | time_recorded | sensor_identifier | temp_celsius | humidity_percent | latitude | longitude |
|---|---|---|---|---|---|---|
| 0 | 2024-05-30 10:00:00 | sensor_1 | 22.5 | 55 | 51.452869 | 5.481549 |
| 1 | 2024-05-30 11:00:00 | sensor_2 | 23.2 | 60 | 51.452869 | 5.481549 |
| 2 | 2024-05-30 12:00:00 | sensor_3 | 21.8 | 52 | 51.452869 | 5.481549 |
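
With a real source, the extract step would read the data instead of typing it in. As a sketch only, the following reads sensor readings from CSV text with the standard library. The CSV content is made up to mirror the mock data, `io.StringIO` stands in for a file or an API response, and `extract` is a hypothetical helper name.

```python
import csv
import io

# Stand-in for a file or HTTP response body (illustrative data only).
raw_csv = """time_recorded,sensor_identifier,temp_celsius,humidity_percent,latitude,longitude
2024-05-30 10:00:00,sensor_1,22.5,55,51.452869,5.481549
2024-05-30 11:00:00,sensor_2,23.2,60,51.452869,5.481549
"""

NUMERIC_FIELDS = ("temp_celsius", "humidity_percent", "latitude", "longitude")


def extract(source) -> list:
    """Read rows from a file-like CSV source and convert numeric columns to float."""
    rows = []
    for row in csv.DictReader(source):
        for field in NUMERIC_FIELDS:
            row[field] = float(row[field])
        rows.append(row)
    return rows


records = extract(io.StringIO(raw_csv))
print(len(records), records[0]["temp_celsius"])
```

When working with pandas, `pd.read_csv` would return a DataFrame directly, which fits the rest of this notebook more naturally.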


## 3.6) Data Model Selection

- Choose a standardized data model for representing your data within MIM2. This tutorial uses the Smart Data Models by FIWARE as an example. Select the model(s) that best align with your data content and adhere to MIM2 specifications.

- For this example, we follow the WeatherObserved Smart Data Model and transform the data to match its requirements. The specification linked below lists each attribute and the types that can be used:

https://github.com/smart-data-models/dataModel.Weather/blob/master/WeatherObserved/doc/spec.md

## 3.7) Transform

- Transform: This stage involves cleaning, formatting, and manipulating the data to conform to the chosen data model structure. In this example, the data is already relatively clean. However, you might need to handle missing values, convert data types, or create new features depending on your specific data source.

- Map the transformed data elements to the corresponding entities and attributes defined in the chosen data model. Here's an example mapping for our mock data:

In [3]:
# Rename columns to match WeatherObserved model
df_transformed = df.rename(columns={
    "time_recorded": "dateObserved",
    "sensor_identifier": "id",
    "temp_celsius": "temperature",
    "humidity_percent": "relativeHumidity"
})

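# Convert timestamps to ISO 8601; the 'Z' suffix assumes the source times are in UTC
df_transformed["dateObserved"] = pd.to_datetime(df_transformed["dateObserved"]).dt.strftime('%Y-%m-%dT%H:%M:%S') + 'Z'
# Build a GeoJSON Point per row; GeoJSON coordinate order is [longitude, latitude]
df_transformed["location"] = df_transformed.apply(lambda row: {"type": "Point", "coordinates": [row["longitude"], row["latitude"]]}, axis=1)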

# Drop unnecessary columns
df_transformed = df_transformed.drop(columns=["latitude", "longitude"])

# Add required static fields
df_transformed["type"] = "WeatherObserved"
df_transformed["dataProvider"] = "ExampleProvider"

df_transformed



|   | dateObserved | id | temperature | relativeHumidity | location | type | dataProvider |
|---|---|---|---|---|---|---|---|
| 0 | 2024-05-30T10:00:00Z | sensor_1 | 22.5 | 55 | {'type': 'Point', 'coordinates': [5.4815494260... | WeatherObserved | ExampleProvider |
| 1 | 2024-05-30T11:00:00Z | sensor_2 | 23.2 | 60 | {'type': 'Point', 'coordinates': [5.4815494260... | WeatherObserved | ExampleProvider |
| 2 | 2024-05-30T12:00:00Z | sensor_3 | 21.8 | 52 | {'type': 'Point', 'coordinates': [5.4815494260... | WeatherObserved | ExampleProvider |
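
The mock data needs no cleaning, but real readings often do. The sketch below shows one way to apply the same mapping record by record while skipping incomplete readings. It uses plain Python rather than pandas. The function name `transform_record`, the decision to drop incomplete rows instead of imputing values, and the assumption that source timestamps are already in UTC are all illustrative choices.

```python
from datetime import datetime
from typing import Optional

REQUIRED = ("time_recorded", "sensor_identifier", "temp_celsius",
            "humidity_percent", "latitude", "longitude")


def transform_record(raw: dict) -> Optional[dict]:
    """Map one raw reading to the WeatherObserved-style shape, or None if incomplete."""
    if any(raw.get(name) is None for name in REQUIRED):
        return None  # assumption: incomplete readings are dropped, not imputed
    observed = datetime.strptime(raw["time_recorded"], "%Y-%m-%d %H:%M:%S")
    return {
        "id": raw["sensor_identifier"],
        "type": "WeatherObserved",
        "dateObserved": observed.strftime("%Y-%m-%dT%H:%M:%SZ"),  # assumes UTC input
        "temperature": float(raw["temp_celsius"]),
        "relativeHumidity": int(raw["humidity_percent"]),
        # GeoJSON coordinate order is [longitude, latitude]
        "location": {"type": "Point",
                     "coordinates": [float(raw["longitude"]), float(raw["latitude"])]},
        "dataProvider": "ExampleProvider",
    }


readings = [
    {"time_recorded": "2024-05-30 10:00:00", "sensor_identifier": "sensor_1",
     "temp_celsius": 22.5, "humidity_percent": 55,
     "latitude": 51.452869, "longitude": 5.481549},
    {"time_recorded": "2024-05-30 11:00:00", "sensor_identifier": "sensor_2",
     "temp_celsius": None, "humidity_percent": 60,  # missing temperature
     "latitude": 51.452869, "longitude": 5.481549},
]
clean = [r for r in map(transform_record, readings) if r is not None]
print(len(clean), clean[0]["dateObserved"])
```

It is also worth checking the linked specification for the units and value ranges each attribute expects, since a source value may need rescaling as well as renaming.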


## 3.8) Load

- Load: The transformed data is loaded into a suitable format for further processing. In MIM2 creation, this might involve storing the data in a format compatible with your data space platform.

- The next step is to load the data. In this example, we create a CSV and a JSON output. This data can then be published to a Data Space, meeting the requirements of the Smart Data Model by FIWARE.

In [4]:
# Export DataFrame to CSV file
df_transformed.to_csv('weather_observation_data.csv', index=False)

print("CSV file exported successfully.")

CSV file exported successfully.


In [5]:
# Export DataFrame to JSON file
df_transformed.to_json('weather_observation_data.json', orient='records')

print("JSON file exported successfully.")


JSON file exported successfully.
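
One detail worth checking after export is the nested `location` value: JSON keeps it as an object, whereas a CSV cell can only hold text. The sketch below uses only the standard library and in-memory buffers instead of files. It shows one possible approach, JSON-encoding the nested value before writing CSV so it can be parsed back reliably. The record is illustrative, and the approach is a suggestion rather than something the Smart Data Model requires.

```python
import csv
import io
import json

record = {
    "id": "sensor_1",
    "temperature": 22.5,
    "location": {"type": "Point", "coordinates": [5.481549, 51.452869]},
}

# JSON round trip: the nested object survives unchanged.
assert json.loads(json.dumps(record)) == record

# CSV round trip: encode the nested value as a JSON string first.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record))
writer.writeheader()
writer.writerow({**record, "location": json.dumps(record["location"])})

# Read the row back; every CSV value comes back as text.
buffer.seek(0)
row = next(csv.DictReader(buffer))
restored_location = json.loads(row["location"])
print(restored_location == record["location"])  # True
```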


## 3.9) JSON Validation Test

The next step is to create a JSON schema and test the input data against it, as shown below.

JSON Schema of data model:

In [6]:
schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Weather Observation Data",
    "type": "object",
    "properties": {
        "dateObserved": {
            "type": "string",
            "format": "date-time"
        },
        "id": {
            "type": "string"
        },
        "temperature": {
            "type": "number"
        },
        "relativeHumidity": {
            "type": "integer"
        },
        "location": {
            "type": "object",
            "properties": {
                "type": {
                    "type": "string"
                },
                "coordinates": {
                    "type": "array",
                    "items": {
                        "type": "number"
                    }
                }
            },
            "required": ["type", "coordinates"]
        },
        "type": {
            "type": "string"
        },
        "dataProvider": {
            "type": "string"
        }
    },
    "required": ["dateObserved", "id", "temperature", "relativeHumidity", "location", "type", "dataProvider"]
}



JSON input of the data:

In [7]:
input_json = {
    "dateObserved": "2024-05-30T10:00:00Z",
    "id": "sensor_1",
    "temperature": 22.5,
    "relativeHumidity": 55,
    "location": {
        "type": "Point",
        "coordinates": [5.481549426062068, 51.452869111964304]
    },
    "type": "WeatherObserved",
    "dataProvider": "ExampleProvider"
}



Validate the input JSON against the schema

In [8]:

try:
    validate(instance=input_json, schema=schema)
    print("Input JSON is valid.")
except ValidationError as e:
    print(f"Input JSON is invalid: {e.message}")

Input JSON is valid.


As you can see, the validation test has passed and the input matches the schema.
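
The test above checks a single hand-written record. Two things may be worth adding in practice: checking every exported record, and checking the timestamp format explicitly, since the `jsonschema` library treats `format` as an annotation unless a format checker is supplied. The sketch below is a lightweight, standard-library complement to the schema validation, not a replacement for it. `find_problems` and `is_utc_timestamp` are hypothetical helpers, and the required field list is copied from the schema above.

```python
from datetime import datetime

REQUIRED = ["dateObserved", "id", "temperature", "relativeHumidity",
            "location", "type", "dataProvider"]


def is_utc_timestamp(value) -> bool:
    """True if value looks like 2024-05-30T10:00:00Z."""
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        return True
    except (TypeError, ValueError):
        return False


def find_problems(records: list) -> dict:
    """Map record index to a list of problems; clean records are omitted."""
    report = {}
    for index, record in enumerate(records):
        problems = [f"missing {name}" for name in REQUIRED if name not in record]
        if "dateObserved" in record and not is_utc_timestamp(record["dateObserved"]):
            problems.append("dateObserved is not an ISO 8601 UTC timestamp")
        if problems:
            report[index] = problems
    return report


# Illustrative records: the first is complete, the second is not.
records = [
    {"dateObserved": "2024-05-30T10:00:00Z", "id": "sensor_1", "temperature": 22.5,
     "relativeHumidity": 55, "location": {"type": "Point", "coordinates": [5.48, 51.45]},
     "type": "WeatherObserved", "dataProvider": "ExampleProvider"},
    {"dateObserved": "30 May 2024", "id": "sensor_2", "temperature": 23.2},
]
print(find_problems(records))
```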

## 3.10) Conclusion



This tutorial has guided you through the process of creating a Minimal Interoperability Mechanism (MIM2) for data spaces, focusing on the development of shared data models using mock sensor data. The key steps covered include:

- Data Understanding and Preparation: We explored the importance of standardized data models and their role in enabling interoperability across different systems and applications within data spaces.

**Implementation Steps:**

- We started with mock data resembling real-world sensor readings, including temperature, humidity, and spatial coordinates.
- Through the Extract, Transform, Load (ETL) process, we cleaned and formatted the data to conform to the WeatherObserved Smart Data Model.
- The transformed data was then exported to both CSV and JSON formats, suitable for publication in a data space.
- We validated the JSON data against a predefined schema to ensure compliance with the WeatherObserved data model.

**Next Steps:**

- For future projects, consider integrating real data sources and adapting the tutorial steps to more complex MIM2 requirements.
- Explore additional Smart Data Models and adapt them to specific domain needs.
- Further investigate the integration of data models with platforms supporting NGSI-LD and other interoperable frameworks.

This tutorial provides a foundational understanding of MIM2 development, demonstrating practical steps for transforming and validating data using standardized data models. It encourages further exploration into data interoperability and smart city solutions.



# Bibliography/Sources

- FIWARE Smart Data Models (GitHub): https://github.com/smart-data-models
- Smart Data Models website: https://smartdatamodels.org/
- MIMs Plus Report: https://living-in.eu/group/7/commitments/mims-plus-version-5-final
