# Tutorial: Basics

This tutorial covers the basics and prerequisites required to use the osw-python library and to 
interact with an [Open Semantic Lab (OSL)](https://github.com/OpenSemanticLab) instance, such as the [OpenSemanticWorld 
Registry](https://opensemantic.world/). To work through this tutorial interactively, jump to [Downloading the library](#Downloading-the-library-optional) and open this notebook in a Jupyter environment.

- [OSL data model](#OSL-data-model)
- [Downloading the library (optional)](#Downloading-the-library-optional)
- [Installation](#Installation)
- [Connecting to an OSL instance](#Connecting-to-an-OSL-instance)
- [Downloading data model dependencies](#Downloading-data-model-dependencies)
- [Interact with an entity](#Interact-with-an-entity) 
- [Interact with files](#Interact-with-files)
- [Interface data sources](#Interface-data-sources)

## OSL data model

Open Semantic Lab provides an [extension](https://github.com/OpenSemanticLab/mediawiki-extensions-OpenSemanticLab) 
for Semantic Mediawiki, delivering a machine-readable data structure based on industry standards such as JSON, JSON-LD 
and JSON-Schema. It allows importing, referencing and interfacing existing (OWL, RDF) ontologies and aims to facilitate the 
implementation of [FAIR Data principles](https://www.go-fair.org/fair-principles/) out-of-the-box.

<figure>
    <a href="https://opensemantic.world/wiki/File:OSW95a74be1e22d4b6e9e4f836127d5915a.drawio.svg">
    <img src="./img/osw_intro_technology_stack.png" 
        width="400" 
        height="200"
        alt="Components of the OSL extension for Semantic Mediawiki"></a>
</figure>

JSON serves as the central data storage element for structured data, including the definition of classes and forms
 via JSON-Schema, linking JSON-Data to ontologies and building property graphs.

### Namespaces

As we are using Semantic Mediawiki, the data is stored in pages, which are organized in namespaces. Full page titles 
follow this structure: `<namespace>:<page_title>`. Although a `<page_title>` may itself contain `:`, this is rare. The 
most important namespaces in OSL and the entries stored in them are:

- Category - Classes (instances of MetaClasses) and MetaClasses
- Item - Instances of classes
- Property - Semantic properties and reusable property schemas
- JsonSchema - Reusable JSON-Schema definitions
- Template - Templates for rendering pages or performing queries
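The title structure described above can be taken apart with plain string handling. The following is a minimal sketch, not part of osw-python: the helper name `split_full_title` is hypothetical, and it assumes that everything before the first `:` is the namespace and that a title without `:` belongs to the main namespace.

```python
from typing import Tuple


def split_full_title(full_title: str) -> Tuple[str, str]:
    """Split '<namespace>:<page_title>' at the first colon only."""
    namespace, separator, page_title = full_title.partition(":")
    if not separator:
        # No colon at all: treat the whole string as a main-namespace title
        return "", full_title
    return namespace, page_title


print(split_full_title("Category:OSW1969007d5acf40539642877659a02c23"))
print(split_full_title("Item:Title:with:colons"))
```

Splitting only at the first colon keeps any later `:` inside the page title intact.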

### Slots

The data stored on a page in Semantic Mediawiki can be stored as plain text (main slot, content model: wikitext) or in
an arbitrary format in dedicated slots. OSL uses nine slots, built around the JSON format and tailored to the needs of 
data scientists. The most important slots are `jsondata` and `jsonschema`, which store the data and the schema:

| Slot name       | Content model | Description                                                                                                         |
|-----------------|---------------|---------------------------------------------------------------------------------------------------------------------|
| main            | wikitext      | Default content slot, rendered between the header and footer of the page                                            |
| jsondata        | JSON          | Structured data, (partially) used to render the infobox on the page                                                 |
| jsonschema      | JSON          | Stored within a category (=class) page, defining the schema for the jsondata slot of any category member (instance) |
| header          | wikitext      | Content to be placed at the top of the page, below the heading                                                      |
| footer          | wikitext      | Content to be placed at the bottom of the page, above the (Semantic Mediawiki) built-in elements                    |
| header_template | wikitext      | Stored within a category (=class) page, renders the page header of any category member (instance)                   |
| footer_template | wikitext      | Stored within a category (=class) page, renders the page footer of any category member (instance)                   |
    
This data structure can be used to generate Python data classes, which can be used to interact with the data in a type-safe manner. The osw-python library includes a [code generator](https://github.com/koxudaxi/datamodel-code-generator/) to generate Python data classes from the JSON schema. 

At the same time, this data structure can be used to auto-generate form editors, create property graphs, and provide 
data and interfaces for applications, such as Machine Learning and data processing.
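Each slot can be addressed individually. The schema example later in this tutorial references a slot through a URL of the form `/wiki/<full_page_title>?action=raw&slot=<slot_name>`. The sketch below builds such a URL with the standard library. The helper name `slot_raw_url` is hypothetical, and whether a given instance serves this URL without authentication is an assumption.

```python
from urllib.parse import quote, urlencode


def slot_raw_url(domain: str, full_title: str, slot: str) -> str:
    """Build the raw-content URL of one slot of a wiki page."""
    query = urlencode({"action": "raw", "slot": slot})
    # Keep ':' and '/' readable, percent-encode everything else that needs it
    return f"https://{domain}/wiki/{quote(full_title, safe=':/')}?{query}"


print(slot_raw_url("wiki-dev.open-semantic-lab.org", "Category:MyCategory", "jsonschema"))
```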

### Data Classes / Class Hierarchy

Everything is considered an 'Entity', which is analogous to the 'object' in Python. 'Classes' are subclasses and 
instances of 'Entity' or specific 'MetaClasses'. 'MetaClasses' define a JSON schema used to validate the structured 
data stored in the jsondata slot of 'Classes', just as 'Classes' do for individual 'Instances' or 'Items'.

<figure>
    <a href="https://opensemantic.world/wiki/File:OSW96280227805c4e4a8fcf615359b01672.drawio.svg">
    <img src="./img/osw_intro_data_model.png" 
        width="600" 
        height="300"
        alt="OSL data model"></a>
</figure>

### JSON / JSON-Schema

The JSON schema stored in the `jsonschema` slot of a Category (=class) defines the structure of the data stored in 
the `jsondata` slot of members of this category (=items). The JSON schema is a JSON object that defines the 
properties and their types, constraints, and relationships. The JSON schema can be generated from the data stored 
in the `jsondata` slot of the category (=class) or can be created manually. We are using the 
[JSON-Schema](https://json-schema.org/) standard to define the schema. 

Because the schema ensures the consistency of the data, JSON schemas and JSON data can be used to generate Python data 
classes and instances, which can in turn be used as parameter objects for functions and methods. The generated classes are based on Pydantic models, which provide validation and serialization capabilities.

#### JSON-Schema to Python Data Classes

**Category:MyCategory `jsonschema` slot:**
```json
{
    "type": "object",
    "properties": {
        "text": { "type": "string" },
        "number": { "type": "number" },
        "array": { "type": "array" }
    }
}
```
**Category:MySubCategory `jsonschema` slot:**
```json
{
    "type": "object",
    "allOf": [{ "$ref": "/wiki/Category:MyCategory?action=raw&slot=jsonschema" }],
    "properties": {
        "additional_property": { "type": "string" }
    }
}
```
**Generated Python data classes:**
```python
from typing import Any, List

from osw.model.entity import Entity

class MyClass(Entity):
    text: str
    number: float
    array: List[Any]
    
class MySubClass(MyClass):
    additional_property: str
```
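To illustrate why `MySubClass` ends up with all four fields, the following standard-library sketch resolves the inheritance chain by hand. It is not the code generator used by osw-python. It assumes the standard JSON-Schema form in which `allOf` is a list of `{"$ref": ...}` objects, and it uses a local dictionary (`SCHEMAS`, hypothetical) in place of fetching the schemas from a wiki.

```python
from typing import Any, Dict

PARENT_REF = "/wiki/Category:MyCategory?action=raw&slot=jsonschema"
CHILD_REF = "/wiki/Category:MySubCategory?action=raw&slot=jsonschema"

# Local stand-in for the jsonschema slots of the two category pages
SCHEMAS: Dict[str, Dict[str, Any]] = {
    PARENT_REF: {
        "type": "object",
        "properties": {
            "text": {"type": "string"},
            "number": {"type": "number"},
            "array": {"type": "array"},
        },
    },
    CHILD_REF: {
        "type": "object",
        "allOf": [{"$ref": PARENT_REF}],
        "properties": {"additional_property": {"type": "string"}},
    },
}


def collect_properties(ref: str, schemas: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Merge the properties of a schema with those of all schemas it extends."""
    schema = schemas[ref]
    properties: Dict[str, Any] = {}
    for parent in schema.get("allOf", []):
        properties.update(collect_properties(parent["$ref"], schemas))
    # Properties defined on the schema itself take precedence over inherited ones
    properties.update(schema.get("properties", {}))
    return properties


print(sorted(collect_properties(CHILD_REF, SCHEMAS)))
```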

#### Python instance to JSON data

```python
from osw.express import OswExpress

osw_obj = OswExpress(domain="wiki-dev.open-semantic-lab.org")

my_instance = MySubClass(
    text="some text",
    number=1.1,
    array=[1, "two", 3.0],
    additional_property="test2",
)
my_instance.json()
my_instance = osw_obj.store_entity(my_instance)  # wiki upload
```

### Object Oriented Linked Data (OO-LD)

The [JSON / JSON-Schema](#JSON-/-JSON-Schema) example above already showed how Object-Oriented 
Programming (OOP) concepts are integrated into JSON and JSON Schema. Adding the linked data component of [JSON-LD](https://json-ld.org/) 
enables the reusable annotation of datasets with well-established vocabularies (ontologies), such as 
[schema.org](https://schema.org/). Annotations have to be made at the Category (=class) level only and are available on export of 
instances. This makes the datasets machine-readable and allows for the integration of the data into the 
[Semantic Web](https://en.wikipedia.org/wiki/Semantic_Web) and the creation of property graphs. 

#### A minimal example:
```json
{
  "@context": {
    "schema": "https://schema.org/",
    "name": "schema:name"
  },
  "title": "Person",
  "type": "object",
  "properties": {
    "name": {
      "type": "string",
      "description": "First and Last name"
    }
  }
}
```
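What the `@context` achieves can be made concrete by expanding a term by hand. The sketch below handles only the two-step lookup used in this minimal example (term to compact IRI, compact IRI to full IRI). It is a simplification for illustration, not a JSON-LD processor, and the helper name `expand_term` is hypothetical.

```python
from typing import Dict

context = {
    "schema": "https://schema.org/",
    "name": "schema:name",
}


def expand_term(term: str, ctx: Dict[str, str]) -> str:
    """Expand a term such as 'name' to the full IRI it is mapped to."""
    value = ctx.get(term, term)
    prefix, separator, suffix = value.partition(":")
    # 'schema:name' is a compact IRI if 'schema' is itself defined in the context
    if separator and prefix in ctx and not suffix.startswith("//"):
        return ctx[prefix] + suffix
    return value


print(expand_term("name", context))
```

With this mapping, the `name` property of a Person instance is understood as `https://schema.org/name` by any consumer that knows the schema.org vocabulary.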

### Further reading

- [OSW Introduction](https://opensemantic.world/wiki/Item:OSWdb485a954a88465287b341d2897a84d6)
- [OSW Python Package](https://opensemantic.world/wiki/Item:OSW659a81662ff44af1b2b6febeee7c3a25)
- [JSON Tutorial](https://opensemantic.world/wiki/Item:OSWf1df064239044b8fa3c968339fb93344)
- [JSON-Schema Tutorial](https://opensemantic.world/wiki/Item:OSWf4a9514baed04859a4c6c374a7312f10)
- [JSON-LD Tutorial](https://opensemantic.world/wiki/Item:OSW911488771ea449a6a34051f8213d7f2f)
- [OO-LD Tutorial](https://opensemantic.world/wiki/Item:OSWee501c0fa6a9407d99c058b5ff9d55b4)

## Downloading the library (optional)

The osw-python library is available as a GitHub repository and can be downloaded as a ZIP file or via git:

```bash
git clone https://github.com/OpenSemanticLab/osw-python.git <target_directory>
```

## Installation

### From PyPI

For most users, the recommended way is to install the library from the Python Package Index (PyPI) via pip:

```bash
conda activate <your_environment>  # optional
pip install osw-python
```

### From source

If you want to install the library from source, you can clone the repository and install it via pip. The option `-e`
 installs the package in editable mode, which means that the source code is linked to the installed package. This is 
 useful for development and testing.

```bash
git clone https://github.com/OpenSemanticLab/osw-python.git <target_directory>
cd <target_directory>
conda activate <your_environment>  # optional
pip install -e .  # omit -e for a regular (non-editable) install
```

## Connecting to an OSL instance

To connect to an OSL instance, you need to provide your login credentials. You can either provide your username and 
password directly or create a bot password. A bot password is preferred because its edit rights can be restricted and, 
at the same time, edits made programmatically remain traceable, as they are marked as bot edits.

### Creating a bot password

- Log in to your OSL instance
- Navigate to **Special:BotPasswords** via **Toggle menu → Special pages → Bot passwords**,
    e.g., `https://<wiki_domain>/wiki/Special:BotPasswords`
- You must log in again to verify your identity
- Create a new bot password by providing a `Bot name`, e.g., 'PythonBot', and clicking **Create**
- Save the `Username` and `Bot password` in a safe place, as the password will not be displayed again

### (Optional) Creating a credentials file

You can create a YAML file, e.g., 'credentials.pwd.yaml', with your login credentials, which can be used to connect to the OSL instance. The file must follow the structure below:

```yaml
 <wiki_domain>:
     username: <wiki_username>
     password: <wiki_password>
```
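If you prefer to create this file from a script, the structure above can be written with the standard library alone. This is a minimal sketch under stated assumptions: the helper `write_credentials_file` is hypothetical and not part of osw-python, and the username and password are placeholders. The values are written as double-quoted strings so that special characters cannot break the YAML structure.

```python
import json
import tempfile
from pathlib import Path


def write_credentials_file(path: Path, domain: str, username: str, password: str) -> str:
    """Write a single-domain credentials file in the structure shown above."""
    # json.dumps yields double-quoted strings, which are also valid YAML scalars
    content = (
        f"{domain}:\n"
        f"    username: {json.dumps(username)}\n"
        f"    password: {json.dumps(password)}\n"
    )
    path.write_text(content, encoding="utf-8")
    return content


# Written to a temporary directory here, so no real credentials file is touched
with tempfile.TemporaryDirectory() as tmp_dir:
    cred_path = Path(tmp_dir) / "credentials.pwd.yaml"
    written = write_credentials_file(
        cred_path, "wiki-dev.open-semantic-lab.org", "MyUser@PythonBot", "not-a-real-password"
    )
    print(written)
```

Keep such a file out of version control, since it contains the password in plain text.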

### Connecting via osw-python

It is recommended to use the `osw.express.OswExpress` class to connect to an OSL instance. The class provides a 
number of convenience functions on top of the underlying `osw.core.OSW`. 

On the first execution of the following cell you will be prompted to enter domain, username and password. The 
credentials will be stored in a file named **credentials.pwd.yaml** in a subfolder **osw_files** of the current working 
directory. In the current working directory, a **.gitignore** file will be created or updated to include the 
credentials file. 

This step is required to download all dependencies (data models) of OswExpress from the OSL instance.

In [None]:
from osw.express import OSW, OswExpress

In [None]:
# Define the wiki_domain for later reuse:
wiki_domain = "wiki-dev.open-semantic-lab.org"  # Replace with the domain of your OSL instance

#### Option 1: Reuse the credentials file created in the previous step

If you are still running in the same current working directory (CWD), OswExpress will automatically find the credentials file.

Otherwise, you will be prompted to enter your username and password.

In [None]:
osw_obj = OswExpress(domain=wiki_domain)  

#### Option 2: Provide a credentials file (path)

If the file does not exist or the domain is not in the file, you will be prompted to enter your username and password.
Unknown domains will be appended to the file.

In [None]:
osw_obj = OswExpress(domain=wiki_domain, cred_filepath="credentials.pwd.yaml")

## Downloading data model dependencies

Loading entities from OSL fetches the required data models by default. So if you just want to load, modify and upload one
(type of) entity, you can skip to [Downloading an entity](#Downloading-an-entity).

Before we can upload entities or files, we need to download the required data models. The data models are stored in the 
`jsonschema` slot of the respective categories (=classes) and are used to generate Python data classes. OswExpress 
offers a convenience function to download all dependencies of the category that an item is an instance of. 

> [!NOTE]
> It is important to execute this notebook in the same environment in which the data models are installed!

### Identify required data models

All categories (=classes) are subcategories of the **Entity** category. The classes **Entity** and **Item**, as well as the 
concepts required to provide typing for those classes, are provided out-of-the-box within `osw.model.entity`, which imports 
**OswBaseModel(pydantic.BaseModel)** from `osw.model.static`.

To store structured information in an OSL instance, you need to find a fitting **Category** in which to create pages 
(in the **Item** or **Category** namespace). To explore the data model hierarchy, you can use the graph tool 
provided under **Graph** on every page in the `Category` or `Item` namespace, following the `SubClassOf` property. 

A good alternative is to consult the **Category tree** page and navigate through the collapsible tree. The page can 
be found under 
`https://<wiki_domain>/wiki/Special:CategoryTree?target=Category%3AEntity&mode=categories&namespaces=`. 

Save the `Machine compatible name` and `Full page title` of the category you want to work with in a dictionary. Note 
that only the category farthest down a branch, relative to the root category **Entity**, is required. All 
categories above it in the branch will be downloaded automatically.

**Example category tree**:
```
Entity
├── Property
├── Statement
└── Item
    ├── Person
    |   └── User
    ├── Location
    |   ├── Site
    |   ├── Building
    |   ├── Floor
    |   └── Room
    ├── CreativeWork
    |   └── Article
    |       └── Tutorial
    └── OrganizationalUnit
        └── Organization
```
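Which categories are pulled in for a given leaf category can be reasoned about by walking up the tree. The sketch below encodes the example tree as a child-to-parent mapping and is purely illustrative. The names `PARENT`, `PROVIDED` and `required_categories` are hypothetical, and the sketch assumes that **Entity** and **Item** never need to be fetched because they ship with `osw.model.entity`.

```python
from typing import Dict, List

# Child -> parent, transcribed from the example category tree above
PARENT: Dict[str, str] = {
    "Property": "Entity", "Statement": "Entity", "Item": "Entity",
    "Person": "Item", "User": "Person",
    "Location": "Item", "Site": "Location", "Building": "Location",
    "Floor": "Location", "Room": "Location",
    "CreativeWork": "Item", "Article": "CreativeWork", "Tutorial": "Article",
    "OrganizationalUnit": "Item", "Organization": "OrganizationalUnit",
}
PROVIDED = {"Entity", "Item"}  # Assumed to be available without download


def required_categories(name: str) -> List[str]:
    """Return the category and all of its ancestors that still need fetching."""
    chain = []
    while name not in PROVIDED:
        chain.append(name)
        name = PARENT[name]
    return chain


print(required_categories("Room"))
print(required_categories("Tutorial"))
```

Sibling categories such as Site, Building and Floor are not on the path from Room to the root and are therefore not included.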

> [!NOTE]
> If you find no category, ask your administrator to install page packages via the special page 'Special:Packages'. 
> Page packages are maintained via [GitHub](https://github.com/OpenSemanticWorld-Packages/osw-package-maintenance)

In [None]:
dependencies = {
    "Organization": "Category:OSW1969007d5acf40539642877659a02c23",  # Will fetch: Organization, OrganizationalUnit
    "Person":       "Category:OSWd9aa0bca9b0040d8af6f5c091bf9eec7",  # Will fetch: Person
    "Room":         "Category:OSWc5ed0ed1e33c4b31887c67af25a610c1",  # Will fetch: Room, Location, but not: Site, Building, Floor
    "Tutorial":     "Category:OSW494f660e6a714a1a9681c517bbb975da",  # Will fetch: Tutorial, Article, CreativeWork
}

> [!NOTE]
> The keys in this dictionary will eventually be used in the import statements and should therefore match the 
> auto-generated class names, which are the same as the category's `Machine compatible name`!

### Install data models

Data models (data classes generated in osw.model.entity) cannot be imported in Python scripts and modules before they 
are installed. Therefore, it is recommended to perform this step either in a separate script that is run before the main
script, or in the main script itself, before the import statements.

#### Option 1: Install dependencies before import from osw.model.entity 

It is recommended to put this option in a separate script, which is run before the main script.

In [None]:
# Will run every time the script is executed:
osw_obj.install_dependencies(dependencies)


# Static code checker will note 'Module not found' before the installation:
from osw.model.entity import Organization, Person, Room, Tutorial

#### Option 2: Use the OswExpress convenience function for imports

It is recommended to put this option in the main script, before the first `from osw.model.entity import` statement.

In [None]:
from typing import TYPE_CHECKING
from osw.express import import_with_fallback


# Will fetch and install dependencies only if not already installed:
import_with_fallback(dependencies)


# Otherwise static code checker will note 'Module not found' before the installation:
if TYPE_CHECKING:
    from osw.model.entity import Organization, Person, Room, Tutorial
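The general idea behind "import, and install only if the import fails" can be sketched with the standard library. The following is not the implementation of `import_with_fallback`. The function `import_or_install` and its `install` callback are hypothetical and only illustrate the pattern.

```python
import importlib
import sys
from typing import Any, Callable, Dict, Iterable


def import_or_install(
    module_name: str, names: Iterable[str], install: Callable[[], None]
) -> Dict[str, Any]:
    """Import `names` from `module_name`; call `install()` once if that fails."""
    names = list(names)
    for attempt in range(2):
        try:
            module = importlib.import_module(module_name)
            return {name: getattr(module, name) for name in names}
        except (ImportError, AttributeError):
            if attempt == 1:
                raise  # Still missing after the installation step: give up
            # Drop any stale module object so the retry sees the new content
            sys.modules.pop(module_name, None)
            install()
            importlib.invalidate_caches()


# Nothing is missing here, so the installer is never called
loaded = import_or_install("json", ["loads"], install=lambda: None)
print(loaded["loads"]('{"ok": true}'))
```

The installation step, which may require network access, is thereby skipped whenever the requested names are already importable.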

### Interact with an entity

Data classes created by the code generator are based on Pydantic models, which provide validation and serialization.

#### Creating an entity

To create an entity, you need to create an instance of the respective data class. The `__init__` method of the data 
class expects keyword arguments for all fields. As per usual for Pydantic models, positional arguments are not 
permitted and the input data is validated during initialization.

In [None]:
# Create a person
john = Person(
    first_name="John",
    last_name="Doe",
    email="john.doe@example.com"
)
# Should raise a ValidationError listing two errors:
# - surname: field required
# - email: value is not a valid set

Let's break down what happened here:

- During initialization, the Pydantic model validates the input data. The validation errors are raised as exceptions.
- The `surname` field is required, but it was not provided.
- The extra field `last_name` was provided, but it was not expected. By default, Pydantic models disregard extra 
fields without warning.
- The `email` field is expected to be a list of strings, but a string was provided. 

In [None]:
# Should run without validation errors
john = Person(
    first_name="John",
    surname="Doe",
    email=["john.doe@example.com"],
)

Before storing the entity in the OSL instance, let's check under which full page title it will be stored. 
The full page title is derived from the `uuid` and the `namespace` of the entity.

In [None]:
from osw.utils.wiki import get_namespace, get_osw_id, get_full_title

print("Namespace:", get_namespace(john))
print("UUID:", john.uuid)
print("OSW-ID:", get_osw_id(john.uuid))
print("Full title:", get_full_title(john))
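The page titles used throughout this tutorial (e.g., `Category:OSW1969007d5acf40539642877659a02c23`) suggest how these pieces fit together: the OSW-ID looks like the prefix `OSW` followed by the 32 hexadecimal digits of the UUID. The sketch below reproduces that pattern with the standard library. It is an assumption derived from the titles shown in this document, not the code of `osw.utils.wiki`, and the helper names are hypothetical.

```python
import uuid


def osw_id_from_uuid(entity_uuid: uuid.UUID) -> str:
    """'OSW' followed by the UUID without dashes (assumed naming pattern)."""
    return "OSW" + entity_uuid.hex


def full_title_from_uuid(namespace: str, entity_uuid: uuid.UUID) -> str:
    """Combine namespace and OSW-ID into '<namespace>:<page_title>'."""
    return f"{namespace}:{osw_id_from_uuid(entity_uuid)}"


example_uuid = uuid.UUID("1969007d-5acf-4053-9642-877659a02c23")
print(full_title_from_uuid("Category", example_uuid))
```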

#### Storing an entity

We can now store this entity in the OSL instance. The `store_entity` method uploads the entity to the OSL instance. 

In [None]:
osw_obj.store_entity(john)
# In this specific case equivalent to:
params = OswExpress.StoreEntityParam(
    entities=[john],
    namespace=get_namespace(john),
    parallel=False,
    overwrite="keep existing",
    overwrite_per_class=None,
)  # All default values included
# osw_obj.store_entity(params)

Like most methods and functions in the osw-python library, the `store_entity` method takes only a single argument. Usually, 
either a specific object type or a dedicated params object is accepted. If an object of a type other than the params 
object is passed, it is usually tested for compatibility and put inside a params object, with the other parameters 
filled with default values. 

This keeps the method signature as simple as possible while still allowing full typing and validation 
of the input parameters, so that the method is easy to use. 
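The single-argument pattern described above can be sketched with a standard-library dataclass. This is an illustration of the design only. `StoreParam` and `store` are hypothetical stand-ins and do not reproduce the validation performed by the actual `StoreEntityParam`.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class StoreParam:
    """Hypothetical params object: one required field, defaults for the rest."""
    entities: List[str]
    overwrite: str = "keep existing"


def store(param: Union[str, List[str], StoreParam]) -> StoreParam:
    """Accept a bare entity, a list of entities or a fully specified params object."""
    if not isinstance(param, StoreParam):
        entities = param if isinstance(param, list) else [param]
        # Wrap the bare input; all other parameters keep their default values
        param = StoreParam(entities=entities)
    return param


print(store("john"))
print(store(StoreParam(entities=["john"], overwrite="replace remote")))
```

Callers with simple needs pass the object itself, while callers who need control over the behavior pass the params object, and the function body only ever deals with one type.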

#### Downloading an entity

To download an entity, you need to provide the full page title of the entity. The `load_entity` method downloads the
entity from the OSL instance and returns an instance of the respective data class. If the respective data class is 
not already part of `osw.model.entity`, the data class is generated on-the-fly by default.

In [None]:
john2 = osw_obj.load_entity(get_full_title(john))

Let's have a look at the attributes of the downloaded entity:

In [None]:
from pprint import pprint
pprint(john2.dict())

Besides the attributes that we set (first_name, surname, email), the downloaded entity has additional attributes 
that are generated by default, either when the entity is initialized (uuid, meta.wiki_page.title) or when it is loaded 
(None-valued attributes). 

Loading an entity from the OSL instance downloads the full `jsondata` slot and passes it to the 
`__init__` method of the respective data class. Attributes not present in the `jsondata` slot are thereby set to the
default value defined in the data class.

#### Modifying an entity

To modify an entity, you can change the attributes of the entity instance. The attributes can be accessed and modified 
like any other attribute of a Python object.

In [None]:
# Adding a new attribute
john2.middle_name = "R."
# Changing an existing attribute 
john2.email = {"john.doe@gmx.de"}

# Checking the changes made:
pprint(john2.dict())

#### Storing an altered entity

The same applies here as for [Storing an entity](#Storing-an-entity), with one exception: existing entities are not 
overwritten with the default setting "keep existing". Therefore, you need to call the method `store_entity` with a 
`StoreEntityParam` whose attribute `overwrite` is set to `True` (or to another of the overwrite options listed below).


In [None]:
# Option 1: Overwrite all entities
osw_obj.store_entity(OSW.StoreEntityParam(entities=[john2], overwrite=True))

In [None]:
# Option 2: Overwrite only entities of type Person
osw_obj.store_entity(OSW.StoreEntityParam(
    entities=[john2], overwrite_per_class=[OSW.OverwriteClassParam(model=Person, overwrite=True)]))

In [None]:
# Option 3: Overwrite only the email attribute of entities of type Person
osw_obj.store_entity(OSW.StoreEntityParam(
    entities=[john2], overwrite_per_class=[
        OSW.OverwriteClassParam(model=Person, overwrite=False, per_property={"email": True})]))

Here all three options will have the same result, but in many cases the result will differ, especially if you have 
entities of different classes in the list of entities to store.

The param `overwrite` is applied to all entities handed to the method, regardless of type / class. It is also possible
to specify the overwrite behavior per class by providing a list of `OSW.OverwriteClassParam`s. Those can even be 
specific down to the property level. 
- Available options for `OSW.StoreEntityParam.overwrite`, `OSW.OverwriteClassParam.overwrite` (per class) and 
  `OSW.OverwriteClassParam.per_property` are: 
    - `OSW.OverwriteOptions.true`: True - overwrite the remote entity or property with the local one
    - `OSW.OverwriteOptions.false`: False - do not overwrite the remote entity or property with the local one
    - `OSW.OverwriteOptions.only_empty`: "only empty" - overwrite the remote entity or property with the local one, 
      if the remote entity or property is empty
- Only available to `OSW.StoreEntityParam.overwrite` and `OSW.OverwriteClassParam.overwrite` (per class) are:
    - `OSW.AddOverwriteClassOptions.replace_remote`: "replace remote" - replace the remote entity with the local one and
      remove all properties not present in the local entity
    - `OSW.AddOverwriteClassOptions.keep_existing`: "keep existing" - keep the remote entity, if one exists under this 
      OSW-ID
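To make the differences between these options tangible, the following sketch applies them to plain dictionaries. It is a simplified model of the semantics described above, not the merge logic of osw-python. The function `merge_entity` is hypothetical, and it additionally assumes that properties present only on the local side are always added and that `None` and empty containers count as "empty".

```python
from typing import Any, Dict, Optional, Union


def is_empty(value: Any) -> bool:
    """Treat None and zero-length strings/containers as empty."""
    return value is None or (hasattr(value, "__len__") and len(value) == 0)


def merge_entity(
    local: Dict[str, Any], remote: Optional[Dict[str, Any]], overwrite: Union[bool, str]
) -> Dict[str, Any]:
    """Return the data that would be stored for the given overwrite option."""
    if remote is None or overwrite == "replace remote":
        return dict(local)  # Nothing to preserve, or remote is replaced entirely
    if overwrite == "keep existing":
        return dict(remote)
    merged = dict(remote)
    for key, value in local.items():
        if (
            key not in merged
            or overwrite is True
            or (overwrite == "only empty" and is_empty(merged[key]))
        ):
            merged[key] = value
    return merged


remote = {"first_name": "John", "email": None, "note": "remote only"}
local = {"first_name": "Johnny", "email": ["john.doe@example.com"]}
for option in (True, False, "only empty", "replace remote", "keep existing"):
    print(repr(option), "->", merge_entity(local, remote, option))
```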

### Interact with files 

#### Download a file

Let's say you have already uploaded a file to the instance of OSL you are connected to and have the URL to the file 
available. (Execute the Upload a file section to upload a file to the OSL instance.) You can download the file with 
just two lines:


In [None]:
from osw.express import osw_download_file
local_file = osw_download_file(
    "https://wiki-dev.open-semantic-lab.org/wiki/File:OSWaa635a571dfb4aa682e43b98937f5dd3.pdf"
    # , use_cached=True  # Can be used to download the file only once, e.g., when developing code
    # , overwrite=True  # Can be used to overwrite an existing local file
)

The object `local_file` is an instance of `OswExpress.DownloadFileResult` and contains the path to the downloaded file, 
which is accessible via:

The class `OswExpress.DownloadFileResult` implements all dunder methods required for a context manager. Therefore, it
 can be used with the `with` statement to ensure the file is closed properly after use:

In [None]:
with osw_download_file(
        "https://wiki-dev.open-semantic-lab.org/wiki/File:OSWac9224e1a280449dba71d945b1581d57.txt", overwrite=True
) as file:
    print(file.read())

#### Round trip

Let's create a file, upload it to the OSL instance, download it and read and alter its content, before uploading it 
again.


In [None]:
from pathlib import Path
from osw.express import osw_upload_file, osw_download_file

# Create a file
fp = Path("example.txt")
with open(fp, "w") as file:
    file.write("Hello, World!")
# Upload a file to an OSW instance
wiki_file = osw_upload_file(fp, domain=wiki_domain)
# Delete the local file
fp.unlink()

# Download the file
local_file = osw_download_file(wiki_file.url, mode="r+")  # mode="r+" to read and write

with local_file as file:
    content = file.read()
    print("Original content:")
    print(content)
    content = content.replace("World", "OSW")
    print("\nModified content:")
    print(content)
    # Write the modified content back to the file
    file.write(content)

# Upload the modified file
modified_wiki_file = osw_upload_file(local_file)
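One detail of read-then-write access is worth keeping in mind. With Python's built-in file objects opened in `"r+"` mode, `read()` leaves the file position at the end, so a subsequent `write()` appends instead of replacing. The sketch below shows the usual remedy with a plain local file. Whether the object returned by `osw_download_file` exposes `seek` and `truncate` in the same way is not covered here and would be an assumption.

```python
import tempfile
from pathlib import Path


def replace_in_file(path: Path, old: str, new: str) -> str:
    """Replace text in a file in place and return the resulting content."""
    with open(path, "r+", encoding="utf-8") as file:
        content = file.read()  # The position is now at the end of the file
        file.seek(0)  # Go back to the start before writing
        file.write(content.replace(old, new))
        file.truncate()  # Cut off leftovers if the new content is shorter
    return path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp_dir:
    example = Path(tmp_dir) / "example.txt"
    example.write_text("Hello, World!", encoding="utf-8")
    result = replace_in_file(example, "World", "OSW")
    print(result)
```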

In [None]:
# Delete WikiFile from OSW instance after you are done with it
wiki_file.delete()

### Interface data sources

#### Tabular data: Excel, CSV and others

Let's create a demo table and save it to an Excel file.   

In [None]:
import pandas as pd

df = pd.DataFrame(
    {
        "FirstName": ["John", "Jane", "Alice"],
        "LastName": ["Doe", "Do", "Dont"],
        "Email": ["john.doe@example.com", "jane.do@example.com", "alice.dont@example.com"],
    }
)
df.to_excel("demo.xlsx", index=False)
del df

In [None]:
from pprint import pprint
# Let's read in our example Excel file
data_df = pd.read_excel("demo.xlsx")
# Pandas dict representation is optimal for converting to JSON
# Let's have a look at the first row
john_dict = data_df.iloc[0].to_dict()
pprint(john_dict)

In [None]:
# Let's convert the dict to a Person instance
from osw.model.entity import Person
john = Person(**john_dict)
# This will cause ValidationError(s):
# - first_name: field required
# - surname: field required

Explanation: This is due to the dictionary unpacking operator `**`, which passes the dictionary keys as keyword
arguments to the `Person` class. The `first_name` and `surname` fields are required, but they are not present in the
dictionary. 

We have several options to resolve this issue:
- Replace the dictionary keys with fitting ones
    - By renaming the columns in the DataFrame before converting it to a dictionary
    - By providing a mapping dictionary and replacing the keys in the dictionary, using the following function:
      ```python
      def replace_keys(d, key_map):
          return {key_map.get(k, k): v for k, v in d.items()}
      ```
- Create a HelperClass, which inherits from the target data class and `osw.data.import_utility.HelperModel`, and 
  implements a transformation function to create the target data class instance from the dictionary 

##### Option 1: Rename columns in the DataFrame

This option is very simple and will serve you well for cases of low complexity, where the datatypes in the DataFrame 
columns already match the datatypes of the target data class.

In [None]:
# Let's print out the columns first
print(data_df.columns)

In [None]:
# Let's create the mapping dictionary and rename the columns
mapping = {
    "FirstName": "first_name",
    "LastName": "surname",
    "Email": "email"
}
data_df.rename(columns=mapping, inplace=True)
# Let's have a look at the first row
john_dict = data_df.iloc[0].to_dict()
print(john_dict)

In [None]:
# Let's construct an instance of the Person data model
john = Person(**john_dict)
# This will cause a ValidationError:
# - email: value is not a valid set

In [None]:
# Let's correct the email field
john_dict["email"] = [john_dict["email"]]
john = Person(**john_dict)
pprint(john.dict())

##### Option 2: Create a HelperModel and a transformation function

This approach will be able to treat even cases of high complexity, where the datatypes in the DataFrame columns do not 
match the datatypes of the target data class or where references to other instances in a dataset have to be made. 

> [!NOTE]
> Property and variable names in Python must not contain spaces, so the column names in the DataFrame have 
> to be transformed accordingly.

In [None]:
from typing import Any
from osw.data.import_utility import HelperModel

class PersonHelper(Person, HelperModel):
    # Attributes of the first base class are set to Optional[Any], default: None
    FirstName: Any
    LastName: Any
    Email: Any
    
    def transform_attributes(self, dd: dict = None) -> bool:
        super().transform_attributes()
        self.first_name = self.FirstName
        self.surname = self.LastName
        self.email = {self.Email}
        return True

# Let's create a new instance of the PersonHelper class
data_df = pd.read_excel("demo.xlsx")
john_dict = data_df.iloc[0].to_dict()
john_helper = PersonHelper(**john_dict)
print("Before transformation:")
pprint(john_helper.dict())

In [None]:
# Let's see the effect of the transformation
john_helper.transform_attributes()
print("After transformation:")
pprint(john_helper.dict())

In [None]:
# In practice, we access 'transformed' directly.
# If the transformation operations had not been performed already,
#  accessing 'transformed' would trigger them
john = john_helper.transformed
print("After casting:")
pprint(john.dict())

In [None]:
# We can do the same for all instances quite easily:
entities = []
for ii in data_df.index:
    entities.append(PersonHelper(**data_df.iloc[ii].to_dict()).transformed)

In [None]:
# And store the entities in the OSL instance
osw_obj.store_entity(entities)