# Tutorial: Basics

This tutorial covers the basics and prerequisites required to use the osw-python library and to 
interact with an [Open Semantic Lab (OSL)](https://github.com/OpenSemanticLab) instance, such as the [OpenSemanticWorld 
Registry](https://opensemantic.world/). To follow this tutorial interactively, jump to [Downloading the library](#Downloading-the-library-optional) and open this notebook in a Jupyter environment.

- [OSL data model](#OSL-data-model)
- [Downloading the library (optional)](#Downloading-the-library-optional)
- [Installation](#Installation)
- [Connecting to an OSL instance](#Connecting-to-an-OSL-instance)
- [Downloading data model dependencies](#Downloading-data-model-dependencies)
- [Interact with an entity](#Interact-with-an-entity) 
    - Download an entity
    - Modify the entity
    - Upload an entity
- [Interact with files](#Interact-with-files)
    - Download a file
    - Read contents of a file
    - Upload a file
- [Interface data sources](#Interface-data-sources)
    - Tabular data: Excel, CSV --> pandas --> dict
    - Database: SQL --> pyodbc --> dict
    - Two options:
        - Build a Helper class, inheriting from target dataclass and osw.data.import_utility.HelperModel
        - Write a transformation function
- User workflows --> prefect

## OSL data model

Open Semantic Lab provides an [extension](https://github.com/OpenSemanticLab/mediawiki-extensions-OpenSemanticLab) 
for Semantic Mediawiki, delivering a machine-readable data structure based on industry standards, such as JSON, JSON-LD
and JSON-Schema. It allows importing, referencing and interfacing existing (OWL, RDF) ontologies and aims to facilitate the 
implementation of [FAIR Data principles](https://www.go-fair.org/fair-principles/) out-of-the-box.

<figure>
    <a href="https://opensemantic.world/wiki/File:OSW95a74be1e22d4b6e9e4f836127d5915a.drawio.svg">
    <img src="./img/osw_intro_technology_stack.png" 
        width="400" 
        height="200"
        alt="Components of the OSL extension for Semantic Mediawiki">
    </a>
</figure>

JSON serves as the central data storage element for structured data, including the definition of classes and forms
 via JSON-Schema, linking JSON-Data to ontologies and building property graphs.

### Namespaces

As we are using Semantic Mediawiki, the data is stored in pages, which are organized in namespaces. Full page titles 
follow this structure: `<namespace>:<page_title>`. While the `<page_title>` can contain `:`, it is rarely found. The 
most important namespaces in OSL and stored entries are:

- Category - Classes (instances of MetaClasses) and MetaClasses
- Item - Instances of classes
- Property - Semantic properties and reusable property schemas
- JsonSchema - Reusable JSON-Schema definitions
- Template - Templates for rendering pages or performing queries
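
Because the `<page_title>` part may itself contain `:`, a full page title should be split at the first colon only. The following is a minimal sketch of that rule in plain Python; the helper name `split_full_page_title` is hypothetical and not part of osw-python, and treating a title without any colon as having an empty namespace is an assumption made for this illustration.

```python
from typing import Tuple


def split_full_page_title(full_page_title: str) -> Tuple[str, str]:
    """Split '<namespace>:<page_title>' at the first colon only."""
    namespace, separator, page_title = full_page_title.partition(":")
    if not separator:
        # No colon at all: treat the input as a title without a namespace.
        return "", full_page_title
    return namespace, page_title


print(split_full_page_title("Category:OSW1969007d5acf40539642877659a02c23"))
# A colon inside the page title stays part of the title:
print(split_full_page_title("Item:Some:title"))
```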

### Slots

The data on a page in Semantic Mediawiki can be stored as plain text (main slot, content model: wikitext) or in
an arbitrary format in dedicated slots. OSL uses nine slots, built around the JSON format and tailored to the needs 
of data scientists. The most important slots are `jsondata` and `jsonschema`, which store the data and the schema, respectively:

| Slot name       | Content model | Description                                                                                                         |
|-----------------|---------------|---------------------------------------------------------------------------------------------------------------------|
| main            | wikitext      | Default content slot, rendered between the header and footer of the page                                            |
| jsondata        | JSON          | Structured data, (partially) used to render the infobox on the page                                                 |
| jsonschema      | JSON          | Stored within a category (=class) page, defining the schema for the jsondata slot of any category member (instance) |
| header          | wikitext      | Content to be placed at the top of the page, below the heading                                                      |
| footer          | wikitext      | Content to be placed at the bottom of the page, above the (Semantic Mediawiki) built-in elements                    |
| header_template | wikitext      | Stored within a category (=class) page, renders the page header of any category member (instance)                   |
| footer_template | wikitext      | Stored within a category (=class) page, renders the page footer of any category member (instance)                   |
    
This data structure can be used to generate Python data classes, which allow interacting with the data in a type-safe manner. The osw-python library includes a [code generator](https://github.com/koxudaxi/datamodel-code-generator/) that creates these Python data classes from the JSON schema. 

At the same time, this data structure can be used to auto-generate form editors, create property graphs, and provide 
data and interfaces for applications, such as Machine Learning and data processing.
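
Each slot of a page can be addressed individually. The sketch below builds the URL of the raw content of a single slot, assuming the `/wiki/<full_page_title>?action=raw&slot=<slot_name>` pattern that also appears in the schema reference of the JSON-Schema example in this tutorial. The helper `raw_slot_url` is hypothetical and not part of osw-python, and `Category:MyCategory` is only a placeholder page title.

```python
from urllib.parse import quote, urlencode


def raw_slot_url(wiki_domain: str, full_page_title: str, slot: str) -> str:
    """Build a URL pointing at the raw content of one slot of a page."""
    # Keep ':' and '/' readable so the namespace separator stays intact.
    path = "/wiki/" + quote(full_page_title, safe=":/")
    query = urlencode({"action": "raw", "slot": slot})
    return f"https://{wiki_domain}{path}?{query}"


print(raw_slot_url("wiki-dev.open-semantic-lab.org", "Category:MyCategory", "jsonschema"))
```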

### Data Classes / Class Hierarchy

Everything is considered an 'Entity', which is analogous to the 'object' in Python. 'Classes' are subclasses and 
instances of 'Entity' or specific 'MetaClasses'. 'MetaClasses' define a JSON schema used to validate the structured 
data stored in the jsondata slot of 'Classes', just as 'Classes' do for individual 'Instances' or 'Items'.

<figure>
    <a href="https://opensemantic.world/wiki/File:OSW96280227805c4e4a8fcf615359b01672.drawio.svg">
    <img src="./img/osw_intro_data_model.png" 
        width="400" 
        height="200"
        alt="OSL data model">
    </a>
</figure>

### JSON / JSON-Schema

The JSON schema stored in the `jsonschema` slot of a Category (=class) defines the structure of the data stored in 
the `jsondata` slot of members of this category (=items). The JSON schema is a JSON object that defines the 
properties and their types, constraints, and relationships. The JSON schema can be generated from the data stored 
in the `jsondata` slot of the category (=class) or can be created manually. We are using the 
[JSON-Schema](https://json-schema.org/) standard to define the schema. 

Because the schema guarantees a consistent structure, JSON schemas and JSON data can be turned into Python data classes 
and instances, which can serve as parameter objects for functions and methods. The generated classes are based on Pydantic models, which provide validation and serialization capabilities.
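
To make the role of the schema more concrete, the following sketch checks a data dictionary against the `type` keywords of a small schema. It is a hand-rolled illustration using only the standard library, not a JSON-Schema validator and not what osw-python or Pydantic do internally; the names `JSON_TYPES` and `check_types` are hypothetical.

```python
from typing import Any, Dict, List

# Maps JSON-Schema type names to the Python types accepted for them.
JSON_TYPES = {
    "string": (str,),
    "number": (int, float),
    "array": (list,),
    "object": (dict,),
    "boolean": (bool,),
}


def check_types(schema: Dict[str, Any], data: Dict[str, Any]) -> List[str]:
    """Return a list of type mismatches between data and schema properties."""
    errors = []
    for name, definition in schema.get("properties", {}).items():
        if name not in data:
            continue  # this sketch treats every property as optional
        expected = JSON_TYPES[definition["type"]]
        value = data[name]
        # bool is a subclass of int in Python, but not a JSON number.
        is_bool_as_number = isinstance(value, bool) and bool not in expected
        if not isinstance(value, expected) or is_bool_as_number:
            errors.append(f"{name}: expected {definition['type']}")
    return errors


schema = {
    "type": "object",
    "properties": {
        "text": {"type": "string"},
        "number": {"type": "number"},
        "array": {"type": "array"},
    },
}

print(check_types(schema, {"text": "some text", "number": 1.1, "array": [1, "two"]}))
print(check_types(schema, {"text": 42, "number": "1.1"}))
```

The first call reports no mismatches, the second one reports both `text` and `number`. A real validator additionally handles keywords such as `required`, `allOf` and nested schemas.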

#### JSON-Schema to Python Data Classes

**Category:MyCategory `jsonschema` slot:**
```json
{
    "type": "object",
    "properties": {
        "text": { "type": "string" },
        "number": { "type": "number" },
        "array": { "type": "array" }
    }
}
```
**Category:MySubCategory `jsonschema` slot:**
```json
{
    "type": "object",
    "allOf": [{ "$ref": "/wiki/Category:MyCategory?action=raw&slot=jsonschema" }],
    "properties": {
        "additional_property": { "type": "string" }
    }
}
```
**Generated Python data classes:**
```python
from typing import Any, List

from osw.model.entity import Entity

class MyClass(Entity):
    text: str
    number: float
    array: List[Any]
    
class MySubClass(MyClass):
    additional_property: str
```

#### Python instance to JSON data

```python
from osw.express import OswExpress

osw_obj = OswExpress(domain="wiki-dev.open-semantic-lab.org")

my_instance = MySubClass(
    text="some text",
    number=1.1,
    array=[1, "two", 3.0],
    additional_property="test2",
)
my_instance.json()  # serialize the instance to a JSON string
my_instance = osw_obj.store_entity(my_instance)  # wiki upload
```
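
The example above needs a connection to an OSL instance. To see the class-to-JSON direction in isolation, the sketch below uses standard-library dataclasses as a stand-in for the generated classes. This is an assumption made for illustration only: the real generated classes are Pydantic models derived from `Entity` and may carry additional fields, and the class names `MyClassSketch` and `MySubClassSketch` are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, List


@dataclass
class MyClassSketch:
    text: str
    number: float
    array: List[Any]


@dataclass
class MySubClassSketch(MyClassSketch):
    # Inherits text, number and array from the parent class.
    additional_property: str


instance = MySubClassSketch(
    text="some text",
    number=1.1,
    array=[1, "two", 3.0],
    additional_property="test2",
)
# Convert the instance to a plain dict, then to a JSON string.
json_data = json.dumps(asdict(instance), indent=4)
print(json_data)
```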

### Object Oriented Linked Data (OO-LD)

The example in [JSON / JSON-Schema](#JSON-/-JSON-Schema) above already showed how Object Oriented 
Programming (OOP) integrates with JSON and JSON-Schema. Adding the linked data component of [JSON-LD](https://json-ld.org/) 
enables the reusable annotation of datasets with well-established vocabularies (ontologies), such as 
[schema.org](https://schema.org/). Annotations only have to be made at Category (=class) level and are available on export of 
instances. This makes the datasets machine-readable and allows for the integration of the data into the 
[Semantic Web](https://en.wikipedia.org/wiki/Semantic_Web) and the creation of property graphs. 

#### A minimal example:
```json
{
  "@context": {
    "schema": "https://schema.org/",
    "name": "schema:name"
  },
  "title": "Person",
  "type": "object",
  "properties": {
    "name": {
      "type": "string",
      "description": "First and Last name"
    }
  }
}
```
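
The effect of the `@context` can be illustrated by expanding the short property name to a full IRI. The sketch below performs only simple prefix substitution for the context shown above; it is not a JSON-LD processor, and the helper `expand_term` is hypothetical.

```python
from typing import Dict

context = {
    "schema": "https://schema.org/",
    "name": "schema:name",
}


def expand_term(term: str, context: Dict[str, str]) -> str:
    """Expand a term to a full IRI using simple prefix substitution."""
    value = context.get(term, term)
    prefix, separator, suffix = value.partition(":")
    # Only substitute when the part before ':' is a prefix defined in the context.
    if separator and prefix in context and not suffix.startswith("//"):
        return context[prefix] + suffix
    return value


print(expand_term("name", context))  # https://schema.org/name
```

With this mapping, the `name` property of every `Person` instance can be interpreted as `https://schema.org/name` by tools that understand the vocabulary.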

### Further reading

- [OSW Introduction](https://opensemantic.world/wiki/Item:OSWdb485a954a88465287b341d2897a84d6)
- [OSW Python Package](https://opensemantic.world/wiki/Item:OSW659a81662ff44af1b2b6febeee7c3a25)
- [JSON Tutorial](https://opensemantic.world/wiki/Item:OSWf1df064239044b8fa3c968339fb93344)
- [JSON-Schema Tutorial](https://opensemantic.world/wiki/Item:OSWf4a9514baed04859a4c6c374a7312f10)
- [JSON-LD Tutorial](https://opensemantic.world/wiki/Item:OSW911488771ea449a6a34051f8213d7f2f)
- [OO-LD Tutorial](https://opensemantic.world/wiki/Item:OSWee501c0fa6a9407d99c058b5ff9d55b4)

## Downloading the library (optional)

The osw-python library is available as a GitHub repository and can be downloaded as a ZIP file or via git:

```bash
git clone https://github.com/OpenSemanticLab/osw-python.git <target_directory>
```

## Installation

### From PyPI

The recommended way for most users is to install the library from the Python Package Index (PyPI) via pip:

```bash
conda activate <your_environment>  # optional
pip install osw-python
```

### From source

If you want to install the library from source, you can clone the repository and install it via pip. The option `-e`
 installs the package in editable mode, which means that the source code is linked to the installed package. This is 
 useful for development and testing.

```bash
git clone https://github.com/OpenSemanticLab/osw-python.git <target_directory>
cd <target_directory>
conda activate <your_environment>  # optional
pip install -e .  # omit -e for a regular (non-editable) install
```

## Connecting to an OSL instance

To connect to an OSL instance, you need to provide your login credentials. You can either provide your username and 
password directly or create a bot password. A bot password is preferred because its edit rights can be restricted and 
edits made programmatically remain traceable, as they are marked as bot edits.

### Creating a bot password

- Log in to your OSL instance
- Navigate to **Special:BotPasswords** via **Toggle menu → Special pages → Bot passwords**,
    e.g., `https://<wiki_domain>/wiki/Special:BotPasswords`
- Log in again when prompted to verify your identity
- Create a new bot password by providing a `Bot name`, e.g., 'PythoBot', and clicking **Create**
- Save the `Username` and `Bot password` in a safe place, as the password will not be displayed again

### (Optional) Creating a credentials file

You can create a YAML file, e.g., 'credentials.pwd.yaml', with your login credentials, which can be used to connect to the OSL instance. The file must follow the structure below:

```yaml
 <wiki_domain>:
     username: <wiki_username>
     password: <wiki_password>
```

### Connecting via osw-python

It is recommended to use the `osw.express.OswExpress` class to connect to an OSL instance. The class provides a 
number of convenience functions on top of the underlying `osw.core.OSW` class. 

On the first execution of the following cell you will be prompted to enter domain, username and password. The 
credentials will be stored in a file named **credentials.pwd.yaml** in a subfolder **osw_files** of the current working 
directory. In the current working directory, a **.gitignore** file will be created or updated to include the 
credentials file. 

This step is required to download all dependencies (data models) of OswExpress from the OSL instance.

```python
from osw.express import OswExpress
```

#### Option 1: Reuse the credentials file created in the previous step

If you are still running in the same current working directory (CWD), OswExpress will automatically find the credentials file.

Otherwise, you will be prompted to enter your username and password.

```python
osw_obj = OswExpress(domain="wiki-dev.open-semantic-lab.org")  # Replace with your OSL instance
```

#### Option 2: Provide a credentials file (path)

If the file does not exist or the domain is not in the file, you will be prompted to enter your username and password.
Unknown domains will be appended to the file.

```python
osw_obj = OswExpress(domain="wiki-dev.open-semantic-lab.org", cred_filepath="credentials.pwd.yaml")
```

## Downloading data model dependencies

Before we can upload entities or files, we need to download the required data models. The data models are stored in the 
`jsonschema` slot of the respective categories (=classes) and are used to generate Python data classes. OswExpress 
offers a convenience function to download all dependencies of the category that an item is an instance of. 

### Identify required data models

All categories (=classes) are subcategories of the **Entity** category. The classes **Entity**, **Item** and concepts 
required to provide typing for those classes are provided out-of-the-box within `osw.model.entity`, which imports 
**OswBasemodel(pydantic.BaseModel)** from `osw.model.static`.

To store structured information in an OSL instance, you need to find a fitting **Category** in which to create pages 
(in the **Item** or **Category** namespace). To explore the data model hierarchy, you can use the graph tool 
provided under **Graph** on every page in the `Category` or `Item` namespace, following the `SubClassOf` property. 

A good alternative is to consult the **Category tree** page and navigate through the collapsible tree. The page can 
be found under 
`https://<wiki_domain>/wiki/Special:CategoryTree?target=Category%3AEntity&mode=categories&namespaces=`. 

Save the `Machine compatible name` and `Full page title` of the category you want to work with in a dictionary. Note 
that only the category farthest down a branch, as seen from the root category **Entity**, is required. All 
categories above it in the same branch will be downloaded automatically.

**Example category tree**:
```
Entity
├── Property
├── Statement
└── Item
    ├── Person
    │   └── User
    ├── Location
    │   ├── Site
    │   ├── Building
    │   ├── Floor
    │   └── Room
    ├── CreativeWork
    │   └── Article
    │       └── Tutorial
    └── OrganizationalUnit
        └── Organization
```
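
The rule that only the lowest category of a branch has to be listed can be sketched by walking the example tree upwards. The code below models only the tree shown above and is not how osw-python resolves dependencies on a real instance, where schemas may reference further categories. The names `PARENT`, `BUILT_IN` and `categories_to_fetch` are hypothetical; **Entity** and **Item** are skipped because they are provided out-of-the-box.

```python
from typing import Dict, List

# Child -> parent relations copied from the example category tree.
PARENT: Dict[str, str] = {
    "Property": "Entity",
    "Statement": "Entity",
    "Item": "Entity",
    "Person": "Item",
    "User": "Person",
    "Location": "Item",
    "Site": "Location",
    "Building": "Location",
    "Floor": "Location",
    "Room": "Location",
    "CreativeWork": "Item",
    "Article": "CreativeWork",
    "Tutorial": "Article",
    "OrganizationalUnit": "Item",
    "Organization": "OrganizationalUnit",
}

# Provided out-of-the-box by osw.model.entity, so never fetched.
BUILT_IN = {"Entity", "Item"}


def categories_to_fetch(leaf: str) -> List[str]:
    """Walk from a category up to the root, skipping built-in classes."""
    chain = []
    current = leaf
    while current not in BUILT_IN:
        chain.append(current)
        current = PARENT[current]
    return chain


print(categories_to_fetch("Tutorial"))  # ['Tutorial', 'Article', 'CreativeWork']
print(categories_to_fetch("Room"))      # ['Room', 'Location']
```

Sibling categories such as Site, Building and Floor are not part of the chain for Room, since they are not its ancestors.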

> [!Note]
> 
> If you find no category, ask your administrator to install page packages via the special page 'Special:Packages'. 
> Page packages are maintained on [GitHub](https://github.com/OpenSemanticWorld-Packages/osw-package-maintenance).

```python
dependencies = {
    "Organization": "Category:OSW1969007d5acf40539642877659a02c23",  # Will fetch: Organization, OrganizationalUnit
    "Person":       "Category:OSWd9aa0bca9b0040d8af6f5c091bf9eec7",  # Will fetch: Person
    "Room":         "Category:OSWc5ed0ed1e33c4b31887c67af25a610c1",  # Will fetch: Room, Location, but not: Site, Building, Floor
    "Tutorial":     "Category:OSW494f660e6a714a1a9681c517bbb975da",  # Will fetch: Tutorial, Article, CreativeWork
}
```

> [!Note]
> 
> Keys in this dictionary will eventually be used in the import statements and should therefore match the auto-generated 
> class names, which are the same as the category's `Machine compatible name`!
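
Since the keys end up in import statements and the values are full page titles, a quick sanity check of the dictionary can catch typos early. The following is a minimal sketch with a hypothetical helper `check_dependencies`; it only inspects the strings and does not contact the OSL instance, so it cannot tell whether a category actually exists.

```python
import keyword
from typing import Dict, List


def check_dependencies(dependencies: Dict[str, str]) -> List[str]:
    """Return human-readable problems found in a dependencies dictionary."""
    problems = []
    for class_name, full_page_title in dependencies.items():
        # Keys end up in import statements, so they must be valid identifiers.
        if not class_name.isidentifier() or keyword.iskeyword(class_name):
            problems.append(f"{class_name!r} is not usable as a class name")
        # Values are full page titles in the Category namespace.
        if not full_page_title.startswith("Category:"):
            problems.append(f"{full_page_title!r} is not in the Category namespace")
    return problems


print(check_dependencies({"Person": "Category:OSWd9aa0bca9b0040d8af6f5c091bf9eec7"}))
# A deliberately broken entry, for illustration only:
print(check_dependencies({"My Room": "Item:Example"}))
```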

### Install data models

Data models (data classes generated in osw.model.entity) cannot be imported in Python scripts and modules prior to 
installation. Therefore, it is recommended to perform this step either in a separate script that is run before the main
script, or in the main script itself, before the import statements.

#### Option 1: Install dependencies before import from osw.model.entity

This option is best placed in a separate script that is run before the main script.

```python
from typing import TYPE_CHECKING


# Will run every time the script is executed:
osw_obj.install_dependencies(dependencies)


# Static code checker will note 'Module not found' before the installation:
if TYPE_CHECKING:
    from osw.model.entity import Organization, Person, Room, Tutorial
```

#### Option 2: Use the OswExpress convenience function for imports

This option is best placed in the main script, before the first `from osw.model.entity import` statement.

```python
from typing import TYPE_CHECKING
from osw.express import import_with_fallback


# Will fetch and install dependencies only if not already installed:
import_with_fallback(dependencies)


# Otherwise static code checker will note 'Module not found' before the installation:
if TYPE_CHECKING:
    from osw.model.entity import Organization, Person, Room, Tutorial
```
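
The general idea behind an import with fallback can be sketched with the standard library: try to resolve the names, and only run an installation step if something is missing. This is a conceptual illustration under that assumption, not the implementation of `import_with_fallback`; the helper `import_or_install` and its parameters are hypothetical, and the `json` module merely stands in for a module whose content may be generated on demand.

```python
import importlib
from typing import Any, Callable, Dict, List


def import_or_install(
    module_name: str, names: List[str], install: Callable[[], None]
) -> Dict[str, Any]:
    """Import names from a module, running install() once if any are missing."""
    module = importlib.import_module(module_name)
    if not all(hasattr(module, name) for name in names):
        install()  # e.g. a call that regenerates the module on disk
        module = importlib.reload(module)
    return {name: getattr(module, name) for name in names}


calls = []  # records whether the installation step was triggered
imported = import_or_install("json", ["loads", "dumps"], lambda: calls.append("install"))
print(sorted(imported), calls)
```

Because both names already exist in the module, the installation step is skipped here, which mirrors the "only if not already installed" behaviour described above.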

## Interact with an entity

```python
from osw.model.entity import Organization, Person, Room, Tutorial
```