diff --git a/README.md b/README.md index 10242e7a..6306748d 100644 --- a/README.md +++ b/README.md @@ -1,12 +1,17 @@ [![PyPI-Server](https://img.shields.io/pypi/v/osw.svg)](https://pypi.org/project/osw/) [![DOI](https://zenodo.org/badge/458130867.svg)](https://zenodo.org/badge/latestdoi/458130867) -[![Coveralls](https://img.shields.io/coveralls/github/OpenSemanticLab/osw-python/main.svg)](https://coveralls.io/r//osw) +[![Coveralls](https://img.shields.io/coveralls/github/OpenSemanticLab/osw-python/main.svg)](https://coveralls.io/r/OpenSemanticLab/osw) +[![docs](xx.xx)](https://opensemanticlab.github.io/osw-python/) +![license](https://img.shields.io/github/license/OpenSemanticLab/osw-python.svg) +[![Pydantic v2](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/pydantic/pydantic/main/docs/badge/v2.json)](https://pydantic.dev) [![Project generated with PyScaffold](https://img.shields.io/badge/-PyScaffold-005CA0?logo=pyscaffold)](https://pyscaffold.org/) # osw Python toolset for data processing, queries, wikicode generation and page manipulation within OpenSemanticLab. 
-General features for object oriented interaction with knowledge graphs are planned to be moved to a standalone package: [oold-python](https://github.com/OpenSemanticWorld/oold-python) + +General features for object-oriented interaction with knowledge graphs are planned to be moved to a standalone package: +[oold-python](https://github.com/OpenSemanticWorld/oold-python) ## Installation ``` diff --git a/docs/tutorials/basics.ipynb b/docs/tutorials/basics.ipynb new file mode 100644 index 00000000..e44e826b --- /dev/null +++ b/docs/tutorials/basics.ipynb @@ -0,0 +1,1163 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "47f7c6ba83e22f59", + "metadata": {}, + "source": [ + "# Tutorial: Basics\n", + "\n", + "This tutorial will cover the basics and prerequisites, required to use the osw-python library in Python and to \n", + "interact with an [Open Semantic Lab (OSL)](https://github.com/OpenSemanticLab) instance, like the [OpenSemanticWorld \n", + "Registry](https://opensemantic.world/). 
To do this tutorial interactively, jump to [Downloading the library](#Downloading-the-library-optional) and open this notebook in a Jupyter environment.\n", + "\n", + "- [OSL data model](#OSL-data-model)\n", + "- [Downloading the library (optional)](#Downloading-the-library-optional)\n", + "- [Installation](#Installation)\n", + "- [Connecting to an OSL instance](#Connecting-to-an-OSL-instance)\n", + "- [Downloading data model dependencies](#Downloading-data-model-dependencies)\n", + "- [Interact with an entity](#Interact-with-an-entity) \n", + "- [Interact with files](#Interact-with-files)\n", + "- [Interface data sources](#Interface-data-sources)" + ] + }, + { + "cell_type": "markdown", + "id": "64e0121de7a40c80", + "metadata": {}, + "source": [ + "## OSL data model\n", + "\n", + "Open Semantic Lab provides an [extension](https://github.com/OpenSemanticLab/mediawiki-extensions-OpenSemanticLab) \n", + "for Semantic Mediawiki, delivering a machine-readable data structure based on industry standards, like JSON, JSON-LD,\n", + " JSON-Schema. It allows to import, reference and interface existing (OWL, RDF) ontologies and aims to facilitate the \n", + " implementation of [FAIR Data principles](https://www.go-fair.org/fair-principles/) out-of-the-box.\n", + "\n", + "
\n", + " \n", + " \"Components\n", + "
\n", + "\n", + "JSON serves as the central data storage element for structured data, including the definition of classes and forms\n", + " via JSON-Schema, linking JSON-Data to ontologies and building property graphs.\n", + "\n", + "### Namespaces\n", + "\n", + "As we are using Semantic Mediawiki, the data is stored in pages, which are organized in namespaces. Full page titles \n", + "follow this structure: `:`. While the `` can contain `:`, it is rarely found. The \n", + "most important namespaces in OSL and stored entries are:\n", + "\n", + "- Category - Classes (instances of MetaClasses) and MetaClasses\n", + "- Item - Instances of classes\n", + "- Property - Semantic properties and reusable property schemas\n", + "- JsonSchema - Reusable JSON-Schema definitions\n", + "- Template - Templates for rendering pages or performing queries\n", + "\n", + "### Slots\n", + "\n", + "The data stored on a page in Semantic Mediawiki can be stored as plain text (main slot, content model: wikitext) or in\n", + " an arbitrary format in dedicated slots. In OSL, we go with nine slots, tailored to the needs of a data scientist, \n", + " around the JSON format. 
The most important slots are `jsondata` and `jsonschema`, which store the data and the schema:\n", + "\n", + "| Slot name | Content model | Description |\n", + "|-----------------|---------------|---------------------------------------------------------------------------------------------------------------------|\n", + "| main | wikitext | Default content slot, rendered between the header and footer of the page |\n", + "| jsondata | JSON | Structured data, (partially) used to render the infobox on the page |\n", + "| jsonschema | JSON | stored within a category (=class) page, defining the schema for the jsondata slot of any category member (instance) |\n", + "| header | wikitext | Content to be placed at the top of the page, below the heading |\n", + "| footer | wikitext | Content to be placed at the bottom of the page, above the (Semantic Mediawiki) built-in elements |\n", + "| header_template | wikitext | Stored within a category (=class) page, renders the page header of any category member (instance) |\n", + "| footer_template | wikitext | stored within a category (=class) page, renders the page footer of any category member (instance) |\n", + " \n", + "This data structure can be used to generate Python data classes, which can be used to interact with the data in a type-safe manner. The osw-python library includes a [code generator](https://github.com/koxudaxi/datamodel-code-generator/) to generate Python data classes from the JSON schema. \n", + "\n", + "At the same time, this data structure can be used to auto-generate form editors, create property graphs, and provide \n", + "data and interfaces for applications, such as Machine Learning and data processing.\n", + "\n", + "### Data Classes / Class Hierarchy\n", + "\n", + "Everything is considered an 'Entity', which is analogous to the 'object' in Python. 'Classes' are subclasses and \n", + "instances of 'Entity' or specific 'MetaClasses'. 
'MetaClasses' define a JSON schema used to validate the structured \n", + "data stored in the jsondata slot of 'Classes', just as 'Classes' do for individual 'Instances' or 'Items'.\n", + "\n", + "
\n", + " \n", + " \"OSL\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "6bef0569b5824ea0", + "metadata": {}, + "source": [ + "### JSON / JSON-Schema\n", + "\n", + "The JSON schema stored in the `jsonschema` slot of a Category (=class) defines the structure of the data stored in \n", + "the `jsondata` slot of members of this category (=items). The JSON schema is a JSON object that defines the \n", + "properties and their types, constraints, and relationships. The JSON schema can be generated from the data stored \n", + "in the `jsondata` slot of the category (=class) or can be created manually. We are using the \n", + "[JSON-Schema](https://json-schema.org/) standard to define the schema. \n", + "\n", + "Because the schema guarantees a consistent structure, it can be used to generate Python data classes, whose instances can be used \n", + "as parameter objects for functions and methods. The generated classes are based on Pydantic models, which provide validation and serialization capabilities.\n", + "\n", + "#### JSON-Schema to Python Data Classes\n", + "\n", + "**Category:MyCategory `jsonschema` slot:**\n", + "```json\n", + "{\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"text\": { \"type\": \"string\" },\n", + " \"number\": { \"type\": \"number\" },\n", + " \"array\": { \"type\": \"array\" }\n", + " }\n", + "}\n", + "```\n", + "**Category:MySubCategory `jsonschema` slot:**\n", + "```json\n", + "{\n", + " \"type\": \"object\",\n", + " \"allOf\": [{ \"$ref\": \"/wiki/Category:MyCategory?action=raw&slot=jsonschema\" }],\n", + " \"properties\": {\n", + " \"additional_property\": { \"type\": \"string\" }\n", + " }\n", + "}\n", + "```\n", + "**Generated Python data classes:**\n", + "```python\n", + "from osw.model.entity import Entity\n", + "\n", + "class MyClass(Entity):\n", + " text: str\n", + " number: float\n", + " array: List[Any]\n", + " \n", + "class MySubClass(MyClass):\n", + " additional_property: str\n", + "```\n", + "\n", + "#### Python instance to JSON data\n", + "\n", + "```python\n", + 
"from osw.express import OswExpress\n", + "\n", + "osw_obj = OswExpress(domain=\"wiki-dev.open-semantic-lab.org\")\n", + "\n", + "my_instance = MySubClass(\n", + " text=\"some text\",\n", + " number=1.1,\n", + " array=[1, \"two\", 3.0],\n", + " additional_property=\"test2\",\n", + ")\n", + "my_instance.json()\n", + "my_instance = osw_obj.store_entity(my_instance) # wiki upload\n", + "```\n", + "\n", + "### Object Oriented Linked Data (OO-LD)\n", + "\n", + "The example above [JSON / JSON Schema](#JSON-/-JSON-Schema) already showed the integration of Object-Oriented \n", + "Programming (OOP) into JSON and JSON Schema. Adding the linked data component of [JSON-LD](https://json-ld.org/) \n", + "enables the reusable annotation of datasets with well-established vocabularies (ontologies), such as \n", + "[schema.org](https://schema.org/). Annotations have to be made at Category (=class) level only, and are available on export of \n", + "instances. This makes the datasets machine-readable, allows for the integration of the data into the \n", + "[Semantic Web](https://en.wikipedia.org/wiki/Semantic_Web) and the creation of property graphs. 
\n", + "\n", + "#### A minimal example:\n", + "```json\n", + "{\n", + " \"@context\": {\n", + " \"schema\": \"https://schema.org/\",\n", + " \"name\": \"schema:name\"\n", + " },\n", + " \"title\": \"Person\",\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"name\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"First and Last name\"\n", + " }\n", + " }\n", + "}\n", + "```\n", + "\n", + "### Further reading\n", + "\n", + "- [OSW Introduction](https://opensemantic.world/wiki/Item:OSWdb485a954a88465287b341d2897a84d6)\n", + "- [OSW Python Package](https://opensemantic.world/wiki/Item:OSW659a81662ff44af1b2b6febeee7c3a25)\n", + "- [JSON Tutorial](https://opensemantic.world/wiki/Item:OSWf1df064239044b8fa3c968339fb93344)\n", + "- [JSON-Schema Tutorial](https://opensemantic.world/wiki/Item:OSWf4a9514baed04859a4c6c374a7312f10)\n", + "- [JSON-LD Tutorial](https://opensemantic.world/wiki/Item:OSW911488771ea449a6a34051f8213d7f2f)\n", + "- [OO-LD Tutorial](https://opensemantic.world/wiki/Item:OSWee501c0fa6a9407d99c058b5ff9d55b4)" + ] + }, + { + "cell_type": "markdown", + "id": "6e259e4f34e709ea", + "metadata": {}, + "source": [ + "## Downloading the library (optional)\n", + "\n", + "The osw-python library is available as GitHub repository and can be downloaded as a ZIP file or via git:\n", + "\n", + "```bash\n", + "git clone https://github.com/OpenSemanticLab/osw-python.git \n", + "```\n", + "\n", + "## Installation\n", + "\n", + "### From PyPI\n", + "\n", + "Preferably, you can install the library from the Python Package Index (PyPI) via pip, which is recommended for most users:\n", + "\n", + "```bash\n", + "conda activate # optional\n", + "pip install osw-python\n", + "```\n", + "\n", + "### From source\n", + "\n", + "If you want to install the library from source, you can clone the repository and install it via pip. The option `-e`\n", + " installs the package in editable mode, which means that the source code is linked to the installed package. 
This is \n", + " useful for development and testing.\n", + "\n", + "```bash\n", + "git clone https://github.com/OpenSemanticLab/osw-python.git \n", + "cd \n", + "conda activate # optional\n", + "pip install [-e] .\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "c089a8c6224ff1be", + "metadata": {}, + "source": [ + "## Connecting to an OSL instance\n", + "\n", + "To connect to an OSL instance, you need to provide your login credentials. You can either provide your username and \n", + "password directly or create a bot password. A bot password is preferred because its edit rights can be restricted and, at the \n", + "same time, edits made programmatically are traceable, being marked as bot edits.\n", + "\n", + "### Creating a bot password\n", + "\n", + "- Log in to your OSL instance\n", + "- Navigate to **Special:BotPasswords**, via **Toggle menu → Special pages → Bot passwords**,\n", + " e.g., `https:///wiki/Special:BotPasswords`\n", + "- You must log in again to verify your identity\n", + "- Create a new bot password by providing a `Bot name`, e.g., 'PythonBot', and click **Create**\n", + "- Save the `Username` and `Bot password` in a safe place, as the password will not be displayed again\n", + "\n", + "### (Optional) Creating a credentials file\n", + "\n", + "You can create a YAML file, e.g., 'credentials.pwd.yaml', with your login credentials, which can be used to connect to the OSL instance. The file must follow the structure below:\n", + "\n", + "```yaml\n", + " :\n", + " username: \n", + " password: \n", + "```\n", + "\n", + "### Connecting via osw-python\n", + "\n", + "It is recommended to use the `osw.express.OswExpress` class to connect to an OSL instance. The class provides a \n", + "number of convenience functions on top of the underlying `osw.core.OSW`. \n", + "\n", + "On the first execution of the following cell you will be prompted to enter domain, username and password. 
The \n", + "credentials will be stored in a file named **credentials.pwd.yaml** in a subfolder **osw_files** of the current working \n", + "directory. In the current working directory, a **.gitignore** file will be created or updated to include the \n", + "credentials file. \n", + "\n", + "This step is required to download all dependencies (data models) of OswExpress from the OSL instance." + ] + }, + { + "cell_type": "code", + "id": "162b6208a105bbde", + "metadata": {}, + "source": "from osw.express import OSW, OswExpress", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "# Define the wiki_domain for later reuse:\n", + "wiki_domain = \"wiki-dev.open-semantic-lab.org\" # Replace with the domain of your OSL instance" + ], + "id": "509040ca218924ed", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": [ + "#### Option 1: Reuse the credentials file created in the previous step\n", + "\n", + "If you are still running in the same CWD, OswExpress will automatically find the credentials file.\n", + "\n", + "Else you will be prompted to enter your username and password." + ], + "id": "4728ef9d8456dc26" + }, + { + "cell_type": "code", + "id": "54f082d780c3dd79", + "metadata": {}, + "source": "osw_obj = OswExpress(domain=wiki_domain) ", + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "id": "c735e2f5c4f55c05", + "metadata": {}, + "source": [ + "#### Option 2: Provide a credentials file (path)\n", + "\n", + "If the file does not exist or the domain is not in the file, you will be prompted to enter your username and password.\n", + "Unknown domains will be appended to the file." 
+ ] + }, + { + "cell_type": "code", + "id": "24810957b39bfd3a", + "metadata": {}, + "source": "osw_obj = OswExpress(domain=wiki_domain, cred_filepath=\"credentials.pwd.yaml\")", + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "id": "9022d39338ed267f", + "metadata": {}, + "source": [ + "## Downloading data model dependencies\n", + "\n", + "Loading entities from OSL fetches required data models by default. So if you just want to load, modify and upload one\n", + " (type of) entity, you can skip to [Downloading an entity](#Downloading-an-entity).\n", + "\n", + "Before we can upload entities or files, we need to download the required data models. The data models are stored in the \n", + "`jsonschema` slot of the respective categories (=classes) and are used to generate Python data classes. OswExpress \n", + "offers a convenience function to download all dependencies of a given category that an item is an instance of. \n", + "\n", + "> [!NOTE]\n", + "> It is important to execute this notebook in the same environment in which the data models are installed!" + ] + }, + { + "cell_type": "markdown", + "id": "537cdcc821e4803e", + "metadata": {}, + "source": [ + "### Identify required data models\n", + "\n", + "All categories (=classes) are subcategories of the **Entity** category. The classes **Entity**, **Item** and concepts \n", + "required to provide typing for those classes are provided out-of-the-box within `osw.model.entity`, which imports \n", + "**OswBasemodel(pydantic.BaseModel)** from `osw.model.static`.\n", + "\n", + "To store structured information in an OSL instance, you need to find a fitting **Category** in which to create pages \n", + "(in the **Item** or **Category** namespace). To explore the data model hierarchy, you can use the graph tool \n", + "provided under **Graph** on every page in the `Category` or `Item` namespace, following the `SubClassOf` property. 
\n", + "\n", + "A good alternative is to consult the **Category tree** page and navigate through the collapsible tree. The page can \n", + "be found under \n", + "`https:///wiki/Special:CategoryTree?target=Category%3AEntity&mode=categories&namespaces=`. \n", + "\n", + "Save the `Machine compatible name` and `Full page title` of the category you want to work with in a dictionary. Note \n", + "that only the category farthest down a branch, with respect to the root category **Entity**, is required. All \n", + "other categories will be downloaded automatically.\n", + "\n", + "**Example category tree**:\n", + "```\n", + "Entity\n", + "├── Property\n", + "├── Statement\n", + "└── Item\n", + " ├── Person\n", + " | └── User\n", + " ├── Location\n", + " | ├── Site\n", + " | ├── Building\n", + " | ├── Floor\n", + " | └── Room\n", + " ├── CreativeWork\n", + " | └── Article\n", + " | └── Tutorial\n", + " └── OrganizationalUnit\n", + " └── Organization\n", + "```\n", + "\n", + "> [!NOTE]\n", + "> If you find no fitting category, ask your administrator to install page packages via the special page 'Special:Packages'. 
\n", + "> Page packages are maintained via [GitHub](https://github.com/OpenSemanticWorld-Packages/osw-package-maintenance)" + ] + }, + { + "cell_type": "code", + "id": "7381372e09b55c7a", + "metadata": {}, + "source": [ + "dependencies = {\n", + " \"Organization\": \"Category:OSW1969007d5acf40539642877659a02c23\", # Will fetch: Organization, OrganizationalUnit\n", + " \"Person\": \"Category:OSWd9aa0bca9b0040d8af6f5c091bf9eec7\", # Will fetch: Person\n", + " \"Room\": \"Category:OSWc5ed0ed1e33c4b31887c67af25a610c1\", # Will fetch: Room, Location, but not: Site, Building, Floor\n", + " \"Tutorial\": \"Category:OSW494f660e6a714a1a9681c517bbb975da\", # Will fetch: Tutorial, Article, CreativeWork\n", + "}" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "id": "da879e15b57f0c3e", + "metadata": {}, + "source": [ + "> [!NOTE]\n", + "> Keys in this dictionary will eventually be used in the import statements and should therefore match the auto-generated \n", + "> class names, which are the same as the category's `Machine compatible name`!" + ] + }, + { + "cell_type": "markdown", + "id": "7a8684bb1b8779fd", + "metadata": {}, + "source": [ + "### Install data models\n", + "\n", + "Data models (data classes generated in osw.model.entity) cannot be imported in Python scripts and modules prior to \n", + "installation. Therefore, it is recommended to do this step either in a separate script, which is run before the main\n", + "script, or in the main script itself, before the import statements.\n", + "\n", + "#### Option 1: Install dependencies before import from osw.model.entity \n", + "\n", + "It is recommended to put this option in a separate script, which is run before the main script." 
+ ] + }, + { + "cell_type": "code", + "id": "39260a2e792deb47", + "metadata": {}, + "source": [ + "# Will run every time the script is executed:\n", + "osw_obj.install_dependencies(dependencies)\n", + "\n", + "\n", + "# Static code checker will note 'Module not found' before the installation:\n", + "from osw.model.entity import Organization, Person, Room, Tutorial" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "id": "eb01d61d5c5586ee", + "metadata": {}, + "source": [ + "#### Option 2: Use OswExpress convenience function for imports\n", + "\n", + "It is recommended to put this option in the main script, before the first `from osw.model.entity import` statement." + ] + }, + { + "cell_type": "code", + "id": "a7c8b1233c5817b3", + "metadata": {}, + "source": [ + "from typing import TYPE_CHECKING\n", + "from osw.express import import_with_fallback\n", + "\n", + "\n", + "# Will fetch and install dependencies only if not already installed:\n", + "import_with_fallback(dependencies)\n", + "\n", + "\n", + "# Otherwise static code checker will note 'Module not found' before the installation:\n", + "if TYPE_CHECKING:\n", + " from osw.model.entity import Organization, Person, Room, Tutorial" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "id": "40dc1839eeb1e503", + "metadata": {}, + "source": [ + "### Interact with an entity\n", + "\n", + "Data classes created by the code generator are based on Pydantic models, which provide validation and serialization.\n", + "\n", + "#### Creating an entity\n", + "\n", + "To create an entity, you need to create an instance of the respective data class. The `__init__` method of the data \n", + "class expects keyword arguments for all fields. As usual for Pydantic models, positional arguments are not \n", + "permitted and the input data is validated during initialization." 
+ ] + }, + { + "cell_type": "code", + "id": "fb10f35fd3927b6a", + "metadata": {}, + "source": [ + "# Create a person\n", + "john = Person(\n", + " first_name=\"John\",\n", + " last_name=\"Doe\",\n", + " email=\"john.doe@example.com\"\n", + ")\n", + "# Should raise a ValidationError listing two errors:\n", + "# - surname: field required\n", + "# - email: value is not a valid set" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "id": "2beecba33c4647ce", + "metadata": {}, + "source": [ + "Let's break down what happened here:\n", + "\n", + "- During initialization, the Pydantic model validates the input data. The validation errors are raised as exceptions.\n", + "- The `surname` field is required, but it was not provided.\n", + "- The extra field `last_name` was provided, but it was not expected. By default, Pydantic models disregard extra \n", + "fields without warning.\n", + "- The `email` field is expected to be a set of strings, but a single string was provided. " + ] + }, + { + "cell_type": "code", + "id": "eaaf78110e18c493", + "metadata": {}, + "source": [ + "# Should run without validation errors\n", + "john = Person(\n", + " first_name=\"John\",\n", + " surname=\"Doe\",\n", + " email=[\"john.doe@example.com\"],\n", + ")" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "id": "9aa534b206d8b486", + "metadata": {}, + "source": [ + "Before storing the entity in the OSL instance, let's check under which full page title it will be stored. \n", + "The full page title is derived from the `uuid` and the `namespace` of the entity." 
+ ] + }, + { + "cell_type": "code", + "id": "d861fde54918bc05", + "metadata": {}, + "source": [ + "from osw.utils.wiki import get_namespace, get_osw_id, get_full_title\n", + "\n", + "print(\"Namespace:\", get_namespace(john))\n", + "print(\"UUID:\", john.uuid)\n", + "print(\"OSW-ID:\", get_osw_id(john.uuid))\n", + "print(\"Full title:\", get_full_title(john))" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "id": "f6dace23bf99c7f3", + "metadata": {}, + "source": [ + "#### Storing an entity\n", + "\n", + "We can now store this entity in the OSL instance. The `store_entity` method uploads the entity to the OSL instance. " + ] + }, + { + "cell_type": "code", + "id": "64e0678b77f00772", + "metadata": {}, + "source": [ + "osw_obj.store_entity(john)\n", + "# In this specific case equivalent to:\n", + "params = OswExpress.StoreEntityParam(\n", + " entities=[john],\n", + " namespace=get_namespace(john),\n", + " parallel=False,\n", + " overwrite=\"keep existing\",\n", + " overwrite_per_class=None,\n", + ") # All default values included\n", + "# osw_obj.store_entity(params)" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "id": "78f959815e0fe8f", + "metadata": {}, + "source": [ + "Like most methods and functions in the osw-python library, `store_entity` takes only a single argument. Usually \n", + "either a specific object type or a dedicated params object is accepted. If an object of a type other than the params \n", + "object is passed, it is usually tested for compatibility and put inside a params object, filling the other parameters \n", + "with default values. \n", + "\n", + "This keeps the method signature as simple as possible while still allowing full typing and validation \n", + "of input parameters, so that the method is easy to use. 
" + ] + }, + { + "cell_type": "markdown", + "id": "639db64348bbd905", + "metadata": {}, + "source": [ + "#### Downloading an entity\n", + "\n", + "To download an entity, you need to provide the full page title of the entity. The `load_entity` method downloads the\n", + "entity from the OSL instance and returns an instance of the respective data class. If the respective data class is \n", + "not already part of `osw.model.entity`, the data class is generated on-the-fly by default." + ] + }, + { + "cell_type": "code", + "id": "ea1da2ff6d96808a", + "metadata": {}, + "source": [ + "john2 = osw_obj.load_entity(get_full_title(john))" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "id": "fe620279252b6886", + "metadata": {}, + "source": [ + "Let's have a look at the attributes of the downloaded entity:" + ] + }, + { + "cell_type": "code", + "id": "19d1c6767b2098b", + "metadata": {}, + "source": [ + "from pprint import pprint\n", + "pprint(john2.dict())" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "id": "2ab58db05b3b004b", + "metadata": {}, + "source": [ + "Besides the attributes that we set (first_name, surname, email), the downloaded entity has additional attributes \n", + "that are generated by default, either when the entity is initialized (uuid, meta.wiki_page.title) or loaded (attributes \n", + "with the value `None`). \n", + "\n", + "Loading an entity from the OSL instance downloads the full `jsondata` slot and passes it to the \n", + "`__init__` method of the respective data class. Thereby, attributes not present in the `jsondata` slot are set to the\n", + " default value of the data class." + ] + }, + { + "cell_type": "markdown", + "id": "d140b974b54155ce", + "metadata": {}, + "source": [ + "#### Modifying an entity\n", + "\n", + "To modify an entity, you can change the attributes of the entity instance. 
The attributes can be accessed and modified \n", + "like any other attribute of a Python object." + ] + }, + { + "cell_type": "code", + "id": "c0511efb825acf10", + "metadata": {}, + "source": [ + "# Adding a new attribute\n", + "john2.middle_name = \"R.\"\n", + "# Changing an existing attribute \n", + "john2.email = {\"john.doe@gmx.de\"}\n", + "\n", + "# Checking the changes made:\n", + "pprint(john2.dict())" + ], + "outputs": [], + "execution_count": null + }, + { + "cell_type": "markdown", + "id": "f42c1600ed17a059", + "metadata": {}, + "source": [ + "#### Storing an altered entity\n", + "\n", + "Here the same applies as for [Storing an entity](#Storing-an-entity). However, overwriting entities is not possible with \n", + "the default setting \"keep existing\". Therefore, you need to call the method `store_entity`, passing a `StoreEntityParam` with \n", + "the attribute `overwrite` set to `True`.\n" + ] + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "# Option 1: Overwrite all entities\n", + "osw_obj.store_entity(OSW.StoreEntityParam(entities=[john2], overwrite=True))" + ], + "id": "77b507f13be1235b", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "# Option 2: Overwrite only entities of type Person\n", + "osw_obj.store_entity(OSW.StoreEntityParam(\n", + " entities=[john2], overwrite_per_class=[OSW.OverwriteClassParam(model=Person, overwrite=True)]))" + ], + "id": "19d4d48e95e0fe82", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "# Option 3: Overwrite only the email attribute of entities of type Person\n", + "osw_obj.store_entity(OSW.StoreEntityParam(\n", + " entities=[john2], overwrite_per_class=[\n", + " OSW.OverwriteClassParam(model=Person, overwrite=False, per_property={\"email\": True})]))" + ], + "id": "4fc037a10f95356a", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + 
"source": [ + "Here all three options will have the same result, but in many cases the result will differ, especially if you have \n", + "entities of different classes in the list of entities to store.\n", + "\n", + "The param `overwrite` is applied to all entities handed to the method regardless of type / class. It is also possible\n", + " to specify the overwrite behavior per class, by providing a list of `OSW.OverwriteClassParam`s. Those can even be \n", + " specific down to the property level. \n", + "- Available options for `OSW.StoreEntityParam.overwrite`, `OSW.OverwriteClassParam.overwrite` (per class) and \n", + "`OSW.OverwriteClassParam.per_property` are: \n", + " - `OSW.OverwriteOptions.true`: True - overwrite the remote entity or property with the local one\n", + " - `OSW.OverwriteOptions.false`: False - do not overwrite the remote entity or property with the local one\n", + " - `OSW.OverwriteOptions.only_empty`: \"only empty\" - overwrite the remote entity or property with the local one, \n", + " if the remote entity or property is empty\n", + "- Only available to `OSW.StoreEntityParam.overwrite` and `OSW.OverwriteClassParam.overwrite` (per class) are:\n", + " - `OSW.AddOverwriteClassOptions.replace_remote`: \"replace remote\" - replace the remote entity with the local one and\n", + " remove all properties not present in the local entity\n", + " - `OSW.AddOverwriteClassOptions.keep_existing`: \"keep existing\" - keep the remote entity, if one exists under this \n", + " OSW-ID " + ], + "id": "133a7c6940f23ae6" + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": [ + "### Interact with files \n", + "\n", + "#### Download a file\n", + "\n", + "Let's say you have already uploaded a file to the instance of OSL you are connected to and have the URL to the file \n", + "available. (Execute the Upload a file section to upload a file to the OSL instance.) 
You can download the file with \n", + "just two lines:\n" + ], + "id": "2dc425de23d5e2ce" + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "from osw.express import osw_download_file\n", + "local_file = osw_download_file(\n", + " \"https://wiki-dev.open-semantic-lab.org/wiki/File:OSWaa635a571dfb4aa682e43b98937f5dd3.pdf\"\n", + " # , use_cached=True # Can be used to download the file only once, e.g., when developing code\n", + " # , overwrite=True # Can be used to overwrite an already existing local file\n", + ")" + ], + "id": "9188d8aac00ef04", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": [ + "The object `local_file` is an instance of `OswExpress.DownloadFileResult` and contains the path to the downloaded file, \n", + "which is accessible via:" + ], + "id": "59162da197057ee8" + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": [ + "The class `OswExpress.DownloadFileResult` implements all dunder methods required for a context manager. 
Therefore, it\n", + " can be used with the `with` statement to ensure the file is closed properly after use:" + ], + "id": "5215e1b89b4ba89f" + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "with osw_download_file(\n", + " \"https://wiki-dev.open-semantic-lab.org/wiki/File:OSWac9224e1a280449dba71d945b1581d57.txt\", overwrite=True\n", + ") as file:\n", + " print(file.read())" + ], + "id": "ff674c1413ca6b4b", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": [ + "#### Round trip\n", + "\n", + "Let's create a file, upload it to the OSL instance, download it, read and alter its content, and upload it \n", + "again.\n" + ], + "id": "418b47e58febe97d" + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "from pathlib import Path\n", + "from osw.express import osw_upload_file, osw_download_file\n", + "\n", + "# Create a file\n", + "fp = Path(\"example.txt\")\n", + "with open(fp, \"w\") as file:\n", + " file.write(\"Hello, World!\")\n", + "# Upload the file to an OSL instance\n", + "wiki_file = osw_upload_file(fp, domain=wiki_domain)\n", + "# Delete the local file\n", + "fp.unlink()\n", + "\n", + "# Download the file\n", + "local_file = osw_download_file(wiki_file.url, mode=\"r+\") # mode=\"r+\" to read and write\n", + "\n", + "with local_file as file:\n", + " content = file.read()\n", + " print(\"Original content:\")\n", + " print(content)\n", + " content = content.replace(\"World\", \"OSW\")\n", + " print(\"\\nModified content:\")\n", + " print(content)\n", + " # Rewind and truncate first, otherwise the text is appended after the read position\n", + " file.file_io.seek(0)\n", + " file.file_io.truncate()\n", + " file.write(content)\n", + "\n", + "# Upload the modified file\n", + "modified_wiki_file = osw_upload_file(local_file)" + ], + "id": "14a0ee12dde2961e", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "# Delete WikiFile from OSW instance after you are done with it\n", + "wiki_file.delete()" + ], + "id": 
"9647b490f90d2a0d", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": [ + "### Interface data sources\n", + "\n", + "#### Tabular data: Excel, CSV and others\n", + "\n", + "Let's create a demo table and save it to an Excel file. " + ], + "id": "2095cc63bd256a2c" + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "import pandas as pd\n", + "\n", + "df = pd.DataFrame(\n", + " {\n", + " \"FirstName\": [\"John\", \"Jane\", \"Alice\"],\n", + " \"LastName\": [\"Doe\", \"Do\", \"Dont\"],\n", + " \"Email\": [\"john.doe@example.com\", \"jane.do@example.com\", \"alice.dont@example.com\"],\n", + " }\n", + ")\n", + "df.to_excel(\"demo.xlsx\", index=False)\n", + "del df" + ], + "id": "64154efbfd92f87", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "from pprint import pprint\n", + "# Let's read in our example Excel file\n", + "data_df = pd.read_excel(\"demo.xlsx\")\n", + "# Pandas dict representation is optimal for converting to JSON\n", + "# Let's have a look at the first row\n", + "john_dict = data_df.iloc[0].to_dict()\n", + "pprint(john_dict)" + ], + "id": "ed6cd2984e10baca", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "# Let's convert the dict to a Person instance\n", + "from osw.model.entity import Person\n", + "john = Person(**john_dict)\n", + "# This will cause ValidationError(s):\n", + "# - first_name: field required\n", + "# - surname: field required" + ], + "id": "840449d1e08ea7f2", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": [ + "Explanation: This is due to the dictionary unpacking operator `**`, which passes the dictionary keys as keyword\n", + "arguments to the `Person` class. The `first_name` and `surname` fields are required, but they are not present in the\n", + "dictionary. 
\n", + "\n", + "We have several options to resolve this issue:\n", + "- Replace the dictionary keys with fitting ones\n", + " - By renaming the columns in the DataFrame before converting it to a dictionary\n", + " - By providing a mapping dictionary and replacing the keys in the dictionary, using the following function:\n", + " ```python\n", + " def replace_keys(d, key_map): \n", + " return {key_map.get(k, k): v for k, v in d.items()}\n", + " ```\n", + "- Create a HelperClass, which inherits from the target data class and `osw.data.import_utility.HelperModel`, and \n", + " implements a transformation function to create the target data class instance from the dictionary " + ], + "id": "de2cd483287d86a9" + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": [ + "##### Option 1: Rename columns in the DataFrame\n", + "\n", + "This option is very simple and will serve you well for cases of low complexity, where the datatypes in the DataFrame \n", + "columns already match the datatypes of the target data class." 
+ ], + "id": "d98830f43fd2a405" + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "# Let's print out the columns first\n", + "print(data_df.columns)" + ], + "id": "88760a60e93fc99e", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "# Let's create the mapping dictionary and rename the columns\n", + "mapping = {\n", + " \"FirstName\": \"first_name\",\n", + " \"LastName\": \"surname\",\n", + " \"Email\": \"email\"\n", + "}\n", + "data_df.rename(columns=mapping, inplace=True)\n", + "# Let's have a look at the first row\n", + "john_dict = data_df.iloc[0].to_dict()\n", + "print(john_dict)" + ], + "id": "90f7efe362d1fd52", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "# Let's construct an instance of the Person data model\n", + "john = Person(**john_dict)\n", + "# This will cause a ValidationError:\n", + "# - email: value is not a valid set" + ], + "id": "469e3f2514a16e50", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "# Let's correct the email field\n", + "john_dict[\"email\"] = [john_dict[\"email\"]]\n", + "john = Person(**john_dict)\n", + "pprint(john.dict())" + ], + "id": "9e5c3bc9ff17b2e4", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "markdown", + "source": [ + "##### Option 2: Create a HelperModel and a transformation function\n", + "\n", + "This approach will be able to treat even cases of high complexity, where the datatypes in the DataFrame columns do not \n", + "match the datatypes of the target data class or where references to other instances in a dataset have to be made. \n", + "\n", + "[!NOTE] Property and variable names in Python must not contain spaces, so the column names in the DataFrame have \n", + "to be transformed accordingly." 
+ ], + "id": "7401bbbb658bc631" + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "from typing import Any\n", + "from osw.data.import_utility import HelperModel\n", + "\n", + "class PersonHelper(Person, HelperModel):\n", + " FirstName: Any\n", + " LastName: Any\n", + " Email: Any\n", + " \n", + " def transform_attributes(self, dd: dict = None) -> bool:\n", + " super().transform_attributes()\n", + " self.first_name = self.FirstName\n", + " self.surname = self.LastName\n", + " self.email = {self.Email}\n", + " return True\n", + "\n", + "# Let's create a new instance of the PersonHelper class\n", + "data_df = pd.read_excel(\"demo.xlsx\")\n", + "john_dict = data_df.iloc[0].to_dict()\n", + "john_helper = PersonHelper(**john_dict)\n", + "print(\"Before transformation:\")\n", + "pprint(john_helper.dict())" + ], + "id": "ee7efd2374c3f87e", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "# Let's see the effect of the transformation\n", + "john_helper.transform_attributes()\n", + "print(\"After transformation:\")\n", + "pprint(john_helper.dict())" + ], + "id": "872b9e65d606d544", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "# Actually we access 'casted' directly\n", + "# If the transformation operations had not been performed already,\n", + "# accessing 'casted' would trigger them\n", + "john = john_helper.casted\n", + "print(\"After casting:\")\n", + "pprint(john.dict())" + ], + "id": "2ff76f2a76c097f6", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "# We can do the same for all instances quite easily:\n", + "entities = []\n", + "for ii in data_df.index:\n", + " entities.append(PersonHelper(**data_df.iloc[ii].to_dict()).casted)" + ], + "id": "30e0620c9ef49fdf", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "cell_type": "code", + "source": [ + "# 
And store the entities in the OSL instance\n", + "osw_obj.store_entity(entities)" + ], + "id": "15f2596aff7ac2e0", + "outputs": [], + "execution_count": null + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.4" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/tutorials/img/osw_intro_data_model.png b/docs/tutorials/img/osw_intro_data_model.png new file mode 100644 index 00000000..81d52923 Binary files /dev/null and b/docs/tutorials/img/osw_intro_data_model.png differ diff --git a/docs/tutorials/img/osw_intro_technology_stack.png b/docs/tutorials/img/osw_intro_technology_stack.png new file mode 100644 index 00000000..d63fc178 Binary files /dev/null and b/docs/tutorials/img/osw_intro_technology_stack.png differ diff --git a/examples/use_express_functions.py b/examples/use_express_functions.py index 9024b03f..92298cdb 100644 --- a/examples/use_express_functions.py +++ b/examples/use_express_functions.py @@ -37,7 +37,6 @@ overwrite=True, # Required if file already exists ) local_file_path = local_file.path -local_file.close() # required to release the file lock # Open a file with context manager directly from an OSW instance with osw_download_file( diff --git a/src/osw/controller/file/remote.py b/src/osw/controller/file/remote.py index 5497569a..11a592ea 100644 --- a/src/osw/controller/file/remote.py +++ b/src/osw/controller/file/remote.py @@ -4,7 +4,8 @@ from osw.controller.file.base import FileController from osw.core import model -# TODO: add addional remove file with https://docs.prefect.io/2.11.4/concepts/filesystems/ +# TODO: add additional remove file with +# https://docs.prefect.io/2.11.4/concepts/filesystems/ # 
Note: the order of the base classes is important diff --git a/src/osw/core.py b/src/osw/core.py index a164da58..a067228f 100644 --- a/src/osw/core.py +++ b/src/osw/core.py @@ -1073,13 +1073,13 @@ def store_entity_( meta_category_template, page.get_slot_content("jsondata"), { - "_page_title": entity_title, # legacy + "_page_title": entity_title, # Legacy "_current_subject_": entity_title, }, ) schema = json.loads(schema_str) - # put generated schema in definitions section - # currently only enabled for Characteristics + # Put generated schema in definitions section, + # currently only enabled for Characteristics if hasattr(model, "CharacteristicType") and isinstance( entity_, model.CharacteristicType ): @@ -1091,10 +1091,12 @@ def store_entity_( } schema["title"] = "Generated" + new_schema["title"] schema = new_schema - page.set_slot_content("jsonschema", new_schema) + page.set_slot_content("jsonschema", schema) except Exception as e: - print(f"Schema generation from template failed for {entity_}: {e}") - page.edit() # will set page.changed if the content of the page has changed + print( + f"Schema generation from template failed for {entity_}: {e}" + ) + page.edit() # Will set page.changed if the content of the page has changed if page.changed: if index is None: print(f"Entity stored at '{page.get_url()}'.") diff --git a/src/osw/data/import_utility.py b/src/osw/data/import_utility.py index 37f936fa..b7d137a6 100644 --- a/src/osw/data/import_utility.py +++ b/src/osw/data/import_utility.py @@ -9,6 +9,7 @@ import numpy as np from geopy import Nominatim from jsonpath_ng import ext as jp +from pydantic.v1 import create_model import osw.utils.strings as strutil from osw import wiki_tools as wt @@ -28,13 +29,58 @@ # Classes class HelperModel(model.OswBaseModel): + """Helper class for model transformations. The first base of the inheriting class + should always be the target class and the second base should be this class. 
+ + Example + ------- + >>> class Person(model.OswBaseModel): + ... first_name: str + ... surname: str + ... email: Set[str] + >>> + >>> john_dict = {"FirstName": "John", "LastName": "Doe", + ... "Email": "john.doe@example.com"} + >>> + >>> class PersonHelper(Person, HelperModel): + ... FirstName: Any + ... LastName: Any + ... Email: Any + ... + ... def transform_attributes(self, dd: dict = None) -> bool: + ... super().transform_attributes(dd) + ... self.first_name = self.FirstName + ... self.surname = self.LastName + ... self.email = {self.Email} + ... return True + """ + # Custom attributes attributes_transformed: bool = False references_transformed: bool = False casted_instance: Any = None full_page_title: Optional[str] - def transform_attributes(self, dd: dict) -> bool: + class Config: + arbitrary_types_allowed = True + + def __init_subclass__(cls, **kwargs): + """Will overwrite the annotations and fields of the inheriting class, + defined in the first base class with Optional[Any] annotations. 
This is + necessary to prevent errors when casting to the inheriting class.""" + super().__init_subclass__(**kwargs) + first_base = cls.__bases__[0] + if not issubclass(first_base, model.OswBaseModel): + return None + fields = {name: (Optional[Any], None) for name in first_base.__annotations__} + new_first_base = create_model(first_base.__name__, **fields) + for field_name in new_first_base.__fields__: + if field_name in cls.__fields__: # Replace existing fields + cls.__fields__[field_name] = new_first_base.__fields__[field_name] + if field_name in cls.__annotations__: # Replace existing annotations + cls.__annotations__[field_name] = Optional[Any] + + def transform_attributes(self, dd: dict = None) -> bool: if not self.attributes_transformed: uuid = uuid_module.uuid4() if hasattr(self, "uuid"): @@ -45,7 +91,7 @@ def transform_attributes(self, dd: dict) -> bool: self.attributes_transformed = True return True - def transform_references(self, dd: dict) -> bool: + def transform_references(self, dd: dict = None) -> bool: if not self.attributes_transformed: self.transform_attributes(dd) if not self.references_transformed: @@ -56,14 +102,22 @@ def transform_references(self, dd: dict) -> bool: self.references_transformed = True return True - def cast_to_superclass(self, dd): + def cast_to_superclass(self, dd: dict = None, return_casted: bool = False) -> bool: + """Casts the instance to the superclass of the inheriting class. 
Assumes that + the first base of the inheriting class is the target class.""" if not self.references_transformed: self.transform_references(dd) else: superclass = self.__class__.__bases__[0] self.casted_instance = self.cast_none_to_default(cls=superclass) + if return_casted: + return self.casted_instance return True + @property + def casted(self): + return self.cast_to_superclass(return_casted=True) + # Functions def transform_attributes_and_merge( @@ -89,6 +143,7 @@ def transform_attributes_and_merge( if not inplace: ent = copy.deepcopy(ent) ent_as_dict = copy.deepcopy(ent_as_dict) + # Transform attributes ent, ent_as_dict = loop_and_call_method( entities=ent, method_name="transform_attributes", diff --git a/src/osw/express.py b/src/osw/express.py index d3e90349..d8885075 100644 --- a/src/osw/express.py +++ b/src/osw/express.py @@ -12,6 +12,8 @@ IO, TYPE_CHECKING, Any, + AnyStr, + Buffer, Dict, List, Optional, @@ -178,19 +180,22 @@ def __init__( self.cred_filepath = cred_filepath def __enter__(self): + """Return self when entering the context manager.""" return self def __exit__(self): + """Close the connection to the OSL instance when exiting the context manager.""" self.close_connection() def close_connection(self): + """Close the connection to the OSL instance.""" self.site._site.connection.close() def shut_down(self): + """Makes sure this OSL instance can't be reused after it was shut down, + as the connection can't be reopened except when initializing a new instance.""" self.close_connection() del self - # Make sure this osw instance can't be reused after it was shut down (the - # connection can't be reopened except when initializing a new instance) def install_dependencies( self, @@ -334,8 +339,10 @@ def upload_file( data = {**locals(), **properties} # Clean data dict to avoid passing None values data = {key: value for key, value in data.items() if value is not None} + # Make sure self is passed as osw_express + data["osw_express"] = self # Initialize the 
UploadFileResult object - return UploadFileResult(source=source, osw_express=self, **data) + return UploadFileResult(source=source, **data) class DataModel(OswBaseModel): @@ -350,7 +357,10 @@ class DataModel(OswBaseModel): def import_with_fallback( - to_import: List[DataModel], dependencies: Dict[str, str] = None, domain: str = None + to_import: Union[List[DataModel], Dict[str, str]], + module: str = None, + dependencies: Dict[str, str] = None, + domain: str = None, ): """Imports data models with a fallback to fetch the dependencies from an OSL instance if the data models are not available in the local osw.model.entity module. @@ -359,6 +369,9 @@ def import_with_fallback( ---------- to_import List of DataModel objects to import. + module + (Optional) The module to import the data models from. Used only if to_import + is of type Dict[str, str]. Defaults to 'osw.model.entity' if not specified. dependencies A dictionary with the keys being the names of the dependencies and the values being the full page name of the dependencies. @@ -370,6 +383,18 @@ def import_with_fallback( ------- """ + if isinstance(to_import, dict): + # Assume all DataModels are part of osw.model.entity + if module is None: + module = "osw.model.entity" + to_import = [ + DataModel( + module=module, + class_name=key, + osw_fpt=value, + ) + for key, value in to_import.items() + ] try: for ti in to_import: # Raises AttributeError if the target could not be found @@ -449,7 +474,7 @@ def import_with_fallback( class FileResult(OswBaseModel): url_or_title: Optional[str] = None """The URL or full page title of the WikiFile page.""" - file: Optional[TextIO] = None + file_io: Optional[TextIO] = None """The file object. They type depends on the file type.""" mode: str = "r" """The mode to open the file in. Default is 'r'. 
Implements the built-in open.""" @@ -476,20 +501,45 @@ class FileResult(OswBaseModel): class Config: arbitrary_types_allowed = True - def open(self, mode: str = "r", **kwargs): + def open(self, mode: str = None, **kwargs) -> TextIO: + """Open the file, if not already opened using the 'mode' argument (priority) or + the 'mode' attribute.""" + if mode is None: + mode = self.mode kwargs["mode"] = mode - return open(self.path, **kwargs) + if self.file_io is None or self.file_io.closed: + return open(self.path, **kwargs) + return self.file_io + + def close(self) -> None: + """Close the file, if not already closed.""" + if self.file_io is None or self.file_io.closed: + warn("File already closed or not opened.") + else: + self.file_io.close() - def close(self): - self.file.close() + def read(self, n: int = -1) -> AnyStr: + """Read the file. If n is not specified, the entire file will be read. + If the file is not already opened, it will be opened.""" + if self.file_io is None or self.file_io.closed: + self.file_io = self.open(mode="r") + return self.file_io.read(n) - def read(self, *args, **kwargs): - return self.file.read(*args, **kwargs) + def write(self, s: Union[Buffer, AnyStr]): + """Write to the file. 
If the file is not already opened, it will be opened.""" + if self.file_io is None or self.file_io.closed: + self.file_io = self.open(mode="w") + return self.file_io.write(s) def __enter__(self): + """Open the file when entering the context manager.""" + if self.file_io is None or self.file_io.closed: + self.file_io = self.open() return self def __exit__(self, exc_type, exc_value, traceback): + """Close the file when exiting the context manager, and deletes the file if + 'delete_after_use' was set.""" self.close() if self.delete_after_use and self.path.exists(): self.path.unlink() @@ -505,6 +555,14 @@ def process_init_data(self, data: Dict[str, Any]) -> Dict[str, Any]: if data.get(key) is None: data[key] = value # Do replacements + if ( + data.get("label") == InMemoryController.__fields__["label"].default + or data.get("label") == LocalFileController.__fields__["label"].default + or data.get("label") == WikiFileController.__fields__["label"].default + ): + # Make sure that the label is not set to the default value, it will be + # set by the source file controller + del data["label"] if data.get("cred_filepath") is None: data["cred_filepath"] = cred_filepath_default.get_default() if not data.get("cred_filepath").parent.exists(): @@ -603,8 +661,7 @@ def __init__(self, url_or_title, **data): data = {key: value for key, value in data.items() if value is not None} super().__init__(**{**lf.dict(), **data}) self.put_from(wf) - # Do open - self.file = self.open(mode=data.get("mode")) + # File is only opened at request to avoid locking the file def osw_download_file( @@ -777,10 +834,18 @@ def __init__( ) # Create an osw_express object if not given if data.get("osw_express") is None: - data["osw_express"] = OswExpress( - domain=data.get("domain"), - cred_mngr=data.get("cred_mngr"), - ) + create_new = True + # Try to get the osw_express object from the source_file_controller + if data.get("source_file_controller") is not None: + if hasattr(data["source_file_controller"], 
"osw_express"): + create_new = False + data["osw_express"] = data["source_file_controller"].osw_express + # Otherwise create a new osw_express object + if create_new: + data["osw_express"] = OswExpress( + domain=data.get("domain"), + cred_mngr=data.get("cred_mngr"), + ) # If given set titel and namespace if data.get("target_fpt") is not None: namespace = data.get("target_fpt").split(":")[0] @@ -883,6 +948,7 @@ def osw_upload_file( OswExpress.update_forward_refs() + # todo: # * create a .gitignore in the basepath that lists the default credentials file ( # accounts.pwd.yaml) OR append to an existing .gitignore# diff --git a/src/osw/model/static.py b/src/osw/model/static.py index fbe73d18..bb4ea65f 100644 --- a/src/osw/model/static.py +++ b/src/osw/model/static.py @@ -68,7 +68,8 @@ def test_if_empty_list_or_none(obj) -> bool: k: v for k, v in self.dict().items() if not test_if_empty_list_or_none(v) } combined_args = {**self_args, **kwargs} - del combined_args["type"] + if "type" in combined_args: + del combined_args["type"] return cls(**combined_args) diff --git a/src/osw/utils/strings.py b/src/osw/utils/strings.py index 7b49de92..639aa228 100644 --- a/src/osw/utils/strings.py +++ b/src/osw/utils/strings.py @@ -52,6 +52,9 @@ class RegExPatternExtended(OswBaseModel): used to test the pattern by asserting list(self.example_match.groups.values()) == self.expected_groups""" + class Config: + arbitrary_types_allowed = True + def __init__(self, **data): super().__init__(**data) if isinstance(self.pattern, str): @@ -113,14 +116,14 @@ def test_pattern(self) -> bool: else: return False - class Config: - arbitrary_types_allowed = True - class MatchResult(OswBaseModel): match: Union[re.Match, None] pattern: Union[RegExPatternExtended, None] + class Config: + arbitrary_types_allowed = True + @property def groups(self): """Return a dictionary representation of the object, enabling accessing the @@ -133,9 +136,6 @@ def groups(self): for key in keys } - class Config: - 
arbitrary_types_allowed = True - class SearchResult(MatchResult): pass diff --git a/src/osw/utils/wiki.py b/src/osw/utils/wiki.py index 7d137d8b..4c69241d 100644 --- a/src/osw/utils/wiki.py +++ b/src/osw/utils/wiki.py @@ -12,11 +12,11 @@ def get_osw_id(uuid: UUID) -> str: Parameters ---------- uuid - uuid object, e. g. UUID("2ea5b605-c91f-4e5a-9559-3dff79fdd4a5") + A UUID object, e.g., UUID("2ea5b605-c91f-4e5a-9559-3dff79fdd4a5") Returns ------- - OSW-ID string, e. g. OSW2ea5b605c91f4e5a95593dff79fdd4a5 + OSW-ID string, e.g., OSW2ea5b605c91f4e5a95593dff79fdd4a5 """ return "OSW" + str(uuid).replace("-", "") @@ -27,11 +27,11 @@ def get_uuid(osw_id) -> UUID: Parameters ---------- osw_id - OSW-ID string, e. g. OSW2ea5b605c91f4e5a95593dff79fdd4a5 + OSW-ID string, e.g., OSW2ea5b605c91f4e5a95593dff79fdd4a5 Returns ------- - uuid object, e. g. UUID("2ea5b605-c91f-4e5a-9559-3dff79fdd4a5") + A UUID object, e.g., UUID("2ea5b605-c91f-4e5a-9559-3dff79fdd4a5") """ return UUID(osw_id.replace("OSW", ""))
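
The two helpers whose docstrings are touched in `src/osw/utils/wiki.py` are inverses of each other. A minimal standalone sketch of that round trip follows, assuming only the function bodies visible in the hunk context above. The functions are re-declared here so the snippet runs without `osw` installed, and the `str` annotation on `osw_id` is an addition for illustration.

```python
from uuid import UUID


def get_osw_id(uuid: UUID) -> str:
    """Build an OSW-ID by prefixing the dash-free UUID string with 'OSW'."""
    return "OSW" + str(uuid).replace("-", "")


def get_uuid(osw_id: str) -> UUID:
    """Recover the UUID by stripping the 'OSW' prefix from an OSW-ID."""
    return UUID(osw_id.replace("OSW", ""))


# Example value taken from the docstrings in the diff
example = UUID("2ea5b605-c91f-4e5a-9559-3dff79fdd4a5")
osw_id = get_osw_id(example)
print(osw_id)  # OSW2ea5b605c91f4e5a95593dff79fdd4a5
print(get_uuid(osw_id) == example)  # True
```

`UUID()` accepts the 32-digit hexadecimal form without dashes, which is why `get_uuid` only needs to drop the prefix.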