feat: import lists from excel (DSP-1341) (#48)
* Removing GUI code

* Cleanup

* cleanup

* Added logging

* Bug fix: XML parser no longer incremental

* Bugfixing...

* Bugfix

* Removed printouts

* Ongoing improvements

* Fix: Problem after installing with pip

* Pimped up version to 0.9.12

* Added some fixes regarding knora-api: properties

* some small fixes

* Fixing test data

* Adding testing data

* versioning

* Docu update

* Bugfix for breaking change in dsp-api (concerning lists)

* Added support for lists defined in excel (1. step)

* ...

* Adding tests

* ...

* Test and a bit of docu

* Adapted setup.py to use openpyxl

* Adapted documentation to latest development

* Added documentation

* excel-list node names from label

* Cross-references in documentation

* Documentation and small bugfix reading excel

* Updated version number and documentation

* Push version to 1.0.0

* The Big Cleanup

* Remove .DS_Store from everywhere

* type fix

* Cleanup pp

* chore(ci): bump ubuntu version

* chore(ci): fix dependencies

* chore(ci): fix dependencies

Co-authored-by: BalduinLandolt <balduin.landolt@hotmail.com>
Co-authored-by: Ivan Subotic <400790+subotic@users.noreply.github.com>
3 people committed Feb 17, 2021
1 parent 03bfa82 commit 362899214c850e6c3f613a3cbff29ab2294dccfb
Showing with 1,951 additions and 6,474 deletions.
  1. +2 −2 .github/workflows/main.yml
  2. +1 −0 .gitignore
  3. +161 −79 docs/dsp-tools-create.md
  4. +13 −0 docs/dsp-tools-excel.md
  5. +8 −48 docs/dsp-tools-xmlupload.md
  6. +0 −78 docs/dsp-tools.md
  7. BIN docs/img.png
  8. BIN docs/img_1.png
  9. +106 −30 docs/index.md
  10. +0 −23 examples/README.md
  11. BIN examples/example.tif
  12. +0 −58 examples/example_create_resource.py
  13. 0 knora/{BUILD → BUILD.bazel}
  14. +0 −717 knora/BiZ-onto.json
  15. BIN knora/DaSCH_Logo_RGB.png
  16. +0 −84 knora/MLS-import-libraries.py
  17. +0 −2,056 knora/MLS-onto.json
  18. +0 −224 knora/anything-test-data.xml
  19. BIN knora/bitstreams/TEMP11.TIF
  20. BIN knora/bitstreams/TEMP12.TIF
  21. BIN knora/bitstreams/TEMP13.TIF
  22. BIN knora/bitstreams/TEMP14.TIF
  23. BIN knora/bitstreams/TEMP15.TIF
  24. +0 −1 knora/bitstreams/test.csv
  25. BIN knora/bitstreams/test.pdf
  26. +37 −6 knora/dsp_tools.py
  27. 0 knora/dsplib/BUILD
  28. 0 examples/BUILD → knora/dsplib/BUILD.bazel
  29. 0 knora/dsplib/models/{BUILD → BUILD.bazel}
  30. +15 −4 knora/dsplib/models/resource.py
  31. +20 −1 knora/dsplib/models/value.py
  32. +0 −77 knora/dsplib/utils/BUILD
  33. +98 −0 knora/dsplib/utils/BUILD.bazel
  34. +31 −4 knora/dsplib/utils/knora-schema.json
  35. +91 −0 knora/dsplib/utils/onto_commons.py
  36. +27 −1 knora/dsplib/utils/onto_create_lists.py
  37. +31 −1 knora/dsplib/utils/onto_create_ontology.py
  38. +43 −0 knora/dsplib/utils/onto_process_excel.py
  39. +33 −0 knora/dsplib/utils/onto_validate.py
  40. +0 −1 knora/dsplib/utils/xml_upload.py
  41. +0 −15 knora/dsplib/widgets/BUILD
  42. 0 knora/dsplib/widgets/__init__.py
  43. +0 −80 knora/dsplib/widgets/doublepassword.py
  44. +0 −84 knora/gaga.json
  45. BIN knora/icons/Betreuungszusage Doktorat Philosophisch Historische Fakultaet_Béatrice Gauvain Kopie.pdf
  46. BIN knora/icons/favicon-16x16.png
  47. BIN knora/icons/favicon-32x32.png
  48. +0 −1,743 knora/mls.json
  49. +0 −99 knora/test.py
  50. +0 −76 knora/testit.py
  51. +0 −633 knora/xml2knora.py
  52. +3 −3 mkdocs.yml
  53. +1 −0 requirements.txt
  54. +3 −2 setup.py
  55. +36 −0 test/{BUILD → BUILD.bazel}
  56. +0 −27 test/lists.json
  57. +0 −217 test/test-onto.json
  58. BIN test/test.tif
  59. +78 −0 test/test_resource.py
  60. +57 −0 test/test_tools.py
  61. +17 −0 testdata/BUILD.bazel
  62. +1,005 −0 testdata/anything.json
  63. +21 −0 testdata/bitstreams/BUILD.bazel
  64. BIN testdata/bitstreams/TEMP11.TIF
  65. BIN testdata/bitstreams/TEMP12.TIF
  66. BIN testdata/bitstreams/TEMP13.TIF
  67. BIN testdata/bitstreams/TEMP14.TIF
  68. BIN testdata/bitstreams/TEMP15.TIF
  69. BIN {knora → testdata}/bitstreams/clara.wav
  70. +1 −0 testdata/bitstreams/test.csv
  71. BIN testdata/bitstreams/test.pdf
  72. BIN {knora → testdata}/bitstreams/test.zip
  73. BIN testdata/list-as-excel.xlsx
  74. +2 −0 {knora → testdata}/test-data.xml
  75. +10 −0 {knora → testdata}/test-onto.json
@@ -8,7 +8,7 @@ env:
jobs:
test-integration:
name: Integration Tests
runs-on: ubuntu-latest
runs-on: ubuntu-20.04
steps:
- name: Checkout source
uses: actions/checkout@v1
@@ -31,7 +31,7 @@ jobs:
with:
python-version: 3.9
- name: Install python package dependencies
run: sudo apt-get install libxml2-dev libxslt-dev python3-dev libgtk-3-dev libgstreamer1.0-0 gstreamer1.0-plugins-base freeglut3-dev libwebkitgtk-1.0-0 libjpeg-dev libpng-dev libtiff-dev libsdl-dev libnotify-dev libsm-dev
run: sudo apt-get install libxml2-dev libxslt-dev python3-dev libgstreamer1.0-0 gstreamer1.0-plugins-base freeglut3-dev libjpeg-dev libpng-dev libtiff-dev libsdl-dev libnotify-dev libsm-dev
- name: run test-integration
run: |
make upgrade-dist-tools
@@ -1,4 +1,5 @@
.tmp
**/.DS_Store

# Byte-compiled / optimized / DLL files
__pycache__/
@@ -1,17 +1,25 @@
# JSON ontology definition format
# JSON data model definition format

## Introduction
This document contains all the information you need to create an ontology that's used by DSP.

In the first section you find a rough overview of the ontology definition, all the necessary components with a
This document contains all the information you need to create a data model that's used by DSP. According to
Wikipedia, a [data model](https://en.wikipedia.org/wiki/Data_model) is "_an abstract model that organizes elements
of data and standardizes how they relate to one another and to the properties of real-world entities._" Further it
states: "_A data model explicitly determines the structure of data. Data models are typically specified by a data
specialist, data librarian, or a digital humanities scholar in a data modeling notation_". In this section we will
describe one of the notations that is used by _dsp-tools_ to create a data model in the DSP repository. The DSP
repository is loosely based on [Linked Open Data](https://en.wikipedia.org/wiki/Linked_data), where the term
_"ontology"_ is also used for the data model. It should be noted that in this context an ontology is not used in the
philosophical sense.

In the first section you find a rough overview of the data model definition, all the necessary components with a
definition and a short example of the definition.

## A short overview
In the following section, you find all the mentioned parts with a detailed explanation. Right at the beginning we look
at the basic fields that belong to an ontology definition. This serves as an overview for you to which you can return
at any time while you read the description.

A complete ontology definition looks like this:
A complete data model definition looks like this:

```json
{
@@ -147,56 +155,67 @@ as well e.i. "keywords": [].
`"lists": [<list-definition>,<list-definition>,...]`

Often in order to characterize or classify a real world object, we use a sequential or hierarchical list of terms. For
example a hypothetical classification of classical music genres could be as :

- Orchestral music
- Symphony
- Symphony poem
- Overture
- Concerto
- Ballet
- Incidential music
- Suite
- Chamber music
- String trio
- Piano trio
- String quartet
- Piano quartet
- String quintet
- Piano quintet
- Other
- Solo instrumental
- Organ
- Piano
- Harpsichord
- Spinet
- Guitar
- Lute
- Violin
- Flute
- Other
- Vocal Music
- Choir
- Oratorios
- Passions
- Cantatas
- Masses
- Motets
- Madrigals
- Psalms
- Solo
- Songs
- Arias
- Opera
- Comic opera
- Serious Opera
- Opera Semiseria
- Opera Conrnique
- Grand opera
- Opera verismo
example, a classification of disciplines in the Humanities might look as follows:

- Performing arts
    - Music
        - Chamber music
        - Church music
        - Conducting
            - Choirs
            - Orchestras
        - Music history
        - Music theory
        - Musicology
        - Jazz
        - Pop/Rock
    - Dance
        - Choreography
    - Theatre
        - Acting
        - Directing
        - Playwriting
        - Scenography
    - Movies/Television
        - Animation
        - Live action
- Visual arts
    - Fine arts
        - Drawing
        - Painting
        - Photography
    - Applied Arts
        - Animation
        - Architecture
        - Decorative arts
- History
    - Ancient history
    - Modern history
- Languages and literature
    - Linguistics
        - Grammar
        - Etymology
        - Phonetics
        - Semantics
    - Literature
        - Fiction
        - Non-fiction
        - Theory of literature
- Philosophy
    - Aesthetics
    - Applied philosophy
    - Epistemology
        - Justification
        - Reasoning
    - Metaphysics
        - Determinism and free will
        - Ontology
        - Philosophy of mind
        - Teleology


DSP allows you to define such controlled vocabularies or thesauri. They can be arranged "flat" or in "hierarchies" (as the
given example about music genres is). The definition of these entities are called "lists" in the DSP. Thus, the
given example about the disciplines in the Humanities is). These entities are called "lists" in the DSP. Thus, the
list object is used to give the resources of the ontology a taxonomic quality. A taxonomy makes it possible to
categorize a resource. The big advantage of a taxonomic structure as it is implemented by the DSP
is that the user can subcategorize the objects. This allows the user to formulate his search requests more or less
@@ -230,7 +249,8 @@ therefore flat.
A resource can be assigned to a taxonomic node within its properties. So a resource of type "musical work" with the
title "La Traviata" would have the property/attribute "musical-genre" with the value "Grand opera". Within the DSP,
each property or attribute has an assigned cardinality. Sometimes, a taxonomy allows that an object may belong to
different categories at the same time (e.g. an image which depicts several categories at the same time). In these cases, a cardinality &gt; 1 allows to add multiple attributes
different categories at the same time (e.g. an image which depicts several categories at the same time). In these cases,
a cardinality > 1 allows multiple attributes
of the same type to be added. See the description of the [cardinalities](#cardinalities) further below.

A node of the Taxonomy may have the following elements:
@@ -243,50 +263,112 @@ It needs to specify at least one language.
is _optional_.
- _nodes_: Array of subnodes. If you have a non-hierarchical taxonomy (i.e. a taxonomy with only 2 levels, the root
level and another level), you don't have child nodes. Therefore the nodes element can be omitted in case of a flat
taxonomy.
taxonomy.

Each list must have exactly one root node, which has the same form but denotes the list itself.

Here is an example on how to build a taxonomic structure with the help of JSON:

```json
"lists": [
"lists": [
{
"name": "my_list",
"labels": {"en": "Disciplines of the Humanities"},
"comments": {"en": "This list is just a silly example", "fr": "un exemple un peu fou"},
"nodes": [
{
"name": "classicalmusicgenres",
"labels": { "de": "Musikkategorien für klassische Musik", "en": "Genres of classical music" },
"name": "node_1_1",
"labels": {"en": "Performing arts"},
"comments": {"en": "Arts that are events", "de": "Künste mit performativem Character"},
"nodes": [
{
"name": "orchestral",
"labels": { "en": "Orchestral music", "de": "Orchestermusik" },
"comments": { "en": "Multiple instruments together", "de": "Mehrere Instrumente zusammen" },
{
"name": "node_2_2",
"labels": {"en": "Music"},
"nodes": [
{
"name": "symphony",
"labels": { "en": "Symphony", "de": "Symphonie" }
"name": "node_3_3",
"labels": {"en": "Chamber music"}
},
{
"name": "node_4_3",
"labels": {"en": "Church music"}
},
{
"name": "node_5_3",
"labels": {"en": "Conducting"},
"nodes": [
{
"name": "node_6_4",
"labels": {"en": "Choirs"}
},
{
"name": "node_7_4",
"labels": {"en": "Orchestras"}
}
]
},
{
"name": "symphonicpoem",
"labels": { "en": "Symphonic poem", "de": "Symphonische Dichtung" }
"name": "node_8_3",
"labels": { "en": "Music history" }
},
{
"name": "overture",
"labels": { "en": "Overture", "de": "Overtüre" }
"name": "node_9_3",
"labels": {"en": "Music theory"}
},
{
"name": "concerto",
"labels": { "en": "Conerto", "de": "Konzert" }
"name": "node_10_3",
"labels": {"en": "Musicology"}
},
...
{
"name": "node_11_3",
"labels": {"en": "Jazz"}
},
{
"name": "node_12_3",
"labels": {"en": "Pop/Rock/Blues"}
}
]
},
{
"name": "chambermusic",
"labels": { "en": "Chamber music", "de": "Kammermusik" },
"nodes": [...]
},
...
}
]
}
},
{...},{...}
]
}
]
```
#### Lists from Excel
A list can also be imported from an Excel sheet. The Excel file must have the following format (currently only a single
language is supported):

![img_1.png](img_1.png)

In such a case, the Excel file can be referenced directly in the list definition by defining a special list node:
```json
{
"name": "fromexcel",
"labels": {
"en": "Fromexcel"
},
"nodes": {
"file": "excel-list.xlsx",
"worksheet": "Tabelle1"
}
}
```
The nodes section must then contain the following fields:

- _file_: Path to the Excel file
- _worksheet_: The name of the worksheet in the Excel file

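As a rough illustration, the two forms of `nodes` could be told apart as sketched below. This assumes, as in the examples in this section, that inline child nodes are given as a JSON array and an Excel reference as an object with the string fields `file` and `worksheet`. The helper name `is_excel_reference` is hypothetical and is not part of dsp-tools.

```python
from typing import Any


def is_excel_reference(nodes: Any) -> bool:
    """Return True if `nodes` looks like an Excel reference.

    Hypothetical helper: an Excel reference is assumed to be a dict whose
    `file` and `worksheet` entries are both non-empty strings. Anything else
    (for example a list of inline child nodes) is not treated as a reference.
    """
    if not isinstance(nodes, dict):
        return False
    for key in ("file", "worksheet"):
        value = nodes.get(key)
        if not isinstance(value, str) or not value:
            return False
    return True


# Values taken from the list definition shown above.
excel_nodes = {"file": "excel-list.xlsx", "worksheet": "Tabelle1"}
inline_nodes = [{"name": "node_1_1", "labels": {"en": "Performing arts"}}]

print(is_excel_reference(excel_nodes))
print(is_excel_reference(inline_nodes))
```

A check like this would let a tool decide whether to read the child nodes from the JSON itself or to load them from the referenced worksheet.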
The node names are composed from the label by concatenating the words in the label, with the first word starting with a
lower case character and the other words starting with an upper case character. So the label `Chamber music` would
become the name `chamberMusic`. _Please note that the resulting name must be unique within a list. If the same label is
used several times in a hierarchical list, the node name is extended by appending underscores (`_`) until the name is
unique_.
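The naming rule can be sketched in Python as follows. This is an illustration only, not the dsp-tools implementation: the function name `label_to_node_name` is hypothetical, and the sketch assumes that words are separated by whitespace and that only the first character of each word is changed.

```python
from typing import Set


def label_to_node_name(label: str, used_names: Set[str]) -> str:
    """Derive a node name from a label and register it in `used_names`.

    Assumptions (not confirmed by the documentation): words are split on
    whitespace, and only the first character of each word is re-cased.
    """
    words = label.split()
    if not words:
        raise ValueError("label must contain at least one word")
    # First word starts lower case, every following word starts upper case.
    first = words[0][0].lower() + words[0][1:]
    rest = [word[0].upper() + word[1:] for word in words[1:]]
    name = first + "".join(rest)
    # Append underscores until the name is unique within this list.
    while name in used_names:
        name += "_"
    used_names.add(name)
    return name


used: Set[str] = set()
print(label_to_node_name("Chamber music", used))  # chamberMusic
print(label_to_node_name("Chamber music", used))  # chamberMusic_
```

Keeping one set of used names per list matches the statement that uniqueness is required within a single list, so two different lists may reuse the same node name.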
As already mentioned before, the _lists_ element is optional. If there are no lists, this element has to be omitted.
### Groups
@@ -0,0 +1,13 @@
[![PyPI version](https://badge.fury.io/py/knora.svg)](https://badge.fury.io/py/knora)

# DSP tools to use Excel-files for data modelling and data import
dsp-tools can read and process Excel files directly and output the appropriate JSON and/or XML files for
importing data into the DSP repository.

## Flat and hierarchical lists
Lists or "controlled vocabularies" are sets of fixed terms that are used to characterize something. Hierarchical lists
correspond to classifications or taxonomies.

The format of the Excel file is described [here](./dsp-tools-create.md#lists-from-excel).

