docs(ingest): Add accepted file formats to documentation (DEV-677) (#2038)

* add accepted file formats

* Update data-formats.md

* update sipi path

* rename data-formats.md to file-formats.md

* update index.md

* fix footnote

* reset scala setting
irinaschubert committed Apr 12, 2022
1 parent 521150f commit f72e7a01e9396430e88c75c1ab3ec2743e6cf053
Showing with 34 additions and 36 deletions.
  1. +1 −1 .scalafmt.conf
  2. +0 −22 docs/01-introduction/data-formats.md
  3. +24 −0 docs/01-introduction/file-formats.md
  4. +1 −1 docs/01-introduction/index.md
  5. +4 −8 docs/01-introduction/what-is-knora.md
  6. +3 −3 docs/faq/index.md
  7. +1 −1 mkdocs.yml
@@ -1,6 +1,6 @@
version = "2.7.5"
maxColumn = 120
-align.preset = most
+align.preset = some
align.multiline = false
continuationIndent.defnSite = 2
assumeStandardLibraryStripMargin = true

This file was deleted.

@@ -0,0 +1,24 @@
<!---
* Copyright © 2021 - 2022 Swiss National Data and Service Center for the Humanities and/or DaSCH Service Platform contributors.
* SPDX-License-Identifier: Apache-2.0
-->

# File Formats in DSP-API

Currently, only a limited number of file formats can be uploaded to DSP. Some metadata is extracted from the files during ingest, but the file formats are not validated. Only image file formats are currently converted into another format. Both the converted version of the file and the original are kept.

The following table shows the accepted file formats:

| Category | Accepted format | Converted during ingest? |
| --------------------- | ------------------------- | -------------------------------------------------------------------------- |
| Text, XML<sup>1</sup> | TXT, XML, XSL, XSD | No |
| Tables | CSV, XLS, XLSX | No |
| 2D Images | JPEG, PNG, TIFF, JP2 | Yes, converted to JPEG 2000 by [Sipi](https://github.com/dasch-swiss/sipi) |
| Audio | MPEG (MP3), MP4, WAV | No |
| Video | MP4 | No |
| Office | PDF, DOC, DOCX, PPT, PPTX | No |
| Archives | ZIP, TAR, ISO, GZIP, 7Z | No |


1: If your XML files represent text with markup (e.g. [TEI/XML](http://www.tei-c.org/)),
the recommended approach is to allow Knora to store it as [Standoff/RDF](standoff-rdf.md).
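
The table in `file-formats.md` can be read as a lookup from format name to category and conversion behaviour. The following is a minimal Python sketch of that reading, not part of the commit and not DSP-API code: the names `FormatRow`, `ACCEPTED_FORMATS`, `categories_for`, `is_accepted`, and `is_converted` are hypothetical, and "MPEG (MP3)" is abbreviated to `MP3` for the lookup.

```python
from typing import List, NamedTuple, Tuple


class FormatRow(NamedTuple):
    """One row of the accepted-formats table (hypothetical structure)."""
    category: str
    formats: Tuple[str, ...]
    converted: bool  # True if the table says the format is converted during ingest


# Rows transcribed from the table; "MPEG (MP3)" is written as "MP3" here.
ACCEPTED_FORMATS: Tuple[FormatRow, ...] = (
    FormatRow("Text, XML", ("TXT", "XML", "XSL", "XSD"), False),
    FormatRow("Tables", ("CSV", "XLS", "XLSX"), False),
    FormatRow("2D Images", ("JPEG", "PNG", "TIFF", "JP2"), True),
    FormatRow("Audio", ("MP3", "MP4", "WAV"), False),
    FormatRow("Video", ("MP4",), False),
    FormatRow("Office", ("PDF", "DOC", "DOCX", "PPT", "PPTX"), False),
    FormatRow("Archives", ("ZIP", "TAR", "ISO", "GZIP", "7Z"), False),
)


def categories_for(fmt: str) -> List[str]:
    """Return every table category that lists the given format name."""
    name = fmt.strip().upper()
    return [row.category for row in ACCEPTED_FORMATS if name in row.formats]


def is_accepted(fmt: str) -> bool:
    """A format is accepted if it appears in at least one row."""
    return bool(categories_for(fmt))


def is_converted(fmt: str) -> bool:
    """True if any row listing the format is marked as converted during ingest."""
    name = fmt.strip().upper()
    return any(row.converted for row in ACCEPTED_FORMATS if name in row.formats)
```

Note that MP4 is listed under both Audio and Video, so `categories_for` returns a list instead of a single category. The sketch works on format names as written in the table; mapping real file extensions (for example `.jpg` or `.gz`) to these names would be a further assumption.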
@@ -6,6 +6,6 @@
# Introduction

* [What Is DSP and DSP-API (previous Knora)?](what-is-knora.md)
-* [Data Formats in DSP-API](data-formats.md)
+* [File Formats in DSP-API](file-formats.md)
* [Standoff/RDF Text Markup](standoff-rdf.md)
* [An Example Project](example-project.md)
@@ -23,15 +23,11 @@ DSP solves this problem by keeping the data alive. You can query all the data
in a DSP repository, not just the metadata. You can import thousands of databases into
DSP, and run queries that search through all of them at once.

-Another problem is that researchers use a multitude of different data formats, many of
+Another problem is that researchers use a multitude of different file formats, many of
which are proprietary and quickly become obsolete. It is not practical to maintain
-all the programs that were used to create and read old data files, or even
-all the operating systems that these programs ran on.
-
-Instead of preserving all these data formats, DSP supports
-the conversion of all sorts of data to a [small number of formats](data-formats.md)
-that are suitable for long-term preservation, and that maintain the data's meaning and
-structure:
+all the programs that were used to create and read old files, or even
+all the operating systems that these programs ran on. Therefore, DSP only accepts a
+certain number of [file formats](file-formats.md).

- Non-binary data is stored as
[RDF](http://www.w3.org/TR/2014/NOTE-rdf11-primer-20140624/), in a dedicated
@@ -5,11 +5,11 @@

# Frequently Asked Questions

-## Data Formats
+## File Formats

-### What data formats does Knora store?
+### What file formats does Knora store?

-See [Data Formats in Knora](../01-introduction/data-formats.md).
+See [File Formats in Knora](../01-introduction/file-formats.md).

### Does Knora store XML files?

@@ -10,7 +10,7 @@ nav:
- Introduction:
- Index: 01-introduction/index.md
- What is DSP?: 01-introduction/what-is-knora.md
-- Data Formats in DSP-API: 01-introduction/data-formats.md
+- File Formats in DSP-API: 01-introduction/file-formats.md
- Standoff/RDF Text Markup: 01-introduction/standoff-rdf.md
- An Example Project: 01-introduction/example-project.md
- DSP Ontologies:
