5 Levels of data reusability

Tim Berners-Lee's 5-star Open Data is a really cool mental model for thinking about open data quality. Check out the website if you haven't seen it: https://5stardata.info/en/

But I think it doesn't quite fit what most developers would consider _usable_ data, so it might make sense to provide a different list that focuses on _data reusability_.

Mostly, it lacks _typed_ data as a distinguishing criterion: whether the data has a machine-readable schema. Personally, I think this is one of the most important characteristics. It's probably one of the main reasons why SQL is so incredibly popular, and why pretty much all programming languages have things like structs or classes with (type-safe) properties. But not all data has this, so I think it deserves its own level.

Also, we can introduce _verifiability_ of data, powered by Atomic Commits (or any other technology that does something similar).

I'm not sure whether we should call it '5 levels'; it's definitely not as catchy as '5 stars'. I'm also not fully certain about 'usability', but I think it describes what I mean pretty well.

Anyway, here's a work-in-progress draft. Feel free to share ideas, criticism or thoughts!

========

# 5 Levels of data reusability

Not all data are created equal.
There are notable differences in how much you can do with data and how much effort it takes.
The more reusable data is, the easier it is for developers, researchers and other data users to work with.
Reusability is about being able to transform, sort, query, serialize, modify, render and audit data without too much work.

_This list is inspired by Tim Berners-Lee's [5-star open data](https://5stardata.info/en/)_.

## Level 0: proprietary data

If you don't give others the _rights_ to read, use or modify your data, its reusability is zero.

That's why it's important to have a _license_ that allows others to use your data.
A good choice is the [Open Database License](https://opendatacommons.org/licenses/odbl/summary/), an open license with share-alike terms.
Creative Commons licenses are also good options for clearly communicating _if_, and if so _how_, your data may be reused.

It's also important to use _open formats_ (such as `CSV`, `JSON` or `PNG`), instead of _proprietary formats_ (tied to specific vendors, such as `PSD` or `RAR`).


## Level 1: unstructured data

_Examples: images, videos, plain text_

Unstructured data is the least usable.
Humans can read it, and AI and machine learning systems can extract more from it than ever,
but it's hard to build an actual application or graphic from only unstructured data.

```
Hi! I'm Joep, I'm born in 1991.
```
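To show how much guesswork unstructured data needs, here is a minimal Python sketch that pulls a name and birth year out of the sentence above using regular expressions. The `extract_person` helper and its patterns are hypothetical and only match this exact phrasing. Any other wording breaks them, and that fragility is the point.

```python
import re


def extract_person(text: str) -> dict[str, object]:
    """Hypothetical ad-hoc parser for sentences like "Hi! I'm Joep, I'm born in 1991."."""
    # Assumes the name directly follows "I'm " and starts with a capital letter.
    name_match = re.search(r"I'm ([A-Z][a-z]+)", text)
    # Assumes the year is written as four digits after "born in".
    year_match = re.search(r"born in (\d{4})", text)
    return {
        "name": name_match.group(1) if name_match else None,
        "birthYear": int(year_match.group(1)) if year_match else None,
    }


print(extract_person("Hi! I'm Joep, I'm born in 1991."))
# {'name': 'Joep', 'birthYear': 1991}
print(extract_person("Joep here, born nineteen ninety-one."))
# {'name': None, 'birthYear': None}
```

The same facts, phrased slightly differently, produce nothing usable. That is why building applications directly on unstructured data is so costly.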

## Level 2: structured data

_Examples: CSV, XML, JSON, TOML, Excel_

Structured data can be read by machines, and this allows us to do all sorts of useful things.
We can _query_, _sort_ and _filter_.
But this type of data often still needs a human to interpret it before it can be processed.
And we don't have guarantees about which fields will be filled, or what their datatypes are.
One time, a `birthYear` can be a string, and the next time it can be a number.
Data can be _structured_, but still _unpredictable_.

```json
{
  "name": "Joep",
  "birthYear": 1991
}
```
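Here is a small Python sketch of what "structured but unpredictable" means in practice. The three records (`Joep`, plus the hypothetical `Alice` and `Bob`) are all valid JSON, so filtering and sorting are possible. However, `birthYear` is sometimes a number, sometimes a string, and sometimes missing, so the consumer has to write defensive normalization code.

```python
import json
from typing import Optional

# Valid JSON, but the records disagree on the type (and presence) of "birthYear".
raw = '[{"name": "Joep", "birthYear": 1991}, {"name": "Alice", "birthYear": "1985"}, {"name": "Bob"}]'
people = json.loads(raw)


def birth_year(person: dict) -> Optional[int]:
    """Defensively normalize birthYear, since nothing guarantees its type or presence."""
    value = person.get("birthYear")
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if isinstance(value, str) and value.isdigit():
        return int(value)
    return None  # missing or unparseable


# Structure lets us filter and sort, but only after normalizing by hand.
# A naive sorted(people, key=lambda p: p["birthYear"]) would raise on this data.
known = [p for p in people if birth_year(p) is not None]
oldest_first = sorted(known, key=birth_year)
print([p["name"] for p in oldest_first])  # ['Alice', 'Joep']
```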

If we want predictability, we need to make it _type-safe_.

## Level 3: type-safe data

_Examples: SQL + DB schema, JSON + JSON Schema, XML + XSD, RDF + SHACL, in-memory data in type-safe programming languages_

Type-safe data means that every value in the data has an explicit datatype.
It is _strongly typed_ and has a clear _schema_ that describes which properties you can expect in a Resource.
This means that someone re-using type-safe data can know for certain that it conforms to a specification, a set of rules.
The shape of the data is _predictable_.
This predictability means that developers can safely re-use it in their system without worrying about missing fields or datatype errors.

Lots of software has _internal_ type safety, especially when written in type-safe programming languages like TypeScript, Kotlin or Rust.
However, when the data _leaves the system_, much of that type information is lost.
Even when the schema is documented, that documentation is often not machine-readable.
The best way to have type-safe data is to describe the schema in a machine-readable format.

In SQL, we can use a DB schema. In JSON, we can add a JSON Schema file. For XML, we have XSD.
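As a rough illustration of a machine-readable schema doing the checking, here is a Python sketch. The schema uses a small subset of JSON Schema keywords (`type`, `required`, `properties`). The `validate` function is a hand-rolled, hypothetical checker for that subset only, not a full JSON Schema implementation. In a real project you would use an existing validator library.

```python
import json

# A machine-readable schema written in a small subset of JSON Schema.
PERSON_SCHEMA = {
    "type": "object",
    "required": ["name", "birthYear"],
    "properties": {
        "name": {"type": "string"},
        "birthYear": {"type": "integer"},
    },
}

# Map the JSON Schema type names used above to Python types.
TYPE_MAP = {"string": str, "integer": int, "object": dict}


def validate(instance: object, schema: dict) -> list[str]:
    """Return human-readable errors; an empty list means the instance is valid."""
    if not isinstance(instance, TYPE_MAP[schema["type"]]):
        return [f"expected {schema['type']}"]
    errors = []
    for key in schema.get("required", []):
        if key not in instance:
            errors.append(f"missing required property '{key}'")
    for key, sub in schema.get("properties", {}).items():
        if key in instance:
            value = instance[key]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not isinstance(value, TYPE_MAP[sub["type"]]) or isinstance(value, bool):
                errors.append(f"'{key}' should be {sub['type']}")
    return errors


print(validate(json.loads('{"name": "Joep", "birthYear": 1991}'), PERSON_SCHEMA))  # []
print(validate(json.loads('{"name": "Joep", "birthYear": "1991"}'), PERSON_SCHEMA))
# ["'birthYear' should be integer"]
```

Because the schema is data rather than prose, any consumer can run the same checks before trusting the values.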

In Atomic Data, the Properties themselves (the links used as keys in JSON-AD) describe the required datatypes, which helps developers who reuse the data understand what they can expect from each value.

```json
{
  "https://atomicdata.dev/properties/isA": ["https://atomicdata.dev/classes/Agent"],
  "https://atomicdata.dev/properties/name": "Joep",
  "https://atomicdata.dev/properties/birthYear": 1991,
  "https://atomicdata.dev/properties/worksOn": "Atomic Data"
}
```
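A minimal Python sketch of the idea that the key carries the type. Instead of a separate schema file, each property URL maps to a datatype. `PROPERTY_DATATYPES` is a hypothetical, hard-coded stand-in for what a client would learn by dereferencing each property URL. The datatype URLs mirror the naming scheme used on atomicdata.dev, but treat the whole table as an assumption.

```python
# Hypothetical stand-in for fetching each property definition over HTTP.
PROPERTY_DATATYPES = {
    "https://atomicdata.dev/properties/isA": "https://atomicdata.dev/datatypes/resourceArray",
    "https://atomicdata.dev/properties/name": "https://atomicdata.dev/datatypes/string",
    "https://atomicdata.dev/properties/birthYear": "https://atomicdata.dev/datatypes/integer",
    "https://atomicdata.dev/properties/worksOn": "https://atomicdata.dev/datatypes/string",
}

# Simple runtime checks for each datatype used above.
CHECKS = {
    "https://atomicdata.dev/datatypes/string": lambda v: isinstance(v, str),
    "https://atomicdata.dev/datatypes/integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "https://atomicdata.dev/datatypes/resourceArray": lambda v: (
        isinstance(v, list) and all(isinstance(item, str) for item in v)
    ),
}


def check_resource(resource: dict) -> list[str]:
    """Validate each value against the datatype that its property declares."""
    errors = []
    for prop, value in resource.items():
        datatype = PROPERTY_DATATYPES.get(prop)
        if datatype is None:
            errors.append(f"unknown property {prop}")
        elif not CHECKS[datatype](value):
            errors.append(f"{prop} expects {datatype.rsplit('/', 1)[-1]}")
    return errors


joep = {
    "https://atomicdata.dev/properties/isA": ["https://atomicdata.dev/classes/Agent"],
    "https://atomicdata.dev/properties/name": "Joep",
    "https://atomicdata.dev/properties/birthYear": 1991,
    "https://atomicdata.dev/properties/worksOn": "Atomic Data",
}
print(check_resource(joep))  # []
```

The design choice here is that type information travels with every key. A consumer never needs an out-of-band schema file to know that `birthYear` must be an integer.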

## Level 4: browsable data

_Examples: Atomic Data, properly hosted RDF_

If your data is _connected_ to other pieces of machine-readable data, it becomes browsable, similar to how websites link to each other.
This effectively creates a _web of data_, and allows for a whole new way to think about the internet.
It enables decentralized applications, true data ownership, and entirely new kinds of software.

```json
{
  "https://atomicdata.dev/properties/isA": ["https://atomicdata.dev/classes/Agent"],
  "https://atomicdata.dev/properties/name": "Joep",
  "https://atomicdata.dev/properties/birthYear": 1991,
  "https://atomicdata.dev/properties/worksOn": "https://atomicdata.dev"
}
```
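To make "browsable" concrete, here is a Python sketch that follows links breadth-first. `WEB` is a hypothetical in-memory dictionary standing in for HTTP requests, and the subject URL `https://example.com/joep` is made up for the example. A real client would fetch and parse JSON-AD over the network. For simplicity, this sketch only follows URL *values*, even though Atomic Data property keys are links too.

```python
from collections import deque

# Hypothetical in-memory "web": a stand-in for HTTP GET on each URL.
WEB = {
    "https://example.com/joep": {
        "https://atomicdata.dev/properties/name": "Joep",
        "https://atomicdata.dev/properties/worksOn": "https://atomicdata.dev",
    },
    "https://atomicdata.dev": {
        "https://atomicdata.dev/properties/name": "Atomic Data",
    },
}


def links(resource: dict) -> list[str]:
    """Collect values (or array items) that look like URLs: the edges of the web of data."""
    found = []
    for value in resource.values():
        items = value if isinstance(value, list) else [value]
        found += [v for v in items if isinstance(v, str) and v.startswith("https://")]
    return found


def browse(start: str) -> list[str]:
    """Breadth-first traversal from `start`, visiting each resolvable resource once."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        resource = WEB.get(url)
        if resource is None:
            continue  # unresolvable link: skip it
        order.append(url)
        for link in links(resource):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


print(browse("https://example.com/joep"))
# ['https://example.com/joep', 'https://atomicdata.dev']
```

With the plain string `"Atomic Data"` from the previous level, the traversal would stop at the first resource. Turning the value into a URL is what makes the next hop possible.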

## Level 5: verifiable data

_Examples: Atomic Data + Atomic Commits_

When your data is _verifiable_, other people can verify who created and modified it.
They can use cryptography to validate signatures, which proves that a specific person or machine created a piece of data.

```json
{
  "https://atomicdata.dev/properties/isA": ["https://atomicdata.dev/classes/Agent"],
  "https://atomicdata.dev/properties/name": "Joep",
  "https://atomicdata.dev/properties/birthYear": 1991,
  "https://atomicdata.dev/properties/worksOn": "https://atomicdata.dev",
  "https://atomicdata.dev/properties/previousCommit": "https://atomicdata.dev/commits/EF18751AE781"
}
```
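Here is a toy Python sketch of the sign/verify idea: hash a canonical serialization, sign the hash with a private key, and let anyone check it with the public key. This is **not** how Atomic Commits are implemented. It uses textbook RSA with two well-known Mersenne primes and no padding, so the "private" key is effectively public and the scheme is insecure. The sorted-key JSON canonicalization is also an assumption. Real systems should use a vetted signature scheme (such as Ed25519) from a maintained cryptography library. The `commit` dict is a simplified stand-in for a real commit.

```python
import hashlib
import json

# Toy textbook-RSA key from two known Mersenne primes. Illustration only: insecure.
P, Q = 2**89 - 1, 2**107 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (modular inverse, Python 3.8+)


def digest(resource: dict) -> int:
    """SHA-256 of a canonical serialization (assumption: sorted keys, no whitespace)."""
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    return int.from_bytes(hashlib.sha256(canonical.encode("utf-8")).digest(), "big") % N


def sign(resource: dict) -> int:
    """Only the holder of D can produce this value."""
    return pow(digest(resource), D, N)


def verify(resource: dict, signature: int) -> bool:
    """Anyone with the public key (N, E) can check the signature."""
    return pow(signature, E, N) == digest(resource)


commit = {
    "https://atomicdata.dev/properties/name": "Joep",
    "https://atomicdata.dev/properties/birthYear": 1991,
}
sig = sign(commit)
print(verify(commit, sig))  # True
print(verify({**commit, "https://atomicdata.dev/properties/birthYear": 1992}, sig))  # False
```

Changing any value changes the hash, so the old signature no longer verifies. That is what lets a reader detect tampering and attribute a change to a key holder.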

