services updates
antaldaniel committed Nov 10, 2021
1 parent b060544 commit 1c9ee34
Showing 8 changed files with 251 additions and 0 deletions.
Binary file added content/services/data-as-service/featured.png
61 changes: 61 additions & 0 deletions content/services/data-as-service/index.md
@@ -0,0 +1,61 @@
---
title: Data-as-Service
summary: We provide our clients with simple datasets, databases, harmonized survey data, and various other rich data applications, giving them continuous access to high-quality, reprocessed, reusable public sector and scientific data.
tags:
- daas
- api
date: "2021-01-21T00:00:00Z"
lastmod: "2021-07-07T00:00:00Z"

# Optional external URL for project (replaces project detail page).
external_link: ""

image:
caption: ""
focal_point: Smart

links:
- icon: twitter
icon_pack: fab
name: Follow
url: https://twitter.com/EconDataObs
- icon: linkedin
icon_pack: fab
name: Connect
url: https://www.linkedin.com/company/78562153/
- icon: database
icon_pack: fas
name: Try API
url: https://api.economy.dataobservatory.eu/
- icon: book-open
icon_pack: fas
name: Documentation
url: https://competition-data-observatory.netlify.app/#data
url_code: ""
url_pdf: ""
url_slides: ""
url_video: ""

# Slides (optional).
# Associate this project with Markdown slides.
# Simply enter your slide deck's filename without extension.
# E.g. `slides = "example-slides"` references `content/slides/example-slides.md`.
# Otherwise, set `slides = ""`.
slides: ""
---


**We want to ensure that individual researchers, artists, and professionals, as well as NGOs and small and large organizations can benefit equally from big data in the age of artificial intelligence.**

Big data creates inequality and injustice because only big corporations, big government agencies, and the biggest, best-endowed universities can finance long-lasting, comprehensive data collection programs. Big data, in the form of large, well-processed, tidy, and accurately imputed datasets, allows them to unleash the power of machine learning and AI. These large entities can create algorithms that decide the commercial success of your product or your artwork, giving them a competitive edge over smaller competitors while helping them evade regulation.



> Check out our [iotables](https://iotables.dataobservatory.eu/) software, which helps you use national accounts data from all EU member states to calculate direct, indirect, and induced economic impacts, such as employment multipliers or the GVA effects of various cultural and creative economy policies.
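
The arithmetic behind these multipliers is compact. Below is a minimal Python sketch (not the iotables API; the two-sector table and all figures are invented for illustration) of how employment multipliers fall out of the Leontief inverse:

```python
import numpy as np

# Hypothetical two-sector symmetric input-output table (illustrative numbers).
Z = np.array([[150.0,  500.0],    # intermediate deliveries: creative -> (creative, rest)
              [200.0, 3000.0]])   # intermediate deliveries: rest -> (creative, rest)
x = np.array([1000.0, 10000.0])   # total output by sector
e = np.array([12.0, 60.0])        # employment (thousand persons) by sector

A = Z / x                         # input coefficients: a_ij = z_ij / x_j
L = np.linalg.inv(np.eye(2) - A)  # Leontief inverse (I - A)^-1
emp_coeff = e / x                 # direct employment per unit of output

# Employment multipliers: total (direct + indirect) employment effect
# of one extra unit of final demand in each sector.
employment_multipliers = emp_coeff @ L
print(employment_multipliers)
```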

> Check out our [regions](https://regions.dataobservatory.eu/) software, which helps you validate and harmonize sub-national, regional statistics across frequently changing administrative boundaries, such as Europe's NUTS regions.


> Check out our [retroharmonize](https://retroharmonize.dataobservatory.eu/) software, which helps the retrospective harmonization of various European and African standardized surveys, such as Eurobarometer and Afrobarometer.
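
At its core, ex-post survey harmonization is careful recoding: the same question is stored under different variable names and category labels in each source. A minimal pandas sketch of the idea (not the retroharmonize API; both toy waves and the label maps are invented):

```python
import pandas as pd

# Two hypothetical survey waves asking the same trust question
# with different variable names and category codings.
wave_a = pd.DataFrame({"qd3_trust": [1, 2, 3, 1]})  # 1=yes, 2=no, 3=don't know
wave_b = pd.DataFrame({"trust_gov": ["Tend to trust", "Tend not to trust",
                                     "Don't know", "Tend to trust"]})

# One harmonized coding for both waves.
map_a = {1: "trust", 2: "no_trust", 3: "do_not_know"}
map_b = {"Tend to trust": "trust", "Tend not to trust": "no_trust",
         "Don't know": "do_not_know"}

harmonized = pd.concat([
    wave_a["qd3_trust"].map(map_a).to_frame("trust_in_government").assign(wave="A"),
    wave_b["trust_gov"].map(map_b).to_frame("trust_in_government").assign(wave="B"),
], ignore_index=True)
print(harmonized)
```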
Binary file added content/services/data-curation/featured.jpg
51 changes: 51 additions & 0 deletions content/services/data-curation/index.md
@@ -0,0 +1,51 @@
---
title: Data Curation
summary: We create high-value key business and policy evaluation indicators. Scientific proof requires correctly matching, formatting, and verifying controlled pieces of data. Our data comes from verified and legal sources, with information about use rights and a complete history. You can always take a look at the processing code, too. We do not deal in blood diamonds.
tags:
- curation
date: "2021-01-21T00:00:00Z"

# Optional external URL for project (replaces project detail page).
external_link: ""

image:
caption: "The Open Pit of the Udachnaya Diamond Mine, ©[Stapanov Alexander](https://commons.wikimedia.org/w/index.php?curid=350061)"
focal_point: Smart

links:
- icon: twitter
icon_pack: fab
name: Follow
url: https://twitter.com/EconDataObs
- icon: linkedin
icon_pack: fab
name: Connect
url: https://www.linkedin.com/company/78562153/
- icon: lightbulb-on
icon_pack: fas
name: Get Inspired
url: https://contributors.dataobservatory.eu/data-curators.html#get-inspired
- icon: book-open
icon_pack: fas
name: Documentation
url: https://contributors.dataobservatory.eu/
url_code: ""
url_pdf: ""
url_slides: ""
url_video: ""

# Slides (optional).
# Associate this project with Markdown slides.
# Simply enter your slide deck's filename without extension.
# E.g. `slides = "example-slides"` references `content/slides/example-slides.md`.
# Otherwise, set `slides = ""`.
slides: data-curation
---

**If you cannot find the right data for your policy evaluation, your consulting project, your PhD thesis, your market research, or your scientific research project, it does not mean that the data does not exist, or that it is not available for free. In our experience, up to 95% of available open data is never used, because potential users do not realize it exists or do not know how to access it.**

Every day, thousands of new datasets become available via the EU open data regime, freedom of information legislation in the United States and other jurisdictions, or open science and scientific reproducibility requirements — but as these datasets have been packaged or processed for different primary, original uses, they often require open data experts to locate them and adapt them to a usable form for reuse in business, scientific, or policy research.

The creative and cultural industries often do not participate in government statistics programs because these industries typically consist of microenterprises that are exempt from statistical reporting and file only simplified financial statements and tax returns. This means that finding the appropriate private or public data sources for creative and cultural industry uses requires particularly good data maps.

`Data curation` means that we continuously map potential data sources and send requests to download and quality-test the most current data. Our CEEMID project has produced several thousand indicators, of which a few dozen are available in our [Demo Music Observatory](/project/music-observatory/). If you have specific data needs for a scientific research, policy evaluation, or business project, we can find and provide the most suitable, most current, and best-value data for analysis or for [ethical AI applications](/service/trustworthy-ai/).
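
A stylized sketch of one such curation step in Python (the URL, the expected schema, and the tests are placeholders for illustration, not a real endpoint of ours):

```python
import pandas as pd

# Placeholder URL: stands in for any open-data CSV resource we monitor.
SOURCE_URL = "https://example.com/open-data/music-indicator.csv"
EXPECTED_COLUMNS = {"geo", "time", "value"}  # assumed minimal tidy schema

def download_and_test(url: str) -> pd.DataFrame:
    """Download the latest release and run basic quality tests on it."""
    df = pd.read_csv(url)
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"schema drifted, missing columns: {missing}")
    if df["value"].isna().all():
        raise ValueError("no usable observations in this release")
    if not pd.to_datetime(df["time"], errors="coerce").notna().any():
        raise ValueError("time column cannot be parsed")
    return df

# df = download_and_test(SOURCE_URL)  # in production this runs on a schedule
```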
Binary file added content/services/data-processing/featured.jpg
56 changes: 56 additions & 0 deletions content/services/data-processing/index.md
@@ -0,0 +1,56 @@
---
title: Data Processing
summary: We create high-value key business and policy evaluation indicators. Scientific proof requires correctly matching, formatting, and verifying controlled pieces of data. Our data comes from verified and legal sources, with information about use rights and a complete history. You can always take a look at the processing code, too. We do not deal in blood diamonds.
tags:
- data-processing
date: "2021-01-21T00:00:00Z"
lastmod: "2021-11-10T13:15:00+01:00"

# Optional external URL for project (replaces project detail page).
external_link: ""

image:
caption: "The Open Pit of the Udachnaya Diamond Mine, ©[Stapanov Alexander](https://commons.wikimedia.org/w/index.php?curid=350061)"
focal_point: Smart

links:
- icon: twitter
icon_pack: fab
name: Follow
url: https://twitter.com/dataandlyrics
- icon: linkedin
icon_pack: fab
name: Connect
url: https://www.linkedin.com/company/78562153/
- icon: lightbulb-on
icon_pack: fas
name: Get Inspired
url: https://contributors.dataobservatory.eu/data-curators.html#get-inspired
- icon: book-open
icon_pack: fas
name: Documentation
url: https://contributors.dataobservatory.eu/
url_code: ""
url_pdf: ""
url_slides: ""
url_video: ""

# Slides (optional).
# Associate this project with Markdown slides.
# Simply enter your slide deck's filename without extension.
# E.g. `slides = "example-slides"` references `content/slides/example-slides.md`.
# Otherwise, set `slides = ""`.
slides: data-curation
---

**Data analysts spend 80% of their time on data processing, even though computers can perform these tasks much faster, with far fewer errors, and can document the process automatically. Data processing can be shared: an analyst in a company and an analyst in an NGO do not have to reprocess the very same data twice.**

See our blog post [How We Add Value to Public Data With Imputation and Forecasting?](/post/2021-11-06-indicator_value_added/).

Public data sources are often plagued by missing values. Naively, you may think that you can ignore them, but think twice: in most cases, missing data in a table is not missing information, but rather malformatted information. Ignoring or dropping missing values is neither feasible nor robust when you want to make a beautiful visualization or use the data in a business forecasting model, a machine learning (AI) application, or a more complex scientific model. All of the above require complete datasets, and naively discarding missing data points amounts to an excessive waste of information. The blog post linked above walks through such an example with a not-so-easy-to-find public dataset.

Completing missing data points requires statistical production knowledge (why might the data be missing?) and data science know-how (how to impute the missing value). If you do not have a good statistician or data scientist on your team, you will need high-quality, complete datasets. This is what our automated data observatories provide.
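
A minimal pandas sketch of why the choice matters (a toy monthly series with gaps; each imputation method encodes a different assumption about the data):

```python
import pandas as pd

idx = pd.period_range("2021-01", "2021-06", freq="M")
raw = pd.Series([100.0, None, None, 130.0, 140.0, None], index=idx)

# Three common, increasingly assumption-laden choices:
dropped = raw.dropna()                           # discards half the series
carried = raw.ffill()                            # assumes values persist
interpolated = raw.interpolate(method="linear")  # assumes a linear path

print(pd.DataFrame({"raw": raw, "ffill": carried, "linear": interpolated}))
```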

{{< figure src="/media/img/blogposts_2021/Sisyphus_Bodleian_Library.png" caption="See our blog post about [the Data Sisyphus](https://reprex.nl/post/2021-07-08-data-sisyphus/)." numbered="false" >}}

We have a better solution. You can always rely on our API to import the latest, best data directly, but if you want to be certain, you can use our [regular backups](https://zenodo.org/record/5652118#.YYhGOGDMLIU) on Zenodo. Zenodo is an open science repository managed by CERN and supported by the European Union. On Zenodo, you can find an authoritative copy of our indicator (and its previous versions) with a digital object identifier, for example, [10.5281/zenodo.5652118](https://doi.org/10.5281/zenodo.5652118). These datasets will be preserved for decades, and nobody can manipulate them. You cannot accidentally overwrite them, and we have no backdoor access to modify them.
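
Fetching such an authoritative copy needs nothing more than Zenodo's public REST API. A sketch in Python, assuming the record layout Zenodo currently returns:

```python
import requests

# Public Zenodo REST API; record 5652118 is the backup deposit cited above.
record = requests.get("https://zenodo.org/api/records/5652118", timeout=30).json()

print(record["metadata"]["title"], record["doi"])
for f in record["files"]:
    # Save each deposited file under its original name.
    payload = requests.get(f["links"]["self"], timeout=60).content
    with open(f["key"], "wb") as out:
        out.write(payload)
```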
Binary file added content/services/metadata/featured.jpg
83 changes: 83 additions & 0 deletions content/services/metadata/index.md
@@ -0,0 +1,83 @@
---
title: Metadata
summary: Adding metadata exponentially increases the value of data. Did somebody already adjust old data to conform to constantly changing geographic boundaries? What are some practical ways of combining satellite sensory data with my organization's records? And do I have the right to do so? Metadata logs the history of data, providing instructions on how to reuse it and setting the terms of use. We automate this labor-intensive process by applying the FAIR data concept.
tags:
- metadata

date: "2021-07-07T00:00:00Z"

# Optional external URL for project (replaces project detail page).
external_link: ""

image:
caption: "Diamond Polisher, ©[Andere Andre](https://commons.wikimedia.org/w/index.php?curid=4770037)"
focal_point: Center

links:
- icon: twitter
icon_pack: fab
name: Follow
url: https://twitter.com/EconDataObs
- icon: linkedin
icon_pack: fab
name: Connect
url: https://www.linkedin.com/company/78562153/
- icon: github
icon_pack: fab
name: Code
url: https://github.com/dataobservatory-eu/dataobservatory/
- icon: book-open
icon_pack: fas
name: Documentation
url: https://r.dataobservatory.eu/
url_code: ""
url_pdf: ""
url_slides: ""
url_video: ""

# Slides (optional).
# Associate this project with Markdown slides.
# Simply enter your slide deck's filename without extension.
# E.g. `slides = "example-slides"` references `content/slides/example-slides.md`.
# Otherwise, set `slides = ""`.
slides: trustworthy-ai
---

*Adding metadata exponentially increases the value of data. Did your region add a new town to its boundaries? How do you adjust old data to conform to constantly changing geographic boundaries? What are some practical ways of combining satellite sensory data with my organization's records? And do I have the right to do so? Metadata logs the history of data, providing instructions on how to reuse it and setting the terms of use. We automate this labor-intensive process by applying the FAIR data concept.*

In our observatory we apply the concept of [FAIR](#FAIR) (**f**indable, **a**ccessible, **i**nteroperable, and **r**eusable digital assets) in our APIs and in our open-source statistical software packages.

## The hidden cost item

Metadata gets less attention than data because it is never acquired separately and never appears on the invoice; it remains a hidden cost, even though from a budgeting and usability point of view it is more important than the data itself. Metadata is responsible for non-billable hours in industry and uncredited working hours in academia. Poor data documentation, the lack of reproducible processing and testing logs, the inconsistent use of currencies and keywords, and storing [messy data](#messy-data) make reusability, interoperability, and integration with other information impossible.

In [FAIR Data and the Added Value of Rich Metadata](#FAIR-data) we introduce how we apply the concept of [FAIR](#FAIR) (**f**indable, **a**ccessible, **i**nteroperable, and **r**eusable digital assets) in our APIs.

Organizations pay many times for the same, repeated work, because these boring tasks, which often consist of tens of thousands of microtasks, are neglected. Our solution creates automatic documentation and metadata for your own historical internal data or for acquisitions from data vendors. We apply the more general [Dublin Core](#Dublin-Core) standard and the mandatory and recommended values of the more specific [DataCite](#DataCite) schema for datasets; these are new requirements in EU-funded research from 2021. But they are just the minimal steps, and there is a lot more to do to create a diamond ring from an uncut gem.
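
The mandatory DataCite properties are few enough to show in full. A hand-written Python sketch of such a record (illustrative values, not one of our production records; the comments show a rough Dublin Core counterpart for each field):

```python
# The six mandatory DataCite 4.4 properties, with illustrative values;
# comments show the approximate Dublin Core term used for re-formatting.
datacite_record = {
    "identifier": {"identifier": "10.5281/zenodo.5652118",
                   "identifierType": "DOI"},                  # dct:identifier
    "creators": [{"name": "Example Data Curator"}],           # dct:creator
    "titles": [{"title": "Example Music Industry Indicator"}],  # dct:title
    "publisher": "Example Data Observatory",                  # dct:publisher
    "publicationYear": "2021",                                # dct:date
    "types": {"resourceTypeGeneral": "Dataset"},              # dct:type
}
```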

## Map your data: bibliographies, catalogues, codebooks, versioning

Updating descriptive metadata such as bibliographic citation files, adding descriptions and sources to data files downloaded from the internet, and versioning spreadsheet documents and presentations are usually hated and often neglected tasks within organizations, and rightly so: these boring and error-prone tasks are best left to computers.

{{< figure src="/media/img/gems/n-RFId0_7kep4-unsplash.jpg" caption="Already adjusted spreadsheets are re-adjusted and re-checked. Hours are spent looking for the right document with the right version. Duplicates multiply. Already downloaded data is downloaded again, and miscategorized, again. Finding data without a map is a treasure hunt. Photo: © [N.](https://unsplash.com/photos/RFId0_7kep4?utm_source=unsplash)" numbered="false" >}}

The lack of time and resources spent on documentation reduces reusability over time and significantly increases data processing, supervision, and auditing costs.

- [x] Our observatory metadata is compliant with the [Dublin Core Cross-Domain Attribute Set](https://www.dublincore.org/specifications/dublin-core/cross-domain-attribute/) metadata standard, but we use different formatting. We offer simple re-formatting from the richer DataCite to Dublin Core for interoperability with a wider set of data sources.
- [x] We use all [mandatory](https://support.datacite.org/docs/datacite-metadata-schema-v44-mandatory-properties) DataCite metadata fields, as well as all [the recommended and optional](https://support.datacite.org/docs/datacite-metadata-schema-v44-recommended-and-optional-properties) ones.
- [x] Our metadata complies with the tidy data principles.

In other words: our metadata is very easy to import into your databases or join with other databases, and the information is easy to find. Corrections and updates can be managed automatically.


## What happened with the data before?


- [x] We create codebooks that follow the SDMX statistical metadata codelists and resemble the SDMX concepts used by international statistical agencies. (See more technical information [here](https://r.dataobservatory.eu/articles/codebook.html).)
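
For instance, the SDMX `CL_FREQ` codelist pins down how the frequency of observations is coded. A fragment as a Python mapping (hand-copied for illustration, not a generated artefact of ours):

```python
# Fragment of the SDMX CL_FREQ codelist: one authoritative code per frequency,
# so "Annual", "yearly", and "A" never coexist in the same column.
CL_FREQ = {
    "A": "Annual",
    "S": "Half-yearly, semester",
    "Q": "Quarterly",
    "M": "Monthly",
    "W": "Weekly",
    "D": "Daily",
}
```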

Small organizations often cannot afford data engineers and data scientists on staff; they employ analysts who work with Excel, OpenOffice, PowerBI, SPSS, or Stata. The problem with these applications is that they often require the user to adjust the data manually, with keyboard entries or mouse clicks. Furthermore, they do not precisely log the data processing and manipulation history.
Manual data processing and manipulation is very error-prone and makes complex, high-value resources, such as harmonized surveys or symmetric input-output tables, to name two important sources we deal with, impossible to use. These high-value data sources often require tens of thousands of data processing steps: no human can perform them all faultlessly.

What is even more problematic is that simple analysis applications do not log these manipulation steps: pulling over a column with the mouse, renaming a row, adding a zero to an empty cell. This makes senior supervisory oversight and external audit very costly.

Our data comes with a full history: all changes are visible, and we even open the code or algorithm that processed the raw data. Your analysts can still use their favourite spreadsheet or statistical software application, but they can start from a clean, tidy dataset, with all data wrangling, currency and unit conversion, imputation, and other low-priority but important tasks done and logged.
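
A minimal sketch of what "done and logged" means in practice (toy data, a hypothetical fixed exchange rate, and Python's standard logging standing in for our production logs):

```python
import logging

import pandas as pd

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("wrangling")

df = pd.DataFrame({"geo": ["CZ", "HU"], "value": [2520.0, 35600.0],
                   "currency": ["CZK", "HUF"]})
rates_to_eur = {"CZK": 0.039, "HUF": 0.0027}  # hypothetical fixed rates

# Each transformation leaves a trace that a supervisor or auditor can replay.
for cur, rate in rates_to_eur.items():
    mask = df["currency"] == cur
    df.loc[mask, "value"] = df.loc[mask, "value"] * rate
    log.info("converted %d rows from %s to EUR at rate %s", mask.sum(), cur, rate)
df["currency"] = "EUR"
```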
