Fennica: Harmonized Finnish national bibliography

This repository contains code for cleaning, enriching and automatically generating reports on the Finnish national bibliography, Fennica.

The live document is deployed in a CSC Rahti container: https://fennica-fennicaa.rahtiapp.fi

This README describes how to reproduce the analyses and generate the notebook.

Origins of data

The data was downloaded from the National Metadata Repository Melinda. See more: https://melinda.kansalliskirjasto.fi/

The data was harvested with the script collect.py, which is included in the fennica repository. The script was provided to us by Osma Suominen (National Library of Finland).

Reproducing the workflow

Copy the repository to your computer:

# In terminal / GIT
git clone https://github.com/fennicahub/fennica.git

Alternatively, download the master branch from the repository front page on GitHub: <> Code -> Download ZIP.

Go to the cloned git repository or the extracted zip folder and start R. The following example assumes that the folder was downloaded to the user's home folder:

cd fennica
R

Another option is to open an IDE and set the working directory to the fennica folder. In RStudio this can be done in the Files tab by navigating to the fennica folder, clicking the gear icon and selecting "Set as working directory". Alternatively, from the R console:

# See current working directory
getwd()
# Set working directory to fennica, assuming that fennica folder is in your current folder
setwd("fennica")
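
Before installing anything, it can help to confirm that the working directory really is the repository root. The following is a minimal sketch, not part of the fennica code: the helper name looks_like_repo_root is hypothetical, and it assumes that the root contains a DESCRIPTION file (as an installable R package would) and the inst/examples folder used in the rendering step.

```r
# Hypothetical helper: does `path` look like the root of the fennica checkout?
# Assumption: the root holds a DESCRIPTION file and an inst/examples folder.
looks_like_repo_root <- function(path = getwd()) {
  file.exists(file.path(path, "DESCRIPTION")) &&
    dir.exists(file.path(path, "inst", "examples"))
}

# Warn early instead of failing later during installation or rendering
if (!looks_like_repo_root()) {
  message("The current working directory does not look like the fennica folder: ", getwd())
}
```

If the message appears, change the directory with setwd() as shown above and check again.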

Install the necessary dependencies:

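# devtools provides the install_* helpers used below
install.packages("devtools")
library(devtools)
# Install the package in the current folder
devtools::install_local(".")
# Install its dependencies, including suggested packages
devtools::install_deps(".", dependencies = TRUE)
# Install the comhis package from GitHub
devtools::install_github("comhis/comhis")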

Render the Quarto document:

quarto::quarto_render("inst/examples")

Open the rendered book in your browser.
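
Rendering fails with fairly opaque errors when a prerequisite is missing, so the call can be wrapped in a few checks. This is a sketch under stated assumptions: the wrapper name render_report is hypothetical and not part of the repository, while quarto::quarto_path() and quarto::quarto_render() are functions of the quarto R package.

```r
# Hypothetical wrapper: render the Quarto project only if prerequisites are met.
render_report <- function(input = "inst/examples") {
  if (!dir.exists(input)) {
    stop("Directory not found: ", input,
         " (is the working directory the repository root?)")
  }
  if (!requireNamespace("quarto", quietly = TRUE)) {
    stop("The 'quarto' R package is not installed; try install.packages(\"quarto\").")
  }
  # quarto_path() returns NULL when the Quarto command-line tool cannot be located
  if (is.null(quarto::quarto_path())) {
    stop("The Quarto command-line tool was not found on this system.")
  }
  quarto::quarto_render(input)
  invisible(input)
}
```

Calling render_report() from the repository root then behaves like the plain quarto_render() call, but stops with a readable message if the folder, the R package or the Quarto installation is missing.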

Alternatively, you can view the same live document deployed in a CSC Rahti container: http://fennica-fennica.rahtiapp.fi

Description of the Webhook workflow, image from CSC Documentation

The bookdown document is rendered with GitHub Actions. The generated files are placed in the gh-pages branch of the GitHub repository. From there, a webhook copies them to Rahti, where they are hosted on an nginx server.

Using the interactive report

The generated bookdown document consists of 20 sections, or "chapters". Each section focuses on different fields of the raw MARC-formatted data. Most chapters also include visualizations that give a quick overview of what the data looks like. For most fields, the processed CSV datasets can also be downloaded for further analysis.
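
As an illustration of such further analysis, a downloaded CSV can be read into R and summarized with base functions. The sketch below uses a small inline toy table in place of a real download; the column names (record_id, publication_year, language) and the values are hypothetical and do not describe the actual Fennica exports.

```r
# Toy stand-in for a downloaded CSV; column names and values are hypothetical.
csv_text <- "record_id,publication_year,language
1,1850,fin
2,1852,swe
3,1850,fin
4,NA,fin"

# With a real download, replace `text = csv_text` with the path to the CSV file
records <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Number of records per language
language_counts <- table(records$language)
print(language_counts)

# Number of records with a missing publication year
n_missing_year <- sum(is.na(records$publication_year))
print(n_missing_year)
```

The same pattern (read, tabulate, count missing values) gives a quick check of how complete a given field is before moving on to more detailed analysis.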

Earlier material

Links to notebooks that are not actively maintained but may contain useful information regarding related past work.

The analyses cover several steps including XML parsing, data harmonization, removing unrecognized entries, enriching and organizing the data, carrying out statistical summaries, analysis, visualization and automated document generation.

Licensing

The analyses and full source code are provided in this repository and can be freely reused under the BSD 2-Clause (FreeBSD) open source licence. The analyses are based on R and rely on various R packages.

The original data has been published openly by the National Library of Finland.

Acknowledgements

The project is currently developed with research and infrastructure funding from the Research Council of Finland (DHL-FI and FIN-CLARIAH). The work is based on past and present collaboration between the Turku Data Science Group (University of Turku), the Helsinki Computational History Group (COMHIS) (University of Helsinki) and the National Library of Finland (Fennica data collection). For the list of contributors, see contributors and the related publications.

Contact

Email: yulia.matveeva@utu.fi / leo.lahti@iki.fi

The project is under active open development.
