These tutorials teach the ContentMine tools step by step. They describe what each tool does, what results to expect, and how to link the different elements of the content mining pipeline. They are based on the ContentMine virtual machine, which has all necessary software pre-installed.
The tutorials can be used in workshops as well as for self-guided learning.
Table of contents
Purpose and installation of the ContentMine-VirtualMachine
A VirtualBox image contains all the necessary software as well as sample datasets for getting started with content mining. This tutorial explains how to install the ContentMine-VM and use it as a sandbox environment.
Introduction to the command line interface
This tutorial introduces the basic UNIX commands and shows how to navigate folders and handle files.
Getting started with getpapers
This tutorial demonstrates how to create an initial corpus for fact extraction.
Getting started with quickscrape
This tutorial introduces quickscrape and shows how to use it to extract semi-structured information from web pages.
Create your own scraper definition
This tutorial shows how to contribute to and extend the ContentMine scraper collection. If you need a specific scraper definition for use with quickscrape, you can learn how to create it here.
Normalizing scholarly literature
This tutorial shows how to normalize scientific literature into a unified, machine-processable format.
ContentMine data structure: CProject
This tutorial gives an overview of the CProject data structure and shows how to integrate it into your analysis.
Extracting facts with AMI-plugins
This tutorial demonstrates how to extract, aggregate, and filter facts from scholarly.html.