
ContentMine logo

These tutorials teach the ContentMine tools step by step. They describe what each tool does, what results to expect, and how to link the different elements of the content mining pipeline. They are based on the ContentMine virtual machine, which has all necessary software pre-installed.

The tutorials can be used in workshops as well as for self-guided learning.

Table of contents

  1. Purpose and installation of the ContentMine-VirtualMachine
    A VirtualBox image contains all necessary software as well as sample datasets for getting started with content mining. This tutorial explains how to install the ContentMine-VM and use it as a sandbox environment.

  2. Introduction to the command line interface
    This tutorial introduces basic UNIX commands and shows how to navigate folders and handle files.

  3. Getting started with getpapers
    This tutorial demonstrates how to create an initial corpus for fact extraction.

  4. Getting started with quickscrape
    This tutorial introduces quickscrape and shows how to use it to extract semi-structured information from web pages.

  5. Create your own scraper definition
    This tutorial shows how to contribute to and extend the ContentMine scraper collection. If you need a specific scraper definition for use with quickscrape, you can learn how to create it here.

  6. Normalizing scholarly literature
    This tutorial shows how to normalize scientific literature into a unified format which can be processed by machines.

  7. ContentMine data structure: CProject
    This tutorial gives an overview of the CProject data structure and shows how to integrate it into your analysis.

  8. Extracting facts with AMI-plugins
    This tutorial demonstrates how to extract, aggregate, and filter facts from scholarly.html.
