A tool that extracts links and definitions from the CL HyperSpec as an RDF graph
Common Lisp
.gitignore
README.md
clsem.asd
data.lisp
package.lisp
scraper.lisp


Interlinking information for the HyperSpec

The CLHS comes with ~110000 links. We know where each one of them goes, but where do they come from?

Imagine you're looking at a glossary entry. It's normative, but to which areas of the spec is it relevant? Which functions depend on its definition? I asked myself that often enough that I wrote this program. It extracts the hyperlinking structure of the CLHS into an RDF graph, which can be imported into any graph database that can answer these kinds of questions (I recommend AllegroGraph, a.k.a. agraph, of course).

Creating the triple information

  1. Install the dependencies:
     • asdf
     • cxml-stp
     • closure-html
     • cl-ppcre
     • drakma
  2. Load clsem.asd, then load the system: (asdf:oos 'asdf:load-op :clsem)
  3. Run the extraction: (clsem:do-it #p"/path/to/output/file.ttl")

By default, this fetches every page from the LispWorks HTTP servers, which takes a long, long time. If you have a local copy of the HyperSpec, pass its location as :prefix instead:

      (clsem:do-it #p"/path/to/output/file.ttl"
                   :prefix "file:///Users/asf/Downloads/HyperSpec-7-0/HyperSpec")

With a local copy, the run finishes in ~9 seconds.

The output file is in Turtle, an RDF triple format. If you plan to import the data into a graph database, things may be easier if you convert it to N-Triples first. I recommend the excellent rapper tool (part of the Raptor RDF library) for this task.
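
As a minimal sketch of that conversion, assuming rapper is installed and on your PATH: the helper name ttl_to_nt is hypothetical and not part of this project. rapper's -i option selects the input parser and -o the output serializer, and the result is written to standard output.

```shell
# Hypothetical helper (not part of clsem): convert a Turtle file to N-Triples.
# Writes the result next to the input, swapping the .ttl extension for .nt.
ttl_to_nt() {
  rapper -i turtle -o ntriples "$1" > "${1%.ttl}.nt"
}

# Usage, with the placeholder path from the steps above:
#   ttl_to_nt /path/to/output/file.ttl
```

N-Triples puts one complete triple on each line, which is why bulk loaders and line-oriented tools tend to handle it more easily than Turtle.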