Yin Qu (屈垠) edited this page Sep 2, 2016 · 12 revisions

BigSemantics Overview

BigSemantics is a free and open-source software architecture for developing powerful applications that derive and present explorable semantic information (metadata) from diverse information sources. Semantics summarize underlying information resources, for exploration within a unified context.

An example is the Metadata In-Context Expander (MICE), which helps the user maintain context while exploring linked semantic information. The MICE homepage shows semantic information for a CD on Amazon. Linked information, such as related bestseller lists or items people also buy, can be expanded in the same context.

The Getting Started section explains how you can use MICE in your own web application. BigSemantics supports a wide range of semantic types and web sites. In the next section, we will explain how you can easily extend BigSemantics to support new web sites that are useful to your application.

Supporting New Sites: Meta-Metadata Language

The real power of BigSemantics is that it provides an easy-to-learn, declarative language called meta-metadata for developers to specify what semantic information they need and how it should be presented for any web sites that are useful to their application. No special HTML tags are needed from the source web site.

These specifications are written in code blocks called wrappers, each corresponding to a type of entity (such as a product) and a source of information (such as Amazon). Common semantic types can be reused through inheritance, reducing the effort needed to develop new wrappers. BigSemantics comes with a repository of wrappers, organized in a hierarchy. Learn more about the meta-metadata language in the Getting Started section.
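The inheritance idea can be pictured with a small Java sketch. This is not the meta-metadata language itself and not a BigSemantics API: the class `WrapperSketch`, the wrapper names (`document`, `product`, `amazon_product`), and the field names are all hypothetical, chosen only to show how a site-specific wrapper might reuse fields declared by more general types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a wrapper hierarchy; NOT the BigSemantics API.
final class WrapperSketch {
    final String name;
    final WrapperSketch parent;      // null for the root type
    final List<String> ownFields;    // fields declared by this wrapper only

    WrapperSketch(String name, WrapperSketch parent, List<String> ownFields) {
        this.name = name;
        this.parent = parent;
        this.ownFields = new ArrayList<>(ownFields);
    }

    // Fields visible on this wrapper: inherited ones first, then its own.
    Set<String> allFields() {
        Set<String> fields = new LinkedHashSet<>();
        if (parent != null) {
            fields.addAll(parent.allFields());
        }
        fields.addAll(ownFields);
        return fields;
    }

    public static void main(String[] args) {
        // Assumed example types: a general document, a product, and a site-specific product.
        WrapperSketch document = new WrapperSketch("document", null, Arrays.asList("title", "location"));
        WrapperSketch product = new WrapperSketch("product", document, Arrays.asList("price"));
        WrapperSketch amazonProduct = new WrapperSketch("amazon_product", product, Arrays.asList("reviews"));
        System.out.println(amazonProduct.allFields()); // prints [title, location, price, reviews]
    }
}
```

In this sketch the site-specific wrapper declares only one field of its own; everything else comes from its ancestors, which is the kind of reuse the wrapper hierarchy is meant to provide.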

Supported Platforms

BigSemantics provides a web service that can be used in web, mobile, and desktop applications. Libraries for working with the service are provided in JavaScript, Java, and C#, and can be used in native Android or WindowsRT applications.

The core BigSemantics architecture was originally written in Java, and is being ported to JavaScript and C#. You are welcome to contribute!

Getting Started

This section helps you get started on using BigSemantics in your own application.

Checking Out Code

All BigSemantics projects are hosted inside an umbrella project, BigSemantics, in the form of git submodules. Follow this step-by-step guide to check out the code and set up a development environment for using and developing with BigSemantics. The guide also explains how you can share your work on wrappers or other parts of BigSemantics with the community through pull requests.

Using BigSemantics in Web Applications

BigSemanticsJavaScript provides a library that you can use to work with BigSemantics in your own web application. The library is structured in a way that you can choose to use the MICE interface components and styles, or develop your own.

Meta-Metadata: Language for Authoring Wrappers

With BigSemantics, developers and other curators author application-independent, reusable code blocks called wrappers, in the meta-metadata language, to specify data models, extraction rules, and presentation semantics of metadata. Data models, extraction rules, and presentation semantics are integrated together in wrappers to constitute metadata types. BigSemantics comes with a repository of wrappers supporting a wide range of types and sources.

The meta-metadata language supports representing nested, cross-linked, and recursive data models. For example, metadata for books can contain metadata for their authors, while metadata for authors can again contain metadata for their books.
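A minimal Java sketch of the book and author example, assuming two hand-written types. `BookSketch`, `AuthorSketch`, and their fields are hypothetical names for illustration; real metadata types are defined in wrappers, not written like this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical metadata types; names and fields are illustrative only.
final class AuthorSketch {
    final String name;
    final List<BookSketch> books = new ArrayList<>();   // authors link back to books

    AuthorSketch(String name) {
        this.name = name;
    }
}

final class BookSketch {
    final String title;
    final List<AuthorSketch> authors = new ArrayList<>(); // books nest author metadata

    BookSketch(String title) {
        this.title = title;
    }

    // Cross-link both directions so either side can be explored from the other.
    void addAuthor(AuthorSketch author) {
        authors.add(author);
        author.books.add(this);
    }
}
```

Because each type refers to the other, a client can walk from a book to its authors and from an author back to their books, which is the recursive, cross-linked structure described above.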

At runtime, BigSemantics (in the form of a library or a web service) extracts metadata from web pages using wrappers. To make semantics conveniently accessible from client programs (e.g. applications written in Java or C#), BigSemantics automatically generates native classes, called metadata classes, from the data models defined in wrappers. Extracted metadata is mapped to native Java or C# instances of metadata classes for use in programs.
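To give a feel for what mapping extracted metadata to a native instance means, here is a hand-written sketch. It assumes the extracted values arrive as a simple map of field names to strings, which is a simplification made for this example; `ProductMetadataSketch`, `fromExtracted`, and the field names are hypothetical and do not reflect the classes BigSemantics actually generates.

```java
import java.util.Map;

// Hypothetical stand-in for a generated metadata class; NOT actual BigSemantics output.
final class ProductMetadataSketch {
    String title;
    double price;

    // Assumption: extracted values are available as field-name -> raw string pairs.
    static ProductMetadataSketch fromExtracted(Map<String, String> extracted) {
        ProductMetadataSketch product = new ProductMetadataSketch();
        product.title = extracted.get("title");
        String rawPrice = extracted.get("price");
        // A missing price is represented as NaN rather than guessed.
        product.price = (rawPrice == null) ? Double.NaN : Double.parseDouble(rawPrice);
        return product;
    }
}
```

The point of the sketch is only that client code ends up working with typed fields such as `product.price` instead of raw page content.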

Learn how to use the meta-metadata language to author wrappers for any web site. The tutorial walks through data model definitions, information extraction, and metadata presentation in detail, and covers useful tools that assist development. Developers with object-oriented programming experience will find it easy to understand and use.

BigSemantics Service

BigSemantics comes with a web service that eases use in different types of applications. Once hosted, it can be accessed through HTTP with a simple, RESTful API.
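As a rough illustration of calling such a service over HTTP, the sketch below only builds a request URI for a document URL. The `/metadata` path and the `url` query parameter are assumptions made for this example, not the documented BigSemantics API; consult the service guide for the real endpoint before using anything like this.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

// Hypothetical request builder; the path and parameter name are assumed, not documented.
final class ServiceRequestSketch {

    // Builds e.g. <serviceBase>/metadata?url=<percent-encoded document URL>.
    static URI metadataRequest(String serviceBase, String documentUrl) {
        try {
            String encoded = URLEncoder.encode(documentUrl, "UTF-8");
            return URI.create(serviceBase + "/metadata?url=" + encoded);
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed local service instance; replace with the address of a real one.
        System.out.println(metadataRequest("http://localhost:8080", "http://example.com/a?b=1"));
    }
}
```

The resulting URI could then be passed to any HTTP client. Encoding the document URL matters because it contains characters such as `?` and `=` that would otherwise be read as part of the service request itself.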

We host a BigSemantics service instance that can be accessed freely by the public (without guarantee of availability). You can host your own instance by following the guide.

Developing with BigSemantics

The following sources offer tutorials on developing with BigSemantics in other languages or platforms:

You are welcome to contribute to BigSemantics! The software architecture is illustrated on this page. For further details on how to extend the core of BigSemantics, including processing wrappers and types, extracting semantic information, and maintaining a graph of linked documents, see this tutorial.

Publications and Useful Links

Check out the research papers and background for students. You can view demos made with BigSemantics. A useful page for viewing and comparing metadata for a website is the IdeaMACHE Dropzone.


For general support, please contact us at
