
Check out code and set up development environment

Yin Qu (屈垠) edited this page Feb 23, 2016 · 10 revisions


This page explains how to set up a development environment for BigSemantics. A development environment is required for authoring new wrappers or developing new features for BigSemantics. The development environment uses Git submodules to include multiple components of BigSemantics that work together.

Required and Optional Tools

Before you start, make sure that you have installed the following required tools:

  • A Git client. We use Git to manage our code. We recommend the Git command line tool, or the SourceTree GUI tool. You can use any other Git client you prefer, as long as it supports submodules.

  • Java SE development environment (JDK) 1.7 or later. You can download the latest version of JDK from this page.

  • Apache Ant. We use Ant to build components. Ant can be installed as a command line tool. If you use Eclipse, it is already integrated.

  • Apache Maven. We use Maven to manage dependencies for some projects. Maven can be installed as a command line tool. If you use Eclipse, we recommend the m2e plugin for better integration. Visit the m2e download page to find the current update site URL (note that the download page itself is not the update site). To install m2e, open Eclipse's 'Install New Software' dialog from the 'Help' menu and paste in the update site URL.

Optional but recommended:

  • Eclipse (J2EE version). A good IDE can make you more productive. In the lab we prefer to use Eclipse (the J2EE version). Some parts of the tutorials will be Eclipse specific.

  • A merge tool. This is useful when conflicts happen. We recommend P4Merge or DiffMerge. If you have TortoiseSVN or TortoiseGit installed, TortoiseMerge is another option.

IMPORTANT: You may also need to add javac to the system PATH variable (search for 'Change environment variables' for your specific OS). Add the 'bin' folder of your JDK installation (e.g. C:\Program Files\Java\jdk1.7.0_67\bin) to your PATH. You may need to restart running applications for the change to take effect.
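A quick way to confirm that the required tools are reachable from a terminal is sketched below. It assumes a POSIX shell (on Windows, for example Git Bash) and the usual command names git, java, javac, ant and mvn. The missing_tools function is a hypothetical helper, not part of BigSemantics.

```shell
# Print the name of every given tool that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools this page lists as required (assumed command names).
missing_tools git java javac ant mvn
```

If nothing is printed, all five commands were found. If javac is printed, the JDK 'bin' folder is probably not on your PATH yet.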

Forking Projects and Checking Out the Code

If you are going to author new wrappers or write code for BigSemantics, but don't have push permission to BigSemantics, you will need to fork the following projects before checking out the code. Otherwise, skip to 'Checking Out the Latest Code' below.

  • BigSemantics is the umbrella project that pulls all the other components into one environment.

  • BigSemanticsWrapperRepository contains all the wrappers and generated metadata classes.

  • BigSemanticsJava contains the basic BigSemantics architecture. If you want to change or extend the BigSemantics architecture, such as changing the way wrappers are handled or adding a completely new method to extract metadata, you will develop code in this Git repository.

  • BigSemanticsService contains code for the BigSemantics web service.

  • BigSemanticsJavaScript supports building web applications that use BigSemantics, using JavaScript. The MICE interface is built using this project, and can serve as an example for your web application.

  • BigSemanticsCSharp supports building C# applications that use BigSemantics.

Checking Out the Latest Code

After forking those projects, you can get all the components of BigSemantics by running:

git clone --recursive<your-github-account-name>/BigSemantics

After authoring new wrappers or developing new features for BigSemantics, use pull requests to share your changes with us and all the other BigSemantics users.

If you do not need to push or already have push permission to BigSemantics, you can check out code using:

git clone --recursive
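If a clone was made without --recursive, the submodule folders will be empty. The sketch below shows one way to fill them in afterwards using git submodule update --init --recursive. The init_submodules function name is hypothetical, and the folder name in the example is only an assumption about where you cloned the umbrella project.

```shell
# Hypothetical helper: fetch and check out all submodules of an existing clone.
# $1 is the path to the umbrella repository (defaults to the current folder).
init_submodules() {
  git -C "${1:-.}" submodule update --init --recursive
}

# Example (placeholder path):
#   init_submodules BigSemantics
```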

Importing Projects Into Eclipse

If you use Eclipse, you will also want to import the various projects into an Eclipse workspace. Follow these steps:

  • Start Eclipse, and choose a folder as the workspace for BigSemantics. We suggest that you use a different folder than the folder that contains the code.

  • Select 'File' -> 'Import', and then select 'Existing Projects into Workspace' from 'General' in the tree view. Then click 'Next'.

  • In the newly opened dialog, choose 'Select root directory', and use 'Browse' to select the BigSemantics folder you cloned from GitHub. Projects in that folder should be listed.

  • To get started, we recommend including all projects, except simplAndroidSpecifics and BigSemanticsAndroidSpecifics if you are not using BigSemantics on Android devices.

  • Click 'Finish'. The projects should show in Eclipse's Project Explorer. It may take a minute for Eclipse to compile all the projects.


Authoring Wrappers and Seeing Changes in Action

BigSemantics provides convenience tools for you to see changes you made in action while you author new or edit existing wrappers.

The basic workflow is through Eclipse:

  • Set up the development environment, as previously explained in this Wiki page.
  • Author new wrappers or make changes to existing wrappers.
  • Stop any running BigSemantics service process. If the process was started in Eclipse, you can just press the red Stop button in Eclipse's Console view.
  • In Eclipse, run BigSemanticsSDK/CompileToJava.launch by right-clicking the file and choosing 'Run As' -> 'Java Application'.
  • Refresh BigSemanticsGeneratedClassesJava by right-clicking that project in Eclipse and choosing 'Refresh'. (This step is necessary because the previous step may change files in BigSemanticsGeneratedClassesJava, and Eclipse often fails to detect such file system changes automatically.)
  • Run BigSemanticsService/RunService.launch. A new BigSemantics service process will be spawned, and you should be able to see its output in Eclipse's Console view.

To see extraction results for your targeted page:

  • Install our Chrome extension, and in its Options, enable the Developer Mode. The extension then enables seeing semantics extracted from the current page by clicking on the button in the address bar. Note that you will need to change the options of the extension to make it use the local instance of BigSemantics service: use localhost as the host name, 8080 as the port number, and 8443 as the secure port number.
  • Alternatively, you can point your browser to http://localhost:8080/static/houseMICE, which shows the MICE interface. You can input the URL to the targeted page into the interface to see extraction results. This applies to most static websites.

If you don't want to use Eclipse, or you want to run the service on a server, you can use the Ant target update-and-run-service in BigSemanticsService/build.xml. The steps are similar to above, except that you run the Ant target instead of running the two launch files.

Committing Your Work

Once you have finished wrapper authoring or feature development, you should commit and push your work. You will need to do so for all the affected submodules, as well as the umbrella BigSemantics Git repository. Submodules are 'pointers' to specific commits of the corresponding Git repository, instead of tracking the latest code. Thus, we recommend you follow these steps to make sure that your work is not lost:

  • Commit your work in the corresponding submodule. For example, if you authored new wrappers, your modifications will be in the project BigSemanticsWrappers, which belongs to the BigSemanticsWrapperRepository submodule.

  • Because a submodule does not track any specific branch, your work is now in a detached commit. You should create a local branch that points to this commit, to prevent losing it:

git checkout -b <branch-name>

We recommend a meaningful branch name, such as 'wrapper_for_urbanspoon' or 'cache_bug_fix'.

  • Commit your work to the local branch.

  • Switch to the master branch, pull down the latest code, and merge in the local branch you just created. Resolve any merge conflicts that come up.

  • Push the master branch of the submodule to GitHub.

  • Delete the local branch created before:

git branch -d <branch-name>

  • Go to the umbrella BigSemantics folder. If you run git status, each updated submodule should be listed with a note such as 'new commits'. Include all updated submodules:

git add -u

Then commit and push. This will update the BigSemantics repository to point to the right commits for submodules.
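The per-submodule steps above can be sketched as one shell function. This is a minimal offline sketch, not an official BigSemantics script: land_submodule_work is a hypothetical name, the pull and push steps are left as comments because they need a remote, and the remote name origin is an assumption.

```shell
# Hypothetical helper: move uncommitted work from a detached HEAD onto the
# local master branch of one submodule.
# Arguments: <submodule-folder> <branch-name> <commit-message>
land_submodule_work() {
  dir=$1; branch=$2; message=$3
  git -C "$dir" checkout -q -b "$branch" || return 1   # give the work a branch
  git -C "$dir" add -A || return 1                      # stage new and changed files
  git -C "$dir" commit -q -m "$message" || return 1
  git -C "$dir" checkout -q master || return 1
  # Network steps from the list above, left out of this offline sketch:
  #   git -C "$dir" pull                 (before the merge)
  #   git -C "$dir" push origin master   (after the merge; 'origin' is assumed)
  git -C "$dir" merge -q "$branch" || return 1
  git -C "$dir" branch -d "$branch"                     # delete the merged branch
}

# Example (placeholder folder, branch name and message):
#   land_submodule_work BigSemanticsWrapperRepository wrapper_for_urbanspoon "Add wrapper"
```

The function stops at the first failing step, so a merge conflict leaves the local branch in place for you to resolve by hand.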

If you are working with your fork of BigSemantics, use pull requests to share your changes with us and all the other BigSemantics users.

Updating BigSemantics

To update BigSemantics and all submodules to the latest code, run the following within the umbrella BigSemantics folder:

git pull 
git submodule update --recursive
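After updating, it can help to check that every submodule matches the commit recorded by the umbrella repository. The sketch below filters the output of git submodule status --recursive. The out_of_sync_submodules function is a hypothetical helper, not part of BigSemantics.

```shell
# List submodules whose checkout differs from what the umbrella repository records.
# In `git submodule status` output, a leading '-' means the submodule is not
# initialized, '+' means a different commit is checked out, and 'U' means it
# has merge conflicts.
out_of_sync_submodules() {
  git -C "${1:-.}" submodule status --recursive | grep '^[-+U]' || true
}

# Example (placeholder path):
#   out_of_sync_submodules BigSemantics
```

An empty result means all submodules are initialized and at the recorded commits.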