Check out code and set up development environment
This page explains how to set up a development environment for BigSemantics. A development environment is required for authoring new wrappers or developing new features for BigSemantics. The development environment uses Git submodules to include multiple components of BigSemantics that work together.
Required and Optional Tools
Before you start, make sure that you have installed the following required tools:
Java SE Development Kit (JDK) 1.7 or later. You can download the latest version of the JDK from this page.
Apache Ant. We use Ant to build components. Ant can be installed as a command line tool. If you use Eclipse, it is already integrated.
Apache Maven. We use Maven to manage dependencies for some projects. Maven can be installed as a command line tool. If you use Eclipse, we recommend the m2e plugin for better integration. Visit the m2e download page to find the current URL of the m2e update site (note that the m2e download page itself is not the update site). To install m2e, open Eclipse's 'Install New Software' dialog from the 'Help' menu, and paste the URL of the update site that you found on the download page.
Optional but recommended:
Eclipse (J2EE version). A good IDE can make you more productive, and in the lab we prefer Eclipse. Some parts of the tutorials are Eclipse-specific.
IMPORTANT: You may also need to add javac to the system PATH variable (search for 'Change environment variables' for your specific OS). Add the 'bin' folder of your JDK installation (e.g. C:\Program Files\Java\jdk1.7.0_67\bin) to your PATH. You may need to restart running applications for the change to take effect.
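One quick way to confirm that the PATH change took effect is to ask the shell where each tool lives. The following is a minimal sketch for a POSIX shell (on Windows, an environment such as Git Bash is assumed). The helper name have_tool is made up for illustration, and ant and mvn only matter if you run those tools from the command line rather than through Eclipse.

```shell
#!/bin/sh
# Report whether each required command-line tool can be found on PATH.
# have_tool is a hypothetical helper, not part of BigSemantics.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

for tool in javac ant mvn git; do
    if have_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "MISSING: $tool"
    fi
done
```

If javac is reported as missing after you edited PATH, open a new terminal first, since existing sessions keep the old value.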
Forking Projects and Checking Out the Code
If you are going to author new wrappers or write code for BigSemantics, but don't have push permission to BigSemantics, you will need to fork the following projects before checking out the code. Otherwise, skip to the 'Checking Out the Latest Code' subsection.
BigSemantics is the umbrella project that pulls all the other components in one environment.
BigSemanticsWrapperRepository contains all the wrappers and generated metadata classes.
BigSemanticsJava contains the basic BigSemantics architecture. If you want to change or extend the BigSemantics architecture, such as changing the way wrappers are handled or adding a completely new method to extract metadata, you will develop code in this Git repository.
BigSemanticsService contains code for the BigSemantics web service.
BigSemanticsCSharp supports building C# applications that use BigSemantics.
Checking Out the Latest Code
After forking those projects, you can get all the components of BigSemantics by running:
git clone --recursive http://github.com/<your-github-account-name>/BigSemantics
After authoring new wrappers or developing new features for BigSemantics, use pull requests to share your changes with us and all the other BigSemantics users.
If you do not need to push or already have push permission to BigSemantics, you can check out code using:
git clone --recursive http://github.com/ecologylab/BigSemantics
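If a clone was made without --recursive, the submodule folders exist but are empty. The sketch below shows one way to detect that, relying on the fact that git submodule status prefixes uninitialized submodules with "-". The function name uninitialized_submodules and the ./BigSemantics location are assumptions for illustration.

```shell
#!/bin/sh
# List submodules that have not been checked out yet.
# In `git submodule status` output, a leading "-" marks an uninitialized submodule.
# uninitialized_submodules is a hypothetical helper name.
uninitialized_submodules() {
    git -C "$1" submodule status --recursive | awk '/^-/ { print $2 }'
}

# Usage (assumes the clone lives in ./BigSemantics):
#   missing=$(uninitialized_submodules BigSemantics)
#   if [ -n "$missing" ]; then
#       git -C BigSemantics submodule update --init --recursive
#   fi
```

Running git submodule update --init --recursive fetches the missing components and should leave the clone in the same state as a --recursive clone.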
Importing Projects Into Eclipse
If you use Eclipse, you will also want to import the various projects into an Eclipse workspace. Follow these steps:
Start Eclipse, and choose a folder as the workspace for BigSemantics. We suggest that you use a different folder than the folder that contains the code.
Select 'File' -> 'Import', and then select 'Existing Projects into Workspace' from 'General' in the tree view. Then click 'Next'.
In the newly opened dialog, choose 'Select root directory', and use 'Browse' to select the BigSemantics folder you cloned from GitHub. Projects in that folder should be listed.
To get started, we recommend including all projects, except BigSemanticsAndroidSpecifics if you are not using BigSemantics on Android devices.
Click 'Finish'. The projects should show in Eclipse's Project Explorer. It may take a minute for Eclipse to compile all the projects.
Authoring Wrappers and Seeing Changes in Action
BigSemantics provides convenience tools for you to see changes you made in action while you author new or edit existing wrappers.
The basic workflow is through Eclipse:
- Set up the development environment, as previously explained in this Wiki page.
- Author new wrappers or make changes to existing wrappers.
- Stop any running BigSemantics service process. If the process was started in Eclipse, you can just press the red Stop button in Eclipse's Console view.
- In Eclipse, run BigSemanticsSDK/CompileToJava.launch by right-clicking the file and choosing 'Run As' -> 'Java Application'.
- Refresh BigSemanticsGeneratedClassesJava by right-clicking that project in Eclipse and choosing 'Refresh'. (This step is necessary because the previous step may change files in BigSemanticsGeneratedClassesJava, and in many cases Eclipse fails to detect these file system changes automatically.)
- Run BigSemanticsService/RunService.launch. A new BigSemantics service process will be spawned, and you should be able to see its output in Eclipse's Console view.
To see extraction results for your targeted page:
- Install our Chrome extension, and enable Developer Mode in its Options. The extension then lets you see the semantics extracted from the current page by clicking the button in the address bar. Note that you will need to change the options of the extension to make it use the local instance of the BigSemantics service: use localhost as the host name, 8080 as the port number, and 8443 as the secure port number.
- Alternatively, you can point your browser to http://localhost:8080/static/houseMICE, which shows the MICE interface. You can enter the URL of the targeted page into the interface to see extraction results. This applies to most static websites.
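Before opening the browser, it can help to confirm that the local service is actually listening. The sketch below uses curl for that and assumes the host and port given above. The function name service_is_up is hypothetical, and the check only confirms that something answers HTTP on that port, not that extraction works.

```shell
#!/bin/sh
# Succeed if an HTTP server answers on the given host and port.
# service_is_up is a hypothetical helper name.
service_is_up() {
    # -s hides the progress meter, -o /dev/null discards the response body,
    # and --max-time bounds the wait. curl exits non-zero when it cannot connect.
    curl -s -o /dev/null --max-time 5 "http://$1:$2/"
}

if service_is_up localhost 8080; then
    echo "BigSemantics service is reachable on localhost:8080"
else
    echo "nothing is answering on localhost:8080; is the service running?"
fi
```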
If you don't want to use Eclipse, or you want to run the service on a server, you can use the Ant target in BigSemanticsService/build.xml. The steps are similar to those above, except that you run the Ant target instead of running the two launch files.
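The name of the Ant target is not given here, so one way to find it is to ask Ant for the project help of the build file. This is a sketch that assumes ant is on your PATH. The function name list_ant_targets is made up for illustration, and <target-name> stays a placeholder for whatever the listing shows.

```shell
#!/bin/sh
# Print the project help (including targets) of an Ant build file.
# list_ant_targets is a hypothetical helper name.
list_ant_targets() {
    build_file=$1
    if [ ! -f "$build_file" ]; then
        echo "build file not found: $build_file" >&2
        return 1
    fi
    # -f selects the build file; -projecthelp prints its targets without running them.
    ant -f "$build_file" -projecthelp
}

# Usage, from the umbrella BigSemantics folder:
#   list_ant_targets BigSemanticsService/build.xml
#   ant -f BigSemanticsService/build.xml <target-name>
```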
Committing Your Work
Once you have finished wrapper authoring or feature development, you should commit and push your work. You will need to do so for all the affected submodules, as well as the umbrella BigSemantics Git repository. Submodules are 'pointers' to specific commits of the corresponding Git repository, instead of tracking the latest code. Thus, we recommend you follow these steps to make sure that your work is not lost:
Commit your work in the corresponding submodule. For example, if you authored new wrappers, your modifications will be in the project BigSemanticsWrappers, which belongs to the BigSemanticsWrapperRepository submodule.
Because a submodule does not track any specific branch, your work is now on a detached commit (a 'detached HEAD'). You should create a local branch that points to this commit, to prevent losing it:
git checkout -b <branch-name>
We recommend a meaningful branch name, such as 'wrapper_for_urbanspoon' or 'cache_bug_fix'.
Commit your work to the local branch.
Switch to the master branch, pull down the latest code, and merge in the local branch you just created. Resolve any merge conflicts that come up.
Push the master branch of the submodule to GitHub.
Delete the local branch created before:
git branch -d <branch-name>
- Go to the umbrella BigSemantics folder. If you run git status, you should see the updated submodules marked with comments like 'new commits'. Include all updated submodules:
git add -u
Then commit and push. This will update the BigSemantics repository to point to the right commits for submodules.
If you are working with your fork of BigSemantics, use pull requests to share your changes with us and all the other BigSemantics users.
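The submodule steps above can be collected into one small script. This is a sketch, not the project's official tooling: the function name share_submodule_work is hypothetical, the remote is assumed to be called origin, and git add -A is an assumption that every change in the submodule should be committed. Each command runs only if the previous one succeeded, so a merge conflict stops the sequence before anything is pushed.

```shell
#!/bin/sh
# Sketch of the submodule part of the commit workflow.
# Arguments: submodule directory, temporary branch name, commit message.
# Runs in a subshell, so the caller's working directory is unchanged.
share_submodule_work() (
    cd "$1" || exit 1
    branch=$2
    message=$3

    # Give the detached work a branch so that it cannot be lost.
    git checkout -b "$branch" &&
    # Assumption: stage every change in the submodule.
    git add -A &&
    git commit -m "$message" &&
    git checkout master &&
    # Bring in the latest code; local master has no commits of its own yet.
    git pull --ff-only origin master &&
    # Stops here if there are conflicts to resolve by hand.
    git merge --no-edit "$branch" &&
    git push origin master &&
    git branch -d "$branch"
)

# Usage, from the umbrella BigSemantics folder (names taken from the examples above;
# the submodule folder name is assumed to match the repository name):
#   share_submodule_work BigSemanticsWrapperRepository wrapper_for_urbanspoon "Add Urbanspoon wrapper"
#   git add -u
#   git commit -m "Update submodule pointers"
#   git push
```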
To update BigSemantics and all submodules to latest code, within the umbrella BigSemantics folder run:
git pull
git submodule update --recursive