Integrating loosely structured data into the Linked Open Data cloud - A 2011 DataONE Summer Internship project.


This is the repository for artifacts (source code, documentation, RDF and other data output, etc.) from the 2011 DataONE Summer Internship project "Integrating loosely structured data into the Linked Open Data cloud".

The intern is Aída Gándara (University of Texas at El Paso) and the mentor is Hilmar Lapp (NESCent).

Information on this research effort can be found at

The current RDF data is stored at  and can be viewed through the CI-Server LOD4DataONE project at  Multiple servers were used because of PHP version issues encountered when the ARC2 library was added to the demonstration.

Sample queries resulting from this work, as well as the use cases that directed this work, can be found at


ld4d1 : this directory has module code that can be loaded onto a Drupal server.  A local version of ARC2 is expected in the libs/semsol-arc2-495d10b directory.  Modify the include lines if ARC2 is installed elsewhere. There are 3 files in the module:

	The module can be installed on any Drupal 6 installation running on PHP 5.2.  Place the ld4d1 directory in the modules directory and enable it through the admin pages.
The database settings will need to be modified in the ld4d1_reset_page() and ld4d1_queryview() functions within the file.  Set these to the appropriate settings for accessing the ARC2 database. 

	The ld4d1_reset_page() function will also need to be modified to load the RDF used by the query view.  Currently, the files loaded are located on a UTEP CI-Server called rio; change these locations if the files are placed elsewhere.

java  : this directory has the Java files that are used to extract data from the repositories and create RDF that is subsequently uploaded to a server.  This code was built in Eclipse.  Open Eclipse, create a Java project and load the 7 files.  mainC has the main() function:

	The build.xml file is an Ant script with instructions on creating the javadoc.

	To upload to a CI-Server, as occurs in the main() function, you will need an account on a CI-Server (Drupal) implementation and the ciclient.jar found at  Go to if you need more information on using CI-Server.  Otherwise, just replace the code in the file to put the content where you need it.

repository_rdf : this directory holds the RDF files that were uploaded to the server and are loaded into the ARC2 repository for the demonstration.  Their URIs are specific to the path on the CI-Server where they were uploaded and accessed from.  You will need to modify the path if they are published elsewhere.  Most of the queries do not search the repository-specific properties; they search more popular properties like Dublin Core or DBpedia, so most queries should not be affected by a move.  Some will be, though.
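If the RDF files are republished under a different path, the URIs inside them need the same change. The following is only a minimal sketch of one way to do that, using plain text replacement rather than an RDF parser. The class name RdfBaseRewriter and the example.org URIs are hypothetical and are not part of this project's code; substitute the actual old and new base paths.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original project code.
class RdfBaseRewriter {

    // Replaces every literal occurrence of oldBase with newBase.
    // This is a textual substitution; it does not parse the RDF.
    static String rewriteBase(String rdf, String oldBase, String newBase) {
        return rdf.replace(oldBase, newBase);
    }

    // Reads one RDF file as UTF-8, rewrites the base path, and writes the result.
    static void rewriteFile(Path in, Path out, String oldBase, String newBase)
            throws IOException {
        String rdf = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
        String updated = rewriteBase(rdf, oldBase, newBase);
        Files.write(out, updated.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.err.println("usage: RdfBaseRewriter <in> <out> <oldBase> <newBase>");
            return;
        }
        rewriteFile(Paths.get(args[0]), Paths.get(args[1]), args[2], args[3]);
    }
}
```

Because the substitution is purely textual, choose an old base string that is long enough not to match unrelated text, and check any query that refers to the repository-specific properties after the move.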

doc : this directory has javadoc descriptions of the java code.  Open index.html to see it.

data : the main() function refers to several static files that are located in a data directory.  This directory has those files; they were created manually to support the demonstration.  They are uploaded to a server within the main() function, which looks in c:/data for the files.  Modify the reference if the files will be placed elsewhere.
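Rather than editing the hard-coded location each time, the data directory could be made configurable. The following is a minimal sketch under that assumption. The class name DataDir and the system property name ld4d1.data.dir are hypothetical and do not exist in the project code; only the c:/data default comes from the description above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original project code.
class DataDir {

    // Default location that main() is described as using.
    static final String DEFAULT_DIR = "c:/data";

    // Returns the override when one is supplied, otherwise the default.
    static Path resolve(String override) {
        if (override == null || override.trim().isEmpty()) {
            return Paths.get(DEFAULT_DIR);
        }
        return Paths.get(override.trim());
    }

    public static void main(String[] args) {
        // "ld4d1.data.dir" is an assumed property name chosen for this sketch.
        Path dir = resolve(System.getProperty("ld4d1.data.dir"));
        System.out.println("Reading static files from: " + dir);
    }
}
```

With this approach the location could be set at launch, for example with -Dld4d1.data.dir=/home/user/data, and the code falls back to c:/data when no override is given.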

updates : this directory has two PowerPoint slideshows.  These were created in an effort to share the progress of this research effort.  I am not sure how useful they are, as they were created rather quickly and there was never any feedback.  The two files are: