Integrating loosely structured data into the Linked Open Data cloud - A 2011 DataONE Summer Internship project.

README

This is the repository for artifacts (source code, documentation, RDF and other data output, etc) from the 2011 DataONE Summer Internship project "Integrating loosely structured data into the Linked Open Data cloud".

The intern is Aída Gándara (University of Texas at El Paso) and the mentor is Hilmar Lapp (NESCent).

Information on this research effort can be found at http://notebooks.dataone.org/linked-data/

The current RDF data is stored at http://rio.cs.utep.edu/ciserver/ciprojects/sdata/... and can be viewed through the CI-Server LOD4DataONE project at http://rio.cs.utep.edu/ciserver/ciserver/viewresource/LOD4DataONE.  Multiple servers were used because of PHP version issues encountered when the ARC2 library was added to the demonstration.

Sample queries resulting from this work, as well as the use cases that directed this work, can be found at http://manaus.cs.utep.edu/lod4d1.

IN THIS GITHUB REPOSITORY, YOU WILL FIND:

ld4d1 : this directory has module code that can be loaded onto a Drupal server.  A local copy of ARC2, https://github.com/semsol/arc2/wiki, is expected in the libs/semsol-arc2-495d10b directory.  Modify the include lines if ARC2 is installed elsewhere.  There are 4 files in the module:
	ld4d1_functions.inc
	ld4d1.info
	ld4d1.install
	ld4d1.module
	

	The module can be installed on any Drupal 6 installation running on PHP 5.2.  Place the ld4d1 directory in the modules directory and enable it through the admin pages.

	The database settings will need to be modified in the ld4d1_reset_page() and ld4d1_queryview() functions within the ld4d1_functions.inc file.  Set these to the appropriate settings for accessing the ARC2 database.

	The ld4d1_reset_page() function will also need to be modified to load the RDF used by the query view.  Currently, the files loaded are located on a UTEP CI-Server called rio; change these references if the files are placed elsewhere.

java  : this directory has the Java files that are used to extract data from the repositories and create RDF that is subsequently uploaded to a server.  This code was built in Eclipse.  Open Eclipse, create a Java project and load the 7 files.  mainC.java contains the main() function:
     D1DryadOAIPMHMapper.java
     D1KNBMetacatMapper.java
     D1ORNLDAACMapper.java
     DryadDataSet.java
     KNBDataSet.java
     mainC.java
     build.xml

	The build.xml file is an Ant script with instructions on creating the javadoc.

	To upload to a CI-Server, as occurs in the main() function, you will need an account on a CI-Server (Drupal) implementation and the ciclient.jar found at http://rio.cs.utep.edu/ciserver/CI-Downloads.  Go to http://rio.cs.utep.edu/ciserver/AboutUs if you need more information on using CI-Server.  Otherwise, just replace the code in the mainC.java file to put the content where you need it.
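
	If no CI-Server is available, one possible replacement for the upload step is to write each generated RDF document to a local directory.  The following is only a minimal sketch under that assumption: the class LocalRdfWriter, its write() method and the file names are hypothetical and are not part of this repository, and it assumes the RDF is available as a string at the point where main() currently uploads it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not in the repository): stores RDF locally instead of uploading it. */
final class LocalRdfWriter {

    private LocalRdfWriter() {
    }

    /**
     * Writes the RDF text to outputDir/fileName as UTF-8, creating the
     * directory first if it does not exist, and returns the written path.
     */
    static Path write(Path outputDir, String fileName, String rdfXml) {
        try {
            Files.createDirectories(outputDir);
            Path target = outputDir.resolve(fileName);
            Files.write(target, rdfXml.getBytes(StandardCharsets.UTF_8));
            return target;
        } catch (IOException e) {
            // Rethrow unchecked so callers in main() need no extra throws clause.
            throw new UncheckedIOException(e);
        }
    }
}
```

	A call such as LocalRdfWriter.write(Paths.get("rdf-out"), "dryad.rdf", rdfText) could then stand in for the upload call, where rdfText is whatever string the mapper produced.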

repository_rdf : this directory holds the RDF files that were uploaded to the server and are loaded into the ARC2 repository for the demonstration.  Their URIs are specific to the path on the CI-Server where they were uploaded and accessed from.  You will need to modify the path if they are published elsewhere.  Most of the queries search widely used properties, such as those from Dublin Core or DBpedia, rather than repository-specific ones, so most queries should not be affected by a move, though some will be.
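
	The path change described above could be scripted rather than done by hand.  The following is a rough sketch, not the method used in this project: the class RdfBaseRewriter is hypothetical, and the old and new base URIs are parameters that you would supply yourself.  It does a plain text substitution, so it would also change any literal that happens to contain the old base.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not in the repository): moves RDF URIs to a new base path. */
final class RdfBaseRewriter {

    private RdfBaseRewriter() {
    }

    /** Replaces every occurrence of oldBase in the RDF text with newBase. */
    static String rewriteBase(String rdf, String oldBase, String newBase) {
        return rdf.replace(oldBase, newBase);
    }

    /** Rewrites one RDF file in place, reading and writing it as UTF-8. */
    static void rewriteFile(Path rdfFile, String oldBase, String newBase) {
        try {
            String original = new String(Files.readAllBytes(rdfFile), StandardCharsets.UTF_8);
            String updated = rewriteBase(original, oldBase, newBase);
            Files.write(rdfFile, updated.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

	Queries that reference repository-specific URIs would need the same substitution applied to them.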

doc : this directory has javadoc descriptions of the java code.  Open index.html to see it.

data : the main() function refers to several static files that are located in a data directory.  This directory has those files, which were created manually to support the demonstration.  They are uploaded to a server within the main() function, which looks in c:/data for the files; modify the reference if the files will be placed elsewhere.
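
	One way to avoid editing the hard-coded c:/data reference for each machine is to read the location from a system property and fall back to c:/data when none is given.  This is only a sketch of that idea: the class DataDir and the property name ld4d1.data.dir are hypothetical and do not exist in mainC.java.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not in the repository): locates the static data files. */
final class DataDir {

    /** Default location that the README says main() looks in. */
    static final String DEFAULT_DIR = "c:/data";

    private DataDir() {
    }

    /** Returns the override if one is given, otherwise the default directory. */
    static Path resolve(String override) {
        boolean missing = override == null || override.trim().isEmpty();
        return Paths.get(missing ? DEFAULT_DIR : override);
    }

    /** Reads the override from the (assumed) ld4d1.data.dir system property. */
    static Path fromSystemProperty() {
        return resolve(System.getProperty("ld4d1.data.dir"));
    }
}
```

	With this in place, the program could be started with -Dld4d1.data.dir=/path/to/data, and main() would build file paths with DataDir.fromSystemProperty().resolve(fileName).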

updates : this directory has two PowerPoint slideshows.  These were created in an effort to share the progress of this research.  I am not sure how useful they are, as they were created rather quickly and never received any feedback.  The two files are:
	LOD4DataONEWeek3EX.ppsx
	LOD4DataONEWeek5.pptx