The biblio transformation engine (BTE) is a java framework developed by the Hellenic National Documentation Centre (EKT) and consists of programmatic APIs for filtering and modifying records that are retrieved from various types of data sources (eg. databases, files, legacy data sources) as well as for outputing them in appropriate standards formats (eg. database files, txt, xml, Excel). The framework includes independent abstract modules that are executed separately, offering in many cases alternative choices to the user depending of the input data set, the transformation workflow that needs to be executed and the output format that needs to be generated.
The biblio-transformation-engine is included in the core distribution of the DSpace digital repository platform since version 3.0 and forms the basis of the batch import feature of DSpace (i.e. importing records of various standards bibliographic formats into DSpace repositories).
Version 0.9 is a major redesign of the library, that makes it easier for the user to add DataLoaders, Modifiers, Filters and OutputGenerators and offers greater control over the transformation process.
If you need an older version, it can be found here.
How to use the library
Biblio transformation engine uses the apache maven tool for building and dependency resolution. The library is split in two modules: bte-core, that includes the basic infrastructure of the BTE, and bte-io that includes some DataLoaders and OutputGenerators.
Getting the jar from the central maven repository
Include in the dependencies section of the pom.xml of your project the following:
<dependency> <groupId>gr.ekt.bte</groupId> <artifactId>bte-core</artifactId> <version>0.9.3.5</version> </dependency>
If you want to use some of the loaders or the generators in bte-io you will need to add the following as well:
<dependency> <groupId>gr.ekt.bte</groupId> <artifactId>bte-io</artifactId> <version>0.9.3.5</version> </dependency>
Building from source
Clone this git repository, and use maven install to add bte in your local maven repository:
git clone https://github.com/EKT/Biblio-Transformation-Engine.git cd Biblio-Transformation-Engine mvn package mvn install
After that you can add the xml snippets mentioned above, in your project’s pom.xml to make the library API available to your classes.
Most external dependencies come from the bte-io project. If you provide your own data loaders and output generators, you will not need them. In any case Maven handles the dependencies, so probably you will not need to concern yourself with them, but they are recorded here for reference.
One notable exception is that you will need the Java environment 1.7 or newer.
BTE parent dependencies (inherited by child projects)
- log4j version 1.2.17 (http://logging.apache.org/log4j/2.x/)
- junit version 4.11 (http://junit.org/)
- dom4j version 1.6.1 (http://dom4j.sourceforge.net/)
BTE io dependencies
- opencsv version 2.3 (http://opencsv.sourceforge.net/)
- google gson version 2.2.4 (https://code.google.com/p/google-gson/)
- jbibtex version 1.0.10 (https://code.google.com/p/java-bibtex/)
- oai4j version 0.6b1 (http://oai4j-client.sourceforge.net/)
See file ./LICENSE.txt
- version 0.9
- Major redesign of the library. Check the documentation (still in progress) for more detailed info.
- version 0.9.1
- Added an Endnote data loader
- Fixed DSpaceOutputGenerator bug: not producing empty .xml files if there are no data to write in them.
- version 0.9.2
- Added OAI-PMH dataloader
- Added an XML namespace manager for use with XPathRecord (for the moment it is hardcoded for the oai_dc and dc namespaces returned from OAI requests).
- Added the method getFields in the Record interface and implemented it in XPathRecord and MapRecord.
- Fixed DSpaceOutputGenerator bug: Consecutive calls to writeOutput, do not overwrite previously written data anymore.
- version 0.9.3
- Add Excel data loader
- Add MultiSource data loader
- Automatically install sources and javadocs in the local repository when issuing mvn install
- Bugfixes for OAI-PMH data loader
- version 0.9.3.1
- Fixed bug in OAIPMHDataLoader
- Fixed bug in ExcelDataLoader
- version 0.9.3.2
- Added ScreenOutputGenerator useful for debugging
- Updated jbibtex library version to 1.0.10
- Fixed bug in CSVDataLoader
- version 0.9.3.3
- Fixed a bug in CSVDataLoader
- version 0.9.3.4
- Added support for DSpace collections in DSpaceOutputGenerator
- DSpaceOutputGenerator no longer prints empty fields
- Fix issue with loops when requesting more data to process
- version 0.9.3.5
- Corrected bug in setting end_of_input in TransformationEngine.java
- Fixed bug in DSpaceOutputGenerator to not write empty values