[DS-1226] Batch import from major bibliographic formats #46

Merged
merged 2 commits into from Sep 22, 2012

Conversation


@EKT EKT commented Aug 7, 2012

EKT's extension to DSpace to support batch import from major bibliographic formats

https://jira.duraspace.org/browse/DS-1226

@ghost

ghost commented Aug 13, 2012

Hi all, this would be a great contribution, just a few observations for anyone else reviewing the code based on a very quick initial look...

  1. Uses Spring directly rather than via dspace-services. Not a big deal; this could easily be refactored if required.
  2. Introduces a new dspace-api/lib directory for the two new jars. I think we would prefer to put them in our own Maven space if they are not in any other Maven repo.
  3. The mapping config files that map the incoming values to Dublin Core are in the biblio-transformation jar. What happens if my use of DC is non-standard? Can I override these files? Is the source code for this jar available anywhere?

Cheers, Robin.

@ghost

ghost commented Aug 13, 2012

Just another wee observation: looking into the biblio-transformation-engine.jar, I notice that the pom has a number of dependencies, some of which contain variables, e.g. ...

uk.ac.shef.wit simmetrics 1.0 system ${basedir}/lib/simmetrics_v1_6_2_ekt.jar

Can these be removed ?

Cheers.

@ghost

ghost commented Aug 13, 2012

Doh ! The XML notation is not apparent in that last comment. Imagine those are the values in a typical maven dependency.

@ghost

ghost commented Aug 13, 2012

Found the source code for the biblio-transformation-engine.jar. Should have read the Jira record first :)

@ghost

ghost commented Aug 15, 2012

Copied from Jira issue...

Hi Kostas,

I am having a bit of trouble testing this. The error I am getting is ...

java.lang.NoClassDefFoundError: gr/ekt/transformationengine/exceptions/UnknownClassifierException

If I understand correctly then the script 'dspace' uses dspace/lib as its classpath, but the new jars are not in dspace/lib. Have I misunderstood something ?

Thanks, Robin.
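
A quick way to confirm a classpath problem like the one above is to ask the class loader directly whether the class is visible. The following is only a minimal sketch: `ClasspathCheck` and `isLoadable` are hypothetical names, not part of DSpace or of the pull request, and the check only reflects the classpath the sketch itself is run with.

```java
// Hypothetical helper (not part of DSpace): reports whether a class
// can be located by the class loader that loaded this helper.
final class ClasspathCheck {

    static boolean isLoadable(String className) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so it is covered here as well
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the error above (dots instead of slashes)
        String name = args.length > 0
                ? args[0]
                : "gr.ekt.transformationengine.exceptions.UnknownClassifierException";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

Run with the same classpath as the failing command, it would report `false` for the missing class until the jars are added.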

@kstamatis
Member

Copied from Jira issue...

@robintaylor
Answered my own question by manually copying the missing jars into dspace/lib, and it's now working fine.

@robintaylor
Hi Kostas,
With regard to the two jars, I was wondering if you had given any thought to putting them into the Maven Central Repository?
Cheers, Robin.

@kstamatis
Regarding the problem with the jars: I added the two jars locally and made the pom dependencies resolve them from the local filesystem. That may be why they are not copied to the dspace/lib folder after the Maven build/package command, and why you get the class-not-found exception. Since you solved the problem, I guess we are OK!
As for your next comment, the biblio-transformation-engine is a tool that EKT has developed, so I am pretty sure that we can add it to the Maven Central Repository. The other jar, the jbibtex jar, is a third-party jar, and I do not know whether we can add it to the Maven repo. If that is a problem, we can remove it entirely from the biblio-transformation-engine or rewrite the relevant part ourselves so that there are no dependencies on third-party jars.

@kstamatis
Member

Dear Robin,

Regarding your comment above about the dependencies of the biblio-transformation-engine: yes, the simmetrics library can be removed with no side effects.

There is also a second dependency that contains variables, the jbibtex jar, but this cannot be removed if we want to support batch import from a BibTeX file. However, I guess this jar could be hosted in some Maven repo, couldn't it?

Thanks a lot for your time and interest in this DSpace contribution.

@kstamatis
Member

Hi Robin,

just to add a comment for anyone else reviewing the code, regarding part 3 of your initial comment.

The configuration files (Spring-based XML config files) that control the operation of the biblio-transformation-engine are located in the config folder of the DSpace project, not in the biblio-transformation-engine jar file. Within these files, you can define the mapping from the input format to the DC metadata schema of DSpace (even if the latter is a non-standard DC schema).
Depending on the input format, these files are named as follows: config/spring-bibtex2dspace.xml, config/spring-csv2dspace.xml, config/spring-tsv2dspace.xml, config/spring-ris2dspace.xml, config/spring-endnote2dspace.xml
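
The naming pattern of the files listed above can be summarised in a few lines. This is only an illustrative sketch: `MappingConfigNames` and `configFileFor` are hypothetical names, and the thread does not say how DSpace itself selects the file, so the lookup logic here is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of DSpace): derives the mapping config
// file name from an input format key, following the names listed above.
final class MappingConfigNames {

    // The five input formats named in the comment above
    static final List<String> FORMATS =
            Arrays.asList("bibtex", "csv", "tsv", "ris", "endnote");

    static String configFileFor(String format) {
        String key = format.toLowerCase(Locale.ROOT);
        if (!FORMATS.contains(key)) {
            throw new IllegalArgumentException("Unsupported input format: " + format);
        }
        return "config/spring-" + key + "2dspace.xml";
    }

    public static void main(String[] args) {
        for (String format : FORMATS) {
            System.out.println(format + " -> " + configFileFor(format));
        }
    }
}
```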

Thanks a lot!

@abollini
Member

please check the comment on the jira issue:
https://jira.duraspace.org/browse/DS-1226?focusedCommentId=25944#comment-25944

@kstamatis
Member

@mdiggory

Dear Mark,

Thank you for your comments (which are now lost, I am sorry: after the latest squash I deleted the old branch, which seems to have removed your comments).
I changed the logic for the Biblio-Transformation-Engine import procedure and now use a DataLoaderService to load all available data loaders (ours + user-specific); given the input key, the appropriate data loader is loaded.
With this extension, as you said, users can now specify their own custom data loaders and just add an entry in the Spring configuration file in order to use them.
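
The idea of selecting a data loader by input key can be sketched roughly as follows. This is not the code from the pull request: `DataLoaderSketch`, `DataLoader`, `Registry`, and `forInputType` are hypothetical names, and the map is built by hand here, whereas in the contribution the entries would come from the Spring configuration file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a keyed data loader lookup (names are illustrative only).
final class DataLoaderSketch {

    // Stand-in for whatever a real data loader exposes
    interface DataLoader {
        String describe();
    }

    // Plays the role the comment above assigns to the DataLoaderService
    static final class Registry {
        private final Map<String, DataLoader> loaders = new LinkedHashMap<>();

        Registry(Map<String, DataLoader> configured) {
            loaders.putAll(configured);
        }

        DataLoader forInputType(String inputType) {
            DataLoader loader = loaders.get(inputType);
            if (loader == null) {
                throw new IllegalArgumentException(
                        "No data loader registered for input type: " + inputType);
            }
            return loader;
        }
    }

    // Built-in loaders plus one user-specific entry, as a config file might declare them
    static Registry demoRegistry() {
        Map<String, DataLoader> configured = new LinkedHashMap<>();
        configured.put("bibtex", () -> "BibTeX loader");
        configured.put("ris", () -> "RIS loader");
        configured.put("my-format", () -> "custom user loader");
        return new Registry(configured);
    }

    public static void main(String[] args) {
        System.out.println(demoRegistry().forInputType("my-format").describe());
    }
}
```

Keeping the lookup behind a single map means a custom loader needs only one extra entry, which matches the "just add an entry" point made above.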

Regarding your comment in yesterday's DevMtg:

[20:22] I am happy to see the servicemanager support. I'm still seeing use of package names that are not org.dspace

I cannot see any packages named "gr.ekt": for this extension we didn't write our own classes, we just added code to the ItemImport.java class that DSpace already had. Please verify that this is true.

Also, after the new commit, I fetched the latest version of DSpace master, so no conflicts should appear when merging this contribution.

Thanks a lot,

Kostas

mdiggory added a commit that referenced this pull request Sep 22, 2012
[DS-1226] Batch import from major bibliographic formats
@mdiggory mdiggory merged commit afc4b09 into DSpace:master Sep 22, 2012
@kutsurak kutsurak deleted the DS-1226 branch February 13, 2014 13:56
mdiggory referenced this pull request in atmire/DSpace Jun 13, 2014
[DS-1226] Batch import from major bibliographic formats
alanorth referenced this pull request in alanorth/DSpace Dec 11, 2014
Addresses #46.

Signed-off-by: Alan Orth <a.orth@cgiar.org>
alanorth referenced this pull request in alanorth/DSpace Dec 11, 2014
Addresses #46.

Still to figure out which strings we need to print as "subject" and
which we need to print as "subjects".  From looking at the XMLUI
string names it's not exactly obvious where they will be used, so it
makes it hard to judge what context they will be printed in!

Signed-off-by: Alan Orth <a.orth@cgiar.org>
hardyoyo pushed a commit to hardyoyo/DSpace that referenced this pull request Oct 14, 2015
…e-autowarming

re DSpace#1501 set autowarmingCount=100 for the search Solr core
hardyoyo added a commit to hardyoyo/DSpace that referenced this pull request Oct 12, 2017
* [VSIM-79] updated Maven overrides for handlebars templates with versions from upstream Mirage2

* [VSIM-79] Wait! Don't override things you haven't changed, that's not how this works!
hardyoyo added a commit to hardyoyo/DSpace that referenced this pull request Dec 15, 2017
* [VSIM-79] updated Maven overrides for handlebars templates with versions from upstream Mirage2

* [VSIM-79] Wait! Don't override things you haven't changed, that's not how this works!
antoine-atmire referenced this pull request in atmire/DSpace Oct 10, 2018
hardyoyo added a commit to hardyoyo/DSpace that referenced this pull request Nov 28, 2018
* [VSIM-79] updated Maven overrides for handlebars templates with versions from upstream Mirage2

* [VSIM-79] Wait! Don't override things you haven't changed, that's not how this works!

4 participants