This repository has been archived by the owner on Nov 18, 2020. It is now read-only.

Doesn't like my config files. #1

Closed
whikloj opened this issue Nov 21, 2016 · 35 comments

Comments

@whikloj
Contributor

whikloj commented Nov 21, 2016

Tried to use this but am having trouble with my Akubra storage config file.

[whikloj@juno]/opt/fcrepo3-rdf-extractor% java -jar target/fcrepo3-rdf-extractor-0.0.1-SNAPSHOT.jar -a /usr/local/fedora/server/config/spring/akubra-llstore.xml -o /local/dam/staging/jareds_triples/juno_20161121.sparql
INFO 15:50:26.012 (edu.si.fcrepo.Extract) Using 4 threads for extraction and a queue size of 1048576.
INFO 15:50:26.018 (edu.si.fcrepo.Extract) Extracting to /local/dam/staging/jareds_triples/juno_20161121.sparql...
INFO 15:50:26.018 (edu.si.fcrepo.Extract) with Akubra configuration from /usr/local/fedora/server/config/spring/akubra-llstore.xml.
INFO 15:50:26.082 (org.springframework.context.support.FileSystemXmlApplicationContext) Refreshing org.springframework.context.support.FileSystemXmlApplicationContext@97e93f1: startup date [Mon Nov 21 15:50:26 GMT-06:00 2016]; root of context hierarchy
INFO 15:50:26.111 (org.springframework.beans.factory.xml.XmlBeanDefinitionReader) Loading XML bean definitions from URL [file:/usr/local/fedora/server/config/spring/akubra-llstore.xml]
INFO 15:50:26.169 (org.springframework.beans.factory.support.DefaultListableBeanFactory) Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@4fa1c212: defining beans [org.fcrepo.server.storage.lowlevel.ILowlevelStorage,org.fcrepo.server.storage.lowlevel.akubra.AkubraLowlevelStorage,objectStore,fsObjectStore,fsObjectStoreMapper,datastreamStore,fsDatastreamStore,fsDatastreamStoreMapper,fedoraStorageHintProvider]; root of factory hierarchy
INFO 15:50:26.170 (org.springframework.beans.factory.support.DefaultListableBeanFactory) Destroying singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@4fa1c212: defining beans [org.fcrepo.server.storage.lowlevel.ILowlevelStorage,org.fcrepo.server.storage.lowlevel.akubra.AkubraLowlevelStorage,objectStore,fsObjectStore,fsObjectStoreMapper,datastreamStore,fsDatastreamStore,fsDatastreamStoreMapper,fedoraStorageHintProvider]; root of factory hierarchy
Exception in thread "main" org.springframework.beans.factory.CannotLoadBeanClassException: Cannot find class [org.fcrepo.server.storage.lowlevel.akubra.AkubraLowlevelStorageModule] for bean with name 'org.fcrepo.server.storage.lowlevel.ILowlevelStorage' defined in URL [file:/usr/local/fedora/server/config/spring/akubra-llstore.xml]; nested exception is java.lang.ClassNotFoundException: org.fcrepo.server.storage.lowlevel.akubra.AkubraLowlevelStorageModule
        at org.springframework.beans.factory.support.AbstractBeanFactory.resolveBeanClass(AbstractBeanFactory.java:1261)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.predictBeanType(AbstractAutowireCapableBeanFactory.java:575)
        at org.springframework.beans.factory.support.AbstractBeanFactory.isFactoryBean(AbstractBeanFactory.java:1330)
        at org.springframework.beans.factory.support.AbstractBeanFactory.isFactoryBean(AbstractBeanFactory.java:896)
        at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:566)
        at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:895)
        at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:425)
        at org.springframework.context.support.FileSystemXmlApplicationContext.<init>(FileSystemXmlApplicationContext.java:140)
        at org.springframework.context.support.FileSystemXmlApplicationContext.<init>(FileSystemXmlApplicationContext.java:84)
        at edu.si.fcrepo.Extract.init(Extract.java:194)
        at edu.si.fcrepo.Extract.main(Extract.java:157)
Caused by: java.lang.ClassNotFoundException: org.fcrepo.server.storage.lowlevel.akubra.AkubraLowlevelStorageModule
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.springframework.util.ClassUtils.forName(ClassUtils.java:257)
        at org.springframework.beans.factory.support.AbstractBeanDefinition.resolveBeanClass(AbstractBeanDefinition.java:408)
        at org.springframework.beans.factory.support.AbstractBeanFactory.doResolveBeanClass(AbstractBeanFactory.java:1282)
        at org.springframework.beans.factory.support.AbstractBeanFactory.resolveBeanClass(AbstractBeanFactory.java:1253)
        ... 10 more
[whikloj@juno]/opt/fcrepo3-rdf-extractor% 

My akubra-llstore.xml -> https://gist.github.com/whikloj/584dea271c6e872e4b3d574676781bcc

@ajs6f
Owner

ajs6f commented Nov 21, 2016

@whikloj, this is a semi-known issue that has to do with avoiding classpath problems that arise when pulling in the gargantuan Fedora server classpath. I know the problem and have a fix, which I will get taken care of sometime in the next few days (Tgiving holiday). In the meantime, there is a workaround that @ruebot knows or with which I can help you via IRC in the next day or so. It involves removing all of the org.fcrepo.server.storage beans from a copy of your Akubra config. It will end up looking somewhat like this but with different ID mappers.

@whikloj
Contributor Author

whikloj commented Nov 21, 2016

Ok, thanks. I'll bug @ruebot about it tomorrow. No rush.

@ruebot

ruebot commented Nov 22, 2016

I got yo back!

/md1200/vol1/fedora_data is your shibboleth.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd">
<beans>


  <bean name="objectStore" class="org.akubraproject.map.IdMappingBlobStore"
    singleton="true">
    <constructor-arg value="urn:example.org:objectStore" />
    <constructor-arg>
      <ref bean="fsObjectStore" />
    </constructor-arg>
    <constructor-arg>
      <ref bean="fsObjectStoreMapper" />
    </constructor-arg>
  </bean>

  <bean name="fsObjectStore" class="org.akubraproject.fs.FSBlobStore"
    singleton="true">
    <constructor-arg value="urn:example.org:fsObjectStore" />
    <constructor-arg value="/md1200/vol1/fedora_data/objectStore"/>
  </bean>

  <bean name="fsObjectStoreMapper"
    class="org.fcrepo.server.storage.lowlevel.akubra.HashPathIdMapper"
    singleton="true">
    <constructor-arg value="##" />
  </bean>

  <bean name="datastreamStore" class="org.akubraproject.map.IdMappingBlobStore"
    singleton="true">
    <constructor-arg value="urn:fedora:datastreamStore" />
    <constructor-arg>
      <ref bean="fsDatastreamStore" />
    </constructor-arg>
    <constructor-arg>
      <ref bean="fsDatastreamStoreMapper" />
    </constructor-arg>
  </bean>

  <bean name="fsDatastreamStore" class="org.akubraproject.fs.FSBlobStore"
    singleton="true">
    <constructor-arg value="urn:example.org:fsDatastreamStore" />
    <constructor-arg value="/md1200/vol1/fedora_data/datastreamStore"/>
  </bean>

  <bean name="fsDatastreamStoreMapper"
    class="org.fcrepo.server.storage.lowlevel.akubra.HashPathIdMapper"
    singleton="true">
    <constructor-arg value="##" />
  </bean>

</beans>

@ruebot

ruebot commented Nov 22, 2016

...and @whikloj

java -jar fcrepo3-rdf-extractor-0.0.1-SNAPSHOT.jar -a /usr/local/fedora/server/config/spring/akubra-llstore.xml -o /md1200/vol1/backup/yudl_triples.n3 2>&1 | tee ~/hotindexer.log is what I used.

...and you'll want to use the --graph option as well, and specify a URI for <#ri> which could be something like info:edu.si.fedora#ri according to @ajs6f

@ajs6f
Owner

ajs6f commented Nov 22, 2016

If you are expecting to use this with the trippi-sparql connector, then the simplest thing to do graphname-wise is exactly what @ruebot writes. I need to document what's going on there better. (Short story: the <#ri> URI that Fedora uses by default is relative-- it's illegal to have a relative URI in that slot, but that wasn't clear years ago.)

@ajs6f
Owner

ajs6f commented Nov 22, 2016

@whikloj I have a much simpler workaround to try: please try adding a single attribute default-lazy-init="true" to the top-level beans element in your Akubra config file. This will work fine with both your repo, and the hot indexer.
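A minimal sketch of that edit, done on a copy so the live Fedora config stays untouched. It assumes the top-level element is written exactly as `<beans>` with no other attributes (as in the example config earlier in this thread); the function name and the output path are hypothetical.

```shell
# Hypothetical helper: write a copy of an Akubra Spring config with
# default-lazy-init="true" added to the top-level <beans> element.
# Assumes the opening tag is literally "<beans>" with no other attributes.
add_lazy_init() {        # add_lazy_init SOURCE_XML DEST_XML
  src="$1"
  dst="$2"
  sed 's|<beans>|<beans default-lazy-init="true">|' "$src" > "$dst"
}

# Usage (the output path is a placeholder):
#   add_lazy_init /usr/local/fedora/server/config/spring/akubra-llstore.xml ./akubra-llstore-lazy.xml
```

If the opening tag already carries attributes or namespaces, editing it by hand is safer than a text substitution.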

@whikloj
Contributor Author

whikloj commented Nov 22, 2016

Cool, I'm just rebuilding some derivatives and then I'll give this a try.

@whikloj
Contributor Author

whikloj commented Nov 24, 2016

Ok, so the problem in my akubra-llstore.xml still exists; adding default-lazy-init="true" just hid it. I added a logback.xml with root at DEBUG and got the following log: rdf-extractor.log.

I'm trying @ruebot's file example as I think the <bean name="org.fcrepo.server.storage.lowlevel.ILowlevelStorage" and <bean name="org.fcrepo.server.storage.lowlevel.akubra.AkubraLowlevelStorage" are the problem.

@ajs6f
Owner

ajs6f commented Nov 24, 2016

@ruebot's example should certainly work, but it is odd that the default-lazy-init="true" thing works for me and not you. Can I see your Akubra file?

@ajs6f
Owner

ajs6f commented Nov 24, 2016

Wait, I think you are wrong-- it is not failing, because you are getting to here. I think you are fine. You are just seeing warnings, not errors. Are you getting triples?

@whikloj
Contributor Author

whikloj commented Nov 24, 2016

@ajs6f TRIPLES!!!!

@ajs6f
Owner

ajs6f commented Nov 24, 2016

I'll see what I can do to hide those annoying and confusing stacktraces. Meanwhile, enjoy your Usan Thanksgiving triples.

@whikloj
Contributor Author

whikloj commented Nov 24, 2016

So this is working with @ruebot's modified akubra-llstore.xml. When it's done I'll try running it against my original file to see if I just needed to wait a little bit more for it to start processing.

@whikloj
Contributor Author

whikloj commented Nov 25, 2016

My run finally completed, I will try to start it again using the original akubra-llstore.xml.

My quad file contains 121,819,261 lines (or quads), but a count query of my entire Mulgara has 125,262,308 which leaves 3,443,047 not accounted for.

Is it possible that there are internal triples that would not be persisted on the object in the filesystem?
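The gap quoted above can be re-checked with shell arithmetic. This is only a sketch using the two counts from this comment; the variable names are hypothetical.

```shell
# Counts quoted in the comment above.
extracted=121819261    # lines (quads) in the extractor's output file
in_mulgara=125262308   # result of the count query against Mulgara

# Statements present in Mulgara but not accounted for in the extract.
echo $((in_mulgara - extracted))
```

The line count of a quad file can be obtained with `wc -l < FILE`, which is only meaningful as a statement count if the file has one statement per line.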

@ajs6f
Owner

ajs6f commented Nov 25, 2016

It's not obvious that there would be any such triples. My first guess would be that some objects or datastreams weren't readable at the moment that mattered. Can you check the content of the difference by diffing the output of the hot indexer against a complete NQuads dump of Mulgara (you will need to sort them first)? I appreciate that so doing will take a lot of time and computation, but hopefully not too much? I'd like to know what the actual differences are before theorizing.
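A sketch of the sort-then-compare step, assuming both dumps are N-Quads with one statement per line and identical serialization. It uses `comm` rather than `diff` because `comm` reads two sorted files in a single pass and does not need to hold them in memory. The function names are hypothetical.

```shell
# Sketch: list statements that appear in only one of two N-Quads dumps.
# LC_ALL=C forces byte-wise collation so sort and comm agree on ordering.

sort_dump() {            # sort_dump INPUT OUTPUT
  LC_ALL=C sort -o "$2" "$1"
}

only_in_first() {        # only_in_first SORTED_A SORTED_B
  LC_ALL=C comm -23 "$1" "$2"
}

only_in_second() {       # only_in_second SORTED_A SORTED_B
  LC_ALL=C comm -13 "$1" "$2"
}

# Usage (file names are placeholders):
#   sort_dump extractor.nq extractor_sorted.nq
#   sort_dump mulgara.nq mulgara_sorted.nq
#   only_in_second extractor_sorted.nq mulgara_sorted.nq > missing_from_extract.nq
```

Any difference in how the two tools write literals or whitespace would show up as spurious differences, so a quick look at the first few reported lines is worthwhile.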

@whikloj
Contributor Author

whikloj commented Nov 25, 2016

I'm not sure I can get NQuads from Mulgara...checking into that.

But you are right: it took a little while, but using default-lazy-init="true" in my normal akubra-llstore.xml did work.

@ajs6f
Owner

ajs6f commented Nov 25, 2016

Okay, to the latter, good, I will update the README to that effect and it will doubtless help others.

To the former, you can always dump NTriples out of <#ri> and use shell commands to add the fourth field.
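One way to add that fourth field, as a sketch: it assumes one N-Triples statement per line, each ending in ` .`, with no comment or blank lines, and a graph IRI that contains neither `|` nor `&`. The function name is hypothetical; the graph IRI in the usage line is the example given earlier in this thread.

```shell
# Hypothetical helper: turn N-Triples on stdin into N-Quads on stdout
# by inserting <GRAPH_IRI> before the terminating dot of every line.
nt_to_nq() {             # nt_to_nq GRAPH_IRI < in.nt > out.nq
  graph="$1"
  sed "s|[[:space:]]*\.[[:space:]]*\$| <${graph}> .|"
}

# Usage (file names are placeholders):
#   nt_to_nq 'info:edu.si.fedora#ri' < mulgara.nt > mulgara.nq
```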

@ajs6f
Owner

ajs6f commented Nov 25, 2016

I am currently testing the new "avoid piling up URIs in a list" commits and I will let you know as soon as I am confident in them.

@whikloj
Contributor Author

whikloj commented Nov 25, 2016

Yeah, there are commands to do a backup of Mulgara, but they require access to the server. I'm looking in the fcrepo3 code, but I don't know whether a) Mulgara is running separately, or b) if it is, whether the server is exposed at all.

I wanted to try this, and I got the client library but I need the host:port to connect to.

I tried doing a query and I have a choice of xml and json. So it would require exporting it all, then transforming it all into n-quads, then sorting, then comparing.

So this might take some time.

@ajs6f
Owner

ajs6f commented Nov 25, 2016

You should be able to use a query at the /risearch endpoint to do this much more easily, and directly in the right format:
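A hedged sketch of such a request with curl. The parameter names (`type`, `lang`, `format`, `query`) follow the Resource Index Search documentation linked in this comment as best I understand it, and should be checked against that page. The host, port, function name, and output file are assumptions, and a real repository may also require credentials.

```shell
# Assumed endpoint; override RISEARCH_URL for a real repository.
RISEARCH_URL="${RISEARCH_URL:-http://localhost:8080/fedora/risearch}"

# Hypothetical helper: ask risearch for every triple, serialized as N-Triples.
dump_ri_ntriples() {     # dump_ri_ntriples OUTPUT_FILE
  curl --silent --show-error --fail --get \
    --data-urlencode 'type=triples' \
    --data-urlencode 'lang=spo' \
    --data-urlencode 'format=N-Triples' \
    --data-urlencode 'query=* * *' \
    --output "$1" \
    "$RISEARCH_URL"
}

# Usage (file name is a placeholder):
#   dump_ri_ntriples mulgara.nt
```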

https://wiki.duraspace.org/display/FEDORA38/Resource+Index+Search#ResourceIndexSearch-ResponseFormatsresponse-formats

@whikloj
Contributor Author

whikloj commented Nov 25, 2016

@ajs6f++

Why is Fedora's Mulgara documentation better than Mulgara's own?! Crazy, this is working. I'll start it now.

@ajs6f
Owner

ajs6f commented Nov 25, 2016

Well, Fedora 3 remained under maintenance for years after Mulgara wasn't, so that's probably got somewhat to do with it.

@ajs6f
Owner

ajs6f commented Nov 26, 2016

Okay, @whikloj , I've committed the new streaming code. Please try it out-- it should get rid of that annoying delay before triples start arriving. Although it won't do anything for your slow storage....

@whikloj
Contributor Author

whikloj commented Nov 26, 2016

Ok, it took a bit, but I have an N-Quads file of all my triples from Mulgara. I then sorted both files (at 17GB apiece, that took some time and space).

I couldn't use diff on this machine so I'm going to try moving it to another server and execute it there.

For now I will say it is obvious there is stuff in Mulgara that is not in the rdf-extractor output.

A simple head of each sorted file shows:

[whikloj@juno]/var/indexes/triples% head juno_sorted.nq 
<info:fedora/changeme:13746/DC> <info:fedora/fedora-system:def/model#state> <info:fedora/fedora-system:def/model#Active> <info:ca.umanitoba.fedora#ri> .
<info:fedora/changeme:13746/DC> <info:fedora/fedora-system:def/view#disseminationType> <info:fedora/*/DC> <info:ca.umanitoba.fedora#ri> .
<info:fedora/changeme:13746/DC> <info:fedora/fedora-system:def/view#isVolatile> "false" <info:ca.umanitoba.fedora#ri> .
<info:fedora/changeme:13746/DC> <info:fedora/fedora-system:def/view#lastModifiedDate> "2015-06-11T15:45:32.275Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> <info:ca.umanitoba.fedora#ri> .
<info:fedora/changeme:13746/DC> <info:fedora/fedora-system:def/view#mimeType> "text/xml" <info:ca.umanitoba.fedora#ri> .
<info:fedora/changeme:13746> <http://islandora.ca/ontology/relsext#generate_ocr> "TRUE" <info:ca.umanitoba.fedora#ri> .
<info:fedora/changeme:13746> <http://islandora.ca/ontology/relsext#isPageNumber> "12" <info:ca.umanitoba.fedora#ri> .
<info:fedora/changeme:13746> <http://islandora.ca/ontology/relsext#isPageOf> <info:fedora/changeme:13745> <info:ca.umanitoba.fedora#ri> .
<info:fedora/changeme:13746> <http://islandora.ca/ontology/relsext#isSection> "1" <info:ca.umanitoba.fedora#ri> .
<info:fedora/changeme:13746> <http://islandora.ca/ontology/relsext#isSequenceNumber> "12" <info:ca.umanitoba.fedora#ri> .

versus

[whikloj@juno]/var/indexes/triples% head mulgara_sorted.nq 
<info:fedora/alan:testObject2/DC> <info:fedora/fedora-system:def/model#state> <info:fedora/fedora-system:def/model#Active> <info:ca.umanitoba.fedora#ri> .
<info:fedora/alan:testObject2/DC> <info:fedora/fedora-system:def/view#disseminationType> <info:fedora/*/DC> <info:ca.umanitoba.fedora#ri> .
<info:fedora/alan:testObject2/DC> <info:fedora/fedora-system:def/view#isVolatile> "false" <info:ca.umanitoba.fedora#ri> .
<info:fedora/alan:testObject2/DC> <info:fedora/fedora-system:def/view#lastModifiedDate> "2012-07-26T19:04:56.856Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> <info:ca.umanitoba.fedora#ri> .
<info:fedora/alan:testObject2/DC> <info:fedora/fedora-system:def/view#mimeType> "text/xml" <info:ca.umanitoba.fedora#ri> .
<info:fedora/alan:testObject2> <http://purl.org/dc/elements/1.1/identifier> "alan:testObject2" <info:ca.umanitoba.fedora#ri> .
<info:fedora/alan:testObject2> <http://purl.org/dc/elements/1.1/title> "Alan's Test Object2" <info:ca.umanitoba.fedora#ri> .
<info:fedora/alan:testObject2> <info:fedora/fedora-system:def/model#createdDate> "2012-07-26T19:04:56.856Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> <info:ca.umanitoba.fedora#ri> .
<info:fedora/alan:testObject2> <info:fedora/fedora-system:def/model#hasModel> <info:fedora/fedora-system:FedoraObject-3.0> <info:ca.umanitoba.fedora#ri> .
<info:fedora/alan:testObject2> <info:fedora/fedora-system:def/model#label> "Alan's Test Object2" <info:ca.umanitoba.fedora#ri> .

What is confusing me is where did Mulgara get <info:fedora/alan:testObject2> from? I rebuilt this index from the filesystem using the stock indexer only about 2 weeks ago. Weird.

@ajs6f
Owner

ajs6f commented Nov 26, 2016

Did you clean out Mulgara before reindexing into it?

@ajs6f
Owner

ajs6f commented Nov 26, 2016

Actually, looks like you might be okay using embedded Mulgara in particular: https://github.com/fcrepo3/fcrepo/blob/master/fcrepo-server/src/main/java/org/fcrepo/server/resourceIndex/ResourceIndexRebuilder.java#L191

@ajs6f
Owner

ajs6f commented Nov 26, 2016

Can you verify that the directory containing Mulgara's data was created at the datetime of your last full rebuild?

@whikloj
Contributor Author

whikloj commented Nov 26, 2016

Yeah I remember it says that it is cleaning it out and the directory was created on October 28. I thought it was more recent but that is probably correct.

@whikloj
Contributor Author

whikloj commented Nov 26, 2016

I'm scanning the objectStore for a file starting with info%3Afedora%2Falan%3A* to see if anything exists (that is easy to find).
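That scan could look roughly like the sketch below. It relies only on the file-name prefix mentioned above; the function name and the directory in the usage line are hypothetical.

```shell
# Hypothetical helper: list files under an object store whose names start
# with the URL-encoded object URI prefix for a given PID namespace.
find_namespace() {       # find_namespace OBJECT_STORE_DIR NAMESPACE
  find "$1" -type f -name "info%3Afedora%2F${2}%3A*"
}

# Usage (directory is a placeholder):
#   find_namespace /path/to/objectStore alan
```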

I'm gonna try writing a little Python script to compare the files line by line and create a less memory-intensive (but probably more time-intensive) diff. First I gotta do some weekend stuff; I'll check back later.

@ajs6f
Owner

ajs6f commented Nov 26, 2016

Well, is it a problem with the rebuilder or the hot indexer? In other words, are those extra triples actually generated from real objects, or not? E.g. is there an alan:testObject2 in the repo?

@whikloj
Contributor Author

whikloj commented Nov 26, 2016 via email

@ajs6f
Owner

ajs6f commented Nov 26, 2016

No, you are right about that. I know it must be a large file, but can I get access to the log of your hot indexer run somewhere?

@ajs6f
Owner

ajs6f commented Nov 26, 2016

Actually, @whikloj , can you close this ticket (because we got the prob with your conf file resolved, at least to first order) and open a new one specifically about the missed objects?

@whikloj
Contributor Author

whikloj commented Nov 26, 2016

Absolutely

@whikloj whikloj closed this as completed Nov 26, 2016