IQSS/6829 - account for file failures #6857
Conversation
and the dataset pid being set multiple times.
for directUpload case
used in DatasetPage and editFilesFragment.xhtml
@qqmyers This is still failing for me on create. Server log:
[2020-04-30T18:55:37.467+0000] [Payara 5.201] [SEVERE] [] [javax.enterprise.resource.webcontainer.jsf.context] [tid: _ThreadID=87 _ThreadName=http-thread-pool::jk-connector(1)] [timeMillis: 1588272937467] [levelValue: 1000] [[
[2020-04-30T18:55:37.680+0000] [Payara 5.201] [SEVERE] [] [javax.enterprise.resource.webcontainer.jsf.context] [tid: _ThreadID=91 _ThreadName=http-thread-pool::jk-connector(5)] [timeMillis: 1588272937680] [levelValue: 1000] [[
[2020-04-30T18:55:37.822+0000] [Payara 5.201] [SEVERE] [] [javax.enterprise.resource.webcontainer.jsf.context] [tid: _ThreadID=89 _ThreadName=http-thread-pool::jk-connector(3)] [timeMillis: 1588272937822] [levelValue: 1000] [[
Browser console:
Hmm - the last console line with 3:3 in the middle shows that all three files are uploading successfully, which was not the case in the original bug. One other thing to check is which calls are being made in the 'Network' panel of the browser dev tools. After the calls to upload the files to S3, there should be three calls to EditFilesPage.handleExternalUpload(). If those are failing, the error code and response might help debug. FWIW: I did not have trouble with an EC2 instance spun up with the ansible script, which I think is/was still glassfish. If you're on payara, the NPEs in the log could be related to that - I just did a quick check on the web and found some mentions. If that's the problem, we might want a new bug and some help from people more familiar with the latest payara/J2EE/JSF updates.
@qqmyers I will recheck the browser and network logs.
That error means your server can't find the Amazon profile file. That could be because you don't have one, or because the Dataverse server is running as a different Unix user than the one you normally use, etc.
@qqmyers I am seeing some weird behavior. Things are not working at this point. I uploaded 3 small files, then on save, got an exception:
[2020-04-30T20:17:11.704+0000] [Payara 5.201] [WARNING] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=97 _ThreadName=http-thread-pool::jk-connector(3)] [timeMillis: 1588277831704] [levelValue: 900] [[
[2020-04-30T20:17:11.805+0000] [Payara 5.201] [WARNING] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=97 _ThreadName=http-thread-pool::jk-connector(3)] [timeMillis: 1588277831805] [levelValue: 900] [[
[2020-04-30T20:17:11.848+0000] [Payara 5.201] [WARNING] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=97 _ThreadName=http-thread-pool::jk-connector(3)] [timeMillis: 1588277831848] [levelValue: 900] [[
[2020-04-30T20:17:12.645+0000] [Payara 5.201] [SEVERE] [] [javax.enterprise.resource.webcontainer.jsf.application] [tid: _ThreadID=96 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1588277832645] [levelValue: 1000] [[
The stack trace about file size is caused by the failure to get the file metadata from S3. It will break the display of that dataset unless you add a non-null file size in the db. I think you can still delete the dataset, though. It's less clear why you can't see the files on S3. The "New host dataverse id" line suggests that you changed the host dataverse at some point, which changes the DOI and file paths. Can you check S3 and see if the files are just at the wrong path? If that's the case, and you changed the dataverse after uploading files, the fix might be to clear out uploaded files when the dataverse changes (which could change the storage driver used too). Or we could keep the DOI when recreating the Dataset.
@qqmyers In the case of that one failed dataset, the directory does not exist on S3. I have been able to upload on create in a root directory. Just tried again in a newly created sub directory; all 3 files appeared to complete uploading but only appeared at the bottom, with a stack trace in the logs:
[2020-04-30T21:27:58.800+0000] [Payara 5.201] [SEVERE] [] [javax.enterprise.resource.webcontainer.jsf.context] [tid: _ThreadID=97 _ThreadName=http-thread-pool::jk-connector(3)] [timeMillis: 1588282078800] [levelValue: 1000] [[
I need to step out for a couple hours, but it seems that I'm getting wildly different results. I'll try to retest, being very methodical, but aside from that one time starting payara as root rather than as the dataverse user, everything else was a straightforward upload.
It's hard to tell, but I'd suspect you've identified two new issues: changing the dataverse when creating a dataset breaks direct file uploads (only if they are started prior to the dataverse change?), and payara breaks direct uploads. If this PR solves the problem on glassfish for cases where the dataverse doesn't change, it fixes an issue (or at least an issue, since I was able to see a problem on glassfish even when I didn't change the dataverse).
@qqmyers I'm not sure about changing the host dv; I did not do that. What has happened, and is a UX issue, is that focus lands on the host dv text box when the create dataset page loads, so often, when trying to enter the title, you end up altering the host dv name and then have to edit and reselect it to get the correct name back. This happens before any data or files are uploaded, and I believe I simply reselected the original dv, so I'm not sure why this would be triggered.
@qqmyers OK, I have tried testing more carefully and recorded the results. Here is the list of tests and results. I'll send you the stack traces outside of this ticket since they're large:
From looking in the logs: I'd suggest we break the Payara issue out into a new issue, and probably create one to track changing between dataverses with different stores. If we want to fix/work around #6371, I can submit a PR that reverts to using older code that doesn't randomly fail - we use that on QDR. W.r.t. this PR, I think it does fix the race condition I saw, now catches network errors that don't even return a status=0 on their own, and handles ~accidental toggling between dataverses. I'm not sure how you test the network error unless you pull the plug, but testing on glassfish and trying the toggle to another dataverse and back should be possible. If we want to wait on this one until the Payara issue is fixed, that's OK with me - I guess it depends on whether the things I've fixed are being seen/are a priority that would result in a pre-Payara patch.
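To illustrate the kind of network-error handling described above, here is a minimal, hypothetical sketch (not the actual PR code): it treats both a non-2xx response and a transport-level error that never returns a status as a failed upload, so the file is still counted toward completion. The function name uploadFileDirectly and the arguments passed to uploadFailure() are assumptions for illustration; only directUploadFinished() and uploadFailure() are named in this conversation.

```javascript
// Hypothetical sketch, not the actual PR code: count a file as "done" whether
// the S3 PUT succeeds, returns an error status, or fails at the network level.
function uploadFileDirectly(url, file) {
    var xhr = new XMLHttpRequest();
    xhr.open('PUT', url, true);
    // Fires whenever the PUT completes with any HTTP status.
    xhr.onload = function () {
        if (xhr.status >= 200 && xhr.status < 300) {
            directUploadFinished();                 // success path
        } else {
            uploadFailure(xhr.status, file.name);   // non-2xx response
        }
    };
    // Fires on transport errors that never produce a status (status stays 0),
    // which would otherwise leave the file uncounted and hang the overall upload.
    xhr.onerror = function () {
        uploadFailure(xhr.status, file.name);
    };
    xhr.send(file);
}
```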
This fixes some failing cases; the payara direct S3 upload failure on dataset create will be broken out as a separate ticket.
What this PR does / why we need it: This PR tracks failures in direct uploads and adds them to the total count of processed files, so that a direct upload correctly completes with the remaining (successfully uploaded) files.
Which issue(s) this PR closes:
Closes #6829
Special notes for your reviewer: Not sure if this closes the issue or not - it does not address the underlying possibility for Amazon AWS's 'eventual consistency' to take longer than 1 minute to recognize the creation of new files (so Dataverse can find a file, get its filesize, etc.). Assuming that's not the new norm, this PR just makes sure that when errors occur, the overall upload, which counts completed files to see when the total number of files has been processed, also counts failures, so that it successfully completes. That should make the successful files appear in the new panel of uploaded files (as with normal uploads), after which one should be able to save the dataset. The failure to finish an upload at all was reported for dataset create mode, but I think it was possible in both create and edit modes.
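As a rough illustration of that counting logic, here is a minimal sketch under assumed names (filesInProgress, filesFinished, checkAllDone, and showUploadedFilesPanel are hypothetical; the real upload script may organize this differently):

```javascript
// Minimal sketch of the completion counting described above, using
// hypothetical names; not the actual upload-script code.
var filesInProgress = 0;   // number of files selected for direct upload
var filesFinished = 0;     // successes and failures combined

function directUploadFinished() {
    filesFinished++;
    checkAllDone();
}

function uploadFailure(status, fileName) {
    // Before this PR, a failed file never incremented the finished count,
    // so the overall upload waited forever and never completed.
    filesFinished++;
    checkAllDone();
}

function checkAllDone() {
    if (filesFinished === filesInProgress) {
        // Every file is accounted for: show the successfully uploaded files
        // so the dataset can be saved.
        showUploadedFilesPanel();
    }
}
```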
Suggestions on how to test this: Could be hard to test if AWS isn't slow. I edited the fileuploads.js script and added a random failure when directUploadFinished() was called (some fraction of the time, it called uploadFailure() instead of continuing). That caused the processing to hang with some files not processed, which is now fixed by the PR since it calls directUploadFinished at the end of its processing. In my fake test, I didn't send values for the params required by uploadFailure, so I bypassed the steps that find the error code and display it. I can provide edits to that file if someone wants to replicate this.
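A hypothetical reconstruction of that test edit might look like the following; the 30% failure rate and the wrapper name are arbitrary choices for illustration, and calling uploadFailure() with no arguments mirrors the shortcut described above:

```javascript
// Hypothetical reproduction of the test edit: fail a fraction of uploads at
// random so the failure path can be exercised without a slow S3.
function directUploadFinishedOrRandomFailure() {
    if (Math.random() < 0.3) {
        // Calling uploadFailure() with no arguments skips the steps that look
        // up and display the error code, as noted above.
        uploadFailure();
    } else {
        directUploadFinished();
    }
}
```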
Does this PR introduce a user interface change?: no
Is there a release notes update needed for this change?: no
Additional documentation: FWIW: I also removed the unused datasetId variable - that was originally used but is only available for edit mode, so I ended up working around it to handle create mode and it just hadn't been deleted.