Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IQSS/6829 - account for file failures #6857

Merged
merged 9 commits into from
May 1, 2020

Conversation

qqmyers
Copy link
Member

@qqmyers qqmyers commented Apr 24, 2020

What this PR does / why we need it: This PR tracks failures in direct uploads and adds them to the total count of processed files so that direct upload correctly completes with the remaining (successfully uploaded) files

Which issue(s) this PR closes:

Closes #6829

Special notes for your reviewer: Not sure if this closes the issue or not - it does not address the underlying possibility for Amazon AWS's 'eventual consistency' to take longer than 1 minute to recognized creation of new files (so Dataverse can find it, get its filesize, etc.). Assuming that's not the new norm, this PR just makes sure that when errors occur, the overall upload, which counts uploads to see when it's finished the total number of files, counts failures so that it successfully completes the upload. That happening should make the successful files appear in the new panel of uploaded files (as with normal uploads), after which one should be able to save the dataset. The failure to finish an upload at all was reported for dataset create mode, but I think it was possible in both create and edit modes.

Suggestions on how to test this: Could be hard to test if AWS isn't slow. I edited the fileuploads.js script and added a random failure when directUploadFinished() was called (some fraction of the time, it called uploadFailure() instead of continuing. That caused the processing to hang with some files not processed, which is now fixed by the PR since it calls directUploadFinished at the end of its processing now. In my fake test, I didn't send values for the params required by uploadFailure so I bypassed the steps that find the error code and display them.) I can provide edits to that file if someone wants to replicate this.

Does this PR introduce a user interface change?: no

Is there a release notes update needed for this change?: no

Additional documentation: FWIW: I also removed the unused datasetId variable - that was originally used but is only available for edit mode, so I ended up working around it to handle create mode and it just hadn't been deleted.

@landreev landreev self-requested a review April 27, 2020 14:58
@landreev landreev self-assigned this Apr 27, 2020
@kcondon kcondon self-assigned this Apr 30, 2020
@kcondon
Copy link
Contributor

kcondon commented Apr 30, 2020

@qqmyers This is still failing for me on create.
All three files showed 100% completion but stuck with progress bars full.
Server log showed 4 nullptr lines but no stack trace. Browser console showed upload complete for 3 files.
I will do some more testing to see how stable the failure is.

Server log:
[2020-04-30T18:55:37.438+0000] [Payara 5.201] [SEVERE] [] [javax.enterprise.resource.webcontainer.jsf.context] [tid: _ThreadID=89 _ThreadName=http-thread-pool::jk-connector(3)] [timeMillis: 1588272937438] [levelValue: 1000] [[
java.lang.NullPointerException
]]

[2020-04-30T18:55:37.467+0000] [Payara 5.201] [SEVERE] [] [javax.enterprise.resource.webcontainer.jsf.context] [tid: _ThreadID=87 _ThreadName=http-thread-pool::jk-connector(1)] [timeMillis: 1588272937467] [levelValue: 1000] [[
java.lang.NullPointerException
]]

[2020-04-30T18:55:37.680+0000] [Payara 5.201] [SEVERE] [] [javax.enterprise.resource.webcontainer.jsf.context] [tid: _ThreadID=91 _ThreadName=http-thread-pool::jk-connector(5)] [timeMillis: 1588272937680] [levelValue: 1000] [[
java.lang.NullPointerException
]]

[2020-04-30T18:55:37.822+0000] [Payara 5.201] [SEVERE] [] [javax.enterprise.resource.webcontainer.jsf.context] [tid: _ThreadID=89 _ThreadName=http-thread-pool::jk-connector(3)] [timeMillis: 1588272937822] [levelValue: 1000] [[
java.lang.NullPointerException
]]

Browser console:
jquery.js.xhtml?ln=primefaces&v=8.0:2 [Deprecation] Synchronous XMLHttpRequest on the main thread is deprecated because of its detrimental effects to the end user's experience. For more help, check https://xhr.spec.whatwg.org/.
send @ jquery.js.xhtml?ln=primefaces&v=8.0:2
VM573:1 dropins.js included more than once
(anonymous) @ VM573:1
VM559:84 Uploading 50by1000 (5) (7).dta as s31://iqssqa:171cc71b7eb-42d44a4e5a3c to https://iqssqa.s3.amazonaws.com/10.70122/FK2/ZHBPGQ/171cc71b7eb-42d44a4e5a3c?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20200430T185536Z&X-Amz-SignedHeaders=host%3Bx-amz-tagging&X-Amz-Expires=3599&X-Amz-Credential=AKIAIODRE3USX25R5IAA%2F20200430%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=4f623b798e59472ec701c79c34b2d8326cfb2ed8ac7be15a98c2409fdcb4ca1d
VM559:84 Uploading cheese1 (1) (1) (1) (1) (1) (1).jpg as s31://iqssqa:171cc71b7ee-8970de1db79f to https://iqssqa.s3.amazonaws.com/10.70122/FK2/ZHBPGQ/171cc71b7ee-8970de1db79f?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20200430T185536Z&X-Amz-SignedHeaders=host%3Bx-amz-tagging&X-Amz-Expires=3599&X-Amz-Credential=AKIAIODRE3USX25R5IAA%2F20200430%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=0dfbcd6cb06819ab45e391c632095ae4c38abe27ae0bc5aa97051afa4ce27d2a
VM559:84 Uploading file1 (6) (1) (1) (1) (1) (2).txt as s31://iqssqa:171cc71b7e6-c9489c97b84e to https://iqssqa.s3.amazonaws.com/10.70122/FK2/ZHBPGQ/171cc71b7e6-c9489c97b84e?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20200430T185536Z&X-Amz-SignedHeaders=host%3Bx-amz-tagging&X-Amz-Expires=3600&X-Amz-Credential=AKIAIODRE3USX25R5IAA%2F20200430%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=fc9c5e4b1b80bce7e653a2e8deb555a8b832a749254ad1495447715a0ca338e2
VM559:139 S3 Upload complete for file1 (6) (1) (1) (1) (1) (2).txt : s31://iqssqa:171cc71b7e6-c9489c97b84e
VM559:139 S3 Upload complete for cheese1 (1) (1) (1) (1) (1) (1).jpg : s31://iqssqa:171cc71b7ee-8970de1db79f
VM559:219 0 : 1 : 3 : 0
VM559:219 0 : 2 : 3 : 0
VM559:139 S3 Upload complete for 50by1000 (5) (7).dta : s31://iqssqa:171cc71b7eb-42d44a4e5a3c
VM559:219 0 : 3 : 3 : 0

@qqmyers
Copy link
Member Author

qqmyers commented Apr 30, 2020

Hmm - the last console line with 3:3 in the middle shows that all three files are uploading successfully, which was not the case in the original bug. One other thing to check is to see in the browser dev pane what calls are being made in the 'Network' panel. After the calls to upload the files to S3, there should be three calls to EditFilesPage.handleExternalUpload(). If they are failing, the error code and response might help debug.

FWIW: I did not have trouble with an EC2 instance spun up with the ansible script, which I think is/was still glassfish. If you're on payara, the NPEs in the log could be related to that - just did a quick check on the web and found some mentions. If that's the problem, we might want a new bug and some help from people more familiar with the latest payara/J2EE/JSF updates.

@kcondon
Copy link
Contributor

kcondon commented Apr 30, 2020

@qqmyers I will recheck the browser, network log.
Meanwhile, I uploaded 3 small files, all appeared with 0% progress and hung there. This error was in the logs:
[2020-04-30T19:16:11.804+0000] [Payara 5.201] [WARNING] [] [javax.enterprise.resource.webcontainer.jsf.lifecycle] [tid: _ThreadID=96 _ThreadName=http-thread-pool::jk-connector(1)] [timeMillis: 1588274171804] [levelValue: 900] [[
#{EditDatafilesPage.requestDirectUploadUrl()}: java.lang.IllegalArgumentException: profile file cannot be null
javax.faces.FacesException: #{EditDatafilesPage.requestDirectUploadUrl()}: java.lang.IllegalArgumentException: profile file cannot be null
at com.sun.faces.application.ActionListenerImpl.getNavigationOutcome(ActionListenerImpl.java:96)
at com.sun.faces.application.ActionListenerImpl.processAction(ActionListenerImpl.java:71)
at javax.faces.component.UICommand.broadcast(UICommand.java:222)
at javax.faces.component.UIViewRoot.broadcastEvents(UIViewRoot.java:847)
at javax.faces.component.UIViewRoot.processApplication(UIViewRoot.java:1396)
at com.sun.faces.lifecycle.InvokeApplicationPhase.execute(InvokeApplicationPhase.java:58)
at com.sun.faces.lifecycle.Phase.doPhase(Phase.java:76)
at com.sun.faces.lifecycle.LifecycleImpl.execute(LifecycleImpl.java:177)
at javax.faces.webapp.FacesServlet.executeLifecyle(FacesServlet.java:707)
at javax.faces.webapp.FacesServlet.service(FacesServlet.java:451)
at org.apache.catalina.core.StandardWrapper.service(StandardWrapper.java:1636)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:331)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:211)
at org.ocpsoft.rewrite.servlet.RewriteFilter.doFilter(RewriteFilter.java:226)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:253)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:211)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:257)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:757)
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:577)
at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:99)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:158)
at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:371)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:238)
at com.sun.enterprise.v3.services.impl.ContainerMapper$HttpHandlerCallable.call(ContainerMapper.java:520)
at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:217)
at org.glassfish.grizzly.http.server.HttpHandler.runService(HttpHandler.java:182)
at org.glassfish.grizzly.http.server.HttpHandler.doHandle(HttpHandler.java:156)
at org.glassfish.grizzly.http.server.HttpServerFilter.handleRead(HttpServerFilter.java:218)
at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:95)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:260)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:177)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:109)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:88)
at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:53)
at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:524)
at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:89)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:94)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:33)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:114)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:569)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:549)
at java.lang.Thread.run(Thread.java:748)
Caused by: javax.faces.el.EvaluationException: java.lang.IllegalArgumentException: profile file cannot be null
at com.sun.faces.application.MethodBindingMethodExpressionAdapter.invoke(MethodBindingMethodExpressionAdapter.java:76)
at com.sun.faces.application.ActionListenerImpl.getNavigationOutcome(ActionListenerImpl.java:82)
... 42 more
Caused by: java.lang.IllegalArgumentException: profile file cannot be null
at com.amazonaws.util.ValidationUtils.assertNotNull(ValidationUtils.java:37)
at com.amazonaws.auth.profile.ProfilesConfigFile.(ProfilesConfigFile.java:142)
at com.amazonaws.auth.profile.ProfilesConfigFile.(ProfilesConfigFile.java:133)
at com.amazonaws.auth.profile.ProfilesConfigFile.(ProfilesConfigFile.java:100)
at com.amazonaws.auth.profile.ProfileCredentialsProvider.getCredentials(ProfileCredentialsProvider.java:135)
at com.amazonaws.services.s3.AmazonS3Client.generatePresignedUrl(AmazonS3Client.java:2888)
at edu.harvard.iq.dataverse.dataaccess.S3AccessIO.generateTemporaryS3UploadUrl(S3AccessIO.java:901)
at edu.harvard.iq.dataverse.EditDatafilesPage.requestDirectUploadUrl(EditDatafilesPage.java:1755)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at javax.el.ELUtil.invokeMethod(ELUtil.java:227)
at javax.el.BeanELResolver.invoke(BeanELResolver.java:440)
at javax.el.CompositeELResolver.invoke(CompositeELResolver.java:198)
at com.sun.el.parser.AstValue.invoke(AstValue.java:257)
at com.sun.el.MethodExpressionImpl.invoke(MethodExpressionImpl.java:237)
at org.jboss.weld.module.web.util.el.ForwardingMethodExpression.invoke(ForwardingMethodExpression.java:40)
at org.jboss.weld.module.web.el.WeldMethodExpression.invoke(WeldMethodExpression.java:50)
at com.sun.faces.facelets.el.TagMethodExpression.invoke(TagMethodExpression.java:65)
at com.sun.faces.application.MethodBindingMethodExpressionAdapter.invoke(MethodBindingMethodExpressionAdapter.java:66)
... 43 more
]]

@qqmyers
Copy link
Member Author

qqmyers commented Apr 30, 2020

That error means your server can't find the Amazon profile file. That could be because you don't have one or the Dataverse server is running as a different Unix user than you normally use, etc.

@kcondon
Copy link
Contributor

kcondon commented Apr 30, 2020

@qqmyers I am seeing some weird behavior. Things are not working at this point. I uploaded 3 small files, then on save, got an exception:
[2020-04-30T20:16:39.689+0000] [Payara 5.201] [INFO] [] [edu.harvard.iq.dataverse.DatasetPage] [tid: _ThreadID=99 _ThreadName=http-thread-pool::jk-connector(5)] [timeMillis: 1588277799689] [levelValue: 800] [[
New host dataverse id: 34]]

[2020-04-30T20:17:11.704+0000] [Payara 5.201] [WARNING] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=97 _ThreadName=http-thread-pool::jk-connector(3)] [timeMillis: 1588277831704] [levelValue: 900] [[
Failed to get file size, storage id s31://iqssqa:171ccbc4a12-1f854cc66b3f (Cannot get S3 object 10.70122/FK2/9YWPKH/171ccbc4a12-1f854cc66b3f (Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: 3E8333F8EFDABA3C; S3 Extended Request ID: Of7d5GZ+XRKp48SotPCzyQ9rgJ8hR5v/YfLWGXm7gFHAOnDX2JlbdkhqYfIXIxX3h/J8bIG6R1I=)))]]

[2020-04-30T20:17:11.805+0000] [Payara 5.201] [WARNING] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=97 _ThreadName=http-thread-pool::jk-connector(3)] [timeMillis: 1588277831805] [levelValue: 900] [[
Failed to get file size, storage id s31://iqssqa:171ccbc4a17-8d5c81f5bb01 (Cannot get S3 object 10.70122/FK2/9YWPKH/171ccbc4a17-8d5c81f5bb01 (Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: 042A9AC9F0B6657B; S3 Extended Request ID: vCl9vkvPnGH2uNmoNRNhzMAUXJ8BtNjE5v4mhC1zxQFpkukhrZJ2Cc0H30S46BYfr4zF0ezia70=)))]]

[2020-04-30T20:17:11.848+0000] [Payara 5.201] [WARNING] [] [edu.harvard.iq.dataverse.ingest.IngestServiceBean] [tid: _ThreadID=97 _ThreadName=http-thread-pool::jk-connector(3)] [timeMillis: 1588277831848] [levelValue: 900] [[
Failed to get file size, storage id s31://iqssqa:171ccbc4a15-04d2363c831e (Cannot get S3 object 10.70122/FK2/9YWPKH/171ccbc4a15-04d2363c831e (Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: 692111E9AAE1F925; S3 Extended Request ID: z1hC93iqpqdZ7DS7efTFHggueT/27cimFCzAXkyx4CAB7M31ex7A2AcQucF5QrWjXxatRFqDnFo=)))]]

[2020-04-30T20:17:12.645+0000] [Payara 5.201] [SEVERE] [] [javax.enterprise.resource.webcontainer.jsf.application] [tid: _ThreadID=96 _ThreadName=http-thread-pool::jk-connector(2)] [timeMillis: 1588277832645] [levelValue: 1000] [[
Error Rendering View[/dataset.xhtml]
javax.el.ELException: /filesFragment.xhtml @414,187 value=" - #{fileMetadata.dataFile.friendlySize}": java.lang.NullPointerException
at com.sun.faces.facelets.el.TagValueExpression.getValue(TagValueExpression.java:77)
at javax.faces.component.ComponentStateHelper.eval(ComponentStateHelper.java:170)
at javax.faces.component.ComponentStateHelper.eval(ComponentStateHelper.java:157)
at javax.faces.component.UIOutput.getValue(UIOutput.java:140)
at com.sun.faces.renderkit.html_basic.HtmlBasicInputRenderer.getValue(HtmlBasicInputRenderer.java:181)
at com.sun.faces.renderkit.html_basic.HtmlBasicRenderer.getCurrentValue(HtmlBasicRenderer.java:328)
at com.sun.faces.renderkit.html_basic.HtmlBasicRenderer.encodeEnd(HtmlBasicRenderer.java:143)
at javax.faces.component.UIComponentBase.encodeEnd(UIComponentBase.java:595)
at javax.faces.component.UIComponent.encodeAll(UIComponent.java:1654)
at org.primefaces.component.column.Column.renderChildren(Column.java:78)
at org.primefaces.component.datatable.DataTableRenderer.encodeCell(DataTableRenderer.java:1455)
at org.primefaces.component.datatable.DataTableRenderer.encodeRow(DataTableRenderer.java:1373)
at org.primefaces.component.datatable.DataTableRenderer.encodeRows(DataTableRenderer.java:1269)
at org.primefaces.component.datatable.DataTableRenderer.encodeTbody(DataTableRenderer.java:1196)
at org.primefaces.component.datatable.DataTableRenderer.encodeTbody(DataTableRenderer.java:1145)
at org.primefaces.component.datatable.DataTableRenderer.encodeRegularTable(DataTableRenderer.java:432)
at org.primefaces.component.datatable.DataTableRenderer.encodeMarkup(DataTableRenderer.java:363)
at org.primefaces.component.datatable.DataTableRenderer.encodeEnd(DataTableRenderer.java:95)
at javax.faces.component.UIComponentBase.encodeEnd(UIComponentBase.java:595)
at javax.faces.component.UIComponent.encodeAll(UIComponent.java:1654)
at javax.faces.component.UIComponent.encodeAll(UIComponent.java:1650)
at org.primefaces.component.tabview.TabViewRenderer.encodeTabContent(TabViewRenderer.java:306)
at org.primefaces.component.tabview.TabViewRenderer.lambda$encodeContents$1(TabViewRenderer.java:277)
at org.primefaces.component.tabview.TabView.forEachTab(TabView.java:186)
at org.primefaces.component.tabview.TabViewRenderer.encodeContents(TabViewRenderer.java:275)
at org.primefaces.component.tabview.TabViewRenderer.encodeMarkup(TabViewRenderer.java:144)
at org.primefaces.component.tabview.TabViewRenderer.encodeEnd(TabViewRenderer.java:83)
at javax.faces.component.UIComponentBase.encodeEnd(UIComponentBase.java:595)
at com.sun.faces.renderkit.html_basic.HtmlBasicRenderer.encodeRecursive(HtmlBasicRenderer.java:286)
at com.sun.faces.renderkit.html_basic.PassthroughRenderer.encodeChildren(PassthroughRenderer.java:80)
at javax.faces.component.UIComponentBase.encodeChildren(UIComponentBase.java:566)
at javax.faces.component.UIComponent.encodeAll(UIComponent.java:1647)
at javax.faces.render.Renderer.encodeChildren(Renderer.java:152)
at javax.faces.component.UIComponentBase.encodeChildren(UIComponentBase.java:566)
at javax.faces.component.UIComponent.encodeAll(UIComponent.java:1647)
at javax.faces.component.UIComponent.encodeAll(UIComponent.java:1650)
at javax.faces.component.UIComponent.encodeAll(UIComponent.java:1650)
at com.sun.faces.application.view.FaceletViewHandlingStrategy.renderView(FaceletViewHandlingStrategy.java:468)
at com.sun.faces.application.view.MultiViewHandler.renderView(MultiViewHandler.java:170)
at javax.faces.application.ViewHandlerWrapper.renderView(ViewHandlerWrapper.java:132)
at org.ocpsoft.rewrite.faces.RewriteViewHandler.renderView(RewriteViewHandler.java:196)
at javax.faces.application.ViewHandlerWrapper.renderView(ViewHandlerWrapper.java:132)
at com.sun.faces.lifecycle.RenderResponsePhase.execute(RenderResponsePhase.java:102)
at com.sun.faces.lifecycle.Phase.doPhase(Phase.java:76)
at com.sun.faces.lifecycle.LifecycleImpl.render(LifecycleImpl.java:199)
at javax.faces.webapp.FacesServlet.executeLifecyle(FacesServlet.java:708)
at javax.faces.webapp.FacesServlet.service(FacesServlet.java:451)
at org.apache.catalina.core.StandardWrapper.service(StandardWrapper.java:1636)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:331)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:211)
at org.ocpsoft.rewrite.servlet.RewriteFilter.doFilter(RewriteFilter.java:226)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:253)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:211)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:257)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:757)
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:577)
at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:99)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:158)
at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:371)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:238)
at com.sun.enterprise.v3.services.impl.ContainerMapper$HttpHandlerCallable.call(ContainerMapper.java:520)
at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:217)
at org.glassfish.grizzly.http.server.HttpHandler.runService(HttpHandler.java:182)
at org.glassfish.grizzly.http.server.HttpHandler.doHandle(HttpHandler.java:156)
at org.glassfish.grizzly.http.server.HttpServerFilter.handleRead(HttpServerFilter.java:218)
at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:95)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:260)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:177)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:109)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:88)
at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:53)
at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:524)
at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:89)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:94)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:33)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:114)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:569)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:549)
at java.lang.Thread.run(Thread.java:748)
Caused by: javax.el.ELException: java.lang.NullPointerException
at javax.el.BeanELResolver.getValue(BeanELResolver.java:304)
at com.sun.faces.el.DemuxCompositeELResolver._getValue(DemuxCompositeELResolver.java:156)
at com.sun.faces.el.DemuxCompositeELResolver.getValue(DemuxCompositeELResolver.java:184)
at com.sun.el.parser.AstValue.getValue(AstValue.java:114)
at com.sun.el.parser.AstValue.getValue(AstValue.java:177)
at com.sun.el.parser.AstDeferredExpression.getValue(AstDeferredExpression.java:39)
at com.sun.el.parser.AstCompositeExpression.getValue(AstCompositeExpression.java:44)
at com.sun.el.ValueExpressionImpl.getValue(ValueExpressionImpl.java:183)
at org.jboss.weld.module.web.el.WeldValueExpression.getValue(WeldValueExpression.java:50)
at com.sun.faces.facelets.el.TagValueExpression.getValue(TagValueExpression.java:73)
... 79 more
Caused by: java.lang.NullPointerException
at edu.harvard.iq.dataverse.DataFile.getFriendlySize(DataFile.java:565)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at javax.el.BeanELResolver.getValue(BeanELResolver.java:299)
... 88 more
]]

@qqmyers
Copy link
Member Author

qqmyers commented Apr 30, 2020

The stack trace about file size is caused by the failure to get the file metadata from S3. It will break displaying that dataset unless you add non-null file size in the db. I think you can still delete the dataset though.

It's less clear why you can't see the files on S3. The New host dataverse id line suggest that you changed the host dataverse at some point which will change the DOI and file paths. Can you check S3 and see if the files are just at the wrong path? If that's the case, and you changed the dataverse after uploading files, the fix might be to clear out uploaded files if you change dataverses (which could change the storage driver used too). Or we could keep the DOI when recreating the Dataset.

@kcondon
Copy link
Contributor

kcondon commented Apr 30, 2020

@qqmyers In the case of that one failed dataset, the directory does not exists on s3. I have been able to upload on create in a root directory. Just tried again in a sub directory, newly created, all 3 files appeared to complete upload but only appeared at bottom with a stack trace in logs:

[2020-04-30T21:27:58.800+0000] [Payara 5.201] [SEVERE] [] [javax.enterprise.resource.webcontainer.jsf.context] [tid: _ThreadID=97 _ThreadName=http-thread-pool::jk-connector(3)] [timeMillis: 1588282078800] [levelValue: 1000] [[
java.lang.NullPointerException
at javax.faces.component.ComponentStateHelper.restoreState(ComponentStateHelper.java:294)
at javax.faces.component.UIComponentBase.restoreState(UIComponentBase.java:1249)
at javax.faces.component.UIOutput.restoreState(UIOutput.java:243)
at javax.faces.component.UIInput.restoreState(UIInput.java:1512)
at com.sun.faces.application.view.FaceletPartialStateManagementStrategy$2.visit(FaceletPartialStateManagementStrategy.java:372)
at com.sun.faces.component.visit.FullVisitContext.invokeVisitCallback(FullVisitContext.java:127)
at javax.faces.component.UIComponent.visitTree(UIComponent.java:1456)
at javax.faces.component.UIComponent.visitTree(UIComponent.java:1468)
at javax.faces.component.UIComponent.visitTree(UIComponent.java:1468)
at javax.faces.component.UIComponent.visitTree(UIComponent.java:1468)
at javax.faces.component.UIComponent.visitTree(UIComponent.java:1468)
at javax.faces.component.UIForm.visitTree(UIForm.java:355)
at javax.faces.component.UIComponent.visitTree(UIComponent.java:1468)
at javax.faces.component.UIComponent.visitTree(UIComponent.java:1468)
at com.sun.faces.application.view.FaceletPartialStateManagementStrategy.restoreView(FaceletPartialStateManagementStrategy.java:358)
at com.sun.faces.application.StateManagerImpl.restoreView(StateManagerImpl.java:113)
at com.sun.faces.application.view.ViewHandlingStrategy.restoreView(ViewHandlingStrategy.java:99)
at com.sun.faces.application.view.FaceletViewHandlingStrategy.restoreView(FaceletViewHandlingStrategy.java:272)
at com.sun.faces.application.view.MultiViewHandler.restoreView(MultiViewHandler.java:133)
at javax.faces.application.ViewHandlerWrapper.restoreView(ViewHandlerWrapper.java:101)
at org.omnifaces.viewhandler.RestorableViewHandler.restoreView(RestorableViewHandler.java:66)
at org.ocpsoft.rewrite.faces.RewriteViewHandler.restoreView(RewriteViewHandler.java:102)
at javax.faces.application.ViewHandlerWrapper.restoreView(ViewHandlerWrapper.java:101)
at com.sun.faces.lifecycle.RestoreViewPhase.execute(RestoreViewPhase.java:181)
at com.sun.faces.lifecycle.Phase.doPhase(Phase.java:76)
at com.sun.faces.lifecycle.RestoreViewPhase.doPhase(RestoreViewPhase.java:110)
at com.sun.faces.lifecycle.LifecycleImpl.execute(LifecycleImpl.java:177)
at javax.faces.webapp.FacesServlet.executeLifecyle(FacesServlet.java:707)
at javax.faces.webapp.FacesServlet.service(FacesServlet.java:451)
at org.apache.catalina.core.StandardWrapper.service(StandardWrapper.java:1636)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:331)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:211)
at org.ocpsoft.rewrite.servlet.RewriteFilter.doFilter(RewriteFilter.java:226)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:253)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:211)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:257)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:757)
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:577)
at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:99)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:158)
at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:371)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:238)
at com.sun.enterprise.v3.services.impl.ContainerMapper$HttpHandlerCallable.call(ContainerMapper.java:520)
at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:217)
at org.glassfish.grizzly.http.server.HttpHandler.runService(HttpHandler.java:182)
at org.glassfish.grizzly.http.server.HttpHandler.doHandle(HttpHandler.java:156)
at org.glassfish.grizzly.http.server.HttpServerFilter.handleRead(HttpServerFilter.java:218)
at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:95)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:260)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:177)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:109)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:88)
at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:53)
at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:524)
at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:89)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:94)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:33)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:114)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:569)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:549)
at java.lang.Thread.run(Thread.java:748)
]]

I need to step out for a couple hours but it seems that I'm getting wildly different results. I'll try to retest, being very methodical but aside from that one time starting payara as root rather than dataverse user, everything else was straightforward uploads.

@qqmyers
Copy link
Member Author

qqmyers commented Apr 30, 2020

It's hard to tell but I'd suspect that you've identified two new issues: changing the dataverse when creating a dataset breaks direct file uploads (only if they are started prior to the dataverse change?), and payara breaks direct uploads. If this PR solves the problem on glassfish for cases when the dataverse doesn't change, it fixes an issue (or at least an issue since I was able to see a problem on glassfish when I didn't change the dataverse).

@kcondon
Copy link
Contributor

kcondon commented May 1, 2020

@qqmyers I'm not sure about changing the host dv. I did not do that. What has happened and is a UX issue is the focus is on the host dv text box on create dataset page load and often, when trying to enter the title, it alters the host dv name, which you then need to edit and reselect to be the correct name. This happens before any data or files are uploaded and I believe I simply reselected the original dv so not sure why this would be triggered.

@kcondon
Copy link
Contributor

kcondon commented May 1, 2020

@qqmyers OK, have tried testing more carefully, recording results. Here is the list of tests and results. I'll send you the stack traces outside of this ticket since they're large:

  1. new user, dataset in root, direct s3, 1 file on create
    Worked, no errors
  2. new user, dataset in root, direct s3, 3 small files on create
    all 3 files hung at upload with 100% progress bar
  3. new user, sub dv, direct s3, 1 sm file on create
    worked
  4. new user, sub dv, direct s3, 3 sm files on created
    test1 worked all filex
    test2 failed, hung on file upload, all 3 files showing 0% complete, browser log errors, no server log error:
    test3 worked, all filex
    test4 worked, server error
    test5 failed, all three files show 100% progress, hung here, browser log shows files uploaded, server log shows stack trace
  5. new user, sub dv, accidentally change host dv, reselect same, 1 file on create
    fails: file uploads, apparently, when save dataset, see exception in ui, stack trace in logs:
  6. new user, root, 3 sm file, after dataset create
    test1 works
    test2 works
  7. new user, sub dv, indirect s3, 3 sm files, on create
    works
  8. new user, sub dv, direct s3, accidentally alter dv, reselect, 3 sm files after create
    works
  9. new user, sub dv, local file, 3 sm files on create
    works

@kcondon kcondon assigned qqmyers and unassigned kcondon May 1, 2020
@qqmyers
Copy link
Member Author

qqmyers commented May 1, 2020

From looking in the logs:
#2 looks like it has #6371 and the Payara NPE from javax.enterprise.resource.webcontainer.jsf.context.
$4 test 2 - looks like your network changed during the calls (temporary drop? switch to VPN?). This is something that I think we should be able to handle - just uploaded commits to handle the NPE and (correctly) assign it as a network error that should be displayed.
#4 test 4 - same Payara NPE from javax.enterprise.resource.webcontainer.jsf.context
#4 test 5 - same Payara NPE from javax.enterprise.resource.webcontainer.jsf.context
#5 - consistent with our earlier discussion - file was uploaded to a path corresponding to the old ID and can't be found. I've added a commit that keeps the existing GlobalId upon Dataverse change to avoid losing the file in cases where the storage driver remains the same. If the new Dataverse has a different driver, various other bad things could happen (depending on whether the drivers share a bucket, or one is a file driver, etc.), so it may be worth another issue for that (with the question of whether already uploaded files should just be deleted, or moved, or the problem avoided by stopping file upload during create, etc.)
Note that this is then compounded by #6343 as the subsequent call to get the file size fails and breaks the display.

I'd suggest we break the Payara issue out into a new issue, and probably create one to track changing between dataverses with different stores. If we want to fix/work-around #6371, I can submit a PR that reverts to using older code that doesn't randomly fail - we use that on QDR.

W.r.t. this PR, I think it does fix the race condition I saw, and now catches network errors that don't even return a status=0 on their own, and handles ~accidental toggling between dataverses. I'm not sure how you test the network error unless you pull the plug, but testing on glassfish and trying the toggle to another dataverse and back should be possible. If we want to wait on this one until the Payara issue is fixed, that's OK with me - I guess it depends on whether the things I've fixed are being seen/are a priority that would result in a pre-Payara patch.

@kcondon
Copy link
Contributor

kcondon commented May 1, 2020

This fixes some failing cases but will break out payara direct s3 upload on ds create as a separate ticket.

@kcondon kcondon assigned kcondon and unassigned qqmyers May 1, 2020
@kcondon kcondon merged commit b0868df into IQSS:develop May 1, 2020
@djbrooke djbrooke added this to the Dataverse 5 milestone May 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

File Upload: Intermittent Failure to upload files on dataset create using direct to s3
4 participants