Error with Redetect File Type API #7527
Comments
Hey @stevenferey, thanks for the report. That's interesting, as we're also running on S3 but did not have the same experience on the Harvard Dataverse Repository AFAIK. We'll be running this API following the 5.4 release, as we've added some new mimetypes. It may give us a chance to get some more information. I'm not sure it matters, but are you using AWS S3 or something else? |
Hello Danny, Thank you for your reply. I hope this will be useful for you. Thanks a lot. |
Hi, I would add that when "dryRun" is false, the change cannot be saved (server error in the SQL transaction). The affected dataset then returns an HTTP 500 error because Dataverse can no longer retrieve the "DerivedOriginalFileName" field for the file (if it is a tabular file), failing with a NullPointerException: [2021-01-26T16:58:15.412+0100] [Payara 5.2020] [SEVERE] [] [javax.enterprise.resource.webcontainer.jsf.application] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(5)] [timeMillis: 1611676695412] [levelValue: 1000] [[ Application server: Payara Steven. |
Thanks @stevenferey for the additional details. After some testing, I was able to reproduce this. We'll be using this API in the next release so I will prioritize this. edit: we will not be using this API in the next release but we should fix it anyway. :) |
OK, this does look like a problem/bug. Note, though, that it's not specific to S3 or to the fact that we have to create a temp file; it doesn't look like it would work for local files either! I'm seeing this in the code:
... it should of course be something like
In other words, that redetection API only works for the types that we recognize by the file content; but not by file names/extensions. |
Did you add this extension and this type to the properties file above? - I'm not seeing it in the version of the file that we distribute. |
Looking at the stack trace you posted, it appears to be the same problem as in #7310, that we have fixed since 5.0.
and this export will be cached, and the dataset page will start working again. |
For the local developers:
instead of
(trivial) |
Hello landreev, Thank you very much for your answer. The curl command also makes the dataset visible again. Steven. |
Hello, I'm assuming this hasn't been much of an issue for you. (With your initial use case - changing the mime type based on the filename extension - that could be easily done by a database query instead...). But it does look like this is being caused by some underlying EJB issue; that may result in problems elsewhere... So ideally, we'd like to understand what's going on. |
Hello, I have run an example to verify that the problem is still present on our Dataverse v5.3. I take a .tabular file that was saved in S3 before I customized the MimeTypeDetectionByFileExtension.properties file. Its MIME type is "application/octet-stream", which is expected. Next, I customize the MimeTypeDetectionByFileExtension.properties file so that my new .tabular files have the MIME type "text/tab-separated-values". Now my new .tabular files saved in Dataverse have the MIME type "text/tab-separated-values". To give my first file the new MIME type as well, I call the following resource: curl -H "X-Dataverse-key: $API_TOKEN" -X POST "$SERVER_URL/api/files/$ID/redetect?dryRun=true" server.log: curl -H "X-Dataverse-key: $API_TOKEN" -X POST "$SERVER_URL/api/files/$ID/redetect?dryRun=false" server.log: Attachment (server.log) I hope this can help with the analysis. |
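For anyone who wants to script the two calls above rather than use curl, here is a minimal Python sketch that builds the same POST request. The endpoint path, the dryRun parameter and the X-Dataverse-key header are taken from the curl commands in this thread; the helper name build_redetect_request, the example server URL, the file id and the token are hypothetical placeholders. The request is only constructed here, not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_redetect_request(server_url, file_id, api_token, dry_run=True):
    """Build (but do not send) the POST request for the redetect resource.

    Hypothetical helper: it mirrors the curl commands quoted in this thread.
    """
    query = urlencode({"dryRun": "true" if dry_run else "false"})
    base = server_url.rstrip("/")
    url = f"{base}/api/files/{quote(str(file_id))}/redetect?{query}"
    return Request(url, headers={"X-Dataverse-key": api_token}, method="POST")


# Placeholder values, not a real server or token.
req = build_redetect_request("https://demo.example.org", 42, "hypothetical-token")
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would actually send it. Starting with dry_run=True seems the safer choice, given the HTTP 500 reported above when dryRun is false.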
Hello,
This issue follows a discussion in the Google group:
https://groups.google.com/g/dataverse-community/c/_H8ZdAo85BU
Here is an example to describe the problem:
Dataverse version: 5.0 + S3 storage
file extension saved in S3: .tabular
current MIME type for this file: "application/octet-stream"
The .tabular extension is declared in the MimeTypeDetectionByFileExtension.properties file => tabular = text/tab-separated-values
Here is the problem:
When the redetect API resource is called, because the file is remote, its content is copied into a temporary file: tempFileTypeCheck.tmp
The extension of that temporary file is then compared to the list in MimeTypeDetectionByFileExtension.properties, but .tmp is not in it.
Server response: "tmp is a file extension Dataverse doesn't know about. Consider adding it to the MimeTypeDetectionByFileExtension.properties file."
As a result, the redetect API resource returns the "application/octet-stream" MIME type for this file :(
The expected result is "text/tab-separated-values"
Thanks a lot.
Steven.
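The behaviour described above can be illustrated with a small, self-contained Python sketch. It is not Dataverse's actual Java code: the dictionary stands in for MimeTypeDetectionByFileExtension.properties (holding only the entry quoted in this report), and the names mime_from_name, redetect and use_original_name are hypothetical. The point is only that looking up the extension of the temporary copy yields "tmp", whereas looking up the extension of the stored file name yields "tabular".

```python
import os
import tempfile

# Hypothetical stand-in for MimeTypeDetectionByFileExtension.properties,
# containing only the entry quoted in this report.
EXTENSION_TO_MIME = {"tabular": "text/tab-separated-values"}
DEFAULT_MIME = "application/octet-stream"


def mime_from_name(name):
    """Look up a MIME type from the extension of `name`."""
    ext = os.path.splitext(name)[1].lstrip(".").lower()
    return EXTENSION_TO_MIME.get(ext, DEFAULT_MIME)


def redetect(original_name, content, use_original_name):
    """Copy the content to a temp file, as is done for remote storage,
    then detect the type by extension from one of the two names."""
    with tempfile.NamedTemporaryFile(prefix="tempFileTypeCheck", suffix=".tmp") as tmp:
        tmp.write(content)
        tmp.flush()
        name = original_name if use_original_name else tmp.name
        return mime_from_name(name)


sample = b"a\tb\n1\t2\n"
buggy = redetect("data.tabular", sample, use_original_name=False)
fixed = redetect("data.tabular", sample, use_original_name=True)
print(buggy)  # application/octet-stream (extension seen: "tmp")
print(fixed)  # text/tab-separated-values (extension seen: "tabular")
```

Under these assumptions, the expected result is reached as soon as the extension lookup uses the file's stored name instead of the temporary file's name.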