fix: improve source file format detection #282

Merged
fnesveda merged 1 commit into master from fix/improve-file-format-detection
Oct 5, 2022

Conversation

@fnesveda commented Oct 4, 2022

Source file format detection was not working for some files, e.g. .tgz archives, because their MIME type was not detected correctly.
I added the correct MIME type definition for .tgz files, and then decided to resolve an old TODO and rework the file format detection a bit.

Now, if we can't detect the MIME type of a file, we no longer automatically assume it is text. Instead, we try to detect from its contents whether it is text or binary, using the isTextOrBinary package.
This should be relatively fast, because:

  • it's done only for files whose format could not be determined from the MIME type
  • it's done only when the total size of the pushed files is under 3 MB
  • it reads only a few bytes at the start, middle and end of each file, not the whole file, and checks whether they fall within the Unicode character range
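
For illustration, the flow described above could be sketched roughly as follows in TypeScript. This is not the code from this PR and it does not call the isTextOrBinary package: the extension table, the `SAMPLE_SIZE` value, the low-control-byte check, the `'unknown'` fallback for large pushes, and all function names are assumptions made only for the sketch.

```typescript
// Illustrative sketch only: the names, the extension table and the sampling
// heuristic are assumptions, not the code merged in this PR.

type Kind = 'text' | 'binary' | 'unknown';

interface SourceFile {
    name: string;
    content: Uint8Array;
}

// Total size limit for content-based detection ("<3MB" in the description).
const MAX_TOTAL_SIZE_BYTES = 3 * 1024 * 1024;
// Assumed number of bytes inspected per sample.
const SAMPLE_SIZE = 24;

// Tiny stand-in for a real MIME type lookup, keyed by file extension.
const KNOWN_KINDS: { [extension: string]: Kind } = {
    '.js': 'text',
    '.json': 'text',
    '.md': 'text',
    '.png': 'binary',
    '.tgz': 'binary',
};

function getExtension(name: string): string {
    const dot = name.lastIndexOf('.');
    return dot === -1 ? '' : name.slice(dot).toLowerCase();
}

// Simplified check: NUL and other low control bytes rarely occur in text.
function looksBinary(sample: Uint8Array): boolean {
    for (let i = 0; i < sample.length; i++) {
        if (sample[i] <= 8) return true;
    }
    return false;
}

// Inspect only three short samples (start, middle, end), never the whole file.
function detectKindFromContent(content: Uint8Array): Kind {
    const middle = Math.max(0, Math.floor((content.length - SAMPLE_SIZE) / 2));
    const end = Math.max(0, content.length - SAMPLE_SIZE);
    const offsets = [0, middle, end];
    for (let i = 0; i < offsets.length; i++) {
        const sample = content.subarray(offsets[i], offsets[i] + SAMPLE_SIZE);
        if (looksBinary(sample)) return 'binary';
    }
    return 'text';
}

function detectKinds(files: SourceFile[]): { [name: string]: Kind } {
    let totalSize = 0;
    files.forEach((file) => { totalSize += file.content.length; });
    const canInspectContent = totalSize < MAX_TOTAL_SIZE_BYTES;

    const result: { [name: string]: Kind } = {};
    files.forEach((file) => {
        const extension = getExtension(file.name);
        if (Object.prototype.hasOwnProperty.call(KNOWN_KINDS, extension)) {
            // The type lookup succeeded, so the contents are never read.
            result[file.name] = KNOWN_KINDS[extension];
        } else if (canInspectContent) {
            result[file.name] = detectKindFromContent(file.content);
        } else {
            // Assumed fallback: the description does not say what happens here.
            result[file.name] = 'unknown';
        }
    });
    return result;
}
```

In this sketch, a file with a known extension such as `archive.tgz` is classified without its contents being read, and content sampling runs only for the remaining files when the whole push is under the size limit.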

This should not change the behavior for files that really are text; it should only detect binary files better, so it is not a breaking change.

@fnesveda added the bug ("Something isn't working.") and adhoc ("Ad-hoc unplanned task added during the sprint.") labels Oct 4, 2022
@fnesveda added this to the 48th sprint - Platform team milestone Oct 4, 2022
@fnesveda requested review from B4nan, drobnikj and mvolfik October 4, 2022 14:32
@fnesveda self-assigned this Oct 4, 2022
@fnesveda merged commit 12ab43d into master Oct 5, 2022
@fnesveda deleted the fix/improve-file-format-detection branch October 5, 2022 06:43
@fnesveda added the validated ("Issues that are resolved and their solutions fulfill the acceptance criteria.") label May 15, 2023
