Base reader, refactor and pathlib support #418
Merged
15 commits
840609d
New branch for new (clean) PR
tetov ba45d7c
Updated to match autopep8-ed files upstream
tetov 0682dcf
Missed a file
tetov d27f338
Fixed a few of the issues raised in code review.
tetov ed1feba
Limiting scope to base_reader, other modifications discarded
tetov d71b7b2
Merge remote-tracking branch 'upstream/master' into base_reader
tetov 2dcf425
Had to manually reset files, odd
tetov a0cb3a7
Refactor of class, renaming, URL solution
tetov add1e74
Missed test fixture
tetov 7691bd2
Class structure
tetov 03a1867
Fixes based on code review by @brgcode. Fixed import for python 2
tetov d319705
Tests and structure of class
tetov 43b1650
Missed files (don't forget to squash haha)!
tetov 22258c4
@brgcode's suggestions and iter_chunks
tetov 9619c75
Parent class for file reader classes added
tetov
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,212 @@ | ||
| from __future__ import absolute_import | ||
| from __future__ import division | ||
|
|
||
| try: | ||
| from pathlib import Path | ||
| except ImportError: | ||
| from pathlib2 import Path | ||
| try: | ||
| from urllib.request import urlretrieve | ||
| except ImportError: | ||
| from urllib import urlretrieve | ||
|
|
||
| import binaryornot.check | ||
|
||
|
|
||
|
|
||
| class BaseReader(object): | ||
| """Base class containing file reading functions for file extension specific readers | ||
|
|
||
| Attributes | ||
| ---------- | ||
| location : Path object | ||
| Path to file location | ||
| """ | ||
|
|
||
|
||
| FILE_SIGNATURE = { | ||
| 'content': None, | ||
| 'offset': None, | ||
| } | ||
|
|
||
| def __init__(self, address): | ||
| self._address = address | ||
| self._is_binary = None | ||
|
|
||
| @property | ||
| def location(self): | ||
| """Path to local file | ||
|
|
||
| Checks if given address is a Path object, and if not creates one from | ||
| given address if it's a file path or URL in a string. | ||
|
|
||
| If a URL is given as the address, the file will be downloaded to a | ||
| temporary directory[1]_ and the location property will be a Path | ||
| object for the downloaded file's location. | ||
|
|
||
| .. [1] See builtin module tempfile | ||
| https://docs.python.org/3/library/tempfile.html | ||
|
|
||
| Parameters | ||
| ---------- | ||
| self._address : string or Path object | ||
| Address specified either as a URL, string containing path or | ||
| Path object | ||
|
|
||
| Returns | ||
| ------- | ||
| Path object | ||
| Path object for file location | ||
| """ | ||
| if self.is_address_url(): | ||
| pathobj = self._download(self._address) | ||
| else: | ||
| if not isinstance(self._address, Path): | ||
| pathobj = Path(self._address) | ||
| else: | ||
| pathobj = self._address | ||
|
|
||
| if pathobj.exists(): | ||
| return pathobj | ||
|
|
||
| @property | ||
| def is_binary(self): | ||
| """Tries to determine whether a file is binary using the | ||
| binaryornot library. | ||
|
|
||
| Returns | ||
| ------- | ||
| bool | ||
| True if binary, else False. | ||
| """ | ||
| return binaryornot.check.is_binary(str(self.location)) | ||
|
|
||
| @property | ||
| def is_valid(self): | ||
| raise NotImplementedError | ||
|
|
||
| def is_address_url(self): | ||
| """Checks if given address is a URL | ||
|
|
||
| Returns | ||
| ------- | ||
| bool | ||
| True if recognized as a URL | ||
| """ | ||
| return str(self._address).startswith('http') | ||
|
|
||
| def _download(self, url): | ||
| """Downloads file and returns path to tempfile | ||
|
|
||
| Called by property self.location | ||
|
|
||
| Parameters | ||
| ---------- | ||
| url : string | ||
| URL to file | ||
|
|
||
| Returns | ||
| ------- | ||
| location : Path object | ||
| Path to downloaded file (stored in temporary folder) | ||
| """ | ||
| location, _ = urlretrieve(url) | ||
|
|
||
| return Path(location) | ||
|
||
|
|
||
| def open_ascii(self): | ||
| """Open ascii file and return file object | ||
|
|
||
| Returns | ||
| ------- | ||
| file object | ||
| """ | ||
| try: | ||
| file_object = self.location.open(mode='r') | ||
| except UnicodeDecodeError: | ||
| file_object = self.location.open(mode='r', errors='replace', newline='\r') | ||
| return file_object | ||
|
|
||
| def open_binary(self): | ||
| """Open binary file and return file object | ||
|
|
||
| Returns | ||
| ------- | ||
| file object | ||
| """ | ||
| return self.location.open(mode='rb') | ||
|
|
||
| def iter_lines(self): | ||
| """Yields lines from ascii file | ||
|
|
||
| Yields | ||
| ------ | ||
| string | ||
| Next line in file | ||
| """ | ||
| # TODO: Handle continuing lines (as in OFF files) | ||
|
|
||
| with self.open_ascii() as fo: | ||
| for line in fo: | ||
| yield line.rstrip() | ||
|
|
||
| def iter_chunks(self, chunk_size=4096): | ||
| """Yields chunks from binary files | ||
|
|
||
| Parameters | ||
| ---------- | ||
| chunk_size : int | ||
| Number of bytes to read with each call | ||
|
|
||
| Yields | ||
| ------ | ||
| bytes | ||
| Next chunk of file | ||
| """ | ||
| with self.open_binary() as fo: | ||
| # reads until empty byte string is encountered | ||
| for chunk in iter(lambda: fo.read(chunk_size), b''): | ||
| yield chunk | ||
|
|
||
| def read(self): | ||
| raise NotImplementedError | ||
|
|
||
| def is_file_signature_correct(self): | ||
| """Checks whether file signature (also known as magic number) is present | ||
| in input file. | ||
|
|
||
| File signatures are strings, numbers or bytes defined in a file | ||
| format's specification. While not technically required to parse the | ||
| file, a missing file signature might be a sign of a malformed file. | ||
|
|
||
| More information about file signatures can be found on Wikipedia[1]_ | ||
| as well as examples of file signatures[2]_ | ||
|
|
||
| Returns | ||
| ------- | ||
| bool | ||
| True if file signature for file type is found in file or if file | ||
| type has no file signature. | ||
|
|
||
| .. [1] https://en.wikipedia.org/wiki/List_of_file_signatures | ||
| .. [2] https://en.wikipedia.org/wiki/File_format#Magic_number | ||
| """ | ||
|
|
||
| if self.FILE_SIGNATURE['content'] is None: | ||
| return True | ||
|
|
||
| file_signature = self.FILE_SIGNATURE['content'] | ||
|
|
||
| if self.FILE_SIGNATURE['offset'] is None: | ||
| signature_offset = 0 | ||
| else: | ||
| signature_offset = self.FILE_SIGNATURE['offset'] | ||
|
|
||
| with self.location.open(mode="rb") as fd: | ||
| fd.seek(signature_offset) | ||
| found_signature = fd.read(len(file_signature)) | ||
|
|
||
| if isinstance(found_signature, str) and found_signature != file_signature.decode(): | ||
| return False | ||
| elif found_signature != file_signature: | ||
| return False | ||
|
|
||
| return True | ||
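
For readers following the diff, here is a minimal standalone sketch of the two less obvious pieces of `BaseReader`: the signature check at an offset and the sentinel-based chunk iteration. It does not import the class from this PR. The helper names `has_signature` and `demo`, the free-function form of `iter_chunks`, and the throwaway test file are assumptions made for illustration only. The PLY magic bytes (`ply`) are used purely as an example of a `FILE_SIGNATURE['content']` value a subclass might define.

```python
import os
import tempfile
from pathlib import Path


def has_signature(path, content, offset=0):
    """Return True if the bytes at `offset` in `path` equal `content`.

    Hypothetical helper mirroring is_file_signature_correct:
    content=None means the format defines no signature, so the check passes.
    """
    if content is None:
        return True
    with Path(path).open(mode="rb") as fd:
        fd.seek(offset)
        return fd.read(len(content)) == content


def iter_chunks(path, chunk_size=4096):
    """Yield successive byte chunks until read() returns b''."""
    with Path(path).open(mode="rb") as fo:
        # iter(callable, sentinel) stops once the callable returns the sentinel
        for chunk in iter(lambda: fo.read(chunk_size), b""):
            yield chunk


def demo():
    # Write a small throwaway file that starts with the PLY magic bytes.
    fd, name = tempfile.mkstemp(suffix=".ply")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(b"ply\nformat ascii 1.0\n")
        ok = has_signature(name, b"ply")
        wrong = has_signature(name, b"OFF")
        shifted = has_signature(name, b"format", offset=4)
        sizes = [len(c) for c in iter_chunks(name, chunk_size=8)]
    finally:
        os.remove(name)
    return ok, wrong, shifted, sizes
```

Calling `demo()` checks one matching signature, one mismatching signature, and one signature at a non-zero offset, then reads the same file back in 8-byte chunks. In the class itself, a subclass would presumably set `FILE_SIGNATURE` and call `is_file_signature_correct()` instead of passing these values explicitly.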