fix JSON bug with data reading #691

Merged: 5 commits, Oct 17, 2022

@JGSweets commented Oct 14, 2022

Previously, we could not read line-separated JSON arrays.
Now we can. For example:

[1, 2]
[2, 3]
[3, 3]

becomes a JSONData object wrapping a pandas DataFrame:

   0  1
0  1  2
1  2  3
2  3  3
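
The parsing step behind this can be sketched with the standard library alone. This is only an illustration, not the library's JSONData code: `read_line_separated_json` is a hypothetical helper name, and the sketch assumes one JSON document per non-empty line.

```python
import json
from io import StringIO
from typing import Any, Iterable, List


def read_line_separated_json(lines: Iterable[str]) -> List[Any]:
    """Parse one JSON document per non-empty line (hypothetical helper)."""
    return [json.loads(line) for line in lines if line.strip()]


buffer = StringIO("[1, 2]\n[2, 3]\n[3, 3]\n")
rows = read_line_separated_json(buffer)
print(rows)  # [[1, 2], [2, 3], [3, 3]]
# Passing `rows` to pandas.DataFrame would give a 3x2 frame with
# integer column labels 0 and 1, like the table above.
```

The integer column labels are why the key handling further down has to tolerate keys that are not strings.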

@JGSweets added the Bug and High Priority labels Oct 14, 2022
@@ -1,6 +1,6 @@
"""Contains class for saving and loading spreadsheet data."""
from io import BytesIO, StringIO
-from typing import Any, Dict, List, Optional, Union, cast
+from typing import Any, Dict, List, Optional, Union

The other function is a boolean type check, so the proper way is for it to return a TypeGuard.

no need to cast now

@@ -92,18 +92,12 @@ def is_match(

# get current position of stream
if data_utils.is_stream_buffer(file_path):
file_path = cast(

Static typing only, not a functional change; the cast was removed because of the TypeGuard.

starting_location = file_path.tell()

is_valid_avro = fastavro.is_avro(file_path)

# return to original position in stream
if data_utils.is_stream_buffer(file_path):
file_path = cast(

Static typing only, not a functional change; the cast was removed because of the TypeGuard.


-def is_stream_buffer(filepath_or_buffer: Any) -> bool:
+def is_stream_buffer(filepath_or_buffer: Any) -> TypeGuard[Union[StringIO, BytesIO]]:

fix to use TypeGuard
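
A minimal sketch of why the TypeGuard return type makes the casts unnecessary. The `isinstance` body of `is_stream_buffer` is an assumption (the diff only shows the signature), and `stream_position` is a hypothetical caller added for illustration.

```python
from io import BytesIO, StringIO
from typing import Any, Union

try:
    from typing import TypeGuard  # Python 3.10+
except ImportError:  # older interpreters need the backport package
    from typing_extensions import TypeGuard


def is_stream_buffer(filepath_or_buffer: Any) -> TypeGuard[Union[StringIO, BytesIO]]:
    # Assumed body: the PR diff shows only the signature change.
    return isinstance(filepath_or_buffer, (StringIO, BytesIO))


def stream_position(file_path: Union[str, StringIO, BytesIO]) -> int:
    """Hypothetical caller: return the stream offset, or 0 for a path."""
    if is_stream_buffer(file_path):
        # A type checker narrows file_path to StringIO | BytesIO here,
        # so .tell() is accepted without typing.cast.
        return file_path.tell()
    return 0
```

With a plain `-> bool` return type, a static checker would still see `file_path` as possibly `str` inside the `if` branch, which is what the removed `cast(...)` calls were working around.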

Comment on lines -30 to +35
-open_method: str ="r",
-encoding: Optional[str]=None,
-seek_offset: Optional[int]=None,
-seek_whence: int=0,
+open_method: str = "r",
+encoding: Optional[str] = None,
+seek_offset: Optional[int] = None,
+seek_whence: int = 0,

formatting

Comment on lines -54 to +58
-self.original_type: Union[Type[str], Type[StringIO], Type[BytesIO], Type[IO]] = type(filepath_or_buffer)
+self.original_type: Union[
+Type[str], Type[StringIO], Type[BytesIO], Type[IO]
+] = type(filepath_or_buffer)

format

Comment on lines -87 to +93
-self._filepath_or_buffer = cast(TextIOWrapper, self._filepath_or_buffer) # guaranteed by self._is_wrapped
+self._filepath_or_buffer = cast(
+TextIOWrapper, self._filepath_or_buffer
+) # guaranteed by self._is_wrapped

format

Comment on lines -95 to +103
-self._filepath_or_buffer = cast(IO, self._filepath_or_buffer) # can't be str due to conversion in __enter__
+self._filepath_or_buffer = cast(
+IO, self._filepath_or_buffer
+) # can't be str due to conversion in __enter__

format

_data = _data.to_dict(orient="records", into=OrderedDict)
for i, sample in enumerate(_data):
_data[i] = json.dumps(
data = self._get_data_as_df(data)

No longer keeps two variables for the data (data and _data).
This fixes the type acceptance at the top.

"""
Extract the data as a json format.

:param data: raw data
:type data: list
:return: dataframe in json format
"""
_data: Union[pd.DataFrame, List]

No longer keeps two variables for the data (data and _data).
This fixes the type acceptance at the top.

@@ -349,6 +349,8 @@ def _convert_flat_to_nested_cols(cls, dic: Dict, separator: str = ".") -> Dict:
:return:
"""
for key in list(dic.keys()):
if not isinstance(key, str):

fix for allowing the [1] format in json reading
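
A sketch of why the `isinstance(key, str)` guard matters. This is not the library's `_convert_flat_to_nested_cols`; `flat_to_nested` is a hypothetical standalone function, and leaving non-string keys untouched is an assumption about what the guard does. It also assumes a prefix such as "a" is not already bound to a non-dict value.

```python
from typing import Any, Dict


def flat_to_nested(dic: Dict[Any, Any], separator: str = ".") -> Dict[Any, Any]:
    """Turn {"a.b": 1} into {"a": {"b": 1}}, leaving non-string keys as-is."""
    nested: Dict[Any, Any] = {}
    for key, value in dic.items():
        # Without the isinstance check, `separator in key` raises
        # TypeError for an integer key such as 1.
        if not isinstance(key, str) or separator not in key:
            nested[key] = value
            continue
        head, rest = key.split(separator, 1)
        nested.setdefault(head, {})[rest] = value
    # Resolve any separators remaining in the keys one level down.
    for key, value in nested.items():
        if isinstance(value, dict):
            nested[key] = flat_to_nested(value, separator)
    return nested


flat = {"a.b": "ab", "a.c": "ac", "a.d.f": "adf", "b": "b", 1: 3}
print(flat_to_nested(flat))
# {'a': {'b': 'ab', 'c': 'ac', 'd': {'f': 'adf'}}, 'b': 'b', 1: 3}
```

Integer keys show up here because a frame built from bare JSON arrays has integer column labels.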

@@ -392,14 +394,16 @@ def is_match(
return True
except (json.JSONDecodeError, UnicodeDecodeError):
data_file.seek(0)

json_identifier_re = re.compile(r"(:|\[)")

Allows both ":" and "[" as JSON identifiers, to differentiate JSON from a plain string.
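
A quick illustration of what the pattern matches. The regular expression is copied from the diff; `has_json_identifier` is a hypothetical helper, since how `is_match` applies the pattern is not shown in this hunk.

```python
import re

json_identifier_re = re.compile(r"(:|\[)")


def has_json_identifier(line: str) -> bool:
    """Hypothetical helper: True if the line contains ':' or '['."""
    return json_identifier_re.search(line) is not None


for sample in ['{"a": 1}', "[1]", "just a string"]:
    print(repr(sample), has_json_identifier(sample))
# '{"a": 1}' True
# '[1]' True
# 'just a string' False
```

The check is a heuristic: a plain string that happens to contain a colon, such as "time: noon", also matches.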

@@ -12,11 +12,12 @@

class TestNestedJSON(unittest.TestCase):
def test_flat_to_nested_json(self):
-dic = {"a.b": "ab", "a.c": "ac", "a.d.f": "adf", "b": "b"}
+dic = {"a.b": "ab", "a.c": "ac", "a.d.f": "adf", "b": "b", 1: 3}

updates test to check for keys that aren't strings

@@ -56,6 +57,11 @@ def setUpClass(cls):
encoding="utf-8",
count=14,
),
dict(

adds test which includes the new json format

@@ -0,0 +1,3 @@
[1]

new data for testing json reading

@taylorfturner enabled auto-merge (squash) October 17, 2022 12:39
@taylorfturner merged commit d16b5c8 into capitalone:main Oct 17, 2022
Labels: Bug, High Priority