Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graph loading bug #603

Merged
merged 4 commits into from
Aug 11, 2022
Merged

Graph loading bug #603

merged 4 commits into from
Aug 11, 2022

Conversation

JGSweets
Copy link
Collaborator

Allows lazy loading to work for data class as other classes without causing bugs

@JGSweets JGSweets added Bug Something isn't working High Priority Dramatic improvement, inaccurate calculation(s) or bug / feature making the library unusable labels Aug 11, 2022
@@ -102,25 +104,19 @@ def _find_target_string_in_column(self, column_names, keyword_list):
return target_index

@classmethod
def csv_column_names(cls, file_path, options):
def csv_column_names(cls, file_path, header, delimiter, encoding="utf-8"):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated for reusability in _network

data_as_str = data_utils.load_as_str_from_file(
self.input_file_path, self.file_encoding
)
self._header = CSVData._guess_header_row(
data_as_str, self._delimiter, self._quotechar

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this section is what fixes the bug. If graph data was lazily loaded or loaded without options from Is_match, it wouldn't determine these when thrown into the system. This fixes it.

Comment on lines +101 to +105
@classmethod
def setUp(cls):
for buffer in cls.buffer_list:
buffer["path"].seek(0)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to seek to reset data files after use between tests

@@ -147,11 +152,17 @@ def test_csv_column_names(self):
"open_date_src",
"open_date_dst",
]
for input_file in self.input_file_names_pos:
for input_file in self.file_or_buf_list:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be using all data for tests

self.assertEqual(
GraphData.csv_column_names(input_file["path"], input_file["options"]),
GraphData.csv_column_names(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixes to new call

data = GraphData(input_file["path"])
self.assertEqual(type(data), GraphData)
self.assertIsNotNone(data.data)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ensures data can be loaded as expected

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t mind both but the new is better at validating

@taylorfturner taylorfturner enabled auto-merge (squash) August 11, 2022 01:58
taylorfturner
taylorfturner previously approved these changes Aug 11, 2022
@taylorfturner
Copy link
Contributor

Pre commit too

Copy link
Contributor

@taylorfturner taylorfturner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data_readers/graph_data.py L250-L252 is actually failing here when I run the notebook @JGSweets

@taylorfturner
Copy link
Contributor

taylorfturner commented Aug 11, 2022

data_readers/graph_data.py L250-L252 is actually failing here when I run the notebook @JGSweets

@JGSweets, sorta false alarm.... data.data doesn't return any data, just an networkX object. data.data.edges does output the edges, though.

Based on what I'm seeing in the networkX documentation... there isn't an attribute to return the graph datasets... so not sure data.data, like CSVData, is really an option here.

@taylorfturner taylorfturner merged commit bc6a33b into capitalone:main Aug 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working High Priority Dramatic improvement, inaccurate calculation(s) or bug / feature making the library unusable
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants