Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: excel2json (DEV-2547) #487

Merged
merged 31 commits into from
Aug 30, 2023

Conversation

Nora-Olivia-Ammann
Copy link
Collaborator

@Nora-Olivia-Ammann Nora-Olivia-Ammann commented Aug 24, 2023

No description provided.

Creation of folder and changes of import paths.
Cleaning Datatype
Removal of unnecessary statements.
The compliance checks for the files were changed and testing added. To this end several generic functions were created and tested. The creation and validation of the json string remains wip.
@linear
Copy link

linear bot commented Aug 24, 2023

DEV-2547 Refactoring DSP-TOOLS excel_to_json

Reorganization of the folder structure to create a new folder only for excel_to_json.

Refactoring and addition of testing the excel_to_json_properties file.

@Nora-Olivia-Ammann Nora-Olivia-Ammann self-assigned this Aug 24, 2023
@Nora-Olivia-Ammann Nora-Olivia-Ammann changed the title Dev 2547 refactoring dsp tools excel to json refactor: refactoring dsp tools excel to json (Dev-2547) Aug 24, 2023
@Nora-Olivia-Ammann Nora-Olivia-Ammann changed the title refactor: refactoring dsp tools excel to json (Dev-2547) refactor: dsp-tools excel_to_json (Dev-2547) Aug 24, 2023
@Nora-Olivia-Ammann Nora-Olivia-Ammann changed the title refactor: dsp-tools excel_to_json (Dev-2547) refactor: excel_to_json (Dev-2547) Aug 24, 2023
Copy link
Collaborator

@BalduinLandolt BalduinLandolt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good!
I found it hard to review this PR reasonably, though, because the git diff was a bit all over the place (because stuff was moved around) and I'm not really familiar with this code

src/dsp_tools/cli.py Outdated Show resolved Hide resolved
return {"wrong gui_attributes": final_series}


def _check_missing_values_in_row_raise_error(to_check_df: pd.DataFrame, excel_filename: str) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you name this function in a way that tells the user what this function does? In some other comments, I gave you suggestions how to rename other functions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is it here that is unclear?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Clean code reads like well-written prose." (Robert C. Martin)

If we look at it from the caller side: When we use a function, we want to know what it does. Some examples from my suggestions in other comments:

your name my suggestion
check_duplicate_raise_error() check_column_for_duplicates()
_search_validation_error() _prepare_err_msg_from_json_validation_error
check_required_columns_raise_error() check_that_df_contains_required_columns()

The reason I prefer my suggestions is that it's almost like telling the computer: "do that for me". So the reason I struggle with your name _check_missing_values_in_row_raise_error() is that I cannot translate it into a spoken command like "Hey computer, please check the missing values in the row and raise an error".

See also: https://dev.to/pksilen/27-tips-for-making-your-code-read-like-beautifully-written-prose-4aab

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tip #6: Use a preposition in the function name only when necessary: You don’t need to add a preposition to a function name if the preposition can be assumed (i.e., the preposition is implicit).
?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is your question?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this not violate this rule?

Copy link
Collaborator

@jnussbaum jnussbaum Aug 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that Tip no. 6 from dev.to violates Robert Martin's quote? These rules are not natural laws, rather they point you in a certain direction.

So in the present case (discussing about _check_missing_values_in_row_raise_error), it's too far away from natural language. Especially, I struggle with the _raise_error part. If you delete that, it would already be better.

Generally: You have several function names that end in raise_error. As far as I can tell, it's quite uncommon to include that in a function name. It would be better to do it like you did in _check_compliance_gui_attributes(): There, the docstring says that the function might raise an error, but you didn't pack this info into the function name.

"""
This function takes a pd.Dataframe and a list of column names which may not contain empty cells. If there are any
empty cells in the column, it adds the column name and a boolean pd.Series to the dictionary. If there are no empty
cells, then it is not included in the dictionary. If no column has any empty cells, then it returns an empty
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general feedback to your docstrings: I think that you should make a conceptual distinction between inline comments and docstrings:

  • Inline comments explain implementation details. They are intended at other developers.
  • Docstring are 1 abstraction level higher: They are intended at users of the function. A docstring should enable the user to decide if he needs this function or not. It should explain to the outside world what tasks can be accomplished with this function.

Copy link
Collaborator

@jnussbaum jnussbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is a big improvement of the code. I especially like the modular approach.

However, there are some practical things, some "handwerkliche Details", that should be improved, especially the naming of the functions. In my comments, I gave some examples how the functions could be named.

Can you in a first step improve the naming of the functions? In a second step, I could then review again. Because at the moment, i find it hard to review, because I find it difficult to understand what the functions do.

Based on the string content of the individual rows, which is specified in the "substring_list",
the cell is the column "check_empty_colname" is checked whether it is empty or not.
The "substring_list" contains the different possibilities regarding the content of the cell.
They are joined in a RegEx "|" which denotes "or".
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring of a function is not the right place to explain regex syntax. This is an implementation detail that doesn't matter to the person who uses this function.

Comment on lines +32 to +34
for file in os.listdir("testdata/tmp"):
os.remove("testdata/tmp/" + file)
os.rmdir("testdata/tmp")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 3 lines can be simplified to shutil.rmtree("testdata/tmp")

@Nora-Olivia-Ammann Nora-Olivia-Ammann merged commit 504a4ec into main Aug 30, 2023
4 checks passed
@Nora-Olivia-Ammann Nora-Olivia-Ammann deleted the DEV-2547-Refactoring-DSP-TOOLS-excel_to_json branch August 30, 2023 13:27
@daschbot daschbot mentioned this pull request Aug 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants