feat(excel2json): new user error (InputError) implementation in properties section (DEV-3037) #654

Merged
Changes from 27 commits
50 commits
a35deb9
New Error
Nora-Olivia-Ammann Nov 27, 2023
f940a7b
extract validate folder structure / create json
Nora-Olivia-Ammann Nov 27, 2023
33255a0
Merge branch 'main' into wip/dev-3024-excel2json-new-error-implementa…
Nora-Olivia-Ammann Nov 28, 2023
bb939ae
Merge branch 'wip/dev-3024-excel2json-new-error-implementation' of ht…
Nora-Olivia-Ammann Nov 28, 2023
f594c96
Update properties.py
Nora-Olivia-Ammann Nov 28, 2023
858e6e2
Update exceptions.py
Nora-Olivia-Ammann Nov 28, 2023
f4cbfc0
Merge branch 'main' into wip/dev-3024-excel2json-new-error-implementa…
Nora-Olivia-Ammann Nov 29, 2023
19b160b
Merge branch 'main' into wip/dev-3024-excel2json-new-error-implementa…
Nora-Olivia-Ammann Nov 29, 2023
ab4d862
Merge branch 'wip/dev-3024-excel2json-new-error-implementation' of ht…
Nora-Olivia-Ammann Nov 29, 2023
2872600
update utils
Nora-Olivia-Ammann Nov 29, 2023
dee250d
Create input_error.py
Nora-Olivia-Ammann Nov 29, 2023
4a11ea6
Update test_properties.py
Nora-Olivia-Ammann Nov 29, 2023
47cf1c8
update test excel
Nora-Olivia-Ammann Nov 29, 2023
49d8790
fix test
Nora-Olivia-Ammann Nov 29, 2023
0cf49c2
make sourcery happy
Nora-Olivia-Ammann Nov 29, 2023
59a43bb
Update properties.py
Nora-Olivia-Ammann Nov 29, 2023
f211d96
Update properties.py
Nora-Olivia-Ammann Nov 29, 2023
a35a9de
Update input_error.py
Nora-Olivia-Ammann Nov 29, 2023
bbe8545
linting
Nora-Olivia-Ammann Nov 29, 2023
7b9bbb8
linting
Nora-Olivia-Ammann Nov 29, 2023
a0bdbc2
Error for gui attrib
Nora-Olivia-Ammann Nov 29, 2023
a7f4648
mypy linting
Nora-Olivia-Ammann Nov 29, 2023
25eab6c
Update properties.py
Nora-Olivia-Ammann Nov 29, 2023
4a666bc
json validation error
Nora-Olivia-Ammann Nov 29, 2023
50e3167
Update properties.py
Nora-Olivia-Ammann Nov 29, 2023
7b54f48
linting
Nora-Olivia-Ammann Nov 29, 2023
9e0219a
linting
Nora-Olivia-Ammann Nov 29, 2023
e851115
Remove redundant functions
Nora-Olivia-Ammann Nov 30, 2023
eeb91ba
Update properties.py
Nora-Olivia-Ammann Nov 30, 2023
f1f4a19
Update properties.py
Nora-Olivia-Ammann Nov 30, 2023
498b070
Change error messages
Nora-Olivia-Ammann Nov 30, 2023
8e72efe
Change XMLError name
Nora-Olivia-Ammann Nov 30, 2023
dda2770
Change JsonValidationProblem message
Nora-Olivia-Ammann Nov 30, 2023
e8b3a04
Update input_error.py
Nora-Olivia-Ammann Nov 30, 2023
07a3fc4
linting
Nora-Olivia-Ammann Nov 30, 2023
90b010f
Update properties.py
Nora-Olivia-Ammann Nov 30, 2023
6851cd4
Update exceptions.py
Nora-Olivia-Ammann Nov 30, 2023
e91ec05
Remove all \t
Nora-Olivia-Ammann Nov 30, 2023
ecf6bc0
Merge branch 'main' into wip/dev-3024-excel2json-new-error-implementa…
Nora-Olivia-Ammann Nov 30, 2023
0adbcaa
simplify test_properties
Nora-Olivia-Ammann Nov 30, 2023
59cdd25
Merge branch 'main' into wip/dev-3024-excel2json-new-error-implementa…
Nora-Olivia-Ammann Nov 30, 2023
74f1143
linting
Nora-Olivia-Ammann Nov 30, 2023
d37f254
Make more specific problem classes
Nora-Olivia-Ammann Nov 30, 2023
3015329
Update project.py
Nora-Olivia-Ammann Nov 30, 2023
2180c46
Merge branch 'main' into wip/dev-3024-excel2json-new-error-implementa…
Nora-Olivia-Ammann Nov 30, 2023
5e1d419
names
Nora-Olivia-Ammann Dec 1, 2023
779e8bc
harmonise excel information
Nora-Olivia-Ammann Dec 1, 2023
58345c2
moved file
Nora-Olivia-Ammann Dec 1, 2023
281f9c6
make jsonvalidation class more specific
Nora-Olivia-Ammann Dec 1, 2023
ae82749
make darglint happy
Nora-Olivia-Ammann Dec 1, 2023
1 change: 0 additions & 1 deletion src/dsp_tools/commands/excel2json/project.py
@@ -17,7 +17,6 @@ def excel2json(
"""
Converts a folder containing Excel files into a JSON data model file. The folder must be structured like this:

::

data_model_files
|-- lists
137 changes: 83 additions & 54 deletions src/dsp_tools/commands/excel2json/properties.py
@@ -10,7 +10,8 @@
import regex

import dsp_tools.commands.excel2json.utils as utl
from dsp_tools.models.exceptions import UserError
from dsp_tools.models.exceptions import InputError
from dsp_tools.models.input_error import ExcelContentProblem, JsonValidationProblem

languages = ["en", "de", "fr", "it", "rm"]
language_label_col = ["label_en", "label_de", "label_fr", "label_it", "label_rm"]
@@ -21,7 +22,7 @@ def _search_json_validation_error_get_err_msg_str(
properties_list: list[dict[str, Any]],
excelfile: str,
validation_error: jsonschema.ValidationError,
) -> str:
) -> JsonValidationProblem:
"""
This function takes a list of properties, which were transformed from an Excel to a json.
The validation raised an error.
@@ -36,7 +37,7 @@ def _search_json_validation_error_get_err_msg_str(
Returns:
A JsonValidationProblem object that contains detailed information about the problem
"""
err_msg_list = [f"The 'properties' section defined in the Excel file '{excelfile}' did not pass validation."]
usr_msg = f"The 'properties' section defined in the Excel file '{excelfile}' did not pass validation."
if json_path_to_property := regex.search(r"^\$\[(\d+)\]", validation_error.json_path):
# fmt: off
wrong_property_name = (
@@ -46,20 +47,26 @@ def _search_json_validation_error_get_err_msg_str(
)
# fmt: on
excel_row = int(json_path_to_property.group(1)) + 2
err_msg_list.append(f"The problematic property is '{wrong_property_name}' in Excel row {excel_row}.")

column = None
val_msg = None
if affected_field := regex.search(
r"name|labels|comments|super|subject|object|gui_element|gui_attributes",
validation_error.json_path,
):
err_msg_list.append(
f"The problem is that the column '{affected_field.group(0)}' has an invalid value: "
f"{validation_error.message}"
)
else:
err_msg_list.append(
f"The error message is: {validation_error.message}\n" f"The error occurred at {validation_error.json_path}"
column = affected_field.group(0)
val_msg = validation_error.message

return JsonValidationProblem(
user_msg=usr_msg,
property=wrong_property_name,
excel_row=excel_row,
excel_column=column,
original_msg=val_msg,
)
return "\n".join(err_msg_list)
return JsonValidationProblem(
user_msg=usr_msg, original_msg=validation_error.message, message_path=validation_error.json_path
)
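
The path handling in the function above can be illustrated in isolation. This is a minimal sketch, not the project's code: `locate_problem` is a hypothetical helper, the standard-library `re` module stands in for `regex`, and the `+ 2` offset is assumed to account for the header row plus 1-based Excel row numbering.

```python
from __future__ import annotations

import re

# Same alternation as in the diff above; the first alternative that matches wins.
FIELD_PATTERN = r"name|labels|comments|super|subject|object|gui_element|gui_attributes"


def locate_problem(json_path: str) -> dict[str, int | str | None]:
    """Translate a jsonschema-style JSON path into an Excel row and column (hypothetical helper)."""
    located: dict[str, int | str | None] = {"excel_row": None, "excel_column": None}
    # "$[3].gui_element" starts with the list index of the offending property.
    if index_match := re.search(r"^\$\[(\d+)\]", json_path):
        # Assumption: index 0 is Excel row 2, because row 1 holds the column headers.
        located["excel_row"] = int(index_match.group(1)) + 2
        if field_match := re.search(FIELD_PATTERN, json_path):
            located["excel_column"] = field_match.group(0)
    return located


print(locate_problem("$[3].gui_element"))
print(locate_problem("$"))
```

A path that does not start with a list index yields no row and no column, which corresponds to the fallback `JsonValidationProblem` that only carries the original message and the path.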


def _validate_properties(
@@ -74,7 +81,7 @@ def _validate_properties(
excelfile: path to the Excel file containing the properties

Raises:
UserError: if the validation fails
InputError: if the validation fails

Returns:
True if the "properties" section passed validation
@@ -89,7 +96,7 @@ def _validate_properties(
err_msg = _search_json_validation_error_get_err_msg_str(
properties_list=properties_list, excelfile=excelfile, validation_error=err
)
raise UserError(err_msg) from None
raise InputError(err_msg.execute_error_protocol()) from None
return True
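
The `raise ... from None` idiom used above hides the low-level exception from the traceback, so the user only sees the domain error. A small sketch of the effect, with a stand-in `InputError` class and a hypothetical `validate_names` check (neither is the project's implementation):

```python
class InputError(Exception):
    """Stand-in for dsp_tools.models.exceptions.InputError."""


def validate_names(properties: list[dict[str, str]]) -> bool:
    """Raise InputError for the first property without a 'name' key (hypothetical check)."""
    for index, prop in enumerate(properties):
        try:
            prop["name"]
        except KeyError:
            # "from None" suppresses the KeyError in the traceback shown to the user.
            # Assumed offset of 2: header row plus 1-based Excel numbering.
            raise InputError(f"The property in Excel row {index + 2} has no 'name'.") from None
    return True


try:
    validate_names([{"name": "hasTitle"}, {"label": "no name here"}])
except InputError as err:
    print(err)
    print(err.__cause__, err.__suppress_context__)
```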


@@ -156,7 +163,9 @@ def _format_gui_attribute(attribute_str: str) -> dict[str, str | int | float]:
return {attrib: _search_convert_numbers(value_str=val) for attrib, val in attribute_dict.items()}


def _get_gui_attribute(df_row: pd.Series, row_num: int, excelfile: str) -> dict[str, int | str | float] | None:
def _get_gui_attribute(
df_row: pd.Series, row_num: int, excelfile: str
) -> dict[str, int | str | float] | ExcelContentProblem | None:
"""
This function checks if the cell "gui_attributes" is empty.
If it is, it returns None.
@@ -179,10 +188,13 @@ def _get_gui_attribute(df_row: pd.Series, row_num: int, excelfile: str) -> dict[
try:
return _format_gui_attribute(attribute_str=df_row["gui_attributes"])
except IndexError:
raise UserError(
f"Row {row_num} of Excel file {excelfile} contains invalid data in column 'gui_attributes'.\n"
"The expected format is '[attribute: value, attribute: value]'."
) from None
return ExcelContentProblem(
user_msg=f"The Excel file '{excelfile}' has invalid content.\nThe expected format is "
f"'attribute: value, attribute: value'",
column="gui_attributes",
rows=[row_num],
values=[df_row["gui_attributes"]],
)
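
The change above replaces a raised exception with a returned problem object, which lets the caller decide when to report. The sketch below shows that pattern on a simplified parser. `CellProblem`, `parse_gui_attributes` and `get_gui_attributes` are hypothetical stand-ins; the real `_format_gui_attribute` is not fully shown in this diff and may parse differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CellProblem:
    """Hypothetical stand-in for ExcelContentProblem, limited to one cell."""

    user_msg: str
    column: str
    row: int
    value: str


def parse_gui_attributes(cell: str) -> dict[str, str]:
    """Split 'attribute: value, attribute: value' into a dict.

    A pair without a colon has no second element, so indexing it raises IndexError.
    """
    pairs = [pair.split(":") for pair in cell.split(",")]
    return {pair[0].strip(): pair[1].strip() for pair in pairs}


def get_gui_attributes(cell: str | None, row: int) -> dict[str, str] | CellProblem | None:
    """Return the parsed attributes, None for an empty cell, or a problem object."""
    if cell is None:
        return None
    try:
        return parse_gui_attributes(cell)
    except IndexError:
        return CellProblem(
            user_msg="The expected format is 'attribute: value, attribute: value'",
            column="gui_attributes",
            row=row,
            value=cell,
        )


print(get_gui_attributes("min: 1, max: 10", row=2))
print(get_gui_attributes("min 1", row=3))
```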


def _row2prop(df_row: pd.Series, row_num: int, excelfile: str) -> dict[str, Any]:
@@ -198,19 +210,24 @@ def _row2prop(df_row: pd.Series, row_num: int, excelfile: str) -> dict[str, Any]
dict object of the property

Raises:
UserError if there are any formal mistakes in the "gui_attributes" column
InputError: if there are any formal mistakes in the "gui_attributes" column
"""
_property = {x: df_row[x] for x in mandatory_properties}
# These are also mandatory but require formatting
_property.update(
{"labels": utl.get_labels(df_row=df_row), "super": [s.strip() for s in df_row["super"].split(",")]}
)
non_mandatory = {
"comments": utl.get_comments(df_row=df_row),
"gui_attributes": _get_gui_attribute(df_row=df_row, row_num=row_num, excelfile=excelfile),
_property = {x: df_row[x] for x in mandatory_properties} | {
"labels": utl.get_labels(df_row=df_row),
"super": [s.strip() for s in df_row["super"].split(",")],
}
# These functions may return None, this is checked before the update
_property = utl.update_dict_if_not_value_none(additional_dict=non_mandatory, to_update_dict=_property)

gui_attrib = _get_gui_attribute(df_row=df_row, row_num=row_num, excelfile=excelfile)
match gui_attrib:
case dict():
_property["gui_attributes"] = gui_attrib
case ExcelContentProblem():
msg = gui_attrib.execute_error_protocol()
raise InputError(msg) from None

if comment := utl.get_comments(df_row=df_row):
_property["comments"] = comment

return _property


@@ -229,7 +246,7 @@ def _check_compliance_gui_attributes(df: pd.DataFrame) -> dict[str, pd.Series] |
checks passed.

Raises:
UserError if any of the checks fail
InputError: if any of the checks fail
"""
mandatory_attributes = ["Spinbox", "List"]
mandatory_check = utl.col_must_or_not_empty_based_on_other_col(
@@ -248,32 +265,30 @@ def _check_compliance_gui_attributes(df: pd.DataFrame) -> dict[str, pd.Series] |
must_have_value=False,
)
# If neither has a problem, we return None
if mandatory_check is None and no_attribute_check is None:
return None
# If both have problems, we combine the series
elif mandatory_check is not None and no_attribute_check is not None:
final_series = pd.Series(np.logical_or(mandatory_check, no_attribute_check))
elif mandatory_check is not None:
final_series = mandatory_check
else:
final_series = no_attribute_check
match mandatory_check, no_attribute_check:
case None, None:
return None
case pd.Series(), pd.Series():
final_series = pd.Series(np.logical_or(mandatory_check, no_attribute_check)) # type: ignore[arg-type]
case pd.Series(), None:
final_series = mandatory_check
case None, pd.Series():
final_series = no_attribute_check
# The boolean series is returned
return {"gui_attributes": final_series}


def _check_missing_values_in_row_raise_error(df: pd.DataFrame, excelfile: str) -> None:
def _check_missing_values_in_row(df: pd.DataFrame) -> None | list[ExcelContentProblem]:
"""
This function checks if all the required values are in the df.
If all the checks are ok, the function ends without any effect.
If any of the checks fail, a UserError is raised which contains the information in which column and row there
are problems.
If any of the checks fail, a list of objects is returned that contain the information about which columns and rows are affected.

Args:
df: pd.DataFrame that is to be checked
excelfile: Name of the original Excel file

Raises:
UserError: if any of the checks are failed
Returns:
A list of ExcelContentProblem objects (one per affected column) if values are missing, otherwise None.
"""
# Every row in these columns must have a value
required_values = ["name", "super", "object", "gui_element"]
@@ -291,22 +306,28 @@ def _check_missing_values_in_row_raise_error(df: pd.DataFrame, excelfile: str) -
if missing_dict:
# Get the row numbers from the boolean series
missing_dict = utl.get_wrong_row_numbers(wrong_row_dict=missing_dict, true_remains=True)
error_str = "\n".join([f"- Column '{k}' Row Number(s): {v}" for k, v in missing_dict.items()])
raise UserError(f"The file '{excelfile}' is missing values in the following rows:\n" f"{error_str}")
return [
ExcelContentProblem(
user_msg="There are missing values in a column that must not be empty:", column=col, rows=row_nums
)
for col, row_nums in missing_dict.items()
]
else:
return None
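
The function above reports one problem per affected column. A pandas-free sketch of the same idea, where `None` marks an empty cell: `find_missing_rows` and `describe_missing` are hypothetical names, and the row offset of 2 is an assumption, since `utl.get_wrong_row_numbers` is not shown in this diff.

```python
from __future__ import annotations


def find_missing_rows(columns: dict[str, list[str | None]]) -> dict[str, list[int]]:
    """Map each column name to the Excel row numbers whose cell is empty."""
    missing: dict[str, list[int]] = {}
    for column, cells in columns.items():
        # Assumed offset: +1 for 1-based rows, +1 for the header row.
        rows = [index + 2 for index, cell in enumerate(cells) if cell is None]
        if rows:
            missing[column] = rows
    return missing


def describe_missing(columns: dict[str, list[str | None]]) -> list[str] | None:
    """Return one message per affected column, or None if nothing is missing."""
    missing = find_missing_rows(columns)
    if not missing:
        return None
    return [f"Column '{column}' is missing values in row(s): {rows}" for column, rows in missing.items()]


sheet = {"name": ["hasTitle", None, "hasAge"], "super": ["hasValue", "hasValue", "hasValue"]}
print(describe_missing(sheet))
```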


def _do_property_excel_compliance(df: pd.DataFrame, excelfile: str) -> None:
"""
This function calls three separate functions which each checks if the pd.DataFrame is as we expect it.
Each of these functions raises a UserError if there is a problem.
Each of these functions returns a problem object if there is a problem; the problems are collected and raised together as a single InputError.
If the checks do not fail, this function ends without an effect.

Args:
df: The pd.DataFrame that is checked
excelfile: The name of the original Excel file

Raises:
UserError if any of the checks fail
InputError: if any of the checks fail
"""
# If it does not pass any one of the tests, the function stops
required_columns = {
@@ -316,9 +337,17 @@ def _do_property_excel_compliance(df: pd.DataFrame, excelfile: str) -> None:
"gui_element",
"gui_attributes",
}
utl.check_contains_required_columns_else_raise_error(df=df, required_columns=required_columns)
utl.check_column_for_duplicate_else_raise_error(df=df, to_check_column="name")
_check_missing_values_in_row_raise_error(df=df, excelfile=excelfile)
problems = [
utl.check_contains_required_columns_else_raise_error(df=df, required_columns=required_columns),
utl.check_column_for_duplicate(df=df, to_check_column="name"),
]
if missing_vals_check := _check_missing_values_in_row(df=df):
problems.extend(missing_vals_check)
if any(problems):
extra = [problem.execute_error_protocol() for problem in problems if problem]

msg = [f"The excel file '{excelfile}' has some problems:", *extra]
raise InputError("\n".join(msg))
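
The aggregation step above runs all checks first and raises once, so the user sees every problem in a single message. A minimal sketch of that flow follows. `Problem`, `InputError` and `raise_if_problems` are stand-ins, and the text produced by `execute_error_protocol` is an invented format, since the real class lives in `input_error.py` and is not part of this diff.

```python
from __future__ import annotations

from dataclasses import dataclass


class InputError(Exception):
    """Stand-in for the project's InputError."""


@dataclass(frozen=True)
class Problem:
    """Hypothetical problem object; the message format is an assumption."""

    user_msg: str
    column: str

    def execute_error_protocol(self) -> str:
        return f"{self.user_msg}\nColumn: '{self.column}'"


def raise_if_problems(excelfile: str, checks: list[Problem | None]) -> None:
    """Raise one InputError that lists every problem returned by the checks."""
    found = [check for check in checks if check is not None]
    if found:
        lines = [
            f"The excel file '{excelfile}' has some problems:",
            *(problem.execute_error_protocol() for problem in found),
        ]
        raise InputError("\n".join(lines))


try:
    raise_if_problems(
        "properties.xlsx",
        [None, Problem("Duplicate values are not allowed in the following:", "name")],
    )
except InputError as err:
    print(err)
```

Filtering with `is not None` is used here instead of `any(problems)`; both work as long as the problem objects are truthy, which is the default for dataclass instances.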


def _rename_deprecated_hlist(df: pd.DataFrame, excelfile: str) -> pd.DataFrame:
@@ -418,7 +447,7 @@ def excel2properties(
path_to_output_file: if provided, the output is written into this JSON file

Raises:
UserError: if something went wrong
InputError: if something went wrong

Returns:
a tuple consisting of the "properties" section as a Python list,
36 changes: 21 additions & 15 deletions src/dsp_tools/commands/excel2json/utils.py
@@ -7,7 +7,7 @@
import pandas as pd
import regex

from dsp_tools.models.exceptions import UserError
from dsp_tools.models.input_error import ExcelContentProblem, ExcelStructureProblem

languages = ["en", "de", "fr", "it", "rm"]

@@ -65,27 +65,30 @@ def clean_data_frame(df: pd.DataFrame) -> pd.DataFrame:
return df


def check_contains_required_columns_else_raise_error(df: pd.DataFrame, required_columns: set[str]) -> None:
def check_contains_required_columns_else_raise_error(
df: pd.DataFrame, required_columns: set[str]
) -> None | ExcelStructureProblem:
"""
This function takes a pd.DataFrame and a set of required column names.
It checks if all the columns from the set are in the pd.DataFrame.
Additional columns to the ones in the set are allowed.
It raises an error if any columns are missing.

Args:
df: pd.DataFrame that is checked
required_columns: set of column names

Raises:
UserError: if there are required columns missing
Returns:
An ExcelStructureProblem object if required columns are missing, otherwise None
"""
if not required_columns.issubset(set(df.columns)):
raise UserError(
f"The following columns are missing in the excel:\n" f"{required_columns.difference(set(df.columns))}"
required = list(required_columns.difference(set(df.columns)))
return ExcelStructureProblem(
user_msg="The following required columns are missing in the excel:", column=required
)
return None
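
The column check above boils down to a set difference. A short sketch with a hypothetical `missing_columns` helper, using plain lists instead of `df.columns`; sorting is added here because converting a set to a list gives no guaranteed order.

```python
def missing_columns(present: list[str], required: set[str]) -> list[str]:
    """Return the required column names that are absent, in alphabetical order."""
    # set.difference accepts any iterable, so the list of present columns can be passed directly.
    return sorted(required.difference(present))


required = {"name", "super", "object", "gui_element", "gui_attributes"}
print(missing_columns(["name", "super", "comment_en"], required))
```

Additional columns such as `comment_en` are ignored, matching the docstring's statement that extra columns are allowed.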


def check_column_for_duplicate_else_raise_error(df: pd.DataFrame, to_check_column: str) -> None:
def check_column_for_duplicate(df: pd.DataFrame, to_check_column: str) -> None | ExcelContentProblem:
"""
This function checks if a specified column contains duplicate values.
Empty cells (pd.NA) also count as duplicates.
@@ -95,16 +98,19 @@ def check_column_for_duplicate_else_raise_error(df: pd.DataFrame, to_check_colum
df: pd.DataFrame that is checked for duplicates
to_check_column: Name of the column that must not contain duplicates

Raises:
UserError: if there are duplicates in the column
Returns:
If there are problems it returns an object that stores the relevant user information.

"""
if df[to_check_column].duplicated().any():
# If it does, it collects all the duplicate values and returns a problem object.
duplicate_values = ",".join(df[to_check_column][df[to_check_column].duplicated()].tolist())
raise UserError(
f"The column '{to_check_column}' may not contain any duplicate values. "
f"The following values appeared multiple times '{duplicate_values}'."
duplicate_values = df[to_check_column][df[to_check_column].duplicated()].tolist()
return ExcelContentProblem(
user_msg="Duplicate values are not allowed in the following:",
column=to_check_column,
values=duplicate_values,
)
else:
return None
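
With its default `keep="first"`, pandas' `duplicated()` flags every occurrence of a value except the first one, so a value that appears three times is reported twice. The pure-Python sketch below reproduces that behaviour for illustration; `repeated_values` is a hypothetical helper and not part of `utils.py`.

```python
def repeated_values(values: list[str]) -> list[str]:
    """Return every occurrence of a value after its first appearance."""
    seen: set[str] = set()
    repeats: list[str] = []
    for value in values:
        if value in seen:
            repeats.append(value)
        else:
            seen.add(value)
    return repeats


print(repeated_values(["hasTitle", "hasAge", "hasTitle", "hasTitle"]))
```

If each duplicated name should appear only once in the user message, the result could be deduplicated before it is passed on; whether that is desirable is a design choice this diff leaves open.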


def check_required_values(df: pd.DataFrame, required_values_columns: list[str]) -> dict[str, pd.Series]: