Rework of the whole workflow #53

Merged
merged 278 commits into from
Feb 19, 2023

Conversation

hf-krechan
Collaborator

@hf-krechan hf-krechan commented Dec 11, 2022

This PR changes the main workflow of kohlrahbi by simplifying it.
There is now only one function to parse an AHB table row.

You can see that it's a net reduction in lines ;)

@hf-krechan hf-krechan marked this pull request as ready for review December 12, 2022 06:01
@hf-krechan hf-krechan marked this pull request as draft December 12, 2022 06:01
Contributor

@hf-kklein hf-kklein left a comment

For now I'll take your word for it that this new "coloring" approach works well, even better than the previous debugging; still, I mourn the unit tests. Could we perhaps check one or two complete AHBs into the tests and at least build high-level/integration tests on top of them?

src/kohlrahbi/enums/row_type_color.py Outdated Show resolved Hide resolved
src/kohlrahbi/helper/read_functions.py Outdated Show resolved Hide resolved
src/kohlrahbi/parser/bedingung_cell_parser.py Outdated Show resolved Hide resolved
row_index = dataframe.index.max()

bedingung = bedingung_cell.text.replace("\n", " ")
matches = re.findall(r"\[\d*\]", bedingung)
Contributor

At first I thought the regex here would also need to cover packages. That is not the case, though, because this is about the column that defines the conditions, and the packages are specified somewhere further up in the AHB.
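
To illustrate the point about the regex, here is a minimal sketch of what the pattern from the diff above does and does not match. The cell text is made up for illustration and is not taken from a real AHB, and the `[4P]` notation for a package reference is an assumption.

```python
import re

# Same pattern as in the diff above: square brackets around zero or more digits.
CONDITION_KEY_PATTERN = re.compile(r"\[\d*\]")

# Hypothetical cell text (assumption), including a package-style key "[4P]".
bedingung = "[12] Wenn SG4 vorhanden\n[931] Format: ZZZ = +00\n[4P] Paket"
bedingung = bedingung.replace("\n", " ")

# Only purely numeric keys match; "[4P]" is skipped because of the "P".
matches = CONDITION_KEY_PATTERN.findall(bedingung)
print(matches)  # ['[12]', '[931]']

# Side note: because of "\d*" (zero or more digits), empty brackets match too.
print(CONDITION_KEY_PATTERN.findall("[]"))  # ['[]']
```

If empty brackets should never count as a condition key, `\d+` would be the stricter choice; whether that matters for real AHB documents is not something this thread settles.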

src/kohlrahbi/parser/bedingung_cell_parser.py Outdated Show resolved Hide resolved
src/kohlrahbi/parser/middle_cell_parser.py Outdated Show resolved Hide resolved
src/kohlrahbi/parser/middle_cell_parser.py Outdated Show resolved Hide resolved
src/kohlrahbi/parser/middle_cell_parser.py Outdated Show resolved Hide resolved
left_indent_position: int,
indicator_tabstop_positions: List[int],
) -> None:
"""Parses a paragraph in the middle column and puts the information into the appropriate columns
Contributor

Just from reading the code, I find it hard to tell right away what the "middle cell" is.

Collaborator Author

Understandable.
It refers to this column:
[image]

Can you think of a good name for it?

Contributor

We could call it "body", to distinguish it from the header at the top and the... 🤔 "row description"? on the left?

Collaborator Author

I'll take that into the next PR.

src/kohlrahbi/parser/middle_cell_parser.py Outdated Show resolved Hide resolved
hf-krechan and others added 8 commits December 12, 2022 07:43
Co-authored-by: konstantin <konstantin.klein@hochfrequenz.de>
Co-authored-by: konstantin <konstantin.klein@hochfrequenz.de>
Co-authored-by: konstantin <konstantin.klein@hochfrequenz.de>
Co-authored-by: konstantin <konstantin.klein@hochfrequenz.de>
Co-authored-by: konstantin <konstantin.klein@hochfrequenz.de>
Co-authored-by: konstantin <konstantin.klein@hochfrequenz.de>
Collaborator Author

@hf-krechan hf-krechan left a comment

Thanks a lot for the comments.
I think they improved the code quite a bit :)

setup.cfg Show resolved Hide resolved
table: pd.DataFrame

@staticmethod
def _parse_docx_table(table_meta_data: Seed, ahb_table_dataframe: pd.DataFrame, docx_table: DocxTable):
Collaborator Author

Added in 33b6f3c

setup.cfg Show resolved Hide resolved
src/kohlrahbi/ahb/ahbsubtable.py Outdated Show resolved Hide resolved
src/kohlrahbi/ahb/ahbsubtable.py Outdated Show resolved Hide resolved
src/kohlrahbi/row_type_checker.py Show resolved Hide resolved
"""
The UnfoldedAhb contains one Prüfidentifikator.
Some columns in the AHB documents contain multiple information like Segmentname and Segmentgruppe.
This class unfolds these columns with multiple information.
Collaborator Author

Like this? e82d73a


return FlatAnwendungshandbuch(meta=meta, lines=lines)

def to_flatahb_json(self, output_directory_path: Path):
Collaborator Author

csv_output_directory_path / f"{self.meta_data.pruefidentifikator}.csv",
)

def to_xlsx(self, path_to_output_directory: Path):
Collaborator Author

Comment on lines +11 to +25
table = doc.add_table(rows=1, cols=1)

body_cell = table.rows[0].cells[0]

# the cell comes with an empty paragraph which I could not delete.
# So we insert the BodyCellParagraph attributes into the empty paragraph
first_body_cell_paragprah: CellParagraph = body_cell_paragraphs[0]

body_cell.paragraphs[0].text = first_body_cell_paragprah.text

if first_body_cell_paragprah.tabstop_positions is not None:
    for tabstop_position in first_body_cell_paragprah.tabstop_positions:
        body_cell.paragraphs[0].paragraph_format.tab_stops.add_tab_stop(tabstop_position)

body_cell.paragraphs[0].paragraph_format.left_indent = first_body_cell_paragprah.left_indent_length
Collaborator Author

Hmm 🤔
I think this function is no longer needed.
Thanks to the hack that lets us "fix" the docx tables, we can now simply use docx files directly to generate test data.

@hf-krechan hf-krechan changed the title WIP Rework of the whole workflow Rework of the whole workflow Feb 17, 2023
src/kohlrahbi/ahb/ahbsubtable.py Outdated Show resolved Hide resolved
Comment on lines +66 to +67
table_meta_data.last_two_row_types[1] = table_meta_data.last_two_row_types[0]
table_meta_data.last_two_row_types[0] = current_row_type
Contributor

I don't fully get it yet, though I understand the problem better than the solution. Let me summarize:
Starting point: Subtables sometimes extend across a page break.
Problem: In that case the subtable header is simply repeated and acts as something like noise in our parsing logic?
Solution: We track the previous RowTypes and can thereby factor out the effect of the page break?
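
If that reading is right, the idea could be sketched roughly as follows. Only the two-line history shift is taken from the diff above; the simplified `RowType` enum, the helper names, and the detection rule are assumptions made for illustration and may differ from what kohlrahbi actually does.

```python
from enum import Enum, auto
from typing import List


class RowType(Enum):
    """Simplified, hypothetical stand-in for the project's row type enum."""

    HEADER = auto()
    SEGMENT = auto()
    DATAELEMENT = auto()
    EMPTY = auto()


def remember_row_type(last_two_row_types: List[RowType], current_row_type: RowType) -> None:
    """Shift the history as in the diff: index 0 is the most recent row type."""
    last_two_row_types[1] = last_two_row_types[0]
    last_two_row_types[0] = current_row_type


def is_first_row_after_page_break(last_two_row_types: List[RowType]) -> bool:
    """Assumed rule: a header directly preceded by a data row is a page-break repeat."""
    return last_two_row_types[0] is RowType.HEADER and last_two_row_types[1] in (
        RowType.SEGMENT,
        RowType.DATAELEMENT,
    )


# Hypothetical row sequence: the second HEADER is the repeat after a page break.
rows = [RowType.HEADER, RowType.SEGMENT, RowType.DATAELEMENT, RowType.HEADER, RowType.DATAELEMENT]

history = [RowType.EMPTY, RowType.EMPTY]
flags = []
for row_type in rows:
    # Check the history before the current row is recorded in it.
    flags.append(is_first_row_after_page_break(history))
    remember_row_type(history, row_type)

print(flags)  # [False, False, False, False, True]
```

The first header is not flagged because nothing but the initial `EMPTY` state precedes it; only the row following the repeated header is, which is where a parser would have to decide whether to continue the previous row.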

src/kohlrahbi/harvester.py Show resolved Hide resolved
src/kohlrahbi/harvester.py Show resolved Hide resolved
src/kohlrahbi/row_type_checker.py Show resolved Hide resolved
Co-authored-by: konstantin <konstantin.klein@hochfrequenz.de>
@hf-krechan hf-krechan merged commit d0f71ff into main Feb 19, 2023
@hf-krechan hf-krechan deleted the fix-page-break-in-dataelement branch February 19, 2023 17:45
@hf-krechan hf-krechan restored the fix-page-break-in-dataelement branch February 19, 2023 17:47
Development

Successfully merging this pull request may close these issues.

Use classes to improve program structure
2 participants