Rework of the whole workflow #53
sorry for this bad commit message
I'll take your word for it for now that this new "coloring" approach works well, even better than the previous debugging; still, I mourn the unit tests. Could we perhaps store one or two complete AHBs in the tests and then at least build high-level/integration tests?
row_index = dataframe.index.max()

bedingung = bedingung_cell.text.replace("\n", " ")
matches = re.findall(r"\[\d*\]", bedingung)
At first I thought the regex here would also have to cover packages. It doesn't, though, because this is about the column that defines the conditions, and the packages are specified somewhere further up in the AHB.
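For illustration, a minimal sketch of what the pattern picks up. The helper name `split_bedingungen`, the sample condition text, and the `[1P]` package notation are assumptions made for this example and are not taken from the PR.

```python
import re
from typing import Dict

# Same pattern as in the diff: an opening bracket, any number of digits, a closing bracket.
BEDINGUNG_KEY_PATTERN = r"\[\d*\]"


def split_bedingungen(text: str) -> Dict[str, str]:
    """Map each numeric condition key to the text that follows it (hypothetical helper)."""
    text = text.replace("\n", " ")
    keys = re.findall(BEDINGUNG_KEY_PATTERN, text)
    parts = re.split(BEDINGUNG_KEY_PATTERN, text)
    # parts[0] is whatever precedes the first key, so it is skipped
    return {key: part.strip() for key, part in zip(keys, parts[1:])}


# Made-up sample text; real AHB wording will differ.
sample = "[12] Wenn SG4\nvorhanden [931] Format: ZZZ"
bedingungen = split_bedingungen(sample)

# Assumed package notation: a key with a letter inside the brackets is not matched.
package_matches = re.findall(BEDINGUNG_KEY_PATTERN, "[1P]")
```

Note that `\d*` also accepts empty brackets (`[]`); `\d+` would be stricter if that ever turns out to matter.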
left_indent_position: int,
indicator_tabstop_positions: List[int],
) -> None:
"""Parses a paragraph in the middle column and puts the information into the appropriate columns
Just from reading, I find it hard to understand right away what the "middle cell" is.
We could call it "body", to distinguish it from the header above and the... 🤔 "row description"? on the left?
I'll take that into the next PR.
Co-authored-by: konstantin <konstantin.klein@hochfrequenz.de>
…nto fix-page-break-in-dataelement
Many thanks for the comments.
I think they improved the code quite a bit :)
src/kohlrahbi/ahb/ahbsubtable.py
Outdated
table: pd.DataFrame

@staticmethod
def _parse_docx_table(table_meta_data: Seed, ahb_table_dataframe: pd.DataFrame, docx_table: DocxTable):
Added in 33b6f3c
"""
The UnfoldedAhb contains one Prüfidentifikator.
Some columns in the AHB documents contain multiple information like Segmentname and Segmentgruppe.
This class unfolds these columns with multiple information.
Like this? e82d73a
return FlatAnwendungshandbuch(meta=meta, lines=lines)

def to_flatahb_json(self, output_directory_path: Path):
csv_output_directory_path / f"{self.meta_data.pruefidentifikator}.csv",
)

def to_xlsx(self, path_to_output_directory: Path):
table = doc.add_table(rows=1, cols=1)

body_cell = table.rows[0].cells[0]

# the cell comes with an empty paragraph which I could not delete.
# So we insert the BodyCellParagraph attributes into the empty paragraph
first_body_cell_paragprah: CellParagraph = body_cell_paragraphs[0]

body_cell.paragraphs[0].text = first_body_cell_paragprah.text

if first_body_cell_paragprah.tabstop_positions is not None:
for tabstop_position in first_body_cell_paragprah.tabstop_positions:
body_cell.paragraphs[0].paragraph_format.tab_stops.add_tab_stop(tabstop_position)

body_cell.paragraphs[0].paragraph_format.left_indent = first_body_cell_paragprah.left_indent_length
Hmm 🤔
I think this function is no longer needed.
Thanks to the hack that lets us "fix" the docx tables, we can now simply use docx files directly to generate test data.
table_meta_data.last_two_row_types[1] = table_meta_data.last_two_row_types[0]
table_meta_data.last_two_row_types[0] = current_row_type
I don't fully get it yet, though I understand the problem better than the solution. Let me summarize:
Starting point: subtables sometimes span a page break.
Problem: in that case the subtable header is simply repeated and acts as something like noise in our parsing logic?
Solution: we track the previous RowTypes and can thereby factor the effect of the page break back out?
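A minimal sketch of that reading, to make the question concrete. Only the two-slot shift mirrors the diff; the `RowType` members and the `is_continuation_after_page_break` check are hypothetical and may not match what kohlrahbi actually does.

```python
from enum import Enum
from typing import List


class RowType(Enum):
    """Hypothetical row types; the real enum in the PR may have other members."""

    HEADER = "header"
    SEGMENT = "segment"
    DATAELEMENT = "dataelement"
    EMPTY = "empty"


def remember_row_type(last_two_row_types: List[RowType], current_row_type: RowType) -> None:
    """Shift the history as in the diff: index 0 is the most recent row, index 1 the one before."""
    last_two_row_types[1] = last_two_row_types[0]
    last_two_row_types[0] = current_row_type


def is_continuation_after_page_break(last_two_row_types: List[RowType], current_row_type: RowType) -> bool:
    """Assumed rule: a repeated header sits between two rows of the same type."""
    return last_two_row_types[0] is RowType.HEADER and last_two_row_types[1] is current_row_type


history = [RowType.EMPTY, RowType.EMPTY]
remember_row_type(history, RowType.DATAELEMENT)  # last row before the page break
remember_row_type(history, RowType.HEADER)  # header repeated on the new page
continues_dataelement = is_continuation_after_page_break(history, RowType.DATAELEMENT)
continues_segment = is_continuation_after_page_break(history, RowType.SEGMENT)
```

Under this assumption, a row that follows a repeated header and has the same type as the row before the header would be treated as a continuation of that earlier row rather than as a new entry.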
This PR simplifies the main workflow of kohlrahbi.
There is now only one function to parse an AHB table row.
You can see that it's a net reduction in lines ;)