Rework of the whole workflow #53

hf-krechan · 2022-12-11T15:26:49Z

This PR simplifies the main workflow of kohlrahbi.
There is now only one function to parse an AHB table row.

As you can see, the change is a net reduction in lines ;)

sorry for this bad commit message

hf-kklein

For now I'll take your word for it that this new "coloring" approach works well, and even better than the previous debugging; still, I mourn the unit tests. Could we perhaps store one or two complete AHBs in the tests and then at least build high-level/integration tests?

src/kohlrahbi/enums/row_type_color.py

src/kohlrahbi/helper/read_functions.py

src/kohlrahbi/parser/bedingung_cell_parser.py

hf-kklein · 2022-12-12T06:25:24Z

src/kohlrahbi/parser/bedingung_cell_parser.py

+    row_index = dataframe.index.max()
+
+    bedingung = bedingung_cell.text.replace("\n", " ")
+    matches = re.findall(r"\[\d*\]", bedingung)


At first I thought the regex here would also have to cover packages (Pakete). That is not the case, though, because this is about the column that contains the definitions of the Bedingungen, and the packages are specified somewhere further up in the AHB.
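To make the point about the regex concrete, here is a minimal sketch of what the pattern from the diff matches. The sample strings are invented for illustration, and the `[1P]` form is only an assumed notation for a package reference, not something confirmed in this thread.

```python
import re

# Pattern taken from the diff above: square brackets around zero or more digits.
BEDINGUNG_PATTERN = r"\[\d*\]"

# Invented sample text of a Bedingung cell (not taken from a real AHB).
cell_text = "[12] Wenn SG4\nvorhanden\n[13] Nur wenn [12] erfuellt"

# Same preprocessing as in the diff: line breaks become spaces.
bedingung = cell_text.replace("\n", " ")
matches = re.findall(BEDINGUNG_PATTERN, bedingung)

# "\d*" also accepts empty brackets, while "\d+" would require at least one digit.
# The assumed package notation "[1P]" is matched by neither pattern.
edge_cases = "[] [1P] [7]"
with_star = re.findall(r"\[\d*\]", edge_cases)
with_plus = re.findall(r"\[\d+\]", edge_cases)

print(matches, with_star, with_plus)
```

If empty brackets should never count as a Bedingung key, `\d+` would be the stricter choice; whether that matters depends on the real documents.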

src/kohlrahbi/parser/bedingung_cell_parser.py

src/kohlrahbi/parser/middle_cell_parser.py

hf-kklein · 2022-12-12T06:31:19Z

src/kohlrahbi/parser/middle_cell_parser.py

+    left_indent_position: int,
+    indicator_tabstop_positions: List[int],
+) -> None:
+    """Parses a paragraph in the middle column and puts the information into the appropriate columns


Just from reading the code, I find it hard to understand right away what the "middle cell" is.

Understandable.
It refers to this column:

Can you think of a good name for it?

One could call it "body", to distinguish it from the header at the top and the... 🤔 "row description"? on the left?

I'll take that into the next PR.

src/kohlrahbi/parser/middle_cell_parser.py

hf-krechan

Thanks a lot for the comments.
I think they improved the code quite a bit :)

setup.cfg

hf-krechan · 2023-02-17T16:00:03Z

src/kohlrahbi/ahb/ahbsubtable.py

+    table: pd.DataFrame
+
+    @staticmethod
+    def _parse_docx_table(table_meta_data: Seed, ahb_table_dataframe: pd.DataFrame, docx_table: DocxTable):


Added in 33b6f3c

setup.cfg

src/kohlrahbi/ahb/ahbsubtable.py

src/kohlrahbi/row_type_checker.py

hf-krechan · 2023-02-17T18:29:52Z

src/kohlrahbi/unfoldedahb/unfoldedahbtable.py

+    """
+    The UnfoldedAhb contains one Prüfidentifikator.
+    Some columns in the AHB documents contain multiple information like Segmentname and Segmentgruppe.
+    This class unfolds these columns with multiple information.


Like this? e82d73a

hf-krechan · 2023-02-17T18:33:49Z

src/kohlrahbi/unfoldedahb/unfoldedahbtable.py

+
+        return FlatAnwendungshandbuch(meta=meta, lines=lines)
+
+    def to_flatahb_json(self, output_directory_path: Path):


hf-krechan · 2023-02-17T18:34:51Z

src/kohlrahbi/unfoldedahb/unfoldedahbtable.py

+            csv_output_directory_path / f"{self.meta_data.pruefidentifikator}.csv",
+        )
+
+    def to_xlsx(self, path_to_output_directory: Path):


hf-krechan · 2023-02-17T18:36:54Z

unittests/conftest.py

+        table = doc.add_table(rows=1, cols=1)
+
+        body_cell = table.rows[0].cells[0]
+
+        # the cell comes with an empty paragraph which I could not delete.
+        # So we insert the BodyCellParagraph attributes into the empty paragraph
+        first_body_cell_paragprah: CellParagraph = body_cell_paragraphs[0]
+
+        body_cell.paragraphs[0].text = first_body_cell_paragprah.text
+
+        if first_body_cell_paragprah.tabstop_positions is not None:
+            for tabstop_position in first_body_cell_paragprah.tabstop_positions:
+                body_cell.paragraphs[0].paragraph_format.tab_stops.add_tab_stop(tabstop_position)
+
+        body_cell.paragraphs[0].paragraph_format.left_indent = first_body_cell_paragprah.left_indent_length


Hmm 🤔
I don't think this function is needed anymore.
Thanks to the hack that lets us "fix" the docx tables, we can now simply use docx files directly to generate test data.

src/kohlrahbi/ahb/ahbsubtable.py

hf-kklein · 2023-02-18T08:01:08Z

src/kohlrahbi/ahb/ahbsubtable.py

+            table_meta_data.last_two_row_types[1] = table_meta_data.last_two_row_types[0]
+            table_meta_data.last_two_row_types[0] = current_row_type


I don't fully get it yet, although I understand the problem better than the solution. Let me summarize:
Starting point: Subtables sometimes span a page break.
Problem: In that case the header of the subtable is simply repeated and acts as a kind of noise in our read-out logic?
Solution: We track the previous RowTypes and can thereby factor out the effect of the page break again?
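The idea summarized above could be sketched roughly as follows. Only the two-element shift is taken from the diff; the `RowType` members, the helper names, and the detection rule are hypothetical and not necessarily how kohlrahbi decides it.

```python
from enum import Enum, auto
from typing import List


class RowType(Enum):
    # Hypothetical members, for illustration only.
    HEADER = auto()
    SEGMENT = auto()
    DATENELEMENT = auto()
    EMPTY = auto()


def remember_row_type(last_two_row_types: List[RowType], current_row_type: RowType) -> None:
    """Shift the history as in the diff: index 0 is the most recent row."""
    last_two_row_types[1] = last_two_row_types[0]
    last_two_row_types[0] = current_row_type


def continues_after_page_break(last_two_row_types: List[RowType], current_row_type: RowType) -> bool:
    """Assumed rule: a header sandwiched between two rows of the same type
    is treated as the repeated header that follows a page break."""
    return (
        last_two_row_types[0] is RowType.HEADER
        and last_two_row_types[1] is current_row_type
    )


# Invented row sequence: a Datenelement is interrupted by a repeated header.
rows = [RowType.SEGMENT, RowType.DATENELEMENT, RowType.HEADER, RowType.DATENELEMENT]
history = [RowType.EMPTY, RowType.EMPTY]
continued = []
for row_type in rows:
    continued.append(continues_after_page_break(history, row_type))
    remember_row_type(history, row_type)

print(continued)
```

Keeping exactly two entries is enough for this rule, because it only needs the row directly before the header and the header itself.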

src/kohlrahbi/harvester.py

src/kohlrahbi/row_type_checker.py

src/kohlrahbi/ahb/ahbsubtable.py

hf-krechan added 15 commits December 11, 2022 01:53

🚧 WIP

a8bf2f7

🎨 rename elixir to seed

841371b

🎨 improve naming of indicator tabstops

e15fa24

🎨 add append mode

7c8fa84

🚧 WIP for middle cell parser

eb508fb

🔥 Remove row_index from bedingung cell parser

1673e38

🎨 Reduce all df writer functions to just one and use append_mode

5760633

🎨 Add append mode

fdc049c

🥳 fundamental change in workflow worked!

5506b5d

✅ Update test

b3b31f7

🎨 rename elixir to seed

6cab26f

🔥 remove deprecated code

efb6222

✅ fix and comment out tests

3a757d5

🎨 Further improvements

5ee3a6a

sorry for this bad commit message

🔥 delete old lines

815ba8d

hf-krechan requested review from hf-kklein and hf-aschloegl December 12, 2022 06:00

hf-krechan marked this pull request as ready for review December 12, 2022 06:01

hf-krechan removed request for hf-kklein and hf-aschloegl December 12, 2022 06:01

hf-krechan marked this pull request as draft December 12, 2022 06:01

hf-kklein approved these changes Dec 12, 2022

hf-krechan and others added 8 commits December 12, 2022 07:43

💡 Add information how the colours are defined

fc39e4d

Co-authored-by: konstantin <konstantin.klein@hochfrequenz.de>

💡 Make clear which column is meant

b632f39

Co-authored-by: konstantin <konstantin.klein@hochfrequenz.de>

🎨 Improve regex to match Bedingungen

a1130e1

Co-authored-by: konstantin <konstantin.klein@hochfrequenz.de>

🏷 Add type hint for return value of count_matching

4ab3818

Co-authored-by: konstantin <konstantin.klein@hochfrequenz.de>

🎨 Improve code structure in if condition

84566a6

Co-authored-by: konstantin <konstantin.klein@hochfrequenz.de>

💡 Add example for middle_cell with multiple codes

b034fff

Merge remote-tracking branch 'origin/fix-page-break-in-dataelement' i…

b07fc76

…nto fix-page-break-in-dataelement

🎨 Improve code structure

1eac382

Co-authored-by: konstantin <konstantin.klein@hochfrequenz.de>

hf-krechan and others added 20 commits February 17, 2023 18:04

💡 Add information where to find the saved csv

eef94de

🔈 Add error log for edifact_format is None in to_xlsx

21542db

💡 Add information where to find xlsx file

f783a3b

🎨 Use attrs validators

c9ca684

💡 Add information when None is returned

7a71394

🎨 Make path machine independent

c497ac8

🧽 Apply clean code rule

5e78700

🚨🤫 Disable pylint warnings

a66908f

💡 Improve docstring

d5ea72c

🎨 Improve variable name by adding directory

5e070d0

🔥 Remove check_input_path function cause click will take care of it

2188d08

🎨 rename loaded_toml to state_of_kohlrahbi

214d15c

🔈 Add ahb_file_path to logging message

23ddcb8

Co-authored-by: konstantin <konstantin.klein@hochfrequenz.de>

💡 Add link to edi_energy_mirror

3755c4d

🔥 Remove deprecated pylint disable

b74bdce

💡 Add info when you get None

1247a03

🧽 Apply clean code rule

2fcd51b

💡 Try to improve docstring

e82d73a

🎨 Rename to dump and add information where the files are saved

f4e879c

🎨 same for xlsx

6692f48

hf-krechan commented Feb 17, 2023

hf-krechan changed the title from "WIP Rework of the whole workflow" to "Rework of the whole workflow" Feb 17, 2023

hf-kklein mentioned this pull request Feb 18, 2023

Don't let methods that have a get_... name modify the data #84

Closed

hf-kklein approved these changes Feb 18, 2023

🎨 Rename to sanitized_cells

1b7ed1a

hf-kklein reviewed Feb 18, 2023

src/kohlrahbi/ahb/ahbsubtable.py (outdated)

💡 Make it more clear why we remember the last two row types

d7b2340

Co-authored-by: konstantin <konstantin.klein@hochfrequenz.de>

hf-krechan merged commit d0f71ff into main Feb 19, 2023

hf-krechan deleted the fix-page-break-in-dataelement branch February 19, 2023 17:45

hf-krechan restored the fix-page-break-in-dataelement branch February 19, 2023 17:47
