diff --git a/checklist/checklist_sys.csv/overview.csv b/checklist/checklist_sys.csv/overview.csv new file mode 100644 index 0000000..5ba2a5f --- /dev/null +++ b/checklist/checklist_sys.csv/overview.csv @@ -0,0 +1,2 @@ +Title,Description +Checklist for Tests in Machine Learning Projects,This is a comprehensive checklist for evaluating the data and ML pipeline based on identified testing strategies from experts in the field. diff --git a/checklist/checklist_sys.csv/tests.csv b/checklist/checklist_sys.csv/tests.csv new file mode 100644 index 0000000..419b3cf --- /dev/null +++ b/checklist/checklist_sys.csv/tests.csv @@ -0,0 +1,9 @@ +ID,Topic,Title,Requirement,Explanation,References +2.1,Data Presence,Test Data Fetching and File Reading,"Verify that the data fetching API or data file reading functionality works correctly. Ensure that proper error handling is in place for scenarios such as missing files, incorrect file formats, and network errors.","Ensure that the code responsible for fetching or reading data can handle errors. This means if the file is missing, the format is wrong, or there's a network issue, the system should not crash but should provide a clear error message indicating the problem.",(general knowledge) +3.1,Data Quality,Validate Data Shape and Values,"Check that the data has the expected shape and that all values meet domain-specific constraints, such as non-negative distances.","Check that the data being used has the correct structure (like having the right number of columns) and that the values within the data make sense (e.g., distances should not be negative). This ensures that the data is valid and reliable for model training.","alexander2024Evaluating, ISO/IEC5259" +3.2,Data Quality,Check for Duplicate Records in Data,Check for duplicate records in the dataset and ensure that there are none.,"Ensure that the dataset does not contain duplicate entries, as these can skew the results and reduce the model's performance. The test should identify any repeated records so they can be removed or investigated.",ISO/IEC5259 +4.1,Data Ingestion,Verify Data Split Proportion,Check that the data is split into training and testing sets in the expected proportion.,"Confirm that the data is divided correctly into training and testing sets according to the intended ratio. This is crucial for ensuring that the model is trained and evaluated properly, with representative samples in each set.","openja2023studying, DBLP:conf/recsys/Kula15, singh2020mmf" +5.1,Model Fitting,Test Model Output Shape,Validate that the model's output has the expected shape.,"Ensure that the output from the model has the correct dimensions and structure. For example, in a classification task, if the model should output probabilities for each class, the test should verify that the output is an array with the correct dimensions. Ensuring the correct output shape helps prevent runtime errors and ensures consistency in how data is handled downstream.","openja2023studying, DBLP:conf/recsys/Kula15, singh2020mmf" +6.1,Model Evaluation,Verify Evaluation Metrics Implementation,Verify that the evaluation metrics are correctly implemented and appropriate for the model's task.,Confirm that the metrics used to evaluate the model are implemented correctly and are suitable for the specific task at hand. 
This helps in accurately assessing the model's performance and understanding its strengths and weaknesses.,"openja2023studying, DBLP:conf/recsys/Kula15, singh2020mmf" +6.2,Model Evaluation,Evaluate Model's Performance Against Thresholds,"Compute evaluation metrics for both the training and testing datasets and ensure that these metrics exceed predefined threshold values, indicating acceptable model performance.","This ensures that the model's performance meets or exceeds certain benchmarks. By setting thresholds for metrics like accuracy or precision, you can automatically flag models that underperform or overfit. This is crucial for maintaining a baseline quality of results and for ensuring that the model meets the requirements necessary for deployment.","openja2023studying, DBLP:conf/recsys/Kula15, singh2020mmf" +8.1,Data Quality (Optional),Validate Outlier Detection and Handling,Detect outliers in the dataset. Ensure that the outlier detection mechanism is sensitive enough to flag true outliers while ignoring minor anomalies.,The detection method should be precise enough to catch significant anomalies without being misled by minor variations. This is important for maintaining data quality and ensuring the model's reliability in certain projects.,ISO/IEC5259 diff --git a/checklist/checklist_sys.csv/topics.csv b/checklist/checklist_sys.csv/topics.csv new file mode 100644 index 0000000..c35d79a --- /dev/null +++ b/checklist/checklist_sys.csv/topics.csv @@ -0,0 +1,9 @@ +ID,Topic,Description +1,General,The following items describe best practices for all tests to be written. +2,Data Presence,"The following items describe tests that need to be done for testing the presence of data. This area of tests mainly concerns whether the reading and saving operations behave as expected, and ensures that any unexpected behavior is not passed over silently." +3,Data Quality,"The following items describe tests that need to be done for testing the quality of data. This area of tests mainly concerns whether the supplied data is in the expected format and how data containing null values or outliers is handled, so that the data processing pipeline is robust." +4,Data Ingestion,The following items describe tests that need to be done for testing whether the data is ingested properly. +5,Model Fitting,The following items describe tests that need to be done for testing the model fitting process. The unit tests written for this section usually mock model loading and model predictions, similar to mocking file access. +6,Model Evaluation,The following items describe tests that need to be done for testing the model evaluation process. +7,Artifact Testing,"The following items involve explicit checks for behaviors that we expect the artifacts (e.g., models, plots) to follow." +8,Data Quality (Optional),"The following items describe tests that need to be done for testing the quality of data, but they may not be applicable to all projects."
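To make the checklist items above concrete, here is a minimal pytest-style sketch of how item 4.1 (data split proportion) and item 5.1 (model output shape) might be covered. The scikit-learn workflow, data, and test names are illustrative assumptions only, not code from this repository.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def test_train_test_split_proportion():
    # Checklist item 4.1: the train/test split should follow the intended ratio.
    X, y = np.arange(100).reshape(50, 2), np.arange(50)
    X_train, X_test, _, _ = train_test_split(X, y, test_size=0.2, random_state=0)
    assert len(X_train) == 40
    assert len(X_test) == 10


def test_model_output_shape():
    # Checklist item 5.1: predicted class probabilities should be shaped
    # (n_samples, n_classes).
    rng = np.random.RandomState(0)
    X = rng.normal(size=(30, 4))
    y = np.array([0, 1, 2] * 10)  # three classes, all guaranteed present
    model = LogisticRegression(max_iter=200).fit(X, y)
    assert model.predict_proba(X).shape == (30, 3)
```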
diff --git a/checklist/references.bib b/checklist/references.bib index b5597da..1c66aac 100644 --- a/checklist/references.bib +++ b/checklist/references.bib @@ -73,3 +73,58 @@ @misc{ribeiro2020accuracy archiveprefix = {arXiv}, primaryclass = {cs.CL} } + +@misc{alexander2024Evaluating, + title = {Evaluating the Decency and Consistency of Data Validation Tests Generated by LLMs}, + author = {Rohan Alexander and Lindsay Katz and Callandra Moore and Michaela Drouillard and Michael Wing-Cheung Wong and Zane Schwartz}, + year = 2024, + eprint = {2310.01402v2}, + archiveprefix = {arXiv}, + primaryclass = {stat.ME} +} + +@misc{ISO/IEC5259, + title = {ISO/IEC DIS 5259 Artificial intelligence — Data quality for analytics and machine learning (ML)}, + author = {{ISO/IEC}}, + year = 2024, + month = {July}, + url = {https://www.iso.org/standard/81088.html} +} + +@misc{hynes2017, + title = {The Data Linter: Lightweight, Automated Sanity Checking for ML Data Sets}, + author = {Nick Hynes and D. Sculley and Michael Terry}, + year = 2017, + url = {http://learningsys.org/nips17/assets/papers/paper_19.pdf} +} + +@article{openja2023studying, + title = {Studying the Practices of Testing Machine Learning Software in the Wild}, + author = {Openja, Moses and Khomh, Foutse and Foundjem, Armstrong and Ming, Zhen and Abidi, Mouna and Hassan, Ahmed E and others}, + journal = {arXiv preprint arXiv:2312.12604}, + year = {2023} +} + +@inproceedings{DBLP:conf/recsys/Kula15, + author = {Maciej Kula}, + editor = {Toine Bogers and + Marijn Koolen}, + title = {Metadata Embeddings for User and Item Cold-start Recommendations}, + booktitle = {Proceedings of the 2nd Workshop on New Trends on Content-Based Recommender + Systems co-located with 9th {ACM} Conference on Recommender Systems + (RecSys 2015), Vienna, Austria, September 16-20, 2015.}, + series = {{CEUR} Workshop Proceedings}, + volume = {1448}, + pages = {14--21}, + publisher = {CEUR-WS.org}, + year = {2015}, + url = {http://ceur-ws.org/Vol-1448/paper4.pdf}, +} + +@misc{singh2020mmf, + author = {Singh, Amanpreet and Goswami, Vedanuj and Natarajan, Vivek and Jiang, Yu and Chen, Xinlei and Shah, Meet and + Rohrbach, Marcus and Batra, Dhruv and Parikh, Devi}, + title = {MMF: A multimodal framework for vision and language research}, + howpublished = {\url{https://github.com/facebookresearch/mmf}}, + year = {2020} +} diff --git a/src/test_creation/analyze.py b/src/test_creation/analyze.py index d8eb551..ceba852 100644 --- a/src/test_creation/analyze.py +++ b/src/test_creation/analyze.py @@ -113,7 +113,12 @@ def evaluate(self, verbose: bool = False) -> List[dict]: if __name__ == '__main__': -    def main(checklist_path, repo_path): +    def main(checklist_path, repo_path, report_output_path, report_output_format='html'): +        """ +        Example +        ------- +        $ python src/test_creation/analyze.py --checklist_path='./checklist/checklist_demo.csv' --repo_path='../lightfm/' --report_output_path='./report/evaluation_report.html' --report_output_format='html' +        """ llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0) checklist = Checklist(checklist_path, checklist_format=ChecklistFormat.CSV) extractor = PythonTestFileExtractor(Repository(repo_path)) @@ -122,6 +127,7 @@ def main(checklist_path, repo_path): response = evaluator.evaluate() parser = ResponseParser(response) -    parser.get_completeness_score() +    parser.get_completeness_score(verbose=True) +    parser.export_evaluation_report(report_output_path, report_output_format, exist_ok=True) fire.Fire(main) diff --git 
a/src/test_creation/checklist_export.py b/src/test_creation/checklist_export.py new file mode 100644 index 0000000..4f8ee7a --- /dev/null +++ b/src/test_creation/checklist_export.py @@ -0,0 +1,25 @@ +import fire + +from modules.checklist.checklist import Checklist, ChecklistFormat + + +def export_checklist(checklist_path: str): + """Example calls. To be removed later. + + Example: + python src/test_creation/modules/checklist/checklist.py ./checklist/test-dump-csv + + Note that the supplied path must be a directory containing 3 CSV files: + 1. `overview.csv` + 2. `topics.csv` + 3. `tests.csv` + """ + __package__ = '' + checklist = Checklist(checklist_path, checklist_format=ChecklistFormat.CSV) + print(checklist.as_markdown()) + checklist.export_html("checklist.html", exist_ok=True) + checklist.export_pdf("checklist.pdf", exist_ok=True) + + +if __name__ == "__main__": + fire.Fire(export_checklist) diff --git a/src/test_creation/demo_report_export.ipynb b/src/test_creation/demo_report_export.ipynb new file mode 100644 index 0000000..9a55f74 --- /dev/null +++ b/src/test_creation/demo_report_export.ipynb @@ -0,0 +1,177 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "669bb292-2b53-4a28-8d5f-ef6f3687f440", + "metadata": {}, + "source": [ + "## Evaluation Report Export Function Demo - For Development" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "d2c1ead7-9d5b-4414-80e2-07092ba180ca", + "metadata": {}, + "outputs": [], + "source": [ + "from analyze import *\n", + "from analyze import TestEvaluator\n", + "from modules.checklist.checklist import Checklist, ChecklistFormat\n", + "from modules.code_analyzer.repo import Repository\n", + "from modules.workflow.files import PythonTestFileExtractor, RepoFileExtractor\n", + "from modules.workflow.parse import ResponseParser\n", + "from langchain_openai import ChatOpenAI" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "ad0a59a9-185c-4f17-a0dd-fa2534958ecb", + "metadata": {}, + "outputs": [], + "source": [ + "repo_path = '../../../lightfm/'\n", + "checklist_path = '../../checklist/checklist_demo.csv'\n", + "report_output_path_html = '../../report/evaluation_report.html'\n", + "report_output_path_pdf = '../../report/evaluation_report.pdf'" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d717ba5d-dc9d-477d-a9db-ccb993f48f09", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:14<00:00, 7.37s/it]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Report:\n", + " Requirement \\\n", + "ID Title \n", + "1.1 Write Descriptive Test Names Each test function should have a clear, descri... \n", + "1.2 Keep Tests Focused Each test should focus on a single scenario, u... \n", + "2.1 Ensure Data File Loads as Expected Ensure that data-loading functions correctly l... \n", + "5.1 Validate Model Input and Output Compatibility Confirm that the model accepts inputs of the c... 
\n", + "\n", + " is_Satisfied \\\n", + "ID Title \n", + "1.1 Write Descriptive Test Names 1 \n", + "1.2 Keep Tests Focused 1 \n", + "2.1 Ensure Data File Loads as Expected 0 \n", + "5.1 Validate Model Input and Output Compatibility 0 \n", + "\n", + " n_files_tested \\\n", + "ID Title \n", + "1.1 Write Descriptive Test Names 2 \n", + "1.2 Keep Tests Focused 2 \n", + "2.1 Ensure Data File Loads as Expected 2 \n", + "5.1 Validate Model Input and Output Compatibility 2 \n", + "\n", + " Observations \\\n", + "ID Title \n", + "1.1 Write Descriptive Test Names [(test_cross_validation.py) The test function ... \n", + "1.2 Keep Tests Focused [(test_cross_validation.py) The test function ... \n", + "2.1 Ensure Data File Loads as Expected [(test_cross_validation.py) The code does not ... \n", + "5.1 Validate Model Input and Output Compatibility [(test_cross_validation.py) The code does not ... \n", + "\n", + " Function References \n", + "ID Title \n", + "1.1 Write Descriptive Test Names [{'File Path': '../../../lightfm/tests/test_cr... \n", + "1.2 Keep Tests Focused [{'File Path': '../../../lightfm/tests/test_cr... \n", + "2.1 Ensure Data File Loads as Expected [{'File Path': '../../../lightfm/tests/test_cr... \n", + "5.1 Validate Model Input and Output Compatibility [{'File Path': '../../../lightfm/tests/test_cr... \n", + "\n", + "Score: 2/4\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "text/plain": [ + "'2/4'" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n", + "checklist = Checklist(checklist_path, checklist_format=ChecklistFormat.CSV)\n", + "extractor = PythonTestFileExtractor(Repository(repo_path))\n", + "\n", + "evaluator = TestEvaluator(llm, extractor, checklist)\n", + "response = evaluator.evaluate()\n", + "\n", + "parser = ResponseParser(response)\n", + "parser.get_completeness_score(verbose=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "273db18c-13c4-4c86-a4c8-f42e0b0e37c5", + "metadata": {}, + "outputs": [], + "source": [ + "parser.export_evaluation_report(report_output_path_html, 'html', exist_ok=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "5a682a42-8807-48c6-9de4-0558838e3ccd", + "metadata": {}, + "outputs": [], + "source": [ + "parser.export_evaluation_report(report_output_path_pdf, 'pdf', exist_ok=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "07875448-9c58-4ec0-94b8-de9be8870011", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python [conda env:test-creation]", + "language": "python", + "name": "conda-env-test-creation-py" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/src/test_creation/modules/checklist/checklist.py b/src/test_creation/modules/checklist/checklist.py index 5d56e26..0b943fa 100644 --- a/src/test_creation/modules/checklist/checklist.py +++ b/src/test_creation/modules/checklist/checklist.py @@ -5,10 +5,10 @@ from typing import Union from abc import ABC, abstractmethod -import fire -import pypandoc from ruamel.yaml import YAML +from ..mixins import 
ExportableMixin + def filter_dict(d: dict, keys: list) -> dict: return {k: v for k, v in d.items() if k in keys} @@ -137,7 +137,7 @@ def write(cls, path: str, data: dict) -> None: cls._write_file(os.path.join(path, cls.tests_filename), tests, cls.tests_field_names_unnested) -class Checklist: +class Checklist(ExportableMixin): def __init__(self, checklist_path: str, checklist_format: ChecklistFormat): if not os.path.exists(checklist_path): raise FileNotFoundError("Checklist file not found.") @@ -179,82 +179,45 @@ def to_yaml(self, output_path: str, no_preserve_format: bool = False, exist_ok: "Roundtripping is not yet implemented. If you want to dump the YAML file disregarding the original " "formatting, use `no_preserve_format=True`." ) - self.__filedump_check(output_path, exist_ok) + self._filedump_check(output_path, exist_ok) YamlChecklistIO.write(output_path, self.content) def to_csv(self, output_path: str, exist_ok: bool = False): """Dump the checklist to a directory containing three separate CSV files.""" - self.__filedump_check(output_path, exist_ok) + self._filedump_check(output_path, exist_ok, expects_directory_if_exists=True) CsvChecklistIO.write(output_path, self.content) def as_markdown(self): - return self._get_md_representation(self.content, curr_level=1) - - def _get_md_representation(self, content: dict, curr_level: int): - repeated_col = [k for k, v in content.items() if isinstance(v, list)] - - # print out header for each item - md_repr = '#' * curr_level - if 'ID' in content.keys(): - md_repr += f" {content['ID']}" - if 'Title' in content.keys(): - md_repr += f" {content['Title']}\n\n" - elif 'Topic' in content.keys(): - md_repr += f" {content['Topic']}\n\n" - - # print out non-title, non-repeated items - for k, v in content.items(): - if k not in repeated_col and k not in ['Title', 'Topic', 'ID']: - md_repr += f'**{k}**: {v.replace("'", "\\'")}\n\n' - - # handle repeated columns and references - for k in repeated_col: - if k != 'References': - for item in content[k]: - md_repr += self._get_md_representation(item, curr_level=curr_level + 1) - else: - md_repr += '**References:**\n\n' + '\n'.join(f' - {item}' for item in content['References']) + '\n\n' - - return md_repr - - @staticmethod - def __filedump_check(output_path: str, exist_ok: bool): - if not exist_ok and os.path.exists(output_path): - raise FileExistsError("Output file already exists. Use `exist_ok=True` to overwrite.") - return True - - def export_html(self, output_path: str, exist_ok: bool = False): - self.__filedump_check(output_path, exist_ok) - pypandoc.convert_text(self.as_markdown(), 'html', format='md', outputfile=output_path) - - def export_pdf(self, output_path: str, exist_ok: bool = False): - self.__filedump_check(output_path, exist_ok) - pypandoc.convert_text(self.as_markdown(), 'pdf', format='md', outputfile=output_path, - extra_args=['--pdf-engine=tectonic']) - - def export_quarto(self, output_path: str, exist_ok: bool = False): - self.__filedump_check(output_path, exist_ok) - header = f'---\ntitle: "{self.content['Title']}"\nformat:\n html:\n code-fold: true\n---\n\n' - qmd_repr = header + self.as_markdown() - with open(output_path, "w", encoding="utf-8") as f: - f.write(qmd_repr) - - -if __name__ == "__main__": - def example(checklist_path: str): - """Example calls. To be removed later. - - Example: - python src/test_creation/modules/checklist/checklist.py ./checklist/test-dump-csv - - Note that the supplied path must be a directory containing 3 CSV files: - 1. `overview.csv` - 2. 
`topics.csv` - 3. `tests.csv` - """ -        checklist = Checklist(checklist_path, checklist_format=ChecklistFormat.CSV) -        print(checklist.as_markdown()) -        checklist.export_pdf("checklist.pdf", exist_ok=True) - - -    fire.Fire(example) +    def _get_md_representation(content: dict, curr_level: int): +        repeated_col = [k for k, v in content.items() if isinstance(v, list)] + +        # print out header for each item +        md_repr = '#' * curr_level +        if 'ID' in content.keys(): +            md_repr += f" {content['ID']}" +        if 'Title' in content.keys(): +            md_repr += f" {content['Title']}\n\n" +        elif 'Topic' in content.keys(): +            md_repr += f" {content['Topic']}\n\n" + +        # print out non-title, non-repeated items +        for k, v in content.items(): +            if k not in repeated_col and k not in ['Title', 'Topic', 'ID']: +                md_repr += f'**{k}**: {v}\n\n' + +        # handle repeated columns and references +        for k in repeated_col: +            if k != 'References': +                for item in content[k]: +                    md_repr += _get_md_representation(item, curr_level=curr_level + 1) +            else: +                md_repr += '**References:**\n\n' + '\n'.join( +                    f' - {item}' for item in content['References']) + '\n\n' + +            return md_repr + +        return _get_md_representation(self.content, curr_level=1) + +    def as_quarto_markdown(self): +        header = '---\ntitle: "{}"\nformat:\n html:\n code-fold: true\n---\n\n'.format(self.content['Title']) +        return header + self.as_markdown() diff --git a/src/test_creation/modules/mixins.py b/src/test_creation/modules/mixins.py new file mode 100644 index 0000000..6bc25a5 --- /dev/null +++ b/src/test_creation/modules/mixins.py @@ -0,0 +1,79 @@ +import os +from abc import ABC, abstractmethod + +import pypandoc + + +class WriteableMixin: +    """A mixin for classes that write content to the filesystem.""" +    def _filedump_check(self, output_path: str, exist_ok: bool, expects_directory_if_exists: bool = False): +        normalized_path = os.path.abspath(os.path.normpath(output_path)) +        dir_path = os.path.dirname(normalized_path) +        if not os.access(dir_path, os.W_OK): +            raise PermissionError(f"Write permission is not granted for the output path: {dir_path}") + +        if not exist_ok: +            if os.path.exists(normalized_path): +                raise FileExistsError("Output file already exists. Use `exist_ok=True` to overwrite.") +        elif os.path.exists(normalized_path): +            if expects_directory_if_exists and not os.path.isdir(normalized_path): +                raise NotADirectoryError("A non-directory object already exists at the path, but the write operation expects to overwrite a directory.") +            elif not expects_directory_if_exists and not os.path.isfile(normalized_path): +                raise IsADirectoryError("A non-file object already exists at the path, but the write operation expects to overwrite a file.") + +        if not os.access(normalized_path, os.W_OK): +            raise PermissionError(f"Write permission is not granted for the output path: {normalized_path}") +        return True + + +class ExportableMixin(WriteableMixin, ABC): +    """A mixin that provides functionality to export (dump) content as HTML/PDF/Quarto documents. + +    Extends WriteableMixin. + +    Relies on markdown representations of the object. +    The class including this mixin must have `.as_markdown()` and `.as_quarto_markdown()` implemented. 
+ """ + @abstractmethod + def as_markdown(self) -> str: + pass + + @abstractmethod + def as_quarto_markdown(self) -> str: + pass + + @staticmethod + def _escape_single_quotes(string: str) -> str: + return string.replace("'", "\\'") + + def __format_check(self, output_path, format): + formats = { + "pdf": ["pdf"], + "html": ["htm", "html"], + "qmd": ["qmd"] + } + + normalized_ext = output_path.split(".")[-1].lower() + if normalized_ext not in formats[format]: + raise ValueError(f"Output file path `{output_path}` does not meet expectation. When specifying `{format}` to be exported, please use one of the following extensions: {str(formats[format])}.") + + def _export_check(self, output_path: str, format: str, exist_ok: bool): + self._filedump_check(output_path, exist_ok) + self.__format_check(output_path, format) + + def export_html(self, output_path: str, exist_ok: bool = False): + self._export_check(output_path, format="html", exist_ok=exist_ok) + pypandoc.convert_text(self._escape_single_quotes(self.as_markdown()), 'html', format='md', + outputfile=output_path) + + def export_pdf(self, output_path: str, exist_ok: bool = False): + self._export_check(output_path, format="pdf", exist_ok=exist_ok) + self._filedump_check(output_path, exist_ok) + pypandoc.convert_text(self.as_markdown(), 'pdf', format='md', outputfile=output_path, + extra_args=['--pdf-engine=tectonic']) + + def export_quarto(self, output_path: str, exist_ok: bool = False): + self._export_check(output_path, format="qmd", exist_ok=exist_ok) + self._filedump_check(output_path, exist_ok) + with open(output_path, "w", encoding="utf-8") as f: + f.write(self.as_quarto_markdown()) diff --git a/src/test_creation/modules/workflow/parse.py b/src/test_creation/modules/workflow/parse.py index 9b38a96..d397da2 100644 --- a/src/test_creation/modules/workflow/parse.py +++ b/src/test_creation/modules/workflow/parse.py @@ -1,19 +1,32 @@ import pandas as pd +import os +from typing import Union +from ..mixins import ExportableMixin -class ResponseParser: + +class ResponseParser(ExportableMixin): def __init__(self, response): self.response = response self.evaluation_report = None - def get_completeness_score(self, score_format: str = 'fraction', verbose: bool = False) -> str: + def get_completeness_score(self, score_format: str = 'fraction', verbose: bool = False) -> Union[float, str]: + """ + Compute Evaluation Report and Completeness Score + """ report_df = pd.DataFrame(self.response)['report'].explode('report').apply(pd.Series) + report_df = report_df.rename(columns={"file": "File Path"}) + report_df['Function References'] = report_df[['File Path', 'Functions']].to_dict(orient='records') + report_df['Observation'] = '(' + report_df['File Path'].apply(lambda x: os.path.split(x)[-1]) + ') ' + \ + report_df['Observation'] report_df = report_df.groupby(['ID', 'Title']).agg({ + 'Requirement': ['max'], 'Score': ['max', 'count'], - 'Functions': ['sum'] + 'Observation': [list], + 'Function References': [list], }) - report_df.columns = ['is_Satisfied', 'n_files_tested', 'functions'] - self.evaluation_report = report_df + report_df.columns = ['Requirement', 'is_Satisfied', 'n_files_tested', 'Observations', 'Function References'] + self.evaluation_report = report_df.reset_index() if score_format == 'fraction': score = f"{report_df['is_Satisfied'].sum()}/{report_df['is_Satisfied'].count()}" @@ -27,3 +40,58 @@ def get_completeness_score(self, score_format: str = 'fraction', verbose: bool = print(f'Score: {score}') print() return score + + def 
as_markdown(self) -> str: +        def _get_md_representation(content: dict, curr_level: int): +            repeated_col = [k for k, v in content.items() if isinstance(v, list)] + +            # print out header for each item +            md_repr = '#' * curr_level +            if 'ID' in content.keys(): +                md_repr += f" {content['ID']}" +            if 'Title' in content.keys(): +                md_repr += f" {content['Title']}\n\n" +            elif 'Topic' in content.keys(): +                md_repr += f" {content['Topic']}\n\n" + +            # print out non-title, non-repeated items +            for k, v in content.items(): +                if k not in repeated_col and k not in ['Title', 'Topic', 'ID']: +                    md_repr += f'**{k}**: {v}\n\n' + +            # handle repeated columns and references +            point_form_col = ['References', 'Function References', 'Observations'] +            for k in repeated_col: +                if k not in point_form_col: +                    for item in content[k]: +                        md_repr += _get_md_representation(item, curr_level=curr_level + 1) +                else: +                    md_repr += f'**{k}:**\n\n' + '\n'.join(f' - {item}' for item in content[k]) + '\n\n' + +            return md_repr + +        score = self.get_completeness_score(score_format='fraction') +        summary_df = self.evaluation_report[['ID', 'Title', 'is_Satisfied', 'n_files_tested']] +        details = self.evaluation_report[['ID', 'Title', 'Requirement', 'Observations', 'Function References']].to_dict(orient='records') + +        export_content = dict() +        export_content['Title'] = 'Test Evaluation Report' +        export_content['Report Areas'] = [] +        export_content['Report Areas'].append({'Title': 'Summary', 'Completeness Score': score, 'Completeness Score per Checklist Item': '\n\n' + summary_df.to_markdown(index=False)}) +        export_content['Report Areas'].append({'Title': 'Details', 'Report Detail': details}) + +        return _get_md_representation(export_content, 1) + +    def as_quarto_markdown(self) -> str: +        header = '---\ntitle: "Test Evaluation Report"\nformat:\n html:\n code-fold: true\n---\n\n' +        return header + self.as_markdown() + +    def export_evaluation_report(self, output_path, format='html', exist_ok: bool = False): +        """ +        Export the test evaluation report as an HTML or PDF document. +        """ +        if format == 'html': +            self.export_html(output_path, exist_ok) +        elif format == 'pdf': +            self.export_pdf(output_path, exist_ok) +        else: +            raise ValueError(f"Unsupported format: {format}. Use 'html' or 'pdf'.") \ No newline at end of file
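For completeness, a minimal end-to-end usage sketch of the new report-export path, mirroring the demo notebook and the updated `main()` in `analyze.py`. The checklist, repository, and output paths are the illustrative ones used in the examples above, and a configured OpenAI API key is assumed.

```python
from langchain_openai import ChatOpenAI

from analyze import TestEvaluator
from modules.checklist.checklist import Checklist, ChecklistFormat
from modules.code_analyzer.repo import Repository
from modules.workflow.files import PythonTestFileExtractor
from modules.workflow.parse import ResponseParser

# Assemble the evaluator the same way analyze.py and the demo notebook do.
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
checklist = Checklist("./checklist/checklist_demo.csv", checklist_format=ChecklistFormat.CSV)
extractor = PythonTestFileExtractor(Repository("../lightfm/"))
evaluator = TestEvaluator(llm, extractor, checklist)

# Parse the LLM responses, print the completeness score, and export the reports.
parser = ResponseParser(evaluator.evaluate())
print(parser.get_completeness_score(score_format="fraction", verbose=True))  # e.g. "2/4"
parser.export_evaluation_report("./report/evaluation_report.html", "html", exist_ok=True)
parser.export_evaluation_report("./report/evaluation_report.pdf", "pdf", exist_ok=True)
```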