From 507d251ebf6c7afa7c31a238279fc33fe2deb754 Mon Sep 17 00:00:00 2001
From: Yingzi Jin <jinyz8888@gmail.com>
Date: Tue, 21 May 2024 14:39:33 -0700
Subject: [PATCH 01/17] add data presence and data quality checklist

---
 checklist/checklist.csv/tests.csv |  4 ++++
 checklist/references.bib          | 24 ++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/checklist/checklist.csv/tests.csv b/checklist/checklist.csv/tests.csv
index b9c23d5..e8f5e61 100644
--- a/checklist/checklist.csv/tests.csv
+++ b/checklist/checklist.csv/tests.csv
@@ -5,9 +5,13 @@ ID,Topic,Title,Requirement,Explanation,References
 1.4,General,Keep Cause and Effect Clear,Keep any modifications to objects and the corresponding assertions close together in your tests to maintain readability and clearly show the cause-and-effect relationship.,Refrain from using large global test data structures shared across multiple unit tests. This will allow for clear identification of each test's setup and the cause and effect.,yu2017
 2.1,Data Presence,Ensure Data File Loads as Expected,"Ensure that data-loading functions correctly load files when they exist and match the expected format, handle non-existent files appropriately, and return the expected results.","Reading data is a common scenario encountered in ML projects.  This item ensures that the data exists and can be loaded with expected format, and gracefully exit when unable to load the data.",msise2023
 2.2,Data Presence,Ensure Saving Data/Figures Function Works as Expected,"Verify that functions for saving data and figures perform write operations correctly, checking that the operation succeeds and the content matches the expected format.",Writing operations create artifacts at different stages of the analysis. Making sure the artifacts are created as expected ensures that the artifacts we obtained at the end of the analysis would be consistent and reproducible.,msise2023
+2.3,Data Presence,Error handling exists for file operations,Validate error handling for file operations (e.g., file reading / writing),Catch exceptions that could be raised during the reading / writing process and provide meaningful error messages,common practice
 3.1,Data Quality,Files Contain Data,Ensure all data files are non-empty and contain the necessary data required for further analysis or processing tasks.,This checklist item is crucial as it confirms the presence of usable data within the files. It prevents errors in later stages of the project by ensuring data is available from the start.,msise2023
 3.2,Data Quality,Data in the Expected Format,Verify that the data to be ingested matches the format expected by processing algorithms (like pd.DataFrame for CSVs or np.array for images) and adheres to the expected schema.,"Ensuring that data and images are in the correct format is essential for compatibility with processing tools and algorithms, which may not handle unexpected formats gracefully.",msise2023
 3.3,Data Quality,Data Does Not Contain Null Values or Outliers,Check that data files are free from unexpected null values and identify any outliers that could affect the analysis. Tests should explicitly state if null values are part of expected data.,"Null values can lead to errors or inaccurate computations in many data processing applications, while outliers can distort statistical analyses and models. As such, these values should be checked when before the data is being ingested.",msise2023
+3.4,Data Quality,Validate data accuracy and ensure data values meets expectations,Validating data against expected values ensures that it conforms to defined constraints, which helps in preventing errors and inconsistencies,Validate syntactic accuracy (closeness to syntactically correct values) and semantic accuracy (closeness to semantically correct values),"alexander2024Evaluating,ISO/IEC5259"
+3.5,Data Quality,Check for duplicate records,Verify there's no duplicate records in the dataset,Removing duplicates is essential to ensure the accuracy of analysis and avoid bias in machine learning models,ISO/IEC5259
+3.6,Data Quality,Validate outliers detection and handling,Ensure that the outlier detection mechanism is sensitive enough to flag true outliers while ignoring minor anomalies,Effective outlier detection helps in maintaining data quality by identifying and handling true outliers that can affect analysis and model performance,ISO/IEC5259
 4.1,Data Ingestion,Cleaning and Transformation Functions Work as Expected,"Test that a fixed input to a function or model produces the expected output, focusing on one verification per test to ensure predictable behavior.",Fixed input and output during the data cleaning and transformation routines should be tested so that no unexpected transformation is introduced during these steps.,msise2023
 5.1,Model Fitting,Validate Model Input and Output Compatibility,Confirm that the model accepts inputs of the correct shapes and types and produces outputs that meet the expected shapes and types without any errors.,Ensuring that inputs and outputs conform to expected specifications is critical for the correct functioning of the model in a production environment.,msise2023
 5.2,Model Fitting,Check Model is Learning During Fit,"For parametric models, ensure that the model's weights update correctly per training iteration. For non-parametric models, verify that the data fits correctly into the model.",Making sure the training process is indeed training the model is crucial as model without training is not fitted to any data and the performance would suffer.,msise2023
diff --git a/checklist/references.bib b/checklist/references.bib
index b5597da..2402e4d 100644
--- a/checklist/references.bib
+++ b/checklist/references.bib
@@ -73,3 +73,27 @@ @misc{ribeiro2020accuracy
 	archiveprefix = {arXiv},
 	primaryclass = {cs.CL}
 }
+
+@misc{alexander2024Evaluating,
+	title        = {Evaluating the Decency and Consistency of Data Validation Tests Generated by LLMs},
+	author       = {Rohan Alexander and Lindsay Katz and Callandra Moore and Michaela Drouillard and Michael Wing-Cheung Wong and Zane Schwartz},
+	year         = 2024,
+	eprint       = {2310.01402v2},
+	archiveprefix = {arXiv},
+	primaryclass = {stat.ME}
+}
+
+@misc{ISO/IEC5259,
+	title        = {ISO/IEC DIS 5259 Artificial intelligence — Data quality for analytics and machine learning (ML)},
+	author       = {ICS},
+	year         = 2024,
+	month        = {July},
+	url          = {https://www.iso.org/standard/81088.html}
+}
+
+@misc{hynes2017,
+	title        = {The Data Linter: Lightweight, Automated Sanity Checking for ML Data Sets},
+	author       = {Nick Hynes and D. Sculley and Michael Terry},
+	year         = 2017,
+	url          = {http://learningsys.org/nips17/assets/papers/paper_19.pdf}
+}

From 689ba36bda7ceb3f876a1a08e4719ffaf13961c8 Mon Sep 17 00:00:00 2001
From: John Shiu <asbjchk.academic@gmail.com>
Date: Tue, 21 May 2024 18:50:21 -0700
Subject: [PATCH 02/17] undo the changes in checklist.csv

---
 checklist/checklist.csv/tests.csv | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/checklist/checklist.csv/tests.csv b/checklist/checklist.csv/tests.csv
index e8f5e61..b9c23d5 100644
--- a/checklist/checklist.csv/tests.csv
+++ b/checklist/checklist.csv/tests.csv
@@ -5,13 +5,9 @@ ID,Topic,Title,Requirement,Explanation,References
 1.4,General,Keep Cause and Effect Clear,Keep any modifications to objects and the corresponding assertions close together in your tests to maintain readability and clearly show the cause-and-effect relationship.,Refrain from using large global test data structures shared across multiple unit tests. This will allow for clear identification of each test's setup and the cause and effect.,yu2017
 2.1,Data Presence,Ensure Data File Loads as Expected,"Ensure that data-loading functions correctly load files when they exist and match the expected format, handle non-existent files appropriately, and return the expected results.","Reading data is a common scenario encountered in ML projects.  This item ensures that the data exists and can be loaded with expected format, and gracefully exit when unable to load the data.",msise2023
 2.2,Data Presence,Ensure Saving Data/Figures Function Works as Expected,"Verify that functions for saving data and figures perform write operations correctly, checking that the operation succeeds and the content matches the expected format.",Writing operations create artifacts at different stages of the analysis. Making sure the artifacts are created as expected ensures that the artifacts we obtained at the end of the analysis would be consistent and reproducible.,msise2023
-2.3,Data Presence,Error handling exists for file operations,Validate error handling for file operations (e.g., file reading / writing),Catch exceptions that could be raised during the reading / writing process and provide meaningful error messages,common practice
 3.1,Data Quality,Files Contain Data,Ensure all data files are non-empty and contain the necessary data required for further analysis or processing tasks.,This checklist item is crucial as it confirms the presence of usable data within the files. It prevents errors in later stages of the project by ensuring data is available from the start.,msise2023
 3.2,Data Quality,Data in the Expected Format,Verify that the data to be ingested matches the format expected by processing algorithms (like pd.DataFrame for CSVs or np.array for images) and adheres to the expected schema.,"Ensuring that data and images are in the correct format is essential for compatibility with processing tools and algorithms, which may not handle unexpected formats gracefully.",msise2023
 3.3,Data Quality,Data Does Not Contain Null Values or Outliers,Check that data files are free from unexpected null values and identify any outliers that could affect the analysis. Tests should explicitly state if null values are part of expected data.,"Null values can lead to errors or inaccurate computations in many data processing applications, while outliers can distort statistical analyses and models. As such, these values should be checked when before the data is being ingested.",msise2023
-3.4,Data Quality,Validate data accuracy and ensure data values meets expectations,Validating data against expected values ensures that it conforms to defined constraints, which helps in preventing errors and inconsistencies,Validate syntactic accuracy (closeness to syntactically correct values) and semantic accuracy (closeness to semantically correct values),"alexander2024Evaluating,ISO/IEC5259"
-3.5,Data Quality,Check for duplicate records,Verify there's no duplicate records in the dataset,Removing duplicates is essential to ensure the accuracy of analysis and avoid bias in machine learning models,ISO/IEC5259
-3.6,Data Quality,Validate outliers detection and handling,Ensure that the outlier detection mechanism is sensitive enough to flag true outliers while ignoring minor anomalies,Effective outlier detection helps in maintaining data quality by identifying and handling true outliers that can affect analysis and model performance,ISO/IEC5259
 4.1,Data Ingestion,Cleaning and Transformation Functions Work as Expected,"Test that a fixed input to a function or model produces the expected output, focusing on one verification per test to ensure predictable behavior.",Fixed input and output during the data cleaning and transformation routines should be tested so that no unexpected transformation is introduced during these steps.,msise2023
 5.1,Model Fitting,Validate Model Input and Output Compatibility,Confirm that the model accepts inputs of the correct shapes and types and produces outputs that meet the expected shapes and types without any errors.,Ensuring that inputs and outputs conform to expected specifications is critical for the correct functioning of the model in a production environment.,msise2023
 5.2,Model Fitting,Check Model is Learning During Fit,"For parametric models, ensure that the model's weights update correctly per training iteration. For non-parametric models, verify that the data fits correctly into the model.",Making sure the training process is indeed training the model is crucial as model without training is not fitted to any data and the performance would suffer.,msise2023

From 312c5a09f4f117229729a69f866ea9c8c5deb883 Mon Sep 17 00:00:00 2001
From: John Shiu <asbjchk.academic@gmail.com>
Date: Tue, 21 May 2024 19:22:17 -0700
Subject: [PATCH 03/17] docs: prepare checklist for system development

---
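Note (not part of the commit): a minimal sketch of how items 3.1 and 3.2
of the new system checklist might be exercised in a test. It uses only
the Python standard library. The sample data, the column names
(`trip_id`, `distance_km`), and the helper names are assumptions made
for illustration; nothing here comes from the project's code.

```python
import csv
import io

# Hypothetical sample data; the column names are assumptions for illustration.
SAMPLE = """trip_id,distance_km
1,3.2
2,0.0
3,12.5
"""

EXPECTED_COLUMNS = ["trip_id", "distance_km"]


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def check_shape_and_values(rows):
    """Item 3.1: expected columns are present and distances are non-negative."""
    return all(
        list(row.keys()) == EXPECTED_COLUMNS and float(row["distance_km"]) >= 0
        for row in rows
    )


def find_duplicates(rows):
    """Item 3.2: return every record that repeats an earlier one."""
    seen, duplicates = set(), []
    for row in rows:
        key = tuple(row.items())
        if key in seen:
            duplicates.append(row)
        seen.add(key)
    return duplicates


rows = load_rows(SAMPLE)
assert check_shape_and_values(rows)
assert find_duplicates(rows) == []
```

Returning the duplicated records, rather than a bare boolean, matches the
checklist's wording that repeated records should be identified so they
can be removed or investigated.
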
 checklist/checklist_sys.csv/overview.csv |  2 ++
 checklist/checklist_sys.csv/tests.csv    |  9 +++++++
 checklist/checklist_sys.csv/topics.csv   |  8 ++++++
 checklist/references.bib                 | 31 ++++++++++++++++++++++++
 4 files changed, 50 insertions(+)
 create mode 100644 checklist/checklist_sys.csv/overview.csv
 create mode 100644 checklist/checklist_sys.csv/tests.csv
 create mode 100644 checklist/checklist_sys.csv/topics.csv

diff --git a/checklist/checklist_sys.csv/overview.csv b/checklist/checklist_sys.csv/overview.csv
new file mode 100644
index 0000000..5ba2a5f
--- /dev/null
+++ b/checklist/checklist_sys.csv/overview.csv
@@ -0,0 +1,2 @@
+Title,Description
+Checklist for Tests in Machine Learning Projects,This is a comprehensive checklist for evaluating the data and ML pipeline based on identified testing strategies from experts in the field.
diff --git a/checklist/checklist_sys.csv/tests.csv b/checklist/checklist_sys.csv/tests.csv
new file mode 100644
index 0000000..4fb8838
--- /dev/null
+++ b/checklist/checklist_sys.csv/tests.csv
@@ -0,0 +1,9 @@
+ID,Topic,Title,Requirement,Explanation,References
+2.1,Data Presence,Test Data Fetching and File Reading,"Verify that the data fetching API or data file reading functionality works correctly. Ensure that proper error handling is in place for scenarios such as missing files, incorrect file formats, and network errors.","Ensure that the code responsible for fetching or reading data can handle errors. This means if the file is missing, the format is wrong, or there's a network issue, the system should not crash but should provide a clear error message indicating the problem.",(general knowledge)
+3.1,Data Quality,Validate Data Shape and Values,"Check that the data has the expected shape and that all values meet domain-specific constraints, such as non-negative distances.","Check that the data being used has the correct structure (like having the right number of columns) and that the values within the data make sense (e.g., distances should not be negative). This ensures that the data is valid and reliable for model training.","alexander2024Evaluating, ISO/IEC5259"
+3.2,Data Quality,Check for Duplicate Records in Data,Check for duplicate records in the dataset and ensure that there are none.,"Ensure that the dataset does not contain duplicate entries, as these can skew the results and reduce the model�s performance. The test should identify any repeated records so they can be removed or investigated.",ISO/IEC5259
+4.1,Data Ingestion,Verify Data Split Proportion,Check that the data is split into training and testing sets in the expected proportion.,"Confirm that the data is divided correctly into training and testing sets according to the intended ratio. This is crucial for ensuring that the model is trained and evaluated properly, with representative samples in each set.","openja2023studying, DBLP:conf/recsys/Kula15, singh2020mmf"
+5.1,Model Fitting,Test Model Output Shape,Validate that the model's output has the expected shape.,"Ensure that the output from the model has the correct dimensions and structure. For example, in a classification task, if the model should output probabilities for each class, the test should verify that the output is an array with the correct dimensions. Ensuring the correct output shape helps prevent runtime errors and ensures consistency in how data is handled downstream.","openja2023studying, DBLP:conf/recsys/Kula15, singh2020mmf"
+6.1,Model Evaluation,Verify Evaluation Metrics Implementation,Verify that the evaluation metrics are correctly implemented and appropriate for the model's task.,Confirm that the metrics used to evaluate the model are implemented correctly and are suitable for the specific task at hand. This helps in accurately assessing the model�s performance and understanding its strengths and weaknesses.,"openja2023studying, DBLP:conf/recsys/Kula15, singh2020mmf"
+6.2,Model Evaluation,Evaluate Model's Performance Against Thresholds,"Compute evaluation metrics for both the training and testing datasets and ensure that these metrics exceed predefined threshold values, indicating acceptable model performance.","This ensures that the model's performance meets or exceeds certain benchmarks. By setting thresholds for metrics like accuracy or precision, you can automatically flag models that underperform or overfit. This is crucial for maintaining a baseline quality of results and for ensuring that the model meets the requirements necessary for deployment.","openja2023studying, DBLP:conf/recsys/Kula15, singh2020mmf"
+8.1,Data Quality optional,Validate Outliers Detection and Handling,Detect outliers in the dataset. Ensure that the outlier detection mechanism is sensitive enough to flag true outliers while ignoring minor anomalies.,The detection method should be precise enough to catch significant anomalies without being misled by minor variations. This is important for maintaining data quality and ensuring the model�s reliability in certain projects.,ISO/IEC5259
\ No newline at end of file
diff --git a/checklist/checklist_sys.csv/topics.csv b/checklist/checklist_sys.csv/topics.csv
new file mode 100644
index 0000000..3c93aec
--- /dev/null
+++ b/checklist/checklist_sys.csv/topics.csv
@@ -0,0 +1,8 @@
+ID,Topic,Description
+1,General,The following items describe best practices for all tests to be written.
+2,Data Presence,"The following items describe tests that need to be done for testing the presence of data. This area of tests mainly concerns whether the reading and saving operations behave as expected, so that unexpected behavior does not pass silently."
+3,Data Quality,"The following items describe tests that need to be done for testing the quality of data. This area of tests mainly concerns whether the supplied data is in the expected format and whether it contains null values or outliers, to make sure that the data processing pipeline is robust."
+4,Data Ingestion,The following items describe tests that need to be done for testing if the data is ingested properly.
+5,Model Fitting,The following items describe tests that need to be done for testing the model fitting process. The unit tests written for this section usually mock model load and model predictions similarly to mocking file access.
+6,Model Evaluation,The following items describe tests that need to be done for testing the model evaluation process.
+7,Artifact Testing,"The following items involves explicit checks for behaviors that we expect the artifacts e.g. models, plots, etc., to follow."
diff --git a/checklist/references.bib b/checklist/references.bib
index 2402e4d..1c66aac 100644
--- a/checklist/references.bib
+++ b/checklist/references.bib
@@ -97,3 +97,34 @@ @misc{hynes2017
 	year         = 2017,
 	url          = {http://learningsys.org/nips17/assets/papers/paper_19.pdf}
 }
+
+@article{openja2023studying,
+    title        = {Studying the Practices of Testing Machine Learning Software in the Wild},
+    author       = {Openja, Moses and Khomh, Foutse and Foundjem, Armstrong and Ming, Zhen and Abidi, Mouna and Hassan, Ahmed E and others},
+    journal      = {arXiv preprint arXiv:2312.12604},
+    year         = {2023}
+}
+
+@inproceedings{DBLP:conf/recsys/Kula15,
+  author    = {Maciej Kula},
+  editor    = {Toine Bogers and
+               Marijn Koolen},
+  title     = {Metadata Embeddings for User and Item Cold-start Recommendations},
+  booktitle = {Proceedings of the 2nd Workshop on New Trends on Content-Based Recommender
+               Systems co-located with 9th {ACM} Conference on Recommender Systems
+               (RecSys 2015), Vienna, Austria, September 16-20, 2015.},
+  series    = {{CEUR} Workshop Proceedings},
+  volume    = {1448},
+  pages     = {14--21},
+  publisher = {CEUR-WS.org},
+  year      = {2015},
+  url       = {http://ceur-ws.org/Vol-1448/paper4.pdf},
+}
+
+@misc{singh2020mmf,
+  author =       {Singh, Amanpreet and Goswami, Vedanuj and Natarajan, Vivek and Jiang, Yu and Chen, Xinlei and Shah, Meet and
+                 Rohrbach, Marcus and Batra, Dhruv and Parikh, Devi},
+  title =        {MMF: A multimodal framework for vision and language research},
+  howpublished = {\url{https://github.com/facebookresearch/mmf}},
+  year =         {2020}
+}

From e936aa2768e98a1f3fb089b209aa915694ad6a58 Mon Sep 17 00:00:00 2001
From: John Shiu <asbjchk.academic@gmail.com>
Date: Tue, 21 May 2024 19:33:29 -0700
Subject: [PATCH 04/17] added topic for optional Data Quality

---
 checklist/checklist_sys.csv/topics.csv | 1 +
 1 file changed, 1 insertion(+)

diff --git a/checklist/checklist_sys.csv/topics.csv b/checklist/checklist_sys.csv/topics.csv
index 3c93aec..9e51edf 100644
--- a/checklist/checklist_sys.csv/topics.csv
+++ b/checklist/checklist_sys.csv/topics.csv
@@ -6,3 +6,4 @@ ID,Topic,Description
 5,Model Fitting,The following items describe tests that need to be done for testing the model fitting process. The unit tests written for this section usually mock model load and model predictions similarly to mocking file access.
 6,Model Evaluation,The following items describe tests that need to be done for testing the model evaluation process.
 7,Artifact Testing,"The following items involves explicit checks for behaviors that we expect the artifacts e.g. models, plots, etc., to follow."
+8,Data Quality optional,"The following items describe tests that need to be done for testing the quality of data, but they may not be applicable to all projects."
\ No newline at end of file

From f18fcc976a0bb3f731d28863acca04b495cb352f Mon Sep 17 00:00:00 2001
From: John Shiu <asbjchk.academic@gmail.com>
Date: Tue, 21 May 2024 19:58:49 -0700
Subject: [PATCH 05/17] updated example in checklist.py

---
 src/test_creation/modules/checklist/checklist.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/test_creation/modules/checklist/checklist.py b/src/test_creation/modules/checklist/checklist.py
index e0871d4..d60e532 100644
--- a/src/test_creation/modules/checklist/checklist.py
+++ b/src/test_creation/modules/checklist/checklist.py
@@ -244,7 +244,7 @@ def example(checklist_path: str):
         """Example calls. To be removed later.
 
         Example:
-        python src/checklist/checklist.py ./test-dump-csv/
+        python src/test_creation/modules/checklist/checklist.py ./checklist/test-dump-csv/
 
         Note that the supplied path must be a directory containing 3 CSV files:
         1. `overview.csv`
@@ -253,7 +253,7 @@ def example(checklist_path: str):
         """
         checklist = Checklist(checklist_path, checklist_format=ChecklistFormat.CSV)
         print(checklist.as_markdown())
-        checklist.export_pdf("checklist.pdf", exist_ok=True)
+        checklist.export_html("checklist.html", exist_ok=True)
 
 
     fire.Fire(example)

From dd37cbb079c9fa6fb73f05105dc6ca135d0b5470 Mon Sep 17 00:00:00 2001
From: John Shiu <asbjchk.academic@gmail.com>
Date: Tue, 21 May 2024 19:59:42 -0700
Subject: [PATCH 06/17] fix symbol typo in checklist_sys

---
 checklist/checklist_sys.csv/tests.csv | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/checklist/checklist_sys.csv/tests.csv b/checklist/checklist_sys.csv/tests.csv
index 4fb8838..419b3cf 100644
--- a/checklist/checklist_sys.csv/tests.csv
+++ b/checklist/checklist_sys.csv/tests.csv
@@ -1,9 +1,9 @@
 ID,Topic,Title,Requirement,Explanation,References
 2.1,Data Presence,Test Data Fetching and File Reading,"Verify that the data fetching API or data file reading functionality works correctly. Ensure that proper error handling is in place for scenarios such as missing files, incorrect file formats, and network errors.","Ensure that the code responsible for fetching or reading data can handle errors. This means if the file is missing, the format is wrong, or there's a network issue, the system should not crash but should provide a clear error message indicating the problem.",(general knowledge)
 3.1,Data Quality,Validate Data Shape and Values,"Check that the data has the expected shape and that all values meet domain-specific constraints, such as non-negative distances.","Check that the data being used has the correct structure (like having the right number of columns) and that the values within the data make sense (e.g., distances should not be negative). This ensures that the data is valid and reliable for model training.","alexander2024Evaluating, ISO/IEC5259"
-3.2,Data Quality,Check for Duplicate Records in Data,Check for duplicate records in the dataset and ensure that there are none.,"Ensure that the dataset does not contain duplicate entries, as these can skew the results and reduce the model�s performance. The test should identify any repeated records so they can be removed or investigated.",ISO/IEC5259
+3.2,Data Quality,Check for Duplicate Records in Data,Check for duplicate records in the dataset and ensure that there are none.,"Ensure that the dataset does not contain duplicate entries, as these can skew the results and reduce the model's performance. The test should identify any repeated records so they can be removed or investigated.",ISO/IEC5259
 4.1,Data Ingestion,Verify Data Split Proportion,Check that the data is split into training and testing sets in the expected proportion.,"Confirm that the data is divided correctly into training and testing sets according to the intended ratio. This is crucial for ensuring that the model is trained and evaluated properly, with representative samples in each set.","openja2023studying, DBLP:conf/recsys/Kula15, singh2020mmf"
 5.1,Model Fitting,Test Model Output Shape,Validate that the model's output has the expected shape.,"Ensure that the output from the model has the correct dimensions and structure. For example, in a classification task, if the model should output probabilities for each class, the test should verify that the output is an array with the correct dimensions. Ensuring the correct output shape helps prevent runtime errors and ensures consistency in how data is handled downstream.","openja2023studying, DBLP:conf/recsys/Kula15, singh2020mmf"
-6.1,Model Evaluation,Verify Evaluation Metrics Implementation,Verify that the evaluation metrics are correctly implemented and appropriate for the model's task.,Confirm that the metrics used to evaluate the model are implemented correctly and are suitable for the specific task at hand. This helps in accurately assessing the model�s performance and understanding its strengths and weaknesses.,"openja2023studying, DBLP:conf/recsys/Kula15, singh2020mmf"
+6.1,Model Evaluation,Verify Evaluation Metrics Implementation,Verify that the evaluation metrics are correctly implemented and appropriate for the model's task.,Confirm that the metrics used to evaluate the model are implemented correctly and are suitable for the specific task at hand. This helps in accurately assessing the model's performance and understanding its strengths and weaknesses.,"openja2023studying, DBLP:conf/recsys/Kula15, singh2020mmf"
 6.2,Model Evaluation,Evaluate Model's Performance Against Thresholds,"Compute evaluation metrics for both the training and testing datasets and ensure that these metrics exceed predefined threshold values, indicating acceptable model performance.","This ensures that the model's performance meets or exceeds certain benchmarks. By setting thresholds for metrics like accuracy or precision, you can automatically flag models that underperform or overfit. This is crucial for maintaining a baseline quality of results and for ensuring that the model meets the requirements necessary for deployment.","openja2023studying, DBLP:conf/recsys/Kula15, singh2020mmf"
-8.1,Data Quality optional,Validate Outliers Detection and Handling,Detect outliers in the dataset. Ensure that the outlier detection mechanism is sensitive enough to flag true outliers while ignoring minor anomalies.,The detection method should be precise enough to catch significant anomalies without being misled by minor variations. This is important for maintaining data quality and ensuring the model�s reliability in certain projects.,ISO/IEC5259
\ No newline at end of file
+8.1,Data Quality (Optional),Validate Outliers Detection and Handling,Detect outliers in the dataset. Ensure that the outlier detection mechanism is sensitive enough to flag true outliers while ignoring minor anomalies.,The detection method should be precise enough to catch significant anomalies without being misled by minor variations. This is important for maintaining data quality and ensuring the model's reliability in certain projects.,ISO/IEC5259

From 96c705e435ff376b10f22d329f94c5302c83b0fa Mon Sep 17 00:00:00 2001
From: John Shiu <asbjchk.academic@gmail.com>
Date: Tue, 21 May 2024 20:00:34 -0700
Subject: [PATCH 07/17] renamed topic 8 to Data Quality (Optional)

---
 checklist/checklist_sys.csv/topics.csv | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/checklist/checklist_sys.csv/topics.csv b/checklist/checklist_sys.csv/topics.csv
index 9e51edf..c35d79a 100644
--- a/checklist/checklist_sys.csv/topics.csv
+++ b/checklist/checklist_sys.csv/topics.csv
@@ -6,4 +6,4 @@ ID,Topic,Description
 5,Model Fitting,The following items describe tests that need to be done for testing the model fitting process. The unit tests written for this section usually mock model load and model predictions similarly to mocking file access.
 6,Model Evaluation,The following items describe tests that need to be done for testing the model evaluation process.
 7,Artifact Testing,"The following items involves explicit checks for behaviors that we expect the artifacts e.g. models, plots, etc., to follow."
-8,Data Quality optional,"The following items describe tests that need to be done for testing the quality of data, but they may not be applicable to all projects."
\ No newline at end of file
+8,Data Quality (Optional),"The following items describe tests that need to be done for testing the quality of data, but they may not be applicable to all projects."

From bc6ab8a79ebedad090c615c5bcbc29ad27408466 Mon Sep 17 00:00:00 2001
From: John Shiu <asbjchk.academic@gmail.com>
Date: Tue, 21 May 2024 20:00:59 -0700
Subject: [PATCH 08/17] added checklist html for visualization

---
 checklist/checklist_sys.html | 155 +++++++++++++++++++++++++++++++++++
 1 file changed, 155 insertions(+)
 create mode 100644 checklist/checklist_sys.html

diff --git a/checklist/checklist_sys.html b/checklist/checklist_sys.html
new file mode 100644
index 0000000..c0c174b
--- /dev/null
+++ b/checklist/checklist_sys.html
@@ -0,0 +1,155 @@
+<h1 id="checklist-for-tests-in-machine-learning-projects">Checklist for
+Tests in Machine Learning Projects</h1>
+<p><strong>Description</strong>: This is a comprehensive checklist for
+evaluating the data and ML pipeline based on identified testing
+strategies from experts in the field.</p>
+<h2 id="general">1 General</h2>
+<p><strong>Description</strong>: The following items describe best
+practices for all tests to be written.</p>
+<h2 id="data-presence">2 Data Presence</h2>
+<p><strong>Description</strong>: The following items describe tests that
+need to be done for testing the presence of data. This area of tests
+mainly concerns whether the reading and saving operations behave as
+expected, so that unexpected behavior does not pass silently.</p>
+<h3 id="test-data-fetching-and-file-reading">2.1 Test Data Fetching and
+File Reading</h3>
+<p><strong>Requirement</strong>: Verify that the data fetching API or
+data file reading functionality works correctly. Ensure that proper
+error handling is in place for scenarios such as missing files,
+incorrect file formats, and network errors.</p>
+<p><strong>Explanation</strong>: Ensure that the code responsible for
+fetching or reading data can handle errors. This means if the file is
+missing, the format is wrong, or there's a network issue, the system
+should not crash but should provide a clear error message indicating the
+problem.</p>
+<p><strong>References:</strong></p>
+<ul>
+<li>(general knowledge)</li>
+</ul>
+<h2 id="data-quality">3 Data Quality</h2>
+<p><strong>Description</strong>: The following items describe tests that
+need to be done for testing the quality of data. These tests mainly
+concern whether the supplied data is in the expected format and whether
+it contains null values or outliers, to make sure that the data
+processing pipeline is robust.</p>
+<h3 id="validate-data-shape-and-values">3.1 Validate Data Shape and
+Values</h3>
+<p><strong>Requirement</strong>: Check that the data has the expected
+shape and that all values meet domain-specific constraints, such as
+non-negative distances.</p>
+<p><strong>Explanation</strong>: Check that the data being used has the
+correct structure (like having the right number of columns) and that the
+values within the data make sense (e.g., distances should not be
+negative). This ensures that the data is valid and reliable for model
+training.</p>
+<p><strong>References:</strong></p>
+<ul>
+<li>alexander2024Evaluating</li>
+<li>ISO/IEC5259</li>
+</ul>
+<h3 id="check-for-duplicate-records-in-data">3.2 Check for Duplicate
+Records in Data</h3>
+<p><strong>Requirement</strong>: Check for duplicate records in the
+dataset and ensure that there are none.</p>
+<p><strong>Explanation</strong>: Ensure that the dataset does not
+contain duplicate entries, as these can skew the results and reduce the
+model's performance. The test should identify any repeated records so
+they can be removed or investigated.</p>
+<p><strong>References:</strong></p>
+<ul>
+<li>ISO/IEC5259</li>
+</ul>
+<h2 id="data-ingestion">4 Data Ingestion</h2>
+<p><strong>Description</strong>: The following items describe tests that
+need to be done for testing whether the data is ingested properly.</p>
+<h3 id="verify-data-split-proportion">4.1 Verify Data Split
+Proportion</h3>
+<p><strong>Requirement</strong>: Check that the data is split into
+training and testing sets in the expected proportion.</p>
+<p><strong>Explanation</strong>: Confirm that the data is divided
+correctly into training and testing sets according to the intended
+ratio. This is crucial for ensuring that the model is trained and
+evaluated properly, with representative samples in each set.</p>
+<p><strong>References:</strong></p>
+<ul>
+<li>openja2023studying</li>
+<li>DBLP:conf/recsys/Kula15</li>
+<li>singh2020mmf</li>
+</ul>
+<h2 id="model-fitting">5 Model Fitting</h2>
+<p><strong>Description</strong>: The following items describe tests that
+need to be done for testing the model fitting process. The unit tests
+written for this section usually mock model load and model predictions
+similarly to mocking file access.</p>
+<h3 id="test-model-output-shape">5.1 Test Model Output Shape</h3>
+<p><strong>Requirement</strong>: Validate that the model's output has
+the expected shape.</p>
+<p><strong>Explanation</strong>: Ensure that the output from the model
+has the correct dimensions and structure. For example, in a
+classification task, if the model should output probabilities for each
+class, the test should verify that the output is an array with the
+correct dimensions. Ensuring the correct output shape helps prevent
+runtime errors and ensures consistency in how data is handled
+downstream.</p>
+<p><strong>References:</strong></p>
+<ul>
+<li>openja2023studying</li>
+<li>DBLP:conf/recsys/Kula15</li>
+<li>singh2020mmf</li>
+</ul>
+<h2 id="model-evaluation">6 Model Evaluation</h2>
+<p><strong>Description</strong>: The following items describe tests that
+need to be done for testing the model evaluation process.</p>
+<h3 id="verify-evaluation-metrics-implementation">6.1 Verify Evaluation
+Metrics Implementation</h3>
+<p><strong>Requirement</strong>: Verify that the evaluation metrics are
+correctly implemented and appropriate for the model's task.</p>
+<p><strong>Explanation</strong>: Confirm that the metrics used to
+evaluate the model are implemented correctly and are suitable for the
+specific task at hand. This helps in accurately assessing the model's
+performance and understanding its strengths and weaknesses.</p>
+<p><strong>References:</strong></p>
+<ul>
+<li>openja2023studying</li>
+<li>DBLP:conf/recsys/Kula15</li>
+<li>singh2020mmf</li>
+</ul>
+<h3 id="evaluate-models-performance-against-thresholds">6.2 Evaluate
+Model’s Performance Against Thresholds</h3>
+<p><strong>Requirement</strong>: Compute evaluation metrics for both the
+training and testing datasets and ensure that these metrics exceed
+predefined threshold values, indicating acceptable model
+performance.</p>
+<p><strong>Explanation</strong>: This ensures that the model's
+performance meets or exceeds certain benchmarks. By setting thresholds
+for metrics like accuracy or precision, you can automatically flag
+models that underperform or overfit. This is crucial for maintaining a
+baseline quality of results and for ensuring that the model meets the
+requirements necessary for deployment.</p>
+<p><strong>References:</strong></p>
+<ul>
+<li>openja2023studying</li>
+<li>DBLP:conf/recsys/Kula15</li>
+<li>singh2020mmf</li>
+</ul>
+<h2 id="artifact-testing">7 Artifact Testing</h2>
+<p><strong>Description</strong>: The following items involve explicit
+checks for behaviors that we expect the artifacts (e.g., models, plots)
+to follow.</p>
+<h2 id="data-quality-optional">8 Data Quality (Optional)</h2>
+<p><strong>Description</strong>: The following items describe tests that
+need to be done for testing the quality of data, but they may not be
+applicable to all projects.</p>
+<h3 id="validate-outliers-detection-and-handling">8.1 Validate Outliers
+Detection and Handling</h3>
+<p><strong>Requirement</strong>: Detect outliers in the dataset. Ensure
+that the outlier detection mechanism is sensitive enough to flag true
+outliers while ignoring minor anomalies.</p>
+<p><strong>Explanation</strong>: The detection method should be precise
+enough to catch significant anomalies without being misled by minor
+variations. This is important for maintaining data quality and ensuring
+the model's reliability in certain projects.</p>
+<p><strong>References:</strong></p>
+<ul>
+<li>ISO/IEC5259</li>
+</ul>

From 7a3a13a4e27edee12d9737e5285e095e1746bd64 Mon Sep 17 00:00:00 2001
From: shumlh <tonyuglobe@gmail.com>
Date: Thu, 23 May 2024 12:11:41 -0700
Subject: [PATCH 09/17] Refactor the report export function to fit in
 refactored TestEvaluator codes. Add demo notebook

---
 src/test_creation/analyze.py                |  10 +-
 src/test_creation/demo_report_export.ipynb  | 177 ++++++++++++++++++++
 src/test_creation/modules/workflow/parse.py |  82 ++++++++-
 3 files changed, 264 insertions(+), 5 deletions(-)
 create mode 100644 src/test_creation/demo_report_export.ipynb
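
Reviewer note (not part of the patch): `get_completeness_score` groups the
per-file results by checklist item, takes the maximum `Score` as
`is_Satisfied`, and counts the rows as `n_files_tested`. The sketch below
re-expresses that aggregation in plain Python instead of pandas, so it
illustrates the logic and is not the code in `parse.py`. The `records`
sample and the `completeness` helper are made up for this example.

```python
from collections import defaultdict

# Made-up per-file results; each file reports a 0/1 score per checklist item.
records = [
    {"ID": "1.1", "Title": "Write Descriptive Test Names", "Score": 1},
    {"ID": "1.1", "Title": "Write Descriptive Test Names", "Score": 0},
    {"ID": "2.1", "Title": "Ensure Data File Loads as Expected", "Score": 0},
    {"ID": "2.1", "Title": "Ensure Data File Loads as Expected", "Score": 0},
]


def completeness(records):
    """Return (per-item summary, 'satisfied/total' fraction string)."""
    scores = defaultdict(list)
    for rec in records:
        scores[(rec["ID"], rec["Title"])].append(rec["Score"])
    # An item counts as satisfied if at least one file satisfies it (max).
    summary = {
        key: {"is_Satisfied": max(vals), "n_files_tested": len(vals)}
        for key, vals in scores.items()
    }
    satisfied = sum(item["is_Satisfied"] for item in summary.values())
    return summary, f"{satisfied}/{len(summary)}"


summary, score = completeness(records)
print(score)  # 1/2
```

Using the maximum means one passing file is enough to mark an item as
satisfied, which matches the `'Score': ['max', 'count']` aggregation in
the diff.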

diff --git a/src/test_creation/analyze.py b/src/test_creation/analyze.py
index d8eb551..ceba852 100644
--- a/src/test_creation/analyze.py
+++ b/src/test_creation/analyze.py
@@ -113,7 +113,12 @@ def evaluate(self, verbose: bool = False) -> List[dict]:
 
 
 if __name__ == '__main__':
-    def main(checklist_path, repo_path):
+    def main(checklist_path, repo_path, report_output_path, report_output_format='html'):
+        """
+        Example:
+        ----------
+        $ python src/test_creation/analyze.py --checklist_path='./checklist/checklist_demo.csv' --repo_path='../lightfm/' --report_output_path='./report/evaluation_report.html' --report_output_format='html'
+        """
         llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
         checklist = Checklist(checklist_path, checklist_format=ChecklistFormat.CSV)
         extractor = PythonTestFileExtractor(Repository(repo_path))
@@ -122,6 +127,7 @@ def main(checklist_path, repo_path):
         response = evaluator.evaluate()
 
         parser = ResponseParser(response)
-        parser.get_completeness_score()
+        parser.get_completeness_score(verbose=True)
+        parser.export_evaluation_report(report_output_path, report_output_format, exist_ok=True)
 
     fire.Fire(main)
diff --git a/src/test_creation/demo_report_export.ipynb b/src/test_creation/demo_report_export.ipynb
new file mode 100644
index 0000000..9a55f74
--- /dev/null
+++ b/src/test_creation/demo_report_export.ipynb
@@ -0,0 +1,177 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "669bb292-2b53-4a28-8d5f-ef6f3687f440",
+   "metadata": {},
+   "source": [
+    "## Evaluation Report Export Function Demo - For Development"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "d2c1ead7-9d5b-4414-80e2-07092ba180ca",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from analyze import *\n",
+    "from analyze import TestEvaluator\n",
+    "from modules.checklist.checklist import Checklist, ChecklistFormat\n",
+    "from modules.code_analyzer.repo import Repository\n",
+    "from modules.workflow.files import PythonTestFileExtractor, RepoFileExtractor\n",
+    "from modules.workflow.parse import ResponseParser\n",
+    "from langchain_openai import ChatOpenAI"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "ad0a59a9-185c-4f17-a0dd-fa2534958ecb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "repo_path = '../../../lightfm/'\n",
+    "checklist_path = '../../checklist/checklist_demo.csv'\n",
+    "report_output_path_html = '../../report/evaluation_report.html'\n",
+    "report_output_path_pdf = '../../report/evaluation_report.pdf'"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "d717ba5d-dc9d-477d-a9db-ccb993f48f09",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "100%|██████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:14<00:00,  7.37s/it]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Report:\n",
+      "                                                                                         Requirement  \\\n",
+      "ID  Title                                                                                              \n",
+      "1.1 Write Descriptive Test Names                   Each test function should have a clear, descri...   \n",
+      "1.2 Keep Tests Focused                             Each test should focus on a single scenario, u...   \n",
+      "2.1 Ensure Data File Loads as Expected             Ensure that data-loading functions correctly l...   \n",
+      "5.1 Validate Model Input and Output Compatibility  Confirm that the model accepts inputs of the c...   \n",
+      "\n",
+      "                                                   is_Satisfied  \\\n",
+      "ID  Title                                                         \n",
+      "1.1 Write Descriptive Test Names                              1   \n",
+      "1.2 Keep Tests Focused                                        1   \n",
+      "2.1 Ensure Data File Loads as Expected                        0   \n",
+      "5.1 Validate Model Input and Output Compatibility             0   \n",
+      "\n",
+      "                                                   n_files_tested  \\\n",
+      "ID  Title                                                           \n",
+      "1.1 Write Descriptive Test Names                                2   \n",
+      "1.2 Keep Tests Focused                                          2   \n",
+      "2.1 Ensure Data File Loads as Expected                          2   \n",
+      "5.1 Validate Model Input and Output Compatibility               2   \n",
+      "\n",
+      "                                                                                        Observations  \\\n",
+      "ID  Title                                                                                              \n",
+      "1.1 Write Descriptive Test Names                   [(test_cross_validation.py) The test function ...   \n",
+      "1.2 Keep Tests Focused                             [(test_cross_validation.py) The test function ...   \n",
+      "2.1 Ensure Data File Loads as Expected             [(test_cross_validation.py) The code does not ...   \n",
+      "5.1 Validate Model Input and Output Compatibility  [(test_cross_validation.py) The code does not ...   \n",
+      "\n",
+      "                                                                                 Function References  \n",
+      "ID  Title                                                                                             \n",
+      "1.1 Write Descriptive Test Names                   [{'File Path': '../../../lightfm/tests/test_cr...  \n",
+      "1.2 Keep Tests Focused                             [{'File Path': '../../../lightfm/tests/test_cr...  \n",
+      "2.1 Ensure Data File Loads as Expected             [{'File Path': '../../../lightfm/tests/test_cr...  \n",
+      "5.1 Validate Model Input and Output Compatibility  [{'File Path': '../../../lightfm/tests/test_cr...  \n",
+      "\n",
+      "Score: 2/4\n",
+      "\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "'2/4'"
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
+    "checklist = Checklist(checklist_path, checklist_format=ChecklistFormat.CSV)\n",
+    "extractor = PythonTestFileExtractor(Repository(repo_path))\n",
+    "\n",
+    "evaluator = TestEvaluator(llm, extractor, checklist)\n",
+    "response = evaluator.evaluate()\n",
+    "\n",
+    "parser = ResponseParser(response)\n",
+    "parser.get_completeness_score(verbose=True)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "273db18c-13c4-4c86-a4c8-f42e0b0e37c5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "parser.export_evaluation_report(report_output_path_html, 'html', exist_ok=True)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "5a682a42-8807-48c6-9de4-0558838e3ccd",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "parser.export_evaluation_report(report_output_path_pdf, 'pdf', exist_ok=True)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "07875448-9c58-4ec0-94b8-de9be8870011",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python [conda env:test-creation]",
+   "language": "python",
+   "name": "conda-env-test-creation-py"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/src/test_creation/modules/workflow/parse.py b/src/test_creation/modules/workflow/parse.py
index 9b38a96..e408f64 100644
--- a/src/test_creation/modules/workflow/parse.py
+++ b/src/test_creation/modules/workflow/parse.py
@@ -1,4 +1,6 @@
 import pandas as pd
+import os
+import pypandoc
 
 
 class ResponseParser:
@@ -7,13 +9,21 @@ def __init__(self, response):
         self.evaluation_report = None
 
     def get_completeness_score(self, score_format: str = 'fraction', verbose: bool = False) -> str:
+        """
+        Compute Evaluation Report and Completeness Score
+        """
         report_df = pd.DataFrame(self.response)['report'].explode('report').apply(pd.Series)
+        report_df = report_df.rename(columns={"file": "File Path"})
+        report_df['Function References'] = report_df[['File Path', 'Functions']].to_dict(orient='records')
+        report_df['Observation'] = '(' + report_df['File Path'].apply(lambda x: os.path.split(x)[-1]) + ') ' + report_df['Observation']
         report_df = report_df.groupby(['ID', 'Title']).agg({
+            'Requirement': ['max'],
             'Score': ['max', 'count'],
-            'Functions': ['sum']
+            'Observation': [list],
+            'Function References': [list],
         })
-        report_df.columns = ['is_Satisfied', 'n_files_tested', 'functions']
-        self.evaluation_report = report_df
+        report_df.columns = ['Requirement', 'is_Satisfied', 'n_files_tested', 'Observations', 'Function References']
+        self.evaluation_report = report_df.reset_index()
 
         if score_format == 'fraction':
             score = f"{report_df['is_Satisfied'].sum()}/{report_df['is_Satisfied'].count()}"
@@ -27,3 +37,69 @@ def get_completeness_score(self, score_format: str = 'fraction', verbose: bool =
             print(f'Score: {score}')
             print()
         return score
+
+    # FIXME From checklist.py. To be refactored 
+    def _get_md_representation(self, content: dict, curr_level: int):
+        repeated_col = [k for k, v in content.items() if isinstance(v, list)]
+
+        # print out header for each item
+        md_repr = '#' * curr_level
+        if 'ID' in content.keys():
+            md_repr += f" {content['ID']}"
+        if 'Title' in content.keys():
+            md_repr += f" {content['Title']}\n\n"
+        elif 'Topic' in content.keys():
+            md_repr += f" {content['Topic']}\n\n"
+
+        # print out non-title, non-repeated items
+        for k, v in content.items():
+            if k not in repeated_col and k not in ['Title', 'Topic', 'ID']:
+                md_repr += f'**{k}**: {v.replace("'", "\\'")}\n\n'
+
+        # handle repeated columns and references
+        point_form_col = ['References', 'Function References', 'Observations']
+        for k in repeated_col:
+            if k not in point_form_col:
+                for item in content[k]:
+                    md_repr += self._get_md_representation(item, curr_level=curr_level + 1)
+            else:
+                md_repr += f'**{k}:**\n\n' + '\n'.join(f'  - {item}' for item in content[k]) + '\n\n'
+
+        return md_repr
+    
+    # FIXME. From checklist.py. To be refactored 
+    @staticmethod
+    def __filedump_check(output_path: str, exist_ok: bool):
+        if not exist_ok and os.path.exists(output_path):
+            raise FileExistsError("Output file already exists. Use `exist_ok=True` to overwrite.")
+        return True
+
+    # FIXME. From checklist.py. To be refactored 
+    def export_html(self, content: str, output_path: str, exist_ok: bool = False):
+        self.__filedump_check(output_path, exist_ok)
+        pypandoc.convert_text(content, 'html', format='md', outputfile=output_path)
+
+    # FIXME. From checklist.py. To be refactored 
+    def export_pdf(self, content: str, output_path: str, exist_ok: bool = False):
+        self.__filedump_check(output_path, exist_ok)
+        pypandoc.convert_text(content, 'pdf', format='md', outputfile=output_path,
+                              extra_args=['--pdf-engine=tectonic'])
+
+    def export_evaluation_report(self, output_path, format='html', exist_ok: bool = False):
+        """
+        Export the test evaluation report
+        """
+        score = self.get_completeness_score(score_format='fraction')
+        summary_df = self.evaluation_report[['ID', 'Title', 'is_Satisfied', 'n_files_tested']]
+        details = self.evaluation_report[['ID', 'Title', 'Requirement', 'Observations', 'Function References']].to_dict(orient='records')
+
+        export_content = dict()
+        export_content['Title'] = 'Test Evaluation Report'
+        export_content['Report Areas'] = []
+        export_content['Report Areas'].append({'Title': 'Summary', 'Completeness Score': score, 'Completeness Score per Checklist Item': '\n\n' + summary_df.to_markdown(index=False)})
+        export_content['Report Areas'].append({'Title': 'Details', 'Report Detail': details})
+        if format=='html':
+            self.export_html(self._get_md_representation(export_content, curr_level=1), output_path, exist_ok)
+        elif format=='pdf':
+            self.export_pdf(self._get_md_representation(export_content, curr_level=1), output_path, exist_ok)
+        return
\ No newline at end of file

From abf247274776e7702061ae51e2004d8f0aecc4b5 Mon Sep 17 00:00:00 2001
From: SoloSynth1 <solosynth1@gmail.com>
Date: Fri, 24 May 2024 00:34:11 -0700
Subject: [PATCH 10/17] move checklist exporting script to top level

---
 src/test_creation/checklist_export.py | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)
 create mode 100644 src/test_creation/checklist_export.py

diff --git a/src/test_creation/checklist_export.py b/src/test_creation/checklist_export.py
new file mode 100644
index 0000000..4f8ee7a
--- /dev/null
+++ b/src/test_creation/checklist_export.py
@@ -0,0 +1,25 @@
+import fire
+
+from modules.checklist.checklist import Checklist, ChecklistFormat
+
+
+def export_checklist(checklist_path: str):
+    """Example calls. To be removed later.
+
+    Example:
+    python src/test_creation/checklist_export.py ./checklist/test-dump-csv
+
+    Note that the supplied path must be a directory containing 3 CSV files:
+    1. `overview.csv`
+    2. `topics.csv`
+    3. `tests.csv`
+    """
+    __package__ = ''
+    checklist = Checklist(checklist_path, checklist_format=ChecklistFormat.CSV)
+    print(checklist.as_markdown())
+    checklist.export_html("checklist.html", exist_ok=True)
+    checklist.export_pdf("checklist.pdf", exist_ok=True)
+
+
+if __name__ == "__main__":
+    fire.Fire(export_checklist)

From 8b272353801b16807168d968c916a63f8f2ab37a Mon Sep 17 00:00:00 2001
From: SoloSynth1 <solosynth1@gmail.com>
Date: Fri, 24 May 2024 00:36:21 -0700
Subject: [PATCH 11/17] create mixins for exporting actions; make `Checklist`
 and `ResponseParser` inherit the mixin methods

---
 .../modules/checklist/checklist.py            | 111 ++++++------------
 src/test_creation/modules/mixins.py           |  47 ++++++++
 src/test_creation/modules/workflow/parse.py   | 105 ++++++++---------
 3 files changed, 133 insertions(+), 130 deletions(-)
 create mode 100644 src/test_creation/modules/mixins.py
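
Reviewer note (not part of the patch): a minimal sketch of the mixin
contract introduced here. To stay free of the pandoc dependency,
`ExportableSketch` reproduces only the overwrite guard and the Quarto
export. `WriteableSketch`, `ExportableSketch`, `Note` and `demo` are
hypothetical names used for illustration; `Note` stands in for `Checklist`
or `ResponseParser`.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class WriteableSketch:
    """Stand-in for WriteableMixin: refuses to overwrite unless allowed."""

    def _filedump_check(self, output_path: str, exist_ok: bool) -> bool:
        if not exist_ok and os.path.exists(output_path):
            raise FileExistsError("Output file already exists.")
        return True


class ExportableSketch(WriteableSketch, ABC):
    """Stand-in for ExportableMixin, without the pypandoc-backed exports."""

    @abstractmethod
    def as_quarto_markdown(self) -> str:
        ...

    def export_quarto(self, output_path: str, exist_ok: bool = False) -> None:
        self._filedump_check(output_path, exist_ok)
        with open(output_path, "w", encoding="utf-8") as f:
            f.write(self.as_quarto_markdown())


class Note(ExportableSketch):
    """Hypothetical subclass; it only has to supply the markdown text."""

    def as_quarto_markdown(self) -> str:
        return '---\ntitle: "Note"\n---\n\n# Note\n'


def demo() -> bool:
    """Export once, then confirm a second export is rejected."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "note.qmd")
        Note().export_quarto(path)
        try:
            Note().export_quarto(path)  # exist_ok defaults to False
        except FileExistsError:
            return True
        return False
```

Because the abstract methods live on the mixin, a subclass that forgets to
implement the markdown representation fails at instantiation rather than
at export time.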

diff --git a/src/test_creation/modules/checklist/checklist.py b/src/test_creation/modules/checklist/checklist.py
index 7b0bc4e..e0e2091 100644
--- a/src/test_creation/modules/checklist/checklist.py
+++ b/src/test_creation/modules/checklist/checklist.py
@@ -5,10 +5,10 @@
 from typing import Union
 from abc import ABC, abstractmethod
 
-import fire
-import pypandoc
 from ruamel.yaml import YAML
 
+from ..mixins import ExportableMixin
+
 
 def filter_dict(d: dict, keys: list) -> dict:
     return {k: v for k, v in d.items() if k in keys}
@@ -137,7 +137,7 @@ def write(cls, path: str, data: dict) -> None:
         cls._write_file(os.path.join(path, cls.tests_filename), tests, cls.tests_field_names_unnested)
 
 
-class Checklist:
+class Checklist(ExportableMixin):
     def __init__(self, checklist_path: str, checklist_format: ChecklistFormat):
         if not os.path.exists(checklist_path):
             raise FileNotFoundError("Checklist file not found.")
@@ -179,82 +179,45 @@ def to_yaml(self, output_path: str, no_preserve_format: bool = False, exist_ok:
                 "Roundtripping is not yet implemented. If you want to dump the YAML file disregarding the original "
                 "formatting, use `no_preserve_format=True`."
             )
-        self.__filedump_check(output_path, exist_ok)
+        self._filedump_check(output_path, exist_ok)
         YamlChecklistIO.write(output_path, self.content)
 
     def to_csv(self, output_path: str, exist_ok: bool = False):
         """Dump the checklist to a directory containing three separate CSV files."""
-        self.__filedump_check(output_path, exist_ok)
+        self._filedump_check(output_path, exist_ok)
         CsvChecklistIO.write(output_path, self.content)
 
     def as_markdown(self):
-        return self._get_md_representation(self.content, curr_level=1)
-
-    def _get_md_representation(self, content: dict, curr_level: int):
-        repeated_col = [k for k, v in content.items() if isinstance(v, list)]
-
-        # print out header for each item
-        md_repr = '#' * curr_level
-        if 'ID' in content.keys():
-            md_repr += f" {content['ID']}"
-        if 'Title' in content.keys():
-            md_repr += f" {content['Title']}\n\n"
-        elif 'Topic' in content.keys():
-            md_repr += f" {content['Topic']}\n\n"
-
-        # print out non-title, non-repeated items
-        for k, v in content.items():
-            if k not in repeated_col and k not in ['Title', 'Topic', 'ID']:
-                md_repr += f'**{k}**: {v.replace("'", "\\'")}\n\n'
-
-        # handle repeated columns and references
-        for k in repeated_col:
-            if k != 'References':
-                for item in content[k]:
-                    md_repr += self._get_md_representation(item, curr_level=curr_level + 1)
-            else:
-                md_repr += '**References:**\n\n' + '\n'.join(f'  - {item}' for item in content['References']) + '\n\n'
-
-        return md_repr
-
-    @staticmethod
-    def __filedump_check(output_path: str, exist_ok: bool):
-        if not exist_ok and os.path.exists(output_path):
-            raise FileExistsError("Output file already exists. Use `exist_ok=True` to overwrite.")
-        return True
-
-    def export_html(self, output_path: str, exist_ok: bool = False):
-        self.__filedump_check(output_path, exist_ok)
-        pypandoc.convert_text(self.as_markdown(), 'html', format='md', outputfile=output_path)
-
-    def export_pdf(self, output_path: str, exist_ok: bool = False):
-        self.__filedump_check(output_path, exist_ok)
-        pypandoc.convert_text(self.as_markdown(), 'pdf', format='md', outputfile=output_path,
-                              extra_args=['--pdf-engine=tectonic'])
-
-    def export_quarto(self, output_path: str, exist_ok: bool = False):
-        self.__filedump_check(output_path, exist_ok)
+        def _get_md_representation(content: dict, curr_level: int):
+            repeated_col = [k for k, v in content.items() if isinstance(v, list)]
+
+            # print out header for each item
+            md_repr = '#' * curr_level
+            if 'ID' in content.keys():
+                md_repr += f" {content['ID']}"
+            if 'Title' in content.keys():
+                md_repr += f" {content['Title']}\n\n"
+            elif 'Topic' in content.keys():
+                md_repr += f" {content['Topic']}\n\n"
+
+            # print out non-title, non-repeated items
+            for k, v in content.items():
+                if k not in repeated_col and k not in ['Title', 'Topic', 'ID']:
+                    md_repr += f'**{k}**: {v}\n\n'
+
+            # handle repeated columns and references
+            for k in repeated_col:
+                if k != 'References':
+                    for item in content[k]:
+                        md_repr += _get_md_representation(item, curr_level=curr_level + 1)
+                else:
+                    md_repr += '**References:**\n\n' + '\n'.join(
+                        f'  - {item}' for item in content['References']) + '\n\n'
+
+            return md_repr
+
+        return _get_md_representation(self.content, curr_level=1)
+
+    def as_quarto_markdown(self):
         header = f'---\ntitle: "{self.content['Title']}"\nformat:\n  html:\n  code-fold: true\n---\n\n'
-        qmd_repr = header + self.as_markdown()
-        with open(output_path, "w", encoding="utf-8") as f:
-            f.write(qmd_repr)
-
-
-if __name__ == "__main__":
-    def example(checklist_path: str):
-        """Example calls. To be removed later.
-
-        Example:
-        python src/test_creation/modules/checklist/checklist.py ./checklist/test-dump-csv
-
-        Note that the supplied path must be a directory containing 3 CSV files:
-        1. `overview.csv`
-        2. `topics.csv`
-        3. `tests.csv`
-        """
-        checklist = Checklist(checklist_path, checklist_format=ChecklistFormat.CSV)
-        print(checklist.as_markdown())
-        checklist.export_html("checklist.html", exist_ok=True)
-
-
-    fire.Fire(example)
+        return header + self.as_markdown()
diff --git a/src/test_creation/modules/mixins.py b/src/test_creation/modules/mixins.py
new file mode 100644
index 0000000..820973c
--- /dev/null
+++ b/src/test_creation/modules/mixins.py
@@ -0,0 +1,47 @@
+import os
+from abc import ABC, abstractmethod
+
+import pypandoc
+
+
+class WriteableMixin:
+    """A mixin for classes which will write content to filesystem."""
+    def _filedump_check(self, output_path: str, exist_ok: bool):
+        if not exist_ok and os.path.exists(output_path):
+            raise FileExistsError("Output file already exists. Use `exist_ok=True` to overwrite.")
+        return True
+
+
+class ExportableMixin(WriteableMixin, ABC):
+    """A mixin that provides functionality to export (dump) content as HTML/PDF/Quarto documents.
+
+    Extends WriteableMixin.
+
+    Relies on markdown representations of the object.
+    The class including mixin must have `.as_markdown()` and `.as_quarto_markdown()` implemented.
+    """
+    @abstractmethod
+    def as_markdown(self) -> str:
+        pass
+
+    @abstractmethod
+    def as_quarto_markdown(self) -> str:
+        pass
+
+    @staticmethod
+    def _escape_slashes(string: str) -> str:
+        return string.replace("'", "\\'")
+
+    def export_html(self, output_path: str, exist_ok: bool = False):
+        self._filedump_check(output_path, exist_ok)
+        pypandoc.convert_text(self._escape_slashes(self.as_markdown()), 'html', format='md', outputfile=output_path)
+
+    def export_pdf(self, output_path: str, exist_ok: bool = False):
+        self._filedump_check(output_path, exist_ok)
+        pypandoc.convert_text(self.as_markdown(), 'pdf', format='md', outputfile=output_path,
+                              extra_args=['--pdf-engine=tectonic'])
+
+    def export_quarto(self, output_path: str, exist_ok: bool = False):
+        self._filedump_check(output_path, exist_ok)
+        with open(output_path, "w", encoding="utf-8") as f:
+            f.write(self.as_quarto_markdown())
diff --git a/src/test_creation/modules/workflow/parse.py b/src/test_creation/modules/workflow/parse.py
index e408f64..cf54586 100644
--- a/src/test_creation/modules/workflow/parse.py
+++ b/src/test_creation/modules/workflow/parse.py
@@ -1,21 +1,24 @@
 import pandas as pd
 import os
-import pypandoc
+from typing import Union
 
+from ..mixins import ExportableMixin
 
-class ResponseParser:
+
+class ResponseParser(ExportableMixin):
     def __init__(self, response):
         self.response = response
         self.evaluation_report = None
 
-    def get_completeness_score(self, score_format: str = 'fraction', verbose: bool = False) -> str:
+    def get_completeness_score(self, score_format: str = 'fraction', verbose: bool = False) -> Union[float, str]:
         """
         Compute Evaluation Report and Completeness Score
         """
         report_df = pd.DataFrame(self.response)['report'].explode('report').apply(pd.Series)
         report_df = report_df.rename(columns={"file": "File Path"})
         report_df['Function References'] = report_df[['File Path', 'Functions']].to_dict(orient='records')
-        report_df['Observation'] = '(' + report_df['File Path'].apply(lambda x: os.path.split(x)[-1]) + ') ' + report_df['Observation']
+        report_df['Observation'] = '(' + report_df['File Path'].apply(lambda x: os.path.split(x)[-1]) + ') ' + \
+                                   report_df['Observation']
         report_df = report_df.groupby(['ID', 'Title']).agg({
             'Requirement': ['max'],
             'Score': ['max', 'count'],
@@ -38,57 +41,35 @@ def get_completeness_score(self, score_format: str = 'fraction', verbose: bool =
             print()
         return score
 
-    # FIXME From checklist.py. To be refactored 
-    def _get_md_representation(self, content: dict, curr_level: int):
-        repeated_col = [k for k, v in content.items() if isinstance(v, list)]
-
-        # print out header for each item
-        md_repr = '#' * curr_level
-        if 'ID' in content.keys():
-            md_repr += f" {content['ID']}"
-        if 'Title' in content.keys():
-            md_repr += f" {content['Title']}\n\n"
-        elif 'Topic' in content.keys():
-            md_repr += f" {content['Topic']}\n\n"
-
-        # print out non-title, non-repeated items
-        for k, v in content.items():
-            if k not in repeated_col and k not in ['Title', 'Topic', 'ID']:
-                md_repr += f'**{k}**: {v.replace("'", "\\'")}\n\n'
-
-        # handle repeated columns and references
-        point_form_col = ['References', 'Function References', 'Observations']
-        for k in repeated_col:
-            if k not in point_form_col:
-                for item in content[k]:
-                    md_repr += self._get_md_representation(item, curr_level=curr_level + 1)
-            else:
-                md_repr += f'**{k}:**\n\n' + '\n'.join(f'  - {item}' for item in content[k]) + '\n\n'
-
-        return md_repr
-    
-    # FIXME. From checklist.py. To be refactored 
-    @staticmethod
-    def __filedump_check(output_path: str, exist_ok: bool):
-        if not exist_ok and os.path.exists(output_path):
-            raise FileExistsError("Output file already exists. Use `exist_ok=True` to overwrite.")
-        return True
-
-    # FIXME. From checklist.py. To be refactored 
-    def export_html(self, content: str, output_path: str, exist_ok: bool = False):
-        self.__filedump_check(output_path, exist_ok)
-        pypandoc.convert_text(content, 'html', format='md', outputfile=output_path)
-
-    # FIXME. From checklist.py. To be refactored 
-    def export_pdf(self, content: str, output_path: str, exist_ok: bool = False):
-        self.__filedump_check(output_path, exist_ok)
-        pypandoc.convert_text(content, 'pdf', format='md', outputfile=output_path,
-                              extra_args=['--pdf-engine=tectonic'])
+    def as_markdown(self) -> str:
+        def _get_md_representation(content: dict, curr_level: int):
+            repeated_col = [k for k, v in content.items() if isinstance(v, list)]
+
+            # print out header for each item
+            md_repr = '#' * curr_level
+            if 'ID' in content.keys():
+                md_repr += f" {content['ID']}"
+            if 'Title' in content.keys():
+                md_repr += f" {content['Title']}\n\n"
+            elif 'Topic' in content.keys():
+                md_repr += f" {content['Topic']}\n\n"
+
+            # print out non-title, non-repeated items
+            for k, v in content.items():
+                if k not in repeated_col and k not in ['Title', 'Topic', 'ID']:
+                    md_repr += f'**{k}**: {v}\n\n'
+
+            # handle repeated columns and references
+            point_form_col = ['References', 'Function References', 'Observations']
+            for k in repeated_col:
+                if k not in point_form_col:
+                    for item in content[k]:
+                        md_repr += _get_md_representation(item, curr_level=curr_level + 1)
+                else:
+                    md_repr += f'**{k}:**\n\n' + '\n'.join(f'  - {item}' for item in content[k]) + '\n\n'
+
+            return md_repr
 
-    def export_evaluation_report(self, output_path, format='html', exist_ok: bool = False):
-        """
-        Export the test evaluation report
-        """
         score = self.get_completeness_score(score_format='fraction')
         summary_df = self.evaluation_report[['ID', 'Title', 'is_Satisfied', 'n_files_tested']]
         details = self.evaluation_report[['ID', 'Title', 'Requirement', 'Observations', 'Function References']].to_dict(orient='records')
@@ -98,8 +79,20 @@ def export_evaluation_report(self, output_path, format='html', exist_ok: bool =
         export_content['Report Areas'] = []
         export_content['Report Areas'].append({'Title': 'Summary', 'Completeness Score': score, 'Completeness Score per Checklist Item': '\n\n' + summary_df.to_markdown(index=False)})
         export_content['Report Areas'].append({'Title': 'Details', 'Report Detail': details})
+
+        return _get_md_representation(export_content, 1)
+
+    def as_quarto_markdown(self) -> str:
+        header = '---\ntitle: "Test Evaluation Report"\nformat:\n  html:\n    code-fold: true\n---\n\n'
+        return header + self.as_markdown()
+
+    def export_evaluation_report(self, output_path, format='html', exist_ok: bool = False):
+        """
+        Export the test evaluation report
+        """
+        print(self.as_markdown())
         if format=='html':
-            self.export_html(self._get_md_representation(export_content, curr_level=1), output_path, exist_ok)
+            self.export_html(output_path, exist_ok)
         elif format=='pdf':
-            self.export_pdf(self._get_md_representation(export_content, curr_level=1), output_path, exist_ok)
+            self.export_pdf(output_path, exist_ok)
         return
\ No newline at end of file

From 8efb878c1c0beda8d88d7988daa21ca631141439 Mon Sep 17 00:00:00 2001
From: SoloSynth1 <solosynth1@gmail.com>
Date: Fri, 24 May 2024 00:36:48 -0700
Subject: [PATCH 12/17] remove `checklist_sys.html`

---
 checklist/checklist_sys.html | 155 -----------------------------------
 1 file changed, 155 deletions(-)
 delete mode 100644 checklist/checklist_sys.html

diff --git a/checklist/checklist_sys.html b/checklist/checklist_sys.html
deleted file mode 100644
index c0c174b..0000000
--- a/checklist/checklist_sys.html
+++ /dev/null
@@ -1,155 +0,0 @@
-<h1 id="checklist-for-tests-in-machine-learning-projects">Checklist for
-Tests in Machine Learning Projects</h1>
-<p><strong>Description</strong>: This is a comprehensive checklist for
-evaluating the data and ML pipeline based on identified testing
-strategies from experts in the field.</p>
-<h2 id="general">1 General</h2>
-<p><strong>Description</strong>: The following items describe best
-practices for all tests to be written.</p>
-<h2 id="data-presence">2 Data Presence</h2>
-<p><strong>Description</strong>: The following items describe tests that
-need to be done for testing the presence of data. This area of tests
-mainly concern whether the reading and saving operations are behaving as
-expected, and any unexpected behavior would not be passed silently.</p>
-<h3 id="test-data-fetching-and-file-reading">2.1 Test Data Fetching and
-File Reading</h3>
-<p><strong>Requirement</strong>: Verify that the data fetching API or
-data file reading functionality works correctly. Ensure that proper
-error handling is in place for scenarios such as missing files,
-incorrect file formats, and network errors.</p>
-<p><strong>Explanation</strong>: Ensure that the code responsible for
-fetching or reading data can handle errors. This means if the file is
-missing, the format is wrong, or there's a network issue, the system
-should not crash but should provide a clear error message indicating the
-problem.</p>
-<p><strong>References:</strong></p>
-<ul>
-<li>(general knowledge)</li>
-</ul>
-<h2 id="data-quality">3 Data Quality</h2>
-<p><strong>Description</strong>: The following items describe tests that
-need to be done for testing the quality of data. This area of tests
-mainly concern whether the data supplied is in the expected format, data
-containing null values or outliers to make sure that the data processing
-pipeline is robust.</p>
-<h3 id="validate-data-shape-and-values">3.1 Validate Data Shape and
-Values</h3>
-<p><strong>Requirement</strong>: Check that the data has the expected
-shape and that all values meet domain-specific constraints, such as
-non-negative distances.</p>
-<p><strong>Explanation</strong>: Check that the data being used has the
-correct structure (like having the right number of columns) and that the
-values within the data make sense (e.g., distances should not be
-negative). This ensures that the data is valid and reliable for model
-training.</p>
-<p><strong>References:</strong></p>
-<ul>
-<li>alexander2024Evaluating</li>
-<li>ISO/IEC5259</li>
-</ul>
-<h3 id="check-for-duplicate-records-in-data">3.2 Check for Duplicate
-Records in Data</h3>
-<p><strong>Requirement</strong>: Check for duplicate records in the
-dataset and ensure that there are none.</p>
-<p><strong>Explanation</strong>: Ensure that the dataset does not
-contain duplicate entries, as these can skew the results and reduce the
-model's performance. The test should identify any repeated records so
-they can be removed or investigated.</p>
-<p><strong>References:</strong></p>
-<ul>
-<li>ISO/IEC5259</li>
-</ul>
-<h2 id="data-ingestion">4 Data Ingestion</h2>
-<p><strong>Description</strong>: The following items describe tests that
-need to be done for testing if the data is ingestion properly.</p>
-<h3 id="verify-data-split-proportion">4.1 Verify Data Split
-Proportion</h3>
-<p><strong>Requirement</strong>: Check that the data is split into
-training and testing sets in the expected proportion.</p>
-<p><strong>Explanation</strong>: Confirm that the data is divided
-correctly into training and testing sets according to the intended
-ratio. This is crucial for ensuring that the model is trained and
-evaluated properly, with representative samples in each set.</p>
-<p><strong>References:</strong></p>
-<ul>
-<li>openja2023studying</li>
-<li>DBLP:conf/recsys/Kula15</li>
-<li>singh2020mmf</li>
-</ul>
-<h2 id="model-fitting">5 Model Fitting</h2>
-<p><strong>Description</strong>: The following items describe tests that
-need to be done for testing the model fitting process. The unit tests
-written for this section usually mock model load and model predictions
-similarly to mocking file access.</p>
-<h3 id="test-model-output-shape">5.1 Test Model Output Shape</h3>
-<p><strong>Requirement</strong>: Validate that the model's output has
-the expected shape.</p>
-<p><strong>Explanation</strong>: Ensure that the output from the model
-has the correct dimensions and structure. For example, in a
-classification task, if the model should output probabilities for each
-class, the test should verify that the output is an array with the
-correct dimensions. Ensuring the correct output shape helps prevent
-runtime errors and ensures consistency in how data is handled
-downstream.</p>
-<p><strong>References:</strong></p>
-<ul>
-<li>openja2023studying</li>
-<li>DBLP:conf/recsys/Kula15</li>
-<li>singh2020mmf</li>
-</ul>
-<h2 id="model-evaluation">6 Model Evaluation</h2>
-<p><strong>Description</strong>: The following items describe tests that
-need to be done for testing the model evaluation process.</p>
-<h3 id="verify-evaluation-metrics-implementation">6.1 Verify Evaluation
-Metrics Implementation</h3>
-<p><strong>Requirement</strong>: Verify that the evaluation metrics are
-correctly implemented and appropriate for the model's task.</p>
-<p><strong>Explanation</strong>: Confirm that the metrics used to
-evaluate the model are implemented correctly and are suitable for the
-specific task at hand. This helps in accurately assessing the model's
-performance and understanding its strengths and weaknesses.</p>
-<p><strong>References:</strong></p>
-<ul>
-<li>openja2023studying</li>
-<li>DBLP:conf/recsys/Kula15</li>
-<li>singh2020mmf</li>
-</ul>
-<h3 id="evaluate-models-performance-against-thresholds">6.2 Evaluate
-Model’s Performance Against Thresholds</h3>
-<p><strong>Requirement</strong>: Compute evaluation metrics for both the
-training and testing datasets and ensure that these metrics exceed
-predefined threshold values, indicating acceptable model
-performance.</p>
-<p><strong>Explanation</strong>: This ensures that the model's
-performance meets or exceeds certain benchmarks. By setting thresholds
-for metrics like accuracy or precision, you can automatically flag
-models that underperform or overfit. This is crucial for maintaining a
-baseline quality of results and for ensuring that the model meets the
-requirements necessary for deployment.</p>
-<p><strong>References:</strong></p>
-<ul>
-<li>openja2023studying</li>
-<li>DBLP:conf/recsys/Kula15</li>
-<li>singh2020mmf</li>
-</ul>
-<h2 id="artifact-testing">7 Artifact Testing</h2>
-<p><strong>Description</strong>: The following items involves explicit
-checks for behaviors that we expect the artifacts e.g. models, plots,
-etc., to follow.</p>
-<h2 id="data-quality-optional">8 Data Quality (Optional)</h2>
-<p><strong>Description</strong>: The following items describe tests that
-need to be done for testing the quality of data, but they may not be
-applicable to all projects.</p>
-<h3 id="validate-outliers-detection-and-handling">8.1 Validate Outliers
-Detection and Handling</h3>
-<p><strong>Requirement</strong>: Detect outliers in the dataset. Ensure
-that the outlier detection mechanism is sensitive enough to flag true
-outliers while ignoring minor anomalies.</p>
-<p><strong>Explanation</strong>: The detection method should be precise
-enough to catch significant anomalies without being misled by minor
-variations. This is important for maintaining data quality and ensuring
-the model's reliability in certain projects.</p>
-<p><strong>References:</strong></p>
-<ul>
-<li>ISO/IEC5259</li>
-</ul>

From 01089033e4ad7e0c388b5d7eee5e52ece293eb87 Mon Sep 17 00:00:00 2001
From: SoloSynth1 <solosynth1@gmail.com>
Date: Fri, 24 May 2024 00:41:19 -0700
Subject: [PATCH 13/17] fix incorrect method name; remove unnecessary prints

---
 src/test_creation/modules/mixins.py         | 5 +++--
 src/test_creation/modules/workflow/parse.py | 1 -
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/test_creation/modules/mixins.py b/src/test_creation/modules/mixins.py
index 820973c..67c9b5b 100644
--- a/src/test_creation/modules/mixins.py
+++ b/src/test_creation/modules/mixins.py
@@ -29,12 +29,13 @@ def as_quarto_markdown(self) -> str:
         pass
 
     @staticmethod
-    def _escape_slashes(string: str) -> str:
+    def _escape_single_quotes(string: str) -> str:
         return string.replace("'", "\\'")
 
     def export_html(self, output_path: str, exist_ok: bool = False):
         self._filedump_check(output_path, exist_ok)
-        pypandoc.convert_text(self._escape_slashes(self.as_markdown()), 'html', format='md', outputfile=output_path)
+        pypandoc.convert_text(self._escape_single_quotes(self.as_markdown()), 'html', format='md',
+                              outputfile=output_path)
 
     def export_pdf(self, output_path: str, exist_ok: bool = False):
         self._filedump_check(output_path, exist_ok)
diff --git a/src/test_creation/modules/workflow/parse.py b/src/test_creation/modules/workflow/parse.py
index cf54586..d397da2 100644
--- a/src/test_creation/modules/workflow/parse.py
+++ b/src/test_creation/modules/workflow/parse.py
@@ -90,7 +90,6 @@ def export_evaluation_report(self, output_path, format='html', exist_ok: bool =
         """
         Export the test evaluation report
         """
-        print(self.as_markdown())
         if format=='html':
             self.export_html(output_path, exist_ok)
         elif format=='pdf':

From 858034750909778ef679496b1b5f7066da20a76e Mon Sep 17 00:00:00 2001
From: SoloSynth1 <solosynth1@gmail.com>
Date: Fri, 24 May 2024 14:40:34 -0700
Subject: [PATCH 14/17] add checks for permission and file/directory
 expectations

---
 src/test_creation/modules/checklist/checklist.py | 2 +-
 src/test_creation/modules/mixins.py              | 9 ++++++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/test_creation/modules/checklist/checklist.py b/src/test_creation/modules/checklist/checklist.py
index e0e2091..9b27323 100644
--- a/src/test_creation/modules/checklist/checklist.py
+++ b/src/test_creation/modules/checklist/checklist.py
@@ -184,7 +184,7 @@ def to_yaml(self, output_path: str, no_preserve_format: bool = False, exist_ok:
 
     def to_csv(self, output_path: str, exist_ok: bool = False):
         """Dump the checklist to a directory containing three separate CSV files."""
-        self._filedump_check(output_path, exist_ok)
+        self._filedump_check(output_path, exist_ok, expects_directory_if_exists=True)
         CsvChecklistIO.write(output_path, self.content)
 
     def as_markdown(self):
diff --git a/src/test_creation/modules/mixins.py b/src/test_creation/modules/mixins.py
index 67c9b5b..3951e83 100644
--- a/src/test_creation/modules/mixins.py
+++ b/src/test_creation/modules/mixins.py
@@ -6,9 +6,16 @@
 
 class WriteableMixin:
     """A mixin for classes which will write content to filesystem."""
-    def _filedump_check(self, output_path: str, exist_ok: bool):
+    def _filedump_check(self, output_path: str, exist_ok: bool, expects_directory_if_exists: bool = False):
+        if not os.access(output_path, os.W_OK):
+            raise PermissionError(f"Write permission is not granted for the output path: {output_path}")
         if not exist_ok and os.path.exists(output_path):
             raise FileExistsError("Output file already exists. Use `exist_ok=True` to overwrite.")
+        else:
+            if expects_directory_if_exists and os.path.isfile(output_path):
+                raise NotADirectoryError("An object already exists in the path, expecting a directory but it is a file.")
+            elif not expects_directory_if_exists and os.path.isdir(output_path):
+                raise IsADirectoryError("An object already exists in the path, expecting a file but it is a directory.")
         return True
 
 

From e136089d60772e6c071855a2bd2e0e6fa274a762 Mon Sep 17 00:00:00 2001
From: SoloSynth1 <solosynth1@gmail.com>
Date: Fri, 24 May 2024 15:05:47 -0700
Subject: [PATCH 15/17] fix incorrect checks

---
 src/test_creation/modules/mixins.py | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)
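
Reviewer note: the check order introduced here (parent directory writable, then existence, then type, then write permission on the existing target) can be tried in isolation with the sketch below. `check_writable_target` is a hypothetical standalone function, not part of the codebase; it assumes the same semantics as `_filedump_check` and omits the debug `print`.

```python
import os


def check_writable_target(output_path: str, exist_ok: bool = False,
                          expects_directory: bool = False) -> bool:
    """Hypothetical standalone version of the checks in `_filedump_check`."""
    normalized_path = os.path.abspath(os.path.normpath(output_path))
    dir_path = os.path.dirname(normalized_path)

    # The parent directory must be writable whether or not the target exists.
    if not os.access(dir_path, os.W_OK):
        raise PermissionError(f"No write permission for directory: {dir_path}")

    if os.path.exists(normalized_path):
        # Refuse to overwrite unless the caller opted in.
        if not exist_ok:
            raise FileExistsError("Output already exists. Use `exist_ok=True` to overwrite.")
        # The existing object must be of the kind the writer expects.
        if expects_directory and not os.path.isdir(normalized_path):
            raise NotADirectoryError(f"Expected a directory at: {normalized_path}")
        if not expects_directory and not os.path.isfile(normalized_path):
            raise IsADirectoryError(f"Expected a file at: {normalized_path}")
        # The existing object itself must be writable.
        if not os.access(normalized_path, os.W_OK):
            raise PermissionError(f"No write permission for: {normalized_path}")
    return True
```

Checking the parent directory first matters because `os.access` on a path that does not exist yet returns `False`, which is presumably why the previous version rejected every new output file.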

diff --git a/src/test_creation/modules/mixins.py b/src/test_creation/modules/mixins.py
index 3951e83..2fbf950 100644
--- a/src/test_creation/modules/mixins.py
+++ b/src/test_creation/modules/mixins.py
@@ -7,15 +7,23 @@
 class WriteableMixin:
     """A mixin for classes which will write content to filesystem."""
     def _filedump_check(self, output_path: str, exist_ok: bool, expects_directory_if_exists: bool = False):
-        if not os.access(output_path, os.W_OK):
-            raise PermissionError(f"Write permission is not granted for the output path: {output_path}")
-        if not exist_ok and os.path.exists(output_path):
-            raise FileExistsError("Output file already exists. Use `exist_ok=True` to overwrite.")
-        else:
-            if expects_directory_if_exists and os.path.isfile(output_path):
-                raise NotADirectoryError("An object already exists in the path, expecting a directory but it is a file.")
-            elif not expects_directory_if_exists and os.path.isdir(output_path):
-                raise IsADirectoryError("An object already exists in the path, expecting a file but it is a directory.")
+        normalized_path = os.path.abspath(os.path.normpath(output_path))
+        dir_path = os.path.dirname(normalized_path)
+        print(normalized_path, dir_path)
+        if not os.access(dir_path, os.W_OK):
+            raise PermissionError(f"Write permission is not granted for the output path: {dir_path}")
+
+        if not exist_ok:
+            if os.path.exists(normalized_path):
+                raise FileExistsError("Output file already exists. Use `exist_ok=True` to overwrite.")
+        elif os.path.exists(normalized_path):
+            if expects_directory_if_exists and not os.path.isdir(normalized_path):
+                raise NotADirectoryError("A non-directory already exists in the path but the write operation is expecting to overwrite a directory.")
+            elif not expects_directory_if_exists and not os.path.isfile(normalized_path):
+                raise IsADirectoryError("A non-file object already exists in the path but the write operation is expecting to overwrite a file.")
+
+            if not os.access(normalized_path, os.W_OK):
+                raise PermissionError(f"Write permission is not granted for the output path: {normalized_path}")
         return True
 
 

From 8d608c8f635b570551025f3e0f8d04f9cf56c8ea Mon Sep 17 00:00:00 2001
From: SoloSynth1 <solosynth1@gmail.com>
Date: Sun, 26 May 2024 17:45:16 -0700
Subject: [PATCH 16/17] add check for extension given when exporting reports

---
 src/test_creation/modules/mixins.py | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)
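
Reviewer note: a possible variant of the extension check, shown only as a sketch. `check_extension` and `ALLOWED_EXTENSIONS` are hypothetical names; the mapping mirrors the one added in this patch. It swaps `output_path.split(".")[-1]` for `os.path.splitext`, so a path without any extension (for example a bare name such as `html`) is rejected rather than being treated as its own extension.

```python
import os

# Mirrors the `formats` mapping added in this patch.
ALLOWED_EXTENSIONS = {
    "pdf": ["pdf"],
    "html": ["htm", "html"],
    "qmd": ["qmd"],
}


def check_extension(output_path: str, export_format: str) -> None:
    """Hypothetical helper: raise ValueError if the extension does not fit the format."""
    # splitext returns ("report", ".HTML") for "report.HTML" and ("html", "") for "html".
    ext = os.path.splitext(output_path)[1].lstrip(".").lower()
    allowed = ALLOWED_EXTENSIONS[export_format]
    if ext not in allowed:
        raise ValueError(
            f"Output file path `{output_path}` does not match format `{export_format}`. "
            f"Use one of the following extensions: {allowed}."
        )
```

The parameter is called `export_format` here to avoid shadowing the built-in `format`; that naming is a suggestion, not what the patch does.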

diff --git a/src/test_creation/modules/mixins.py b/src/test_creation/modules/mixins.py
index 2fbf950..6bc25a5 100644
--- a/src/test_creation/modules/mixins.py
+++ b/src/test_creation/modules/mixins.py
@@ -47,17 +47,34 @@ def as_quarto_markdown(self) -> str:
     def _escape_single_quotes(string: str) -> str:
         return string.replace("'", "\\'")
 
-    def export_html(self, output_path: str, exist_ok: bool = False):
+    def __format_check(self, output_path, format):
+        formats = {
+            "pdf": ["pdf"],
+            "html": ["htm", "html"],
+            "qmd": ["qmd"]
+        }
+
+        normalized_ext = output_path.split(".")[-1].lower()
+        if normalized_ext not in formats[format]:
+            raise ValueError(f"Output file path `{output_path}` does not meet expectation. When specifying `{format}` to be exported, please use one of the following extensions: {str(formats[format])}.")
+
+    def _export_check(self, output_path: str, format: str, exist_ok: bool):
         self._filedump_check(output_path, exist_ok)
+        self.__format_check(output_path, format)
+
+    def export_html(self, output_path: str, exist_ok: bool = False):
+        self._export_check(output_path, format="html", exist_ok=exist_ok)
         pypandoc.convert_text(self._escape_single_quotes(self.as_markdown()), 'html', format='md',
                               outputfile=output_path)
 
     def export_pdf(self, output_path: str, exist_ok: bool = False):
+        self._export_check(output_path, format="pdf", exist_ok=exist_ok)
         self._filedump_check(output_path, exist_ok)
         pypandoc.convert_text(self.as_markdown(), 'pdf', format='md', outputfile=output_path,
                               extra_args=['--pdf-engine=tectonic'])
 
     def export_quarto(self, output_path: str, exist_ok: bool = False):
+        self._export_check(output_path, format="qmd", exist_ok=exist_ok)
         self._filedump_check(output_path, exist_ok)
         with open(output_path, "w", encoding="utf-8") as f:
             f.write(self.as_quarto_markdown())

From 186d21d0bea5170bcecbe5a519a37ebe7c097dd9 Mon Sep 17 00:00:00 2001
From: shumlh <tonyuglobe@gmail.com>
Date: Tue, 28 May 2024 10:57:05 -0700
Subject: [PATCH 17/17] fix: correct string formatting and indentation in
 `as_quarto_markdown` in checklist.py

---
 src/test_creation/modules/checklist/checklist.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
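
Reviewer note: the original f-string reused single quotes inside a single-quoted literal, which is a syntax error on Python versions before 3.12, and `code-fold` needs to be nested one level below `html` in the front matter. A line-by-line construction makes the nesting easier to see; `quarto_header` is a hypothetical helper, not part of the codebase, and it assumes the title contains no double quotes.

```python
def quarto_header(title: str) -> str:
    """Hypothetical helper: build Quarto front matter with `code-fold` under `html`."""
    lines = [
        "---",
        f'title: "{title}"',
        "format:",
        "  html:",
        "    code-fold: true",  # four spaces: a child of `html`, not a sibling
        "---",
        "",
        "",
    ]
    # The two trailing empty strings yield the blank line after the closing fence.
    return "\n".join(lines)
```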

diff --git a/src/test_creation/modules/checklist/checklist.py b/src/test_creation/modules/checklist/checklist.py
index 9b27323..0b943fa 100644
--- a/src/test_creation/modules/checklist/checklist.py
+++ b/src/test_creation/modules/checklist/checklist.py
@@ -219,5 +219,5 @@ def _get_md_representation(content: dict, curr_level: int):
         return _get_md_representation(self.content, curr_level=1)
 
     def as_quarto_markdown(self):
-        header = f'---\ntitle: "{self.content['Title']}"\nformat:\n  html:\n  code-fold: true\n---\n\n'
+        header = '---\ntitle: "{}"\nformat:\n  html:\n    code-fold: true\n---\n\n'.format(self.content['Title'])
         return header + self.as_markdown()