# Quality Assessment Form

This Jupyter Notebook provides the source code used to create the tables from the Quality Assessment Form responses.  
The following steps are performed:
1. Load responses
2. Separate answers and scores
3. Export the tables

### 1. Importing the libraries

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns

from utils import *

### 2. Loading the dataset

Google Forms was used to collect the responses for the Quality Assessment Form.  
The responses were exported first to Google Sheets and then from Google Sheets to a CSV file.  
Finally, the CSV file is loaded into a pandas dataframe.
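
As a minimal sketch of that last step, assuming a CSV export with one row per paper and one column per question: the column names and the in-memory CSV text below are hypothetical, and the notebook itself loads the data through the `QualityAssessment` helper from `utils`, whose internals are not shown here.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the file exported from Google Sheets.
csv_text = """paper,qa1,qa2
P1,Yes,Partially
P2,No,Yes
"""

# pd.read_csv accepts a path or any file-like object;
# StringIO keeps this sketch self-contained.
responses = pd.read_csv(io.StringIO(csv_text), index_col="paper")
print(responses.shape)
```

With a real export, the `io.StringIO(...)` argument would simply be replaced by the path to the CSV file.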

In [2]:
qaf_df = QualityAssessment().data

### 3. Creating and validating the scores dataframe

The scores dataframe can be obtained directly from the Quality Assessment Form responses.  
Each paper receives a score from 0.0 to 10.0, the sum of the points for its answers to the ten questions.  
Each __Yes__ answer yields 1.0 point, while __Partially__ and __No__ answers yield 0.5 and 0.0 points, respectively.  
The cell below recomputes each score from the answers and checks it against the score stored in the form.
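
The scoring rule can be illustrated on a toy example. This is only a sketch: the three papers, the two question columns, and the `points` dictionary are hypothetical stand-ins, not the real responses or the constants defined in `utils`.

```python
import pandas as pd

# Hypothetical answers for three papers and two questions (not the real responses).
answers = pd.DataFrame(
    {"qa1": ["Yes", "Partially", "No"], "qa2": ["Yes", "Yes", "Partially"]},
    index=["P1", "P2", "P3"],
)

# Points awarded per answer, following the rule described above.
points = {"Yes": 1.0, "Partially": 0.5, "No": 0.0}

# Map every answer to its points column by column, then sum per paper.
numeric = answers.apply(lambda column: column.map(points))
scores = numeric.sum(axis=1)
print(scores.to_dict())
```

With ten questions instead of two, the same sum yields the 0.0 to 10.0 range.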



In [3]:
answers_df = (qaf_df.replace(to_replace=QualityAssessment.YES, value=1.0)
               .replace(to_replace=QualityAssessment.PARTIALLY, value=0.5)
               .replace(to_replace=QualityAssessment.NO, value=0.0))


(answers_df.iloc[:, -11:-1].sum(axis=1) == qaf_df[QualityAssessment.SCORE]).value_counts()

True    82
dtype: int64

### 4. Creating the answers dataframe

The answers dataframe is obtained by summarizing the original dataframe.  
Each row of the original dataframe represents a paper, and each column represents a question.  
For each question (column), we need the count of each answer.  
These counts are stored in an auxiliary dataframe, initially with one row per possible answer (three in total), which is then transposed into a single row for the question.  
The answers dataframe is built by appending this auxiliary row at each iteration.
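
For comparison, the same summary can be sketched without an explicit loop. This is an alternative under stated assumptions, not the approach used in the cell below: the toy responses are hypothetical, and the answer labels are written out as plain strings instead of the `QualityAssessment` constants.

```python
import pandas as pd

# Hypothetical responses: rows are papers, columns are questions.
responses = pd.DataFrame(
    {
        "qa1": ["Yes", "Yes", "No", "Yes"],
        "qa2": ["Partially", "Yes", "Partially", "No"],
    }
)

# Count each answer per question; answers that never occur become 0.
counts = (
    responses.apply(lambda column: column.value_counts())
    .reindex(["Yes", "Partially", "No"])
    .fillna(0)
    .T
)

# Convert counts to percentages of all papers and add the combined column.
percentages = (counts / len(responses) * 100).round(2)
percentages["Some level of"] = percentages["Yes"] + percentages["Partially"]
print(percentages)
```

Filling missing counts with 0 matters when a question never receives one of the three answers; otherwise the combined column would contain a missing value for that question.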

In [7]:
try:
    del qaf_df[QualityAssessment.SCORE]
except KeyError:
    pass
    
sanitized_answers_df = pd.DataFrame(columns=[QualityAssessment.YES, QualityAssessment.PARTIALLY, QualityAssessment.NO])
for col in qaf_df:
    count = qaf_df[col].value_counts()
      
    aux = pd.DataFrame()
    aux[QualityAssessment.QUESTION] = pd.Series(col for i in range(len(count)))
    aux[QualityAssessment.ANSWER] = count.index
    aux[QualityAssessment.COUNT] = count.values
    
    aux.drop(QualityAssessment.QUESTION, axis=1, inplace=True)
    aux = aux.T
    aux = aux.rename(columns=aux.iloc[0]).drop(QualityAssessment.ANSWER).rename(index={QualityAssessment.COUNT: col})
    aux = aux.reindex(sorted(aux.columns), axis=1)
    
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
    sanitized_answers_df = pd.concat([sanitized_answers_df, aux])
    
number_of_papers = len(qaf_df)
sanitized_answers_df[QualityAssessment.SOME] = sanitized_answers_df[QualityAssessment.PARTIALLY] + sanitized_answers_df[QualityAssessment.YES]
sanitized_answers_df = sanitized_answers_df.div(number_of_papers).mul(100).astype(float).round(2)
sanitized_answers_df = sanitized_answers_df.rename_axis().reset_index().rename(columns={Table.INDEX: QualityAssessment.QUESTION})

with open(f'{OUTPUT_PATH}/QAF-Answers.tex', 'w') as file:

    table_header = (
'''
\\begin{table}[htb!]
\t\\caption{Answers gathered in the QAF}
\t\\begin{tabular}{@{}lrrrr@{}}
\t\t\\toprule
''')
    
    table_footer = (
'''
\t\t\\bottomrule
\t\\end{tabular}
\t\\label{tab:answers}
\\end{table}
''')
    
    file.write(table_header)
    print(table_header, end='')
    
    formatted_header = f'\t\t{QualityAssessment.QUESTION:<4} & {QualityAssessment.YES:>5} & {QualityAssessment.PARTIALLY:>5} & {QualityAssessment.NO:>5} & {QualityAssessment.SOME:>4} \\\\'
    
    file.write(formatted_header)
    print(formatted_header, end='')
    
    midrule = '\n\t\t\\midrule'
    
    file.write(midrule)
    print(midrule, end='')
    
    for index, row in sanitized_answers_df.iterrows():
        formatted_row = f'\n\t\t{row[QualityAssessment.QUESTION].upper():<8} & {row[QualityAssessment.YES]:>4.2f} & {row[QualityAssessment.PARTIALLY]:>9.2f} & {row[QualityAssessment.NO]:>5.2f} & {row[QualityAssessment.SOME]:>13.2f} \\\\'
        
        file.write(formatted_row)
        print(formatted_row, end='')
    
    file.write(table_footer)
    print(table_footer)
    


\begin{table}[htb!]
	\caption{Answers gathered in the QAF}
	\begin{tabular}{@{}lrrrr@{}}
		\toprule
		Question &   Yes & Partially &    No & Some level of \\
		\midrule
		QA1      & 97.56 &      1.22 &  1.22 &         98.78 \\
		QA2      & 68.29 &     20.73 & 10.98 &         89.02 \\
		QA3      & 68.29 &     10.98 & 20.73 &         79.27 \\
		QA4      & 63.41 &      1.22 & 35.37 &         64.63 \\
		QA5      & 91.46 &      2.44 &  6.10 &         93.90 \\
		QA6      & 93.90 &      3.66 &  2.44 &         97.56 \\
		QA7      & 59.76 &     15.85 & 24.39 &         75.61 \\
		QA8      & 51.22 &     37.80 & 10.98 &         89.02 \\
		QA9      & 31.71 &     56.10 & 12.20 &         87.80 \\
		QA10     & 78.05 &     10.98 & 10.98 &         89.02 \\
		\bottomrule
	\end{tabular}
	\label{tab:answers}
\end{table}



### 5. Creating the scores table
Here, we count how many papers received each score and create the scores table used in the paper, including cumulative counts and percentages.
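
The quantities in that table can be sketched with pandas alone. The five scores below are hypothetical, and the column names are chosen only to mirror the table header; the cell that follows computes the same values in a loop while writing the LaTeX rows.

```python
import pandas as pd

# Hypothetical scores for five papers (not the real data).
scores = pd.Series([6.0, 7.5, 6.0, 9.0, 7.5])

# Number of papers per score, ordered by score.
amount = scores.value_counts().sort_index()

summary = pd.DataFrame(
    {
        "Amount": amount,
        "Cumulative Amount": amount.cumsum(),
        "%": (amount / len(scores) * 100).round(2),
    }
)

# Derive the cumulative percentage from the cumulative count,
# which avoids accumulating floating-point error across rows.
summary["Cumulative %"] = (summary["Cumulative Amount"] / len(scores) * 100).round(2)
print(summary)
```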

In [5]:
scores_df = answers_df[QualityAssessment.SCORE]
total_papers = len(scores_df)
cumulative_sum = 0
cumulative_percentage = 0

with open(f'{OUTPUT_PATH}/QAF-Scores.tex', 'w') as file:
    
    percentage = '\\%'
    
    table_header = (
'''
\\begin{table}[htb!]
\t\\caption{Scores of papers included in the primary selection}
\t\\begin{tabular}{@{}rrrrr@{}}
\t\t\\toprule
''')
    
    table_footer = (
'''
\t\t\\bottomrule
\t\\end{tabular}
\t\\label{tab:scores}
\\end{table}
''')
   
    file.write(table_header)
    print(table_header, end='')
    
    formatted_header = f'\t\t{"Score":>4} & {"Amount":>6} & {"Cumulative Amount":>17} & {percentage:>5} & {"Cumulative"} {percentage} \\\\'

    file.write(formatted_header)
    print(formatted_header, end='')
    
    # Start the midrule on its own line so the file matches the printed output.
    midrule = '\n\t\t\\midrule'
    file.write(midrule)
    print(midrule, end='')
    
    # Series.iteritems was removed in pandas 2.0; sort_index().items() iterates in score order.
    for score, amount in scores_df.value_counts().sort_index().items():
        percentage = amount / total_papers * 100
        cumulative_sum += amount
        cumulative_percentage += percentage
        
        formatted_row = f'\n\t\t{score:>5} & {amount:>6} & {cumulative_sum:>17} & {percentage:>5.2f} & {cumulative_percentage:>13.2f} \\\\'
        
        file.write(formatted_row)
        print(formatted_row, end='')
        
    file.write(table_footer)
    print(table_footer, end='')


\begin{table}[htb!]
	\caption{Scores of papers included in the primary selection}
	\begin{tabular}{@{}rrrrr@{}}
		\toprule
		Score & Amount & Cumulative Amount &    \% & Cumulative \% \\
		\midrule
		  5.0 &      1 &                 1 &  1.22 &          1.22 \\
		  5.5 &      6 &                 7 &  7.32 &          8.54 \\
		  6.0 &      9 &                16 & 10.98 &         19.51 \\
		  6.5 &      5 &                21 &  6.10 &         25.61 \\
		  7.0 &      8 &                29 &  9.76 &         35.37 \\
		  7.5 &      6 &                35 &  7.32 &         42.68 \\
		  8.0 &     13 &                48 & 15.85 &         58.54 \\
		  8.5 &      5 &                53 &  6.10 &         64.63 \\
		  9.0 &     11 &                64 & 13.41 &         78.05 \\
		  9.5 &     16 &                80 & 19.51 &         97.56 \\
		 10.0 &      2 &                82 &  2.44 &        100.00 \\
		\bottomrule
	\end{tabular}
	\label{tab:scores}
\end{table}


### 6. Counting number of papers included and excluded
Shows the number of papers included (False) and excluded (True) by the Quality Assessment Form. Papers with a score below 6.5 are excluded.
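
A small sketch of the same comparison with readable labels, assuming the 6.5 cutoff used in the cell below; the scores and the `CUTOFF` name are hypothetical.

```python
import pandas as pd

# Hypothetical scores for five papers (not the real data).
scores = pd.Series([5.5, 6.0, 6.5, 8.0, 9.5])
CUTOFF = 6.5  # papers scoring strictly below this value are excluded

# Boolean mask: True marks an excluded paper.
excluded = scores < CUTOFF
summary = {
    "included": int((~excluded).sum()),
    "excluded": int(excluded.sum()),
}
print(summary)
```

Because the comparison is strict, a paper scoring exactly 6.5 is included.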

In [6]:
(scores_df < 6.5).value_counts()

False    66
True     16
Name: score, dtype: int64