# Exploration of Interview Data

This notebook contains helper functions for exploring the coded interview data further.

In [None]:
import pandas as pd

## Data Preparation

Load the data and remove rows where the interview participant explicitly stated that they do not perceive any quality defect. Unlike in the [analysis](./../analytics/analysis.ipynb), we retain (1) incomplete and (2) unspecific data, as it may be of interest for other types of investigation.

In [None]:
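# load the coded interview data; empty cells are replaced by the placeholder 'na'
df = pd.read_excel('./../data/interview-data.xlsx', sheet_name='Data').fillna('na')
# keep only the rows whose 'M' flag is True (this drops the rows described above)
df = df[df['M'] == True]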

In [None]:
# list the variables that contain codes, i.e., exclude all variables that hold supplementary information or verbatim mentions
allvars = [
    'ID',
    'Quality Factor 1', 'Entity-Fact 1', 'Quality Factor 2', 'Entity-Fact 2',
    'Context Factor 1', 'Context Factor 2', 'Context Factor 3',
    'Activity 1', 'Attribute 1', 'Impact 1', 'Activity 2', 'Attribute 2', 'Impact 2'
]

## Helper Functions

In the following code blocks, we define helper functions that generate query strings for the `pandas.DataFrame.query()` method. For example, given a quality factor, the `query_qf()` function constructs a query that matches all rows in which either the column `Quality Factor 1` or the column `Quality Factor 2` equals the provided value. If an entity-fact is provided as well, the corresponding `Entity-Fact` column of the same index must also match.

In [None]:
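def query_qf(qualityfactor: str, entityfact: str = "") -> str:
    """Build a query matching rows where either quality factor slot (1 or 2) fits the given criteria."""
    queries: list[str] = []
    for idx in ["1", "2"]:
        query: str = f'`Quality Factor {idx}`=="{qualityfactor}"'
        # the entity-fact must belong to the same slot as the quality factor
        if entityfact:
            query += f' & `Entity-Fact {idx}`=="{entityfact}"'
        queries.append(query)

    # the outer parentheses keep the result safe to combine with other sub-queries
    return '(' + ' | '.join(queries) + ')'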
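
To illustrate what such a generated string looks like, the following minimal sketch writes out by hand the string that `query_qf('orientation', 'solution')` returns and applies it to a small toy data frame. The rows of `toy` (including the value `ambiguity`) are made up for illustration and are not taken from the interview data; backtick-quoted column names containing a hyphen assume pandas 1.0 or later.

```python
import pandas as pd

# Hypothetical toy rows; not taken from the interview data.
toy = pd.DataFrame({
    'Quality Factor 1': ['orientation', 'orientation', 'ambiguity'],
    'Entity-Fact 1':    ['solution', 'problem', 'na'],
    'Quality Factor 2': ['na', 'na', 'orientation'],
    'Entity-Fact 2':    ['na', 'na', 'solution'],
})

# The string that query_qf('orientation', 'solution') returns, written out by hand.
query = (
    '(`Quality Factor 1`=="orientation" & `Entity-Fact 1`=="solution"'
    ' | `Quality Factor 2`=="orientation" & `Entity-Fact 2`=="solution")'
)

# Row 0 matches via slot 1, row 2 matches via slot 2,
# row 1 has the right quality factor but a different entity-fact.
matches = toy.query(query)
print(matches.index.tolist())
```

Because `&` binds more tightly than `|` in a query expression, each quality factor is paired with the entity-fact of its own slot before the two alternatives are combined.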

In [None]:
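def query_context(contextfactors: list[str]) -> str:
    """Build a query matching rows where any context factor column holds one of the given values."""
    col_ctx: list[str] = ['Context Factor 1', 'Context Factor 2', 'Context Factor 3']

    # the list is rendered with its Python repr, which query() accepts as a list literal
    queries = [f'`{column}` in {contextfactors}' for column in col_ctx]

    return '(' + ' | '.join(queries) + ')'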
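
The context query relies on the `in` operator of `DataFrame.query()` and on the fact that an f-string renders a Python list as a list literal. The sketch below rebuilds the same kind of string on a toy data frame and cross-checks it against an equivalent `isin` filter. The rows and the context factor names `Experience` and `Domain` are hypothetical and serve only as illustration.

```python
import pandas as pd

# Hypothetical toy rows; 'Experience' and 'Domain' are made-up context factors.
toy = pd.DataFrame({
    'Context Factor 1': ['Involvement', 'Experience', 'na'],
    'Context Factor 2': ['na', 'Involvement', 'na'],
    'Context Factor 3': ['na', 'na', 'na'],
})

wanted = ['Involvement', 'Domain']

# One membership test per column, joined with | and wrapped in parentheses,
# mirroring the shape of the string built by query_context(wanted).
query = '(' + ' | '.join(f'`{col}` in {wanted}' for col in toy.columns) + ')'
print(query)

matches = toy.query(query)

# Equivalent filter without a query string, used here only as a cross-check.
expected = toy[toy.isin(wanted).any(axis=1)]
print(matches.index.tolist(), expected.index.tolist())
```

A row matches as soon as any of the three context factor columns contains one of the requested values, regardless of the slot in which it was coded.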

In [None]:
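from typing import Optional

def query_activity(activity: str, attribute: str = "", impact: Optional[int] = None) -> str:
    """Build a query matching rows where either activity slot (1 or 2) fits the given criteria."""
    queries: list[str] = []
    for idx in ["1", "2"]:
        query = f'`Activity {idx}`=="{activity}"'
        if attribute:
            query += f' & `Attribute {idx}`=="{attribute}"'
        # compare against None so that an impact of 0 is not silently ignored,
        # and apply the impact filter even when no attribute is given
        if impact is not None:
            query += f' & `Impact {idx}`=={impact}'
        queries.append(query)

    return '(' + ' | '.join(queries) + ')'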

## Exploration

With the helper functions, we can generate query strings that filter the data for specific attributes.

The first query retrieves all statements in which the quality factor `orientation` impacts the activity of `Understanding`.

In [None]:
qqf = query_qf(qualityfactor='orientation')
qac = query_activity(activity='Understanding')
df.query(f'{qqf} & {qac}')[allvars]

The second query retrieves all statements in which the quality factor `orientation` with the value `solution` and the context factor `Involvement` impact the attribute `Uniqueness` of the activity `Understanding`.

In [None]:
qqf = query_qf(qualityfactor='orientation', entityfact='solution')
qct = query_context(contextfactors=['Involvement'])
qac = query_activity(activity='Understanding', attribute='Uniqueness')
df.query(f'{qqf} & {qct} & {qac}')[allvars]
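
Both queries above join the helper outputs with `&`. This only works as intended because every helper wraps its result in parentheses. The following sketch shows the difference on a toy data frame: without the parentheses, `&` binds more tightly than `|` and the activity filter applies only to the last alternative. The rows and the activity name `Estimating` are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows; 'Estimating' is a made-up activity.
toy = pd.DataFrame({
    'Quality Factor 1': ['orientation', 'na', 'na'],
    'Quality Factor 2': ['na', 'orientation', 'na'],
    'Activity 1':       ['Estimating', 'Understanding', 'Understanding'],
})

# A quality factor sub-query WITHOUT the outer parentheses the helpers add.
qf_bare = '`Quality Factor 1`=="orientation" | `Quality Factor 2`=="orientation"'
act = '`Activity 1`=="Understanding"'

# Parsed as: QF1 matches | (QF2 matches & activity matches)
unwrapped = toy.query(f'{qf_bare} & {act}')

# Parsed as: (QF1 matches | QF2 matches) & activity matches
wrapped = toy.query(f'({qf_bare}) & {act}')

print(unwrapped.index.tolist())  # row 0 slips through despite its activity
print(wrapped.index.tolist())
```

When adding further helper functions, returning a fully parenthesized string keeps them composable in the same way.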