# Surveys

Survey data consists of the following columns:
* `id` for the question identifier
* `answer` for the answer to the question
* `q` for the text of the question as presented to the user (optional)
* As usual, the DataFrame index is the timestamp of the answer. By convention, all responses in a single survey instance share the same timestamp, and this is used to link the responses of one survey together.

The raw on-disk format is "long", that is, one row per answer, which is "tidy data". This is the most flexible format, but you often need to transform it further before analysis.
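
One common transformation is pivoting the long format into a wide table with one row per survey instance. The sketch below uses plain pandas on made-up data; the question ids, answers, and timestamps are illustrative assumptions and do not come from the niimpy sample data.

```python
import pandas as pd

# Hypothetical long-format survey data: one row per answer.
# All answers of one survey instance share the same timestamp.
long_df = pd.DataFrame(
    {
        "id": ["PHQ2_1", "PHQ2_2", "PHQ2_1", "PHQ2_2"],
        "answer": [1, 2, 0, 3],
    },
    index=pd.to_datetime(
        [
            "2020-01-01 09:00",
            "2020-01-01 09:00",
            "2020-01-02 09:00",
            "2020-01-02 09:00",
        ]
    ),
)
long_df.index.name = "timestamp"

# Pivot to wide format: one row per survey instance (timestamp),
# one column per question id.
wide_df = long_df.reset_index().pivot(
    index="timestamp", columns="id", values="answer"
)
print(wide_df)
```

The shared timestamp is what makes this pivot possible: it acts as the key that groups the answers of one survey instance into a single row.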


## Load data

In [1]:
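# Artificial example survey data
import pandas as pd

import niimpy
import niimpy.preprocessing.survey as survey

# `config` is a local module that defines SURVEY_PATH;
# it must be importable from the working directory.
from config import config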

In [None]:
df = niimpy.read_csv(config.SURVEY_PATH, tz='Europe/Helsinki')
df.head()

## Preprocessing

The dataframe's columns are the raw questions from a survey. Some questions belong to a specific questionnaire category, so we annotate them with ids. Each id consists of a prefix (the questionnaire category: GAD, PHQ, PSQI, etc.) followed by the question number (1, 2, 3). Similarly, we will also convert the answers to meaningful numerical values.

Note: The dataframe must follow the schema produced below (long format, with one row per answer and an `id` column for the question) before it is passed to niimpy.

In [None]:
# Convert column name to id, based on provided mappers from niimpy
# (the mappers are assumed to live in the imported `survey` module)
col_id = {
    **survey.PHQ2_MAP,
    **survey.PSQI_MAP,
    **survey.PSS10_MAP,
    **survey.PANAS_MAP,
    **survey.GAD2_MAP,
}
selected_cols = [col for col in df.columns if col in col_id]

# Convert from wide to long format
transformed_df = pd.melt(
    df,
    id_vars=['user', 'age', 'gender'],
    value_vars=selected_cols,
    var_name='question',
    value_name='raw_answer',
)

# Assign questions to codes 
transformed_df['id'] = transformed_df['question'].replace(col_id)
transformed_df.head()
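
To make the reshaping step concrete without the sample data, here is a minimal self-contained sketch of the same wide-to-long conversion. The question texts, answer labels, and the `question_to_id` mapper are hypothetical stand-ins for the mappers that niimpy provides.

```python
import pandas as pd

# Hypothetical wide-format raw data: one column per question text.
raw = pd.DataFrame(
    {
        "user": ["u1", "u2"],
        "Little interest or pleasure in doing things.": [
            "not-at-all",
            "several-days",
        ],
        "Feeling down, depressed or hopeless.": [
            "several-days",
            "nearly-every-day",
        ],
    }
)

# Hypothetical mapper from question text to id (prefix + question number).
question_to_id = {
    "Little interest or pleasure in doing things.": "PHQ2_1",
    "Feeling down, depressed or hopeless.": "PHQ2_2",
}

# Keep only the columns that have a known id.
selected = [col for col in raw.columns if col in question_to_id]

# Wide to long: one row per (user, question) pair.
long_df = pd.melt(
    raw,
    id_vars=["user"],
    value_vars=selected,
    var_name="question",
    value_name="raw_answer",
)

# Attach the question id to every row.
long_df["id"] = long_df["question"].map(question_to_id)
print(long_df)
```

Using `map` here leaves unmapped questions as `NaN`, whereas `replace` would keep the original text; which behaviour is preferable depends on how strictly you want to detect unknown questions.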

Moreover, `niimpy` can convert the raw answers to numerical values for further analysis. This requires a mapping `{raw_answer: numerical_answer}`. `niimpy` provides such mappings in the `survey` module, and you can adjust them to your own needs.

Based on the question's id, `niimpy` maps each raw answer to its numerical representation.

In [None]:
# Transform raw answers to numerical values
transformed_df['answer'] = survey.survey_convert_to_numerical_answer(
    transformed_df,
    answer_col='raw_answer',
    question_id='id',
    id_map=survey.ID_MAP_PREFIX,  # mapping provided by the survey module
    use_prefix=True,
)
transformed_df.head()
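
The idea of a prefix-based lookup can be sketched in plain pandas. This is an illustration only, not niimpy's implementation; the `answer_maps` dictionary, its labels, and the helper `to_numerical` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical mapping: questionnaire prefix -> {raw_answer: numerical_answer}.
answer_maps = {
    "PHQ2": {
        "not-at-all": 0,
        "several-days": 1,
        "more-than-half-the-days": 2,
        "nearly-every-day": 3,
    },
    "GAD2": {
        "not-at-all": 0,
        "several-days": 1,
        "more-than-half-the-days": 2,
        "nearly-every-day": 3,
    },
}

long_df = pd.DataFrame(
    {
        "id": ["PHQ2_1", "PHQ2_2", "GAD2_1"],
        "raw_answer": ["not-at-all", "nearly-every-day", "several-days"],
    }
)


def to_numerical(row):
    """Look up the numerical answer using the prefix of the question id."""
    prefix = row["id"].split("_")[0]
    # .get returns None for raw answers that are missing from the mapping.
    return answer_maps[prefix].get(row["raw_answer"])


long_df["answer"] = long_df.apply(to_numerical, axis=1)
print(long_df)
```

Keying the mapping on the prefix rather than the full id means that all questions of one questionnaire share a single answer scale, which keeps the mapping short.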

## Print survey statistics

Now that the survey is preprocessed, we can extract some meaningful statistics from it.

First, we can compute the mean, standard deviation, min, and max values of all questionnaires.

In [None]:
d = survey.survey_print_statistic(transformed_df, question_id_col='id', answer_col='answer')
pd.DataFrame(d)

To restrict the statistics to a single questionnaire, pass its prefix to the `prefix` parameter.

In [None]:
d = survey.survey_print_statistic(transformed_df, question_id_col='id', answer_col='answer', prefix='PHQ')
pd.DataFrame(d)
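
For readers who want to see what per-questionnaire statistics can look like, here is a rough pandas sketch. It assumes that a questionnaire score is the sum of one user's answers to that questionnaire; this scoring rule and the sample data are assumptions for illustration and may differ from what `survey_print_statistic` computes internally.

```python
import pandas as pd

# Hypothetical preprocessed survey data in long format.
long_df = pd.DataFrame(
    {
        "user": ["u1", "u1", "u2", "u2", "u1", "u2"],
        "id": ["PHQ2_1", "PHQ2_2", "PHQ2_1", "PHQ2_2", "GAD2_1", "GAD2_1"],
        "answer": [0, 1, 2, 3, 1, 3],
    }
)

# Derive the questionnaire prefix from the question id.
long_df["prefix"] = long_df["id"].str.split("_").str[0]

# Assumed scoring rule: sum each user's answers within a questionnaire.
scores = long_df.groupby(["prefix", "user"])["answer"].sum()

# Mean, standard deviation, min, and max of the scores per questionnaire.
stats = scores.groupby(level="prefix").agg(["mean", "std", "min", "max"])
print(stats)

# Statistics for a single questionnaire, selected by prefix.
phq_stats = stats.loc["PHQ2"]
print(phq_stats)
```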