# Recoding data using Tally, the API for market research

Tally is available on PyPI. To install it with pip, run:

```
pip install datasmoothie-tally-client
```

If you are running this in Gitpod, the Python client is already installed.

In [89]:
import tally
import os
import pandas as pd
import pprint as pp

## Working with different data sources

Tally works with SPSS files, CSV files, the Confirmit API, and Unicom/Dimensions files (mdd/ddf). This example loads an SPSS file.

You need a Tally API key to run the example. Get in touch at info@datasmoothie.com if you need one.

In [90]:
# The Tally key is stored in an environment variable; get in touch to get your own key.
dataset = tally.DataSet(api_key=os.environ.get('tally_api_key'))
dataset.use_spss('data/Example Data (A).sav')

# also compatible with Confirmit, Nebu, Dimensions.
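
Note that `os.environ.get` returns `None` when the variable is not set, so a missing key is passed on silently. A minimal sketch of an explicit check is shown here; `require_env` is a hypothetical helper written for this example and is not part of the Tally client.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        # Hypothetical guard: stop early instead of passing None as the API key.
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value
```

With this helper, the key could be read with `require_env('tally_api_key')` before creating the `DataSet`.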

## Recode a variable

We will recode a variable called `q14r06c03`, which asks whether people agree that the waiting time in Store 3 is acceptable, using Tally's recode function ([documented here](https://tally.datasmoothie.com/#tag/Data-Processing/operation/recode)).

We want to reverse the answer codes so that 1 becomes 5, 2 becomes 4, and so on.

First we take a look at what the meta currently looks like:

In [91]:
dataset.meta(variable='q14r06c03')


| | codes | texts | missing |
|---|---|---|---|
| 1 | 1 | Strongly disagree | |
| 2 | 2 | Disagree | |
| 3 | 3 | Neither agree nor disagree | |
| 4 | 4 | Agree | |
| 5 | 5 | Strongly agree | |


Because the recode method can take very complicated mapping instructions, it leaves the metadata to us: when we recode, we have to make sure the labels match the new encoding. So we also create a mapper for the labels.

In [92]:
new_label_mapper = {(i+1):k for i,k in enumerate(reversed(list(dataset.meta(variable='q14r06c03')['texts'])))}
new_label_mapper

{1: 'Strongly agree',
 2: 'Agree',
 3: 'Neither agree nor disagree',
 4: 'Disagree',
 5: 'Strongly disagree'}
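
The comprehension above can be followed without an API call. The sketch below rebuilds the same mapping from a plain list of labels copied from the meta output; it assumes the codes run contiguously from 1, as they do for this variable.

```python
# Labels in their original code order (copied from the meta output).
texts = [
    "Strongly disagree",
    "Disagree",
    "Neither agree nor disagree",
    "Agree",
    "Strongly agree",
]

# Number the reversed labels from 1, so code 1 now carries the last label.
new_label_mapper = {
    code: text for code, text in enumerate(reversed(texts), start=1)
}
```

Using `enumerate(..., start=1)` avoids the manual `i+1` offset and gives the same dictionary.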

Then we create a mapper for the data and run `recode` and `set_value_texts`.

In [93]:
mapper = {
    5: {'q14r06c03':[1]},
    4: {'q14r06c03':[2]},
    3: {'q14r06c03':[3]},
    2: {'q14r06c03':[4]},
    1: {'q14r06c03':[5]}
}
dataset.recode(target='q14r06c03', mapper=mapper)
dataset.set_value_texts(name='q14r06c03', renamed_vals=new_label_mapper)
dataset.meta(variable='q14r06c03')

| | codes | texts | missing |
|---|---|---|---|
| 1 | 1 | Strongly agree | |
| 2 | 2 | Agree | |
| 3 | 3 | Neither agree nor disagree | |
| 4 | 4 | Disagree | |
| 5 | 5 | Strongly disagree | |
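
For longer scales, the data mapper does not have to be typed by hand. The following is a minimal sketch that generates the same structure as the hand-written mapper (new code as key, `{variable: [old code]}` as value). `reversed_recode_mapper` is a hypothetical helper, and it assumes the codes are evenly spaced so that mirroring around `low + high` gives a valid reversal.

```python
def reversed_recode_mapper(variable, codes):
    """Map each new code to the old code it is taken from, reversing the scale."""
    low, high = min(codes), max(codes)
    # new code -> {variable: [old code]}, mirroring the hand-written mapper
    return {low + high - old: {variable: [old]} for old in sorted(codes)}


mapper = reversed_recode_mapper("q14r06c03", [1, 2, 3, 4, 5])
```

The resulting dictionary is equal to the literal `mapper` used in the recode call.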


In [94]:
dataset.crosstab(x='q14r06c03')

| Question | Values | Total |
|---|---|---|
| q14r06c03. Store 3 - The wait time when checking out was acceptable. | Base | 4091.0 |
| q14r06c03. Store 3 - The wait time when checking out was acceptable. | Strongly agree | 0.0 |
| q14r06c03. Store 3 - The wait time when checking out was acceptable. | Agree | 1040.0 |
| q14r06c03. Store 3 - The wait time when checking out was acceptable. | Neither agree nor disagree | 1007.0 |
| q14r06c03. Store 3 - The wait time when checking out was acceptable. | Disagree | 1011.0 |
| q14r06c03. Store 3 - The wait time when checking out was acceptable. | Strongly disagree | 1033.0 |
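
A reversal only relabels respondents, so each category's count should carry over to its mirrored code. One way to sanity-check that idea locally is sketched below with the standard library; the answers are toy data, not values from the example data set, and the check is independent of Tally.

```python
from collections import Counter

# Toy answers, not taken from the example data set.
answers = [1, 1, 2, 3, 5, 5, 5]
reverse = {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}

recoded = [reverse[a] for a in answers]

before = Counter(answers)
after = Counter(recoded)

# Every old code's count should reappear under its mirrored new code.
mismatches = {
    old: (before[old], after[new])
    for old, new in reverse.items()
    if before[old] != after[new]
}
```

An empty `mismatches` dictionary means the counts were preserved by the reversal.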


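### Removing answer codes

`remove_values` removes the listed codes from the variable. Here we remove code 3 ("Neither agree nor disagree"), which also lowers the base.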

In [95]:
dataset.remove_values(
    name="q14r06c03",
    remove=[3]
)
dataset.crosstab(x='q14r06c03')

| Question | Values | Total |
|---|---|---|
| q14r06c03. Store 3 - The wait time when checking out was acceptable. | Base | 3084.0 |
| q14r06c03. Store 3 - The wait time when checking out was acceptable. | Strongly agree | 0.0 |
| q14r06c03. Store 3 - The wait time when checking out was acceptable. | Agree | 1040.0 |
| q14r06c03. Store 3 - The wait time when checking out was acceptable. | Disagree | 1011.0 |
| q14r06c03. Store 3 - The wait time when checking out was acceptable. | Strongly disagree | 1033.0 |
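
The change in the base can be checked by hand: it drops by exactly the number of respondents who held the removed code. The sketch below redoes that arithmetic with the counts copied from the crosstab before the removal; it is plain Python and does not call Tally.

```python
# Counts copied from the crosstab before the removal.
counts_before = {
    "Strongly agree": 0,
    "Agree": 1040,
    "Neither agree nor disagree": 1007,
    "Disagree": 1011,
    "Strongly disagree": 1033,
}
removed = "Neither agree nor disagree"

# Drop the removed category and recompute the base from what is left.
counts_after = {
    label: n for label, n in counts_before.items() if label != removed
}

base_before = sum(counts_before.values())  # 4091
base_after = sum(counts_after.values())    # 3084
```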


### Extending answer codes

`extend_values` adds new codes and labels to the variable. Here we add code 99, "No answer".

In [96]:
dataset.extend_values(name='q14r06c03', ext_values=[[99, "No answer"]])
dataset.meta(variable='q14r06c03')

| | codes | texts | missing |
|---|---|---|---|
| 1 | 1 | Strongly agree | |
| 2 | 2 | Agree | |
| 3 | 4 | Disagree | |
| 4 | 5 | Strongly disagree | |
| 5 | 99 | No answer | |
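
Conceptually, extending a variable appends new (code, text) pairs to its value list. The sketch below illustrates that on a plain Python list and adds a guard against reusing a code. `extend_codes` is a hypothetical helper for illustration only; it is not how Tally implements `extend_values`, and whether Tally rejects duplicate codes is an assumption not confirmed here.

```python
def extend_codes(values, new_values):
    """Return a new list of (code, text) pairs with new_values appended."""
    existing = {code for code, _ in values}
    extended = list(values)
    for code, text in new_values:
        if code in existing:
            # Hypothetical guard: refuse to reuse a code that already has a label.
            raise ValueError(f"Code {code} is already in use")
        extended.append((code, text))
        existing.add(code)
    return extended


# Value list after the removal step, copied from the meta output.
values = [
    (1, "Strongly agree"),
    (2, "Agree"),
    (4, "Disagree"),
    (5, "Strongly disagree"),
]
extended = extend_codes(values, [(99, "No answer")])
```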
