# Example of working with Parquet data files

To get started:

1. Upload your Parquet file and CSV metadata file to the `data` folder. (Here they are called `Tabulation_Test_Project.pq` and `Tabulation_Test_Project.csv`.)
1. Paste your Tally API key into the cell that defines the variable `tally_api_key`.


In [1]:
import tally

In [2]:
# insert your Tally API key here, replacing the placeholder string
tally_api_key = 'YOUR_TALLY_API_KEY'


In [3]:
dataset = tally.DataSet(api_key=tally_api_key)
dataset.use_parquet('data/Tabulation_Test_Project.pq', 'data/Tabulation_Test_Project.csv')
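
Before running the cell above, it can help to confirm that both files are where the notebook expects them. The following is a minimal plain-Python sketch and is not part of the tally library; `missing_files` is a hypothetical helper name introduced only for this illustration.

```python
from pathlib import Path


def missing_files(paths):
    """Hypothetical helper (not part of tally): return the paths that are not existing files."""
    return [str(p) for p in paths if not Path(p).is_file()]


# The file names match the ones passed to use_parquet in this notebook.
expected = [
    "data/Tabulation_Test_Project.pq",
    "data/Tabulation_Test_Project.csv",
]

missing = missing_files(expected)
if missing:
    print("Upload these files to the data folder first:", missing)
else:
    print("Both files found.")
```

If the list comes back empty, both uploads are in place and `use_parquet` can be called with the same two paths.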

In [53]:
# use the crosstab method to run calculations (always pass parameters by name, e.g. x='q2')
dataset.crosstab(x='q2')

In [54]:
# use the ci parameter to choose whether to show counts or percentages
dataset.crosstab(x='q2', ci=['counts'])

In [55]:
# other parameters include:
# y - the banner variable
# w - the weight variable
# sig_level - the alpha level(s) to use for sig-testing
# base - which bases to show
dataset.crosstab(x='q7', y='q2', w='weightings.q4_2020', ci=['c%', 'counts'], sig_level=[0.05], base='both')
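
To make the difference between `'counts'` and `'c%'` concrete, here is a small plain-Python sketch. It assumes `'c%'` means column percentages (each cell divided by the total of its banner column), which this notebook does not state explicitly. The data and function names are made up for illustration, the calculation is unweighted, and it is not how tally computes its tables.

```python
from collections import Counter

# Made-up (y, x) answer pairs for illustration only; not taken from the test project.
responses = [
    ("Male", "Yes"), ("Male", "No"), ("Male", "Yes"),
    ("Female", "Yes"), ("Female", "No"),
]


def column_counts(pairs):
    """Count x answers within each y (banner) column."""
    counts = {}
    for y_value, x_value in pairs:
        counts.setdefault(y_value, Counter())[x_value] += 1
    return counts


def column_percentages(counts):
    """Convert each column's counts to percentages of that column's base."""
    result = {}
    for y_value, column in counts.items():
        base = sum(column.values())
        result[y_value] = {x: 100 * n / base for x, n in column.items()}
    return result


counts = column_counts(responses)
percentages = column_percentages(counts)
print(counts["Male"]["Yes"])         # 2
print(percentages["Female"]["Yes"])  # 50.0
```

Under that assumption, counts show how many respondents fall in each cell, while column percentages show each cell's share of its own banner column.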

## Create nets or new variables with the derive method

In [56]:
# we start by looking at variable q14 and deciding we need a top/bottom 2 net
dataset.crosstab(x='q14')

In [57]:
# We create the NET by supplying a condition map, where we send the codes for the answers we want to combine
cond_map = [
    (1, "NET: Less bottom 2", {'q14': [1, 2]}),
    (2, "Neutral", {'q14': [3]}),
    (3, "NET: More top 2", {'q14': [4, 5]})
]
result = dataset.derive(name='q14_net', label='NET: More or less likely?', cond_map=cond_map, qtype="single")
dataset.crosstab(x='q14_net')
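
The recoding that a condition map describes can be sketched locally in plain Python. This is only an illustration of the mapping logic, assuming each answer falls into the first entry whose code list contains it (which suits a `"single"` variable). It is not tally's implementation, and `recode` and `sample_answers` are hypothetical names.

```python
# Same structure as the condition map above: (new code, label, {variable: [source codes]}).
cond_map = [
    (1, "NET: Less bottom 2", {"q14": [1, 2]}),
    (2, "Neutral", {"q14": [3]}),
    (3, "NET: More top 2", {"q14": [4, 5]}),
]


def recode(answer, cond_map, variable="q14"):
    """Return (new code, label) for the first entry that lists the answer, else None."""
    for new_code, label, condition in cond_map:
        if answer in condition.get(variable, []):
            return new_code, label
    return None


sample_answers = [1, 3, 5, 2, 4]  # made-up q14 codes
for answer in sample_answers:
    print(answer, "->", recode(answer, cond_map))
```

Each original code from 1 to 5 lands in exactly one of the three new categories, which is why the derived variable can be treated as single-choice.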

### Build PowerPoint presentations

This cell will generate a file called `my_powerpoint.pptx` in the root folder.

In [49]:
dataset.build_powerpoint(filename='my_powerpoint.pptx', 
                         powerpoint_template='pptx_templates/Datasmoothie_Template.pptx',
                         x=['q8', 'q14_net'],
                         y=['q2', 'q7'])

<Response [200]>

### Build Excel tables

This cell will generate a file called `my_excel.xlsx` in the root folder.

In [52]:
dataset.build_excel(filename='my_excel.xlsx',
                    x=['q8', 'q14_net'],
                    y=['q2', 'q7'])

<Response [200]>