# Calling Reprosyn from a Python script

Reprosyn is first and foremost a command-line tool. It does not yet have a convenient API as a package.

This notebook describes how to call the Reprosyn method `mst` programmatically, varying its parameters and saving the output.

We assume that you have installed Reprosyn into the Python environment you are working in, using `pip install git+https://github.com/alan-turing-institute/reprosyn`.
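Since the cells that follow shell out to the `rsyn` command, it can help to confirm that the command is visible on your `PATH` before running them. The following is a minimal sketch using only the standard library; the helper name `cli_available` is hypothetical and not part of Reprosyn.

```python
import shutil


def cli_available(name="rsyn"):
    """Return True if an executable called `name` can be found on PATH.

    `cli_available` is an illustrative helper, not a Reprosyn function.
    """
    return shutil.which(name) is not None


if not cli_available():
    print("rsyn was not found on PATH; check that the pip install succeeded "
          "in this environment.")
```

If the message is printed, the most likely cause is that the notebook kernel uses a different environment from the one where Reprosyn was installed.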

In [13]:
import pandas as pd
import subprocess
from io import StringIO


In [3]:
census = pd.read_csv('https://raw.githubusercontent.com/alan-turing-institute/reprosyn/main/src/reprosyn/datasets/2011-census-microdata/2011-census-microdata-small.csv')

census.head()

Unnamed: 0,Person ID,Region,Residence Type,Family Composition,Population Base,Sex,Age,Marital Status,Student,Country of Birth,Health,Ethnic Group,Religion,Economic Activity,Occupation,Industry,Hours worked per week,Approximated Social Grade
0,7394816,E12000001,H,2,1,2,6,2,2,1,2,1,2,5,8,2,-9,4
1,7394745,E12000001,H,5,1,1,4,1,2,1,1,1,2,1,8,6,4,3
2,7395066,E12000001,H,3,1,2,4,1,2,1,1,1,1,1,6,11,3,4
3,7395329,E12000001,H,3,1,2,2,1,2,1,2,1,2,1,7,7,3,2
4,7394712,E12000001,H,3,1,1,5,4,2,1,1,1,2,1,1,4,3,2


`Reprosyn` reads its input from `STDIN` and writes its output to `STDOUT` unless it is explicitly given file paths.

This means we can easily loop over parameter values using `subprocess`. To keep the example fast, we fix the size of each generated dataset to `10`.

In [18]:
size = 10
epsilon = [1,10,100]
command='mst'
inp = bytes(census.to_csv(), 'utf-8')

outputs = {}
for e in epsilon:
    print(f"running {command} for epsilon = {e}")
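    # Pipe the census CSV to rsyn on STDIN and capture STDOUT and STDERR.
    out = subprocess.run(["rsyn", "--size", f"{size}", command, "--epsilon", f"{e}"], input=inp, capture_output=True)
    print('stderr: ', out.stderr)
    # Parse the synthetic dataset that rsyn wrote to STDOUT.
    df = pd.read_csv(StringIO(out.stdout.decode()))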
    outputs[e] = df


running mst for epsilon = 1
running mst for epsilon = 10
running mst for epsilon = 100
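The loop above keeps each result in memory and prints `stderr` without checking whether the command succeeded. As a rough sketch of how the same pattern could fail loudly and also write each result to disk, the helpers below wrap `subprocess.run` and save CSV text per epsilon. The names `run_cli` and `save_outputs`, the `mst_epsilon_<key>.csv` file naming, and the `synthetic` directory are assumptions made for illustration; none of them come from Reprosyn.

```python
import subprocess
from pathlib import Path


def run_cli(argv, csv_text):
    """Pipe `csv_text` to a command's STDIN and return its STDOUT as text.

    Raises RuntimeError carrying the command's STDERR if it exits non-zero.
    (Hypothetical helper, not part of Reprosyn.)
    """
    result = subprocess.run(
        argv,
        input=csv_text.encode("utf-8"),
        capture_output=True,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} exited with code {result.returncode}: "
            f"{result.stderr.decode(errors='replace')}"
        )
    return result.stdout.decode("utf-8")


def save_outputs(outputs, directory):
    """Write each CSV string to <directory>/mst_epsilon_<key>.csv.

    Returns a dict mapping each key to the path that was written.
    (The file naming scheme is an assumption for this example.)
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = {}
    for key, text in outputs.items():
        path = directory / f"mst_epsilon_{key}.csv"
        path.write_text(text, encoding="utf-8")
        paths[key] = path
    return paths


# Possible usage with the variables defined in the cells above:
# raw = {e: run_cli(["rsyn", "--size", str(size), command, "--epsilon", str(e)],
#                   census.to_csv())
#        for e in epsilon}
# save_outputs(raw, "synthetic")
```

Each saved file can then be loaded with `pd.read_csv` in a later session, so the synthetic datasets do not need to be regenerated.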


In [21]:
outputs

{1:       Unnamed: 0  Region  ...  Hours worked per week  Approximated Social Grade
 0   0       39367       0  ...                    ...                          
 1   1        1710       3  ...                    ...                          
 2   2       29961       4  ...                    ...                          
 3   3       13996       4  ...                    ...                          
 4   4       45921       5  ...                    ...                          
 5   5       31872       0  ...                    ...                          
 6   6       47757       1  ...                    ...                          
 7   7       28398       2  ...                    ...                          
 8   8       34303       5  ...                    ...                          
 9   9       31949       2  ...                    ...                          
 [10 rows x 18 columns],
 10:       Unnamed: 0  R