# Creating Estimating Equations from Data Frames for PyMC and BAMBI

The following functions are designed to automatically output estimating equations for use in PyMC and BAMBI.

We first create an example pandas DataFrame for use with the functions below.

In [2]:
# Import pandas library
import pandas as pd
  
# Initialize a list of lists, one row per person; Income is stored as a number, not a string
data = [['Tom', 25, 16, 55000], ['Nick', 35, 16, 75000], ['Juli', 44, 16, 125000], ['Don', 20, 14, 50000], ['Ella', 18, 12, 35000]]
  
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'Age', 'Education', 'Income'])
  
# print dataframe.
df

,Name,Age,Education,Income
0,Tom,25,16,55000
1,Nick,35,16,75000
2,Juli,44,16,125000
3,Don,20,14,50000
4,Ella,18,12,35000


The following function prints an estimating equation formatted for use in BAMBI.

The `dependent` argument is a string naming the dependent variable, and the `independent` argument is a list of the independent variables that you wish to use in your regression model.

In [3]:
def bambi_equation(dependent, independent):
    # Join the independent variables with ' + ' to form the right-hand side
    right_hand_side = ' + '.join(independent)
    # Print the formula wrapped in single quotes, ready to paste into BAMBI code
    print(f"'{dependent} ~ {right_hand_side}'")

The following function call prints an estimating equation formatted for use with BAMBI. The dependent variable is Age and the explanatory variables are Education and Income.

The output can be copied into your BAMBI code.

In [4]:
bambi_equation('Age',['Education','Income'])

'Age ~ Education + Income'
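Because `bambi_equation` prints the formula instead of returning it, the result cannot be assigned to a variable directly. A minimal sketch of a returning variant is shown below; the name `bambi_formula` is hypothetical and is not part of the original notebook.

```python
def bambi_formula(dependent, independent):
    """Return a formula string such as 'Age ~ Education + Income'.

    Hypothetical helper (not in the original notebook): it builds the same
    text that bambi_equation prints, without the surrounding quotes, and
    returns it so that it can be stored in a variable.
    """
    if not independent:
        raise ValueError("independent must contain at least one variable name")
    return f"{dependent} ~ {' + '.join(independent)}"


formula = bambi_formula('Age', ['Education', 'Income'])
print(formula)
```

The returned string could then be passed wherever a formula string is expected, rather than being copied by hand.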


The next function prints a BAMBI-formatted estimating equation for a given dependent variable, using all of the other variables in the dataframe as explanatory variables.

In [5]:
def bambi_equation_all(df, dependent):
    # Use every column except the dependent variable as an explanatory variable
    predictors = df.columns.drop(dependent)
    # Join the column names with ' + ' to form the right-hand side
    right_hand_side = ' + '.join(map(str, predictors))
    # Print the formula wrapped in single quotes, ready to paste into BAMBI code
    print(f"'{dependent} ~ {right_hand_side}'")

The function call takes the dataframe and the name of the dependent variable, then prints an estimating equation for use in BAMBI with all other variables in the dataframe as independent variables. Note that every remaining column is included, so Name appears on the right-hand side. Again, the output can be copied into your BAMBI code.

In [6]:
bambi_equation_all(df,'Age')

'Age ~ Name + Education + Income'
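The output above includes Name, which is an identifier column. One possible way to leave such columns out is sketched below. The function name `bambi_formula_all` and its `exclude` parameter are assumptions introduced for illustration; the sketch works on any iterable of column names (for example `df.columns`), so a plain list is used here.

```python
def bambi_formula_all(columns, dependent, exclude=()):
    """Return a formula using every column except the dependent variable.

    Hypothetical helper: `columns` is any iterable of column names
    (for example df.columns), and `exclude` lists further columns to
    leave out, such as identifier columns.
    """
    dropped = {dependent, *exclude}
    predictors = [str(name) for name in columns if name not in dropped]
    if not predictors:
        raise ValueError("no predictor columns remain")
    return f"{dependent} ~ {' + '.join(predictors)}"


# Column names of the example DataFrame
columns = ['Name', 'Age', 'Education', 'Income']
print(bambi_formula_all(columns, 'Age', exclude=['Name']))
```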


The next function prints a PyMC-formatted estimating equation. You specify the dependent variable (i.e., the variable that you do not want on the right-hand side of the equation), and all of the other variables in the dataframe are used as explanatory variables. Note that the regression coefficients are named beta.

The results can be copied into your PyMC code.
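Written out, the expression produced for PyMC corresponds to a linear predictor of the following form. The symbols $\mu$, $k$ (the number of explanatory variables), and $x_j$ (the $j$-th explanatory column) are notation introduced here for illustration and do not appear in the notebook.

```latex
\mu = \text{intercept} + \sum_{j=0}^{k-1} \beta_j \, x_j
```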

In [7]:
def pymc_equation_all(df, dependent):
    # Use every column except the dependent variable as an explanatory variable
    predictors = df.columns.drop(dependent)
    # Build one beta[i]*df['column'] term per explanatory variable
    terms = [f"beta[{i}]*df['{name}']" for i, name in enumerate(predictors)]
    # Print the intercept followed by the terms, separated by ' + '
    print(' + '.join(['intercept'] + terms))

In [8]:
pymc_equation_all(df,'Age')

intercept + beta[0]*df['Name'] + beta[1]*df['Education'] + beta[2]*df['Income']


Finally, the following function prints an estimating equation formatted for use in PyMC.

The function takes a single argument, `independent`, which is a list of the independent variables that you wish to use in your regression model. The dependent variable is not needed because it does not appear in the printed expression.

The result can be copied into your PyMC code.

In [9]:
def pymc_equation(independent):
    # Build one beta[i]*df['column'] term per independent variable
    terms = [f"beta[{i}]*df['{name}']" for i, name in enumerate(independent)]
    # Print the intercept followed by the terms, separated by ' + '
    print(' + '.join(['intercept'] + terms))

The following function call prints an estimating equation formatted for use with PyMC. The explanatory variables are Education and Income.

In [10]:
pymc_equation(['Education','Income'])

intercept + beta[0]*df['Education'] + beta[1]*df['Income']
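If the expression needs to be stored instead of printed, a returning variant could look like the sketch below. The function name `pymc_expression` and the parameters `data_name` and `coef_name` are hypothetical; they only control the names that appear in the generated text.

```python
def pymc_expression(independent, data_name='df', coef_name='beta'):
    """Return a linear-predictor expression as text.

    Hypothetical helper: `data_name` and `coef_name` are illustrative
    parameters for the variable names used in the generated text.
    """
    terms = [
        f"{coef_name}[{i}]*{data_name}['{name}']"
        for i, name in enumerate(independent)
    ]
    # The intercept comes first, followed by one term per variable
    return ' + '.join(['intercept', *terms])


expression = pymc_expression(['Education', 'Income'])
print(expression)
```

The length of the `independent` list is also the number of entries that the coefficient vector needs to hold.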
