# Understanding Logistic Regression

Logistic Regression is a widely used method for modeling binary dependent variables (e.g., success/failure, 1/0). Unlike models that estimate continuous outcomes, it estimates the probability of an event occurring by transforming a linear combination of predictor variables into a probability using the logistic function.

<br>

---

<br>

## Key Concepts

### 1. Logistic Function and Probabilities

<dl>
<dd>
The logistic (or sigmoid) function converts the result of the linear combination of predictors into a probability between 0 and 1. It is expressed as:
</dd>


<br>

<dt>
$$
P(Y=1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1X_1 + \beta_2X_2 + \dots + \beta_pX_p)}}
$$
</dt>

<dd>

Here:

* $\beta_0$ is the intercept.

* $\beta_1$, $\beta_2$, $\dots$, $\beta_p$ are the coefficients that measure the effect of each predictor $X_1$, $X_2$, $\dots$, $X_p$ on the log-odds, and therefore on the probability, of the event occurring.

</dd>
</dl>
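
As a minimal sketch of the formula above, the snippet below computes $P(Y=1 \mid X)$ for a single observation. The function name `predict_probability` and the coefficient values are illustrative assumptions, not part of any library.

```python
import math

def predict_probability(intercept, coefs, x):
    """Apply the logistic function to the linear combination of predictors."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients: beta_0 = -1.0, beta_1 = 0.5, beta_2 = 2.0
p = predict_probability(-1.0, [0.5, 2.0], [2.0, 0.0])
print(p)  # the linear combination is 0 here, so the probability is 0.5
```

Whatever the value of the linear combination, the result always lies strictly between 0 and 1.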

### 2. Logit Transformation (Log-Odds)

<dl>
<dd>
Logistic regression can be rewritten in the logit form, where the logarithm of the odds is modeled linearly:
</dd>

<br>

<dt>
$$
\log\left(\frac{P(Y=1 \mid X)}{1 - P(Y=1 \mid X)}\right) = \beta_0 + \beta_1X_1 + \beta_2X_2 + \dots + \beta_pX_p
$$
</dt>
<br>


<dd>
This transformation allows us to interpret each coefficient $\beta$ as the change in the log-odds of the event for a one-unit change in the corresponding predictor variable, holding the other predictors constant.
</dd>
</dl>
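
The relationship between probabilities and log-odds can be checked numerically. The sketch below is only an illustration: the helper names `logit` and `inverse_logit` and the coefficient value 0.5 are assumptions made for this example.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Logistic function: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

log_odds = logit(0.8)              # odds of 4 to 1, so log(4)
p_back = inverse_logit(log_odds)   # recovers 0.8 up to rounding

# Adding a hypothetical coefficient of 0.5 to the log-odds
# multiplies the odds by e^0.5 and raises the probability.
p_shifted = inverse_logit(log_odds + 0.5)
odds_multiplier = (p_shifted / (1.0 - p_shifted)) / (0.8 / 0.2)
```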

### 3. Estimation of the Coefficients

<dl>
<dd>
Unlike least squares models, the coefficients in logistic regression are estimated using Maximum Likelihood Estimation (MLE). This method seeks the values of the coefficients that maximize the probability of observing the sampled data.
</dd>
</dl>
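
The idea behind MLE can be sketched in plain Python. The example below is an illustration under stated assumptions, not the routine that statsmodels runs: it simulates data from hypothetical true coefficients (-0.5 and 1.2) and then maximizes the Bernoulli log-likelihood for a single predictor with Newton's method. All function and variable names are made up for this sketch.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_newton(xs, ys, n_iter=25):
    """Maximize the log-likelihood of a one-predictor logistic model."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # negative Hessian (information matrix)
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve the 2x2 system (information) * step = gradient
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Simulated data with assumed true coefficients beta_0 = -0.5, beta_1 = 1.2
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(5000)]
ys = [1 if random.random() < sigmoid(-0.5 + 1.2 * x) else 0 for x in xs]

b0_hat, b1_hat = fit_logistic_newton(xs, ys)
print(b0_hat, b1_hat)  # estimates should land near -0.5 and 1.2
```

With several thousand observations the estimates should fall close to the values used to simulate the data, which is the sense in which MLE "recovers" the coefficients.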

<br>

---

<br>

## Interpreting Logistic Regression

### 1. Odds Ratio (OR)

<dl>
<dd>
By exponentiating the coefficients, we obtain the odds ratios:
</dd>

<br>

<dt>
$$
OR = e^{\beta}
$$
</dt>
<br>


<dd>

- If $OR > 1$, an increase in the predictor variable is associated with higher odds of the event occurring.
- If $OR < 1$, an increase in the predictor variable is associated with lower odds of the event occurring.
- If $OR = 1$, the variable is not associated with any change in the odds of the event.

</dd>
</dl>
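
A short numerical check makes the meaning of $e^{\beta}$ concrete: it is the factor by which the odds are multiplied when the predictor increases by one unit. The coefficients below are hypothetical values chosen for illustration.

```python
import math

beta0, beta1 = -1.0, 0.7  # hypothetical intercept and coefficient

def odds(x):
    """Odds of the event, p / (1 - p), at predictor value x."""
    p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
    return p / (1.0 - p)

odds_ratio = math.exp(beta1)
ratio_from_odds = odds(3.0) / odds(2.0)  # one-unit increase from 2 to 3

print(odds_ratio, ratio_from_odds)  # the two values agree up to rounding
```

The same ratio is obtained for any starting value of the predictor, which is why a single odds ratio summarizes the effect.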

### 2. Confidence Intervals (CI)

<dl>
<dd>
To assess statistical significance, we compute the confidence interval (usually 95%) for the odds ratio, where $\sigma$ is the standard error of the estimated coefficient $\beta$:
</dd>

<br>

<dt>
$$
CI = \left[e^{(\beta - 1.96 \cdot \sigma)}, \, e^{(\beta + 1.96 \cdot \sigma)}\right]
$$
</dt>
<br>


<dd>

- If the interval does not include 1, the effect of the variable is considered statistically significant.
- If it includes 1, the effect may not be significant.

</dd>
</dl>
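
The interval can be computed directly from a coefficient and its standard error. This is a minimal sketch: the function name `odds_ratio_ci` and the input values are assumptions made for the example.

```python
import math

def odds_ratio_ci(beta, std_err, z=1.96):
    """Return (lower, odds_ratio, upper) for an approximate 95% interval."""
    lower = math.exp(beta - z * std_err)
    upper = math.exp(beta + z * std_err)
    return lower, math.exp(beta), upper

# Hypothetical estimate with a small standard error: the interval excludes 1
lower, odds_ratio, upper = odds_ratio_ci(beta=0.5, std_err=0.2)
excludes_one = not (lower <= 1.0 <= upper)

# Same estimate with a larger standard error: the interval includes 1
wide_lower, _, wide_upper = odds_ratio_ci(beta=0.5, std_err=0.4)
wide_excludes_one = not (wide_lower <= 1.0 <= wide_upper)
```

Because the interval is built on the log-odds scale and then exponentiated, it is not symmetric around the odds ratio.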

### 3. p-value

<dl>
<dd>
The p-value tests the null hypothesis that the coefficient is zero (i.e., that the variable does not affect the probability of the event).
</dd>

<dd>

- If $p < 0.05$, the effect is statistically significant.
- If $p \geq 0.05$, there is insufficient evidence that the variable affects the event.

</dd>
</dl>
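
For a coefficient tested with a z statistic (a Wald test), the two-sided p-value follows from the standard normal distribution. The sketch below uses only the standard library; the function name `wald_p_value` and the inputs are illustrative assumptions.

```python
import math

def wald_p_value(beta, std_err):
    """Two-sided p-value for H0: beta = 0, using a standard normal reference."""
    z = beta / std_err
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi
    return math.erfc(abs(z) / math.sqrt(2.0))

p_borderline = wald_p_value(beta=1.96, std_err=1.0)  # z = 1.96, p close to 0.05
p_null = wald_p_value(beta=0.0, std_err=1.0)         # z = 0, p equal to 1
```

A z statistic of 1.96 sits right at the conventional 0.05 threshold, which is the same constant used in the 95% confidence interval.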

<br>

---

<br>

## Advantages of Logistic Regression

<dl>
<dd>

- **Direct Probabilities:** It provides a direct estimate of the probability of the event, making interpretation straightforward.
- **Interpretable Coefficients:** The odds ratios allow for a clear understanding of the impact of each predictor.
- **Flexibility:** It can be easily extended to multiclass classifications (via multinomial logistic regression) and applied in various fields such as medicine, marketing, and social sciences.

</dd>
</dl>

## Limitations of Logistic Regression

<dl>
<dd>

- **Linearity Assumption in the Logit:** The model assumes that the relationship between the predictor variables and the log-odds is linear.
- **Sensitivity to Outliers:** Extreme values can influence the estimation of the coefficients.
- **Multicollinearity:** High correlations among predictors can distort the estimates and the interpretation of their effects.

</dd>
</dl>

<br>

---

<br>


# Implementing the Logistic Regression Model in Python


<br>

## Library import

Before showing any code, a word about libraries in Python. Libraries are collections of external functions and classes; here they will help us run the **Logistic Regression**. The main libraries used are:

* <a href='#id_1'>Numpy</a>


* <a href='#id_2'>Pandas</a>


* <a href='#id_3'>Statsmodels</a>

To import a library in Python, we write:
```
import {library name}
```
Thus, to import our libraries we will do:
```
import numpy
import pandas
import statsmodels
```
To use a function from a library, we access it through the library name with the "." operator:
```
numpy.array([1, 2, 3])
```
We can also abbreviate the library name with the "as" keyword when importing it:
```
import numpy as np

np.array([1, 2, 3])
```
In addition, we can import only a specific function with the "from" keyword:
```
from numpy import array

array([1, 2, 3])
```

<br>

---

<br>

Now that we understand how to import libraries, we import them into our Colab notebook as follows:



In [None]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

---

## Library Import Disclaimer

If importing these libraries fails with an error saying that a library was not found, you will need to install it in your Python environment.

To do so, we use the <a href='#pip'>pip</a> command on our machine, not inside Python. The pip command lets us download external libraries to our computer.

To download the libraries used in the Logistic Regression code, run the following in a terminal (cmd or PowerShell):
```
pip install numpy pandas statsmodels
```
<br>

---

## Defining the Logistic Regression Function

Because of <a href='https://www.w3schools.com/python/python_scope.asp'>scope</a> issues in the Jupyter notebook, each section of the code is explained first, and the full function is shown at the end.

### Function Signature and Docstring

The Logistic Regression is implemented inside a more general function built on the GLM (Generalized Linear Model) framework, which can fit different types of models. The function presented here supports both Linear and Logistic Regression, but this document only covers how it performs a Logistic Regression.

In [None]:
def execute_glm_regression(elr_dataframe_df, elr_outcome_str, elr_predictors_list,
                           model_type='linear', print_results=True, labels=False, reg_type="Multi"):
    """
    Executes a GLM (Generalized Linear Model) for linear or logistic regression.

    Parameters:
    - elr_dataframe_df: Pandas DataFrame containing the data.
    - elr_outcome_str: Name of the outcome variable.
    - elr_predictors_list: List of predictor variable names.
    - model_type: 'linear' for linear regression (Gaussian) or 'logistic' for logistic regression (Binomial).
    - print_results: If True, prints the results table.
    - labels: (Optional) Dictionary to map variable names to human-readable labels.
    - reg_type: Regression type ('uni' or 'multi') to rename the output columns.

    Returns:
    - summary_df: DataFrame with the model results.
    """

### Parameters Disclaimer

All variables in elr_predictors_list, as well as elr_outcome_str, must be columns of elr_dataframe_df.

<br>

---


### Defining family



In [None]:
if model_type.lower() == 'logistic':
    family = sm.families.Binomial()
elif model_type.lower() == 'linear':
    family = sm.families.Gaussian()
else:
    raise ValueError("model_type must be 'linear' or 'logistic'")

This section checks the model_type parameter to determine which statistical family to use:

* For logistic regression ('logistic'), it sets the family to sm.families.Binomial(), which is suitable for binary outcomes.

* For linear regression ('linear'), it sets the family to sm.families.Gaussian(), appropriate for continuous outcomes.

* If an unsupported model_type is passed, it raises a ValueError to alert you immediately about the invalid input.

<br>

---


### Building the Formula

In [None]:
formula = elr_outcome_str + ' ~ ' + ' + '.join(elr_predictors_list)

Here, the code constructs the regression formula using the outcome variable and predictor variables:

The formula string follows the format required by **statsmodels**:
<code>outcome ~ predictor1 + predictor2 + ...</code>
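
As a quick illustration, here is the string produced for the outcome and predictors used later in this document. The variable names `outcome` and `predictors` are stand-ins for the function's parameters.

```python
outcome = "outcome_binary"
predictors = ["demog_sex", "age_bin"]

# Same expression as in the function, with stand-in variable names
formula = outcome + ' ~ ' + ' + '.join(predictors)
print(formula)  # outcome_binary ~ demog_sex + age_bin
```

Because the names are pasted into the formula as-is, this approach assumes that every column name is a valid identifier. If the formula is parsed by patsy, a name containing spaces would need quoting, for instance with `Q("column name")`.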

<br>

---


### Converting Categorical Variables


In [None]:
categorical_vars = elr_dataframe_df.select_dtypes(include=['object', 'category']).columns.intersection(elr_predictors_list)
for var in categorical_vars:
    elr_dataframe_df[var] = elr_dataframe_df[var].astype('category')

This block ensures that any categorical predictor variables are correctly recognized:

* It selects columns in the DataFrame that are of type object or category and are also listed in your predictors.

* Each identified column is then converted to the category data type.
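
The selection logic can be tried on a small, made-up DataFrame. This sketch works on a copy (`work_df`, a name introduced here) so that the original data is left untouched; the function above converts the columns of the DataFrame it receives in place, which is a design choice worth being aware of.

```python
import pandas as pd

# Hypothetical data, only to illustrate the column selection
df = pd.DataFrame({
    "demog_sex": pd.Series(["Male", "Female", "Male"], dtype="object"),
    "demog_age": [34, 61, 47],
    "outcome_binary": [0, 1, 0],
})
predictors = ["demog_sex", "demog_age"]

work_df = df.copy()  # keep the caller's DataFrame unchanged

# Text-like columns that are also predictors
categorical_vars = (
    work_df.select_dtypes(include=["object", "category"])
    .columns.intersection(predictors)
)
for var in categorical_vars:
    work_df[var] = work_df[var].astype("category")

print(list(categorical_vars))
print(work_df.dtypes)
```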

<br>

---


### Fitting the GLM Model

In [None]:
model = smf.glm(formula=formula, data=elr_dataframe_df, family=family)
result = model.fit()

The <code>smf.glm</code> function is called with the constructed formula, the dataset, and the chosen family.

The model is then fitted using the <code>.fit()</code> method, which computes the coefficients and related statistics.

<br>

---


### Extracting the Results Table

In [None]:
summary_table = result.summary2().tables[1].copy()

After fitting the model, the summary table containing key statistics is extracted:

* <code>result.summary2()</code> generates a detailed summary.

* The second table (index 1) is copied to work with, as it contains the coefficients, confidence intervals, and p-values.

<br>

---


### Processing Results Based on Model Type

In [None]:
if model_type.lower() == 'logistic':
    summary_table['Odds Ratio'] = np.exp(summary_table['Coef.'])
    summary_table['IC Low'] = np.exp(summary_table['[0.025'])
    summary_table['IC High'] = np.exp(summary_table['0.975]'])

    summary_df = summary_table[['Odds Ratio', 'IC Low', 'IC High', 'P>|z|']].reset_index()
    summary_df = summary_df.rename(columns={'index': 'Study',
                                              'Odds Ratio': 'OddsRatio',
                                              'IC Low': 'LowerCI',
                                              'IC High': 'UpperCI',
                                              'P>|z|': 'p-value'})
else:
    summary_df = summary_table[['Coef.', '[0.025', '0.975]', 'P>|z|']].reset_index()
    summary_df = summary_df.rename(columns={'index': 'Study',
                                              'Coef.': 'Coefficient',
                                              '[0.025': 'LowerCI',
                                              '0.975]': 'UpperCI',
                                              'P>|z|': 'p-value'})

This part adjusts the results table based on the model type:

* For Logistic Regression:

  * The coefficients are exponentiated to convert them into odds ratios.

  * Confidence intervals are also exponentiated.

  * The resulting DataFrame is then restructured and columns are renamed.

<br>

---

### Mapping Variable Names to Readable Labels

In [None]:
if labels:
    def parse_variable_name(var_name):
        if var_name == 'Intercept':
            return labels.get('Intercept', 'Intercept')
        elif '[' in var_name:
            base_var = var_name.split('[')[0]
            level = var_name.split('[')[1].split(']')[0]
            base_var_name = base_var.replace('C(', '').replace(')', '').strip()
            label = labels.get(base_var_name, base_var_name)
            return f'{label} ({level})'
        else:
            var_name_clean = var_name.replace('C(', '').replace(')', '').strip()
            return labels.get(var_name_clean, var_name_clean)
    summary_df['Study'] = summary_df['Study'].apply(parse_variable_name)

This snippet remaps raw variable names to more reader-friendly labels if a labels dictionary is provided:

* The function parse_variable_name checks if a variable is the intercept or a categorical variable.

* For categorical variables, it extracts the base name and level, then applies the mapping.

* The Study column in the summary DataFrame is updated with these parsed names.
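
To see what the mapping produces, the helper can be written as a standalone function that receives the dictionary explicitly. This is a sketch for experimentation: the `labels` dictionary below is hypothetical, and passing `labels` as a second argument is a change made only for this example.

```python
def parse_variable_name(var_name, labels):
    """Standalone variant of the helper above; labels is passed in explicitly."""
    if var_name == 'Intercept':
        return labels.get('Intercept', 'Intercept')
    if '[' in var_name:
        # Categorical term such as 'demog_sex[T.Male]'
        base_var, rest = var_name.split('[', 1)
        level = rest.split(']')[0]
        base_var = base_var.replace('C(', '').replace(')', '').strip()
        return f"{labels.get(base_var, base_var)} ({level})"
    cleaned = var_name.replace('C(', '').replace(')', '').strip()
    return labels.get(cleaned, cleaned)

labels = {'demog_sex': 'Sex', 'demog_age': 'Age'}  # hypothetical mapping
names = ['Intercept', 'demog_sex[T.Male]', 'C(age_bin)[T.10_20]', 'demog_age']
parsed = [parse_variable_name(name, labels) for name in names]
print(parsed)
# ['Intercept', 'Sex (T.Male)', 'age_bin (T.10_20)', 'Age']
```

Variables without an entry in the dictionary keep their original name, and the `T.` prefix is still present at this stage.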

<br>

---

### Reordering and Cleaning the Columns

In [None]:
if model_type.lower() == 'logistic':
    summary_df = summary_df[['Study', 'OddsRatio', 'LowerCI', 'UpperCI', 'p-value']]
else:
    summary_df = summary_df[['Study', 'Coefficient', 'LowerCI', 'UpperCI', 'p-value']]

# regex=False removes the literal text 'T.' instead of treating '.' as a wildcard
summary_df['Study'] = summary_df['Study'].str.replace('T.', '', regex=False)

This section organizes the DataFrame:

* Columns are reordered to ensure a logical and consistent display.

* The prefix <code>'T.'</code>, which appears before the level of a categorical variable (as in <code>demog_sex[T.Male]</code>), is removed for clarity.
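
One detail deserves care when removing <code>'T.'</code>: if the pattern is interpreted as a regular expression, the dot matches any character. To the best of my knowledge, pandas versions before 2.0 treated the pattern of `Series.str.replace` as a regular expression by default, so the outcome may depend on the installed version. The sketch below uses plain Python strings and a made-up term name to show the difference.

```python
import re

name = "TYPE[T.Male]"  # hypothetical term whose variable name starts with 'T'

as_regex = re.sub("T.", "", name)    # '.' matches any character
as_literal = name.replace("T.", "")  # removes only the exact text 'T.'

print(as_regex)    # PE[Male]
print(as_literal)  # TYPE[Male]
```

Requesting a literal replacement explicitly avoids altering variable names that happen to contain a capital T.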

<br>

---

### Formatting Numerical Values

In [None]:
for col in summary_df.columns[1:-1]:
    summary_df[col] = summary_df[col].round(3)
summary_df['p-value'] = summary_df['p-value'].apply(lambda x: f'{x:.4f}')

This part formats the numeric values for better presentation:

* Coefficients and confidence intervals are rounded to three decimal places.

* p-values are formatted to four decimal places as strings.

<br>

---

### Optional Removal of the Intercept Row

In [None]:
summary_df = summary_df[summary_df['Study'] != 'Intercept']

The intercept row is removed from the summary DataFrame. In the code as written, this step always runs; no parameter controls it.

* Often, the intercept is not of primary interest, so this step omits it from the final results.

<br>

---

### Renaming Columns Based on Regression Type

In [None]:
if reg_type.lower() == 'uni':
    if model_type.lower() == 'logistic':
        summary_df.rename(columns={
            'OddsRatio': 'OddsRatio (uni)',
            'LowerCI': 'LowerCI (uni)',
            'UpperCI': 'UpperCI (uni)',
            'p-value': 'p-value (uni)'
        }, inplace=True)
    else:
        summary_df.rename(columns={
            'Coefficient': 'Coefficient (uni)',
            'LowerCI': 'LowerCI (uni)',
            'UpperCI': 'UpperCI (uni)',
            'p-value': 'p-value (uni)'
        }, inplace=True)
elif reg_type.lower() == 'multi':
    if model_type.lower() == 'logistic':
        summary_df.rename(columns={
            'OddsRatio': 'OddsRatio (multi)',
            'LowerCI': 'LowerCI (multi)',
            'UpperCI': 'UpperCI (multi)',
            'p-value': 'p-value (multi)'
        }, inplace=True)
    else:
        summary_df.rename(columns={
            'Coefficient': 'Coefficient (multi)',
            'LowerCI': 'LowerCI (multi)',
            'UpperCI': 'UpperCI (multi)',
            'p-value': 'p-value (multi)'
        }, inplace=True)

Based on whether you are performing univariate or multivariate regression, this section renames the columns:

* A suffix ((uni) or (multi)) is appended to each column name.

* This clarifies which type of regression analysis the results pertain to.
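
Since the four branches differ only in the suffix, the same renaming could be expressed more compactly. The following is an alternative sketch, not the code used in the function; `build_rename_map` is a hypothetical helper name.

```python
def build_rename_map(columns, reg_type):
    """Map each result column to its suffixed name; 'Study' is left unchanged."""
    kind = reg_type.lower()
    if kind not in ('uni', 'multi'):
        return {}  # any other value leaves the column names untouched
    return {col: f"{col} ({kind})" for col in columns if col != 'Study'}

columns = ['Study', 'OddsRatio', 'LowerCI', 'UpperCI', 'p-value']
rename_map = build_rename_map(columns, "Multi")
print(rename_map)
```

The resulting dictionary could then be passed to `summary_df.rename(columns=rename_map)`, and it works for both the logistic and the linear column sets.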

<br>

---

### Output and Return

In [None]:
if print_results:
    print(summary_df)

return summary_df

Finally:

* If print_results is True, the function prints the formatted summary DataFrame.

* The summary DataFrame is then returned, allowing further use or inspection of the results.

## Full Code

In [None]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

def execute_glm_regression(elr_dataframe_df, elr_outcome_str, elr_predictors_list,
                           model_type='linear', print_results=True, labels=False, reg_type="Multi"):
    """
    Executes a GLM (Generalized Linear Model) for linear or logistic regression.

    Parameters:
    - elr_dataframe_df: Pandas DataFrame containing the data.
    - elr_outcome_str: Name of the outcome variable.
    - elr_predictors_list: List of predictor variable names.
    - model_type: 'linear' for linear regression (Gaussian) or 'logistic' for logistic regression (Binomial).
    - print_results: If True, prints the results table.
    - labels: (Optional) Dictionary to map variable names to human-readable labels.
    - reg_type: Regression type ('uni' or 'multi') to rename the output columns.

    Returns:
    - summary_df: DataFrame with the model results.
    """

    # 1. Define the family based on model_type
    if model_type.lower() == 'logistic':
        family = sm.families.Binomial()
    elif model_type.lower() == 'linear':
        family = sm.families.Gaussian()
    else:
        raise ValueError("model_type must be 'linear' or 'logistic'")

    # 2. Build the formula string for the regression model
    formula = elr_outcome_str + ' ~ ' + ' + '.join(elr_predictors_list)

    # 3. Convert categorical variables to the 'category' type
    categorical_vars = elr_dataframe_df.select_dtypes(include=['object', 'category']).columns.intersection(elr_predictors_list)
    for var in categorical_vars:
        elr_dataframe_df[var] = elr_dataframe_df[var].astype('category')

    # 4. Fit the GLM model using the specified formula, data, and family
    model = smf.glm(formula=formula, data=elr_dataframe_df, family=family)
    result = model.fit()

    # 5. Extract the results table from the model summary
    summary_table = result.summary2().tables[1].copy()

    # 6. Process the results for logistic or linear regression
    if model_type.lower() == 'logistic':
        # For logistic regression, compute Odds Ratios and their confidence intervals
        summary_table['Odds Ratio'] = np.exp(summary_table['Coef.'])
        summary_table['IC Low'] = np.exp(summary_table['[0.025'])
        summary_table['IC High'] = np.exp(summary_table['0.975]'])

        summary_df = summary_table[['Odds Ratio', 'IC Low', 'IC High', 'P>|z|']].reset_index()
        summary_df = summary_df.rename(columns={'index': 'Study',
                                                  'Odds Ratio': 'OddsRatio',
                                                  'IC Low': 'LowerCI',
                                                  'IC High': 'UpperCI',
                                                  'P>|z|': 'p-value'})
    else:
        # For linear regression, use the coefficients and confidence intervals directly
        summary_df = summary_table[['Coef.', '[0.025', '0.975]', 'P>|z|']].reset_index()
        summary_df = summary_df.rename(columns={'index': 'Study',
                                                  'Coef.': 'Coefficient',
                                                  '[0.025': 'LowerCI',
                                                  '0.975]': 'UpperCI',
                                                  'P>|z|': 'p-value'})

    # 7. Map variable names to human-readable labels if a labels dictionary is provided
    if labels:
        def parse_variable_name(var_name):
            if var_name == 'Intercept':
                return labels.get('Intercept', 'Intercept')
            elif '[' in var_name:
                base_var = var_name.split('[')[0]
                level = var_name.split('[')[1].split(']')[0]
                base_var_name = base_var.replace('C(', '').replace(')', '').strip()
                label = labels.get(base_var_name, base_var_name)
                return f'{label} ({level})'
            else:
                var_name_clean = var_name.replace('C(', '').replace(')', '').strip()
                return labels.get(var_name_clean, var_name_clean)
        summary_df['Study'] = summary_df['Study'].apply(parse_variable_name)

    # 8. Reorder the columns for clarity
    if model_type.lower() == 'logistic':
        summary_df = summary_df[['Study', 'OddsRatio', 'LowerCI', 'UpperCI', 'p-value']]
    else:
        summary_df = summary_df[['Study', 'Coefficient', 'LowerCI', 'UpperCI', 'p-value']]

    # 9. Remove the literal prefix 'T.' from categorical variable names
    summary_df['Study'] = summary_df['Study'].str.replace('T.', '', regex=False)

    # 10. Format numerical values (round coefficients and confidence intervals, format p-values)
    for col in summary_df.columns[1:-1]:
        summary_df[col] = summary_df[col].round(3)
    summary_df['p-value'] = summary_df['p-value'].apply(lambda x: f'{x:.4f}')

    # 11. Remove the intercept row (always applied; no parameter controls it)
    summary_df = summary_df[summary_df['Study'] != 'Intercept']

    # 12. Rename columns based on the regression type (univariate or multivariate)
    if reg_type.lower() == 'uni':
        if model_type.lower() == 'logistic':
            summary_df.rename(columns={
                'OddsRatio': 'OddsRatio (uni)',
                'LowerCI': 'LowerCI (uni)',
                'UpperCI': 'UpperCI (uni)',
                'p-value': 'p-value (uni)'
            }, inplace=True)
        else:
            summary_df.rename(columns={
                'Coefficient': 'Coefficient (uni)',
                'LowerCI': 'LowerCI (uni)',
                'UpperCI': 'UpperCI (uni)',
                'p-value': 'p-value (uni)'
            }, inplace=True)
    elif reg_type.lower() == 'multi':
        if model_type.lower() == 'logistic':
            summary_df.rename(columns={
                'OddsRatio': 'OddsRatio (multi)',
                'LowerCI': 'LowerCI (multi)',
                'UpperCI': 'UpperCI (multi)',
                'p-value': 'p-value (multi)'
            }, inplace=True)
        else:
            summary_df.rename(columns={
                'Coefficient': 'Coefficient (multi)',
                'LowerCI': 'LowerCI (multi)',
                'UpperCI': 'UpperCI (multi)',
                'p-value': 'p-value (multi)'
            }, inplace=True)

    # 13. Print the results if print_results is True
    if print_results:
        print(summary_df)

    # 14. Return the summary DataFrame with all the model results
    return summary_df

---

# Use Case Example

In the following example, we run a **Logistic Regression** on synthetic data to show how the code would be used in a real scenario.

The model uses age and sex as predictors.


---

## Defining Variables

Before running a Logistic Regression, we need to define the predictor variable(s) and the outcome variable we want to analyze. In our example, we analyze death as a function of age and sex. Note that the dataset used here is synthetic, so the results displayed are not real; they are only an example of what the function can do.

---

## Importing libraries

In [None]:
import pandas as pd
from google.colab import files

---

## Gathering data from database

Normally, this step would only declare which data will be analyzed before executing the function. In this dataset, however, the outcome variable is categorical rather than binary, and logistic regression only accepts a binary outcome. A binary indicator therefore had to be derived from the outcome variable. Whether this is needed varies from case to case.

In [None]:
# Uploading data set
uploaded = files.upload()
df_map = pd.read_csv("df_map.csv")

# Grouping age into 10-year bins (0_10, 10_20, ..., 90_100); with right=False each bin is [low, high).
# Ages of 100 or more fall outside the bins and become NaN.
bins = list(range(0, 110, 10))
labels = [f"{i}_{i+10}" for i in bins[:-1]]

df_map['age_bin'] = pd.cut(df_map['demog_age'], bins=bins, labels=labels, right=False)

age_dummies = pd.get_dummies(df_map['age_bin'], prefix='age')
df_map = pd.concat([df_map, age_dummies], axis=1)

# Choosing one category for outcome, as this is not a multinomial regression.
df_map['outcome_binary'] = (df_map['outco_binary_outcome'] == 'Death').astype(int)

# Defining outcome and predictors variables
outcome = "outcome_binary"
predictors = ["demog_sex" , "age_bin"]



---

## Executing the function

To execute the function, we pass the variables we chose and, most importantly, set model_type to 'logistic'; otherwise it will run a linear regression. We also want to print the results, and we are running a multivariate regression.

In [None]:
logistic_response = execute_glm_regression(
    elr_dataframe_df=df_map,
    elr_outcome_str=outcome,
    elr_predictors_list=predictors,
    model_type='logistic',
    print_results=True,
    labels=False,
    reg_type="Multi"
)

              Study  OddsRatio (multi)  LowerCI (multi)  UpperCI (multi)  \
1   demog_sex[Male]              1.700            1.080            2.675   
2    age_bin[10_20]              0.000            0.000              inf   
3    age_bin[20_30]              1.191            0.258            5.502   
4    age_bin[30_40]              0.775            0.156            3.855   
5    age_bin[40_50]              1.508            0.332            6.847   
6    age_bin[50_60]              0.609            0.112            3.312   
7    age_bin[60_70]              1.057            0.207            5.396   
8    age_bin[70_80]              2.389            0.517           11.027   
9    age_bin[80_90]              2.326            0.486           11.135   
10  age_bin[90_100]              3.080            0.450           21.067   

   p-value (multi)  
1           0.0218  
2           0.9990  
3           0.8231  
4           0.7550  
5           0.5948  
6           0.5661  
7           0.94



---

## Analyzing results with Forest Plot

A Forest Plot is a graphical way to read the results table of the logistic regression: each point is an odds ratio, and the horizontal line through it spans the lower and upper bounds of its confidence interval.

Below is the full code for the Forest Plot:

In [None]:
import pandas as pd
import plotly.graph_objs as go


def fig_forest_plot(
        df, dictionary=None,
        title='Forest Plot',
        labels=['Study', 'OddsRatio', 'LowerCI', 'UpperCI'],
        graph_id='forest-plot', graph_label='', graph_about='',
        only_display=False):

    # Error Handling: check the required columns before they are used
    if not set(labels).issubset(df.columns):
        print(df.columns)
        error_str = f'Dataframe must contain the following columns: {labels}'
        raise ValueError(error_str)

    # Ordering Values -> Ascending Order of the point estimate
    df = df.sort_values(by=labels[1], ascending=True)

    # Prepare Data Traces
    traces = []

    # Add the point estimates as scatter plot points
    traces.append(
        go.Scatter(
            x=df[labels[1]],
            y=df[labels[0]],
            mode='markers',
            name='Odds Ratio',
            marker=dict(color='blue', size=10))
    )

    # Add the confidence intervals as lines
    for index, row in df.iterrows():
        traces.append(
            go.Scatter(
                x=[row[labels[2]], row[labels[3]]],
                y=[row[labels[0]], row[labels[0]]],
                mode='lines',
                showlegend=False,
                line=dict(color='blue', width=2))
        )

    # Define layout
    layout = go.Layout(
        title=title,
        xaxis=dict(title='Odds Ratio'),
        yaxis=dict(
            title='', automargin=True, tickmode='array',
            tickvals=df[labels[0]].tolist(), ticktext=df[labels[0]].tolist()),
        shapes=[
            dict(
                type='line', x0=1, y0=-0.5, x1=1, y1=len(df[labels[0]])-0.5,
                line=dict(color='red', width=2)
            )],  # Line of no effect
        margin=dict(l=100, r=100, t=100, b=50),
        height=600
    )

    return go.Figure(data=traces, layout=layout)

---

## Executing Forest Plot

In [None]:
graph = fig_forest_plot(
    df = logistic_response,
    labels = logistic_response.columns.tolist(),
    only_display=True
)

graph.show()

---

# References

* To learn more about the NumPy library, please go to the <a name='id_1'></a><a href='https://numpy.org/'>NumPy web page</a>.
<br>

* To learn more about the pandas library, please go to the <a name='id_2'></a><a href='https://pandas.pydata.org/'>pandas web page</a>.
<br>

* To learn more about the statsmodels library, please go to the <a name='id_3'></a><a href='https://www.statsmodels.org/stable/index.html'>statsmodels web page</a>.
<br>

* To learn more about the pip command, please go to the <a name='pip'></a><a href='https://pypi.org/project/pip/'>pip web page</a>.
<br>
