> ## Instructions
> 
> The goal is to get comfortable with Jupyter notebook and executable documents. You will need to submit a Jupyter notebook that should compile by itself along with the pdf. If you use any special packages make sure that their use is documented so that the user (me!) knows what needs to be loaded.
>
> If your program does not compile properly it will be sent back and you will be asked to submit another version.
>
> 1. The website https://www.causeweb.org/tshs/surgery-timing/ describes a study looking at the time of day a surgery is performed. Download and read the associated article: “Operation timing and 30-day mortality after elective general surgery” by Sessler et al. posted on the Sakai website.
> 2. The study data are available in a number of formats on the website. Load the RData object directly from the website. Use the command `load(url("http://mywebsite.com/mydata.RData"))`, where `mywebsite.com/mydata.RData` refers to the link address of the data you wish to use.
> 3. Recreate Tables 1 & 2 from the paper. Comment on any discrepancies. (You don’t need to format the tables in the same way)
> 4. Figures 3 & 4 report adjusted probability estimates. What were these probabilities adjusted for? Is this study reproducible based on the published article?

## Work

Let's get to work with the first table:


In [None]:
################################### Python ####################################

import pandas as pd

# Import data
data_url = "https://causeweb.org/tshs/datasets/Surgery%20Timing.xlsx"
surgery_dataset = pd.read_excel(data_url)

# Specify variables for table 1
baseline_vars = [
    "age",
    "gender",
    "race",
    "asa_status",
    "bmi",
    "baseline_cancer",
    "baseline_cvd",
    "baseline_dementia",
    "baseline_diabetes",
    "baseline_osteoart",
    "baseline_psych",
    "baseline_pulmonary",
    "baseline_charlson",
    "mortality_rsi",
    "complication_rsi",
]

continuous_vars = [
    "age",
    "bmi",
    "baseline_charlson",
    "mortality_rsi",
    "complication_rsi",
]
categorical_vars = [
    "gender",
    "race",
    "asa_status",
    "baseline_cancer",
    "baseline_cvd",
    "baseline_dementia",
    "baseline_diabetes",
    "baseline_osteoart",
    "baseline_psych",
    "baseline_pulmonary",
]

# Create the first table with two named, initially empty columns
table_1 = pd.DataFrame(columns=["Factor", "Statistic"])

# Summary for continuous variables
continuous_summary = (
    surgery_dataset[continuous_vars].agg(["mean", "std"]).transpose()
)
continuous_summary = continuous_summary.rename(
    columns={"mean": "Mean", "std": "Std. Dev"}
)


# Summary for categorical variables
# Each summary holds a pandas Index ("Level") and NumPy arrays ("Count", "Percentage")
all_cat_summaries: dict[str, dict[str, object]] = {}

for var in categorical_vars:
    counts = surgery_dataset[var].value_counts()
    percentages = surgery_dataset[var].value_counts(normalize=True) * 100
    cat_summary = {
        "Level": counts.index,
        "Count": counts.values,
        "Percentage": percentages.values,
    }
    all_cat_summaries[var] = cat_summary

table_index = 0
# Fill the table row by row, in the order given by baseline_vars
for var in baseline_vars:
    if var in continuous_vars:
        # Continuous variables: a single row reported as mean ± SD
        var_mean = round(continuous_summary.loc[var, "Mean"], 2)
        var_std_dev = round(continuous_summary.loc[var, "Std. Dev"], 2)
        table_1.loc[table_index] = [var, f"{var_mean} ± {var_std_dev}"]
        table_index += 1
    elif var in categorical_vars:
        var_info = all_cat_summaries[var]
        var_levels = var_info["Level"]
        var_counts = var_info["Count"]
        var_percents = var_info["Percentage"]

        table_1.loc[table_index] = [var, ""]
        table_index += 1

        for i in range(len(var_levels)):
            level = var_levels[i]
            lvl_count = var_counts[i]
            lvl_pct = var_percents[i]
            table_1.loc[table_index] = [
                f" {level}",
                f"{lvl_count} ({round(lvl_pct, 1)})",
            ]
            table_index += 1
    else:
        raise ValueError(f"Variable {var!r} is neither continuous nor categorical")
print("Table 1")
print(table_1)
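
To check the formatting logic without downloading the study data, the same mean ± SD and count (percent) summaries can be tried on a tiny made-up data frame. This is only a sketch: the helper names `summarize_continuous` and `summarize_categorical` and the toy values are my own assumptions, not part of the paper or the dataset.

```python
import pandas as pd


def summarize_continuous(df: pd.DataFrame, var: str) -> list[list[str]]:
    """Return one [factor, statistic] row formatted as 'mean ± SD'."""
    mean = df[var].mean()
    std = df[var].std()  # sample standard deviation (ddof=1), the pandas default
    return [[var, f"{mean:.2f} ± {std:.2f}"]]


def summarize_categorical(df: pd.DataFrame, var: str) -> list[list[str]]:
    """Return a header row plus one 'count (percent)' row per level."""
    # value_counts() drops missing values by default, so the percentages
    # are relative to the non-missing observations only.
    counts = df[var].value_counts()
    percents = df[var].value_counts(normalize=True) * 100
    rows = [[var, ""]]
    for level in counts.index:
        rows.append(
            [f"  {level}", f"{counts.loc[level]} ({percents.loc[level]:.1f})"]
        )
    return rows


# Toy data, invented purely for illustration (not taken from the study).
toy = pd.DataFrame(
    {
        "age": [50.0, 60.0, 70.0, 80.0],
        "gender": ["F", "F", "F", "M"],
    }
)

rows = summarize_continuous(toy, "age") + summarize_categorical(toy, "gender")
toy_table = pd.DataFrame(rows, columns=["Factor", "Statistic"])
print(toy_table)
```

Collecting the rows in a list and building the data frame once at the end avoids tracking a running row index by hand. Because missing values are dropped before the percentages are computed, that is one possible source of discrepancies when comparing against the published table.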

Now let's work on the second table. 
