# Codes of Ethics in IT: _do they matter?_

<a href="https://colab.research.google.com/drive/1OcMBVrCDxPgEv_tNUTrg7rKePYS6uPs7" target="_blank">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">
</a>

![Someone is stretching their arms in front of a computer monitor.](https://raw.githubusercontent.com/Nkluge-correa/codes-of-ethics/main/img/Dilemma-00.gif)

Building on past work, this study assesses the influence of codes of ethics (CoEs) on the decision-making of IT professionals and students when they face ethical dilemmas and moral self-assessment questions. **To this end, we conducted a randomized controlled trial with 225 IT students and professionals, using a multimedia implementation of the 2018 Association for Computing Machinery (ACM) Code of Ethics as the intervention for our experimental group.**

Research questions:

1. Can passive exposure to a CoE influence the decision-making of IT professionals and students?

2. Is ethical training an active topic in their academic or professional careers?

3. How much importance do IT professionals and students attribute to CoEs?

4. Can passive exposure to a CoE influence how IT professionals and students perceive their moral behavior?

## Load the Data

These files contain the responses to the survey.

 - [`control.csv`](./data/control.csv) contains the responses from the control group.
 - [`experimental.csv`](./data/experimental.csv) contains the responses from the experimental group.

In [15]:
# Libraries required for the analysis.
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt

df_control = pd.read_csv('./data/control.csv')
df_experimental = pd.read_csv('./data/experimental.csv')

# Optionally, restrict the analysis to a subset of respondents, e.g., only those whose
# self-reported ethical training (`self_reporting_1`) is 2 or less (below average).
#df_control = df_control[df_control['self_reporting_1'] <= 2]
#df_control.reset_index(drop=True, inplace=True)
#df_experimental = df_experimental[df_experimental['self_reporting_1'] <= 2]
#df_experimental.reset_index(drop=True, inplace=True)

print('Number of samples in control group: {}'.format(len(df_control)))
print('Number of samples in experimental group: {}'.format(len(df_experimental)))

Number of samples in control group: 112
Number of samples in experimental group: 113


## Aggregation

Now, we do some simple pre-processing to help with our analysis. Binarizing the `age` and `occupation` columns merges sparse categories into two larger groups each (`18-24` vs. `+24`, and `Student` vs. `Professional`), so that later subgroup comparisons are not run on very small samples.
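To make the effect of this binarization concrete, here is a minimal sketch on a handful of invented `age` values (not survey data), using the same bracket-to-group mapping as the cell below. It only illustrates how `Series.replace` collapses the brackets; the counts are toy numbers.

```python
import pandas as pd

# Toy responses (hypothetical, not taken from the survey data).
ages = pd.Series(['18-24', '25-34', '18-24', '45-54', '65+', '35-44'])

# Every bracket above 24 collapses into a single '+24' category.
age_map = {
    '18-24': '18-24',
    '25-34': '+24',
    '35-44': '+24',
    '45-54': '+24',
    '55-64': '+24',
    '65+': '+24',
}
binarized = ages.replace(age_map)

# Count respondents per binarized category.
counts = binarized.value_counts()
print(counts.to_dict())
```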

In [16]:
# Concatenate the dataframes into a single dataframe.
df = pd.concat([df_control, df_experimental], ignore_index=True)

# Drop the columns that won't be used in the analysis.
df = df.drop(
    columns=[
        'token',
        'start_time',
        'date',
        'submit_time',
        'time_of_completion',
    ]
)

# Binarize the `age` column.
df.age = df.age.replace(
    {
    '18-24': '18-24',
    '25-34': '+24',
    '35-44': '+24',
    '45-54': '+24',
    '55-64': '+24',
    '65+': '+24',
    }
)

# Binarize the `occupation` column.
df.occupation = df.occupation.replace(
    {
    'Student': 'Student',
    'Intern': 'Student',
    'Junior Developer': 'Professional',
    'Mid-Level Developer': 'Professional',
    'Senior Developer': 'Professional',
    }
)


# Replace the nan values in the `formation` and `gender` columns with 'Other'.
df['formation'] = df['formation'].fillna('Other')
df['gender'] = df['gender'].fillna('Other')

# Separate the dataframes into control and experimental groups.
df_control = df[df['group'] == 'control'].reset_index(drop=True)
df_experimental = df[df['group'] == 'experimental'].reset_index(drop=True)

## Demographic Analysis

Now that we have aggregated the data, we perform a demographic analysis. The function below plots the distribution of a chosen demographic attribute (one of `age`, `occupation`, `formation`, or `gender`) in both groups and prints a table of counts and percentages.
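The per-group counts and percentages that `DemographicAnalysis` assembles from several `value_counts` calls can also be obtained with `pd.crosstab`. The sketch below is an alternative under that assumption, run on a toy dataframe with hypothetical rows; the column names `group` and `occupation` mirror the survey data, but the values are invented.

```python
import pandas as pd

# Toy data (hypothetical): one row per respondent.
toy = pd.DataFrame({
    'group': ['control', 'control', 'control', 'experimental', 'experimental'],
    'occupation': ['Student', 'Professional', 'Student', 'Professional', 'Professional'],
})

# Absolute counts: categories as rows, groups as columns.
counts = pd.crosstab(toy['occupation'], toy['group'])

# Percentages within each group: every column sums to 100.
percentages = pd.crosstab(toy['occupation'], toy['group'], normalize='columns') * 100

print(counts)
print(percentages.round(2))
```

`normalize='columns'` makes each group's percentages sum to 100, and categories absent from one group appear as explicit zeros instead of needing `fillna(0)`.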

In [17]:
def DemographicAnalysis(attribute):
    """
    Perform demographic analysis for a given attribute.

    Args:
    attribute (str): The attribute to perform analysis on.

    Returns:
    None. Prints the distribution plots and demographic analysis.
    """

    # Assert that the attribute is in the dataframe.
    assert attribute in df.columns, f"{attribute} is not in the dataframe."

    # Assert that the attribute is one of the demographic attributes.
    assert attribute in ['age', 'occupation', 'formation', 'gender'], f"{attribute} is not a demographic attribute."

    # Plot the attribute distribution in the control and experimental groups.
    plt.figure(figsize=(10, 6))

    # Determine the unique categories for the attribute.
    categories = df[attribute].unique()
    num_categories = len(categories)
    bar_width = 0.35

    # Position of bars on X-axis.
    r1 = range(num_categories)
    r2 = [x + bar_width for x in r1]

    # Count occurrences of each category in control and experimental groups.
    control_counts = df_control[attribute].value_counts().reindex(categories, fill_value=0)
    experimental_counts = df_experimental[attribute].value_counts().reindex(categories, fill_value=0)

    # Plotting bars.
    plt.bar(r1, control_counts, color='skyblue', width=bar_width, edgecolor='black', label='Control')
    plt.bar(r2, experimental_counts, color='lightcoral', width=bar_width, edgecolor='black', label='Experimental')

    # Adding labels.
    plt.xlabel(attribute.capitalize())
    plt.ylabel('Count')
    plt.title(f'Distribution of {attribute.capitalize()}')
    plt.xticks([r + bar_width / 2 for r in range(num_categories)], categories, rotation=45, ha='right')
    plt.legend()

    plt.show()

    # Calculate demographic analysis.
    temp_df = pd.DataFrame({
        'all': df[attribute].value_counts().values,
        '%': df[attribute].value_counts(normalize=True).values * 100,
    }, index=df[attribute].value_counts().index)

    temp_df_control = pd.DataFrame({
        'control': df_control[attribute].value_counts().values,
        '%': df_control[attribute].value_counts(normalize=True).values * 100,
    }, index=df_control[attribute].value_counts().index)

    temp_df_experimental = pd.DataFrame({
        'experimental': df_experimental[attribute].value_counts().values,
        '%': df_experimental[attribute].value_counts(normalize=True).values * 100,
    }, index=df_experimental[attribute].value_counts().index)

    # Concatenate the dataframes and use the attribute as the index.
    result_df = pd.concat([temp_df, temp_df_control, temp_df_experimental], axis=1)

    # Fill the nan values with 0.
    result_df = result_df.fillna(0)
    
    # Rename index to be the attribute
    result_df.index.name = attribute.capitalize()

    # Round all values of the column "%" to be 2 decimal places.
    result_df['%'] = result_df['%'].round(2)

    # Print the demographic analysis in Markdown format.
    print(f"\nDemographic analysis for {attribute.capitalize()}\n--------------------------------------\n")
    print(result_df.to_markdown())

In [None]:
DemographicAnalysis('age')

In [None]:
DemographicAnalysis('occupation')

In [None]:
DemographicAnalysis('formation')

In [None]:
DemographicAnalysis('gender')

## Pre-processing of Ethical Dilemmas

Before we begin our statistical analysis, we replace the string values of our respondents' replies with numerical (integer) values. Every dilemma has three possible responses, so each response is mapped to one of the values `0, 1, 2`: the two substantive options receive `0` and `1` in their order of first appearance in the data (as returned by `Series.unique()`), and `2` is reserved for the "Indecisive" option.
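The encoding step can be sketched as a small helper. This is an illustration, not the notebook's code: `encode_dilemma` is a hypothetical name and the option labels are invented. Like the cell below, it keeps the first-appearance order of the substantive options, which means the codes `0` and `1` depend on the order of rows in the data; passing an explicit label list instead would make the mapping independent of row order.

```python
import pandas as pd

def encode_dilemma(responses: pd.Series, indecisive: str = 'Indecisive') -> tuple[pd.Series, dict]:
    """Map string responses to integers, giving `indecisive` the last code.

    The other options keep their order of first appearance, mirroring the
    use of `Series.unique()` in the notebook cell.
    """
    options = [value for value in responses.unique() if value != indecisive]
    options.append(indecisive)
    mapping = {label: code for code, label in enumerate(options)}
    return responses.map(mapping), mapping

# Toy responses with hypothetical option labels.
toy = pd.Series(['Wait', 'Indecisive', 'Develop', 'Wait'])
encoded, mapping = encode_dilemma(toy)
print(mapping)           # {'Wait': 0, 'Develop': 1, 'Indecisive': 2}
print(encoded.tolist())  # [0, 2, 1, 0]
```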

In [18]:
# Concatenate the dataframes into a single dataframe.
df = pd.concat([df_control, df_experimental], ignore_index=True)

# Loop through the ethical dilemmas and replace the values with integers.
for i in range(1, 17):

    # Get the name of the column.
    name = f"dilemma_{i}"

    # Get the unique values of the column.
    unique_values = list(df[name].unique())

    # Check that 'Indecisive' is among the unique values.
    assert 'Indecisive' in unique_values, f"'Indecisive' not found in {name}."

    # Move the `Indecisive` option to the end of the list
    # (without mutating the list while iterating over it).
    unique_values.remove('Indecisive')
    unique_values.append('Indecisive')

    # Map each unique value to an integer code; 'Indecisive' receives the highest code.
    unique_values_dict = {value: code for code, value in enumerate(unique_values)}
    #print(unique_values_dict)

    # Replace the values in the dataframe with the corresponding integer.
    df[name] = df[name].map(unique_values_dict)

# Reset the df_control and df_experimental dataframes.
df_control = df[df['group'] == 'control'].reset_index(drop=True)
df_experimental = df[df['group'] == 'experimental'].reset_index(drop=True)

## Ethical Dilemmas Analysis

Now, we analyze the distribution of responses for each ethical dilemma. Recall that, in this experiment, our hypotheses are:

- **Null Hypothesis (H0):** The samples from all groups are drawn from the same population distribution (i.e., the intervention had no effect)
- **Alternative Hypothesis (H1):** At least one of the groups comes from a different population distribution than the others (i.e., the intervention or some unknown confounder had an effect).

To perform our statistical analysis, we will apply the following tests:

- Because the options in our dilemmas are nominal categories with no rank (our data are not ordinal), we perform a [Chi-Square Test of Independence](https://en.wikipedia.org/wiki/Chi-squared_test) to determine whether there is a significant association between group membership and responses ([`stats.chi2_contingency`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html)).

- We perform a [Kruskal–Wallis test](https://en.wikipedia.org/wiki/Kruskal%E2%80%93Wallis_test) to determine whether there is a statistically significant difference between the response distributions of our groups ([`stats.kruskal`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kruskal.html)). The test compares mean ranks, so it can be read as a comparison of medians only if the groups' distributions have the same shape. We use it only for questions whose responses can be ranked, i.e., the self-assessment and SCES questions (ordinal data).

- Because we perform one statistical test per survey question, we control the false discovery rate with the [Benjamini-Hochberg procedure](https://en.wikipedia.org/wiki/False_discovery_rate) ([`stats.false_discovery_control`](https://scipy.github.io/devdocs/reference/generated/scipy.stats.false_discovery_control.html)), applied separately to each family of questions (dilemmas, self-reporting, and SCES).
 
> **Note: rejecting the null hypothesis does not indicate which of the groups differs. Post hoc comparisons between groups are required to determine how these groups differ.**
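As a worked illustration of the chi-square test of independence, the sketch below computes expected counts, Pearson's statistic, and the p-value by hand for a hypothetical 2 x 3 table (control and experimental rows of 112 and 113 respondents, three response codes) and checks them against `stats.chi2_contingency`. The counts are invented.

```python
import numpy as np
from scipy import stats

# Hypothetical 2 x 3 contingency table: rows are groups (control, experimental),
# columns are the response codes 0, 1 and 2 ("Indecisive").
observed = np.array([
    [60, 40, 12],
    [45, 55, 13],
])

# Expected counts under independence: row total * column total / grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Pearson's statistic with (rows - 1) * (cols - 1) degrees of freedom.
chi2_manual = ((observed - expected) ** 2 / expected).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = stats.chi2.sf(chi2_manual, dof_manual)

# SciPy's version of the same test.
chi2_scipy, p_scipy, dof_scipy, expected_scipy = stats.chi2_contingency(observed)
print(f"manual: chi2={chi2_manual:.4f}, dof={dof_manual}, p={p_manual:.4f}")
print(f"scipy:  chi2={chi2_scipy:.4f}, dof={dof_scipy}, p={p_scipy:.4f}")
```

SciPy applies Yates' continuity correction by default only when the table has one degree of freedom, for example when a subgroup's table collapses to 2 x 2 because nobody chose one of the options; in that case the hand-computed and SciPy statistics would differ.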

In [19]:
def PerformTestWithBH(
    df_control: pd.DataFrame, 
    df_experimental: pd.DataFrame, 
    column_name: str, 
    n: int, 
    test_type: str
):
    """
    Perform a statistical test with Benjamini-Hochberg (BH) correction.

    Args:
    df_control (DataFrame): DataFrame containing control data.
    df_experimental (DataFrame): DataFrame containing experimental data.
    column_name (str): Name of the column to perform the test on (dilemma, self_reporting, or SCES).
    n (int): Number of columns to perform the test on.
    test_type (str): Type of test to perform ('chi2' for Chi-square test or 'kruskal' for Kruskal-Wallis test).

    Returns:
    None. Prints a Markdown table of p-values and adjusted p-values.
    """
    # Initialize a list to store p-values
    p_values = []
    names = []

    # Iterate over the columns
    for i in range(n):
        # Construct the column name dynamically
        name = f"{column_name}_{i + 1}"
        
        if test_type == 'chi2':
            # Create DataFrame for control data
            temp_df_control = pd.DataFrame({'control': df_control[name].value_counts().values},
                                           index=df_control[name].value_counts().index)

            # Create DataFrame for experimental data
            temp_df_experimental = pd.DataFrame({'experimental': df_experimental[name].value_counts().values},
                                                 index=df_experimental[name].value_counts().index)
            
            # Concatenate control and experimental dataframes
            result_df = pd.concat([temp_df_control, temp_df_experimental], axis=1)
            # Replace NaN values with 0
            result_df = result_df.fillna(0)

            # Perform Chi-square test
            stat, p_value, _, _ = stats.chi2_contingency(
                np.array([result_df['control'].values, result_df['experimental'].values])
            )
            
        elif test_type == 'kruskal':
            # Get the values of the control and experimental groups.
            control_values = df_control[name].values
            experimental_values = df_experimental[name].values

            # Perform the Kruskal-Wallis H-test (H0: both samples come from the same distribution).
            stat, p_value = stats.kruskal(control_values, experimental_values)
        else:
            raise ValueError("Invalid test_type. Use 'chi2' or 'kruskal'.")

        # Append p-value to the list
        p_values.append(p_value)
        names.append(name.replace("_", " ").title())
    
    # Create a DataFrame for p-values
    p_values_df = pd.DataFrame({
        column_name.title().replace("_", " "): names,
        "p-value": p_values,
    })

    # Sort the dataframe by p-values
    p_values_df = p_values_df.sort_values(by='p-value').reset_index(drop=True)

    # Perform Benjamini-Hochberg correction
    ps_adjusted = stats.false_discovery_control(p_values_df['p-value'].values, axis=0, method='bh')

    # Add adjusted p-values to the DataFrame
    p_values_df['Adjusted p-value'] = ps_adjusted

    # Print Markdown table
    if test_type == 'chi2':
        print(f"""\nChi-square test ({column_name.title().replace("_", " ")})\n--------------------------------------\n""")
    
    else:
        print(f"""\nKruskal-Wallis test ({column_name.title().replace("_", " ")})\n--------------------------------------\n""")

    print(p_values_df.to_markdown(index=False), "\n")

In [20]:
PerformTestWithBH(df_control, df_experimental, 'dilemma', 16, 'chi2')


Chi-square test (Dilemma)
--------------------------------------

| Dilemma    |   p-value |   Adjusted p-value |
|:-----------|----------:|-------------------:|
| Dilemma 3  | 0.0278158 |           0.386483 |
| Dilemma 4  | 0.0500253 |           0.386483 |
| Dilemma 2  | 0.0890799 |           0.386483 |
| Dilemma 16 | 0.13358   |           0.386483 |
| Dilemma 14 | 0.160441  |           0.386483 |
| Dilemma 11 | 0.162544  |           0.386483 |
| Dilemma 8  | 0.169086  |           0.386483 |
| Dilemma 10 | 0.301639  |           0.603278 |
| Dilemma 6  | 0.368412  |           0.638769 |
| Dilemma 7  | 0.39923   |           0.638769 |
| Dilemma 12 | 0.503154  |           0.690063 |
| Dilemma 15 | 0.517547  |           0.690063 |
| Dilemma 5  | 0.604103  |           0.743512 |
| Dilemma 13 | 0.702408  |           0.802752 |
| Dilemma 1  | 0.855336  |           0.912359 |
| Dilemma 9  | 0.972743  |           0.972743 | 



In [21]:
PerformTestWithBH(df_control, df_experimental, 'self_reporting', 2, 'kruskal')


Kruskal-Wallis test (Self Reporting)
--------------------------------------

| Self Reporting   |   p-value |   Adjusted p-value |
|:-----------------|----------:|-------------------:|
| Self Reporting 1 |  0.549653 |           0.681765 |
| Self Reporting 2 |  0.681765 |           0.681765 | 
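With exactly two groups, the Kruskal-Wallis test is equivalent to a two-sided Mann-Whitney U test that uses the tie-corrected normal approximation without continuity correction: H equals the square of the U test's z-score. The sketch below checks this on randomly generated 1-5 scores (hypothetical, not the survey responses), which can be handy if a result needs to be reported as a Mann-Whitney U test instead.

```python
import numpy as np
from scipy import stats

# Hypothetical Likert-style answers (1-5) for two groups; not the survey data.
rng = np.random.default_rng(0)
control = rng.integers(1, 6, size=112)
experimental = rng.integers(1, 6, size=113)

# Kruskal-Wallis H test, as used by PerformTestWithBH for ordinal items.
h_stat, p_kw = stats.kruskal(control, experimental)

# Two-sided Mann-Whitney U with the tie-corrected normal approximation and
# no continuity correction; its p-value should match the Kruskal-Wallis one.
u_stat, p_mw = stats.mannwhitneyu(
    control, experimental,
    alternative='two-sided', use_continuity=False, method='asymptotic',
)
print(f"Kruskal-Wallis p = {p_kw:.6f}")
print(f"Mann-Whitney U p = {p_mw:.6f}")
```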



In [22]:
PerformTestWithBH(df_control, df_experimental, 'SCES', 10, 'kruskal')


Kruskal-Wallis test (Sces)
--------------------------------------

| Sces    |   p-value |   Adjusted p-value |
|:--------|----------:|-------------------:|
| Sces 8  |  0.185532 |           0.925908 |
| Sces 1  |  0.354289 |           0.925908 |
| Sces 3  |  0.40508  |           0.925908 |
| Sces 10 |  0.521481 |           0.925908 |
| Sces 2  |  0.621003 |           0.925908 |
| Sces 5  |  0.667007 |           0.925908 |
| Sces 6  |  0.74774  |           0.925908 |
| Sces 7  |  0.88575  |           0.925908 |
| Sces 9  |  0.907431 |           0.925908 |
| Sces 4  |  0.925908 |           0.925908 | 



Below, we run the same tests on subsets of our sample to assess whether significant results emerge within the smaller groups that make up our control and experimental groups (e.g., does the ACM Code of Ethics influence students?). Keep in mind that each subgroup is smaller than the full sample, so these tests have less statistical power.
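The subgroup split can also be written with `groupby`, building one contingency table per subgroup. This is a sketch on an invented toy dataframe (column names mirror the survey data; values do not). With counts this small, the chi-square approximation is unreliable, which is one more reason to read subgroup p-values cautiously.

```python
import pandas as pd
from scipy import stats

# Hypothetical long-format data: one row per respondent (not the survey data).
toy = pd.DataFrame({
    'group': ['control'] * 6 + ['experimental'] * 6,
    'occupation': ['Student'] * 3 + ['Professional'] * 3
                  + ['Student'] * 3 + ['Professional'] * 3,
    'dilemma_1': [0, 0, 1, 1, 2, 0,
                  1, 1, 1, 0, 0, 2],
})

results = {}
for occupation, subset in toy.groupby('occupation'):
    # Rows: control/experimental; columns: response codes seen in this subgroup.
    table = pd.crosstab(subset['group'], subset['dilemma_1'])
    chi2, p, dof, _ = stats.chi2_contingency(table)
    results[occupation] = {'n': len(subset), 'dof': dof, 'p': p}
    print(f"{occupation}: n={len(subset)}, dof={dof}, p={p:.3f}")
```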

In [23]:
def PerformTestWithBHonSubGroup(
    df_control: pd.DataFrame,
    df_experimental: pd.DataFrame,
    column_name: str,
    n: int,
    test_type: str,
    group: str,
    sub_group: str
):
    """
    Perform a statistical test with Benjamini-Hochberg (BH) correction on a subgroup.

    Args:
    df_control (DataFrame): DataFrame containing control data.
    df_experimental (DataFrame): DataFrame containing experimental data.
    column_name (str): Name of the column to perform the test on (dilemma, self_reporting, or SCES).
    n (int): Number of columns to perform the test on.
    test_type (str): Type of test to perform ('chi2' for Chi-square test or 'kruskal' for Kruskal-Wallis test).
    group (str): Demographic column used to split the sample (e.g., 'age').
    sub_group (str): Value of `group` that defines the subgroup (e.g., '18-24').

    Returns:
    None. Prints a Markdown table of p-values and adjusted p-values.
    """
    # Assert that the group and subgroup are in the dataframe.
    assert group in df_control.columns, f"{group} is not in the dataframe."
    assert sub_group in df_control[group].unique(), f"{sub_group} is not in the {group} column."

    # Get the portion of the dataframe that corresponds to the subgroup.
    df_control_subgroup = df_control[df_control[group] == sub_group].reset_index(drop=True)
    df_experimental_subgroup = df_experimental[df_experimental[group] == sub_group].reset_index(drop=True)

    print(f"\n{group.title()}: {sub_group.title()}\n--------------------------------------\n")

    # Perform the test on the subgroup.
    PerformTestWithBH(df_control_subgroup, df_experimental_subgroup, column_name, n, test_type)

### Age 18-24

In [24]:
PerformTestWithBHonSubGroup(df_control, df_experimental, 'dilemma', 16, 'chi2', 'age', '18-24')
PerformTestWithBHonSubGroup(df_control, df_experimental, 'self_reporting', 2, 'kruskal', 'age', '18-24')
PerformTestWithBHonSubGroup(df_control, df_experimental, 'SCES', 10, 'kruskal', 'age', '18-24')


Age: 18-24
--------------------------------------


Chi-square test (Dilemma)
--------------------------------------

| Dilemma    |   p-value |   Adjusted p-value |
|:-----------|----------:|-------------------:|
| Dilemma 3  | 0.0601927 |           0.728416 |
| Dilemma 2  | 0.120661  |           0.728416 |
| Dilemma 10 | 0.15733   |           0.728416 |
| Dilemma 12 | 0.27635   |           0.728416 |
| Dilemma 8  | 0.329971  |           0.728416 |
| Dilemma 7  | 0.337477  |           0.728416 |
| Dilemma 15 | 0.438397  |           0.728416 |
| Dilemma 4  | 0.451415  |           0.728416 |
| Dilemma 11 | 0.454004  |           0.728416 |
| Dilemma 16 | 0.45526   |           0.728416 |
| Dilemma 14 | 0.719302  |           0.959386 |
| Dilemma 1  | 0.777419  |           0.959386 |
| Dilemma 13 | 0.937389  |           0.959386 |
| Dilemma 5  | 0.945317  |           0.959386 |
| Dilemma 6  | 0.953103  |           0.959386 |
| Dilemma 9  | 0.959386  |           0.959386 | 



### Age +24

In [25]:
PerformTestWithBHonSubGroup(df_control, df_experimental, 'dilemma', 16, 'chi2', 'age', '+24')
PerformTestWithBHonSubGroup(df_control, df_experimental, 'self_reporting', 2, 'kruskal', 'age', '+24')
PerformTestWithBHonSubGroup(df_control, df_experimental, 'SCES', 10, 'kruskal', 'age', '+24')


Age: +24
--------------------------------------


Chi-square test (Dilemma)
--------------------------------------

| Dilemma    |   p-value |   Adjusted p-value |
|:-----------|----------:|-------------------:|
| Dilemma 11 | 0.0292933 |           0.283055 |
| Dilemma 4  | 0.0478895 |           0.283055 |
| Dilemma 10 | 0.0530728 |           0.283055 |
| Dilemma 6  | 0.0964102 |           0.342688 |
| Dilemma 14 | 0.114206  |           0.342688 |
| Dilemma 16 | 0.128508  |           0.342688 |
| Dilemma 3  | 0.175571  |           0.401306 |
| Dilemma 15 | 0.25613   |           0.51226  |
| Dilemma 12 | 0.326794  |           0.580967 |
| Dilemma 1  | 0.397788  |           0.627293 |
| Dilemma 8  | 0.433469  |           0.627293 |
| Dilemma 5  | 0.47047   |           0.627293 |
| Dilemma 13 | 0.575713  |           0.693096 |
| Dilemma 2  | 0.606459  |           0.693096 |
| Dilemma 9  | 0.829413  |           0.884707 |
| Dilemma 7  | 0.922589  |           0.922589 | 



### Gender Female

In [26]:
PerformTestWithBHonSubGroup(df_control, df_experimental, 'dilemma', 16, 'chi2', 'gender', 'Female')
PerformTestWithBHonSubGroup(df_control, df_experimental, 'self_reporting', 2, 'kruskal', 'gender', 'Female')
PerformTestWithBHonSubGroup(df_control, df_experimental, 'SCES', 10, 'kruskal', 'gender', 'Female')


Gender: Female
--------------------------------------


Chi-square test (Dilemma)
--------------------------------------

| Dilemma    |   p-value |   Adjusted p-value |
|:-----------|----------:|-------------------:|
| Dilemma 11 |  0.104099 |           0.660917 |
| Dilemma 8  |  0.141    |           0.660917 |
| Dilemma 13 |  0.21352  |           0.660917 |
| Dilemma 16 |  0.229004 |           0.660917 |
| Dilemma 12 |  0.234056 |           0.660917 |
| Dilemma 9  |  0.247844 |           0.660917 |
| Dilemma 5  |  0.299917 |           0.685526 |
| Dilemma 3  |  0.351166 |           0.702332 |
| Dilemma 2  |  0.476159 |           0.78169  |
| Dilemma 6  |  0.488556 |           0.78169  |
| Dilemma 1  |  0.600987 |           0.805843 |
| Dilemma 7  |  0.604382 |           0.805843 |
| Dilemma 14 |  0.815882 |           0.996574 |
| Dilemma 4  |  0.923371 |           0.996574 |
| Dilemma 10 |  0.990994 |           0.996574 |
| Dilemma 15 |  0.996574 |           0.996574 | 



### Gender Male

In [27]:
PerformTestWithBHonSubGroup(df_control, df_experimental, 'dilemma', 16, 'chi2', 'gender', 'Male')
PerformTestWithBHonSubGroup(df_control, df_experimental, 'self_reporting', 2, 'kruskal', 'gender', 'Male')
PerformTestWithBHonSubGroup(df_control, df_experimental, 'SCES', 10, 'kruskal', 'gender', 'Male')


Gender: Male
--------------------------------------


Chi-square test (Dilemma)
--------------------------------------

| Dilemma    |   p-value |   Adjusted p-value |
|:-----------|----------:|-------------------:|
| Dilemma 4  | 0.023085  |           0.295021 |
| Dilemma 11 | 0.0405008 |           0.295021 |
| Dilemma 3  | 0.0682725 |           0.295021 |
| Dilemma 14 | 0.0737552 |           0.295021 |
| Dilemma 8  | 0.107055  |           0.342576 |
| Dilemma 2  | 0.184176  |           0.491136 |
| Dilemma 10 | 0.216763  |           0.495458 |
| Dilemma 16 | 0.367182  |           0.591057 |
| Dilemma 12 | 0.373714  |           0.591057 |
| Dilemma 7  | 0.377439  |           0.591057 |
| Dilemma 15 | 0.406352  |           0.591057 |
| Dilemma 6  | 0.483048  |           0.600095 |
| Dilemma 13 | 0.487577  |           0.600095 |
| Dilemma 5  | 0.548744  |           0.627136 |
| Dilemma 9  | 0.628582  |           0.670487 |
| Dilemma 1  | 0.991315  |           0.991315 | 



### Occupation Student

In [28]:
PerformTestWithBHonSubGroup(df_control, df_experimental, 'dilemma', 16, 'chi2', 'occupation', 'Student')
PerformTestWithBHonSubGroup(df_control, df_experimental, 'self_reporting', 2, 'kruskal', 'occupation', 'Student')
PerformTestWithBHonSubGroup(df_control, df_experimental, 'SCES', 10, 'kruskal', 'occupation', 'Student')


Occupation: Student
--------------------------------------


Chi-square test (Dilemma)
--------------------------------------

| Dilemma    |   p-value |   Adjusted p-value |
|:-----------|----------:|-------------------:|
| Dilemma 16 | 0.0818479 |           0.541649 |
| Dilemma 4  | 0.123008  |           0.541649 |
| Dilemma 8  | 0.136411  |           0.541649 |
| Dilemma 15 | 0.141157  |           0.541649 |
| Dilemma 2  | 0.169265  |           0.541649 |
| Dilemma 3  | 0.220235  |           0.587294 |
| Dilemma 7  | 0.387897  |           0.751632 |
| Dilemma 12 | 0.413486  |           0.751632 |
| Dilemma 11 | 0.427514  |           0.751632 |
| Dilemma 6  | 0.522827  |           0.751632 |
| Dilemma 9  | 0.537686  |           0.751632 |
| Dilemma 13 | 0.563724  |           0.751632 |
| Dilemma 10 | 0.672587  |           0.770304 |
| Dilemma 5  | 0.674016  |           0.770304 |
| Dilemma 1  | 0.835425  |           0.869113 |
| Dilemma 14 | 0.869113  |           0.869113 | 



### Occupation Professional

In [29]:
PerformTestWithBHonSubGroup(df_control, df_experimental, 'dilemma', 16, 'chi2', 'occupation', 'Professional')
PerformTestWithBHonSubGroup(df_control, df_experimental, 'self_reporting', 2, 'kruskal', 'occupation', 'Professional')
PerformTestWithBHonSubGroup(df_control, df_experimental, 'SCES', 10, 'kruskal', 'occupation', 'Professional')


Occupation: Professional
--------------------------------------


Chi-square test (Dilemma)
--------------------------------------

| Dilemma    |   p-value |   Adjusted p-value |
|:-----------|----------:|-------------------:|
| Dilemma 10 | 0.0229372 |           0.215277 |
| Dilemma 12 | 0.0364456 |           0.215277 |
| Dilemma 16 | 0.0687474 |           0.215277 |
| Dilemma 4  | 0.0731922 |           0.215277 |
| Dilemma 13 | 0.0803953 |           0.215277 |
| Dilemma 14 | 0.0932909 |           0.215277 |
| Dilemma 8  | 0.0941835 |           0.215277 |
| Dilemma 3  | 0.114527  |           0.229055 |
| Dilemma 6  | 0.155829  |           0.277029 |
| Dilemma 5  | 0.25942   |           0.415072 |
| Dilemma 15 | 0.312734  |           0.454886 |
| Dilemma 2  | 0.341619  |           0.455492 |
| Dilemma 11 | 0.460515  |           0.566787 |
| Dilemma 9  | 0.679468  |           0.776535 |
| Dilemma 1  | 0.781117  |           0.832743 |
| Dilemma 7  | 0.832743  |           0.832743 | 




When plotting individual graphs for each survey question, we also apply _statistical bootstrapping_ to further explore the differences between our distributions. [Bootstrapping](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)) is a resampling method that estimates the sampling distribution of a statistic (and hence a confidence interval for it) without assuming a particular form for the underlying distribution. In short, we draw 1000 resamples, with replacement, from each group, each the same size as the original group. In each iteration, we compute the mean of each resample and append it to the respective list of means. We then use these bootstrap distributions, and their 2.5th and 97.5th percentiles (a 95% percentile interval), to compare mean values between groups.
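The loop described above can be vectorized by drawing all resample indices at once. The sketch below (the helper name `bootstrap_mean` and the seed are assumptions of this example) uses a seeded `numpy.random.Generator`, so repeated runs give the same interval, whereas `np.random.choice` without a seed, as in `PlotGraphs`, gives slightly different intervals on every run. The sample values are hypothetical.

```python
import numpy as np

def bootstrap_mean(values, n_resamples=1000, seed=0):
    """Percentile bootstrap of the mean: returns (means, (low, high))."""
    rng = np.random.default_rng(seed)  # seeded so reruns give the same interval
    values = np.asarray(values, dtype=float)
    # Each row of `idx` selects one resample of the same size as the sample.
    idx = rng.integers(0, values.size, size=(n_resamples, values.size))
    means = values[idx].mean(axis=1)
    low, high = np.percentile(means, [2.5, 97.5])
    return means, (low, high)

# Hypothetical responses coded 0/1/2; not the survey data.
sample = np.array([0, 1, 1, 0, 2, 1, 0, 0, 1, 1])
means, (low, high) = bootstrap_mean(sample)
print(f"observed mean = {sample.mean():.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

For the dilemmas, whose codes are nominal, the mean of the `0/1/2` codes mixes option frequencies with an arbitrary code order; bootstrapping the mean of an indicator such as `sample == 1` instead yields a confidence interval for the proportion of respondents choosing option `1`.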

In [30]:
def PlotGraphs(column_name: str, n: int):
    """
    Perform plotting of distribution and bootstrap mean distribution.

    Args:
    column_name (str): Name of the column to plot.
    n (int): Question number (1-16 for dilemmas, 1-2 for self-reporting, 1-10 for SCES).

    Returns:
    None. Prints the distribution plots and bootstrap results.
    """

    # Validating inputs
    assert column_name in ["dilemma", "self_reporting", "SCES"], f"{column_name} is not a valid column name."

    if column_name == "dilemma":
        assert 1 <= n <= 16, f"{n} is not a valid dilemma number."
        name = f"dilemma_{n}"
    elif column_name == "self_reporting":
        assert 1 <= n <= 2, f"{n} is not a valid self-assessment number."
        name = f"self_reporting_{n}"
    elif column_name == "SCES":
        assert 1 <= n <= 10, f"{n} is not a valid SCES number."
        name = f"SCES_{n}"

    # Plotting distribution
    plt.figure(figsize=(10, 6))

    # Get unique responses and their counts for control and experimental groups
    unique_control, counts_control = np.unique(df_control[name], return_counts=True)
    unique_experimental, counts_experimental = np.unique(df_experimental[name], return_counts=True)

    all_responses = np.union1d(unique_control, unique_experimental)
    positions = np.arange(len(all_responses))
    bar_width = 0.35

    # Plotting bars
    plt.bar(positions - bar_width/2,
            [counts_control[np.where(unique_control == response)[0][0]] if response in unique_control else 0 for response in all_responses],
            bar_width, color='skyblue', edgecolor='black', label='Control')
    plt.bar(positions + bar_width/2,
            [counts_experimental[np.where(unique_experimental == response)[0][0]] if response in unique_experimental else 0 for response in all_responses],
            bar_width, color='lightcoral', edgecolor='black', label='Experimental')

    # Adding labels and legend
    plt.title(f'Distribution of Responses ({name})')
    plt.xlabel('Response')
    plt.ylabel('Count')
    plt.xticks(positions, all_responses)
    plt.legend()
    plt.show()

    # Tabulate response counts and percentages (overall, control, and experimental).
    temp_df = pd.DataFrame({
        'all': df[name].value_counts().values,
        '%': df[name].value_counts(normalize=True).values * 100
    }, index=df[name].value_counts().index)

    temp_df_control = pd.DataFrame({
        'control': df_control[name].value_counts().values,
        '%': df_control[name].value_counts(normalize=True).values * 100
    }, index=df_control[name].value_counts().index)

    temp_df_experimental = pd.DataFrame({
        'experimental': df_experimental[name].value_counts().values,
        '%': df_experimental[name].value_counts(normalize=True).values * 100
    }, index=df_experimental[name].value_counts().index)

    result_df = pd.concat([temp_df, temp_df_control, temp_df_experimental], axis=1).fillna(0)
    result_df.index.name = "Results"
    result_df['%'] = result_df['%'].round(2)

    print(f"\nResults\n------------------------\n")
    print(result_df.to_markdown())

    # Bootstrap sampling
    control_values = df_control[name].values
    experimental_values = df_experimental[name].values
    
    control_means = []
    experimental_means = []

    # Perform 1000 iterations of bootstrap sampling
    for i in range(1000):
        control_sample = np.random.choice(control_values, size=len(control_values), replace=True)
        experimental_sample = np.random.choice(experimental_values, size=len(experimental_values), replace=True)
        control_means.append(control_sample.mean())
        experimental_means.append(experimental_sample.mean())

    control_mean, control_confidence_interval = np.mean(control_means), np.percentile(control_means, [2.5, 97.5])
    experimental_mean, experimental_confidence_interval = np.mean(experimental_means), np.percentile(experimental_means, [2.5, 97.5])

    # Spread of the bootstrap means, i.e., the bootstrap estimate of the standard error of the mean.
    control_standard_deviation, experimental_standard_deviation = np.std(control_means), np.std(experimental_means)

    # Plotting bootstrap mean distribution
    plt.figure(figsize=(10, 6))
    plt.hist(control_means, bins=30, color='skyblue', alpha=0.7, label='Control')
    plt.hist(experimental_means, bins=30, color='lightcoral', alpha=0.7, label='Experimental')
    plt.axvline(control_mean, color='blue', linestyle='dashed', linewidth=2, label='Control Mean')
    plt.axvline(experimental_mean, color='red', linestyle='dashed', linewidth=2, label='Experimental Mean')
    plt.xlabel('Mean')
    plt.ylabel('Frequency')
    plt.legend()
    plt.show()

    bootstrap_df = pd.DataFrame({
        "Group": ["Control", "Experimental"],
        "Mean": [control_mean, experimental_mean],
        "2.5% CI": [control_confidence_interval[0], experimental_confidence_interval[0]],
        "97.5% CI": [control_confidence_interval[1], experimental_confidence_interval[1]],
        "Standard Deviation": [control_standard_deviation, experimental_standard_deviation]
    })

    print("\nBootstrap Sampling\n------------------------\n")
    print(bootstrap_df.to_markdown(index=False))
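The loop above draws its resamples from NumPy's global random state, so the reported intervals shift slightly every time the notebook runs. Below is a minimal sketch of the same percentile bootstrap, written in vectorized form with a seeded generator so the results are reproducible. It is not the notebook's own code. `bootstrap_means` and `summarize_bootstrap` are hypothetical helper names, and `toy_responses` is an illustrative list, not study data.

```python
import numpy as np


def bootstrap_means(values, n_boot=1000, seed=None):
    """Return `n_boot` bootstrap means of `values`, resampled with replacement."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Draw every resample at once: an (n_boot, n) matrix of random row indices.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    return values[idx].mean(axis=1)


def summarize_bootstrap(boot_means):
    """Mean, 95% percentile interval, and bootstrap standard error."""
    low, high = np.percentile(boot_means, [2.5, 97.5])
    return {
        "mean": float(np.mean(boot_means)),
        "ci_low": float(low),
        "ci_high": float(high),
        # Same estimator as np.std(...) in the cell above (ddof=0).
        "se": float(np.std(boot_means)),
    }


# Hypothetical dilemma responses (0, 1, 2); not taken from control.csv.
toy_responses = [0, 0, 1, 0, 2, 0, 1, 0]
boot = bootstrap_means(toy_responses, n_boot=2000, seed=0)
print(summarize_bootstrap(boot))
```

Drawing a full `(n_boot, n)` index matrix trades a little memory for speed, and the cost is negligible at this sample size. Fixing `seed` makes the 95% interval and the bootstrap standard error identical across runs.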


### Ethical Dilemma 1 (**Electromagnetic Trouble**)

![Someone is typing in with their right hand on a keyboard in front of a computer monitor while holding a smartphone.](https://raw.githubusercontent.com/Nkluge-correa/codes-of-ethics/main/img/Dilemma-01.gif)

The manufacturing department has developed a new technology that enables a feature users have requested for a long time. Unfortunately, the technology cannot operate without causing the electromagnetic fields from cell phone towers to exceed legal limits. Although this feature has not been feasible for the past few years, competitors will likely beat you to market if you wait until the technology can stay within the legal limits. One manager suggests changing the devices' software so that it detects when regulatory tests are taking place and modifies the devices' behavior to keep the electromagnetic field from increasing.

What do you do?

0. Wait until the technology is available, without test detection software, and risk being beaten to market
1. Begin development of the test detection software
2. Indecisive

In [None]:
PlotGraphs("dilemma", 1)

### Ethical Dilemma 2 (**Home Office**)

![Hands are typing on a keyboard in front of a computer monitor.](https://raw.githubusercontent.com/Nkluge-correa/codes-of-ethics/main/img/Dilemma-02.gif)

A deadline is fast approaching for a project you are working on. You realize that you cannot meet the deadline if you work only during normal hours. You are not allowed to take your computer out of the office.

What do you do?

0. Stay at work longer in order to continue development
1. Download the data onto a personal hard drive so you can continue development at home
2. Indecisive

In [None]:
PlotGraphs("dilemma", 2)

### Ethical Dilemma 3 (**Tough Competition**)

![Someone is removing the screen of a computer to see the program below the user interface.](https://raw.githubusercontent.com/Nkluge-correa/codes-of-ethics/main/img/Dilemma-03.gif)

Your company is struggling to enter a lucrative market dominated by a single competitor. While trying to figure out how to import data from that competitor's website, you discover a serious vulnerability that would allow an exploiter to easily access all of the competitor's customer information.

What do you do with the bug information?

0. Report the information to the competitor
1. Say nothing about the bug to the competitor
2. Indecisive

In [None]:
PlotGraphs("dilemma", 3)

### Ethical Dilemma 4 (**Advertising Opportunities**)

![Someone is putting user information in a folder. A little person shows up on the screen, offering money.](https://raw.githubusercontent.com/Nkluge-correa/codes-of-ethics/main/img/Dilemma-04.gif)

Your company has been collecting anonymous usage statistics for your products for many years, but it has recently struggled to acquire new users, leading it to consider scaling back operations. Seeing your company struggle and knowing the value of your customers' data, an advertising company approaches you, seeking to use your company's user data to improve its advertising recommendations. Your privacy policy does not explicitly mention selling user data to others, but refusing this offer could result in employees being fired. You are responsible for this decision.

What do you do?

0. Refuse the advertising company's offer
1. Sign a contract with the advertising company
2. Indecisive

In [None]:
PlotGraphs("dilemma", 4)

### Ethical Dilemma 5 (**Shutdown Button**)

![Someone is connecting and disconnecting the service of a node that represents a hospital.](https://raw.githubusercontent.com/Nkluge-correa/codes-of-ethics/main/img/Dilemma-05.gif)

To protect against unauthorized use of your software, your company has incorporated an automatic shutdown switch that prevents it from running after a specific period. An intensive care unit at the local hospital started using your software a year ago. The unit has not paid any of its bills, and the shutdown switch is about to trigger. If the switch remains in place, the hospital will not be able to function because critical equipment will be disabled. You can remove the switch for the hospital.

What do you do?

0. Edit the software to remove the off switch
1. Do nothing, allowing the switch to remain operative
2. Indecisive

In [None]:
PlotGraphs("dilemma", 5)

### Ethical Dilemma 6 (**Known Flaws**)

![Someone is holding a contract while looking at a computer monitor where a monster appears behind the code in the text editor.](https://raw.githubusercontent.com/Nkluge-correa/codes-of-ethics/main/img/Dilemma-06.gif)

While reviewing a software specification that your company has just been contracted to create, your team discovers a major flaw that could potentially affect the customer. Your company has spent the last year trying to negotiate this lucrative contract. Your managers do not want to tell the customer about it because it could further prolong the negotiations.

What do you do?

0. Inform the customer about the issue
1. Do not tell the customer about the issue
2. Indecisive

In [None]:
PlotGraphs("dilemma", 6)

### Ethical Dilemma 7 (**Criticality**)

![Someone opens their computer screen, revealing literal bugs underneath the code editor.](https://raw.githubusercontent.com/Nkluge-correa/codes-of-ethics/main/img/Dilemma-07.gif)

You are on a team tasked with maintaining critical software for a customer's financial system. During testing, you discover a critical bug that has been present for a long time. Although you fix it, your manager does not want to inform the customer, fearing that the customer might doubt your company's competence.

What do you do?

0. Inform the customer about the bug
1. Don't tell the customer about the bug
2. Indecisive

In [None]:
PlotGraphs("dilemma", 7)

### Ethical Dilemma 8 (**Business Opportunities**)

![Someone is working on their computer as a man appears on the screen offering money.](https://raw.githubusercontent.com/Nkluge-correa/codes-of-ethics/main/img/Dilemma-08.gif)

You have been your company's point of contact for all projects related to a specific client. One day, you receive a message at your personal email address from that client, asking to hire you as a paid contractor on a project completely unrelated to any of the work they have previously requested from the company.

What do you do?

0. Notify your manager about the request
1. Accept the client's work
2. Indecisive

In [None]:
PlotGraphs("dilemma", 8)

### Ethical Dilemma 9 (**Disruptive Software**)

![Someone is working on their computer while a man appears on the screen and gives the worker a stern and suspicious look.](https://raw.githubusercontent.com/Nkluge-correa/codes-of-ethics/main/img/Dilemma-09.gif)

Rumor has it that your former employer is leading the development of a new software product that could be an industry breakthrough. On the first morning of your third week at your current job, you receive the following memo from the president: "Please meet me tomorrow at 8:15 to discuss the developments that your former employer has made in this new area."

What do you do?

0. Tell the president that you will not discuss the matter
1. Meet with the president knowing the purpose of the discussion
2. Indecisive

In [None]:
PlotGraphs("dilemma", 9)

### Ethical Dilemma 10 (**Going to Court**)

![Someone is working on their computer, scrolling the monitor to find broken code.](https://raw.githubusercontent.com/Nkluge-correa/codes-of-ethics/main/img/Dilemma-10.gif)

The company is currently being sued by a customer who claims to have been injured by one of the company's products. During your development tasks, you discover a piece of the company's open-source code that has not been reviewed in years, and you realize it contains information that could affect the customer's personal injury claim. A large sum of money is at stake, and the company is currently in a good position to win the case.

What do you do?

0. Do not reveal the information to the customer
1. Disclose the information to the customer
2. Indecisive

In [None]:
PlotGraphs("dilemma", 10)

### Ethical Dilemma 11 (**The Mini-Crisis**)

![Someone is working on their computer as the icon of a telephone (a phone call) appears on the screen.](https://raw.githubusercontent.com/Nkluge-correa/codes-of-ethics/main/img/Dilemma-11.gif)

As you are leaving your office for an extremely important meeting with a potential new client, you hear the administrative assistant say, "If John calls, please make sure he calls home; his spouse says there is a mini-crisis." You are to meet John at the client's office, where the two of you will give a presentation (pitch). John's participation is crucial. John gets quite nervous and often makes a bad impression when distracted.

What do you do?

0. Do not tell John about the situation before the meeting
1. Tell John about the situation before the meeting
2. Indecisive

In [None]:
PlotGraphs("dilemma", 11)

### Ethical Dilemma 12 (**Equalized Outcomes**)

![Someone is working on their computer, zooming in and out of different graphs and charts.](https://raw.githubusercontent.com/Nkluge-correa/codes-of-ethics/main/img/Dilemma-12.gif)

Machine learning models perform the tasks they are trained for. While developing an autonomous learning system, you notice an inaccuracy that results in discrimination against a group representing 6.2% of the population to which the model will be applied. However, the company's CEO and COO have already approved the system because of its good performance results.

What do you do?

0. On your own initiative, point out the model's lack of accuracy when presenting it
1. Since the decision making is not yours, you deliver the model as approved
2. Indecisive

In [None]:
PlotGraphs("dilemma", 12)

### Ethical Dilemma 13 (**Health Assistant**)

![Someone is working on their computer while two persons pop in and out of view. One is sad, and another is happy.](https://raw.githubusercontent.com/Nkluge-correa/codes-of-ethics/main/img/Dilemma-13.gif)

You have the opportunity to create a next-generation model that will help promote and preserve the mental health of millions of people and may even cure milder mental disorders. However, the model will harm a smaller number of individuals with serious disorders (e.g., suicidal tendencies or an overwhelming fear of social interaction), since their difficulty with social interaction could be aggravated.

What do you do?

0. Do not develop the model since, even though it would help humankind considerably, it may pose risks to the lives of a small portion of the population
1. Develop the model, since it will help millions of people and fewer individuals would be harmed
2. Indecisive

In [None]:
PlotGraphs("dilemma", 13)

### Ethical Dilemma 14 (**Improving the Hash**)

![Someone is working on their computer disconnects a service that leaves many people unhappy. Their faces show on the monitor.](https://raw.githubusercontent.com/Nkluge-correa/codes-of-ethics/main/img/Dilemma-14.gif)

You were hired to enhance the appointment scheduling software for a medical clinic. During development, you learn that the personal data collected is protected by an insecure hash function (SHA-1). You can enhance the system by switching to a more robust hash function (SHA-256). However, this does not change the fact that the personal data of the clinic's clients was relatively unprotected while the system was in use. Upgrading the protocol will require a development and testing period of approximately one month, during which the clinic's data will remain vulnerable.

What do you do?

0. Advise the clinic's managers to notify their clients and to take the application offline temporarily
1. Propose upgrading the hash function (SHA-1 to SHA-256) over the next month without disclosing this to the clinic's clients
2. Indecisive

In [None]:
PlotGraphs("dilemma", 14)

### Ethical Dilemma 15 (**Factory Robot**)

![Someone is working on their computer as a robot keeps bumping at the right side of the inner screen.](https://raw.githubusercontent.com/Nkluge-correa/codes-of-ethics/main/img/Dilemma-15.gif)

You have developed an AI-based motion program for a factory robot that transports heavy materials. During two months of test operation, no abnormalities were identified. One month after the test period (with the robot now in production), the robot runs over a pregnant employee, killing her. The technical report indicates problems with the robot's sensor calibration program, and this poor calibration originates in the source code and data used during testing.

What do you do?

0. Take no responsibility for the accident
1. Take responsibility for the accident
2. Indecisive

In [None]:
PlotGraphs("dilemma", 15)

### Ethical Dilemma 16 (**Risk Mitigation**)

![Someone is working on their computer while different types of sensitive user information (geolocation) show on the screen.](https://raw.githubusercontent.com/Nkluge-correa/codes-of-ethics/main/img/Dilemma-16.gif)

Your team has been working on a mobile application that tracks the user's location to provide personalized recommendations for nearby restaurants and events. However, your team recently discovered a way to collect additional data points, such as the user's contacts, browsing history, and social media activity, which could greatly enhance the app's functionality and revenue potential.

What do you do?

0. Advocate for transparently informing the users about the additional data collection and asking for their explicit consent
1. Refuse to collect the additional data, citing privacy concerns and potential backlash from the public
2. Indecisive

In [None]:
PlotGraphs("dilemma", 16)

## Analyzing Self-Reporting Questions

Now, we will analyze the responses to the self-reporting questions.

### Self-Reporting Question 1

From "Insufficient" to "Extensive", how would you describe your ethical training in dealing with situations of the kind experienced in the dilemmas presented in our survey?

1. Insufficient
2. Below Average
3. Average
4. Above Average
5. Extensive

In [None]:
PlotGraphs("self_reporting", 1)

### Self-Reporting Question 2

Regarding the following statement: "Codes of Ethics have a necessary importance in the practice of IT-professions.", indicate your level of agreement.

1. I strongly disagree
2. I disagree
3. Nothing to state
4. I agree
5. I strongly agree

In [None]:
PlotGraphs("self_reporting", 2)
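The summary at the end of this section reports two shares: participants with below-average training ($\approx$ 18%) and participants who agree that CoEs matter (> 90%). Below is a minimal sketch of how such shares could be computed from the 1-5 Likert responses. `share_in_range` is a hypothetical helper. Several details are assumptions:

- The column name `self_reporting_1` comes from the top of the notebook.
- `self_reporting_2` is assumed to follow the same naming pattern.
- Pooling both groups is also an assumption.
- The toy list is illustrative, not study data.

```python
import pandas as pd


def share_in_range(responses, low, high):
    """Percentage of non-missing Likert responses with low <= value <= high."""
    s = pd.Series(responses).dropna()
    return 100 * s.between(low, high).mean()


# Hypothetical 1-5 responses, not study data.
toy = [1, 2, 3, 3, 4, 5, 4, 2, 3, 4]
below_average = share_in_range(toy, 1, 2)  # "Insufficient" or "Below Average"
agree = share_in_range(toy, 4, 5)          # "I agree" or "I strongly agree"
print(f"{below_average:.1f}% below average, {agree:.1f}% agree")

# With the study data (assumed column names), both groups could be pooled:
# pooled = pd.concat([df_control, df_experimental], ignore_index=True)
# share_in_range(pooled["self_reporting_1"], 1, 2)
# share_in_range(pooled["self_reporting_2"], 4, 5)
```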

## Analyzing Santa Clara Ethics Scale

Now, we will analyze the responses to the [Santa Clara Ethics Scale](https://link.springer.com/article/10.1007/s11089-019-00861-w) (SCES).

### SCES Question 1

Respecting others, even those who I don't like or agree with, is very important to me.

1. I strongly disagree
2. I disagree
3. Nothing to state
4. I agree
5. I strongly agree

In [None]:
PlotGraphs("SCES", 1)

### SCES Question 2

Being responsible and accountable, even when I have to admit that I'm wrong or have errored, is very important to me.

1. I strongly disagree
2. I disagree
3. Nothing to state
4. I agree
5. I strongly agree

In [None]:
PlotGraphs("SCES", 2)

### SCES Question 3

Being honest, fair, and maintaining integrity, even when it might put me at a disadvantage, is very important to me.

1. I strongly disagree
2. I disagree
3. Nothing to state
4. I agree
5. I strongly agree

In [None]:
PlotGraphs("SCES", 3)

### SCES Question 4

I strive to be competent in my areas of personal or professional expertise and am the first to admit it when I am not and have fallen short.

1. I strongly disagree
2. I disagree
3. Nothing to state
4. I agree
5. I strongly agree

In [None]:
PlotGraphs("SCES", 4)

### SCES Question 5

I feel a great deal of compassion for others, even those whom I don't know or have few things in common with.

1. I strongly disagree
2. I disagree
3. Nothing to state
4. I agree
5. I strongly agree

In [None]:
PlotGraphs("SCES", 5)

### SCES Question 6

I have clear ethical guiding principles that I keep in mind and follow at all times.

1. I strongly disagree
2. I disagree
3. Nothing to state
4. I agree
5. I strongly agree

In [None]:
PlotGraphs("SCES", 6)

### SCES Question 7

It is more important for me to behave ethically than to get an advantage in life.

1. I strongly disagree
2. I disagree
3. Nothing to state
4. I agree
5. I strongly agree

In [None]:
PlotGraphs("SCES", 7)

### SCES Question 8

I never take advantage of others and am truthful in my relationships and interactions even when it might put me at a disadvantage.

1. I strongly disagree
2. I disagree
3. Nothing to state
4. I agree
5. I strongly agree

In [None]:
PlotGraphs("SCES", 8)

### SCES Question 9

I would not be embarrassed if all of my actions were filmed and played back for others to see and evaluate.

1. I strongly disagree
2. I disagree
3. Nothing to state
4. I agree
5. I strongly agree

In [None]:
PlotGraphs("SCES", 9)

### SCES Question 10

I typically ask myself what the right thing to do is from an ethical or moral perspective before making decisions.

1. I strongly disagree
2. I disagree
3. Nothing to state
4. I agree
5. I strongly agree

In [None]:
PlotGraphs("SCES", 10)
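The SCES items above are ordinal 1-5 ratings. The test that `PlotGraphs` applies is defined earlier in the notebook and is not reproduced here. One hedged option for comparing two independent groups on an ordinal item is a two-sided Mann-Whitney U test (`scipy.stats.mannwhitneyu`), sketched below. `compare_ordinal` is a hypothetical helper, and all toy samples are illustrative, not study data.

```python
from scipy import stats


def compare_ordinal(control, experimental, alpha=0.05):
    """Two-sided Mann-Whitney U test between two groups of ordinal ratings."""
    result = stats.mannwhitneyu(control, experimental, alternative="two-sided")
    # A p-value at or above alpha means we fail to reject H0 (no detectable difference).
    return result.statistic, result.pvalue, result.pvalue < alpha


# Hypothetical SCES ratings (1-5), not study data.
similar_c = [4, 5, 4, 3, 5, 4, 4, 5, 3, 4]
similar_e = [4, 4, 5, 3, 5, 4, 5, 4, 3, 4]
shifted_e = [1, 2, 2, 1, 2, 1, 2, 1, 2, 1]

print(compare_ordinal(similar_c, similar_e))  # same distribution of ratings
print(compare_ordinal(similar_c, shifted_e))  # clearly lower ratings
```

Because the test uses ranks, it does not treat the distance between "I agree" and "I strongly agree" as a meaningful number. That property is why it is often preferred over a t-test for single Likert items.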

Summarizing our findings and answering the research questions posed in the introduction, our results show that:

- **Can passive exposure to a CoE influence the decision-making of IT professionals and students?** We failed to reject $H_0$ for all 16 ethical dilemmas. Our 2018 ACM CoE intervention did not produce a statistically significant difference between the decision-making patterns of groups C (control) and E (experimental), regardless of the type of subgroup comparison performed.
- **Is ethical training an active topic in their academic or professional careers?** Most participants self-report average or above-average ethical training. A minority of our sample ($\approx 18\%$) reports below-average ethical training during their education.
- **What is the importance IT professionals and students attribute to CoEs?** Participants attribute considerable importance to the existence and use of codes of ethics: most of our sample ($> 90\%$) agree that CoEs are important for IT practice.
- **Can passive exposure to a CoE influence how IT professionals and students perceive their moral behavior?** We failed to reject $H_0$ for moral self-assessment. Our 2018 ACM CoE intervention did not produce a statistically significant difference between groups C and E in how they perceive their day-to-day ethical practices, regardless of the type of subgroup comparison performed.
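Each dilemma answer is a nominal category (0, 1, or 2, where 2 is "Indecisive"). "Failing to reject $H_0$" can therefore be illustrated with a chi-squared test of independence between group and response. This is a hedged sketch, not necessarily the test the study used:

- `chi2_group_vs_response` is a hypothetical helper.
- The toy answers are illustrative rather than study data.
- Column names such as `dilemma_1` are assumed from the `PlotGraphs("dilemma", 1)` calls.

```python
import pandas as pd
from scipy import stats


def chi2_group_vs_response(control, experimental):
    """Chi-squared test of independence between group (C/E) and answer category."""
    data = pd.DataFrame({
        "group": ["C"] * len(control) + ["E"] * len(experimental),
        "response": list(control) + list(experimental),
    })
    # Rows: groups; columns: answer categories observed in either group.
    table = pd.crosstab(data["group"], data["response"])
    chi2, p, dof, expected = stats.chi2_contingency(table)
    return chi2, p, dof, table


# Hypothetical answers to one dilemma (0, 1, 2), not study data.
toy_c = [0] * 20 + [1] * 5 + [2] * 3
toy_e = [0] * 21 + [1] * 4 + [2] * 3
chi2, p, dof, table = chi2_group_vs_response(toy_c, toy_e)
print(table)
print(f"chi2={chi2:.3f}, dof={dof}, p={p:.3f}")

# With the study data (assumed column name), e.g.:
# chi2_group_vs_response(df_control["dilemma_1"], df_experimental["dilemma_1"])
```

`chi2_contingency` relies on a large-sample approximation. A common rule of thumb is to distrust it when expected counts fall below about 5, as for the "Indecisive" column in this toy table. In that case, an exact or permutation test is a safer choice.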