In [1]:
import pandas as pd
from collections import defaultdict

def open_file(filename):
    # Read the CSV file
    data = pd.read_csv(filename)
    return data

data = open_file('dataset.csv')
data.head()

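| Unnamed: 0 | Patient | S1 | S2 | S3 | Treatment |
|---|---|---|---|---|---|
| 0 | P 1 | 1 | 1 | 1 | Yes |
| 1 | P 2 | 1 | 1 | 0 | Yes |
| 2 | P 3 | 1 | 0 | 1 | Yes |
| 3 | P 4 | 0 | 1 | 1 | Yes |
| 4 | P 5 | 1 | 1 | 0 | Yes |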


## Creating Equivalence Classes
The equivalence classes are formed from the values of S1, S2, and S3: patients with identical values on all three attributes belong to the same class. For this dataset, the classes are: <br>
X1 = {P 1}<br>
X2 = {P 2, P 5}<br>
X3 = {P 3, P 10, P 15, P 16, P 20}<br>
X4 = {P 4, P 6, P 9, P 11}<br>
X5 = {P 7, P 17, P 21, P 27}<br>
X6 = {P 8, P 22, P 23}<br>
X7 = {P 12, P 13, P 18, P 24, P 26}<br>
X8 = {P 14, P 19, P 25}<br>

#1. The code between the `# 1` markers is the core of the `create_equivalence_classes` function, which takes a dataset as input and groups patients into equivalence classes by the values of the 'S1', 'S2', and 'S3' columns. It works as follows:

1. `for index, row in dataset.iterrows():`: This iterates over each row in the dataset, where `index` is the index of the row, and `row` is a Series containing the data in that row.

2. `key = (row['S1'], row['S2'], row['S3'])`: For each row, it extracts the values in the 'S1', 'S2', and 'S3' columns and creates a tuple called `key`. This tuple serves as the key for the equivalence class dictionary.

3. `equivalence_classes[key].append(row['Patient'])`: It appends the value of the 'Patient' column for the current row to the list associated with `key` in the `equivalence_classes` dictionary. Because `equivalence_classes` is a `defaultdict(list)`, a `key` that is not yet present is created with an empty list on first access, and the 'Patient' value is then appended to that list.

4. Finally, `return equivalence_classes`: It returns the `equivalence_classes` dictionary containing the created equivalence classes, where each key is a tuple of 'S1', 'S2', and 'S3' values, and the corresponding value is a list of 'Patient' values belonging to that equivalence class.

In summary, this logic iterates through each row in the dataset, groups the rows based on the values of the 'S1', 'S2', and 'S3' columns, and creates equivalence classes with the 'Patient' values belonging to each group.


In [2]:
# Define a function to create equivalence classes
def create_equivalence_classes(dataset):
    # Define a dictionary to store equivalence classes
    equivalence_classes = defaultdict(list)

    # 1
    # Group data by S1, S2, S3 values and create equivalence classes
    for index, row in dataset.iterrows():
        key = (row['S1'], row['S2'], row['S3'])
        equivalence_classes[key].append(row['Patient'])

    return equivalence_classes
    # 1
    
# Call the function and get the equivalence classes
equivalence_classes = create_equivalence_classes(data)

# Print the equivalence classes
print("Equivalence Classes:")
for i, (key, patients) in enumerate(equivalence_classes.items(), start=1):
    class_name = f"X{i}"
    patients_str = ", ".join(patients)
    print(f"{class_name} = {{{patients_str}}}")

Equivalence Classes:
X1 = {P 1}
X2 = {P 2, P 5}
X3 = {P 3, P 10, P 15, P 16, P 20}
X4 = {P 4, P 6, P 9, P 11}
X5 = {P 7, P 17, P 21, P 27}
X6 = {P 8, P 22, P 23}
X7 = {P 12, P 13, P 18, P 24, P 26}
X8 = {P 14, P 19, P 25}
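
The same grouping can also be expressed with pandas' `groupby`, which avoids the row-by-row loop. The following is a minimal sketch, not the notebook's method: the frame `toy` and the name `classes_by_key` are illustrative, and the rows are the five shown by `data.head()` above rather than the full `dataset.csv`.

```python
import pandas as pd

# Illustrative frame: the five rows displayed by data.head(), not the full dataset.
toy = pd.DataFrame({
    "Patient": ["P 1", "P 2", "P 3", "P 4", "P 5"],
    "S1": [1, 1, 1, 0, 1],
    "S2": [1, 1, 0, 1, 1],
    "S3": [1, 0, 1, 1, 0],
})

# sort=False keeps the groups in order of first appearance,
# which matches the ordering produced by the loop-based version.
grouped = toy.groupby(["S1", "S2", "S3"], sort=False)["Patient"].apply(list)

# Convert to a plain dict: (S1, S2, S3) tuple -> list of patients.
classes_by_key = grouped.to_dict()

for i, (key, patients) in enumerate(classes_by_key.items(), start=1):
    print(f"X{i} = {{{', '.join(patients)}}}")
```

On these five rows the sketch produces four classes, with P 2 and P 5 sharing one because their S1, S2, and S3 values are identical.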


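The conditional probability of the target concept \( C \) (Treatment = Yes) given an equivalence class \( X_i \) is the fraction of patients in \( X_i \) who received treatment:

$$ P(C \mid X_i) = \frac{|\{\text{Treatment} = \text{Yes}\} \cap X_i|}{|X_i|} $$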
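
As a worked instance of this formula, take the class X3 = {P 3, P 10, P 15, P 16, P 20}. It contains five patients, and the results printed at the end of this notebook report a probability of 0.8 for it, which corresponds to four of the five patients having Treatment = Yes:

$$ P(C \mid X_3) = \frac{4}{5} = 0.8 $$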

#2. The code between the `# 2` markers in the `calculate_probabilities` function works as follows:

1. `for i, (key, patients) in enumerate(equivalence_classes.items(), start=1):`:
   - This loop iterates over each equivalence class (denoted by `key`) along with its corresponding list of patients (denoted by `patients`) in the `equivalence_classes` dictionary.
   - The `enumerate()` function is used to iterate over the items in `equivalence_classes`, providing an index `i` for each item, starting from 1.

2. `total_count = len(patients)`: 
   - Calculates the total count of patients in the current equivalence class `X_i` by finding the length of the `patients` list.

3. `treatment_counts = dataset.loc[dataset['Patient'].isin(patients), 'Treatment'].value_counts()`:
   - Filters the dataset to include only the rows where the patient ID is in the current `patients` list (i.e., in the current equivalence class `X_i`).
   - Then, it counts the occurrences of each unique value in the 'Treatment' column for these filtered rows.
   - This provides the counts of 'Yes' and 'No' treatments for the patients in the current equivalence class.

4. `if 'Yes' in treatment_counts:`:
   - Checks if there are any occurrences of 'Yes' treatment in the `treatment_counts`.
   - If 'Yes' treatment is found:
     - `yes_count = treatment_counts['Yes']` calculates the count of patients in the equivalence class `X_i` who received 'Yes' treatment.
     - `probability = yes_count / total_count` calculates the probability \( P(\text{Treatment = Yes} | X_i) \), which is the ratio of the count of patients with 'Yes' treatment to the total count of patients in `X_i`.
   - If 'Yes' treatment is not found, `probability` is set to 0.

5. `class_name = f"X{i}"`:
   - Constructs the name of the current equivalence class `X_i` using the index `i`.

6. `class_info = (class_name, probability, patients)`:
   - Constructs a tuple `class_info` containing the name of the equivalence class `X_i`, the calculated probability \( P(\text{Treatment = Yes} | X_i) \), and the list of patients in `X_i`.

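This logic calculates the probability of 'Yes' treatment for each equivalence class `X_i` from the patients' treatment outcomes and stores it, together with the class name and patient list, in the tuple `class_info`. The rest of the loop then places each class in one of three lists: negative if the probability is at most 0.25, positive if it is at least 0.75, and uncertain otherwise.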
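
The per-class probability can also be computed in one step, since the mean of a boolean "is Yes" column within a group equals the fraction of 'Yes' rows. The sketch below illustrates this on a small made-up frame; the data and the names `toy` and `prob_by_key` are assumptions for illustration and do not come from `dataset.csv`.

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({
    "Patient": ["A", "B", "C", "D", "E", "F"],
    "S1": [1, 1, 1, 1, 0, 0],
    "S2": [0, 0, 0, 0, 1, 1],
    "S3": [1, 1, 1, 1, 0, 0],
    "Treatment": ["Yes", "Yes", "Yes", "No", "No", "No"],
})

# Boolean column: True where the patient received treatment.
toy = toy.assign(is_yes=toy["Treatment"].eq("Yes"))

# The mean of a boolean column is the fraction of True values,
# i.e. P(Treatment = Yes | X_i) for each (S1, S2, S3) group.
probabilities = toy.groupby(["S1", "S2", "S3"], sort=False)["is_yes"].mean()

# Plain dict: (S1, S2, S3) tuple -> probability.
prob_by_key = probabilities.to_dict()
print(prob_by_key)
```

A class with no 'Yes' rows gets a mean of 0.0 here, so no separate `if 'Yes' in ...` check is needed in this formulation.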


In [3]:
def calculate_probabilities(equivalence_classes, dataset):
    positive_classes = []
    negative_classes = []
    uncertain_classes = []
    
    # 2
    for i, (key, patients) in enumerate(equivalence_classes.items(), start=1):
        total_count = len(patients)  # Total count of patients in the equivalence class X_i
        treatment_counts = dataset.loc[dataset['Patient'].isin(patients), 'Treatment'].value_counts()

        if 'Yes' in treatment_counts:
            yes_count = treatment_counts['Yes']  # Count of patients in the equivalence class X_i with 'Yes' treatment
            probability = yes_count / total_count  # Probability P(Treatment = Yes | X_i)
        else:
            probability = 0

        class_name = f"X{i}"
        class_info = (class_name, probability, patients)
        # 2

        if probability <= 0.25:
            negative_classes.append(class_info)  # Append X_i to negative_classes if P(Treatment = Yes | X_i) <= 0.25
        elif probability >= 0.75:
            positive_classes.append(class_info)  # Append X_i to positive_classes if P(Treatment = Yes | X_i) >= 0.75
        else:
            uncertain_classes.append(class_info)  # Append X_i to uncertain_classes if 0.25 < P(Treatment = Yes | X_i) < 0.75

    return positive_classes, negative_classes, uncertain_classes
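
The threshold logic at the end of the loop can be read as a small standalone rule. The helper below is a sketch of that rule only; the function name `classify_region` and its `lower`/`upper` parameters are hypothetical and not part of the notebook, with defaults taken from the 0.25 and 0.75 cut-offs used above.

```python
def classify_region(probability, lower=0.25, upper=0.75):
    """Map P(Treatment = Yes | X_i) to a region name (hypothetical helper)."""
    if probability <= lower:
        return "negative"   # at most 0.25
    if probability >= upper:
        return "positive"   # at least 0.75
    return "uncertain"      # strictly between the two thresholds


# Probabilities as printed in this notebook's results for X1..X8.
reported = {"X1": 1.0, "X2": 1.0, "X3": 0.8, "X4": 0.75,
            "X5": 0.5, "X6": 1 / 3, "X7": 0.2, "X8": 0}

regions = {name: classify_region(p) for name, p in reported.items()}
print(regions)
```

Both thresholds are inclusive, so a class at exactly 0.75 (such as X4) counts as positive and a class at exactly 0.25 would count as negative.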


In [4]:
positive_classes, negative_classes, uncertain_classes = calculate_probabilities(equivalence_classes, data)
# Print the results
print("Positive Classes:")
for class_info in positive_classes:
    print(f"Class: {class_info[0]}, Probability: {class_info[1]}")

print("\nNegative Classes:")
for class_info in negative_classes:
    print(f"Class: {class_info[0]}, Probability: {class_info[1]}")

print("\nUncertain Classes:")
for class_info in uncertain_classes:
    print(f"Class: {class_info[0]}, Probability: {class_info[1]}")

Positive Classes:
Class: X1, Probability: 1.0
Class: X2, Probability: 1.0
Class: X3, Probability: 0.8
Class: X4, Probability: 0.75

Negative Classes:
Class: X7, Probability: 0.2
Class: X8, Probability: 0

Uncertain Classes:
Class: X5, Probability: 0.5
Class: X6, Probability: 0.3333333333333333
