# **Manual Naive Bayes**

The following shows how to solve the exercise from the Naive Bayes lecture materials by hand.

### Data

```python
import pandas as pd

# Data
data = {
    'Example': [1, 2, 3, 4, 5, 6, 7],
    'Positive Mammogram': ['Yes', 'Yes', 'No', 'Yes', 'Yes', 'No', 'No'],
    'Family History': ['Yes', 'Yes', 'Yes', 'No', 'No', 'No', 'No'],
    'Alcohol': ['Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No'],
    'Cancer': ['Yes', 'Yes', 'Yes', 'No', 'No', 'No', 'No']
}

# Creating DataFrame
df = pd.DataFrame(data)
```

To predict, under the Naive Bayes assumption, whether someone who has a positive mammogram, has a family history of cancer, and uses alcohol has cancer, we'll follow these steps:

### Step 1: Calculate Prior Probabilities
The prior probability is the overall probability of having cancer (or not) in the dataset, regardless of other features.

- $P(Cancer = Yes)$
- $P(Cancer = No)$
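
As a minimal sketch of this step, the priors can be read off as the relative frequency of each label. The snippet below re-enters the `Cancer` column from the Data section so that it runs on its own; the names `cancer` and `priors` are illustrative choices, not part of the original exercise.

```python
import pandas as pd

# Cancer labels for examples 1-7, copied from the Data section
cancer = pd.Series(['Yes', 'Yes', 'Yes', 'No', 'No', 'No', 'No'], name='Cancer')

# Relative frequency of each class, used as the prior probability
priors = cancer.value_counts(normalize=True)

print(priors['Yes'])  # 3 of 7 examples, i.e. 3/7
print(priors['No'])   # 4 of 7 examples, i.e. 4/7
```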

### Step 2: Calculate Conditional Probabilities
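We need to calculate the probability of each feature value given each cancer outcome. Under the Naive Bayes assumption, the features are treated as conditionally independent of one another given the outcome, so each conditional probability can be estimated separately and the results multiplied.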

- $P(Positive\ Mammogram = Yes | Cancer = Yes)$
- $P(Family\ History = Yes | Cancer = Yes)$
- $P(Alcohol = Yes | Cancer = Yes)$
- $P(Positive\ Mammogram = Yes | Cancer = No)$
- $P(Family\ History = Yes | Cancer = No)$
- $P(Alcohol = Yes | Cancer = No)$
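
One possible way to get all six conditional probabilities at once is to group the rows by outcome and take the fraction of `'Yes'` values per feature. This is only a sketch: it rebuilds the table from the Data section so that it is self-contained, and the names `features` and `cond` are assumptions introduced here.

```python
import pandas as pd

# Same table as in the Data section
df = pd.DataFrame({
    'Positive Mammogram': ['Yes', 'Yes', 'No', 'Yes', 'Yes', 'No', 'No'],
    'Family History': ['Yes', 'Yes', 'Yes', 'No', 'No', 'No', 'No'],
    'Alcohol': ['Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No'],
    'Cancer': ['Yes', 'Yes', 'Yes', 'No', 'No', 'No', 'No'],
})

features = ['Positive Mammogram', 'Family History', 'Alcohol']

# Turn each feature into True/False, group rows by outcome, and average:
# the mean of a boolean column is the fraction of True values,
# i.e. P(feature = Yes | Cancer = outcome)
cond = df[features].eq('Yes').groupby(df['Cancer']).mean()

print(cond)
```

Each row of `cond` corresponds to one outcome (`'No'` or `'Yes'`) and each column to one feature, so `cond.loc['Yes', 'Alcohol']` is the estimate of $P(Alcohol = Yes | Cancer = Yes)$.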

### Step 3: Multiply to Find the More Likely Outcome
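For each outcome, we multiply the prior by the conditional probabilities of the observed features. Each product is proportional to the probability of that outcome given the features, so comparing the two products tells us which outcome is more likely.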

- Likelihood of having cancer: $P(Cancer = Yes) \times P(Positive\ Mammogram = Yes | Cancer = Yes) \times P(Family\ History = Yes | Cancer = Yes) \times P(Alcohol = Yes | Cancer = Yes)$
- Likelihood of not having cancer: $P(Cancer = No) \times P(Positive\ Mammogram = Yes | Cancer = No) \times P(Family\ History = Yes | Cancer = No) \times P(Alcohol = Yes | Cancer = No)$
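
Plugging in the counts from the seven examples (3 with cancer, 4 without), the two products work out by hand as follows:

$$
\frac{3}{7} \times \frac{2}{3} \times \frac{3}{3} \times \frac{2}{3} = \frac{4}{21} \approx 0.19
$$

$$
\frac{4}{7} \times \frac{2}{4} \times \frac{0}{4} \times \frac{2}{4} = 0
$$

The second product is zero because none of the four examples without cancer has a family history of cancer.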

### Manual Calculations


```python
# Prior probabilities
P_Cancer_Yes = len(df[df['Cancer'] == 'Yes']) / len(df)
P_Cancer_No = len(df[df['Cancer'] == 'No']) / len(df)

# Conditional probabilities
P_PosMammo_Yes_Cancer_Yes = len(df[(df['Positive Mammogram'] == 'Yes') & (df['Cancer'] == 'Yes')]) / len(df[df['Cancer'] == 'Yes'])
P_FamilyHist_Yes_Cancer_Yes = len(df[(df['Family History'] == 'Yes') & (df['Cancer'] == 'Yes')]) / len(df[df['Cancer'] == 'Yes'])
P_Alcohol_Yes_Cancer_Yes = len(df[(df['Alcohol'] == 'Yes') & (df['Cancer'] == 'Yes')]) / len(df[df['Cancer'] == 'Yes'])

P_PosMammo_Yes_Cancer_No = len(df[(df['Positive Mammogram'] == 'Yes') & (df['Cancer'] == 'No')]) / len(df[df['Cancer'] == 'No'])
P_FamilyHist_Yes_Cancer_No = len(df[(df['Family History'] == 'Yes') & (df['Cancer'] == 'No')]) / len(df[df['Cancer'] == 'No'])
P_Alcohol_Yes_Cancer_No = len(df[(df['Alcohol'] == 'Yes') & (df['Cancer'] == 'No')]) / len(df[df['Cancer'] == 'No'])

# Multiply for the more likely outcome
likelihood_cancer_yes = P_Cancer_Yes * P_PosMammo_Yes_Cancer_Yes * P_FamilyHist_Yes_Cancer_Yes * P_Alcohol_Yes_Cancer_Yes
likelihood_cancer_no = P_Cancer_No * P_PosMammo_Yes_Cancer_No * P_FamilyHist_Yes_Cancer_No * P_Alcohol_Yes_Cancer_No

# Prediction
prediction = 'Yes' if likelihood_cancer_yes > likelihood_cancer_no else 'No'

print(prediction)
```

```
Yes
```
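
To see how decisive this prediction is, the two products can be normalized so that they sum to one. The sketch below uses exact fractions taken from the counts in the table rather than the `df` variables, so it runs on its own; the names `score_yes`, `score_no`, and `posterior_*` are illustrative assumptions.

```python
from fractions import Fraction

# Prior x conditional probabilities, written as exact counts from the table
score_yes = Fraction(3, 7) * Fraction(2, 3) * Fraction(3, 3) * Fraction(2, 3)
score_no = Fraction(4, 7) * Fraction(2, 4) * Fraction(0, 4) * Fraction(2, 4)

# Normalize so the two outcomes sum to 1
total = score_yes + score_no
posterior_yes = score_yes / total
posterior_no = score_no / total

print(score_yes, score_no)          # 4/21 0
print(posterior_yes, posterior_no)  # 1 0
```

The normalized probability of cancer comes out as exactly 1 only because no example without cancer has a family history, which forces the other product to zero. With just seven examples, that zero reflects the small sample rather than real certainty.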
