# 02. The Synoptic Algorithm
## Source Code Detection in the Gospels

**Objective:** Quantify the textual overlap between Matthew, Mark, and Luke on simulated verse sets to illustrate the 'Markan Priority' hypothesis.
**The Question:** Is Mark the 'Source Code' (Base Class) that Matthew and Luke extended?

In [1]:
# 1. GENERATE GOSPEL DATA (Simulated Pericopes/Verses)
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

def generate_gospel_overlap():
    print("📚 COMPARING GOSPEL MANUSCRIPTS...")
    
    # Total Verse Counts (Approximate)
    # Mark: 661 verses
    # Matthew: 1068 verses
    # Luke: 1149 verses
    
    # Simulation Logic:
    # 1. Mark is the base set (M)
    # 2. Matthew contains 90% of Mark + Q Source + Unique Matt
    # 3. Luke contains 55% of Mark + Q Source + Unique Luke
    
    mark_content = set(range(0, 661)) # IDs representing individual Markan verses
    
    # Matthew copies 90% of Mark (the first 594 verse IDs; range() keeps the selection deterministic)
    matt_from_mark = set(range(int(len(mark_content) * 0.90)))
    # Q Source (Shared by Matt & Luke, not in Mark) ~230 verses
    q_source = set(range(2000, 2230))
    matt_unique = set(range(3000, 3300))
    matthew_content = matt_from_mark.union(q_source).union(matt_unique)
    
    # Luke copies 55% of Mark (the first 363 verse IDs; range() keeps the selection deterministic)
    luke_from_mark = set(range(int(len(mark_content) * 0.55)))
    luke_unique = set(range(4000, 4500))
    luke_content = luke_from_mark.union(q_source).union(luke_unique)
    
    return mark_content, matthew_content, luke_content

mark, matt, luke = generate_gospel_overlap()
print("✅ Corpus Loaded.")

📚 COMPARING GOSPEL MANUSCRIPTS...
✅ Corpus Loaded.
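
The simulation parameters can be checked against the approximate verse counts quoted in the comments. The following is a minimal sketch, not part of the original notebook: `simulated_sizes` and `APPROX_VERSES` are hypothetical names, and the sizes are simply added because the ID ranges used above do not overlap.

```python
# Approximate verse counts quoted in the comments of the data-generation cell.
APPROX_VERSES = {"Mark": 661, "Matthew": 1068, "Luke": 1149}


def simulated_sizes(mark_total=661, matt_share=0.90, luke_share=0.55,
                    q_size=230, matt_unique=300, luke_unique=500):
    """Return the size of each simulated Gospel set.

    Hypothetical helper: the defaults mirror the constants used in
    generate_gospel_overlap(). The components live in disjoint ID
    ranges, so their sizes can be summed directly.
    """
    matt_from_mark = int(mark_total * matt_share)  # 594 verses
    luke_from_mark = int(mark_total * luke_share)  # 363 verses
    return {
        "Mark": mark_total,
        "Matthew": matt_from_mark + q_size + matt_unique,
        "Luke": luke_from_mark + q_size + luke_unique,
    }


sizes = simulated_sizes()
for name, simulated in sizes.items():
    target = APPROX_VERSES[name]
    print(f"{name:<8} simulated={simulated:>5}  approx={target:>5}  "
          f"diff={simulated - target:+d}")
```

With these parameters the simulated Matthew has 1124 verses against roughly 1068, and the simulated Luke has 1093 against roughly 1149, so the sets approximate the real proportions without reproducing them.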


In [2]:
# 2. CALCULATE JACCARD SIMILARITY
def jaccard(set_a, set_b):
    """Return |A ∩ B| / |A ∪ B|, or 0.0 when both sets are empty."""
    union = len(set_a.union(set_b))
    if union == 0:
        return 0.0
    return len(set_a.intersection(set_b)) / union

sim_mark_matt = jaccard(mark, matt)
sim_mark_luke = jaccard(mark, luke)
sim_matt_luke = jaccard(matt, luke)

print(f"Similarity (Mark <-> Matthew): {sim_mark_matt:.2%}")
print(f"Similarity (Mark <-> Luke):    {sim_mark_luke:.2%}")
print(f"Similarity (Matthew <-> Luke): {sim_matt_luke:.2%} (Includes Q Source)")

# Containment Ratio (How much of Mark is inside Matthew?)
containment = len(mark.intersection(matt)) / len(mark)
print(f"\n🔍 MARKAN PRIORITY CHECK: {containment:.1%} of Mark is contained within Matthew.")
print("   This mirrors the 90% copy rate built into the simulation and is consistent with Mark being a source document.")

Similarity (Mark <-> Matthew): 49.87%
Similarity (Mark <-> Luke):    26.10%
Similarity (Matthew <-> Luke): 36.51% (Includes Q Source)

🔍 MARKAN PRIORITY CHECK: 89.9% of Mark is contained within Matthew.
   This mirrors the 90% copy rate built into the simulation and is consistent with Mark being a source document.
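
Jaccard similarity is symmetric, while containment depends on which text is treated as the candidate source. The sketch below is an illustration under the same simulation assumptions, not part of the original notebook: it rebuilds the sets so it can run on its own, and `build_sets` and `containment` are hypothetical helper names. It separates the material shared by all three Gospels from the material shared only by Matthew and Luke, then compares containment in both directions.

```python
def build_sets():
    """Rebuild the simulated verse-ID sets with the same constants as above."""
    mark = set(range(661))
    q_source = set(range(2000, 2230))
    matt = set(range(int(661 * 0.90))) | q_source | set(range(3000, 3300))
    luke = set(range(int(661 * 0.55))) | q_source | set(range(4000, 4500))
    return mark, matt, luke


def containment(inner, outer):
    """Fraction of `inner` that also appears in `outer` (0.0 if `inner` is empty)."""
    return len(inner & outer) / len(inner) if inner else 0.0


mark, matt, luke = build_sets()

# Verses present in all three simulated Gospels.
triple_tradition = mark & matt & luke
# Verses shared by Matthew and Luke but absent from Mark (the simulated Q Source).
double_tradition = (matt & luke) - mark

print(f"Triple tradition: {len(triple_tradition)} verses")
print(f"Double tradition: {len(double_tradition)} verses")
print(f"Mark inside Matthew: {containment(mark, matt):.1%}")
print(f"Matthew inside Mark: {containment(matt, mark):.1%}")
print(f"Mark inside Luke:    {containment(mark, luke):.1%}")
print(f"Luke inside Mark:    {containment(luke, mark):.1%}")
```

In this simulation 594 of Mark's 661 verses appear in Matthew, but those same 594 verses make up only about half of Matthew's 1124. The Jaccard score for the pair (594 / 1191) sits between the two containment figures and does not show the asymmetry. Containment on its own does not fix the direction of borrowing, so the result is best read as consistent with Markan Priority rather than as proof of it.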
