This is a specialized tool for working with text from Jastrow's Dictionary of the Targumim, Talmud Bavli, Talmud Yerushalmi, and Midrashic Literature.
The main functions of this tool include:

Abbreviation Expansion: It converts over 150 specialized abbreviations commonly used in Jastrow's Dictionary to their full forms (e.g., "B. Bath." → "Bava Batra", "Ber." → "Berakhot").

Roman Numeral Conversion: It transforms Roman numerals to Arabic numerals in reference citations (e.g., "IV, 6" becomes "4:6").

Intelligent Paragraph Splitting: It breaks text into paragraphs at logical points such as source references and semicolons to improve readability.

The notebook implements a user-friendly interface with:

Text input and output areas
Processing options (toggle for paragraph splitting)

Process and Reset buttons
Example buttons with sample texts demonstrating different abbreviation patterns
Detailed display of all replacements made

The code uses regular expressions extensively for pattern matching and handles special cases like compound abbreviations (e.g., "R. s." for "Rabbah section"). It's clearly designed for scholars and researchers working with Talmudic and rabbinic literature who need to modernize the arcane abbreviation system used in Jastrow's reference work.

The implementation includes features like sorting abbreviations by length (longest first) to avoid partial matches and special handling for text containing Hebrew characters alongside Latin text.

In [10]:
# Enhanced Jastrow Abbreviation Modernizer - Colab Version
# This notebook converts Jastrow Dictionary abbreviations to their full forms,
# converts Roman numerals to Arabic numerals, and intelligently splits text into paragraphs
# Now with special handling for Hebrew text in RTL paragraphs

# Install required libraries if needed
#!pip install ipywidgets -q

import re
import ipywidgets as widgets
from IPython.display import display, HTML

# Function to convert Roman numerals to Arabic
def roman_to_arabic(roman):
    roman_dict = {
        'I': 1, 'V': 5, 'X': 10, 'L': 50,
        'C': 100, 'D': 500, 'M': 1000
    }

    result = 0
    prev_value = 0

    # Iterate through the roman numeral from right to left
    for char in reversed(roman.upper()):
        if char not in roman_dict:
            return roman  # Return original if not a valid Roman numeral

        current_value = roman_dict[char]

        # If current value is greater than or equal to previous, add it
        # Otherwise subtract it (for cases like IV, IX, etc.)
        if current_value >= prev_value:
            result += current_value
        else:
            result -= current_value

        prev_value = current_value

    return result

# Hardcoded mapping table of Jastrow abbreviations
mapping_dict = {
    "a.": "and",
    "a. e.": "and elsewhere",
    "a. fr.": "and frequently",
    "a. l.": "ad locum",
    "a. v. fr.": "and very frequently",
    "Ab.": "Avot",
    "Ab. d'R. N.": "Avot d'Rabbi Natan",
    "Ab. Zar.": "Avodah Zarah",
    "Ag. Hatt.": "Agadot HaTorah",
    "Alf.": "Alfasi",
    "allud.": "alluding",  # Added new mapping
    "Am.": "Amos",
    "Ar.": "Arukh",
    "Ar. Compl.": "Arukh Completum",
    "Arakh.": "Arakhin",
    "B. Bath.": "Bava Batra",
    "B. Kam.": "Bava Kamma",
    "B. Mets.": "Bava Metzia",
    "B. N.": "Beit Natan",
    "Bab.": "Bavli",
    "Bart.": "Bartenora",
    "Bekh.": "Bekhorot",
    "Ber.": "Berakhot",
    "Bets.": "Beitzah",
    "B'ḥuck.": "Bechukotai",
    "Bicc.": "Bikkurim",
    "B'resh.": "Bereishit",
    "B'shall.": "Beshalach",
    "Cant.": "Shir HaShirim",
    "Cant. R.": "Shir HaShirim Rabbah",
    "Ch.": "Chapter",
    "ch.": "chapter",
    "Chald.": "Aramaic",
    "Cmp.": "Compare",
    "cmp.": "compare",
    "contr.": "contracted/contraction",
    "corr.": "correct reading",
    "corr. acc.": "correct accordingly",
    "Dan.": "Daniel",
    "Del.": "Delete",
    "Dem.": "Demai",
    "Deut.": "Deuteronomy",
    "Deut. R.": "Devarim Rabbah",
    "diff.": "different",
    "ed.": "edition",
    "eds.": "editions",
    "e.g.": "for example",
    "Erub.": "Eruvin",
    "esp.": "especially",
    "Esth.": "Esther",
    "Esth. R.": "Esther Rabbah",
    "euphem.": "euphemism",
    "Ex.": "Exodus",
    "Ex. R.": "Shemot Rabbah",
    "expl.": "explained",
    "Ez.": "Ezekiel",
    "Gen.": "Genesis",
    "Gitt.": "Gittin",
    "Hag.": "Hagigah",
    "Hal.": "Hallah",
    "Heb.": "Hebrew",
    "Hor.": "Horayot",
    "Hull.": "Hullin",
    "Is.": "Isaiah",
    "Jer.": "Jeremiah",
    "Jud.": "Judges",
    "Kel.": "Kelim",
    "Ker.": "Keritot",
    "Keth.": "Ketubot",
    "Kidd.": "Kiddushin",
    "Kil.": "Kilayim",
    "Kin.": "Kinnim",
    "Koh.": "Kohelet",
    "Koh. R.": "Kohelet Rabbah",
    "Lam.": "Lamentations",
    "Lam. R.": "Eichah Rabbah",
    "Lev.": "Leviticus",
    "Lev. R.": "Vayikra Rabbah",
    "M.": "Mishnah",
    "Maas. Sh.": "Maaser Sheni",
    "Macc.": "Makkot",
    "Makhsh.": "Makhshirin",
    "Meg.": "Megillah",
    "Meil.": "Meilah",
    "Men.": "Menahot",
    "Midd.": "Middot",
    "Midr.": "Midrash",
    "Midr. Sam.": "Midrash Shmuel",
    "Midr. Till.": "Midrash Tehillim",
    "Mikh.": "Mikvaot",
    "Ms.": "Manuscript",
    "Naz.": "Nazir",
    "Ned.": "Nedarim",
    "Neg.": "Negaim",
    "Neh.": "Nehemiah",
    "Nidd.": "Niddah",
    "Nif.": "Nifal",
    "Nithpa.": "Nithpael",
    "Num.": "Numbers",
    "Num. R.": "Bamidbar Rabbah",
    "Ohol.": "Oholot",
    "Orl.": "Orlah",
    "Par.": "Parah",
    "Pes.": "Pesahim",
    "Pesik.": "Pesikta",
    "Pesik. R.": "Pesikta Rabbati",
    "Pirkê d'R. El.": "Pirkei d'Rabbi Eliezer",
    "Pl.": "Plural",
    "pl.": "plural",
    "pr. n.": "proper name",
    "pr. n. f.": "proper name of a female",
    "pr. n. m.": "proper name of a male",
    "pr. n. pl.": "proper name of a place",
    "Prov.": "Proverbs",
    "Ps.": "Psalms",
    "q.v.": "which see",
    "R.": "Rabbi",
    "R. Hash.": "Rosh Hashanah",
    "R. S.": "R. Samson of Sens",
    "ref.": "reference",
    "Ruth R.": "Ruth Rabbah",
    "s.": "section",
    "S.": "Sifra",
    "Sabb.": "Shabbat",
    "Sam.": "Samuel",
    "Sanh.": "Sanhedrin",
    "Shebi.": "Sheviit",
    "Shebu.": "Shevuot",
    "Shek.": "Shekalim",
    "Sifra.": "Sifra",
    "Sifré.": "Sifre",
    "sing.": "singular",
    "Snh.": "Sanhedrin",
    "Sot.": "Sotah",
    "sub.": "substitute",
    "Succ.": "Sukkah",
    "s. v.": "sub voce (under the word)",
    "Taan.": "Taanit",
    "Tam.": "Tamid",
    "Targ.": "Targum",
    "Targ. Y.": "Targum Yerushalmi",
    "Tem.": "Temurah",
    "Ter.": "Terumot",
    "Tes.": "Teshuvot",
    "Tosef.": "Tosefta",
    "Treat.": "Treatise",
    "Ukts.": "Uktzin",
    "usu.": "usually",
    "v.": "see",
    "Yad.": "Yadayim",
    "Yalk.": "Yalkut",
    "Yeb.": "Yevamot",
    "Y. or Yer.": "Yerushalmi",
    "Y.": "Yerushalmi",
    "Yer.": "Yerushalmi",
    "Zab.": "Zavim",
    "Zeb.": "Zevahim",
    "Zech.": "Zechariah",
    "Zeph.": "Zephaniah",
    "&c.": "etc."  # Added new mapping
}

# List of sources and tractates for paragraph splitting
sources_list = [
    # Tractates of the Mishnah/Talmud
    "Avot", "Avodah Zarah", "Arakhin", "Bava Batra", "Bava Kamma", "Bava Metzia",
    "Berakhot", "Beitzah", "Bekhorot", "Bikkurim", "Demai", "Eruvin", "Gittin",
    "Hagigah", "Hallah", "Horayot", "Hullin", "Keritot", "Ketubot", "Kiddushin",
    "Kilayim", "Kinnim", "Makkot", "Makhshirin", "Megillah", "Meilah", "Menahot",
    "Middot", "Mikvaot", "Nazir", "Nedarim", "Negaim", "Niddah", "Oholot", "Orlah",
    "Parah", "Pesahim", "Rosh Hashanah", "Sanhedrin", "Sheviit", "Shevuot", "Shekalim",
    "Sotah", "Sukkah", "Taanit", "Tamid", "Temurah", "Terumot", "Uktzin", "Yadayim",
    "Yevamot", "Zevahim", "Zavim",

    # Biblical books
    "Genesis", "Exodus", "Leviticus", "Numbers", "Deuteronomy", "Joshua", "Judges",
    "Samuel", "Kings", "Isaiah", "Jeremiah", "Ezekiel", "Hosea", "Joel", "Amos",
    "Obadiah", "Jonah", "Micah", "Nahum", "Habakkuk", "Zephaniah", "Haggai",
    "Zechariah", "Malachi", "Psalms", "Proverbs", "Job", "Song of Songs", "Ruth",
    "Lamentations", "Ecclesiastes", "Esther", "Daniel", "Ezra", "Nehemiah", "Chronicles",

    # Other sources
    "Midrash", "Midrash Tehillim", "Midrash Shmuel", "Sifra", "Sifre", "Tosefta",
    "Targum", "Targum Yerushalmi", "Yalkut", "Bamidbar Rabbah", "Devarim Rabbah",
    "Shemot Rabbah", "Vayikra Rabbah", "Bereishit Rabbah", "Shir HaShirim Rabbah",
    "Kohelet Rabbah", "Ruth Rabbah", "Eichah Rabbah", "Esther Rabbah", "Pesikta",
    "Pesikta Rabbati", "Avot d'Rabbi Natan", "Pirkei d'Rabbi Eliezer",

    # Alternative names/abbreviations (pre-modernized)
    "Ber.", "Shabb.", "Erub.", "Pes.", "Yoma", "Sukk.", "Bets.", "Taan.", "Meg.",
    "M. Kat.", "Hag.", "Yeb.", "Keth.", "Ned.", "Naz.", "Sot.", "Gitt.", "Kidd.",
    "B. Kam.", "B. Mets.", "B. Bath.", "Sanh.", "Macc.", "Shebu.", "Abod. Zar.",
    "Zeb.", "Men.", "Hul.", "Bekh.", "Arak.", "Tem.", "Ker.", "Meil.", "Tam.",
    "Midd.", "Kin.", "Kel.", "Ohol.", "Neg.", "Par.", "Toh.", "Mikv.", "Nidd.",
    "Makhsh.", "Zab.", "Teb.", "Yad.", "Ukts.", "Gen.", "Ex.", "Lev.", "Num.",
    "Deut.", "Josh.", "Judg.", "I Sam.", "II Sam.", "I Kings", "II Kings",
    "Isa.", "Jer.", "Ezek.", "Hos.", "Joel", "Am.", "Obad.", "Jon.", "Mic.",
    "Nah.", "Hab.", "Zeph.", "Hag.", "Zech.", "Mal.", "Ps.", "Prov.", "Job",
    "Song", "Ruth", "Lam.", "Eccl.", "Esth.", "Dan.", "Ezra", "Neh.", "I Chr.",
    "II Chr.", "Targ.", "Tosef.", "Gen. R.", "Ex. R.", "Lev. R.", "Num. R.",
    "Deut. R.", "Cant. R.", "Ruth R.", "Lam. R.", "Eccl. R.", "Esth. R.",
    "Pirke R. El.", "Ab. d'R. N.", "Maas. Sh.", "R. Hash.", "Ab. Zar."
]

# Special compound abbreviations that need custom handling
# This prevents "R. s." from being processed as two separate abbreviations
compound_patterns = {
    r'([A-Z][a-z]+\.) R\. s\.': lambda match: match.group(1) + " Rabbah s.",
    r'Gen\. R\. s\.': "Genesis Rabbah s.",
    r'Lev\. R\. s\.': "Vayikra Rabbah s.",
    r'Num\. R\. s\.': "Bamidbar Rabbah s.",
    r'Deut\. R\. s\.': "Devarim Rabbah s.",
    r'Cant\. R\. s\.': "Shir HaShirim Rabbah s.",
    r'Lam\. R\. s\.': "Eichah Rabbah s.",
    r'Esth\. R\. s\.': "Esther Rabbah s.",
    r'Ruth R\. s\.': "Ruth Rabbah s.",
    r'Koh\. R\. s\.': "Kohelet Rabbah s.",
    r'Pesik\. R\. s\.': "Pesikta Rabbati s.",
    # Add patterns for Yalkut with biblical books
    r'Yalk\.\s+Gen\.': "Yalkut Genesis",
    r'Yalk\.\s+Ex\.': "Yalkut Exodus",
    r'Yalk\.\s+Lev\.': "Yalkut Leviticus",
    r'Yalk\.\s+Num\.': "Yalkut Numbers",
    r'Yalk\.\s+Deut\.': "Yalkut Deuteronomy",
    r'Yalk\.\s+Ps\.': "Yalkut Psalms",
    r'Yalk\.\s+Prov\.': "Yalkut Proverbs",
    r'Yalk\.\s+Is\.': "Yalkut Isaiah",
    # More Biblical citation patterns
    r'cmp\.\s+Gen\.': "compare Genesis"
}

# Helper function to escape regex special characters
def escape_regex(string):
    return re.escape(string)

# Function to convert Roman numeral references to Arabic
def convert_roman_references(text):
    # Pattern to find references like "IV, 6" or "IV,6" (Roman numeral followed by comma and Arabic numeral)
    pattern = r'((?:[IVXLCDMivxlcdm]+)(?:,\s*|\s+))(\d+)'

    def replace_reference(match):
        roman_part = match.group(1).strip().rstrip(',')
        arabic_part = match.group(2)

        # Convert roman to arabic
        try:
            arabic_converted = str(roman_to_arabic(roman_part))
            return f"{arabic_converted}:{arabic_part}"
        except:
            # If conversion fails, return original
            return match.group(0)

    return re.sub(pattern, replace_reference, text)

# NEW FUNCTION: Format Hebrew text into RTL paragraphs
def format_hebrew_paragraphs(text):
    if not text:
        return text

    # Define a pattern for Hebrew characters
    hebrew_char_pattern = re.compile(r'[\u0590-\u05FF]')

    # Split the text into words and spaces
    tokens = re.findall(r'\S+|\s+', text)

    result = []
    current_hebrew_run = []
    in_hebrew_run = False

    for token in tokens:
        # Check if this token contains Hebrew characters
        has_hebrew = hebrew_char_pattern.search(token) is not None

        if has_hebrew:
            # If we're not already in a Hebrew run, start one
            if not in_hebrew_run:
                in_hebrew_run = True
                # Add a newline before the Hebrew run
                if result and not result[-1].endswith('\n'):
                    result.append('\n')

            # Add this token to the current Hebrew run
            current_hebrew_run.append(token)
        else:
            # If we're in a Hebrew run and this is whitespace, keep it in the run
            if in_hebrew_run and token.isspace():
                current_hebrew_run.append(token)
            # If we're in a Hebrew run and this is not whitespace, end the run
            elif in_hebrew_run:
                # Add the Hebrew run to the result
                result.append(''.join(current_hebrew_run))
                # Add a newline after the Hebrew run
                result.append('\n')

                # Reset for the next run
                current_hebrew_run = []
                in_hebrew_run = False

                # Add this token to the result
                result.append(token)
            else:
                # Not in a Hebrew run, just add the token
                result.append(token)

    # Handle any remaining Hebrew run at the end
    if current_hebrew_run:
        result.append(''.join(current_hebrew_run))
        result.append('\n')

    # Join everything and clean up extra newlines
    joined = ''.join(result)
    cleaned = re.sub(r'\n{3,}', '\n\n', joined)  # Replace 3+ newlines with just 2

    return cleaned

# Apply the compound patterns to handle "R. s." correctly
def apply_compound_patterns(text, replacements_list):
    result = text
    for pattern, replacement in compound_patterns.items():
        if callable(replacement):
            # For patterns with a lambda function
            new_result = re.sub(pattern, replacement, result)
            if new_result != result:
                diff = re.findall(pattern, result)
                for match in diff:
                    if isinstance(match, tuple):
                        original = ''.join(match)
                        replaced = replacement(re.match(pattern, original))
                    else:
                        original = match
                        replaced = re.sub(pattern, replacement, original)
                    replacements_list.append(f"'{original}' → '{replaced}'")
                result = new_result
        else:
            # For static replacements
            new_result = re.sub(pattern, replacement, result)
            if new_result != result:
                diff = re.findall(pattern, result)
                for match in diff:
                    replacements_list.append(f"'{match}' → '{replacement}'")
                result = new_result
    return result

# Helper function to track replacements
def replacement_and_log(original, replacement, replacements_list):
    replacements_list.append(f"'{original}' → '{replacement}'")
    return replacement

# Process abbreviations with context awareness
def modernize_abbreviations(text):
    if not text:
        return {"modernized": "", "replacements": []}

    modernized = text
    replacements_list = []

    # First apply compound patterns to handle special cases
    modernized = apply_compound_patterns(modernized, replacements_list)

    # Sort abbreviations by length (longest first) to avoid partial matches
    sorted_abbrs = sorted(mapping_dict.keys(), key=len, reverse=True)

    for abbr in sorted_abbrs:
        # Skip 's.' after 'R.' as it's handled by compound patterns
        if abbr == "s." and re.search(r'R\.\s+s\.', modernized):
            continue

        # Pattern 1: Abbreviation followed by space
        pattern1 = re.compile(r'%s\s' % escape_regex(abbr))
        modernized = pattern1.sub(lambda match:
                                  replacement_and_log(match.group(), f"{mapping_dict[abbr]} ", replacements_list),
                                  modernized)

        # Pattern 2: Abbreviation followed by punctuation
        pattern2 = re.compile(r'%s([,;.:\)\]\}])' % escape_regex(abbr))
        modernized = pattern2.sub(lambda match:
                                 replacement_and_log(match.group(), f"{mapping_dict[abbr]}{match.group(1)}", replacements_list),
                                 modernized)

        # Pattern 3: Abbreviation at end of string
        pattern3 = re.compile(r'%s$' % escape_regex(abbr))
        modernized = pattern3.sub(lambda match:
                                 replacement_and_log(match.group(), mapping_dict[abbr], replacements_list),
                                 modernized)

    return {"modernized": modernized, "replacements": replacements_list}

# Function to intelligently split text into paragraphs
def split_into_paragraphs(text):
    if not text:
        return text

    # First split by semicolons
    parts = text.split(';')
    new_parts = []

    # Process each part to identify source citations
    for part in parts:
        if not part.strip():
            continue

        # Look for source references to further split if needed
        source_pattern = r'(?<=[.!?])\s+(?=[A-Z])'
        subparts = re.split(source_pattern, part)

        for subpart in subparts:
            # Check if this subpart contains a source reference
            contains_source = False

            # Check for common citation patterns in both original and modernized text
            for source in sources_list:
                # Look for the source at the beginning of the text
                # We want to match patterns like "Ber. 34b" or "Berakhot 34b"
                if re.search(r'(^|\s)' + re.escape(source) + r'\.?\s+\d', subpart):
                    contains_source = True
                    break

            # Look for reference patterns like "Y. Ber." or "b. Sanh."
            if re.search(r'(^|\s)([YB]\.|\b[Yy]er\.|\b[Bb]ab\.)\s+[A-Z][a-z]+\.', subpart):
                contains_source = True

            new_parts.append(subpart.strip())

    # Join with newlines for paragraph separation
    result = '\n'.join(new_parts)

    # Additional pattern matching for new sources
    source_patterns = [
        # Match full tractate names with folio references
        r'((?:^|\.\s+)(?:' + '|'.join(re.escape(s) for s in sources_list) + r')\s+\d+[ab]?\s+)',
        # Match abbreviations for sources with folio references
        r'((?:^|\.\s+)(?:[A-Z][a-z]*\.(?:\s+[A-Z][a-z]*\.)?\s+\d+[ab]?\s+))',
        # Match Targum, Midrash, etc. references
        r'((?:^|\.\s+)(?:Targ|Midr|Sifr[ae]|Yalk|Pesik)\.(?:\s+[A-Z][a-z]*\.)?\s+)',
        # Match biblical book references with chapter and verse
        r'((?:^|\.\s+)(?:Gen|Ex|Lev|Num|Deut|Josh|Judg|Sam|Kgs|Is|Jer|Ezek|Hos|Joel|Am|Obad|Jon|Mic|Nah|Hab|Zeph|Hag|Zech|Mal|Ps|Prov|Job|Song|Ruth|Lam|Eccl|Esth|Dan|Ezra|Neh|Chr)\.(?:\s+\d+:\d+\s+))'
    ]

    for pattern in source_patterns:
        result = re.sub(pattern, r'\n\1', result)

    # Handle special case for emdash followed by a source
    result = re.sub(r'(—\s*)([A-Z][a-z]+\.)', r'\1\n\2', result)

    # Final clean-up: remove redundant newlines and fix spacing
    result = re.sub(r'\n\s+', '\n', result)
    result = re.sub(r'\n{2,}', '\n', result)

    return result

# Function to process text with abbreviation modernization, Roman numeral conversion, and Hebrew RTL formatting
def process_jastrow_text(input_text):
    # First modernize abbreviations
    result = modernize_abbreviations(input_text)
    modernized_text = result["modernized"]

    # Then convert Roman numerals
    text_with_arabic = convert_roman_references(modernized_text)

    # Add any Roman numeral conversions to the replacements list
    if text_with_arabic != modernized_text:
        # Track which Roman numerals were converted
        original_sections = re.findall(r'([IVXLCDMivxlcdm]+)(?:,\s*|\s+)(\d+)', modernized_text)

        for roman, arabic_num in original_sections:
            try:
                arabic_roman = roman_to_arabic(roman)
                result["replacements"].append(f"'{roman}, {arabic_num}' → '{arabic_roman}:{arabic_num}'")
            except:
                pass

    # Format Hebrew text into separate paragraphs if option enabled
    if hebrew_formatting_option.value:
        text_with_hebrew_formatted = format_hebrew_paragraphs(text_with_arabic)
        hebrew_formatted = text_with_hebrew_formatted != text_with_arabic
    else:
        text_with_hebrew_formatted = text_with_arabic
        hebrew_formatted = False

    # Finally, split into paragraphs if option enabled
    if paragraph_option.value:
        paragraphed_text = split_into_paragraphs(text_with_hebrew_formatted)
        paragraphed = paragraphed_text != text_with_hebrew_formatted
    else:
        paragraphed_text = text_with_hebrew_formatted
        paragraphed = False

    return {
        "modernized": paragraphed_text,
        "replacements": result["replacements"],
        "paragraphed": paragraphed,
        "hebrew_formatted": hebrew_formatted
    }

# Create the input widget
input_text = widgets.Textarea(
    placeholder='Paste Jastrow entry here...',
    description='Input:',
    disabled=False,
    layout=widgets.Layout(width='100%', height='150px')
)

# Create output widgets
output_text = widgets.Textarea(
    placeholder='Modernized text will appear here...',
    description='Result:',
    disabled=True,
    layout=widgets.Layout(width='100%', height='250px')  # Increased height for paragraphs
)

replacements_output = widgets.HTML(
    value='',
    placeholder='Replacements will be listed here...',
    description='Changes:',
)

# Create processing options
paragraph_option = widgets.Checkbox(
    value=True,
    description='Enable paragraph splitting',
    disabled=False
)

# NEW OPTION: Add checkbox for Hebrew formatting
hebrew_formatting_option = widgets.Checkbox(
    value=True,
    description='Format Hebrew text in separate RTL paragraphs',
    disabled=False
)

# Create buttons
process_button = widgets.Button(
    description='Process Text',
    disabled=False,
    button_style='primary',
    tooltip='Click to modernize the text',
    icon='check'
)

reset_button = widgets.Button(
    description='Reset',
    disabled=False,
    button_style='',
    tooltip='Click to reset all fields',
    icon='refresh'
)

# Example buttons
example_buttons = [
    widgets.Button(description=f'Example {i+1}',
                  layout=widgets.Layout(width='auto'),
                  style=widgets.ButtonStyle(button_color='lightgray'))
    for i in range(5)
]

# Example texts
examples = [
    "Koh. R. beg.; a. fr. (Midr. Till. to Ps. I)",
    "Y. Meg. III, 74 a bot. rendered in a secret political letter",
    "Ruth R. to I, 2. Midr. Sam. ch. I.",
    "or. acc.) wild (opp. אִימִירוֹן q. v.); rough . Gen. R. s. 77; Cant. R. to III, 6 כלב א'. Num. R. s. 11 (refer. to Gen. III, 8) שומע הקול א' after sinning, Adam heard the divine voice as a harsh one. Cant. R. to III, 7 (corr. acc.). Pesik. R. s. 15 בזעף א' וכ' … (leave out hebr. words as glosses to explain the Greek).",
    "R. Hash. 23 a מאי קדרוס (ם) אדרא Ms. M. (ed. קתרוס) what is kedros ? Adara . Snh. 108 b what is gofer? רב אמר אדרא דבי ר' שילא אמר וכ' Ar. a Ms. Fl. (v. Rabb. D. S. a. l.); cmp. Gen. R. s. 31; Yalk. Gen. 51.—Bets. 15 b יטע אדר וכ' let him plant an edar (allud. to addir &c., Ps. XCIII, 4); א\"נ אדרא וכ' or adara as its (popular or Chald.) name is; as people say, it is called adara because it lasts for generations (א־דרא). Git. 69 b אטרף א' leaves of ad. Ib. מיא דא' decoct thereof. Targ. II, Esth. VII, 9 (to which perhaps belongs. Git. l. c.)."
]

# Button click handlers
def on_process_button_clicked(b):
    result = process_jastrow_text(input_text.value)
    output_text.value = result["modernized"]

    # Format replacements as HTML list
    replacements_html = ""

    if result["replacements"]:
        replacements_html += "<ul>"
        for replacement in result["replacements"]:
            replacements_html += f"<li>{replacement}</li>"
        replacements_html += "</ul>"

    # Add notes about text modifications
    if result["paragraphed"] and paragraph_option.value:
        replacements_html += "<p><em>Text has been split into paragraphs at source references and semicolons.</em></p>"

    if result["hebrew_formatted"] and hebrew_formatting_option.value:
        replacements_html += "<p><em>Hebrew text has been separated into right-to-left paragraphs.</em></p>"

    if replacements_html:
        replacements_output.value = replacements_html
    else:
        replacements_output.value = "<p>No modifications were made.</p>"

def on_reset_button_clicked(b):
    input_text.value = ''
    output_text.value = ''
    replacements_output.value = ''

def create_example_handler(example_text):
    def handler(b):
        input_text.value = example_text
    return handler

# Connect handlers to buttons
process_button.on_click(on_process_button_clicked)
reset_button.on_click(on_reset_button_clicked)

for i, button in enumerate(example_buttons):
    button.on_click(create_example_handler(examples[i]))

# Create UI layout
header = widgets.HTML(value="<h1>Jastrow Dictionary Abbreviation Modernizer</h1>")
options_label = widgets.HTML(value="<p><b>Options:</b></p>")
options_box = widgets.HBox([paragraph_option, hebrew_formatting_option])  # Added new option to UI
examples_label = widgets.HTML(value="<p><b>Try these examples:</b></p>")
examples_box = widgets.HBox(example_buttons)
button_box = widgets.HBox([process_button, reset_button])
footer = widgets.HTML(
    value="""<div style="margin-top: 30px; border-top: 1px solid #ccc; padding-top: 10px;">
    <p><b>About this tool:</b></p>
    <p>This tool modernizes abbreviations from Jastrow's Dictionary of the Targumim,
       Talmud Bavli, Talmud Yerushalmi and Midrashic Literature.</p>
    <p>Features:</p>
    <ul>
      <li>Expands 158 common abbreviations used in the dictionary</li>
      <li>Converts Roman numerals to Arabic format (e.g., "IV, 6" becomes "4:6")</li>
      <li>Intelligently splits text into paragraphs at source references and semicolons</li>
      <li>Places Hebrew text in separate paragraphs with right-to-left direction</li>
    </ul>
    </div>"""
)

# Display the UI
display(header)
display(input_text)
display(options_label)
display(options_box)
display(button_box)
display(examples_label)
display(examples_box)
display(widgets.HTML(value="<h3>Results:</h3>"))
display(output_text)
display(replacements_output)
display(footer)

# Test with the paragraph example
test_text = "R. Hash. 23 a מאי קדרוס (ם) אדרא Ms. M. (ed. קתרוס) what is kedros ? Adara . Snh. 108 b what is gofer? רב אמר אדרא דבי ר' שילא אמר וכ' Ar. a Ms. Fl. (v. Rabb. D. S. a. l.); cmp. Gen. R. s. 31; Yalk. Gen. 51.—Bets. 15 b יטע אדר וכ' let him plant an edar (allud. to addir &c., Ps. XCIII, 4); א\"נ אדרא וכ' or adara as its (popular or Chald.) name is; as people say, it is called adara because it lasts for generations (א־דרא). Git. 69 b אטרף א' leaves of ad. Ib. מיא דא' decoct thereof. Targ. II, Esth. VII, 9 (to which perhaps belongs. Git. l. c.)."

print("\nTest with Hebrew formatting and paragraph splitting example:")
result = process_jastrow_text(test_text)
print("Input:", test_text)
print("\nOutput with paragraphs and Hebrew formatting:")
print(result["modernized"])
print("\nReplacements made:")
for replacement in result["replacements"]:
    print(f"- {replacement}")

HTML(value='<h1>Jastrow Dictionary Abbreviation Modernizer</h1>')

Textarea(value='', description='Input:', layout=Layout(height='150px', width='100%'), placeholder='Paste Jastr…

HTML(value='<p><b>Options:</b></p>')

HBox(children=(Checkbox(value=True, description='Enable paragraph splitting'), Checkbox(value=True, descriptio…

HBox(children=(Button(button_style='primary', description='Process Text', icon='check', style=ButtonStyle(), t…

HTML(value='<p><b>Try these examples:</b></p>')

HBox(children=(Button(description='Example 1', layout=Layout(width='auto'), style=ButtonStyle(button_color='li…

HTML(value='<h3>Results:</h3>')

Textarea(value='', description='Result:', disabled=True, layout=Layout(height='250px', width='100%'), placehol…

HTML(value='', description='Changes:', placeholder='Replacements will be listed here...')

HTML(value='<div style="margin-top: 30px; border-top: 1px solid #ccc; padding-top: 10px;">\n    <p><b>About th…


Test with Hebrew formatting and paragraph splitting example:
Input: R. Hash. 23 a מאי קדרוס (ם) אדרא Ms. M. (ed. קתרוס) what is kedros ? Adara . Snh. 108 b what is gofer? רב אמר אדרא דבי ר' שילא אמר וכ' Ar. a Ms. Fl. (v. Rabb. D. S. a. l.); cmp. Gen. R. s. 31; Yalk. Gen. 51.—Bets. 15 b יטע אדר וכ' let him plant an edar (allud. to addir &c., Ps. XCIII, 4); א"נ אדרא וכ' or adara as its (popular or Chald.) name is; as people say, it is called adara because it lasts for generations (א־דרא). Git. 69 b אטרף א' leaves of ad. Ib. מיא דא' decoct thereof. Targ. II, Esth. VII, 9 (to which perhaps belongs. Git. l. c.).

Output with paragraphs and Hebrew formatting:

Rosh Hashanah 23 a 
מאי קדרוס (ם) אדרא 
Manuscript Mishnah (edition 
קתרוס) 
what is kedros ?
Adara 
.
Sanhedrin 108 b what is gofer? 
רב אמר אדרא דבי ר' שילא אמר וכ' 
Arukh a Manuscript Fl. (see Rabb.
D.
Sifra ad locum)
compare Genesis Rabbah section 31
Yalkut Genesis 51.—Beitzah 15 b 
יטע אדר וכ' 
let him plant an edar (alluding to 