# 🌍 Exploring the Arabic WordNet (AWN)

This notebook provides a comprehensive exploration of the Arabic WordNet using the `wn` Python library.

**Arabic WordNet (AWN v2)** is part of the Open Multilingual Wordnet collection and provides Arabic lexical data linked to the Princeton WordNet through the Interlingual Index (ILI).

## Table of Contents
1. [Setup & Installation](#setup)
2. [Loading the Arabic WordNet](#loading)
3. [Basic Statistics](#statistics)
4. [Exploring Arabic Words](#words)
5. [Exploring Synsets](#synsets)
6. [Navigating the Taxonomy](#taxonomy)
7. [Cross-Lingual Analysis (Arabic ↔ English)](#crosslingual)
8. [Semantic Relations](#relations)
9. [Similarity Measures](#similarity)
10. [Advanced Analysis](#advanced)


<a id='setup'></a>
## 1. Setup & Installation


In [5]:
# Install wn if not already installed
# !pip install wn
!pip install --upgrade wn

Collecting wn
  Using cached wn-0.14.0-py3-none-any.whl.metadata (15 kB)
Using cached wn-0.14.0-py3-none-any.whl (86 kB)
Installing collected packages: wn
  Attempting uninstall: wn
    Found existing installation: wn 0.9.1
    Uninstalling wn-0.9.1:
      Successfully uninstalled wn-0.9.1
Successfully installed wn-0.14.0


In [19]:
# NOTE: If you upgraded wn and still see the old version, restart the kernel:
#       Kernel → Restart Kernel (or press 0,0 in command mode)

import wn
from wn import taxonomy, similarity
from collections import Counter
import warnings

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

print(f"wn version: {wn.__version__}")

# Should show 0.14.0 after kernel restart


wn version: 0.14.0


### Download Required WordNets

We'll download:
- **omw-arb:1.4** - Arabic WordNet (AWN v2)
- **omw-en:1.4** - OMW English WordNet (required for relations/taxonomy)


In [16]:
# Download Arabic WordNet and its English dependency
print("Downloading Arabic WordNet...")
wn.download('omw-arb:1.4')

print("\nDownloading English WordNet (for relations)...")
wn.download('omw-en:1.4')


Downloading Arabic WordNet...

Downloading English WordNet (for relations)...


Cached file found: /Users/salahmac/.wn_data/downloads/4cb8d2182ddb97c9a9d865fe4b0fd7c7a74d6a21
Skipping omw-arb:1.4 (Arabic WordNet (AWN v2)); already added

Cached file found: /Users/salahmac/.wn_data/downloads/3334cfd8709f5032fe246261d73528528c2542fa
Skipping omw-en:1.4 (OMW English Wordnet based on WordNet 3.0); already added

PosixPath('/Users/salahmac/.wn_data/downloads/3334cfd8709f5032fe246261d73528528c2542fa')

<a id='loading'></a>
## 2. Loading the Arabic WordNet


In [17]:
# List all installed lexicons
print("Installed Lexicons:")
print("-" * 60)
for lex in wn.lexicons():
    print(f"{lex.id:15} v{lex.version:5} [{lex.language:3}] {lex.label}")


Installed Lexicons:
------------------------------------------------------------
omw-arb         v1.4   [arb] Arabic WordNet (AWN v2)
omw-en          v1.4   [en ] OMW English Wordnet based on WordNet 3.0
ewn             v2020  [en ] English WordNet
test-lex        v1.0   [en ] Test Lexicon
test-lex-v2     v1.0   [en ] Test Lexicon V2
test-fe25e5     v1.0   [en ] Test


In [18]:
# Create Wordnet instances
arb = wn.Wordnet('omw-arb:1.4')
en = wn.Wordnet('omw-en:1.4')

# Get lexicon info
arb_lex = arb.lexicons()[0]
print("Arabic WordNet Info:")
print(f"  ID: {arb_lex.id}")
print(f"  Version: {arb_lex.version}")
print(f"  Label: {arb_lex.label}")
print(f"  Language: {arb_lex.language}")
print(f"  License: {arb_lex.license}")


Arabic WordNet Info:
  ID: omw-arb
  Version: 1.4
  Label: Arabic WordNet (AWN v2)
  Language: arb
  License: https://creativecommons.org/licenses/by-sa/3.0/


In [25]:
# Helper function to get definitions (falls back to English via ILI)
def get_definition(synset, en_wordnet=None):
    """Get definition for a synset, falling back to English if Arabic is missing."""
    # Try Arabic definition first
    defn = synset.definition()
    if defn:
        return defn, 'ar'
    
    # Fall back to English via ILI
    if en_wordnet and synset.ili:
        en_synsets = en_wordnet.synsets(ili=synset.ili.id)
        if en_synsets:
            en_defn = en_synsets[0].definition()
            if en_defn:
                return en_defn, 'en'
    
    return "(no definition)", None

def get_english_words(synset, en_wordnet):
    """Get English word equivalents for an Arabic synset via ILI."""
    if synset.ili:
        en_synsets = en_wordnet.synsets(ili=synset.ili.id)
        if en_synsets:
            return [w.lemma() for w in en_synsets[0].words()][:5]
    return []

print("✓ Helper functions defined for cross-lingual definitions")


✓ Helper functions defined for cross-lingual definitions
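
The lookup cells later in this notebook repeat the same slice-and-ellipsis expression whenever a definition is shortened for display. A minimal sketch of a reusable helper follows; `truncate` is a hypothetical name introduced here for illustration and is not part of `wn` or of the original notebook.

```python
def truncate(text, limit=55, suffix="..."):
    """Return text unchanged if it fits, else its first `limit` characters plus suffix."""
    if len(text) <= limit:
        return text
    return text[:limit] + suffix


# Short strings pass through; long ones are cut at `limit` characters.
print(truncate("liquid excretory product"))
print(truncate("a written work or composition that has been published", limit=20))
```

With a helper like this, a display line could be written as `truncate(defn, 55)` instead of repeating the conditional expression in every cell.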


In [5]:
# Check dependencies - Arabic WN uses English WN for relations
print("Dependencies:")
print(f"  Requires: {arb_lex.requires()}")
print(f"  Expanded lexicons: {arb.expanded_lexicons()}")


Dependencies:
  Requires: {'omw-en:1.4': <Lexicon omw-en:1.4 [en]>}
  Expanded lexicons: [<Lexicon omw-en:1.4 [en]>]


<a id='statistics'></a>
## 3. Basic Statistics


In [20]:
def get_wordnet_stats(wordnet, name="WordNet"):
    """Calculate comprehensive statistics for a wordnet."""
    stats = {
        'name': name,
        'words': len(wordnet.words()),
        'senses': len(wordnet.senses()),
        'synsets': len(wordnet.synsets()),
    }
    
    # By part of speech
    pos_labels = {'n': 'Nouns', 'v': 'Verbs', 'a': 'Adjectives', 'r': 'Adverbs', 's': 'Adj. Satellites'}
    for pos, label in pos_labels.items():
        stats[f'words_{pos}'] = len(wordnet.words(pos=pos))
        stats[f'synsets_{pos}'] = len(wordnet.synsets(pos=pos))
    
    return stats

# Get stats for Arabic
arb_stats = get_wordnet_stats(arb, "Arabic WN")

print("📊 Arabic WordNet Statistics")
print("=" * 40)
print(f"Total Words:    {arb_stats['words']:,}")
print(f"Total Senses:   {arb_stats['senses']:,}")
print(f"Total Synsets:  {arb_stats['synsets']:,}")
print()
print("By Part of Speech:")
print("-" * 40)
print(f"  Nouns:           {arb_stats['words_n']:,} words, {arb_stats['synsets_n']:,} synsets")
print(f"  Verbs:           {arb_stats['words_v']:,} words, {arb_stats['synsets_v']:,} synsets")
print(f"  Adjectives:      {arb_stats['words_a']:,} words, {arb_stats['synsets_a']:,} synsets")
print(f"  Adverbs:         {arb_stats['words_r']:,} words, {arb_stats['synsets_r']:,} synsets")
print(f"  Adj. Satellites: {arb_stats['words_s']:,} words, {arb_stats['synsets_s']:,} synsets")


📊 Arabic WordNet Statistics
Total Words:    18,003
Total Senses:   37,342
Total Synsets:  9,916

By Part of Speech:
----------------------------------------
  Nouns:           10,344 words, 6,884 synsets
  Verbs:           6,728 words, 2,484 synsets
  Adjectives:      693 words, 443 synsets
  Adverbs:         238 words, 105 synsets
  Adj. Satellites: 0 words, 0 synsets


In [21]:
# Compare with English WordNet
en_stats = get_wordnet_stats(en, "English WN")

print("\n📈 Comparison: Arabic vs English WordNet")
print("=" * 50)
print(f"{'Metric':<20} {'Arabic':>12} {'English':>12} {'Ratio':>10}")
print("-" * 50)
for key in ['words', 'senses', 'synsets']:
    arb_val = arb_stats[key]
    en_val = en_stats[key]
    ratio = arb_val / en_val * 100 if en_val > 0 else 0
    print(f"{key.capitalize():<20} {arb_val:>12,} {en_val:>12,} {ratio:>9.1f}%")



📈 Comparison: Arabic vs English WordNet
Metric                     Arabic      English      Ratio
--------------------------------------------------
Words                      18,003      156,584      11.5%
Senses                     37,342      206,978      18.0%
Synsets                     9,916      117,659       8.4%
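
The totals above also give two density figures that the table does not print: senses per word and senses per synset. The sketch below is plain arithmetic on the numbers copied from the output; the dictionary layout and the `density` helper are assumptions made for illustration.

```python
# Totals copied from the statistics output above.
arabic = {"words": 18_003, "senses": 37_342, "synsets": 9_916}
english = {"words": 156_584, "senses": 206_978, "synsets": 117_659}


def density(stats):
    """Return (senses per word, senses per synset) for one wordnet."""
    return stats["senses"] / stats["words"], stats["senses"] / stats["synsets"]


for name, stats in (("Arabic", arabic), ("English", english)):
    per_word, per_synset = density(stats)
    print(f"{name:8} senses/word: {per_word:.2f}  senses/synset: {per_synset:.2f}")
```

On these figures Arabic averages about 2.07 senses per word and 3.77 senses per synset, against roughly 1.32 and 1.76 for English, so the Arabic lexicon is smaller but noticeably denser per synset.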


In [22]:
# Polysemy analysis
def analyze_polysemy(wordnet):
    """Analyze how many senses each word has."""
    sense_counts = [len(word.senses()) for word in wordnet.words()]
    
    if not sense_counts:
        return {}
    
    return {
        'monosemous': sum(1 for c in sense_counts if c == 1),
        'polysemous': sum(1 for c in sense_counts if c > 1),
        'avg_senses': sum(sense_counts) / len(sense_counts),
        'max_senses': max(sense_counts),
        'distribution': Counter(sense_counts)
    }

polysemy = analyze_polysemy(arb)

print("\n📝 Polysemy Analysis (Arabic)")
print("=" * 40)
print(f"Monosemous words (1 sense):  {polysemy['monosemous']:,}")
print(f"Polysemous words (>1 sense): {polysemy['polysemous']:,}")
print(f"Average senses per word:     {polysemy['avg_senses']:.2f}")
print(f"Maximum senses for a word:   {polysemy['max_senses']}")

print("\nSense count distribution (top 10):")
for count, freq in sorted(polysemy['distribution'].items())[:10]:
    bar = '█' * min(freq // 100, 50)
    print(f"  {count} sense(s): {freq:>5} words {bar}")



📝 Polysemy Analysis (Arabic)
Monosemous words (1 sense):  12,591
Polysemous words (>1 sense): 5,412
Average senses per word:     2.07
Maximum senses for a word:   48

Sense count distribution (top 10):
  1 sense(s): 12591 words ██████████████████████████████████████████████████
  2 sense(s):  2288 words ██████████████████████
  3 sense(s):  1075 words ██████████
  4 sense(s):   625 words ██████
  5 sense(s):   337 words ███
  6 sense(s):   219 words ██
  7 sense(s):   141 words █
  8 sense(s):   130 words █
  9 sense(s):    85 words 
  10 sense(s):    66 words 
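
The bars above use one block per 100 words capped at 50 blocks, so the bar for monosemous words is cut short and the chart understates how dominant that group is. The sketch below scales bars relative to the largest count instead, and uses the printed counts to derive how many words have 8 or more senses. The `bar` helper is hypothetical; the numbers are copied from the output above.

```python
# Counts for 1 to 7 senses, copied from the distribution printed above.
distribution = {1: 12_591, 2: 2_288, 3: 1_075, 4: 625, 5: 337, 6: 219, 7: 141}
total_words = 18_003


def bar(freq, max_freq, width=50):
    """Scale a frequency to at most `width` blocks, relative to the largest count."""
    return "█" * round(freq / max_freq * width)


max_freq = max(distribution.values())
for senses, freq in distribution.items():
    print(f"{senses} sense(s): {freq:>6,} {bar(freq, max_freq)}")

# Words not accounted for by the 1-7 rows must have 8 or more senses.
eight_plus = total_words - sum(distribution.values())
print(f"Words with 8+ senses: {eight_plus} ({eight_plus / total_words:.1%})")
```

The remainder works out to 727 words, about 4% of the lexicon, which agrees with the count reported by the deep-dive cell in Section 4.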


<a id='words'></a>
## 4. Exploring Arabic Words


In [23]:
# Sample some Arabic words
print("📚 Sample Arabic Words")
print("=" * 60)

sample_words = list(arb.words())[:20]
for word in sample_words:
    senses = word.senses()
    print(f"  {word.lemma():20} (POS: {word.pos}) - {len(senses)} sense(s)")


📚 Sample Arabic Words
  أوّلِي               (POS: a) - 1 sense(s)
  ظلْماء               (POS: n) - 1 sense(s)
  دُهْمة               (POS: n) - 1 sense(s)
  كِيْلُو مِتْر        (POS: n) - 1 sense(s)
  شارِي                (POS: n) - 1 sense(s)
  شائِك الجِلْد        (POS: n) - 1 sense(s)
  شذا                  (POS: n) - 1 sense(s)
  شأن                  (POS: n) - 5 sense(s)
  شأْن                 (POS: n) - 1 sense(s)
  شاء                  (POS: v) - 3 sense(s)
  شاذّ                 (POS: n) - 1 sense(s)
  شاع                  (POS: v) - 2 sense(s)
  شاعِر                (POS: n) - 1 sense(s)
  شاحِنة               (POS: n) - 1 sense(s)
  شاطر                 (POS: v) - 3 sense(s)
  شاطر الأسى           (POS: v) - 1 sense(s)
  شاطِئ                (POS: n) - 2 sense(s)
  شاطِئ البحْر         (POS: n) - 1 sense(s)
  شاب                  (POS: n) - 1 sense(s)
  شاب  

In [26]:
# Look up specific Arabic words
# Common Arabic words to explore
test_words = ['كتاب', 'ماء', 'شمس', 'قمر', 'بيت', 'حب', 'علم', 'عمل', 'كبير', 'صغير']

print("🔍 Looking up Common Arabic Words")
print("=" * 70)

for arabic_word in test_words:
    words = arb.words(arabic_word)
    if words:
        for word in words:
            print(f"\n✓ {word.lemma()} (POS: {word.pos})")
            for sense in word.senses():
                synset = sense.synset()
                defn, lang = get_definition(synset, en)
                en_words = get_english_words(synset, en)
                
                # Format output with language indicator and English words
                lang_tag = f"[{lang}]" if lang else ""
                en_tag = f" ({', '.join(en_words[:3])})" if en_words else ""
                
                display_defn = f"{defn[:55]}..." if len(defn) > 55 else defn
                print(f"    → {lang_tag} {display_defn}{en_tag}")
    else:
        print(f"\n✗ '{arabic_word}' not found")


🔍 Looking up Common Arabic Words

✓ كِتاب (POS: n)
    → [en] a written message addressed to a person or organization (letter, missive)
    → [en] a written work or composition that has been published (... (book)
    → [en] a number of sheets (ticket or stamps etc.) bound togeth... (book)
    → [en] physical objects consisting of a number of pages bound ... (book, volume)
    → [en] a major division of a long written composition (book)

✓ ماء (POS: n)
    → [en] once thought to be one of four elements composing the u... (water)
    → [en] liquid excretory product (urine, piss, pee)
    → [en] a facility that provides a source of water (water system, water supply, water)
    → [en] binary compound that occurs at room temperature as a cl... (water, H2O)
    → [en] the part of the earth's surface covered with water (suc... (body of water, water)
    → [en] a liquid necessary for the life of most animals and pla... (water)

✓ شمْس (POS: n)
    →

In [27]:
# Find the most polysemous Arabic words
print("\n🏆 Most Polysemous Arabic Words")
print("=" * 50)

word_senses = [(word, len(word.senses())) for word in arb.words()]
word_senses.sort(key=lambda x: x[1], reverse=True)

for word, num_senses in word_senses[:15]:
    print(f"  {word.lemma():25} - {num_senses} senses")



🏆 Most Polysemous Arabic Words
  أدرك                      - 48 senses
  أثار                      - 47 senses
  فصل                       - 46 senses
  حمل                       - 45 senses
  حقق                       - 45 senses
  رافق                      - 44 senses
  حول                       - 43 senses
  نشر                       - 42 senses
  حدث                       - 41 senses
  ظهر                       - 41 senses
  ترك                       - 41 senses
  عبر                       - 39 senses
  رحل                       - 39 senses
  وقع                       - 39 senses
  فهم                       - 38 senses


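## 🔬 Deep Dive: Highly Polysemous Words (8+ Senses)

Let's look more closely at highly polysemous words, that is, words with many senses. Words with 8 or more senses are worth the attention because they are often core vocabulary whose meaning has been extended in many directions. Every Arabic synset here is aligned to an English synset through the ILI, so a high sense count also reflects how finely the English WordNet divides the corresponding meanings.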


In [28]:
# Explore highly polysemous words (8+ senses)
print("🔬 Deep Dive: Words with 8+ Senses")
print("=" * 70)

# Find words with 8 or more senses
highly_polysemous = [(word, len(word.senses())) for word in arb.words() if len(word.senses()) >= 8]
highly_polysemous.sort(key=lambda x: x[1], reverse=True)

print(f"Found {len(highly_polysemous)} words with 8+ senses\n")

for word, num_senses in highly_polysemous:
    print(f"\n{'='*70}")
    print(f"📝 {word.lemma()} ({word.pos}) - {num_senses} senses")
    print("=" * 70)
    
    for i, sense in enumerate(word.senses(), 1):
        synset = sense.synset()
        defn, lang = get_definition(synset, en)
        en_words = get_english_words(synset, en)
        
        # Format English equivalent
        en_equiv = f" → EN: {', '.join(en_words[:3])}" if en_words else ""
        lang_tag = f"[{lang}]" if lang else ""
        
        print(f"\n  Sense {i}:{en_equiv}")
        print(f"    Definition {lang_tag}: {defn[:75]}{'...' if len(defn) > 75 else ''}")
        
        # Show examples if available
        examples = synset.examples()
        if examples:
            print(f"    Example: {examples[0][:60]}{'...' if len(str(examples[0])) > 60 else ''}")


🔬 Deep Dive: Words with 8+ Senses
Found 727 words with 8+ senses


📝 أدرك (v) - 48 senses

  Sense 1: → EN: feel
    Definition [en]: have a feeling or perception about oneself in reaction to someone's behavio...

  Sense 2: → EN: learn, hear, get word
    Definition [en]: get to know or become aware of, usually accidentally

  Sense 3: → EN: follow, fall out
    Definition [en]: come as a logical consequence; follow logically

  Sense 4: → EN: pass, overtake, overhaul
    Definition [en]: travel past

  Sense 5: → EN: meet, run into, encounter
    Definition [en]: come together

  Sense 6: → EN: recognize, recognise, realize
    Definition [en]: be fully aware or cognizant of

  Sense 7: → EN: recognize, recognise
    Definition [en]: show approval or appreciation of

  Sense 8: → EN: recognize, recognise
    Definition [en]: perceive to be the same

  Sense 9: → EN: see
    Definition [en]: perceive by sight or have the power to perceive by sight

  Sense 

In [29]:
# Statistical breakdown of highly polysemous words
print("📊 Statistics: Words with 8+ Senses")
print("=" * 50)

if highly_polysemous:
    # Group by POS
    pos_breakdown = Counter(word.pos for word, _ in highly_polysemous)
    print("\nBy Part of Speech:")
    for pos, count in pos_breakdown.most_common():
        pos_name = {'n': 'Noun', 'v': 'Verb', 'a': 'Adjective', 'r': 'Adverb', 's': 'Adj. Satellite'}.get(pos, pos)
        print(f"  {pos_name:15} {count:3} words")
    
    # Sense count distribution
    sense_counts = [num for _, num in highly_polysemous]
    print(f"\nSense count range: {min(sense_counts)} - {max(sense_counts)}")
    print(f"Average senses:    {sum(sense_counts)/len(sense_counts):.1f}")
    
    # Group by sense count
    print("\nDistribution by sense count:")
    sense_dist = Counter(sense_counts)
    for count in sorted(sense_dist.keys()):
        words_at_count = [w.lemma() for w, n in highly_polysemous if n == count]
        print(f"  {count} senses: {len(words_at_count)} words")
        # Show first 3 examples
        examples = words_at_count[:3]
        if examples:
            print(f"           e.g., {', '.join(examples)}")
else:
    print("No words with 8+ senses found.")


📊 Statistics: Words with 8+ Senses

By Part of Speech:
  Verb            679 words
  Noun             48 words

Sense count range: 8 - 48
Average senses:    14.4

Distribution by sense count:
  8 senses: 130 words
           e.g., أهْمل, أكّد, أمر
  9 senses: 85 words
           e.g., شكْل, أعْلن, أظْهر
  10 senses: 66 words
           e.g., شمل, أضاف, أنْشأ
  11 senses: 80 words
           e.g., شغّل, شكّل, أطْلق
  12 senses: 52 words
           e.g., شكر, شرع, شرح
  13 senses: 55 words
           e.g., أدخل, أقام, ضغط
  14 senses: 32 words
           e.g., أجاز, أزاح, ضبط
  15 senses: 24 words
           e.g., شاهد, صادف, دخل
  16 senses: 25 words
           e.g., أراد, حسب, صان
  17 senses: 9 words
           e.g., أمْسك, قابل, سال
  18 senses: 14 words
           e.g., عمل, عزف, حاكم
  19 senses: 16 words
           e.g., ذهب, أعْطى, أسر
  20 se

In [30]:
# Analyze the semantic variety of highly polysemous words
print("🎯 Semantic Analysis of Highly Polysemous Words")
print("=" * 60)

# Create expanded wordnet for taxonomy access
arb_expanded = wn.Wordnet('omw-arb:1.4', expand='omw-en:1.4')

if highly_polysemous:
    for word, num_senses in highly_polysemous[:5]:  # Top 5 most polysemous
        print(f"\n{'─'*60}")
        print(f"📌 {word.lemma()} ({num_senses} senses)")
        print("─" * 60)
        
        # Collect all synsets and their hypernyms to find common themes
        hypernym_counts = Counter()
        pos_in_senses = Counter()
        
        for sense in word.senses():
            synset = sense.synset()
            pos_in_senses[synset.pos] += 1
            
            # Get hypernyms using expanded wordnet
            arb_ss = arb_expanded.synset(synset.id)
            if arb_ss:
                for hyp in arb_ss.hypernyms():
                    hyp_lemmas = [w.lemma() for w in hyp.words()]
                    if hyp_lemmas:
                        hypernym_counts[hyp_lemmas[0]] += 1
                    else:
                        # Get English hypernym
                        if hyp.ili:
                            en_hyps = en.synsets(ili=hyp.ili.id)
                            if en_hyps:
                                en_lemmas = [w.lemma() for w in en_hyps[0].words()]
                                if en_lemmas:
                                    hypernym_counts[f"[{en_lemmas[0]}]"] += 1
        
        # Show POS distribution within senses
        if len(pos_in_senses) > 1:
            print(f"  POS variety: {dict(pos_in_senses)}")
        
        # Show common hypernyms (semantic categories)
        if hypernym_counts:
            print(f"  Common hypernyms (semantic categories):")
            for hyp, count in hypernym_counts.most_common(5):
                print(f"    • {hyp}: {count} senses")
        
        # Show all senses briefly
        print(f"\n  All senses:")
        for i, sense in enumerate(word.senses(), 1):
            synset = sense.synset()
            defn, lang = get_definition(synset, en)
            lang_tag = f"[{lang}]" if lang else ""
            print(f"    {i}. {lang_tag} {defn[:50]}{'...' if len(defn) > 50 else ''}")


🎯 Semantic Analysis of Highly Polysemous Words

────────────────────────────────────────────────────────────
📌 أدرك (48 senses)
────────────────────────────────────────────────────────────
  Common hypernyms (semantic categories):
    • أدْرك: 4 senses
    • إِعْتبر: 2 senses
    • فهِم: 2 senses
    • تحرّك: 2 senses
    • أسْفر: 1 senses

  All senses:
    1. [en] have a feeling or perception about oneself in reac...
    2. [en] get to know or become aware of, usually accidental...
    3. [en] come as a logical consequence; follow logically
    4. [en] travel past
    5. [en] come together
    6. [en] be fully aware or cognizant of
    7. [en] show approval or appreciation of
    8. [en] perceive to be 

<a id='synsets'></a>
## 5. Exploring Synsets


In [31]:
# Get sample synsets
print("🔮 Sample Arabic Synsets")
print("=" * 70)

sample_synsets = list(arb.synsets())[:10]
for ss in sample_synsets:
    lemmas = [w.lemma() for w in ss.words()]
    defn, lang = get_definition(ss, en)
    en_words = get_english_words(ss, en)
    ili = ss.ili.id if ss.ili else "N/A"
    
    print(f"\nSynset: {ss.id}")
    print(f"  POS: {ss.pos}")
    print(f"  ILI: {ili}")
    print(f"  Arabic words: {', '.join(lemmas)}")
    if en_words:
        print(f"  English words: {', '.join(en_words)}")
    lang_tag = f"[{lang}]" if lang else ""
    print(f"  Definition {lang_tag}: {defn[:65]}..." if len(defn) > 65 else f"  Definition {lang_tag}: {defn}")


🔮 Sample Arabic Synsets

Synset: omw-arb-03012209-a
  POS: a
  ILI: i17226
  Arabic words: أوّلِي, أَوَّلِي
  English words: prime
  Definition [en]: of or relating to or being an integer that cannot be factored int...

Synset: omw-arb-13983515-n
  POS: n
  ILI: i110421
  Arabic words: ظلْماء, دُهْمة, عتْمة, ظلام, ظُلْمة, غلس, قتْمة, ظَلْماء, دُهْمَة
  English words: dark, darkness
  Definition [en]: absence of light or illumination

Synset: omw-arb-13659760-n
  POS: n
  ILI: i108497
  Arabic words: كِيْلُو مِتْر, كم
  English words: kilometer, kilometre, km, klick
  Definition [en]: a metric unit of length equal to 1000 meters (or 0.621371 miles)

Synset: omw-arb-09885145-n
  POS: n
  ILI: i88763
  Arabic words: شارِي, مُشْترٍ, مُشْترِي
  English words: buyer, purchaser, emptor, vendee
  Definition [en]: a person who buys

Synset: omw-arb-02316707-n
  POS: n
  ILI: i47722
  Arabic words

In [32]:
# Find synsets with multiple members (synonyms)
print("\n👥 Synsets with Multiple Arabic Words (Synonyms)")
print("=" * 60)

multi_word_synsets = []
for ss in arb.synsets():
    words = ss.words()
    if len(words) > 1:
        multi_word_synsets.append((ss, words))

multi_word_synsets.sort(key=lambda x: len(x[1]), reverse=True)

print(f"Found {len(multi_word_synsets)} synsets with 2+ Arabic words\n")

for ss, words in multi_word_synsets[:15]:
    lemmas = ', '.join([w.lemma() for w in words])
    defn, lang = get_definition(ss, en)
    en_words = get_english_words(ss, en)
    
    print(f"[{len(words)} words] {lemmas}")
    lang_tag = f"[{lang}]" if lang else ""
    en_tag = f" = {', '.join(en_words[:3])}" if en_words else ""
    print(f"         → {lang_tag} {defn[:45]}...{en_tag}" if len(defn) > 45 else f"         → {lang_tag} {defn}{en_tag}")



👥 Synsets with Multiple Arabic Words (Synonyms)
Found 6232 synsets with 2+ Arabic words

[83 words] شرد, طاف, هام, تجوّل, ترحّل, تسكّع, أضاع, أطلق, ألقى, إتخذ مكانا, إجتاز سيرا على الأقدام, إختار دورا للممثل, إندفع للأمام, امتد, انبسط, انجرف, انحرف, انضم, تاه, تتكسر الأمواج, تجول, تحدث على نحو مفكك, تدحرج, تدفعه الرياح, تدفق, تراصف, ترحل, ترصن, تسكع, تسلق, تسول, تعرش, تكور, تلوى, تمايل, تمرق, تنزه, جاب, جاب البحار, جرى, جرى مع التيار, جول, جول في, حاد, حام, حظ, خرج في نزهة, خرف, خضع بلد للقانون, خطا بتثاقل, دخل, دفع, دوى, رتب, رعى الماشية, رغا, رمى, سافر بدون هدف, سافر كثيرا, سبك, سكب الحديد, سوق بقوة الرياح, شكل, صب, 

<a id='taxonomy'></a>
## 6. Navigating the Taxonomy

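Arabic WordNet follows the "expand" methodology: its synsets are aligned to English WordNet synsets, and it relies on the English lexicon for taxonomic relations (hypernyms/hyponyms). When `omw-en:1.4` is supplied as the `expand` lexicon, a relation is followed on the English side and the result is mapped back to Arabic through the Interlingual Index (ILI).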
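
To make the ILI round trip concrete, here is a toy model that uses plain dictionaries instead of `wn`. Every identifier in it (`arb-darkness`, `i1`, and so on) is made up for illustration, and the fallback to an English identifier is a simplification rather than a description of what `wn` returns when no Arabic synset exists for a concept.

```python
# Toy data: every identifier below is made up for illustration.
arabic_by_ili = {"i1": "arb-darkness", "i2": "arb-illumination"}
english_by_ili = {"i1": "en-darkness", "i2": "en-illumination", "i3": "en-condition"}

# Reverse lookup: synset identifier -> ILI identifier.
ili_of = {synset: ili
          for table in (arabic_by_ili, english_by_ili)
          for ili, synset in table.items()}

# Hypernym links exist only on the English side in this model.
english_hypernyms = {"en-darkness": ["en-illumination"],
                     "en-illumination": ["en-condition"]}


def expanded_hypernyms(arabic_synset):
    """Follow hypernym links stored on the English side and map results back."""
    english_synset = english_by_ili[ili_of[arabic_synset]]
    results = []
    for hypernym in english_hypernyms.get(english_synset, []):
        ili = ili_of[hypernym]
        # Prefer the Arabic synset for this concept; fall back to the English one.
        results.append(arabic_by_ili.get(ili, hypernym))
    return results


print(expanded_hypernyms("arb-darkness"))      # ['arb-illumination']
print(expanded_hypernyms("arb-illumination"))  # ['en-condition']
```

The second call shows why the notebook's cells check whether a related synset has any Arabic words: a concept reached through English may have no Arabic entry at all.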


In [33]:
# Create Arabic wordnet with English as expand lexicon
arb_expanded = wn.Wordnet('omw-arb:1.4', expand='omw-en:1.4')

print("Lexicon configuration:")
print(f"  Primary: {arb_expanded.lexicons()}")
print(f"  Expanded: {arb_expanded.expanded_lexicons()}")


Lexicon configuration:
  Primary: [<Lexicon omw-arb:1.4 [arb]>]
  Expanded: [<Lexicon omw-en:1.4 [en]>]


In [34]:
# Find a synset and explore its hierarchy
print("\n🌳 Exploring Taxonomy")
print("=" * 60)

# Try to find a common noun with hypernyms
test_synsets = arb_expanded.synsets(pos='n')[:50]

for ss in test_synsets:
    hypernyms = ss.hypernyms()
    if hypernyms:
        lemmas = ', '.join([w.lemma() for w in ss.words()])
        defn, lang = get_definition(ss, en)
        lang_tag = f"[{lang}]" if lang else ""
        
        print(f"\n📌 {lemmas}")
        print(f"   Definition {lang_tag}: {defn}")
        print(f"   Hypernyms:")
        for h in hypernyms:
            h_lemmas = ', '.join([w.lemma() for w in h.words()])
            h_defn, h_lang = get_definition(h, en)
            h_en_words = get_english_words(h, en)
            h_display = h_lemmas if h_lemmas else f"[{', '.join(h_en_words[:3])}]" if h_en_words else "(no words)"
            print(f"     ↑ {h_display}: {h_defn[:50]}...")
        break



🌳 Exploring Taxonomy

📌 ظلْماء, دُهْمة, عتْمة, ظلام, ظُلْمة, غلس, قتْمة, ظَلْماء, دُهْمَة
   Definition [en]: absence of light or illumination
   Hypernyms:
     ↑ إِضاءة: the degree of visibility of your environment...


<a id='crosslingual'></a>
## 7. Cross-Lingual Analysis (Arabic ↔ English)


In [35]:
# Find Arabic-English translations via ILI
print("🌐 Arabic → English Translations")
print("=" * 60)

count = 0
for ss in arb.synsets():
    if ss.ili and count < 15:
        # Find English equivalent
        en_synsets = en.synsets(ili=ss.ili.id)
        if en_synsets:
            arb_lemmas = ', '.join([w.lemma() for w in ss.words()])
            en_lemmas = ', '.join([w.lemma() for w in en_synsets[0].words()][:5])
            
            print(f"\n{arb_lemmas}")
            print(f"  → English: {en_lemmas}")
            print(f"  → ILI: {ss.ili.id}")
            count += 1


🌐 Arabic → English Translations

أوّلِي, أَوَّلِي
  → English: prime
  → ILI: i17226

ظلْماء, دُهْمة, عتْمة, ظلام, ظُلْمة, غلس, قتْمة, ظَلْماء, دُهْمَة
  → English: dark, darkness
  → ILI: i110421

كِيْلُو مِتْر, كم
  → English: kilometer, kilometre, km, klick
  → ILI: i108497

شارِي, مُشْترٍ, مُشْترِي
  → English: buyer, purchaser, emptor, vendee
  → ILI: i88763

شائِك الجِلْد
  → English: echinoderm
  → ILI: i47722

شذا, أرج, أرِيج, الرّائحة الزّكية, عبق, عبِير, عِطْر, حلاوة, طِيْب, باقة, ريّا, رِيْح
  → English: bouquet, fragrance, fragrancy, redolence, sweetness
  → ILI: i63173

شأن, أهمِّيّة
  → English: significance
  → ILI: i64149

شأن, همّ
  → English: concern
  → ILI: i66717

شأن, شيء, أمْر, مسْألة
  → English: matter, affair, thing
  →

In [36]:
# English → Arabic translation
print("\n🌐 English → Arabic Translations")
print("=" * 60)

english_words = ['water', 'sun', 'moon', 'book', 'house', 'love', 'knowledge', 'work', 'big', 'small']

for en_word in english_words:
    en_synsets = en.synsets(en_word)
    
    if en_synsets:
        en_ss = en_synsets[0]  # Take first sense
        if en_ss.ili:
            # Find Arabic equivalent
            arb_synsets = arb.synsets(ili=en_ss.ili.id)
            if arb_synsets:
                arb_lemmas = ', '.join([w.lemma() for w in arb_synsets[0].words()])
                print(f"{en_word:15} → {arb_lemmas}")
            else:
                print(f"{en_word:15} → (not in Arabic WN)")
        else:
            print(f"{en_word:15} → (no ILI)")



🌐 English → Arabic Translations
water           → ماء
sun             → الأحد, يَوْم الأحَد
moon            → ضُوء القمر, قمر
book            → المُصْحف الشرِيف, القُرْآنُ الكرِيم, فُرْقان, مُصْحف, قُرْآن
house           → بيْت, منْزِل
love            → حُبّ
knowledge       → إِدْراك, معْرِفة
work            → شُغْل, عمل
big             → كبِير
small           → دقِيق, مِجْهرِي
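
Taking only the first English synset gives some surprising results above: `sun` maps to the Arabic for "Sunday", `book` to names of the Quran, and `small` to words meaning "minute, microscopic". A hedged alternative is to collect Arabic lemmas for every English synset of the word, optionally filtered by part of speech. The sketch below relies only on the calls already used in this notebook (`synsets`, `ili`, `words`, `lemma`, `definition`); the function name `arabic_equivalents` is hypothetical, and the `getattr` line is a defensive assumption so the code accepts an `ili` that is either an object with an `id` or a plain string.

```python
def arabic_equivalents(en_wordnet, arb_wordnet, form, pos=None):
    """Map each English synset of `form` to the Arabic lemmas sharing its ILI."""
    results = []
    for en_ss in en_wordnet.synsets(form, pos=pos):
        if not en_ss.ili:
            continue
        # `.ili` may be an object with an `id` or already a string; accept both.
        ili_id = getattr(en_ss.ili, "id", en_ss.ili)
        lemmas = [word.lemma()
                  for arb_ss in arb_wordnet.synsets(ili=ili_id)
                  for word in arb_ss.words()]
        if lemmas:
            # Keep the English definition so the reader can tell senses apart.
            results.append((en_ss.definition(), lemmas))
    return results
```

In the notebook this could be called as `arabic_equivalents(en, arb, 'sun', pos='n')`, printing each definition next to its Arabic lemmas so that the intended sense can be chosen by eye.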


In [37]:
# Coverage analysis - what percentage of English concepts have Arabic equivalents?
print("\n📊 Cross-Lingual Coverage Analysis")
print("=" * 50)

# Get all ILIs from Arabic
arb_ilis = set()
for ss in arb.synsets():
    if ss.ili and ss.ili.id:
        arb_ilis.add(ss.ili.id)

# Get all ILIs from English
en_ilis = set()
for ss in en.synsets():
    if ss.ili and ss.ili.id:
        en_ilis.add(ss.ili.id)

# Calculate overlap
common_ilis = arb_ilis & en_ilis

print(f"Arabic synsets with ILI:  {len(arb_ilis):,}")
print(f"English synsets with ILI: {len(en_ilis):,}")
print(f"Shared concepts (ILIs):   {len(common_ilis):,}")
print(f"\nArabic coverage of English: {len(common_ilis)/len(en_ilis)*100:.1f}%")
print(f"Concepts unique to Arabic:  {len(arb_ilis - en_ilis):,}")



📊 Cross-Lingual Coverage Analysis
Arabic synsets with ILI:  9,916
English synsets with ILI: 117,659
Shared concepts (ILIs):   9,916

Arabic coverage of English: 8.4%
Concepts unique to Arabic:  0
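
The single 8.4% figure hides how coverage varies by part of speech. A minimal sketch of a per-POS breakdown follows; it works on plain `(ili_id, pos)` pairs so it can be tried without `wn`. The function name `coverage_by_pos` and the toy pairs are assumptions for illustration; in the notebook the pairs could be built with a comprehension such as `[(ss.ili.id, ss.pos) for ss in en.synsets() if ss.ili]`.

```python
from collections import Counter


def coverage_by_pos(arabic_pairs, english_pairs):
    """Return {pos: (shared, english_total, percent)} from lists of (ili_id, pos) pairs."""
    english_total = Counter(pos for _, pos in english_pairs)
    arabic_ilis = {ili for ili, _ in arabic_pairs}
    shared = Counter(pos for ili, pos in english_pairs if ili in arabic_ilis)
    return {pos: (shared[pos], total, shared[pos] / total * 100)
            for pos, total in english_total.items()}


# Toy input; pass lists (not generators), since english_pairs is read twice.
arabic_pairs = [("i1", "n"), ("i2", "v")]
english_pairs = [("i1", "n"), ("i3", "n"), ("i2", "v"), ("i4", "a")]
print(coverage_by_pos(arabic_pairs, english_pairs))
```

For the toy data the result is 50% for nouns, 100% for verbs and 0% for adjectives, which is the kind of contrast the overall percentage cannot show.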


<a id='relations'></a>
## 8. Semantic Relations


In [39]:
# Explore different relation types available
print("🔗 Exploring Semantic Relations")
print("=" * 50)

# Collect all relation types found
all_relations = Counter()

for ss in list(arb_expanded.synsets())[:500]:
    rel_map = ss.relations()
    for rel_type in rel_map.keys():
        all_relations[rel_type] += len(rel_map[rel_type])

print("Relation types found (sample of 500 synsets):")
for rel_type, count in all_relations.most_common(20):
    print(f"  {rel_type:25} {count:5} instances")


🔗 Exploring Semantic Relations
Relation types found (sample of 500 synsets):
  hyponym                    3042 instances
  hypernym                    458 instances
  instance_hyponym            184 instances
  mero_part                   133 instances
  similar                     105 instances
  holo_part                    51 instances
  mero_member                  49 instances
  domain_topic                 30 instances
  attribute                    28 instances
  has_domain_topic             28 instances
  holo_member                  17 instances
  instance_hypernym            11 instances
  also                          8 instances
  causes                        8 instances
  entails                       7 instances
  mero_substance                7 instances
  holo_substance                6 instances
  has_domain_region             2 instances
  domain_region                 1 instances


In [40]:
# Explore specific relation examples
print("\n🔍 Relation Examples")
print("=" * 60)

def show_relations(synset, rel_type, limit=3):
    """Display up to `limit` synsets related to `synset` by `rel_type`."""
    related = synset.get_related(rel_type)
    if related:
        for r in related[:limit]:
            lemmas = [w.lemma() for w in r.words()]
            if lemmas:
                print(f"    → {', '.join(lemmas)}")
            else:
                # No Arabic lemmas for this synset: fall back to its English translation
                en_equiv = r.translate(lexicon='omw-en:1.4')
                if en_equiv:
                    en_lemmas = [w.lemma() for w in en_equiv[0].words()][:3]
                    print(f"    → [{', '.join(en_lemmas)}] (English)")

# Find a synset with various relations
for ss in arb_expanded.synsets(pos='n'):
    rels = ss.relations()
    if len(rels) >= 2:
        lemmas = ', '.join([w.lemma() for w in ss.words()])
        print(f"\nSynset: {lemmas}")
        print(f"Definition: {ss.definition() or '(no def)'}")
        
        for rel_type in list(rels.keys())[:5]:
            print(f"  {rel_type}:")
            show_relations(ss, rel_type)
        break



🔍 Relation Examples

Synset: ظلْماء, دُهْمة, عتْمة, ظلام, ظُلْمة, غلس, قتْمة, ظَلْماء, دُهْمَة
Definition: (no def)
  hypernym:
    → إِضاءة
  hyponym:
    → ليْل
    → [total darkness, lightlessness, blackness] (English)
    → [blackout, brownout, dimout] (English)


<a id='similarity'></a>
## 9. Similarity Measures


In [41]:
# Find pairs of Arabic synsets to compare
print("📐 Semantic Similarity Between Arabic Concepts")
print("=" * 60)

# Get some noun synsets with hypernyms
noun_synsets = []
for ss in arb_expanded.synsets(pos='n'):
    if ss.hypernyms() and ss.words():
        noun_synsets.append(ss)
    if len(noun_synsets) >= 10:
        break

if len(noun_synsets) >= 2:
    print("Comparing pairs of Arabic concepts:\n")
    
    for i in range(min(5, len(noun_synsets)-1)):
        ss1 = noun_synsets[i]
        ss2 = noun_synsets[i+1]
        
        lemma1 = ss1.words()[0].lemma()
        lemma2 = ss2.words()[0].lemma()
        
        try:
            path_sim = similarity.path(ss1, ss2)
            wup_sim = similarity.wup(ss1, ss2)
            
            print(f"{lemma1} ↔ {lemma2}")
            print(f"  Path similarity: {path_sim:.3f}")
            print(f"  Wu-Palmer:       {wup_sim:.3f}")
            print()
        except Exception as e:
            print(f"{lemma1} ↔ {lemma2}: Could not compute ({e})")
else:
    print("Not enough synsets with hypernyms found for similarity comparison.")


📐 Semantic Similarity Between Arabic Concepts
Comparing pairs of Arabic concepts:

ظلْماء ↔ كِيْلُو مِتْر
  Path similarity: 0.091
  Wu-Palmer:       0.286

كِيْلُو مِتْر ↔ شارِي
  Path similarity: 0.067
  Wu-Palmer:       0.125

شارِي ↔ شائِك الجِلْد
  Path similarity: 0.111
  Wu-Palmer:       0.600

شائِك الجِلْد ↔ شذا
  Path similarity: 0.071
  Wu-Palmer:       0.133

شذا ↔ شأن
  Path similarity: 0.111
  Wu-Palmer:       0.429
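
To make the two scores easier to read, here is a minimal sketch of both measures on a toy taxonomy. It is not the `wn` implementation: the node names, the `PARENT` table, and the helper functions are all hypothetical, and every node is assumed to have a single hypernym. Path similarity is taken as 1 / (1 + edges on the shortest path) and Wu-Palmer as 2·depth(LCS) / (depth(a) + depth(b)), with the root at depth 1. Under that reading, a path similarity of 0.111 above would correspond to a path of 8 edges.

```python
# Toy taxonomy: each node maps to its single hypernym (None marks the root).
# The node names are illustrative and are not taken from the Arabic WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "tree": "plant",
}


def ancestors(node):
    """Return the chain [node, hypernym, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def depth(node):
    """Depth of a node, counting the root as depth 1."""
    return len(ancestors(node))


def lowest_common_subsumer(a, b):
    """Return the deepest node that is an ancestor of both a and b."""
    seen = set(ancestors(a))
    return next(n for n in ancestors(b) if n in seen)


def path_similarity(a, b):
    """1 / (1 + number of edges on the shortest path between a and b)."""
    lcs = lowest_common_subsumer(a, b)
    edges = ancestors(a).index(lcs) + ancestors(b).index(lcs)
    return 1 / (1 + edges)


def wup_similarity(a, b):
    """Wu-Palmer: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


for pair in [("dog", "cat"), ("dog", "tree")]:
    print(pair, round(path_similarity(*pair), 3), round(wup_similarity(*pair), 3))
```

Path similarity only looks at how far apart two nodes are, while Wu-Palmer also rewards a deep common ancestor, which is why the two scores can rank the same pairs differently.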



<a id='advanced'></a>
## 10. Advanced Analysis


In [42]:
# Character/script analysis of Arabic words
print("✍️ Arabic Script Analysis")
print("=" * 50)

import unicodedata

def analyze_arabic_text(text):
    """Analyze the Unicode properties of Arabic text."""
    categories = Counter()
    for char in text:
        cat = unicodedata.category(char)
        categories[cat] += 1
    return categories

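# Collect all Arabic lemmas
all_lemmas = [word.lemma() for word in arb.words()]
# Note: joining with ' ' adds one separator space per lemma boundary, so the
# "Spaces" and "Total characters" figures below include those separators
all_text = ' '.join(all_lemmas)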

# Analyze
char_stats = analyze_arabic_text(all_text)

print(f"Total Arabic words: {len(all_lemmas):,}")
print(f"Total characters:   {len(all_text):,}")
print(f"Unique characters:  {len(set(all_text))}")

print("\nCharacter categories:")
cat_names = {
    'Lo': 'Letters (other)',
    'Mn': 'Marks (non-spacing)',
    'Zs': 'Spaces',
    'Po': 'Punctuation',
    'Nd': 'Digits',
    'Lu': 'Letters (uppercase)',
    'Ll': 'Letters (lowercase)'
}
for cat, count in char_stats.most_common(10):
    name = cat_names.get(cat, cat)
    print(f"  {name:25} {count:6,} ({count/len(all_text)*100:.1f}%)")


✍️ Arabic Script Analysis
Total Arabic words: 18,003
Total characters:   173,821
Unique characters:  79

Character categories:
  Letters (other)           113,868 (65.5%)
  Marks (non-spacing)       34,065 (19.6%)
  Spaces                    25,756 (14.8%)
  Lm                            60 (0.0%)
  Digits                        34 (0.0%)
  Letters (uppercase)           14 (0.0%)
  Pd                             8 (0.0%)
  Punctuation                    8 (0.0%)
  Letters (lowercase)            5 (0.0%)
  So                             2 (0.0%)
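
About a fifth of all characters (19.6%) are non-spacing marks, which in Arabic text are mostly short-vowel and other diacritic signs. The sketch below shows one way to strip them so that differently vocalized spellings of the same word can be compared. The helper name `strip_diacritics` is hypothetical, and filtering on category `Mn` is a simplification: it removes every non-spacing mark, not only Arabic ones.

```python
import unicodedata


def strip_diacritics(text):
    """Drop non-spacing marks (Unicode category Mn), e.g. Arabic short-vowel signs."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# Two spellings of the same word that differ only by one short-vowel mark (fatha)
variants = ["ظلْماء", "ظَلْماء"]
normalized = {strip_diacritics(v) for v in variants}

print(normalized)
print(len(variants), "written forms ->", len(normalized), "undiacritized form(s)")
```

Whether to normalize like this depends on the task: it merges spelling variants, but it can also merge genuinely different words that share the same consonant skeleton.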


In [43]:
# Word length distribution
print("\n📏 Word Length Distribution")
print("=" * 50)

lengths = [len(word.lemma()) for word in arb.words()]
length_dist = Counter(lengths)

print(f"Shortest word: {min(lengths)} characters")
print(f"Longest word:  {max(lengths)} characters")
print(f"Average:       {sum(lengths)/len(lengths):.1f} characters")

print("\nLength distribution:")
for length in sorted(length_dist.keys())[:15]:
    count = length_dist[length]
    bar = '█' * min(count // 20, 40)
    print(f"  {length:2} chars: {count:5} {bar}")

# Show longest words
print("\nLongest Arabic words:")
sorted_by_length = sorted([(word.lemma(), len(word.lemma())) for word in arb.words()], 
                          key=lambda x: x[1], reverse=True)
for word, length in sorted_by_length[:10]:
    print(f"  {word} ({length} chars)")



📏 Word Length Distribution
Shortest word: 0 characters
Longest word:  49 characters
Average:       8.7 characters

Length distribution:
   0 chars:     3 
   1 chars:    13 
   2 chars:   121 ██████
   3 chars:  1573 ████████████████████████████████████████
   4 chars:  2034 ████████████████████████████████████████
   5 chars:  2499 ████████████████████████████████████████
   6 chars:  1736 ████████████████████████████████████████
   7 chars:  1980 ████████████████████████████████████████
   8 chars:  1215 ████████████████████████████████████████
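
The bar expression `min(count // 20, 40)` saturates at 40 cells for any count of 800 or more, so the bars for the most common lengths all look identical. A possible alternative, sketched here with a hypothetical `scaled_bar` helper, scales each bar relative to the largest count. The counts are copied from the output above and cover lengths 0 to 8 only, so the true maximum over all lengths is an assumption.

```python
# Counts copied from the length distribution printed above (lengths 0 to 8 only)
length_dist = {0: 3, 1: 13, 2: 121, 3: 1573, 4: 2034, 5: 2499, 6: 1736, 7: 1980, 8: 1215}


def scaled_bar(count, max_count, width=40):
    """Return a bar proportional to count, where max_count fills `width` cells."""
    return "█" * (count * width // max_count)


max_count = max(length_dist.values())
for length, count in sorted(length_dist.items()):
    print(f"  {length:2} chars: {count:5} {scaled_bar(count, max_count)}")
```

With proportional bars, the peak at 5 characters stands out and the dip at 6 characters becomes visible.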

In [44]:
# Export summary data
print("\n💾 Summary Export")
print("=" * 50)

import json

summary = {
    'lexicon': {
        'id': arb_lex.id,
        'version': arb_lex.version,
        'label': arb_lex.label,
        'language': arb_lex.language,
    },
    'statistics': {
        'words': arb_stats['words'],
        'senses': arb_stats['senses'],
        'synsets': arb_stats['synsets'],
        'nouns': arb_stats['words_n'],
        'verbs': arb_stats['words_v'],
        'adjectives': arb_stats['words_a'],
        'adverbs': arb_stats['words_r'],
    },
    'polysemy': {
        'monosemous': polysemy['monosemous'],
        'polysemous': polysemy['polysemous'],
        'avg_senses': round(polysemy['avg_senses'], 2),
        'max_senses': polysemy['max_senses'],
    },
    'cross_lingual': {
        'arabic_ilis': len(arb_ilis),
        'shared_with_english': len(common_ilis),
        'coverage_percent': round(len(common_ilis)/len(en_ilis)*100, 1) if en_ilis else 0,
    }
}

print(json.dumps(summary, indent=2, ensure_ascii=False))



💾 Summary Export
{
  "lexicon": {
    "id": "omw-arb",
    "version": "1.4",
    "label": "Arabic WordNet (AWN v2)",
    "language": "arb"
  },
  "statistics": {
    "words": 18003,
    "senses": 37342,
    "synsets": 9916,
    "nouns": 10344,
    "verbs": 6728,
    "adjectives": 693,
    "adverbs": 238
  },
  "polysemy": {
    "monosemous": 12591,
    "polysemous": 5412,
    "avg_senses": 2.07,
    "max_senses": 48
  },
  "cross_lingual": {
    "arabic_ilis": 9916,
    "shared_with_english": 9916,
    "coverage_percent": 8.4
  }
}
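
The exported figures can be cross-checked against each other. The sketch below copies the numbers from the JSON above and from the coverage cell and tests a few identities. The variable names are hypothetical, and the reading of `avg_senses` as senses divided by words is an assumption about how the earlier polysemy cell computed it.

```python
# Figures copied from the JSON summary printed above
reported = {
    "words": 18003, "senses": 37342, "synsets": 9916,
    "nouns": 10344, "verbs": 6728, "adjectives": 693, "adverbs": 238,
    "monosemous": 12591, "polysemous": 5412,
    "avg_senses": 2.07, "coverage_percent": 8.4,
}
english_ilis = 117659  # from the coverage analysis cell

pos_total = sum(reported[k] for k in ("nouns", "verbs", "adjectives", "adverbs"))

checks = {
    "POS counts sum to words": pos_total == reported["words"],
    "monosemous + polysemous == words":
        reported["monosemous"] + reported["polysemous"] == reported["words"],
    # Assumption: avg_senses was computed as senses / words
    "avg_senses == senses / words":
        round(reported["senses"] / reported["words"], 2) == reported["avg_senses"],
    "coverage == synsets / English ILIs":
        round(reported["synsets"] / english_ilis * 100, 1) == reported["coverage_percent"],
}

for name, ok in checks.items():
    print(f"{'OK  ' if ok else 'FAIL'} {name}")
```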


---

## 📋 Summary

In this notebook, we explored the **Arabic WordNet (AWN v2)** using the `wn` library:

### Key Findings:
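1. **Coverage**: The Arabic WordNet contains 18,003 Arabic words in 9,916 synsets, all linked to English concepts via ILI; this covers 8.4% of the 117,659 ILI-linked English synsets
2. **Structure**: Uses the "expand" methodology: Arabic words are added on top of the English synset structure, which is consistent with no ILI being unique to Arabic
3. **Relations**: Taxonomic relations (hypernyms/hyponyms) are inherited from the English WordNet; related synsets without Arabic lemmas can only be shown through their English translation
4. **Cross-lingual**: Translation between Arabic and English works in both directions, but only for the concepts the two lexicons share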

### Next Steps:
- Compare with other Arabic NLP resources
- Build Arabic word sense disambiguation systems
- Use for Arabic-English machine translation
- Extend coverage by proposing new Arabic entries

### Resources:
- [wn Documentation](https://wn.readthedocs.io/)
- [Open Multilingual Wordnet](https://github.com/omwn/omw-data)
- [Arabic WordNet Project](http://www.globalwordnet.org/AWN/)
