<a href="https://colab.research.google.com/github/Oksana0020/Hokku/blob/main/Hokku.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**A hokku** is a very short poem written in three lines following a 5–7–5 syllable pattern. In my project, a hokku is generated text built by choosing words from a vocabulary and arranging them into three lines that follow the 5–7–5 syllable pattern.

**A chromosome** is one candidate hokku solution. It is encoded as an array of integers, where each integer is an index into the vocabulary. The chromosome length is fixed and represents the word slots for all three lines of the poem. When decoded, the chromosome maps to the words that form the hokku text.

**The cost function** is a scoring rule that measures how bad a candidate hokku is. Lower cost means a better poem, with the ideal goal being to approach zero cost. The cost function is built incrementally and enforces the following rules:

*Syllable mismatch penalty*
Penalize deviation from the 5–7–5 syllable pattern.

*Marker rule*
Exactly one marker must be present in the poem.
Later stages prefer the marker at the end of line 1 or line 2.
This enables a later *ending rule* (do not end the poem with a marker or connector).

*Repetition penalty*
Penalize repeated words within the same poem.

*Season-word rule*
Require at least one season word per poem (added at a later stage).

*Grammar-inspired constraints*
Examples:
1) Preferred word order, such as adj noun verb or adj noun verb adv
2) At least two adverbs within one poem
3) Avoid ending the poem with a short connector or preposition
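
The first three rules can be sketched as separate penalty terms that are added into one cost. This is a minimal illustration, not the notebook's final cost function: the toy tables `TOY_SYLLABLE` and `TOY_MARKERS`, the function names, and the weights are all assumptions made for the example, and absolute deviation is only one possible way to measure a mismatch.

```python
from collections import Counter

# Toy data for illustration only (hypothetical, not the notebook's tables).
TOY_SYLLABLE = {"old": 1, "pond": 1, "frog": 1, "jumps": 1, "silent": 2,
                "river": 2, "slowly": 2, "softly": 2, "!": 0, ".": 0}
TOY_MARKERS = {"!", "."}
TARGET = (5, 7, 5)

def syllable_penalty(lines):
    """Sum of absolute deviations from the 5-7-5 target, one term per line."""
    return sum(abs(sum(TOY_SYLLABLE[w] for w in line) - t)
               for line, t in zip(lines, TARGET))

def marker_penalty(lines):
    """Zero when exactly one marker is present, otherwise the distance from one."""
    n_markers = sum(w in TOY_MARKERS for line in lines for w in line)
    return abs(n_markers - 1)

def repetition_penalty(lines):
    """One point for every extra occurrence of a non-marker word."""
    counts = Counter(w for line in lines for w in line if w not in TOY_MARKERS)
    return sum(c - 1 for c in counts.values())

def cost(lines, w_syll=1.0, w_marker=1.0, w_repeat=1.0):
    """Weighted sum of the three penalties; lower is better, zero is ideal."""
    return (w_syll * syllable_penalty(lines)
            + w_marker * marker_penalty(lines)
            + w_repeat * repetition_penalty(lines))

example = [
    ["old", "silent", "pond", "!"],          # 1 + 2 + 1 = 4 syllables
    ["frog", "jumps", "slowly", "softly"],   # 1 + 1 + 2 + 2 = 6 syllables
    ["river", "river", "frog", "."],         # 2 + 2 + 1 = 5 syllables
]
print(syllable_penalty(example), marker_penalty(example), repetition_penalty(example))
print(cost(example))
```

Keeping each rule in its own function makes it easy to build the cost incrementally: a new rule becomes one more term, and its weight can be tuned without touching the others.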

Mutation introduces random variation into an individual chromosome so the Genetic Algorithm can explore new possibilities. During mutation, a word may be replaced by another word from the same category (for example, one noun is replaced with a different noun). This allows new word combinations to appear.
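
A same-category mutation could look roughly like the sketch below. The toy vocabulary, the `mutate` name, and the per-gene mutation `rate` are assumptions for illustration; the notebook's own operator may differ.

```python
import numpy as np

# Toy vocabulary for illustration; list positions play the role of gene values.
TOY_VOCAB = ["pond", "frog", "moon", "jumps", "falls", "old", "cold"]
TOY_CATEGORY = ["noun", "noun", "noun", "verb", "verb", "adj", "adj"]

def mutate(chromosome, rate, rng):
    """Return a copy where each gene is, with probability `rate`,
    replaced by a different index from the same category."""
    child = chromosome.copy()
    for pos, gene in enumerate(chromosome):
        if rng.random() < rate:
            # Candidate replacements: same category, but not the same word.
            same = [i for i, c in enumerate(TOY_CATEGORY)
                    if c == TOY_CATEGORY[gene] and i != gene]
            if same:
                child[pos] = rng.choice(same)
    return child

rng = np.random.default_rng(0)
parent = np.array([5, 0, 3, 6, 1, 4])   # old pond jumps cold frog falls
child = mutate(parent, rate=0.5, rng=rng)
print([TOY_VOCAB[i] for i in parent])
print([TOY_VOCAB[i] for i in child])
```

Because replacements stay within one category, a mutated poem keeps its sequence of categories (adj noun verb ...) while the actual words change.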

Crossover combines two parents (parent1 and parent2) to produce children by mixing their genes, that is, their word choices. Each child inherits some word positions from one parent and the rest from the other. In this way crossover reuses good partial solutions.
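
One common way to do this is one-point crossover, sketched below. The text above does not say which crossover variant is used, so the choice of a single cut point and the function name `one_point_crossover` are assumptions.

```python
import numpy as np

def one_point_crossover(parent1, parent2, rng):
    """Cut both parents at the same random position and swap the tails."""
    cut = rng.integers(1, len(parent1))  # cut lies in 1 .. len - 1
    child1 = np.concatenate([parent1[:cut], parent2[cut:]])
    child2 = np.concatenate([parent2[:cut], parent1[cut:]])
    return child1, child2

rng = np.random.default_rng(0)
parent1 = np.arange(0, 12)     # 12 genes, as with 3 lines of 4 slots
parent2 = np.arange(100, 112)  # distinct values so the origin of each gene is visible
child1, child2 = one_point_crossover(parent1, parent2, rng)
print(child1)
print(child2)
```

Every gene keeps its position, so a well-formed first line in one parent can survive intact in a child while the remaining lines come from the other parent.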

**Example of Hokku**


Though my shanks are thin

I go where flowers blossom,

Yoshino Mountain.


*Basho*

In [3]:
import numpy as np
from copy import deepcopy

In [9]:
WORD_DATA = [
    # markers
    {"word": "—",    "syllable": 0, "category": "marker"},
    {"word": "...",  "syllable": 0, "category": "marker"},
    {"word": "!",    "syllable": 0, "category": "marker"},
    {"word": "?",    "syllable": 0, "category": "marker"},
    {"word": ",",    "syllable": 0, "category": "marker"},
    {"word": ".",    "syllable": 0, "category": "marker"},
    {"word": ";",    "syllable": 0, "category": "marker"},
    {"word": ":",    "syllable": 0, "category": "marker"},
    {"word": "—!",   "syllable": 0, "category": "marker"},
    {"word": "—...", "syllable": 0, "category": "marker"},

    # nouns
    {"word": "pond",     "syllable": 1, "category": "noun"},
    {"word": "frog",     "syllable": 1, "category": "noun"},
    {"word": "river",    "syllable": 2, "category": "noun"},
    {"word": "stone",    "syllable": 1, "category": "noun"},
    {"word": "leaf",     "syllable": 1, "category": "noun"},
    {"word": "shadow",   "syllable": 2, "category": "noun"},
    {"word": "silence",  "syllable": 2, "category": "noun"},
    {"word": "moon",     "syllable": 1, "category": "noun"},
    {"word": "wind",     "syllable": 1, "category": "noun"},
    {"word": "lantern",  "syllable": 2, "category": "noun"},

    # verbs
    {"word": "jumps",     "syllable": 1, "category": "verb"},
    {"word": "drifts",    "syllable": 1, "category": "verb"},
    {"word": "falls",     "syllable": 1, "category": "verb"},
    {"word": "whispers",  "syllable": 2, "category": "verb"},
    {"word": "shines",    "syllable": 1, "category": "verb"},
    {"word": "turns",     "syllable": 1, "category": "verb"},
    {"word": "melts",     "syllable": 1, "category": "verb"},
    {"word": "settles",   "syllable": 2, "category": "verb"},
    {"word": "wanders",   "syllable": 2, "category": "verb"},
    {"word": "sleeps",    "syllable": 1, "category": "verb"},

    # adjectives
    {"word": "old",      "syllable": 1, "category": "adj"},
    {"word": "silent",   "syllable": 2, "category": "adj"},
    {"word": "soft",     "syllable": 1, "category": "adj"},
    {"word": "bright",   "syllable": 1, "category": "adj"},
    {"word": "cold",     "syllable": 1, "category": "adj"},
    {"word": "dark",     "syllable": 1, "category": "adj"},
    {"word": "misty",    "syllable": 2, "category": "adj"},
    {"word": "gentle",   "syllable": 2, "category": "adj"},
    {"word": "lonely",   "syllable": 2, "category": "adj"},
    {"word": "empty",    "syllable": 2, "category": "adj"},

    # connectors and prepositions (category "other")
    {"word": "in",       "syllable": 1, "category": "other"},
    {"word": "of",       "syllable": 1, "category": "other"},
    {"word": "to",       "syllable": 1, "category": "other"},
    {"word": "from",     "syllable": 1, "category": "other"},
    {"word": "with",     "syllable": 1, "category": "other"},
    {"word": "without",  "syllable": 2, "category": "other"},
    {"word": "through",  "syllable": 1, "category": "other"},
    {"word": "under",    "syllable": 2, "category": "other"},
    {"word": "near",     "syllable": 1, "category": "other"},
    {"word": "while",    "syllable": 1, "category": "other"},

    # adverbs
    {"word": "slowly",    "syllable": 2, "category": "adv"},
    {"word": "softly",    "syllable": 2, "category": "adv"},
    {"word": "silently",  "syllable": 3, "category": "adv"},
    {"word": "brightly",  "syllable": 2, "category": "adv"},
    {"word": "deeply",    "syllable": 2, "category": "adv"},
    {"word": "lightly",   "syllable": 2, "category": "adv"},
    {"word": "suddenly",  "syllable": 3, "category": "adv"},
    {"word": "quietly",   "syllable": 3, "category": "adv"},
    {"word": "still",     "syllable": 1, "category": "adv"},
    {"word": "again",     "syllable": 2, "category": "adv"},
]

VOCABULARY = [w["word"] for w in WORD_DATA]
SYLLABLE   = {w["word"]: w["syllable"] for w in WORD_DATA}
CATEGORY   = {w["word"]: w["category"] for w in WORD_DATA}
MARKER_SET = {w for w in VOCABULARY if CATEGORY[w] == "marker"}

In [21]:
VOCABULARY[50], SYLLABLE["frog"], CATEGORY["frog"]

('slowly', 1, 'noun')

In [10]:
# Sanity checks for the vocabulary
from collections import Counter

category_counts = Counter(CATEGORY[w] for w in VOCABULARY)
print("Category counts:", category_counts)
print("Total vocabulary size:", len(VOCABULARY))
print("Markers are the following:", sorted(MARKER_SET))
print("Syllables in 'silently':", SYLLABLE["silently"])
print("Syllables in 'misty':", SYLLABLE["misty"])



Category counts: Counter({'marker': 10, 'noun': 10, 'verb': 10, 'adj': 10, 'other': 10, 'adv': 10})
Total vocabulary size: 60
Markers are the following: ['!', ',', '.', '...', ':', ';', '?', '—', '—!', '—...']
Syllables in 'silently': 3
Syllables in 'misty': 2


Adding the chromosome and some helper functions

slots_per_line does not count syllables; it is the number of words the Genetic Algorithm is allowed to choose per line.

The chromosome length is:
number_of_genes = 3 * slots_per_line
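
As a small illustration of this layout, a random chromosome can be drawn and split into its three lines of slots. The vocabulary size of 60 comes from the check above; the seed and the variable names `random_chromosome` and `line_slices` are assumptions for the sketch.

```python
import numpy as np

slots_per_line = 4
number_of_genes = 3 * slots_per_line
vocabulary_size = 60  # total vocabulary size reported by the check above

rng = np.random.default_rng(42)
# Each gene is a vocabulary index in 0 .. vocabulary_size - 1.
random_chromosome = rng.integers(0, vocabulary_size, size=number_of_genes)

# Genes 0-3 belong to line 1, genes 4-7 to line 2, genes 8-11 to line 3.
line_slices = [random_chromosome[i * slots_per_line:(i + 1) * slots_per_line]
               for i in range(3)]
print(random_chromosome)
print(line_slices)
```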

In [11]:
slots_per_line = 4
number_of_genes = 3 * slots_per_line

# Convert a numeric chromosome into three lines of words.
def chromosome_to_lines(chromosome, slots_per_line: int):
    # Each gene is an index into VOCABULARY.
    words = [VOCABULARY[int(i)] for i in chromosome]
    return [
        words[0:slots_per_line],
        words[slots_per_line:2 * slots_per_line],
        words[2 * slots_per_line:3 * slots_per_line],
    ]

# Calculate the total syllables of one line (markers count as 0).
def line_syllables(words):
    return sum(SYLLABLE[w] for w in words)

# Join each line into text, attaching markers to the preceding word.
def format_hokku(lines):
    formatted_lines = []
    for line in lines:
        out = []
        for w in line:
            if w in MARKER_SET and out:
                # Glue the marker onto the previous word: "slowly" + "..." -> "slowly..."
                out[-1] = out[-1] + w
            else:
                # Ordinary word, or a marker that opens the line.
                out.append(w)
        formatted_lines.append(" ".join(out))
    return "\n".join(formatted_lines)

Checking chromosome decoding using raw integer indices.

In [15]:
chromosome = np.array([
    50, 1, 11, 0,     # line 1
    12, 20, 32, 5,   # line 2
    18, 14, 49, 2,   # line 3
], dtype=int)

lines = chromosome_to_lines(chromosome, slots_per_line)

print("Decoded lines:", lines)
print()
print("Formatted hokku:\n")
print(format_hokku(lines))
print()
print("Syllables per line:", [line_syllables(line) for line in lines])

Decoded lines: [['slowly', '...', 'frog', '—'], ['river', 'jumps', 'soft', '.'], ['wind', 'leaf', 'while', '!']]

Formatted hokku:

slowly... frog—
river jumps soft.
wind leaf while!

Syllables per line: [3, 4, 3]
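
To see how far this test chromosome is from the target, the printed syllable counts can be compared with 5–7–5. The sketch below assumes absolute deviation as the measure of mismatch; the names `TARGET`, `deviations`, and `total_deviation` are illustrative.

```python
TARGET = (5, 7, 5)
observed = [3, 4, 3]  # the "Syllables per line" values printed above

# Per-line distance from the target pattern, then the total.
deviations = [abs(o - t) for o, t in zip(observed, TARGET)]
total_deviation = sum(deviations)

print("Deviation per line:", deviations)
print("Total syllable deviation:", total_deviation)
```

The decoded poem also contains four markers, so it would break the rule of exactly one marker as well.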
