## 

In [1]:
data = """

**The Tale of Ginger: A Cat Named Ryzhik**

In a small, picturesque village nestled between rolling hills and dense forests, there lived a cat named Ryzhik. His name, which means "Ginger" in Russian, perfectly suited his vibrant orange fur that shimmered like the setting sun. Ryzhik was no ordinary cat; he was a legend in the village, known for his adventurous spirit, cunning intelligence, and a heart full of curiosity. His story is one of courage, friendship, and the unbreakable bond between humans and animals.

Ryzhik’s journey began in the cozy home of an elderly woman named Babushka Anna. She had found him as a tiny, shivering kitten abandoned near the edge of the forest. Taking pity on the little creature, she brought him home, fed him warm milk, and wrapped him in a soft blanket. From that day on, Ryzhik became her loyal companion, filling her quiet home with purrs and playful antics.

As Ryzhik grew, so did his reputation. The villagers often spotted him prowling the streets, his bushy tail held high like a flag. He was a master hunter, keeping the village free of mice and rats. But Ryzhik’s talents extended beyond hunting. He had an uncanny ability to sense trouble. Once, he alerted the villagers to a fire in the bakery by meowing loudly and scratching at doors until everyone woke up. Thanks to Ryzhik, the fire was extinguished before it could spread, and the villagers hailed him as a hero.

Despite his bravery, Ryzhik was also known for his mischievous side. He loved exploring every nook and cranny of the village, often sneaking into homes to steal bites of food or nap in the warmest spots. The villagers didn’t mind; they adored Ryzhik and considered him a part of their community. Children would chase him playfully, and he would dart away, only to circle back and rub against their legs, purring loudly.

One day, Ryzhik’s adventurous spirit led him beyond the village and into the forest. The forest was a mysterious place, filled with towering trees, babbling brooks, and creatures of all kinds. Ryzhik had always been curious about the forest, and that day, he decided to explore it. He padded softly through the underbrush, his keen eyes and ears alert for any signs of danger.

As he ventured deeper, he encountered a family of foxes. At first, the foxes were wary of the strange orange cat, but Ryzhik’s calm demeanor and friendly purrs soon put them at ease. The fox cubs, in particular, were fascinated by Ryzhik and invited him to play. They spent hours chasing each other through the trees, leaping over logs, and splashing in the streams. Ryzhik felt a sense of freedom and joy he had never experienced before.

But the forest was not without its dangers. As the sun began to set, Ryzhik realized he had wandered too far and lost his way. The once-familiar sounds of the village were gone, replaced by the eerie calls of nocturnal creatures. Ryzhik’s heart raced as he tried to retrace his steps, but the dense foliage and fading light made it impossible. He was lost.

Just as fear began to creep in, Ryzhik heard a soft rustling nearby. He froze, his fur standing on end. From the shadows emerged a large, gray wolf. The wolf’s piercing eyes locked onto Ryzhik, and for a moment, time seemed to stand still. But instead of attacking, the wolf tilted its head and let out a low, rumbling growl. To Ryzhik’s surprise, the wolf turned and began to walk away, glancing back as if to say, “Follow me.”

Trusting his instincts, Ryzhik followed the wolf through the forest. The wolf led him to a clearing where a small, hidden path began. Ryzhik recognized the path; it was the one that led back to the village. With a grateful purr, he rubbed against the wolf’s leg before darting down the path. The wolf watched him go, its eyes gleaming with something akin to respect.

When Ryzhik finally returned to the village, the villagers were overjoyed. Babushka Anna had been worried sick, and she scooped him up in her arms, scolding him gently for his reckless adventure. But Ryzhik knew he had grown from the experience. He had faced his fears, made new friends, and discovered a courage he didn’t know he had.

From that day on, Ryzhik became a symbol of bravery and resilience in the village. His story spread far and wide, inspiring others to face their own challenges with courage and determination. But despite his newfound fame, Ryzhik remained the same playful, curious cat who loved nothing more than curling up in Babushka Anna’s lap at the end of the day.

As the years passed, Ryzhik’s fur began to gray, and his steps grew slower. But his spirit never wavered. He continued to explore, to protect, and to bring joy to everyone he met. And when his time finally came, the villagers gathered to honor him, sharing stories of his many adventures and the lessons he had taught them.

Ryzhik’s legacy lived on in the hearts of the villagers. A small statue was erected in the village square, depicting a proud orange cat with a bushy tail held high. Children would gather around the statue, listening to tales of the brave and mischievous Ryzhik, and dreaming of their own adventures.

In the end, Ryzhik was more than just a cat. He was a reminder that even the smallest creatures can make a big difference, that courage and kindness can light the way in the darkest of times, and that the bonds of friendship and love are the most powerful forces of all.

And so, the tale of Ryzhik, the ginger cat with a heart of gold, became a cherished part of the village’s history, a story passed down from generation to generation, inspiring all who heard it to live with curiosity, bravery, and an unshakable sense of wonder.
"""

In [4]:
import re
import string


def clean(inp: str) -> str:

    inp = inp.translate(
        str.maketrans(string.punctuation, " " * len(string.punctuation))
    )

    inp = re.sub(r"\s+", " ", inp.lower())

    return inp

In [15]:
import torch
import numpy as np


def train(data: str):
    emb_size = 300
    k = 15
    window_size = 10

    cleaned_data = clean(data)
    unique_words = unique_words = list(set(cleaned_data.split()))
    word_to_idx = {word: idx for idx, word in enumerate(unique_words)}
    vocab_size = len(unique_words)

    num_epochs = 30

    v_embeddings = torch.rand(vocab_size, emb_size, requires_grad=True)
    u_embeddings = torch.rand(vocab_size, emb_size, requires_grad=True)

    optimizer = torch.optim.SGD(
        params=[
            {"params": v_embeddings, "lr": 0.01},
            {"params": u_embeddings, "lr": 0.01},
        ]
    )

    data_numpy = np.array(cleaned_data.split())

    range_idxs = np.arange(len(data_numpy))

    for epoch in range(num_epochs):
        total_loss = 0
        for pos_v, word in enumerate(data_numpy):
            optimizer.zero_grad()

            idx_v = word_to_idx[word]
            mask_u = (range_idxs != pos_v) & (abs(range_idxs - pos_v) <= window_size)
            # pos_u_string = range_idxs[mask_u]

            words_u = data_numpy[mask_u]
            idxs_u = [word_to_idx[word_u] for word_u in words_u]

            words_non_context = np.random.choice(data_numpy[~mask_u], size=k)
            idxs_non_context = [
                word_to_idx[word_non_context] for word_non_context in words_non_context
            ]

            # print(idx_v, idxs_u)
            # print(idxs_non_context)
            context_loss = torch.sum(
                torch.log(
                    torch.sigmoid(
                        torch.matmul(v_embeddings[idx_v], u_embeddings[idxs_u].T)
                    )
                )
            )

            non_context_loss = torch.sum(
                torch.sum(
                    torch.sigmoid(
                        -torch.matmul(
                            v_embeddings[idx_v], u_embeddings[idxs_non_context].T
                        )
                    )
                )
            )

            loss = -(context_loss + non_context_loss)

            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"Epoch {epoch} with loss: {total_loss / len(data_numpy)}")
    return {
        word_: embedding.detach().numpy()
        for word_, embedding in zip(unique_words, v_embeddings)
    }

In [16]:
%time
embeddings = train(data)

CPU times: user 3 µs, sys: 1 µs, total: 4 µs
Wall time: 9.3 µs
Epoch 0 with loss: -8.913785244012248e-30
Epoch 1 with loss: -1.4018322845955617e-29
Epoch 2 with loss: -8.763776993555527e-30
Epoch 3 with loss: -7.596333426256305e-30
Epoch 4 with loss: -1.4849904831462366e-29
Epoch 5 with loss: -1.1866547105365459e-29
Epoch 6 with loss: -1.492823685406346e-29
Epoch 7 with loss: -1.880157793512702e-29
Epoch 8 with loss: -8.814712326858428e-30
Epoch 9 with loss: -1.255941216287125e-29
Epoch 10 with loss: -1.4567843200069113e-29
Epoch 11 with loss: -1.4837559219422002e-29
Epoch 12 with loss: -9.397916184543435e-30
Epoch 13 with loss: -8.346422632270283e-30
Epoch 14 with loss: -7.303116234767349e-30
Epoch 15 with loss: -1.0034151350479387e-29
Epoch 16 with loss: -1.1552327777259286e-29
Epoch 17 with loss: -9.85283580511365e-30
Epoch 18 with loss: -6.994018830501171e-30
Epoch 19 with loss: -1.3733359018028352e-29
Epoch 20 with loss: -1.611275339256894e-29
Epoch 21 with loss: -9.18676522811187

In [11]:
embeddings["ryzhik"].detach().numpy()

array([0.10240865, 0.97869915, 0.34721798, 0.74651515, 0.704993  ,
       0.8725097 , 0.8108366 , 0.66991377, 0.7093448 , 0.7215508 ,
       0.8921043 , 0.14622784, 0.96129066, 0.89800245, 0.02479452,
       0.12193871, 0.95624304, 0.47946227, 0.36446851, 0.6047833 ,
       0.9780253 , 0.6811065 , 0.96849895, 0.12881476, 0.3002454 ,
       0.04772341, 0.01693201, 0.8129706 , 0.7809258 , 0.19448787,
       0.6905765 , 0.8298839 , 0.7092811 , 0.9260868 , 0.70973736,
       0.7754277 , 0.247451  , 0.8806229 , 0.34983194, 0.7628068 ,
       0.532485  , 0.8332035 , 0.74179476, 0.36350936, 0.4728797 ,
       0.18284154, 0.16683489, 0.9614223 , 0.15800041, 0.3752433 ,
       0.06675726, 0.88700134, 0.39440954, 0.49329   , 0.9722176 ,
       0.7847164 , 0.05764443, 0.6308585 , 0.4225769 , 0.04073751,
       0.30903137, 0.70290244, 0.9939393 , 0.80318135, 0.02847439,
       0.05202276, 0.8870243 , 0.40395373, 0.14151806, 0.73091817,
       0.5824844 , 0.8915332 , 0.8689611 , 0.4551974 , 0.93058