In [1]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, LlamaForCausalLM

# Check if a GPU is available and set the device
device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")

# Specify the model ID
model_id = "meta-llama/Meta-Llama-3-8B"

# Load the tokenizer from the Hugging Face library
tokenizer = AutoTokenizer.from_pretrained(model_id)

model: LlamaForCausalLM = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32, device_map='auto')


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.




In [16]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from collections import defaultdict
from tqdm.notebook import tqdm

result_1 = []
result_2 = []
max_tokens = 10

with torch.no_grad():
    test_str_1 = tokenizer('Q: Translate "Mary had a little lamb" into Spanish. A: ', return_tensors='pt').input_ids[0]
    test_str_2 = tokenizer('Q: Translate "Mary tenía un corderito." into English. A: ', return_tensors='pt').input_ids[0]
    # Left-pad the shorter prompt with BOS tokens so both streams have the same length
    seq_len = max(test_str_1.shape[0], test_str_2.shape[0])
    test_str_1 = torch.cat([torch.tensor([tokenizer.bos_token_id] * (seq_len - test_str_1.shape[0]), dtype=torch.long), test_str_1])
    test_str_2 = torch.cat([torch.tensor([tokenizer.bos_token_id] * (seq_len - test_str_2.shape[0]), dtype=torch.long), test_str_2])

    for tok_i in tqdm(range(max_tokens)):
        # Extend each stream with the token selected for it in the previous step
        if len(result_1) > 0: test_str_1 = torch.cat([test_str_1, torch.tensor([result_1[-1]], dtype=torch.long)])
        if len(result_2) > 0: test_str_2 = torch.cat([test_str_2, torch.tensor([result_2[-1]], dtype=torch.long)])

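        # Interleave the two sequences token by token (stream 2 first, then stream 1)
        input_ids = torch.stack([test_str_2, test_str_1], dim=1).reshape(-1).unsqueeze(0)
        # Both tokens of each interleaved pair share the same position ID
        position_ids = torch.repeat_interleave(torch.arange(seq_len + tok_i), 2).unsqueeze(0)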

        # Run the model with repeated position IDs
        logits = model.forward(input_ids, position_ids=position_ids.to(device)).logits

        # Top-5 next-token probabilities at every interleaved position
        val, idx = logits[0].detach().float().softmax(-1).topk(5)
        print('-' * 40)
        print('top 5 predictions:')
        for j in range(5):
            print(('{:>8}' * idx.shape[0]).format(*[repr(tokenizer.decode([i]))[1:-1] for i in idx[:, j].numpy()]))
            print(('{:8.2f}' * val.shape[0]).format(*[i for i in val[:, j].numpy()]))

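        # Both continuations are read from the final interleaved position:
        # its top-1 token extends stream 1 and its top-2 token extends stream 2
        result_1 += [idx[-1, 0]]
        result_2 += [idx[-1, 1]]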


----------------------------------------
top 5 predictions:
QuestionQuestion       :     ://    What       :     the    What       I     the     had       I      un       "     per       a     uch  little       o    lamb    lamb       "    into    into Spanish Spanish     .\n     .\n       A       A       :       :    Mary    Mary       1       1
    0.31    0.31    0.16    0.43    0.05    0.21    0.23    0.12    0.06    0.22    0.08    0.05    0.12    0.29    0.04    0.32    0.09    0.65    0.86    0.81    0.20    0.42    0.27    0.34    0.28    0.31    0.47    0.40    0.26    0.27    0.77    0.72    0.29    0.30    0.72    0.76
     def     def      &A    _REF     The       .    this       I     The    this      is     The     una     had     her      to    ator   black     ito    Lamb       "     "\n      to      to English English       .       .    What       (       .       .       "       "    \xa0    \xa0
    0.10    0.10    0.14    0.08    0.04    0.20    0.06    0.11    0.04 
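
The cell above lays the two prompts out as a single sequence in which consecutive tokens alternate between the streams and each pair shares one position ID. A minimal plain-Python sketch of that layout, using lists instead of tensors. `interleave_streams` is a hypothetical helper name introduced here for illustration; it is not part of the notebook or of `transformers`.

```python
def interleave_streams(first, second):
    """Interleave two equal-length token lists and build shared position IDs.

    Mirrors torch.stack([first, second], dim=1).reshape(-1) for the tokens and
    torch.repeat_interleave(torch.arange(n), 2) for the positions.
    """
    if len(first) != len(second):
        raise ValueError("pad the streams to the same length before interleaving")
    # Tokens alternate: first[0], second[0], first[1], second[1], ...
    input_ids = [tok for pair in zip(first, second) for tok in pair]
    # Each pair of tokens gets the same position ID: 0, 0, 1, 1, ...
    position_ids = [pos for pos in range(len(first)) for _ in range(2)]
    return input_ids, position_ids


ids, pos = interleave_streams([20, 21, 22], [10, 11, 12])
print(ids)  # [20, 10, 21, 11, 22, 12]
print(pos)  # [0, 0, 1, 1, 2, 2]
```

With this ordering the last element of the interleaved sequence always belongs to the second argument, which in the cell above is `test_str_1`.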

In [4]:
import numpy.random as npr
import numpy as np

p1 = ('The vast and intricate tapestry of human history is a testament to the enduring spirit of exploration, innovation, and resilience that characterizes our species. From the earliest days of hunter-gatherer societies, where survival hinged on an intimate understanding of nature and the ability to adapt to ever-changing environments, to the dawn of agriculture, which marked a seismic shift in human civilization by allowing for the establishment of permanent settlements and the development of complex social structures, our journey has been one of continuous transformation. The advent of written language enabled the preservation and dissemination of knowledge across generations, fostering the growth of cultures and the proliferation of ideas. This intellectual ferment gave rise to the great civilizations of antiquity, such as those of Mesopotamia, Egypt, the Indus Valley, and China, each contributing monumental advancements in fields as diverse as mathematics, astronomy, architecture, and governance. As trade networks expanded, these civilizations began to interact and influence one another, laying the groundwork for a more interconnected world. The classical period saw the rise and fall of empires like Greece and Rome, whose philosophical, political, and artistic legacies continue to shape contemporary thought. The Middle Ages, often characterized as a time of stagnation, were in fact a period of significant technological and cultural progress, particularly in the Islamic world, where scholars made remarkable strides in science, medicine, and philosophy. The Renaissance, fueled by the rediscovery of classical knowledge and the advent of the printing press, ignited a period of unprecedented creativity and intellectual inquiry in Europe, setting the stage for the Scientific Revolution and the Enlightenment. '
      'These movements fundamentally altered our understanding of the universe and our place within it, challenging long-held beliefs and sparking revolutions in industry, politics, and society. The modern era, marked by rapid advancements in technology and an ever-accelerating pace of change, has brought both tremendous opportunities and profound challenges. As we navigate the complexities of globalization, climate change, and the digital revolution, the lessons of our shared past offer invaluable insights. They remind us of our capacity for ingenuity and adaptation, as well as the importance of empathy, cooperation, and a commitment to the common good in shaping a future that is both equitable and sustainable for all.')
# p2 = 'En el corazón de la bulliciosa ciudad, una pequeña librería se erigía como un santuario para aquellos que buscaban consuelo entre las páginas de innumerables historias. Situada entre una cafetería y una panadería, ofrecía un refugio tranquilo del ruido y la prisa del exterior. El aroma del papel envejecido y el café recién hecho creaba una atmósfera acogedora, atrayendo a lectores de todas las edades. Cada rincón de la tienda estaba lleno de estantes que llegaban hasta el techo, conteniendo libros de todos los géneros imaginables. Era un lugar donde el tiempo parecía ralentizarse, permitiendo a los visitantes perderse en los mundos creados por autores de todo el mundo.'
p2 = '人类历史那庞大而复杂的挂毯是对我们物种探索、创新和韧性精神的持久见证。从最早的狩猎采集社会，那时的生存依赖于对自然的深刻理解和适应不断变化的环境能力，到农业的兴起，这标志着人类文明的巨大转变，因为它允许建立永久定居点和发展复杂的社会结构，我们的旅程一直是连续的转变。文字的出现使知识能够跨越世代保存和传播，促进了文化的成长和思想的繁荣。这种智力发酵催生了古代伟大的文明，如美索不达米亚、埃及、印度河流域和中国，每个文明在数学、天文学、建筑和治理等多种领域都做出了巨大的贡献。随着贸易网络的扩展，这些文明开始相互互动和影响，为一个更加相互联系的世界奠定了基础。古典时期见证了希腊和罗马等帝国的兴衰，它们的哲学、政治和艺术遗产至今仍在塑造当代思想。中世纪，尽管常被描述为停滞的时代，实际上是一个技术和文化显著进步的时期，特别是在伊斯兰世界，学者们在科学、医学和哲学领域取得了显著的成就。文艺复兴时期，由于重新发现古典知识和印刷术的发明，点燃了欧洲前所未有的创造力和智力探究时期，为科学革命和启蒙运动铺平了道路。这些运动从根本上改变了我们对宇宙及其自身位置的理解，挑战了长期持有的信仰，并引发了工业、政治和社会的革命。现代时期，以技术的快速进步和不断加快的变化速度为标志，带来了巨大的机遇和深刻的挑战。在我们应对全球化、气候变化和数字革命的复杂性时，我们共享的过去的教训提供了宝贵的见解。它们提醒我们具备创新和适应能力的重要性，以及塑造一个公平和可持续未来时同情心、合作和共同利益承诺的重要性。'

# p1 = 'In the heart of the bustling city, a small bookstore '
# p2 = 'En el corazón de la bulliciosa ciudad'

p1 = p1.split()
# p2 = p2.split()
p2 = list(p2)

# p1_loc = npr.choice(np.arange(len(p1) + len(p2)), len(p1), replace=False)
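# Place p1's words at even slots; the remaining slots take p2's characters.
# This assumes len(p2) >= len(p1) - 1, otherwise p2 runs out in the loop below.
p1_loc = np.arange(0, min(len(p1) + len(p2), 2 * len(p1)), 2)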

mix = np.zeros(len(p1) + len(p2), dtype=int)
mix[p1_loc] = 1

mix_p = ''
p1_i = 0
p2_i = 0
for j, pi in enumerate(mix):
    if pi == 1:
        mix_p += p1[p1_i] + ' '
        p1_i += 1
    else:
        mix_p += p2[p2_i] + ' '
        p2_i += 1
    # if j % 10 == 0 and j > 0:
    #     mix_p += '\n'

print(mix_p)


The 人 vast 类 and 历 intricate 史 tapestry 那 of 庞 human 大 history 而 is 复 a 杂 testament 的 to 挂 the 毯 enduring 是 spirit 对 of 我 exploration, 们 innovation, 物 and 种 resilience 探 that 索 characterizes 、 our 创 species. 新 From 和 the 韧 earliest 性 days 精 of 神 hunter-gatherer 的 societies, 持 where 久 survival 见 hinged 证 on 。 an 从 intimate 最 understanding 早 of 的 nature 狩 and 猎 the 采 ability 集 to 社 adapt 会 to ， ever-changing 那 environments, 时 to 的 the 生 dawn 存 of 依 agriculture, 赖 which 于 marked 对 a 自 seismic 然 shift 的 in 深 human 刻 civilization 理 by 解 allowing 和 for 适 the 应 establishment 不 of 断 permanent 变 settlements 化 and 的 the 环 development 境 of 能 complex 力 social ， structures, 到 our 农 journey 业 has 的 been 兴 one 起 of ， continuous 这 transformation. 标 The 志 advent 着 of 人 written 类 language 文 enabled 明 the 的 preservation 巨 and 大 dissemination 转 of 变 knowledge ， across 因 generations, 为 fostering 它 the 允 growth 许 of 建 cultures 立 and 永 the 久 proliferation 定 of 居 ideas. 点 This 和 intellectual 发 ferment 展 gave 
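
The mask-based loop in the cell above alternates whitespace-split words from `p1` with single characters from `p2`, then appends whatever remains of `p2`. A roughly equivalent sketch using `itertools.zip_longest`, which also tolerates either input running out first. `interleave_units` is a hypothetical name used only for this illustration, and unlike the loop above it does not leave a trailing space.

```python
from itertools import zip_longest


def interleave_units(words, chars):
    """Alternate items from `words` and `chars`, keeping the leftover tail."""
    mixed = []
    for word, char in zip_longest(words, chars):
        # zip_longest pads the shorter input with None; skip those slots
        if word is not None:
            mixed.append(word)
        if char is not None:
            mixed.append(char)
    return ' '.join(mixed)


print(interleave_units('The vast'.split(), list('人类历')))
# The 人 vast 类 历
```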