In [None]:
import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

In [2]:
MODEL_NAME = "baichuan-inc/Baichuan2-7B-Chat"

print("set up tokenizer")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=False, trust_remote_code=True)
print("set up AutoModelForCausalLM")
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
print("GenerationConfig")
model.generation_config = GenerationConfig.from_pretrained(MODEL_NAME, max_new_tokens=4000, temperature=1)

set up tokenizer
set up AutoModelForCausalLM
GenerationConfig


In [3]:
messages = []
messages.append({"role": "user", "content": "自我介绍"})  # "Introduce yourself"
response = model.chat(tokenizer, messages)
print(response)

我叫百川大模型，是由百川智能的工程师们创造的大语言模型，我可以和人类进行自然交流、解答问题、协助创作，帮助大众轻松、普惠的获得世界知识和专业服务。如果你有任何问题，可以随时向我提问


In [4]:
messages = []
messages.append({"role": "system", "content": "You are a translator. You translate Chinese text to English."})
messages.append({"role": "user", "content": "我叫百川大模型，是由百川智能的工程师们创造的大语言模型，我可以和人类进行自然交流、解答问题、协助创作，帮助大众轻松、普惠的获得世界知识和专业服务。如果你有任何问题，可以随时向我提问"})
response = model.chat(tokenizer, messages)
print(response)

Hello, I am Baichuan AI Model, created by the engineers of Baichuan AI. I can engage in natural communication, answer questions, and assist creation with humans. I am ready to help you with any questions you have.


In [5]:
messages = []
messages.append({"role": "user", "content": "Please introduce yourself and your capabilities"})
response = model.chat(tokenizer, messages)
print(response)

Hi, I am a language model developed by the Chinese AI company Unitytech.I can understand and generate natural language, which can help you answer questions, give advice, etc.My main capabilities include: 1. Understanding and answering questions: I can understand what you say, answer your questions, and provide relevant information.
2. Text generation: Based on the input information, I can generate text that is reasonable and logical.
3. Natural language interaction: I can interact with humans in natural language, understanding your requirements and needs, and providing corresponding services.
4. Cross-language support: I can support multiple languages, helping users around the world.
5. Continuous learning and improvement: Through continuous learning and feedback from users, I can improve my performance and better meet user needs.


In [14]:
def inference(text, instructions, system_prompt, model, tokenizer):
    # Build a two-message chat: the system prompt first, then the instructions followed by the text.
    messages = []
    messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": f"{instructions} {text}"})
    response = model.chat(tokenizer, messages)
    print(response)
    return response
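Because `inference()` only depends on the message list being well formed, that construction step can be pulled out and checked without loading the 7B model. The sketch below is a minimal illustration; `build_messages` is a hypothetical helper, not part of the notebook or of Baichuan2. It assumes (from Baichuan2's remote chat-template code, not from this notebook) that a system prompt is only honored as the first message. Unlike `inference()`, it omits the system message entirely when no system prompt is given.

```python
from typing import Dict, List, Optional


def build_messages(text: str, instructions: str,
                   system_prompt: Optional[str] = None) -> List[Dict[str, str]]:
    """Hypothetical helper: build the chat message list the way inference() does."""
    messages: List[Dict[str, str]] = []
    if system_prompt:
        # Assumed: Baichuan2 only reads a system prompt from the first message.
        messages.append({"role": "system", "content": system_prompt})
    # Instructions and text are joined with a single space, as in inference().
    messages.append({"role": "user", "content": f"{instructions} {text}"})
    return messages


msgs = build_messages("你好", "Translate the following from Chinese to English:", "你是一名翻译。")
print(msgs[0]["role"], msgs[1]["content"])
```

With this helper, `inference()` could call `model.chat(tokenizer, build_messages(text, instructions, system_prompt))`, and the prompt format can be tested separately from the model.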

In [15]:
CN_TEXT = "“基石”是个平实的词，不够“炫”，却能够准确传达我们对构建中的中国科幻繁华巨厦的情感与信心，因此，我们用它来作为这套原创丛书的名字。"
# INSTRUCTIONS = "Summarize the following in Chinese"
INSTRUCTIONS = "用中文概括下面一段"
# You are a translator. Please translate each message from Chinese to English.
SYSTEM_PROMPT = "你是一名翻译。请将每条消息从中文翻译成英文。"
response = inference(CN_TEXT, INSTRUCTIONS, SYSTEM_PROMPT, model, tokenizer)


The word "cornerstone" is a plain one, not dazzling enough, but capable of accurately conveying our emotions and confidence in the futuristic metropolis under construction. Therefore, we use it as the name of this series of original collections.


In [16]:
CN_TEXT = "“基石”是个平实的词，不够“炫”，却能够准确传达我们对构建中的中国科幻繁华巨厦的情感与信心，因此，我们用它来作为这套原创丛书的名字。"
# INSTRUCTIONS = "Summarize the following in one sentence"
INSTRUCTIONS = "用一句话概括下面一段"
# Your job is to summarize text. Please summarize each message using Chinese.
SYSTEM_PROMPT = "你的工作是总结文本。请用中文总结每条消息。"
response = inference(CN_TEXT, INSTRUCTIONS, SYSTEM_PROMPT, model, tokenizer)

这段文字概括了“基石”这个词语的含义，强调了其平实和准确，用于表达对中国科幻繁荣大厦的建设和信心的态度。


In [17]:
CN_TEXT = "“基石”是个平实的词，不够“炫”，却能够准确传达我们对构建中的中国科幻繁华巨厦的情感与信心，因此，我们用它来作为这套原创丛书的名字。"
# INSTRUCTIONS = "Summarize the following in one sentence"
INSTRUCTIONS = "用一句话概括下面一段"
# Your job is to summarize text. Please summarize each message using English.
SYSTEM_PROMPT = "你的工作是总结文本。请用英文总结每条消息。"
response = inference(CN_TEXT, INSTRUCTIONS, SYSTEM_PROMPT, model, tokenizer)

The phrase "cornerstone" is plain and simple, not as exciting as other choices, but it accurately conveys our feelings and confidence in the emerging Chinese science fiction landscape. Therefore, we chose it as the title for this collection of original works.


In [35]:
# replace translate_paragraph function with generic inference function

import json

def translate_chapter(model, tokenizer, book_path, chapter_number):
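    cn_en_translation_instructions = "Translate the following from Chinese to English:"
    # "Briefly summarize the following paragraph in one sentence:"
    cn_summary_instructions = "用一句话简要概括以下段落："

    # system prompts
    # "You are a translator. Please translate each message from Chinese to English."
    cn_en_translation_system_prompt = "你是一名翻译。请将每条消息从中文翻译成英文。"
    # "Summarize the following text in Chinese"
    cn_summary_system_prompt = "用中文概括下列文字"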

    # open the file (explicit UTF-8 so the Chinese text decodes the same on every platform)
    with open(f"../data/books/{book_path}/chapters/{chapter_number}.json", "r", encoding="utf-8") as f:
        chapter = json.load(f)
        translated_paragraphs = []
        cn_summaries = []
        en_summaries = []

        for paragraph in chapter["paragraphs"]:

            print()
            print("original text")
            print()
            print(paragraph)

            # translate cn to en
            print()
            print("translate from cn to en")
            print()
            translated_paragraph = inference(
                paragraph,
                cn_en_translation_instructions,
                cn_en_translation_system_prompt,
                model,
                tokenizer
            )
            # inference() already prints the response, so it is not printed again here
            translated_paragraphs.append(translated_paragraph)

            # cn_summary
            print()
            print("summarize using Chinese")
            print()
            cn_summary = inference(
                paragraph,
                cn_summary_instructions,
                cn_summary_system_prompt,
                model,
                tokenizer
            )
            # inference() already prints the response, so it is not printed again here
            cn_summaries.append(cn_summary)

            # en_summary
            print()
            print("Translate Chinese summary to English")
            print()
            en_summary = inference(
                # use CN summary instead of original paragraph
                cn_summary,
                cn_en_translation_instructions,
                cn_en_translation_system_prompt,
                model,
                tokenizer
            )
            # inference() already prints the response, so it is not printed again here
            en_summaries.append(en_summary)

        chapter["en_translation_baichuan2_7b"] = translated_paragraphs
        chapter["cn_summaries_baichuan2_7b"] = cn_summaries
        chapter["en_summaries_baichuan2_7b"] = en_summaries

    with open(f"../data/books/{book_path}/chapters/{chapter_number}.json", "w", encoding="utf-8") as f:
        json.dump(chapter, f, ensure_ascii=False)

    print(f"translated {len(translated_paragraphs)} paragraphs.")
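The three-step flow in `translate_chapter` (translate the paragraph, summarize it in Chinese, then translate that summary) can be exercised without a GPU by passing the inference step in as a function. This is a minimal sketch under stated assumptions: `process_chapter`, `InferFn`, and `fake_infer` are hypothetical names, the chapter JSON is assumed to have a `"paragraphs"` list as in the notebook, and the temp-file-plus-`os.replace` write is an addition not in the original. It guards against a crash mid-write corrupting the chapter file, which `translate_chapter` overwrites in place.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, List

# Hypothetical signature: infer(text, instructions, system_prompt) -> str.
# In the notebook this could be: lambda t, i, s: inference(t, i, s, model, tokenizer)
InferFn = Callable[[str, str, str], str]

TRANSLATE_INSTR = "Translate the following from Chinese to English:"
TRANSLATE_SYS = "你是一名翻译。请将每条消息从中文翻译成英文。"
SUMMARY_INSTR = "用一句话简要概括以下段落："
SUMMARY_SYS = "用中文概括下列文字"


def process_chapter(chapter_path: Path, infer: InferFn, suffix: str = "baichuan2_7b") -> int:
    """Translate and summarize every paragraph in one chapter JSON file."""
    with open(chapter_path, "r", encoding="utf-8") as f:
        chapter = json.load(f)

    translations: List[str] = []
    cn_summaries: List[str] = []
    en_summaries: List[str] = []
    for paragraph in chapter["paragraphs"]:
        translations.append(infer(paragraph, TRANSLATE_INSTR, TRANSLATE_SYS))
        cn_summary = infer(paragraph, SUMMARY_INSTR, SUMMARY_SYS)
        cn_summaries.append(cn_summary)
        # The English summary translates the Chinese summary, not the paragraph.
        en_summaries.append(infer(cn_summary, TRANSLATE_INSTR, TRANSLATE_SYS))

    chapter[f"en_translation_{suffix}"] = translations
    chapter[f"cn_summaries_{suffix}"] = cn_summaries
    chapter[f"en_summaries_{suffix}"] = en_summaries

    # Write to a temp file in the same directory, then atomically swap it in,
    # so an interrupted run never leaves a half-written chapter file.
    fd, tmp_path = tempfile.mkstemp(dir=chapter_path.parent, suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(chapter, f, ensure_ascii=False)
    os.replace(tmp_path, chapter_path)
    return len(translations)


def fake_infer(text: str, instructions: str, system_prompt: str) -> str:
    # Stand-in for the model: tags the text so each pipeline step is visible.
    tag = "EN" if instructions.startswith("Translate") else "SUM"
    return f"{tag}({text})"


with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "1.json"
    path.write_text(json.dumps({"paragraphs": ["基石", "三体"]}, ensure_ascii=False), encoding="utf-8")
    n = process_chapter(path, fake_infer)
    result = json.loads(path.read_text(encoding="utf-8"))

print(n, result["en_summaries_baichuan2_7b"])  # 2 ['EN(SUM(基石))', 'EN(SUM(三体))']
```

Injecting `infer` keeps file handling and prompt wiring testable on their own; swapping in the real model only changes the callable.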

In [36]:
translate_chapter(model, tokenizer, "three_body", 1)


original text

“基石”是个平实的词，不够“炫”，却能够准确传达我们对构建中的中国科幻繁华巨厦的情感与信心，因此，我们用它来作为这套原创丛书的名字。

translate from cn to en

"Stone Pillar" is a simple word that isn't very "glitz" but can accurately convey our emotions and confidence in the bustling mega-building under construction. Therefore, we use it as the name of this set of original series.

summarize using Chinese

"基石"系列丛书是中国科幻文学的原创丛书，旨在展示中国科幻的繁荣发展，并传达对构建中的中国科幻的信心和情感。

Translate Chinese summary to English

The "Stone Foundation" series is an original series of Chinese science fiction literature, aiming to showcase the prosperity and development of Chinese science fiction, and convey faith and emotion in the construction of Chinese science fiction.

original text

最近十年，是科幻创作飞速发展的十年。王晋康、刘慈欣、何宏伟、韩松等一大批科幻作家发表了大量深受读者喜爱、极具开拓与探索价值的科幻佳作。科幻文学的龙头期刊更是从一本传统的《科幻世界》，发展壮大成为涵盖各个读者层的系列刊物。与此同时，科幻文学的市场环境也有了改善，省会级城市的大型书店里终于有了属于科幻的领地。

translate from cn to en

In recent decade, it is a decade of rapid development of science fiction creation. A group of science fi

In [32]:
translate_chapter(model, tokenizer, "three_body", 2)


original text

《三体》终于能与科幻朋友们见面了，用连载的方式事先谁都没有想到，也是无奈之举。之前就题材问题与编辑们仔细商讨过，感觉没有什么问题，但没想到今年是文革三十周年这事儿，单行本一时出不了，也只能这样了。

translate from cn to en

"San Ti" is finally available for science fiction friends. It was published in serial form, which no one thought of before and was a desperate measure. We discussed the topic with editors carefully before, and felt that there were no problems, but I didn't expect this year is the 30th anniversary of the Cultural Revolution. The single volume cannot be released at once due to this issue, and it can only be done in this way.

summarize using Chinese

《三体》以连载形式与读者见面，因文革三十周年事件导致单行本出版推迟。

summarize using English

"三体"小说以连载形式与读者见面，因文化大革命三十周年纪念导致单行本出版延迟。

original text

其实这本书不是文革题材的，文革内容在其中只占不到十分之一，但却是一个漂荡在故事中挥之不去的精神幽灵。

translate from cn to en

In fact, this book is not about the Cultural Revolution, and the content of the Cultural Revolution accounts for less than one-tenth of it, but it is a psychological ghost that cannot be evaded in the story.

sum

In [16]:
translate_chapter(model, tokenizer, "three_body", 3)

China, 1967.
1967年的中国，经历着历史变革的挑战与机遇。
In 1967, China experienced a period of social and political turmoil.
在文革期间，红卫兵是新左派的一个组织，他们的目标是推翻旧的统治阶层和反对所谓的“资产阶级反动权威”。他们进行了大规模的抄家和暴力行为，导致了数千人死亡。第一代红卫兵是指那些最早参与红卫兵运动的人，而新一代的红卫兵则是指在红卫兵运动之后加入的人。

造反派是一个更广泛的概念，指的是在文革期间支持各种政治派别和组织的人群。他们的目标和策略各不相同，有些是激进的革命者，有些则是试图维护现有秩序的人。他们在文革期间起到了重要的作用，但也有很多暴力和混乱事件是他们参与的。
然而，造反派们的疯狂并不是没有道理的。他们在那个特殊的历史时期，面临着巨大的压力和挑战。他们的行为和行动，很大程度上是为了捍卫自己的利益和权益。在这个意义上，他们的疯狂也可以被视为一种抵抗和反抗。

在那个时代，造反派们对传统的权威和社会秩序进行了挑战。他们试图推翻旧的体制，建立一个新的社会。这种激进的行动和思想，在当时的社会环境中是难以理解的，但也正是因为这样，他们的行为和行为才更加令人瞩目和关注。

总的来说，造反派们的疯狂是一种历史的反映，是一种特定历史时期的特定现象。他们的行为和行动，虽然看似荒谬和不可理解，但在那个特定的历史背景下，却是可以理解和解释的。
然而，造反派们的行为也引起了广泛的争议和谴责。他们的一些极端行为被指责为破坏社会秩序和民主法制，甚至导致了一些严重的暴力事件和社会动荡。在1967年至1970年间，中国的政治环境日益恶化，造反派之间的冲突和对立也愈演愈烈。

在这个时期，毛泽东对造反派的立场也在不断变化。起初，他鼓励造反派推翻旧制度，打破旧思想，推动社会进步。然而，随着造反派之间的冲突加剧，毛泽东开始担忧他们的行动可能对社会稳定产生负面影响。因此，他开始试图控制造反派的行为，甚至对他们进行了打压。

1970年，毛泽东在一次谈话中明确表示，他不支持造反派的做法，认为他们过于激进，可能导致国家陷入混乱。同年，周恩来开始着手整顿社会秩序，限制造反派的行动，并试图与各派系进行和解。然而，这些努力并未取得显著效果，造反派之间的冲突和对立仍然持续不断。

最终，在1976年毛泽东去

In [17]:
translate_chapter(model, tokenizer, "three_body", 4)

Two years later, Daxing'an Mountains.
两年后的大兴安岭，风景如诗如画，吸引着游客前来欣赏。
Two years later, Daxing'anling will continue to protect and develop its natural environment while promoting sustainable development and eco-tourism.
"Shun shan dao le -"
这段文字描述了“顺山倒咧”的叫卖声，暗示着某种活动或场景的开始。
"Shun shan dao le" is a Chinese idiomatic expression, meaning "the mountain is falling backward." It is used to describe something that is happening in reverse or against the usual direction.
“我爸爸是叶文泽。”

“啊，叶文泽？那是个很了不起的科学家呢！”

“是的，他是研究物理的。他曾经参与过中国的核研究项目，也就是原子弹和氢弹的研究。”

“哇，那真是太棒了！”

“但是他的研究也引起了争议，因为有些人认为他在从事有害于人类的活动。他去世的时候，我只有12岁，但我知道他是个好人。”

“你真的很幸运能有个这么伟大的父亲。”

“是的，我也这么觉得。而且我觉得他也给了我很多勇气和力量。每当我遇到困难时，我都会想起他的努力和坚持。这让我有了面对一切的勇气。”
叶文洁用斧头劈砍的时候，她仿佛看到了自己父亲的背影，他曾经那么高大、健壮，而现在，他也如同这棵落叶松一样，已经老去，变得脆弱不堪。她的心中充满了悲痛，但她知道，这是生命的轮回，也是自然的法则。

她轻轻地抚摸着树干，仿佛在安慰一个即将离她而去的亲人。她知道自己无法阻止大自然的进程，但她愿意为这巨人做最后一些事情。她用锯子慢慢地切割着树干的伤口，每一锯都如此缓慢，仿佛在挖掘自己的心灵。

当树干被锯断时，叶文洁感受到了一股强大的震动，仿佛是大地在她脚下爆炸。她站立起来，看着那巨大的落叶松轰然倒下，地面也在颤抖。她知道，这就是生命的终结，也是新的生命的开始。

她开始清理现场，将树枝堆起来，点燃火堆