# TuoLingC: A text summarization model based on ChatGLM
Developed by Ziang Leng 冷子昂, Qiyuan Chen 陈启源 and Cheng Li 李鲁鲁.
This notebook is a simple evaluation script built on the [luotuo-silk-road](https://github.com/LC1332/luotuo-silk-road) project.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LC1332/Luotuo-Chinese-LLM/blob/main/notebook/TuoLingC_evaluation_code.ipynb)

In [None]:
!git clone https://github.com/LC1332/luotuo-silk-road.git ./luotuo_silk_road
!wget https://github.com/LC1332/Luotuo-Chinese-LLM/raw/main/notebook/utils.py
!cd luotuo_silk_road/TuoLing && pip install -r requirements.txt 

In [None]:
import os
import torch
from utils import DeviceMap
from transformers import AutoModel, AutoTokenizer

In [None]:
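# Create weights in half precision (fp16) on the GPU while the model loads.
torch.set_default_tensor_type(torch.cuda.HalfTensor)

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

# DeviceMap (from the downloaded utils.py) supplies the module-to-device
# mapping used to place ChatGLM's layers on the available GPU(s).
model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b",
    trust_remote_code=True,
    device_map=DeviceMap("ChatGLM").get(),
)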
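
`DeviceMap` is defined in the downloaded `utils.py`, and its internals are not shown in this notebook. To give a rough idea of what a `device_map` contains, the hypothetical sketch below builds one that spreads ChatGLM-6B's transformer layers over several GPUs. The module names (`transformer.word_embeddings`, `transformer.layers.{i}`, `transformer.final_layernorm`, `lm_head`) and the 28-layer count are assumptions based on THUDM's ChatGLM-6B code; this is not the `DeviceMap` implementation.

```python
def build_device_map(num_gpus, num_layers=28):
    """Hypothetical sketch: assign ChatGLM-6B modules to GPU indices.

    Module names are assumed from THUDM's ChatGLM-6B modeling code;
    this is NOT the DeviceMap class from utils.py.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    # Embeddings, final norm and output head stay on GPU 0 so that the
    # model's input and output tensors live on the same device.
    device_map = {
        "transformer.word_embeddings": 0,
        "transformer.final_layernorm": 0,
        "lm_head": 0,
    }
    # Split the transformer layers into contiguous, near-equal blocks;
    # the first `extra` GPUs take one additional layer each.
    per_gpu, extra = divmod(num_layers, num_gpus)
    layer = 0
    for gpu in range(num_gpus):
        count = per_gpu + (1 if gpu < extra else 0)
        for _ in range(count):
            device_map[f"transformer.layers.{layer}"] = gpu
            layer += 1
    return device_map


two_gpus = build_device_map(2)
print(two_gpus["transformer.layers.13"])  # 0
print(two_gpus["transformer.layers.14"])  # 1
```

The resulting dict has the same shape as the `device_map` argument accepted by `from_pretrained`; a production mapping would usually also balance memory, since GPU 0 carries the embeddings and output head.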


In [None]:
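from peft import get_peft_model, LoraConfig, TaskType

# TuoLingC LoRA checkpoint shipped with the cloned repository.
peft_path = "./luotuo_silk_road/TuoLing/output/luotuoC.pt"

# Should match the LoRA configuration the checkpoint was trained with.
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=True,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)

# Wrap ChatGLM with freshly initialized LoRA adapters.
model = get_peft_model(model, peft_config)

# Load the trained weights; strict=False tolerates keys missing from the
# checkpoint (e.g. frozen base-model weights).
model.load_state_dict(torch.load(peft_path), strict=False)

# Restore fp32 as the default type for tensors created from here on.
torch.set_default_tensor_type(torch.cuda.FloatTensor)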
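
The `LoraConfig` above keeps the ChatGLM weights frozen and adds a low-rank update to the targeted layers. In the standard LoRA formulation, a frozen weight `W` is applied as `W @ x + (lora_alpha / r) * B @ (A @ x)`, so `r=8` and `lora_alpha=32` give a scaling factor of 4. The pure-Python sketch below illustrates that forward pass on tiny matrices. The function names and toy values are hypothetical, dropout (`lora_dropout`) is omitted because it is inactive at inference time, and PEFT's actual implementation operates on tensors.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, r, lora_alpha):
    """Sketch of a LoRA linear layer: W x + (lora_alpha / r) * B (A x).

    W: frozen d_out x d_in weight; A: r x d_in; B: d_out x r.
    Dropout is omitted (it is disabled at inference time).
    """
    scaling = lora_alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]


def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_out x d_in weight."""
    return r * (d_in + d_out)


# Toy example with r = 1 and lora_alpha = 4, i.e. scaling 4.0.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]      # r x d_in = 1 x 2
B = [[0.5], [0.0]]    # d_out x r = 2 x 1
print(lora_forward(W, A, B, [1.0, 2.0], r=1, lora_alpha=4))  # [7.0, 2.0]
```

For a hypothetical 4096 x 4096 weight, `lora_param_count(4096, 4096, 8)` is 65,536 trainable parameters, compared with 16,777,216 in the full matrix, which is why a LoRA checkpoint such as `luotuoC.pt` can be small.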


This is a summarization model: paste the long text you want summarized at the input prompt below. The input and the generated output currently share a combined limit of 2048 tokens, so do not paste documents longer than that, or accuracy will suffer. If you need to validate more data or train on additional domain-specific data, please contact our team. Thank you!
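
Because `max_length=2048` in `model.generate` counts the prompt and the generated summary together, a long input leaves less room for the output. One way to stay within the limit is to truncate the document's token IDs before building the prompt, reserving space for the prompt template and the summary. The helper below is a hypothetical sketch: the name `truncate_to_budget` and the 256-token default reserve are assumptions, not part of the TuoLing code. In practice the IDs would come from `tokenizer.encode`, and the truncated IDs would be decoded back to text before calling `evaluate`.

```python
def truncate_to_budget(doc_ids, overhead, max_length=2048, reserve=256):
    """Hypothetical helper: keep as many document tokens as fit.

    doc_ids:  token IDs of the text to summarize
    overhead: tokens used by the instruction/prompt template
    reserve:  tokens kept free for the generated summary
    max_length counts prompt + output, as in model.generate(max_length=...).
    """
    budget = max_length - overhead - reserve
    if budget <= 0:
        raise ValueError("no room left for the document")
    # Keep the beginning of the document and drop the tail.
    return doc_ids[:budget]


doc = list(range(3000))  # stand-in for 3000 token IDs
kept = truncate_to_budget(doc, overhead=20)
print(len(kept))  # 1772
```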

In [None]:
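from luotuo_silk_road.TuoLing.cover_alpaca2jsonl import format_example


def evaluate(text):
    """Summarize `text` with the LoRA-tuned ChatGLM model and print the result."""
    with torch.no_grad():
        # Wrap the text in the instruction-style prompt used for fine-tuning;
        # the instruction means "Please summarize the following content:".
        feature = format_example(
            {"instruction": "请帮我总结以下内容:", "output": "", "input": text}
        )
        input_text = feature["context"]
        input_ids = tokenizer.encode(input_text, return_tensors="pt")
        # Greedy decoding; max_length caps prompt + summary at 2048 tokens.
        out = model.generate(input_ids=input_ids, max_length=2048, do_sample=False)
        # Decode only the newly generated tokens, not the echoed prompt.
        answer = tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True)
        print(answer)
        return answer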
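
`format_example` comes from `cover_alpaca2jsonl.py` in the cloned repository and turns an Alpaca-style record (`instruction`, `input`, `output`) into a prompt (`context`) and a training target. The sketch below shows what such a function typically looks like, assuming it follows the ChatGLM-Tuning convention of `Instruction:` / `Input:` / `Answer:` lines. The name `format_example_sketch` is hypothetical, and the exact template in the repository may differ, so check the file before relying on it.

```python
def format_example_sketch(example):
    """Hypothetical re-creation of an Alpaca-style prompt builder.

    Assumes the ChatGLM-Tuning template; the real format_example in
    cover_alpaca2jsonl.py may differ.
    """
    context = f"Instruction: {example['instruction']}\n"
    # The "input" field is optional in Alpaca data; add it only when present.
    if example.get("input"):
        context += f"Input: {example['input']}\n"
    # The model continues generating after this marker.
    context += "Answer: "
    return {"context": context, "target": example["output"]}


feature = format_example_sketch(
    {"instruction": "请帮我总结以下内容:", "input": "<long text to summarize>", "output": ""}
)
print(feature["context"])
```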


In [None]:
evaluate(input("Paste the long text you want to summarize here: "))
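
The `evaluate` cell calls `model.generate` with `max_length=2048`. Under greedy decoding (`do_sample=False` in Hugging Face `generate`), each step appends the highest-scoring token, and generation stops at the end-of-sequence token or once the whole sequence, prompt included, reaches `max_length`. The returned sequence also starts with the prompt, so the summary is the slice after the prompt's length. The toy loop below illustrates both points; the scoring function, vocabulary and token IDs are made up for illustration, and the real implementation adds caching, batching and many other options.

```python
def greedy_generate(next_token_scores, prompt_ids, max_length, eos_id):
    """Toy greedy decoder.

    next_token_scores(ids) -> list of scores, one per vocabulary entry.
    max_length bounds the total length (prompt + generated tokens).
    """
    ids = list(prompt_ids)
    while len(ids) < max_length:
        scores = next_token_scores(ids)
        # Greedy choice: the index of the highest score.
        token = max(range(len(scores)), key=scores.__getitem__)
        ids.append(token)
        if token == eos_id:
            break
    return ids


def toy_scores(ids):
    """Made-up scorer over a 5-token vocabulary that favours (last + 1) % 5."""
    scores = [0.0] * 5
    scores[(ids[-1] + 1) % 5] = 1.0
    return scores


prompt = [0, 1]
out = greedy_generate(toy_scores, prompt, max_length=6, eos_id=4)
print(out)               # [0, 1, 2, 3, 4]
print(out[len(prompt):])  # generated part only: [2, 3, 4]
```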