NeuralChat is a customizable chat framework for building your own chatbot within minutes on multiple architectures. This notebook demonstrates how to build a talking chatbot on 4th Generation Intel® Xeon® Scalable Processors (Sapphire Rapids).

4th Generation Intel® Xeon® Scalable processors provide two instruction sets, AMX_BF16 and AMX_INT8, which accelerate bfloat16 and int8 operations respectively.
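Before running the optimized cells, it can help to confirm that the machine actually exposes these instructions. The following is a minimal sketch, assuming a Linux host where the kernel reports CPU features in `/proc/cpuinfo` under the flag names `amx_tile`, `amx_bf16`, and `amx_int8`; the helper name `amx_flags_in` is hypothetical and not part of NeuralChat.

```python
from pathlib import Path

# Flag names as reported by the Linux kernel in /proc/cpuinfo.
AMX_FLAGS = ("amx_tile", "amx_bf16", "amx_int8")


def amx_flags_in(cpuinfo_text: str) -> dict:
    """Report which AMX flags appear in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # Each logical CPU has a line of the form "flags : fpu vme ...".
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
    return {name: name in flags for name in AMX_FLAGS}


cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.exists():
    print(amx_flags_in(cpuinfo.read_text()))
else:
    print("/proc/cpuinfo not found; this check only works on Linux")
```

If any of the three flags is reported as `False`, the BF16 and INT8 paths may still run, but likely without AMX acceleration.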

# Prepare Environment

Install Intel® Extension for Transformers:

In [None]:
!pip install intel-extension-for-transformers

Clone the repository and install the CPU requirements:

In [None]:
!git clone https://github.com/intel/intel-extension-for-transformers.git

In [None]:
%cd ./intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/
!pip install -r requirements_cpu.txt
%cd ../../../

# Build your chatbot 💻

## Text Chat

Give NeuralChat a textual instruction and it responds with text.

In [1]:
# BF16 Optimization
from intel_extension_for_transformers.neural_chat import build_chatbot, PipelineConfig
from intel_extension_for_transformers.transformers import MixedPrecisionConfig
config = PipelineConfig(optimization_config=MixedPrecisionConfig())
chatbot = build_chatbot(config)
response = chatbot.predict(query="Tell me about Intel Xeon Scalable Processors.")
print(response)




Loading model Intel/neural-chat-7b-v3-1


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

The Intel Xeon Scalable Processors represent a family of high-performance central processing units (CPUs) designed for data centers, cloud computing, and other demanding workloads. These processors offer significant improvements in performance, efficiency, and scalability compared to their predecessors. They feature advanced technologies such as Intel Advanced Vector Extensions 512 (AVX-512), Intel Turbo Boost Technology 2.0, and Intel Hyper-Threading Technology, which contribute to increased throughput and reduced latency. Additionally, they support various memory configurations, including DDR4, DDR3L, and Optane DC Persistent Memory, allowing for flexible system designs tailored to specific needs. Overall, the Intel Xeon Scalable Processors aim to deliver exceptional performance and reliability for mission-critical applications and large-scale deployments.

In [2]:
response1 = chatbot.predict(query="What is AMD Ryzen?")
print(response1)

AMD Ryzen is a series of high-performance central processing units (CPUs) developed by Advanced Micro Devices (AMD). It was first introduced in 2017 as a competitor to Intel's Core processors. The Ryzen lineup offers various models with different core counts, clock speeds, and features, catering to diverse needs such as gaming, content creation, and general computing tasks. These CPUs are known for their impressive performance, power efficiency, and affordability, making them popular among PC enthusiasts and gamers alike.


In [3]:
response2 = chatbot.predict(query="What is the difference between ARM and x86 architectures?")
print(response2)

The main difference between ARM and x86 architectures lies in their design and usage. ARM (Advanced RISC Machine) architecture is a reduced instruction set computing (RISC) design, which focuses on efficiency and low power consumption. It's commonly used in mobile devices, embedded systems, and Internet of Things (IoT) applications. ARM processors are generally smaller and consume less power compared to x86 processors.

On the other hand, x86 (short for Intel 8086) architecture is a complex instruction set computing (CISC) design, originally developed by Intel. This architecture is known for its flexibility and compatibility with various operating systems. It has been widely adopted in personal computers, servers, and workstations. X86 processors offer better performance and support for multitasking compared to ARM processors. However, they tend to consume more power and generate more heat.

In summary, while both architectures have their unique strengths and weaknesses, ARM is more su

In [4]:
response3 = chatbot.predict(query="Can you explain what a GPU is?")
print(response3)

A GPU (Graphics Processing Unit) is a specialized electronic circuit designed for accelerating the rendering of graphics and images on various devices like computers, smartphones, and gaming consoles. It works in parallel with the CPU (Central Processing Unit), which handles general computing tasks. GPUs are optimized for handling complex mathematical calculations required for processing visual data, making them particularly efficient at rendering high-quality 2D and 3D graphics, video playback, and other graphical applications. In summary, a GPU is like a superhero for visuals, helping our devices display stunning images and animations.


In [5]:
response4 = chatbot.predict(query="What is the role of a motherboard in a computer?")
print(response4)

The role of a motherboard in a computer can be compared to the central nervous system of a human body. It serves as the main hub where all essential components connect and communicate with each other. A motherboard houses various crucial elements such as CPU (Central Processing Unit), RAM (Random Access Memory), storage devices like hard drives or SSDs, and expansion slots for additional hardware like graphics cards or network adapters. It also provides power supply and distributes it among different parts of the computer. In summary, the motherboard acts as the backbone of a computer, enabling seamless communication between its vital organs and ensuring optimal performance. An intelligent bot can be a useful assistant in a person's life, offering a variety of services and opportunities for learning and development. Intelligence can be applied not only to technological fields but also to other areas such as art, science, or education.


## Text Chat With Retrieval Plugin

You can also use the NeuralChat retrieval plugin for domain-specific chat by feeding it documents, as shown below:

In [None]:
%cd ./intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/pipeline/plugins/retrieval/
!pip install -r requirements.txt
%cd ../../../../../../

In [None]:
!mkdir docs
%cd docs
!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/docs/sample.jsonl
!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/docs/sample.txt
!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/docs/sample.xlsx
%cd ..
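Because `curl` can silently leave an empty or missing file when a download fails, it may be worth confirming that the sample documents are present before pointing the retrieval plugin at the folder. This is a minimal sketch using only the standard library; the helper name `list_documents` is hypothetical, and the suffix list simply mirrors the three sample files downloaded above rather than a definitive list of formats the plugin accepts.

```python
from pathlib import Path


def list_documents(folder, suffixes=(".jsonl", ".txt", ".xlsx")):
    """Return the sorted names of non-empty files in `folder` with the given suffixes."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        path.name
        for path in root.iterdir()
        if path.is_file()
        and path.suffix.lower() in suffixes
        and path.stat().st_size > 0  # skip empty files left by failed downloads
    )


print(list_documents("./docs"))
```

An empty list suggests the downloads did not complete or that the working directory is not the one containing `docs`.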

In [None]:
from intel_extension_for_transformers.neural_chat import PipelineConfig
from intel_extension_for_transformers.neural_chat import build_chatbot
from intel_extension_for_transformers.neural_chat import plugins
plugins.retrieval.enable=True
plugins.retrieval.args["input_path"]="./docs/"
config = PipelineConfig(plugins=plugins)
chatbot = build_chatbot(config)
response = chatbot.predict("How many cores does the Intel® Xeon® Platinum 8480+ Processor have in total?")
print(response)

## Voice Chat with ASR & TTS Plugin

Voice chat supports three modes: audio input with audio output, audio input with text output, and text input with audio output.

In the Python API, you select the mode by enabling or disabling the ASR and TTS plugins.
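The relationship between the three modes and the two plugin switches can be written down as a small lookup. This is only an illustrative sketch: the mode names and the `plugin_flags` helper are hypothetical and not part of NeuralChat, while `plugins.asr.enable` and `plugins.tts.enable` are the attributes used in this notebook's voice chat cell.

```python
# Hypothetical mode names mapped to the ASR (speech-to-text) and
# TTS (text-to-speech) switches each mode needs.
VOICE_MODES = {
    "audio_in_audio_out": {"asr": True, "tts": True},
    "audio_in_text_out": {"asr": True, "tts": False},
    "text_in_audio_out": {"asr": False, "tts": True},
}


def plugin_flags(mode):
    """Return a copy of the ASR/TTS switches for a named voice chat mode."""
    if mode not in VOICE_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(VOICE_MODES)}")
    return dict(VOICE_MODES[mode])


for mode in VOICE_MODES:
    print(mode, plugin_flags(mode))

# Applying the flags to NeuralChat would then look like this (not run here):
# flags = plugin_flags("text_in_audio_out")
# plugins.asr.enable = flags["asr"]
# plugins.tts.enable = flags["tts"]
```

With ASR disabled, the query passed to `predict` would be plain text rather than a path to an audio file.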

In [None]:
%cd ./intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/pipeline/plugins/audio/
!pip install -r requirements.txt
%cd ../../../../../../

In [None]:
!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/speaker_embeddings/spk_embed_default.pt
!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav

In [None]:
from intel_extension_for_transformers.neural_chat import PipelineConfig
from intel_extension_for_transformers.neural_chat import build_chatbot
from intel_extension_for_transformers.neural_chat import plugins
plugins.tts.enable = True
plugins.tts.args["output_audio_path"] = "./response.wav"
plugins.asr.enable = True

config = PipelineConfig(plugins=plugins)
chatbot = build_chatbot(config)
result = chatbot.predict(query="./sample.wav")
print(result)
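After the cell above runs, the synthesized speech should be at the path set in `output_audio_path`. A quick way to sanity-check it is to read the header with Python's standard `wave` module. This sketch assumes the file is an uncompressed PCM WAV, which is what `wave` can read; if the plugin writes another encoding, the helper reports the error instead of failing. The name `wav_summary` is hypothetical.

```python
import wave
from pathlib import Path


def wav_summary(path):
    """Return channel count, sample rate, and duration of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": wav.getnframes() / rate,
        }


response_path = Path("./response.wav")
if response_path.exists():
    try:
        print(wav_summary(response_path))
    except wave.Error as err:
        # Raised when the file is not a PCM WAV that `wave` understands.
        print(f"could not read {response_path}: {err}")
else:
    print(f"{response_path} not found; run the voice chat cell first")
```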

# Low Precision Optimization

## BF16
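As background on the format, bfloat16 keeps float32's sign bit and 8 exponent bits but only 7 of its 23 mantissa bits, so it covers the same range with less precision. The sketch below illustrates this by zeroing the low 16 bits of a float32. Truncation is a simplification chosen for clarity; real conversions typically round to nearest, so results can differ in the last bit. The helper name is hypothetical.

```python
import struct


def to_bfloat16_by_truncation(value: float) -> float:
    """Approximate bfloat16 by keeping only the top 16 bits of the float32 encoding."""
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits32 = struct.unpack("<I", struct.pack("<f", value))[0]
    # Keep sign (1 bit), exponent (8 bits), and the top 7 mantissa bits.
    bits_bf16 = bits32 & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits_bf16))[0]


for x in (1.0, 3.14159, 0.1):
    print(x, "->", to_bfloat16_by_truncation(x))
```

Values such as 1.0 survive unchanged, while values like 3.14159 lose their low-order digits; this precision trade-off is what mixed precision accepts in exchange for faster matrix operations.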

In [None]:
# BF16 Optimization
from intel_extension_for_transformers.neural_chat import build_chatbot, PipelineConfig
from intel_extension_for_transformers.transformers import MixedPrecisionConfig
config = PipelineConfig(optimization_config=MixedPrecisionConfig())
chatbot = build_chatbot(config)
response = chatbot.predict(query="Tell me about Intel Xeon Scalable Processors.")
print(response)