In [1]:
import os
import urllib.request


def download_file(file_link, filename):
    # Checks if the file already exists before downloading
    if not os.path.isfile(filename):
        urllib.request.urlretrieve(file_link, filename)
        print("File downloaded successfully.")
    else:
        print("File already exists.")

# Downloading the GGUF model from Hugging Face
ggml_model_path = "https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/resolve/main/zephyr-7b-beta.Q4_0.gguf"
filename = "zephyr-7b-beta.Q4_0.gguf"

download_file(ggml_model_path, filename)


File already exists.
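
The existence check above treats any file with the right name as complete, so an interrupted download would later be reused as if it were the full model. A minimal sketch of one way to guard against that, assuming it is acceptable to write a temporary `.part` file next to the target. `download_file_atomic` is a hypothetical name, not part of the notebook:

```python
import os
import tempfile
import urllib.request


def download_file_atomic(file_link, filename):
    """Download to a temporary file first, then rename it into place.

    Returns True if a download happened, False if the file already existed.
    Hypothetical helper: the notebook itself uses the simpler download_file.
    """
    if os.path.isfile(filename):
        return False

    # Create the temporary file in the target directory so that the final
    # rename stays on one filesystem.
    target_dir = os.path.dirname(os.path.abspath(filename))
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".part")
    os.close(fd)
    try:
        urllib.request.urlretrieve(file_link, tmp_path)
        os.replace(tmp_path, filename)
    except BaseException:
        # Remove the partial file so a later run starts from scratch.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True
```

With this variant, a file under the final name only appears after `urlretrieve` has returned without raising.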


In [2]:
from llama_cpp import Llama

llm = Llama(model_path="zephyr-7b-beta.Q4_0.gguf", n_ctx=2048, n_batch=6856)


llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from zephyr-7b-beta.Q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = huggingfaceh4_zephyr-7b-beta
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.hea
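
The loader log reports `llama.context_length = 32768`, while the model is opened with `n_ctx=2048`, so the prompt and the completion together have to fit into 2048 tokens. A rough sketch of the resulting budget arithmetic follows. `generation_budget` is a hypothetical helper; in a live session the prompt length could be measured with `len(llm.tokenize(prompt.encode("utf-8")))`, assuming the installed llama-cpp-python version exposes `Llama.tokenize` in that form:

```python
def generation_budget(n_prompt_tokens, n_ctx, requested_max_tokens):
    """Return how many tokens can actually be generated.

    Hypothetical helper: the completion cannot be longer than what is left
    of the context window once the prompt has been placed in it.
    """
    remaining = n_ctx - n_prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"Prompt of {n_prompt_tokens} tokens does not fit into n_ctx={n_ctx}"
        )
    return min(requested_max_tokens, remaining)


# Example: a 1500-token prompt in a 2048-token window leaves 548 tokens,
# no matter how large the requested max_tokens is.
print(generation_budget(1500, 2048, 5120))
```

For long inputs such as full articles, raising `n_ctx` (at the cost of more memory) is one way to leave more room for the answer.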

In [3]:
def generate_text(
    prompt="Who is the CEO of Apple?",
    max_tokens=256,
    temperature=0.1,
    top_p=0.5,
    echo=False,
    stop=["#"],
):
    output = llm(
        prompt,
        max_tokens=max_tokens,
        temperature=temperature,
        top_p=top_p,
        echo=echo,
        stop=stop,
    )
    output_text = output["choices"][0]["text"].strip()
    return output_text


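def generate_prompt_from_template(user_message):
    # Wraps the user message in a ChatML-style chat template.
    # Note: the Zephyr model card appears to document a different format
    # (<|system|> / <|user|> / <|assistant|>), so this template may not be
    # the one the model was tuned on.
    chat_prompt_template = f"""<|im_start|>system
You are a helpful chatbot.<|im_end|>
<|im_start|>user
{user_message}<|im_end|>"""
    return chat_prompt_template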


def make_answer(text):
    prompt = generate_prompt_from_template(text)
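    # max_tokens is larger than n_ctx=2048, so in practice the context
    # window, not this value, is likely to bound the output length.
    result = generate_text(prompt, max_tokens=5120)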
    print(result)
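
The template above uses ChatML-style `<|im_start|>` markers. The Zephyr model card appears to document a different layout, with `<|system|>`, `<|user|>` and `<|assistant|>` markers and `</s>` after each turn; this is an assumption that should be checked against the card before relying on it. A sketch of a prompt builder in that style (`generate_zephyr_prompt` is a hypothetical name):

```python
def generate_zephyr_prompt(user_message, system_message="You are a helpful chatbot."):
    """Build a prompt in the format assumed to match Zephyr's chat template.

    Assumption: each turn starts with a role marker on its own line, ends
    with </s>, and the prompt finishes with an open assistant turn.
    """
    return (
        f"<|system|>\n{system_message}</s>\n"
        f"<|user|>\n{user_message}</s>\n"
        f"<|assistant|>\n"
    )


print(generate_zephyr_prompt("Who is the CEO of Apple?"))
```

Ending the prompt with an open assistant turn tells the model where its own reply begins, which the ChatML-style template above does not do.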



In [4]:
query = '''
Перескажи для ребёнка содержание на английском языке
The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation.  There were many examples of AI researchers' belated learning of this bitter lesson, and it is instructive to review some of the most prominent.
In computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search. At the time, this was looked upon with dismay by the majority of computer-chess researchers who had pursued methods that leveraged human understanding of the special structure of chess. When a simpler, search-based approach with special hardware and software proved vastly more effective, these human-knowledge-based chess researchers were not good losers. They said that ``brute force" search may have won this time, but it was not a general strategy, and anyway it was not how people played chess. These researchers wanted methods based on human input to win and were disappointed when they did not.
A similar pattern of research progress was seen in computer Go, only delayed by a further 20 years. Enormous initial efforts went into avoiding search by taking advantage of human knowledge, or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale. Also important was the use of learning by self play to learn a value function (as it was in many other games and even in chess, although learning did not play a big role in the 1997 program that first beat a world champion). Learning by self play, and learning in general, is like search in that it enables massive computation to be brought to bear. Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research. In computer Go, as in computer chess, researchers' initial effort was directed towards utilizing human understanding (so that less search was needed) and only much later was much greater success had by embracing search and learning.
In speech recognition, there was an early competition, sponsored by DARPA, in the 1970s. Entrants included a host of special methods that took advantage of human knowledge---knowledge of words, of phonemes, of the human vocal tract, etc. On the other side were newer methods that were more statistical in nature and did much more computation, based on hidden Markov models (HMMs). Again, the statistical methods won out over the human-knowledge-based methods. This led to a major change in all of natural language processing, gradually over decades, where statistics and computation came to dominate the field. The recent rise of deep learning in speech recognition is the most recent step in this consistent direction. Deep learning methods rely even less on human knowledge, and use even more computation, together with learning on huge training sets, to produce dramatically better speech recognition systems. As in the games, researchers always tried to make systems that worked the way the researchers thought their own minds worked---they tried to put that knowledge in their systems---but it proved ultimately counterproductive, and a colossal waste of researcher's time, when, through Moore's law, massive computation became available and a means was found to put it to good use.
In computer vision, there has been a similar pattern. Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better.
This is a big lesson. As a field, we still have not thoroughly learned it, as we are continuing to make the same kind of mistakes. To see this, and to effectively resist it, we have to understand the appeal of these mistakes. We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.
One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.
The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.
'''
make_answer(query)


llama_print_timings:        load time =  295166.88 ms
llama_print_timings:      sample time =      90.18 ms /   243 runs   (    0.37 ms per token,  2694.61 tokens per second)
llama_print_timings: prompt eval time =  295165.30 ms /  1470 tokens (  200.79 ms per token,     4.98 tokens per second)
llama_print_timings:        eval time =  121059.73 ms /   242 runs   (  500.25 ms per token,     2.00 tokens per second)
llama_print_timings:       total time =  417258.35 ms /  1712 tokens


For children, here's a simplified version:
In AI research, using more computation is usually better than relying on human knowledge. This is because the amount of computation available keeps getting cheaper over time. Researchers often try to use human knowledge at first, but in the long run, using more computation is much more effective. This can be seen in games like chess and Go, where methods that rely on lots of search and learning have beaten world champions. In speech recognition and computer vision, early methods tried to mimic how humans see or hear things, but now deep learning neural networks are used instead because they use less human knowledge and perform better. This is a lesson we still need to learn in AI research, because sometimes researchers try to build in what they think they know, but this can hold back progress in the long run. Instead, we should focus on using methods that can find and capture complexity, rather than trying to understand everything about the wo
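
The `llama_print_timings` lines carry enough information to recompute throughput and compare runs. A small sketch, assuming the log layout shown in this notebook (other llama.cpp versions may print it differently); `parse_timings` is a hypothetical helper:

```python
import re

# Matches lines such as:
#   llama_print_timings:        eval time =  121059.73 ms /   242 runs
TIMING_RE = re.compile(
    r"llama_print_timings:\s+(?P<name>.+?) time =\s*(?P<ms>[\d.]+) ms"
    r"(?: /\s*(?P<count>\d+) (?:runs|tokens))?"
)


def parse_timings(log_text):
    """Return {stage: {"ms", "count", "per_second"}} for each timing line."""
    stats = {}
    for m in TIMING_RE.finditer(log_text):
        ms = float(m.group("ms"))
        count = int(m.group("count")) if m.group("count") else None
        per_second = count / (ms / 1000.0) if count and ms > 0 else None
        stats[m.group("name")] = {"ms": ms, "count": count, "per_second": per_second}
    return stats


log = """
llama_print_timings:        load time =  295166.88 ms
llama_print_timings:      sample time =      90.18 ms /   243 runs
llama_print_timings: prompt eval time =  295165.30 ms /  1470 tokens
llama_print_timings:        eval time =  121059.73 ms /   242 runs
llama_print_timings:       total time =  417258.35 ms /  1712 tokens
"""

stats = parse_timings(log)
print(round(stats["prompt eval"]["per_second"], 2))
print(round(stats["eval"]["per_second"], 2))
```

For the log above this gives roughly 5 tokens per second for prompt evaluation and 2 tokens per second for generation, in line with the figures printed by llama.cpp itself.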

In [5]:
query = '''
TRANSLATE INTO RUSSIAN: 
The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation.  There were many examples of AI researchers' belated learning of this bitter lesson, and it is instructive to review some of the most prominent.
In computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search. At the time, this was looked upon with dismay by the majority of computer-chess researchers who had pursued methods that leveraged human understanding of the special structure of chess. When a simpler, search-based approach with special hardware and software proved vastly more effective, these human-knowledge-based chess researchers were not good losers. They said that ``brute force" search may have won this time, but it was not a general strategy, and anyway it was not how people played chess. These researchers wanted methods based on human input to win and were disappointed when they did not.
A similar pattern of research progress was seen in computer Go, only delayed by a further 20 years. Enormous initial efforts went into avoiding search by taking advantage of human knowledge, or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale. Also important was the use of learning by self play to learn a value function (as it was in many other games and even in chess, although learning did not play a big role in the 1997 program that first beat a world champion). Learning by self play, and learning in general, is like search in that it enables massive computation to be brought to bear. Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research. In computer Go, as in computer chess, researchers' initial effort was directed towards utilizing human understanding (so that less search was needed) and only much later was much greater success had by embracing search and learning.
In speech recognition, there was an early competition, sponsored by DARPA, in the 1970s. Entrants included a host of special methods that took advantage of human knowledge---knowledge of words, of phonemes, of the human vocal tract, etc. On the other side were newer methods that were more statistical in nature and did much more computation, based on hidden Markov models (HMMs). Again, the statistical methods won out over the human-knowledge-based methods. This led to a major change in all of natural language processing, gradually over decades, where statistics and computation came to dominate the field. The recent rise of deep learning in speech recognition is the most recent step in this consistent direction. Deep learning methods rely even less on human knowledge, and use even more computation, together with learning on huge training sets, to produce dramatically better speech recognition systems. As in the games, researchers always tried to make systems that worked the way the researchers thought their own minds worked---they tried to put that knowledge in their systems---but it proved ultimately counterproductive, and a colossal waste of researcher's time, when, through Moore's law, massive computation became available and a means was found to put it to good use.
In computer vision, there has been a similar pattern. Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better.
This is a big lesson. As a field, we still have not thoroughly learned it, as we are continuing to make the same kind of mistakes. To see this, and to effectively resist it, we have to understand the appeal of these mistakes. We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.
One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.
The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.
'''
make_answer(query)

Llama.generate: prefix-match hit

llama_print_timings:        load time =  295166.88 ms
llama_print_timings:      sample time =     238.66 ms /   594 runs   (    0.40 ms per token,  2488.89 tokens per second)
llama_print_timings: prompt eval time =  310282.16 ms /  1420 tokens (  218.51 ms per token,     4.58 tokens per second)
llama_print_timings:        eval time =  308195.30 ms /   593 runs   (  519.72 ms per token,     1.92 tokens per second)
llama_print_timings:       total time =  621400.20 ms /  2013 tokens


В качестве помощи чатбота, могу предоставить вам перевод этого текста на русский язык:

70 лет исследований в области искуственного интеллекта позволяют прочитать самую большую лекцию. Методы, основанные на компьютере и вовлечающие больше вычислений, оказываются гораздо более эффективными и в три раза превосходят методы, основанные на использовании человеческого знания. Эта причина заключается в том, что стоимость вычислений постоянно падает согласно закону Мурра, или лучше - его продолжению. Большинство исследований в области искуственного интеллекта велись как будто количество доступных вычислений для агента неизменно (в котором случае использование человеческого знания было бы единственным способом улучшить результаты), но, на протяжении несколько длительнее обычной исследовательской программы, становится доступной огромная масса вычислений. Исследователи ищут методы, основанные на использовании человеческого знания, поскольку результаты будут иметь значение в более короткий срок, н

In [6]:
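# The translation is not perfect, but it is understandable.
# Next, the model is asked to summarize the article "ФОРМАЛЬНАЯ МОДЕЛЬ ИНФОРМАЦИОННО-АНАЛИТИЧЕСКОЙ СИСТЕМЫ ФИНАНСОВОГО УЧЕТА НА ПРИМЕРЕ МАШИНОСТРОИТЕЛЬНОГО ПРЕДПРИЯТИЯ"
# ("A Formal Model of an Information-Analytical System for Financial Accounting, Using a Machine-Building Enterprise as an Example").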

In [7]:
query = '''
Сделай пересказ статьи:
Основным показателем в управлении любым предприятием является эффективность, и в первую очередь это должна быть экономическая эффективность. Верное и строгое финансово-экономическое планирование с помощью информационных технологий – это один из 
важных этапов к успешному развитию предприятия. Активное внедрение информационных систем в первую очередь затронуло промышленные предприятия, обеспеченные заказами на несколько лет. Эффективно управлять производственными мощностями, контролировать затраты на производство, минимизировать ошибки, организовать точный 
учет и обеспечить оперативное планирование – все это достигается 
с помощью внедренных автоматизированных информационных систем. Однако стандартные информационные системы даже при доработке в крупных компаниях не захватывают все аспекты оперативноаналитического учета на предприятии [14].
Необходимы системы, способные оперативно решать узкие задачи: интегрирование операций бизнес-процесса с планированием, сбор, 
обработка и корректировка больших объемов неоднородных данных, 
их анализ и прогноз. Данный подход повышает эффективность производства предприятия посредством усовершенствования функционирования автоматизированных информационных систем. Для полного понимания и раскрытия проблем в формировании информационноаналитической системы финансового учета рассмотрим пример реализации процесса с помощью информационной автоматизированной системы 1С: «Предприятие» блок управления и контроля финансового планирования и бюджетирования на примере промышленного предприятия. Для описания процесса используем цикл Деминга, согласно 
которому управление должно циклически проходить по следующим 
этапам: P – планирование или же проектирование, D – контроль, C –
корректировка или регулировка, A – исполнение [15]. 
При поступлении заказа на предприятие и на основании приказа 
предприятия начинает зарождаться проект контракта (рис. 3). Плановоэкономическим отделом составляется плановая калькуляция для определения себестоимости продукции, что позволяет финансовому отделу 
проделать работу по формированию бюджета движения денежных 
средств (БДДС) и определить финансовую устойчивость предприятия, 
т.е. сможет ли предприятие исполнить данный контракт посредством 
собственных средств или же с привлечением кредитных. Следующим 
этапом является составление планового бюджета на год по каждому 
структурному подразделению в рамках заказов. Бюджет на год составляется структурным подразделением с разнесением по статьям суммы 
расходов, обязательной частью является определение дополнительной 
аналитики (на предприятии в основном это номер серии машины), дополнительная аналитика определяется отделом финансового планирования на основании приказа руководителя [16]. Предполагается, что 
данная ИАС при установленном лимите годового бюджета, утвержденном финансовым сектором управляющей компанией, является 
входящим продуктом для управления этапами процесса, запустит итерационный цикл взаимодействия объектов бюджетирования. На этапе 
планирования, опираясь на имеющийся бюджет движения денежных 
средств, определяются плановые значения элементов бюджета, проводится аналитика задач процесса. На следующем этапе исполнения, на 
основе подтверждающих документов, таких как товарно-транспортная 
накладная (ТТН), твердофиксированная цена (ТФЦ), выполняется учет 
фактических затрат постатейно, выявляются ошибки и недочеты данной операции и определяются пути решения установленных задач. На 
следующем этапе выполняется контроль, соответствует ли плану 
и бюджету выполненные задачи на базе ежемесячных планируемых 
платежей. Если же контроль выявляет проблемные зоны, запускается 
этап корректировки, где проводятся редактирование данных и анализ 
для дальнейшего принятия решения. Все этапы повторяются столько, 
сколько необходимо для достижения решения поставленных задач.
Функционирование данной ИАС основано на методическом 
инструментарии по формированию и корректировке бюджета и на аналитических формах данных бюджета, а также базы данных с выходом 
отчетов и куба данных [17].
На сегодняшний день существующие ИС не полностью автоматизируют блок финансового контроля. Реализация процесса по финансовому контролю содержит в себе огромнейшую аналитическую работу сотрудника с выполнением трудоемких операций в самой системе. 
Часто на мониторинг, подготовку отчетов, заполнение различных таблиц уходит 2/3 рабочего времени сотрудников. Устаревшие методика 
и схема данного бизнес-процесса влекут за собой рутинную работу, 
занимающую большое количество времени сотрудников, которое можно использовать на решение более важных вопросов [18].
'''
make_answer(query)

Llama.generate: prefix-match hit

llama_print_timings:        load time =  295166.88 ms
llama_print_timings:      sample time =      70.10 ms /   195 runs   (    0.36 ms per token,  2781.58 tokens per second)
llama_print_timings: prompt eval time =  408869.25 ms /  1819 tokens (  224.78 ms per token,     4.45 tokens per second)
llama_print_timings:        eval time =   96094.11 ms /   194 runs   (  495.33 ms per token,     2.02 tokens per second)
llama_print_timings:       total time =  505788.56 ms /  2013 tokens


The article emphasizes that the main indicator in managing any organization is economic efficiency, and proper financial and economic planning with the help of information technologies is one of the important steps towards successful development. The active introduction of information systems has primarily affected industrial enterprises with long-term orders. Effective management of production capacities, cost control, minimization of errors, accurate accounting, and operational planning are all achieved through implemented automated information systems. However, standard information systems, even after customization in large companies, do not fully cover all aspects of operational analytical accounting on the enterprise [14]. The need for systems that can solve narrow tasks is becoming increasingly apparent: integration of business processes with planning, collection, processing, and correction of large volumes of heterogeneous data, analysis, and forecasting. This approach increases

In [8]:
# Another person read the article and then both my summary and the LLM's summary. My summary was rated 7, and the LLM's summary was rated 2.