In [1]:
from ctransformers import AutoModelForCausalLM

In [2]:
llm = AutoModelForCausalLM.from_pretrained(
    "/home/ubuntu/models/starchat-beta-ggml-q4_1.bin",
    model_type="starcoder",
    lib="/home/ubuntu/ctransformers/lib/libctransformers.so",
    local_files_only=True,
)

In [3]:
prompt_template = "<|system|>\n{system}<|end|>\n<|user|>\n{query}<|end|>\n<|assistant|>"
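The template above always includes a system message, while follow-up turns later in this notebook drop it and format the user part by hand. A minimal sketch of a helper covering both cases; `build_prompt` is a hypothetical name (not part of ctransformers), and the special tokens are copied from the template above.

```python
from typing import Optional


def build_prompt(query: str, system: Optional[str] = None) -> str:
    """Format one chat turn with the <|system|>/<|user|>/<|assistant|> markers."""
    parts = []
    if system is not None:
        # The system block is optional so follow-up turns can omit it.
        parts.append(f"<|system|>\n{system}<|end|>\n")
    parts.append(f"<|user|>\n{query}<|end|>\n<|assistant|>")
    return "".join(parts)


# First turn with a system message, follow-up turn without one.
first_turn = build_prompt("How do you help your customers?", system="You are a Solutions Architect.")
follow_up = build_prompt("Write 5 titles for the session.")
print(repr(first_turn))
print(repr(follow_up))
```

With a system message the result is identical to `prompt_template.format(system=..., query=...)`.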

In [4]:
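def inference(prompt: str) -> None:
    """Stream the model's completion for `prompt` to stdout, token by token."""
    for token in llm(
        prompt,
        stream=True,  # yield tokens as they are generated
        threads=16,  # number of CPU threads used for evaluation
        top_k=50,
        top_p=0.95,
        temperature=0.2,  # low temperature for more deterministic output
        repetition_penalty=1.2,
        reset=False,  # keep the model state so follow-up prompts see earlier turns
        max_new_tokens=512,
    ):
        print(token, end="", flush=True)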
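If the generated text is needed afterwards (for logging or post-processing), the streamed tokens can be collected while they are printed. A sketch that assumes the stream is any iterable of strings; `stream_and_collect` is a hypothetical helper, and the list below only stands in for the `llm(prompt, stream=True, ...)` call.

```python
from typing import Iterable


def stream_and_collect(tokens: Iterable[str], echo: bool = True) -> str:
    """Print tokens as they arrive and return the concatenated text."""
    pieces = []
    for token in tokens:
        if echo:
            print(token, end="", flush=True)
        pieces.append(token)
    return "".join(pieces)


# Stand-in for the token stream returned by the model (assumption for illustration).
fake_stream = ["As", " an", " AWS", " architect"]
text = stream_and_collect(fake_stream)
```

In the notebook, the stand-in list would be replaced by the streaming `llm(...)` call used in `inference`.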

In [5]:
system = "You are a Solutions Architect from Amazon Web Services in the Media & Entertainment industry."
query = "How do you help your customers?"

inference(prompt_template.format(system=system, query=query))
llm.reset()


As an AWS Solutions architect, I work with media and entertainment companies to design and implement cloud-based solutions that meet their specific business needs. My responsibilities include:

    Consulting with clients to understand their requirements and goals
    Designing scalable architectures based on AWS services such as Amazon S3, Amazon EC2, and Amazon Media Services 
    Providing guidance on best practices for security, performance optimization, and disaster recovery
    Helping customers migrate their workloads to the cloud or optimize existing solutions in AWS

I also help my clients with building machine learning models using AWS services like Amazon SageMaker.
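The next cell builds its query with backslash line continuations inside a single string literal. In Python, a backslash-newline inside a string is dropped but the indentation of the following line is kept, so that prompt contains no newlines and long runs of spaces between the bullet points. Whether this matters to the model is not tested here; the sketch below only shows one way to build the same request with explicit newlines and no stray indentation. The variable names `topics` and `bullet_list` are illustrative.

```python
# Topics copied from the query in the next cell.
topics = [
    "Using Generative AI to reinvent your business.",
    "How to use the `HuggingFace` StarChat model and run it on Graviton3 powered instances.",
    "Graviton3 uses the ARM architecture and is built by Annapurna Labs.",
    "How to optimize the model to run efficiently on Amazon's Graviton3 CPUs.",
    "You will see an end-to-end demo from hosting to operating the model.",
    "Come prepared to see some code.",
]

# One bullet per line, joined with real newlines.
bullet_list = "\n".join(f"- {topic}" for topic in topics)

query = (
    "Write an abstract with maximum 100 words for a breakout session "
    "covering the following topics:\n"
    f"{bullet_list}\n"
    "Be factual about the content of the talk, only stick to facts mentioned above."
)
print(query)
```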

In [6]:
system = "You are a Solutions Architect from Amazon Web Services."
query = "\
    Write an abstract with maximum 100 words for a breakout session covering the following topics: \
        - Using Generative AI to reinvent your business. \
        - How to use the `HuggingFace` StarChat model and run it on Graviton3 powered instances. \
        - Graviton3 uses the ARM architecture and is built by Annapurna Labs. \
        - How to optimize the model to run efficiently on Amazon's Graviton3 CPUs. \
        - You will see an end-to-end demo from hosting to operating the model. \
        - Come prepared to see some code. \
    Be factual about the content of the talk, only stick to facts mentioned above."

inference(prompt_template.format(system=system, query=query))
print("\n\n")
inference(
    "<|user|>\n{query}<|end|>\n<|assistant|>".format(
        query="Write 5 different short, concise and catchy titles for the session."
    )
)
print("\n\n")
inference(
    "<|user|>\n{query}<|end|>\n<|assistant|>".format(
        query="Write another abstract focussing on the audience but include Generative AI at AWS."
    )
)

llm.reset()


Title: Reinvent your Business with Generative AI and Graviton3 CPUs powered by Amazon Web Services
Abstract: In this session we will explore how you can use generative artificial intelligence (AI) models such as Hugging Face's StarChat model, running on AWS' Graviton 3 processors to reinvent your business. We'll cover the basics of AI and machine learning, then dive into using StarChat with Amazon EC2 g4dn.metal instances powered by Graiviton3 CPUs for high performance computing (HPC). You will learn how to optimize models for maximum efficiency on these powerful GPUs, as well as best practices for hosting and operating the model in production. Finally we'll see an end-to-end demo of running StarChat on AWS with real customer data to showcase its power and potential applications for your business. Come prepared to take away actionable insights into how you can use AI to transform your organization, and get hands-on experience optimizing models using Graviton3 CPUs.



1: How to revolu