# Test model

In [25]:
!pip install transformers peft accelerate bitsandbytes "sagemaker>=2.190.0" gradio --upgrade --quiet

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
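The warning names its own fix: set `TOKENIZERS_PARALLELISM` before `tokenizers` is used in a process that later forks, such as a `!pip` shell command. Here is a minimal sketch to run at the top of the notebook, before any tokenizer is loaded. Either value silences the warning; `"false"` matches the safe behavior the library falls back to after a fork.

```python
import os

# Disable the Rust tokenizers' internal thread pool so that later forks
# (for example, notebook shell commands such as !pip) do not trigger the warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```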


In [2]:
from huggingface_hub import login
login()


In [3]:
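import torch
from transformers import BitsAndBytesConfig, AutoModelForCausalLM
from peft import PeftModel

# assumed to be the base model the adapter was trained on (the same ID the tokenizer uses below)
base_model_name = "TinyPixel/Llama-2-7B-bf16-sharded"
model_path = "outputs"  # directory holding the trained LoRA adapter (adapter_config.json + weights)

# load the base model in 4-bit NF4, computing in float16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map={"": 0},  # place the whole model on the first GPU (cuda:0)
)

# Load the trained LoRA weights on top of the base model.
# (get_peft_model would instead attach a new, untrained adapter.)
model = PeftModel.from_pretrained(base_model, model_path)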

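The distinction between `get_peft_model` and `PeftModel.from_pretrained` matters at inference time. `get_peft_model` attaches newly created LoRA matrices, while `PeftModel.from_pretrained` loads the trained ones saved in an adapter directory such as `outputs`. With PEFT's default initialization (`init_lora_weights=True`), A is random and B starts at zero, so a freshly attached adapter leaves the layer's output unchanged. The pure-Python sketch below computes the effective weight W + (alpha / r)·B·A for a fresh adapter and for a trained one. It uses toy sizes and hypothetical values and is not PEFT code.

```python
import random

def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def lora_effective_weight(w, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * B @ A, the weight a LoRA-wrapped linear layer applies."""
    delta = matmul(lora_b, lora_a)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Toy sizes and values (hypothetical, for illustration only)
d_out, d_in, r, alpha = 4, 3, 2, 16
random.seed(0)
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]      # frozen base weight
lora_a = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]      # A: random init
fresh_b = [[0.0] * r for _ in range(d_out)]                                 # B: zero init
trained_b = [[0.1 * (i + j + 1) for j in range(r)] for i in range(d_out)]  # stand-in for learned values

fresh = lora_effective_weight(w, lora_a, fresh_b, alpha, r)
trained = lora_effective_weight(w, lora_a, trained_b, alpha, r)
print(fresh == w)    # True: a freshly attached adapter changes nothing
print(trained == w)  # False: only trained values modify the layer
```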

In [4]:
from transformers import AutoTokenizer

model_name = "TinyPixel/Llama-2-7B-bf16-sharded"

tokenizer = AutoTokenizer.from_pretrained(
    model_name
    #, trust_remote_code=True
    )
tokenizer.pad_token = tokenizer.eos_token

device = "cuda:0"


In [5]:
text = """
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Analyze and explain the legal reasoning behind the judgment in the given case.

### Input:
Central Inland Water Transport Corporation Ltd. vs Brojo Nath Ganguly & Anr., 1986 AIR 1571, 1986 SCR (2) 278

### Response:"""


print(text)


Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Analyze and explain the legal reasoning behind the judgment in the given case.

### Input:
Central Inland Water Transport Corporation Ltd. vs Brojo Nath Ganguly & Anr., 1986 AIR 1571, 1986 SCR (2) 278

### Response:


In [6]:
Correct_response = """

### Response: The Supreme Court in this case applied a broad interpretation of the term 'State' under Article 12 of the Constitution.
The court reasoned that a government company undertaking public functions qualifies as 'State' based on factors like government control, public importance of activities etc.
This interpretation was based on previous decisions that have defined 'State' under Article 12 broadly to include various agencies and instrumentalities beyond just statutory bodies.
The court also applied the principle that unreasonable and arbitrary contractual terms can be struck down under Article 14 of the Constitution.
The court found that Rule 9(i) of the service rules, which allowed for termination of service without reason, conferred unfettered power to terminate employment without hearing.
This was deemed arbitrary and violative of principles of natural justice and right to equality under Article 14.
Furthermore, the court held that the right to life and livelihood under Article 21 is affected by arbitrary termination of employment.
The court reasoned that the right to livelihood is an integral part of the right to life, and any arbitrary action that affects a person's livelihood would be a violation of Article 21. In conclusion,
the court's legal reasoning was based on a broad interpretation of the term 'State',
the application of the principle of equality and natural justice under Article 14,
and the protection of the right to life and livelihood under Article 21."""

In [7]:
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))


Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Analyze and explain the legal reasoning behind the judgment in the given case.

### Input:
Central Inland Water Transport Corporation Ltd. vs Brojo Nath Ganguly & Anr., 1986 AIR 1571, 1986 SCR (2) 278

### Response:
The case of Central Inland Water Transport Corporation Ltd. vs Brojo Nath Ganguly & Anr., 1986 AIR 1571, 1986 SCR (2) 278, is a landmark case in the field of contract law in India. The case involved a dispute between the Central Inland Water Transport Corporation Ltd. (CIWT) and Brojo Nath Ganguly & Anr. (BNG) over the payment of freight charges for the transportation of goods by water.

The CIWT had contracted with BNG to transport goods from Calcutta to Dhaka, Bangladesh. The contract provided that the freight charges would be paid in Indian rupees, and that the payment would be made within 30 days
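Judging the generation against `Correct_response` by eye is subjective. A crude automatic signal is unigram-overlap F1, the bag-of-words metric used for extractive QA. The sketch below is an assumption on our part: the notebook performs no such evaluation, and lexical overlap says nothing about whether the legal analysis is correct.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def token_f1(prediction, reference):
    """Unigram-overlap F1 between two texts (bag of words; word order is ignored)."""
    pred_counts = Counter(tokenize(prediction))
    ref_counts = Counter(tokenize(reference))
    overlap = sum((pred_counts & ref_counts).values())  # shared tokens, counted with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

print(round(token_f1("Article 14 was violated", "The court held Article 14 was violated"), 3))  # 0.727
```

In the notebook this would be called as `token_f1(generated_text, Correct_response)`. Here `generated_text` is the decoded completion without the prompt, for example `tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)`.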

### SageMaker and AWS Configuration
This cell initializes a SageMaker session, resolves its default S3 bucket, and looks up the IAM execution role that SageMaker assumes when it creates the model and endpoint. These are prerequisites for deploying the model to SageMaker.


In [8]:
import sagemaker
import boto3

sess = sagemaker.Session()
# The SageMaker session bucket is used for uploading data, models, and logs.
# SageMaker creates it automatically if it does not exist.
sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker session region: {sess.boto_region_name}")

sagemaker role arn: arn:aws:iam::157318562066:role/service-role/AmazonSageMakerServiceCatalogProductsUseRole
sagemaker session region: eu-central-1


### Retrieve Hugging Face LLM Image URI
Retrieves the URI of the Hugging Face LLM Inference Container, which is built on Text Generation Inference (TGI), using the SageMaker Python SDK. SageMaker runs the model inside this container image.


In [9]:
from sagemaker.huggingface import get_huggingface_llm_image_uri

# retrieve the llm image uri
llm_image = get_huggingface_llm_image_uri(
  "huggingface",
  version="0.9.3"
)

# print ecr image uri
print(f"llm image uri: {llm_image}")

llm image uri: 763104351884.dkr.ecr.eu-central-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi0.9.3-gpu-py39-cu118-ubuntu20.04
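A quick sanity check is that the retrieved container matches the requested TGI version and the session's region. The helper name and the tag layout it parses are assumptions inferred from the URI printed above. It is a hypothetical helper, not part of the SageMaker SDK.

```python
import re

def parse_tgi_image_uri(uri):
    """Split an ECR image URI like the one printed above into its parts.

    Assumes a tag layout such as '<torch>-tgi<tgi>-<device>-py<py>-cu<cuda>-<os>'.
    """
    registry, repo_and_tag = uri.split("/", 1)
    repository, tag = repo_and_tag.rsplit(":", 1)
    match = re.search(r"tgi(\d+(?:\.\d+)*)", tag)
    registry_parts = registry.split(".")  # <account>.dkr.ecr.<region>.amazonaws.com
    return {
        "account": registry_parts[0],
        "region": registry_parts[3],
        "repository": repository,
        "tgi_version": match.group(1) if match else None,
    }

info = parse_tgi_image_uri(
    "763104351884.dkr.ecr.eu-central-1.amazonaws.com/huggingface-pytorch-tgi-inference:"
    "2.0.1-tgi0.9.3-gpu-py39-cu118-ubuntu20.04"
)
print(info["tgi_version"], info["region"])  # 0.9.3 eu-central-1
```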


In [10]:
# sagemaker config
instance_type = "ml.g4dn.2xlarge"
number_of_gpu = 1
health_check_timeout = 600 # 10 minutes to be able to load the model
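The instance choice can be checked with a little arithmetic. An `ml.g4dn.2xlarge` has a single NVIDIA T4 with 16 GB of GPU memory. The sketch below estimates the weight footprint of Llama-2-7B at several precisions. It assumes roughly 6.74 billion parameters and counts weights only, ignoring the KV cache, activations, and CUDA overhead. To our understanding of TGI 0.9.x, its `bitsandbytes` option loads weights in 8-bit, while the local test above used 4-bit NF4.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the model weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

LLAMA2_7B_PARAMS = 6.74e9  # approximate parameter count of Llama-2-7B (assumption)

# The T4 GPU in ml.g4dn.2xlarge has 16 GB of memory.
for label, bits in [("float16", 16), ("8-bit", 8), ("4-bit NF4", 4)]:
    print(f"{label:>9}: ~{weight_memory_gb(LLAMA2_7B_PARAMS, bits):.2f} GB of weights")
```

Unquantized float16 weights alone would occupy most of the T4, leaving little room for the KV cache, which is why quantization matters on this instance.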

In [11]:
from getpass import getpass

# Prompt the user to enter the Hugging Face token without displaying it
hf_token = getpass("Please enter your Hugging Face token: ")

Please enter your Hugging Face token: ········


In [12]:
# Define model and endpoint configuration parameters
import json

config = {
  # base model from hf.co/models; the fine-tuned LoRA adapter in `outputs` is not part of this deployment
  'HF_MODEL_ID': "TinyPixel/Llama-2-7B-bf16-sharded",
  'SM_NUM_GPUS': json.dumps(number_of_gpu), # number of GPUs used per replica
  'MAX_INPUT_LENGTH': json.dumps(2048),  # max length of the input, in tokens
  'MAX_TOTAL_TOKENS': json.dumps(4096),  # max length of input plus generated tokens
  'MAX_BATCH_TOTAL_TOKENS': json.dumps(8192), # limits the number of tokens processed in parallel during generation
  'MAX_BATCH_PREFILL_TOKENS': json.dumps(4096), # limits the number of tokens in a batch's prefill (prompt-processing) step
  'HUGGING_FACE_HUB_TOKEN': hf_token,
  'HF_MODEL_QUANTIZE': "bitsandbytes" # quantize weights with bitsandbytes; remove this line to serve unquantized
}

# check if token is set
#assert config['HUGGING_FACE_HUB_TOKEN'] != "<REPLACE WITH YOUR TOKEN>", "Please set your Hugging Face Hub token"
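The four token limits interact. To our understanding, TGI refuses to start unless three conditions hold: the input limit is below the total limit, the prefill budget covers at least one full-length input, and the batch budget covers at least one full-length request. The hypothetical helper below checks these conditions before a slow deployment. The second function converts a token budget into KV-cache memory. It assumes Llama-2-7B's shape (32 layers, hidden size 4096) and float16 cache values, which are assumptions about this model and container.

```python
def check_tgi_token_limits(cfg):
    """Return a list of violated ordering constraints between TGI token limits.

    `cfg` uses the same string values as the SageMaker `config` dict.
    """
    max_input = int(cfg["MAX_INPUT_LENGTH"])
    max_total = int(cfg["MAX_TOTAL_TOKENS"])
    max_batch_total = int(cfg["MAX_BATCH_TOTAL_TOKENS"])
    max_prefill = int(cfg["MAX_BATCH_PREFILL_TOKENS"])
    problems = []
    if not max_input < max_total:
        problems.append("MAX_INPUT_LENGTH must be smaller than MAX_TOTAL_TOKENS")
    if not max_prefill >= max_input:
        problems.append("MAX_BATCH_PREFILL_TOKENS must be at least MAX_INPUT_LENGTH")
    if not max_batch_total >= max_total:
        problems.append("MAX_BATCH_TOTAL_TOKENS must be at least MAX_TOTAL_TOKENS")
    return problems

def kv_cache_bytes(num_tokens, num_layers=32, hidden_size=4096, bytes_per_value=2):
    """KV-cache size: one key and one value vector per layer per token.

    Defaults assume Llama-2-7B (32 layers, hidden size 4096) with float16 values.
    """
    return num_tokens * 2 * num_layers * hidden_size * bytes_per_value

example_config = {
    "MAX_INPUT_LENGTH": "2048",
    "MAX_TOTAL_TOKENS": "4096",
    "MAX_BATCH_TOTAL_TOKENS": "8192",
    "MAX_BATCH_PREFILL_TOKENS": "4096",
}
print(check_tgi_token_limits(example_config))          # []
print(kv_cache_bytes(8192) / 2**30, "GiB of KV cache")  # 4.0 GiB of KV cache
```

In the notebook, `check_tgi_token_limits(config)` also works directly on the dict above, because its values are JSON-encoded integers.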

In [13]:
from sagemaker.huggingface import HuggingFaceModel

# create HuggingFaceModel with the image uri
llm_model = HuggingFaceModel(
  role=role,
  image_uri=llm_image,
  env=config
)

In [15]:
# Deploy model to an endpoint
# https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#sagemaker.model.Model.deploy
llm = llm_model.deploy(
  initial_instance_count=1,
  instance_type=instance_type,
  container_startup_health_check_timeout=health_check_timeout, # 10 minutes to be able to load the model
)



--------------!

In [16]:
endpoint_name = llm.endpoint_name
print(f"The endpoint name is {endpoint_name}")

The endpoint name is huggingface-pytorch-tgi-inference-2023-10-13-03-49-18-031


In [36]:
def build_llama2_prompt(messages):
    """Build a Llama 2 chat prompt from a list of {"role": ..., "content": ...} dicts."""
    startPrompt = "<s>[INST] "
    endPrompt = " [/INST]"
    conversation = []
    for index, message in enumerate(messages):
        if message["role"] == "system" and index == 0:
            # the system prompt is wrapped in <<SYS>> tags inside the first [INST] block
            conversation.append(f"<<SYS>>\n{message['content']}\n<</SYS>>\n\n")
        elif message["role"] == "user":
            conversation.append(message["content"].strip())
        else:
            # assistant turn: close the instruction, add the answer, open the next [INST]
            conversation.append(f" [/INST] {message['content'].strip()}</s><s>[INST] ")

    return startPrompt + "".join(conversation) + endPrompt


messages = [
  {"role": "system", "content": (
      "Below is an instruction that describes a task, paired with an input that provides further context. "
      "Write a response that appropriately completes the request.\n"
      "Analyze and explain the legal reasoning behind the judgment in the given case."
  )}
]

In [39]:
# define format function for our input
# history: list of (user_prompt, bot_response) pairs from earlier turns.
# system_prompt is part of the signature but is not used by this template.
def format_prompt(message, history, system_prompt):
    prompt = ""
    for user_prompt, bot_response in history:
        prompt += f"### Instruction\n{user_prompt}\n\n"
        prompt += f"### Answer\n{bot_response}\n\n"
    prompt += f"### Instruction\n{message}\n\n### Answer\n"
    return prompt

history = []  # no earlier turns
system_prompt = (
    "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n"
    "Analyze and explain the legal reasoning behind the judgment in the given case."
)
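The `### Instruction` / `### Answer` template above differs from the instruction/input/response layout used for the local test at the top of the notebook, and it leaves `system_prompt` unused. The training code is not shown, so this is an assumption: if the adapter was trained on the earlier layout, a formatter that reproduces it may match better. Below is a hypothetical variant with the same `(message, history, system_prompt)` signature. `system_prompt` replaces only the preamble sentence, so pass `None` (as the Gradio call does) to get the default preamble.

```python
DEFAULT_PREAMBLE = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request."
)
INSTRUCTION = "Analyze and explain the legal reasoning behind the judgment in the given case."

def format_alpaca_prompt(message, history, system_prompt=None):
    """Build an instruction/input/response prompt like the one used in the local test.

    `message` becomes the ### Input section; earlier turns in `history`
    (pairs of user input and model response) are replayed in the same layout.
    """
    preamble = system_prompt or DEFAULT_PREAMBLE
    blocks = []
    for user_input, bot_response in history:
        blocks.append(
            f"### Instruction:\n{INSTRUCTION}\n\n### Input:\n{user_input}\n\n### Response:\n{bot_response}"
        )
    blocks.append(f"### Instruction:\n{INSTRUCTION}\n\n### Input:\n{message}\n\n### Response:")
    return f"{preamble}\n\n" + "\n\n".join(blocks)

print(format_alpaca_prompt(
    "Central Inland Water Transport Corporation Ltd. vs Brojo Nath Ganguly & Anr., "
    "1986 AIR 1571, 1986 SCR (2) 278",
    [],
))
```

Replaying earlier turns as complete instruction/input/response blocks is one design choice among several. It keeps every turn in the layout the local test used.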

In [40]:
# define the question (the case citation) and format it as a prompt
message = "Central Inland Water Transport Corporation Ltd. vs Brojo Nath Ganguly & Anr., 1986 AIR 1571, 1986 SCR (2) 278"

prompt = format_prompt(message, history, system_prompt)

# No "parameters" are sent, so the endpoint falls back to its default
# generation settings, which yield only a short completion.
chat = llm.predict({"inputs": prompt})

# drop the prompt, assuming the returned text starts with it
print(chat[0]["generated_text"][len(prompt):])


The appellant is a statutory corporation established under the Central Inland Water Transport
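The `[len(prompt):]` slice assumes that `generated_text` begins with the prompt. If the container returns only the new tokens (TGI exposes a `return_full_text` parameter for this), the slice would silently cut off the start of the answer. Here is a defensive helper; it is hypothetical and not part of the SageMaker SDK.

```python
def extract_completion(generated_text, prompt):
    """Return only the newly generated text.

    Slicing by len(prompt) is correct only when the endpoint echoes the prompt;
    checking the prefix first keeps the full text when it does not.
    """
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text
```

In the notebook it would replace the slice as `print(extract_completion(chat[0]["generated_text"], prompt))`.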


In [41]:
# hyperparameters for llm

parameters= {
    "do_sample": True,
    "top_p": 0.6,
    "temperature": 0.9,
    "top_k": 50,
    "max_new_tokens": 512,
    "repetition_penalty": 1.03,
    "stop": ["</s>"]
  }

payload = {
  "inputs":  prompt,
  "parameters": parameters
}

# send request to endpoint
response = llm.predict(payload)

print(response[0]["generated_text"][len(prompt):])


(a) The question is whether the Central Government can make a rule to make the contract of carriage of goods by inland waterways as a contract of carriage of goods by sea under the Carriage of Goods by Sea Act, 1925. The question is whether the Carriage of Goods by Sea Act, 1925 can be made applicable to the carriage of goods by inland waterways. The answer to the question is in the negative.

(b) The Carriage of Goods by Sea Act, 1925 applies to contracts of carriage of goods by sea. The Carriage of Goods by Sea Act, 1925 does not apply to contracts of carriage of goods by inland waterways.

### Held

(a) The Carriage of Goods by Sea Act, 1925 cannot be made applicable to the carriage of goods by inland waterways. The Central Government cannot make a rule to make the contract of carriage of goods by inland waterways as a contract of carriage of goods by sea under the Carriage of Goods by Sea Act, 1925.

(b) The Carriage of Goods by Sea Act, 1925 applies to contracts of carriage of go
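The `parameters` dict above replaces the endpoint's default decoding with sampling. The minimal pure-Python sketch below shows how `repetition_penalty`, `temperature`, `top_k`, and `top_p` reshape a single next-token distribution. It applies them in the order commonly used by the Hugging Face Transformers logits processors. It illustrates the standard technique rather than TGI's actual code, and the toy logits are made up.

```python
import math
import random

def filter_distribution(logits, prev_tokens, temperature=0.9, top_k=50, top_p=0.6,
                        repetition_penalty=1.03):
    """Turn {token: logit} into the renormalized candidate list a sampler draws from."""
    logits = dict(logits)
    # 1. Repetition penalty: shrink positive logits and grow negative ones
    #    for tokens that were already generated.
    for token in set(prev_tokens):
        if token in logits:
            score = logits[token]
            logits[token] = score / repetition_penalty if score > 0 else score * repetition_penalty
    # 2. Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = {token: score / temperature for token, score in logits.items()}
    # 3. Top-k: keep only the k highest-scoring tokens.
    ranked = sorted(scaled.items(), key=lambda item: item[1], reverse=True)[:top_k]
    # Softmax over the survivors (subtracting the max for numerical stability).
    best = ranked[0][1]
    weights = [(token, math.exp(score - best)) for token, score in ranked]
    total = sum(w for _, w in weights)
    probs = [(token, w / total) for token, w in weights]
    # 4. Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p.
    nucleus, mass = [], 0.0
    for token, p in probs:
        nucleus.append((token, p))
        mass += p
        if mass >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    return [(token, p / mass) for token, p in nucleus]

def sample(nucleus, rng):
    """Draw one token from a renormalized (token, probability) list."""
    r = rng.random()
    for token, p in nucleus:
        r -= p
        if r <= 0:
            return token
    return nucleus[-1][0]  # guard against floating-point round-off

# Toy next-token logits (made up for illustration)
toy_logits = {"the": 2.0, "court": 1.5, "held": 0.5, "article": 0.0, "14": -1.0}
nucleus = filter_distribution(toy_logits, prev_tokens=["the"])
print([token for token, _ in nucleus])  # ['the', 'court']
print(sample(nucleus, random.Random(0)))
```

With `top_p=0.6`, only the two most likely tokens survive here. A low `top_p` combined with a temperature near 1 therefore still keeps generations fairly conservative.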



In [44]:
# add the demo directory (../demo) to the path so sagemaker_chat can be imported
import sys
sys.path.append("../demo")
from sagemaker_chat import create_gradio_app

# create gradio app
create_gradio_app(
    endpoint_name,               # SageMaker endpoint name
    session=sess.boto_session,   # boto3 session used to send requests
    parameters=parameters,       # Request parameters
    system_prompt=None,          # System prompt to use
    format_prompt=format_prompt, # Function to format prompt
    concurrency_count=4,         # Number of concurrent requests
    share=True,                  # Share app publicly
)

Running on local URL:  http://127.0.0.1:7862
Running on public URL: https://2d619855cd716c0bbd.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
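`create_gradio_app` comes from the `../demo/sagemaker_chat.py` module, whose source is not shown here. Outside Gradio, the same endpoint can be queried through the SageMaker runtime API in boto3 (`invoke_endpoint`). The minimal sketch below uses a hypothetical helper name and takes the client as an argument so the function can be exercised with a stub. It assumes the TGI response shape seen above, a list whose first element has a `generated_text` field.

```python
import json

def query_endpoint(runtime_client, endpoint_name, prompt, parameters=None):
    """Send a TGI-style JSON request to a SageMaker endpoint and return generated_text.

    runtime_client is expected to be boto3.client("sagemaker-runtime") or an equivalent stub.
    """
    payload = {"inputs": prompt}
    if parameters:
        payload["parameters"] = parameters
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    result = json.loads(response["Body"].read())  # e.g. [{"generated_text": "..."}]
    return result[0]["generated_text"]
```

In the notebook this would be called as `query_endpoint(sess.boto_session.client("sagemaker-runtime"), endpoint_name, prompt, parameters)`.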




In [None]:
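# Uncomment to delete the model and endpoint when finished; the endpoint is billed while it is running.
#llm.delete_model()
#llm.delete_endpoint()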
