# Deploy the `microsoft/phi-2` model on MonsterAPI using Monster Deploy

Monster Deploy is a new LLM deployment engine that lets you serve various LLMs, along with LoRA adapters, as an API endpoint on MonsterAPI's robust and cost-optimised GPU cloud.

The following deployment options are supported:
1. Deploy SOTA LLMs and fine-tuned LLM LoRA adapters as a REST API serving endpoint
2. Deploy Docker containers for GPU-powered applications

Monster Deploy offers built-in optimisations for higher throughput and lower cost when serving LLMs.

Check out our [Developer Docs](https://developer.monsterapi.ai/docs/monster-deploy-beta).

If you haven't applied for the Deploy beta yet, you can sign up through this [Google form](https://forms.gle/ZHuZt68fJLRozo3v9) for early access with free credits.

Make sure you have signed up for beta access [here](https://forms.gle/TTJRapHm59RxjttJA).

Sign up on [MonsterAPI](https://monsterapi.ai/signup?utm_source=llm-deploy-colab&utm_medium=referral) to get a free auth key, then paste it below:

In [None]:
api_key = "YOUR_MONSTERAPI_KEY"
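If you would rather not paste the key into the notebook itself, one option is to read it from an environment variable. The sketch below is only an illustration: the variable name `MONSTER_API_KEY` and the helper `load_api_key` are hypothetical and not part of the MonsterAPI client.

```python
import os

PLACEHOLDER = "YOUR_MONSTERAPI_KEY"


def load_api_key(env_var="MONSTER_API_KEY", fallback=PLACEHOLDER):
    """Return the key stored in `env_var`, or `fallback` if it is unset or empty.

    `MONSTER_API_KEY` is a hypothetical name chosen for this sketch.
    """
    return os.environ.get(env_var) or fallback
```

With this helper, `api_key = load_api_key(fallback=api_key)` keeps the pasted value whenever the environment variable is not set.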

### Install and Initialize MonsterAPI Client

In [None]:
!python3 -m pip install monsterapi==1.0.2b3

from monsterapi import client as mclient
deploy_client = mclient(api_key = api_key)

### Create a `microsoft/phi-2` model deployment:

In [None]:
launch_payload = {
    "basemodel_path": "microsoft/phi-2",
    "prompt_template": "Instruct: {prompt}.Output:{completion}",
    "per_gpu_vram": 16,
    "gpu_count": 1,
    "use_nightly": True
}

# Launch a deployment
ret = deploy_client.deploy("llm", launch_payload)
deployment_id = ret.get("deployment_id")
print(deployment_id)
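To see what the `prompt_template` above looks like once its placeholders are filled, the sketch below renders it locally with Python's `str.format`. This assumes the service substitutes `{prompt}` and `{completion}` in a similar way, which the notebook does not confirm; the example question is made up for illustration.

```python
from string import Formatter

template = "Instruct: {prompt}.Output:{completion}"

# List the placeholder names the template expects.
fields = [name for _, name, _, _ in Formatter().parse(template) if name]
print(fields)

# Render locally, leaving the completion empty as it would be at inference time.
# Assumption: the server fills placeholders much like str.format does.
rendered = template.format(prompt="What is 2 + 2?", completion="")
print(rendered)
```

Only keys that appear as placeholders in the template affect the rendered text under this assumption.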

### Fetch your Deployment Status:

Wait until the status is `live`. It should take 5-10 minutes.

In [None]:
status_ret = deploy_client.get_deployment_status(deployment_id)
print(status_ret)
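Rather than re-running the cell by hand, you could poll until the deployment reports `live`. The helper below is a minimal sketch: `wait_until_live`, its polling interval, and its timeout are assumptions made for illustration, and it only relies on the status call returning a dict with a `status` key, as in the cell above.

```python
import time


def wait_until_live(get_status, deployment_id, poll_seconds=30,
                    timeout_seconds=1200, sleep=time.sleep):
    """Call `get_status(deployment_id)` until its "status" is "live".

    `get_status` is any callable returning a dict, for example
    `deploy_client.get_deployment_status`. Raises TimeoutError if the
    deployment is still not live after `timeout_seconds`.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status_ret = get_status(deployment_id)
        # Compare case-insensitively in case the casing differs.
        if str(status_ret.get("status", "")).lower() == "live":
            return status_ret
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Deployment {deployment_id} not live after {timeout_seconds}s"
            )
        sleep(poll_seconds)
```

In this notebook it would be called as `status_ret = wait_until_live(deploy_client.get_deployment_status, deployment_id)`.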

### Once the deployment is live, let's query our deployed LLM endpoint:

In [None]:
import json

assert status_ret.get("status") == "live", "Please wait until status is live!"

# Connect to the deployed endpoint using the token and URL from the status response
service_client = mclient(api_key=status_ret.get("api_auth_token"), base_url=status_ret.get("URL"))

payload = {
    "input_variables": {"system": "You are a friendly chatbot good at logical reasoning and you provide complete answers.",
        "prompt": "Write a detailed analogy between mathematics and physics?"},
    "stream": False,
    "temperature": 0.7,
    "max_tokens": 256
}

output = service_client.generate(model = "deploy-llm", data = payload)

if payload.get("stream"):
    for i in output:
        print(i[0])
else:
    print(json.loads(output)['text'][0])



 Mathematics is like the foundation of a building, providing the structure and support for the entire structure. In the same way, physics is like the laws of nature that govern the behavior of everything in the universe. Without the foundation of mathematics, physics would not be able to accurately describe and predict the behavior of the physical world. Similarly, without the laws of nature provided by physics, mathematics would not be able to accurately model and predict the behavior of the mathematical concepts.



------

### Terminate Deployment

Once your work is done, you can terminate your LLM deployment to stop account billing.

In [None]:
terminate_return = deploy_client.terminate_deployment(deployment_id)
print(terminate_return)
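Because a running deployment keeps billing, it can help to make termination automatic even when a later cell raises an error. The context manager below is a sketch of that idea; `temporary_deployment` is a hypothetical helper, and it only uses the `deploy` and `terminate_deployment` calls shown in this notebook.

```python
from contextlib import contextmanager


@contextmanager
def temporary_deployment(client, payload, kind="llm"):
    """Launch a deployment and always terminate it when the block exits.

    `client` is expected to offer `deploy(kind, payload)` and
    `terminate_deployment(deployment_id)`, as `deploy_client` does above.
    """
    ret = client.deploy(kind, payload)
    deployment_id = ret.get("deployment_id")
    try:
        yield deployment_id
    finally:
        # Runs on normal exit and on exceptions alike.
        if deployment_id is not None:
            client.terminate_deployment(deployment_id)
```

Usage would look like `with temporary_deployment(deploy_client, launch_payload) as deployment_id: ...`, with the status checks and queries placed inside the `with` block.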