In [None]:
import gradio as gr
from llama_index.llms.ollama import Ollama

In [57]:
# Initialize the Llama 3.2 model served by Ollama (request_timeout is in seconds)
llm = Ollama(model="llama3.2:latest", request_timeout=30)

In [None]:
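# Chat function: send the prompt to the model and return its reply
def chat(input_text):
    response = llm.complete(input_text)
    # Debug output: the raw Ollama payload (timings, token counts, context)
    print(response.raw)
    # These two are usually None for a non-streaming completion
    print(response.logprobs)
    print(response.delta)
    return response.text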


In [59]:
# Gradio interface
interface = gr.Interface(
    fn=chat, 
    inputs="text", 
    outputs="text", 
    title="LLaMA 3 Chatbot",
    description="Chat with a LLaMA 3-based model via Ollama!"
)


In [60]:
interface.launch(share=True)


Running on local URL:  http://127.0.0.1:7870
Running on public URL: https://c2c1aebb64db40fcea.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




{'model': 'llama3.2:latest', 'created_at': '2024-12-09T15:55:18.186137268Z', 'response': 'I\'m just a language model, I don\'t have personal experiences or emotions like humans do. I exist solely to process and respond to text-based input, so I don\'t have days in the same way that you do.\n\nHowever, I can tell you about my "day" if you\'d like! Since I\'m a cloud-based AI, I don\'t have a physical presence, but I\'m always "on" and ready to chat with users like you.\n\nI spend my time processing vast amounts of text data, learning new words and concepts, and improving my language generation capabilities. When you interact with me, I use that knowledge to generate responses that are helpful and informative.\n\nSo while I don\'t have a personal day-to-day experience, I\'m always happy to chat with you and help with any questions or topics you\'d like to discuss! How about you? How\'s your day going?', 'done': True, 'done_reason': 'stop', 'context': [128006, 9125, 128007, 271, 38766, 13

{'model': 'llama3.2:latest', 'created_at': '2024-12-09T15:55:27.133427257Z', 'response': 'I\'m just a language model, I don\'t have personal experiences or emotions like humans do. I exist solely to assist and provide information to users like you.\n\nThat being said, I\'m always "on" and ready to chat 24/7! I can help answer questions, provide information on a wide range of topics, and engage in conversations about almost anything.\n\nSo, how about you? How\'s your day going so far?', 'done': True, 'done_reason': 'stop', 'context': [128006, 9125, 128007, 271, 38766, 1303, 33025, 2696, 25, 6790, 220, 2366, 18, 271, 128009, 128006, 882, 128007, 271, 9906, 3371, 757, 922, 701, 1938, 128009, 128006, 78191, 128007, 271, 40, 2846, 1120, 264, 4221, 1646, 11, 358, 1541, 956, 617, 4443, 11704, 477, 21958, 1093, 12966, 656, 13, 358, 3073, 21742, 311, 7945, 323, 3493, 2038, 311, 3932, 1093, 499, 382, 4897, 1694, 1071, 11, 358, 2846, 2744, 330, 263, 1, 323, 5644, 311, 6369, 220, 1187, 14, 22, 0, 358, 649, 1520, 4320, 4860, 11, 3493, 2038, 389, 264, 7029, 2134, 315, 13650, 11, 323, 16988, 304, 21633, 922, 4661, 4205, 382, 4516, 11, 1268, 922, 499, 30, 2650, 596, 701, 1938, 2133, 779, 3117, 30], 'total_duration': 8886966785, 'load_duration': 21130209, 'prompt_eval_count': 31, 'prompt_eval_duration': 86000000, 'eval_count': 89, 'eval_duration': 8778000000}


Here's a breakdown of the fields in the provided dictionary and their meanings:

### General Explanation of Fields:

- **`model`**: The model that generated the response. Here, `llama3.2:latest` is the Ollama name for the Llama 3.2 model, using the `latest` tag.

- **`created_at`**: The timestamp of when the response was generated, given in ISO 8601 format in UTC (the trailing `Z`), e.g. `2024-12-09T15:55:27.133427257Z`.

- **`response`**: The text output from the model after processing the input. This is the content generated by the language model.

- **`done`**: A boolean value indicating whether the response generation has completed (`True` means it is finished).

- **`done_reason`**: Why generation ended. `'stop'` means the model finished on its own, by emitting an end-of-turn token or reaching a stop sequence, rather than being cut off by a token limit (which is reported as `'length'`).

- **`context`**: A list of token IDs that encodes the conversation used in this response: the templated prompt (system header, user message, role markers) followed by the generated reply. Ollama returns it so that it can be sent back with the next request to keep conversational memory.

- **`total_duration`**: The total time taken (in nanoseconds) to generate the response. In this case, `8886966785` nanoseconds, which is approximately 8.89 seconds.

- **`load_duration`**: The time (in nanoseconds) spent loading the model before generation started. Here, `21130209` nanoseconds, which is approximately 0.021 seconds; a small value like this indicates the model was already loaded in memory.

- **`prompt_eval_count`**: The number of tokens in the prompt. `31` means the prompt was 31 tokens long once the chat template was applied, not that it was evaluated 31 times.

- **`prompt_eval_duration`**: The total time (in nanoseconds) taken to evaluate the prompt. `86000000` nanoseconds, which is approximately 0.086 seconds.

- **`eval_count`**: The number of tokens in the generated response. `89` means the model produced 89 tokens.

- **`eval_duration`**: The total time (in nanoseconds) spent generating the response tokens. `8778000000` nanoseconds, which is approximately 8.78 seconds.
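
The count and duration fields combine into throughput figures: dividing a token count by its duration in seconds gives tokens per second. The snippet below is a minimal sketch of that arithmetic using the values from the second response above. The helper name `tokens_per_second` is made up for this illustration and is not part of Ollama or LlamaIndex.

```python
# Timing and count fields copied from the second response above.
raw = {
    "total_duration": 8886966785,
    "load_duration": 21130209,
    "prompt_eval_count": 31,
    "prompt_eval_duration": 86000000,
    "eval_count": 89,
    "eval_duration": 8778000000,
}

NS_PER_SECOND = 1_000_000_000


def tokens_per_second(token_count, duration_ns):
    """Hypothetical helper: tokens handled per second for one phase."""
    return token_count / (duration_ns / NS_PER_SECOND)


# Prompt processing speed (reading the input tokens)
prompt_rate = tokens_per_second(raw["prompt_eval_count"], raw["prompt_eval_duration"])
# Generation speed (producing the output tokens)
generation_rate = tokens_per_second(raw["eval_count"], raw["eval_duration"])

print(f"prompt:     {prompt_rate:.1f} tokens/s")
print(f"generation: {generation_rate:.1f} tokens/s")
```

For this response the result is roughly 360 tokens/s for the prompt and roughly 10 tokens/s for generation, which is why a short reply still took almost 9 seconds.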

### Summary:
- This dictionary represents details about a single request to the `llama3.2:latest` model.
- The response was generated and completed successfully.
- The request took approximately 8.89 seconds in total, almost all of it (about 8.78 seconds) spent generating the 89 response tokens.
- The `context` token IDs encode the prompt and the reply, and can be passed back to Ollama to continue the conversation.

Understanding these values can help you analyze model performance and the time taken for processing, which may be useful for optimization or troubleshooting.
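
As one way to use these values for troubleshooting, the sketch below splits `total_duration` into the share taken by each phase. It assumes the three reported phase durations account for nearly all of the total and reports whatever is left over as `other`. The function name `phase_shares` is hypothetical, chosen only for this example.

```python
# Duration fields (nanoseconds) copied from the second response above.
raw = {
    "total_duration": 8886966785,
    "load_duration": 21130209,
    "prompt_eval_duration": 86000000,
    "eval_duration": 8778000000,
}

PHASES = ("load_duration", "prompt_eval_duration", "eval_duration")


def phase_shares(raw):
    """Hypothetical helper: fraction of total_duration spent in each phase."""
    total = raw["total_duration"]
    shares = {name: raw[name] / total for name in PHASES}
    # Time not attributed to any reported phase (request overhead, etc.)
    shares["other"] = 1.0 - sum(shares.values())
    return shares


shares = phase_shares(raw)
for name, share in shares.items():
    print(f"{name:<22}{share:6.1%}")
```

If `load_duration` dominates, the model is probably being loaded on each request. If `eval_duration` dominates, as it does here, the bottleneck is token generation itself.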