In [1]:
import os , json , time

# Load your credentials from a JSON file
with open('.env.json', 'r') as file:
    credentials = json.load(file)

# Hugging Face APIs

## Example 1: Sentence Completion**

Let’s look at how we can use Bloom for sentence completion. The code below uses the hugging face token for API to send an API call with the input text and appropriate parameters for getting the best response.

In [20]:
from huggingface_hub import InferenceClient


inference = InferenceClient(model = "bigscience/bloom",token=credentials['huggingface_API_KEY'])

In [21]:
def infer(prompt:str,
          max_length = 128,
          top_k = 0,
          num_beams = 0,
          no_repeat_ngram_size = 2,
          top_p = 0.9,
          seed=42,
          temperature=0.7,
          greedy_decoding = False,
          return_full_text = False):
    

    top_k = None if top_k == 0 else top_k
    do_sample = False if num_beams > 0 else not greedy_decoding
    num_beams = None if (greedy_decoding or num_beams == 0) else num_beams
    no_repeat_ngram_size = None if num_beams is None else no_repeat_ngram_size
    top_p = None if num_beams else top_p
    early_stopping = None if num_beams is None else num_beams > 0
    
    params = {
            "max_new_tokens": max_length,
            "top_k": top_k,
            "top_p": top_p,
            "temperature": temperature,
            "do_sample": do_sample,
            "seed": seed,
            "early_stopping":early_stopping,
            "no_repeat_ngram_size":no_repeat_ngram_size,
            "num_beams":num_beams,
            "return_full_text":return_full_text
        }
    
    s = time.time()
    response = inference.post(json = {"inputs": prompt, "params": params})
    #print(f"Processing time was {proc_time} seconds")
    return json.loads(response.decode('utf-8'))

In [30]:
print(infer(prompt='The thing that makes large language models interesting is ')[0]['generated_text'])

The thing that makes large language models interesting is  that they are very large. The largest models are in the order of billions of parameters. This is


## Example 2: Question Answers**

We can use the API for the Roberta-base model which can be a source to refer to and reply to. Let’s change the payload to provide some information about myself and ask the model to answer questions based on that.

### Using PIPLINE 

In [4]:
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "deepset/roberta-base-squad2"
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
QA_input = {
    'question': 'Why is model conversion important?',
    'context': 'The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.'
}
res = nlp(QA_input)

config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/496M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/79.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/772 [00:00<?, ?B/s]

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


In [5]:
res

{'score': 0.21171456575393677,
 'start': 59,
 'end': 84,
 'answer': 'gives freedom to the user'}

### Without PIPLINE 

In [9]:

import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = 'deepset/roberta-base-squad2'
model = AutoModelForQuestionAnswering.from_pretrained(model_name,return_dict=False)
tokenizer = AutoTokenizer.from_pretrained(model_name)

question = "What has Huggingface done ?"
text = "Huggingface has democratized NLP. Huge thanks to Huggingface for this."

encoding = tokenizer(question, text, return_tensors="pt")
input_ids = encoding["input_ids"]
# Transform input tokens



# default is local attention everywhere
# the forward method will automatically set global attention on question tokens
attention_mask = encoding["attention_mask"]

start_scores, end_scores = model(input_ids, attention_mask=attention_mask)
all_tokens = tokenizer.convert_ids_to_tokens(input_ids[0].tolist())

answer_tokens = all_tokens[torch.argmax(start_scores) :torch.argmax(end_scores)+1]
answer = tokenizer.decode(tokenizer.convert_tokens_to_ids(answer_tokens))
answer

' democratized NLP'

## Example 3: Summarization

We can summarize using Large Language Models. Let’s summarize a long text describing large language models using the Bart Large CNN model. We modify the API URL and added the input text below:

In [37]:
from pprint import pprint
import requests

API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-cnn"

def query(payload):
    headers = {
        "Authorization": f"Bearer {credentials['huggingface_API_KEY']}",
    }
    response = requests.post(API_URL, json=payload , headers=headers)
    return response.json()

params = {'do_sample': False}

full_text = '''AI applications are summarizing articles, writing stories and
engaging in long conversations — and large language models are doing
the heavy lifting.

A large language model, or LLM, is a deep learning model that can
understand, learn, summarize, translate, predict, and generate text and other
content based on knowledge gained from massive datasets.

Large language models - successful applications of
transformer models. They aren’t just for teaching AIs human languages,
but for understanding proteins, writing software code, and much, much more.

In addition to accelerating natural language processing applications —
like translation, chatbots, and AI assistants — large language models are
used in healthcare, software development, and use cases in many other fields.'''

output = query({
    'inputs': full_text,
    'parameters': params
})

pprint(output)

[{'summary_text': 'A large language model, or LLM, is a deep learning model '
                  'that can understand, learn, summarize, translate, predict, '
                  'and generate text. They aren’t just for teaching AIs human '
                  'languages, but for understanding proteins, writing software '
                  'code, and much more.'}]


In [46]:
print(output)

[{'summary_text': 'A large language model, or LLM, is a deep learning model that can understand, learn, summarize, translate, predict, and generate text. They aren’t just for teaching AIs human languages, but for understanding proteins, writing software code, and much more.'}]


## Example 4 langchain


In [51]:
# pip install langchain-community

In [52]:
from transformers import load_tool, ReactCodeAgent, HfEngine
from langchain.agents import load_tools

In [53]:
from transformers import load_tool, ReactCodeAgent, HfEngine
from langchain.agents import load_tools

image_tools = load_tool("m-ric/text-to-image")

mistral_engine = HfEngine("mistralai/Mixtral-8x7B-Instruct-v0.1")
agent = ReactCodeAgent(tools=[image_tools], llm_engine= mistral_engine)

purple_alien = agent.run(
    "Generate an image of a violet boat on a lake, with a red comet in the sky.",
)
purple_alien

You're loading a tool from the Hub from None. Please make sure this is a source that you trust as the code within that tool will be executed on your machine. Always verify the code of the tools that you load. We recommend specifying a `revision` to ensure you're loading the code that you have checked.


tool_config.json:   0%|          | 0.00/412 [00:00<?, ?B/s]

tool.py:   0%|          | 0.00/666 [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/spaces/m-ric/text-to-image:
- tool.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
[37;1mGenerate an image of a violet boat on a lake, with a red comet in the sky.[0m
[31;20mError in generating llm output: (ReadTimeoutError("HTTPSConnectionPool(host='api-inference.huggingface.co', port=443): Read timed out. (read timeout=120)"), '(Request ID: a78e5584-0612-4845-80cf-0777c3d52898)').[0m
Traceback (most recent call last):
  File "/Users/abdulwahabmac/Desktop/MyFiles/Projects/Training/Tuwaiq/Tuwaiq-LLM-28-July/.env/lib/python3.10/site-packages/urllib3/connectionpool.py", line 536, in _make_request
    response = conn.getresponse()
  File "/Users/abdulwahabmac/Desktop/MyFiles/Projects/Training/Tuwaiq/Tuwaiq-LLM-28-July/.env/lib/python3.10/site-packages/urllib3/connection.py", line 464, in getresponse
   

'Error in generating final llm output: 429 Client Error: Too Many Requests for url: https://api-inference.huggingface.co/models/mistralai/Mixtral-8x7B-Instruct-v0.1/v1/chat/completions (Request ID: XkNZ5dLBxAzyspGjiIZPf)\n\nRate limit reached. Please log in or use a HF access token.'

In [55]:
purple_alien

'Error in generating final llm output: 429 Client Error: Too Many Requests for url: https://api-inference.huggingface.co/models/mistralai/Mixtral-8x7B-Instruct-v0.1/v1/chat/completions (Request ID: XkNZ5dLBxAzyspGjiIZPf)\n\nRate limit reached. Please log in or use a HF access token.'