# What is Hugging Face?

Hugging Face is a machine learning (ML) and data science platform and community that helps users build, deploy and train machine learning models.

It provides the infrastructure to demo, run and deploy artificial intelligence (AI) in live applications. Users can also browse through models and data sets that other people have uploaded. Hugging Face is often called the GitHub of machine learning because it lets developers share and test their work openly.

Hugging Face is known for its Transformers Python library, which simplifies the process of downloading and training ML models. The library gives developers an efficient way to include one of the ML models hosted on Hugging Face in their workflow and create ML pipelines.

The platform is important because of its open source nature and deployment tools. It allows users to share resources, models and research and to reduce model training time, resource consumption and environmental impact of AI development.

Hugging Face Inc. is the American company that created the Hugging Face platform. The company was founded in New York City in 2016 by French entrepreneurs Clément Delangue, Julien Chaumond and Thomas Wolf. The company originally developed a chatbot app by the same name for teenagers. The company switched its focus to being a machine learning platform after open sourcing the model behind the chatbot app.

In 2023, the company announced a partnership with Amazon Web Services to make Hugging Face products available to AWS customers for building custom applications. Google, Amazon and Nvidia are just a few of the companies that have invested in the startup as of this writing.

Source: https://www.techtarget.com/whatis/definition/Hugging-Face

#### Prerequisites

In [7]:
# Install the libraries
!pip install langchain-huggingface  # LangChain integration for Hugging Face models
# For API calls
!pip install huggingface_hub  # client for the Hugging Face Hub, which hosts a large number of model repositories
!pip install transformers  # provides pipeline(), used to run Hub models locally
!pip install accelerate  # helps place and load large models on the available devices
!pip install bitsandbytes  # 8-bit / 4-bit quantization support
!pip install langchain


In [8]:
# %pip install --upgrade --quiet huggingface_hub

1. To get started with Hugging Face, go to https://huggingface.co/.
2. Log in with your credentials.
3. If you don't have an account, click Sign Up and create one.
4. Open Settings.
5. Click Access Tokens.
6. Create a new token with Read permission.
7. In Google Colab, add the token under the Secrets section.
8. Read it in your notebook with the code below.


If you are using VS Code, store the token in an environment variable (for example in a `.env` file) and load it with `load_dotenv` from the `python-dotenv` package.
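A minimal sketch of a loader that covers both setups, assuming the secret is named `HF_TOKEN` and, outside Colab, kept in a `.env` file (a line such as `HF_TOKEN=hf_...`) that `python-dotenv` can read. The function name `get_hf_token` is hypothetical, and it reports only whether a token was found rather than printing it.

```python
import os
from typing import Optional


def get_hf_token(name: str = "HF_TOKEN") -> Optional[str]:
    """Return the Hugging Face token from Colab secrets, a .env file, or the environment."""
    # 1. Google Colab: read from the Secrets panel (this import only works inside Colab).
    try:
        from google.colab import userdata
        return userdata.get(name)
    except Exception:
        pass  # not in Colab, or the secret is missing / notebook access was not granted

    # 2. Local setup (e.g. VS Code): load a .env file if python-dotenv is installed.
    try:
        from dotenv import load_dotenv
        load_dotenv()  # does not override variables that are already set
    except ImportError:
        pass

    # 3. Fall back to a plain environment variable (None if it is not set).
    return os.getenv(name)


token = get_hf_token()
print("Token found:", token is not None)  # never print the token itself
```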


In [None]:
# Read the token from Colab Secrets
from google.colab import userdata
sec_key = userdata.get("HF_TOKEN")
print("Token loaded:", sec_key is not None)  # avoid printing the secret itself


## HuggingFaceEndpoint: How to Access Hugging Face Models via the API

You can browse the available models at https://huggingface.co/models and filter them by task, for example:
1. For document Q&A, select Document Question Answering and pick any model you want.
2. For image models, select Image-to-Text and pick any model you want.

There are two ways to use the `HuggingFaceEndpoint` class. You can specify the model with the `repo_id` parameter; such endpoints use the serverless Inference API, which is particularly beneficial for people with PRO accounts or Enterprise Hub. Still, regular users already get a fair number of requests by providing their HF token in the environment where the code runs. Alternatively, you can pass the URL of a dedicated Inference Endpoint with the `endpoint_url` parameter.
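`HuggingFaceEndpoint` accepts either a `repo_id` (serverless Inference API) or an `endpoint_url` (a dedicated Inference Endpoint). The sketch below is a hypothetical helper, `endpoint_kwargs`, that builds the keyword arguments for one or the other; requiring exactly one target is this sketch's own design choice, and the URL used in the checks is a placeholder.

```python
from typing import Any, Optional


def endpoint_kwargs(
    repo_id: Optional[str] = None,
    endpoint_url: Optional[str] = None,
    **generation: Any,
) -> dict:
    """Build keyword arguments for HuggingFaceEndpoint with exactly one model target."""
    # Require exactly one of the two targets (a rule chosen by this helper).
    if (repo_id is None) == (endpoint_url is None):
        raise ValueError("Pass exactly one of repo_id or endpoint_url")
    target = {"repo_id": repo_id} if repo_id is not None else {"endpoint_url": endpoint_url}
    # Generation settings such as temperature or max_new_tokens pass through unchanged.
    return {**target, **generation}


serverless = endpoint_kwargs(
    repo_id="mistralai/Mistral-7B-Instruct-v0.3", temperature=0.7, max_new_tokens=128
)
print(serverless)
```

Passing the result as `HuggingFaceEndpoint(**serverless)` would configure the same model used later in this notebook; for a dedicated endpoint you would call `endpoint_kwargs(endpoint_url=...)` with your own endpoint's URL.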

# Calling the Model
If you already know which model you want, search for it on the Hub, copy its ID, and use that model.

There are two ways to call a model:
1. Download the model and run it locally (for example in Google Colab, on the session's RAM/GPU). This requires a large amount of RAM and GPU memory.

2. Call the same model through an endpoint (an API), which avoids the local hardware requirements.

I am going to use the second approach in this project.
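To put the hardware requirement of local execution in numbers, a common rule of thumb is that the weights alone take roughly (number of parameters) × (bytes per parameter). The sketch below applies that rule to a nominal 7-billion-parameter model; the function name `weight_memory_gb` is hypothetical, and the estimate ignores activations, the KV cache, and framework overhead, so real usage is higher.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Rough size of model weights in gigabytes (1 GB = 1e9 bytes)."""
    # Weights only: activations, KV cache and framework overhead come on top.
    return num_params * bytes_per_param / 1e9


# Approximate figures for a nominal 7-billion-parameter model.
for label, nbytes in [("float32", 4), ("float16/bfloat16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    print(f"{label:>17}: {weight_memory_gb(7e9, nbytes):.1f} GB")
```

With 16-bit weights a 7B model already needs about 14 GB before any activations, which is close to the 16 GB of a Colab T4 GPU; the 8-bit and 4-bit rows show the kind of savings that quantization libraries such as `bitsandbytes` aim for.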

# Hugging Face Hub

The `huggingface_hub` library was already installed in the prerequisites section. It is the Python client for the Hugging Face Hub, which hosts the models (including LLMs), datasets, and Spaces.

It allows you to:

- Browse and search models, datasets, and Spaces.
- Download models or use them via the API.
- Push your own models.
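A minimal sketch of the first two capabilities with `huggingface_hub`, assuming network access: `HfApi.list_models` searches the Hub and `hf_hub_download` fetches a single file from a repository into the local cache. The helper names `search_model_ids` and `load_model_config` are hypothetical; pushing models needs a token with write permission and is not shown.

```python
import json

from huggingface_hub import HfApi, hf_hub_download


def search_model_ids(query: str, limit: int = 5) -> list[str]:
    """Return the IDs of up to `limit` Hub models matching `query`."""
    api = HfApi()
    return [model.id for model in api.list_models(search=query, limit=limit)]


def load_model_config(repo_id: str) -> dict:
    """Download a repository's config.json into the local cache and parse it."""
    path = hf_hub_download(repo_id=repo_id, filename="config.json")
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `search_model_ids("mistral")` lists matching repository IDs, and `load_model_config("openai-community/gpt2")` returns GPT-2's configuration as a dictionary.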

In [10]:
from langchain_huggingface import HuggingFaceEndpoint

In [None]:
# Read the token from Colab Secrets
from google.colab import userdata
sec_key = userdata.get("HUGGINGFACEHUB_API_TOKEN")  # the Hugging Face access token and the API token are the same thing
print("Token loaded:", sec_key is not None)  # avoid printing the secret itself

In [12]:
# HuggingFaceEndpoint needs your Hugging Face token; expose it as an environment variable

import os
os.environ["HUGGINGFACEHUB_API_TOKEN"] = sec_key

In [13]:
# With the token set, you can call models hosted on Hugging Face; some models are paid, but most are free
# repo_id is the ID of the model you want to call through the API

repo_id = "mistralai/Mistral-7B-Instruct-v0.3"

# Configure the LLM model
llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    temperature=0.7,
    max_new_tokens=128
)

In [14]:
llm

HuggingFaceEndpoint(repo_id='mistralai/Mistral-7B-Instruct-v0.3', huggingfacehub_api_token='hf_****', max_new_tokens=128, temperature=0.7, stop_sequences=[], server_kwargs={}, model_kwargs={}, model='mistralai/Mistral-7B-Instruct-v0.3', client=<InferenceClient(model='mistralai/Mistral-7B-Instruct-v0.3', timeout=120)>, async_client=<InferenceClient(model='mistralai/Mistral-7B-Instruct-v0.3', timeout=120)>)

# 1. Just invoke the LLM to see the response

In [15]:
llm.invoke("What is Generative AI?")

'\n\nGenerative AI is a type of artificial intelligence that uses machine learning to create new content or data, such as images, videos, text, or music, based on patterns it has learned from existing data. It works by training a model on a large dataset and then using that model to generate new data that is similar to the data it was trained on.\n\nThere are several types of generative AI, including:\n\n1. Generative adversarial networks (GANs): A type of neural network that consists of two parts: a generator and a discriminator. The generator creates new data, while the discrim'

# 2. Create a Prompt Template to Call the LLM

In [16]:
from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate

question = "who won ICC ODI cricket world cup in 2011?"
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])
print(prompt)

input_variables=['question'] input_types={} partial_variables={} template="Question: {question}\nAnswer: Let's think step by step."


In [17]:
llm_chain = LLMChain(llm=llm, prompt=prompt)
llm_chain.invoke(question)



{'question': 'who won ICC ODI cricket world cup in 2011?',
 'text': '\n1. First, we know that the International Cricket Council (ICC) organizes the ODI cricket world cup.\n2. Next, we need to find out which team won the ICC ODI cricket world cup in 2011.\n3. The 2011 ICC ODI Cricket World Cup was won by the team from India.\n\nSo, the answer is India.'}

# 3. Hugging Face Pipeline

***Use HuggingFacePipeline when you want to download the entire model and run it in your local environment.***

#### Disadvantage: very large or multimodal models may not fit in local RAM/GPU memory, so they can't always be run locally

### Let's try a small model

Among transformers, the Pipeline is the most versatile tool in the Hugging Face toolbox. Because LangChain is designed primarily for RAG and agent use cases, the scope of the pipeline here is reduced to the following text-centric tasks: `text-generation`, `text2text-generation`, `summarization`, and `translation`.
Models can be loaded directly with the `from_model_id` method.

In [18]:
# import libraries

from langchain_huggingface import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

In [19]:
model_id = "gpt2"  # https://huggingface.co/openai-community/gpt2 (this model is downloaded into your Google Colab session)

model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)


In [20]:
# Build a text-generation pipeline from the model and tokenizer loaded above; max_new_tokens caps the length of each reply

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=100)
hf = HuggingFacePipeline(pipeline=pipe)

Device set to use cuda:0


In [21]:
hf.invoke("what is machine learning?")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


"what is machine learning? I'm always amazed when I hear people say the same thing.\n\nMy current system is a bit smaller than my previous ones, but it works. It takes a few seconds to complete and it is very quick. I think the biggest advantage it has over the previous ones is that it has a lot more memory. I'm still learning in a very short time. I'm not sure if this will be a big advantage or not, but I think it will be an advantage for the future."

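GPT-2 is a base language model, not an instruction-tuned chat model, so it continues the prompt rather than answering it, and the `text-generation` output repeats the prompt at the start. A small hypothetical helper, `generated_part`, that keeps only the newly generated text; it is plain string handling, not part of the LangChain or transformers API.

```python
def generated_part(prompt: str, output: str) -> str:
    """Drop the echoed prompt from a text-generation result, if present."""
    # str.removeprefix (Python 3.9+) leaves the text unchanged if it doesn't start with prompt.
    return output.removeprefix(prompt).strip()


raw = "what is machine learning? I'm always amazed when I hear people say the same thing."
print(generated_part("what is machine learning?", raw))
```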
In [22]:
# This is how you can download a model and run it locally with HuggingFacePipeline and transformers. Keep in mind that larger models require high-end hardware.


# Use HuggingFacePipeline with GPU config (T4 GPU)

In [23]:
gpu_llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    device=0,  # 0 = first GPU (cuda:0), -1 = CPU
    model_kwargs={"temperature":0, "max_length":100}
)

The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Device set to use cuda:0


In [24]:
from langchain_core.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

In [25]:
chain = prompt | gpu_llm

In [26]:
chain.invoke("What is an Artificial Intelligence?")

'Question: What is an Artificial Intelligence?\n\nAnswer: Let\'s think step by step. It\'s like a giant machine. There are some things that you can do with it. You can make it smarter. You can get rid of the problems. You can make it more efficient. That\'s the important thing. But that\'s not what humans are doing.\n\nThe thing that is really interesting is that there are a lot of things that are going on for a machine. Those are the things that are important to humans to do – they\'re the things that we need. But there are also a lot of things that are the things that we can do with technology.\n\nWhen I was at MIT, I was working on a project called "Deep Neural Networks" – I had an idea of what that was. I started doing a research project on it, and I asked myself: "What are these neural networks? What kind of things should they do?" And then I decided to start doing it. And since then there\'s been a lot of interest in it, and I\'ve been making a lot of progress.\n\nWe\'ve had a lo