# 🚀 Arun’s Generative AI Notebook
### *Exploring Hugging Face Transformers for Text Generation*

Welcome 👋 This notebook will guide you through the basics of **Generative AI** using Hugging Face.  
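It covers setup and authentication, then walks through `pipeline` examples for sentiment analysis, text classification, summarization, named entity recognition, translation, question answering, and a simple chatbot, with Gradio demos along the way.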

---

## 🔐 Hugging Face Authentication
To use Hugging Face models:  
1. Create an access token here 👉 [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)  
2. Copy the token and paste it when prompted in the login cell.  
⚠️ **Keep your token private** — don’t share it publicly. You can safely remove it after use.  

---

➡️ Run the cells **in order**, follow the comments, and start experimenting with Generative AI!


In [8]:
# Authenticate with Hugging Face Hub
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    A token is already saved on your machine. Run `hf auth whoami` to get more information or `hf auth logout` if you want to log out.
    Setting a new token will erase the existing one.
    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) y
Token is valid (permission: read).
The token `myf

In [9]:
%%capture
# Install required Python packages (the %%capture cell magic must be the first line of the cell)
%pip install transformers

In [10]:
# Import required libraries
from transformers import pipeline

In [11]:
llm=pipeline(task="sentiment-analysis")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


In [12]:
llm("I am Learning Gen AI and it looks good")

[{'label': 'POSITIVE', 'score': 0.9998704195022583}]
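
The pipeline returns a list with one dictionary per input, each holding a `label` and a `score`. A minimal sketch of turning that structure into a readable string follows; `format_prediction` is a hypothetical helper name (not part of `transformers`), and the sample result is copied from the output above.

```python
def format_prediction(results):
    """Format the first prediction of a sentiment result as 'LABEL (xx.xx%)'."""
    top = results[0]  # one dict per input text; take the first
    return f"{top['label']} ({top['score']:.2%})"


# Sample result copied from the cell output above
sample = [{'label': 'POSITIVE', 'score': 0.9998704195022583}]
print(format_prediction(sample))
```

With the live pipeline, the same helper could be called as `format_prediction(llm("some text"))`.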

In [13]:
%%capture
# Install required Python packages (the %%capture cell magic must be the first line of the cell)
%pip install gradio

In [14]:
def sentimental_analysis(text):
  return llm(text)[0]["label"]
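
A text box in a web UI can be submitted empty, so it may be worth guarding the call before it reaches the model. The sketch below is one possible variant, not the notebook's own function: `safe_sentiment_label` is a hypothetical name, the classifier is passed in as an argument, and `fake_classifier` is a stand-in used only so the example runs without loading a model.

```python
def safe_sentiment_label(text, classifier):
    """Return the predicted label, or a hint when the input is blank."""
    text = (text or "").strip()
    if not text:
        return "Please enter some text."
    # The classifier is expected to return [{'label': ..., 'score': ...}]
    return classifier(text)[0]["label"]


def fake_classifier(text):
    """Stand-in for the real pipeline; always answers POSITIVE."""
    return [{"label": "POSITIVE", "score": 1.0}]


print(safe_sentiment_label("   ", fake_classifier))
print(safe_sentiment_label("I am learning Gen AI", fake_classifier))
```

In the notebook, the real pipeline object `llm` would be passed in place of `fake_classifier`.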

In [15]:
# Import required libraries
import gradio as gr

In [16]:
demo = gr.Interface(
    fn=sentimental_analysis,
    inputs="text",
    outputs="text",
    title="Sentiment Analysis Gradio Application",
)

In [17]:
demo.launch()

It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://26469bc3ae88a1c02b.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




In [18]:
text_list = ["This is great",
             "Thanks for nothing",
             "You've got to work on your face",
             "You're beautiful, never change!"]

In [19]:
# Define or load the model
llm=pipeline(task="sentiment-analysis", model="distilbert/distilbert-base-uncased-finetuned-sst-2-english")

Device set to use cpu


In [20]:
llm(text_list)

[{'label': 'POSITIVE', 'score': 0.9998785257339478},
 {'label': 'POSITIVE', 'score': 0.9680057168006897},
 {'label': 'NEGATIVE', 'score': 0.8776122331619263},
 {'label': 'POSITIVE', 'score': 0.9998120665550232}]
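
The batch output is easier to read next to the input sentences. Note that "Thanks for nothing" is labelled POSITIVE here, so a label alone can mislead on sarcastic text. A small sketch of pairing texts with predictions and flagging weaker scores follows; `pair_predictions` and the `0.9` cut-off are assumptions chosen for illustration, and the predictions are copied from the output above.

```python
def pair_predictions(texts, predictions, min_score=0.9):
    """Pair each input text with its label and flag scores below min_score."""
    rows = []
    for text, pred in zip(texts, predictions):
        rows.append({
            "text": text,
            "label": pred["label"],
            "score": round(pred["score"], 4),
            "low_confidence": pred["score"] < min_score,
        })
    return rows


texts = ["This is great",
         "Thanks for nothing",
         "You've got to work on your face",
         "You're beautiful, never change!"]

# Predictions copied from the cell output above
predictions = [{'label': 'POSITIVE', 'score': 0.9998785257339478},
               {'label': 'POSITIVE', 'score': 0.9680057168006897},
               {'label': 'NEGATIVE', 'score': 0.8776122331619263},
               {'label': 'POSITIVE', 'score': 0.9998120665550232}]

for row in pair_predictions(texts, predictions):
    print(row)
```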

## Text Classification


In [21]:
# Import required libraries
from transformers import pipeline

classifier = pipeline(task="text-classification", model="SamLowe/roberta-base-go_emotions")

sentences = ["I am not having a great day"]

classifier(sentences)


Device set to use cpu


[{'label': 'disappointment', 'score': 0.4666953980922699}]

In [22]:
classifier(sentences)[0]

{'label': 'disappointment', 'score': 0.4666953980922699}

## Summarization


In [23]:
# Import required libraries
from transformers import pipeline

summarizer = pipeline(task="summarization", model="facebook/bart-large-cnn")

Device set to use cpu


In [24]:
ARTICLE = """ New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
2010 marriage license application, according to court documents.
Prosecutors said the marriages were part of an immigration scam.
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
If convicted, Barrientos faces up to four years in prison.  Her next court appearance is scheduled for May 18.
"""

In [25]:
print(summarizer(ARTICLE, max_length=130, min_length=30)[0]["summary_text"])

Liana Barrientos, 39, is charged with two counts of "offering a false instrument for filing in the first degree" In total, she has been married 10 times, with nine of her marriages occurring between 1999 and 2002. She is believed to still be married to four men.


In [26]:
ner=pipeline("ner",aggregation_strategy="simple" )

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.



Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Device set to use cpu


In [27]:
# Sample text used as input for named entity recognition
text=""" Google recently announced its new AI model, Gemini, at its annual I/O conference. Sundar Pichai, the CEO of Google, showcased Gemini's capabilities in a keynote address, highlighting its advanced performance in text generation and image recognition. The model is expected to be integrated into various Google products, including Google Assistant and Bard, in the coming months.

In a separate development, Dr. Sarah Miller, a leading researcher at the Massachusetts Institute of Technology (MIT), published a paper in the journal Nature detailing her team's breakthrough in materials science. The research, which was funded by a grant from the National Science Foundation (NSF), focuses on creating a new type of superconductor that operates at room temperature. This discovery could revolutionize the electronics industry.

Meanwhile, the city of Paris, France, is preparing to host the 2024 Summer Olympics. Organizers are working tirelessly to ensure all venues, including the Stade de France and the Grand Palais, are ready for the global event. Athletes from around the world, such as the renowned swimmer Michael Phelps and sprinter Usain Bolt, are not participating, but their legacies are being celebrated.  The city's mayor, Anne Hidalgo, has expressed confidence that the games will be a spectacular success and a major boost for tourism.

Finally, the company Tesla reported its quarterly earnings, exceeding analyst expectations. Elon Musk, the company's CEO, mentioned in the earnings call that the new Gigafactory in Texas is now fully operational and is significantly increasing production of the Model 3 and Model Y. The stock price saw a notable increase following the announcement."""

In [28]:
# Import required libraries
import pandas as pd
pd.DataFrame(ner(text))

|   | entity_group | score | word | start | end |
|---|---|---|---|---|---|
| 0 | ORG | 0.999038 | Google | 1 | 7 |
| 1 | MISC | 0.552022 | AI | 35 | 37 |
| 2 | MISC | 0.634971 | Gemini | 45 | 51 |
| 3 | PER | 0.996361 | Sundar Pichai | 83 | 96 |
| 4 | ORG | 0.998689 | Google | 109 | 115 |
| 5 | ORG | 0.509695 | Gemini | 127 | 133 |
| 6 | ORG | 0.990622 | Google | 303 | 309 |
| 7 | MISC | 0.993067 | Google Assistant | 330 | 346 |
| 8 | MISC | 0.983716 | Bard | 351 | 355 |
| 9 | PER | 0.999726 | Sarah Miller | 411 | 423 |
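
The NER results are a list of dictionaries, so they can be filtered by entity type and score before use. The sketch below keeps unique entity words for one group; `unique_entities` and the `0.9` threshold are assumptions for illustration, and the rows are copied from the first entries shown above.

```python
def unique_entities(entities, group, min_score=0.9):
    """Return unique entity words of one group, in first-seen order."""
    words = [e["word"] for e in entities
             if e["entity_group"] == group and e["score"] >= min_score]
    return list(dict.fromkeys(words))  # dict keys keep insertion order


# Rows copied from the table above
entities = [
    {"entity_group": "ORG", "score": 0.999038, "word": "Google"},
    {"entity_group": "MISC", "score": 0.552022, "word": "AI"},
    {"entity_group": "MISC", "score": 0.634971, "word": "Gemini"},
    {"entity_group": "PER", "score": 0.996361, "word": "Sundar Pichai"},
    {"entity_group": "ORG", "score": 0.998689, "word": "Google"},
    {"entity_group": "ORG", "score": 0.509695, "word": "Gemini"},
    {"entity_group": "ORG", "score": 0.990622, "word": "Google"},
    {"entity_group": "MISC", "score": 0.993067, "word": "Google Assistant"},
    {"entity_group": "MISC", "score": 0.983716, "word": "Bard"},
    {"entity_group": "PER", "score": 0.999726, "word": "Sarah Miller"},
]

print(unique_entities(entities, "ORG"))
print(unique_entities(entities, "PER"))
```

The threshold drops the low-scoring "Gemini" row that was tagged as an organization, which is the kind of uncertain prediction a filter like this is meant to catch.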


## Translation: English to Hindi

In [29]:
# Define or load the model
translator=pipeline("translation_en_to_hi",model="Helsinki-NLP/opus-mt-en-hi")

Device set to use cpu


In [30]:
print(translator(text)[0]["translation_text"])

गूगल ने हाल ही में अपने नए एआई मॉडल की घोषणा की है, अपने वार्षिक I/O सम्मेलन में. स्वंयक्षी भाई, गूगल की क्षमता दिखाते हैं कि वे पाठ के विस्तृत प्रदर्शन और छवि की पहचान में शामिल हैं. आदर्श की अपेक्षा की गई है, गूगल के विभिन्न उत्पादों में, गूगल के बारे में, डॉ.
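
The translation above stops at "डॉ.", partway into the second paragraph of the input, which suggests a length limit was reached. One possible workaround is to translate paragraph by paragraph. The sketch below assumes paragraphs are separated by blank lines, as in `text`; `split_paragraphs` and `translate_paragraphs` are hypothetical helper names, and `str.upper` is only a stand-in translator so the example runs without a model.

```python
def split_paragraphs(text):
    """Split text on blank lines and drop empty pieces."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def translate_paragraphs(text, translate):
    """Translate each paragraph separately and rejoin with blank lines.

    `translate` is any callable that maps a string to its translation.
    """
    return "\n\n".join(translate(p) for p in split_paragraphs(text))


# Stand-in translator (upper-casing) so the sketch runs offline
demo_output = translate_paragraphs("First paragraph.\n\nSecond paragraph.", str.upper)
print(demo_output)
```

With the pipeline from the previous cell, the call would look like `translate_paragraphs(text, lambda p: translator(p)[0]["translation_text"])`.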


## Question Answering

In [31]:
qabot=pipeline("question-answering")
question="Identify all the organizations mentioned."
qabot(question=question,context=text)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu


{'score': 2.530427991587203e-07, 'start': 1, 'end': 7, 'answer': 'Google'}

In [32]:
question="Find all the locations (cities, countries, and states) listed in the passage."
print(qabot(question=question,context=text))

{'score': 0.00013420352479442954, 'start': 977, 'end': 1017, 'answer': 'the Stade de France and the Grand Palais'}
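
Both answers come back with very low scores. The default model is an extractive one fine-tuned on SQuAD, so it returns a single span from the context and is a poor fit for "list all ..." questions. A minimal sketch of rejecting weak answers follows; `answer_or_none` and the `0.5` cut-off are assumptions for illustration, and the two results are copied from the outputs above.

```python
def answer_or_none(result, min_score=0.5):
    """Return the answer text only when the model's score reaches min_score."""
    return result["answer"] if result["score"] >= min_score else None


# Results copied from the two cell outputs above
org_result = {'score': 2.530427991587203e-07, 'start': 1, 'end': 7,
              'answer': 'Google'}
loc_result = {'score': 0.00013420352479442954, 'start': 977, 'end': 1017,
              'answer': 'the Stade de France and the Grand Palais'}

print(answer_or_none(org_result))
print(answer_or_none(loc_result))
```

For list-style questions like these, the NER pipeline shown earlier is likely the better tool, since it returns every tagged entity rather than one span.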


## Vanilla Chatbot

In [33]:
# Import required libraries
from transformers import pipeline

chatbot = pipeline("text2text-generation", model="facebook/blenderbot-400M-distill")

Device set to use cpu


In [34]:
chatbot("hi, where are you from")

[{'generated_text': ' I am from the united states, how about you? I have never been out of the country.'}]

In [35]:
chatbot("I am from india")

[{'generated_text': ' I have never been to india, but I would love to go one day. I hear it is beautiful there.'}]

In [36]:
# Generate predictions using the model
def vanila_chatbot(message,history):
  return chatbot(message)[0]["generated_text"]

In [37]:
demobot = gr.ChatInterface(vanila_chatbot, title="Vanilla Chatbot")



In [38]:
demobot.launch()

It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://6c796fa1701ec6f294.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


