<a href="https://colab.research.google.com/github/Tgoutam/skills-introduction-to-github/blob/main/Hugging%20Face%20Sentiment%20Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Install the transformers and PyTorch libraries

!pip install transformers
!pip install torch


In [2]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification


In [6]:
import torch # Import the torch library

In [None]:
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)


**Tokenize the Input Text** - Prepare your input text by tokenizing it. Tokenization converts the text into a format that the model can process.

In [11]:
input_text = "I HATE using Hugging Face transformers!"
inputs = tokenizer(input_text, return_tensors="pt")


**Run the Model**
Pass the tokenized input to the model to get predictions.

**Explanation**
- `with torch.no_grad():` This context manager disables gradient calculation, which is not needed for inference and reduces memory usage.
- `outputs = model(**inputs)`: This passes the tokenized inputs to the model and stores the output. The `**inputs` syntax unpacks the dictionary of inputs so that each key-value pair is passed as a separate keyword argument.



In [12]:
with torch.no_grad():
    outputs = model(**inputs)


**Process the Output**
Extract and process the output to interpret the results.

In [13]:
predictions = outputs.logits.argmax(dim=-1)


The line above extracts the logits (raw model outputs) from `outputs`, and `argmax(dim=-1)` finds the index of the highest logit, which corresponds to the predicted class.
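To make the argmax step concrete, here is a plain-Python sketch over a hypothetical pair of logits. The numbers are made up for illustration and are not real model output, and `argmax_index` is a helper name introduced here, not a `torch` function.

```python
# Hypothetical logits for one input: position 0 = NEGATIVE, position 1 = POSITIVE.
# These values are illustrative only, not actual model output.
logits = [3.2, -2.7]


def argmax_index(values):
    """Return the index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


predicted_class = argmax_index(logits)
print(f"Predicted class: {predicted_class}")
```

Because the larger logit sits at position 0, the sketch selects class 0, mirroring what `argmax(dim=-1)` does along the last dimension of the logits tensor.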

In [14]:
print(f"Predicted class: {predictions.item()}")

Predicted class: 0


A predicted class of 0 means the sentiment of the input above ("I HATE ...") is NEGATIVE. If you change HATE to LOVE and re-run the cells, the predicted class becomes 1, i.e. POSITIVE.

Full Python Code:

# Step 1: Install necessary libraries
!pip install transformers torch

# Step 2: Import libraries
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Step 3: Select and load a model
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Step 4: Tokenize the input text
input_text = "I love using Hugging Face transformers!"
inputs = tokenizer(input_text, return_tensors="pt")

# Step 5: Run the model
with torch.no_grad():
    outputs = model(**inputs)

# Step 6: Process the output
predictions = outputs.logits.argmax(dim=-1)
print(f"Predicted class: {predictions.item()}")

-------------------

To run a Transformer model from the Hugging Face Hub in Google Colab, follow these steps:

### Step 1: Install Necessary Libraries
First, ensure you have the necessary libraries installed. You'll need the `transformers` library from Hugging Face and `torch` for PyTorch.

```python
!pip install transformers torch
```

### Explanation
- `!pip install transformers torch`: This command installs the `transformers` library, which provides the tools to work with transformer models, and `torch` (PyTorch), the deep learning framework that the model in this example runs on. The leading `!` tells Colab to run the line as a shell command rather than as Python code.

### Step 2: Import Libraries
Next, import the tokenizer and model classes from the `transformers` library, along with `torch`, which Step 5 uses.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch  # needed in Step 5 for torch.no_grad()
```

### Explanation
- `from transformers import AutoTokenizer, AutoModelForSequenceClassification`: This imports the `AutoTokenizer` and `AutoModelForSequenceClassification` classes, which are used to load the tokenizer and model, respectively. The `AutoModelForSequenceClassification` class is specifically for classification tasks; if you need a different type of model, you would import a different class.
- `import torch`: This imports PyTorch, which provides the `torch.no_grad()` context manager used in Step 5.

### Step 3: Select and Load a Model
Choose a model from the Hugging Face Hub. For this example, we'll use `distilbert-base-uncased-finetuned-sst-2-english`, a sentiment analysis model.

```python
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
```

### Explanation
- `model_name = "distilbert-base-uncased-finetuned-sst-2-english"`: This specifies the name of the model you want to use.
- `tokenizer = AutoTokenizer.from_pretrained(model_name)`: This loads the tokenizer associated with the specified model, which will preprocess the text data into the format expected by the model.
- `model = AutoModelForSequenceClassification.from_pretrained(model_name)`: This loads the pre-trained model itself.

### Step 4: Tokenize the Input Text
Prepare your input text by tokenizing it. Tokenization converts the text into a format that the model can process.

```python
input_text = "I love using Hugging Face transformers!"
inputs = tokenizer(input_text, return_tensors="pt")
```

### Explanation
- `input_text = "I love using Hugging Face transformers!"`: This is the input text you want to classify.
- `inputs = tokenizer(input_text, return_tensors="pt")`: This tokenizes the input text and returns it as PyTorch tensors (`"pt"` stands for PyTorch).
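As a rough illustration of what tokenization produces, the sketch below builds a dictionary with `input_ids` and `attention_mask` keys, the same keys the DistilBERT tokenizer returns. Everything else is an assumption made for the example: `toy_tokenize`, the tiny vocabulary, and its ID values are hypothetical, and the real tokenizer splits text into WordPiece subwords and returns tensors rather than lists.

```python
import re

# Hypothetical vocabulary with made-up IDs; the real vocabulary is far larger.
TOY_VOCAB = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2, "i": 3, "love": 4, "using": 5, "!": 6}


def toy_tokenize(text):
    """Lowercase the text, split it into words and punctuation, and map tokens to IDs."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    # Wrap the sequence in start and end markers, as BERT-style tokenizers do.
    tokens = ["[CLS]"] + tokens + ["[SEP]"]
    # Tokens missing from the vocabulary fall back to the unknown-token ID.
    input_ids = [TOY_VOCAB.get(tok, TOY_VOCAB["[UNK]"]) for tok in tokens]
    # No padding is added here, so every position is marked as a real token.
    attention_mask = [1] * len(input_ids)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


toy_inputs = toy_tokenize("I love using Hugging Face transformers!")
print(toy_inputs["input_ids"])
print(toy_inputs["attention_mask"])
```

Words missing from the toy vocabulary ("hugging", "face", "transformers") all collapse to the unknown ID, which is the loss of information that subword tokenization in the real tokenizer is designed to avoid.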

### Step 5: Run the Model
Pass the tokenized input to the model to get predictions.

```python
with torch.no_grad():
    outputs = model(**inputs)
```

### Explanation
- `with torch.no_grad()`: This context manager is used to disable gradient calculation, which is not needed for inference and reduces memory usage.
- `outputs = model(**inputs)`: This passes the tokenized inputs to the model and stores the output. The `**inputs` syntax unpacks the dictionary of inputs so that each key-value pair is passed as a separate argument.
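The `**` unpacking described above is ordinary Python and can be tried without loading a model. In this sketch, `fake_model` is a hypothetical stand-in for the model's forward call, and the ID values are made up.

```python
def fake_model(input_ids, attention_mask=None):
    """Hypothetical stand-in for a model call; it only reports what it received."""
    return {"n_tokens": len(input_ids), "has_mask": attention_mask is not None}


# A dictionary shaped like tokenizer output (illustrative values only).
demo_inputs = {"input_ids": [1, 3, 4, 2], "attention_mask": [1, 1, 1, 1]}

# Unpacking the dictionary passes each key as a keyword argument...
unpacked = fake_model(**demo_inputs)

# ...which is equivalent to spelling the arguments out by hand.
explicit = fake_model(
    input_ids=demo_inputs["input_ids"],
    attention_mask=demo_inputs["attention_mask"],
)

print(unpacked == explicit)
```

Unpacking keeps the call site unchanged when a tokenizer returns additional keys, as long as the model accepts them as parameters.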

### Step 6: Process the Output
Extract and process the output to interpret the results.

```python
predictions = outputs.logits.argmax(dim=-1)
```

### Explanation
- `predictions = outputs.logits.argmax(dim=-1)`: This extracts the logits (raw model outputs) from the output, and `argmax(dim=-1)` finds the index of the highest logit, which corresponds to the predicted class.
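Logits are unnormalized scores, so it can help to turn them into probabilities before reporting a result. The sketch below applies a softmax in plain Python to hypothetical logits. The values and the `softmax` helper are assumptions for illustration; with real model output, `torch.softmax(outputs.logits, dim=-1)` does the same job.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum before exponentiating avoids overflow.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one input (not real model output).
example_logits = [-4.0, 4.0]
probs = softmax(example_logits)

# Softmax preserves ordering, so the argmax of the probabilities
# is the same index as the argmax of the logits.
predicted = probs.index(max(probs))
print(f"Predicted class {predicted} with probability {probs[predicted]:.3f}")
```

The probability gives a sense of how confident the model is, which the bare class index does not convey.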

### Complete Example Code
Here is the complete code with all the steps combined:

```python
# Step 1: Install necessary libraries
!pip install transformers torch

# Step 2: Import libraries
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Step 3: Select and load a model
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Step 4: Tokenize the input text
input_text = "I love using Hugging Face transformers!"
inputs = tokenizer(input_text, return_tensors="pt")

# Step 5: Run the model
with torch.no_grad():
    outputs = model(**inputs)

# Step 6: Process the output
predictions = outputs.logits.argmax(dim=-1)
print(f"Predicted class: {predictions.item()}")
```

### Explanation
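- `print(f"Predicted class: {predictions.item()}")`: This prints the predicted class index. `.item()` converts the single-element tensor into a plain Python number. For this model, 0 corresponds to negative sentiment and 1 to positive sentiment; the mapping is stored in `model.config.id2label`.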
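A class index is easier to read once it is mapped to a label name. The sketch below hard-codes the mapping for this checkpoint as an assumption so that it runs without downloading anything; with a loaded model, the same dictionary is available as `model.config.id2label`. The `label_for` helper is a name introduced here for illustration.

```python
# Assumed label mapping for this sentiment checkpoint; with a loaded model,
# read it from model.config.id2label instead of hard-coding it.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def label_for(class_index, id2label=ID2LABEL):
    """Translate a predicted class index into its label name."""
    return id2label.get(class_index, f"UNKNOWN({class_index})")


print(label_for(0))
print(label_for(1))
```

In the notebook itself, the equivalent lookup would be `model.config.id2label[predictions.item()]`.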

This example demonstrates how to load, prepare, and run a Transformer model from the Hugging Face Hub in Google Colab, and how to interpret the results.
