# Introduction to Language Models

## Table of Contents

1. Background
2. Architecture
3. Hugging Face Transformers Library
4. Introduction to BERT
5. Introduction to GPT-2
6. Introduction to LLAMA
7. Introduction to FairSense-AI


## Background

Language models are a type of artificial intelligence model that can understand, interpret, and generate human language. They are a crucial component in various Natural Language Processing (NLP) tasks such as translation, summarization, and question answering.



## Architecture

There are various architectures used to build language models, with some of the popular ones being:

- **Recurrent Neural Networks (RNNs) & Long Short-Term Memory (LSTM):** RNNs process sequences of data by passing the hidden state from one step in the sequence to the next. LSTMs are a special kind of RNN capable of learning long-term dependencies.

- **Transformer:** Introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017), this architecture uses attention mechanisms to process input sequences in parallel rather than sequentially, which significantly improves training efficiency.

**What advantages do transformers have over RNNs/LSTMs?**

1. **Parallel Processing**: Transformers process all tokens in a sequence simultaneously, enabling faster training and inference, while RNNs and LSTMs process tokens sequentially which can be time-consuming.
2. **Long-Term Dependencies**: With their attention mechanisms, transformers can handle long-term dependencies in sequences better than standard RNNs, and often better than LSTMs, because attention connects any two positions in the input directly, in a constant number of sequential steps, rather than passing information through every intermediate step.
3. **Positional Embeddings**: Unlike RNNs and LSTMs, transformers incorporate positional embeddings in their input representations to retain the order information of sequences, which is crucial for many NLP tasks.
4. **Global Context Awareness**: Transformers' attention mechanisms allow each position to attend to every other position in the input sequence, providing more global context than RNNs and LSTMs, which must carry earlier context forward in a fixed-size hidden state.
5. **Easier to Train**: Transformers tend to be easier to train and require less tinkering with hyperparameters compared to RNNs and LSTMs, which often suffer from vanishing or exploding gradient problems.
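
The positional embeddings mentioned in item 3 can be illustrated with the fixed sinusoidal encoding described by Vaswani et al. (2017). The following is a minimal pure-Python sketch, not the implementation used by any particular library: the function name `sinusoidal_positions` and the toy dimensions are assumptions made for illustration, and models such as BERT and GPT-2 learn their position embeddings instead of using this fixed formula.

```python
import math

def sinusoidal_positions(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position encodings."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k+1) shares one frequency.
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

pe = sinusoidal_positions(seq_len=4, d_model=8)
print([round(v, 3) for v in pe[1]])
```

Each row is added to the embedding of the token at that position, so two identical words at different positions receive different input vectors.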

### Recurrent Neural Networks (RNNs):





1. **Input**:
   - RNNs take in a sequence of data, one item at a time (e.g., a word in a sentence).
   - x(t) is taken as the input to the network at time step t.

2. **Hidden State**:
   - RNNs have a "memory" (hidden state) that captures information about what has been seen so far.
   - h(t) represents the hidden state at time step t.

3. **Recurrent Loop**:
   - The same operations (a set of weights) are performed at each step of the sequence, looping the output back into the model at the next step.
   - Input-to-hidden connections are parameterized by a weight matrix U.
   - Hidden-to-hidden recurrent connections are parameterized by a weight matrix W.
   - Hidden-to-output connections are parameterized by a weight matrix V.

4. **Output**:
   - At each step, RNNs can produce an output based on the current input and memory.
   - o(t) is the output of the network at time step t.

5. **Backpropagation**:
   - During training, errors are back-propagated through the network to adjust the weights, improving the model's predictions.



<div> <img src=https://miro.medium.com/v2/resize:fit:640/format:webp/1*JOkrQoJ3J3-451GzRcayRg.png width="400"/> </div>

*The left side of the diagram shows a notation of an RNN, and the right side shows the full network representing an RNN.*

*Image Source: https://www.deeplearningbook.org/contents/rnn.html/*
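
The recurrent loop described above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: it uses the weight matrices U, W, and V from the list, omits bias terms, uses `tanh` as the activation, and the helper names (`matvec`, `rnn_forward`) and all numeric values are made up for the example.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_forward(xs, U, W, V, h0):
    """Run a vanilla RNN over the sequence xs; return (outputs, final hidden state)."""
    h, outputs = h0, []
    for x in xs:  # one time step per input vector x(t)
        pre = [a + b for a, b in zip(matvec(U, x), matvec(W, h))]
        h = [math.tanh(p) for p in pre]   # h(t) = tanh(U x(t) + W h(t-1))
        outputs.append(matvec(V, h))      # o(t) = V h(t)
    return outputs, h

# Toy sizes: 2-dimensional inputs, 3 hidden units, 1 output (values are arbitrary).
U = [[0.5, -0.1], [0.3, 0.8], [-0.2, 0.4]]
W = [[0.1, 0.0, 0.2], [0.0, 0.1, 0.0], [0.3, 0.0, 0.1]]
V = [[1.0, -1.0, 0.5]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, h_final = rnn_forward(xs, U, W, V, h0=[0.0, 0.0, 0.0])
print(len(outputs), len(h_final))
```

Note that the same U, W, and V are reused at every time step, which is what makes the loop "recurrent" and also what forces the steps to run one after another.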

### Long Short-Term Memory Networks (LSTMs):



1. **Input, Forget, and Output Gates**:
   - LSTMs have a more complex structure than simple RNNs, with three "gates" controlling the flow of information.
   - The **Input Gate** decides how much of the new input to keep.
   - The **Forget Gate** decides how much of the current memory to forget.
   - The **Output Gate** decides what part of the current state will be outputted.

2. **Cell State**:
   - The "long-term memory" where information can be stored, retrieved, or forgotten over time.

3. **Hidden State**:
   - Similar to RNNs, it's the "short-term memory" capturing information from recent steps.

4. **Recurrent Loop**:
   - Like RNNs, LSTMs process one part of the sequence at a time, but with more complex operations to avoid problems like vanishing gradients, making them capable of learning long-term dependencies.

5. **Output**:
   - LSTMs also produce an output at each step based on the current input, memory, and state.

6. **Backpropagation**:
   - Similar to RNNs, errors are back-propagated through the network during training to adjust the weights.

<div> <img src=https://dwbi1.files.wordpress.com/2021/08/fig-4-lstm.jpg  width ="400"/> </div>

*Image Source: https://dwbi1.wordpress.com/2021/08/07/recurrent-neural-network-rnn-and-lstm/*
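
The gates and the two kinds of memory described above can be made concrete with a single-unit LSTM step. This is a sketch under stated assumptions: the state is a single number rather than a vector, the function name `lstm_step`, the parameter layout, and every weight value are hypothetical, and real implementations use weight matrices per gate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM; p maps each gate name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b
    f = sigmoid(pre("forget"))     # how much of the old cell state to keep
    i = sigmoid(pre("input"))      # how much of the candidate to write
    o = sigmoid(pre("output"))     # how much of the cell state to expose
    g = math.tanh(pre("cand"))     # candidate values computed from the new input
    c = f * c_prev + i * g         # long-term memory (cell state)
    h = o * math.tanh(c)           # short-term memory (hidden state)
    return h, c

params = {"forget": (0.5, 0.1, 1.0), "input": (0.6, -0.2, 0.0),
          "output": (0.4, 0.3, 0.0), "cand": (1.0, 0.5, 0.0)}
h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:
    h, c = lstm_step(x, h, c, params)
print(round(h, 4), round(c, 4))
```

Because the cell state is updated by addition (`f * c_prev + i * g`) rather than being squashed through an activation at every step, information and gradients can survive across many time steps.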


### Transformer Architecture






1. **Input Embedding**:
    - Your words (text) are turned into numbers using a process called embedding, so the model can understand and process them.

2. **Positional Encoding**:
    - Since the model doesn't read words in order like humans do, it needs to know the position of each word in a sentence. Positional encoding is added to the embeddings to give that information.

3. **Multi-Head Attention**:
    - This is the part where the model pays "attention" to different parts of the input text. It helps the model to focus on different words when trying to understand the meaning of a particular word.

4. **Normalization and Residual Connection**:
    - After attention, the output is added back to the sub-layer's input (the residual connection) and the sum is normalized (made more uniform). This keeps training stable and helps gradients flow through deep stacks of layers.

5. **Feed Forward Neural Network**:
    - The model then applies a simple neural network independently to each position, processing the representations further.

6. **Output of Encoder**:
    - The processed representations are then passed to the next layer of the model (or to the decoder if this is the final layer).

7. **Decoder**:
    - The decoder has similar components to the encoder, but its self-attention is masked so that each position only sees earlier positions, and it has an additional multi-head attention layer that looks at the encoder's output. This helps the model to generate coherent responses or translations.

8. **Final Linear and Softmax Layer**:
    - In the final step, the model makes predictions, like predicting the next word in a sequence or translating the input text into another language.

9. **Output**:
    - The model gives out the predicted words or translated text as the output.
    
<div> <img src=https://machinelearningmastery.com/wp-content/uploads/2021/08/attention_research_1.png  width ="400"/> </div>

*Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17). Curran Associates Inc., Red Hook, NY, USA, 6000–6010.*
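
The attention step at the heart of the architecture above can be sketched as single-head scaled dot-product attention. This is a simplified illustration, not a full Transformer layer: it skips the learned query, key, and value projections and the multiple heads that real models use, and the function names and toy vectors are assumptions. The optional `causal` flag mimics the masking used in decoders.

```python
import math

def softmax(scores):
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V, causal=False):
    """Single-head scaled dot-product attention over lists of vectors."""
    d_k = len(K[0])
    out, weights = [], []
    for i, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        if causal:  # decoder-style masking: position i cannot see positions > i
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted average of the value vectors.
        out.append([sum(wj * v[d] for wj, v in zip(w, V)) for d in range(len(V[0]))])
    return out, weights

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three toy token vectors
out, weights = attention(X, X, X)           # self-attention: Q = K = V = X
_, causal_weights = attention(X, X, X, causal=True)
print([round(w, 3) for w in weights[0]])
print([round(w, 3) for w in causal_weights[0]])
```

Every row of `weights` sums to 1, and no step depends on the result of a previous position, which is why all positions can be computed in parallel.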


## Hugging Face Transformers Library

The **Hugging Face Transformers** library is a popular library that provides pre-trained models and tokenizers for various NLP tasks. It supports a wide range of models including GPT-2, BERT, T5, and many others.


# Introduction to BERT



## 1. Background

Traditional language models operated in a unidirectional manner, either from left-to-right (like GPT) or from right-to-left. However, BERT brought a paradigm shift by being designed to read text bidirectionally. This bidirectional understanding means that when processing a particular word, BERT considers context from both its left and its right, capturing richer semantic meanings, especially for polysemous words (words with multiple meanings).

## 2. BERT's Architecture

BERT utilizes the Transformer architecture, first introduced by Vaswani et al. in 2017. Let's dive deeper into the architecture:

### Embedding Layer:
The input tokens are converted into vectors through embeddings. BERT uniquely combines token, segment, and position embeddings for each token's representation.
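
As a rough illustration of how the three embeddings combine, the sketch below sums a token, a segment, and a position vector for each input token. Everything here is a toy assumption: the vocabulary, the hidden size of 4, and the random tables stand in for the learned embedding matrices, and the real model also applies layer normalization and dropout after the sum.

```python
import random

random.seed(0)
HIDDEN = 4  # toy size; bert-base uses 768

def make_table(rows):
    """Random lookup table standing in for a learned embedding matrix."""
    return [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(rows)]

vocab = {"[CLS]": 0, "[SEP]": 1, "the": 2, "movie": 3, "was": 4, "great": 5, "fun": 6}
token_emb = make_table(len(vocab))
segment_emb = make_table(2)      # sentence A = 0, sentence B = 1
position_emb = make_table(16)    # one row per position

tokens = ["[CLS]", "the", "movie", "was", "great", "[SEP]", "fun", "movie", "[SEP]"]
segments = [0, 0, 0, 0, 0, 0, 1, 1, 1]

def embed(tokens, segments):
    """Sum token, segment, and position embeddings element-wise for each token."""
    rows = []
    for pos, (tok, seg) in enumerate(zip(tokens, segments)):
        t, s, p = token_emb[vocab[tok]], segment_emb[seg], position_emb[pos]
        rows.append([a + b + c for a, b, c in zip(t, s, p)])
    return rows

inputs = embed(tokens, segments)
print(len(inputs), len(inputs[0]))
```

The word "movie" appears twice, but its two input vectors differ because the segment and position components differ.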

### Transformer Blocks:
The embedded tokens undergo a series of Transformer blocks. Each block has:

- **Multi-head Self Attention Mechanism**: Allows the model to focus on different parts of the input when interpreting a word.
- **Feed-forward Neural Networks**: Post attention, the tokens are channeled through this network.
- **Residual Connections**: These connections surround every sub-layer and help avoid vanishing gradient issues in deep networks.
- **Layer Normalization**: Applied to the output of each sub-layer, after the residual connection is added, to stabilize neuron activations.

### Output:
Post all Transformer blocks, the final embeddings are used for tasks like classification or token-level predictions.

## 3. Pre-training and Fine-tuning

BERT's prowess comes from its two-stage training:

- **Pre-training**: BERT learns word representations from vast text data through two tasks: masked language modeling (predicting masked words from their context) and next sentence prediction (understanding sentence relationships).

- **Fine-tuning**: Post pre-training, BERT is tailored on specific tasks using smaller, task-specific datasets.
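
The masked language modeling task can be sketched as a data-corruption step. The version below follows the recipe from the BERT paper (about 15% of tokens are selected; of those, 80% become `[MASK]`, 10% become a random token, and 10% stay unchanged), but the function name `mask_tokens`, the tiny vocabulary, and the inflated `mask_prob=0.5` used in the demo are assumptions chosen so the effect is visible on a short sentence.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (corrupted tokens, labels); labels are None where no prediction is required."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if tok in ("[CLS]", "[SEP]") or rng.random() >= mask_prob:
            corrupted.append(tok)
            labels.append(None)
            continue
        labels.append(tok)                       # the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            corrupted.append("[MASK]")           # 80%: replace with the mask token
        elif roll < 0.9:
            corrupted.append(rng.choice(vocab))  # 10%: replace with a random token
        else:
            corrupted.append(tok)                # 10%: keep the token unchanged
    return corrupted, labels

vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
sentence = ["[CLS]", "the", "cat", "sat", "on", "the", "mat", "[SEP]"]
corrupted, labels = mask_tokens(sentence, vocab, mask_prob=0.5)
print(corrupted)
print(labels)
```

During pre-training, the loss is computed only at positions where the label is not `None`, and the model can use context on both sides of each masked position.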

Let's explore a hands-on example of fine-tuning BERT.


## 4. Hands-on: Using BERT for a Simple Task


###  Mount Google Drive

In [1]:
#from google.colab import drive
#drive.mount('/content/drive')

### Step 1: Import the Required Packages and Libraries

In [2]:
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset
import torch



### Step 2: Load the dataset

The Microsoft Research Paraphrase Corpus (MRPC) is a part of the General Language Understanding Evaluation (GLUE) benchmark, which is a collection of natural language understanding tasks for evaluating the performance of language models. MRPC specifically focuses on the task of paraphrase identification.

In [3]:
# 1. Load the MRPC part of the GLUE dataset
dataset = load_dataset("glue", "mrpc")

### Step 3: Load the Tokenizer and Tokenize the Dataset

In [4]:
# 2. Load the tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# 3. Tokenize the dataset
def tokenize_function(examples):
    return tokenizer(examples['sentence1'], examples['sentence2'], padding='max_length', truncation=True, max_length=128)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

### Step 4: Load the Model

In [5]:
# 4. Load the model
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)  # MRPC is binary classification

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


### Step 5: Model Training & Evaluation


In [6]:
# 5. Define training arguments
training_args = TrainingArguments(
    output_dir='./results',
    per_device_train_batch_size=8,
    num_train_epochs=3,
    logging_dir='./logs',
    logging_steps=100,
    evaluation_strategy="steps",  # evaluate every `logging_steps`; newer transformers releases name this `eval_strategy`
    save_steps=500,
    load_best_model_at_end=True
)

# 6. Define the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"].remove_columns(["sentence1", "sentence2", "idx"]),
    eval_dataset=tokenized_datasets["validation"].remove_columns(["sentence1", "sentence2", "idx"]),
    compute_metrics=None  # You can define a metric function here if needed
)

# 7. Train the model
trainer.train()

# 8. Evaluate the model
results = trainer.evaluate()


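| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 100  | 0.6462        | 0.569732        |
| 200  | 0.5676        | 0.535866        |
| 300  | 0.5341        | 0.522772        |
| 400  | 0.5113        | 0.485494        |
| 500  | 0.4403        | 0.534196        |
| 600  | 0.333         | 0.615421        |
| 700  | 0.3016        | 0.531237        |
| 800  | 0.2913        | 0.62207         |
| 900  | 0.3559        | 0.56188         |
| 1000 | 0.1           | 0.745218        |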




### Step 6: Print the Results

In [7]:
print(results)

{'eval_loss': 0.5341963171958923, 'eval_runtime': 1.2485, 'eval_samples_per_second': 326.797, 'eval_steps_per_second': 40.85, 'epoch': 3.0}
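
The result above contains only the loss because `compute_metrics=None` was passed to the `Trainer`. A metric function along the following lines could be supplied instead. This is a sketch: the function body is an assumption written in plain Python so it can be checked without a trained model, and the logits and labels in the demo are made up. When such a function is passed as `compute_metrics=compute_metrics`, the `Trainer` is expected to report its values with an `eval_` prefix alongside `eval_loss`.

```python
def compute_metrics(eval_pred):
    """Accuracy and F1 for binary classification from (logits, labels)."""
    logits, labels = eval_pred
    # argmax over the two class logits, written without NumPy so plain lists also work
    preds = [max(range(len(row)), key=lambda j: row[j]) for row in logits]
    labels = [int(l) for l in labels]
    tp = sum(p == 1 and l == 1 for p, l in zip(preds, labels))
    fp = sum(p == 1 and l == 0 for p, l in zip(preds, labels))
    fn = sum(p == 0 and l == 1 for p, l in zip(preds, labels))
    accuracy = sum(p == l for p, l in zip(preds, labels)) / len(labels)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {"accuracy": accuracy, "f1": f1}

# Toy check with made-up logits and labels.
fake_logits = [[0.2, 1.5], [2.0, -1.0], [0.1, 0.3], [1.2, 0.4]]
fake_labels = [1, 0, 0, 1]
print(compute_metrics((fake_logits, fake_labels)))
```

Accuracy and F1 are the metrics usually reported for MRPC, which makes them a natural choice here.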


### Step 7: Save Fine-Tuned Model to Drive

In [8]:
#save_directory = "/content/drive/MyDrive/bert_sentiment_model" #use this command when saving to google drive
save_directory = "bert_sentiment_model"
model.save_pretrained(save_directory)
tokenizer.save_pretrained(save_directory)


('bert_sentiment_model/tokenizer_config.json',
 'bert_sentiment_model/special_tokens_map.json',
 'bert_sentiment_model/vocab.txt',
 'bert_sentiment_model/added_tokens.json')

### Step 8: Load Fine-Tuned Model from Drive

In [9]:
#load_directory = "/content/drive/MyDrive/bert_sentiment_model" #use this path when loading from Google Drive
load_directory = "bert_sentiment_model"
tokenizer = BertTokenizer.from_pretrained(load_directory)
model = BertForSequenceClassification.from_pretrained(load_directory)


### Step 9: Test Fine-Tuned Model on Different Examples

In [10]:
# Test on diverse examples
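def predict_equivalence(sentence1, sentence2):
    # Encode the two sentences as a single pair, as during fine-tuning
    inputs = tokenizer(sentence1, sentence2, return_tensors="pt", truncation=True, padding=True, max_length=128)
    with torch.no_grad():  # gradients are not needed for inference
        outputs = model(**inputs)

    # In MRPC, label 1 means the sentences are paraphrases of each other
    is_equivalent = 'equivalent' if torch.argmax(outputs.logits, dim=1).item() == 1 else 'not equivalent'
    return is_equivalent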

sentence_pair1 = ("The movie was fantastic! Absolutely loved it.", "I really enjoyed the movie, it was wonderful.")
sentence_pair2 = ("The movie was terrible. I regret watching it.", "This film was a masterpiece!")

print(f"Sentences: \n'{sentence_pair1[0]}' \nand \n'{sentence_pair1[1]}' \nare {predict_equivalence(*sentence_pair1)}")
print("\n")
print(f"Sentences: \n'{sentence_pair2[0]}' \nand \n'{sentence_pair2[1]}' \nare {predict_equivalence(*sentence_pair2)}")

Sentences: 
'The movie was fantastic! Absolutely loved it.' 
and 
'I really enjoyed the movie, it was wonderful.' 
are equivalent


Sentences: 
'The movie was terrible. I regret watching it.' 
and 
'This film was a masterpiece!' 
are not equivalent
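
The predictions above report only the winning label. To see how confident the model is, the two logits can be converted into probabilities with a softmax. The sketch below uses hypothetical logit values rather than real model output; inside the notebook, `torch.softmax(outputs.logits, dim=1)` performs the same computation on the actual logits.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

example_logits = [-1.2, 2.3]   # hypothetical [not equivalent, equivalent] logits
probs = softmax(example_logits)
label = "equivalent" if probs[1] > probs[0] else "not equivalent"
print(label, round(probs[1], 3))
```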


## 5. Conclusion

BERT's bidirectional attention and its Transformer-based architecture allow it to achieve state-of-the-art performance on many NLP tasks. This, combined with its adaptability through fine-tuning, showcases its versatility and significance in the modern NLP landscape.


# Introduction to GPT-2



## 1. Background
GPT-2, which stands for Generative Pre-trained Transformer 2, is a language model developed by OpenAI and released in February 2019. Its largest version has 1.5 billion parameters, which made it notable for its size on release. It follows a Transformer architecture and was pre-trained on WebText, a dataset of 8 million web pages that includes text from 45 million website links. The primary objective of GPT-2 is to predict the next word given all previous words within a text, which lets it generate human-like text. As a result, GPT-2 is capable of understanding and generating text across a variety of tasks without task-specific training data, making it a powerful and versatile language model.

## 2. Architecture

GPT-2 is based on the Transformer architecture, which was introduced in the paper "Attention is All You Need" by Vaswani et al. Unlike recurrent or convolutional architectures, Transformer models process input sequences in parallel, enabling faster training and inference times. GPT-2, in particular, utilizes the decoder-only variant of the Transformer, and it is trained to minimize the negative log likelihood of the predicted word, given the preceding words.
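
The training objective mentioned above, the negative log likelihood of each word given the preceding words, can be sketched on a toy example. The function names, the 3-token vocabulary, and the logits are all made up for illustration; in practice the logits come from the model and the loss is computed by the training framework.

```python
import math

def log_softmax(logits):
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]

def next_token_nll(logits_per_position, token_ids):
    """Average negative log likelihood of each token given the tokens before it."""
    losses = []
    # The logits at position t are scored against the token at position t + 1.
    for logits, target in zip(logits_per_position[:-1], token_ids[1:]):
        losses.append(-log_softmax(logits)[target])
    return sum(losses) / len(losses)

# Toy vocabulary of 3 tokens and a 3-token sequence; the logits are made up.
token_ids = [0, 2, 1]
logits = [[0.1, 0.2, 3.0],   # after token 0, the model favours token 2 (correct)
          [0.0, 2.5, 0.3],   # after token 2, the model favours token 1 (correct)
          [1.0, 1.0, 1.0]]   # the prediction after the last token has no target here
loss = next_token_nll(logits, token_ids)
print(round(loss, 4))
```

A model that assigned equal probability to every token would score `log(3)` on this vocabulary, so a lower value indicates that the predictions carry real information about the next token.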

## 3. Pre-training and Fine-tuning

### Pre-training
GPT-2 is pre-trained on a large corpus of text data. During this phase, the model learns to predict the next word in a sequence given the previous words. It captures a wide range of language patterns and structures.

### Fine-tuning
Although GPT-2 can be used out-of-the-box for a variety of tasks, fine-tuning it on a smaller, task-specific dataset can further improve performance. Fine-tuning adjusts the pre-trained parameters to better suit the specific task at hand.



## 4. Hands-on: Using GPT-2 for a Simple Task

### Step 1: Import the Required Packages and Libraries

In [11]:
# Import necessary libraries
from transformers import GPT2Tokenizer, GPT2LMHeadModel

### Step 2: Loading the Model

In [12]:
# Load a pre-trained model and tokenizer
model_name = "gpt2"
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)


### Step 3: Function for Generating Text

In [13]:
# Define a function to generate text
def generate_text(prompt, max_length=100):
    # The tokenizer returns both input_ids and an attention_mask
    inputs = tokenizer(prompt, return_tensors="pt")
    # GPT-2 has no pad token, so reuse the end-of-sequence token for padding
    outputs = model.generate(**inputs, max_length=max_length, num_return_sequences=1,
                             pad_token_id=tokenizer.eos_token_id)
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_text

### Step 4: Test the Function

In [14]:
# Test the function
prompt = "Once upon a time, there was a village where."
generated_story = generate_text(prompt)
print(generated_story)


Once upon a time, there was a village where.

"I'm not sure if it's a village or not, but it's a village. It's a village. It's a village. It's a village. It's a village. It's a village. It's a village. It's a village. It's a village. It's a village. It's a village. It's a village. It's a village. It's a village. It's a

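The repeated "It's a village." in the output above is typical of greedy decoding, where the single most likely token is chosen at every step. Sampling from the distribution is one common way to reduce such loops. The sketch below contrasts the two strategies on made-up logits; the function names and values are assumptions for illustration. In the notebook itself, `model.generate` accepts arguments such as `do_sample=True`, `top_k`, `temperature`, and `no_repeat_ngram_size` for the same purpose.

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=None, rng=random):
    """Pick a token index by temperature-scaled sampling, optionally restricted to the top_k logits."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]                     # keep only the k most likely tokens
    scaled = [logits[i] / temperature for i in order]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]   # unnormalised softmax weights
    return rng.choices(order, weights=weights, k=1)[0]

def greedy_next(logits):
    """Greedy decoding always returns the single most likely token."""
    return max(range(len(logits)), key=lambda i: logits[i])

toy_logits = [2.0, 1.8, 0.5, -1.0]   # made-up scores for a 4-token vocabulary
rng = random.Random(0)
greedy = [greedy_next(toy_logits) for _ in range(10)]
sampled = [sample_next(toy_logits, temperature=0.8, top_k=3, rng=rng) for _ in range(10)]
print(greedy)
print(sampled)
```

Greedy decoding returns the same index every time for the same logits, while sampling can pick any of the top three candidates in proportion to their probability.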

## 5. Conclusion

GPT-2 is a powerful and versatile language model that can be used for a wide range of natural language processing tasks. Its Transformer-based architecture allows it to generate coherent and contextually relevant text across a variety of domains. By pre-training on a large corpus and fine-tuning on a task-specific dataset, GPT-2 can be adapted to many different NLP tasks, showcasing the potential and versatility of modern language models.

# Introduction to LLAMA 




### 1. Background
The Llama (Large Language Model Meta AI) series, developed by Meta, represents state-of-the-art advancements in large language models. Building on the foundational Transformer architecture, which has been pivotal in natural language processing since 2017, Llama models are designed to provide robust performance across diverse language tasks. These models are optimized through techniques like supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to enhance alignment with human preferences for safety and helpfulness. They are released in multiple sizes and configurations to cater to both commercial and research purposes.

#### Key Features of Llama Models:

- **Model Sizes**: Llama models are available in various parameter sizes, designed to balance performance and resource efficiency. Newer versions often expand these options to cater to evolving needs.
- **Performance**: Each iteration builds on its predecessor, offering improved natural language understanding and generation capabilities.
- **Accessibility**: While earlier versions may have been limited to research use, newer iterations typically expand access for commercial applications, often under permissive licenses.
- **Training and Context Length**: Successive versions often benefit from larger and more diverse training datasets, extended context lengths, and advanced fine-tuning methods.
- **Alignment**: Focused on user-centric design, the models undergo extensive alignment processes to ensure safe and effective interactions.

### 2. Architecture
Llama models use a decoder-only Transformer architecture, the same family as the Generative Pre-trained Transformer (GPT) models. They are auto-regressive transformers that predict text one token at a time, optimizing for performance and usability. Iterative improvements in architecture and training methodologies distinguish each version of the Llama series, ensuring better alignment with user needs and advancing the state of the art in NLP.

### 3. Pre-training and Fine-tuning
#### Pre-training
Llama models are pre-trained on massive text corpora encompassing diverse domains, enabling them to build a strong foundational understanding of language. During pre-training, the models learn to predict the next word in a sequence, mastering linguistic patterns and structures across a broad spectrum of topics.

#### Fine-tuning
Fine-tuning tailors a pre-trained Llama model to specific tasks, enhancing its utility in targeted applications. This process adjusts the model's parameters using smaller, task-specific datasets, significantly improving its performance for specialized use cases.

### 4. Quantized Models and Community Contributions
The Hugging Face community and similar platforms offer quantized versions of Llama models, which optimize them for resource-constrained environments like GPUs on platforms such as Google Colab. Quantization reduces model size and computational requirements without significant loss of accuracy, making Llama models accessible for broader use cases.
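
To give a feel for what quantization does, the sketch below applies symmetric 8-bit quantization to a handful of weights. It is a simplified illustration and not the scheme used by any specific Llama release: the function names and weight values are made up, and production formats typically quantize in small blocks with one scale per block.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0   # avoid a zero scale for all-zero input
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.003, 0.88, -0.5]   # made-up weight values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(scale, 4), round(max_error, 4))
```

Each weight is stored as one small integer plus a shared scale, so the memory footprint shrinks while the rounding error per weight stays within half a quantization step.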

This notebook uses the **Llama 3.2 1B Instruct** model, but you can switch to other versions by changing the `model_id` field.



## Hands-on: Using Llama-3.2-1B for a Simple Task




### Step 1: Install All the Required Packages

In [15]:
# GPU llama-cpp-python
# The llama-cpp-python bindings provide simple access to using llama.cpp from within Python.
#!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.78 numpy==1.23.4 --force-reinstall --upgrade --no-cache-dir --verbose

In [16]:
# Gated checkpoint: if access is restricted, fill in the form at https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct
model_id = "meta-llama/Llama-3.2-1B-Instruct"

# Alternative checkpoint; this assignment overrides the one above
model_id = "unsloth/Llama-3.2-1B-Instruct"

### Step 2: Loading the Model

In [17]:
import torch
from transformers import pipeline

# model_id is defined in the previous cell
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a great medical advisor.!"},
    {"role": "user", "content": "Can you suggest me some healthy diet?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])


{'role': 'assistant', 'content': "I'd be happy to suggest some healthy diets for you. Here are a few options:\n\n**1. Mediterranean Diet:**\n\nThis diet emphasizes whole grains, fruits, vegetables, lean proteins, and healthy fats, such as those found in olive oil. It's been shown to reduce the risk of heart disease, type 2 diabetes, and certain cancers.\n\n**2. Plant-Based Diet:**\n\nThis diet focuses on plant-based foods, such as fruits, vegetables, whole grains, and legumes. It's been shown to reduce the risk of heart disease, type 2 diabetes, and certain cancers.\n\n**3. Flexitarian Diet:**\n\nThis diet is primarily vegetarian but allows for occasional consumption of lean meats. It's been shown to reduce the risk of heart disease and type 2 diabetes.\n\n**4. DASH Diet (Dietary Approaches to Stop Hypertension):**\n\nThis diet emphasizes whole grains, fruits, vegetables, lean proteins, and low-fat dairy. It's been shown to reduce blood pressure and improve overall health.\n\n**5. Keto


## 5. Conclusion
Llama models excel in various natural language processing tasks due to their extensive training and fine-tuning. With improved accessibility for commercial use, they demonstrate the evolving potential and adaptability of modern language models in addressing real-world challenges.

# Introduction to FairSense-AI

FairSense-AI is a multimodal bias detection tool that finds biases in text and images. It is optimized with green AI methods and designed for ease of use.



### **FairSense-AI Resources**

#### 1. **Documentation and Overview**
   - **Website**: [FairSense-AI Documentation](https://vectorinstitute.github.io/FairSense-AI/)
   - **Description**: A comprehensive guide to understanding and implementing fairness in AI systems. This site provides insights into the methodology, applications, and best practices for using FairSense-AI.

#### 2. **Python Package**
   - **PyPI Page**: [Fair-Sense-AI Python Package](https://pypi.org/project/fair-sense-ai/)
   - **Description**: The official Python package for FairSense-AI, designed for seamless integration into AI pipelines. It provides tools for fairness evaluation, bias mitigation, and reporting.

#### **Key Features**
   - **User-Friendly Interface**: Simplifies the fairness analysis process.
   - **Comprehensive Metrics**: Includes multiple fairness metrics tailored for various AI models.
   - **Customizable Pipelines**: Enables adaptation to diverse datasets and models.

#### **Get Started**
   - **Install the Package**:
     ```bash
     pip install Fair-Sense-AI
     ```
   - **Explore the Documentation**: Visit the [documentation site](https://vectorinstitute.github.io/FairSense-AI/) to learn how to use the package effectively.

#### **Applications**
   - Evaluating and improving fairness in machine learning models.
   - Supporting ethical AI development across industries such as healthcare, finance, and education.

--- 



!pip install fair-sense-ai
!pip show fair-sense-ai

# How to Run This Colab Notebook

### Steps to Run the Notebook
1. **Open the Notebook:**
   - Click on this [link](https://colab.research.google.com/drive/1en8JtZTAIa5MuV5OZWYNteYl95Ql9xy7?usp=sharing) to open the notebook in Google Colab.

2. **Set Up Your Environment:**
   - Ensure you're signed in to your Google account.
   - Once opened, Colab will automatically provision a runtime environment.

3. **Connect to the Runtime:**
   - Click the "Connect" button in the top-right corner of the Colab interface to initialize the environment.

4. **Install Required Dependencies:**
   - The notebook might include cells with `!pip install` commands to set up required libraries. Run these cells first to install dependencies.

5. **Run All Cells:**
   - To execute all cells in the notebook:
     - Go to **Runtime > Run all** in the menu bar.
   - Alternatively, run individual cells by selecting a cell and pressing `Shift + Enter`.

6. **Customize Execution (if needed):**
   - Modify input cells or parameters as required before running the notebook.

7. **Save Your Work:**
   - To save a copy of the notebook:
     - Click **File > Save a 
