- Base code comes from Maarten Grootendorst, modified for my use case
- Doesn't use Llama.cpp to implement quantization
  - Uses the 4-bit quantization scheme from QLoRA, which the QLoRA paper reports reduces the average memory requirements of finetuning a 65B parameter model from >780GB of GPU memory to <48GB without degrading the runtime or predictive performance compared to a 16-bit fully finetuned baseline
- Embedding model: `SentenceTransformer("BAAI/bge-small-en")`
- LLM: `model_id = 'meta-llama/Llama-2-13b-chat-hf'`
- Only runs on A100 / V100 will not work even with high ram
- RUNTIME: Approx 11 minutes on Colab Pro+

- Successfully identified topics within tweets.


GPU Notes:
- The A100 performed between 1.4x – 1.9x faster than the V100 for these benchmarks.

In [1]:
#see what GPU you've been assigned at any time by executing the following cell
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

Mon Mar 18 20:33:08 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA A100-SXM4-40GB          Off | 00000000:00:04.0 Off |                    0 |
| N/A   32C    P0              47W / 400W |      2MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [2]:
#You can see how much memory you have available at any time by running the following code cell.
#If the execution result of running the code cell below is "Not using a high-RAM runtime", then you can enable a high-RAM runtime via Runtime > Change runtime type in the menu. Then select High-RAM in the Runtime shape dropdown.
#After, re-execute the code cell.
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))

if ram_gb < 20:
  print('Not using a high-RAM runtime')
else:
  print('You are using a high-RAM runtime!')

Your runtime has 89.6 gigabytes of available RAM

You are using a high-RAM runtime!


In [3]:
%%capture
!pip install bertopic datasets accelerate bitsandbytes xformers adjustText

# 📄 **Data**

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [5]:
from google.colab import files
uploaded = files.upload()

Saving TweetBatch3.csv to TweetBatch3.csv


In [6]:
import pandas as pd
import io

tweets = pd.read_csv(io.BytesIO(uploaded['TweetBatch3.csv']))

In [7]:
tweets.drop(['cleaned'], axis=1, inplace=True)
tweets.shape

(34993, 1)

In [8]:
tweets.head()

Unnamed: 0,text
0,@ReallyAmerican1 #Roevember and\n#ForThePeople...
1,RT @sandibachom: IS THIS THING ON???!!This is ...
2,RT @sandibachom: IS THIS THING ON???!!This is ...
3,RT @tleehumphrey: Today is the beginning of th...
4,RT @AdamKinzinger: Mitch McConnell.\nKevin McC...


In [9]:
#turn tweet column into a list of strings
tweet_list = tweets["text"].tolist()

# 🤗 HuggingFace Hub Credentials
We log in with our HuggingFace credentials so that this environment knows we have permission to download the Llama 2 model we are interested in.

In [10]:
from huggingface_hub import notebook_login
notebook_login()


# **Llama 2**

In [11]:
from torch import cuda

model_id = 'meta-llama/Llama-2-13b-chat-hf'
device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

print(device)

cuda:0


## **Optimization & Quantization**

- To load the 13-billion-parameter model, use 4-bit quantization, which compresses the weights and reduces the GPU memory required
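
As a rough back-of-the-envelope check of why 4-bit loading matters, the sketch below estimates the memory needed for the weights alone. It assumes a round 13 billion parameters and ignores activations, the KV cache, and quantization overhead, so the real footprint will be somewhat higher. `weight_memory_gb` is a helper name made up for this illustration.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 13e9  # assumption: a round 13 billion parameters

# Compare common storage precisions for the same parameter count
for bits in (32, 16, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Under these assumptions the 4-bit weights need roughly a quarter of the memory of 16-bit weights, which leaves headroom on a single GPU for the embedding model and generation.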

In [12]:
from torch import bfloat16
import transformers

# set quantization configuration to load large model with less GPU memory
# this requires the `bitsandbytes` library

bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,  # 4-bit quantization
    bnb_4bit_quant_type='nf4',  # Normalized float 4
    bnb_4bit_use_double_quant=True,  # Second quantization after the first
    bnb_4bit_compute_dtype=bfloat16  # Computation type
)

Definitions
* `load_in_4bit`
  * Loads the model weights in 4-bit precision instead of the usual 16- or 32-bit floating point
  * This greatly reduces the GPU memory needed to hold the model
* `bnb_4bit_quant_type`
  * The type of 4-bit precision. The QLoRA paper recommends 4-bit NormalFloat (`nf4`), so that is what we use
* `bnb_4bit_use_double_quant`
  * Performs a second quantization after the first, this time on the quantization constants, which further reduces the bits needed per parameter
* `bnb_4bit_compute_dtype`
  * The dtype used during computation (here `bfloat16`). Weights are stored in 4-bit and dequantized to this type for the matrix multiplications, which is faster than computing in 32-bit
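
To make the idea behind these settings more concrete, here is a minimal pure-Python sketch of blockwise absmax quantization to 4-bit codes. It is a simplification and not what `bitsandbytes` does internally: it uses uniformly spaced levels, whereas `nf4` uses non-uniform levels suited to normally distributed weights. The function names and the sample weights are made up for illustration.

```python
def quantize_block(values, bits=4):
    """Map floats to signed integer codes using one absmax scale per block."""
    levels = 2 ** (bits - 1) - 1                 # 7 for symmetric 4-bit codes
    scale = max(abs(v) for v in values) or 1.0   # fall back to 1.0 for an all-zero block
    codes = [round(v / scale * levels) for v in values]
    return codes, scale


def dequantize_block(codes, scale, bits=4):
    """Recover approximate floats from integer codes and the block scale."""
    levels = 2 ** (bits - 1) - 1
    return [c / levels * scale for c in codes]


weights = [0.12, -0.40, 0.05, 0.33, -0.27, 0.0, 0.21, -0.08]  # made-up values
codes, scale = quantize_block(weights)
restored = dequantize_block(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each block stores small integer codes plus one floating-point `scale`. Double quantization applies the same kind of compression to those per-block scales, and the compute dtype is the type the codes are expanded back into before a matrix multiplication.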



Using this configuration, load in the model and tokenizer

#### Load Tokenizer and Model

In [13]:
# Llama 2 Tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)

# Llama 2 Model
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map='auto',
)
model.eval()


LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 5120)
    (layers): ModuleList(
      (0-39): 40 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (k_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (v_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (o_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=5120, out_features=13824, bias=False)
          (up_proj): Linear4bit(in_features=5120, out_features=13824, bias=False)
          (down_proj): Linear4bit(in_features=13824, out_features=5120, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): Lla

Using the model and tokenizer, create a HuggingFace transformers pipeline for text generation:

In [14]:
# Our text generator
generator = transformers.pipeline(
    model=model, tokenizer=tokenizer,
    task='text-generation',
    # set temp low. Sets how creative the model is. low = not that creative.
    temperature=0.1,
    max_new_tokens=500,
    # give it a penalty for how much it repeats certain phrases
    repetition_penalty=1.1
)

## **Prompt Engineering**

Llama 2 chat prompts follow this template:

```python
"""
<s>[INST] <<SYS>>

{{ System Prompt }}

<</SYS>>

{{ User Prompt }} [/INST]

{{ Model Answer }}
"""
```

The template has two main components, the `{{ System Prompt }}` and the `{{ User Prompt }}`:
* The `{{ System Prompt }}` helps guide the model during a conversation.
* The `{{ User Prompt }}` is where we ask it a question.
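
The template can be assembled with a small helper, sketched below. `build_llama2_prompt` is a hypothetical name introduced here, and the whitespace handling (stripping each part, one blank line after the system block) is an assumption based on the template shown above, not something the notebook itself defines.

```python
def build_llama2_prompt(system_prompt: str, user_prompt: str) -> str:
    """Wrap a system prompt and a user prompt in the Llama 2 chat tags."""
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system_prompt.strip()}\n"
        "<</SYS>>\n\n"
        f"{user_prompt.strip()} [/INST]"
    )


example_text = build_llama2_prompt(
    "You are a helpful, respectful and honest assistant for labeling topics.",
    "Please create a short label for this topic.",
)
print(example_text)
```

The model's answer is expected to follow the closing `[/INST]` tag, which is why the helper ends the string there.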

### **Prompt Template**

In [15]:
# System prompt describes information given to all conversations
system_prompt = """
<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant for labeling topics.
<</SYS>>
"""

In [16]:
# Example prompt demonstrating the output we are looking for
example_prompt = """
I have a topic that contains the following documents:
- Traditional diets in most cultures were primarily plant-based with a little meat on top, but with the rise of industrial style meat production and factory farming, meat has become a staple food.
- Meat, but especially beef, is the worst food in terms of emissions.
- Eating meat doesn't make you a bad person, not eating meat doesn't make you a good one.

The topic is described by the following keywords: 'meat, beef, eat, eating, emissions, steak, food, health, processed, chicken'.

Based on the information about the topic above, please create a short label of this topic. Make sure to only return the label and nothing more.

[/INST] Environmental impacts of eating meat
"""

In [17]:
# Our main prompt with documents ([DOCUMENTS]) and keywords ([KEYWORDS]) tags
main_prompt = """
[INST]
I have a topic that contains the following documents:
[DOCUMENTS]

The topic is described by the following keywords: '[KEYWORDS]'.

Based on the information about the topic above, please create a short label of this topic. Make sure to only return the label and nothing more.
[/INST]
"""

Definitions
`[DOCUMENTS]` and `[KEYWORDS]`:

* `[DOCUMENTS]` contains the top 5 most relevant documents to the topic
* `[KEYWORDS]` contains the top 10 most relevant keywords to the topic as generated through c-TF-IDF
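
To illustrate how such tags could be filled in, here is a minimal sketch based on plain string replacement. It is not BERTopic's actual implementation. `fill_prompt`, the bullet formatting of the documents, and the sample inputs are all assumptions made for this example.

```python
def fill_prompt(template: str, documents: list[str], keywords: list[str]) -> str:
    """Substitute the [DOCUMENTS] and [KEYWORDS] tags in a prompt template."""
    doc_block = "\n".join(f"- {doc}" for doc in documents)
    return (
        template
        .replace("[DOCUMENTS]", doc_block)
        .replace("[KEYWORDS]", ", ".join(keywords))
    )


template = (
    "I have a topic that contains the following documents:\n"
    "[DOCUMENTS]\n\n"
    "The topic is described by the following keywords: '[KEYWORDS]'."
)
filled = fill_prompt(
    template,
    documents=["Meat has become a staple food.", "Beef has high emissions."],
    keywords=["meat", "beef", "emissions"],
)
print(filled)
```

The filled prompt has the same shape as the worked example given to the model, so the model sees one completed example followed by the same question about a new topic.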

The template is filled in for each topic. Combine the three parts into our final prompt:

In [19]:
#        who is Llama 2 + what output should look like + our question = create label of topic
prompt = system_prompt + example_prompt + main_prompt

# **BERTopic**

## **Preparing Embeddings**

In [20]:
from sentence_transformers import SentenceTransformer

# Pre-calculate embeddings
embedding_model = SentenceTransformer("BAAI/bge-small-en")
embeddings = embedding_model.encode(tweet_list, show_progress_bar=True)


## **Sub-models**

In [21]:
from umap import UMAP
from hdbscan import HDBSCAN

umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0, metric='cosine', random_state=42)
hdbscan_model = HDBSCAN(min_cluster_size=150, metric='euclidean', cluster_selection_method='eom', prediction_data=True)

Reduce the embeddings to 2 dimensions for visualization purposes:

In [22]:
# Pre-reduce embeddings for visualization purposes (proxy for what the embedding structure looks like)
reduced_embeddings = UMAP(n_neighbors=15, n_components=2, min_dist=0.0, metric='cosine', random_state=42).fit_transform(embeddings)

  warn(f"n_jobs value {self.n_jobs} overridden to 1 by setting random_state. Use no seed for parallelism.")


## **Representation Models**
BERTopic uses c-TF-IDF as the main representation, with [KeyBERT](https://maartengr.github.io/BERTopic/getting_started/representation/representation.html#keybertinspired), [MMR](https://maartengr.github.io/BERTopic/getting_started/representation/representation.html#maximalmarginalrelevance), and [Llama 2](https://maartengr.github.io/BERTopic/getting_started/representation/llm.html) available as additional representations.

This run focuses on the Llama 2 representation; KeyBERT and MMR are commented out.

In [23]:
from bertopic.representation import KeyBERTInspired, MaximalMarginalRelevance, TextGeneration

# KeyBERT
#keybert = KeyBERTInspired()

# MMR increases the diversity of the resulting keywords, e.g. if a topic has both "car" and "cars", it drops one of the redundant terms
#mmr = MaximalMarginalRelevance(diversity=0.3)

# Text generation with Llama 2
llama2 = TextGeneration(generator, prompt=prompt)

# All representation models
representation_model = {
    #"KeyBERT": keybert,
    "Llama2": llama2,
    #"MMR": mmr,
}

# 🔥 **Training**

In [24]:
from bertopic import BERTopic

topic_model = BERTopic(

  # Sub-models
  embedding_model=embedding_model,
  umap_model=umap_model,
  hdbscan_model=hdbscan_model,
  representation_model=representation_model,

  # Hyperparameters
  top_n_words=10,
  verbose=True
)

# Train model
topics, probs = topic_model.fit_transform(tweet_list, embeddings)


2024-03-18 21:08:04,358 - BERTopic - Dimensionality - Fitting the dimensionality reduction algorithm
2024-03-18 21:09:46,809 - BERTopic - Dimensionality - Completed ✓
2024-03-18 21:09:46,812 - BERTopic - Cluster - Start clustering the reduced embeddings
2024-03-18 21:09:51,334 - BERTopic - Cluster - Completed ✓
2024-03-18 21:09:51,354 - BERTopic - Representation - Extracting topics from clusters using representation models.
100%|██████████| 41/41 [00:58<00:00,  1.43s/it]
2024-03-18 21:10:51,181 - BERTopic - Representation - Completed ✓


Topics generated:

In [25]:
pd.set_option('display.max_columns', None)

In [26]:
# Show topics
topic_model.get_topic_info()

Unnamed: 0,Topic,Count,Name,Representation,Llama2,Representative_Docs
0,-1,6714,-1_the_rt_to_in,"[the, rt, to, in, of, january6thcommitteeheari...","[""Trump Election Lies"", , , , , , , , , ]",[RT @CNN: Former Attorney General Bill Barr ca...
1,0,5999,0_both_they_kevin_backed,"[both, they, kevin, backed, mitch, mccarthy, m...","[Politicians backtrack on criticisms of Trump,...",[RT @AdamKinzinger: Mitch McConnell.\nKevin Mc...
2,1,5987,1_january6thcommitteehearings_https_co_the,"[january6thcommitteehearings, https, co, the, ...","[""Trump Testimony Controversy"", , , , , , , , , ]",[RT @carolmswain: #Trump testifying before the...
3,2,1612,2_creating_hamill_correct_overthrowing,"[creating, hamill, correct, overthrowing, resi...","[Political resistance and activism, , , , , , ...",[RT @Resist_MAGA_GOP: Mark Hamill is correct. ...
4,3,1367,3_rudy_hatch_giuliani_meadows,"[rudy, hatch, giuliani, meadows, yet, new, rog...","[""Trump Coup Attempt"", , , , , , , , , ]",[RT @StandForBetter: 📺 NEW Video:\n\nTrump Los...
5,4,1175,4_demands_ja_deserves_unanimously,"[demands, ja, deserves, unanimously, it, histo...","[Trump Subpoena Vote, , , , , , , , , ]",[RT @AdamKinzinger: We just voted unanimously ...
6,5,1116,5_deploy_sec_defense_acting,"[deploy, sec, defense, acting, pathetic, perso...","[Chris Miller's Deployment Decision, , , , , ,...",[RT @sandibachom: IS THIS THING ON???!!This is...
7,6,1055,6_goaded_author_summoned_excuse,"[goaded, author, summoned, excuse, rioters, at...","[""Trump and the January 6th Attack"", , , , , ,...",[RT @AdamKinzinger: Trump is the author of the...
8,7,947,7_his_rejection_loss_break,"[his, rejection, loss, break, couldn, accept, ...","[Trump's attempted break of democracy, , , , ,...",[RT @AdamKinzinger: When he couldn't accept hi...
9,8,787,8_95kjex8faf_says_trumpcoupattempt_standforbetter,"[95kjex8faf, says, trumpcoupattempt, standforb...","[""Trump Coup Attempt Hearings"", , , , , , , , , ]",[RT @StandForBetter: This says it all.\n\n#Tru...


## Llama 2 Topic Names


In [27]:
llama2_labels = [label[0][0].split("\n")[0] for label in topic_model.get_topics(full=True)["Llama2"].values()]
topic_model.set_topic_labels(llama2_labels)
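
The list comprehension above takes the first entry of each topic's Llama 2 representation and keeps only the first line of its text. Some generated labels come back wrapped in double quotes, so a small cleanup step may be useful. The sketch below runs on a mock of the per-topic structure (a list of `(text, score)` pairs per topic id), where the scores are placeholders and `clean_label` is a hypothetical helper, not part of BERTopic.

```python
def clean_label(raw: str) -> str:
    """Keep the first line of a generated label and strip surrounding quotes."""
    first_line = raw.strip().split("\n")[0]
    return first_line.strip().strip('"').strip()


# Mock of the per-topic structure: topic id -> list of (text, score) pairs.
# Only the first pair carries the generated label; scores are placeholders.
mock_llama2_topics = {
    -1: [('"Trump Election Lies"', 1)] + [("", 0)] * 9,
    0: [("Politicians backtrack on criticisms of Trump", 1)] + [("", 0)] * 9,
}

cleaned = [clean_label(pairs[0][0]) for pairs in mock_llama2_topics.values()]
print(cleaned)
```

Unlike the original comprehension, this version strips leading whitespace before taking the first line, so a label that begins with a newline is not reduced to an empty string.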

In [28]:
llama2_labels

['"Trump Election Lies"',
 'Politicians backtrack on criticisms of Trump',
 '"Trump Testimony Controversy"',
 'Political resistance and activism',
 '"Trump Coup Attempt"',
 'Trump Subpoena Vote',
 "Chris Miller's Deployment Decision",
 '"Trump and the January 6th Attack"',
 "Trump's attempted break of democracy",
 '"Trump Coup Attempt Hearings"',
 "Trump and Tomi Ahonen's Twitter exchanges",
 'Law enforcement failures in addressing white nationalist threats',
 'Political tweets',
 'Legal disputes over Mar-a-Lago ownership',
 'Congressional leadership',
 "Fugitive Mama's Denial of FBI Operation",
 'Political controversies involving Tom Fitton and President Trump',
 'Political opinions and statements about Speaker Pelosi',
 '"Donald Trump\'s Election Lies"',
 'Trump Riot Coverup',
 'Political polarization and inflammatory rhetoric',
 'Trump arrest calls',
 'Political tweets about Nancy Pelosi',
 'Political tweets from Maya Wiley',
 '#January6thCommitteeHearings',
 "Trump's election fraud