# **Censored Crawled Conversation**

**Text-To-Text Chatbots to Demonstrate Output Differences Between Models Trained on Filtered/Unfiltered Datasets for HSS4 - The Modern Context: Select Figures and Topics**

Some notes regarding the code:
*   Run the installations at the bottom of the page first. (To run a cell, press the play button on its left side!)
*   Run T5 or Flan-T5 one at a time; your computer probably can't handle both at once.
*   DO NOT use punctuation or numbers in your messages, or you will not get a response.
*   Errors of any kind? Do the following:
    *   Open the Runtime menu at the top of the page and click "Disconnect and delete runtime".
    *   Reconnect, then check in the top right corner that you are connected.
    *   Once the runtime has finished disconnecting and you are reconnected, run the installations again.
    *   Run whichever model (T5 or Flan-T5) you want!
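
The note above about punctuation and numbers can also be handled in code by cleaning a message before it is sent to a model. The following is a minimal sketch, not part of the original cells: `sanitize_message` is a hypothetical helper name, and it assumes that dropping every character that is not a letter or whitespace is an acceptable way to follow that rule.

```python
def sanitize_message(text: str) -> str:
    """Keep only letters and whitespace, then collapse repeated whitespace."""
    # Drop digits, punctuation, and symbols (assumption: simply deleting them is acceptable).
    kept = "".join(ch for ch in text if ch.isalpha() or ch.isspace())
    # Collapse runs of whitespace and trim both ends.
    return " ".join(kept.split())


print(sanitize_message("Hello, world! It's 2024."))
```

To use it, the line `text = input(">> You:")` in either model cell could be wrapped as `text = sanitize_message(input(">> You:"))`. Note that deleting characters joins hyphenated words and contractions (for example, "It's" becomes "Its").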





In [None]:
# @title T5 (C4)
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

model_name = "google/t5-v1_1-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
print("T5 (C4)")
for step in range(1):  # change the number in range to get a longer conversation
    text = input(">> You:")
    # The T5 tokenizer appends the end-of-sequence token itself, so none is added by hand.
    input_ids = tokenizer.encode(text, return_tensors="pt")
    # From the second turn on, prepend the previously chosen response as context.
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    chat_history_ids_list = model.generate(
        bot_input_ids,
        max_length=1000,
        do_sample=True,
        top_p=0.95,
        top_k=50,
        temperature=0.75,
        num_return_sequences=5,
        pad_token_id=tokenizer.eos_token_id
    )
    for i in range(len(chat_history_ids_list)):
        # Encoder-decoder models return only the generated tokens (the prompt is not
        # echoed back), so decode the whole sequence instead of slicing off the input length.
        output = tokenizer.decode(chat_history_ids_list[i], skip_special_tokens=True)
        print(f"T5 {i}: {output}")
    choice_index = int(input("Choose the response you want for the next input: "))
    # Keep the chosen response as a batch of one to use as history on the next turn.
    chat_history_ids = torch.unsqueeze(chat_history_ids_list[choice_index], dim=0)

In [None]:
# @title Flan-T5 (Substitute Common Crawl)
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

model_name = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
print("Flan-T5 (Substitute Common Crawl)")
for step in range(1):  # change the number in range to get a longer conversation
    text = input(">> You:")
    # The T5 tokenizer appends the end-of-sequence token itself, so none is added by hand.
    input_ids = tokenizer.encode(text, return_tensors="pt")
    # From the second turn on, prepend the previously chosen response as context.
    bot_input_ids = torch.cat([chat_history_ids, input_ids], dim=-1) if step > 0 else input_ids
    chat_history_ids_list = model.generate(
        bot_input_ids,
        max_length=1000,
        do_sample=True,
        top_p=0.95,
        top_k=50,
        temperature=0.75,
        num_return_sequences=5,
        pad_token_id=tokenizer.eos_token_id
    )
    for i in range(len(chat_history_ids_list)):
        # Encoder-decoder models return only the generated tokens (the prompt is not
        # echoed back), so decode the whole sequence instead of slicing off the input length.
        output = tokenizer.decode(chat_history_ids_list[i], skip_special_tokens=True)
        print(f"Flan-T5 {i}: {output}")
    choice_index = int(input("Choose the response you want for the next input: "))
    # Keep the chosen response as a batch of one to use as history on the next turn.
    chat_history_ids = torch.unsqueeze(chat_history_ids_list[choice_index], dim=0)

In [None]:
# @title Installations (Run This First!)
!pip install transformers
!pip install sentencepiece
!pip install torch