<a href="https://colab.research.google.com/github/carldomond7/pokemonlab/blob/main/Inference_Engine.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install cython



In [None]:
import os
# Kill the kernel process so Colab restarts the runtime after the install above
os.kill(os.getpid(), 9)

In [1]:

!pip install -q -U torch==2.1.0 datasets transformers tensorflow langchain playwright html2text sentence_transformers faiss-cpu
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 trl==0.4.7


import os
import torch
from transformers import (
  AutoTokenizer,
  AutoModelForCausalLM,
  BitsAndBytesConfig,
  pipeline,
  AutoConfig
)


from langchain.text_splitter import CharacterTextSplitter
from langchain.document_transformers import Html2TextTransformer
from langchain.document_loaders import AsyncChromiumLoader

from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

from langchain.prompts import PromptTemplate
from langchain.schema.runnable import RunnablePassthrough
from langchain.llms import HuggingFacePipeline
from langchain.chains import LLMChain

import nest_asyncio
#################################################################
# Tokenizer
#################################################################

model_name='mistralai/Mistral-7B-Instruct-v0.1'

model_config = AutoConfig.from_pretrained(
    model_name
)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)


#################################################################
# bitsandbytes parameters
#################################################################

# Activate 4-bit precision base model loading
use_4bit = True

# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"

# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"

# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

#################################################################
# Set up quantization config
#################################################################
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

#################################################################
# Load the pre-trained model with 4-bit quantization
#################################################################
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
)


text_generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    do_sample=True,  # temperature only takes effect when sampling is enabled
    temperature=0.9,
    repetition_penalty=1.1,
    return_full_text=True,
    max_new_tokens=1000,
)

mistral_llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

!playwright install
!playwright install-deps

# Allow nested event loops so the async loader can run inside the notebook's own loop
nest_asyncio.apply()

# Articles to index
articles = ["https://en.wikipedia.org/wiki/One_Piece",
            "https://www.reddit.com/r/OnePiece/comments/zx23m5/zoro_vs_sanji/",
            "https://www.cbr.com/one-piece-zoro-vs-sanji-fight-winner/#:~:text=Starting%20with%20their%20physical%20abilities,durable%20as%20Sanji's%20any%20longer",
            "https://blox-fruits.fandom.com/f/p/4400000000000182274"]

# Scrape the pages listed above with headless Chromium
loader = AsyncChromiumLoader(articles)
docs = loader.load()

# Converts HTML to plain text
html2text = Html2TextTransformer()
docs_transformed = html2text.transform_documents(docs)

# Chunk text
text_splitter = CharacterTextSplitter(chunk_size=1000,
                                      chunk_overlap=0)
chunked_documents = text_splitter.split_documents(docs_transformed)

# Load chunked documents into the FAISS index
vs = FAISS.from_documents(chunked_documents,
                          HuggingFaceEmbeddings(model_name='sentence-transformers/all-mpnet-base-v2'))

retriever = vs.as_retriever()

# Create prompt template
prompt_template = """
### [INST] Instruction: Answer the question based on your One Piece (Anime) knowledge. Here is context to help:

{context}

### QUESTION:
{question} [/INST]
 """

# Create prompt from prompt template
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template,
)

# Create llm chain
llm_chain = LLMChain(llm=mistral_llm, prompt=prompt)

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | llm_chain
)

rag_chain.invoke("In a contest of speed who would win, Zoro or Sanji?")



The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



`low_cpu_mem_usage` was None, now set to True since model is quantized.



You are calling `save_pretrained` to a 4-bit converted model, but your `bitsandbytes` version doesn't support it. If you want to save 4-bit models, make sure to have `bitsandbytes>=0.41.3` installed.


Downloading Chromium 121.0.6167.57 (playwright build v1097) from https://playwright.azureedge.net/builds/chromium/1097/chromium-linux.zip




Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


{'context': [Document(page_content="While the fight would be close, ultimately, Zoro would take the win. This is\npretty hard to determine given how close they seem to be in strength, but even\nso, the series has given enough proof as to why Zoro would win in a fight\nbetween the two. Starting with their physical abilities, Sanji would actually\nbe the one who wins in this aspect. Regardless of how strong his body is or\nhow much pain Zoro can tolerate, his body just isn't as durable as Sanji's any\nlonger. While the series has implied on multiple occasions that Zoro can take\nmore of a beating than Sanji could, since his Germa 66 enhancements awakened,\nit's become significantly harder to hurt Sanji.", metadata={'source': "https://www.cbr.com/one-piece-zoro-vs-sanji-fight-winner/#:~:text=Starting%20with%20their%20physical%20abilities,durable%20as%20Sanji's%20any%20longer"}),
  Document(page_content="Finally, there's the matter of their Haki. If it were just a matter of the\nstandard t
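
The `context` value in the output above is a list of `Document` objects, so the prompt receives their full `repr` (including `metadata` and escaped `\n` sequences) rather than clean text. A minimal sketch of one way to flatten retrieved documents into a plain string before they reach the prompt is shown here. The `Doc` dataclass is a hypothetical stand-in for LangChain's `Document`, and `format_docs` and its `max_chars` budget are illustrative names, not part of the original notebook.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (assumption for this sketch)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs, max_chars=4000):
    """Join the text of retrieved docs into one context string.

    Whitespace inside each document is collapsed, documents are separated by a
    blank line, and documents that would push the total past max_chars are dropped.
    """
    parts = []
    used = 0
    for doc in docs:
        text = " ".join(doc.page_content.split())  # collapse newlines and runs of spaces
        if used + len(text) > max_chars:
            break
        parts.append(text)
        used += len(text)
    return "\n\n".join(parts)


docs = [Doc("Zoro would\ntake the win."), Doc("Sanji is\nfaster.")]
context = format_docs(docs)
print(context)
```

In the chain above, a function like this could plausibly be wired in as `{"context": retriever | format_docs, "question": RunnablePassthrough()}`, assuming the installed LangChain version coerces plain functions into runnables.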

In [None]:
!pip show transformers
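
Because the install cell pins several packages and the log above warns about the `bitsandbytes` version, it can help to print every relevant version at once instead of running `pip show` per package. The following is a small sketch using only the standard library; the `installed_versions` helper and the `PINNED` list are illustrative names chosen here, not something the notebook defines.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: installed version string, or None if not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


# Packages pinned in the install cell (list assembled for this sketch)
PINNED = ["torch", "transformers", "accelerate", "peft", "bitsandbytes", "trl"]

for name, ver in installed_versions(PINNED).items():
    print(f"{name}: {ver or 'not installed'}")
```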