In [24]:
import warnings
warnings.filterwarnings('ignore')

In [25]:
import gradio as gr
from transformers import pipeline, AutoTokenizer

# Load model and tokenizer
model_path = "models/llama3-8b-merged-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_path)

llm = pipeline(
    "text-generation",
    model=model_path,
    tokenizer=tokenizer,  # reuse the tokenizer loaded above instead of loading it again
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.1,
    torch_dtype="auto",
    device_map="auto",
    pad_token_id=tokenizer.eos_token_id
)

# Initial context
conversation_context = (
    "You are Michael Fried, a renowned art critic and historian. "
    "You respond in your own intellectual and formal voice, referencing your essays and ideas."
)

Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████| 2/2 [00:03<00:00,  1.70s/it]
Device set to use cuda:0


In [26]:
llm("what is art")

[{'generated_text': 'what is art, that would be the “universally valid criterion” for separating it from non-art. For this reason Schopenhauer’s theory of value has its own kind of ontological primacy and ought to count more than any empirical criterion in our aesthetic judgments. It seems to me that Fried is not just being playful when he claims that “the Kantian perspective on the question of how we judge works of art is ‘disinterested’.” What Fried means by the aestheticism of modernism (and which we are to understand as a form of anti-aestheticism) is that what matters aesthetically is how a work instantiates a certain normative or axiological category (in this case, the category of anti-theatricality). We can know whether something is an aesthetic object through its instantiation of a particular kind, rather than by trying to understand the subjective reactions it produces in us. The aesthetic quality of a work is determined by its kind. That is why Fried sees the work of art as a
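
The call above passes the bare string to the model, so the output reads as a continuation of running text rather than an answer, and `conversation_context` is never used. A minimal sketch of prepending the persona to the question follows. `build_prompt` is a hypothetical helper, and the "Question:/Answer:" layout is an assumption for illustration, not the model's official chat template.

```python
def build_prompt(context: str, question: str) -> str:
    """Combine a persona description and a user question into one prompt string."""
    # Hypothetical layout: persona first, then the question, then an answer cue.
    return f"{context.strip()}\n\nQuestion: {question.strip()}\nAnswer:"


conversation_context = (
    "You are Michael Fried, a renowned art critic and historian. "
    "You respond in your own intellectual and formal voice, referencing your essays and ideas."
)

prompt = build_prompt(conversation_context, "What is art?")
print(prompt)
```

The resulting string could then be passed to the pipeline, for example `llm(prompt, return_full_text=False)`, so that only the newly generated text comes back.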

In [27]:
import os
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

folder_path = "./data/fried/"
all_docs = []

for file in os.listdir(folder_path):
    if file.endswith(".pdf"):
        pdf_path = os.path.join(folder_path, file)
        loader = PyPDFLoader(pdf_path)
        docs = loader.load()
        all_docs.extend(docs)

# Split all loaded docs
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
split_docs = splitter.split_documents(all_docs)

Ignoring wrong pointing object 55 0 (offset 0)
Ignoring wrong pointing object 35 0 (offset 0)
Ignoring wrong pointing object 45 0 (offset 0)
Ignoring wrong pointing object 40 0 (offset 0)
Ignoring wrong pointing object 40 0 (offset 0)
Ignoring wrong pointing object 113 0 (offset 0)
Ignoring wrong pointing object 79 0 (offset 0)
Ignoring wrong pointing object 29 0 (offset 0)
incorrect startxref pointer(1)
parsing for Object Streams
Ignoring wrong pointing object 23 0 (offset 0)
Ignoring wrong pointing object 85 0 (offset 0)
Ignoring wrong pointing object 36 0 (offset 0)
Ignoring wrong pointing object 40 0 (offset 0)
Ignoring wrong pointing object 46 0 (offset 0)
Ignoring wrong pointing object 28 0 (offset 0)
Ignoring wrong pointing object 29 0 (offset 0)
Ignoring wrong pointing object 62 0 (offset 0)
Ignoring wrong pointing object 23 0 (offset 0)
Ignoring wrong pointing object 39 0 (offset 0)
Ignoring wrong pointing object 92 0 (offset 0)
Ignoring wrong pointing object 36 0 (offset 0)

In [28]:
len(split_docs)

33567
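
The chunk count above follows from `chunk_size=500` and `chunk_overlap=100`: each chunk shares its last 100 characters with the start of the next one. The sketch below shows that overlap with a plain sliding window. It is a simplification, not the splitter's actual algorithm (the real `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs, lines, and spaces), and `split_with_overlap` is a hypothetical name.

```python
def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Cut text into fixed-size windows where consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Toy input: 1200 characters of repeating digits.
sample = "".join(str(i % 10) for i in range(1200))
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])
```

With a step of 400 characters, the windows start at offsets 0, 400, and 800, so the last 100 characters of one chunk reappear at the start of the next.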

In [29]:
from langchain_huggingface import HuggingFaceEmbeddings
embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
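
The embedding model maps each chunk to a fixed-length vector (384 dimensions for `all-MiniLM-L6-v2`), and retrieval then amounts to finding the stored vectors closest to the query vector. The sketch below illustrates that idea with made-up 3-dimensional vectors and Euclidean (L2) distance. The vectors, `l2_distance`, and `top_k` are hypothetical, and the metric a given vector store uses depends on how it is configured.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k document vectors closest to the query vector."""
    ranked = sorted(range(len(doc_vecs)), key=lambda i: l2_distance(query_vec, doc_vecs[i]))
    return ranked[:k]


# Toy vectors standing in for real embeddings.
toy_docs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.8, 0.2, 0.1]]
toy_query = [1.0, 0.0, 0.0]

nearest = top_k(toy_query, toy_docs, k=2)
print(nearest)
```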

In [30]:
from langchain_community.vectorstores import FAISS

db = FAISS.from_documents(split_docs, embedding)
db.save_local("faiss_index/")
# Reload from disk; only deserialize index files you created yourself.
retriever = FAISS.load_local("faiss_index/", embedding, allow_dangerous_deserialization=True).as_retriever()
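
The cell above re-embeds all chunks on every run and only then reloads the index it has just saved. One way to skip the rebuild when an index is already on disk is a small load-or-build helper, sketched below. `load_or_build` is a hypothetical name, and checking only that the directory exists is an assumption; a stricter check could look for the index files themselves.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(path: str, load: Callable[[str], T], build: Callable[[], T]) -> T:
    """Call load(path) if the directory exists, otherwise call build()."""
    if os.path.isdir(path):
        return load(path)
    return build()


# Demonstration with stand-in callables instead of a real vector store.
with tempfile.TemporaryDirectory() as tmp:
    existing = load_or_build(tmp, load=lambda p: "loaded", build=lambda: "built")
    missing = load_or_build(
        os.path.join(tmp, "no_index"), load=lambda p: "loaded", build=lambda: "built"
    )

print(existing, missing)
```

In this notebook, `load` could wrap the `FAISS.load_local(...)` call and `build` could wrap `FAISS.from_documents(...)` followed by `save_local(...)`.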

In [31]:
from langchain.chains import RetrievalQA
from langchain_huggingface import HuggingFacePipeline
from langchain.prompts import PromptTemplate

llm_pipeline = HuggingFacePipeline(pipeline=llm)


# Keep the template text flush left so no stray indentation ends up in the prompt.
custom_prompt = PromptTemplate.from_template(
    """You are Michael Fried, a renowned art critic and historian.
You respond in your own intellectual and formal voice, referencing your essays and ideas.

{context}

Question: {question}
Helpful Answer:"""
)


rag_chain = RetrievalQA.from_chain_type(
    llm=llm_pipeline,
    retriever=retriever,
    return_source_documents=False,
    chain_type_kwargs={"prompt": custom_prompt}
)
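
Assuming the chain runs with the default "stuff" chain type, the retrieved chunks are concatenated and substituted for `{context}` before the whole prompt goes to the model. The sketch below reproduces that step with plain string formatting. `stuff_prompt`, the toy chunks, and the blank-line separator are assumptions for illustration, not the library's internals.

```python
TEMPLATE = (
    "You are Michael Fried, a renowned art critic and historian.\n"
    "You respond in your own intellectual and formal voice, referencing your essays and ideas.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def stuff_prompt(chunks, question, separator="\n\n"):
    """Join the retrieved chunks into one context block and fill the template."""
    context = separator.join(chunks)
    return TEMPLATE.format(context=context, question=question)


# Toy passages standing in for retrieved documents.
toy_chunks = ["First retrieved passage.", "Second retrieved passage."]
filled = stuff_prompt(toy_chunks, "What is absorption?")
print(filled)
```

Printing the filled prompt this way makes it easier to see how much of the model's context window the retrieved chunks consume.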


In [32]:
query = "What can you tell me about the nature of form in Caravaggio’s painting?"
# query = "who are you?"
result = rag_chain.invoke({"query": query})

print(result["result"].split("Helpful Answer:")[-1].strip())
# print(result["source_documents"])

Because he was so interested in realist effects he often cut up his models in order to present them more effectively. For instance, he often sliced off their hands, because if you don’t see them they look just the same as if they had been the best dressed. Other parts of the body were altered in various ways. His technique is also very distinctive... In fact, you might say that a lot of his compositions are about the relationship between form and matter, or rather about how form relates to what we might call its potential energy. The composition is driven by the forms in a way that makes their form become more important than the bodies themselves. You could even argue that his compositional strategies are less about the content of the painting, the subject, than about the presentation of the subject, although some of his subjects, like David playing with Cupid after the Battle of Schipwa
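
The answer above stops mid-sentence because generation ends at `max_new_tokens`. A small post-processing step can strip the prompt and drop the unfinished trailing sentence, as sketched below. `extract_answer` is a hypothetical helper, and the sentence-end rule (a `.`, `!`, or `?` followed by whitespace or the end of the text) is a rough heuristic that can misfire on abbreviations.

```python
import re


def extract_answer(generated: str, marker: str = "Helpful Answer:") -> str:
    """Keep the text after the last marker and trim an unfinished final sentence."""
    answer = generated.split(marker)[-1].strip()
    # Sentence terminators that are followed by whitespace or the end of the text.
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", answer)]
    if not ends:
        return answer  # nothing that looks like a complete sentence; return as-is
    return answer[: ends[-1]]


# Toy generation standing in for result["result"].
raw = (
    "Question: q\nHelpful Answer: Form matters. It is driven by the forms. "
    "Some of his subjects, like David"
)
print(extract_answer(raw))
```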
