In [142]:
import os
import re
from openai import OpenAI
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Importing the raw text from a Markdown file (extracted from a PDF parsed by llama)

In [201]:
with open("hegelpos.md", "r", encoding="utf8") as ff:
    data = ff.read()

In [202]:
data[:1000]

'\n        Preface\n\n¶1. It is customary to begin a work by explaining in a preface the aim that the author\nset himself in the work, his reasons for writing it, and the relationship in which he\nbelieves it to stand to other earlier or contemporary treatments of the same subject.\nIn the case of a philosophical work, however, such an explanation seems not only\nsuperfluous but, in view of the nature of the Thing,1                          even inappropriate and\nmisleading. For the sort of statement that might properly be made about philosophy\nin a preface—say, a historical            report   of the main direction and standpoint, of the\ngeneral content and results, a string of desultory assertions and assurances about the\ntrue—cannot be accepted as the way and manner in which to expound philosophical\ntruth. Also, philosophy moves essentially in the element2                            of universality that\nembraces the particular within itself, and this creates the impression, mo

In [145]:
type(data)

str

# Cleaning the text data a bit to make it easier for the LLM to parse

In [146]:
cleandata = data.strip()

In [147]:
cleandata = cleandata.replace("\n", " ")

In [148]:
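# Replace code points U+F000-U+F999 (mostly private-use glyphs, likely PDF extraction debris) with a space
pattern = '[\uF000-\uF999]'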

cleandata = re.sub(pattern, " ", cleandata)

In [149]:
# Collapse runs of whitespace into a single space (raw string avoids an invalid-escape warning)
pattern = r'\s+'

cleandata = re.sub(pattern, " ", cleandata)
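The cleaning steps above can be collected into one function, which makes them easier to re-run or test on a small sample. This is a minimal sketch; the name `clean_text` and the sample string are illustrative and do not appear in the original notebook.

```python
import re

def clean_text(raw):
    """Apply the same cleaning steps as the cells above, in the same order."""
    text = raw.strip()                           # drop leading/trailing whitespace
    text = text.replace("\n", " ")               # newlines become spaces
    text = re.sub("[\uF000-\uF999]", " ", text)  # stray glyphs become spaces
    text = re.sub(r"\s+", " ", text)             # collapse repeated whitespace
    return text

# Made-up sample containing a newline, a private-use glyph and extra spaces
sample = "\n   Preface\n\n¶1. It is customary\uF0B7 to   begin\n"
print(clean_text(sample))
```

The `¶` markers are left untouched, so the result can still be split into paragraphs afterwards.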

# Creating chunks using the existing aphorisms

In [150]:
splitdata = cleandata.split("¶")

In [151]:
splitdata[700]

'700. If we ask, what is the actual spirit which has the consciousness of its absolute essence in the religion of art, then the answer is that it is the ethical or the true spirit. This spirit is not only the universal substance of all singletons; since this substance has for actual consciousness the shape of consciousness, this means, in addition, that this substance, which has individualization, is known by the singletons as their own essence and product. So for them it is not the light-essence in whose unity the Being- for-self of self-consciousness is contained only negatively, only in a transitory way, and beholds the master of its actuality,—nor is it the consuming and relentless hatred of hostile peoples,—nor their subjugation to castes which together constitute the semblance of organization of a completed whole, but a whole that lacks the universal freedom of the individuals. No, this spirit is the free people in which custom constitutes the substance of all, whose actuality an
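Splitting on `¶` leaves the printed paragraph number at the start of each chunk. Here `splitdata[700]` happens to begin with `700.`, but the list index and the printed number are not guaranteed to agree everywhere, so it can help to read the number from the chunk itself. The sketch below shows one way to do that; `leading_number` is a hypothetical helper and the sample text is made up.

```python
import re

def leading_number(chunk):
    """Return the paragraph number a chunk starts with, or None if there is none."""
    match = re.match(r"\s*(\d+)\.", chunk)
    return int(match.group(1)) if match else None

# Tiny made-up sample in the same shape as the cleaned text
chunks = "Preface ¶1. It is customary ¶2. But the matter".split("¶")

for index, chunk in enumerate(chunks):
    print(index, leading_number(chunk), repr(chunk[:20]))
```

The first chunk (everything before the first `¶`) has no number, which is why the helper returns `None` rather than raising.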

# Creating API client from OpenAI

In [152]:
client = OpenAI(api_key="ENTER YOUR OWN API KEY HERE IF YOU WANT TO RUN IT")
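Rather than pasting a key into the notebook, it can be read from an environment variable. The sketch below assumes the key is stored in `OPENAI_API_KEY`; the helper name `load_api_key` is illustrative and not part of the original notebook or of the OpenAI library.

```python
import os

def load_api_key(env_var="OPENAI_API_KEY"):
    """Read the API key from the environment and fail early if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running the notebook")
    return key

# Possible usage in the notebook (left commented out so this sketch runs without the openai package):
# client = OpenAI(api_key=load_api_key())
```

This keeps the key out of the saved notebook and of any exported copy of it.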

In [153]:
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello world"}]
)

In [154]:
resp

ChatCompletion(id='chatcmpl-A4VpUmUK20iqyGNdmFzHz2Asmrk6x', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Hello! How can I assist you today?', role='assistant', function_call=None, tool_calls=None, refusal=None))], created=1725639028, model='gpt-3.5-turbo-0125', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=9, prompt_tokens=9, total_tokens=18))

In [155]:
resp.choices[0].message.content

'Hello! How can I assist you today?'

# Function to return text response from client query

In [156]:
def answerme(client, question):
    """Send a single user message to the chat model and return the reply text."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

In [157]:
print(answerme(client, "What is the port of Aysén famous for?"))

The port of Aysén in Chile is famous for being the gateway to the remote and beautiful Aysén Region, known for its stunning landscapes, glaciers, fjords, and outdoor recreational opportunities such as fishing, kayaking, and hiking.


# Function to embed a chunk of text

In [158]:
def get_embedding(client, text, model="text-embedding-ada-002"):
    """Return the embedding vector (a list of floats) for one piece of text."""
    text = text.replace("\n", " ")
    return client.embeddings.create(input=[text], model=model).data[0].embedding

# Vectorizing the text

In [161]:
# This only needs to be run once to obtain the vectors.

#vectors = []

#for chunk in splitdata:

    #try:
        #vectors.append(get_embedding(client, chunk))
    #except Exception:
        # Fall back to a zero vector so that row i still corresponds to splitdata[i]
        #vectors.append(np.zeros(1536))

In [162]:
#df = pd.DataFrame(vectors)

In [163]:
#df.to_csv("vectors.csv")
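The vectorizing loop silently substitutes a zero vector whenever an embedding call fails. A variant that also records which chunks failed makes it possible to retry them later. The sketch below is an assumption-laden illustration: `embed_all` and `fake_embed` are hypothetical names, and the embedding function is passed in as an argument so the example runs without an API key.

```python
import numpy as np

def embed_all(chunks, embed_fn, dim=1536):
    """Embed every chunk, keeping row order and remembering which calls failed."""
    vectors, failed = [], []
    for i, chunk in enumerate(chunks):
        try:
            vectors.append(np.asarray(embed_fn(chunk), dtype=float))
        except Exception:
            vectors.append(np.zeros(dim))  # placeholder keeps row i aligned with chunk i
            failed.append(i)
    return np.vstack(vectors), failed

def fake_embed(text):
    """Stand-in for a real embedding call; rejects empty input."""
    if not text:
        raise ValueError("empty input")
    return [float(len(text)), 1.0, 0.0]

matrix, failed = embed_all(["abc", "", "hello"], fake_embed, dim=3)
print(matrix.shape, failed)
```

In the notebook, `embed_fn` could be something like `lambda t: get_embedding(client, t)`, with `dim` left at 1536 to match the stored vectors.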

# Importing existing vector frame

In [167]:
df = pd.read_csv("vectors.csv", index_col=0)

In [168]:
df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,1526,1527,1528,1529,1530,1531,1532,1533,1534,1535
0,0.014789,-0.010883,-0.037749,-0.007581,0.006367,0.023629,0.005017,-0.008140,-0.020314,0.003286,...,0.002692,-0.011487,0.005162,-0.015997,-0.016832,-0.007137,0.018040,-0.021830,0.018541,-0.013138
1,0.016903,0.012458,0.012169,-0.010006,-0.014281,0.013520,-0.031656,-0.008517,-0.006711,-0.005488,...,0.004806,-0.005062,0.011960,-0.008845,-0.038869,-0.032863,0.015618,-0.004383,0.009566,-0.050068
2,0.012877,0.003183,0.008545,-0.001652,-0.015958,0.017289,-0.017595,-0.013489,-0.009450,-0.012411,...,0.009403,-0.014853,0.015093,-0.013975,-0.035110,-0.025461,0.014521,0.003266,-0.000340,-0.039130
3,-0.008276,0.001609,0.025049,-0.009256,-0.014511,0.017089,-0.016095,-0.022203,-0.030660,-0.030365,...,0.001323,0.003054,0.014807,-0.015746,-0.032190,-0.010471,-0.002618,-0.006940,-0.003950,-0.043386
4,0.013413,-0.000642,0.026282,-0.017366,-0.002777,0.016197,-0.018176,0.006843,-0.009852,-0.012204,...,0.017047,0.004710,0.008610,-0.025777,-0.030294,-0.017858,0.032447,-0.007208,-0.016037,-0.038665
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2718,-0.011102,0.007972,0.017217,-0.009396,-0.017637,0.006784,-0.001374,-0.012690,-0.012834,-0.026036,...,0.017965,-0.010059,0.012867,-0.033069,-0.030943,0.008149,0.008950,-0.005922,-0.006791,-0.029841
2719,-0.012151,0.002775,0.023420,0.001810,-0.008421,0.025766,-0.013826,-0.032601,-0.023718,-0.010978,...,0.035368,-0.016097,0.024627,-0.042718,-0.002151,0.033632,0.003940,-0.005831,-0.006591,-0.035368
2720,0.010984,-0.017450,0.026957,-0.004615,-0.008260,0.027577,-0.003366,-0.029478,-0.020322,0.009750,...,0.021199,-0.013660,0.002067,-0.025055,-0.025891,0.001400,-0.001155,-0.020403,-0.016883,-0.017733
2721,0.007435,-0.009077,-0.009428,0.002120,-0.025064,-0.004552,-0.009012,-0.023025,-0.007493,-0.018233,...,0.006617,-0.010324,0.009376,-0.031920,-0.016947,0.023155,-0.001584,0.005912,0.006214,-0.010363


In [170]:
# Check the indexing still matches up

len(splitdata)

2723
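The length check can be made explicit, and the same pass can flag rows that are entirely zero, since those are the placeholders written when an embedding call failed. This is a sketch with a hypothetical helper name, `check_alignment`, run on toy data rather than on the real frame.

```python
import numpy as np

def check_alignment(vectors, chunks):
    """Raise if the counts differ; otherwise return the indices of all-zero rows."""
    vectors = np.asarray(vectors, dtype=float)
    if len(vectors) != len(chunks):
        raise ValueError(f"{len(vectors)} vectors but {len(chunks)} chunks")
    zero_rows = np.flatnonzero(~np.any(vectors != 0, axis=1))
    return zero_rows.tolist()

# Toy data: the second row is a zero placeholder
toy_vectors = np.array([[0.1, 0.2], [0.0, 0.0], [0.3, 0.0]])
toy_chunks = ["first", "second", "third"]
print(check_alignment(toy_vectors, toy_chunks))
```

In the notebook the call would be along the lines of `check_alignment(df.to_numpy(), splitdata)`.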

# Function to return top k matches using cosine similarity

In [171]:
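def find_top_k_matches(query, vecdf, textdata, k=5):
    """Return the row indices of the k stored vectors most similar to the query.

    Uses the notebook-level `client`; `textdata` is accepted but not used here.
    """
    query_embedding = np.array(get_embedding(client, query, model="text-embedding-ada-002"))

    embeddings = vecdf.to_numpy()

    # Cosine similarity between the query (1 x d) and every stored vector (n x d)
    similarities = cosine_similarity(query_embedding.reshape(1, -1), embeddings).flatten()

    # argsort is ascending: take the last k, then reverse so the best match comes first
    top_k_indices = similarities.argsort()[-k:][::-1]

    return top_k_indices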
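To make the ranking step concrete, here is the same idea on a toy matrix using NumPy only: compute the cosine similarity of the query against every row, then take the k largest with `argsort`. The function name `top_k_cosine` and the toy vectors are illustrative; rows with zero norm are given similarity 0 in this sketch.

```python
import numpy as np

def top_k_cosine(query_vec, matrix, k=5):
    """Return (indices, scores) of the k rows most similar to query_vec."""
    query_vec = np.asarray(query_vec, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    # Product of norms for each row; zero where the row (or the query) is all zeros
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query_vec)
    sims = np.divide(matrix @ query_vec, norms,
                     out=np.zeros(len(matrix)), where=norms > 0)
    order = np.argsort(sims)[-k:][::-1]  # ascending sort, last k, best first
    return order, sims[order]

toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
idx, scores = top_k_cosine([1.0, 0.0], toy, k=2)
print(idx, scores)
```

Row 0 points in the same direction as the query and ranks first; row 2 is at 45 degrees and ranks second.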


In [173]:
topk = find_top_k_matches("What is the spirit in Hegel's hierarchy?", df, textdata=splitdata, k=5)

# Example of top k matches found

In [175]:
for i in topk:

    print(splitdata[i])

767. 1. Spirit, as conceived by Christianity, involves three stages that correspond not only to the three persons of the Trinity, but to the three parts of Hegel’s system. The pure substance, or God the Father, is thinking or logic: cf. Hegel, SL, p.29: ‘It can therefore be said that this content is the exposition of God as he is in his eternal essence before the creation of nature and of a finite spirit.’ Logic descends into the singularity of Christ or into nature, the second part of the system. This involves representation especially, not only in religion, but also in Hegel’s system, since the description of nature requires determinate empirical concepts, such as that of a plant, not only pure thoughts, such as those of Being or of substance. Christ is other than God and nature is other than pure thought. Finally, in the holy spirit and in Hegel’s account of mind or spirit, spirit returns from otherness and representation into self- consciousness. Each of these stages is a sort of c

# Function to run the RAG model

In [193]:
def rag(client, textdata, vecdf, query, k):

    ind = find_top_k_matches(query, vecdf, textdata, k)

    # Use the textdata argument rather than the global splitdata
    info = ". ".join([textdata[i] for i in ind])

    combquery = f"The question is: {query}, please make use of the following retrieved data when answering the question: {info}"

    print("What the LLM sees:\n")
    print(combquery + "\n\n")

    print("Resulting answer:\n")
    return answerme(client, combquery)
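The prompt-assembly step can be separated from the API calls, which makes it possible to inspect exactly what would be sent without spending tokens. A minimal sketch, assuming the same prompt wording as in `rag`; the name `build_prompt` and the toy chunks are hypothetical.

```python
def build_prompt(query, chunks, indices):
    """Join the retrieved chunks and wrap them in the same template rag() uses."""
    info = ". ".join(chunks[i] for i in indices)
    return (f"The question is: {query}, please make use of the following "
            f"retrieved data when answering the question: {info}")

toy_chunks = ["0. intro", "1. first", "2. second"]
prompt = build_prompt("What is spirit?", toy_chunks, [2, 0])
print(prompt)
```

The chunks appear in the order of `indices`, so passing the output of the top-k search puts the closest match first.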

# Comparison of RAG output vs default output

In [196]:
print(rag(client, textdata=splitdata, vecdf=df, query="What is spirit in Hegel's hierarchy?", k=6))

What the LLM sees:

The question is: What is spirit in Hegel's hierarchy?, please make use of the following retrieved data when answering the question: 767. 1. Spirit, as conceived by Christianity, involves three stages that correspond not only to the three persons of the Trinity, but to the three parts of Hegel’s system. The pure substance, or God the Father, is thinking or logic: cf. Hegel, SL, p.29: ‘It can therefore be said that this content is the exposition of God as he is in his eternal essence before the creation of nature and of a finite spirit.’ Logic descends into the singularity of Christ or into nature, the second part of the system. This involves representation especially, not only in religion, but also in Hegel’s system, since the description of nature requires determinate empirical concepts, such as that of a plant, not only pure thoughts, such as those of Being or of substance. Christ is other than God and nature is other than pure thought. Finally, in the holy spirit 

In [197]:
print(answerme(client, "What is spirit in Hegel's hierarchy?"))

In Hegel's hierarchical system, spirit (Geist) is the highest level of development and the culmination of the dialectical process. According to Hegel, spirit represents the ultimate reality and the self-consciousness of the Absolute. It is the principle that drives history forward and manifests itself in human culture, art, religion, and philosophy. Spirit is dynamic, self-aware, and constantly evolving, seeking to reconcile and transcend contradictions to achieve complete unity and self-realization. Hegel believed that the ultimate goal of spirit is to achieve absolute knowledge and freedom through the process of self-discovery and self-awareness.


In [198]:
print(rag(client, textdata=splitdata, vecdf=df, query="What are Hegel's thoughts on the power of the netherworld?", k=6))

What the LLM sees:

The question is: What are Hegel's thoughts on the power of the netherworld?, please make use of the following retrieved data when answering the question: 442. 1. Hegel summarizes the whole development of spirit. It begins with the tightly knit ethical life of the Greek city-state. This is rent apart by its self-knowledge, leading to the abstract ‘right’ of the Roman Empire, which sets the Self in opposition to the substance. Division persists in the history of Europe down to the French revolution: there are two realms, that of culture (the hither side) and that of faith (the beyond). These realms are --- shattered by the ‘insight’ of the enlightenment, the conceptual penetration of an object that results in its dissolution. The two realms thus coalesce into morality, where the Self is all- important, culminating in conscience. . 463. 1. Hegel finds a twofold syllogistic structure in the ethical realm. One syllogism is the movement from the extreme of conscious spiri

In [191]:
print(answerme(client, "What are Hegel's thoughts on the power of the netherworld?"))

Hegel did not specifically focus on the power of the netherworld in his philosophical work. However, in his concept of "Spirit" as a historical force that drives the progression of human societies towards higher levels of self-awareness and freedom, it could be interpreted that the netherworld, or the realm of the unconscious and unknown, plays a role in shaping and influencing human consciousness and history. Hegel believed that the dialectical process of thesis-antithesis-synthesis leads to the development of new ideas and societal progress, so it is possible that he would see the netherworld as a source of potential transformation and change.
