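First, we extract all the text from the PDF. The extraction function reads the document page by page, collects the text in full_text and returns it, and we store the result in text. For this we use fitz, the import name of the PyMuPDF library.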

In [23]:
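import fitz  # PyMuPDF

def extraction(pdf):
    """Return the plain text of every page in the PDF, one page after another."""
    full_text = ""
    with fitz.open(pdf) as doc:  # the context manager closes the file afterwards
        for page in doc:
            full_text += page.get_text() + "\n"
    return full_text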

pdf = "Som.pdf"
text = extraction(pdf)
print(text[:1000])

SOM 101 
Notes 
Introduction to Management 
 
Module 1: The field and functions of management 
 Historicity of the Management Function and Management Education 
1. Origins of Management: 
Early Foundations: 
The origins of management can be traced back to ancient civilizations like Egypt, China, and 
Rome, where large projects like the construction of pyramids or the Great Wall required 
organized labour and managerial control. 
Ancient Civilizations: In ancient Egypt, the construction of the pyramids involved tens of 
thousands of workers who needed to be coordinated over decades. The organization required 
careful planning and resource management, an early example of large-scale project 
management. 
Military and Government: In ancient China, the administration of the vast empire during the 
Han Dynasty required a hierarchical structure with clear roles and responsibilities, akin to 
modern bureaucratic management. The building of the Great Wall, for example, required 
extensive plan

The preview above is only a sanity check that the PDF was read correctly and that its contents look right. Next, we break the text into chunks so that each chunk carries one coherent piece of meaning.

In [24]:
def split_paras(text):
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return paragraphs

In [None]:
chunks = split_paras(text)
print(len(chunks))
print(len(chunks[0].split()))

38
284


PDFs generally put distinct topics in separate paragraphs, so, taking advantage of that, I first split the text on blank lines ("\n\n") and treat each paragraph as a chunk. The cell above prints the number of chunks (38) and the word count of the first one (284).
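The split relies entirely on blank lines. One side effect worth knowing: extraction appends "\n" after every page, so if a page's text already ends with a newline (as PyMuPDF's plain-text output typically does), every page boundary also becomes a "\n\n" break. The self-contained sketch below runs the same splitting rule on a hypothetical two-page string to show this; the sample text is made up for illustration.

```python
def split_paras(text):
    # Same rule as the notebook: a blank line ("\n\n") marks a paragraph break
    return [p.strip() for p in text.split("\n\n") if p.strip()]

# Hypothetical two-page document, joined the way extraction() joins pages:
# each page's text ends in "\n", and extraction() appends another "\n"
pages = ["Heading\nLine one of page one.\n", "Line one of page two.\nLine two.\n"]
full_text = "".join(page + "\n" for page in pages)

chunks = split_paras(full_text)
print(len(chunks))  # one chunk per page, since no blank line appears inside a page
print(chunks[0])
```

If chunks turn out to be page-sized rather than paragraph-sized, that is usually a sign that the PDF text has no blank lines inside pages.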

In [None]:
import numpy as np
from numpy.linalg import norm
from sentence_transformers import SentenceTransformer

# Pre-trained sentence embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')

# One embedding vector per paragraph chunk
embeddings = model.encode(chunks, show_progress_bar=True)

def cosine_similarity(vec1, vec2):
    # Dot product divided by the product of the vector lengths
    return np.dot(vec1, vec2) / (norm(vec1) * norm(vec2))

Batches: 100%|██████████| 2/2 [00:00<00:00,  3.74it/s]


After splitting the text into paragraphs, we embed them with a Sentence Transformer. A Sentence Transformer is a transformer encoder: its attention layers produce a contextual vector for every token, and a pooling step then combines those token vectors into a single fixed-size vector for the whole sentence or paragraph. Here I have used the all-MiniLM-L6-v2 model, which maps each chunk to a 384-dimensional vector. These vectors are not categories; they are points in a shared vector space in which texts with similar meaning lie close together. To measure how close two vectors are, we use cosine similarity, the dot product of the two vectors divided by the product of their lengths, and we use that similarity to decide which neighbouring chunks to merge.
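To make the pooling step concrete, here is a NumPy sketch of mean pooling, the pooling that the all-MiniLM-L6-v2 model card describes, followed by L2 normalization. The token vectors are hypothetical stand-ins for real encoder output, and mean_pool and l2_normalize are illustrative helpers, not the library's internals. One practical consequence from the same model card: input longer than 256 word pieces is truncated, so a long chunk such as the 284-word first paragraph is only partially represented in its embedding.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors into one sentence vector, ignoring padding.

    token_embeddings: (seq_len, dim) array, one contextual vector per token
    attention_mask:   (seq_len,) array with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=0)
    count = max(mask.sum(), 1e-9)  # avoid division by zero for an all-padding input
    return summed / count

def l2_normalize(vec):
    # Scale to unit length so that a plain dot product equals cosine similarity
    return vec / np.linalg.norm(vec)

# Hypothetical encoder output: 4 token positions, the last one is padding
tokens = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [9.0, 9.0]])
mask = np.array([1, 1, 1, 0])

sentence_vec = mean_pool(tokens, mask)  # mean of the first three rows
unit_vec = l2_normalize(sentence_vec)
print(sentence_vec, np.linalg.norm(unit_vec))
```

The padding row, despite its large values, has no effect on the result because the mask zeroes it out before averaging.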

In [27]:
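def semantic_merge(paragraphs, embeddings, threshold=0.8):
    """Merge consecutive paragraphs whose embeddings are at least `threshold` similar."""
    if not paragraphs:
        return []

    merged_chunks = []
    current_chunk = paragraphs[0]
    current_embedding = embeddings[0]

    for i in range(1, len(paragraphs)):
        # Compare the chunk being built with the next paragraph
        sim = cosine_similarity(current_embedding, embeddings[i])
        if sim >= threshold:
            # Similar enough: append the paragraph to the current chunk
            current_chunk += "\n\n" + paragraphs[i]
            # Pairwise average, so the most recently added paragraph carries the most weight
            current_embedding = (current_embedding + embeddings[i]) / 2
        else:
            # Topic changed: close the current chunk and start a new one
            merged_chunks.append(current_chunk)
            current_chunk = paragraphs[i]
            current_embedding = embeddings[i]

    merged_chunks.append(current_chunk)
    return merged_chunks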


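The cosine similarity function measures how close two embeddings are: it is the dot product of the two vectors divided by the product of their lengths, so it ranges from -1 to 1 and equals 1 when the vectors point in the same direction. Starting from the first paragraph, we compare the chunk being built with the next paragraph; if their similarity is at least the threshold (0.8 in our case), we consider them semantically connected and merge them. Only consecutive paragraphs are merged, so the original order of the notes is preserved.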
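In this notebook the merge turns 38 paragraphs into 38 chunks (see the output of the next cell), so it can help to look at how similar neighbouring paragraphs actually are before picking a threshold. The hypothetical helper below computes the cosine similarity of each paragraph with the next one in a single vectorised NumPy pass. It is an approximation of the merge decision, since semantic_merge compares against a running average once a merge has happened. On the notebook's data it would be called as adjacent_similarities(embeddings); the toy vectors here are made up.

```python
import numpy as np

def adjacent_similarities(embeddings):
    """Cosine similarity between each paragraph and the next one.

    Returns an array of length len(embeddings) - 1; entry i compares
    paragraph i with paragraph i + 1.
    """
    emb = np.asarray(embeddings, dtype=float)
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit-length rows
    return np.sum(unit[:-1] * unit[1:], axis=1)              # row-wise dot products

# Hypothetical 2-D embeddings for four consecutive paragraphs
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 1.0]])
sims = adjacent_similarities(toy)
print(np.round(sims, 3))
print((sims >= 0.8).sum(), "of", len(sims), "boundaries would merge at 0.8")
```

Looking at the spread of these values (for example with np.percentile) shows whether 0.8 sits above almost every pair, which would explain why nothing was merged.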

In [None]:
semantic_chunks = semantic_merge(chunks, embeddings, threshold=0.8)

print(f"Total semantic chunks: {len(semantic_chunks)}")
print(f"First chunk preview:\n{semantic_chunks[0][:500]}")


Batches: 100%|██████████| 2/2 [00:00<00:00,  8.97it/s]

Total semantic chunks: 38
First chunk preview:
SOM 101 
Notes 
Introduction to Management 
 
Module 1: The field and functions of management 
 Historicity of the Management Function and Management Education 
1. Origins of Management: 
Early Foundations: 
The origins of management can be traced back to ancient civilizations like Egypt, China, and 
Rome, where large projects like the construction of pyramids or the Great Wall required 
organized labour and managerial control. 
Ancient Civilizations: In ancient Egypt, the construction of the py

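Note that the merge step did not change the number of chunks (38 paragraphs in, 38 chunks out), so at a threshold of 0.8 no neighbouring paragraphs were merged for this document. Next, we use FAISS, a library for efficient nearest-neighbour search over large collections of high-dimensional vectors. The index we build, IndexFlatL2, compares a query against every stored vector using the Euclidean (L2) distance. Because a search returns row positions, the vectors added to the index must line up one-to-one with the list of chunks we later look the results up in.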

In [29]:
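import faiss
import numpy as np

# Embed the merged chunks themselves, so that row i of the index is semantic_chunks[i].
# (Indexing the paragraph-level embeddings only lines up when no paragraphs were merged.)
chunk_embeddings = np.asarray(model.encode(semantic_chunks), dtype="float32")  # FAISS expects float32

dimension = chunk_embeddings.shape[1]     # 384 for all-MiniLM-L6-v2
index = faiss.IndexFlatL2(dimension)      # exact (brute-force) search with L2 distance
index.add(chunk_embeddings)
faiss.write_index(index, "chunks.index")  # save the index to disk for later reuse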
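A flat index compares the query with every stored vector, and IndexFlatL2 reports squared Euclidean distances. The NumPy sketch below, built around a hypothetical helper exact_l2_search, reproduces that computation on toy data and returns (D, I) in the same shape as index.search; it illustrates what the index computes and is not a replacement for FAISS, which is far faster on large collections. It also shows why L2 search is a reasonable stand-in for cosine similarity here: for unit-length embeddings (SentenceTransformer.encode can enforce this with normalize_embeddings=True), the squared L2 distance equals 2 - 2 cos, so both produce the same ranking.

```python
import numpy as np

def exact_l2_search(database, queries, k):
    """Brute-force nearest neighbours, mirroring what a flat L2 index computes.

    Returns (D, I) shaped (n_queries, k): squared L2 distances in ascending
    order and the row indices of the matching database vectors.
    """
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2, computed for every (query, row) pair
    sq_dists = (
        np.sum(queries**2, axis=1, keepdims=True)
        - 2 * queries @ database.T
        + np.sum(database**2, axis=1)
    )
    I = np.argsort(sq_dists, axis=1)[:, :k]
    D = np.take_along_axis(sq_dists, I, axis=1)
    return D, I

# Hypothetical unit-length vectors: for these, ||q - x||^2 = 2 - 2 * cos(q, x),
# so ranking by L2 distance gives the same order as ranking by cosine similarity
db = np.array([[1.0, 0.0], [0.0, 1.0], [np.sqrt(0.5), np.sqrt(0.5)]])
q = np.array([[0.6, 0.8]])
D, I = exact_l2_search(db, q, k=2)
print(I[0], D[0])
```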

Now, when we input a query, the model encodes it into an embedding vector, and FAISS's index.search finds the 3 stored vectors closest to it. In other words, out of the semantically merged chunks it returns the top 3 whose embeddings are nearest to the query embedding: D holds their distances and I their row indices.

In [30]:
query = "What is the role of marketing management?"
query_embedding = model.encode([query])
D, I = index.search(np.array(query_embedding), k=3)  
for i in I[0]:
    print(semantic_chunks[i])

Step 2: Managing Others to Leading Managers 
In this transition, a leader moves from managing a team to managing multiple teams or other 
managers. This requires developing skills in managing managers, setting broader strategic 
goals, and ensuring alignment across teams.  
Example: A team leader in a marketing department is promoted to a marketing manager, 
responsible for overseeing the heads of the social media, content creation, and advertising 
teams. They must now ensure that each team's efforts are aligned with the overall marketing 
strategy and company goals. 
Step 3: Leading Managers to Functional Manager 
At this stage, the leader becomes responsible for an entire function or department within the 
organization. This involves a deeper understanding of the business, strategic planning, and 
optimizing departmental performance.  
Example: A marketing manager is promoted to the role of Director of Marketing. They are 
now responsible for the overall performance of the marketing

Once we have the three nearest chunks, we send a prompt to an LLM through the Groq API. The prompt follows a fixed template, which I adapted from online sources: it contains the query and the context, which is simply the concatenation of the top three chunks.

In [31]:
def prepare_prompt(query, chunks):
    context = "\n\n".join(chunks)
    prompt = f"""Answer the following question using the context provided.

Context:
{context}

Question:
{query}

Answer:"""
    return prompt
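It is worth comparing the final answer with the retrieved passages. The chunk printed earlier is about promotions within a marketing department, while the answer at the end lists market research, pricing and promotion, which may come from the model's general knowledge rather than from the context. One common mitigation, sketched below as a hypothetical prepare_grounded_prompt, is to number the chunks and instruct the model to answer only from them. The wording of the instructions is an assumption, not a tested recipe.

```python
def prepare_grounded_prompt(query, chunks):
    """Hypothetical stricter variant of prepare_prompt.

    Numbers each retrieved chunk and tells the model to rely only on them,
    which makes answers drawn from outside the context easier to spot.
    """
    context = "\n\n".join(f"[{n}] {chunk}" for n, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the numbered context passages below. "
        "If the passages do not contain the answer, say that you don't know. "
        "Cite the passage numbers you used.\n\n"
        f"Context:\n{context}\n\n"
        f"Question:\n{query}\n\n"
        "Answer:"
    )

prompt = prepare_grounded_prompt(
    "What is the role of marketing management?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
print(prompt)
```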


The Groq API returns the LLM's answer to the question, based on the context we provided, and we print it.

In [None]:
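import os
import requests

# Groq's OpenAI-compatible chat completions endpoint
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"
# Read the key from the environment instead of hard-coding it in the notebook
GROQ_API_KEY = os.environ["GROQ_API_KEY"]


def query_groq(prompt):
    headers = {
        "Authorization": f"Bearer {GROQ_API_KEY}",
        "Content-Type": "application/json"
    }

    data = {
        "model": "llama3-8b-8192",  # check Groq's current model list if this ID is rejected
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        "temperature": 0.2  # low temperature for more deterministic answers
    }

    response = requests.post(GROQ_URL, headers=headers, json=data, timeout=60)
    response.raise_for_status()  # surface HTTP errors (bad key, rate limit) instead of a KeyError
    return response.json()['choices'][0]['message']['content']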


In [None]:
relevant_chunks = [semantic_chunks[idx] for idx in I[0]]
prompt = prepare_prompt(query, relevant_chunks)
answer = query_groq(prompt)
print("Answer:\n", answer)


Answer:
 According to the context, marketing management involves understanding customer needs and preferences and then creating products or services that satisfy these needs. It encompasses activities such as market research, product development, branding, pricing, distribution, and promotion. The goal is to create value for customers while achieving the organization's objectives, such as increasing market share, revenue, and profitability.
