# PoC : News RAG LLM

1. Split data into chunks
2. Create embeddings and FAISS index (semantic search)
3. Implement TF-IDF and cosine similarity (keyword search)
4. Create hybrid search (semantic + keyword)
5. Create agent for summary generation
6. Create agent for LinkedIn post generation
7. Create request logic to manage all the agents

In [1]:
import pandas as pd
import numpy as np 


In [None]:
#load dataset (only the 'document' column)
news_df = pd.read_csv('news_category_dataset.csv',
                      usecols=['document'])

print(f" the dataset has {news_df.shape[0]} rows and {news_df.shape[1]} features")

news_df.head(2)

 the dataset has 20953 rows and 1 features


Unnamed: 0,document
0,Headline: What If We Were All Family Generatio...
1,Headline: Firestorm At AOL Over Employee Benef...


## User Defined Functions

In [3]:
def chunking_doc(document):
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)
    chunk_documents = text_splitter.split_text(document) if document else []
    #print(f"Document chunked successfully.\n Document has {len(chunk_documents)} pages")

    return chunk_documents
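
The effect of `chunk_size` and `chunk_overlap` can be sketched with a plain sliding window over characters. This is a simplified illustration only: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs, lines, and spaces, and the helper name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=512, chunk_overlap=64):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters
chunks = chunk_text(sample, chunk_size=12, chunk_overlap=4)
print([len(c) for c in chunks])
print(chunks[0][-4:] == chunks[1][:4])  # neighbouring chunks share 4 characters
```

The overlap means a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks, at the cost of some duplicated text in the index.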

In [4]:
def initialize_embedding(model_name):
    "Download a Sentence Transformer embedding model and save it locally for reuse"
    from sentence_transformers import SentenceTransformer #load library

    if not isinstance(model_name,str):
        raise TypeError("The model_name must be a string.")

    #initialize embedding model using Sentence Transformer
    model_embedding = SentenceTransformer(model_name)

    #save model for reuse
    model_embedding.save("./embedding_model") #saves config + model weights

    return model_embedding

In [5]:
def load_embedding_model(file_path,device_name):

    '''Function to load embedding model '''  
    from langchain_community.embeddings import HuggingFaceEmbeddings

    if isinstance(file_path,str) and isinstance(device_name,str):
        emb_model_id = file_path
        model_kwargs = {'device': device_name}
        encode_kwargs = {"normalize_embeddings": False}

        #initialize embedding model
        embedding_model= HuggingFaceEmbeddings(model_name=emb_model_id,
                                               model_kwargs= model_kwargs,
                                               encode_kwargs=encode_kwargs
                                               )
    
        return embedding_model
    else:
        raise TypeError("file_path and device_name must both be strings.")

In [6]:
def initialize_faiss(document,embedding_model):
    ''' Create and save FAISS index'''
    import faiss
    from langchain.schema import Document
    from langchain_community.vectorstores import FAISS

    # Wrap the text chunks into Document objects with page_content
    documents = [Document(page_content=doc) for doc in document]

    #create FAISS index
    faiss_index = FAISS.from_documents(documents,embedding_model)

    #save FAISS index
    faiss_index.save_local("./faiss_index")

    return faiss_index  

In [7]:
def load_faiss_index(file_path,embedding_model):
    '''load saved faiss index'''
    import faiss
    from langchain_community.vectorstores import FAISS
    from langchain_community.embeddings import HuggingFaceEmbeddings
    
    if isinstance(file_path,str):
        faiss_index = FAISS.load_local(file_path,
                                       embedding_model,
                                       # The saved index includes a pickle file; only enable this for files you created yourself
                                       allow_dangerous_deserialization=True)
        return faiss_index
    
    else:
        raise TypeError('The file path must be string.')

## 1. Split data in chunks

In [8]:
#Step 1

#convert the pandas Series to a list for chunking and embedding
documents = news_df['document'].tolist()

print(documents[:2],'\n')

#chunk texts prior to embedding
#chunked_texts = list(map(chunking_doc,documents))
chunked_texts = [chunk for doc in documents for chunk in chunking_doc(doc)]

#preview output
print(len(chunked_texts))

["Headline: What If We Were All Family Generation Changers?\nDate: 2014-06-20\nAuthor: Matt Murrie, ContributorEdupreneur, Cofounder/Chief Curiosity Curator of What If...?\nSummary: What if, in doing so, we won't just create new opportunities for ourselves, we'll also uncover ways to create new opportunities for our families that may not have otherwise existed?\nArticle: In Tara Stone's conversation from the What If...? Conference this past March she begins with the recognition that there is no such thing as a perfect family. It's up to us to be the difference that can make our families better. Tara urges us to go beyond the standards our families have provided and challenges us to push down new paths. What if, in doing so, we won't just create new opportunities for ourselves, we'll also uncover ways to create new opportunities for our families that may not have otherwise existed? The life you have doesn't mean it's the life you've been given. Sometimes we can make the biggest impact o

## 2. Create embeddings and FAISS index (semantic search)

In [9]:
#Step 2 Create embeddings and FAISS index

model_emb_id="sentence-transformers/all-MiniLM-L6-v2"
emb_file_path = "./embedding_model"
device_name = 'cuda'
faiss_file_path = "./faiss_index"

#initialize embedding model (run once to download and save it)
#emb_model= initialize_embedding(model_emb_id)

#load saved model with Hugging Face embeddings
embedding_model = load_embedding_model(emb_file_path,device_name)


#create faiss index
#faiss_index = initialize_faiss(chunked_texts,embedding_model)

#load faiss index
faiss_index = load_faiss_index(faiss_file_path,embedding_model)



In [None]:
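#test functionality of 'FAISS similarity search'

query = "Joe Biden vaccine mandate"
query_embedding = embedding_model.embed_query(query)  # sanity check that embedding works; not used below

#Search using LangChain's method, which embeds the query internally.
#Assuming the index uses LangChain's default Euclidean strategy, the score is a distance: lower means closer.
docs_and_scores = faiss_index.similarity_search_with_score(query, k=5)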

#Display results
for doc, score in docs_and_scores:
    print(f"Score: {score:.3f}")
    print(doc.page_content + "...\n")

Score: 0.553
Headline: Biden Easing Foreign Travel Restrictions, Requiring Vaccines
Date: 2021-09-20
Author: ZEKE MILLER, AP
Summary: President Joe Biden will ease foreign travel restrictions into the U.S. beginning in November, when his administration will require all foreign travelers flying into the country to be fully vaccinated....

Score: 0.575
Headline: Joe Biden’s ‘Vaccine Mandate’ Has An Alternative: Weekly Tests
Date: 2021-12-04
Author: Arthur Delaney, Dave Jamieson, and Igor Bobic
Summary: But the testing option won't be fun, and maybe that's the idea....

Score: 0.615
transition and continuing through virtually all of his public remarks, Biden has urged Americans to get vaccinated to protect themselves as well as their families and neighbors. He repeated that message again Monday. “Virtually all hospitalizations and deaths are occurring among unvaccinated Americans,” he said. “If you’re unvaccinated, you are not protected. So, please, please get vaccinated. Get vaccinated n
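
The scores above grow as the matches get weaker, which suggests they are distances rather than similarities. A minimal sketch of how such a score could be mapped to a cosine similarity follows. It assumes the index returns squared Euclidean distances and that the embeddings are unit length; neither is verified in this notebook, and the helper names are hypothetical.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_from_squared_l2(distance):
    """For unit vectors, ||a - b||^2 = 2 - 2*cos(a, b), so cos = 1 - d/2."""
    return 1.0 - distance / 2.0


a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

distance = squared_l2(a, b)
cosine_direct = sum(x * y for x, y in zip(a, b))

# Both routes give the same cosine similarity
print(round(cosine_from_squared_l2(distance), 6), round(cosine_direct, 6))
```

After such a conversion, higher values mean better matches, which is the convention the TF-IDF cosine scores in the next section follow.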

## 3. Implement TF-IDF and cosine similarity (keyword search)

In [12]:
#import required libraries
from sklearn.feature_extraction.text import TfidfVectorizer 
from sklearn.metrics.pairwise import cosine_similarity


In [13]:
#Initialize TF-IDF vectorizer
tf_idf= TfidfVectorizer(
    stop_words='english',#remove stop words before tokenizing
    lowercase=True,       #convert to lowercase before tokenizing
    ngram_range=(1,2)      #Use uni and bigrams
)

#text to be vectorized
documents = news_df['document'].tolist()

tf_idf_matrix = tf_idf.fit_transform(documents)


In [14]:
def keyword_search(query,tfidf_matrix,vectorizer,documents,k=5):
    ''' Search documents based on keywords

    Args:
        query: Search string (must be non-empty)
        tfidf_matrix: Precomputed TF-IDF document vectors
        vectorizer: Fitted TF-IDF vectorizer instance
        documents: List of original document texts
        k: Number of top results to return (default: 5)
    
    Returns:
        List of (document, score) tuples sorted by descending cosine
        similarity, or an empty list if the search fails
    '''

    if not isinstance(query,str) or not query.strip():
        raise ValueError("Query must be a non empty string")
    
    #ensure documents are strings
    documents = [str(doc) if not isinstance(doc,str) else doc for doc in documents]
    
    try:
        try:
            #transform query using vectorizer
            query_vector= vectorizer.transform([query])  #expects a list
        except AttributeError as e:
            print(f"Vectorizer failed on input type: {type(query)}")
            raise

        #search keyword: calculate cosine similarity
        cosine_sim = cosine_similarity(query_vector, tfidf_matrix)

        query_scores = cosine_sim[0] # cosine score for query

        #handle scenarios where k > available documents
        valid_k= min(k, len(documents))
        
        #get top-k indices and sort in descending order
        top_indices = query_scores.argsort()[-valid_k:][::-1] 

        # Return matching documents with scores 
        return [(documents[i], float(query_scores[i])) for i in top_indices]
    
    except Exception as e:
        print(f'Keyword search failed: {str(e)}')
        return []





In [15]:
#test functionality of 'keyword_search()'

query = "Biden vaccine mandate for workers"

keyword_search(query=query,
               tfidf_matrix=tf_idf_matrix,
               vectorizer=tf_idf,
               documents=documents)



[("Headline: Federal Judge Suspends New York City's Vaccine Mandate For Teachers\nDate: 2021-09-25\nAuthor: Michael Hill, AP\nSummary: But the district is confident that it will prevail.\nArticle: New York City schools have been temporarily blocked from enforcing a vaccine mandate for its teachers and other workers by a federal appeals judge just days before it was to take effect. Workers in the nation’s largest school system were to be required to show vaccination proof starting Monday. But late Friday, a judge for the 2nd U.S. Circuit Court of Appeals granted a temporary injunction sought by a group of teachers and referred the case to a three-judge panel an an expedited basis. Department of Education spokesperson Danielle Filson said that officials are seeking a speedy resolution and that the circuit court has the motion on its calendar for Wednesday. “We’re confident our vaccine mandate will continue to be upheld once all the facts have been presented, because that is the level of 

The function works as expected:
The scores rank the first article as the most relevant to the query, since it focuses on worker mandates rather than on testing.

## 4. Create Hybrid-search (semantic + keyword)

In [16]:
# Hybrid search: run semantic search first; if its confidence falls below a set threshold, also run keyword search
def model_search(embedding_model, faiss_db,query, k=5, threshold =0.4,
                 tfidf_matrix=None, vectorizer=None, documents=None):
    ''' Function dictates the agents' search logic: Semantic search -> Keyword search
    args:
    embedding_model: Sentence Transformer or other embedding model
    faiss_db: Faiss index for semantic search
    query: User search query
    k: Number of documents to retrieve
    threshold: Score threshold to trigger keyword fallback
    tfidf_matrix: Precomputed TF-IDF matrix for keyword search
    vectorizer: TF-IDF vectorizer
    documents: Original documents for keyword search

    Returns:
        List of tuples (document_content, score)
    '''
    #Debug check if document type
    #print(f"First document type: {type(documents[0])}")
    #print(f"Sample content: {documents[0][:100] if isinstance(documents[0], str) else documents[0]}")
    
    if not isinstance(query,str) or not query.strip():
        raise ValueError("Query must be a non empty string")
    try:
        # --- SEMANTIC SEARCH PHASE ---
        #print('Embedding query...')
        #query_embed= embedding_model.embed_query(query) #embed query using Langchain method embed_query

        print('Running semantic search ...')
        try:
            semantic_results =faiss_db.similarity_search_with_score(query,k) #search using LangChain's method
        except Exception as e:
            print(f'Semantic search failed: {str(e)}')
            semantic_results =[]
        
        if semantic_results:
            #print('check 2')
            formatted_results =[]
            for doc, score in semantic_results:

                #handle both string and list content
                if hasattr(doc,'page_content'):
                    content = doc.page_content if isinstance(doc.page_content,str) else ' '.join(doc.page_content)
                else:
                    content = str(doc)
            
                formatted_results.append((content,float(score)))
            #print('check 3')
            # --- CONFIDENCE CHECK ---

            #calculate confidence metric (average of top 3 scores)
            #print('check 4')
            top_scores = [ score for _,score in semantic_results[:3]]
            #print(top_scores)
            semantic_confidence = sum(top_scores)/len(top_scores) if top_scores else 0

            # --- FALLBACK DECISION ---
            #Debug check if document type
            #print(f"First document type: {type(documents[0])}")
            #print(f"Sample content: {documents[0][:100] if isinstance(documents[0], str) else documents[0]}")

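            #Check if semantic confidence is < threshold
            #NOTE: this treats the FAISS score as a similarity (higher is better). If the index
            #returns L2 distances (LangChain's default), lower is better and this test is inverted.
            if semantic_confidence < threshold: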
                print('Semantic confidence value low\nRunning keyword search ...')

                #ensure documents are strings
                doc_str =[]
                for doc in documents:
                    if isinstance(doc,str):
                        doc_str.append(doc)
                    elif isinstance(doc,list):
                        doc_str.append(' '.join(str(x) for x in doc))

                    else:
                        doc_str.append(str(doc))

                

                keyword_results = keyword_search(query=query,
                                                 tfidf_matrix=tfidf_matrix,
                                                 vectorizer=vectorizer,
                                                 documents=doc_str,
                                                 k=k) or []
            
                combined_results = formatted_results + keyword_results #concatenate both results
                print('Search Succesful: results include semantic and keyword search results')
                return sorted(combined_results,
                              key=lambda x:x[1], #sort by score
                              reverse=True       # sort in descending order
                              )[:k]              #return top k results

            else:
                print('Search Succesful')
                return formatted_results[:k]

        #semantic search returned nothing: return an empty list so callers can always iterate
        return []
        
    except Exception as e:
        print(f'Model encountered obstacles during search: {str(e)}')
        return []
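
`model_search` merges the two result lists by sorting on raw scores, which only works if both scores share a scale and direction. One common alternative is reciprocal rank fusion (RRF), which uses rank positions only. The sketch below is not what this notebook implements; the function name, the document IDs, and the constant `rrf_k=60` (a commonly used default) are assumptions for illustration.

```python
def reciprocal_rank_fusion(rankings, k=5, rrf_k=60):
    """Fuse several best-first rankings using rank positions only."""
    fused = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            # Each list contributes 1 / (rrf_k + rank) to the document's score
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (rrf_k + rank)
    ordered = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return ordered[:k]


# Hypothetical document IDs, each list ordered best match first
semantic_ranking = ["doc_a", "doc_b", "doc_c"]
keyword_ranking = ["doc_b", "doc_d", "doc_a"]

for doc, score in reciprocal_rank_fusion([semantic_ranking, keyword_ranking], k=3):
    print(doc, round(score, 4))
```

Documents that appear near the top of both lists rise to the top of the fused list, regardless of whether the underlying scores are distances or similarities.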

    



In [None]:
#test functionality of hybrid-search
query = "Biden vaccine mandate for workers"
results = model_search(
    embedding_model=embedding_model,
    faiss_db=faiss_index,
    query=query,
    tfidf_matrix=tf_idf_matrix,
    vectorizer=tf_idf,
    documents=documents
)

for content, score in results:
    print(f"Score: {score:.3f}")
    print(content[:200] + "...\n")

Running semantic search ...
Search Succesful
Score: 0.601
Article: WASHINGTON ― A faction of congressional Republicans threatened to shut down the government this week over what they call President Joe Biden’s “vaccine mandate.” The rule from the Occupationa...

Score: 0.648
Headline: Biden Easing Foreign Travel Restrictions, Requiring Vaccines
Date: 2021-09-20
Author: ZEKE MILLER, AP
Summary: President Joe Biden will ease foreign travel restrictions into the U.S. beginni...

Score: 0.651
Headline: Joe Biden’s ‘Vaccine Mandate’ Has An Alternative: Weekly Tests
Date: 2021-12-04
Author: Arthur Delaney, Dave Jamieson, and Igor Bobic
Summary: But the testing option won't be fun, and maybe ...

Score: 0.658
transition and continuing through virtually all of his public remarks, Biden has urged Americans to get vaccinated to protect themselves as well as their families and neighbors. He repeated that messa...

Score: 0.678



## 5. Create Agent for summary generation 

*Workflow*

- Take a user query → retrieve relevant documents (via hybrid search) → generate a meaningful response using the google/gemma-7b-it LLM.

In [28]:
#import libraries
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import pipeline
from transformers import BitsAndBytesConfig
import torch

In [29]:
#Hugging Face LLM
model_name = "google/gemma-7b-it"   

### Agent: Generate summary of news

In [30]:
def generate_summary(documents, model_name= "google/gemma-7b-it", max_tokens=512):
    """Generate summary of content provided using Gemma LLM
    
    Args:
        documents: List of tuples (document_content, score) from hybrid search
        model_name: Name of the Gemma model variant
        max_tokens: Maximum length of summary to generate
        
    Returns:
        str: Generated summary text 
        
        """

    # ------ 1. Prepare Input Text ------

    content = [doc[0] for doc in documents] #get document from documents tuple(content, score)
    input_doc = "\n\n".join(content[:3])  #join top 3 documents found

    #truncate the input if required; this is a character limit, used as a
    #conservative stand-in for the model's 8192-token context window
    max_input_length = 8000
    if len(input_doc) > max_input_length:
        input_doc = input_doc[:max_input_length]

    #------2. Initialize LLM and tokenizer-------
    
    #initialize 4bit quantized version
    quantization_config = BitsAndBytesConfig(load_in_4bit=True,
                                             bnb_4bit_compute_dtype=torch.bfloat16,
                                             bnb_4bit_quant_type="nf4")

    #initialize the LLM
    model = AutoModelForCausalLM.from_pretrained(model_name,
                                                 quantization_config=quantization_config,
                                                   device_map="cuda",
                                                   torch_dtype=torch.bfloat16
                                                   )

    #initialize the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name,
                                              model_max_length= 8192)


    #------3. Create Prompt --------
    prompt = f""" Summarize the following documents about the search query. Focus on key facts,figures, and main points.
    The summary should communicate the main theme of the document.
    Keep it concise and factual.

    Documents: {input_doc}

    Summary:"""

    #------4. Generate summary-------
    inputs = tokenizer(prompt,
                       return_tensors="pt",
                       #max_length= tokenizer.model_max_length - max_tokens,
                       #return_overflowing_tokens=False,
                       truncation=True).to("cuda")

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_tokens,
            do_sample=True,
            temperature=0.6,
            #num_beams=1,
            top_k=50,
            top_p=0.95,   
        )

    summary_results = tokenizer.decode(outputs[0], skip_special_tokens=True)

    #remove the input prompt
    summary_output =summary_results.replace(prompt,"").strip()

    return summary_output
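
`generate_summary` caps the input by character count, while the context window is measured in tokens. A minimal sketch of truncating by token count instead is shown below. The helper `truncate_to_token_budget` and the toy whitespace tokenizer are hypothetical; with a Hugging Face tokenizer, `encode` and `decode` would presumably wrap `tokenizer.encode(text, add_special_tokens=False)` and `tokenizer.decode(ids)`.

```python
def truncate_to_token_budget(text, encode, decode, max_tokens):
    """Keep at most max_tokens tokens of text, using the given tokenizer callables."""
    token_ids = encode(text)
    if len(token_ids) <= max_tokens:
        return text  # already within budget, return unchanged
    return decode(token_ids[:max_tokens])


# Toy whitespace "tokenizer" standing in for a real one (illustration only)
vocab = []


def toy_encode(text):
    ids = []
    for word in text.split():
        if word not in vocab:
            vocab.append(word)
        ids.append(vocab.index(word))
    return ids


def toy_decode(ids):
    return " ".join(vocab[i] for i in ids)


print(truncate_to_token_budget("one two three four five", toy_encode, toy_decode, 3))
```

The budget for the documents would then be the context window minus the tokens reserved for the prompt instructions and for `max_tokens` of generated output.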

In [31]:
#Test functionality of Agent for document summarization


# Get documents from hybrid search
search_results = model_search(
    embedding_model=embedding_model,
    faiss_db=faiss_index,
    query="Biden vaccine mandate for workers",
    tfidf_matrix=tf_idf_matrix,
    vectorizer=tf_idf,
    documents=documents
)

# Generate summary
summary = generate_summary(
    documents=search_results,
    model_name="google/gemma-7b-it",
    max_tokens=512
)

print("Generated Summary:")
print(summary)

Running semantic search ...
Search Succesful



Generated Summary:
The documents discuss the ongoing debate surrounding President Biden's vaccine mandate for foreign travelers and the potential consequences of the regulation. The key points highlighted in the articles include the requirement for foreign travelers to be fully vaccinated, the alternative option of weekly COVID-19 tests, and the potential impact of the mandate on various stakeholders. The main theme of the documents is the ongoing struggle between those who support and oppose the vaccine mandate.

The articles emphasize that the vaccine mandate applies to foreign travelers and not domestic workers, addressing concerns raised about potential job loss due to the regulation. The testing option is intended to provide flexibility for individuals who are unable or unwilling to get vaccinated. However, the testing process itself is not without its challenges, as it requires regular testing and may be inconvenient for some.

Overall, the documents provide a concise overview of

## 6. Create Agent for LinkedIn post generation

*Workflow*

- Take a user query → retrieve relevant documents (via hybrid search) → generate a LinkedIn post using the mistralai/Mistral-7B-Instruct-v0.2 LLM.

In [32]:

def generate_linkedin_post(documents, 
                           model_name= "mistralai/Mistral-7B-Instruct-v0.2", 
                           topic="news and media ",
                           max_tokens=600):
    """Generate optimized LinkedIn posts from source content
    
    Args:
        documents: List of (content, score) tuples from search
        model_name: LLM identifier (default: Mistral 7B Instruct)
        topic: Central theme for the post
        max_tokens: Length control (400-700 ideal for LinkedIn)
        
    Returns:
        str: Viral-ready LinkedIn post
    """
    torch.cuda.empty_cache() #clear cache

    # ------ 1. Prepare Input Text ------

    content = [doc[0] for doc in documents] #get document from documents tuple(content, score)
    input_doc = "\n- ".join(content[:3])  #join with bulleted list,top 3 documents found

    #truncate the input if required; this is a character limit, kept well below the model's 32k-token context window
    max_input_length = 3000
    if len(input_doc) > max_input_length:
        input_doc = input_doc[:max_input_length]

    #------2. Initialize LLM and tokenizer-------
    
    #initialize 4bit quantized version
    quantization_config = BitsAndBytesConfig(load_in_4bit=True,
                                             bnb_4bit_compute_dtype=torch.bfloat16,
                                             bnb_4bit_quant_type="nf4",
                                             bnb_4bit_use_double_quant=True   # Extra memory savings
                                             )   

    #initialize the LLM
    model = AutoModelForCausalLM.from_pretrained(model_name,
                                                 quantization_config=quantization_config,
                                                   device_map="cuda",
                                                   torch_dtype=torch.bfloat16
                                                   )

    #initialize the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name,
                                              model_max_length= 32768)


    #------3. Create Prompt --------

    prompt = f"""As a LinkedIn algorithm specialist, create a viral post about {topic} using:
    - {input_doc}
    
    Apply these rules:
    1. Hook: Start with surprising data or bold question
    2. Body: 2 short paragraphs (3-4 sentences each)
    3. Include:
       - 1 verifiable statistic (add "[Citation Needed]" if unsure)
       - 1 practical tip (use "> Pro Tip:" formatting)
       - 1 discussion question
    4. Formatting:
       - 2 line breaks between sections
       - 3 hashtags: 1 broad, 1 niche, 1 trending
       - Max 2 emojis total
       - 600-800 characters total

    Linkedin Post:"""
    

    #------4. Generate post-------
    inputs = tokenizer(prompt,
                       return_tensors="pt",
                       #max_length= tokenizer.model_max_length - max_tokens,
                       #return_overflowing_tokens=False,
                       truncation=True).to("cuda")

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_tokens,
            do_sample=True,
            temperature=0.7,
            #num_beams=1,
            top_k=50,
            top_p=0.90,
            repetition_penalty=1.2   
        )

    linkedin_post_results = tokenizer.decode(outputs[0], skip_special_tokens=True)

    #remove the input prompt
    generated_post =linkedin_post_results.split("Linkedin Post:")[-1].strip()
    

    return generated_post

In [None]:
#Test functionality of Agent for Linkedin post generation

# Get documents from hybrid search
search_results = model_search(
    embedding_model=embedding_model,
    faiss_db=faiss_index,
    query= "AI applications in modern journalism",
    tfidf_matrix=tf_idf_matrix,
    vectorizer=tf_idf,
    documents=documents
)

post = generate_linkedin_post(
    documents=search_results,
    topic="AI in journalism",
    model_name="mistralai/Mistral-7B-Instruct-v0.2"
)
print(post)

Running semantic search ...
Search Succesful



Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


📰💡 [Surprising Stat]: Traditional newsroom budgets have plummeted by 75% since 2000 [Citation Needed]. But even as the landscape shifts dramatically, one thing remains constant: Good stories still reign supreme! 🤔 #JournalismRevolution #MediaReform

    > Pro Tip: Use AI to find fresh angles & sources for your next big scoop. 🚀 #ContentMarketingHacks

    🌍🌐 Discussion Question: How can artificial intelligence best be integrated into the journalism ecosystem? Are there any ethical considerations that must be addressed before implementation? Let us know below! 👇🏼 [Emoji 1: 🤝](https://emogip.me/emoticons/people/handshake/) [Emoji 2: 🤔](https://emogip.me/emoticons/thoughtful/)


## 7. Create request logic; manages all the agents

*Workflow*

1. Take a user query → retrieve relevant documents (via hybrid search) → classify query intent → if the intent is news → generate a summary of the documents
2. Take a user query → retrieve relevant documents (via hybrid search) → classify query intent → if the intent is LinkedIn post → generate a draft LinkedIn post
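
The two routes above can be sketched as a small dispatch table. This is a simplified illustration with stub search and agent functions; `sketch_intent`, `route`, `fake_search`, and the stub agents are hypothetical names, and the keyword lists mirror the ones used in the cell below.

```python
def sketch_intent(query):
    """Keyword-based intent check using simple substring matching."""
    text = query.lower()
    if any(word in text for word in ("news", "updates", "article", "headline", "latest")):
        return "news"
    if any(word in text for word in ("linkedin", "post", "share", "generate", "update")):
        return "linkedin post"
    return "unknown"


def route(query, search, agents):
    """Retrieve documents, classify the query, then call the matching agent."""
    results = search(query)
    agent = agents.get(sketch_intent(query))
    return agent(results) if agent else None


# Stub search and agents standing in for the real retrieval and LLM calls
def fake_search(query):
    return [("some article text", 0.9)]


agents = {
    "news": lambda docs: f"summary of {len(docs)} document(s)",
    "linkedin post": lambda docs: f"post from {len(docs)} document(s)",
}

print(route("Latest news on vaccine mandates", fake_search, agents))
print(route("Create a post about vaccine mandates", fake_search, agents))
print(route("vaccine mandates", fake_search, agents))
```

Because the news keywords are checked first, a query that matches both lists is routed to the summary agent, and a query that matches neither returns `None`.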

In [25]:
#Agent Routing Logic

documents = news_df['document'].tolist()

def classify_intent(query):
    '''Classify the query as 'news', 'linkedin post', or 'unknown' by checking for keywords (substring match)'''

    if not isinstance(query,str) or not query.strip():
        raise ValueError("Query must be a non empty string")
    
             
    news_keywords = ['news', 'updates', 'article', 'headline', 'latest']
    post_keywords = ['linkedin', 'post', 'share', 'generate', 'update']
    
    if any(keyword in query.lower() for keyword in news_keywords):
        return 'news'
    elif any(keyword in query.lower() for keyword in post_keywords):
        return 'linkedin post'
    else:
        return 'unknown'
    

def route_request(query,embedding_model,faiss_db,tfidf_matrix,vectorizer,documents,topic):
    "Determine which Agent is required based on user query"
    if not isinstance(query,str) or not query.strip():
        raise ValueError("Query must be a non empty string")

    #use the arguments passed in rather than notebook-level globals
    search_results = model_search(embedding_model=embedding_model,
                                  faiss_db=faiss_db,
                                  query= query,
                                  tfidf_matrix=tfidf_matrix,
                                  vectorizer=vectorizer,
                                  documents=documents
                                  )
    try:
        print("Classifying query intent...")
        intent = classify_intent(query)
        if intent == 'news':
            print('Classified as news; generating news summary...')
            summary = generate_summary(documents=search_results,
                                       model_name="google/gemma-7b-it",
                                       max_tokens=512
                                       )
            print('Summary successfully created...')
            return summary
        elif intent == 'linkedin post':
            print('Classified as Linkedin post request; generating draft post...')
            post = generate_linkedin_post(documents=search_results,
                                          topic=topic,
                                          model_name="mistralai/Mistral-7B-Instruct-v0.2"
                                          )
            print('Post successfully created...')
            return post
        else:
            print('Could not classify your intent')
            return None

    except Exception as e:
        #covers failures in classification and in either generation step
        print(f'Failed to handle request: {str(e)}')
        return None


In [None]:
#Test functionality of request routing logic 

query = "Create a post about Biden vaccine mandate for workers"
results=route_request(embedding_model=embedding_model,
                      faiss_db=faiss_index,
                      query=query,
                      tfidf_matrix=tf_idf_matrix,
                      vectorizer=tf_idf,
                      documents=documents,
                      topic='news and media')

print(results)

Running semantic search ...
Search Succesful
Classifying query intent...
Classified as Linkedin post request; generating draft post...



Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Post successfully created...
🚨 Surprising News Alert! 🚨 Republican lawmakers threaten to shut down the govt over Biden's 'vaccine mandate.' But here's the kicker: There IS an alternative! [Citation Needed] #COVID19 #Politics

    According to a recent report by HuffPost, a faction of Congressional Republicans is threatening to shut down the U.S. government over President Joe Biden's vaccine mandate. However, they seem to be ignoring a significant detail: employees can choose weekly tests instead of getting vaccinated [1]. This revelation comes as Biden continues to urge Americans to get vaccinated to protect themselves and their loved ones [2]. Here's a quick look at why testing might not be a popular choice > Pro Tip: If you're required to comply with workplace regulations regarding COVID-19, consider discussing the testing option with your employer before making a decision. <br>

    What do you think? Is this an effective compromise for those who refuse to get vaccinated? Or will it