# Reading PDF papers
##### All the downloaded academic papers are about "Leader Election in Distributed Computing Systems"
The papers have been downloaded from **arXiv**

<img src="https://info.arxiv.org/brand/images/brand-logo-primary.jpg" width="250" />


In [None]:
!pip install PyPDF2

In [35]:
import PyPDF2
from tqdm import tqdm
import os


def divide_chunks(text_body, min_chunk_size=300, max_chunk_size=1000):
    in_chunks = text_body.split(".\n")
    out_chunks = []
    curr_chunk = ""
    for ic in in_chunks:
        if len(ic) < min_chunk_size:
            if len(ic)+len(curr_chunk) < max_chunk_size:
                curr_chunk = curr_chunk + "\n" + ic
            else:
                out_chunks.append(curr_chunk)
                curr_chunk = ""
    return out_chunks


def clean_chunk(raw_chunk):
    chunk = raw_chunk.replace('-\n','').replace('\n',' ')
    return chunk


pdf_paths = [x for x in os.listdir("/media") if ".pdf" in x]

print("\n".join(pdf_paths))

bodies = []
titles = []

for pdf_path in pdf_paths:
    with open(f"/media/{pdf_path}", "rb") as file:
        reader = PyPDF2.PdfReader(file)
        body = ""
        for page in tqdm(reader.pages):
            text = page.extract_text()
            body += "\n\n" + text
        bodies.append(body.strip())
        titles.append(pdf_path.strip())

chunk_list = []
for body in bodies:
    chunks = [clean_chunk(c) for c in divide_chunks(body)[1:]]
    chunk_list.append(chunks)

print("\n\n".join(chunk_list[3]))

Distributed Consensus in Content Centric Networking.pdf
Fault Toleran Leader Election in Distributed Systems.pdf
Improved Tradeoffs for Leader Election.pdf
A Survey and Taxonomy of Leader Election Algorithms in Distributed Systems.pdf
Modified Bully Algorithm using Election.pdf
ZePoP A Distributed Leader Election Protocol using the Delay-based Closeness Centrality for Peer-to-Peer Applications.pdf
Improved Bully Election Algorithm for Distributed Systems.pdf


100%|██████████| 3/3 [00:00<00:00, 27.32it/s]
100%|██████████| 8/8 [00:00<00:00, 28.22it/s]
100%|██████████| 33/33 [00:04<00:00,  7.46it/s]
100%|██████████| 17/17 [00:00<00:00, 25.63it/s]
100%|██████████| 8/8 [00:00<00:00, 18.87it/s]
100%|██████████| 9/9 [00:01<00:00,  4.96it/s]
100%|██████████| 22/22 [00:00<00:00, 32.30it/s]

 Then t time slot with the probability of  Would not be successful is equivalent to Then, t=e.ln f Theorem 2: if probability series be D∞ [log u] with minimum probability of 1-1/f (for each f>=1) algorithm will  be finished in time gap of log f [log u]. To prove the details  of the algorithm refer article10 If the channel state be null, a binary search in [0, 2t2]  that is the same as Broadcast (2t2/ 4) is executed.If the channel state be null, a binary search in [2t2/ 2, 2t] that is the same as Broadcast ( 3_ 4, 2t2) is executed else if the status of the channel is GOLLISION then the stations that have not transmitted become inactive Table 1 shows leader election protocols along with  their implementation time-order Figure 2. Convergence time (a) Malpani algorithm,  (b) Derhab Algorithm

 Figure 2 shows the comparison of Convergence  time of Malpani and Derhab algorithm. Section a is for  Malpani algorithm and section b is for Derhab algorithm.  As you can see, Derhab algorithm with t




# Embeddings
##### Multidimensional Vector Representation of Semantic Meanings
<img src="https://corpling.hypotheses.org/files/2018/04/Screen-Shot-2018-04-25-at-13.21.44.png" width="400" />



In [None]:
!pip install sentence_transformers

In [37]:
from sentence_transformers import SentenceTransformer
import itertools

model_name = "sentence-transformers/all-MiniLM-L12-v2"

model = SentenceTransformer(model_name)

sentences = list(itertools.chain(*chunk_list))

embeddings = [[float(x) for x in model.encode(s)] for s in sentences]

# Database
##### Creating a Vector Database of Embeddings using the open-source ChromaDB
<img src="https://www.mlq.ai/content/images/2023/08/1_admwyPyR6v_IZI0EYE--eA-1.webp" width="250" />


In [None]:
!pip install chromadb

In [None]:
import chromadb
from chromadb.utils import embedding_functions


chroma_client = chromadb.Client()

default_ef = embedding_functions.DefaultEmbeddingFunction()

collection = chroma_client.get_or_create_collection(name="leader_election_distributed_systems", embedding_function=default_ef)

collection.add(
    documents=sentences,
    embeddings=embeddings,
    ids=[str(x) for x in range(len(embeddings))]
)

# ChatBot
##### Using the open-source Databricks' Dolly model fine-tuned through the RAG techinique
<img src="https://www.databricks.com/sites/default/files/2023-04/Dolly-logo.png" width="300" />

In [None]:
!pip install transformers torch langchain

In [45]:
from transformers import pipeline
import torch
from langchain import PromptTemplate
from langchain.llms import HuggingFacePipeline
from langchain.chains.question_answering import load_qa_chain


def build_qa_chain():

    model_name = "databricks/dolly-v2-3b" # Dolly smallest version (3 billion params)

    instruct_pipeline = pipeline(model=model_name, torch_dtype=torch.bfloat16, trust_remote_code=True,
                                 return_full_text=True, max_new_tokens=4096, top_p=0.95, top_k=50,
                                 device=0) #cuda

    template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

    Instruction:
    You are an expert about leader election algorithms in distributed systems.
    You use a simple language to explain concepts.
    You reply using only short textual descriptions, no images.

    {context}

    Question: {question}

    Response:
    """

    prompt = PromptTemplate(input_variables=['context', 'question'], template=template)

    hf_pipe = HuggingFacePipeline(pipeline=instruct_pipeline)

    return load_qa_chain(llm=hf_pipe, chain_type="stuff", prompt=prompt, verbose=True)

In [46]:
# Building the chain will load Dolly and can take several minutes depending on the model size
qa_chain = build_qa_chain()

In [47]:
class Document():
    def __init__(self, content):
        self.page_content = content
        self.metadata = {"metadata": ""}

def get_similar_docs(question):
    results = collection.query(
        query_embeddings=[float(x) for x in model.encode(question)],
        n_results=3
    )
    return results["documents"]

def answer_question(question):
    similar_docs = [Document(x) for x in get_similar_docs(question)]
    result = qa_chain({"input_documents": similar_docs, "question": question})
    return result

In [51]:
import os
os.environ['CURL_CA_BUNDLE'] = ''

question = "Why distributed systems need a leader?"

answer = answer_question(question)



[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mBelow is an instruction that describes a task. Write a response that appropriately completes the request.
    
    Instruction: 
    You are an expert about leader election algorithms in distributed systems.
    You use a simple language to explain concepts.
    You reply using only short textual descriptions, no images.
    
    [' [14] Shay Kutten,William K MosesJr,GopalPandurangan, and David Peleg. 2020. Singularly OptimalRandomized Leader Election.In 34th InternationalSymposiumonDistributedComputing  [15] Shay Kutten, Gopal Pandurangan, David Peleg, Peter Rob inson, and Amitabh Trehan. 2015. On the complexity of universal leaderelection. Journalof theACM (JACM) 62,1 (2015),1–27 [16] Shay Kutten, Gopal Pandurangan, David Peleg, Peter Rob inson, and Amitabh Trehan. 2015. Sublinear bounds for randomizedleader election. TheoreticalComputerScience 561(20

In [52]:
print(answer["output_text"])


Leader election is required to maintain the integrity of the system.
In a distributed system, the communication amongst the components is not only limited, but also asynchronous. As a result, there is no global point of coordination. In such a setup, who will be the master, who will be the slave, or who will be the leader? A leader is expected to bring all the slaves together under it.
There are various leader election algorithms like singly-probetical single leader election, k-approval, etc. 
For a leader election to be successful, it is very important to ensure that only one leader can claim the election, and that leader can claim the election only if all the slaves accept it.
There are several factors that can affect the success of a leader election algorithm. 
    
    - Communication: in a distributed system, the communication amongst the components is not only limited, but also asynchronous. As a result, there is no global point of coordination. In such a setup, there is a possi