# Querying the Indian Railways AC Traction Manual: Harnessing Vector Search with LangChain and Astra DB

#### Pre-requisites:

You need a **_Serverless Cassandra with Vector Search_** database on [Astra DB](https://astra.datastax.com) to run this demo. As outlined in more detail [here](https://docs.datastax.com/en/astra-serverless/docs/vector-search/quickstart.html#_prepare_for_using_your_vector_database), you should get a DB Token with role _Database Administrator_ and copy your Database ID: these connection parameters are needed momentarily.

You also need an [OpenAI API Key](https://cassio.org/start_here/#llm-access) for this demo to work.

#### What you will do:

- Setup: import dependencies, provide secrets, extract the text of the manual from its PDF, create the LangChain vector store;
- Run a Question-Answering loop that retrieves the most relevant passages of the manual and has an LLM construct the answer.

Install the required dependencies:

In [None]:
!pip install -q cassio datasets langchain openai tiktoken

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m41.3/41.3 kB[0m [31m758.2 kB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m536.7/536.7 kB[0m [31m4.3 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m817.0/817.0 kB[0m [31m17.0 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m226.7/226.7 kB[0m [31m26.3 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.8/1.8 MB[0m [31m33.1 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m18.8/18.8 MB[0m [31m53.5 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m116.3/116.3 kB[0m [31m16.1 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m134.8/134.8 kB[0m [31m19.5 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━

Import the packages you'll need:

In [None]:
# LangChain components to use
from langchain.vectorstores.cassandra import Cassandra
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings

# Support for dataset retrieval with Hugging Face
from datasets import load_dataset

# With CassIO, the engine powering the Astra DB integration in LangChain,
# you will also initialize the DB connection:
import cassio

In [None]:
!pip install PyPDF2

Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m232.6/232.6 kB[0m [31m2.1 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1


In [None]:
from PyPDF2 import PdfReader

### Setup

#### Provide your secrets:

Replace the following placeholders with your Astra DB connection details and your OpenAI API key:

In [None]:
ASTRA_DB_APPLICATION_TOKEN = "ASTRA_DB_APPLICATION_TOKEN" # enter the "AstraCS:..." string found in your Token JSON file
ASTRA_DB_ID = "ASTRA_DB_ID" # enter your Database ID

OPENAI_API_KEY = "OPENAI_API_KEY" # enter your OpenAI key

Open the PDF of the manual:

In [None]:
# Path to the PDF file to load
pdfreader = PdfReader('ACTM.pdf')

In [None]:
# Read the text of every page of the PDF into a single string
raw_text = ''
for page in pdfreader.pages:
    content = page.extract_text()
    if content:
        raw_text += content
In [None]:
raw_text



Initialize the connection to your database:

_(do not worry if you see a few warnings, it's just that the drivers are chatty about negotiating protocol versions with the DB.)_

In [None]:
cassio.init(token=ASTRA_DB_APPLICATION_TOKEN, database_id=ASTRA_DB_ID)

ERROR:cassandra.connection:Closing connection <AsyncoreConnection(139995570530640) dc844560-5760-462e-87ad-bfe1e8b86392-us-east1.db.astra.datastax.com:29042:2e5e63cf-d295-4ffb-80e0-9daf73783cd7> due to protocol error: Error from server: code=000a [Protocol error] message="Beta version of the protocol used (5/v5-beta), but USE_BETA flag is unset"


Create the LangChain embedding and LLM objects for later usage:

In [None]:
llm = OpenAI(openai_api_key=OPENAI_API_KEY)
embedding = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

Create your LangChain vector store ... backed by Astra DB!

In [None]:
astra_vector_store = Cassandra(
    embedding=embedding,
    table_name="qa_mini_demo",
    session=None,
    keyspace=None,
)

In [None]:
from langchain.text_splitter import CharacterTextSplitter
# Split the text into overlapping chunks so that each chunk stays well within the model's token limit
text_splitter = CharacterTextSplitter(
    separator = "\n",
    chunk_size = 800,
    chunk_overlap  = 200,
    length_function = len,
)
texts = text_splitter.split_text(raw_text)
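To make the roles of `chunk_size` and `chunk_overlap` more concrete, here is a simplified, pure-Python sketch of a greedy line-based chunker. It is an illustration only and is not LangChain's actual implementation: the function name `chunk_lines` is hypothetical, and edge cases may be handled differently by `CharacterTextSplitter`.

```python
def chunk_lines(text, chunk_size=800, chunk_overlap=200, separator="\n"):
    """Pack separator-delimited pieces into chunks of at most chunk_size
    characters, carrying up to chunk_overlap characters of trailing pieces
    over into the next chunk. A single piece longer than chunk_size is
    emitted as an oversized chunk of its own."""
    pieces = [p for p in text.split(separator) if p]
    chunks, current = [], []

    def length(parts):
        return len(separator.join(parts))

    for piece in pieces:
        if current and length(current + [piece]) > chunk_size:
            # The current chunk is full: emit it
            chunks.append(separator.join(current))
            # Keep only a tail that fits within the overlap budget
            while current and length(current) > chunk_overlap:
                current.pop(0)
            # Make sure the tail plus the new piece still fits in one chunk
            while current and length(current + [piece]) > chunk_size:
                current.pop(0)
        current.append(piece)

    if current:
        chunks.append(separator.join(current))
    return chunks


# Small demonstration on ten short lines
sample = "\n".join("line %02d" % i for i in range(10))
for chunk in chunk_lines(sample, chunk_size=30, chunk_overlap=10):
    print(repr(chunk))
```

With these toy settings, each chunk ends with the line that the next chunk starts with, so a sentence cut at a chunk boundary is still seen in full by at least one chunk.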

In [None]:
texts[:50]

['CHAPTER I  \n \nMANAGEMENT OF AC TRACTION  \n \n1. HEADQUARTERS OFFICE  \n \n \nPARA NO.   SUBJECT  \n \n10100    Introduction  \n10101    Chief Electrical Engineer(CEE) -Zonal Railway  \n10102    Control Over Division  \n10103    Duties of Administrative Officers  \n10104    Stores procurement  \n1010 5   Standard Drawings and Specifications  \n \n \nII. DIVISIONAL ORGANIZATION  \n \n10106    Divisional Railway Manager (DRM)  \n10107    Duties of Traction Distribution (TRD) Officers  \n10108    Duties of Rolling Stock (RS) Officers  \n10109    Duties of Rolling Stock Operation Officers  \n10110    Duties of Principal, Training School  \n10111    Budget Estimates for Electric Traction  \n10112    Electrical Department - Budget Demands',
 '10111    Budget Estimates for Electric Traction  \n10112    Electrical Department - Budget Demands  \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \nINDIAN RAILWAYS – AC TRACTION MANUAL - VOLUME 1      [2] CHAPTER I

### Load the dataset into the vector store



In [None]:

astra_vector_store.add_texts(texts[:50])

print("Inserted %i chunks." % len(texts[:50]))

astra_vector_index = VectorStoreIndexWrapper(vectorstore=astra_vector_store)

Inserted 50 chunks.
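The retrieval output later in this notebook lists each passage twice, which is what one would expect if the same chunks had been inserted more than once (for example, by re-running the cell above). One possible way to make insertion repeatable is to derive a stable ID from each chunk's content. The sketch below uses hypothetical helper names (`chunk_id`, `unique_chunks_with_ids`) and assumes that `add_texts` accepts an `ids` argument and that writing an existing ID overwrites the row instead of adding a new one; check both against the LangChain version you have installed.

```python
import hashlib


def chunk_id(text):
    """Stable ID derived from the chunk content (truncated SHA-256 hex digest)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:32]


def unique_chunks_with_ids(chunks):
    """Drop exact duplicate chunks, preserving order; return (texts, ids)."""
    seen = {}
    for chunk in chunks:
        # setdefault keeps the first occurrence of each ID
        seen.setdefault(chunk_id(chunk), chunk)
    ids = list(seen.keys())
    return [seen[i] for i in ids], ids


# Hypothetical usage with the objects defined in this notebook
# (assumes add_texts supports the `ids` keyword):
# unique_texts, ids = unique_chunks_with_ids(texts[:50])
# astra_vector_store.add_texts(unique_texts, ids=ids)
```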


### Run the QA cycle

Simply run the cells and ask a question, or type `quit` to stop. (You can also stop execution with the "▪" button on the top toolbar.)

Here are some suggested questions:
- _Why Lightning arresters are provided?_
- _What are the duties of Senior Divisional Electrical Engineer?_
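Conceptually, each question is answered in two steps: the chunks most similar to the question are retrieved from the vector store, and they are then handed to the LLM as context together with the question. The sketch below illustrates only the second step, using a hypothetical `build_prompt` helper and a made-up prompt template; the template LangChain uses internally is not shown in this notebook and may differ.

```python
def build_prompt(question, passages):
    """Combine retrieved passages and the user's question into one prompt string."""
    context = "\n\n".join(p.strip() for p in passages)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        "Context:\n%s\n\n"
        "Question: %s\nAnswer:" % (context, question)
    )


# Example with two made-up passages
print(build_prompt("What is OHE?", ["First passage.", "Second passage."]))
```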


In [None]:

first_question = True
while True:
    if first_question:
        query_text = input("\nEnter your question (or type 'quit' to exit): ").strip()
    else:
        query_text = input("\nWhat's your next question (or type 'quit' to exit): ").strip()

    if query_text.lower() == "quit":
        break

    if query_text == "":
        continue

    first_question = False

    print("\nQUESTION: \"%s\"" % query_text)
    answer = astra_vector_index.query(query_text, llm=llm).strip()
    print("ANSWER: \"%s\"\n" % answer)

    print("FIRST DOCUMENTS BY RELEVANCE:")
    for doc, score in astra_vector_store.similarity_search_with_score(query_text, k=4):
        print("    [%0.4f] \"%s ...\"" % (score, doc.page_content[:84]))


Enter your question (or type 'quit' to exit): Why Lightning arresters are provided?

QUESTION: "Why Lightning arresters are provided?"




ANSWER: "Lightning arresters are provided to protect the overhead equipment (OHE) from damage due to lightning strikes. This is an important maintenance measure to ensure the safe and efficient operation of the railway."

FIRST DOCUMENTS BY RELEVANCE:




    [0.8940] "conductors, uprooting of OHE masts on account of accidents, or to meet urgent operat ..."
    [0.8940] "conductors, uprooting of OHE masts on account of accidents, or to meet urgent operat ..."
    [0.8925] "c)   Close liaison with power supply authorities to ensure continuity of power suppl ..."
    [0.8925] "c)   Close liaison with power supply authorities to ensure continuity of power suppl ..."

What's your next question (or type 'quit' to exit): What are the duties of Senior Divisional Electrical Engineer?

QUESTION: "What are the duties of Senior Divisional Electrical Engineer?"




ANSWER: "The duties of a Senior Divisional Electrical Engineer (Sr. DEE) include being responsible for technical and organizational matters related to the operation and maintenance of power supply installations, OHE and RC equipment. They are also responsible for investigating electrical accidents and issuing directives to prevent their recurrence, as well as submitting annual reports to the Central Electricity Authority. Sr. DEEs are answerable to the CEE in all technical matters."

FIRST DOCUMENTS BY RELEVANCE:




    [0.9461] "5. Investigation of electrical accidents and Issuing directives to prevent their rec ..."
    [0.9461] "5. Investigation of electrical accidents and Issuing directives to prevent their rec ..."
    [0.9397] "responsible to CEE for efficient functioning of electric traction and electrical gen ..."
    [0.9397] "responsible to CEE for efficient functioning of electric traction and electrical gen ..."

What's your next question (or type 'quit' to exit): quit
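
The bracketed numbers in the output above are similarity scores, with the highest-scoring chunks listed first. The exact metric and scale used by the vector store are not shown in this notebook. As an assumption, the sketch below uses plain cosine similarity on toy vectors to illustrate how a top-k ranking of this kind can be computed; the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return the k best (score, index) pairs, highest score first."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy example: three 2-dimensional "embeddings"
docs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
for score, index in top_k([1.0, 0.0], docs, k=2):
    print("[%0.4f] document %d" % (score, index))
```

In a real run the vectors come from the embedding model and have many more dimensions, and the database performs the search with an approximate index instead of comparing the query against every stored vector.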
