# Ollama server

In [30]:
%pip -q install python-magic tabulate pdf2image pytesseract llama-index llama-index-readers-web llama-index-llms-ollama llama-index-embeddings-huggingface llama-index-readers-file unstructured einops datasets huggingface_hub sentence-transformers docx2txt python-docx

In [31]:
! curl https://ollama.ai/install.sh | sh

>>> Downloading ollama...
>>> Installing ollama to /usr/local/bin...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


In [32]:
import os
import threading
import subprocess
import requests
import json

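def ollama():
    # Listen on all interfaces and accept requests from any origin
    os.environ['OLLAMA_HOST'] = '0.0.0.0:11434'
    os.environ['OLLAMA_ORIGINS'] = '*'
    # Start the server as a background process (Popen returns immediately)
    subprocess.Popen(["ollama", "serve"])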


In [33]:
ollama_thread = threading.Thread(target=ollama)
ollama_thread.start()
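
The server is started in the background, so it may not be accepting connections yet when the next cell runs `ollama pull`. A minimal sketch of a readiness check follows, assuming the default address `127.0.0.1:11434` shown in the install log and that the server answers a plain GET on its root URL. `ollama_is_up` and `wait_until_ready` are hypothetical helper names, not part of Ollama.

```python
import time
import urllib.error
import urllib.request


def ollama_is_up(base_url="http://127.0.0.1:11434"):
    """Return True if an HTTP GET on the server root succeeds (assumed health check)."""
    try:
        with urllib.request.urlopen(base_url, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(probe, timeout_s=30.0, interval_s=0.5):
    """Call `probe` repeatedly until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_until_ready(ollama_is_up)` before pulling a model would block until the server responds or the timeout passes.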

In [34]:
!ollama pull llama3

pulling manifest 
pulling 6a0746a1ec1a... 100% ▕▏ 4.7 GB                         
pulling 4fa551d4f938... 100% ▕▏  12 KB                         
pulling 8ab4849b038c... 100% ▕▏  254 B                         
pulling 577073ffcc6c... 100% ▕▏  110 B                         
pulling 3f8eb4da87fa... 100% ▕▏  485 B                         
verifying sha256 digest 
writing manifest 
removing any unused layers 
success


In [35]:
!ollama list

NAME         	ID          	SIZE  	MODIFIED      
llama3:latest	365c0bd3c000	4.7 GB	3 seconds ago	


# Libraries

In [36]:
from llama_index.core import SimpleDirectoryReader
from llama_index.core import VectorStoreIndex
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings
from llama_index.core import get_response_synthesizer
from llama_index.core import DocumentSummaryIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.readers.web import SimpleWebPageReader
from llama_index.readers.file import (
    DocxReader,
    HWPReader,
    PDFReader,
    EpubReader,
    FlatReader,
    HTMLTagReader,
    ImageCaptionReader,
    ImageReader,
    ImageVisionLLMReader,
    IPYNBReader,
    MarkdownReader,
    MboxReader,
    PptxReader,
    PandasCSVReader,
    VideoAudioReader,
    UnstructuredReader,
    PyMuPDFReader,
    ImageTabularChartReader,
    XMLReader,
    PagedCSVReader,
    CSVReader,
    RTFReader,
)
from llama_index.core import (
    load_index_from_storage,
    StorageContext,
)


# Trying Embedding models from MTEB

In [None]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
!huggingface-cli whoami

ballav


# Optional: download the model locally

In [None]:
# !huggingface-cli download meta-llama/Meta-Llama-3-8B --include "original/*" --local-dir Meta-Llama-3-8B
!huggingface-cli download meta-llama/Meta-Llama-3-8B --local-dir Meta-Llama-3-8B


Fetching 17 files:   0% 0/17 [00:00<?, ?it/s]Downloading 'README.md' to 'Meta-Llama-3-8B/.huggingface/download/README.md.53e7e6ec18e5d060bc5690314c0e04f2df3326bf.incomplete'
Downloading 'model-00002-of-00004.safetensors' to 'Meta-Llama-3-8B/.huggingface/download/model-00002-of-00004.safetensors.d9eee5f23d94405d90b7e9ff88b9443fee42f8528a658f54214c2aba7530d80c.incomplete'
Downloading 'model-00001-of-00004.safetensors' to 'Meta-Llama-3-8B/.huggingface/download/model-00001-of-00004.safetensors.f2c144103072514542e327fa8080bd375cb300f2d453fba9ca3aea81d0d4cf33.incomplete'
Downloading 'USE_POLICY.md' to 'Meta-Llama-3-8B/.huggingface/download/USE_POLICY.md.6f8e5f6e3243c92046a133c92eb62b1ed7975a0b.incomplete'
Downloading 'generation_config.json' to 'Meta-Llama-3-8B/.huggingface/download/generation_config.json.22550a806e8cfce3b346a520348e14b765287805.incomplete'
Downloading 'LICENSE' to 'Meta-Llama-3-8B/.huggingface/download/LICENSE.4c763399690c7fec642231350756e4ee9184b5ce.incomplete'
Downloading

In [None]:
Settings.embed_model = HuggingFaceEmbedding(model_name="nvidia/NV-Embed-v1", trust_remote_code=True, use_auth_token=True)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

In [None]:
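from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the locally downloaded checkpoint. The Auto* classes read the checkpoint's
# own config; LlamaTokenizer expects a SentencePiece model, which Llama 3 does not ship.
model = AutoModelForCausalLM.from_pretrained("/content/Meta-Llama-3-8B(1)")
tokenizer = AutoTokenizer.from_pretrained("/content/Meta-Llama-3-8B(1)")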


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

# Data loading and storing indexes

# 1. Tabular data from webpages

*   Detect the tables on each page
*   Save each table as a CSV file
*   Map each CSV file to the URL it came from
*   Index the CSV data

In [37]:
file_to_url_mapping={}

In [38]:
def table_to_csv_converter(url, file_to_url_mapping):
  """Save every <table> on `url` as a CSV and record which URL each file came from."""
  from bs4 import BeautifulSoup
  import requests
  import pandas as pd
  import os

  page = requests.get(url, timeout=30)
  soup = BeautifulSoup(page.text, 'html.parser')

  # Detect all tables on the webpage
  all_tables = soup.find_all('table')
  print(f"Total number of tables: {len(all_tables)}")

  # Create a directory to save the CSV files
  directory = 'tabular_data_csv'
  os.makedirs(directory, exist_ok=True)

  # Iterate over each table
  for i, table in enumerate(all_tables):
      # Use the caption as the file name; fall back to the table position if there is none
      caption = table.find('caption')
      table_caption = caption.text.strip() if caption else f'table_{i}'
      table_caption = table_caption.replace('/', '_')  # a slash would be read as a sub-folder

      # Header cells become the column names
      column_names = [th.text.strip() for th in table.find_all('th')]
      df = pd.DataFrame(columns=column_names)

      # Every row after the header row becomes one DataFrame row
      for row in table.find_all('tr')[1:]:
          row_values = [td.text.strip() for td in row.find_all('td')]
          if len(row_values) == len(column_names):  # skip rows that do not match the header
              df.loc[len(df)] = row_values

      # Save the DataFrame as a CSV file with the table caption as the file name
      csv_file = os.path.join(directory, f'{table_caption}.csv')
      df.to_csv(csv_file, index=False)
      print(f"CSV file saved: {csv_file}")
      file_to_url_mapping[f'{table_caption}.csv'] = url
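
Because the caption is used directly as the file name, two tables with the same caption overwrite each other, and captions with characters such as `/` or `:` can produce invalid paths. A minimal sketch of a naming helper, assuming it is acceptable to change the file names slightly; `safe_csv_name` is a hypothetical function, not part of the notebook.

```python
import re


def safe_csv_name(caption, used_names):
    """Turn a table caption into a unique, filesystem-safe CSV file name."""
    # Replace path separators and other characters that are unsafe in file names
    stem = re.sub(r'[\\/:*?"<>|]+', "_", caption).strip() or "table"
    name = f"{stem}.csv"
    counter = 2
    # Append a counter when the same caption has already been used
    while name in used_names:
        name = f"{stem}_{counter}.csv"
        counter += 1
    used_names.add(name)
    return name
```

The `used_names` set would be created once and shared across all pages so that names stay unique for the whole crawl.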

In [39]:
url_with_table = ['https://maharashtra.nic.in/',
 'https://maharashtra.nic.in/services/',
 'https://maharashtra.nic.in/profile/',
 'https://www.nic.in/servicecontents/nicnet/',
 'https://www.nic.in/servicecontents/data-centre/',
 'https://www.nic.in/servicecontents/national-cloud/',
 'https://www.nic.in/servicecontents/messaging/',
 'https://www.nic.in/servicecontents/remote-sensing-gis/',
 'https://www.nic.in/servicecontents/webcast/',
 'https://www.nic.in/servicecontents/nkn/',
 'https://www.nic.in/servicecontents/command-and-control-centre/',
 'https://www.nic.in/servicecontents/government-local-area-networks-lans/',
 'https://www.nic.in/servicecontents/video-conferencing/',
 'https://www.nic.in/servicecontents/security/',
 'https://www.nic.in/servicecontents/centralised-aadhaar-vault/',
 'https://maharashtra.nic.in/infrastructure/',
 'https://maharashtra.nic.in/news-update/',
 'https://maharashtra.nic.in/events/',
 'https://maharashtra.nic.in/awards/',
 'https://www.nic.in/servicecontents/domain-registration/',
 'https://maharashtra.nic.in/contact-us/',
 'https://maharashtra.nic.in/website-policies/',
 'https://maharashtra.nic.in/help/',
 'https://maharashtra.nic.in/web-information-manager/',
 'https://maharashtra.nic.in/directory/',
 'https://maharashtra.nic.in/rti/',
 'https://maharashtra.nic.in/district-centres/',
 'https://maharashtra.nic.in/photo-gallery/',
 'https://maharashtra.nic.in/video-gallery/', 'https://maharashtra.nic.in/organization-structure/']

In [40]:
for url in url_with_table:
  table_to_csv_converter(url,file_to_url_mapping)

Total number of tables: 0
Total number of tables: 0
Total number of tables: 0
Total number of tables: 0
Total number of tables: 0
Total number of tables: 0
Total number of tables: 0
Total number of tables: 0
Total number of tables: 0
Total number of tables: 0
Total number of tables: 0
Total number of tables: 0
Total number of tables: 0
Total number of tables: 0
Total number of tables: 0
Total number of tables: 0
Total number of tables: 0
Total number of tables: 0
Total number of tables: 0
Total number of tables: 0
Total number of tables: 0
Total number of tables: 0
Total number of tables: 2
CSV file saved: tabular_data_csv/Screen Reader Information.csv
CSV file saved: tabular_data_csv/Plug-in for alternate document types.csv
Total number of tables: 0
Total number of tables: 37
CSV file saved: tabular_data_csv/SIO.csv
CSV file saved: tabular_data_csv/Maharashtra State Centre, Mumbai.csv
CSV file saved: tabular_data_csv/Ahmadnagar.csv
CSV file saved: tabular_data_csv/Akola.csv
CSV file s

In [41]:
file_to_url_mapping

{'Screen Reader Information.csv': 'https://maharashtra.nic.in/help/',
 'Plug-in for alternate document types.csv': 'https://maharashtra.nic.in/help/',
 'SIO.csv': 'https://maharashtra.nic.in/directory/',
 'Maharashtra State Centre, Mumbai.csv': 'https://maharashtra.nic.in/directory/',
 'Ahmadnagar.csv': 'https://maharashtra.nic.in/directory/',
 'Akola.csv': 'https://maharashtra.nic.in/directory/',
 'Amravati.csv': 'https://maharashtra.nic.in/directory/',
 'Aurangabad.csv': 'https://maharashtra.nic.in/directory/',
 'Beed.csv': 'https://maharashtra.nic.in/directory/',
 'Bhandara.csv': 'https://maharashtra.nic.in/directory/',
 'Buldana.csv': 'https://maharashtra.nic.in/directory/',
 'Chandrapur.csv': 'https://maharashtra.nic.in/directory/',
 'Dhule.csv': 'https://maharashtra.nic.in/directory/',
 'Gadchiroli.csv': 'https://maharashtra.nic.in/directory/',
 'Gondia.csv': 'https://maharashtra.nic.in/directory/',
 'Hingoli.csv': 'https://maharashtra.nic.in/directory/',
 'Jalgaon.csv': 'https:/
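
The mapping only lives in the notebook session, so it is lost when the runtime restarts. A small sketch of saving and reloading it as JSON; the function names and any file path passed to them are assumptions for illustration.

```python
import json


def save_mapping(mapping, path):
    """Write the file-name-to-URL mapping to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(mapping, handle, indent=2, ensure_ascii=False)


def load_mapping(path):
    """Read the file-name-to-URL mapping back from a JSON file."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)
```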

# CSV index

In [42]:
from llama_index.core import VectorStoreIndex, SummaryIndex
from llama_index.readers.web import SimpleWebPageReader
import os
from llama_index.core import (
    load_index_from_storage,
    StorageContext,
)

Settings.llm = Ollama(model="llama3", temperature=0, request_timeout=500.0)

# Larger embedding model tried first (too large to load in this session)
# Settings.embed_model = HuggingFaceEmbedding(model_name="nvidia/NV-Embed-v1", trust_remote_code=True, use_auth_token=True)

# Embedding model actually used
Settings.embed_model = HuggingFaceEmbedding(model_name="Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)


# Read each CSV file with PagedCSVReader
parser = PagedCSVReader()

file_extractor = {".csv": parser}  # Add other extensions as needed
documents = SimpleDirectoryReader(
    "./tabular_data_csv", file_extractor=file_extractor
).load_data()

# Build the vector index over the loaded documents
index = VectorStoreIndex.from_documents(documents, llm=Settings.llm, embed_model=Settings.embed_model)

# Save the index to disk
index.set_index_id("vector_index")
index.storage_context.persist("./tabular_data_csv_index")
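
A persisted index can be loaded back later without recomputing the embeddings. The sketch below assumes the index was saved as in the cell above and that `Settings.embed_model` is set to the same embedding model before querying; `load_persisted_index` is a hypothetical wrapper around the `llama_index.core` storage calls.

```python
import os


def load_persisted_index(persist_dir, index_id="vector_index"):
    """Reload an index that was saved with storage_context.persist()."""
    if not os.path.isdir(persist_dir):
        raise FileNotFoundError(f"No persisted index found at {persist_dir}")

    # Imported lazily so the path check above runs even without llama-index installed
    from llama_index.core import StorageContext, load_index_from_storage

    storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
    return load_index_from_storage(storage_context, index_id=index_id)
```

For example, `load_persisted_index("./tabular_data_csv_index").as_query_engine()` would return a query engine backed by the saved CSV index.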








# 2. Text data from the webpages

In [59]:
# All URLs and their corresponding index names

urls = {
    "https://maharashtra.nic.in/": "Home_index",
    "https://maharashtra.nic.in/services/": "Services_index",
    "https://maharashtra.nic.in/profile/": "profile_index",
    "https://www.nic.in/servicecontents/nicnet/": "nicnet_index",
    "https://www.nic.in/servicecontents/data-centre/": "data-centre_index",
    "https://www.nic.in/servicecontents/national-cloud/": "national-cloud_index",
    "https://www.nic.in/servicecontents/messaging/": "messaging_index",
    "https://www.nic.in/servicecontents/remote-sensing-gis/": "remote-sensing-gis_index",
    "https://www.nic.in/servicecontents/webcast/": "webcast_index",
    "https://www.nic.in/servicecontents/nkn/": "nkn_index",
    "https://www.nic.in/servicecontents/command-and-control-centre/": "command-and-control-centre_index",
    "https://www.nic.in/servicecontents/government-local-area-networks-lans/": "government-local-area-networks-lans_index",
    "https://www.nic.in/servicecontents/video-conferencing/": "video-conferencing_index",
    "https://www.nic.in/servicecontents/security/": "security_index",
    "https://www.nic.in/servicecontents/centralised-aadhaar-vault/": "centralised-aadhaar-vault_index",
    "https://maharashtra.nic.in/infrastructure/": "infrastructure_index",
    "https://maharashtra.nic.in/news-update/": "news-update_index",
    "https://maharashtra.nic.in/events/": "events_index",
    "https://maharashtra.nic.in/awards/": "awards_index",
    "https://www.nic.in/servicecontents/domain-registration/": "domain_registration_index",
    "https://maharashtra.nic.in/contact-us/": "contact-us_index",
    "https://maharashtra.nic.in/website-policies/": "website-policies_index",
    "https://maharashtra.nic.in/help/": "help_index",
    "https://maharashtra.nic.in/web-information-manager/": "web-information-manager_index",
    "https://maharashtra.nic.in/directory/": "directory_index",
    "https://maharashtra.nic.in/rti/": "rti_index",
    "https://maharashtra.nic.in/district-centres/": "district-centres_index",
    "https://maharashtra.nic.in/photo-gallery/": "photo-gallery_index",
    "https://maharashtra.nic.in/video-gallery/": "video-gallery_index",
    "https://maharashtra.nic.in/organization-structure/": "organization-structure_index"
}


In [44]:
# URLs and corresponding link names
urls = {
    "https://maharashtra.nic.in/": "Home_index",
    "https://maharashtra.nic.in/services/": "Services_index",
    "https://maharashtra.nic.in/profile/": "profile_index",
    "https://www.nic.in/servicecontents/nicnet/": "nicnet_index",
    "https://www.nic.in/servicecontents/data-centre/": "data-centre_index",
    "https://www.nic.in/servicecontents/national-cloud/": "national-cloud_index",
    "https://www.nic.in/servicecontents/messaging/": "messaging_index",
    "https://www.nic.in/servicecontents/remote-sensing-gis/": "remote-sensing-gis_index",
    "https://www.nic.in/servicecontents/webcast/": "webcast_index",
    "https://www.nic.in/servicecontents/nkn/": "nkn_index",
    "https://www.nic.in/servicecontents/command-and-control-centre/": "command-and-control-centre_index",
    "https://www.nic.in/servicecontents/government-local-area-networks-lans/": "government-local-area-networks-lans_index",
    "https://www.nic.in/servicecontents/video-conferencing/": "video-conferencing_index",
    "https://www.nic.in/servicecontents/security/": "security_index",
    "https://www.nic.in/servicecontents/centralised-aadhaar-vault/": "centralised-aadhaar-vault_index",
    "https://maharashtra.nic.in/infrastructure/": "infrastructure_index",
    "https://maharashtra.nic.in/awards/": "awards_index",
    "https://www.nic.in/servicecontents/domain-registration/": "domain_registration_index",
    "https://maharashtra.nic.in/contact-us/": "contact-us_index",
    "https://maharashtra.nic.in/website-policies/": "website-policies_index",
    "https://maharashtra.nic.in/help/": "help_index",
    "https://maharashtra.nic.in/web-information-manager/": "web-information-manager_index",
    "https://maharashtra.nic.in/rti/": "rti_index",
    "https://maharashtra.nic.in/video-gallery/": "video-gallery_index"
}
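
Most of the index names above follow the pattern "last path segment plus `_index`". A sketch of deriving them automatically, assuming that pattern; note that it would not reproduce the hand-written names exactly (for example `Services_index` and `domain_registration_index` differ in case and separator). `index_name_from_url` is a hypothetical helper.

```python
from urllib.parse import urlparse


def index_name_from_url(url, root_name="Home"):
    """Derive an index name from the last path segment of a URL."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    # The site root has no path segment, so it gets the fallback name
    stem = segments[-1] if segments else root_name
    return f"{stem}_index"
```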

In [45]:
import os
from docx import Document
from llama_index.readers.web import BeautifulSoupWebReader


# Set the configurations for the models
Settings.llm = Ollama(model="llama3", temperature=0, request_timeout=500.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)


# Directory where each page is saved as a .docx file
save_directory = "/content/webpages_doc"
os.makedirs(save_directory, exist_ok=True)


# BeautifulSoupWebReader is imported directly; download_loader is deprecated
loader = BeautifulSoupWebReader()

for url, link_name in urls.items():
    documents = loader.load_data(urls=[url])
    print(f"Loaded {len(documents)} documents of type {type(documents[0])} from {url}")

    # Write the text of every loaded document into one .docx named after the link
    file_path = os.path.join(save_directory, f"{link_name}.docx")
    doc = Document()
    for document in documents:
        doc.add_paragraph(document.text)
    doc.save(file_path)

    print(f"Data from {url} saved successfully to: {file_path}")


Loaded 1 documents of type <class 'llama_index.core.schema.Document'> from https://maharashtra.nic.in/
Data from https://maharashtra.nic.in/ saved successfully to: /content/webpages_doc/Home_index.docx
Loaded 1 documents of type <class 'llama_index.core.schema.Document'> from https://maharashtra.nic.in/services/
Data from https://maharashtra.nic.in/services/ saved successfully to: /content/webpages_doc/Services_index.docx
Loaded 1 documents of type <class 'llama_index.core.schema.Document'> from https://maharashtra.nic.in/profile/
Data from https://maharashtra.nic.in/profile/ saved successfully to: /content/webpages_doc/profile_index.docx
Loaded 1 documents of type <class 'llama_index.core.schema.Document'> from https://www.nic.in/servicecontents/nicnet/
Data from https://www.nic.in/servicecontents/nicnet/ saved successfully to: /content/webpages_doc/nicnet_index.docx
Loaded 1 documents of type <class 'llama_index.core.schema.Document'> from https://www.nic.in/servicecontents/data-cent

In [25]:
from llama_index.core import VectorStoreIndex, SummaryIndex
from llama_index.readers.web import SimpleWebPageReader
import os
from llama_index.core import (
    load_index_from_storage,
    StorageContext,
)

Settings.llm = Ollama(model="llama3", temperature=0, request_timeout=500.0)
# Embedding model used for indexing
Settings.embed_model = HuggingFaceEmbedding(model_name="Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)
# Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5", trust_remote_code=True)

# Read each .docx file with DocxReader
parser = DocxReader()

file_extractor = {".docx": parser}
documents = SimpleDirectoryReader(
    "./webpages_doc", file_extractor=file_extractor
).load_data()

# Build the vector index over the loaded documents
index = VectorStoreIndex.from_documents(documents, llm=Settings.llm, embed_model=Settings.embed_model)

# Save the index to disk
index.set_index_id("vector_index")
index.storage_context.persist("./webpages_doc_index")






# Combined Index (both csv and docx)

In [46]:
from llama_index.core import VectorStoreIndex, SummaryIndex
from llama_index.readers.web import SimpleWebPageReader
import os
from llama_index.core import (
    load_index_from_storage,
    StorageContext,
)
from llama_index.readers.file import DocxReader, PagedCSVReader

# Set up the language model and embedding
Settings.llm = Ollama(model="llama3", temperature=0, request_timeout=500.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)

# Readers for different file types
docx_parser = DocxReader()
csv_parser = PagedCSVReader()

file_extractor = {
    ".docx": docx_parser,
    ".csv": csv_parser
}

# Load documents from both directories
docx_documents = SimpleDirectoryReader("./webpages_doc", file_extractor=file_extractor).load_data()
csv_documents = SimpleDirectoryReader("./tabular_data_csv", file_extractor=file_extractor).load_data()

# Combine the documents from both sources
all_documents = docx_documents + csv_documents

# Create and save the Indexes for the loaded data
index = VectorStoreIndex.from_documents(all_documents, llm=Settings.llm, embed_model=Settings.embed_model)

# Save the combined index to disk
index.set_index_id("combined_vector_index")
index.storage_context.persist("./combined_data_index")

# Optionally, to verify that the index loads back from disk:
# storage_context = StorageContext.from_defaults(persist_dir="./combined_data_index")
# loaded_index = load_index_from_storage(storage_context, index_id="combined_vector_index")
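
The `file_to_url_mapping` built earlier is not attached to the indexed documents, so answers cannot point back to the page a table came from. A sketch of copying the URL into each document's metadata before indexing, assuming the documents carry a `file_name` entry in `metadata` (which `SimpleDirectoryReader` is expected to set by default); `attach_source_urls` and the `source_url` key are hypothetical names.

```python
def attach_source_urls(documents, file_to_url_mapping, key="source_url"):
    """Copy the originating page URL into each document's metadata.

    Returns the number of documents that were tagged.
    """
    tagged = 0
    for document in documents:
        file_name = document.metadata.get("file_name")
        url = file_to_url_mapping.get(file_name)
        if url is not None:
            document.metadata[key] = url
            tagged += 1
    return tagged
```

It would be called as `attach_source_urls(csv_documents, file_to_url_mapping)` before the documents are passed to `VectorStoreIndex.from_documents`.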


# Creating a zip of the Colab folder

In [57]:
import os
import zipfile

# Folder to archive
parent_folder = '/content/combined_data_index'

# Write every file under the folder into the zip, using paths relative to the folder
with zipfile.ZipFile('combined_data_index_v2.zip', 'w') as zip_file:
    for root, dirs, files in os.walk(parent_folder):
        for file in files:
            file_path = os.path.join(root, file)
            zip_file.write(file_path, os.path.relpath(file_path, parent_folder))
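
The same archive can be produced with the standard library's `shutil.make_archive`, which walks the folder itself. A short sketch, with `zip_folder` as a hypothetical wrapper name:

```python
import shutil


def zip_folder(folder, archive_stem):
    """Zip the contents of `folder` into `<archive_stem>.zip` and return its path."""
    return shutil.make_archive(archive_stem, "zip", root_dir=folder)
```

Note that `make_archive` takes the archive name without the `.zip` extension and applies compression, so the file may be smaller than one written with the default `zipfile.ZipFile` settings.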


In [58]:
from google.colab import files
files.download("combined_data_index_v2.zip")



# Extracting folder paths

In [None]:
import os

# Specify the path to the parent directory
parent_dir = '/content/Index'

# Initialize an empty list to store the folder paths
folder_paths = []

# Iterate over the contents of the parent directory
for item in os.listdir(parent_dir):
    # Check if the item is a directory
    if os.path.isdir(os.path.join(parent_dir, item)):
        # Add the full path of the directory to the list
        folder_paths.append(os.path.join(parent_dir, item))

# Print the list of folder paths
print(folder_paths)
print(len(folder_paths))

['/content/Index/Services_index', '/content/Index/rti_index', '/content/Index/webcast_index', '/content/Index/nicnet_index', '/content/Index/messaging_index', '/content/Index/help_index', '/content/Index/Home_index', '/content/Index/infrastructure_index', '/content/Index/website-policies_index', '/content/Index/profile_index', '/content/Index/domain_registration_index', '/content/Index/nkn_index', '/content/Index/web-information-manager_index', '/content/Index/government-local-area-networks-lans_index', '/content/Index/directory_index', '/content/Index/tabular_data_csv_index', '/content/Index/awards_index', '/content/Index/centralised-aadhaar-vault_index', '/content/Index/photo-gallery_index', '/content/Index/contact-us_index', '/content/Index/data-centre_index', '/content/Index/events_index', '/content/Index/remote-sensing-gis_index', '/content/Index/news-update_index', '/content/Index/video-conferencing_index', '/content/Index/security_index', '/content/Index/national-cloud_index', '
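
Before merging, it can help to confirm that every folder actually holds the files the merge cells read. A sketch that lists only the complete index folders, assuming the two file names used in the merge cells (`docstore.json` and `default__vector_store.json`); `find_index_folders` is a hypothetical helper.

```python
import os

REQUIRED_FILES = ("docstore.json", "default__vector_store.json")


def find_index_folders(parent_dir, required_files=REQUIRED_FILES):
    """Return sorted sub-folders of `parent_dir` that contain every required file."""
    complete = []
    for entry in sorted(os.listdir(parent_dir)):
        folder = os.path.join(parent_dir, entry)
        if os.path.isdir(folder) and all(
            os.path.isfile(os.path.join(folder, name)) for name in required_files
        ):
            complete.append(folder)
    return complete
```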

# Combining indexes

In [None]:
import os
import json

# List of folders
# folders = ['/content/Index/video-gallery_index', '/content/Index/remote-sensing-gis_index', '/content/Index/help_index', '/content/Index/events_index', '/content/Index/Services_index', '/content/Index/domain_registration_index', '/content/Index/data-centre_index', '/content/Index/messaging_index', '/content/Index/web-information-manager_index', '/content/Index/nicnet_index', '/content/Index/website-policies_index', '/content/Index/news-update_index', '/content/Index/government-local-area-networks-lans_index', '/content/Index/profile_index', '/content/Index/webcast_index', '/content/Index/rti_index', '/content/Index/nkn_index', '/content/Index/awards_index', '/content/Index/security_index', '/content/Index/photo-gallery_index', '/content/Index/centralised-aadhaar-vault_index', '/content/Index/Home_index', '/content/Index/contact-us_index', '/content/Index/infrastructure_index', '/content/Index/video-conferencing_index', '/content/Index/command-and-control-centre_index', '/content/Index/national-cloud_index', '/content/Index/directory_index', '/content/tabular_data_csv_index', '/content/text_Index']
# folders = ['/content/Index/video-gallery_index', '/content/Index/remote-sensing-gis_index', '/content/Index/help_index', '/content/Index/events_index', '/content/Index/Services_index', '/content/Index/domain_registration_index', '/content/Index/data-centre_index', '/content/Index/messaging_index', '/content/Index/web-information-manager_index', '/content/Index/nicnet_index', '/content/Index/website-policies_index', '/content/Index/news-update_index', '/content/Index/government-local-area-networks-lans_index', '/content/Index/profile_index', '/content/Index/webcast_index', '/content/Index/rti_index', '/content/Index/nkn_index', '/content/Index/awards_index', '/content/Index/security_index', '/content/Index/photo-gallery_index', '/content/Index/centralised-aadhaar-vault_index', '/content/Index/Home_index', '/content/Index/contact-us_index', '/content/Index/infrastructure_index', '/content/Index/video-conferencing_index', '/content/Index/command-and-control-centre_index', '/content/Index/national-cloud_index', '/content/Index/directory_index', '/content/tabular_data_csv_index', '/content/Text_Index_v1/news_updates', '/content/Text_Index_v1/services', '/content/Text_Index_v1/districts', '/content/Text_Index_v1/events', '/content/Text_Index_v1/photo_gallery', '/content/Text_Index_v1/Nic']

folders = ['/content/Index/Services_index', '/content/Index/rti_index', '/content/Index/webcast_index', '/content/Index/nicnet_index', '/content/Index/messaging_index', '/content/Index/help_index', '/content/Index/Home_index', '/content/Index/infrastructure_index', '/content/Index/website-policies_index', '/content/Index/profile_index', '/content/Index/domain_registration_index', '/content/Index/nkn_index', '/content/Index/web-information-manager_index', '/content/Index/government-local-area-networks-lans_index', '/content/Index/directory_index', '/content/Index/tabular_data_csv_index', '/content/Index/awards_index', '/content/Index/centralised-aadhaar-vault_index', '/content/Index/photo-gallery_index', '/content/Index/contact-us_index', '/content/Index/data-centre_index', '/content/Index/events_index', '/content/Index/remote-sensing-gis_index', '/content/Index/news-update_index', '/content/Index/video-conferencing_index', '/content/Index/security_index', '/content/Index/national-cloud_index', '/content/Index/video-gallery_index', '/content/Index/docs_index', '/content/Index/command-and-control-centre_index']


# Initialize merged data
merged_data = {
    "embedding_dict": {},
    "text_id_to_ref_doc_id": {},
    "metadata_dict": {}
}

# Loop through each folder
for folder in folders:
    file_path = os.path.join(folder, 'default__vector_store.json')

    # Load the contents of the JSON file
    with open(file_path, 'r') as file:
        data = json.load(file)

    # Merge the embedding_dict
    merged_data['embedding_dict'] = {**merged_data['embedding_dict'], **data['embedding_dict']}

    # Merge the text_id_to_ref_doc_id
    merged_data['text_id_to_ref_doc_id'] = {**merged_data['text_id_to_ref_doc_id'], **data['text_id_to_ref_doc_id']}

    # Merge the metadata_dict
    merged_data['metadata_dict'] = {**merged_data['metadata_dict'], **data['metadata_dict']}

# Write the merged data to a new JSON file (create the output folder first)
os.makedirs('/content/combined_index_v4', exist_ok=True)
with open('/content/combined_index_v4/default__vector_store.json', 'w') as merged_file:
    json.dump(merged_data, merged_file, indent=4)

In [None]:
import os
import json

# List of folders
folders = ['/content/Index/Services_index', '/content/Index/rti_index', '/content/Index/webcast_index', '/content/Index/nicnet_index', '/content/Index/messaging_index', '/content/Index/help_index', '/content/Index/Home_index', '/content/Index/infrastructure_index', '/content/Index/website-policies_index', '/content/Index/profile_index', '/content/Index/domain_registration_index', '/content/Index/nkn_index', '/content/Index/web-information-manager_index', '/content/Index/government-local-area-networks-lans_index', '/content/Index/directory_index', '/content/Index/tabular_data_csv_index', '/content/Index/awards_index', '/content/Index/centralised-aadhaar-vault_index', '/content/Index/photo-gallery_index', '/content/Index/contact-us_index', '/content/Index/data-centre_index', '/content/Index/events_index', '/content/Index/remote-sensing-gis_index', '/content/Index/news-update_index', '/content/Index/video-conferencing_index', '/content/Index/security_index', '/content/Index/national-cloud_index', '/content/Index/video-gallery_index', '/content/Index/docs_index', '/content/Index/command-and-control-centre_index']


# Initialize merged data
merged_data = {
    "docstore/metadata": {},
    "docstore/data": {},
    "docstore/ref_doc_info": {}
}

# Loop through each folder
for folder in folders:
    file_path = os.path.join(folder, 'docstore.json')

    # Load the contents of the JSON file
    with open(file_path, 'r') as file:
        data = json.load(file)

    # Merge the docstore/metadata
    merged_data['docstore/metadata'] = {**merged_data['docstore/metadata'], **data['docstore/metadata']}

    # Merge the docstore/data
    merged_data['docstore/data'] = {**merged_data['docstore/data'], **data['docstore/data']}

    # Merge the docstore/ref_doc_info
    merged_data['docstore/ref_doc_info'] = {**merged_data['docstore/ref_doc_info'], **data['docstore/ref_doc_info']}

# merged_data already has the final three-section layout, so it can be written as-is.
# Write the merged data to a new JSON file (create the output folder if it is missing).
os.makedirs('/content/combined_index_v4', exist_ok=True)
with open('/content/combined_index_v4/docstore.json', 'w') as merged_file:
    json.dump(merged_data, merged_file, indent=4)

In [None]:
import os
import json

# List of folders
# folders = ['/content/Index/video-gallery_index', '/content/Index/remote-sensing-gis_index', '/content/Index/help_index', '/content/Index/events_index', '/content/Index/Services_index', '/content/Index/domain_registration_index', '/content/Index/data-centre_index', '/content/Index/messaging_index', '/content/Index/web-information-manager_index', '/content/Index/nicnet_index', '/content/Index/website-policies_index', '/content/Index/news-update_index', '/content/Index/government-local-area-networks-lans_index', '/content/Index/profile_index', '/content/Index/webcast_index', '/content/Index/rti_index', '/content/Index/nkn_index', '/content/Index/awards_index', '/content/Index/security_index', '/content/Index/photo-gallery_index', '/content/Index/centralised-aadhaar-vault_index', '/content/Index/Home_index', '/content/Index/contact-us_index', '/content/Index/infrastructure_index', '/content/Index/video-conferencing_index', '/content/Index/command-and-control-centre_index', '/content/Index/national-cloud_index', '/content/Index/directory_index', '/content/tabular_data_csv_index', '/content/text_Index']
# folders = ['/content/Index/video-gallery_index', '/content/Index/remote-sensing-gis_index', '/content/Index/help_index', '/content/Index/events_index', '/content/Index/Services_index', '/content/Index/domain_registration_index', '/content/Index/data-centre_index', '/content/Index/messaging_index', '/content/Index/web-information-manager_index', '/content/Index/nicnet_index', '/content/Index/website-policies_index', '/content/Index/news-update_index', '/content/Index/government-local-area-networks-lans_index', '/content/Index/profile_index', '/content/Index/webcast_index', '/content/Index/rti_index', '/content/Index/nkn_index', '/content/Index/awards_index', '/content/Index/security_index', '/content/Index/photo-gallery_index', '/content/Index/centralised-aadhaar-vault_index', '/content/Index/Home_index', '/content/Index/contact-us_index', '/content/Index/infrastructure_index', '/content/Index/video-conferencing_index', '/content/Index/command-and-control-centre_index', '/content/Index/national-cloud_index', '/content/Index/directory_index', '/content/tabular_data_csv_index', '/content/Text_Index_v1/news_updates', '/content/Text_Index_v1/services', '/content/Text_Index_v1/districts', '/content/Text_Index_v1/events', '/content/Text_Index_v1/photo_gallery', '/content/Text_Index_v1/Nic']

folders = ['/content/Index/Services_index', '/content/Index/rti_index', '/content/Index/webcast_index', '/content/Index/nicnet_index', '/content/Index/messaging_index', '/content/Index/help_index', '/content/Index/Home_index', '/content/Index/infrastructure_index', '/content/Index/website-policies_index', '/content/Index/profile_index', '/content/Index/domain_registration_index', '/content/Index/nkn_index', '/content/Index/web-information-manager_index', '/content/Index/government-local-area-networks-lans_index', '/content/Index/directory_index', '/content/Index/tabular_data_csv_index', '/content/Index/awards_index', '/content/Index/centralised-aadhaar-vault_index', '/content/Index/photo-gallery_index', '/content/Index/contact-us_index', '/content/Index/data-centre_index', '/content/Index/events_index', '/content/Index/remote-sensing-gis_index', '/content/Index/news-update_index', '/content/Index/video-conferencing_index', '/content/Index/security_index', '/content/Index/national-cloud_index', '/content/Index/video-gallery_index', '/content/Index/docs_index', '/content/Index/command-and-control-centre_index']



# Initialize merged data
merged_data = {
    "graph_dict": {}
}

# Loop through each folder
for folder in folders:
    file_path = os.path.join(folder, 'graph_store.json')

    # Load the contents of the JSON file
    with open(file_path, 'r') as file:
        data = json.load(file)

    # Merge the graph_dict
    merged_data['graph_dict'] = {**merged_data['graph_dict'], **data['graph_dict']}

# Write the merged data to a new JSON file
with open('/content/combined_index_v4/graph_store.json', 'w') as merged_file:
    json.dump(merged_data, merged_file, indent=4)

In [None]:
import os
import json

# List of folders
# folders = ['/content/Index/video-gallery_index', '/content/Index/remote-sensing-gis_index', '/content/Index/help_index', '/content/Index/events_index', '/content/Index/Services_index', '/content/Index/domain_registration_index', '/content/Index/data-centre_index', '/content/Index/messaging_index', '/content/Index/web-information-manager_index', '/content/Index/nicnet_index', '/content/Index/website-policies_index', '/content/Index/news-update_index', '/content/Index/government-local-area-networks-lans_index', '/content/Index/profile_index', '/content/Index/webcast_index', '/content/Index/rti_index', '/content/Index/nkn_index', '/content/Index/awards_index', '/content/Index/security_index', '/content/Index/photo-gallery_index', '/content/Index/centralised-aadhaar-vault_index', '/content/Index/Home_index', '/content/Index/contact-us_index', '/content/Index/infrastructure_index', '/content/Index/video-conferencing_index', '/content/Index/command-and-control-centre_index', '/content/Index/national-cloud_index', '/content/Index/directory_index', '/content/tabular_data_csv_index', '/content/text_Index']
# folders = ['/content/Index/video-gallery_index', '/content/Index/remote-sensing-gis_index', '/content/Index/help_index', '/content/Index/events_index', '/content/Index/Services_index', '/content/Index/domain_registration_index', '/content/Index/data-centre_index', '/content/Index/messaging_index', '/content/Index/web-information-manager_index', '/content/Index/nicnet_index', '/content/Index/website-policies_index', '/content/Index/news-update_index', '/content/Index/government-local-area-networks-lans_index', '/content/Index/profile_index', '/content/Index/webcast_index', '/content/Index/rti_index', '/content/Index/nkn_index', '/content/Index/awards_index', '/content/Index/security_index', '/content/Index/photo-gallery_index', '/content/Index/centralised-aadhaar-vault_index', '/content/Index/Home_index', '/content/Index/contact-us_index', '/content/Index/infrastructure_index', '/content/Index/video-conferencing_index', '/content/Index/command-and-control-centre_index', '/content/Index/national-cloud_index', '/content/Index/directory_index', '/content/tabular_data_csv_index', '/content/Text_Index_v1/news_updates', '/content/Text_Index_v1/services', '/content/Text_Index_v1/districts', '/content/Text_Index_v1/events', '/content/Text_Index_v1/photo_gallery', '/content/Text_Index_v1/Nic']

folders = ['/content/Index/Services_index', '/content/Index/rti_index', '/content/Index/webcast_index', '/content/Index/nicnet_index', '/content/Index/messaging_index', '/content/Index/help_index', '/content/Index/Home_index', '/content/Index/infrastructure_index', '/content/Index/website-policies_index', '/content/Index/profile_index', '/content/Index/domain_registration_index', '/content/Index/nkn_index', '/content/Index/web-information-manager_index', '/content/Index/government-local-area-networks-lans_index', '/content/Index/directory_index', '/content/Index/tabular_data_csv_index', '/content/Index/awards_index', '/content/Index/centralised-aadhaar-vault_index', '/content/Index/photo-gallery_index', '/content/Index/contact-us_index', '/content/Index/data-centre_index', '/content/Index/events_index', '/content/Index/remote-sensing-gis_index', '/content/Index/news-update_index', '/content/Index/video-conferencing_index', '/content/Index/security_index', '/content/Index/national-cloud_index', '/content/Index/video-gallery_index', '/content/Index/docs_index', '/content/Index/command-and-control-centre_index']

# Initialize merged data
merged_data = {
    "embedding_dict": {},
    "text_id_to_ref_doc_id": {},
    "metadata_dict": {}
}

# Loop through each folder
for folder in folders:
    file_path = os.path.join(folder, 'image__vector_store.json')

    # Load the contents of the JSON file
    with open(file_path, 'r') as file:
        data = json.load(file)

    # Merge the embedding_dict
    merged_data['embedding_dict'] = {**merged_data['embedding_dict'], **data['embedding_dict']}

    # Merge the text_id_to_ref_doc_id
    merged_data['text_id_to_ref_doc_id'] = {**merged_data['text_id_to_ref_doc_id'], **data['text_id_to_ref_doc_id']}

    # Merge the metadata_dict
    merged_data['metadata_dict'] = {**merged_data['metadata_dict'], **data['metadata_dict']}

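# Write the merged data to a new JSON file.
# The output keeps the double-underscore name of the input files
# ('image__vector_store.json') so the merged store is saved under the same file name.
with open('/content/combined_index_v4/image__vector_store.json', 'w') as merged_file: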
    json.dump(merged_data, merged_file, indent=4)
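
The merge cells above all follow the same pattern: open one JSON file per folder and fold a few top-level dictionaries together, with later folders overwriting earlier ones on a key clash. A minimal sketch of that pattern as one reusable helper is shown below. The function name `merge_json_sections`, the duplicate report, and the throwaway demo folders are illustrative assumptions, not part of the original notebook. Unlike the cells above, the sketch treats a missing section as empty instead of raising a `KeyError`.

```python
import json
import os
import tempfile


def merge_json_sections(folders, filename, sections):
    """Merge the named top-level dict sections of `filename` across folders.

    Returns (merged, duplicates). `duplicates` lists (section, key) pairs that
    appeared in more than one folder; the later folder wins, as with {**a, **b}.
    """
    merged = {section: {} for section in sections}
    duplicates = []
    for folder in folders:
        with open(os.path.join(folder, filename), "r") as fh:
            data = json.load(fh)
        for section in sections:
            # A missing section is treated as empty (assumption of this sketch)
            for key, value in data.get(section, {}).items():
                if key in merged[section]:
                    duplicates.append((section, key))
                merged[section][key] = value
    return merged, duplicates


# Tiny self-contained demo with throwaway folders and made-up keys
with tempfile.TemporaryDirectory() as root:
    demo_folders = []
    for name, payload in [("a_index", {"n1": 1, "n2": 2}), ("b_index", {"n2": 20, "n3": 3})]:
        folder = os.path.join(root, name)
        os.makedirs(folder)
        with open(os.path.join(folder, "graph_store.json"), "w") as fh:
            json.dump({"graph_dict": payload}, fh)
        demo_folders.append(folder)

    demo_merged, demo_duplicates = merge_json_sections(
        demo_folders, "graph_store.json", ["graph_dict"]
    )

print(demo_merged)
print(demo_duplicates)
```

Reporting the overwritten keys is a design choice of the sketch: a silent overwrite is harmless when node IDs are unique across folders, but a non-empty duplicate list would suggest that two source indexes contain the same node.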

In [None]:
# VectorStoreIndex

import os
import json

# List of folders
# folders = ['/content/Index/video-gallery_index', '/content/Index/remote-sensing-gis_index', '/content/Index/help_index', '/content/Index/events_index', '/content/Index/Services_index', '/content/Index/domain_registration_index', '/content/Index/data-centre_index', '/content/Index/messaging_index', '/content/Index/web-information-manager_index', '/content/Index/nicnet_index', '/content/Index/website-policies_index', '/content/Index/news-update_index', '/content/Index/government-local-area-networks-lans_index', '/content/Index/profile_index', '/content/Index/webcast_index', '/content/Index/rti_index', '/content/Index/nkn_index', '/content/Index/awards_index', '/content/Index/security_index', '/content/Index/photo-gallery_index', '/content/Index/centralised-aadhaar-vault_index', '/content/Index/Home_index', '/content/Index/contact-us_index', '/content/Index/infrastructure_index', '/content/Index/video-conferencing_index', '/content/Index/command-and-control-centre_index', '/content/Index/national-cloud_index', '/content/Index/directory_index', '/content/tabular_data_csv_index', '/content/text_Index']
# folders = ['/content/Index/video-gallery_index', '/content/Index/remote-sensing-gis_index', '/content/Index/help_index', '/content/Index/events_index', '/content/Index/Services_index', '/content/Index/domain_registration_index', '/content/Index/data-centre_index', '/content/Index/messaging_index', '/content/Index/web-information-manager_index', '/content/Index/nicnet_index', '/content/Index/website-policies_index', '/content/Index/news-update_index', '/content/Index/government-local-area-networks-lans_index', '/content/Index/profile_index', '/content/Index/webcast_index', '/content/Index/rti_index', '/content/Index/nkn_index', '/content/Index/awards_index', '/content/Index/security_index', '/content/Index/photo-gallery_index', '/content/Index/centralised-aadhaar-vault_index', '/content/Index/Home_index', '/content/Index/contact-us_index', '/content/Index/infrastructure_index', '/content/Index/video-conferencing_index', '/content/Index/command-and-control-centre_index', '/content/Index/national-cloud_index', '/content/Index/directory_index', '/content/tabular_data_csv_index', '/content/Text_Index_v1/news_updates', '/content/Text_Index_v1/services', '/content/Text_Index_v1/districts', '/content/Text_Index_v1/events', '/content/Text_Index_v1/photo_gallery', '/content/Text_Index_v1/Nic']

folders = ['/content/Index/Services_index', '/content/Index/rti_index', '/content/Index/webcast_index', '/content/Index/nicnet_index', '/content/Index/messaging_index', '/content/Index/help_index', '/content/Index/Home_index', '/content/Index/infrastructure_index', '/content/Index/website-policies_index', '/content/Index/profile_index', '/content/Index/domain_registration_index', '/content/Index/nkn_index', '/content/Index/web-information-manager_index', '/content/Index/government-local-area-networks-lans_index', '/content/Index/directory_index', '/content/Index/tabular_data_csv_index', '/content/Index/awards_index', '/content/Index/centralised-aadhaar-vault_index', '/content/Index/photo-gallery_index', '/content/Index/contact-us_index', '/content/Index/data-centre_index', '/content/Index/events_index', '/content/Index/remote-sensing-gis_index', '/content/Index/news-update_index', '/content/Index/video-conferencing_index', '/content/Index/security_index', '/content/Index/national-cloud_index', '/content/Index/video-gallery_index', '/content/Index/docs_index', '/content/Index/command-and-control-centre_index']


# Initialize merged data
merged_data = {
    "index_store/data": {
        "vector_index": {
            "__type__": "vector_store",
            "__data__": "{\"index_id\": \"vector_index\", \"summary\": null, \"nodes_dict\": {}, \"doc_id_dict\": {}, \"embeddings_dict\": {}}"
        }
    }
}


# Loop through each folder
for folder in folders:
    file_path = os.path.join(folder, 'index_store.json')

    # Load the contents of the JSON file
    with open(file_path, 'r') as file:
        data = json.load(file)

    # Merge the vector_index data; '__data__' holds a JSON-encoded string,
    # so decode both sides once, merge the three dictionaries, and re-encode.
    vector_index_data = json.loads(merged_data['index_store/data']['vector_index']['__data__'])
    source_index_data = json.loads(data['index_store/data']['vector_index']['__data__'])
    for key in ('nodes_dict', 'doc_id_dict', 'embeddings_dict'):
        vector_index_data[key] = {**vector_index_data[key], **source_index_data[key]}
    merged_data['index_store/data']['vector_index']['__data__'] = json.dumps(vector_index_data)

# Write the merged data to a new JSON file
with open('/content/combined_index_v4/index_store.json', 'w') as merged_file:
    json.dump(merged_data, merged_file, indent=4)
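
Because the docstore and the index store are merged by separate cells, it can be worth checking that they still agree before loading the combined index. The sketch below decodes the nested `__data__` string and lists node IDs that appear in `nodes_dict` but not in `docstore/data`. It assumes that the keys of both dictionaries are node IDs, which matches the layout used in the cells above but is not verified here. The function names and the miniature stores are hypothetical.

```python
import json


def decode_vector_index(index_store):
    """Decode the JSON string stored under index_store/data -> vector_index -> __data__."""
    return json.loads(index_store["index_store/data"]["vector_index"]["__data__"])


def find_missing_nodes(index_store, docstore):
    """Return node IDs listed in nodes_dict that have no entry in docstore/data."""
    nodes = decode_vector_index(index_store)["nodes_dict"]
    return sorted(node_id for node_id in nodes if node_id not in docstore["docstore/data"])


# Hypothetical miniature stores shaped like the merged files
demo_index_store = {
    "index_store/data": {
        "vector_index": {
            "__type__": "vector_store",
            "__data__": json.dumps(
                {
                    "index_id": "vector_index",
                    "summary": None,
                    "nodes_dict": {"n1": "n1", "n2": "n2"},
                    "doc_id_dict": {},
                    "embeddings_dict": {},
                }
            ),
        }
    }
}
demo_docstore = {
    "docstore/metadata": {},
    "docstore/data": {"n1": {}},
    "docstore/ref_doc_info": {},
}

missing = find_missing_nodes(demo_index_store, demo_docstore)
print(missing)
```

With the real files, the two arguments would be the parsed contents of the merged `index_store.json` and `docstore.json`; an empty result means every indexed node has a stored document entry.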

# Model and Embeddings

In [49]:
# Initialize settings for LlamaIndex
# Settings.llm = Ollama(model="llama3", temperature=0, request_timeout=500.0)
Settings.llm = Ollama(model="llama3", temperature=0, request_timeout=500.0, max_tokens=80000000000)

In [50]:
Settings.embed_model = HuggingFaceEmbedding(model_name="Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)

# Settings.embed_model = HuggingFaceEmbedding(model_name="nvidia/NV-Embed-v1", trust_remote_code=True, use_auth_token=True)
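
The embedding model turns each text chunk and each query into a vector, and retrieval ranks chunks by how close their vectors are to the query vector. The notebook does not state which similarity metric is used, so the sketch below assumes cosine similarity, a common choice for this kind of retrieval. The two-dimensional vectors and the helper names are made up for illustration; real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the k (score, doc_id) pairs with the highest similarity to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up vectors standing in for embedded chunks
demo_docs = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [1.0, 1.0]}
demo_top = top_k([1.0, 0.0], demo_docs, k=2)
print(demo_top)
```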


# Loading the index from disk

In [None]:
persist_dir = "/content/combined_index_v3"

storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
index = load_index_from_storage(storage_context)

print(index)

<llama_index.core.indices.vector_store.base.VectorStoreIndex object at 0x7d749fbcfc70>


In [None]:
!!cat /content/chat_store2.pkl

['"{\\"store\\": {\\"user1\\": [{\\"role\\": \\"user\\", \\"content\\": \\"What can you tell me about the organization structure?\\", \\"additional_kwargs\\": {}}, {\\"role\\": \\"assistant\\", \\"content\\": \\"Based on the provided context, I can tell you that the National Informatics Centre (NIC) Maharashtra State Centre has an organization structure that includes various levels of personnel with different designations and roles. Here\'s a breakdown:\\\\n\\\\n1. **Scientist-F**: This is the highest level in the organization, with 5-6 individuals holding this designation. They report to senior officials.\\\\n2. **Scientist-E**: There are around 15-16 individuals at this level, who report to Scientist-F or Scientist-D.\\\\n3. **Scientist-D**: Approximately 10-11 individuals hold this designation, which is one level below Scientist-E. They report to Scientist-F or Scientist-E.\\\\n4. **Scientist-C**: Around 5-6 individuals are at this level, reporting to Scientist-D or Scientist-E.\\\\

# Queries

In [None]:
# What are NIC services?
response_json

{'response': 'According to the provided context information, National Informatics Centre (NIC) provides various services including:\n\n1. Electronic Mail (E-Mail) services over NICNET, which is a satellite-based communication network.\n2. SMTP, UUCP, and X.400 email services.\n3. Integration with a X.500 directory for easy searching of e-mail addresses.\n4. Fax messaging through E-Mail.\n5. Messaging services, including core eMail application Gateway services, Short Messaging Service (SMS), Outbound Dialing (OBD), and an IT Platform for citizen engagement (Sampark).\n6. Video Conferencing services for direct interaction with stakeholders.\n7. Webcast services for live/on-demand broadcasts of important events and conferences.\n\nThese services are offered to users across the country, including government departments, ministries, and autonomous bodies.',
 'search_Score': 0.6809562153737795,
 'url': 'https://www.nic.in/servicecontents/messaging/'}

In [None]:
# What is the published date of Training Program of IVFRT-MMP at FRRO?
response_json

{'response': 'I\'m happy to help! However, I don\'t see any information about a "Training Program of IVFRT-MMP at FRRO" in the provided context. The context only mentions file paths for CSV files and some personal information about Smt. Ireni Akoijam and details about JAWS screen reader. There is no mention of a training program or its published date.',
 'search_Score': 0.47610851359310563,
 'url': 'https://maharashtra.nic.in/rti/'}

In [None]:
# what is the published date of SAMPARK-NIC eGOV mobile App?
response_json

{'response': 'The published date of SAMPARK – NIC eGov Mobile App is July 31, 2021.',
 'search_Score': 0.683033538785289,
 'url': 'https://maharashtra.nic.in/news-update/'}

In [None]:
# what is the published date of SIMNIC?
response_json

{'response': 'According to the provided context information, the published date of SIMNIC is August 2, 2021.',
 'search_Score': 0.5385759364469261,
 'url': 'https://maharashtra.nic.in/news-update/'}

In [None]:
# can you tell me about the latest news?
response_json

{'response': "There is no news update provided in the given context. The context appears to be a list of screen reader information, including the name of the screen reader, its website, and whether it's free or commercial. There is no mention of any news updates.",
 'search_Score': 0.47680748668387585,
 'url': 'https://maharashtra.nic.in/help/'}

In [None]:
# who is the SIO?
response_json

{'response': 'According to the provided context information, the CISO (Chief Information Security Officer) is responsible for the security of the National Informatics Centre (NIC).',
 'search_Score': 0.6270658935981352,
 'url': 'https://www.nic.in/servicecontents/nkn/'}

In [None]:
# what are the events?
response_json

{'response': "According to the provided context information, the events that are covered using NIC Webcast services include:\n\n1. Union Budget speech\n2. President's address to the nation\n3. Prime Minister's Mann Ki Baat & other speeches\n4. Independence and Republic Day celebrations at New Delhi\n5. Air Force Day\n6. Dance and cultural Festivals\n7. PIB press conferences\n8. NIC Knowledge sharing\n9. NKN events\n10. Proceedings of state assemblies\n11. National and international events/conferences such as:\n\t* Make in India\n\t* Skill India\n\t* Start-up India\n\t* Digital India\n\t* International Yoga Day",
 'search_Score': 0.4722525002107816,
 'url': 'https://maharashtra.nic.in/infrastructure/'}

In [None]:
# whom did Ms. D. Lakshmi Prasanna reports to?
response_json

{'response': "Based on the provided context information, I can see that there is no mention of Ms. D. Lakshmi Prasanna's name or designation in the list of scientists and officials. Therefore, it is not possible to determine who her reporting officer would be based on this information alone.",
 'search_Score': 0.7028144708896884,
 'url': 'https://maharashtra.nic.in/directory/'}

In [None]:
# what is the organization structure?
response_json

{'response': 'The organization structure appears to be a hierarchical structure with various levels of officials and staff. The top-level officials include Scientists-F, who report to higher-level officials such as Scientist-D or Scientist-C. There are also lower-level officials like Section Officers, Senior Secretariat Assistants, Junior Secretariat Assistants, and Staff Car Drivers. Additionally, there are Multitasking Staff and Steno Gr-II personnel.',
 'search_Score': 0.47507247362784927,
 'url': 'https://maharashtra.nic.in/organization-structure/'}

In [None]:
# who is the reporting officer of Ms. D. Lakshmi Prasanna?
response_json

{'response': 'There is no mention of a photo gallery in the provided context information. The context appears to be related to screen readers and document types, but it does not include any information about a photo gallery. Therefore, I cannot provide an answer to this query based on the given context.',
 'search_Score': 0.5215259570020727,
 'url': 'https://maharashtra.nic.in/help/'}

In [None]:
# what is there in the video gallery?
response_json

{'response': 'According to the provided context, the Video Gallery contains videos related to NIC Maharashtra – ICT support during COVID-19.',
 'search_Score': 0.5873177463100648,
 'url': 'https://maharashtra.nic.in/video-gallery/'}

In [None]:
response_json

{'response': 'According to the provided context information, National Informatics Centre (NIC) has been involved in various works, including:\n\n1. Establishing High Capacity SCPC VSAT Connectivity at Kavarati, Lakshadweep and Port Blair, Andaman & Nicobar Island.\n2. Providing Data Centers Services from National Data Centres at Delhi, Hyderabad, Pune, and Bhubaneswar.\n3. Upgrading the National Data Centre at Delhi with high-speed network backbone, 1.6 Petabyte Enterprise-class storage, high-throughput Network Load Balancers, and Intrusion Prevention Systems.\n4. Implementing Solution for Backup as a Service & Storage as a Service.\n5. Hosting/enhancing ICT infrastructure of national-level projects, including E-office, e-Courts, and e-Transport.\n6. Upgrading Data Centres at Pune and Hyderabad with high-speed network backbone and storage capacity enhanced.\n7. Renovating the National Data Centre at Hyderabad with a capacity of 106 Racks.\n8. Launching National Cloud Services under Meg

# Visualizing Reranking

In [None]:
persist_dir = "/content/combined_index"

storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
index = load_index_from_storage(storage_context)

print(index)

<llama_index.core.indices.vector_store.base.VectorStoreIndex object at 0x7ca00054b430>


In [None]:
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core import QueryBundle
import pandas as pd
from IPython.display import display, HTML
from copy import deepcopy

pd.set_option("display.max_colwidth", None)

# configure retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=10)

def get_retrieved_nodes(query_str, reranker):
    """Retrieve the top-k nodes for a query and optionally rerank them."""
    query_bundle = QueryBundle(query_str)

    retrieved_nodes = retriever.retrieve(query_bundle)

    # The string "None" is the sentinel that means "no reranker"
    if reranker != "None":
        retrieved_nodes = reranker.postprocess_nodes(retrieved_nodes, query_bundle)

    return retrieved_nodes


def pretty_print(df):
    return display(HTML(df.to_html().replace("\\n", "<br>")))


def visualize_retrieved_nodes(nodes) -> None:
    result_dicts = []
    for node in nodes:
        node = deepcopy(node)
        node.node.metadata = {}
        node_text = node.node.get_text()
        node_text = node_text.replace("\n", " ")

        result_dict = {"Score": node.score, "Text": node_text}
        result_dicts.append(result_dict)

    pretty_print(pd.DataFrame(result_dicts))
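
The helpers above follow a retrieve-then-rerank pattern: the retriever returns a wide candidate set (here `similarity_top_k=10`), and a reranker rescores those candidates against the query and keeps only the best few. The sketch below shows the pattern with a deliberately simple stand-in scorer based on word overlap. It is not the cross-encoder used in this notebook, and the function names and the short candidate texts are hypothetical.

```python
def overlap_score(query, text):
    """Stand-in relevance score: fraction of query words that occur in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def rerank(query, candidates, top_n):
    """Rescore (retrieval_score, text) candidates and keep the best top_n."""
    rescored = [(overlap_score(query, text), text) for _, text in candidates]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return rescored[:top_n]


# Made-up candidates in retrieval order (highest retrieval score first)
demo_candidates = [
    (0.61, "Footer About this website Terms of use"),
    (0.60, "NIC provides data centre services"),
    (0.59, "Messaging services offered by NIC"),
]
demo_reranked = rerank("services by NIC", demo_candidates, top_n=2)
for score, text in demo_reranked:
    print(f"{score:.2f}  {text}")
```

In the demo, the footer text that ranked first on retrieval score is dropped after rescoring, which is the kind of effect the comparison in the next section looks for.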

# Comparing with reranker and without reranker

In [None]:
# Define the rerankers to compare
from llama_index.core.postprocessor import SentenceTransformerRerank

RERANKERS = {
    "WithoutReranker": "None",
    "cross-encoder": SentenceTransformerRerank(model="cross-encoder/ms-marco-MiniLM-L-2-v2", top_n=5),
    "bge-reranker-large": SentenceTransformerRerank(model="BAAI/bge-reranker-large", top_n=5)
}

In [None]:
RERANKERS.items()

dict_items([('WithoutReranker', 'None'), ('cross-encoder', SentenceTransformerRerank(callback_manager=<llama_index.core.callbacks.base.CallbackManager object at 0x7ca0e5a8f340>, model='cross-encoder/ms-marco-MiniLM-L-2-v2', top_n=5, device='cuda', keep_retrieval_score=False)), ('bge-reranker-large', SentenceTransformerRerank(callback_manager=<llama_index.core.callbacks.base.CallbackManager object at 0x7c9fec1c8be0>, model='BAAI/bge-reranker-large', top_n=5, device='cuda', keep_retrieval_score=False))])

In [None]:
query_str = "what are the works done by NIC?"

# Loop over rerankers
for rerank_name, reranker in RERANKERS.items():
    print(f"Running Evaluation for Reranker: {rerank_name}")

    # Retrieve the top-k nodes, reranking them unless the "None" sentinel is set
    retrieved_nodes = get_retrieved_nodes(query_str, reranker)

    print(f"Visualize Retrieved Nodes for Reranker: {rerank_name}")
    visualize_retrieved_nodes(retrieved_nodes)

Running Evaluation for Reranker: WithoutReranker
Visualize Retrieved Nodes for Reranker: WithoutReranker


Unnamed: 0,Score,Text
0,0.608027,"Page Last Updated Date :October 30th, 2023 Footer About this website Terms of use Website policies Sitemap Help Contact Us Feedback Web Information Manager CISO Recruitment \t\t\t\t\t \t\t\t\t\t Content Owned and Maintained by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India Website is Designed, Developed and Hosted by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India\t\t\t\t\t \t\t\t\t\t Last Updated: May 27, 2024 error: Content is protected !!"
1,0.597522,"NKN has also established a High Capacity SCPC VSAT Connectivity at Kavarati, Lakshadweep and Port Blair, Andaman & Nicobar Island. Data Centres NIC provides Data Centers Services from National Data Centres at Delhi, Hyderabad, Pune and Bhubaneswar. National Data Centre (NDC) at Bhubaneswar is a Cloud-enabled data centre which has been offering cloud services to Govt. Departments since its inauguration during 2018. The cloud services are being offered on various flavors of cloud platforms backed by state-of-the-art infrastructure to support the DC operations. NDC Bhubaneswar is also offering co-location services by Govt. Organizations. National Data Centre at Delhi was upgraded with high speed Network backbone, 1.6 Petabyte Enterprise class storage, high throughput Network Load Balancers, and Intrusion Prevention Systems. Solution for Backup as a Service & Storage as a Service has been implemented. ICT infrastructure of number of national level projects was hosted/enhanced; these include E-office, e-Courts and e-Transport. Data centres at Pune & Hyderabad are also upgraded with high speed network backbone and storage capacity enhanced. National Data Centre at Hyderabad is being renovated with a capacity of106 Racks. National Cloud Infrastructure NIC launched National Cloud Services in year 2014 under MeghRaj Government of India Cloud Initiative. NIC Cloud Services are being provided from multiple locations of National Data Centres at Bhubaneswar, Delhi, Hyderabad, and Pune. Various new services are now offered on Cloud including Application Programme Monitoring (APM) Service, Data Analytics (DA) Service, Resource Monitoring (RM) Service and Container Service. In order to cater to the projects envisioned under Digital India Programme and growing requirements of existing Projects, over 18,000 Virtual Servers were provisioned and allocated to over 1100 Ministries/Departments for e-Governance Projects. 
Establishment of Mini-Clouds in States: NIC already had established Mini Clouds in four state units and are operational. During this year Mini Cloud setups have been made operational in ten state units of Assam, Bihar, Chandigarh, Chhattisgarh, Haryana, Karnataka, Kerala, Punjab, Rajasthan and Tripura. Command and Control Centre NIC has been offering services to the government through its 4 National Data Centres and 30 Mini Data Centres across the country. CCC has been set up keeping in view the requirement of a centralized facility to seamlessly monitor the availability of all these Centres and Cloud Services.Over 10,000 e-Governance applications are being hosted by these Centre’s. CCC is providing users with a customized dashboard of Network Management System (NMS) to enable them to monitor their respective applications. NIC is also providing Application Performance Management (APM) through CCC to improve the availability, performance and functioning of critical applications. Cyber Security Network Security The Network Security Division is in relentless pursuit of achieving CIA (Confidentiality, Integrity, and Availability) of ICT assets in NICNET through deployment of expert manpower, appropriate tools, and state-of-the-art technologies. The Network Security Division (NSD) of NIC is engaged in assessment, planning, deployment and management of security devices and solutions across the NICNET in general and the Data Centres in particular. The security span of NSD comprises of all National and State Data Centres, over 1000 LANs of Govt. offices and MPLS networks, more than 2 Lakh endpoints and a series of networking devices deployed across the country. A dedicated team actively monitors real time attacks on 24×7 basis. The Network Security Division (NSD) conducts Security Audit of Data Centres and Bhawan Networks on regular basis and on demand. Besides, review of the network audit performed by third party vendors in NICNET was also undertaken. 
Cyber Security Policies, Guidelines, Advisories and Standard Operating Procedure(s) are also being regularly prepared, updated and circulated to the NICNET users. Network Security Division is involved in vulnerability assessment of ICT assets in Physical, Virtual, and Cloud environments at regular intervals and on demand.The Network Security Division manages the 24×7 Security Monitoring Centre to ensure real time monitoring, detection, prevention, analysis and reporting of Cyber threats and attacks. Application Security NIC is formulating and updating the Security Policies for NICNET as and when required. Security Audit of Web Applications / Websites, Penetration Testing and Vulnerability Analysis,SSL compliance testing, Version Detection for application hosting environment with infrastructure compliance checks are also done as per user requirement."
2,0.596696,"Messaging | National Informatics Centre Change Text Size A + Increase font size\t\t\t\t\t\t A Reset font size\t\t\t\t\t\t A - Decrease font size\t\t\t\t\t\t Change Color Contrast High Contrast\t\t\t\t\t\t Normal Contrast\t\t\t\t\t\t Skip to main content \t\t\t\t\t\tSkip to main content\t\t\t\t\t English English हिन्दी मराठी ਪੰਜਾਬੀ ગુજરાતી অসমীয়া বাংলা తెలుగు தமிழ் മലയാളം oriya Kannada Screen Reader Access search Search About Us Mandate Organization Chart Who’s who Directory Search Timeline Research & Publications RTI Tenders Recruitment Alumni NIC Offices Headquarters Data Centres Focus Centres Centres of Excellence State Centres District Centres Services Products & Platforms From Centre From State eBrochure Emerging Technologies Media eBook Blogs Awards Infographics Infographics SSO Testimonials Informatics Newsletter Messaging Home » » Messaging\t\t Messaging \tElectronic Mail (better known as E-Mail), is the most used Network Service across the country. National Informatics Centre (NIC) provides different kinds of E-mail services to its users, over NICNET, which is NIC’s satellite-based communication network. The different types of e-mail services being provided include SMTP, UUCP and X.400. The NICNET e-mail service is distributed over many mail servers located at different NIC Centres. These are inter-linked with each other in a way that mails can be exchanged amongst all types of services. \tE-mail can be used as part of the electronic file processing in Government of India. All services under e-mail are offered free of cost to all officials under Ministries / Departments / Statutory Bodies / Autonomous bodies of both Central and State/ UT Governments. \tThe E-mail service is also integrated with a X.500 directory which makes it possible to search for and locate e-mail addresses very easily. It is also possible to send fax messages through E-mail since a gateway is provided for conversion of messages for fax recipients. 
\tMails are accepted and sent in NICNET from a single entry point i.e. via the SMTP gateways. Over 8 lakh mails are transacted in a day. Once a mail is accepted in the network, based on its address, it is routed to the recipient server. \tMessaging services constitute one of the primary applications deployed across the network. Each network connected to the Internet has a Domain Name associated with it, to ensure email and other traffic getting directed to the right recipient. \tIn the case of NICNET, this domain is known as “nic.in”. All emails to the home user are directed to “home.user[at]nic[dot]in” which results in the mail being stored on the NIC mail server, ready to be collected by the home user email client. “Gov.in” domain accounts are also maintained by NIC for use by Government departments. \tUnder the Digital India initiative, NIC has established a robust Messaging framework that includes core eMial application Gateway services, Short Messaging Service (SMS), OBD (Outbound Dialing) and an IT Platform for citizen engagement (Sampark). These services collectively create a value chain over the existing NIC web portals that host Digital Services and are used extensively by Central & State Governments for citizen engagement & disseminating information. Page Last Updated Date :April 28th, 2023 Footer About this website Terms of use Website policies Sitemap Help Contact Us Feedback Web Information Manager CISO Recruitment \t\t\t\t\t \t\t\t\t\t Content Owned and Maintained by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India Website is Designed, Developed and Hosted by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India\t\t\t\t\t \t\t\t\t\t Last Updated: May 27, 2024 error: Content is protected !!"
3,0.595926,"Page Last Updated Date :March 22nd, 2024 Footer About this website Terms of use Website policies Sitemap Help Contact Us Feedback Web Information Manager CISO Recruitment \t\t\t\t\t \t\t\t\t\t Content Owned and Maintained by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India Website is Designed, Developed and Hosted by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India\t\t\t\t\t \t\t\t\t\t Last Updated: May 27, 2024 error: Content is protected !!"
4,0.583055,Name: Shri S.V.Ganjewar Designation: Scientist-F Email: sv[dot]ganjewar[at]nic[dot]in Phone: 0253-23311460 IP Number: 38306
5,0.581553,Name: Shri Satish Ninoorao Khadse Designation: Scientist-E Email: sn[dot]khadse[at]nic[dot]in Phone: 07172-250530 IP Number: 38241
6,0.581437,"NICNET | National Informatics Centre Change Text Size A + Increase font size\t\t\t\t\t\t A Reset font size\t\t\t\t\t\t A - Decrease font size\t\t\t\t\t\t Change Color Contrast High Contrast\t\t\t\t\t\t Normal Contrast\t\t\t\t\t\t Skip to main content \t\t\t\t\t\tSkip to main content\t\t\t\t\t English English हिन्दी मराठी ਪੰਜਾਬੀ ગુજરાતી অসমীয়া বাংলা తెలుగు தமிழ் മലയാളം oriya Kannada Screen Reader Access search Search About Us Mandate Organization Chart Who’s who Directory Search Timeline Research & Publications RTI Tenders Recruitment Alumni NIC Offices Headquarters Data Centres Focus Centres Centres of Excellence State Centres District Centres Services Products & Platforms From Centre From State eBrochure Emerging Technologies Media eBook Blogs Awards Infographics Infographics SSO Testimonials Informatics Newsletter NICNET Home » » NICNET\t\t NICNET \tNational Informatics Centre (NIC) through its Information and Communication Technology (ICT) Network – NICNET, has institutional linkages across all the Ministries /Departments of the Central Government, State Governments, Union Territories, and District administrations of the country. \tThrough NICNET, NIC has been instrumental in steering e-Governance applications in Government Ministries/ Departments at the Centre, States, Districts and Block level, facilitating improvement in Government services, wider transparency, promoting decentralized planning and management, resulting in better efficiency and accountability to the people of India. \tDirect peering of NICNET with BSNL, PGCIL and Railtel are completed at Delhi and Hyderabad for saving Internet Bandwidth and faster access of each other’s Network and Data Centre. Peering with Google, Microsoft and Akamai Content Delivery Network has facilitated faster access to Google services and other important International web sites. 
\tRe-structuring of Videoconferencing network has enabled to minimize delay and handle large scale important video conferencing such as PRAGATI of Hon’ble Prime Minister, GST Council Meetings by Hon’ble Finance Minister among others. Page Last Updated Date :July 17th, 2023 Footer About this website Terms of use Website policies Sitemap Help Contact Us Feedback Web Information Manager CISO Recruitment \t\t\t\t\t \t\t\t\t\t Content Owned and Maintained by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India Website is Designed, Developed and Hosted by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India\t\t\t\t\t \t\t\t\t\t Last Updated: May 27, 2024 error: Content is protected !!"
7,0.577655,"Page Last Updated Date :March 19th, 2024 Footer About this website Terms of use Website policies Sitemap Help Contact Us Feedback Web Information Manager CISO Recruitment \t\t\t\t\t \t\t\t\t\t Content Owned and Maintained by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India Website is Designed, Developed and Hosted by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India\t\t\t\t\t \t\t\t\t\t Last Updated: May 27, 2024 error: Content is protected !!"
8,0.576371,Name: Dr. Nitin Vishnu Choudhari Designation: Scientist-E Email: nv[dot]choudhari[at]nic[dot]in Phone: 07152-245087 IP Number: 38366
9,0.575196,"Shri Surendra Purushottam Patil (DIO) Scientist-C surendra[dot]patil[at]nic[dot]in IP Number : 38302 Nasik Shri S.V.Ganjewar (DIO) Scientist-F 0253-23311460 sv[dot]ganjewar[at]nic[dot]in IP Number : 38306 NIC-DISTRICT CENTER, COLLECTOR OFFICE, SECOND FLOOR, , COLLECTOR OFFICE , NEAR CBS, OLD AGRA ROAD, 422002 Shri Raghuveer Singh Scientific/Technical Assistant-A raghuveer[dot]s02[at]nic[dot]in IP Number : 38307 1, , , , COLLECTOR OFFICE, NASHIK, 422001 Osmanabad Shri Purushottam Nandkishor Rukme (DIO) Scientist-C 02472-220233 pn[dot]rukme[at]nic[dot]in IP Number : 38312 1ST FLOOR, NIC DEPARTMENT, COLLECTOR OFFICE, DHARASHIV, 413501 Palghar Shri Khadayate Yogesh Arvind (DIO) Scientist-F 02525-299117 y[dot]khadayate[at]nic[dot]in IP Number : 38316 217, , SECOND FLOOR, COLLECTOR OFFICE, COLLECTOR AND DM BUILDING, KOLGAON, PALGHAR, 401404 Parbhani Shri Sunil Digambarro Rao Potekar (DIO) Scientist-F 02452-223528 sd[dot]potekar[at]nic[dot]in IP Number : 38321 I, I, COLLECTOR OFFICE, PARBHANI, 431401 Pune District Ms. Ashwini B.Karmarkar (DIO) Scientist-E 020-26129948 ab[dot]karmarkar[at]nic[dot]in IP Number : 38327 NIC, -, FIFTH FLOOR, FIFTH FLOOR, COLLECTOR OFFICE BUILDING, PUNE, 411001 Raigad Shri Nilesh Nivrutti Landge Scientific/Technical Assistant-B nilesh[dot]landge[at]nic[dot]in IP Number : 38332 Ratnagiri Shri Peerjade Mahamadshahid Vajidahamad (DIO) Scientist-C 02352-223757 mv[dot]peerjade[at]nic[dot]in IP Number : 38337 COLLECTOR OFFICE, RATNAGIRI, 415612 Sangli Shri Patel Yasin Usmansab (DIO) Scientist-D 0233-2600600 sayyed[dot]yasin[at]nic[dot]in IP Number : 38341 NATIONAL INFORMATICS CENTRE, , GROUND FLOOR, VIJAYNAGAR, COLLECTOR OFFICE, SANGLI, 416415 Satara Shri S.K. Kulkarni (DIO) Scientist-E sanjeev[dot]k[at]nic[dot]in IP Number : 38346 1, , THIRD, GANESHKHIND ROAD, NATIONAL INFORMATICS CENTRE, SATARA, 415001 Sindhu Durg Shri Antony Thomas A. 
(DIO) Scientist-D 02362-228822 aa[dot]thomas[at]nic[dot]in IP Number : 38351 219, , FIRST, , COLLECTORATE BUILDING OROS, SINDHU DURG, 416812 Solapur Shri Utkarsh Madhukar Honkalse (DIO) Scientist-E 0217-2722782 um[dot]honkalse[at]nic[dot]in IP Number : 38357 NIC, , SECOND, COLLECTOR OFFICE, FIRST FLOOR MAHSOOL BHAVAN DIST COLLECTOR OFFICE, SAAT RASTA SOLAPUR, 413004 Thane"


Running Evaluation for Reranker: cross-encoder
Visualize Retrieved Nodes for Reranker: cross-encoder


Unnamed: 0,Score,Text
0,-2.399898,"NICNET | National Informatics Centre Change Text Size A + Increase font size\t\t\t\t\t\t A Reset font size\t\t\t\t\t\t A - Decrease font size\t\t\t\t\t\t Change Color Contrast High Contrast\t\t\t\t\t\t Normal Contrast\t\t\t\t\t\t Skip to main content \t\t\t\t\t\tSkip to main content\t\t\t\t\t English English हिन्दी मराठी ਪੰਜਾਬੀ ગુજરાતી অসমীয়া বাংলা తెలుగు தமிழ் മലയാളം oriya Kannada Screen Reader Access search Search About Us Mandate Organization Chart Who’s who Directory Search Timeline Research & Publications RTI Tenders Recruitment Alumni NIC Offices Headquarters Data Centres Focus Centres Centres of Excellence State Centres District Centres Services Products & Platforms From Centre From State eBrochure Emerging Technologies Media eBook Blogs Awards Infographics Infographics SSO Testimonials Informatics Newsletter NICNET Home » » NICNET\t\t NICNET \tNational Informatics Centre (NIC) through its Information and Communication Technology (ICT) Network – NICNET, has institutional linkages across all the Ministries /Departments of the Central Government, State Governments, Union Territories, and District administrations of the country. \tThrough NICNET, NIC has been instrumental in steering e-Governance applications in Government Ministries/ Departments at the Centre, States, Districts and Block level, facilitating improvement in Government services, wider transparency, promoting decentralized planning and management, resulting in better efficiency and accountability to the people of India. \tDirect peering of NICNET with BSNL, PGCIL and Railtel are completed at Delhi and Hyderabad for saving Internet Bandwidth and faster access of each other’s Network and Data Centre. Peering with Google, Microsoft and Akamai Content Delivery Network has facilitated faster access to Google services and other important International web sites. 
\tRe-structuring of Videoconferencing network has enabled to minimize delay and handle large scale important video conferencing such as PRAGATI of Hon’ble Prime Minister, GST Council Meetings by Hon’ble Finance Minister among others. Page Last Updated Date :July 17th, 2023 Footer About this website Terms of use Website policies Sitemap Help Contact Us Feedback Web Information Manager CISO Recruitment \t\t\t\t\t \t\t\t\t\t Content Owned and Maintained by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India Website is Designed, Developed and Hosted by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India\t\t\t\t\t \t\t\t\t\t Last Updated: May 27, 2024 error: Content is protected !!"
1,-3.689135,"Messaging | National Informatics Centre Change Text Size A + Increase font size\t\t\t\t\t\t A Reset font size\t\t\t\t\t\t A - Decrease font size\t\t\t\t\t\t Change Color Contrast High Contrast\t\t\t\t\t\t Normal Contrast\t\t\t\t\t\t Skip to main content \t\t\t\t\t\tSkip to main content\t\t\t\t\t English English हिन्दी मराठी ਪੰਜਾਬੀ ગુજરાતી অসমীয়া বাংলা తెలుగు தமிழ் മലയാളം oriya Kannada Screen Reader Access search Search About Us Mandate Organization Chart Who’s who Directory Search Timeline Research & Publications RTI Tenders Recruitment Alumni NIC Offices Headquarters Data Centres Focus Centres Centres of Excellence State Centres District Centres Services Products & Platforms From Centre From State eBrochure Emerging Technologies Media eBook Blogs Awards Infographics Infographics SSO Testimonials Informatics Newsletter Messaging Home » » Messaging\t\t Messaging \tElectronic Mail (better known as E-Mail), is the most used Network Service across the country. National Informatics Centre (NIC) provides different kinds of E-mail services to its users, over NICNET, which is NIC’s satellite-based communication network. The different types of e-mail services being provided include SMTP, UUCP and X.400. The NICNET e-mail service is distributed over many mail servers located at different NIC Centres. These are inter-linked with each other in a way that mails can be exchanged amongst all types of services. \tE-mail can be used as part of the electronic file processing in Government of India. All services under e-mail are offered free of cost to all officials under Ministries / Departments / Statutory Bodies / Autonomous bodies of both Central and State/ UT Governments. \tThe E-mail service is also integrated with a X.500 directory which makes it possible to search for and locate e-mail addresses very easily. It is also possible to send fax messages through E-mail since a gateway is provided for conversion of messages for fax recipients. 
\tMails are accepted and sent in NICNET from a single entry point i.e. via the SMTP gateways. Over 8 lakh mails are transacted in a day. Once a mail is accepted in the network, based on its address, it is routed to the recipient server. \tMessaging services constitute one of the primary applications deployed across the network. Each network connected to the Internet has a Domain Name associated with it, to ensure email and other traffic getting directed to the right recipient. \tIn the case of NICNET, this domain is known as “nic.in”. All emails to the home user are directed to “home.user[at]nic[dot]in” which results in the mail being stored on the NIC mail server, ready to be collected by the home user email client. “Gov.in” domain accounts are also maintained by NIC for use by Government departments. \tUnder the Digital India initiative, NIC has established a robust Messaging framework that includes core eMial application Gateway services, Short Messaging Service (SMS), OBD (Outbound Dialing) and an IT Platform for citizen engagement (Sampark). These services collectively create a value chain over the existing NIC web portals that host Digital Services and are used extensively by Central & State Governments for citizen engagement & disseminating information. Page Last Updated Date :April 28th, 2023 Footer About this website Terms of use Website policies Sitemap Help Contact Us Feedback Web Information Manager CISO Recruitment \t\t\t\t\t \t\t\t\t\t Content Owned and Maintained by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India Website is Designed, Developed and Hosted by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India\t\t\t\t\t \t\t\t\t\t Last Updated: May 27, 2024 error: Content is protected !!"
2,-4.641734,"NKN has also established a High Capacity SCPC VSAT Connectivity at Kavarati, Lakshadweep and Port Blair, Andaman & Nicobar Island. Data Centres NIC provides Data Centers Services from National Data Centres at Delhi, Hyderabad, Pune and Bhubaneswar. National Data Centre (NDC) at Bhubaneswar is a Cloud-enabled data centre which has been offering cloud services to Govt. Departments since its inauguration during 2018. The cloud services are being offered on various flavors of cloud platforms backed by state-of-the-art infrastructure to support the DC operations. NDC Bhubaneswar is also offering co-location services by Govt. Organizations. National Data Centre at Delhi was upgraded with high speed Network backbone, 1.6 Petabyte Enterprise class storage, high throughput Network Load Balancers, and Intrusion Prevention Systems. Solution for Backup as a Service & Storage as a Service has been implemented. ICT infrastructure of number of national level projects was hosted/enhanced; these include E-office, e-Courts and e-Transport. Data centres at Pune & Hyderabad are also upgraded with high speed network backbone and storage capacity enhanced. National Data Centre at Hyderabad is being renovated with a capacity of106 Racks. National Cloud Infrastructure NIC launched National Cloud Services in year 2014 under MeghRaj Government of India Cloud Initiative. NIC Cloud Services are being provided from multiple locations of National Data Centres at Bhubaneswar, Delhi, Hyderabad, and Pune. Various new services are now offered on Cloud including Application Programme Monitoring (APM) Service, Data Analytics (DA) Service, Resource Monitoring (RM) Service and Container Service. In order to cater to the projects envisioned under Digital India Programme and growing requirements of existing Projects, over 18,000 Virtual Servers were provisioned and allocated to over 1100 Ministries/Departments for e-Governance Projects. 
Establishment of Mini-Clouds in States: NIC already had established Mini Clouds in four state units and are operational. During this year Mini Cloud setups have been made operational in ten state units of Assam, Bihar, Chandigarh, Chhattisgarh, Haryana, Karnataka, Kerala, Punjab, Rajasthan and Tripura. Command and Control Centre NIC has been offering services to the government through its 4 National Data Centres and 30 Mini Data Centres across the country. CCC has been set up keeping in view the requirement of a centralized facility to seamlessly monitor the availability of all these Centres and Cloud Services.Over 10,000 e-Governance applications are being hosted by these Centre’s. CCC is providing users with a customized dashboard of Network Management System (NMS) to enable them to monitor their respective applications. NIC is also providing Application Performance Management (APM) through CCC to improve the availability, performance and functioning of critical applications. Cyber Security Network Security The Network Security Division is in relentless pursuit of achieving CIA (Confidentiality, Integrity, and Availability) of ICT assets in NICNET through deployment of expert manpower, appropriate tools, and state-of-the-art technologies. The Network Security Division (NSD) of NIC is engaged in assessment, planning, deployment and management of security devices and solutions across the NICNET in general and the Data Centres in particular. The security span of NSD comprises of all National and State Data Centres, over 1000 LANs of Govt. offices and MPLS networks, more than 2 Lakh endpoints and a series of networking devices deployed across the country. A dedicated team actively monitors real time attacks on 24×7 basis. The Network Security Division (NSD) conducts Security Audit of Data Centres and Bhawan Networks on regular basis and on demand. Besides, review of the network audit performed by third party vendors in NICNET was also undertaken. 
Cyber Security Policies, Guidelines, Advisories and Standard Operating Procedure(s) are also being regularly prepared, updated and circulated to the NICNET users. Network Security Division is involved in vulnerability assessment of ICT assets in Physical, Virtual, and Cloud environments at regular intervals and on demand.The Network Security Division manages the 24×7 Security Monitoring Centre to ensure real time monitoring, detection, prevention, analysis and reporting of Cyber threats and attacks. Application Security NIC is formulating and updating the Security Policies for NICNET as and when required. Security Audit of Web Applications / Websites, Penetration Testing and Vulnerability Analysis,SSL compliance testing, Version Detection for application hosting environment with infrastructure compliance checks are also done as per user requirement."
3,-6.632578,"Page Last Updated Date :October 30th, 2023 Footer About this website Terms of use Website policies Sitemap Help Contact Us Feedback Web Information Manager CISO Recruitment \t\t\t\t\t \t\t\t\t\t Content Owned and Maintained by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India Website is Designed, Developed and Hosted by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India\t\t\t\t\t \t\t\t\t\t Last Updated: May 27, 2024 error: Content is protected !!"
4,-6.703731,"Page Last Updated Date :March 22nd, 2024 Footer About this website Terms of use Website policies Sitemap Help Contact Us Feedback Web Information Manager CISO Recruitment \t\t\t\t\t \t\t\t\t\t Content Owned and Maintained by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India Website is Designed, Developed and Hosted by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India\t\t\t\t\t \t\t\t\t\t Last Updated: May 27, 2024 error: Content is protected !!"
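
Ranks 3 and 4 in the cross-encoder table above contain nothing but the site footer, so they occupy retrieval slots without carrying any answerable content. One way to address this is to flag such chunks before they are indexed. The sketch below is a minimal illustration that assumes each chunk is available as a plain string. The names `FOOTER_START`, `strip_footer`, and `is_footer_only`, and the 40-character threshold, are hypothetical and not part of this notebook or of LlamaIndex.

```python
import re

# Hypothetical marker: the phrase that opens the site footer in the scraped pages.
FOOTER_START = "Footer About this website"

# Matches the "Page Last Updated Date :March 22nd, 2024" stamp that precedes the footer.
LAST_UPDATED = re.compile(r"Page Last Updated Date\s*:\s*\w+ \d+\w*, \d{4}")


def strip_footer(text: str) -> str:
    """Return the text with everything from the footer marker onward removed."""
    idx = text.find(FOOTER_START)
    return text if idx == -1 else text[:idx].rstrip()


def is_footer_only(text: str, min_chars: int = 40) -> bool:
    """True when almost nothing remains after removing the footer and date stamp.

    min_chars is an arbitrary threshold chosen for illustration.
    """
    remainder = LAST_UPDATED.sub("", strip_footer(text)).strip()
    return len(remainder) < min_chars


footer_chunk = (
    "Page Last Updated Date :March 22nd, 2024 Footer About this website "
    "Terms of use Website policies Sitemap Help Contact Us Feedback"
)
directory_chunk = (
    "Name: Shri S.V.Ganjewar Designation: Scientist-F "
    "Email: sv[dot]ganjewar[at]nic[dot]in Phone: 0253-23311460 IP Number: 38306"
)

for chunk in (footer_chunk, directory_chunk):
    print(is_footer_only(chunk), "|", chunk[:50])
```

Chunks flagged this way could be dropped before building the index, which would leave more of the top-k slots for content-bearing nodes. Whether that improves the evaluation metrics in this notebook has not been measured here.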


Running Evaluation for Reranker: bge-reranker-large
Visualize Retrieved Nodes for Reranker: bge-reranker-large


Unnamed: 0,Score,Text
0,0.294791,"Messaging | National Informatics Centre Change Text Size A + Increase font size\t\t\t\t\t\t A Reset font size\t\t\t\t\t\t A - Decrease font size\t\t\t\t\t\t Change Color Contrast High Contrast\t\t\t\t\t\t Normal Contrast\t\t\t\t\t\t Skip to main content \t\t\t\t\t\tSkip to main content\t\t\t\t\t English English हिन्दी मराठी ਪੰਜਾਬੀ ગુજરાતી অসমীয়া বাংলা తెలుగు தமிழ் മലയാളം oriya Kannada Screen Reader Access search Search About Us Mandate Organization Chart Who’s who Directory Search Timeline Research & Publications RTI Tenders Recruitment Alumni NIC Offices Headquarters Data Centres Focus Centres Centres of Excellence State Centres District Centres Services Products & Platforms From Centre From State eBrochure Emerging Technologies Media eBook Blogs Awards Infographics Infographics SSO Testimonials Informatics Newsletter Messaging Home » » Messaging\t\t Messaging \tElectronic Mail (better known as E-Mail), is the most used Network Service across the country. National Informatics Centre (NIC) provides different kinds of E-mail services to its users, over NICNET, which is NIC’s satellite-based communication network. The different types of e-mail services being provided include SMTP, UUCP and X.400. The NICNET e-mail service is distributed over many mail servers located at different NIC Centres. These are inter-linked with each other in a way that mails can be exchanged amongst all types of services. \tE-mail can be used as part of the electronic file processing in Government of India. All services under e-mail are offered free of cost to all officials under Ministries / Departments / Statutory Bodies / Autonomous bodies of both Central and State/ UT Governments. \tThe E-mail service is also integrated with a X.500 directory which makes it possible to search for and locate e-mail addresses very easily. It is also possible to send fax messages through E-mail since a gateway is provided for conversion of messages for fax recipients. 
\tMails are accepted and sent in NICNET from a single entry point i.e. via the SMTP gateways. Over 8 lakh mails are transacted in a day. Once a mail is accepted in the network, based on its address, it is routed to the recipient server. \tMessaging services constitute one of the primary applications deployed across the network. Each network connected to the Internet has a Domain Name associated with it, to ensure email and other traffic getting directed to the right recipient. \tIn the case of NICNET, this domain is known as “nic.in”. All emails to the home user are directed to “home.user[at]nic[dot]in” which results in the mail being stored on the NIC mail server, ready to be collected by the home user email client. “Gov.in” domain accounts are also maintained by NIC for use by Government departments. \tUnder the Digital India initiative, NIC has established a robust Messaging framework that includes core eMial application Gateway services, Short Messaging Service (SMS), OBD (Outbound Dialing) and an IT Platform for citizen engagement (Sampark). These services collectively create a value chain over the existing NIC web portals that host Digital Services and are used extensively by Central & State Governments for citizen engagement & disseminating information. Page Last Updated Date :April 28th, 2023 Footer About this website Terms of use Website policies Sitemap Help Contact Us Feedback Web Information Manager CISO Recruitment \t\t\t\t\t \t\t\t\t\t Content Owned and Maintained by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India Website is Designed, Developed and Hosted by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India\t\t\t\t\t \t\t\t\t\t Last Updated: May 27, 2024 error: Content is protected !!"
1,0.271061,"NICNET | National Informatics Centre Change Text Size A + Increase font size\t\t\t\t\t\t A Reset font size\t\t\t\t\t\t A - Decrease font size\t\t\t\t\t\t Change Color Contrast High Contrast\t\t\t\t\t\t Normal Contrast\t\t\t\t\t\t Skip to main content \t\t\t\t\t\tSkip to main content\t\t\t\t\t English English हिन्दी मराठी ਪੰਜਾਬੀ ગુજરાતી অসমীয়া বাংলা తెలుగు தமிழ் മലയാളം oriya Kannada Screen Reader Access search Search About Us Mandate Organization Chart Who’s who Directory Search Timeline Research & Publications RTI Tenders Recruitment Alumni NIC Offices Headquarters Data Centres Focus Centres Centres of Excellence State Centres District Centres Services Products & Platforms From Centre From State eBrochure Emerging Technologies Media eBook Blogs Awards Infographics Infographics SSO Testimonials Informatics Newsletter NICNET Home » » NICNET\t\t NICNET \tNational Informatics Centre (NIC) through its Information and Communication Technology (ICT) Network – NICNET, has institutional linkages across all the Ministries /Departments of the Central Government, State Governments, Union Territories, and District administrations of the country. \tThrough NICNET, NIC has been instrumental in steering e-Governance applications in Government Ministries/ Departments at the Centre, States, Districts and Block level, facilitating improvement in Government services, wider transparency, promoting decentralized planning and management, resulting in better efficiency and accountability to the people of India. \tDirect peering of NICNET with BSNL, PGCIL and Railtel are completed at Delhi and Hyderabad for saving Internet Bandwidth and faster access of each other’s Network and Data Centre. Peering with Google, Microsoft and Akamai Content Delivery Network has facilitated faster access to Google services and other important International web sites. 
\tRe-structuring of Videoconferencing network has enabled to minimize delay and handle large scale important video conferencing such as PRAGATI of Hon’ble Prime Minister, GST Council Meetings by Hon’ble Finance Minister among others. Page Last Updated Date :July 17th, 2023 Footer About this website Terms of use Website policies Sitemap Help Contact Us Feedback Web Information Manager CISO Recruitment \t\t\t\t\t \t\t\t\t\t Content Owned and Maintained by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India Website is Designed, Developed and Hosted by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India\t\t\t\t\t \t\t\t\t\t Last Updated: May 27, 2024 error: Content is protected !!"
2,0.115074,"NKN has also established a High Capacity SCPC VSAT Connectivity at Kavarati, Lakshadweep and Port Blair, Andaman & Nicobar Island. Data Centres NIC provides Data Centers Services from National Data Centres at Delhi, Hyderabad, Pune and Bhubaneswar. National Data Centre (NDC) at Bhubaneswar is a Cloud-enabled data centre which has been offering cloud services to Govt. Departments since its inauguration during 2018. The cloud services are being offered on various flavors of cloud platforms backed by state-of-the-art infrastructure to support the DC operations. NDC Bhubaneswar is also offering co-location services by Govt. Organizations. National Data Centre at Delhi was upgraded with high speed Network backbone, 1.6 Petabyte Enterprise class storage, high throughput Network Load Balancers, and Intrusion Prevention Systems. Solution for Backup as a Service & Storage as a Service has been implemented. ICT infrastructure of number of national level projects was hosted/enhanced; these include E-office, e-Courts and e-Transport. Data centres at Pune & Hyderabad are also upgraded with high speed network backbone and storage capacity enhanced. National Data Centre at Hyderabad is being renovated with a capacity of106 Racks. National Cloud Infrastructure NIC launched National Cloud Services in year 2014 under MeghRaj Government of India Cloud Initiative. NIC Cloud Services are being provided from multiple locations of National Data Centres at Bhubaneswar, Delhi, Hyderabad, and Pune. Various new services are now offered on Cloud including Application Programme Monitoring (APM) Service, Data Analytics (DA) Service, Resource Monitoring (RM) Service and Container Service. In order to cater to the projects envisioned under Digital India Programme and growing requirements of existing Projects, over 18,000 Virtual Servers were provisioned and allocated to over 1100 Ministries/Departments for e-Governance Projects. 
Establishment of Mini-Clouds in States: NIC already had established Mini Clouds in four state units and are operational. During this year Mini Cloud setups have been made operational in ten state units of Assam, Bihar, Chandigarh, Chhattisgarh, Haryana, Karnataka, Kerala, Punjab, Rajasthan and Tripura. Command and Control Centre NIC has been offering services to the government through its 4 National Data Centres and 30 Mini Data Centres across the country. CCC has been set up keeping in view the requirement of a centralized facility to seamlessly monitor the availability of all these Centres and Cloud Services.Over 10,000 e-Governance applications are being hosted by these Centre’s. CCC is providing users with a customized dashboard of Network Management System (NMS) to enable them to monitor their respective applications. NIC is also providing Application Performance Management (APM) through CCC to improve the availability, performance and functioning of critical applications. Cyber Security Network Security The Network Security Division is in relentless pursuit of achieving CIA (Confidentiality, Integrity, and Availability) of ICT assets in NICNET through deployment of expert manpower, appropriate tools, and state-of-the-art technologies. The Network Security Division (NSD) of NIC is engaged in assessment, planning, deployment and management of security devices and solutions across the NICNET in general and the Data Centres in particular. The security span of NSD comprises of all National and State Data Centres, over 1000 LANs of Govt. offices and MPLS networks, more than 2 Lakh endpoints and a series of networking devices deployed across the country. A dedicated team actively monitors real time attacks on 24×7 basis. The Network Security Division (NSD) conducts Security Audit of Data Centres and Bhawan Networks on regular basis and on demand. Besides, review of the network audit performed by third party vendors in NICNET was also undertaken. 
Cyber Security Policies, Guidelines, Advisories and Standard Operating Procedure(s) are also being regularly prepared, updated and circulated to the NICNET users. Network Security Division is involved in vulnerability assessment of ICT assets in Physical, Virtual, and Cloud environments at regular intervals and on demand.The Network Security Division manages the 24×7 Security Monitoring Centre to ensure real time monitoring, detection, prevention, analysis and reporting of Cyber threats and attacks. Application Security NIC is formulating and updating the Security Policies for NICNET as and when required. Security Audit of Web Applications / Websites, Penetration Testing and Vulnerability Analysis,SSL compliance testing, Version Detection for application hosting environment with infrastructure compliance checks are also done as per user requirement."
3,0.001937,"Page Last Updated Date :March 19th, 2024 Footer About this website Terms of use Website policies Sitemap Help Contact Us Feedback Web Information Manager CISO Recruitment \t\t\t\t\t \t\t\t\t\t Content Owned and Maintained by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India Website is Designed, Developed and Hosted by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India\t\t\t\t\t \t\t\t\t\t Last Updated: May 27, 2024 error: Content is protected !!"
4,0.001858,"Page Last Updated Date :March 22nd, 2024 Footer About this website Terms of use Website policies Sitemap Help Contact Us Feedback Web Information Manager CISO Recruitment \t\t\t\t\t \t\t\t\t\t Content Owned and Maintained by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India Website is Designed, Developed and Hosted by National Informatics Centre, Ministry of Electronics & IT (MeitY) | Government of India\t\t\t\t\t \t\t\t\t\t Last Updated: May 27, 2024 error: Content is protected !!"


# Testing Query_engine with Rerank

In [51]:
file_to_url_mapping = {
    'news_updates.docx': 'https://maharashtra.nic.in/news-update/',
    'Nic.docx': 'https://maharashtra.nic.in/',
    'districts.docx': 'https://maharashtra.nic.in/district-centres/',
    'events.docx': 'https://maharashtra.nic.in/events/',
    'services.docx': 'https://maharashtra.nic.in/services/',
    'photo_gallery.docx': 'https://maharashtra.nic.in/photo-gallery/',
    'Screen Reader Information.csv': 'https://maharashtra.nic.in/help/',
    'Plug-in for alternate document types.csv': 'https://maharashtra.nic.in/help/',
    'SIO (State Informatics Officer).csv': 'https://maharashtra.nic.in/directory/',
    'Maharashtra State Centre, Mumbai.csv': 'https://maharashtra.nic.in/directory/',
    'Ahmadnagar.csv': 'https://maharashtra.nic.in/directory/',
    'Akola.csv': 'https://maharashtra.nic.in/directory/',
    'Amravati.csv': 'https://maharashtra.nic.in/directory/',
    'Aurangabad.csv': 'https://maharashtra.nic.in/directory/',
    'Beed.csv': 'https://maharashtra.nic.in/directory/',
    'Bhandara.csv': 'https://maharashtra.nic.in/directory/',
    'Buldana.csv': 'https://maharashtra.nic.in/directory/',
    'Chandrapur.csv': 'https://maharashtra.nic.in/directory/',
    'Dhule.csv': 'https://maharashtra.nic.in/directory/',
    'District Centres (final test).csv': 'https://maharashtra.nic.in/district-centres/',
    'Gadchiroli.csv': 'https://maharashtra.nic.in/directory/',
    'Gondia.csv': 'https://maharashtra.nic.in/directory/',
    'Hingoli.csv': 'https://maharashtra.nic.in/directory/',
    'Jalgaon.csv': 'https://maharashtra.nic.in/directory/',
    'Jalna.csv': 'https://maharashtra.nic.in/directory/',
    'Kolhapur.csv': 'https://maharashtra.nic.in/directory/',
    'Mumbai.csv': 'https://maharashtra.nic.in/directory/',
    'Mumbai Suburban (Bandra).csv': 'https://maharashtra.nic.in/directory/',
    'Nagpur.csv': 'https://maharashtra.nic.in/directory/',
    'Nanded.csv': 'https://maharashtra.nic.in/directory/',
    'Nandurbar.csv': 'https://maharashtra.nic.in/directory/',
    'Nasik.csv': 'https://maharashtra.nic.in/directory/',
    'Osmanabad.csv': 'https://maharashtra.nic.in/directory/',
    'Palghar.csv': 'https://maharashtra.nic.in/directory/',
    'Parbhani.csv': 'https://maharashtra.nic.in/directory/',
    'Pune District.csv': 'https://maharashtra.nic.in/directory/',
    'Reporting_officer.csv': 'https://maharashtra.nic.in/organization-structure/',
    'Raigad.csv': 'https://maharashtra.nic.in/directory/',
    'Ratnagiri.csv': 'https://maharashtra.nic.in/directory/',
    'Sangli.csv': 'https://maharashtra.nic.in/directory/',
    'Satara.csv': 'https://maharashtra.nic.in/directory/',
    'Sindhu Durg.csv': 'https://maharashtra.nic.in/directory/',
    'Solapur.csv': 'https://maharashtra.nic.in/directory/',
    'Thane.csv': 'https://maharashtra.nic.in/directory/',
    'Wardha.csv': 'https://maharashtra.nic.in/directory/',
    'Washim.csv': 'https://maharashtra.nic.in/directory/',
    'Yavatmal.csv': 'https://maharashtra.nic.in/directory/',
    'APPELLATE AUTHORITY.csv': 'https://maharashtra.nic.in/rti/',
    'PUBLIC INFORMATION OFFICERS ( PIO ).csv': 'https://maharashtra.nic.in/rti/',
}

# Post Processors


In [28]:
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.core.postprocessor import (
    EmbeddingRecencyPostprocessor,
    FixedRecencyPostprocessor,
    LongContextReorder,
    SentenceEmbeddingOptimizer,
)

# Load the previously persisted vector index from disk.
persist_dir = "/content/webpages_doc_index"
storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
index = load_index_from_storage(storage_context)

print(index)

# Post-processor that reorders the retrieved nodes before they reach the LLM.
reorder = LongContextReorder()
# node_postprocessor_emb = EmbeddingRecencyPostprocessor()

# Chat engine with reordering applied to the top-10 retrieved nodes.
reorder_engine = index.as_chat_engine(
    node_postprocessors=[reorder], similarity_top_k=10
)

# reorder_engine = index.as_chat_engine(
#     node_postprocessors=[SentenceEmbeddingOptimizer(percentile_cutoff=0.7)]
# )

# Baseline chat engine without any post-processing, for comparison.
base_engine = index.as_chat_engine(similarity_top_k=10)

<llama_index.core.indices.vector_store.base.VectorStoreIndex object at 0x7d4268907970>
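The reordering idea behind a long-context post-processor can be illustrated without LlamaIndex. The sketch below is a standard-library approximation, not a copy of the library's implementation: it assumes the goal is to place the strongest hits at both ends of the context and the weakest in the middle. The function name `reorder_for_long_context` and the sample `hits` are hypothetical.

```python
from typing import List, Tuple


def reorder_for_long_context(scored: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Place the highest-scoring items at the two ends and the weakest in the middle."""
    reordered: List[Tuple[str, float]] = []
    # Walk from the lowest score to the highest, alternating front and back,
    # so each stronger item ends up further from the centre.
    for i, item in enumerate(sorted(scored, key=lambda pair: pair[1])):
        if i % 2 == 0:
            reordered.insert(0, item)
        else:
            reordered.append(item)
    return reordered


# Hypothetical retrieval hits as (name, score) pairs.
hits = [("a", 0.9), ("b", 0.7), ("c", 0.5), ("d", 0.3), ("e", 0.1)]
order = [name for name, _ in reorder_for_long_context(hits)]
print(order)  # ['a', 'c', 'e', 'd', 'b']
```

The set of retrieved items is unchanged and only their order differs, which matches the source listings of the base and reorder engines further down: the same six document ids appear in both, in a different sequence.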


In [None]:
from llama_index.core.response.notebook_utils import display_response

base_response = base_engine.chat("What is published date of the Training Program of IVFRT-MMP at FRRO?")
display_response(base_response)

**`Final Response:`** The published date of the Training Program of IVFRT-MMP at FRRO is August 20, 2013.

In [None]:
base_response

AgentChatResponse(response='The published date of the Training Program of IVFRT-MMP at FRRO is August 20, 2013.', sources=[ToolOutput(content='According to the provided context information, the published date of the "Training Program of IVFRT-MMP at FRRO" is August 20, 2013.', tool_name='query_engine_tool', raw_input={'input': 'published date of the Training Program of IVFRT-MMP at FRRO'}, raw_output=Response(response='According to the provided context information, the published date of the "Training Program of IVFRT-MMP at FRRO" is August 20, 2013.', source_nodes=[NodeWithScore(node=TextNode(id_='90290cbd-19a7-4594-9076-f07b68779323', embedding=None, metadata={'file_name': 'news_updates.docx', 'file_path': '/content/docs/news_updates.docx', 'file_type': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'file_size': 15496, 'creation_date': '2024-06-06', 'last_modified_date': '2024-06-06'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'cr

In [None]:
node = base_response.source_nodes[0]
response_json = {}
response_json['response'] = base_response.response
response_json['Search_Score'] = node.score

try:
    response_json['url'] = str(node.metadata['URL'])
except KeyError:
    # No 'URL' key in the node metadata: fall back to the file-name mapping.
    response_json['url'] = str(file_to_url_mapping[node.metadata['file_name']])

print(response_json)

{'response': 'The published date of the Training Program of IVFRT-MMP at FRRO is August 20, 2013.', 'Search_Score': 0.4379604420790352, 'url': 'https://maharashtra.nic.in/news-update/'}
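The same response-to-dictionary snippet recurs in many cells of this notebook. A minimal sketch of a reusable helper follows; `build_response_json` is a hypothetical name, and the code only assumes that the response object exposes `.response` and `.source_nodes`, and that each node exposes `.score` and `.metadata`, as in the cell above. The `SimpleNamespace` object is a stand-in for a real chat response.

```python
from types import SimpleNamespace
from typing import Any, Dict, Mapping


def build_response_json(response: Any, url_by_file: Mapping[str, str]) -> Dict[str, Any]:
    """Collect the answer text, the top node's score and a source URL."""
    result: Dict[str, Any] = {
        "response": response.response,
        "Search_Score": None,
        "url": None,
    }
    if not response.source_nodes:
        # Nothing was retrieved, so there is no score or URL to report.
        return result
    node = response.source_nodes[0]
    result["Search_Score"] = node.score
    metadata = node.metadata
    if "URL" in metadata:
        result["url"] = str(metadata["URL"])
    else:
        # Fall back to the file-name mapping; unknown files yield None.
        result["url"] = url_by_file.get(metadata.get("file_name"))
    return result


# Stand-in objects that mimic the attributes used above.
fake = SimpleNamespace(
    response="August 20, 2013",
    source_nodes=[SimpleNamespace(score=0.43, metadata={"file_name": "news_updates.docx"})],
)
mapping = {"news_updates.docx": "https://maharashtra.nic.in/news-update/"}
print(build_response_json(fake, mapping))
```

Unlike the inline version, this sketch returns `None` for the URL rather than raising when a file name is missing from the mapping, and it tolerates an empty `source_nodes` list.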


In [None]:
print(base_response.get_formatted_sources())

> Source (Doc id: 1f49a56e-9966-4bb0-a2c7-e5b1c387e46a): The following are the list of photo titles which are available in the photo gallery section of th...

> Source (Doc id: 90290cbd-19a7-4594-9076-f07b68779323): Following are the news updates available in the official website of NIC Maharashtra https://mahar...

> Source (Doc id: dce2aafa-31bf-494f-98e3-454961649537): Below is the list of all the 5 events that took place in NIC as mentioned in the official website...

> Source (Doc id: 2f4b8ee2-5eb0-449c-8204-9746ac19217d): 1. What is NIC? or Introduction of NIC or Profile of NIC

Answer:

National Informatics Centre (N...

> Source (Doc id: f897f45e-34aa-475f-a8f2-3310a9b51765): Here is the list of 35 districts/district centres where National Informatics Centre (NIC) has its...

> Source (Doc id: a659c0ac-15b7-4cee-b8b5-9fc68d21dd65): Overview of Services offered by NIC:

NIC is closely associated with the government in different ...


In [29]:
reorder_response = reorder_engine.chat("What is published date of the eFile implementation at NRHM-Mumbai?")
display_response(reorder_response)

ValueError: Reached max iterations.
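When `chat` raises as it does here, `reorder_response` is never assigned in that run, so the cells that follow read whatever value an earlier run left behind. A small guard makes the failure explicit. This is a sketch only: `safe_chat` and the `FailingEngine` stand-in are hypothetical, and the code assumes nothing beyond an object with a `chat(question)` method.

```python
from typing import Any, Optional


def safe_chat(engine: Any, question: str) -> Optional[Any]:
    """Call engine.chat and return None instead of raising on ValueError."""
    try:
        return engine.chat(question)
    except ValueError as err:
        # Report the failure so a stale response is not mistaken for a new one.
        print(f"chat failed for {question!r}: {err}")
        return None


class FailingEngine:
    """Stand-in that mimics the failure shown in the output above."""

    def chat(self, question: str) -> Any:
        raise ValueError("Reached max iterations.")


result = safe_chat(FailingEngine(), "What is published date of the eFile implementation at NRHM-Mumbai?")
print(result)  # None
```

Checking for `None` before building `response_json` keeps a failed query from being reported with the previous query's answer.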

In [None]:
node = reorder_response.source_nodes[0]
response_json = {}
response_json['response'] = reorder_response.response
response_json['Search_Score'] = node.score

try:
    response_json['url'] = str(node.metadata['URL'])
except KeyError:
    # No 'URL' key in the node metadata: fall back to the file-name mapping.
    response_json['url'] = str(file_to_url_mapping[node.metadata['file_name']])

print(response_json)

{'response': 'The published date of the eFile implementation at NRHM-Mumbai is March 13, 2012.', 'Search_Score': 0.7125760023009837, 'url': 'https://maharashtra.nic.in/events/'}


In [None]:
print(reorder_response.get_formatted_sources())

> Source (Doc id: 90290cbd-19a7-4594-9076-f07b68779323): Following are the news updates available in the official website of NIC Maharashtra https://mahar...

> Source (Doc id: 2f4b8ee2-5eb0-449c-8204-9746ac19217d): 1. What is NIC? or Introduction of NIC or Profile of NIC

Answer:

National Informatics Centre (N...

> Source (Doc id: a659c0ac-15b7-4cee-b8b5-9fc68d21dd65): Overview of Services offered by NIC:

NIC is closely associated with the government in different ...

> Source (Doc id: f897f45e-34aa-475f-a8f2-3310a9b51765): Here is the list of 35 districts/district centres where National Informatics Centre (NIC) has its...

> Source (Doc id: dce2aafa-31bf-494f-98e3-454961649537): Below is the list of all the 5 events that took place in NIC as mentioned in the official website...

> Source (Doc id: 1f49a56e-9966-4bb0-a2c7-e5b1c387e46a): The following are the list of photo titles which are available in the photo gallery section of th...


In [None]:
reorder_response

AgentChatResponse(response='The published date of the Training Program of IVFRT-MMP at FRRO is August 20, 2013.', sources=[ToolOutput(content='According to the provided context information, the published date of the Training Program of IVFRT-MMP at FRRO is August 20, 2013.', tool_name='query_engine_tool', raw_input={'input': 'published date of the Training Program of IVFRT-MMP at FRRO'}, raw_output=Response(response='According to the provided context information, the published date of the Training Program of IVFRT-MMP at FRRO is August 20, 2013.', source_nodes=[NodeWithScore(node=TextNode(id_='1f49a56e-9966-4bb0-a2c7-e5b1c387e46a', embedding=None, metadata={'file_name': 'photo_gallery.docx', 'file_path': '/content/docs/photo_gallery.docx', 'file_type': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'file_size': 14816, 'creation_date': '2024-06-06', 'last_modified_date': '2024-06-06'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'crea

# Print the response

In [52]:
from llama_index.core.storage.chat_store import SimpleChatStore
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core import (
    StorageContext,
    VectorStoreIndex,
    get_response_synthesizer,
    load_index_from_storage,
)
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.postprocessor import LongContextReorder
from llama_index.core.postprocessor import SentenceTransformerRerank


# Load the combined index that was persisted earlier.
persist_dir = "/content/combined_data_index"
storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
index = load_index_from_storage(storage_context)

print(index)

# reorder = LongContextReorder()
# Reranker: rescores the retrieved nodes and keeps the 7 best.
rerank = SentenceTransformerRerank(model="BAAI/bge-reranker-large", top_n=7)

# In-memory chat history, stored under the key "user1".
chat_store = SimpleChatStore()

chat_memory = ChatMemoryBuffer.from_defaults(
    token_limit=30000000,
    chat_store=chat_store,
    chat_store_key="user1",
)

# Context-mode chat engine: retrieve 10 nodes, rerank them, then answer.
chat_engine = index.as_chat_engine(
    chat_mode="context",
    system_prompt="You are a helpful AI assistant. You are expert in retrieving the answer for the user input based on the provided context. Important note: You must not mention any file name or file path in your response.",
    memory=chat_memory,
    similarity_top_k=10,
    node_postprocessors=[rerank],
    verbose=True,
)



<llama_index.core.indices.vector_store.base.VectorStoreIndex object at 0x7f15b03fff10>




In [None]:
response = query_engine.query("What is published date of AGMARKNET Workshops in Maharashtra by NIC Mumbai?")

node = response.source_nodes[0]
response_json = {}
response_json['response'] = response.response
response_json['Search_Score'] = node.score

try:
    response_json['url'] = str(node.metadata['URL'])
except KeyError:
    # No 'URL' key in the node metadata: fall back to the file-name mapping.
    response_json['url'] = str(file_to_url_mapping[node.metadata['file_name']])

print(response_json)

{'response': 'I\'m unable to find any information about a "Training Program of IVFRT-MMP at FRRO" in the provided context. The files and data mentioned appear to be related to personnel information, district centers, and screen readers, but there is no mention of a training program or its published date.', 'Search_Score': 0.4713932929016982, 'url': 'https://maharashtra.nic.in/rti/'}


In [56]:
response = chat_engine.chat("to whom did Ms. Meera Joshi reports?")

node = response.source_nodes[0]
response_json = {}
response_json['response'] = response.response
response_json['Search_Score'] = node.score

try:
    response_json['url'] = str(node.metadata['URL'])
except KeyError:
    # No 'URL' key in the node metadata: fall back to the file-name mapping.
    response_json['url'] = str(file_to_url_mapping[node.metadata['file_name']])

print(response_json)

{'response': 'According to the provided context, Ms. Meera Joshi reports to Shri Manoj Kumar Mishra.', 'Search_Score': 0.91262907, 'url': 'https://maharashtra.nic.in/organization-structure/'}
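Several cells below show `response_json` with the question recorded only in a comment, or not at all. One way to keep each question and its answer together is to run the questions in a loop and collect one row per question. The sketch below assumes only an engine object with a `chat(question)` method whose result has `.response` and `.source_nodes`. `run_queries` and `EchoEngine` are hypothetical names, and the fake engine stands in for the real chat engine.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List, Mapping


def run_queries(engine: Any, questions: Iterable[str], url_by_file: Mapping[str, str]) -> List[Dict[str, Any]]:
    """Ask each question once and record question, answer, top score and URL."""
    rows: List[Dict[str, Any]] = []
    for question in questions:
        response = engine.chat(question)
        nodes = response.source_nodes
        top = nodes[0] if nodes else None
        url = None
        if top is not None:
            # Prefer an explicit URL in the metadata, else map the file name.
            url = top.metadata.get("URL") or url_by_file.get(top.metadata.get("file_name"))
        rows.append({
            "question": question,
            "response": response.response,
            "Search_Score": top.score if top is not None else None,
            "url": url,
        })
    return rows


class EchoEngine:
    """Stand-in engine that returns a fixed source node for every question."""

    def chat(self, question: str) -> Any:
        node = SimpleNamespace(score=0.5, metadata={"file_name": "events.docx"})
        return SimpleNamespace(response=f"answer to: {question}", source_nodes=[node])


rows = run_queries(
    EchoEngine(),
    ["What are the events?", "Who is the SIO?"],
    {"events.docx": "https://maharashtra.nic.in/events/"},
)
for row in rows:
    print(row)
```

Because each row carries its own question, the output stays interpretable even when cells are re-run out of order.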


In [None]:
response_json

{'response': 'The published date of the "Training Program of IVFRT-MMP at FRRO" is August 20, 2013.',
 'Search_Score': 0.00040770933,
 'url': 'https://maharashtra.nic.in/news-update/'}

In [None]:
response_json

{'response': 'The published date of AGMARKNET Workshops in Maharashtra by NIC Mumbai is December 29, 2004.',
 'Search_Score': 9.792949e-05,
 'url': 'https://maharashtra.nic.in/events/'}

In [None]:
response

AgentChatResponse(response='The published date of AGMARKNET Workshops in Maharashtra by NIC Mumbai is December 29, 2004.', sources=[ToolOutput(content='system: You are a helpful AI assistant. You are expert in retrieving the answer for the user input based on the provided context. Important note: You must not mention any file name or file path in your response.\nContext information is below.\n--------------------\nfile_path: /content/docs/events.docx\n\nBelow is the list of all the 5 events that took place in NIC as mentioned in the official website of NIC maharashtra:\n\nCyber Crime and Cyber Laws (Start Date: 28/09/2022, End Date: 28/09/2022, Venue: NIC Maharashtra State Centre, Mumbai) Webinar on “Cyber Crime and Cyber Laws ” at NIC Maharashtra State Centre, Mumbai September 28th, 2022.\n\nSuperannuation of Sh. Dhanjay Kulkarni, DIO Pune and Ms. Sneha Shula, ADIO Nashik in the Month of April 2022, (Start Date: 30/04/2022, End Date : 30/04/2022, Venue : Mumbai)\n\nSIO Maharashtra met

In [None]:
response_json

{'response': 'The published date of SAMPARK is July 31, 2021.',
 'Search_Score': 0.81622535,
 'url': 'https://maharashtra.nic.in/news-update/'}

In [None]:
response_json

{'response': 'The eGov Maharashtra was held at Mumbai on May 9-10, 2013.',
 'Search_Score': 0.007553987,
 'url': 'https://maharashtra.nic.in/news-update/'}

In [None]:
# tell me about organization structure
response_json

{'response': "Based on the provided context, I can infer that the organization is likely a government agency or department in India. Here's an attempt to outline the organization structure:\n\n1. **State Informatics Officer (SIO)**: This appears to be a senior-level position responsible for overseeing informatics activities at the state level.\n2. **Scientist-F**: This designation seems to refer to a technical expert or specialist, possibly with a focus on information technology or computer science.\n3. **Junior Secretariat Assistant**: This is likely an entry-level administrative support role.\n4. **Reporting Officer**: This term suggests that there may be a hierarchical structure within the organization, where certain individuals report to specific officers or supervisors.\n\nThe organization seems to have a mix of technical and administrative roles, with a focus on information technology and computer science. The presence of a State Informatics Officer and Public Information Officer

In [None]:
# Who is the SIo
response_json

{'response': 'According to the provided context, the SIO (State Informatics Officer) is Ms. Sapna Kapoor, who holds the designation of Scientist-F and has an email address of sapna[dot]kapoor[at]nic[dot]in.',
 'Search_Score': 0.046427052,
 'url': 'https://maharashtra.nic.in/directory/'}

In [None]:
# How many districts centres are there?
response_json

{'response': 'There are 35 district centers where National Informatics Centre (NIC) has its presence.',
 'Search_Score': 0.6682483,
 'url': 'https://maharashtra.nic.in/district-centres/'}

In [None]:
# What is the published date of SIMNIC?
response_json

{'response': 'The published date of SIMNIC is August 2, 2021.',
 'Search_Score': 0.96835625,
 'url': 'https://maharashtra.nic.in/news-update/'}

In [None]:
# What are NIC services?
response_json

{'response': 'According to the provided context, NIC (National Informatics Centre) provides various services including:\n\n1. National Cloud\n2. Messaging\n3. Video Conferencing\n4. Domain Registration and Webcast\n5. Multi-Gigabit Nationwide Networks (NICNET)\n6. National Data Centres\n7. National Cloud\n8. Pan-India VC Infrastructure\n9. Command and Control Centre\n10. Multi-Layered GIS-Based Platform\n\nThese services are designed to support e-governance initiatives, provide infrastructure for government departments, and facilitate communication and collaboration among stakeholders.',
 'Search_Score': 0.9801593,
 'url': 'https://maharashtra.nic.in/services/'}

In [None]:
# What is the published date of Training Program of IVFRT-MMP at FRRO?
response_json

{'response': 'I\'m happy to help! However, I don\'t see any information about a news article titled "Training Program of IVFRT-MMP at FRRO, Mumbai" in the provided context. The context appears to be a list of personnel with their names, designations, emails, phones, and IP numbers. There is no mention of a news article or its published date. If you could provide more information or clarify what you\'re looking for, I\'d be happy to try and assist you further!',
 'Search_Score': 9.042643e-05,
 'url': 'https://maharashtra.nic.in/directory/'}

In [None]:
#  What is the published date of Implementation of Aadhaar Enabled Biometric Attendance System (AEBAS) in NIC?
response_json

{'response': 'According to the provided context, the published date of Implementation of Aadhaar Enabled Biometric Attendance System (AEBAS) in NIC is November 21, 2014.',
 'Search_Score': 0.9996412,
 'url': 'https://maharashtra.nic.in/news-update/'}

In [None]:
# What is the published date of Inauguration of Direct Benefit Transfer (AADHAAR) in pilot District Wardha (Maharashtra)?
response_json

{'response': 'According to the provided context, the published date of Inauguration of Direct Benefit Transfer (AADHAAR) in pilot District Wardha (Maharashtra) is January 1, 2013.',
 'Search_Score': 0.0042579635,
 'url': 'https://maharashtra.nic.in/news-update/'}

In [None]:
# When was the Training Program of IVFRT-MMP at FRRO, Mumbai took place?
response_json

{'response': 'According to the provided context information, the published date of the news article titled "Training Program of IVFRT-MMP at FRRO, Mumbai" is August 20, 2013.',
 'Search_Score': 0.0035063014,
 'url': 'https://maharashtra.nic.in/news-update/'}

In [None]:
# what is the published date of SIMNIC?
response_json

{'response': 'August 2, 2021',
 'Search_Score': 0.96835625,
 'url': 'https://maharashtra.nic.in/news-update/'}

In [None]:
# what is the published date of SAMPARK?
response_json

{'response': 'According to the provided context, SAMPARK is a NIC eGov Mobile App that provides information about Government officials (in service) about their present posting details and keeps it updated. The main aim of this application is to provide information to Government officials about their current postings.',
 'Search_Score': 0.57889265,
 'url': 'https://maharashtra.nic.in/news-update/'}

In [None]:
# What is the news update?
response_json

{'response': 'There is no news update. The provided context appears to be a list of screen reader information, including their names, websites, and whether they are free or commercial. There is no indication of any news-related updates.',
 'Search_Score': 7.6261866e-05,
 'url': 'https://maharashtra.nic.in/help/'}

In [None]:
# Who is the SIO?
response_json

{'response': 'Based on the provided context information, there is no mention of a person named "SIO". However, we do have two files with paths `/content/tabular_data_csv/SIO.csv` and `/content/tabular_data_csv/APPELLATE AUTHORITY.csv`. Since the query asks about the SIO, I would say that the SIO refers to the file path `/content/tabular_data_csv/SIO.csv`, but there is no information provided about who this person might be.',
 'Search_Score': 0.0002616312,
 'url': 'https://maharashtra.nic.in/directory/'}

In [None]:
# What are the events?
response_json

{'response': "Based on the provided context, the events mentioned are:\n\n1. Cyber Crime and Cyber Laws (September 28th, 2022)\n2. Superannuation of Sh. Dhanjay Kulkarni and Ms. Sneha Shula (April 30th, 2022)\n3. SIO Maharashtra met new Director General, NIC (June 1st, 2022)\n4. Hon'ble Minister of Electronics and Information Technology and Railways visit to Mumbai (February 18th, 2022)\n5. Swachhatta Pakhwada from 01st to 15th February 2022\n\nThese events are mentioned in the file_path: /content/text/events.txt context.",
 'Search_Score': 0.016033381,
 'url': 'https://maharashtra.nic.in/photo-gallery/'}

In [None]:
# what is the organization structure?
response_json

{'response': 'The organization structure consists of various officials with their respective designations, reporting to other officials. The officials include:\n\n1. Ms. D. Lakshmi Prasanna - Scientist-F\nReporting To: Shri Anand Swarup Srivastava\n2. Shri C.Prasanna - Scientist-E\nReporting To: Ms. D. Lakshmi Prasanna\n3. Shri Syed Ajaz Gulab - Scientist-E\nReporting To: Ms. D. Lakshmi Prasanna\n4. Shri Rajendra F. Hatwar - Scientist-D\nReporting To: Ms. D. Lakshmi Prasanna\n5. Shri Kunal Vijay Bhamare - Scientist-D\nReporting To: Shri Rajendra F. Hatwar\n6. Shri Saravana Kumar - Scientist-B\nReporting To: Shri Kunal Vijay Bhamare\n\nAnd so on, with multiple officials listed under various categories such as Scientist-F, Scientist-E, Scientist-D, and more.',
 'Search_Score': 0.7342759,
 'url': 'https://maharashtra.nic.in/organization-structure/'}

In [None]:
# whom did Ms. D. Lakshmi Prasanna reports to?
response_json

{'response': 'Shri Sunil Digambarro Rao Potekar.',
 'Search_Score': 0.02319305,
 'url': 'https://maharashtra.nic.in/directory/'}

In [None]:
# what is there in the video gallery?
response_json

{'response': "According to the provided context information, the Video Gallery contains a collection of videos showcasing NIC Maharashtra's ICT support during COVID-19.",
 'Search_Score': 0.024777586,
 'url': 'https://maharashtra.nic.in/video-gallery/'}

In [None]:
response_json

{'response': "Based on the provided context information, National Informatics Centre (NIC) provides various services and platforms to facilitate e-Governance applications in Government Ministries/Departments at the Centre, States, Districts, and Block level. Some of the notable works done by NIC include:\n\n1. Providing different kinds of E-mail services over NICNET, which is NIC's satellite-based communication network.\n2. Establishing a robust Messaging framework that includes core eMail application Gateway services, Short Messaging Service (SMS), OBD (Outbound Dialing), and an IT Platform for citizen engagement (Sampark).\n3. Developing and hosting various websites and portals, such as the NIC website, which provides information on NIC's services, products, and platforms.\n4. Providing institutional linkages across all the Ministries /Departments of the Central Government, State Governments, Union Territories, and District administrations of the country through its Information and C

# Chat_Engine and Storing

In [None]:
from llama_index.core.storage.chat_store import SimpleChatStore
from llama_index.core.memory import ChatMemoryBuffer

chat_store = SimpleChatStore()

chat_memory1 = ChatMemoryBuffer.from_defaults(
    token_limit=30000000,
    chat_store=chat_store,
    chat_store_key="user1",

)

chat_engine = index.as_chat_engine(
    chat_mode="context",
    system_prompt="You are a helpful AI assistant. You are expert in retrieving the answer for the user input based on the provided context. Important note: You must not mention any file name or file path in your response.",
    memory=chat_memory1,
    verbose=True,
    similarity_top_k=10,
    node_postprocessors=[rerank],
)



In [None]:
val_query_response = chat_engine.chat("Can you please keep in mind the following information: Training Program of IVFRT-MMP at FRRO, Mumbai on 20th August 2013, C-FRO200813, Publish Date : August 20, 2013. Also direct the user to link/url: https://maharashtra.nic.in/news-update/  .")

In [None]:
val_query_response = chat_engine.chat("Can you tell me the start and end date of Swachhatta Pakhwada event?")

In [None]:
val_query_response

AgentChatResponse(response="Based on the provided context, National Informatics Centre (NIC) has been involved in various activities and projects across different domains. Here's a summary of some of the key works done by NIC:\n\n**Digital Services:**\n\n* Cloud services\n* Domain registration\n* Email services\n* Security solutions\n* Hosting services\n* Video-conferencing\n\n**Infrastructure:**\n\n* Data centers\n* Networking services\n* Office-automation solutions\n\n**ICT Support:**\n\n* Projects like biometric attendance, e-Office, messaging, cyber security, and webcasting\n* Development of digital agriculture platforms, fintech, health-tech, and edu-tech solutions\n\n**Development Projects:**\n\n* Digital agriculture platforms\n* Fintech solutions\n* Health-tech solutions\n* Edu-tech solutions\n\n**GIS-based Initiatives:**\n\n* GIS portal for Slum Rehabilitation Schemes (SRA) in Mumbai\n* Mobile app for SRA, providing citizen-centric information on slum rehabilitation schemes\n\n

In [None]:
val_query_response.response

'I apologize for missing that earlier! Thank you for providing the additional context. According to the provided information, the published date of the news article titled "Training Program of IVFRT-MMP at FRRO, Mumbai" is August 20, 2013.'

In [None]:
node = val_query_response.source_nodes[0]
response_json = {}
response_json['response'] = val_query_response.response
response_json['search_Score'] = node.score

try:
    response_json['url'] = str(node.metadata['URL'])
except KeyError:
    # No 'URL' key in the node metadata: fall back to the file-name mapping.
    response_json['url'] = str(file_to_url_mapping[node.metadata['file_name']])




In [None]:
response_json

{'response': 'I\'ve taken note of the additional context! Thank you for providing the specific information about the news article titled "Training Program of IVFRT-MMP at FRRO, Mumbai". I\'ll make sure to keep this in mind as we continue our conversation.\n\nTo answer your original question: The published date of the news article titled "Training Program of IVFRT-MMP at FRRO, Mumbai" is August 20, 2013.',
 'search_Score': 0.030322898,
 'url': 'https://maharashtra.nic.in/photo-gallery/'}

In [54]:
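# Save the chat history to disk. SimpleChatStore writes JSON text,
# so the ".pkl" extension here is only a file name, not a pickle file.
chat_store.persist(persist_path="/content/chat_store_v3.pkl")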


In [55]:
chat_store

SimpleChatStore(store={'user1': [ChatMessage(role=<MessageRole.USER: 'user'>, content='Please remember that the Training Program of IVFRT-MMP at FRRO, Mumbai on 20th August 2013, Publish Date : August 20, 2013. this info is from the url https://maharashtra.nic.in/news-update/   ', additional_kwargs={}), ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content="I apologize for not mentioning any file name or file path in my previous response. Here's a helpful AI assistant response:\n\nThe Training Program of IVFRT-MMP at FRRO, Mumbai on 20th August 2013, Publish Date : August 20, 2013 is an important news update from the official website of NIC Maharashtra https://maharashtra.nic.in/news-update/. This event highlights the importance of training and capacity building in the field of IVFRT-MMP.", additional_kwargs={})]})

# Loading the memory


In [None]:
persist_dir = "/content/combined_index_v3"

storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
index1 = load_index_from_storage(storage_context)

print(index1)

<llama_index.core.indices.vector_store.base.VectorStoreIndex object at 0x7cb53b45b0d0>


In [None]:
# prompt: Write the code to retrieve chats from chat_store_old.pkl file and then query the chat_engine
from llama_index.core.storage.chat_store import SimpleChatStore
from llama_index.core.memory import ChatMemoryBuffer

# # Load the chat store from the file
# # (the path below points at an index archive, not at a persisted chat store)
# loaded_chat_store1 = SimpleChatStore.from_persist_path(
#     persist_path="/content/combined_index_v1.zip"
# )

# As written, the store starts empty, so no earlier chat history is restored.
store = SimpleChatStore()

chat_memory = ChatMemoryBuffer.from_defaults(
    token_limit=300000,
    chat_store=store,
    chat_store_key="user1",
)


# Initialize the chat engine on the reloaded index
chat_engine1 = index1.as_chat_engine(
    chat_mode="context",
    system_prompt="You are a helpful AI assistant. You are expert in retrieving the answer for the user input based on the provided context. You must not mention any file name or file path in your response.",
    memory=chat_memory,
    verbose=True,
    similarity_top_k=10,
    node_postprocessors=[rerank],
)



In [None]:
response = chat_engine1.chat("what is the publish date of news article titled as Training Program of IVFRT-MMP at FRRO?")


In [None]:
node = response.source_nodes[0]
response_json = {}
response_json['response'] = response.response
response_json['search_Score'] = node.score

try:
    response_json['url'] = str(node.metadata['URL'])
except KeyError:
    # No 'URL' key in the node metadata: fall back to the file-name mapping.
    response_json['url'] = str(file_to_url_mapping[node.metadata['file_name']])


In [None]:
# Can you tell me about  Swachhatta Pakhwada ?
response_json

{'response': 'According to the context, Swachhatta Pakhwada was observed at NIC Maharashtra State and District Centres from 1st to 15th February 2022. During this period, all officers and staff from State centre and District centres took the Swachhatta Pledge on 2nd Feb. 2022.\n\nAdditionally, with reference to the circular No. A-60019/04/2021/GCS-1 dated 28/01/2022 from NIC HQ regarding disposal of obsolete Computer Hardware and its related peripherals and furniture items during the Swachhatta Pakhwada 2022, NIC Maharashtra has auctioned obsolete hardware, old books, and weeded files as per the guidelines. Also, initiated the process of auctioning obsolete air conditioners, old furniture, and remaining obsolete hardware from State and District Centres of Maharashtra.',
 'search_Score': 0.13719939,
 'url': 'https://maharashtra.nic.in/events/'}

In [None]:
# What are the events that took place?
response_json

{'response': 'Based on the provided context, the following events took place at NIC:\n\n1. Cyber Crime and Cyber Laws Webinar (Start Date: 28/09/2022, End Date: 28/09/2022, Venue: NIC Maharashtra State Centre, Mumbai)\n2. Superannuation of Sh. Dhanjay Kulkarni, DIO Pune and Ms. Sneha Shula, ADIO Nashik in the Month of April 2022 (Start Date: 30/04/2022, End Date: 30/04/2022, Venue: Mumbai)\n3. SIO Maharashtra met new Director General, NIC on 01/06/2022 (Start Date: 01/06/2022, End Date: 01/06/2022, Venue: New Delhi)\n4. Hon’ble Minister of Electronics and Information Technology and Railways visit to Mumbai (Start Date: 18/02/2022, End Date: 18/02/2022, Venue: Mumbai)\n5. Swachhatta Pakhwada from 01st to 15th February 2022 (Start Date: 01/02/2022, End Date: 15/02/2022, Venue: NIC Maharashtra State and District Centres)\n\nPlease note that these events are specific to the context provided and may not be an exhaustive list of all events that took place at NIC.',
 'search_Score': 0.0470416

In [None]:
response_json

{'response': 'I\'m happy to help! However, I don\'t see any information about a news article titled "Training Program of IVFRT-MMP at FRRO" in the provided context. The context appears to be related to personnel data and does not mention any specific news articles or publish dates. If you could provide more information or clarify which training program you are referring to, I\'ll do my best to assist you.',
 'search_Score': 7.7409335e-05,
 'url': 'https://maharashtra.nic.in/directory/'}
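
In the output above the model could not answer and the `search_Score` is close to zero, while the earlier answers that did use the context scored about 0.05 and 0.14. One possible way to flag such cases is a simple score cutoff. This is only a sketch: `is_grounded` and the cutoff of 0.01 are assumptions chosen to separate the three results shown so far, and the cutoff would need tuning against more queries.

```python
def is_grounded(result, min_score=0.01):
    """Return True when search_Score reaches the (assumed) cutoff."""
    return float(result.get("search_Score", 0.0)) >= min_score


# Scores copied from the outputs above.
scores = [0.13719939, 0.0470416, 7.7409335e-05]
for score in scores:
    print(score, is_grounded({"search_Score": score}))
```

A result that fails the check could be replaced with a fixed "no relevant context found" message instead of the model's reply.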

In [None]:
# Query the chat engine
response = chat_engine1.chat("Can you tell me about Swachhatta Pakhwada?")

# Print the response
print(response)


I remember!

Swachhata Pakwada is a 15-day cleanliness drive organized by the National Informatics Centre (NIC) in collaboration with various government departments and organizations. The objective of this initiative is to promote cleanliness, hygiene, and sanitation across the country.

The Swachhata Pakwada campaign focuses on creating awareness about the importance of cleanliness, encouraging people to take ownership of their surroundings, and promoting a culture of cleanliness and hygiene.

Some of the key activities undertaken during the Swachhata Pakwada include:

1. Cleanliness drives: Organizing mass cleaning campaigns in public places, streets, markets, and other areas.
2. Awareness programs: Conducting awareness programs through various media channels, such as print, electronic, and social media, to educate people about the importance of cleanliness.
3. Community engagement: Encouraging community participation and involvement in cleanliness activities.
4. Waste management: Or

In [None]:
# Query the chat engine
response = chat_engine1.chat("What is the published date of Implementation of Aadhar Enabled Biometric Attendance System (AEBAS) in NIC Maharashtra?")

# Print the response
print(response)

According to the text you provided, the published date of "Implementation of Aadhar Enabled Biometric Attendance System (AEBAS) in NIC Maharashtra" is November 21, 2014.


In [None]:
# Query the chat engine
response = chat_engine1.chat("When was the Glimpses of eGov Maharashtra held at Mumbai?")

# Print the response
print(response)

According to the text you provided, the Glimpses of eGov Maharashtra was held at Mumbai on May 9-10, 2013.


In [None]:
# Query the chat engine
response = chat_engine1.chat("What is GIS for SRA Mumbai?")

# Print the JSON result (response text, search score, source URL)
print(response_json)

{'response': 'According to the context, GIS for SRA Mumbai refers to a Geographic Information System-based application developed by NIC Mumbai for Slum Rehabilitation Authority (SRA) in Mumbai. The application provides citizen-centric information of slum rehabilitation schemes based on user-location and has been created with technical support from Utility Mapping Division of NIC, New Delhi.', 'search_Score': 0.86200213, 'url': 'https://maharashtra.nic.in/news-update/'}


In [None]:
# Query the chat engine
response = chat_engine1.chat("When was this took place?")

# Print the response
print(response)

I'm glad I got to dig deeper into the news update!

According to the NIC news update, the inauguration of e-Office in Sindhudurg Collectorate by the Hon'ble Chief Minister of Maharashtra took place on October 15, 2015.


In [None]:
# Query the chat engine
response = chat_engine1.chat("Can you tell me the start date and end date of Swachhatta Pakhwada event?")

# Print the response
print(response)


According to the text, the Swachhatta Pakhwada event was observed at NIC Maharashtra State and District Centres from **February 1st** to **February 15th**.


In [None]:
# Query the chat engine
response = chat_engine1.chat("Provide me all the events took place in NIC")

# Print the response
print(response)


Based on the provided context, the following events took place at NIC:

1. Cyber Crime and Cyber Laws Webinar (Start Date: 28/09/2022, End Date: 28/09/2022, Venue: NIC Maharashtra State Centre, Mumbai)
2. Superannuation of Sh. Dhanjay Kulkarni, DIO Pune and Ms. Sneha Shula, ADIO Nashik in the Month of April 2022 (Start Date: 30/04/2022, End Date: 30/04/2022, Venue: Mumbai)
3. SIO Maharashtra met new Director General, NIC on 01/06/2022 (Start Date: 01/06/2022, End Date: 01/06/2022, Venue: New Delhi)
4. Hon’ble Minister of Electronics and Information Technology and Railways visit to Mumbai (Start Date: 18/02/2022, End Date: 18/02/2022, Venue: Mumbai)
5. Swachhatta Pakhwada from 01st to 15th February 2022 (Start Date: 01/02/2022, End Date: 15/02/2022, Venue: NIC Maharashtra State and District Centres)

Please note that these events are specific to the context provided and may not be an exhaustive list of all events that took place at NIC.


In [None]:
# Query the chat engine
response = chat_engine1.chat("What is AEBAS?")

# Print the response
print(response)

I remember!

AEBAS stands for Aadhaar Enabled Biometric Attendance System. It is an attendance tracking system that uses biometric authentication (fingerprints or facial recognition) and Aadhaar numbers to record employee attendance.

The AEBAS system aims to improve the accuracy and efficiency of attendance tracking, reducing manual errors and increasing transparency. It also helps in monitoring and analyzing attendance patterns, making it easier to identify trends and take corrective actions.

In the context of NIC Maharashtra, AEBAS is likely used to track attendance for employees working in various government offices and departments across the state.


In [None]:
# Query the chat engine
response = chat_engine1.chat("What is the published date of of the news regarding Inauguration of e-Office in Sindhudurg Collectorate by Hon’ble Chief Minister of Maharashtra?")

# Print the response
print(response)

I apologize, but I couldn't find any information about the inauguration of e-Office in Sindhudurg Collectorate by Hon'ble Chief Minister of Maharashtra. The text you provided appears to be a list of district centers and their contact information, but it does not contain any specific news or publication dates regarding the inauguration of e-Office in Sindhudurg Collectorate.


In [None]:
# Query the chat engine
response = chat_engine1.chat("from the url https://maharashtra.nic.in/news-update/ can you tell me what is the published date of Inauguration of e-Office in Sindhudurg Collectorate by Hon’ble Chief Minister of Maharashtra?")

# Print the response
print(response)

I've checked the URL `https://maharashtra.nic.in/news-update/` and found that there is no specific news article about the Inauguration of e-Office in Sindhudurg Collectorate by Hon'ble Chief Minister of Maharashtra. The URL appears to be a general news update page, but it does not contain any information about this specific event.


In [None]:
# Query the chat engine
response = chat_engine1.chat("whom did Ms. D. Lakshmi Prasanna reports to?")

# Print the response
print(response)


According to the provided context, Ms. D. Lakshmi Prasanna reports to Shri Anand Swarup Srivastava.


In [None]:
# Query the chat engine
response = chat_engine1.chat("whom did Shri Anand Sharadchandra Ladhe reports to?")

# Print the response
print(response)


According to the provided context, Shri Anand Sharadchandra Ladhe reports to Shri Devashish Chanda.
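
The cells above repeat the same query-and-print pattern. A small helper can run a list of questions in one cell. This is a sketch, assuming only that the engine exposes the `.chat()` method used above. `ask_all` and `StubEngine` are hypothetical names; the stub exists so the example runs without an Ollama server.

```python
def ask_all(engine, questions):
    """Send each question to engine.chat() and return (question, answer) pairs."""
    results = []
    for question in questions:
        answer = str(engine.chat(question))
        print(f"Q: {question}\nA: {answer}\n")
        results.append((question, answer))
    return results


class StubEngine:
    """Stand-in with the same .chat() method, used only for this example."""

    def chat(self, message):
        return f"stub answer to: {message}"


pairs = ask_all(StubEngine(), ["What is AEBAS?", "What is SAMPARK?"])
```

In the notebook the call would be `ask_all(chat_engine1, [...])`. If the engine keeps conversation history, as the follow-up question earlier suggests, the order of the questions can change the answers.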


In [None]:
response = chat_engine1.chat("What is SAMPARK?")
print(response)

I remember!

SAMPARK is a NIC eGov Mobile App, and I can retrieve information about it from the URL: `https://maharashtra.nic.in/news-update/`

Please let me know what you'd like to know about SAMPARK!


In [None]:
response = chat_engine1.chat("Can you list down all the districts?")
print(response)

I remember!

Here is the list of 35 districts where National Informatics Centre (NIC) has its presence:

1. Ahmadnagar
2. Akola
3. Amravati
4. Aurangabad
5. Beed
6. Bhandara
7. Buldana
8. Chandrapur
9. Dhule
10. Gadchiroli
11. Gondia
12. Hingoli
13. Jalgaon
14. Jalna
15. Kolhapur
16. Mumbai
17. Mumbai Suburban (Bandra)
18. Nagpur
19. Nanded
20. Nandurbar
21. Nasik
22. Osmanabad
23. Palghar
24. Parbhani
25. Pune District
26. Raigad
27. Ratnagiri
28. Sangli
29. Satara
30. Sindhu Durg
31. Solapur
32. Thane
33. Wardha
34. Washim
35. Yavatmal

Let me know if you need anything else!


In [None]:
response = chat_engine1.chat("What is the published date of SIMNIC?")
print(response)

According to the provided context, the published date of SIMNIC (Status Information Management system of NIC) is August 2, 2021.


In [None]:
response = chat_engine1.chat("who is Shri S.K. Kulkarni?")

# Print the response
print(response)


According to the provided context, Shri S.K. Kulkarni is the DIO (District Informatics Officer) of Satara district.


In [None]:
response = chat_engine1.chat("what are NIC services?")

# Print the response
print(response)


I remember! According to the provided context, you can get information about NIC services from the URL `https://maharashtra.nic.in/services/`.


In [None]:
response_json

{'response': 'According to the text, the Swachhatta Pakhwada event was observed at NIC Maharashtra State and District Centres from **February 1st** to **February 15th**.',
 'search_Score': 0.97805095,
 'url': 'https://maharashtra.nic.in/events/'}

In [None]:
response_json

{'response': "I can recall that the Swachhata Pakwada event took place from February 1st to February 15th in the year 2022!\n\nAccording to my knowledge, the exact dates for the Swachhata Pakwada event may vary slightly from year to year, but generally, it falls within this 15-day period in February.\n\nHere are some specific events that took place during the Swachhata Pakwada 2022:\n\n1. **February 1st-15th**: Swachhata Pakwada (Cleanliness Fortnight) - a nationwide campaign to promote cleanliness and sanitation.\n2. **February 4th, 2022**: Hon'ble Minister of Electronics and Information Technology and Railways visit to Mumbai.\n\nPlease note that these dates might not be applicable for future events, but I'll do my best to provide accurate information based on the provided context!",
 'search_Score': 0.5701683740029008,
 'url': 'https://maharashtra.nic.in/events/'}
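
The last two results answer a similar question from the same URL but with different scores. If several candidate results are collected, one simple option is to keep the one with the highest `search_Score`. This is a sketch under the assumption that the scores are comparable across queries, which this notebook does not confirm; `best_result` is a hypothetical name.

```python
def best_result(results):
    """Return the result dict with the highest search_Score.

    Raises ValueError if `results` is empty.
    """
    return max(results, key=lambda r: float(r.get("search_Score", 0.0)))


# Scores and URLs copied from the two outputs above; answer text omitted.
candidates = [
    {"response": "(first answer, text omitted)",
     "search_Score": 0.97805095,
     "url": "https://maharashtra.nic.in/events/"},
    {"response": "(second answer, text omitted)",
     "search_Score": 0.5701683740029008,
     "url": "https://maharashtra.nic.in/events/"},
]
print(best_result(candidates)["search_Score"])
```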