In [16]:
# Read the document
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader(input_files = ["data2/Attention is All You Need.pdf"]).load_data()

In [17]:
# Initialize Ollama running locally (request_timeout is in seconds)
from llama_index.llms.ollama import Ollama
llm = Ollama(model="llama2", request_timeout=2000.0)

In [3]:
# Use a local HuggingFace embedding model, since we are not using the OpenAI API
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings

Settings.embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
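`VectorStoreIndex` answers a query by embedding the question with the same model and selecting the stored chunks whose vectors are closest to it. The sketch below shows that ranking step in plain Python, under simplifying assumptions. The 3-dimensional vectors are made up, since real `all-MiniLM-L6-v2` embeddings have 384 dimensions. `cosine_similarity` and `top_k` are hypothetical helpers, not LlamaIndex functions. As far as I know, the default in-memory vector store also ranks by cosine similarity.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2) -> list[tuple[int, float]]:
    """Return (chunk index, score) pairs for the k chunks most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings" for three chunks and one query.
chunks = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
query = [1.0, 0.1, 0.0]
print(top_k(query, chunks, k=2))
```

Only the best-scoring chunks are passed on to the LLM, which is why the choice of embedding model affects answer quality even though it never generates text itself.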

In [4]:
# Configure global settings and set the LLM (here, Ollama)

Settings.chunk_size = 2048
Settings.context_window = 3900

Settings.llm = llm

index = VectorStoreIndex.from_documents(documents)
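`Settings.chunk_size` controls how large each piece of a document is before it is embedded and indexed. Here is a simplified sketch of fixed-size chunking with overlap. It counts whitespace-separated words instead of tokenizer tokens and ignores sentence boundaries, so `split_into_chunks` is a hypothetical helper, not LlamaIndex's own splitter.

```python
def split_into_chunks(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into windows of at most `chunk_size` words.

    Words stand in for tokens here; a real splitter counts tokenizer tokens.
    Consecutive chunks share `overlap` words so context is not cut mid-thought.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in split_into_chunks(sample, chunk_size=4, overlap=1):
    print(chunk)
# w0 w1 w2 w3
# w3 w4 w5 w6
# w6 w7 w8 w9
```

The two settings in the cell above interact: a single 2048-token chunk already takes more than half of a 3900-token context window, so roughly one such chunk fits in a prompt alongside the question and instructions.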


In [20]:
# Build the index and query it, streaming the answer as it is generated
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(streaming=True)
response_stream = query_engine.query("What is this document about?")
response_stream.print_response_stream()



Based on the context information provided, the document appears to be about a variety of topics related to natural language processing (NLP) and machine learning. The document mentions several different techniques and approaches used in NLP, including attention-based models, recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and more. It also references several papers and research works related to these topics.
Without prior knowledge, it is difficult to pinpoint a single topic or theme that the document is about. However, based on the context information provided, it seems likely that the document is discussing various aspects of NLP and machine learning, and how these techniques can be applied to real-world problems.
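With `streaming=True`, `query()` returns a streaming response whose text arrives token by token instead of all at once. The toy class below is a hypothetical sketch of that idea; `ToyStreamingResponse` and `fake_llm_tokens` are illustrative names, not LlamaIndex classes. As far as I can tell, LlamaIndex's `print_response_stream()` behaves similarly: it caches the text after the first pass, so calling it again on the same response object re-prints the previous answer rather than streaming a new one.

```python
from typing import Iterator, Optional


class ToyStreamingResponse:
    """Hypothetical stand-in for a streaming query result (not LlamaIndex's class)."""

    def __init__(self, token_gen: Iterator[str]) -> None:
        self.token_gen = token_gen
        self.response_txt: Optional[str] = None

    def print_response_stream(self) -> None:
        if self.response_txt is None:
            # First call: consume the generator, printing tokens as they arrive.
            pieces = []
            for token in self.token_gen:
                print(token, end="", flush=True)
                pieces.append(token)
            self.response_txt = "".join(pieces)
            print()
        else:
            # Later calls: the generator is exhausted, so re-print the cached text.
            print(self.response_txt)


def fake_llm_tokens(answer: str) -> Iterator[str]:
    """Hypothetical token source that yields the answer one word at a time."""
    words = answer.split(" ")
    for i, word in enumerate(words):
        yield word if i == len(words) - 1 else word + " "


resp = ToyStreamingResponse(fake_llm_tokens("The paper introduces the Transformer."))
resp.print_response_stream()  # streams the answer word by word
resp.print_response_stream()  # prints the same answer again from the cache
```

To get a new answer, call `query()` again and print the stream of the new response object.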

In [6]:
# This is another document I tested with the query engine
query_engine = index.as_query_engine(streaming=True)
response_stream = query_engine.query("What is this document about?")
response_stream.print_response_stream()



Based on the provided context, this document appears to be a set of guidelines for the New Health Insurance Scheme (NHIS) for employees and pensioners in Tamil Nadu, India. The document provides instructions on how to access the NHIS web portal and mobile application, which were developed by the United India Insurance Company Limited to enable effective monitoring and increase transparency in the implementation of the NHIS scheme. The guidelines also provide information on how to login into the web portal or mobile application, including the user ID and password requirements, as well as contact details of district coordinators, nodal officers, and treasury officers for any grievance redressal.

In [7]:
# Read content from a website URL
from llama_index.readers.web import BeautifulSoupWebReader

url = "https://learndatasciencewithme.com"

documents1 = BeautifulSoupWebReader().load_data([url])
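`BeautifulSoupWebReader` downloads the page and extracts its text with BeautifulSoup before the text is indexed. To show what the extraction step does, here is a standard-library stand-in built on `html.parser` that keeps visible text and drops `<script>` and `<style>` contents. `VisibleTextExtractor` and `html_to_text` are hypothetical names, and the sketch works on an HTML string without fetching anything over the network.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text that appears outside <script> and <style> tags."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, one text run per line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


page = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Learn</h1><p>Data science for free.</p>"
    "<script>track()</script></body></html>"
)
print(html_to_text(page))
# Learn
# Data science for free.
```

Page chrome such as theme credits and profile links also survives this kind of extraction, which is why the blog answers below mention the Just the Docs theme and a LinkedIn profile.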

In [9]:
index = VectorStoreIndex.from_documents(documents1)
query_engine = index.as_query_engine(streaming=True)
response_stream = query_engine.query("What is this document about?")
response_stream.print_response_stream()


Based on the provided context information, this document appears to be a blog post or article about learning data science with the author's personal journey and experiences. The author mentions that they are starting to learn data science and will update the blog with their learnings, making it a free resource for anyone interested in learning data science. The post also includes some general information about data science and related topics, such as machine learning and statistics.
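The answers open with "Based on the provided context information" because the query engine does not send the bare question to the LLM: it first inserts the retrieved chunks into a question-answering prompt. The template below is paraphrased from memory of LlamaIndex's default QA prompt and may not match the installed version word for word. `QA_TEMPLATE` and `build_qa_prompt` are hypothetical names used only for illustration.

```python
# Approximation of a retrieval QA prompt; not copied from the library source.
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)


def build_qa_prompt(chunks: list[str], query: str) -> str:
    """Join the retrieved chunks into one context block and fill the template."""
    context_str = "\n\n".join(chunks)
    return QA_TEMPLATE.format(context_str=context_str, query_str=query)


prompt = build_qa_prompt(
    ["Learn data science for free.", "Statistics is the study of data."],
    "What is this document about?",
)
print(prompt)
```

Because each `query()` call builds a fresh prompt from the question alone, a follow-up such as "What other sub topics are there" is answered without any memory of the previous questions.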

In [12]:
response_stream = query_engine.query("What other details mentioned in the blog")
response_stream.print_response_stream()

Based on the provided context information, it seems that the blog is focused on learning Data Science for free, without enrolling in a course with a fee structure. The author is sharing their own learnings and experiences in Data Science, and inviting others to join them on this journey.
Other details mentioned in the blog include:
* The author's name is Sridevi, and they have experience in the infrastructure domain.
* The author started learning Data Science out of curiosity, to understand what it's all about.
* The author is now completely obsessed with Data Science, and believes it's not just a word but the future.
* The blog uses Just the Docs, a documentation theme for Jekyll, for its layout.
* The author has a LinkedIn profile.

In [13]:
response_stream = query_engine.query("What is statistics according to the blog")
response_stream.print_response_stream()

Statistics, as defined on the blog, refers to the study of collecting, analyzing, and interpreting data. It involves the use of mathematical techniques to summarize and describe the main features of a dataset, as well as to make predictions or decisions based on the data. The blog defines statistics as an essential part of data science, as it provides a foundation for understanding and working with data.

In [14]:
response_stream = query_engine.query("What other sub topics are there")
response_stream.print_response_stream()

Based on the provided context, there are several other subtopics that are related to data science and machine learning. Here are some of them:
1. Data Preprocessing: This subtopic deals with the cleaning, transforming, and preparing datasets for analysis. It includes tasks such as handling missing values, outliers, and data normalization.
2. Feature Selection: This subtopic involves selecting the most relevant features from a dataset to use in machine learning models. It helps improve the model's performance and reduce the risk of overfitting.
3. Data Visualization: This subtopic deals