# Downloading Necessary Packages

In [1]:
!pip install langchain-community==0.2.15 langchain-chroma==0.1.3 langchain-text-splitters==0.2.2 langchain-huggingface==0.0.3 langchain-groq==0.1.9 unstructured==0.15.0 unstructured[pdf]==0.15.0 nltk==3.8.1

Collecting langchain-community==0.2.15
  Downloading langchain_community-0.2.15-py3-none-any.whl.metadata (2.7 kB)
Collecting langchain-chroma==0.1.3
  Downloading langchain_chroma-0.1.3-py3-none-any.whl.metadata (1.5 kB)
Collecting langchain-text-splitters==0.2.2
  Downloading langchain_text_splitters-0.2.2-py3-none-any.whl.metadata (2.1 kB)
Collecting langchain-huggingface==0.0.3
  Downloading langchain_huggingface-0.0.3-py3-none-any.whl.metadata (1.2 kB)
Collecting langchain-groq==0.1.9
  Downloading langchain_groq-0.1.9-py3-none-any.whl.metadata (2.9 kB)
Collecting unstructured==0.15.0
  Downloading unstructured-0.15.0-py3-none-any.whl.metadata (29 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community==0.2.15)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting langchain<0.3.0,>=0.2.15 (from langchain-community==0.2.15)
  Downloading langchain-0.2.16-py3-none-any.whl.metadata (7.1 kB)
Collecting langchain-core<0.3.0,>=0.2.37 (from langc

In [1]:
!pip install pytesseract poppler-utils

Collecting poppler-utils
  Downloading poppler_utils-0.1.0-py3-none-any.whl.metadata (883 bytes)
Downloading poppler_utils-0.1.0-py3-none-any.whl (9.2 kB)
Installing collected packages: poppler-utils
Successfully installed poppler-utils-0.1.0


# Importing Necessary Packages

In [2]:
import os

from langchain_community.document_loaders import UnstructuredPDFLoader, DirectoryLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_groq import ChatGroq
from langchain.chains import RetrievalQA

# API Key Setup

In [19]:
GROQ_API_KEY = "your_groq_api_key"

In [20]:
os.environ["GROQ_API_KEY"] = GROQ_API_KEY
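A key pasted into a cell is easy to leak when the notebook is shared. The following is a minimal sketch of an alternative, assuming the key may already be set in the environment. The helper name `load_groq_api_key` and its `prompt_fn` parameter are hypothetical and not part of any library.

```python
import os
from getpass import getpass


def load_groq_api_key(prompt_fn=getpass):
    """Return the Groq API key from the environment, prompting only if it is missing."""
    key = os.environ.get("GROQ_API_KEY")
    if not key:
        # getpass hides the typed value, so the key never shows up in cell output
        key = prompt_fn("Enter your Groq API key: ")
        os.environ["GROQ_API_KEY"] = key
    return key
```

Calling `load_groq_api_key()` once before the LLM is created would serve the same purpose as the two cells above.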

## Loading the Reference PDF Files

In [21]:
loader = DirectoryLoader("./data/", glob="*.pdf", loader_cls=UnstructuredPDFLoader)
document = loader.load()
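Parsing PDFs can be slow, so it may help to confirm which files the loader will see before calling `load()`. This is a small sketch using only the standard library; the helper name `list_pdfs` is hypothetical, and it assumes the same non-recursive `*.pdf` pattern that is passed to `DirectoryLoader` above.

```python
from pathlib import Path


def list_pdfs(data_dir="./data/"):
    """Return the PDF files directly inside data_dir, sorted by name."""
    folder = Path(data_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Data directory not found: {folder}")
    # Same non-recursive "*.pdf" pattern as the loader's glob argument
    return sorted(folder.glob("*.pdf"))
```

An empty list from `list_pdfs()` would suggest a wrong path or a different file extension, which is cheaper to discover here than after a long load.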

## Splitting Text into Chunks by Paragraph

In [23]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Initialize the RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,      # You can adjust this if needed
    chunk_overlap=500,    # Overlap to ensure context continuity
    separators=["\n\n", "\n"]  # First split by paragraphs, then newlines
)

# Split the text into paragraph-wise chunks
text_chunks = text_splitter.split_documents(document)

# Output the chunks or append them as needed
for i, chunk in enumerate(text_chunks):
    print(f"Chunk {i+1}:\n{chunk}\n")


Streaming output truncated to the last 5000 lines.
Discussion: The core discussion of the post. The talking points from above will be broadened out and explored.

Things To Ponder: Exercises that reinforce the material.' metadata={'source': 'data/business_basics_for_entrepreneurs.pdf'}

Chunk 2899:
page_content='1.2 Organization

The topics are organized into sections with a total length between 500 and 1500 words (2-5 pages) – just enough to grasp the basics. Each topic has the following sections:

Talking Points: The main points of the post. These will generally correspond to the main text headings.

Discussion: The core discussion of the post. The talking points from above will be broadened out and explored.

Things To Ponder: Exercises that reinforce the material.

Exploring Further: Books, web sites or articles that dig deeper into the subject.

1.3 Who is this for?

A lot of people are scared of business but creative types are down right terrified of it. Creatives sq
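The output above shows the same "Discussion" and "Things To Ponder" lines in two consecutive chunks, which is the effect of `chunk_overlap=500`. The sketch below illustrates the overlap idea only. It is a simplified fixed-window version, not the actual `RecursiveCharacterTextSplitter` algorithm (which first splits on the separators and then merges the pieces), and the function name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(text, chunk_size=2000, chunk_overlap=500):
    """Cut text into fixed-size windows; each window repeats the tail of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghij"
print(chunk_with_overlap(sample, chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

With the notebook's settings each window advances by 1500 characters, so roughly a quarter of every chunk repeats text from its predecessor. That repetition raises the chunk count and therefore the embedding time.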

In [24]:
persist_directory = "doc_db"

In [25]:
embedding = HuggingFaceEmbeddings()



## Converting into Vector Database

In [27]:
vectorstore = Chroma.from_documents(
    documents=text_chunks,
    embedding=embedding,
    persist_directory=persist_directory
)


Vectorizing the database took 1 h 20 min 51 s.
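Because the store is written to `persist_directory`, later sessions may be able to reopen it instead of embedding every chunk again. The sketch below assumes that a non-empty directory means a usable store, which is only a heuristic. The helper names `persisted_store_exists` and `load_or_build_vectorstore` are hypothetical; the `Chroma(...)` constructor and `Chroma.from_documents(...)` calls are the ones from `langchain_chroma`.

```python
import os


def persisted_store_exists(persist_directory):
    """Heuristic: treat a non-empty directory as an existing Chroma store."""
    return os.path.isdir(persist_directory) and bool(os.listdir(persist_directory))


def load_or_build_vectorstore(persist_directory, embedding, make_chunks):
    """Reopen the persisted store if present, otherwise embed the chunks from scratch."""
    # Imported lazily so the helper above can be used without LangChain installed
    from langchain_chroma import Chroma

    if persisted_store_exists(persist_directory):
        # Reopening skips the embedding step entirely
        return Chroma(persist_directory=persist_directory, embedding_function=embedding)
    return Chroma.from_documents(
        documents=make_chunks(),  # only called when a rebuild is needed
        embedding=embedding,
        persist_directory=persist_directory,
    )
```

Passing a callable such as `lambda: text_chunks` for `make_chunks` keeps the loading and splitting work out of the fast path. The same embedding model must be used when reopening, since the stored vectors are only comparable to queries embedded the same way.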

## Retrieving Vector Database

In [28]:
retriever = vectorstore.as_retriever()
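Conceptually, the retriever embeds the query and returns the stored chunks whose vectors are closest to it. The toy sketch below shows that idea with hand-written two-dimensional vectors and cosine similarity. It is an illustration only: the function names are hypothetical, and the distance metric Chroma actually uses depends on its configuration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector, indexed, k=4):
    """Return the texts of the k entries most similar to query_vector.

    indexed is a list of (text, vector) pairs.
    """
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


toy_index = [
    ("chunk about pricing", [1.0, 0.0]),
    ("chunk about hiring", [0.0, 1.0]),
    ("chunk about costs", [0.9, 0.1]),
]
print(top_k([1.0, 0.0], toy_index, k=2))
# ['chunk about pricing', 'chunk about costs']
```

In the notebook, the number of chunks returned can be adjusted with `vectorstore.as_retriever(search_kwargs={"k": ...})`.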

## Loading LLM Model

In [29]:
llm = ChatGroq(
    model="llama-3.1-70b-versatile",
    temperature=0
)

## Creating Chain

In [30]:
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True
)
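The `"stuff"` chain type places all retrieved chunks into a single prompt together with the question. The sketch below shows that idea in plain Python. The function name `build_stuff_prompt` and the prompt wording are hypothetical and are not LangChain's actual template.

```python
def build_stuff_prompt(question, chunk_texts):
    """Concatenate every retrieved chunk into one context block ahead of the question."""
    context = "\n\n".join(chunk_texts)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


print(build_stuff_prompt("What is a business?", ["First chunk.", "Second chunk."]))
```

Since every chunk goes into one prompt, the retrieved text has to fit within the model's context window. With 2000-character chunks, the number of chunks the retriever returns directly controls the prompt size.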

# Running Test Queries

In [31]:
query = input("Enter your query: ")
response = qa_chain.invoke({"query":query})

Enter your query: What are the types of Business, give 5 Types with brief explanation?


In [32]:
print(response["result"])

Based on the provided context, the following are 5 types of organizational structures in business, along with brief explanations:

1. **Team**: A small group of people that solely focus on one thing, with a leader to guide and motivate them. Think of a sports team with a captain.

2. **Functional**: An organizational structure where common functions (e.g., Engineering, accounting, manufacturing) are managed in the same group. Resources are allocated to those who need them.

3. **Divisional**: An organizational structure that aligns all necessary resources to target a particular market or set of markets. It's like a mini-business, with all necessary functional areas to stand on its own.

4. **Matrix**: A hybrid structure that combines functional and divisional elements. Some resources are functional (e.g., Engineering and sales), while others are divisional (e.g., Marketing and management). This structure aims to gain efficiencies by having functional groups work for multiple divisional
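Because the chain was created with `return_source_documents=True`, the response also carries the retrieved chunks under `"source_documents"`. A small sketch for listing which files an answer drew on follows. The helper name `unique_sources` is hypothetical, and it assumes each document has a `metadata` dict with a `source` entry, as in the chunk output shown earlier.

```python
def unique_sources(response):
    """Return the distinct source paths of the retrieved chunks, in first-seen order."""
    seen = []
    for doc in response.get("source_documents", []):
        source = doc.metadata.get("source", "unknown")
        if source not in seen:
            seen.append(source)
    return seen
```

Running `print(unique_sources(response))` after a query gives a quick check of whether the answer was grounded in the expected PDF.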

In [33]:
query = input("Enter your query: ")
response = qa_chain.invoke({"query":query})
print(response["result"])

Enter your query: What is Business, Give Types. Additinally what are the practical applications of Business?
**What is Business?**

According to the provided text, a business is defined as:

1. An activity that provides goods or services to consumers for the purpose of making a profit.
2. An organization consisting of one or more people providing goods or services for the benefit of customers and the community.

**Types of Business:**

Based on the text, businesses can be categorized into two main types:

1. **Goods-based businesses**: These businesses produce and sell physical products, such as Apple (iPhone, iPod, Mac), car dealerships, and manufacturers.
2. **Service-based businesses**: These businesses provide intangible offerings, such as banking, internet services, airlines, law firms, movie theaters, and hospitals.

Some businesses may also provide both goods and services, such as a car dealership that sells cars (goods) and offers automobile repairs (services).

**Practical App

In [34]:
query = input("Enter your query: ")
response = qa_chain.invoke({"query":query})
print(response["result"])

Enter your query: How entrepreneur start their business? What are the key role that affect in their performances?
According to the provided context, entrepreneurs start their business by identifying a business opportunity and assuming the risk of creating and running a business to take advantage of it. They typically offer a new product, apply a new technique or technology, open a new market, or develop a new form of organization for the purpose of producing or enhancing a product.

The key roles that affect an entrepreneur's performance include:

1. **Innovating**: Offering a new product, applying a new technique or technology, opening a new market, or developing a new form of organization.
2. **Running a business**: Setting up a business to make a profit from an innovative product or process.
3. **Risk taking**: Working under a certain degree of uncertainty and making decisions with unknown outcomes.

Additionally, the Small Business Administration (SBA) suggests that entrepreneurs a