In [1]:
import os
from dotenv import load_dotenv
load_dotenv()


True

In [2]:
GOOGLE_API_KEY=os.getenv("GOOGLE_API_KEY")
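`os.getenv` returns `None` when the variable is missing, and that only surfaces later as an authentication error. A minimal sketch of an early check follows; the helper name `require_env` is hypothetical and not part of the original notebook.

```python
import os

def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        # Fail here rather than at the first API call
        raise RuntimeError(f"{name} is not set; add it to your .env file or shell environment")
    return value
```

With this helper, the cell above could read `GOOGLE_API_KEY = require_env("GOOGLE_API_KEY")`.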


In [4]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
documents=SimpleDirectoryReader("data").load_data()

In [11]:
import google.generativeai as genai

genai.configure(api_key=GOOGLE_API_KEY)
model = genai.GenerativeModel("gemini-1.5-flash")


In [32]:
import os
import PyPDF2
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.chains.question_answering import load_qa_chain

# Step 1: Parse the PDF
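def parse_pdf(file_path):
    """Return the concatenated text of every page in the PDF at file_path."""
    with open(file_path, 'rb') as f:
        reader = PyPDF2.PdfReader(f)
        text = ""
        for page in reader.pages:
            # Guard against pages that yield no text (e.g. scanned images)
            text += page.extract_text() or ""
    return text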

# Step 2: Initialize the embedding model (GoogleGenerativeAIEmbeddings)
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# Step 3: Prepare the document text (from the PDF)
document_text = parse_pdf("data/Analytics_vidya_courses.pdf")  # Path to your PDF file

# Step 4: Embed the full document text as a single vector.
# Note: this vector is not reused later; Chroma.from_documents computes its own embeddings.
vector = embeddings.embed_query(document_text)

# Step 5 (storing the document in a Chroma vector store) is carried out in the next cell
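The cell above embeds the entire PDF text in one call. Long documents are commonly split into smaller overlapping chunks before embedding, so that each vector represents a focused passage. The following is a minimal character-based sketch of that idea; the function name `split_into_chunks` and the default sizes are assumptions, not values taken from this notebook.

```python
def split_into_chunks(text, chunk_size=1000, overlap=200):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a boundary still appears whole in one of the two chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk could then be wrapped in its own `Document` instead of storing the whole PDF as one.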


In [34]:
from langchain.docstore.document import Document

# Here, vector is the embedding for the entire document
document = Document(page_content=document_text)

# Step 5: Store the document in a Chroma vector store.
# Chroma.from_documents embeds the document itself using the embedding model passed in,
# so the precomputed vector is not supplied here.
vector_store = Chroma.from_documents(
    [document],  # Document list
    embedding=embeddings,  # Embedding model
    persist_directory="./persisted_vector_store"  # Specify directory to persist vectors
)
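Conceptually, a vector store answers a query by comparing the query's embedding with the stored embeddings and returning the closest ones. The sketch below illustrates that ranking with cosine similarity on plain Python lists. It is only an illustration: Chroma uses its own index and distance metric, which may differ, and the helper names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k stored vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]
```

With a single stored document, as in this notebook, every query returns that one document; ranking only becomes useful once the text is stored as several smaller pieces.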

In [37]:
document

Document(metadata={}, page_content='LLM  \n1. Framework to Choose the Right \nLLM for your Business  \n \nCourse Description  \nThis course will guide you through the process of selecting the most suitable Large Language \nModel (LLM) for various business needs. By examining factors such as accuracy, cost, \nscalability, and integration, you will understand how different LLMs perform in specific \nscenarios, from customer support to healthcare and strategy development. The course \nemphasizes practical decision -making with real -world case studies, helping business es \nnavigate the rapidly evolving LLM landscape effectively.  \n \nWho should Enroll?  \n\uf0b7 Business leaders seeking to implement AI -driven solutions efficiently.  \n\uf0b7 Data scientists exploring LLMs for industry -specific applications.  \n\uf0b7 Tech professionals involved in AI inte gration and decision -making processes.  \nKey Takeaways  \n\uf0b7 Understand how to evaluate and select the right LLM for business

In [42]:
chat_model = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

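# Step 6: Set up the LangChain QA pipeline with Google's generative AI model.
# chain_type="stuff" places the full text of the input documents into a single prompt;
# the Chroma vector store is not queried in this cell.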
qa_chain = load_qa_chain(chat_model, chain_type="stuff")

# Step 7: Define the question and query the model
question = "ALL ml courses"
response = qa_chain.run(input_documents=[document], question=question)

# Step 8: Print the response
print(response)

Here's a list of all the machine learning (ML) related courses mentioned in the provided text:

* **Machine Learning Summer Training:** An online program to build and enhance programming and machine learning skills.
* **The A to Z of Unsupervised ML:** Covers unsupervised machine learning techniques.
* **Machine Learning Certification Course for Beginners:** Introduces machine learning concepts and techniques.
* **Loan Prediction Practice Problem (Using Python):**  A course focused on solving binary classification problems using a real-life case study.
* **Twitter Sentiment Analysis:**  A project-based course applying machine learning to sentiment analysis.
* **Ensemble Learning and Ensemble Learning Techniques:** A comprehensive course on ensemble learning methods.
* **K-Nearest Neighbors (KNN) Algorithm in Python and R:** Teaches the KNN algorithm in machine learning.
* **Improving Real World RAG Systems:** While not strictly ML, this course involves improving Retrieval-Augmented Gen
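The call above uses `chain_type="stuff"`, which combines all input documents into one prompt together with the question. The sketch below shows the general shape of that step. The prompt wording and the function name `build_stuff_prompt` are hypothetical and are not LangChain's actual template.

```python
def build_stuff_prompt(doc_texts, question):
    """Join every document into one context block and append the question."""
    context = "\n\n".join(doc_texts)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Because the whole document is sent on every call, this approach only works while the text fits in the model's context window.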

In [45]:
question = "ALL tools in data science courses, also give brief description about it"
response = qa_chain.run(input_documents=[document], question=question)

# Step 8: Print the response
print(response)

Based on the provided text, here are some tools mentioned in the data science courses, along with brief descriptions:

* **Python:** A widely used programming language in data science, known for its readability and extensive libraries for data analysis and machine learning.

* **PyTorch:** A popular deep learning framework used for building and training neural networks, particularly favored for its flexibility and ease of use.

* **Pandas:** A Python library crucial for data manipulation and analysis.  It provides data structures like DataFrames for efficient data handling.

* **Tableau:** A leading business intelligence and data visualization tool. It allows users to create interactive dashboards and visualizations without extensive coding.

* **Microsoft Excel:** A spreadsheet program widely used for data analysis, especially for simpler tasks and initial data exploration.  It includes a variety of formulas and functions for data manipulation.

* **LlamaIndex:** A framework for build