In [None]:
! pip install -q --upgrade google-generativeai langchain-google-genai python-dotenv

In [None]:
# Create a .env file in the workspace that holds the API key.
# Get a key at https://makersuite.google.com/ and never commit a real key to a notebook.

!echo 'GOOGLE_API_KEY=YOUR_API_KEY' > .env

In [None]:
!ls -a

.  ..  .config	data  .env  sample_data


In [None]:
from dotenv import load_dotenv
load_dotenv()

True

In [None]:
from IPython.display import display
from IPython.display import Markdown
import textwrap


def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))
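
To see what `to_markdown` hands to `Markdown`, the same string steps can be run without IPython. This is a minimal sketch; `quote_text` is a hypothetical helper name, not part of the notebook.

```python
import textwrap


def quote_text(text: str) -> str:
    """Apply the same string steps as to_markdown, without wrapping in Markdown."""
    # Turn bullet characters into Markdown list markers.
    text = text.replace("•", "  *")
    # Prefix every line, blank ones included, so the reply renders as one blockquote.
    return textwrap.indent(text, "> ", predicate=lambda _: True)


quoted = quote_text("Title\n\n• first\n• second")
print(quoted)
```

The `predicate=lambda _: True` argument matters: by default `textwrap.indent` skips whitespace-only lines, which would break the blockquote at every blank line.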

In [None]:
import google.generativeai as genai

In [None]:
import os
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
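
If the `.env` file is missing or the variable is misspelled, `os.environ.get` returns `None`, and the problem may only surface at the first API call. A small guard can fail earlier with a clearer message. This is a sketch; `require_api_key` is a hypothetical helper, not part of any library.

```python
import os


def require_api_key(env=os.environ, name="GOOGLE_API_KEY"):
    """Return the key stored under `name`, or raise if it is missing or empty."""
    key = env.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; check the .env file")
    return key


# Demonstration with a plain dict standing in for os.environ.
demo_key = require_api_key({"GOOGLE_API_KEY": "dummy-value"})
print(demo_key)
```

In the notebook this could be used as `genai.configure(api_key=require_api_key())`.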

In [None]:
model = genai.GenerativeModel(model_name="gemini-pro")
model

genai.GenerativeModel(
    model_name='models/gemini-pro',
    generation_config={},
    safety_settings={},
    tools=None,
    system_instruction=None,
    cached_content=None
)

In [None]:
prompt = [
    "What is Mixture of Experts?",
]

response = model.generate_content(prompt)

In [None]:
to_markdown(response.text)

> **Mixture of Experts (MoE)**
> 
> The Mixture of Experts (MoE) model is an ensemble learning technique that utilizes multiple specialized sub-models (experts) to make predictions. Instead of combining the sub-models' predictions through majority voting or averaging, MoE trains a gating network to determine the optimal weight for each expert for a given input.
> 
> **How it Works:**
> 
> 1. **Expert Sub-Models:** The MoE consists of multiple sub-models (experts) that are each specialized in solving a specific sub-task or capturing certain features of the data.
> 
> 2. **Gating Network:** A separate gating network is trained to predict the weight (probability) of each expert on a given input. The gating network ensures that the experts contribute to the final prediction in a way that aligns with their individual strengths.
> 
> 3. **Mixture Prediction:** The final prediction is a weighted sum of the predictions from each expert, with the weights determined by the gating network. This mixture model adapts dynamically to different inputs, allowing for more complex and tailored responses.
> 
> **Advantages:**
> 
> * **Increased Model Capacity:** MoE enhances model capacity by leveraging the collective knowledge of multiple experts, leading to improved accuracy and generalization.
> * **Specialization and Flexibility:** Experts can be trained to address specific sub-tasks or capture different aspects of the data, resulting in a more specialized and flexible model.
> * **Scalability:** MoE can be scaled up to large numbers of experts, increasing its predictive power and applicability to complex problems.
> 
> **Applications:**
> 
> MoE has been successfully applied in various domains, including:
> 
> * Natural Language Processing (NLP)
> * Computer Vision
> * Speech Recognition
> * Machine Translation
> * Recommendation Systems
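
The gating idea in the response above (a gating network assigns a weight to each expert, and the prediction is the weighted sum of expert outputs) can be sketched in plain Python. This is a toy illustration under stated assumptions: the experts are hand-written functions, the gate is a linear score per expert followed by a softmax, and none of the names or numbers come from a real MoE model.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_predict(x, experts, gate_params):
    """Return (prediction, weights) for a scalar input x."""
    # Gating network: one linear score per expert (assumed form).
    scores = [w * x + b for (w, b) in gate_params]
    weights = softmax(scores)
    # Every expert makes its own prediction on the same input.
    outputs = [expert(x) for expert in experts]
    # Final prediction: weighted sum of the expert outputs.
    prediction = sum(w * o for w, o in zip(weights, outputs))
    return prediction, weights


# Two toy experts and a gate that prefers the first expert for positive x.
experts = [lambda x: 2.0 * x, lambda x: x + 10.0]
gate_params = [(1.0, 0.0), (-1.0, 0.0)]

prediction, weights = moe_predict(3.0, experts, gate_params)
print(prediction, weights)
```

Large sparse MoE models typically evaluate only the top-scoring experts for each input; this dense version evaluates all of them to keep the sketch short.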

LangChain


In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI

In [None]:
llm = ChatGoogleGenerativeAI(model="gemini-pro")

In [None]:
result = llm.invoke("What is Mixture of Experts?")

In [None]:
to_markdown(result.content)

> **Mixture of Experts (MoE)**
> 
> **Definition:**
> 
> Mixture of Experts is a machine learning technique that combines multiple "expert" models to improve prediction accuracy. Each expert model is trained on a different subset of the data and specializes in a particular aspect of the problem.
> 
> **Architecture:**
> 
> * **Input:** A feature vector representing the input data.
> * **Gating Network:** A neural network that determines the weights for each expert model.
> * **Expert Models:** Multiple neural networks, each trained on a different subset of the data.
> * **Output:** A weighted combination of the predictions from the expert models.
> 
> **How it Works:**
> 
> 1. The input data is passed through the gating network, which outputs a set of weights for each expert model.
> 2. Each expert model makes a prediction based on its weighted input.
> 3. The weighted predictions from the expert models are combined to produce the final output.
> 
> **Advantages:**
> 
> * **Improved accuracy:** MoE can achieve higher prediction accuracy by leveraging the specialized knowledge of multiple experts.
> * **Robustness:** MoE is more robust to noise and outliers in the data because the predictions are based on a combination of models.
> * **Scalability:** MoE can be scaled to handle large datasets by adding more expert models.
> 
> **Applications:**
> 
> MoE is used in a wide range of applications, including:
> 
> * **Image classification:** By training expert models on different object categories.
> * **Natural language processing:** By training expert models on different aspects of language, such as syntax and semantics.
> * **Medical diagnosis:** By training expert models on different diseases and symptoms.
> 
> **Related Techniques:**
> 
> * **Weighted Ensemble Learning:** A simpler technique that combines multiple base models using fixed weights.
> * **Hierarchical Mixture of Experts:** An extension of MoE where the expert models are organized in a hierarchical structure.

GEMINI PRO VISION


In [None]:
from langchain_core.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
# Example: send a text question and an image URL in a single message
message = HumanMessage(
    content=[
        {
            "type": "text",
            "text": "What's in this image?",
        },
        {"type": "image_url", "image_url": "https://picsum.photos/seed/picsum/200/300"},
    ]
)
llm.invoke([message]).content

"That's a beautiful landscape photo!  It depicts a snow-covered mountain peak at either sunrise or sunset.  The sky is a soft pastel palette of pinks, oranges, and purples.  In the foreground, there's a gently sloping expanse of snow-covered land.  The overall feeling is one of serenity and vastness."

CHATS WITH DOCUMENTS


In [None]:
!sudo apt -y -qq install tesseract-ocr libtesseract-dev

!sudo apt-get -y -qq install poppler-utils libxml2-dev libxslt1-dev antiword unrtf pstotext flac ffmpeg lame libmad0 libsox-fmt-mp3 sox libjpeg-dev swig

!pip install langchain

libtesseract-dev is already the newest version (4.1.1-2.1build1).
tesseract-ocr is already the newest version (4.1.1-2.1build1).
0 upgraded, 0 newly installed, 0 to remove and 19 not upgraded.


In [None]:
!pip install -U langchain-community



In [None]:
import urllib.request
import warnings
from pathlib import Path as p
from pprint import pprint

import pandas as pd
from langchain_core.prompts import PromptTemplate
from langchain.chains.question_answering import load_qa_chain
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

warnings.filterwarnings("ignore")


In-Context Information Retrieval

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI

In [None]:
model = ChatGoogleGenerativeAI(model="gemini-pro",
                             temperature=0.3)

DOWNLOAD THE CONTENT


In [None]:
data_folder = p.cwd() / "data"
p(data_folder).mkdir(parents=True, exist_ok=True)

pdf_url = "https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf"
pdf_file = str(p(data_folder, pdf_url.split("/")[-1]))

urllib.request.urlretrieve(pdf_url, pdf_file)

('/content/data/practitioners_guide_to_mlops_whitepaper.pdf',
 <http.client.HTTPMessage at 0x78b0a15f0e10>)
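
Re-running the download cell fetches the PDF again each time. A sketch of deriving the local path from the URL, so the cell can check for an existing file first, is shown here. `local_path_for` is a hypothetical helper; the existence check appears only as a comment so the example runs without network access.

```python
from pathlib import Path
from urllib.parse import urlparse


def local_path_for(url, folder):
    """Build folder/<last URL path segment>, ignoring any query string."""
    name = Path(urlparse(url).path).name
    return Path(folder) / name


target = local_path_for(
    "https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf",
    "data",
)
print(target)

# In the notebook, the download could then be guarded like this:
# if not target.exists():
#     urllib.request.urlretrieve(pdf_url, str(target))
```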

EXTRACT TEXT FROM PDF


In [None]:
!pip install pypdf



In [None]:
pdf_loader = PyPDFLoader(pdf_file)
pages = pdf_loader.load_and_split()
print(pages[3].page_content)

4
Organizations can use the framework to identify gaps in building an integrated ML platform and to focus on the scale 
and automate themes from Google’s AI Adoption Framework. The decision about whether (or to which degree) to 
adopt each of these processes and capabilities in your organization depends on your business context. For exam-
ple, you must determine the business value that the framework creates when compared to the cost of purchasing or 
building capabilities (for example, the cost in engineering hours).
Overview of MLOps lifecycle 
and core capabilities
Despite the growing recognition of AI/ML as a crucial pillar of digital transformation, successful deployments and 
effective operations are a bottleneck for getting value from AI. Only one in two organizations has moved beyond 
pilots and proofs of concept. Moreover, 72% of a cohort of organizations that began AI pilots before 2019 have not 
been able to deploy even a single application in production.1 Algorithmia’s surve

In [None]:
context = "\n".join(str(page.page_content) for page in pages[:30])
print("The total characters in the context: ", len(context))

The total characters in the context:  55372
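
Note that `len()` on a string returns the number of characters, not words. A short sketch of reporting both figures follows; `text_stats` is a hypothetical helper, and splitting on whitespace is only a rough word count.

```python
def text_stats(text):
    """Return character and whitespace-delimited word counts for a string."""
    return {"characters": len(text), "words": len(text.split())}


stats = text_stats("MLOps is a methodology for ML engineering")
print(stats)
```

Neither figure is a token count; model context limits are measured in tokens, so both are only rough proxies for prompt size.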


Prompt Design - In Context

In [None]:
prompt_template = """Answer the question as precisely as possible using the provided context. If the answer is
                    not contained in the context, say "answer not available in context" \n\n
                    Context: \n {context}\n
                    Question: \n {question} \n
                    Answer:
                  """

prompt = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

In [None]:
stuff_chain = load_qa_chain(model, chain_type="stuff", prompt=prompt)
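
Conceptually, the `stuff` chain type concatenates all input documents into one context string and fills the prompt a single time. The sketch below shows that idea in plain Python. It is not LangChain's implementation; `TEMPLATE`, `stuff_prompt`, and the separator default are illustrative assumptions.

```python
TEMPLATE = "Context:\n{context}\n\nQuestion:\n{question}\n\nAnswer:"


def stuff_prompt(documents, question, separator="\n\n"):
    """Join every document into one context block and fill the template once."""
    context = separator.join(documents)
    return TEMPLATE.format(context=context, question=question)


filled = stuff_prompt(
    ["Page one text.", "Page two text."],
    "What is on page two?",
)
print(filled)
```

Because everything goes into one prompt, this approach only works while the combined documents fit in the model's context window, which is why the notebook passes a small slice of pages.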

In [None]:
question = "What is Experimentation? Provide a detailed answer."


stuff_answer = stuff_chain(
    {"input_documents": pages[7:10], "question": question}, return_only_outputs=True
)

In [None]:
pprint(stuff_answer)

{'output_text': 'Experimentation is the core activity during the ML '
                'development phase. Data scientists and ML researchers '
                'prototype model architectures and training routines, create '
                'labeled datasets, and use features and other reusable ML '
                'artifacts that are governed through the data and model '
                'management process. The primary output of this process is a '
                'formalized training procedure, which includes data '
                'preprocessing, model architecture, and model training '
                'settings.'}


In [None]:
question = "Describe data management and feature management systems."


stuff_answer = stuff_chain(
    {"input_documents": pages[7:10], "question": question}, return_only_outputs=True
)

pprint(stuff_answer)

{'output_text': 'Answer not available in context'}


RAG Pipeline: Embedding + LLM


In [None]:
!pip install langchain-google-genai



In [None]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=0)
context = "\n\n".join(str(page.page_content) for page in pages)
texts = text_splitter.split_text(context)
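
To build intuition for `chunk_size` and `chunk_overlap`, here is a deliberately simplified splitter that cuts purely by character position. It is an assumption-laden stand-in: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks, so its chunk boundaries will differ. `split_fixed` is a hypothetical name.

```python
def split_fixed(text, chunk_size, chunk_overlap=0):
    """Cut text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


chunks = split_fixed("abcdefghij", chunk_size=4, chunk_overlap=0)
print(chunks)

overlapping = split_fixed("abcdefghij", chunk_size=4, chunk_overlap=1)
print(overlapping)
```

With `chunk_overlap=0`, as in the notebook, a sentence that straddles a boundary is split across two chunks; a small overlap is a common way to soften that.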

In [None]:
# texts


In [None]:
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

In [None]:
!pip install chromadb



In [None]:
vector_index = Chroma.from_texts(texts, embeddings).as_retriever()


In [None]:
question = "Describe data management and feature management systems."
docs = vector_index.invoke(question)
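
The retriever call hides a similarity search: the question is embedded, compared with every stored chunk embedding, and the closest chunks are returned. The toy sketch below uses hand-made 3-dimensional vectors. Cosine similarity is an assumption chosen for illustration, so the metric Chroma actually uses may differ, and `top_k` is a hypothetical helper.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the texts of the k entries most similar to query_vec."""
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Made-up embeddings; real ones come from the embedding model.
index = [
    ("feature store", [0.9, 0.1, 0.0]),
    ("model serving", [0.1, 0.9, 0.0]),
    ("data management", [0.8, 0.2, 0.1]),
]

results = top_k([1.0, 0.0, 0.0], index, k=2)
print(results)
```

Only the retrieved chunks are passed to the chain, which is how the RAG pipeline can answer a question that a fixed slice of pages could not.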

In [None]:
docs

[Document(metadata={}, page_content='26\nThe serving engine can serve predictions to consumers in the following \nforms:\n• Online inference in near real time for high-frequency singleton \nrequests (or mini batches of requests), using interfaces like REST \nor gRPC.\n• Streaming inference in near real time, such as through an \nevent-processing pipeline.\n• Offline batch inference for bulk data scoring, usually integrated \nwith extract, transform, load (ETL) processes.\n• Embedded inference as part of embedded systems or edge devic-\nes.\nIn some scenarios of prediction serving, the serving engine might need \nto look up feature values that are related to the request. For example, you \nmight have a model that predicts the propensity of a customer to buy a \nparticular product, given a set of customer and product features. However, \nthe request includes only the customer and the product identifier. There-\nfore, the serving engine uses these identifiers to fetch the customer and \nt

In [None]:
stuff_answer = stuff_chain(
    {"input_documents": docs, "question": question}, return_only_outputs=True
)

In [None]:
pprint(stuff_answer)

{'output_text': 'Data and feature management helps mitigate such issues by '
                'providing \n'
                'a unified repository for ML features and datasets. Figure 12 '
                'shows how the feature and dataset repository provides \n'
                'the same set of data entities for multiple uses in the MLOps '
                'environment.\n'
                'As the diagram shows, the features and datasets are created, '
                'discovered, and reused in different experiments. Batch \n'
                'serving of the data is used for experimentation, continuous '
                'training, and batch prediction, while online serving of the \n'
                'data is used for real-time prediction use cases.\n'
                'Feature management\n'
                'Features are attributes of business entities that are '
                'cleansed and prepared based on standard business rules—ag-\n'
                'gregations, derivations, flags,