In [1]:
import os

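# Select the pure-Python protobuf implementation. This must be set before
# any library that depends on protobuf is imported.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"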

In [2]:
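# The loaders live in langchain_community; the langchain.document_loaders path is deprecated.
from langchain_community.document_loaders import PyPDFLoader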

loader = PyPDFLoader("/sandbox/notebooks/data/1811.12808v3.pdf")
pages = loader.load()

In [3]:
# Drop the last two pages of the PDF (49 pages in total, 47 kept).
pages = pages[:-2]
len(pages)

47

In [4]:
from IPython.display import Markdown

page = pages[0]
Markdown(page.page_content[0:1000])

Model Evaluation, Model Selection, and Algorithm
Selection in Machine Learning
Sebastian Raschka
University of Wisconsin–Madison
Department of Statistics
November 2018
sraschka@wisc.edu
Abstract
The correct use of model evaluation, model selection, and algorithm selection
techniques is vital in academic machine learning research as well as in many
industrial settings. This article reviews different techniques that can be used for
each of these three subtasks and discusses the main advantages and disadvantages
of each technique with references to theoretical and empirical studies. Further,
recommendations are given to encourage best yet feasible practices in research and
applications of machine learning. Common methods such as the holdout method
for model evaluation and selection are covered, which are not recommended
when working with small datasets. Different ﬂavors of the bootstrap technique
are introduced for estimating the uncertainty of performance estimates, as an
alternative to 

In [5]:
page.metadata

{'producer': 'pdfTeX-1.40.21',
 'creator': 'LaTeX with hyperref',
 'creationdate': '2020-11-12T01:17:31+00:00',
 'author': '',
 'keywords': '',
 'moddate': '2020-11-12T01:17:31+00:00',
 'ptex.fullbanner': 'This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) kpathsea version 6.3.2',
 'subject': '',
 'title': '',
 'trapped': '/False',
 'source': '/sandbox/notebooks/data/1811.12808v3.pdf',
 'total_pages': 49,
 'page': 0,
 'page_label': '1'}

In [6]:
import os
import pickle

from langchain_experimental.text_splitter import SemanticChunker
from langchain_google_genai import GoogleGenerativeAIEmbeddings

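# Semantic chunking embeds every sentence, so cache the chunks on disk
# and reuse them on later runs instead of calling the embedding API again.
if os.path.exists("docs.pkl"):
    with open("docs.pkl", "rb") as f:
        docs = pickle.load(f)
else:
    # Split where the embedding distance between neighbouring sentences
    # exceeds a percentile threshold.
    semantic_splitter = SemanticChunker(
        GoogleGenerativeAIEmbeddings(model="models/embedding-001"),
        breakpoint_threshold_type="percentile",
    )
    docs = semantic_splitter.split_documents(pages)
    with open("docs.pkl", "wb") as f:
        pickle.dump(docs, f)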

len(docs)

152
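
The percentile rule used by the splitter above can be illustrated with a small, dependency-free sketch. It is a simplification, not `SemanticChunker`'s actual implementation: the toy 2-D vectors stand in for real embeddings, the 95th-percentile cutoff `q` is an assumed value, and the helper names (`cosine_distance`, `percentile`, `split_at_breakpoints`) are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def split_at_breakpoints(sentences, vectors, q=95.0):
    """Start a new chunk wherever the distance between neighbouring
    sentence vectors exceeds the q-th percentile of all such distances."""
    if len(sentences) < 2:
        return list(sentences)
    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, q)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


# Toy data (assumed): three sentences on topic A, then two on topic B.
sentences = ["A1", "A2", "A3", "B1", "B2"]
vectors = [(1.0, 0.0), (0.99, 0.1), (0.98, 0.15), (0.0, 1.0), (0.1, 0.99)]
chunks = split_at_breakpoints(sentences, vectors)
print(chunks)
```

Only the jump between `A3` and `B1` exceeds the threshold here, so the toy input splits into two chunks. On real text the number of chunks depends on how many distances exceed the chosen percentile.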

In [7]:
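from langchain_community.vectorstores import Chroma

persist_directory = "docs/chroma/"

# Use one embedding object for building the index and for reloading it later.
embedding = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

vectordb = Chroma.from_documents(
    documents=docs,
    embedding=embedding,
    persist_directory=persist_directory,
)
# vectordb.persist()
# To reload the persisted index in a later session instead of rebuilding it:
# vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)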

In [8]:
question = "linear regression"
# Use a separate name so the chunk list `docs` from In [6] is not overwritten.
results = vectordb.similarity_search(question, k=3)

In [9]:
Markdown(results[0].page_content)

regression, the model parameters, are updated by maximizing a log-likelihood function or minimizing
the logistic cost. For ﬁtting a model to the training data, a hyperparameter of a logistic regression
algorithm could be the number of iterations or passes over the training set (epochs) in gradient-based
optimization. Another example of a hyperparameter would be the value of a regularization parameter
such as the lambda-term in L2-regularized logistic regression (Figure 11). Figure 11: Conceptual overview of logistic regression. Changing the hyperparameter values when running a learning algorithm over a training set may result
in different models.
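
The top hit above is about logistic regression although the query was "linear regression": similarity search returns the nearest chunks in embedding space, not keyword matches. A minimal brute-force sketch of that ranking follows. It assumes cosine similarity and toy 2-D vectors; the vector store itself may use a different metric and an approximate index, and `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed, k=3):
    """Rank (text, vector) pairs by similarity to the query, best first."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in indexed
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy index (assumed): labels and vectors are made up for illustration.
indexed = [
    ("logistic regression chunk", (0.9, 0.1)),
    ("bootstrap chunk", (0.1, 0.9)),
    ("holdout chunk", (0.5, 0.5)),
]
query_vector = (1.0, 0.0)  # stands in for the embedded query

hits = top_k(query_vector, indexed, k=2)
for score, text in hits:
    print(f"{score:.3f}  {text}")
```

Because the ranking always returns `k` results, a query about a topic the paper does not cover still retrieves something. Checking the scores, for example with the store's `similarity_search_with_score` method, helps judge whether the hits are actually relevant.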

In [10]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")

In [11]:
from langchain.prompts import PromptTemplate

template = """
    You are an assistant for question-answering tasks. 
    Use the following pieces of retrieved context to answer the question. 
    If you don't know the answer, just say that you don't know. 
    Use five sentences maximum and keep the answer concise.
    {context}
    Question: {question}
    Helpful Answer:
"""

QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
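
To see what the model eventually receives, the template can be filled by hand. The sketch below uses plain `str.format` and joins the chunks with blank lines, which is roughly what a "stuff"-style chain does with the retrieved documents. The shortened template, the example chunks, and the `build_prompt` helper are assumptions made for illustration.

```python
# Shortened stand-in for the template above, with the same two placeholders.
template = (
    "Use the following pieces of retrieved context to answer the question.\n"
    "If you don't know the answer, just say that you don't know.\n"
    "{context}\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def build_prompt(template, chunks, question):
    """Join the retrieved chunks and substitute both placeholders."""
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)


# Example inputs (made up): two short chunks and one question.
chunks = [
    "The holdout method splits a dataset into a training and a test set.",
    "The bootstrap estimates the uncertainty of a performance estimate.",
]
prompt = build_prompt(template, chunks, "What is the holdout method?")
print(prompt)
```

Every retrieved chunk lands in the `{context}` slot, so the prompt length grows with the number and size of the chunks the retriever returns.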

In [12]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)

In [13]:
query = "Are transformers discussed in these documents?"
result = qa_chain.invoke({"query": query})

In [14]:
Markdown(result["result"])

No, transformers are not discussed in these documents. The provided text focuses on model parameters like weight coefficients and bias terms, and introduces the holdout method for model evaluation, including splitting datasets into training and test sets.

In [15]:
query = "What statistical tests are discussed in these documents?"
result = qa_chain.invoke({"query": query})

In [16]:
Markdown(result["result"])

The documents discuss a general procedure for various hypothesis tests. Specifically, they mention computing the z-score as a test statistic within this procedure. The text also refers to the Binomial Test and Multiple Hypotheses Testing.
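
Since the chain was built with `return_source_documents=True`, each result also carries the retrieved chunks under `"source_documents"`, which makes it possible to check what an answer was based on. The helper below is a sketch: `summarize_sources` is a hypothetical name, it assumes the chunks keep the `page_label` or `page` metadata shown earlier, and the `SimpleNamespace` objects are stand-ins for real documents so that the example runs on its own.

```python
from types import SimpleNamespace


def summarize_sources(result, preview_chars=80):
    """Return one 'p. <page>: <preview>' line per source document."""
    lines = []
    for doc in result["source_documents"]:
        page = doc.metadata.get("page_label", doc.metadata.get("page", "?"))
        # Collapse whitespace so PDF line breaks do not clutter the preview.
        preview = " ".join(doc.page_content.split())[:preview_chars]
        lines.append(f"p. {page}: {preview}")
    return lines


# Stand-in result (made up) with the same shape as the chain output.
fake_result = {
    "result": "placeholder answer",
    "source_documents": [
        SimpleNamespace(
            page_content="first line\nsecond line",
            metadata={"page": 0, "page_label": "1"},
        ),
        SimpleNamespace(page_content="no page info", metadata={}),
    ],
}

for line in summarize_sources(fake_result):
    print(line)
```

In the notebook, calling `summarize_sources(result)` after `qa_chain.invoke(...)` would list the pages behind the answer above.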