# Installing packages

- langchain: a framework for building applications on top of large language models
- openai: the Python client for the OpenAI API
- tqdm: a library that shows the progress of a long-running action (downloading, training, ...)
- jq: a lightweight and flexible JSON processor
- unstructured: a library that prepares raw documents for downstream ML tasks
- pypdf: a pure-Python PDF library capable of splitting, merging, cropping, and transforming PDF files
- tiktoken: a fast open-source tokenizer by OpenAI

In [1]:
%pip install langchain openai tqdm jq unstructured pypdf tiktoken

Note: you may need to restart the kernel to use updated packages.


# Loading documents

In [2]:
from langchain.document_loaders import (
    UnstructuredCSVLoader,
    UnstructuredHTMLLoader,
    UnstructuredImageLoader,
    PythonLoader,
    PyPDFLoader,
    JSONLoader
)

from langchain.document_loaders.csv_loader import CSVLoader

file_path = "/Users/damienbenveniste/Projects/Teaching/Introduction_Langchain/data/csv_data/weather.csv"

csv_loader = CSVLoader(file_path=file_path)
weather_data = csv_loader.load()


In [58]:
weather_data[0]

Document(page_content='country: Afghanistan\ncapital: Kabul\ndate: 1966-03-02\nseason: winter\navg_temp_c: 7.1\nmin_temp_c: \nmax_temp_c: \nprecipitation_mm: \nsnow_depth_mm: \navg_wind_dir_deg: \navg_wind_speed_kmh: \npeak_wind_gust_kmh: \navg_sea_level_pres_hpa: \nsunshine_total_min: ', metadata={'source': '/Users/damienbenveniste/Projects/Teaching/Introduction_Langchain/data/csv_data/weather.csv', 'row': 0})
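The loader output above has a simple shape: one document per CSV row, each cell rendered as a `column: value` line, and the row index kept in the metadata. The sketch below reproduces that shape with the standard library only. It is an illustration rather than LangChain's implementation; `rows_to_documents` and the sample data are hypothetical.

```python
import csv
import io


def rows_to_documents(csv_text, source="weather.csv"):
    """Turn each CSV row into a dict holding 'page_content' and 'metadata'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_index, row in enumerate(reader):
        # One "column: value" line per field; empty cells stay empty.
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append(
            {
                "page_content": content,
                "metadata": {"source": source, "row": row_index},
            }
        )
    return documents


# Hypothetical two-line sample with one missing value.
sample = (
    "country,capital,date,avg_temp_c,min_temp_c\n"
    "Afghanistan,Kabul,1966-03-02,7.1,\n"
)
docs = rows_to_documents(sample)
print(docs[0]["page_content"])
print(docs[0]["metadata"])
```

Because every row becomes its own document, a large CSV yields a very large number of small documents, which matters once they are sent to a model one by one.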

In [7]:
import pandas as pd

df = pd.read_csv(file_path)

df

Unnamed: 0,country,capital,date,season,avg_temp_c,min_temp_c,max_temp_c,precipitation_mm,snow_depth_mm,avg_wind_dir_deg,avg_wind_speed_kmh,peak_wind_gust_kmh,avg_sea_level_pres_hpa,sunshine_total_min
0,Afghanistan,Kabul,1966-03-02,winter,7.1,,,,,,,,,
1,Afghanistan,Kabul,1966-03-28,spring,7.9,,,,,,,,,
2,Afghanistan,Kabul,1966-05-02,spring,18.8,,22.2,,,,,,,
3,Afghanistan,Kabul,1966-05-04,spring,19.7,,27.2,,,,,,,
4,Afghanistan,Kabul,1966-05-18,spring,24.6,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5001181,Zimbabwe,Harare,2023-08-09,winter,15.3,7.9,23.0,0.0,,124.0,13.1,,1024.0,
5001182,Zimbabwe,Harare,2023-08-10,winter,15.5,10.4,20.8,0.0,,109.0,17.9,,1024.8,
5001183,Zimbabwe,Harare,2023-08-11,winter,15.5,9.5,22.0,0.0,,105.0,11.1,,1023.1,
5001184,Zimbabwe,Harare,2023-08-12,winter,16.4,9.1,22.4,0.0,,94.0,8.7,,1020.6,


In [8]:
file_path = '/Users/damienbenveniste/Projects/Teaching/Introduction_Langchain/data/mixed_data/element_of_SL.pdf'

sl_loader = PyPDFLoader(file_path=file_path)
sl_data = sl_loader.load_and_split()

In [59]:
sl_data[0]

Document(page_content='Springer Series in Statistics\nTrevor Hastie\nRobert TibshiraniJerome FriedmanSpringer Series in Statistics\nThe Elements of\nStatistical Learning\nData Mining, Inference, and Prediction\nThe Elements of Statistical LearningDuring the past decade there has been an explosion in computation and information tech-\nnology. With it have come vast amounts of data in a variety of fields such as medicine, biolo-gy, finance, and marketing. The challenge of understanding these data has led to the devel-opment of new tools in the field of statistics, and spawned new areas such as data mining,machine learning, and bioinformatics. Many of these tools have common underpinnings butare often expressed with different terminology. This book describes the important ideas inthese areas in a common conceptual framework. While the approach is statistical, theemphasis is on concepts rather than mathematics. Many examples are given, with a liberaluse of color graphics. It should be a va

In [13]:
from langchain.text_splitter import (
    CharacterTextSplitter,
    RecursiveCharacterTextSplitter
)

# CharacterTextSplitter splits on a single separator ("\n\n" by default)
splitter1 = CharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=0,
)

# RecursiveCharacterTextSplitter tries the separators ["\n\n", "\n", " ", ""] in order
splitter2 = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=0,
)


sl_data1 = sl_loader.load_and_split(text_splitter=splitter1)
sl_data2 = sl_loader.load_and_split(text_splitter=splitter2)

In [25]:
len(sl_data1[600].page_content)

2164

In [18]:
len(sl_data2[1].page_content)

999
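The two lengths above differ because a single-separator splitter cannot break a piece that contains no separator, so a chunk can exceed `chunk_size`, while a recursive splitter falls back to finer separators until every chunk fits. The following is a simplified sketch of that idea in plain Python, not the LangChain code; the function names and the toy text are hypothetical, and the final fallback is a hard cut instead of the `""` separator.

```python
def split_on_separator(text, separator, chunk_size):
    """Split on one separator, then greedily merge pieces up to chunk_size.

    A piece that is longer than chunk_size on its own is kept whole.
    """
    pieces = [piece for piece in text.split(separator) if piece]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


def split_recursively(text, separators, chunk_size):
    """Retry oversized chunks with the next, finer separator."""
    if len(text) <= chunk_size:
        return [text]
    if not separators:
        # No separator left: cut at fixed positions.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    separator, finer = separators[0], separators[1:]
    chunks = []
    for chunk in split_on_separator(text, separator, chunk_size):
        if len(chunk) <= chunk_size:
            chunks.append(chunk)
        else:
            chunks.extend(split_recursively(chunk, finer, chunk_size))
    return chunks


# Toy text: one long unbroken run, then a paragraph of short words.
text = "a" * 30 + "\n\n" + "word " * 10
simple_chunks = split_on_separator(text, "\n\n", chunk_size=20)
recursive_chunks = split_recursively(text, ["\n\n", "\n", " "], chunk_size=20)

print([len(chunk) for chunk in simple_chunks])
print([len(chunk) for chunk in recursive_chunks])
```

In this toy case the single-separator splitter returns chunks longer than the limit, while every recursive chunk stays within it.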

In [26]:
from langchain.document_loaders import DirectoryLoader

folder_path = '/Users/damienbenveniste/Projects/Teaching/Introduction_Langchain/data/mixed_data/'

mixed_loader = DirectoryLoader(
    path=folder_path,
    use_multithreading=True,
    show_progress=True
)

mixed_data = mixed_loader.load_and_split()

100%|████████████████████████████████████████████| 3/3 [06:13<00:00, 124.47s/it]


In [32]:
mixed_data[0]

Document(page_content='Springer Series in Statistics\n\nSpringer Series in Statistics\n\nH a s t i e • T i b s h i r a n\n\nTrevor Hastie Robert Tibshirani Jerome Friedman\n\ni\n\nF r i e d m a n\n\nTrevor Hastie • Robert Tibshirani • Jerome Friedman The Elements of Statictical Learning\n\nDuring the past decade there has been an explosion in computation and information tech- nology. With it have come vast amounts of data in a variety of fields such as medicine, biolo- gy, finance, and marketing. The challenge of understanding these data has led to the devel- opment of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with 

# Summarizing

## The "stuff" chain

In [63]:
from langchain.chains.summarize import load_summarize_chain
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI()

chain = load_summarize_chain(
    llm=llm,
    chain_type='stuff'
)

chain.run(sl_data[:20])

InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 12499 tokens. Please reduce the length of the messages.
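The error above comes from the "stuff" strategy placing all 20 documents into one prompt. A cheap guard is to estimate the token count before calling the chain. The sketch below is one way to do that under stated assumptions: `rough_token_count` uses a crude four-characters-per-token rule of thumb, and `check_budget` and the `reserved_for_reply` margin are hypothetical. A real tokenizer such as tiktoken can be passed in as `count_tokens` for accurate numbers.

```python
def rough_token_count(text):
    """Crude stand-in for a tokenizer: about four characters per token (assumption)."""
    return max(1, len(text) // 4)


def check_budget(texts, count_tokens=rough_token_count,
                 context_limit=4097, reserved_for_reply=256):
    """Return (estimated_tokens, fits) for sending all texts in one prompt."""
    total = sum(count_tokens(text) for text in texts)
    return total, total <= context_limit - reserved_for_reply


# Hypothetical inputs: five long pages versus one short page.
total_long, fits_long = check_budget(["x" * 4000] * 5)
total_short, fits_short = check_budget(["x" * 400])

print(total_long, fits_long)
print(total_short, fits_short)
```

In the notebook this could be called with `[doc.page_content for doc in sl_data[:20]]` to decide between the "stuff" chain and one of the multi-step chains.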

In [37]:
chain.run(weather_data[:5])

'The given data provides information about the weather conditions in Kabul, Afghanistan during different dates in 1966. The average temperature, maximum temperature, and season are mentioned for each date, while other weather parameters such as minimum temperature, precipitation, snow depth, wind direction and speed, peak wind gust, sea level pressure, and sunshine duration are not provided.'

## Custom prompt

In [39]:
print(chain.llm_chain.prompt.template)

Write a concise summary of the following:


"{text}"


CONCISE SUMMARY:


In [40]:
from langchain.prompts import PromptTemplate

template = """
Write a concise summary of the following in Spanish:

"{text}"

CONCISE SUMMARY IN SPANISH:
"""

prompt = PromptTemplate.from_template(template)

chain = load_summarize_chain(
    llm=llm,
    chain_type='stuff',  # the default, stated explicitly
    prompt=prompt
)

chain.run(sl_data[:2])

'La Serie Springer de Estadística presenta el libro "The Elements of Statistical Learning" escrito por Trevor Hastie, Robert Tibshirani y Jerome Friedman. Esta obra aborda el desafío de comprender grandes cantidades de datos en áreas como la medicina, biología, finanzas y marketing, y presenta herramientas estadísticas y nuevas áreas como la minería de datos, el aprendizaje automático y la bioinformática. El libro cubre desde el aprendizaje supervisado hasta el no supervisado, incluyendo redes neuronales, máquinas de vectores de soporte, árboles de clasificación y aumento. Esta segunda edición incluye nuevos temas como modelos gráficos, bosques aleatorios, métodos de ensamble y algoritmos para el lasso. Los autores son destacados investigadores en estadística y desarrolladores de herramientas de minería de datos.'

## The Map-reduce chain

In [42]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='map_reduce',
    verbose=True
)

chain.run(sl_data[:20])



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"Springer Series in Statistics
Trevor Hastie
Robert TibshiraniJerome FriedmanSpringer Series in Statistics
The Elements of
Statistical Learning
Data Mining, Inference, and Prediction
The Elements of Statistical LearningDuring the past decade there has been an explosion in computation and information tech-
nology. With it have come vast amounts of data in a variety of fields such as medicine, biolo-gy, finance, and marketing. The challenge of understanding these data has led to the devel-opment of new tools in the field of statistics, and spawned new areas such as data mining,machine learning, and bioinformatics. Many of these tools have common underpinnings butare often expressed with different terminology. This book describes the important ideas inthese areas in a common conceptual framework. While the ap


> Finished chain.


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"The book "The Elements of Statistical Learning" is a comprehensive resource that covers various tools and concepts in the field of statistics, data mining, and machine learning. Written by Trevor Hastie, Robert Tibshirani, and Jerome Friedman, the book explores topics such as neural networks, support vector machines, and classification trees. The second edition includes additional topics like graphical models, random forests, and ensemble methods. The authors are prominent researchers in the field and have made significant contributions to statistical modeling and data mining.

The page acknowledges and thanks the parents and families of Valerie and Patrick Hastie, Vera and Sami Tibshirani, and Florence and Harry Friedman. The families mentioned include Samantha, Timothy, and Lynda; Charlie, Ryan, Julie, and Cheryl; and Melanie, Dora, 


> Finished chain.

> Finished chain.


'"The Elements of Statistical Learning" is a comprehensive book on statistics, data mining, and machine learning. It covers various topics such as neural networks, support vector machines, and classification trees. The second edition includes new chapters on graphical models, random forests, and ensemble methods. The book is written by prominent researchers in the field and aims to provide conceptual understanding rather than theoretical properties. It discusses challenges in the field of statistics, the emergence of data mining and bioinformatics, and the goal of learning from data. The book also includes sections on supervised and unsupervised learning, linear methods for regression and classification, kernel smoothing methods, model assessment and selection, boosting and neural networks, and topics in unsupervised learning. It concludes with chapters on random forests, ensemble learning, undirected graphical models, and high-dimensional problems. The book provides exercises for prac
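The verbose log above shows the pattern: each document is summarized on its own (map), then the partial summaries are summarized together (reduce). When the partial summaries are too long for one prompt, an intermediate collapse step can summarize them in groups first, which appears to be what the repeated `StuffDocumentsChain` entries in the 200-row run further down correspond to. The sketch below illustrates that flow with a stand-in model. It is not LangChain's implementation; `map_reduce_summarize`, `group_by_budget`, `fake_llm`, `count_words`, and the `token_max` values are all hypothetical.

```python
TEMPLATE = 'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'


def group_by_budget(texts, count_tokens, token_max):
    """Greedily pack texts into groups whose token total stays within token_max."""
    groups, current, used = [], [], 0
    for text in texts:
        cost = count_tokens(text)
        if current and used + cost > token_max:
            groups.append(current)
            current, used = [], 0
        current.append(text)
        used += cost
    if current:
        groups.append(current)
    return groups


def map_reduce_summarize(texts, llm, template, count_tokens, token_max):
    # Map: one independent call per document.
    summaries = [llm(template.format(text=text)) for text in texts]
    # Collapse: while the summaries do not fit together, summarize them in groups.
    while len(summaries) > 1 and sum(count_tokens(s) for s in summaries) > token_max:
        groups = group_by_budget(summaries, count_tokens, token_max)
        if len(groups) == len(summaries):
            break  # nothing can be merged any further
        summaries = [llm(template.format(text="\n\n".join(group))) for group in groups]
    # Reduce: one final call over the remaining summaries.
    return llm(template.format(text="\n\n".join(summaries)))


calls = []


def fake_llm(prompt):
    """Stand-in for a chat model: records the prompt, returns a fixed string."""
    calls.append(prompt)
    return "short summary"


def count_words(text):
    """Stand-in token counter: whitespace-separated words."""
    return len(text.split())


pages = [f"page {i}" for i in range(6)]

result = map_reduce_summarize(pages, fake_llm, TEMPLATE, count_words, token_max=100)
calls_without_collapse = len(calls)

calls.clear()
map_reduce_summarize(pages, fake_llm, TEMPLATE, count_words, token_max=5)
calls_with_collapse = len(calls)

print(calls_without_collapse, calls_with_collapse)
```

The map calls are independent of each other, so they can run in parallel; the price is that no map call sees the context of its neighbours.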

## Custom prompt

In [46]:
print(chain.llm_chain.prompt.template)

Write a concise summary of the following:


"{text}"


CONCISE SUMMARY:


In [47]:
print(chain.combine_document_chain.llm_chain.prompt.template)

Write a concise summary of the following:


"{text}"


CONCISE SUMMARY:


In [50]:
map_template = """The following is a set of documents

{text}

Based on this list of docs, please identify the main themes 
Helpful Answer:"""

combine_template = """The following is a set of summaries:

{text}

Take these and distill it into a final, consolidated list of the main themes. 
Return that list as a comma separated list. 
Helpful Answer:"""


map_prompt = PromptTemplate.from_template(map_template)
combine_prompt = PromptTemplate.from_template(combine_template)

chain = load_summarize_chain(
    llm=llm,
    chain_type='map_reduce',
    map_prompt=map_prompt,
    combine_prompt=combine_prompt,
    verbose=True
)

chain.run(sl_data[:20])





> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
The following is a set of documents

Springer Series in Statistics
Trevor Hastie
Robert TibshiraniJerome FriedmanSpringer Series in Statistics
The Elements of
Statistical Learning
Data Mining, Inference, and Prediction
The Elements of Statistical LearningDuring the past decade there has been an explosion in computation and information tech-
nology. With it have come vast amounts of data in a variety of fields such as medicine, biolo-gy, finance, and marketing. The challenge of understanding these data has led to the devel-opment of new tools in the field of statistics, and spawned new areas such as data mining,machine learning, and bioinformatics. Many of these tools have common underpinnings butare often expressed with different terminology. This book describes the important ideas inthese areas in a common conceptual framework. While the approach i


> Finished chain.


> Entering new LLMChain chain...
Prompt after formatting:
The following is a set of summaries:

Based on the list of documents, the main themes are:
1. Springer Series in Statistics: This is a series of books on statistics.
2. The Elements of Statistical Learning: This book discusses data mining, inference, and prediction using statistical concepts.
3. Computation and information technology: The explosion of computation and information technology has led to the development of new tools in statistics, such as data mining and machine learning.
4. Broad coverage: The book covers various topics, including neural networks, support vector machines, classification trees, boosting, graphical models, random forests, ensemble methods, and more.
5. Authors' expertise: The authors are prominent researchers in the field of statistics and have contributed to the development of statistical modeling software and various data mining tools.

The main the


> Finished chain.

> Finished chain.


"Springer Series in Statistics, The Elements of Statistical Learning, Computation and information technology, Broad coverage, Authors' expertise, Family, Gratitude, Introduction to the second edition, Overview of Supervised Learning, Linear Methods for Regression, Linear Methods for Classification, Basis Expansions and Regularization, Kernel Smoothing Methods, Model Assessment and Selection, Model Inference and Averaging, Additive Models, Trees, and Related Methods, Boosting and Additive Trees, Neural Networks, Support Vector Machines and Flexible Discriminants, Prototype Methods and Nearest-Neighbors, Unsupervised Learning, Random Forests, Ensemble Learning, Undirected Graphical Models, High-Dimensional Problems, Bootstrap and Maximum Likelihood Methods, Bayesian Methods and their relationship with the Bootstrap, EM Algorithm, MCMC for Sampling from the Posterior, Bagging and Model Averaging, Stochastic Search: Bumping, Radial Basis Functions and Kernels, Mixture Models for Density Es

In [51]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='map_reduce',
    verbose=True
)

chain.run(weather_data[:200])



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"country: Afghanistan
capital: Kabul
date: 1966-03-02
season: winter
avg_temp_c: 7.1
min_temp_c: 
max_temp_c: 
precipitation_mm: 
snow_depth_mm: 
avg_wind_dir_deg: 
avg_wind_speed_kmh: 
peak_wind_gust_kmh: 
avg_sea_level_pres_hpa: 
sunshine_total_min: "


CONCISE SUMMARY:
Prompt after formatting:
Write a concise summary of the following:


"country: Afghanistan
capital: Kabul
date: 1966-03-28
season: spring
avg_temp_c: 7.9
min_temp_c: 
max_temp_c: 
precipitation_mm: 
snow_depth_mm: 
avg_wind_dir_deg: 
avg_wind_speed_kmh: 
peak_wind_gust_kmh: 
avg_sea_level_pres_hpa: 
sunshine_total_min: "


CONCISE SUMMARY:
Prompt after formatting:
Write a concise summary of the following:


"country: Afghanistan
capital: Kabul
date: 1966-05-02
season: spring
avg_temp_c: 18.8
min_temp_c: 


Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised ServiceUnavailableError: The server is overloaded or not ready yet..



> Finished chain.


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"Afghanistan's capital is Kabul and the given information is related to the weather on March 2, 1966, during the winter season. The average temperature was 7.1°C, but there is no data provided for the minimum and maximum temperature, precipitation, snow depth, wind direction, wind speed, peak wind gust, sea level pressure, or total sunshine duration.

In Afghanistan, the capital city is Kabul. The date provided is March 28, 1966, indicating the season as spring. The average temperature is 7.9 degrees Celsius, but specific details for minimum and maximum temperatures, precipitation, snow depth, wind direction and speed, peak wind gust, sea level pressure, and total sunshine minutes are not provided.

On May 2, 1966, during the spring season in Afghanistan, the capital city Kabul exper


> Finished chain.

> Finished chain.


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"In Afghanistan, the capital city is Kabul. The given information does not provide details about the maximum temperature, precipitation, snow depth, average wind direction, average wind speed, peak wind gust, average sea level pressure, or total sunshine minutes. However, the average temperature in autumn is 7.1 degrees Celsius, with a minimum temperature of 3.9 degrees Celsius.

Afghanistan is a country with its capital in Kabul. The date mentioned is October 28, 1967, during the autumn season. The average temperature is 6.8°C, with a minimum of 1.1°C and a maximum of 20.0°C. Information about precipitation, snow depth, wind direction and speed, peak wind gust, sea level pressure, and total sunshine duration is not provided.

Afghanistan is a country with 


> Finished chain.

> Finished chain.


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"In Afghanistan, the capital city Kabul experienced winter on December 26, 1968. The average temperature was -2.2°C, with a minimum temperature of -7.8°C and a maximum temperature of 7.8°C. No information is provided about precipitation, snow depth, wind direction and speed, peak wind gust, average sea level pressure, or total sunshine duration.

In Afghanistan, the capital city is Kabul. The given information is related to the date, season, average temperature, and other weather conditions, but specific values for minimum temperature, maximum temperature, precipitation, snow depth, average wind direction, average wind speed, peak wind gust, average sea level pressure, and total sunshine are missing.

Afghanistan is the country with Kabul as its capital. Th


> Finished chain.

> Finished chain.


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"This is a summary of weather data for Afghanistan on February 15, 1973. The capital city is Kabul, and it is winter season. The average temperature is -1.0°C. No data is provided for minimum and maximum temperature, precipitation, snow depth, average wind direction, average wind speed, peak wind gust, average sea-level pressure, and total sunshine duration.

This is information about Afghanistan, specifically the capital city Kabul, on February 16, 1973 during the winter season. The average temperature is -3.2 degrees Celsius, but there is no information provided for minimum and maximum temperatures, precipitation, snow depth, average wind direction, average wind speed, peak wind gust, average sea level pressure, and total sunshine minutes.

This informati


> Finished chain.

> Finished chain.


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"The information provided is about Afghanistan. The capital is Kabul and the date is April 3, 1973, during the spring season. The average temperature is 13.8 degrees Celsius, but there is no information available for minimum and maximum temperatures, precipitation, snow depth, average wind direction, wind speed, peak wind gust, average sea level pressure, and total sunshine duration.

This is information about Afghanistan, specifically the capital city Kabul. The date is April 4, 1973, which is during the spring season. The average temperature is 7.0 degrees Celsius, but there is no information provided for the minimum and maximum temperatures. There is also no information for precipitation, snow depth, average wind direction, average wind speed, peak wind 


> Finished chain.

> Finished chain.


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"This is a summary of weather data for Afghanistan on May 15, 1973. The capital city is Kabul and it is spring season. The average temperature is 14.8°C and information regarding minimum and maximum temperature, precipitation, snow depth, wind direction and speed, peak wind gust, sea level pressure, and sunshine duration is not provided.

The given information provides details about Afghanistan, specifically the capital city, Kabul, and the date, which is May 17, 1973. The season is stated as spring, and the average temperature is 14.3 degrees Celsius, with a minimum temperature of 11.0 degrees Celsius. However, the maximum temperature, precipitation, snow depth, average wind direction, average wind speed, peak wind gust, average sea level pressure, and tot


> Finished chain.

> Finished chain.


"The provided information includes details about the weather in Afghanistan's capital city, Kabul, on various dates in 1966 and 1967. The data includes the season and average temperature, but lacks information on other weather factors such as precipitation, wind, and sunshine duration."

## The Refine chain

In [52]:
chain = load_summarize_chain(
    llm=llm,
    chain_type='refine',
    verbose=True
)

chain.run(sl_data[:20])



> Entering new RefineDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"Springer Series in Statistics
Trevor Hastie
Robert TibshiraniJerome FriedmanSpringer Series in Statistics
The Elements of
Statistical Learning
Data Mining, Inference, and Prediction
The Elements of Statistical LearningDuring the past decade there has been an explosion in computation and information tech-
nology. With it have come vast amounts of data in a variety of fields such as medicine, biolo-gy, finance, and marketing. The challenge of understanding these data has led to the devel-opment of new tools in the field of statistics, and spawned new areas such as data mining,machine learning, and bioinformatics. Many of these tools have common underpinnings butare often expressed with different terminology. This book describes the important ideas inthese areas in a common conceptual framework. While the appro

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised ServiceUnavailableError: The server is overloaded or not ready yet..



> Finished chain.


> Entering new LLMChain chain...
Prompt after formatting:
Your job is to produce a final summary
We have provided an existing summary up to a certain point: The book "The Elements of Statistical Learning" is a comprehensive guide to data mining, machine learning, and bioinformatics. It covers a broad range of topics, including supervised and unsupervised learning, and includes new topics not covered in the original edition. The authors are professors of statistics at Stanford University and are prominent researchers in the field.
We have the opportunity to refine the existing summary(only if needed) with some more context below.
------------
This is page v
Printer: Opaque this
To our parents:
Valerie and Patrick Hastie
Vera and Sami Tibshirani
Florence and Harry Friedman
and to our families:
Samantha, Timothy, and Lynda
Charlie, Ryan, Julie, and Cheryl
Melanie, Dora, Monika, and Ildiko
------------
Given the new context, refine the orig


> Finished chain.


> Entering new LLMChain chain...
Prompt after formatting:
Your job is to produce a final summary
We have provided an existing summary up to a certain point: Refined summary: The book provides a comprehensive overview of machine learning, covering topics such as supervised learning, regression and classification methods, kernel smoothing, model assessment and selection, additive models and trees, boosting, neural networks, support vector machines, prototype methods, unsupervised learning, random forests, ensemble learning, undirected graphical models, and high-dimensional problems. The second edition also includes chapters on undirected graphical models and learning in high-dimensional feature spaces. The authors have made improvements to address colorblind readers and have taken steps to clarify concepts related to error-rate estimation. The preface also acknowledges the contributions and feedback from various individuals.
We have the o


> Finished chain.


> Entering new LLMChain chain...
Prompt after formatting:
Your job is to produce a final summary
We have provided an existing summary up to a certain point: The book provides a comprehensive overview of machine learning, covering topics such as supervised learning, regression and classification methods, kernel smoothing, model assessment and selection, additive models and trees, boosting, neural networks, support vector machines, prototype methods, unsupervised learning, random forests, ensemble learning, undirected graphical models, and high-dimensional problems. The second edition also includes chapters on undirected graphical models and learning in high-dimensional feature spaces. The authors have made improvements to address colorblind readers and have taken steps to clarify concepts related to error-rate estimation. The preface acknowledges the contributions and feedback from various individuals and emphasizes the importance of sta


> Finished chain.


> Entering new LLMChain chain...
Prompt after formatting:
Your job is to produce a final summary
We have provided an existing summary up to a certain point: The book provides a comprehensive overview of machine learning, covering topics such as supervised learning, regression and classification methods, kernel smoothing, model assessment and selection, additive models and trees, boosting, neural networks, support vector machines, prototype methods, unsupervised learning, random forests, ensemble learning, undirected graphical models, and high-dimensional problems. The second edition also includes chapters on undirected graphical models and learning in high-dimensional feature spaces. The authors have made improvements to address colorblind readers and have taken steps to clarify concepts related to error-rate estimation. The book covers variable types, statistical decision theory, local methods in high dimensions, structured regression 


> Finished chain.


> Entering new LLMChain chain...
Prompt after formatting:
Your job is to produce a final summary
We have provided an existing summary up to a certain point: The book covers a comprehensive range of topics in machine learning, including supervised learning, regression, classification, kernel smoothing, model assessment and selection, additive models, trees, boosting, neural networks, support vector machines, prototype methods, unsupervised learning, random forests, ensemble learning, undirected graphical models, high-dimensional problems, and more. The second edition also includes chapters on undirected graphical models and learning in high-dimensional feature spaces. The authors have made improvements to address colorblind readers and clarify concepts related to error-rate estimation. The book also covers variable types, statistical decision theory, local methods in high dimensions, structured regression models, function approximation, 


> Finished chain.


> Entering new LLMChain chain...
Prompt after formatting:
Your job is to produce a final summary
We have provided an existing summary up to a certain point: The book covers a comprehensive range of topics in machine learning, including supervised learning, regression, classification, kernel smoothing, model assessment and selection, additive models, trees, boosting, neural networks, support vector machines, prototype methods, unsupervised learning, random forests, ensemble learning, undirected graphical models, high-dimensional problems, and more. The second edition also includes chapters on undirected graphical models, learning in high-dimensional feature spaces, support vector machines, flexible discriminants, prototype methods, and nearest-neighbors. The authors have made improvements to address colorblind readers and clarify concepts related to error-rate estimation. The book also covers variable types, statistical decision theory, 


> Finished chain.


> Entering new LLMChain chain...
Prompt after formatting:
Your job is to produce a final summary
We have provided an existing summary up to a certain point: The book covers a comprehensive range of topics in machine learning, including supervised learning, regression, classification, kernel smoothing, model assessment and selection, additive models, trees, boosting, neural networks, support vector machines, prototype methods, unsupervised learning, random forests, ensemble learning, undirected graphical models, high-dimensional problems, and more. The second edition also includes chapters on undirected graphical models, learning in high-dimensional feature spaces, support vector machines, flexible discriminants, prototype methods, and nearest-neighbors. The authors have made improvements to address colorblind readers and clarify concepts related to error-rate estimation. The book also covers variable types, statistical decision theory, 

'The existing summary is already comprehensive and does not require any refinement.'
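The refine strategy walks through the documents in order: the first document produces an initial summary, and every later document is sent together with the current summary so the model can update it. A minimal sketch of that loop with a stand-in model follows. It is not LangChain's implementation; `refine_summarize` and `fake_llm` are hypothetical, and the two templates are the default prompts this notebook prints for the chain.

```python
INITIAL_TEMPLATE = 'Write a concise summary of the following:\n\n\n"{text}"\n\n\nCONCISE SUMMARY:'

REFINE_TEMPLATE = (
    "Your job is to produce a final summary\n"
    "We have provided an existing summary up to a certain point: {existing_answer}\n"
    "We have the opportunity to refine the existing summary"
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{text}\n"
    "------------\n"
    "Given the new context, refine the original summary\n"
    "If the context isn't useful, return the original summary."
)


def refine_summarize(texts, llm, initial_template, refine_template):
    """Summarize the first text, then refine that summary once per remaining text."""
    answer = llm(initial_template.format(text=texts[0]))
    for text in texts[1:]:
        # Each reply replaces the previous summary entirely.
        answer = llm(refine_template.format(existing_answer=answer, text=text))
    return answer


calls = []


def fake_llm(prompt):
    """Stand-in for a chat model: records the prompt, returns a numbered string."""
    calls.append(prompt)
    return f"summary v{len(calls)}"


final = refine_summarize(
    ["page 1", "page 2", "page 3"], fake_llm, INITIAL_TEMPLATE, REFINE_TEMPLATE
)
print(final, len(calls))
```

Because each reply overwrites the running summary, a reply that comments on the summary instead of restating it, like the final output above, loses the accumulated content. The calls are also strictly sequential, so this strategy cannot be parallelized the way the map step can.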

## Custom prompt

In [54]:
print(chain.initial_llm_chain.prompt.template)

Write a concise summary of the following:


"{text}"


CONCISE SUMMARY:


In [55]:
print(chain.refine_llm_chain.prompt.template)

Your job is to produce a final summary
We have provided an existing summary up to a certain point: {existing_answer}
We have the opportunity to refine the existing summary(only if needed) with some more context below.
------------
{text}
------------
Given the new context, refine the original summary
If the context isn't useful, return the original summary.


In [56]:
initial_template = """
Extract the most relevant themes from the following:


"{text}"


THEMES:"""

refine_template = """
Your job is to extract the most relevant themes
We have provided an existing list of themes up to a certain point: {existing_answer}
We have the opportunity to refine the existing list(only if needed) with some more context below.
------------
{text}
------------
Given the new context, refine the original list
If the context isn't useful, return the original list and ONLY the original list.
Return that list as a comma separated list.

LIST:"""

initial_prompt = PromptTemplate.from_template(initial_template)
refine_prompt = PromptTemplate.from_template(refine_template)

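# Build a refine chain: question_prompt handles the first chunk,
# refine_prompt folds each later chunk into the running answer.
chain = load_summarize_chain(
    llm=llm,
    chain_type='refine',
    question_prompt=initial_prompt,
    refine_prompt=refine_prompt,
    verbose=True  # print every formatted prompt
)

# Run on the first 20 loaded documents only
chain.run(sl_data[:20])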



> Entering new RefineDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Extract the most relevant themes from the following:


"Springer Series in Statistics
Trevor Hastie
Robert TibshiraniJerome FriedmanSpringer Series in Statistics
The Elements of
Statistical Learning
Data Mining, Inference, and Prediction
The Elements of Statistical LearningDuring the past decade there has been an explosion in computation and information tech-
nology. With it have come vast amounts of data in a variety of fields such as medicine, biolo-gy, finance, and marketing. The challenge of understanding these data has led to the devel-opment of new tools in the field of statistics, and spawned new areas such as data mining,machine learning, and bioinformatics. Many of these tools have common underpinnings butare often expressed with different terminology. This book describes the important ideas inthese areas in a common conceptual framework. Whi


> Finished chain.


> Entering new LLMChain chain...
Prompt after formatting:
Your job is to extract the most relevant themes
We have provided an existing list of themes up to a certain point: 1. Explosion in computation and information technology, 
2. Vast amounts of data in various fields, 
3. Tools and techniques in statistics, data mining, machine learning, and bioinformatics, 
4. Common conceptual framework for understanding these areas, 
5. Emphasis on concepts rather than mathematics, 
6. Examples and color graphics provided, 
7. Broad coverage, including supervised and unsupervised learning, 
8. Introduction of new topics in the second edition, 
9. Prominent researchers and authors in the field, 
10. Statistical modeling software and environment (R/S-PLUS), 
11. Introduction of various data mining tools and techniques
We have the opportunity to refine the existing list(only if needed) with some more context below.
------------
Preface to the Secon


> Finished chain.


> Entering new LLMChain chain...
Prompt after formatting:
Your job is to extract the most relevant themes
We have provided an existing list of themes up to a certain point: Explosion in computation and information technology, Vast amounts of data in various fields, Tools and techniques in statistics, data mining, machine learning, and bioinformatics, Common conceptual framework for understanding these areas, Emphasis on concepts rather than mathematics, Examples and color graphics provided, Broad coverage, including supervised and unsupervised learning, Introduction of new topics in the second edition, Prominent researchers and authors in the field, Statistical modeling software and environment (R/S-PLUS), Introduction of various data mining tools and techniques
We have the opportunity to refine the existing list(only if needed) with some more context below.
------------
This is page xiii
Printer: Opaque this
Contents
Preface to the Se


> Finished chain.


> Entering new LLMChain chain...
Prompt after formatting:
Your job is to extract the most relevant themes
We have provided an existing list of themes up to a certain point: Explosion in computation and information technology, Vast amounts of data in various fields, Tools and techniques in statistics, data mining, machine learning, and bioinformatics, Common conceptual framework for understanding these areas, Emphasis on concepts rather than mathematics, Examples and color graphics provided, Broad coverage, including supervised and unsupervised learning, Introduction of new topics in the second edition, Prominent researchers and authors in the field, Statistical modeling software and environment (R/S-PLUS), Introduction of various data mining tools and techniques.
We have the opportunity to refine the existing list(only if needed) with some more context below.
------------
xvi Contents
6 Kernel Smoothing Methods 191
6.1 One-Dimensional 


> Finished chain.


> Entering new LLMChain chain...
Prompt after formatting:
Your job is to extract the most relevant themes
We have provided an existing list of themes up to a certain point: Explosion in computation and information technology, Vast amounts of data in various fields, Tools and techniques in statistics, data mining, machine learning, and bioinformatics, Common conceptual framework for understanding these areas, Emphasis on concepts rather than mathematics, Examples and color graphics provided, Broad coverage, including supervised and unsupervised learning, Introduction of new topics in the second edition, Prominent researchers and authors in the field, Statistical modeling software and environment (R/S-PLUS), Introduction of various data mining tools and techniques
We have the opportunity to refine the existing list(only if needed) with some more context below.
------------
Contents xix
Exercises . . . . . . . . . . . . . . . . . . . . . 


> Finished chain.


> Entering new LLMChain chain...
Prompt after formatting:
Your job is to extract the most relevant themes
We have provided an existing list of themes up to a certain point: Explosion in computation and information technology, Vast amounts of data in various fields, Tools and techniques in statistics, data mining, machine learning, and bioinformatics, Common conceptual framework for understanding these areas, Emphasis on concepts rather than mathematics, Examples and color graphics provided, Broad coverage, including supervised and unsupervised learning, Introduction of new topics in the second edition, Prominent researchers and authors in the field, Statistical modeling software and environment (R/S-PLUS), Introduction of various data mining tools and techniques
We have the opportunity to refine the existing list(only if needed) with some more context below.
------------
xxii Contents
18.2 Diagonal Linear Discriminant Analysis
and Near

'Explosion in computation and information technology, Vast amounts of data in various fields, Tools and techniques in statistics, data mining, machine learning, and bioinformatics, Common conceptual framework for understanding these areas, Emphasis on concepts rather than mathematics, Examples and color graphics provided, Broad coverage, including supervised and unsupervised learning, Introduction of new topics in the second edition, Prominent researchers and authors in the field, Statistical modeling software and environment (R/S-PLUS), Introduction of various data mining tools and techniques'
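
The chain returns a single string, so a downstream step has to split it back into items. Several themes in the output above contain commas themselves (for example "Tools and techniques in statistics, data mining, machine learning, and bioinformatics"), so a plain comma split would break them apart. The sketch below illustrates the issue on a short, made-up string. `parse_themes` is a hypothetical helper, and asking the model for a different separator such as `;` is one possible workaround, assumed here rather than taken from the notebook.

```python
from typing import List


def parse_themes(raw: str, sep: str = ",") -> List[str]:
    """Split a delimiter-separated string into stripped, non-empty items."""
    return [item.strip() for item in raw.split(sep) if item.strip()]


# Made-up model output with three intended themes; the last one contains commas
comma_output = "Supervised learning, Unsupervised learning, Statistics, data mining, and bioinformatics"
comma_themes = parse_themes(comma_output)
print(len(comma_themes))  # 5 fragments instead of 3 themes

# Same themes, but with a separator that does not occur inside an item
semicolon_output = "Supervised learning; Unsupervised learning; Statistics, data mining, and bioinformatics"
semicolon_themes = parse_themes(semicolon_output, sep=";")
print(semicolon_themes)
```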

In [68]:
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI

model = ChatOpenAI()

prompt = ChatPromptTemplate.from_template("tell me a joke about {foo}")

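# Compose the prompt and the model with the pipe operator:
# the formatted prompt is passed to the model as input.
chain = prompt | model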

In [72]:
chain.invoke({'foo':'test'})

AIMessage(content='Why did the math book look so sad during the test?\n\nBecause it had too many problems!', additional_kwargs={}, example=False)
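
To build intuition for what `prompt | model` does, here is a toy illustration of pipe-style composition using Python's `__or__` method. It is not LangChain's implementation: the `Step` class, `toy_prompt`, `toy_model`, and the fake reply are all hypothetical stand-ins that only mimic the data flow of the cell above.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # `a | b` yields a new step that runs a, then feeds its output to b
        return Step(lambda x: other.invoke(self.invoke(x)))

    def invoke(self, x):
        return self.fn(x)


# Fill the template from a dict of inputs, like the prompt in the cell above
toy_prompt = Step(lambda inputs: "tell me a joke about {foo}".format(**inputs))
# Fake model (assumption): echoes the prompt instead of calling an API
toy_model = Step(lambda text: f"[fake model reply to: {text}]")

toy_chain = toy_prompt | toy_model
result = toy_chain.invoke({"foo": "test"})
print(result)  # [fake model reply to: tell me a joke about test]
```

In the real chain the final value is an `AIMessage`, as the output above shows, and the generated text is available in its `content` attribute.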