generated from MoSHOTus/MoSHOTus
LLM.txt
1894 lines (1894 loc) · 106 KB
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Retrieval Augmentation for GPT-4 using Pinecone\n",
"\n",
"#### Fixing LLMs that Hallucinate\n",
"\n",
"In this notebook we will learn how to retrieve contexts relevant to our queries from Pinecone and pass them to a GPT-4 model to generate an answer backed by real data sources.\n",
"\n",
"GPT-4 is a big step up from previous OpenAI completion models. It also exclusively uses the `ChatCompletion` endpoint, so we must call it slightly differently than usual. However, the power of the model makes the change worthwhile, particularly when augmented with an external knowledge base like the Pinecone vector database.\n",
"\n",
"Required installs for this notebook are:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "_HDKlQO5svqI",
"outputId": "4a57df82-5e46-4b60-f0c7-c408e3d0f5b0"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
"google-cloud-translate 3.8.4 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.\n",
"google-cloud-language 2.6.1 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.\n",
"google-cloud-firestore 2.7.3 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.\n",
"google-cloud-datastore 2.11.1 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.\n",
"google-cloud-bigquery 3.4.2 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.\n",
"google-cloud-bigquery-storage 2.19.0 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.\n",
"google-api-core 2.11.0 requires protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5, but you have protobuf 3.19.3 which is incompatible.\n"
]
}
],
"source": [
"!pip install -qU bs4 tiktoken openai langchain pinecone-client[grpc]"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preparing the Data"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In this example, we will download the LangChain docs from [langchain.readthedocs.io](https://langchain.readthedocs.io/en/latest/). We fetch every `.html` file on the site like so:"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "xo9gYhGPr_DQ",
"outputId": "f1b00acf-b7f0-48e3-abf6-8b0d8a86ffd2"
},
"outputs": [],
"source": [
"!wget -r -A.html -P rtdocs https://python.langchain.com/en/latest/"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"This downloads all HTML into the `rtdocs` directory. Now we can use LangChain itself to process these docs. We do this using the `ReadTheDocsLoader` like so:"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "n40_0MtlsKgM",
"outputId": "e0978f3f-2b1a-4d95-c8c6-5e73a5f35f56"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
".rst .pdf Welcome to LangChain Contents Getting Started Modules Use Cases Reference Docs LangChain Ecosystem Additional Resources Welcome to LangChain# Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that they previously could not. But using these LLMs in isolation is often not enough to create a truly powerful app - the real power comes when you are able to combine them with other sources of computation or knowledge. This library is aimed at assisting in the development of those types of applications. Common examples of these types of applications include: ❓ Question Answering over specific documents Documentation End-to-end Example: Question Answering over Notion Database 💬 Chatbots Documentation End-to-end Example: Chat-LangChain 🤖 Agents Documentation End-to-end Example: GPT+WolframAlpha Getting Started# Checkout the below guide for a walkthrough of how to get started using LangChain to create an Language Model application. Getting Started Documentation Modules# There are several main modules that LangChain provides support for. For each module we provide some examples to get started, how-to guides, reference docs, and conceptual guides. These modules are, in increasing order of complexity: Prompts: This includes prompt management, prompt optimization, and prompt serialization. LLMs: This includes a generic interface for all LLMs, and common utilities for working with LLMs. Document Loaders: This includes a standard interface for loading documents, as well as specific integrations to all types of text data sources. Utils: Language models are often more powerful when interacting with other sources of knowledge or computation. This can include Python REPLs, embeddings, search engines, and more. LangChain provides a large collection of common utils to use in your application. Chains: Chains go beyond just a single LLM call, and are sequences of calls (whether to an LLM or a different utility). ",
"LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications. Indexes: Language models are often more powerful when combined with your own text data - this module covers best practices for doing exactly that. Agents: Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end to end agents. Memory: Memory is the concept of persisting state between calls of a chain/agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory. Chat: Chat models are a variation on Language Models that expose a different API - rather than working with raw text, they work with messages. LangChain provides a standard interface for working with them and doing all the same things as above. Use Cases# The above modules can be used in a variety of ways. LangChain also provides guidance and assistance in this. Below are some of the common use cases LangChain supports. Agents: Agents are systems that use a language model to interact with other tools. These can be used to do more grounded question/answering, interact with APIs, or even take actions. Chatbots: Since language models are good at producing text, that makes them ideal for creating chatbots. Data Augmented Generation: Data Augmented Generation involves specific types of chains that first interact with an external datasource to fetch data to use in the generation step. Examples of this include summarization of long pieces of text and question/answering over specific data sources. Question Answering: Answering questions over specific documents, only utilizing the information in those documents to construct an answer. A type of Data Augmented Generation. ",
"Summarization: Summarizing longer documents into shorter, more condensed chunks of information. A type of Data Augmented Generation. Evaluation: Generative models are notoriously hard to evaluate with traditional metrics. One new way of evaluating them is using language models themselves to do the evaluation. LangChain provides some prompts/chains for assisting in this. Generate similar examples: Generating similar examples to a given input. This is a common use case for many applications, and LangChain provides some prompts/chains for assisting in this. Compare models: Experimenting with different prompts, models, and chains is a big part of developing the best possible application. The ModelLaboratory makes it easy to do so. Reference Docs# All of LangChain’s reference documentation, in one place. Full documentation on all methods, classes, installation methods, and integration setups for LangChain. Reference Documentation LangChain Ecosystem# Guides for how other companies/products can be used with LangChain LangChain Ecosystem Additional Resources# Additional collection of resources we think may be useful as you develop your application! LangChainHub: The LangChainHub is a place to share and explore other prompts, chains, and agents. Glossary: A glossary of all related terms, papers, methods, etc. Whether implemented in LangChain or not! Gallery: A collection of our favorite projects that use LangChain. Useful for finding inspiration or seeing how things were done in other applications. Deployments: A collection of instructions, code snippets, and template repositories for deploying LangChain apps. Discord: Join us on our Discord to discuss all things LangChain! Tracing: A guide on using tracing in LangChain to visualize the execution of chains and agents. Production Support: As you move your LangChains into production, we’d love to offer more comprehensive support. Please fill out this form and we’ll set up a dedicated support Slack channel. ",
"next Quickstart Guide Contents Getting Started Modules Use Cases Reference Docs LangChain Ecosystem Additional Resources By Harrison Chase © Copyright 2022, Harrison Chase. Last updated on Mar 15, 2023.\n"
]
}
],
"source": [
"from langchain.document_loaders import ReadTheDocsLoader\n",
"\n",
"loader = ReadTheDocsLoader('rtdocs')\n",
"docs = loader.load()\n",
"len(docs)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"This leaves us with hundreds of processed doc pages. Let's look at the format of one of them:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"docs[0]"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"We access the plaintext page content like so:"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"id": "OcIkny_6xiZJ"
},
"outputs": [],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(docs[5].page_content)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also find the source of each document:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"docs[5].metadata['source'].replace('rtdocs/', 'https://')"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "F0tUQRxtzqF0",
"outputId": "a7a9b799-98cb-41a2-a696-fc9f00579773"
},
"outputs": [
{
"data": {
"text/plain": [
"{'url': 'https://langchain.readthedocs.io/en/latest/modules/memory/types/entity_summary_memory.html',\n",
" 'text': '.ipynb .pdf Entity Memory Contents Using in a chain Inspecting the memory store Entity Memory# This notebook shows how to work with a memory module that remembers things about specific entities. It extracts information on entities (using LLMs) and builds up its knowledge about that entity over time (also using LLMs). Let’s first walk through using this functionality. from langchain.llms import OpenAI from langchain.memory import ConversationEntityMemory llm = OpenAI(temperature=0) memory = ConversationEntityMemory(llm=llm) _input = {\"input\": \"Deven & Sam are working on a hackathon project\"} memory.load_memory_variables(_input) memory.save_context( _input, {\"ouput\": \" That sounds like a great project! What kind of project are they working on?\"} ) memory.load_memory_variables({\"input\": \\'who is Sam\\'}) {\\'history\\': \\'Human: Deven & Sam are working on a hackathon project\\\\nAI: That sounds like a great project! What kind of project are they working on?\\', \\'entities\\': {\\'Sam\\': \\'Sam is working on a hackathon project with Deven.\\'}} memory = ConversationEntityMemory(llm=llm, return_messages=True) _input = {\"input\": \"Deven & Sam are working on a hackathon project\"} memory.load_memory_variables(_input) memory.save_context( _input, {\"ouput\": \" That sounds like a great project! What kind of project are they working on?\"} ) memory.load_memory_variables({\"input\": \\'who is Sam\\'}) {\\'history\\': [HumanMessage(content=\\'Deven & Sam are working on a hackathon project\\', additional_kwargs={}), AIMessage(content=\\' That sounds like a great project! What kind of project are they working on?\\', additional_kwargs={})], \\'entities\\': {\\'Sam\\': \\'Sam is working on a hackathon project with Deven.\\'}} Using in a chain# Let’s now use it in a chain! 
from langchain.chains import ConversationChain from langchain.memory import ConversationEntityMemory from langchain.memory.prompt import ENTITY_MEMORY_CONVERSATION_TEMPLATE from pydantic import BaseModel from typing import List, Dict, Any conversation = ConversationChain( llm=llm, verbose=True, prompt=ENTITY_MEMORY_CONVERSATION_TEMPLATE, memory=ConversationEntityMemory(llm=llm) ) conversation.predict(input=\"Deven & Sam are working on a hackathon project\") > Entering new ConversationChain chain... Prompt after formatting: You are an assistant to a human, powered by a large language model trained by OpenAI. You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, you are able to generate human-like text based on the input you receive, allowing you to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand. You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. You have access to some personalized information provided by the human in the Context section below. Additionally, you are able to generate your own text based on the input you receive, allowing you to engage in discussions and provide explanations and descriptions on a wide range of topics. Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the human needs help with a specific question or just wants to have a conversation about a particular topic, you are here to assist. 
Context: {\\'Deven\\': \\'\\', \\'Sam\\': \\'\\'} Current conversation: Last line: Human: Deven & Sam are working on a hackathon project You: > Finished chain. \\' That sounds like a great project! What kind of project are they working on?\\' conversation.memory.store {\\'Deven\\': \\'Deven is working on a hackathon project with Sam.\\', \\'Sam\\': \\'Sam is working on a hackathon project with Deven.\\'} conversation.predict(input=\"They are trying to add more complex memory structures to Langchain\") > Entering new ConversationChain chain... Prompt after formatting: You are an assistant to a human, powered by a large language model trained by OpenAI. You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, you are able to generate human-like text based on the input you receive, allowing you to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand. You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. You have access to some personalized information provided by the human in the Context section below. Additionally, you are able to generate your own text based on the input you receive, allowing you to engage in discussions and provide explanations and descriptions on a wide range of topics. Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the human needs help with a specific question or just wants to have a conversation about a particular topic, you are here to assist. 
Context: {\\'Deven\\': \\'Deven is working on a hackathon project with Sam.\\', \\'Sam\\': \\'Sam is working on a hackathon project with Deven.\\', \\'Langchain\\': \\'\\'} Current conversation: Human: Deven & Sam are working on a hackathon project AI: That sounds like a great project! What kind of project are they working on? Last line: Human: They are trying to add more complex memory structures to Langchain You: > Finished chain. \\' That sounds like an interesting project! What kind of memory structures are they trying to add?\\' conversation.predict(input=\"They are adding in a key-value store for entities mentioned so far in the conversation.\") > Entering new ConversationChain chain... Prompt after formatting: You are an assistant to a human, powered by a large language model trained by OpenAI. You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, you are able to generate human-like text based on the input you receive, allowing you to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand. You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. You have access to some personalized information provided by the human in the Context section below. Additionally, you are able to generate your own text based on the input you receive, allowing you to engage in discussions and provide explanations and descriptions on a wide range of topics. Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. 
Whether the human needs help with a specific question or just wants to have a conversation about a particular topic, you are here to assist. Context: {\\'Deven\\': \\'Deven is working on a hackathon project with Sam, attempting to add more complex memory structures to Langchain.\\', \\'Sam\\': \\'Sam is working on a hackathon project with Deven, trying to add more complex memory structures to Langchain.\\', \\'Langchain\\': \\'Langchain is a project that is trying to add more complex memory structures.\\', \\'Key-Value Store\\': \\'\\'} Current conversation: Human: Deven & Sam are working on a hackathon project AI: That sounds like a great project! What kind of project are they working on? Human: They are trying to add more complex memory structures to Langchain AI: That sounds like an interesting project! What kind of memory structures are they trying to add? Last line: Human: They are adding in a key-value store for entities mentioned so far in the conversation. You: > Finished chain. \\' That sounds like a great idea! How will the key-value store work?\\' conversation.predict(input=\"What do you know about Deven & Sam?\") > Entering new ConversationChain chain... Prompt after formatting: You are an assistant to a human, powered by a large language model trained by OpenAI. You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, you are able to generate human-like text based on the input you receive, allowing you to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand. You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. 
You have access to some personalized information provided by the human in the Context section below. Additionally, you are able to generate your own text based on the input you receive, allowing you to engage in discussions and provide explanations and descriptions on a wide range of topics. Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the human needs help with a specific question or just wants to have a conversation about a particular topic, you are here to assist. Context: {\\'Deven\\': \\'Deven is working on a hackathon project with Sam, attempting to add more complex memory structures to Langchain, including a key-value store for entities mentioned so far in the conversation.\\', \\'Sam\\': \\'Sam is working on a hackathon project with Deven, trying to add more complex memory structures to Langchain, including a key-value store for entities mentioned so far in the conversation.\\'} Current conversation: Human: Deven & Sam are working on a hackathon project AI: That sounds like a great project! What kind of project are they working on? Human: They are trying to add more complex memory structures to Langchain AI: That sounds like an interesting project! What kind of memory structures are they trying to add? Human: They are adding in a key-value store for entities mentioned so far in the conversation. AI: That sounds like a great idea! How will the key-value store work? Last line: Human: What do you know about Deven & Sam? You: > Finished chain. \\' Deven and Sam are working on a hackathon project together, attempting to add more complex memory structures to Langchain, including a key-value store for entities mentioned so far in the conversation.\\' Inspecting the memory store# We can also inspect the memory store directly. In the following examaples, we look at it directly, and then go through some examples of adding information and watch how it changes. 
from pprint import pprint pprint(conversation.memory.store) {\\'Deven\\': \\'Deven is working on a hackathon project with Sam, attempting to add \\' \\'more complex memory structures to Langchain, including a key-value \\' \\'store for entities mentioned so far in the conversation.\\', \\'Key-Value Store\\': \\'A key-value store that stores entities mentioned in the \\' \\'conversation.\\', \\'Langchain\\': \\'Langchain is a project that is trying to add more complex \\' \\'memory structures, including a key-value store for entities \\' \\'mentioned so far in the conversation.\\', \\'Sam\\': \\'Sam is working on a hackathon project with Deven, attempting to add \\' \\'more complex memory structures to Langchain, including a key-value \\' \\'store for entities mentioned so far in the conversation.\\'} conversation.predict(input=\"Sam is the founder of a company called Daimon.\") > Entering new ConversationChain chain... Prompt after formatting: You are an assistant to a human, powered by a large language model trained by OpenAI. You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, you are able to generate human-like text based on the input you receive, allowing you to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand. You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. You have access to some personalized information provided by the human in the Context section below. Additionally, you are able to generate your own text based on the input you receive, allowing you to engage in discussions and provide explanations and descriptions on a wide range of topics. 
Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the human needs help with a specific question or just wants to have a conversation about a particular topic, you are here to assist. Context: {\\'Daimon\\': \\'\\', \\'Sam\\': \\'Sam is working on a hackathon project with Deven to add more complex memory structures to Langchain, including a key-value store for entities mentioned so far in the conversation.\\'} Current conversation: Human: They are trying to add more complex memory structures to Langchain AI: That sounds like an interesting project! What kind of memory structures are they trying to add? Human: They are adding in a key-value store for entities mentioned so far in the conversation. AI: That sounds like a great idea! How will the key-value store work? Human: What do you know about Deven & Sam? AI: Deven and Sam are working on a hackathon project to add more complex memory structures to Langchain, including a key-value store for entities mentioned so far in the conversation. They seem to be very motivated and passionate about their project, and are working hard to make it a success. Last line: Human: Sam is the founder of a company called Daimon. You: > Finished chain. \"\\\\nThat\\'s impressive! It sounds like Sam is a very successful entrepreneur. What kind of company is Daimon?\" from pprint import pprint pprint(conversation.memory.store) {\\'Daimon\\': \\'Daimon is a company founded by Sam.\\', \\'Deven\\': \\'Deven is working on a hackathon project with Sam to add more \\' \\'complex memory structures to Langchain, including a key-value store \\' \\'for entities mentioned so far in the conversation.\\', \\'Key-Value Store\\': \\'Key-Value Store: A data structure that stores values \\' \\'associated with a unique key, allowing for efficient \\' \\'retrieval of values. 
Deven and Sam are adding a key-value \\' \\'store for entities mentioned so far in the conversation.\\', \\'Langchain\\': \\'Langchain is a project that seeks to add more complex memory \\' \\'structures, including a key-value store for entities mentioned \\' \\'so far in the conversation.\\', \\'Sam\\': \\'Sam is working on a hackathon project with Deven to add more complex \\' \\'memory structures to Langchain, including a key-value store for \\' \\'entities mentioned so far in the conversation. He is also the founder \\' \\'of a company called Daimon.\\'} conversation.predict(input=\"What do you know about Sam?\") > Entering new ConversationChain chain... Prompt after formatting: You are an assistant to a human, powered by a large language model trained by OpenAI. You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, you are able to generate human-like text based on the input you receive, allowing you to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand. You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. You have access to some personalized information provided by the human in the Context section below. Additionally, you are able to generate your own text based on the input you receive, allowing you to engage in discussions and provide explanations and descriptions on a wide range of topics. Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. 
Whether the human needs help with a specific question or just wants to have a conversation about a particular topic, you are here to assist. Context: {\\'Sam\\': \\'Sam is working on a hackathon project with Deven to add more complex memory structures to Langchain, including a key-value store for entities mentioned so far in the conversation. He is also the founder of a company called Daimon.\\', \\'Daimon\\': \\'Daimon is a company founded by Sam.\\'} Current conversation: Human: They are adding in a key-value store for entities mentioned so far in the conversation. AI: That sounds like a great idea! How will the key-value store work? Human: What do you know about Deven & Sam? AI: Deven and Sam are working on a hackathon project to add more complex memory structures to Langchain, including a key-value store for entities mentioned so far in the conversation. They seem to be very motivated and passionate about their project, and are working hard to make it a success. Human: Sam is the founder of a company called Daimon. AI: That\\'s impressive! It sounds like Sam is a very successful entrepreneur. What kind of company is Daimon? Last line: Human: What do you know about Sam? You: > Finished chain. \\' Sam is the founder of a company called Daimon. He is also working on a hackathon project with Deven to add more complex memory structures to Langchain, including a key-value store for entities mentioned so far in the conversation. He seems to be very motivated and passionate about his project, and is working hard to make it a success.\\' previous ConversationBufferWindowMemory next Conversation Knowledge Graph Memory Contents Using in a chain Inspecting the memory store By Harrison Chase © Copyright 2022, Harrison Chase. Last updated on Mar 15, 2023.'}"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data[3]"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "ouY4rcx7z2oa"
},
"source": [
"It's pretty ugly but it's good enough for now. Let's see how we can process all of these. We will chunk everything into ~400 token chunks, we can do this easily with `langchain` and `tiktoken`:"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"id": "Rb7KxUqYzsuV"
},
"outputs": [],
"source": [
"import tiktoken\n",
"\n",
"tokenizer = tiktoken.get_encoding('p50k_base')\n",
"\n",
"# create the length function\n",
"def tiktoken_len(text):\n",
" tokens = tokenizer.encode(\n",
" text,\n",
" disallowed_special=()\n",
" )\n",
" return len(tokens)"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"id": "OKO8e3Dp0dQS"
},
"outputs": [],
"source": [
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"\n",
"text_splitter = RecursiveCharacterTextSplitter(\n",
" chunk_size=400,\n",
" chunk_overlap=20,\n",
" length_function=tiktoken_len,\n",
" separators=[\"\\n\\n\", \"\\n\", \" \", \"\"]\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bLdvW8eq06Zd"
},
"source": [
"Process the `data` into more chunks using this approach."
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 49,
"referenced_widgets": [
"fe4aa5160ef74ecd820f9c6f7de035a0",
"8f0edead487948358efe478a01316209",
"bcd3e274345a44dc950890c1fa1026a7",
"153f898146264d50b77b5ef23db92408",
"30d7236e54844058b7c76404e1f2ccb8",
"348fe1bd02ed4dca8df422031d1184f6",
"157f79e1ecf0423393cb15dcd2e66996",
"a43a7db5cab149b3a8aeef23d3fb936f",
"865d0f1e70cc4889aaf91cf1ad82b909",
"1bed5d4ebf054e80b4d63d6f8a2593d8",
"04fd6e9cebaa4c9287d16cb8c861c8a3"
]
},
"id": "uOdPyiAQ0uWs",
"outputId": "b1d00544-a432-4a41-a68e-59a0f9825070"
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "fe4aa5160ef74ecd820f9c6f7de035a0",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/231 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from uuid import uuid4\n",
"from tqdm.auto import tqdm\n",
"\n",
"chunks = []\n",
"\n",
"for idx, record in enumerate(tqdm(data)):\n",
" texts = text_splitter.split_text(record['text'])\n",
" chunks.extend([{\n",
" 'id': str(uuid4()),\n",
" 'text': texts[i],\n",
" 'chunk': i,\n",
" 'url': record['url']\n",
" } for i in range(len(texts))])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JegURaAg2PuN"
},
"source": [
"Our chunks are ready so now we move onto embedding and indexing everything."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "zGIZbQqJ2WBh"
},
"source": [
"## Initialize Embedding Model\n",
"\n",
"We use `text-embedding-ada-002` as the embedding model. We can embed text like so:"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"id": "kteZ69Z5M55S"
},
"outputs": [],
"source": [
"import openai\n",
"\n",
"# initialize openai API key\n",
"openai.api_key = \"sk-...\" #platform.openai.com\n",
"\n",
"embed_model = \"text-embedding-ada-002\"\n",
"\n",
"res = openai.Embedding.create(\n",
" input=[\n",
" \"Sample document text goes here\",\n",
" \"there will be several phrases in each batch\"\n",
" ], engine=embed_model\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aNZ7IWekNLbu"
},
"source": [
"In the response `res` we will find a JSON-like object containing our new embeddings within the `'data'` field."
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "esagZj6iNLPZ",
"outputId": "0e8b59d7-6c26-4fbf-e093-56d35aa18ab5"
},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['object', 'data', 'model', 'usage'])"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"res.keys()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "zStnHFpkNVIU"
},
"source": [
"Inside `'data'` we will find two records, one for each of the two sentences we just embedded. Each vector embedding contains `1536` dimensions (the output dimensionality of the `text-embedding-ada-002` model."
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "uVoP9VcINWAC",
"outputId": "36329c98-2191-4e3d-b064-60af9b905dae"
},
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(res['data'])"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "s-zraDCjNeC6",
"outputId": "a7c79486-6fb4-4bc6-efec-320e9f525766"
},
"outputs": [
{
"data": {
"text/plain": [
"(1536, 1536)"
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(res['data'][0]['embedding']), len(res['data'][1]['embedding'])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "XPd41MjANhmp"
},
"source": [
"We will apply this same embedding logic to the langchain docs dataset we've just scraped. But before doing so we must create a place to store the embeddings."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WPi4MZvMNvUH"
},
"source": [
"## Initializing the Index"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "H5RRQArrN2lN"
},
"source": [
"Now we need a place to store these embeddings and enable a efficient vector search through them all. To do that we use Pinecone, we can get a [free API key](https://app.pinecone.io/) and enter it below where we will initialize our connection to Pinecone and create a new index."
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "EO8sbJFZNyIZ",
"outputId": "f2d2efca-65be-47ea-ab1d-1dab2786a6b9"
},
"outputs": [
{
"data": {
"text/plain": [
"{'dimension': 1536,\n",
" 'index_fullness': 0.0,\n",
" 'namespaces': {},\n",
" 'total_vector_count': 0}"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pinecone\n",
"\n",
"index_name = 'gpt-4-langchain-docs'\n",
"\n",
"# initialize connection to pinecone\n",
"pinecone.init(\n",
" api_key=\"PINECONE_API_KEY\", # app.pinecone.io (console)\n",
" environment=\"PINECONE_ENVIRONMENT\" # next to API key in console\n",
")\n",
"\n",
"# check if index already exists (it shouldn't if this is first time)\n",
"if index_name not in pinecone.list_indexes():\n",
" # if does not exist, create index\n",
" pinecone.create_index(\n",
" index_name,\n",
" dimension=len(res['data'][0]['embedding']),\n",
" metric='dotproduct'\n",
" )\n",
"# connect to index\n",
"index = pinecone.GRPCIndex(index_name)\n",
"# view index stats\n",
"index.describe_index_stats()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ezSTzN2rPa2o"
},
"source": [
"We can see the index is currently empty with a `total_vector_count` of `0`. We can begin populating it with OpenAI `text-embedding-ada-002` built embeddings like so:"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 49,
"referenced_widgets": [
"b6b5865b02504e10a020ad5f42241df6",
"b1489d5d6c1f498fadaea8aeb16ab60f",
"90aa35cbdf0c45a1bb7b9075c48a6f7d",
"70eaf7d1a5b24e49a32490fd3a75ea15",
"144e2e3c8a014c549e0f552a64a670ef",
"e5b49411f2134a9b9649528314f746d6",
"7d78613ce91b4427a4afacb699ef031e",
"1b93eb9d358041ab99fe87045f7f0660",
"af4f336bfcb446afb9e6a513d49d791f",
"a9552f4dca1642e2924ee152067f1f3d",
"c82f8fbcef0648489f1dcbb4af5ea8c4"
]
},
"id": "iZbFbulAPeop",
"outputId": "a017780a-19d0-4e6f-a68c-529c0c96e4f8"
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "b6b5865b02504e10a020ad5f42241df6",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/12 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from tqdm.auto import tqdm\n",
"import datetime\n",
"from time import sleep\n",
"\n",
"batch_size = 100 # how many embeddings we create and insert at once\n",
"\n",
"for i in tqdm(range(0, len(chunks), batch_size)):\n",
" # find end of batch\n",
" i_end = min(len(chunks), i+batch_size)\n",
" meta_batch = chunks[i:i_end]\n",
" # get ids\n",
" ids_batch = [x['id'] for x in meta_batch]\n",
" # get texts to encode\n",
" texts = [x['text'] for x in meta_batch]\n",
" # create embeddings (try-except added to avoid RateLimitError)\n",
" try:\n",
" res = openai.Embedding.create(input=texts, engine=embed_model)\n",
" except:\n",
" done = False\n",
" while not done:\n",
" sleep(5)\n",
" try:\n",
" res = openai.Embedding.create(input=texts, engine=embed_model)\n",
" done = True\n",
" except:\n",
" pass\n",
" embeds = [record['embedding'] for record in res['data']]\n",
" # cleanup metadata\n",
" meta_batch = [{\n",
" 'text': x['text'],\n",
" 'chunk': x['chunk'],\n",
" 'url': x['url']\n",
" } for x in meta_batch]\n",
" to_upsert = list(zip(ids_batch, embeds, meta_batch))\n",
" # upsert to Pinecone\n",
" index.upsert(vectors=to_upsert)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YttJOrEtQIF9"
},
"source": [
"Now we've added all of our langchain docs to the index. With that we can move on to retrieval and then answer generation using GPT-4."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "FumVmMRlQQ7w"
},
"source": [
"## Retrieval"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nLRODeL-QTJ9"
},
"source": [
"To search through our documents we first need to create a query vector `xq`. Using `xq` we will retrieve the most relevant chunks from the LangChain docs, like so:"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {
"id": "FMUPdX9cQQYC"
},
"outputs": [],
"source": [
"query = \"how do I use the LLMChain in LangChain?\"\n",
"\n",
"res = openai.Embedding.create(\n",
" input=[query],\n",
" engine=embed_model\n",
")\n",
"\n",
"# retrieve from Pinecone\n",
"xq = res['data'][0]['embedding']\n",
"\n",
"# get relevant contexts (including the questions)\n",
"res = index.query(xq, top_k=5, include_metadata=True)"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "zl9SrFPkQjg-",
"outputId": "86a8c598-15d1-4ad1-db32-6db2b3e0af1e"
},
"outputs": [
{
"data": {
"text/plain": [
"{'matches': [{'id': '1fec660b-9937-4f7e-9692-280c8cc7ce0d',\n",
" 'metadata': {'chunk': 0.0,\n",
" 'text': '.rst .pdf Chains Chains# Using an LLM in '\n",
" 'isolation is fine for some simple '\n",
" 'applications, but many more complex ones '\n",
" 'require chaining LLMs - either with each '\n",
" 'other or with other experts. LangChain '\n",
" 'provides a standard interface for Chains, '\n",
" 'as well as some common implementations of '\n",
" 'chains for ease of use. The following '\n",
" 'sections of documentation are provided: '\n",
" 'Getting Started: A getting started guide '\n",
" 'for chains, to get you up and running '\n",
" 'quickly. Key Concepts: A conceptual guide '\n",
" 'going over the various concepts related to '\n",
" 'chains. How-To Guides: A collection of '\n",
" 'how-to guides. These highlight how to use '\n",
" 'various types of chains. Reference: API '\n",
" 'reference documentation for all Chain '\n",
" 'classes. previous Vector DB Text '\n",
" 'Generation next Getting Started By '\n",
" 'Harrison Chase © Copyright 2022, Harrison '\n",
" 'Chase. Last updated on Mar 15, 2023.',\n",
" 'url': 'https://langchain.readthedocs.io/en/latest/modules/chains.html'},\n",
" 'score': 0.8848499,\n",
" 'sparse_values': {'indices': [], 'values': []},\n",
" 'values': []},\n",
" {'id': 'fe48438d-228a-4e0e-b41e-5cb5c6ba1482',\n",
" 'metadata': {'chunk': 0.0,\n",
" 'text': '.rst .pdf LLMs LLMs# Large Language Models '\n",
" '(LLMs) are a core component of LangChain. '\n",
" 'LangChain is not a provider of LLMs, but '\n",
" 'rather provides a standard interface '\n",
" 'through which you can interact with a '\n",
" 'variety of LLMs. The following sections of '\n",
" 'documentation are provided: Getting '\n",
" 'Started: An overview of all the '\n",
" 'functionality the LangChain LLM class '\n",
" 'provides. Key Concepts: A conceptual guide '\n",
" 'going over the various concepts related to '\n",
" 'LLMs. How-To Guides: A collection of '\n",
" 'how-to guides. These highlight how to '\n",
" 'accomplish various objectives with our LLM '\n",
" 'class, as well as how to integrate with '\n",
" 'various LLM providers. Reference: API '\n",
" 'reference documentation for all LLM '\n",
" 'classes. previous Example Selector next '\n",
" 'Getting Started By Harrison Chase © '\n",
" 'Copyright 2022, Harrison Chase. Last '\n",
" 'updated on Mar 15, 2023.',\n",
" 'url': 'https://langchain.readthedocs.io/en/latest/modules/llms.html'},\n",
" 'score': 0.8595519,\n",
" 'sparse_values': {'indices': [], 'values': []},\n",
" 'values': []},\n",
" {'id': '60df5bff-5f79-46ee-9456-534d42f6a94e',\n",
" 'metadata': {'chunk': 0.0,\n",
" 'text': '.ipynb .pdf Getting Started Contents Why '\n",
" 'do we need chains? Query an LLM with the '\n",
" 'LLMChain Combine chains with the '\n",
" 'SequentialChain Create a custom chain with '\n",
" 'the Chain class Getting Started# In this '\n",
" 'tutorial, we will learn about creating '\n",
" 'simple chains in LangChain. We will learn '\n",
" 'how to create a chain, add components to '\n",
" 'it, and run it. In this tutorial, we will '\n",
" 'cover: Using a simple LLM chain Creating '\n",
" 'sequential chains Creating a custom chain '\n",
" 'Why do we need chains?# Chains allow us to '\n",
" 'combine multiple components together to '\n",
" 'create a single, coherent application. For '\n",
" 'example, we can create a chain that takes '\n",
" 'user input, formats it with a '\n",
" 'PromptTemplate, and then passes the '\n",
" 'formatted response to an LLM. We can build '\n",
" 'more complex chains by combining multiple '\n",
" 'chains together, or by combining chains '\n",
" 'with other components. Query an LLM with '\n",
" 'the LLMChain# The LLMChain is a simple '\n",
" 'chain that takes in a prompt template, '\n",
" 'formats it with the user input and returns '\n",
" 'the response from an LLM. To use the '\n",
" 'LLMChain, first create a prompt template. '\n",
" 'from langchain.prompts import '\n",
" 'PromptTemplate from langchain.llms import '\n",
" 'OpenAI llm = OpenAI(temperature=0.9) '\n",
" 'prompt = PromptTemplate( '\n",
" 'input_variables=[\"product\"], '\n",
" 'template=\"What is a good',\n",
" 'url': 'https://langchain.readthedocs.io/en/latest/modules/chains/getting_started.html'},\n",
" 'score': 0.8462403,\n",
" 'sparse_values': {'indices': [], 'values': []},\n",
" 'values': []},\n",
" {'id': '2f11beb1-3935-447e-b565-b20383dc4544',\n",
" 'metadata': {'chunk': 1.0,\n",
" 'text': 'chain first uses a LLM to construct the '\n",
" 'url to hit, then makes that request with '\n",
" 'the Requests wrapper, and finally runs '\n",
" 'that result through the language model '\n",
" 'again in order to product a natural '\n",
" 'language response. Example Notebook '\n",
" 'LLMBash Chain Links Used: BashProcess, '\n",
" 'LLMChain Notes: This chain takes user '\n",
" 'input (a question), uses an LLM chain to '\n",
" 'convert it to a bash command to run in the '\n",
" 'terminal, and then returns that as the '\n",
" 'result. Example Notebook LLMChecker Chain '\n",
" 'Links Used: LLMChain Notes: This chain '\n",
" 'takes user input (a question), uses an LLM '\n",
" 'chain to answer that question, and then '\n",
" 'uses other LLMChains to self-check that '\n",
" 'answer. Example Notebook LLMRequests Chain '\n",
" 'Links Used: Requests, LLMChain Notes: This '\n",
" 'chain takes a URL and other inputs, uses '\n",
" 'Requests to get the data at that URL, and '\n",
" 'then passes that along with the other '\n",
" 'inputs into an LLMChain to generate a '\n",
" 'response. The example included shows how '\n",
" 'to ask a question to Google - it firsts '\n",
" 'constructs a Google url, then fetches the '\n",
" 'data there, then passes that data + the '\n",
" 'original question into an LLMChain to get '\n",
" 'an answer. Example Notebook Moderation '\n",
" 'Chain Links Used: LLMChain, '\n",
" 'ModerationChain Notes: This chain shows '\n",
" 'how to use OpenAI’s content',\n",
" 'url': 'https://langchain.readthedocs.io/en/latest/modules/chains/utility_how_to.html'},\n",
" 'score': 0.8451743,\n",
" 'sparse_values': {'indices': [], 'values': []},\n",
" 'values': []},\n",
" {'id': 'f3ed41eb-063c-407f-bdaa-706a8c6a2091',\n",
" 'metadata': {'chunk': 1.0,\n",
" 'text': 'Prompts: This includes prompt management, '\n",
" 'prompt optimization, and prompt '\n",
" 'serialization. LLMs: This includes a '\n",
" 'generic interface for all LLMs, and common '\n",
" 'utilities for working with LLMs. Document '\n",
" 'Loaders: This includes a standard '\n",
" 'interface for loading documents, as well '\n",
" 'as specific integrations to all types of '\n",
" 'text data sources. Utils: Language models '\n",
" 'are often more powerful when interacting '\n",
" 'with other sources of knowledge or '\n",
" 'computation. This can include Python '\n",
" 'REPLs, embeddings, search engines, and '\n",
" 'more. LangChain provides a large '\n",
" 'collection of common utils to use in your '\n",
" 'application. Chains: Chains go beyond just '\n",
" 'a single LLM call, and are sequences of '\n",
" 'calls (whether to an LLM or a different '\n",
" 'utility). LangChain provides a standard '\n",
" 'interface for chains, lots of integrations '\n",
" 'with other tools, and end-to-end chains '\n",
" 'for common applications. Indexes: Language '\n",
" 'models are often more powerful when '\n",
" 'combined with your own text data - this '\n",
" 'module covers best practices for doing '\n",
" 'exactly that. Agents: Agents involve an '\n",
" 'LLM making decisions about which Actions '\n",
" 'to take, taking that Action, seeing an '\n",
" 'Observation, and repeating that until '\n",
" 'done. LangChain provides a standard '\n",
" 'interface for agents, a selection of '\n",
" 'agents to choose from, and examples of end '\n",
" 'to end agents. Memory: Memory is the',\n",
" 'url': 'https://langchain.readthedocs.io/en/latest/'},\n",
" 'score': 0.84271824,\n",
" 'sparse_values': {'indices': [], 'values': []},\n",
" 'values': []}],\n",
" 'namespace': ''}"
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"res"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "MoBSiDLIUADZ"
},
"source": [
"With retrieval complete, we move on to feeding these into GPT-4 to produce answers."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qfzS4-6-UXgX"
},
"source": [
"## Retrieval Augmented Generation"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "XPC1jQaKUcy0"
},
"source": [
"GPT-4 is currently accessed via the `ChatCompletions` endpoint of OpenAI. To add the information we retrieved into the model, we need to pass it into our user prompts *alongside* our original query. We can do that like so:"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {
"id": "unZstoHNUHeG"
},
"outputs": [],
"source": [
"# get list of retrieved text\n",
"contexts = [item['metadata']['text'] for item in res['matches']]\n",
"\n",
"augmented_query = \"\\n\\n---\\n\\n\".join(contexts)+\"\\n\\n-----\\n\\n\"+query"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "LRcEHm0Z9fXE",
"outputId": "872a7f7e-2001-44b5-ff44-cb045365a515"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
".rst .pdf Chains Chains# Using an LLM in isolation is fine for some simple applications, but many more complex ones require chaining LLMs - either with each other or with other experts. LangChain provides a standard interface for Chains, as well as some common implementations of chains for ease of use. The following sections of documentation are provided: Getting Started: A getting started guide for chains, to get you up and running quickly. Key Concepts: A conceptual guide going over the various concepts related to chains. How-To Guides: A collection of how-to guides. These highlight how to use various types of chains. Reference: API reference documentation for all Chain classes. previous Vector DB Text Generation next Getting Started By Harrison Chase © Copyright 2022, Harrison Chase. Last updated on Mar 15, 2023.\n",
"\n",
"---\n",
"\n",
".rst .pdf LLMs LLMs# Large Language Models (LLMs) are a core component of LangChain. LangChain is not a provider of LLMs, but rather provides a standard interface through which you can interact with a variety of LLMs. The following sections of documentation are provided: Getting Started: An overview of all the functionality the LangChain LLM class provides. Key Concepts: A conceptual guide going over the various concepts related to LLMs. How-To Guides: A collection of how-to guides. These highlight how to accomplish various objectives with our LLM class, as well as how to integrate with various LLM providers. Reference: API reference documentation for all LLM classes. previous Example Selector next Getting Started By Harrison Chase © Copyright 2022, Harrison Chase. Last updated on Mar 15, 2023.\n",
"\n",
"---\n",
"\n",
".ipynb .pdf Getting Started Contents Why do we need chains? Query an LLM with the LLMChain Combine chains with the SequentialChain Create a custom chain with the Chain class Getting Started# In this tutorial, we will learn about creating simple chains in LangChain. We will learn how to create a chain, add components to it, and run it. In this tutorial, we will cover: Using a simple LLM chain Creating sequential chains Creating a custom chain Why do we need chains?# Chains allow us to combine multiple components together to create a single, coherent application. For example, we can create a chain that takes user input, formats it with a PromptTemplate, and then passes the formatted response to an LLM. We can build more complex chains by combining multiple chains together, or by combining chains with other components. Query an LLM with the LLMChain# The LLMChain is a simple chain that takes in a prompt template, formats it with the user input and returns the response from an LLM. To use the LLMChain, first create a prompt template. from langchain.prompts import PromptTemplate from langchain.llms import OpenAI llm = OpenAI(temperature=0.9) prompt = PromptTemplate( input_variables=[\"product\"], template=\"What is a good\n",
"\n",
"---\n",
"\n",
"chain first uses a LLM to construct the url to hit, then makes that request with the Requests wrapper, and finally runs that result through the language model again in order to product a natural language response. Example Notebook LLMBash Chain Links Used: BashProcess, LLMChain Notes: This chain takes user input (a question), uses an LLM chain to convert it to a bash command to run in the terminal, and then returns that as the result. Example Notebook LLMChecker Chain Links Used: LLMChain Notes: This chain takes user input (a question), uses an LLM chain to answer that question, and then uses other LLMChains to self-check that answer. Example Notebook LLMRequests Chain Links Used: Requests, LLMChain Notes: This chain takes a URL and other inputs, uses Requests to get the data at that URL, and then passes that along with the other inputs into an LLMChain to generate a response. The example included shows how to ask a question to Google - it firsts constructs a Google url, then fetches the data there, then passes that data + the original question into an LLMChain to get an answer. Example Notebook Moderation Chain Links Used: LLMChain, ModerationChain Notes: This chain shows how to use OpenAI’s content\n",
"\n",
"---\n",
"\n",
"Prompts: This includes prompt management, prompt optimization, and prompt serialization. LLMs: This includes a generic interface for all LLMs, and common utilities for working with LLMs. Document Loaders: This includes a standard interface for loading documents, as well as specific integrations to all types of text data sources. Utils: Language models are often more powerful when interacting with other sources of knowledge or computation. This can include Python REPLs, embeddings, search engines, and more. LangChain provides a large collection of common utils to use in your application. Chains: Chains go beyond just a single LLM call, and are sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications. Indexes: Language models are often more powerful when combined with your own text data - this module covers best practices for doing exactly that. Agents: Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end to end agents. Memory: Memory is the\n",
"\n",
"-----\n",
"\n",
"how do I use the LLMChain in LangChain?\n"
]
}
],
"source": [
"print(augmented_query)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "sihH_GMiV5_p"
},
"source": [
"Now we ask the question:"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {
"id": "IThBqBi8V70d"
},
"outputs": [],
"source": [
"# system message to 'prime' the model\n",
"primer = f\"\"\"You are Q&A bot. A highly intelligent system that answers\n",
"user questions based on the information provided by the user above\n",
"each question. If the information can not be found in the information\n",
"provided by the user you truthfully say \"I don't know\".\n",
"\"\"\"\n",