#### Databricks' Free Dolly with LangChain

- To use the Dolly pipeline with LangChain, set return_full_text=True. LangChain expects the full text (prompt plus completion) to be returned, whereas the pipeline by default returns only the newly generated text.
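Why the full text matters can be sketched in plain Python. The sketch assumes the LangChain wrapper removes the prompt by slicing its length off the front of the returned text; this matches how early `HuggingFacePipeline` releases appear to behave, but treat it as an assumption. `strip_prompt` and the sample strings are hypothetical, and no model is loaded.

```python
def strip_prompt(prompt: str, generated_text: str) -> str:
    """Mimic a wrapper that assumes the output begins with the prompt (assumption)."""
    return generated_text[len(prompt):]

prompt = "Name one color."
completion = " Blue is a color."  # hypothetical model output

full_text = prompt + completion  # shape of the output with return_full_text=True
new_text_only = completion       # shape of the output with return_full_text=False

intact = strip_prompt(prompt, full_text)         # prompt removed, completion kept
truncated = strip_prompt(prompt, new_text_only)  # start of the completion is lost

print(repr(intact))
print(repr(truncated))
```

Under that assumption, returning only the new text would cause the wrapper to cut characters from the start of the answer, which is why the flag is set when building the pipeline.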

##### Main Use Cases of LangChain

- Summarization - Express the most important facts about a body of text or chat interaction

- Question Answering Over Documents - Use information held within documents to answer questions or queries

- Extraction - Pull structured data from a body of text or a user query

- Evaluation - Understand the quality of output from your application

- Querying Tabular Data - Pull data from databases or other tabular sources

- Code Understanding - Reason about and digest code

- Interacting with APIs - Query APIs and interact with the outside world

- Chatbots - A framework for back-and-forth interaction with a user, combined with memory, in a chat interface

- Agents - Use LLMs to make decisions about what to do next. Enable these decisions with tools.



In [None]:
!pip install --upgrade pip

!pip install "accelerate>=0.16.0,<1" "transformers[torch]>=4.28.1,<5" "torch>=1.13.1,<2"

!pip install langchain

!pip install unstructured
!pip install "unstructured[pdf]"



In [None]:
#!pip install "langchain>=0.0.139"

In [None]:
# download CV data
!wget https://btcampdata.s3.amazonaws.com/gen-ai-data/Private-Data.zip
!unzip Private-Data.zip

--2023-12-11 15:43:18--  https://btcampdata.s3.amazonaws.com/gen-ai-data/Private-Data.zip
Resolving btcampdata.s3.amazonaws.com (btcampdata.s3.amazonaws.com)... 52.219.95.28, 16.12.65.220, 52.219.92.76, ...
Connecting to btcampdata.s3.amazonaws.com (btcampdata.s3.amazonaws.com)|52.219.95.28|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 548959 (536K) [application/zip]
Saving to: ‘Private-Data.zip’


2023-12-11 15:43:18 (2.18 MB/s) - ‘Private-Data.zip’ saved [548959/548959]

Archive:  Private-Data.zip
   creating: Private-Data/
  inflating: __MACOSX/._Private-Data  
  inflating: Private-Data/CV4.pdf    
  inflating: __MACOSX/Private-Data/._CV4.pdf  
  inflating: Private-Data/CV5.pdf    
  inflating: __MACOSX/Private-Data/._CV5.pdf  
  inflating: Private-Data/CV7.pdf    
  inflating: __MACOSX/Private-Data/._CV7.pdf  
  inflating: Private-Data/CV6.pdf    
  inflating: __MACOSX/Private-Data/._CV6.pdf  
  inflating: Private-Data/CV2.pdf    
  inflating: __MACOSX/P

In [None]:
import torch
from transformers import pipeline

generate_text = pipeline(model="databricks/dolly-v2-12b", torch_dtype=torch.bfloat16,
                         trust_remote_code=True, device_map="auto", return_full_text=True)




In [None]:
from langchain import PromptTemplate, LLMChain
from langchain.llms import HuggingFacePipeline
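import unstructured  # imported to confirm the package is installed; UnstructuredFileLoader depends on it
from langchain.document_loaders import S3FileLoader  # only needed for the optional S3 example
from langchain.document_loaders import UnstructuredFileLoader  # loads local files via unstructured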

In [None]:
from unstructured.partition.auto import partition
#elements = partition(filename="example-docs/eml/fake-email.eml")



In [None]:
# template for an instruction with no input
prompt = PromptTemplate(
    input_variables=["instruction"],
    template="{instruction}")

# template for an instruction with input
prompt_with_context = PromptTemplate(
    input_variables=["instruction", "context"],
    template="{instruction}\n\nInput:\n{context}")

hf_pipeline = HuggingFacePipeline(pipeline=generate_text)

llm_chain = LLMChain(llm=hf_pipeline, prompt=prompt)
llm_context_chain = LLMChain(llm=hf_pipeline, prompt=prompt_with_context)
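To see what text the model actually receives, the two templates can be rendered by hand. This is a minimal sketch that uses plain `str.format` as a stand-in for `PromptTemplate.format`, so it runs without LangChain installed; `render` and the sample instruction and context are hypothetical.

```python
# Stand-ins for the two template strings defined in the cell above.
INSTRUCTION_TEMPLATE = "{instruction}"
CONTEXT_TEMPLATE = "{instruction}\n\nInput:\n{context}"

def render(template: str, **variables: str) -> str:
    """Fill the named placeholders, roughly as an f-string-style prompt template would."""
    return template.format(**variables)

rendered = render(
    CONTEXT_TEMPLATE,
    instruction="List the skills mentioned.",
    context="Python, SQL, and Docker.",  # hypothetical document text
)
print(rendered)
```

The instruction comes first and the document text follows under an `Input:` label, so the whole document text is sent to the model on every call.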

In [None]:
# Load a PDF file as context for LangChain.
# Alternative (not used here): load from S3 with S3FileLoader(bucket, key), e.g.
# loader = S3FileLoader("webage-genaidata", "Private-Data/CV2.pdf")
# data = loader.load()

loader = UnstructuredFileLoader("Private-Data/CV1.pdf")

data = loader.load()

In [None]:
context = data[0].page_content
print(llm_context_chain.predict(instruction="Provide the career summary of CHRISTOPHOER MORGAN, senior web developer?", context=context).lstrip())

In [None]:
print(llm_context_chain.predict(instruction="Name of certification of CHRISTOPHOER MORGAN, senior web developer?", context=context).lstrip())

Data Analyst Certification


In [None]:
print(llm_context_chain.predict(instruction="What are the Certifications completed by CHRISTOPHOER MORGAN, senior web developer?", context=context).lstrip())
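The cells above repeat the same `predict` call with different instructions against one context. A small helper could loop over them instead. This is only a sketch: `ask_all` is a hypothetical name, and `fake_predict` is a stand-in for `llm_context_chain.predict` so the example runs without loading a model.

```python
from typing import Callable, Dict, List

def ask_all(predict: Callable[..., str], instructions: List[str], context: str) -> Dict[str, str]:
    """Run each instruction against the same context and collect the stripped answers."""
    answers: Dict[str, str] = {}
    for instruction in instructions:
        answers[instruction] = predict(instruction=instruction, context=context).lstrip()
    return answers

# Hypothetical stand-in for llm_context_chain.predict; it only echoes its inputs.
def fake_predict(instruction: str, context: str) -> str:
    return f"  [answer to '{instruction}' using {len(context)} characters of context]"

questions = ["Provide the career summary.", "List the certifications."]
results = ask_all(fake_predict, questions, context="Sample CV text")
for question, answer in results.items():
    print(question, "->", answer)
```

With the real chain, the call would presumably look like `ask_all(llm_context_chain.predict, questions, context)`, since `predict` accepts the template variables as keyword arguments.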
