In [1]:
# !python.exe -m pip install --upgrade pip
# %pip install -q -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
# %pip install -q -U transformers
# %pip install -q -U langchain
# %pip install -q -U accelerate
# %pip install -q -U sentencepiece
# %pip install -q -U tiktoken
# %pip install -q -U sentence_transformers
# %pip install -q -U pandas
# %pip install -q -U tabulate

# This code creates the environment for LangChain to use your local LLM as a chat model

In [2]:
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM
from langchain.chat_models.base import BaseChatModel
from langchain.schema import BaseMessage, AIMessage, HumanMessage, SystemMessage, ChatResult, ChatGeneration
from typing import Optional, List
import pandas as pd



In [3]:
class LocalChatModel(BaseChatModel):
    tokenizer: LlamaTokenizer
    model: LlamaForCausalLM
    device: str
    other_kwargs: dict

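    def get_prompt(self, messages: List[BaseMessage]) -> str:
        # Prefix each message with the role label that matches its LangChain type
        prompt = []
        for message in messages:
            if isinstance(message, SystemMessage):
                prepend = "SYSTEM: "
            elif isinstance(message, HumanMessage):
                prepend = "USER: "
            elif isinstance(message, AIMessage):
                prepend = "ASSISTANT: "
            else:
                # Fail loudly instead of reusing a stale (or undefined) prefix
                raise ValueError(f"Unsupported message type: {type(message).__name__}")
            prompt.append(prepend + message.content)
        # End with an open assistant turn so the model writes the reply
        prompt.append("ASSISTANT: ")
        return "\n".join(prompt)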



    def _generate(self, messages: List[BaseMessage], stop: Optional[List[str]]=None)->ChatResult: # type: ignore
        # print(messages)
        prompt = self.get_prompt(messages)
        # print(prompt)
        inputs = self.tokenizer(prompt, return_tensors='pt') # type: ignore

        outputs = self.model.generate(inputs.input_ids.to(self.device), **self.other_kwargs) # type: ignore
        generated_text = self.tokenizer.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0] # type: ignore
        ai_message = AIMessage(content=generated_text.strip())
        chat_result = ChatResult(generations=[ChatGeneration(message=ai_message)])
        #print(chat_result)
        return chat_result

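    async def _agenerate(self, messages: List[BaseMessage], stop: Optional[List[str]]=None)->ChatResult: # type: ignore
        # Async generation is not implemented for this local wrapper
        raise NotImplementedError("LocalChatModel does not support async generation")

    @property
    def _llm_type(self) -> str:
        # Arbitrary identifier for this wrapper; any descriptive string works
        return "local-llama-chat"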
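
To make the prompt format concrete, here is a minimal standalone sketch of what `get_prompt` produces. It uses plain `(role, content)` tuples as a stand-in for LangChain message objects; the `ROLE_PREFIX` mapping and the `build_prompt` name are assumptions made for this illustration only.

```python
# Stand-in for LangChain messages: (role, content) tuples (an assumption for this sketch)
ROLE_PREFIX = {"system": "SYSTEM: ", "human": "USER: ", "ai": "ASSISTANT: "}


def build_prompt(messages):
    """Join role-prefixed messages and leave an open assistant turn at the end."""
    lines = []
    for role, content in messages:
        if role not in ROLE_PREFIX:
            raise ValueError(f"Unsupported role: {role}")
        lines.append(ROLE_PREFIX[role] + content)
    # The trailing "ASSISTANT: " cues the model to write the reply
    lines.append("ASSISTANT: ")
    return "\n".join(lines)


example_prompt = build_prompt([
    ("system", "You are a helpful assistant."),
    ("human", "Program that prints hello world."),
])
print(example_prompt)
```

Printing the prompt like this is a cheap way to check exactly what text the model sees before any tokens are generated.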


In [4]:
model_path = "./models/Llama-2-7b-chat-hf" # You will need to download and place the model files in the working directory / models / Llama-2-7b-chat-hf folder

In [5]:
tokenizer = LlamaTokenizer.from_pretrained(model_path)

In [6]:
model = LlamaForCausalLM.from_pretrained(
    model_path,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    device_map='auto',
    local_files_only=True
)

Loading checkpoint shards: 100%|██████████| 2/2 [00:05<00:00,  2.64s/it]


In [7]:
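device = 'cuda' if torch.cuda.is_available() else 'cpu'
# device_map='auto' has already placed the weights, so this call is redundant when the whole model sits on one device
model = model.to(device) # type: ignore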
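
As a rough guide to why the model is loaded with `torch_dtype=torch.float16`, the sketch below estimates the memory needed for the weights alone. The parameter count is an assumption taken from the "7b" in the model name, and the estimate ignores activations and the generation cache.

```python
def weights_size_gib(num_params, bytes_per_param):
    """Memory taken by the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


approx_params = 7_000_000_000  # assumed from the "7b" in the model name
fp16_gib = weights_size_gib(approx_params, 2)  # float16 uses 2 bytes per parameter
fp32_gib = weights_size_gib(approx_params, 4)  # float32 uses 4 bytes per parameter
print(f"float16: {fp16_gib:.1f} GiB, float32: {fp32_gib:.1f} GiB")
```

Half precision halves the footprint, which is often the difference between fitting on a single consumer GPU and not fitting at all.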

In [15]:
# Sample code to check if the LLM is working

chat = LocalChatModel(tokenizer=tokenizer, model=model, device=device, other_kwargs=dict(max_new_tokens=512))
response = chat([
    SystemMessage(content="You are a helpful assistant that specializes in Python code. Once the answer is given, do not add another USER query."),
    HumanMessage(content="Program that prints hello world.")
]).content
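
The call above returns only the newly generated text because `_generate` slices the prompt tokens off the front of the model output before decoding. For decoder-only models, `generate()` returns the prompt followed by the new tokens. A minimal sketch of that slicing with plain lists (the token ids are made up for illustration):

```python
def strip_prompt_tokens(output_ids, prompt_length):
    """Keep only the tokens generated after the prompt, for each sequence."""
    return [sequence[prompt_length:] for sequence in output_ids]


prompt_ids = [101, 7, 8, 9]           # hypothetical token ids for the prompt
output_ids = [prompt_ids + [42, 43]]  # model output: prompt followed by new tokens
new_token_ids = strip_prompt_tokens(output_ids, len(prompt_ids))
print(new_token_ids)
```

Without this step the decoded response would repeat the whole SYSTEM/USER prompt before the answer.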

In [16]:
print(response)

Sure! Here is a simple Python program that prints "Hello, World!" to the screen:
```
print("Hello, World!")
```
Does this help? Let me know if you have any other questions.


# From here begins the process of loading the CSV into Chroma DB
## For this section, I found the below link very helpful
### https://towardsai.net/p/machine-learning/query-your-dataframes-with-powerful-large-language-models-using-langchain

In [10]:
df = pd.read_csv('employee_reviews.csv')

In [19]:
df.head()

Unnamed: 0,Review Date,Employee ID,Department,Strengths,Weaknesses,Training Needs
0,2023-06-08,1,Finance,-Analytical\n-Detail Oriented\n-Able to Handle...,There are many weaknesses that an employee in ...,-An understanding of financial statements\n-An...
1,2023-06-18,2,Operations,-An eye for detail\n-Organized\n-Ability to mu...,1) They may not have experience with managing ...,An employee in the Operations department will ...
2,2023-06-14,3,Admin,Some strengths that an employee in the Admin d...,weaknesses for an employee in the Admin depart...,-An employee in the Admin department will need...
3,2023-06-27,4,Operations,Some strengths that an employee in the Operati...,- May be not be as efficient when working inde...,The Operations department requires employees t...
4,2023-06-16,5,Admin,The Admin department is responsible for the da...,An employee in the Admin department may have d...,-How to use the office phone system\n-How to f...


In [27]:
df['Merged Column'] = "Feedback for the employee in department of " + df['Department'] + ". It's strengths are " + df['Strengths'] + ". These are the weaknesses, " + df['Weaknesses'] + '. Apart from this, here are some training needs ' + df['Training Needs']
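
One thing to watch with the concatenation above: in pandas, adding a string to a missing value (`NaN`) yields `NaN`, so a single empty cell would blank out the whole merged text for that row. Below is a hedged sketch of a row-wise alternative that substitutes a placeholder instead. The `merge_review` helper, the placeholder wording, and the shortened labels are assumptions, not part of the original notebook.

```python
def merge_review(row):
    """Build one text blob per review; missing fields become 'not provided'."""
    def field(name):
        value = row.get(name)
        # NaN is a float, so anything that is not a non-empty string counts as missing
        if isinstance(value, str) and value.strip():
            return value.strip()
        return "not provided"

    return (
        f"Feedback for the employee in department of {field('Department')}. "
        f"Strengths: {field('Strengths')}. "
        f"Weaknesses: {field('Weaknesses')}. "
        f"Training needs: {field('Training Needs')}"
    )


# Hypothetical row with a missing 'Weaknesses' value
sample_row = {
    "Department": "Admin",
    "Strengths": "-Organized",
    "Weaknesses": None,
    "Training Needs": "-How to use the office phone system",
}
merged = merge_review(sample_row)
print(merged)
```

With a DataFrame, the same helper could be applied as `df.apply(merge_review, axis=1)`, since a pandas row also supports `.get`.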

In [28]:
print(df['Merged Column'][3])

Feedback for the employee in department of Operations. It's strengths are Some strengths that an employee in the Operations department might have include:
- Ability to work well under pressure and meet deadlines
- Strong attention to detail
- Ability to multitask and prioritize tasks
- Strong organizational skills
- Good communication skills. These are the weaknesses, - May be not be as efficient when working independently
- Organizations may find it difficult to capitalize on employee's strengths
- Communication skills may need improvement
- May need more guidance and supervision. Apart from this, here are some training needs The Operations department requires employees to have knowledge of production and manufacturing processes, quality control, andMr. Jenkins has over ten years of experience in product development, marketing, and operations management. Additionally, he has a degree in business administration from the University of Alabama. warehousing. They also need to be able to u

In [29]:
from langchain.document_loaders import DataFrameLoader

In [30]:
from langchain.vectorstores import Chroma

In [31]:
df_loader = DataFrameLoader(df, page_content_column='Merged Column')

In [32]:
df_document = df_loader.load()
display(df_document)

[Document(page_content="Feedback for the employee in department of Finance. It's strengths are -Analytical\n-Detail Oriented\n-Able to Handle large workloads\n-Proactive\n\nSome weaknesses for an employee in the finance department: \n\n-Too detail oriented \n-Missing the big picture \n-Inflexible \n-Resistant to change. These are the weaknesses, There are many weaknesses that an employee in the finance department may have. One weakness may be that they are not very good at communication. Another weakness may be that they are not very good at working with numbers.. Apart from this, here are some training needs -An understanding of financial statements\n-An understanding of GAAP\n-An understanding of financial ratios\n-An understanding of the time value of money\n-An understanding of risk and return", metadata={'Review Date': '2023-06-08', 'Employee ID': 1, 'Department': 'Finance', 'Strengths': '-Analytical\n-Detail Oriented\n-Able to Handle large workloads\n-Proactive\n\nSome weaknesses

In [33]:
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings


In [38]:
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=10)
texts = text_splitter.split_documents(df_document)

Created a chunk of size 321, which is longer than the specified 200
Created a chunk of size 390, which is longer than the specified 200
Created a chunk of size 600, which is longer than the specified 200
Created a chunk of size 418, which is longer than the specified 200
Created a chunk of size 305, which is longer than the specified 200
Created a chunk of size 868, which is longer than the specified 200
Created a chunk of size 552, which is longer than the specified 200
Created a chunk of size 272, which is longer than the specified 200
Created a chunk of size 238, which is longer than the specified 200
Created a chunk of size 811, which is longer than the specified 200
Created a chunk of size 559, which is longer than the specified 200
Created a chunk of size 555, which is longer than the specified 200
Created a chunk of size 266, which is longer than the specified 200
Created a chunk of size 213, which is longer than the specified 200
Created a chunk of size 317, which is longer tha
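
The warnings above appear because `CharacterTextSplitter` only cuts at its separator (by default a blank line, `"\n\n"`), so any stretch of text without that separator stays in one piece even when it is longer than `chunk_size`. The sketch below is a simplified stand-in for that behavior, not LangChain's actual implementation, and it ignores `chunk_overlap`.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Split on `separator`, then greedily merge pieces up to `chunk_size` characters."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # A piece with no separator inside is kept whole, even if oversized
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample_text = "a" * 300 + "\n\n" + "b" * 50 + "\n\n" + "c" * 100
chunk_lengths = [len(chunk) for chunk in split_on_separator(sample_text, chunk_size=200)]
print(chunk_lengths)
```

If tighter chunks are needed, `RecursiveCharacterTextSplitter` from `langchain.text_splitter` may be worth trying, since it falls back to smaller separators when a piece is still too long.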

In [39]:
embedding_function = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

In [40]:
chromadb_index = Chroma.from_documents(
 texts, embedding_function, persist_directory='./input'
)

# From here we start querying the data

In [41]:
from langchain.chains import RetrievalQA

In [42]:
retriever = chromadb_index.as_retriever()
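
Conceptually, the retriever embeds the question and returns the stored chunks whose vectors are closest to it. The toy sketch below shows that idea with hand-written 3-dimensional vectors and cosine similarity. The vectors, the `top_k` helper, and the choice of cosine similarity are assumptions for illustration; the metric and the number of results Chroma actually uses depend on its configuration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector, indexed_chunks, k=2):
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical index of (chunk text, embedding) pairs
toy_index = [
    ("Admin training needs", [0.9, 0.1, 0.0]),
    ("Finance strengths", [0.1, 0.9, 0.0]),
    ("Admin weaknesses", [0.7, 0.0, 0.3]),
]
closest = top_k([1.0, 0.0, 0.0], toy_index, k=2)
print(closest)
```

Note that a chunk about weaknesses can rank close to a question about training needs when both mention the same department, which is one way unrelated text ends up in the context.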

In [43]:
document_qa = RetrievalQA.from_chain_type(
 llm=chat, chain_type="stuff", retriever=retriever
)
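
The `"stuff"` chain type places every retrieved chunk into a single prompt ahead of the question. Here is a minimal sketch of that assembly step. The template wording is an assumption and not LangChain's actual prompt, and the chunk strings are hypothetical.

```python
def stuff_prompt(question, chunks):
    """Concatenate all retrieved chunks into one context block ahead of the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


stuffed = stuff_prompt(
    "What training needs are needed for people in Admin?",
    ["(retrieved chunk 1)", "(retrieved chunk 2)"],  # hypothetical retrieved chunks
)
print(stuffed)
```

Because everything goes into one prompt, the retrieved chunks and the question together have to fit within the model's context window.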

In [46]:
response = document_qa.run("What training needs are needed for people in Admin?")

In [47]:
print(response)

Based on the information provided, some training needs for people in the Admin department may include:

* Excellent communication skills
* Excellent organizational skills
* Attention to detail
* Ability to work independently
* Strong work ethic

* Using basic office equipment such as a computer, printer, and telephone
* Effectively communicating with co-workers and customers
* Handling customer inquiries and complaints in a professional manner
* Typing accurately

It is important to note that the specific training needs may vary depending on the company and the position within the Admin department. Additionally, if an employee has weaknesses in certain areas, such as difficulty handling a lot of paperwork or dealing with angry customers, additional training may be necessary to help them improve in those areas.


# As you can see, I am starting to get results, but the quality still needs improvement, so next I rebuild the QA chain with source documents returned and verbose output

In [56]:
# create the chain to answer questions
qa_chain = RetrievalQA.from_chain_type(llm=chat,
                                  chain_type="stuff",
                                  retriever=retriever,
                                  return_source_documents=True,
                                  verbose=True)

In [57]:
query = "What are the training needs for operations?"
llm_response = qa_chain(query)

> Entering new RetrievalQA chain...

> Finished chain.


In [59]:
print(llm_response['result'])

Based on the context provided, some of the training needs for employees in the Operations department may include:

1. Lack of knowledge in certain areas.
2. Poor communication skills.
3. Lack of experience.
4. Not being able to work well under pressure.
5. Poor time management skills.
6. Lack of creativity and innovation.

Please note that these are just some possible training needs and may vary depending on the specific company and its operations. If you have any further questions, feel free to ask.
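
The answer above reads more like a list of weaknesses than training needs, which may mean the retriever returned chunks from the weaknesses part of the reviews. Since the chain was built with `return_source_documents=True`, the response also carries a `source_documents` entry that can be inspected. The sketch below uses a small stand-in class and a made-up response so it runs on its own; `FakeDocument` and `summarize_sources` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in for a LangChain Document (hypothetical, for this sketch only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_sources(response, preview_chars=60):
    """Return one short line per retrieved chunk: department plus a text preview."""
    lines = []
    for doc in response["source_documents"]:
        department = doc.metadata.get("Department", "unknown")
        preview = doc.page_content[:preview_chars].replace("\n", " ")
        lines.append(f"[{department}] {preview}")
    return lines


# Made-up response with the same keys the chain returns
fake_response = {
    "query": "What are the training needs for operations?",
    "result": "(model answer)",
    "source_documents": [
        FakeDocument(
            "- Communication skills may need improvement\n- May need more guidance and supervision",
            {"Department": "Operations"},
        )
    ],
}
source_lines = summarize_sources(fake_response)
print("\n".join(source_lines))
```

In the notebook this could be called as `summarize_sources(llm_response)` to see which chunks fed each answer.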


# This code is still not perfect but I feel it's in the right direction.
# Happy to get inputs from my peers here if there's a better way to approach this.