#### Imports

In [184]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq
from typing import Annotated, List, Literal, Optional
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_core.tools import tool
import getpass
import numpy as np
from datasets import load_dataset
from prompt_poet import Prompt
from tqdm import tqdm
import pandas as pd
from devtools import pprint

#### Prompt

In [5]:
groq_api_key = getpass.getpass()

 ········


In [112]:
MaxLengthStr = Annotated[str, Field(max_length=20)]

class JobDescriptionExtraction(BaseModel):
    job_title: str = Field(description="The Job title of the text, keep the title short, do not include any locations or other details apart from title")
    tech_skills: List[MaxLengthStr] = Field(description="Technical Skills mentioned in the text")
    soft_skills: List[MaxLengthStr] = Field(description="Soft Skills mentioned in the text")
    certifications: List[MaxLengthStr] = Field(description="Certifications mentioned in the text")
    locations: List[str] = Field(description="Geographical Locations mentioned in Job Description if any. Otherwise return an empty list")

Let's say we want to extract certain entities from raw text. The generic form of this task can be formulated as a token classification problem, for example POS tagging or named entity recognition (NER).

#### Traditional Methods

- Create data labels at the token or word level.
- For multiword phrases, tag tokens with a Begin-Inside-End style scheme (for example BIO or BIOES).
- Train sequential models such as HMMs or LSTMs.
- Labelling is labor-intensive.
- Feature engineering is necessary.
- Models rarely generalize beyond the domain they were trained on.
- Distributional learning methods such as Word2vec help, but not by much.
- There are many applications in industry:
  - Aspect-level sentiment analysis: Product - Feature - Sentiment
  - Getting structured data from unstructured text
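
To make the multiword labelling step in the list above concrete, here is a minimal sketch using the common BIO (Begin, Inside, Outside) variant. The function name `bio_tags`, the `SKILL` label and the sample annotations are hypothetical and only for illustration; they do not come from any dataset used in this notebook.

```python
from typing import List, Tuple


def bio_tags(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end, label) token spans (end exclusive) into BIO tags."""
    tags = ["O"] * len(tokens)  # every token starts as "Outside"
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the phrase
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the phrase
    return tags


# Hypothetical annotation: "time series forecasting" and "SQL" are skills.
tokens = ["Experience", "with", "time", "series", "forecasting", "and", "SQL"]
spans = [(2, 5, "SKILL"), (6, 7, "SKILL")]
tags = bio_tags(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Every token needs a tag like this before a sequence model can be trained, which is where most of the labelling effort goes.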

As an example, let's see whether a pretrained LLM can extract entities directly from a job description using a zero-shot approach.

#### Entity Extraction

In [113]:
raw_template = """
- name: system instructions
  role: system
  content: |
    You are an expert in classifying a given text into a job title and extracting properties defined in the JobDescriptionExtraction function. 
    Do not respond with anything other than the text mentioned in the text.

- name: user query
  role: user
  content: |
    Please extract properties defined in the JobDescriptionExtraction function: 
    {{ escape_special_characters(text) }}
"""

In [114]:
template_data = {"text" : '''
Data Scientist VP - Chief Data Office India, Bengaluruor Mumbai

Description

As a Data Scientist with the Chief Data Office, you will shape the future of the Chief Administrative Office and its businesses by applying world-class machine learning expertise. You will collaborate on a wide array of product and business problems with a diverse set of cross-functional partners across Finance, Supplier Services, Data Security intelligence program, Global Real Estate and Customer Experience. You will use data and analysis to identify and solve our divisions biggest challenges and develop state-of-the art machine learning models to solve real-world problems. We have evolved from our ‘startup’ roots to become a credible strategic partner trusted by division wide leadership and are expanding now. By joining JP Morgan Chief Data Office (CAO), you will become part of a world-class Data science community dedicated to problem solving and career growth in ML/AI discipline and beyond.

Product Owner: Develop and own ML products to drive business outcomes and influence your strategic partners, in a highly collaborative environment
Research & Learning: The candidate must also have a strong passion for machine learning and invest independent time towards learning, researching, and experimenting with new innovations in the field.
Problem-Solving: We want a strategic thinker with demonstrated problem-solving skills using Machine Learning Skills.

Technical Skills

Master’s in quantitative field (Computer Science, Mathematics, Statistics, or ML)
6-8 years industry experience in data science / applied ML model development (must have)
Strong knowledge and experience with Traditional ML, Deep Learning,LLM, NLP, time-series predictions, or recommendation systems (must have)
Excellent python coding and algorithm skills (must have)
Experience with data visualization techniques and software
Foundational Statistics knowledge

Additional Skills

Experience driving AI adoption
Experience with Data Querying (e.g., SQL, big data), A/B Testing
Experience with Cloud based deployment (e.g., aws, azure), Engineering background
Experience with python frameworks (e.g., pyspark, django, Flask, Bottle)
Experience in financial markets or services firm

ABOUT US

JPMorgan Chase & Co., one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world’s most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management.

We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants’ and employees’ religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation.

About The Team

Our professionals in our Corporate Functions cover a diverse range of areas from finance and risk to human resources and marketing. Our corporate teams are an essential part of our company, ensuring that we’re setting our businesses, clients, customers and employees up for success.

'''}

In [115]:
llm_job_extraction = ChatGroq(
    temperature=0,
    model_name="llama3-groq-70b-8192-tool-use-preview",
    api_key=groq_api_key,
).with_structured_output(JobDescriptionExtraction)

In [116]:
prompt = Prompt(
    raw_template=raw_template,
    template_data=template_data
)

In [117]:
prompt.messages

[{'role': 'system', 'content': 'You are an expert in classifying a given text into a job title and extracting properties defined in the JobDescriptionExtraction function. \nDo not respond with anything other than the text mentioned in the text.'}, {'role': 'user', 'content': 'Please extract properties defined in the JobDescriptionExtraction function: \n\nData Scientist VP - Chief Data Office India, Bengaluruor Mumbai\n\nDescription\n\nAs a Data Scientist with the Chief Data Office, you will shape the future of the Chief Administrative Office and its businesses by applying world-class machine learning expertise. You will collaborate on a wide array of product and business problems with a diverse set of cross-functional partners across Finance, Supplier Services, Data Security intelligence program, Global Real Estate and Customer Experience. You will use data and analysis to identify and solve our divisions biggest challenges and develop state-of-the art machine learning models to solve 

In [118]:
result = llm_job_extraction.invoke(prompt.messages)

In [119]:
pprint(result)

JobDescriptionExtraction(
    job_title='Data Scientist VP - Chief Data Office',
    tech_skills=[
        'Machine Learning',
        'Deep Learning',
        'LLM',
        'NLP',
        'time-series predictions',
        'recommendation systems',
        'Python',
        'SQL',
        'Big Data',
        'AWS',
        'Azure',
        'Pyspark',
        'Django',
        'Flask',
        'Bottle',
    ],
    soft_skills=[
        'Strategic thinking',
        'Problem-solving',
        'Collaboration',
        'Communication',
        'Adaptability',
        'Innovation',
    ],
    certifications=[],
    locations=[
        'India',
        'Bengaluru',
        'Mumbai',
    ],
)
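
Two things in the output above are worth checking after the fact. `'time-series predictions'` and `'recommendation systems'` are longer than 20 characters, so the `max_length=20` constraint on `MaxLengthStr` does not appear to have been enforced in this run. Some soft skills may also be inferred rather than quoted from the text. The sketch below is one possible audit, not part of the original pipeline: `audit_terms` is a hypothetical helper, and the sample text and terms are shortened stand-ins.

```python
from typing import Dict, List


def audit_terms(terms: List[str], source_text: str, max_length: int = 20) -> Dict[str, List[str]]:
    """Flag extracted terms that exceed a length limit or do not occur in the source text."""
    lowered = source_text.lower()
    return {
        # Terms longer than the limit the schema was meant to impose.
        "too_long": [t for t in terms if len(t) > max_length],
        # Crude grounding check: case-insensitive exact substring match.
        "not_in_text": [t for t in terms if t.lower() not in lowered],
    }


# Shortened stand-ins for the job description and the extracted skills.
sample_text = "Strong knowledge of NLP, time-series predictions, or recommendation systems."
sample_terms = ["NLP", "time-series predictions", "recommendation systems", "Communication"]

report = audit_terms(sample_terms, sample_text)
print(report)
```

The substring check is deliberately strict. It would also flag a paraphrase such as "Collaboration" when the text only says "collaborate", so flagged terms are candidates for review rather than confirmed errors.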


#### Aspect-level Sentiment

In [281]:
raw_template = """
- name: system instructions
  role: system
  content: |
    You are an expert in identifying the sentiment of the review of a product into a positive, negative or neutral and extracting properties defined in the AspectLevelSentiments function. 
    Do not respond with anything other than the text mentioned in the text.

- name: user query
  role: user
  content: |
    Please extract properties defined in the AspectLevelSentiments function: 
    {{ escape_special_characters(text) }}
"""

In [282]:
template_data = {"text": '''
Pros
1. Very good looking, especially the Oasis green variant.
2. Very smooth without any stutters
3. No heating in normal use and I am not a gamer.
4. Longer software updates
5. Good display and charges on 32 mins.
Cons
1. Average cameras
2. Display should have been better in outdoor brightness.
3. Battery drains faster even in power saver mode. Lasts only a day with average normal usage.
4. Software experience in oxygen OS has been degraded and with some bugs.

Overall an above average experience with the Nord 4.
'''}

In [289]:
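class AspectLevelSentiment(BaseModel):
    product_aspect: Optional[str] = Field(description="What aspect of the product is the user talking about")
    sentiment: Optional[str] = Field(enum=["positive", "negative", "neutral"])
    sentiment_term: Optional[str] = Field(description="term used to describe the sentiment on the aspect")

# Note: subclassing AspectLevelSentiment means this wrapper also inherits
# product_aspect, sentiment and sentiment_term (they appear as None in the
# output). Subclassing BaseModel directly would avoid those extra fields.
class AspectLevelSentiments(AspectLevelSentiment):
    Sentiments: List[AspectLevelSentiment]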

In [290]:
prompt = Prompt(
    raw_template=raw_template,
    template_data=template_data
)

In [291]:
prompt.messages

[{'role': 'system', 'content': 'You are an expert in identifying the sentiment of the review of a product into a positive, negative or neutral and extracting properties defined in the AspectLevelSentiments function. \nDo not respond with anything other than the text mentioned in the text.'}, {'role': 'user', 'content': 'Please extract properties defined in the AspectLevelSentiments function: \n\nPros\n1. Very good looking, especially the Oasis green variant.\n2. Very smooth without any stutters\n3. No heating in normal use and I am not a gamer.\n4. Longer software updates\n5. Good display and charges on 32 mins.\nCons\n1. Average cameras\n2. Display should have been better in outdoor brightness.\n3. Battery drains faster even in power saver mode. Lasts only a day with average normal usage.\n4. Software experience in oxygen OS has been degraded and with some bugs.\n\nOverall an above average experience with the Nord 4.\n'}]

In [292]:
llm_aspect_sentiment = ChatGroq(
    temperature=0,
    model_name="llama3-groq-70b-8192-tool-use-preview",
    api_key=groq_api_key,
).with_structured_output(AspectLevelSentiments)

In [293]:
result = llm_aspect_sentiment.invoke(prompt.messages)

In [294]:
pprint(result)

AspectLevelSentiments(
    product_aspect=None,
    sentiment=None,
    sentiment_term=None,
    Sentiments=[
        AspectLevelSentiment(
            product_aspect='appearance',
            sentiment='positive',
            sentiment_term='good looking',
        ),
        AspectLevelSentiment(
            product_aspect='performance',
            sentiment='positive',
            sentiment_term='smooth',
        ),
        AspectLevelSentiment(
            product_aspect='battery life',
            sentiment='positive',
            sentiment_term='no heating',
        ),
        AspectLevelSentiment(
            product_aspect='software updates',
            sentiment='positive',
            sentiment_term='longer',
        ),
        AspectLevelSentiment(
            product_aspect='display',
            sentiment='positive',
            sentiment_term='good',
        ),
        AspectLevelSentiment(
            product_aspect='battery life',
            sentiment='negative',
    

In [295]:
template_data = {"text": '''
Realme Buds T300 are a great choice for this price range! Here's what I observed with the unit of product I received:

Charging : They support Fast Charging. The most interesting and important part about this is that realme claims that 10 minutes of charge can provide around 7 hours of playback which I found to be true.

Mic Quality : I tested the mic quality and it seemed interesting. It apparently depends on which mode you're using these buds which heavily affects the mic quality. For example, when I call with this without Noise Cancellation on, it actually cancels the noise from my audio. But when I don't use Noise Cancellation, it doesn't suppresses the background noise. And there's this issue I'm having with the device that my phone or my PC doesn't seems to be able to use the mic of this device while I'm not in a call. Mic quality is fine.

Volume Control & Sound Quality : Here comes the best part. The Bass on this thing is extremely good. You can actually make it even better by using the realme link app to change the modes on this device. All in all, it's great for music listeners. This product is recommended if you're looking for a good music listening experience at the this price range.

Appearance : There is no doubt that this thing looks absolutely professional. It has a really sleek design and it feels very smooth when you hold it.

Build Quality : I don't know if it's just me but I feel like the build quality of this thing isn't that good. Especially the lid of the case. When I open the case, I have a feeling that the lid might break if I push it a bit too hard. Of course, that wouldn't happen and I tried it. The lid of the case, overall feels a bit loose. I don't know if that's a defect in the unit I got or is it the standard. Either way, I'm not really satisfied with this defect.

Button Control : There is a button to control media on this device. There are three possible combinations that one can use on both the sides of the earbuds. It comes with defaults but you can configure the buttons in the realme link app. I personally found the usage of buttons quite limited. But it's good for simplicity of controls.

Noise Cancellation : BY FAR, THE BEST PART. Personally, I'm someone who comes from a community that uses headphones a lot. I have one wired headphones with me which used to be my main before I got this. I have ALWAYS wanted Noise Cancellation and since I used headphones that didn't support it. It was a big thing for me. This was the best part in my opinion of these earbuds. They support upto -30db of Noise Cancellation and it actually works damn well. To get Ambient Noise Cancellation in this budget is quite crazy.

Bluetooth 5.3 : It has Bluetooth 5.3 and Wake & Pair and in my opinion, it works pretty well.

IP55 Rating : It has a rating of IP55 so it's resistant from water splashes/sweat and light amount of dust. Of course, I had to test it and it seems that it really is resistant to water splashes and dust. I tested it in such environments and it still works great. No problemo.

Dolby Atmos : It also has Dolby Atmos which, personally, I didn't notice that much but when I was testing this feature, I shifted my focus to the 360° audio and it surely seemed like it works and pretty well at that. It could be my imagination though so don't take my word for it here.

App : You can use realme link app to further configure the settings of this device and it's pretty decent. You get options to change audio modes, button functionality, etc. So it's pretty decent.

In conclusion, I think that these earbuds are extremely good for their price range and you should definitely go for it if you're a music lover or you just need earbuds for day-to-day tasks.

I hope my review was helpful to the readers. Thank you for your quality time!

'''}

In [296]:
prompt = Prompt(
    raw_template=raw_template,
    template_data=template_data
)

In [297]:
prompt.messages

[{'role': 'system', 'content': 'You are an expert in identifying the sentiment of the review of a product into a positive, negative or neutral and extracting properties defined in the AspectLevelSentiments function. \nDo not respond with anything other than the text mentioned in the text.'}, {'role': 'user', 'content': "Please extract properties defined in the AspectLevelSentiments function: \n\nRealme Buds T300 are a great choice for this price range! Here's what I observed with the unit of product I received:\n\nCharging : They support Fast Charging. The most interesting and important part about this is that realme claims that 10 minutes of charge can provide around 7 hours of playback which I found to be true.\n\nMic Quality : I tested the mic quality and it seemed interesting. It apparently depends on which mode you're using these buds which heavily affects the mic quality. For example, when I call with this without Noise Cancellation on, it actually cancels the noise from my audio

In [298]:
llm_aspect_sentiment = ChatGroq(
    temperature=0,
    model_name="llama3-groq-70b-8192-tool-use-preview",
    api_key=groq_api_key,
).with_structured_output(AspectLevelSentiments)
result = llm_aspect_sentiment.invoke(prompt.messages)

In [299]:
pprint(result)

AspectLevelSentiments(
    product_aspect=None,
    sentiment=None,
    sentiment_term=None,
    Sentiments=[
        AspectLevelSentiment(
            product_aspect='Charging',
            sentiment='positive',
            sentiment_term='Fast Charging',
        ),
        AspectLevelSentiment(
            product_aspect='Mic Quality',
            sentiment='neutral',
            sentiment_term='Interesting',
        ),
        AspectLevelSentiment(
            product_aspect='Volume Control & Sound Quality',
            sentiment='positive',
            sentiment_term='Great for music listeners',
        ),
        AspectLevelSentiment(
            product_aspect='Appearance',
            sentiment='positive',
            sentiment_term='Professional',
        ),
        AspectLevelSentiment(
            product_aspect='Build Quality',
            sentiment='negative',
            sentiment_term='Not satisfied',
        ),
        AspectLevelSentiment(
            product_aspect='Bu
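
Once aspect-level results are available, a small summary step makes them easier to use downstream. The sketch below groups aspects by sentiment and rejects labels outside the three allowed values. It uses plain dictionaries as stand-ins for the `AspectLevelSentiment` objects, with values copied from the first five entries shown above. `group_by_sentiment` is a hypothetical helper, not part of the notebook.

```python
from collections import defaultdict
from typing import Dict, List

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def group_by_sentiment(rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group product aspects by sentiment label, rejecting unexpected labels."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        label = row["sentiment"]
        if label not in ALLOWED_SENTIMENTS:
            raise ValueError(f"unexpected sentiment label: {label!r}")
        grouped[label].append(row["product_aspect"])
    return dict(grouped)


# Stand-ins for the first five AspectLevelSentiment entries in the output.
rows = [
    {"product_aspect": "Charging", "sentiment": "positive"},
    {"product_aspect": "Mic Quality", "sentiment": "neutral"},
    {"product_aspect": "Volume Control & Sound Quality", "sentiment": "positive"},
    {"product_aspect": "Appearance", "sentiment": "positive"},
    {"product_aspect": "Build Quality", "sentiment": "negative"},
]

summary = group_by_sentiment(rows)
print(summary)
```

With the real result object, the same idea would iterate over `result.Sentiments` and read the attributes instead of dictionary keys.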

#### POS tagging

In [305]:
raw_template = """
- name: system instructions
  role: system
  content: |
    You are an expert in identifying the parts of speech tag of each word in the given text and extracting properties defined in the POSTagger function. 
    Do not respond with anything other than the text mentioned in the text.

- name: user query
  role: user
  content: |
    Please extract properties defined in the POSTagger function: 
    {{ escape_special_characters(text) }}
"""

In [311]:
class POSTag(BaseModel):
    word: Optional[str] = Field(description="Word within sentence")
    pos_tag: Optional[str] = Field(description="POS tag of the word")
    pos_description: Optional[str] = Field(description="Full form of the tag and short description")

class POSTagger(BaseModel):
    pos_tags: List[POSTag]
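
The schema above leaves `pos_tag` as a free-form string, so nothing stops the model from mixing tag sets. One possible check, assuming the Penn Treebank tag set is the intended one (the schema does not say), is to compare returned tags against a known list. `PTB_SUBSET` below is a deliberately small, incomplete subset and `unknown_tags` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Small, deliberately incomplete subset of Penn Treebank tags (assumed tag set).
PTB_SUBSET: Dict[str, str] = {
    "DT": "Determiner",
    "JJ": "Adjective",
    "NN": "Noun, singular or mass",
    "VBZ": "Verb, 3rd person singular present",
    "IN": "Preposition or subordinating conjunction",
}


def unknown_tags(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (word, tag) pairs whose tag is not in the known subset."""
    return [(word, tag) for word, tag in pairs if tag not in PTB_SUBSET]


# Hypothetical model output; "NOUN" is a Universal Dependencies style tag.
pairs = [
    ("The", "DT"),
    ("quick", "JJ"),
    ("fox", "NN"),
    ("jumps", "VBZ"),
    ("over", "IN"),
    ("dog", "NOUN"),
]

print(unknown_tags(pairs))
```

A lookup like this could also supply the tag description locally, so the model would not need to generate `pos_description` for every word.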

In [312]:
prompt = Prompt(
    raw_template=raw_template,
    template_data={"text": "The quick brown fox jumps over the lazy dog."}
) 

In [313]:
prompt.messages

[{'role': 'system', 'content': 'You are an expert in identifying the parts of speech tag of each word in the given text and extracting properties defined in the POSTagger function. \nDo not respond with anything other than the text mentioned in the text.'}, {'role': 'user', 'content': 'Please extract properties defined in the POSTagger function: \nThe quick brown fox jumps over the lazy dog.'}]

In [314]:
llm_pos_tagging = ChatGroq(
    temperature=0,
    model_name="llama3-groq-70b-8192-tool-use-preview",
    api_key=groq_api_key,
).with_structured_output(POSTagger)
result = llm_pos_tagging.invoke(prompt.messages)

In [315]:
pprint(result)

POSTagger(
    pos_tags=[
        POSTag(
            word='The',
            pos_tag='DT',
            pos_description='Determiner',
        ),
        POSTag(
            word='quick',
            pos_tag='JJ',
            pos_description='Adjective',
        ),
        POSTag(
            word='brown',
            pos_tag='JJ',
            pos_description='Adjective',
        ),
        POSTag(
            word='fox',
            pos_tag='NN',
            pos_description='Noun, singular or mass',
        ),
        POSTag(
            word='jumps',
            pos_tag='VBZ',
            pos_description='Verb, present tense, third person singular',
        ),
        POSTag(
            word='over',
            pos_tag='IN',
            pos_description='Preposition or subordinating conjunction',
        ),
        POSTag(
            word='the',
            pos_tag='DT',
            pos_description='Determiner',
        ),
        POSTag(
            word='lazy',
            pos_tag=