# Information extraction with LLMs

In this demo, we ask an LLM to summarize the skills needed in job postings. Our dataset is [LinkedIn Job Postings](https://www.kaggle.com/datasets/arshkon/linkedin-job-postings), and we use the posting descriptions as our data. We then use the [Ollama library](https://ollama.com/) to ask the model which skills the jobs require.

## Downloading the data

You can download the data directly from Kaggle or use Python to do it:

In [1]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("arshkon/linkedin-job-postings")

print("Path to dataset files:", path)

Path to dataset files: /Users/huhtis/.cache/kagglehub/datasets/arshkon/linkedin-job-postings/versions/13


## Installation of Ollama

Ollama is an easy-to-use tool for running LLMs locally. Install Ollama from [here](https://ollama.com/download). After installation, the `ollama` command-line tool should be available in your terminal.
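
A quick way to confirm the installation from Python is to look for the `ollama` executable on your `PATH`. This is a small sketch that uses only the standard library; the helper name `ollama_cli_available` is our own, and it assumes the installer placed `ollama` on the `PATH`, which may not hold for every setup.

```python
import shutil
import subprocess


def ollama_cli_available() -> bool:
    """Return True if an `ollama` executable is found on PATH (hypothetical helper)."""
    return shutil.which("ollama") is not None


if ollama_cli_available():
    # `ollama --version` prints the installed version; the exact text varies by release.
    result = subprocess.run(["ollama", "--version"], capture_output=True, text=True)
    print(result.stdout.strip() or result.stderr.strip())
else:
    print("ollama not found on PATH; install it from https://ollama.com/download")
```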

## About TinyLlama

TinyLlama is a project to pretrain a Llama model with 1.1B parameters, whereas the standard Llama 3 models have 8B and 70B parameters. TinyLlama is small enough to run on an ordinary computer.
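
To see why the parameter count matters on an ordinary computer, we can estimate the memory taken by the weights alone: number of parameters times bits per parameter, divided by 8 to get bytes. This rough sketch assumes 16-bit weights and, for comparison, 4-bit quantized weights; it ignores the context cache and runtime overhead, so real memory use is higher. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


for name, n_params in [("TinyLlama", 1.1e9), ("Llama 3 8B", 8e9), ("Llama 3 70B", 70e9)]:
    fp16 = weight_memory_gb(n_params, 16)  # assumed 16-bit weights
    q4 = weight_memory_gb(n_params, 4)  # assumed 4-bit quantized weights
    print(f"{name:12s} 16-bit: {fp16:6.1f} GB   4-bit: {q4:6.1f} GB")
```

At 16 bits, TinyLlama's weights need roughly 2.2 GB, while the 70B Llama 3 model needs roughly 140 GB, far more than a typical laptop has.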

To download the TinyLlama model, run:

<kbd>ollama pull tinyllama</kbd>

## 1. Example of a summary 

Let's use TinyLlama to extract the skills needed for a single job. Our example is the following description of an Account Manager position. We ask the TinyLlama model, through Ollama, to summarize the skills needed for the job in five points.
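
When the same extraction is run for many postings, it helps to build the prompt with a function instead of pasting each description by hand. The sketch below is a hypothetical helper (`build_skills_prompt` is our name, not part of Ollama), and its wording is an assumption: it asks more explicitly for a numbered list of skills only, which you may want to compare with the prompt used in the notebook cell below. The `messages` list has the same role/content shape that `ollama.chat` receives in that cell.

```python
def build_skills_prompt(description: str, n_points: int = 5) -> str:
    """Return a skill-extraction prompt for one job description (illustrative wording)."""
    return (
        f"The following text is a job description. List exactly {n_points} skills "
        f"needed in the job as a numbered list from 1 to {n_points}, one skill per line. "
        "Do not list benefits, salary, or company information.\n\n"
        f"Job description:\n{description}"
    )


# Chat messages in the same role/content format used with ollama.chat in this notebook.
messages = [
    {
        "role": "user",
        "content": build_skills_prompt("We are hiring a data analyst with SQL and Python experience."),
    }
]
print(messages[0]["content"])
```

A more explicit prompt does not guarantee that a small model such as TinyLlama follows the requested format, so the answers should still be checked.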

#### Example job description

Company Description

ShiftRx is an early-stage (seed) tech company based in Austin, TX. We are focused on solving the critical healthcare provider shortage by streamlining the sourcing, credentialing, scheduling, and payment process for temporary healthcare providers. Leveraging generative AI and large language models, we develop customized onboarding modules to seamlessly integrate providers into new clinical environments. Our mission is to transform healthcare staffing and enable providers to focus on delivering the best patient care.

Role Description

This is a full-time on-site role as an Account Manager / Go-To-Market (GTM) Specialist at ShiftRx. This is an opportunity to join as one of the first 10 hires and founding members of the team at an early-stage company building in the AI + healthcare space. You'll work on the enterprise sales pipeline, help drive growth, and prospect and close exciting key healthcare facility accounts. As a foundational member of the sales team, you will play a critical role in shaping the direction and success of ShiftRx. This position offers the unique opportunity to build sales strategies from the ground up, ensuring our service delivery is seamless, scalable, and effectively meets the needs of our clients. We are seeking a driven and dynamic Sales Account Manager to join our fast-paced, B2B2C startup. This is not your typical 9-5 role; it's an opportunity to dive into a rapidly growing company, based out of ATX.

Key Responsibilities

- Identify and develop new business opportunities with healthcare facilities and hospitals
- Manage and grow relationships with existing clients
- Understand and communicate the value proposition of AI and healthcare technology solutions
- Work closely with the team to develop and execute sales strategies
- Provide feedback and insights from clients to improve our products and services
- Excellent communication and interpersonal skills
- Analyze data and generate reports

Qualifications

- Problem-solving and decision-making abilities
- Strong attention to detail and accuracy
- Ability to work in a fast-paced and high-pressure environment
- Strong organizational and time management skills
- Minimum of 2 years of sales experience, preferably in a fast-paced, startup environment.
- Experience in selling to healthcare facilities and hospitals is highly desirable.
- Proven track record of achieving sales targets + proficient in CRM software.
- Strong communication and negotiation skills.
- Ability to work independently and as part of a team.
- Early-stage (Seed, Series A, Series B) startup experience is required
- Must live in the Austin area or be willing to relocate

Additional qualifications and skills that would be beneficial for the role include a background in healthcare tech, understanding of clinical workflows, and/ or experience with AI and language models.

What We Offer

- A chance to be part of an early, scrappy team committed to transforming healthcare staffing.
- Significant growth opportunities and the ability to influence major company decisions.
- Competitive salary, equity options, and comprehensive health benefits.
- Ability to drive impact in healthcare to help solve the healthcare provider shortage.
- A unique opportunity to grow into a sales leadership role and help build the team.
- Unlimited variable compensation with attractive commissions.

Why Join Us?

At ShiftRx, you're not just taking a job; you're seizing an opportunity to be at the forefront of a rapidly evolving field. You'll be making real impacts, not only in our company but in the lives of healthcare providers and patients. If you're ready to make a difference and propel your career in an exciting direction, we want to hear from you.

### Results of an example summarization

The results show that the answer is not structured as five points. Moreover, not everything it lists is a skill needed for the job: it also includes things the company offers, such as equity options, health benefits, and a competitive salary. In general, it lists more than five items.

In [2]:
# pip install ollama

In [3]:
import ollama

response = ollama.chat(
    model="tinyllama",
    messages=[
        {
            "role": "user",
            # Triple quotes let the prompt text span multiple lines without a syntax error.
            "content": """Following text is a job summary. Tell me with five points, which skills are needed in the job. Summary: Company DescriptionShiftRx is an early-stage (seed) tech company based in Austin, TX. We are focused on solving the critical healthcare provider shortage by streamlining the sourcing, credentialing, scheduling, and payment process for temporary healthcare providers. Leveraging generative AI and large language models, we develop customized onboarding modules to seamlessly integrate providers into new clinical environments. Our mission is to transform healthcare staffing and enable providers to focus on delivering the best patient care. Role DescriptionThis is a full-time on-site role as an Account Manager / Go-To-Market (GTM) Specialist at ShiftRx. This is an opportunity to join as one of the first 10 hires and founding members of the team at an early-stage company building in the AI + healthcare space. You'll work on the enterprise sales pipeline, help drive growth, and prospect and close exciting key healthcare facility accounts. As a foundational member of the sales team, you will play a critical role in shaping the direction and success of ShiftRx. This position offers the unique opportunity to build sales strategies from the ground up, ensuring our service delivery is seamless, scalable, and effectively meets the needs of our clients. We are seeking a driven and dynamic Sales Account Manager to join our fast-paced, B2B2C startup. This is not your typical 9-5 role; it's an opportunity to dive into a rapidly growing company, based out of ATX. 
Key ResponsibilitiesIdentify and develop new business opportunities with healthcare facilities and hospitalsManage and grow relationships with existing clientsUnderstand and communicate the value proposition of AI and healthcare technology solutionsWork closely with the team to develop and execute sales strategiesProvide feedback and insights from clients to improve our products and servicesExcellent communication and interpersonal skillsAnalyze data and generate reports QualificationsProblem-solving and decision-making abilitiesStrong attention to detail and accuracyAbility to work in a fast-paced and high-pressure environmentStrong organizational and time management skillsMinimum of 2 years of sales experience, preferably in a fast-paced, startup environment.Experience in selling to healthcare facilities and hospitals is highly desirable.Proven track record of achieving sales targets + proficient in CRM software.Strong communication and negotiation skills.Ability to work independently and as part of a team.Early-stage (Seed, Series A, Series B) startup experience is requiredMust live in the Austin area or be willing to relocate Additional qualifications and skills that would be beneficial for the role include a background in healthcare tech, understanding of clinical workflows, and/ or experience with AI and language models. What We OfferA chance to be part of an early, scrappy team committed to transforming healthcare staffing.Significant growth opportunities and the ability to influence major company decisions.Competitive salary, equity options, and comprehensive health benefits.Ability to drive impact in healthcare to help solve the healthcare provider shortage.A unique opportunity to grow into a sales leadership role and help build the team.Unlimited variable compensation with attractive commissions. Why Join Us? At ShiftRx, you're not just taking a job; you're seizing an opportunity to be at the forefront of a rapidly evolving field. 
You'll be making real impacts, not only in our company but in the lives of healthcare providers and patients. If you're ready to make a difference and propel your career in an exciting direction, we want to hear from you.""",
        },
    ],
    options={"seed": 42}
)
print(response["message"]["content"])

In the job summary, the skills needed for this position are identified as identifying and developing new business opportunities with healthcare facilities and hospital clients; managing and growing relationships with existing clients; undergoing data analysis and generating reports; problem-solving and decision-making abilities; strong organizational and time management skills; proven track record of achieving sales targets + proficient in CRM software; early-stage (seed, seed plus, startup) experience is required; a background in healthcare tecch or the ability to understand clinical workflows and/or experience with AI and languaure models; the opportunity to be part of an early, scrappy team committed to transforming healthcare staffing; signifiant growth opportunities and equity options; comprehensive health benefits, and competitive salary. The company is seeking a sales account manager with the skills listed above, as well as leadership potential that could drive impact in healthc
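
Whether an answer really contains five skills can be checked programmatically. The sketch below is one possible heuristic, not a standard method: it splits an answer into items (numbered or bulleted lines if there are any, otherwise semicolon-separated parts, as in the paragraph above) and flags items that mention typical job benefits. The helper names and the keyword list `PERK_WORDS` are our own assumptions, to be tuned for your data. In the notebook you would call it as `split_items(response["message"]["content"])`.

```python
import re

# Assumed keywords suggesting that an item describes a benefit rather than a skill.
PERK_WORDS = ("salary", "benefit", "equity", "compensation", "commission")

# A list marker at the start of a line: "1." or "1)", or "-", "*", "•", followed by whitespace.
LIST_MARKER = re.compile(r"^(?:\d+[.)]|[-*•])\s+")


def split_items(answer: str) -> list[str]:
    """Split a model answer into individual items."""
    lines = [line.strip() for line in answer.splitlines() if line.strip()]
    # Prefer explicit list lines; intro or closing sentences are ignored.
    listed = [LIST_MARKER.sub("", line) for line in lines if LIST_MARKER.match(line)]
    if listed:
        return listed
    # Fall back to semicolon-separated parts for paragraph-style answers.
    return [part.strip() for part in answer.split(";") if part.strip()]


def looks_like_perk(item: str) -> bool:
    """True if the item mentions one of the assumed benefit keywords."""
    lowered = item.lower()
    return any(word in lowered for word in PERK_WORDS)


answer = "Skills:\n1. Sales experience\n2. Proficiency in CRM software\n3. Competitive salary"
items = split_items(answer)
print(len(items), "items;", "perks:", [item for item in items if looks_like_perk(item)])
```

Applied to the TinyLlama answer above, the semicolon fallback should give clearly more than five items, and the parts mentioning equity options, health benefits, and salary would be flagged.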

## Skills for other postings 

Let's load the job postings file and extract skills from other job descriptions as well.

### Task: Describe how good the skill extractions are and reflect on why they turn out the way they do.

In [4]:
import pandas as pd

# df = pd.read_csv("postings.csv")

# Use this if you downloaded the dataset with kagglehub
df = pd.read_csv(path + "/postings.csv")
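
Many raw `description` values have list items glued together without separators; the prompt text in the first example contains `hospitalsManage` and `accuracyAbility`, for instance. A simple, admittedly rough heuristic is to insert a space wherever a lowercase letter or a period is directly followed by an uppercase letter. This is a sketch with a hypothetical helper name, and it also splits legitimate mixed-case words (`ShiftRx` becomes `Shift Rx`) and abbreviations such as `U.S.`, so treat it as a preprocessing experiment rather than a fix.

```python
import re

# Zero-width match between a lowercase letter or "." and an uppercase letter.
RUN_TOGETHER = re.compile(r"(?<=[a-z.])(?=[A-Z])")


def separate_run_together(text: str) -> str:
    """Insert a space at each lowercase/period-to-uppercase boundary."""
    return RUN_TOGETHER.sub(" ", text)


sample = "hospitalsManage and grow relationships.Strong attention to detail and accuracyAbility to work"
print(separate_run_together(sample))
```

With pandas, it could be applied to the whole column as `df["description"].fillna("").map(separate_run_together)`.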

In [5]:

# for row in df.head(20).itertuples()
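
The commented-out loop above hints at running the extraction over many postings. One way to sketch it, under our own assumptions, is a function that takes the descriptions and an `ask` callable that sends one prompt to the model and returns its text. Passing the model call in as a parameter keeps the loop testable without a running Ollama server. The names `extract_skills_for_postings` and `ask_tinyllama` are hypothetical, and the prompt reuses the wording of the single-posting example.

```python
from itertools import islice
from typing import Callable, Iterable


def extract_skills_for_postings(
    descriptions: Iterable[str],
    ask: Callable[[str], str],
    limit: int = 20,
) -> list[str]:
    """Send the single-posting prompt for the first `limit` descriptions and collect the answers."""
    answers = []
    for description in islice(descriptions, limit):
        prompt = (
            "Following text is a job summary. Tell me with five points, "
            f"which skills are needed in the job. Summary: {description}"
        )
        answers.append(ask(prompt))
    return answers


# With Ollama running, `ask` could wrap the same call as the first example:
#
# import ollama
#
# def ask_tinyllama(prompt):
#     response = ollama.chat(
#         model="tinyllama",
#         messages=[{"role": "user", "content": prompt}],
#         options={"seed": 42},
#     )
#     return response["message"]["content"]
#
# answers = extract_skills_for_postings(df["description"].dropna(), ask_tinyllama)
```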

In [6]:
# Look for postings located in Helsinki, Finland
df[df.location.str.contains("Helsinki", na=False, case=False)]

Unnamed: 0,job_id,company_name,title,description,max_salary,pay_period,location,company_id,views,med_salary,...,skills_desc,listed_time,posting_domain,sponsored,work_type,currency,compensation_type,normalized_salary,zip_code,fips

In [7]:
df.value_counts("location")

location
United States       8125
New York, NY        2756
Chicago, IL         1834
Houston, TX         1762
Dallas, TX          1383
                    ... 
Ramseur, NC            1
Canastota, NY          1
Lewisport, KY          1
Canal Fulton, OH       1
Girard, PA             1
Name: count, Length: 8526, dtype: int64

In [8]:
df.pay_period.value_counts()

pay_period
YEARLY      20628
HOURLY      14741
MONTHLY       518
WEEKLY        177
BIWEEKLY        9
Name: count, dtype: int64

In [9]:
def convert_to_euros(amount, pay_period, conversion_rate=0.85):
    """
    Convert an amount and pay period to an annual salary in euros.

    Parameters:
    - amount (float): The salary amount.
    - pay_period (str): The pay period (e.g., 'YEARLY', 'MONTHLY', 'WEEKLY', 'HOURLY', 'BIWEEKLY').
    - conversion_rate (float): The conversion rate from USD to EUR. Default is 0.85.

    Returns:
    - float: The annual salary in euros.
    """
    if pay_period == 'YEARLY':
        annual_salary = amount
    elif pay_period == 'MONTHLY':
        annual_salary = amount * 12
    elif pay_period == 'WEEKLY':
        annual_salary = amount * 52
    elif pay_period == 'BIWEEKLY':
        annual_salary = amount * 26
    elif pay_period == 'HOURLY':
        # Assuming 40 hours per week and 52 weeks per year
        annual_salary = amount * 40 * 52
    else:
        raise ValueError("Unsupported pay period")

    return annual_salary * conversion_rate
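
The same conversion can also be written with a lookup table of pay periods per year, which keeps the multipliers in one place and makes adding a new pay period a one-line change. This is an alternative sketch with a hypothetical name (`annual_salary_eur`); it keeps the original assumptions of a fixed 0.85 USD-to-EUR rate and 40 working hours per week for 52 weeks.

```python
# Number of pay periods in a year; HOURLY assumes 40 hours/week for 52 weeks.
PERIODS_PER_YEAR = {
    "YEARLY": 1,
    "MONTHLY": 12,
    "WEEKLY": 52,
    "BIWEEKLY": 26,
    "HOURLY": 40 * 52,
}


def annual_salary_eur(amount: float, pay_period: str, conversion_rate: float = 0.85) -> float:
    """Convert an amount paid per `pay_period` to an annual salary in euros."""
    try:
        multiplier = PERIODS_PER_YEAR[pay_period]
    except KeyError:
        raise ValueError(f"Unsupported pay period: {pay_period!r}") from None
    return amount * multiplier * conversion_rate


print(annual_salary_eur(25.0, "HOURLY"))
```

Rows with a missing `pay_period` (NaN) are not in the table and raise `ValueError`, just as with `convert_to_euros`, so filter them first, for example with `df.dropna(subset=["max_salary", "pay_period"])`.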

In [10]:
# Create a function that receives an amount and related pay period and returns the amount in euros
