# Auto-generation of advertisements
## Dependency
**Note**: To run this notebook, the following packages should be installed
- transformers
- pandas
- torch
- scikit-learn

I have successfully tested the notebook in Python 3.7+, it should also work in other Python3.x environment

## Overview
This notebook demonstrates a possible solution for generating job advertisement automatically with a pre-trained generative transformer model. Specially, this transformer model is a [GPT-2 model](https://openai.com/blog/better-language-models/) which is fine-tuned on the given meta information (e.g. job title, abstract, keyword etc.) related to the job ads.

The training, validation and test corpus used in this use case are ads containing the words *machine learning*, *data science/scientist*, *AI* and *artificial intelligence* in the job description of provided JSON file. Note that due to the time limitation, I did not go through the whole 50,000 ads and specially pick all well-written and concise ads out of these ads. But this solution may be used to assist advertisers to automatically generate new better written, concise ads, if we carefully prepare such a training corpus to "teach" the model what well-written and concise ads look like.

I followed [this blog](https://towardsdatascience.com/conditional-text-generation-by-fine-tuning-gpt-2-11c1a9fc639d) to fine-tune the pre-trained GPT-2 on our own ads dataset. For more technical details, please refer to ["Conditional Text Generation by Fine Tuning GPT-2"](https://towardsdatascience.com/conditional-text-generation-by-fine-tuning-gpt-2-11c1a9fc639d)

A quick look of the pipeline contained in this study case

<ol>
    <li>Prepare text corpus
        <ul>
            <li> Remove columns irrelevant to the task
            <li> Check whether the data contain missing values in "title", "abstract", "clean_text" etc.
        </ul>
    <li> Configuration of hyper-parameters
         <ul>
             <li> Configure hyper-parameters for dataset split and model training
             <li> Fix seeds used in dataset generation and model initialization
        </ul>
    <li> Load tokenizer
    <li> Create training and validation dataset
        <ul>
             <li> Define Dataset class for dataset generation
             <li> Split whole dataset into training and validation sets
        </ul>
    <li> Load language model
    <li> Fine-tuning of language model
        <ul>
             <li> Freeze the corresponding layers in language model
             <li> Fine tune the language model on our dataset
             <li> Save the fine-tuned model
        </ul>
    <li> Advertisement generation using the fine-tuned model
        <ul>
             <li> Load tokenizer and fine-tuned model
             <li> Generate prompt for ad generation
             <li> Ad generation using greedy search
             <li> Ad generation using beam search
        </ul>
</ol>

In [None]:
# Install dependencies
!pip install -r requirements.txt

Import all required packages

In [1]:
import pandas as pd
import os
import random
import numpy as np
import torch
from torch.utils.data import Dataset
from transformers import AutoTokenizer
from transformers import AutoConfig, AutoModelForPreTraining, TrainingArguments, Trainer

## Prepare text corpus

We selected the job ads that contains the words such as *machine learning*, *data science/scientist*, *AI* and *artificial intelligence* in the job description to form the text corpus that was used to fine-tune the pretrained GPT-2 language model for job ads generation. The aim of the fine-tuned job ads generator is to input the job title, job abstract (1-2 sentences), and some keywords(e.g. skills, responsibilities etc.) of the job description, the generator, ideally, would create the whole job advertisement with responsibility, skills required, and other relevant and detailed information.

To prepare the text corpus for fine-tuning the language model, we load the data frame, and only keep "id", "quality", "title", "abstract", "clean_text", "skills", "responsilibities" etc. (detailed job contents after preprocessing e.g. removal of HTML tags ) columns and remove other columns in the data frame

In [2]:
df_ads = pd.read_csv('JD_ML.csv', encoding='windows-1252')
remove_cols = [col for col in df_ads.columns if col not in ["id", "quality", "title", "abstract", "clean_text", "SKILLS", "RESPONSIBILITIES", "REQUIREMENTS", "EXPERIENCE", "QUALIFICATION"]]
df_ads.drop(columns=remove_cols, inplace=True)
print(f'find {len(df_ads)} examples')

find 373 examples


In [3]:
df_ads.head(10)

Unnamed: 0,id,quality,title,abstract,clean_text,SKILLS,RESPONSIBILITIES,REQUIREMENTS,EXPERIENCE,QUALIFICATION
0,38964460,bad,Senior Product Manager,Electronic Arts is looking for a full time Sen...,The Company We are EA! And we make games ????...,"strong quantitative background to create, bala...",Measure performance and adjust\nWork closely w...,"designing the game economy, and providing anal...",experience with data and metrics driven decisi...,
1,38996736,good,Social Media/membership and Events Assistant,An ideal entry level role initially working on...,Management of AIPS Website including: Ensurin...,Quarterly in managing of AIPS public Facebook/...,Ensuring all information is current and releva...,ability to increase the frequency and reach of...,"8 - 12 hrs hours per week, flexible hours and ...",Office management skills\n
2,38819733,bad,"Pricing Analyst's all levels, Sydney CBD",Superb roles with leading Financial Services f...,My client is a Financial Services organisatio...,SQL\nSAS\nPython\nability in the fullness of t...,working as part of a customer-facing team who ...,Candidates also need a solid understanding of ...,,
3,38979340,good,IT Designer,Our client is seeking an experienced IT Design...,Our client is seeking an experienced IT Desig...,Agile delivery\nshare knowledge and informatio...,"Analyse system, development or support issues ...",Interpret business requirements into applicati...,experience in working in a scaled agile enviro...,Australian Citizens\n
4,38912123,good,Project Manager - Service Delivery,5 years of experience working in managing end ...,Infosys is a global leader in next-generation...,"able to communicate by telephone, email\n",creating new avenues to generate value\nCommun...,Location in Australia\nAbility to work in team...,5 years of experience working in managing end ...,
5,38959448,good,"Aged Care Nurses. RN's, EN's & Cert 3 AIN/PCA'...",RNS Nursing is seeking experienced Aged Care (...,"JOB ROLE We are seeking experienced, compassi...",,Roster update availability Friendly supportive...,,,Cert 3 AIN/PCA's to provide a high quality of ...
6,38926745,good,NDT Technician,APTS are urgently seeking an experienced NDT T...,APTS Pty Ltd provides service excellence in t...,ability to commence immediately\nInterpret and...,"Mining, Power and Water Treatment\nSet up and ...",Perform and supervise tests\n,adaptable with working hours which can vary fr...,
7,38847053,good,Agile Business Analyst,12 + 24 month Engagement. Lead a high perform...,Our Federal Government client requires an exp...,ability to work effectively with multiple disc...,Engage with customers and stakeholders to unde...,,Experience working as a Business Analyst in an...,
8,38856874,bad,Data Analysts,Aspiring Data gurus? Do you have a passion for...,Client We are currently supporting a global f...,Strong business analysis skills\nTableau\nUnde...,,Following a detailed review of their organisat...,At least two years working within a data analy...,
9,38823908,good,Personal Carer,Personal carer position available for Monday t...,DO YOU SHARE A VISION OF EXCELLENCE IN PERSON...,,provide the help or assistance required to ens...,,,


Combine job skills, responsibilities, requirements, experience, qualification into a new column "keywords", which will be input to the generative model

In [5]:
# df_ads["keywords"] = df_ads[df_ads.columns[5:]].apply(lambda x: '\n'.join(x.dropna()),  axis=1)
df_ads["keywords"] = df_ads[df_ads.columns[5:7]].apply(lambda x: '\n'.join(x.dropna()),  axis=1)
df_ads.drop(columns=["SKILLS", "RESPONSIBILITIES", "REQUIREMENTS", "EXPERIENCE", "QUALIFICATION"], inplace=True)
df_ads.head(10)

Unnamed: 0,id,quality,title,abstract,clean_text,keywords
0,38964460,bad,Senior Product Manager,Electronic Arts is looking for a full time Sen...,The Company We are EA! And we make games ????...,"strong quantitative background to create, bala..."
1,38996736,good,Social Media/membership and Events Assistant,An ideal entry level role initially working on...,Management of AIPS Website including: Ensurin...,Quarterly in managing of AIPS public Facebook/...
2,38819733,bad,"Pricing Analyst's all levels, Sydney CBD",Superb roles with leading Financial Services f...,My client is a Financial Services organisatio...,SQL\nSAS\nPython\nability in the fullness of t...
3,38979340,good,IT Designer,Our client is seeking an experienced IT Design...,Our client is seeking an experienced IT Desig...,Agile delivery\nshare knowledge and informatio...
4,38912123,good,Project Manager - Service Delivery,5 years of experience working in managing end ...,Infosys is a global leader in next-generation...,"able to communicate by telephone, email\n\ncre..."
5,38959448,good,"Aged Care Nurses. RN's, EN's & Cert 3 AIN/PCA'...",RNS Nursing is seeking experienced Aged Care (...,"JOB ROLE We are seeking experienced, compassi...",Roster update availability Friendly supportive...
6,38926745,good,NDT Technician,APTS are urgently seeking an experienced NDT T...,APTS Pty Ltd provides service excellence in t...,ability to commence immediately\nInterpret and...
7,38847053,good,Agile Business Analyst,12 + 24 month Engagement. Lead a high perform...,Our Federal Government client requires an exp...,ability to work effectively with multiple disc...
8,38856874,bad,Data Analysts,Aspiring Data gurus? Do you have a passion for...,Client We are currently supporting a global f...,Strong business analysis skills\nTableau\nUnde...
9,38823908,good,Personal Carer,Personal carer position available for Monday t...,DO YOU SHARE A VISION OF EXCELLENCE IN PERSON...,provide the help or assistance required to ens...


Check if the dataframe contain missing values in "conciseness", "title", "abstract", "clean_text", "keywords" etc

In [6]:
# check if the dataframe has missing values 
df_ads.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 373 entries, 0 to 372
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   id          373 non-null    int64 
 1   quality     373 non-null    object
 2   title       373 non-null    object
 3   abstract    373 non-null    object
 4   clean_text  373 non-null    object
 5   keywords    373 non-null    object
dtypes: int64(1), object(5)
memory usage: 17.6+ KB


As all columns contain non-null objects,  we do not need to handle the missing value. Take a look at the number of concise and non-concise ads

In [7]:
# number of good-quality and bad-quality ads in the dataframe
df_ads["quality"].value_counts()

good    209
bad     164
Name: quality, dtype: int64

Select concise ads for fine-tuning the generative model and non-concise ads for testing the model

In [8]:
df_good_ads = df_ads.loc[df_ads["quality"]=="good"]
df_bad_ads = df_ads.loc[df_ads["quality"]=="bad"]

## Configuration of hyper-parameters

Next we set up some hyper-parameters that will be used later for dataset split, tokenizer generation and model initialization & training

In [9]:
 
# Configure hyper-parameters for training, parameters are referred to https://colab.research.google.com/drive/16UTbQOhspQOF3XlxDFyI28S-0nAkTzk_#scrollTo=vCPohrZ-CTWu
MODELS = ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl"]
MODEL = MODELS[0]

USE_APEX = True # Enable mixed precision in training
APEX_OPT_LEVEL  = 'O1'
UNFREEZE_LAST_N = 6 # Unfreeze the last N layers in GPT-2 model for fine-tuning

SPECIAL_TOKENS  = { "bos_token": "<|BOS|>",
                    "eos_token": "<|EOS|>",
                    "unk_token": "<|UNK|>",                    
                    "pad_token": "<|PAD|>",
                    "sep_token": "<|SEP|>"}
                    
MAXLEN = 768  #{768, 1024, 1280, 1600} # Maximum number of tokens can be used in GPT-2 model

TRAIN_SIZE = 0.8

if USE_APEX:
    BATCHSIZE = 3
    BATCH_UPDATE = 16
else:
    BATCHSIZE = 2
    BATCH_UPDATE = 32

EPOCHS = 20
LR = 5e-4
EPS  = 1e-8
WARMUP_STEPS = 1e2
WD = 0.01

SEED = 1

To make the reproducible result, we fixed the seed values for dataset generation and model initialization

In [10]:
def seed_everything(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = True

seed_everything(SEED)

## Load tokenizer

Load the default GPT-2 tokenizer and add some special tokens such as *begin of text*(**"<|BOS|>"**), *end of text*(**"<|EOS|>"**), *unknown word*(**"<|UNK|>"**), *padding token*(**"<|PAD|>"**), and *separate token*(**"<|SEP|>"**) to the tokenizer. The tokenizer will be used to encode the input text into vector representation for model training and inference

In [11]:
# load tokenizer
def get_tokenizer(special_tokens=None, load_token_path=None):
    
    if load_token_path:
        tokenizer = AutoTokenizer.from_pretrained(load_token_path)
    
    else:
        tokenizer = AutoTokenizer.from_pretrained(MODEL) #GPT2Tokenizer
        if special_tokens:
            tokenizer.add_special_tokens(special_tokens)
            print("Special tokens added")
    
    return tokenizer

tokenizer = get_tokenizer(special_tokens=SPECIAL_TOKENS)


Special tokens added


## Create training and validation dataset

### Define Dataset class for dataset generation

For each job ad, we join the job title, job abstract and job content sequentially into an input text and encode it into vectors using the GPT-2 tokenizer. The encoded vectors are then fed into the language model for fine-tuning

In [12]:
class ADDataset(Dataset):
    
    def __init__(self, df_data, tokenizer, randomize=True):
        self.randomize = randomize
        self.tokenizer = tokenizer 
        self.title     = df_data["title"].tolist()
        self.text      = df_data["clean_text"].tolist()
        self.abstract  = df_data["abstract"].tolist()
        self.keywords  = df_data["keywords"].tolist()
     
    @staticmethod
    def join_keywords(keywords, randomize=True):
        
        kws = keywords.replace('\n\n', '\n').splitlines()
        N = len(kws)

        #random sampling and shuffle
        if randomize: 
            M = random.choice(range(N+1))
            kws = kws[:M]
            random.shuffle(kws)

        return ';'.join(kws)

    def __len__(self):
        return len(self.title)

    def __getitem__(self, idx):
        keywords = self.keywords[idx]

        kw = self.join_keywords(keywords, self.randomize)

        """
        For ad content in each Json record, we prepend it with the start of text token, the title, abstract of that ad,
        then append it with the end of text token and pad to the maximum length with the pad token. 
        """
        input = SPECIAL_TOKENS['bos_token'] + self.title[idx] + \
                SPECIAL_TOKENS['sep_token'] + self.abstract[idx] + \
                SPECIAL_TOKENS['sep_token'] + kw + SPECIAL_TOKENS['sep_token'] + \
                self.text[idx] + SPECIAL_TOKENS['eos_token']
            
        # input = SPECIAL_TOKENS['bos_token'] + self.title[idx] + SPECIAL_TOKENS['sep_token'] + \
        #         self.abstract[idx] + SPECIAL_TOKENS['sep_token'] + \
        #         self.text[idx] + SPECIAL_TOKENS['eos_token']

        encodings_dict = tokenizer(input, truncation=True, max_length=MAXLEN, padding="max_length")

        input_ids = encodings_dict['input_ids']
        attention_mask = encodings_dict['attention_mask']

        """
        Appends both the encoded tensor and the attention mask for that encoding to a dict. The attention mask is
        a binary list of 1's or 0's which determine whether the langauge model should take that token into consideration or not. 
        """

        return {'label': torch.tensor(input_ids),
                'input_ids': torch.tensor(input_ids), 
                'attention_mask': torch.tensor(attention_mask)}

### Split good-quality ads into training and validation sets

In [13]:
# Split dataframe into training and validation data
from sklearn.model_selection import train_test_split
train_df, val_df = train_test_split(df_good_ads, train_size=TRAIN_SIZE, random_state=SEED)
print(f'{len(train_df)} samples for training, and {len(val_df)} samples for validation')

test_df = df_bad_ads
print(f'{len(test_df)} samples for testing')

# shuffle training dataframes
train_df.sample(frac=1, random_state=SEED)
train_dataset = ADDataset(train_df, tokenizer)
val_dataset = ADDataset(val_df, tokenizer, randomize=False)

167 samples for training, and 42 samples for validation
164 samples for testing


## Load the language model

Load and set the parameters of the GPT-2 model

In [14]:
# Load configuration and model 
def get_model(tokenizer, special_tokens=None, load_model_path=None):
    
    #GPT2LMHeadModel
    if special_tokens:
        config = AutoConfig.from_pretrained(MODEL, 
                                            bos_token_id=tokenizer.bos_token_id,
                                            eos_token_id=tokenizer.eos_token_id,
                                            sep_token_id=tokenizer.sep_token_id,
                                            pad_token_id=tokenizer.pad_token_id,
                                            output_hidden_states=False)
    else: 
        config = AutoConfig.from_pretrained(MODEL,                                     
                                            pad_token_id=tokenizer.eos_token_id,
                                            output_hidden_states=False)    

    model = AutoModelForPreTraining.from_pretrained(MODEL, config=config)

    if special_tokens:
        #Special tokens added, model needs to be resized accordingly
        model.resize_token_embeddings(len(tokenizer))

    if load_model_path:
        model.load_state_dict(torch.load(load_model_path))

    # Run model on GPU
    device = torch.device("cuda")
    model.cuda()
    model = model.to(device)
    return model

model = get_model(tokenizer, special_tokens=SPECIAL_TOKENS)
print("Load model successfully")

Load model successfully


## Fine-tune the language model

Refer to ["Conditional Text Generation by Fine Tuning GPT-2"]("https://towardsdatascience.com/conditional-text-generation-by-fine-tuning-gpt-2-11c1a9fc639d"), we fine-tune the weights of the last 6 layers in GPT-2 model by setting its ```parameter.required_grad``` as ```True```, and freeze the other layers

In [15]:
# Freeze the last N layers
for parameter in model.parameters():
    parameter.requires_grad = False

for i, m in enumerate(model.transformer.h):        
    # Only un-freeze the last n transformer blocks
    if i+1 > 12 - UNFREEZE_LAST_N:
        for parameter in m.parameters():
            parameter.requires_grad = True 

for parameter in model.transformer.ln_f.parameters():        
    parameter.requires_grad = True

for parameter in model.lm_head.parameters():        
    parameter.requires_grad = True

### Fine tune the language model on advertisement dataset

Fine-tune the GPT-2 model on our own advertisement dataset using the pre-defined hyper-parameter values and the model configurations

In [16]:
# Fine-tune GPT-2 model
model_dir = f"./output/ad_generation/{MODEL}"
os.makedirs(model_dir, exist_ok=True)

# Fine-tune GPT2 using Trainer
training_args = TrainingArguments(
    output_dir=model_dir,
    num_train_epochs=EPOCHS,
    per_device_train_batch_size=BATCHSIZE,
    per_device_eval_batch_size=BATCHSIZE,
    gradient_accumulation_steps=BATCH_UPDATE,
    save_strategy="epoch",
    evaluation_strategy="epoch",
    fp16=True,
    fp16_opt_level=APEX_OPT_LEVEL,
    warmup_steps=WARMUP_STEPS,    
    learning_rate=LR,
    adam_epsilon=EPS,
    weight_decay=WD,
    seed=SEED,
    save_total_limit=1,
    load_best_model_at_end=True
)

trainer = Trainer(
    model=model,
    args=training_args,    
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=tokenizer,
    # compute_metrics=compute_metrics
)

trainer.train()

Using amp fp16 backend
***** Running training *****
  Num examples = 167
  Num Epochs = 20
  Instantaneous batch size per device = 3
  Total train batch size (w. parallel, distributed & accumulation) = 48
  Gradient Accumulation steps = 16
  Total optimization steps = 60
  args.max_grad_norm,


Epoch,Training Loss,Validation Loss
0,No log,65.833359
1,No log,65.138977
2,No log,59.509144
3,No log,54.178036
4,No log,33.133171
5,No log,13.299901
6,No log,5.328084
7,No log,3.623007
8,No log,3.115868
9,No log,2.838604


***** Running Evaluation *****
  Num examples = 42
  Batch size = 3
Saving model checkpoint to ./output/ad_generation/gpt2/checkpoint-3
Configuration saved in ./output/ad_generation/gpt2/checkpoint-3/config.json
Model weights saved in ./output/ad_generation/gpt2/checkpoint-3/pytorch_model.bin
tokenizer config file saved in ./output/ad_generation/gpt2/checkpoint-3/tokenizer_config.json
Special tokens file saved in ./output/ad_generation/gpt2/checkpoint-3/special_tokens_map.json
Deleting older checkpoint [output/ad_generation/gpt2/checkpoint-57] due to args.save_total_limit
  args.max_grad_norm,
***** Running Evaluation *****
  Num examples = 42
  Batch size = 3
Saving model checkpoint to ./output/ad_generation/gpt2/checkpoint-6
Configuration saved in ./output/ad_generation/gpt2/checkpoint-6/config.json
Model weights saved in ./output/ad_generation/gpt2/checkpoint-6/pytorch_model.bin
tokenizer config file saved in ./output/ad_generation/gpt2/checkpoint-6/tokenizer_config.json
Special tok

TrainOutput(global_step=60, training_loss=16.426580810546874, metrics={'train_runtime': 567.8362, 'train_samples_per_second': 5.882, 'train_steps_per_second': 0.106, 'total_flos': 1300058505216000.0, 'train_loss': 16.426580810546874, 'epoch': 19.86})

### Save the fine-tuned model

In [17]:
trainer.save_model()

Saving model checkpoint to ./output/ad_generation/gpt2
Configuration saved in ./output/ad_generation/gpt2/config.json
Model weights saved in ./output/ad_generation/gpt2/pytorch_model.bin
tokenizer config file saved in ./output/ad_generation/gpt2/tokenizer_config.json
Special tokens file saved in ./output/ad_generation/gpt2/special_tokens_map.json


## Advertisement generation using the fine-tuned model

Now, we can generate ad contents using fine-tuned GPT-2 model. To evaluate the "goodness" of the ad generation model, we select the first sample in the validation set as an example and compare the generated job contents with the real content.

### Load tokenizer and fine-tuned model

First, load the saved tokenizer and trained fine-tuned model

In [18]:
# Load tokenizer and saved model
tokenizer = get_tokenizer(load_token_path=model_dir)
model = get_model(tokenizer, special_tokens=SPECIAL_TOKENS, load_model_path=os.path.join(model_dir, 'pytorch_model.bin'))

loading file ./output/ad_generation/gpt2/vocab.json
loading file ./output/ad_generation/gpt2/merges.txt
loading file ./output/ad_generation/gpt2/tokenizer.json
loading file ./output/ad_generation/gpt2/added_tokens.json
loading file ./output/ad_generation/gpt2/special_tokens_map.json
loading file ./output/ad_generation/gpt2/tokenizer_config.json
loading configuration file https://huggingface.co/gpt2/resolve/main/config.json from cache at /home/hpan/.cache/huggingface/transformers/fc674cd6907b4c9e933cb42d67662436b89fa9540a1f40d7c919d0109289ad01.7d2e0efa5ca20cef4fb199382111e9d3ad96fd77b849e1d4bed13a66e1336f51
Model config GPT2Config {
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50257,
  "embd_pdrop": 0.1,
  "eos_token_id": 50258,
  "gradient_checkpointing": false,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": n

### Generate prompt for ad generation

Generate the prompt for ad generator by joining the ad title and ad abstract from the first sample in validation set into the prompt text

In [19]:
# choose the first sample in test dataset for testing
title = test_df.iloc[1]['title']
abstract = test_df.iloc[1]['abstract']
keywords = test_df.iloc[1]['keywords']
kw = ADDataset.join_keywords(keywords, randomize=False)

prompt = SPECIAL_TOKENS['bos_token'] + title + \
         SPECIAL_TOKENS['sep_token'] + abstract + \
         SPECIAL_TOKENS['sep_token'] + kw + SPECIAL_TOKENS['sep_token']

        
# # prompt = SPECIAL_TOKENS['bos_token'] + title + \
# #          SPECIAL_TOKENS['sep_token'] + abstract + SPECIAL_TOKENS['sep_token']

print(prompt)

<|BOS|>Pricing Analyst's all levels, Sydney CBD<|SEP|>Superb roles with leading Financial Services firm. Combine Actuarial/Pricing Analytics with the latest Data Science, Machine Learning techniques<|SEP|>SQL;SAS;Python;ability in the fullness of time to suggest process improvements as well as the quality of your stakeholder relationships;working as part of a customer-facing team who provide internal consultancy to the business, supporting the execution of various strategies from a pricing perspective<|SEP|>


display the real advertisement content of the example case

In [20]:
content = test_df.iloc[1]['clean_text']
content

' My client is a Financial Services organisation who continue to go from strength to strength with their innovative product suite and good governance enabling them to continue growing as we move into 201.With a nascent new Product Pricing Team growing currently they are looking for talented and experienced Actuarial / Pricing Analyst.s to join their team. I am looking for applications from candidates with strong mathematically orientated academics  BSc, MSc etc in a Maths focused Degree . e.g. Economics, Statistics, Mathematics, Actuarial Studies, Operations Research et al and at least 12m ideally 3 years but more experienced candidates are also encouraged to apply experience in an Analytically focused role within a General Insurance team. In this role you will be working as part of a customer-facing team who provide internal consultancy to the business, supporting the execution of various strategies from a pricing perspective. With numerous initiatives underway your role will require 

Encode the prompt text into vectors using the tokenizer and evaluate the generator model

In [21]:
generated = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)
# Run model on GPU.
device = torch.device("cuda")
generated = generated.to(device)

model.eval()

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50262, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0): GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
      (1): GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )


### Ad generation using greedy search

Generate the ad contents using the greedy search and return 10 samples with different criteria

In [22]:
# Top-p (nucleus) text generation (10 samples):
sample_outputs = model.generate(generated, 
                                do_sample=True,   
                                min_length=50, 
                                max_length=MAXLEN,
                                top_k=30,                                 
                                top_p=0.7,        
                                temperature=0.9,
                                repetition_penalty=2.0,
                                num_return_sequences=10
                                )

for i, sample_output in enumerate(sample_outputs):
    text = tokenizer.decode(sample_output, skip_special_tokens=True)
    a = len(title) + len(abstract) + len(','.join(keywords))
    # a = len(title) + len(abstract)

    print("{}: {}\n\n".format(i+1,  text[a:]))

1: eople are constantly changing their daily lives you will need both analytical skills such analysis capability knowledge at AIM level. You may also be required by other regulatory agencies including FIFO Regulation Authority Australia,.to complete this role please apply online through our website 


2: years experience working within Australia based sales channels / platforms such eCommerce systems using SQL Azure databases Ability +3+years market intelligence skills Excellent understanding about complex data pipelines including RDF Analysis Experience building predictive models A high level analytical approach which helps you understand how customers use products effectively What they do Best practice when faced With multiple options available To work autonomously across large quantities depending upon user needs Flexible hours You may choose between two different kinds : 1) Working under pressure 2............a lot more flexible Hours per Week $10 hrs plus bonus if required Locatio

### Ad generation using beam search

Generate the ad contents using the beam search and return 5 samples with different criteria

In [23]:
# Beam-search text generation:
sample_outputs = model.generate(generated, 
                                do_sample=True,   
                                max_length=MAXLEN,                                                      
                                num_beams=5,
                                repetition_penalty=5.0,
                                early_stopping=True,      
                                num_return_sequences=5
                                )

for i, sample_output in enumerate(sample_outputs):
    text = tokenizer.decode(sample_output, skip_special_tokens=True)
    a = len(title) + len(abstract) + len(','.join(keywords))
    # a = len(title) + len(abstract)    
    print("{}: {}\n\n".format(i+1,  text[a:]))

To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  ../aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)


1: t trends using analysis tools including ML or R;Analyse trading patterns using quantitative modelling methods such as Bayes' Uncertainty Hypothesis Estimator;Manage intra-annuation remuneration for accounts receivables based on interest rates;Provide technical assistance when required due to client needs;Work autonomously at critical junctures during periods of high volume - particularly times of large volumes We are looking for experienced financial services analysts willing to join our team.. If this sounds like you, please apply today! ??€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????€????