WIP
scruti committed Jul 10, 2024
1 parent 4cd05e8 commit 73ef313
Showing 10 changed files with 470 additions and 119 deletions.
@@ -4,6 +4,7 @@ module Vacancies::Export::DwpFindAJob::PublishedAndUpdatedVacancies
class ParsedVacancy
include ActionView::Helpers::SanitizeHelper
include Rails.application.routes.url_helpers
include Vacancies::Export::DwpFindAJob::Versioning

CATEGORY_IT_ID = 14
CATEGORY_EDUCATION_ID = 27
@@ -14,7 +15,7 @@ class ParsedVacancy

attr_reader :vacancy

delegate :id, :job_title, :organisation, to: :vacancy
delegate :job_title, :organisation, to: :vacancy

def initialize(vacancy)
@vacancy = vacancy
@@ -38,19 +39,22 @@ def description
end

def expiry
expiry_date = vacancy.expires_at.to_date
# Why max 29 days? Find a Job is rejecting vacancies where the expiry date 30 days from today is explicitly set.
# If left blank it will default to 30 days from today without rejection.
return unless expiry_date.between?(Date.today + 1, Date.today + 29.days)
min_date = date_from_publishing_version(MIN_LIVE_DAYS).in_time_zone.at_beginning_of_day
max_date = date_from_publishing_version(MAX_LIVE_DAYS).in_time_zone.at_end_of_day
return unless vacancy.expires_at.between?(min_date, max_date)

expiry_date.to_s
vacancy.expires_at.to_date.to_s
end

def reference
versioned_reference(vacancy)
end

def status_id
wp = vacancy.working_patterns
if wp.blank?
nil
elsif wp.include?("full_time") || (wp.include?("term_time") && wp.exclude?("part_time"))
return if wp.blank?

if wp.include?("full_time") || (wp.include?("term_time") && wp.exclude?("part_time"))
STATUS_FULL_TIME_ID
else
STATUS_PART_TIME_ID
@@ -68,6 +72,16 @@ def type_id

private

# Each repost version of a vacancy is live for its own 30-day period, starting after the previous version has expired.
def date_from_publishing_version(offset_days)
publish_date = vacancy.publish_on.to_date
# Export runs at 23:30 and publishes vacancies published on TV after 23:30 the previous day to 23:30 today
# We need to add a day to the TV publish date if the vacancy was published after 23:30 to reflect when the vacancy
# got exported to Find a Job service.
publish_date += 1.day if vacancy.publish_on.after?(vacancy.publish_on.change(hour: 23, min: 30))
publish_date + ((version(vacancy) * DAYS_BETWEEN_REPOSTS) + offset_days).days
end

def description_paragraph(title, text)
plain_text = html_to_plain_text(text)
return "" if plain_text.blank?
@@ -1,7 +1,6 @@
module Vacancies::Export::DwpFindAJob::PublishedAndUpdatedVacancies
class Query
# Find a Job only accepts "expiry" dates up to 30 days from the date of export/update.
FIND_A_JOB_MAX_EXPIRY_DAYS = 30
include Vacancies::Export::DwpFindAJob::Versioning

attr_reader :from_date

@@ -12,8 +11,7 @@ def initialize(from_date)
def vacancies
vacancies_published_after_date
.or(vacancies_updated_after_date)
.or(vacancies_that_reached_expiry_date_threshold)
.or(vacancies_that_need_expiry_date_pushed_back)
.or(vacancies_to_repost_today)
end

private
@@ -30,32 +28,19 @@ def vacancies_updated_after_date
vacancies_published_before_date.where("updated_at > ?", from_date)
end

# Vacancies in our service that had over 30 days to expire when published,
# but as today have reached exactly 30 days to expire.
# Find a Job vacancies can only be posted up to 30 days from the original posting date.
#
# Now we can select them to be exported so it will align Find a Job and TV expiration/closing dates.
def vacancies_that_reached_expiry_date_threshold
vacancies_published_before_date.where(expires_at: (Time.zone.now + FIND_A_JOB_MAX_EXPIRY_DAYS.days).all_day)
end

# Vacancies in our service that have over 30 days to expire.
#
# Since Find a Job vacancies have a maximum expiry date of 30 days from when exported/updated, we need to regularly
# push back the expiration date on Find a Job to "30 days from today" through the life of the vacancy in TV service.
#
# Every time we select them for exporting, it will push back their expiry date on Find a Job service to the max 30 days.
#
# To achieve this regular updates, we select for export any TV vacancy that::
# - has over 30 days to expire
# - the difference in days between today and the vacancy expiration date in TV is a multiple of 7 days.
# Vacancies in our service generally last well beyond 30 days, so we need to repost them as new job adverts in Find
# a Job service every 31 days from the publish date, exactly when the previous advert has expired and the previous
# version of the vacancy posting is no longer live.
#
# This will cause vacancies to be exported every 7 days (pushing back their expiry on Find a Job service to 30 days)
# until they reach the last 30 days of their life.
def vacancies_that_need_expiry_date_pushed_back
vacancies_published_before_date
.where("expires_at > ?", Time.zone.now + FIND_A_JOB_MAX_EXPIRY_DAYS.days)
.where("DATE_PART('day', DATE_TRUNC('day', expires_at::timestamp) - '#{Date.today}'::date)::integer
% 7 = 0") # TV expiration date is a multiple of 7 days from today
# This query identifies all the live vacancies that, as of today, need to be reposted because their publish date is a
# multiple of 31 days ago.
def vacancies_to_repost_today
vacancies_published_before_date.where(
"DATE_PART('day', DATE_TRUNC('day', '#{Date.today}'::date - publish_on::timestamp))::integer
% #{DAYS_BETWEEN_REPOSTS} = 0",
)
end
end
end
@@ -29,7 +29,7 @@ def vacancy_to_xml(vacancy, xml)

vacancy = ParsedVacancy.new(vacancy)

xml.Vacancy(vacancyRefCode: vacancy.id) do
xml.Vacancy(vacancyRefCode: vacancy.reference) do
xml.Title vacancy.job_title
xml.Description vacancy.description
xml.Location do
29 changes: 29 additions & 0 deletions app/services/vacancies/export/dwp_find_a_job/versioning.rb
@@ -0,0 +1,29 @@
module Vacancies::Export::DwpFindAJob::Versioning
# Find a Job adverts can only stay live for up to 30 days from their original posting date.
# They need to be reposted as a different job advert every 31 days.
DAYS_BETWEEN_REPOSTS = 31
MIN_LIVE_DAYS = 1
MAX_LIVE_DAYS = 30

# It generates a versioned reference for a vacancy.
# The original reference for a vacancy on its first publishing period in Find a Job service is the vacancy id.
# Each repost period will version the reference as "id-1", "id-2", "id-3", etc.
def versioned_reference(vacancy)
version = version(vacancy)
return unless version

version.zero? ? vacancy.id : "#{vacancy.id}-#{version}"
end

# Each repost of a vacancy will have an incremental version number.
def version(vacancy)
return if vacancy.publish_on.blank? || vacancy.publish_on.to_date > Date.today

published_days = (Date.today - vacancy.publish_on.to_date).to_i
if published_days < DAYS_BETWEEN_REPOSTS
0
else
published_days / DAYS_BETWEEN_REPOSTS
end
end
end
2 changes: 2 additions & 0 deletions config/schedule.yml
@@ -18,6 +18,8 @@ export_users:
class: 'ExportDSIUsersToBigQueryJob'
queue: low

# Internal querying and parsing of Find a Job export data depends on this run time (23:30).
# Be careful when changing it: the code that assumes a 23:30 export will need to be adapted too.
export_vacancies_published_and_updated_to_dwp_find_a_job_service:
cron: '30 23 * * *'
class: 'ExportVacanciesPublishedAndUpdatedSinceYesterdayToDwpFindAJobServiceJob'
102 changes: 63 additions & 39 deletions documentation/integrations/dwp-find-a-job.md
@@ -83,7 +83,9 @@ title: DWP Find a Job integration code structure
classDiagram
PublishedAndUpdated *-- Upload : composition
ClosedEarly *-- Upload : composition
NewQuery *-- Versioning : inclusion
NewXml *-- ParsedVacancy : composition
ParsedVacancy *-- Versioning : inclusion
PublishedAndUpdated *-- NewXml : composition
PublishedAndUpdated *-- NewQuery : composition
ClosedEarly *-- ExpiredXml : composition
@@ -108,6 +110,11 @@ classDiagram
-upload_to_find_a_job_sftp(file_path)
}
class Versioning {
versioned_reference
version
}
namespace PublishedAndUpdatedVacancies {
class NewXml["Xml"] {
vacancies
@@ -121,13 +128,12 @@ classDiagram
-vacancies_published_after_date()
-vacancies_published_before_date()
-vacancies_updated_after_date()
-vacancies_that_reached_expiry_date_threshold()
-vacancies_that_need_expiry_date_pushed_back()
-vacancies_to_repost_today()
}
class ParsedVacancy {
vacancy
+id()
+reference()
+job_title()
+organisation()
+apply_url()
@@ -136,6 +142,7 @@ classDiagram
+expiry()
+status_id()
+type_id()
-date_from_publishing_version(offset_days)
-description_paragraph(title, text)
-html_to_plain_text(html)
-sanitize(text)
@@ -166,72 +173,89 @@ We publish vacancies that fall on any of these conditions:
EG: Providing yesterday's date, all the vacancies published yesterday and today will be returned.
- Were previously published but got updated after the given date.
In this case, the export will update the changed vacancy details in the Find a Job service.
- Were previously published but need their expiry date to be updated/pushed back in Find a Job service.
- Were previously published but need to be reposted on Find a Job service.

```mermaid
block-beta
columns 3
block:TeachingVacancies:3
columns 4
space:2 Database[("Database\nVacancies")] space
columns 8
block:TeachingVacancies:8
columns 3
space Database[("Database\nVacancies")] space
RecentlyPublished{"Recently\npublished"}
RecentlyUpdated{"Recently\nupdated"}
NeedToSetExpiryDate{"Need to set\nthe expiry date"}
NeedToPushExpiryDate{"Need to push\nthe expiry date"}
NeedToBeReposted{"Need to be\nreposted"}
space:2 Query space
space Query space
Database --> RecentlyPublished
Database --> RecentlyUpdated
Database --> NeedToSetExpiryDate
Database --> NeedToPushExpiryDate
Database --> NeedToBeReposted
RecentlyPublished --> Query
RecentlyUpdated --> Query
NeedToSetExpiryDate --> Query
NeedToPushExpiryDate --> Query
NeedToBeReposted --> Query
end
style TeachingVacancies fill:#e6f2ff,stroke:#333,stroke-width:2px
classDef querybit fill:#ffffcc,stroke:#333;
classDef datasource fill:#ffcc99,stroke:#333;
class RecentlyPublished,RecentlyUpdated,NeedToSetExpiryDate,NeedToPushExpiryDate querybit
class RecentlyPublished,RecentlyUpdated,NeedToBeReposted querybit
class Database datasource
```

### Vacancies that need their expiry date updated/pushed back

### Reposting vacancies
Find a Job service has a limit on the vacancy closing/expiry date:

**A vacancy advert must expire in maximum of 30 days from the date it got published/updated .**
**A vacancy advert must expire after a maximum of 30 days from the date it got published.**

For a vacancy to be accepted we must either:
- Do not specify an expiry date in its XML value: It will default to 30 days.
- Set a specific expiry date in its XML value: Between 1 and 30 days after the publish/update date.
For a new vacancy to be accepted we must either:
- Not specify an expiry date: it will default to 30 days after the publish date.
- Specify an expiry date between 1 and 30 days after the publish date.

For a vacancy edit/update to be accepted we must either:
- Not specify an expiry date: it will keep the original value.
- Specify an expiry date between 1 and 30 days after the original publish date.
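
The acceptance rule above can be sketched in plain Ruby. This is a simplified illustration, not the service code: `explicit_expiry` is a hypothetical helper that works on whole dates and ignores the time-of-day handling done in `ParsedVacancy#expiry`. It returns the date to send, or `nil` when the expiry should be left blank so Find a Job applies its own default.

```ruby
require "date"

MIN_LIVE_DAYS = 1  # same values as the Versioning constants in this commit
MAX_LIVE_DAYS = 30

# Hypothetical helper: returns the expiry date to send to Find a Job,
# or nil when the date falls outside the accepted window and must be omitted.
def explicit_expiry(advert_published_on, expires_on)
  window = (advert_published_on + MIN_LIVE_DAYS)..(advert_published_on + MAX_LIVE_DAYS)
  window.cover?(expires_on) ? expires_on : nil
end

published = Date.new(2024, 7, 1) # arbitrary example date
p explicit_expiry(published, published + 10) # inside the window: date is sent
p explicit_expiry(published, published + 70) # outside the window: nil, default applies
```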

#### The problem with this limitation
The majority of our live vacancies have closing dates way beyond a month from the publish date.

**How do we ensure the vacancy from TV is live in the DWP Find a Job service after 30 days from being published?**

#### Our take on resolving this
Forcing regular updates to Find a Job for TV long-life vacancies, pushing back the expiry date to 30 days from the update.

Doing this daily would cause hundreds/thousands of unneeded updates, so we decided to do this on a weekly basis:
- When the query for filtering vacancies to push runs, it will also select vacancies that:
- Have a TV closing date over 30 days from today.
- The number of days between the current date and its TV closing date is a multiple of 7.
```mermaid
timeline
title Max life of an advert in Find a Job
Day 1 : Advert gets Published
Day 31 : Advert expires.
: Last day published
Day 32 : Advert no longer published
These conditions translate in **any vacancy with closing date over 30 days from today, will be selected for pushing back their "Find a Job" expiry date once per week**.
```

This approach distributes the load of pushing back vacancies over multiple days on the regular publish/update runs, instead of having a dedicated scheduled job dedicated to exclusively query/update these vacancies.
#### How do we ensure the vacancy from TV is live in the DWP Find a Job service after 30 days from being published?

#### Alternative approach
If we want more fine control over when to run the expiration date pushbacks for long-life TV vacancies, we could extract it to its own query/upload service, and schedule it independently to push back all long-life vacancies expiry dates in Find a Job service.
We will republish the vacancy as a new advert on DWP Find a Job every 31 days.
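
As a rough sketch of the "every 31 days" rule, the check below mirrors the modulo condition used by `vacancies_to_repost_today`, but in plain Ruby rather than SQL. `repost_due?` is a hypothetical name used only for illustration. Unlike the SQL condition, it deliberately excludes day zero, on the assumption that the first export is already covered by the "recently published" condition.

```ruby
require "date"

DAYS_BETWEEN_REPOSTS = 31 # same value as the Versioning constant in this commit

# Hypothetical helper: true when `today` is a positive multiple of 31 days
# after the publish date, i.e. the previous advert has just stopped being live.
def repost_due?(publish_date, today)
  days_live = (today - publish_date).to_i
  days_live.positive? && (days_live % DAYS_BETWEEN_REPOSTS).zero?
end

publish_date = Date.new(2024, 7, 1) # arbitrary example date
p repost_due?(publish_date, publish_date + 30) # false: first advert still live
p repost_due?(publish_date, publish_date + 31) # true: first repost
p repost_due?(publish_date, publish_date + 62) # true: second repost
```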

This would:
- Pro: Reduce DB query complexity. As we would only need to query: "Any vacancies with expiry date over 30 days from today".
- Con: Increase the size of the XML files we push into DWP Find a Job service, as now selects "all the vacancies over 30 days..." instead of a subset of those.
**EG: Lifetime of a TV vacancy published today, that expires in TV 70 days later:**
```mermaid
timeline
title Vacancy: TV --> Find a Job
Day 1 : TV exports the vacancy for the first time
: Advert reference is the vacancy "id"
: Advert expiry date is left to default 30 days after today
: Advert "id" gets Published
Day 31 : Advert "id" Expires
Day 32 : Advert "id" no longer published
: TV exports the vacancy for the second time
: New advert with reference "id-1"
: Advert expiry date is left to default 30 days after today
: Advert "id-1" gets published
Day 62 : Advert "id-1" expires
Day 63 : Advert "id-1" no longer published
: TV exports the vacancy for the third time
: New advert with reference "id-2"
: Advert expiry date is set to 8 days after today, matching TV expiry date.
: Advert "id-2" gets published
Day 71 : Advert "id-2" Expires same day as in TV
Day 72 : Advert "id-2" no longer published
```
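
The references in the timeline above can be reproduced with a small standalone sketch of the versioning arithmetic. It follows the logic of the `Versioning` module, but `version_on` and `reference_on` are hypothetical names that take the date explicitly instead of reading `Date.today`, and the vacancy id is a placeholder string.

```ruby
require "date"

DAYS_BETWEEN_REPOSTS = 31 # same value as the Versioning constant in this commit

# Hypothetical helper: repost number on a given day, or nil before publication.
def version_on(publish_date, day)
  return nil if day < publish_date

  (day - publish_date).to_i / DAYS_BETWEEN_REPOSTS
end

# Hypothetical helper: the advert reference exported on a given day.
def reference_on(id, publish_date, day)
  version = version_on(publish_date, day)
  return nil if version.nil?

  version.zero? ? id : "#{id}-#{version}"
end

publish_date = Date.new(2024, 7, 1) # arbitrary example date
[0, 30, 31, 62, 70].each do |offset|
  puts "Day #{offset + 1}: #{reference_on('id', publish_date, publish_date + offset)}"
end
# Day 1: id
# Day 31: id
# Day 32: id-1
# Day 63: id-2
# Day 71: id-2
```

On day 63 the TV expiry date (70 days after publishing) is 8 days away, which falls inside the 1 to 30 day window for the "id-2" advert, so that is the first export where an explicit expiry date can be sent.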

## Uploading vacancies closed early
