Removal of inactive job listings #45

Open
dsychin opened this issue Feb 23, 2022 · 4 comments
@dsychin
Member

dsychin commented Feb 23, 2022

The current implementation adds new job listings but does not remove old ones.

There are two ways to do this:

  1. If a job page becomes inaccessible once the listing is no longer valid, check each page regularly and mark the listing inactive when the page is gone.
  2. During the scraper job, scrape all job listings, compare them with the entries in the database, and mark the missing ones as inactive.
@syahnur197
Contributor

To implement number 2, maybe I can do it like this:

  1. When running cmd/jobbuzz-scraper.go, mark all jobs as inactive.
  2. Fetch all jobs.
  3. Loop over the fetched jobs: if a job already exists, mark it active again; otherwise insert it.
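
A minimal sketch of those three steps in Go, using an in-memory map in place of the database. The `Job` type, its fields, and the `Reconcile` function are hypothetical names for illustration, and it assumes a listing's URL identifies it uniquely:

```go
package main

// Job is a hypothetical, simplified stand-in for the project's job model.
type Job struct {
	URL    string // assumed to identify a listing uniquely
	Title  string
	Active bool
}

// Reconcile applies the three steps to an in-memory store keyed by URL.
// A real implementation would run the same logic against the database.
func Reconcile(store map[string]*Job, scraped []Job) {
	// Step 1: mark every known job as inactive.
	for _, j := range store {
		j.Active = false
	}
	// Steps 2 and 3: walk the freshly scraped jobs.
	for _, s := range scraped {
		if existing, ok := store[s.URL]; ok {
			// Already known: mark it active again.
			existing.Active = true
			continue
		}
		// Not seen before: insert it as active.
		n := s
		n.Active = true
		store[n.URL] = &n
	}
}
```

One thing to watch: if the fetch fails partway through, every job it did not reach stays inactive, so against a real database steps 1 to 3 would probably need to run in a single transaction.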

@syahnur197
Contributor

@dsychin what do you think of this approach? Is there any better approach?

@dsychin
Member Author

dsychin commented Mar 9, 2022

@syahnur197 I am leaning towards solution 1 at the moment. It has a few advantages:

  • more scalable, because we don't have to scrape everything all the time
  • more resilient, because even if some pages fail it is still fine
  • a similar flow can be used for both providers, e.g. if the page is unavailable for jobcentre, then the listing is inactive; if an entry is older than x days in bruneida, then mark it deleted

dsychin added this to the MVP milestone Apr 24, 2022
@dsychin
Member Author

dsychin commented Apr 24, 2022

Implement an expiry date for job listings.

Use "last date to apply" for jobcentre listings, and an arbitrary time period for bruneida.
