I am using the JobSpy package to extract job post data with Python 3.10 and Airflow 2.7.1.
I found that executing the script directly from the Docker container is very fast, but it hangs indefinitely
when triggered from Airflow using PythonOperator.
It scrapes LinkedIn and Indeed quickly but can never scrape zip_recruiter and glassdoor. What could be the reasons?
from jobspy import scrape_jobs

jobs = scrape_jobs(
    # site_name=["indeed", "linkedin", "zip_recruiter", "glassdoor"],
    site_name=['glassdoor'],
    search_term=search_term,
    location="Hong Kong",
    results_wanted=results_wanted,
    hours_old=hours_old,  # (only LinkedIn/Indeed is hour specific, others round up to days old)
    country_indeed='Hong Kong',  # only needed for indeed / glassdoor
    # linkedin_fetch_description=True  # get full description and direct job url for linkedin (slower)
)
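To narrow down where the task blocks, a diagnostic along these lines might help. This is only a sketch: `run_with_stack_dump`, `slow_stand_in`, and the dump file name are hypothetical, and `slow_stand_in` stands in for the `scrape_jobs(...)` call above. It uses the standard library's `faulthandler` module to write the stack of every thread to a file if the call is still running after a timeout. It does not interrupt the call. A file is used on the assumption that the task's stderr may be redirected by the scheduler's logging.

```python
import faulthandler
import tempfile
import time
from pathlib import Path


def run_with_stack_dump(func, dump_after_s, dump_path):
    """Call func(); if it is still running after dump_after_s seconds,
    write the traceback of every thread to dump_path.
    The call itself is not interrupted."""
    with open(dump_path, "w") as dump_file:
        # Arm a watchdog that dumps all thread stacks once the timeout elapses.
        faulthandler.dump_traceback_later(dump_after_s, file=dump_file)
        try:
            return func()
        finally:
            # Disarm the watchdog before the file is closed.
            faulthandler.cancel_dump_traceback_later()


def slow_stand_in():
    # Hypothetical placeholder for the real scrape_jobs(...) call.
    time.sleep(0.5)
    return "done"


dump_path = Path(tempfile.gettempdir()) / "scrape_hang_traceback.txt"
result = run_with_stack_dump(slow_stand_in, dump_after_s=0.1, dump_path=dump_path)
dump_text = dump_path.read_text()
```

In the real task the timeout would be set well above the normal run time (for example a few minutes), so the dump file only gets content when the scrape is actually stuck, and the last frames in it show which call is blocking.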
FROM apache/airflow:2.7.1-python3.10
COPY requirements.txt /opt/airflow/
USER root
RUN apt-get update && apt-get install -y gcc python3-dev
USER airflow
RUN pip install --no-cache-dir -r /opt/airflow/requirements.txt
Jobspy library: https://github.com/Bunsly/JobSpy
requirements.txt