Yidti/jobscan

JOBSCAN

Docker Compose Installation and Execution

To set up the project using Docker Compose, follow these steps:

  1. Clone the repository:

    git clone https://github.com/Yidti/jobscan.git
    cd jobscan
  2. Create a .env file with the necessary environment variables.

    Note: This project is a sample. You will need to adjust the environment settings in both the .env file and docker-compose.yml to match your setup; the environment variables have not yet been fully organized.

  3. Build and start the containers:

    docker-compose up --build
  4. Access the services:

Features

Filter conditions must be set in advance

  • Set the search filters before crawling the website.

    # Custom search filter parameters (example for user "yidti")
    role = {'ro':'全職'}  # full-time
    keyword = {'keyword':"後端工程師 python"}  # "backend engineer python"
    isnew = {'isnew':'三日內'}  # posted within the last three days
    jobexp = {'jobexp':['1年以下', '1-3年']}  # under 1 year, 1-3 years
    mode = {'mode':'列表'}  # list view, which shows more results per page
    order = {'order':'日期排序'}  # sort by date
    asc = {'asc':'遞減'}  # descending
    filter_params = get_filter_params(role, keyword, isnew, jobexp, mode, order, asc)
    user = "yidti"
    title = "data_Engineer"
  • Set filters for the data saved after web crawling.

    # Keywords used to filter the crawled jobs a second time
    job_keywords = ('工程','資料','python','data','數據','後端')
    # Companies to exclude, e.g. gambling-related companies or others I don't want to consider
    company_exclude = ('新加坡商冕創有限公司','新博軟體開發股份有限公司','現觀科技股份有限公司'
                          ,'全富數位有限公司','杰思數位有限公司','博凡星國際有限公司',
                          '尊博科技股份有限公司','新騎資訊有限公司','新加坡商鈦坦科技股份有限公司台灣分公司',
                          '豪穎科技股份有限公司','塶樂微創有限公司','磐弈有限公司',
                          '聯訊網路有限公司','冶金數位科技有限公司','肥貓科技有限公司',
                          '無名科技有限公司','博澭科技有限公司','緯雲股份有限公司',
                          '風采有限公司','英屬維京群島商嘉碼科技有限公司台灣分公司',
                          '冠宇數位科技股份有限公司','英仕國際有限公司','元遊科技有限公司',
                          '禾碩資訊股份有限公司','向上集團_向上國際科技股份有限公司',
                          '弈樂科技股份有限公司','馬來西亞商極限電腦科技有限公司台灣分公司',
                          '樂夠科技有限公司','威智國際有限公司','紅信科技有限公司',
                          '深思設計有限公司','揚帆科技有限公司','晶要資訊有限公司',
                          '九七科技股份有限公司','臣悅科技有限公司','尊承科技股份有限公司',
                          '遊戲河流有限公司','唐傳有限公司','捷訊資訊有限公司',
                          '逍遙遊科技有限公司','澄果資訊服務有限公司','果遊科技有限公司',
                          '昱泉國際股份有限公司','博星數位股份有限公司',
                          )
    print(f"設定排除{len(company_exclude)}家公司")  # prints "Set to exclude N companies"
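
The second-stage filter above could be applied to crawled records roughly as follows. This is only a minimal sketch, not the project's actual code: the `filter_jobs` helper and the record keys `job_name` and `company` are hypothetical, the case-insensitive title match is an assumption, and the shortened tuples and sample records are made up for illustration.

```python
def filter_jobs(jobs, job_keywords, company_exclude):
    """Keep jobs whose title contains any keyword and whose company is not excluded.

    Hypothetical helper: assumes each job is a dict with 'job_name' and 'company'.
    """
    excluded = set(company_exclude)                     # O(1) membership checks
    lowered = tuple(k.lower() for k in job_keywords)    # assumed: case-insensitive match
    kept = []
    for job in jobs:
        if job["company"] in excluded:
            continue                                    # drop excluded companies first
        title = job["job_name"].lower()
        if any(k in title for k in lowered):
            kept.append(job)
    return kept


# Shortened stand-ins for the full tuples defined above.
job_keywords = ('工程', '資料', 'python', 'data', '數據', '後端')
company_exclude = ('肥貓科技有限公司', '無名科技有限公司')

# Made-up sample records, for illustration only.
sample_jobs = [
    {"job_name": "Python 後端工程師", "company": "Example Co."},
    {"job_name": "後端工程師", "company": "肥貓科技有限公司"},
    {"job_name": "行銷企劃", "company": "Example Co."},
]

result = filter_jobs(sample_jobs, job_keywords, company_exclude)
print([job["job_name"] for job in result])
```

Here only the first sample record survives: the second belongs to an excluded company, and the third matches none of the keywords.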

ETL Structure

image

Airflow Workflow

image

  • Web Crawler: Crawl the search results list and job details from 104.

  • Export File: Save the data to an Excel file.

  • Data Lake: Store the data in a NoSQL database (MongoDB).

  • Data Warehouse: Store the data in a SQL database (MySQL).

  • FastAPI: Provide a RESTful API for accessing the job listing data.

    image

  • EDA (Exploratory Data Analysis):

    • Vertical Bar Chart: Education
    • Horizontal Bar Chart: Education
    • Horizontal Bar Chart: Major
    • Horizontal Bar Chart: Skills
    • Horizontal Bar Chart: Tools
    • Pie Chart: Education
    • Pie Chart: Location
    • Pie Chart: Major
    • Pie Chart: Skills
    • Pie Chart: Tools
  • Data Cleaning and Merging:

    • Horizontal Bar Chart + Word Cloud: Tools
    • Vertical Bar Chart: Tools
    • Horizontal Bar Chart: Tools
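
The cleaning-and-merging step behind the Tools charts can be pictured as normalizing tool names and then counting them. The following is a minimal sketch under stated assumptions, not the project's actual cleaning logic: `TOOL_ALIASES`, `normalize_tool`, `count_tools`, and the `tools` record key are all hypothetical names, and the sample records are invented for illustration.

```python
from collections import Counter

# Hypothetical alias table: maps lowercase spellings to one canonical name.
TOOL_ALIASES = {
    "ms sql": "SQL Server",
    "mssql": "SQL Server",
    "postgres": "PostgreSQL",
    "postgresql": "PostgreSQL",
    "python": "Python",
}


def normalize_tool(name):
    """Collapse whitespace, then map known aliases to a canonical spelling."""
    cleaned = " ".join(name.split())
    return TOOL_ALIASES.get(cleaned.lower(), cleaned)


def count_tools(jobs):
    """Count how many job postings mention each tool (at most once per posting)."""
    counts = Counter()
    for job in jobs:
        tools = {normalize_tool(t) for t in job.get("tools", []) if t.strip()}
        counts.update(tools)
    return counts


# Made-up sample records, for illustration only.
sample_jobs = [
    {"tools": ["python", "MS SQL", "Git"]},
    {"tools": ["Python ", "mssql"]},
    {"tools": ["Postgres", "Git"]},
]

tool_counts = count_tools(sample_jobs)
for tool, n in tool_counts.most_common():
    print(tool, n)
```

The resulting `Counter` is the kind of frequency table a bar chart or word cloud would be drawn from; merging aliases first keeps variants such as "mssql" and "MS SQL" from appearing as separate bars.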

About

study 104 jobs
