- 繁體中文 README.md (Traditional Chinese README)
- A crawler built with SQLite, asyncio, pandas, pyquery, and requests, deployed on Heroku and connected to a LINE chatbot.
104, 1111, and CakeResume are the most frequently used job search websites in Taiwan.

Job postings today are numerous and change rapidly, so collecting them quickly with a crawler is both effective and necessary. In addition, storing the crawled data in a database makes it searchable at any time.

In this project, instead of the usual linear (sequential) approach, I crawl the job websites asynchronously. This speeds up the crawler and shortens the waiting time.
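The speedup from asynchronous crawling can be sketched as follows. This is a minimal illustration, not the project's actual code: the `fetch` function here simulates a blocking HTTP request (such as `requests.get`) with a short sleep, and `asyncio` runs the blocking calls concurrently in a thread pool instead of one after another.

```python
import asyncio
import time

def fetch(page):
    # Stand-in for a blocking HTTP request (e.g. requests.get);
    # the sleep simulates network latency.
    time.sleep(0.2)
    return f"<html>page {page}</html>"

async def crawl_all(pages):
    loop = asyncio.get_running_loop()
    # Launch every blocking fetch in the default thread-pool executor
    # at once, then await them all: total time is roughly one request's
    # latency rather than the sum of all of them.
    tasks = [loop.run_in_executor(None, fetch, p) for p in pages]
    return await asyncio.gather(*tasks)

start = time.time()
results = asyncio.run(crawl_all(range(1, 4)))
print(f"Fetched {len(results)} pages in {time.time() - start:.2f}s")
```

With three simulated 0.2-second requests, the concurrent version finishes in about 0.2 seconds instead of 0.6.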
- Python
- SQLite
- asyncio
- Pandas
- pyquery
- requests
Clone the repo:

```sh
git clone https://github.com/DysonMa/JobSearch.git
```
- Define the required parameters. `web` can be `104`, `1111`, or `cakeresume`; `end_page` is the number of pages to crawl.

```python
web = '104'
keyword = 'python'
end_page = 3
sqlite_path = './job.db'
```
- Execute the crawl

```python
start_crawling()
```

```
Elapsed 6.618713617324829 seconds
Saved successfully
```
- Show the DataFrame

```python
df
```
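Since the results are persisted in SQLite, they can be queried again later without re-crawling. The sketch below is illustrative only: the table name `job` and its columns are assumptions, not necessarily the project's actual schema, and the first few lines create a tiny demo table so the example is self-contained (in real use, `start_crawling()` produces the database).

```python
import sqlite3

import pandas as pd  # third-party; pip install pandas

conn = sqlite3.connect('./job.db')

# Demo setup only -- the real table is written by the crawler.
conn.execute("CREATE TABLE IF NOT EXISTS job (title TEXT, company TEXT, link TEXT)")
conn.execute("INSERT INTO job VALUES ('Python Engineer', 'ACME', 'https://example.com/1')")
conn.commit()

# Load stored postings into a DataFrame, filtering by keyword.
df = pd.read_sql_query(
    "SELECT * FROM job WHERE title LIKE ?", conn, params=('%Python%',)
)
print(df)
conn.close()
```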
Distributed under the MIT License.

Dyson Ma - madihsiang@gmail.com

Project Link: https://github.com/DysonMa/JobSearch