Find-job

A simple crawler to find jobs from several websites.

Dependencies

  • Python 3+
  • MongoDB

Getting Started

Installation

Clone this repository:

git clone https://github.com/damnee562/find-job.git

Create a virtualenv and install the requirements:

cd find-job/
python3 -m venv venv_name
source venv_name/bin/activate
pip install -r requirements.txt

Make sure MongoDB is running on your system; you can check its status with:

sudo service mongod status

If it's not running, fire it up:

sudo service mongod start

Start crawling with Scrapy:

cd find_job/
python crawl.py -k python  # Replace 'python' with whatever keyword you want.
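
The repository's crawl.py isn't reproduced here, but an entry point of this shape is one common way to wire a -k flag into Scrapy; the spider name and argument wiring below are assumptions, not the project's actual code:

# Illustrative crawl.py-style entry point (sketch; spider name is a placeholder)
import argparse
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

def main():
    parser = argparse.ArgumentParser(description="Crawl job sites for a keyword")
    parser.add_argument("-k", "--keyword", required=True, help="search keyword")
    args = parser.parse_args()

    process = CrawlerProcess(get_project_settings())
    # Each spider receives the keyword as a constructor argument.
    process.crawl("example_jobs", keyword=args.keyword)  # hypothetical spider name
    process.start()  # blocks until crawling finishes

if __name__ == "__main__":
    main()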

The crawl creates a new database named find_job in MongoDB; all found jobs are stored in the jobs collection.

# jobs collection
{
    "_id": ObjectId(),
    "name": Job's title,
    "location" Job's location,
    "company": Offer's company,
    "salary": Salary range,
    "date": Post date,
    "url": Link to post
}
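
To sanity-check what was stored, a few lines of pymongo are enough (a sketch, assuming MongoDB on the default localhost:27017):

# Inspect the jobs collection (illustrative sketch)
from pymongo import MongoClient

jobs = MongoClient("localhost", 27017)["find_job"]["jobs"]

print(jobs.count_documents({}))    # total number of stored offers
for job in jobs.find().limit(5):   # peek at the first few documents
    print(job["name"], "-", job.get("company"))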

Export data in CSV format:

mongoexport -d find_job -c jobs --type=csv --fields name,company,location,salary,date,url -o jobs.csv

See the official mongoexport docs for more export options.
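
If mongoexport isn't available, the same CSV can be produced with a few lines of Python (a sketch, assuming the pymongo driver and the default localhost connection):

# Export the jobs collection to CSV without mongoexport (illustrative sketch)
import csv
from pymongo import MongoClient

FIELDS = ["name", "company", "location", "salary", "date", "url"]
jobs = MongoClient("localhost", 27017)["find_job"]["jobs"]

with open("jobs.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    # Project away Mongo's internal _id so rows match the header exactly.
    for job in jobs.find({}, {"_id": 0, **{field: 1 for field in FIELDS}}):
        writer.writerow(job)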

Built With

  • Scrapy - An open source and collaborative framework for extracting the data you need from websites (a minimal spider shape is sketched below).
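
For orientation, a spider in this kind of setup typically boils down to a start URL built from the keyword plus a parse callback. Everything below (URL, selectors, spider name) is a placeholder sketch, not the repository's actual code:

# Minimal Scrapy spider shape (illustrative; all names are placeholders)
import scrapy

class ExampleJobsSpider(scrapy.Spider):
    name = "example_jobs"

    def __init__(self, keyword="python", *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Build the search URL from the keyword passed via -k.
        self.start_urls = [f"https://example.com/jobs?q={keyword}"]

    def parse(self, response):
        for post in response.css("div.job"):
            yield {
                "name": post.css("h2::text").get(),
                "url": response.urljoin(post.css("a::attr(href)").get()),
            }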

License

This project is licensed under the MIT License - see the LICENSE file for details.
