This repo comprises multiple web scrapers written in Python.

AyushSenapati/Spider

Spider

This repo contains a scraper written in Python. For now it supports scraping indeed.com only.

Features

It takes two files as input: one for the job list and another for the locations. The crawl depth can be specified; if it is omitted, the default depth of 3 is used.
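As a rough illustration of the inputs described above, here is a minimal sketch of a command-line entry point. The argument names (`jobs_file`, `locations_file`, `--depth`), the one-entry-per-line file format, and the pairing of every job with every location are assumptions made for this example, not the repo's actual interface.

```python
import argparse
from itertools import product
from pathlib import Path

DEFAULT_DEPTH = 3  # default crawl depth stated in the README


def build_parser():
    """Build a parser for the two input files and an optional depth (hypothetical names)."""
    parser = argparse.ArgumentParser(description="Job scraper inputs (illustrative sketch)")
    parser.add_argument("jobs_file", type=Path, help="text file with one job title per line (assumed format)")
    parser.add_argument("locations_file", type=Path, help="text file with one location per line (assumed format)")
    parser.add_argument("--depth", type=int, default=DEFAULT_DEPTH, help="crawl depth; defaults to 3")
    return parser


def read_lines(path):
    """Return the non-empty, stripped lines of a text file."""
    text = path.read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_queries(jobs, locations):
    """Pair every job with every location (an assumption about how queries are formed)."""
    return list(product(jobs, locations))


def main():
    args = build_parser().parse_args()
    queries = build_queries(read_lines(args.jobs_file), read_lines(args.locations_file))
    print(f"{len(queries)} queries, depth {args.depth}")


if __name__ == "__main__":
    main()
```

With this sketch, running the script with only the two file paths would use a depth of 3, while passing `--depth 5` would override it.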

  • It is a multi-threaded scraper: it starts one thread for each query.
  • Provides a built-in proxy pool API and IP rotator.
  • Requests are made at random intervals to simulate a real user.
  • Data is stored in a pandas DataFrame and dumped to a JSON file.
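The features above could be combined roughly as in the following sketch, which uses only the standard library. Everything named here (`ProxyRotator`, `scrape_query`, `run_all`, the `fetch` callback, and the delay bounds) is hypothetical and chosen for illustration. Treating depth as "number of result pages per query" is also an assumption; the repo's own code may be organised differently.

```python
import random
import threading
import time
from itertools import cycle


class ProxyRotator:
    """Hand out proxies round-robin; thread-safe. Requires a non-empty proxy list."""

    def __init__(self, proxies):
        self._pool = cycle(proxies)
        self._lock = threading.Lock()

    def next_proxy(self):
        with self._lock:
            return next(self._pool)


def scrape_query(query, depth, rotator, results, lock, fetch, min_delay, max_delay):
    """Fetch `depth` pages for one query, pausing a random interval before each request."""
    for page in range(depth):
        # Random pause so requests do not arrive at a fixed rhythm.
        time.sleep(random.uniform(min_delay, max_delay))
        proxy = rotator.next_proxy()
        rows = fetch(query, page, proxy)  # `fetch` is a caller-supplied function (assumed)
        with lock:
            results.extend(rows)


def run_all(queries, depth, proxies, fetch, min_delay=1.0, max_delay=3.0):
    """Start one thread per query and collect all rows in a shared list."""
    rotator = ProxyRotator(proxies)
    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(
            target=scrape_query,
            args=(query, depth, rotator, results, lock, fetch, min_delay, max_delay),
        )
        for query in queries
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

If `fetch` returns a list of dictionaries, the combined result could then be loaded with `pandas.DataFrame(results)` and written out with `DataFrame.to_json`, matching the last bullet. That step is left out here so the sketch stays dependency-free.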
