For collecting top 10 voted and 10 Newest featured Questions on Stack Overflow

LeslieWongCV/stack-overflow-crawler

Stack Overflow Crawler

A tool for collecting the 10 top-voted and the 10 newest featured questions on Stack Overflow.

Features:

  • Switch between a 7-day and a 1-month time window
  • Configurable pool size
  • Speed control
  • Web page rendering with Django
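
As a rough illustration of what "pool size" and "speed control" could map to in a Scrapy project, the sketch below uses standard Scrapy settings. Which settings this project actually uses is not stated here, so treat the mapping (pool size as request concurrency) and the values as assumptions.

```python
# settings.py (sketch): these are standard Scrapy settings, but whether this
# project sets them, and with which values, is an assumption.

# "Pool size": assumed here to mean the number of concurrent requests.
CONCURRENT_REQUESTS = 8

# "Speed control": seconds to wait between consecutive requests to the same site.
DOWNLOAD_DELAY = 1.0

# Optionally let Scrapy adapt the delay to the observed server latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 10.0
```

Lower concurrency and a longer delay make the crawl slower but gentler on the target site.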

This project was completed as a 48-hour task. Thanks to the Scrapy docs and The Django Book.

Usage

NOTE: These instructions assume that you have Scrapy and Django installed.

Path

In '/Stack_Overflow_crawler/django_st/django_st/views.py', change the path at line 16 and line 26 to '/Stack_Overflow_crawler-master/Scrapy_module/stack_overflow/dict_context.txt'.

Note: '~' stands for wherever you stored the project; make sure the target directory contains 'dict_context.txt'.
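
To make the path less fragile, a view could build it once and fail with a clear message when the file is missing. The sketch below is only an illustration: `DICT_CONTEXT_PATH` and `load_context` are hypothetical names, the home-directory location is a placeholder, and it assumes 'dict_context.txt' holds a Python dict literal, which is not confirmed here.

```python
import ast
from pathlib import Path

# Hypothetical location; replace it with the real path on your machine.
DICT_CONTEXT_PATH = (
    Path.home()
    / "Stack_Overflow_crawler-master"
    / "Scrapy_module"
    / "stack_overflow"
    / "dict_context.txt"
)


def load_context(path=DICT_CONTEXT_PATH):
    """Read a dict from a text file (assumes the file holds a dict literal)."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"dict_context.txt not found at {path}")
    # literal_eval parses Python literals only; it does not execute code.
    data = ast.literal_eval(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected the file to contain a dict literal")
    return data
```

With a helper like this, the path is defined in one place instead of being edited at two separate lines.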

Go to the directory before starting:

$ cd ~/Stack_Overflow_crawler-master/Scrapy_module

The easiest way to get started is to run init.py:

Step 1/2

$ python init.py

Two files will be generated in '~/Stack_Overflow_crawler-master/Scrapy_module/'.

NOTE: This step downloads the data from the website, sorts it by time and by number of comments, then builds a Python dictionary as input to the Django backend.
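
The sort-and-build step can be pictured roughly as follows. This is a minimal sketch, not the code in init.py: the function `build_context`, the record fields (`title`, `votes`, `comments`, `asked`) and the output keys are all hypothetical. The ranking field is left configurable through `rank_key`.

```python
from datetime import datetime


def build_context(questions, limit=10, rank_key="comments"):
    """Return the top-ranked and newest questions as a plain dict.

    `questions` is a list of dicts; the field names are assumptions.
    """
    top_ranked = sorted(questions, key=lambda q: q[rank_key], reverse=True)[:limit]
    newest = sorted(questions, key=lambda q: q["asked"], reverse=True)[:limit]
    return {"top_ranked": top_ranked, "newest": newest}


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        {"title": "a", "votes": 5, "comments": 2, "asked": datetime(2020, 1, 1)},
        {"title": "b", "votes": 9, "comments": 7, "asked": datetime(2020, 1, 3)},
        {"title": "c", "votes": 1, "comments": 0, "asked": datetime(2020, 1, 2)},
    ]
    print(build_context(sample, limit=2))
```

Passing `rank_key="votes"` would rank by vote count instead.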

Step 2/2

Go to the directory:

$ cd ~/Stack_Overflow_crawler-master/django_st

and start the server:

$ python manage.py runserver 0.0.0.0:8000

Check the result at http://0.0.0.0:8000/ (use http://127.0.0.1:8000/ if your browser does not accept 0.0.0.0).

Note

  • You can switch between the 7-day and the 30-day time window.
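
One way to think about the time-window switch is as a cutoff date applied to the collected questions. The sketch below assumes each question carries an `asked` datetime; `WINDOWS` and `filter_by_window` are hypothetical names and do not come from this project's code.

```python
from datetime import datetime, timedelta

# Window name -> length in days (names are illustrative).
WINDOWS = {"week": 7, "month": 30}


def filter_by_window(questions, window, now=None):
    """Keep only questions asked within the chosen time window."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=WINDOWS[window])
    return [q for q in questions if q["asked"] >= cutoff]
```

Passing `now` explicitly keeps the function easy to test with fixed dates.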

License

MIT

If you do find this script useful, a link back to this repository would be appreciated. Thanks!

Advanced

[updating]
