Package for DB processor and Asyncio Web Scraper based on coroutine

Hail-cali/db_modeling


Coroutine Web Scraper & DB Processor

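What are coroutines?

  • Coroutines are functions that can suspend and resume, which is the basis of asynchronous programming with asyncio.
  • They behave predictably and are well suited to exception handling, because an error surfaces at the point where the coroutine is awaited.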
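
As a minimal sketch of both points, the snippet below runs three coroutines concurrently and collects a failure as a value instead of letting it abort the batch. The `fetch` function and its arguments are hypothetical stand-ins, not part of this package.

```python
import asyncio


async def fetch(name: str, delay: float, fail: bool = False) -> str:
    """Hypothetical stand-in for a network call."""
    await asyncio.sleep(delay)  # suspends here and yields to the event loop
    if fail:
        raise ValueError(f"{name} failed")
    return f"{name} done"


async def main() -> list:
    # return_exceptions=True puts raised exceptions into the result list
    # instead of re-raising the first one from gather().
    return await asyncio.gather(
        fetch("a", 0.02),
        fetch("b", 0.01, fail=True),
        fetch("c", 0.01),
        return_exceptions=True,
    )


if __name__ == "__main__":
    for result in asyncio.run(main()):
        print(repr(result))
```

Results come back in the order the coroutines were passed in, so the failed task can be matched to its input and retried or logged.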

Based on asyncio streams

sample
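
For readers new to asyncio streams, here is a small self-contained sketch of the reader/writer pair they provide. It uses only the standard library and a local echo server; it is not taken from this package's stream.map module, and the names `handle` and `roundtrip` are illustrative.

```python
import asyncio


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Read one line from the client and send it back upper-cased.
    line = await reader.readline()
    writer.write(line.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def roundtrip(message: bytes) -> bytes:
    # Port 0 asks the OS for any free port on the loopback interface.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(message)
        await writer.drain()
        reply = await reader.readline()
        writer.close()
        await writer.wait_closed()
    return reply


if __name__ == "__main__":
    print(asyncio.run(roundtrip(b"hello\n")))
```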

Features

  • Coroutine web scraper: run_web_scrapper.py in test
  • Coroutine Selenium scraper: run_selenium.py in test
  • DB processor: DBConnector in db_connector
  • Query builder (in development): SQL query builder and HTTP query builder

How to use

Web Scraper (crawling)

  • Run the commands below from inside the test directory.
  • With a tasks.csv file:
python run_web_scrapper.py --tasks path/to/tasks.csv --save_file crawling \
                            --result_path result --result_type text
  • With the URL list edited directly in the code, omit the --tasks option:
python run_web_scrapper.py --save_file crawling \
                            --result_path result --result_type text
  • Sample run:
python run_web_scrapper.py --tasks ../tasks.csv --save_file crawling \
                            --result_path result --result_type text
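
The four flags in the commands above could be parsed along these lines. This is only a sketch: the flag names come from the commands, but the defaults, help strings, and the `build_parser` helper are assumptions, and the actual script may define them differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper; defaults below are assumptions for illustration.
    parser = argparse.ArgumentParser(description="Sketch of the scraper CLI flags")
    # Optional: left out when the URL list is edited directly in the code.
    parser.add_argument("--tasks", default=None, help="path to tasks.csv")
    parser.add_argument("--save_file", default="crawling", help="output file name")
    parser.add_argument("--result_path", default="result", help="output directory")
    parser.add_argument("--result_type", default="text", help="output format")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--tasks", "../tasks.csv", "--save_file", "crawling",
         "--result_path", "result", "--result_type", "text"]
    )
    print(args.tasks, args.save_file, args.result_path, args.result_type)
```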

Selenium Scraper

python run_selenium.py --save_file selenium \
                            --result_path result --result_type text

DB processor

  • Shell (in development):
python run.py

Modules

  • Streams with the request module: Reader, Writer, Stream, and Session in stream.map

How to customize

  • Subclass stream.map.BaseSession to create a CustomSession.

  • Edit the code inside async def __aenter__.

  • Pass the new class as the base_session parameter of the asyncio_scraper function in run_web_scrapper.py.

  • The same steps apply to the Selenium scraper.
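
The customization steps above might look roughly like the following. To keep the sketch runnable on its own, `BaseSession` here is a simplified stand-in and not the real stream.map.BaseSession; the constructor arguments, the `headers` attribute, and the signature of `asyncio_scraper` are all assumptions.

```python
import asyncio


class BaseSession:
    """Stand-in for stream.map.BaseSession (hypothetical shape, not the real class)."""

    def __init__(self, url: str) -> None:
        self.url = url
        self.headers: dict = {}

    async def __aenter__(self) -> "BaseSession":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        return False  # do not swallow exceptions


class CustomSession(BaseSession):
    async def __aenter__(self) -> "CustomSession":
        # Custom setup goes here; setting a header is an illustrative assumption.
        self.headers["User-Agent"] = "custom-scraper"
        return self


async def asyncio_scraper(url: str, base_session=BaseSession) -> dict:
    # Hypothetical signature: only the names asyncio_scraper and
    # base_session come from the README.
    async with base_session(url) as session:
        return {"url": session.url, "headers": session.headers}


if __name__ == "__main__":
    result = asyncio.run(
        asyncio_scraper("https://example.com", base_session=CustomSession)
    )
    print(result)
```

Passing the class rather than an instance lets the scraper create one session per task, which is one plausible reason for a base_session parameter.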

example


Task list

  • Selenium scraper
  • DB processor code
  • Web scraper code
  • Develop a crawler that uses an API
