MCrawler version 0.01
=====================

The README is used to introduce the module and provide instructions on how
to install it, any machine dependencies it may have (for example C compilers
and installed libraries), and any other information that should be known
before the module is installed. A README file is required for CPAN modules,
since CPAN extracts the README from a module distribution so that people
browsing the archive can use it to get an idea of the module's purpose. It is
usually a good idea to provide version information here so that people can
decide whether fixes for the module are worth downloading.

INSTALLATION

To install this module, type the following:

    perl Makefile.PL    --> creates the Makefile
    make                --> checks dependencies and installs required packages from CPAN
    make test           --> tests are not written yet
    make install        --> avoid installing; the module is still in development

Creating the Database

    createdb -W -O crawler -U crawler crawlerdb
        --> crawler is the user
        --> crawlerdb is the database name
        --> the default password is 'crawler'

All default database values can be changed in MCrawler_config.yml.

    psql -f <root_dir>/MCrawler_DB.sql -U crawler -d crawlerdb
        --> creates the database schema

Create roles for database clients (log in to psql with superuser permissions):

    CREATE ROLE client_<client_id> WITH LOGIN PASSWORD '<client_database_password>';

    example: CREATE ROLE client_8989 WITH LOGIN PASSWORD 'secret';

Starting the Crawler and Server

    cd <Root_directory>
    perl Crawler.PL

Starting a Client

    perl MClient.PL

Client command format:

    Making a new request              --> new_request
    Checking messages from the server --> check_inbox
    Queueing a URL                    --> queue_url <request_id> <url>
    Queueing a file with URLs         --> queue_file <request_id> <file_location>

    Other settings
        --> downloads_type   <request_id> <user_value>   not used
        --> depth_of_search  <request_id> <user_value>   default value -> 3
        --> refresh_rate     <request_id> <user_value>   default value -> 60*60*24
        --> allowed_content  <request_id> <user_value>   default value -> html|txt
        --> user_agent       <request_id> <user_value>   default value -> MCrawler bot (http://cyber.law.harvard.edu)
        --> dequeue_request  <request_id>
        --> check_status     <request_id>

The format of the seeds.txt file is newline-separated URLs.

DEPENDENCIES

This module requires these other modules and libraries. The following modules
are compulsory in order to run Makefile.PL:

    -> ExtUtils::MakeMaker
    -> ExtUtils::AutoInstall

SAMPLE INPUT/OUTPUT

The admin creates a role by logging in to the crawler database with psql:

    CREATE ROLE client_8989 WITH LOGIN PASSWORD 'client_password';

Change the server port (default is 6666) and the client id in MClient.PL, and
pass it to the client. The client starts a request by entering new_request.
Here is a series of standard commands:

    --> initiating a new request
        input  --> new_request
        input  --> check_inbox
        output --> you(8989) made new request 23

    --> sending input to the server
        input  --> queue_url 23 http://wikipedia.org
        input  --> queue_file 23 ./seeds.txt

    --> configuring the request
        input  --> depth_of_search 23 10
        input  --> refresh_rate 23 7200
        input  --> user_agent 23 MCrawler-bot/1.0 (http://cyber.law.harvard.edu)

    --> committing the request
        input  --> commit_request 23

    --> checking status
        input  --> check_status 23
        output --> please check your request status on view_8989_23 table.
                   Out of 1 URLs, 0 URLs are completed and remaining are in processing.

Next, the client logs in to the PostgreSQL server with the password given by
the admin, and has only SELECT permission on the view_8989_23 view.
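For the status-checking step above, here is a minimal sketch of how the client
could inspect its view with psql. It assumes the client role connects to the
same crawlerdb database created above; the column layout of the view is not
documented here, so SELECT * is used:

    # connect to the crawler database as the client role,
    # using the password set by the admin
    psql -U client_8989 -d crawlerdb -W

    -- inside psql: this role only has SELECT permission on its own view
    SELECT * FROM view_8989_23;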
    --> dequeueing
        input --> dequeue_request 23

dequeue removes the request_id from the queue and stops any requests with
that request_id; any data in the database with that request_id is also
removed.

If a client has its own module to contact the server directly, the syntax of
the communication can be seen after executing every command (a rough,
assumption-based sketch is included at the end of this README).

COPYRIGHT AND LICENCE

Copyright (C) 2010 by aditya

This library is free software; you can redistribute it and/or modify it under
the same terms as Perl itself, either Perl version 5.10.1 or, at your option,
any later version of Perl 5 you may have available.
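CONTACTING THE SERVER DIRECTLY (SKETCH)

For clients that want to talk to the server without MClient.PL, the exact wire
syntax should be taken from what MClient.PL shows after each command. The
following is only a rough sketch, assuming a plain line-based TCP exchange on
the default port 6666 with commands in the same form MClient.PL accepts; the
host, message framing, and reply format are assumptions, not documented
behaviour.

    use strict;
    use warnings;
    use IO::Socket::INET;

    # Assumed: the server listens on localhost at the default port 6666
    my $sock = IO::Socket::INET->new(
        PeerAddr => 'localhost',
        PeerPort => 6666,
        Proto    => 'tcp',
    ) or die "Cannot connect to crawler server: $!";

    # Assumed: one command per line, same syntax as the MClient.PL commands
    print $sock "queue_url 23 http://wikipedia.org\n";

    # Assumed: the server replies with a single line of text
    my $reply = <$sock>;
    print "server replied: $reply" if defined $reply;

    close $sock;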