MCrawler version 0.01

The README is used to introduce the module and provide instructions on
how to install the module, any machine dependencies it may have (for
example C compilers and installed libraries) and any other information
that should be provided before the module is installed.

A README file is required for CPAN modules since CPAN extracts the
README file from a module distribution so that people browsing the
archive can use it to get an idea of the module's uses. It is usually a
good idea to provide version information here so that people can
decide whether fixes for the module are worth downloading.


To install this module type the following:

   perl Makefile.PL      --> creates the Makefile
   make                  --> checks dependencies and installs required packages from CPAN
   make test             --> the tests do not work yet; some still need to be written
   make install          --> avoid this step while the module is in development

   Creating the database:
   createdb -W -O crawler -U crawler crawlerdb                 --> crawler is the user
                                                               --> crawlerdb is the DB name
                                                               --> default password is 'crawler'
   All default database values can be changed in MCrawler_config.yml
   psql -f <root_dir>/MCrawler_DB.sql -U crawler -d crawlerdb  --> sets up the database from MCrawler_DB.sql

   Creating roles for database clients:
   log in to psql with superuser permissions, then run
   CREATE ROLE client_<client_id> WITH LOGIN PASSWORD '<client_database_password>';
   example: CREATE ROLE client_8989 WITH LOGIN PASSWORD 'secret';
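
The role-creation step above can be scripted. The following is a minimal sketch, not part of MCrawler: the helper name create_role_sql is hypothetical, and it assumes client ids are plain integers, as in the client_8989 example. It only prints the SQL statement so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Print a CREATE ROLE statement for an MCrawler database client.
# Assumption: client ids are plain integers, as in the client_8989 example.
create_role_sql() {
    client_id=$1
    password=$2
    case $client_id in
        ''|*[!0-9]*)
            echo "client id must be numeric: $client_id" >&2
            return 1
            ;;
    esac
    # Double any single quotes so the password stays a valid SQL string literal.
    escaped=$(printf '%s' "$password" | sed "s/'/''/g")
    printf "CREATE ROLE client_%s WITH LOGIN PASSWORD '%s';\n" "$client_id" "$escaped"
}

# Print the statement for review.
create_role_sql 8989 secret
```

The output can be piped into psql by a superuser, for example: create_role_sql 8989 secret | psql -U postgres -d crawlerdb (the superuser name postgres is an assumption about the local setup).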

   Starting the crawler and server:
   cd <Root_directory>
   perl Crawler.PL

   Starting the client:
   perl MClient.PL

   Client command format:
   Making a new request            --> new_request
   Checking messages from server   --> check_inbox
   Queueing a URL                  --> queue_url <request_id> <url>
   Queueing a file with URLs       --> queue_file <request_id> <file_location>
   Other settings                  --> downloads_type <request_id> <user_value>     not used
                                   --> depth_of_search <request_id> <user_value>    default value: 3
                                   --> refresh_rate <request_id> <user_value>       default value: 60*60*24
                                   --> allowed_content <request_id> <user_value>    default value: html|txt
                                   --> user_agent <request_id> <user_value>         default value: MCrawler bot (
   Committing a request            --> commit_request <request_id>
   Removing a request              --> dequeue_request <request_id>
   Checking request status         --> check_status <request_id>
   The seeds.txt file contains newline-separated URLs, one per line.
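
As an illustration of the seeds file format, the sketch below writes a small file and counts its entries before it would be passed to queue_file. The helper name count_seeds and the example URLs are placeholders, not part of MCrawler. Whether MCrawler tolerates blank lines in the file is not stated here, so the helper simply skips them when counting.

```shell
#!/bin/sh
# Count the non-blank lines in a seeds file; each one is treated as a URL.
count_seeds() {
    grep -c -v '^[[:space:]]*$' "$1"
}

# Write a small example file. The URLs are placeholders.
seeds_file=$(mktemp)
cat > "$seeds_file" <<'EOF'
http://example.com/
http://example.org/docs/index.html
EOF

echo "seed URLs: $(count_seeds "$seeds_file")"
```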

This module requires these other modules and libraries:

	These modules are required in order to run Makefile.PL:
	-> ExtUtils::MakeMaker
	-> ExtUtils::AutoInstall

The admin creates a role by logging into the crawler database with psql:
CREATE ROLE client_8989 WITH LOGIN PASSWORD 'client_password';
The admin sets the server port (default is 6666) and the client id in MClient.PL, then passes it to the client.
The client starts a request by entering a series of standard commands.

--> initiating a new request
input --> new_request
input --> check_inbox
output --> you(8989) made new request 23

--> sending input to server
input --> queue_url 23 <url>
input --> queue_file 23 ./seeds.txt

--> configuring the request
input --> depth_of_search 23 10
input --> refresh_rate 23 7200
input --> user_agent 23 MCrawler-bot/1.0 (
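
The refresh_rate values above (7200, and the default 60*60*24) read as a number of seconds, although the unit is not stated explicitly. Under that assumption, a small hypothetical helper can convert whole hours into the value to pass:

```shell
#!/bin/sh
# Convert whole hours to seconds for use as a refresh_rate value.
# Assumption: refresh_rate is given in seconds.
hours_to_seconds() {
    echo $(( $1 * 60 * 60 ))
}

echo "refresh_rate 23 $(hours_to_seconds 2)"    # prints: refresh_rate 23 7200
echo "default: $(hours_to_seconds 24)"          # prints: default: 86400
```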

--> committing the request 
input --> commit_request 23

--> checking the status
input --> check_status 23
output --> please check your request status on view_8989_23 table.
			Out of 1 URLs,0 URLs are completed and remaining are in processing.

Next, the client logs in to the PostgreSQL server with the password given by the admin.
The client has only SELECT permission on the view_8989_23 view.
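
Reading the results could look like the sketch below. It assumes the view name always follows the view_<client_id>_<request_id> pattern seen in the check_status output; the helper names view_name and status_query are hypothetical. The columns of the view are not documented here, so the query selects all of them.

```shell
#!/bin/sh
# Build the per-request view name, following the view_<client_id>_<request_id>
# pattern seen in the check_status output.
view_name() {
    printf 'view_%s_%s\n' "$1" "$2"
}

# Build a query against the view; the column list is unknown, so use *.
status_query() {
    printf 'SELECT * FROM %s LIMIT 10;\n' "$(view_name "$1" "$2")"
}

status_query 8989 23    # prints: SELECT * FROM view_8989_23 LIMIT 10;
```

With a running server, the client could then execute: psql -U client_8989 -d crawlerdb -W -c "$(status_query 8989 23)"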

input --> dequeue_request 23 
dequeue_request removes the request_id from the queue and stops any requests with that request_id.
Any data in the database with that request_id is also removed.

If the client has its own module for contacting the server directly, the syntax of the communication can be seen after executing each command.


Copyright (C) 2010 by aditya

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.10.1 or,
at your option, any later version of Perl 5 you may have available.