forked from wonko9/skynet
/
History.txt
144 lines (119 loc) · 12 KB
/
History.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
== 0.9.3 2008-05-22
Skynet::Manager and Skynet Script Runner
- Rewrote how Skynet workers and skynet manager talks on each machine. See below for more info
- Support starting Skynet with ./script/skynet start and stop to daemonize
- Gracefully handles trying to start skynet more than once
- Close file handles on exec.
Skynet::Worker and Skynet::Manager now call Skynet.fork_and_exec instead of their own versions.
Skynet.fork_and_exec prevents file descriptor exhaustion by calling Skynet.close_file_handles.
Skynet::Manager detatches from console by calling Skynet.close_console
- Dramatically improved Skynet shutdown time
- Huge performance improvements in taking/putting tasks and results. Much lower resource utilization, especially on mysql.
- You now have to specify your PIDFILE and LOGFILE dirs and files differently. This will break all old skynet runners. Sorry.
- Skynet runner handles default config variables better
- Fix serious bug in Skynet::Partitioners::RecombineAndSplit where it wouldn't handle empty results well.
- Fixed Skynet::Partitioners::ArrayDataSplitByFirstEntry to handle strings as keys better
- Added Skynet::Job.results_by_job_id to retrieve results from asyncronous jobs
- Added printlog logging method which always prints to the log as [LOG]
- Deprecated Skynet.new to Skynet.start
- Mysql Message Queue Adapter - Make delete_expired_messages much safer.
- ActiveRecord::Base.distributed_find - Patch submitted by Lourens Naude (lourens@methodmissing.com) which checks the model for the primary_key name as opposed to assuming it is
'id'
- We don't want to use rails constantize so I've temporarily borrowed the method from ActiveSupport inflector and added it to skynet_ruby_extensions.
- Fix bug in Job comment where it referenced MapreduceTest instead of Skynet::MapreduceTest
- Fix tests. For some reason you still can't run ALL the test at once with rake test, but if the files are run individually they all pass.
- Change mysql text fields to longtext in migration and schema files
- Include some extras including our init.d script, our nagios monitoring script, rails controller and view for monitoring
- Created a new skynet rails initializer which gets installed from skynet_install --rails
- Modified the skynet_install skynet runner to take skynet initializer into account
- Skynet::ActiveRecordExtensions Fixed a bug where it would fail if your table had fewer than 1000 rows. It performs a count
first now to make sure there are enough rows.
- Introduced some new Skynet::Config methods for getting logfile and pidfile locations
Skynet Manager/Worker Refactor
The workers used to publish their worker statuses to the skynet_worker_queue which lived in the same Q space as the skynet_message_queue. skynet managers would then query that queue for their workers' statuses. This was a very inefficient use of central resources. NOW, workers communicate with their manager vir DRb, calling manager.worker_notify(status). This adds the worker status hash onto a local Queue object stored in the manager. A separate thread watches that queue and updates the internal manager information about it's workers. This was in place of polling a queue every N seconds for new worker records.
Managers now save their worker information to a file periodically so they can be reloaded on a restart. This means managers can keep track of how many tasks all of their workers have done even after restarts.
As a consequence of decentralizing worker stats, you now have to ask all your managers for their individual stats along with the main message queue stats. Added a stats_for_hosts method to Skynet::Manager which aggregates stats accross many managers.
== 0.9.2 2008-01-22
Highlights:
- Multiple Message Queues
- Many more Job options including options to control how jobs are distributed.
- The various options for how a job is run has been made much clearer.
- The Mysql Message Queue Adapter has been optimized and made more reliable.
- You can now control how many times skynet retries failed master, map and reduce tasks.
- Large data sets can now be streamed to the queue.
Details:
Active::Record#distributed_find
- Active Record distributed_find now handles REALLY large sets by breaking them into seperate jobs. 1MM models per master broken into ranges of 1000.
Skynet::Job
- code path through Skynet::Job is now clear. There are 3 ways to run Skynet::Job. Local Master (default), Remote Master, Async (implies remote master)
- Skynet::Job supports keep_map_tasks and keep_reduce_tasks settings.
If true, the master will run the tasks locally.
If a number is provided, the master will run the tasks locally if there are LESS THAN OR EQUAL TO the number provided
There are also Skynet::CONFIG settings for defaults. DEFAULT_KEEP_REDUCE_TASKS, DEFAULT_KEEP_MAP_TASKS
I can see there being a problem with the timeouts being a little off... Since your kinda in a master and kinda in a map or reduce timeout. The task timeouts will be correct at least. Though, you won't get the benefit of redoes yet.
- Skynet::Job now supports setting RETRY times per job by MASTER, MAP and REDUCE. So you can have a MASTER_RETRY=0, but have MAP_RETRY=2 and REDUCE_RETRY=3. There are now defaults for those as well :DEFAULT_MASTER_RETRY, :DEFAULT_MAP_RETRY, :DEFAULT_REDUCE_RETRY. If a message passes its RETRY it will be marked with an iteration of -1. delete_expired_messages removes those messages as well. These show up in the stats as :failed_tasks. Skynet::Task and Skynet::Message now have retry fields denoting the maximum number of retries.
- You can now pass queue_id or queue to Skynet::Job
- There is now a :MAX_RETRIES config setting that controls how many iterations Skynet will even look for tasks as well.
- You can now stream map_data to the queue by passing an Enumerable for your map_data
- Refactored Skynet::Job to be much cleaner and easier to test.
- Skynet::Job now has access to a local queue which it can treat almost like the real one.
- deprecate Skynet::Job#run_master
- rename reduce_partitioner method to just reduce_partition
- Skynet::Job has better support for running tasks locally. A job may run tasks in its own process if
you are running in solo mode, you've made a "single" job, or you set the keep_map_tasks or keep_reduce_tasks below.
When a job runs tasks locally it now honors the retry settings and timeouts.
- Skynet::Job and Skynet::AsyncJob are now almost identical. In fact you can just use Skynet::Job and tell it to run async.
- Skynet::Job Changed map_tasks and reduce_tasks to mappers and reducers respectively. This was to remove the ambiguity between the actual map/reduce tasks and the number of mappers/reducers desired.
- Skynet::Jobs can not be told what queue to use for that job by passing :queue or :queue_id DEFAULT 0
- Skynet::Job won't call the reduce_partitioner if there are no valid results from the map_step.
MapreduceHelper mixin
- You can include MapreduceHelper into your class and then implement self.map_each and self.reduce_each methods. The included self.map and self.reduce methods will handle iterating over the map_data and reduce_data, passing each element to your map_each and reduce_each methods respectively. They will also handle error handling within that loop to make sure even if a single map or reduce fails, processing will continue. If you do not want processing to continue if a map fails, do not use the MapreduceHelper mixin.
Multiple Message Queues!
- Add the ability to have multiple message queues in the same table message_queue_table.
- You can start skynet with a --queue_id or --queue option to determine which queue workers should look in.
- Skynet::Jobs can not be told what queue to use for that job by passing :queue or :queue_id. DEFAULT 0
- Queues can be configred via Skynet::CONFIG[:MESSAGE_QUEUES] = [] which comes with an array of queues id 1 through 10 named "one" through "ten"
Skynet Console
- You now start the skynet console by running 'skynet console' at the command line. There is no longer a skynet_console app.
- The console now loads the configs that are in your local skynet script.
Skynet::Config
- Added Skynet.silent {} Runs your code with no debugging output.
- Made sure all config options for TupleSpace? adapter begin with TS and all Mysql adapter CONFIG settings start with MYSQL.
- There is now a :MAX_RETRIES config setting that controls how many iterations Skynet will even look for tasks as well.
Skynet::Partitioners
- Created a Skynet::Partitioners class where various partitioners can be found. To specify one of them merely provide that specific Skynet::Partitioners subclass as your reduce_partitioner in your Skynet::Job
Skynet Install
- skynet_install now has a --mysql option. This installs the migration as well as a skynet_schema.sql file.
Mysql Message Queue Adapter
- You can now configure your database options outside of rails with Skynet::CONFIG[:MYSQL_*] options.
- Mysql adapter now updates the updated_on time of rows in skynet_message_queues
- Fixed Mysql Message Adaptor to take_next_task safer and more efficiently. There seems to be far less risk of a race condition where two workers would take the same task.
- Eliminated 1 db update per every task taken making it MUCH more efficient.
- Skynet::MessageQueueAdaptor::Mysql now tries to reconnect if it gets disconnected. This was to solve the "Mysql Server has Gone Away" errors.
- Implemented version_active? in mysql message queue adapter. It's a way for workers to check to see if a version is still in the queue.
Skynet::Task
- Skynet::Task#master_task takes care of creating the master job and task now. I have mixed feelings about this.
- ENFORCED TIMEOUTS - Even though a master might give up on a worker if it didn't respond in time, there was nothing to step any given worker from running forever. We now enforce the timeouts (master_timeout, map_timeout, reduce_timeout) given in Skynet::Job using the Timeout module. This causes a Timeout::Error to be thrown. If you are using the mysql adapter, this can cause strange results sometimes. If the Timeout error is thrown during a DB query, ActiveRecord will throw an ActiveRecord::StatementInvalid exception which includes the Timeout::Error exception in it. Not sure how to prevent that from happening.
Skynet::Worker
- Workers now have a Skynet::CONFIG[:WORKER_MAX_PROCESSED] setting to control when to respawn based on how many that worker has processed.
- You can start skynet with a --queue_id or --queue option to determine which queue workers should look in.
- Workers do not restart until there are no more items in the queue of that version.
Skynet::Message
- Skynet::Message now stores fields as an array. It is far more efficient now as well.
Now 90% more Tests!
BUGFIXES
- Skynet Workers now restart properly when the worker_version changes.
- starting tuplespce_server, you no longer need to provide the --port if you're already providing the drburi-
- Fixed a bug where Skynet::MessageQueueAdapter::Mysql would sometimes pick up tasks another worker had already picked up.
- Fix bug in Skynet::Worker where it wouldn't die right if the max processed was reached.
- Workers were supposed to restart when the worker_version changed. They do that properly now.
Thanks to Jason Rimmer for finding these bugs.
- Fix bug in Skynet::Message where it would calculate the iteration improperly.
- The skynet gem appears to be missing the Rubigen dependency
- Running skynet_install even without the rails arg still generates code with rails dependencies and tailings: RAILS_ROOT, RAILS_ENV, and the various directory tailings such as 'db/migrate', etc.
- Generated skynet script is missing "require 'rubygems'"
- Specification of pid directory and file is incorrect as 'skynet_manager.rb' wants only a directory with it specifying the file
- The sleep while waiting to start the queue server isn't long enough. There is now a CONFIG setting TS_SERVER_START_DELAY.
== 0.0.1 2007-12-16
* 1 major enhancement:
* Initial release