cwensel / cascading

Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing workflows on a Hadoop cluster.

Cascading Change Log

1.0.6

Fixed bug where a URI path to an s3n://bucket/ could cause an NPE when determining mod time on the path.

Fixed bug where c.s.Scheme sink fields were not being consulted during planning. This fix may
cause planner errors in existing applications where the sink fields are not actually available in the incoming
tuple stream.

Updated application jar discovery to provide more sane defaults supporting simple cases.

Fixed bug where default properties in a nested j.u.Properties object were not being copied.
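
The j.u.Properties behavior behind the entry above can be sketched as follows. This is a minimal illustration, not Cascading's actual code; the class, method, and property names are hypothetical. A plain putAll() only sees entries stored directly in the table, while stringPropertyNames() also walks the nested defaults.

```java
import java.util.Properties;

final class NestedPropertiesCopy {
    // Copies every visible key, including keys inherited from nested defaults.
    static Properties copyWithDefaults(Properties source) {
        Properties copy = new Properties();
        for (String name : source.stringPropertyNames()) {
            copy.setProperty(name, source.getProperty(name));
        }
        return copy;
    }

    // Naive copy: putAll() iterates only the entries stored directly in 'source'.
    static Properties naiveCopy(Properties source) {
        Properties copy = new Properties();
        copy.putAll(source);
        return copy;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("example.default", "from-defaults");
        Properties props = new Properties(defaults);
        props.setProperty("example.direct", "set-directly");

        System.out.println(naiveCopy(props).getProperty("example.default"));        // null
        System.out.println(copyWithDefaults(props).getProperty("example.default")); // from-defaults
    }
}
```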

1.0.5

Added check if num reducers is zero; if so, assume #reduce() has no intention of being called and return silently.

1.0.4

Updated split optimizer to perform a multipass optimization.

Fixed bug where c.f.MultiMapReducePlanner was not properly handling splits on named Pipe instances.

Added c.t.TemplateTap constructor arg that allows for independent tuple selection for use by template path.

Fixed bug where unsafe filename characters were leaking into temporary filenames; the fix didn't take the first time.

1.0.3

Fixed bug in c.f.MultiMapReducePlanner where splits and joins with the same source were not handled properly.

Fixed bug in c.f.Flow#writeDOT caused by changes in 1.0.2.

Fixed bug in c.o.t.DateFormatter and c.o.t.DateParser where the TimeZone value was not being properly set. This
fix could affect existing applications.

1.0.2

Added rules to verify no duplicate head or tail names exist in an assembly when calling c.f.FlowConnector#connect().
Currently a WARNING will be issued via the logger; in the next major release this will be an exception. Duplicate
names were supported in prior releases, but turned out to allow error-prone code. Two workarounds are available: bind
the same tap to both names in the tap map, or split from a single named c.p.Pipe instance.

Added support for c.o.e.ExpressionFunction to evaluate expressions with no input parameters.

Reverted MR job naming to include sink c.t.Tap name. More verbose, but easier for debugging.

Updated c.c.Cascade to not delete c.f.Flow sinks if they are appendable before the Flow is executed.

Updated error messages to warn when internal element graphs remove all place holders resulting in an empty graph,
usually due to missing linkages between pipe assemblies.

Allowing Fields.UNKNOWN to propagate through pipes that do not declare argument selectors. This is a relaxation
of the strict planning and seems very natural when assembling pipes to process unknown field sets. Reserving
the right to revert this feature if it causes unforeseen issues.

Fixed bug in c.o.f.UnGroup where the num arg value was improperly calculated.

Allow for white space in the serializations token property so it can be set in a config file simply.
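
A whitespace-tolerant parse of such a property value might look like the sketch below. The value format (a comma-separated list of token=classname pairs) is an assumption made for illustration, and the class, method, and example class names are hypothetical; this is not Cascading's parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TokenPropertyParser {
    // Parses "130=com.example.Foo, 131=com.example.Bar", ignoring white space
    // around commas, equals signs, and line breaks. The format is assumed.
    static Map<Integer, String> parse(String value) {
        Map<Integer, String> tokens = new LinkedHashMap<>();
        if (value == null) {
            return tokens;
        }
        for (String pair : value.split(",")) {
            String trimmed = pair.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas and blank segments
            }
            String[] parts = trimmed.split("=", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("malformed token entry: " + trimmed);
            }
            tokens.put(Integer.valueOf(parts[0].trim()), parts[1].trim());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(parse(" 130 = com.example.Foo ,\n 131=com.example.Bar "));
    }
}
```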

Added new log message if no serialization token is found for a class being serialized out.

Fixed bug that allowed c.t.Fields instances to be nested in new Fields instances.

Updated many error messages to print the number of fields along with a list of the field names.

Fixed bug preventing custom c.s.Scheme types from using different key/value classes in some situations.

Fixed bug preventing c.t.TemplateTap from being written to in a Reducer.

1.0.1

Improved error message for the case where a Hadoop serializer/deserializer cannot be found.

Changed c.s.Scheme sourceFields default to Fields.UNKNOWN. sinkFields default remains Fields.ALL.

Fixed bug where unsafe filename characters were leaking into temporary filenames.

Changed SinkMode.APPEND support checks to be done in c.t.Hfs, instead of c.t.Tap.

1.0.0

Updated copyright messages.

Fixed bug where c.t.TuplePair threw an NPE during debugging.

Fixed bug where positional selectors failed against Fields.UNKNOWN.

Changed all constructors on c.p.Group to be protected. Must now use subclasses to construct.

Renamed c.t.Fields#minus to subtract.

0.10.0

Changed c.p.CoGroup "repeat" parameter to numSelfJoins to represent the actual number of self joins to be performed.
Thus a value of 1 will cause a single self join of a pipe. Users will need to decrement the current value by 1.

Fixed bug with temporary filename generation where path created was too long.

Fixed Janino c.o.expression operations to require parameter names and types. Janino
was returning guessed parameter names in a nondeterministic order.

Fixed boolean type c.t.Tuple serialization.

Fixed c.p.GroupBy merging case where grouping field names were not properly resolved.

Changed c.o.r.RegexParser to emit variable sized Tuples if a fieldDeclaration is not given. Also will emit group
matches if there are any, otherwise the match is emitted.
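
The "groups if any, otherwise the match" rule can be sketched with plain j.u.regex as below. This is an illustration under assumptions, not the RegexParser source; the class and method names are hypothetical, and the use of find() here is a choice made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupOrMatch {
    // Returns the capture groups if the pattern declares any, otherwise the
    // whole match. Returns an empty list when the pattern is not found.
    static List<String> parse(Pattern pattern, String line) {
        List<String> values = new ArrayList<>();
        Matcher matcher = pattern.matcher(line);
        if (!matcher.find()) {
            return values;
        }
        if (matcher.groupCount() == 0) {
            values.add(matcher.group(0)); // no groups declared: emit the match
        } else {
            for (int i = 1; i <= matcher.groupCount(); i++) {
                values.add(matcher.group(i)); // one value per capture group
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parse(Pattern.compile("(\\w+)=(\\d+)"), "count=42")); // [count, 42]
        System.out.println(parse(Pattern.compile("\\d+"), "id 123 x"));          // [123]
    }
}
```

The size of the result varies with the number of groups in the pattern, which is why a fixed field declaration cannot be assumed.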

Removed deprecated classes: c.o.t.Texts, c.o.r.Regexes, c.p.EndPipe.

Removed experimental c.p.EndPipe class.

Changed c.t.Tap#isUseTapCollector to Tap#isWriteDirect.

Changed c.t.Tap and c.f.Flow to return c.t.TupleEntryIterator instead of c.t.TupleIterator. This is more consistent
and more useful.

Added c.t.TemplateTap to support dynamically writing out c.t.Tuple values to unique directories.

Changed Cascading to support null values returned from c.t.Tap#source() and subsequently c.t.Scheme#source().
This allows for Schemes to skip records returned by an internal Hadoop InputFormat without having to implement
a custom Hadoop InputFormat or instrument a pipe assembly with a c.o.Filter.

0.9.0

Updated c.o.Debug to allow for printing field names and tuple values in intervals.

Changed planner to fail if traps are not contained within single Map or Reduce tasks. This prevents the chance of
multiple tasks writing to the same output location. Hadoop only partially supports appends, so it is not currently
possible to append subsequent jobs to existing trap files. Naming sections of a pipe assembly allows traps to be
bound to smaller sections of assemblies.

Added c.o.f.Sample and c.o.f.Limit Filters. Sample allows a given percentage of Tuples to pass. Limit only allows the
specified number of Tuples to pass.

c.p.Pipe instances now capture line numbers and classnames where they are instantiated so this information
can be printed out during planner failures.

Added c.f.FlowSkipStrategy interface to allow for pluggable rules for when to skip executing a c.f.Flow participating
in a c.c.Cascade. The default implementation is c.f.FlowSkipIfSinkStale, with an optional c.f.FlowSkipIfSinkExists.
Setting a skip strategy on a Cascade overrides all Flow instance strategies.

Fixed bug with c.t.Tuple#remove() method not correctly removing values from Tuple.

Updated c.t.Tap API to support c.t.SinkMode enums. This opens up ability to support appends in the near future.

Added support for Hadoop 0.19.x. This release skips Hadoop 0.18.x.

Changed project structure so that XML functions live in their own sub-project. This includes renaming the base
Cascading tree and jars to 'core'.

Fixed bug that prevented Fields.UNKNOWN input sources from being fed into a c.p.CoGroup for joining.

Changed all operations so that incoming c.t.Tuple and c.t.TupleEntry instances are unmodifiable. An
UnsupportedOperationException will be thrown on any attempt to modify argument tuples within an operation.
This enforces the rule that argument tuples should not be modified, to protect against concurrent modification in
parallel threads.
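
The effect on operation code can be sketched with an unmodifiable j.u.List standing in for the argument tuple. This is an analogy only; the class and method names are hypothetical and no Cascading types are used. Code that needs to change values should copy them first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class UnmodifiableArguments {
    // Stand-in for an operation body that wrongly mutates its arguments.
    // Returns true if the mutation was rejected.
    static boolean mutationRejected(List<Object> arguments) {
        try {
            arguments.set(0, "changed");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    // The safe pattern: copy the argument values, then modify the copy.
    static List<Object> copyThenModify(List<Object> arguments) {
        List<Object> copy = new ArrayList<>(arguments);
        copy.set(0, "changed");
        return copy;
    }

    public static void main(String[] args) {
        List<Object> view = Collections.unmodifiableList(new ArrayList<Object>(Arrays.asList("a", 1)));
        System.out.println(mutationRejected(view)); // true
        System.out.println(copyThenModify(view));   // [changed, 1]
    }
}
```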

Updated c.o.r.RegexMatcher base class to use j.u.r.Matcher#find() instead of #matches(). This is more consistent
with default behaviors of popular languages. Matcher is now also initialized in prepare() and reset() in
the operation to reduce overhead.
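
The difference between the two j.u.r.Matcher methods, and the reuse of one Matcher through reset(), can be sketched as follows. The class and method names are hypothetical; only the j.u.regex calls are real API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FindVersusMatches {
    // find() succeeds if the pattern occurs anywhere in the input.
    static boolean usingFind(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    // matches() succeeds only if the pattern covers the entire input.
    static boolean usingMatches(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    // Reuses a single Matcher across inputs via reset(), instead of
    // allocating a new Matcher for every value.
    static int countContaining(Pattern pattern, List<String> inputs) {
        Matcher matcher = pattern.matcher("");
        int count = 0;
        for (String input : inputs) {
            if (matcher.reset(input).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("error");
        String line = "an error occurred";
        System.out.println(usingFind(pattern, line));    // true
        System.out.println(usingMatches(pattern, line)); // false
        System.out.println(countContaining(pattern, Arrays.asList(line, "all good", "error"))); // 2
    }
}
```

Patterns written for #matches() that relied on implicit anchoring would need explicit ^ and $ to keep the old behavior under find().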

Added new lifecycle methods to c.o.Operation, prepare and cleanup. These methods are called so that an Operation
instance can initialize and destroy any resources. They may be called more than once before the instance is
garbage collected.

Added a new operation called c.o.Buffer. Buffers are similar to Reduce in MapReduce. They are given an Iterator
of input arguments and can emit any number of result c.t.Tuple instances. For many problems, this is more
efficient than using a c.o.Aggregator operation. Only one c.p.Every pipe with a Buffer operation may
follow a GroupBy or CoGroup.

Fixed dot file writing so GraphViz can properly load the files.

Upgraded jgrapht library; it requires JDK 1.6.

Fixed bug where selecting positions from a c.t.Fields.UNKNOWN declaration would return the first position, not
the specified position.

Renamed c.t.Fields.KEYS to c.t.Fields.GROUP to be consistent with the Cascading model.

Fixed bug where c.t.Tap may inappropriately delete a sink from a task.

Changed c.o.Aggregator to no longer use a Map for the context. Users can now specify custom types by returning
either a new instance from start() or recycling an instance passed into start(). This change will break all existing
implementations of Aggregator. Note, simply setting a new Map<Object,Object> on the call instance in start()
should be sufficient.

Changed all c.o.Function, c.o.Filter, c.o.Aggregator, c.o.ValueAssertion, and c.o.GroupAssertions to accept
a c.f.FlowProcess object on all relevant methods. FlowProcess provides call-backs into the underlying system
to get configuration properties, fire a "keep alive" ping, or increment a custom counter. This change will
break all existing implementations of the above interfaces.

Added ability to set serialization tokens via the cascading.serialization.tokens property. This complements the
c.t.h.SerializationToken annotation.

Optimized co-grouping operation by using c.t.IndexTuple instead of a nested c.t.Tuple.

Changed c.t.Tap and c.s.Scheme sink methods to take a c.t.TupleEntry, instead of c.t.Fields and c.t.Tuple
individually.

Added the c.t.h.SerializationToken Java Annotation. This allows for an int value to be written during serialization
instead of a Class name for custom objects nested in c.t.Tuple instances. This feature should dramatically reduce
the size of Tuples saved in SequenceFiles, and improve the general performance during 'shuffling' between Map and
Reduce stages.

Added c.t.h.TupleSerialization, a Hadoop Serialization implementation. Tuple is no longer Hadoop Writable
and now relies on TupleSerialization for serialization support. Subsequently, nested objects in c.t.Tuple
only need to be j.l.Comparable. So that they can be serialized properly, a Serialization implementation must be
registered with Hadoop. Note all primitive types are handled directly by Tuple, but custom types must
have a Serialization implementation, or must be Hadoop WritableComparable so that the default WritableSerialization
implementation will write them out.

0.8.3

Fix for c.p.CoGroup declared fields being generated out of order.

0.8.2

Added new properties via c.f.FlowConnector.setJarClass and c.f.FlowConnector.setJarPath for
setting the application jar file.

Fixed bug where job jar was not being inherited by subsequent MapReduce jobs when the first job was executed
in local mode.

Fixed bug where unserializable Operations were being squashed internally. c.f.Flow instances will now
fail immediately and be marked as 'failed'.

0.8.1

Fixed bug where c.t.Lfs did not force local mode for current MapReduce step.

Fixed bug where writing to a c.t.TupleCollector would fail if using a c.s.SequenceFile in some cases.

Added a few minor improvements to reduce stray object creations, and speed up c.t.Tuple serialization.

0.8.0

Updated c.o.x.TagSoupParser to accept 'features'; use these features to recover past behaviors.

Updated janino and tagsoup libraries to 2.5.15 and 1.2, respectively. Note that tagsoup, in theory, is not
backwards compatible by default. See their release notes: http://home.ccil.org/~cowan/XML/tagsoup/#1.2

Added some forward compatible changes for supporting Hadoop 0.18 at the API level. Currently there are other
issues preventing some tests from passing on Hadoop 0.18.

Changed c.f.FlowException to return the parent c.f.Flow name.

Changed behavior of c.f.MultiMapReducePlanner to use c.t.h.MultiInputFormat to allow single Mappers
to support many different Hadoop InputFormat types simultaneously. This deprecates the need to normalize
sources to a map and reduces the number of jobs in a c.f.Flow in some cases.

Changed behavior of Cascading to allow for multiple paths from the same c.t.Tap source to be co-grouped on
via c.p.CoGroup. This allows for a kind of self-join where each stream is processed by a different operation
path within the Mapper.

Added c.o.f.And, c.o.f.Or, c.o.f.Xor, and c.o.f.Not logic operator c.o.Filter implementations. They should be used
to compose more complex filters from existing implementations.
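
The composition idea can be sketched without any Cascading types. Everything below is hypothetical: RemoveFilter is a stand-in interface whose isRemove() returns true when a value list should be dropped, and applying the logic operators to those remove decisions is an assumption of this sketch rather than a statement about the c.o.f classes.

```java
import java.util.Arrays;
import java.util.List;

final class LogicFilters {
    // Hypothetical stand-in for a filter: true means "remove these values".
    interface RemoveFilter {
        boolean isRemove(List<Object> values);
    }

    // Removes only if every filter votes to remove.
    static RemoveFilter and(RemoveFilter... filters) {
        return values -> Arrays.stream(filters).allMatch(f -> f.isRemove(values));
    }

    // Removes if any filter votes to remove.
    static RemoveFilter or(RemoveFilter... filters) {
        return values -> Arrays.stream(filters).anyMatch(f -> f.isRemove(values));
    }

    // Removes if exactly one of the two filters votes to remove.
    static RemoveFilter xor(RemoveFilter left, RemoveFilter right) {
        return values -> left.isRemove(values) != right.isRemove(values);
    }

    // Inverts the decision of the wrapped filter.
    static RemoveFilter not(RemoveFilter filter) {
        return values -> !filter.isRemove(values);
    }

    public static void main(String[] args) {
        RemoveFilter isNull = values -> values.get(0) == null;
        RemoveFilter isEmpty = values -> "".equals(values.get(0));
        RemoveFilter nullOrEmpty = or(isNull, isEmpty);
        System.out.println(nullOrEmpty.isRemove(Arrays.<Object>asList("")));     // true
        System.out.println(nullOrEmpty.isRemove(Arrays.<Object>asList("text"))); // false
    }
}
```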

Changed the behavior of c.o.BaseOperation to properly initialize itself if it is a c.o.Filter instance. This
removes the requirement that Filter implementations must set declaredFields to Fields.ALL, as it makes no
sense for a Filter to declare fields.

Added c.f.PlannerException, a subclass of c.f.FlowException, and updated c.f.MultiMapReducePlanner to throw
it on failures. Functionality of writing DOT files has been moved from FlowException to PlannerException.

Added c.o.f.FilterNotNull and c.o.f.FilterNull filter classes.

Changed c.f.MultiMapReducePlanner to fail if it encounters a c.p.Each to c.p.Every chain. In these cases, a
c.p.Group type must be between them.

Deleted c.o.Cut class as it was effectively a duplicate of c.o.Identity.

Changed c.f.MultiMapReducePlanner to fail if a c.p.GroupAssertion is not accompanied by another c.o.Aggregator
operation. This is required so that the GroupAssertion does not change the passing tuple stream if it is planned out.

Changed c.f.MultiMapReducePlanner to no longer insert new c.p.Each( ..., new Identity(), ... ) as a place holder.

Renamed c.p.PipeAssembly to c.p.SubAssembly to better reflect its purpose, which is to encapsulate reusable
pipe assemblies in the same manner as a sub-process or sub-routine. A temporary c.p.PipeAssembly class has been
provided for backwards compatibility.

Fixed bug where c.t.TapCollector would throw an NPE if a custom Tap was not using paths.

Changed behavior of c.f.Flow where if a c.f.FlowListener throws an exception, the Flow instance receiving the
exception will stop (by calling Flow.stop()). Listeners will continue to fire as expected and Flow.complete()
will re-throw the thrown exception (as was the original behavior).

Added ability to set a Cascading specific temporary directory path for use by intermediate taps created
within c.f.Flow instances. Use c.t.Hfs.setTemporaryDirectory() to configure.

Fixed bug where the 'mapred.jar' property was being stepped on if previously set by the calling application.

Changed c.t.Tap and c.f.Flow to return c.t.TupleIterator and c.t.TupleCollector instead of c.t.TapIterator and
c.t.TapCollector, respectively.

Added c.t.Tap.flowInit( c.f.Flow flow ) to allow a given tap to know what flows it is participating in. It is called
immediately after the Flow instance is initialized.

Fixed bug with nested c.p.PipeAssembly instances where some nested assemblies threw an internal error from
the planner.

Changed c.o.Debug to accept a prefix text string that will be prepended to every message.

Fixed bug where c.f.MultiMapReducePlanner would fail when normalizing inputs to a group where the inputs
passed through one or more splits.

Fixed bug where c.g.CoGroup silently stepped on input pipes with the same input name.

0.7.1

Fixed bug in c.f.MultiMapReducePlanner where a source used on more than one c.p.Group would cause an internal
error during planning.

Changed c.f.MultiMapReducePlanner to normalize heterogeneous sinks.

Changed c.f.MultiMapReducePlanner to keep a splitting c.p.Each on the previous step, instead of being duplicated
on each branch. If the Each is preceded by a source c.t.Tap, it will be duplicated across branches to reduce
the number of steps in the Flow.

Fixed bug in c.f.MultiMapReducePlanner where too many temp tap instances were being inserted while normalizing
the flow sources.

Changed c.t.Fields to fail if given duplicate field names.
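
A fail-fast duplicate check of this kind can be sketched as below. The class, method, exception type, and message are hypothetical and are not taken from c.t.Fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class UniqueFieldNames {
    // Returns the names in declaration order, or fails on the first duplicate.
    static List<String> checked(String... names) {
        Set<String> seen = new LinkedHashSet<>();
        for (String name : names) {
            if (!seen.add(name)) { // add() returns false if the name was already present
                throw new IllegalArgumentException("duplicate field name found: " + name);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        System.out.println(checked("id", "name")); // [id, name]
        try {
            checked("id", "name", "id");
        } catch (IllegalArgumentException expected) {
            System.out.println(expected.getMessage()); // duplicate field name found: id
        }
    }
}
```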

Changed behavior if Hadoop FileInputSplit is not used and property "map.input.file" is not set. If there is one
source, it will be returned as the source for the mapper stack, otherwise an exception is thrown. Subsequently, joins
and merges of non-file sources are not supported until a discriminator can be passed to the mapper.

Fixed bug in c.t.Tuple where an NPE was thrown under certain compareTo operations.

Fixed bug that prevented CoGrouping or Merging on the same source even though it was one or more Groupings away.

0.7.0

Changed project structure, removed 'examples' sub-project.

Updated to support Hadoop 0.17.x. This version is not API compatible with any Hadoop version less than 0.17.0.

Added ability to stop all c.f.Flows executing within a c.c.Cascade instance via the stop() method.

Changed c.f.FlowConnector to only take a Map of properties. These properties are passed downstream to various
subsystems. This removes the Hadoop JobConf constructor, but it still can be passed as a property value. Also
properties will be pushed into a default JobConf, bypassing any direct JobConf coupling in applications.

Changed c.f.Flow to automatically register a shutdown hook killing remote jobs on VM exit.

Changed c.f.Flow.stop() to immediately stop all running jobs.

Changed c.o.Operation to an interface and introduced c.o.BaseOperation. This makes creating custom Operation types
more flexible and intuitive. c.o.Filter, c.o.Function, c.o.Aggregator, and c.o.Assertion now extend c.o.Operation.

Added c.p.c.OuterJoin, c.p.c.MixedJoin, c.p.c.LeftJoin, and c.p.c.RightJoin c.p.c.CoGrouper classes. They
complement the default c.p.c.InnerJoin CoGrouper class.

Added support for passing an intermediateSchemeClass to the underlying planner to be used as the default c.s.Scheme
for intermediate c.t.Tap instances internal to a given c.f.Flow.

Fixed bug where a c.p.Group is immediately followed by another c.p.Group (or their sub-classes) and fields could not
be resolved between them.

Added support for c.t.Tap instances implementing c.f.FlowListener. If implemented, they will automatically be
added to the Flow event listeners collection and will receive Flow events.

Fixed case where multiple source c.t.Tap instances return true for the containsFile method. Now verifies only one
Tap contains the file, and fails otherwise.

Changed c.s.TextLine to not set numSinkParts to 1 by default. Now uses the natural number of parts.

Changed MapReduce planner to force an intermediate file between branches with Hadoop incompatible source Taps
on joins/merges. If the taps are compatible (have same Scheme), all branches will be processed in same Mapper
before the c.p.Group.

Added merge capabilities in c.p.GroupBy. This allows multiple input branches to be grouped as if a single stream.

Fixed bug in c.t.TapCollector where writing to a Sequence file threw an NPE.

Added c.f.MapReduceFlow to support custom MapReduce jobs, allowing them to participate in a Cascade job.

0.6.1

Changed thrown c.f.FlowException instances to include cause message.

Fixed bug where empty sink or source map was not detected.

0.6.0

Changed default argument selector for c.p.Every to be Fields.ALL, to be consistent with the default value of c.p.Each.

Added support for assembly traps. If an exception is thrown from inside a c.o.Operation, the offending Tuple
can be saved to a file for later processing, allowing the job to complete.

Added support for stream assertions. STRICT and VALID assertions can be built into a pipe assembly, and optionally
planned out during runtime. Assertions will throw exceptions if they fail.

Changed c.o.a.First, Last, Min, and Max to optionally ignore specified values. Useful if you do not wish
for a 'default' value to be considered first, or last in a set.

Changed c.o.a.Sum to take a Class for coercion of the result value.

Changed c.o.Max and Min to use infinity as initial values so zero is bigger than a really small number
for Max, and zero is smaller than a really big number for Min.
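
Why the initial value matters can be sketched with plain doubles. This is an illustration, not the c.o.Max or c.o.Min source; the class and method names are hypothetical. Seeding a running maximum with negative infinity lets any finite input win the first comparison, whereas a seed of zero silently beats all-negative input.

```java
final class InfinityInitialValues {
    // Running maximum seeded with negative infinity.
    static double max(double... values) {
        double result = Double.NEGATIVE_INFINITY;
        for (double value : values) {
            result = Math.max(result, value);
        }
        return result;
    }

    // Running minimum seeded with positive infinity.
    static double min(double... values) {
        double result = Double.POSITIVE_INFINITY;
        for (double value : values) {
            result = Math.min(result, value);
        }
        return result;
    }

    // For contrast only: a zero seed is wrong when every input is negative.
    static double maxSeededWithZero(double... values) {
        double result = 0.0;
        for (double value : values) {
            result = Math.max(result, value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(max(-5, -2, -9));               // -2.0
        System.out.println(maxSeededWithZero(-5, -2, -9)); // 0.0
        System.out.println(min(3, 7, 5));                  // 3.0
    }
}
```

With no input at all, these sketches return the infinite seed itself, so callers would need to handle the empty case separately.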

Changed order of JobConf initialization. c.f.FlowStep now is added to the JobConf last in order to catch
all lazily configured values.

Changed compile to include debug info by default.

Fixed bug in c.t.MultiTap where super scheme was not returned if available.
414
fc9c649b » cwensel 2008-05-05 version 0.5.0 415 0.5.0
4477dbf8 » cwensel 2008-05-01 rename file 416
ac1016fe » cwensel 2008-05-02 update comments 417 Added skipIfSinkExists property to c.f.Flow. Set to true if the c.c.Cascade should skip the Flow instance even
418 if the sink is stale and not set to be deleted on initialization.
419
fc9c649b » cwensel 2008-05-05 version 0.5.0 420 Fixed bug in c.t.h.HttpFileSystem that URL escaped the ? prefixing the query string.
4477dbf8 » cwensel 2008-05-01 rename file 421
422 Fixed bug where a join with duplicate taps was not recognized during job planning. Now an appropriate error
423 message is displayed, instead of jobs completing with only one instance of the resource stream.
424
425 Fixed c.t.h.HttpFileSystem to remember authority information in the url and prefix it when missing.
426
427 Changed c.s.TextLine to accept either one or two source fields. If one, only the 'line' value
428 is sourced from the value, discarding the 'offset' value.
429
430 Added c.o.r.RegexSplitGenerator to support splitting single tuple values into multiple tuples based on a regex
431 delimiter. Includes new tests.
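The idea of splitting one value into several output rows on a regex delimiter can be sketched with java.util.regex alone. This is not the c.o.r.RegexSplitGenerator implementation; RegexSplitSketch and splitToRows are hypothetical names, and each returned string stands in for one emitted tuple.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not Cascading code: one input value becomes many output rows.
class RegexSplitSketch {
    // Splits a single value on the given regex delimiter; each piece represents one output tuple.
    static List<String> splitToRows(String value, String delimiterRegex) {
        Pattern delimiter = Pattern.compile(delimiterRegex);
        List<String> rows = new ArrayList<>();
        for (String part : delimiter.split(value)) {
            rows.add(part);
        }
        return rows;
    }

    public static void main(String[] args) {
        // A comma followed by optional whitespace acts as the delimiter.
        System.out.println(splitToRows("a, b,c", ",\\s*")); // prints [a, b, c]
    }
}
```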
432
433 Added c.s.CascadeStats and c.s.FlowStats to provide access to current state and statistics of a particular
434 Cascade, Flow, or the child Flows of a Cascade.
435
436 Added ability to sort grouping values with sort argument on c.p.GroupBy. Sorts can be reversed.
437
438 Added c.o.e.ExpressionFilter, the c.o.Filter analog to c.o.e.ExpressionFunction.
439
440 0.4.1
441
442 Fixed path normalization regex in c.u.Util where it munged any path starting with file:///.
443
444 0.4.0
445
446 Changed c.p.GroupBy default grouping fields to c.t.Fields.ALL from Fields.FIRST. This change provides a simple
447 way to sort a tuple stream based on the order of the tuple fields.
448
449 Changed c.f.FlowConnector to create c.f.Flow instances that will bypass the reducer if no c.p.Group is participating
450 in the assembly. Previously Group instances were inserted if missing. This allows a chain of c.p.Every instances
451 to be used to process/filter a tuple stream without invoking the reducer needlessly (if a sort isn't required).
452 This change also supports bypassing the default Hadoop OutputCollector in the mapper via the sink c.t.Tap instance.
453
454 Changed c.f.FlowStep behavior to run in 'local' mode if either the sink or source tap is a c.t.Lfs instance. This
455 allows for c.f.Flow instances to run mixed if configured to execute on a particular cluster by default. This behavior
456 supports complex import/export processes against the HDFS or other supported remote filesystem.
457
458 Changed behavior of c.t.Dfs to force use of HDFS. Previously Dfs would default to the local FileSystem
459 if the job was run in 'local' mode. Now a Dfs instance will cause failures if it cannot connect to an HDFS cluster.
460 Using c.t.Hfs will provide previous Dfs behavior. Hfs will use the 'default' filesystem if a scheme is not present
461 in the 'stringPath' (e.g. hdfs://host:port/some/path).
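Falling back to a default filesystem when a path carries no scheme can be sketched with java.net.URI. This is not how c.t.Hfs is implemented; SchemeSketch, resolveScheme, and the DEFAULT_SCHEME value are hypothetical, and a real deployment would take the default from the Hadoop configuration.

```java
import java.net.URI;

// Hypothetical sketch, not Cascading code: pick a filesystem scheme for a path string.
class SchemeSketch {
    // Assumed default for illustration; a real cluster would read this from its configuration.
    static final String DEFAULT_SCHEME = "file";

    // Returns the scheme embedded in the path, or the default when the path has none.
    static String resolveScheme(String stringPath) {
        String scheme = URI.create(stringPath).getScheme();
        return scheme != null ? scheme : DEFAULT_SCHEME;
    }

    public static void main(String[] args) {
        System.out.println(resolveScheme("hdfs://namenode:8020/some/path")); // prints hdfs
        System.out.println(resolveScheme("/some/path"));                     // prints file
    }
}
```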
462
463 Added c.stats package to allow for collecting statistics of Cascades, Flows, and FlowSteps.
464
465 Updated c.f.Flow and c.c.Cascade log messages to be easier to follow when executing many flow instances
466 simultaneously.
467
468 Added compression flag to c.s.TextLine. Can now toggle compression (Hadoop style compression) per Tap instance.
469 This prevents clusters with compression enabled by default from exporting text files with a .deflate extension.
470
471 Added support for bypassing Hadoop OutputCollector via Tap.setUseTapCollector() method. Setting to true will force
472 Cascading to use the c.t.TapCollector instead. This bypasses bugs in Hadoop with custom FileSystem types. This will
473 always be true for http(s) and s3tp filesystems when using a c.t.Hfs Tap type (at least until HADOOP-3021 is resolved).
474
475 Added c.t.TupleCollector, complementing c.t.TupleIterator, for directly writing Tuple instances out via a c.t.Tap
476 instance.
477
478 Added c.f.FlowListener so that c.f.Flow instances can fire events on starting, completed, and throwable.
479
480 Changed c.t.h.S3HttpFileSystem so it can now create files remotely.
481
482 Renamed cascading.spill.threshold to cascading.cogroup.spill.threshold, so there is less chance of collision.
483
484 Made numerous optimizations to improve overall performance. Namely split and merge of key/value tuples to remove
485 redundancy in the stream between the mapper and reducer.
486
487 Changed c.p.Operators to push c.o.Operation results directly through to next operation without intermediate
488 collection. This should improve pipelining of large result streams and lower runtime memory footprint.
489
490 Changed c.c.Cascade so it now runs Flows in parallel if Hadoop is clustered, and there are no dependencies between the
491 Flows.
492
493 Moved c.Cascade and related classes to c.cascade package. Wanted to preempt any future ugliness.
494
495 Added support in c.t.h.S3HttpFileSystem for these properties: fs.s3tp.awsAccessKeyId and fs.s3tp.awsSecretAccessKey
496
497 0.3.0
498
499 Added ability to push Log4j logger properties to mapper/reducer via JobConf.
500 Use jobConf.set("log4j.logger","logger1=LEVEL,logger2=LEVEL")
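The "logger1=LEVEL,logger2=LEVEL" value format shown above can be read back into per-logger levels roughly as follows. This is a sketch of one way to parse such a string, not the parsing code Cascading uses; LoggerSpecSketch and parse are hypothetical names, and the logger names in the example are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Cascading code: parse a "name=LEVEL,name=LEVEL" specification.
class LoggerSpecSketch {
    // Returns logger names mapped to level names, preserving the order they were listed in.
    static Map<String, String> parse(String spec) {
        Map<String, String> levels = new LinkedHashMap<>();
        for (String entry : spec.split(",")) {
            String[] pair = entry.trim().split("=", 2);
            if (pair.length != 2 || pair[0].trim().isEmpty()) {
                throw new IllegalArgumentException("malformed entry: '" + entry + "'");
            }
            levels.put(pair[0].trim(), pair[1].trim());
        }
        return levels;
    }

    public static void main(String[] args) {
        System.out.println(parse("cascading=DEBUG,org.apache.hadoop=WARN"));
        // prints {cascading=DEBUG, org.apache.hadoop=WARN}
    }
}
```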
501
502 Added missing equals() and hashCode() in c.t.MultiTap.
503
504 Added c.t.h.ZipInputFormat (and ZipSplit) to support zip files. c.s.TextLine supports transparent
505 reading of zip files if the filename ends with .zip, but cannot write to them. This code is
506 loosely based on HADOOP-1824. If the underlying filesystem is hdfs or file, splits will be created
507 for each ZipEntry. Otherwise ZipEntries are iterated over to be more stream friendly. Progress status is
508 supported.
509
510 Added http, https, and s3tp read-only file systems to Hadoop. Use these URLs, respectively:
511 http://, https://, and s3tp://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@bucket-name/key
512
513 Added c.o.t.DateFormatter supporting text formatting of time stamps created by c.o.t.DateParser.
514
515 Fixed bug where in complex assemblies, some Scopes were not resolved.
516
517 Fixed bug where tap instances were not being inserted before some CoGroup joins if there was a previous Group in the
518 assembly.
519
520 Upgraded JGraphT to 0.7.3
521
522 Changed c.t.SpillableTupleList to allow for iteration across entries.
523
524 Changed c.f.FlowException to optionally allow for printing of underlying pipe graph for debugging.
525
526 Added c.o.t.FieldFormatter function to format Tuples into complex strings using j.u.Formatter formatting.
527
528 Added c.o.a.Last aggregator to find the last value encountered in a group.
529
530 Changed c.o.a.Max and c.o.a.Min to maintain original value type. Will return null if no values are encountered.
531
532 Changed c.o.a.First to use Fields.ARG by default. Removed Fields constructor.
533
534 Added c.t.Fields.join(Fields...) method to allow for joining multiple Fields instances into a new instance.
535
536 Can retrieve Tuple values by field name through the TupleEntry class via the get(String) method.
537
538 Added c.t.TupleCollector interface to simplify the operation interfaces.
539
540 Added a Debug filter that will print to either stderr or stdout. Useful for debugging stream transformations.
541
542 Added CascadingTestCase base test class
543
544 Added Insert Function that allows for literal values to be inserted into the Tuple stream.
545
546 0.2.0
547
548 CoGroup will now spill to disk on extremely large co-groupings. Configurable via "cascading.spill.threshold".
549 Defaults to 10k elements.
550
551 java.util.Properties instances can be used to set defaults for FlowConnectors.
552
553 Fix for InnerJoin, the default join for CoGroup.
554
555 Introduced MultiTap to support concatenation of files into a pipe assembly.
556
557 RegexParser now fails on a failed match. Prevents it being used or behaving as a filter.
558
559 Fixed bug with PipeAssembly instances not properly being assimilated into the pipeGraph.
560
561 Fixed assertion error thrown by JGraphT.
562
563 Renamed Tap method deleteOnInit to deleteOnSinkInit.
564
565
566 0.1.0
567
9bdcead7 » cwensel 2008-12-12 version 0.10.0 568 First release.