cwensel / cascading
cascading / CHANGES.txt
| 4477dbf8 » | cwensel | 2008-05-01 | 1 | Cascading Change Log | |
| 2 | |||||
| a7920a80 » | cwensel | 2009-03-27 | 3 | 1.0.6 | |
| 9113a785 » | cwensel | 2009-03-17 | 4 | ||
| d90ccac1 » | cwensel | 2009-03-26 | 5 | Fixed bug where a uri path to a s3n://bucket/ could cause an NPE when determining mod time on the path. | |
| 6 | |||||
| 4df22823 » | cwensel | 2009-03-23 | 7 | Fixed bug where c.s.Scheme sink fields were not being consulted during planning. This fix may | |
| 8 | cause planner errors in existing applications where the sink fields are not actually available in the incoming | ||||
| 9 | tuple stream. | ||||
| 10 | |||||
| 46c1ef4a » | cwensel | 2009-03-18 | 11 | Updated application jar discovery to provide more sane defaults supporting simple cases. | |
| 12 | |||||
| 9113a785 » | cwensel | 2009-03-17 | 13 | Fixed bug where default properties in nested j.u.Properties object were not being copied. | |
| 14 | |||||
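A minimal sketch of the general java.util.Properties behavior behind the nested-defaults entry above, not Cascading's actual fix. The class name, method names, and property keys are hypothetical. A map-style copy sees only entries set directly on the object, while stringPropertyNames() also walks the defaults chain (it covers String keys and values only).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical demo class; not part of Cascading.
final class PropertiesDefaultsDemo {

    // Copies only the entries stored directly in the Properties object.
    static Map<Object, Object> naiveCopy(Properties props) {
        return new HashMap<>(props);
    }

    // Copies every String-valued property, including those inherited from defaults.
    static Map<Object, Object> copyWithDefaults(Properties props) {
        Map<Object, Object> copy = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            copy.put(name, props.getProperty(name));
        }
        return copy;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("example.key", "from-defaults");

        // The defaults object is nested inside props, not merged into it.
        Properties props = new Properties(defaults);
        props.setProperty("other.key", "set-directly");

        System.out.println(naiveCopy(props).containsKey("example.key"));        // false
        System.out.println(copyWithDefaults(props).containsKey("example.key")); // true
    }
}
```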
| 71aec6b8 » | cwensel | 2009-03-12 | 15 | 1.0.5 | |
| 16 | |||||
| 17 | Added check if num reducers is zero, if so, assume #reduce() has no intention of being called and return silently. | ||||
| 18 | |||||
| 237bad0b » | cwensel | 2009-03-09 | 19 | 1.0.4 | |
| c0dbf7e0 » | cwensel | 2009-02-27 | 20 | ||
| e1386626 » | cwensel | 2009-03-04 | 21 | Updated split optimizer to perform a multipass optimization. | |
| 22 | |||||
| 6ba46c5c » | cwensel | 2009-03-04 | 23 | Fixed bug where c.f.MultiMapReducePlanner was not properly handling splits on named Pipe instances. | |
| 24 | |||||
| c1f3845e » | cwensel | 2009-03-03 | 25 | Added c.t.TemplateTap constructor arg that allows for independent tuple selection for use by template path. | |
| 26 | |||||
| 27 | Fixed bug where unsafe filename characters were leaking into temporary filenames, didn't take the first time. | ||||
| c0dbf7e0 » | cwensel | 2009-02-27 | 28 | ||
| 1ec8bf57 » | cwensel | 2009-02-27 | 29 | 1.0.3 | |
| 6065b836 » | cwensel | 2009-02-11 | 30 | ||
| 7d8a36b2 » | cwensel | 2009-02-24 | 31 | Fixed bug in c.f.MultiMapReducePlanner where split and joins with the same source were not handled properly. | |
| 32 | |||||
| 4bbf8fe6 » | cwensel | 2009-02-24 | 33 | Fixed bug in c.f.Flow#writeDOT caused by changes in 1.0.2. | |
| 34 | |||||
| de1b4c0f » | cwensel | 2009-02-24 | 35 | Fixed bug in c.o.t.DateFormatter and c.o.t.DateParser where the TimeZone value was not being properly set. This | |
| 36 | fix could affect existing applications. | ||||
| 37 | |||||
| 38 | 1.0.2 | ||||
| 39 | |||||
| 92f7ae67 » | cwensel | 2009-02-20 | 40 | Added rules to verify no duplicate head or tail names exist in an assembly when calling c.f.FlowConnector#connect(). | |
| 74637d19 » | cwensel | 2009-02-20 | 41 | Currently a WARNING will be issued via the logger, next major release this will be an exception. This is a change | |
| 42 | that was supported in prior releases, but turns out to allow error-prone code. Two workarounds are available: bind | ||||
| 43 | the same tap to both names in the tap map, or split from a single named c.p.Pipe instance. | ||||
| 92f7ae67 » | cwensel | 2009-02-20 | 44 | ||
| 730c59e8 » | cwensel | 2009-02-19 | 45 | Added support for c.o.e.ExpressionFunction to evaluate expressions with no input parameters. | |
| 46 | |||||
| b9ae5045 » | cwensel | 2009-02-19 | 47 | Reverted MR job naming to include sink c.t.Tap name. More verbose, but easier for debugging. | |
| 48 | |||||
| 49 | Update c.c.Cascade to not delete c.f.Flow sinks if they are appendable before the Flow is executed. | ||||
| 4d1486ea » | cwensel | 2009-02-19 | 50 | ||
| bb393118 » | cwensel | 2009-02-19 | 51 | Updated error messages to warn when internal element graphs remove all place holders resulting in an empty graph | |
| 52 | usually due to missing linkages between pipe assemblies. | ||||
| 53 | |||||
| 5565cdd0 » | cwensel | 2009-02-18 | 54 | Allowing Fields.UNKNOWN to propagate through pipes that do not declare argument selectors. This is a relaxation | |
| 55 | of the strict planning and seems very natural when assembling pipes to process unknown field sets. Reserving | ||||
| 56 | the right to revert this feature if it causes unforeseen issues. | ||||
| 57 | |||||
| 58 | Fixed bug in c.o.f.UnGroup where the num arg value was improperly calculated. | ||||
| 59 | |||||
| c4b37975 » | cwensel | 2009-02-18 | 60 | Allow for white space in the serializations token property so it can be set in a config file simply. | |
| 61 | |||||
| faa46cb5 » | cwensel | 2009-02-18 | 62 | Added new log message if no serialization token is found for a class being serialized out. | |
| 63 | |||||
| 7a89512c » | cwensel | 2009-02-18 | 64 | Fixed bug that allowed c.t.Field instances to be nested in new Fields instances. | |
| 65 | |||||
| b42980f1 » | cwensel | 2009-02-18 | 66 | Updated many error messages to print the number of fields along with a list of the field names. | |
| 67 | |||||
| 68 | Fixed bug preventing custom c.s.Scheme types from using different key/value classes in some situations. | ||||
| 5011488a » | cwensel | 2009-02-18 | 69 | ||
| 6065b836 » | cwensel | 2009-02-11 | 70 | Fixed bug preventing c.t.TemplateTap from being written to in Reducer. | |
| 71 | |||||
| 9aa630a6 » | cwensel | 2009-02-04 | 72 | 1.0.1 | |
| b0dc759b » | cwensel | 2009-02-04 | 73 | ||
| 51133512 » | cwensel | 2009-02-04 | 74 | Improved error message for the case a Hadoop serializer/deserializer cannot be found. | |
| 75 | |||||
| 036bdb94 » | cwensel | 2009-02-04 | 76 | Changed c.s.Scheme sourceFields default to Fields.UNKNOWN. sinkFields default remains Fields.ALL. | |
| 77 | |||||
| 25dc51d3 » | cwensel | 2009-02-04 | 78 | Fixed bug where unsafe filename characters were leaking into temporary filenames. | |
| 79 | |||||
| b0dc759b » | cwensel | 2009-02-04 | 80 | Changed SinkMode.APPEND support checks to be done in c.t.Hfs, instead of c.t.Tap. | |
| 81 | |||||
| 82 | 1.0.0 | ||||
| 9d55bce7 » | cwensel | 2009-01-13 | 83 | ||
| 84 | Updated copyright messages. | ||||
| 67f56ad6 » | cwensel | 2008-12-30 | 85 | ||
| 4f8b3e60 » | cwensel | 2008-12-31 | 86 | Fixed bug where c.t.TuplePair threw an NPE during debugging. | |
| 87 | |||||
| 06479e02 » | cwensel | 2008-12-30 | 88 | Fixed bug where positional selectors failed against Fields.UNKNOWN. | |
| 89 | |||||
| 74fe9f86 » | cwensel | 2008-12-30 | 90 | Changed all constructors on c.p.Group to be protected. Must now use subclasses to construct. | |
| 91 | |||||
| 67f56ad6 » | cwensel | 2008-12-30 | 92 | Renamed c.t.Fields#minus to subtract. | |
| 93 | |||||
| 9bdcead7 » | cwensel | 2008-12-12 | 94 | 0.10.0 | |
| 61f51dd7 » | cwensel | 2008-11-26 | 95 | ||
| 92ea3aff » | cwensel | 2008-12-30 | 96 | Changed c.p.CoGroup "repeat" parameter to numSelfJoins to represent the actual number of self joins to be performed. | |
| 97 | Thus a value of 1 will cause a single self join of a pipe. Users will need to decrement the current value by 1. | ||||
| 98 | |||||
| e81f24e1 » | cwensel | 2008-12-12 | 99 | Changed c.p.CoGroup "repeat" parameter to numSelfJoins to represent the actual number of self joins to be performed. | |
| 100 | Thus a value of 1 will cause a single self join of a pipe. Users will need to decrement the current value by 1. | ||||
| 101 | |||||
| b35f1f74 » | cwensel | 2008-12-11 | 102 | Fixed bug with temporary filename generation where path created was too long. | |
| 103 | |||||
| 104 | Fixed Janino c.o.expression operations to require parameter names and types. Janino | ||||
| 105 | was returning guessed parameter names in a nondeterministic order. | ||||
| 106 | |||||
| 107 | Fixed boolean type c.t.Tuple serialization. | ||||
| 108 | |||||
| 109 | Fixed c.p.GroupBy merging case where grouping field names were not properly resolved. | ||||
| 110 | |||||
| 111 | Changed c.o.r.RegexParser to emit variable sized Tuples if a fieldDeclaration is not given. Also will emit group | ||||
| 112 | matches if there are any; otherwise the match is emitted. | ||||
| 113 | |||||
| 114 | Removed deprecated classes; c.o.t.Texts, c.o.r.Regexes, c.p.EndPipe. | ||||
| 115 | |||||
| 116 | Removed experimental c.p.EndPipe class. | ||||
| 117 | |||||
| 118 | Changed c.t.Tap#isUseTapCollector to Tap#isWriteDirect. | ||||
| 119 | |||||
| 120 | Changed c.t.Tap and c.f.Flow to return c.t.TupleEntryIterator instead of c.t.TupleIterator. This is more consistent | ||||
| 121 | and more useful. | ||||
| 122 | |||||
| 123 | Added c.t.TemplateTap to support dynamically writing out c.t.Tuple values to unique directories. | ||||
| 124 | |||||
| 88d6db70 » | cwensel | 2008-11-26 | 125 | Changed Cascading to support null values returned from c.t.Tap#source() and subsequently c.t.Scheme#source(). | |
| 126 | This allows for Schemes to skip records returned by an internal Hadoop InputFormat without having to implement | ||||
| 127 | a custom Hadoop InputFormat or instrument a pipe assembly with a c.o.Filter. | ||||
| 128 | |||||
| 004c302b » | cwensel | 2008-11-24 | 129 | 0.9.0 | |
| 81198bc5 » | cwensel | 2008-10-14 | 130 | ||
| 61f51dd7 » | cwensel | 2008-11-26 | 131 | Updated c.o.Debug to allow for printing field names and tuple values in intervals. | |
| 132 | |||||
| 133 | Changed planner to fail if traps are not contained within single Map or Reduce tasks. This prevents the chance of | ||||
| 134 | multiple tasks writing to the same output location. Hadoop only partially supports appends, so it is not currently | ||||
| 135 | possible to append subsequent jobs to existing trap files. Naming sections of a pipe assembly allows traps to be | ||||
| ae7baf30 » | cwensel | 2008-11-18 | 136 | bound to smaller sections of assemblies. | |
| 137 | |||||
| 138 | Added c.o.f.Sample and c.o.f.Limit Filters. Sample allows a given percentage of Tuples to pass. Limit only allows the | ||||
| 139 | specified number of Tuples to pass. | ||||
| 140 | |||||
| 406ccd02 » | cwensel | 2008-11-18 | 141 | c.p.Pipe instances now capture line numbers and classnames where they are instantiated so this information | |
| 142 | can be printed out during planner failures. | ||||
| 143 | |||||
| 46ef0a35 » | cwensel | 2008-11-18 | 144 | Added c.f.FlowSkipStrategy interface to allow for pluggable rules for when to skip executing a c.f.Flow participating | |
| 145 | in a c.c.Cascade. The default implementation is c.f.FlowSkipIfSinkStale, with an optional c.f.FlowSkipIfSinkExists. | ||||
| 9db6f98d » | cwensel | 2008-11-18 | 146 | Setting a skip strategy on a Cascade overrides all Flow instance strategies. | |
| 46ef0a35 » | cwensel | 2008-11-18 | 147 | ||
| 5249bc9e » | cwensel | 2008-11-07 | 148 | Fixed bug with c.t.Tuple#remove() method not correctly removing values from Tuple. | |
| 149 | |||||
| ed894ac4 » | cwensel | 2008-10-22 | 150 | Updated c.t.Tap api to support c.t.SinkMode enums. This opens up ability to support appends in the near future. | |
| 151 | |||||
| 956268b2 » | cwensel | 2008-10-22 | 152 | Added support for Hadoop 0.19.x. This release skips Hadoop 0.18.x. | |
| 153 | |||||
| 368d3541 » | cwensel | 2008-10-22 | 154 | Changed project structure so that XML functions live in their own sub-project. This includes renaming the base | |
| 155 | Cascading tree and jars to 'core'. | ||||
| 156 | |||||
| 773bb6e4 » | cwensel | 2008-10-22 | 157 | Fixed bug that prevented Fields.UNKNOWN input sources from being fed into a c.p.CoGroup for joining. | |
| 158 | |||||
| 083b37c8 » | cwensel | 2008-10-14 | 159 | Changed all operations so that incoming c.t.Tuple and c.t.TupleEntry instances are unmodifiable. An | |
| 160 | UnsupportedOperationException will be thrown on any attempt to modify argument tuples within an operation. | ||||
| 161 | This enforces the rule that argument tuples should not be modified, to protect against concurrent modification in | ||||
| 162 | parallel threads. | ||||
| 163 | |||||
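As an analogy only, the unmodifiable-argument rule above behaves much like a read-only collection view in the JDK. The sketch below uses java.util.Collections rather than Cascading's c.t.Tuple or c.t.TupleEntry, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; a JDK analogy, not Cascading's Tuple implementation.
final class UnmodifiableArgsDemo {

    // Wraps the values in a read-only view before handing them to an operation.
    static List<Object> readOnlyView(List<Object> values) {
        return Collections.unmodifiableList(values);
    }

    // Returns true if an attempted write to the arguments was rejected.
    static boolean writeIsRejected(List<Object> arguments) {
        try {
            arguments.set(0, "changed");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Object> backing = new ArrayList<>(Arrays.asList("line", 42));
        List<Object> arguments = readOnlyView(backing);

        System.out.println(writeIsRejected(arguments)); // true
        System.out.println(writeIsRejected(backing));   // false, the backing list is still writable
    }
}
```

An operation that needs a changed value would build a new result from a copy rather than write into its arguments.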
| 91d30d59 » | cwensel | 2008-10-14 | 164 | Updated c.o.r.RegexMatcher base class to use j.u.r.Matcher#find() instead of #matches(). This is more consistent | |
| 165 | with default behaviors of popular languages. Matcher is now also initialized in prepare() and reset() in | ||||
| 166 | the operation to reduce overhead. | ||||
| 167 | |||||
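The difference between j.u.r.Matcher#find() and #matches() noted above can be sketched with plain JDK calls. This is an illustration, not the c.o.r.RegexMatcher source; the class and method names are hypothetical. The last method shows one way to create a Matcher once and reset() it per value, which is the kind of reuse the entry describes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Cascading.
final class FindVsMatches {

    // matches() succeeds only if the whole input matches the pattern.
    static boolean wholeMatch(Pattern pattern, String value) {
        return pattern.matcher(value).matches();
    }

    // find() succeeds if any subsequence of the input matches the pattern.
    static boolean partialMatch(Pattern pattern, String value) {
        return pattern.matcher(value).find();
    }

    // Creates a single Matcher and resets it for each value instead of allocating a new one.
    static int countPartialMatches(Pattern pattern, String[] values) {
        Matcher matcher = pattern.matcher("");
        int count = 0;
        for (String value : values) {
            if (matcher.reset(value).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("b+");

        System.out.println(wholeMatch(pattern, "abbbc"));   // false
        System.out.println(partialMatch(pattern, "abbbc")); // true
        System.out.println(countPartialMatches(pattern, new String[] {"abc", "xyz", "bb"})); // 2
    }
}
```

With find() semantics, a pattern that must cover the whole value needs explicit ^ and $ anchors.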
| d362396b » | cwensel | 2008-10-14 | 168 | Added new lifecycle methods to c.o.Operation, prepare and cleanup. These methods are called so that an Operation | |
| 169 | instance can initialize and destroy any resources. They may be called more than once before the instance is | ||||
| 91d30d59 » | cwensel | 2008-10-14 | 170 | garbage collected. | |
| d362396b » | cwensel | 2008-10-14 | 171 | ||
| 601cf94e » | cwensel | 2008-10-14 | 172 | Added a new operation called c.o.Buffer. Buffers are similar to Reduce in MapReduce. They are given an Iterator | |
| 173 | of input arguments and can emit any number of result c.t.Tuple instances. For many problems, this is more | ||||
| 174 | efficient than using an c.o.Aggregator operation. Only one c.p.Every pipe with a Buffer operation may | ||||
| 175 | follow a GroupBy or CoGroup. | ||||
| 176 | |||||
| faf370eb » | cwensel | 2008-10-14 | 177 | Fixed dot file writing so GraphViz can properly load. | |
| 178 | |||||
| 179 | Upgraded jgrapht library, requires JDK 1.6. | ||||
| 180 | |||||
| 1f162fe4 » | cwensel | 2008-10-14 | 181 | Fixed bug where selecting positions from a c.t.Fields.UNKNOWN declaration would return the first position, not | |
| 182 | the specified position. | ||||
| 183 | |||||
| 81198bc5 » | cwensel | 2008-10-14 | 184 | Renamed c.t.Fields.KEYS to c.t.Fields.GROUP to be consistent with the Cascading model. | |
| 185 | |||||
| 186 | Fixed bug where c.t.Tap may inappropriately delete a sink from a task. | ||||
| 187 | |||||
| 188 | Changed c.o.Aggregator to no longer use a Map for the context. Users can now specify custom types by returning | ||||
| 189 | either a new instance from start() or recycling an instance passed into start(). This change will break all existing | ||||
| 58db2906 » | cwensel | 2008-10-14 | 190 | implementations of Aggregator. Note, simply setting a new Map<Object,Object> on the call instance in start() | |
| 191 | should be sufficient. | ||||
| 81198bc5 » | cwensel | 2008-10-14 | 192 | ||
| 193 | Changed all c.o.Function, c.o.Filter, c.o.Aggregator, c.o.ValueAssertion, and c.o.GroupAssertions to accept | ||||
| 58db2906 » | cwensel | 2008-10-14 | 194 | a c.f.FlowProcess object on all relevant methods. FlowProcess provides call-backs into the underlying system | |
| 81198bc5 » | cwensel | 2008-10-14 | 195 | to get configuration properties, fire a "keep alive" ping, or increment a custom counter. This change will | |
| 196 | break all existing implementations of the above interfaces. | ||||
| 197 | |||||
| 198 | Added ability to set serialization tokens via the cascading.serialization.tokens property. This complements the | ||||
| 199 | c.t.h.SerializationToken annotation. | ||||
| 200 | |||||
| 201 | Optimized co-grouping operation by using c.t.IndexTuple instead of a nested c.t.Tuple. | ||||
| 202 | |||||
| 203 | Changed c.t.Tap and c.s.Scheme sink methods to take a c.t.TupleEntry, instead of c.t.Fields and c.t.Tuple | ||||
| 204 | individually. | ||||
| 205 | |||||
| 206 | Added the c.t.h.SerializationToken Java Annotation. This allows for an int value to be written during serialization | ||||
| 207 | instead of a Class name for custom objects nested in c.t.Tuple instances. This feature should dramatically reduce | ||||
| 208 | the size of Tuples saved in SequenceFiles, and improve the general performance during 'shuffling' between Map and | ||||
| 209 | Reduce stages. | ||||
| 210 | |||||
| 211 | Added c.t.h.TupleSerialization, a Hadoop Serialization implementation. Tuple is no longer Hadoop Writable | ||||
| 212 | and now relies on TupleSerialization for serialization support. Subsequently nested objects in c.t.Tuple | ||||
| 213 | only need to be c.l.Comparable. So that they can be serialized properly, a Serialization implementation must be | ||||
| 214 | registered with Hadoop. Note all primitive types are handled directly by Tuple, but custom types must | ||||
| 215 | have a Serialization implementation, or must be Hadoop WritableComparable so that the default WritableSerialization | ||||
| 216 | implementation will write them out. | ||||
| 217 | |||||
| 2a5bca9f » | cwensel | 2008-10-31 | 218 | 0.8.3 | |
| 5249bc9e » | cwensel | 2008-11-07 | 219 | ||
| 2a5bca9f » | cwensel | 2008-10-31 | 220 | Fix for c.p.CoGroup declared fields being generated out of order. | |
| 221 | |||||
| 81198bc5 » | cwensel | 2008-10-14 | 222 | 0.8.2 | |
| 6283af54 » | cwensel | 2008-09-16 | 223 | ||
| e4e49bd0 » | cwensel | 2008-09-20 | 224 | Added new properties via c.f.FlowConnector.setJarClass and c.f.FlowConnector.setJarPath for | |
| 225 | setting the application jar file. | ||||
| 226 | |||||
| 227 | Fixed bug where job jar was not being inherited by subsequent MapReduce jobs when the first job was executed | ||||
| 228 | in local mode. | ||||
| 229 | |||||
| 6283af54 » | cwensel | 2008-09-16 | 230 | Fixed bug where unserializable Operations were being squashed internally. c.f.Flow instances will now | |
| 231 | fail immediately and be marked as 'failed'. | ||||
| 232 | |||||
| 705ebd27 » | cwensel | 2008-09-13 | 233 | 0.8.1 | |
| 26b8ae65 » | cwensel | 2008-09-10 | 234 | ||
| 3e534c7a » | cwensel | 2008-09-13 | 235 | Fixed bug where c.t.Lfs did not force local mode for current MapReduce step. | |
| e077914d » | cwensel | 2008-09-13 | 236 | ||
| a35fc62c » | cwensel | 2008-09-11 | 237 | Fixed bug where writing to a c.t.TupleCollector would fail if using a c.s.SequenceFile in some cases. | |
| 238 | |||||
| 705ebd27 » | cwensel | 2008-09-13 | 239 | Added a few minor improvements to reduce stray object creations, and speedup c.t.Tuple serialization. | |
| 240 | |||||
| 305472ec » | cwensel | 2008-09-08 | 241 | 0.8.0 | |
| 7032e8ab » | cwensel | 2008-08-04 | 242 | ||
| d848a01a » | cwensel | 2008-09-08 | 243 | Updated c.o.x.TagSoupParser to accept 'features', use these features to recover past behaviors. | |
| 244 | |||||
| 625b7a75 » | cwensel | 2008-09-08 | 245 | Updated janino and tagsoup libraries to 2.5.15 and 1.2, respectively. Note that tagsoup, in theory, is not | |
| 246 | backwards compatible by default. See their release notes: http://home.ccil.org/~cowan/XML/tagsoup/#1.2 | ||||
| 247 | |||||
| 4f5f0412 » | cwensel | 2008-09-02 | 248 | Added some forward compatible changes for supporting Hadoop 0.18 at the API level. Currently there are other | |
| 249 | issues preventing some tests from passing on Hadoop 0.18. | ||||
| 250 | |||||
| ff22efa2 » | cwensel | 2008-09-02 | 251 | Changed c.f.FlowException to return the parent c.f.Flow name. | |
| 252 | |||||
| 208cf895 » | cwensel | 2008-08-24 | 253 | Changed behavior of c.f.MultiMapReducePlanner to use c.t.h.MultiInputFormat to allow single Mappers | |
| 254 | to support many different Hadoop InputFormat types simultaneously. This deprecates the need to normalize | ||||
| 255 | sources to a map and reduces the number of jobs in a c.f.Flow in some cases. | ||||
| 256 | |||||
| ad1e145b » | cwensel | 2008-08-22 | 257 | Changed behavior of Cascading to allow for multiple paths from the same c.t.Tap source to be co-grouped on | |
| 258 | via c.p.CoGroup. This allows for a kind of self-join where each stream is processed by a different operation | ||||
| 259 | path within the Mapper. | ||||
| 260 | |||||
| 3c786d0f » | cwensel | 2008-08-19 | 261 | Added c.o.f.And, c.o.f.Or, c.o.f.Xor, and c.o.f.Not logic operator c.o.Filter implementations. They should be used | |
| 262 | to compose more complex filters from existing implementations. | ||||
| 263 | |||||
| 85825d78 » | cwensel | 2008-08-18 | 264 | Changed the behavior of c.o.BaseOperation to properly initialize itself if it is a c.o.Filter instance. This | |
| 265 | removes the requirement that Filter implementations must set declaredFields to Fields.ALL, as it makes no | ||||
| 266 | sense for a Filter to declare fields. | ||||
| 267 | |||||
| 916c105f » | cwensel | 2008-08-17 | 268 | Added c.f.PlannerException, a subclass of c.f.FlowException, and updated c.f.MultiMapReducePlanner to throw | |
| 269 | it on failures. Functionality of writing DOT files has been moved from FlowException to PlannerException. | ||||
| 270 | |||||
| aa183fcb » | cwensel | 2008-08-17 | 271 | Added c.o.f.FilterNotNull and c.o.f.FilterNull filter classes. | |
| 272 | |||||
| e1c429e1 » | cwensel | 2008-08-17 | 273 | Changed c.f.MultiMapReducePlanner to fail if it encounters an c.p.Each to c.p.Every chain. In these cases, a | |
| 274 | c.p.Group type must be between them. | ||||
| 275 | |||||
| 4b2562f2 » | cwensel | 2008-08-16 | 276 | Deleted c.o.Cut class as it was effectively a duplicate of c.o.Identity. | |
| 277 | |||||
| e32ecd00 » | cwensel | 2008-08-16 | 278 | Changed c.f.MultiMapReducePlanner to fail if a c.p.GroupAssertion is not accompanied by another c.o.Aggregator | |
| 279 | operation. This is required so that the GroupAssertion does not change the passing tuple stream if it is planned out. | ||||
| 280 | |||||
| ff4ec9c5 » | cwensel | 2008-08-16 | 281 | Changed c.f.MultiMapReducePlanner to no longer insert new c.p.Each( ..., new Identity(), ... ) as a place holder. | |
| 282 | |||||
| d09d20d8 » | cwensel | 2008-08-16 | 283 | Renamed c.p.PipeAssembly to c.p.SubAssembly to better reflect its purpose, which is to encapsulate reusable | |
| 77d9f1c5 » | cwensel | 2008-08-16 | 284 | pipe assemblies in the same manner as a sub-process or sub-routine. A temporary c.p.PipeAssembly class has been | |
| 285 | provided for backwards compatibility. | ||||
| d09d20d8 » | cwensel | 2008-08-16 | 286 | ||
| 20c84f5e » | cwensel | 2008-08-16 | 287 | Fixed bug where c.t.TapCollector would throw an NPE if a custom Tap was not using paths. | |
| 288 | |||||
| 233b36ab » | cwensel | 2008-08-16 | 289 | Changed behavior of c.f.Flow where if a c.f.FlowListener throws an exception, the Flow instance receiving the | |
| 290 | exception will stop (by calling Flow.stop()). Listeners will continue to fire as expected and Flow.complete() | ||||
| 291 | will re-throw the thrown exception (as was the original behavior). | ||||
| 292 | |||||
| ea17be01 » | cwensel | 2008-08-16 | 293 | Added ability to set a Cascading specific temporary directory path for use by intermediate taps created | |
| 294 | within c.f.Flow instances. Use c.t.Hfs.setTemporaryDirectory() to configure. | ||||
| 295 | |||||
| 690be501 » | cwensel | 2008-08-07 | 296 | Fixed bug where the 'mapred.jar' property was being stepped on if previously set by the calling application. | |
| 297 | |||||
| 0e88bbe7 » | cwensel | 2008-08-06 | 298 | Changed c.t.Tap and c.f.Flow to return c.t.TupleIterator and c.t.TupleCollector instead of c.t.TapIterator and | |
| 690be501 » | cwensel | 2008-08-07 | 299 | c.t.TapCollector, respectively. | |
| 0e88bbe7 » | cwensel | 2008-08-06 | 300 | ||
| de3547a9 » | cwensel | 2008-08-06 | 301 | Added c.t.Tap.flowInit( c.f.Flow flow ) to allow a given tap to know what flows it is participating in. It is called | |
| 302 | immediately after the Flow instance is initialized. | ||||
| 303 | |||||
| 85f5d1e1 » | cwensel | 2008-08-06 | 304 | Fixed bug with nested c.p.PipeAssembly instances where some nested assemblies threw an internal error from | |
| 305 | the planner. | ||||
| 306 | |||||
| 39a569f2 » | cwensel | 2008-08-04 | 307 | Changed c.o.Debug to accept a prefix text string that will be prefixed to every message. | |
| 308 | |||||
| 647da594 » | cwensel | 2008-08-04 | 309 | Fixed bug where c.f.MultiMapReducePlanner would fail when normalizing inputs to a group where the inputs | |
| 310 | passed through one or more splits. | ||||
| 311 | |||||
| 7032e8ab » | cwensel | 2008-08-04 | 312 | Fixed bug where c.p.CoGroup silently stepped on input pipes with the same input name. | |
| 313 | |||||
| 1e4579e2 » | cwensel | 2008-07-21 | 314 | 0.7.1 | |
| 88d0792f » | cwensel | 2008-07-17 | 315 | ||
| 44faf6ce » | cwensel | 2008-07-21 | 316 | Fixed bug in c.f.MultiMapReducePlanner where a source used on more than one c.p.Group would cause an internal | |
| 317 | error during planning. | ||||
| 318 | |||||
| 319 | Changed c.f.MultiMapReducePlanner to normalize heterogeneous sinks. | ||||
| 94b252e2 » | cwensel | 2008-07-21 | 320 | ||
| 9a1460fd » | cwensel | 2008-07-19 | 321 | Changed c.f.MultiMapReducePlanner to keep a splitting c.p.Each on the previous step, instead of being duplicated | |
| 322 | on each branch. If the Each is preceded by a source c.t.Tap, it will be duplicated across branches to reduce | ||||
| 323 | the number of steps in the Flow. | ||||
| 324 | |||||
| c0e8cf9c » | cwensel | 2008-07-19 | 325 | Fixed bug in c.f.MultiMapReducePlanner where too many temp tap instances were being inserted while normalizing | |
| 326 | the flow sources. | ||||
| 327 | |||||
| 9133dea0 » | cwensel | 2008-07-17 | 328 | Changed c.t.Fields to fail if given duplicate field names. | |
| 329 | |||||
| 13a881a7 » | cwensel | 2008-07-17 | 330 | Changed behavior if Hadoop FileInputSplit is not used and property "map.input.file" is not set. If there is one | |
| 331 | source, it will be returned as the source for the mapper stack, otherwise an exception is thrown. Subsequently joins | ||||
| 332 | and merges of non-file sources are not supported until a discriminator can be passed to the mapper. | ||||
| 333 | |||||
| 21fec88f » | cwensel | 2008-07-17 | 334 | Fixed bug in c.t.Tuple where NPE was thrown under certain compareTo operations. | |
| 335 | |||||
| 336 | Fixed bug that prevented CoGrouping or Merging on the same source even though it was one or more Groupings away. | ||||
| 88d0792f » | cwensel | 2008-07-17 | 337 | ||
| 58823a99 » | cwensel | 2008-07-15 | 338 | 0.7.0 | |
| 483d1c22 » | cwensel | 2008-06-20 | 339 | ||
| c99b7d41 » | cwensel | 2008-07-15 | 340 | Changed project structure, removed 'examples' sub-project. | |
| 341 | |||||
| 342 | Updated to support Hadoop 0.17.x. This version is not API compatible with any Hadoop version less than 0.17.0. | ||||
| 3c446858 » | cwensel | 2008-07-14 | 343 | ||
| 1f5628f7 » | cwensel | 2008-07-10 | 344 | Added ability to stop all c.f.Flows executing within a c.c.Cascade instance via the stop() method. | |
| 345 | |||||
| 19384a6a » | cwensel | 2008-07-10 | 346 | Changed c.f.FlowConnector to only take a Map of properties. These properties are passed downstream to various | |
| 347 | subsystems. This removes the Hadoop JobConf constructor, but it still can be passed as a property value. Also | ||||
| 348 | properties will be pushed into a default JobConf, bypassing any direct JobConf coupling in applications. | ||||
| 349 | |||||
| b2b658db » | cwensel | 2008-07-09 | 350 | Changed c.f.Flow to automatically register a shutdown hook killing remote jobs on vm exit. | |
| 351 | |||||
| 352 | Changed c.f.Flow.stop() to immediately stop all running jobs. | ||||
| 353 | |||||
| 13f08268 » | cwensel | 2008-07-09 | 354 | Changed c.o.Operation to an interface and introduced c.o.BaseOperation. This makes creating custom Operation types | |
| 355 | more flexible and intuitive. c.o.Filter, c.o.Function, c.o.Aggregator, and c.o.Assertion now extend c.o.Operation. | ||||
| 356 | |||||
| 9da69143 » | cwensel | 2008-07-09 | 357 | Added c.p.c.OuterJoin, c.p.c.MixedJoin, c.p.c.LeftJoin, and c.p.c.RightJoin c.p.c.CoGrouper classes. They | |
| 358 | complement the default c.p.c.InnerJoin CoGrouper class. | ||||
| 45aa7b61 » | cwensel | 2008-07-09 | 359 | ||
| 0c2afeac » | cwensel | 2008-07-04 | 360 | Added support for passing an intermediateSchemeClass to the underlying planner to be used as the default c.s.Scheme | |
| 361 | for intermediate c.t.Tap instances internal to a given c.f.Flow. | ||||
| 362 | |||||
| 58eaa495 » | cwensel | 2008-07-03 | 363 | Fixed bug where c.p.Group is immediately followed by another c.p.Group (or their sub-classes) and fields could not | |
| 364 | be resolved between them. | ||||
| 365 | |||||
| 8c0ac8b9 » | cwensel | 2008-07-03 | 366 | Added support for c.t.Tap instances implementing c.f.FlowListener. If implemented, they will automatically be | |
| 06e52306 » | cwensel | 2008-07-15 | 367 | added to the Flow event listeners collection and will receive Flow events. | |
| 8c0ac8b9 » | cwensel | 2008-07-03 | 368 | ||
| dcff82c8 » | cwensel | 2008-07-03 | 369 | Fixed case where multiple source c.t.Tap instances return true for the containsFile method. Now verifies only one | |
| 370 | Tap contains the file, and fails otherwise. | ||||
| 371 | |||||
| 372 | Changed c.s.TextLine to not set numSinkParts to 1 by default. Now uses the natural number of parts. | ||||
| 373 | |||||
| e9106263 » | cwensel | 2008-06-30 | 374 | Changed MapReduce planner to force an intermediate file between branches with Hadoop incompatible source Taps | |
| 375 | on joins/merges. If the taps are compatible (have same Scheme), all branches will be processed in same Mapper | ||||
| dcff82c8 » | cwensel | 2008-07-03 | 376 | before the c.p.Group. | |
| e9106263 » | cwensel | 2008-06-30 | 377 | ||
| 3188b055 » | cwensel | 2008-06-30 | 378 | Added merge capabilities in c.p.GroupBy. This allows multiple input branches to be grouped as if a single stream. | |
| 379 | |||||
| b6e16c44 » | cwensel | 2008-06-24 | 380 | Fixed bug in c.t.TapCollector where writing to a Sequence file threw an NPE. | |
| 381 | |||||
| 483d1c22 » | cwensel | 2008-06-20 | 382 | Added c.f.MapReduceFlow to support custom MapReduce jobs, allowing them to participate in a Cascade job. | |
| 383 | |||||
| 5425a34b » | cwensel | 2008-06-17 | 384 | 0.6.1 | |
| 64465379 » | cwensel | 2008-06-13 | 385 | ||
| 386 | Changed thrown c.f.FlowException instances to include cause message. | ||||
| 387 | |||||
| 388 | Fixed bug where empty sink or source map was not detected. | ||||
| 389 | |||||
| 1ef85e80 » | cwensel | 2008-06-12 | 390 | 0.6.0 | |
| ff7d19bb » | cwensel | 2008-05-07 | 391 | ||
| 68417901 » | cwensel | 2008-06-12 | 392 | Changed default argument selector for c.p.Every to be Fields.ALL, to be consistent with the default value of c.p.Each. | |
| 393 | |||||
| d3845abe » | cwensel | 2008-06-10 | 394 | Added support for assembly traps. If an exception is thrown from inside an c.o.Operation, the offending Tuple | |
| 395 | can be saved to a file for later processing, allowing the job to complete. | ||||
| 396 | |||||
| 397 | Added support for stream assertions. STRICT and VALID assertions can be built into a pipe assembly, and optionally | ||||
| 398 | planned out during runtime. Assertions will throw exceptions if they fail. | ||||
| 399 | |||||
| b006cafa » | cwensel | 2008-05-09 | 400 | Changed c.o.a.First, Last, Min, and Max to optionally ignore specified values. Useful if you do not wish | |
| 401 | for a 'default' value to be considered first, or last in a set. | ||||
| 402 | |||||
| 403 | Changed c.o.a.Sum to take a Class for coercion of the result value. | ||||
| af68539d » | cwensel | 2008-05-08 | 404 | ||
| 405 | Changed c.o.Max and Min to use infinity as initial values so zero is bigger than a really small number | ||||
| 406 | for Max, and zero is smaller than a really big number for Min. | ||||
| 407 | |||||
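A small sketch of why infinite initial values matter for a running maximum or minimum, as the Max and Min entry above describes. It is not the Cascading aggregator code; the class and method names are hypothetical, and the initial value is passed as a parameter purely to compare the choices side by side.

```java
// Hypothetical demo class; not part of Cascading.
final class MaxInitDemo {

    // Folds the values into a running maximum, starting from the given initial value.
    static double max(double initial, double[] values) {
        double result = initial;
        for (double value : values) {
            result = Math.max(result, value);
        }
        return result;
    }

    // Folds the values into a running minimum, starting from the given initial value.
    static double min(double initial, double[] values) {
        double result = initial;
        for (double value : values) {
            result = Math.min(result, value);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] negatives = {-5.0, -2.0};

        // A finite seed can win over every input and leak into the result.
        System.out.println(max(0.0, negatives));                      // 0.0, which is not an input
        System.out.println(max(Double.NEGATIVE_INFINITY, negatives)); // -2.0

        // Double.MIN_VALUE is the smallest positive double, so it also beats zero.
        System.out.println(max(Double.MIN_VALUE, new double[] {0.0}) > 0.0); // true

        System.out.println(min(Double.POSITIVE_INFINITY, new double[] {7.0, 3.0})); // 3.0
    }
}
```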
| ff7d19bb » | cwensel | 2008-05-07 | 408 | Changed order of JobConf initialization. c.f.FlowStep now is added to the JobConf last in order to catch | |
| af68539d » | cwensel | 2008-05-08 | 409 | all lazily configured values. | |
| ff7d19bb » | cwensel | 2008-05-07 | 410 | ||
| 411 | Changed compile to include debug info by default. | ||||
| 412 | |||||
| 413 | Fixed bug in c.t.MultiTap where super scheme was not returned if available. | ||||
| 414 | |||||
| fc9c649b » | cwensel | 2008-05-05 | 415 | 0.5.0 | |
| 4477dbf8 » | cwensel | 2008-05-01 | 416 | ||
| ac1016fe » | cwensel | 2008-05-02 | 417 | Added skipIfSinkExists property to c.f.Flow. Set to true if the c.c.Cascade should skip the Flow instance even | |
| 418 | if the sink is stale and not set to be deleted on initialization. | ||||
| 419 | |||||
| fc9c649b » | cwensel | 2008-05-05 | 420 | Fixed bug in c.t.h.HttpFileSystem that URL escaped the ? prefixing the query string. | |
| 4477dbf8 » | cwensel | 2008-05-01 | 421 | ||
| 422 | Fixed bug where a join with duplicate taps was not recognized during job planning. Now an appropriate error | ||||
| 423 | message is displayed, instead of jobs completing with only one instance of the resource stream. | ||||
| 424 | |||||
| 425 | Fixed c.t.h.HttpFileSystem to remember authority information in the url and prefix it when missing. | ||||
| 426 | |||||
427 | Changed c.s.TextLine to accept either one or two source fields. If one, only the 'line' value | ||||
| 428 | is sourced from the value, discarding the 'offset' value. | ||||
| 429 | |||||
| 430 | Added c.o.r.RegexSplitGenerator to support splitting single tuple values into multiple tuples based on a regex | ||||
| 431 | delimiter. Includes new tests. | ||||
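
The general idea of splitting one value into several output values on a regex delimiter can be sketched with java.util.regex alone. This is not the c.o.r.RegexSplitGenerator source; the class and method names below are hypothetical, and each list element stands in for one emitted tuple.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; each returned element stands in for one output tuple.
class SplitGeneratorSketch {
  static List<String> splitToTuples(String value, String delimiterRegex) {
    List<String> tuples = new ArrayList<>();
    // Pattern.split breaks the input around every match of the delimiter.
    for (String part : Pattern.compile(delimiterRegex).split(value)) {
      tuples.add(part);
    }
    return tuples;
  }

  public static void main(String[] args) {
    // One comma-delimited value becomes three separate values.
    System.out.println(splitToTuples("a, b,c", ",\\s*"));
  }
}
```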
| 432 | |||||
| 433 | Added c.s.CascadeStats and c.s.FlowStats to provide access to current state and statistics of particular | ||||
| 434 | Cascade, Flow, or the child Flows of a Cascade. | ||||
| 435 | |||||
| 436 | Added ability to sort grouping values with sort argument on c.p.GroupBy. Sorts can be reversed. | ||||
| 437 | |||||
| 438 | Added c.o.e.ExpressionFilter, the c.o.Filter analog to c.o.e.ExpressionFunction. | ||||
| 439 | |||||
| 440 | 0.4.1 | ||||
| 441 | |||||
| 442 | Fixed path normalization regex in c.u.Util where it munged any path starting with file:///. | ||||
| 443 | |||||
| 444 | 0.4.0 | ||||
| 445 | |||||
| 446 | Changed c.p.GroupBy default grouping fields to c.t.Fields.ALL from Fields.FIRST. This change provides a simple | ||||
| 447 | way to sort a tuple stream based on the order of the tuple fields. | ||||
| 448 | |||||
| 449 | Changed c.f.FlowConnector to create c.f.Flow instances that will bypass the reducer if no c.p.Group is participating | ||||
450 | in the assembly. Previously Group instances were inserted if missing. This allows a chain of c.p.Every instances | ||||
451 | to be used to process/filter a tuple stream without invoking the reducer needlessly (if a sort isn't required). | ||||
| 452 | This change also supports bypassing the default Hadoop OutputCollector in the mapper via the sink c.t.Tap instance. | ||||
| 453 | |||||
| 454 | Changed c.f.FlowStep behavior to run in 'local' mode if either the sink or source tap is a c.t.Lfs instance. This | ||||
| 455 | allows for c.f.Flow instances to run mixed if configured to execute on a particular cluster by default. This behavior | ||||
| 456 | supports complex import/export processes against the HDFS or other supported remote filesystem. | ||||
| 457 | |||||
| 458 | Changed behavior of c.t.Dfs to force use of HDFS. Previously Dfs would default to the local FileSystem | ||||
459 | if the job was run in 'local' mode. Now a Dfs instance will cause failures if it cannot connect to an HDFS cluster. | ||||
| 460 | Using c.t.Hfs will provide previous Dfs behavior. Hfs will use the 'default' filesystem if a scheme is not present | ||||
| 461 | in the 'stringPath' (i.e. hdfs://host:port/some/path). | ||||
| 462 | |||||
463 | Added c.stats package to allow for collecting statistics of Cascades, Flows, and FlowSteps. | ||||
| 464 | |||||
| 465 | Updated c.f.Flow and c.c.Cascade log messages to be easier to follow when executing many flow instances | ||||
| 466 | simultaneously. | ||||
| 467 | |||||
| 468 | Added compression flag to c.s.TextLine. Can now toggle compression (Hadoop style compression) per Tap instance. | ||||
469 | This prevents clusters with compression enabled by default from exporting text files with a .deflate extension. | ||||
| 470 | |||||
| 471 | Added support for bypassing Hadoop OutputCollector via Tap.setUseTapCollector() method. Setting to true will force | ||||
| 472 | Cascading to use the c.t.TapCollector instead. This bypasses bugs in Hadoop with custom FileSystem types. This will | ||||
473 | always be true for http(s) and s3tp filesystems when using a c.t.Hfs Tap type (at least until HADOOP-3021 is resolved). | ||||
| 474 | |||||
| 475 | Added c.t.TupleCollector, complementing c.t.TupleIterator, for directly writing Tuple instances out via a c.t.Tap | ||||
| 476 | instance. | ||||
| 477 | |||||
| 478 | Added c.f.FlowListener so that c.f.Flow instances can fire events on starting, completed, and throwable. | ||||
| 479 | |||||
| 480 | Changed c.t.h.S3HttpFileSystem so it can now create files remotely. | ||||
| 481 | |||||
482 | Renamed cascading.spill.threshold to cascading.cogroup.spill.threshold, so there is less chance of collision. | ||||
| 483 | |||||
| 484 | Made numerous optimizations to improve overall performance. Namely split and merge of key/value tuples to remove | ||||
| 485 | redundancy in the stream between the mapper and reducer. | ||||
| 486 | |||||
| 487 | Changed c.p.Operators to push c.o.Operation results directly through to next operation without intermediate | ||||
| 488 | collection. This should improve pipelining of large result streams and lower runtime memory footprint. | ||||
| 489 | |||||
| 490 | Changed c.c.Cascade so it now runs Flows in parallel if Hadoop is clustered, and there are no dependencies between the | ||||
| 491 | Flows. | ||||
| 492 | |||||
493 | Moved c.Cascade and related classes to c.cascade package. Wanted to preempt any future ugliness. | ||||
| 494 | |||||
| 495 | Added support in c.t.h.S3HttpFileSystem for these properties: fs.s3tp.awsAccessKeyId and fs.s3tp.awsSecretAccessKey | ||||
| 496 | |||||
| 497 | 0.3.0 | ||||
| 498 | |||||
| 499 | Added ability to push Log4j logger properties to mapper/reducer via JobConf. | ||||
| 500 | Use jobConf.set("log4j.logger","logger1=LEVEL,logger2=LEVEL") | ||||
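
The value format shown above is a comma-separated list of logger=LEVEL pairs. How Cascading reads it on the mapper/reducer side is not described here; the following is only a plain-Java sketch of parsing such a string, with a hypothetical class name and example logger names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for a "logger1=LEVEL,logger2=LEVEL" style value.
class LoggerPropertySketch {
  static Map<String, String> parse(String value) {
    Map<String, String> levels = new LinkedHashMap<>();
    for (String pair : value.split(",")) {
      // Split on the first '=' only; entries without '=' are skipped.
      String[] parts = pair.trim().split("=", 2);
      if (parts.length == 2) {
        levels.put(parts[0].trim(), parts[1].trim());
      }
    }
    return levels;
  }

  public static void main(String[] args) {
    // Example logger names are illustrative, not taken from the change log.
    System.out.println(parse("cascading=DEBUG,org.apache.hadoop=WARN"));
  }
}
```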
| 501 | |||||
| 502 | Added missing equals() and hashCode() in c.t.MultiTap. | ||||
| 503 | |||||
| 504 | Added c.t.h.ZipInputFormat (and ZipSplit) to support zip files. c.s.TextLine supports transparent | ||||
| 505 | reading of zip files if the filename ends with .zip, but cannot write to them. This code is | ||||
| 506 | loosely based on HADOOP-1824. If the underlying filesystem is hdfs or file, splits will be created | ||||
| 507 | for each ZipEntry. Otherwise ZipEntries are iterated over to be more stream friendly. Progress status is | ||||
| 508 | supported. | ||||
| 509 | |||||
| 510 | Added http, https, and s3tp read-only file systems to Hadoop. Use these URLs, respectively: | ||||
| 511 | http://, https://, and s3tp://AWS_ACCESS_KEY_ID:AWS_SECRET_ACCESS_KEY@bucket-name/key | ||||
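
The s3tp URL layout above places the credentials in the user-info part of the URL. As a rough sketch of how those pieces separate (using java.net.URI only, not the Cascading file system code; the class name and sample values are hypothetical):

```java
import java.net.URI;

// Hypothetical sketch; splits an s3tp style URL into its four parts.
class S3tpUrlSketch {
  // Returns {accessKeyId, secretAccessKey, bucket, key path}.
  static String[] parse(String url) {
    URI uri = URI.create(url);
    // User info has the form "id:secret"; split on the first ':' only.
    String[] credentials = uri.getUserInfo().split(":", 2);
    return new String[] {credentials[0], credentials[1], uri.getHost(), uri.getPath()};
  }

  public static void main(String[] args) {
    // Placeholder credentials, not real keys.
    for (String part : parse("s3tp://MYKEYID:MYSECRET@bucket-name/key")) {
      System.out.println(part);
    }
  }
}
```

Note that a secret key containing a '/' would need to be percent-encoded to fit in the user-info part of a URL, since an unescaped '/' ends the authority section.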
| 512 | |||||
| 513 | Added c.o.t.DateFormatter supporting text formatting of time stamps created by c.o.t.DateParser. | ||||
| 514 | |||||
| 515 | Fixed bug where in complex assemblies, some Scopes were not resolved. | ||||
| 516 | |||||
| 517 | Fixed bug where tap instances were not being inserted before some CoGroup joins if there was a previous Group in the | ||||
| 518 | assembly. | ||||
| 519 | |||||
| 520 | Upgraded JGraphT to 0.7.3 | ||||
| 521 | |||||
522 | Changed c.t.SpillableTupleList to allow for iteration across entries. | ||||
| 523 | |||||
| 524 | Changed c.f.FlowException to optionally allow for printing of underlying pipe graph for debugging. | ||||
| 525 | |||||
| 526 | Added c.o.t.FieldFormatter function to format Tuples into complex strings using j.u.Formatter formatting. | ||||
| 527 | |||||
| 528 | Added c.o.a.Last aggregator to find the last value encountered in a group. | ||||
| 529 | |||||
| 530 | Changed c.o.a.Max and c.o.a.Min to maintain original value type. Will return null if no values are encountered. | ||||
| 531 | |||||
| 532 | Changed c.o.a.First to use Fields.ARG by default. Removed Fields constructor. | ||||
| 533 | |||||
| 534 | Added c.t.Fields.join(Fields...) method to allow for joining multiple Fields instances into a new instance. | ||||
| 535 | |||||
| 536 | Can retrieve Tuple values by field name through the TupleEntry class via the get(String) method. | ||||
| 537 | |||||
| 538 | Added c.t.TupleCollector interface to simplify the operation interfaces. | ||||
| 539 | |||||
| 540 | Added a Debug filter that will print to either stderr or stdout. Useful for debugging stream transformations. | ||||
| 541 | |||||
| 542 | Added CascadingTestCase base test class | ||||
| 543 | |||||
| 544 | Added Insert Function that allows for literal values to be inserted into the Tuple stream. | ||||
| 545 | |||||
| 546 | 0.2.0 | ||||
| 547 | |||||
| 548 | CoGroup will now spill to disk on extremely large co-groupings. Configurable via "cascading.spill.threshold". | ||||
| 549 | Defaults to 10k elements. | ||||
| 550 | |||||
551 | java.util.Properties instances can be used to set defaults for FlowConnectors. | ||||
| 552 | |||||
| 553 | Fix for InnerJoin, the default join for CoGroup. | ||||
| 554 | |||||
| 555 | Introduced MultiTap to support concatenation of files into a pipe assembly. | ||||
| 556 | |||||
| 557 | RegexParser now fails on a failed match. Prevents it being used or behaving as a filter. | ||||
| 558 | |||||
559 | Fixed bug with PipeAssembly instances not properly being assimilated into the pipeGraph. | ||||
| 560 | |||||
| 561 | Fixed assertion error thrown by JGraphT. | ||||
| 562 | |||||
| 563 | Renamed Tap method deleteOnInit to deleteOnSinkInit. | ||||
| 564 | |||||
| 565 | |||||
| 566 | 0.1.0 | ||||
| 567 | |||||
| 9bdcead7 » | cwensel | 2008-12-12 | 568 | First release. | |
