Hadoop Change Log
Release 0.16.4 - 2008-05-05
BUG FIXES
HADOOP-3138. DFS mkdirs() should not throw an exception if the directory
already exists. (rangadi via mukund)
HADOOP-3294. Fix distcp to check the destination length and retry the copy
if it doesn't match the src length. (Tsz Wo (Nicholas), SZE via mukund)
HADOOP-3304. [HOD] Fixes the way the logcondense.py utility searches for log
files that need to be deleted. (yhemanth via mukund)
HADOOP-3186. Fix incorrect permission checking for mv and renameTo
in HDFS. (Tsz Wo (Nicholas), SZE via mukund)
Release 0.16.3 - 2008-04-16
BUG FIXES
HADOOP-3010. Fix ConcurrentModificationException in ipc.Server.Responder.
(rangadi)
HADOOP-3154. Catch all Throwables from the SpillThread in MapTask, rather
than IOExceptions only. (ddas via cdouglas)
HADOOP-3159. Avoid file system cache being overwritten whenever
configuration is modified. (Tsz Wo (Nicholas), SZE via hairong)
HADOOP-3139. Remove the consistency check for the FileSystem cache in
closeAll() that causes spurious warnings and a deadlock.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3195. Fix TestFileSystem to be deterministic.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-3069. Primary name-node should not truncate image when transferring
it from the secondary. (shv)
HADOOP-3182. Change permissions of the job-submission directory to 777
from 733 to ensure sharing of HOD clusters works correctly. (Tsz Wo
(Nicholas), Sze and Amareshwari Sri Ramadasu via acmurthy)
Release 0.16.2 - 2008-04-02
BUG FIXES
HADOOP-3011. Prohibit distcp from overwriting directories on the
destination filesystem with files. (cdouglas)
HADOOP-3033. The BlockReceiver thread in the datanode writes data to
the block file, changes file position (if needed) and flushes all by
itself. The PacketResponder thread does not flush block file. (dhruba)
HADOOP-2978. Fixes the JobHistory log format for counters.
(Runping Qi via ddas)
HADOOP-2985. Fixes LocalJobRunner to tolerate null job output path.
Also makes _temporary a constant in MRConstants.java.
(Amareshwari Sriramadasu via ddas)
HADOOP-3003. FileSystem cache key is updated after a
FileSystem object is created. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-3042. Updates the Javadoc in JobConf.getOutputPath to reflect
the actual temporary path. (Amareshwari Sriramadasu via ddas)
HADOOP-3007. Tolerate mirror failures while DataNode is replicating
blocks as it used to before. (rangadi)
HADOOP-2944. Fixes a "Run on Hadoop" wizard NPE when creating a
Location from the wizard. (taton)
HADOOP-3049. Fixes a problem in MultiThreadedMapRunner to do with
catching RuntimeExceptions. (Alejandro Abdelnur via ddas)
HADOOP-3039. Fixes a problem to do with exceptions in tasks not
killing jobs. (Amareshwari Sriramadasu via ddas)
HADOOP-3027. Fixes a problem to do with adding a shutdown hook in
FileSystem. (Amareshwari Sriramadasu via ddas)
HADOOP-3056. Fix distcp when the target is an empty directory by
making sure the directory is created first. (cdouglas and acmurthy
via omalley)
HADOOP-3070. Protect the trash emptier thread from null pointer
exceptions. (Koji Noguchi via omalley)
HADOOP-3084. Fix HftpFileSystem to work for zero-length files.
(cdouglas)
HADOOP-3107. Fix NPE when fsck invokes getListings. (dhruba)
HADOOP-3105. Fix TestMultiThreadedMapRunner to use interfaces
available to 0.16. Clearly, this fix was not checked into trunk.
(Alejandro Abdelnur and Mukund Madhugiri via cdouglas)
HADOOP-3111. Remove HBase from Hadoop contrib.
HADOOP-3103. [HOD] Hadoop.tmp.dir should not be set to cluster
directory. (Vinod Kumar Vavilapalli via ddas)
HADOOP-3104. Limit MultithreadedMapRunner to have a fixed length queue
between the RecordReader and the map threads. (Alejandro Abdelnur via
omalley)
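The fixed-length queue in HADOOP-3104 is a bounded producer/consumer buffer between one reader and several map threads. A minimal Java sketch of that pattern using java.util.concurrent follows; it is not MultithreadedMapRunner's actual code, and the class name, the integer "records", and the end-of-input marker are hypothetical stand-ins.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedMapQueueSketch {
    // Hypothetical marker telling a worker that no more records will arrive.
    private static final int END_OF_INPUT = -1;

    // Feeds `records` integers through a queue of fixed `capacity` to `threads`
    // workers and returns how many records the workers processed.
    static int run(int records, int threads, int capacity) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        AtomicInteger mapped = new AtomicInteger();
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            workers.execute(() -> {
                try {
                    while (queue.take() != END_OF_INPUT) {
                        mapped.incrementAndGet(); // stand-in for calling map() on the record
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // Stand-in for the RecordReader: put() blocks whenever the queue is full,
        // so at most `capacity` records are buffered regardless of reader speed.
        for (int i = 0; i < records; i++) {
            queue.put(i);
        }
        for (int t = 0; t < threads; t++) {
            queue.put(END_OF_INPUT); // one marker per worker
        }
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
        return mapped.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(1000, 4, 10));
    }
}
```

Because put() blocks while the queue is full, a fast reader cannot run arbitrarily far ahead of slow map threads, which keeps memory use bounded.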
HADOOP-2833. Do not use "Dr. Who" as the default user in JobClient.
A valid user name is required. (Tsz Wo (Nicholas), SZE via rangadi)
HADOOP-3128. Throw RemoteException in setPermissions and setOwner of
DistributedFileSystem. (shv via nigel)
Release 0.16.1 - 2008-03-13
INCOMPATIBLE CHANGES
HADOOP-2861. Improve the user interface for the HOD commands.
Command line structure has changed. (Hemanth Yamijala via nigel)
HADOOP-2869. Deprecate SequenceFile.setCompressionType in favor of
SequenceFile.createWriter, SequenceFileOutputFormat.setCompressionType,
and JobConf.setMapOutputCompressionType. (Arun C Murthy via cdouglas)
Configuration changes to hadoop-default.xml:
deprecated io.seqfile.compression.type
IMPROVEMENTS
HADOOP-2371. User guide for file permissions in HDFS.
(Robert Chansler via rangadi)
HADOOP-2730. HOD documentation update.
(Vinod Kumar Vavilapalli via ddas)
HADOOP-2911. Make the information printed by the HOD allocate and
info commands less verbose and clearer. (Vinod Kumar via nigel)
HADOOP-3098. Allow more characters in user and group names while
using -chown and -chgrp commands. (rangadi)
BUG FIXES
HADOOP-2789. Race condition in IPC Server Responder that could close
connections early. (Raghu Angadi)
HADOOP-2785. minor. Fix a typo in Datanode block verification
(Raghu Angadi)
HADOOP-2788. minor. Fix help message for chgrp shell command (Raghu Angadi).
HADOOP-1188. fstime file is updated when a storage directory containing
namespace image becomes inaccessible. (shv)
HADOOP-2787. An application can set a configuration variable named
dfs.umask to set the umask that is used by DFS.
(Tsz Wo (Nicholas), SZE via dhruba)
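As a rough illustration of what a umask does, the conventional Unix rule clears every permission bit that is set in the mask. The sketch below assumes DFS applies dfs.umask the same way; it does not model how the dfs.umask value itself is written in the configuration, and the 0777/0666 starting modes are illustrative.

```java
public class UmaskSketch {
    // Clears every permission bit that is set in the umask (conventional Unix rule).
    static int applyUmask(int requested, int umask) {
        return requested & ~umask & 0777;
    }

    public static void main(String[] args) {
        int dirPerm = applyUmask(0777, 022);
        int filePerm = applyUmask(0666, 022);
        System.out.println(Integer.toOctalString(dirPerm));  // 755
        System.out.println(Integer.toOctalString(filePerm)); // 644
    }
}
```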
HADOOP-2780. The default socket buffer size for DataNodes is 128K.
(dhruba)
HADOOP-2716. Superuser privileges for the Balancer.
(Tsz Wo (Nicholas), SZE via shv)
HADOOP-2754. Filter out .crc files from local file system listing.
(Hairong Kuang via cdouglas)
HADOOP-2733. Fix compiler warnings in test code.
(Tsz Wo (Nicholas), SZE via cdouglas)
HADOOP-2725. Modify distcp to avoid leaving partially copied files at
the destination after encountering an error. (Tsz Wo (Nicholas), SZE
via cdouglas)
HADOOP-2391. Cleanup job output directory before declaring a job as
SUCCESSFUL. (Amareshwari Sri Ramadasu via ddas)
HADOOP-2808. Minor fix to FileUtil::copy to honor the overwrite
parameter. (cdouglas)
HADOOP-2683. Moving UGI out of the RPC Server.
(Tsz Wo (Nicholas), SZE via shv)
HADOOP-2814. Fix for NPE in datanode in unit test TestDataTransferProtocol.
(Raghu Angadi via dhruba)
HADOOP-2811. Dump of counters in job history does not add comma between
groups. (runping via omalley)
HADOOP-2735. Enables setting TMPDIR for tasks.
(Amareshwari Sri Ramadasu via ddas)
HADOOP-2843. Fix protections on map-side join classes to enable derivation.
(cdouglas via omalley)
HADOOP-2840. Fix gridmix scripts to correctly invoke the java sort through
the proper jar. (Mukund Madhugiri via cdouglas)
HADOOP-2766. Enables setting of HADOOP_OPTS env variable for the hadoop
daemons through HOD. (Vinod Kumar Vavilapalli via ddas)
HADOOP-2769. TestNNThroughputBenchmark should not use a fixed port for
the namenode http port. (omalley)
HADOOP-2852. Update gridmix benchmark to avoid an artificially long tail.
(cdouglas)
HADOOP-2894. Fix a problem to do with tasktrackers failing to connect to
JobTracker upon reinitialization. (Owen O'Malley via ddas).
HADOOP-2903. Fix exception generated by Metrics while using pushMetric().
(girish vaitheeswaran via dhruba)
HADOOP-2904. Fix to RPC metrics to log the correct host name.
(girish vaitheeswaran via dhruba)
HADOOP-2918. Improve error logging so that DFS write failures with
"No lease on file" can be diagnosed. (dhruba)
HADOOP-2923. Add SequenceFileAsBinaryInputFormat, which was
missed in the commit for HADOOP-2603. (cdouglas via omalley)
HADOOP-2847. Ensure idle cluster cleanup works even if the JobTracker
becomes unresponsive to RPC calls. (Hemanth Yamijala via nigel)
HADOOP-2809. Fix HOD syslog config syslog-address so that it works.
(Hemanth Yamijala via nigel)
HADOOP-2931. IOException thrown by DFSOutputStream had wrong stack
trace in some cases. (Michael Bieniosek via rangadi)
HADOOP-2883. Write failures and data corruptions on HDFS files.
The write timeout is back to what it was in the 0.15 release. Also, the
datanode flushes the buffered output stream of the block file before
sending a positive ack for the packet back to the client. (dhruba)
HADOOP-2925. Fix HOD to create the mapred system directory using a
naming convention that will avoid clashes in multi-user shared
cluster scenario. (Hemanth Yamijala via nigel)
HADOOP-2756. NPE in DFSClient while closing DFSOutputStreams
under load. (rangadi)
HADOOP-2958. Fixed FileBench, which broke due to the check for existence
of the output directory added by HADOOP-2391, and a trivial bug in
GenericMRLoadGenerator where min/max word lengths were identical since
they were looking at the same config variables. (Chris Douglas via
acmurthy)
HADOOP-2915. Fixed FileSystem.CACHE so that a username is included
in the cache key. (Tsz Wo (Nicholas), SZE via nigel)
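The effect of adding the user name to the cache key (HADOOP-2915) can be sketched as follows: two users asking for the same file system URI get distinct cached instances, while repeat requests by one user share an instance. This is a hypothetical Java model, not FileSystem.CACHE itself; the Key and Client classes and the choice of scheme plus authority plus user as the key fields are assumptions.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class FsCacheKeySketch {
    // Hypothetical cache key: scheme and authority of the URI plus the user name.
    static final class Key {
        private final String scheme;
        private final String authority;
        private final String user;

        Key(URI uri, String user) {
            this.scheme = uri.getScheme();
            this.authority = uri.getAuthority();
            this.user = user;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return Objects.equals(scheme, other.scheme)
                && Objects.equals(authority, other.authority)
                && Objects.equals(user, other.user);
        }

        @Override
        public int hashCode() {
            return Objects.hash(scheme, authority, user);
        }
    }

    // Hypothetical stand-in for a file system client bound to one user.
    static final class Client {
        final String user;
        Client(String user) { this.user = user; }
    }

    private final Map<Key, Client> cache = new HashMap<>();

    // Reuses a cached client only when scheme, authority, and user all match.
    Client get(URI uri, String user) {
        return cache.computeIfAbsent(new Key(uri, user), k -> new Client(user));
    }

    public static void main(String[] args) {
        FsCacheKeySketch fs = new FsCacheKeySketch();
        URI nn = URI.create("hdfs://namenode:9000/");
        System.out.println(fs.get(nn, "alice") == fs.get(nn, "alice")); // true
        System.out.println(fs.get(nn, "alice") == fs.get(nn, "bob"));   // false
    }
}
```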
HADOOP-2813. TestDU unit test uses its own directory to run its
sequence of tests. (Mahadev Konar via dhruba)
HADOOP-3159. Avoid file system cache being overwritten whenever
configuration is modified. (Tsz Wo (Nicholas), SZE via hairong)
Release 0.16.0 - 2008-02-07
INCOMPATIBLE CHANGES
HADOOP-1245. Use the mapred.tasktracker.tasks.maximum value
configured on each tasktracker when allocating tasks, instead of
the value configured on the jobtracker. InterTrackerProtocol
version changed from 5 to 6. (Michael Bieniosek via omalley)
HADOOP-1843. Removed code from Configuration and JobConf deprecated by
HADOOP-785 and a minor fix to Configuration.toString. Specifically the
important change is that mapred-default.xml is no longer supported and
Configuration no longer supports the notion of default/final resources.
(acmurthy)
HADOOP-1302. Remove deprecated abacus code from the contrib directory.
This also fixes a configuration bug in AggregateWordCount, so that the
job now works. (enis)
HADOOP-2288. Enhance FileSystem API to support access control.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2184. RPC Support for user permissions and authentication.
(Raghu Angadi via dhruba)
HADOOP-2185. RPC Server uses any available port if the specified
port is zero. Otherwise it uses the specified port. Also combines
the configuration attributes for the servers' bind address and
port from "x.x.x.x" and "y" to "x.x.x.x:y".
Deprecated configuration variables:
dfs.info.bindAddress
dfs.info.port
dfs.datanode.bindAddress
dfs.datanode.port
dfs.datanode.info.bindAddress
dfs.datanode.info.port
dfs.secondary.info.bindAddress
dfs.secondary.info.port
mapred.job.tracker.info.bindAddress
mapred.job.tracker.info.port
mapred.task.tracker.report.bindAddress
tasktracker.http.bindAddress
tasktracker.http.port
New configuration variables (post HADOOP-2404):
dfs.secondary.http.address
dfs.datanode.address
dfs.datanode.http.address
dfs.http.address
mapred.job.tracker.http.address
mapred.task.tracker.report.address
mapred.task.tracker.http.address
(Konstantin Shvachko via dhruba)
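The two behaviours in HADOOP-2185 (a single "host:port" value, and port 0 meaning "any free port") can be illustrated with plain java.net. This is a minimal sketch, not Hadoop's own address-parsing code; the class and method names are hypothetical, and it assumes the value always contains exactly one host and one numeric port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BindAddressSketch {
    // Parses a combined "host:port" value such as "0.0.0.0:50010".
    static InetSocketAddress parse(String hostPort) {
        int colon = hostPort.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected host:port, got " + hostPort);
        }
        String host = hostPort.substring(0, colon);
        int port = Integer.parseInt(hostPort.substring(colon + 1));
        return new InetSocketAddress(host, port);
    }

    // Binds a server socket to the parsed address; port 0 lets the OS pick a free port.
    static int bindAndReportPort(String hostPort) throws IOException {
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(parse(hostPort));
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(parse("0.0.0.0:50010").getPort());      // 50010
        System.out.println(bindAndReportPort("127.0.0.1:0") > 0);  // true
    }
}
```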
HADOOP-2401. Only the current leaseholder can abandon a block for
a HDFS file. ClientProtocol version changed from 20 to 21.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2381. Support permission information in FileStatus. Client
Protocol version changed from 21 to 22. (Raghu Angadi via dhruba)
HADOOP-2110. Block report processing creates fewer transient objects.
Datanode Protocol version changed from 10 to 11.
(Sanjay Radia via dhruba)
HADOOP-2567. Add FileSystem#getHomeDirectory(), which returns the
user's home directory in a FileSystem as a fully-qualified path.
FileSystem#getWorkingDirectory() is also changed to return a
fully-qualified path, which can break applications that attempt
to, e.g., pass LocalFileSystem#getWorkingDir().toString() directly
to java.io methods that accept file names. (cutting)
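What "fully-qualified" means here can be shown with java.net.URI: a relative path is resolved against the working directory, and the result carries the file system's scheme and authority. That scheme prefix in the string form is why passing it straight to java.io methods can break. The sketch is hypothetical and uses only java.net.URI, not Hadoop's Path class; the host name and /user/alice directory are illustrative.

```java
import java.net.URI;

public class QualifyPathSketch {
    // Resolves `path` against `workingDir` on the file system named by `fsUri`
    // and returns a URI with scheme, authority, and an absolute path.
    static URI qualify(URI fsUri, String workingDir, String path) {
        String authority = fsUri.getAuthority() == null ? "" : fsUri.getAuthority();
        URI base = URI.create(fsUri.getScheme() + "://" + authority + workingDir + "/");
        return base.resolve(path).normalize();
    }

    public static void main(String[] args) {
        URI fs = URI.create("hdfs://namenode:9000");
        System.out.println(qualify(fs, "/user/alice", "data/part-00000"));
        // hdfs://namenode:9000/user/alice/data/part-00000
        System.out.println(qualify(fs, "/user/alice", "/tmp/x"));
        // hdfs://namenode:9000/tmp/x
    }
}
```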
HADOOP-2514. Change trash feature to maintain a per-user trash
directory, named ".Trash" in the user's home directory. The
"fs.trash.root" parameter is no longer used. Full source paths
are also no longer reproduced within the trash.
HADOOP-2012. Periodic data verification on Datanodes.
(Raghu Angadi via dhruba)
HADOOP-1707. The DFSClient does not use a local disk file to cache
writes to a HDFS file. Changed Data Transfer Version from 7 to 8.
(dhruba)
HADOOP-2652. Fix permission issues for HftpFileSystem. This is an
incompatible change since distcp may not be able to copy files
from cluster A (compiled with this patch) to cluster B (compiled
with previous versions). (Tsz Wo (Nicholas), SZE via dhruba)
NEW FEATURES
HADOOP-1857. Ability to run a script when a task fails to capture stack
traces. (Amareshwari Sri Ramadasu via ddas)
HADOOP-2299. Definition of a login interface. A simple implementation for
Unix users and groups. (Hairong Kuang via dhruba)
HADOOP-1652. A utility to balance data among datanodes in a HDFS cluster.
(Hairong Kuang via dhruba)
HADOOP-2085. A library to support map-side joins of consistently
partitioned and sorted data sets. (Chris Douglas via omalley)
HADOOP-1301. Hadoop-On-Demand (HOD): resource management
provisioning for Hadoop. (Hemanth Yamijala via nigel)
HADOOP-2336. Shell commands to modify file permissions. (rangadi)
HADOOP-1298. Implement file permissions for HDFS.
(Tsz Wo (Nicholas) & taton via cutting)
HADOOP-2447. HDFS can be configured to limit the total number of
objects (inodes and blocks) in the file system. (dhruba)
HADOOP-2487. Added an option to get statuses for all submitted/run jobs.
This information can be used to develop tools for analysing jobs.
(Amareshwari Sri Ramadasu via acmurthy)
HADOOP-1873. Implement user permissions for Map/Reduce framework.
(Hairong Kuang via shv)
HADOOP-2532. Add to MapFile a getClosest method that returns the key
that comes just before if the key is not present. (stack via tomwhite)
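The lookup rule in HADOOP-2532 (exact key if present, otherwise the nearest key before it) matches the floor/ceiling semantics of a sorted map. The sketch below is an in-memory analogy using java.util.TreeMap, not MapFile's on-disk index or its getClosest signature; the method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

public class ClosestKeySketch {
    // Exact match if present, otherwise the entry with the greatest key before `key`.
    static <K, V> Map.Entry<K, V> closestAtOrBefore(TreeMap<K, V> index, K key) {
        return index.floorEntry(key);
    }

    // Exact match if present, otherwise the entry with the smallest key after `key`.
    static <K, V> Map.Entry<K, V> closestAtOrAfter(TreeMap<K, V> index, K key) {
        return index.ceilingEntry(key);
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> index = new TreeMap<>();
        index.put("apple", 1);
        index.put("cherry", 3);
        index.put("grape", 7);
        System.out.println(closestAtOrBefore(index, "banana").getKey()); // apple
        System.out.println(closestAtOrAfter(index, "banana").getKey());  // cherry
        System.out.println(closestAtOrBefore(index, "aardvark"));        // null
    }
}
```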
HADOOP-1883. Add versioning to Record I/O. (Vivek Ratan via ddas)
HADOOP-2603. Add SequenceFileAsBinaryInputFormat, which reads
sequence files as BytesWritable/BytesWritable regardless of the
key and value types used to write the file. (cdouglas via omalley)
HADOOP-2367. Add ability to profile a subset of map/reduce tasks and fetch
the result to the local filesystem of the submitting application. Also
includes a general IntegerRanges extension to Configuration for setting
positive, ranged parameters. (Owen O'Malley via cdouglas)
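A "positive, ranged parameter" is typically a list such as which task numbers to profile. The sketch below parses a spec of comma-separated values, closed ranges, and open-ended ranges (for example "0-2,5,8-") and answers membership queries. The exact syntax Hadoop's IntegerRanges accepts is assumed here, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class IntegerRangesSketch {
    private final List<int[]> ranges = new ArrayList<>(); // inclusive [start, end] pairs

    // Parses an assumed syntax like "0-2,5,8-": single values, closed ranges,
    // and open-ended ranges ("8-" means 8 and everything above it).
    IntegerRangesSketch(String spec) {
        for (String part : spec.split(",")) {
            part = part.trim();
            if (part.isEmpty()) {
                continue;
            }
            int dash = part.indexOf('-');
            int start;
            int end;
            if (dash < 0) {
                start = end = Integer.parseInt(part);
            } else {
                start = dash == 0 ? 0 : Integer.parseInt(part.substring(0, dash).trim());
                String tail = part.substring(dash + 1).trim();
                end = tail.isEmpty() ? Integer.MAX_VALUE : Integer.parseInt(tail);
            }
            if (start < 0 || end < start) {
                throw new IllegalArgumentException("bad range: " + part);
            }
            ranges.add(new int[] {start, end});
        }
    }

    // True if `value` falls inside any parsed range.
    boolean isIncluded(int value) {
        for (int[] r : ranges) {
            if (value >= r[0] && value <= r[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        IntegerRangesSketch profiled = new IntegerRangesSketch("0-2,5,8-");
        for (int task = 0; task < 10; task++) {
            System.out.println(task + " profiled? " + profiled.isIncluded(task));
        }
    }
}
```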
IMPROVEMENTS
HADOOP-2045. Change committer list on website to a table, so that
folks can list their organization, timezone, etc. (cutting)
HADOOP-2058. Facilitate creating new datanodes dynamically in
MiniDFSCluster. (Hairong Kuang via dhruba)
HADOOP-1855. fsck verifies block placement policies and reports
violations. (Konstantin Shvachko via dhruba)
HADOOP-1604. A system administrator can finalize namenode upgrades
without running the cluster. (Konstantin Shvachko via dhruba)
HADOOP-1839. Link-ify the Pending/Running/Complete/Killed grid in
jobdetails.jsp to help quickly narrow down and see categorized TIPs'
details via jobtasks.jsp. (Amar Kamat via acmurthy)
HADOOP-1210. Log counters in job history. (Owen O'Malley via ddas)
HADOOP-1912. Datanode has two new commands COPY and REPLACE. These are
needed for supporting data rebalance. (Hairong Kuang via dhruba)
HADOOP-2086. This patch adds the ability to add dependencies to a job
(run via JobControl) after construction. (Adrian Woodhead via ddas)
HADOOP-1185. Support changing the logging level of a server without
restarting the server. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2134. Remove developer-centric requirements from overview.html and
keep it end-user focussed, specifically sections related to subversion and
building Hadoop. (Jim Kellerman via acmurthy)
HADOOP-1989. Support simulated DataNodes. This helps creating large virtual
clusters for testing purposes. (Sanjay Radia via dhruba)
HADOOP-1274. Support different number of mappers and reducers per
TaskTracker to allow administrators to better configure and utilize
heterogeneous clusters.
Configuration changes to hadoop-default.xml:
add mapred.tasktracker.map.tasks.maximum (default value of 2)
add mapred.tasktracker.reduce.tasks.maximum (default value of 2)
remove mapred.tasktracker.tasks.maximum (deprecated for 0.16.0)
(Amareshwari Sri Ramadasu via acmurthy)
HADOOP-2104. Adds a description to the ant targets. This makes the
output of "ant -projecthelp" sensible. (Chris Douglas via ddas)
HADOOP-2127. Added a pipes sort example to benchmark trivial pipes
application versus trivial java application. (omalley via acmurthy)
HADOOP-2113. A new shell command "dfs -text" to view the contents of
a gzipped or SequenceFile. (Chris Douglas via dhruba)
HADOOP-2207. Add a "package" target for contrib modules that
permits each to determine what files are copied into release
builds. (stack via cutting)
HADOOP-1984. Makes the backoff for failed fetches exponential.
Earlier, it was a random backoff from an interval.
(Amar Kamat via ddas)
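The difference between the old and new fetch-retry policies in HADOOP-1984 can be sketched as two delay functions. The base delay and cap below are hypothetical values chosen for illustration, not the constants Hadoop uses.

```java
import java.util.Random;

public class FetchBackoffSketch {
    // Hypothetical constants, for illustration only.
    static final long BASE_DELAY_MS = 4_000;
    static final long MAX_DELAY_MS = 300_000;

    // Old style: a random delay drawn from a fixed interval, independent of the failure count.
    static long randomDelay(Random rnd) {
        return BASE_DELAY_MS + (long) (rnd.nextDouble() * BASE_DELAY_MS);
    }

    // New style: the delay doubles with each consecutive failure, up to a cap.
    static long exponentialDelay(int failures) {
        int shift = Math.max(0, Math.min(failures, 20)); // bound the shift to avoid overflow
        return Math.min(BASE_DELAY_MS << shift, MAX_DELAY_MS);
    }

    public static void main(String[] args) {
        for (int failures = 0; failures < 8; failures++) {
            System.out.println(failures + " -> " + exponentialDelay(failures) + " ms");
        }
        System.out.println("random: " + randomDelay(new Random(42)) + " ms");
    }
}
```

With exponential backoff, a map output that keeps failing is retried less and less often, rather than at the same average rate forever.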
HADOOP-1327. Include website documentation for streaming. (Rob Weltman
via omalley)
HADOOP-2000. Rewrite NNBench to measure namenode performance accurately.
It now uses the map-reduce framework for load generation.
(Mukund Madhugiri via dhruba)
HADOOP-2248. Speeds up the framework w.r.t Counters. Also has API
updates to the Counters part. (Owen O'Malley via ddas)
HADOOP-2326. The initial block report at Datanode startup time has
a random backoff period. (Sanjay Radia via dhruba)
HADOOP-2432. HDFS includes the name of the file while throwing
"File does not exist" exception. (Jim Kellerman via dhruba)
HADOOP-2457. Added a 'forrest.home' property to the 'docs' target in
build.xml. (acmurthy)
HADOOP-2149. A new benchmark for three name-node operations: file create,
open, and block report, to evaluate the name-node performance
for optimizations or new features. (Konstantin Shvachko via shv)
HADOOP-2466. Change FileInputFormat.computeSplitSize to a protected
non-static method to allow sub-classes to provide alternate
implementations. (Alejandro Abdelnur via acmurthy)
HADOOP-2425. Change TextOutputFormat to handle Text specifically for better
performance. Make NullWritable implement Comparable. Make TextOutputFormat
treat NullWritable like null. (omalley)
HADOOP-1719. Improves the utilization of shuffle copier threads.
(Amar Kamat via ddas)
HADOOP-2390. Added documentation for user-controls for intermediate
map-outputs & final job-outputs and native-hadoop libraries. (acmurthy)
HADOOP-1660. Add the cwd of the map/reduce task to the java.library.path
of the child-jvm to support loading of native libraries distributed via
the DistributedCache. (acmurthy)
HADOOP-2285. Speeds up TextInputFormat. Also includes updates to the
Text API. (Owen O'Malley via cdouglas)
HADOOP-2233. Adds a generic load generator for modeling MR jobs. (cdouglas)
HADOOP-2369. Adds a set of scripts for simulating a mix of user map/reduce
workloads. (Runping Qi via cdouglas)
HADOOP-2547. Removes use of a 'magic number' in build.xml.
(Hrishikesh via nigel)
HADOOP-2268. Fix org.apache.hadoop.mapred.jobcontrol classes to use the
List/Map interfaces rather than concrete ArrayList/HashMap classes
internally. (Adrian Woodhead via acmurthy)
HADOOP-2406. Add a benchmark for measuring read/write performance through
the InputFormat interface, particularly with compression. (cdouglas)
HADOOP-2131. Allow finer-grained control over speculative-execution. Now
users can set it for maps and reduces independently.
Configuration changes to hadoop-default.xml:
deprecated mapred.speculative.execution
add mapred.map.tasks.speculative.execution
add mapred.reduce.tasks.speculative.execution
(Amareshwari Sri Ramadasu via acmurthy)
HADOOP-1965. Interleave sort/spill in the map-task along with calls to the
Mapper.map method. This is done by splitting the 'io.sort.mb' buffer into
two and using one half for collecting map-outputs and the other half for
sort/spill. (Amar Kamat via acmurthy)
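The split-buffer scheme in HADOOP-1965 is a form of double buffering: map outputs fill one half while a background thread sorts and spills the other. The Java sketch below models this with record counts instead of io.sort.mb bytes and keeps spills in memory instead of writing them to disk; the class and method names are hypothetical, not MapTask's.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class HalfBufferSpillSketch {
    private final int halfCapacity; // records per half (stand-in for half of io.sort.mb)
    private List<String> collecting = new ArrayList<>();
    private final List<List<String>> spills = Collections.synchronizedList(new ArrayList<>());
    private final ExecutorService spillThread = Executors.newSingleThreadExecutor();
    private Future<?> pendingSpill;

    HalfBufferSpillSketch(int totalCapacity) {
        this.halfCapacity = Math.max(1, totalCapacity / 2);
    }

    // Called for every map output: fill the active half and hand it off once full.
    void collect(String record) throws Exception {
        collecting.add(record);
        if (collecting.size() >= halfCapacity) {
            handOff();
        }
    }

    // Swap halves: the full half is sorted and spilled in the background
    // while the caller keeps collecting into a fresh half.
    private void handOff() throws Exception {
        if (pendingSpill != null) {
            pendingSpill.get(); // block only if the previous spill is still running
        }
        List<String> full = collecting;
        collecting = new ArrayList<>();
        pendingSpill = spillThread.submit(() -> {
            Collections.sort(full);
            spills.add(full); // a real spill would write a sorted run to local disk
        });
    }

    // Spill whatever is left and wait for the background thread to finish.
    List<List<String>> close() throws Exception {
        if (!collecting.isEmpty()) {
            handOff();
        }
        if (pendingSpill != null) {
            pendingSpill.get();
        }
        spillThread.shutdown();
        return spills;
    }

    public static void main(String[] args) throws Exception {
        HalfBufferSpillSketch buffer = new HalfBufferSpillSketch(4);
        for (String r : new String[] {"d", "c", "b", "a", "e"}) {
            buffer.collect(r);
        }
        System.out.println(buffer.close()); // [[c, d], [a, b], [e]]
    }
}
```

The map thread only waits when it fills its half before the previous spill has finished, so sorting overlaps with map() calls instead of stalling them.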
HADOOP-2464. Unit tests for chmod, chown, and chgrp using DFS.
(Raghu Angadi)
HADOOP-1876. Persist statuses of completed jobs in HDFS so that the
JobClient can query and get information about decommissioned jobs and also
across JobTracker restarts.
Configuration changes to hadoop-default.xml:
add mapred.job.tracker.persist.jobstatus.active (default value of false)
add mapred.job.tracker.persist.jobstatus.hours (default value of 0)
add mapred.job.tracker.persist.jobstatus.dir (default value of
/jobtracker/jobsInfo)
(Alejandro Abdelnur via acmurthy)
HADOOP-2077. Added version and build information to STARTUP_MSG for all
hadoop daemons to aid error-reporting, debugging etc. (acmurthy)
HADOOP-2398. Additional instrumentation for NameNode and RPC server.
Add support for accessing instrumentation statistics via JMX.
(Sanjay Radia via dhruba)
HADOOP-2449. A return of the non-MR version of NNBench.
(Sanjay Radia via shv)
HADOOP-1989. Remove 'datanodecluster' command from bin/hadoop.
(Sanjay Radia via shv)
HADOOP-1742. Improve JavaDoc documentation for ClientProtocol, DFSClient,
and FSNamesystem. (Konstantin Shvachko)
HADOOP-2298. Add Ant target for a binary-only distribution.
(Hrishikesh via nigel)
HADOOP-2509. Add Ant target for Rat report (Apache license header
reports). (Hrishikesh via nigel)
HADOOP-2469. WritableUtils.clone should take a Configuration
instead of a JobConf. (stack via omalley)
HADOOP-2659. Introduce superuser permissions for admin operations.
(Tsz Wo (Nicholas), SZE via shv)
HADOOP-2596. Added a SequenceFile.createWriter api which allows the user
to specify the blocksize, replication factor and the buffersize to be
used for the underlying HDFS file. (Alejandro Abdelnur via acmurthy)
HADOOP-2431. Test HDFS File Permissions. (Hairong Kuang via shv)
HADOOP-2232. Add an option to disable Nagle's algorithm in the IPC stack.
(Clint Morgan via cdouglas)
HADOOP-2342. Created a micro-benchmark for measuring
local-file versus hdfs reads. (Owen O'Malley via nigel)
HADOOP-2529. First version of HDFS User Guide. (Raghu Angadi)
OPTIMIZATIONS
HADOOP-1898. Release the lock protecting the last time of the last stack
dump while the dump is happening. (Amareshwari Sri Ramadasu via omalley)
HADOOP-1900. Makes the heartbeat and task event queries interval
dependent on the cluster size. (Amareshwari Sri Ramadasu via ddas)
HADOOP-2208. Counter update frequency (from TaskTracker to JobTracker) is
capped at 1 minute. (Amareshwari Sri Ramadasu via ddas)
HADOOP-2284. Reduce the number of progress updates during the sorting in
the map task. (Amar Kamat via ddas)
BUG FIXES
HADOOP-2583. Fixes a bug in the Eclipse plug-in UI to edit locations.
Plug-in version is now synchronized with Hadoop version.
HADOOP-2100. Remove faulty check for existence of $HADOOP_PID_DIR and let
'mkdir -p' check & create it. (Michael Bieniosek via acmurthy)
HADOOP-1642. Ensure jobids generated by LocalJobRunner are unique to
avoid collisions and hence job-failures. (Doug Cutting via acmurthy)
HADOOP-2096. Close open file-descriptors held by streams while localizing
job.xml in the JobTracker and while displaying it on the webui in
jobconf.jsp. (Amar Kamat via acmurthy)
HADOOP-2098. Log start & completion of empty jobs to JobHistory, which
also ensures that we close the file-descriptor of the job's history log
opened during job-submission. (Amar Kamat via acmurthy)
HADOOP-2112. Adding back changes to build.xml lost while reverting
HADOOP-1622 i.e. http://svn.apache.org/viewvc?view=rev&revision=588771.
(acmurthy)
HADOOP-2089. Fixes the command line argument handling to handle multiple
-cacheArchive in Hadoop streaming. (Lohit Vijayarenu via ddas)
HADOOP-2071. Fix StreamXmlRecordReader to use a BufferedInputStream
wrapped over the DFSInputStream since mark/reset aren't supported by
DFSInputStream anymore. (Lohit Vijayarenu via acmurthy)
HADOOP-1348. Allow XML comments inside configuration files.
(Rajagopal Natarajan and Enis Soztutar via enis)
HADOOP-1952. Improve handling of invalid, user-specified classes while
configuring streaming jobs such as combiner, input/output formats etc.
Now invalid options are caught, logged and jobs are failed early. (Lohit
Vijayarenu via acmurthy)
HADOOP-2151. FileSystem.globPaths validates the list of Paths that
it returns. (Lohit Vijayarenu via dhruba)
HADOOP-2121. Cleanup DFSOutputStream when the stream encountered errors
when Datanodes became full. (Raghu Angadi via dhruba)
HADOOP-1130. The FileSystem.closeAll() method closes all existing
DFSClients. (Chris Douglas via dhruba)
HADOOP-2204. DFSTestUtil.waitReplication was not waiting for all replicas
to get created, thus causing unit test failure.
(Raghu Angadi via dhruba)
HADOOP-2078. A zero-size file may have no blocks associated with it.
(Konstantin Shvachko via dhruba)
HADOOP-2212. ChecksumFileSystem.getSumBufferSize might throw
java.lang.ArithmeticException. The fix is to initialize bytesPerChecksum
to 0. (Michael Bieniosek via ddas)
HADOOP-2216. Fix jobtasks.jsp to ensure that it first collects the
taskids which satisfy the filtering criteria and then use that list to
print out only the required task-reports. Previously it was oblivious to
the filtering and hence used the wrong index into the array of task-reports.
(Amar Kamat via acmurthy)
HADOOP-2272. Fix findbugs target to reflect changes made to the location
of the streaming jar file by HADOOP-2207. (Adrian Woodhead via nigel)
HADOOP-2244. Fixes the MapWritable.readFields to clear the instance
field variable every time readFields is called. (Michael Stack via ddas).
HADOOP-2245. Fixes LocalJobRunner to include a jobId in the mapId. Also,
adds a testcase for JobControl. (Adrian Woodhead via ddas).
HADOOP-2275. Fix erroneous detection of corrupted file when namenode
fails to allocate any datanodes for newly allocated block.
(Dhruba Borthakur via dhruba)
HADOOP-2256. Fix a bug in the namenode that could cause it to encounter
an infinite loop while deleting excess replicas that were created by
block rebalancing. (Hairong Kuang via dhruba)
HADOOP-2209. SecondaryNamenode process exits if it encounters exceptions
that it cannot handle. (Dhruba Borthakur via dhruba)
HADOOP-2314. Prevent TestBlockReplacement from occasionally getting
into an infinite loop. (Hairong Kuang via dhruba)
HADOOP-2300. This fixes a bug where mapred.tasktracker.tasks.maximum
would be ignored even if it was set in hadoop-site.xml.
(Amareshwari Sri Ramadasu via ddas)
HADOOP-2349. Improve code layout in file system transaction logging code.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2368. Fix unit tests on Windows.
(Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2363. This fix allows running multiple instances of the unit test
in parallel. The bug was introduced in HADOOP-2185 that changed
port-rolling behaviour. (Konstantin Shvachko via dhruba)
HADOOP-2271. Fix chmod task to be non-parallel. (Adrian Woodhead via
omalley)
HADOOP-2313. Fail the build if building libhdfs fails. (nigel via omalley)
HADOOP-2359. Remove warning for interrupted exception when closing down
minidfs. (dhruba via omalley)
HADOOP-1841. Prevent slow clients from consuming threads in the NameNode.
(dhruba)
HADOOP-2323. JobTracker.close() should not print stack traces for
normal exit. (jimk via cutting)
HADOOP-2376. Prevents sort example from overriding the number of maps.
(Owen O'Malley via ddas)
HADOOP-2434. FSDatasetInterface read interface causes HDFS reads to occur
in 1 byte chunks, causing performance degradation.
(Raghu Angadi via dhruba)
HADOOP-2459. Fix package target so that src/docs/build files are not
included in the release. (nigel)
HADOOP-2215. Fix documentation in cluster_setup.html &
mapred_tutorial.html to reflect that mapred.tasktracker.tasks.maximum has
been superseded by mapred.tasktracker.{map|reduce}.tasks.maximum.
(Amareshwari Sri Ramadasu via acmurthy)
HADOOP-2352. Remove AC_CHECK_LIB for libz and liblzo to ensure that
libhadoop.so doesn't have a dependency on them. (acmurthy)
HADOOP-2453. Fix the configuration for wordcount-simple example in Hadoop
Pipes which currently produces an XML parsing error. (Amareshwari Sri
Ramadasu via acmurthy)
HADOOP-2476. Unit test failure while reading permission bits of local
file system (on Windows) fixed. (Raghu Angadi via dhruba)
HADOOP-2247. Fine-tune the strategies for killing mappers and reducers
due to failures while fetching map-outputs. Now the map-completion times
and number of currently running reduces are taken into account by the
JobTracker before killing the mappers, while the progress made by the
reducer and the number of fetch-failures vis-a-vis total number of
fetch-attempts are taken into account before the reducer kills itself.
(Amar Kamat via acmurthy)
HADOOP-2452. Fix eclipse plug-in build.xml to refer to the right
location where hadoop-*-core.jar is generated. (taton)
HADOOP-2492. Additional debugging in the rpc server to better
diagnose ConcurrentModificationException. (dhruba)
HADOOP-2344. Enhance the utility for executing shell commands to read the
stdout/stderr streams while waiting for the command to finish (to free up
the buffers). Also, this patch throws away stderr of the DF utility.
@deprecated
  org.apache.hadoop.fs.ShellCommand for org.apache.hadoop.util.Shell
  org.apache.hadoop.util.ShellUtil for
    org.apache.hadoop.util.Shell.ShellCommandExecutor
(Amar Kamat via acmurthy)
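The deadlock HADOOP-2344 guards against happens when a child process fills its stdout or stderr pipe while the parent is blocked in waitFor(). The sketch below shows the general technique: drain stderr on a second thread, read stdout on the calling thread, and only then wait for exit. DrainingShellRunner and its Result type are hypothetical names for illustration. This is not the org.apache.hadoop.util.Shell API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not the org.apache.hadoop.util.Shell API.
class DrainingShellRunner {

  /** Exit code plus captured stdout of one command. */
  static final class Result {
    final int exitCode;
    final String stdout;

    Result(int exitCode, String stdout) {
      this.exitCode = exitCode;
      this.stdout = stdout;
    }
  }

  /** Reads a stream line by line until EOF and returns its contents. */
  static String readAll(InputStream in) throws IOException {
    StringBuilder sb = new StringBuilder();
    try (BufferedReader r =
        new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      String line;
      while ((line = r.readLine()) != null) {
        sb.append(line).append('\n');
      }
    }
    return sb.toString();
  }

  static Result run(String... command) throws IOException, InterruptedException {
    Process p = new ProcessBuilder(command).start();
    p.getOutputStream().close(); // the child gets no stdin

    // Drain stderr on its own thread and discard it, so the child can never
    // block on a full stderr pipe while this thread is reading stdout.
    Thread errDrainer = new Thread(() -> {
      try {
        readAll(p.getErrorStream());
      } catch (IOException ignored) {
        // best effort: the stderr text is thrown away anyway
      }
    });
    errDrainer.start();

    String out = readAll(p.getInputStream()); // drain stdout on this thread
    int exit = p.waitFor();                   // safe: both pipes are being emptied
    errDrainer.join();
    return new Result(exit, out);
  }
}
```

The code discards stderr here because the entry says that is done for the DF utility. A caller that needs error text would capture it from the drainer thread instead.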
HADOOP-2511. Fix a javadoc warning in org.apache.hadoop.util.Shell
introduced by HADOOP-2344. (acmurthy)
HADOOP-2442. Fix TestLocalFileSystemPermission.testLocalFSsetOwner
to work on more platforms. (Raghu Angadi via nigel)
HADOOP-2488. Fix a regression in random read performance.
(Michael Stack via rangadi)
HADOOP-2523. Fix TestDFSShell.testFilePermissions on Windows.
(Raghu Angadi via nigel)
HADOOP-2535. Removed support for deprecated mapred.child.heap.size and
fixed some indentation issues in TaskRunner. (acmurthy)
Configuration changes to hadoop-default.xml:
remove mapred.child.heap.size
HADOOP-2512. Fix error stream handling in Shell. Use exit code to
detect shell command errors in RawLocalFileSystem. (Raghu Angadi)
HADOOP-2446. Fixes TestHDFSServerPorts and TestMRServerPorts so they
do not rely on statically configured ports and clean up better. (nigel)
HADOOP-2537. Make build process compatible with Ant 1.7.0.
(Hrishikesh via nigel)
HADOOP-1281. Ensure running tasks of completed map TIPs (e.g. speculative
tasks) are killed as soon as the TIP completes. (acmurthy)
HADOOP-2571. Suppress a spurious warning in test code. (cdouglas)
HADOOP-2481. NNBench reports its progress periodically.
(Hairong Kuang via dhruba)
HADOOP-2601. Start name-node on a free port for TestNNThroughputBenchmark.
(Konstantin Shvachko)
HADOOP-2494. Set +x on contrib/*/bin/* in packaged tar bundle.
(stack via tomwhite)
HADOOP-2605. Remove bogus leading slash in task-tracker report bindAddress.
(Konstantin Shvachko)
HADOOP-2620. Trivial. 'bin/hadoop fs -help' did not list chmod, chown, and
chgrp. (Raghu Angadi)
HADOOP-2614. The DFS WebUI accesses are configured to be from the user
specified by dfs.web.ugi. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2543. Implement a "no-permission-checking" mode for smooth
upgrade from a pre-0.16 install of HDFS.
(Hairong Kuang via dhruba)
HADOOP-290. A DataNode log message now prints the target of a replication
request correctly. (dhruba)
HADOOP-2538. In TaskLogServlet, redirect to a warning if the plaintext
parameter is true but the filter parameter is not given.
(Michael Bieniosek via enis)
HADOOP-2582. Prevent 'bin/hadoop fs -copyToLocal' from creating
zero-length files when the src does not exist.
(Lohit Vijayarenu via cdouglas)
HADOOP-2189. Incrementing user counters should count as progress. (ddas)
HADOOP-2649. The NameNode periodically computes replication work for
the datanodes. The periodicity of this computation is now configurable.
(dhruba)
HADOOP-2549. Correct disk size computation so that data-nodes can switch
to other local drives if the current one is full. (Hairong Kuang via shv)
HADOOP-2633. Fsck should call name-node methods directly rather than
through rpc. (Tsz Wo (Nicholas), SZE via shv)
HADOOP-2687. Modify a few log messages generated by dfs client to be
logged only at INFO level. (stack via dhruba)
HADOOP-2402. Fix BlockCompressorStream to ensure it buffers data before
sending it down to the compressor so that each write call doesn't
compress. (Chris Douglas via acmurthy)
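HADOOP-2402 changes the stream so that it buffers writes and compresses whole blocks, instead of compressing on every write call. The sketch below illustrates that idea with java.util.zip.Deflater. The class name, the block size parameter, and the length-prefixed framing are all hypothetical. None of them is the real BlockCompressorStream or its on-disk format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.Deflater;

// Hypothetical illustration; not Hadoop's BlockCompressorStream.
class BufferingBlockCompressorStream extends OutputStream {
  private final DataOutputStream out;
  private final byte[] buffer;
  private int count = 0;
  int blocksCompressed = 0; // visible for illustration and testing

  BufferingBlockCompressorStream(OutputStream out, int blockSize) {
    this.out = new DataOutputStream(out);
    this.buffer = new byte[blockSize];
  }

  @Override
  public void write(int b) throws IOException {
    buffer[count++] = (byte) b;
    if (count == buffer.length) {
      compressBlock();
    }
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    while (len > 0) {
      int n = Math.min(len, buffer.length - count);
      System.arraycopy(b, off, buffer, count, n);
      count += n;
      off += n;
      len -= n;
      if (count == buffer.length) {
        compressBlock(); // compress only once a full block has accumulated
      }
    }
  }

  /** Compresses whatever is buffered as one block and resets the buffer. */
  private void compressBlock() throws IOException {
    if (count == 0) {
      return;
    }
    Deflater deflater = new Deflater();
    deflater.setInput(buffer, 0, count);
    deflater.finish();
    ByteArrayOutputStream compressed = new ByteArrayOutputStream();
    byte[] chunk = new byte[4096];
    while (!deflater.finished()) {
      int n = deflater.deflate(chunk);
      compressed.write(chunk, 0, n);
    }
    deflater.end();
    // Hypothetical framing: original length, compressed length, then the bytes.
    out.writeInt(count);
    out.writeInt(compressed.size());
    compressed.writeTo(out);
    blocksCompressed++;
    count = 0;
  }

  @Override
  public void close() throws IOException {
    compressBlock(); // the final, possibly partial, block
    out.close();
  }
}
```

With a 1024-byte block, 10,000 single-byte writes produce 10 compression calls (9 full blocks plus 1 partial block on close) rather than 10,000.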
HADOOP-2645. The Metrics initialization code does not throw
exceptions when servers are restarted by MiniDFSCluster.
(Sanjay Radia via dhruba)
HADOOP-2691. Fix a race condition that was causing the DFSClient
to erroneously remove a good datanode from a pipeline that actually
had another datanode that was bad. (dhruba)
HADOOP-1195. All code in FSNamesystem checks the return value
of getDataNode for null before using it. (dhruba)
HADOOP-2640. Fix a bug in MultiFileSplitInputFormat that was always
returning 1 split in some circumstances. (Enis Soztutar via nigel)
HADOOP-2626. Fix paths with special characters to work correctly
with the local filesystem. (Thomas Friol via cutting)
HADOOP-2646. Fix SortValidator to work with fully-qualified
working directories. (Arun C Murthy via nigel)
HADOOP-2092. Added a ping mechanism to the pipes' task to periodically
check if the parent Java task is running, and exit if the parent isn't
alive and responding. (Amareshwari Sri Ramadasu via acmurthy)
HADOOP-2714. TestDecommission failed on Windows because the replication
request was timing out. (dhruba)
HADOOP-2576. Namenode performance degradation over time triggered by
large heartbeat interval. (Raghu Angadi)
HADOOP-2720. Jumbo bug fix patch to HOD. Final sync of Apache SVN with
internal Yahoo SVN. (Hemanth Yamijala via nigel)
HADOOP-2713. TestDatanodeDeath failed on Windows because the replication
request was timing out. (dhruba)
HADOOP-2639. Fixes a problem to do with incorrect maintenance of values
for runningMapTasks/runningReduceTasks. (Amar Kamat and Arun Murthy
via ddas)
HADOOP-2723. Fixed the check that determines whether to do user task
profiling. (Amareshwari Sri Ramadasu via omalley)
HADOOP-2734. Link forrest docs to new http://hadoop.apache.org
(Doug Cutting via nigel)
HADOOP-2641. Added Apache license headers to 95 files. (nigel)
HADOOP-2732. Fix bug in path globbing. (Hairong Kuang via nigel)
HADOOP-2404. Fix backwards compatibility with hadoop-0.15 configuration
files that was broken by HADOOP-2185. (omalley)
HADOOP-2740. Fix HOD to work with the configuration variables changed in
HADOOP-2404. (Hemanth Yamijala via omalley)
HADOOP-2755. Fix fsck performance degradation because of permissions
issue. (Tsz Wo (Nicholas), SZE via dhruba)
HADOOP-2768. Fix performance regression caused by HADOOP-1707.
(dhruba borthakur via nigel)
HADOOP-3108. Fix NPE in setPermission and setOwner. (shv)
HADOOP-3108. Correction to the previous patch. (shv)
Release 0.15.3 - 2008-01-18
BUG FIXES
HADOOP-2562. globPaths supports {ab,cd}. (Hairong Kuang via dhruba)
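Brace alternation such as {ab,cd} matches any one of the comma-separated alternatives. One common way to implement this is to translate the glob into a regular expression with a non-capturing alternation group. The sketch below takes that approach. SimpleGlob is a hypothetical class, and this is an assumption about technique, not Hadoop's actual globbing code. It supports only *, ? and braces.

```java
import java.util.regex.Pattern;

// Hypothetical, simplified glob translator; not Hadoop's globbing code.
class SimpleGlob {
  /** Translates a glob supporting *, ? and {alt1,alt2} into a regex. */
  static Pattern compile(String glob) {
    StringBuilder re = new StringBuilder();
    int braceDepth = 0;
    for (int i = 0; i < glob.length(); i++) {
      char c = glob.charAt(i);
      switch (c) {
        case '*':
          re.append("[^/]*"); // any run of characters within one path component
          break;
        case '?':
          re.append("[^/]");
          break;
        case '{':
          re.append("(?:"); // start a non-capturing group of alternatives
          braceDepth++;
          break;
        case '}':
          if (braceDepth == 0) {
            throw new IllegalArgumentException("Unmatched '}' in " + glob);
          }
          re.append(')');
          braceDepth--;
          break;
        case ',':
          // Inside braces a comma separates alternatives; outside it is literal.
          re.append(braceDepth > 0 ? "|" : ",");
          break;
        default:
          // Escape every non-alphanumeric character so it matches literally.
          if (!Character.isLetterOrDigit(c)) {
            re.append('\\');
          }
          re.append(c);
      }
    }
    if (braceDepth != 0) {
      throw new IllegalArgumentException("Unmatched '{' in " + glob);
    }
    return Pattern.compile(re.toString());
  }

  static boolean matches(String glob, String name) {
    return compile(glob).matcher(name).matches();
  }
}
```

For example, "part-{ab,cd}" becomes the regex part\-(?:ab|cd), which matches "part-ab" and "part-cd" but not "part-ef".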
HADOOP-2540. fsck reports missing blocks incorrectly. (dhruba)
HADOOP-2570. "work" directory created unconditionally, and symlinks
created from the task cwds.
HADOOP-2574. Fixed mapred_tutorial.xml to correct minor errors with the
WordCount examples. (acmurthy)
Release 0.15.2 - 2008-01-02
BUG FIXES
HADOOP-2246. Moved the changelog for HADOOP-1851 from the NEW FEATURES
section to the INCOMPATIBLE CHANGES section. (acmurthy)
HADOOP-2238. Fix TaskGraphServlet so that it sets the content type of
the response appropriately. (Paul Saab via enis)
HADOOP-2129. Fix so that distcp works correctly when source is
HDFS but not the default filesystem. HDFS paths returned by the
listStatus() method are now fully-qualified. (cutting)
HADOOP-2378. Fixes a problem where the last task completion event would
get created after the job completes. (Alejandro Abdelnur via ddas)
HADOOP-2228. Checks whether a job with a certain jobId is already running
and then tries to create the JobInProgress object.
(Johan Oskarsson via ddas)
HADOOP-2422. dfs -cat multiple files fail with 'Unable to write to
output stream'. (Raghu Angadi via dhruba)
HADOOP-2460. When the namenode encounters ioerrors on writing a
transaction log, it stops writing new transactions to that log.
(Raghu Angadi via dhruba)
HADOOP-2227. Use the LocalDirAllocator uniformly for handling all of the
temporary storage required for a given task. It also implies that
mapred.local.dir.minspacestart is handled by checking if there is enough
free-space on any one of the available disks. (Amareshwari Sri Ramadasu
via acmurthy)
HADOOP-2437. Fix the LocalDirAllocator to choose the seed for the
round-robin disk selections randomly. This helps in spreading data across
multiple partitions much better. (acmurthy)
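If every task started its round-robin rotation at the first configured directory, early allocations would pile onto that disk. Picking a random starting index spreads them out. The sketch below shows this under the assumption that the "seed" is the starting position of the rotation. RoundRobinDirChooser is a hypothetical class. It is not LocalDirAllocator and does no free-space checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; not Hadoop's LocalDirAllocator.
class RoundRobinDirChooser {
  private final List<String> dirs;
  private int next;

  RoundRobinDirChooser(List<String> dirs, Random random) {
    if (dirs.isEmpty()) {
      throw new IllegalArgumentException("no directories configured");
    }
    this.dirs = new ArrayList<>(dirs);
    // Random starting point, so separate instances don't all begin on the same disk.
    this.next = random.nextInt(this.dirs.size());
  }

  /** Returns the next directory in rotation, wrapping around at the end. */
  synchronized String nextDir() {
    String dir = dirs.get(next);
    next = (next + 1) % dirs.size();
    return dir;
  }
}
```

The rotation still visits every directory exactly once per cycle. Only its starting point is randomized.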
HADOOP-2486. When the list of files from the InMemoryFileSystem is obtained
for merging, this patch will ensure that only those files whose checksums
have also been created (renamed) are returned. (ddas)
HADOOP-2456. Hardcode English locale to prevent NumberFormatException
from occurring when starting the NameNode with certain locales.
(Matthias Friedrich via nigel)
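The entry does not say which NameNode code path was affected. The failure mode itself is easy to reproduce: String.format uses the JVM default locale, so under a German locale 1.5 is formatted as "1,50". Double.parseDouble always expects '.' as the decimal point, so parsing that text throws NumberFormatException. The sketch below contrasts default-locale formatting with a hardcoded Locale.ENGLISH. LocaleSafeNumbers is a hypothetical class for illustration.

```java
import java.util.Locale;

// Hypothetical illustration of the failure mode; not the NameNode's code.
class LocaleSafeNumbers {
  /** Formats using the JVM default locale; under German this yields "1,50". */
  static String formatDefaultLocale(double v) {
    return String.format("%.2f", v);
  }

  /** Formats with English hardcoded, so Double.parseDouble can read it back. */
  static String formatEnglish(double v) {
    return String.format(Locale.ENGLISH, "%.2f", v);
  }

  /** Parses text produced by one of the formatters above. */
  static double parse(String text) {
    return Double.parseDouble(text); // always expects '.' as the decimal point
  }
}
```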
IMPROVEMENTS
HADOOP-2160. Remove project-level, non-user documentation from
releases, since it's now maintained in a separate tree. (cutting)
HADOOP-1327. Add user documentation for streaming. (cutting)
HADOOP-2382. Add hadoop-default.html to subversion. (cutting)
HADOOP-2158. hdfsListDirectory calls FileSystem.listStatus instead
of FileSystem.listPaths. This reduces the number of RPC calls on the
namenode, thereby improving scalability. (Christian Kunz via dhruba)
Release 0.15.1 - 2007-11-27
INCOMPATIBLE CHANGES
HADOOP-713. Reduce CPU usage on namenode while listing directories.
FileSystem.listPaths does not return the size of the entire subtree.
Introduced a new API ClientProtocol.getContentLength that returns the
size of the subtree. (Dhruba Borthakur via dhruba)
IMPROVEMENTS
HADOOP-1917. Addition of guides/tutorial for better overall
documentation for Hadoop. Specifically:
* quickstart.html is targeted towards first-time users and helps them
set up a single-node cluster and play with Hadoop.
* cluster_setup.html helps admins to configure and set up non-trivial