Merge branch 'akpm' (aka "Andrew's patch-bomb, take two")

Andrew explains:

 - various misc stuff

 - Most of the rest of MM: memcg, threaded hugepages, others.

 - cpumask

 - kexec

 - kdump

 - some direct-io performance tweaking

 - radix-tree optimisations

 - new selftests code

   A note on this: often people will develop a new userspace-visible
   feature and will develop userspace code to exercise/test that
   feature.  Then they merge the patch and the selftest code dies.
   Sometimes we paste it into the changelog.  Sometimes the code gets
   thrown into Documentation/(!).

   This saddens me.  So this patch creates a bare-bones framework which
   will henceforth allow me to ask people to include their test apps in
   the kernel tree so we can keep them alive.  Then when people enhance
   or fix the feature, I can ask them to update the test app too.

   The infrastructure is terribly trivial at present - let's see how it
   evolves.  (A sketch of what such a test app might look like follows
   below, after these notes.)

 - checkpoint/restart feature work.

   A note on this: this is a project by various mad Russians to perform
   c/r mainly from userspace, with various oddball helper code added
   into the kernel where the need is demonstrated.

   So rather than some large central lump of code, what we have is
   little bits and pieces popping up in various places which either
   expose something new or which permit something which is normally
   kernel-private to be modified.

   The overall project is an ongoing thing.  I've judged that the size
   and scope of the thing mean that we're more likely to be successful
   with it if we integrate the support into mainline piecemeal rather
   than allowing it all to develop out-of-tree.

   However I'm less confident than the developers that it will all
   eventually work! So what I'm asking them to do is to wrap each piece
   of new code inside CONFIG_CHECKPOINT_RESTORE.  So if it all
   eventually comes to tears and the project as a whole fails, it should
   be a simple matter to go through and delete all trace of it.  (A sketch
   of this wrapping pattern also follows below.)

This lot pretty much wraps up the -rc1 merge for me.
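
A sketch of what a test app under the new selftests framework might look
like (hypothetical name and path; the real example added by this merge is
tools/testing/selftests/breakpoints/breakpoint_test.c).  Each test is a
small standalone program that exercises one feature and reports pass or
fail, built and run by a per-directory Makefile via the run_tests script:

	/* tools/testing/selftests/example/example_test.c - hypothetical */
	#include <stdio.h>
	#include <stdlib.h>

	static int test_feature(void)
	{
		/* exercise the userspace-visible feature here; 0 = pass */
		return 0;
	}

	int main(void)
	{
		if (test_feature()) {
			printf("example_test: [FAIL]\n");
			exit(EXIT_FAILURE);
		}
		printf("example_test: [OK]\n");
		return 0;
	}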
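And the wrapping pattern being asked of the c/r developers is plain
conditional compilation on the new symbol.  A minimal sketch (the helper
is hypothetical; only CONFIG_CHECKPOINT_RESTORE itself is real, added to
init/Kconfig by this series):

	#ifdef CONFIG_CHECKPOINT_RESTORE
	static int cr_expose_something(struct task_struct *task)
	{
		/* c/r-only code: compiled out on most configs, and easy to
		 * delete wholesale if the project comes to tears. */
		return 0;
	}
	#else
	static inline int cr_expose_something(struct task_struct *task)
	{
		return -EINVAL;
	}
	#endif /* CONFIG_CHECKPOINT_RESTORE */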

* akpm: (96 commits)
  unlzo: fix input buffer free
  ramoops: update parameters only after successful init
  ramoops: fix use of rounddown_pow_of_two()
  c/r: prctl: add PR_SET_MM codes to set up mm_struct entries
  c/r: procfs: add start_data, end_data, start_brk members to /proc/$pid/stat v4
  c/r: introduce CHECKPOINT_RESTORE symbol
  selftests: new x86 breakpoints selftest
  selftests: new very basic kernel selftests directory
  radix_tree: take radix_tree_path off stack
  radix_tree: remove radix_tree_indirect_to_ptr()
  dio: optimize cache misses in the submission path
  vfs: cache request_queue in struct block_device
  fs/direct-io.c: calculate fs_count correctly in get_more_blocks()
  drivers/parport/parport_pc.c: fix warnings
  panic: don't print redundant backtraces on oops
  sysctl: add the kernel.ns_last_pid control
  kdump: add udev events for memory online/offline
  include/linux/crash_dump.h needs elf.h
  kdump: fix crash_kexec()/smp_send_stop() race in panic()
  kdump: crashk_res init check for /sys/kernel/kexec_crash_size
  ...
commit 099469502f62fbe0d7e4f0b83a2f22538367f734 (2 parents: 7c17d86 + 35f1526)
Authored by Linus Torvalds, January 12, 2012

Showing 100 changed files with 2,589 additions and 1,562 deletions.

  1. 4  Documentation/ABI/testing/sysfs-kernel-slab
  2. 9  Documentation/cgroups/memory.txt
  3. 3  Documentation/filesystems/proc.txt
  4. 8  Documentation/sysctl/kernel.txt
  5. 5  Documentation/vm/slub.txt
  6. 14  arch/Kconfig
  7. 2  arch/avr32/include/asm/system.h
  8. 2  arch/avr32/kernel/traps.c
  9. 1  arch/ia64/include/asm/processor.h
  10. 4  arch/ia64/kernel/machine_kexec.c
  11. 3  arch/m68k/amiga/config.c
  12. 2  arch/mips/include/asm/ptrace.h
  13. 2  arch/mips/kernel/traps.c
  14. 2  arch/mn10300/include/asm/exceptions.h
  15. 2  arch/parisc/include/asm/processor.h
  16. 1  arch/parisc/kernel/process.c
  17. 4  arch/powerpc/kernel/machine_kexec_32.c
  18. 6  arch/powerpc/kernel/machine_kexec_64.c
  19. 2  arch/powerpc/mm/numa.c
  20. 1  arch/powerpc/platforms/pseries/nvram.c
  21. 2  arch/s390/include/asm/processor.h
  22. 2  arch/s390/kernel/nmi.c
  23. 2  arch/sh/kernel/process_32.c
  24. 2  arch/sh/kernel/process_64.c
  25. 6  arch/tile/kernel/machine_kexec.c
  26. 3  arch/x86/Kconfig
  27. 6  arch/x86/Kconfig.cpu
  28. 2  arch/x86/mm/numa.c
  29. 8  arch/x86/um/Kconfig
  30. 17  drivers/base/memory.c
  31. 24  drivers/char/ramoops.c
  32. 3  drivers/mtd/mtdoops.c
  33. 4  drivers/parport/parport_pc.c
  34. 6  drivers/video/nvidia/nvidia.c
  35. 3  fs/block_dev.c
  36. 5  fs/btrfs/disk-io.c
  37. 57  fs/direct-io.c
  38. 234  fs/eventpoll.c
  39. 3  fs/hugetlbfs/inode.c
  40. 2  fs/nfs/internal.h
  41. 4  fs/nfs/write.c
  42. 2  fs/pipe.c
  43. 7  fs/proc/array.c
  44. 2  fs/proc/base.c
  45. 14  include/asm-generic/tlb.h
  46. 1  include/linux/crash_dump.h
  47. 1  include/linux/eventpoll.h
  48. 14  include/linux/fs.h
  49. 2  include/linux/huge_mm.h
  50. 13  include/linux/kernel.h
  51. 1  include/linux/kmsg_dump.h
  52. 4  include/linux/linkage.h
  53. 105  include/linux/memcontrol.h
  54. 23  include/linux/migrate.h
  55. 44  include/linux/mm_inline.h
  56. 9  include/linux/mm_types.h
  57. 28  include/linux/mmzone.h
  58. 2  include/linux/oom.h
  59. 46  include/linux/page_cgroup.h
  60. 12  include/linux/pagevec.h
  61. 12  include/linux/prctl.h
  62. 3  include/linux/radix-tree.h
  63. 4  include/linux/rmap.h
  64. 2  include/linux/sched.h
  65. 22  include/trace/events/vmscan.h
  66. 11  init/Kconfig
  67. 6  kernel/exit.c
  68. 25  kernel/kexec.c
  69. 2  kernel/kprobes.c
  70. 26  kernel/panic.c
  71. 4  kernel/pid.c
  72. 31  kernel/pid_namespace.c
  73. 121  kernel/sys.c
  74. 2  lib/decompress_unlzo.c
  75. 154  lib/radix-tree.c
  76. 5  mm/compaction.c
  77. 18  mm/filemap.c
  78. 93  mm/huge_memory.c
  79. 11  mm/ksm.c
  80. 1,102  mm/memcontrol.c
  81. 2  mm/memory-failure.c
  82. 4  mm/memory.c
  83. 2  mm/memory_hotplug.c
  84. 2  mm/mempolicy.c
  85. 173  mm/migrate.c
  86. 42  mm/oom_kill.c
  87. 55  mm/page_alloc.c
  88. 164  mm/page_cgroup.c
  89. 20  mm/rmap.c
  90. 9  mm/slub.c
  91. 79  mm/swap.c
  92. 10  mm/swap_state.c
  93. 9  mm/swapfile.c
  94. 9  mm/vmalloc.c
  95. 680  mm/vmscan.c
  96. 2  mm/vmstat.c
  97. 11  tools/testing/selftests/Makefile
  98. 20  tools/testing/selftests/breakpoints/Makefile
  99. 394  tools/testing/selftests/breakpoints/breakpoint_test.c
  100. 8  tools/testing/selftests/run_tests
4  Documentation/ABI/testing/sysfs-kernel-slab
@@ -346,6 +346,10 @@ Description:
 		number of objects per slab.  If a slab cannot be allocated
 		because of fragmentation, SLUB will retry with the minimum order
 		possible depending on its characteristics.
+		When debug_guardpage_minorder=N (N > 0) parameter is specified
+		(see Documentation/kernel-parameters.txt), the minimum possible
+		order is used and this sysfs entry can not be used to change
+		the order at run time.
 
 What:		/sys/kernel/slab/cache/order_fallback
 Date:		April 2008
9  Documentation/cgroups/memory.txt
@@ -61,7 +61,7 @@ Brief summary of control files.
  memory.failcnt			 # show the number of memory usage hits limits
  memory.memsw.failcnt		 # show the number of memory+Swap hits limits
  memory.max_usage_in_bytes	 # show max memory usage recorded
- memory.memsw.usage_in_bytes	 # show max memory+Swap usage recorded
+ memory.memsw.max_usage_in_bytes # show max memory+Swap usage recorded
  memory.soft_limit_in_bytes	 # set/show soft limit of memory usage
  memory.stat			 # show various statistics
  memory.use_hierarchy		 # set/show hierarchical account enabled
@@ -410,8 +410,11 @@ memory.stat file includes following statistics
 cache		- # of bytes of page cache memory.
 rss		- # of bytes of anonymous and swap cache memory.
 mapped_file	- # of bytes of mapped file (includes tmpfs/shmem)
-pgpgin		- # of pages paged in (equivalent to # of charging events).
-pgpgout		- # of pages paged out (equivalent to # of uncharging events).
+pgpgin		- # of charging events to the memory cgroup. The charging
+		event happens each time a page is accounted as either mapped
+		anon page(RSS) or cache page(Page Cache) to the cgroup.
+pgpgout		- # of uncharging events to the memory cgroup. The uncharging
+		event happens each time a page is unaccounted from the cgroup.
 swap		- # of bytes of swap usage
 inactive_anon	- # of bytes of anonymous memory and swap cache memory on
 		LRU list.
3  Documentation/filesystems/proc.txt
@@ -307,6 +307,9 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
   blkio_ticks   time spent waiting for block IO
   gtime         guest time of the task in jiffies
   cgtime        guest time of the task children in jiffies
+  start_data    address above which program data+bss is placed
+  end_data      address below which program data+bss is placed
+  start_brk     address above which program heap can be expanded with brk()
 ..............................................................................
 
 The /proc/PID/maps file containing the currently mapped memory regions and
8  Documentation/sysctl/kernel.txt
@@ -415,6 +415,14 @@ PIDs of value pid_max or larger are not allocated.
 
 ==============================================================
 
+ns_last_pid:
+
+The last pid allocated in the current (the one task using this sysctl
+lives in) pid namespace. When selecting a pid for a next task on fork
+kernel tries to allocate a number starting from this one.
+
+==============================================================
+
 powersave-nap: (PPC only)
 
 If set, Linux-PPC will use the 'nap' mode of powersaving,
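
(As an aside on the ns_last_pid hunk above: a hedged userspace sketch of
how a checkpoint/restore tool might use this sysctl to pre-select the pid
of the next forked task.  Assumes enough privilege to write the file; the
value written is only the kernel's starting point, not a guarantee.)

	#include <stdio.h>
	#include <unistd.h>
	#include <fcntl.h>
	#include <sys/types.h>
	#include <sys/wait.h>

	int main(void)
	{
		int fd = open("/proc/sys/kernel/ns_last_pid", O_WRONLY);
		if (fd < 0) {
			perror("open ns_last_pid");
			return 1;
		}
		/* The next fork() should try to allocate pid 10000 (9999 + 1). */
		if (write(fd, "9999", 4) != 4) {
			perror("write ns_last_pid");
			return 1;
		}
		close(fd);

		pid_t pid = fork();
		if (pid == 0) {
			printf("child pid: %d\n", (int)getpid());
			_exit(0);
		}
		waitpid(pid, NULL, 0);
		return 0;
	}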
5  Documentation/vm/slub.txt
@@ -131,7 +131,10 @@ slub_min_objects.
 slub_max_order specified the order at which slub_min_objects should no
 longer be checked. This is useful to avoid SLUB trying to generate
 super large order pages to fit slub_min_objects of a slab cache with
-large object sizes into one high order page.
+large object sizes into one high order page. Setting command line
+parameter debug_guardpage_minorder=N (N > 0), forces setting
+slub_max_order to 0, what cause minimum possible order of slabs
+allocation.
 
 SLUB Debug output
 -----------------
14  arch/Kconfig
@@ -185,4 +185,18 @@ config HAVE_RCU_TABLE_FREE
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 	bool
 
+config HAVE_ALIGNED_STRUCT_PAGE
+	bool
+	help
+	  This makes sure that struct pages are double word aligned and that
+	  e.g. the SLUB allocator can perform double word atomic operations
+	  on a struct page for better performance. However selecting this
+	  might increase the size of a struct page by a word.
+
+config HAVE_CMPXCHG_LOCAL
+	bool
+
+config HAVE_CMPXCHG_DOUBLE
+	bool
+
 source "kernel/gcov/Kconfig"
2  arch/avr32/include/asm/system.h
@@ -169,7 +169,7 @@ static inline unsigned long __cmpxchg_local(volatile void *ptr,
 #define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))
 
 struct pt_regs;
-void NORET_TYPE die(const char *str, struct pt_regs *regs, long err);
+void die(const char *str, struct pt_regs *regs, long err);
 void _exception(long signr, struct pt_regs *regs, int code,
 		unsigned long addr);
 
2  arch/avr32/kernel/traps.c
@@ -24,7 +24,7 @@
 
 static DEFINE_SPINLOCK(die_lock);
 
-void NORET_TYPE die(const char *str, struct pt_regs *regs, long err)
+void die(const char *str, struct pt_regs *regs, long err)
 {
 	static int die_counter;
 
1  arch/ia64/include/asm/processor.h
@@ -309,7 +309,6 @@ struct thread_struct {
 }
 
 #define start_thread(regs,new_ip,new_sp) do {							\
-	set_fs(USER_DS);									\
 	regs->cr_ipsr = ((regs->cr_ipsr | (IA64_PSR_BITS_TO_SET | IA64_PSR_CPL))		\
 			 & ~(IA64_PSR_BITS_TO_CLEAR | IA64_PSR_RI | IA64_PSR_IS));		\
 	regs->cr_iip = new_ip;									\
4  arch/ia64/kernel/machine_kexec.c
@@ -27,11 +27,11 @@
 #include <asm/sal.h>
 #include <asm/mca.h>
 
-typedef NORET_TYPE void (*relocate_new_kernel_t)(
+typedef void (*relocate_new_kernel_t)(
 					unsigned long indirection_page,
 					unsigned long start_address,
 					struct ia64_boot_param *boot_param,
-					unsigned long pal_addr) ATTRIB_NORET;
+					unsigned long pal_addr) __noreturn;
 
 struct kimage *ia64_kimage;
 
3  arch/m68k/amiga/config.c
@@ -511,8 +511,7 @@ static unsigned long amiga_gettimeoffset(void)
 	return ticks + offset;
 }
 
-static NORET_TYPE void amiga_reset(void)
-    ATTRIB_NORET;
+static void amiga_reset(void)  __noreturn;
 
 static void amiga_reset(void)
 {
2  arch/mips/include/asm/ptrace.h
@@ -144,7 +144,7 @@ extern int ptrace_set_watch_regs(struct task_struct *child,
 extern asmlinkage void syscall_trace_enter(struct pt_regs *regs);
 extern asmlinkage void syscall_trace_leave(struct pt_regs *regs);
 
-extern NORET_TYPE void die(const char *, struct pt_regs *) ATTRIB_NORET;
+extern void die(const char *, struct pt_regs *) __noreturn;
 
 static inline void die_if_kernel(const char *str, struct pt_regs *regs)
 {
2  arch/mips/kernel/traps.c
@@ -1340,7 +1340,7 @@ void ejtag_exception_handler(struct pt_regs *regs)
 /*
  * NMI exception handler.
  */
-NORET_TYPE void ATTRIB_NORET nmi_exception_handler(struct pt_regs *regs)
+void __noreturn nmi_exception_handler(struct pt_regs *regs)
 {
 	bust_spinlocks(1);
 	printk("NMI taken!!!!\n");
2  arch/mn10300/include/asm/exceptions.h
@@ -110,7 +110,7 @@ extern asmlinkage void nmi_handler(void);
 extern asmlinkage void misalignment(struct pt_regs *, enum exception_code);
 
 extern void die(const char *, struct pt_regs *, enum exception_code)
-	ATTRIB_NORET;
+	__noreturn;
 
 extern int die_if_no_fixup(const char *, struct pt_regs *, enum exception_code);
 
2  arch/parisc/include/asm/processor.h
@@ -196,7 +196,6 @@ typedef unsigned int elf_caddr_t;
 	/* offset pc for priv. level */			\
 	pc |= 3;					\
 							\
-	set_fs(USER_DS);				\
 	regs->iasq[0] = spaceid;			\
 	regs->iasq[1] = spaceid;			\
 	regs->iaoq[0] = pc;				\
@@ -299,7 +298,6 @@ on downward growing arches, it looks like this:
 	elf_addr_t pc = (elf_addr_t)new_pc | 3;		\
 	elf_caddr_t *argv = (elf_caddr_t *)bprm->exec + 1;	\
 							\
-	set_fs(USER_DS);				\
 	regs->iasq[0] = spaceid;			\
 	regs->iasq[1] = spaceid;			\
 	regs->iaoq[0] = pc;				\
1  arch/parisc/kernel/process.c
@@ -192,7 +192,6 @@ void flush_thread(void)
 	/* Only needs to handle fpu stuff or perf monitors.
 	** REVISIT: several arches implement a "lazy fpu state".
 	*/
-	set_fs(USER_DS);
 }
 
 void release_thread(struct task_struct *dead_task)
4  arch/powerpc/kernel/machine_kexec_32.c
@@ -16,10 +16,10 @@
 #include <asm/hw_irq.h>
 #include <asm/io.h>
 
-typedef NORET_TYPE void (*relocate_new_kernel_t)(
+typedef void (*relocate_new_kernel_t)(
 				unsigned long indirection_page,
 				unsigned long reboot_code_buffer,
-				unsigned long start_address) ATTRIB_NORET;
+				unsigned long start_address) __noreturn;
 
 /*
  * This is a generic machine_kexec function suitable at least for
6  arch/powerpc/kernel/machine_kexec_64.c
@@ -307,9 +307,9 @@ static union thread_union kexec_stack __init_task_data =
 struct paca_struct kexec_paca;
 
 /* Our assembly helper, in kexec_stub.S */
-extern NORET_TYPE void kexec_sequence(void *newstack, unsigned long start,
-					void *image, void *control,
-					void (*clear_all)(void)) ATTRIB_NORET;
+extern void kexec_sequence(void *newstack, unsigned long start,
+			   void *image, void *control,
+			   void (*clear_all)(void)) __noreturn;
 
 /* too late to fail here */
 void default_machine_kexec(struct kimage *image)
2  arch/powerpc/mm/numa.c
@@ -58,7 +58,7 @@ static int distance_lookup_table[MAX_NUMNODES][MAX_DISTANCE_REF_POINTS];
  * Allocate node_to_cpumask_map based on number of available nodes
  * Requires node_possible_map to be valid.
  *
- * Note: node_to_cpumask() is not valid until after this is done.
+ * Note: cpumask_of_node() is not valid until after this is done.
  */
 static void __init setup_node_to_cpumask_map(void)
 {
1  arch/powerpc/platforms/pseries/nvram.c
@@ -638,7 +638,6 @@ static void oops_to_nvram(struct kmsg_dumper *dumper,
 		/* These are almost always orderly shutdowns. */
 		return;
 	case KMSG_DUMP_OOPS:
-	case KMSG_DUMP_KEXEC:
 		break;
 	case KMSG_DUMP_PANIC:
 		panicking = true;
2  arch/s390/include/asm/processor.h
@@ -236,7 +236,7 @@ static inline unsigned long __rewind_psw(psw_t psw, unsigned long ilc)
 /*
  * Function to drop a processor into disabled wait state
  */
-static inline void ATTRIB_NORET disabled_wait(unsigned long code)
+static inline void __noreturn disabled_wait(unsigned long code)
 {
         unsigned long ctl_buf;
         psw_t dw_psw;
2  arch/s390/kernel/nmi.c
@@ -30,7 +30,7 @@ struct mcck_struct {
 
 static DEFINE_PER_CPU(struct mcck_struct, cpu_mcck);
 
-static NORET_TYPE void s390_handle_damage(char *msg)
+static void s390_handle_damage(char *msg)
 {
 	smp_send_stop();
 	disabled_wait((unsigned long) __builtin_return_address(0));
2  arch/sh/kernel/process_32.c
@@ -70,7 +70,7 @@ void show_regs(struct pt_regs * regs)
 /*
  * Create a kernel thread
  */
-ATTRIB_NORET void kernel_thread_helper(void *arg, int (*fn)(void *))
+__noreturn void kernel_thread_helper(void *arg, int (*fn)(void *))
 {
 	do_exit(fn(arg));
 }
2  arch/sh/kernel/process_64.c
@@ -285,7 +285,7 @@ void show_regs(struct pt_regs *regs)
 /*
  * Create a kernel thread
  */
-ATTRIB_NORET void kernel_thread_helper(void *arg, int (*fn)(void *))
+__noreturn void kernel_thread_helper(void *arg, int (*fn)(void *))
 {
 	do_exit(fn(arg));
 }
6  arch/tile/kernel/machine_kexec.c
@@ -248,11 +248,11 @@ static void setup_quasi_va_is_pa(void)
 }
 
 
-NORET_TYPE void machine_kexec(struct kimage *image)
+void machine_kexec(struct kimage *image)
 {
 	void *reboot_code_buffer;
-	NORET_TYPE void (*rnk)(unsigned long, void *, unsigned long)
-		ATTRIB_NORET;
+	void (*rnk)(unsigned long, void *, unsigned long)
+		__noreturn;
 
 	/* Mask all interrupts before starting to reboot. */
 	interrupt_mask_set_mask(~0ULL);
3  arch/x86/Kconfig
@@ -60,6 +60,9 @@ config X86
 	select PERF_EVENTS
 	select HAVE_PERF_EVENTS_NMI
 	select ANON_INODES
+	select HAVE_ALIGNED_STRUCT_PAGE if SLUB && !M386
+	select HAVE_CMPXCHG_LOCAL if !M386
+	select HAVE_CMPXCHG_DOUBLE
 	select HAVE_ARCH_KMEMCHECK
 	select HAVE_USER_RETURN_NOTIFIER
 	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
6  arch/x86/Kconfig.cpu
@@ -309,12 +309,6 @@ config X86_INTERNODE_CACHE_SHIFT
 config X86_CMPXCHG
 	def_bool X86_64 || (X86_32 && !M386)
 
-config CMPXCHG_LOCAL
-	def_bool X86_64 || (X86_32 && !M386)
-
-config CMPXCHG_DOUBLE
-	def_bool y
-
 config X86_L1_CACHE_SHIFT
 	int
 	default "7" if MPENTIUM4 || MPSC
2  arch/x86/mm/numa.c
@@ -110,7 +110,7 @@ void __cpuinit numa_clear_node(int cpu)
  * Allocate node_to_cpumask_map based on number of available nodes
  * Requires node_possible_map to be valid.
  *
- * Note: node_to_cpumask() is not valid until after this is done.
+ * Note: cpumask_of_node() is not valid until after this is done.
  * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
  */
 void __init setup_node_to_cpumask_map(void)
8  arch/x86/um/Kconfig
@@ -6,14 +6,6 @@ menu "UML-specific options"
 
 menu "Host processor type and features"
 
-config CMPXCHG_LOCAL
-	bool
-	default n
-
-config CMPXCHG_DOUBLE
-	bool
-	default n
-
 source "arch/x86/Kconfig.cpu"
 
 endmenu
17  drivers/base/memory.c
@@ -295,11 +295,22 @@ static int memory_block_change_state(struct memory_block *mem,
 
 	ret = memory_block_action(mem->start_section_nr, to_state);
 
-	if (ret)
+	if (ret) {
 		mem->state = from_state_req;
-	else
-		mem->state = to_state;
+		goto out;
+	}
 
+	mem->state = to_state;
+	switch (mem->state) {
+	case MEM_OFFLINE:
+		kobject_uevent(&mem->dev.kobj, KOBJ_OFFLINE);
+		break;
+	case MEM_ONLINE:
+		kobject_uevent(&mem->dev.kobj, KOBJ_ONLINE);
+		break;
+	default:
+		break;
+	}
 out:
 	mutex_unlock(&mem->state_mutex);
 	return ret;
24  drivers/char/ramoops.c
@@ -83,8 +83,7 @@ static void ramoops_do_dump(struct kmsg_dumper *dumper,
 	struct timeval timestamp;
 
 	if (reason != KMSG_DUMP_OOPS &&
-	    reason != KMSG_DUMP_PANIC &&
-	    reason != KMSG_DUMP_KEXEC)
+	    reason != KMSG_DUMP_PANIC)
 		return;
 
 	/* Only dump oopses if dump_oops is set */
@@ -126,8 +125,8 @@ static int __init ramoops_probe(struct platform_device *pdev)
 		goto fail3;
 	}
 
-	rounddown_pow_of_two(pdata->mem_size);
-	rounddown_pow_of_two(pdata->record_size);
+	pdata->mem_size = rounddown_pow_of_two(pdata->mem_size);
+	pdata->record_size = rounddown_pow_of_two(pdata->record_size);
 
 	/* Check for the minimum memory size */
 	if (pdata->mem_size < MIN_MEM_SIZE &&
@@ -148,14 +147,6 @@ static int __init ramoops_probe(struct platform_device *pdev)
 	cxt->phys_addr = pdata->mem_address;
 	cxt->record_size = pdata->record_size;
 	cxt->dump_oops = pdata->dump_oops;
-	/*
-	 * Update the module parameter variables as well so they are visible
-	 * through /sys/module/ramoops/parameters/
-	 */
-	mem_size = pdata->mem_size;
-	mem_address = pdata->mem_address;
-	record_size = pdata->record_size;
-	dump_oops = pdata->dump_oops;
 
 	if (!request_mem_region(cxt->phys_addr, cxt->size, "ramoops")) {
 		pr_err("request mem region failed\n");
@@ -176,6 +167,15 @@ static int __init ramoops_probe(struct platform_device *pdev)
 		goto fail1;
 	}
 
+	/*
+	 * Update the module parameter variables as well so they are visible
+	 * through /sys/module/ramoops/parameters/
+	 */
+	mem_size = pdata->mem_size;
+	mem_address = pdata->mem_address;
+	record_size = pdata->record_size;
+	dump_oops = pdata->dump_oops;
+
 	return 0;
 
 fail1:
3  drivers/mtd/mtdoops.c
@@ -315,8 +315,7 @@ static void mtdoops_do_dump(struct kmsg_dumper *dumper,
 	char *dst;
 
 	if (reason != KMSG_DUMP_OOPS &&
-	    reason != KMSG_DUMP_PANIC &&
-	    reason != KMSG_DUMP_KEXEC)
+	    reason != KMSG_DUMP_PANIC)
 		return;
 
 	/* Only dump oopses if dump_oops is set */
4  drivers/parport/parport_pc.c
@@ -3404,8 +3404,8 @@ static int __init parport_init_mode_setup(char *str)
 #endif
 
 #ifdef MODULE
-static const char *irq[PARPORT_PC_MAX_PORTS];
-static const char *dma[PARPORT_PC_MAX_PORTS];
+static char *irq[PARPORT_PC_MAX_PORTS];
+static char *dma[PARPORT_PC_MAX_PORTS];
 
 MODULE_PARM_DESC(io, "Base I/O address (SPP regs)");
 module_param_array(io, int, NULL, 0);
6  drivers/video/nvidia/nvidia.c
@@ -81,7 +81,7 @@ static int vram __devinitdata = 0;
 static int bpp __devinitdata = 8;
 static int reverse_i2c __devinitdata;
 #ifdef CONFIG_MTRR
-static int nomtrr __devinitdata = 0;
+static bool nomtrr __devinitdata = false;
 #endif
 #ifdef CONFIG_PMAC_BACKLIGHT
 static int backlight __devinitdata = 1;
@@ -1509,7 +1509,7 @@ static int __devinit nvidiafb_setup(char *options)
 			backlight = simple_strtoul(this_opt+10, NULL, 0);
 #ifdef CONFIG_MTRR
 		} else if (!strncmp(this_opt, "nomtrr", 6)) {
-			nomtrr = 1;
+			nomtrr = true;
 #endif
 		} else if (!strncmp(this_opt, "fpdither:", 9)) {
 			fpdither = simple_strtol(this_opt+9, NULL, 0);
@@ -1599,7 +1599,7 @@ MODULE_PARM_DESC(bpp, "pixel width in bits"
 module_param(reverse_i2c, int, 0);
 MODULE_PARM_DESC(reverse_i2c, "reverse port assignment of the i2c bus");
 #ifdef CONFIG_MTRR
-module_param(nomtrr, bool, 0);
+module_param(nomtrr, bool, false);
 MODULE_PARM_DESC(nomtrr, "Disables MTRR support (0 or 1=disabled) "
 		 "(default=0)");
 #endif
3  fs/block_dev.c
@@ -1139,6 +1139,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 	mutex_lock_nested(&bdev->bd_mutex, for_part);
 	if (!bdev->bd_openers) {
 		bdev->bd_disk = disk;
+		bdev->bd_queue = disk->queue;
 		bdev->bd_contains = bdev;
 		if (!partno) {
 			struct backing_dev_info *bdi;
@@ -1159,6 +1160,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 					disk_put_part(bdev->bd_part);
 					bdev->bd_part = NULL;
 					bdev->bd_disk = NULL;
+					bdev->bd_queue = NULL;
 					mutex_unlock(&bdev->bd_mutex);
 					disk_unblock_events(disk);
 					put_disk(disk);
@@ -1232,6 +1234,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 	disk_put_part(bdev->bd_part);
 	bdev->bd_disk = NULL;
 	bdev->bd_part = NULL;
+	bdev->bd_queue = NULL;
 	bdev_inode_switch_bdi(bdev->bd_inode, &default_backing_dev_info);
 	if (bdev != bdev->bd_contains)
 		__blkdev_put(bdev->bd_contains, mode, 1);
5  fs/btrfs/disk-io.c
@@ -872,7 +872,8 @@ static int btree_submit_bio_hook(struct inode *inode, int rw, struct bio *bio,
 
 #ifdef CONFIG_MIGRATION
 static int btree_migratepage(struct address_space *mapping,
-			struct page *newpage, struct page *page)
+			struct page *newpage, struct page *page,
+			enum migrate_mode mode)
 {
 	/*
 	 * we can't safely write a btree page from here,
@@ -887,7 +888,7 @@ static int btree_migratepage(struct address_space *mapping,
 	if (page_has_private(page) &&
 	    !try_to_release_page(page, GFP_KERNEL))
 		return -EAGAIN;
-	return migrate_page(mapping, newpage, page);
+	return migrate_page(mapping, newpage, page, mode);
 }
 #endif
 
57  fs/direct-io.c
@@ -36,6 +36,7 @@
 #include <linux/rwsem.h>
 #include <linux/uio.h>
 #include <linux/atomic.h>
+#include <linux/prefetch.h>
 
 /*
  * How many user pages to map in one call to get_user_pages().  This determines
@@ -580,9 +581,8 @@ static int get_more_blocks(struct dio *dio, struct dio_submit *sdio,
 {
 	int ret;
 	sector_t fs_startblk;	/* Into file, in filesystem-sized blocks */
+	sector_t fs_endblk;	/* Into file, in filesystem-sized blocks */
 	unsigned long fs_count;	/* Number of filesystem-sized blocks */
-	unsigned long dio_count;/* Number of dio_block-sized blocks */
-	unsigned long blkmask;
 	int create;
 
 	/*
@@ -593,11 +593,9 @@ static int get_more_blocks(struct dio *dio, struct dio_submit *sdio,
 	if (ret == 0) {
 		BUG_ON(sdio->block_in_file >= sdio->final_block_in_request);
 		fs_startblk = sdio->block_in_file >> sdio->blkfactor;
-		dio_count = sdio->final_block_in_request - sdio->block_in_file;
-		fs_count = dio_count >> sdio->blkfactor;
-		blkmask = (1 << sdio->blkfactor) - 1;
-		if (dio_count & blkmask)	
-			fs_count++;
+		fs_endblk = (sdio->final_block_in_request - 1) >>
+					sdio->blkfactor;
+		fs_count = fs_endblk - fs_startblk + 1;
 
 		map_bh->b_state = 0;
 		map_bh->b_size = fs_count << dio->inode->i_blkbits;
@@ -1090,8 +1088,8 @@ static inline int drop_refcount(struct dio *dio)
 * individual fields and will generate much worse code. This is important
 * for the whole file.
 */
-ssize_t
-__blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
+static inline ssize_t
+do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 	struct block_device *bdev, const struct iovec *iov, loff_t offset, 
 	unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
 	dio_submit_t submit_io,	int flags)
@@ -1100,7 +1098,6 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 	size_t size;
 	unsigned long addr;
 	unsigned blkbits = inode->i_blkbits;
-	unsigned bdev_blkbits = 0;
 	unsigned blocksize_mask = (1 << blkbits) - 1;
 	ssize_t retval = -EINVAL;
 	loff_t end = offset;
@@ -1113,12 +1110,14 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 	if (rw & WRITE)
 		rw = WRITE_ODIRECT;
 
-	if (bdev)
-		bdev_blkbits = blksize_bits(bdev_logical_block_size(bdev));
+	/*
+	 * Avoid references to bdev if not absolutely needed to give
+	 * the early prefetch in the caller enough time.
+	 */
 
 	if (offset & blocksize_mask) {
 		if (bdev)
-			 blkbits = bdev_blkbits;
+			blkbits = blksize_bits(bdev_logical_block_size(bdev));
 		blocksize_mask = (1 << blkbits) - 1;
 		if (offset & blocksize_mask)
 			goto out;
@@ -1129,11 +1128,13 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 		addr = (unsigned long)iov[seg].iov_base;
 		size = iov[seg].iov_len;
 		end += size;
-		if ((addr & blocksize_mask) || (size & blocksize_mask))  {
+		if (unlikely((addr & blocksize_mask) ||
+			     (size & blocksize_mask))) {
 			if (bdev)
-				 blkbits = bdev_blkbits;
+				blkbits = blksize_bits(
+					 bdev_logical_block_size(bdev));
 			blocksize_mask = (1 << blkbits) - 1;
-			if ((addr & blocksize_mask) || (size & blocksize_mask))  
+			if ((addr & blocksize_mask) || (size & blocksize_mask))
 				goto out;
 		}
 	}
@@ -1316,6 +1317,30 @@ __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 out:
 	return retval;
 }
+
+ssize_t
+__blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
+	struct block_device *bdev, const struct iovec *iov, loff_t offset,
+	unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
+	dio_submit_t submit_io,	int flags)
+{
+	/*
+	 * The block device state is needed in the end to finally
+	 * submit everything.  Since it's likely to be cache cold
+	 * prefetch it here as first thing to hide some of the
+	 * latency.
+	 *
+	 * Attempt to prefetch the pieces we likely need later.
+	 */
+	prefetch(&bdev->bd_disk->part_tbl);
+	prefetch(bdev->bd_queue);
+	prefetch((char *)bdev->bd_queue + SMP_CACHE_BYTES);
+
+	return do_blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
+				     nr_segs, get_block, end_io,
+				     submit_io, flags);
+}
+
 EXPORT_SYMBOL(__blockdev_direct_IO);
 
 static __init int dio_init(void)
234  fs/eventpoll.c
@@ -197,6 +197,12 @@ struct eventpoll {
 
 	/* The user that created the eventpoll descriptor */
 	struct user_struct *user;
+
+	struct file *file;
+
+	/* used to optimize loop detection check */
+	int visited;
+	struct list_head visited_list_link;
 };
 
 /* Wait structure used by the poll hooks */
@@ -255,6 +261,15 @@ static struct kmem_cache *epi_cache __read_mostly;
 /* Slab cache used to allocate "struct eppoll_entry" */
 static struct kmem_cache *pwq_cache __read_mostly;
 
+/* Visited nodes during ep_loop_check(), so we can unset them when we finish */
+static LIST_HEAD(visited_list);
+
+/*
+ * List of files with newly added links, where we may need to limit the number
+ * of emanating paths. Protected by the epmutex.
+ */
+static LIST_HEAD(tfile_check_list);
+
 #ifdef CONFIG_SYSCTL
 
 #include <linux/sysctl.h>
@@ -276,6 +291,12 @@ ctl_table epoll_table[] = {
 };
 #endif /* CONFIG_SYSCTL */
 
+static const struct file_operations eventpoll_fops;
+
+static inline int is_file_epoll(struct file *f)
+{
+	return f->f_op == &eventpoll_fops;
+}
 
 /* Setup the structure that is used as key for the RB tree */
 static inline void ep_set_ffd(struct epoll_filefd *ffd,
@@ -711,12 +732,6 @@ static const struct file_operations eventpoll_fops = {
 	.llseek		= noop_llseek,
 };
 
-/* Fast test to see if the file is an eventpoll file */
-static inline int is_file_epoll(struct file *f)
-{
-	return f->f_op == &eventpoll_fops;
-}
-
 /*
 * This is called from eventpoll_release() to unlink files from the eventpoll
 * interface. We need to have this facility to cleanup correctly files that are
@@ -926,6 +941,99 @@ static void ep_rbtree_insert(struct eventpoll *ep, struct epitem *epi)
 	rb_insert_color(&epi->rbn, &ep->rbr);
 }
 
+
+
+#define PATH_ARR_SIZE 5
+/*
+ * These are the number paths of length 1 to 5, that we are allowing to emanate
+ * from a single file of interest. For example, we allow 1000 paths of length
+ * 1, to emanate from each file of interest. This essentially represents the
+ * potential wakeup paths, which need to be limited in order to avoid massive
+ * uncontrolled wakeup storms. The common use case should be a single ep which
+ * is connected to n file sources. In this case each file source has 1 path
+ * of length 1. Thus, the numbers below should be more than sufficient. These
+ * path limits are enforced during an EPOLL_CTL_ADD operation, since a modify
+ * and delete can't add additional paths. Protected by the epmutex.
+ */
+static const int path_limits[PATH_ARR_SIZE] = { 1000, 500, 100, 50, 10 };
+static int path_count[PATH_ARR_SIZE];
+
+static int path_count_inc(int nests)
+{
+	if (++path_count[nests] > path_limits[nests])
+		return -1;
+	return 0;
+}
+
+static void path_count_init(void)
+{
+	int i;
+
+	for (i = 0; i < PATH_ARR_SIZE; i++)
+		path_count[i] = 0;
+}
+
+static int reverse_path_check_proc(void *priv, void *cookie, int call_nests)
+{
+	int error = 0;
+	struct file *file = priv;
+	struct file *child_file;
+	struct epitem *epi;
+
+	list_for_each_entry(epi, &file->f_ep_links, fllink) {
+		child_file = epi->ep->file;
+		if (is_file_epoll(child_file)) {
+			if (list_empty(&child_file->f_ep_links)) {
+				if (path_count_inc(call_nests)) {
+					error = -1;
+					break;
+				}
+			} else {
+				error = ep_call_nested(&poll_loop_ncalls,
+							EP_MAX_NESTS,
+							reverse_path_check_proc,
+							child_file, child_file,
+							current);
+			}
+			if (error != 0)
+				break;
+		} else {
+			printk(KERN_ERR "reverse_path_check_proc: "
+				"file is not an ep!\n");
+		}
+	}
+	return error;
+}
+
+/**
+ * reverse_path_check - The tfile_check_list is list of file *, which have
+ *                      links that are proposed to be newly added. We need to
+ *                      make sure that those added links don't add too many
+ *                      paths such that we will spend all our time waking up
+ *                      eventpoll objects.
+ *
+ * Returns: Returns zero if the proposed links don't create too many paths,
+ *	    -1 otherwise.
+ */
+static int reverse_path_check(void)
+{
+	int length = 0;
+	int error = 0;
+	struct file *current_file;
+
+	/* let's call this for all tfiles */
+	list_for_each_entry(current_file, &tfile_check_list, f_tfile_llink) {
+		length++;
+		path_count_init();
+		error = ep_call_nested(&poll_loop_ncalls, EP_MAX_NESTS,
+					reverse_path_check_proc, current_file,
+					current_file, current);
+		if (error)
+			break;
+	}
+	return error;
+}
+
 /*
 * Must be called with "mtx" held.
 */
@@ -987,6 +1095,11 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
 	 */
 	ep_rbtree_insert(ep, epi);
 
+	/* now check if we've created too many backpaths */
+	error = -EINVAL;
+	if (reverse_path_check())
+		goto error_remove_epi;
+
 	/* We have to drop the new item inside our item list to keep track of it */
 	spin_lock_irqsave(&ep->lock, flags);
 
@@ -1011,6 +1124,14 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
 
 	return 0;
 
+error_remove_epi:
+	spin_lock(&tfile->f_lock);
+	if (ep_is_linked(&epi->fllink))
+		list_del_init(&epi->fllink);
+	spin_unlock(&tfile->f_lock);
+
+	rb_erase(&epi->rbn, &ep->rbr);
+
 error_unregister:
 	ep_unregister_pollwait(ep, epi);
 
@@ -1275,18 +1396,36 @@ static int ep_loop_check_proc(void *priv, void *cookie, int call_nests)
 	int error = 0;
 	struct file *file = priv;
 	struct eventpoll *ep = file->private_data;
+	struct eventpoll *ep_tovisit;
 	struct rb_node *rbp;
 	struct epitem *epi;
 
 	mutex_lock_nested(&ep->mtx, call_nests + 1);
+	ep->visited = 1;
+	list_add(&ep->visited_list_link, &visited_list);
 	for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
 		epi = rb_entry(rbp, struct epitem, rbn);
 		if (unlikely(is_file_epoll(epi->ffd.file))) {
+			ep_tovisit = epi->ffd.file->private_data;
+			if (ep_tovisit->visited)
+				continue;
 			error = ep_call_nested(&poll_loop_ncalls, EP_MAX_NESTS,
-					       ep_loop_check_proc, epi->ffd.file,
-					       epi->ffd.file->private_data, current);
+					ep_loop_check_proc, epi->ffd.file,
+					ep_tovisit, current);
 			if (error != 0)
 				break;
+		} else {
+			/*
+			 * If we've reached a file that is not associated with
+			 * an ep, then we need to check if the newly added
+			 * links are going to add too many wakeup paths. We do
+			 * this by adding it to the tfile_check_list, if it's
+			 * not already there, and calling reverse_path_check()
+			 * during ep_insert().
+			 */
+			if (list_empty(&epi->ffd.file->f_tfile_llink))
+				list_add(&epi->ffd.file->f_tfile_llink,
+					 &tfile_check_list);
 		}
 	}
 	mutex_unlock(&ep->mtx);
@@ -1307,8 +1446,31 @@ static int ep_loop_check_proc(void *priv, void *cookie, int call_nests)
 */
static int ep_loop_check(struct eventpoll *ep, struct file *file)
{
-	return ep_call_nested(&poll_loop_ncalls, EP_MAX_NESTS,
+	int ret;
+	struct eventpoll *ep_cur, *ep_next;
+
+	ret = ep_call_nested(&poll_loop_ncalls, EP_MAX_NESTS,
 			      ep_loop_check_proc, file, ep, current);
+	/* clear visited list */
+	list_for_each_entry_safe(ep_cur, ep_next, &visited_list,
+							visited_list_link) {
+		ep_cur->visited = 0;
+		list_del(&ep_cur->visited_list_link);
+	}
+	return ret;
+}
+
+static void clear_tfile_check_list(void)
+{
+	struct file *file;
+
+	/* first clear the tfile_check_list */
+	while (!list_empty(&tfile_check_list)) {
+		file = list_first_entry(&tfile_check_list, struct file,
+					f_tfile_llink);
+		list_del_init(&file->f_tfile_llink);
+	}
+	INIT_LIST_HEAD(&tfile_check_list);
}
 
 /*
@@ -1316,8 +1478,9 @@ static int ep_loop_check(struct eventpoll *ep, struct file *file)
 */
 SYSCALL_DEFINE1(epoll_create1, int, flags)
 {
-	int error;
+	int error, fd;
 	struct eventpoll *ep = NULL;
+	struct file *file;
 
 	/* Check the EPOLL_* constant for consistency.  */
 	BUILD_BUG_ON(EPOLL_CLOEXEC != O_CLOEXEC);
@@ -1334,11 +1497,25 @@ SYSCALL_DEFINE1(epoll_create1, int, flags)
 	 * Creates all the items needed to setup an eventpoll file. That is,
 	 * a file structure and a free file descriptor.
 	 */
-	error = anon_inode_getfd("[eventpoll]", &eventpoll_fops, ep,
+	fd = get_unused_fd_flags(O_RDWR | (flags & O_CLOEXEC));
+	if (fd < 0) {
+		error = fd;
+		goto out_free_ep;
+	}
+	file = anon_inode_getfile("[eventpoll]", &eventpoll_fops, ep,
				 O_RDWR | (flags & O_CLOEXEC));
-	if (error < 0)
-		ep_free(ep);
-
+	if (IS_ERR(file)) {
+		error = PTR_ERR(file);
+		goto out_free_fd;
+	}
+	fd_install(fd, file);
+	ep->file = file;
+	return fd;
+
+out_free_fd:
+	put_unused_fd(fd);
+out_free_ep:
+	ep_free(ep);
 	return error;
 }
 
@@ -1404,21 +1581,27 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 	/*
 	 * When we insert an epoll file descriptor, inside another epoll file
 	 * descriptor, there is the change of creating closed loops, which are
-	 * better be handled here, than in more critical paths.
+	 * better be handled here, than in more critical paths. While we are
+	 * checking for loops we also determine the list of files reachable
+	 * and hang them on the tfile_check_list, so we can check that we
+	 * haven't created too many possible wakeup paths.
 	 *
-	 * We hold epmutex across the loop check and the insert in this case, in
-	 * order to prevent two separate inserts from racing and each doing the
-	 * insert "at the same time" such that ep_loop_check passes on both
-	 * before either one does the insert, thereby creating a cycle.
+	 * We need to hold the epmutex across both ep_insert and ep_remove
+	 * b/c we want to make sure we are looking at a coherent view of
+	 * epoll network.
	 */
-	if (unlikely(is_file_epoll(tfile) && op == EPOLL_CTL_ADD)) {
+	if (op == EPOLL_CTL_ADD || op == EPOLL_CTL_DEL) {
 		mutex_lock(&epmutex);
 		did_lock_epmutex = 1;
-		error = -ELOOP;
-		if (ep_loop_check(ep, tfile) != 0)
-			goto error_tgt_fput;
 	}
-
+	if (op == EPOLL_CTL_ADD) {
+		if (is_file_epoll(tfile)) {
+			error = -ELOOP;
+			if (ep_loop_check(ep, tfile) != 0)
+				goto error_tgt_fput;
+		} else
+			list_add(&tfile->f_tfile_llink, &tfile_check_list);
+	}
 
 	mutex_lock_nested(&ep->mtx, 0);
 
@@ -1437,6 +1620,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 			error = ep_insert(ep, &epds, tfile, fd);
 		} else
 			error = -EEXIST;
+		clear_tfile_check_list();
 		break;
 	case EPOLL_CTL_DEL:
 		if (epi)
@@ -1455,7 +1639,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 	mutex_unlock(&ep->mtx);
 
 error_tgt_fput:
-	if (unlikely(did_lock_epmutex))
+	if (did_lock_epmutex)
 	mutex_unlock(&epmutex);
 
 	fput(tfile);
3  fs/hugetlbfs/inode.c
@@ -583,7 +583,8 @@ static int hugetlbfs_set_page_dirty(struct page *page)
 }
 
 static int hugetlbfs_migrate_page(struct address_space *mapping,
-				struct page *newpage, struct page *page)
+				struct page *newpage, struct page *page,
+				enum migrate_mode mode)
 {
 	int rc;
 
2  fs/nfs/internal.h
@@ -332,7 +332,7 @@ void nfs_commit_release_pages(struct nfs_write_data *data);
 
 #ifdef CONFIG_MIGRATION
 extern int nfs_migrate_page(struct address_space *,
-		struct page *, struct page *);
+		struct page *, struct page *, enum migrate_mode);
 #else
 #define nfs_migrate_page NULL
 #endif
4  fs/nfs/write.c
@@ -1688,7 +1688,7 @@ int nfs_wb_page(struct inode *inode, struct page *page)
 
 #ifdef CONFIG_MIGRATION
 int nfs_migrate_page(struct address_space *mapping, struct page *newpage,
-		struct page *page)
+		struct page *page, enum migrate_mode mode)
 {
 	/*
 	 * If PagePrivate is set, then the page is currently associated with
@@ -1703,7 +1703,7 @@ int nfs_migrate_page(struct address_space *mapping, struct page *newpage,
 
 	nfs_fscache_release_page(page, GFP_KERNEL);
 
-	return migrate_page(mapping, newpage, page);
+	return migrate_page(mapping, newpage, page, mode);
 }
 #endif
 
2  fs/pipe.c
@@ -1137,7 +1137,7 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long nr_pages)
 	if (nr_pages < pipe->nrbufs)
 		return -EBUSY;
 
-	bufs = kcalloc(nr_pages, sizeof(struct pipe_buffer), GFP_KERNEL);
+	bufs = kcalloc(nr_pages, sizeof(*bufs), GFP_KERNEL | __GFP_NOWARN);
 	if (unlikely(!bufs))
 		return -ENOMEM;
 
7  fs/proc/array.c
@@ -464,7 +464,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 
 	seq_printf(m, "%d (%s) %c %d %d %d %d %d %u %lu \
%lu %lu %lu %lu %lu %ld %ld %ld %ld %d 0 %llu %lu %ld %lu %lu %lu %lu %lu \
-%lu %lu %lu %lu %lu %lu %lu %lu %d %d %u %u %llu %lu %ld\n",
+%lu %lu %lu %lu %lu %lu %lu %lu %d %d %u %u %llu %lu %ld %lu %lu %lu\n",
 		pid_nr_ns(pid, ns),
 		tcomm,
 		state,
@@ -511,7 +511,10 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 		task->policy,
 		(unsigned long long)delayacct_blkio_ticks(task),
 		cputime_to_clock_t(gtime),
-		cputime_to_clock_t(cgtime));
+		cputime_to_clock_t(cgtime),