Qu-Wenruo/btrf…

Commits on Jan 7, 2016

  1. btrfs: dedup: Add ioctl for inband deduplication

    Add ioctl interface for inband deduplication, which includes:
    1) enable
    2) disable
    3) status
    
    We will later add an ioctl to disable inband dedup for a given file/dir.
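
    A minimal userspace sketch of driving such an interface follows; the
    ioctl number, structure layout, and command values are illustrative
    assumptions, not the patch's actual ABI:

        #include <stdint.h>
        #include <sys/ioctl.h>

        /* Hypothetical command values and argument layout. */
        #define DEDUP_CTL_ENABLE   1
        #define DEDUP_CTL_DISABLE  2
        #define DEDUP_CTL_STATUS   3

        struct dedup_args {
                uint16_t cmd;        /* enable/disable/status */
                uint16_t hash_type;  /* SHA256 */
                uint64_t blocksize;  /* dedup unit size */
                uint64_t limit_nr;   /* in-memory hash count limit */
        };

        /* 0x94 is the btrfs ioctl magic; the number 60 is made up. */
        #define IOC_DEDUP_CTL _IOWR(0x94, 60, struct dedup_args)

        static int dedup_enable(int fd, uint64_t blocksize)
        {
                struct dedup_args args = {
                        .cmd = DEDUP_CTL_ENABLE,
                        .blocksize = blocksize,
                };
                return ioctl(fd, IOC_DEDUP_CTL, &args);
        }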
    
    Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
    wangxiaoguang authored and fengguang committed Jan 7, 2016
  2. btrfs: dedup: Add support for adding hash for on-disk backend

    The on-disk backend can now add hashes.
    
    Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
    Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    Qu Wenruo authored and fengguang committed Jan 7, 2016
  3. btrfs: dedup: Add support to delete hash for on-disk backend

    The on-disk backend can now delete hashes.
    
    Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
    Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    Qu Wenruo authored and fengguang committed Jan 7, 2016
  4. btrfs: dedup: Add support for on-disk hash search

    The on-disk backend can now search for hashes.
    
    Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
    Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    Qu Wenruo authored and fengguang committed Jan 7, 2016
  5. btrfs: dedup: Introduce interfaces to resume and cleanup dedup info

    Since we will introduce a new on-disk dedup backend, introduce new
    interfaces to resume a previous dedup setup.

    And since we introduce a new tree for status, also add a disable
    handler for it.
    
    Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
    Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    Qu Wenruo authored and fengguang committed Jan 7, 2016
  6. btrfs: dedup: Add basic tree structure for on-disk dedup method

    Introduce a new tree, the dedup tree, to record on-disk dedup hashes,
    as persistent hash storage instead of an in-memory-only implementation.

    Unlike Liu Bo's implementation, this version doesn't hack the
    bytenr -> hash search into existing items, but adds a new type,
    DEDUP_BYTENR_ITEM, for that search case, just like the in-memory
    backend.
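
    Conceptually, the two item types give the tree both lookup directions;
    a sketch of the idea (the exact objectid/type/offset encoding is
    whatever the patch defines):

        /*
         * DEDUP_HASH_ITEM:   hash -> bytenr. Used on the write path to
         *                    find an existing extent holding identical
         *                    data, keyed by (part of) the hash.
         *
         * DEDUP_BYTENR_ITEM: bytenr -> hash. Used when an extent dies to
         *                    locate and delete its hash, keyed by the
         *                    extent bytenr.
         */
        struct key {       /* stand-in for the kernel's struct btrfs_key */
                unsigned long long objectid; /* hash index or bytenr */
                unsigned char      type;     /* which of the two items */
                unsigned long long offset;   /* other half of the map */
        };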
    
    Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
    Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
    Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    Qu Wenruo authored and fengguang committed Jan 7, 2016
  7. btrfs: dedup: Inband in-memory only de-duplication implementation

    Core implementation of inband de-duplication.
    It reuses the async_cow_start() facility to calculate the dedup hash,
    and uses that hash to do inband de-duplication at the extent level.

    The workflow is as follows:
    1) Run the delalloc range for an inode
    2) Calculate the hash for the delalloc range, in units of dedup_bs
    3) For the hash-match (duplicated) case, just increase the source
       extent ref and insert a file extent.
       For the hash-miss case, go through the normal cow_file_range()
       fallback, and add the hash into the dedup tree.
       Compression for the hash-miss case is not supported yet.

    The current implementation stores all dedup hashes in an in-memory
    rb-tree, with LRU behavior to control the limit.
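
    In rough C-level pseudocode (every helper name below is illustrative;
    the real work happens in the reworked async_cow/cow_file_range paths),
    the flow is:

        typedef unsigned long long u64;
        typedef unsigned char u8;

        /* Illustrative stand-ins, not patch symbols. */
        int  calc_sha256(int fd, u64 off, u64 len, u8 *hash);
        int  dedup_search(const u8 *hash, u64 *src_bytenr); /* 0 on hit */
        void inc_extent_ref(u64 bytenr);
        void insert_file_extent(int fd, u64 off, u64 bytenr);
        u64  cow_range(int fd, u64 off, u64 len);
        void dedup_add(const u8 *hash, u64 bytenr);

        static int run_dedup_range(int fd, u64 start, u64 len, u64 dedup_bs)
        {
                u64 cur;

                for (cur = start; cur < start + len; cur += dedup_bs) {
                        u8 hash[32];
                        u64 src;

                        calc_sha256(fd, cur, dedup_bs, hash);   /* step 2 */
                        if (dedup_search(hash, &src) == 0) {
                                inc_extent_ref(src);      /* step 3, hit */
                                insert_file_extent(fd, cur, src);
                        } else {                          /* step 3, miss */
                                u64 bytenr = cow_range(fd, cur, dedup_bs);
                                dedup_add(hash, bytenr);
                        }
                }
                return 0;
        }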
    
    Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
    wangxiaoguang authored and fengguang committed Jan 7, 2016
  8. btrfs: ordered-extent: Add support for dedup

    Add ordered-extent support for dedup.
    
    Note that the current ordered-extent support only handles
    non-compressed source extents.
    Support for compressed source extents will be added later.
    
    Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
    wangxiaoguang authored and fengguang committed Jan 7, 2016
  9. btrfs: dedup: Implement btrfs_dedup_calc_hash interface

    Whichever dedup backend (in-memory or on-disk) is used, only the
    SHA256 hash algorithm is supported so far, so implement the
    btrfs_dedup_calc_hash() interface using SHA256.
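
    The kernel's synchronous hash ("shash") API is the natural tool for
    this. A minimal sketch of hashing one dedup block with it (simplified;
    the real code manages the transform's lifetime differently):

        #include <crypto/hash.h>
        #include <linux/err.h>

        static int dedup_calc_sha256(const u8 *data, unsigned int len,
                                     u8 *out)
        {
                struct crypto_shash *tfm;
                int ret;

                tfm = crypto_alloc_shash("sha256", 0, 0);
                if (IS_ERR(tfm))
                        return PTR_ERR(tfm);

                {
                        SHASH_DESC_ON_STACK(desc, tfm);

                        desc->tfm = tfm;
                        ret = crypto_shash_digest(desc, data, len, out);
                }

                crypto_free_shash(tfm);
                return ret; /* 'out' holds the 32-byte SHA256 digest */
        }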
    
    Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
    wangxiaoguang authored and fengguang committed Jan 7, 2016
  10. btrfs: dedup: Introduce function to search for an existing hash

    Introduce static function inmem_search() to handle the job for in-memory
    hash tree.
    
    The trick is, we must ensure the delayed ref head is not being run at
    the time we search for the hash.
    
    With inmem_search(), we can implement the btrfs_dedup_search()
    interface.
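
    Conceptually (all helper names below are hypothetical, not the patch's
    symbols), the serialization looks like:

        typedef unsigned long long u64;

        int  try_lock_delayed_ref_head(u64 bytenr);     /* hypothetical */
        void unlock_delayed_ref_head(u64 bytenr);       /* hypothetical */
        void inc_extent_ref(u64 bytenr);                /* hypothetical */

        /* Before trusting a hash hit, lock the delayed-ref head for the
         * found extent so it cannot be run -- and the extent freed --
         * between the hash lookup and the ref increase. */
        static int use_hash_hit(u64 bytenr)
        {
                if (!try_lock_delayed_ref_head(bytenr))
                        return -1;      /* head being run: treat as miss */

                inc_extent_ref(bytenr); /* safe while the head is held */
                unlock_delayed_ref_head(bytenr);
                return 0;
        }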
    
    Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
    wangxiaoguang authored and fengguang committed Jan 7, 2016
  11. btrfs: delayed_ref: Add support for handling dedup hash

    Add support for delayed_ref to handle dedup_hash.
    
    Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
    Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    Qu Wenruo authored and fengguang committed Jan 7, 2016
  12. btrfs: delayed-ref: Add support for atomically increasing an extent ref

    Slightly modify btrfs_add_delayed_data_ref() to allow it to accept
    GFP_ATOMIC, and to allow it to be called inside a spinlock.
    
    This is used by later dedup patches.
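
    The pattern is simply threading a gfp_t down to the slab allocation; a
    simplified sketch (the real function's signature carries many more
    parameters):

        #include <linux/slab.h>

        static struct btrfs_delayed_data_ref *alloc_data_ref(gfp_t gfp)
        {
                /* Callers under a spinlock pass GFP_ATOMIC; the normal
                 * path keeps using GFP_NOFS. */
                return kmem_cache_alloc(btrfs_delayed_data_ref_cachep, gfp);
        }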
    
    Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    Qu Wenruo authored and fengguang committed Jan 7, 2016
  13. btrfs: dedup: Introduce function to remove hash from in-memory tree

    Introduce the static function inmem_del() to remove a hash from the
    in-memory dedup tree, and implement the btrfs_dedup_del() and
    btrfs_dedup_destroy() interfaces.
    
    Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
    wangxiaoguang authored and fengguang committed Jan 7, 2016
  14. btrfs: dedup: Introduce function to add hash into in-memory tree

    Introduce the static function inmem_add() to add a hash into the
    in-memory tree. With it we can implement the btrfs_dedup_add()
    interface, sketched below.
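
    A minimal sketch of such an insert using the kernel's rbtree API (the
    node layout is hypothetical; the real structure also carries the LRU
    list hook used for the limit handling):

        #include <linux/rbtree.h>
        #include <linux/string.h>
        #include <linux/errno.h>

        struct inmem_hash {                /* hypothetical node layout */
                struct rb_node node;
                u64 bytenr;
                u8  hash[32];              /* SHA256 of one dedup block */
        };

        static int inmem_insert(struct rb_root *root, struct inmem_hash *new)
        {
                struct rb_node **p = &root->rb_node;
                struct rb_node *parent = NULL;

                while (*p) {
                        struct inmem_hash *cur;
                        int cmp;

                        cur = rb_entry(*p, struct inmem_hash, node);
                        cmp = memcmp(new->hash, cur->hash, sizeof(new->hash));
                        parent = *p;
                        if (cmp < 0)
                                p = &(*p)->rb_left;
                        else if (cmp > 0)
                                p = &(*p)->rb_right;
                        else
                                return -EEXIST; /* already tracked */
                }
                rb_link_node(&new->node, parent, p);
                rb_insert_color(&new->node, root);
                return 0;
        }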
    
    Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
    wangxiaoguang authored and fengguang committed Jan 7, 2016
  15. btrfs: dedup: Introduce function to initialize dedup info

    Add a generic function to initialize dedup info.
    
    Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
    wangxiaoguang authored and fengguang committed Jan 7, 2016
  16. btrfs: dedup: Introduce dedup framework and its header

    Introduce the header for the btrfs online (write-time) de-duplication
    framework.

    The new de-duplication framework is going to support two different
    dedup backends and one dedup hash algorithm.
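
    Illustratively (constant names and values here are guesses at the
    shape, not the header's actual contents):

        /* Two backends, one hash type, as described above. */
        enum dedup_backend {
                DEDUP_BACKEND_INMEMORY = 0,  /* rb-tree, LRU-limited */
                DEDUP_BACKEND_ONDISK   = 1,  /* persistent dedup tree */
        };

        enum dedup_hash_type {
                DEDUP_HASH_SHA256 = 0,       /* the one supported hash */
        };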
    
    Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
    Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
    wangxiaoguang authored and fengguang committed Jan 7, 2016

Commits on Jan 6, 2016

  1. perf/x86/amd: Remove l1-dcache-stores event for AMD

    This is a long standing bug with the l1-dcache-stores generic event on
    AMD machines.  My perf_event testsuite has been complaining about this
    for years and I'm finally getting around to trying to get it fixed.
    
    The data_cache_refills:system event does not make sense for l1-dcache-stores.
    Maybe this was a typo and it was meant to be for l1-dcache-store-misses?
    
    In any case, the values returned are nowhere near correct for l1-dcache-stores
    and in fact the umask values for the event have completely changed with
    fam15h so it makes even less sense than ever.  So just remove it.
    
    Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Stephane Eranian <eranian@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1512091134350.24311@vincent-weaver-1.umelst.maine.edu
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    deater authored and Ingo Molnar committed Jan 6, 2016
  2. perf/x86/intel/uncore: Add Knights Landing uncore PMU support

    Knights Landing uncore performance monitoring (perfmon) is derived from
    Haswell-EP uncore perfmon with several differences. One notable difference
    is in PCI device IDs. Knights Landing uses common PCI device ID for
    multiple instances of an uncore PMU device type. In Haswell-EP, each
    instance of a PMU device type has a unique device ID.
    
    Knights Landing uncore components that have performance monitoring units
    are UBOX, CHA, EDC, MC, M2PCIe, IRP and PCU. Perfmon registers in EDC, MC,
    IRP, and M2PCIe reside in the PCIe configuration space. Perfmon registers
    in UBOX, CHA and PCU are accessed via the MSR interface.
    
    For more details, please refer to the public document:
    
      https://software.intel.com/sites/default/files/managed/15/8d/IntelXeonPhi%E2%84%A2x200ProcessorPerformanceMonitoringReferenceManual_Volume1_Registers_v0%206.pdf
    
    Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Andi Kleen <andi.kleen@intel.com>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Harish Chegondi <harish.chegondi@gmail.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Kan Liang <kan.liang@intel.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Stephane Eranian <eranian@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vince Weaver <vincent.weaver@maine.edu>
    Link: http://lkml.kernel.org/r/8ac513981264c3eb10343a3f523f19cc5a2d12fe.1449470704.git.harish.chegondi@intel.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    harishchegondi authored and Ingo Molnar committed Jan 6, 2016
  3. perf/x86/intel/uncore: Remove hard coding of PMON box control MSR offset

    Call uncore_pci_box_ctl() function to get the PMON box control MSR offset
    instead of hard coding the offset. This would allow us to use this
    snbep_uncore_pci_init_box() function for other PCI PMON devices whose box
    control MSR offset is different from SNBEP_PCI_PMON_BOX_CTL.
    
    Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Andi Kleen <andi.kleen@intel.com>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Harish Chegondi <harish.chegondi@gmail.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Kan Liang <kan.liang@intel.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Stephane Eranian <eranian@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vince Weaver <vincent.weaver@maine.edu>
    Link: http://lkml.kernel.org/r/872e8ef16cfc38e5ff3b45fac1094e6f1722e4ad.1449470704.git.harish.chegondi@intel.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    harishchegondi authored and Ingo Molnar committed Jan 6, 2016
  4. perf/x86/intel: Add perf core PMU support for Intel Knights Landing

    Knights Landing core is based on Silvermont core with several differences.
    Like Silvermont, Knights Landing has 8 pairs of LBR MSRs. However, the
    LBR MSR addresses match those of the Xeon cores' first 8 pairs of LBR
    MSRs. Unlike Silvermont, Knights Landing supports hyperthreading. The
    Knights Landing offcore response events config register mask is
    different from that of Silvermont.
    
    This patch was developed based on a patch from Andi Kleen.
    
    For more details, please refer to the public document:
    
      https://software.intel.com/sites/default/files/managed/15/8d/IntelXeonPhi%E2%84%A2x200ProcessorPerformanceMonitoringReferenceManual_Volume1_Registers_v0%206.pdf
    
    Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Andi Kleen <andi.kleen@intel.com>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Harish Chegondi <harish.chegondi@gmail.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Kan Liang <kan.liang@intel.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Stephane Eranian <eranian@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vince Weaver <vincent.weaver@maine.edu>
    Link: http://lkml.kernel.org/r/d14593c7311f78c93c9cf6b006be843777c5ad5c.1449517401.git.harish.chegondi@intel.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    harishchegondi authored and Ingo Molnar committed Jan 6, 2016
  5. perf/x86/intel/uncore: Add Broadwell-EP uncore support

    The uncore subsystem for Broadwell-EP is similar to Haswell-EP.
    There are some differences in pci device IDs, box number and
    constraints. This patch extends the Broadwell-DE codes to support
    Broadwell-EP.
    
    Signed-off-by: Kan Liang <kan.liang@intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Stephane Eranian <eranian@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vince Weaver <vincent.weaver@maine.edu>
    Link: http://lkml.kernel.org/r/1449176411-9499-1-git-send-email-kan.liang@intel.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    kliang2 authored and Ingo Molnar committed Jan 6, 2016
  6. perf/x86/rapl: Use unified perf_event_sysfs_show instead of special interface
    
    Actually, rapl_sysfs_show is a duplicate of perf_event_sysfs_show. We
    prefer to use the unified interface.
    
    Signed-off-by: Huang Rui <ray.huang@amd.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Borislav Petkov <bp@suse.de>
    Cc: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Robert Richter <rric@kernel.org>
    Cc: Stephane Eranian <eranian@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vince Weaver <vincent.weaver@maine.edu>
    Link: http://lkml.kernel.org/r/1449223661-2437-1-git-send-email-ray.huang@amd.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    huangrui authored and Ingo Molnar committed Jan 6, 2016
  7. perf/x86: Enable cycles:pp for Intel Atom

    This patch updates the PEBS support for Intel Atom to provide
    an alias for the cycles:pp event used by perf record/top by default
    nowadays.
    
    On Atom, only INST_RETIRED:ANY supports PEBS, so we use this event
    instead with a large cmask to count cycles. Given that Core2 has
    the same issue, we use the intel_pebs_aliases_core2() function for Atom
    as well.
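
    For reference, the rewrite that intel_pebs_aliases_core2() performs is
    roughly the following (condensed from the kernel source; the full
    comment there explains the encoding in more detail):

        /* If the event is the non-PEBS-capable cycles event (0x003c),
         * rewrite it to INST_RETIRED.ANY_P (0x00c0) with cmask=16 and
         * inv=1. At most 4 instructions retire per cycle, so "cycles
         * retiring fewer than 16 instructions" fires on every cycle,
         * yielding a PEBS-capable cycle count. */
        if ((event->hw.config & X86_RAW_EVENT_MASK) == 0x003c) {
                u64 alt = X86_CONFIG(.event = 0xc0, .inv = 1, .cmask = 16);

                alt |= (event->hw.config & ~X86_RAW_EVENT_MASK);
                event->hw.config = alt;
        }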
    
    Signed-off-by: Stephane Eranian <eranian@google.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vince Weaver <vincent.weaver@maine.edu>
    Cc: kan.liang@intel.com
    Link: http://lkml.kernel.org/r/1449172990-30183-3-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Stephane Eranian authored and Ingo Molnar committed Jan 6, 2016
  8. perf/x86: fix PEBS issues on Intel Atom/Core2

    This patch fixes broken PEBS support on Intel Atom and Core2
    due to wrong pointer arithmetic in intel_pmu_drain_pebs_core().
    
    The get_next_pebs_record_by_bit() was called on PEBS format fmt0
    which does not use the pebs_record_nhm layout.
    
    Signed-off-by: Stephane Eranian <eranian@google.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vince Weaver <vincent.weaver@maine.edu>
    Cc: kan.liang@intel.com
    Fixes: 2150908 ("perf/x86/intel: Handle multiple records in the PEBS buffer")
    Link: http://lkml.kernel.org/r/1449182000-31524-3-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Stephane Eranian authored and Ingo Molnar committed Jan 6, 2016
  9. perf/x86: Fix LBR related crashes on Intel Atom

    This patch fixes the LBR kernel crashes on Intel Atom.
    
    The kernel was assuming that if the CPU supports 64-bit format
    LBR, then it has an LBR_SELECT MSR. Atom uses 64-bit LBR format
    but does not have LBR_SELECT. That was causing NULL pointer
    dereferences in a couple of places.
    
    Signed-off-by: Stephane Eranian <eranian@google.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vince Weaver <vincent.weaver@maine.edu>
    Cc: kan.liang@intel.com
    Fixes: 96f3eda ("perf/x86/intel: Fix static checker warning in lbr enable")
    Link: http://lkml.kernel.org/r/1449182000-31524-2-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Stephane Eranian authored and Ingo Molnar committed Jan 6, 2016
  10. perf/x86: Fix filter_events() bug with event mappings

    This patch fixes a bug in the filter_events() function.
    
    The patch fixes the bug whereby if some mappings did not
    exist, e.g., STALLED_CYCLES_FRONTEND, then any event after it
    in the attrs array would disappear from the published list of
    events in /sys/devices/cpu/events. This could be verified
    easily on any system post SNB (which do not publish
    STALLED_CYCLES_FRONTEND):
    
    	$ ./perf stat -e cycles,ref-cycles true
    	Performance counter stats for 'true':
                  1,217,348      cycles
    	<not supported>      ref-cycles
    
    The problem is that filter_events() assumes that its argument (attrs)
    is organized in increasing, continuous event indexes related to the
    event_map(). But if we remove the non-supported events by shifting
    their position in the array, then the x86_pmu.event_map() lookup needs
    to compensate for it, otherwise we are looking up the wrong index. This
    patch corrects the problem by compensating for the deleted events, and
    with that ref-cycles reappears (here shown on Haswell, with a sketch of
    the compensation after the output):
    
    	$ perf stat -e ref-cycles,cycles true
    	Performance counter stats for 'true':
             4,525,910      ref-cycles
             1,064,920      cycles
           0.002943888 seconds time elapsed
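
    A sketch of the compensation (helper names here are illustrative, not
    the kernel's): every deleted attrs[] slot shifts the array left, so the
    event_map() lookup index must be advanced by the number of deletions
    made so far to keep pointing at the intended event:

        struct attribute;                        /* opaque, as in sysfs */
        unsigned long event_map_of(int idx);           /* hypothetical */
        void delete_slot(struct attribute **attrs, int i); /* shift left */

        static void filter_events_sketch(struct attribute **attrs)
        {
                int deleted = 0;
                int i;

                for (i = 0; attrs[i]; i++) {
                        if (!event_map_of(i + deleted)) { /* no mapping */
                                delete_slot(attrs, i);  /* drop attrs[i] */
                                deleted++;
                                i--;                  /* re-check slot i */
                        }
                }
        }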
    
    Signed-off-by: Stephane Eranian <eranian@google.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vince Weaver <vincent.weaver@maine.edu>
    Cc: jolsa@kernel.org
    Cc: kan.liang@intel.com
    Fixes: 8300daa ("perf/x86: Filter out undefined events from sysfs events attribute")
    Link: http://lkml.kernel.org/r/1449516805-6637-1-git-send-email-eranian@google.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Stephane Eranian authored and Ingo Molnar committed Jan 6, 2016
  11. perf/x86: Use INST_RETIRED.PREC_DIST for cycles: ppp

    Add a new 'three-p' precise level that uses INST_RETIRED.PREC_DIST as
    its base. The basic mechanism of abusing the inverse cmask to get all
    cycles works the same as before.
    
    PREC_DIST is available on Sandy Bridge or later. It had some problems
    on Sandy Bridge, so we only use it on IvyBridge and later. I tested it
    on Broadwell and Skylake.
    
    PREC_DIST has special support for avoiding shadow effects, which can
    give better results compared to UOPS_RETIRED. The drawback is that
    PREC_DIST can only schedule on counter 1, but that is OK for cycle
    sampling, as there is normally no need to do multiple cycle-sampling
    runs in parallel. It is still possible to run perf top in parallel, as
    that doesn't use precise mode. And of course multiplexing can still
    allow parallel operation.
    
    :pp stays with the previous event.
    
    Example:
    
    Sample a loop with 10 sqrt with old cycles:pp
    
    	  0.14 │10:   sqrtps %xmm1,%xmm0     <--------------
    	  9.13 │      sqrtps %xmm1,%xmm0
    	 11.58 │      sqrtps %xmm1,%xmm0
    	 11.51 │      sqrtps %xmm1,%xmm0
    	  6.27 │      sqrtps %xmm1,%xmm0
    	 10.38 │      sqrtps %xmm1,%xmm0
    	 12.20 │      sqrtps %xmm1,%xmm0
    	 12.74 │      sqrtps %xmm1,%xmm0
    	  5.40 │      sqrtps %xmm1,%xmm0
    	 10.14 │      sqrtps %xmm1,%xmm0
    	 10.51 │    ↑ jmp    10
    
    We expect all 10 sqrt instructions to get roughly the same number of
    samples.
    
    But you can see that the instruction directly after the JMP is
    systematically underestimated in the result, due to sampling shadow
    effects.
    
    With the new PREC_DIST based sampling this problem is gone and all
    instructions show up roughly evenly:
    
    	  9.51 │10:   sqrtps %xmm1,%xmm0
    	 11.74 │      sqrtps %xmm1,%xmm0
    	 11.84 │      sqrtps %xmm1,%xmm0
    	  6.05 │      sqrtps %xmm1,%xmm0
    	 10.46 │      sqrtps %xmm1,%xmm0
    	 12.25 │      sqrtps %xmm1,%xmm0
    	 12.18 │      sqrtps %xmm1,%xmm0
    	  5.26 │      sqrtps %xmm1,%xmm0
    	 10.13 │      sqrtps %xmm1,%xmm0
    	 10.43 │      sqrtps %xmm1,%xmm0
    	  0.16 │    ↑ jmp    10
    
    Even with PREC_DIST there is still sampling skid and the result is not
    completely even, but systematic shadow effects are significantly
    reduced.
    
    The improvements are mainly expected to make a difference in high IPC
    code. With low IPC it should be similar.
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Stephane Eranian <eranian@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vince Weaver <vincent.weaver@maine.edu>
    Cc: hpa@zytor.com
    Link: http://lkml.kernel.org/r/1448929689-13771-2-git-send-email-andi@firstfloor.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Andi Kleen authored and Ingo Molnar committed Jan 6, 2016
  12. perf/x86: Use INST_RETIRED.TOTAL_CYCLES_PS for cycles:pp for Skylake

    I added UOPS_RETIRED.ALL by mistake to the Skylake PEBS event list for
    cycles:pp. But the event is not documented for Skylake, and has some
    issues.
    
    The recommended replacement for cycles:pp is to use
    INST_RETIRED.ANY+pebs as a base, similar to what CPUs before Sandy
    Bridge did. This new event is called INST_RETIRED.TOTAL_CYCLES_PS. The
    event is not really new, but has been already used by perf before
    Sandy Bridge for the original cycles:p
    
    Note the SDM doesn't document that event either, but it's being
    documented in the latest version of the event list on:
    
      https://download.01.org/perfmon/SKL
    
    This patch does:
    
     - Remove UOPS_RETIRED.ALL from the Skylake PEBS event list
    
     - Add INST_RETIRED.ANY to the Skylake PEBS event list, and a table
       entry to allow cmask=16,inv=1 for cycles:pp
    
     - We don't need an extra entry for the base INST_RETIRED event,
       because it is already covered by the catch-all PEBS table entry.
    
     - Switch Skylake to use the Core2 PEBS alias (which is
       INST_RETIRED.TOTAL_CYCLES_PS)
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Stephane Eranian <eranian@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vince Weaver <vincent.weaver@maine.edu>
    Cc: hpa@zytor.com
    Link: http://lkml.kernel.org/r/1448929689-13771-1-git-send-email-andi@firstfloor.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Andi Kleen authored and Ingo Molnar committed Jan 6, 2016
  13. perf/x86: Allow zero PEBS status with only single active event

    Normally we drop PEBS events with a zero status field. But when there
    is only a single PEBS event active, we can assume the PEBS record is
    for that event. The PEBS buffer is always flushed when PEBS events are
    disabled, so there is no risk of mishandling stale PEBS records this
    way.
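
    The heart of the change in the PEBS drain path is essentially this test
    (paraphrased):

        /* A zero status would normally drop the record; if exactly one
         * PEBS counter is enabled, attribute the record to it instead.
         * (x & (x - 1)) == 0 checks that at most one bit is set. */
        if (!pebs_status && cpuc->pebs_enabled &&
            !(cpuc->pebs_enabled & (cpuc->pebs_enabled - 1)))
                pebs_status = cpuc->pebs_enabled;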
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Stephane Eranian <eranian@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vince Weaver <vincent.weaver@maine.edu>
    Link: http://lkml.kernel.org/r/1449177740-5422-2-git-send-email-andi@firstfloor.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Andi Kleen authored and Ingo Molnar committed Jan 6, 2016
  14. perf/x86: Remove warning for zero PEBS status

    The recent commit:
    
      75f8085 ("perf/x86/intel/pebs: Robustify PEBS buffer drain")
    
    causes lots of warnings on different CPUs before Skylake
    when running PEBS intensive workloads.
    
    They can have a zero status field in the PEBS record when
    PEBS is racing with the clearing of GLOBAL_STATUS.
    
    This also can cause hangs (it seems there are still
    problems with printk in NMI).
    
    Disable the warning, but still ignore the record.
    
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Stephane Eranian <eranian@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vince Weaver <vincent.weaver@maine.edu>
    Link: http://lkml.kernel.org/r/1449177740-5422-1-git-send-email-andi@firstfloor.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Andi Kleen authored and Ingo Molnar committed Jan 6, 2016
  15. perf/core: Collapse more IPI loops

    This patch collapses the two 'hard' cases, which are
    perf_event_{dis,en}able().
    
    I cannot seem to convince myself the current code is correct.
    
    So starting with perf_event_disable(): we don't strictly need to test
    for event->state == ACTIVE; ctx->is_active is enough. If the event is
    not scheduled while the ctx is, __perf_event_disable() still does the
    right thing. It's a little less efficient to IPI in that case, but
    over-all simpler.
    
    For perf_event_enable() the same goes, but I think that's actually
    broken in its current form. The current condition is: ctx->is_active
    && event->state == OFF, which means it doesn't do anything when
    !ctx->is_active && event->state == OFF. This is wrong; it should still
    mark the event INACTIVE in that case, otherwise we'll still not try
    to schedule the event once the context becomes active again.
    
    This patch implements the two functions using the new
    event_function_call() and does away with the tricky event->state
    tests.
    
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Alexander Shishkin <alexander.shishkin@intel.com>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Stephane Eranian <eranian@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vince Weaver <vincent.weaver@maine.edu>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Peter Zijlstra authored and Ingo Molnar committed Jan 6, 2016
  16. Merge branch 'perf/urgent' into perf/core, to pick up fixes before applying new changes
    
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Ingo Molnar committed Jan 6, 2016
  17. perf: Fix race in swevent hash

    There's a race on CPU unplug where we free the swevent hash array
    while it can still have events on it. This will result in a
    use-after-free, which is BAD.
    
    Simply do not free the hash array on unplug. This leaves the thing
    around and no use-after-free takes place.
    
    When the last swevent dies, we do a for_each_possible_cpu() iteration
    anyway to clean these up, at which time we'll free it, so no leakage
    will occur.
    
    Reported-by: Sasha Levin <sasha.levin@oracle.com>
    Tested-by: Sasha Levin <sasha.levin@oracle.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Stephane Eranian <eranian@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vince Weaver <vincent.weaver@maine.edu>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Peter Zijlstra authored and Ingo Molnar committed Jan 6, 2016
  18. perf: Fix race in perf_event_exec()

    I managed to tickle this warning:
    
      [ 2338.884942] ------------[ cut here ]------------
      [ 2338.890112] WARNING: CPU: 13 PID: 35162 at ../kernel/events/core.c:2702 task_ctx_sched_out+0x6b/0x80()
      [ 2338.900504] Modules linked in:
      [ 2338.903933] CPU: 13 PID: 35162 Comm: bash Not tainted 4.4.0-rc4-dirty #244
      [ 2338.911610] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
      [ 2338.923071]  ffffffff81f1468e ffff8807c6457cb8 ffffffff815c680c 0000000000000000
      [ 2338.931382]  ffff8807c6457cf0 ffffffff810c8a56 ffffe8ffff8c1bd0 ffff8808132ed400
      [ 2338.939678]  0000000000000286 ffff880813170380 ffff8808132ed400 ffff8807c6457d00
      [ 2338.947987] Call Trace:
      [ 2338.950726]  [<ffffffff815c680c>] dump_stack+0x4e/0x82
      [ 2338.956474]  [<ffffffff810c8a56>] warn_slowpath_common+0x86/0xc0
      [ 2338.963195]  [<ffffffff810c8b4a>] warn_slowpath_null+0x1a/0x20
      [ 2338.969720]  [<ffffffff811a49cb>] task_ctx_sched_out+0x6b/0x80
      [ 2338.976244]  [<ffffffff811a62d2>] perf_event_exec+0xe2/0x180
      [ 2338.982575]  [<ffffffff8121fb6f>] setup_new_exec+0x6f/0x1b0
      [ 2338.988810]  [<ffffffff8126de83>] load_elf_binary+0x393/0x1660
      [ 2338.995339]  [<ffffffff811dc772>] ? get_user_pages+0x52/0x60
      [ 2339.001669]  [<ffffffff8121e297>] search_binary_handler+0x97/0x200
      [ 2339.008581]  [<ffffffff8121f8b3>] do_execveat_common.isra.33+0x543/0x6e0
      [ 2339.016072]  [<ffffffff8121fcea>] SyS_execve+0x3a/0x50
      [ 2339.021819]  [<ffffffff819fc165>] stub_execve+0x5/0x5
      [ 2339.027469]  [<ffffffff819fbeb2>] ? entry_SYSCALL_64_fastpath+0x12/0x71
      [ 2339.034860] ---[ end trace ee1337c59a0ddeac ]---
    
    Which is a WARN_ON_ONCE() indicating that cpuctx->task_ctx is not
    what we expected it to be.
    
    This is because context switches can swap the task_struct::perf_event_ctxp[]
    pointer around. Therefore you have to either disable preemption when looking
    at current, or hold ctx->lock.
    
    Fix perf_event_enable_on_exec(): it loads current->perf_event_ctxp[]
    before disabling interrupts, so a preemption in the right place
    can swap contexts around and leave us using the wrong one.
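
    The rule stated above, as a sketch (illustrative, not the exact fix):

        struct perf_event_context *ctx;

        preempt_disable();                  /* pin the ctxp[] pointer */
        ctx = current->perf_event_ctxp[ctxn];
        if (ctx) {
                /* safe: a context switch cannot swap ctx under us now */
                use_ctx(ctx);               /* hypothetical consumer */
        }
        preempt_enable();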
    
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Eric Dumazet <edumazet@google.com>
    Cc: Jiri Olsa <jolsa@redhat.com>
    Cc: Kostya Serebryany <kcc@google.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Sasha Levin <sasha.levin@oracle.com>
    Cc: Stephane Eranian <eranian@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vince Weaver <vincent.weaver@maine.edu>
    Cc: syzkaller <syzkaller@googlegroups.com>
    Link: http://lkml.kernel.org/r/20151210195740.GG6357@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Peter Zijlstra authored and Ingo Molnar committed Jan 6, 2016

Commits on Dec 18, 2015

  1. Merge tag 'perf-core-for-mingo-3' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
    
    Pull new perf tool feature from Arnaldo Carvalho de Melo:
    
    " User visible changes:
    
      - Generate perf.data files from 'perf stat', to tap into the scripting
        capabilities perf has instead of defining a 'perf stat' specific scripting
        support to calculate event ratios, etc. Simple example:
    
        $ perf stat record -e cycles usleep 1
    
         Performance counter stats for 'usleep 1':
    
               1,134,996      cycles
    
             0.000670644 seconds time elapsed
    
        $ perf stat report
    
         Performance counter stats for '/home/acme/bin/perf stat record -e cycles usleep 1':
    
               1,134,996      cycles
    
             0.000670644 seconds time elapsed
    
        $
    
        It generates PERF_RECORD_ userspace records to store the details:
    
        $ perf report -D | grep PERF_RECORD
        0xf0 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 27637
        0x118 [0x12]: PERF_RECORD_CPU_MAP nr: 1 cpu: 65535
        0x12a [0x40]: PERF_RECORD_STAT_CONFIG
        0x16a [0x30]: PERF_RECORD_STAT
        -1 -1 0x19a [0x40]: PERF_RECORD_MMAP -1/0: [0xffffffff81000000(0x1f000000) @ 0xffffffff81000000]: x [kernel.kallsyms]_text
        0x1da [0x18]: PERF_RECORD_STAT_ROUND
        [acme@ssdandy linux]$
    
        An effort was made so that perf.data files generated like this do
        not produce cryptic messages when processed by older tools.
    
        The 'perf script' bits need rebasing, will go up later.
    
      Jiri's cover letter for this series:
    
      The initial attempt defined its own formula language and allowed
      triggering the user's script at the end of the stat command:
    
        http://marc.info/?l=linux-kernel&m=136742146322273&w=2
    
      This patchset abandons the idea of a new formula language and rather
      adds support to:
    
        - store stat data into perf.data file
        - add python support to process stat events
    
      Basically it allows storing stat data into perf.data and
      post-processing it with python scripts, in a similar way to what we
      do for sampling data.
    
      The stat data are stored in new stat, stat-round, stat-config user events.
        stat        - stored for each read syscall of the counter
        stat round  - stored for each interval or end of the command invocation
        stat config - stores all the config information needed to process data
                      so report tool could restore the same output as record
    
      The python script can now define 'stat__<eventname>_<modifier>' functions
      to get stat events data and 'stat__interval' to get stat-round data.
    
      See CPI script example in scripts/python/stat-cpi.py."
    
    Also a few other changes:
    
    User visible changes:
    
      - Make command line options always available, even when they
        depend on some feature being enabled, warning the user about
        use of such options (Wang Nan)
    
      - Support --vmlinux in perf record, useful, so far, for eBPF,
        where we will set up events that will be used in the record
        session (He Kuang)
    
      - Automatically disable collecting branch flags and cycles with
        --call-graph lbr. This allows avoiding a bunch of extra MSR
        reads in the PMI on Skylake.  (Andi Kleen)
    
    Infrastructure changes:
    
      - Dump the stack when a 'perf test -v ' entry segfaults, so far we
        would have to run it under gdb with 'set follow-fork-mode child'
        set to get a proper backtrace (Arnaldo Carvalho de Melo)
    
      - Initialize the refcnt in 'struct thread' to 1 and fix up its
        users accordingly, so that we try to have the same refcount
        model across the perf codebase (Arnaldo Carvalho de Melo)
    
      - More prep work for moving the subcmd infrastructure out of
        tools/perf/ and into tools/lib/subcmd/ to be used by other
        tools/ living utilities (Josh Poimboeuf)
    
      - Fix 'perf test' hist testcases when kptr_restrict is on (Namhyung Kim)
    
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Ingo Molnar committed Dec 18, 2015