Error with associativity: "ways needs to be a power of 2" #66

Closed
sguera opened this issue Dec 6, 2017 · 2 comments
sguera commented Dec 6, 2017

When I use the machine file generated for the Intel Xeon E5-2640 v4:

FLOPs per cycle:
  DP:
    ADD: 4
    FMA: 8
    MUL: 4
    total: 16
  SP:
    ADD: 8
    FMA: 16
    MUL: 8
    total: 32
NUMA domains per socket: 1.0

...


cacheline size:  64 B
clock: 2.47 GHz
compiler:
  clang: -O3 -mavx2 -D_POSIX_C_SOURCE=200112L
  gcc: -O3 -march=core-avx2 -D_POSIX_C_SOURCE=200112L
  icc: -O3 -xCORE-AVX2 -fno-alias
cores per NUMA domain: 0.1
cores per socket: 10
memory hierarchy:
- cache per group:
    cl_size: 64
    load_from: L2
    replacement_policy: LRU
    sets: 64
    store_to: L2
    ways: 8
    write_allocate: True
    write_back: True
  cores per group: 1.0
  cycles per cacheline transfer: 1
  groups: 20
  level: L1
  performance counter metrics:
    accesses: MEM_UOPS_RETIRED_LOADS_ALL:PMC[0-3]
    evicts: L2_TRANS_L1D_WB:PMC[0-3]
    misses: L1D_REPLACEMENT:PMC[0-3]
  size per group: !!python/object:prefixedunit.PrefixedUnit
    prefix: k
    unit: B
    value: 32.0
  threads per group: 1.0
- cache per group:
    cl_size: 64
    load_from: L3
    replacement_policy: LRU
    sets: 512
    store_to: L3
    ways: 8
    write_allocate: True
    write_back: True
  cores per group: 1.0
  cycles per cacheline transfer: 2
  groups: 20
  level: L2
  performance counter metrics:
    accesses: L1D_REPLACEMENT:PMC[0-3]
    evicts: L2_TRANS_L2_WB:PMC[0-3]
    misses: L2_LINES_IN_ALL:PMC[0-3]
  size per group: !!python/object:prefixedunit.PrefixedUnit
    prefix: k
    unit: B
    value: 256.0
  threads per group: 1.0
- cache per group:
    cl_size: 64
    replacement_policy: LRU
    sets: 20480
    ways: 20
    write_allocate: True
    write_back: True
  cores per group: 10.0
  cycles per cacheline transfer: INFORMATION_REQUIRED
  groups: 2
  level: L3
  performance counter metrics:
    accesses: L2_LINES_IN_ALL:PMC[0-3]
    evicts: (LLC_VICTIMS_M:CBOX0C[01] + LLC_VICTIMS_M:CBOX1C[01] + LLC_VICTIMS_M:CBOX2C[01] +
               LLC_VICTIMS_M:CBOX3C[01] + LLC_VICTIMS_M:CBOX4C[01] + LLC_VICTIMS_M:CBOX5C[01] +
               LLC_VICTIMS_M:CBOX6C[01] + LLC_VICTIMS_M:CBOX7C[01] + LLC_VICTIMS_M:CBOX8C[01] +
               LLC_VICTIMS_M:CBOX9C[01] + LLC_VICTIMS_M:CBOX10C[01] + LLC_VICTIMS_M:CBOX11C[01] +
               LLC_VICTIMS_M:CBOX12C[01] + LLC_VICTIMS_M:CBOX13C[01] + LLC_VICTIMS_M:CBOX14C[01] +
               LLC_VICTIMS_M:CBOX15C[01] + LLC_VICTIMS_M:CBOX16C[01] + LLC_VICTIMS_M:CBOX17C[01] +
               LLC_VICTIMS_M:CBOX18C[01] + LLC_VICTIMS_M:CBOX19C[01] + LLC_VICTIMS_M:CBOX20C[01] +
               LLC_VICTIMS_M:CBOX21C[01])
    misses: (LLC_LOOKUP_DATA_READ:CBOX0C[01] + LLC_LOOKUP_DATA_READ:CBOX1C[01] +
               LLC_LOOKUP_DATA_READ:CBOX2C[01] + LLC_LOOKUP_DATA_READ:CBOX3C[01] +
               LLC_LOOKUP_DATA_READ:CBOX4C[01] + LLC_LOOKUP_DATA_READ:CBOX5C[01] +
               LLC_LOOKUP_DATA_READ:CBOX6C[01] + LLC_LOOKUP_DATA_READ:CBOX7C[01] +
               LLC_LOOKUP_DATA_READ:CBOX8C[01] + LLC_LOOKUP_DATA_READ:CBOX9C[01] +
               LLC_LOOKUP_DATA_READ:CBOX10C[01] + LLC_LOOKUP_DATA_READ:CBOX11C[01] +
               LLC_LOOKUP_DATA_READ:CBOX12C[01] + LLC_LOOKUP_DATA_READ:CBOX13C[01] +
               LLC_LOOKUP_DATA_READ:CBOX14C[01] + LLC_LOOKUP_DATA_READ:CBOX15C[01] +
               LLC_LOOKUP_DATA_READ:CBOX16C[01] + LLC_LOOKUP_DATA_READ:CBOX17C[01] +
               LLC_LOOKUP_DATA_READ:CBOX18C[01] + LLC_LOOKUP_DATA_READ:CBOX19C[01] +
               LLC_LOOKUP_DATA_READ:CBOX20C[01] + LLC_LOOKUP_DATA_READ:CBOX21C[01])
  size per group: !!python/object:prefixedunit.PrefixedUnit
    prefix: M
    unit: B
    value: 25.0
  threads per group: 10.0
- cores per group: 10
  cycles per cacheline transfer: null
  level: MEM
  penalty cycles per read stream: 0
  size per group: null
  threads per group: 10
micro-architecture: BDW
model name: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
model type: Intel Xeon Broadwell EN/EP/EX processor
non-overlapping model:
  performance counter metric: T_OL + T_L1L2 + T_L2L3 + T_L3MEM
  ports: ["2D", "3D"]
overlapping model:
  performance counter metric: 
    Max(UMASK_UOPS_EXECUTED_PORT_PORT_0:PMC[0-3],
      UMASK_UOPS_EXECUTED_PORT_PORT_1:PMC[0-3],
      UMASK_UOPS_EXECUTED_PORT_PORT_4:PMC[0-3],
      UMASK_UOPS_EXECUTED_PORT_PORT_5:PMC[0-3],
      UMASK_UOPS_EXECUTED_PORT_PORT_6:PMC[0-3],
      UMASK_UOPS_EXECUTED_PORT_PORT_7:PMC[0-3])
  ports: ["0", "0DV", "1", "2", "2D", "3", "3D", "4", "5", "6", "7"]
sockets: 2
threads per core: 1

I get the following error:

Traceback (most recent call last):
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/bin/kerncraft", line 11, in <module>
    load_entry_point('kerncraft==0.5.10', 'console_scripts', 'kerncraft')()
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/kerncraft/kerncraft.py", line 295, in main
    run(parser, args)
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/kerncraft/kerncraft.py", line 259, in run
    model = getattr(models, model_name)(kernel, machine, args, parser)
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/kerncraft/models/ecm.py", line 88, in __init__
    self.predictor = CacheSimulationPredictor(self.kernel, self.machine, self.cores)
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/kerncraft/cacheprediction.py", line 218, in __init__
    csim = self.machine.get_cachesim(self.cores)
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/kerncraft/machinemodel.py", line 71, in get_cachesim
    cs, caches, mem = cachesim.CacheSimulator.from_dict(cache_dict)
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/cachesim/cache.py", line 63, in from_dict
    name=name, **{k:v for k,v in conf.items() if k not in ['store_to', 'load_from']})
  File "/users/staff/ifi/guerrera/anaconda2/envs/myenv3.6/lib/python3.6/site-packages/cachesim/cache.py", line 253, in __init__
    assert is_power2(ways), "ways needs to be a power of 2"
AssertionError: ways needs to be a power of 2

In this case, the L3 cache is 20-way associative.

Should I round it to the nearest power of 2, or is there another way to handle this?
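
For reference, here is a minimal sketch of the kind of check that trips here. Note that `is_power_of_two` is a hypothetical stand-in written for illustration; it is not cachesim's actual `is_power2`, whose implementation may differ. The `ways` values are the ones from the machine file above.

```python
def is_power_of_two(n):
    """Return True if n is a positive integer power of two (1, 2, 4, 8, ...)."""
    # A power of two has exactly one bit set, so n & (n - 1) clears it to zero.
    return isinstance(n, int) and n > 0 and (n & (n - 1)) == 0


# Associativity ("ways") per cache level, copied from the machine file above.
ways_per_level = {"L1": 8, "L2": 8, "L3": 20}

# Collect the levels whose associativity would fail a power-of-two assertion.
rejected = [level for level, ways in ways_per_level.items()
            if not is_power_of_two(ways)]
print(rejected)
```

Only the L3 entry is reported, which matches the assertion in the traceback.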

Member

cod3monk commented Jan 8, 2018

@sguera As a workaround, you can use:

    sets: 25600
    ways: 16

instead of

    sets: 20480
    ways: 20

This keeps the total cache size unchanged (25600 × 16 = 20480 × 20 = 409,600 cache lines) and should yield the same results in most cases.
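
The arithmetic behind the workaround can be sketched as follows. The helper names `cache_bytes` and `rescale_to_pow2_ways` are hypothetical and not part of kerncraft or cachesim. The sketch assumes that rounding `ways` down to a power of two and scaling `sets` to preserve the number of cache lines is an acceptable approximation for your kernel.

```python
def cache_bytes(sets, ways, cl_size=64):
    """Total capacity in bytes of a cache with the given geometry."""
    return sets * ways * cl_size


def rescale_to_pow2_ways(sets, ways):
    """Round ways down to a power of two and scale sets to keep capacity.

    Hypothetical helper: returns (new_sets, new_ways) with
    new_sets * new_ways == sets * ways.
    """
    new_ways = 1 << (ways.bit_length() - 1)  # largest power of two <= ways
    total_lines = sets * ways
    if total_lines % new_ways:
        raise ValueError("capacity is not divisible by the rounded ways")
    return total_lines // new_ways, new_ways


new_sets, new_ways = rescale_to_pow2_ways(sets=20480, ways=20)
print(new_sets, new_ways)
print(cache_bytes(20480, 20) == cache_bytes(new_sets, new_ways))
```

For the L3 entry above this reproduces `sets: 25600` and `ways: 16`, with the same total of 26,214,400 bytes. The lower associativity can still change conflict-miss behavior, which is why the results match only in most cases.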

Member

cod3monk commented Apr 17, 2019

Fixed with commit fbd388d
