Add generic support for scalar ITensor contraction #569

Merged
merged 2 commits into from
Feb 3, 2021

Conversation

mtfishman
Member

This fixes #564.

@emstoudenmire, the main subtlety is that this implementation special-cases a scalar ITensor with a value of 1: the contraction does nothing at all and simply returns the non-scalar ITensor. The alternative would be to copy the ITensor in that case, since people may be surprised that contractions sometimes return views of the input ITensors. Do you have an opinion on this?

There are other cases where we would not want ITensor contraction to make a copy, for example delta contractions that just replace indices (i.e. randomITensor(i, j) * delta(j, j')). In those cases I would also want to return a view of the ITensor data, so perhaps contraction with ITensor(1) can be lumped into that category. The catch is that contracting with ITensor(1) would then behave differently from contracting with any other scalar value, which is a pretty sneaky difference: someone might rely on the result not being a view and then, one day, contract with a value of exactly 1 without realizing it and get a view instead. But maybe we can just make it clear in the docs that scalar-like ITensors have special behavior.
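The aliasing concern can be illustrated with a toy model. This is only a sketch using plain Julia arrays, not ITensors types; `scale_shortcut` is a hypothetical helper that mimics the special case by returning its input unchanged when the scalar is exactly 1.

```julia
# Toy model with plain Julia arrays (no ITensors types involved).
# `scale_shortcut` is a hypothetical helper: multiplying by exactly 1
# returns the input array itself instead of allocating a new one.
scale_shortcut(A::Array, x::Number) = x == 1 ? A : x .* A

A = [1.0 2.0; 3.0 4.0]

B1 = scale_shortcut(A, 1)   # no copy: B1 and A are the same array
B2 = scale_shortcut(A, 2)   # new array: B2 owns its data

B1[1, 1] = 100.0            # also changes A[1, 1]
B2[1, 1] = -1.0             # leaves A untouched

println(B1 === A)           # true
println(B2 === A)           # false
println(A[1, 1])            # 100.0
```

Code that mutates the result in place behaves differently depending only on the scalar's value, which is the surprise described above.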

@github-actions
Contributor

github-actions bot commented Feb 3, 2021

Benchmark result

Judge result

Benchmark Report for /home/runner/work/ITensors.jl/ITensors.jl

Job Properties

  • Time of benchmarks:
    • Target: 3 Feb 2021 - 00:17
    • Baseline: 3 Feb 2021 - 00:26
  • Package commits:
    • Target: d658e8
    • Baseline: 8e01e6
  • Julia commits:
    • Target: 44fa15
    • Baseline: 44fa15
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|---|---|---|
| ["contract", "heff_2site"] | 0.95 (5%) ✅ | 1.00 (1%) |
| ["contract", "matmul_80"] | 0.87 (5%) ✅ | 1.00 (1%) |
| ["contract", "matmul_inplace_100"] | 0.88 (5%) ✅ | 1.00 (1%) |
| ["contract", "matmul_inplace_20"] | 0.86 (5%) ✅ | 1.00 (1%) |
| ["contract", "matmul_inplace_60"] | 1.15 (5%) ❌ | 1.00 (1%) |
| ["contract", "matmul_inplace_80"] | 0.89 (5%) ✅ | 1.00 (1%) |
| ["indexset", "constructor", "function"] | 1.06 (5%) ❌ | 1.00 (1%) |
| ["indexset", "filter", "function"] | 0.00 (5%) ✅ | 0.00 (1%) ✅ |
| ["indexset", "filter", "kwargs"] | 0.00 (5%) ✅ | 0.05 (1%) ✅ |
| ["indexset", "uniqueinds", "3_inputs"] | 1.16 (5%) ❌ | 1.00 (1%) |
| ["indexset", "uniqueinds", "order_3_inputs"] | 1.06 (5%) ❌ | 1.00 (1%) |
| ["indexset", "uniqueinds", "order_filter_not_tags"] | 0.92 (5%) ✅ | 1.00 (1%) |
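The time ratios in this table can be reproduced from the target and baseline times listed further down in the same report. The sketch below assumes the judgment is simply target divided by baseline, compared against the noise tolerance shown in parentheses; `classify` is a hypothetical helper written for this illustration, not the benchmarking tool's own code.

```julia
# Hypothetical helper: classify a target/baseline ratio against a tolerance.
function classify(ratio, tol)
    if ratio > 1 + tol
        return :regression
    elseif ratio < 1 - tol
        return :improvement
    else
        return :invariant
    end
end

# (name, target time, baseline time), copied from the result tables of
# this report. Units cancel within each row.
times = [
    ("heff_2site",        7.091,  7.492),   # ms
    ("matmul_80",         27.701, 31.902),  # μs
    ("matmul_inplace_60", 17.800, 15.501),  # μs
    ("matmul_40",         8.300,  8.367),   # μs
]

for (name, target, baseline) in times
    ratio = target / baseline
    println(name, ": ", round(ratio, digits=2), " ", classify(ratio, 0.05))
end
```

Under that assumption the first three rows round to 0.95, 0.87 and 1.15, matching the table, while matmul_40 falls inside the 5% tolerance and is therefore not listed.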

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["autompo"]
  • ["contract"]
  • ["dmrg"]
  • ["indexset", "constructor"]
  • ["indexset", "filter"]
  • ["indexset", "uniqueinds"]
  • ["inplace"]
  • ["op"]
  • ["tagset"]

Julia versioninfo

Target

Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.5 LTS
  uname: Linux 5.4.0-1036-azure #38~18.04.1-Ubuntu SMP Wed Jan 6 18:26:30 UTC 2021 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2095 MHz      28776 s          0 s       6573 s      58302 s          0 s
       #2  2095 MHz      76492 s          0 s       5783 s      11782 s          0 s
       
  Memory: 6.791393280029297 GB (2151.3046875 MB free)
  Uptime: 956.0 sec
  Load Avg:  1.35498046875  1.31884765625  0.90625
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)

Baseline

Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.5 LTS
  uname: Linux 5.4.0-1036-azure #38~18.04.1-Ubuntu SMP Wed Jan 6 18:26:30 UTC 2021 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2095 MHz      55429 s          0 s      10539 s      83260 s          0 s
       #2  2095 MHz     111762 s          0 s       6932 s      30958 s          0 s
       
  Memory: 6.791393280029297 GB (2196.18359375 MB free)
  Uptime: 1512.0 sec
  Load Avg:  1.11962890625  1.27197265625  1.07958984375
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)

Target result

Benchmark Report for /home/runner/work/ITensors.jl/ITensors.jl

Job Properties

  • Time of benchmark: 3 Feb 2021 - 00:17
  • Package commit: d658e8
  • Julia commit: 44fa15
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

ID time GC time memory allocations
["autompo", "Quartic Hamiltonian"] 9.238 s (5%) 594.675 ms 1.78 GiB (1%) 37668617
["autompo", "Quartic QN Hamiltonian"] 18.830 s (5%) 3.319 s 9.65 GiB (1%) 105579314
["contract", "heff_2site"] 7.091 ms (5%) 19.41 MiB (1%) 258
["contract", "matmul_100"] 53.503 μs (5%) 80.44 KiB (1%) 30
["contract", "matmul_20"] 2.925 μs (5%) 5.48 KiB (1%) 29
["contract", "matmul_40"] 8.300 μs (5%) 14.86 KiB (1%) 29
["contract", "matmul_60"] 20.701 μs (5%) 30.44 KiB (1%) 30
["contract", "matmul_80"] 27.701 μs (5%) 52.31 KiB (1%) 30
["contract", "matmul_inplace_100"] 48.903 μs (5%) 1.38 KiB (1%) 18
["contract", "matmul_inplace_20"] 1.420 μs (5%) 1.38 KiB (1%) 18
["contract", "matmul_inplace_40"] 5.600 μs (5%) 1.38 KiB (1%) 18
["contract", "matmul_inplace_60"] 17.800 μs (5%) 1.38 KiB (1%) 18
["contract", "matmul_inplace_80"] 24.701 μs (5%) 1.38 KiB (1%) 18
["dmrg", "1d_S=1_heisenberg"] 25.917 s (5%) 2.425 s 38.41 GiB (1%) 2549435
["dmrg", "1d_S=1_heisenberg_qn"] 13.945 s (5%) 1.761 s 12.33 GiB (1%) 102363630
["indexset", "constructor", "function"] 614.846 ns (5%) 1.97 KiB (1%) 8
["indexset", "filter", "function"] 1.170 μs (5%) 2.56 KiB (1%) 4
["indexset", "filter", "function_order"] 38.610 ns (5%) 224 bytes (1%) 1
["indexset", "filter", "kwargs"] 333.201 ns (5%) 1.30 KiB (1%) 4
["indexset", "filter", "order_kwargs"] 22.089 ns (5%) 112 bytes (1%) 1
["indexset", "uniqueinds", "3_inputs"] 744.618 ns (5%) 864 bytes (1%) 5
["indexset", "uniqueinds", "filter_not_tags"] 662.886 ns (5%) 1.09 KiB (1%) 9
["indexset", "uniqueinds", "filter_tags"] 539.180 ns (5%) 864 bytes (1%) 5
["indexset", "uniqueinds", "nofilter0"] 3.350 μs (5%) 80 bytes (1%) 1
["indexset", "uniqueinds", "nofilter2"] 603.775 ns (5%) 1.17 KiB (1%) 6
["indexset", "uniqueinds", "order0"] 94.535 ns (5%)
["indexset", "uniqueinds", "order2"] 132.921 ns (5%) 224 bytes (1%) 1
["indexset", "uniqueinds", "order_3_inputs"] 177.094 ns (5%) 112 bytes (1%) 1
["indexset", "uniqueinds", "order_filter_not_tags"] 210.009 ns (5%) 368 bytes (1%) 5
["indexset", "uniqueinds", "order_filter_tags"] 91.403 ns (5%) 112 bytes (1%) 1
["inplace", "axpy!"] 12.400 μs (5%) 2.11 KiB (1%) 33
["op", "op QN"] 8.534 μs (5%) 6.58 KiB (1%) 100
["op", "op"] 5.550 μs (5%) 3.56 KiB (1%) 44
["tagset", "tagset"] 208.009 ns (5%) 80 bytes (1%) 1
["tagset", "tagset_unicode"] 360.301 ns (5%) 80 bytes (1%) 1

Baseline result

Benchmark Report for /home/runner/work/ITensors.jl/ITensors.jl

Job Properties

  • Time of benchmark: 3 Feb 2021 - 00:26
  • Package commit: 8e01e6
  • Julia commit: 44fa15
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

ID time GC time memory allocations
["autompo", "Quartic Hamiltonian"] 9.273 s (5%) 613.198 ms 1.78 GiB (1%) 37772154
["autompo", "Quartic QN Hamiltonian"] 19.157 s (5%) 3.438 s 9.65 GiB (1%) 105612923
["contract", "heff_2site"] 7.492 ms (5%) 19.41 MiB (1%) 258
["contract", "matmul_100"] 55.504 μs (5%) 80.44 KiB (1%) 30
["contract", "matmul_20"] 3.013 μs (5%) 5.48 KiB (1%) 29
["contract", "matmul_40"] 8.367 μs (5%) 14.86 KiB (1%) 29
["contract", "matmul_60"] 21.001 μs (5%) 30.44 KiB (1%) 30
["contract", "matmul_80"] 31.902 μs (5%) 52.31 KiB (1%) 30
["contract", "matmul_inplace_100"] 55.304 μs (5%) 1.38 KiB (1%) 18
["contract", "matmul_inplace_20"] 1.660 μs (5%) 1.38 KiB (1%) 18
["contract", "matmul_inplace_40"] 5.600 μs (5%) 1.38 KiB (1%) 18
["contract", "matmul_inplace_60"] 15.501 μs (5%) 1.38 KiB (1%) 18
["contract", "matmul_inplace_80"] 27.602 μs (5%) 1.38 KiB (1%) 18
["dmrg", "1d_S=1_heisenberg"] 26.513 s (5%) 2.700 s 38.55 GiB (1%) 2553806
["dmrg", "1d_S=1_heisenberg_qn"] 14.189 s (5%) 1.944 s 12.35 GiB (1%) 102367526
["indexset", "constructor", "function"] 579.043 ns (5%) 1.97 KiB (1%) 8
["indexset", "filter", "function"] 3.652 s (5%) 851.84 KiB (1%) 16695
["indexset", "filter", "function_order"] 38.712 ns (5%) 224 bytes (1%) 1
["indexset", "filter", "kwargs"] 163.636 ms (5%) 26.59 KiB (1%) 581
["indexset", "filter", "order_kwargs"] 21.587 ns (5%) 112 bytes (1%) 1
["indexset", "uniqueinds", "3_inputs"] 643.350 ns (5%) 864 bytes (1%) 5
["indexset", "uniqueinds", "filter_not_tags"] 652.893 ns (5%) 1.09 KiB (1%) 9
["indexset", "uniqueinds", "filter_tags"] 542.360 ns (5%) 864 bytes (1%) 5
["indexset", "uniqueinds", "nofilter0"] 3.338 μs (5%) 80 bytes (1%) 1
["indexset", "uniqueinds", "nofilter2"] 595.663 ns (5%) 1.17 KiB (1%) 6
["indexset", "uniqueinds", "order0"] 95.318 ns (5%)
["indexset", "uniqueinds", "order2"] 133.151 ns (5%) 224 bytes (1%) 1
["indexset", "uniqueinds", "order_3_inputs"] 167.560 ns (5%) 112 bytes (1%) 1
["indexset", "uniqueinds", "order_filter_not_tags"] 227.907 ns (5%) 368 bytes (1%) 5
["indexset", "uniqueinds", "order_filter_tags"] 91.922 ns (5%) 112 bytes (1%) 1
["inplace", "axpy!"] 12.400 μs (5%) 2.11 KiB (1%) 33
["op", "op QN"] 8.567 μs (5%) 6.58 KiB (1%) 100
["op", "op"] 5.384 μs (5%) 3.56 KiB (1%) 44
["tagset", "tagset"] 208.378 ns (5%) 80 bytes (1%) 1
["tagset", "tagset_unicode"] 360.789 ns (5%) 80 bytes (1%) 1

Runtime information

Runtime Info
BLAS #threads 2
BLAS.vendor() openblas64
Sys.CPU_THREADS 2

lscpu output:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Stepping:            4
CPU MHz:             2095.195
BogoMIPS:            4190.39
Hypervisor vendor:   Microsoft
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            36608K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves md_clear
Cpu Property Value
Brand Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Vendor :Intel
Architecture :Skylake
Model Family: 0x06, Model: 0x55, Stepping: 0x04, Type: 0x00
Cores 2 physical cores, 2 logical cores (on executing CPU)
No Hyperthreading detected
Clock Frequencies Not supported by CPU
Data Cache Level 1:3 : (32, 1024, 36608) kbytes
64 byte cache line size
Address Size 48 bits virtual, 46 bits physical
SIMD 512 bit = 64 byte max. SIMD vector size
Time Stamp Counter TSC is accessible via rdtsc
TSC increased at every clock cycle (non-invariant TSC)
Perf. Monitoring Performance Monitoring Counters (PMC) are not supported
Hypervisor Yes, Microsoft

@mtfishman
Member Author

Thinking about this more, I'm uncomfortable with a design where:

A = randomITensor(i, j)
B = A * ITensor(1)

makes B a view of A but:

B = A * ITensor(2)

does not. However, I was thinking that if you want the view behavior (i.e. you want to avoid the copy), we could use a delta tensor with no indices for that purpose. The proposal would be:

A = randomITensor(i, j)
B1 = A * ITensor(1) # B1 is a copy of A
B2 = A * delta() # B2 is a view of A

I think the interpretation of delta() here is pretty unambiguous. And as mentioned in the previous comment, returning a view when contracting with a delta would match how delta contractions that replace indices should work. We could of course introduce a different type for this (like a Scalar storage), but I would prefer not to if we can avoid it by using existing types.
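As a rough sketch of the idea, the copy-versus-view choice can be keyed on the type of the second operand rather than on its value. The types below (`ToyTensor`, `ToyScalar`, `ToyDelta`) are hypothetical stand-ins in plain Julia and say nothing about how ITensors actually stores data or implements contraction.

```julia
# Toy model in plain Julia; all three types are hypothetical stand-ins.
struct ToyTensor
    data::Array{Float64}
end

struct ToyScalar      # stands in for ITensor(x)
    value::Float64
end

struct ToyDelta end   # stands in for delta() with no indices

# Contracting with a scalar always allocates, even when the value is 1.
Base.:*(A::ToyTensor, s::ToyScalar) = ToyTensor(s.value .* A.data)

# Contracting with the index-free delta shares the data with the input.
Base.:*(A::ToyTensor, ::ToyDelta) = ToyTensor(A.data)

A  = ToyTensor(rand(2, 2))
B1 = A * ToyScalar(1.0)   # copy
B2 = A * ToyDelta()       # shares storage with A

println(B1.data === A.data)  # false
println(B2.data === A.data)  # true
```

Because the behavior depends only on the operand's type, multiplying by a scalar that happens to equal 1 can never silently switch from a copy to a view.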

@github-actions
Contributor

github-actions bot commented Feb 3, 2021

Benchmark result

Judge result

Benchmark Report for /home/runner/work/ITensors.jl/ITensors.jl

Job Properties

  • Time of benchmarks:
    • Target: 3 Feb 2021 - 20:00
    • Baseline: 3 Feb 2021 - 20:09
  • Package commits:
    • Target: 15e795
    • Baseline: dc24ee
  • Julia commits:
    • Target: 44fa15
    • Baseline: 44fa15
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|---|---|---|
| ["indexset", "constructor", "function"] | 0.94 (5%) ✅ | 1.00 (1%) |
| ["indexset", "filter", "function"] | 0.00 (5%) ✅ | 0.00 (1%) ✅ |
| ["indexset", "filter", "function_order"] | 1.11 (5%) ❌ | 1.00 (1%) |
| ["indexset", "filter", "kwargs"] | 0.00 (5%) ✅ | 0.04 (1%) ✅ |
| ["indexset", "uniqueinds", "order2"] | 0.93 (5%) ✅ | 1.00 (1%) |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["autompo"]
  • ["contract"]
  • ["dmrg"]
  • ["indexset", "constructor"]
  • ["indexset", "filter"]
  • ["indexset", "uniqueinds"]
  • ["inplace"]
  • ["op"]
  • ["tagset"]

Julia versioninfo

Target

Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.5 LTS
  uname: Linux 5.4.0-1036-azure #38~18.04.1-Ubuntu SMP Wed Jan 6 18:26:30 UTC 2021 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz: 
              speed         user         nice          sys         idle          irq
       #1  2294 MHz      46954 s          0 s       7590 s      40205 s          0 s
       #2  2294 MHz      58836 s          0 s       4973 s      31334 s          0 s
       
  Memory: 6.791393280029297 GB (2157.578125 MB free)
  Uptime: 970.0 sec
  Load Avg:  1.28515625  1.3349609375  0.92236328125
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, broadwell)

Baseline

Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.5 LTS
  uname: Linux 5.4.0-1036-azure #38~18.04.1-Ubuntu SMP Wed Jan 6 18:26:30 UTC 2021 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz: 
              speed         user         nice          sys         idle          irq
       #1  2294 MHz      87301 s          0 s       8672 s      47979 s          0 s
       #2  2294 MHz      75111 s          0 s       9242 s      60107 s          0 s
       
  Memory: 6.791393280029297 GB (2297.87890625 MB free)
  Uptime: 1466.0 sec
  Load Avg:  1.07373046875  1.279296875  1.07958984375
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, broadwell)

Target result

Benchmark Report for /home/runner/work/ITensors.jl/ITensors.jl

Job Properties

  • Time of benchmark: 3 Feb 2021 - 20:00
  • Package commit: 15e795
  • Julia commit: 44fa15
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

ID time GC time memory allocations
["autompo", "Quartic Hamiltonian"] 9.121 s (5%) 690.329 ms 1.78 GiB (1%) 37768329
["autompo", "Quartic QN Hamiltonian"] 18.966 s (5%) 3.457 s 9.70 GiB (1%) 105988336
["contract", "heff_2site"] 7.751 ms (5%) 19.41 MiB (1%) 258
["contract", "matmul_100"] 73.500 μs (5%) 80.44 KiB (1%) 30
["contract", "matmul_20"] 3.675 μs (5%) 5.48 KiB (1%) 29
["contract", "matmul_40"] 10.500 μs (5%) 14.86 KiB (1%) 29
["contract", "matmul_60"] 28.000 μs (5%) 30.44 KiB (1%) 30
["contract", "matmul_80"] 38.901 μs (5%) 52.31 KiB (1%) 30
["contract", "matmul_inplace_100"] 67.800 μs (5%) 1.38 KiB (1%) 18
["contract", "matmul_inplace_20"] 1.850 μs (5%) 1.38 KiB (1%) 18
["contract", "matmul_inplace_40"] 7.525 μs (5%) 1.38 KiB (1%) 18
["contract", "matmul_inplace_60"] 21.000 μs (5%) 1.38 KiB (1%) 18
["contract", "matmul_inplace_80"] 34.400 μs (5%) 1.38 KiB (1%) 18
["dmrg", "1d_S=1_heisenberg"] 30.235 s (5%) 3.286 s 38.55 GiB (1%) 2553806
["dmrg", "1d_S=1_heisenberg_qn"] 17.041 s (5%) 2.065 s 12.36 GiB (1%) 102368110
["indexset", "constructor", "function"] 582.071 ns (5%) 1.97 KiB (1%) 8
["indexset", "filter", "function"] 1.600 μs (5%) 2.56 KiB (1%) 4
["indexset", "filter", "function_order"] 37.613 ns (5%) 224 bytes (1%) 1
["indexset", "filter", "kwargs"] 389.500 ns (5%) 1.30 KiB (1%) 4
["indexset", "filter", "order_kwargs"] 21.163 ns (5%) 112 bytes (1%) 1
["indexset", "uniqueinds", "3_inputs"] 550.532 ns (5%) 864 bytes (1%) 5
["indexset", "uniqueinds", "filter_not_tags"] 617.614 ns (5%) 1.09 KiB (1%) 9
["indexset", "uniqueinds", "filter_tags"] 486.735 ns (5%) 864 bytes (1%) 5
["indexset", "uniqueinds", "nofilter0"] 2.722 μs (5%) 80 bytes (1%) 1
["indexset", "uniqueinds", "nofilter2"] 576.720 ns (5%) 1.17 KiB (1%) 6
["indexset", "uniqueinds", "order0"] 72.004 ns (5%)
["indexset", "uniqueinds", "order2"] 109.744 ns (5%) 224 bytes (1%) 1
["indexset", "uniqueinds", "order_3_inputs"] 133.908 ns (5%) 112 bytes (1%) 1
["indexset", "uniqueinds", "order_filter_not_tags"] 191.982 ns (5%) 368 bytes (1%) 5
["indexset", "uniqueinds", "order_filter_tags"] 72.999 ns (5%) 112 bytes (1%) 1
["inplace", "axpy!"] 13.500 μs (5%) 2.11 KiB (1%) 33
["op", "op QN"] 8.833 μs (5%) 6.58 KiB (1%) 100
["op", "op"] 4.983 μs (5%) 3.56 KiB (1%) 44
["tagset", "tagset"] 181.143 ns (5%) 80 bytes (1%) 1
["tagset", "tagset_unicode"] 306.478 ns (5%) 80 bytes (1%) 1

Baseline result

Benchmark Report for /home/runner/work/ITensors.jl/ITensors.jl

Job Properties

  • Time of benchmark: 3 Feb 2021 - 20:09
  • Package commit: dc24ee
  • Julia commit: 44fa15
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

ID time GC time memory allocations
["autompo", "Quartic Hamiltonian"] 9.037 s (5%) 686.265 ms 1.78 GiB (1%) 37768329
["autompo", "Quartic QN Hamiltonian"] 19.151 s (5%) 3.511 s 9.70 GiB (1%) 105988336
["contract", "heff_2site"] 7.532 ms (5%) 19.41 MiB (1%) 258
["contract", "matmul_100"] 72.400 μs (5%) 80.44 KiB (1%) 30
["contract", "matmul_20"] 3.688 μs (5%) 5.48 KiB (1%) 29
["contract", "matmul_40"] 10.600 μs (5%) 14.86 KiB (1%) 29
["contract", "matmul_60"] 27.500 μs (5%) 30.44 KiB (1%) 30
["contract", "matmul_80"] 39.200 μs (5%) 52.31 KiB (1%) 30
["contract", "matmul_inplace_100"] 68.601 μs (5%) 1.38 KiB (1%) 18
["contract", "matmul_inplace_20"] 1.820 μs (5%) 1.38 KiB (1%) 18
["contract", "matmul_inplace_40"] 7.425 μs (5%) 1.38 KiB (1%) 18
["contract", "matmul_inplace_60"] 21.400 μs (5%) 1.38 KiB (1%) 18
["contract", "matmul_inplace_80"] 35.100 μs (5%) 1.38 KiB (1%) 18
["dmrg", "1d_S=1_heisenberg"] 30.845 s (5%) 4.003 s 38.55 GiB (1%) 2553806
["dmrg", "1d_S=1_heisenberg_qn"] 17.464 s (5%) 2.467 s 12.36 GiB (1%) 102368110
["indexset", "constructor", "function"] 621.739 ns (5%) 1.97 KiB (1%) 8
["indexset", "filter", "function"] 2.981 s (5%) 851.84 KiB (1%) 16695
["indexset", "filter", "function_order"] 33.902 ns (5%) 224 bytes (1%) 1
["indexset", "filter", "kwargs"] 150.771 ms (5%) 30.26 KiB (1%) 664
["indexset", "filter", "order_kwargs"] 21.163 ns (5%) 112 bytes (1%) 1
["indexset", "uniqueinds", "3_inputs"] 561.170 ns (5%) 864 bytes (1%) 5
["indexset", "uniqueinds", "filter_not_tags"] 646.597 ns (5%) 1.09 KiB (1%) 9
["indexset", "uniqueinds", "filter_tags"] 495.413 ns (5%) 864 bytes (1%) 5
["indexset", "uniqueinds", "nofilter0"] 2.733 μs (5%) 80 bytes (1%) 1
["indexset", "uniqueinds", "nofilter2"] 575.138 ns (5%) 1.17 KiB (1%) 6
["indexset", "uniqueinds", "order0"] 71.798 ns (5%)
["indexset", "uniqueinds", "order2"] 117.667 ns (5%) 224 bytes (1%) 1
["indexset", "uniqueinds", "order_3_inputs"] 133.448 ns (5%) 112 bytes (1%) 1
["indexset", "uniqueinds", "order_filter_not_tags"] 186.687 ns (5%) 368 bytes (1%) 5
["indexset", "uniqueinds", "order_filter_tags"] 73.819 ns (5%) 112 bytes (1%) 1
["inplace", "axpy!"] 14.100 μs (5%) 2.11 KiB (1%) 33
["op", "op QN"] 8.900 μs (5%) 6.58 KiB (1%) 100
["op", "op"] 5.000 μs (5%) 3.56 KiB (1%) 44
["tagset", "tagset"] 187.000 ns (5%) 80 bytes (1%) 1
["tagset", "tagset_unicode"] 307.287 ns (5%) 80 bytes (1%) 1

Runtime information

Runtime Info
BLAS #threads 2
BLAS.vendor() openblas64
Sys.CPU_THREADS 2

lscpu output:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               79
Model name:          Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Stepping:            1
CPU MHz:             2294.684
BogoMIPS:            4589.36
Hypervisor vendor:   Microsoft
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            51200K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt md_clear
Cpu Property Value
Brand Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Vendor :Intel
Architecture :Broadwell
Model Family: 0x06, Model: 0x4f, Stepping: 0x01, Type: 0x00
Cores 2 physical cores, 2 logical cores (on executing CPU)
No Hyperthreading detected
Clock Frequencies Not supported by CPU
Data Cache Level 1:3 : (32, 256, 51200) kbytes
64 byte cache line size
Address Size 48 bits virtual, 46 bits physical
SIMD 256 bit = 32 byte max. SIMD vector size
Time Stamp Counter TSC is accessible via rdtsc
TSC increased at every clock cycle (non-invariant TSC)
Perf. Monitoring Performance Monitoring Counters (PMC) are not supported
Hypervisor Yes, Microsoft

@mtfishman
Member Author

See #618 for a design proposal for tensors that contract and return views instead of copies.

Successfully merging this pull request may close these issues.

Scalar ITensor for QN or BlockSparse Case