Excel-lent/VBA-performance-tests

layout: post
title: Performance Tips for VBA
published: true
authors: Sancarn

DON'T STAR THIS REPOSITORY BECAUSE IT IS TEMPORARY AND WILL BE DELETED IN THE FUTURE!

THE MAIN DISCUSSION IS HERE

How to improve the performance of VBA macros is a common question on the VBA subreddit. In this section we provide code examples and test their performance against each other. We believe it is best to work by example, and hope this will give you ideas for improving performance in your own projects.

The performance of the routines in this document is measured using stdPerformance.cls, which is part of the stdVBA project.
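Every test below wraps the code under test in a With block around stdPerformance.Measure. A minimal usage sketch (the Measure signature is inferred from how it is used throughout this document, not from its documentation):

```vba
'Minimal usage sketch of stdPerformance, as used in the tests below.
'Measure(description, operationCount) times the enclosed block and
'reports the elapsed time (and per-operation time) when the With block exits.
Const C_MAX As Long = 1000
Dim i As Long
With stdPerformance.Measure("Example test", C_MAX)
  For i = 1 To C_MAX
    'code under test goes here
  Next
End With
```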

Proposals that I implemented in the document and in the table:
  • It would be good to add more space between sections, to separate them from each other.
  • I propose reporting only performance percentages rather than milliseconds, because milliseconds are machine-dependent while percentages are far less so.
  • It would also be worth scaling the C_MAX constants per test case. What I mean is: I prefer to use a Surface Pro 2 (it is enough for my purposes), but with your C_MAX constants the calculations take more than 10 minutes. Would it be better to calculate a scaling constant in the first test and multiply all other C_MAX constants by it?
  • As a user I want to download the workbook, press the button and get the results. Therefore, perhaps the most important proposal is to add the workbook to Releases:

The performance test table is available HERE.

  • The results, in percent, for each Excel version are stored in the sheet "Results". I ran the same workbook on 3 different computers (a Surface Pro 2 with Excel 2007, an older machine with Excel 2010, and an i7 with Excel 2019 64-bit). Sancarn, I do not know which version you have, so your results are shown separately.
  • Perhaps most important of all: I cannot guarantee correctness. As always, there is room for error in the code.

Notations:

  • 2007/32: Excel 2007 32 bit
  • 2010/32: Excel 2010 32 bit
  • 2019/64: Excel 2019 64 bit

FOREWORD: This section assumes you are already familiar with VBA. If you are still new to VBA, check out the resources section.


S1) Direct Value Setting

You don't need to select a cell to manipulate it. The following statements are equivalent:

Source code
Const C_MAX As Long = 1000
Dim i As Long

With stdPerformance.Measure("#1 Select and set", C_MAX)
  For i = 1 To C_MAX
    Cells(1, 1).Select
    Selection.Value = "hello"
  Next
End With

With stdPerformance.Measure("#2 Set directly", C_MAX)
  For i = 1 To C_MAX
    Cells(1, 1).Value = "hello"
  Next
End With

RESULTS

Sancarn 2007/32 2010/32 2019/64
294% 54% 86% 192%


S2) Cut-Paste

Cutting is the act of removing a value from one location and inserting it into another. In this test we compare two options: cut-and-paste, and direct value assignment followed by clearing the source.

Source code
Const C_MAX As Long = 1000
Dim i As Long

With stdPerformance.Measure("#1 Cut and paste", C_MAX)
  For i = 1 To C_MAX
    Cells(1, 1).Cut
    Cells(1, 2).Select
    ActiveSheet.Paste
  Next
End With

With stdPerformance.Measure("#2 Set directly + clear", C_MAX)
  For i = 1 To C_MAX
    Cells(1, 2).Value = Cells(1, 1).Value
    Cells(1, 1).Clear
  Next
End With

RESULTS

As you can see from the results, direct value assignment is significantly faster than cutting and pasting (and it doesn't alter the clipboard).

Sancarn 2007/32 2010/32 2019/64
1505% 235% -80% -15%


S3) Use arrays

Using arrays is a common piece of advice given to beginners looking to improve performance. This test demonstrates why.

Source code
Const C_MAX As Long = 50000
Dim i As Long, r As Range, v As Variant
With stdPerformance.Measure("#1 Looping through individual cells setting values", C_MAX)
  For i = 1 To C_MAX
    Cells(i, 1).Value2 = Rnd()
  Next
End With

With stdPerformance.Measure("#2 Exporting array in bulk, set values, Import array in bulk", C_MAX)
  'GetRange
  Set r = ActiveSheet.Range("A1").Resize(C_MAX, 10)

  'Values of Range --> Array
  v = r.Value2
  
  'Modify array
  For i = 1 To C_MAX  'Using absolute just to be clear no extra work is done, but you'd usually use ubound(v,1)
    v(i, 1) = Rnd()
  Next

  'Values of array  -->  Range
  r.Value2 = v
End With

RESULTS

It can be concluded that, to insert multiple values into a range, it is almost always better to transfer them in bulk via Range(...).Value.

Sancarn 2007/32 2010/32 2019/64
305% 196% 55% 269%


S4) Options

S4a) Option ScreenUpdating

Another common suggestion is to turn ScreenUpdating off. ScreenUpdating generally affects any code that could cause a visual change on the worksheet, e.g. setting a cell's value, changing the colour of a cell, etc.

Source code
Const C_MAX As Long = 1000
Dim i As Long
With stdPerformance.Measure("#1 Looping through individual cells setting values", C_MAX)
  For i = 1 To C_MAX
    cells(1, 1).value = Empty
  Next
End With
With stdPerformance.Measure("#2 w/ ScreenUpdating", C_MAX)
  Application.ScreenUpdating = False
    For i = 1 To C_MAX
      cells(1, 1).value = Empty
    Next
  Application.ScreenUpdating = True
End With

RESULTS

Sancarn 2007/32 2010/32 2019/64
115% -85% 32% -1%

It is important to note, however, that repeatedly setting cells is bad practice, and if you were doing this with arrays instead, toggling ScreenUpdating might have a negative impact: whenever ScreenUpdating is set back to True it forces a screen refresh:

Duplicate parts are removed from the code below. v(1) = Empty is replaced with cells(1, 1).value = Empty (since ScreenUpdating has no influence on array performance, only on the performance of updating cells).

Source code
Const C_MAX As Long = 1000
With stdPerformance.Measure("#2 w/ ScreenUpdating within loop", C_MAX)
    For i = 1 To C_MAX
      Application.ScreenUpdating = False
        cells(1, 1).value = Empty
      Application.ScreenUpdating = True
    Next
End With

RESULTS

Sancarn 2007/32 2010/32 2019/64
3125% 335% -24% -3%


S4b) Option EnableEvents

On many occasions people online will claim that setting EnableEvents to False will greatly speed up your code. But does it really?

Source code
With stdPerformance.Measure("#1 Looping through individual cells setting values", C_MAX)
  For i = 1 To C_MAX
    cells(1, 1).value = Empty
  Next
End With

With stdPerformance.Measure("#2 w/ EnableEvents", C_MAX)
  Application.EnableEvents = False
    For i = 1 To C_MAX
      cells(1, 1).value = Empty
    Next
  Application.EnableEvents = True
End With

RESULTS

Sancarn 2007/32 2010/32 2019/64
71% 1% 0% 12%

It is important to note, however, that repeatedly setting cells is bad practice, and toggling EnableEvents in this way will have a negligible, if not negative, impact depending on where the call is made:

The code below has been modified to remove the duplicated parts. v(1) = Empty is replaced with cells(1, 1).value = Empty, since EnableEvents does not affect array performance, but it does impact the performance of cell updates.

Source code
With stdPerformance.Measure("#3 w/ EnableEvents within loop", C_MAX)
    For i = 1 To C_MAX
      Application.EnableEvents = False
        cells(1, 1).value = Empty
      Application.EnableEvents = True
    Next
End With

RESULTS

Sancarn 2007/32 2010/32 2019/64
5569% -1% 0% 1%

S4c) Option Calculation

Many sources suggest that turning off calculation and performing a single manual calculation after the edits is faster than continuous automatic calculation. In this test we'll test that hypothesis:

Source code
Const C_MAX As Long = 50000
Dim rCell As Range: Set rCell = wsTmp1.Range("A1")
wsTmp1.Range("E1:E1000").Formula = "=RandBetween(1,10)"
With stdPerformance.Measure("Calculation = xlCalculationAutomatic", C_MAX)
  For i = 1 To C_MAX
    rCell.Formula = "=1"
  Next
End With
With stdPerformance.Measure("Calculation = xlCalculationManual", C_MAX)
  Application.Calculation = xlCalculationManual
    For i = 1 To C_MAX
      rCell.Formula = "=1"
    Next
  Application.Calculation = xlCalculationAutomatic
  Application.Calculate
End With

RESULTS

It does indeed appear that disabling calculation and re-enabling it afterwards has a significant impact on the performance of formula evaluation.

Sancarn 2007/32 2010/32 2019/64
203% 1119% 298% 200%

It should be noted, however, that this performance gain is only noticeable when changing cells on the worksheet, either by setting the value directly (range.Value) or by setting the formula (range.Formula). If instead you are setting the values of an array, changing the Calculation mode has a slightly negative impact. Also note that in xlCalculationAutomatic mode every cell containing a formula is recalculated on every change of a cell's value, so the greatest performance gain is achieved when the sheet contains many formulas and xlCalculationManual is applied. Finally, be aware that if you apply xlCalculationManual and then read intermediate values from the sheet, you can obtain stale results, because the formulas have not been recalculated.
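When intermediate values must be read while calculation is in manual mode, forcing a recalculation first avoids the stale-read problem just described. A short sketch (cell addresses are illustrative only):

```vba
'Sketch: force a recalculation before reading formula results in manual mode.
Application.Calculation = xlCalculationManual
Range("A1").Value = 42          'bulk edits happen here; dependent formulas are now stale
Application.Calculate           'recalculate so formulas reflect the new inputs
Dim result As Variant
result = Range("B1").Value      'safe to read now
Application.Calculation = xlCalculationAutomatic
```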



S4d) Option DisplayStatusBar

Some sources suggest turning off the StatusBar will speed up macros, because Excel will no longer have to perform status-bar updates. In this test we'll test that hypothesis:

Source code
Const C_MAX As Long = 5000000
With stdPerformance.Optimise   'Disable screen updating and application events
  Dim v
  With stdPerformance.Measure("Don't change StatusBar", C_MAX)
    For i = 1 To C_MAX
      v = ""
    Next
  End With
  With stdPerformance.Measure("Change StatusBar setting", C_MAX)
    Application.DisplayStatusBar = False
      For i = 1 To C_MAX
        v = ""
      Next
    Application.DisplayStatusBar = True
  End With
End With

RESULTS

As shown, toggling the status bar setting only increases the time the macro takes to run.

Sancarn 2007/32 2010/32 2019/64
-7% -2% 0% 5%

One may suggest that this is only because the StatusBar isn't actually being set, and/or because we are disabling screen updating and application events. So let's remove those factors and test again:

Source code
Const C_MAX As Long = 5000
With stdPerformance.Measure("Change StatusBar setting", C_MAX)
  Application.DisplayStatusBar = False
    For i = 1 To C_MAX
      Application.StatusBar = i
    Next
  Application.DisplayStatusBar = True
End With
With stdPerformance.Measure("Don't change StatusBar", C_MAX)
  For i = 1 To C_MAX
    Application.StatusBar = i
  Next
End With

RESULTS

We still see that toggling the status bar display slows the code down.

Sancarn 2007/32 2010/32 2019/64
-1% 165% 89% -10%

So, the long and short of it: it's best to avoid toggling the status bar unless you really want that feature.



S4e) Option DisplayPageBreaks

Some sources suggest that Option DisplayPageBreaks can be used to prevent Excel from calculating page breaks, thus speeding up macro execution.

Given that this example is much like the others, we'll only display the results this time.

In a similar vein to DisplayStatusBar, displaying page breaks appears to have a negative impact on performance rather than a positive one.

Sancarn 2007/32 2010/32 2019/64
6% -13% -8% -16%
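The omitted test presumably follows the same pattern as the earlier option tests. A sketch of the assumed structure (the original source is not shown in this document; note that DisplayPageBreaks is a Worksheet-level property, and the EnableAnimations and PrintCommunication tests below would be analogous, toggling Application.EnableAnimations and Application.PrintCommunication respectively):

```vba
'Sketch of the omitted DisplayPageBreaks test (assumed structure).
Const C_MAX As Long = 1000
Dim i As Long
With stdPerformance.Measure("#1 Don't change DisplayPageBreaks", C_MAX)
  For i = 1 To C_MAX
    Cells(1, 1).Value = Empty
  Next
End With
With stdPerformance.Measure("#2 Change DisplayPageBreaks setting", C_MAX)
  ActiveSheet.DisplayPageBreaks = False
  For i = 1 To C_MAX
    Cells(1, 1).Value = Empty
  Next
  ActiveSheet.DisplayPageBreaks = True
End With
```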


S4f) Option EnableAnimations

Some sources claim toggling EnableAnimations can speed up performance.

Given that this example is much like the others, we'll only display the results this time.

Much like DisplayStatusBar, changing EnableAnimations appears to have a negative impact on performance rather than a positive one.

Sancarn 2007/32 2010/32 2019/64
2% 6% 1% 7%


S4g) Option PrintCommunication

Some sources claim toggling PrintCommunication can speed up performance.

Given that this example is much like the others, we'll only display the results this time.

Much like DisplayStatusBar, changing PrintCommunication appears to have a negative impact on performance rather than a positive one.

Sancarn 2007/32 2010/32 2019/64
2% - 0% 19%


S5) Using With statements

It is often advised to use a With statement to increase performance, but how significantly does this affect performance? We will test 3 scenarios:

  1. Accessing the object path directly.
  2. Using a With statement.
  3. Creating a helper variable to perform the data lookup.

The results vary wildly depending on how deep within an object you go; for this reason we have created a special helper class to perform this test:

Source code
VERSION 1.0 CLASS
BEGIN
  MultiUse = -1  'True
END
Attribute VB_Name = "oClass"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = False
Attribute VB_Exposed = False
Public Self As oClass
Public Data As Boolean    

Private Sub Class_Initialize()
  Set Self = Me
  Data = True
End Sub

The Self reference above allows us to chain member accesses and bury down to arbitrary depths of the object, to see how depth affects the results.

Tests have been done up to 10 levels of depth. Here's the tests for the first 3 levels:

Source code
Const C_MAX As Long = 10000000
Dim o As New oClass
Dim x As oClass

'Depth of 1
With stdPerformance.Measure("D1,No with block", C_MAX)
    For i = 1 To C_MAX
      v = o.Data
    Next
End With
With stdPerformance.Measure("D1,With block", C_MAX)
    With o
      For i = 1 To C_MAX
        v = .Data
      Next
    End With
End With
With stdPerformance.Measure("D1,Variable", C_MAX)
    Set x = o
    For i = 1 To C_MAX
      v = x.Data
    Next
End With

'Depth of 2    
With stdPerformance.Measure("D2,No with block", C_MAX)
    For i = 1 To C_MAX
      v = o.Self.Data
    Next
End With
With stdPerformance.Measure("D2,With block", C_MAX)
    With o.Self
      For i = 1 To C_MAX
        v = .Data
      Next
    End With
End With
With stdPerformance.Measure("D2,Variable", C_MAX)
    Set x = o.Self
    For i = 1 To C_MAX
      v = x.Data
    Next
End With

'Depth of 3
With stdPerformance.Measure("D3,No with block", C_MAX)
    For i = 1 To C_MAX
      v = o.Self.Self.Data
    Next
End With
With stdPerformance.Measure("D3,With block", C_MAX)
    With o.Self.Self
      For i = 1 To C_MAX
        v = .Data
      Next
    End With
End With
With stdPerformance.Measure("D3,Variable", C_MAX)
    Set x = o.Self.Self
    For i = 1 To C_MAX
      v = x.Data
    Next
End With
...

RESULTS

You'll note that With blocks behave exactly like the use of variables. This is because, internally, a With statement assigns the object it is given to a hidden variable. In general you can see that both With statements and variables scale well to greater and greater depths. If, on the other hand, no With block or variable is used, the time taken keeps growing with depth:

Raw timings:
D1,No with block: 250 ms (0.025µs per operation)
D1,With block: 250 ms (0.025µs per operation)
D1,Variable: 234 ms (0.0234µs per operation)
D2,No with block: 563 ms (0.0563µs per operation)
D2,With block: 234 ms (0.0234µs per operation)
D2,Variable: 234 ms (0.0234µs per operation)
D3,No with block: 844 ms (0.0844µs per operation)
D3,With block: 250 ms (0.025µs per operation)
D3,Variable: 235 ms (0.0235µs per operation)
D4,No with block: 1156 ms (0.1156µs per operation)
D4,With block: 234 ms (0.0234µs per operation)
D4,Variable: 250 ms (0.025µs per operation)
D5,No with block: 1610 ms (0.161µs per operation)
D5,With block: 250 ms (0.025µs per operation)
D5,Variable: 234 ms (0.0234µs per operation)
D6,No with block: 1812 ms (0.1812µs per operation)
D6,With block: 250 ms (0.025µs per operation)
D6,Variable: 235 ms (0.0235µs per operation)
D7,No with block: 2109 ms (0.2109µs per operation)
D7,With block: 234 ms (0.0234µs per operation)
D7,Variable: 250 ms (0.025µs per operation)
D8,No with block: 2391 ms (0.2391µs per operation)
D8,With block: 234 ms (0.0234µs per operation)
D8,Variable: 250 ms (0.025µs per operation)
D9,No with block: 2703 ms (0.2703µs per operation)
D9,With block: 235 ms (0.0235µs per operation)
D9,Variable: 234 ms (0.0234µs per operation)
D10,No with block: 3031 ms (0.3031µs per operation)
D10,With block: 235 ms (0.0235µs per operation)
D10,Variable: 250 ms (0.025µs per operation)

A chart view of this data can be found here.


Sancarn 2007/32 2010/32 2019/64
1190% 1215% 1437% 1479%

A few important notes:

  1. We only get performance benefits when the depth is greater than 2. In this case, however, there are no negative performance impacts either.
  2. This demonstrates that variables obtain the same performance benefits as With blocks, which can help get around some limitations of With blocks. E.g. imagine setting values in a sheet to values from an object:
Source code
With ThisWorkbook.Sheets(1)
  With myObject.someData
    'This will error because inside this block "." refers to myObject.someData,
    'which has no Range property - Range belongs to the Worksheet object.
    .Range("...").value = .dataProperty
  End With
End With

Here we can use variables to work around the limitation, while still getting performance benefits:

Source code
Dim ws As Worksheet: Set ws = ThisWorkbook.Sheets(1)
With myObject.someData
  ws.Range("...").value = .dataProperty
End With


S6) Late vs Early Binding

It is common knowledge that late binding is slower than early binding. Let's measure by how much.

Source code
'C_MAX and C_MAX2 are defined elsewhere
Dim i As Long
Dim r1 As Object, r2 As VBScript_RegExp_55.RegExp
With stdPerformance.Measure("#A-1 Late bound creation", C_MAX)
  For i = 1 To C_MAX
    Set r1 = CreateObject("VBScript.Regexp")
  Next
End With
With stdPerformance.Measure("#A-2 Early bound creation", C_MAX)
  For i = 1 To C_MAX
    Set r2 = New VBScript_RegExp_55.RegExp
  Next
End With

Set r1 = CreateObject("VBScript.Regexp")
With stdPerformance.Measure("#B-1 Late bound calls", C_MAX2)
  For i = 1 To C_MAX2
    r1.pattern = "something"
  Next
End With

Set r2 = New VBScript_RegExp_55.RegExp
With stdPerformance.Measure("#B-2 Early bound calls", C_MAX2)
  For i = 1 To C_MAX2
    r2.pattern = "something"
  Next
End With

RESULTS Late bound vs early bound creation:

Sancarn 2007/32 2010/32 2019/64
3% -19% -8% -6%

Late bound vs early bound calls:

Sancarn 2007/32 2010/32 2019/64
698% 639% 609% 1018%

Method calls on objects declared as Dim ... As Object (late-bound objects) are significantly slower than method calls on strictly typed objects, e.g. Dim ... As RegExp or Dim ... As Dictionary.

Contrary to popular belief, this has nothing to do with the use of CreateObject(...). In fact, we can test this:

Source code
Set r2 = CreateObject("VBScript.Regexp")
With stdPerformance.Measure("#B-3 Early bound calls via CreateObject", C_MAX2)
  For i = 1 To C_MAX2
    r2.pattern = "something"
  Next
End With

This will perform exactly the same as #B-2.

It is suspected that Object performs so much slower than strict types because Object goes through the object's IDispatch interface: whenever a method is called, IDispatch::GetIDsOfNames is called followed by IDispatch::Invoke. Given the poor performance, it appears the DispIDs are not cached for reuse within the function body, but are looked up on every call. A faster approach would be to call IDispatch::Invoke ourselves with known DispIDs in scenarios where the object type is known. E.g.

Source code
'Note `DispGetIDsOfNames` and `DispLetProp` are purely fictional functions, but would likely
'improve performance.
Set r1 = CreateObject("VBScript.Regexp")
Dim ids() As Long: ids = DispGetIDsOfNames(r1, Array("pattern"))
With stdPerformance.Measure("#B-4 Late bound fast dispatch", C_MAX2)
  For i = 1 To C_MAX2
    Call DispLetProp(r1, ids(0), "something")
  Next
End With

CAVEATS

Using direct references to types makes your code less portable than using Object type with CreateObject.
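A common way to keep portability while still getting early-bound speed where available (a standard VBA pattern, not from the original document) is conditional compilation:

```vba
'Sketch: switch between early and late binding with a compiler constant.
'Set EarlyBind to False to ship without the VBScript_RegExp_55 reference.
#Const EarlyBind = True

Sub BindingExample()
#If EarlyBind Then
  Dim rx As VBScript_RegExp_55.RegExp
  Set rx = New VBScript_RegExp_55.RegExp
#Else
  Dim rx As Object
  Set rx = CreateObject("VBScript.Regexp")
#End If
  rx.Pattern = "something"
End Sub
```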



S7) ByRef vs ByVal

The functions testByVal and testByRef have the same body, Cells(1, 1).Value2 = Rnd(), to slow down execution and obtain a meaningful measure of the speed-up.

Source code
v = Split(Space(1000)," ")
With stdPerformance.measure("#1 `ByVal`", C_MAX)
  For i = 1 to C_MAX
    Call testByVal(v)
  Next
End With
With stdPerformance.measure("#2 `ByRef`", C_MAX)
  For i = 1 to C_MAX
    Call testByRef(v)
  Next
End With

RESULTS

Sancarn 2007/32 2010/32 2019/64
- 19% 74% 218%
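For completeness, the two helpers referenced above might look like this (a hypothetical reconstruction; the original definitions are not shown in this document):

```vba
'Hypothetical reconstructions of the helpers used in the ByVal/ByRef test.
Public Sub testByVal(ByVal v As Variant)   'copies the array on every call
  Cells(1, 1).Value2 = Rnd()
End Sub

Public Sub testByRef(ByRef v As Variant)   'passes a reference; no copy made
  Cells(1, 1).Value2 = Rnd()
End Sub
```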


S8) Module vs Class

Many people, myself included, use classes extensively in our projects to structure data. However, object-oriented programming is only one option: the same sort of thing can be done in a purely procedural way using a Type and module functions.

Source code

First we need to create our Class and Module:

Car1 class

B-1 and B-2 are removed because the performance is nearly the same.

VERSION 1.0 CLASS
BEGIN
  MultiUse = -1  'True
END
Attribute VB_Name = "Car1"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = True
Attribute VB_Exposed = False
Public name As String
Public SeatCount As Long
Public DoorCount As Long
Public Distance As Long

Public Function Create(ByVal sName As String, iSeatCount As Long, iDoorCount As Long) As Car1
  Set Create = New Car1
  Call Create.protInit(sName, iSeatCount, iDoorCount)
End Function
Friend Sub protInit(ByVal sName As String, iSeatCount As Long, iDoorCount As Long)
  name = sName
  SeatCount = iSeatCount
  DoorCount = iDoorCount
End Sub

Public Sub Tick()
  Distance = Distance + 1
End Sub

Car2 Module

Attribute VB_Name = "Car2"
Public Type CarData
  name As String
  SeatCount As Long
  DoorCount As Long
  Distance As Long
End Type

Public Function Car_Create(ByVal sName As String, iSeatCount As Long, iDoorCount As Long) As CarData
  Car_Create.name = sName
  Car_Create.SeatCount = iSeatCount
  Car_Create.DoorCount = iDoorCount
End Function

Public Sub Car_Tick(ByRef data As CarData)
  data.Distance = data.Distance + 1
End Sub

Now we can test the performance of these 2 modules as follows:

Const C_MAX  As Long = 500000

Dim i As Long
Dim c1 As Car1
Dim c2 As Car2.CarData

With stdPerformance.Measure("A-#1 Object creation (Class)", C_MAX)
  For i = 1 To C_MAX
    Set c1 = Car1.Create("hello", 10, 2)
  Next
End With
With stdPerformance.Measure("A-#2 Object creation (Module)", C_MAX)
  For i = 1 To C_MAX
    c2 = Car2.Car_Create("hello", 10, 2)
  Next
End With
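The removed B tests (method calls), whose results are reported below, presumably followed the same pattern; a sketch of the assumed structure:

```vba
'Sketch of the removed method-call tests (assumed structure).
With stdPerformance.Measure("B-#1 Method calls (Class)", C_MAX)
  For i = 1 To C_MAX
    c1.Tick
  Next
End With
With stdPerformance.Measure("B-#2 Method calls (Module)", C_MAX)
  For i = 1 To C_MAX
    Call Car2.Car_Tick(c2)
  Next
End With
```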

RESULTS

From these results we can see that class creation and initialisation is clearly significantly slower than module struct creation; the calling of methods (and thus also the setting of properties), however, is similarly performant.

Module vs Class Object creation

Sancarn 2007/32 2010/32 2019/64
463% 540% 581% 670%

Module vs Class Object method calls

Sancarn 2007/32 2010/32 2019/64
0% 45% 72% 66%


S9) Variant vs Typed Data

It is often suggested that you should use Typed data wherever possible for performance reasons.

Source code
Const C_MAX  As Long = 5000000

Dim i As Long
With stdPerformance.Measure("#1 - Variant", C_MAX)
  Dim v() As Variant
  ReDim v(1 To C_MAX)
  For i = 1 To C_MAX
    v(i) = i
  Next
End With
With stdPerformance.Measure("#2 - Type", C_MAX)
  Dim l() As Long
  ReDim l(1 To C_MAX)
  For i = 1 To C_MAX
    l(i) = i
  Next
End With

RESULTS

In general this only yields a real benefit when dealing with very large datasets (>5 million elements), but the benefit is real, so type your data wherever you can.

Sancarn 2007/32 2010/32 2019/64
32% 16% 61% 81%


S10) Bulk range operations

It was anticipated that using Application.Union() to build one large range and then calling Delete on it once would be faster than deleting each row individually. This turned out to be incorrect.

Source code
Const C_MAX as long = 5000

Dim i As Long, rng As Range
Range("A1:X" & C_MAX).value = "Some cool data here"

With stdPerformance.Measure("#1 Delete rows 1 by 1", C_MAX)
  For i = C_MAX To 1 Step -1
    'Delete only even rows
    If i Mod 2 = 0 Then
      Rows(i).Delete
    End If
  Next
End With

With stdPerformance.Measure("#2 Delete all rows in a single operation less branching", C_MAX)
  Set rng = cells(Rows.count, 1)
  For i = 1 To C_MAX
    If i Mod 2 = 0 Then
      Set rng = Application.Union(rng, Rows(i))
    End If
  Next i
  Set rng = Application.Intersect(rng, Range("1:" & C_MAX))
  rng.Delete
End With

RESULTS

Sancarn 2007/32 2010/32 2019/64
-45% -16% 6% 5%


S11) When to use advanced filters

In many performance tutorials you will be told to use the AdvancedFilter to speed up filter operations on data. One might think that AdvancedFilter is so much faster than hand-crafted techniques that it is always better to use it.

In order to test this out we'll need some data. In this tutorial I'll use the following function to generate a section of data for us to filter on:

Source code
'Obtain an array of data to C_MAX size.
'@param {Long} Max length of data
'@returns {Variant(nx2)} Returns an array of data 2 columns wide and n rows deep. 
Public Function getArray(C_MAX As Long) As Variant
  Dim arr() As Variant
  ReDim arr(1 To C_MAX, 1 To 2)
  
  arr(1, 1) = "ID"
  arr(1, 2) = "Key"
  Dim i As Long
  For i = 2 To C_MAX
    'ID
    arr(i, 1) = i
    Select Case True
      Case i Mod 17 = 0: arr(i, 2) = "A"
      Case i Mod 13 = 0: arr(i, 2) = "B"
      Case i Mod 11 = 0: arr(i, 2) = "C"
      Case i Mod 7 = 0: arr(i, 2) = "D"
      Case Else
        arr(i, 2) = "E"
    End Select
  Next
  getArray = arr
End Function

In this test, we will experiment with 3 different scenarios:

  1. We have data in Sheet1 and we want to obtain a filtered copy in Sheet2.
  2. We already have an array in VBA and we want to filter its contents.
  3. We have data in Sheet1 and we want to obtain a filtered copy as an array.

S11a) Sheet to Sheet

In scenario 1, we have some data in Sheet1 and we'd like to obtain a copy of this data on Sheet2, filtered where the Key = "A"

I have changed the sequence: the first method listed is always the slow one, the second is always the fast one.

Source code
'In this test we are going to try the same thing but where the data is already on the sheet
Sub scenario1()
  'Initialisation. Initialise a sheet containing data and an output sheet containing headers.
  Dim arr As Variant: arr = getArray(1000000)
  Dim iNumRow As Long: iNumRow = UBound(arr, 1) - LBound(arr, 1) + 1
  Dim iNumCol As Long: iNumCol = UBound(arr, 2) - LBound(arr, 2) + 1
  Sheet1.Range("A1").Resize(iNumRow, iNumCol).Value = arr
  Sheet2.UsedRange.Clear
  
  With stdPerformance.Optimise()
    'Copy data from sheet into an array,
    'loop through rows, move filtered rows to top of array,
    'only return required size of array
    Sheet2.UsedRange.Clear
    With stdPerformance.Measure("#2 Use of array")
      Dim v: v = Sheet1.Range("A1").CurrentRegion.Value
      Dim iRowLen As Long: iRowLen = UBound(v, 1)
      Dim iColLen As Long: iColLen = UBound(v, 2)
      
      Dim i As Long, j As Long, iRet As Long
      iRet = 1
      For i = 2 To iRowLen
        If v(i, 2) = "A" Then
          iRet = iRet + 1
          If iRet < i Then
            For j = 1 To iColLen
              v(iRet, j) = v(i, j)
            Next
          End If
        End If
      Next
      
      Sheet2.Range("A1").Resize(iRet, iColLen).Value = v
    End With

    'Use advanced filter, copy result and paste to new location. Use range.currentRegion.value to obtain result
    With stdPerformance.Measure("#1 Advanced filter and copy")
      'Choose headers
      Sheet2.Range("A1:B1").Value = Array("ID", "Key")
      
      'Choose filter
      HiddenSheet.Range("A1:A2").Value = Application.Transpose(Array("Key", "A"))
      'filter and copy data
      With Sheet1.Range("A1").CurrentRegion
        Call .AdvancedFilter(xlFilterCopy, HiddenSheet.Range("A1:A2"), Sheet2.Range("A1:B1"))
      End With

      'Cleanup
      HiddenSheet.UsedRange.Clear
    End With
    
  End With
End Sub

RESULTS

Sancarn 2007/32 2010/32 2019/64
-23% 982% 962% 238%


S11b) Array to Array

In scenario 2, we have some data in an array, and want a filtered set of IDs as an output. Is it worthwhile writing this data to the sheet and using AdvancedFilter? Or would it be better to filter the array in memory?

Source code
Sub scenario2()
  'Obtain test data
  Dim arr As Variant: arr = getArray(100000)
  
  'Some of these tests may take some time, so optimise to ensure these don't have an impact
  With stdPerformance.Optimise()
    'Use advanced filter, copy result and paste to new location. Use range.currentRegion.value to obtain result
    Dim vResult
    With stdPerformance.Measure("#1 Advanced filter and copy - array result")
      'Get data dimensions
      Dim iNumRow As Long: iNumRow = UBound(arr, 1) - LBound(arr, 1) + 1
      Dim iNumCol As Long: iNumCol = UBound(arr, 2) - LBound(arr, 2) + 1
      
      'Create filters data
      HiddenSheet.Range("A1:A2").Value = Application.Transpose(Array("Key", "A"))
      With HiddenSheet.Range("A4").Resize(iNumRow, iNumCol)
        'Dump data to sheet
        .Value = arr
        
        'Call advanced filter
        Call .AdvancedFilter(xlFilterInPlace, HiddenSheet.Range("A1:A2"))
        
        'Get result
        .Resize(, 1).Copy HiddenSheet.Range("D4")
        vResult = HiddenSheet.Range("D4").CurrentRegion.Value
      End With
      
      'Cleanup
      HiddenSheet.ShowAllData
      HiddenSheet.UsedRange.Clear
    End With
    
    'Use advanced filter, extract results by looping over the range areas
    Dim vResult2() As Variant
    With stdPerformance.Measure("#2 Advanced filter and areas loop - array result")
      'get dimensions
      iNumRow = UBound(arr, 1) - LBound(arr, 1) + 1
      iNumCol = UBound(arr, 2) - LBound(arr, 2) + 1
      
      'Store capacity for at least iNumRow items in result
      ReDim vResult2(1 To iNumRow)
      
      'Create filters data
      HiddenSheet.Range("A1:A2").Value = Application.Transpose(Array("Key", "A"))
      With HiddenSheet.Range("A4").Resize(iNumRow, iNumCol)
        'Set data and call filter
        .Value = arr
        Call .AdvancedFilter(xlFilterInPlace, HiddenSheet.Range("A1:A2"))
        
        'Loop over all visible cells and dump data to array
        Dim rArea As Range, vArea As Variant, i As Long, iRes As Long
        iRes = 0
        For Each rArea In .Resize(, 1).SpecialCells(xlCellTypeVisible).Areas
          vArea = rArea.Value
          If rArea.CountLarge = 1 Then
            iRes = iRes + 1
            vResult2(iRes) = vArea
          Else
            For i = 1 To UBound(vArea, 1)
              iRes = iRes + 1
              vResult2(iRes) = vArea(i, 1)
            Next
          End If
        Next
        
        'Trim size of array to total number of inserted elements
        ReDim Preserve vResult2(1 To iRes)
      End With
      
      'Cleanup
      HiddenSheet.ShowAllData
      HiddenSheet.UsedRange.Clear
    End With
    
    'Use a VBA filter
    Dim vResult3() As Variant: iRes = 0
    With stdPerformance.Measure("#3 Array filter - array result")
      'Get total row count
      iNumRow = UBound(arr, 1) - LBound(arr, 1) + 1
      
      'Make result at least the same number of rows as the base data (We can't return more data than rows in our source data)
      ReDim vResult3(1 To iNumRow)
      
      'Loop over rows, filter condition and assign to result
      For i = 1 To iNumRow
        If arr(i, 2) = "A" Then
          iRes = iRes + 1
          vResult3(iRes) = arr(i, 1)
        End If
      Next
      
      'Trim array to total result size
      ReDim Preserve vResult3(1 To iRes)
    End With
    
    'Use a VBA filter - return a collection
    'This algorithm is much the same as the above, however we simply add results to a collection instead of to an array. Collections are generally a fast way to have dynamic sizing data.
    Dim cResult As Collection
    With stdPerformance.Measure("#4 Array filter - collection result")
      Set cResult = New Collection
      iNumRow = UBound(arr, 1) - LBound(arr, 1) + 1
      For i = 1 To iNumRow
        If arr(i, 2) = "A" Then
          cResult.Add arr(i, 1)
        End If
      Next
    End With
  End With
End Sub

RESULTS

Sancarn 2007/32 2010/32 2019/64
66% 12% 1610% 82%

In reality the time taken for the array/collection methods isn't 0 ms, but the calculation was so fast that it is impossible to tell how much faster one is than the other on such a small amount of data.



S11c) Sheet to Array

In our final scenario, we have some data in Sheet1 and we'd like to obtain a copy of this data in an array which we require for further processing.

In this case, which is faster: using AdvancedFilter, or pure array logic?

Source code
Sub scenario3()
  'Initialisation
  Dim arr As Variant: arr = getArray(1000000)
  Dim iNumRow As Long: iNumRow = UBound(arr, 1) - LBound(arr, 1) + 1
  Dim iNumCol As Long: iNumCol = UBound(arr, 2) - LBound(arr, 2) + 1
  Sheet1.Range("A1").Resize(iNumRow, iNumCol).Value = arr
  
  With stdPerformance.Optimise()
    'Use advanced filter, copy result and paste to new location. Use range.currentRegion.value to obtain result
    Dim vResult1
    With stdPerformance.Measure("#1 Advanced filter and copy")
      'Choose output headers
      HiddenSheet.Range("A4:B4").Value = Array("ID", "Key")
      
      'Choose filters
      HiddenSheet.Range("A1:A2").Value = Application.Transpose(Array("Key", "A"))
      
      'Call advanced filter
      With Sheet1.Range("A1").CurrentRegion
        Call .AdvancedFilter(xlFilterCopy, HiddenSheet.Range("A1:A2"), HiddenSheet.Range("A4:B4"))
      End With
      
      'Obtain results
      vResult1 = HiddenSheet.Range("A4").CurrentRegion.Value
      
      'Cleanup
      HiddenSheet.UsedRange.Clear
    End With
    
    'Array
    'Copy data from the sheet into an array,
    'loop through rows, move filtered rows to the top of the array,
    'and blank out values from any rows which aren't required.
    Dim vResult2
    With stdPerformance.Measure("#2 Use of array")
      vResult2 = Sheet1.Range("A1").CurrentRegion.Value
      Dim iRowLen As Long: iRowLen = UBound(vResult2, 1)
      Dim iColLen As Long: iColLen = UBound(vResult2, 2)
      
      Dim i As Long, j As Long, iRet As Long
      iRet = 1
      For i = 2 To iRowLen
        If vResult2(i, 2) = "A" Then
          iRet = iRet + 1
          If iRet < i Then
            For j = 1 To iColLen
              vResult2(iRet, j) = vResult2(i, j)
              vResult2(i, j) = Empty
            Next
          End If
        Else
          vResult2(i, 1) = Empty
          vResult2(i, 2) = Empty
        End If
      Next
    End With
  End With
End Sub

RESULTS

| Sancarn | 2007/32 | 2010/32 | 2019/64 |
| ------- | ------- | ------- | ------- |
| 13%     | -74%    | -80%    | -29%    |

In summary, AdvancedFilter and arrays are fairly comparable. Use AdvancedFilter if you are trying to keep all data within Excel and don't need the data in VBA afterwards. However, if you've got the data in VBA to begin with, or you need it in VBA afterwards, an array-based filter is preferable.



S12) DLLs vs VBA Libraries

We often think of compiled C libraries as being faster alternatives to pure VBA code, and in some circumstances this is definitely the case. However, in this test we are going to compare the following two subs:

Source code
Public Declare PtrSafe Sub VariantCopy Lib "oleaut32.dll" (ByRef pvargDest As Variant, ByRef pvargSrc As Variant)

Public Sub VariantCopyVBA(ByRef v1, ByVal v2)
  If IsObject(v2) Then
    Set v1 = v2
  Else
    v1 = v2
  End If
End Sub

Generally speaking, these two subs perform the same task: they copy the data within one variant to another variant. Let's compare their performance:

Sub testVariantCopy()
  Const C_MAX As Long = 1000000
  Dim v1, v2
  Dim i As Long
  With stdPerformance.Optimise
    With stdPerformance.Measure("#1 DLL", C_MAX)
      For i = 1 To C_MAX
        Call VariantCopy(v1, v2)
      Next
    End With
    With stdPerformance.Measure("#2 VBA", C_MAX)
      For i = 1 To C_MAX
        Call VariantCopyVBA(v1, v2)
      Next
    End With
  End With
End Sub

RESULTS

So this suggests there is some overhead involved in executing DLL functions, which can make them significantly slower than equivalent homebrew VBA function calls.

| Sancarn | 2007/32 | 2010/32 | 2019/64 |
| ------- | ------- | ------- | ------- |
| -93%    | 742%    | 245%    | -95%    |

Given the slow speeds, this likely comes down to one of the following:

  1. Lack of caching on DLL load
  2. Lack of caching of function address (via GetProcAddress()).
  3. Preparing the arguments of the function call.


S13) Helper functions

Helper functions are often used to make code look clean, but do they have a performance impact? To test this, the following 10 chained functions will be called:

Source code
Function Help1() As Boolean
  Help1 = True
End Function
Function Help2() As Boolean
  Help2 = Help1()
End Function
Function Help3() As Boolean
  Help3 = Help2()
End Function
Function Help4() As Boolean
  Help4 = Help3()
End Function
Function Help5() As Boolean
  Help5 = Help4()
End Function
Function Help6() As Boolean
  Help6 = Help5()
End Function
Function Help7() As Boolean
  Help7 = Help6()
End Function
Function Help8() As Boolean
  Help8 = Help7()
End Function
Function Help9() As Boolean
  Help9 = Help8()
End Function
Function Help10() As Boolean
  Help10 = Help9()
End Function

The test itself looks like this:

Const C_MAX As Long = 1000000
Dim v As Boolean
Dim i As Long

With stdPerformance.Measure("Help0")
  For i = 1 To C_MAX
    v = True
  Next
End With
With stdPerformance.Measure("Help1", 1 * C_MAX)
  For i = 1 To C_MAX
    v = Help1()
  Next
End With
With stdPerformance.Measure("Help2", 2 * C_MAX)
  For i = 1 To C_MAX
    v = Help2()
  Next
End With
With stdPerformance.Measure("Help3", 3 * C_MAX)
  For i = 1 To C_MAX
    v = Help3()
  Next
End With
...

RESULTS

In this particular scenario it appears that calls 1-10 were affected to the same degree: each additional function call in the chain adds roughly 0.03µs. In general this isn't something to worry about; however, if you are trying to write the most optimal code possible, it is something to consider.

Help0:   15 ms (0.015µs per operation)
Help1:   31 ms (0.031µs per operation)
Help2:   63 ms (0.031µs per operation)
Help3:   94 ms (0.031µs per operation)
Help4:  141 ms (0.035µs per operation)
Help5:  172 ms (0.034µs per operation)
Help6:  188 ms (0.031µs per operation)
Help7:  218 ms (0.031µs per operation)
Help8:  266 ms (0.033µs per operation)
Help9:  297 ms (0.033µs per operation)
Help10: 312 ms (0.031µs per operation)


D) Data structures

D1) Dictionary (Optimising data lookups)

Dictionaries are often used to optimise the lookup of data, as they provide an optimal means of doing these kinds of lookups. In this scenario we compare five separate approaches, including using a dictionary.

Source code
Sub dictionaryTest()
  'Arrays vs dictionary
  Const C_MAX As Long = 5000000
  Dim arr: arr = getArray(C_MAX)
  Dim arrLookup: arrLookup = getLookupArray()
  Dim dictLookup: Set dictLookup = getLookupDict()
  Dim dictLookup2 As Dictionary: Set dictLookup2 = dictLookup 'Early binding - requires a reference to "Microsoft Scripting Runtime"
  Dim i As Long, j As Long, iVal As Long
  
  With stdPerformance.Optimise
    With stdPerformance.Measure("#1 Lookup in array - Naive approach", C_MAX)
      For i = 2 To C_MAX
        'Lookup key in arrLookup
        For j = 1 To 5
          If arr(i, 2) = arrLookup(j, 1) Then
            iVal = arrLookup(j, 2)
            Exit For
          End If
        Next
      Next
    End With
    
    With stdPerformance.Measure("#2 Lookup in dictionary - late binding", C_MAX)
      For i = 2 To C_MAX
        'Lookup key in dict
        iVal = dictLookup(arr(i, 2))
      Next
    End With
    
    With stdPerformance.Measure("#3 Lookup in dictionary - early binding", C_MAX)
      For i = 2 To C_MAX
        'Lookup key in dict
        iVal = dictLookup2(arr(i, 2))
      Next
    End With
    
    With stdPerformance.Measure("#4 Generate through logic", C_MAX)
      For i = 2 To C_MAX
        'Generate value from key
        iVal = getLookupFromCalc(arr(i, 2))
      Next
    End With
    
    With stdPerformance.Measure("#5 Generate through logic direct", C_MAX)
      For i = 2 To C_MAX
        'Generate value from key
        Dim iChar As Long: iChar = Asc(arr(i, 2)) - 64
        iVal = iChar * 10
        If iChar = 5 Then iVal = 99
      Next
    End With
  End With
End Sub

'Obtain an array of data to C_MAX size.
'@param {Long} Max length of data
'@returns {Variant(nx2)} Returns an array of data 2 columns wide and n rows deep.
Public Function getArray(C_MAX As Long) As Variant
  Dim arr() As Variant
  ReDim arr(1 To C_MAX, 1 To 2)

  arr(1, 1) = "ID"
  arr(1, 2) = "Key"
  Dim i As Long
  For i = 2 To C_MAX
    'ID
    arr(i, 1) = i
    Select Case True
      Case i Mod 17 = 0: arr(i, 2) = "A"
      Case i Mod 13 = 0: arr(i, 2) = "B"
      Case i Mod 11 = 0: arr(i, 2) = "C"
      Case i Mod 7 = 0: arr(i, 2) = "D"
      Case Else
        arr(i, 2) = "E"
    End Select
  Next
  getArray = arr
End Function

'Obtain dictionary to lookup key,value pairs
Public Function getLookupDict() As Object
  Dim o As Object: Set o = CreateObject("Scripting.Dictionary")
  o("A") = 10: o("B") = 20: o("C") = 30: o("D") = 40: o("E") = 99
  Set getLookupDict = o
End Function

'Obtain array to lookup key,value pairs
Public Function getLookupArray() As Variant
  Dim arr()
  ReDim arr(1 To 5, 1 To 2)
  arr(1, 1) = "A": arr(1, 2) = 10
  arr(2, 1) = "B": arr(2, 2) = 20
  arr(3, 1) = "C": arr(3, 2) = 30
  arr(4, 1) = "D": arr(4, 2) = 40
  arr(5, 1) = "E": arr(5, 2) = 99
  getLookupArray = arr
End Function

'Obtain value from key
Public Function getLookupFromCalc(ByVal key As String) As Long
  Dim iChar As Long: iChar = Asc(key) - 64
  getLookupFromCalc = iChar * 10
  If iChar = 5 Then getLookupFromCalc = 99
End Function

RESULTS

From the results below we can see that the naive approach of looping through the lookup array not only produces uglier code, but is also significantly slower. By comparison, the dictionary was both clean and fast.

The other two tests show that generating the value directly from the key can often be faster than looking it up in a Dictionary, though this may not be possible in many situations.

Finally, #5 was included because it was surprising that inlining the function body in the loop would have that much of an impact on performance.

#1 Lookup in array: 2812 ms (0.5624µs per operation)
#2 Lookup in dictionary - late binding: 1047 ms (0.2094µs per operation)
#3 Lookup in dictionary - early binding: 360 ms (0.072µs per operation)
#4 Generate through logic: 1015 ms (0.203µs per operation)
#5 Generate through logic direct: 469 ms (0.0938µs per operation)

Lookup in array vs in dictionary (late binding):

| Sancarn | 2007/32 | 2010/32 | 2019/64 |
| ------- | ------- | ------- | ------- |
| 169%    | 287%    | 209%    | 210%    |

Lookup in dictionary (late binding) vs in dictionary (early binding):

| Sancarn | 2007/32 | 2010/32 | 2019/64 |
| ------- | ------- | ------- | ------- |
| 191%    | 18%     | 266%    | 100%    |

Generate through logic vs direct:

| Sancarn | 2007/32 | 2010/32 | 2019/64 |
| ------- | ------- | ------- | ------- |
| 116%    | 122%    | 126%    | 127%    |


A1) Advanced Array Manipulation

This section needs further work to include runnable examples instead of only explaining the concepts.

Further reading:

A1a) Resizing and Transposing an array

If you're really keen to ReDim an array lightning fast, you can change the dimensions of the array using CopyMemory and the SafeArray struct.

This at least provides an x-dimensional to y-dimensional translation.

Resizing or transposing an array is tougher. Let's say we wanted to resize the following array:

a11,a12,a13
a21,a22,a23
a31,a32,a33

Internally this is represented as:

a11,a21,a31,a12,a22,a32,a13,a23,a33

If we want to add a column we simply add 3 elements onto the end:

a11,a21,a31,a12,a22,a32,a13,a23,a33,NEW14,NEW24,NEW34

This is why VBA is capable of ReDim-ing this bound: it is easy to do. If we want to add a row, however, we need to rearrange the array as follows:

a11,a21,a31,NEW41,a12,a22,a32,NEW42,a13,a23,a33,NEW43

Pending implementation example...

In this case there is still some room for a performance gain (we could use CopyMemory to copy UBound(arr,1) elements at a time into positions 1, 5 and 9 in this example), but it's not easy to achieve. This is likely one of the reasons why VBA doesn't support ReDim-ing this bound.
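The shuffling described above can be sketched with plain loops. This is a hedged illustration only — `AddRow` is a hypothetical helper name, not part of the test suite, and a CopyMemory version would copy whole columns at once rather than element by element:

```vb
'Sketch: grow the FIRST dimension of a 2D array by one row.
'ReDim Preserve can only grow the LAST dimension, so we must allocate a
'new array and copy every element across - this is the cost the text refers to.
Public Function AddRow(ByVal arr As Variant) As Variant
  Dim iRows As Long: iRows = UBound(arr, 1)
  Dim iCols As Long: iCols = UBound(arr, 2)

  Dim vNew() As Variant
  ReDim vNew(1 To iRows + 1, 1 To iCols)

  Dim i As Long, j As Long
  For j = 1 To iCols      'column-major: each column is contiguous in memory
    For i = 1 To iRows
      vNew(i, j) = arr(i, j)
    Next
    'vNew(iRows + 1, j) is the NEW element slotted at the end of each column
  Next
  AddRow = vNew
End Function
```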

If we want to Transpose the array we need to change it into a structure like:

a11,a12,a13,a21,a22,a23,a31,a32,a33

Transposing isn't really something you can optimise at the programming-language level; for that you'd need to optimise the machine code itself.

The most common way to perform this operation quickly is to use LAPACK or the CUDA SDK. (This is what e.g. Python and Fortran delegate to, and part of why they are so well suited to statistical programming.)
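For reference, the baseline such libraries beat is the naive nested-loop transpose. A sketch (`TransposeArr` is an illustrative name, not part of the test suite):

```vb
'Naive transpose: every element is moved individually, and the read/write
'pattern fights the column-major layout, which is why cache-aware native
'libraries win at this.
Public Function TransposeArr(ByVal arr As Variant) As Variant
  Dim iRows As Long: iRows = UBound(arr, 1)
  Dim iCols As Long: iCols = UBound(arr, 2)

  Dim vOut() As Variant
  ReDim vOut(1 To iCols, 1 To iRows)

  Dim i As Long, j As Long
  For i = 1 To iRows
    For j = 1 To iCols
      vOut(j, i) = arr(i, j)
    Next
  Next
  TransposeArr = vOut
End Function
```

`Application.Transpose` performs the same job, but is reported to have size limits in older Excel versions; either way, both approaches are O(rows × cols).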



ST) String operations

Simple concatenation of two strings (string3 = string1 & string2) is fast as long as the number of concatenations is small. For example, to store a 150 x 150 array of doubles as text you have to convert it to a single string, and building that string by repeated concatenation takes a long time. To speed up the conversion there is a built-in function, Join, which takes a 1D array and concatenates its elements into a string.

Source code
stringContent = ""
With stdPerformance.Measure("#1 Concatenation " & C_MAX, C_MAX ^ 2)
    For i = 1 To C_MAX
        For j = 1 To C_MAX
            stringContent = stringContent & notBig2DArray(i, j) & ", "
        Next j
        stringContent = stringContent & vbCrLf
    Next i
End With

stringContent = ""
Dim tmpStrings() As Variant
ReDim tmpStrings(1 To C_MAX)
Dim tmpDoubles() As Variant
ReDim tmpDoubles(1 To C_MAX)
With stdPerformance.Measure("#2 Join " & C_MAX, C_MAX ^ 2)
    For i = 1 To C_MAX
        For j = 1 To C_MAX
            tmpDoubles(j) = notBig2DArray(i, j)
        Next j
        tmpStrings(i) = Join(tmpDoubles, ", ")
    Next i
    stringContent = Join(tmpStrings, vbCrLf)
End With

It should also be mentioned that repeated concatenation exhibits quadratic growth of execution time (each concatenation copies the entire string built so far), whereas the Join function exhibits almost linear growth in execution time with the size of the array:

(Chart: comparison of measured Join and concatenation times as a function of square-array size)

RESULTS

Performance gain of Join relative to concatenation for conversion of 140*140 matrix to a string:

| Sancarn | 2007/32 | 2010/32 | 2019/64 |
| ------- | ------- | ------- | ------- |
| -       | 6033%   | 6731%   | 2594%   |


W) Writing to file

The majority of answers on StackOverflow regarding file writing in VBA recommend using FSO (FileSystemObject). In this chapter we demonstrate that the built-in VB statements Print and Put can accomplish the task more quickly. Each test case below writes a string containing a 2D matrix to a 44.6 MB file. In all cases the files were written to a slow SD card.

W1) FileSystemObject vs standard writing function (Print)

Source code
With stdPerformance.Measure("#1 Standard writing function", C_MAX)
    iFile = FreeFile
    Open Application.ActiveWorkbook.Path & "\" & fileName1 For Output As #iFile
    Print #iFile, strFileContent
    Close #iFile
End With

With stdPerformance.Measure("#2 FSO", C_MAX)
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set oFile = fso.CreateTextFile(Application.ActiveWorkbook.Path & "\" & fileName2)
    oFile.WriteLine strFileContent
    oFile.Close
    Set fso = Nothing
    Set oFile = Nothing
End With

RESULTS

| Sancarn | 2007/32 | 2010/32 | 2019/64 |
| ------- | ------- | ------- | ------- |
| -       | 34%     | -1%     | 56%     |


W2) FSO vs binary writing

Source code
With stdPerformance.Measure("#3 FSO", C_MAX)
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set oFile = fso.CreateTextFile(Application.ActiveWorkbook.Path & "\" & fileName1)
    oFile.WriteLine strFileContent
    oFile.Close
    Set fso = Nothing
    Set oFile = Nothing
End With

With stdPerformance.Measure("#4 Binary writing function", C_MAX)
    iFile = FreeFile
    Open Application.ActiveWorkbook.Path & "\" & fileName2 For Binary As #iFile
    Put #iFile, , strFileContent
    Close #iFile
End With

RESULTS

| Sancarn | 2007/32 | 2010/32 | 2019/64 |
| ------- | ------- | ------- | ------- |
| -       | 34%     | 28%     | 31%     |