
[ENH] Feature constructor optimization #5975

Merged: 3 commits, merged on May 23, 2022


ales-erjavec
Contributor

Issue

Ref: #5911

Parallel (or alternative) to #5949

Description of changes

Avoids iterating over table rows, performing checks in inner loops, and constructing Value instances.
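To illustrate the general idea (this is not the actual Orange code), here is a minimal hypothetical sketch in plain NumPy. `eval_per_row` evaluates an expression such as `Var0 + Var1` once per row, with a check inside the loop, while `eval_vectorized` binds whole columns as arrays and evaluates the expression a single time. Both function names are made up for this example, and the sketch does not model Orange's Value class.

```python
import numpy as np


def eval_per_row(columns, expr):
    """Slow path (hypothetical): evaluate the expression once per row.

    Each iteration builds a dict of scalars, calls eval, and runs a
    check inside the inner loop.
    """
    code = compile(expr, "<expr>", "eval")
    n = len(next(iter(columns.values())))
    out = np.empty(n)
    for i in range(n):
        row = {name: col[i] for name, col in columns.items()}
        value = eval(code, {"__builtins__": {}}, row)
        # Per-row check executed N times
        out[i] = np.nan if value is None else float(value)
    return out


def eval_vectorized(columns, expr):
    """Fast path (hypothetical): bind whole columns and evaluate once."""
    code = compile(expr, "<expr>", "eval")
    result = eval(code, {"__builtins__": {}}, dict(columns))
    return np.asarray(result, dtype=float)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cols = {"Var0": rng.random(5), "Var1": rng.random(5)}
    # Both paths produce the same column; only the cost differs
    assert np.allclose(eval_per_row(cols, "Var0 + Var1"),
                       eval_vectorized(cols, "Var0 + Var1"))
```

The vectorized path only works when every operation in the expression applies element-wise to arrays (arithmetic operators, NumPy ufuncs); a scalar-only function such as `math.sqrt` raises a TypeError on an array, so an implementation along these lines presumably needs a per-row fallback for such expressions.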

Includes
  • Code changes
  • Tests
  • Documentation

@codecov

codecov bot commented May 17, 2022

Codecov Report

Merging #5975 (92b2864) into master (369f2c3) will increase coverage by 0.02%.
The diff coverage is 98.90%.

```diff
@@            Coverage Diff             @@
##           master    #5975      +/-   ##
==========================================
+ Coverage   86.40%   86.43%   +0.02%     
==========================================
  Files         315      315              
  Lines       67155    67218      +63     
==========================================
+ Hits        58025    58097      +72     
+ Misses       9130     9121       -9     
```
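The Codecov percentages follow from the table: coverage is hits as a percentage of tracked lines, and in this report every tracked line is either a hit or a miss (no partial lines are listed). A quick check, using a helper name made up for the example:

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Percentage of tracked lines executed by the tests (hypothetical helper)."""
    return 100 * hits / lines


master = coverage_pct(58025, 67155)
pr = coverage_pct(58097, 67218)

# Lines = Hits + Misses on both branches
assert 58025 + 9130 == 67155 and 58097 + 9121 == 67218

print(f"master: {master:.2f}%")        # master: 86.40%
print(f"#5975: {pr:.2f}%")             # #5975: 86.43%
print(f"delta: {pr - master:.3f} pp")  # delta: 0.026 pp
```

The unrounded change is about 0.026 percentage points; Codecov displays it as +0.02%.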

@ales-erjavec
Contributor Author

Benchmarked using

```python
import statistics
import timeit

import numpy as np

from Orange.data import ContinuousVariable, Domain, Table
from Orange.widgets.data.owfeatureconstructor import ContinuousDescriptor, \
    construct_variables

N = 100_000  # number of rows in the test table
table = new_domain = None


def setup():
    """Build a random N x 100 table and a domain with one constructed feature."""
    global table, new_domain
    var_names = ['Var' + str(i) for i in range(100)]
    variables = [ContinuousVariable(name=var_name) for var_name in var_names]
    domain = Domain(variables)
    test_data = np.random.rand(N, len(variables))
    table = Table.from_numpy(domain, test_data)

    # Constructed feature: the sum of the first two columns
    expr = 'Var0 + Var1'
    desc = ContinuousDescriptor(expr, expr, number_of_decimals=None)
    constr_vars = construct_variables([desc], table)
    new_domain = Domain(table.domain.attributes + tuple(constr_vars))


def run():
    # Transforming into the new domain computes the constructed column
    table.transform(new_domain)


# setup() runs before each of the 10 repeats; each repeat times a single run()
res = timeit.repeat(
    "run()", setup="setup()", globals=globals(), repeat=10, number=1
)
print(f"Avg: {statistics.mean(res):.4f}, Min: {min(res):.4f}, Std: {statistics.stdev(res):.4f}")
```

Results (seconds per transform of N = 100,000 rows):

Before: Avg: 0.9430, Min: 0.9309, Std: 0.0086
After: Avg: 0.0451, Min: 0.0447, Std: 0.0002
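The size of the improvement can be read directly off these timings. A quick check of the before/after ratio (the helper name `speedup` is only for illustration):

```python
def speedup(before_s: float, after_s: float) -> float:
    """Ratio of the old timing to the new one; higher means faster."""
    return before_s / after_s


# Timings in seconds, copied from the benchmark output
print(f"avg: {speedup(0.9430, 0.0451):.1f}x")  # avg: 20.9x
print(f"min: {speedup(0.9309, 0.0447):.1f}x")  # min: 20.8x
```

Both the mean and the best-case timings improve by roughly 21x.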

@markotoplak
Member

Wow, 20x speedup!

ales-erjavec force-pushed the feature-constructor-opt branch 2 times, most recently from 9424cfe to d133f43, on May 18, 2022 14:14
VesnaT merged commit 67e9629 into biolab:master on May 23, 2022
ales-erjavec deleted the feature-constructor-opt branch on April 26, 2024 09:10