@MarkDana
Collaborator

Bugs solved:

  1. Cache refresh:
  • Manually refresh the cache at every init. Added one refresh line at SkeletonDiscovery.py:L145 (for PC) and FCI.py:L615 (for FCI).
  • Deprecate hash(data.tobytes()) - it's slow. @chenweiDelight also mentions a faster hash(str(data)) - but let's just refresh at init and use no hash.
  2. Passing cardinalities to chisq or gsq:
  • Added cardinalities and is_discrete at Fas.py:L10, and now calculate cardinalities only once at SkeletonDiscovery.py:L165 (for PC) and FCI.py:L625 (for FCI). No need to call np.max() every time.
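
To make the two fixes concrete, here is a minimal sketch of the idea rather than the actual code in this PR. The class name `DiscreteCITestSketch` and the method `contingency_shape` are hypothetical, and the sketch assumes discrete values are coded as integers 0..k-1, so a column's cardinality is its max plus one.

```python
import numpy as np


class DiscreteCITestSketch:
    """Hypothetical stand-in for a chisq/gsq test object; not the library's API."""

    def __init__(self, data):
        self.data = np.asarray(data)
        # Fix 1: every new instance starts with an empty cache, so results
        # computed on a previous dataset can never leak in. No data hash needed.
        self.cache = {}
        # Fix 2: compute per-column cardinalities once at init
        # (assumption: values are integer codes 0..k-1).
        self.cardinalities = self.data.max(axis=0) + 1

    def contingency_shape(self, x, y, cond=()):
        """Shape of the contingency table for x, y given cond, read from the
        stored cardinalities instead of calling np.max() on the data again."""
        key = (x, y, tuple(sorted(cond)))
        if key not in self.cache:
            cols = [x, y, *key[2]]
            self.cache[key] = tuple(int(self.cardinalities[c]) for c in cols)
        return self.cache[key]


data = np.array([[0, 1, 2], [1, 0, 0], [0, 1, 1], [1, 1, 2]])
test = DiscreteCITestSketch(data)
print(test.cardinalities)
print(test.contingency_shape(0, 2, [1]))
```

Dropping the hash trades a little repeated work across runs on identical data for never having to scan or serialize the whole array just to build a cache key.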

TODO: still debugging:

Current PC results with and without fas (toggled at PC.py:L71):

data        #nodes/#edges  time without fas (sec)  time with fas (sec)  SHD(without fas, with fas)
cancer      5/4            0.009                   0.009                0
earthquake  5/4            0.012                   0.01                 0
survey      6/6            0.014                   0.015                0
asia        8/8            0.024                   0.025                0
sachs       11/17          0.142                   0.148                0
child       20/25          0.618                   0.699                4
insurance   27/52          1.45                    1.823                3
water       32/66          0.321                   0.365                0
alarm       37/46          0.873                   1.015                1
barley      48/84          3.466                   4.95                 0
hailfinder  56/66          0.938                   1.588                0
hepar2      70/123         9.482                   11.949               3
win95pts    76/112         3.381                   4.504                0
andes       223/338        26.741                  45.722               0
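
For readers unfamiliar with the SHD column, the sketch below shows one common way to compute a structural Hamming distance between two result graphs. It is an assumption, not necessarily the metric used for the numbers above: graphs are taken as square adjacency matrices where `a[i, j] == 1` means an edge mark from i to j, and each node pair whose marks differ in either direction (missing, extra, or reversed edge) counts as 1. The function name `shd` is hypothetical.

```python
import numpy as np


def shd(a, b):
    """Count node pairs whose edge marks differ between graphs a and b.

    Assumed encoding: a[i, j] == 1 means an edge mark from i to j.
    A reversed edge counts once under this variant.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    # A pair (i, j) differs if the i->j entry or the j->i entry differs.
    diff = (a != b) | (a.T != b.T)
    upper = np.triu_indices(a.shape[0], k=1)  # visit each unordered pair once
    return int(diff[upper].sum())


without_fas = np.array([[0, 1, 0],
                        [0, 0, 1],
                        [0, 0, 0]])  # 0 -> 1, 1 -> 2
with_fas = np.array([[0, 0, 0],
                     [1, 0, 1],
                     [0, 0, 0]])     # 1 -> 0, 1 -> 2
print(shd(without_fas, with_fas))
```

An SHD of 0 between the two runs means both code paths returned the same graph under this encoding.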

So three open problems:

  1. Still a small SHD difference on some datasets, e.g. child.
  2. Still a time difference, e.g. andes: 26s without fas vs 45s with fas - though before solving the cardinalities issue, it was ~300s.
  3. See profiling stats: the number of calls to chisq differs with and without fas.
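
One lightweight way to compare call counts between the two code paths, besides a full profiler run, is a counting wrapper around the test function. This is only a sketch: `count_calls` and `fake_ci_test` are hypothetical names, and `fake_ci_test` is a placeholder for chisq, not the real test.

```python
import functools


def count_calls(fn):
    """Wrap fn so that wrapper.calls records how many times it was invoked."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def fake_ci_test(x, y, cond=()):
    # Placeholder for chisq: always returns a p-value of 1.0.
    return 1.0


# Stand-in for one skeleton search: test every pair of three nodes once.
for x, y in [(0, 1), (0, 2), (1, 2)]:
    fake_ci_test(x, y)
print(fake_ci_test.calls)
```

Running the search once per code path and logging the tested (x, y, cond) triples alongside the counter would show which extra tests one path performs.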

@MarkDana
Collaborator Author

Closed this PR - the two issues (cache for different data & passing cardinalities to chisq/gsq) have already been covered in @chenweiDelight's newest commit (732b9d1).
