Skewed Data Distributions and Homoscedasticity #81

vdemchenko3 · 2023-03-21T12:05:24Z

Hi,

I'm wondering what's the best approach for data that is highly right-skewed. Is it best to take a log transform of it to make it more "normal" or does DirectLiNGAM deal with skewed data? The causal graphs are substantially different if I take the log and then normalise the data compared to only normalising the data and keeping the skewed distribution. I couldn't find the implementations of Hyvarinen & Smith 2013 for skewed data.

Also, my understanding is that LiNGAM is specifically made for non-Gaussian distributions, but I'm a bit confused about how this impacts the adjacency matrix computation using linear regression since from my understanding non-Gaussian distributions violate homoscedasticity.

Any clarity on these two topics would be greatly appreciated!

sshimizu2006 · 2023-03-22T01:06:18Z

You don't have to take a log transform to make variables more normal. Non-Gaussianity itself does not necessarily violate homoscedasticity (constant variance).
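To make the second point concrete, here is a minimal sketch on simulated data, not taken from the thread. It assumes a hypothetical pair `x`, `y` with `y = 2*x + e`, where the noise `e` is a centred exponential (strongly right-skewed) drawn independently of `x`. All names and parameter values are made up for illustration. The residuals of an ordinary least-squares fit should then be clearly skewed while having roughly the same variance for small and large `x`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data: uniform cause, right-skewed noise independent of x.
x = rng.uniform(0.0, 1.0, size=n)
e = rng.exponential(1.0, size=n) - 1.0   # mean 0, variance 1, skewed
y = 2.0 * x + e

# Ordinary least squares; np.polyfit returns the highest degree first.
slope, intercept = np.polyfit(x, y, deg=1)
resid = y - (slope * x + intercept)

def skewness(v):
    """Sample skewness: third central moment over variance**1.5."""
    c = v - v.mean()
    return (c**3).mean() / (c**2).mean() ** 1.5

# Compare the residual variance for small x against large x.
low = resid[x < 0.5]
high = resid[x >= 0.5]
var_ratio = high.var() / low.var()
resid_skew = skewness(resid)

print(f"slope estimate:          {slope:.3f}")
print(f"residual skewness:       {resid_skew:.3f}")
print(f"variance ratio high/low: {var_ratio:.3f}")
```

A variance ratio close to 1 together with a clearly non-zero skewness is what "non-Gaussian but homoscedastic" looks like in this toy setting. Heteroscedasticity would need the noise spread to depend on `x`, which is a separate property from the shape of the noise distribution.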

vdemchenko3 · 2023-04-25T13:01:35Z

Hi,

Thank you for your reply!

What about scaling the data such that all variables are in [0,1]? I've run analyses both with and without scaling and found significantly different DAGs.

sshimizu2006 · 2023-04-27T04:15:27Z

If you transform your data, the data-generating process will change. That would be the reason you get different results.

vdemchenko3 · 2023-04-27T09:03:02Z

I see, so is the suggestion to not change the data at all (no minmax scaling, no log transforms) before running causal discovery?

sshimizu2006 · 2023-04-27T13:46:25Z

Well, my point is that it depends on the class of the data generation process you assume.

vdemchenko3 · 2023-04-27T14:04:32Z

Could you elaborate a bit on that? I'm mostly working with survey-type data where respondents answer various questions.

sshimizu2006 · 2023-04-27T22:05:02Z

Ok, well, my suggestion is that you can do log transforms if you find that previous works in your field do that, but it would be better not to do minmax scaling.

vdemchenko3 · 2023-04-28T08:44:19Z

Why is it better not to do minmax scaling?

sshimizu2006 · 2023-05-02T03:17:47Z

I don't have a strong reason. I just don't often see minmax scaling used in the context of causal discovery. The point is that if you apply some transformation and then run LiNGAM, for example, you are assuming a linear non-Gaussian model for the transformed data. It is necessary to think about whether that assumption is valid.
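One way to think about that validity is to check, on simulated data, what a transformation does to a model that is linear with independent noise by construction. The sketch below is only an illustration under stated assumptions: the variables `x`, `y`, the coefficient 1.5, and the helper names `spread_ratio` and `minmax` are all hypothetical, and nothing here calls the lingam package. It fits a straight line before and after each transformation and compares the residual variance for small versus large values of the regressor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical linear non-Gaussian pair: skewed cause, uniform noise.
x = 0.1 + rng.exponential(1.0, size=n)   # strictly positive, so log is defined
e = rng.uniform(0.0, 1.0, size=n)        # non-Gaussian, independent of x
y = 1.5 * x + e                          # also strictly positive

def spread_ratio(u, v):
    """Fit v = a*u + b by least squares and return
    var(residual | u below median) / var(residual | u at or above median)."""
    a, b = np.polyfit(u, v, deg=1)
    resid = v - (a * u + b)
    low = u < np.median(u)
    return resid[low].var() / resid[~low].var()

def minmax(v):
    """Rescale a 1-D array to the interval [0, 1]."""
    return (v - v.min()) / (v.max() - v.min())

ratio_raw = spread_ratio(x, y)
ratio_minmax = spread_ratio(minmax(x), minmax(y))
ratio_log = spread_ratio(np.log(x), np.log(y))

print(f"raw data:       {ratio_raw:.3f}")
print(f"min-max scaled: {ratio_minmax:.3f}")
print(f"log-log:        {ratio_log:.3f}")
```

In this toy setting the raw ratio should sit near 1, and the min-max ratio should match it, because min-max scaling is an affine map of each variable and only rescales the coefficient and the residuals. The log-log ratio should come out noticeably above 1: after the log transform the relationship is no longer a straight line with noise independent of the regressor, so a linear non-Gaussian model that held for the raw data does not automatically hold for the transformed data. This says nothing about how a particular LiNGAM implementation reacts to rescaling; it only illustrates the modelling question raised above.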
