Collaborate? #2

Open
nilshg opened this issue Mar 3, 2022 · 1 comment

Comments

@nilshg

nilshg commented Mar 3, 2022

Hey, good to see someone else working on modern causal inference in Julia!

I'm the author of SynthControl and TreatmentPanels, two packages in a similar space.

With TreatmentPanels I'm trying to build a foundational "data prep" package. It takes in a table and a treatment assignment and constructs an object whose type tells you whether the panel is balanced or unbalanced, whether treatment is single- or multi-unit, and whether the treatment is absorbing or switches on and off. It then provides functions to extract things like pre- and post-treatment outcomes, treatment periods, and IDs of treated units.

In SynthControl I'm trying to pull together a bunch of recent methods in this space, starting from the simplest "just use all pretreatment outcomes" case, through the classical Abadie/Diamond/Hainmueller approach, to things like Synthetic Diff-in-Diff and Matrix Completion.

Finally I've also started implementing Sant'Anna/Zhao's DRDID, although that's not public yet (need to check licensing on that).

Maybe have a look at my stuff and see if any of it is useful or if you'd like to collaborate on anything!

@junyuan-chen
Member

Hi, thanks for reaching out! Your work looks interesting.

Regarding handling the panel structure, I actually took a different approach, similar to how GroupedArrays.jl works (which used to be a component of FixedEffects.jl). The key advantage of this approach is performance: there is no need to repeatedly search through the data columns (findfirst inside for loops) for the positions of distinct combinations of treatment assignment, calendar time, etc. Instead, we first label each column using a Dict, similar to how PooledArrays.jl works, to obtain a vector of positive integers with one integer assigned to each unique value in that column. With these transformed vectors, it is possible to combine them through multiplication so that the result gives a distinct "label" for each distinct combination of values across columns. From that, the row indices of each distinct combination can be obtained using a standard Dict-based grouping algorithm.

The algorithm I briefly outlined above is, I believe, among the best-performing approaches to multi-level grouping (when heavier machinery such as multithreading is not needed), and it is also relatively simple to implement. You might want to take a look at how I define findcell here if you are interested.
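
The steps described above can be sketched roughly as follows. This is only an illustration of the general idea, not the actual findcell implementation: the function names `label` and `group_rows` and the example columns are hypothetical, and the combination step assumes a mixed-radix encoding (each column's integer code scaled by the product of the level counts of the preceding columns) as one way to realize the "multiplication" step.

```julia
# Hypothetical helper: replace each distinct value in `col` with a positive
# integer (1, 2, ...) assigned in order of first appearance, using a Dict.
function label(col::AbstractVector{T}) where {T}
    codes = Vector{Int}(undef, length(col))
    pool = Dict{T,Int}()
    for (i, v) in enumerate(col)
        # A new value gets the next unused integer; a seen value keeps its code.
        codes[i] = get!(pool, v, length(pool) + 1)
    end
    return codes, length(pool)
end

# Hypothetical helper: return a Dict mapping one integer label per distinct
# combination of values across `cols` to the row indices of that combination.
function group_rows(cols::AbstractVector...)
    nrow = length(first(cols))
    combined = ones(Int, nrow)
    stride = 1
    for col in cols
        length(col) == nrow || throw(DimensionMismatch("columns must have equal length"))
        codes, nlevels = label(col)
        # Mixed-radix encoding (an assumption of this sketch): scaling by
        # `stride` keeps labels of distinct combinations distinct.
        combined .+= (codes .- 1) .* stride
        stride *= nlevels
    end
    # Standard Dict-based grouping: collect row indices per combined label.
    cells = Dict{Int,Vector{Int}}()
    for (row, key) in enumerate(combined)
        push!(get!(() -> Int[], cells, key), row)
    end
    return cells
end

# Made-up example data: treatment timing and calendar period for five rows.
treat_time = [2004, 2004, 2006, 2006, 2004]
period     = [2003, 2004, 2003, 2004, 2003]
cells = group_rows(treat_time, period)
```

Each column is scanned once and the rows are scanned once more for the final grouping, so no repeated searching is needed. One caveat of this sketch: with many high-cardinality columns the product of level counts could overflow `Int`, which a real implementation would have to guard against.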
