Adding update / replace to fmerge #19

felixholub · 2018-02-14T10:35:25Z

Would it be possible to extend fmerge to allow for update or replace?


sergiocorreia · 2018-02-14T11:08:01Z

Yes. It would involve a bit of Mata work, but I'm pressed for time for the next month or so, so I can't promise an update.

For reference (in case you want to try it, or for future me), I think it would involve changing line 381 of join.ado (https://github.com/sergiocorreia/ftools/blob/master/src/join.ado#381):

	// Check that variables don't exist yet
	msg = "{err}merge:  variable %s already exists in master dataset\n"
	for (i=1; i<=cols(deck); i++) {
		var = deck[i]
		if (_st_varindex(var) != .) {
			printf(msg, var)
			exit(108)
		}
	}

Instead of raising an error when the variable already exists, if the -update- option is on you would have to create a temporary variable (st_tempname()?) and then replace row i of varnames_num.

Then, after the Mata code has finished running, run something like:

	replace original_var = tempvar if mi(original_var)
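A minimal Stata sketch of that post-processing step, assuming the values from the using dataset have already been stored in a separate variable. The helper name apply_update and its replace option are hypothetical (not part of ftools), and the toy data stand in for what join would produce. The intent is to mimic merge's -update- (fill only missing master values) and -update replace- (any nonmissing using value overwrites the master value).

```stata
* Hypothetical helper (not part of ftools): fold a variable holding the
* using-data values into the master variable, mimicking merge's
* -update- (default) or -update replace- (with the replace option).
capture program drop apply_update
program define apply_update
    syntax varlist(min=2 max=2) [, REPLACE]
    * First variable: master values; second: values from the using data
    gettoken mastervar usingvar : varlist
    if "`replace'" == "" {
        * -update-: fill in only the master values that are missing
        quietly replace `mastervar' = `usingvar' if missing(`mastervar')
    }
    else {
        * -update replace-: any nonmissing using value overwrites master
        quietly replace `mastervar' = `usingvar' if !missing(`usingvar')
    }
end

* Toy data standing in for the result of join: price is the master
* variable, price_using plays the role of the temporary variable
clear
input id price price_using
1 10 .
2 .  20
3 30 35
end

generate p_update = price
apply_update p_update price_using

generate p_replace = price
apply_update p_replace price_using, replace

list
```

With these toy values, p_update keeps the master's 30 in the third row while p_replace takes the using value 35; neither overwrites a value with a missing one.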

felixholub · 2018-02-14T12:26:45Z

Thanks for the explanation, Sergio. It's nothing urgent, just something that I stumble upon every once in a while. Maybe I can use your hint to practice my Stata coding ;-)


-- Felix Holub

aghaynes · 2019-06-07T09:58:37Z

mmerge has some really useful options, e.g. unmatched() (which unmatched observations to keep: none, both, master, using), umatch() (for when variables in using are named differently than in master), and uname() (add a stub to the names of variables in using).

Are there any plans to add features like these to fmerge?

sergiocorreia · 2019-06-08T00:14:09Z

Hi Alan,

Can you explain a bit more what these options do? I installed mmerge from SSC, but I'm not entirely sure what unmatched() does that merge's keep() doesn't.
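For comparison, a sketch of how the unmatched() choices listed above could be expressed with merge's built-in keep() option. The mapping in the comments assumes mmerge's unmatched() means what the description above says (which unmatched observations to keep); the toy datasets are made up.

```stata
* Toy using dataset, saved to a temporary file
clear
input id y
2 20
3 30
end
tempfile using_data
save `using_data'

* Toy master dataset
clear
input id x
1 1
2 2
end

* Assumed correspondence between mmerge's unmatched() and merge's keep():
*   unmatched(none)   ~ keep(match)
*   unmatched(master) ~ keep(master match)
*   unmatched(using)  ~ keep(using match)
*   unmatched(both)   ~ keep(master using match)   (merge's default)
merge 1:1 id using `using_data', keep(master match)

* id 1 is kept as master-only (_merge==1); id 3 (using-only) is dropped
list
```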

Regarding umatch(), that can already be done with the join command. fmerge is in fact a wrapper around join (whose syntax is more familiar to me). For instance, suppose you have a panel of consumers (where t is the year identifier) and want to add some macro data from a dataset where year is the year identifier.

With merge, you do:

	rename t year
	merge m:1 year using "annual_data", keepusing(gdp inflation)
	rename year t

With join, you do:

	join gdp inflation, from("annual_data") by(t=year)

(Note how the join syntax looks more like that of collapse, and is more explicit about which variables get added.)

aghaynes · 2019-07-03T07:14:20Z

Hi Sergio,

Ignore my message. You're completely right; it's all possible with the other options. (The main advantage of mmerge is that it's a bit more verbose in its reporting.)

I wasn't aware of join. I'll be looking into it a bit more; I have some quite large datasets that take merge/mmerge a long time to combine.

Thanks!!

luispfonseca · 2019-09-09T14:50:05Z

@aghaynes mentioned one option that, as far as I can see, join doesn't support and that could be useful: uname() adds a stub to the names of the variables from the using data. This makes it easy to tell which variables were pre-existing and which are new, e.g. for comparison.

sergiocorreia · 2019-09-09T15:10:21Z

Agreed, that should be useful and simple to implement. That said, uname() isn't an easy-to-remember option name, so maybe stub(), prefix(), or something like that?

luispfonseca · 2019-09-09T15:14:56Z

Yes, I agree. Either of these seems fine. stub() seems to be commonly used, but I'd say prefix() is more intuitive if you haven't come across the option before.
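Until such an option exists, one possible workaround is to add the prefix to the using dataset before merging. A sketch, where the prefix u_ and the toy data are assumptions; built-in merge is used so the snippet runs without ftools, though the same prefixed file should also work as the from() file of join.

```stata
* Toy using dataset: annual macro data keyed by year
clear
input year gdp inflation
2000 100 2
2001 103 3
end

* Add the (hypothetical) prefix u_ to every variable except the key
quietly ds year, not
foreach v of varlist `r(varlist)' {
    rename `v' u_`v'
}
tempfile annual
save `annual'

* Toy master dataset that already contains its own gdp variable
clear
input id year gdp
1 2000 99
1 2001 104
end

* The pre-existing gdp and the new u_gdp / u_inflation now coexist
merge m:1 year using `annual', nogenerate
list
```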

ArthurHowardMorris · 2020-11-12T08:26:09Z

For what it's worth, prefix() is the pattern used by frget in Stata 16.
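For reference, a sketch of that frget pattern, assuming Stata 16 or later (frames). The frame name annual, the prefix u_, and the toy data are made up for illustration.

```stata
* Requires Stata 16+ (frames). Drop a leftover frame from a previous run.
capture frame drop annual

* Toy annual data, copied into its own frame
clear
input year gdp inflation
2000 100 2
2001 103 3
end
frame put year gdp inflation, into(annual)

* Toy master data in the current frame, with its own gdp variable
clear
input id year gdp
1 2000 99
1 2001 104
end

* Link on year (creates link variable annual), then copy with a prefix
frlink m:1 year, frame(annual)
frget gdp inflation, from(annual) prefix(u_)
list
```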
