content.json

{"posts":[{"title":"[Pinned] Personal search site","text":"Search site homepage","link":"/2024/05/02/0-Homepage/"},{"title":"[Pinned] Installing Stata user-written commands","text":"Commonly used Stata user-written commands 1 User-written commands\n* reghdfe version management\nnet install reghdfe, from(&quot;.\\reghdfe-6.12.4\\reghdfe&quot;) replace\nnet install ivreghdfe, from(&quot;.\\ivreghdfe-1.1.3&quot;) replace\nnet install ftools, from(&quot;.\\ftools-2.49.1&quot;)\nssc install ivreg2, replace\nssc install corr2docx, replace\nssc install reg2docx, replace\nssc install psmatch2, replace\nssc install winsor2, replace\nssc install asreg, replace\nssc install ardl, replace\nssc install bdiff, replace\nssc install coefplot, replace\nssc install ebalance, replace\nssc install egenmore, replace\nssc install fs, replace\nssc install inlist2, replace\nssc install konfound, replace\nssc install mipolate, replace\nssc install moss, replace\nssc install overid, replace\nssc install openall, replace\nssc install psacalc, replace // psacalc2, which supports reghdfe, has to be downloaded from GitHub\nssc install runby, replace\nssc install sicff, replace\nssc install ttable3, replace\nssc install weakivtest, replace\nssc install xtscc, replace\nssc install xtmipolateu, replace\nssc install require, replace\nssc install summarizeby, replace\nssc install ranktest, replace\nssc install dpplot, replace\nssc install ivreg2h, replace // lewbel iv\nssc install factortest, replace\nssc install center, replace\nssc install bcuse, replace\nssc install indeplist, replace\nssc install matsort, replace\nssc install bacondecomp, replace\nssc install csdid, replace\nssc install drdid, replace\nssc install fuzzydid, replace\nssc install chowtest, replace\nssc install ereplace, replace\nssc install keeporder, replace\nssc install gtools, replace\nnet install st0085_2.pkg, replace // the whole est family of commands\nnet install st0373.pkg, replace // threshold regression; versions that support fixed effects can also be found online, worth a search\n2 profile.do\nset type double // make new variables created by generate default to double precision\nset memory 100m // allocate 100m of memory to Stata\nset matsize 2000 // set the matrix dimension to 2000x2000\nset scrollbufsize 500000 // 
size of the Results window scrollback buffer (in bytes)\nset more off, perma // turn off the --more-- pagination prompt\n// set timeout1 120\nsysdir set PLUS &quot;`c(sysdir_stata)'ado\\plus&quot; // where user-written commands are stored\n* sysdir set PLUS &quot;`c(sysdir_stata)'ado\\plus_ok&quot; // where user-written commands are stored\nsysdir set OLDPLACE &quot;`c(sysdir_stata)'ado\\personal&quot; // self-written Stata programs\nsysdir set PERSONAL &quot;`c(sysdir_stata)'ado\\personal&quot; // personal folder location\n// add a search path\n* adopath + &quot;`c(sysdir_stata)'ado\\personal\\r_regressions&quot;\n// change the working directory\ncd &quot;D:\\code\\Stata&quot;","link":"/2024/06/20/9-Stata%20%E5%A4%96%E9%83%A8%E5%91%BD%E4%BB%A4%E5%AE%89%E8%A3%85/"},{"title":"[Pinned] Speeding up GitHub and fixing image display","text":"Speed up GitHub and fix images that fail to load 1 Copy the hosts entries provided by the repository Only article images are stored in the blog repository; all other images are stored on a GitHub image host. This follows the approach of a GitHub user: https://github.com/maxiaof/github-hosts Copy the following into your hosts file (I will also update the addresses on this blog from time to time)\n#Github Hosts Start\n#Update Time: 2024-05-29\n#Project Address: https://github.com/maxiaof/github-hosts\n#Update URL: https://raw.githubusercontent.com/maxiaof/github-hosts/master/hosts\n140.82.113.25 alive.github.com\n140.82.113.25 live.github.com\n185.199.109.154 github.githubassets.com\n140.82.112.21 central.github.com\n185.199.109.133 desktop.githubusercontent.com\n185.199.110.153 assets-cdn.github.com\n185.199.108.133 camo.githubusercontent.com\n185.199.110.133 github.map.fastly.net\n146.75.121.194 github.global.ssl.fastly.net\n140.82.121.3 gist.github.com\n185.199.108.153 github.io\n20.205.243.166 github.com\n192.0.66.2 github.blog\n140.82.121.6 api.github.com\n185.199.110.133 raw.githubusercontent.com\n185.199.108.133 user-images.githubusercontent.com\n185.199.110.133 favicons.githubusercontent.com\n185.199.108.133 avatars5.githubusercontent.com\n185.199.111.133 avatars4.githubusercontent.com\n185.199.108.133 avatars3.githubusercontent.com\n185.199.111.133 avatars2.githubusercontent.com\n185.199.111.133 avatars1.githubusercontent.com\n185.199.111.133 avatars0.githubusercontent.com\n185.199.110.133 avatars.githubusercontent.com\n140.82.121.10 codeload.github.com\n3.5.25.88 
github-cloud.s3.amazonaws.com\n3.5.25.88 github-com.s3.amazonaws.com\n52.216.138.67 github-production-release-asset-2e65be.s3.amazonaws.com\n54.231.163.217 github-production-user-asset-6210df.s3.amazonaws.com\n52.217.173.217 github-production-repository-file-5c1aeb.s3.amazonaws.com\n185.199.110.153 githubstatus.com\n140.82.113.17 github.community\n51.137.3.17 github.dev\n140.82.114.21 collector.github.com\n13.107.42.16 pipelines.actions.githubusercontent.com\n185.199.109.133 media.githubusercontent.com\n185.199.108.133 cloud.githubusercontent.com\n185.199.110.133 objects.githubusercontent.com\n#Github Hosts End\nEdit the hosts file (I used the hosts plugin for utools to do this): Save the changes, then run the following in cmd\nipconfig /flushdns\nImages should then load quickly. 2 Manual handling If the copied entries do not work, the addresses behind the domains have probably changed, and you will need to update them by hand, mainly for github.com and raw.githubusercontent.com. Go to https://www.ip138.com , enter the domain to look up (raw.githubusercontent.com, for example), and note the addresses it resolves to. Copy the addresses shown there into the hosts file and then flush the DNS cache (the 0.0.0.0 entry looks odd, so it is better not to copy it)\n185.199.111.133 raw.githubusercontent.com\n185.199.110.133 raw.githubusercontent.com\n185.199.109.133 raw.githubusercontent.com\n185.199.108.133 raw.githubusercontent.com\n182.43.124.6 raw.githubusercontent.com","link":"/2024/05/30/8-GitHub%E5%8A%A0%E9%80%9F%E5%92%8C%E5%9B%BE%E7%89%87%E6%98%BE%E7%A4%BA%E6%96%B9%E6%B3%95/"},{"title":"【经济学人】Is Xi  an AI doomer?","text":"经济学人英语阅读笔记 China’s elite is split over artificial intelligence. 中国精英对人工智能存在分歧。 be split over：A 在 B 上存在分歧。 IN JULY OF last year Henry Kissinger travelled to Beijing for the final time before his death. Among the messages he delivered to China’s ruler, Xi , was a warning about the catastrophic risks of artificial intelligence (AI). Since then American tech bosses and ex-government officials have quietly met with their Chinese counterparts in a series of informal meetings dubbed the Kissinger Dialogues. The conversations have focused in part on how to protect the world from the dangers of AI. 
On August 27th American and Chinese officials are expected to take up the subject (along with many others) when America’s national security advisor, Jake Sullivan, travels to Beijing. 去年7月，亨利·基辛格去世前最后一次前往北京。他向中国领导人习传达的信息之一是对人工智能（AI）灾难性风险的警告。自那以后，美国科技巨头和前政府官员在一系列被称为基辛格对话的非正式会议中悄悄会见了中国同行。对话部分集中在如何保护世界免受人工智能的危险。8月27日，当美国国家安全顾问杰克·沙利文访问北京时，美国和中国官员预计将讨论这个问题（以及许多其他问题）。 catastrophic risks: 灾难性风险。 informal meeting：非正式会议。 dub：把 … 戏称为；给 … 起绰号；把 … 称为。 take up the subject：讨论这个问题/主题。 Many in the tech world think that AI will come to match or surpass the cognitive abilities of humans. Some developers predict that artificial general intelligence (AGI) models will one day be able to learn, which could make them uncontrollable. Those who believe that, left unchecked, AI poses an existential risk to humanity are called “doomers”. They tend to advocate stricter regulations. On the other side are “accelerationists”, who stress AI’s potential to benefit humanity. 科技界的许多人认为人工智能将会赶上或超越人类的认知能力。一些开发人员预测，通用人工智能（AGI）模型有一天将能够学习，这可能会使它们变得不可控。那些认为，如果不加以控制，人工智能会给人类带来生存风险的人被称为“末日论者”。他们倾向于主张更严格的规定。另一边是“加速论者”，他们强调人工智能造福人类的潜力。 existential risk：生存风险。 doomer：末日论者。 accelerationists：加速论者。 Western accelerationists often argue that competition with Chinese developers, who are uninhibited by strong safeguards, is so fierce that the West cannot afford to slow down. The implication is that the debate in China is one-sided, with accelerationists having the most say over the regulatory environment. In fact, China has its own AI doomers—and they are increasingly influential. 西方加速论者经常辩称，与中国开发商的竞争如此激烈，西方不能放慢脚步，因为中国开发商不受强有力的保障措施的约束。言下之意，中国的争论是一边倒的，加速论者对监管环境最有发言权。事实上，中国有自己的人工智能毁灭者——而且他们的影响力越来越大。 be uninhibited by：不受 … 约束。 safeguard：保障措施。 one-sided：一边倒的。 have the most say over：… 在 … 上最有话语权。 Until recently China’s regulators have focused on the risk of rogue chatbots saying politically incorrect things about the Communist Party, rather than that of cutting-edge models slipping out of human control. 
In 2023 the government required developers to register their large language models. Algorithms are regularly marked on how well they comply with socialist values and whether they might “subvert state power”. The rules are also meant to prevent discrimination and leaks of customer data. But, in general, AI-safety regulations are light. Some of China’s more onerous restrictions were rescinded last year. 直到最近，中国监管机构一直关注聊天机器人（chatbots）对共产党说政治上不正确的话的风险，而不是尖端模型脱离人类控制的风险。2023年，政府要求开发者注册他们的大型语言模型。算法经常被标记为它们在多大程度上符合社会主义价值观，以及它们是否可能“颠覆国家政权”。这些规则还旨在防止歧视和客户数据泄露。但是，总的来说，人工智能安全法规很宽松。去年，中国取消了一些更严厉的限制。 cutting-edge：尖端的；前沿的。 slip out of：脱离 …。 subvert：颠覆。 be meant to：旨在。 onerous：繁重的；麻烦的；负有义务的；负有法律责任的。 rescind：撤回；废除。 China’s accelerationists want to keep things this way. Zhu Songchun, a party adviser and director of a state-backed programme to develop AGI, has argued that AI development is as important as the “Two Bombs, One Satellite” project, a Mao-era push to produce long-range nuclear weapons. Earlier this year Yin Hejun, the minister of science and technology, used an old party slogan to press for faster progress, writing that development, including in the field of AI, was China’s greatest source of security. Some economic policymakers warn that an over-zealous pursuit of safety will harm China’s competitiveness. 中国的加速主义者希望保持这种状态。党的顾问兼国家支持的通用人工智能发展项目主任朱松纯认为，人工智能的发展与“两弹一星”项目一样重要，“两弹一星”项目是毛泽东时代推动生产远程核武器的项目。今年早些时候，科技部部长阴和俊使用了一句古老的党口号来敦促加快进步，他写道，包括人工智能领域在内的发展是中国最大的安全源泉。一些经济政策制定者警告说，过度追求安全将损害中国的竞争力。 overzealous：过于热心的；激情过高的。 But the accelerationists are getting pushback from a clique of elite scientists with the Communist Party’s ear. Most prominent among them is Andrew Chi-Chih Yao, the only Chinese person to have won the Turing award for advances in computer science. In July Mr Yao said AI poses a greater existential risk to humans than nuclear or biological weapons. 
Zhang Ya-Qin, the former president of Baidu, a Chinese tech giant, and Xue Lan, the chair of the state’s expert committee on AI governance, also reckon that AI may threaten the human race. Yi Zeng of the Chinese Academy of Sciences believes that AGI models will eventually see humans as humans see ants. 但加速论者却遭到了共产党耳中的精英科学家集团的抵制。其中最著名的是姚期智，他是唯一一位因计算机科学进步而获得图灵奖的中国人。姚先生在七月表示，人工智能对人类构成的生存风险比核武器或生物武器更大。中国科技巨头百度前总裁张亚勤和国家人工智能治理专家委员会主任薛澜也认为人工智能可能威胁人类。中国科学院的曾毅认为，通用人工智能模型最终会像人类看到蚂蚁一样看到人类。 a clique of：一群。 reckon：猜想；估计 The influence of such arguments is increasingly on display. In March an international panel of experts meeting in Beijing called on researchers to kill models that appear to seek power or show signs of self-replication or deceit. A short time later the risks posed by AI, and how to control them, became a subject of study sessions for party leaders. A state body that funds scientific research has begun offering grants to researchers who study how to align AI with human values. State labs are doing increasingly advanced work in this domain. Private firms have been less active, but more of them have at least begun paying lip service to the risks of AI. 这些论点的影响力日益显现。今年 3 月，一个国际专家小组在北京召开会议，呼吁研究人员杀死那些似乎在寻求权力或表现出自我复制或欺骗迹象的模型。不久之后，人工智能带来的风险以及如何控制这些风险成为党的领导人研究会议的主题。一个资助科学研究的国家机构已经开始向研究如何使人工智能与人类价值观相一致的研究人员提供资助。国家实验室在这一领域的工作越来越先进。私营企业则不太活跃，但至少有更多的企业开始在口头上关注人工智能的风险。 be increasingly on display：日益凸显。 deceit：欺骗；虚伪。 pay lip service to：在口头上关注… The debate over how to approach the technology has led to a turf war between China’s regulators. The industry ministry has called attention to safety concerns, telling researchers to test models for threats to humans. But most of China’s securocrats see falling behind America as a bigger risk. The science ministry and state economic planners also favour faster development. A national AI law slated for this year quietly fell off the government’s work agenda in recent months because of these disagreements. 
The impasse was made plain on July 11th, when the official responsible for writing the AI law cautioned against prioritising either safety or expediency. 关于如何处理这项技术的争论引发了中国监管机构之间的 “地盘争夺战”。工业部呼吁关注安全问题，要求研究人员测试模型对人类的威胁。但大多数中国安全官员认为，落后于美国的风险更大。科学部和国家经济规划者也倾向于加快发展速度。由于存在这些分歧，最近几个月，原定于今年出台的国家人工智能法悄然退出了政府的工作日程。7 月 11 日，负责撰写人工智能法的官员警告说，不要把安全或权宜之计放在首位，这让僵局变得更加明显。 turf war：地盘争夺战 securocrat：安全官僚 slate：预定；计划；安排 impasse：僵局；死路 prioritise：优先考虑 expediency：权宜之计 The decision will ultimately come down to what Mr Xi thinks. In June he sent a letter to Mr Yao, praising his work on AI. In July, at a meeting of the party’s central committee called the “third plenum”, Mr Xi sent his clearest signal yet that he takes the doomers’ concerns seriously. The official report from the plenum listed AI risks alongside other big concerns, such as biohazards and natural disasters. For the first time it called for monitoring AI safety, a reference to the technology’s potential to endanger humans. The report may lead to new restrictions on AI-research activities. 这个决定最终将取决于习的想法。今年6月，他给姚先生写了一封信，赞扬了他在人工智能方面的工作。今年7月，在一次名为“三中全会”的党中央委员会会议上，习发出了迄今为止最明确的信号，表明他认真对待末日论者的担忧。全会的官方报告将人工智能风险与生物危害和自然灾害等其他重大问题一起列出。它首次呼吁监控人工智能的安全性，指的是该技术危及人类的潜力。该报告可能会导致对人工智能研究活动的新限制。 come down to：取决于 the party’s central committee：党中央委员会 third plenum：三中全会 alongside：和其他的一起 biohazards：生物危害 More clues to Mr Xi’s thinking come from the study guide prepared for party cadres, which he is said to have personally edited. China should “abandon uninhibited growth that comes at the cost of sacrificing safety”, says the guide. Since AI will determine “the fate of all mankind”, it must always be controllable, it goes on. The document calls for regulation to be pre-emptive rather than reactive. 
习思想的更多线索来自为党的干部准备的学习指南，据说是他亲自编辑的。该指南称，中国应该“放弃以牺牲安全为代价的无节制增长”。它继续说，既然人工智能将决定“全人类的命运”，它就必须始终是可控的。该文件呼吁监管是先发制人的，而不是被动的。 party cadre：党干部 uninhibited growth：无节制增长 at the cost of：以牺牲 … 为代价 pre-emptive：先发制人的 reactive：被动的 Safety gurus say that what matters is how these instructions are implemented. China will probably create an AI-safety institute to observe cutting-edge research, as America and Britain have done, says Matt Sheehan of the Carnegie Endowment for International Peace, a think-tank in Washington. Which department would oversee such an institute is an open question. For now Chinese officials are emphasising the need to share the responsibility of regulating AI and to improve co-ordination. 安全专家表示，重要的是如何执行这些指令。华盛顿智库卡内基国际和平基金会的马特·希恩表示，中国可能会建立一个人工智能安全研究所来观察前沿研究，就像美国和英国所做的那样。哪个部门将监督这样一个机构是一个悬而未决的问题。目前，中国官员强调需要分担监管人工智能的责任并改善协调。 guru：领域专家；领导者 think-tank：智库 oversee：监管 share the responsibility of：分担 … 的责任 If China does move ahead with efforts to restrict the most advanced AI research and development it will have gone further than any other big country. Mr Xi says he wants to “strengthen the governance of artificial-intelligence rules within the framework of the United Nations”. To do that China will have to work more closely with others. But America and its friends are still considering the issue. The debate between doomers and accelerationists, in China and elsewhere, is far from over. 
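如果中国确实继续努力限制最先进的人工智能研发，它将比任何其他大国走得更远。习表示，他希望“在联合国框架内加强人工智能规则的治理”。要做到这一点，中国必须与其他国家更加紧密地合作。但是美国及其朋友仍在考虑这个问题。在中国和其他地方，末日论者和加速论者之间的争论远未结束。 be far from over：… 远未结束","link":"/2024/08/26/12-eco1/"},{"title":"Stata command for converting&#x2F;standardizing province names","text":"This command standardizes province names and generates the corresponding administrative codes, which makes merging datasets easier 1 Command syntax\ncnprov province_string, [name(custom variable name)]\n2 Example Generate the data\nclear\ninput str20 prov int year\n&quot;广东&quot; 2012\n&quot;浙江省&quot; 2015\n&quot;新疆&quot; 2019\n&quot;黑龙&quot; 2020\n&quot;广西省&quot; 2011\n&quot;北京&quot; 2023\nend\nData preview The cnprov command\ncnprov prov, name(new_prov)\nResult 4 Download GitHub repository: https://github.com/codefoxs/Stata-personal\n* Install\nnet install command, from(&quot;https://raw.githubusercontent.com/codefoxs/Stata-personal/main/cnprov/&quot;) replace\n* Version\nwhich cnprov","link":"/2024/07/06/11-%E7%9C%81%E4%BB%BD%E5%90%8D%E7%A7%B0%E8%BD%AC%E6%8D%A2/"},{"title":"Interpreting interaction terms in regressions","text":"This post briefly explains how to interpret interaction terms and how they differ from, and relate to, mechanism analysis, heterogeneity analysis, moderation and split-sample regressions 1 Specifying a model with an interaction term: an example Interaction terms usually play an important role in mechanism and heterogeneity analysis, where the economic meaning is read from the coefficient on the interaction term. However, I have found that many people struggle with even the statistical meaning of an interaction term, let alone its economic meaning, and are completely lost when the interaction involves a difference-in-differences term (triple differences). Hence this post, offered for discussion. First, taking digital transformation $DT$ and corporate financing constraints $FC$ as an example, we set up the following baseline model:$$FC_{it} = \\alpha + \\beta DT_{it} + \\gamma'CONTROLS_{it} + \\delta_i + \\lambda_t + \\varepsilon_{it}$$If all goes as expected, the coefficient $\\beta$ should be significantly negative, because digital transformation eases the financing constraints that firms face; suppose $\\beta = -0.1$. Next, suppose the degree of information asymmetry $ASY$ is the mechanism variable through which digital transformation $DT$ reduces financing constraints $FC$ (a side note: the three-step mediation model is no longer recommended). The specification is then:$$FC_{it} = \\alpha + \\theta DT_{it} \\times ASY_{it} + \\beta DT_{it} + \\psi ASY_{it} + \\gamma'CONTROLS_{it} + \\delta_i + \\lambda_t + \\varepsilon_{it}$$In general, we should not simply multiply $DT$ by $ASY$ directly (although quite a few papers do so). More commonly, the mechanism variable $ASY$ is split into groups by some rule (for example, at the median of the full sample, of the industry-year cell or of the city-year cell), producing a 0-1 dummy variable $DASY$. $DASY = 0$ denotes the group with lower information asymmetry and $DASY = 1$ the group with higher information asymmetry. The model above can then be rewritten as:$$FC_{it} = \\alpha + \\theta DT_{it} \\times DASY_{it} + \\phi DT_{it} + \\psi DASY_{it} + \\gamma'CONTROLS_{it} + \\delta_i + \\lambda_t + \\varepsilon_{it}$$Then, when information asymmetry is high, the marginal effect of $DT$ on $FC$ is:$${\\frac{\\partial\\mathbb{E}[FC_{it} | DASY_{it} = 1]}{\\partial DT_{it}}} = \\theta + 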
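\\phi$$Similarly, when information asymmetry is low,$${\\frac{\\partial\\mathbb{E}[FC_{it} | DASY_{it} = 0]}{\\partial DT_{it}}} = \\phi$$In essence, interacting and splitting the sample are the same thing (though of course they do differ a little if you want to tweak significance). Comparing the two marginal effects shows that the marginal effect in the high-asymmetry group differs from that in the low-asymmetry group by $\\theta$. Following the usual logic, digital transformation should ease financing constraints by reducing information asymmetry. The reduction should therefore be stronger in the group with higher information asymmetry, that is, $\\theta$ should be significantly negative. Only then can the absolute value of the marginal effect $\\theta + \\phi$ exceed that of the marginal effect $\\phi$. Note that if the data are sensible, we should also observe (when the coefficients are significant) $\\phi &lt; 0$ and $\\psi &gt; 0$, because the relationship between digital transformation and financing constraints should be negative and that between information asymmetry and financing constraints should be positive. 2 Interpreting interaction terms for mechanisms: a more general rule The analysis above generalizes. Consider the following baseline and interaction models:$$y_{it} = \\alpha + \\beta x_{it} + \\gamma' controls_{it} + \\varepsilon_{it}$$ $${y_{it} = \\alpha + \\theta x_{it} \\times Dm_{it} + \\phi x_{it} + \\psi Dm_{it} + \\gamma' controls_{it} + \\varepsilon_{it}}$$ where $Dm$ is the 0-1 dummy variable built from the mechanism variable and $Dm = 1$ is the group with higher $m$. If $m$ is a variable that $x$ can affect, and $y$ is in turn affected by $m$, the following relationships hold (provided, of course, that they are consistent with theory and the coefficients are significant). Signs of ($\\beta$, $\\theta$) and the implied relationship: (-, -): $x$ lowers $y$ by lowering $m$; (-, +): $x$ lowers $y$ by raising $m$; (+, +): $x$ raises $y$ by raising $m$; (+, -): $x$ raises $y$ by lowering $m$. Note that this reading applies only when $m$ is a variable that $x$ can affect and that in turn affects $y$. Does this mean $m$ cannot serve as a mechanism if $x$ cannot affect it? Clearly not (although in most cases it does). For example, a “rent-seeking effect” can be tested with interaction variables built from “whether the firm is state-owned ($SOE$)” and “whether the firm is politically connected ($PR$)”: if the effect is stronger for state-owned or politically connected firms, this supports the “rent-seeking effect” as one mechanism in the study (again, provided it is consistent with theory). 3 The difference between mechanism and heterogeneity: increasingly blurred? Some papers today do not spell out whether the interaction or split-sample regression they run is a mechanism test or a heterogeneity test, simply because the models and methods are so similar. I have even come across claims such as “interactions are for mechanisms, split samples are for heterogeneity” and “mechanism and heterogeneity differ only in how the groups are formed”. In fact, whether one is studying a mechanism or heterogeneity depends above all on the choice of variables and the soundness of the theoretical logic. For example, in research on digital transformation and financing constraints, scholars usually examine heterogeneity by firm size, eastern versus western region, ownership type and so on. But would anyone treat these variables as mechanisms? Clearly not, because such variables only extend the analysis of why digital transformation reduces financing constraints; they cannot explain the channel of influence. So how do mechanism, heterogeneity, moderation, split samples and interactions relate to one another in current papers? My personal view is as follows. Mechanism: asks “why does $x$ affect $y$?”; it can be tested with interactions, split samples or a direct regression of $m$ on $x$ (there is also the mediation model, which is not recommended). Heterogeneity: extends the study of the economic effect of $x$ on $y$ but cannot explain why $x$ affects $y$; it can be tested with interactions and split samples. Moderation: an umbrella term for the interaction and split-sample methods; the moderating effect is, as described above, the difference $\\theta$ between the two marginal effects. Interaction: set up a model such as ${y_{it} = \\alpha + \\theta x_{it} \\times Dm_{it} + \\phi x_{it} + \\psi Dm_{it} + \\gamma' controls_{it} + \\varepsilon_{it}}$ to group observations by the mechanism or heterogeneity variable, then draw conclusions from the interaction coefficient and the baseline coefficient. Split sample: divide the sample by some rule, estimate the baseline model separately for each group, and draw conclusions by comparing the coefficients across the two groups. 4 Interactions of interactions: splitting the sample is still the better choice Suppose we want to use interactions to test the mechanism behind a baseline model that already contains an interaction. Taking digital transformation as the example again, we examine the mechanism role of information asymmetry when $DT$ is already interacted with firm size$$\\begin{aligned}FC_{it} &amp;= \\alpha + \\sigma DT_{it} \\times 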
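DSMALL_{it} + \\phi_1 DT_{it} + \\phi_2 DSMALL_{it} + \\gamma'CONTROLS_{it} + \\delta_i + \\lambda_t + \\varepsilon_{it}\\end{aligned}$$ $$\\begin{aligned}FC_{it} &amp;= \\alpha + \\theta DT_{it} \\times DSMALL_{it} \\times DASY_{it}\\\\&amp;+ \\beta_1 DT_{it} \\times DSMALL_{it} + \\beta_2 DT_{it} \\times DASY_{it} + \\beta_3 DASY_{it} \\times DSMALL_{it}\\\\&amp;+ \\psi_1 DT_{it} + \\psi_2 DSMALL_{it} + \\psi_3 DASY_{it} + \\gamma'CONTROLS_{it} + \\delta_i + \\lambda_t + \\varepsilon_{it}\\end{aligned}$$ where $DSMALL = 1$ denotes firms whose size is below the median. Is your head spinning yet? Some will ask whether the lower-order terms really need to be included. Why not include only the first term, $DT_{it} \\times DSMALL_{it} \\times DASY_{it}$? Unfortunately, from an econometric standpoint this causes a serious omitted-variable problem: the model does not separate the effects of $DT$, $DASY$ and so on from the interaction term, so the estimated coefficient is bound to be biased. So how should the results be read? In practice, it is still enough to focus on the interaction coefficients. Start with the baseline regression: digital transformation should matter more for small and medium-sized firms, so the coefficient $\\sigma$ should be significantly negative, and the coefficient $\\phi_1$ should be significantly negative as well. Turning to the interaction model, if we want to show that digital transformation eases financing constraints by reducing the information asymmetry of small and medium-sized firms, then the coefficient $\\theta$ on the interaction term $DT_{it} \\times DSMALL_{it} \\times DASY_{it}$ should be significantly negative, because the effect will be stronger in the group with higher information asymmetry ($DASY = 1$). That said, this specification is overly complicated, and in this situation splitting the sample is probably easier to understand.","link":"/2024/07/06/10-%E5%9B%9E%E5%BD%92%E4%BA%A4%E4%BA%92%E9%A1%B9%E7%9A%84%E8%A7%A3%E8%AF%BB/"},{"title":"The Lewbel instrumental-variable method","text":"This command combines Lewbel's method with ivreghdfe and reghdfe, and adds a heteroskedasticity test 1 Command syntax\nlewbel varlist [if] [in], Absorb(string) [VCE(string) CLuster(string) Z(string) BY(string) first keep opt(string)]\nabsorb(): same as in reghdfe; list the fixed effects here. vce() &amp; cluster(): cluster-robust standard errors; specify one of the two. With older versions of ivreghdfe, cluster is recommended; the syntax follows reghdfe and ivreghdfe. z(): the exogenous variables to use, which may be a subset of the controls or new variables. If not specified, all controls are used by default. by(): the grouping used to compute means when centering; the default is the full-sample mean (as ivreg2h does). first: report the first-stage regression results. keep: keep the generated instruments, whose names end in _g. opt(): other custom ivreghdfe options; you will probably not need it. 2 How it works Equation to estimate: $$Y_{it} = \\alpha_0 + \\alpha_1 X_{it} + \\eta'Controls_{i, t} + \\delta_i + \\lambda_t + \\varepsilon_{i, t} \\tag{1}$$ Compute the residuals from: $$X_{it} = \\beta_0 + \\gamma'Controls_{i, t} + \\delta_i + \\lambda_t + \\mu_{i, t} \\tag{2}$$ Choose a subset of the controls (fixed effects included) as the exogenous variables $Z$; here all of $Controls$ are used as $Z$. Subtract from every variable in the vector $Z$ its own full-sample mean, then multiply by the estimated residual from equation (2), that is: $Z_{IV} = (Z - \\overline{Z}) \\times \\hat{\\mu}_{it}$ 2SLS: use $Z_{IV}$ and $Controls$ 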
as instruments in a two-stage regression of equation (1), namely: First stage$$X_{it} = \\theta_0 + \\theta_1 Z_{IV} + \\Phi'Controls_{i, t} + \\delta_i + \\lambda_t + \\sigma_{i, t} \\tag{3}$$Using the fitted values from (3), estimate the second stage$$Y_{it} = \\psi_0 + \\psi_1 \\hat{X}_{it} + \\Omega'Controls_{i, t} + \\delta_i + \\lambda_t + \\epsilon_{i, t} \\tag{4}$$ 3 Example\nuse &quot;lewbel test.dta&quot;, clear\nxtset id year\ngen Ind_year = string(Industry) + &quot; $ &quot; + string(year)\n* Use lewbel command\nlewbel y x1 x2-x9 , a(Country Ind_year) cl(Ind_year) keep first\nest store m3\n* First stage\nreghdfe x1 x2_g-x9_g x2-x9, a(Country Ind_year) cl(Ind_year)\nest store m1\n* Predict\nqui predict x1_p\n* Second stage\nreghdfe y x1_p x2-x9, a(Country Ind_year) cl(Ind_year)\nest store m2\nreg2docx m1 m2 m3 using &quot;lewbel.docx&quot;, replace ///\n    b(%20.4f) t(%20.4f) ///\n    scalars(N(%20.0fc) r2_a(%20.4f) HetLM(%20.4f) HetLMp(%20.4f) ///\n    Kleibergen_Paap_rk_LM(%20.4f) Kleibergen_Paap_rk_LM_p(%20.4f) ///\n    Cragg_Donald_Wald_F(%20.4f) Kleibergen_Paap_rk_Wald_F(%20.4f) ///\n    Hansen_J(%20.4f) Hansen_J_p(%20.4f)) ///\n    order(x1 x1_p) ///\n    addfe(&quot;Country=YES&quot; &quot;Industry*Year=YES&quot;) ///\n    mtitles() ///\n    font(&quot;Times New Roman&quot;, 6.5) ///\n    margin(top, 3.17cm) margin(bottom, 3.17cm)\nStata output:\n. 
lewbel y x1 $controls , a(Country Ind_year) cl(Ind_year) keep first*=============================================================================** Part 1 Heteroskedasticity test **=============================================================================*Breusch–Pagan test for heteroskedasticityH0: Constant variance Chi2(2651) = 4792.7716 Prob &gt; chi2 = 0.0000*=============================================================================** Part 2 2SLS regression **=============================================================================*(MWFE estimator converged in 8 iterations)First-stage regressions-----------------------First-stage regression of x1:Statistics robust to heteroskedasticity and clustering on Ind_yearNumber of obs = 20537Number of clusters (Ind_year) = 2629------------------------------------------------------------------------------ | Robust x1 | Coefficient std. err. t P&gt;|t| [95% conf. interval]-------------+---------------------------------------------------------------- x2_g | .3403562 .030574 11.13 0.000 .2804287 .4002837 x3_g | .0209829 .0210201 1.00 0.318 -.0202182 .0621839 x4_g | -.1824731 .0274428 -6.65 0.000 -.2362632 -.128683 x5_g | -.0202362 .0187698 -1.08 0.281 -.0570265 .0165541 x6_g | .0300076 .0200092 1.50 0.134 -.0092121 .0692273 x7_g | .0264974 .0227246 1.17 0.244 -.0180446 .0710393 x8_g | .0312092 .0336754 0.93 0.354 -.0347972 .0972156 x9_g | .1917831 .0199648 9.61 0.000 .1526505 .2309157 x2 | .5211517 .0056266 92.62 0.000 .5101231 .5321804 x3 | .0167608 .0049286 3.40 0.001 .0071003 .0264213 x4 | .0049114 .0058779 0.84 0.403 -.0066098 .0164326 x5 | .0670301 .0052558 12.75 0.000 .0567283 .0773319 x6 | -.0271468 .0044682 -6.08 0.000 -.0359047 -.0183888 x7 | .0478391 .0057346 8.34 0.000 .0365988 .0590794 x8 | -.0415018 .0050295 -8.25 0.000 -.0513599 -.0316436 x9 | -.0220721 .0042422 -5.20 0.000 -.0303873 -.013757------------------------------------------------------------------------------F test of excluded instruments: 
F( 8, 2628) = 78.98 Prob &gt; F = 0.0000Sanderson-Windmeijer multivariate F test of excluded instruments: F( 8, 2628) = 78.98 Prob &gt; F = 0.0000Summary results for first-stage regressions------------------------------------------- (Underid) (Weak id)Variable | F( 8, 2628) P-val | SW Chi-sq( 8) P-val | SW F( 8, 2628)x1 | 78.98 0.0000 | 633.01 0.0000 | 78.98NB: first-stage test statistics cluster-robustStock-Yogo weak ID F test critical values for single endogenous regressor: 5% maximal IV relative bias 20.25 10% maximal IV relative bias 11.39 20% maximal IV relative bias 6.69 30% maximal IV relative bias 4.99 10% maximal IV size 33.84 15% maximal IV size 18.54 20% maximal IV size 13.24 25% maximal IV size 10.50Source: Stock-Yogo (2005). Reproduced by permission.NB: Critical values are for i.i.d. errors only.Underidentification testHo: matrix of reduced form coefficients has rank=K1-1 (underidentified)Ha: matrix has rank=K1 (identified)Kleibergen-Paap rk LM statistic Chi-sq(8)=111.38 P-val=0.0000Weak identification testHo: equation is weakly identifiedCragg-Donald Wald F statistic 455.55Kleibergen-Paap Wald rk F statistic 78.98Stock-Yogo weak ID test critical values for K1=1 and L1=8: 5% maximal IV relative bias 20.25 10% maximal IV relative bias 11.39 20% maximal IV relative bias 6.69 30% maximal IV relative bias 4.99 10% maximal IV size 33.84 15% maximal IV size 18.54 20% maximal IV size 13.24 25% maximal IV size 10.50Source: Stock-Yogo (2005). Reproduced by permission.NB: Critical values are for Cragg-Donald F statistic and i.i.d. 
errors.Weak-instrument-robust inferenceTests of joint significance of endogenous regressors B1 in main equationHo: B1=0 and orthogonality conditions are validAnderson-Rubin Wald test F(8,2628)= 4.74 P-val=0.0000Anderson-Rubin Wald test Chi-sq(8)= 38.02 P-val=0.0000Stock-Wright LM S statistic Chi-sq(8)= 38.09 P-val=0.0000NB: Underidentification, weak identification and weak-identification-robust test statistics cluster-robustNumber of clusters N_clust = 2629Number of observations N = 20537Number of regressors K = 9Number of endogenous regressors K1 = 1Number of instruments L = 16Number of excluded instruments L1 = 8IV (2SLS) estimation--------------------Estimates efficient for homoskedasticity onlyStatistics robust to heteroskedasticity and clustering on Ind_yearNumber of clusters (Ind_year) = 2629 Number of obs = 20537 F( 9, 2628) = 422.69 Prob &gt; F = 0.0000Total (centered) SS = 14633.8475 Centered R2 = 0.2665Total (uncentered) SS = 14633.8475 Uncentered R2 = 0.2665Residual SS = 10733.40994 Root MSE = .7233------------------------------------------------------------------------------ | Robust y | Coefficient std. err. t P&gt;|t| [95% conf. 
interval]-------------+---------------------------------------------------------------- x1 | -.0958878 .0328708 -2.92 0.004 -.1603431 -.0314325 x2 | .3531729 .0213118 16.57 0.000 .3113833 .3949625 x3 | -.0767336 .007376 -10.40 0.000 -.0911971 -.0622702 x4 | -.0302413 .0083046 -3.64 0.000 -.0465255 -.0139571 x5 | .0566474 .0073346 7.72 0.000 .0422652 .0710295 x6 | -.134026 .0080818 -16.58 0.000 -.1498734 -.1181787 x7 | .0218564 .0087345 2.50 0.012 .0047292 .0389836 x8 | .042576 .0087935 4.84 0.000 .0253332 .0598188 x9 | -.3142137 .0076184 -41.24 0.000 -.3291524 -.299275------------------------------------------------------------------------------Underidentification test (Kleibergen-Paap rk LM statistic): 111.379 Chi-sq(8) P-val = 0.0000------------------------------------------------------------------------------Weak identification test (Cragg-Donald Wald F statistic): 455.554 (Kleibergen-Paap rk Wald F statistic): 78.984Stock-Yogo weak ID test critical values: 5% maximal IV relative bias 20.25 10% maximal IV relative bias 11.39 20% maximal IV relative bias 6.69 30% maximal IV relative bias 4.99 10% maximal IV size 33.84 15% maximal IV size 18.54 20% maximal IV size 13.24 25% maximal IV size 10.50Source: Stock-Yogo (2005). Reproduced by permission.NB: Critical values are for Cragg-Donald F statistic and i.i.d. 
errors.------------------------------------------------------------------------------Hansen J statistic (overidentification test of all instruments): 30.013 Chi-sq(7) P-val = 0.0001------------------------------------------------------------------------------Instrumented: x1Included instruments: x2 x3 x4 x5 x6 x7 x8 x9Excluded instruments: x2_g x3_g x4_g x5_g x6_g x7_g x8_g x9_gPartialled-out: _cons nb: total SS, model F and R2s are after partialling-out; any small-sample adjustments include partialled-out variables in regressor count K------------------------------------------------------------------------------Absorbed degrees of freedom:-----------------------------------------------------+ Absorbed FE | Categories - Redundant = Num. Coefs |-------------+---------------------------------------| Country | 15 1 14 | Ind_year | 2629 2629 0 *|-----------------------------------------------------+* = FE nested within cluster; treated as redundant for DoF computation. est store m3. . * First stage. reghdfe x1 x2_g-x9_g x2-x9, a(Country Ind_year) cl(Ind_year)(MWFE estimator converged in 8 iterations)HDFE Linear regression Number of obs = 20,537Absorbing 2 HDFE groups F( 16, 2628) = 909.28Statistics robust to heteroskedasticity Prob &gt; F = 0.0000 R-squared = 0.8245 Adj R-squared = 0.7984 Within R-sq. = 0.5411Number of clusters (Ind_year) = 2,629 Root MSE = 0.4494 (Std. err. adjusted for 2,629 clusters in Ind_year)------------------------------------------------------------------------------ | Robust x1 | Coefficient std. err. t P&gt;|t| [95% conf. 
interval]-------------+---------------------------------------------------------------- x2_g | .3403562 .0305747 11.13 0.000 .2804032 .4003092 x3_g | .0209829 .0210206 1.00 0.318 -.0202358 .0622015 x4_g | -.1824731 .0274435 -6.65 0.000 -.2362861 -.12866 x5_g | -.0202362 .0187703 -1.08 0.281 -.0570422 .0165698 x6_g | .0300076 .0200097 1.50 0.134 -.0092288 .069244 x7_g | .0264974 .0227251 1.17 0.244 -.0180636 .0710583 x8_g | .0312092 .0336762 0.93 0.354 -.0348254 .0972437 x9_g | .1917831 .0199653 9.61 0.000 .1526338 .2309324 x2 | .5211517 .0056268 92.62 0.000 .5101184 .5321851 x3 | .0167608 .0049288 3.40 0.001 .0070962 .0264255 x4 | .0049114 .0058781 0.84 0.403 -.0066147 .0164375 x5 | .0670301 .0052559 12.75 0.000 .056724 .0773363 x6 | -.0271468 .0044683 -6.08 0.000 -.0359084 -.0183851 x7 | .0478391 .0057348 8.34 0.000 .036594 .0590842 x8 | -.0415018 .0050296 -8.25 0.000 -.0513641 -.0316394 x9 | -.0220721 .0042423 -5.20 0.000 -.0303908 -.0137535 _cons | -3.630724 .0837772 -43.34 0.000 -3.795 -3.466448------------------------------------------------------------------------------Absorbed degrees of freedom:-----------------------------------------------------+ Absorbed FE | Categories - Redundant = Num. Coefs |-------------+---------------------------------------| Country | 15 1 14 | Ind_year | 2629 2629 0 *|-----------------------------------------------------+* = FE nested within cluster; treated as redundant for DoF computation. est store m1. . * Predict. qui predict x1_p. . * Second stage. reghdfe y x1_p x2-x9, a(Country Ind_year) cl(Ind_year)(MWFE estimator converged in 8 iterations)HDFE Linear regression Number of obs = 20,537Absorbing 2 HDFE groups F( 9, 2628) = 423.63Statistics robust to heteroskedasticity Prob &gt; F = 0.0000 R-squared = 0.4743 Adj R-squared = 0.3964 Within R-sq. = 0.2652Number of clusters (Ind_year) = 2,629 Root MSE = 0.7754 (Std. err. 
adjusted for 2,629 clusters in Ind_year)------------------------------------------------------------------------------ | Robust y | Coefficient std. err. t P&gt;|t| [95% conf. interval]-------------+---------------------------------------------------------------- x1_p | -.0958878 .0339678 -2.82 0.005 -.1624942 -.0292814 x2 | .3531729 .0214141 16.49 0.000 .3111827 .3951632 x3 | -.0767336 .0073749 -10.40 0.000 -.0911947 -.0622725 x4 | -.0302413 .0082708 -3.66 0.000 -.0464593 -.0140233 x5 | .0566474 .007391 7.66 0.000 .0421546 .0711401 x6 | -.134026 .0080965 -16.55 0.000 -.1499023 -.1181498 x7 | .0218564 .0088265 2.48 0.013 .0045489 .0391639 x8 | .042576 .0087784 4.85 0.000 .0253627 .0597892 x9 | -.3142137 .0076214 -41.23 0.000 -.3291582 -.2992692 _cons | -3.162962 .208469 -15.17 0.000 -3.571741 -2.754182------------------------------------------------------------------------------Absorbed degrees of freedom:-----------------------------------------------------+ Absorbed FE | Categories - Redundant = Num. Coefs |-------------+---------------------------------------| Country | 15 1 14 | Ind_year | 2629 2629 0 *|-----------------------------------------------------+* = FE nested within cluster; treated as redundant for DoF computation 图表导出结果： 4 命令 &amp; 数据下载GitHub 仓库：https://github.com/codefoxs/Stata-personal 12345* Installnet install command, from(&quot;https://raw.githubusercontent.com/codefoxs/Stata-personal/main/lewbel/&quot;) replace* Versionwhich lewbel 5 参考文献 Lewbel, A. (2012). Using Heteroscedasticity to Identify and Estimate Mismeasured and Endogenous Regressor Models. Journal of Business & Economic Statistics, 30(1), 67–80. https://doi.org/10.1080/07350015.2012.643126","link":"/2024/05/17/3-Lewbel-%E5%B7%A5%E5%85%B7%E5%8F%98%E9%87%8F%E6%B3%95/"},{"title":"社科基金注意事项","text":"社科基金注意事项内容 1 申报事项 一是在反复打磨研究内容，提高研究深度上下功夫。也就是“2. [研究内容] ”这部分。此处要避免研究框架内容单薄的问题，这部分最好在1500字左右。此部分提升研究深度的办法：1. 如果是专著结项就列目录提纲（到二级目录），甚至对提纲的内容做解释。2. 有理论分析、理论基础。3. 
在研究内容的各部分避免只是描述，没说自己研究的思路和方法。比如用：采用（借助）……，考察（厘定）……，依据……，提出……等等。4. 避免内容的过少，比如少于三个研究内容。 二是将研究内容和题目对照，在提升题目表达的准确性上下功夫。多看往年本学科立项的题目。社科类的研究题目上都应与时俱进。人文类研究可以思考怎么融入启示、当代价值等等，千万别牵强。 三是在理顺表达，弄清研究问题和逻辑上下功夫。第一段最重要。这里可以多用连接词梳理清楚背景、现实需求，存在的现实问题、学术问题和机遇，表达清楚研究的思路。 四是在段首句、主旨句，各部分连接段和连接句上下功夫。比如研究的学术价值，各个价值之间都应该有维度、排序有逻辑。 五是在前后一致性上下功夫。1. [选题依据]梳理的研究问题、研究不足是否和后面的研究内容形成了一致性。2. [研究内容]主要目标和思路框架是否形成了一致性等等，这些在材料中是否环环相扣、形成完整逻辑链。思路框架支撑立项依据的同时，服务于回答研究问题；研究目标呼应思路框架；研究基础等保证项目可行性。 六是在参考文献上下功夫。参考文献要注意以下几个维度和原则，一数量最好不少于14，二最新研究文献不少于4，三经典文献不少于2，四本学科内大咖文献不少于4，五本学科高地的同行文献、特别是承担过国家社科项目的同行文献不少于3，六中文文献放前面。 七是在预期成果上下功夫，最终成果常用合理的搭配：1论文集（4-7篇）。2专著（20万字以上，论文是阶段性成果）3研究报告（15万字以上，其中论文是阶段性成果） 八是在杜绝错别字和排版上下功夫。在杜绝敏感词、不规范表达上下功夫。杜绝不通顺、逻辑不清楚、错别字，敏感词，不专业表达，提升排版美观性。 九是在写活上下功夫。1多用连接词等高频词，2多用过度段过渡句主旨句段首句，3精炼语言，4适当加粗、变字体、加蓝等色彩。5注重各部分的详略得当。6在表述层次性和层次逻辑上下功夫。7.在用活动词上下功夫。8.在专业思路框架图上下功夫。 十是在提高研究基础表述，包装个人实力上下功夫。1适当标注期刊的索引源、影响因子、引用数、作者排序、获奖情况，甚至结项优秀和良好的国家级项目情况，避免出现一级出版社等表述。2适当加总结段，说清楚自己的研究专长、总体产出，甚至影响力等。3在本单位网站上发布个人简历。 十一是在避免万能句、无效表达上下功夫。特别是在研究计划部分，应杜绝不结合本研究的研究实际的万能研究计划。 十二、在框架思路上下功夫。 框架思路部分有很多模式模式： 其中一种最常用的模式，也是便于专家理解的模式是：思路在前，框架在后。 这种模式一般格式如下： [研究内容] 本课题的研究对象、主要目标、重点难点、研究计划及其可行性等。（框架思路要列出提纲或目录） …… 2.3框架思路 1.用简短的一段话说清楚思路、逻辑。 2.有思路框架图的附上思路框架图（不一定非要有，大都有，其中经管类应该有） 3.主要研究内容（是章节结构的，具体到二级目录） 其中，语言学，除了个别实验研究、数据建设等类别的项目，其他的研究都应该有目录。实验和数据库建设类，应当说出采集的数据类型、受试者的数量、采集的方式、数据处理的方案等。","link":"/2024/05/04/2-%E7%A4%BE%E7%A7%91%E5%9F%BA%E9%87%91%E6%B3%A8%E6%84%8F%E4%BA%8B%E9%A1%B9/"},{"title":"堆叠双重差分模型","text":"堆叠双重差分模型方法 1 堆叠DID（stacked DID）和多期DID（staggered DID）的区别1.1 Difference in data Setting Sample windows: 2001 - 2005 Adoption year stkcd Group type 1999 1 Always treated 2002 2 Early treated 2004 3 Late treated . 
4 Never treated Staggered DID panel data (N = 4 $\\times$ 5 = 20) stkcd year y DID Treat Adoption year 1 2001 55 1 1 1999 1 2002 64 1 1 1999 1 2003 21 1 1 1999 1 2004 45 1 1 1999 1 2005 67 1 1 1999 2 2001 82 0 1 2002 2 2002 63 1 1 2002 2 2003 78 1 1 2002 2 2004 99 1 1 2002 2 2005 51 1 1 2002 3 2001 54 0 1 2004 3 2002 36 0 1 2004 3 2003 41 0 1 2004 3 2004 65 1 1 2004 3 2005 94 1 1 2004 4 2001 76 0 0 . 4 2002 37 0 0 . 4 2003 11 0 0 . 4 2004 76 0 0 . 4 2005 44 0 0 . Stacked DID panel data (N = 3 $\\times$ 10 = 30) stkcd year y DID Treat Adoption year in cohort Cohort 1 2001 55 1 1 1999 1 1 2002 64 1 1 1999 1 1 2003 21 1 1 1999 1 1 2004 45 1 1 1999 1 1 2005 67 1 1 1999 1 4 2001 76 0 0 . 1 4 2002 37 0 0 . 1 4 2003 11 0 0 . 1 4 2004 76 0 0 . 1 4 2005 44 0 0 . 1 2 2001 82 0 1 2002 2 2 2002 63 1 1 2002 2 2 2003 78 1 1 2002 2 2 2004 99 1 1 2002 2 2 2005 51 1 1 2002 2 4 2001 76 0 0 . 2 4 2002 37 0 0 . 2 4 2003 11 0 0 . 2 4 2004 76 0 0 . 2 4 2005 44 0 0 . 2 3 2001 54 0 1 2004 3 3 2002 36 0 1 2004 3 3 2003 41 0 1 2004 3 3 2004 65 1 1 2004 3 3 2005 94 1 1 2004 3 4 2001 76 0 0 . 3 4 2002 37 0 0 . 3 4 2003 11 0 0 . 3 4 2004 76 0 0 . 3 4 2005 44 0 0 . 
3 1.2 Difference in specification Staggered DID specification $$y_{it} = \\alpha + \\beta DID_{it} + \\eta' Controls_{it} + \\delta_{i} + \\lambda_{t} + \\varepsilon_{it}$$ where $y_{it}$ is the outcome variable; $DID_{it}$ is the event dummy variable; $Controls_{it}$ is a vector of control variables; $\\delta_i$ and $\\lambda_t$ are firm and year fixed effects, respectively. Stacked DID Reduced form $$y_{ict} = \\alpha + \\beta DID_{ict} + \\eta' Controls_{ict} + \\delta_{ic} + \\lambda_{tc} + \\varepsilon_{ict}$$ where $c$ is the cohort of firm $i$; $\\delta_{ic}$ and $\\lambda_{tc}$ are firm-cohort and year-cohort interacted fixed effects, respectively. Event study form $$y_{ict} = \\alpha + \\sum\\limits_{\\substack{-3 \\leqslant k \\leqslant 3 \\\\ k \\ne -1}} \\beta_k Treat_{ic} \\times I(t - A_c = k) + \\eta' Controls_{ict} + \\delta_{ic} + \\lambda_{tc} + \\varepsilon_{ict}$$ where $A_c$ is the adoption year of cohort $c$; $I(\\cdot)$ is an indicator function, equal to 1 when the condition inside holds. Or, more specifically, $$y_{ict} = \\alpha + \\beta_{-3} Before3_{ict} + \\beta_{-2} Before2_{ict} + \\beta_{0} Current_{ict} + \\beta_{1} After1_{ict} + \\beta_{2} After2_{ict} + \\beta_{3} After3_{ict} + \\eta' Controls_{ict} + \\delta_{ic} + \\lambda_{tc} + \\varepsilon_{ict}$$ 2 Sources of bias Two assumptions for the common trend: Time-varying confounders must affect outcomes in both groups in the same way. -&gt; Time fixed effects Group-varying confounders must be time-invariant. -&gt; Group fixed effects Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics, 225(2): 254-277. 
Because the early-treated group (a bad control) is used as a control group. See more details in Staggered Adoption Designs and Stacked DID and Event Studies (Coady Wing, 2021). 3 Stata code 3.1 Staggered DID code cd &quot;D:\\code\\Stata\\stackeddid&quot; use &quot;demo.dta&quot;, clear * Staggered did reghdfe y did, a(stkcd year) vce(cl stkcd) output . reghdfe y did, a(stkcd year) vce(cl stkcd) (MWFE estimator converged in 2 iterations) HDFE Linear regression Number of obs = 20 Absorbing 2 HDFE groups F( 1, 3) = 17.15 Statistics robust to heteroskedasticity Prob &gt; F = 0.0256 R-squared = 0.5608 Adj R-squared = 0.1656 Within R-sq. = 0.0988 Number of clusters (stkcd) = 4 Root MSE = 20.9805 (Std. err. adjusted for 4 clusters in stkcd) ------------------------------------------------------------------------------ | Robust y | Coefficient std. err. t P&gt;|t| [95% conf. interval] -------------+---------------------------------------------------------------- did | 19.26923 4.65359 4.14 0.026 4.459429 34.07903 _cons | 47.35192 2.559475 18.50 0.000 39.20653 55.49731 ------------------------------------------------------------------------------ Absorbed degrees of freedom: -----------------------------------------------------+ Absorbed FE | Categories - Redundant = Num. 
Coefs | -------------+---------------------------------------| stkcd | 4 4 0 *| year | 5 0 5 | -----------------------------------------------------+ * = FE nested within cluster; treated as redundant for DoF computation 3.2 Stacked DID code cd &quot;D:\\code\\Stata\\stackeddid&quot; use &quot;demo.dta&quot;, clear * Stacked did // adoption year = 1999 drop if adoptionyear == 2002 drop if adoptionyear == 2004 gen cohort = 1 save &quot;cohort1.dta&quot;, replace // adoption year = 2002 use &quot;demo.dta&quot;, clear drop if adoptionyear == 1999 drop if adoptionyear == 2004 gen cohort = 2 save &quot;cohort2.dta&quot;, replace // adoption year = 2004 use &quot;demo.dta&quot;, clear drop if adoptionyear == 1999 drop if adoptionyear == 2002 gen cohort = 3 save &quot;cohort3.dta&quot;, replace append using &quot;cohort1.dta&quot; append using &quot;cohort2.dta&quot; sort cohort stkcd year save &quot;stackedmain.dta&quot;, replace reghdfe y did, a(stkcd#cohort year#cohort) vce(cl stkcd) // event study form forvalue i = 3(-1)2 { gen Before`i'_ = cond(year - adoptionyear &lt;= -`i' &amp; adoptionyear != ., 1, 0) } forvalue i = 3(-1)1 { gen Before`i' = cond(year - adoptionyear == -`i', 1, 0) } forvalue i = 0(1)3 { gen After`i' = cond(year - adoptionyear == `i', 1, 0) } forvalue i = 2(1)3 { gen After`i'_ = cond(year - adoptionyear &gt;= `i' &amp; adoptionyear != ., 1, 0) } reghdfe y Before3 Before2 After0 After1 After2 After3, a(stkcd#cohort year#cohort) vce(cl stkcd) est store m1 coefplot m1, keep(Before3 Before2 After0 After1 After2 After3) /// levels(90) /// vertical yline(0) xline(3, lp(dash)) /// addplot(line @b @at) ciopts(lpattern(dash) /// recast(rcap) msize(medium)) /// msymbol(circle_hollow) /// scheme(s1mono) result . 
reghdfe y did, a(stkcd#cohort year#cohort) vce(cl stkcd)(MWFE estimator converged in 2 iterations)HDFE Linear regression Number of obs = 30Absorbing 2 HDFE groups F( 1, 3) = 89.20Statistics robust to heteroskedasticity Prob &gt; F = 0.0025 R-squared = 0.7619 Adj R-squared = 0.1367 Within R-sq. = 0.0929Number of clusters (stkcd) = 4 Root MSE = 22.3113 (Std. err. adjusted for 4 clusters in stkcd)------------------------------------------------------------------------------ | Robust y | Coefficient std. err. t P&gt;|t| [95% conf. interval]-------------+---------------------------------------------------------------- did | 20.2 2.138754 9.44 0.003 13.39353 27.00647 _cons | 47.49333 .7842096 60.56 0.000 44.99763 49.98904------------------------------------------------------------------------------Absorbed degrees of freedom:--------------------------------------------------------+ Absorbed FE | Categories - Redundant = Num. Coefs |----------------+---------------------------------------| stkcd#cohort | 6 6 0 *| year#cohort | 15 0 15 |--------------------------------------------------------+* = FE nested within cluster; treated as redundant for DoF computation. reghdfe y Before3 Before2 After0 After1 After2 After3, a(stkcd#cohort year#cohort) vce(cl stkcd)(MWFE estimator converged in 2 iterations)warning: missing F statistic; dropped variables due to collinearity or too few clustersHDFE Linear regression Number of obs = 30Absorbing 2 HDFE groups F( 6, 3) = .Statistics robust to heteroskedasticity Prob &gt; F = . R-squared = 0.8914 Adj R-squared = -0.0501 Within R-sq. = 0.5863Number of clusters (stkcd) = 4 Root MSE = 24.6071 (Std. err. adjusted for 4 clusters in stkcd)------------------------------------------------------------------------------ | Robust y | Coefficient std. err. t P&gt;|t| [95% conf. 
interval]-------------+---------------------------------------------------------------- Before3 | -33.27778 10.94557 -3.04 0.056 -68.11146 1.555902 Before2 | -12.27778 10.94557 -1.12 0.344 -47.11146 22.5559 After0 | -7.916667 20.0721 -0.39 0.720 -71.79504 55.96171 After1 | 43.08333 12.77056 3.37 0.043 2.441714 83.72495 After2 | -9.972222 11.97396 -0.83 0.466 -48.0787 28.13425 After3 | 6.027778 13.48936 0.45 0.685 -36.90138 48.95693 _cons | 54.33704 3.490034 15.57 0.001 43.23019 65.44388------------------------------------------------------------------------------Absorbed degrees of freedom:--------------------------------------------------------+ Absorbed FE | Categories - Redundant = Num. Coefs |----------------+---------------------------------------| stkcd#cohort | 6 6 0 *| year#cohort | 15 0 15 |--------------------------------------------------------+* = FE nested within cluster; treated as redundant for DoF computation 4 参考文献 Chen, Z., Cao, Y., Feng, Z., Lu, M., & Shan, Y. (2023). Broadband infrastructure and stock price crash risk: Evidence from a quasi-natural experiment. Finance Research Letters, 58, 104026. Q2. 
https://doi.org/10.1016/j.frl.2023.104026","link":"/2024/04/11/1-%E5%A0%86%E5%8F%A0%E5%8F%8C%E9%87%8D%E5%B7%AE%E5%88%86%E6%A8%A1%E5%9E%8B/"},{"title":"各种分组回归方法","text":"本文总结了各种常用和不常用的分组回归命令 12345678* 需要安装 runbyssc install runby// 后面发现好像可以不用 runby，但是我的 Stata 需要 ssc install egenmoressc install egenmore* 测试数据集webuse nlsworkxtset idcode year 1 全样本中位数分组12345egen med = median(age)gen gvar = (age &gt;= med) if !mi(age)reghdfe ln_wage tenure ttl_exp, a(idcode year) vce(cl idcode), if gvar == 0reghdfe ln_wage tenure ttl_exp, a(idcode year) vce(cl idcode), if gvar == 1 2 按照同年份中位数分组12345bys year: egen med = median(age)gen gvar = (age &gt;= med) if !mi(age)reghdfe ln_wage tenure ttl_exp, a(idcode year) vce(cl idcode), if gvar == 0reghdfe ln_wage tenure ttl_exp, a(idcode year) vce(cl idcode), if gvar == 1 3 按照同行业同年份中位数分组12345bys ind_code year: egen med = median(age)gen gvar = (age &gt;= med) if !mi(age)reghdfe ln_wage tenure ttl_exp, a(idcode year) vce(cl idcode), if gvar == 0reghdfe ln_wage tenure ttl_exp, a(idcode year) vce(cl idcode), if gvar == 1 4 按照某一指定年份中位数分组1234567891011gen dc = agegen temp = dc if year == 2014bys code: egen gvar_in_year = min(temp)egen gvar_in_year_median = median(gvar_in_year)gen gvar = (gvar_in_year &gt; gvar_in_year_median) if !mi(gvar_in_year)* 更建议先在原始文件里先处理成指定年份再 merge，然后直接用全样本中位数即可* 不建议先在原始文件里分组再 merge，可能会导致两组样本差异很大reghdfe ln_wage tenure ttl_exp, a(idcode year) vce(cl idcode), if gvar == 0reghdfe ln_wage tenure ttl_exp, a(idcode year) vce(cl idcode), if gvar == 1 5 非平衡面板下，按照每个样本的第一年分组123456789gen temp = agebys idcode: gen order_num = _ngen temp_in_1 = temp if order_num == 1bys idcode: egen tempall = min(temp_in_1)egen med = median(tempall)gen gvar = (tempall &gt;= med) if !mi(tempall)reghdfe ln_wage tenure ttl_exp, a(idcode year) vce(cl idcode), if gvar == 0reghdfe ln_wage tenure ttl_exp, a(idcode year) vce(cl idcode), if gvar == 1 6 全样本分三组（多组），取高低两组对比1234567xtile gvar = age, nq(3)drop if gvar == 2replace gvar = 0 if gvar == 1replace gvar = 1 if 
gvar == 3reghdfe ln_wage tenure ttl_exp, a(idcode year) vce(cl idcode), if gvar == 0reghdfe ln_wage tenure ttl_exp, a(idcode year) vce(cl idcode), if gvar == 1 7 逐年份将样本分为三组，取高低两组对比12345678910111213141516* 方法一：runbycap program drop myxtileprogram define myxtile xtile gvar = age, nq(3)endrunby myxtile, by(year) verbose* 方法二：egenmorebys year: egen gvar = xtile(age), n(3)drop if gvar == 2replace gvar = 0 if gvar == 1replace gvar = 1 if gvar == 3reghdfe ln_wage tenure ttl_exp, a(idcode year) vce(cl idcode), if gvar == 0reghdfe ln_wage tenure ttl_exp, a(idcode year) vce(cl idcode), if gvar == 1 8 逐行业-年份将样本分为三组，取高低两组对比12345678910111213141516* 方法一：runbycap program drop myxtileprogram define myxtile xtile gvar = age, nq(3)endrunby myxtile, by(ind_code year) verbose* 方法二：egenmorebys ind_code year: egen gvar = xtile(age), n(3)drop if gvar == 2replace gvar = 0 if gvar == 1replace gvar = 1 if gvar == 3reghdfe ln_wage tenure ttl_exp, a(idcode year) vce(cl idcode), if gvar == 0reghdfe ln_wage tenure ttl_exp, a(idcode year) vce(cl idcode), if gvar == 1 9 系数差异检验后面就是系数差异检验尽情发挥了 chowtest bdiff","link":"/2024/01/08/6-%E5%90%84%E7%A7%8D%E5%88%86%E7%BB%84%E5%9B%9E%E5%BD%92%E6%96%B9%E6%B3%95/"},{"title":"向前与滞后符号技巧","text":"这篇文章简单介绍了下 Stata 滞后符号 L. 与向前符号 F. 的使用小技巧，以滞后符号 L. 为例，F. 类似 1 滞后一期123456webuse nlsworkxtset idcode yearsort idcode yearreg ln_wage L.age 2 滞后两期（多期）123456webuse nlsworkxtset idcode yearsort idcode yearreg ln_wage L2.age 3 滞后一到四期123456webuse nlsworkxtset idcode yearsort idcode yearreg ln_wage L(1/4).age 4 滞后一、三、五期123456webuse nlsworkxtset idcode yearsort idcode yearreg ln_wage L(1 3 5).age 5 交互项 + 滞后一期c. 表示声明变量为连续变量，与 i. 相对 # 表示生成交互项，##表示生成交互项以及各自的单独项 123456webuse nlsworkxtset idcode yearsort idcode yearreg ln_wage c.L.age##c.race 6 差分 + 滞后一期D. 
表示一阶向后差分，D.x 表示 123456webuse nlsworkxtset idcode yearsort idcode yearreg ln_wage D.L.age 7 同时几个变量一起滞后123456webuse nlsworkxtset idcode yearsort idcode yearreg ln_wage L(1/4).(age hours tenure)","link":"/2023/12/31/7-%E5%90%91%E5%89%8D%E4%B8%8E%E6%BB%9E%E5%90%8E%E7%AC%A6%E5%8F%B7%E6%8A%80%E5%B7%A7/"},{"title":"公司金融常用Stata代码","text":"使用 Stata 做实证已经有一段时间了，分享一些比较常用的命令，应该大部分的公司金融或者家庭金融论文都能够用到。本人是做公司金融的，所以大部分代码都是和公司研究方面相关的。如有不足，还望补充。 写在前面后文中采用的模型均为双向固定效应模型，固定了个体和年份$$Y = \\alpha + \\beta X + \\eta’Controls + \\delta_i + \\lambda_t + \\varepsilon_{it}$$$Controls$ 是一系列的控制变量 1 数据预处理部分（1）剔除样本通常我们在论文模型部分会看到剔除金融业企业、剔除单一观测值、剔除主要变量缺失的样本，有时还需要剔除部分年份的样本。 12345678910111213141516171819202122232425262728293031323334* 1 剔除金融业drop if industry == &quot;J&quot;// 有时候我们得到的是证监会分类的三位数代码，比如 J70，这时候就需要提取首字母drop if substr(industry, 1, 1) == &quot;J&quot;* 2 剔除样本中的 B 股// 这个可能是经常被忽略的，尽管在正文部分已经提及了使用的是 A 股样本，但是有时候数据中却还包含 B 股样本// stkcd 表示股票代码，注意处理成数值型变量而不是字符型drop if (stkcd &gt;= 200000 &amp; stkcd &lt; 300000)drop if stkcd &gt;= 900000* 3 剔除样本中的 ST、SST等股票// 这个应该是比较常用的操作了，因为股票如果变成 ST，它的名字前面会加上对应的字符drop if substr(name, 1, 3) == &quot;*ST&quot;drop if substr(name, 1, 3) == &quot;SST&quot;drop if substr(name, 1, 2) == &quot;ST&quot;drop if substr(name, 1, 2) == &quot;PT&quot;* 4 剔除主要变量缺失的样本// 这个建议在合并数据后再进行操作// 主要思路是，把所有的变量假装用来回归，然后直接剔除掉没用上的样本就行了global all_vars = &quot;Y X $controls&quot; // $controls 是一系列的控制变量，这里用了 Stata 的全局暂元方法qui reg $all_varskeep if e(sample)* 5 剔除单一观测值// 这个主要用在固定效应中bysort stkcd: gen single = _Ndrop if single &lt;= 1drop single （2）合并数据123456789101112131415* 1 1:1 匹配// 如果主文件（master）和导入文件（using）是通过股票代码和年份一一对应的，那么就采用如下的合并数据方法merge 1:1 stkcd year using &quot;control1.dta&quot;keep if _m == 3drop _m// 需要注意的是，如果在后文中没有使用该导入的数据，那么不建议使用 &quot;keep if _m == 3&quot;，因为这会导致样本损失// 建议使用 &quot;drop if _m == 2&quot; 这样可以尽可能地保留原始样本* 2 m:1 匹配// 如果主文件（master）和导入文件（using）不是一一对应的。例如，导入文件是以城市和年份为唯一标识的。// 由于一个城市有多家企业，所以主文件的多个样本对应了导入文件的一个样本，这个时候就需要 m:1 匹配merge m:1 city year using &quot;control2.dta&quot;keep 
if _m == 3drop _m （3）常见的公司金融变量生成123456789101112131415// 企业规模gen Size = ln(assets)// 资产负债率（杠杆率）gen Leverage = debts / assets// 资产回报率gen ROA = 2 * net_return / (L.assets + assets)// 企业现金流gen Cashflow = cash / assets// 企业年龄// 有时候会采用成立时间而不是IPO时间gen FirmAge = year - ipo_year 当然，如果做的是 DID 的话，可以采用如下的生成方式 1234567891011121314151617181920* 1 单期 DIDlocal treat_province2010 = &quot;广东省, 广西壮族自治区, 上海市, 北京市, 山西省&quot; // 随便打打inlist2 province, values(`treat_province2010') name(Treat) // inlist2 需要外部安装replace Treat = 0 if Treat == .gen Post = (year &gt;= 2010)gen DID = Treat * Post* 2 多期 DID// 假设有三期，分别是local treat_province2010 = &quot;广东省, 广西壮族自治区, 上海市, 北京市, 山西省&quot; // 随便打打local treat_province2012 = &quot;重庆市, 山东省, 西藏自治区&quot; // 随便打打local treat_province2015 = &quot;江西省, 福建省, 湖南省, 河北省&quot; // 随便打打inlist2 province, values(`treat_province2010') name(Treat2010) // inlist2 需要外部安装inlist2 province, values(`treat_province2012') name(Treat2012)inlist2 province, values(`treat_province2015') name(Treat2015)replace Treat2010 = 0 if Treat2010 == .replace Treat2012 = 0 if Treat2012 == .replace Treat2015 = 0 if Treat2015 == .gen DID = ((Treat2010 == 1) &amp; (year &gt;= 2010)) | ((Treat2012 == 1) &amp; (year &gt;= 2012)) | ((Treat2015 == 1) &amp; (year &gt;= 2015)) （4） 缩尾处理123// 通常需要对连续变量进行 1% 和 99% 处的缩尾处理// 建议直接 replace，不要生成新的变量，容易选错。。。winsor2 $continuous_variables, cut(1, 99) replace （5）在基准回归前在基准回归前，最好是对一些变量设定暂元，方便后续使用。然后把处理好的数据稳健保存好，以免后续处理不小心剔除了某些样本又要重来。 123456* 定义全局暂元global controls = &quot;control1 control2 control3 control4 control5 control6&quot;global independents = &quot;X $controls&quot;* 保存数据文件save &quot;main.dta&quot;, replace 2 描述性统计这部分内容其实没啥好说的。。。代码照抄就是了，/// 表示续行符 123sum2docx Y $independents /// using &quot;描述性统计.docx&quot;, replace /// stats(N mean(%12.3f) sd(%12.3f) min(%12.3f) median(%12.3f) max(%12.3f)) 3 基准回归123456789101112131415161718use &quot;main.dta&quot;, replace // 这里开始就可以体现前面保存数据的好处了// 不放控制变量和固定效应reghdfe Y X , vce(cluster stkcd) // 个体层面的聚类稳健标准误est store m1// 
加入控制变量reghdfe Y $independents ,vce(cluster stkcd)est store m2// 加入固定效应reghdfe Y $independents , absorb(stkcd year) vce(cluster stkcd)est store m3// 表格输出reg2docx m1 m2 m3 using &quot;基准回归.docx&quot;, replace /// scalars(N(%20.0fc) r2_a(%9.3f)) b(%9.4f) t(%7.3f) /// addfe(&quot;Firm=YES YES YES&quot; &quot;Year=YES YES YES&quot;) /// mtitles(&quot;OLS&quot; &quot;OLS&quot; &quot;OLS&quot;) /// font(&quot;Times New Roman&quot;, 9) 如果采用的是交乘的方法可以使用 # 符号，c. 声明变量是连续的 1234// 只有交乘项reghdfe Y c.X#c.D $controls , absorb(stkcd year) vce(cluster stkcd)// 包含单独项reghdfe Y c.X##c.D $controls , absorb(stkcd year) vce(cluster stkcd) 4 分组回归和系数差异检验分组回归的方法常用于异质性分析，但由于近年来中介效应模型备受批评，所以也有越来越多的学者采用分组或交乘的方式来做机制检验，因此掌握分组回归和系数差异检验的方法非常重要。 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293* 1 按照已有的 0-1 虚拟变量分组，例如“是否国企”// 非国企reghdfe Y $independents , absorb(stkcd year) vce(cluster stkcd), if SOE == 0est store m1// 国企reghdfe Y $independents , absorb(stkcd year) vce(cluster stkcd), if SOE == 1est store m2// 表格输出reg2docx m1 m2 using &quot;国企分组回归.docx&quot;, replace /// scalars(N(%20.0fc) r2_a(%9.3f)) b(%9.4f) t(%7.3f) /// addfe(&quot;Firm=YES YES&quot; &quot;Year=YES YES&quot;) /// mtitles(&quot;Non-SOEs&quot; &quot;SOEs&quot;) /// font(&quot;Times New Roman&quot;, 9)// 系数差异检验// （1）chowtest，其实就是引入交乘项，看交乘项的系数方向对不对，显不显著就行了reghdfe Y c.X##c.SOE $controls , absorb(stkcd year) vce(cluster stkcd)// （2）似无相关模型SUR的检验// 这个貌似不能用 reghdfe，所以还是算了。。。// （3）bdiff// 这个是我常用的命令，详细使用方法可以看连玉君老师的推文bdiff, group(SOE) model(reghdfe Y $independents , absorb(stkcd year) vce(cluster stkcd)) /// reps(1000) bdec(4) pdec(4) bsample seed(123456)* 2 按照中位数分组// 如果是需要按照某个连续变量的中位数进行分组，那么就需要采用一些小小的处理方法qui sum C, detailgen dC = (C &gt;= r(p50)) if !missing(C)// 这里的 dC 就是计算出来的基于中位数的分组变量// 低于中位数的组别reghdfe Y $independents , absorb(stkcd year) vce(cluster stkcd), if dC == 0est store m1// 高于中位数的组别reghdfe Y $independents 
, absorb(stkcd year) vce(cluster stkcd), if dC == 1est store m2// 表格输出reg2docx m1 m2 using &quot;国企分组回归.docx&quot;, replace /// scalars(N(%20.0fc) r2_a(%9.3f)) b(%9.4f) t(%7.3f) /// addfe(&quot;Firm=YES YES&quot; &quot;Year=YES YES&quot;) /// mtitles(&quot;Low&quot; &quot;High&quot;) /// font(&quot;Times New Roman&quot;, 9)// 系数差异检验bdiff, group(dC) model(reghdfe Y $independents , absorb(stkcd year) vce(cluster stkcd)) /// reps(1000) bdec(4) pdec(4) bsample seed(123456)* 3 按照分位数分组// 有时候中位数分组不显著（就是这么直接），一些作者会采用极端的分组方法。// 例如，最低的 1/3 定为 0，最高的 1/3 定为 1，中间的 1/3 剔除xtile dC = C, nq(3)drop if dC == 2replace dC = 0 if dC == 1replace dC = 1 if dC == 3// 后面就和前面类似了* 4 按照某个连续变量在某一年的中位数进行分组// 这种分类方式其实还是比较少见的。。。大家为了显著都不容易啊。。。// 例如现在有 2008-2020 的上市公司数据，按照变量 C 在 2015 年的中位数分组// 按照哪一年其实也是有讲究的，如果做的是 DID，一般选取事件的前一年// 思路就是把 2015 年的数据填充到其他年份，然后再找中位数分组gen dc = Cgen temp = dc if year == 2015bys code: egen gvar_in_year = min(temp)egen gvar_in_year_median = median(gvar_in_year)gen dC = (gvar_in_year &gt; gvar_in_year_median) if !mi(gvar_in_year)drop dc temp gvar_in_year gvar_in_year_median* 更建议先在原始文件里先处理成指定年份再 merge，然后直接用全样本中位数即可* 不建议先在原始文件里分组再 merge，可能会导致两组样本差异很大* 5 按照是否高于同年/行业/城市/省份中位数进行分组// 这个相对前一个来说简单一些，按行业分别找中位数然后再用变量跟它对比就好bysort industry: egen temp = median(C) // 行业中位数gen dC = (C &gt;= temp) if !missing(temp)drop tempbysort year: egen temp = median(C) // 年份中位数gen dC = (C &gt;= temp) if !missing(temp)drop temp* 6 按照是否高于同年行业/城市/省份中位数进行分组// 这种分组方式感觉怪怪的其实。。。不建议使用bysort industry year: egen temp = median(C)gen dC = (C &gt;= r(p50)) if !missing(temp)drop temp 5 稳健性检验这部分内容建议去看下知乎上专门的文章，我这里就抛砖引玉，主要是介绍下有哪些稳健性检验方法。。。 （1）平行趋势检验DID 必备检验之一，继续沿用前面的 DID 例子。如果大家感兴趣的话，之后可能会专门出一期怎么调平行趋势的文章。。。 12345678910111213141516171819202122232425262728293031323334353637383940// 假设有三期，分别是local treat_province2010 = &quot;广东省, 广西壮族自治区, 上海市, 北京市, 山西省&quot; // 随便打打local treat_province2012 = &quot;重庆市, 山东省, 西藏自治区&quot; // 随便打打local treat_province2015 = &quot;江西省, 福建省, 湖南省, 河北省&quot; // 随便打打inlist2 province, 
values(`treat_province2010') name(Treat2010) // inlist2 需要外部安装inlist2 province, values(`treat_province2012') name(Treat2012)inlist2 province, values(`treat_province2015') name(Treat2015)gen current = ((Treat2010 == 1) &amp; (year == 2010)) | ((Treat2012 == 1) &amp; (year == 2012)) | ((Treat2015 == 1) &amp; (year == 2015))gen Treat_Year = year if current == 1bysort stkcd: egen Treat_Year2 = sum(Treat_Year)drop Treat_Yearrename Treat_Year2 Treat_Yearforvalue i = 4(-1)2{ gen Before`i'_ = cond(year - Treat_Year &lt;= -`i' &amp; Treat_Year != ., 1, 0)}forvalue i = 4(-1)1{ gen Before`i' = cond(year - Treat_Year == -`i', 1, 0)}forvalue i = 0(1)5{ gen After`i' = cond(year - Treat_Year == `i', 1, 0)}forvalue i = 2(1)5{ gen After`i'_ = cond(year - Treat_Year &gt;= `i' &amp; Treat_Year != ., 1, 0)}reghdfe Y Before3_ Before2 Before1 After0 After1 After2 After3_ $controls , absorb(stkcd year) vce(cluster stkcd)est store m1coefplot m1, keep(Before3_ Before2 Before1 After0 After1 After2 After3_) /// levels(95) /// vertical yline(0) xline(4, lp(dash)) /// xtitle(&quot;period&quot;) ytitle(&quot;coefficient&quot;) /// addplot(line @b @at) ciopts(lpattern(dash) /// recast(rcap) msize(medium)) /// msymbol(circle_hollow) /// scheme(s1mono) （2）安慰剂检验这个也是 DID 必备的检验之一，通常可以分为对 $Treat$ 随机、对 $Post$ 随机和对 $DID$ 随机。这里我以单期 $DID$ 对 $Treat$ 随机为例子。（多期 DID 好像只能对 DID 随机） 12345678local treat_province2010 = &quot;广东省, 广西壮族自治区, 上海市, 北京市, 山西省&quot; // 随便打打inlist2 province, values(`treat_province2010') name(Treat) // inlist2 需要外部安装replace Treat = 0 if Treat == .gen Post = (year &gt;= 2010)permute Treat beta = _b[c.Treat#c.Post] t = (_b[c.Treat#c.Post] / _se[c.Treat#c.Post]), /// reps(1000) rseed(123456) saving(&quot;simulations.dta&quot;, replace): /// reghdfe Y c.Treat#c.Post $controls , absorb(stkcd year) vce(cluster stkcd) “beta = _b[c.Treat#c.Post] t = (_b[c.Treat#c.Post] / _se[c.Treat#c.Post])” 表示分别将交互项的系数和 t 值记录下来，然后再对输出文件 simulations.dta 做描述性统计就可以得到安慰剂检验的结果。当然，也可以直接查看 permute 的输出结果，可以参考我的另一篇文章。 
（3）工具变量法工具变量法采用 ivreghdfe 命令，该命令还可以输出一系列的弱工具变量检验结果。当然要注意的是这个命令没办法使用稳健标准误。 1ivreghdfe Y (X = IV) $controls , absorb(stkcd year) 还有另一种方法是通过手动 2SLS 回归，这种方法还能够分别输出两阶段的回归结果，所以更加推荐使用。 123reghdfe X IV $controls , absorb(stkcd year) vce(cluster stkcd)predict X_hatreghdfe Y X_hat $controls , absorb(stkcd year) vce(cluster stkcd) 如果一阶段 F 值大于经验值 10，通常就可以认为不存在弱工具变量问题了。但是其他的弱工具变量检验就需要自己去操作了。 （4）遗漏变量问题遗漏变量也是内生性的一种来源，而且近几年的文献越来越关注遗漏变量可能导致的后门路径问题。一般分为两种解决方法。 第一种是增加更多的控制变量以控制遗漏混淆变量影响。比如加入更多的固定效应，其他层面的控制变量等等。 第二种是证明遗漏变量对原文结论没有很大影响，常见的方法有 Oster 检验（2019）和 Frank（2000）提出的 Konfound 检验。Oster检验可以见我前面提到的那篇文章，Frank（2000）的方法可以看下连玉君老师的推文。 https://www.lianxh.cn/news/576be9d47ceeb.html https://zhuanlan.zhihu.com/p/513830106 https://www.lianxh.cn/news/4832e3735dc81.html 6 总结以上便是一些常见的实证论文中的 Stata 操作，当然难免会有遗漏。如有问题还希望大家指出，一起进步。","link":"/2023/01/06/4-%E5%85%AC%E5%8F%B8%E9%87%91%E8%9E%8D%E5%B8%B8%E7%94%A8Stata%E4%BB%A3%E7%A0%81/"},{"title":"Permutation test 和 Oster test 的 Stata 实现","text":"置换检验和 Oster test 的 Stata 实现 后文用的模型均为：$$y_{it} = \\beta_0 + \\beta_1x_{it}+\\beta’ controls_{it} + \\mu_i + \\lambda_t + \\epsilon_{it}$$内生性问题主要包括遗漏变量、反向因果还有选择性偏误，第一个问题可以通过增加更多变量解决，比如引入其他层面的固定效应，第二个反向因果问题可以通过工具变量法等方法解决，第三个选择性偏差包括样本选择偏误和自选择偏误，前者可以通过 PSM-DID 等方法解决，后者可以通过Heckman 2sls 等方法解决。本文介绍的Permutation test 和 Oster test 是从另外一个视角去探讨遗漏变量内生性问题，尤其是在找不到工具变量和遗漏变量不可观测、不可分离的时候较为有效。 1 Permutation test / Randomization test / 置换检验 排除共存事件的影响，比如在做 DID 的时候，有些其他事件冲击可能与我们所关心的事件是同时发生的，通过置换检验一定程度上可以排除这些共存事件的干扰。 1reg y x age size soe i.industry i.year, r 接下来的是对 $x$ 进行 permute 500 次【命令中的 permute x 和 option 里面的 reps(500) 】，每次都是对同一家公司不同年份进行 permute 【option 里面的 strata(code)】，这样就不会把 A 公司的数据给了 B 公司（不然就搞得和安慰剂检验差不多了）。最终的目的是对 x 的系数进行估计【命令中的 beta=_b[x] ，意思就是把 x 的系数赋值给 beta 这个变量名，其实改成别的或者不加也行】，冒号后面是基准模型。 1permute x beta=_b[x], reps(500) strata(code):reg y x age size soe, r 图中的 -0.0066581 就是前面基准回归的系数，由于这个系数是负的，所以我们看看这 500 次里面有多少次是比这个系数更负，也就是 lower 后面那个数字 0 ，意思就是 500 次 permute 的结果，$x$ 的系数全都大于 -0.0066581，经验 $p$ 值为后面的 0.0000（计算方法是 p = c / 
n），说明我们的结果是稳健的，不太可能有共存的事件影响我们的估计结果。 tips：如果原本的系数是正的就看下面的 upper 那行 此外，如果做的是 DID 的话，比如方程是$$y_{it} = \\beta_0 + \\beta_1treat_{i}\\times post_t+\\beta’ controls_{it} + \\mu_i + \\lambda_t + \\epsilon_{it}$$那就可以把命令改成对 $treat$ 进行 permute， 1permute treat beta=_b[c.treat#c.post], reps(500) strata(code):reg y c.treat#c.post age size soe i.industry i.year, r 经验 $p$ 值 0.3080 （154 / 500），500 次 permutation 里面就有 154 次比 -0.0268812 要小，所以置换检验没有通过，共存事件的影响不可忽视。 2 Oster test（2019） 其实这个检验目前还没有明确的名字，我姑且叫它 Oster test 吧。这个检验是用来检验遗漏变量的影响，比如，未观测到的因素需要比已观测到的因素作用大多少才能够对原估计结果产生显著影响（使 β = 0 或者 β 逆转为正数[前面提到的结果是负数]），所以这个办法本质上就是用来说明遗漏变量不会影响我们的主要结果。 2.1 是否存在和已经观测到的变量同等重要的未观测到的变量对我们的估计结果产生影响？ 更新：建议使用 github 上提供的 psacalc2，支持 reghdfe 命令后果运行，使用方法不变 链接 ==&gt; psacalc with support for reghdfe Oster test 分为两个部分，首先第一个部分是用来检验是否存在和已经观测到的变量（包括固定效应）同等重要的未观测到的变量对我们的估计结果产生影响？ 按照 Oster 原文以及一些顶刊的做法，通过 $R^2$ 和 $\\delta$ 对“真实的” $\\beta$ 区间（true $\\beta$）进行复原。首先给出两个假设， 引入未观测到的因素后 $R^2$ 会变为原来 1.3 倍（这个 1.3 是 Oster 的建议值）； 未观测到的变量对被解释变量的影响和已观测到的变量（包括固定效应）的影响至少相同（即 $\\delta = 1$ ，也是个建议值） 用到的 Stata 包是 psacalc，执行的命令为 123reg y x age big4 dual assets debts top institude soe, r level(99.5) // 置信区间设为 99.5% ，后面有用global r = e(r2) * 1.3 // 设为 1.3 倍psacalc beta x, delta(1) rmax($r) 首先是回归结果，记住这个 99.5 % 置信区间 [-0.0092741, -0.0052753]， psacalc beta 的输出结果如下， 没有引入控制变量和固定效应前，回归系数为 -0.00727 ，对应的 $R^2$ 为 0.004，控制后分别变为 -0.007270（没变，好家伙，为了区分多加个 0） 和 0.009，上面有个 beta 值为 -0.00728， 那么上面汇报的 beta 值与控制后的系数组成了“真实的” $\\beta$ 区间 [-0.00728, -0.007270]，该区间不包含 0 值，且落于前面提到的 99.5% 置信区间 [-0.0092741, -0.0052753] 内。这个结果说明了不太可能与已经观测到的变量（包括固定效应）同等重要的未观测到的变量对我们的结果产生显著影响。（使得 $\\beta$ 失效等于 0 或令其逆转为正） 2.2 未观测到的变量至少要产生多少倍于已经观测到的变量的影响才能够使得 β = 0？第二个检验换了一个角度去思考，假设 引入未观测到的因素后 $R^2$ 会变为原来 1.3 倍（这个 1.3 是 Oster 的建议值）； $\\beta = 0$ 执行的命令为 1psacalc delta x, beta(0) rmax($r) 结果输出如下， 可以发现上面的 delta 值为 24.20817 ，也就是说，未观测到的变量产生的影响至少24倍于已经观测到的变量才能够使得 β = 0，该结果表明不太可能存在未观测到的变量对我们的结果产生显著影响。（存在24倍影响的未观测变量是不太可能的） 根据知乎评论区朋友提供的资料，下面这个链接认为 delta 小于 0 时，只要小于 -1 也是可以的。 ==&gt; 知乎文章 
我在其他的论文中发现 delta 其实小于 0 都是可以的，说明加入遗漏变量后，系数会更偏向于基准的方向。 3 怎么汇报结果？那么 Oster test 的结果应该如何汇报呢？我找了下文献，看到有这两种汇报方法， Donohoe, M.P., Jang, H. and Lisowsky, P., 2022. Competitive externalities of tax cuts. Journal of Accounting Research, 60(1), pp.201-259. Aubery, F. and Sahn, D.E., 2021. Cognitive achievement production in Madagascar: a value-added model approach. Education Economics, 29(6), pp.670-699. 顺带加个置换检验的展示图， Donohoe, M.P., Jang, H. and Lisowsky, P., 2022. Competitive externalities of tax cuts. Journal of Accounting Research, 60(1), pp.201-259. 如有不对还请指出！ 4 参考资料 Aubery, F. and Sahn, D.E., 2021. Cognitive achievement production in Madagascar: a value-added model approach. Education Economics, 29(6), pp.670-699. Donohoe, M.P., Jang, H. and Lisowsky, P., 2022. Competitive externalities of tax cuts. Journal of Accounting Research, 60(1), pp.201-259. Oster, E., 2019. Unobservable selection and coefficient stability: Theory and evidence. Journal of Business & Economic Statistics, 37(2), pp.187-204. 
无工具变量解决遗漏变量内生性问题的psacalc方法及stata命令 - 经管代码库 - 经管之家(原人大经济论坛) https://bbs.pinggu.org/thread-10977856-1-1.html 前沿: 解决内生性问题的无工具变量推断法 https://mp.weixin.qq.com/s/o9bqdOSkqwsV_9QSrl_OUA 什么是经济学中的自选择问题？https://www.zhihu.com/question/311199969","link":"/2022/05/13/5-permutation_oster/"}],"tags":[{"name":"数学","slug":"数学","link":"/tags/%E6%95%B0%E5%AD%A6/"},{"name":"Stata","slug":"Stata","link":"/tags/Stata/"},{"name":"搜索网站","slug":"搜索网站","link":"/tags/%E6%90%9C%E7%B4%A2%E7%BD%91%E7%AB%99/"},{"name":"计量经济学","slug":"计量经济学","link":"/tags/%E8%AE%A1%E9%87%8F%E7%BB%8F%E6%B5%8E%E5%AD%A6/"},{"name":"社科基金","slug":"社科基金","link":"/tags/%E7%A4%BE%E7%A7%91%E5%9F%BA%E9%87%91/"},{"name":"GitHub","slug":"GitHub","link":"/tags/GitHub/"},{"name":"网络加速","slug":"网络加速","link":"/tags/%E7%BD%91%E7%BB%9C%E5%8A%A0%E9%80%9F/"},{"name":"经济学人","slug":"经济学人","link":"/tags/%E7%BB%8F%E6%B5%8E%E5%AD%A6%E4%BA%BA/"},{"name":"英语","slug":"英语","link":"/tags/%E8%8B%B1%E8%AF%AD/"}],"categories":[{"name":"网页设计","slug":"网页设计","link":"/categories/%E7%BD%91%E9%A1%B5%E8%AE%BE%E8%AE%A1/"},{"name":"编程学习","slug":"编程学习","link":"/categories/%E7%BC%96%E7%A8%8B%E5%AD%A6%E4%B9%A0/"},{"name":"社科基金","slug":"社科基金","link":"/categories/%E7%A4%BE%E7%A7%91%E5%9F%BA%E9%87%91/"},{"name":"英语学习","slug":"英语学习","link":"/categories/%E8%8B%B1%E8%AF%AD%E5%AD%A6%E4%B9%A0/"}],"pages":[{"title":"","text":"⭐关于我⭐个人介绍 📝经管类在读研究生 💻擅长Python、Stata等编程语言 🌏精通爬虫、计量分析、自然语言处理等技能 🍟爱吃垃圾食品 🧋快乐肥宅 社交媒体 📮邮箱：codefox2020@163.com 💡知乎主页：https://www.zhihu.com/people/Keynes 🌐GitHub 主页：https://codefoxs.github.io","link":"/about.html"},{"title":"","text":"Stata command by CodeFox1 MethodVerify the name of the command you want to download, and then enter the following code to install or update the command 12345* Installnet install command, from(&quot;https://raw.githubusercontent.com/codefoxs/Stata-personal/main/command/&quot;) replace* Versionwhich command 2 Command list Command Function Version datedv Quickly convert date strings to year, month, day and so on. 
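0.1.1 cnprov Stata commands for converting Chinese province strings 0.1.1 lewbel Heteroskedasticity-based instrumental variable regression (Lewbel, 2012) 0.1.4 csmar Stata command for loading CSMAR xlsx files 0.1.3","link":"/stata.html"},{"title":"","text":"Stata Learning Column Author: CodeFox Last updated: 2024-08-25 💡 This is an introductory online tutorial on Stata and corporate finance econometrics. It is still being updated and is expected to become very long; reading it through patiently should (hopefully) be rewarding. Framework diagram 1. Data acquisition and preprocessing 1.1 Common data sources: CSMAR is the go-to The common data sources include the following: 1.1.1 CSMAR database (国泰安) https://data.csmar.com/ Almost any data you can think of is here: corporate financial statements, precomputed indicators (such as earnings management, financing constraints, and stock price crash risk), listed-company stock returns, market returns, Fama factors, and so on. 1.1.2 CNRDS database https://www.cnrds.com/Home A rising newcomer trying to replace CSMAR. Compared with CSMAR, it offers many distinctive databases, such as annual reports, the text and tone of Management Discussion and Analysis sections, and earnings-briefing transcripts. If your university has not purchased access, you can try finding someone on Xianyu (闲鱼) to download the data for you. 1.1.3 RESSET financial big-data platform https://www.resset.cn/ You usually need to access it through your own library. It is not commonly used unless you mainly do stock-price research. 1.1.4 Wind terminal Usually provided on university computers; accounts are generally not handed out (unless you work at an institution). Foreign counterparts include Bloomberg and Thomson Reuters. Domestic alternatives include Choice (East Money) and iFind; Choice ( https://choice.eastmoney.com/ ) is recommended, since university verification gives free access. The data is fairly authoritative, but it is hard to work with and download; it is still a good platform for case studies. 1.1.5 Jingguan Zhijia (经管之家), Xianyu (闲鱼), 马克数据网, 数据皮皮侠, 众鲤数据网 Data sources that need to be vetted, although these platforms have undeniably enriched research work. 1.2 Data preprocessing steps: from download to descriptive statistics Main reference article: 公司金融常用Stata代码 💡 Taking CSMAR as an example, download the company basic information table, the balance sheet, and the income statement. 1.2.1 Data download (1) Download basic information on listed companies. Select Data Center (数据中心), Corporate Research Series (公司研究系列), and Listed Company Basic Information (上市公司基本信息) in turn. Choose a suitable range of years and sample period. ❗ Do not choose to exclude financial firms or ST firms; these steps are better done in Stata, because the calculations may need data from t - 1, so keep the data as complete as possible here. Select the fields you need on the left; clicking moves them to the right. Choose the default data format (other formats may be garbled), then click download to get a zip archive. Copy the absolute path of the zip file, open Stata, and enter the following code: csmar &quot;D:\\code\\Stata\\stata-learn\\上市公司基本信息年度表.zip&quot; 💡 csmar is my own custom command, which can be installed with the following code: net install csmar, from(&quot;https://raw.githubusercontent.com/codefoxs/Stata-personal/main/csmar/&quot;) replace If the following result is displayed, the data has been loaded successfully. Then save the data as 上市公司基本信息-2000to2023.dta. (2) Download the corporate balance sheet. Similarly, select the financial statements database","link":"/statalearn.html"}]}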