## Tech 6 Identifying Firms with Restatements ##

Please create a new Jupyter Notebook file and perform the analyses in Python. For every question, write the code and execute it to get results. Please also **explain your findings in Markdown cells or comments right after the returned results.**

After you finish the analyses, please **make sure every cell is executed and all the returned results are presented**. Then save the file and submit it to Canvas. **Each group only has to submit one file, but everyone should make sure they are able to write and run the code.**

## Introduction ##
Public firms are required to issue annual financial reports (10-Ks in the US) every year. Managers are responsible for preparing these reports, which are then audited by external auditors before public release. Despite these efforts, the released reports can still contain errors. When managers identify material errors in previous annual reports, the U.S. Securities and Exchange Commission (SEC) requires them to correct the errors and restate the original reports.

The SEC's Enforcement division regularly reviews public firms' financial reporting in order to protect investors from potential accounting fraud. Your group has been hired by the division to examine, using a large sample, which types of firms are more likely to restate their annual reports. You are asked to perform the following two tasks by analyzing the provided dataset (*resta_data_00_17.csv*). This dataset includes companies' business characteristics, the identity of their auditors, and information on whether their financial reports were restated. Please see below for a description of the dataset variables.

## Dataset Description ##
- datadate: fiscal year end
- fyear: fiscal year
- conm: company name
- resta: an indicator that equals one if the company's annual report for the corresponding fiscal year is later restated, zero otherwise
- auditor_fkey: numerical identifier for the company's auditor (Big Four: 1-PWC, 2-EY, 3-Deloitte, 4-KPMG; Others: >=5)
- lnassets: logarithm of total assets
- leverage: leverage ratio (liabilities / total assets)
- mtb: market-to-book ratio (market value / book equity)
- newissue: the company's newly raised capital as a percentage of total assets
- roa: return on assets (net income / total assets)
- smallprofit: an indicator that equals one if the company reports small profits, zero otherwise
- lnsegment: logarithm of the number of total business segments within the company
- foreign: an indicator that equals one if the company has sales in foreign territories, zero otherwise
- sgr: sales growth
- lnage: logarithm of the company's age
- icweak: an indicator that equals one if the company has at least one internal control weakness, zero otherwise
- big4: an indicator that equals one if the company is audited by one of the Big Four, zero otherwise

*Note: Certain variables are transformed to logarithms because log-transformed values are typically less skewed and therefore better suited to statistical tests.*
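
As a rough illustration of the note above, the sketch below compares the skewness of a right-skewed variable before and after taking logs. The data are synthetic (drawn from a log-normal distribution) and are not taken from *resta_data_00_17.csv*; the use of the natural logarithm is also an assumption, since the dataset description does not state the base.

```python
import numpy as np
from scipy import stats

# Synthetic, hypothetical data: right-skewed "total assets" drawn from a
# log-normal distribution. These are NOT values from resta_data_00_17.csv.
rng = np.random.default_rng(0)
assets = rng.lognormal(mean=6.0, sigma=2.0, size=10_000)

# Skewness of the raw values versus their natural logarithm.
raw_skew = stats.skew(assets)
log_skew = stats.skew(np.log(assets))

print(f"skewness of raw assets: {raw_skew:.2f}")
print(f"skewness of log assets: {log_skew:.2f}")
```

The raw values have a long right tail, while their logarithm is roughly symmetric, which is closer to what a t-test on means implicitly relies on.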

## Questions ##

Based on case-by-case investigations, the Enforcement division has identified three major types of firms that are especially susceptible to accounting restatements:
- Firms with weaker internal controls
- Firms with more complex businesses
- Firms with greater incentives to manage their earnings

**Question 1: Please use statistical tests to examine whether the SEC's three conjectures above are supported by the data.**  
  
*Hint 1: First, identify the construct in each conjecture. Second, find a relevant variable in the dataset for each construct; there can be more than one variable related to a construct, but you only have to pick one (you can do more if you want). Third, examine whether firms with restatements and firms without restatements differ in those variables.*

*Hint 2: When it comes to managing earnings, firms that raise more capital have greater incentives to manage their earnings in order to lower their cost of capital. In addition, firms that report small profits have likely managed their earnings upward to avoid reporting losses.*

*Hint 3: See the cells below for sample code that tests whether companies with restatements and companies without restatements differ in their average internal control weakness.*
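
The three steps in Hint 1 can also be wrapped in a small helper that runs the same comparison for several variables at once. The sketch below is one possible way to do this, not the required solution. It runs on synthetic data (the `demo` frame and its effect sizes are made up), `compare_groups` is a hypothetical helper name, and it uses Welch's t-test (`equal_var=False`) rather than the default equal-variance test.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic, hypothetical data with the same column names as the case
# dataset. These are NOT values from resta_data_00_17.csv.
rng = np.random.default_rng(42)
n = 5_000
resta_flag = rng.binomial(1, 0.12, size=n)
demo = pd.DataFrame({
    "resta": resta_flag,
    # Built so that restating firms have more weaknesses on average.
    "icweak": rng.binomial(1, 0.05 + 0.10 * resta_flag),
    # Built with no difference between the two groups.
    "newissue": rng.normal(0.10, 0.05, size=n),
})


def compare_groups(df, variables, flag="resta"):
    """Compare the mean of each variable between flag == 1 and flag == 0."""
    treated = df[df[flag] == 1]
    control = df[df[flag] == 0]
    rows = []
    for var in variables:
        # equal_var=False requests Welch's t-test, which does not assume
        # that the two groups share the same variance.
        t, p = stats.ttest_ind(treated[var], control[var], equal_var=False)
        rows.append({
            "variable": var,
            "mean_resta": treated[var].mean(),
            "mean_nonresta": control[var].mean(),
            "t": t,
            "p": p,
        })
    return pd.DataFrame(rows).set_index("variable")


summary = compare_groups(demo, ["icweak", "newissue"])
print(summary.round(4))
```

On the real data, the same call would take the `audit` frame and whichever variables you choose for each construct.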

**Question 2: Recently, The Wall Street Journal reported that several companies with major accounting scandals in 2020 were all audited by Ernst & Young. The journalists were concerned that Ernst & Young might not provide audits as good as those of other audit firms. In the context of this dataset, can you test whether the average restatement rate of firms audited by Ernst & Young is significantly greater than 11.8% (the average restatement rate for all companies)?**  
  
  
*If you are interested in reading that WSJ article, you can search for "String of Firms That Imploded Have Something in Common: Ernst & Young Audited Them" (by Patricia Kowsmann, Mark Maurer, and Jing Yang; October 16, 2020).*
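
Because Question 2 asks whether the rate is *greater than* the benchmark, one way to frame it is as a one-sided, one-sample test. The notation here is introduced for illustration and is not part of the assignment: $p_{EY}$ is the restatement rate of Ernst & Young clients, and $\hat{p}_{EY}$, $s$, and $n$ are the sample mean, sample standard deviation, and number of observations of their `resta` values.

$$
H_0: p_{EY} \le 0.118 \qquad \text{versus} \qquad H_1: p_{EY} > 0.118, \qquad t = \frac{\hat{p}_{EY} - 0.118}{s / \sqrt{n}}
$$

Under this framing, only a positive and sufficiently large $t$ counts as evidence for the journalists' concern.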

In [13]:
import pandas as pd
import scipy.stats as stat

In [14]:
audit = pd.read_csv('resta_data_00_17.csv', parse_dates = ['datadate'])

In [15]:
resta = audit[audit['resta'] == 1]

In [16]:
nonresta = audit[audit['resta'] == 0]

In [17]:
resta['icweak'].mean() - nonresta['icweak'].mean()

0.03890817537021113

In [28]:
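t, p = stat.ttest_ind(resta['icweak'], nonresta['icweak'])
print("ttest_ind_stats: t = %.3f  p = %.50f" % (t, p))
#With a p-value of ~ 0.00 there is very strong evidence that firms with restatements have
#internal control weaknesses more often than firms without restatements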

ttest_ind_stats: t = 14.733  p = 0.00000000000000000000000000000000000000000000000053
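
Because `icweak` and `resta` are both 0/1 indicators, the comparison above is a comparison of two proportions, and a chi-square test of independence on the 2x2 table of counts is a common alternative to the t-test. A minimal sketch, using synthetic data rather than the case dataset (the `demo` frame and its built-in effect sizes are made up for illustration):

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic, hypothetical data; NOT values from resta_data_00_17.csv.
rng = np.random.default_rng(7)
n = 5_000
resta_flag = rng.binomial(1, 0.12, size=n)
demo = pd.DataFrame({
    "resta": resta_flag,
    "icweak": rng.binomial(1, 0.05 + 0.10 * resta_flag),
})

# 2x2 table of counts: rows are resta (0/1), columns are icweak (0/1).
table = pd.crosstab(demo["resta"], demo["icweak"])

# chi2_contingency returns the statistic, the p-value, the degrees of
# freedom, and the expected counts under independence.
chi2, p, dof, expected = stats.chi2_contingency(table)

print(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3g}")
```

With large samples the two approaches usually lead to the same conclusion; the chi-square version just makes the underlying counts visible.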


In [29]:
t, p = stat.ttest_ind(resta['lnsegment'], nonresta['lnsegment'])
print("ttest_ind_stats: t = %.3f  p = %.50f" % (t, p))

#With a p-value of ~ 0.00 there is very strong evidence that firms with restatements have more business
#segments on average, consistent with more complex firms being more likely to restate their reports

ttest_ind_stats: t = 12.666  p = 0.00000000000000000000000000000000000106309394586215


In [31]:
t, p = stat.ttest_ind(resta['smallprofit'], nonresta['smallprofit'])
print("ttest_ind_stats: t = %.3f  p = %.50f" % (t, p))
#With a p-value of ~ 0.00 there is very strong evidence that firms with restatements report small profits
#more often, consistent with firms that manage earnings upward being more likely to restate their reports

ttest_ind_stats: t = 7.294  p = 0.00000000000030694914485978792924385137969592456954


In [33]:
t, p = stat.ttest_ind(resta['newissue'], nonresta['newissue'])
print("ttest_ind_stats: t = %.3f  p = %.3f" % (t, p))
#With a p-value of ~.6 we cannot conclude that firms with and without restatements differ in newly raised capital.

ttest_ind_stats: t = -0.515  p = 0.606


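In [52]:
# Observations audited by Ernst & Young (auditor_fkey == 2)
ey = audit[audit['auditor_fkey'] == 2]
restaey = ey[ey['resta'] == 1]
restaey.shape
#This has 967 observations
ey.shape
#This has 10334 observations
eypercent = 967/10334
eypercent
average = .118
difference = eypercent - average
difference
#ey has a lower rate of restatements (about 9.4%) compared to the 11.8% average

#Pass the full 0/1 resta column rather than the single ratio: the t-test needs the
#individual observations to estimate a standard error
stat.ttest_1samp(ey['resta'], .118)
#Because the EY rate is below 11.8%, the t statistic is negative, so the data give no
#support for the claim that EY's restatement rate is greater than the average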
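
The Ernst & Young question can also be checked directly from the counts reported in the cell above, without the row-level data. The sketch below runs a one-sample z-test for a proportion against the 11.8% benchmark with a one-sided ("greater than") alternative. This is a swapped-in technique, not the t-test used in the cell above, and the variable names are chosen for illustration.

```python
import math
from scipy import stats

# Counts copied from the comments in the cell above.
n_restated = 967    # Ernst & Young observations that were later restated
n_total = 10334     # all Ernst & Young observations
benchmark = 0.118   # restatement rate for all companies, from Question 2

rate = n_restated / n_total

# z statistic for a one-sample proportion test, using the standard error
# implied by the benchmark rate under the null hypothesis.
se = math.sqrt(benchmark * (1 - benchmark) / n_total)
z = (rate - benchmark) / se

# One-sided p-value for the alternative "rate is greater than benchmark".
# norm.sf is the upper-tail probability, 1 - cdf.
p_greater = stats.norm.sf(z)

print(f"rate = {rate:.4f}, z = {z:.2f}, one-sided p = {p_greater:.4f}")
```

Since the sample rate is below the benchmark, `z` is negative and the one-sided p-value is close to one, so these counts give no support for a restatement rate above 11.8%.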

In [55]:
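audit.shape
#This has 41860 observations
resta.shape
#This has 4981 observations
#Overall restatement rate, the benchmark quoted as 11.8% in Question 2
per = 4981/41860
per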

0.11899187768752986