### Relationship between Company Size Before Layoffs and Percentage of Employees Laid Off

In this section, we examine the relationship between company size before layoffs and the percentage of employees laid off, which directly relates to our hypothesis. We expect larger companies to lay off a higher percentage of employees than smaller companies.

In [None]:
# Plotting scatterplot comparing Company Size Before Layoffs and Percentage Laid Off
plt.figure(figsize=(12,6))
sns.scatterplot(x='Company_Size_before_Layoffs', y='Percentage', data=df, s=100)

# Plotting line of best fit
sns.regplot(x='Company_Size_before_Layoffs', y='Percentage', data=df, scatter=False, color='blue')

plt.title('Company Size Before Layoffs vs. Percentage of Employees Laid Off', fontsize=20)
plt.xlabel('Company Size Before Layoffs', fontsize=14)
plt.ylabel('Percentage of Layoffs', fontsize=14)
plt.show()

According to this scatterplot, there appears to be a relationship between company size before layoffs and the percentage of employees laid off. However, the relationship appears negative: as company size before layoffs increases, the percentage of employees laid off decreases. This contradicts our hypothesis that larger companies would have a higher percentage of layoffs. Given this initial pattern, we would like to measure the strength of the association between company size before layoffs and the percentage of employees laid off.

In [None]:
# Creating a heat map of the correlation coefficients among Company Size Before Layoffs,
# Percentage of Layoffs, Company Size After Layoffs, Money Raised (in millions of dollars), and Year
heatmap_data = df[['Company_Size_before_Layoffs', 'Percentage', 'Company_Size_after_layoffs', 'Money_Raised_in_$_mil', 'Year']]

correlation_matrix = heatmap_data.corr()

plt.figure(figsize=(12,6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")

plt.title('Heatmap: Correlation Matrix of Variables')
plt.show()

According to this heat map, there is a weak negative correlation between company size before layoffs and the percentage of layoffs (r = -0.11). This suggests that company size before layoffs explains very little of the variation in the percentage of layoffs ($r^2 \approx 0.01$). It appears that, among these variables, the year of the layoffs is the one most strongly correlated with the percentage of layoffs. Given this weak negative correlation, we are interested in testing whether the relationship between company size before layoffs and the percentage of employees laid off is statistically significant.

$H_0$: There is no relationship between company size before layoffs and the percentage of employees laid off ($\beta = 0$)

$H_a$: There is a relationship between company size before layoffs and the percentage of employees laid off ($\beta \ne 0$)

In [None]:
# Finding the p-value and t-value using the OLS regression
outcome, predictors = patsy.dmatrices('Percentage ~ Company_Size_before_Layoffs', df)
model = sm.OLS(outcome, predictors)
results = model.fit()
# Index 1 is the slope for Company_Size_before_Layoffs (index 0 is the intercept)
print("P-value:", results.pvalues[1])
print("t-value:", results.tvalues[1])

The p-value is 3.21e-05 (p < 0.05) and the t-value is -4.171, which indicates a statistically significant relationship between company size before layoffs and the percentage of employees laid off. Given this p-value, we reject the null hypothesis in favor of the alternative hypothesis and conclude that companies that were larger before layoffs tend to lay off a lower percentage of their employees.
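
A weak correlation can still be statistically significant when the sample is large. For a regression with a single predictor, the slope's t-statistic is tied to the Pearson correlation by $t = r\sqrt{n-2}/\sqrt{1-r^2}$, so it grows with the sample size $n$ even when $r$ stays small. The sketch below checks this identity on synthetic data. The sample size, seed, coefficients, and helper function names are arbitrary assumptions chosen for illustration and are not drawn from our dataset.

```python
import numpy as np

def slope_t_from_ols(x, y):
    """t-statistic of the slope in a simple least-squares fit of y on x."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])        # intercept column + predictor
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 2)            # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)       # covariance of the coefficient estimates
    return beta[1] / np.sqrt(cov[1, 1])

def slope_t_from_r(x, y):
    """Same t-statistic, computed from the Pearson correlation alone."""
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    return r * np.sqrt(n - 2) / np.sqrt(1 - r**2)

# Synthetic data (assumed values): a weak negative slope plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = -0.1 * x + rng.normal(size=1000)

t_ols = slope_t_from_ols(x, y)
t_r = slope_t_from_r(x, y)
print(f"t from OLS: {t_ols:.3f}, t from r: {t_r:.3f}")
```

The two values agree up to floating-point error, which is why the OLS test above and a significance test on the correlation coefficient lead to the same conclusion in the single-predictor case.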

Nonetheless, this finding differs from what we hypothesized, as we expected larger companies to have a higher percentage of employee layoffs. One possible reason for the inverse relationship is that larger companies have more employees to begin with, so even if a larger company lays off more people than a smaller company, those layoffs can still represent a smaller percentage of its workforce. It is also important to consider other factors, such as industry type, geographic location, and economic conditions, that could influence the relationship between company size before layoffs and the percentage of layoffs.
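
The arithmetic behind this explanation can be illustrated with two hypothetical companies. The headcounts below are invented purely for illustration and do not come from our dataset, and `percent_laid_off` is a helper defined only for this sketch.

```python
def percent_laid_off(size_before, laid_off):
    """Percentage of the pre-layoff workforce that was laid off."""
    return 100 * laid_off / size_before

# Hypothetical figures, chosen only to illustrate the arithmetic
small = percent_laid_off(size_before=200, laid_off=50)
large = percent_laid_off(size_before=10_000, laid_off=500)

print(f"Small company: {small:.1f}% laid off")
print(f"Large company: {large:.1f}% laid off")
```

In this made-up case the larger company lays off ten times as many people, yet its percentage is one fifth of the smaller company's.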