### Correlation Between Money Raised in Millions and Percentage Laid Off

In this section, we examine the correlation between the money a company has raised (in millions) and the percentage of its staff that it laid off. We expect larger companies to lay off a higher percentage of employees than smaller companies, because they can raise more money and absorb the loss of staff, whereas smaller companies depend more heavily on each employee.

In [1]:
import matplotlib.pyplot as plt
import numpy as np

# df is the DataFrame containing the layoffs data
x = df['Funding']
y = df['Percentage']

# Define 20 bin edges per axis; the funding axis is capped at 900
# so that a few very large values do not stretch the plot
bins_x = np.linspace(x.min(), 900, 20)
bins_y = np.linspace(y.min(), y.max(), 20)

# Create 2D histogram
hist, x_edges, y_edges = np.histogram2d(x, y, bins=(bins_x, bins_y))

# Create meshgrid of bin edges for plotting
x_grid, y_grid = np.meshgrid(x_edges, y_edges)

# Plot the 2D histogram using pcolormesh (hist is transposed so that
# rows correspond to the y-axis)
plt.figure(figsize=(8, 6))
plt.pcolormesh(x_grid, y_grid, hist.T, cmap='Blues')
plt.colorbar(label='Frequency')

# Add labels and title
plt.xlabel('Funding')
plt.ylabel('Percentage')
plt.title('2D Histogram of Funding and Percentage')

# Show the plot
plt.show()

The 2D histogram shows how companies are distributed across the two variables. We excluded outliers because the majority of our data comes from companies that raised relatively little compared with the largest on the list. There appears to be a negative correlation between the two variables, though it is not a strong relationship: as a company's funding increases, the percentage of staff it lays off tends to decrease. Most of the darker blue squares are concentrated on the left side of the plot, where funding is lower.
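To put a number on the direction and strength of the relationship described above, one option is the Pearson correlation coefficient. The following is a minimal sketch, not part of the original analysis: the helper name `pearson_r` is hypothetical, the default cutoff of 900 is assumed from the upper bin edge of the histogram, and the demo values are made up for illustration.

```python
import numpy as np

def pearson_r(funding, percentage, max_funding=900):
    """Pearson correlation after dropping missing values and rows above max_funding."""
    funding = np.asarray(funding, dtype=float)
    percentage = np.asarray(percentage, dtype=float)
    # Keep rows where both values are present and funding is within the cutoff
    keep = ~np.isnan(funding) & ~np.isnan(percentage) & (funding <= max_funding)
    return np.corrcoef(funding[keep], percentage[keep])[0, 1]

# Made-up values for illustration only (not the real dataset)
demo_funding = [10, 50, 100, 200, 400, 5000]
demo_percentage = [0.50, 0.30, 0.40, 0.20, 0.10, 0.05]
r = pearson_r(demo_funding, demo_percentage)
print(f"Pearson r: {r:.2f}")
```

On the real data, the call would be `pearson_r(df['Funding'], df['Percentage'])`. A value close to 0 with a negative sign would match the weak negative trend seen in the plot.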

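In [None]:
import patsy
import statsmodels.api as sm

# Drop extreme funding outliers before fitting
no_outliers = df[df['Funding'] < 120000]

# Fit an ordinary least squares regression of Percentage on Funding
outcome, predictors = patsy.dmatrices('Percentage ~ Funding', no_outliers)
model = sm.OLS(outcome, predictors)
results = model.fit()

# Index 0 is the intercept; index 1 is the Funding coefficient
slope = results.params[1]
print("P-value:", results.pvalues[1])
print("T-statistic:", results.tvalues[1])
print("Slope:", slope)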

The t-test on the regression slope is statistically significant (p = 0.0124). Because this p-value is below the conventional significance level of 0.05, we reject the null hypothesis that the slope is zero, that is, that 'Percentage' has no linear relationship with 'Funding'.

The negative t-statistic (-2.50) means the estimated slope lies about 2.5 standard errors below zero. It does not compare the mean of 'Percentage' with the mean of 'Funding'; it measures how far the estimated slope is from zero relative to its uncertainty.

Furthermore, the negative slope indicates a negative linear relationship between the two variables, consistent with the 2D histogram: companies with more funding tend to lay off a smaller percentage of their staff, which is the opposite of what we initially expected. Statistical significance does not by itself make the relationship strong, which matches the weak trend seen in the plot.
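The link between the slope and its t-statistic can be made concrete with a short calculation. The sketch below is an illustration under stated assumptions, not the notebook's method: it recomputes the simple-regression slope, its standard error, and their ratio with plain NumPy. The function name `slope_t_statistic` is hypothetical and the demo values are made up.

```python
import numpy as np

def slope_t_statistic(x, y):
    """Return (slope, standard error, t-statistic) for the regression y ~ x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_centered = x - x.mean()
    sxx = np.sum(x_centered ** 2)
    # Least-squares estimates of slope and intercept
    slope = np.sum(x_centered * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters)
    residuals = y - (intercept + slope * x)
    sigma2 = np.sum(residuals ** 2) / (n - 2)
    std_err = np.sqrt(sigma2 / sxx)
    # The t-statistic is the slope measured in units of its standard error
    return slope, std_err, slope / std_err

# Made-up values for illustration only (not the real dataset)
demo_x = [10, 50, 100, 200, 400]
demo_y = [0.50, 0.30, 0.40, 0.20, 0.10]
demo_slope, demo_se, demo_t = slope_t_statistic(demo_x, demo_y)
print("Slope:", demo_slope)
print("Standard error:", demo_se)
print("T-statistic:", demo_t)
```

The sign of the t-statistic always matches the sign of the slope, which is why a negative t-statistic here points to a negative relationship rather than a difference between two group means.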