# COGS 108 - EDA Checkpoint

# Names

- McKayla David
- Sebastian Modafferi
- Anna Potapenko
- Matthew Chan
- Kirthin Rajkumar


# Research Question

On a global economic scale, do larger companies (measured by funding and number of employees) lay off a higher percentage of their employees than smaller companies? Which indicators, specifically, can be used to predict the percentage of employees laid off?


## Background and Prior Work

Layoffs reflect economic and organizational change and are a significant indicator of a company's success and development. While they directly affect employees, layoff rates also have broader implications for the health of the economy, industry trends, and the state of the workforce. Understanding the trends and impacts of layoffs is vital not only on a global economic scale but also for stakeholders and individual communities.

Layoffs are important to study because they matter to both companies and employees. For a company, understanding the indicators that predict imminent layoffs can help it course-correct before reaching a point of no return. Employees who understand layoff indicators, in turn, can better choose a company for their next role and improve their job security.

Research published in the Journal of the European Economic Association [1^](#https://academic.oup.com/jeea/article-abstract/18/1/427/5247011) explored the economic influences that cause layoffs and examined how financial health and market factors shape layoff decisions. A similar study published in the Journal of Labor Economics examined the effects of layoffs on unemployment rates and found that layoffs can have a lasting impact on the job market and on employees' career trajectories.

A study in the Journal of Empirical Finance [3^](#https://doi.org/10.1016/s0927-5398\(01\)00024-x) also examines individual firms and the causes of their layoffs, giving insight into company restructuring and the technologies that change workforce requirements. Additionally, an article available on JSTOR [2^](#https://www.jstor.org/stable/117002?casa_token=m7s1bFw7mY4AAAAA%3AhaYXwJWsj5E0Xo7vbnjns6omvUnSFYlenLVZ99nBhONKkQRCLyfLIdEk3ZJycob9If4HtLaMga7y7cQzrzAO6QfJYXTkccHfVciVYhTXREH7HSHuGN4) explains how layoffs recur and how that recurrence correlates with economic cycles. This suggests that layoffs are an essential part of economic growth.

Research on layoffs is interdisciplinary, drawing on economic theory, organizational behavior, and societal impact. Understanding the factors that influence layoffs is important because it can help researchers develop strategies to mitigate the negative effects of layoffs on employees and the economy at large. Existing work does not provide internal indicators for when a company is about to execute layoffs, so our research seeks to identify a correlation between company size and layoffs.

1. <a name="Journal of the European Economic Association"></a> [^](#https://academic.oup.com/jeea/article-abstract/18/1/427/5247011) Gathmann, C., Helm, I., & Schönberg, U. (2018). Spillover effects of mass layoffs. Journal of the European Economic Association, 18(1), 427–468. https://doi.org/10.1093/jeea/jvy045
2. <a name="JSTOR"></a> [^](#cite_ref-2) Hallock, K. (1998). Layoffs, top executive pay, and firm performance. JSTOR. https://www.jstor.org/stable/117002
3. <a name="Journal of Empirical Finance"></a> [^](#https://doi.org/10.1016/s0927-5398\(01\)00024-x) Chen, P., Mehrotra, V., Sivakumar, R., & Yu, W. (2001). Layoffs, shareholders’ wealth, and corporate performance. Journal of Empirical Finance, 8(2), 171–199. https://doi.org/10.1016/s0927-5398(01)00024-x


# Hypothesis


We hypothesize that larger companies lay off a higher percentage of employees (especially amidst a recession) than smaller companies. We believe this because smaller companies already have fewer employees, so layoffs are more likely to harm the business than benefit it. Additionally, larger companies are able to withstand more financial pressure, allowing them to perform large layoffs despite the impact on company performance, given that they have enough capital to withstand the losses.

# Data

## Data overview

- Dataset #1 - Kaggle
  - Dataset Name: "Tech Layoffs 2020-2024"
  - https://www.kaggle.com/datasets/ulrikeherold/tech-layoffs-2020-2024
  - Number of observations: 1418
  - Number of variables: 16

This dataset was scraped from layoffs.fyi, which compiles layoff data from news articles, and covers the past four years (2020-2024). The key variables we will be using are `Money_Raised_in_$_mil`, `Percentage`, `Laid_Off`, `Stage`, and `Funding`, a numeric column that we derive from `Money_Raised_in_$_mil`. We are focusing our analysis on these columns because they contain vital information about layoffs and how the company is performing. The data comes fairly clean; the only correction required is to the `Money_Raised_in_$_mil` column, which is initially stored as a string that can include a dollar sign character.

## Layoffs.fyi Dataset

In [1]:
import pandas as pd
import numpy as np
df = pd.read_excel('./data/tech_layoffs.xlsx')
df.head()

Unnamed: 0,#,Company,Location_HQ,Country,Continent,Laid_Off,Date_layoffs,Percentage,Company_Size_before_Layoffs,Company_Size_after_layoffs,Industry,Stage,Money_Raised_in_$_mil,Year,lat,lng
0,3,ShareChat,Bengaluru,India,Asia,200,2023-12-20,15.0,1333,1133,Consumer,Series H,$1700,2023,12.97194,77.59369
1,4,InSightec,Haifa,Israel,Asia,100,2023-12-19,20.0,500,400,Healthcare,Unknown,$733,2023,32.81841,34.9885
2,6,Enphase Energy,San Francisco Bay Area,USA,North America,350,2023-12-18,10.0,3500,3150,Energy,Post-IPO,$116,2023,37.54827,-121.98857
3,7,Udaan,Bengaluru,India,Asia,100,2023-12-18,10.0,1000,900,Retail,Unknown,1500,2023,12.97194,77.59369
4,14,Cruise,San Francisco Bay Area,USA,North America,900,2023-12-14,24.0,3750,2850,Transportation,Acquired,$15000,2023,37.77493,-122.41942
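
The preview suggests that `Percentage` is derived from `Laid_Off` and `Company_Size_before_Layoffs`. The following is a minimal sketch of a consistency check under that assumption, which is inferred from the five rows above and not documented by the dataset. The `sample` frame is hand-copied from the preview for illustration; the same check could be run on the full `df`.

```python
import pandas as pd

# Hand-copied from the preview above; the real check would use `df` instead.
sample = pd.DataFrame({
    "Laid_Off": [200, 100, 350, 100, 900],
    "Company_Size_before_Layoffs": [1333, 500, 3500, 1000, 3750],
    "Company_Size_after_layoffs": [1133, 400, 3150, 900, 2850],
    "Percentage": [15.0, 20.0, 10.0, 10.0, 24.0],
})

# Assumption: Percentage = 100 * Laid_Off / headcount before layoffs.
recomputed = 100 * sample["Laid_Off"] / sample["Company_Size_before_Layoffs"]
pct_gap = (recomputed - sample["Percentage"]).abs()

# Headcount after layoffs should equal headcount before minus layoffs.
size_gap = (
    sample["Company_Size_before_Layoffs"]
    - sample["Laid_Off"]
    - sample["Company_Size_after_layoffs"]
).abs()

# Flag rows that are off by more than 0.5 percentage points or by any headcount.
suspect = sample[(pct_gap > 0.5) | (size_gap > 0)]
print(f"{len(suspect)} inconsistent rows out of {len(sample)}")
```

The 0.5-point tolerance is an arbitrary choice that allows for rounding in the reported percentages.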


In [2]:
#remove company name
df = df.drop(columns=['Company'])

In [3]:
df['Money_Raised_in_$_mil'].dtypes

dtype('O')

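In [6]:
# Strip a leading '$' only where one is present. Slicing with s[1:] would drop
# the first digit of values stored without it (e.g. '1500' in row 3).
df['Funding'] = df['Money_Raised_in_$_mil'].astype(str).str.lstrip('$').astype(np.float64)
df['Funding'].head()

0     1700.0
1      733.0
2      116.0
3     1500.0
4    15000.0
Name: Funding, dtype: float64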
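
With funding available as a number, the relationship posed in the research question can be probed directly. The sketch below is one possible first look, not our final method: it computes a Spearman rank correlation (as the Pearson correlation of ranks) between each size measure and `Percentage`, then compares median layoff percentages above and below the median company size. The `sample` frame is hand-copied from the preview, the `spearman` helper is our own, and the median split is a hypothetical choice of threshold.

```python
import pandas as pd

# Hand-copied from the preview (funding in $ millions, any "$" removed);
# the real analysis would use the full `df`.
sample = pd.DataFrame({
    "Company_Size_before_Layoffs": [1333, 500, 3500, 1000, 3750],
    "Funding": [1700.0, 733.0, 116.0, 1500.0, 15000.0],
    "Percentage": [15.0, 20.0, 10.0, 10.0, 24.0],
})


def spearman(x: pd.Series, y: pd.Series) -> float:
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    return float(x.rank().corr(y.rank()))


for col in ["Company_Size_before_Layoffs", "Funding"]:
    rho = spearman(sample[col], sample["Percentage"])
    print(f"{col} vs Percentage: rho = {rho:.2f}")

# Hypothetical split: "large" means above the median headcount in the data.
size = sample["Company_Size_before_Layoffs"]
is_large = (size > size.median()).rename("is_large")
medians = sample.groupby(is_large)["Percentage"].median()
print(medians)
```

Five rows are far too few to support any conclusion; the point is only the shape of the computation. A rank correlation is used here because funding and headcount are heavily skewed, which makes a linear correlation sensitive to a handful of very large companies.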

# Results

## Exploratory Data Analysis


# Ethics & Privacy

##### Potential ethical concerns and how we plan to address them: 
Our dataset is scraped from Layoffs.fyi, which contains explicit personal information on individuals who were laid off. Because there is no explicit documentation of informed consent, we will omit this information to preserve privacy and focus on company-level data rather than on individuals.

Layoffs.fyi pulls data only from news articles, so it is a biased sample limited to layoffs that were publicly reported. The dataset also consists primarily of observations from the USA, which underrepresents layoffs in other regions of the world and could bias our analysis and results. Because there are too few observations from other countries, we will orient our analysis around the USA's economy. However, we will still include models and representations of non-US observations to provide scope and a point of reference.

The timeframe of our data is 2020-2024, which excludes the larger historical context of layoffs and compounds the potential bias and lack of scope. For this reason, our analysis will be further oriented towards the COVID and post-COVID economy.
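
The geographic skew described above can be quantified before deciding how to frame the analysis. The following is a minimal sketch, assuming the `Country` column labels US companies as `"USA"` (as in the dataset preview). The helper names `us_share` and `split_by_region` are our own, and the five-row `sample` is only for illustration; it cannot show the real US share.

```python
import pandas as pd


def us_share(frame: pd.DataFrame, country_col: str = "Country") -> float:
    """Return the fraction of rows whose country is labeled 'USA'."""
    return float((frame[country_col] == "USA").mean())


def split_by_region(frame: pd.DataFrame, country_col: str = "Country"):
    """Split a frame into US and non-US rows so each can be analyzed separately."""
    is_us = frame[country_col] == "USA"
    return frame[is_us], frame[~is_us]


# Hand-copied from the dataset preview; the real check would use the full `df`.
sample = pd.DataFrame({
    "Country": ["India", "Israel", "USA", "India", "USA"],
    "Percentage": [15.0, 20.0, 10.0, 10.0, 24.0],
})

us, non_us = split_by_region(sample)
print(f"US share of rows: {us_share(sample):.0%}")
print(f"Median percentage laid off, US: {us['Percentage'].median()}")
print(f"Median percentage laid off, non-US: {non_us['Percentage'].median()}")
```

Keeping the two groups separate lets the non-US observations serve as a point of reference without being pooled into the US-focused results.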

# Team Expectations 

We expect our team members to be reliable in completing their own work and contributions. They should maintain open communication with the rest of the team and are expected to communicate any scheduling conflicts for team meetings. Members who cannot attend a meeting are still expected to complete their work before it. During team meetings, we expect all members to contribute actively to the discussion and to be professional when ideas conflict. Each member will be assigned tasks by the end of the team meeting and is expected to arrive at the next meeting with those tasks sufficiently completed and uploaded to the repository, so that we can discuss progress and any issues we ran into.

# Project Timeline Proposal

| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 02/04 | 1 PM | Read previous COGS 108 Final Projects | Complete previous quarters’ COGS 108 Final Project Analysis, plan meeting times, begin discussing project topics. | 
| 02/05  | 1 PM | Brainstorm project topics, potential data sources, and viability of research questions | Discuss and decide on final project topic; discuss hypothesis; begin background research; discuss ideal dataset(s) and ethics; draft project proposal | 
| 02/11  | 1 PM | Edit, finalize, and submit proposal; Search for datasets | Discuss Wrangling and possible analytical approaches; Assign group members to lead each specific part |
| 02/18  | 1 PM  | Delegate tasks and start wrangling | Go over what everyone has done and make any edits or revisions. Also go over revisions and feedback from the proposal. |
| 02/25  | 1 PM  | Finalize wrangling/EDA; Begin Analysis | Meet for Checkpoint #1 |
| 03/03  | 1 PM  | Discuss final approaches for Data Viz and EDA; Continue Analysis | Meet for Checkpoint #2 |
| 03/06  | 7:30 PM  | Finalize Data Viz and EDA; Begin Analysis | Meet for Checkpoint #2 |
| 03/10  | 1 PM  | Finalize quantitative analysis; Discuss approach to final video submission | Meet for video and final submission semantics |
| 03/13 | 12 PM  | Complete analysis; Draft results/conclusion/discussion | Discuss/edit full project |
| 03/20  | Before 11:59 PM  | NA | Turn in Final Project & Group Project Surveys |