# COGS 108 - Project Proposal

# Names

- Auritro Dutta
- Jacquelyn Garcia
- Prabhmeet Gujral
- Ethan Heath
- Aniruddh Krovvidi

# Research Question

Can historical financial data, combined with Environmental, Social, and Governance (ESG) criteria, be used by machine learning models to effectively predict future stock prices for companies that meet high ESG standards, thereby facilitating data-driven and socially responsible investing?

The model will be trained on the following features:

- Financial Indicators: The model will use traditional financial metrics such as historical stock prices, financial ratios (e.g., P/E ratio, debt-to-equity ratio), and indicators of volatility as inputs.

- ESG Scores: ESG data will be categorized into Environmental, Social, and Governance scores, which include sub-factors like carbon footprint (Environmental), employee welfare (Social), and board diversity (Governance).
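As a rough illustration of how the two feature groups above could be assembled into one model input, here is a minimal sketch using only the Python standard library. The function names, the argument names (`eps`, `total_debt`, `total_equity`), and the ESG dictionary keys are hypothetical placeholders, not fields of any particular data provider.

```python
import statistics


def daily_returns(closes):
    """Simple daily returns: (p_t - p_{t-1}) / p_{t-1}."""
    return [(cur - prev) / prev for prev, cur in zip(closes, closes[1:])]


def rolling_volatility(returns, window):
    """Sample standard deviation of returns over each trailing window."""
    return [
        statistics.stdev(returns[i - window + 1 : i + 1])
        for i in range(window - 1, len(returns))
    ]


def build_feature_row(closes, eps, total_debt, total_equity, esg):
    """Combine financial indicators and ESG scores into one feature dict.

    All inputs are assumed to be already cleaned; `esg` is a dict with
    hypothetical keys "E", "S", and "G".
    """
    returns = daily_returns(closes)
    return {
        "pe_ratio": closes[-1] / eps,                 # price-to-earnings
        "debt_to_equity": total_debt / total_equity,  # leverage
        "volatility": statistics.stdev(returns),      # whole-period volatility
        "esg_environmental": esg["E"],
        "esg_social": esg["S"],
        "esg_governance": esg["G"],
    }
```

In an actual analysis these computations would more likely be done column-wise on a data frame; the sketch only makes the definitions of the indicators explicit.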

# Background and Prior Work

#### Introduction to ESG Investing and Stock Price Prediction
Environmental, Social, and Governance (ESG) investing has gained significant traction in recent years as investors seek to align their financial goals with their values. ESG investing involves considering a company's environmental impact, social responsibilities, and governance practices alongside traditional financial metrics. The underlying hypothesis is that companies with high ESG ratings not only contribute positively to society but also exhibit more stable and potentially superior financial performance. Our project aims to leverage machine learning models to predict stock prices by incorporating ESG scores with traditional financial indicators. This dual-focus approach aims to provide insights into whether ESG factors enhance the predictive power of financial models, thereby supporting socially responsible investment decisions.

#### Prior Work on ESG and Financial Performance
Previous studies have explored the relationship between ESG factors and financial performance, providing a foundation for our research. A notable study by Friede, Busch, and Bassen (2015) conducted a meta-analysis of over 2,000 empirical studies and found that the majority of these studies reported a positive relationship between ESG factors and corporate financial performance. This comprehensive review suggests that ESG criteria can be financially beneficial and supports the hypothesis that ESG-compliant companies may exhibit favorable stock performance.<a name="#fn1"></a>[<sup>1</sup>](#note-1)

Further, a study by Khan, Serafeim, and Yoon (2016) published in The Accounting Review examined how material ESG issues (those that are likely to affect a company’s financial condition) are linked to stock price performance. They found that firms with good performance on material sustainability issues outperform those with poor performance, suggesting that ESG factors, when material, can provide valuable insights for investors.<a name="#fn2"></a>[<sup>2</sup>](#note-2)

In the realm of machine learning, there have been several attempts to predict stock market behavior using various algorithms. For instance, the use of LSTM neural networks for financial prediction has been well-documented. A study by Fischer and Krauss (2018) used LSTM networks to predict the directional movements of S&P 500 constituent stocks and found that LSTMs outperformed memory-free methods (a random forest, a standard deep neural network, and logistic regression), which cannot capture temporal dependencies in financial data.<a name="#fn3"></a>[<sup>3</sup>](#note-3) This study highlights the potential of advanced machine learning models to enhance the accuracy of stock market predictions.

#### In-Depth Study Analysis
An in-depth analysis of the intersection of ESG investing and stock price prediction reveals a growing body of work focused on integrating ESG factors into financial models. One such study by Henisz, Koller, and Nuttall (2019) in the McKinsey Quarterly emphasized the increasing importance of ESG factors in driving long-term financial performance. Their research suggested that ESG issues are often linked to critical factors such as regulatory compliance, operational efficiencies, and brand reputation, which can significantly impact stock prices.<a name="#fn4"></a>[<sup>4</sup>](#note-4)

Another significant study by Bolton and Kacperczyk (2020) investigated the relationship between carbon emissions and stock returns. They found that firms with higher carbon emissions tend to earn higher stock returns, which the authors interpret as investors demanding compensation for exposure to carbon risk. This indicates that environmental factors are already priced by the market and aligns with the broader hypothesis that ESG factors, particularly environmental issues, influence investor behavior and stock price trends.<a name="#fn5"></a>[<sup>5</sup>](#note-5)

Furthermore, a paper by Albuquerque, Koskinen, and Zhang (2019) published in Management Science examined how corporate social responsibility (CSR) activities influence firm risk and value. They found that firms engaging in CSR activities generally exhibit lower systematic risk and higher firm value, supporting the integration of social factors into investment models.<a name="#fn6"></a>[<sup>6</sup>](#note-6)

#### Relevant References
1. <a name="#note-1"></a>[^](#fn1) Friede, G., Busch, T., & Bassen, A. (2015). ESG and financial performance: aggregated evidence from more than 2000 empirical studies. Journal of Sustainable Finance & Investment, 5(4), 210-233. 
This meta-analysis provides comprehensive evidence of the positive relationship between ESG factors and corporate financial performance, suggesting that ESG criteria are not only ethical but also financially beneficial. https://www.tandfonline.com/doi/full/10.1080/20430795.2015.1118917#d1e255

2. <a name="#note-2"></a>[^](#fn2) Khan, M., Serafeim, G., & Yoon, A. (2016). Corporate sustainability: First evidence on materiality. The Accounting Review, 91(6), 1697-1724. 
This study examines the impact of material ESG issues on stock price performance, finding that firms excelling in material sustainability issues tend to outperform those that do not, providing a basis for integrating ESG factors into investment models. https://dash.harvard.edu/bitstream/handle/1/14369106/15-073.pdf;jsessionid=5212220466676E63E99E26EF77D83571?sequence=1

3. <a name="#note-3"></a>[^](#fn3) Fischer, T., & Krauss, C. (2018). Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research, 270(2), 654-669. 
This paper demonstrates the effectiveness of LSTM neural networks in predicting the directional movements of S&P 500 constituent stocks, highlighting their ability to capture complex temporal patterns, which is crucial for accurate financial forecasting. https://www.sciencedirect.com/science/article/pii/S0377221717310652

4. <a name="#note-4"></a>[^](#fn4) Henisz, W., Koller, T., & Nuttall, R. (2019). Five ways that ESG creates value. McKinsey Quarterly. 
This article emphasizes the increasing importance of ESG factors in driving long-term financial performance and provides insights into how ESG issues can impact stock prices through regulatory compliance, operational efficiencies, and brand reputation. https://info.fiduciary-trust.com/hubfs/Fiduciary_Insights/McKinsey_Five_Ways_that_ESG_Creates_Value.pdf

5. <a name="#note-5"></a>[^](#fn5) Bolton, P., & Kacperczyk, M. (2020). Do investors care about carbon risk? Journal of Financial Economics, 142(2), 517-549. 
This study investigates the relationship between carbon emissions and stock returns, finding that firms with higher emissions earn higher returns, consistent with investors demanding compensation for carbon risk. https://www.sciencedirect.com/science/article/pii/S0304405X21001902

6. <a name="#note-6"></a>[^](#fn6) Albuquerque, R., Koskinen, Y., & Zhang, C. (2019). Corporate Social Responsibility and Firm Risk: Theory and Empirical Evidence. Management Science, 65(10), 4451-4469.
This paper examines how CSR activities influence firm risk and value, finding that firms engaging in CSR activities generally exhibit lower systematic risk and higher firm value, supporting the integration of social factors into investment models. https://pubsonline.informs.org/doi/epdf/10.1287/mnsc.2018.3043

By building on these studies, our project will integrate ESG scores with traditional financial indicators in machine learning models to predict stock prices, aiming to validate and extend the understanding of ESG's role in financial performance. We will explore whether ESG factors provide additional predictive power beyond traditional metrics, potentially leading to more robust and socially responsible investment strategies.    

# Hypothesis


We hypothesize that incorporating ESG factors alongside traditional financial indicators will improve the accuracy of stock price predictions. Specifically, we believe that companies with higher ESG scores will demonstrate more stable and potentially superior stock performance compared to those with lower ESG scores. This hypothesis is based on existing literature that indicates a positive correlation between ESG compliance and financial performance, suggesting that ESG factors provide valuable insights into a company's long-term stability and growth potential. By integrating ESG scores into our machine learning models, we expect to capture additional dimensions of company performance that are not fully reflected in traditional financial metrics, leading to more accurate and socially responsible investment predictions.
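One way the first part of this hypothesis could be tested is to compare the out-of-sample error of a model trained on financial features alone against one trained on financial plus ESG features, using a split that respects time order. The sketch below shows only the evaluation helpers, under the assumption that predictions from both models are already available; the function names and the example numbers are made up for illustration and are not results.

```python
import math


def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows without shuffling, so test data is strictly later."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(baseline_rmse, esg_rmse):
    """Fraction by which the ESG-augmented model reduces error.

    A positive value is consistent with the hypothesis; a value near zero
    or below suggests ESG features add no predictive power.
    """
    return (baseline_rmse - esg_rmse) / baseline_rmse


# Made-up held-out prices and predictions, for illustration only.
actual = [10.0, 12.0, 11.0, 13.0]
baseline_pred = [11.0, 11.0, 12.0, 12.0]   # financial features only
esg_pred = [10.5, 11.5, 11.5, 12.5]        # financial + ESG features

improvement = relative_improvement(rmse(actual, baseline_pred), rmse(actual, esg_pred))
```

Splitting chronologically rather than randomly matters here because a random split would let the model train on prices that occur after the ones it is tested on.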

# Data

1. Describe the **ideal** dataset you would want to answer this question. (This should include: What variables? How many observations? Who/what/how would these data be collected? How would these data be stored/organized?)

For this project, an ideal dataset would include a comprehensive range of historical financial data combined with detailed Environmental, Social, and Governance (ESG) scores. This dataset would ideally span 10 to 15 years to capture evolving trends. The objective is to track stock price movements alongside ESG indicators, covering metrics such as stock prices, financial ratios like price-to-earnings and debt-to-equity, and ESG specifics like carbon emissions, workforce diversity, and governance practices.

For meaningful analysis, data from approximately 1,000 publicly traded companies across various sectors would allow insights into broader trends beyond isolated industry cases. Collecting this would likely involve sourcing financial metrics from stock exchanges or platforms like Yahoo Finance, while ESG data could come from agencies like MSCI or Refinitiv. Organizing this data by company and date in a structured database would enable easy querying and updating for ongoing analysis.
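To make "organized by company and date" concrete, the following is a minimal sketch of one possible layout using Python's built-in `sqlite3` module. The table and column names are assumptions chosen for illustration; the real schema would depend on which fields the chosen data sources actually provide.

```python
import sqlite3


def create_schema(conn):
    """Create one table per data source, each keyed by (ticker, date)."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS prices (
            ticker TEXT NOT NULL,
            date   TEXT NOT NULL,   -- ISO 8601 date, e.g. 2020-01-31
            close  REAL NOT NULL,
            PRIMARY KEY (ticker, date)
        );
        CREATE TABLE IF NOT EXISTS esg_scores (
            ticker        TEXT NOT NULL,
            date          TEXT NOT NULL,   -- date the score was published
            environmental REAL,
            social        REAL,
            governance    REAL,
            PRIMARY KEY (ticker, date)
        );
        """
    )


def prices_with_esg(conn, ticker):
    """Return price rows for one company, with same-day ESG scores if present."""
    return conn.execute(
        """
        SELECT p.date, p.close, e.environmental, e.social, e.governance
        FROM prices AS p
        LEFT JOIN esg_scores AS e
          ON e.ticker = p.ticker AND e.date = p.date
        WHERE p.ticker = ?
        ORDER BY p.date
        """,
        (ticker,),
    ).fetchall()
```

The composite primary key prevents duplicate observations for the same company and day, and the `LEFT JOIN` keeps every price row even on the many days when no ESG score was published.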

In practice, compiling this data can be complex. While platforms like Yahoo Finance offer consistent financial metrics, they typically lack comprehensive ESG information. ESG scores are often provided by specialized agencies and may be restricted by factors like company size, geography, and disclosure practices. Some detailed ESG datasets, such as those from MSCI, require institutional access.

To approximate the ideal dataset, a practical approach might be to merge data from multiple sources. Financial data could be accessed via public APIs or exchanges, while ESG metrics might be drawn from government data or public reports that capture general indicators like carbon emissions or governance ratings. Although this approach may not match the precision of a fully integrated dataset, it allows for a substantial investigation of ESG’s influence on financial performance.
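Merging sources raises a practical alignment problem: prices are typically daily, while ESG scores are usually updated far less often. A minimal sketch of one approach, assuming both inputs are sorted by ISO date string, is to attach to each price the most recent ESG score dated on or before it. The function name and the tuple layout are hypothetical.

```python
from bisect import bisect_right


def attach_latest_esg(price_rows, esg_rows):
    """Attach to each (date, close) the latest ESG score dated on or before it.

    price_rows: list of (iso_date, close), sorted by date.
    esg_rows:   list of (iso_date, score), sorted by date.
    Prices dated before the first ESG score get None.
    """
    esg_dates = [date for date, _ in esg_rows]
    merged = []
    for date, close in price_rows:
        # Index of the last ESG entry whose date is <= the price date.
        idx = bisect_right(esg_dates, date) - 1
        score = esg_rows[idx][1] if idx >= 0 else None
        merged.append((date, close, score))
    return merged
```

Using only scores dated on or before each price avoids leaking future information into the features. With data frames, `pandas.merge_asof` performs a comparable backward-looking join.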

By combining the best available data thoughtfully, we can still conduct a meaningful analysis on the role of ESG factors in financial forecasting, even with certain limitations.



# Ethics & Privacy

- Thoughtful discussion of ethical concerns included
- Ethical concerns consider the whole data science process (question asked, data collected, data being used, the bias in data, analysis, post-analysis, etc.)
- How your group handled bias/ethical concerns clearly described

Acknowledge and address any ethics & privacy related issues of your question(s), proposed dataset(s), and/or analyses. Use the information provided in lecture to guide your group discussion and thinking. If you need further guidance, check out [Deon's Ethics Checklist](http://deon.drivendata.org/#data-science-ethics-checklist). In particular:

- Are there any biases/privacy/terms of use issues with the data you proposed?
- Are there potential biases in your dataset(s), in terms of who it composes, and how it was collected, that may be problematic in terms of it allowing for equitable analysis? (For example, does your data exclude particular populations, or is it likely to reflect particular human biases in a way that could be a problem?)
- How will you set out to detect these specific biases before, during, and after/when communicating your analysis?
- Are there any other issues related to your topic area, data, and/or analyses that are potentially problematic in terms of data privacy and equitable impact?
- How will you handle issues you identified?

# Team Expectations 


Read over the [COGS108 Team Policies](https://github.com/COGS108/Projects/blob/master/COGS108_TeamPolicies.md) individually. Then, include your group’s expectations of one another for successful completion of your COGS108 project below. Discuss and agree on what all of your expectations are. Discuss how your team will communicate throughout the quarter and consider how you will communicate respectfully should conflicts arise. By including each member’s name above and by adding their name to the submission, you are indicating that you have read the COGS108 Team Policies, accept your team’s expectations below, and have every intention to fulfill them. These expectations are for your team’s use and benefit — they won’t be graded for their details.

* *Team Expectation 1*
* *Team Expectation 2*
* *Team Expectation 3*
* ...

# Project Timeline Proposal

Specify your team's specific project timeline. An example timeline has been provided. Change the dates, times, names, and details to fit your group's plan.

If you think you will need any special resources or training outside what we have covered in COGS 108 to solve your problem, then your proposal should state these clearly. For example, if you have selected a problem that involves implementing multiple neural networks, please state this so we can make sure you know what you’re doing and so we can point you to resources you will need to implement your project. Note that you are not required to use outside methods.



| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 1/20  |  1 PM | Read & Think about COGS 108 expectations; brainstorm topics/questions  | Determine best form of communication; Discuss and decide on final project topic; discuss hypothesis; begin background research | 
| 1/26  |  10 AM |  Do background research on topic | Discuss ideal dataset(s) and ethics; draft project proposal | 
| 2/1  | 10 AM  | Edit, finalize, and submit proposal; Search for datasets  | Discuss Wrangling and possible analytical approaches; Assign group members to lead each specific part   |
| 2/14  | 6 PM  | Import & Wrangle Data (Ant Man); EDA (Hulk) | Review/Edit wrangling/EDA; Discuss Analysis Plan   |
| 2/23  | 12 PM  | Finalize wrangling/EDA; Begin Analysis (Iron Man; Thor) | Discuss/edit Analysis; Complete project check-in |
| 3/13  | 12 PM  | Complete analysis; Draft results/conclusion/discussion (Wasp)| Discuss/edit full project |
| 3/20  | Before 11:59 PM  | NA | Turn in Final Project & Group Project Surveys |