# 1. Theoretical Framework - Composite Stock Investment Attractiveness Index (CSIAI)
> “A theoretical framework should be developed to provide the basis for the selection and combination of single indicators into a meaningful composite indicator under a fitness-for-purpose principle.”  
> — *OECD and European Commission (2008) Handbook on Constructing Composite Indicators: Methodology and User Guide.*

This notebook explains what I want to measure, why it matters, and how I plan to build the CSIAI. I will try to record here every design decision I make throughout the project.

## 1.1. Purpose and Fitness for Purpose  
I am building the CSIAI to answer one practical question:

> **How attractive is an individual U.S.‐listed company to an equity investor *today*?**

* *Investor perspective:* retail or institutional investors screening for ideas.  
* *Time horizon:* daily refresh beginning **2025-01-01**.  
* *Scope:* Filtered Russell 3000 universe.

## 1.2. What the Index Measures

After reading the OECD Handbook and John Murphy's *Technical Analysis of the Financial Markets*, I decided to measure the attractiveness of a stock along five dimensions, using the following definition of attractiveness:

| Dimension | Core question it answers |
|-----------|--------------------------|
| Financial Strength | Can the company fund itself and pay its bills? |
| Growth Potential | Is the business expanding in a healthy way? |
| Market Performance | How is the market valuing profits and cash returns? |
| Risk & Volatility | How much does the share price swing? |
| Liquidity & Trading | How easy is it to buy or sell the shares? |

These five dimensions cover the most common factors investors check before buying a stock.

## 1.3. Hierarchical Structure

* **Level 1** – CSIAI, a single score.  
* **Level 2** – The five dimensions listed above.  
* **Level 3** – Three or more raw indicators within each dimension (typically three to five).

Note that this structure is based on my initial understanding from reading the OECD Handbook and *John Murphy's* book. I will iterate on it as I learn more about the data and the indicators.

Each indicator is first rescaled from its original range to 0-100. The indicators within a dimension are then averaged to form that dimension's score, and the five dimension scores are averaged to form the final CSIAI score. This structure is simple, transparent, and easy to explain. I start with equal weights for both indicators and dimensions to keep things simple; I may use different weights later, and in later notebooks I will test alternative weighting schemes to see how sensitive the CSIAI is to those choices.
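The scaling and aggregation steps can be sketched in plain Python. This is a minimal sketch assuming min-max scaling computed across the filtered universe on a single day. In formula form, for company $c$ and indicator $i$ in dimension $d$ (with $n_d$ indicators):

$$
\tilde{x}_{c,i} = 100 \cdot \frac{x_{c,i} - \min_{c'} x_{c',i}}{\max_{c'} x_{c',i} - \min_{c'} x_{c',i}}, \qquad
D_{c,d} = \frac{1}{n_d} \sum_{i \in d} \tilde{x}_{c,i}, \qquad
\mathrm{CSIAI}_c = \frac{1}{5} \sum_{d=1}^{5} D_{c,d}
$$

The `higher_is_better` flag is a hypothetical addition: for an indicator such as volatility, a lower raw value should mean a more attractive stock, so the scale is flipped to $100 - \tilde{x}_{c,i}$. The optional `weights` argument is also hypothetical and only exists to make later sensitivity tests easy. The toy indicator names and values are illustrative (three of the five dimensions, for brevity), not the final selection.

```python
from statistics import mean


def min_max_scale(values, higher_is_better=True):
    """Rescale raw indicator values (one per company) to the 0-100 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # No variation across companies: give everyone the midpoint.
        return [50.0] * len(values)
    scaled = [100.0 * (v - lo) / (hi - lo) for v in values]
    # For "lower is better" indicators (e.g. volatility), flip the scale.
    return scaled if higher_is_better else [100.0 - s for s in scaled]


def composite_scores(indicators, structure, weights=None):
    """
    indicators: {name: (raw_values, higher_is_better)}, lists aligned by company
    structure:  {dimension: [indicator names]}
    weights:    optional {dimension: weight}; defaults to equal weights
    Returns (dimension_scores, csiai_scores).
    """
    scaled = {name: min_max_scale(vals, hib) for name, (vals, hib) in indicators.items()}
    n_companies = len(next(iter(scaled.values())))

    # Level 2: equal-weight average of the scaled indicators in each dimension.
    dim_scores = {
        dim: [mean(scaled[name][c] for name in names) for c in range(n_companies)]
        for dim, names in structure.items()
    }

    # Level 1: weighted average of the dimension scores (equal weights by default).
    if weights is None:
        weights = {dim: 1.0 for dim in structure}
    total = sum(weights[dim] for dim in structure)
    csiai = [
        sum(weights[dim] * dim_scores[dim][c] for dim in structure) / total
        for c in range(n_companies)
    ]
    return dim_scores, csiai


# Toy example: three companies, one illustrative indicator per dimension.
toy_indicators = {
    "current_ratio": ([1.0, 2.0, 3.0], True),         # Financial Strength
    "revenue_growth_pct": ([10.0, 5.0, 0.0], True),   # Growth Potential
    "volatility_pct": ([20.0, 40.0, 30.0], False),    # Risk & Volatility (lower is better)
}
toy_structure = {
    "Financial Strength": ["current_ratio"],
    "Growth Potential": ["revenue_growth_pct"],
    "Risk & Volatility": ["volatility_pct"],
}
dims, csiai = composite_scores(toy_indicators, toy_structure)
print([round(s, 2) for s in csiai])  # [66.67, 33.33, 50.0]
```

Min-max scaling keeps the 0-100 range easy to read, but a single extreme value can compress every other company into a narrow band, so outlier treatment may need revisiting once real data is in.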

## 1.4. Rules for Choosing Indicators
I will select the indicators in **02_data_selection.ipynb** using the following rules:

* **Concept link** – The indicator must fit only one dimension.
* **Data quality** – Meets Eurostat and OECD standards for relevance, accuracy, timeliness, accessibility, interpretability, comparability, and coherence.
* **Coverage** – Available for at least 90 percent of the filtered universe on the given day.
* **Statistical usefulness** – Shows real variation across companies and low overlap with other indicators (multicollinearity will be checked).
* **Update frequency** – I currently use daily data, but stock market data is not available for every company on every day, so I use the last available data point for each company.
* **Documentation** – Definition, unit, source URL, and refresh schedule will be put in the metadata dictionary and API.
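The coverage, overlap, and update-frequency rules could be checked with helpers like the ones below. This is a sketch under assumptions: missing values are `None` or NaN, overlap is measured with pairwise Pearson correlation on companies where both indicators are present, and the 0.90 correlation cut-off (`max_abs_corr`) is a hypothetical placeholder. Only the 90 percent coverage threshold comes from the rules above. The hypothetical `last_available` helper illustrates taking the last available data point per company.

```python
from datetime import date
from itertools import combinations
from math import isnan, sqrt


def is_present(v):
    """True if a value is neither None nor NaN."""
    return v is not None and not isnan(v)


def coverage(values):
    """Share of companies in the universe with a non-missing value."""
    return sum(is_present(v) for v in values) / len(values)


def pearson(x, y):
    """Pearson correlation of two equal-length lists (NaN if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")


def screen_indicators(data, min_coverage=0.90, max_abs_corr=0.90):
    """Keep indicators meeting the coverage rule; flag highly correlated pairs."""
    kept = [name for name, vals in data.items() if coverage(vals) >= min_coverage]
    flagged = []
    for a, b in combinations(kept, 2):
        # Use only companies where both indicators are present.
        pairs = [(x, y) for x, y in zip(data[a], data[b]) if is_present(x) and is_present(y)]
        if len(pairs) < 3:
            continue  # too few overlapping companies to estimate a correlation
        r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
        # Comparisons with NaN are False, so constant series are never flagged here.
        if abs(r) > max_abs_corr:
            flagged.append((a, b, r))
    return kept, flagged


def last_available(series, as_of):
    """Most recent non-missing value on or before `as_of` from (date, value) pairs."""
    candidates = [(d, v) for d, v in series if d <= as_of and is_present(v)]
    return max(candidates, key=lambda dv: dv[0])[1] if candidates else None


# Toy universe of ten companies (hypothetical indicators "a" to "d").
universe = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0],  # duplicates "a"
    "c": [5.0, 3.0, 8.0, 1.0, 9.0, 2.0, 7.0, 4.0, 6.0, 10.0],
    "d": [1.0, None, float("nan"), 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],  # 80% coverage
}
kept, flagged = screen_indicators(universe)

# One company's daily prices with a missing day.
prices = [(date(2025, 1, 2), 10.0), (date(2025, 1, 3), None), (date(2025, 1, 6), 11.5)]
print(kept, last_available(prices, date(2025, 1, 3)))
```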

## 1.5. Learning Path and Stakeholder Input

I have a strong interest in applying data science to financial markets, and I decided to use this project to learn more about it. I started with limited finance knowledge, so to prepare I studied:

* Murphy, J. *Technical Analysis of the Financial Markets*.  
* Hilpisch, Y. *Python for Algorithmic Trading*.  
* OECD Handbook on Composite Indicators (2008).

## 1.6. Transparency and Replicability

* All raw data comes from Yahoo Finance through the open-source `yfinance` library.  
* Processing scripts, normalisation parameters, and weighting choices will be kept in this repository, and I will try to keep them up to date.
* I will set up a metadata dictionary for each indicator in the `docs/metadata` folder.
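A per-indicator metadata entry might look like the following. This is a sketch assuming one JSON file per indicator in `docs/metadata`; the `IndicatorMetadata` class, the `write_metadata` helper, and the `current_ratio` entry are hypothetical. The fields mirror the documentation rule (definition, unit, source URL, refresh schedule), plus the indicator's name and its single dimension.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path

DIMENSIONS = (
    "Financial Strength",
    "Growth Potential",
    "Market Performance",
    "Risk & Volatility",
    "Liquidity & Trading",
)


@dataclass(frozen=True)
class IndicatorMetadata:
    name: str
    dimension: str          # must be exactly one of DIMENSIONS (concept-link rule)
    definition: str
    unit: str
    source_url: str
    refresh_schedule: str

    def __post_init__(self):
        if self.dimension not in DIMENSIONS:
            raise ValueError(f"Unknown dimension: {self.dimension!r}")


def write_metadata(meta, folder="docs/metadata"):
    """Write one indicator's metadata to <folder>/<name>.json and return the path."""
    path = Path(folder) / f"{meta.name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(asdict(meta), indent=2), encoding="utf-8")
    return path


# Hypothetical example entry (not necessarily part of the final indicator set).
current_ratio = IndicatorMetadata(
    name="current_ratio",
    dimension="Financial Strength",
    definition="Current assets divided by current liabilities",
    unit="ratio",
    source_url="https://finance.yahoo.com",
    refresh_schedule="daily (last available value)",
)
```

Calling `write_metadata(current_ratio)` would create `docs/metadata/current_ratio.json`. Validating the dimension in `__post_init__` enforces the concept-link rule that each indicator belongs to exactly one dimension.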

## 1.7. Expected Limitations

* **Short history** – I currently use data starting 2025-01-01, which means some risk metrics will be noisy in the first months.  
* **Missing fundamentals** – Smaller companies may lack certain accounting ratios. Depending on the missing rate, I may impute them with sector medians or use listwise deletion.  
* **Equal weighting** – Simple to explain but may not reflect real market influence. I will run a sensitivity analysis to test other weight sets.
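The two options for missing fundamentals can be sketched as follows, assuming missing values are stored as `None` and each company has a sector label. The function names and toy data are hypothetical, and the cut-off on the missing rate that decides between the two approaches is deliberately left open, as in the text above.

```python
from statistics import median


def missing_rate(values):
    """Share of companies with a missing (None) value for one indicator."""
    return sum(v is None for v in values) / len(values)


def sector_median_impute(values, sectors):
    """Fill each missing value with the median of the observed values in its sector.

    If a sector has no observed values at all, its gaps stay None.
    """
    by_sector = {}
    for v, s in zip(values, sectors):
        if v is not None:
            by_sector.setdefault(s, []).append(v)
    sector_median = {s: median(vs) for s, vs in by_sector.items()}
    return [v if v is not None else sector_median.get(s) for v, s in zip(values, sectors)]


def listwise_delete(rows):
    """Keep only companies whose indicator rows contain no missing values."""
    return {company: row for company, row in rows.items() if None not in row}


# Toy data: one indicator across six companies in two sectors.
values = [1.0, None, 3.0, 2.0, None, 4.0]
sectors = ["Tech", "Tech", "Tech", "Energy", "Energy", "Energy"]
print(round(missing_rate(values), 3), sector_median_impute(values, sectors))

# Toy data: two indicators per company, keyed by ticker.
rows = {"AAA": [1.0, 2.0], "BBB": [None, 3.0], "CCC": [4.0, 5.0]}
print(list(listwise_delete(rows)))
```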


## 1.8. Reference List

* Organisation for Economic Co-operation and Development (2008). *Handbook on Constructing Composite Indicators: Methodology and User Guide*. Paris: OECD Publishing.  
* Murphy, J. J. (1999). *Technical Analysis of the Financial Markets*. New York: New York Institute of Finance.  
* Hilpisch, Y. (2020). *Python for Algorithmic Trading*. Sebastopol, CA: O'Reilly Media.