# How to create a Stock Universe
---

Creating a **stock universe** is the process of defining a specific group of stocks that you will analyze, monitor, or potentially include in a portfolio. This universe serves as the first filter before deeper analysis—fundamental or technical—takes place. Below are some common steps and considerations to help you build one:


### 1. Define Your Investment Objective or Strategy

- **Investment Style**: Are you looking at growth, value, dividend, or momentum plays?  
- **Market Cap Focus**: Do you want large-cap, mid-cap, or small-cap companies?  
- **Geographical Scope**: Will you focus on domestic (e.g., U.S.-only) or international (global) markets?  
- **Sector or Industry Focus**: Do you want a broad sector distribution or concentrate on specific industries?

Clarifying your **goal and style** helps filter out stocks that don’t align with your strategy.

<br>

### 2. Set Criteria for Initial Screening

#### 2.1 Quantitative Factors

- **Market Capitalization**: You may limit your universe to companies above a certain market cap for stability or below a threshold for potential high growth.  
- **Liquidity (Trading Volume)**: Exclude illiquid stocks that are hard to trade without influencing the price significantly.  
- **Price**: Some investors prefer stocks above a certain price to avoid penny stocks; others use no minimum price filter.  
- **Volatility / Beta**: Depending on your risk tolerance, you may want to filter by volatility or market beta.  
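
If you plan to screen on beta, one common estimate is the covariance of the stock's returns with the market's returns, divided by the variance of the market's returns. The sketch below is only an illustration on synthetic data: the return series and the names `market` and `stock_ret` are assumptions, not part of any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns for a market index (illustrative values only).
rng = np.random.default_rng(0)
market = pd.Series(rng.normal(0.0005, 0.01, 250), name="market")

# Build a stock whose returns move about 1.5x the market, plus independent noise.
stock_ret = 1.5 * market + rng.normal(0.0, 0.005, 250)

# Beta = Cov(stock, market) / Var(market); both use the sample (ddof=1) estimator.
beta = stock_ret.cov(market) / market.var()

# Daily volatility is the standard deviation of the stock's own returns.
daily_vol = stock_ret.std()

print(f"beta: {beta:.2f}, daily volatility: {daily_vol:.4f}")
```

The estimated beta should land close to the 1.5 used to build the series, which is a quick sanity check on the formula.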

#### 2.2 Fundamental Factors

- **Revenue Growth**: Companies showing consistent top-line growth.  
- **Earnings Per Share (EPS) Growth**: A trend of positive earnings growth.  
- **Return on Equity (ROE)** or **Return on Assets (ROA)**: Indicates how efficiently a company uses shareholder funds or assets.  
- **Debt-to-Equity Ratio**: A measure of financial leverage and risk.  
- **Dividend Yield and Payout Ratio**: For income-focused strategies.

#### 2.3 Qualitative Factors

- **Industry Leadership**: Market leaders or innovators in a sector.  
- **Management Quality**: Strong track record, good corporate governance.  
- **Brand / Competitive Moat**: Companies with strong, defensible competitive advantages.  

<br>

### 3. Use Tools and Platforms for Screening

Most brokerage platforms, financial websites, or dedicated stock-screening tools offer:

- **Predefined Screeners**: Filter stocks based on popular criteria (e.g., high dividend yield, undervalued tech stocks).  
- **Custom Screening**: Input your own metrics to build a more personalized universe.

Examples of such platforms include Yahoo Finance, Finviz, TradingView, or Bloomberg (institutional).

<br>

### 4. Narrow Down or Expand as Needed

- **Initial Screen**: Start with broad criteria (e.g., market cap > \$1 billion, volume > 500k shares/day).  
- **Refine**: Add more filters (e.g., ROE > 10%, forward P/E < 20) to limit the list to a manageable size.  
- **Validate**: Check the resulting list to ensure no unexpected exclusions (e.g., you might accidentally filter out a promising growth stock).  
- **Periodic Review**: Market conditions change, so re-screen or adjust criteria periodically.

<br>

### 5. Evaluate and Finalize

#### 5.1 Check for Liquidity & Tradability

Make sure the stocks in your universe can be bought or sold easily:
- **Average Daily Volume**  
- **Bid-Ask Spread**  
- **Volatility**

#### 5.2 Confirm Fundamentals & Catalysts

For each shortlisted company, look at:
- **Recent Earnings Reports & Guidance**  
- **Upcoming Catalysts** (product launches, regulatory approvals, etc.)  
- **Industry Trends** 

#### 5.3 Create Tiers or Watchlists

You might categorize your final universe into:
- **Tier 1**: High conviction/priority stocks for deeper research.  
- **Tier 2**: Secondary watchlist for potential opportunities.  
- **Tier 3**: Stocks on the radar but needing further development or better market conditions.

<br>

### 6. Continual Maintenance and Updates

- **Quarterly Reviews**: Re-check earnings releases and any fundamental or price changes.  
- **Add/Remove Stocks**: As companies evolve, new opportunities emerge, or existing ones no longer meet your criteria.  
- **Respond to Market Changes**: Adjust filters if there’s a shift in market conditions or your investment objectives.

<br>

### Example: Simple Step-by-Step

1. **Objective**: Focus on U.S.-based dividend-paying large-cap stocks.  
2. **Initial Filters**: 
   - Market cap > \$10 billion  
   - Dividend yield > 2%  
   - Average daily trading volume > 1 million shares  
3. **Fundamental Filters**: 
   - Return on Equity (ROE) > 10%  
   - Debt-to-Equity ratio < 1.0  
4. **Result**: A list of 40–50 stocks.  
5. **Additional Qualitative Review**: Examine industry trends and management quality.  
6. **Final Universe**: 20–30 stocks that fit well with your dividend growth objective.
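
The quantitative part of this example (steps 2 and 3) could be sketched in pandas roughly as follows. The `fundamentals` table, its column names, and the tickers are all hypothetical; in practice the figures would come from your data provider.

```python
import pandas as pd

# Hypothetical fundamentals table; tickers and values are made up for illustration.
fundamentals = pd.DataFrame(
    {
        "market_cap": [250e9, 8e9, 40e9, 15e9, 120e9],
        "dividend_yield": [0.031, 0.045, 0.012, 0.026, 0.022],
        "avg_volume": [5.2e6, 0.4e6, 3.1e6, 1.6e6, 2.4e6],
        "roe": [0.18, 0.22, 0.15, 0.07, 0.12],
        "debt_to_equity": [0.6, 0.3, 0.9, 0.8, 1.4],
    },
    index=["AAA", "BBB", "CCC", "DDD", "EEE"],
)

# Step 2: initial filters (size, yield, liquidity).
initial = fundamentals[
    (fundamentals["market_cap"] > 10e9)
    & (fundamentals["dividend_yield"] > 0.02)
    & (fundamentals["avg_volume"] > 1e6)
]

# Step 3: fundamental filters (profitability, leverage).
screened = initial[(initial["roe"] > 0.10) & (initial["debt_to_equity"] < 1.0)]

print(list(initial.index))   # ['AAA', 'DDD', 'EEE']
print(list(screened.index))  # ['AAA']
```

Applying the filters in stages, rather than all at once, makes it easy to see how many names each stage removes.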

<br>


By following a structured method—identifying objectives, setting clear criteria, screening, refining, and periodically revisiting—you create a **stock universe** that aligns with your strategy and risk profile, ensuring you focus your time and research on the most relevant opportunities.

# Creating a Stock Filter
---

In this notebook, we begin with the stocks included in the S&P 500 (Standard & Poor’s 500) as our starting universe. The S&P 500 is a free-float-adjusted, market-capitalization-weighted index of 500 of the largest companies listed on U.S. stock exchanges. The ticker list is slightly longer than 500 because some companies have more than one share class. By the end of this notebook, you will have a refined set of stocks that can serve as a potential pool for implementing your trading strategies.

In [39]:
import pandas as pd
import yfinance as yf

import warnings
warnings.filterwarnings("ignore")

## Downloading all the S&P500 tickers

It's a common trick to scrape the symbols from this <a href="https://en.wikipedia.org/wiki/List_of_S%26P_500_companies">Wikipedia page</a>, which lists the S&P 500 companies. The URL is in the code below. We'll use the pandas `read_html()` function, which returns every table on the page as a list of DataFrames; the constituents table is the first item in that list.


In [30]:
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'

symbols = pd.read_html(url)
symbols = symbols[0]
symbols.head()

| | Symbol | Security | GICS Sector | GICS Sub-Industry | Headquarters Location | Date added | CIK | Founded |
|---|---|---|---|---|---|---|---|---|
| 0 | MMM | 3M | Industrials | Industrial Conglomerates | Saint Paul, Minnesota | 1957-03-04 | 66740 | 1902 |
| 1 | AOS | A. O. Smith | Industrials | Building Products | Milwaukee, Wisconsin | 2017-07-26 | 91142 | 1916 |
| 2 | ABT | Abbott Laboratories | Health Care | Health Care Equipment | North Chicago, Illinois | 1957-03-04 | 1800 | 1888 |
| 3 | ABBV | AbbVie | Health Care | Biotechnology | North Chicago, Illinois | 2012-12-31 | 1551152 | 2013 (1888) |
| 4 | ACN | Accenture | Information Technology | IT Consulting & Other Services | Dublin, Ireland | 2011-07-06 | 1467373 | 1989 |


We'll need to slice just the 'Symbol' column and convert it to a list so that we can pass it to `yf.download()`.

In [31]:
symbol_list = symbols['Symbol'].to_list()
symbol_list[:5]

['MMM', 'AOS', 'ABT', 'ABBV', 'ACN']

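Finally, we need to tidy up the symbols a little. Wikipedia writes share classes with a dot (e.g., `BRK.B`), whereas Yahoo Finance expects a hyphen (`BRK-B`), so we swap the character using a list comprehension and the string `replace()` method.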

In [32]:
ticker_symbol = [symbol.replace(".","-") for symbol in symbol_list]

Now we can download the price data.

In [40]:
start  = '2020-02-01'
end    = '2022-03-01'

df = yf.download(ticker_symbol, start=start, end=end, auto_adjust=True)['Close']
df.head()

[*********************100%***********************]  503 of 503 completed

6 Failed downloads:
['GEV', 'SOLV', 'SW', 'VLTO', 'KVUE', 'GEHC']: YFPricesMissingError('$%ticker%: possibly delisted; no price data found  (1d 2020-02-01 -> 2022-03-01) (Yahoo error = "Data doesn\'t exist for startDate = 1580533200, endDate = 1646110800")')


| Date | A | AAPL | ABBV | ABNB | ABT | ACGL | ACN | ADBE | ADI | ADM | ... | WTW | WY | WYNN | XEL | XOM | XYL | YUM | ZBH | ZBRA | ZTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-02-03 | 79.389061 | 74.810112 | 66.682755 | NaN | 79.711349 | 42.486145 | 193.826538 | 358.0 | 100.393242 | 39.308239 | ... | 200.779266 | 23.410841 | 125.143692 | 59.626522 | 47.651356 | 78.529053 | 96.953537 | 138.668381 | 242.550003 | 130.332886 |
| 2020-02-04 | 80.713005 | 77.279869 | 68.351852 | NaN | 80.782593 | 42.666817 | 198.23848 | 366.73999 | 102.63842 | 39.666698 | ... | 204.008835 | 23.377655 | 129.153564 | 59.497746 | 47.055031 | 81.51606 | 97.226852 | 146.381012 | 247.869995 | 133.650818 |
| 2020-02-05 | 82.07563 | 77.910042 | 70.191093 | NaN | 82.000336 | 43.475082 | 197.949295 | 365.549988 | 106.489876 | 40.278713 | ... | 205.399506 | 23.991547 | 128.744797 | 59.497746 | 49.220634 | 82.552559 | 97.290627 | 147.24939 | 247.809998 | 132.612167 |
| 2020-02-06 | 81.969322 | 78.821365 | 70.636734 | NaN | 81.917923 | 43.874458 | 199.758835 | 367.459991 | 105.677597 | 40.20002 | ... | 194.983002 | 23.767559 | 130.603775 | 59.497746 | 48.553692 | 78.321762 | 94.520813 | 148.47258 | 252.089996 | 133.650818 |
| 2020-02-07 | 80.374771 | 77.749969 | 74.777054 | NaN | 80.452988 | 43.883968 | 197.352325 | 366.089996 | 102.465004 | 39.963963 | ... | 196.243073 | 23.709492 | 123.518318 | 58.991207 | 48.231995 | 79.339409 | 92.971886 | 146.026169 | 247.259995 | 133.583496 |
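
The six tickers reported as failed have no prices in this date range, and judging by the 503 rows counted later in the notebook they appear to stay in the table as empty columns. If you prefer to remove them up front, something like the sketch below could work. It uses a small made-up `prices` table rather than the real download; `NEWCO` and `LATE` are hypothetical tickers.

```python
import numpy as np
import pandas as pd

# Small stand-in for the price table.
prices = pd.DataFrame(
    {
        "AAA": [10.0, 10.5, 10.2],
        "LATE": [np.nan, 20.0, 21.0],       # listed partway through the window
        "NEWCO": [np.nan, np.nan, np.nan],  # download failed: no prices at all
    },
    index=pd.date_range("2020-02-03", periods=3, freq="B"),
)

# Columns that are entirely NaN carry no information.
empty = prices.columns[prices.isna().all()].tolist()

# Drop only the all-NaN columns; partially filled ones are kept.
cleaned = prices.dropna(axis=1, how="all")

print(empty)                     # ['NEWCO']
print(cleaned.columns.tolist())  # ['AAA', 'LATE']
```

Using `how="all"` matters here: a stock such as ABNB, which only started trading partway through the window, keeps its column.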


I've chosen to download the volume data to a separate DataFrame just to make life a little easier.

In [46]:
start  = '2020-02-01'
end    = '2022-03-01'

vol = yf.download(ticker_symbol, start=start, end=end, auto_adjust=True)['Volume']
vol.head()

[*********************100%***********************]  503 of 503 completed

6 Failed downloads:
['GEV', 'SOLV', 'SW', 'VLTO', 'KVUE', 'GEHC']: YFPricesMissingError('$%ticker%: possibly delisted; no price data found  (1d 2020-02-01 -> 2022-03-01) (Yahoo error = "Data doesn\'t exist for startDate = 1580533200, endDate = 1646110800")')


| Date | A | AAPL | ABBV | ABNB | ABT | ACGL | ACN | ADBE | ADI | ADM | ... | WTW | WY | WYNN | XEL | XOM | XYL | YUM | ZBH | ZBRA | ZTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-02-03 | 1919800 | 173788400 | 10143700 | NaN | 4675900 | 2053600 | 2345700 | 2693800 | 2134800 | 3939400 | ... | 560800 | 5626500 | 2503500 | 3880000 | 27389300 | 1259300 | 1671000 | 1962768 | 262400 | 2377000 |
| 2020-02-04 | 1676000 | 136616400 | 8251500 | NaN | 3904600 | 1351600 | 2242800 | 3053000 | 2249800 | 2567400 | ... | 595900 | 7236300 | 3969200 | 3705700 | 31922100 | 1380900 | 1595500 | 3160349 | 251100 | 1839800 |
| 2020-02-05 | 2345100 | 118826800 | 10020600 | NaN | 3565700 | 2279300 | 2102500 | 2463400 | 2192400 | 3038600 | ... | 775000 | 4503400 | 2170600 | 3021000 | 32099200 | 1803700 | 1814400 | 1405023 | 244200 | 1507600 |
| 2020-02-06 | 835500 | 105425600 | 7769700 | NaN | 4270700 | 1309900 | 1611500 | 2224500 | 1398100 | 2387300 | ... | 1667900 | 2438000 | 2544300 | 4507500 | 16055400 | 5052400 | 3271600 | 1221271 | 407500 | 2206800 |
| 2020-02-07 | 1447600 | 117684000 | 19090300 | NaN | 4472300 | 782900 | 1447700 | 2092700 | 1663800 | 2289400 | ... | 808300 | 4101400 | 5616000 | 2316900 | 15112500 | 2496800 | 2396800 | 915979 | 385700 | 1780800 |
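
Both cells above request the same data and keep a single field each. Since a multi-ticker download indexes its columns by field and then ticker (which is what makes `['Close']` and `['Volume']` work), one download could in principle serve both tables. The sketch below shows the slicing on a small synthetic frame built to mimic that layout; the tickers and numbers are made up, and no network call is made.

```python
import pandas as pd

# Stand-in for the frame returned by one multi-ticker download:
# columns form a two-level index of (field, ticker).
dates = pd.date_range("2020-02-03", periods=3, freq="B")
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AAA", "BBB"]])
data = pd.DataFrame(
    [
        [10.0, 50.0, 1000, 200],
        [10.5, 49.0, 1200, 180],
        [10.2, 51.0, 900, 220],
    ],
    index=dates,
    columns=columns,
)

# Selecting on the first column level yields one wide table per field.
close = data["Close"]
volume = data["Volume"]

print(close.columns.tolist(), volume.shape)
```

With the real data this would mean storing the result of a single `yf.download(...)` call and slicing it twice, which halves the download time.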


## Calculating the volatility of the stock


Below is a straightforward approach to calculating a stock’s volatility using daily returns. Typically, volatility refers to the *annualized standard deviation* of returns. Follow these steps:



#### 1. Gather Historical Price Data

You’ll need a time series of prices for the stock (usually the daily closing price). We've already done this step above.

<br>

#### 2. Calculate Daily Returns

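You can use either **simple returns** (percentage change) or **log returns**. Below is an example using simple returns. Note that the snippets in these steps assume `df` holds a single stock with a `Close` column, unlike the wide table of closing prices built earlier in this notebook.

```python
df['Returns'] = df['Close'].pct_change()
```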


> **Note**: For log returns, you would do:
>
> ```python
> import numpy as np
> df['Log_Returns'] = np.log(df['Close'] / df['Close'].shift(1))
> ```

<br>
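
To make the difference between the two return types concrete, here is a small sketch on a made-up price series (the `close` values are hypothetical). Log returns add up across days, while simple returns compound.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices for a single stock.
close = pd.Series([100.0, 102.0, 99.0, 101.0, 105.0])

simple_returns = close.pct_change().dropna()
log_returns = np.log(close / close.shift(1)).dropna()

# Log returns are additive: their sum equals the log of the total price ratio.
total_log_return = log_returns.sum()

# Simple returns compound: the product of (1 + r) gives the total price ratio.
total_simple_return = (1 + simple_returns).prod() - 1

print(f"total simple: {total_simple_return:.4f}, total log: {total_log_return:.4f}")
```

For small daily moves the two are nearly identical, so the choice rarely changes a volatility estimate by much.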

#### 3. Compute the Standard Deviation of Returns

The standard deviation of daily returns is one measure of *daily volatility*.

```python
daily_volatility = df['Returns'].std()
```
<br>

#### 4. Annualize the Volatility

To convert *daily* volatility to *annualized* volatility, multiply by the square root of the number of trading days in a year (commonly taken as 252 in U.S. markets):

```python
import numpy as np
annualized_volatility = daily_volatility * np.sqrt(252)
```
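
Steps 2 to 4 also work column by column on a wide table with one column per ticker. The sketch below runs them on synthetic prices; the tickers `CALM` and `WILD` and their return parameters are made up for illustration.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # common convention for U.S. markets

# Synthetic wide price table (one column per ticker), built from random daily returns.
rng = np.random.default_rng(42)
dates = pd.date_range("2021-01-04", periods=TRADING_DAYS, freq="B")
daily = pd.DataFrame(
    {
        "CALM": rng.normal(0.0, 0.01, TRADING_DAYS),  # roughly 1% daily moves
        "WILD": rng.normal(0.0, 0.03, TRADING_DAYS),  # roughly 3% daily moves
    },
    index=dates,
)
prices = 100 * (1 + daily).cumprod()

# Steps 2-4: returns, daily standard deviation, then scale by sqrt(252).
returns = prices.pct_change().dropna()
daily_volatility = returns.std()
annualized_volatility = daily_volatility * np.sqrt(TRADING_DAYS)

print(annualized_volatility.round(3))
```

Because `std()` works per column, the result is a Series indexed by ticker, which is convenient for filtering later.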

<br>

#### 5. Interpret the Result

- **Daily Volatility**: Gives an idea of how much the stock’s return can move on a typical trading day.  
- **Annualized Volatility**: Commonly used to compare the risk profiles of different stocks or assets on an annual basis.

<br>

**In summary**, stock volatility is most often expressed in annualized terms by taking the standard deviation of daily returns and scaling it by $\sqrt{252}$. This metric helps investors understand the *typical range* of day-to-day price fluctuations and compare the *riskiness* of different stocks or portfolios.

In the example below, we compute the *daily* volatility of every stock (it is not annualized) and multiply it by 100 to express it as a percentage.

In [43]:
stock = pd.DataFrame() 
stock['volatility'] = df.pct_change().std() * 100 
stock.head()

| Ticker | volatility |
|---|---|
| A | 1.908864 |
| AAPL | 2.351665 |
| ABBV | 1.768698 |
| ABNB | 3.495365 |
| ABT | 1.992053 |


Calculating each stock's average closing price over the period will allow us to eliminate any potential penny stocks we don't wish to trade.

In [47]:
stock['price'] = df.mean() 
stock.head()

| Ticker | volatility | price |
|---|---|---|
| A | 1.908864 | 118.286651 |
| AAPL | 2.351665 | 121.051284 |
| ABBV | 1.768698 | 90.092649 |
| ABNB | 3.495365 | 165.745588 |
| ABT | 1.992053 | 103.259156 |


## Calculating the average dollar volume traded

To calculate the **average dollar volume traded** for a stock, you need to multiply the daily trading volume by the stock’s price each day—often the closing or adjusted closing price—then compute the average over a specified period. 

**Steps to Calculate Average Dollar Volume**

1. **Obtain Historical Data**  
   - You need daily data that includes at least:
     - **Volume** (shares traded each day)
     - **Price** (e.g., closing price or adjusted closing price)
       

2. **Calculate Daily Dollar Volume**  
   - For each trading day, compute:
     $$
     \text{Dollar Volume} = \text{Volume} \times \text{Price}
     $$


3. **Compute the Average**  
   - Take the mean of the daily dollar volume over the period you’re interested in (e.g., the last 30 trading days).



**Why It Matters**

- **Liquidity Indicator**: Higher dollar volume generally implies better liquidity, meaning it’s easier to enter or exit positions without significantly moving the price.  
- **Stock Selection**: Traders often filter out stocks with low average dollar volume to avoid liquidity constraints.  
- **Volatility & Slippage**: Illiquid stocks can have wider bid-ask spreads, leading to greater trading costs.

By calculating the average dollar volume, you get a sense of **how actively a stock is traded** in dollar terms, helping you assess its **suitability** for your trading or investment strategy.
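
As a rough illustration of the three steps, the sketch below computes the average dollar volume both over a whole sample and over the last 30 trading days only. The `close` and `volume` tables are synthetic, and the tickers are hypothetical.

```python
import pandas as pd

# Synthetic close and volume tables with matching dates and tickers.
dates = pd.date_range("2021-01-04", periods=60, freq="B")
close = pd.DataFrame({"AAA": 50.0, "BBB": 5.0}, index=dates)
volume = pd.DataFrame(
    {
        "AAA": [1_000_000] * 30 + [2_000_000] * 30,  # activity doubles in the second half
        "BBB": [400_000] * 60,
    },
    index=dates,
)

# Daily dollar volume: element-wise product, aligned on date and ticker.
dollar_volume = close * volume

# Average over the whole period, and over the most recent 30 trading days only.
avg_full = dollar_volume.mean()
avg_last_30 = dollar_volume.tail(30).mean()

print(avg_full)
print(avg_last_30)
```

A shorter window reacts faster to changes in trading activity, while the full-period mean is steadier; which one suits a filter depends on how often you plan to rebuild the universe.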

In [49]:
stock['av_$_vol_traded'] = (df * vol).mean()
stock.head() 

| Ticker | volatility | price | av_$_vol_traded |
|---|---|---|---|
| A | 1.908864 | 118.286651 | 202965100.0 |
| AAPL | 2.351665 | 121.051284 | 13660500000.0 |
| ABBV | 1.768698 | 90.092649 | 710657100.0 |
| ABNB | 3.495365 | 165.745588 | 1133421000.0 |
| ABT | 1.992053 | 103.259156 | 611530700.0 |


In [50]:
len(stock)

503

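As we can see, the table currently has 503 rows. That count still includes the six tickers whose downloads failed; their values are NaN, so they fail every comparison below and drop out automatically. Let's apply some filters so that we keep only the stocks that meet our criteria.

The threshold values are arbitrary and can be changed to whatever suits your strategy.

First, let's eliminate the cheaper stocks, keeping only those with an average price above \$50.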

In [None]:
cond_1 = stock['price'] > 50

Now let's eliminate the more volatile stocks, keeping those whose daily volatility is below 4%.

In [None]:
cond_2 = stock['volatility'] < 4

...and finally, let's select stocks whose average dollar volume traded is more than \$2 million.

In [61]:
cond_3 = stock['av_$_vol_traded'] > 2 * 10**6  

Apply the filters to the DataFrame.

In [62]:
filtered_stocks = stock.loc[cond_1 & cond_2 & cond_3]
filtered_stocks.head()

| Ticker | volatility | price | av_$_vol_traded |
|---|---|---|---|
| A | 1.908864 | 118.286651 | 202965100.0 |
| AAPL | 2.351665 | 121.051284 | 13660500000.0 |
| ABBV | 1.768698 | 90.092649 | 710657100.0 |
| ABNB | 3.495365 | 165.745588 | 1133421000.0 |
| ABT | 1.992053 | 103.259156 | 611530700.0 |


In [63]:
len(filtered_stocks) 

346

This has reduced our universe down to 346 stocks that satisfy our criteria. 
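
Since the thresholds are meant to be adjusted, it can help to wrap the three conditions in a function that also reports how many stocks pass each one. The sketch below is one possible shape for that; `filter_universe` is a hypothetical helper, and the `stats` rows are made up, laid out like the `stock` table above.

```python
import numpy as np
import pandas as pd

def filter_universe(stats, min_price=50, max_volatility=4, min_dollar_volume=2_000_000):
    """Return the rows passing all three thresholds, plus per-condition pass counts."""
    conditions = pd.DataFrame(
        {
            "price": stats["price"] > min_price,
            "volatility": stats["volatility"] < max_volatility,
            "dollar_volume": stats["av_$_vol_traded"] > min_dollar_volume,
        }
    )
    # A row survives only if every condition holds; NaN comparisons evaluate to False.
    return stats[conditions.all(axis=1)], conditions.sum()

# Made-up summary rows in the same layout as the `stock` table.
stats = pd.DataFrame(
    {
        "volatility": [1.9, 4.6, 2.1, np.nan],
        "price": [118.3, 75.0, 12.4, np.nan],
        "av_$_vol_traded": [2.0e8, 9.0e8, 5.0e7, np.nan],
    },
    index=["AAA", "BBB", "CCC", "FAILED"],
)

selected, passed = filter_universe(stats)
print(selected.index.tolist())  # ['AAA']
print(passed.to_dict())         # {'price': 2, 'volatility': 2, 'dollar_volume': 3}
```

The per-condition counts show which threshold is doing most of the work, which is useful when a filter turns out to be stricter than intended.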