In [1]:
%pip install yfinance --upgrade --no-cache-dir


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.3.1[0m[39;49m -> [0m[32;49m24.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [2]:
""" 

Dispersion in Financial Data Analysis: Understanding Risk and Returns

Dispersion measures how spread out a dataset is. In finance, it's crucial for
understanding and quantifying risk, helping to gauge return volatility.

Interpreting Dispersion in Financial Returns:
- Tight Dispersion: Returns clustered around the mean suggest lower volatility and risk.
- Wide Dispersion: Spread-out returns indicate higher volatility and higher risk,
  but also the potential for higher returns.

Limitations and Considerations:
1. Sample vs. Population: Sample data may not perfectly represent the entire population.
2. Time Series Nature: Mean and variance can change over time in financial data.
3. Outliers and Fat Tails: Extreme events occur more often than a normal
   distribution predicts, so standard measures can understate tail risk.
   
  """


In [3]:
import yfinance as yf
import numpy as np

In [4]:
df = yf.download("AAPL", start="2024-01-01", end="2024-08-01", interval="1d")

[*********************100%%**********************]  1 of 1 completed


In [5]:
df.head()

Date,Open,High,Low,Close,Adj Close,Volume
2024-01-02,187.149994,188.440002,183.889999,185.639999,184.938217,82488700
2024-01-03,184.220001,185.880005,183.429993,184.25,183.553482,58414500
2024-01-04,182.149994,183.089996,180.880005,181.910004,181.222321,71983600
2024-01-05,181.990005,182.759995,180.169998,181.179993,180.495087,62303300
2024-01-08,182.089996,185.600006,181.5,185.559998,184.858521,59144500


#### Mean Absolute Deviation (MAD): the average distance between each data point and the mean


In [6]:
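X = df['Adj Close']

# Arithmetic mean of the adjusted closing prices
mean = np.mean(X)

# Range (max minus min); named price_range to avoid shadowing the built-in range()
price_range = np.ptp(X)

# MAD: mean of the absolute deviations from the mean
distance_from_mean = X - mean
mad = np.mean(np.abs(distance_from_mean))

print("Mean: ", mean)
print("Range: ", price_range)
print("MAD: ", mad)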


Mean:  189.87048141270466
Range:  69.9625244140625
MAD:  14.959539549819143
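
The introduction frames dispersion in terms of returns, whereas the cell above measures price levels. As a minimal sketch, assuming simple (not log) returns and a made-up price series, the same MAD calculation on returns might look like this. The names `simple_returns`, `mean_absolute_deviation`, and `toy_prices` are illustrative and not part of the notebook.

```python
import numpy as np

def simple_returns(prices):
    """Period-over-period simple returns: p[t] / p[t-1] - 1."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(prices) / prices[:-1]

def mean_absolute_deviation(values):
    """Average absolute distance of each observation from the mean."""
    values = np.asarray(values, dtype=float)
    return np.mean(np.abs(values - values.mean()))

# Hypothetical prices, chosen only to make the arithmetic easy to follow.
toy_prices = [100.0, 110.0, 99.0, 108.9]

toy_returns = simple_returns(toy_prices)  # roughly +10%, -10%, +10%
toy_mad = mean_absolute_deviation(toy_returns)

print("Returns:", toy_returns)
print("MAD of returns:", toy_mad)
```

Applied to a one-dimensional price series such as the notebook's `X`, `simple_returns` should give a daily return series that can be fed into the later dispersion measures in place of prices.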


### Variance and Standard Deviation

Variance (σ²): the average of the squared deviations from the mean
Standard deviation (σ): the square root of the variance

In [7]:
print("Variance: ", np.var(X))
print("Standard Deviation: ", np.std(X))

Variance:  352.3251098914602
Standard Deviation:  18.77032524735414
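
`np.var` and `np.std` default to `ddof=0`, the population formulas, while pandas' `Series.var` and `Series.std` default to `ddof=1`, the sample formulas. A small sketch on made-up numbers (not the AAPL data) shows how much the choice can matter for a short series:

```python
import numpy as np

# Made-up observations with mean 5 and squared deviations summing to 32.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

population_var = np.var(data)        # ddof=0: divide by n
sample_var = np.var(data, ddof=1)    # ddof=1: divide by n - 1
population_std = np.std(data)        # square root of the population variance

print("Population variance:", population_var)
print("Sample variance:    ", sample_var)
print("Population std:     ", population_std)
```

With a few hundred daily observations the two estimates are close, but the gap widens as the sample shrinks.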


### Chebyshev's inequality

Chebyshev's inequality bounds the probability that a random variable deviates from its mean by more than a given number of standard deviations.

The proportion of samples within k standard deviations of the mean is at least 1 - 1/k² for all k > 1.

In [8]:
k = 1.25
distance_from_mean = k * np.std(X)

# Keep the observations that lie within k standard deviations of the mean
within_k = [x for x in X if abs(x - mean) <= distance_from_mean]

print('Observations within', k, 'stds of mean:', len(within_k))

print('Confirmation: ', len(within_k) / len(X), '>', 1 - 1 / k**2)

Observations within 1.25 stds of mean: 118
Confirmation:  0.8082191780821918 > 0.36
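
Chebyshev's bound holds for any distribution, so it can be checked for several values of k at once. The sketch below uses synthetic heavy-tailed data drawn from a Student's t distribution rather than the AAPL prices; the seed, the sample size, and the `chebyshev_table` helper are assumptions made for illustration.

```python
import numpy as np

def chebyshev_table(values, ks):
    """For each k, return (k, observed share within k stds, Chebyshev lower bound)."""
    values = np.asarray(values, dtype=float)
    mu = values.mean()
    sigma = values.std()  # population standard deviation (ddof=0)
    rows = []
    for k in ks:
        observed = np.mean(np.abs(values - mu) <= k * sigma)
        bound = 1 - 1 / k**2
        rows.append((k, observed, bound))
    return rows

# Synthetic fat-tailed sample; the seed only makes the run repeatable.
rng = np.random.default_rng(0)
synthetic = rng.standard_t(df=3, size=1000)

rows = chebyshev_table(synthetic, ks=[1.25, 1.5, 2.0, 3.0])
for k, observed, bound in rows:
    print(f"k={k}: observed {observed:.3f} >= bound {bound:.3f}")
```

The observed share typically sits well above the bound, which is expected: Chebyshev's inequality is a worst-case guarantee, not an estimate.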


#### Semivariance and Semideviation
Semivariance is a measure of downside risk in an investment.

Semivariance and semideviation count only the deviations at or below a reference point (here, the mean), unlike variance and standard deviation, which treat upside and downside volatility equally.

In [9]:
# Observations at or below the mean
lows = np.array([x for x in X if x <= mean])

# Average squared deviation of those observations from the mean
semi_variance = np.sum((lows - mean) ** 2) / len(lows)

semi_deviation = np.sqrt(semi_variance)

print('Semi Variance: ', semi_variance)
print('Semi Deviation: ', semi_deviation)



Semi Variance:  199.2346074894633
Semi Deviation:  14.115048972265852


Target semivariance: the average of the squared deviations of the observations that fall at or below a target value

In [10]:
target = 200

# Observations at or below the target price
lows = np.array([x for x in X if x <= target])

# Average squared shortfall of those observations relative to the target
semi_variance = np.sum((lows - target) ** 2) / len(lows)

semi_deviation = np.sqrt(semi_variance)

print('Semi Variance: ', semi_variance)
print('Semi Deviation: ', semi_deviation)

Semi Variance:  458.9942243639715
Semi Deviation:  21.424150493402802
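
The two semivariance cells above differ only in the reference point, so they can be folded into one helper. This is a sketch rather than a library function: `downside_semivariance` and its `target` parameter are hypothetical names, and it follows the cells above in dividing by the number of observations at or below the reference point (other conventions for the denominator exist).

```python
import numpy as np

def downside_semivariance(values, target=None):
    """Average squared shortfall of observations at or below `target`.

    If `target` is None, the mean of `values` is used as the reference point.
    """
    values = np.asarray(values, dtype=float)
    reference = values.mean() if target is None else target
    lows = values[values <= reference]
    if lows.size == 0:
        return 0.0  # nothing falls at or below the reference point
    return np.sum((lows - reference) ** 2) / lows.size

# Made-up observations for illustration.
toy = [1.0, 2.0, 3.0, 4.0, 10.0]

semivar_mean = downside_semivariance(toy)                # reference = mean = 4.0
semivar_target = downside_semivariance(toy, target=3.0)  # reference = 3.0

print("Semivariance below mean:", semivar_mean)
print("Semideviation below mean:", np.sqrt(semivar_mean))
print("Semivariance below target 3.0:", semivar_target)
```

Passing a return series and `target=0.0` would measure the dispersion of losses only, which is closer to the "target return" reading of downside risk.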
