# Financial Analysis in Python

This project analyzes the time series described by the closing price of the ETF IVV. Here I examine its behavior and discuss the viability of a SARIMAX model for predicting values in the immediate future.

### ETF (exchange-traded fund)

An exchange-traded fund is a type of pooled investment security (a fund that gathers capital from investors to collectively purchase a diversified portfolio of assets) that holds multiple underlying assets rather than only one.

Unlike mutual fund shares, which are priced once a day after the market closes, ETF shares trade on an exchange and their prices are determined throughout the day.

ETFs typically offer low expense ratios and incur fewer brokerage commissions than buying the underlying stocks individually.

### Importing necessary libraries

Data will be downloaded from Yahoo Finance. We will request prices adjusted for splits and dividends.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import yfinance as yf

### Downloading historical data

#### Parameters

We choose three ETFs to study: IVV, IEF, and GLD. Extracting daily data over a span of roughly 20 years, from 2005-01-01 to 2025-10-01, gives us plenty of observations.

In [2]:
tickers = ["IVV", "IEF", "GLD"]
start = "2005-01-01"
end = "2025-10-01"   
interval = "1d"      # daily data 
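As a rough check on how many observations this window can yield, the sketch below counts the weekdays between the two dates with pandas. It is only an upper bound, since exchange holidays are not removed. Also, as far as we know, yfinance treats the `end` date as exclusive, so the last bar may fall on the previous trading day. The variable name `weekdays` is ours and is not used elsewhere in the notebook.

```python
import pandas as pd

start = "2005-01-01"
end = "2025-10-01"

# Weekdays (Mon-Fri) in the window. Exchange holidays are NOT removed,
# so this slightly overestimates the number of trading days.
weekdays = pd.bdate_range(start=start, end=end)

print(f"First weekday in the window: {weekdays[0].date()}")
print(f"Approximate upper bound on daily observations: {len(weekdays)}")
```

The actual number of downloaded rows should be somewhat smaller than this count.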

### Description of these ETFs

#### IVV (iShares Core S&P 500 ETF)

It tracks the S&P 500 Index, composed of 500 of the largest publicly traded companies in the United States.

The primary goal is to replicate the performance of the US large-cap stock market.

It is characterized by high volatility and high growth potential, with stocks having historically provided the highest returns.

#### IEF (iShares 7-10 Year Treasury Bond ETF)

It tracks an index composed of US Treasury bonds with remaining maturities between 7 and 10 years. These are debt obligations issued by the US government.

The goal is to provide a steady stream of interest payments with lower volatility than stocks.

Its key characteristics are lower growth potential, with returns coming primarily from interest payments, and lower risk and volatility.

In times of stock market stress, investors often flock to US Treasuries, which can cause IEF to rise when IVV is falling.

#### GLD (SPDR Gold Shares)

It tracks the price of gold bullion. Each share of GLD represents a fractional, undivided interest in the trust, which holds physical gold bars in a secure vault.

Gold produces no income (like dividends or interest). Its return comes purely from price changes. It is seen as a store of value when a currency's purchasing power declines.

Its price often moves independently of stocks and bonds. It can perform well during periods of geopolitical uncertainty, market stress, or loss of confidence in the financial system.

### Downloading data and adjustments

As a first step, we download the price data, requesting adjusted prices. This avoids ignoring dividend income, which would give a misleading picture of performance. Let's also look at the structure of the data.
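To see why the adjustment matters, here is a minimal sketch with made-up numbers (the prices, the dividend, and the dates are hypothetical). It applies one common back-adjustment convention, scaling every price before the ex-dividend date by `1 - dividend / previous_close`. We assume this is close in spirit to what Yahoo Finance does, but it is not guaranteed to match it exactly.

```python
import pandas as pd

# Hypothetical closing prices; a 1.00 dividend goes ex on the third day.
close = pd.Series(
    [100.0, 101.0, 100.0, 102.0],
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
)
dividend = 1.0
ex_date = pd.Timestamp("2024-01-04")

# Back-adjustment factor: scale prices before the ex-date so that the
# dividend-driven price drop no longer looks like a loss.
before_ex = close.index < ex_date
prev_close = close[before_ex].iloc[-1]
factor = 1 - dividend / prev_close

adjusted = close.copy()
adjusted.loc[before_ex] *= factor

raw_return = close.iloc[-1] / close.iloc[0] - 1
adjusted_return = adjusted.iloc[-1] / adjusted.iloc[0] - 1
print(f"raw price return: {raw_return:.4f}")
print(f"adjusted return:  {adjusted_return:.4f}")
```

The raw series understates performance because it ignores the dividend. The download cell requests this kind of adjustment through `auto_adjust=True`.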

In [3]:
# Download data (auto_adjust=True adjusts the OHLC [Open | High | Low | Close] prices for splits and dividends)
data = yf.download(tickers, start=start, end=end, interval=interval, group_by='ticker', auto_adjust=True, progress=False)
data.to_csv('../data/sample_data.csv', index=True)
data.head()

| Date | GLD Open | GLD High | GLD Low | GLD Close | GLD Volume | IEF Open | IEF High | IEF Low | IEF Close | IEF Volume | IVV Open | IVV High | IVV Low | IVV Close | IVV Volume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2005-01-03 | 42.98 | 43.169998 | 42.740002 | 43.02 | 4750400 | 48.752567 | 48.890354 | 48.7009 | 48.86739 | 323400 | 82.778656 | 82.914696 | 81.642746 | 81.935226 | 578300 |
| 2005-01-04 | 42.799999 | 42.91 | 42.459999 | 42.740002 | 3456800 | 48.867383 | 48.867383 | 48.557369 | 48.563107 | 1181000 | 82.078058 | 82.078058 | 80.642865 | 80.948952 | 845400 |
| 2005-01-05 | 42.75 | 42.880001 | 42.599998 | 42.669998 | 2033600 | 48.614775 | 48.723856 | 48.574589 | 48.643482 | 369000 | 80.942152 | 81.200621 | 80.459221 | 80.459221 | 618400 |
| 2005-01-06 | 42.48 | 42.560001 | 42.07 | 42.150002 | 2556400 | 48.591819 | 48.741086 | 48.591819 | 48.689415 | 389100 | 80.64967 | 81.119001 | 80.568046 | 80.785706 | 518500 |
| 2005-01-07 | 42.09 | 42.389999 | 41.700001 | 41.84 | 4492700 | 48.764054 | 48.798498 | 48.626266 | 48.649231 | 182400 | 80.996614 | 81.173459 | 80.486474 | 80.636116 | 583900 |


This is a DataFrame with multi-level columns: each ticker has its own group of columns.

These columns are Open, High, Low, Close and Volume. The first four give the price at the open, at its highest point, at its lowest point, and at the close of each trading day. Volume is the number of shares that changed hands during that day.
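Selecting from multi-level columns can be sketched as follows. Rather than downloading anything, the example builds a small stand-in frame (named `sample` here, our own name) from a few of the values shown above, with the same `Ticker` and `Price` column levels as in the output.

```python
import pandas as pd

# Small stand-in for the downloaded frame: two tickers, two fields each.
columns = pd.MultiIndex.from_product(
    [["GLD", "IVV"], ["Close", "Volume"]], names=["Ticker", "Price"]
)
sample = pd.DataFrame(
    [
        [43.02, 4750400, 81.935226, 578300],
        [42.740002, 3456800, 80.948952, 845400],
    ],
    index=pd.to_datetime(["2005-01-03", "2005-01-04"]),
    columns=columns,
)
sample.index.name = "Date"

# All columns for a single ticker (the outer level is dropped).
ivv = sample["IVV"]

# One field of one ticker, as a Series.
ivv_close = sample[("IVV", "Close")]

# The same field for every ticker: a cross-section on the inner level.
closes = sample.xs("Close", axis=1, level="Price")
print(closes)
```

The same selections should work on `data`, for example `data[("IVV", "Close")]` for the series this project focuses on.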