# Task 1: Preprocess and Explore the Data
This notebook preprocesses and explores historical financial data for TSLA, BND, and SPY, downloaded from Yahoo Finance with the `yfinance` library. We'll:
- Load the data using `DataLoader`.
- Clean and preprocess it with `DataPreprocessor`.
- Perform exploratory data analysis (EDA) with `EDA`.

In [1]:
import os
import sys

import pandas as pd

# Add the project root to the import path so the `src` package can be found
sys.path.append(os.path.abspath('..'))

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
from src.data_loader import DataLoader
from src.data_preprocessor import DataPreprocessor
#from src.eda import EDA

In [4]:
tickers = ['TSLA', 'BND', 'SPY']

## Step 1: Load the Data
Fetch daily historical price data for TSLA, BND, and SPY from Yahoo Finance via `yfinance` (January 1, 2015, to January 31, 2025).

In [5]:
loader = DataLoader(tickers)
raw_data = loader.load_data()

YF.download() has changed argument auto_adjust default to True




In [6]:
raw_data.head()

| | Date | Ticker | Close | High | Low | Open | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 2015-01-02 | BND | 62.573143 | 62.603427 | 62.399011 | 62.406583 | 2218800 |
| 1 | 2015-01-02 | SPY | 172.59285 | 173.811083 | 171.542657 | 173.391006 | 121465900 |
| 2 | 2015-01-02 | TSLA | 14.620667 | 14.883333 | 14.217333 | 14.858 | 71466000 |
| 3 | 2015-01-05 | BND | 62.754837 | 62.777549 | 62.610989 | 62.641273 | 5820100 |
| 4 | 2015-01-05 | SPY | 169.475876 | 171.702279 | 169.165023 | 171.534251 | 169632600 |
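
`raw_data` arrives in long format, one row per Date and Ticker. For several tickers, recent `yfinance` versions return a wide frame instead, with MultiIndex columns whose levels are named `Price` and `Ticker`. The reshape inside `DataLoader` is not shown here; assuming it starts from that wide frame, one way to produce the long layout is `DataFrame.stack` on the `Ticker` level. `wide_to_long` and `PRICE_FIELDS` are hypothetical names, not the `DataLoader` internals, and the input is a small synthetic frame.

```python
import pandas as pd

# Hypothetical list of price fields; matches the columns shown in raw_data.head()
PRICE_FIELDS = ["Close", "High", "Low", "Open", "Volume"]


def wide_to_long(wide: pd.DataFrame) -> pd.DataFrame:
    """Reshape (Price, Ticker) MultiIndex columns into one row per Date and Ticker.

    Hypothetical helper for illustration; not the DataLoader implementation.
    """
    # Move the Ticker column level into the row index, then flatten the index
    long = wide.stack(level="Ticker").reset_index()
    long.columns.name = None  # drop the leftover "Price" axis label
    ordered = long[["Date", "Ticker", *PRICE_FIELDS]]
    return ordered.sort_values(["Date", "Ticker"]).reset_index(drop=True)


# Synthetic two-day, two-ticker frame in the wide layout
dates = pd.Index(pd.to_datetime(["2015-01-02", "2015-01-05"]), name="Date")
columns = pd.MultiIndex.from_product(
    [PRICE_FIELDS, ["BND", "SPY"]], names=["Price", "Ticker"]
)
wide = pd.DataFrame(
    [[float(i) for i in range(10)], [float(i) for i in range(10, 20)]],
    index=dates,
    columns=columns,
)

long = wide_to_long(wide)
print(long)
```

On pandas 2.1 and 2.2 this `stack` call may emit a FutureWarning about the new `stack` implementation; the reshaped result is the same.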


## Step 2: Preprocess the Data
Clean the data, check for missing values and column data types, and save the cleaned result.

In [7]:
data = pd.read_csv('../data/financial_data.csv')

In [8]:
data.head()

| | Date | Ticker | Close | High | Low | Open | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 2015-01-02 | BND | 62.573143 | 62.603427 | 62.399011 | 62.406583 | 2218800 |
| 1 | 2015-01-02 | SPY | 172.59285 | 173.811083 | 171.542657 | 173.391006 | 121465900 |
| 2 | 2015-01-02 | TSLA | 14.620667 | 14.883333 | 14.217333 | 14.858 | 71466000 |
| 3 | 2015-01-05 | BND | 62.754837 | 62.777549 | 62.610989 | 62.641273 | 5820100 |
| 4 | 2015-01-05 | SPY | 169.475876 | 171.702279 | 169.165023 | 171.534251 | 169632600 |


In [9]:
data.isna().sum()

Date      0
Ticker    0
Close     0
High      0
Low       0
Open      0
Volume    0
dtype: int64
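
`isna().sum()` only counts empty cells. In long format, a trading day that is missing entirely for one ticker leaves no row at all, so it cannot show up as NaN. Pivoting `Close` to one column per ticker aligns every ticker on the union of dates and turns such gaps into NaN. A minimal sketch with a hypothetical `missing_days_per_ticker` helper and synthetic rows; note that `pivot` raises an error if a (Date, Ticker) pair appears more than once.

```python
import pandas as pd


def missing_days_per_ticker(df: pd.DataFrame) -> pd.Series:
    """Count dates present for some ticker but absent for this one (hypothetical helper)."""
    # One row per date seen for any ticker, one column per ticker;
    # a ticker with no row on a given date gets NaN in that cell
    close = df.pivot(index="Date", columns="Ticker", values="Close")
    return close.isna().sum()


# Synthetic long-format rows: SPY has no row at all for 2015-01-05
sample = pd.DataFrame(
    {
        "Date": ["2015-01-02", "2015-01-02", "2015-01-05"],
        "Ticker": ["BND", "SPY", "BND"],
        "Close": [62.57, 172.59, 62.75],
    }
)

print(sample.isna().sum().sum())  # 0: no empty cells
print(missing_days_per_ticker(sample))
```

Run against `data`, a nonzero count for any ticker would point to dates that ticker lacks relative to the others.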

In [10]:
data.dtypes

Date       object
Ticker     object
Close     float64
High      float64
Low       float64
Open      float64
Volume      int64
dtype: object
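
CSV files store everything as text, so `Date` comes back as strings (`object` dtype above) unless `read_csv` is told to parse it, which is what `parse_dates=['Date']` does when the cleaned file is reloaded. A minimal comparison on synthetic CSV text; the exact string dtype reported can differ between pandas versions, so the check uses `is_datetime64_any_dtype`.

```python
import io

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

csv_text = "Date,Ticker,Close\n2015-01-02,BND,62.57\n2015-01-05,BND,62.75\n"

plain = pd.read_csv(io.StringIO(csv_text))
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Without parse_dates the column holds strings; with it, real timestamps
print(is_datetime64_any_dtype(plain["Date"]))   # False
print(is_datetime64_any_dtype(parsed["Date"]))  # True

# Only the parsed column supports date arithmetic such as this span in days
span = parsed["Date"].max() - parsed["Date"].min()
print(span.days)  # 3
```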

In [11]:
preprocessor = DataPreprocessor(data)
cleaned_data = preprocessor.clean_data()
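
The notebook does not show what `DataPreprocessor.clean_data` does internally. As an illustration of the kind of steps Step 2 names, here is a hypothetical `clean_prices` that parses dates, drops duplicate (Date, Ticker) rows, and forward-fills missing prices within each ticker. Forward-filling is one common choice for price gaps and is an assumption here, not taken from the source; grouping by `Ticker` keeps one asset's price from filling another asset's gap.

```python
import pandas as pd


def clean_prices(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning pass for long-format price data (hypothetical helper)."""
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"])  # text dates -> datetime64
    # Keep one row per (Date, Ticker) and order each ticker's rows chronologically
    out = out.drop_duplicates(subset=["Date", "Ticker"]).sort_values(["Ticker", "Date"])
    # Forward-fill price gaps within each ticker so values never cross tickers
    price_cols = [c for c in ("Close", "High", "Low", "Open") if c in out.columns]
    out[price_cols] = out.groupby("Ticker")[price_cols].ffill()
    return out.reset_index(drop=True)


nan = float("nan")
# Synthetic rows: one duplicate BND row, a BND gap, and a leading SPY gap
raw = pd.DataFrame(
    {
        "Date": ["2015-01-02", "2015-01-02", "2015-01-05", "2015-01-02", "2015-01-05"],
        "Ticker": ["BND", "BND", "BND", "SPY", "SPY"],
        "Close": [62.57, 62.57, nan, nan, 169.48],
    }
)

cleaned = clean_prices(raw)
print(cleaned)
```

The leading SPY gap stays NaN because no earlier SPY price exists; an ungrouped `ffill` would have copied BND's price into it.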

In [12]:
cleaned_data.to_csv('../data/cleaned_data.csv', index=False)
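
Passing `index=False` matters for the reload that follows. By default `to_csv` also writes the DataFrame's row index as an unnamed first column, and `read_csv` brings it back as an extra `Unnamed: 0` column. A small demonstration on a synthetic frame:

```python
import io

import pandas as pd

frame = pd.DataFrame({"Ticker": ["BND", "SPY"], "Close": [62.57, 172.59]})

# Default to_csv also writes the row index as an unnamed first column,
# which read_csv then loads as an extra "Unnamed: 0" column
with_index = pd.read_csv(io.StringIO(frame.to_csv()))
# index=False writes only the data columns
without_index = pd.read_csv(io.StringIO(frame.to_csv(index=False)))

print(list(with_index.columns))     # ['Unnamed: 0', 'Ticker', 'Close']
print(list(without_index.columns))  # ['Ticker', 'Close']
```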

In [13]:
data = pd.read_csv('../data/cleaned_data.csv', parse_dates=['Date'])
data.dtypes

Date      datetime64[ns]
Ticker            object
Close            float64
High             float64
Low              float64
Open             float64
Volume             int64
dtype: object
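
With `Date` now parsed as `datetime64`, it is straightforward to confirm that each ticker covers the window stated in Step 1. A sketch, assuming a hypothetical `date_coverage` helper; the sample rows are synthetic, so calling `date_coverage(data)` is what would show the real first and last dates.

```python
import pandas as pd


def date_coverage(df: pd.DataFrame) -> pd.DataFrame:
    """First date, last date, and row count per ticker (hypothetical helper)."""
    return df.groupby("Ticker")["Date"].agg(first="min", last="max", rows="count")


# Window stated in Step 1
WINDOW_START = pd.Timestamp("2015-01-01")
WINDOW_END = pd.Timestamp("2025-01-31")

# Synthetic rows with an already-parsed Date column
sample = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2015-01-02", "2015-01-05", "2015-01-02"]),
        "Ticker": ["BND", "BND", "SPY"],
    }
)

coverage = date_coverage(sample)
# True where a ticker's first and last dates both fall inside the window
coverage["in_window"] = coverage["first"].ge(WINDOW_START) & coverage["last"].le(WINDOW_END)
print(coverage)
```

Differing `rows` counts across tickers are another quick signal of missing trading days.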