### Dataset Overview

This dataset contains historical daily stock price data for Tesla.

It includes basic market information such as the opening and closing prices, the daily high and low, and trading volume.
The data will be used to understand price behavior before modeling.



In [26]:
import pandas as pd
import numpy as np


In [27]:
# Relative path, consistent with the processed-data path used when saving
df = pd.read_csv("../data/raw/tesla_stock_raw.csv")

In [28]:
df.head()

|   | Date | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|
| 0 | 6/29/2010 | 19.0 | 25.0 | 17.540001 | 23.889999 | 18766300 | 23.889999 |
| 1 | 6/30/2010 | 25.790001 | 30.42 | 23.299999 | 23.83 | 17187100 | 23.83 |
| 2 | 7/1/2010 | 25.0 | 25.92 | 20.27 | 21.959999 | 8218800 | 21.959999 |
| 3 | 7/2/2010 | 23.0 | 23.1 | 18.709999 | 19.200001 | 5139800 | 19.200001 |
| 4 | 7/6/2010 | 20.0 | 20.0 | 15.83 | 16.110001 | 6866900 | 16.110001 |


In [24]:
df.shape

(1692, 7)

### Dataset Size

The dataset contains **1692 rows and 7 columns**, representing
1692 trading days of Tesla stock data.
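Equating rows with trading days assumes that no date appears twice. A quick check is sketched below on the first five rows from `df.head()`, used here as a stand-in for the full `df` (the name `sample` is illustrative only):

```python
import pandas as pd

# Stand-in for the full dataset: the first five rows shown by df.head()
sample = pd.DataFrame({
    "Date": ["6/29/2010", "6/30/2010", "7/1/2010", "7/2/2010", "7/6/2010"],
    "Close": [23.889999, 23.83, 21.959999, 19.200001, 16.110001],
})

n_rows = len(sample)
n_dates = sample["Date"].nunique()

# One row per trading day means every date value is unique
print(n_rows, n_dates, sample["Date"].is_unique)
```

On the real data, `n_rows` and `n_dates` should both equal 1692 if every row is a distinct trading day.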

In [25]:
df.columns

Index(['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'], dtype='str')

### Column Description

The dataset contains 7 columns related to daily stock market information:

- **Date**: Trading date
- **Open**: Opening price of the stock
- **High**: Highest price during the day
- **Low**: Lowest price during the day
- **Close**: Closing price of the stock
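- **Volume**: Number of shares traded during the day
- **Adj Close**: Closing price adjusted for corporate actions such as stock splits and dividends (in the first rows shown above it is identical to Close)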
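These column definitions imply simple constraints: on any day, Low should not exceed Open or Close, and High should not be below them. The sketch below checks these bounds, and counts how often Adj Close differs from Close, on the five rows from `df.head()`; in the notebook the same expressions would be applied to the full `df`.

```python
import pandas as pd

# Stand-in for df: the five rows shown by df.head()
sample = pd.DataFrame({
    "Open": [19.0, 25.790001, 25.0, 23.0, 20.0],
    "High": [25.0, 30.42, 25.92, 23.1, 20.0],
    "Low": [17.540001, 23.299999, 20.27, 18.709999, 15.83],
    "Close": [23.889999, 23.83, 21.959999, 19.200001, 16.110001],
    "Adj Close": [23.889999, 23.83, 21.959999, 19.200001, 16.110001],
})

# Low must be at or below the day's Open/Close, High at or above them
ohlc_ok = (
    (sample["Low"] <= sample[["Open", "Close"]].min(axis=1))
    & (sample["High"] >= sample[["Open", "Close"]].max(axis=1))
)
print("Rows violating OHLC bounds:", int((~ohlc_ok).sum()))

# Count rows where the adjusted close differs from the raw close
print("Rows where Adj Close != Close:",
      int((sample["Adj Close"] != sample["Close"]).sum()))
```

Both counts are 0 for these five rows; a non-zero count on the full data would point to rows worth inspecting.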


In [29]:
df.info()

<class 'pandas.DataFrame'>
RangeIndex: 1692 entries, 0 to 1691
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Date       1692 non-null   str    
 1   Open       1692 non-null   float64
 2   High       1692 non-null   float64
 3   Low        1692 non-null   float64
 4   Close      1692 non-null   float64
 5   Volume     1692 non-null   int64  
 6   Adj Close  1692 non-null   float64
dtypes: float64(5), int64(1), str(1)
memory usage: 92.7 KB


### Data Types and Missing Values

The dataset contains 1692 records with no missing values in any column.

- Price-related columns (Open, High, Low, Close, Adj Close) are stored as float64.
- Volume is stored as int64.
- The Date column is currently stored as strings and will be converted to datetime later.

With no missing values and suitable numeric types, the dataset needs only minimal preparation (parsing the Date column) before further analysis.
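The same conclusions can be checked in code rather than read off the `df.info()` output. A minimal sketch on a small stand-in frame, assumed to mirror the column types above (the name `sample` is illustrative):

```python
import pandas as pd

# Stand-in frame with the same kinds of columns as df
sample = pd.DataFrame({
    "Date": ["6/29/2010", "6/30/2010", "7/1/2010"],
    "Open": [19.0, 25.790001, 25.0],
    "Volume": [18766300, 17187100, 8218800],
})

# Missing values per column; all zeros means no gaps to fill
missing = sample.isna().sum()
print(missing)

# Confirm the numeric columns have the expected kinds of dtype
print(pd.api.types.is_float_dtype(sample["Open"]))
print(pd.api.types.is_integer_dtype(sample["Volume"]))
```

The exact dtype reported for the string Date column differs between pandas versions (`object` in older releases, `str` in the output above), so the sketch does not test it.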


In [30]:
# Raw dates are month/day/year strings, e.g. 6/29/2010
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

In [33]:
# Sort chronologically and rebuild a clean 0..n-1 index
df = df.sort_values("Date").reset_index(drop=True)

In [34]:
df.head()

|   | Date | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|
| 0 | 2010-06-29 | 19.0 | 25.0 | 17.540001 | 23.889999 | 18766300 | 23.889999 |
| 1 | 2010-06-30 | 25.790001 | 30.42 | 23.299999 | 23.83 | 17187100 | 23.83 |
| 2 | 2010-07-01 | 25.0 | 25.92 | 20.27 | 21.959999 | 8218800 | 21.959999 |
| 3 | 2010-07-02 | 23.0 | 23.1 | 18.709999 | 19.200001 | 5139800 | 19.200001 |
| 4 | 2010-07-06 | 20.0 | 20.0 | 15.83 | 16.110001 | 6866900 | 16.110001 |



The Date column was converted to datetime format and the data was sorted in chronological order.
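The conversion and sort can be verified on a small example. The sketch below takes the five sample dates in shuffled order (an assumption made to exercise the sort), parses them with an explicit month/day/year format, sorts, and confirms the result is chronological:

```python
import pandas as pd

# Sample dates from the raw file, deliberately out of order
sample = pd.DataFrame({
    "Date": ["7/2/2010", "6/29/2010", "7/6/2010", "6/30/2010", "7/1/2010"],
    "Close": [19.200001, 23.889999, 16.110001, 23.83, 21.959999],
})

# Parse month/day/year strings explicitly instead of relying on inference
sample["Date"] = pd.to_datetime(sample["Date"], format="%m/%d/%Y")

# Sort chronologically and rebuild a clean 0..n-1 index
sample = sample.sort_values("Date").reset_index(drop=True)

print(sample["Date"].is_monotonic_increasing)  # True
print(sample["Date"].dt.strftime("%Y-%m-%d").tolist())
```

Running `df["Date"].is_monotonic_increasing` in the notebook gives the same kind of check on the full dataset.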


In [36]:

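from pathlib import Path

processed_path = "../data/processed/tesla_stock_processed.csv"
# Create the output folder if it does not exist yet, then save without the index
Path(processed_path).parent.mkdir(parents=True, exist_ok=True)
df.to_csv(processed_path, index=False)

processed_path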


'../data/processed/tesla_stock_processed.csv'

### Saving Processed Data

After the initial inspection and date conversion, the prepared dataset was saved to a separate file.
This ensures that all later analysis works from the same prepared data, while the original raw file stays unchanged.
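CSV files do not store column types, so Date comes back as plain text when the processed file is read again unless it is parsed explicitly. A minimal sketch of the round trip, using an in-memory buffer in place of `tesla_stock_processed.csv` and a hypothetical stand-in frame named `processed`:

```python
import io

import pandas as pd

# Stand-in for the processed frame: Date already parsed to datetime
processed = pd.DataFrame({
    "Date": pd.to_datetime(["2010-06-29", "2010-06-30", "2010-07-01"]),
    "Close": [23.889999, 23.83, 21.959999],
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
processed.to_csv(buffer, index=False)

# Reading back without hints leaves Date as text
buffer.seek(0)
plain = pd.read_csv(buffer)

# parse_dates restores the datetime dtype
buffer.seek(0)
reloaded = pd.read_csv(buffer, parse_dates=["Date"])

print(pd.api.types.is_datetime64_any_dtype(plain["Date"]))     # False
print(pd.api.types.is_datetime64_any_dtype(reloaded["Date"]))  # True
```

Later notebooks that load the processed file would therefore pass `parse_dates=["Date"]` to `pd.read_csv`.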
