# Project Data Science: Bitcoin Price Prediction

## 1. Introduction
AI now has a significant impact on both daily life and the workplace. In this data science project we apply it to sequential data: we study BTCUSD price data with an LSTM deep learning neural network built with the PyTorch library.
### 1.1 Project Objective
The main objective is to predict the BTC price over the coming months using the LSTM (Long Short-Term Memory) method.
### 1.2 Dataset Description
The dataset, `btcusd_1-min_data`, was taken from Kaggle: https://www.kaggle.com/datasets/mczielinski/bitcoin-historical-data. It has seven columns: `Timestamp`, `Open`, `High`, `Low`, `Close`, `Volume`, and `datetime`. The data records the price of Bitcoin from 2012 until early 2025 at one-minute intervals.

## 2. Data Preparation


### 2.1 Importing Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('ggplot')

### 2.2 Loading the Dataset

In [6]:
df = pd.read_csv('btcusd_1-min_data.csv', parse_dates=['datetime'])

### 2.3 Initial Data Exploration

In [11]:
# show the first 5 rows
df.head()

Unnamed: 0,Timestamp,Open,High,Low,Close,Volume,datetime
0,1325412000.0,4.58,4.58,4.58,4.58,0.0,2012-01-01 10:01:00+00:00
1,1325412000.0,4.58,4.58,4.58,4.58,0.0,2012-01-01 10:02:00+00:00
2,1325412000.0,4.58,4.58,4.58,4.58,0.0,2012-01-01 10:03:00+00:00
3,1325412000.0,4.58,4.58,4.58,4.58,0.0,2012-01-01 10:04:00+00:00
4,1325412000.0,4.58,4.58,4.58,4.58,0.0,2012-01-01 10:05:00+00:00


In [None]:
# get data information
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7008205 entries, 0 to 7008204
Data columns (total 7 columns):
 #   Column     Dtype              
---  ------     -----              
 0   Timestamp  float64            
 1   Open       float64            
 2   High       float64            
 3   Low        float64            
 4   Close      float64            
 5   Volume     float64            
 6   datetime   datetime64[ns, UTC]
dtypes: datetime64[ns, UTC](1), float64(6)
memory usage: 374.3 MB


In [12]:
# summary statistics (count, mean, std, min, 25%, 50%, 75%, max) for each column
df.describe()

Unnamed: 0,Timestamp,Open,High,Low,Close,Volume
count,7008205.0,7008205.0,7008205.0,7008205.0,7008205.0,7008205.0
mean,1535659000.0,17374.03,17380.98,17366.87,17374.03,5.303707
std,121386800.0,24014.68,24022.7,24006.46,24014.66,22.52397
min,1325412000.0,3.8,3.8,3.8,3.8,0.0
25%,1430535000.0,424.2,424.35,424.03,424.2,0.0181438
50%,1535658000.0,6587.83,6590.61,6584.38,6587.8,0.4692601
75%,1640781000.0,27269.0,27276.0,27263.0,27269.0,3.035527
max,1745974000.0,109111.0,109356.0,108794.0,109036.0,5853.852


## 3. Data Cleaning

### 3.1 Handling Missing Values

In [13]:
# find missing values
df.isnull().sum()

Timestamp         0
Open              0
High              0
Low               0
Close             0
Volume            0
datetime     225925
dtype: int64

In [14]:
# drop all missing values
df.dropna(inplace=True)
# check again
df.isnull().sum()

Timestamp    0
Open         0
High         0
Low          0
Close        0
Volume       0
datetime     0
dtype: int64
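
Only the `datetime` column has missing values, while `Timestamp` is complete. As an alternative to dropping those rows, the missing entries could in principle be rebuilt from `Timestamp`. The sketch below is not what this notebook does; it assumes `Timestamp` holds Unix seconds in UTC (consistent with the `head()` output, but not stated by the source) and uses a small toy frame with illustrative values.

```python
import pandas as pd

# Toy frame standing in for the real data (illustrative values only).
toy = pd.DataFrame({
    "Timestamp": [1325412060.0, 1325412120.0, 1325412180.0],
    "Close": [4.58, 4.58, 4.58],
    "datetime": pd.to_datetime(
        ["2012-01-01 10:01:00+00:00", None, "2012-01-01 10:03:00+00:00"],
        utc=True,
    ),
})

# Assumption: Timestamp is seconds since the Unix epoch, in UTC.
rebuilt = pd.to_datetime(toy["Timestamp"], unit="s", utc=True)

# Fill only the missing entries; existing datetime values stay untouched.
toy["datetime"] = toy["datetime"].fillna(rebuilt)

print(toy["datetime"].isna().sum())
```

Whether this is preferable to `dropna` depends on why the values are missing in the original file, which should be inspected first.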

### 3.2 Removing Duplicate Values

In [19]:
# check duplicate values without Timestamp column
df.iloc[:,1:].duplicated().sum()


np.int64(0)
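
The check above only flags rows whose every column (except `Timestamp`) is identical. For a time series it can also be worth checking whether the same minute appears more than once with different prices. A minimal sketch on a toy frame (illustrative values, not taken from the dataset):

```python
import pandas as pd

# Two rows share the same minute but differ in price.
toy = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2012-01-01 10:01", "2012-01-01 10:01", "2012-01-01 10:02"],
        utc=True,
    ),
    "Close": [4.58, 4.60, 4.58],
})

# Full-row comparison finds nothing, because the Close values differ.
full_row_dupes = toy.duplicated().sum()

# Restricting the comparison to the datetime column finds the repeated minute.
same_minute = toy.duplicated(subset="datetime").sum()

print(full_row_dupes, same_minute)
```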

### 3.3 Converting to a Daily DataFrame

In [23]:
# resample the minute data to daily bars
df_daily = df.resample('D', on='datetime').agg({
    'Volume': 'sum', 
    'High': 'max', 
    'Low': 'min', 
    'Open': 'first', 
    'Close': 'last'
    }).reset_index()
df_daily.head()


Unnamed: 0,datetime,Volume,High,Low,Open,Close
0,2012-01-01 00:00:00+00:00,10.0,4.84,4.58,4.58,4.84
1,2012-01-02 00:00:00+00:00,10.1,5.0,4.84,4.84,5.0
2,2012-01-03 00:00:00+00:00,107.085281,5.32,5.0,5.0,5.29
3,2012-01-04 00:00:00+00:00,107.23326,5.57,4.93,5.29,5.57
4,2012-01-05 00:00:00+00:00,70.328742,6.46,5.57,5.57,6.42
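
To make the aggregation rules concrete, the sketch below applies the same `resample(...).agg(...)` call to a handful of synthetic minute bars (illustrative values, not real prices). Each daily bar takes the first `Open`, the last `Close`, the highest `High`, the lowest `Low`, and the summed `Volume` of that day.

```python
import pandas as pd

# Synthetic minute bars spanning two calendar days (UTC).
minute = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2012-01-01 10:01", "2012-01-01 10:02", "2012-01-01 23:59",
        "2012-01-02 00:00", "2012-01-02 00:01",
    ], utc=True),
    "Open":   [4.0, 4.2, 4.1, 4.5, 4.6],
    "High":   [4.3, 4.4, 4.2, 4.8, 4.7],
    "Low":    [3.9, 4.1, 4.0, 4.4, 4.5],
    "Close":  [4.2, 4.1, 4.1, 4.6, 4.7],
    "Volume": [1.0, 2.0, 0.5, 3.0, 1.5],
})

# Same aggregation as in the cell above, applied to the toy data.
daily = minute.resample("D", on="datetime").agg({
    "Volume": "sum",
    "High": "max",
    "Low": "min",
    "Open": "first",
    "Close": "last",
}).reset_index()

print(daily)
```

Note that `first` and `last` pick rows by position within each day, so the input is assumed to be in chronological order.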


In [None]:
# convert the datetime column to a plain date, then back to a datetime dtype
df_daily['date'] = df_daily['datetime'].dt.date
df_daily['date'] = pd.to_datetime(df_daily['date'])
# drop the datetime column
df_daily.drop(columns=['datetime'], inplace=True)
# check data types
df_daily.dtypes
# check the first 5 rows
df_daily.head()

Unnamed: 0,Volume,High,Low,Open,Close,date
0,10.0,4.84,4.58,4.58,4.84,2012-01-01
1,10.1,5.0,4.84,4.84,5.0,2012-01-02
2,107.085281,5.32,5.0,5.0,5.29,2012-01-03
3,107.23326,5.57,4.93,5.29,5.57,2012-01-04
4,70.328742,6.46,5.57,5.57,6.42,2012-01-05


In [31]:
# check min and max of each column
df_daily.min(), df_daily.max()

(Volume                    0.0
 High                     4.38
 Low                       3.8
 Open                     4.38
 Close                    4.38
 date      2012-01-01 00:00:00
 dtype: object,
 Volume          127286.486533
 High                 109030.0
 Low                  106187.0
 Open                 108314.0
 Close                106187.0
 date      2025-03-15 00:00:00
 dtype: object)

In [33]:
# check the total number of rows
df_daily.shape[0]

4823
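
As a sanity check, the row count can be compared with the number of calendar days between the first and last dates reported above. Because `resample('D')` emits a row for every day in the range, a matching count does not by itself prove that every day had data; days without any minute bars would appear as rows with missing prices. The helper `empty_days` below is a hypothetical name introduced for illustration.

```python
import pandas as pd

# Calendar days from the first to the last date, inclusive.
expected_days = len(pd.date_range("2012-01-01", "2025-03-15", freq="D"))
print(expected_days)

def empty_days(daily: pd.DataFrame) -> pd.DataFrame:
    """Return daily rows with no closing price (hypothetical helper)."""
    return daily[daily["Close"].isna()]

# Toy example: the middle day has no price data.
toy_daily = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-01", "2012-01-02", "2012-01-03"]),
    "Close": [4.84, float("nan"), 5.29],
})
print(empty_days(toy_daily))
```

On the real data this would be called as `empty_days(df_daily)`.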

In [34]:
# export data to csv file
df_daily.to_csv('btcusdt_daily.csv', index=False)

## 4. Exploratory Data Analysis (EDA)