### **Use ML expertise to predict real crypto market data**

#### What are you trying to do in this notebook?
In this competition, I'll use my machine learning expertise to forecast short-term returns in 14 popular cryptocurrencies. Because historic cryptocurrency prices are not confidential, this will be a forecasting competition using the time series API.
A cryptocurrency, crypto-currency, or crypto is a collection of binary data which is designed to work as a medium of exchange.

#### Why are you trying it?
This dataset contains information on historic trades for several cryptoassets, such as Bitcoin and Ethereum. My goal is simply to predict their future returns.
Cryptocurrencies are generally fiat currencies, as they are not backed by or convertible into a commodity.

Bitcoin, first released as open-source software in 2009, is the first decentralized cryptocurrency. Since the release of bitcoin, many other cryptocurrencies have been created.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
# import all of the important libraries in this kernel
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import time
import datetime
from plotly.offline import init_notebook_mode, iplot
import plotly.express as px
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import seaborn as sns
import lightgbm as lgb
cmap = sns.color_palette()

In [None]:
# import the data to use in this kernel
df = pd.read_csv('../input/g-research-crypto-forecasting/train.csv')
asset_details = pd.read_csv('../input/g-research-crypto-forecasting/asset_details.csv')
df_test = pd.read_csv('../input/g-research-crypto-forecasting/example_test.csv')

In [None]:
# casually checking the data
df.head(13)

In [None]:
asset_details

In [None]:
df.isnull().sum()

In [None]:
# Sort the asset details by Weight, largest first.
asset_details = asset_details.sort_values('Weight',ascending=False)
asset_details

In [None]:
# Map each asset name to its Asset_ID
asset_names_dict = dict(zip(asset_details["Asset_Name"], asset_details["Asset_ID"]))
asset_names_dict

In [None]:
def add_asset_name(stdata, join):
    return stdata.merge(
        join, how="left",on="Asset_ID"
    )

df = add_asset_name(df,asset_details)
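
The left merge above assumes every `Asset_ID` appears exactly once in `asset_details`. A minimal sketch, using toy data with made-up values, of how the `validate` argument of `merge` can check that assumption so the merge cannot silently duplicate rows:

```python
import pandas as pd

# Toy stand-ins for the training data and the asset table (values are made up)
trades = pd.DataFrame({"Asset_ID": [1, 0, 1, 2], "Close": [10.0, 20.0, 11.0, 30.0]})
assets = pd.DataFrame({"Asset_ID": [0, 1, 2], "Asset_Name": ["A", "B", "C"]})

# validate="many_to_one" raises MergeError if Asset_ID is duplicated in `assets`
merged = trades.merge(assets, how="left", on="Asset_ID", validate="many_to_one")

# A left merge against unique keys keeps the row count unchanged
print(len(merged) == len(trades))
```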

In [None]:
df['Real_Time'] = pd.to_datetime(df['timestamp'],unit='s')

In [None]:
df.head(10)

In [None]:
(df['Asset_Name'].value_counts()/df.shape[0])*100

In [None]:
countpie = df['Asset_Name'].value_counts()

fig = {
  "data": [
    {
      "values": countpie.values,
      "labels": countpie.index,
      "domain": {"x": [0, .5]},
      "name": "Currency types",
      "hoverinfo":"label+percent+name",
      "hole": .7,
      "type": "pie"
    },],
  "layout": {
        "title":"Share of rows per currency type",
    }
}
iplot(fig)

In [None]:
# This is how I would normally write the plotly code, but the dataset is so large that it takes too long to run.
# It is easier to write, but far too slow to render.

#px.histogram(df, x="Asset_Name", color="Asset_Name")

In [None]:
# A faster way to build the histogram, credited to "Sanskar Hasija": count the rows per asset first
asset_count= []
for i in range(14):
    count = (df["Asset_ID"]==i).sum()
    asset_count.append(count)
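
The loop above scans the full `Asset_ID` column once per asset. As a hedged alternative, `value_counts` produces the same counts in a single pass. The sketch below uses a small made-up Series in place of `df["Asset_ID"]`:

```python
import pandas as pd

# Made-up asset IDs standing in for df["Asset_ID"]
asset_ids = pd.Series([0, 1, 1, 2, 2, 2])
n_assets = 3  # 14 in the notebook

# Loop version: one full scan of the column per asset ID
loop_counts = [int((asset_ids == i).sum()) for i in range(n_assets)]

# Single-pass version: sort_index orders the counts by asset ID,
# reindex fills in 0 for any ID that never appears
vc_counts = (
    asset_ids.value_counts()
    .sort_index()
    .reindex(list(range(n_assets)), fill_value=0)
    .tolist()
)

print(loop_counts, vc_counts)
```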

In [None]:
# The output is basically the same as the code above, but it runs much faster
fig = px.histogram(x = asset_details.sort_values("Asset_ID")["Asset_Name"],
                   y = asset_count , 
             color = asset_details.sort_values("Asset_ID")["Asset_Name"])
fig.update_xaxes(title="Currency types")
fig.update_yaxes(title = "Number of Rows")
fig.show()

In [None]:
volumesum = df.groupby(['Asset_ID'])['Volume'].sum()
volumesum

In [None]:
fig = px.histogram(x = asset_details.sort_values("Asset_ID")["Asset_Name"],
                   y = volumesum, 
                   color = asset_details.sort_values("Asset_ID")["Asset_Name"])
fig.update_xaxes(title="Currency types")
fig.update_yaxes(title = "Sum of the volumes")
fig.update_layout(showlegend = True,
    title = {
        'text': 'Quantity of asset bought or sold based on USD',
        'y':0.95,
        'x':0.45,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()

In [None]:
assetindex = asset_details.sort_values("Asset_ID")["Asset_Name"].values


In [None]:
assetindex

In [None]:
plt.figure(figsize=(40,80))
gs = gridspec.GridSpec(7, 2)
for i in range(14):
    ax = plt.subplot(gs[i])
    ax = sns.scatterplot(x='Close',y='Open',data=df[df['Asset_ID'] == i],color=cmap[i%10])
    ax.set_xlabel('')
    ax.set_title('Scatter plot of currency name: ' + assetindex[i] +' in USD')
plt.show()


In [None]:
f = plt.figure(figsize=(15,30))

for ind, coin in enumerate(list(assetindex)):
    coin_df = df[df["Asset_ID"]==asset_names_dict[coin]].set_index("Real_Time")
    # plot this coin's closing price over time
    ax = f.add_subplot(7,2,ind+1)
    plt.plot(coin_df['Close'], label=coin, color=cmap[ind%10])
    plt.legend()
    plt.xlabel('Time')
    plt.ylabel(coin)
    plt.title(coin)

plt.tight_layout()
plt.show()

In [None]:
all_assets_df = pd.DataFrame([])
for ind, coin in enumerate(list(assetindex)):
    coin_df = df[df["Asset_ID"]==asset_names_dict[coin]].set_index("Real_Time")
    # leave gaps as NaN: corr() skips missing pairs, while filling with 0 would distort the result
    close_values = coin_df["Close"].rename(coin)
    all_assets_df = all_assets_df.join(close_values, how="outer")


corrmat = all_assets_df.corr()
fig, ax = plt.subplots(figsize=(14, 14))
sns.heatmap(corrmat, vmax=1., square=True, cmap="rocket_r")
plt.title("Cryptocurrency correlation map on actual price values", fontsize=15)
plt.show()
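
The heatmap above correlates raw price levels, which can look strongly correlated simply because they trend together. A minimal sketch, on made-up prices for two hypothetical coins rather than the competition data, of computing the correlation on per-step returns instead with `pct_change`:

```python
import pandas as pd

# Made-up close prices for two hypothetical coins
prices = pd.DataFrame({
    "CoinA": [100.0, 101.0, 103.0, 102.0, 105.0],
    "CoinB": [50.0, 49.0, 51.0, 52.0, 51.5],
})

# Simple return from one row to the next; the first row is NaN and is dropped
returns = prices.pct_change().dropna()

# Pearson correlation of the returns instead of the price levels
return_corr = returns.corr()
print(return_corr)
```

The resulting matrix could be passed to `sns.heatmap` in the same way as `corrmat`.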

In [None]:
btctemp = df[df['Asset_Name']=='Bitcoin'].set_index("Real_Time")
btctemp = btctemp.iloc[-2000:] # keep only the latest 2000 rows
btctemp

In [None]:
fig = go.Figure(data=[go.Candlestick(x=btctemp.index, open=btctemp['Open'], high=btctemp['High'], low=btctemp['Low'], close=btctemp['Close'])])
fig.update_xaxes(title_text = 'Time',
                             rangeslider_visible = True)

fig.update_layout(
     title = {
        'text': 'Candlestick Chart: Bitcoin',
        'y':0.90,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})

fig.update_yaxes(title_text = 'Price in USD', ticksuffix = '$')




fig.show()

In [None]:
def crypto_df(AssetName,fdata=df):
    currencydf = fdata[fdata['Asset_Name']== AssetName].set_index("Real_Time")
    currencydf = currencydf.iloc[-2000:] # keep only the latest 2000 rows
    return currencydf

In [None]:
ethtemp = crypto_df('Ethereum')

In [None]:
def latestcandle(coindata,coinname):  
        fig = go.Figure(data=[go.Candlestick(x=coindata.index, open=coindata['Open'], high=coindata['High'], low=coindata['Low'], close=coindata['Close'])])
        fig.update_xaxes(title_text = 'Time',
                                rangeslider_visible = True)

        fig.update_layout(
        title = {
                'text': 'Candlestick Chart: {:}'.format(coinname),
                'y':0.90,
                'x':0.5,
                'xanchor': 'center',
                'yanchor': 'top'})

        fig.update_yaxes(title_text = 'Price in USD', ticksuffix = '$')

        fig.show()

In [None]:
latestcandle(ethtemp,'Ethereum')

In [None]:
def latestcandle(coinname,fdata=df):  
        # redefines latestcandle so the filtering done by crypto_df happens inside the function
        currencydf = fdata[fdata['Asset_Name']== coinname].set_index("Real_Time")
        currencydf = currencydf.iloc[-2000:] # keep only the latest 2000 rows
        
        fig = go.Figure(data=[go.Candlestick(x=currencydf.index, open=currencydf['Open'], high=currencydf['High'], low=currencydf['Low'], close=currencydf['Close'])])
        fig.update_xaxes(title_text = 'Time',
                                rangeslider_visible = True)

        fig.update_layout(
        title = {
                'text': 'Candlestick Chart: {:}'.format(coinname),
                'y':0.90,
                'x':0.5,
                'xanchor': 'center',
                'yanchor': 'top'})

        fig.update_yaxes(title_text = 'Price in USD', ticksuffix = '$')

        fig.show()

In [None]:
latestcandle('Ethereum Classic') # pass any coin name to get its candlestick chart

In [None]:
latestcandle('Litecoin')

In [None]:
latestcandle('Dogecoin')


#### Did it work?
Yes, it works. Cryptocurrency does not exist in physical form (like paper money) and is typically not issued by a central authority. Cryptocurrencies typically use decentralized control as opposed to a central bank digital currency (CBDC). When a cryptocurrency is minted or created prior to issuance or issued by a single issuer, it is generally considered centralized. When implemented with decentralized control, each cryptocurrency works through distributed ledger technology, typically a blockchain, that serves as a public financial transaction database.

#### What did you not understand about this process?
With the help of G-Research, it is possible to learn more about Bitcoin. G-Research is Europe’s leading quantitative finance research firm. They have long explored the extent of market prediction possibilities, making use of machine learning, big data, and some of the most advanced technology available.

#### What else do you think you can try as part of this approach?
To reduce the uncertainty in the predictions, technical indicators could be tried as additional features.
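
As one possible starting point for the technical indicators mentioned above, here is a minimal sketch of two common ones, a simple moving average and a momentum (n-step price change), computed on made-up prices. The `window` size and the column names `sma` and `momentum` are illustrative assumptions, not part of the original notebook:

```python
import pandas as pd

# Made-up close prices; in the notebook this would be one coin's Close column
close = pd.Series([10.0, 11.0, 12.0, 11.0, 13.0, 14.0], name="Close")

window = 3  # illustrative look-back length

features = pd.DataFrame({
    "Close": close,
    # Simple moving average: mean of the last `window` prices
    "sma": close.rolling(window).mean(),
    # Momentum: price change over the last `window` steps
    "momentum": close.diff(window),
})
print(features)
```

The first rows of each indicator are NaN until a full window of history is available, so they would need to be dropped or handled before training a model.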


#### If you have any feedback or comments, please write them in the comment section below.