# Scraping Cryptocurrency Historical Data Snapshots Using Python

> Before diving into the project, let's take a quick look at web scraping.

![img](https://i.imgur.com/6whNLgu.jpg)

[Web Scraping](https://www.geeksforgeeks.org/what-is-web-scraping-and-how-to-use-it/) is an automatic method to obtain large amounts of data from websites. Most of this data is unstructured data in an HTML format which is then converted into structured data in a spreadsheet or a database so that it can be used in various applications.

[CoinMarketCap](https://coinmarketcap.com/) is the world's most-referenced price-tracking website for cryptoassets in the rapidly growing cryptocurrency space. Its mission is to make crypto discoverable and efficient globally by empowering retail users with unbiased, high quality and accurate information for drawing their own informed conclusions. 
For example, the [Historical Snapshots](https://coinmarketcap.com/historical/) page contains weekly snapshots of cryptocurrency data up to the present date.


![Screen Shot](https://i.imgur.com/sQkgxVl.png)




The historical snapshots on https://coinmarketcap.com/historical/ start in April 2013. This [page](https://coinmarketcap.com/historical/) lists the weekly snapshot dates grouped by year and month, and each date links to a page with the cryptocurrency records for that week.


In this project, we'll use web scraping to retrieve information from this [page](https://coinmarketcap.com/historical/) and from the snapshot [page](https://coinmarketcap.com/historical/20130505/) that corresponds to a date chosen on the first page. We'll use the Python libraries [Requests](https://requests.readthedocs.io/en/latest/) and [Beautiful Soup](https://beautiful-soup-4.readthedocs.io/en/latest/) to scrape the first page, and [Selenium](https://selenium-python.readthedocs.io/) to scrape the corresponding snapshot [page](https://coinmarketcap.com/historical/20130505/).

>1. Python is one of the most popular languages for web scraping, as it has a variety of libraries created specifically for the task.
>2. Beautiful Soup is a Python library well suited to web scraping. It creates a parse tree that can be used to extract data from the HTML of a website.
>3. Selenium WebDriver is a tool for testing the front end of an application. In web scraping, it is used to control a real browser.
>4. Pandas is a library used to read and manipulate the data.


After scraping the first page, we build an automated workflow: we enter a year, month, and date, look up the link for that date in the first page's data, then scrape the linked page and generate a CSV file of the historical data.

[CoinMarketCap](https://coinmarketcap.com/) is a [dynamic website](https://www.geeksforgeeks.org/difference-between-static-and-dynamic-web-pages/): a dynamic web page contains information that can change depending on the viewer, the time of day, the time zone, the viewer's native language, and other factors.

Generally, Beautiful Soup is used to scrape [static](https://www.geeksforgeeks.org/difference-between-static-and-dynamic-web-pages/) websites and Selenium to scrape [dynamic websites](https://www.geeksforgeeks.org/difference-between-static-and-dynamic-web-pages/).

Here, we scrape the first [page](https://coinmarketcap.com/historical/) with Beautiful Soup, because its data is present in the static HTML.

The second [page](https://coinmarketcap.com/historical/20130505/) is scraped with Selenium, because its table rows are loaded dynamically as you scroll down the page.

Here's an outline of the steps we'll follow for the first [page](https://coinmarketcap.com/historical/):
>1. Download the webpage using `requests`
>2. Parse the HTML source code using Beautiful Soup
>3. Compile the extracted information into Python lists and dictionaries
>4. Save the extracted information to a DataFrame
>5. Get the desired link by entering a date (year, month, and day)

Here's an outline of the steps we'll follow for the second [page](https://coinmarketcap.com/historical/20130505/):
>1. Install and import the required packages.
>2. Create the Selenium WebDriver object and load the URL into the driver
>3. Compile the extracted information into Python lists and dictionaries
>4. Save the extracted information to a CSV file

## Install and Import required libraries

### Install Libraries
In a Jupyter notebook, we can use `!pip` to install libraries.

In [1]:
!pip install requests --upgrade --quiet
!pip install beautifulsoup4 --upgrade --quiet
!pip install selenium --upgrade --quiet
!pip install pandas --upgrade --quiet

In [2]:
!pip install webdriver-manager



### Import Libraries

In [3]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
# WebDriver class will connect us to a browser’s instance
from selenium import webdriver

# Keys class lets us emulate the stroke of keyboard keys
from selenium.webdriver.common.keys import Keys

#Object that manages the starting and stopping of the ChromeDriver
from selenium.webdriver.chrome.service import Service

# Use with Chrome
from webdriver_manager.chrome import ChromeDriverManager

# Set of supported locator strategies.
from selenium.webdriver.common.by import By

# used to manipulate various properties of Chrome driver
from selenium.webdriver.chrome.options import Options

#import time
import time

To download a page, we can use the `get` function from `requests`.

In [4]:
page_url = 'https://coinmarketcap.com/historical/'
response = requests.get(page_url)

`requests.get` returns a response object containing the data from the web page and some other information.

The `.status_code` property can be used to check whether the request was successful. A successful response has an [HTTP status code](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status) between 200 and 299.
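A minimal sketch of a check for the whole 2xx success range described above; `is_success` is a hypothetical helper, not part of the original code. (For 4xx and 5xx responses, `requests` also offers `response.raise_for_status()`, which raises an `HTTPError`.)

```python
def is_success(status_code: int) -> bool:
    """Return True for HTTP success codes (200-299)."""
    return 200 <= status_code <= 299


# A few representative codes: OK, No Content, redirect, not found, server error
for code in (200, 204, 301, 404, 500):
    print(code, is_success(code))
```

Only 200 and 204 print `True`; a redirect such as 301 is not treated as success.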

In [5]:
response.status_code

200

The request was successful. We can get the contents of the page using `response.text`.

In [6]:
page_content = response.text

Let's check the number of characters on the page.

In [7]:
len(page_content)

133149

The page contains over 130,000 characters! Here are the first 500 characters of the page:

In [8]:
page_content[:500]

'<!DOCTYPE html><html lang="en" dir="ltr"><head><meta charSet="utf-8"/><meta http-equiv="x-ua-compatible" content="ie=edge"/><meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1, user-scalable=no, shrink-to-fit=no"/><script>(function() {if (false) {window.__renderTime__.csrRender = 1672588904140;return;}window.__renderTime__ = {"pageInitStart":1672588904129,"pageInitEnd":1672588904138,"pageRender":1672588904140};window.__renderTime__.pageLoad = Date.now();})()</scri'

What we're looking at above is the [HTML source code](https://developer.mozilla.org/en-US/docs/Web/HTML) of the web page.

We can also save it to a file and view the page locally within Jupyter using "File > Open".

In [9]:
# Write as UTF-8 so non-ASCII characters are saved correctly on every platform
with open('Webpage.html', 'w', encoding='utf-8') as f:
    f.write(page_content)

The preview looks similar to the original page, but none of the links work.
![Webpage screenshot](https://i.imgur.com/Eg67q97.png)

## Parse the HTML source code using Beautiful Soup

To extract information from the HTML source code of a webpage programmatically, we can use the [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) library.

Next, let's read the contents of [page](https://coinmarketcap.com/historical/) and create a `BeautifulSoup` object to parse the content.

In [10]:
# html.parser used to parse HTML files
doc = BeautifulSoup(page_content,'html.parser')

In [11]:
type(doc)

bs4.BeautifulSoup

The `doc` object contains several properties and methods for extracting information from the HTML document. Let's look at a few examples below.

In [12]:
# Get the title of page
doc.find('title').text

'Check Cryptocurrency Price History For The Top Coins | CoinMarketCap'

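As shown above, `.text` returns the text inside a tag. The `find` method returns the first tag that matches; here, the first `<a>` link on the page: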

In [13]:
doc.find('a')

<a class="cmc-link" href="/">22,157</a>

That's how we extract information using Beautiful Soup. Let's put the download and parsing steps into a reusable function, `get_page`.

In [14]:
def get_page(url):
    """Download a web page and return a Beautiful Soup doc"""
    # Download the page (the timeout stops the request from hanging indefinitely)
    response = requests.get(url, timeout=30)

    # Ensure that the response is valid
    if response.status_code != 200:
        raise Exception('Unable to download page {}'.format(url))

    # Get the page HTML
    page_content = response.text

    # Construct a Beautiful Soup document
    doc = BeautifulSoup(page_content, 'html.parser')
    return doc

In [15]:
doc = get_page(page_url)

In [16]:
doc.find('title')

<title>Check Cryptocurrency Price History For The Top Coins | CoinMarketCap</title>

We can now use the function `get_page` to download any web page and parse it using Beautiful Soup.
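A call such as `doc.find_all('a', {'class': 'historical-link cmc-link'})`, used in the next section, returns every `<a>` tag whose class matches. For comparison, here is a minimal sketch of the same idea with Python's built-in `html.parser` instead of Beautiful Soup; `LinkCollector` and the sample HTML are hypothetical. It compares the full `class` string exactly, which is how Beautiful Soup treats a multi-class string like `'historical-link cmc-link'`.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect [text, href] pairs from <a> tags with a given class attribute."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []        # list of [text, href] pairs
        self._inside = False   # True while inside a matching <a> tag

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == 'a' and attributes.get('class') == self.css_class:
            self._inside = True
            self.links.append(['', attributes.get('href')])

    def handle_data(self, data):
        # Accumulate the link text of the matching tag we are inside
        if self._inside:
            self.links[-1][0] += data

    def handle_endtag(self, tag):
        if tag == 'a':
            self._inside = False


# Hypothetical snippet modelled on the links of the historical page
sample = (
    '<div><a class="historical-link cmc-link" href="/historical/20130428/">28</a>'
    '<a class="cmc-link" href="/">home</a>'
    '<a class="historical-link cmc-link" href="/historical/20130505/">5</a></div>'
)
collector = LinkCollector('historical-link cmc-link')
collector.feed(sample)
print(collector.links)
```

The `home` link is skipped because its class string differs. Beautiful Soup does this matching (and much more) for us, which is why the rest of the project uses it.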

## Extract information from page

![Html year tag screenshot](https://i.imgur.com/npWKDZl.png)

My approach is to extract the required information from each year's block into lists, combine the lists for all years in a single function, and then convert the result into a DataFrame. Note that class names such as `sc-c66fb3e4-0 ixMSjS` look auto-generated, so they may change when the site is updated.

In [17]:
def get_dates(year_value):
    date_in_year = []
    # 'a' tag contains all dates in a month
    weekend_days = year_value.find_all('a',{'class':'historical-link cmc-link'})
    # Getting all dates in a year
    for date in weekend_days:
        date_in_year.append(date.text)
    # return to a list
    return date_in_year

We can use the `get_dates` function to get all the snapshot dates in a year.

In [18]:
def get_urls(year_value):
    base_url = 'https://coinmarketcap.com'
    urls = []
    # Getting HREF link from 'a' tag 
    weekend_days = year_value.find_all('a',{'class':'historical-link cmc-link'})
    # Getting all HREF link in a year
    for url in weekend_days:
        urls.append(base_url + url['href'])
    # Saving to the list
    return urls

Now we can use the `get_urls` function to get all the snapshot links (the `href` attributes) in a year.

In [19]:
def get_year(year_value):
    year_list = []
    # Getting Number of dates in a year
    weekend_days = year_value.find_all('a',{'class':'historical-link cmc-link'})
    # Getting year name as multiple time as Number of dates we have in that year 
    for day in weekend_days:
        year = year_value.find('div', class_ = 'sc-c66fb3e4-1 fDOEKz').text
        year_list.append(year)
    return year_list

The `get_year` function returns the year's name repeated once per date, so that its length matches the lists of dates and links for that year.

In [20]:
def get_months(year_value):
    """Return each month's name in this year, repeated once per snapshot link
    in that month, so the list matches the length of the list of links."""
    month_list = []
    # All month blocks in this year
    months = year_value.find_all('div', class_ = 'sc-9577e50d-0 cFMeUv')
    for month in months:
        # Snapshot links in this month
        Days_in_a_month = month.find_all('a', class_ = 'historical-link cmc-link')
        for day in Days_in_a_month:
            # Append the month name once per link in that month
            month_name = month.find('div', class_ = 'sc-9577e50d-1 jEphuI')
            month_list.append(month_name.text)
    return month_list

The `get_months` function returns each month's name repeated once per link in that month, so that its length also matches the list of links.
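Each snapshot link has the form `https://coinmarketcap.com/historical/20130505/`, so the year, month, and day can also be read straight from the `href` instead of from the generated class names. A minimal sketch, assuming every link ends in a `YYYYMMDD` segment; `date_from_snapshot_url`, `record_from_url`, and `snapshot_url` are hypothetical helpers, not part of the original code.

```python
from datetime import date, datetime

BASE_URL = 'https://coinmarketcap.com/historical/'


def date_from_snapshot_url(url):
    """Parse the trailing YYYYMMDD segment of a snapshot URL into a date."""
    segment = url.rstrip('/').rsplit('/', 1)[-1]   # e.g. '20130505'
    return datetime.strptime(segment, '%Y%m%d').date()


def record_from_url(url):
    """Build one row with the same columns (and string values) as get_all_data."""
    day = date_from_snapshot_url(url)
    return {
        'Year': str(day.year),
        'Month': day.strftime('%B'),         # full month name (locale-dependent)
        'Week Days Number': str(day.day),    # day of month without leading zero
        'Website': url,
    }


def snapshot_url(day):
    """Build a snapshot URL from a date, assuming the same YYYYMMDD pattern."""
    return f'{BASE_URL}{day:%Y%m%d}/'


print(record_from_url('https://coinmarketcap.com/historical/20130505/'))
print(snapshot_url(date(2022, 12, 25)))
```

`snapshot_url` only points at a real page if the date is one of the weekly dates listed on the historical page, so the scraped list of links remains the reliable source.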

In [21]:
def get_all_data(doc):
    # Getting All years from the beautifulsoup 
    years = doc.find_all('div', {'class':'sc-c66fb3e4-0 ixMSjS'})
    # Define dictionary
    data_dict = {'Year':[], 'Month':[], 'Week Days Number':[], 'Website': []}
    # Storing all year data into dictionary
    for year in years:
        data_dict['Year'] += get_year(year)
        data_dict['Month'] += get_months(year)
        data_dict['Week Days Number'] += get_dates(year)
        data_dict['Website'] += get_urls(year)
    return data_dict

Finally, the `get_all_data` function collects the data for the whole page.

Next, we convert this dictionary into a DataFrame:

In [22]:
def get_df(doc):
    data_dict = get_all_data(doc)
    df = pd.DataFrame(data_dict)
    return df

In [23]:
df = get_df(doc)
df.head(10) # First 10 rows of our dataframe 

|   | Year | Month | Week Days Number | Website |
|---|---|---|---|---|
| 0 | 2013 | April | 28 | https://coinmarketcap.com/historical/20130428/ |
| 1 | 2013 | May | 5 | https://coinmarketcap.com/historical/20130505/ |
| 2 | 2013 | May | 12 | https://coinmarketcap.com/historical/20130512/ |
| 3 | 2013 | May | 19 | https://coinmarketcap.com/historical/20130519/ |
| 4 | 2013 | May | 26 | https://coinmarketcap.com/historical/20130526/ |
| 5 | 2013 | June | 2 | https://coinmarketcap.com/historical/20130602/ |
| 6 | 2013 | June | 9 | https://coinmarketcap.com/historical/20130609/ |
| 7 | 2013 | June | 16 | https://coinmarketcap.com/historical/20130616/ |
| 8 | 2013 | June | 23 | https://coinmarketcap.com/historical/20130623/ |
| 9 | 2013 | June | 30 | https://coinmarketcap.com/historical/20130630/ |


In [24]:
# Save the DataFrame to a CSV file, which can be opened in Jupyter via "File > Open"
df.to_csv('Week Days Number.csv', index=False)

## Get the desired link from the DataFrame

Now let's create some functions that search for the desired link when we provide a year, month, and date as input.

In [25]:
# Takes two arguments: the DataFrame and the year (as a string) to search for
def search_month_data(df,Year):
    # Keep only the rows for the given year
    Week_data_yearly = df[df['Year'] == Year]
    print("""Enter Month Name Down:- (Hint:-) Type just correct spell, Don't worry about upper or lowercase!""")
    # '.capitalize()' turns e.g. 'december' or 'DECEMBER' into 'December', matching the names in df
    Month  = input().capitalize()
    # Ask again until the month exists in the given year
    while Month not in Week_data_yearly['Month'].values:
        print("{} month is not in our record".format(Month))
        Month  = input().capitalize()
    week_data_monthly = Week_data_yearly[Week_data_yearly['Month'] == Month]
    print('Yes, we have data for {} month'.format(Month))
    return week_data_monthly

The `search_month_data` function asks for a month and returns the rows for that month in the given year.

In [26]:
# Takes the DataFrame as input
def search_date_data(df):
    print('Enter Year here:-')
    Year = input()
    # Check the 'Year' column only; checking df.values would also accept e.g. a day number
    while Year not in df['Year'].values:
        print('{} year is not in our record'.format(Year))
        Year = input()
    print('Yes, We have data for {}'.format(Year))
    # Call `search_month_data` to check the month within that year
    month_data = search_month_data(df,Year)
    return month_data

The `search_date_data` function asks for a year and then a month, and returns the rows (dates and links) for that month.

In [27]:
def get_date_url(df):
    month_data = search_date_data(df)
    print("Enter date here from selected month!")
    avail_dates = list(month_data['Week Days Number'].values)
    print("These are the available dates in desire months",avail_dates)
    date = input()
    # Check only the date column, so that e.g. a month name is not accepted as a date
    while date not in avail_dates:
        print('{} date is not in our record'.format(date))
        date = input()
    print("Yes, We have records for {} date:- ".format(date))
    # Look up the snapshot URL for the chosen date
    link = month_data[month_data['Week Days Number'] == date]['Website']
    desire_url = link.iloc[0]  # take the single matching URL as a plain string
    return desire_url

Now `get_date_url` returns the snapshot URL (the `href` link) for the year, month, and date entered.
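Because `get_date_url` relies on `input()`, it can't be reused in a loop or a script without someone typing. A minimal non-interactive sketch of the same lookup, assuming the rows are available as a list of dictionaries (for example via `df.to_dict('records')`); `find_snapshot_url` is a hypothetical helper, and the two records are taken from the first rows of the DataFrame above.

```python
def find_snapshot_url(records, year, month, day):
    """Return the snapshot URL for the given year/month/day, or None.

    `records` holds dicts with the DataFrame's columns ('Year', 'Month',
    'Week Days Number', 'Website'); all values are strings, as scraped.
    """
    for row in records:
        if (row['Year'] == str(year)
                and row['Month'] == month.capitalize()
                and row['Week Days Number'] == str(day)):
            return row['Website']
    return None


records = [
    {'Year': '2013', 'Month': 'April', 'Week Days Number': '28',
     'Website': 'https://coinmarketcap.com/historical/20130428/'},
    {'Year': '2013', 'Month': 'May', 'Week Days Number': '5',
     'Website': 'https://coinmarketcap.com/historical/20130505/'},
]
print(find_snapshot_url(records, 2013, 'may', 5))
print(find_snapshot_url(records, 2013, 'may', 6))   # no snapshot on that day
```

Separating the lookup from the prompts also makes it easy to scrape several dates in a row.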

In [28]:
desire_url = get_date_url(df)
#desire_url

Enter Year here:-
2022
Yes, We have data for 2022
Enter Month Name Down:- (Hint:-) Type just correct spell, Don't worry about upper or lowercase!
december
Yes, we have data for December month
Enter date here from selected month!
These are the available dates in desire months ['4', '11', '18', '25']
25
Yes, We have records for 25 date:- 


We have now completed the first part of the project. Next, let's scrape the second page using Selenium.

## Create the driver and load the URL

In [29]:
def get_driver(desire_url):
    """Create a Chrome driver and load `desire_url`"""
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--no-sandbox')
    # Block browser notification pop-ups (2 = block)
    prefs = {"profile.default_content_setting_values.notifications" : 2}
    chrome_options.add_experimental_option("prefs", prefs)
    chrome_options.add_argument("start-maximized")
    # webdriver_manager downloads a matching ChromeDriver binary
    driver = webdriver.Chrome(options=chrome_options, service=Service(ChromeDriverManager().install()))
    # Load the page
    driver.get(desire_url)
    return driver

## Scroll web page and get tags
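`get_driver` loads the page with the driver's `.get()` method. `get_tr_tags` then closes the cookie banner (if present) and scrolls down the page step by step so that all table rows are loaded. The scrolling uses [execute_async_script](https://www.geeksforgeeks.org/execute_async_script-driver-method-selenium-python/#:~:text=This%20article%20revolves%20around%20execute_async_script,to%20playing%20with%20live%20code.), which executes asynchronous JavaScript in the current window/frame: Selenium waits until the script calls the callback passed as its last argument, or until the script timeout expires.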

In [30]:
def get_tr_tags(driver):
    time.sleep(1)
    # Close the cookie banner if it exists
    try:
        driver.find_element(By.CLASS_NAME,'cmc-cookie-policy-banner__close').click()
    except Exception:
        pass
    time.sleep(2)
    # Allow the scrolling script more time than the 30-second default script timeout
    driver.set_script_timeout(120)
    # Scroll the page slowly so that all table rows ('tr' tags) load
    driver.execute_async_script(
        """
        let count = 700;
        let callback = arguments[arguments.length - 1];
        let t = setTimeout(function scrolldown(){
            console.log(count, t);
            window.scrollTo(0, count);
            if(count < (document.body.scrollHeight || document.documentElement.scrollHeight)){
              count+= 700;
              t = setTimeout(scrolldown, 900);
            }else{
              callback((document.body.scrollHeight || document.documentElement.scrollHeight));
            }
        }, 900);"""
        )
    # Get all table rows
    tr_tags = driver.find_elements(By.CLASS_NAME,'cmc-table-row')
    return tr_tags

`get_tr_tags` returns all the table rows (elements with the class `cmc-table-row`) from the snapshot page:

![tr_tags](https://i.imgur.com/QjXhTx0.png)

In [31]:
driver = get_driver(desire_url)
tr_tags = get_tr_tags(driver)
tr_tags[:2]

[WDM] - Downloading: 100%|████████████████████████████████████████████████████████| 6.58M/6.58M [00:01<00:00, 6.90MB/s]


[<selenium.webdriver.remote.webelement.WebElement (session="1ac162589a553f965718013c7a7aebdf", element="a07d0ab2-7c3d-4081-bad7-3b25ffdf7151")>,
 <selenium.webdriver.remote.webelement.WebElement (session="1ac162589a553f965718013c7a7aebdf", element="5de211ff-cf81-42e4-9919-341ff343fb48")>]
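The scrolling script in `get_tr_tags` moves down 700 px every 900 ms and only calls Selenium's callback once it reaches the bottom of the page. `execute_async_script` waits for that callback only up to the driver's script timeout, which the W3C WebDriver specification sets to 30 seconds by default, so on a tall page the call can time out. A rough sketch of the running time; `estimate_scroll_seconds` and the page heights are hypothetical, and since lazily loaded rows make the page taller while it scrolls, the real time can be longer.

```python
import math

DEFAULT_SCRIPT_TIMEOUT_S = 30   # W3C WebDriver default script timeout


def estimate_scroll_seconds(page_height_px, step_px=700, delay_ms=900):
    """Estimate how long the scrolling script in get_tr_tags runs.

    The script fires one timer every delay_ms, scrolling step_px further each
    time, and calls the callback once its position reaches the page height,
    so it runs ceil(height / step) timers (at least one).
    """
    steps = max(1, math.ceil(page_height_px / step_px))
    return steps * delay_ms / 1000


for height in (10_000, 20_000, 30_000):
    seconds = estimate_scroll_seconds(height)
    print(f"{height} px -> ~{seconds:.1f} s, exceeds default: {seconds > DEFAULT_SCRIPT_TIMEOUT_S}")
```

If the estimate approaches the limit, calling `driver.set_script_timeout(seconds)` before `execute_async_script` raises it.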

## Extract Information from tags

In [32]:
def parse_crypto(row):
    # Crypto rank
    crypto_rank = row.find_element(By.CLASS_NAME,'cmc-table__cell--sort-by__rank').text
    # Crypto name
    crypto_name = row.find_element(By.CLASS_NAME,'cmc-table__cell--sort-by__name').text
    # Crypto symbol
    crypto_symbol = row.find_element(By.CLASS_NAME,'cmc-table__cell--sort-by__symbol').text
    # Crypto market cap; '.replace()' removes all commas from the value
    crypto_marketcap = row.find_element(By.CLASS_NAME,'cmc-table__cell--sort-by__market-cap').text.replace(',','')
    # Crypto price
    crypto_price = row.find_element(By.CLASS_NAME,'cmc-table__cell--sort-by__price').text.replace(',','')
    # Circulating supply (stored under the key 'Circulating Price')
    circulating_supply = row.find_element(By.CLASS_NAME,'cmc-table__cell--sort-by__circulating-supply').text.replace(',','')
    # One-hour change in percent
    change_1h = row.find_element(By.CLASS_NAME,'cmc-table__cell--sort-by__percent-change-1-h').text
    # 24-hour change in percent
    change_24h = row.find_element(By.CLASS_NAME,'cmc-table__cell--sort-by__percent-change-24-h').text
    # One-week (7d) change in percent
    change_7d = row.find_element(By.CLASS_NAME,'cmc-table__cell--sort-by__percent-change-7-d').text
    # 24-hour volume (may be missing for some rows)
    volume_24h = None
    try:
        volume_24h = row.find_element(By.CLASS_NAME,'cmc-table__cell--sort-by__volume-24-h').text.replace(',','')
    except Exception:
        pass
    return {
            'Rank':crypto_rank,    'Name':crypto_name, 
            'Symbol':crypto_symbol,'market Cap':crypto_marketcap, 
            'Price':crypto_price,   'Circulating Price':circulating_supply, 
            '1hour %':change_1h,    '24 Hour %':change_24h,
            '7d %':change_7d,       'Volume(24 H)':volume_24h
           }

`parse_crypto` extracts the fields of a single row into a dictionary. Here it is applied to the first `tr` tag; after that, we parse all the `tr` tags.

In [33]:
parse_crypto(tr_tags[0])

{'Rank': '1',
 'Name': 'Bitcoin',
 'Symbol': 'BTC',
 'market Cap': '$324093186300.92',
 'Price': '$16841.99',
 'Circulating Price': '19243168 BTC',
 '1hour %': '0.08%',
 '24 Hour %': '-0.03%',
 '7d %': '0.50%',
 'Volume(24 H)': '$11656379938.06'}

We can use a list comprehension to parse all the `tr` tags in one go.

In [34]:
cryptos = [parse_crypto(tag) for tag in tr_tags]

In [35]:
len(cryptos)

200

In [36]:
cryptos[:3]

[{'Rank': '1',
  'Name': 'Bitcoin',
  'Symbol': 'BTC',
  'market Cap': '$324093186300.92',
  'Price': '$16841.99',
  'Circulating Price': '19243168 BTC',
  '1hour %': '0.08%',
  '24 Hour %': '-0.03%',
  '7d %': '0.50%',
  'Volume(24 H)': '$11656379938.06'},
 {'Rank': '2',
  'Name': 'Ethereum',
  'Symbol': 'ETH',
  'market Cap': '$149169092950.40',
  'Price': '$1218.96',
  'Circulating Price': '122373866 ETH *',
  '1hour %': '0.12%',
  '24 Hour %': '-0.18%',
  '7d %': '2.89%',
  'Volume(24 H)': '$3942720070.47'},
 {'Rank': '3',
  'Name': 'Tether',
  'Symbol': 'USDT',
  'market Cap': '$66243849258.58',
  'Price': '$0.9999',
  'Circulating Price': '66247647090 USDT *',
  '1hour %': '0.00%',
  '24 Hour %': '0.00%',
  '7d %': '-0.02%',
  'Volume(24 H)': '$14856104217.46'}]
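The values in these dictionaries are still strings such as `'$16841.99'` or `'-0.03%'`, which need converting before any numeric analysis. A minimal sketch, assuming values are written in plain decimal notation (no scientific notation or suffixes such as "K"); `to_number` is a hypothetical helper. Note that the key `'Circulating Price'` actually holds the circulating supply.

```python
import re


def to_number(text):
    """Convert a scraped cell such as '$16841.99', '-0.03%' or '19243168 BTC' to a float.

    Keeps the first signed decimal number in the string, ignoring currency
    signs, percent signs, ticker symbols, commas and trailing markers like '*'.
    Returns None for empty or missing cells (e.g. a missing 24h volume).
    """
    if not text:
        return None
    match = re.search(r'-?\d+(?:\.\d+)?', text.replace(',', ''))
    return float(match.group()) if match else None


row = {'Price': '$16841.99', '24 Hour %': '-0.03%',
       'Circulating Price': '122373866 ETH *', 'Volume(24 H)': None}
print({key: to_number(value) for key, value in row.items()})
```

With a DataFrame, the same helper could be applied per column, e.g. `df['Price'].map(to_number)`.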

In [37]:
def get_all_cryptos(rows):
    # List comprehension to parse all the tr tags
    cryptos = [parse_crypto(tag) for tag in rows]
    return cryptos

We can use the `get_all_cryptos` function to parse all the tr tags at once.

## Convert the dictionaries to a DataFrame and CSV

Now we use the `pandas` library again to convert the list of dictionaries into a DataFrame and then write the DataFrame to a CSV file.

In [38]:
def write_csv(data,path):
    # Create a pandas DataFrame from the list of dictionaries
    df = pd.DataFrame(data)
    # Write it to a CSV file without the index column (to_csv returns None when given a path)
    df.to_csv(path, index=False)

We can verify the result by opening the CSV file in Jupyter via "File > Open".
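The same list of dictionaries can also be written without pandas. A minimal sketch using the standard-library `csv.DictWriter` instead of `DataFrame.to_csv`; `write_rows_csv` is a hypothetical helper, and the rows below are shortened versions of the scraped output.

```python
import csv
import io


def write_rows_csv(rows, file_obj):
    """Write a list of dicts (as returned by get_all_cryptos) as CSV.

    The header comes from the keys of the first row; None values are written
    as empty fields, which pandas.read_csv reads back as NaN.
    """
    if not rows:
        return
    writer = csv.DictWriter(file_obj, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {'Rank': '1', 'Name': 'Bitcoin', 'Volume(24 H)': '$11656379938.06'},
    {'Rank': '2', 'Name': 'Ethereum', 'Volume(24 H)': None},
]
buffer = io.StringIO()
write_rows_csv(rows, buffer)
print(buffer.getvalue())
```

When writing to a real file, open it with `open(path, 'w', newline='', encoding='utf-8')`; the `csv` module documentation recommends `newline=''` for file objects.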

## Combine all the functions into a single function
The `main_function` below runs all the steps and creates a CSV file for the date you enter.

In [39]:
page_url = 'https://coinmarketcap.com/historical/'
def main_function(page_url):
    doc = get_page(page_url)
    df = get_df(doc)
    desire_url = get_date_url(df)
    driver = get_driver(desire_url)
    try:
        tr_rows = get_tr_tags(driver)
        print(f'Found {len(tr_rows)} rows,Parsing all rows now')
        crypto_dict = get_all_cryptos(tr_rows)
    finally:
        # Close the browser even if scraping fails
        driver.quit()
    csv_file = write_csv(crypto_dict,'historical data.csv')
    return csv_file
main_function(page_url)

Enter Year here:-
2022
Yes, We have data for 2022
Enter Month Name Down:- (Hint:-) Type just correct spell, Don't worry about upper or lowercase!
december
Yes, we have data for December month
Enter date here from selected month!
These are the available dates in desire months ['4', '11', '18', '25']
25
Yes, We have records for 25 date:- 
Found 200 rows,Parsing all rows now


Now that we have a CSV file, we can use the `pandas` library to view its contents.

In [40]:
pd.read_csv('historical data.csv')

|   | Rank | Name | Symbol | market Cap | Price | Circulating Price | 1hour % | 24 Hour % | 7d % | Volume(24 H) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bitcoin | BTC | \$324093186300.92 | \$16841.99 | 19243168 BTC | 0.08% | -0.03% | 0.50% | \$11656379938.06 |
| 1 | 2 | Ethereum | ETH | \$149169092950.40 | \$1218.96 | 122373866 ETH * | 0.12% | -0.18% | 2.89% | \$3942720070.47 |
| 2 | 3 | Tether | USDT | \$66243849258.58 | \$0.9999 | 66247647090 USDT * | 0.00% | 0.00% | -0.02% | \$14856104217.46 |
| 3 | 4 | USD Coin | USDC | \$44348890607.38 | \$1.00 | 44345240440 USDC * | 0.00% | 0.00% | 0.03% | \$1380163674.19 |
| 4 | 5 | BNB | BNB | \$38894316962.85 | \$243.14 | 159965769 BNB * | 0.06% | -0.61% | -3.19% | \$298063868.05 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | 196 | Ontology Gas | ONG | \$77138496.55 | \$0.2353 | 327807147 ONG * | 0.90% | -0.19% | 1.99% | \$6849565.81 |
| 196 | 197 | Illuvium | ILV | \$76752342.88 | \$39.75 | 1930890 ILV * | 0.17% | -0.82% | -3.48% | \$3906970.39 |
| 197 | 198 | Numeraire | NMR | \$76336990.34 | \$12.96 | 5888504 NMR * | -0.76% | 0.83% | 5.47% | \$11331406.91 |
| 198 | 199 | Nervos Network | CKB | \$75849567.13 | \$0.002275 | 33340068750 CKB | 0.03% | -0.62% | 2.52% | \$918776.25 |


### All functions in one cell

In [41]:
# Install and import libraries and packages
!pip install requests --upgrade --quiet
import requests

!pip install beautifulsoup4 --upgrade --quiet
from bs4 import BeautifulSoup

!pip install pandas --upgrade --quiet
import pandas as pd

!pip install selenium --upgrade --quiet
# WebDriver class will connect us to a browser’s instance
from selenium import webdriver

# Keys class lets us emulate the stroke of keyboard keys
from selenium.webdriver.common.keys import Keys

#Object that manages the starting and stopping of the ChromeDriver
from selenium.webdriver.chrome.service import Service

# Use with Chrome (webdriver-manager downloads a matching ChromeDriver)
!pip install webdriver-manager --quiet
from webdriver_manager.chrome import ChromeDriverManager

# Set of supported locator strategies.
from selenium.webdriver.common.by import By

# used to manipulate various properties of Chrome driver
from selenium.webdriver.chrome.options import Options

#import time
import time


In [42]:
page_url = 'https://coinmarketcap.com/historical/'
# main_function executes all the required functions and returns the extracted data as a DataFrame
def main_function(page_url):
    doc = get_page(page_url)
    df = get_df(doc)
    desire_url = get_date_url(df)
    driver = get_driver(desire_url)
    try:
        tr_rows = get_tr_tags(driver)
        print(f'Found {len(tr_rows)} rows,Parsing all rows now')
        crypto_dict = get_all_cryptos(tr_rows)
    finally:
        # Close the browser even if scraping fails
        driver.quit()
    write_csv(crypto_dict,'historical_data.csv')
    csv = pd.read_csv('historical_data.csv')
    return csv

def get_page(url):
    """Download a web page and return a Beautiful Soup doc"""
    # Download the page (the timeout stops the request from hanging indefinitely)
    response = requests.get(url, timeout=30)

    # Ensure that the response is valid
    if response.status_code != 200:
        raise Exception('Unable to download page {}'.format(url))

    # Get the page HTML
    page_content = response.text

    # Construct a Beautiful Soup document
    doc = BeautifulSoup(page_content, 'html.parser')
    return doc

def get_dates(year_value):
    date_in_year = []
    # 'a' tag contains all dates in a month
    weekend_days = year_value.find_all('a',{'class':'historical-link cmc-link'})
    # Getting all dates in a year
    for date in weekend_days:
        date_in_year.append(date.text)
    # return to a list
    return date_in_year

def get_urls(year_value):
    base_url = 'https://coinmarketcap.com'
    urls = []
    # Getting HREF link from 'a' tag 
    weekend_days = year_value.find_all('a',{'class':'historical-link cmc-link'})
    # Getting all HREF link in a year
    for url in weekend_days:
        urls.append(base_url + url['href'])
    # Saving to the list
    return urls

def get_year(year_value):
    year_list = []
    # Getting Number of dates in a year
    weekend_days = year_value.find_all('a',{'class':'historical-link cmc-link'})
    # Getting year name as multiple time as Number of dates we have in that year 
    for day in weekend_days:
        year = year_value.find('div', class_ = 'sc-c66fb3e4-1 fDOEKz').text
        year_list.append(year)
    return year_list

def get_months(year_value):
    """Return each month's name in this year, repeated once per snapshot link
    in that month, so the list matches the length of the list of links."""
    month_list = []
    # All month blocks in this year
    months = year_value.find_all('div', class_ = 'sc-9577e50d-0 cFMeUv')
    for month in months:
        # Snapshot links in this month
        Days_in_a_month = month.find_all('a', class_ = 'historical-link cmc-link')
        for day in Days_in_a_month:
            # Append the month name once per link in that month
            month_name = month.find('div', class_ = 'sc-9577e50d-1 jEphuI')
            month_list.append(month_name.text)
    return month_list

def get_all_data(doc):
    # Getting All years from the beautifulsoup 
    years = doc.find_all('div', {'class':'sc-c66fb3e4-0 ixMSjS'})
    # Define dictionary
    data_dict = {'Year':[], 'Month':[], 'Week Days Number':[], 'Website': []}
    # Storing all year data into dictionary
    for year in years:
        data_dict['Year'] += get_year(year)
        data_dict['Month'] += get_months(year)
        data_dict['Week Days Number'] += get_dates(year)
        data_dict['Website'] += get_urls(year)
    return data_dict

def get_df(doc):
    data_dict = get_all_data(doc)
    df = pd.DataFrame(data_dict)
    return df

# Takes two arguments: the DataFrame and the year (as a string) to search for
def search_month_data(df,Year):
    # Keep only the rows for the given year
    Week_data_yearly = df[df['Year'] == Year]
    print("""Enter Month Name Down:- (Hint:-) Type just correct spell, Don't worry about upper or lowercase!""")
    # '.capitalize()' turns e.g. 'december' or 'DECEMBER' into 'December', matching the names in df
    Month  = input().capitalize()
    # Ask again until the month exists in the given year
    while Month not in Week_data_yearly['Month'].values:
        print("{} month is not in our record".format(Month))
        Month  = input().capitalize()
    week_data_monthly = Week_data_yearly[Week_data_yearly['Month'] == Month]
    print('Yes, we have data for {} month'.format(Month))
    return week_data_monthly

# Takes the DataFrame as input
def search_date_data(df):
    print('Enter Year here:-')
    Year = input()
    # Check the 'Year' column only; checking df.values would also accept e.g. a day number
    while Year not in df['Year'].values:
        print('{} year is not in our record'.format(Year))
        Year = input()
    print('Yes, We have data for {}'.format(Year))
    # Call `search_month_data` to check the month within that year
    month_data = search_month_data(df,Year)
    return month_data

def get_date_url(df):
    month_data = search_date_data(df)
    print("Enter date here from selected month!")
    avail_dates = list(month_data['Week Days Number'].values)
    print("These are the available dates in desire months",avail_dates)
    date = input()
    # Check only the date column, so that e.g. a month name is not accepted as a date
    while date not in avail_dates:
        print('{} date is not in our record'.format(date))
        date = input()
    print("Yes, We have records for {} date:- ".format(date))
    # Look up the snapshot URL for the chosen date
    link = month_data[month_data['Week Days Number'] == date]['Website']
    desire_url = link.iloc[0]  # take the single matching URL as a plain string
    return desire_url

def get_driver(desire_url):
    """Creating driver"""
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--no-sandbox')
    prefs = {"profile.default_content_setting_values.notifications" : 2}
    chrome_options.add_experimental_option("prefs", prefs)
    chrome_options.add_argument("start-maximized")
    driver = webdriver.Chrome(options=chrome_options, service=Service(ChromeDriverManager().install()))
    driver.get(desire_url)
    return driver

def get_tr_tags(driver):
    time.sleep(1)
    # Close the cookie banner if it exists
    try:
        driver.find_element(By.CLASS_NAME,'cmc-cookie-policy-banner__close').click()
    except Exception:
        pass
    time.sleep(2)
    # Allow the scrolling script more time than the 30-second default script timeout
    driver.set_script_timeout(120)
    # Scroll the page slowly so that all table rows ('tr' tags) load
    driver.execute_async_script(
        """
        let count = 700;
        let callback = arguments[arguments.length - 1];
        let t = setTimeout(function scrolldown(){
            console.log(count, t);
            window.scrollTo(0, count);
            if(count < (document.body.scrollHeight || document.documentElement.scrollHeight)){
              count+= 700;
              t = setTimeout(scrolldown, 900);
            }else{
              callback((document.body.scrollHeight || document.documentElement.scrollHeight));
            }
        }, 900);"""
        )
    # Get all table rows
    tr_tags = driver.find_elements(By.CLASS_NAME,'cmc-table-row')
    return tr_tags

def parse_crypto(row):
    # crypto Rank
    crypto_rank = row.find_element(By.CLASS_NAME,'cmc-table__cell--sort-by__rank').text
    # crypto Name
    crypto_name = row.find_element(By.CLASS_NAME,'cmc-table__cell--sort-by__name').text
    # Crypto Symbol
    crypto_symbol = row.find_element(By.CLASS_NAME,'cmc-table__cell--sort-by__symbol').text
    # Crypto market cap; '.replace()' removes all commas from the number
    crypto_marketcap = row.find_element(By.CLASS_NAME,'cmc-table__cell--sort-by__market-cap').text.replace(',','')
    # crypto price
    crypto_price = row.find_element(By.CLASS_NAME,'cmc-table__cell--sort-by__price').text.replace(',','')
    # crypto circulating supply
    circulating_supply = row.find_element(By.CLASS_NAME,'cmc-table__cell--sort-by__circulating-supply').text.replace(',','')
    # One hour change in percentage
    change_1h = row.find_element(By.CLASS_NAME,'cmc-table__cell--sort-by__percent-change-1-h').text
    # Twenty four hour change in percentage
    change_24h = row.find_element(By.CLASS_NAME,'cmc-table__cell--sort-by__percent-change-24-h').text
    # One week change i.e. 7d
    change_7d = row.find_element(By.CLASS_NAME,'cmc-table__cell--sort-by__percent-change-7-d').text
    # Volume 24 hour
    volume_24h = None
    try:
        volume_24h = row.find_element(By.CLASS_NAME,'cmc-table__cell--sort-by__volume-24-h').text.replace(',','')
    except Exception:
        pass  # the volume cell may be missing; keep None in that case
    return {
            'Rank':crypto_rank,    'Name':crypto_name, 
            'Symbol':crypto_symbol,'market Cap':crypto_marketcap, 
            'Price':crypto_price,   'Circulating Price':circulating_supply, 
            '1hour %':change_1h,    '24 Hour %':change_24h,
            '7d %':change_7d,       'Volume(24 H)':volume_24h
           }

def get_all_cryptos(rows):
    # Parse every table row into a dict of its cell values
    cryptos = [parse_crypto(row) for row in rows]
    return cryptos

def write_csv(data, path):
    # Create a pandas DataFrame from the list of dicts
    df = pd.DataFrame(data)
    # Write the DataFrame to a CSV file without the index column
    df.to_csv(path, index=False)
    return df
# Calling main function
main_function(page_url)

Enter Year here:-
2022
Yes, We have data for 2022
Enter Month Name Down:- (Hint:-) Type just correct spell, Don't worry about upper or lowercase!
december
Yes, we have data for December month
Enter date here from selected month!
These are the available dates in desire months ['4', '11', '18', '25']
25
Yes, We have records for 25 date:- 
Found 200 rows,Parsing all rows now


| Unnamed: 0 | Rank | Name | Symbol | market Cap | Price | Circulating Price | 1hour % | 24 Hour % | 7d % | Volume(24 H) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bitcoin | BTC | $324093186300.92 | $16841.99 | 19243168 BTC | 0.08% | -0.03% | 0.50% | $11656379938.06 |
| 1 | 2 | Ethereum | ETH | $149169092950.40 | $1218.96 | 122373866 ETH * | 0.12% | -0.18% | 2.89% | $3942720070.47 |
| 2 | 3 | Tether | USDT | $66243849258.58 | $0.9999 | 66247647090 USDT * | 0.00% | 0.00% | -0.02% | $14856104217.46 |
| 3 | 4 | USD Coin | USDC | $44348890607.38 | $1.00 | 44345240440 USDC * | 0.00% | 0.00% | 0.03% | $1380163674.19 |
| 4 | 5 | BNB | BNB | $38894316962.85 | $243.14 | 159965769 BNB * | 0.06% | -0.61% | -3.19% | $298063868.05 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | 196 | Ontology Gas | ONG | $77138496.55 | $0.2353 | 327807147 ONG * | 0.90% | -0.19% | 1.99% | $6849565.81 |
| 196 | 197 | Illuvium | ILV | $76752342.88 | $39.75 | 1930890 ILV * | 0.17% | -0.82% | -3.48% | $3906970.39 |
| 197 | 198 | Numeraire | NMR | $76336990.34 | $12.96 | 5888504 NMR * | -0.76% | 0.83% | 5.47% | $11331406.91 |
| 198 | 199 | Nervos Network | CKB | $75849567.13 | $0.002275 | 33340068750 CKB | 0.03% | -0.62% | 2.52% | $918776.25 |
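The saved snapshot keeps every value as text ("$16841.99", "19243168 BTC", "0.08%"), so it needs some cleaning before numeric analysis. A minimal sketch, assuming the CSV was written without its index (`index=False`); a file written with the index can instead be read back with `pd.read_csv(path, index_col=0)`, which avoids the extra `Unnamed: 0` column shown above. The two rows are copied from the table; the `to_number` helper and its regular expressions are an illustrative assumption about these cell formats, not a general parser.

```python
import io

import pandas as pd

# Two rows copied from the snapshot above, as they would be saved with index=False.
# (If the CSV was written with its index, use pd.read_csv(path, index_col=0) instead.)
raw = io.StringIO(
    "Rank,Name,Symbol,market Cap,Price,Circulating Price,1hour %,24 Hour %,7d %,Volume(24 H)\n"
    "1,Bitcoin,BTC,$324093186300.92,$16841.99,19243168 BTC,0.08%,-0.03%,0.50%,$11656379938.06\n"
    "2,Ethereum,ETH,$149169092950.40,$1218.96,122373866 ETH *,0.12%,-0.18%,2.89%,$3942720070.47\n"
)
df = pd.read_csv(raw)

def to_number(series):
    """Hypothetical helper: strip '$', '%', '*', commas and a trailing ticker, then parse."""
    cleaned = (
        series.astype(str)
        .str.replace(r"[$%*,]", "", regex=True)            # currency, percent, footnote marker, commas
        .str.replace(r"\s+[A-Z0-9]+\s*$", "", regex=True)  # trailing ticker such as ' BTC'
        .str.strip()
    )
    # Anything that still is not a number (e.g. a missing volume) becomes NaN
    return pd.to_numeric(cleaned, errors="coerce")

numeric_cols = ["market Cap", "Price", "Circulating Price",
                "1hour %", "24 Hour %", "7d %", "Volume(24 H)"]
df[numeric_cols] = df[numeric_cols].apply(to_number)

print(df[["Symbol", "Price", "Circulating Price", "7d %"]])
```

Once the columns are numeric, comparisons such as sorting by `7d %` or computing price ratios work directly on the DataFrame.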


## Summary:
1. Download the webpage using Requests
2. Parse the HTML source code using Beautiful Soup
3. Compile the extracted information into Python lists and dictionaries
4. Save the extracted information to a DataFrame
5. Get the desired snapshot link from user input (year, month and date)
6. Install and import the required packages
7. Create the Selenium WebDriver object and load the URL into the driver
8. Compile the extracted information into Python lists and dictionaries
9. Save the extracted information to a CSV file
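Step 5, picking the snapshot URL for a chosen day, can be tried without the interactive `input()` prompt. This is a minimal sketch: the `month_data` frame is hypothetical and only assumes the two columns `get_date_url` reads ('Week Days Number' and 'Website'); the URLs are assumed to follow the `https://coinmarketcap.com/historical/YYYYMMDD/` pattern of the snapshot pages linked earlier. `lookup_snapshot_url` is an illustrative helper, not part of the project.

```python
import pandas as pd

# Hypothetical month_data for December 2022: one row per weekly snapshot,
# with the day number stored as a string and the (assumed) snapshot URL.
month_data = pd.DataFrame({
    "Week Days Number": ["4", "11", "18", "25"],
    "Website": [
        "https://coinmarketcap.com/historical/20221204/",
        "https://coinmarketcap.com/historical/20221211/",
        "https://coinmarketcap.com/historical/20221218/",
        "https://coinmarketcap.com/historical/20221225/",
    ],
})

def lookup_snapshot_url(month_data, date):
    """Return the snapshot URL for a day number, or None if that day is not listed."""
    avail_dates = month_data["Week Days Number"].tolist()
    # Check membership in the day column only, not in every cell of the frame
    if date not in avail_dates:
        return None
    # Boolean mask on the day column, then take the single matching 'Website' value
    return month_data.loc[month_data["Week Days Number"] == date, "Website"].iloc[0]

print(lookup_snapshot_url(month_data, "25"))  # https://coinmarketcap.com/historical/20221225/
print(lookup_snapshot_url(month_data, "26"))  # None
```

Restricting the membership test to the day column means input that merely matches some other cell, such as a pasted URL, is rejected instead of producing an empty lookup.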

## References:
1. [Web Scraping and REST APIs](https://jovian.ai/learn/zero-to-data-analyst-bootcamp/lesson/web-scraping-and-rest-apis) - Introduction to web scraping and REST APIs by [Jovian](https://jovian.ai/)
2. [Beautiful Soup Documentation](https://beautiful-soup-4.readthedocs.io/en/latest/) - Beautiful Soup 4
3. [Workshop - Web Scraping with Selenium & AWS](https://jovian.ai/learn/zero-to-data-analyst-bootcamp/lesson/workshop-web-scraping-with-selenium-aws) - Basics of Selenium and web scraping by [Jovian](https://jovian.ai/)
4. [Selenium with Python](https://selenium-python.readthedocs.io/) - Selenium documentation
5. [Apna College](https://www.youtube.com/watch?v=HcOc7P5BMi4) - HTML basics

## Future Work:
- Comparative analysis of how prices, percentage changes, etc. vary from one cryptocurrency to another on the same day.
- Using machine learning techniques to test whether one cryptocurrency really depends on others and, if so, what the relationship between them is.
- Forecasting future cryptocurrency prices using historical data.
- Bitcoin price prediction based on other cryptocurrencies using machine learning and time series analysis.
- With a little more code, time series data for any cryptocurrency can be built from the scraped CSV data.
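One CSV holds a single weekly snapshot, so a time series needs several of them stacked by date. A minimal sketch of that last idea, assuming each snapshot has already been loaded into a DataFrame with a 'Symbol' column and a numeric 'Price' column (that is, after stripping the '$' signs). `build_price_series` is a hypothetical helper, and the prices below are toy values for illustration, not market data.

```python
import pandas as pd

def build_price_series(snapshots, symbol):
    """Stack per-date snapshot frames and return one symbol's price indexed by date.

    snapshots: dict mapping a date string (e.g. '2022-12-25') to a DataFrame
    with 'Symbol' and numeric 'Price' columns.
    """
    # Tag every snapshot with its date, then combine them into one long table
    frames = [frame.assign(Date=pd.to_datetime(date)) for date, frame in snapshots.items()]
    combined = pd.concat(frames, ignore_index=True)
    # Keep one symbol's rows and turn them into a date-ordered Series
    return (
        combined.loc[combined["Symbol"] == symbol]
        .set_index("Date")["Price"]
        .sort_index()
    )

# Toy values for illustration only, not real prices
snapshots = {
    "2022-12-18": pd.DataFrame({"Symbol": ["BTC", "ETH"], "Price": [2.0, 1.0]}),
    "2022-12-04": pd.DataFrame({"Symbol": ["BTC", "ETH"], "Price": [3.0, 1.5]}),
}
btc = build_price_series(snapshots, "BTC")
print(btc)
```

Sorting by the date index means snapshots can be scraped in any order; `combined.pivot(index="Date", columns="Symbol", values="Price")` would give one column per cryptocurrency instead.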

import jovian
jovian.commit(files = ['historical data.csv','Week Days Number.csv','CoinMarketCap web scrapping_project.ipynb'])
