# Scraping Top Crypto Info on CoinGecko
![Imgur](https://i.imgur.com/MoY5RQS.png)


## Web Scraping
  `Web Scraping` is the process of collecting structured web data in an automated fashion. In general, web data extraction (or web scraping) is used by people and businesses who want to make use of the vast amount of publicly available web data to make better decisions.

# About Cryptocurrency
A cryptocurrency is a digital or `virtual currency` that is secured by cryptography, which makes it nearly impossible to counterfeit or double-spend. Many cryptocurrencies run on a `decentralized network` based on `blockchain technology`, a distributed ledger enforced by a disparate network of computers.

# CoinGecko
`CoinGecko` is a website that lists information about `cryptocurrencies`. It helps the user follow live prices, volumes and many other things related to cryptocurrencies.

# Project Outline
This project uses several Python libraries to scrape data from the CoinGecko website. We will use `requests` and `Beautiful Soup` to download and parse the pages, `pandas` to organize the data, and then save it in a `CSV file`.

1. Download the webpage using requests
2. Inspect the HTML in the Browser
3. Parse the webpage's HTML code using Beautiful Soup
4. Extract the information we want from the code
5. Use Python lists and dictionaries to organize the extracted information
6. Extract and combine data from multiple pages
7. Save the extracted information to a CSV file
8. Conclusion


In [1]:
!pip install jovian --upgrade --quiet

In [2]:
import jovian

In [3]:
# Execute this to save new versions of the notebook
jovian.commit(project="WebScrapingCrypto")

<IPython.core.display.Javascript object>

[jovian] Updating notebook "adityahebbarnhnm/webscrapingcrypto" on https://jovian.ai
[jovian] Committed successfully! https://jovian.ai/adityahebbarnhnm/webscrapingcrypto


'https://jovian.ai/adityahebbarnhnm/webscrapingcrypto'

# Download the webpage using `requests`
 
The Python `requests` library, specifically `requests.get()`, lets us download the source code of a web page by passing in a URL. To keep our code clean, we'll assign the URL to a variable.

In [4]:
ccoin_trending_url = 'https://www.coingecko.com/en'

![Imgur](https://i.imgur.com/NGJIIOQ.png)

Here's a screenshot of the webpage.

To grab the webpage's code, we'll install and import the requests library:

In [5]:
!pip install requests --upgrade --quiet
import requests

Now we can download the web page using requests.get.

In [6]:
page_source = requests.get(ccoin_trending_url)


Let's check the status code to make sure that the request was successful.

In [7]:
page_source.status_code


200

A 200 code means the request was successful.
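
Checking `status_code` by hand works well in a notebook, but the same check can be folded into a small helper. The sketch below is one way to do that, assuming a hypothetical function name `fetch_html` and a 10-second timeout chosen only for illustration. It relies on `raise_for_status()`, which raises `requests.HTTPError` for 4xx and 5xx responses.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its HTML, raising on HTTP errors.

    `fetch_html` and the default timeout are illustrative choices,
    not part of the original notebook.
    """
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx status codes.
    response.raise_for_status()
    return response.text
```

In this notebook, `fetch_html(ccoin_trending_url)` would return the same text as `page_source.text`, or stop with an error if the request failed.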


To access the page content, we can use the `.text` property of the response.

In [8]:
page_source_content = page_source.text
#length of page contents
len(page_source_content)


1318034

The page contains about 1.3 million characters (`1,318,034` in this run). Let's check the first 1000 characters.

In [9]:
page_source_content[:1000]

'<!DOCTYPE html>\n<html lang="en">\n<head>\n<script src="https://www.googleoptimize.com/optimize.js?id=GTM-W3CD992"></script>\n<meta charset="utf-8">\n<meta http-equiv="X-UA-Compatible" content="IE=edge">\n<meta name="viewport" content="width=device-width, initial-scale=1.0">\n<link rel="shortcut icon" href="/favicon.ico">\n<link type="application/opensearchdescription+xml" rel="search" href="/OpensearchDescription.xml" />\n<title>Cryptocurrency Prices, Charts, and Crypto Market Cap | CoinGecko</title>\n<meta name="description" content="View top cryptocurrency prices live, crypto charts, market cap, and trading volume. Discover today’s new and trending coins, top crypto gainers and losers in the market." />\n\n\n<script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({\'gtm.start\':\n  new Date().getTime(),event:\'gtm.js\'});var f=d.getElementsByTagName(s)[0],\n  j=d.createElement(s),dl=l!=\'dataLayer\'?\'&l=\'+l:\'\';j.async=true;j.src=\n  \'https://www.googletagmanager.com/gtm.js?id=\'+

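Next, let us save the page source to a new HTML file. We pass `encoding='utf-8'` so that non-ASCII characters in the page are written correctly on every platform.

In [10]:
with open('trending.html', 'w', encoding='utf-8') as f:
    f.write(page_source_content)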


# Inspect HTML in the Browser

Before extracting data, we first have to examine the web page's HTML to find the tags that contain the elements we want. Right-clicking on the page and selecting the Inspect option opens the browser's developer tools, where we can examine the code of the website. This is one of the most useful ways to explore the structure of an HTML page. To extract each item we'll need to dig a little deeper, which we will do in the following sections.
![Imgur](https://i.imgur.com/RGqvWnq.png)


# Parse the webpage's HTML code using `Beautiful Soup`
Beautiful Soup is a very useful Python library used to parse, or extract data from, HTML, XML, and other markup language documents. It's installed as `beautifulsoup4`, and the `BeautifulSoup` class is imported from the `bs4` module.

In [13]:
#install the library
!pip install beautifulsoup4 --upgrade --quiet

In [14]:
#import the library
from bs4 import BeautifulSoup

Now we can pass the downloaded HTML text to a `BeautifulSoup` object to parse its contents.

In [15]:
doc = BeautifulSoup(page_source.text, 'html.parser')

In [16]:
type(doc)

bs4.BeautifulSoup
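
The page was also saved to `trending.html` earlier, so the document could be parsed from disk instead of from the live response, which avoids downloading the page again. Here is a minimal sketch of that, assuming the file was written as UTF-8. The sample markup and the temporary file are stand-ins so that the example runs on its own.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Stand-in for the saved page; in the notebook this would be 'trending.html'.
sample_html = "<html><head><title>Sample page</title></head><body></body></html>"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample_html)

    # BeautifulSoup accepts an open file handle as well as a string.
    with open(path, encoding="utf-8") as f:
        saved_doc = BeautifulSoup(f, "html.parser")

print(saved_doc.title.text)
```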


## tr tags
![Imgur](https://i.imgur.com/Rl8sQsH.png)
From the above screenshot, we can see that the rows of data we need are contained in `tr` tags. A `tr` tag defines a row in an HTML table. BeautifulSoup makes it simple to access HTML tags by name.

In [18]:
# find all the <tr> tags and assign them to 'all_tr_tags'
all_tr_tags = doc.find_all('tr')

In [19]:
len(all_tr_tags)

101

In [20]:
all_tr_tags[:1]


[<tr>
 <th data-sort-method="none"></th>
 <th class="table-number">
 #
 </th>
 <th class="coin-name text-left">
 Coin
 </th>
 <th class="price text-right" data-sort-method="number">
 Price
 </th>
 <th class="change1h text-right col-market" data-sort-method="number" style="width: 70px">
 1h
 </th>
 <th class="change24h text-right col-market" data-sort-method="number" style="width: 70px">
 24h
 </th>
 <th class="change7d text-right col-market" data-sort-method="number" style="width: 70px">
 7d
 </th>
 <th class="lit text-right col-market" data-sort-method="number">
 24h Volume
 </th>
 <th class="cap text-right col-market" data-sort-method="number">
 Mkt Cap
 </th>
 <th class="change7d text-center col-market" data-sort-method="none" style="width: 70px">
 Last 7 Days
 </th>
 </tr>]

# To extract information 

  We're getting closer to parsing the page. The list of coins is spread across many pages of 100 coins each, so the most efficient approach is to write a function that fetches a specific page and returns the first page by default.

In [21]:
def get_page(page=1):
    page_url = 'https://www.coingecko.com/en?page=' + str(page)
    # get the HTML page content
    response = requests.get(page_url)
    # check that the request succeeded
    if not response.ok:
        print('Status Code:', response.status_code)
        raise Exception('Failed to fetch ' + page_url)
    # build the BeautifulSoup document, naming the parser explicitly
    doc = BeautifulSoup(response.text, 'html.parser')
    return doc

Earlier we saw that there are 101 `tr` tags on the page. However, only 100 of these rows hold the coin data we need; the remaining one is the `header` row, which is also a `tr` tag.
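
The difference between counting every `tr` tag and counting only the data rows can be reproduced on a small table. The sketch below uses made-up markup with one header row and two data rows, and assumes the header sits in a `thead` while the data rows sit in the `tbody`.

```python
from bs4 import BeautifulSoup

# Tiny stand-in for the coin table: one header row and two data rows.
sample_html = """
<table>
  <thead>
    <tr><th>#</th><th>Coin</th></tr>
  </thead>
  <tbody>
    <tr><td>1</td><td>Bitcoin</td></tr>
    <tr><td>2</td><td>Ethereum</td></tr>
  </tbody>
</table>
"""

sample_doc = BeautifulSoup(sample_html, "html.parser")

all_rows = sample_doc.find_all("tr")         # header row + data rows
data_rows = sample_doc.tbody.find_all("tr")  # data rows only

print(len(all_rows), len(data_rows))  # 3 2
```

Searching inside the `tbody` is what lets us skip the header row without special-casing it.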

In [22]:
coin_first_page = get_page()

In [23]:
#shows the first 'tbody' tag in the page, which is the result table
coin_first_page.tbody

<tbody>
<tr>
<td class="pl-1 pr-0" style="width: 20px;">
<div class="dropdown-menu p-3 dropdown-popup dropdown-menu-content portfolios-dropdown"></div>
<i aria-expanded="false" aria-haspopup="true" class="far cursor-pointer fa-star" data-action="click-&gt;favorites#trigger" data-coin-id="1" data-coin-image="https://assets.coingecko.com/coins/images/1/thumb/bitcoin.png?1547033579" data-coin-slug="bitcoin" data-coin-symbol="btc" data-new-portfolio-button-label='&lt;span class="translation_missing" title="translation missing: en.layouts.application.portfolio.add_to_new_portfolio"&gt;Add To New Portfolio&lt;/span&gt;' data-new-portfolio-confirm-button="new-portfolio-form-confirm-button" data-new-portfolio-confirm-modal="new-portfolio-form-modal" data-price-btc="1.0" data-removable="false" data-source="star_button" data-target="favorites.solidStar" data-toast-success-hyperlink-text="Start your portfolio - Add a transaction!" data-toast-success-message="BTC added to Portfolio" data-toggle="d

In [24]:
#tbody_tag here contains tbody of coin_first_page
tbody_tag = coin_first_page.find_all('tbody')

In [25]:
#one table in the page
len(tbody_tag)

1

Now we can finally begin extracting the data we've been looking for. As we saw already, the `tbody` tag contains all of the coin data in `tr` tags. Let's find the first row, or `tr` tag:

In [26]:

table = tbody_tag[0]

In [27]:
first_row = table.find('tr')
first_row

<tr>
<td class="pl-1 pr-0" style="width: 20px;">
<div class="dropdown-menu p-3 dropdown-popup dropdown-menu-content portfolios-dropdown"></div>
<i aria-expanded="false" aria-haspopup="true" class="far cursor-pointer fa-star" data-action="click-&gt;favorites#trigger" data-coin-id="1" data-coin-image="https://assets.coingecko.com/coins/images/1/thumb/bitcoin.png?1547033579" data-coin-slug="bitcoin" data-coin-symbol="btc" data-new-portfolio-button-label='&lt;span class="translation_missing" title="translation missing: en.layouts.application.portfolio.add_to_new_portfolio"&gt;Add To New Portfolio&lt;/span&gt;' data-new-portfolio-confirm-button="new-portfolio-form-confirm-button" data-new-portfolio-confirm-modal="new-portfolio-form-modal" data-price-btc="1.0" data-removable="false" data-source="star_button" data-target="favorites.solidStar" data-toast-success-hyperlink-text="Start your portfolio - Add a transaction!" data-toast-success-message="BTC added to Portfolio" data-toggle="dropdown"

To make our code more efficient, let us write a second function that takes a page number and returns the table contents of that page. In this case we only need the table that lists the coins, and each page has only one table.

In [28]:
def get_page_table(page=1):
    doc = get_page(page)
    # find all tbody tags in this page
    t_body = doc.find_all('tbody')
    # each page has only one table, so we take the first (index 0)
    table = t_body[0]
    return table

In [29]:
table_page1 = get_page_table()

In [30]:
table_page1

<tbody>
<tr>
<td class="pl-1 pr-0" style="width: 20px;">
<div class="dropdown-menu p-3 dropdown-popup dropdown-menu-content portfolios-dropdown"></div>
<i aria-expanded="false" aria-haspopup="true" class="far cursor-pointer fa-star" data-action="click-&gt;favorites#trigger" data-coin-id="1" data-coin-image="https://assets.coingecko.com/coins/images/1/thumb/bitcoin.png?1547033579" data-coin-slug="bitcoin" data-coin-symbol="btc" data-new-portfolio-button-label='&lt;span class="translation_missing" title="translation missing: en.layouts.application.portfolio.add_to_new_portfolio"&gt;Add To New Portfolio&lt;/span&gt;' data-new-portfolio-confirm-button="new-portfolio-form-confirm-button" data-new-portfolio-confirm-modal="new-portfolio-form-modal" data-price-btc="1.0" data-removable="false" data-source="star_button" data-target="favorites.solidStar" data-toast-success-hyperlink-text="Start your portfolio - Add a transaction!" data-toast-success-message="BTC added to Portfolio" data-toggle="d

Great! We can now get the table of an entire page. In the next part, let us use it to parse the information properly.

# Use Python lists and dictionaries to organize the extracted information

We have seen how to select the child elements that contain the desired data. Now we can write a function that goes through each row of the table, pulls the data out of that row, puts it in a dictionary, and finally builds a list of all these dictionaries. After that, we use `pandas` to create a `DataFrame` from that list. For that we will install and import `pandas` first.

In [31]:
#install pandas library
!pip install pandas --upgrade --quiet

In [32]:
#import pandas library
import pandas as pd

In [33]:
def parse_table(tablei):
    table_data = []
    table_rows = tablei.find_all('tr')
    for row in table_rows:
        td_tags = row.find_all('td')
        sr_no = td_tags[1].text.strip()
        name = td_tags[2].text.strip()
        # keep only the first line of the name cell
        c_name = name.split('\n', 1)[0]
        price = td_tags[3].text.strip()
        h24_v = td_tags[7].text.strip()
        market_cap = td_tags[8].text.strip()

        row_data = {
            'sr_no': sr_no,
            'name': c_name,
            'price': price,
            '24hr_volume': h24_v,
            'market_cap': market_cap,
        }
        table_data.append(row_data)
    return pd.DataFrame(table_data)
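
The index-based lookups in `parse_table` assume that every row has at least nine `td` cells. The sketch below isolates that mapping in a hypothetical helper, `cells_to_record`, which works on plain strings and returns `None` for rows that are too short. The sample cell text is made up for illustration (including the assumption that the name is followed by the ticker on a later line), and the real cells may contain different whitespace.

```python
def cells_to_record(cells):
    """Map the stripped text of one row's <td> cells to a record.

    Hypothetical helper: the positions (1, 2, 3, 7, 8) mirror the
    indices used in parse_table.
    """
    if len(cells) < 9:
        return None  # not a full coin row, so skip it
    return {
        "sr_no": cells[1],
        # Keep only the first line of the name cell.
        "name": cells[2].split("\n", 1)[0],
        "price": cells[3],
        "24hr_volume": cells[7],
        "market_cap": cells[8],
    }


# Made-up cell text shaped like one row of the table.
sample_cells = ["", "1", "Bitcoin\n\nBTC", "$43,820.44", "0.1%", "1.2%",
                "-3.4%", "$51,482,860,519", "$830,173,307,276", ""]
print(cells_to_record(sample_cells))
```

A guard like this would let the loop skip an unexpected row instead of stopping with an `IndexError`.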

In [34]:
page1_df= parse_table(table_page1)


Let's see how it works!

In [35]:
page1_df.style.hide(axis="index")  # Styler.hide_index() was removed in pandas 2.0

sr_no,name,price,24hr_volume,market_cap
1,Bitcoin,"$43,820.44","$51,482,860,519","$830,173,307,276"
2,Ethereum,"$3,354.87","$15,348,848,451","$399,251,009,209"
3,Binance Coin,$485.06,"$2,055,830,857","$81,476,168,703"
4,Tether,$1.00,"$52,120,926,121","$78,705,696,307"
5,Solana,$150.80,"$2,090,822,331","$47,016,179,326"
6,USD Coin,$1.00,"$3,123,169,883","$44,905,290,330"
7,Cardano,$1.30,"$1,310,584,874","$41,485,425,637"
8,XRP,$0.797448,"$2,397,590,866","$38,050,878,721"
9,Polkadot,$27.16,"$1,170,727,794","$29,156,883,954"
10,Terra,$81.43,"$2,603,848,338","$29,056,815,194"


Great! Our function returns a `DataFrame` which is fairly easy to read.
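
Readable as it is, every value in the table is still a string such as `$43,820.44`, so sorting or arithmetic would not behave numerically. One possible follow-up, sketched here with a hypothetical helper `money_to_float` that is not part of the original notebook, strips the dollar sign and thousands separators before converting.

```python
def money_to_float(text):
    """Convert a string like '$43,820.44' to a float.

    Hypothetical helper; assumes US-style formatting with a leading '$'
    and ',' as the thousands separator.
    """
    return float(text.replace("$", "").replace(",", "").strip())


print(money_to_float("$43,820.44"))        # 43820.44
print(money_to_float("$830,173,307,276"))  # 830173307276.0
```

With pandas, something like `page1_df['price'].map(money_to_float)` would apply the conversion to a whole column.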


# Scraping another page for more info
 So far we are able to scrape one page and create a DataFrame out of it. Now let us scrape another page and create a second DataFrame. Later we can merge the two DataFrames to get a larger dataset.

In [37]:
table_page2 = get_page_table(2)

Let us parse the table of page 2 with `parse_table` to get a DataFrame named `page2_df`.

In [38]:
page2_df = parse_table(table_page2)

In [39]:
page2_df.style.hide(axis="index")  # Styler.hide_index() was removed in pandas 2.0

sr_no,name,price,24hr_volume,market_cap
102,Spell Token,$0.015282108848,"$70,098,487","$1,227,755,935"
101,SafeMoon [OLD],$0.000002169632,"$467,540","$1,218,314,709"
103,yearn.finance,"$33,902.49","$257,845,845","$1,210,398,973"
104,Bitkub Coin,$13.26,"$62,547,103","$1,177,158,876"
105,Holo,$0.006550624813,"$89,522,495","$1,164,005,904"
106,XDC Network,$0.090735378656,"$8,304,534","$1,108,667,803"
107,1inch,$2.50,"$187,941,815","$1,103,352,314"
108,NEM,$0.119917,"$17,397,138","$1,079,532,858"
109,Synthetix Network Token,$5.40,"$41,309,340","$1,076,247,769"
110,GateToken,$6.81,"$5,605,661","$1,069,918,154"


Let us merge both DataFrames, each having 100 rows and 5 columns, so that we get one large DataFrame with the data of 200 `cryptocurrencies`.

In [40]:
final_df = pd.concat([page1_df,page2_df])

In [41]:
final_df.style.hide(axis="index")  # Styler.hide_index() was removed in pandas 2.0

sr_no,name,price,24hr_volume,market_cap
1,Bitcoin,"$43,820.44","$51,482,860,519","$830,173,307,276"
2,Ethereum,"$3,354.87","$15,348,848,451","$399,251,009,209"
3,Binance Coin,$485.06,"$2,055,830,857","$81,476,168,703"
4,Tether,$1.00,"$52,120,926,121","$78,705,696,307"
5,Solana,$150.80,"$2,090,822,331","$47,016,179,326"
6,USD Coin,$1.00,"$3,123,169,883","$44,905,290,330"
7,Cardano,$1.30,"$1,310,584,874","$41,485,425,637"
8,XRP,$0.797448,"$2,397,590,866","$38,050,878,721"
9,Polkadot,$27.16,"$1,170,727,794","$29,156,883,954"
10,Terra,$81.43,"$2,603,848,338","$29,056,815,194"
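
The same two steps generalise to any number of pages. The sketch below shows the loop in plain Python, assuming a hypothetical `fetch_rows(page)` callable that returns a list of row dictionaries for one page (in this notebook it could wrap `get_page_table` and the row-building part of `parse_table`). The pause between requests is an arbitrary, polite default, and the fake fetcher exists only so that the example runs without network access.

```python
import time


def scrape_pages(fetch_rows, first_page=1, last_page=2, delay_seconds=1.0):
    """Collect row dictionaries from several pages into one list.

    `fetch_rows` is a hypothetical callable: page number in,
    list of row dictionaries out.
    """
    all_rows = []
    for page in range(first_page, last_page + 1):
        all_rows.extend(fetch_rows(page))
        if page < last_page:
            time.sleep(delay_seconds)  # pause between requests
    return all_rows


def fake_fetch_rows(page):
    """Stand-in fetcher so the example runs without network access."""
    return [{"page": page, "name": f"coin-{page}-{i}"} for i in range(3)]


rows = scrape_pages(fake_fetch_rows, last_page=3, delay_seconds=0)
print(len(rows))  # 9
```

Passing the combined list to `pd.DataFrame` would give a single table, much like concatenating the per-page DataFrames.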



# Save the extracted information to a CSV file
The last step is to convert our data from a `DataFrame` to a CSV file, a universal data format. CSV files can be used for many things, or read by a spreadsheet program to make a more pleasing presentation of the data.

The column names of the DataFrame (the keys of our dictionaries) become the header row, and the values become the data rows.

To convert the `DataFrame` to a `CSV` file we use the `to_csv()` method.

In [46]:
final_df.to_csv('topcrypto.csv', index=False)

Here `final_df` is the `DataFrame`, `to_csv()` is the method and `topcrypto.csv` is the name given to the `CSV` file. `index=False` means that the index column of the DataFrame is not written to the file.
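
The keys-to-header idea can also be seen directly with the standard library's `csv` module, where dictionary keys become the header row and the values become data rows. This is a sketch of an alternative to `to_csv()`, not what the notebook uses. The two sample rows reuse values from the table shown earlier, and the output goes to an in-memory buffer rather than a file.

```python
import csv
import io

rows = [
    {"sr_no": "1", "name": "Bitcoin", "price": "$43,820.44"},
    {"sr_no": "2", "name": "Ethereum", "price": "$3,354.87"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()    # dictionary keys -> header row
writer.writerows(rows)  # dictionary values -> data rows

csv_text = buffer.getvalue()
print(csv_text)
```

Values that contain commas, such as the prices, are wrapped in double quotes automatically, which matches the quoted values in the output shown earlier.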


Let us have a look at our `CSV` file.
![Imgur](https://i.imgur.com/l1zEBPE.png)

# Summary
 What we have covered in this project:
 - Downloaded the webpages listing the top cryptocurrencies using the `requests` library
 - Parsed the HTML code using BeautifulSoup
 - Created a DataFrame from the extracted information
 - The DataFrame contains the serial number, name, price, 24-hour volume and market cap of each coin
 - Saved the DataFrame to a CSV file

# References

1. Top cryptocurrencies list website: https://www.coingecko.com/en

2. Beautiful Soup documentation: https://beautiful-soup-4.readthedocs.io/en/latest/

3. Requests documentation: https://docs.python-requests.org/en/master/index.html


