<h1>Emotion Detection and Information Retrieval</h1>

<h2>Installing Libraries</h2>

In [49]:
#!pip install duckduckgo-search
#!pip install search-engines
#!pip install beautifulsoup4

<h2>Defining Relevant Tokens</h2>

In [50]:
cryptocurrency_name = "bitcoin"
cryptocurrency_symbol = "$BTC"
keywords = f"{cryptocurrency_name} {cryptocurrency_symbol} sentiment"
keywords_for_news = f"{cryptocurrency_name} crypto"

<h2>Retrieve Links from Search Engines</h2>

In [51]:
# References:
############# DuckDuckGo
# (GoogleSearch) 1. https://medium.com/@nutanbhogendrasharma/how-to-scrape-google-search-engines-in-python-44770b8eab5
# (DuckDuckGo)   2. https://pypi.org/project/duckduckgo-search/
# (DuckDuckGo vs GoogleSearch) 3. https://medium.com/hackernoon/duckduckgo-vs-google-what-you-need-to-know-869368b08c4f
# (DuckDuckGo vs GoogleSearch) 4. https://www.cnet.com/tech/mobile/in-ios-17-apple-adds-ability-to-change-search-engine-in-safari-private-browsing/

############# Serch engines like Bing or Yahoo
#https://pypi.org/project/search-engines/

<h3> Importing Libraries </h3>

In [52]:
from duckduckgo_search import DDGS
from search_engines import bing_search, yahoo_search
import requests

In [53]:
MAX_SITES_RESULTS = 100
TIMEOUT_SECONDS = 5

<h3>Functions</h3>

In [54]:
def get_results(search_engine, page_url):
    try:
        response = requests.get(page_url, timeout=TIMEOUT_SECONDS)
        response.raise_for_status()  # Raise an exception for HTTP errors
        html = response.text
        results, next_page_url = search_engine.extract_search_results(html, page_url)
        return results, response.url
    except requests.exceptions.RequestException as e:
        print(f"An error occurred during 'get_results' function execution: {e}")
        return [], None

In [55]:
def fetch_search_results(search_engine, query, max_results):
    search_results = []
    search_url = search_engine.get_search_url(query)

    while len(search_results) < max_results and search_url:
        try:
            next_search_results, search_url = get_results(search_engine, search_url)
            for result in next_search_results:
                if "url" in result:
                    search_results.append(result['url'])
        except Exception as e:
            print(f"An error occurred during 'fetch_search_results' function execution: {e}")
            break

    return search_results[:max_results]


<h3>DuckDuckGo for SearchEngine</h3>

In [56]:
ddgs_results = list(map(lambda r: r["href"], DDGS().text(keywords, max_results=MAX_SITES_RESULTS)))
ddgs_results

['https://alternative.me/crypto/fear-and-greed-index/',
 'https://coincodex.com/sentiment/',
 'https://www.forbes.com/sites/digital-assets/2023/10/05/blackrock-insider-primes-crypto-for-a-huge-177-wall-street-earthquake-that-could-hit-the-price-of-bitcoin-ethereum-and-xrp/',
 'https://www.fxstreet.com/cryptocurrencies/news/bitcoin-weekly-forecast-btc-bearish-fractal-holds-steady-ahead-of-us-nonfarm-payrolls-202310061127',
 'https://www.cnbc.com/2023/10/05/bitcoin-is-a-critical-hedge-against-currency-debasement-and-return-of-inflation-jefferies-says.html',
 'https://www.kaggle.com/datasets/gautamchettiar/bitcoin-sentiment-analysis-twitter-data',
 'https://twitter.com/btcsentimentCOM',
 'https://www.lookintobitcoin.com/charts/bitcoin-fear-and-greed-index/',
 'https://www.forbes.com/sites/roomykhan/2023/10/04/spot-bitcoin-etf-launch-will-solidify-crypto-as-a-distinct-asset-class/',
 'https://www.tradingview.com/news/newsbtc:d44dd6202094b:0-bitcoin-price-projection-soars-btc-gold-ratio-ind

<h3>Bing for SearchEngine</h3>

In [57]:
bing_search_results = fetch_search_results(bing_search, keywords, MAX_SITES_RESULTS)
bing_search_results

Extracted 10 results from page .
Extracted 10 results from page .
Extracted 10 results from page .
Extracted 10 results from page .
Extracted 10 results from page .
Extracted 10 results from page .
Extracted 10 results from page .
Extracted 10 results from page .
Extracted 10 results from page .
Extracted 10 results from page .


['https://alternative.me/crypto/fear-and-greed-index/',
 'https://es.investing.com/crypto/bitcoin/btc-usd-scoreboard',
 'https://www.augmento.ai/bitcoin-sentiment/',
 'https://coincodex.com/sentiment/',
 'https://www.bittsanalytics.com/sentiment-index/BTC',
 'https://www.newsbtc.com/bitcoin-news/bitcoin-sentiment-close-extreme-fear-why-matters/',
 'https://www.fxstreet.com/cryptocurrencies/news/bitcoin-price-could-revisit-10-000-amid-growing-correlation-with-us-dollar-index-202310042140',
 'https://finbold.com/bitcoin-btc-immutable-imx-and-vc-spectra-spct-sustain-bullish-sentiment-in-crypto-community/',
 'https://www.bloomberg.com/news/articles/2023-10-03/bitcoin-btc-price-dips-from-six-week-high-amid-bond-rout',
 'https://www.dailyfx.com/news/bitcoin-ethereum-jump-btc-usd-eth-usd-price-action-20231002.html',
 'https://alternative.me/crypto/fear-and-greed-index/',
 'https://es.investing.com/crypto/bitcoin/btc-usd-scoreboard',
 'https://www.augmento.ai/bitcoin-sentiment/',
 'https://coi

<h3>Yahoo for SearchEngine</h3>

In [58]:
yahoo_search_results = fetch_search_results(yahoo_search, keywords, MAX_SITES_RESULTS)
yahoo_search_results

['https://alternative.me/crypto/fear-and-greed-index/',
 'https://www.augmento.ai/bitcoin-sentiment/',
 'https://coincodex.com/sentiment/',
 'https://coincodex.com/sentiment/',
 'https://www.lookintobitcoin.com/charts/bitcoin-fear-and-greed-index/',
 'https://www.cryptoeq.io/sentiment-report/sentiment/bitcoin',
 'https://alternative.me/crypto/fear-and-greed-index/',
 'https://www.augmento.ai/bitcoin-sentiment/',
 'https://coincodex.com/sentiment/',
 'https://coincodex.com/sentiment/',
 'https://www.lookintobitcoin.com/charts/bitcoin-fear-and-greed-index/',
 'https://www.cryptoeq.io/sentiment-report/sentiment/bitcoin',
 'https://alternative.me/crypto/fear-and-greed-index/',
 'https://www.augmento.ai/bitcoin-sentiment/',
 'https://coincodex.com/sentiment/',
 'https://coincodex.com/sentiment/',
 'https://www.lookintobitcoin.com/charts/bitcoin-fear-and-greed-index/',
 'https://www.cryptoeq.io/sentiment-report/sentiment/bitcoin',
 'https://alternative.me/crypto/fear-and-greed-index/',
 'htt

<h3> Mergin all Results </h3>

In [59]:
search_engines_results = set().union(yahoo_search_results, bing_search_results, ddgs_results)
search_engines_results

{'https://academy.binance.com/en/articles/what-is-crypto-market-sentiment',
 'https://alternative.me/crypto/',
 'https://alternative.me/crypto/fear-and-greed-index/',
 'https://ambcrypto.com/bitcoin-analyzing-the-latest-sentiment-in-btc-derivatives-market/',
 'https://beincrypto.com/unbelievable-bitcoin-price-predictions-2023-top-analysts/',
 'https://bitcoinist.com/bitcoin-sentiment-returns-neutral-prices-down/',
 'https://bitcoinmagazine.com/markets/grayscale-wins-court-case-vs-sec-bitcoin-etf-on-the-horizon',
 'https://bitcoinmagazine.com/markets/social-sentiment-and-the-bitcoin-price',
 'https://bitcointalk.org/index.php?topic=5465299.0',
 'https://coincodex.com/crypto/bitcoin/price-prediction/',
 'https://coincodex.com/fear-greed/',
 'https://coincodex.com/sentiment/',
 'https://coinmarketcap.com/currencies/bitcoin/',
 'https://coinpedia.org/price-prediction/bitcoin-price-prediction/',
 'https://coinpedia.org/research-report/secs-bitcoin-etf-delay-impact-on-investor-sentiments-and

In [60]:
len(search_engines_results)

105

<h2>Web Scraping</h2>

In [61]:
# References:
# (BeautifulSoup) 1. https://ai.plainenglish.io/mastering-web-scraping-and-sentiment-analysis-with-python-and-machine-learning-255d1d6234c5
#                 2. https://j2logo.com/python/web-scraping-con-python-guia-inicio-beautifulsoup/
#                 3. https://www.geeksforgeeks.org/remove-all-style-scripts-and-html-tags-using-beautifulsoup/

<h3> Importing Libraries </h3>

In [62]:
from bs4 import BeautifulSoup

<h3>Functions</h3>

In [63]:
def fetch_page_content(page_url):
    try:
        response = requests.get(page_url, timeout=TIMEOUT_SECONDS)
        response.raise_for_status()
        return response.text
    except Exception as e:
        print(f"Error fetching page content: {e}")
        return None

In [64]:
def remove_unwanted_elements(soup):
    # Implement the logic to remove unwanted elements from the soup object
    pass

In [65]:
def get_mineable_text_from_soup(soup):
    remove_unwanted_elements(soup)
    return " ".join(soup.stripped_strings)

In [66]:
def analyze_page_content(page_url):
    try:
        page_content = fetch_page_content(page_url)
        if page_content:
            soup = BeautifulSoup(page_content, "html.parser")
        
            page_title = soup.find("title").get_text().strip()
            print(f"Title web page: {page_title}")
        
            mineable_text = get_mineable_text_from_soup(soup)
        
            return {
                'title': page_title
            }
    except Exception as e:
        print(f"Error in retrieving {page_url}")
        print(f"An error occurred {e}")
        return None

In [67]:
web_scrap_pages = list(filter(lambda wbp: wbp is not None, map(analyze_page_content, search_engines_results)))
web_scrap_pages

Title web page: Bitcoin price breaches $30,000 on Binance in mega rally
Title web page: Bitcoin Weekly Forecast: BTC bearish fractal holds steady ahead of US Nonfarm Payrolls
Title web page: sentix Bitcoin sentiment index - Crypto Currencies Sentiment
Title web page: 7 Cryptos to Watch as Market Sentiment Hits a Snag | InvestorPlace
Error fetching page content: 403 Client Error: Forbidden for url: https://bitcoinmagazine.com/markets/grayscale-wins-court-case-vs-sec-bitcoin-etf-on-the-horizon
Error fetching page content: 403 Client Error: Forbidden for url: https://www.investing.com/crypto/bitcoin/btc-usd-scoreboard
Title web page: Bitcoin Trader Sentiment Returns To Greed As BTC Jumps Past $25,000
Error fetching page content: 403 Client Error: Forbidden for url: https://www.researchgate.net/publication/346808998_A_complete_vader-based_sentiment_analysis_of_bitcoin_BTC_tweets_during_the_ERA_of_COVID-19
Title web page: Bitcoin Price Prediction as BTC Bulls Eye $30,000 Resistance as Non-F

Error fetching page content: 403 Client Error: Forbidden for url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9581699/
Error fetching page content: 403 Client Error: Forbidden for url: https://beincrypto.com/unbelievable-bitcoin-price-predictions-2023-top-analysts/
Title web page: Trade USD | USD to  | Bitcoin (USD)  | IG International
Title web page: Market Wrap: Bitcoin Sentiment Turns Extremely Bearish
Title web page: Bitcoin: Analyzing the latest sentiment in BTC derivatives market - AMBCrypto
Title web page: Bitcoin (BTC), Immutable (IMX), and VC Spectra (SPCT) Sustain Bullish Sentiment in Crypto Community | Finbold
Title web page: Bitcoin price today, BTC to USD live price, marketcap and chart | CoinMarketCap
Error fetching page content: 403 Client Error: Forbidden for url: https://www.coinbase.com/price/bitcoin
Title web page: Bitcoin (BTC) Price, Real-time Quote & News - Google Finance
Title web page: Bitcoin rally driven by believers collecting evermore tokens
Title web page:

[{'title': 'Bitcoin price breaches $30,000 on Binance in mega rally'},
 {'title': 'Bitcoin Weekly Forecast: BTC bearish fractal holds steady ahead of US Nonfarm Payrolls'},
 {'title': 'sentix Bitcoin sentiment index - Crypto Currencies Sentiment'},
 {'title': '7 Cryptos to Watch as Market Sentiment Hits a Snag | InvestorPlace'},
 {'title': 'Bitcoin Trader Sentiment Returns To Greed As BTC Jumps Past $25,000'},
 {'title': 'Bitcoin Price Prediction as BTC Bulls Eye $30,000 Resistance as Non-Farm Payroll Data is Released â\x80\x93 A Comeback in Play?'},
 {'title': 'Bitcoin Price Prediction 2023, 2024, 2025, 2026 - 2030'},
 {'title': 'The First Bitcoin Mining Pool From El Salvador Is Here'},
 {'title': 'Fear And Greed Index | LookIntoBitcoin'},
 {'title': 'Bloomberg - Are you a robot?'},
 {'title': 'Crypto Dashboard - Alternative.me'},
 {'title': 'Rising Sentiment Dampens Bitcoin’s Correlation With U.S. Stocks'},
 {'title': "Bitcoin bulls in full retreat as BTC sentiment slumps to 'fear' t

<h2>Sentiment Analysis</h2>

<h2>Storage of Sentiment and Pages</h2>