# Tales from the Crypto

## 1. Sentiment Analysis

I will use the News API to pull the latest news articles about Bitcoin and Ethereum, then create a DataFrame of sentiment scores for each coin.

I will answer the following questions using descriptive statistics:

1. Which coin had the highest mean positive score?

2. Which coin had the highest negative score?

3. Which coin had the highest positive score?

In [1]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/jesussaenz/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [2]:
# Read your api key environment variable
load_dotenv()

api_key = os.getenv("NewsAPI_Key") 
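`os.getenv` returns `None` when the variable is missing, and the failure only shows up later as an authentication error from the API. A minimal sketch of an early check follows; `require_env` is a hypothetical helper, not part of the original notebook or of `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message.

    Hypothetical helper: the notebook itself calls os.getenv directly.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; "
            "check that the .env file exists and that load_dotenv() found it."
        )
    return value
```

In the notebook this would replace the bare lookup, for example `api_key = require_env("NewsAPI_Key")` after `load_dotenv()`.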

In [3]:
# Create a newsapi client
from newsapi import NewsApiClient

newsapi = NewsApiClient(api_key=api_key)

In [4]:
# Fetch the 100 most relevant English-language articles about Bitcoin

btc_headlines = newsapi.get_everything(
    q="bitcoin",
    language="en",
    page_size=100,
    sort_by="relevancy"
)

In [5]:
# Print total articles
print(f"""Total articles about Bitcoin: 
      {btc_headlines['totalResults']}""")

Total articles about Bitcoin: 
      6629


In [6]:
# Show sample article
btc_headlines["articles"][0]

{'source': {'id': 'techcrunch', 'name': 'TechCrunch'},
 'author': 'Sarah Perez',
 'title': 'PayPal expands the ability to buy, hold and sell cryptocurrency to the U.K.',
 'description': 'PayPal will now allow users outside the U.S. to buy, hold and sell cryptocurrency for the first time. The company announced today the launch of a new service that will allow customers in the U.K. to select between four types of cryptocurrencies — including Bi…',
 'url': 'http://techcrunch.com/2021/08/23/paypal-expands-the-ability-to-buy-hold-and-sell-cryptocurrency-to-the-u-k/',
 'urlToImage': 'https://techcrunch.com/wp-content/uploads/2020/11/GettyImages-887657568.jpg?w=600',
 'publishedAt': '2021-08-23T13:49:45Z',
 'content': 'PayPal will now allow users outside the U.S. to buy, hold and sell cryptocurrency for the first time. The company announced today the launch of a new service that will allow customers in the U.K. to … [+4420 chars]'}

In [7]:
# Create the Bitcoin sentiment scores DataFrame
btc_sentiments = []

for article in btc_headlines["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        btc_sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
            
        })
        
    except AttributeError:
        pass
    
# Create DataFrame
btc_df = pd.DataFrame(btc_sentiments)

# Reorder DataFrame columns
cols = ["date", "text", "compound", "positive", "negative", "neutral"]
btc_df = btc_df[cols]

btc_df.head()

Unnamed: 0,date,text,compound,positive,negative,neutral
0,2021-08-23,PayPal will now allow users outside the U.S. t...,0.4215,0.098,0.0,0.902
1,2021-09-07,A recently-installed Bitcoin ATM.\r\n\n \n\n A...,0.1779,0.052,0.0,0.948
2,2021-09-07,The government of El Salvador purchased at lea...,0.128,0.046,0.0,0.954
3,2021-08-19,Retailers are increasingly accepting cryptocur...,0.6187,0.153,0.0,0.847
4,2021-08-23,"PayPal is bringing the ability to buy, hold an...",0.6908,0.161,0.0,0.839
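The `compound` column is easier to read once it is bucketed into labels. The sketch below assumes the commonly cited VADER cutoffs of ±0.05; the notebook does not specify any thresholds, and `label_compound` is a hypothetical helper.

```python
def label_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a coarse sentiment label.

    The default threshold of 0.05 is an assumption (a common convention),
    not a value taken from the notebook.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Three compound scores that appear in the notebook's outputs
for score in (0.4215, 0.0, -0.1027):
    print(score, label_compound(score))
```

Applied to the DataFrame, this would look like `btc_df["label"] = btc_df["compound"].apply(label_compound)`.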


In [8]:
# Fetch the 100 most relevant English-language articles about Ethereum

eth_headlines = newsapi.get_everything(
    q="ethereum",
    language="en",
    page_size=100,
    sort_by="relevancy"
)

In [9]:
# Print total articles
print(f"""Total articles about Ethereum: 
      {eth_headlines['totalResults']}""")

Total articles about Ethereum: 
      2818


In [10]:
# Show sample article
eth_headlines["articles"][0]

{'source': {'id': 'techcrunch', 'name': 'TechCrunch'},
 'author': 'Lucas Matney',
 'title': 'Offchain Labs raises $120 million to hide Ethereum’s shortcomings with its Arbitrum product',
 'description': 'As the broader crypto world enjoys a late summer surge in enthusiasm, more and more blockchain developers who have taken the plunge are bumping into the blaring scaling issues faced by decentralized apps on the Ethereum blockchain. The popular network has see…',
 'url': 'http://techcrunch.com/2021/08/31/offchain-labs-raises-120-million-to-hide-ethereums-shortcomings-with-arbitrum-scaling-product/',
 'urlToImage': 'https://techcrunch.com/wp-content/uploads/2021/08/Image-from-iOS-5.jpg?w=533',
 'publishedAt': '2021-08-31T12:30:39Z',
 'content': 'As the broader crypto world enjoys a late summer surge in enthusiasm, more and more blockchain developers who have taken the plunge are bumping into the blaring scaling issues faced by decentralized … [+3414 chars]'}

In [11]:
# Create the Ethereum sentiment scores DataFrame
eth_sentiments = []

for article in eth_headlines["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        eth_sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
            
        })
        
    except AttributeError:
        pass
    
# Create DataFrame
eth_df = pd.DataFrame(eth_sentiments)

# Reorder DataFrame columns
cols = ["date", "text", "compound", "positive", "negative", "neutral"]
eth_df = eth_df[cols]

eth_df.head()

Unnamed: 0,date,text,compound,positive,negative,neutral
0,2021-08-31,As the broader crypto world enjoys a late summ...,0.7351,0.167,0.0,0.833
1,2021-08-23,PayPal will now allow users outside the U.S. t...,0.4215,0.098,0.0,0.902
2,2021-08-23,"PayPal is bringing the ability to buy, hold an...",0.6908,0.161,0.0,0.839
3,2021-08-23,One of the most unusual cryptocurrency heists ...,-0.1027,0.0,0.043,0.957
4,2021-08-18,"Vitalik Buterin, founder of ethereum, during T...",0.0,0.0,0.0,1.0
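The Bitcoin and Ethereum cells run the same scoring loop, so it could be factored into one function. This is a sketch under assumptions: `build_sentiment_df` and its `score_fn` parameter are hypothetical names, and the function skips articles with empty content explicitly instead of catching `AttributeError`.

```python
import pandas as pd


def build_sentiment_df(articles, score_fn):
    """Score each article's content and return one row per scored article.

    score_fn is any callable that maps text to a dict with the keys
    "compound", "pos", "neu" and "neg" (the shape returned by VADER's
    polarity_scores).
    """
    cols = ["date", "text", "compound", "positive", "negative", "neutral"]
    rows = []
    for article in articles:
        text = article.get("content")
        if not text:
            # Skip articles whose content is missing or empty
            continue
        scores = score_fn(text)
        rows.append({
            "date": article["publishedAt"][:10],  # keep only YYYY-MM-DD
            "text": text,
            "compound": scores["compound"],
            "positive": scores["pos"],
            "negative": scores["neg"],
            "neutral": scores["neu"],
        })
    # Passing columns fixes the column order even when rows is empty
    return pd.DataFrame(rows, columns=cols)
```

With the notebook's objects, the two cells would reduce to `btc_df = build_sentiment_df(btc_headlines["articles"], analyzer.polarity_scores)` and the same call for `eth_headlines`.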


In [12]:
# Get descriptive stats from the DataFrames
btc_df.describe()

Unnamed: 0,compound,positive,negative,neutral
count,100.0,100.0,100.0,100.0
mean,0.095451,0.06229,0.03603,0.90167
std,0.372355,0.060293,0.061654,0.082508
min,-0.8934,0.0,0.0,0.622
25%,0.0,0.0,0.0,0.84625
50%,0.0708,0.0505,0.0,0.909
75%,0.3453,0.0965,0.05825,0.9555
max,0.8116,0.213,0.312,1.0


In [13]:
btc_df.describe()[["positive","negative","neutral"]]

Unnamed: 0,positive,negative,neutral
count,100.0,100.0,100.0
mean,0.06229,0.03603,0.90167
std,0.060293,0.061654,0.082508
min,0.0,0.0,0.622
25%,0.0,0.0,0.84625
50%,0.0505,0.0,0.909
75%,0.0965,0.05825,0.9555
max,0.213,0.312,1.0


In [14]:
eth_df.describe()

Unnamed: 0,compound,positive,negative,neutral
count,100.0,100.0,100.0,100.0
mean,0.09685,0.05843,0.03092,0.91065
std,0.35928,0.063145,0.051257,0.076647
min,-0.8934,0.0,0.0,0.688
25%,-0.0418,0.0,0.0,0.86325
50%,0.0,0.052,0.0,0.92
75%,0.4019,0.08875,0.06525,1.0
max,0.8442,0.288,0.312,1.0


In [15]:
eth_df.describe()[["positive","negative","neutral"]]

Unnamed: 0,positive,negative,neutral
count,100.0,100.0,100.0
mean,0.05843,0.03092,0.91065
std,0.063145,0.051257,0.076647
min,0.0,0.0,0.688
25%,0.0,0.0,0.86325
50%,0.052,0.0,0.92
75%,0.08875,0.06525,1.0
max,0.288,0.312,1.0


Bitcoin had the highest mean positive score, but not by much: 0.0623 versus 0.0584 for Ethereum.

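The highest negative score was a tie, with both coins peaking at 0.312, although Bitcoin had the higher mean negative score (0.0360 versus 0.0309). The highest compound score belonged to Ethereum (0.8442 versus 0.8116).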

Ethereum had the highest single positive score: 0.288 versus 0.213 for Bitcoin.
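Reading these answers off two `describe()` tables by eye is error-prone, so the comparison can also be computed directly. The sketch below is illustrative only: `top_coin` is a hypothetical helper, and the two small frames are made-up stand-ins for `btc_df` and `eth_df`, not the notebook's data.

```python
import pandas as pd

# Hypothetical miniature score tables standing in for btc_df and eth_df
btc_scores = pd.DataFrame({
    "compound": [0.42, 0.18, -0.10],
    "positive": [0.10, 0.05, 0.00],
    "negative": [0.00, 0.00, 0.04],
})
eth_scores = pd.DataFrame({
    "compound": [0.74, 0.00, -0.10],
    "positive": [0.17, 0.00, 0.00],
    "negative": [0.00, 0.00, 0.04],
})


def top_coin(frames, column, stat):
    """Return (coin, value) for the coin whose `stat` of `column` is largest.

    frames maps a coin name to its score DataFrame; stat is an aggregation
    name understood by pandas, such as "mean" or "max".
    """
    values = pd.Series({name: df[column].agg(stat) for name, df in frames.items()})
    return values.idxmax(), float(values.max())


frames = {"BTC": btc_scores, "ETH": eth_scores}
print(top_coin(frames, "positive", "mean"))
print(top_coin(frames, "compound", "max"))
print(top_coin(frames, "positive", "max"))
```

On a tie, `idxmax` reports the first coin in `frames`, so an equal maximum (as with the negative scores here) should be checked separately.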

## 2. Natural Language Processing

### Tokenizer
In this section, I will use NLTK and Python to tokenize the text for each coin. The planned cleaning steps are:
- Lowercase each word
- Remove punctuation
- Remove stopwords
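The three steps listed above can be sketched on a single string. To keep the example runnable without downloading NLTK corpora, it assumes a whitespace split and a small hand-written stopword set; in the notebook these roles would be played by `word_tokenize` and `stopwords.words("english")`. The name `clean_tokens` is hypothetical.

```python
import re

# Small stand-in stopword set (an assumption for this sketch);
# NLTK's stopwords.words("english") is much longer.
STOPWORDS = {"a", "an", "the", "of", "to", "and", "is", "will", "now"}


def clean_tokens(text, stop_words=STOPWORDS):
    """Lowercase the text, strip punctuation, and drop stopwords."""
    lowered = text.lower()
    # Remove every character that is neither a word character nor whitespace
    no_punct = re.sub(r"[^\w\s]", "", lowered)
    return [tok for tok in no_punct.split() if tok not in stop_words]


print(clean_tokens("PayPal will now allow users outside the U.S. to buy crypto."))
# → ['paypal', 'allow', 'users', 'outside', 'us', 'buy', 'crypto']
```

Lowercasing before the stopword filter matters because stopword lists are lowercase, so "The" would otherwise slip through.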

In [16]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [17]:
# Instantiate the lemmatizer
lemmatizer = WordNetLemmatizer()

In [18]:
# Separate the text column for tokenization
btc_text = pd.DataFrame(btc_df["text"])

In [19]:
# Separate the text column for tokenization
eth_text = pd.DataFrame(eth_df["text"])

In [39]:
# Create tokenization function
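def tokenizer(docs):
    """Strip punctuation from docs["text"] and split each article into word tokens."""
    # Remove the punctuation from text; regex=True is needed because
    # pandas >= 2.0 treats the pattern as a literal string by default
    docs["text_no_punctuation"] = docs["text"].str.replace(r"[^\w\s]", "", regex=True)

    # Create a tokenized list of the words
    docs["tokenized_text"] = docs["text_no_punctuation"].apply(word_tokenize)
    new_docs = pd.DataFrame(docs[["tokenized_text"]])

    return new_docs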

In [51]:
# BTC articles set tokens
tokens_btc = tokenizer(btc_text)

In [52]:
# Eth articles set tokens
tokens_eth = tokenizer(eth_text)

In [55]:
tokens_btc.head()

Unnamed: 0,tokenized_text
0,"[PayPal, will, now, allow, users, outside, the..."
1,"[A, recentlyinstalled, Bitcoin, ATM, As, of, t..."
2,"[The, government, of, El, Salvador, purchased,..."
3,"[Retailers, are, increasingly, accepting, cryp..."
4,"[PayPal, is, bringing, the, ability, to, buy, ..."
