# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [newsapi](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest compound score?
3. Which coin had the highest positive score?

In [13]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
from newsapi import NewsApiClient
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/AndrewArgyrou/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [14]:
# Read your api key environment variable
load_dotenv()
api_key = os.getenv("NEWS_API_KEY")
type(api_key)

str

In [15]:
# Create a newsapi client
newsapi = NewsApiClient(api_key=api_key)

In [44]:
# Fetch the Bitcoin news articles
bitcoin_news_en = newsapi.get_everything(
    q="Bitcoin, BTC, btc, bitcoin",
    language="en",
    page_size=100,
    sort_by="relevancy",
)

# Show the total number of articles found
bitcoin_news_en["totalResults"]

2796

In [45]:
# Fetch the Ethereum news articles
ethereum_news_en = newsapi.get_everything(
    q="Ethereum, ETH, eth, ethereum",
    language="en",
    page_size=100,
    sort_by="relevancy",
)

# Show the total number of articles found
ethereum_news_en["totalResults"]

1551

In [46]:
# Create the Bitcoin sentiment scores DataFrame
bitcoin_sentiments = []

for article in bitcoin_news_en["articles"]:
    try:
        text = article["content"].lower()
        title = article["title"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        bitcoin_sentiments.append({"Text": text, "Date": date, "Compound": compound, "Positive": pos, "Negative": neg, "Neutral": neu, "Title Bitcoin": title})
        
    except AttributeError:
        pass
    
# Create DataFrame
bitcoin_df = pd.DataFrame(bitcoin_sentiments)

# Reorder DataFrame columns
cols = ["Title Bitcoin","Text","Date","Compound","Negative","Neutral","Positive",]
bitcoin_df = bitcoin_df[cols]
#bitcoin_df.sort_values(by='Negative',ascending=False,inplace=True)
bitcoin_df.tail()

| | Title Bitcoin | Text | Date | Compound | Negative | Neutral | Positive |
|---|---|---|---|---|---|---|---|
| 95 | Bitcoin Stabilizes at $40K Support; Resistance... | the leader in news and information on cryptocu... | 2022-04-12 | 0.1779 | 0.0 | 0.95 | 0.05 |
| 96 | Bitcoin Neutral, Support at $37K and Resistanc... | bitcoin (btc) continued to bounce around the $... | 2022-04-14 | -0.34 | 0.072 | 0.928 | 0.0 |
| 97 | Bitcoin Holding Support With Higher Price Lows... | bitcoin (btc) has maintained support above $37... | 2022-04-20 | -0.1531 | 0.116 | 0.811 | 0.073 |
| 98 | Market Wrap: Bitcoin Stabilizes as Bearish Sen... | cryptocurrencies were mixed on wednesday as bi... | 2022-04-27 | 0.0 | 0.0 | 1.0 | 0.0 |
| 99 | Market Crash 2022: 2 Cryptocurrencies to Buy f... | financial markets are in a rough patch, as the... | 2022-05-07 | -0.4215 | 0.158 | 0.761 | 0.082 |


In [47]:
# Create the Ethereum sentiment scores DataFrame
ethereum_sentiments = []

for article in ethereum_news_en["articles"]:
    try:
        text = article["content"].lower()
        title = article["title"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        ethereum_sentiments.append({"Text": text, "Date": date, "Compound": compound, "Positive": pos, "Negative": neg, "Neutral": neu, "Title Bitcoin": title})
        
    except AttributeError:
        pass
    
# Create DataFrame
ethereum_df = pd.DataFrame(ethereum_sentiments)

# Reorder DataFrame columns
cols = ["Title Bitcoin","Text","Date","Compound","Negative","Neutral","Positive",]
ethereum_df = ethereum_df[cols]
#ethereum_df.sort_values(by='Negative',ascending=False,inplace=True)
ethereum_df.tail()

| | Title Bitcoin | Text | Date | Compound | Negative | Neutral | Positive |
|---|---|---|---|---|---|---|---|
| 95 | Market Roundup: $APE Surges in a Bummer of a M... | april hasnt been kind to crypto. but there is ... | 2022-04-28 | 0.0803 | 0.107 | 0.802 | 0.091 |
| 96 | BAYC Team Raises $285M With Otherside NFTs, Cl... | after muchado, yuga labs held its long-awaited... | 2022-05-01 | 0.0 | 0.0 | 1.0 | 0.0 |
| 97 | Nvidia Stock Falls on an Analyst Downgrade -- ... | share prices of nvidia ( nvda -1.88% ) have be... | 2022-04-13 | -0.1531 | 0.106 | 0.833 | 0.061 |
| 98 | Treasury Sanctions More North Korea-Linked ETH... | u.s. government officials are throwing a wider... | 2022-04-22 | 0.5106 | 0.0 | 0.867 | 0.133 |
| 99 | Descending channel pattern and weak futures da... | despite bouncing from a 45-day low on april 30... | 2022-05-05 | 0.0772 | 0.096 | 0.824 | 0.08 |


In [48]:
# Describe the Bitcoin Sentiment
bitcoin_df.describe()

| | Compound | Negative | Neutral | Positive |
|---|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 | 100.0 |
| mean | 0.074542 | 0.04327 | 0.89454 | 0.0622 |
| std | 0.365947 | 0.057708 | 0.079298 | 0.055902 |
| min | -0.743 | 0.0 | 0.608 | 0.0 |
| 25% | -0.15705 | 0.0 | 0.845 | 0.0 |
| 50% | 0.128 | 0.0 | 0.908 | 0.062 |
| 75% | 0.296 | 0.068 | 0.95 | 0.08425 |
| max | 0.8387 | 0.224 | 1.0 | 0.24 |


In [49]:
# Describe the Ethereum Sentiment
ethereum_df.describe()

| | Compound | Negative | Neutral | Positive |
|---|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 | 100.0 |
| mean | 0.189138 | 0.03497 | 0.88435 | 0.0807 |
| std | 0.393673 | 0.050533 | 0.08209 | 0.071051 |
| min | -0.7845 | 0.0 | 0.635 | 0.0 |
| 25% | 0.0 | 0.0 | 0.833 | 0.0 |
| 50% | 0.2023 | 0.0 | 0.8905 | 0.073 |
| 75% | 0.488875 | 0.05825 | 0.9435 | 0.1225 |
| max | 0.8934 | 0.231 | 1.0 | 0.332 |


### Questions:

**Q: Which coin had the highest mean positive score?**

A: Ethereum had the highest positive mean score of 0.081 compared to Bitcoin's 0.062.

**Q: Which coin had the highest compound score?**

A: Ethereum had the highest maximum compound score, 0.89, compared to Bitcoin's 0.84. Ethereum also had the higher mean compound score, 0.19, compared to Bitcoin's 0.07.

**Q: Which coin had the highest positive score?**

A: Ethereum had the higher maximum positive score, 0.332, compared to Bitcoin's 0.24.
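
The comparisons above can be double-checked with a few lines of Python. This is only an illustrative sketch: it hard-codes the `mean` and `max` values copied from the two `describe()` outputs in this notebook, and the `summary` dictionary and `highest` helper are hypothetical names, not part of the assignment.

```python
# Values copied from the describe() outputs shown in this notebook.
summary = {
    "Bitcoin": {"mean_positive": 0.0622, "max_compound": 0.8387, "max_positive": 0.24},
    "Ethereum": {"mean_positive": 0.0807, "max_compound": 0.8934, "max_positive": 0.332},
}


def highest(metric):
    """Return the (coin, value) pair with the largest value for `metric`."""
    coin = max(summary, key=lambda name: summary[name][metric])
    return coin, summary[coin][metric]


for metric in ("mean_positive", "max_compound", "max_positive"):
    coin, value = highest(metric)
    print(f"{metric}: {coin} ({value})")
```

With the actual DataFrames, the same numbers should come from calls such as `bitcoin_df["Positive"].mean()` and `ethereum_df["Compound"].max()`.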

---

## 2. Natural Language Processing
---
### Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove punctuation.
3. Remove stopwords.

In [50]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [51]:
# Instantiate the lemmatizer
lemmatizer = WordNetLemmatizer()

# Create a list of stopwords
sw = set(stopwords.words('english'))

# Expand the default stopwords list if necessary
# YOUR CODE HERE!

In [54]:
# Complete the tokenizer function
def tokenizer(text):
    """Lowercases text, strips punctuation, and removes English stopwords."""
    # Remove everything except letters and spaces (punctuation, digits, markup)
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub('', text)

    # Create a tokenized list of the words
    words = word_tokenize(re_clean)

    # Lowercase each word and drop stopwords (uses the `sw` set defined above).
    # Lemmatization is not applied, so plural forms such as "creators" are kept.
    tokens = [word.lower() for word in words if word.lower() not in sw]

    return tokens

In [55]:
# Create a new tokens column for Bitcoin
bitcoin_df["Tokens"] = bitcoin_df["Text"].apply(tokenizer)
bitcoin_df.head()

| | Title Bitcoin | Text | Date | Compound | Negative | Neutral | Positive | Tokens |
|---|---|---|---|---|---|---|---|---|
| 0 | Miami’s Bitcoin Conference Left a Trail of Har... | now, even though there are a number of women-f... | 2022-05-10 | 0.0772 | 0.0 | 0.964 | 0.036 | [even, though, number, womenfocused, crypto, s... |
| 1 | Bitcoin, Ethereum Technical Analysis: BTC Slip... | btc fell to its lowest level since last july, ... | 2022-05-09 | 0.1027 | 0.066 | 0.859 | 0.076 | [btc, fell, lowest, level, since, last, july, ... |
| 2 | 3 Reddit Stocks That Could Roar in Q2 | <ul><li>elon musk will be able to focus on tes... | 2022-04-19 | 0.4404 | 0.0 | 0.921 | 0.079 | [ullielon, musk, able, focus, onteslatsla, pro... |
| 3 | Protecting Retirement Savings from Volatile Cr... | did you hear? you may be able to allocate some... | 2022-05-09 | 0.128 | 0.0 | 0.955 | 0.045 | [hear, may, able, allocate, k, retirement, sav... |
| 4 | 10 things before the opening bell | good morning. the combo of hawkish fed policy ... | 2022-05-03 | 0.2732 | 0.118 | 0.713 | 0.17 | [good, morning, combo, hawkish, fed, policy, h... |


In [57]:
# Create a new tokens column for Ethereum
ethereum_df["Tokens"] = ethereum_df["Text"].apply(tokenizer)
ethereum_df.head()

| | Title Bitcoin | Text | Date | Compound | Negative | Neutral | Positive | Tokens |
|---|---|---|---|---|---|---|---|---|
| 0 | US blames North Korean hacker group for $625 m... | the us department of treasury says lazarus is ... | 2022-04-14 | -0.7845 | 0.231 | 0.681 | 0.088 | [us, department, treasury, says, lazarus, behi... |
| 1 | Bitcoin, Ethereum Technical Analysis: BTC Slip... | btc fell to its lowest level since last july, ... | 2022-05-09 | 0.1027 | 0.066 | 0.859 | 0.076 | [btc, fell, lowest, level, since, last, july, ... |
| 2 | Yuga Labs apologises after sale of virtual lan... | a multi-billion dollar cryptocurrency company ... | 2022-05-02 | -0.2263 | 0.075 | 0.879 | 0.046 | [multibillion, dollar, cryptocurrency, company... |
| 3 | How Bored Ape Yacht Club Broke Ethereum - CNET | when bored ape yacht club creators yuga labs a... | 2022-05-04 | -0.2732 | 0.055 | 0.945 | 0.0 | [bored, ape, yacht, club, creators, yuga, labs... |
| 4 | How the BAYC metaverse mint raised Ethereum ga... | if you ever wanted to buy an nft based on ethe... | 2022-05-02 | -0.1027 | 0.036 | 0.964 | 0.0 | [ever, wanted, buy, nft, based, ethereum, woul... |


---

### NGrams and Frequency Analysis

In this section, you will look at the n-grams and word frequency for each coin.

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 
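
The two steps listed above can be sketched as follows. This is a minimal illustration, not the assignment's solution: `token_lists` is hypothetical sample data standing in for a `Tokens` column, and `zip` from the standard library is used as a stand-in for `nltk.ngrams`, which should yield the same pairs for N = 2.

```python
from collections import Counter

# Hypothetical sample tokens; in the notebook these would come from the
# "Tokens" column of bitcoin_df or ethereum_df.
token_lists = [
    ["bitcoin", "price", "fell", "bitcoin", "price"],
    ["bitcoin", "price", "rose"],
]


def bigrams(tokens):
    """Pair each token with its successor (n-grams for N = 2)."""
    return list(zip(tokens, tokens[1:]))


# Count per article so that no bigram spans two different articles.
bigram_counts = Counter()
word_counts = Counter()
for tokens in token_lists:
    bigram_counts.update(bigrams(tokens))
    word_counts.update(tokens)

top_bigrams = bigram_counts.most_common(10)
top_words = word_counts.most_common(10)
print(top_bigrams)
print(top_words)
```

Counting per article is a design choice: flattening all tokens into one list first is simpler, but it creates artificial bigrams where one article ends and the next begins.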

In [58]:
from collections import Counter
from nltk import ngrams

In [None]:
# Generate the Bitcoin N-grams where N=2
# YOUR CODE HERE!

In [None]:
# Generate the Ethereum N-grams where N=2
# YOUR CODE HERE!

In [None]:
# Function token_count generates the top 10 words for a given coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [None]:
# Use token_count to get the top 10 words for Bitcoin
# YOUR CODE HERE!

In [None]:
# Use token_count to get the top 10 words for Ethereum
# YOUR CODE HERE!

---

### Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news coverage.
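
One possible way to go from token lists to a word cloud is sketched below. The helper names `build_cloud_text` and `show_word_cloud`, the sample tokens, and the 800x400 size are assumptions made for illustration. The plotting libraries are imported inside `show_word_cloud` so that the text-building step can be tried on its own.

```python
def build_cloud_text(token_lists):
    """Join every article's tokens into one space-separated string."""
    return " ".join(" ".join(tokens) for tokens in token_lists)


def show_word_cloud(text, title):
    """Render `text` as a word cloud (requires wordcloud and matplotlib)."""
    # Imported here so build_cloud_text works without the plotting libraries.
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt

    cloud = WordCloud(width=800, height=400).generate(text)
    plt.imshow(cloud)
    plt.axis("off")
    plt.title(title)
    plt.show()


# Hypothetical sample tokens standing in for bitcoin_df["Tokens"].
sample_tokens = [["bitcoin", "price", "fell"], ["bitcoin", "market"]]
cloud_text = build_cloud_text(sample_tokens)
print(cloud_text)
```

In the notebook, a call such as `show_word_cloud(build_cloud_text(bitcoin_df["Tokens"]), "Bitcoin Word Cloud")` would draw the figure.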

In [None]:
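from wordcloud import WordCloud
import matplotlib.pyplot as plt
import matplotlib as mpl
# 'seaborn-whitegrid' was renamed 'seaborn-v0_8-whitegrid' in matplotlib 3.6
new_style = 'seaborn-v0_8-whitegrid'
plt.style.use(new_style if new_style in plt.style.available else 'seaborn-whitegrid')
mpl.rcParams['figure.figsize'] = [20.0, 10.0]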

In [None]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!

In [None]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!

---
## 3. Named Entity Recognition

In this section, you will build a named entity recognition model for both Bitcoin and Ethereum, then visualize the tags using spaCy.
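
The overall flow (concatenate the text, run the NER pipeline, title the document, list the entities) might look like the sketch below. The helper names `concatenate_text` and `extract_entities` and the sample texts are hypothetical. spaCy is imported inside `extract_entities`, which assumes the `en_core_web_sm` model has already been downloaded.

```python
def concatenate_text(texts):
    """Join article texts into one document string, skipping empty entries."""
    return " ".join(text for text in texts if text)


def extract_entities(all_text, title, model="en_core_web_sm"):
    """Run spaCy NER over `all_text`; return the Doc and (text, label) pairs."""
    # Imported here so concatenate_text works without spaCy installed.
    import spacy

    nlp = spacy.load(model)
    doc = nlp(all_text)
    # displaCy reads the optional title from doc.user_data.
    doc.user_data["title"] = title
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return doc, entities


# Hypothetical sample texts standing in for bitcoin_df["Text"].
sample_texts = ["bitcoin fell on monday.", None, "the fed raised rates."]
all_text = concatenate_text(sample_texts)
print(all_text)
```

In the notebook, `doc, entities = extract_entities(concatenate_text(bitcoin_df["Text"]), "Bitcoin NER")` followed by `displacy.render(doc, style="ent")` would produce the visualization, and `entities` would hold the list of tagged spans.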

In [None]:
import spacy
from spacy import displacy

In [None]:
# Download the language model for spaCy
# !python -m spacy download en_core_web_sm

In [None]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [None]:
# Concatenate all of the Bitcoin text together
# YOUR CODE HERE!

In [None]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [None]:
# Render the visualization
# YOUR CODE HERE!

In [None]:
# List all Entities
# YOUR CODE HERE!

---

### Ethereum NER

In [None]:
# Concatenate all of the Ethereum text together
# YOUR CODE HERE!

In [None]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [None]:
# Render the visualization
# YOUR CODE HERE!

In [None]:
# List all Entities
# YOUR CODE HERE!

---