# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [News API](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum, then create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest compound score?
3. Which coin had the highest positive score?

In [3]:
# Initial imports
import os
import pandas as pd
from newsapi import NewsApiClient
from dotenv import load_dotenv
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/atrmac/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [5]:
# Read your api key environment variable
load_dotenv()
api_key = os.getenv("news_api")
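
`os.getenv` returns `None` when the variable is missing, so a misnamed or absent key tends to show up only later, away from its cause. A minimal sketch of an early check, assuming the key is stored under the same `news_api` name used above; `require_env` is a hypothetical helper, not part of `python-dotenv` or `newsapi`:

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set; check your .env file.")
    return value
```

After `load_dotenv()`, the key could then be read with `api_key = require_env("news_api")`.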

In [7]:
# Create a newsapi client
news_API = NewsApiClient(api_key)
news_API

<newsapi.newsapi_client.NewsApiClient at 0x7f8dc4738dd0>

In [20]:
# Fetch the Bitcoin news articles
bit_news = news_API.get_everything(
    q="Bitcoin",
    language="en"
)

# Verify data & print first 5 articles
bit_news["articles"][:5]

[{'source': {'id': None, 'name': 'Lifehacker.com'},
  'author': 'Jeff Somers',
  'title': 'Is the Crypto Bubble Going to Burst?',
  'description': 'Even if you aren’t paying attention to Bitcoin and other cryptocurrencies, you might have noticed that their value plummeted last week, with the total value of the market tumbling from a high of $3 trillion last year to about $1.5 trillion in a matter of days…',
  'url': 'https://lifehacker.com/is-the-crypto-bubble-going-to-burst-1848475768',
  'urlToImage': 'https://i.kinja-img.com/gawker-media/image/upload/c_fill,f_auto,fl_progressive,g_center,h_675,pg_1,q_80,w_1200/976a59b09e0e681e692bd7517498e3f2.jpg',
  'publishedAt': '2022-02-09T16:00:00Z',
  'content': 'Even if you arent paying attention to Bitcoin and other cryptocurrencies, you might have noticed that their value plummeted last week, with the total value of the market tumbling from a high of $3 tr… [+4782 chars]'},
 {'source': {'id': 'the-verge', 'name': 'The Verge'},
  'author': '

In [21]:
# Fetch the Ethereum news articles
eth_news = news_API.get_everything(
    q="Ethereum",
    language="en"
)

# Verify data & print first 5 articles
eth_news["articles"][:5]

[{'source': {'id': 'the-verge', 'name': 'The Verge'},
  'author': 'Corin Faife',
  'title': 'Crypto.com admits over $30 million stolen by hackers',
  'description': 'Cryptocurrency exchange Crypto.com has said that $15 million in ethereum and $18 million in bitcoin were stolen by hackers in a security breach',
  'url': 'https://www.theverge.com/2022/1/20/22892958/crypto-com-exchange-hack-bitcoin-ethereum-security',
  'urlToImage': 'https://cdn.vox-cdn.com/thumbor/mde_l3lUC4muDPEFG7LYrUz0O3g=/0x146:2040x1214/fit-in/1200x630/cdn.vox-cdn.com/uploads/chorus_asset/file/8921023/acastro_bitcoin_2.jpg',
  'publishedAt': '2022-01-20T13:23:31Z',
  'content': 'In a new blog post the company said that 4,836 ETH and 443 bitcoin were taken\r\nIllustration by Alex Castro / The Verge\r\nIn a blog post published in the early hours of Thursday morning, cryptocurrency… [+2004 chars]'},
 {'source': {'id': None, 'name': 'Gizmodo.com'},
  'author': 'Matt Novak',
  'title': "Hackers Launder $15 Million Stole

In [25]:
# Create the Bitcoin sentiment scores DataFrame
bit_sent = []

for article in bit_news["articles"]:
    try:
        date = article["publishedAt"][:10]
        text = article["content"]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        bit_sent.append({
            "date": date,
            "text": text,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
            
        })
        
    except AttributeError:
        pass
    
# Create DataFrame
bit_df = pd.DataFrame(bit_sent)

# Print first 5 rows of dataframe

bit_df.head()

| | date | text | compound | positive | negative | neutral |
|---|---|---|---|---|---|---|
| 0 | 2022-02-09 | Even if you arent paying attention to Bitcoin ... | 0.5859 | 0.124 | 0.0 | 0.876 |
| 1 | 2022-01-25 | El Salvador introduced Bitcoin as a legal tend... | 0.3182 | 0.105 | 0.0 | 0.895 |
| 2 | 2022-01-27 | Israeli national Tal Prihar pled guilty to rou... | -0.3182 | 0.045 | 0.084 | 0.871 |
| 3 | 2022-01-20 | In a new blog post the company said that 4,836... | 0.0 | 0.0 | 0.0 | 1.0 |
| 4 | 2022-02-11 | Netflix\r\n is making a docuseries about one o... | -0.7096 | 0.0 | 0.169 | 0.831 |


In [27]:
# Create the Ethereum sentiment scores DataFrame
eth_sent = []

for article in eth_news["articles"]:
    try:
        date = article["publishedAt"][:10]
        text = article["content"]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        eth_sent.append({
            "date": date,
            "text": text,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
            
        })
        
    except AttributeError:
        pass
    
# Create DataFrame
eth_df = pd.DataFrame(eth_sent)

# Print first 5 rows of dataframe

eth_df.head()

| | date | text | compound | positive | negative | neutral |
|---|---|---|---|---|---|---|
| 0 | 2022-01-20 | In a new blog post the company said that 4,836... | 0.0 | 0.0 | 0.0 | 1.0 |
| 1 | 2022-01-19 | Hackers who made off with roughly $15 million ... | 0.0 | 0.0 | 0.0 | 1.0 |
| 2 | 2022-01-20 | On some level, the new mayor is simply employi... | 0.1779 | 0.052 | 0.0 | 0.948 |
| 3 | 2022-01-21 | Back in September\r\n, Robinhood announced pla... | 0.0772 | 0.038 | 0.0 | 0.962 |
| 4 | 2022-02-10 | If people who buy cryptocurrencies intended on... | -0.2023 | 0.039 | 0.062 | 0.899 |
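
The Bitcoin and Ethereum loops above are identical apart from their input, so they could be factored into one helper. The sketch below is an illustration rather than the notebook's own code: `sentiment_rows` is a hypothetical name, and it skips articles whose `content` is `None` explicitly instead of catching `AttributeError`.

```python
def sentiment_rows(articles, analyzer):
    """Build one dict of sentiment scores per article that has text content."""
    rows = []
    for article in articles:
        text = article.get("content")
        if text is None:
            # No content to score, so skip the article
            continue
        scores = analyzer.polarity_scores(text)
        rows.append({
            "date": article["publishedAt"][:10],
            "text": text,
            "compound": scores["compound"],
            "positive": scores["pos"],
            "negative": scores["neg"],
            "neutral": scores["neu"],
        })
    return rows
```

With the objects defined earlier, `pd.DataFrame(sentiment_rows(bit_news["articles"], analyzer))` should produce the same columns as `bit_df`, and likewise for `eth_news`.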


In [28]:
# Describe the Bitcoin Sentiment
bit_df.describe()

| | compound | positive | negative | neutral |
|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | 0.13952 | 0.08175 | 0.0429 | 0.87535 |
| std | 0.457346 | 0.059843 | 0.058105 | 0.065999 |
| min | -0.7096 | 0.0 | 0.0 | 0.729 |
| 25% | -0.30155 | 0.04325 | 0.0 | 0.8355 |
| 50% | 0.21395 | 0.064 | 0.0 | 0.876 |
| 75% | 0.5859 | 0.12925 | 0.0885 | 0.917 |
| max | 0.7783 | 0.185 | 0.169 | 1.0 |


In [29]:
# Describe the Ethereum Sentiment
eth_df.describe()

| | compound | positive | negative | neutral |
|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | 0.06772 | 0.05075 | 0.02685 | 0.9224 |
| std | 0.301101 | 0.052973 | 0.045677 | 0.071167 |
| min | -0.6808 | 0.0 | 0.0 | 0.775 |
| 25% | -0.06745 | 0.0 | 0.0 | 0.88975 |
| 50% | 0.0 | 0.0435 | 0.0 | 0.9425 |
| 75% | 0.19 | 0.06325 | 0.05 | 0.973 |
| max | 0.6808 | 0.185 | 0.174 | 1.0 |


### Questions:

Q: Which coin had the highest mean positive score?

A: Bitcoin had the higher mean positive score: 0.08175, compared with Ethereum's 0.05075.

Q: Which coin had the highest compound score?

A: Bitcoin had the higher maximum compound score: 0.7783, compared with Ethereum's 0.6808.

Q: Which coin had the highest positive score?

A: Both coins had the same maximum positive score, 0.185.
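
The answers above can be cross-checked directly against the two `describe()` outputs. The sketch below copies the relevant `mean` and `max` values by hand into a plain dictionary; the `leaders` helper and the metric names are hypothetical. It also reports the highest negative score, which the tables contain but the answers do not discuss.

```python
# Summary statistics copied from the two describe() outputs above
summary = {
    "Bitcoin": {"mean_positive": 0.08175, "max_compound": 0.7783,
                "max_positive": 0.185, "max_negative": 0.169},
    "Ethereum": {"mean_positive": 0.05075, "max_compound": 0.6808,
                 "max_positive": 0.185, "max_negative": 0.174},
}


def leaders(summary, metric):
    """Return (best value, coins that reach it); a tie returns several coins."""
    best = max(stats[metric] for stats in summary.values())
    return best, [coin for coin, stats in summary.items() if stats[metric] == best]


for metric in ("mean_positive", "max_compound", "max_positive", "max_negative"):
    value, coins = leaders(summary, metric)
    print(f"{metric}: {value} ({' and '.join(coins)})")

# mean_positive: 0.08175 (Bitcoin)
# max_compound: 0.7783 (Bitcoin)
# max_positive: 0.185 (Bitcoin and Ethereum)
# max_negative: 0.174 (Ethereum)
```

With the DataFrames in memory, the same comparison could be made from `bit_df.describe()` and `eth_df.describe()` instead of hand-copied numbers.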

---

## 2. Natural Language Processing
---
### Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove Punctuation.
3. Remove Stopwords.

In [30]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [31]:
# Instantiate the lemmatizer
lemmatizer = WordNetLemmatizer()

In [32]:
# Create a list of stopwords
sw = set(stopwords.words('english'))

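# Expand the default stopwords list if necessary
# "chars" comes from the truncation marker (e.g. "[+4782 chars]") at the end of the article content shown above
sw_addon = {"chars"}
sw = sw.union(sw_addon)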

In [12]:
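# Complete the tokenizer function
def tokenizer(text):
    """Tokenizes text."""
    
    # Remove the punctuation from text (keep letters and whitespace only)
    regex = re.compile(r"[^a-zA-Z\s]")
    re_clean = regex.sub("", text)
   
    # Create a tokenized list of the words
    words = word_tokenize(re_clean)
    
    # Lemmatize words into root words
    lem = [lemmatizer.lemmatize(word) for word in words]
   
    # Convert the words to lowercase and remove the stop words
    tokens = [word.lower() for word in lem if word.lower() not in sw]
    
    return tokens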

In [13]:
# Create a new tokens column for Bitcoin
# YOUR CODE HERE!

In [14]:
# Create a new tokens column for Ethereum
# YOUR CODE HERE!

---

### NGrams and Frequency Analysis

In this section, you will look at the n-grams and word frequency for each coin.

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 

In [15]:
from collections import Counter
from nltk import ngrams

In [16]:
# Generate the Bitcoin N-grams where N=2
# YOUR CODE HERE!

In [17]:
# Generate the Ethereum N-grams where N=2
# YOUR CODE HERE!

In [18]:
# Function token_count generates the top 10 words for a given coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [19]:
# Use token_count to get the top 10 words for Bitcoin
# YOUR CODE HERE!

In [20]:
# Use token_count to get the top 10 words for Ethereum
# YOUR CODE HERE!
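
One way the bigram and frequency steps above could be approached is sketched below. It assumes each article has already been tokenized into a list of strings; `top_ngrams` and `sample_tokens` are hypothetical names, and the sample tokens are made up for illustration. The fallback definition is only there so the sketch still runs where NLTK is not installed; it yields the same tuples of consecutive tokens.

```python
from collections import Counter

try:
    from nltk import ngrams
except ImportError:
    # Plain-Python fallback that yields the same tuples of n consecutive tokens
    def ngrams(sequence, n):
        sequence = list(sequence)
        return zip(*(sequence[i:] for i in range(n)))


def top_ngrams(token_lists, n=2, top=10):
    """Count n-grams within each article's token list and return the most common ones."""
    counts = Counter()
    for tokens in token_lists:
        # Counting per article avoids n-grams that span two unrelated articles
        counts.update(ngrams(tokens, n))
    return counts.most_common(top)


# Hypothetical token lists standing in for a `tokens` column
sample_tokens = [
    ["bitcoin", "price", "fell", "bitcoin", "price", "rose"],
    ["bitcoin", "price", "steady"],
]
print(top_ngrams(sample_tokens, n=2, top=3))
```

For single-word counts, flattening the token lists and passing the result to `token_count` gives the top words directly.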

---

### Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news coverage.

In [21]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Matplotlib 3.6+ renamed this style to 'seaborn-v0_8-whitegrid'; use that name on newer versions
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [22]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!

In [23]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!
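
`WordCloud.generate` takes a single string, so per-article token lists need to be joined first. A minimal sketch, assuming a column of token lists exists; `cloud_text` and `plot_word_cloud` are hypothetical helpers, and the `width` and `height` values are arbitrary choices:

```python
def cloud_text(token_lists):
    """Flatten per-article token lists into one space-separated string."""
    return " ".join(token for tokens in token_lists for token in tokens)


def plot_word_cloud(token_lists, title):
    """Draw a word cloud; needs the third-party wordcloud and matplotlib packages."""
    # Imported here so that cloud_text stays usable without the plotting stack
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt

    cloud = WordCloud(width=800, height=400).generate(cloud_text(token_lists))
    plt.imshow(cloud)
    plt.axis("off")
    plt.title(title)
    plt.show()
```

If the DataFrames had a `tokens` column, a call such as `plot_word_cloud(bit_df["tokens"], "Bitcoin Word Cloud")` would draw the Bitcoin cloud, and the same call with `eth_df` would draw the Ethereum one.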

---
## 3. Named Entity Recognition

In this section, you will run spaCy's named entity recognition on the Bitcoin and Ethereum text, then visualize the tags with displaCy.

In [24]:
import spacy
from spacy import displacy

In [25]:
# Download the language model for spaCy
# !python -m spacy download en_core_web_sm

In [26]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [27]:
# Concatenate all of the Bitcoin text together
# YOUR CODE HERE!

In [28]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [29]:
# Render the visualization
# YOUR CODE HERE!

In [30]:
# List all Entities
# YOUR CODE HERE!
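
The four steps above (concatenate, process, title, list entities) could be wrapped as sketched below. This is an illustration under assumptions, not the assignment's reference solution: `run_ner` and `list_entities` are hypothetical names, and joining the articles with a single space is one choice among several.

```python
def run_ner(texts, title, model_name="en_core_web_sm"):
    """Join article texts, run a spaCy pipeline on them, and attach a displaCy title."""
    # Imported lazily; requires spaCy and the named model to be installed
    import spacy

    nlp = spacy.load(model_name)
    doc = nlp(" ".join(texts))
    # displaCy reads the title from the document's user_data
    doc.user_data["title"] = title
    return doc


def list_entities(doc):
    """Return (text, label) pairs for every entity found in a processed document."""
    return [(ent.text, ent.label_) for ent in doc.ents]
```

A possible usage is `doc = run_ner(bit_df["text"], "Bitcoin NER")`, followed by `displacy.render(doc, style="ent")` and `list_entities(doc)`. The Ethereum section can reuse both helpers with `eth_df["text"]`.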

---

### Ethereum NER

In [31]:
# Concatenate all of the Ethereum text together
# YOUR CODE HERE!

In [32]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [33]:
# Render the visualization
# YOUR CODE HERE!

In [34]:
# List all Entities
# YOUR CODE HERE!

---