# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [newsapi](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest negative score?
3. Which coin had the highest positive score?

In [47]:
# Initial imports
import os
import pandas as pd
import numpy as np
from dotenv import load_dotenv, find_dotenv
import nltk as nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/manoloserrano/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [13]:
# Read your api key environment variable
load_dotenv(find_dotenv())
NEWS_API_KEY= os.environ.get('NEWS_API_KEY')



In [14]:
# Create a newsapi client
from newsapi import NewsApiClient

api=NewsApiClient(api_key=NEWS_API_KEY)



In [15]:
# Fetch the Bitcoin news articles
Bitcoin_headlines=api.get_everything(
    q='bitcoin',
    language='en',
    page_size=100,
    sort_by='relevancy'
 )
print(f"Total articles about Bitcoin:{Bitcoin_headlines['totalResults']}")
Bitcoin_headlines['articles'][0]

Total articles about Bitcoin:6459


{'source': {'id': 'engadget', 'name': 'Engadget'},
 'author': 'Jon Fingas',
 'title': 'AMC theaters will accept cryptocurrencies beyond Bitcoin',
 'description': "You won't have to stick to Bitcoin if you're determined to pay for your movie ticket with cryptocurrency. AMC chief Adam Aron has revealed his theater chain will also accept Ethereum, Litecoin and Bitcoin Cash when crypto payments are available. He didn't hav…",
 'url': 'https://www.engadget.com/amc-theaters-accept-ethereum-litecoin-bitcoin-cash-132642183.html',
 'urlToImage': 'https://s.yimg.com/os/creatr-uploaded-images/2021-09/4a01cb80-16eb-11ec-abfe-c7b840dd48ca',
 'publishedAt': '2021-09-16T13:26:42Z',
 'content': "You won't have to stick to Bitcoin if you're determined to pay for your movie ticket with cryptocurrency. AMC chief Adam Aron has revealed his theater chain will also accept Ethereum, Litecoin and Bi… [+1198 chars]"}

In [19]:
# Fetch the Ethereum news articles
Ethereum_headlines=api.get_everything(
    q='ethereum',
    language='en',
    page_size=100,
    sort_by='relevancy'
)
print(f"Total articles about Ethereum:{Ethereum_headlines['totalResults']}")
Ethereum_headlines['articles'][0]

Total articles about Ethereum:2606


{'source': {'id': 'the-verge', 'name': 'The Verge'},
 'author': 'Kim Lyons',
 'title': 'China’s central bank bans cryptocurrency transactions to avoid ‘risks’',
 'description': 'China’s central bank on Friday said cryptocurrency transactions in the country are illegal, banning all transactions. It said cryptocurrencies like bitcoin and Ethereum are not legal tender and can’t be circulated.',
 'url': 'https://www.theverge.com/2021/9/24/22691472/china-central-bank-cryptocurrency-illegal-bitcoin',
 'urlToImage': 'https://cdn.vox-cdn.com/thumbor/mde_l3lUC4muDPEFG7LYrUz0O3g=/0x146:2040x1214/fit-in/1200x630/cdn.vox-cdn.com/uploads/chorus_asset/file/8921023/acastro_bitcoin_2.jpg',
 'publishedAt': '2021-09-24T16:22:55Z',
 'content': 'Its the countrys latest crackdown on digital currencies\r\nIllustration by Alex Castro / The Verge\r\nThe Peoples Bank of China, the countrys central bank, said Friday that cryptocurrency transactions ar… [+1461 chars]'}

In [20]:
# Create the Bitcoin sentiment scores DataFrame

Bitcoin_sentiments = []

for article in Bitcoin_headlines['articles']:
    try:
        text = article['content']
        date = article['publishedAt'][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment['pos']
        neu = sentiment['neu']
        neg = sentiment['neg']
        
        Bitcoin_sentiments.append({
            "text": text,
            "date": date,
            'compound': compound,
            'positive': pos,
            'negative': neg,
            'neutral' : neu,
            
        })
        
    except AttributeError:
        pass
    
bitcoin_df = pd.DataFrame(Bitcoin_sentiments)

cols = ['date', 'text', 'compound', 'positive', 'negative','neutral']
bitcoin_df = bitcoin_df[cols]

bitcoin_df.head()

Unnamed: 0,date,text,compound,positive,negative,neutral
0,2021-09-16,You won't have to stick to Bitcoin if you're d...,0.5574,0.127,0.036,0.838
1,2021-09-23,Four months after Twitter first introduced in-...,0.0,0.0,0.0,1.0
2,2021-10-06,"<ul><li>Bitcoin, in terms of market value, ros...",0.34,0.076,0.0,0.924
3,2021-09-25,Bitcoin and similar blockchain-based cryptos e...,0.0,0.0,0.0,1.0
4,2021-10-04,JPMorgan CEO Jamie Dimon is still not a Bitcoi...,-0.2411,0.0,0.116,0.884


In [26]:
# Create the Ethereum sentiment scores DataFrame
Ethereum_sentiments = []

for article in Ethereum_headlines['articles']:
    try:
        text = article['content']
        date = article['publishedAt'][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment['pos']
        neu = sentiment['neu']
        neg = sentiment['neg']
        
        Ethereum_sentiments.append({
            "text": text,
            "date": date,
            'compound': compound,
            'positive': pos,
            'negative': neg,
            'neutral' : neu,
            
        })
        
    except AttributeError:
        pass
    
Ethereum_df = pd.DataFrame(Ethereum_sentiments)

cols = ['date', 'text', 'compound', 'positive', 'negative','neutral']
Ethereum_df = Ethereum_df[cols]

Ethereum_df.head()

Unnamed: 0,date,text,compound,positive,negative,neutral
0,2021-09-24,Its the countrys latest crackdown on digital c...,0.0,0.0,0.0,1.0
1,2021-09-16,You won't have to stick to Bitcoin if you're d...,0.5574,0.127,0.036,0.838
2,2021-09-16,OpenSea isn't wasting much time after its head...,0.2865,0.18,0.126,0.694
3,2021-09-22,Robinhood plans to start a cryptocurrency wall...,0.4588,0.108,0.0,0.892
4,2021-09-25,"image source, foundation.app/@SideEyeingChloe\...",0.4215,0.101,0.0,0.899


In [27]:
# Describe the Bitcoin Sentiment
bitcoin_df.describe()

TypeError: Cannot interpret '<attribute 'dtype' of 'numpy.generic' objects>' as a data type

In [24]:
# Describe the Ethereum Sentiment
Ethereum_df.describe()

TypeError: Cannot interpret '<attribute 'dtype' of 'numpy.generic' objects>' as a data type

In [329]:
BitcoinCont=Bitcoin['content']
sent_tokenize(BitcoinCont[1])

['Four months after Twitter first introduced in-app tipping, the company is expanding its tip jar feature in a major way.',
 'The company is opening up tipping to all its users globally, and for the first … [+2390 chars]']

### Questions:

Q: Which coin had the highest mean positive score?

A: Ethereum

Q: Which coin had the highest compound score?

A: Ethereum

Q. Which coin had the highest positive score?

A: Ethereum

---

## 2. Natural Language Processing
---
###   Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove Punctuation.
3. Remove Stopwords.

In [28]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [29]:
def tokenize(txt):
    tokens = re.split('W+', txt)
    return tokens
data['msg_clean_tokenized'] = da['msg_clean'].apply(lambda x:tokenize(x.lower()))

NameError: name 'da' is not defined

In [32]:
# Instantiate the lemmatizer
lemmatizer = WordNetLemmatizer()
nltk.download('stopwords')
# Create a list of stopwords
sw = set(stopwords.words('english'))
first_result = [word.lower() for word in words if word.lower() not in sw]
# Expand the default stopwords list if necessary


[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/manoloserrano/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


NameError: name 'words' is not defined

In [33]:
# Complete the tokenizer function
Bitcoin_content=Bitcoin_headlines['articles'][0]
article=Bitcoin_content['content']
    # Remove the punctuation from text
def process_text(article):
    sw = set(stopwords.words('english'))
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub('', article)
    words = word_tokenize(re_clean)
    lem = [lemmatizer.lemmatize(word) for word in words]
    output = [word.lower() for word in lem if word.lower() not in sw]
    return output


processed = process_text(article)
print(processed)
    # Create a tokenized list of the words
    
    
    # Lemmatize words into root words

   
    # Convert the words to lowercase
    
    
    # Remove the stop words
    
    
    

['wont', 'stick', 'bitcoin', 'youre', 'determined', 'pay', 'movie', 'ticket', 'cryptocurrency', 'amc', 'chief', 'adam', 'aron', 'ha', 'revealed', 'theater', 'chain', 'also', 'accept', 'ethereum', 'litecoin', 'bi', 'char']


In [34]:
# Create a new tokens column for Bitcoin
BitcoinCont=Bitcoin_content['content']
sent_tokenize(BitcoinCont[1])

['o']

In [35]:
# Create a new tokens column for Ethereum


---

### NGrams and Frequency Analysis

In this section you will look at the ngrams and word frequency for each coin. 

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 

In [36]:
from collections import Counter
from nltk import ngrams

In [37]:
# Generate the Bitcoin N-grams where N=2
bigram_counts = Counter(ngrams(processed, n-2))
print(dict(bigram_counts))

NameError: name 'n' is not defined

In [38]:
# Generate the Ethereum N-grams where N=2
# YOUR CODE HERE!

In [39]:
# Function token_count generates the top 10 words for a given coin
def token_count(tokens, N=3):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [40]:
# Use token_count to get the top 10 words for Bitcoin
# YOUR CODE HERE!

In [41]:
# Use token_count to get the top 10 words for Ethereum
# YOUR CODE HERE!

---

### Word Clouds

In this section, you will generate word clouds for each coin to summarize the news for each coin

In [42]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [49]:
Bitcoin_news_df

Unnamed: 0,Word,Frequency
0,articles,1.0
1,status,1.0
2,totalresults,1.0


In [50]:
# Generate the Bitcoin word cloud
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(Bitcoin_headlines)

Bitcoin_news_df = pd.DataFrame(
    list(zip(vectorizer.get_feature_names(), np.ravel(X.sum(axis=0)))),
    columns=["Word", "Frequency"],
)

Bitcoin_news_df = Bitcoin_news_df.sort_values(by=["Frequency"], ascending=False)

top_words=Bitcoin_headlines[
    (Bitcoin_headlines["Frequency"]>=10) & (Bitcoin_headlines['Frequency']<+30)
]


terms_list = str(top_words['Word'].tolist())

wordcloud = WordCloud(colormap="RdYlBu").generate(terms_list)
plt.imshow(wordcloud)
plt.axis("off")
fontdict = {'fontsize': 20, 'fontweight' : 'bold'}
plt.title('Bitcoin Word Cloud', fontdict=fontdict)
plt.show

KeyError: 'Frequency'

In [23]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!

---
## 3. Named Entity Recognition

In this section, you will build a named entity recognition model for both Bitcoin and Ethereum, then visualize the tags using SpaCy.

In [24]:
import spacy
from spacy import displacy

In [25]:
# Download the language model for SpaCy
# !python -m spacy download en_core_web_sm

In [26]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [27]:
# Concatenate all of the Bitcoin text together
# YOUR CODE HERE!

In [28]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [29]:
# Render the visualization
# YOUR CODE HERE!

In [30]:
# List all Entities
# YOUR CODE HERE!

---

### Ethereum NER

In [31]:
# Concatenate all of the Ethereum text together
# YOUR CODE HERE!

In [32]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [33]:
# Render the visualization
# YOUR CODE HERE!

In [34]:
# List all Entities
# YOUR CODE HERE!

---