# "Are NFTs Bad?": Naive Bayes for news analysis

## Goal
* Use Naive Bayes to classify recent news headlines from around the world and estimate how many portray NFTs as good or bad.

## Considerations
* I'm using the News API (https://newsapi.org/).
* You'll need an API key to use the API; mine is not included in the repo.
* I'll keep a dataframe with data harvested from the API and save it to disk, so the API is not queried more than necessary.
* I'll tweak Naive Bayes hyperparameters to see how results change, if it comes to that (it didn't).


# Stage 1: Data Gathering

1. Prepare API
2. Run API for keywords
3. Save data

Some considerations from the API documentation:
* Free accounts can only retrieve 100 articles per query, which means up to 5 pages of 20 articles per page.
* Free accounts can only look back 30 days in time.
* Free accounts can only run 120 queries every 24 hours.

Approach:
* Query a sliding window of time, up to a certain max_date. News after max_date will be used for testing.
* Make the sliding window small, so it is possible to retrieve as many articles as possible within the constraints of the developer license.
* Don't make the sliding window too small, because we only have 120 queries per day. At 20 articles per query, we can obtain at most 2,400 articles per day (120 × 20).
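
The trade-off in the approach above can be sketched with a few lines of date arithmetic. This is a minimal sketch, not the notebook's class: `build_windows` and the constants are hypothetical names, and the 5-day window, 4 windows and 2-day holdout are assumptions chosen to mirror the limits quoted in this section.

```python
import datetime

QUERIES_PER_DAY = 120      # free-tier limit quoted above
ARTICLES_PER_PAGE = 20     # page size quoted above
PAGES_PER_WINDOW = 5       # 100 articles per query window


def build_windows(today, holdout_days=2, window_days=5, n_windows=4):
    """Return (windows, max_date): consecutive date windows ending at max_date."""
    max_date = today - datetime.timedelta(days=holdout_days)
    windows = []
    for i in range(n_windows):
        end = max_date - datetime.timedelta(days=window_days * i)
        start = end - datetime.timedelta(days=window_days)
        windows.append({'from': start.isoformat(), 'to': end.isoformat()})
    return windows, max_date


# A fixed date keeps the example reproducible
windows, max_date = build_windows(datetime.date(2022, 3, 3))
queries_needed = len(windows) * PAGES_PER_WINDOW
print(max_date, windows[0], queries_needed)
print('daily article ceiling:', QUERIES_PER_DAY * ARTICLES_PER_PAGE)
```

With these assumed settings, one full harvest costs 20 of the 120 daily queries, which leaves room to rerun it or to add keywords.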

In [16]:
import os
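# Load credentials from an external file
API_KEY = ''
with open('../../../Documents/TOKENS/newsapi/newsapi-api-key.txt') as file:
    # strip() removes the trailing newline without cutting off a character
    # when the file does not end with one
    API_KEY = file.readline().strip()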

In [126]:
import requests
import pandas as pd
import datetime 


class NewsApi(object):
    """
    This is a very basic class that will contain the authentication
    and a method to retrieve and parse the data.
    
    Not going to worry too much about error handling here.
    """
    
    def __init__(self, API_KEY):
        """
        When instantiating the class you need to pass 
        the API_KEY as an argument
        """
        self.api_key = API_KEY
        self.newsurl = 'https://newsapi.org/v2/everything?'
        self.query = ''
        self.language = 'en'
        self.news_sorting = 'relevancy'
        self.timeranges = []
        self.max_date = ''
        
    def get_news_page(self, query, page, timerange):
        """
        Request one page of results for a query string within a time window.
        Returns the raw response; call .json() on it to see its content.
        
        * query is the keyword to search for
        * page is the page number used to flip through the articles of a query
        * timerange is a dict with 'from' and 'to' dates
        """
        request_url = (f'{self.newsurl}'
                f'q={query}&'  # actual query
                f'language={self.language}&'  # choose your language
                f'page={page}&'
                f'from={timerange["from"]}&'
                f'to={timerange["to"]}&'
                f'sortBy={self.news_sorting}&'  # the News API parameter is sortBy
                f'apiKey={self.api_key}')  # api key goes here
        print(request_url)
        return requests.get(request_url)
    
    def define_timeranges(self,Fresh=False):
        """
        Takes today's date and creates timeranges
        to comb through news.
        
        Also assigns max_date as 2 days ago, to use that
        data for retrieving fresh news for testing.
        
        Fresh flag defines if we are getting news from past 2 days
        or 30 days to last 2 days. 
        """
        timeranges = []
        today = datetime.date.today()
        max_date = today + datetime.timedelta(days=-2)
        # Store the cutoff as a date string (a bare timedelta is not a date)
        self.max_date = str(max_date)
        
        if Fresh:
            timeranges.append({'from': str(max_date), 'to': str(today)})
        else:
            for i in range(4):
                timeranges.append({
                    'from': str(max_date + datetime.timedelta(days=-5*(i+1))),
                    'to': str(max_date + datetime.timedelta(days=-5*i)),
                })

        
        self.timeranges = timeranges
    
    
    def get_news_df(self,query,Fresh=False):
        """ 
        We will iterate through the 5 pages of news
        and through a timeframe to comb through news in the
        past 30 days, within limitations of the API.
        
        We will not include the last 2 days, so as to save that
        for testing data. This is handled by timeranges.
        
        * query is the keyword to search for        
        """
        df = pd.DataFrame([])
        
        self.define_timeranges(Fresh)
        
        for timerange in self.timeranges:
            pages = range(1, 6)  # 5 pages of 20 articles: the 100-article limit of the developer license
            for page in pages:
                r = self.get_news_page(query,page,timerange)
                news_list = []
                try:
                    for n in r.json()['articles']:
                        _={'timestamp':n['publishedAt'],'source':n['source']['name'],'title':n['title'],'description':n['description']}
                        news_list.append(_)

                    tmp_df = pd.DataFrame(news_list)
                    if len(df) == 0:
                        df=tmp_df.copy()
                    else:
                        df = pd.concat([df,tmp_df])
                except KeyError:
                    continue
        return df.sort_values(by='timestamp', ascending=False).reset_index(drop=True)
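
The request URL in `get_news_page` is assembled by hand, so a query containing spaces or `&` would not be escaped. A minimal sketch of an alternative, assuming the `/v2/everything` endpoint and the News API's `sortBy` parameter, is to let the standard library encode the parameters. `build_news_url` is a hypothetical helper and not part of the class above.

```python
from urllib.parse import urlencode

NEWS_URL = 'https://newsapi.org/v2/everything'


def build_news_url(api_key, query, page, timerange, language='en', sort_by='relevancy'):
    """Build a request URL with properly escaped parameters (hypothetical helper)."""
    params = {
        'q': query,
        'language': language,
        'page': page,
        'from': timerange['from'],
        'to': timerange['to'],
        'sortBy': sort_by,
        'apiKey': api_key,
    }
    return f'{NEWS_URL}?{urlencode(params)}'


# 'YOUR_KEY' is a placeholder, not a real credential
url = build_news_url('YOUR_KEY', 'non-fungible token', 1,
                     {'from': '2022-02-24', 'to': '2022-03-01'})
print(url)
```

Keeping the parameters in a dict also makes it easier to log the request without the `apiKey` entry.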

In [127]:
news = NewsApi(API_KEY)
raw_news_df = news.get_news_df('NFT')
raw_news_df

In [147]:
# Save to CSV (in case you cannot access API)
raw_news_df.to_csv('news-on-NFT.csv')

In [113]:
# Read from CSV (in case you cannot access API)
raw_news_df = pd.read_csv('news-on-NFT.csv').drop(columns=['Unnamed: 0'])

# Stage 2: Labeling Data

At this point we have data from the past few weeks,
but it needs to be manually labeled before the model
can learn from it and predict.

We will manually label the dataset:
* If the title doesn't mention NFT or a related keyword, we drop the row
* If the title comments on negative aspects of NFTs, we label it bad (1)
* If it doesn't mention anything negative, we call it good (0)

Is the world this simple? No.


In [116]:
# Drop description column
news_df = raw_news_df.drop(columns=['description'])
news_df = news_df.drop_duplicates()

# Drop rows without key_words
def drop_rows_without_keywords(df):
    key_words = ['NFT','Non-Fungible Token','non-fungible','DAO']
    # na=False drops rows whose title is missing instead of raising an error
    df = df[df['title'].str.contains('|'.join(key_words), na=False)]
    return df

news_df = drop_rows_without_keywords(news_df)
news_df

Unnamed: 0,timestamp,source,title
0,2022-03-01T21:57:14Z,Justcreative.com,10+ Best NFT Creator Software – Ultimate Guide for Creatives in 2022
1,2022-03-01T21:46:34Z,Justcreative.com,8+ Best NFT Displays in 2022
2,2022-03-01T18:00:58Z,Forbes,Parler Launches NFT Marketplace With New Trump NFTs
4,2022-03-01T17:00:49Z,Small Business Trends,The Most Influential People in the NFT Space
5,2022-03-01T16:00:13Z,Bleacher Report,Kurt Warner's NFT Studio Reveals 'The Locker Room' Fan Events with Charity Donations
...,...,...,...
229,2022-02-14T20:00:32Z,Small Business Trends,What is an NFT?
230,2022-02-14T19:03:00Z,Dpreview.com,NFT marketplace Cent temporarily halts most trading due to 'widespread fraud'
233,2022-02-14T14:10:15Z,Engadget,UK authorities seize NFTSs over $1.9 million in suspected tax fraud
235,2022-02-14T12:20:04Z,Techmeme.com,"UK's HMRC tax authority seizes three NFTs in a £1.4M fraud case involving 250 alleged fake companies, the first UK law enforcement agency to seize an NFT (BBC)"


In [117]:
# Now we manually have to label the rows. I guess I'll have to create a new table and join it later
pd.set_option('display.max_colwidth', None)

label_list = []
for item in news_df['title'].values:
    print([item,0])

['10+ Best NFT Creator Software – Ultimate Guide for Creatives in 2022', 0]
['8+ Best NFT Displays in 2022', 0]
['Parler Launches NFT Marketplace With New Trump NFTs', 0]
['The Most Influential People in the NFT Space', 0]
["Kurt Warner's NFT Studio Reveals 'The Locker Room' Fan Events with Charity Donations", 0]
['Pixelmon NFTs Are So Bad, That They’re Almost Good', 0]
['8 NFT Scams to Avoid', 0]
['BETMAN debuts TIMELESS WATCH CLUB, an NFT Project Dedicated to the Watch Culture', 0]
['NFT Community Bashes Pixelmon for Disappointing In-game Art Quality', 0]
['10 Best NFT Marketplaces for Artists in 2022', 0]
['How To Buy NFTs in 2022 — Ultimate Guide', 0]
['TIMEPieces Launches Artists for Peace, a Collection of Unique 1 of 1 NFTs from Over 60 Global Artists in Support of Humanitarian and Relief Efforts in Ukraine', 0]
['Following Ukraine\'s request, Binance says banning Russian users would "fly in the face of" crypto\'s purpose; NFT market DMarket bans Russians and Belarusians (Maxwell

In [118]:
label_list = [
   ['10+ Best NFT Creator Software – Ultimate Guide for Creatives in 2022', 0],
['8+ Best NFT Displays in 2022', 0],
['Parler Launches NFT Marketplace With New Trump NFTs', 0],
['The Most Influential People in the NFT Space', 0],
["Kurt Warner's NFT Studio Reveals 'The Locker Room' Fan Events with Charity Donations", 0],
['Pixelmon NFTs Are So Bad, That They’re Almost Good', 1],
['8 NFT Scams to Avoid', 1],
['BETMAN debuts TIMELESS WATCH CLUB, an NFT Project Dedicated to the Watch Culture', 0],
['NFT Community Bashes Pixelmon for Disappointing In-game Art Quality', 1],
['10 Best NFT Marketplaces for Artists in 2022', 0],
['How To Buy NFTs in 2022 — Ultimate Guide', 0],
['TIMEPieces Launches Artists for Peace, a Collection of Unique 1 of 1 NFTs from Over 60 Global Artists in Support of Humanitarian and Relief Efforts in Ukraine', 0],
['Following Ukraine\'s request, Binance says banning Russian users would "fly in the face of" crypto\'s purpose; NFT market DMarket bans Russians and Belarusians (Maxwell Strachan/VICE)', 0],
["Civilization's Sid Meier blasts NFTs, game monetisation", 1],
['Collective Bidding is Boosting The UkraineDAO 1/1 NFT Fundraiser', 0],
['Asprey x Bugatti La Voiture Noire NFT Collectibles', 0],
['How Musicians Are Using NFTs to Revolutionize Fan Engagement', 0],
['AP cancels sale of NFT depicting migrants on a boat amid criticism of profiting from suffering', 1],
['What Are the Rules for Celebrities Promoting NFTs?', 0],
['Crypto Country Club Chips Into The NFT Game With Golfer Joel Dahmen Along For The Ride', 0],
['A look at AcadArena and other "game guilds" in Southeast Asia, which loan out NFTs of in-game creatures as "scholarships" to players of games like Axie Infinity (Rest of World)', 0],
["Gabe Newell Comments On Crypto/NFTs: It's More About The Bad Actors And Less The Technology", 1],
['10+ Different Types of NFTs – Complete List', 0],
['Ukraine Receives $4 Million In Crypto Donations Within Hours—Including $1.9 Million Tied To Julian Assange NFT Auction', 0],
["Gear up for tax season with this cryptocurrency and NFT beginner's guide", 0],
['Spotify Releases “Car Thing” and Neon Launches an NFT Vending Machine in This Week’s Business and Crypto Roundup', 0],
["Sotheby's NFT sale, expected to hit $30 million, suddenly canceled", 1],
['Frequently Asked Questions about NFTs', 0],
['Where Do you Store NFTs? How NFT Storage Works', 0],
['Ultimate NFT Marketing Guide for Creators in 2022', 0],
['How to Make and Sell an NFT: A Simple Guide for Creators', 0],
['What is an NFT? Starter Guide for Designers, Artists & Creatives', 0],
['The Most Popular and Best Selling NFT Collections This Week', 0],
['A beginner’s guide to joining a DAO (Decentralized Autonomous Organization)', 0],
['Crypto Community Rushes to Ukraine’s Defense Armed With NFTs, DAOs', 0],
['AP Cancels NFT Sale Amid Criticisms it Would Be Profiting From Suffering', 1],
['Nelson Mandela paintings of life in prison to be sold as NFTs', 0],
['NFTs 101: The Terms You Need to Know', 0],
["Why NFTs Are Like the Scam-Filled Internet of the Mid-'90s", 1],
['A look at NFT Worlds, an NFT collection of 10K unique Minecraft worlds, which has amassed over 30K ETH in trade volume, or nearly $90M, on OpenSea (Jordan Pearson/VICE)', 0],
['Sacramento Kings Star De’Aaron Fox Ditches Ditches NFT Project—After Pocketing $1.5 Million', 1],
['AP Cancels Sale Of NFT Of Migrants Floating In Overcrowded Boat In Mediterranean', 1],
['De’Aaron Fox Abruptly Shuts Down NFT Project After $1.5 Million in Sales - The Action Network', 1],
['No, Creating An NFT Of The Video Of A Horrific Shooting Will Not Get It Removed From The Internet', 0],
['The 3 Biggest Problems With the NFT Marketplace Today', 1],
['Sotheby’s NFT Sale, Expected to Hit $30 Million, Suddenly Canceled', 1],
["The NFTits Club isn't what you think it is", 0],
['Why NFTs Are Harder To Value And Trade Than Cryptocurrencies', 0],
['Texas Man Wants His Bored Ape NFT Back, Sues OpenSea for $1M', 0],
['The Legal Factors Companies Must Consider When Exploring NFTs', 0],
['NFTs And The Future Of The Alcoholic Beverage Industry', 0],
['Lana Rhoades Reportedly Rug Pulls CryptoSis NFT for $1.5 Million USD', 0],
['Neon Introduces World’s First NFT Vending Machine', 0],
['‘GenZeroes’: Paul “Big Show” Wight & ‘The 100’ Star Richard Harmon Among Cast To Join Sci-Fi NFT Series', 0],
['Sotheby’s to Auction 104 CryptoPunks NFTs for $20-$30 Million', 0],
['NFT sales hit $335 million over the past week. These were the 5 best-selling digital collections.', 0],
['OpenSea NFT Heist Likely Triggers Drop in Activity', 1],
['Cordae Brings His “Futurist” Vision to Life in a New NFT Collection', 0],
['Bored Ape Owner Sues OpenSea for $1 Million USD Over “Stolen” NFT', 1],
['Maxwell Tribeca NFT Social Club', 0],
["Here's How OpenSea Users Were Duped By NFT Phishing Scam", 1],
['Hundreds of Salesforce Employees Object To NFT Plans', 1],
['GameStop tries to calm internal fears about shift to NFTs', 1],
['Why NFT Express could be your secret weapon to cracking the NFT market', 0],
['How to Buy an NFT', 0],
['If Snoop Dogg, Reese Witherspoon, and McDonald’s Can Do NFTs, So Will You', 0],
['NFT Marketplace CEO on Counterfeits: ‘This Is an Ecosystem-Wide Problem’', 1],
['Hackers Stole $1.7 Million Worth of NFTs from Users of OpenSea Marketplace', 1],
["The internet eats itself as Stephen Colbert's AFT becomes an NFT", 0],
['The best NFT displays in 2022', 0],
['Kevin Lynch Unveils "The Kobe Bryant Experience" NFT Series', 0],
['NFT Investors Lose $1.7M in OpenSea Phishing Attack', 1],
['NFTs of cereal boxes sell out in 24 hours', 0],
['OpenSea Users Lose $1.7 Million USD in NFTs From Likely Phishing Attack', 1],
['What Entrepreneurs Need To Know Before Starting A Business In NFTs', 0],
['FTX.US launches a gaming unit, offering "crypto-as-a-service" for game companies to launch tokens and add NFT support (Emily Nicolle/Bloomberg)', 0],
['Myth-Busting NFTs: 7 Claims Fact-Checked', 0],
['How to price your NFTs', 0],
['The video game industry has a love-hate relationship with NFTs', 0],
['Explained: What are NFTs? How is it different from cryptocurrency?', 0],
['Seven NFT Projects That Are Applying An Equity Lens To Their Work', 0],
['NFTs worth $1.7M stolen via OpenSea phishing attack', 1],
['Crypto Price Crash Panic: Serious NFT ‘Hack’ Suddenly Sends Bitcoin, Ethereum, BNB, Solana And Cardano Sharply Lower', 1],
['OpenSea Probes NFT Phishing Attack, Co-Founder Says', 1],
['OpenSea says it is investigating a phishing attack; CEO says "32 users have signed a malicious payload from an attacker, and some of their NFTs were stolen" (Will Gottsegen/CoinDesk)', 1],
['A Hacker Is Actively Stealing High-Value NFTs From OpenSea Users', 1],
['NFT are everywhere, so are the scams. Here’s how to avoid them', 1],
['Cool Crypto Kids Drops NFT Collection From the Children’s Hospital of Los Angeles', 0],
['The best NFT games in 2022', 0],
['Hundreds of Salesforce employees object to NFT plans - Reuters', 0],
['Kim Shui Enhances FW22 Runway with "Serenity Keys" NFT Collection', 0],
['What’s Going on With BAPE’s NFT Designs?', 0],
['If 50 Cent broke out now, he’d launch a DAO', 0],
['Internal documents: 400+ Salesforce employees signed an open letter objecting to its NFT plans, including helping other companies create and sell NFTs (Avi Asher-Schapiro/Thomson Reuters ...)', 1],
["How to Realize Your Brand's Digital Potential With NFT Marketing", 0],
['Kimmel on NFTs: ‘I thought a blockchain was what Melania had on her door’', 0],
['The Most Popular and Best Selling NFT Collections This Week', 0],
['People Whose NFTs Were Stolen Are Getting Wildly Different Refunds from OpenSea', 1],
['A viral YouTube takedown of NFTs has already clocked over 5 million views: Here are the biggest revelations', 0],
['Universal Music Group partners with NFT marketplace Curio to develop and sell NFT collectibles for its record labels and artists starting in March (Dawn Chmielewski/Reuters)', 0],
["Melania Trump is launching another NFT collection highlighting 'iconic moments' from her husband's presidency", 0],
['Tinker Hatfield Designs a Nike Air Max 1 for Ducks Of A Feather NFT Launch', 0],
["There's a Massive Corporate NFT Art Opportunity for Artists (Infographic)", 0],
["Melania Trump's Team Denies She Purchased Her Own NFT Collection, Claims It Was a 'Third-Party Buyer'", 0],
['Sure looks like Melania Trump bought her own NFT', 0],
['Melania Trump Is Launching ‘POTUS NFT Collection’ Amid Questions Surrounding Her Own Digital Art Collection', 0],
['EXCLUSIVE Universal Music to develop collectible NFTs in deal with Curio platform - Reuters', 0],
['PlatinumGames label NFT trend ‘frustrating’ with ‘no positive impact’', 1],
['NFTs just got even more confusing', 1],
['The best NFT frames in 2022', 0],
['5ire, a sustainability-focused Layer 1 blockchain with an exchange, wallet, NFT marketplace, and VC fund, raises $100M, after a $21M seed at a $110M valuation (The Economic Times)', 0],
['OpenSea review: Create, collect, and sell NFTs with low fees', 0],
['Disaster as “NFT Music Stream” Enrages Artists By Pulling Music From YouTube', 1],
['NYSE Moves Closer To NFT Trading With Trademark Application', 0],
['NFT hype could ruin your PC — literally', 1],
["The winning bid in Melania Trump's NFT auction appears to have come from a wallet associated with Melania Trump", 0],
["Someone tried to make a Magic: The Gathering NFT system and Wizards of the Coast isn't having it", 1],
['Did Melania Trump Place the Winning Bid in Her Own NFT Auction?', 0],
['News UK explores cashing in on crypto boom with NFTs', 0],
['NYSE moves closer to NFT trading with trademark application - Reuters', 0],
['NYSE moves closer to NFT trading with trademark application - Reuters.com', 0],
['Habbo utilizing Immutable X to boost NFT experience', 0],
['Micah Johnson\'s Popular "Aku World" NFT Series is Dropping Its Final Chapter', 0],
['Rare Live Photographs of Nirvana From 1991 To Be Sold as NFTs', 0],
['Snoop Dogg Is Making Death Row Records an NFT Label', 0],
['Alphabet Project will make you want to buy all the letters to add to your NFT collection', 0],
['Magic: The Gathering owners forced to explain copyright law to unofficial NFT project', 1],
['Anime NFTs called Azukis are seeing $300 million in sales volume, overtaking Bored Apes and CryptoPunks', 0],
["Here's why NFTs may be the next era of licensing technology", 0],
['The Harebrained Scheme to Turn Magic: The Gathering Cards into NFTS', 1],
['5 Terms to Know Before Buying Your First NFT', 0],
['Kelley Blue Book: What is the Alfa Romeo Tonale, and why does it come with a non-fungible token?', 0],
['A look at the self-righteous anger in the Web3 community over the identification of the founders of the Bored Ape Yacht Club NFT collection (Maxwell Strachan/VICE)', 0],
['What is an NFT?', 0],
["NFT marketplace Cent temporarily halts most trading due to 'widespread fraud'", 1],
['UK authorities seize NFTSs over $1.9 million in suspected tax fraud', 1],
["UK's HMRC tax authority seizes three NFTs in a £1.4M fraud case involving 250 alleged fake companies, the first UK law enforcement agency to seize an NFT (BBC)", 1],
['HMRC seizes NFTs for first time amid fraud inquiry', 1]
]

labels_df = pd.DataFrame(label_list,columns=['title','target'])
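
Because the labels are joined back on `title`, any title that appears more than once in the label list multiplies the matching rows in the merge. The list above contains 'The Most Popular and Best Selling NFT Collections This Week' twice, for example. A minimal sketch of a check that could run before the merge, using only the standard library (`find_duplicate_titles` and `dedupe_labels` are hypothetical helpers, and `sample_labels` is a small stand-in for `label_list`):

```python
from collections import Counter

# A small stand-in for label_list; the second and fourth entries repeat a title
sample_labels = [
    ['What is an NFT?', 0],
    ['The Most Popular and Best Selling NFT Collections This Week', 0],
    ['8 NFT Scams to Avoid', 1],
    ['The Most Popular and Best Selling NFT Collections This Week', 0],
]


def find_duplicate_titles(labels):
    """Return titles that appear more than once, with their counts."""
    counts = Counter(title for title, _ in labels)
    return {title: n for title, n in counts.items() if n > 1}


def dedupe_labels(labels):
    """Keep the first label for each title; raise if a repeat disagrees."""
    seen = {}
    for title, target in labels:
        if title in seen and seen[title] != target:
            raise ValueError(f'conflicting labels for {title!r}')
        seen.setdefault(title, target)
    return [[title, target] for title, target in seen.items()]


print(find_duplicate_titles(sample_labels))
clean_labels = dedupe_labels(sample_labels)
print(len(clean_labels))
```

In pandas, passing `validate='many_to_one'` to `pd.merge` is another way to fail early when the label side has repeated titles.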

In [119]:
news_df = pd.merge(left=news_df,right=labels_df,on='title')
news_df

Unnamed: 0,timestamp,source,title,target
0,2022-03-01T21:57:14Z,Justcreative.com,10+ Best NFT Creator Software – Ultimate Guide for Creatives in 2022,0
1,2022-03-01T21:46:34Z,Justcreative.com,8+ Best NFT Displays in 2022,0
2,2022-03-01T18:00:58Z,Forbes,Parler Launches NFT Marketplace With New Trump NFTs,0
3,2022-03-01T17:00:49Z,Small Business Trends,The Most Influential People in the NFT Space,0
4,2022-03-01T16:00:13Z,Bleacher Report,Kurt Warner's NFT Studio Reveals 'The Locker Room' Fan Events with Charity Donations,0
...,...,...,...,...
135,2022-02-14T20:00:32Z,Small Business Trends,What is an NFT?,0
136,2022-02-14T19:03:00Z,Dpreview.com,NFT marketplace Cent temporarily halts most trading due to 'widespread fraud',1
137,2022-02-14T14:10:15Z,Engadget,UK authorities seize NFTSs over $1.9 million in suspected tax fraud,1
138,2022-02-14T12:20:04Z,Techmeme.com,"UK's HMRC tax authority seizes three NFTs in a £1.4M fraud case involving 250 alleged fake companies, the first UK law enforcement agency to seize an NFT (BBC)",1


## Gauging success of this study

Before we move on to model fitting, we need to manage expectations.
Let's see how imbalanced this dataset is.
Given how varied the vocabulary is, it is very likely we will not have sufficient data
to identify all possible "BAD/GOOD" connotations for NFTs.

Although the imbalance in our dataset is not orders of magnitude, the vocabulary it covers is thin given
how few rows of data we have for training.

In [120]:
good_percentage = (100* news_df['target'].value_counts()/news_df.shape[0])[0]

print(f'In this dataset, {good_percentage}% of the rows are good (0) and {100-good_percentage}% are bad (1)')
print(f'This dataset has {news_df.shape[0]} rows')

In this dataset, 70.0% of the rows are good (0) and 30.0% are bad (1)
This dataset has 140 rows


# Train Naive Bayes

Now that we have training data from the weeks before today, we will vectorize
the text and then train MultinomialNB (Naive Bayes) on it.

1. Vectorize using term frequency-inverse document frequency (TF-IDF)
2. Train the model
3. Predict (on the training data first, just to see)
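
To make the three steps less of a black box, here is a from-scratch sketch of what a multinomial Naive Bayes classifier does. It is a simplification and not scikit-learn's implementation: it uses raw word counts instead of TF-IDF weights, splits on whitespace, and the toy titles and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(titles, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on whitespace-tokenised titles."""
    class_docs = Counter(labels)                # number of titles per class
    word_counts = defaultdict(Counter)          # word counts per class
    for title, label in zip(titles, labels):
        word_counts[label].update(title.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab, alpha


def predict_nb(model, title):
    """Return the class with the highest log posterior for one title."""
    class_docs, word_counts, vocab, alpha = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        # log prior + sum of log likelihoods with add-alpha (Laplace) smoothing
        score = math.log(class_docs[label] / total_docs)
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for word in title.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + alpha) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


toy_titles = ['best nft guide', 'nft scam stolen', 'nft phishing scam', 'how to buy nft']
toy_labels = [0, 1, 1, 0]
model = train_nb(toy_titles, toy_labels)
print(predict_nb(model, 'new nft scam'))
```

The sketch also shows why a small dataset hurts: a word that signals bad news but never appeared in training contributes nothing to the score.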

In [148]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import recall_score


vectorizer = TfidfVectorizer()

X = vectorizer.fit_transform(news_df['title'])
y = news_df['target']

model = MultinomialNB()
model.fit(X,y)

y_pred = model.predict(X)

model_accuracy = model.score(X,y)
model_recall = recall_score(y, y_pred) # Note we are using training data to look at scoring, this is not a great idea

print(f'Obtained accuracy {model_accuracy} and recall {model_recall}')

Obtained accuracy 0.9071428571428571 and recall 0.6904761904761905


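Not bad, one might say (incorrectly). Using only 140 headlines, we got about 90% accuracy. But these scores were computed on the same data the model was trained on, and with 70% of the rows labeled good, accuracy alone is flattering. Let's bring in fresh data now to measure accuracy and recall on headlines the model has never seen.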
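
To put that accuracy in context, it helps to compare it with a model that ignores the text and always predicts good. A short worked computation, assuming the 140 rows and the 70/30 split printed earlier (`majority_baseline` is a hypothetical helper):

```python
def majority_baseline(n_good, n_bad):
    """Accuracy and bad-class recall of a model that always predicts good (0)."""
    total = n_good + n_bad
    accuracy = n_good / total   # every good row is right, every bad row is wrong
    recall_bad = 0.0            # no bad row is ever predicted as bad
    return accuracy, recall_bad


# 140 rows, 70% good and 30% bad, as printed above
accuracy, recall_bad = majority_baseline(n_good=98, n_bad=42)
print(f'baseline accuracy {accuracy:.2f}, recall {recall_bad:.2f}')
```

Any accuracy figure for this dataset is better read as an improvement over 0.70 than over zero, and recall is the metric that separates a useful model from this baseline.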

## Label test data for metrics

In [137]:
# Get fresh news and prepare the df for vectorizing
fresh_news_df  = news.get_news_df('NFT',True).drop(columns=['description'])
fresh_news_df = drop_rows_without_keywords(fresh_news_df)
fresh_news_df = fresh_news_df.drop_duplicates(subset='title')
fresh_news_df

Unnamed: 0,timestamp,source,title
0,2022-03-03T05:39:16Z,Cointelegraph,Monthly NFT buyers dip below 800K as searches ‘fall off a cliff’
1,2022-03-03T05:31:19Z,Cointelegraph,Bob Dylan goes meta as Sony and Universal partner with Snowcrash NFT platform
3,2022-03-03T02:28:03Z,Justcreative.com,What’s the best NFT crypto for your art? (March 2022)
4,2022-03-03T01:34:09Z,Yahoo Entertainment,NFTs enter ‘mini-bear market’ as buyers drop to three-month low
5,2022-03-03T00:37:27Z,Featureshoot.com,Behind the Most Expensive Photography NFT Ever Sold
6,2022-03-03T00:10:28Z,Stereogum,Charli XCX Drops Out Of NFT-Gated Festival After Fan Backlash
7,2022-03-03T00:05:15Z,Techmeme.com,"Sources: the SEC is probing the NFT market to determine if tokens are like securities, focusing on fractional NFTs, in which a token is broken and sold in units (Matt Robinson/Bloomberg)"
8,2022-03-02T23:04:13Z,Yahoo Entertainment,SEC Probing NFT Market: Report
10,2022-03-02T22:38:00Z,Yahoo Entertainment,Nifty League Raises $5 Million Seed Investment Round Led by RSE Ventures to Expand its NFT Gaming Metaverse
11,2022-03-02T20:34:52Z,CoinDesk,Snoop Dogg's NFT Mixtape Invites Remixes. Does It Authorize Them?


In [138]:
label_list = []
for item in fresh_news_df['title'].values:
    print([item,0])

['Monthly NFT buyers dip below 800K as searches ‘fall off a cliff’', 0]
['Bob Dylan goes meta as Sony and Universal partner with Snowcrash NFT platform', 0]
['What’s the best NFT crypto for your art? (March 2022)', 0]
['NFTs enter ‘mini-bear market’ as buyers drop to three-month low', 0]
['Behind the Most Expensive Photography NFT Ever Sold', 0]
['Charli XCX Drops Out Of NFT-Gated Festival After Fan Backlash', 0]
['Sources: the SEC is probing the NFT market to determine if tokens are like securities, focusing on fractional NFTs, in which a token is broken and sold in units (Matt Robinson/Bloomberg)', 0]
['SEC Probing NFT Market: Report', 0]
['Nifty League Raises $5 Million Seed Investment Round Led by RSE Ventures to Expand its NFT Gaming Metaverse', 0]
["Snoop Dogg's NFT Mixtape Invites Remixes. Does It Authorize Them?", 0]
['Support humanitarian relief efforts for Ukraine and own a limited edition NFT', 0]
['Charli XCX Confirms She Dropped Out of NFT Festival After Fan Backlash', 0]


In [142]:
label_list=[
    ['Monthly NFT buyers dip below 800K as searches ‘fall off a cliff’', 1],
['Bob Dylan goes meta as Sony and Universal partner with Snowcrash NFT platform', 0],
['What’s the best NFT crypto for your art? (March 2022)', 0],
['NFTs enter ‘mini-bear market’ as buyers drop to three-month low', 1],
['Behind the Most Expensive Photography NFT Ever Sold', 0],
['Charli XCX Drops Out Of NFT-Gated Festival After Fan Backlash', 1],
['Sources: the SEC is probing the NFT market to determine if tokens are like securities, focusing on fractional NFTs, in which a token is broken and sold in units (Matt Robinson/Bloomberg)', 0],
['SEC Probing NFT Market: Report', 0],
['Nifty League Raises $5 Million Seed Investment Round Led by RSE Ventures to Expand its NFT Gaming Metaverse', 0],
["Snoop Dogg's NFT Mixtape Invites Remixes. Does It Authorize Them?", 0],
['Support humanitarian relief efforts for Ukraine and own a limited edition NFT', 0],
['Charli XCX Confirms She Dropped Out of NFT Festival After Fan Backlash', 1],
['Crypto Donations To Ukraine Top $52 Million As Funds Pour In From Bitcoin, Ether, PolkaDot And NFTs', 0],
['NFT Investors Furious Over The Quality Of The Art They Bought', 1],
['Metamall Partners With Jigen NFT Provider To Offer Safety For Brands In Metaverse', 0],
['Ukrainian Flag NFT Raises $6.75M for Country’s War Efforts', 0],
['IMA Financial Plans to Start Selling NFT Insurance in Decentraland', 0],
["We can't stop staring at this hilariously hideous NFT game", 0],
['NFT sales fall to $237 million over the past week. These were the 5 best-selling digital collections.', 1],
['Oriental Culture Subsidiary Launches NFT Services Business - MarketWatch', 0],
['Ukraine DAO NFT sells for $6.5M (2,173.6 ETH)', 0],
['‘CryptoTRUMPS’ To Be First Yuge Series To Emerge From Right-Wing NFT Marketplace', 0],
['From lunch to Solana: Here’s the story of the NFT ATM in New York', 0],
["Orange Comet and AMC Unveil New 'The Walking Dead' NFT Collection", 0],
['Victoria Fuller, Ex-Playboy Playmates Launch Virtual Community, Sell NFTs', 0],
['NFT Wallet Development', 0],
["Pixelmon Wanted to Be the Pokemon of NFT Games. Now It's a Laughing Stock - CNET", 0],
["Peach Farmer changes the face of media with NFT's", 0],
['Move over candy bars, New York vending machine now selling NFT art', 0],
['CryptoPunk NFT Is Latest Donation to Ukraine’s $33M Campaign', 0],
['What does the future hold for NFT marketplaces?', 0],
['Move over candy bars, New York vending machine now selling NFT art - Reuters.com', 0],
['What the Tech? NFTs | What The Tech? | wfmz.com - 69News WFMZ-TV', 0],
['NFT Artist Pplpleasr’s New Project ‘Shibuya’ Brings Long-Form Animation to Web 3', 0],
['METAROBOX Promises a Breakthrough in the NFT Industry', 0],
['10+ Best NFT Creator Software – Ultimate Guide for Creatives in 2022', 0],
['8+ Best NFT Displays in 2022', 0],
['How Philly’s Historic Rowhomes Became an NFT Sensation', 0],
['Parler Launches NFT Marketplace With New Trump NFTs', 0],
['NFT Company Buying a Yacht for their Community', 0],
['The Most Influential People in the NFT Space', 0],
['NFT horror: Pixelmon investors lose millions on disfigured Pokemon clones', 1],
["Kurt Warner's NFT Studio Reveals 'The Locker Room' Fan Events with Charity Donations", 0],
['Why the Associated Press’ migrant NFT fiasco could be a troubling new trend', 1],
['Pixelmon NFTs Are So Bad, That They’re Almost Good', 0],
['DC Comics Creates Its Own Bored Ape NFT For One Star Squadron', 0],
['Hike launches Rush Avatar NFTs for its play-to-earn Rush Gaming Universe', 0],
['8 NFT Scams to Avoid', 1],
['Someone Has Been Airdropping ‘COVID-19’ NFTs To Almost 100,000 People Uninvited', 0],
['Nifty News: One NFT per human in existence and the Pixelmon controversy', 0],
['BETMAN debuts TIMELESS WATCH CLUB, an NFT Project Dedicated to the Watch Culture', 0],
['NFT Community Bashes Pixelmon for Disappointing In-game Art Quality', 1],
['Pros And Cons of Non-Fungible Tokens (NFTs)', 0],
["What the Tech: NFTs are grabbing headlines. Here's what you need to know about them - Hawaii News Now", 0],
['10 Best NFT Marketplaces for Artists in 2022', 0],
['How To Buy NFTs in 2022 — Ultimate Guide', 0],
['7 Crypto and NFT Projects That Were Total Scams (February 2022 Edition)', 1]
]
labels_df = pd.DataFrame(label_list,columns=['title','target'])

fresh_news_df = pd.merge(left=fresh_news_df,right=labels_df,on='title')
fresh_news_df

Unnamed: 0,timestamp,source,title,target
0,2022-03-03T05:39:16Z,Cointelegraph,Monthly NFT buyers dip below 800K as searches ‘fall off a cliff’,1
1,2022-03-03T05:31:19Z,Cointelegraph,Bob Dylan goes meta as Sony and Universal partner with Snowcrash NFT platform,0
2,2022-03-03T02:28:03Z,Justcreative.com,What’s the best NFT crypto for your art? (March 2022),0
3,2022-03-03T01:34:09Z,Yahoo Entertainment,NFTs enter ‘mini-bear market’ as buyers drop to three-month low,1
4,2022-03-03T00:37:27Z,Featureshoot.com,Behind the Most Expensive Photography NFT Ever Sold,0
5,2022-03-03T00:10:28Z,Stereogum,Charli XCX Drops Out Of NFT-Gated Festival After Fan Backlash,1
6,2022-03-03T00:05:15Z,Techmeme.com,"Sources: the SEC is probing the NFT market to determine if tokens are like securities, focusing on fractional NFTs, in which a token is broken and sold in units (Matt Robinson/Bloomberg)",0
7,2022-03-02T23:04:13Z,Yahoo Entertainment,SEC Probing NFT Market: Report,0
8,2022-03-02T22:38:00Z,Yahoo Entertainment,Nifty League Raises $5 Million Seed Investment Round Led by RSE Ventures to Expand its NFT Gaming Metaverse,0
9,2022-03-02T20:34:52Z,CoinDesk,Snoop Dogg's NFT Mixtape Invites Remixes. Does It Authorize Them?,0
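
Several titles in the fresh label list also appear in the training list ('8 NFT Scams to Avoid' is one), so part of this test set was already seen during training. A minimal sketch of filtering those out before scoring, with hypothetical names and a two-title stand-in for each set:

```python
def drop_seen_titles(fresh_titles, train_titles):
    """Return fresh titles that do not appear in the training set (order preserved)."""
    seen = {t.strip().lower() for t in train_titles}
    return [t for t in fresh_titles if t.strip().lower() not in seen]


train_titles = ['8 NFT Scams to Avoid', 'What is an NFT?']
fresh_titles = ['SEC Probing NFT Market: Report', '8 NFT Scams to Avoid']
unseen = drop_seen_titles(fresh_titles, train_titles)
print(unseen)
```

On the dataframes themselves, a filter along the lines of `fresh_news_df[~fresh_news_df['title'].isin(news_df['title'])]` would do the same job with exact matching.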


In [146]:
# Vectorize and predict using existing vectorizer and model
X_test = vectorizer.transform(fresh_news_df['title'])
y_test = fresh_news_df['target']
y_pred = model.predict(X_test)
model_accuracy = model.score(X_test,y_test)
model_recall = recall_score(y_test, y_pred)

print(f'Obtained accuracy {model_accuracy} and recall {model_recall}')

Obtained accuracy 0.8245614035087719 and recall 0.18181818181818182


Here we see that the model missed most of the headlines labeled as 'bad NFT': a recall of 0.18 on the bad class is an absolute failure, even though accuracy still looks respectable.
To make better claims, we'll need to improve our approach. Some ways to do this:
* Easiest to say, hardest to do: add more labeled data. This API only provides 30 days of news and developer accounts are limited to a max of 100 results. I manually labeled these points using my own judgement, which is biased.
* Find similar words that can also be used to indicate bad behavior.

This is an extremely subjective view of what constitutes bad/good. Anything that comes out of this analysis depends on my own criteria for labeling data.
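
The two printed scores can be unpacked into confusion counts. The counts below are inferred from the reported accuracy and recall, assuming 57 test headlines of which 11 are labeled bad; they are not taken from the notebook output. `scores_from_counts` is a hypothetical helper.

```python
def scores_from_counts(tp, fn, fp, tn):
    """Compute accuracy, recall and precision for the positive (bad) class."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return accuracy, recall, precision


# Inferred counts: 2 bad headlines caught, 9 missed, 1 false alarm, 45 good ones correct
accuracy, recall, precision = scores_from_counts(tp=2, fn=9, fp=1, tn=45)
print(f'accuracy {accuracy:.4f}, recall {recall:.4f}, precision {precision:.4f}')
```

Under these counts, a model that always predicts good would score 46/57, about 0.81 accuracy, so the classifier gains roughly one headline over doing nothing.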

# Why not tune hyperparameters to improve accuracy?

Normally that's the way to go: use GridSearch or a similar method to tweak parameters and compare scores.
However, given:
   * a) the complexity of the data
   * b) the small dataset

there is simply not much for the model to learn from. It is not an issue of the model learning wrongly.
It is a matter of insufficient information for the model to learn.
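
For completeness, this is roughly what such a search could look like once more labeled data is available. It is a sketch under assumptions and was not run in this notebook: the toy titles are hypothetical stand-ins for `news_df`, and the `alpha` grid is arbitrary.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy stand-in data (hypothetical); in the notebook this would come from news_df
titles = [
    'best nft guide for artists', 'how to buy an nft', 'nft marketplace launches',
    'new nft collection drops', 'nft scam steals millions', 'hackers stole nfts',
    'nft phishing attack hits users', 'nft fraud investigation opens',
]
targets = [0, 0, 0, 0, 1, 1, 1, 1]

# The pipeline refits the vectorizer inside each fold, so no vocabulary leaks
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('nb', MultinomialNB()),
])
param_grid = {'nb__alpha': [0.1, 0.5, 1.0]}  # smoothing strength

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring='recall',                  # recall on the bad (1) class
    cv=StratifiedKFold(n_splits=2),    # keeps both classes in each fold
)
search.fit(titles, targets)
print(search.best_params_, search.best_score_)
```

Scoring on recall rather than accuracy matches the failure seen above, but with this little data the differences between `alpha` values are unlikely to be meaningful.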