# Exploratory Data Analysis of News Articles from Finnhub

Finnhub offers one year of news article summaries for each company.  We have extracted these for every company in the S&P 500 index, using Finnhub's web API.  Longer periods are available with a paid subscription.  Several other providers have similar offerings; so far we have not found much difference in content between one of these services and another.

### Some challenges associated with this data

These are summaries of the news articles.  Sometimes they just repeat the headline, sometimes they are blank.  Only a small portion are of substantial length.

Some articles are repeated (perhaps the same article was sent in from multiple news feeds).

Many articles mention multiple companies, and the company we're covering is just one of them.  The provider appears to tag an article to a company based solely on the company name appearing in the article text.

Much of the article content does not carry sentiment information that could be expected to be predictive for the company, e.g. mundane accounting details.
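
The repeated articles mentioned above could be filtered before any further analysis. The following is only a sketch on a toy frame with made-up values: it assumes that a repeated `id` marks a repeated article (as rows 0 and 2 of the sample output further down suggest), and the second pass on `headline` and `summary` is an extra assumption, not something the download program is known to do.

```python
import pandas as pd

# Toy frame with made-up values; only the column names come from the Finnhub data.
articles = pd.DataFrame({
    "id": [101, 102, 101, 103],
    "headline": ["A", "B", "A", "B"],
    "summary": ["text a", "text b", "text a", "text b"],
})

# Drop rows that repeat an article id, keeping the first occurrence.
deduped_by_id = articles.drop_duplicates(subset="id", keep="first")

# Optionally also drop rows whose headline and summary match an earlier row,
# which would catch the same story arriving under a different id.
deduped = deduped_by_id.drop_duplicates(subset=["headline", "summary"], keep="first")

print(len(articles), len(deduped_by_id), len(deduped))
```

Whether the second pass is appropriate depends on whether identical text from two feeds should count once or twice for the analysis.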



In [171]:
import pandas as pd
import numpy as np
from datetime import datetime
from pytz import timezone
import pytz

In [172]:
from google.colab import drive

drive.mount('/content/drive', force_remount=False)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [173]:
sp500 = pd.read_csv(
    '/content/drive/MyDrive/abnormal-distribution-project-data/cik_data/sp-components.csv', 
    dtype = 'str', 
    index_col='ticker', 
    usecols=['ticker', 'cik']
)

In [174]:
company_frame = pd.read_pickle(
        '/content/drive/MyDrive/abnormal-distribution-project-data/news/' +
        'articles_' + 
        sp500.index[0] + 
        '.pkl')

The provided columns are category, datetime, headline, id, image, related, source, summary, url.  We have added summary_length (the number of characters in the summary) as part of our download program.
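
The download program itself is not shown here. As a rough sketch, a column like `summary_length` could be derived as follows, assuming blank summaries may arrive either as empty strings or as missing values (the toy frame is made up for illustration).

```python
import pandas as pd

# Toy frame standing in for a downloaded company frame (values are made up).
frame = pd.DataFrame({"summary": ["Bull of the Day", "", None]})

# Treat missing summaries as empty strings, then count characters.
frame["summary_length"] = frame["summary"].fillna("").str.len()

print(frame)
```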

In [175]:
company_frame.head()

Unnamed: 0,category,datetime,headline,id,image,related,source,summary,url,summary_length
0,,1575331200,"Why's It Called That? The Story Behind LUV, FU...",31893065,,MMM,Tyree Gorges,,https://www.benzinga.com/general/education/19/...,0
1,,1575417600,"Deutsche Bank Maintains Hold on 3M, Raises Pri...",32425784,,MMM,Vick Meyer,,https://www.benzinga.com/news/19/12/14929087/d...,0
2,,1575331200,"Why's It Called That? The Story Behind LUV, FU...",31893065,,MMM,Tyree Gorges,,https://www.benzinga.com/general/education/19/...,0
3,,1575504000,"3M Chair, CEO, Currently Speaking At Credit Su...",32425782,,MMM,Benzinga Newsdesk,,https://www.benzinga.com/news/19/12/14938429/3...,0
4,,1575417600,"Deutsche Bank Maintains Hold on 3M, Raises Pri...",32425784,,MMM,Vick Meyer,,https://www.benzinga.com/news/19/12/14929087/d...,0


Per the Finnhub documentation, the datetime field is a Unix timestamp, which counts seconds in UTC (Greenwich Mean Time).  We can convert these to New York time for linking to the stock price information.  Some of these turn out to have no values populated for hours, minutes and seconds (they fall exactly on midnight UTC), so we skip the timezone conversion for those cases.

In [176]:
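def convert_time(unix_time):
    # Interpret the Unix timestamp as a timezone-aware UTC time.
    time1 = pytz.timezone('Etc/UTC').localize(pd.to_datetime(unix_time,unit='s'))
    # Values exactly at midnight carry only a date, so leave them in UTC.
    if max(time1.time().hour, time1.time().minute, time1.time().second) == 0:
        return time1
    # Otherwise convert to New York time to line up with stock price data.
    time2 = time1.astimezone(pytz.timezone('America/New_York'))
    return time2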

company_frame['converted_time'] = company_frame['datetime'].apply(convert_time)

company_frame[['datetime','converted_time']].head(20)

Unnamed: 0,datetime,converted_time
0,1575331200,2019-12-03 00:00:00+00:00
1,1575417600,2019-12-04 00:00:00+00:00
2,1575331200,2019-12-03 00:00:00+00:00
3,1575504000,2019-12-05 00:00:00+00:00
4,1575417600,2019-12-04 00:00:00+00:00
5,1575590400,2019-12-06 00:00:00+00:00
6,1575590400,2019-12-06 00:00:00+00:00
7,1575504000,2019-12-05 00:00:00+00:00
8,1575590400,2019-12-06 00:00:00+00:00
9,1575590400,2019-12-06 00:00:00+00:00
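
As an alternative to applying `convert_time` row by row, the same idea can be sketched with vectorised pandas calls. This is not the method used above: it converts every timestamp to New York time and records the "date only" cases in a separate boolean column (a hypothetical name, `date_only`), which keeps a single timezone in the converted column. The two timestamps are illustrative.

```python
import pandas as pd

# Two example Unix timestamps: midnight UTC and 15:00 UTC on 2019-12-03.
unix_times = pd.Series([1575331200, 1575385200])

# Parse as timezone-aware UTC, then convert the whole column at once.
utc_times = pd.to_datetime(unix_times, unit="s", utc=True)
ny_times = utc_times.dt.tz_convert("America/New_York")

# Flag timestamps that fall exactly on midnight UTC (no time of day populated).
date_only = utc_times.dt.normalize() == utc_times

print(pd.DataFrame({"utc": utc_times, "new_york": ny_times, "date_only": date_only}))
```

Downstream code would then need to decide how to treat the flagged rows, for example by using only the UTC date.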


### Get number of articles for each company

In [177]:
num_articles = np.zeros(len(sp500))

for i in range(len(sp500)):
    idx = sp500.index[i]
    company_frame = pd.read_pickle(
        '/content/drive/MyDrive/abnormal-distribution-project-data/news/' +
        'articles_' + 
        idx + 
        '.pkl')
    num_articles[i] = len(company_frame)

sp500['num_articles'] = num_articles


In [178]:
sp500 = sp500.sort_values(by=['num_articles'])

### Companies with the fewest articles

In [179]:
sp500.head(20)

ticker,cik,num_articles
VTRS,1792044,5.0
VNT,1786842,12.0
LUMN,18926,33.0
OTIS,1781335,181.0
AMCR,1748790,195.0
PEAK,765880,197.0
HWM,4281,282.0
NWS,1564708,284.0
J,52988,288.0
CARR,1783180,305.0


### Companies with the most articles

In [180]:
sp500.tail(20)

ticker,cik,num_articles
F,37996,12838.0
INTC,50863,12975.0
WMT,104169,13219.0
STT,93751,16226.0
BA,12927,16574.0
BAC,70858,20043.0
JPM,19617,20121.0
WFC,72971,20164.0
DIS,1744489,21844.0
GS,886982,22231.0


### For illustrative examples, we'll take the companies at positions 50, 150, 250, 350 and 450 when sorted by number of articles (fewest first)

In [181]:
print(sp500.iloc[50])
print(sp500.iloc[150])
print(sp500.iloc[250])
print(sp500.iloc[350])
print(sp500.iloc[450])

cik             1013871
num_articles        517
Name: NRG, dtype: object
cik             822416
num_articles       789
Name: PHM, dtype: object
cik             1136893
num_articles       1137
Name: FIS, dtype: object
cik             36104
num_articles     1724
Name: USB, dtype: object
cik             804328
num_articles      4580
Name: QCOM, dtype: object


The companies are NRG, PHM, FIS, USB and QCOM.
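
The same checks are applied to each of these companies: count the articles, drop blank summaries, count long summaries, and list the most common sources. A minimal sketch of a helper that bundles these steps is shown here; `summarise_articles` and its `long_threshold` parameter are hypothetical names, and the toy frame uses made-up values.

```python
import pandas as pd

def summarise_articles(frame: pd.DataFrame, long_threshold: int = 1000) -> dict:
    """Hypothetical helper bundling the per-company checks."""
    # Keep only rows with a non-blank summary.
    non_blank = frame[frame["summary_length"] > 0]
    # Among those, keep summaries at or above the length threshold.
    long_only = non_blank[non_blank["summary_length"] >= long_threshold]
    return {
        "total": len(frame),
        "non_blank": len(non_blank),
        "long": len(long_only),
        "top_sources": non_blank["source"].value_counts().head(20),
    }

# Toy frame with made-up values, standing in for one company's articles.
toy = pd.DataFrame({
    "source": ["Yahoo", "Yahoo", "Nasdaq", "Benzinga"],
    "summary_length": [0, 120, 1500, 40],
})
stats = summarise_articles(toy)
print(stats["total"], stats["non_blank"], stats["long"])
```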

### NRG

In [182]:
company_frame = pd.read_pickle(
        '/content/drive/MyDrive/abnormal-distribution-project-data/news/' +
        'articles_' + 
        'NRG' + 
        '.pkl')

# remove blank summaries
print("Number of articles including blanks: " + str(len(company_frame)))
company_frame = company_frame[company_frame['summary_length'] > 0]
print("Number of articles excluding blanks: " + str(len(company_frame)))

Number of articles including blanks: 517
Number of articles excluding blanks: 423


Some example summaries

In [183]:
print(company_frame['summary'].iloc[0])
print(company_frame['summary'].iloc[1])
print(company_frame['summary'].iloc[2])
print(company_frame['summary'].iloc[3])

The three major U.S. stock market indexes dropped as China labelled President Donald Trump’s signing of bills supporting protesters in Hong Kong.
Looking at the underlying holdings of the ETFs in our coverage universe at ETF Channel, we have compared the trading price of each holding against the average analyst 12-month forward target price, and computed the weighted average implied analyst target price for the ETF itsel
Apple makes this list. So do Netflix and Take-Two Interactive Software.
Let's see if NRG Energy (NRG) stock is a good choice for value-oriented investors right now from multiple angles.


Most Common Sources

In [184]:
company_frame['source'].value_counts().head(20)

Yahoo                                  104
THELINCOLNIANONLINE                     50
YAHOO                                   33
Nasdaq                                  30
https://www.houstonchronicle.com        18
MarketWatch                             15
businesswire                            14
Benzinga                                13
https://www.thelincolnianonline.com     12
HOUSTONCHRONICLE                        11
marketwatch                             11
Investing News Network                   8
SIMPLYWALL                               6
BUSINESSWIRE                             6
benzinga                                 6
SEEKINGALPHA                             5
THEFLY                                   5
barrons                                  4
seekingalpha.com                         4
seekingalpha                             4
Name: source, dtype: int64
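
The source labels in the counts above are not consistent: the same outlet appears as `Yahoo` and `YAHOO`, or as `https://www.houstonchronicle.com` and `HOUSTONCHRONICLE`. One possible way to merge such variants before counting is sketched here. The `normalise_source` helper is hypothetical and its rules are assumptions chosen for these examples; it would not merge labels such as `ZACKS` and `Zacks Investment Research`.

```python
import re
import pandas as pd

def normalise_source(label: str) -> str:
    """Hypothetical helper: reduce a source label to a lowercase key."""
    key = label.strip().lower()
    key = re.sub(r"^https?://(www\.)?", "", key)  # drop URL scheme and leading www.
    key = re.sub(r"\.(com|net|org)$", "", key)    # drop a trailing common TLD
    key = re.sub(r"[^a-z0-9]", "", key)           # drop spaces and punctuation
    return key

# A few labels taken from the counts above.
sources = pd.Series([
    "Yahoo", "YAHOO",
    "https://www.houstonchronicle.com", "HOUSTONCHRONICLE",
    "seekingalpha.com", "SEEKINGALPHA",
])

print(sources.map(normalise_source).value_counts())
```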

Looking at only the summaries of significant length (at least 1,000 characters)

In [185]:
company_frame_longonly = company_frame[company_frame['summary_length'] >= 1000]
print("Number of long summaries: " + str(len(company_frame_longonly)))
print("\n")
company_frame_longonly['source'].value_counts().head(20)

Number of long summaries: 15




Yahoo                               8
HOUSTONCHRONICLE                    3
https://www.houstonchronicle.com    2
REUTERS                             1
Green Technology                    1
Name: source, dtype: int64

Some examples of long summaries

In [186]:
company_frame_longonly['summary'].iloc[0]

'When investors think defense, they think utility stocks.The S&P; 500\'s utility sector behaved anything but defensively in 2019, however. The Utilities Select Sector SPDR Fund (XLU) delivered a 25.9% total return last year - better than more than half the index\'s sectors, including the revamped, "growthier" communications sector and consumer discretionary stocks.That\'s surely a pleasant surprise for utility-stock investors. Many enter the sector looking not for growth, but stability in down markets and the dependable dividends these companies can afford thanks to the often regulated nature of the utility business.Like most of the market, utility stocks did get stretched as a result of their 2019 run. "The utility sector currently trades at a P/E of 19.3x, versus a 15-year historical average of 15.01x, which represents a 29% premium to the S&P; 500," Michael Sheldon, executive director and CIO of financial planner RDM Financial Group, told Kiplinger in a December email.While Sheldon 

In [187]:
company_frame_longonly['summary'].iloc[1]

'NEW YORK (AP) — Changes announced in corporate dividends Jan. 20-Jan. 24. INCREASED DIVIDENDS Air Products and Chemicals 1.34 from 1.16 Bank of the James .07 from .06 Cambridge Bancorp .53 from .51 Capital Product Partners .35 from .315 Comcast Corp Cl A .23 from .21 Cortland Bancorp .14 from .12 Dominion Energy .94 from .9175 Enterprise Bancorp .175 from .16 Enterprise Finl Svcs .18 from .17 First Bancshares (The) .10 from .08 First Community Corp .12 from .11 Graham Holdings 1.45 from 1.39 Heartland Financial USA .20 from .18 Heritage Commerce .13 from .12 Heritage Financial Corp .20 from .19 Intel Corp .33 from .31 JB Hunt Transport Svcs .27 from .26 Kimberly-Clark 1.07 from 1.03 MPLX LP .6875 from .6675 Mercantile Bank .28 from .27 MidWestOne Financial Grp .22 from .2025 NRG Energy Inc .30 from .03 OP Bancorp .07 from .05 Old National Bancorp .14 from .13 Old Valley Bancorp CA .14 from .135 One Gas Inc .54 from .50 Orrstown Financial .17 from .15 PCB Bancorp .10 from .08 Pacific P

### PHM

In [188]:
company_frame = pd.read_pickle(
        '/content/drive/MyDrive/abnormal-distribution-project-data/news/' +
        'articles_' + 
        'PHM' + 
        '.pkl')

# remove blank summaries
print("Number of articles including blanks: " + str(len(company_frame)))
company_frame = company_frame[company_frame['summary_length'] > 0]
print("Number of articles excluding blanks: " + str(len(company_frame)))

Number of articles including blanks: 789
Number of articles excluding blanks: 694


In [189]:
print(company_frame['summary'].iloc[0])
print(company_frame['summary'].iloc[1])
print(company_frame['summary'].iloc[2])
print(company_frame['summary'].iloc[3])

Bull of the Day: PulteGroup (PHM)
Looking at the universe of stocks we cover at Dividend Channel, on 12/17/19, PulteGroup Inc (Symbol: PHM), Banc Of California Inc (Symbol: BANC), and Huntington Bancshares Inc (Symbol: HBAN) will all trade ex-dividend for their respective upcoming dividends.  PulteGroup Inc wil
Looking at the universe of stocks we cover at Dividend Channel, on 12/17/19, PulteGroup Inc (Symbol: PHM), Banc Of California Inc (Symbol: BANC), and Huntington Bancshares Inc (Symbol: HBAN) will all trade ex-dividend for their respective upcoming dividends.  PulteGroup Inc wil
Cryptocurrency has long been something given away. In fact, the first Bitcoin (no, not a whole coin) this writer ever received was through a BTC faucet, which if you remember are these services that play ads and grant viewers a few satoshis here or there, often $0.25 or so. According to a top philanthropist, giving... The post What’s the Best Way to Drive Bitcoin Adoption? Billionaire Says Crypto Giveaway

In [190]:
company_frame['source'].value_counts().head(20)

Yahoo                                  128
Nasdaq                                  76
YAHOO                                   68
ZACKS                                   56
THELINCOLNIANONLINE                     51
businesswire                            34
https://www.thelincolnianonline.com     27
marketwatch                             27
THEFLY                                  27
SEEKINGALPHA                            20
investing                               16
seekingalpha.com                        15
BENZINGA                                13
MarketWatch                             13
FOOL                                    11
seekingalpha                            10
Benzinga                                 9
cnbc                                     9
Zacks Investment Research                8
SIMPLYWALL                               6
Name: source, dtype: int64

In [191]:
company_frame_longonly = company_frame[company_frame['summary_length'] >= 1000]
print("Number of long summaries: " + str(len(company_frame_longonly)))
print("\n")
company_frame_longonly['source'].value_counts().head(20)

Number of long summaries: 10




Yahoo                     6
Benzinga Feeds            2
Nasdaq                    1
https://blockboard.net    1
Name: source, dtype: int64

In [192]:
company_frame_longonly['summary'].iloc[0]

'Companies Reporting Before The Bell United Technologies Corporation (NYSE: UTX ) is estimated to report quarterly earnings at $1.84 per share on revenue of $19.37 billion. 3M Company (NYSE: MMM ) is expected to report quarterly earnings at $2.10 per share on revenue of $8.12 billion. Harley-Davidson, Inc. (NYSE: HOG ) is projected to report quarterly earnings at $0.09 per share on revenue of $918.54 million. McCormick & Company, Incorporated (NYSE: MKC ) is expected to report quarterly earnings at $1.61 per share on revenue of $1.52 billion. Pfizer Inc. (NYSE: PFE ) is estimated to report quarterly earnings at $0.57 per share on revenue of $12.61 billion. Lockheed Martin Corporation (NYSE: LMT ) is expected to report quarterly earnings at $5.02 per share on revenue of $15.27 billion. HCA Healthcare, Inc. (NYSE: HCA ) is projected to report quarterly earnings at $3.09 per share on revenue of $13.37 billion. Xerox Holdings Corporation (NYSE: XRX ) is estimated to report quarterly earnin

In [193]:
company_frame_longonly['summary'].iloc[1]

'Companies Reporting Before The Bell United Technologies Corporation (NYSE: UTX ) is estimated to report quarterly earnings at $1.84 per share on revenue of $19.37 billion. 3M Company (NYSE: MMM ) is expected to report quarterly earnings at $2.10 per share on revenue of $8.12 billion. Harley-Davidson, Inc. (NYSE: HOG ) is projected to report quarterly earnings at $0.09 per share on revenue of $918.54 million. McCormick & Company, Incorporated (NYSE: MKC ) is expected to report quarterly earnings at $1.61 per share on revenue of $1.52 billion. Pfizer Inc. (NYSE: PFE ) is estimated to report quarterly earnings at $0.57 per share on revenue of $12.61 billion. Lockheed Martin Corporation (NYSE: LMT ) is expected to report quarterly earnings at $5.02 per share on revenue of $15.27 billion. HCA Healthcare, Inc. (NYSE: HCA ) is projected to report quarterly earnings at $3.09 per share on revenue of $13.37 billion. Xerox Holdings Corporation (NYSE: XRX ) is estimated to report quarterly earnin

### FIS

In [194]:
company_frame = pd.read_pickle(
        '/content/drive/MyDrive/abnormal-distribution-project-data/news/' +
        'articles_' + 
        'FIS' + 
        '.pkl')

# remove blank summaries
print("Number of articles including blanks: " + str(len(company_frame)))
company_frame = company_frame[company_frame['summary_length'] > 0]
print("Number of articles excluding blanks: " + str(len(company_frame)))

Number of articles including blanks: 1137
Number of articles excluding blanks: 702


In [195]:
print(company_frame['summary'].iloc[0])
print(company_frame['summary'].iloc[1])
print(company_frame['summary'].iloc[2])
print(company_frame['summary'].iloc[3])

Core banking providers FIS and Fiserv face pressure from banks’ growing in-house tech investments as well as fintech competition, Morgan Stanley reported in a research note this week. The study argued that FIS and Fiserv continue to be good businesses, with FIS historically serving clients with more than $1 billion in assets and Fiserv catering […]
JACKSONVILLE, Fla.--(BUSINESS WIRE)--In a major expansion of their work together, VyStar Credit Union—which is headquartered in Jacksonville, Florida and is one of the country’s largest credit unions—will be moving its credit card production and processing services to FIS™ (NYSE: FIS), the financial technology solutions leader announced today. In 2018, VyStar, which serves over 690,000 members, started planning their move to a new, modern front end for their core banking solution from FIS to su
JACKSONVILLE, Fla.--(BUSINESS WIRE)--FIS™ (NYSE: FIS) today announced an agreement with JCB, the leading issuer and acquirer in Japan and a global pa

In [196]:
company_frame['source'].value_counts().head(20)

Yahoo                                  131
businesswire                           102
THELINCOLNIANONLINE                     74
Nasdaq                                  70
YAHOO                                   66
marketwatch                             20
SEEKINGALPHA                            16
ZACKS                                   14
BUSINESSWIRE                            10
MarketWatch                             10
barrons                                  9
seekingalpha                             9
https://www.thelincolnianonline.com      8
SMARTERANALYST                           8
Business Wire                            8
REUTERS                                  7
BENZINGA                                 7
SIMPLYWALL                               6
seekingalpha.com                         6
Zacks Investment Research                6
Name: source, dtype: int64

In [197]:
company_frame_longonly = company_frame[company_frame['summary_length'] >= 1000]
print("Number of long summaries: " + str(len(company_frame_longonly)))
print("\n")
company_frame_longonly['source'].value_counts().head(20)

Number of long summaries: 11




Yahoo               5
Business Insider    3
Benzinga Feeds      2
reuters             1
Name: source, dtype: int64

In [198]:
company_frame_longonly['summary'].iloc[0]

"This story was delivered to Business Insider Intelligence Fintech Pro subscribers earlier this morning. To get this story plus others to your inbox each day, hours before they're published on Business Insider, click here. US-based Brex, the $2.6 billion-valued corporate credit card startup, is planning to broaden its product suite beyond its core banking services, reports Bank Innovation. The startup's newly appointed COO Paul-Henri Ferrand told the outlet that the three-year-old company aims to bolster its existing service with the introduction of insurance, lending, and treasury products, though he didn't specify when. Brex, which launched publicly in June 2018, offers corporate credit cards to startups, a segment that's long been underserved by incumbent lenders. It expanded into banking in October 2019, with a limited rollout of a cash management account; Ferrand says Brex plans to roll out the product widely by the end of March 2020. Bolstering its product suite can help Brex dee

In [199]:
company_frame_longonly['summary'].iloc[1]

"This story was delivered to Business Insider Intelligence Payments & Commerce subscribers earlier this morning. To get this story plus others to your inbox each day, hours before they're published on Business Insider, click here . Fiserv recorded almost $3.7 billion in internal revenue in Q4 2019, growing 5% year-over-year (YoY), per the company's earnings release . First Data, which Fiserv acquired for $22 billion in a deal that closed in July 2019, contributed 61% of the total company's internal revenue in the quarter — worth over $2.2 billion — and grew 6% YoY. With First Data driving both Fiserv's revenue and growth in Q4 2019, it's set up to be a core part of the business for years to come. Fiserv appears set to find new short- and long-term revenue opportunities thanks to the acquisition. The company believes it's on track to bring in at least $100 million in revenue synergies in 2020, Frank Bisignano, Fiserv COO and former First Data CEO, said on the company's earnings call . B

### USB

In [200]:
company_frame = pd.read_pickle(
        '/content/drive/MyDrive/abnormal-distribution-project-data/news/' +
        'articles_' + 
        'USB' + 
        '.pkl')

# remove blank summaries
print("Number of articles including blanks: " + str(len(company_frame)))
company_frame = company_frame[company_frame['summary_length'] > 0]
print("Number of articles excluding blanks: " + str(len(company_frame)))

Number of articles including blanks: 1724
Number of articles excluding blanks: 1171


In [201]:
print(company_frame['summary'].iloc[0])
print(company_frame['summary'].iloc[1])
print(company_frame['summary'].iloc[2])
print(company_frame['summary'].iloc[3])

Regional banks are struggling to move away from the troubled London interbank offered rate, saying alternatives to the key benchmark for variable-rate debt could hurt their ability to make new loans. 
If your PS4 won't update, there are several things you can try to get things working again. Try to update your PS4 manually, instead of letting automatic updates take care of everything. If it works, this may solve the problem, and future updates will happen automatically. You can also try to delete notifications, or install the update in Safe Mode . Visit Business Insider's homepage for more stories . Updates are an important part of keeping your PS4 in working order. They include bug fixes, new features, security updates, and more. In short, you should make sure your console always installs the latest system updates. Usually, this happens automatically, but sometimes glitches can prevent that from happening. If your PS4 won't update, here's how to fix the issue. Check out the products m

In [202]:
company_frame['source'].value_counts().head(20)

Yahoo                                  148
businesswire                           124
Nasdaq                                 117
THELINCOLNIANONLINE                     98
https://www.thelincolnianonline.com     71
YAHOO                                   52
BGR.com                                 46
seekingalpha.com                        39
Mac Rumors                              38
https://www.americanbanker.com          31
marketwatch                             22
https://www.fool.com                    20
BIZJOURNALS                             18
THEFLY                                  18
SEEKINGALPHA                            17
Business Insider                        15
seekingalpha                            14
BENZINGA                                13
barrons                                 12
MarketWatch                             11
Name: source, dtype: int64

In [203]:
company_frame_longonly = company_frame[company_frame['summary_length'] >= 1000]
print("Number of long summaries: " + str(len(company_frame_longonly)))
print("\n")
company_frame_longonly['source'].value_counts().head(20)

Number of long summaries: 119




BGR.com                        45
Mac Rumors                     37
Business Insider               15
Yahoo                           7
Nasdaq                          3
Benzinga Feeds                  3
https://www.newstimes.com       2
https://www.sfchronicle.com     2
https://www.ctpost.com          2
Sify.com                        1
Benzinga                        1
REUTERS                         1
Name: source, dtype: int64

In [204]:
company_frame_longonly['summary'].iloc[0]

"If your PS4 won't update, there are several things you can try to get things working again. Try to update your PS4 manually, instead of letting automatic updates take care of everything. If it works, this may solve the problem, and future updates will happen automatically. You can also try to delete notifications, or install the update in Safe Mode . Visit Business Insider's homepage for more stories . Updates are an important part of keeping your PS4 in working order. They include bug fixes, new features, security updates, and more. In short, you should make sure your console always installs the latest system updates. Usually, this happens automatically, but sometimes glitches can prevent that from happening. If your PS4 won't update, here's how to fix the issue. Check out the products mentioned in this article: PlayStation 4 (From $299.99 at Best Buy) How to make sure your PS4 updates Manually install the update If your PS4 won't install an update automatically, you may be able to f

In [205]:
company_frame_longonly['summary'].iloc[1]

'Apple today released iOS 13.3, the third major update to the iOS and iPadOS 13 operating systems. The new software updates come two weeks after the release of iOS/iPadOS 13.2.3 and more than a month after the launch of iOS 13.2 , which brought new emoji. The iOS and \u200c\u200c\u200ciPadOS\u200c\u200c\u200c 13.3 updates are available on all eligible devices over-the-air in the Settings app. To access the updates, go to Settings > General > Software Update. iOS 13.3 continues to add features that were originally promised for iOS 13 but were ultimately eliminated during the beta testing process, with the update introducing Communication Limits for Screen Time. Communication Limits let parents control who their children are able to contact, with the feature covering FaceTime, Phone, Messages, and iCloud Contacts. Calls to emergency numbers are always allowed and when placed, will turn off communication limits for 24 hours to make sure children are safe and not restricted from communicat

### QCOM

In [206]:
company_frame = pd.read_pickle(
        '/content/drive/MyDrive/abnormal-distribution-project-data/news/' +
        'articles_' + 
        'QCOM' + 
        '.pkl')

# remove blank summaries
print("Number of articles including blanks: " + str(len(company_frame)))
company_frame = company_frame[company_frame['summary_length'] > 0]
print("Number of articles excluding blanks: " + str(len(company_frame)))

Number of articles including blanks: 4580
Number of articles excluding blanks: 4093


In [207]:
print(company_frame['summary'].iloc[0])
print(company_frame['summary'].iloc[1])
print(company_frame['summary'].iloc[2])
print(company_frame['summary'].iloc[3])

MediaTek has spilled all the details about its new 5G system-on-a-chip, claiming that at the time of manufacture it has six world firsts inside, and a mass of cutting edge technology, ready to connect phones and other devices to Sub-6 5G networks all over the world. It’s small, but it packs in a massive amount of cool tech.
And there’s a new 5G smartphone chip too - the Dimensity 1000
The Samsung Galaxy A50 has a beefy battery, a big screen, a slick software experience, and solid performance. At just $350, it’s an exceptional phone -- but it faces stiff competition from the likes of Google’s Pixel 3a. Here’s our Galaxy A50 review.
The original Pixelbook was beautiful, but a bit too expensive. The new Pixelbook Go takes a different approach. It starts at a more affordable $649, without sacrificing build quality or performance. Is this the first Google Chromebook that's actually worth buying?


In [208]:
company_frame['source'].value_counts().head(20)

businesswire                      342
Yahoo                             328
Nasdaq                            250
https://www.forbes.com            166
https://www.androidcentral.com    138
marketwatch                       118
YAHOO                             113
https://www.rcrwireless.com       107
THEFLY                             86
investing                          84
https://www.digitaltrends.com      80
cnbc                               72
seekingalpha                       70
THELINCOLNIANONLINE                69
ZACKS                              58
https://www.slashgear.com          51
benzinga                           49
https://www.techradar.com          49
seekingalpha.com                   46
SEEKINGALPHA                       46
Name: source, dtype: int64

In [209]:
company_frame_longonly = company_frame[company_frame['summary_length'] >= 1000]
print("Number of long summaries: " + str(len(company_frame_longonly)))
print("\n")
company_frame_longonly['source'].value_counts().head(20)

Number of long summaries: 137




Yahoo                                   81
Business Insider                        11
BGR.com                                 11
Nasdaq                                   5
HOUSTONCHRONICLE                         3
Sify.com                                 3
The Economic Times India                 3
Zero Hedge                               3
ABC                                      2
https://www.sfchronicle.com              2
Globo                                    2
Benzinga Feeds                           2
Mac Rumors                               2
reuters                                  2
https://economictimes.indiatimes.com     1
https://www.houstonchronicle.com         1
IHS Markit                               1
Kurier.at                                1
https://www.newstimes.com                1
Name: source, dtype: int64

In [210]:
company_frame_longonly['summary'].iloc[0]

'Qualcomm Inc, the world’s top mobile chip supplier, said India must not go slow on 5G as deployment of this next-gen wireless broadband technology will determine its economic fortunes, competitiveness and the fate of its local manufacturing ambitions. The company’s president Cristiano Amon told ET’s Kalyan Parbat that the US chipmaker’s 2020 plans include making its 5G chipsets available in the average $250-350 price band from current $1000-plus levels, to drive affordability. Edited excerpts.5G devices phones powered by Qualcomm’s latest 5G mobile chip, the Snapdragon 865, will be commercially available by early-2020. But India, a key market for Qualcomm, is yet to even go 5G. Does that bother you?No, it doesn’t, because unlike 3G and 4G, 5G wireless technology is not unique to public (telecom) networks alone, and India can take advantage of that. While it may take longer for telcos to build 5G (with spectrum auctions yet to happen), India could start with private 5G networks for uni

In [211]:
company_frame_longonly['summary'].iloc[1]

"The world’s cheapest 5G phone costs just $285, but you might not be able to buy it After years of talking about 5G, carriers have finally rolled out 5G networks in 2019 and started selling several handsets ready to support the faster data transfers. However, most of these phones are more expensive than their 4G equivalents, and buying one makes little sense unless you happen to live or work in the vicinity of those few places where 5G actually works. Come 2020, however, 5G phones will become even more affordable, just as 5G coverage expands around the world. And it all starts with a 5G phone that costs only $285. Sadly, it might not be available in your market anytime soon. Not only is the Redmi K30 ready to support 5G out of the box, but the phone will also rock a 120Hz 6.67-inch screen, which is the kind of screen refresh rate that you’d see on a gaming Android handset or some of this year’s flagship handsets. The phone's design is similar to Samsung’s Galaxy S10+. We’re looking at 