Sections:
1. EDA
2. NLP on whole corpus
3. NER Prototypiong
4. Sentiment Prototyping


# 1. EDA



This workbook contains mostly EDA and an introduction to the NYT corpus we are working with. In addition, we prototype the mechanisms for NER which we will use to analyse individual teams and players for further analysis.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import os
from collections import Counter
from textblob import TextBlob, Word
import nltk
from nltk.corpus import stopwords
import string
import spacy

In [2]:
%matplotlib inline

In [3]:
articles_list = pd.read_csv('../nyt_scrape/nyt_article_list.csv')

In [4]:
articles_list['title'] = articles_list.article_urls.map(lambda x : x.split('/')[-1].split('.')[0])

In [5]:
articles_list.head()

Unnamed: 0,published_date,article_urls,article_summary,article_headline,title
0,2018-06-09T17:31:27+0000,https://www.nytimes.com/2018/06/09/sports/nba-...,"Accused of making the sport uncompetitive, the...",The Warriors Were Dominant. But How Dominant?,nba-finals-sweep
1,2018-06-09T01:26:37+0000,https://www.nytimes.com/2018/06/08/movies/kyri...,"In his most extensive comments to date, the Bo...",Kyrie Irving Doesn’t Know if the Earth Is Roun...,kyrie-irving-nba-celtics-earth
2,2018-06-08T22:00:03+0000,https://www.nytimes.com/2018/06/08/sports/nba-...,Kevin Durant was named the finals’ M.V.P. agai...,"Warriors, in Full Dynasty Mode, Sweep Cavalier...",nba-finals-warriors-cavs
3,2018-06-08T17:33:17+0000,https://www.nytimes.com/2018/06/08/sports/lebr...,"If James wants to beat the Warriors, he may ne...","LeBron James Reveals an Injury, but His Destin...",lebron-james-free-agency
4,2018-06-08T13:16:38+0000,https://www.nytimes.com/2018/06/08/sports/game...,The animated show combining elements of “Game ...,The ‘Game of Zones’ Guys Knew You Wanted a Bry...,game-of-zones


In [6]:
dir_list = os.listdir('../nyt_scrape/articles')
dir_list_split = [name.split('_')[0] for name in dir_list]

Checking that two article indices of same length:

In [7]:
assert len(articles_list) == len(dir_list_split)

Total articles:

In [8]:
print(len(articles_list))

320


Sample article:

In [9]:
with open('../nyt_scrape/articles/'+np.random.choice(dir_list)) as f:
    print(f.read()[:1000],'...')

Kristaps Porzingis scored a career-high 38 points and Kyle O’Quinn added 15 points and 12 rebounds to help the Knicks beat the Denver Nuggets, 116-110, at Madison Square Garden on Monday night.
Tim Hardaway Jr. scored all 13 of his points in the fourth quarter after the Knicks had blown all of their 23-point third-quarter lead.
Nikola Jokic led the Nuggets with 28 points and Jamal Murray scored 20.
The Knicks had taken their biggest lead of the game, 69-46, on Courtney Lee’s layup with 10 minutes 38 seconds left in the third quarter. However, the Nuggets responded by scoring the next 13 points to ignite a 27-2 run, and they took their first lead, 73-71, on Wilson Chandler’s free throws with 4:42 left in the third quarter.
The Knicks committed 10 turnovers during that six-minute stretch.
The game was tied 77-77 when Doug McDermott’s 3-pointer put the Knicks back in front for good with 1:09 left in the third.
The Knicks led 84-81 going into the final quarter.
Will Barton brought the Nugg

Parse dates:

In [10]:
articles_list['date'] = articles_list.published_date.apply(lambda x : x.split('T')[0])

In [11]:
articles_list.date = pd.to_datetime(articles_list.date)

Rewrite this to csv for convenience next time:

In [12]:
articles_list.to_csv('../nyt_scrape/articles_list_w_date.csv',index=False)

# 2. NLP on whole Corpus:

In [13]:
corpus = ''

for article in dir_list:
    with open(f'../nyt_scrape/articles/{article}') as f:
        corpus += f.read()
        corpus += '\n'

corpus = corpus.replace('’','').replace('”','').replace('“','').replace('—','')

Approximate number of words per article (assuming 6 characters per word):

In [14]:
len(corpus)/(len(dir_list)*6)

797.4786458333333

Word frequencies:

In [15]:
corpus_blob = TextBlob(corpus)

In [16]:
corpus_words = corpus_blob.words
corpus_words = [word.lower() for word in corpus_words if word.lower() not in stopwords.words('english')\
                                                         and word not in string.punctuation]
c = Counter(corpus_words)

In [17]:
c.most_common()

[('said', 1583),
 ('game', 1358),
 ('points', 962),
 ('team', 933),
 ('season', 844),
 ('n.b.a', 731),
 ('one', 700),
 ('first', 663),
 ('james', 633),
 ('players', 602),
 ('knicks', 590),
 ('two', 529),
 ('coach', 524),
 ('warriors', 521),
 ('games', 514),
 ('would', 513),
 ('like', 508),
 ('last', 479),
 ('teams', 463),
 ('time', 454),
 ('basketball', 451),
 ('new', 431),
 ('play', 404),
 ('league', 385),
 ('cavaliers', 370),
 ('player', 366),
 ('back', 351),
 ('even', 349),
 ('night', 348),
 ('made', 347),
 ('years', 346),
 ('also', 340),
 ('could', 333),
 ('get', 321),
 ('three', 308),
 ('golden', 308),
 ('going', 306),
 ('left', 304),
 ('celtics', 302),
 ('rockets', 298),
 ('quarter', 294),
 ('state', 290),
 ('rebounds', 288),
 ('much', 281),
 ('series', 280),
 ('way', 279),
 ('conference', 278),
 ('lead', 276),
 ('year', 270),
 ('scored', 264),
 ('cleveland', 264),
 ('second', 263),
 ('played', 258),
 ('finals', 258),
 ('minutes', 258),
 ('ball', 249),
 ('still', 248),
 ('mr', 24

Restricting to nouns:

In [18]:
tags = corpus_blob.pos_tags

In [19]:
tags = [tag for tag in tags if tag[1] in {'NN','NNS','NNP','NNPS'}]

In [20]:
tags = [Word(tag[0].lower()) for tag in tags]

In [21]:
lemma_tags = [tag.lemmatize() for tag in tags]
lemma_tags = [tag for tag in tags if tag not in ['s','t']]

In [22]:
c2 = Counter(lemma_tags)
c2.most_common(20)

[('game', 1350),
 ('points', 960),
 ('team', 927),
 ('season', 844),
 ('n.b.a', 679),
 ('james', 633),
 ('players', 602),
 ('knicks', 590),
 ('warriors', 521),
 ('games', 514),
 ('coach', 499),
 ('time', 454),
 ('teams', 447),
 ('basketball', 439),
 ('league', 377),
 ('cavaliers', 370),
 ('player', 366),
 ('night', 348),
 ('years', 346),
 ('celtics', 302)]

# 3. Prototype NER techniques:

In [23]:
nlp = spacy.load('en_core_web_sm')

Note spacy restricts us to using a maximum of two thirds of the corpus:

In [24]:
doc = nlp(corpus[:999999])

In [25]:
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

Virginia 32 40 GPE
Julius Erving 70 83 ORG
the New York Nets 87 104 ORG
1973 108 112 DATE
Erving 114 120 ORG
first 132 137 ORDINAL
Nassau Coliseum 149 164 ORG

 254 255 GPE
Erving 273 279 ORG

 386 387 GPE
Coliseum 391 399 ORG
Erving 428 434 ORG
Long Island 469 480 LOC
Roosevelt High School 504 525 ORG
three seasons 535 548 DATE
two 634 637 CARDINAL
A.B.A. 638 644 GPE
the N.B.A. and Erving 689 710 ORG
the Philadelphia 76ers 723 745 ORG

 746 747 GPE
Erving 751 757 ORG
Long Island 782 793 LOC
this weekend 857 869 DATE
Coliseum 923 931 GPE
first 940 945 ORDINAL
decades 954 961 DATE
J 972 973 PERSON

 1008 1009 GPE
The Long Island Nets 1009 1029 FAC
the N.B.A. G League 1039 1058 ORG
the Brooklyn Nets 1072 1089 ORG
Erving 1105 1111 ORG
Don Ryan 1138 1146 PERSON
Saturdays season 1151 1167 DATE
the Fort Wayne Mad Ants 1183 1206 ORG
New York Nets 1220 1233 GPE
Ryan 1262 1266 PERSON

 1364 1365 GPE
opening night 1375 1388 TIME
Alton Byrd 1487 1497 PERSON

 1559 1560 GPE
Coliseum 1657 1665 PERS

114 16020 16023 CARDINAL
Wizards 106 16025 16036 ORG
23 16053 16055 CARDINAL
12 16067 16069 CARDINAL
Delon Wright 16080 16092 PERSON
11 16100 16102 CARDINAL
18 16110 16112 CARDINAL
the fourth quarter 16123 16141 DATE
Toronto 16152 16159 GPE
10 16178 16180 CARDINAL
Eastern Conference 16227 16245 LOC
Washington 16252 16262 GPE

 16263 16264 GPE
DeMar DeRozan 16264 16277 PERSON
17 16284 16286 CARDINAL
C. J. Miles 16295 16306 PERSON
G Anunoby 16313 16322 ORG
12 16332 16334 CARDINAL
Kyle Lowry 16340 16350 ORG
11 16355 16357 CARDINAL
9 16369 16370 CARDINAL
second 16476 16482 ORDINAL
Philadelphia 16497 16509 GPE
2001 16513 16517 DATE

 16518 16519 GPE
Toronto 16519 16526 GPE
N.B.A 16550 16555 GPE
10 16563 16565 CARDINAL
Game 16578 16582 FAC
six 16608 16611 CARDINAL

 16620 16621 GPE
DeRozan 16645 16652 PERSON
Worry about Game 2. 16673 16692 ORG

 16692 16693 GPE
the Eastern Conference 16716 16738 LOC
Toronto 16740 16747 GPE
16 16753 16755 CARDINAL
30 3-point 16759 16769 CARDINAL
Miles 16785 1

CELTICS 30984 30991 ORG
92 30992 30994 CARDINAL
Khris Middleton 30995 31010 ORG
23 31018 31020 CARDINAL
Milwaukee 31042 31051 GPE
Boston 31057 31063 GPE
two 31093 31096 CARDINAL
one 31106 31109 CARDINAL
NYT)
 31112 31117 ORG
WINSLOW 31117 31124 ORG
Miamis 31148 31154 NORP
Justise Winslow 31155 31170 PERSON
15,000 31172 31178 MONEY
Philadelphia 31232 31244 GPE
Joel Embiids 31252 31264 GPE
Thursday 31273 31281 DATE
Winslow 31283 31290 PERSON
Embiids 31316 31323 NORP
AP 31366 31368 ORG

 31369 31370 GPE
Michael Jordan 31413 31427 PERSON
LeBron James 31489 31501 PERSON
6 31528 31529 CARDINAL
3 31532 31533 CARDINAL

 31534 31535 GPE
6 31589 31590 CARDINAL
3 31593 31594 CARDINAL
Jordans 31637 31644 NORP
6 31645 31646 CARDINAL
Jamess 31677 31683 ORG
3-5 31684 31687 PERCENT
Jerry West   31825 31837 PERSON
1 31845 31846 CARDINAL

 31897 31898 GPE
first 32014 32019 ORDINAL
Saturday 32045 32053 DATE

 32216 32217 GPE
Stephen Curry 32220 32233 PERSON
the Golden State Warriors 32238 32263 GPE
the l

 51428 51429 GPE
Fuzzy Levane 51465 51477 PERSON
Ned Irish 51533 51542 PERSON
four 51563 51567 CARDINAL
African 51568 51575 NORP
one 51619 51622 CARDINAL
Ray Felix 51633 51642 PERSON
Johnny Green 51644 51656 PERSON
Willie Naulls 51661 51674 PERSON
Ramsey 51683 51689 PERSON
Syracuse 51702 51710 GPE

 51815 51816 GPE
Monday 51828 51834 DATE
night 51835 51840 TIME
Carmelo Anthony 51858 51873 PERSON
Knicks 51878 51884 ORG
29-13 second-quarter 51897 51917 TIME
Latvian 51932 51939 NORP
Porzingis 51947 51956 GPE
France 51985 51991 GPE
Frank Ntilikina 51993 52008 PERSON
Turkish 52016 52023 NORP

 52045 52046 GPE
Ramsey 52087 52093 PERSON
his day 52147 52154 DATE
Knicks 52222 52228 ORG
New York 52257 52265 GPE

 52334 52335 GPE
Anthony 52342 52349 PERSON
Knicks 52359 52365 ORG
James 52390 52395 PERSON
  52427 52428 NORP
  52446 52447 NORP
Monday 52512 52518 DATE
Jamess 52544 52550 ORG
Knicks 52577 52583 ORG
Ntilikina 52597 52606 GPE
Dallass 52612 52619 ORG
first-quarter 52668 52681 DATE
two 527

 77347 77348 GPE
Follow NYT Food 77348 77363 ORG
Instagram 77377 77386 GPE
Twitter 77388 77395 GPE
Pinterest 77400 77409 LOC
NYT Cooking 77436 77447 ORG

 77507 77508 GPE
The Golden State Warriors 77508 77533 FAC
third-quarter 77554 77567 DATE
the fourth quarter 77605 77623 DATE
their 17th 77665 77675 DATE
Oakland 77707 77714 GPE

 77715 77716 GPE
Houston 77729 77736 GPE
the Western Conference 77802 77824 FAC
this season 77825 77836 DATE
James Harden 77843 77855 PERSON
Chris Paul 77860 77870 PERSON
95 77907 77909 CARDINAL
Game 4 77924 77930 LAW
the Western Conference 77934 77956 LAW
Tuesday 77967 77974 DATE

 77975 77976 GPE
second 77983 77989 ORDINAL
0.5 seconds 78033 78044 TIME
3-point 78064 78071 CARDINAL
Stephen Curry 78085 78098 PERSON
third quarter 78128 78141 DATE
3-pointer 78170 78179 QUANTITY

 78244 78245 GPE
the last four seasons 78371 78392 DATE
12 78405 78407 CARDINAL
34-17 third quarter 78439 78458 DATE
Rockets 78481 78488 ORG

 78553 78554 GPE
Rockets 78628 78635 ORG
Mik

Domantas Sabonis 94785 94801 PERSON
Enes Kanter 94803 94814 PERSON
Doug McDermott 94816 94830 PERSON
Russell Westbrook 94864 94881 PERSON
Kevin Durant 94962 94974 PERSON
Westbrook 94996 95005 PERSON
Sam Presti 95043 95053 PERSON
two 95061 95064 CARDINAL
this off-season 95076 95091 DATE
Anthony 95095 95102 PERSON
George 95107 95113 PERSON
Patterson 95154 95163 PERSON
Prestis 95246 95253 ORG

 95330 95331 GPE
Anthony 95379 95386 PERSON
Billy Donovan 95425 95438 PERSON
two 95488 95491 CARDINAL
Westbrook 95514 95523 PERSON
West 95574 95578 LOC
Houston 95589 95596 GPE
Golden State 95607 95619 GPE

 95620 95621 GPE
Status 95621 95627 ORG

 95644 95645 GPE
Jimmy Butler 95660 95672 PERSON
Jeff Teague 95674 95685 PERSON
Taj Gibson 95687 95697 PERSON
Jamal Crawford
 95699 95714 PERSON
Kris Dunn 95743 95752 PERSON
Zach LaVine 95754 95765 PERSON
Nikola Pekovic
Outlook 95767 95789 ORG
Tom Thibodeau 95797 95810 PERSON
last season 95890 95901 DATE
one 95917 95920 CARDINAL
Karl-Anthony 95956 95968 PER

Atlanta 113664 113671 GPE
69 113680 113682 DATE
Tahira Hughes Catlett 113694 113715 PERSON

 113767 113768 GPE
DeMathas 113768 113776 PERSON
Power Memorial 113790 113804 FAC
Catletts 113908 113916 PERSON
the University of Notre Dame 113953 113981 ORG
the National Basketball Association 113997 114032 ORG

 114033 114034 GPE
The Washington City Paper 114080 114105 ORG
2011 114109 114113 DATE

 114114 114115 GPE
Abdul-Jabbar 114115 114127 GPE
U.C.L.A. 114165 114173 GPE
20 seasons 114197 114207 DATE
the Milwaukee Bucks 114213 114232 ORG
Los Angeles Lakers 114237 114255 WORK_OF_ART
Giant 114314 114319 ORG
Peter Knobler 114332 114345 PERSON
1983 114347 114351 DATE

 114353 114354 GPE
only 16 114458 114465 CARDINAL
a hard afternoon 114486 114502 TIME

 114542 114543 GPE
Sidney Leon Catlett 114543 114562 PERSON
Washington 114575 114585 GPE
April 8, 1948 114589 114602 DATE
Sidney 114627 114633 PERSON
Louis Armstrong 114707 114722 PERSON
Benny Goodman 114724 114737 PERSON
Catlett 114765 114772 P

Jay-Z. 134002 134008 PERSON
Knicks 134088 134094 ORG
Victor Batine 134125 134138 PERSON
55 134140 134142 DATE
Queens
 134144 134151 PERSON
Gregg Popovich 134151 134165 PERSON
Game 3 134180 134186 LAW
Thursday night 134234 134248 TIME
Erin Popovich 134272 134285 PERSON
67 134294 134296 CARDINAL
Spurs 134302 134307 PERSON
Wednesday 134321 134330 DATE
Gregg 134356 134361 PERSON
40 years 134366 134374 DATE
two 134389 134392 CARDINAL
four 134406 134410 CARDINAL
Popoviches 134501 134511 PERSON
Gregg 134521 134526 PERSON
Air Force 134553 134562 ORG
Erin 134567 134571 PERSON

 134607 134608 GPE
110 134620 134623 CARDINAL
Golden State Warriors 134658 134679 ORG
Spurs 134685 134690 PERSON
three 134700 134705 CARDINAL
one 134728 134731 CARDINAL
One 134783 134786 CARDINAL
Messina 134819 134826 LOC
Game 3 134849 134855 LAW
San Antonio 134859 134870 GPE

 134871 134872 GPE
Popovich 134902 134910 GPE
Wednesday and Thursday 134914 134936 DATE

 134937 134938 GPE
69 134948 134950 DATE
five 135042 13504

Stephen Curry 159758 159771 PERSON
Clevelands Isaiah Thomas 159776 159800 PERSON
  159826 159827 ORG

 159905 159906 GPE
Dwyane Wade 159975 159986 ORG
first 160041 160046 ORDINAL
first 160060 160065 ORDINAL
Clevelands 160077 160087 ORG

 160094 160095 GPE
Green 160124 160129 ORG
late September 160184 160198 DATE
third 160239 160244 ORDINAL
four seasons 160254 160266 DATE
Golden State 160395 160407 GPE
April 160448 160453 DATE

 160454 160455 GPE
27-7 160626 160630 DATE
Cavaliers 160653 160662 ORG
Mondays 160666 160673 DATE
Oracle Arena 160690 160702 PERSON
Clevelands 160734 160744 ORG

 160824 160825 GPE
Zaza Pachulia 160884 160897 GPE
Shaun Livingston 160899 160915 PERSON
Green 160920 160925 ORG
Curry 160941 160946 ORG
Steve Kerr 160974 160984 PERSON
13 160997 160999 CARDINAL
14 161057 161059 CARDINAL
last season 161067 161078 DATE
Kerr 161110 161114 PERSON
Jordan Bell 161177 161188 PERSON
Omri Casspi 161226 161237 WORK_OF_ART
second-year 161246 161257 DATE
Patrick McCaw 161264 161277

Washington Wizards 180978 180996 ORG
102 180998 181001 CARDINAL
Game 6 181009 181015 DATE
Friday night 181019 181031 TIME
Eastern Conference 181045 181063 LOC
first 181064 181069 ORDINAL

 181083 181084 GPE
Toronto 181084 181091 GPE
as many as 12 181103 181116 CARDINAL
the first quarter 181127 181144 DATE
53-50 181161 181166 CARDINAL
the end of the second 181170 181191 DATE
first 181205 181210 ORDINAL
the third quarter 181235 181252 DATE
5 181281 181282 CARDINAL
the final period 181299 181315 DATE

 181316 181317 GPE
Lowry 181326 181331 ORG
DeMar DeRozan 181336 181349 PERSON
16 181381 181383 CARDINAL
fourth 181413 181419 ORDINAL
Fred VanVleet 181449 181462 PERSON
VanVleet 181521 181529 ORG
three minutes 181552 181565 TIME
Friday 181586 181592 DATE
Toronto 181615 181622 GPE
5 181657 181658 CARDINAL
4 181667 181668 CARDINAL
4 181681 181682 CARDINAL

 181692 181693 GPE
first 181727 181732 ORDINAL
five 181733 181737 CARDINAL
the Eastern Conference 181817 181839 EVENT
LeBron Jamess 181873 1

 195714 195715 GPE
Johnson 195780 195787 PERSON

 195916 195917 GPE
Now 195917 195920 DATE
Johnsons 195961 195969 PERSON
Clarence Fruster 195977 195993 PERSON
one 196032 196035 CARDINAL

 196036 196037 GPE
Mannings 196054 196062 ORG
  196166 196167 NORP
New York 196172 196180 GPE

 196181 196182 GPE
Jitu Weusi 196212 196222 PERSON
Bedford-Stuyvesant 196263 196281 ORG
Uhuru Sasa Shule 196350 196366 ORG
New York Citys 196375 196389 GPE
first 196390 196395 ORDINAL
East 196438 196442 LOC

 196477 196478 GPE
Fruster 196518 196525 PERSON

 196683 196684 GPE
Leslie R. Campbell 196689 196707 PERSON
Weusi   196709 196716 PERSON
6-foot-10 196718 196727 CARDINAL
  196735 196736 NORP
Long Island University 196761 196783 FAC
44 196797 196799 DATE
one 196810 196813 CARDINAL
Johnson 196814 196821 PERSON
eight 196859 196864 CARDINAL
two 196892 196895 CARDINAL
One 196902 196905 CARDINAL
Keith Fruster 196919 196932 PERSON
Swahili 196978 196985 PERSON
West Fourth Street 197050 197068 FAC

 197101 197102 

Kerrs 217786 217791 PERSON
LeBron 217852 217858 PERSON

 217897 217898 GPE
Love 217901 217905 WORK_OF_ART
Iguodala 218010 218018 GPE
minutes 218036 218043 TIME
LeBron 218083 218089 PERSON
Strength in Numbers 218158 218177 WORK_OF_ART
Jeff Van Gundy 218186 218200 PERSON
three 218273 218278 CARDINAL

 218340 218341 GPE
OGWUMIKE 218341 218349 ORG
Lue 218352 218355 PERSON
DAntoni 218378 218385 PERSON
the Western Conference 218393 218415 FAC
Houston 218429 218436 GPE
Golden State 218455 218467 ORG
Curry 218494 218499 ORG
DAntoni 218541 218548 ORG

 218625 218626 GPE
Kerr 218642 218646 PERSON
one 218667 218670 CARDINAL
PARKER 218810 218816 ORG
  218818 218819 ORG
Lue 218825 218828 PERSON
Game 3 218866 218872 EVENT
Celtics 218885 218892 ORG
2 218914 218915 CARDINAL
George Hill 218928 218939 PERSON
Kevin Love 218941 218951 PERSON
J.R. Smith 218953 218963 PERSON
Kyle Korver 218968 218979 PERSON
LeBron 219013 219019 PERSON
LeBron 219028 219034 ORG

 219119 219120 GPE
Kerr 219126 219130 PERSON
th

the fourth consecutive year 240005 240032 DATE

 240033 240034 GPE
  240101 240102 NORP

 240191 240192 GPE
Juliet Litman 240284 240297 PERSON

 240368 240369 GPE
  240400 240401 NORP
Wade 240406 240410 PERSON
Oklahoma 240412 240420 GPE
Russell Westbrook 240427 240444 PERSON
Golden States 240449 240462 ORG
Kevin Durant 240463 240475 PERSON
Westbrook 240504 240513 PERSON
Barneys New York 240568 240584 ORG

 240647 240648 GPE
Litman 240761 240767 ORG

 240928 240929 GPE
2005 241027 241031 DATE
David Stern 241062 241073 PERSON
Philadelphia 241184 241196 GPE
76er 241197 241201 ORDINAL
Allen Iverson 241202 241215 PERSON

 241252 241253 GPE
this season 241365 241376 DATE

 241377 241378 GPE
Todays 241378 241384 ORG
Fashion Week 241606 241618 EVENT

 241673 241674 GPE
James 241759 241764 PERSON
  242039 242040 NORP

 242101 242102 GPE
four 242143 242147 CARDINAL
Amiri 242155 242160 ORG
Saint Laurent 242176 242189 PERSON
one 242257 242260 CARDINAL
Fashion Book 242265 242277 ORG
ESPN 242319 242

Wednesday 260301 260310 DATE
night 260311 260316 TIME
Tatum 260318 260323 ORG
25 260340 260342 CARDINAL
Jaylen Brown 260351 260363 PERSON
24 260371 260373 CARDINAL
Terry Rozier 260375 260387 PERSON
17 260394 260396 CARDINAL
Al Horford 260401 260411 PERSON
15 260419 260421 CARDINAL
Celtics 260430 260437 ORG
three 260450 260455 CARDINAL
seven 260490 260495 CARDINAL
103 260518 260521 CARDINAL
Philadelphia 260529 260541 GPE
Monday 260545 260551 DATE

 260552 260553 GPE
Boston 260553 260559 GPE
LeBron James-led 260574 260590 PERSON
Cleveland Cavaliers 260591 260610 ORG
the second consecutive season 260640 260669 DATE
Cavaliers 260675 260684 ORG
the Toronto Raptors 260714 260733 ORG
Celtics 260750 260757 GPE
last May 260758 260766 DATE
five 260772 260776 CARDINAL
1 260795 260796 CARDINAL
Sunday 260841 260847 DATE
Boston 260851 260857 GPE

 260858 260859 GPE
Philadelphias Joel Embiid 260859 260884 ORG
12.5 seconds 260926 260938 TIME
76ers 260955 260960 DATE
Terry Rozier 260991 261003 PERSON
t

Pelicans 281618 281626 NORP
nine 281642 281646 CARDINAL
10 281659 281661 CARDINAL
half 281697 281701 CARDINAL
three 281747 281752 CARDINAL
four 281765 281769 CARDINAL
Golden State 281807 281819 FAC
2017 281823 281827 DATE
2015 281832 281836 DATE
San Antonio 281842 281853 GPE
2014 281857 281861 DATE

 281862 281863 GPE
Blazers 281999 282006 ORG
  282055 282056 ORG
19th 282102 282106 ORDINAL

 282115 282116 GPE
Last season 282151 282162 DATE
51-31 282174 282179 DATE
the Eastern Conference 282196 282218 ORG
the Cleveland Cavaliers 282252 282275 ORG
Kyle Lowry 282290 282300 ORG
Torontos All-Star 282302 282319 PERSON
one 282395 282398 CARDINAL
Lowry 282416 282421 ORG
DeMar DeRozan 282426 282439 PERSON
last year 282440 282449 DATE

 282507 282508 GPE
last summer 282511 282522 DATE
Masai Ujiri 282561 282572 PERSON
Raptors 282578 282585 ORG
Ujiri 282610 282615 ORG

 282793 282794 GPE
Lo 282794 282796 PERSON
Toronto 282809 282816 GPE
East 282898 282902 LOC
2 282927 282928 CARDINAL
the Washingto

2013 302937 302941 DATE
3-pointers 303010 303020 CARDINAL
  303042 303043 ORG

 303078 303079 GPE
Toronto 303082 303089 GPE
Nurse 303091 303096 ORG
Raptors 303112 303119 ORG
Only 14.9 percent 303151 303168 PERCENT
this season 303198 303209 DATE
24.1 percent 303240 303252 PERCENT
last season 303253 303264 DATE

 303265 303266 GPE
Nurse 303321 303326 ORG

 303332 303333 GPE
DeRozan 303348 303355 PERSON
17-foot 303433 303440 CARDINAL
Nurse 303463 303468 ORG
DeRozans 303513 303521 NORP
3-pointers 303555 303565 CARDINAL
DeRozan 303667 303674 PERSON

 303718 303719 GPE
DeRozan 303732 303739 PERSON
four 303812 303816 CARDINAL
26.8 303836 303840 CARDINAL
7.3 303849 303852 CARDINAL
6 assists 303866 303875 QUANTITY
57.3 percent 303891 303903 PERCENT
Eastern Conferences 303946 303965 ORG
the week 303976 303984 DATE

 303985 303986 GPE
Lowry 304032 304037 ORG
Last season 304106 304117 DATE
6 minutes 30 seconds 304175 304195 TIME
the N.B.A. 304293 304303 ORG
22.4 304321 304325 CARDINAL
7 304337 304


 326907 326908 GPE
last year 326942 326951 DATE
Kenyon Martin 326957 326970 PERSON
Lewis 326975 326980 PERSON
Big3 326982 326986 PERSON
Baron Davis 327039 327050 PERSON
Amare Stoudemire 327052 327068 PERSON
Metta World Peace 327070 327087 ORG
Carlos Boozer 327089 327102 PERSON
Greg Oden 327112 327121 PERSON
1 327150 327151 CARDINAL

 327178 327179 GPE
Masons 327198 327204 GPE
2018 327248 327252 DATE
Kwatinetz 327290 327299 PERSON

 327383 327384 GPE
last season 327441 327452 DATE
Big3 327513 327517 PERSON
this year 327518 327527 DATE

 327681 327682 GPE
one 327725 327728 CARDINAL
Kristaps Porzingis 327837 327855 PERSON
one 327860 327863 CARDINAL

 327931 327932 GPE
Porzingis 327988 327997 PERSON

 328003 328004 GPE
37-point 328037 328045 CARDINAL
Knicks 328075 328081 ORG
120 328087 328090 CARDINAL
the Phoenix Suns 328108 328124 ORG
Madison Square Garden 328128 328149 FAC
Friday 328153 328159 DATE
night 328160 328165 TIME
Jeff Hornacek 328180 328193 PERSON
first 328198 328203 ORDINAL
P

recent years 353904 353916 DATE
Thompson 353918 353926 ORG
Ewings 353986 353992 ORG

 354017 354018 GPE
John III 354033 354041 PERSON
Hoyas 354054 354059 ORG
Thompson 354061 354069 ORG
Georgetown 354078 354088 GPE
Capital One Arena 354116 354133 ORG
Ewings 354142 354148 ORG
the season 354224 354234 DATE
Thompson 354236 354244 ORG

 354319 354320 GPE
Butler 354328 354334 ORG
Thompson 354349 354357 ORG
Georgetown 354430 354440 GPE
Ewing 354470 354475 PERSON

 354653 354654 GPE
mid-1980s 354720 354729 DATE
Hoyas 354754 354759 ORG
three 354768 354773 CARDINAL
Final Fours 354774 354785 ORG
Thompsons 354790 354799 PERSON
Hoya Paranoia 354862 354875 ORG

 354876 354877 GPE
Thompson 354942 354950 ORG
Patricks 355024 355032 ORG
Patrick 355120 355127 ORG
  355164 355165 ORG

 355336 355337 GPE
Thompson 355337 355345 ORG
Ewing 355371 355376 PERSON
last year 355402 355411 DATE
Thompson 355491 355499 ORG

 355551 355552 GPE
years 355837 355842 DATE

 355909 355910 GPE
Georgetown 355998 356008 GPE
P

 380285 380286 GPE
2011 380427 380431 DATE
18-hour 380447 380454 CARDINAL
David Stern 380563 380574 PERSON

 380699 380700 GPE

 380974 380975 GPE

 381036 381037 GPE
LeBron James 381113 381125 PERSON
Melo 381127 381131 PERSON
D-Wade 381133 381139 ORG
Stephen Curry 381141 381154 PERSON

 381405 381406 GPE
Michele Roberts 381732 381747 PERSON
first 381759 381764 ORDINAL

 381845 381846 GPE
450 381988 381991 CARDINAL

 382254 382255 GPE

 382372 382373 GPE
David Letterman 382635 382650 PERSON
Obama 382675 382680 PERSON
Obama 382686 382691 PERSON
20 years from 382728 382741 DATE

 382984 382985 GPE

 383060 383061 GPE
Wtrmln Wtr 383489 383499 PERSON

 383581 383582 GPE
LinkedIn 383584 383592 ORG
Varun Paul 383601 383611 PERSON
Rockets 383665 383672 ORG

 383696 383697 GPE
James 383751 383756 PERSON
Trevor 383758 383764 ORG
summer 383816 383822 DATE

 383894 383895 GPE

 383935 383936 GPE
L.A. 383983 383987 GPE
iPads 384051 384056 ORG
that day 384164 384172 DATE

 384238 384239 GPE
Band-Ai

Knicks 406444 406450 ORG
Jeff Hornacek 406457 406470 PERSON

 406587 406588 GPE
Knicks 406592 406598 ORG
18-18 406600 406605 DATE
11 406621 406623 CARDINAL
Davis 406641 406646 PERSON
the third quarter 406662 406679 DATE
14 406691 406693 CARDINAL
7 406712 406713 CARDINAL
Pelicans 406731 406739 NORP
75-71 406751 406756 DATE

 406757 406758 GPE
Pelicans 406792 406800 NORP
79-78 after three quarters 406824 406850 DATE

 406851 406852 GPE
Beasley 406852 406859 PERSON
11 406867 406869 CARDINAL
Knicks 406884 406890 ORG
Pelicans 406911 406919 NORP
23-8 406927 406931 DATE
the second quarter 406936 406954 DATE
Knicks 406970 406976 ORG
Pelicans 407000 407008 NORP
42-10 407018 407023 CARDINAL

 407024 407025 GPE
MINNEAPOLIS 407025 407036 GPE
Jimmy Butler 407059 407071 PERSON
Tomball 407140 407147 GPE
Texas 407149 407154 GPE
Twitter 407188 407195 PERSON
Houston Rocket 407274 407288 LOC
Tracy McGrady 407303 407316 PERSON

 407317 407318 GPE
Houston 407325 407332 GPE
Butler 407357 407363 PERSON
Tomba

Americans 428574 428583 NORP
months or years 428659 428674 DATE

 428675 428676 GPE
Otto Warmbier 428749 428762 PERSON
American 428767 428775 NORP
North Korea 428787 428798 GPE

 428910 428911 GPE
three 428962 428967 CARDINAL
China 428986 428991 GPE
Trumps 429049 429055 PERSON
Xi Jinping 429105 429115 PERSON
Tuesday 429120 429127 DATE
three 429133 429138 CARDINAL
LiAngelo Ball 429176 429189 PERSON
N.B.A. 429210 429216 GPE
Lonzo Ball 429224 429234 PERSON
California 429297 429307 GPE

 429308 429309 GPE
three 429313 429318 CARDINAL
U.C.L.A. 429319 429327 GPE
Hangzhou 429406 429414 GPE
China 429416 429421 GPE
Los Angeles 429452 429463 GPE
Chinese 429628 429635 NORP

 429648 429649 GPE
the White House 429681 429696 ORG
the U.S. State Department 429701 429726 ORG

 429784 429785 GPE
Kelly 429789 429794 PERSON
the United States 429816 429833 GPE
Trump 429843 429848 PERSON
Tuesday night 429849 429862 TIME
Air Force One 429870 429883 PRODUCT
U.C.L.A. 429960 429968 GPE

 429977 429978 GPE
Xi 43

six-minute 456873 456883 TIME
Pachulia 456891 456899 PERSON
six-minute 456983 456993 TIME

 457009 457010 GPE
Kerr 457049 457053 PERSON
2:30 457130 457134 CARDINAL
  457222 457223 NORP

 457296 457297 GPE
Fraser 457326 457332 PERSON

 457338 457339 GPE
the first half 457517 457531 DATE

 457532 457533 GPE
just nine 457538 457547 CARDINAL
Dirk Nowitzki 457612 457625 PERSON
the Dallas Mavericks 457687 457707 ORG
2018-19 457711 457718 DATE
21st 457727 457731 ORDINAL

 457748 457749 GPE
Nowitzki 457786 457794 ORG
The New York Times 457821 457839 ORG
this week 457840 457849 DATE
one 457881 457884 CARDINAL
all season 457890 457900 DATE
two-year 457913 457921 DATE
two more years 457956 457970 DATE

 457988 457989 GPE
2018 458035 458039 DATE
two-year 458064 458072 DATE
$10 million 458074 458085 MONEY
July 2017 458108 458117 DATE
the off-season 458141 458155 DATE
First 458157 458162 ORDINAL
Nowitzki 458164 458172 ORG
next month 458206 458216 DATE
Jessica 458240 458247 PRODUCT
German 458266 4582

13 480407 480409 CARDINAL
4 480423 480424 CARDINAL
6 480428 480429 CARDINAL
3-pointers 480433 480443 CARDINAL
Hornets 480469 480476 NORP
Stephen Silas 480545 480558 PERSON

 480594 480595 GPE
Mondays 480599 480606 DATE
Silas 480630 480635 PERSON
12 480656 480658 CARDINAL
Charlotte 480693 480702 GPE
11-19 480704 480709 DATE
as many as 27 480718 480731 CARDINAL

 480739 480740 GPE
Hornets 480744 480751 NORP
three 480760 480765 CARDINAL
Knicks 480792 480798 ORG
four 480799 480803 CARDINAL

 480824 480825 GPE
Kaminsky 480825 480833 GPE
the first quarter 480850 480867 DATE
15-11 480897 480902 CARDINAL
22-4 480926 480930 CARDINAL
19-15 480949 480954 CARDINAL
37 480970 480972 CARDINAL
10:25 480986 480991 TIME
the first half 481000 481014 DATE

 481015 481016 GPE
Kaminsky 481023 481031 GPE
8 481036 481037 CARDINAL
17 481050 481052 CARDINAL
41-30 481077 481082 CARDINAL
58-33 481091 481096 CARDINAL
3:22 481102 481106 CARDINAL

 481124 481125 GPE
Michael Kidd-Gilchrist 481125 481147 PERSON
15 481

 505480 505481 GPE
Charles 505481 505488 PERSON
Tennessee 505567 505576 GPE

 505587 505588 GPE
Summitt 505639 505646 ORG
Daedra 505805 505811 PERSON
six-three 505831 505840 CARDINAL
Detroit 505853 505860 GPE
Michigan 505862 505870 GPE
‘Night Train 505915 505927 WORK_OF_ART

 505975 505976 GPE
Charles 505976 505983 PERSON
the Lady Vols 506033 506046 PRODUCT
35-2 506052 506056 DATE
13 506088 506090 CARDINAL
76-60 506107 506112 CARDINAL
Auburn in 506126 506135 WORK_OF_ART
1989 506140 506144 DATE
N.C.A.A. 506145 506153 GPE

 506172 506173 GPE
Bridgette Gordon 506173 506189 PERSON
Charless 506191 506199 ORG
that year 506249 506258 DATE

 506302 506303 GPE
15, 17 feet 506356 506367 QUANTITY
Gordon 506384 506390 PERSON

 506562 506563 GPE
The next season 506563 506578 DATE
Tennessee 506580 506589 GPE
27 506601 506603 CARDINAL
Southeastern Conference 506625 506648 PRODUCT
Lady Vols 506664 506673 PERSON
Virginia 506682 506690 GPE
79-75 506692 506697 DATE
1990 506742 506746 DATE
N.C.A.A. 506747

9 526660 526661 CARDINAL
8 526675 526676 CARDINAL
108 526748 526751 CARDINAL
one 526825 526828 CARDINAL

 526868 526869 GPE
James 526962 526967 PERSON
James 527054 527059 PERSON

 527182 527183 GPE

 527340 527341 GPE
Embiid 527400 527406 NORP
Philadelphias 527431 527444 PERSON
Twitter 527473 527480 GPE
17 527489 527491 CARDINAL
14 527500 527502 CARDINAL
6 527516 527517 CARDINAL
James 527650 527655 PERSON
this summer 527681 527692 DATE

 527693 527694 GPE
Jamess 527701 527707 PRODUCT
Hardens 527725 527732 PERSON

 527885 527886 GPE
105 527913 527916 CARDINAL
Wednesday 527931 527940 DATE
Harden 527942 527948 PERSON
half 527972 527976 CARDINAL
Wesley Johnson 527986 528000 PERSON
the Los Angeles Clippers 528004 528028 ORG
Harden 528030 528036 PERSON
3-point 528101 528108 CARDINAL
Johnson 528115 528122 PERSON
Harden 528211 528217 PERSON
Clippers 528228 528236 ORG
10 feet 528251 528258 QUANTITY
Most Valuable Player Award 528297 528323 WORK_OF_ART
Clippers 528341 528349 ORG
3-point 528411 52

Popovich 552807 552815 PERSON
the Air Force 552855 552868 ORG
45 years ago this spring 552870 552894 DATE

 552895 552896 GPE
SAN ANTONIO 552896 552907 GPE
98-year 552923 552930 CARDINAL
2018 552970 552974 DATE
N.C.A.A. 552975 552983 GPE
Immaculata 553012 553022 ORG
Catholic 553032 553040 NORP
Philadelphia 553057 553069 GPE
first 553079 553084 ORDINAL
three 553085 553090 CARDINAL

 553141 553142 GPE
Macs 553153 553157 PERSON
the early 1970s 553168 553183 DATE
Bill Russells 553185 553198 PERSON
the University of San Francisco 553219 553250 ORG
the 1950s 553254 553263 DATE
Villanova 553284 553293 ORG
Loyola-Chicago 553298 553312 ORG
this weekends 553316 553329 DATE
Four 553336 553340 CARDINAL
three 553350 553355 CARDINAL
Catholic 553433 553441 NORP
the New Testament 553555 553572 ORG

 553580 553581 GPE
Julie E. Byrne 553606 553620 PERSON
Hofstra University 553649 553667 ORG
American Catholicism 553680 553700 ORG

 553701 553702 GPE
Four 553715 553719 CARDINAL
Easter weekend 553735 55374

 580675 580676 GPE
Leftwich 580676 580684 GPE
1996 580722 580726 DATE
the last decade 580741 580756 DATE
Rick Buchanan 580812 580825 PERSON
Leftwich 580894 580902 GPE

 581117 581118 GPE
North American 581184 581198 NORP
Richard Lapchick 581225 581241 PERSON
the University of Central Floridas Institute of Diversity and Ethics 581259 581327 ORG
N.F.L. 581490 581496 ORG
Major League Baseball 581501 581522 ORG

 581523 581524 GPE

 581706 581707 GPE
Leftwichs 581707 581716 NORP
Linda Luchetti 581751 581765 PERSON
Utah 581771 581775 GPE
Teresa Resch 581826 581838 PERSON
Toronto 581844 581851 GPE

 581923 581924 GPE
Bonner 581984 581990 PERSON
Orlando 581996 582003 GPE
Amanda Green 582062 582074 PERSON
Oklahoma City 582080 582093 GPE
Nanea McGuigan 582174 582188 PERSON
Golden State Warriors 582194 582215 ORG
Annabel Padilla 582275 582290 PERSON
Hawks 582296 582301 ORG
Analisa Rodriguez 582337 582354 PERSON
San Antonio 582360 582371 GPE
Spurs 582372 582377 PERSON

 582406 582407 GPE
W.N.B.A.

Mexico 607957 607963 GPE
1992 607967 607971 DATE
20th 607980 607984 ORDINAL
first 608004 608009 ORDINAL
Dec. 6, 1997 608036 608048 DATE

 608050 608051 GPE
The N.B.A. G League 608051 608070 ORG
27 608137 608139 CARDINAL
next season 608153 608164 DATE
30 608212 608214 CARDINAL
30 goal 608219 608226 QUANTITY

 608322 608323 GPE
Mexico City 608327 608338 GPE
the G League 608386 608398 ORG
30 608407 608409 CARDINAL
the N.B.A.
 608481 608492 ORG
Thursday 608516 608524 DATE
Mexico City 608586 608597 GPE
seventh 608618 608625 ORDINAL
The N.B.A. Academies 608649 608669 ORG
China 608817 608822 GPE
three 608824 608829 CARDINAL
Australia 608832 608841 GPE
India 608843 608848 GPE
Senegal 608853 608860 GPE

 608861 608862 GPE
Dirk Nowitzki 608862 608875 PERSON
the Dallas Mavericks 608879 608899 ORG
season-ending 608904 608917 DATE
Thursday 608944 608952 DATE

 608973 608974 GPE
Nowitzki 608997 609005 ORG
Mavericks 609051 609060 PRODUCT
four 609067 609071 CARDINAL
nearly a month ago 609204 609222 DA

July 631840 631844 DATE
  631915 631916 NORP
DeAndre Jordan 631954 631968 PERSON
Lou Williams 631973 631985 PERSON
the Cleveland Cavaliers 632041 632064 ORG

 632065 632066 GPE
Clippers 632074 632082 ORG
Griffins 632131 632139 PERSON
James 632247 632252 PERSON
July 632270 632274 DATE
the summer of 2019 632329 632347 DATE
two 632363 632366 CARDINAL
Detroit 632396 632403 GPE
Tobias Harris 632419 632432 PERSON
first 632473 632478 ORDINAL
June 632493 632497 DATE

 632498 632499 GPE
West 632600 632604 LOC
80 632616 632618 CARDINAL
May 632622 632625 DATE
Clipperland 632636 632647 ORG
Griffins 632911 632919 PERSON
Clippers 632990 632998 PERSON
Griffin 633091 633098 NORP
Clipper 633119 633126 ORG
last summer 633127 633138 DATE
half-season later 633168 633185 DATE
Detroit 633273 633280 GPE
Pistons 633341 633348 ORG
seven 633364 633369 CARDINAL
the start of December 633388 633409 DATE
2-11 633416 633420 DATE
January 633431 633438 DATE
one 633450 633453 CARDINAL
Van Gundy 633473 633482 PERSON
Yea

Smith 656775 656780 PERSON
six 656895 656898 CARDINAL
Cleveland 656932 656941 GPE
15 656946 656948 CARDINAL
13 656970 656972 CARDINAL
Boston 656973 656979 GPE
Celtics 656992 656999 ORG
six 657020 657023 CARDINAL

 657030 657031 GPE
Boston 657031 657037 GPE
84-77 657056 657061 DATE
three quarters 657068 657082 DATE
fourth 657112 657118 ORDINAL
Cavaliers 657128 657137 ORG
James 657177 657182 PERSON
1:48 657221 657225 CARDINAL
Cavs 657239 657243 ORG
14 657249 657251 CARDINAL
more than seven minutes 657262 657285 TIME
Game 1 657298 657304 LAW
3 657311 657312 CARDINAL
Cleveland 657329 657338 GPE
Saturday 657342 657350 DATE

 657351 657352 GPE
The Fox News 657352 657364 WORK_OF_ART
Laura Ingraham 657370 657384 PERSON
Thursday 657388 657396 DATE
LeBron Jamess 657406 657419 PERSON
Kevin Durant 657476 657488 PERSON
James 657578 657583 PERSON
Durant 657589 657595 PERSON
the Miami Heats 657601 657616 GPE

 657629 657630 GPE
Ingraham 657634 657642 PERSON
James 657672 657677 PERSON
Trump 657703 657

Knicks 689771 689777 ORG
Anthony 689779 689786 PERSON

 689945 689946 GPE
one 689982 689985 CARDINAL
Mike DAntoni 689993 690005 PERSON
Phil Jackson 690053 690065 PERSON
Knicks 690079 690085 ORG
about half 690105 690115 CARDINAL
Anthonys 690119 690127 GPE
New York 690136 690144 GPE

 690145 690146 GPE
Anthony 690146 690153 PERSON
Jackson 690177 690184 PERSON
last June 690200 690209 DATE
Anthony 690216 690223 PERSON
Knicks 690242 690248 ORG
just months later 690254 690271 DATE
Anthony 690273 690280 PERSON
New York 690297 690305 GPE
the Thunder.
 690320 690333 ORG
Saturday 690353 690361 DATE
night 690362 690367 TIME
Anthonys 690372 690380 GPE
Friday night 690392 690404 TIME
Philadelphia 690409 690421 GPE
New York 690458 690466 GPE

 690544 690545 GPE
season 690572 690578 DATE
almost seven years 690633 690651 DATE

 690828 690829 GPE
Anthony 690921 690928 PERSON

 690996 690997 GPE
  691033 691034 NORP
  691049 691050 NORP
LeBron James 691095 691107 PERSON
the Cleveland Cavaliers 691111 69

this season 717649 717660 DATE
third 717677 717682 ORDINAL
consecutive days 717709 717725 DATE

 717726 717727 GPE
Spurs 717731 717736 PERSON
17-2 717749 717753 CARDINAL
third 717792 717797 ORDINAL

 717816 717817 GPE
Spurs 717821 717826 PERSON
Kawhi Leonard 717840 717853 PERSON
Rudy Gay 717855 717863 PERSON
two nights 717902 717912 TIME
San Antonio 717919 717930 GPE
first 717969 717974 ORDINAL
all season 717980 717990 DATE
Leonard 717992 717999 PERSON

 718017 718018 GPE
Kyle Anderson 718018 718031 PERSON
16 718036 718038 CARDINAL
eight 718050 718055 CARDINAL
Tony Parker 718095 718106 PERSON
14 718114 718116 CARDINAL
10 718135 718137 CARDINAL
the first quarter 718141 718158 DATE
San Antonio 718162 718173 GPE

 718198 718199 GPE
San Antonio 718199 718210 GPE
shot 51 percent 718211 718226 PERCENT
Knicks 718252 718258 ORG
46 percent 718264 718274 PERCENT

 718275 718276 GPE
Knicks 718280 718286 ORG
2-12 718291 718295 CARDINAL
this season 718308 718319 DATE
Oct. 29 718332 718339 DATE
Clev

Reno 744105 744109 GPE
Ricks 744155 744160 ORG

 744219 744220 GPE
three 744223 744228 CARDINAL
Nevada 744240 744246 GPE
Musselman 744248 744257 ORG
the Wolf Pack 744270 744283 LAW
81-29 744290 744295 CARDINAL
Last season 744312 744323 DATE
Nevada 744325 744331 GPE
N.C.A.A. 744375 744383 GPE
the Mountain West Conference 744495 744523 LOC
the year 744533 744541 DATE

 744542 744543 GPE
China 744641 744646 GPE
the Dominican Republic 744648 744670 GPE
Venezuela 744675 744684 GPE
Continental Basketball Association 744733 744767 ORG
United States Basketball League 744769 744800 ORG
Rapid City Thrillers 744840 744860 ORG
Florida Sharks 744862 744876 ORG

 744878 744879 GPE
two seasons 744887 744898 DATE
2002 744905 744909 DATE
2004 744913 744917 DATE
Musselman 744919 744928 PERSON
his first season 744954 744970 DATE
38-44 744982 744987 DATE
nearly a decade 745023 745038 DATE
San Antonio 745071 745082 GPE
Spurs Gregg Popovich 745083 745103 PERSON
the year 745134 745142 DATE
Musselman 745148 7

first 771057 771062 ORDINAL

 771145 771146 GPE
Gelfand 771265 771272 PERSON

 771278 771279 GPE
Golden State 771297 771309 GPE
Gelfand 771413 771420 PERSON

 771456 771457 GPE
Gelfand 771481 771488 PERSON
Golden State 771519 771531 ORG
Livingston 771654 771664 PERSON

 771665 771666 GPE
the end of the day 771669 771687 DATE
O.K. 771697 771701 ORG
5-6 771703 771706 CARDINAL
Gelfand 771708 771715 ORG
the G-League 771758 771770 ORG
  771781 771782 NORP
  771821 771822 NORP

 771923 771924 GPE

 772049 772050 GPE
Curry 772050 772055 ORG
  772087 772088 ORG

 772146 772147 GPE
LeBron James 772196 772208 PERSON
first 772217 772222 ORDINAL
the season 772240 772250 DATE
57 772260 772262 CARDINAL
Friday 772266 772272 DATE
night 772273 772278 TIME
Sunday 772288 772294 DATE
James Harden 772296 772308 PERSON
56 772327 772329 CARDINAL
Kristaps Porzingiss 772352 772371 ORG
40 772372 772374 CARDINAL
the same night 772378 772392 TIME

 772393 772394 GPE
56-point 772402 772410 CARDINAL
only 35 minutes

Houston 793978 793985 GPE
3-point 794014 794021 CARDINAL
second 794039 794045 ORDINAL
DAntoni 794067 794074 ORG
James Harden 794116 794128 PERSON
Chris Paul 794133 794143 PERSON
Rockets 794166 794173 NORP
1 794185 794186 CARDINAL
30th 794214 794218 ORDINAL
third to 14th 794257 794270 DATE

 794279 794280 GPE
Dan DAntoni 794321 794332 PERSON
Mike 794364 794368 PERSON
Suns 794378 794382 ORG
Knicks 794384 794390 ORG
Lakers 794395 794401 PERSON
Marshall 794424 794432 PERSON
Phoenix 794453 794460 GPE
Houston 794498 794505 GPE
two 794510 794513 CARDINAL
one 794523 794526 CARDINAL
one 794530 794533 CARDINAL

 794542 794543 GPE
Rockets 794547 794554 ORG
Wednesday 794594 794603 DATE
  794674 794675 NORP
DAntonis Suns 794685 794698 ORG
Game 4 794869 794875 EVENT
Cleveland 794945 794954 GPE
3-1 794992 794995 CARDINAL
Game 7 795013 795019 LAW
Oracle Arena 795023 795035 PERSON
Golden State 795076 795088 ORG

 795129 795130 GPE
Dan DAntoni 795150 795161 PERSON
Houston 795186 795193 GPE
Mike DAntonis

Stephen Curry 821712 821725 PERSON
Tuesday 821756 821763 DATE
nights 821764 821770 TIME
12 821791 821793 CARDINAL
Oklahoma 821794 821802 GPE
14-4 821804 821808 DATE
5 821818 821819 CARDINAL
Kansas 821820 821826 GPE
16-3 821828 821832 CARDINAL
Norman 821837 821843 GPE
Oklahoma 821905 821913 GPE
two 821932 821935 CARDINAL

 821942 821943 GPE
Oklahoma 821943 821951 GPE
Corey Evans 822023 822034 PERSON
Young 822112 822117 PERSON

 822191 822192 GPE
Concerning Young 822192 822208 PERSON
Dwight Howard 822256 822269 PERSON
Young 822282 822287 PERSON
two 822314 822317 CARDINAL
year 822344 822348 DATE
32 822441 822443 CARDINAL
the Oscar Robertson Trophy 822454 822480 ORG
50 822488 822490 CARDINAL
the Naismith Award 822495 822513 PERSON
first 822524 822529 ORDINAL
All-American 822535 822547 WORK_OF_ART
Young 822549 822554 PERSON
12s 822604 822607 DATE
Mohamed Bamba 822622 822635 PERSON
Texas 822639 822644 GPE
the year 822674 822682 DATE
Young 822684 822689 PERSON
this June 822718 822727 DATE

 8

Courtney Lee 845021 845033 PERSON
St. Fleur 845075 845084 PERSON
Carmelo Anthony 845141 845156 PERSON
Knicks 845169 845175 ORG
Dahntay Jones 845197 845210 PERSON
St. Fleur 845230 845239 ORG
Manhattans Sky 845278 845292 PERSON
last summer 845297 845308 DATE
Anthony 845316 845323 PERSON
Hoodie Melo 845341 845352 ORG

 845364 845365 GPE
one 845695 845698 CARDINAL
millions 845811 845819 CARDINAL

 845846 845847 GPE
Jay-Z 845852 845857 PERSON
4:44 845885 845889 GPE
The Story of O. J. 845941 845959 WORK_OF_ART
St. Fleur 846055 846064 PERSON
years 846091 846096 DATE

 846192 846193 GPE

 846323 846324 GPE
the mid-1990s 846360 846373 DATE
Perry 846380 846385 PERSON

 846478 846479 GPE
1996 846523 846527 DATE
Perry 846529 846534 PERSON
Darrell Walker 846554 846568 PERSON
Knicks 846581 846587 ORG
Japan 846615 846620 GPE
Charles Barkley 846651 846666 PERSON
Walker 846668 846674 PERSON

 846715 846716 GPE

 846727 846728 GPE
The next season 846728 846743 DATE
Walker 846745 846751 PERSON
Perry 8467

Kerrs 871908 871913 PERSON
fourth 871979 871985 ORDINAL
  872011 872012 NORP

 872036 872037 GPE
Casspi 872049 872055 LOC
10 or more minutes 872087 872105 TIME
only five 872106 872115 CARDINAL
January 872128 872135 DATE
3-pointers 872262 872272 CARDINAL
Golden State 872311 872323 ORG

 872403 872404 GPE
53 872456 872458 CARDINAL
Warrior 872470 872477 ORG
Casspi 872553 872559 PRODUCT
the Sacramento Kings 872578 872598 FAC
nine 3-pointers 872636 872651 CARDINAL
36 872664 872666 CARDINAL
Curry at Oracle Arena 872711 872732 ORG
Dec. 28, 2015 872736 872749 DATE

 872750 872751 GPE
last months 872766 872777 DATE
Kerr 872803 872807 PERSON
Golden State 872868 872880 GPE
11.9 873028 873032 CARDINAL
7.1 873044 873047 CARDINAL
seven 873073 873078 CARDINAL
nearly 59 percent 873101 873118 PERCENT

 873134 873135 GPE
Casspi 873193 873199 PERSON
12 minutes 873377 873387 TIME
12 873401 873403 CARDINAL
3s 873451 873453 CARDINAL

 873454 873455 GPE
Steve 873505 873510 PERSON

 873591 873592 GPE
Currys 8

112-99 894327 894333 DATE
Wednesday 894337 894346 DATE
night 894347 894352 TIME
15 894378 894380 CARDINAL
16 894388 894390 CARDINAL
the second half 894401 894416 DATE
11 894469 894471 CARDINAL
early in the fourth quarter 894488 894515 DATE
Magics 894534 894540 NORP

 894566 894567 GPE
Magic 894571 894576 EVENT
Payton 894636 894642 ORG
two 894652 894655 CARDINAL
Knicks 894680 894686 ORG
Porzingis 894704 894713 PERSON
three 894778 894783 CARDINAL

 894819 894820 GPE
Porzingiss 894823 894833 NORP
Tim Hardaway Jr. 894843 894859 PERSON
Knicks 894868 894874 ORG
26 894918 894920 CARDINAL
11 894932 894934 CARDINAL
38 minutes 894947 894957 TIME
Knicks 894963 894969 ORG
second 894970 894976 ORDINAL
Doug McDermott 894996 895010 PERSON
13 895020 895022 CARDINAL

 895044 895045 GPE
Knicks 895049 895055 ORG
six 895084 895087 CARDINAL
seven 895102 895107 CARDINAL
Porzingis 895163 895172 GPE
53 percent 895184 895194 PERCENT
23 895224 895226 CARDINAL

 895237 895238 GPE
Magic 895268 895273 ORG
54 perce

Porzingis 926412 926421 GPE
3-point 926436 926443 CARDINAL
fourth 926460 926466 ORDINAL

 926476 926477 GPE
2016 926480 926484 DATE
Bryan Colangelos 926486 926502 PERSON
first 926503 926508 ORDINAL
last year 926588 926597 DATE
Colangelo 926599 926608 PERSON
Sacramentos 926637 926648 ORG
2019 926658 926662 DATE
first 926663 926668 ORDINAL
Boston 926683 926689 GPE
two 926701 926704 CARDINAL
1 926718 926719 CARDINAL

 926832 926833 GPE
Philadelphia 926855 926867 GPE
Hinkie 926914 926920 PERSON
Noel 926933 926937 PERSON
Okafor   926942 926950 NORP

 927030 927031 GPE
76ers 927073 927078 DATE
Simmonss 927181 927189 NORP
last season 927226 927237 DATE
Fultz 927246 927251 PERSON

 927311 927312 GPE
Knicks 927337 927343 ORG
Porzingis 927521 927530 GPE
Knicks 927541 927547 ORG
Steve Mills 927608 927619 PERSON
Scott Perry 927624 927635 PERSON

 927636 927637 GPE
Porzingis 927672 927681 PERSON
  927726 927727 NORP
  927773 927774 NORP

 927856 927857 GPE
Porzingis 927885 927894 GPE
five-year 9278

Arseneault 953591 953601 ORG

 953643 953644 GPE
Arseneault 953644 953654 GPE
David Arseneault Sr 953680 953699 PERSON
the early 1990s 953710 953725 DATE
Grinnell 953747 953755 ORG
the last quarter-century 953931 953955 DATE

 953989 953990 GPE
one 953997 954000 CARDINAL
Grinnell 954020 954028 PRODUCT
David Arseneault Sr 954076 954095 PERSON
3-pointers 954151 954161 CARDINAL

 954271 954272 GPE
Rockets 954276 954283 ORG
3-pointers 954323 954333 CARDINAL
this season 954334 954345 DATE
3-point 954502 954509 CARDINAL
Nick Young 954558 954568 PERSON
Mondays 954630 954637 DATE
25-footer 954658 954667 CARDINAL

 954682 954683 GPE
three 954803 954808 CARDINAL
Arseneault 954816 954826 ORG

 954859 954860 GPE
Arseneault 954860 954870 GPE
Grinnell 954914 954922 ORG
Division III 954968 954980 ORG
one day 955051 955058 DATE
2014 955062 955066 DATE
Sacramento 955072 955082 GPE
David Arseneault Jr. 955128 955148 PERSON
the Reno Bighorns 955158 955175 ORG
the N.B.A. Development League 955196 955225 O

108 978885 978888 CARDINAL
Pacers 978910 978916 PERSON
Nov. 5 978920 978926 DATE
108-100 978930 978937 CARDINAL
Toronto 978952 978959 GPE
Nov. 22 978963 978970 DATE
28 978983 978985 CARDINAL
the third quarter 978995 979012 DATE
1991-92 979048 979055 DATE

 979056 979057 GPE
Knicks 979061 979067 ORG
Bankers Life Fieldhouse 979086 979109 ORG
8-27 979121 979125 DATE

 979126 979127 GPE
Pacers 979131 979137 PERSON
7-4 979147 979150 DATE
six 979181 979184 CARDINAL
Dec. 15 979232 979239 DATE
6 979250 979251 CARDINAL
120 979267 979270 CARDINAL
0-4 979282 979285 CARDINAL
less than 100 979299 979312 CARDINAL

 979313 979314 GPE
3 979476 979477 CARDINAL

 979530 979531 GPE
7 979574 979575 CARDINAL
Celtics 979614 979621 ORG
Rockets 979626 979633 ORG
3-point 979662 979669 CARDINAL

 979676 979677 GPE
Monday night 979680 979692 TIME
Rockets 979719 979726 NORP
7-for-44 979751 979759 CARDINAL
3 979765 979766 CARDINAL
16 percent 979771 979781 PERCENT
the entire season 979810 979827 DATE
26 percent 979

So this does a pretty good job but is also slow. Let's analysis how accurate the less precise textblob versions are:

In [27]:
team_names = pd.read_csv('../player_stats/team_names.csv')

Note that this data frame contains names of all teams along with number of wins this season and the round of playoffs.

In [28]:
team_names.head()

Unnamed: 0,Name,Wins,Playoffs,Conference,New York Team,City
0,Lakers,35,0,W,0,Los Angeles
1,Cavaliers,50,4,E,0,Cleveland
2,Warriors,58,5,W,0,Golden State
3,Celtics,55,3,E,0,Boston
4,Spurs,47,1,W,0,San Antonio


In [29]:
playoff_dict = {0:'DNQ',1:'First Round',2:'Second Round',3:'Conference Finals',4:'NBA Finals',5:'Champion'}

In [30]:
names = list(team_names['Name'])

Prediction with TextBlob tags:

In [31]:
for name in names:
    print('{} has {} occurrences in corpus'.format(name,c2[name.lower()]))

Lakers has 103 occurrences in corpus
Cavaliers has 370 occurrences in corpus
Warriors has 521 occurrences in corpus
Celtics has 302 occurrences in corpus
Spurs has 108 occurrences in corpus
76ers has 186 occurrences in corpus
Knicks has 590 occurrences in corpus
Bulls has 28 occurrences in corpus
Rockets has 296 occurrences in corpus
Raptors has 153 occurrences in corpus
Mavericks has 60 occurrences in corpus
Thunder has 67 occurrences in corpus
Suns has 41 occurrences in corpus
Heat has 75 occurrences in corpus
Hawks has 32 occurrences in corpus
Timberwolves has 46 occurrences in corpus
Hornets has 31 occurrences in corpus
Clippers has 99 occurrences in corpus
Pistons has 37 occurrences in corpus
Pacers has 91 occurrences in corpus
Blazers has 44 occurrences in corpus
Bucks has 62 occurrences in corpus
Wizards has 61 occurrences in corpus
Magic has 45 occurrences in corpus
Nets has 146 occurrences in corpus
Grizzlies has 29 occurrences in corpus
Kings has 52 occurrences in corpus
Nugg

In [32]:
team_names['noun_occurrences'] = [c2[name.lower()] for name in names]

Picking out these specific strings:

In [33]:
for name in names:
    print('{} has {} occurrences in corpus'.format(name,corpus.count(name)))

Lakers has 103 occurrences in corpus
Cavaliers has 369 occurrences in corpus
Warriors has 525 occurrences in corpus
Celtics has 297 occurrences in corpus
Spurs has 109 occurrences in corpus
76ers has 195 occurrences in corpus
Knicks has 591 occurrences in corpus
Bulls has 28 occurrences in corpus
Rockets has 296 occurrences in corpus
Raptors has 150 occurrences in corpus
Mavericks has 60 occurrences in corpus
Thunder has 77 occurrences in corpus
Suns has 42 occurrences in corpus
Heat has 71 occurrences in corpus
Hawks has 32 occurrences in corpus
Timberwolves has 52 occurrences in corpus
Hornets has 32 occurrences in corpus
Clippers has 100 occurrences in corpus
Pistons has 37 occurrences in corpus
Pacers has 90 occurrences in corpus
Blazers has 42 occurrences in corpus
Bucks has 58 occurrences in corpus
Wizards has 58 occurrences in corpus
Magic has 42 occurrences in corpus
Nets has 150 occurrences in corpus
Grizzlies has 29 occurrences in corpus
Kings has 52 occurrences in corpus
Nug

In [34]:
team_names['raw_occurrences'] = [corpus.count(name) for name in names]

In [35]:
team_names['% difference'] = ((team_names.raw_occurrences - team_names.noun_occurrences) / team_names.raw_occurrences)*100

In [36]:
team_names.sort_values(['noun_occurrences'],ascending=False)

Unnamed: 0,Name,Wins,Playoffs,Conference,New York Team,City,noun_occurrences,raw_occurrences,% difference
6,Knicks,29,0,E,1,New York,590,591,0.169205
2,Warriors,58,5,W,0,Golden State,521,525,0.761905
1,Cavaliers,50,4,E,0,Cleveland,370,369,-0.271003
3,Celtics,55,3,E,0,Boston,302,297,-1.683502
8,Rockets,65,3,W,0,Houston,296,296,0.0
5,76ers,52,2,E,0,Philadelphia,186,195,4.615385
9,Raptors,59,2,E,0,Toronto,153,150,-2.0
24,Nets,28,0,E,1,Brooklyn,146,150,2.666667
4,Spurs,47,1,W,0,San Antonio,108,109,0.917431
29,Pelicans,48,2,W,0,New Orleans,106,104,-1.923077


So from this analysis we find that there is little discrepancy between the number of raw strings matching the team names in the text and the number found by restricting ourselves to TextBlob nouns. This seems like a sensible way to proceed with our analysis.

In [38]:
team_names.to_csv('NER_vs_raw_strings.csv',index=False)

# 4. Comparing Sentiment Classifiers:

Read in files:

In [39]:
articles_list = pd.read_csv('../nyt_scrape/articles_list_w_date.csv',parse_dates=['date'])

In [40]:
file_title = articles_list.article_urls.apply(lambda x : x.replace("/","").replace(".",""))
article_date = articles_list.date

title_date_dict = dict(zip(file_title,article_date))

In [42]:
title_text_dict = dict()

for article in file_title:
    with open(f'../nyt_scrape/articles/{article}.txt') as f:
        title_text_dict[article] = f.read().replace('’','').replace('”','').replace('“','').replace('—','').split('\n')

In [43]:
para = TextBlob(title_text_dict[file_title[0]][0])

In [44]:
print(para, para.sentiment)

The Golden State Warriors have ruined basketball. Sentiment(polarity=0.3, subjectivity=0.5)


At first glance this sentiment analysis looks to be quite poor - we would hope this paragraph would register negative sentiment.

**Vader Sentiment:**

In [46]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

In [47]:
vader = SentimentIntensityAnalyzer()

In [48]:
vader.polarity_scores(para)

{'neg': 0.341, 'neu': 0.659, 'pos': 0.0, 'compound': -0.4767}

In [49]:
for para in title_text_dict[file_title[25]][:5]:
    print(f'{para}\nTextBlob Sentiment Scores:{TextBlob(para).sentiment}\nVader Sentiment Scores:{vader.polarity_scores(para)}\n\n')

BOSTON  With a little help from his friends, though not that much, LeBron James dazzled once again in a playoff elimination game, sending the Cavaliers back to the N.B.A. finals on Sunday night and keeping himself in a Cleveland uniform for at least another two weeks.
TextBlob Sentiment Scores:Sentiment(polarity=-0.1375, subjectivity=0.3)
Vader Sentiment Scores:{'neg': 0.0, 'neu': 0.882, 'pos': 0.118, 'compound': 0.6712}


In a game that looked bleak at the outset, with the Cavaliers playing without the injured Kevin Love, James powered an offense that started the game atrociously and then did just enough to break the upstart hearts of the young Boston Celtics, 87-79, in Game 7 of the Eastern Conference finals.
TextBlob Sentiment Scores:Sentiment(polarity=-0.2875, subjectivity=0.5875)
Vader Sentiment Scores:{'neg': 0.036, 'neu': 0.814, 'pos': 0.149, 'compound': 0.7398}


James played all 48 minutes and finished with 35 points, 15 rebounds and 9 assists, missing a triple-double only bec

On balance, Vader Sentiment appears to be more accurately capturing the sentiment of the article.