# Homework 2

In this homework you will be performing some analysis with entity extraction. In particular, you will be looking at the Reuters corpus and trying to construct entity profiles of persons, organizations, and locations. This will require you to iterate through the documents in the Reuters corpus, parse them appropriately, extract entities, and then store the entities along with some surrounding text. Additionally, you will be looking for mechanisms to identify potential relationships between persons and locations.

Throughout this you will need to use NLTK to access the corpus. At the same time, you will need to use an entity extraction system. You can choose to use either NLTK or Spacy. I would strongly suggest using Spacy for the entity extraction portion of this assignment.

The basic idea is to build a knowledge base around the entities you will extract in the Reuters corpus. Normally, this would be a first step to trying to model such things as entity resolution across documents. You could also use this as a first step to analyzing the sentiment towards particular entities. For example, people expressing dissatistfaction at a restaurant or brand.

Follow the below steps and read the comments carefully on the types of tasks your code will need to do.

I would expect that some of you might be able to reuse parts of this code for your project...

## Step 1) Import necessary libraries 

In [39]:
# This will be the corpus we work from
from nltk.corpus import reuters
import spacy 
import nltk

In [40]:
# I will assume you are using Spacy as a default entity recognizer.
#import en_core_web_sm
# note, the model load can be odd. In some instances your model might have the full name or the short name here.
# if you run into issues here, check the spacy model page at https://spacy.io/usage/models
#nlp = spacy.load("en")
#nlp =spacy.load('en_core_web_sm')
import en_core_web_sm
nlp = en_core_web_sm.load()

## Step 2) Fill in the following function to extract the entity, document id, and relevant sentence text from the input

In [41]:
def new_dict_assigned(ent, relevant_sentence, label, dict1):
    if ent.label_==label and ent.text.strip() not in dict1:
        dict1[ent.text.strip()]=relevant_sentence 
    elif ent.label_==label and ent.text.strip() in dict1: 
        dict1[ent.text.strip()][1].append(ent.sent.text.strip())
    return dict1

In [42]:
def extract_entities(doc_id, doc_text):
    analyzed_doc = nlp(doc_text)
    doc_persons = {}
    doc_organizations = {}
    doc_locations = {}
    
    for entity in analyzed_doc.ents:
        if entity.text.strip() != "":
            relevant_sentence = (doc_id, [entity.sent.text.strip()])
            new_dict_assigned(entity,relevant_sentence,"PERSON",doc_persons)
            new_dict_assigned(entity,relevant_sentence,"ORG",doc_organizations)
            new_dict_assigned(entity,relevant_sentence,"LOC",doc_locations)
            
    return doc_persons, doc_organizations, doc_locations

## Step 3) Adjust the following code to run the document entity extraction function
## Also, add the entity records you are constructing to your master list of entities
## Note: for the full subission run across all the Reuters documents

In [43]:
reuters_files = reuters.fileids()[:]
#reuters_files = reuters.fileids()[:25]
def combined_dict(key, dictionary):
    for key,value in key.items():
        if key not in dictionary:
            dictionary[key]=[value]
        else:
            dictionary[key].append(value)
    return dictionary

In [44]:
combined_persons = {}
combined_organizations = {}
combined_locations = {}

for doc_id in reuters_files:
    #extract all entities 
    persons, organizations, locations = extract_entities(doc_id, reuters.open(doc_id).read().replace('\n', '').lower())
    combined_dict(persons,combined_persons)
    combined_dict(locations,combined_locations)
    combined_dict(organizations,combined_organizations)

In [45]:
combined_persons

{'tom  murtha': [('test/14826',
   ['"if the tariffs remain in place for any length of time  beyond a few months it will mean the complete erosion of  exports (of goods subject to tariffs) to the u.s.," said tom  murtha, a stock analyst at the tokyo office of broker &lt;james  capel and co>.'])],
 'paul sheen': [('test/14826',
   ['"we must quickly open our markets, remove trade barriers and  cut import tariffs to allow imports of u.s. products, if we  want to defuse problems from possible u.s. retaliation," said  paul sheen, chairman of textile exporters &lt;taiwan safe group>.'])],
 'lawrence mills': [('test/14826',
   ['"that is a very short-term view," said lawrence mills,  director-general of the federation of hong kong industry.'])],
 'john button': [('test/14826',
   ['the australian government is awaiting the outcome of trade  talks between the u.s. and japan with interest and concern,  industry minister john button said in canberra last friday.'])],
 'michael smith': [('test/1

## Step 4) Fill in the following method to look through the content of an entity dictionary to determine the most popular based on number of mentions

In [46]:
def entity_tuples(entity_dictionary):
    most_common=[]
    
    #iterate dictionary key
    for key in entity_dictionary.keys():
        total_entity_mentions=0
        for t in range(len(entity_dictionary[key])):
            total_entity_mentions += len(entity_dictionary[key][t][1])
        most_common.append((key, total_entity_mentions))
    

    most_common= sorted(most_common, key = lambda x:x[1], reverse=True)
    return most_common

In [47]:
#get 500 top entities in each category
def most_popular_entities(entity_dictionary):
    #Get list of tuples (entity, total mentions)
    tuples_with_most_mentions= entity_tuples(entity_dictionary)
    return [t[0] for t in tuples_with_most_mentions[:500]]   

## Step 5) Now invoke your top entity mention finder

In [50]:
top_persons = most_popular_entities(combined_persons)
top_locations = most_popular_entities(combined_locations)
top_organizations = most_popular_entities(combined_organizations)

In [51]:
#print 10 of top persons
print(top_persons[:10])

['reagan', 'baker', 'lawson', 'yeutter', 'james baker', 'baldrige', 'johnson', 'mln', 'poehl', 'stoltenberg']


In [52]:
top_locations

['gulf',
 'europe',
 'west texas',
 'north sea',
 'asia',
 'africa',
 'the middle east',
 'north america',
 'west',
 'atlantic',
 'midwest',
 'the  gulf',
 'pacific',
 'mideast',
 '1986/87',
 'middle east',
 'western europe',
 'mediterranean',
 'the far east',
 'the north sea',
 'latin america',
 'south america',
 'west coast',
 'northeast',
 'the gulf of mexico',
 'nova',
 '4th qtr',
 'mississippi river',
 'new england',
 'southern gulf',
 'the aegean sea',
 'eastern europe',
 'the mideast gulf',
 'east',
 'the pacific coast',
 'east coast',
 'the red river',
 'the east coast',
 'south',
 'southeast asia',
 'valley',
 'southern california',
 'the west coast',
 'kharg island',
 'continental europe',
 'la pampa',
 'highland valley',
 'east bloc',
 'east europe',
 'mideast gulf',
 'caribbean',
 'scandinavia',
 'the persian gulf',
 'western',
 'the southern gulf',
 'south louisiana',
 'the black sea',
 'eastern bloc',
 'lake ontario',
 'the north china',
 'delta',
 'midmississippi river',

## Step 6) Analyze the most popular entities to determine what words they most frequently occur with

In [53]:
#generating tokens of entity

def get_tokens(dictionary):
    text=""
    for item in range(len(dictionary[ent])):
        mentions = dictionary[ent][item][1]
        all_mentions="".join(mentions)
        text += all_mentions
    doc = nlp(text)
    return [t.text for t in doc if not t.is_stop if t.is_alpha if t.text!=ent]

In [54]:


person_most_popular_terms = {}

for ent in top_persons:
    person_token_dictionary = {}
    #get all tokens 
    person_words = get_tokens(combined_persons)
    #get frequencies
    for w in person_words:
        if w not in person_token_dictionary.keys():
            person_token_dictionary[w]=1
        else:
            person_token_dictionary[w]+=1
    
   
    person_sorted_words= sorted(person_token_dictionary.items(), key=lambda x:x[1], reverse=True)
    
    person_top_words = [t[0] for t in person_sorted_words[:50]] 
    

    person_most_popular_terms[ent] = person_top_words

In [55]:
organization_most_popular_terms = {}

for ent in top_organizations:
    
    organization_token_dictionary = {}
    
    org_words = get_tokens(combined_organizations)
    for w in org_words:
        if w not in organization_token_dictionary.keys():
            organization_token_dictionary[w]=1
        else:
            organization_token_dictionary[w]+=1
            
    org_sorted_words= sorted(organization_token_dictionary.items(), key=lambda x:x[1], reverse=True)
    org_top_words = [t[0] for t in org_sorted_words[:50]] 
    organization_most_popular_terms[ent] = org_top_words

In [56]:
location_most_popular_terms = {}

for ent in top_locations:
    
    location_token_dictionary = {}
    
    loc_words = get_tokens(combined_locations)
    for w in loc_words:
        if w not in location_token_dictionary.keys():
            location_token_dictionary[w]=1
        else:
            location_token_dictionary[w]+=1
            
    loc_sorted_words= sorted(location_token_dictionary.items(), key=lambda x:x[1], reverse=True)
    loc_top_words = [t[0] for t in loc_sorted_words[:50]] 
    location_most_popular_terms[ent] = loc_top_words

## Step 7) Present your results of the most popular entities and their associated terms

In [57]:
for key,values in person_most_popular_terms.items():
    print(key, values)
    print("\n")

reagan ['said', 'administration', 'trade', 'president', 'japan', 'oil', 'economic', 'bill', 'japanese', 'house', 'foreign', 'congress', 'gulf', 'tariffs', 'united', 'states', 'policy', 'officials', 'agreement', 'tax', 'secretary', 'year', 'official', 'dlrs', 'legislation', 'wheat', 'decision', 'markets', 'unfair', 'action', 'soviet', 'new', 'today', 'retaliate', 'says', 'senate', 'union', 'sanctions', 'offer', 'semiconductor', 'retaliation', 'exports', 'mln', 'open', 'week', 'industry', 'impose', 'help', 'opposition', 'countries']


baker ['said', 'says', 'west', 'trade', 'hughes', 'treasury', 'agreement', 'dollar', 'louvre', 'meeting', 'rate', 'exchange', 'german', 'merger', 'paris', 'policy', 'interest', 'currency', 'accord', 'economic', 'agreed', 'germany', 'department', 'sees', 'rates', 'international', 'comment', 'deficit', 'secretary', 'told', 'monetary', 'weekend', 'stoltenberg', 'billion', 'consent', 'reduction', 'currencies', 'dlr', 'justice', 'decree', 'japanese', 'james', 'r

nazer ['opec', 'said', 'mln', 'oil', 'bpd', 'arabia', 'members', 'saudi', 'prices', 'continue', 'production', 'necessary', 'shared', 'market', 'conditions', 'fully', 'adhering', 'accord', 'sell', 'pronounced', 'circumstance', 'world', 'largest', 'exporter', 'restrain', 'long', 'adhere', 'pact', 'producing', 'abiding', 'produce', 'quota', 'defend', 'dlr', 'includes', 'neutral', 'zone', 'kuwait', 'sales', 'floating', 'storage', 'output', 'level', 'mean', 'kingdom', 'returned', 'role', 'swing', 'producer', 'played']


donald trump ['trump', 'donald', 'estate', 'said', 'resorts', 'dlrs', 'real', 'developer', 'inc', 'stock', 'ual', 'crosby', 'shares', 'mln', 'new', 'york', 'international', 'casino', 'class', 'b', 'common', 'agreed', 'purchase', 'hotel', 'month', 'held', 'bid', 'acquire', 'control', 'buy', 'interested', 'interstate', 'alexanders', 'discussions', 'takeover', 'recently', 'apparently', 'unsuccessful', 'spokesman', 'sharesdonald', 'family', 'chairman', 'james', 'manufacturing', 

harris ['said', 'common', 'filing', 'securities', 'exchange', 'commission', 'shares', 'dlrs', 'prices', 'ranging', 'sold', 'feb', 'dealings', 'behalf', 'advisory', 'committee', 'bell', 'dec', 'stock', 'bought', 'builders', 'transport', 'cyclops', 'jan', 'march', 'farmers', 'merchants', 'alleged', 'squeeze', 'cornering', 'market', 'harrisharris', 'lpfa', 'rule', 'book', 'allows', 'management', 'steps', 'necessary', 'correct', 'malpractice', 'assured', 'monitoring', 'retained', 'investment', 'banking', 'firm', 'kahn', 'sell', 'furnace']


jordan ['wheat', 'export', 'said', 'hard', 'soft', 'dlrs', 'credit', 'tonnes', 'red', 'winter', 'tender', 'mln', 'bids', 'bonus', 'sources', 'tonne', 'agreement', 'total', 'iraq', 'switched', 'guarantee', 'program', 'includes', 'delivery', 'marketing', 'boosts', 'boosted', 'private', 'today', 'bid', 'april', 'november', 'house', 'kaines', 'sold', 'cargoes', 'white', 'sugar', 'buying', 'expects', 'real', 'gnp', 'growth', 'percentage', 'point', 'higher', 

lukman ['said', 'oil', 'output', 'opec', 'world', 'higher', 'agreement', 'nigeria', 'saudi', 'bpd', 'quota', 'ecuador', 'nigerian', 'minister', 'remarks', 'whilst', 'talking', 'connection', 'debt', 'industrialised', 'controls', 'necessarily', 'mean', 'energy', 'bills', 'industrial', 'nations', 'president', 'expects', 'short', 'calm', 'need', 'slight', 'review', 'believe', 'tend', 'position', 'strengthen', 'gains', 'far', 'qatar', 'arabia', 'iranlukman', 'industry', 'reports', 'correct', 'february', 'propuced', 'mln', 'arabialukman']


indonesia ['said', 'month', 'spokesman', 'ministry', 'oil', 'producing', 'coffee', 'new', 'ico', 'trade', 'agriculture', 'official', 'wished', 'identified', 'problem', 'quality', 'continue', 'expand', 'suharto', 'opening', 'speech', 'indonesian', 'petroleum', 'association', 'ready', 'extend', 'contracts', 'held', 'foreign', 'companies', 'badly', 'hit', 'year', 'steep', 'plunge', 'crude', 'prices', 'cut', 'revenue', 'exports', 'expected', 'export', 'substa

buitoni ['billion', 'lire', 'court', 'said', 'consolidated', 'revenue', 'year', 'appeal', 'spa', 'accord', 'state', 'binding', 'rejected', 'company', 'controlled', 'statement', 'vismara', 'sales', 'net', 'profit', 'employs', 'people', 'disclose', 'financial', 'details', 'acquisition', 'represents', 'diversification', 'market', 'sector', 'annual', 'consumption', 'quarter', 'pct', 'comparable', 'reported', 'rose', 'ruling', 'acquire', 'italian', 'food', 'firm', 'sme', 'societa', 'meridionale', 'finanziaria', 'contractually', 'officials', 'claim', 'industrial']


ronald reagan ['ronald', 'reagan', 'president', 'nakasone', 'talks', 'said', 'trade', 'visit', 'prepare', 'prime', 'minister', 'yasuhiro', 'later', 'iran', 'japanese', 'arms', 'abe', 'warned', 'stronger', 'american', 'countermeasures', 'military', 'escalation', 'powerful', 'political', 'pressure', 'groups', 'widespread', 'opposition', 'tax', 'reform', 'plans', 'hard', 'pressed', 'come', 'new', 'tell', 'key', 'congressmen', 'annou

lloyd bentsen ['lloyd', 'bentsen', 'senate', 'finance', 'committee', 'chairman', 'd', 'told', 'trade', 'interest', 'tex', 'countries', 'measure', 'aimed', 'retaliation', 'correcting', 'japan', 'unfair', 'leader', 'calls', 'rate', 'cuts', 'called', 'major', 'industrial', 'pledge', 'coming', 'economic', 'summit', 'venice', 'cut', 'reference', 'spoils', 'thought', 'bill', 'introduced', 'democratic', 'senator', 'reporters', 'plan', 'sen', 'fellow', 'texas', 'democrat', 'positive', 'senators', 'including', 'max', 'baucus', 'mont']


robert brusca ['brusca', 'nikko', 'securities', 'said', 'robert', 'international', 'policy', 'manufacturing', 'prime', 'rate', 'increase', 'happen', 'soon', 'tonight', 'co', 'inc', 'surface', 'baker', 'look', 'responsible', 'caused', 'unsettling', 'financial', 'markets', 'west', 'german', 'new', 'york', 'numbers', 'things', 'suggest', 'fomc', 'change', 'decline', 'auto', 'employment', 'accounted', 'nearly', 'half', 'total', 'drop', 'fed', 'lowering', 'inflation'

cohen ['dlrs', 'said', 'average', 'price', 'silver', 'systems', 'asked', 'republic', 'directors', 'resigned', 'sees', 'analyst', 'michael', 'pickens', 'yorkton', 'securities', 'puts', 'possible', 'spike', 'stocks', 'run', 'incredible', 'way', 'noted', 'today', 'break', 'ounce', 'indicates', 'small', 'investors', 'entering', 'precious', 'metals', 'market', 'expects', 'ratio', 'gold', 'prices', 'seeks', 'acquisitions', 'marketing', 'america', 'inc', 'retained', 'richter', 'co', 'assist', 'efforts', 'redirect', 'business']


palm olein ['olein', 'palm', 'traders', 'arabia', 'refined', 'bleached', 'deodorised', 'tender', 'saudi', 'tonnes', 'shipment', 'tonne', 'rbd', 'bought', 'dlrs', 'tomorrow', 'june', 'buys', 'malaysian', 'cost', 'freight', 'jeddah', 'indian', 'state', 'trading', 'corp', 'april', 'import', 'seeking', 'market', 'pakistan', 'split', 'equal', 'cargo', 'lots', 'second', 'half', 'shipments', 'yesterday', 'possibly', 'cargoes', 'vegetable', 'oil', 'today', 'mar', 'apr', 'cif'

stora ['papyrus', 'close', 'completing', 'takeover', 'sweden', 'kopparbergs', 'bergslags', 'ab', 'shareholders', 'later', 'month', 'sell', 'shares', 'company', 'said', 'announced', 'september', 'acquire', 'price', 'billion', 'crowns', 'forming', 'europe', 'second', 'largest', 'forest', 'group', 'london', 'based', 'reed', 'international', 'plc', 'stremaining', 'l']


theodore cross ['share', 'theodore', 'cross', 'dlrs', 'harper', 'investor', 'row', 'bid', 'inc', 'editor', 'offers', 'pct', 'shares', 'stock', 'week', 'offered', 'prompting', 'rival', 'publishing', 'firm', 'harcourt', 'brace', 'jovanovich', 'business', 'rowbusiness', 'rowon', 'monday', 'received', 'surprise', 'dlr', 'owner', 'shareholder', 'group', 'led', 'new', 'york', 'said', 'securities', 'exchange', 'commission', 'filing', 'boosted', 'stake', 'frost', 'sullivan', 'common', 'total', 'outstanding']


solomon ['said', 'dollar', 'pct', 'presentation', 'japan', 'society', 'elaboration', 'expects', 'significant', 'decline', '

transco ['gas', 'said', 'open', 'access', 'cubic', 'feet', 'refused', 'transport', 'cheap', 'spot', 'distributors', 'likely', 'conciliatory', 'inventory', 'charge', 'industry', 'analysts', 'speculated', 'earlier', 'week', 'tenneco', 'follow', 'lead', 'close', 'pipelines', 'pay', 'anticipated', 'oil', 'prices', 'continue', 'exert', 'pressure', 'profitability', 'energy', 'pipeline', 'delivered', 'trillion', 'year', 'marketing', 'affiliate', 'sold', 'average', 'billion', 'filed', 'revised', 'settlement', 'proposal', 'permit', 'transporter', 'restructuring']


carter ['mln', 'beet', 'plantings', 'increase', 'restructuring', 'announced', 'december', 'hawley', 'rejected', 'buy', 'offer', 'retail', 'year', 'federal', 'government', 'subsidized', 'synfuels', 'development', 'synthetic', 'fuels', 'corp', 'research', 'program', 'created', 'administration', 'goal', 'developing', 'replacements', 'barrels', 'predicted', 'rise', 'midwest', 'coupled', 'increases', 'california', 'sugarbeet', 'slightly',

dole ['option', 'said', 'says', 'considered', 'senate', 'republican', 'leader', 'robert', 'congress', 'consider', 'legislation', 'apply', 'called', 'producers', 'major', 'proposal', 'farm', 'addition', 'debate', 'open', 'bill', 'reagan', 'administration', 'proposed', 'policy', 'changes', 'going', 'year', 'singling', 'cut', 'target', 'prices', 'pct', 'reporters', 'speech', 'sensed', 'shift', 'state', 'department', 'supporting', 'export', 'enhancement', 'initiative', 'soviet', 'union', 'change']


almir pazzionotto ['minister', 'almir', 'pazzionotto', 'said', 'government', 'force', 'settlement', 'strike', 'ruled', 'illegal', 'labour', 'statement', 'meeting', 'petrobras', 'oil', 'industry', 'leaders', 'set', 'wednesday', 'rio', 'presence', 'act', 'mediator']


martens ['government', 'billion', 'requirement', 'said', 'gnp', 'plan', 'spending', 'raising', 'francs', 'expected', 'grow', 'volume', 'cut', 'financial', 'belgian', 'prime', 'minister', 'wilfried', 'announced', 'parliament', 'reduc

natalie koether ['said', 'natalie', 'koether', 'stake', 'inc', 'investor', 'shares', 'pct', 'computer', 'memories', 'outstanding', 'common', 'stock', 'company', 'seeking', 'control', 'shareholder', 'group', 'led', 'far', 'hills', 'raised', 'total', 'attorney', 'reduced', 'ccx', 'previous', 'information', 'shareholders', 'prime', 'medical', 'services', 'reconsidering', 'plan', 'seek', 'plans', 'sell', 'entire']


maki ['trade', 'said', 'trying', 'reduce', 'farm', 'product', 'output', 'expensive', 'programs', 'japan', 'hold', 'detailed', 'discussions', 'item', 'new', 'round', 'gatt', 'talks', 'meeting', 'april', 'representative', 'clayton', 'yeutter', 'lyng', 'join']


j. bildner ['bildner', 'sons', 'inc', 'improved', 'sees', 'resultsj', 'said', 'expects', 'earnings', 'sales', 'current', 'fiscal']


jose romero ['chairman', 'jose', 'romero', 'said', 'visit', 'brussels', 'later', 'month', 'lobby', 'proposed', 'pct', 'european', 'community', 'ec', 'levy', 'vegetable', 'oil', 'noted', 'vote



peter baron ['baron', 'consumers', 'buffer', 'consumer', 'spokesman', 'peter', 'cocoa', 'compromise', 'stock', 'producers', 'accept', 'plan', 'members', 'international', 'organization', 'icco', 'accepted', 'final', 'rules', 'condition', 'agree', 'west', 'germany', 'perfectly', 'happy', 'told', 'reuters', 'london', 'agreements', 'economic', 'clauses', 'stabilise', 'prices', 'function', 'fixed', 'price', 'ranges', 'close', 'market', 'reality', 'participation', 'participants', 'prepared', 'obligations', 'framework', 'agreement', 'seriously', 'called', 'tone', 'negotiations']


bell ['stock', 'split', 'pct', 'southwestern', 'votes', 'dividend', 'said', 'increase', 'increasesouthwestern', 'statement', 'holds', 'bhp', 'billion', 'shares']


jeumont-schneider ['jeumont', 'schneider', 'siemens', 'group', 'subsidiary', 'pct', 'west', 'germany', 'ag', 'teamed', 'french', 'opposition', 'bid', 'att', 'dutch', 'philips', 'telecommunications', 'bv', 'applied', 'buy', 'france', 'second', 'largest',

deng ['grain', 'china', 'newspaper', 'quoted', 'saying', 'paper', 'output', 'sets', 'limit', 'imports', 'says', 'leader', 'xiaoping', 'said', 'import', 'mln', 'tonnes', 'ming', 'pao', 'hong', 'kong', 'key', 'issues', 'influence', 'development', 'situation', 'reached', 'point', 'pigs', 'fed', 'increases', 'state', 'council', 'decided', 'raise', 'price', 'grains', 'including', 'corn', 'rice', 'unchanged', 'gave', 'details']


qassem ahmed taqi ['oil', 'minister', 'qassem', 'ahmed', 'taqi', 'iraqi', 'news', 'iraq', 'replaced', 'agency', 'reports', 'reportsiraq', 'moved', 'heavy', 'industries', 'ministry', 'official', 'agcny', 'ina', 'said', 'decree', 'named', 'head', 'national', 'company', 'inoc', 'isam', 'abdul', 'rahim', 'al', 'chalaby', 'replacing']


evans ['said', 'industry', 'taxation', 'based', 'government', 'changes', 'plenty', 'examples', 'targeted', 'approaches', 'oil', 'produced', 'good', 'results', 'recent', 'decision', 'change', 'rrt', 'desire', 'ensure', 'certainty', 'stabil

junichiro koizumi ['plan', 'junichiro', 'koizumi', 'head', 'ldp', 'committee', 'working', 'japan', 'said', 'details', 'stave', 'trade', 'problems', 'liberal', 'democratic', 'party', 'drawn', 'detailed', 'calling', 'large', 'tax', 'cuts', 'increase', 'government', 'purchases', 'foreign', 'goods', 'phrase', 'taken', 'implying', 'immediate', 'cut', 'pct', 'discount']


koizumi ['said', 'ldp', 'cut', 'plan', 'specify', 'size', 'tax', 'domestic', 'demand', 'rule', 'rate', 'necessary', 'stimulated']


jaime ongpin ['secretary', 'jaime', 'ongpin', 'finance', 'government', 'billion', 'meet', 'trade', 'industry', 'jose', 'revenue', 'expected', 'rise', 'pct', 'pesos', 'year', 'said', 'intend', 'devalue', 'peso', 'wants', 'flexible', 'able', 'continue', 'respond', 'market', 'conditions']


jose concepcion ['trade', 'secretary', 'industry', 'jose', 'philippines', 'criticises', 'ec', 'oil', 'levy', 'concepcion', 'told', 'world', 'meet', 'finance', 'jaime', 'ongpin', 'vegetable', 'ministers']


jeff

redman ['said', 'independently', 'confirm', 'reports', 'iran', 'offered', 'halt', 'attacks', 'gulf', 'declined', 'elaborate', 'washington', 'emergency', 'meeting', 'nato', 'ambassadors', 'brussels', 'subject', 'path', 'want', 'tension', 'rise', 'help']


hernandez grisanti ['oil', 'hernandez', 'price', 'despite', 'attack', 'energy', 'minister', 'arturo', 'grisanti', 'iranian', 'growing', 'tension', 'gulf', 'said', 'market', 'months', 'venezuela', 'sees', 'flat', 'world', 'prices', 'remain', 'stable', 'platforms', 'venezuelan', 'foresaw', 'stability', 'crude', 'augmented', 'military', 'crucial', 'mines', 'today', 'told', 'meeting', 'regional', 'exporters', 'critical', 'efforts', 'achieve', 'recovery', 'stabilize']


midday ['close', 'today', 'said', 'manila', 'stock', 'exchange', 'composite', 'index', 'plunged', 'points', 'pct', 'depressed', 'record', 'point', 'fall', 'dow', 'jones', 'industrial', 'average', 'hughes', 'interested', 'bank', 'coming', 'dollar', 'yen', 'levels', 'new', 'yo

In [58]:
for key,values in location_most_popular_terms.items():
    print(key, values)
    print("\n")

gulf ['said', 'oil', 'iranian', 'states', 'iran', 'united', 'attack', 'shipping', 'reagan', 'military', 'mln', 'dlrs', 'missiles', 'prices', 'kuwaiti', 'american', 'price', 'warned', 'new', 'told', 'tehran', 'minister', 'forces', 'ships', 'tankers', 'ship', 'protect', 'use', 'monday', 'platform', 'attacks', 'tension', 'president', 'kuwait', 'arab', 'action', 'officials', 'fob', 'foreign', 'soviet', 'rates', 'week', 'hormuz', 'corn', 'usda', 'barge', 'freight', 'near', 'anti', 'region']


europe ['said', 'mln', 'oil', 'japan', 'market', 'year', 'japanese', 'exports', 'currency', 'pct', 'european', 'growth', 'trade', 'export', 'united', 'states', 'company', 'imports', 'prices', 'west', 'america', 'south', 'east', 'sales', 'demand', 'dlrs', 'largest', 'world', 'rose', 'sold', 'rate', 'billion', 'far', 'chairman', 'interest', 'report', 'sharply', 'added', 'firms', 'domestic', 'officials', 'ec', 'yen', 'rise', 'high', 'dollar', 'crude', 'economic', 'terms', 'cost']


west texas ['west', 'te

eastern europe ['europe', 'mln', 'eastern', 'said', 'tonnes', 'forecasts', 'plantings', 'forecast', 'east', 'germany', 'coarse', 'grain', 'licht', 'hectares', 'area', 'ha', 'follows', 'ussr', 'albania', 'bulgaria', 'czechoslovakia', 'hungary', 'poland', 'production', 'cutting', 'growth', 'recently', 'mainly', 'expectations', 'poor', 'export', 'performance', 'year', 'notably', 'oil', 'exporting', 'iafmm', 'fish', 'meal', 'consumption', 'rose', 'west', 'scandinavian', 'countries', 'far', 'fell', 'remained', 'static', 'grains', 'usually']


the mideast gulf ['gulf', 'mideast', 'oil', 'allied', 'said', 'soviet', 'tankers', 'protect', 'sources', 'force', 'propane', 'dlrs', 'arabia', 'direct', 'clash', 'lpg', 'mln', 'set', 'carry', 'kuwaiti', 'kuwait', 'agreed', 'charter', 'union', 'exports', 'diplomatic', 'dependence', 'secret', 'reagan', 'declared', 'upcoming', 'summit', 'venice', 'discussing', 'common', 'security', 'interests', 'shared', 'western', 'democracies', 'bigger', 'played', 'secr

3rd qtr ['dlrs', 'loss', 'vs', 'mln', 'includes', 'qtr', 'months', 'discontinued', 'gain', 'net', 'losses', 'operations', 'revs', 'disposition', 'investments', 'respectively']


south central ['south', 'central', 'care', 'centers', 'centralthe', 'areas', 'alberta', 'lacked', 'moisture', 'germination', 'patchy', 'learning', 'inc', 'said', 'signed', 'letter', 'intent', 'acquire', 'profitable', 'day', 'pennsylvania', 'total', 'price', 'dlrs']


jupiter ['kresge', 'corp', 'stores', 'mccrory', 'privately', 'held', 'rapid', 'american', 'said', 'k', 'mart', 'completed', 'previously', 'announced', 'acquisition', 'agreed', 'sell', 'subsidiary', 'newly', 'acquired', 'renamed']


south china ['south', 'china', 'said', 'coal', 'north', 'march', 'trader', 'transporting', 'stocks', 'areas', 'consumers', 'east', 'problem', 'particularly', 'handles', 'crude', 'oil', 'exports', 'transhipments', 'imports', 'include', 'fertiliser', 'soda', 'ash', 'iron', 'ore', 'brazil', 'hui', 'bao', 'drought', 'affecte

oak harbour ['home', 'loan', 'bank', 'savings', 'washington', 'federal', 'board', 'fhlbb', 'announced', 'acquisition', 'association', 'seattle', 'interwest', 'oak', 'harbour']


mississippi ['pct', 'missouri', 'year', 'republican', 'senators', 'john', 'danforth', 'christopher', 'bond', 'introduced', 'bill', 'allow', 'wheat', 'feedgrain', 'producers', 'rivers', 'hurt', 'flooding', 'collect', 'deficiency', 'payments', 'upper', 'areas', 'risen', 'nearly', 'past', 'weeks', 'original', 'tariff', 'price']


transvaal ['weather', 'agency', 'said', 'dry', 'summary', 'crop', 'bulletin', 'scattered', 'showers', 'continued', 'pockets', 'persisted', 'northeast', 'rainfall', 'february', 'near', 'normal', 'areas', 'earlier', 'periods', 'hot', 'reduced', 'yield', 'prospects', 'parts', 'northern', 'southern', 'orange', 'free', 'state']


the gulf of mexico's ['gas', 'gulf', 'mexico', 'mln', 'need', 'lock', 'future', 'supplies', 'utilities', 'big', 'industrial', 'customers', 'bring', 'resurgence', 'act

thassos island ['aegean', 'oil', 'row', 'erupted', 'turkey', 'said', 'search', 'round', 'greek', 'islands', 'coast', 'following', 'announcement', 'greece', 'planned', 'drill', 'east', 'thassos', 'island', 'taking', 'control', 'canadian', 'led', 'consortium', 'operating', 'northern']


north central nevada ['resources', 'ltd', 'program', 'drilling', 'cornucopia', 'said', 'extensive', 'drill', 'sampling', 'begin', 'mid', 'april', 'ivanhoe', 'gold', 'property', 'north', 'central', 'nevada']


northern europe ['netbacks', 'crude', 'oil', 'refined', 'northern', 'europe', 'generally', 'lower', 'brent', 'valued', 'dlrs']


the mediterranean region ['netback', 'values', 'mediterranean', 'region', 'shown', 'dlrs', 'barrel']


northern europe's ['netbacks', 'northern', 'europe', 'refinery', 'region', 'lower', 'friday', 'previous', 'week', 'brent', 'falling', 'pct', 'dlrs', 'barrel']


southern kansas ['corp', 'penteco', 'interest', 'east', 'gas', 'reef', 'energy', 'said', 'board', 'entered', 'ag

the pacific ocean ['ecuador', 'forced', 'suspend', 'exports', 'pipeline', 'connecting', 'jungle', 'oil', 'fields', 'pacific', 'ocean', 'port', 'balao', 'damaged', 'week', 'earthquake']


central queensland ['pct', 'mines', 'copper', 'coal', 'assets', 'include', 'iron', 'ore', 'gold', 'utah', 'stakes', 'seven', 'large', 'central', 'queensland', 'coking', 'samarco', 'operation', 'brazil', 'la', 'escondida', 'deposit', 'chile', 'island', 'port', 'hardy', 'canada', 'south', 'africa', 'bhp', 'minerals', 'wholly', 'partly', 'owned', 'manganese', 'base', 'metal', 'operations', 'prospects', 'ok', 'tedi', 'project', 'papua', 'new', 'guinea']


mid-east gulf ['billion', 'total', 'trade', 'mid', 'east', 'gulf', 'states', 'fell', 'pct', 'lower', 'oil', 'prices', 'imports', 'dlrs', 'compared', 'exports']


batam island ['indonesia', 'build', 'palm', 'oil', 'terminal', 'plans', 'crude', 'new', 'port', 'batam', 'island', 'south', 'singapore', 'research', 'technology', 'minister', 'yusuf', 'habibie', 

In [59]:
for key,values in organization_most_popular_terms.items():
    print(key, values)
    print("\n")

mln ['dlrs', 'vs', 'said', 'net', 'year', 'loss', 'billion', 'pct', 'shares', 'share', 'company', 'profit', 'sales', 'stg', 'quarter', 'cts', 'bank', 'revs', 'dlr', 'inc', 'note', 'corp', 'earnings', 'tax', 'reported', 'income', 'rose', 'includes', 'shr', 'gain', 'february', 'compared', 'january', 'assets', 'fell', 'revenues', 'operations', 'march', 'oil', 'interest', 'sale', 'cash', 'stock', 'week', 'ended', 'total', 'group', 'common', 'qtr', 'co']


cts ['vs', 'loss', 'net', 'shr', 'profit', 'mln', 'dlrs', 'year', 'div', 'qtr', 'oper', 'revs', 'share', 'ctscts', 'qtly', 'ctsshr', 'dividend', 'mths', 'corp', 'pct', 'said', 'sales', 'quarterly', 'shrs', 'april', 'avg', 'inc', 'gain', 'quarter', 'note', 'jan', 'record', 'sulphur', 'prior', 'sets', 'march', 'ctsoper', 'seven', 'payout', 'ctsqtr', 'ctssix', 'payable', 'ctstwo', 'includes', 'ended', 'tax', 'company', 'extraordinary', 'primary', 'raises']


pct ['year', 'said', 'february', 'rose', 'rise', 'january', 'mln', 'stake', 'rate', 

gaf ['said', 'offer', 'borg', 'warner', 'dlrs', 'company', 'share', 'corp', 'heyman', 'board', 'chemical', 'union', 'carbide', 'chairman', 'samuel', 'pct', 'acquire', 'analyst', 'takeover', 'mln', 'spokesman', 'dlr', 'bid', 'group', 'gain', 'billion', 'plastics', 'chemicals', 'values', 'merrill', 'lynch', 'believe', 'rose', 'oppenheimer', 'proposal', 'led', 'tender', 'profits', 'unsuccessful', 'offered', 'chicago', 'agreement', 'based', 'profit', 'net', 'asked', 'reconsider', 'shares', 'today', 'bank']


american express ['express', 'american', 'shearson', 'said', 'nippon', 'pct', 'stock', 'mln', 'life', 'rumors', 'lehman', 'sell', 'shares', 'co', 'analysts', 'comment', 'statement', 'financial', 'firm', 'market', 'company', 'officials', 'services', 'sold', 'traders', 'declared', 'speculation', 'spinoff', 'value', 'reflect', 'securities', 'brokerage', 'worldwide', 'options', 'stake', 'japanese', 'investment', 'dlrs', 'split', 'cts', 'brothers', 'total', 'exchange', 'year', 'considered',

shell ['oil', 'said', 'dlrs', 'dutch', 'group', 'royal', 'gasoline', 'pct', 'mln', 'ltd', 'net', 'billion', 'prices', 'petroleum', 'vs', 'royalty', 'pact', 'bp', 'revise', 'spokesman', 'higher', 'market', 'co', 'unit', 'dollar', 'year', 'exchange', 'tonnes', 'products', 'sands', 'stations', 'pri', 'output', 'corp', 'caltex', 'singapore', 'petrol', 'eastern', 'pte', 'subsidiary', 'product', 'yen', 'japan', 'octane', 'sales', 'contract', 'gas', 'buy', 'little', 'major']


the ec commission ['ec', 'commission', 'said', 'tax', 'vegetable', 'oils', 'export', 'ministers', 'european', 'trade', 'proposals', 'marine', 'fats', 'community', 'producers', 'licences', 'tonnes', 'open', 'proposed', 'today', 'tender', 'maize', 'recent', 'action', 'market', 'july', 'future', 'steel', 'industry', 'farm', 'large', 'brussels', 'states', 'import', 'called', 'special', 'west', 'sources', 'grain', 'granted', 'current', 'currency', 'ecus', 'tonne', 'intact', 'minister', 'group', 'strong', 'block', 'adoption']

fujitsu ['fairchild', 'said', 'semiconductor', 'acquisition', 'schlumberger', 'deal', 'secretary', 'agreement', 'control', 'japanese', 'given', 'analysts', 'weinberger', 'sale', 'technology', 'north', 'american', 'capel', 'distribution', 'access', 'production', 'buying', 'firm', 'defense', 'caspar', 'commerce', 'malcolm', 'baldrige', 'pct', 'officials', 'market', 'supercomputers', 'japan', 'supercomputer', 'proposed', 'purchase', 'furore', 'proceed', 'talks', 'line', 'basic', 'reached', 'year', 'spokeswoman', 'told', 'terminates', 'pact', 'sell', 'business', 'analyst']


klm ['air', 'said', 'stake', 'british', 'dutch', 'commonwealth', 'atlanta', 'royal', 'talks', 'week', 'courier', 'service', 'door', 'ltd', 'minority', 'airlines', 'delivery', 'services', 'takeover', 'regional', 'shipping', 'network', 'spokesman', 'agreed', 'report', 'negotiating', 'spokeswoman', 'xp', 'plc', 'flights', 'uk', 'seeking', 'group', 'airline', 'press', 'nv', 'carrier', 'nrc', 'vendex', 'international', 'own

balladur ['said', 'pct', 'louvre', 'target', 'year', 'growth', 'french', 'interest', 'finance', 'urges', 'respect', 'international', 'monetary', 'rates', 'accordballadur', 'maintains', 'inflation', 'notion', 'financial', 'zones', 'currencies', 'fact', 'higher', 'dollar', 'artificially', 'high', 'latest', 'contact', 'ministers', 'minister', 'west', 'german', 'pledges', 'policy', 'insists', 'maintenance', 'february', 'probably', 'environment', 'favourable', 'economic', 'interview', 'daily', 'les', 'france', 'says', 'zone', 'nearerballadur', 'community', 'closer']


honeywell ['said', 'bull', 'company', 'systems', 'automation', 'dlrs', 'interest', 'sperry', 'aerospace', 'federal', 'new', 'owned', 'nec', 'pct', 'spencer', 'quarter', 'operating', 'december', 'inc', 'business', 'debt', 'unit', 'loss', 'reduce', 'sale', 'created', 'dedicated', 'computer', 'jointly', 'chairman', 'chief', 'executive', 'officer', 'edson', 'major', 'step', 'expects', 'largest', 'customer', 'purchasing', 'computer

state ['secretary', 'george', 'shultz', 'trade', 'wheat', 'said', 'today', 'moscow', 'allied', 'economic', 'week', 'gulf', 'problem', 'association', 'official', 'international', 'business', 'newspaper', 'published', 'china', 'demands', 'editorial', 'coincide', 'visit', 'eep', 'issue', 'reagan', 'administration', 'relations', 'growers', 'schultz', 'steel', 'billion', 'yen', 'package', 'announced', 'tokyo', 'went', 'bigger', 'force', 'played', 'boost', 'forces', 'mideast', 'vital', 'protect', 'shipping', 'attack', 'possible', 'contributions']


cftc ['futures', 'trading', 'exchange', 'commission', 'commodity', 'said', 'contract', 'cbt', 'wheat', 'contracts', 'corn', 'speculative', 'limits', 'kcbt', 'proposal', 'mge', 'kansas', 'month', 'proposed', 'impair', 'ability', 'interest', 'minneapolis', 'grain', 'syrup', 'april', 'approves', 'approval', 'approved', 'rule', 'change', 'february', 'according', 'hits', 'position', 'limit', 'planthe', 'raising', 'months', 'net', 'single', 'time', 'lea

american motors ['motors', 'american', 'said', 'chrysler', 'renault', 'says', 'stake', 'merger', 'letter', 'company', 'received', 'corp', 'proposal', 'transaction', 'hands', 'directors', 'agreed', 'emerson', 'electric', 'ltv', 'march', 'intent', 'companies', 'ask', 'extend', 'agreement', 'date', 'event', 'prior', 'april', 'discovers', 'unforeseen', 'problem', 'course', 'diligence', 'investigation', 'referring', 'renaultchrysler', 'renaultamerican', 'advising', 'enter', 'brief', 'statement', 'studying', 'foresaw', 'major', 'complications', 'abort', 'combination', 'historians']


the bank of spain ['bank', 'spain', 'pct', 'money', 'raised', 'assistance', 'funds', 'overnight', 'said', 'banks', 'rate', 'billion', 'daily', 'today', 'drain', 'suspended', 'market', 'rates', 'reserve', 'year', 'compared', 'spanish', 'central', 'yesterday', 'demand', 'currently', 'inflation', 'auction', 'requirement', 'excess', 'liquidity', 'supply', 'spokesman', 'offered', 'seven', 'day', 'repurchase', 'agreem

cgct ['pct', 'french', 'telephone', 'market', 'mln', 'stake', 'francs', 'philips', 'government', 'offer', 'siemens', 'controls', 'said', 'private', 'losses', 'owns', 'asked', 'start', 'submit', 'sale', 'france', 'priced', 'international', 'consortia', 'battling', 'right', 'buy', 'german', 'companies', 'leading', 'contenders', 'control', 'foreign', 'groups', 'want', 'gain', 'foothold', 'potential', 'limited', 'privatisation', 'laws', 'passed', 'year', 'left', 'acquired', 'bid', 'outlined', 'american', 'telegraph', 'co']


bank of japan intervenes ['japan', 'intervenes', 'tokyo', 'dollars', 'yen', 'bank', 'dealersbank', 'dealers', 'opening', 'buy', 'dollar', 'soon', 'openingbank', 'support', 'buys', 'buying', 'intervened', 'market', 'opened', 'marketbank', 'stem', 'fallbank', 'early', 'afternoonbank', 'shortly', 'opensbank']


dow ['said', 'dlrs', 'chemical', 'mln', 'prices', 'rate', 'year', 'increase', 'earned', 'oil', 'technical', 'refined', 'grades', 'pound', 'underwriter', 'familiar'

g-7 ['japan', 'meeting', 'economic', 'understand', 'approved', 'present', 'rates', 'britain', 'canada', 'france', 'italy', 'west', 'germany', 'said', 'ranges', 'members', 'statement', 'consider', 'currencies', 'broadly', 'consistent', 'fundamentals', 'says', 'japanese', 'policies', 'time', 'officials', 'asmiyazawa', 'supports', 'louvre', 'accord', 'finance', 'minister', 'kiichi', 'miyazawamiyazawa', 'strongly', 'supportspresident', 'jacques', 'delors', 'called', 'swift', 'convening', 'countries', 'following', 'instability', 'today', 'trading', 'world', 'money', 'stock']


southmark ['said', 'shares', 'rights', 'dividend', 'corp', 'american', 'special', 'shareholders', 'realty', 'trust', 'price', 'capital', 'mln', 'right', 'acquire', 'share', 'dlrs', 'cash', 'stock', 'traded', 'april', 'common', 'ex', 'date', 'purchase', 'acquisition', 'talks', 'pratt', 'hotel', 'caesars', 'world', 'facilities', 'issue', 'entitles', 'holder', 'buy', 'beneficial', 'interest', 'compute', 'paid', 'based', 

the u.s. agriculture  department ['agriculture', 'department', 'credit', 'commodity', 'corporation', 'ccc', 'accepted', 'cover', 'bonus', 'sale', 'bid', 'export', 'tonnes', 'egypt', 'frozen', 'poultry', 'iraq', 'wheat', 'semolina', 'barley', 'farm', 'said', 'offers', 'switched', 'mln', 'dlrs', 'guarantees', 'mexico', 'purchases', 'appears', 'relying', 'corn', 'china', 'argentina', 'south', 'africa', 'supplies', 'united', 'states', 'bids', 'bonuses', 'sales', 'flour', 'offer', 'exporter', 'head', 'dairy', 'cattle', 'morocco', 'cyprus']


manufacturers hanover ['hanover', 'manufacturers', 'rate', 'prime', 'pct', 'bank', 'trust', 'raises', 'effective', 'said', 'today', 'dlrs', 'bankers', 'futures', 'todaymanufacturers', 'co', 'major', 'citibank', 'chase', 'raise', 'announced', 'national', 'loans', 'agent', 'group', 'financing', 'mln', 'facility', 'disappointment', 'european', 'central', 'appointment', 'opportunity', 'sell', 'dollar', 'lower', 'vice', 'president', 'carol', 'increase', 'mat

mellon ['dlrs', 'mln', 'said', 'quarter', 'loan', 'loss', 'end', 'pct', 'loans', 'charge', 'offs', 'current', 'brazil', 'robertson', 'contrast', 'confidence', 'triggered', 'week', 'washington', 'announced', 'plans', 'slap', 'tariffs', 'japanese', 'electronic', 'imports', 'raising', 'specter', 'debilitating', 'trade', 'provision', 'losses', 'reflecting', 'additions', 'earned', 'reserve', 'estimated', 'book', 'compared', 'total', 'primary', 'capital', 'ratio', 'line', 'figure', 'excess', 'regulatory', 'noted', 'bigger', 'involvement']


commonwealth ['said', 'british', 'plant', 'talks', 'pct', 'aluminum', 'puts', 'block', 'aluminumcommonwealth', 'mln', 'rmj', 'agreed', 'klm', 'stake', 'dutch', 'reverses', 'increase', 'early', 'continuing', 'columbia', 'opened', 'interested', 'bought', 'january', 'closed', 'feb', 'leaving', 'workers', 'dividend', 'include', 'shares', 'westfair', 'originated', 'dlrs', 'residential', 'mortgage', 'loans', 'securities', 'management', 'continue', 'holding', 'a

ifo ['diw', 'said', 'pct', 'industry', 'report', 'economy', 'institutes', 'exports', 'saw', 'rising', 'predicted', 'mln', 'joint', 'estimates', 'economic', 'development', 'markedly', 'favourable', 'forecast', 'pick', 'slow', 'start', 'continue', 'weak', 'point', 'good', 'reason', 'believe', 'soon', 'slight', 'rise', 'emerge', 'course', 'positive', 'private', 'consumption', 'compared', 'investments', 'rises', 'employment', 'occur', 'tertiary', 'sector', 'number', 'employed', 'manufacturing', 'unemployment', 'decline', 'recessionary', 'influences']


wells fargo ['fargo', 'wells', 'dlrs', 'includes', 'warner', 'said', 'security', 'credit', 'billion', 'mln', 'services', 'guards', 'chilton', 'corp', 'rating', 'automotive', 'parts', 'protective', 'business', 'agreement', 'dlr', 'loan', 'partnership', 'line', 'borg', 'spokeswoman', 'company', 'plans', 'sell', 'financial', 'unit', 'businesses', 'include', 'co', 'shr', 'vs', 'andborg', 'systems', 'separate', 'announcement', 'acquisition', 'inc

bond corp ['bond', 'corp', 'holdings', 'ltd', 'said', 'considering', 'atlas', 'mining', 'pct', 'media', 'comment', 'allied', 'bail', 'consolidated', 'development', 'seriously', 'investments', 'philippines', 'australian', 'brewer', 'alan', 'offered', 'pesos', 'share', 'b', 'acquire', 'merlin', 'pete', 'previously', 'reported', 'publicly', 'floated', 'rights', 'issue', 'owned', 'expected', 'listed', 'end', 'document', 'shareholders', 'delay', 'follows', 'receipt', 'mln', 'dlr', 'loan', 'parent', 'company', 'meet', 'payment']


nippon steel ['steel', 'nippon', 'corp', 'mln', 'joint', 'venture', 'china', 'spokesman', 'dlrs', 'gtx', 'inland', 'pct', 'denies', 'seeking', 'japanese', 'plants', 'told', 'reuters', 'official', 'request', 'company', 'considering', 'sales', 'mainichi', 'quoted', 'officials', 'saying', 'prices', 'reasonable', 'export', 'mills', 'invests', 'corpnippon', 'discussing', 'co', 'negotiating', 'set', 'indiana', 'said', 'declining', 'local', 'newspapers', 'reported', 'capi

texaco canada ['canada', 'texaco', 'crude', 'canadian', 'oil', 'bbl', 'postings', 'cts', 'said', 'prices', 'raises', 'light', 'sweet', 'dlrs', 'cuts', 'par', 'grade', 'dlrstexaco', 'canadiancts', 'inc', 'raise', 'edmonton', 'swann', 'hills', 'industry', 'outlook', 'positive', 'robust', 'annual', 'report', 'strengthened', 'somewhat', 'good', 'reason', 'believe', 'general', 'level', 'sustainable', 'continued', 'volatility', 'likely', 'lowered', 'contract', 'price', 'pay']


eurostat ['pct', 'ec', 'output', 'growth', 'said', 'industrial', 'year', 'compared', 'european', 'community', 'statistics', 'office', 'earlier', 'production', 'january', 'fell', 'rose', 'months', 'annual', 'inflation', 'rate', 'marginally', 'april', 'beaten', 'japan', 'prices', 'provisionally', 'lower', 'industry', 'slows', 'increased', 'average', 'recorded', 'highest', 'portugal', 'greece', 'contracted', 'noted', 'december', 'added', 'adjustment', 'seasonal', 'factors', 'clearly', 'slowing', 'beginning', 'summeec', '

carney ['countries', 'market', 'government', 'said', 'access', 'steel', 'support', 'agriculture', 'avoid', 'production', 'incentives', 'thirdly', 'freeze', 'seek', 'reduce', 'aid', 'measures', 'distorted', 'world', 'prices', 'fourth', 'principle', 'introduce', 'new', 'import', 'barriers', 'mandated', 'existing', 'legislation', 'fifth', 'basic', 'principles', 'implemented', 'help', 'maintain', 'open', 'taking', 'action', 'ensure', 'accurate', 'data', 'exports', 'imports', 'canada', 'backdoor', 'offshore', 'suppliers', 'canadian', 'companies', 'asked']


ag ['said', 'marks', 'bank', 'dollar', 'likely', 'stay', 'range', 'gisela', 'steinhaeuser', 'senior', 'dealer', 'chase', 'management', 'board', 'spokesman', 'hamburg', 'based', 'und', 'westbank', 'turnover', 'months', 'group', 'asian', 'creditanstalt', 'bankverein', 'deutsche', 'joint', 'alfred', 'herrhausen', 'told', 'news', 'taboo', 'peter', 'pietsch', 'commerzbank', 'engineering', 'linde', 'world', 'rose', 'mln', 'pct', 'period', 'cha

the bank of italy ['italy', 'bank', 'billion', 'net', 'official', 'reserves', 'january', 'banks', 'companies', 'italian', 'lire', 'previously', 'reported', 'said', 'capital', 'end', 'rose', 'provisional', 'pct', 'months', 'conditions', 'allocation', 'individuals', 'holding', 'stakes', 'fall', 'fell', 'april', 'today', 'newspaper', 'la', 'repubblica', 'cited', 'remarks', 'announced', 'deficit', 'partly', 'caused', 'non', 'banking', 'money', 'supply', 'seasonally', 'adjusted', 'cumulative', 'balance', 'match', 'total', 'calculated', 'individual']


xerox ['rank', 'south', 'said', 'affiliate', 'africa', 'fintech', 'unit', 'ltd', 'agreement', 'sell', 'altron', 'corp', 'signed', 'definitive', 'pty', 'group', 'undisclosed', 'preliminary', 'reached', 'completion', 'sale', 'awaits', 'approval', 'shareholders', 'review', 'johannesburg', 'stock', 'founded', 'wholly', 'owned', 'manufactures', 'markets', 'products', 'eastern', 'workforce', 'pct', 'black', 'colored', 'african', 'sells', 'copiers', 

atpc ['tin', 'exports', 'plan', 'members', 'curb', 'member', 'association', 'producing', 'countries', 'export', 'tonnes', 'zaire', 'find', 'ways', 'industry', 'officials', 'states', 'chairman', 'indonesia', 'mining', 'energy', 'minister', 'subroto', 'pledged', 'country', 'support', 'produced', 'exported', 'estimated', 'agreement', 'seven', 'aimed', 'cut', 'world', 'surplus', 'boost', 'prices', 'executive', 'director', 'victor', 'siaahan', 'told', 'reuters', 'received', 'telex', 'indicating', 'willingess', 'limit', 'total', 'year']


olivetti ['exclude', 'venture', 'spokesman', 'controlled', 'investment', 'company', 'c', 'ec', 'stake', 'possibility', 'investing', 'responding', 'reuters', 'query', 'italian', 'press', 'reports', 'today', 'saying', 'participate', 'pct', 'buitoni', 'cir', 'compagnie', 'industriali', 'riunite', 'ing', 'spaolivetti', 'sgs', 'thomsoning', 'spa', 'semiconductor', 'currently', 'discussion', 'italy', 'societa', 'finanziaria', 'telefonica', 'france', 'thomson', 'c

## Extra Credit

There are several extra credit options for this assignment. 
* The first would be to determine which persons, organizations, and locations most frequently occur in the same sentences.
* Another task would be to attempt to resolve different forms of the same name for each person and location. For example, George Bush and Bush inside the same document.