# Select daily popular topics
**Objective**: For each day, select the popular topics by analyzing high-frequency terms in that day's news titles.

# Roadmap
1. Build news title docs for each day
2. Preprocess each doc
3. Find high-frequency words in each news title doc
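
The three roadmap steps can be sketched end to end on toy data. This is only an illustration under stated assumptions: `toy_df`, `STOPWORDS`, and `top_terms` are hypothetical names, and the regex tokenizer and tiny stopword list stand in for whatever preprocessing the notebook actually applies later.

```python
import re
from collections import Counter

import pandas as pd

# Toy data standing in for the real news DataFrame (hypothetical values).
toy_df = pd.DataFrame({
    'post_time': pd.to_datetime([
        '2015-01-01 00:03:09',
        '2015-01-01 00:16:27',
        '2015-01-02 08:00:00',
        '2015-01-02 09:30:00',
    ]),
    'news_title': [
        'Storm brings snow to West',
        'Storm Brings Snow, Cold to West',
        'Plane crash kills 4',
        'Girl survives plane crash',
    ],
})

# Hypothetical, deliberately tiny stopword list (an assumption, not the notebook's).
STOPWORDS = {'to', 'in', 'the', 'a', 'of'}


def top_terms(doc, n=3):
    """Lowercase, keep alphabetic tokens, drop stopwords, return the n most common."""
    tokens = re.findall(r'[a-z]+', doc.lower())
    tokens = [t for t in tokens if t not in STOPWORDS]
    return Counter(tokens).most_common(n)


# Step 1: one concatenated title doc per day.
daily_docs = toy_df.resample('D', on='post_time')['news_title'].apply(' '.join)

# Steps 2 and 3: preprocess each doc and count its terms.
daily_top_terms = daily_docs.apply(top_terms)
print(daily_top_terms)
```

Counting raw tokens is the simplest possible notion of "high frequency"; near-duplicate titles from different outlets will inflate counts, which may or may not be desirable for topic selection.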

# Steps

In [1]:
"""
Initialization
"""

'''
Standard modules
'''
import os
from pprint import pprint

'''
Analysis modules
'''
import pandas as pd

'''
Custom modules
'''
import config
import utilities

'''
Misc
'''
nb_name = '20171002-daheng-select_daily_popular_topics'

## Build news title docs for each day

In [2]:
"""
Declare tmp sr pickle
"""
news_period_title_docs_pkl = os.path.join(config.TMP_DIR, '{}-{}'.format(nb_name, 'news-period-title_docs.sr.pkl'))

if 1 == 1:
    '''
    Load in pickle for news data over selected period.
    '''
    news_period_df = pd.read_pickle(config.NEWS_PERIOD_DF_PKL)

In [3]:
"""
Print any single news title
"""
news_period_df.loc[3, 'news_title']

'Jeb Bush quits board posts ahead of possible White House run reports'

In [4]:
"""
Print complete news titles
"""
with pd.option_context('display.max_colwidth', 100):
    display(news_period_df[['post_time', 'news_title']])

| | post_time | news_title |
|---|---|---|
| 0 | 2015-01-01 00:03:09 | Jeb Bush takes "natural next step" toward 2016 bid |
| 1 | 2015-01-01 00:03:26 | Fireworks, parties and prayers usher in 2015 |
| 2 | 2015-01-01 00:04:41 | 2 Killed in Helicopter Crash in Southern Arizona |
| 3 | 2015-01-01 00:04:41 | Jeb Bush quits board posts ahead of possible White House run reports |
| 4 | 2015-01-01 00:06:16 | North Korea's Kim Jong Un to South Korean leader: Let's meet |
| 5 | 2015-01-01 00:08:39 | Western states get brutal blast of winter |
| 6 | 2015-01-01 00:09:14 | Abbas paves way to join International Criminal Court |
| 7 | 2015-01-01 00:12:44 | A look at Egypt's Al-Jazeera English trial |
| 8 | 2015-01-01 00:16:27 | Storm Brings Snow, Cold to West for New Year's |
| 9 | 2015-01-01 00:17:17 | Storm brings snow, cold to West for New Year's |


In [5]:
"""
Group news by day of post_time and concatenate news_titles
"""
news_titles_sr = news_period_df.resample('D', on='post_time')['news_title'].apply(lambda x: ' '.join(x))
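
Two edge cases are worth keeping in mind for this grouping step: `' '.join` raises a `TypeError` if any title is missing, and `resample('D')` also emits a bin for calendar days that have no news at all. Whether either occurs in the real data is not shown here, so the following is only a defensive sketch on toy data; `toy_df` and `join_titles` are hypothetical names.

```python
import pandas as pd

# Toy data with one missing title and a gap day (2015-01-02 has no rows).
toy_df = pd.DataFrame({
    'post_time': pd.to_datetime([
        '2015-01-01 00:03:09',
        '2015-01-01 23:59:00',
        '2015-01-03 12:00:00',
    ]),
    'news_title': ['First title', None, 'Third title'],
})


def join_titles(titles):
    """Drop missing titles and cast to str so that ' '.join cannot fail."""
    return ' '.join(titles.dropna().astype(str))


toy_docs = toy_df.resample('D', on='post_time')['news_title'].apply(join_titles)

# Keep only days that actually produced a non-empty doc.
toy_docs = toy_docs[toy_docs.str.len() > 0]
print(toy_docs)
```

If gap days should stay visible in later analysis, the final filter can simply be skipped.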

In [6]:
"""
Print any single news title doc
"""
news_titles_sr.iloc[0]
# news_titles_sr.loc['2015-01-01']

'Jeb Bush takes "natural next step" toward 2016 bid Fireworks, parties and prayers usher in 2015 2 Killed in Helicopter Crash in Southern Arizona Jeb Bush quits board posts ahead of possible White House run reports North Korea\'s Kim Jong Un to South Korean leader: Let\'s meet Western states get brutal blast of winter Abbas paves way to join International Criminal Court A look at Egypt\'s Al-Jazeera English trial Storm Brings Snow, Cold to West for New Year\'s Storm brings snow, cold to West for New Year\'s A look at the trial of 3 Al-Jazeera English journalists imprisoned in Egypt over a year Storm brings snow, cold to West for New Year\'s Storm brings snow, cold to West for New Year\'s Mohamed Fahmy, Canadian imprisoned in Egypt, to get retrial North Korean leader open to summit with South 2 killed in helicopter crash in southern Arizona Egypt court orders retrial in Al-Jazeera case North Korean leader open to summit with South New Year\'s Stampede in Shanghai Prompts Anxious Wait Fo

In [7]:
"""
Print all news title docs
"""
with pd.option_context('display.max_colwidth', 130):
    print(news_titles_sr)

post_time
2015-01-01    Jeb Bush takes "natural next step" toward 2016 bid Fireworks, parties and prayers usher in 2015 2 Killed in Helicopter Crash i...
2015-01-02    West Virginia Police Shooting: 2 Officers Injured, 2 Dead Bodies Found Inside ... Polar Bear Plunge draws big crowds despite 2...
2015-01-03    7-year-old survives plane crash that kills 4 in Kentucky The Life and Times of Mario Cuomo People: Harry Reid injured in exerc...
2015-01-04    Pakistan Strikes Kill 31 Militants, Drone Kills 7 Republicans take control in House, Senate Tuesday Pakistan Strikes Kill 31 M...
2015-01-05    Nest's thermostat gets smarter with support for more third-party devices national football league playoffs: first round Stuart...
2015-01-06    German Anti-Islam Protests Hit Record Numbers UPDATE 2-Sony CEO praises employees, partners for standing up to hackers Three t...
2015-01-07    Tail of AirAsia plane located in Java Sea, Indonesian official says California breaks ground on bullet train as 

In [8]:
"""
Make tmp sr pickle
"""
if 0 == 1:
    news_titles_sr.to_pickle(news_period_title_docs_pkl)
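
The `if 0 == 1:` / `if 1 == 1:` switches toggle by hand between rebuilding and reloading the cached Series. One possible alternative, sketched below, is a small helper that loads the pickle when it exists and builds it otherwise. `load_or_build` is a hypothetical helper, not part of `utilities` or `config`, and the example writes to a temporary directory rather than `config.TMP_DIR`.

```python
import os
import tempfile

import pandas as pd


def load_or_build(pkl_path, build_fn):
    """Return the object cached at pkl_path; build and cache it if absent."""
    if os.path.exists(pkl_path):
        return pd.read_pickle(pkl_path)
    obj = build_fn()
    obj.to_pickle(pkl_path)
    return obj


# Usage with a toy Series in a temporary directory.
tmp_dir = tempfile.mkdtemp()
toy_pkl = os.path.join(tmp_dir, 'toy-title_docs.sr.pkl')

# First call builds and writes the pickle.
toy_sr = load_or_build(toy_pkl, lambda: pd.Series(['doc one', 'doc two']))

# Second call finds the pickle, so the build function is never invoked.
cached_sr = load_or_build(toy_pkl, lambda: pd.Series(['never built']))
```

The trade-off is that a stale pickle is silently reused, so the file has to be deleted whenever the upstream data or grouping logic changes.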

## Preprocess each doc

In [9]:
"""
Load tmp sr pickle for news title docs
"""
if 1 == 1:
    news_titles_sr = pd.read_pickle(news_period_title_docs_pkl)