This assignment gives you practice rearranging text files and dealing with difficult formats. Here's the information from the assignment's README, at least at the moment: 

---

# Assignment: Headline Processing

This is an assignment that is eligible for completion against your contract goals. 

In this assignment, you'll practice reshaping a pretty messy text file. This assignment
may end up being a decent amount of work, but assignment 2 will build on this one, so
you'll be making an investment in two assignments. 

## Background

In 2015 and 2016 a journalism student at UMT started capturing headlines from six 
Montana newspapers. In this assignment we'll only work with the headlines from
the Missoulian, though you can see all of the raw materials in the Excel file 
"Headlines Base Document.xlsx". 

The student wished to analyze some of the words used by the different newspapers. But,
as you'll see when you look at the file, the data are arranged in a way that makes
the headlines hard to work with. In this assignment, you'll re-arrange the data. 

## Task

You'll start with the file `missoula.txt`, which holds a copy-and-paste from the 
"Missoulian" sheet in the Excel file. Your goal is to create a file that looks 
like the file `missoula_clean.txt`. 

**Input**: The input file is arranged as a ragged table. The first row holds 
dates in a DD-MMM format (single-digit days are not zero-padded, e.g. 7-Oct). The dates
run in chronological order: those from 23-Sep through 30-Dec are from 2015, and those
from 6-Jan onward are from 2016. The cells below each date hold the headlines from that
date, and the number of headlines varies from date to date. 

**Output**: The output file is arranged as a [tidy data set](https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html), 
with one headline on each row. There are three columns: the newspaper, the date, and the headline. 

Things to think about: 

1. You'll need to transform dates from this DD-MMM format into the standard YYYY-MM-DD format, 
the [one true date format](https://xkcd.com/1179/). 
1. There's no way to read a file "vertically" in Python, so you'll need to be smart
about associating a headline with its date. 
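
One way around the second point is to read the rows normally and then transpose them. The sketch below uses made-up data (the `rows` values are invented for illustration and are not taken from `missoula.txt`) and relies on `itertools.zip_longest`, so that short rows are padded rather than cut off:

```python
from itertools import zip_longest

# Hypothetical ragged table: the first row holds dates, later rows hold headlines.
rows = [
    ["23-Sep", "30-Sep", "7-Oct"],
    ["Headline A", "Headline B", "Headline C"],
    ["Headline D", "", "Headline E"],
    ["Headline F"],
]

# zip_longest(*rows) yields one tuple per column, padding short rows with "".
columns = [list(col) for col in zip_longest(*rows, fillvalue="")]

# Map each date (the first cell of a column) to its non-empty headlines.
by_date = {col[0]: [h for h in col[1:] if h] for col in columns}
print(by_date)
```

Plain `zip(*rows)` would stop at the shortest row, which is why the padded variant is used here.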


In [1]:
import pandas as pd
import numpy as np

In [2]:
# Your code will go here, but I'll start you off by reading in the file and looking at the first few rows of data
with open("missoula.txt",'r', encoding = "ISO-8859-1") as infile :
    for idx, row in enumerate(infile.readlines()):
        print(row)
        if idx > 2 :
            break

23-Sep	30-Sep	7-Oct	14-Oct	21-Oct	28-Oct	4-Nov	11-Nov	18-Nov	25-Nov	2-Dec	9-Dec	16-Dec	23-Dec	30-Dec	6-Jan	13-Jan	20-Jan	27-Jan	3-Feb	10-Feb	17-Feb	24-Feb	2-Mar	9-Mar	16-Mar	23-Mar	30-Mar	6-Apr	13-Apr	20-Apr	27-Apr	4-May	11-May	18-May	25-May	1-Jun	8-Jun	15-Jun	22-Jun	29-Jun	6-Jul	13-Jul	20-Jul	27-Jul	3-Aug	10-Aug	17-Aug	24-Aug	31-Aug	7-Sep	14-Sep	21-Sep

After EWU air raid, Bobcats switch focus to Cal Poly option	Student accidentally discharges gun at Tech student housing; no one hurt	Billings woman suspects antifreeze poisoning killed Knuckles, her English bulldog	'Two Rivers' takes impressionistic look at Milltown Dam's removal	Big Sky notebook: Turnover margin a make-or-break component to success		'Tall building lawyers' in court with Carlyle director	Comedy group reunites for sketch-improv show	Bourbon glazed carrots	Bison plan draws comments from around nation, world	'Curiosities' at Butterfly Herbs print show	As leaders fret in Paris, climate-change concerns grow in Montana	Big S

In [7]:
textlist = []
with open("missoula.txt",'r', encoding = "ISO-8859-1") as infile :
    textlist = infile.readlines()

In [8]:
dates = textlist[0]

In [9]:
datelist = dates.split()

In [10]:
datelist.index('30-Dec')

14

In [11]:
dates2015 = datelist[0:15]

In [12]:
dates2016 = datelist[15:]

In [13]:
dates2015 = [date + "-2015" for date in dates2015]

In [14]:
dates2016 = [date + "-2016" for date in dates2016]

In [15]:
alldates = dates2015 + dates2016
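
The split at index 15 works for this file, but it depends on knowing where `30-Dec` sits. As a hedged alternative, the year could be inferred by watching for the month number to drop (December to January). The helper name `add_years` and the `MONTHS` list are hypothetical, introduced only for this sketch:

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def add_years(day_months, start_year):
    """Append a year to each 'D-Mon' string, bumping the year when the month wraps."""
    out = []
    year = start_year
    prev_month = None
    for dm in day_months:
        month = MONTHS.index(dm.split("-")[1]) + 1
        # A smaller month number than the previous one means a new year has started.
        if prev_month is not None and month < prev_month:
            year += 1
        out.append(f"{dm}-{year}")
        prev_month = month
    return out

sample = ["23-Dec", "30-Dec", "6-Jan", "13-Jan"]
print(add_years(sample, 2015))
```

This assumes the dates are in chronological order with no gap longer than a year, which matches the header row shown above.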

In [16]:
MonthDict = {'Jan': '01', 'Feb': '02', 'Mar': '03', 'Apr': '04', 'May': '05', 'Jun': '06', 'Jul': '07', 'Aug': '08', 'Sep': '09', 'Oct': '10', 'Nov': '11', 'Dec': '12'}

In [17]:
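# Swap each month abbreviation for its two-digit number, e.g. '23-Sep-2015' -> '23-09-2015'
for idx, date_str in enumerate(alldates):
    for abbrev in MonthDict:
        if abbrev in date_str:
            alldates[idx] = date_str.replace(abbrev, MonthDict[abbrev])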

In [18]:
alldates = [sub.replace('-', '/') for sub in alldates]

In [19]:
from datetime import datetime
alldates = [datetime.strptime(x,'%d/%m/%Y') for x in alldates]

In [20]:
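# Note: this writes slashes (YYYY/MM/DD); the README asks for YYYY-MM-DD, which would be '%Y-%m-%d'
alldates = [x.strftime('%Y/%m/%d') for x in alldates]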
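
The month lookup, separator swap, and reformatting steps can also be collapsed into a single `strptime`/`strftime` pair. This is a minimal sketch, assuming the interpreter's locale uses English month abbreviations (the `%b` directive is locale-dependent); the function name `to_iso` is hypothetical:

```python
from datetime import datetime

def to_iso(date_with_year):
    """Convert a string such as '7-Oct-2015' to '2015-10-07'."""
    parsed = datetime.strptime(date_with_year, "%d-%b-%Y")
    return parsed.strftime("%Y-%m-%d")

print(to_iso("7-Oct-2015"))
```

Using hyphens in the output format gives the YYYY-MM-DD layout that the README describes.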

In [21]:
Mydict = dict.fromkeys(alldates)

In [22]:
for i in alldates:
    Mydict[i] = []

In [23]:
len(textlist)

74

In [24]:
textlist = [[item.split('\t')] for item in textlist[1:]]

In [25]:
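col = 0   # column index into alldates
row = 0   # row index into textlist

# The loop variables only count iterations; cells are read through row and col.
for fields in textlist:
    for _ in fields[0]:
        if col == 53:   # past the last of the 53 date columns: move on to the next row
            col = 0
            row += 1
            if row == 74:
                break
        else:
            Mydict[alldates[col]].append(textlist[row][0][col])
            col += 1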
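
One thing to watch in the counter-based loop: each time the column counter reaches 53, an inner iteration is spent resetting the counters without appending anything, so if every row really has 53 fields, the last cells of the file may never be reached. A simpler pattern, sketched here on made-up data (`dates` and `lines` are hypothetical stand-ins for `alldates` and the lines after the header row), indexes each cell by its position within its own row:

```python
# Hypothetical stand-ins for `alldates` and the lines that follow the header row.
dates = ["2015/09/23", "2015/09/30", "2015/10/07"]
lines = [
    "Headline A\tHeadline B\tHeadline C\n",
    "Headline D\t\tHeadline E\n",
    "Headline F\t\t\n",
]

by_date = {d: [] for d in dates}
for line in lines:
    # Strip only the line ending so empty tab fields keep their column position.
    cells = line.rstrip("\r\n").split("\t")
    for col, cell in enumerate(cells):
        headline = cell.strip()
        # Skip blank cells and any stray fields beyond the known date columns.
        if headline and col < len(dates):
            by_date[dates[col]].append(headline)

print(by_date)
```

Because blanks and line endings are dropped while the cells are read, this version would not need a later pass to remove `''` or `'\n'` entries.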

In [26]:
headline_processing_df = pd.DataFrame(Mydict.items(), columns=['Date', 'Headline'])

In [27]:
headline_processing_df['Paper'] = 'Missoulian'

In [28]:
paper = headline_processing_df.pop('Paper')
headline_processing_df.insert(0, 'Paper', paper)

In [29]:
headline_processing_df = headline_processing_df.explode('Headline').reset_index(drop=True)

In [30]:
headline_processing_df.isnull().sum()

Paper       0
Date        0
Headline    0
dtype: int64

In [31]:
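# Assign the result back rather than using inplace=True on a column selection,
# which newer pandas versions may not propagate to the DataFrame
headline_processing_df['Headline'] = headline_processing_df['Headline'].replace('', np.nan)

In [32]:
headline_processing_df['Headline'] = headline_processing_df['Headline'].replace('\n', np.nan)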

In [33]:
headline_processing_df.dropna(subset=['Headline'], inplace=True)

In [34]:
headline_processing_df

|      | Paper      | Date       | Headline                                           |
|------|------------|------------|----------------------------------------------------|
| 0    | Missoulian | 2015/09/23 | After EWU air raid, Bobcats switch focus to Ca...  |
| 1    | Missoulian | 2015/09/23 | Alternative art gallery FrontierSpace to hold ...  |
| 2    | Missoulian | 2015/09/23 | Does state gets passing grade for education fu...  |
| 3    | Missoulian | 2015/09/23 | Fall films have issues                             |
| 4    | Missoulian | 2015/09/23 | Family in need after Bonner fire destroyed hom...  |
| ...  | ...        | ...        | ...                                                |
| 3741 | Missoulian | 2016/09/21 | Prep Athletes of the Week for Oct. 9\n             |
| 3742 | Missoulian | 2016/09/21 | Rand Paul to be in Davenport next Tuesday\n        |
| 3743 | Missoulian | 2016/09/21 | Santorum touts hardline immigration enforcemen...  |
| 3744 | Missoulian | 2016/09/21 | Team Trump Montana not getting much support fr...  |
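
The notebook stops at the DataFrame, while the task asks for a file. A minimal sketch of the final write step follows, using the standard-library `csv` module. The tab delimiter, the header names, and the sample `records` are all assumptions, since the exact layout of `missoula_clean.txt` is not shown here:

```python
import csv
import io

# Illustrative records in the column order the README lists: paper, date, headline.
records = [
    ("Missoulian", "2015-09-23", "Fall films have issues"),
    ("Missoulian", "2016-09-21", "Rand Paul to be in Davenport next Tuesday"),
]

# An in-memory buffer keeps the sketch self-contained; for a real file, swap in
# open("missoula_clean.txt", "w", newline="", encoding="utf-8").
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["Paper", "Date", "Headline"])  # header names are an assumption
writer.writerows(records)

print(buffer.getvalue())
```

With a DataFrame already in hand, `to_csv` with `sep='\t'` and `index=False` would be the pandas equivalent.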
