#### Data preparation - text normalization

In [1]:
import pandas as pd
%matplotlib inline
import numpy as np

In [2]:
df = pd.read_csv("bbc-news-data.csv",sep='\t')
df.head()

Unnamed: 0,category,filename,title,content
0,business,001.txt,Ad sales boost Time Warner profit,Quarterly profits at US media giant TimeWarne...
1,business,002.txt,Dollar gains on Greenspan speech,The dollar has hit its highest level against ...
2,business,003.txt,Yukos unit buyer faces loan claim,The owners of embattled Russian oil giant Yuk...
3,business,004.txt,High fuel prices hit BA's profits,British Airways has blamed high fuel prices f...
4,business,005.txt,Pernod takeover talk lifts Domecq,Shares in UK drinks and food firm Allied Dome...


In [3]:
# Select the category and content columns; .copy() avoids SettingWithCopyWarning when new columns are added below
df = df.iloc[:, [0, 3]].copy()
pd.set_option('display.max_colwidth', 300)
df.head()

Unnamed: 0,category,content
0,business,"Quarterly profits at US media giant TimeWarner jumped 76% to $1.13bn (£600m) for the three months to December, from $639m year-earlier. The firm, which is now one of the biggest investors in Google, benefited from sales of high-speed internet connections and higher advert sales. TimeWarner sai..."
1,business,The dollar has hit its highest level against the euro in almost three months after the Federal Reserve head said the US trade deficit is set to stabilise. And Alan Greenspan highlighted the US government's willingness to curb spending and rising household savings as factors which may help to r...
2,business,The owners of embattled Russian oil giant Yukos are to ask the buyer of its former production unit to pay back a $900m (£479m) loan. State-owned Rosneft bought the Yugansk unit for $9.3bn in a sale forced by Russia to part settle a $27.5bn tax claim against Yukos. Yukos' owner Menatep Group sa...
3,business,"British Airways has blamed high fuel prices for a 40% drop in profits. Reporting its results for the three months to 31 December 2004, the airline made a pre-tax profit of £75m ($141m) compared with £125m a year earlier. Rod Eddington, BA's chief executive, said the results were ""respectable"" ..."
4,business,"Shares in UK drinks and food firm Allied Domecq have risen on speculation that it could be the target of a takeover by France's Pernod Ricard. Reports in the Wall Street Journal and the Financial Times suggested that the French spirits firm is considering a bid, but has yet to contact its targ..."


In [4]:
#Convert to lowercase
df['lowercase_step1'] = df['content'].str.lower()
df.head()

Unnamed: 0,category,content,lowercase_step1
0,business,"Quarterly profits at US media giant TimeWarner jumped 76% to $1.13bn (£600m) for the three months to December, from $639m year-earlier. The firm, which is now one of the biggest investors in Google, benefited from sales of high-speed internet connections and higher advert sales. TimeWarner sai...","quarterly profits at us media giant timewarner jumped 76% to $1.13bn (£600m) for the three months to december, from $639m year-earlier. the firm, which is now one of the biggest investors in google, benefited from sales of high-speed internet connections and higher advert sales. timewarner sai..."
1,business,The dollar has hit its highest level against the euro in almost three months after the Federal Reserve head said the US trade deficit is set to stabilise. And Alan Greenspan highlighted the US government's willingness to curb spending and rising household savings as factors which may help to r...,the dollar has hit its highest level against the euro in almost three months after the federal reserve head said the us trade deficit is set to stabilise. and alan greenspan highlighted the us government's willingness to curb spending and rising household savings as factors which may help to r...
2,business,The owners of embattled Russian oil giant Yukos are to ask the buyer of its former production unit to pay back a $900m (£479m) loan. State-owned Rosneft bought the Yugansk unit for $9.3bn in a sale forced by Russia to part settle a $27.5bn tax claim against Yukos. Yukos' owner Menatep Group sa...,the owners of embattled russian oil giant yukos are to ask the buyer of its former production unit to pay back a $900m (£479m) loan. state-owned rosneft bought the yugansk unit for $9.3bn in a sale forced by russia to part settle a $27.5bn tax claim against yukos. yukos' owner menatep group sa...
3,business,"British Airways has blamed high fuel prices for a 40% drop in profits. Reporting its results for the three months to 31 December 2004, the airline made a pre-tax profit of £75m ($141m) compared with £125m a year earlier. Rod Eddington, BA's chief executive, said the results were ""respectable"" ...","british airways has blamed high fuel prices for a 40% drop in profits. reporting its results for the three months to 31 december 2004, the airline made a pre-tax profit of £75m ($141m) compared with £125m a year earlier. rod eddington, ba's chief executive, said the results were ""respectable"" ..."
4,business,"Shares in UK drinks and food firm Allied Domecq have risen on speculation that it could be the target of a takeover by France's Pernod Ricard. Reports in the Wall Street Journal and the Financial Times suggested that the French spirits firm is considering a bid, but has yet to contact its targ...","shares in uk drinks and food firm allied domecq have risen on speculation that it could be the target of a takeover by france's pernod ricard. reports in the wall street journal and the financial times suggested that the french spirits firm is considering a bid, but has yet to contact its targ..."


In [5]:
# Replace punctuation marks and numbers (\d+) with spaces
# Raw strings keep the backslashes literal; \d is already covered by \w, so the character class only needs \w and \s
df['removed_punctuation_step2'] = df['lowercase_step1'].str.replace(r'[^\w\s]', ' ', regex=True)
df['removed_punctuation_step2'] = df['removed_punctuation_step2'].str.replace(r'\d+', ' ', regex=True)
df.head()

Unnamed: 0,category,content,lowercase_step1,removed_punctuation_step2
0,business,"Quarterly profits at US media giant TimeWarner jumped 76% to $1.13bn (£600m) for the three months to December, from $639m year-earlier. The firm, which is now one of the biggest investors in Google, benefited from sales of high-speed internet connections and higher advert sales. TimeWarner sai...","quarterly profits at us media giant timewarner jumped 76% to $1.13bn (£600m) for the three months to december, from $639m year-earlier. the firm, which is now one of the biggest investors in google, benefited from sales of high-speed internet connections and higher advert sales. timewarner sai...",quarterly profits at us media giant timewarner jumped to bn m for the three months to december from m year earlier the firm which is now one of the biggest investors in google benefited from sales of high speed internet connections and higher advert sales timewarner said four...
1,business,The dollar has hit its highest level against the euro in almost three months after the Federal Reserve head said the US trade deficit is set to stabilise. And Alan Greenspan highlighted the US government's willingness to curb spending and rising household savings as factors which may help to r...,the dollar has hit its highest level against the euro in almost three months after the federal reserve head said the us trade deficit is set to stabilise. and alan greenspan highlighted the us government's willingness to curb spending and rising household savings as factors which may help to r...,the dollar has hit its highest level against the euro in almost three months after the federal reserve head said the us trade deficit is set to stabilise and alan greenspan highlighted the us government s willingness to curb spending and rising household savings as factors which may help to r...
2,business,The owners of embattled Russian oil giant Yukos are to ask the buyer of its former production unit to pay back a $900m (£479m) loan. State-owned Rosneft bought the Yugansk unit for $9.3bn in a sale forced by Russia to part settle a $27.5bn tax claim against Yukos. Yukos' owner Menatep Group sa...,the owners of embattled russian oil giant yukos are to ask the buyer of its former production unit to pay back a $900m (£479m) loan. state-owned rosneft bought the yugansk unit for $9.3bn in a sale forced by russia to part settle a $27.5bn tax claim against yukos. yukos' owner menatep group sa...,the owners of embattled russian oil giant yukos are to ask the buyer of its former production unit to pay back a m m loan state owned rosneft bought the yugansk unit for bn in a sale forced by russia to part settle a bn tax claim against yukos yukos owner menatep group says it...
3,business,"British Airways has blamed high fuel prices for a 40% drop in profits. Reporting its results for the three months to 31 December 2004, the airline made a pre-tax profit of £75m ($141m) compared with £125m a year earlier. Rod Eddington, BA's chief executive, said the results were ""respectable"" ...","british airways has blamed high fuel prices for a 40% drop in profits. reporting its results for the three months to 31 december 2004, the airline made a pre-tax profit of £75m ($141m) compared with £125m a year earlier. rod eddington, ba's chief executive, said the results were ""respectable"" ...",british airways has blamed high fuel prices for a drop in profits reporting its results for the three months to december the airline made a pre tax profit of m m compared with m a year earlier rod eddington ba s chief executive said the results were respectable in a third...
4,business,"Shares in UK drinks and food firm Allied Domecq have risen on speculation that it could be the target of a takeover by France's Pernod Ricard. Reports in the Wall Street Journal and the Financial Times suggested that the French spirits firm is considering a bid, but has yet to contact its targ...","shares in uk drinks and food firm allied domecq have risen on speculation that it could be the target of a takeover by france's pernod ricard. reports in the wall street journal and the financial times suggested that the french spirits firm is considering a bid, but has yet to contact its targ...",shares in uk drinks and food firm allied domecq have risen on speculation that it could be the target of a takeover by france s pernod ricard reports in the wall street journal and the financial times suggested that the french spirits firm is considering a bid but has yet to contact its targ...
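The effect of step 2 can be checked on a single string with the standard `re` module. This is a minimal sketch: the sample text is the opening of the first article above, and `strip_punct_and_digits` is a hypothetical helper name that does not appear in the notebook.

```python
import re

def strip_punct_and_digits(text):
    # Replace anything that is not a word character or whitespace with a space.
    text = re.sub(r'[^\w\s]', ' ', text)
    # Replace each run of digits with a single space.
    return re.sub(r'\d+', ' ', text)

sample = "jumped 76% to $1.13bn (£600m)"
cleaned = strip_punct_and_digits(sample)
# Collapse repeated spaces for display only; the notebook column keeps them.
print(' '.join(cleaned.split()))  # jumped to bn m
```

Note that unit suffixes such as `bn` and `m` survive as standalone tokens once the digits are gone, which is what the `removed_punctuation_step2` column shows.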


In [6]:
# Lemmatization
import nltk
# nltk.download('punkt')
# nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def lemmatize_text(text):
    tokens = nltk.word_tokenize(text)
    # Lemmatize each token first as a verb ('v'), then as a noun ('n')
    lemmatized_tokens = [lemmatizer.lemmatize(token, 'v') for token in tokens]
    lemmatized_tokens = [lemmatizer.lemmatize(token, 'n') for token in lemmatized_tokens]
    # Join the tokens back into a single string
    return ' '.join(lemmatized_tokens)

df['lemmatized_text_step3'] = df['removed_punctuation_step2'].apply(lemmatize_text)
df.head()


Unnamed: 0,category,content,lowercase_step1,removed_punctuation_step2,lemmatized_text_step3
0,business,"Quarterly profits at US media giant TimeWarner jumped 76% to $1.13bn (£600m) for the three months to December, from $639m year-earlier. The firm, which is now one of the biggest investors in Google, benefited from sales of high-speed internet connections and higher advert sales. TimeWarner sai...","quarterly profits at us media giant timewarner jumped 76% to $1.13bn (£600m) for the three months to december, from $639m year-earlier. the firm, which is now one of the biggest investors in google, benefited from sales of high-speed internet connections and higher advert sales. timewarner sai...",quarterly profits at us media giant timewarner jumped to bn m for the three months to december from m year earlier the firm which is now one of the biggest investors in google benefited from sales of high speed internet connections and higher advert sales timewarner said four...,quarterly profit at u medium giant timewarner jump to bn m for the three month to december from m year earlier the firm which be now one of the biggest investor in google benefit from sale of high speed internet connection and higher advert sale timewarner say fourth quarter sale rise to bn from...
1,business,The dollar has hit its highest level against the euro in almost three months after the Federal Reserve head said the US trade deficit is set to stabilise. And Alan Greenspan highlighted the US government's willingness to curb spending and rising household savings as factors which may help to r...,the dollar has hit its highest level against the euro in almost three months after the federal reserve head said the us trade deficit is set to stabilise. and alan greenspan highlighted the us government's willingness to curb spending and rising household savings as factors which may help to r...,the dollar has hit its highest level against the euro in almost three months after the federal reserve head said the us trade deficit is set to stabilise and alan greenspan highlighted the us government s willingness to curb spending and rising household savings as factors which may help to r...,the dollar have hit it highest level against the euro in almost three month after the federal reserve head say the u trade deficit be set to stabilise and alan greenspan highlight the u government s willingness to curb spend and rise household save a factor which may help to reduce it in late tr...
2,business,The owners of embattled Russian oil giant Yukos are to ask the buyer of its former production unit to pay back a $900m (£479m) loan. State-owned Rosneft bought the Yugansk unit for $9.3bn in a sale forced by Russia to part settle a $27.5bn tax claim against Yukos. Yukos' owner Menatep Group sa...,the owners of embattled russian oil giant yukos are to ask the buyer of its former production unit to pay back a $900m (£479m) loan. state-owned rosneft bought the yugansk unit for $9.3bn in a sale forced by russia to part settle a $27.5bn tax claim against yukos. yukos' owner menatep group sa...,the owners of embattled russian oil giant yukos are to ask the buyer of its former production unit to pay back a m m loan state owned rosneft bought the yugansk unit for bn in a sale forced by russia to part settle a bn tax claim against yukos yukos owner menatep group says it...,the owner of embattle russian oil giant yukos be to ask the buyer of it former production unit to pay back a m m loan state own rosneft buy the yugansk unit for bn in a sale force by russia to part settle a bn tax claim against yukos yukos owner menatep group say it will ask rosneft to repay a l...
3,business,"British Airways has blamed high fuel prices for a 40% drop in profits. Reporting its results for the three months to 31 December 2004, the airline made a pre-tax profit of £75m ($141m) compared with £125m a year earlier. Rod Eddington, BA's chief executive, said the results were ""respectable"" ...","british airways has blamed high fuel prices for a 40% drop in profits. reporting its results for the three months to 31 december 2004, the airline made a pre-tax profit of £75m ($141m) compared with £125m a year earlier. rod eddington, ba's chief executive, said the results were ""respectable"" ...",british airways has blamed high fuel prices for a drop in profits reporting its results for the three months to december the airline made a pre tax profit of m m compared with m a year earlier rod eddington ba s chief executive said the results were respectable in a third...,british airway have blame high fuel price for a drop in profit report it result for the three month to december the airline make a pre tax profit of m m compare with m a year earlier rod eddington ba s chief executive say the result be respectable in a third quarter when fuel cost rise by m or b...
4,business,"Shares in UK drinks and food firm Allied Domecq have risen on speculation that it could be the target of a takeover by France's Pernod Ricard. Reports in the Wall Street Journal and the Financial Times suggested that the French spirits firm is considering a bid, but has yet to contact its targ...","shares in uk drinks and food firm allied domecq have risen on speculation that it could be the target of a takeover by france's pernod ricard. reports in the wall street journal and the financial times suggested that the french spirits firm is considering a bid, but has yet to contact its targ...",shares in uk drinks and food firm allied domecq have risen on speculation that it could be the target of a takeover by france s pernod ricard reports in the wall street journal and the financial times suggested that the french spirits firm is considering a bid but has yet to contact its targ...,share in uk drink and food firm ally domecq have rise on speculation that it could be the target of a takeover by france s pernod ricard report in the wall street journal and the financial time suggest that the french spirit firm be consider a bid but have yet to contact it target ally domecq sh...


In [7]:
# Check that lemmatization changed the first row
k1 = df.loc[0, "lemmatized_text_step3"]
k2 = df.loc[0, "removed_punctuation_step2"]
if k1 != k2:
    print("The cells are different")
else:
    print("The cells are the same")

The cells are different
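The equality check only says that something changed. A token-by-token comparison shows what changed. The sketch below is an illustration rather than part of the original notebook: `changed_tokens` is a hypothetical helper, it assumes both strings split into the same number of tokens, and the two sample strings are copied from the first row of the output tables above.

```python
def changed_tokens(before, after):
    # Pair tokens by position; assumes both strings have the same token count.
    pairs = zip(before.split(), after.split())
    return [(b, a) for b, a in pairs if b != a]

before = "quarterly profits at us media giant timewarner jumped"
after = "quarterly profit at u medium giant timewarner jump"
for old, new in changed_tokens(before, after):
    print(old, "->", new)
```

On this sample the comparison surfaces the intended reductions (`profits` to `profit`, `jumped` to `jump`) as well as side effects of treating every token as a noun, such as `us` becoming `u` and `media` becoming `medium`.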


In [8]:
# Remove stopwords
import nltk
# nltk.download('punkt')
# nltk.download('stopwords')
from nltk.corpus import stopwords

# Build the stopword set once, before the function that uses it
stop_words = set(stopwords.words('english'))

def preprocess_text(text):
    tokens = nltk.word_tokenize(text)
    tokens_without_stopwords = [token for token in tokens if token not in stop_words]
    return ' '.join(tokens_without_stopwords)

df['without_stopwords_step4'] = df['lemmatized_text_step3'].apply(preprocess_text)
df.head()


Unnamed: 0,category,content,lowercase_step1,removed_punctuation_step2,lemmatized_text_step3,without_stopwords_step4
0,business,"Quarterly profits at US media giant TimeWarner jumped 76% to $1.13bn (£600m) for the three months to December, from $639m year-earlier. The firm, which is now one of the biggest investors in Google, benefited from sales of high-speed internet connections and higher advert sales. TimeWarner sai...","quarterly profits at us media giant timewarner jumped 76% to $1.13bn (£600m) for the three months to december, from $639m year-earlier. the firm, which is now one of the biggest investors in google, benefited from sales of high-speed internet connections and higher advert sales. timewarner sai...",quarterly profits at us media giant timewarner jumped to bn m for the three months to december from m year earlier the firm which is now one of the biggest investors in google benefited from sales of high speed internet connections and higher advert sales timewarner said four...,quarterly profit at u medium giant timewarner jump to bn m for the three month to december from m year earlier the firm which be now one of the biggest investor in google benefit from sale of high speed internet connection and higher advert sale timewarner say fourth quarter sale rise to bn from...,quarterly profit u medium giant timewarner jump bn three month december year earlier firm one biggest investor google benefit sale high speed internet connection higher advert sale timewarner say fourth quarter sale rise bn bn profit buoy one gain offset profit dip warner bros le user aol time w...
1,business,The dollar has hit its highest level against the euro in almost three months after the Federal Reserve head said the US trade deficit is set to stabilise. And Alan Greenspan highlighted the US government's willingness to curb spending and rising household savings as factors which may help to r...,the dollar has hit its highest level against the euro in almost three months after the federal reserve head said the us trade deficit is set to stabilise. and alan greenspan highlighted the us government's willingness to curb spending and rising household savings as factors which may help to r...,the dollar has hit its highest level against the euro in almost three months after the federal reserve head said the us trade deficit is set to stabilise and alan greenspan highlighted the us government s willingness to curb spending and rising household savings as factors which may help to r...,the dollar have hit it highest level against the euro in almost three month after the federal reserve head say the u trade deficit be set to stabilise and alan greenspan highlight the u government s willingness to curb spend and rise household save a factor which may help to reduce it in late tr...,dollar hit highest level euro almost three month federal reserve head say u trade deficit set stabilise alan greenspan highlight u government willingness curb spend rise household save factor may help reduce late trade new york dollar reach euro thursday market concern deficit hit greenback rece...
2,business,The owners of embattled Russian oil giant Yukos are to ask the buyer of its former production unit to pay back a $900m (£479m) loan. State-owned Rosneft bought the Yugansk unit for $9.3bn in a sale forced by Russia to part settle a $27.5bn tax claim against Yukos. Yukos' owner Menatep Group sa...,the owners of embattled russian oil giant yukos are to ask the buyer of its former production unit to pay back a $900m (£479m) loan. state-owned rosneft bought the yugansk unit for $9.3bn in a sale forced by russia to part settle a $27.5bn tax claim against yukos. yukos' owner menatep group sa...,the owners of embattled russian oil giant yukos are to ask the buyer of its former production unit to pay back a m m loan state owned rosneft bought the yugansk unit for bn in a sale forced by russia to part settle a bn tax claim against yukos yukos owner menatep group says it...,the owner of embattle russian oil giant yukos be to ask the buyer of it former production unit to pay back a m m loan state own rosneft buy the yugansk unit for bn in a sale force by russia to part settle a bn tax claim against yukos yukos owner menatep group say it will ask rosneft to repay a l...,owner embattle russian oil giant yukos ask buyer former production unit pay back loan state rosneft buy yugansk unit bn sale force russia part settle bn tax claim yukos yukos owner menatep group say ask rosneft repay loan yugansk secure asset rosneft already face similar repayment demand foreign...
3,business,"British Airways has blamed high fuel prices for a 40% drop in profits. Reporting its results for the three months to 31 December 2004, the airline made a pre-tax profit of £75m ($141m) compared with £125m a year earlier. Rod Eddington, BA's chief executive, said the results were ""respectable"" ...","british airways has blamed high fuel prices for a 40% drop in profits. reporting its results for the three months to 31 december 2004, the airline made a pre-tax profit of £75m ($141m) compared with £125m a year earlier. rod eddington, ba's chief executive, said the results were ""respectable"" ...",british airways has blamed high fuel prices for a drop in profits reporting its results for the three months to december the airline made a pre tax profit of m m compared with m a year earlier rod eddington ba s chief executive said the results were respectable in a third...,british airway have blame high fuel price for a drop in profit report it result for the three month to december the airline make a pre tax profit of m m compare with m a year earlier rod eddington ba s chief executive say the result be respectable in a third quarter when fuel cost rise by m or b...,british airway blame high fuel price drop profit report result three month december airline make pre tax profit compare year earlier rod eddington ba chief executive say result respectable third quarter fuel cost rise ba profit still better market expectation expect rise full year revenue help o...
4,business,"Shares in UK drinks and food firm Allied Domecq have risen on speculation that it could be the target of a takeover by France's Pernod Ricard. Reports in the Wall Street Journal and the Financial Times suggested that the French spirits firm is considering a bid, but has yet to contact its targ...","shares in uk drinks and food firm allied domecq have risen on speculation that it could be the target of a takeover by france's pernod ricard. reports in the wall street journal and the financial times suggested that the french spirits firm is considering a bid, but has yet to contact its targ...",shares in uk drinks and food firm allied domecq have risen on speculation that it could be the target of a takeover by france s pernod ricard reports in the wall street journal and the financial times suggested that the french spirits firm is considering a bid but has yet to contact its targ...,share in uk drink and food firm ally domecq have rise on speculation that it could be the target of a takeover by france s pernod ricard report in the wall street journal and the financial time suggest that the french spirit firm be consider a bid but have yet to contact it target ally domecq sh...,share uk drink food firm ally domecq rise speculation could target takeover france pernod ricard report wall street journal financial time suggest french spirit firm consider bid yet contact target ally domecq share london rise gmt pernod share paris slip pernod say seek acquisition refuse comme...
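The filtering logic can be tried without downloading any NLTK data. In the sketch below, `stop_words` is a small hand-picked subset that stands in for `stopwords.words('english')`, and `drop_stopwords` is a hypothetical name. Both are assumptions made for illustration; only the membership test mirrors the notebook.

```python
# Hypothetical subset standing in for the full NLTK English stopword list.
stop_words = {"at", "to", "for", "the", "from", "m"}

def drop_stopwords(text):
    # Keep only tokens that are not in the stopword set.
    return ' '.join(tok for tok in text.split() if tok not in stop_words)

sample = "jump to bn m for the three month to december from m year earlier"
print(drop_stopwords(sample))  # jump bn three month december year earlier
```

The leftover `m` tokens produced by stripping digits in step 2 disappear here because `m` is on the stopword list, while `bn` is not and stays in the text.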


In [9]:
# Remove rare words
from collections import Counter

# Count how often each word occurs across the whole corpus
word_frequencies = Counter(' '.join(df['without_stopwords_step4']).split())

# Words that occur this many times or fewer are treated as rare
rare_word_threshold = 6

# Collect the rare words in a set so that membership tests are fast
rare_words = {word for word, freq in word_frequencies.items() if freq <= rare_word_threshold}

# Remove rare words from the text
df['final_text'] = df['without_stopwords_step4'].apply(lambda text: ' '.join([word for word in text.split() if word not in rare_words]))
df.head()


Unnamed: 0,category,content,lowercase_step1,removed_punctuation_step2,lemmatized_text_step3,without_stopwords_step4,final_text
0,business,"Quarterly profits at US media giant TimeWarner jumped 76% to $1.13bn (£600m) for the three months to December, from $639m year-earlier. The firm, which is now one of the biggest investors in Google, benefited from sales of high-speed internet connections and higher advert sales. TimeWarner sai...","quarterly profits at us media giant timewarner jumped 76% to $1.13bn (£600m) for the three months to december, from $639m year-earlier. the firm, which is now one of the biggest investors in google, benefited from sales of high-speed internet connections and higher advert sales. timewarner sai...",quarterly profits at us media giant timewarner jumped to bn m for the three months to december from m year earlier the firm which is now one of the biggest investors in google benefited from sales of high speed internet connections and higher advert sales timewarner said four...,quarterly profit at u medium giant timewarner jump to bn m for the three month to december from m year earlier the firm which be now one of the biggest investor in google benefit from sale of high speed internet connection and higher advert sale timewarner say fourth quarter sale rise to bn from...,quarterly profit u medium giant timewarner jump bn three month december year earlier firm one biggest investor google benefit sale high speed internet connection higher advert sale timewarner say fourth quarter sale rise bn bn profit buoy one gain offset profit dip warner bros le user aol time w...,quarterly profit u medium giant timewarner jump bn three month december year earlier firm one biggest investor google benefit sale high speed internet connection higher advert sale timewarner say fourth quarter sale rise bn bn profit buoy one gain offset profit dip warner bros le user aol time w...
1,business,The dollar has hit its highest level against the euro in almost three months after the Federal Reserve head said the US trade deficit is set to stabilise. And Alan Greenspan highlighted the US government's willingness to curb spending and rising household savings as factors which may help to r...,the dollar has hit its highest level against the euro in almost three months after the federal reserve head said the us trade deficit is set to stabilise. and alan greenspan highlighted the us government's willingness to curb spending and rising household savings as factors which may help to r...,the dollar has hit its highest level against the euro in almost three months after the federal reserve head said the us trade deficit is set to stabilise and alan greenspan highlighted the us government s willingness to curb spending and rising household savings as factors which may help to r...,the dollar have hit it highest level against the euro in almost three month after the federal reserve head say the u trade deficit be set to stabilise and alan greenspan highlight the u government s willingness to curb spend and rise household save a factor which may help to reduce it in late tr...,dollar hit highest level euro almost three month federal reserve head say u trade deficit set stabilise alan greenspan highlight u government willingness curb spend rise household save factor may help reduce late trade new york dollar reach euro thursday market concern deficit hit greenback rece...,dollar hit highest level euro almost three month federal reserve head say u trade deficit set stabilise alan greenspan highlight u government willingness curb spend rise household save factor may help reduce late trade new york dollar reach euro thursday market concern deficit hit recent month f...
2,business,The owners of embattled Russian oil giant Yukos are to ask the buyer of its former production unit to pay back a $900m (£479m) loan. State-owned Rosneft bought the Yugansk unit for $9.3bn in a sale forced by Russia to part settle a $27.5bn tax claim against Yukos. Yukos' owner Menatep Group sa...,the owners of embattled russian oil giant yukos are to ask the buyer of its former production unit to pay back a $900m (£479m) loan. state-owned rosneft bought the yugansk unit for $9.3bn in a sale forced by russia to part settle a $27.5bn tax claim against yukos. yukos' owner menatep group sa...,the owners of embattled russian oil giant yukos are to ask the buyer of its former production unit to pay back a m m loan state owned rosneft bought the yugansk unit for bn in a sale forced by russia to part settle a bn tax claim against yukos yukos owner menatep group says it...,the owner of embattle russian oil giant yukos be to ask the buyer of it former production unit to pay back a m m loan state own rosneft buy the yugansk unit for bn in a sale force by russia to part settle a bn tax claim against yukos yukos owner menatep group say it will ask rosneft to repay a l...,owner embattle russian oil giant yukos ask buyer former production unit pay back loan state rosneft buy yugansk unit bn sale force russia part settle bn tax claim yukos yukos owner menatep group say ask rosneft repay loan yugansk secure asset rosneft already face similar repayment demand foreign...,owner russian oil giant yukos ask buyer former production unit pay back loan state rosneft buy yugansk unit bn sale force russia part settle bn tax claim yukos yukos owner menatep group say ask rosneft repay loan yugansk secure asset rosneft already face similar repayment demand foreign bank leg...
3,business,"British Airways has blamed high fuel prices for a 40% drop in profits. Reporting its results for the three months to 31 December 2004, the airline made a pre-tax profit of £75m ($141m) compared with £125m a year earlier. Rod Eddington, BA's chief executive, said the results were ""respectable"" ...","british airways has blamed high fuel prices for a 40% drop in profits. reporting its results for the three months to 31 december 2004, the airline made a pre-tax profit of £75m ($141m) compared with £125m a year earlier. rod eddington, ba's chief executive, said the results were ""respectable"" ...",british airways has blamed high fuel prices for a drop in profits reporting its results for the three months to december the airline made a pre tax profit of m m compared with m a year earlier rod eddington ba s chief executive said the results were respectable in a third...,british airway have blame high fuel price for a drop in profit report it result for the three month to december the airline make a pre tax profit of m m compare with m a year earlier rod eddington ba s chief executive say the result be respectable in a third quarter when fuel cost rise by m or b...,british airway blame high fuel price drop profit report result three month december airline make pre tax profit compare year earlier rod eddington ba chief executive say result respectable third quarter fuel cost rise ba profit still better market expectation expect rise full year revenue help o...,british airway blame high fuel price drop profit report result three month december airline make pre tax profit compare year earlier rod ba chief executive say result respectable third quarter fuel cost rise ba profit still better market expectation expect rise full year revenue help offset incr...
4,business,"Shares in UK drinks and food firm Allied Domecq have risen on speculation that it could be the target of a takeover by France's Pernod Ricard. Reports in the Wall Street Journal and the Financial Times suggested that the French spirits firm is considering a bid, but has yet to contact its targ...","shares in uk drinks and food firm allied domecq have risen on speculation that it could be the target of a takeover by france's pernod ricard. reports in the wall street journal and the financial times suggested that the french spirits firm is considering a bid, but has yet to contact its targ...",shares in uk drinks and food firm allied domecq have risen on speculation that it could be the target of a takeover by france s pernod ricard reports in the wall street journal and the financial times suggested that the french spirits firm is considering a bid but has yet to contact its targ...,share in uk drink and food firm ally domecq have rise on speculation that it could be the target of a takeover by france s pernod ricard report in the wall street journal and the financial time suggest that the french spirit firm be consider a bid but have yet to contact it target ally domecq sh...,share uk drink food firm ally domecq rise speculation could target takeover france pernod ricard report wall street journal financial time suggest french spirit firm consider bid yet contact target ally domecq share london rise gmt pernod share paris slip pernod say seek acquisition refuse comme...,share uk drink food firm ally rise speculation could target takeover france pernod report wall street journal financial time suggest french spirit firm consider bid yet contact target ally share london rise gmt pernod share paris slip pernod say seek acquisition refuse comment specific pernod la...
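The rare-word step is easier to follow on a toy corpus. This is a minimal sketch under stated assumptions: the three documents and the threshold of 1 are made up for illustration, while the counting and the `<=` comparison follow the cell above.

```python
from collections import Counter

# Hypothetical three-document corpus.
docs = ["profit rise profit", "profit fall greenback", "rise fall rise"]
threshold = 1  # words seen this many times or fewer count as rare

# Corpus-wide frequencies, computed over all documents joined together.
freq = Counter(' '.join(docs).split())
rare = {word for word, count in freq.items() if count <= threshold}

# Drop the rare words from every document.
filtered = [' '.join(w for w in doc.split() if w not in rare) for doc in docs]
print(rare)      # {'greenback'}
print(filtered)  # ['profit rise profit', 'profit fall', 'rise fall rise']
```

Because frequencies are counted over the whole corpus, a word is dropped only if it is rare everywhere, not merely rare within one article. Names such as `eddington` and `domecq` vanish from `final_text` for that reason.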


In [10]:
#Save the normalized text to a file
df[['category', 'final_text']].to_csv("normalized_text.csv", index=False)