# Methodology of text mining for the analysis of political texts. Content Analysis through machine learning

Alternative title: How to identify the existing patterns in political texts that we are analysing.

A typical task in political science and migration studies is the comparative analysis of texts drawn from political speeches or interviews. This lets us determine the similarities or differences between texts, and later gain insights into their structure.

We can approach this task through either qualitative or statistical methods. Among the statistical methods for text analysis is the subclass of machine learning methods; machine learning is itself a subfield of artificial intelligence.

We will look at the basis of this methodology and at one of its applications: the comparison of political speeches.

![title](image.png)

# Managing the necessary imports

This is a technical requirement and is of no interest for political science.

In [3]:
import pandas as pd
pd.options.display.max_rows = 999
pd.options.display.max_columns = 999
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB as CLF

# First corpus of texts

This is the first corpus of texts, whose characteristics we are interested in studying.

In [4]:
corpus = ['the cat is on the table',
     'the dog is in the room',
     'the table is in the room',
     'the cat is not a dog',
     'the cat and the dog are in the room',
     'the room is not a table']

print('These are the texts present in our corpus:\n')
for text in corpus:
    print(text)

These are the texts present in our corpus:

the cat is on the table
the dog is in the room
the table is in the room
the cat is not a dog
the cat and the dog are in the room
the room is not a table


# Labels

Here we see which labels are associated with the texts in our corpus. The texts are divided into two labels, according to whether or not they talk about animals.

In [5]:
targets = ['animals',
           'animals',
           'not_animals',
           'animals',
           'animals',
           'not_animals']
pd.DataFrame({'texts':corpus,'labels':targets})

| | texts | labels |
|---|---|---|
| 0 | the cat is on the table | animals |
| 1 | the dog is in the room | animals |
| 2 | the table is in the room | not_animals |
| 3 | the cat is not a dog | animals |
| 4 | the cat and the dog are in the room | animals |
| 5 | the room is not a table | not_animals |


# Extracting the Bag-of-Words matrix

The Bag-of-Words matrix is composed of the absolute frequencies of occurrence of the words which are part of the text collection. Its rows correspond to each individual document or text, and its columns correspond to each individual word.
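To make the structure of the matrix concrete, the following is a minimal sketch that builds it by hand with the standard library only. It is not how `CountVectorizer` works internally; splitting on whitespace is an assumption that is adequate for this toy corpus, and the names `tokenized`, `vocabulary` and `bow_matrix` are illustrative.

```python
from collections import Counter

# Same toy corpus as in the notebook.
corpus = ['the cat is on the table',
          'the dog is in the room',
          'the table is in the room',
          'the cat is not a dog',
          'the cat and the dog are in the room',
          'the room is not a table']

# Assumption: whitespace splitting is enough for this lowercase, unpunctuated corpus.
tokenized = [text.split() for text in corpus]

# The vocabulary is the sorted set of distinct words; it defines the columns.
vocabulary = sorted({word for tokens in tokenized for word in tokens})

# Each row holds the absolute frequency of every vocabulary word in one text.
bow_matrix = []
for tokens in tokenized:
    counts = Counter(tokens)  # a Counter returns 0 for words that are absent
    bow_matrix.append([counts[word] for word in vocabulary])

print(vocabulary)
for row in bow_matrix:
    print(row)
```

Each printed row corresponds to one text and each position in a row to one word of the vocabulary, which is the layout described above.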

In [6]:
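cv = CountVectorizer(token_pattern=r'\w+')  # raw string; this pattern keeps one-letter words such as "a"
matrix = cv.fit_transform(corpus)
features = cv.get_feature_names_out()  # get_feature_names() was removed in newer scikit-learn releases
matrix = matrix.toarray()  # plain NumPy array instead of np.matrix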
df = pd.DataFrame(matrix)
df.columns = features
df['targets']=targets
print(df.shape)
df

(6, 13)


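| | a | and | are | cat | dog | in | is | not | on | room | table | the | targets |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 2 | animals |
| 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 2 | animals |
| 2 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 2 | not_animals |
| 3 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | animals |
| 4 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 3 | animals |
| 5 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | not_animals |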


# Classification

We then use a classification algorithm to learn the rule according to which texts are associated with their labels.

These rules allow us to identify the abstract predictors of the class affiliation of a text.

This particular classifier is a Naive Bayes classifier, which solves the classification task by first applying Bayes' Theorem to each feature of the input with respect to each label, as follows:

\begin{equation}
   P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}
\end{equation}

The classifier then determines the class affiliation of a text vector by choosing the label with the highest resulting probability, where the per-word likelihoods are smoothed versions of the maximum likelihood estimates.
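The decision rule can be sketched from scratch as follows. This is an illustrative re-implementation under stated assumptions, not scikit-learn's code: the smoothing constant `ALPHA = 1.0` (Laplace smoothing) and the empirical label priors are assumed to mirror the library's defaults, tokens are obtained by whitespace splitting, and the helper names `train` and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict

corpus = ['the cat is on the table',
          'the dog is in the room',
          'the table is in the room',
          'the cat is not a dog',
          'the cat and the dog are in the room',
          'the room is not a table']
targets = ['animals', 'animals', 'not_animals',
           'animals', 'animals', 'not_animals']

ALPHA = 1.0  # assumed Laplace smoothing constant


def train(texts, labels, alpha=ALPHA):
    """Return log priors and smoothed log likelihoods for each label."""
    vocabulary = sorted({w for t in texts for w in t.split()})
    word_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.split())
    label_counts = Counter(labels)
    # Prior: share of training texts carrying each label.
    log_prior = {label: math.log(n / len(labels))
                 for label, n in label_counts.items()}
    log_likelihood = {}
    for label in label_counts:
        # Smoothed maximum likelihood: (count + alpha) / (total + alpha * |V|).
        total = sum(word_counts[label].values()) + alpha * len(vocabulary)
        log_likelihood[label] = {
            w: math.log((word_counts[label][w] + alpha) / total)
            for w in vocabulary}
    return log_prior, log_likelihood


def predict(text, log_prior, log_likelihood):
    """Pick the label with the highest log posterior; unknown words are ignored."""
    scores = {}
    for label in log_prior:
        known = [w for w in text.split() if w in log_likelihood[label]]
        scores[label] = log_prior[label] + sum(
            log_likelihood[label][w] for w in known)
    return max(scores, key=scores.get)


log_prior, log_likelihood = train(corpus, targets)
print(log_likelihood['not_animals']['the'])
print(predict('the dog is on the table', log_prior, log_likelihood))
```

Working in log space turns the product of per-word probabilities into a sum, which avoids numerical underflow on longer texts. Smoothing guarantees that a word never seen with a label, such as "and" under `not_animals`, still receives a small non-zero probability instead of ruling that label out entirely.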

In [18]:
# A priori probability of the word "room".
# Note: .count() counts non-null cells, not occurrences of the word.
P_A = df['room'].count()/df.count().sum()

P_B = (df['targets'] == 'animals').sum()/df.shape[0] # A priori probability of the label "animals"

P_B_A = df[(df['room']!=0) & (df['targets']=='animals')].shape[0] / df[df['room']!=0].shape[0] 
# Probability of the label "animals", given that the word "room" is present

(P_B_A * P_A) / P_B # Bayes' Theorem

0.0625

In [7]:
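bayes = pd.DataFrame(columns=df.columns)
bayes['targets'] = df['targets'].unique()
for i, label in enumerate(bayes['targets']):
    for word in df.columns[:-1]:  # every column except 'targets'
        # Prior of the word (note: .count() counts non-null cells, not occurrences)
        P_A = df[word].count()/df.count().sum()
        # Prior of the label
        P_B = (df['targets'] == label).sum()/df.shape[0]
        # Probability of the label, given that the word is present
        P_B_A = df[(df[word]!=0) & (df['targets']==label)].shape[0] / df[df[word]!=0].shape[0]
        # Bayes' Theorem
        bayes.loc[i,word] = (P_B_A * P_A) / P_B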

bayes

| | a | and | are | cat | dog | in | is | not | on | room | table | the | targets |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0576923 | 0.115385 | 0.115385 | 0.115385 | 0.115385 | 0.0769231 | 0.0692308 | 0.0576923 | 0.115385 | 0.0576923 | 0.0384615 | 0.0769231 | animals |
| 1 | 0.115385 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0769231 | 0.0923077 | 0.115385 | 0.0 | 0.115385 | 0.153846 | 0.0769231 | not_animals |
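As a sanity check, one entry of this table can be reproduced by hand. The sketch below recomputes the `room` value for the label `animals` with exact fractions, using only numbers read off the Bag-of-Words table (6 texts, 13 columns including `targets`); the variable names are illustrative.

```python
from fractions import Fraction

# Values read off the Bag-of-Words table: 6 texts, 13 columns including 'targets'.
n_texts = 6
n_columns = 13

# df[word].count() / df.count().sum(): non-null cells in one column over all cells.
p_word = Fraction(n_texts, n_texts * n_columns)

# Four of the six texts are labelled 'animals'.
p_label = Fraction(4, 6)

# 'room' occurs in four texts (rows 1, 2, 4, 5); two of them (rows 1, 4) are 'animals'.
p_label_given_word = Fraction(2, 4)

# Bayes' Theorem, as applied in the loop that fills the table.
p_word_given_label = p_label_given_word * p_word / p_label
print(p_word_given_label, float(p_word_given_label))
```

The result is 3/52, which rounds to the 0.0576923 shown in the `room` column of the `animals` row. Because `.count()` counts non-null cells rather than word occurrences, the word prior here is 1/13 for every column.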


In [8]:
clf = CLF()
X = df[df.columns[:-1]]
y = df['targets']
clf.fit(X,y)

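# Log probability of each word given the second class ('not_animals').
# clf.coef_ held the same values here, but was removed from MultinomialNB in newer scikit-learn releases.
coefficients = clf.feature_log_prob_[1].reshape(-1,1)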
predictors = pd.DataFrame(coefficients)
predictors.index = features
predictors.columns = ['keywords']
print()
print(predictors.sort_values(by='keywords'))


       keywords
and   -3.178054
are   -3.178054
cat   -3.178054
dog   -3.178054
on    -3.178054
a     -2.484907
in    -2.484907
not   -2.484907
is    -2.079442
room  -2.079442
table -2.079442
the   -1.791759


# Stopwords removal

To increase the explanatory power of the algorithm, we can remove the words that we expect, a priori, to be the most frequent in any text in a given language.

These words normally include articles, pronouns, and the most common verbs.
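The effect of this filtering can be sketched by hand. The stop-word list below is hypothetical and hand-picked to cover only this toy corpus; real lists, such as the English one shipped with scikit-learn, are much longer.

```python
corpus = ['the cat is on the table',
          'the dog is in the room',
          'the table is in the room',
          'the cat is not a dog',
          'the cat and the dog are in the room',
          'the room is not a table']

# Hypothetical stop-word list: articles, prepositions and common verbs
# that appear in this corpus.
STOP_WORDS = {'a', 'and', 'are', 'in', 'is', 'not', 'on', 'the'}

# Keep only the words that are not in the stop-word list.
filtered = [[w for w in text.split() if w not in STOP_WORDS]
            for text in corpus]

# The remaining vocabulary is what the classifier will see as features.
vocabulary = sorted({w for tokens in filtered for w in tokens})

print(vocabulary)
for tokens in filtered:
    print(tokens)
```

Only the content words survive, so the features that remain are the ones that can plausibly distinguish one label from the other.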

In [9]:
cv = CountVectorizer(stop_words='english')
matrix = cv.fit_transform(corpus)
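features = cv.get_feature_names_out()  # get_feature_names() was removed in newer scikit-learn releases
matrix = matrix.toarray()  # plain NumPy array instead of np.matrix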
df = pd.DataFrame(matrix)
df.columns = features
df['targets']=targets
print(df.shape)
df

(6, 5)


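| | cat | dog | room | table | targets |
|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 1 | animals |
| 1 | 0 | 1 | 1 | 0 | animals |
| 2 | 0 | 0 | 1 | 1 | not_animals |
| 3 | 1 | 1 | 0 | 0 | animals |
| 4 | 1 | 1 | 1 | 0 | animals |
| 5 | 0 | 0 | 1 | 1 | not_animals |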


In [10]:
clf = CLF()
X = df[df.columns[:-1]]
y = df['targets']
clf.fit(X,y)

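# Log probability of each word given the second class ('not_animals').
# clf.coef_ held the same values here, but was removed from MultinomialNB in newer scikit-learn releases.
coefficients = clf.feature_log_prob_[1].reshape(-1,1)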
predictors = pd.DataFrame(coefficients)
predictors.index = features
predictors.columns = ['keywords']
print()
print(predictors.sort_values(by='keywords'))


       keywords
cat   -2.079442
dog   -2.079442
room  -0.980829
table -0.980829


In [8]:
import pickle
with open('dataset.pickle','rb') as handle:
    df = pickle.load(handle)
print(df.shape)
df.head()

(1945, 17)


Unnamed: 0,full_name,country,parliament_group,id,national_party,first_name,last_name,group_id,personal_url,date,referenceList,title,titleUrl,html_text,language_code,text_raw,language
0,Lars ADAKTUSSON,Sweden,Group of the European People's Party (Christia...,124990,Kristdemokraterna,Lars,Adaktusson,Chr_Dem,http://www.europarl.europa.eu/meps/en/124990/L...,12-06-2018,[P8_CRE-REV(2018)06-12(16)],EU-NATO relations (debate),http://www.europarl.europa.eu/sides/getDoc.do?...,"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 T...",en - English,"\n – Madam President, with developing instab...",english
1,Lars ADAKTUSSON,Sweden,Group of the European People's Party (Christia...,124990,Kristdemokraterna,Lars,Adaktusson,Chr_Dem,http://www.europarl.europa.eu/meps/en/124990/L...,31-05-2018,[P8_CRE-REV(2018)05-31(5.1)],Situation of imprisoned EU-Iranian dual nation...,http://www.europarl.europa.eu/sides/getDoc.do?...,"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 T...",sv - svenska,För att inte störa relationerna till regimen h...,swedish
2,Lars ADAKTUSSON,Sweden,Group of the European People's Party (Christia...,124990,Kristdemokraterna,Lars,Adaktusson,Another_Class,http://www.europarl.europa.eu/meps/en/124990/L...,17-04-2018,[P8_CRE-REV(2018)04-17(16)],Situation in Russia (debate),http://www.europarl.europa.eu/sides/getDoc.do?...,"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 T...",en - English,There is no reason for failing to act with res...,english
3,Lars ADAKTUSSON,Sweden,Group of the European People's Party (Christia...,124990,Kristdemokraterna,Lars,Adaktusson,Chr_Dem,http://www.europarl.europa.eu/meps/en/124990/L...,01-03-2018,[P8_CRE-REV(2018)03-01(4)],Cutting the sources of income for Jihadists - ...,http://www.europarl.europa.eu/sides/getDoc.do?...,"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 T...",en - English,"Another lesson is that, after the dismantling ...",english
4,Lars ADAKTUSSON,Sweden,Group of the European People's Party (Christia...,124990,Kristdemokraterna,Lars,Adaktusson,One_Class,http://www.europarl.europa.eu/meps/en/124990/L...,17-01-2018,[P8_CRE-REV(2018)01-17(16)],Russia - the influence of propaganda on EU cou...,http://www.europarl.europa.eu/sides/getDoc.do?...,"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 T...",en - English,"When it comes to confronting this, the High Re...",english


In [9]:
df.loc[10]['text_raw']

'On these most fundamental human rights, the Russian Federation is falling short. As representatives of the EU, we must condemn the unjust sentencing and demand the immediate release of Crimean Tatar leaders, Akhtem Chiygoz and Ilmi Umerov. The charges against journalist Mykola Semena should be dropped. Moreover, the EU must condemn the persecution and unfair policies against Crimean Tatars and continue to defend the European security order as we have known it since World War II. '

In [10]:
df['full_name'].unique()

array(['Lars ADAKTUSSON', 'Asim ADEMOV', 'Isabella ADINOLFI',
       'Marco AFFRONTE', 'Laura AGEA', 'John Stuart AGNEW',
       'Clara Eugenia AGUILERA GARCÍA', 'Daniela AIUTO', 'Tim AKER',
       'Marina ALBIOL GUZMÁN'], dtype=object)

In [11]:
df['country'].unique()

array(['Sweden', 'Bulgaria', 'Italy', 'United Kingdom', 'Spain'],
      dtype=object)

In [12]:
df[df['country']=='United Kingdom']['full_name'].unique()

array(['John Stuart AGNEW', 'Tim AKER'], dtype=object)

In [13]:
UK = df[df['country']=='United Kingdom']
cv = CountVectorizer(stop_words='english')
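matrix = cv.fit_transform(UK['text_raw']).toarray()  # plain NumPy array instead of np.matrix
UK_DF = pd.DataFrame(matrix)
features = cv.get_feature_names_out()  # get_feature_names() was removed in newer scikit-learn releases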
UK_DF.columns = features
print(UK_DF.shape)
UK_DF.head()

(322, 2793)


*(Output truncated: the first five rows, indexed 0 to 4, of the 322 × 2793 document-term matrix. The columns are the vocabulary terms in sorted order, from `000` to `état`, and almost every entry is 0.)*


In [16]:
df[df['country']=='United Kingdom']

Unnamed: 0,full_name,country,parliament_group,id,national_party,first_name,last_name,group_id,personal_url,date,referenceList,title,titleUrl,html_text,language_code,text_raw,language
844,John Stuart AGNEW,United Kingdom,Europe of Freedom and Direct Democracy Group,96897,United Kingdom Independence Party,John Stuart,Agnew,Fre_Dir,http://www.europarl.europa.eu/meps/en/96897/JO...,12-09-2018,[P8_CRE-PROV(2018)09-12(14)],A European Strategy for Plastics in a circular...,http://www.europarl.europa.eu/sides/getDoc.do?...,"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 T...",en - English,\n\n\n – The argument is that it we should al...,english
845,John Stuart AGNEW,United Kingdom,Europe of Freedom and Direct Democracy Group,96897,United Kingdom Independence Party,John Stuart,Agnew,Fre_Dir,http://www.europarl.europa.eu/meps/en/96897/JO...,12-09-2018,[P8_CRE-PROV(2018)09-12(16)],A European One Health Action Plan against Anti...,http://www.europarl.europa.eu/sides/getDoc.do?...,"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 T...",en - English,"\n – Madam President, agriculture has a part...",english
846,John Stuart AGNEW,United Kingdom,Europe of Freedom and Direct Democracy Group,96897,United Kingdom Independence Party,John Stuart,Agnew,One_Class,http://www.europarl.europa.eu/meps/en/96897/JO...,11-06-2018,[P8_CRE-REV(2018)06-11(21)],One-minute speeches on matters of political im...,http://www.europarl.europa.eu/sides/getDoc.do?...,"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 T...",en - English,It is the Commissioners’ responsibility to ens...,english
847,John Stuart AGNEW,United Kingdom,Europe of Freedom and Direct Democracy Group,96897,United Kingdom Independence Party,John Stuart,Agnew,Fre_Dir,http://www.europarl.europa.eu/meps/en/96897/JO...,28-05-2018,[P8_CRE-REV(2018)05-28(26)],Implementation of CAP young farmers’ tools in ...,http://www.europarl.europa.eu/sides/getDoc.do?...,"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 T...",en - English,The initiative by the Commission to give margi...,english
848,John Stuart AGNEW,United Kingdom,Europe of Freedom and Direct Democracy Group,96897,United Kingdom Independence Party,John Stuart,Agnew,Fre_Dir,http://www.europarl.europa.eu/meps/en/96897/JO...,02-05-2018,[P8_CRE-REV(2018)05-02(30)],Addressing farm safety in the EU (debate),http://www.europarl.europa.eu/sides/getDoc.do?...,"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 T...",en - English,I also believe that EU legislation on chrysoti...,english
849,John Stuart AGNEW,United Kingdom,Europe of Freedom and Direct Democracy Group,96897,United Kingdom Independence Party,John Stuart,Agnew,One_Class,http://www.europarl.europa.eu/meps/en/96897/JO...,18-04-2018,[P8_CRE-REV(2018)04-18(22)],Organic production and labelling of organic pr...,http://www.europarl.europa.eu/sides/getDoc.do?...,"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 T...",en - English,"However, if you talk to organic organisations,...",english
850,John Stuart AGNEW,United Kingdom,Europe of Freedom and Direct Democracy Group,96897,United Kingdom Independence Party,John Stuart,Agnew,Fre_Dir,http://www.europarl.europa.eu/meps/en/96897/JO...,16-04-2018,[P8_CRE-REV(2018)04-16(20)],Inclusion of greenhouse gas emissions and remo...,http://www.europarl.europa.eu/sides/getDoc.do?...,"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 T...",en - English,"Finally, I would like to remind all the MEPs i...",english
851,John Stuart AGNEW,United Kingdom,Europe of Freedom and Direct Democracy Group,96897,United Kingdom Independence Party,John Stuart,Agnew,Fre_Dir,http://www.europarl.europa.eu/meps/en/96897/JO...,28-02-2018,[P8_CRE-REV(2018)02-28(26)],Prospects and challenges for the EU apiculture...,http://www.europarl.europa.eu/sides/getDoc.do?...,"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 T...",en - English,Colony collapse disorder and verroa mite are t...,english
852,John Stuart AGNEW,United Kingdom,Europe of Freedom and Direct Democracy Group,96897,United Kingdom Independence Party,John Stuart,Agnew,Fre_Dir,http://www.europarl.europa.eu/meps/en/96897/JO...,06-02-2018,[P8_CRE-REV(2018)02-06(14)],Situation in Zimbabwe (debate),http://www.europarl.europa.eu/sides/getDoc.do?...,"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 T...",en - English,Ninety-five per cent of the potential workforc...,english
853,John Stuart AGNEW,United Kingdom,Europe of Freedom and Direct Democracy Group,96897,United Kingdom Independence Party,John Stuart,Agnew,Fre_Dir,http://www.europarl.europa.eu/meps/en/96897/JO...,05-02-2018,[P8_CRE-REV(2018)02-05(20)],Geo-blocking and other forms of discrimination...,http://www.europarl.europa.eu/sides/getDoc.do?...,"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 T...",en - English,If it worked for businesses to implement the a...,english


In [14]:
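X = UK_DF.to_numpy()  # same counts as matrix, as a plain NumPy array
y = UK['last_name']  # the speaker's last name is the label to predict

clf = CLF()

clf.fit(X,y)

# Log probability of each word given the second class in clf.classes_.
# clf.coef_ held the same values here, but was removed from MultinomialNB in newer scikit-learn releases.
coefficients = clf.feature_log_prob_[1].reshape(-1,1)
predictors = pd.DataFrame(coefficients)
predictors.index = features
predictors.columns = ['keywords']
print(clf.score(X,y))  # accuracy on the same texts used for training
print(predictors.sort_values(by='keywords'))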

0.8322981366459627
                   keywords
meat              -5.472271
sins              -5.472271
meaning           -5.472271
fraught           -5.472271
perceived         -5.472271
organic           -5.472271
censure           -5.472271
cent              -5.472271
meddling          -5.472271
centres           -5.472271
cents             -5.472271
significant       -5.472271
medicine          -5.472271
sight             -5.472271
meet              -5.472271
chains            -5.472271
demonstrate       -5.472271
mccarthy          -5.472271
sits              -5.472271
cattle            -5.472271
solar             -5.472271
carbon            -5.472271
soil              -5.472271
entry             -5.472271
dependent         -5.472271
demonstrating     -5.472271
principal         -5.472271
meeting           -5.472271
material          -5.472271
fruit             -5.472271
castro            -5.472271
catastrophic      -5.472271
catch             -5.472271
categorically     -5.472271
m