This example extracts question types from UK Parliament question-and-answer sessions,
reproducing the "Asking too much?" paper (http://www.cs.cornell.edu/~cristian/Asking_too_much.html).
Because clustering is non-deterministic, the order of the clusters and some cluster assignments will vary between runs.
This version uses precomputed motifs for speed.
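
Since cluster indices can come out in a different order on each run, comparing two runs by raw index can be misleading. The sketch below is one simple way to check whether two runs produced the same grouping: relabel the clusters in order of first appearance before comparing. The function name canonical_labels and the three assignment lists are hypothetical and exist only for illustration; they are not part of convokit.

```python
def canonical_labels(labels):
    """Relabel clusters in order of first appearance, so that two runs
    that differ only in cluster numbering produce identical lists."""
    mapping = {}
    relabeled = []
    for label in labels:
        if label not in mapping:
            mapping[label] = len(mapping)
        relabeled.append(mapping[label])
    return relabeled


# Hypothetical cluster assignments for the same six items from three runs
run_a = [2, 2, 0, 1, 0, 1]
run_b = [0, 0, 1, 2, 1, 2]  # same grouping as run_a, different indices
run_c = [0, 0, 1, 2, 1, 1]  # the last item really moved to another cluster

same_partition_ab = canonical_labels(run_a) == canonical_labels(run_b)
same_partition_ac = canonical_labels(run_a) == canonical_labels(run_c)
print(same_partition_ab, same_partition_ac)
```

Differences that remain after relabeling are genuine changes in assignment rather than a reshuffling of cluster numbers.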

In [1]:
import os
import pkg_resources
import numpy as np

from convokit import Corpus, QuestionTypology, download

Initializing QuestionTypology Class

In [2]:
num_clusters = 8

# Get precomputed motifs. data_dir contains the downloaded data. 
# motifs_dir is the specific path within data_dir that contains the precomputed motifs
data_dir = os.path.join(pkg_resources.resource_filename("convokit", ""), 'downloads', 'parliament')
motifs_dir = os.path.join(data_dir, 'parliament-motifs')

corpus = Corpus(filename=os.path.join(data_dir, 'parliament-corpus'))
questionTypology = QuestionTypology(corpus, data_dir, motifs_dir=motifs_dir, num_dims=25, 
  num_clusters=num_clusters, verbose=False)



questionTypology.types_to_data contains the data computed in the step above.
Its keys are the indices of the clusters (here 0-7). The values are dictionaries with the following keys:
"motifs": the motifs, as a list of tuples of the motif terms
"motif_dists": the distance of each motif from the centroid of its cluster
"fragments": the answer fragments, as a list of tuples of answer terms
"fragment_dists": the distance of each fragment from the centroid of its cluster
"questions": the IDs of the questions in this cluster. You can get the corresponding question text with the
get_question_text_from_pair_idx(pair_idx) method.
"question_dists": the distance of each question from the centroid of its cluster
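
The layout described above can be explored without loading the corpus. The following is a minimal sketch that uses a hand-made dictionary in place of questionTypology.types_to_data: every distance and question ID in it is invented, and it assumes that each "*_dists" list is parallel to its item list (the i-th distance belongs to the i-th item). The helper closest is hypothetical, not a convokit function.

```python
# Hypothetical stand-in for questionTypology.types_to_data.
# All distances and question IDs below are invented for illustration.
types_to_data = {
    0: {
        "motifs": [("why>*",), ("explain_*",), ("admit_*", "will>*")],
        "motif_dists": [0.12, 0.05, 0.31],
        "fragments": [("suggest_*",), ("wonder_*",)],
        "fragment_dists": [0.20, 0.08],
        "questions": ["pair_17", "pair_42"],
        "question_dists": [0.40, 0.22],
    },
}


def closest(items, dists, k):
    """Return the k items with the smallest distance to the cluster centroid.

    Assumes items and dists are parallel lists of equal length.
    """
    ranked = sorted(zip(dists, items), key=lambda pair: pair[0])
    return [item for _, item in ranked[:k]]


cluster = types_to_data[0]
top_motifs = closest(cluster["motifs"], cluster["motif_dists"], k=2)
top_questions = closest(cluster["questions"], cluster["question_dists"], k=1)
print(top_motifs)
print(top_questions)
```

With the real object, the IDs returned for "questions" could then be passed to get_question_text_from_pair_idx to read the question text.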

Display Outputs

In [3]:
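# Print the total number of motifs, questions and fragments, plus the motif count per cluster
questionTypology.display_totals()
# The label below counts the clusters 1-8; the cluster indices themselves run from 0 to 7
print('10 examples for type 1-8:')
for i in range(num_clusters):
    questionTypology.display_motifs_for_type(i, num_egs=10)
    questionTypology.display_answer_fragments_for_type(i, num_egs=10)
    questionTypology.display_questions_for_type(i, num_egs=10)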

example_question = "I thank the Minister for his response . He will be aware that the \
Northern Ireland Policing Board and the Chief Constable are concerned about a possible \
reduction in the police budget in the forthcoming financial year , and that there are \
increasing pressures on the budget as a result of policing the past , the ongoing \
inquiries , and the cost of the legal advice that the police need to secure in order \
to participate in them . However , does he agree that it is right that the Government \
provide adequate funding for the ordinary policing in the community that tackles all the \
matters that concern the people of Northern Ireland ? Does he accept that there should not \
be a reduction in the police budget , given the increasing costs of the inquiries that \
I have mentioned ? Will the Government do something to reduce the cost of the inquiries \
, and ensure that adequate policing is provided for all the victims of crime in \
Northern Ireland ?"

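# This assignment overwrites the longer example above; the short question is the one classified below
example_question = "What is the minister going to do about?"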

print('Given a new question, we will now find the appropriate cluster for it:')
print('Question: ', example_question)
print('Cluster: ', questionTypology.classify_question(example_question))

print('Figure 1A from the paper will now display')
questionTypology.display_question_type_log_odds_graph()

Total Motifs: 2255
Total Questions: 199861
Total Fragments: 2756
Number of Motifs in each Cluster:  [376, 240, 275, 198, 270, 234, 350, 312]
10 examples for type 1-8:
	10 sample question motifs for type 0 (376 total motifs):
		1. ('why>*',)
		2. ('explain_*',)
		3. ('explain_*', 'will>*')
		4. ('explain_*', 'explain_will')
		5. ('admit_*',)
		6. ('admit_*', 'will>*')
		7. ('why>*', 'why>does')
		8. ('where>*',)
		9. ('admit_*', 'admit_will')
		10. ('stop_*',)
	10 sample answer fragments for type 0 (586 total fragments) :
		1. suggest_*
		2. understand_does
		3. tell_will
		4. wonder_*
		5. is_extraordinary
		6. says_is
		7. cost_*
		8. talks_about
		9. talks_*
		10. hoped_have
	10 sample questions that were assigned type 0 (25380 total questions with this type) :
		1. I admit that I am not involved in religion of any kind , but will my hon Friend explain to me why on earth , given that the House voted for women priests , there should be any difficulty about bishops ? We established the

		6. Is the Secretary of State aware that one Peter Sherry , who stood as a Sinn Fein candidate in a local government by - election , recently used his manifesto to forecast which people would be murdered , and three of those murders have already been carried out ? In view of that misuse of the democratic procedures , will the right hon Gentleman view more seriously and urgently the suggestion of my hon Friend the Member for Upper Bann ( Mr. McCusker ) ?
		7. Is my hon Friend aware that the Dalgety group also owns the Spillers factory in Barrhead which has an outstanding record in production , industrial relations and new investment ? In his discussions with Dalgety , will he reinforce the Scottish Office commitment to the success of that factory and explore possibilities for expansion ?
		8. Is the Secretary of State aware that the very high price of petrol and diesel in Northern Ireland—the highest in the United Kingdom—is having a severe impact on the living standards of families an

		6. The Minister will be aware of the recent report from PricewaterhouseCoopers stating that 36,000 jobs will be lost in Northern Ireland as a result of the Government 's policies-20,000 in the public sector and a further 16,000 in the private sector . What estimate has he made of the cost to the taxpayer of those 36,000 people currently in work being made unemployed by the Government 's policies ?
		7. When the self - righteous barrage of critimism of the Soviet Union comes to an end , what positive proposals to carry forward the process of detente in Europe will be made by the United Kingdom . ?
		8. What provision is being made for culture by the new British embassy in Baghdad ?
		9. I , too , welcome the Chancellor 's announcement about the further progress made by the original 10 countries to meet the HIPC conditions . However , what personal representations have the British Government made , either directly to the countries or through the World Bank , to ensure that countries in

		1. Does my right hon and learned Friend agree that the sheer scale of the support shown by the Russian people for Boris Yeltsin and the programme of reforms gives a tremendous boost to international confidence in the long - term future of Russia ? Does he also agree that in the near term substantial aid will be necessary to help the difficult process of economic transformation ? Will he therefore comment on the quantity and quality of Britain 's aid to Russia ?
		2. Does the Secretary of state agree that when lives , no matter whose lives they are , are lost in Northern Ireland it is particularly distasteful that there are always those who will gloat over the loss of life and see death as some sort of victory ? Does he agree also that it is particularly distasteful when politicians do so , because that only feeds the bitternesses and the hatreds that ensure that death will occur in the future ? Does the Secretary of State further agree that the most powerful statement in the past 12 

		7. I welcome my right hon Friend to his place , and I would like to touch on Navy recruitment , if I may . Will he quash these rumours that we will not have enough trained sailors to man both our aircraft carriers when they are launched ?
		8. I thank my right hon Friend for that answer . Will he estimate how many people are living in bad housing conditions because of the Labour Party 's vindictive prejudice against the shorthold provisions and the private rented sector ?
		9. I thank the right hon Gentleman for that reply . It is an honour to be the last person ever to ask him a question . It is just a shame that we are not talking about bats , as we usually do . I know that the right hon Gentleman feels that some progress has been made on this issue , but others have said that the Church of England is rather dragging its feet . Will he heed the calls of Archbishop Desmond Tutu to show strong moral leadership on this issue and report back sooner rather than later ?
		10. I welcome b

KeyboardInterrupt: 
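
The totals printed at the top of the output can be cross-checked against the per-cluster motif counts. The sketch below copies both from the output above, confirms that the eight counts add up to the reported total, and computes each cluster's share of the motifs. The variable names are hypothetical; the numbers are the ones the notebook printed for this particular run and will differ on other runs.

```python
# Values copied from the notebook output above (they vary between runs)
total_motifs = 2255
motifs_per_cluster = [376, 240, 275, 198, 270, 234, 350, 312]

# The per-cluster counts should account for every motif exactly once
counts_match_total = sum(motifs_per_cluster) == total_motifs

# Share of all motifs that falls into each cluster
shares = [count / total_motifs for count in motifs_per_cluster]
for idx, (count, share) in enumerate(zip(motifs_per_cluster, shares)):
    print(f"cluster {idx}: {count} motifs ({share:.1%})")

# Indices of the clusters with the most and the fewest motifs
largest = max(range(len(motifs_per_cluster)), key=motifs_per_cluster.__getitem__)
smallest = min(range(len(motifs_per_cluster)), key=motifs_per_cluster.__getitem__)
print("largest cluster:", largest, "smallest cluster:", smallest)
```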