# SAT Expansion Pipeline

This notebook implements a segments-as-topic (SAT) methodology for generating new topics in the Comparative Constitutions Project (CCP) ontology. In the implementation below, a SAT comprises sections of national constitutions that capture the meaning of a topic.

## Stages

### Preliminaries

This stage gets things started and need only be run once during a session. It comprises three steps:

- Step 1: Load packages and functions from external files.
- Step 2: Start a web server that supports Javascript to Python interactions with the notebook.
- Step 3: Load models. This includes Google's Universal Sentence Encoder (USE) version 4 and the data models used by the application which include encodings of constitution sections generated by the USE model. These encodings are used by semantic search.

### Initialisation

This stage initialises the data structures used to record your activities during a session. At the end of a session (SAT Review and Acceptance) the populated data structures are saved to a JSON file in the `outputs` folder. The file name is `<topic_key>_resources.json` and it provides a complete record of your activities. 
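As a rough illustration of how a session record could be written to disk, the sketch below saves a dictionary to `<topic_key>_resources.json`. The helper name `save_resources` and the toy record are assumptions made for this example; the notebook's own saving logic lives in its library files and may differ.

```python
import json
import tempfile
from pathlib import Path

def save_resources(resource_dict, output_dir):
    """Write the session record to <topic_key>_resources.json and return its path.

    Hypothetical helper for illustration only.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / f"{resource_dict['topic_key']}_resources.json"
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(resource_dict, f, indent=2, ensure_ascii=False)
    return path

# Toy record with the same top-level shape as the notebook's resource dictionary
example_record = {'topic_key': 'parents', 'generation': {'seed_segments': []}}

# A temporary folder stands in for the real `outputs` folder
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_resources(example_record, Path(tmp) / 'outputs')
    saved_name = saved_path.name
    reloaded = json.loads(saved_path.read_text(encoding='utf-8'))
```

Reloading the file returns the same dictionary, which is what makes the JSON file usable as a record of the session.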

### SAT Generation

In the SAT generation stage, a topic formulation comprising a short phrase is created. A sentence-level semantic similarity model is then used to encode the topic formulation and the encoding is used to find constitution sections that are semantically similar to the topic formulation. Formulations can be tested and refined until a suitable seed set of segments is obtained. This SAT seed set is then used to find additional sections in the SAT Expansion stage. 
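The search described above can be pictured as a similarity threshold applied to encodings. The sketch below assumes cosine similarity and uses toy three-dimensional vectors in place of real USE encodings (which have 512 dimensions); the function names are hypothetical and are not the notebook's `run_sat_generation`.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def semantic_search(query_vec, section_vecs, threshold):
    """Return IDs of sections whose similarity to the query meets the threshold."""
    return {sid for sid, vec in section_vecs.items()
            if cosine_similarity(query_vec, vec) >= threshold}

# Toy encodings keyed by hypothetical section IDs
toy_sections = {
    'A/1': [1.0, 0.0, 0.0],
    'A/2': [0.8, 0.6, 0.0],
    'B/1': [0.0, 0.0, 1.0],
}
toy_query = [1.0, 0.0, 0.0]

matches = semantic_search(toy_query, toy_sections, threshold=0.63)
strict_matches = semantic_search(toy_query, toy_sections, threshold=0.9)
```

Raising the threshold from 0.63 to 0.9 drops `A/2` (similarity 0.8) from the results, which is the trade-off the search threshold controls.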

There are two steps in this stage:

- Step 1: Load an interface in which a topic key (a short identifier) and formulation are defined along with semantic search criteria.
- Step 2: Use the choices made in the interface to search for constitution sections that are semantically similar to the topic formulation. Once the search is complete, select sections that you judge match the formulation. Alternatively, assess the results and return to Step 1 to refine your choices.

In summary, SAT Generation is an iterative process, the final outcome of which is a set of accepted sections that constitute the SAT seed set, which is the input to the SAT Expansion stage.


### SAT Expansion

There are three steps in this stage:

- Step 1: Save the SAT seed set created by SAT Generation.
- Step 2: Load an interface in which to define the two semantic similarity thresholds needed by SAT Expansion.
- Step 3: Use the choices made in the interface to search for constitution sections that are semantically similar to SAT sections. Once the search is complete, select sections that you judge should be included in the SAT, i.e., expand the SAT.

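SAT Expansion is an iterative process: Step 3 can be rerun as many times as you like, and it terminates when no further candidate sections are found or when no selection is made.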


### SAT Review and Acceptance


1. Generating a seed SAT set using PAT search in the CCP corpus
2. Iteratively expanding the segments of a SAT.
3. Reviewing and refining the expanded SAT.
4. Automatically tagging SAT segments in constitution XML files.


## Outputs


# Preliminaries

## Step 1: Load packages and functions

In [1]:
__author__ = 'Roy Gardner, Matt Martin'

%run ./_library/packages.py
%run ./_library/utilities.py
%run ./_library/sat.py
%run ./_library/server.py


## Step 2: Start Python web server

The server handles Javascript to Python interactions. Specifically, SAT segments are selected using checkboxes in output cell HTML. Checkbox state changes are handled by Javascript and posted to the server, which manages the set of checked elements.

Checkbox state is used to define:

- Segments constituting the SAT seed set in SAT generation.
- Selected segments during SAT expansion.
- Removal of segments during the SAT review process.
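To make the state handling concrete, the following sketch shows one way a posted checkbox event could update the set of checked IDs. The payload shape (`id`, `checked`) mirrors what the notebook's `hit` function posts; the helper `apply_checkbox_event` and the example IDs are hypothetical, and the real `CheckboxState` and `CheckboxHandler` classes in `server.py` may work differently.

```python
import json

def apply_checkbox_event(selected_ids, body):
    """Update the set of checked IDs from one JSON payload of the form
    {"id": ..., "checked": true/false}. Hypothetical helper for illustration."""
    event = json.loads(body)
    if event['checked']:
        selected_ids.add(event['id'])
    else:
        # discard() does not raise if the ID was never checked
        selected_ids.discard(event['id'])
    return selected_ids

checked = set()
apply_checkbox_event(checked, '{"id": "Doc_A/12", "checked": true}')
apply_checkbox_event(checked, '{"id": "Doc_A/15", "checked": true}')
apply_checkbox_event(checked, '{"id": "Doc_A/12", "checked": false}')
```

After these three events only `Doc_A/15` remains checked, so the same mechanism serves both selection (generation, expansion) and removal (review).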


In [2]:
port = 8002

def get_selected_ids():
    # Get IDs of selected checkboxes
    return state.selected_ids 

def clear_selected_ids():
    state.selected_ids = set()
    
def set_selected_ids(selected_ids):
    state.selected_ids = set(selected_ids)

if not server_is_running(port):
    state = CheckboxState()
    handler = lambda *args: CheckboxHandler(state, *args)
    server = HTTPServer(('localhost', port), handler)

    thread = Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    print('Server running on port:', port)
else:
    print('Already running on port:', port)
    
if server_is_running(port):
    html = '''
    <script>
    function hit(id) {
        const checkbox = document.getElementById(id);
        fetch('http://localhost:''' + str(port) + '''', {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
            },
            body: JSON.stringify({
                id: id,  // Send the actual ID
                checked: checkbox.checked
            })
        });
    }</script>
    '''
    display(HTML(html))
else:
    print('Server is not running:', port)


Server running on port: 8002


## Step 3: Load the models

In [3]:
model_path = '../model/anarchism/'

model_dict = do_load(model_path,exclusion_list=[],verbose=True)

use_path = '../encoders/use-4/'
encoder = hub.load(use_path)

print('Finished')


Loading model…
Finished loading model.
Finished


# Initialisation

Run this cell to reset an existing session or to start a new session.


In [4]:
# Make sure that SAT segments are empty
clear_selected_ids()
review = False

# Dictionary containing resources for current run
resource_dict = {
    'topic_key': '',
    'topic_label': '',
    'topic_description': '',
    'start_datetime':None,
    'end_datetime':None,
    'generation': {
        'formulation': '',
        'search_threshold': 0.0,
        'cluster_threshold': 0.0,
        'seed_segments': []
    },
    'expansion': {
        'iterations': []
    },
    'review':{
        'sat_segments_final':[],
        'removed_segments':[],       
        'csv_file':'',    
    },
    'xml':{
        'constitution_count':0,
        'constitutions_updated':[]        
    }
}

def get_iteration_dict():
    iteration_dict = {
        'post_review':False,       
        'accepted_set':[],    
        'rejected_set':[],    
        'sat_set':[],
        'mapping_threshold':0.0,    
        'cluster_threshold':0.0    
    }
    return iteration_dict
    


# SAT Generation

SAT Generation is a two-step stage:

- Step 1: Define your topic formulation and semantic search parameters in a simple interface.
- Step 2: Run your semantic search to see the results. Then select suitable sections in the results to create the seed set.

The key to success at this stage is experimentation. You can work on the formulation and search parameters to refine your search results in order to generate a seed set that matches the topic you are creating. The seed set need not be exhaustive: a small set of constitution sections that are a good match to your topic formulation will provide the basis for successful SAT expansion. The sections in the seed set are better at finding additional sections than any formulation.


## Step 1: Create the SAT generation interface


This step creates an interface within which you define your topic formulation and the parameters of your semantic search of constitution sections. 

Run the cell below to generate the interface for selecting the following values and parameters:

- Topic key
  - An alphanumeric key for your topic between 4 and 10 characters in length, e.g. parents.
- Search threshold
  - Sets the minimum semantic similarity a constitution section must meet to qualify as a match to your topic formulation. Sections that meet or exceed this threshold are included in the search results in Step 2 below.
  - Too low and you'll get too many results — many of which will be off-topic.
  - Too high and you may miss on-topic results.
  - 0.63 is a good starting point and is set as the default; move up or down as needed using the slider.
- Cluster threshold
  - Groups search results together to try and separate on-topic from off-topic results in Step 2.
  - Too low and you'll get one big cluster containing all search results.
  - Too high and most results will be considered unrelated to one another and will appear in the `singletons` set.
  - 0.72 is a good starting point and is set as the default, but you'll need to experiment for each topic you create.
- Formulation
  - Enter the text of your topic formulation here; this will be used to search for semantically similar constitution sections in Step 2. The maximum number of characters is 400. The text you enter is sanitised to escape HTML and to remove characters that would be considered a security threat in a web setting. If the text is changed by sanitisation, an alert displays the sanitised text.

Once you are happy with your choices click on the `Apply Choices` button and move on to Step 2.
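The constraints on the topic key and formulation can be sketched with the standard library. This is an illustration under assumptions: the function names are hypothetical, truncating at 400 characters is one possible way to enforce the limit, and `html.escape` stands in for whatever sanitisation the interface actually applies.

```python
import html
import re

MAX_FORMULATION_CHARS = 400

def is_valid_topic_key(key):
    """True for alphanumeric keys of 4 to 10 characters."""
    return re.fullmatch(r'[A-Za-z0-9]{4,10}', key) is not None

def sanitise_formulation(text):
    """Trim whitespace, cap at 400 characters, and escape HTML special characters."""
    trimmed = text.strip()[:MAX_FORMULATION_CHARS]
    return html.escape(trimmed)

raw = 'Rights of <b>parents</b> & guardians'
clean = sanitise_formulation(raw)

# The interface is described as alerting the user when sanitisation changes the text
was_sanitised = clean != raw
```

Here `clean` contains `&lt;b&gt;` and `&amp;` in place of the raw markup, so `was_sanitised` is true and an alert would be appropriate.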


In [5]:

choice_dict = init_choice_dict()
generation_interface(choice_dict,0.63,0.72)


Text(value='', continuous_update=False, description='Topic key:', layout=Layout(width='initial'), placeholder=…

Label(value='THRESHOLDS:')

FloatSlider(value=0.63, continuous_update=False, description='Search:', layout=Layout(width='800px'), max=0.9,…

FloatSlider(value=0.72, continuous_update=False, description='Cluster:', layout=Layout(width='800px'), max=0.9…

Textarea(value='', continuous_update=False, description='Formulation:', layout=Layout(width='initial'), placeh…

Button(description='Apply Choices', style=ButtonStyle(), tooltip='Click to apply choices')

Output()

## Step 2: Run the semantic search and create the seed set

The choices made in the interface above are now used in a semantic search of constitution sections. The search may take a few seconds depending upon your computer.

Sections found by the semantic search appear in HTML tables and are organised into clusters based on their semantic similarity to one another. Note that clusters may suggest sub-topics or further refinements for your topic.
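One simple way to picture threshold-based clustering is as connected components: two sections are linked when their similarity meets the cluster threshold, and anything left unlinked is a singleton. The sketch below assumes that approach, which is not necessarily what `cluster_sat_candidates` does; the function name, IDs and similarity values are made up for illustration.

```python
from itertools import combinations

def cluster_by_threshold(ids, similarity, threshold):
    """Group IDs into connected components of the graph whose edges join
    pairs with similarity >= threshold. Unlinked IDs are returned as singletons."""
    parent = {i: i for i in ids}

    def find(i):
        # Follow parent links to the component root (with path halving)
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in combinations(sorted(ids), 2):
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)

    clusters = sorted(sorted(g) for g in groups.values() if len(g) > 1)
    singletons = sorted(g[0] for g in groups.values() if len(g) == 1)
    return clusters, singletons

# Toy pairwise similarities; unlisted pairs default to 0.0
toy_sims = {('s1', 's2'): 0.80, ('s2', 's3'): 0.75, ('s1', 's3'): 0.40}

def toy_similarity(a, b):
    return toy_sims.get((a, b), toy_sims.get((b, a), 0.0))

toy_ids = ['s1', 's2', 's3', 's4']
clusters_mid, singles_mid = cluster_by_threshold(toy_ids, toy_similarity, 0.72)
clusters_high, singles_high = cluster_by_threshold(toy_ids, toy_similarity, 0.90)
```

At 0.72 the first three sections form one cluster and `s4` is a singleton; at 0.90 every section is a singleton, matching the "too high" behaviour described for the cluster threshold.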

Each section's row in a cluster has three elements:

- The section's ID, which is a link to the section in the [Constitute Project](https://www.constituteproject.org/) website. By using this link you are able to view the section in the context of the constitution to which it belongs.
- The section's text.
- A checkbox.

Use the checkbox to add (or remove) a section from the seed set. Once you are happy with your seed set, you can proceed to Stage 2: SAT Expansion. In the first step of SAT expansion your seed set will be saved.

Above the HTML tables is a field (Search for terms…) which can be used to search for one or more words in the search results. As you type, the search results are filtered to show only those results containing the text in the field. Clear the field to see the full set of results.
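The filtering behaviour of the search field can be approximated as a case-insensitive substring match. In the notebook this happens in the browser; the Python sketch below only illustrates the logic, and the function name and row structure are assumptions.

```python
def filter_rows(rows, query):
    """Keep rows whose text contains the query, ignoring case.
    An empty query returns every row."""
    needle = query.strip().lower()
    if not needle:
        return list(rows)
    return [row for row in rows if needle in row['text'].lower()]

# Toy result rows with hypothetical IDs
toy_rows = [
    {'id': 'Doc_A/1', 'text': 'The rights of parents are protected.'},
    {'id': 'Doc_A/2', 'text': 'Parental leave is guaranteed.'},
    {'id': 'Doc_B/7', 'text': 'The national flag has three colours.'},
]

parent_rows = [row['id'] for row in filter_rows(toy_rows, 'Parent')]
all_rows = filter_rows(toy_rows, '')
```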

## Use cases

1. I want to start over. 
    - Rerun Initialisation to reset the entire process and start a new session.
2. I'm not happy with my formulation.
    - Change the formulation in Step 1, click on Apply Choices, and rerun Step 2 to run a new search.
3. There are too many results.
    - Return to Step 1, increase the search threshold, click on Apply Choices, and rerun Step 2.
4. There are too few results.
    - Return to Step 1, reduce the search threshold, click on Apply Choices, and rerun Step 2.
5. I'm not happy with the results.
    - Return to Step 1, edit the formulation, click on Apply Choices, and rerun Step 2.
6. I've made my selection, what next?
    - Move to SAT Expansion: Step 1. This will save and record your selection and start the SAT Expansion process.


In [6]:

if len(choice_dict['formulation']) > 0:
    print('Topic key:', choice_dict['topic_key'])
    print('Formulation:', choice_dict['formulation'])
    print('Search threshold:', choice_dict['search_threshold'])
    print('Cluster threshold:', choice_dict['cluster_threshold'])
    print()
    
    resource_dict['topic_key'] = choice_dict['topic_key']
    resource_dict['start_datetime'] = int(time.time())
    resource_dict['generation']['formulation'] = choice_dict['formulation']
    resource_dict['generation']['search_threshold'] = choice_dict['search_threshold']
    resource_dict['generation']['cluster_threshold'] = choice_dict['cluster_threshold']

    # Get a set of segment IDs found by the semantic search
    segment_ids = run_sat_generation(choice_dict, model_dict, encoder)
    print('Number of search results:',len(segment_ids))
    if len(segment_ids) > 0:
        # Use same clustering and listing as expansion
        cluster_dict = cluster_sat_candidates(segment_ids,model_dict,\
                                              threshold=choice_dict['cluster_threshold'])
        print('Number of clusters:',len(cluster_dict))
        print()
        list_clusters(cluster_dict,model_dict)
        
else:
    alert('No formulation entered.')
    

Topic key: anarchism
Formulation: Mentions of anarchism
Search threshold: 0.6
Cluster threshold: 0.72

Number of search results: 116
Number of clusters: 7



Segment ID,Segment text,Accept
Dielo_Truda_1926/127,"The role of the masses and the role of the anarchists in the social struggle and the social revolution The principal forces of the social revolution are the urban working class, the peasant masses and a section of the working intelligentia.",
Dielo_Truda_1926/130,"The anarchist conception of the role of the masses in the social revolution and the construction of socialism differs, in a typical way, from that of the statist parties.",
Dielo_Truda_1926/131,"While bolshevism and its related tendencies consider that the masses assess only destructionary revolutionary instincts, being incapable of creative and constructive activity — the principle reason why the latter activity should be concentrated in the hands of the men forming the government of the State of the Central Committee of the party — anarchists on the contrary think that the labouring masses have inherent creative and constructive possibilities which are enormous, and anarchists aspire to suppress the obstacles impeding the manifestation of these possibilities.",
Dielo_Truda_1926/132,"Anarchists consider the State to be the principle obstacle, usurping the rights of the masses and taking from them all the functions of economic and social life.",
Dielo_Truda_1926/143,The fundamental task of the General Union of Anarchists in the pre-revolutionary period must be the preparation of the workers and peasants for the social revolution.,
Dielo_Truda_1926/148,"but education alone is not sufficient — What is also necessary is a certain mass anarchist organisation — To realise this, it is necessary to work in two directions: on the one hand towards the selection and grouping of revolutionary worker and peasant forces on a libertarian communist theoretical basis (a specifically libertarian communist organisation); on the other, towards regrouping revolutionary workers and peasants on an economic base of production and consumption (revolutionary workers and peasants organised around production: workers and free peasants co-operatives).",
Dielo_Truda_1926/149,"The worker and peasant class, organised on the basis of production and consumption, penetrated by revolutionary anarchist positions, will be the first strong point of the social revolution.",
Dielo_Truda_1926/165,"Although the masses express themselves profoundly in social movement in terms of anarchist tendencies and tenets, these tendencies and tenets do however remain dispersed, being uncoordinated, and consequently do not lead to the organisation of the driving power of libertarian ideas which is necessary for preserving the anarchist orientation and objectives of the social revolution.",
Dielo_Truda_1926/170,"And from the moment when anarchists declare a conception of the revolution and the structure of society, they are obliged to give all these questions a clear response, to relate the solution of these problems to the general conception of libertarian communism, and to devote all their forces to the realisation of these.",
Dielo_Truda_1926/171,Only in this way do the General Union of Anarchists and the anarchist movement completely assure their function as a theoretical driving force in the social revolution.,

Segment ID,Segment text,Accept
Dielo_Truda_1926/85,"This other society will be libertarian communism, in which social solidarity and free individuality find their full expression, and in which these two ideas develop in perfect harmony.",
Dielo_Truda_1926/94,"Within the limits of this self-managing society of workers, libertarian communism establishes the principle of the equality of value and rights of each individual (not individuality “in general,” nor of “mystic individuality,” nor the concept of individuality, but each real, living, individual).",

Segment ID,Segment text,Accept
Dielo_Truda_1926/156,"More than any other concept, anarchism should become the leading concept of revolution, for it is only on the theoretical base of anarchism that the social revolution can succeed in the complete emancipation of.",
Dielo_Truda_1926/158,The leading position of anarchist ideas in the revolution suggests an orientation of events after anarchist theory.,

Segment ID,Segment text,Accept
Dielo_Truda_1926/193,Anarchism and syndicalism,
Dielo_Truda_1926/73,Anarchists and libertarian communism,

Segment ID,Segment text,Accept
Rojava_2023/204,"2 The canton in the Democratic Autonomous Administration of North and East Syria organizes itself in terms of: political, social, economic, ecological, cultural, security, educational, women and youth, on the basis of democratic confederation and the principles that the Democratic Autonomous Administration decides and operates according to.",
Rojava_2023/23,"Article 10 Oath: I swear to God Almighty, and I pledge to the martyrs: to abide by the social contract and its articles, to preserve the democratic rights of the peoples and the values of the martyrs, to preserve the freedom, safety and security of the regions of the Democratic Autonomous Administration of North and East Syria and the Democratic Republic of Syria, and to work for a free, equal life and the achievement of social justice, according to the principle of the democratic nation Article 11 The Democratic Autonomous Administration of North and East Syria consists of cantons based on the concept of local democracy based on the democratic system that takes the confederal democratic organizations of social groups and segments as its basis.",

Segment ID,Segment text,Accept
Dielo_Truda_1926/338,The General Union of Anarchists has a concrete and determined goal.,
Dielo_Truda_1926/50,Let it form the foundations for the General Union of Anarchists!,

Segment ID,Segment text,Accept
Dielo_Truda_1926/0,Organisational Platform of the Libertarian Communists Dielo Truda (Workers’ Cause) 1926 Introduction,
Dielo_Truda_1926/1,"It is very significant that, in spite of the strength and incontestably positive character of libertarian ideas, and in spite of the forthrightness and integrity of anarchist positions in the facing up to the social revolution, and finally the heroism and innumerable sacrifices borne by the anarchists in the struggle for libertarian communism, the anarchist movement remains weak despite everything, and has appeared, very often, in the history of working class struggles as a small event, an episode, and not an important factor.",
Dielo_Truda_1926/114,"Some want to conquer power by peaceful, parliamentarian means (the social democratic), others by revolutionary means (the bolsheviks, the left social revolutionaries).",
Dielo_Truda_1926/115,"Anarchism considers these two to be fundamentally wrong, disastrous in the work of the emancipation of labour.",
Dielo_Truda_1926/12,"Anarchism is not a beautiful utopia, nor an abstract philosophical idea, it is a social movement of the labouring masses.",
Dielo_Truda_1926/124,"The State, immediately and supposedly constructed for the defence of the revolution, invariably ends up distorted by needs and characteristics peculiar to itself, itself becoming the goal, produces specific, privileged castes, and consequently re-establishes the basis of capitalist Authority and State; the usual enslavement and exploitation of the masses by violence.",
Dielo_Truda_1926/14,"“We are persuaded,” said Kropotkin, “that the formation of an anarchist organisation in Russia, far from being prejudicial to the common revolutionary task, on the contrary it is desirable and useful to the very greatest degree.”",
Dielo_Truda_1926/141,"Action by anarchists can be divided into two periods, that before the revolution, and that during the revolution.",
Dielo_Truda_1926/142,"In both, anarchists can only fulfil their role as an organised force if they have a clear conception of the objectives of their struggle and the roads leading to the realisation of these objectives.",
Dielo_Truda_1926/144,"In denying formal (bourgeois) democracy, authority and State, in proclaiming the complete emancipation of labour, anarchism emphasises to the full the rigorous principles of class struggle.",


# SAT Expansion

Now that you have selected a seed SAT you are ready to use the constitution sections in the seed set to search for semantically similar sections and therefore expand the SAT.



## Step 1: Load seed SAT from generation process

This process creates two sets of segments:

- The SAT sections, i.e., those sections accepted in the SAT generation process.
- Rejected sections — a set which is initially empty.

Both sets grow during the SAT expansion process below.


In [7]:
topic_key = choice_dict['topic_key']

# We might be returning here to start again, i.e., we need to check SAT Generation state

if len(resource_dict['generation']['seed_segments']) == 0:
    # First time into expansion
    # Set of selected segments from generation
    sat_segment_ids = get_selected_ids()
    # Convert SAT segments to list for serialisation to JSON resource
    resource_dict['generation']['seed_segments'] = get_segments(sat_segment_ids,model_dict)    
else:
    # We want to restart the process with the original generation seed set
    set_selected_ids([key for d in resource_dict['generation']['seed_segments'] for key in d.keys()])
    sat_segment_ids = get_selected_ids()
    
# Set of rejected segments
rejected_segment_ids = set()

# Convert SAT segments to list for serialisation to JSON resource
resource_dict['generation']['seed_segments'] = get_segments(sat_segment_ids,model_dict)

print('Expanding SAT for:',topic_key)
print()
print('Number of segments in SAT seed set:',len(sat_segment_ids))

# Initial state for expansion process
clear_selected_ids()
first_time = True


Expanding SAT for: anarchism

Number of segments in SAT seed set: 5


## Step 2: Create the SAT expansion interface


This step creates an interface within which you define the SAT expansion parameters for your semantic search of constitution sections.

Run the cell below to generate the interface for selecting the following parameters:

- Mapping threshold
  - Sets the minimum similarity between constitution sections and sections in the SAT.
  - Too low and you'll get too many results—many of which will be off-topic.
  - Too high and you may miss on-topic results.
  - 0.70 is a good starting point and is set as the default; move up or down as needed using the slider.
- Cluster threshold
  - Groups search results together to try and separate on-topic from off-topic results in Step 3.
  - Too low and you'll get one big cluster containing all search results.
  - Too high and most results will be considered unrelated to one another and will appear in the `singletons` set.
  - 0.74 is a good starting point and is set as the default, but you'll need to experiment for each topic you create.

Once you are happy with your choices, click on the `Apply Choices` button and move on to Step 3.


In [8]:

expansion_choice_dict = init_expansion_choice_dict()
expansion_interface(expansion_choice_dict,0.70,0.74)


Label(value='THRESHOLDS:')

FloatSlider(value=0.7, continuous_update=False, description='Mapping:', layout=Layout(width='800px'), max=0.9,…

FloatSlider(value=0.74, continuous_update=False, description='Cluster:', layout=Layout(width='800px'), max=0.9…

Button(description='Apply Choices', style=ButtonStyle(), tooltip='Click to apply choices')

Output()

## Step 3: Run SAT expansion (iterative process)


Iteratively run the code cell below. Each iteration will:
1. Find SAT expansion candidate segments that are semantically similar to SAT segments at or above a `mapping_threshold`. A segment is a candidate if:
    - It is not a member of the current SAT segments set.
    - It is not a member of the rejected segments set.
2. Provide a clustered list of candidate segments.
3. Provide support for selecting candidate segments for inclusion in the SAT.

Each subsequent iteration will:

- Add selected candidate segments from the previous iteration to the SAT segments set.
- Add unselected candidate segments to the rejected segments set.
- Repeat steps 1-3 above.

The process terminates when no more candidate segments are found or no selection is made.
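The iteration described above can be sketched as a loop over two growing sets. The code below is an illustration under assumptions: `find_candidates` and `expand` are hypothetical names rather than the notebook's `run_sat_expansion`, the `choose` callback stands in for the user ticking checkboxes, and the toy similarity is simply one minus the distance between made-up positions on a line.

```python
def find_candidates(sat_ids, rejected_ids, all_ids, similarity, threshold):
    """Segments similar to at least one SAT segment, excluding SAT and rejected ones."""
    excluded = sat_ids | rejected_ids
    return {c for c in all_ids - excluded
            if any(similarity(c, s) >= threshold for s in sat_ids)}

def expand(seed_ids, all_ids, similarity, threshold, choose):
    """Iterate until no candidates are found or no selection is made."""
    sat_ids, rejected_ids = set(seed_ids), set()
    while True:
        candidates = find_candidates(sat_ids, rejected_ids, all_ids,
                                     similarity, threshold)
        if not candidates:
            break  # Termination: nothing left to propose
        accepted = choose(candidates) & candidates
        rejected_ids |= candidates - accepted  # Unselected candidates are rejected
        if not accepted:
            break  # Termination: the user selected nothing
        sat_ids |= accepted
    return sat_ids, rejected_ids

# Toy data: segments placed on a line; closer positions mean higher similarity
positions = {'a': 0.0, 'x': 0.1, 'b': 0.2, 'c': 0.4, 'd': 0.9}

def toy_similarity(p, q):
    return 1.0 - abs(positions[p] - positions[q])

# Simulated user: accepts every candidate except the off-topic segment 'x'
final_sat, final_rejected = expand(
    seed_ids={'a'},
    all_ids=set(positions),
    similarity=toy_similarity,
    threshold=0.75,
    choose=lambda cands: {c for c in cands if c != 'x'},
)
```

In this toy run the SAT grows from `a` to `b` and then to `c`, `x` ends up in the rejected set and is never proposed again, and `d` is never similar enough to become a candidate.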

The results layout and interface are identical to those of SAT Generation: Step 2.

## Use cases

1. I want to start over from the very beginning.
    - Rerun Initialisation to reset the entire process and start a new session. You will have to start with SAT generation.
2. I want to start over with the original seed set.
    - Rerun Step 1, to start again with the seed set from SAT generation.
3. There are too many results.
    - Return to Step 2, increase the mapping threshold, click on Apply Choices, and rerun Step 3.
4. There are too few results.
    - Return to Step 2, reduce the mapping threshold, click on Apply Choices, and rerun Step 3.
5. I've made my selection, what next?
    - Simply rerun Step 3. SAT expansion is an iterative process which can be repeated as many times as you like. Unless you reduce the mapping threshold, the search results should get smaller with every iteration. The expansion terminates when there are no search results or when you rerun Step 3 without selecting any additional sections.


In [12]:

mapping_threshold = expansion_choice_dict['mapping_threshold']
cluster_threshold = expansion_choice_dict['cluster_threshold']

# First time in this state
if len(get_selected_ids()) == 0 and first_time:
    first_time = False
    # Get the set of candidate segments.
    sat_candidate_ids = run_sat_expansion(sat_segment_ids,sat_segment_ids,rejected_segment_ids,model_dict,\
                                          threshold=mapping_threshold)
    print('Number of candidate segments:',len(sat_candidate_ids))
else:    
    # Get accepted segments - could be from expansion iteration or review
    sat_accepted_ids = get_selected_ids()

    if len(sat_accepted_ids) == 0:
        # Termination condition
        rejected_segment_ids.update(sat_candidate_ids)
        sat_candidate_ids = set()
        # Populate an iteration dictionary
        # Updated rg 07/05/2025 to save segment text as well as segment IDs
        iteration_dict = {
            'accepted_set':get_segments(sat_accepted_ids,model_dict),    
            'rejected_set':get_segments(rejected_segment_ids,model_dict),    
            'sat_set':get_segments(sat_segment_ids,model_dict),
            'mapping_threshold':mapping_threshold,    
            'cluster_threshold':cluster_threshold    
        }
        resource_dict['expansion']['iterations'].append(iteration_dict)

    else:    
        print('Number of accepted segments:',len(sat_accepted_ids))
        # Add accepted segments to the SAT set. 
        if review:
            # Re-entrant from review so SAT is the current selected set from the review cell
            sat_segment_ids = sat_accepted_ids
            review = False
        else:
            # Expansion iteration so extend the SAT set
            sat_segment_ids.update(sat_accepted_ids)

        print('Updated SAT size:',len(sat_segment_ids))

        # Add all remaining segments from the last iteration's candidate set to the rejected set
        rejected_segment_ids.update(sat_candidate_ids.difference(sat_accepted_ids))        
        # Updated rg 07/05/2025 to save segment text as well as segment IDs
        iteration_dict = {
            'accepted_set':get_segments(sat_accepted_ids,model_dict),    
            'rejected_set':get_segments(rejected_segment_ids,model_dict),    
            'sat_set':get_segments(sat_segment_ids,model_dict),
            'mapping_threshold':mapping_threshold,    
            'cluster_threshold':cluster_threshold    
        }
        resource_dict['expansion']['iterations'].append(iteration_dict)

        # Build the matrix with the accepted set for speed 
        sat_candidate_ids = run_sat_expansion(sat_accepted_ids,sat_segment_ids,rejected_segment_ids,\
                                              model_dict,threshold=mapping_threshold)    
        print('Number of candidate segments:',len(sat_candidate_ids))

if len(sat_candidate_ids) > 0:     
    # Cluster the candidates and display
    clear_selected_ids() # Clear state for the next run
    cluster_dict = cluster_sat_candidates(sat_candidate_ids,model_dict,threshold=cluster_threshold)
    print('Number of clusters:',len(cluster_dict))
    print()
    list_clusters(cluster_dict,model_dict)
else:
    # Initialise so user can do another run with the currently selected topic
    clear_selected_ids()
    first_time = True
    print('The process has terminated. Please review the final SAT set in the cell below.')



The process has terminated. Please review the final SAT set in the cell below.


# SAT Review

This stage provides the opportunity to review the segments of the expanded SAT. A segment can be removed by unchecking its checkbox.


## Step 1: Create the SAT Review interface

This step creates an interface within which you define the cluster threshold for the final SAT sections. Clustering is a useful tool for identifying sections that may be edge cases, as well as sections that might indicate the presence of sub-topics.

Run the cell below to generate the interface for selecting the following parameter:

- Cluster threshold
  - Groups SAT sections together.
  - Too low and you'll get one big cluster containing all SAT sections.
  - Too high and most sections will appear in the `singletons` set.
  - 0.74 is a good starting point and is set as the default, but you may need to experiment.

Once you are happy with your choice, click on the `Apply Choice` button and move on to Step 2.


In [13]:
%run ./_library/utilities.py

review_choice_dict = init_review_choice_dict()
review_interface(review_choice_dict,0.74)


Label(value='THRESHOLD:')

FloatSlider(value=0.74, continuous_update=False, description='Cluster:', layout=Layout(width='800px'), max=0.9…

Button(description='Apply Choice', style=ButtonStyle(), tooltip='Click to apply choice')

Output()

## Step 2: Review the final SAT

This step provides an opportunity to review the segments of the expanded SAT. A segment can be removed by unchecking its checkbox.

The results layout and interface are identical to those of SAT Generation: Step 2 and SAT Expansion: Step 3. However, all checkboxes are checked by default and can be unchecked to remove a section from the SAT.
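The outcome of a review can be expressed with set operations: the final SAT is whatever remains checked, and the removed segments are those that were reviewed but unchecked. The helper name and IDs below are hypothetical and serve only to illustrate that bookkeeping.

```python
def summarise_review(reviewed_ids, checked_ids):
    """Split the reviewed SAT into segments kept and segments removed."""
    final_ids = reviewed_ids & checked_ids
    removed_ids = reviewed_ids - checked_ids
    return sorted(final_ids), sorted(removed_ids)

# All segments start checked; the user then unchecks one of them
reviewed = {'Doc_A/1', 'Doc_A/2', 'Doc_B/7'}
still_checked = {'Doc_A/1', 'Doc_B/7'}

kept, removed = summarise_review(reviewed, still_checked)
```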

## Use cases

1. I unchecked a section but now I want to add it back in.
    - Click on the section's checkbox and the section will be added back into the SAT.
2. I need to return to the expansion stage to make sure I didn't miss anything.
    - Return to SAT Expansion: Step 2, lower the mapping threshold, click on Apply Choices, and rerun SAT Expansion: Step 3.


In [14]:

review = True

cluster_threshold = review_choice_dict['cluster_threshold']

# Make sure checkbox state contains all SAT segments
set_selected_ids(sat_segment_ids)

# Store the SAT set that is being reviewed
review_sat_ids = set(sat_segment_ids)  # Snapshot copy, unaffected by later changes to the SAT set

cluster_dict = cluster_sat_candidates(sat_segment_ids,model_dict,threshold=cluster_threshold)
print('Number of SAT segments:',len(sat_segment_ids))
print('Number of clusters:',len(cluster_dict))
print()
list_clusters(cluster_dict,model_dict,check_all=True)



Number of SAT segments: 9
Number of clusters: 3



Segment ID,Segment text,Accept
Dielo_Truda_1926/171,Only in this way do the General Union of Anarchists and the anarchist movement completely assure their function as a theoretical driving force in the social revolution.,
Dielo_Truda_1926/306,"Its task is to group around itself all the healthy elements of the anarchist movement into one general organisation, active and agitating on a permanent basis: the General Union of Anarchists.",

Segment ID,Segment text,Accept
Dielo_Truda_1926/193,Anarchism and syndicalism,
Dielo_Truda_1926/73,Anarchists and libertarian communism,

Segment ID,Segment text,Accept
Dielo_Truda_1926/127,"The role of the masses and the role of the anarchists in the social struggle and the social revolution The principal forces of the social revolution are the urban working class, the peasant masses and a section of the working intelligentia.",
Dielo_Truda_1926/130,"The anarchist conception of the role of the masses in the social revolution and the construction of socialism differs, in a typical way, from that of the statist parties.",
Dielo_Truda_1926/131,"While bolshevism and its related tendencies consider that the masses assess only destructionary revolutionary instincts, being incapable of creative and constructive activity — the principle reason why the latter activity should be concentrated in the hands of the men forming the government of the State of the Central Committee of the party — anarchists on the contrary think that the labouring masses have inherent creative and constructive possibilities which are enormous, and anarchists aspire to suppress the obstacles impeding the manifestation of these possibilities.",
Dielo_Truda_1926/132,"Anarchists consider the State to be the principle obstacle, usurping the rights of the masses and taking from them all the functions of economic and social life.",
Dielo_Truda_1926/143,The fundamental task of the General Union of Anarchists in the pre-revolutionary period must be the preparation of the workers and peasants for the social revolution.,


## Step 3: Accept review and write the final SAT to CSV

Run the cell below to generate the final interface for selecting the following values:

- Topic label
  - A short human-readable label for your new topic.
- Topic description
  - A description of the new topic that is more expansive than the topic formulation.

Once you are happy with your choices, click on the `Accept Review` button, which saves two files into the `outputs/` folder:

1. `<topic_key>_final_SAT.csv`: a list of all SAT sections.
2. `<topic_key>_resource.json`: a full history of your choices and results.
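For orientation, a CSV export of the final SAT could look like the sketch below. The column names, the helper `write_sat_csv` and the toy segments are assumptions; the actual file is produced by the notebook's `accept_review` code and may use a different layout.

```python
import csv
import io

def write_sat_csv(segments, handle):
    """Write segment IDs and texts as CSV rows, sorted by ID.
    `segments` maps a segment ID to its text."""
    writer = csv.writer(handle, lineterminator='\n')
    writer.writerow(['Segment ID', 'Segment text'])
    for segment_id in sorted(segments):
        writer.writerow([segment_id, segments[segment_id]])

# Toy segments; text containing a comma is quoted automatically by the csv module
toy_segments = {
    'Doc_A/2': 'Text with, a comma',
    'Doc_A/1': 'Plain text',
}

# An in-memory buffer stands in for a file in the outputs folder
buffer = io.StringIO()
write_sat_csv(toy_segments, buffer)
csv_text = buffer.getvalue()
```

Using the `csv` module rather than manual string joins keeps section text containing commas or quotes intact when the file is read back.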


In [None]:
%run ./_library/utilities.py

# Set the SAT segments to the checked segments in review
sat_segment_ids = get_selected_ids()

print('Number of segments in final SAT:',len(sat_segment_ids))
print()

if len(sat_segment_ids) > 0:  
    accept_review_interface(sat_segment_ids,review_sat_ids,resource_dict,model_dict,accept_review)
else:
    print('The SAT is empty.')
