# Using *Key Point Summarization* for analyzing and finding insights in survey data

#### **Important Notice**: This tutorial describes the new SDK that was introduced in June 2023, starting from debater-python-api version 5.0.0. The tutorial for the legacy SDK is available [here](../kpa_quick_start_tutorial-with_results.ipynb).

When you have a large collection of texts representing people’s opinions (such as product reviews, survey answers or social media), it is difficult to understand the key issues that come up in the data. Going over thousands of comments is prohibitively expensive.  Existing automated approaches are often limited to identifying recurring phrases or concepts and the overall sentiment toward them, but do not provide detailed or actionable insights. 

In this tutorial you will gain hands-on experience in using *Key Point Summarization* (KPS) for analyzing and deriving insights from open-ended comments.  

The data we will use is [a community survey conducted in the city of Austin](https://data.austintexas.gov/dataset/Community-Survey/s2py-ceb7). In this survey, the citizens of Austin were asked "If there was ONE thing you could share with the Mayor regarding the City of Austin (any comment, suggestion, etc.), what would it be?". 

## 1. Initialization

### 1.1 Setup

Let's first import all the required packages for this tutorial and initialize the *Key Point Summarization* client. The client prints information using a logger, so a suitable verbosity level should be set. The client object is configured with an API key. To receive an API key, please send an email to *yoavka@il.ibm.com* and we'll be happy to provide it. In the code below the key is read from the environment variable *KPS_API_KEY* (you may also modify the code and place the API key directly).

In [2]:
from debater_python_api.api.clients.keypoints_client import KpsClient, KpsJobFuture
from debater_python_api.api.clients.key_point_summarization.KpsResult import KpsResult
import os
import pandas as pd
import json 

pd.set_option('display.max_rows', None)
KpsClient.init_logger()
api_key = os.environ['KPS_API_KEY']
host = 'https://keypoint-matching-backend.debater.res.ibm.com'
keypoints_client = KpsClient(api_key, host)
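
If the environment variable is not set, `os.environ['KPS_API_KEY']` fails with a bare `KeyError`. A minimal sketch of a friendlier lookup is shown below; the helper name `read_api_key` is hypothetical and is not part of the SDK.

```python
import os

def read_api_key(env=None, name="KPS_API_KEY"):
    """Return the API key stored under `name`, or raise a descriptive error."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable to your KPS API key.")
    return key
```

The returned value could then be passed to `KpsClient` in place of `os.environ['KPS_API_KEY']`.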

### 1.2 Read the data
Let's read the data from the *dataset_austin.csv* file, which holds the Austin survey dataset, and print the first comment.

In [3]:
comments_df = pd.read_csv('./dataset_austin.csv')

print(f'There are {len(comments_df)} comments in the dataset')
print(dict(comments_df.iloc[0,:]))

There are 3187 comments in the dataset
{'id': 1, 'year': 2016, 'text': "Dissatisfied traffic and with traffic, timing of street lights.  EXTREMELY dissatisfied with cit govt. interfering in local businesses (Uber/Lyft, income property owners).  Also, extremely dissatisfied with all the free handouts to people who are perfectly capable of earning their own money.  I'm very dissatisfied with the liberal leaning local politicians.", 'Council_District': 7}


Each comment has a unique 'id', a 'text', a 'year' and a 'Council_District'. The *Key Point Summarization* service can process hundreds of thousands of comments. However, to provide a good user experience, the KPS evaluation service is limited to 1000 comments per run, each up to 3000 characters long. You may request an increase of this limit if needed.

We will choose a sample of 1000 comments from 2016 for our analysis.

In [4]:
comments_df = comments_df.dropna()
comments_df = comments_df[comments_df.text.apply(lambda x: 0<len(str(x))<=3000)]

comments_2016_df = comments_df[comments_df['year'] == 2016]
sample_size = 1000
comments_2016_sample_df = comments_2016_df.sample(n = sample_size, random_state = 1)
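
The same filtering and sampling logic can be sketched without pandas, which may help when the comments come from another source. The helper name `select_comments` is hypothetical; the two limits are the ones stated above. Note that this sketch does not reproduce the exact sample drawn by `DataFrame.sample(random_state=1)`.

```python
import random

MAX_COMMENTS = 1000  # per-run limit of the evaluation service, as stated above
MAX_CHARS = 3000     # per-comment character limit, as stated above

def select_comments(texts, sample_size=MAX_COMMENTS, max_chars=MAX_CHARS, seed=1):
    """Drop empty or over-long texts, then draw a reproducible random sample."""
    valid = [t for t in texts if isinstance(t, str) and 0 < len(t) <= max_chars]
    if len(valid) <= sample_size:
        return valid
    return random.Random(seed).sample(valid, sample_size)

texts = ["Fix the traffic.", "", None, "x" * 3001, "More parks please."]
print(select_comments(texts))  # only the two valid comments remain
```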

## 2. Run the full KPS flow and generate results
The simplest way to run KPS is using the method **keypoints_client.run_full_kps_flow()**, which serves as an excellent starting point. To do so, you only need to provide a collection of textual comments and a distinct domain name (comprised of alphanumeric characters, spaces, or underscores). If you wish to reuse a domain from a previous run, you will first need to delete it via **keypoints_client.delete_domain_cannot_be_undone(domain=domain)**. 

Using this method, KPS extracts the key points from the provided data and matches each sentence in the input comments with its corresponding matching key points.

By default, the service performs stance analysis: it runs for positive (pro) and for negative (con) sentences separately, and returns a merged result containing key points from both stances. To disable the stance analysis and run on all sentences together, add the parameter **stance="no-stance"** to the **run_full_kps_flow** method.

In [5]:
domain = "austin_demo_full"
comments_texts_2016 = list(comments_2016_sample_df['text'])
kps_result_2016 = keypoints_client.run_full_kps_flow(domain, comments_texts_2016)

2023-06-15 11:33:25,048 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/domains
2023-06-15 11:33:26,113 [INFO] keypoints_client.py 113: created domain: austin_demo_full with domain_params: None
2023-06-15 11:33:26,117 [INFO] keypoints_client.py 158: uploading 1000 comments in batches
2023-06-15 11:33:26,123 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-15 11:33:27,476 [INFO] keypoints_client.py 174: uploaded 1000 comments, out of 1000
2023-06-15 11:33:27,480 [INFO] keypoints_client.py 137: waiting for the comments to be processed
2023-06-15 11:33:27,486 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-15 11:33:28,116 [INFO] keypoints_client.py 187: domain: austin_demo_full, comments status: {'processed_comments': 0, 'processed_sentences': 0, 'pending_comment

Stage 1/2: |--------------------------------------------------| 0.0% Complete



2023-06-15 11:35:03,986 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 11:35:05,736 [INFO] keypoints_client.py 628: job_id 648accf9d5c9baae800554ad is done, returning result
2023-06-15 11:35:05,750 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 11:35:06,358 [INFO] keypoints_client.py 625: job_id 648accfad5c9baae800554ae is running, progress: {'total_stages': 2, 'stage_1': {'inferred_batches': 6, 'total_batches': 17, 'batch_size': 2000}}


Stage 1/2: |█████████████████---------------------------------| 35.3% Complete



2023-06-15 11:35:36,364 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 11:35:38,348 [INFO] keypoints_client.py 628: job_id 648accfad5c9baae800554ae is done, returning result
2023-06-15 11:35:38,475 [INFO] KpsResult.py 525: Merging positive and negative KpsResults.
2023-06-15 11:35:38,575 [INFO] keypoints_client.py 66: client calls service (delete): https://keypoint-matching-backend.debater.res.ibm.com/data
2023-06-15 11:35:39,948 [INFO] keypoints_client.py 400: domain: austin_demo_full was deleted


kps_result_2016 is a KpsResult object. Let's print the top 40 key points, and the top three matched sentences per key point:

In [6]:
kps_result_2016.print_result(n_sentences_per_kp = 3, title = "Austin sample 2016 full flow", n_top_kps = 40)

Austin sample 2016 full flow results, stance: pro and con 
n_comments: 1000
Coverage (all comments): 65.10
Displaying 40 key points out of 48:
169 - Improve traffic flow. - con
	- The traffic on Cameron Road was worsened by the addition of bike lanes.
	- Fix the traffic - you keep encouraging growth of the city without the infrastructure to
	  support that growth.
	- Get the bikes off the streets and improve traffic flow in and out of the city.
153 - We really need to improve our public transportation. - con
	- Need to rework zoning to improve traffic and add more public transit, especially to
	  suburbs.
	- The city needs a more comprehensive and better connected transportation network
	  including bus, rail, bike, pedestrian access.
	- Austin needs to get serious about alternatives to driving including real mass transit,
	  bike, and pedestrian facilities.
127 - Improve affordable housing/living. - con
	- Affordable housing is also a problem- the rents are getting too high and are ha

The *Coverage* line shows that 65% of the comments were matched, i.e., had at least one sentence matched to a key point.
For each key point, this method prints the number of matched comments, the stance (pro or con) and the top matching sentences, which present some of the main points raised regarding the key point.
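
To make the coverage definition concrete, here is a small sketch that computes it from a list of matches. The function name and the toy data are illustrative only; with 651 matched comments out of 1000 (the counts reported for this run), the same formula gives the 65.10 printed above.

```python
def comment_coverage(matches, n_comments):
    """Percentage of comments with at least one sentence matched to a key point.

    `matches` is an iterable of (comment_id, key_point) pairs, one per matched sentence.
    """
    matched_ids = {comment_id for comment_id, _ in matches}
    return 100.0 * len(matched_ids) / n_comments

toy_matches = [
    ("1", "Improve traffic flow."),
    ("1", "PROPERTY TAXES ARE TOO HIGH."),  # same comment, second key point
    ("2", "Improve traffic flow."),
]
print(comment_coverage(toy_matches, n_comments=4))  # 2 of 4 comments matched -> 50.0
```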

## 3. KpsResult Object and result processing


The *KpsResult* object stores all the information about the job and the results. It can be used to generate several types of reports and to compare different results. All the reports generated in this section are available in the folder "kps_results".
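
The cells in this section write into the `kps_results` folder. Whether the SDK creates that folder on demand is not stated here, so a cautious sketch is to create it up front. The helper name `ensure_output_dir` is hypothetical.

```python
from pathlib import Path

def ensure_output_dir(path="kps_results"):
    """Create the output folder (and any parents) if missing; return it as a Path."""
    out_dir = Path(path)
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```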

The KpsResult can be saved to a file and loaded from a file via the **load** and **save** methods:

In [7]:
json_file = "kps_results/kps_result_2016.json"
kps_result_2016.save(json_file) 
kps_result_2016 = KpsResult.load(json_file)

2023-06-15 11:41:07,528 [INFO] KpsResult.py 79: Writing results to: kps_results/kps_result_2016.json
2023-06-15 11:41:07,579 [INFO] KpsResult.py 90: Loading results from: kps_results/kps_result_2016.json


### 3.1 Result Summary
The result summary presented in the attribute *summary_df* displays the aggregated information per key point:

In [8]:
kps_result_2016.summary_df

| | key_point | #comments | comments_coverage | #sentences | sentences_coverage | stance | kp_id | parent_id | n_comments_subtree |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Improve traffic flow. | 169 | 0.169 | 185 | 0.102041 | con | 0.0 | | 220.0 |
| 1 | We really need to improve our public transport... | 153 | 0.153 | 167 | 0.092113 | con | 1.0 | | 153.0 |
| 2 | Improve affordable housing/living. | 127 | 0.127 | 150 | 0.082736 | con | 2.0 | | 154.0 |
| 3 | PROPERTY TAXES ARE TOO HIGH. | 83 | 0.083 | 96 | 0.052951 | con | 3.0 | | 110.0 |
| 4 | Please plan better for growth on our roadways! | 78 | 0.078 | 85 | 0.046884 | con | 4.0 | 0.0 | 78.0 |
| 5 | Buying a home in S Austin is prohibitive. | 56 | 0.056 | 62 | 0.034197 | con | 5.0 | 2.0 | 56.0 |
| 6 | driving in Austin is terrible | 47 | 0.047 | 51 | 0.02813 | con | 6.0 | 0.0 | 47.0 |
| 7 | Water costs too much. | 44 | 0.044 | 50 | 0.027579 | con | 7.0 | 3.0 | 52.0 |
| 8 | Don't let Austin become Houston with overdevel... | 35 | 0.035 | 36 | 0.019857 | con | 11.0 | | 35.0 |
| 9 | Better pedestrian and biking lifestyle options. | 34 | 0.034 | 36 | 0.019857 | con | 9.0 | | 34.0 |


The report displays:  
- `key_point`: the list of generated key points, sorted by their salience.
- `#comments`: the number of comments matched to the key point (comments that have at least one sentence matched to the key point).  
- `comments_coverage`: the fraction of comments matched to the key point (out of the entire set of comments sent to the job).  
- `#sentences`: the number of the sentences matched to the key point.  
- `sentences_coverage`: the fraction of sentences matched to the key point (out of the entire set of sentences in the comments sent to the job).
- `stance`: the key point's stance (if stance analysis was performed).
- `kp_id`: the key point's id.  
- `parent_id`: the analysis also arranges the key points in a tree-structured hierarchy. The parent_id column shows the kp_id of the key point's parent. For example, the key points *Please plan better for growth on our roadways!* and *driving in Austin is terrible* are under the parent key point *Improve traffic flow.*  
- `n_comments_subtree`: how many comments are in the subtree of the key point (matched to the key point or any of its descendants).  
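
The `kp_id` and `parent_id` columns are enough to rebuild the hierarchy. The sketch below does so for a few rows copied from the table above; the function names are illustrative and not part of the SDK.

```python
from collections import defaultdict

# (kp_id, key_point, parent_id) copied from the first rows of summary_df above;
# parent_id is None for top-level key points.
rows = [
    (0, "Improve traffic flow.", None),
    (2, "Improve affordable housing/living.", None),
    (3, "PROPERTY TAXES ARE TOO HIGH.", None),
    (4, "Please plan better for growth on our roadways!", 0),
    (5, "Buying a home in S Austin is prohibitive.", 2),
    (6, "driving in Austin is terrible", 0),
    (7, "Water costs too much.", 3),
]

def build_children(rows):
    """Map each parent kp_id to the list of its direct children."""
    children = defaultdict(list)
    for kp_id, _, parent_id in rows:
        if parent_id is not None:
            children[parent_id].append(kp_id)
    return children

def print_tree(rows):
    """Print every top-level key point followed by its indented descendants."""
    names = {kp_id: text for kp_id, text, _ in rows}
    children = build_children(rows)

    def walk(kp_id, depth):
        print("  " * depth + names[kp_id])
        for child in children[kp_id]:
            walk(child, depth + 1)

    for kp_id, _, parent_id in rows:
        if parent_id is None:
            walk(kp_id, 0)

print_tree(rows)
```

Note that `n_comments_subtree` for *Improve traffic flow.* (220) is smaller than the sum of `#comments` over that key point and its two listed children (169 + 78 + 47), which suggests that a comment matched to several key points in a subtree is counted once.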


In addition to the individual key points, the last rows (whose names start with \*) hold the total and matched numbers of sentences and comments for each stance. In the current example there are 1000 comments and 1813 sentences in total, of which 651 comments and 894 sentences are matched to at least one key point. 881 comments contain sentences classified as con, and 1450 sentences overall are classified as con; 644 comments and 882 sentences are matched to con key points.

### 3.2 Docx report
You can also generate a Microsoft Word document that shows the key point hierarchy visually and presents the sentences matched to each key point as a user-friendly report. It can be generated as follows:

In [9]:
kps_result_2016.generate_docx_report(output_dir = "kps_results", result_name="kps_result_2016")

2023-06-15 11:46:56,578 [INFO] docx_generator.py 200: Creating key points hierarchy
2023-06-15 11:46:56,605 [INFO] docx_generator.py 208: Creating key points matches tables
2023-06-15 11:46:56,917 [INFO] docx_generator.py 284: saving docx summary in file: kps_results/kps_result_2016_hierarchical.docx


### 3.3 Full report

The attribute *result_df* stores the full results. Each row stores a key point and a matched sentence, together with their match_score and all the information regarding the sentence.

In [10]:
kps_result_2016.result_df.head(3)

| | kp | sentence_text | match_score | comment_id | sentence_id | sents_in_comment | span_start | span_end | num_tokens | argument_quality | kp_quality | sent_kp_quality | pos_score | neg_score | sug_score | neut_score | selected_stance | stance_conf | kp_stance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Improve traffic flow. | Improve traffic flow. | 1.0 | 269 | 0 | 1 | 0 | 21 | 3 | 0.310709 | 0.995855 | 0.995855 | 0.00196 | 0.00375 | 0.988809 | 0.005481 | sug | 0.988809 | con |
| 1 | Improve traffic flow. | The traffic on Cameron Road was worsened by th... | 0.999983 | 674 | 0 | 3 | 0 | 71 | 13 | 0.603956 | 0.995855 | 0.0 | 0.001241 | 0.995374 | 0.001376 | 0.002009 | neg | 0.995374 | con |
| 2 | Improve traffic flow. | Fix the traffic - you keep encouraging growth ... | 0.999983 | 210 | 0 | 2 | 0 | 108 | 17 | 0.626678 | 0.995855 | 0.0 | 0.000969 | 0.989541 | 0.004868 | 0.004622 | neg | 0.989541 | con |
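
Because each row is one (key point, sentence) pair, per-key-point statistics can be derived by grouping rows. The sketch below counts the distinct comments per key point, optionally keeping only pairs above a score cut-off. The function name and the `min_score` parameter are illustrative, and the rows are a simplified copy of the three rows shown above.

```python
from collections import defaultdict

def comments_per_key_point(rows, min_score=0.0):
    """Count distinct comments matched to each key point.

    `rows` are dicts with the result_df columns 'kp', 'comment_id' and 'match_score'.
    """
    comment_ids = defaultdict(set)
    for row in rows:
        if float(row["match_score"]) >= min_score:
            comment_ids[row["kp"]].add(row["comment_id"])
    return {kp: len(ids) for kp, ids in comment_ids.items()}

rows = [
    {"kp": "Improve traffic flow.", "comment_id": 269, "match_score": 1.0},
    {"kp": "Improve traffic flow.", "comment_id": 674, "match_score": 0.999983},
    {"kp": "Improve traffic flow.", "comment_id": 210, "match_score": 0.999983},
]
print(comments_per_key_point(rows))  # {'Improve traffic flow.': 3}
```

With the real DataFrame, `kps_result_2016.result_df.to_dict("records")` yields rows in this shape.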


It's also possible to generate all three reports together. You need to provide the output_dir and the result name, and the reports will be written to files with the appropriate suffixes:

In [11]:
kps_result_2016.export_to_all_outputs(output_dir="kps_results", result_name="kps_result_2016")

2023-06-15 11:47:28,380 [INFO] utils.py 60: Writing dataframe to: kps_results/kps_result_2016.csv
2023-06-15 11:47:28,406 [INFO] utils.py 60: Writing dataframe to: kps_results/kps_result_2016_kps_summary.csv
2023-06-15 11:47:28,453 [INFO] docx_generator.py 200: Creating key points hierarchy
2023-06-15 11:47:28,475 [INFO] docx_generator.py 208: Creating key points matches tables
2023-06-15 11:47:28,786 [INFO] docx_generator.py 284: saving docx summary in file: kps_results/kps_result_2016_hierarchical.docx


### 3.4 Fetching unmatched sentences

The previous reports show only the sentences that were matched to a key point. We can also see the sentences in the input data that were not matched:

In [12]:
unmatched_sentences_df = kps_result_2016.get_unmapped_sentences_df()
unmatched_sentences_df.to_csv("kps_results/kps_result_2016_unmatched_sents.csv")


More advanced capabilities of the KpsResult object will be presented later:
 - Job management related options (Section 5). 
 - Comparative analysis (Section 6).

## 4. Run KPS step by step
In order to customize the *Key Point Summarization* service and fully exploit its caching and comparative capabilities, we must run the service in a staged manner. Let's dive into each step and understand the KPS flow.  

### 4.1 Create a domain
The *Key Point Summarization* service stores the data (and cached results) in a *domain*. A user can create several domains, one for each dataset. Domains are only accessible to the user who created them.

Create a domain using the **keypoints_client.create_domain(domain=domain, domain_params={})** method. 
Several parameters can be passed in the domain_params dictionary. In most cases the defaults provide satisfactory results and need not be changed. Full documentation of the supported *domain_params* can be found [here](kps_parameters.pdf).

By default, an exception is raised if the domain already exists. To avoid this exception, add the parameter **ignore_exists=True** to the method. Note that in this case the domain_params are not updated and the existing domain remains the same. 
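
When a notebook is re-run and the existing domain (with its cache) should be kept, the creation step can be made idempotent. The wrapper below is a sketch: the name `ensure_domain` is hypothetical, and it relies only on `create_domain` with the keyword arguments named above.

```python
def ensure_domain(client, domain, domain_params=None):
    """Create `domain` if it is missing; leave an existing domain untouched."""
    client.create_domain(
        domain=domain,
        domain_params=domain_params or {},
        ignore_exists=True,  # per the text above: no exception, existing params are kept
    )
    return domain

# Example call (requires a configured client):
# ensure_domain(keypoints_client, "austin_demo")
```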

In this tutorial we will first delete the domain to make sure that we start with an empty domain.

In [13]:
domain = 'austin_demo'
keypoints_client.delete_domain_cannot_be_undone(domain=domain)
keypoints_client.create_domain(domain=domain, domain_params={})

2023-06-15 11:51:40,271 [INFO] keypoints_client.py 66: client calls service (delete): https://keypoint-matching-backend.debater.res.ibm.com/data
2023-06-15 11:51:40,800 [ERROR] keypoints_client.py 77: There is a problem with the request (422): user: db0a12 doesn't have domain: austin_demo
2023-06-15 11:51:40,806 [INFO] keypoints_client.py 405: domain: austin_demo doesn't exist.
2023-06-15 11:51:40,810 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/domains
2023-06-15 11:51:41,648 [INFO] keypoints_client.py 113: created domain: austin_demo with domain_params: {}


Notes:
* The domain must be comprised of alphanumeric characters, spaces, or underscores. 
* We can delete a domain we no longer need using: **keypoints_client.delete_domain_cannot_be_undone(domain=domain)**.
* Each domain has a state: it stores all comments previously uploaded into it and a cache with all the computations performed over this data.
* If we want to restart and run over the domain from scratch (no comments and no cache), we can delete the domain and then re-create it. Keep in mind that the cache is also cleared and consecutive runs will take longer.

### 4.2 Upload comments into the domain
Upload the comments into the domain using the **keypoints_client.upload_comments(domain=domain, comments_ids=comments_ids, comments_texts=comments_texts)** method. This method receives the domain, a list of comments_ids and a list of comments_texts. When comments are uploaded into a domain, the *Key Point Summarization* service splits them into sentences and performs minor cleaning on them. If you want to split the comments into sentences or clean them yourself, you can use the *split_comments* or *clean_comments* domain_params when creating the domain to disable this functionality in KPS (see details [here](kps_parameters.pdf)).

Note that:
* Comments_ids must be unique strings comprised of alphanumeric characters, spaces or underscores.
* The number of comments_ids must match the number of comments_texts.
* Comments_texts must not be longer than 3000 characters.
* Uploading the same comment several times (same domain + comment_id + comment_text) is not a problem and the comment is only processed once.
* Uploading the same comment_id with a different comment_text will raise an exception. 
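
These constraints can be checked locally before uploading. The following is a minimal sketch: the function name is hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits.

```python
import re

_ID_RE = re.compile(r"[A-Za-z0-9_ ]+")  # assumption: ASCII alphanumerics only

def check_upload_inputs(comments_ids, comments_texts, max_chars=3000):
    """Return a list of problems found; an empty list means the inputs look uploadable."""
    problems = []
    if len(comments_ids) != len(comments_texts):
        problems.append("number of ids and texts differ")
    if len(set(comments_ids)) != len(comments_ids):
        problems.append("ids are not unique")
    for cid in comments_ids:
        if not isinstance(cid, str) or not _ID_RE.fullmatch(cid):
            problems.append(f"invalid id: {cid!r}")
    for cid, text in zip(comments_ids, comments_texts):
        if len(text) > max_chars:
            problems.append(f"text too long for id {cid!r}")
    return problems

print(check_upload_inputs(["1", "2"], ["Fix the traffic.", "More parks please."]))  # []
```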

After being uploaded to the domain, the comments can take some time to be processed. 
The method runs in a synchronous manner and returns only after all the comments are processed. In the meantime, the status of the processing is printed on screen.
For information about running KPS asynchronously, see Section 7.

In [14]:
comments_texts_2016 = list(comments_2016_sample_df['text'])
comments_ids_2016 = list(comments_2016_sample_df['id'].astype(str))
keypoints_client.upload_comments(domain=domain, comments_ids=comments_ids_2016, comments_texts=comments_texts_2016)

2023-06-15 11:51:47,430 [INFO] keypoints_client.py 158: uploading 1000 comments in batches
2023-06-15 11:51:47,432 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-15 11:51:48,589 [INFO] keypoints_client.py 174: uploaded 1000 comments, out of 1000
2023-06-15 11:51:48,591 [INFO] keypoints_client.py 137: waiting for the comments to be processed
2023-06-15 11:51:48,592 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-15 11:51:49,223 [INFO] keypoints_client.py 187: domain: austin_demo, comments status: {'processed_comments': 0, 'processed_sentences': 0, 'pending_comments': 1000}
2023-06-15 11:51:59,229 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-15 11:51:59,841 [INFO] keypoints_client.py 187: domain: austin_demo, comments status: {'processed_com

To examine the processed data, we can download the processed sentences and save them into a csv:

In [15]:
sentences_df = keypoints_client.get_sentences_for_domain(domain=domain)
sentences_df.to_csv(f"kps_results/{domain}_sentences.csv")

2023-06-15 11:52:10,476 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/data
2023-06-15 11:52:12,495 [INFO] keypoints_client.py 478: returning 1813 sentences for domain austin_demo


### 4.3 Run a KPS job
Run a *Key Point Summarization* job using the **keypoints_client.run_kps_job(domain=domain)** method. 
This method receives:
* The domain.
* Optional *comment_ids*: by default, the summarization is performed over all comments in the domain. To run over a subset of the comments (for example, to split the data by geography, user type or time frame), pass a list of their comments_ids.
* Optional *run_params*: a dictionary with various parameters for customizing the job (see Section 4.4).
* Optional *stance*: unlike in the run_full_kps_flow method presented in Section 2, here no stance analysis is performed by default. See Section 4.5 for stance customization.
* Optional *description*: a description of the job to appear in the user report (see Section 5.1).  

The system extracts the key points from the input comments, and matches each sentence in the comments with all its matching key points.
The job runs in a synchronous manner, prints the progress to the screen and finally returns the KpsResult.   
For information about running KPS asynchronously, see Section 7. 
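
As an illustration of running over a subset, the sketch below selects the ids of all comments from one council district, using the `Council_District` column of the dataset. The helper name and the toy records are illustrative only.

```python
def ids_for_district(records, district):
    """Return the ids (as strings) of all comments from one council district."""
    return [str(r["id"]) for r in records if r["Council_District"] == district]

records = [
    {"id": 1, "year": 2016, "Council_District": 7},
    {"id": 2, "year": 2016, "Council_District": 3},
    {"id": 3, "year": 2016, "Council_District": 7},
]
district_7_ids = ids_for_district(records, 7)
print(district_7_ids)  # ['1', '3']
```

With the real data, `comments_2016_sample_df.to_dict("records")` produces records in this shape. The resulting list could then be passed as the optional comment ids argument of `run_kps_job`; the exact keyword name should be checked against the SDK.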

In [16]:
kps_result_2016_no_stance = keypoints_client.run_kps_job(domain = domain)

2023-06-15 11:52:13,067 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-15 11:52:13,661 [INFO] keypoints_client.py 187: domain: austin_demo, comments status: {'processed_comments': 1000, 'processed_sentences': 1813, 'pending_comments': 0}
2023-06-15 11:52:13,662 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 11:52:14,334 [INFO] keypoints_client.py 345: started a kp summarization job - domain: austin_demo, stance: no-stance, run_params: None, job_id: 648ad13ee9ea39ceca145f87
2023-06-15 11:52:14,334 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 11:52:14,892 [INFO] keypoints_client.py 621: job_id 648ad13ee9ea39ceca145f87 is pending
2023-06-15 11:52:44,899 [INFO] keypoints_client.py 66: client calls service (get): https://keypoi

Stage 1/2: |--------------------------------------------------| 0.0% Complete



2023-06-15 11:53:15,553 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 11:53:16,178 [INFO] keypoints_client.py 625: job_id 648ad13ee9ea39ceca145f87 is running, progress: {'total_stages': 2, 'stage_1': {'inferred_batches': 2, 'total_batches': 17, 'batch_size': 2000}}


Stage 1/2: |█████---------------------------------------------| 11.8% Complete



2023-06-15 11:53:46,188 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 11:53:46,807 [INFO] keypoints_client.py 625: job_id 648ad13ee9ea39ceca145f87 is running, progress: {'total_stages': 2, 'stage_1': {'inferred_batches': 16, 'total_batches': 17, 'batch_size': 2000}}


Stage 1/2: |███████████████████████████████████████████████---| 94.1% Complete



2023-06-15 11:54:16,811 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 11:54:17,424 [INFO] keypoints_client.py 625: job_id 648ad13ee9ea39ceca145f87 is running, progress: {'total_stages': 2, 'stage_1': {'inferred_batches': 17, 'total_batches': 17, 'batch_size': 2000}, 'stage_2': {'inferred_batches': 0, 'total_batches': 2, 'batch_size': 2000}}


Stage 2/2: |--------------------------------------------------| 0.0% Complete



2023-06-15 11:54:47,438 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 11:54:49,826 [INFO] keypoints_client.py 628: job_id 648ad13ee9ea39ceca145f87 is done, returning result


Let's print the top 20 key points in the results: 

In [17]:
kps_result_2016_no_stance.print_result(n_sentences_per_kp = 3, title = "Austin sample 2016", n_top_kps = 20)

Austin sample 2016 results, stance: no-stance 
n_comments: 1000
Coverage (all comments): 73.60
Displaying 20 key points out of 55:
188 - Improve Austin's traffic!
	- Fix the traffic - you keep encouraging growth of the city without the infrastructure to
	  support that growth.
	- Get the bikes off the streets and improve traffic flow in and out of the city.
	- Austin should use dedicated bus lanes and bicycle lanes like Atlanta, Georgia and solve
	  traffic issues.
168 - We really need to improve our public transportation.
	- Need to rework zoning to improve traffic and add more public transit, especially to
	  suburbs.
	- The city needs a more comprehensive and better connected transportation network
	  including bus, rail, bike, pedestrian access.
	- Austin needs to get serious about alternatives to driving including real mass transit,
	  bike, and pedestrian facilities.
168 - fix the roads so automobiles can travel easier.
	- Please fix the traffic flow in and around this city.
	- N

### 4.4 Modify the run_params to customize the summary
Each domain has a cache that stores all the intermediate results calculated during the summarization process. Therefore, modifying the run_params and running another summarization is usually faster than the first run.

Full documentation of the supported *run_params* can be found [here](kps_parameters.pdf).
Some of the notable options:
* By default, key points are extracted automatically. To provide the key points yourself and match all the sentences to them, pass the key_points parameter: **run_params['key_points'] = [...]**. This enables a human-in-the-loop mode of work: first extract key points automatically, then edit them manually (refine imperfect key points, remove duplicates and add missing ones), and then run again with the edited key points as the given set.
* It is also possible to provide key points and let KPS add missing key points. To do so, pass the key points to the key_point_candidates parameter: **run_params['key_point_candidates'] = [...]** (see Section 6.2 for a detailed example).
* Change the lengths of the required key points and the sentences participating in the summarization.
* Change the minimal number of required matching sentences per key point. 
* The **mapping_policy** is used when mapping all sentences to the final key points. The default value is **NORMAL**. Changing it to **STRICT** causes only sentence and key point pairs with very high matching confidence to be considered matched, increasing precision but potentially decreasing coverage. Changing it to **LOOSE** does the opposite and also matches pairs with lower confidence.
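
The options above are all plain dictionary entries, so a run_params dictionary can be assembled with a small helper. This is a sketch: the helper name is hypothetical, only the three keys named above are used, and it does not check whether the service accepts every combination of them.

```python
MAPPING_POLICIES = {"STRICT", "NORMAL", "LOOSE"}

def build_run_params(key_points=None, key_point_candidates=None, mapping_policy=None):
    """Assemble a run_params dict, including only the options that were set."""
    run_params = {}
    if key_points is not None:
        run_params["key_points"] = list(key_points)
    if key_point_candidates is not None:
        run_params["key_point_candidates"] = list(key_point_candidates)
    if mapping_policy is not None:
        if mapping_policy not in MAPPING_POLICIES:
            raise ValueError(f"unknown mapping_policy: {mapping_policy!r}")
        run_params["mapping_policy"] = mapping_policy
    return run_params

# Human-in-the-loop: key points extracted automatically, then edited by hand.
edited_key_points = ["Improve traffic flow.", "Property taxes are too high."]
print(build_run_params(key_points=edited_key_points, mapping_policy="STRICT"))
```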

Let's run with the 'LOOSE' mapping policy:

In [18]:
run_params = {'mapping_policy':'LOOSE'}
kps_result_loose = keypoints_client.run_kps_job(domain=domain, run_params=run_params)
kps_result_loose.print_result(n_sentences_per_kp=3, title='Austin sample 2016 LOOSE', n_top_kps=20)

2023-06-15 11:56:20,944 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-15 11:56:21,540 [INFO] keypoints_client.py 187: domain: austin_demo, comments status: {'processed_comments': 1000, 'processed_sentences': 1813, 'pending_comments': 0}
2023-06-15 11:56:21,542 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 11:56:22,236 [INFO] keypoints_client.py 345: started a kp summarization job - domain: austin_demo, stance: no-stance, run_params: {'mapping_policy': 'LOOSE'}, job_id: 648ad236e9ea39ceca145f88
2023-06-15 11:56:22,241 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 11:56:22,829 [INFO] keypoints_client.py 621: job_id 648ad236e9ea39ceca145f88 is pending
2023-06-15 11:56:52,835 [INFO] keypoints_client.py 66: client calls servic

Austin sample 2016 LOOSE results, stance: no-stance 
n_comments: 1000
Coverage (all comments): 81.80
Displaying 20 key points out of 59:
211 - fix the roads so automobiles can travel easier.
	- Please fix the traffic flow in and around this city.
	- Need to fix traffic congestion in the city highways urgently.
	- Make ridesharing more accessible, fix the highways by making turnarounds and better
	  mergers/ramps, and stop the gentrification that is killing this city.
209 - We really need to improve our public transportation.
	- Need to rework zoning to improve traffic and add more public transit, especially to
	  suburbs.
	- The city needs a more comprehensive and better connected transportation network
	  including bus, rail, bike, pedestrian access.
	- Austin needs to get serious about alternatives to driving including real mass transit,
	  bike, and pedestrian facilities.
196 - Improve Austin's traffic!
	- Fix the traffic - you keep encouraging growth of the city without the infrast

Changing the mapping policy to **LOOSE** increased the comment coverage from 74% to 82%.
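
The trade-off between the policies can be illustrated with a toy model in which a comment counts as matched when its best match score reaches a per-policy cut-off. The service's actual cut-offs and matching logic are not documented here; the thresholds and scores below are invented purely for illustration.

```python
# Hypothetical confidence thresholds, invented for illustration only.
THRESHOLDS = {"STRICT": 0.9, "NORMAL": 0.7, "LOOSE": 0.5}

def coverage_for_policy(best_scores, policy):
    """Percentage of comments whose best match score reaches the policy threshold."""
    threshold = THRESHOLDS[policy]
    matched = sum(1 for score in best_scores if score >= threshold)
    return 100.0 * matched / len(best_scores)

best_scores = [0.95, 0.85, 0.75, 0.6, 0.55, 0.4, 0.3, 0.2]  # toy data
for policy in ("STRICT", "NORMAL", "LOOSE"):
    print(policy, coverage_for_policy(best_scores, policy))
```

In this toy model a looser policy can only add matches, so coverage never decreases, at the cost of admitting lower-confidence pairs.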

### 4.5 Add stance analysis to the summarization

In many use cases (surveys, customer feedback, etc.) the comments have positive and/or negative stances, and it is useful to create a KPS for each stance separately. Most stance detection models don't perform well on survey data, since the comments contain many suggestions. Models tend to classify these suggestions as positive, although the user is actually pointing out something to improve. To that end, we trained a stance model that handles suggestions well and classifies each sentence as 'Positive', 'Negative', 'Neutral' or 'Suggestion'. We treat suggestions as negative and run two separate summarizations: the first over 'Positive' sentences (pro) and the second over 'Negative' and 'Suggestion' sentences (con).

This has the following advantages:

* Generates separate positive and negative key points that show clearly what works well and what needs to be improved.
* Filters out neutral sentences, which usually don't contain valuable information.
* Helps the matching model avoid stance mistakes (matching a positive sentence to a negative key point and vice-versa).
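
The routing of sentences into the two runs can be sketched as follows. This is an illustration, not the service's implementation: it assumes that the label with the highest score is selected, which is consistent with the `selected_stance` and `stance_conf` values shown in Section 3.3. The labels 'neg' and 'sug' appear in that table; 'pos' and 'neut' are assumed by analogy with the score column names.

```python
# 'pos' and 'neut' are assumed labels; 'neg' and 'sug' appear in result_df.
STANCE_TO_RUN = {"pos": "pro", "neg": "con", "sug": "con", "neut": None}

def stance_bucket(pos_score, neg_score, sug_score, neut_score):
    """Pick the highest-scoring label and map it to 'pro', 'con' or None (filtered out)."""
    scores = {"pos": pos_score, "neg": neg_score, "sug": sug_score, "neut": neut_score}
    label = max(scores, key=scores.get)
    return label, STANCE_TO_RUN[label]

# Scores of the first result_df row ("Improve traffic flow."):
print(stance_bucket(0.00196, 0.00375, 0.988809, 0.005481))  # ('sug', 'con')
```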

In some cases, we might want to run over a single stance. For example, if we are only interested in points for improvement.
In order to run on a single stance, use the **stance** parameter in **run_kps_job**. The options are "pro", "con", or "no-stance" (default). For example, to run only on the "pro" sentences, call **keypoints_client.run_kps_job(domain=domain, stance="pro")**.

To generate a merged pro and con result use the **run_kps_job_both_stances** method.
This method starts two separate jobs simultaneously, one for *pro* and one for *con*. It then unifies the results and returns the merged result object (similar to the default behaviour of **run_full_kps_flow**).
This method receives: 
* The domain.
* comments_ids - optional, as in *run_kps_job*.
* description - optional, as in *run_kps_job*; the stance is appended to the description of each job.
* run_params_pro - optional run_params to be sent to the *pro* job.
* run_params_con - optional run_params to be sent to the *con* job.

In [19]:
kps_result_2016_merged = keypoints_client.run_kps_job_both_stances(domain)

2023-06-15 11:58:08,912 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-15 11:58:09,558 [INFO] keypoints_client.py 187: domain: austin_demo, comments status: {'processed_comments': 1000, 'processed_sentences': 1813, 'pending_comments': 0}
2023-06-15 11:58:09,563 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 11:58:10,283 [INFO] keypoints_client.py 345: started a kp summarization job - domain: austin_demo, stance: pro, run_params: {'stance': 'PRO'}, job_id: 648ad2a2e9ea39ceca145f89
2023-06-15 11:58:10,286 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-15 11:58:10,925 [INFO] keypoints_client.py 187: domain: austin_demo, comments status: {'processed_comments': 1000, 'processed_sentences': 1813, 'pending_comments': 0}
2023-06-15 11:58:10

Stage 1/2: |--------------------------------------------------| 0.0% Complete



2023-06-15 11:59:14,893 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 11:59:15,514 [INFO] keypoints_client.py 625: job_id 648ad2a3e9ea39ceca145f8a is running, progress: {'total_stages': 2, 'stage_1': {'inferred_batches': 4, 'total_batches': 4, 'batch_size': 2000}, 'stage_2': {'inferred_batches': 0, 'total_batches': 1, 'batch_size': 2000}}


Stage 2/2: |--------------------------------------------------| 0.0% Complete



2023-06-15 11:59:45,521 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 11:59:47,655 [INFO] keypoints_client.py 628: job_id 648ad2a3e9ea39ceca145f8a is done, returning result
2023-06-15 11:59:47,781 [INFO] KpsResult.py 525: Merging positive and negative KpsResults.


## 5. Jobs management

### 5.1 User report
The user report stores all the information about existing domains and all past and present KPS jobs. 
To fetch it and print to screen:

In [None]:
report = keypoints_client.get_full_report()
keypoints_client.print_report(report)

### 5.2 job_id
Each job has a unique job_id, which is useful for job management and can be obtained in several ways:
* It's printed to the screen when the job starts and in every progress update.
* From the user report.
* When running asynchronously (see Section 7).
* From the KpsResult object, using the **get_stance_to_job_id()** method. Note that the KpsResult can either store the information from a single job (when running on a single stance or no stance) or two jobs (for combined pro and con results). 

In [20]:
print(kps_result_2016_merged.get_stance_to_job_id())

{'pro': '648ad2a2e9ea39ceca145f89', 'con': '648ad2a3e9ea39ceca145f8a'}


In [21]:
print(kps_result_2016_no_stance.get_stance_to_job_id())

{'no-stance': '648ad13ee9ea39ceca145f87'}


### 5.3 Canceling a job
Simply exiting the program after the job is sent does not cancel the job: it keeps running on the server, consuming resources. In order to cancel a job, use: 
* **keypoints_client.cancel_kp_extraction_job(\<job_id\>)**

It is also possible to stop all jobs in a domain, or even all jobs in all domains (this might be simpler since there is no need for the job_id):

* **keypoints_client.cancel_all_extraction_jobs_for_domain(\<domain\>)**
* **keypoints_client.cancel_all_extraction_jobs_all_domains()**


### 5.4 Fetching the results of a previous job
If the program terminated unexpectedly after the job was sent, you can still fetch the results using:
* **kps_result = keypoints_client.get_results_from_job_id(\<job_id\>)**

## 6. Comparative analysis with KPS

The KPS service allows us to easily perform comparisons between subsets of our data and to perform trend analysis over data collected at different times. Let's explore two of these options: 

### 6.1 Compare comment subsets
So far, we ran over all the sampled comments for 2016. Now, let's say we want to perform the analysis over the same data by district, and compare the feedback of the residents of district 7 with the feedback of the residents of district 10. All we need is our previous KpsResult and the comment_ids of each subset:

In [22]:
comment_ids_district_7 = list(comments_2016_sample_df[comments_2016_sample_df["Council_District"]==7]["id"].astype(str))
comment_ids_district_10 = list(comments_2016_sample_df[comments_2016_sample_df["Council_District"]==10]["id"].astype(str))

Now, we can compare the full result and the result from each district, using the method **compare_with_comment_subsets()**.
This method receives a dictionary with mappings from subset names to sets of comment ids, and performs the comparison between the full result and the subset results. For the full result and for each of the subsets, you can see the number and percentage of the comments that match each key point. Let's create the comparison df and print the top 20 key points and the summary row.

In [23]:
subsets_dict = {"district_7":comment_ids_district_7, "district_10":comment_ids_district_10}
comparison_df = kps_result_2016_no_stance.compare_with_comment_subsets(subsets_dict)
pd.concat([comparison_df.head(20),comparison_df.tail(1)])

| | key point | full_n_comments | full_percent | district_10_n_comments | district_10_percent | district_7_n_comments | district_7_percent |
|---|---|---|---|---|---|---|---|
| 0 | Improve Austin's traffic! | 188 | 18.80% | 39 | 25.66% | 20 | 16.26% |
| 1 | fix the roads so automobiles can travel easier. | 168 | 16.80% | 29 | 19.08% | 19 | 15.45% |
| 2 | We really need to improve our public transport... | 168 | 16.80% | 29 | 19.08% | 18 | 14.63% |
| 3 | Improve affordable housing/living. | 144 | 14.40% | 17 | 11.18% | 26 | 21.14% |
| 4 | PROPERTY TAXES ARE TOO HIGH. | 89 | 8.90% | 16 | 10.53% | 15 | 12.20% |
| 5 | Provide public transportation to prevent traff... | 88 | 8.80% | 15 | 9.87% | 9 | 7.32% |
| 6 | Please plan better for growth on our roadways! | 83 | 8.30% | 14 | 9.21% | 7 | 5.69% |
| 7 | Buying a home in S Austin is prohibitive. | 63 | 6.30% | 4 | 2.63% | 15 | 12.20% |
| 8 | Water costs too much. | 49 | 4.90% | 11 | 7.24% | 7 | 5.69% |
| 9 | Better pedestrian and biking lifestyle options. | 44 | 4.40% | 4 | 2.63% | 5 | 4.07% |


From the percentage displayed in the comparison, it is apparent that the residents of district 10 are more concerned about the traffic issues in Austin, while the residents of district 7 care more about affordable housing.

Using this method, we can compare results over subsets of the data in the same domain with a single KPS job. The subsets can be data from different geographies, different organizations, different times, different users (e.g. promoters/detractors), etc.

This has several advantages over running a separate job for each subset:
- The service only needs to be called once.
- By running a single job, we generate a single key points list which makes it possible to compare between the subsets.
- KPS generates better results for larger datasets, so running over the full data yields richer key points with better coverage. 
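
To read the comparison table correctly, it helps to see how the percentages relate to the counts. Each percentage appears to be the number of matched comments divided by the size of the corresponding subset. The sketch below reproduces the *"Improve Austin's traffic!"* row under that assumption. The full size of 1000 is the sample size used in this tutorial, while the district sizes (152 and 123) are inferred from the displayed counts and percentages and are not stated anywhere in the output. The function name is hypothetical.

```python
def percent_of_subset(n_matched, subset_size):
    """Share of a subset's comments matched to a key point, formatted like the table."""
    return f"{100 * n_matched / subset_size:.2f}%"


# 1000 is the sample size used in this tutorial; 152 and 123 are inferred
# from the displayed counts and percentages (an assumption, not SDK output).
sizes = {"full": 1000, "district_10": 152, "district_7": 123}

# Counts copied from the "Improve Austin's traffic!" row of the comparison table.
traffic_counts = {"full": 188, "district_10": 39, "district_7": 20}

for name, count in traffic_counts.items():
    print(name, percent_of_subset(count, sizes[name]))
```

Because each percentage is relative to its own subset, the columns can be compared directly even though the districts differ in size.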

### 6.2 Run KPS incrementally 

A year has passed, and we have collected additional data (from 2017). We would like to analyze the new data and compare it to the data from 2016. 
We can upload all the comments from 2017 to the same domain ("austin_demo").

In [24]:
comments_2017_df = comments_df[comments_df['year'] == 2017]

sample_size = 1000
comments_2017_sample_df = comments_2017_df.sample(n = sample_size, random_state = 1)

domain = 'austin_demo'
comments_texts_2017 = list(comments_2017_sample_df['text'])
comments_ids_2017 = list(comments_2017_sample_df['id'].astype(str))
keypoints_client.upload_comments(domain=domain, comments_ids=comments_ids_2017, comments_texts=comments_texts_2017)

2023-06-15 12:09:57,048 [INFO] keypoints_client.py 158: uploading 1000 comments in batches
2023-06-15 12:09:57,051 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-15 12:09:58,219 [INFO] keypoints_client.py 174: uploaded 1000 comments, out of 1000
2023-06-15 12:09:58,221 [INFO] keypoints_client.py 137: waiting for the comments to be processed
2023-06-15 12:09:58,222 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-15 12:09:58,827 [INFO] keypoints_client.py 187: domain: austin_demo, comments status: {'processed_comments': 1000, 'processed_sentences': 1813, 'pending_comments': 1000}
2023-06-15 12:10:08,831 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-15 12:10:09,472 [INFO] keypoints_client.py 187: domain: austin_demo, comments status: {'process

We can now run a new summarization job over all the data in the domain, as we did before, and automatically extract new key points. Then we can use the **compare_with_comment_subsets** method to compare between the data subsets, as we learned in the previous section. We can assume that some key points will be identical to the key points extracted from the 2016 data, some will be similar and some key points will be new.

A better option is to run a new summarization on the 2017 comments only, but provide the key points from the 2016 summarization and let *Key Point Summarization* add new key points from the 2017 data where needed. One benefit of this approach is that the new result will mostly use the 2016 key points, so it stays consistent with our previous results. Another major benefit is run time: the 2016 data was already analyzed with these key points, so only the new data needs to be processed. The 2016 key points can be provided via **run_params['key_point_candidates'] = [...]**, passing a list of strings, or via **run_params['key_point_candidates_by_job_ids'] = [\<job_id1\>,...]**, passing a list of previous job_ids, in which case KPS takes the key points from those jobs' results automatically.

For simplicity, we'll run over the result without the stance analysis. We can also use the incremental approach when running on both stances: we will need to provide the job_id of the positive summarization of 2016 in the run_params_pro and the job_id of negative summarization of 2016 in the run_params_con when running on the data from 2017.
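
For reference, the both-stances variant could be prepared as sketched below. The parameter names (`run_params_pro`, `run_params_con`, `comments_ids`, `key_point_candidates_by_job_ids`) are the ones described in this tutorial. The helper `incremental_run_params` is hypothetical, and the final call is left as a comment because it needs a live client.

```python
def incremental_run_params(stance_to_job_id):
    """Build per-stance run_params that reuse the key points of earlier jobs."""
    return {
        "run_params_pro": {"key_point_candidates_by_job_ids": [stance_to_job_id["pro"]]},
        "run_params_con": {"key_point_candidates_by_job_ids": [stance_to_job_id["con"]]},
    }


# Job ids as printed earlier for the merged 2016 result.
stance_to_job_id_2016 = {
    "pro": "648ad2a2e9ea39ceca145f89",
    "con": "648ad2a3e9ea39ceca145f8a",
}
params = incremental_run_params(stance_to_job_id_2016)
print(params["run_params_pro"])
print(params["run_params_con"])

# With a live client, these could then be passed along (not executed here):
# keypoints_client.run_kps_job_both_stances(domain, comments_ids=comments_ids_2017, **params)
```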

First, let's extract the job id from the result:

In [25]:
stance_to_job_id = kps_result_2016_no_stance.get_stance_to_job_id()
print(stance_to_job_id)
job_id_2016_no_stance = stance_to_job_id["no-stance"]

{'no-stance': '648ad13ee9ea39ceca145f87'}


We can use the **comments_ids** parameter in the **run_kps_job** method to run on a subset of the comments in the domain. Let's do that and run the summarization over the comments from 2017 independently. We will provide the key points from 2016 as candidates, since we want to be able to compare the two results:

In [26]:
run_params = {'key_point_candidates_by_job_ids': [job_id_2016_no_stance]}
kps_result_2017_no_stance = keypoints_client.run_kps_job(domain, run_params=run_params, comments_ids = comments_ids_2017)

2023-06-15 12:10:46,501 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-15 12:10:47,105 [INFO] keypoints_client.py 187: domain: austin_demo, comments status: {'processed_comments': 2000, 'processed_sentences': 3792, 'pending_comments': 0}
2023-06-15 12:10:47,109 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 12:10:48,169 [INFO] keypoints_client.py 345: started a kp summarization job - domain: austin_demo, stance: no-stance, run_params: {'key_point_candidates_by_job_ids': ['648ad13ee9ea39ceca145f87']}, job_id: 648ad598e9ea39ceca145f8d
2023-06-15 12:10:48,170 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 12:10:48,715 [INFO] keypoints_client.py 621: job_id 648ad598e9ea39ceca145f8d is pending
2023-06-15 12:11:18,706 [INFO] keypo

Stage 1/3: |--------------------------------------------------| 0.0% Complete



2023-06-15 12:12:50,661 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 12:12:51,292 [INFO] keypoints_client.py 625: job_id 648ad598e9ea39ceca145f8d is running, progress: {'total_stages': 3, 'stage_0': {'inferred_batches': 11, 'total_batches': 11, 'batch_size': 2000}, 'stage_1': {'inferred_batches': 1, 'total_batches': 12, 'batch_size': 2000}}


Stage 1/3: |████----------------------------------------------| 8.3% Complete



2023-06-15 12:13:21,293 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 12:13:21,911 [INFO] keypoints_client.py 625: job_id 648ad598e9ea39ceca145f8d is running, progress: {'total_stages': 3, 'stage_0': {'inferred_batches': 11, 'total_batches': 11, 'batch_size': 2000}, 'stage_1': {'inferred_batches': 11, 'total_batches': 12, 'batch_size': 2000}}


Stage 1/3: |█████████████████████████████████████████████-----| 91.7% Complete



2023-06-15 12:13:51,915 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 12:13:52,550 [INFO] keypoints_client.py 625: job_id 648ad598e9ea39ceca145f8d is running, progress: {'total_stages': 3, 'stage_0': {'inferred_batches': 11, 'total_batches': 11, 'batch_size': 2000}, 'stage_1': {'inferred_batches': 12, 'total_batches': 12, 'batch_size': 2000}, 'stage_2': {'inferred_batches': 0, 'total_batches': 3, 'batch_size': 2000}}


Stage 2/3: |--------------------------------------------------| 0.0% Complete



2023-06-15 12:14:22,555 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 12:14:23,192 [INFO] keypoints_client.py 625: job_id 648ad598e9ea39ceca145f8d is running, progress: {'total_stages': 3, 'stage_0': {'inferred_batches': 11, 'total_batches': 11, 'batch_size': 2000}, 'stage_1': {'inferred_batches': 12, 'total_batches': 12, 'batch_size': 2000}, 'stage_2': {'inferred_batches': 2, 'total_batches': 3, 'batch_size': 2000}}


Stage 2/3: |█████████████████████████████████-----------------| 66.7% Complete



2023-06-15 12:14:53,198 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 12:14:55,464 [INFO] keypoints_client.py 628: job_id 648ad598e9ea39ceca145f8d is done, returning result


Now, we can compare the kps_result_2016_no_stance with the kps_result_2017_no_stance using the **compare_with_other_results** method. This method receives the title for the current result and a dictionary mapping result names to KpsResult objects, and returns the comparison table. If the comparison is with a single other result, the change percent is also displayed. 

In [27]:
other_results_dict = {"2017": kps_result_2017_no_stance}
comparison_df = kps_result_2016_no_stance.compare_with_other_results(this_title="2016", other_results_dict=other_results_dict)
comparison_df

| | key point | 2016_n_comments | 2016_percent | 2017_n_comments | 2017_percent | change_percent |
|---|---|---|---|---|---|---|
| 0 | Improve Austin's traffic! | 188 | 18.80% | 137 | 13.70% | -5.10% |
| 1 | fix the roads so automobiles can travel easier. | 168 | 16.80% | 143 | 14.30% | -2.50% |
| 2 | We really need to improve our public transport... | 168 | 16.80% | 134 | 13.40% | -3.40% |
| 3 | Improve affordable housing/living. | 144 | 14.40% | 204 | 20.40% | 6.00% |
| 4 | PROPERTY TAXES ARE TOO HIGH. | 89 | 8.90% | 99 | 9.90% | 1.00% |
| 5 | Provide public transportation to prevent traff... | 88 | 8.80% | 72 | 7.20% | -1.60% |
| 6 | Please plan better for growth on our roadways! | 83 | 8.30% | 71 | 7.10% | -1.20% |
| 7 | Buying a home in S Austin is prohibitive. | 63 | 6.30% | 93 | 9.30% | 3.00% |
| 8 | Water costs too much. | 49 | 4.90% | 47 | 4.70% | -0.20% |
| 9 | Better pedestrian and biking lifestyle options. | 44 | 4.40% | 58 | 5.80% | 1.40% |


In the first 54 rows we can see the key points from 2016 applied to the data from 2017. Then, key points from 2017 that are not covered by the 2016 data are added, e.g. *"Focus on basic services"* and *"Austin is not managing growth well."*.
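
A note on reading the *change_percent* column: judging by the displayed values, it is the difference in percentage points between the two results (the other result minus the current one), not a relative change. The sketch below checks this reading against three rows of the table. The function name is hypothetical, and both samples contain 1000 comments, as set by `sample_size` in this tutorial.

```python
def change_in_percentage_points(n_this, total_this, n_other, total_other):
    """Difference between two match rates, in percentage points, formatted like the table."""
    this_percent = 100 * n_this / total_this
    other_percent = 100 * n_other / total_other
    return f"{other_percent - this_percent:.2f}%"


# (key point, 2016 count, 2017 count) copied from the comparison table.
rows = [
    ("Improve Austin's traffic!", 188, 137),
    ("Improve affordable housing/living.", 144, 204),
    ("Water costs too much.", 49, 47),
]
for key_point, n_2016, n_2017 in rows:
    print(key_point, change_in_percentage_points(n_2016, 1000, n_2017, 1000))
```

Under this reading, the share of comments about traffic dropped by 5.1 percentage points between 2016 and 2017, while the share about affordable housing rose by 6.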

## 7. Running KPS asynchronously

It is also possible to upload comments and run KPS jobs asynchronously. This can be useful when you want to start several jobs simultaneously, and then later collect the results.

Let's create a new domain for the sake of the demonstration. Note that this is not required, as async and sync calls can be used on the same domain.

In [28]:
domain = 'austin_demo_async'
keypoints_client.delete_domain_cannot_be_undone(domain=domain)
keypoints_client.create_domain(domain=domain, domain_params={})

2023-06-15 12:19:18,572 [INFO] keypoints_client.py 66: client calls service (delete): https://keypoint-matching-backend.debater.res.ibm.com/data
2023-06-15 12:19:19,121 [ERROR] keypoints_client.py 77: There is a problem with the request (422): user: db0a12 doesn't have domain: austin_demo_async
2023-06-15 12:19:19,125 [INFO] keypoints_client.py 405: domain: austin_demo_async doesn't exist.
2023-06-15 12:19:19,126 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/domains
2023-06-15 12:19:20,092 [INFO] keypoints_client.py 113: created domain: austin_demo_async with domain_params: {}


### 7.1 Uploading comments asynchronously

In order to start loading comments, use the *upload_comments_async* method: 

In [29]:
comments_texts = list(comments_2016_sample_df['text'])
comments_ids = list(comments_2016_sample_df['id'].astype(str))
keypoints_client.upload_comments_async(domain=domain, comments_ids=comments_ids, comments_texts=comments_texts)

2023-06-15 12:19:25,793 [INFO] keypoints_client.py 158: uploading 1000 comments in batches
2023-06-15 12:19:25,798 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-15 12:19:26,950 [INFO] keypoints_client.py 174: uploaded 1000 comments, out of 1000


The method uploads the comments and returns immediately. We must wait until all comments finish processing before starting a KPS job. This can be checked via the **are_all_comments_processed(domain=domain)** method, which prints the upload status and returns True when the domain is ready for running jobs:

In [30]:
print(keypoints_client.are_all_comments_processed(domain))

2023-06-15 12:19:30,042 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-15 12:19:30,648 [INFO] keypoints_client.py 187: domain: austin_demo_async, comments status: {'processed_comments': 0, 'processed_sentences': 0, 'pending_comments': 1000}


False


You can also use the **wait_till_all_comments_are_processed(domain=domain)** method, which returns only after the comments are processed:

In [31]:
keypoints_client.wait_till_all_comments_are_processed(domain=domain) 

2023-06-15 12:19:40,219 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-15 12:19:40,851 [INFO] keypoints_client.py 187: domain: austin_demo_async, comments status: {'processed_comments': 0, 'processed_sentences': 0, 'pending_comments': 1000}
2023-06-15 12:19:50,852 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-15 12:19:51,455 [INFO] keypoints_client.py 187: domain: austin_demo_async, comments status: {'processed_comments': 0, 'processed_sentences': 0, 'pending_comments': 1000}
2023-06-15 12:20:01,463 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-15 12:20:02,101 [INFO] keypoints_client.py 187: domain: austin_demo_async, comments status: {'processed_comments': 1000, 'processed_sentences': 1813, 'pending_comments': 0}
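
If you prefer to control the waiting yourself (for example, to add a timeout), a polling loop around **are_all_comments_processed** is enough. The following is a minimal sketch, not the SDK's implementation: the helper `wait_until` and its timeout are assumptions, and the 10-second default simply mirrors the interval seen in the log above.

```python
import time


def wait_until(is_ready, poll_seconds=10.0, timeout_seconds=600.0, sleep=time.sleep):
    """Call is_ready() repeatedly until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while not is_ready():
        if time.monotonic() >= deadline:
            return False  # gave up before the condition became true
        sleep(poll_seconds)
    return True


# Dry run with a stand-in predicate that becomes ready on the third check.
answers = iter([False, False, True])
ready = wait_until(lambda: next(answers), poll_seconds=0.01, timeout_seconds=5)
print(ready)

# With a live client (not executed here):
# wait_until(lambda: keypoints_client.are_all_comments_processed(domain))
```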


### 7.2 Running KPS jobs asynchronously

In order to start a job asynchronously, use the **run_kps_job_async** method. This method receives the same arguments as **run_kps_job**, but returns a future object as soon as the job is sent to the server:

In [32]:
future = keypoints_client.run_kps_job_async(domain=domain)

2023-06-15 12:20:06,118 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-15 12:20:06,731 [INFO] keypoints_client.py 187: domain: austin_demo_async, comments status: {'processed_comments': 1000, 'processed_sentences': 1813, 'pending_comments': 0}
2023-06-15 12:20:06,736 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 12:20:07,450 [INFO] keypoints_client.py 345: started a kp summarization job - domain: austin_demo_async, stance: no-stance, run_params: None, job_id: 648ad7c7e9ea39ceca145f90


Use the returned future and wait till results are available using the **kps_result = future.get_result()** method. The method waits for the job to finish and eventually returns the result.

In [33]:
kps_result_async = future.get_result(high_verbosity=True)

2023-06-15 12:20:24,794 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 12:20:25,420 [INFO] keypoints_client.py 625: job_id 648ad7c7e9ea39ceca145f90 is running, progress: not updated yet
2023-06-15 12:20:55,425 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 12:20:56,036 [INFO] keypoints_client.py 625: job_id 648ad7c7e9ea39ceca145f90 is running, progress: {'total_stages': 2, 'stage_1': {'inferred_batches': 0, 'total_batches': 17, 'batch_size': 2000}}


Stage 1/2: |--------------------------------------------------| 0.0% Complete



2023-06-15 12:21:26,039 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 12:21:26,667 [INFO] keypoints_client.py 625: job_id 648ad7c7e9ea39ceca145f90 is running, progress: {'total_stages': 2, 'stage_1': {'inferred_batches': 13, 'total_batches': 17, 'batch_size': 2000}}


Stage 1/2: |██████████████████████████████████████------------| 76.5% Complete



2023-06-15 12:21:56,675 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 12:21:57,277 [INFO] keypoints_client.py 625: job_id 648ad7c7e9ea39ceca145f90 is running, progress: {'total_stages': 2, 'stage_1': {'inferred_batches': 17, 'total_batches': 17, 'batch_size': 2000}, 'stage_2': {'inferred_batches': 0, 'total_batches': 2, 'batch_size': 2000}}


Stage 2/2: |--------------------------------------------------| 0.0% Complete



2023-06-15 12:22:27,283 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 12:22:27,896 [INFO] keypoints_client.py 625: job_id 648ad7c7e9ea39ceca145f90 is running, progress: {'total_stages': 2, 'stage_1': {'inferred_batches': 17, 'total_batches': 17, 'batch_size': 2000}, 'stage_2': {'inferred_batches': 1, 'total_batches': 2, 'batch_size': 2000}}


Stage 2/2: |█████████████████████████-------------------------| 50.0% Complete



2023-06-15 12:22:57,904 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 12:22:58,527 [INFO] keypoints_client.py 625: job_id 648ad7c7e9ea39ceca145f90 is running, progress: {'total_stages': 2, 'stage_1': {'inferred_batches': 17, 'total_batches': 17, 'batch_size': 2000}, 'stage_2': {'inferred_batches': 1, 'total_batches': 2, 'batch_size': 2000}}


Stage 2/2: |█████████████████████████-------------------------| 50.0% Complete



2023-06-15 12:23:28,533 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 12:23:30,674 [INFO] keypoints_client.py 628: job_id 648ad7c7e9ea39ceca145f90 is done, returning result


The future object can also be used to obtain the job_id, via the **future.get_job_id()** method.

To generate a merged pro and con KpsResult, generate a separate pro_result and con_result using the above flow, and then use the method **KpsResult.get_merged_pro_con_results(pro_result, con_result)**. It's important to only merge pro and con results that were obtained over the same domain and using the same set of comments, otherwise an error will be raised.

In [37]:
future_pro = keypoints_client.run_kps_job_async(domain, stance="pro")
future_con = keypoints_client.run_kps_job_async(domain, stance="con")
result_pro = future_pro.get_result()
result_con = future_con.get_result()
result_async_merged = KpsResult.get_merged_pro_con_results(pro_result=result_pro, con_result=result_con)

2023-06-15 12:27:25,030 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-15 12:27:25,635 [INFO] keypoints_client.py 187: domain: austin_demo_async, comments status: {'processed_comments': 1000, 'processed_sentences': 1813, 'pending_comments': 0}
2023-06-15 12:27:25,638 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-15 12:27:26,337 [INFO] keypoints_client.py 345: started a kp summarization job - domain: austin_demo_async, stance: pro, run_params: {'stance': 'PRO'}, job_id: 648ad97ee9ea39ceca145f9f
2023-06-15 12:27:26,340 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-15 12:27:26,953 [INFO] keypoints_client.py 187: domain: austin_demo_async, comments status: {'processed_comments': 1000, 'processed_sentences': 1813, 'pending_comments': 0}
2

## 8. Cleanup
If you finished the tutorial and no longer need the domains and the results, cleaning up is always advised:

In [38]:
keypoints_client.delete_domain_cannot_be_undone(domain='austin_demo')
keypoints_client.delete_domain_cannot_be_undone(domain='austin_demo_async')

2023-06-15 12:28:55,722 [INFO] keypoints_client.py 66: client calls service (delete): https://keypoint-matching-backend.debater.res.ibm.com/data
2023-06-15 12:28:57,613 [INFO] keypoints_client.py 400: domain: austin_demo was deleted
2023-06-15 12:28:57,614 [INFO] keypoints_client.py 66: client calls service (delete): https://keypoint-matching-backend.debater.res.ibm.com/data
2023-06-15 12:28:59,699 [INFO] keypoints_client.py 400: domain: austin_demo_async was deleted


{'status': 'success'}

## 9. Conclusion
In this tutorial, we showed how to use the *Key Point Summarization* service, and how it provides detailed insights over survey data right out of the box - significantly reducing the effort required by a data scientist to analyze the data. We also demonstrated key *Key Point Summarization* features such as how to modify the summarization parameters and increase coverage, how to use the stance model and create per-stance results, how to incrementally add new data and how to compare different subsets of the data.

Feel free to contact us for questions or assistance: *yoavka@il.ibm.com*