# Using the *Key Point Summarization* service to analyze and find insights in survey data
When you have a large collection of texts representing people's opinions (such as product reviews, survey answers or social media posts), it is difficult to identify the key issues that come up in the data. Reading through thousands of comments is prohibitively expensive. Existing automated approaches are often limited to identifying recurring phrases or concepts and the overall sentiment toward them, and do not provide detailed or actionable insights.

In this tutorial you will gain hands-on experience in using *Key Point Summarization* (KPS) for analyzing and deriving insights from open-ended comments.  

The data we will use is [a community survey conducted in the city of Austin](https://data.austintexas.gov/dataset/Community-Survey/s2py-ceb7). In this survey, the citizens of Austin were asked "If there was ONE thing you could share with the Mayor regarding the City of Austin (any comment, suggestion, etc.), what would it be?". 

## 1. Initialization

### 1.1 Setup

Let's first import all the packages required for this tutorial and initialize the *Key Point Summarization* client. The client prints information using a logger, so a suitable verbosity level should be set. The client object is configured with an API key, which should be retrieved from the [Project Debater Early Access Program](https://early-access-program.debater.res.ibm.com/) site. In the code below, the key is read from the environment variable *DEBATER_API_KEY* (you may also modify the code and place the API key directly in it).

In [2]:
from debater_python_api.api.clients.keypoints_client import KpsClient, KpsJobFuture
from debater_python_api.api.clients.key_point_summarization.KpsResult import KpsResult
import os
import pandas as pd
import json 

pd.set_option('display.max_rows', None)
KpsClient.init_logger()
# Read the API key from the DEBATER_API_KEY environment variable
# (set it before starting the notebook, e.g. export DEBATER_API_KEY=<your-api-key>).
api_key = os.environ['DEBATER_API_KEY']
host = 'https://keypoint-matching-backend.debater.res.ibm.com'
keypoints_client = KpsClient(api_key, host)

### 1.2 Read the data
Let's read the data from *dataset_austin.csv* file, which holds the Austin survey dataset, and print the first comment.

In [3]:
comments_df = pd.read_csv('./dataset_austin.csv')

print(f'There are {len(comments_df)} comments in the dataset')
print(dict(comments_df.iloc[0,:]))

There are 3187 comments in the dataset
{'id': 1, 'year': 2016, 'text': "Dissatisfied traffic and with traffic, timing of street lights.  EXTREMELY dissatisfied with cit govt. interfering in local businesses (Uber/Lyft, income property owners).  Also, extremely dissatisfied with all the free handouts to people who are perfectly capable of earning their own money.  I'm very dissatisfied with the liberal leaning local politicians."}


Each comment has a unique identifier ('id'), a 'text' and a 'year'. We first remove all comments with missing values, all empty comments, and all comments longer than 3000 characters, since this is the system's limit. Then we keep only the comments from 2016.

The *Key Point Summarization* service can process hundreds of thousands of sentences. However, since the computation is resource-heavy (particularly in GPUs), the trial version is limited to 1000 comments. You may request an increase to this limit if needed. Since we want the tutorial to be relatively fast and lightweight, we will only run on a sample of 400 comments. Note that running over a larger set improves both the quality and the coverage of the results.

In [5]:
comments_df = comments_df.dropna()
comments_df = comments_df[comments_df.text.apply(lambda x: 0<len(str(x))<=3000)]

comments_2016_df = comments_df[comments_df['year'] == 2016]
sample_size = 400
comments_2016_sample_df = comments_2016_df.sample(n = sample_size, random_state = 1)
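
The filtering cell above can also be packaged as a small reusable function. The sketch below is a vectorized equivalent checked on a toy DataFrame; the name `filter_comments` and the toy rows are illustrative only and are not part of the KPS client.

```python
import pandas as pd

MAX_COMMENT_LEN = 3000  # per-comment character limit mentioned in this tutorial


def filter_comments(df: pd.DataFrame, year: int, max_len: int = MAX_COMMENT_LEN) -> pd.DataFrame:
    """Keep non-empty comments of at most max_len characters from the given year."""
    df = df.dropna()  # drops rows with a missing value in any column
    lengths = df["text"].astype(str).str.len()
    df = df[(lengths > 0) & (lengths <= max_len)]
    return df[df["year"] == year]


toy = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "year": [2016, 2016, 2016, 2017, 2016],
    "text": ["Fix the traffic.", "", "x" * 3001, "More buses, please.", None],
})
print(filter_comments(toy, 2016)["id"].tolist())  # [1]
```

Row 2 is dropped for being empty, row 3 for exceeding the limit, row 4 for its year and row 5 for its missing text.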

## 2. Run KPS and generate results (2016 data)

### 2.1 Run the full KPS flow
This is the simplest way to run the Key Point Summarization system, and it provides a good entry point. To run it, you simply send your list of textual comments and a unique domain name. The domain name must contain only alphanumeric characters, spaces or underscores. To reuse a domain from a previous run, first delete it with **keypoints_client.delete_domain_cannot_be_undone(domain=domain)**.

The system extracts the key points from the data and matches each sentence in the input comments to all of its matching key points.
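
The domain naming rule above can be checked locally before calling the service. A minimal sketch, assuming "alphanumeric" means ASCII letters and digits; `is_valid_domain_name` is a hypothetical helper, not part of the client.

```python
import re

# Assumption: "alphanumeric" is interpreted as ASCII letters and digits only.
_DOMAIN_NAME_RE = re.compile(r"[A-Za-z0-9 _]+")


def is_valid_domain_name(name: str) -> bool:
    """Return True if name is non-empty and uses only letters, digits, spaces and underscores."""
    return _DOMAIN_NAME_RE.fullmatch(name) is not None


print(is_valid_domain_name("austin_demo_full"))  # True
print(is_valid_domain_name("austin-demo"))       # False: hyphens are not allowed
```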

In [None]:
domain = "austin_demo_full"
comments_texts_2016 = list(comments_2016_sample_df['text'])
kps_result_2016 = keypoints_client.run_full_kps_flow(domain, comments_texts_2016)

2023-06-01 22:34:44,105 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/domains
2023-06-01 22:34:45,081 [INFO] keypoints_client.py 113: created domain: austin_demo_full with domain_params: None
2023-06-01 22:34:45,092 [INFO] keypoints_client.py 158: uploading 400 comments in batches
2023-06-01 22:34:45,093 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-01 22:34:46,075 [INFO] keypoints_client.py 174: uploaded 400 comments, out of 400
2023-06-01 22:34:46,078 [INFO] keypoints_client.py 137: waiting for the comments to be processed
2023-06-01 22:34:46,079 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-01 22:34:46,676 [INFO] keypoints_client.py 187: domain: austin_demo_full, comments status: {'processed_comments': 0, 'processed_sentences': 0, 'pending_comments':

Stage 1/2: |██████████----------------------------------------| 20.0% Complete



2023-06-01 22:36:12,456 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-01 22:36:14,078 [INFO] keypoints_client.py 638: job_id 6478f2eef0128c554c89af90 is done, returning result
2023-06-01 22:36:14,194 [INFO] keypoints_client.py 66: client calls service (delete): https://keypoint-matching-backend.debater.res.ibm.com/data
2023-06-01 22:36:15,293 [INFO] keypoints_client.py 408: domain: austin_demo_full was deleted


kps_result_2016 is a KpsResult object. Let's print the result:

In [None]:
kps_result_2016.print_result(n_sentences_per_kp = 2, title = "Random sample")

Random sample coverage (all sentences): 50.62
Random sample coverage (of pro and con sentences): 59.74
Random sample key points:
102 - TRAFFIC NEEDS ADDRESSING ON ALL HWYS AND STREETS - con
	- Need to fix traffic congestion in the city highways urgently.
	- Fix the traffic - you keep encouraging growth of the city without the infrastructure to
	  support that growth.
66 - Improve affordable housing/living. - con
	- Cost of living here is to high & tax for my home is to high.
	- Reduce property taxes and housing costs so that retiring and still living here is a real
	  possibility.
54 - Public transportation needs to improve. - con
	- Need to rework zoning to improve traffic and add more public transit, especially to
	  suburbs.
	- There needs to be better and more public transportation in the city of Austin.
51 - BIKE LANES HURT TRAFFIC. - con
	- Remove peddlers at traffic lights and fix traffic problems.
	- FIX THE TRAFFIC.
33 - PROPERTY TAXES ARE TOO HIGH. - con
	- Consider the finan

The first line shows that 51% of all sentences were matched to at least one key point, and the second line shows that this figure rises to 60% when computed over the pro and con sentences only.
For each key point, the method prints the number of matched sentences in the input comments (including the key point itself), its stance (pro or con) and the top matching sentences.

This is just a small part of the information that can be extracted from the results. For more information on the result processing see section 4.

### 2.2 Run Key Point Summarization step by step
To customize the Key Point Summarization service and fully exploit its caching and comparative capabilities, we must run the service step by step. Let's dive into each step and understand the KPS flow.

#### 2.2.1 Create a domain
The *Key Point Summarization* service stores the data (and cached-results) in a *domain*. A user can create several domains, one for each dataset. Domains are only accessible to the user who created them.

Create a domain using the **keypoints_client.create_domain(domain=domain, domain_params={})** method. Several parameters can be passed in the domain_params dictionary when creating a domain, as described in the documentation; leaving it empty gives a good default behaviour.
By default, an exception is thrown if the domain already exists. To avoid this exception, add the parameter **ignore_exists=True** to the method. Note that in this case the domain_params are not updated.

In this tutorial we will first delete the domain to make sure that we start with an empty domain.

Full documentation of the supported *domain_params* can be found [here](kps_parameters.pdf).

In [8]:
domain = 'austin_demo'
keypoints_client.delete_domain_cannot_be_undone(domain=domain)
keypoints_client.create_domain(domain=domain, domain_params={})

2023-06-01 22:50:19,284 [INFO] keypoints_client.py 66: client calls service (delete): https://keypoint-matching-backend.debater.res.ibm.com/data
2023-06-01 22:50:19,840 [ERROR] keypoints_client.py 77: There is a problem with the request (422): user: db0a12 doesn't have domain: austin_demo
2023-06-01 22:50:19,843 [INFO] keypoints_client.py 413: domain: austin_demo doesn't exist.
2023-06-01 22:50:19,844 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/domains
2023-06-01 22:50:20,677 [INFO] keypoints_client.py 113: created domain: austin_demo with domain_params: {}


A few domain-related points:
* The domain name must consist of alphanumeric characters, spaces and underscores only.
* We can always delete a domain we no longer need using: **keypoints_client.delete_domain_cannot_be_undone(domain=domain)**
* Keep in mind that a domain has a state: it stores all comments that have been uploaded into it and a cache with all computations performed over this data.
* To restart and run over the domain from scratch (no comments and no cache), we can delete the domain and re-create it, or simply use a different domain. Keep in mind that the cache is cleared as well, so subsequent runs will take longer.
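
The rules above can be summarized with a tiny in-memory model. This is a hypothetical sketch for intuition only: `ToyDomainStore`, `DomainExistsError` and `example_param` are made-up names, and the real service keeps domains server-side. It shows that `ignore_exists=True` leaves an existing domain's params untouched, and that deleting a domain discards its comments and cache together.

```python
from typing import Optional


class DomainExistsError(Exception):
    """Raised when creating a domain that already exists."""


class ToyDomainStore:
    """In-memory model of the domain rules described above; NOT the real KPS client."""

    def __init__(self) -> None:
        self._domains: dict = {}

    def create_domain(self, domain: str, domain_params: Optional[dict] = None,
                      ignore_exists: bool = False) -> None:
        if domain in self._domains:
            if ignore_exists:
                return  # the existing domain, including its params, is left unchanged
            raise DomainExistsError(domain)
        self._domains[domain] = {"params": dict(domain_params or {}), "comments": {}, "cache": {}}

    def delete_domain(self, domain: str) -> None:
        # Removes the stored comments and the cache together.
        self._domains.pop(domain, None)

    def get_params(self, domain: str) -> dict:
        return self._domains[domain]["params"]


store = ToyDomainStore()
store.create_domain("austin_demo", {"example_param": 1})
store.create_domain("austin_demo", {"example_param": 2}, ignore_exists=True)
print(store.get_params("austin_demo"))  # {'example_param': 1}
```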

#### 2.2.2 Upload comments into the domain
Upload the comments into the domain using the **keypoints_client.upload_comments(domain=domain, comments_ids=comments_ids, comments_texts=comments_texts)** method. This method receives the domain, a list of comment ids and a list of comment texts. When comments are uploaded into a domain, the *Key Point Summarization* service splits them into sentences and performs some light cleaning. If you have domain-specific knowledge and want to split the comments into sentences or clean them yourself, you can use the *split_comments* or *clean_comments* domain_params when creating the domain.

Note that:
* The comments_ids must be unique strings containing only alphanumeric characters, spaces or underscores.
* The number of comments_ids must match the number of comments_texts.
* The comments_texts must not be longer than 3000 characters.
* Uploading the same comment several times (same domain + comment_id + comment_text) is not a problem and even saves time since the comment is only processed once.
* Uploading the same comment_id with a different comment_text will raise an exception. 
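
The constraints listed above can be checked locally before calling upload_comments. A minimal sketch under stated assumptions: `check_upload` is a hypothetical helper, "alphanumeric" is read as ASCII, and the `previously_uploaded` mapping is a record you would need to maintain yourself.

```python
import re
from typing import Dict, List

_ID_RE = re.compile(r"[A-Za-z0-9 _]+")  # assumption: ASCII letters, digits, spaces, underscores
MAX_TEXT_LEN = 3000


def check_upload(comment_ids: List[str], comment_texts: List[str],
                 previously_uploaded: Dict[str, str]) -> List[str]:
    """Return the problems found in a batch; an empty list means none were found.

    previously_uploaded maps ids already sent to this domain to their texts.
    """
    problems: List[str] = []
    if len(comment_ids) != len(comment_texts):
        problems.append("comment_ids and comment_texts have different lengths")
    if len(set(comment_ids)) != len(comment_ids):
        problems.append("comment_ids are not unique")
    for cid, text in zip(comment_ids, comment_texts):
        if not isinstance(cid, str) or _ID_RE.fullmatch(cid) is None:
            problems.append(f"invalid id {cid!r}")
        if len(text) > MAX_TEXT_LEN:
            problems.append(f"text of id {cid!r} exceeds {MAX_TEXT_LEN} characters")
        if cid in previously_uploaded and previously_uploaded[cid] != text:
            problems.append(f"id {cid!r} was uploaded before with a different text")
    return problems


print(check_upload(["1", "2"], ["Fix traffic.", "More parks."], {}))  # []
print(check_upload(["1"], ["Lower taxes."], {"1": "Fix traffic."}))
```

Re-sending an id with the same text is not flagged, matching the note that duplicate uploads are harmless.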

After being uploaded to the domain, the comments are processed. The method runs synchronously and returns only after all the comments are processed; meanwhile, the processing status is printed to the screen.
For information about running KPS asynchronously, see section 6.

In [9]:
comments_texts_2016 = list(comments_2016_sample_df['text'])
comments_ids_2016 = list(comments_2016_sample_df['id'].astype(str))
keypoints_client.upload_comments(domain=domain, comments_ids=comments_ids_2016, comments_texts=comments_texts_2016)

2023-06-01 22:59:10,791 [INFO] keypoints_client.py 158: uploading 400 comments in batches
2023-06-01 22:59:10,795 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-01 22:59:11,822 [INFO] keypoints_client.py 174: uploaded 400 comments, out of 400
2023-06-01 22:59:11,823 [INFO] keypoints_client.py 137: waiting for the comments to be processed
2023-06-01 22:59:11,823 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-01 22:59:12,414 [INFO] keypoints_client.py 187: domain: austin_demo, comments status: {'processed_comments': 0, 'processed_sentences': 0, 'pending_comments': 400}
2023-06-01 22:59:22,420 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-01 22:59:23,010 [INFO] keypoints_client.py 187: domain: austin_demo, comments status: {'processed_comment

To examine the processed data, we can download the processed sentences and save them to a CSV file:

In [10]:
os.makedirs("kps_results", exist_ok=True)  # make sure the output folder exists
sentences_df = keypoints_client.get_sentences_for_domain(domain=domain)
sentences_df.to_csv(f"kps_results/{domain}_sentences.csv")

2023-06-01 23:00:07,316 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/data
2023-06-01 23:00:08,758 [INFO] keypoints_client.py 486: returning 727 sentences for domain austin_demo


#### 2.2.3 Run a KPS job
Run a *Key Point Summarization* job using the **keypoints_client.run_kps_job(domain=domain)** method. 
This method receives:
* The domain.
* Optional *comment_ids*: by default, the analysis is performed over all comments in the domain. To run over a subset of the comments (e.g., to split the data by geography, user type or timeframe), pass a list of their comments_ids.
* Optional *run_params*: a dictionary with various parameters for customizing the job (see section 2.2.4).
* Optional *stance*: by default, no stance analysis is performed. See section 2.2.5 for stance customization.
* Optional *description*: a description of the job to appear in the user report (see section 3.2).  
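
Building such subsets is a plain pandas operation. The sketch below groups comment ids by year on a toy DataFrame standing in for the survey data; the variable names are illustrative.

```python
import pandas as pd

# Toy stand-in for the survey DataFrame: only the columns needed here.
toy = pd.DataFrame({"id": [1, 2, 3, 4], "year": [2015, 2016, 2016, 2017]})

# Comment ids are uploaded as strings (see the upload cell), so convert them here too.
ids_by_year = {
    int(year): group["id"].astype(str).tolist()
    for year, group in toy.groupby("year")
}
print(ids_by_year)  # {2015: ['1'], 2016: ['2', '3'], 2017: ['4']}
```

With all comments uploaded to a single domain, a list such as `ids_by_year[2016]` could then serve as the comment-id subset for *run_kps_job*.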

The system extracts the key points from the input comments and matches each sentence in the comments to all of its matching key points.
The job runs synchronously: it prints its progress to the screen and returns the results when it completes.
For information about running KPS asynchronously, see section 6.

In [11]:
kps_result_2016 = keypoints_client.run_kps_job(domain = domain)

2023-06-01 23:03:29,745 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-01 23:03:30,330 [INFO] keypoints_client.py 187: domain: austin_demo, comments status: {'processed_comments': 400, 'processed_sentences': 727, 'pending_comments': 0}
2023-06-01 23:03:30,331 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-01 23:03:30,958 [INFO] keypoints_client.py 345: started a kp summarization job - domain: austin_demo, stance: no-stance, run_params: None, job_id: 6478f992f0128c554c89afbd
2023-06-01 23:03:30,959 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-01 23:03:31,547 [INFO] keypoints_client.py 631: job_id 6478f992f0128c554c89afbd is pending
2023-06-01 23:04:01,554 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint

Stage 1/2: |██████████████------------------------------------| 28.6% Complete



2023-06-01 23:04:32,203 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-01 23:04:33,773 [INFO] keypoints_client.py 638: job_id 6478f992f0128c554c89afbd is done, returning result


Let's print the results: 

In [12]:
kps_result_2016.print_result(n_sentences_per_kp = 2, title = "Random sample")

Random sample coverage: 60.52
Random sample key points:
112 - TRAFFIC NEEDS ADDRESSING ON ALL HWYS AND STREETS
	- Need to fix traffic congestion in the city highways urgently.
	- Fix the traffic - you keep encouraging growth of the city without the infrastructure to
	  support that growth.
77 - Improve affordable housing/living.
	- Cost of living here is to high & tax for my home is to high.
	- Reduce property taxes and housing costs so that retiring and still living here is a real
	  possibility.
67 - Public transportation needs to improve.
	- Need to rework zoning to improve traffic and add more public transit, especially to
	  suburbs.
	- There needs to be better and more public transportation in the city of Austin.
59 - BIKE LANES HURT TRAFFIC.
	- Remove peddlers at traffic lights and fix traffic problems.
	- FIX THE TRAFFIC.
43 - Integrated transportation is critical.
	- The city needs a more comprehensive and better connected transportation network
	  including bus, rail, bike, p

The first line shows that 61% of the sentences were matched to at least one key point.
For each key point, this method prints the number of matched sentences within the input comments (including the key point itself) and the top matching sentences.

This is just a small part of the information that can be extracted from the results. For more information on the result processing see section 4.

#### 2.2.4 Modify the run_params to customize your analysis
Each domain has a cache that stores all intermediate results computed during the analysis. Therefore, modifying the run_params and running another analysis is usually faster, since all shared intermediate results are retrieved from the cache.

Full documentation of the supported *run_params* can be found [here](kps_parameters.pdf).
Some of the notable options:
* By default, key points are extracted automatically. To provide your own key points and match all sentences to them, pass them in the key_points parameter: **run_params['key_points'] = [...]**. This enables a human-in-the-loop mode of work: we first extract key points automatically, then edit them manually (refine imperfect key points, remove duplicates and add missing ones), and finally run again, this time providing the edited key points as a given set.
* It is also possible to provide key points and let KPS add the missing ones. To do so, pass the key points in the key_point_candidates parameter: **run_params['key_point_candidates'] = [...]** (see section 5 for a detailed example).
* The lengths of the extracted key points and of the sentences participating in the analysis can be changed.
* The **mapping_policy** is used when mapping all sentences to the final key points; the default value is **NORMAL**. Changing it to **STRICT** causes only sentence and key point pairs with very high matching confidence to be considered matched, increasing precision but potentially decreasing coverage. Changing it to **LOOSE** does the opposite and also matches pairs with lower confidence.
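
To build intuition for the precision/coverage trade-off, the sketch below models each mapping policy as a minimum match-score threshold. This is an assumption made only for illustration: the thresholds and scores are invented, and the service's actual mapping logic is not described in this tutorial.

```python
from typing import Dict, List, Tuple

# Toy (sentence_id, key_point, match_score) triples; all scores are made up.
MATCHES: List[Tuple[str, str, float]] = [
    ("s1", "traffic", 0.95),
    ("s2", "traffic", 0.75),
    ("s3", "housing", 0.55),
    ("s4", "housing", 0.30),
]
N_SENTENCES = 5  # s5 has no candidate match at all

# Hypothetical thresholds chosen only to illustrate the trade-off.
THRESHOLDS: Dict[str, float] = {"STRICT": 0.9, "NORMAL": 0.7, "LOOSE": 0.5}


def toy_coverage(policy: str) -> float:
    """Percent of sentences matched to at least one key point under a policy."""
    threshold = THRESHOLDS[policy]
    matched = {sid for sid, _kp, score in MATCHES if score >= threshold}
    return 100 * len(matched) / N_SENTENCES


for policy in ("STRICT", "NORMAL", "LOOSE"):
    print(policy, toy_coverage(policy))
# prints: STRICT 20.0 / NORMAL 40.0 / LOOSE 60.0
```

Lowering the bar raises coverage but admits weaker matches such as s3, which is the precision cost the list above describes.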

Let's run with the **LOOSE** mapping policy:

In [13]:
run_params = {'mapping_policy':'LOOSE'}
kps_result_loose = keypoints_client.run_kps_job(domain=domain, run_params=run_params)
kps_result_loose.print_result(n_sentences_per_kp=2, title='Random sample loose')

2023-06-01 23:07:59,680 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-01 23:08:00,289 [INFO] keypoints_client.py 187: domain: austin_demo, comments status: {'processed_comments': 400, 'processed_sentences': 727, 'pending_comments': 0}
2023-06-01 23:08:00,291 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-01 23:08:00,939 [INFO] keypoints_client.py 345: started a kp summarization job - domain: austin_demo, stance: no-stance, run_params: {'mapping_policy': 'LOOSE'}, job_id: 6478faa0f0128c554c89afcb
2023-06-01 23:08:00,947 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-01 23:08:01,538 [INFO] keypoints_client.py 631: job_id 6478faa0f0128c554c89afcb is pending
2023-06-01 23:08:31,542 [INFO] keypoints_client.py 66: client calls service 

Random sample loose coverage: 70.29
Random sample loose key points:
138 - TRAFFIC NEEDS ADDRESSING ON ALL HWYS AND STREETS
	- Need to fix traffic congestion in the city highways urgently.
	- Fix the traffic - you keep encouraging growth of the city without the infrastructure to
	  support that growth.
92 - Public transportation needs to improve.
	- Need to rework zoning to improve traffic and add more public transit, especially to
	  suburbs.
	- There needs to be better and more public transportation in the city of Austin.
88 - Improve affordable housing/living.
	- Cost of living here is to high & tax for my home is to high.
	- Reduce property taxes and housing costs so that retiring and still living here is a real
	  possibility.
71 - BIKE LANES HURT TRAFFIC.
	- Remove peddlers at traffic lights and fix traffic problems.
	- FIX THE TRAFFIC.
63 - Integrated transportation is critical.
	- The city needs a more comprehensive and better connected transportation network
	  including bus, r

Changing the mapping policy to **LOOSE** increased the coverage from 61% to 70%.

#### 2.2.5 Add stance to the analysis

In many use cases (surveys, customer feedback, etc.), the comments have a positive and/or negative stance, and it is useful to run a separate KPS analysis for each stance. Most stance detection models don't perform well on survey data (or on customer feedback), since the comments tend to contain many suggestions: a suggestion tends to look positive to the model, while the user is actually asking to improve something. To that end, we trained a stance model that handles suggestions well and labels each sentence as 'Positive', 'Negative', 'Neutral' or 'Suggestion'. We treat suggestions as negatives and run two separate analyses: the first over 'Positive' sentences (pro), and the second over 'Negative' and 'Suggestion' sentences (con).

This has the following advantages:

* Creates a separate positive/negative summary that shows clearly what works well and what needs to be improved.
* Filters out neutral sentences, which usually don't contain valuable information.
* Helps the matching model avoid stance mistakes (matching a positive sentence to a negative key point and vice versa).
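
The grouping rule described above (suggestions counted as con, neutral sentences dropped) can be sketched as follows. The stance labels in the example are hand-assigned for illustration; in the service they come from the stance model, and `split_by_stance` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Labels as named in the text; Neutral has no group and is therefore filtered out.
LABEL_TO_GROUP = {"Positive": "pro", "Negative": "con", "Suggestion": "con"}


def split_by_stance(labelled: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (sentence, label) pairs into pro and con lists, dropping Neutral."""
    groups: Dict[str, List[str]] = {"pro": [], "con": []}
    for sentence, label in labelled:
        group = LABEL_TO_GROUP.get(label)
        if group is not None:
            groups[group].append(sentence)
    return groups


example = [
    ("City services are outstanding!", "Positive"),
    ("Traffic is terrible.", "Negative"),
    ("Add more bus routes to the suburbs.", "Suggestion"),
    ("I moved here in 2010.", "Neutral"),
]
print(split_by_stance(example))
```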

To run on each stance separately, use the **stance** parameter of *run_kps_job*. The options are "pro", "con" or "no-stance" (default).
First, let's run on the *pro* sentences:

In [14]:
kps_results_2016_pro = keypoints_client.run_kps_job(domain=domain, stance="pro")
kps_results_2016_pro.print_result(n_sentences_per_kp=2, title='Random sample pro')

2023-06-01 23:11:49,885 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-01 23:11:50,515 [INFO] keypoints_client.py 187: domain: austin_demo, comments status: {'processed_comments': 400, 'processed_sentences': 727, 'pending_comments': 0}
2023-06-01 23:11:50,519 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-01 23:11:51,161 [INFO] keypoints_client.py 345: started a kp summarization job - domain: austin_demo, stance: pro, run_params: {'stance': 'PRO'}, job_id: 6478fb87f0128c554c89afcc
2023-06-01 23:11:51,162 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-01 23:11:51,712 [INFO] keypoints_client.py 631: job_id 6478fb87f0128c554c89afcc is pending
2023-06-01 23:12:21,717 [INFO] keypoints_client.py 66: client calls service (get): https://k

Random sample pro coverage (all sentences): 0.28
Random sample pro coverage (of pro sentences): 6.06
Random sample pro key points:
2 - City services (water, streets, electric) are outstanding!! - pro
	- RESIDENTIAL SERVICES ARE EXCELLENT!


As you can see, only 0.28% of all sentences are positive sentences matched to a key point, and only 6% of the positive sentences (2 of 33) are matched. As in many surveys, most comments are negative or suggestions. The positive comments are too few to extract meaningful key points, so the results are very limited. To generate better positive results, we can try running on the full dataset.
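
The two pro percentages can be reproduced from raw counts, assuming coverage is the share of sentences matched to at least one key point (as described in section 2.1). The counts are taken from this tutorial: the domain holds 727 sentences, the job metadata in section 4.1 reports 33 pro sentences, and the single pro key point has 2 matched sentences. `coverage_pct` is a hypothetical helper.

```python
def coverage_pct(n_matched: int, n_total: int) -> float:
    """Share of sentences matched to at least one key point, in percent (2 decimals)."""
    return round(100 * n_matched / n_total, 2)


print(coverage_pct(2, 727))  # 0.28  (of all sentences)
print(coverage_pct(2, 33))   # 6.06  (of pro sentences)
```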

Let's now run for *con*:

In [15]:
kps_results_2016_con = keypoints_client.run_kps_job(domain=domain, stance="con")
kps_results_2016_con.print_result(n_sentences_per_kp=2, title='Random sample con')

2023-06-01 23:13:30,348 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-01 23:13:30,942 [INFO] keypoints_client.py 187: domain: austin_demo, comments status: {'processed_comments': 400, 'processed_sentences': 727, 'pending_comments': 0}
2023-06-01 23:13:30,945 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-01 23:13:31,596 [INFO] keypoints_client.py 345: started a kp summarization job - domain: austin_demo, stance: con, run_params: {'stance': 'CON'}, job_id: 6478fbebf0128c554c89afd0
2023-06-01 23:13:31,599 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-01 23:13:32,178 [INFO] keypoints_client.py 631: job_id 6478fbebf0128c554c89afd0 is pending
2023-06-01 23:14:02,185 [INFO] keypoints_client.py 66: client calls service (get): https://k

Random sample con coverage (all sentences): 50.34
Random sample con coverage (of con sentences): 62.78
Random sample con key points:
102 - TRAFFIC NEEDS ADDRESSING ON ALL HWYS AND STREETS - con
	- Need to fix traffic congestion in the city highways urgently.
	- Fix the traffic - you keep encouraging growth of the city without the infrastructure to
	  support that growth.
66 - Improve affordable housing/living. - con
	- Cost of living here is to high & tax for my home is to high.
	- Reduce property taxes and housing costs so that retiring and still living here is a real
	  possibility.
54 - Public transportation needs to improve. - con
	- Need to rework zoning to improve traffic and add more public transit, especially to
	  suburbs.
	- There needs to be better and more public transportation in the city of Austin.
51 - BIKE LANES HURT TRAFFIC. - con
	- Remove peddlers at traffic lights and fix traffic problems.
	- FIX THE TRAFFIC.
33 - PROPERTY TAXES ARE TOO HIGH. - con
	- Consider the f

We can create a new unified kps_result by combining the pro and con results. The new result keeps the information about both the pro and con key points and sentences. Only merge pro and con results that were obtained over the same domain and the same set of comments; otherwise, an error is raised.
Let's combine the pro and con results into a single result and print it to the screen:

In [19]:
kps_result_2016_merged = KpsResult.get_merged_pro_con_results(pro_result=kps_results_2016_pro, con_result=kps_results_2016_con)
kps_result_2016_merged.print_result(n_sentences_per_kp=2, title='Random sample merged')

Random sample merged coverage (all sentences): 50.62
Random sample merged coverage (of pro and con sentences): 59.74
Random sample merged key points:
102 - TRAFFIC NEEDS ADDRESSING ON ALL HWYS AND STREETS - con
	- Need to fix traffic congestion in the city highways urgently.
	- Fix the traffic - you keep encouraging growth of the city without the infrastructure to
	  support that growth.
66 - Improve affordable housing/living. - con
	- Cost of living here is to high & tax for my home is to high.
	- Reduce property taxes and housing costs so that retiring and still living here is a real
	  possibility.
54 - Public transportation needs to improve. - con
	- Need to rework zoning to improve traffic and add more public transit, especially to
	  suburbs.
	- There needs to be better and more public transportation in the city of Austin.
51 - BIKE LANES HURT TRAFFIC. - con
	- Remove peddlers at traffic lights and fix traffic problems.
	- FIX THE TRAFFIC.
33 - PROPERTY TAXES ARE TOO HIGH. - con
	- Consid
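
The merge precondition described above can be illustrated with a toy model. In this sketch, `ToyStanceResult` and `merge_pro_con` are hypothetical and do not reflect how `KpsResult.get_merged_pro_con_results` is implemented; it only shows the idea of checking that both results share a domain and a comment set before stacking their matches with a stance label.

```python
from dataclasses import dataclass
from typing import FrozenSet

import pandas as pd


@dataclass
class ToyStanceResult:
    """Hypothetical stand-in for a single-stance result (not the KpsResult class)."""
    domain: str
    comment_ids: FrozenSet[str]
    matches: pd.DataFrame  # one row per (key point, matched sentence) pair


def merge_pro_con(pro: ToyStanceResult, con: ToyStanceResult) -> pd.DataFrame:
    """Stack pro and con matches, refusing to merge results from different inputs."""
    if pro.domain != con.domain:
        raise ValueError("pro and con results come from different domains")
    if pro.comment_ids != con.comment_ids:
        raise ValueError("pro and con results were computed on different comments")
    return pd.concat(
        [pro.matches.assign(stance="pro"), con.matches.assign(stance="con")],
        ignore_index=True,
    )


ids = frozenset({"1", "2"})
pro = ToyStanceResult("austin_demo", ids, pd.DataFrame(
    {"key_point": ["City services are outstanding!"],
     "sentence": ["RESIDENTIAL SERVICES ARE EXCELLENT!"]}))
con = ToyStanceResult("austin_demo", ids, pd.DataFrame(
    {"key_point": ["Public transportation needs to improve."],
     "sentence": ["There needs to be better and more public transportation in the city of Austin."]}))
print(merge_pro_con(pro, con))
```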


Alternatively, we can avoid running a separate analysis for each stance and use the *run_kps_job_both_stances* method.
This method starts two separate jobs simultaneously, one for *pro* and one for *con*. It then retrieves the results, merges them and returns the merged result object.
This method receives:
* The domain
* comments_ids - as in *run_kps_job*
* description - as in *run_kps_job*; the stance is appended to the description of each job
* run_params_pro - optional run_params to be sent to the *pro* job
* run_params_con - optional run_params to be sent to the *con* job

In [16]:
kps_result_2016_merged_2 = keypoints_client.run_kps_job_both_stances(domain)

2023-06-01 23:26:08,018 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-01 23:26:08,659 [INFO] keypoints_client.py 187: domain: austin_demo, comments status: {'processed_comments': 400, 'processed_sentences': 727, 'pending_comments': 0}
2023-06-01 23:26:08,663 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-01 23:26:09,299 [INFO] keypoints_client.py 345: started a kp summarization job - domain: austin_demo, stance: pro, run_params: {'stance': 'PRO'}, job_id: 6478fee1f0128c554c89afe7
2023-06-01 23:26:09,302 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-01 23:26:09,906 [INFO] keypoints_client.py 187: domain: austin_demo, comments status: {'processed_comments': 400, 'processed_sentences': 727, 'pending_comments': 0}
2023-06-01 23:26:09,907
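
The idea behind *run_kps_job_both_stances*, submitting both jobs before waiting for either, can be sketched with Python's standard `concurrent.futures` module. `fake_run_job` is a hypothetical placeholder for a remote job, and this sketch is not the client's implementation; it mirrors the logged behaviour in which the stance is added to each job's run_params.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, Optional


def fake_run_job(domain: str, stance: str, run_params: Optional[dict] = None) -> Dict[str, Any]:
    """Hypothetical stand-in for one remote KPS job; returns a dummy result."""
    params = dict(run_params or {})
    params["stance"] = stance.upper()  # the log above shows e.g. run_params: {'stance': 'PRO'}
    return {"domain": domain, "stance": stance, "run_params": params}


def run_both_stances(domain: str, run_params_pro: Optional[dict] = None,
                     run_params_con: Optional[dict] = None) -> Dict[str, Dict[str, Any]]:
    # Submit both jobs before waiting on either, so they can run at the same time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        pro_future = pool.submit(fake_run_job, domain, "pro", run_params_pro)
        con_future = pool.submit(fake_run_job, domain, "con", run_params_con)
        return {"pro": pro_future.result(), "con": con_future.result()}


results = run_both_stances("austin_demo", run_params_pro={"mapping_policy": "LOOSE"})
print(results["pro"]["run_params"])  # {'mapping_policy': 'LOOSE', 'stance': 'PRO'}
print(results["con"]["run_params"])  # {'stance': 'CON'}
```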

## 3. Jobs management

### 3.1 User report
The user report stores all the information about existing domains and all past and present KPS jobs.
To fetch it and print it to the screen:

In [None]:
report = keypoints_client.get_full_report()
keypoints_client.print_report(report)

### 3.2 job_id
Each job has a unique job_id, which is useful for job management and can be obtained in several ways:
* It's printed to the screen when the job starts and in every progress update.
* From the KpsResult object (see section 4).
* When running asynchronously (see section 6).
* From the user report.

### 3.3 Canceling a job
Simply exiting the program after the job is sent does not cancel the job, since it keeps running on the server, consuming resources. In order to cancel a job, use: 
* **keypoints_client.cancel_kp_extraction_job(\<job_id\>)**

It is also possible to cancel all jobs in a domain, or even all jobs in all domains (which might be simpler, since no job_id is needed):

* **keypoints_client.cancel_all_extraction_jobs_for_domain(\<domain\>)**
* **keypoints_client.cancel_all_extraction_jobs_all_domains()**


### 3.4 Fetching the results of a previous job
If the program terminated unexpectedly after the job was sent, you can still fetch the results using:
* **kps_result = keypoints_client.get_results_from_job_id(\<job_id\>)**
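
Since the job_id is the key to recovering results, one simple safeguard is to write job ids to a file as soon as they are known (for example, copied from the log line printed when a job starts). The helpers below are a hypothetical sketch using only the standard library; the ids are the ones reported in section 4.1.

```python
import json
import os
import tempfile
from typing import Dict


def save_job_ids(path: str, stance_to_job_id: Dict[str, str]) -> None:
    """Persist job ids so they survive if the program terminates unexpectedly."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(stance_to_job_id, f)


def load_job_ids(path: str) -> Dict[str, str]:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


path = os.path.join(tempfile.mkdtemp(), "job_ids.json")
save_job_ids(path, {"pro": "6478fb87f0128c554c89afcc", "con": "6478fbebf0128c554c89afd0"})
print(load_job_ids(path))
```

Each recovered id could then be passed to *get_results_from_job_id* as shown above.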

## 4. KpsResult Object and result processing
The KpsResult object stores all the information about the job (or merged jobs).
It can be used to generate several types of reports and to compare different results. All the reports generated in this section are available in the "kps_results" folder.

The KpsResult can be saved to and loaded from a file via the *save* and *load* methods:

In [20]:
json_file = "kps_results/merged_austin_results.json"
kps_result_2016_merged.save(json_file) 
kps_result_2016_merged = KpsResult.load(json_file)

2023-06-01 23:56:09,801 [INFO] KpsResult.py 93: Writing results to: kps_results/merged_austin_results.json
2023-06-01 23:56:09,827 [INFO] KpsResult.py 104: Loading results from: kps_results/merged_austin_results.json


### 4.1 Job metadata
First, the KpsResult stores the job metadata: a dictionary with a *general* part and a *per_stance* part. The general metadata contains the user_id, the domain and the overall numbers of sentences and comments, which are shared across stances. The per-stance metadata contains the job-specific data: run_params, job_id, and the number of sentences and comments belonging to the stance. 

In [21]:
print(json.dumps(kps_result_2016_merged.get_job_metadata(), indent=2))

{
  "general": {
    "domain": "austin_demo",
    "user_id": "db0a12",
    "n_sentences": 727,
    "n_sentences_unfiltered": 720,
    "n_comments": 400,
    "n_comments_unfiltered": 394
  },
  "per_stance": {
    "pro": {
      "description": null,
      "run_params": {
        "stance": "PRO"
      },
      "job_id": "6478fb87f0128c554c89afcc",
      "n_sentences_stance": 33,
      "n_comments_stance": 28
    },
    "con": {
      "description": null,
      "run_params": {
        "stance": "CON"
      },
      "job_id": "6478fbebf0128c554c89afd0",
      "n_sentences_stance": 583,
      "n_comments_stance": 353
    }
  }
}


The job_id of each stance can be obtained with:

In [22]:
kps_result_2016_merged.get_stance_to_job_id()

{'pro': '6478fb87f0128c554c89afcc', 'con': '6478fbebf0128c554c89afd0'}
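The same mapping is contained in the metadata dictionary shown above, so it, along with per-stance shares of the comments, can be derived directly from that structure. A plain-Python sketch on the printed values; it assumes, without having checked the library source, that `get_stance_to_job_id()` reads the same `job_id` fields.

```python
# Subset of the dictionary printed by get_job_metadata() above.
metadata = {
    "general": {"n_sentences": 727, "n_comments": 400},
    "per_stance": {
        "pro": {"job_id": "6478fb87f0128c554c89afcc", "n_sentences_stance": 33, "n_comments_stance": 28},
        "con": {"job_id": "6478fbebf0128c554c89afd0", "n_sentences_stance": 583, "n_comments_stance": 353},
    },
}

# Rebuild the stance -> job_id mapping from the per-stance entries.
stance_to_job_id = {stance: info["job_id"] for stance, info in metadata["per_stance"].items()}
print(stance_to_job_id)
# {'pro': '6478fb87f0128c554c89afcc', 'con': '6478fbebf0128c554c89afd0'}

# Share of all comments that belong to each stance.
n_comments = metadata["general"]["n_comments"]
for stance, info in metadata["per_stance"].items():
    share = 100 * info["n_comments_stance"] / n_comments
    print(f"{stance}: {info['n_comments_stance']}/{n_comments} comments ({share:.2f}%)")
# pro: 28/400 comments (7.00%)
# con: 353/400 comments (88.25%)
```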

### 4.2 Full result
As for the results themselves, the KpsResult stores all the generated key points and all the sentences matched to them, from which several reports and analyses can be generated. First, the *result_df* attribute stores the full results. Each row holds a pair of a key point and a matched sentence, their match_score, and all the information about the sentence.

In [82]:
kps_result_2016_merged.result_df.head(3)

|   | kp | sentence_text | match_score | comment_id | sentence_id | sents_in_comment | span_start | span_end | num_tokens | argument_quality | kp_quality | sent_kp_quality | pos_score | neg_score | sug_score | neut_score | selected_stance | stance_conf | kp_stance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | TRAFFIC NEEDS ADDRESSING ON ALL HWYS AND STREETS | TRAFFIC NEEDS ADDRESSING ON ALL HWYS AND STREETS | 1.0 | 1473 | 0 | 1 | 0 | 48 | 8 | 0.347743 | 0.974795 | 0.974795 | 0.002457 | 0.043872 | 0.943447 | 0.010224 | sug | 0.943447 | con |
| 1 | TRAFFIC NEEDS ADDRESSING ON ALL HWYS AND STREETS | Need to fix traffic congestion in the city hig... | 0.99998 | 854 | 0 | 1 | 0 | 61 | 10 | 0.42671 | 0.974795 | 0.996486 | 0.001866 | 0.007801 | 0.985536 | 0.004797 | sug | 0.985536 | con |
| 2 | TRAFFIC NEEDS ADDRESSING ON ALL HWYS AND STREETS | Fix the traffic - you keep encouraging growth ... | 0.999979 | 387 | 0 | 2 | 0 | 108 | 17 | 0.626678 | 0.974795 | 0.0 | 0.000969 | 0.989541 | 0.004868 | 0.004622 | neg | 0.989541 | con |


### 4.3 Result Summary 
The result summary, stored in the *summary_df* attribute, displays the aggregated information per key point:

In [23]:
kps_result_2016_merged.summary_df

|   | key_point | #comments | comments_coverage | #sentences | sentences_coverage | stance | kp_id | parent_id | n_comments_subtree |
|---|---|---|---|---|---|---|---|---|---|
| 0 | TRAFFIC NEEDS ADDRESSING ON ALL HWYS AND STREETS | 94 | 0.235 | 102 | 0.140303 | con | 0.0 |  | 96.0 |
| 1 | Improve affordable housing/living. | 55 | 0.1375 | 66 | 0.090784 | con | 1.0 |  | 64.0 |
| 2 | BIKE LANES HURT TRAFFIC. | 51 | 0.1275 | 51 | 0.070151 | con | 3.0 | 0.0 | 51.0 |
| 3 | Public transportation needs to improve. | 48 | 0.12 | 54 | 0.074278 | con | 2.0 |  | 53.0 |
| 4 | PROPERTY TAXES ARE TOO HIGH. | 30 | 0.075 | 33 | 0.045392 | con | 4.0 |  | 37.0 |
| 5 | Integrated transportation is critical. | 27 | 0.0675 | 31 | 0.042641 | con | 5.0 | 2.0 | 27.0 |
| 6 | Buying a home in S Austin is prohibitive. | 18 | 0.045 | 24 | 0.033012 | con | 6.0 | 1.0 | 18.0 |
| 7 | Don't let Austin become Houston with overdevel... | 16 | 0.04 | 17 | 0.023384 | con | 8.0 |  | 16.0 |
| 8 | Water costs too much. | 15 | 0.0375 | 18 | 0.024759 | con | 7.0 | 4.0 | 15.0 |
| 9 | Be more proactive in city planning and develop... | 14 | 0.035 | 14 | 0.019257 | con | 11.0 |  | 14.0 |


Here, all key points are sorted by their salience. For each key point, we can see the number of comments matched to it (comments that have at least one sentence matched to the key point), the percentage of comments matched to it (out of the entire set of comments sent to the job), the number and percentage of sentences matched to it, and its stance. 
The analysis also creates a tree-structured key point hierarchy. The *parent_id* column holds the *kp_id* of the key point's parent, and *n_comments_subtree* shows how many comments are in the key point's subtree. For example, the key point *BIKE LANES HURT TRAFFIC.* (parent_id 0) is under the parent key point *TRAFFIC NEEDS ADDRESSING ON ALL HWYS AND STREETS* (kp_id 0).
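The parent_id links are enough to rebuild the hierarchy yourself, for example to print it as an indented outline. The plain-Python sketch below hard-codes the ten rows shown above, with `None` for a missing parent; with the real `summary_df` you would first read the `key_point`, `kp_id` and `parent_id` columns and convert the float/NaN ids to integers/`None` (an assumption about how you load them). Since only the first ten rows are shown, the outline is partial.

```python
from collections import defaultdict

# (key_point, kp_id, parent_id) for the ten summary_df rows shown above.
rows = [
    ("TRAFFIC NEEDS ADDRESSING ON ALL HWYS AND STREETS", 0, None),
    ("Improve affordable housing/living.", 1, None),
    ("BIKE LANES HURT TRAFFIC.", 3, 0),
    ("Public transportation needs to improve.", 2, None),
    ("PROPERTY TAXES ARE TOO HIGH.", 4, None),
    ("Integrated transportation is critical.", 5, 2),
    ("Buying a home in S Austin is prohibitive.", 6, 1),
    ("Don't let Austin become Houston with overdevel...", 8, None),
    ("Water costs too much.", 7, 4),
    ("Be more proactive in city planning and develop...", 11, None),
]

def build_children(rows):
    """Map each parent_id (None for top-level key points) to its (kp_id, text) children, in row order."""
    children = defaultdict(list)
    for text, kp_id, parent_id in rows:
        children[parent_id].append((kp_id, text))
    return children

def render(children, parent=None, depth=0):
    """Depth-first, indented listing of the hierarchy below `parent`."""
    lines = []
    for kp_id, text in children.get(parent, []):
        lines.append("  " * depth + text)
        lines.extend(render(children, kp_id, depth + 1))
    return lines

print("\n".join(render(build_children(rows))))
```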

In addition to the individual key points, the last rows of *summary_df* (not shown in the truncated output above) hold the total and matched numbers of sentences and comments for each stance; these rows start with \*. In the current example, there are 400 comments and 727 sentences in total, of which 263 comments and 368 sentences are matched to at least one key point. Overall, 353 comments have sentences classified as *con*, and 568 sentences are classified as *con*; 262 comments and 366 sentences are matched to *con* key points. 
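The coverage columns of *summary_df* are ratios over the job's totals: comments_coverage is #comments divided by the 400 comments sent to the job, and sentences_coverage is #sentences divided by the 727 sentences (94/400 = 0.235 and 102/727 ≈ 0.140303 for the traffic key point). A comment counts once per key point, however many of its sentences match it. The sketch below reproduces this arithmetic; `count_matches` is a hypothetical helper working on (key_point, comment_id) pairs, one per matched sentence, such as the `kp` and `comment_id` columns of `result_df`.

```python
N_COMMENTS, N_SENTENCES = 400, 727  # job totals from this example

def coverage(n_matched, n_total):
    """Fraction of the job's comments (or sentences) matched to a key point."""
    return n_matched / n_total

print(coverage(94, N_COMMENTS))              # comments_coverage of the traffic key point: 0.235
print(round(coverage(102, N_SENTENCES), 6))  # sentences_coverage of the same key point: 0.140303

def count_matches(pairs):
    """Return {key_point: (#comments, #sentences)} from (key_point, comment_id) pairs.

    Each pair is one matched sentence; a comment is counted once per key point.
    """
    sentences, comments = {}, {}
    for kp, comment_id in pairs:
        sentences[kp] = sentences.get(kp, 0) + 1
        comments.setdefault(kp, set()).add(comment_id)
    return {kp: (len(comments[kp]), sentences[kp]) for kp in sentences}

# Made-up pairs: comment 1 has two sentences matched to key point "A".
print(count_matches([("A", 1), ("A", 1), ("A", 2), ("B", 3)]))  # {'A': (2, 3), 'B': (1, 1)}
```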

### 4.4 Docx report
The third report is a Microsoft Word document that visualizes the key point hierarchy and presents the sentences matched to each key point in a user-friendly format. It can be generated using:

In [24]:
kps_result_2016_merged.generate_docx_report(output_dir = "kps_results", result_name="kps_result_2016_merged")

2023-06-02 00:07:59,592 [INFO] docx_generator.py 208: Creating key points hierarchy
2023-06-02 00:07:59,607 [INFO] docx_generator.py 216: Creating key points matches tables
2023-06-02 00:07:59,760 [INFO] docx_generator.py 292: saving docx summary in file: kps_results/kps_result_2016_merged_hierarchical.docx


It is also possible to generate all three reports (the full result CSV, the summary CSV and the Word document) together. Provide the output_dir and the result name, and the files are written with the appropriate suffixes.

In [25]:
kps_result_2016_merged.export_to_all_outputs(output_dir="kps_results", result_name="kps_result_2016_merged")

2023-06-02 00:08:13,296 [INFO] utils.py 60: Writing dataframe to: kps_results/kps_result_2016_merged.csv
2023-06-02 00:08:13,320 [INFO] utils.py 60: Writing dataframe to: kps_results/kps_result_2016_merged_kps_summary.csv
2023-06-02 00:08:13,346 [INFO] docx_generator.py 208: Creating key points hierarchy
2023-06-02 00:08:13,361 [INFO] docx_generator.py 216: Creating key points matches tables
2023-06-02 00:08:13,508 [INFO] docx_generator.py 292: saving docx summary in file: kps_results/kps_result_2016_merged_hierarchical.docx
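Based on the file names in the log above, `export_to_all_outputs` writes `<result_name>.csv`, `<result_name>_kps_summary.csv` and `<result_name>_hierarchical.docx` into `output_dir`. A small check that all three files exist, assuming these suffixes stay the same (they are inferred from this log only, and `expected_report_paths` is a hypothetical helper):

```python
from pathlib import Path

def expected_report_paths(output_dir, result_name):
    """Paths of the three reports, using the suffixes seen in the export log."""
    out = Path(output_dir)
    return {
        "full_result_csv": out / f"{result_name}.csv",
        "summary_csv": out / f"{result_name}_kps_summary.csv",
        "docx_report": out / f"{result_name}_hierarchical.docx",
    }

paths = expected_report_paths("kps_results", "kps_result_2016_merged")
missing = [str(p) for p in paths.values() if not p.exists()]
print("missing reports:", missing or "none")
```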


It's also possible to compare different results or subsets of the same results, see section 5 for details.

### 4.5 Unmatched sentences
Note that the KpsResult object only stores the sentences that were matched to at least one key point. In order to get the list of sentences that were not matched to any key point, the client needs to be called with the KpsResult object as a parameter: 

In [26]:
unmapped_sentences_df = keypoints_client.get_unmapped_sentences_for_kps_result(kps_result_2016_merged)
unmapped_sentences_df.to_csv("kps_results/kps_result_merged_unmapped_sentences.csv")

2023-06-02 00:09:30,599 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/data
2023-06-02 00:09:32,192 [INFO] keypoints_client.py 486: returning 727 sentences for domain austin_demo, job_id 6478fb87f0128c554c89afcc
2023-06-02 00:09:32,427 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/data
2023-06-02 00:09:33,869 [INFO] keypoints_client.py 486: returning 727 sentences for domain austin_demo, job_id 6478fbebf0128c554c89afd0
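Conceptually, the unmatched sentences are the set difference between all sentences in the domain and the sentences that appear in `result_df`; matching is per sentence, so a comment can be partly matched and partly unmatched. The sketch below illustrates the idea on made-up data, identifying a sentence by its `(comment_id, sentence_id)` pair as in `result_df`. It is not how the service computes the list; in practice, use the client call above, since `result_df` alone does not contain the unmatched sentences.

```python
def unmatched_sentences(all_sentences, matched_keys):
    """Return sentences whose (comment_id, sentence_id) key never appears among matched rows.

    all_sentences: iterable of (comment_id, sentence_id, text) tuples.
    matched_keys:  iterable of (comment_id, sentence_id) pairs, e.g. from result_df.
    """
    matched = set(matched_keys)
    return [s for s in all_sentences if (s[0], s[1]) not in matched]

# Made-up example: comment "1" has one matched and one unmatched sentence.
all_sentences = [("1", "0", "Fix the traffic."), ("1", "1", "Thanks!"), ("2", "0", "Nice parks.")]
print(unmatched_sentences(all_sentences, [("1", "0")]))
```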


## 5. Run *Key Point Summarization* incrementally
### 5.1 Run *Key Point Summarization* incrementally on new data (data from 2016 + 2017)
A year has passed, and we have collected additional data (from 2017). We can now upload the 2017 data to the same domain (austin_demo) and have both the 2016 and 2017 data in one domain. 

In [27]:
comments_2017_df = comments_df[comments_df['year'] == 2017]

sample_size = 400
comments_2017_sample_df = comments_2017_df.sample(n = sample_size, random_state = 1)

domain = 'austin_demo'
comments_texts_2017 = list(comments_2017_sample_df['text'])
comments_ids_2017 = list(comments_2017_sample_df['id'].astype(str))
keypoints_client.upload_comments(domain=domain, comments_ids=comments_ids_2017, comments_texts=comments_texts_2017)

2023-06-02 00:18:39,925 [INFO] keypoints_client.py 158: uploading 400 comments in batches
2023-06-02 00:18:39,926 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-02 00:18:40,878 [INFO] keypoints_client.py 174: uploaded 400 comments, out of 400
2023-06-02 00:18:40,881 [INFO] keypoints_client.py 137: waiting for the comments to be processed
2023-06-02 00:18:40,883 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-02 00:18:41,522 [INFO] keypoints_client.py 187: domain: austin_demo, comments status: {'processed_comments': 400, 'processed_sentences': 727, 'pending_comments': 400}
2023-06-02 00:18:51,528 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-02 00:18:52,155 [INFO] keypoints_client.py 187: domain: austin_demo, comments status: {'processed_com

We could now run a new analysis over all the data in the domain, as we did before, and automatically extract new key points. We can expect some of them to be identical to the key points extracted from the 2016 data, some to be similar, and some to be new.

A better option is to run a new analysis but provide the key points from the 2016 analysis, and let *Key Point Summarization* add new key points from the 2017 data if there are any. One benefit of this approach is that the new result will mostly use the 2016 key points, so we will be able to compare the two years and view trends. Another major benefit is run time: the 2016 data was already analyzed with these key points, and since a cache is in place, much of the computation can be avoided. The 2016 key points can be provided either via the **run_params['key_point_candidates'] = [...]** parameter, passing a list of strings, or via **run_params['key_point_candidates_by_job_ids'] = [\<job_id1\>,...]**, passing a list of previous job_ids; KPS then takes the key points from those jobs' results automatically.
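If you want to pass key points as strings (for example after editing them by hand), the list for `key_point_candidates` can be built from a previous summary. A hedged sketch: it assumes you take the key point column of the earlier summary (e.g. `list(kps_result_2016.summary_df['key_point'])`) and drop the aggregate statistics rows, which section 4.3 describes as starting with `*` (assumed here to mean the key_point text). `key_point_candidates` is a hypothetical helper, and the hard-coded list stands in for that column.

```python
def key_point_candidates(summary_key_points):
    """Return previous key point texts, skipping the aggregate statistics rows.

    Assumption: those rows are the ones whose text starts with '*'.
    """
    return [kp for kp in summary_key_points if not str(kp).startswith("*")]

previous = [
    "TRAFFIC NEEDS ADDRESSING ON ALL HWYS AND STREETS",
    "Improve affordable housing/living.",
    "*hypothetical statistics row",  # stand-in label, not a real summary_df value
]
run_params = {'key_point_candidates': key_point_candidates(previous)}
print(run_params)
```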

For simplicity, we'll run without the stance analysis. The incremental approach also works when running on both stances: provide the job_id of the 2016 pro analysis in run_params_pro and the job_id of the 2016 con analysis in run_params_con when running on all sentences from 2016+2017.
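For the per-stance variant, the two parameter dictionaries can be built from the 2016 per-stance job IDs printed in section 4.1. A sketch under assumptions: the dictionaries use the same `key_point_candidates_by_job_ids` format as the no-stance run in this section, and the commented call at the end shows an assumed signature (check it against the per-stance run earlier in the tutorial), not a confirmed one.

```python
# Per-stance 2016 job ids, as returned by kps_result_2016_merged.get_stance_to_job_id() in section 4.1.
stance_to_job_id_2016 = {'pro': '6478fb87f0128c554c89afcc', 'con': '6478fbebf0128c554c89afd0'}

# Each stance reuses the key points of the matching 2016 stance job.
run_params_pro = {'key_point_candidates_by_job_ids': [stance_to_job_id_2016['pro']]}
run_params_con = {'key_point_candidates_by_job_ids': [stance_to_job_id_2016['con']]}

# Assumed call (verify the method and argument names before use):
# keypoints_client.run_kps_job(domain, run_params_pro=run_params_pro, run_params_con=run_params_con)
print(run_params_pro, run_params_con)
```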

First, let's extract the job id from the result:

In [28]:
stance_to_job_id = kps_result_2016.get_stance_to_job_id()
print(stance_to_job_id)
job_id_2016 = stance_to_job_id["no-stance"]

{'no-stance': '6478f992f0128c554c89afbd'}


In [29]:
run_params = {'key_point_candidates_by_job_ids': [job_id_2016]}
kps_result_2016_2017 = keypoints_client.run_kps_job(domain, run_params=run_params)

2023-06-02 00:19:56,589 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-02 00:19:57,173 [INFO] keypoints_client.py 187: domain: austin_demo, comments status: {'processed_comments': 800, 'processed_sentences': 1510, 'pending_comments': 0}
2023-06-02 00:19:57,176 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-02 00:19:57,882 [INFO] keypoints_client.py 345: started a kp summarization job - domain: austin_demo, stance: no-stance, run_params: {'key_point_candidates_by_job_ids': ['6478f992f0128c554c89afbd']}, job_id: 64790b7df0128c554c89b04a
2023-06-02 00:19:57,884 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-02 00:19:58,467 [INFO] keypoints_client.py 631: job_id 64790b7df0128c554c89b04a is pending
2023-06-02 00:20:28,475 [INFO] keypoi

Stage 1/3: |--------------------------------------------------| 0.0% Complete



2023-06-02 00:21:29,735 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-02 00:21:30,371 [INFO] keypoints_client.py 635: job_id 64790b7df0128c554c89b04a is running, progress: {'total_stages': 3, 'stage_0': {'inferred_batches': 5, 'total_batches': 5, 'batch_size': 2000}, 'stage_1': {'inferred_batches': 3, 'total_batches': 8, 'batch_size': 2000}}


Stage 1/3: |██████████████████--------------------------------| 37.5% Complete



2023-06-02 00:22:00,378 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-02 00:22:00,978 [INFO] keypoints_client.py 635: job_id 64790b7df0128c554c89b04a is running, progress: {'total_stages': 3, 'stage_0': {'inferred_batches': 5, 'total_batches': 5, 'batch_size': 2000}, 'stage_1': {'inferred_batches': 8, 'total_batches': 8, 'batch_size': 2000}}


Stage 1/3: |██████████████████████████████████████████████████| 100.0% Complete




2023-06-02 00:22:30,988 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-02 00:22:32,896 [INFO] keypoints_client.py 638: job_id 64790b7df0128c554c89b04a is done, returning result


Now we can compare the results obtained on the 2016+2017 data with the results obtained on 2016, using the method **kps_result_2016_2017.compare_with_comment_subsets()**.
This method receives a dictionary mapping subset names to sets of comment ids, and compares the full result with the subset results. For the full result and for each of the subsets, the table shows the number and percentage of comments that match each key point. 

In [32]:
subsets_dict = {"2016":comments_ids_2016, "2017":comments_ids_2017}
comparison_df = kps_result_2016_2017.compare_with_comment_subsets(subsets_dict)
comparison_df

|   | key point | full_n_comments | full_precent | 2016_n_comments | 2016_precent | 2017_n_comments | 2017_precent |
|---|---|---|---|---|---|---|---|
| 0 | TRAFFIC NEEDS ADDRESSING ON ALL HWYS AND STREETS | 177 | 22.12% | 98 | 24.50% | 79 | 19.75% |
| 1 | Improve affordable housing/living. | 145 | 18.12% | 60 | 15.00% | 85 | 21.25% |
| 2 | Public transportation needs to improve. | 102 | 12.75% | 56 | 14.00% | 46 | 11.50% |
| 3 | BIKE LANES HURT TRAFFIC. | 94 | 11.75% | 58 | 14.50% | 36 | 9.00% |
| 4 | PROPERTY TAXES ARE TOO HIGH. | 77 | 9.62% | 33 | 8.25% | 44 | 11.00% |
| 5 | Integrated transportation is critical. | 57 | 7.12% | 37 | 9.25% | 20 | 5.00% |
| 6 | Buying a home in S Austin is prohibitive. | 54 | 6.75% | 19 | 4.75% | 35 | 8.75% |
| 7 | Utility "fees" create great hardship on the poor. | 41 | 5.12% | 16 | 4.00% | 25 | 6.25% |
| 8 | Don't let Austin become Houston with overdevel... | 37 | 4.62% | 17 | 4.25% | 20 | 5.00% |
| 9 | Water costs too much. | 37 | 4.62% | 17 | 4.25% | 20 | 5.00% |
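The percentage columns in the table above are each subset's matched count divided by that subset's size (400 comments per year, 800 in the full domain). Because the 2016 and 2017 subsets are disjoint and together make up the whole domain, each full count is the sum of the two subset counts (98 + 79 = 177 for the traffic key point). A short sketch of this arithmetic; `percent` is a hypothetical helper that mimics the table's formatting.

```python
def percent(n_matched, n_total):
    """Percentage of a subset's comments matched to a key point, formatted like the table."""
    return f"{100 * n_matched / n_total:.2f}%"

sizes = {"2016": 400, "2017": 400}  # comments sampled per year
traffic = {"2016": 98, "2017": 79}  # TRAFFIC NEEDS ADDRESSING ON ALL HWYS AND STREETS

for name, n in traffic.items():
    print(name, percent(n, sizes[name]))  # 2016 24.50%, then 2017 19.75%

# Disjoint subsets covering the domain: the full count is their sum.
full_n = sum(traffic.values())
print("full", full_n, percent(full_n, sum(sizes.values())))  # full 177 22.12%
```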


Using this method, we can compare results over subsets of the data in the same domain. The subsets can be data from different geographies, organizations, time periods, user groups (e.g. promoters vs. detractors), etc.

### 5.2 Run *Key Point Summarization* incrementally on new data (2017 independently)
If you don't care about the 2016+2017 combination and only want to compare the 2016 and 2017 data, you can use the **comments_ids** parameter of the **run_kps_job** method to run on a subset of the comments in the domain. Let's do that and analyze the 2017 comments independently. We provide the key points from 2016, since we want to be able to compare the two results:

In [33]:
run_params = {'key_point_candidates_by_job_ids': [job_id_2016]}
kps_result_2017 = keypoints_client.run_kps_job(domain, run_params=run_params, comments_ids = comments_ids_2017)

2023-06-02 00:26:10,536 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-02 00:26:11,123 [INFO] keypoints_client.py 187: domain: austin_demo, comments status: {'processed_comments': 800, 'processed_sentences': 1510, 'pending_comments': 0}
2023-06-02 00:26:11,126 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-02 00:26:11,971 [INFO] keypoints_client.py 345: started a kp summarization job - domain: austin_demo, stance: no-stance, run_params: {'key_point_candidates_by_job_ids': ['6478f992f0128c554c89afbd']}, job_id: 64790cf4f0128c554c89b04b
2023-06-02 00:26:11,975 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-02 00:26:12,561 [INFO] keypoints_client.py 631: job_id 64790cf4f0128c554c89b04b is pending
2023-06-02 00:26:42,568 [INFO] keypoi

Stage 1/3: |█████████████████████████-------------------------| 50.0% Complete



2023-06-02 00:27:44,162 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-02 00:27:44,772 [INFO] keypoints_client.py 635: job_id 64790cf4f0128c554c89b04b is running, progress: {'total_stages': 3, 'stage_0': {'inferred_batches': 4, 'total_batches': 4, 'batch_size': 2000}, 'stage_1': {'inferred_batches': 1, 'total_batches': 2, 'batch_size': 2000}}


Stage 1/3: |█████████████████████████-------------------------| 50.0% Complete



2023-06-02 00:28:14,775 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-02 00:28:16,506 [INFO] keypoints_client.py 638: job_id 64790cf4f0128c554c89b04b is done, returning result


Now we can compare kps_result_2016 with kps_result_2017 using the *compare_with_other_results* method. This method receives a title for the current result and a dictionary mapping result names to KpsResult objects, and returns the comparison table. If the comparison is with a single other result, the change in percent is also displayed. 

In [34]:
kps_result_2016.compare_with_other_results(this_title="2016", other_results_dict = {"2017":kps_result_2017})

|   | key point | 2016_n_comments | 2016_precent | 2017_n_comments | 2017_precent | change_percent |
|---|---|---|---|---|---|---|
| 0 | TRAFFIC NEEDS ADDRESSING ON ALL HWYS AND STREETS | 99 | 24.75% | 80 | 20.00% | -4.75% |
| 1 | Improve affordable housing/living. | 60 | 15.00% | 86 | 21.50% | 6.50% |
| 2 | BIKE LANES HURT TRAFFIC. | 58 | 14.50% | 36 | 9.00% | -5.50% |
| 3 | Public transportation needs to improve. | 56 | 14.00% | 47 | 11.75% | -2.25% |
| 4 | Integrated transportation is critical. | 37 | 9.25% | 21 | 5.25% | -4.00% |
| 5 | PROPERTY TAXES ARE TOO HIGH. | 33 | 8.25% | 44 | 11.00% | 2.75% |
| 6 | Buying a home in S Austin is prohibitive. | 21 | 5.25% | 35 | 8.75% | 3.50% |
| 7 | Don't let Austin become Houston with overdevel... | 18 | 4.50% | 20 | 5.00% | 0.50% |
| 8 | Water costs too much. | 17 | 4.25% | 20 | 5.00% | 0.75% |
| 9 | Keep Austin clean & in compliance. | 17 | 4.25% | 12 | 3.00% | -1.25% |


This allows us to compare the results of different KPS jobs that used the same set of key points. For example, in 2017 people complained more about affordable housing than in 2016, and less about traffic. 
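In the table above, change_percent is the difference between the two percentage columns, i.e. a change in percentage points rather than a relative change (for traffic, 20.00% - 24.75% = -4.75). A sketch that recomputes it from the counts of a few rows (400 comments per year) and ranks the key points by change; `change_points` is a hypothetical helper.

```python
# (2016 count, 2017 count) for some of the key points in the table above.
counts = {
    "TRAFFIC NEEDS ADDRESSING ON ALL HWYS AND STREETS": (99, 80),
    "Improve affordable housing/living.": (60, 86),
    "BIKE LANES HURT TRAFFIC.": (58, 36),
    "Public transportation needs to improve.": (56, 47),
    "PROPERTY TAXES ARE TOO HIGH.": (33, 44),
}
N_2016 = N_2017 = 400

def change_points(n_old, n_new):
    """Difference of the two percentages, in percentage points."""
    return 100 * n_new / N_2017 - 100 * n_old / N_2016

changes = {kp: change_points(*c) for kp, c in counts.items()}
# Largest increase first, largest decrease last.
for kp, delta in sorted(changes.items(), key=lambda item: item[1], reverse=True):
    print(f"{delta:+.2f}  {kp}")
```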

## 6. Running KPS in async mode

It is also possible to upload comments and run kps jobs in an asynchronous manner. This can be useful when you want to start several jobs simultaneously, and then later collect the results.

Let's create a new domain for the sake of the demonstration. Note that this is not required, as async and sync calls can be used on the same domain.

In [35]:
domain = 'austin_demo_async'
keypoints_client.delete_domain_cannot_be_undone(domain=domain)
keypoints_client.create_domain(domain=domain, domain_params={})

2023-06-02 00:31:26,934 [INFO] keypoints_client.py 66: client calls service (delete): https://keypoint-matching-backend.debater.res.ibm.com/data
2023-06-02 00:31:27,457 [ERROR] keypoints_client.py 77: There is a problem with the request (422): user: db0a12 doesn't have domain: austin_demo_async
2023-06-02 00:31:27,461 [INFO] keypoints_client.py 413: domain: austin_demo_async doesn't exist.
2023-06-02 00:31:27,464 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/domains
2023-06-02 00:31:28,425 [INFO] keypoints_client.py 113: created domain: austin_demo_async with domain_params: {}


### 6.1 Uploading comments asynchronously

To upload comments asynchronously, use the *upload_comments_async* method: 

In [36]:
comments_texts = list(comments_2016_sample_df['text'])
comments_ids = list(comments_2016_sample_df['id'].astype(str))
keypoints_client.upload_comments_async(domain=domain, comments_ids=comments_ids, comments_texts=comments_texts)

2023-06-02 00:33:42,989 [INFO] keypoints_client.py 158: uploading 400 comments in batches
2023-06-02 00:33:42,996 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-02 00:33:43,956 [INFO] keypoints_client.py 174: uploaded 400 comments, out of 400


The method uploads the comments and returns immediately, but all comments must finish processing before a KPS job can start. This can be checked with the *are_all_comments_processed* method, which prints the processing status and returns True once the domain is ready for jobs:

In [37]:
res = keypoints_client.are_all_comments_processed(domain)
print(res)

2023-06-02 00:33:47,762 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-02 00:33:48,374 [INFO] keypoints_client.py 187: domain: austin_demo_async, comments status: {'processed_comments': 0, 'processed_sentences': 0, 'pending_comments': 400}


False


Alternatively, use the *keypoints_client.wait_till_all_comments_are_processed(domain=domain)* method, which returns only after all comments have been processed:

In [38]:
keypoints_client.wait_till_all_comments_are_processed(domain=domain) 

2023-06-02 00:33:53,472 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-02 00:33:54,105 [INFO] keypoints_client.py 187: domain: austin_demo_async, comments status: {'processed_comments': 0, 'processed_sentences': 0, 'pending_comments': 400}
2023-06-02 00:34:04,114 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-02 00:34:04,755 [INFO] keypoints_client.py 187: domain: austin_demo_async, comments status: {'processed_comments': 400, 'processed_sentences': 727, 'pending_comments': 0}
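If you want an explicit upper bound on the wait in your own code, a polling loop around *are_all_comments_processed* is one option. A minimal sketch, assuming the method returns a bool as shown above; `wait_for_comments`, the timeout and the poll interval are hypothetical choices for this example, not client features.

```python
import time

def wait_for_comments(client, domain, timeout_s=600, poll_s=10):
    """Poll are_all_comments_processed(domain) until it returns True or timeout_s elapses.

    Returns True if the domain became ready, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if client.are_all_comments_processed(domain):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)

# Usage:
# if not wait_for_comments(keypoints_client, domain, timeout_s=300):
#     raise RuntimeError(f"comments in {domain} are still being processed")
```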


### 6.2 Running KPS jobs asynchronously

To start a job asynchronously, use the *run_kps_job_async* method. This method receives the same arguments as *run_kps_job*, but returns a future object right after the job is sent to the server:

In [39]:
future = keypoints_client.run_kps_job_async(domain=domain)

2023-06-02 00:34:09,908 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2023-06-02 00:34:10,562 [INFO] keypoints_client.py 187: domain: austin_demo_async, comments status: {'processed_comments': 400, 'processed_sentences': 727, 'pending_comments': 0}
2023-06-02 00:34:10,564 [INFO] keypoints_client.py 66: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-02 00:34:11,222 [INFO] keypoints_client.py 345: started a kp summarization job - domain: austin_demo_async, stance: no-stance, run_params: None, job_id: 64790ed3f0128c554c89b04e


Use the returned future to wait until the results are available with **kps_result = future.get_result()**. The method waits for the job to finish and then returns the result.

In [40]:
kps_result_async = future.get_result(high_verbosity=True)

2023-06-02 00:34:23,587 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-02 00:34:24,207 [INFO] keypoints_client.py 635: job_id 64790ed3f0128c554c89b04e is running, progress: not updated yet
2023-06-02 00:34:54,216 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-02 00:34:54,820 [INFO] keypoints_client.py 635: job_id 64790ed3f0128c554c89b04e is running, progress: {'total_stages': 2, 'stage_1': {'inferred_batches': 7, 'total_batches': 7, 'batch_size': 2000}}


Stage 1/2: |██████████████████████████████████████████████████| 100.0% Complete




2023-06-02 00:35:24,824 [INFO] keypoints_client.py 66: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2023-06-02 00:35:26,604 [INFO] keypoints_client.py 638: job_id 64790ed3f0128c554c89b04e is done, returning result


The future object can also be used to obtain the job_id, via the **future.get_job_id()** method. 
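The async API is what makes it possible to start several jobs and collect them later. A sketch under assumptions: it uses only `run_kps_job_async(domain=...)`, `get_job_id()` and `get_result()` as described in this section, submits every job before waiting on any result, and records the job IDs first so they can be recovered via section 3.4 if the process dies. `run_jobs_concurrently` is a hypothetical helper.

```python
def run_jobs_concurrently(client, domains):
    """Start one async KPS job per domain, then collect all results.

    All jobs are submitted before any result is awaited, so no job waits for
    another to finish before it starts. Returns {domain: (job_id, kps_result)}.
    """
    futures = {domain: client.run_kps_job_async(domain=domain) for domain in domains}
    # Record job ids up front so results can still be fetched if this process dies.
    job_ids = {domain: future.get_job_id() for domain, future in futures.items()}
    print("started:", job_ids)
    return {domain: (job_ids[domain], future.get_result()) for domain, future in futures.items()}

# Usage:
# results = run_jobs_concurrently(keypoints_client, ['austin_demo', 'austin_demo_async'])
```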

## 7. Cleanup
If you have finished the tutorial and no longer need the domains and the results, cleaning up is always advisable:

In [41]:
keypoints_client.delete_domain_cannot_be_undone(domain='austin_demo')
keypoints_client.delete_domain_cannot_be_undone(domain='austin_demo_full')
keypoints_client.delete_domain_cannot_be_undone(domain='austin_demo_async')

2023-06-02 00:36:12,222 [INFO] keypoints_client.py 66: client calls service (delete): https://keypoint-matching-backend.debater.res.ibm.com/data
2023-06-02 00:36:14,027 [INFO] keypoints_client.py 408: domain: austin_demo was deleted
2023-06-02 00:36:14,029 [INFO] keypoints_client.py 66: client calls service (delete): https://keypoint-matching-backend.debater.res.ibm.com/data
2023-06-02 00:36:14,586 [ERROR] keypoints_client.py 77: There is a problem with the request (422): user: db0a12 doesn't have domain: austin_demo_full
2023-06-02 00:36:14,591 [INFO] keypoints_client.py 413: domain: austin_demo_full doesn't exist.
2023-06-02 00:36:14,592 [INFO] keypoints_client.py 66: client calls service (delete): https://keypoint-matching-backend.debater.res.ibm.com/data
2023-06-02 00:36:15,691 [INFO] keypoints_client.py 408: domain: austin_demo_async was deleted


{'status': 'success'}

## 8. Conclusion
In this tutorial, we showed how to use the *Key Point Summarization* service and how it provides detailed insights into survey data out of the box, significantly reducing the effort required by a data scientist to analyze the data. We also demonstrated key *Key Point Summarization* features, such as modifying the analysis parameters to increase coverage, using the stance model to create per-stance results, and incrementally adding new data.

Feel free to contact us for questions or assistance: *yoavka@il.ibm.com*