# Using the *Key Point Analysis* service to analyze and find insights in survey data
When you have a large collection of texts representing people’s opinions (such as product reviews, survey answers, or social media posts), it is difficult to understand the key issues that come up in the data. Going over thousands of comments manually is prohibitively expensive. Existing automated approaches are often limited to identifying recurring phrases or concepts and the overall sentiment toward them, but do not provide detailed or actionable insights.

In this tutorial you will gain hands-on experience in using *Key Point Analysis* (KPA) for analyzing and deriving insights from open-ended answers.  

The data we will use is a community survey conducted in the city of Austin (https://data.world/cityofaustin/mf9f-kvkk). In this survey, the citizens of Austin were asked "If there was ONE thing you could share with the Mayor regarding the City of Austin (any comment, suggestion, etc.), what would it be?".

## 1. Run *Key Point Analysis*

### 1.1 Read the data and run *key point analysis* over it
Let's read the data from the *dataset_austin.csv* file, which holds the Austin survey dataset, and print the first 5 comments.

In [1]:
import csv
import random


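# newline='' is the csv module's recommended way to open CSV files
with open('./dataset_austin.csv', newline='') as csv_file:
    reader = csv.DictReader(csv_file)
    # each row becomes a plain dict with the keys 'id' and 'comment_text'
    comments = [dict(row) for row in reader]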

print(f'There are {len(comments)} comments in the dataset')
print(comments[:5])

There are 3187 comments in the dataset
[{'id': '0', 'comment_text': "Dissatisfied traffic and with traffic, timing of street lights.  EXTREMELY dissatisfied with cit govt. interfering in local businesses (Uber/Lyft, income property owners).  Also, extremely dissatisfied with all the free handouts to people who are perfectly capable of earning their own money.  I'm very dissatisfied with the liberal leaning local politicians."}, {'id': '1', 'comment_text': 'Maintenance of city facilities needs to be equitable across the city. We need to think long-term; Austin can\'t sustain it\'s current level of "chic" indefinitely. What are we going to do when the cool beautiful people move to a community with more shiny dime-store objects to lure them? Long after the current boom goes bust (and it always does) the rest of us real Austinites will still be here. It really is time to stop sacrificing the quality of life on the east side of town (or any of the  less-privileged parts of town) for the wan

Each comment is a dictionary with a unique 'id' and a 'comment_text'. Let's randomly sample 1000 comments. The *Key Point Analysis* service can run over hundreds of thousands of sentences; however, since the computation is resource-intensive (particularly in GPUs), the trial version is limited to 1000 comments. You may request an increase of this limit if needed.

In [2]:
random.seed(0)
random_sample_comments = random.sample(comments, 1000)
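
Note that `random.sample` raises a `ValueError` when asked for more items than the list holds, so for smaller datasets it can help to cap the sample size. The following is a minimal sketch of that idea; `sample_comments` is a hypothetical helper (not part of the KPA client), and it uses a local `random.Random` instance so the global random state is left untouched:

```python
import random


def sample_comments(comments, max_comments=1000, seed=0):
    """Return up to max_comments randomly chosen comments, reproducibly."""
    rng = random.Random(seed)  # local generator; the global random state is untouched
    sample_size = min(max_comments, len(comments))
    return rng.sample(comments, sample_size)


# Toy data standing in for the survey comments.
toy_comments = [{'id': str(i), 'comment_text': f'comment {i}'} for i in range(10)]
small_sample = sample_comments(toy_comments, max_comments=3)
everything = sample_comments(toy_comments, max_comments=1000)
print(len(small_sample), len(everything))
```

With the toy data, the first call returns 3 comments and the second returns all 10 instead of failing.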

*Key point analysis* is a novel and promising approach for summarization, with an important quantitative angle. This service summarizes a collection of comments on a given topic as a small set of key points. The salience of each key point is given by the number of its matching sentences in the given comments.

Before running the *Key Point Analysis* service, we first need to initialize our client. The client prints information using the logger, so a suitable verbosity level should be set. The client object is configured with an API key, which should be retrieved from the [Project Debater Early Access Program](https://early-access-program.debater.res.ibm.com/) site. In the code below, it is passed via the environment variable *DEBATER_API_KEY*.

The *Key Point Analysis* service stores the data (and cached-results) in a *domain*. A user can create several domains, one for each dataset. Domains are only accessible to the user who created them.

Full documentation of the *Key Point Analysis* service can be found [here](https://early-access-program.debater.res.ibm.com/docs/services/keypoints/keypoints_pydoc.html).


In [3]:
from debater_python_api.api.clients.keypoints_client import KpAnalysisClient, KpAnalysisUtils, KpAnalysisTaskFuture
import os

KpAnalysisUtils.init_logger()
api_key = os.environ['DEBATER_API_KEY']
host = 'https://keypoint-matching-backend.debater.res.ibm.com'
keypoints_client = KpAnalysisClient(api_key, host)
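
Reading the key with `os.environ['DEBATER_API_KEY']` raises a bare `KeyError` when the variable is not set. If a more descriptive failure is preferred, something along these lines could be used; `read_api_key` is a hypothetical helper written for this tutorial, not part of `debater_python_api`:

```python
import os


def read_api_key(env=None, var_name='DEBATER_API_KEY'):
    """Return the API key from the environment, or raise a descriptive error."""
    env = os.environ if env is None else env
    api_key = env.get(var_name, '').strip()
    if not api_key:
        raise RuntimeError(
            f'Environment variable {var_name} is not set; '
            'export it before creating the client.'
        )
    return api_key


# Demonstration with a stand-in mapping instead of the real environment.
print(read_api_key(env={'DEBATER_API_KEY': 'dummy-key'}))
```

In the notebook, `api_key = read_api_key()` would then replace the direct `os.environ` lookup.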

In order to run *Key Point Analysis*, do the following steps:

1. Create a domain using the **keypoints_client.create_domain(domain=domain, domain_params={})** method. Several parameters can be passed in the domain_params dictionary when creating a domain, as described in the documentation. Leaving it empty gives us a good default behaviour. You can also use **KpAnalysisUtils.create_domain_ignore_exists(client=keypoints_client, domain=domain, domain_params={})** if you don't want an exception to be thrown when the domain already exists. Note that in this case the domain_params are not updated and remain as they were before.

In [4]:
domain = 'austin_demo'
KpAnalysisUtils.create_domain_ignore_exists(client=keypoints_client, domain=domain, domain_params={})

2022-07-31 15:27:19,127 [INFO] keypoints_client.py 426: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/domains
2022-07-31 15:27:19,639 [ERROR] keypoints_client.py 440: There is a problem with the request (422): domain: austin_demo already exist
2022-07-31 15:27:19,640 [INFO] keypoints_client.py 276: domain: austin_demo already exists, domain_params are NOT updated.


2. Upload the comments into the domain using the **keypoints_client.upload_comments(domain=domain, comments_ids=comments_ids, comments_texts=comments_texts)** method. This method receives the domain, a list of comments_ids and a list of comments_texts. When uploading comments into a domain, the *Key Point Analysis* service splits the comments into sentences and runs a minor cleansing on the sentences. If you have domain-specific knowledge and want to split the comments into sentences yourself, you can upload comments that are already split into sentences and set the *dont_split* parameter to True (in the domain_params when creating the domain); *Key Point Analysis* will then use the provided sentences as is.

Note that:
* comments_ids must be unique
* The number of comments_ids must match the number of comments_texts
* comments_texts must not be longer than 1000 characters
* Uploading the same comment several times is not a problem: a comment is identified by its domain and comment_id (the comment_text is ignored), so it is only uploaded once. If the comment_text is different, it is NOT updated.
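
The first three constraints above can be checked locally before calling the service. The sketch below is one way to do that; `validate_comments` is a hypothetical helper (not part of the client), and the 1000-character limit is the one stated in the list:

```python
def validate_comments(comments_ids, comments_texts, max_chars=1000):
    """Return a list of problems found; an empty list means the input looks uploadable."""
    problems = []
    if len(comments_ids) != len(comments_texts):
        problems.append(
            f'{len(comments_ids)} ids but {len(comments_texts)} texts'
        )
    if len(set(comments_ids)) != len(comments_ids):
        problems.append('comment ids are not unique')
    too_long = [cid for cid, text in zip(comments_ids, comments_texts)
                if len(text) > max_chars]
    if too_long:
        problems.append(f'texts longer than {max_chars} characters: {too_long}')
    return problems


print(validate_comments(['1', '2'], ['short text', 'another text']))
print(validate_comments(['1', '1'], ['ok', 'x' * 1001]))
```

The first call prints an empty list; the second reports both the duplicate id and the over-long text.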

In [5]:
# keep only comments that respect the 1000-character limit
comments = [c for c in random_sample_comments if len(c['comment_text']) <= 1000]
comments_texts = [comment['comment_text'] for comment in comments]
comments_ids = [comment['id'] for comment in comments]
keypoints_client.upload_comments(domain=domain, comments_ids=comments_ids, comments_texts=comments_texts)

2022-07-31 15:27:19,648 [INFO] keypoints_client.py 499: uploading 989 comments in batches
2022-07-31 15:27:19,649 [INFO] keypoints_client.py 426: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/comments
2022-07-31 15:27:20,687 [INFO] keypoints_client.py 513: uploaded 989 comments, out of 989


3. Wait for the uploaded comments to be processed. Processing takes some time and runs asynchronously. We can't run an analysis before this phase finishes, so we wait until all comments in the domain are processed using the **keypoints_client.wait_till_all_comments_are_processed(domain=domain)** method.

In [6]:
keypoints_client.wait_till_all_comments_are_processed(domain=domain)

2022-07-31 15:27:20,692 [INFO] keypoints_client.py 426: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2022-07-31 15:27:21,250 [INFO] keypoints_client.py 525: domain: austin_demo, comments status: {'processed_comments': 989, 'processed_sentences': 1821, 'pending_comments': 989}
2022-07-31 15:27:31,257 [INFO] keypoints_client.py 426: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2022-07-31 15:27:31,827 [INFO] keypoints_client.py 525: domain: austin_demo, comments status: {'processed_comments': 989, 'processed_sentences': 1821, 'pending_comments': 0}


4. Start a *Key Point Analysis* job using the **future = keypoints_client.start_kp_analysis_job(domain=domain, run_params=run_params)** method. This method receives the domain and *run_params*, a dictionary with various parameters for customizing the job. Leaving it empty gives us a good default behaviour. The job runs asynchronously, so the method returns a future object.

A few additional options when running an analysis job:
* The analysis is performed over all comments in the domain. If we need to run over a subset of the comments (for example, to split the data by geography, user type or timeframe), we can pass a list of comments_ids to the comments_ids parameter and the analysis will use only the provided comments.
* By default, key points are extracted automatically. When we want to provide key points and match all sentences to these key points, we can do so by passing them to the keypoints parameter: **run_params['keypoints'] = [...]**
* It is also possible to provide key points and let KPA add additional missing key points. To do so, pass the key points to the keypoint_candidates parameter: **run_params['keypoint_candidates'] = [...]**
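
As a small illustration of the last two options, the two dictionaries could be built as follows. The key point texts are made-up examples, not output of the service, and the snippet only constructs the dictionaries without contacting the service:

```python
# Key point texts below are made-up examples, not output of the service.
my_key_points = ['Traffic is too heavy', 'Housing is too expensive']

# Option 1: only match sentences to the key points we provide.
run_params_fixed = {'keypoints': my_key_points}

# Option 2: provide candidates and let KPA add key points it finds missing.
run_params_candidates = {'keypoint_candidates': my_key_points}

# Either dictionary would then be passed as run_params to
# keypoints_client.start_kp_analysis_job(domain=domain, run_params=...).
print(run_params_fixed)
print(run_params_candidates)
```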

In [7]:
run_params = {'n_top_kps': 20}  # limit number of key points to 20 for faster results
future = keypoints_client.start_kp_analysis_job(domain=domain, run_params=run_params)

2022-07-31 15:27:31,834 [INFO] keypoints_client.py 426: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2022-07-31 15:27:32,477 [INFO] keypoints_client.py 581: started a kp analysis job - domain: austin_demo, run_params: {'n_top_kps': 20}, job_id: 62e67534716f9a7436dcd640


5. Use the returned future and wait until the results are available using the **kpa_result = future.get_result()** method. The method waits for the job to finish and then returns the result. The result is a dictionary containing the key points (sorted in descending order of the number of matched sentences), and each key point has a list of matched sentences (sorted in descending order of match score). An additional 'none' key point holds all the sentences that don't match any key point.

In [8]:
kpa_result = future.get_result(high_verbosity=True, polling_timout_secs=30)

2022-07-31 15:27:32,488 [INFO] keypoints_client.py 426: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2022-07-31 15:27:32,994 [INFO] keypoints_client.py 762: job_id 62e67534716f9a7436dcd640 is pending
2022-07-31 15:28:03,000 [INFO] keypoints_client.py 426: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2022-07-31 15:28:04,545 [INFO] keypoints_client.py 769: job_id 62e67534716f9a7436dcd640 is done, returning result


Let's print the results:

In [9]:
from austin_utils import print_results
print_results(kpa_result, n_sentences_per_kp=2, title='Random sample')

Random sample coverage: 58.94
Random sample key points:
297 - traffic improvement needed
	- IMPROVE THE STREETS AND TRAFFIC LIGHT TIMING.
	- NEED BETTER TRAFFIC FLOW PLANNING.
227 - Austin needs better public transportation
	- TRANSPORTATION POOLING ETC.
	- PLEASE CONSIDER MORE PUBLIC TRANSPORTATION.
172 - Improve affordable housing/living.
	- IMPROVE THE COST OF LIVING IN AUSTIN.
	- BUILD MORE AFFORDABLY HOUSING
110 - Lower home taxes.
	- lower taxes and price of homes, stop being a sanctuary city
	- Quit giving tax breaks to businesses so you can quit raising property taxes.
69 - Cost of electricity is too high
	- Costs of energy are ridiculously high.
	- TRAFFIC AND COST OF ELECTRIC AND WATER.
66 - Improvement of overall safety in the city.
	- TRAFFIC CONGESTION AND SAFETY.
	- I enjoy living in Austin and I hope the city continues being a safe city.
60 - Keep enforcing local rules.
	- ENFORCE THE LAWS OF THIS CITY AND GET THE PANHANDLERS OFF OUR STREETS!
	- Or enforce the no camping
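
The `print_results` helper comes from `austin_utils`, which is not shown here. To make the numbers above more concrete, the sketch below computes a coverage figure and per-key-point counts from a toy result. It rests on two assumptions that should be checked against your own `kpa_result`: the field names (`keypoint_matchings`, `keypoint`, `matching`, `comment_id`, `sentence_id`) are assumed for illustration, and coverage is taken to be the percentage of distinct sentences matched to any key point other than 'none'.

```python
def summarize(result):
    """Return (coverage_percent, [(n_matches, key_point), ...]) for a result dict."""
    all_sentences = set()
    matched_sentences = set()
    salience = []
    for entry in result['keypoint_matchings']:
        # A sentence is identified by its comment id and its position in the comment.
        sentence_keys = {(m['comment_id'], m['sentence_id']) for m in entry['matching']}
        all_sentences |= sentence_keys
        if entry['keypoint'] != 'none':
            matched_sentences |= sentence_keys
            salience.append((len(sentence_keys), entry['keypoint']))
    coverage = 100.0 * len(matched_sentences) / len(all_sentences) if all_sentences else 0.0
    return coverage, sorted(salience, reverse=True)


# Toy result with an assumed structure; real results may be shaped differently.
toy_result = {
    'keypoint_matchings': [
        {'keypoint': 'traffic improvement needed',
         'matching': [{'comment_id': '1', 'sentence_id': 0, 'score': 0.99},
                      {'comment_id': '2', 'sentence_id': 0, 'score': 0.97}]},
        {'keypoint': 'Lower home taxes.',
         'matching': [{'comment_id': '3', 'sentence_id': 1, 'score': 0.98}]},
        {'keypoint': 'none',
         'matching': [{'comment_id': '4', 'sentence_id': 0, 'score': 0.0}]},
    ]
}
coverage, key_points = summarize(toy_result)
print(f'coverage: {coverage:.2f}')
for n_matches, key_point in key_points:
    print(n_matches, '-', key_point)
```

In the toy data, three of the four sentences are matched to a real key point, so the coverage is 75.00.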

We can also save the results to a file. This creates two files: one with the key points and all matched sentences, and a summary file with only the key points and their salience.

In [10]:
KpAnalysisUtils.write_result_to_csv(kpa_result, 'austin_survey_kpa_results.csv')

2022-07-31 15:28:04,578 [INFO] keypoints_client.py 115: Writing dataframe to: austin_survey_kpa_results_kps_summary.csv
2022-07-31 15:28:04,608 [INFO] keypoints_client.py 115: Writing dataframe to: austin_survey_kpa_results.csv


It is always possible to cancel a pending/running job in the following way:
* **keypoints_client.cancel_kp_extraction_job(\<Job Id\>)**

The job ID can be found in any of the following ways:
1. It is printed when a job is started
2. From the future object: **future.get_job_id()**
3. From the user report: **keypoints_client.get_full_report()** (see below)

It is also possible to stop all jobs in a domain, or even all jobs in all domains:
* **keypoints_client.cancel_all_extraction_jobs_for_domain(domain)**
* **keypoints_client.cancel_all_extraction_jobs_all_domains()**

Please cancel long jobs if the results are no longer needed.

### 1.3 Modify the run_params and increase coverage
Each domain has a cache that stores all intermediate results that are calculated during the analysis. Therefore, after modifying the run_params, another analysis runs much faster, since all overlapping intermediate results are retrieved from the cache.

Let's run again, but now reduce the **clustering_threshold** and **mapping_threshold**. The **clustering_threshold** is used for the key point selection (choose higher values for more fine-grained key points, and lower values for more distinct key points). The **mapping_threshold** is used when mapping all sentences to the final key points (a lower threshold leads to higher coverage, at the risk of lower precision).
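
To build some intuition for why a lower **mapping_threshold** raises coverage, here is a toy illustration. It is not the service's implementation: the match scores are made up, the stricter value 0.99 is chosen only for contrast, and the rule (map a sentence to its best-scoring key point if that score exceeds the threshold) is a simplified reading of the description above.

```python
def map_to_best_key_point(scores, mapping_threshold):
    """Map each sentence to its highest-scoring key point if the score passes the threshold."""
    mapping = {}
    for sentence, kp_scores in scores.items():
        best_kp, best_score = max(kp_scores.items(), key=lambda item: item[1])
        if best_score > mapping_threshold:
            mapping[sentence] = best_kp
    return mapping


# Made-up match scores: sentence -> {key point -> score}.
toy_scores = {
    'Fix the traffic lights': {'traffic': 0.995, 'taxes': 0.10},
    'Traffic and taxes are both bad': {'traffic': 0.97, 'taxes': 0.96},
    'Roads could be better': {'traffic': 0.96, 'taxes': 0.05},
    'I love the parks': {'traffic': 0.02, 'taxes': 0.01},
}

for threshold in (0.99, 0.95):
    mapped = map_to_best_key_point(toy_scores, threshold)
    coverage = 100 * len(mapped) / len(toy_scores)
    print(f'mapping_threshold={threshold}: coverage {coverage:.0f}%')
```

With these toy scores, one of four sentences is mapped at 0.99 and three of four at 0.95. The two extra matches have lower scores, which is where the precision risk comes from.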

In [11]:
run_params = {'n_top_kps': 20, 'clustering_threshold': 0.95, 'mapping_threshold': 0.95}  # limit number of key points to 20 for faster results
future = keypoints_client.start_kp_analysis_job(domain=domain, run_params=run_params)
kpa_result = future.get_result(high_verbosity=True, polling_timout_secs=30)

2022-07-31 15:28:04,647 [INFO] keypoints_client.py 426: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2022-07-31 15:28:05,262 [INFO] keypoints_client.py 581: started a kp analysis job - domain: austin_demo, run_params: {'n_top_kps': 20, 'clustering_threshold': 0.95, 'mapping_threshold': 0.95}, job_id: 62e67555716f9a7436dcd641
2022-07-31 15:28:05,263 [INFO] keypoints_client.py 426: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2022-07-31 15:28:05,772 [INFO] keypoints_client.py 762: job_id 62e67555716f9a7436dcd641 is pending
2022-07-31 15:28:35,777 [INFO] keypoints_client.py 426: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2022-07-31 15:28:37,267 [INFO] keypoints_client.py 769: job_id 62e67555716f9a7436dcd641 is done, returning result


In [12]:
print_results(kpa_result, n_sentences_per_kp=2, title='Random sample')

Random sample coverage: 66.58
Random sample key points:
314 - traffic improvement needed
	- IMPROVE THE STREETS AND TRAFFIC LIGHT TIMING.
	- NEED BETTER TRAFFIC FLOW PLANNING.
267 - Austin needs better public transportation
	- TRANSPORTATION POOLING ETC.
	- PLEASE CONSIDER MORE PUBLIC TRANSPORTATION.
187 - Improve affordable housing/living.
	- IMPROVE THE COST OF LIVING IN AUSTIN.
	- BUILD MORE AFFORDABLY HOUSING
117 - Lower home taxes.
	- lower taxes and price of homes, stop being a sanctuary city
	- Quit giving tax breaks to businesses so you can quit raising property taxes.
96 - Cost of electricity is too high
	- Costs of energy are ridiculously high.
	- TRAFFIC AND COST OF ELECTRIC AND WATER.
95 - Improvement of overall safety in the city.
	- TRAFFIC CONGESTION AND SAFETY.
	- I enjoy living in Austin and I hope the city continues being a safe city.
85 - Keep enforcing local rules.
	- ENFORCE THE LAWS OF THIS CITY AND GET THE PANHANDLERS OFF OUR STREETS!
	- Or enforce the no camping

By reducing the thresholds, the coverage increased from about 59% to about 67%.

## 2. Mapping sentences to multiple key points, and creating key points graphs
By default, each sentence is mapped to one key point at most (the key point with the highest match-score, above the **mapping_threshold**). We can run again and ask KPA to map each sentence to all key points with a match-score above the **mapping_threshold**, by adding the **sentence_to_multiple_kps** parameter.
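
The difference between the two modes can be illustrated with a toy mapping function. This is a sketch of the behaviour described above using made-up scores, not the service's own code; `map_sentences` is a hypothetical name that borrows the parameter names from the run_params.

```python
def map_sentences(scores, mapping_threshold, sentence_to_multiple_kps=False):
    """Return sentence -> list of matched key points, best match first."""
    mapping = {}
    for sentence, kp_scores in scores.items():
        passing = sorted(
            ((score, kp) for kp, score in kp_scores.items() if score > mapping_threshold),
            reverse=True,
        )
        if not passing:
            continue  # unmatched sentences would fall under the 'none' key point
        kept = passing if sentence_to_multiple_kps else passing[:1]
        mapping[sentence] = [kp for _, kp in kept]
    return mapping


# Made-up match scores: sentence -> {key point -> score}.
toy_scores = {
    'Traffic and taxes are both bad': {'traffic': 0.97, 'taxes': 0.96},
    'Fix the traffic lights': {'traffic': 0.99, 'taxes': 0.10},
}
print(map_sentences(toy_scores, 0.95))
print(map_sentences(toy_scores, 0.95, sentence_to_multiple_kps=True))
```

In the default mode both sentences are mapped only to 'traffic'; in the multiple mode the first sentence is mapped to both 'traffic' and 'taxes'.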

In [13]:
run_params = {'n_top_kps': 20, 'clustering_threshold': 0.95, 'mapping_threshold': 0.95, 'sentence_to_multiple_kps': True}
future = keypoints_client.start_kp_analysis_job(domain=domain, run_params=run_params)
kpa_result = future.get_result(high_verbosity=True, polling_timout_secs=30)

2022-07-31 15:28:37,285 [INFO] keypoints_client.py 426: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2022-07-31 15:28:37,860 [INFO] keypoints_client.py 581: started a kp analysis job - domain: austin_demo, run_params: {'n_top_kps': 20, 'clustering_threshold': 0.95, 'mapping_threshold': 0.95, 'sentence_to_multiple_kps': True}, job_id: 62e67575716f9a7436dcd642
2022-07-31 15:28:37,861 [INFO] keypoints_client.py 426: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2022-07-31 15:28:38,375 [INFO] keypoints_client.py 762: job_id 62e67575716f9a7436dcd642 is pending
2022-07-31 15:29:08,380 [INFO] keypoints_client.py 426: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2022-07-31 15:29:08,931 [INFO] keypoints_client.py 766: job_id 62e67575716f9a7436dcd642 is running, progress: not updated yet
2022-07-31 15:29:38,936 [INFO] keypoints_client.py 426: cl

In [14]:
print_results(kpa_result, n_sentences_per_kp=2, title='Random sample')

Random sample coverage: 65.34
Random sample key points:
346 - Address transportation problems NOW.
	- THE TRAFFIC ISSUE AND LACK OF LIGHT RAIL MUST BE ADDRESSED.
	- WORK ON TRAFFIC ISSUES
225 - Work on making Austin affordable again.
	- Make Austin affordable
	- More affordable housing
170 - Too much traffic!!
	- Please fix the traffic problems.
	- Please fix the traffic problems!
128 - Property taxes are outrageous.
	- PROPERTY TAXES ARE WAY TOO HIGH.
	- UTILITY RATES & PROPERTY TAXES ARE OUTRAGEOUS
102 - Spend our tax dollars wisely!!!
	- SPENDING OF TAX REVENUE.
	- STOP OVER DEVELOPMENT AND WASTING TAX PAYERS MONEY.
97 - Enforce the traffic laws.
	- ENFORCEMENT OF TRAFFIC LAWS TO KEEP TRAFFIC FLOWING.
	- Enforce traffic infractions.
95 - Improvement of overall safety in the city.
	- TRAFFIC CONGESTION AND SAFETY.
	- I enjoy living in Austin and I hope the city continues being a safe city.
76 - Traffic and city planning is poorly done.
	- Fix traffic congestion - better planning need

Now that sentences are mapped to multiple key points, it is possible to create a *key points graph*. To do so, first save the results as before, then translate the results file into a graph-data JSON file, and finally load this JSON file into our demo graph visualization, available at: [key points graph demo](https://keypoint-matching-ui.ris2-debater-event.us-east.containers.appdomain.cloud/)

In [15]:
result_file = 'austin_survey_kpa_results.csv'
KpAnalysisUtils.write_result_to_csv(kpa_result, result_file)
KpAnalysisUtils.create_graph_data_file_for_ui(result_file)

2022-07-31 15:29:40,596 [INFO] keypoints_client.py 115: Writing dataframe to: austin_survey_kpa_results_kps_summary.csv
2022-07-31 15:29:40,605 [INFO] keypoints_client.py 115: Writing dataframe to: austin_survey_kpa_results.csv
2022-07-31 15:29:40,639 [INFO] keypoints_client.py 357: Creating key points graph data-file for results file: austin_survey_kpa_results.csv
2022-07-31 15:29:40,639 [INFO] keypoints_client.py 332: reading file: austin_survey_kpa_results.csv
2022-07-31 15:29:40,731 [INFO] keypoints_client.py 390: saving graph in file: austin_survey_kpa_results_graph_data.json
2022-07-31 15:29:40,732 [INFO] keypoints_client.py 391: saving graph in file: austin_survey_kpa_results_graph_data.json


You can now go to the [key points graph demo](https://keypoint-matching-ui.ris2-debater-event.us-east.containers.appdomain.cloud/) and load the graph's data file **austin_survey_kpa_results_graph_data.json** into the UI.
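
How the graph edges are computed is not described in this tutorial. As a rough intuition only, one plausible rule is to connect key point A to key point B when a large share of A's sentences are also matched to B. The sketch below implements that assumed rule on made-up sentence ids; the function name, the 0.6 cut-off and the data are all hypothetical.

```python
from itertools import permutations


def key_point_edges(kp_to_sentences, min_overlap=0.6):
    """Return directed edges (source, target, overlap) between key points.

    overlap is the fraction of the source's sentences also matched to the target.
    """
    edges = []
    for source, target in permutations(kp_to_sentences, 2):
        source_sentences = kp_to_sentences[source]
        if not source_sentences:
            continue
        overlap = len(source_sentences & kp_to_sentences[target]) / len(source_sentences)
        if overlap >= min_overlap:
            edges.append((source, target, round(overlap, 2)))
    return edges


# Made-up sentence ids matched to each key point.
toy_matches = {
    'Address transportation problems NOW.': {1, 2, 3, 4, 5, 6},
    'Too much traffic!!': {1, 2, 3, 7},
    'Property taxes are outrageous.': {8, 9},
}
for edge in key_point_edges(toy_matches):
    print(edge)
```

Under this rule the more specific key point ('Too much traffic!!') points to the broader one, because three of its four sentences are shared, while the reverse direction falls below the cut-off.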

## 3. Run key point analysis on each stance separately

In many use cases (surveys, customer feedback, etc.) the comments have a positive and/or negative stance, and it is useful to run a separate KPA analysis on each stance. Most stance detection models don't perform well on survey data (or on customer feedback and similar data), since the comments tend to contain many "suggestions". Suggestions tend to appear positive to the model, while the user is actually pointing at something that needs improvement.
To that end, we trained a stance model that handles suggestions well and labels each sentence as 'Positive', 'Negative', 'Neutral' or 'Suggestion'. We usually treat suggestions as negatives and run two separate analyses: the first over 'Positive' sentences and the second over 'Negative' and 'Suggestion' sentences.

This has the following advantages:
* Creates a separate positive/negative summary that shows clearly what works well and what needs to be improved.
* Filters-out neutral sentences that usually don't contain valuable information.
* Helps the matching model avoid stance mistakes (matching a positive sentence to a negative key point and vice-versa).

Let's run again over the Austin survey dataset, but this time create two separate KPA analyses (positive and negative). We first need to create a new domain, this time adding the domain_param **do_stance_analysis**.

In [16]:
domain = 'austin_demo_two_stances'
domain_params = {'do_stance_analysis': True}
KpAnalysisUtils.create_domain_ignore_exists(client=keypoints_client, domain=domain, domain_params=domain_params)

2022-07-31 15:29:40,744 [INFO] keypoints_client.py 426: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/domains
2022-07-31 15:29:41,222 [ERROR] keypoints_client.py 440: There is a problem with the request (422): domain: austin_demo_two_stances already exist
2022-07-31 15:29:41,223 [INFO] keypoints_client.py 276: domain: austin_demo_two_stances already exists, domain_params are NOT updated.


Let's upload the comments to the new domain and wait for them to be processed. This time the sentences' stance is also calculated.

In [17]:
keypoints_client.upload_comments(domain=domain, comments_ids=comments_ids, comments_texts=comments_texts)
keypoints_client.wait_till_all_comments_are_processed(domain=domain)

2022-07-31 15:29:41,229 [INFO] keypoints_client.py 499: uploading 989 comments in batches
2022-07-31 15:29:41,231 [INFO] keypoints_client.py 426: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/comments
2022-07-31 15:29:42,298 [INFO] keypoints_client.py 513: uploaded 989 comments, out of 989
2022-07-31 15:29:42,299 [INFO] keypoints_client.py 426: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2022-07-31 15:29:42,877 [INFO] keypoints_client.py 525: domain: austin_demo_two_stances, comments status: {'processed_comments': 989, 'processed_sentences': 1821, 'pending_comments': 989}
2022-07-31 15:29:52,880 [INFO] keypoints_client.py 426: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/comments
2022-07-31 15:29:53,433 [INFO] keypoints_client.py 525: domain: austin_demo_two_stances, comments status: {'processed_comments': 989, 'processed_sentences': 1821, 'pending_comments': 0}


We can also download the processed sentences and save them into a csv if we want to examine the processed data.

In [18]:
sentences = keypoints_client.get_sentences_for_domain(domain=domain)
KpAnalysisUtils.write_sentences_to_csv(sentences, f'{domain}_sentences.csv')

2022-07-31 15:29:53,439 [INFO] keypoints_client.py 426: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/data
2022-07-31 15:29:55,066 [INFO] keypoints_client.py 712: returning 1821 sentences for domain austin_demo_two_stances


And now, run two analyses, one over the positive sentences and one over the negative + suggestions.

In [19]:
run_params['stances_to_run'] = ['pos']
run_params['stances_threshold'] = 0.5
future = keypoints_client.start_kp_analysis_job(domain=domain, run_params=run_params)
kpa_pos_result = future.get_result(high_verbosity=True, polling_timout_secs=30)
print_results(kpa_pos_result, n_sentences_per_kp=2, title='Random sample positives')

2022-07-31 15:29:57,443 [INFO] keypoints_client.py 426: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2022-07-31 15:29:58,102 [INFO] keypoints_client.py 581: started a kp analysis job - domain: austin_demo_two_stances, run_params: {'n_top_kps': 20, 'clustering_threshold': 0.95, 'mapping_threshold': 0.95, 'sentence_to_multiple_kps': True, 'stances_to_run': ['pos'], 'stances_threshold': 0.5}, job_id: 62e675c6716f9a7436dcd644
2022-07-31 15:29:58,103 [INFO] keypoints_client.py 426: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2022-07-31 15:29:58,606 [INFO] keypoints_client.py 762: job_id 62e675c6716f9a7436dcd644 is pending
2022-07-31 15:30:28,609 [INFO] keypoints_client.py 426: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2022-07-31 15:30:29,557 [INFO] keypoints_client.py 769: job_id 62e675c6716f9a7436dcd644 is done, returning result


Random sample positives coverage: 10.11
Random sample positives key points:
4 - CONTINUE TO SUPPORT THE EXCELLENT LIBRARIES.
	- Keep standing up for the liberal beliefs that make Austin so great.
	- The Austin Public Library is top of the line!
3 - KEEP FIGHTING FOR PROGRESSIVE IDEAS!
	- KEEP AUSTIN PROGRESSIVE!
	- Keep standing up for the liberal beliefs that make Austin so great.
2 - Keep enforcing local rules.
	- Good job regulating Uber!


As in many surveys, most comments are negative or suggestions, so the positive analysis is relatively limited. Let's see how the negative analysis goes.

In [20]:
run_params['stances_to_run'] = ['neg', 'sug']
run_params['stances_threshold'] = 0.5
future = keypoints_client.start_kp_analysis_job(domain=domain, run_params=run_params)
kpa_neg_result = future.get_result(high_verbosity=True, polling_timout_secs=30)
print_results(kpa_neg_result, n_sentences_per_kp=2, title='Random sample negatives')

2022-07-31 15:30:29,564 [INFO] keypoints_client.py 426: client calls service (post): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2022-07-31 15:30:30,245 [INFO] keypoints_client.py 581: started a kp analysis job - domain: austin_demo_two_stances, run_params: {'n_top_kps': 20, 'clustering_threshold': 0.95, 'mapping_threshold': 0.95, 'sentence_to_multiple_kps': True, 'stances_to_run': ['neg', 'sug'], 'stances_threshold': 0.5}, job_id: 62e675e6716f9a7436dcd645
2022-07-31 15:30:30,246 [INFO] keypoints_client.py 426: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2022-07-31 15:30:30,759 [INFO] keypoints_client.py 762: job_id 62e675e6716f9a7436dcd645 is pending
2022-07-31 15:31:00,763 [INFO] keypoints_client.py 426: client calls service (get): https://keypoint-matching-backend.debater.res.ibm.com/kp_extraction
2022-07-31 15:31:02,507 [INFO] keypoints_client.py 769: job_id 62e675e6716f9a7436dcd645 is done, returning resul

Random sample negatives coverage: 70.37
Random sample negatives key points:
310 - Address transportation problems NOW.
	- THE TRAFFIC ISSUE AND LACK OF LIGHT RAIL MUST BE ADDRESSED.
	- WORK ON TRAFFIC ISSUES
203 - Work on making Austin affordable again.
	- Make Austin affordable
	- More affordable housing
154 - Too much traffic!!
	- Please fix the traffic problems.
	- Please fix the traffic problems!
112 - Property taxes are outrageous.
	- PROPERTY TAXES ARE WAY TOO HIGH.
	- UTILITY RATES & PROPERTY TAXES ARE OUTRAGEOUS
96 - Spend our tax dollars wisely!!!
	- STOP OVER DEVELOPMENT AND WASTING TAX PAYERS MONEY.
	- Please keep taxes down!
82 - Enforce the traffic laws.
	- ENFORCEMENT OF TRAFFIC LAWS TO KEEP TRAFFIC FLOWING.
	- Enforce traffic infractions.
73 - Improvement of overall safety in the city.
	- I work downtown and do not feel safe walking around because of the large and growing
	  homeless population.
	- Bicycle lanes have negatively impacted travel and safety in this city.
72

With a coverage of about 70%, most of the sentences are matched to the 20 automatically extracted key points.

We can increase the **stances_threshold** when we want to run over fewer sentences, each with a stronger stance. This is useful when we have a large dataset with many less-relevant sentences and we want to filter them out.
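
The filtering effect of **stances_threshold** can be pictured with a toy example. How the service stores and compares stance scores is not covered here, so the data layout (`stance_scores` per sentence), the scores themselves and the selection rule below are assumptions made purely for illustration:

```python
def select_by_stance(sentences, stances_to_run, stances_threshold):
    """Keep sentences whose top-scoring stance is requested and confident enough."""
    selected = []
    for sentence in sentences:
        stance, score = max(sentence['stance_scores'].items(), key=lambda item: item[1])
        if stance in stances_to_run and score >= stances_threshold:
            selected.append(sentence['text'])
    return selected


# Made-up sentences and stance scores, for illustration only.
toy_sentences = [
    {'text': 'The libraries are excellent.',
     'stance_scores': {'pos': 0.92, 'neg': 0.03, 'sug': 0.05}},
    {'text': 'Traffic is terrible.',
     'stance_scores': {'pos': 0.02, 'neg': 0.90, 'sug': 0.08}},
    {'text': 'Please add more bus routes.',
     'stance_scores': {'pos': 0.15, 'neg': 0.25, 'sug': 0.60}},
]
print(select_by_stance(toy_sentences, ['neg', 'sug'], 0.5))
print(select_by_stance(toy_sentences, ['neg', 'sug'], 0.8))
```

At a threshold of 0.5 both the negative sentence and the suggestion are kept; at 0.8 only the strongly negative sentence remains.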

We can mark the stance in the results:

In [21]:
kpa_pos_result = KpAnalysisUtils.set_stance_to_result(kpa_pos_result, 'pos')
kpa_neg_result = KpAnalysisUtils.set_stance_to_result(kpa_neg_result, 'neg')

We can then save the results (both pos/neg separately and merged) and create the key points graph data files as we did before:

In [22]:
pos_result_file = 'austin_survey_pos_kpa_results.csv'
KpAnalysisUtils.write_result_to_csv(kpa_pos_result, pos_result_file)
KpAnalysisUtils.create_graph_data_file_for_ui(pos_result_file)

neg_result_file = 'austin_survey_neg_kpa_results.csv'
KpAnalysisUtils.write_result_to_csv(kpa_neg_result, neg_result_file)
KpAnalysisUtils.create_graph_data_file_for_ui(neg_result_file)

kpa_merged_result = KpAnalysisUtils.merge_two_results(kpa_pos_result, kpa_neg_result)
merged_result_file = 'austin_survey_merged_kpa_results.csv'
KpAnalysisUtils.write_result_to_csv(kpa_merged_result, merged_result_file)
KpAnalysisUtils.create_graph_data_file_for_ui(merged_result_file)

2022-07-31 15:31:02,526 [INFO] keypoints_client.py 115: Writing dataframe to: austin_survey_pos_kpa_results_kps_summary.csv
2022-07-31 15:31:02,548 [INFO] keypoints_client.py 115: Writing dataframe to: austin_survey_pos_kpa_results.csv
2022-07-31 15:31:02,553 [INFO] keypoints_client.py 357: Creating key points graph data-file for results file: austin_survey_pos_kpa_results.csv
2022-07-31 15:31:02,553 [INFO] keypoints_client.py 332: reading file: austin_survey_pos_kpa_results.csv
2022-07-31 15:31:02,569 [INFO] keypoints_client.py 390: saving graph in file: austin_survey_pos_kpa_results_graph_data.json
2022-07-31 15:31:02,570 [INFO] keypoints_client.py 391: saving graph in file: austin_survey_pos_kpa_results_graph_data.json
2022-07-31 15:31:02,647 [INFO] keypoints_client.py 115: Writing dataframe to: austin_survey_neg_kpa_results_kps_summary.csv
2022-07-31 15:31:02,679 [INFO] keypoints_client.py 115: Writing dataframe to: austin_survey_neg_kpa_results.csv
2022-07-31 15:31:02,757 [INFO] k

We can get a report with all the information about our account. This is useful when we want to see which domains we have (and perhaps delete old ones that are no longer needed), or to review past and present analysis jobs and use their job_id to fetch their results
(**KpAnalysisTaskFuture(keypoints_client, \<job_id\>).get_result()**).

In [None]:
report = keypoints_client.get_full_report()
KpAnalysisUtils.print_report(report)

## 4. Conclusion
In this tutorial, we showed how *Key Point Analysis* is used, and how it provides detailed insights over survey data right out of the box, significantly reducing the effort required by a data scientist to analyze the data. We also demonstrated several important *Key Point Analysis* features: how to modify the analysis parameters to increase coverage, how to use the stance model to create per-stance results, and how to create a *key points graph* to further improve the quality and clarity of the results.