# Key Point Summarization
## Extracting meaningful insights from reviews, surveys, and customer feedback

When dealing with large collections of text representing people's opinions, such as product reviews, survey responses, customer feedback, or social media posts, understanding the key issues within the data can be challenging. Manually reviewing thousands of comments is time-consuming and cost-prohibitive.
Existing automated approaches are typically limited to identifying recurring phrases or concepts and gauging overall sentiment. While useful, these methods rarely provide detailed, actionable insights.
Key Point Summarization maps the input texts to a set of automatically extracted short sentences and phrases, termed Key Points, which provide a concise plain-text summary of the data. The prevalence of each key point can be quantified as the number of sentences that match it.
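To make the notion of prevalence concrete, here is a minimal sketch that counts, for each key point, how many distinct sentences match it. It assumes the matches are already available as hypothetical `(sentence, key point)` pairs; the helper `key_point_prevalence` and the toy data are made up for illustration and do not reflect KPS's internal format.

```python
from collections import defaultdict

# Hypothetical (sentence, key point) matches, hard-coded for illustration only.
# Note that one sentence can match more than one key point.
matches = [
    ("Traffic is terrible downtown.", "Traffic congestion is a major problem."),
    ("It takes me an hour to get to work.", "Traffic congestion is a major problem."),
    ("Rents keep going up every year.", "Housing is unaffordable."),
    ("The commute is awful and housing costs too much.", "Traffic congestion is a major problem."),
    ("The commute is awful and housing costs too much.", "Housing is unaffordable."),
]

def key_point_prevalence(matches):
    """Count the distinct sentences matched to each key point, most prevalent first."""
    sentences_per_kp = defaultdict(set)
    for sentence, key_point in matches:
        sentences_per_kp[key_point].add(sentence)
    counts = {kp: len(sentences) for kp, sentences in sentences_per_kp.items()}
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

for key_point, count in key_point_prevalence(matches):
    print(f"{count:3d}  {key_point}")
```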

In this tutorial, you will gain hands-on experience using Key Point Summarization (KPS) to analyze and derive insights from free-text feedback.
The data we will use is [a community survey conducted in the city of Austin](https://data.austintexas.gov/dataset/Community-Survey/s2py-ceb7). In this survey, the citizens of Austin were asked "If there was ONE thing you could share with the Mayor regarding the City of Austin (any comment, suggestion, etc.), what would it be?".

KPS uses fine-tuned language models to analyze the data. Running this tutorial effectively therefore requires an environment based on runtime 24.1 with a GPU.


## Learning goal

The goal of this notebook is to demonstrate how Key Point Summarization can be used to extract meaningful insights from reviews, surveys, and customer feedback.

## Contents

This notebook contains the following parts:

- [Initialization and data loading for analysis](#setup)
- [Examining and saving results](#exam)


<a id="setup"></a>
## Initialization and data loading for analysis
The first step is to start the KPS backend, a service that runs in the background and performs the analysis.

In [None]:
from keypoint_matching.BackendRunnerWatsonStudio import BackendRunnerWatsonStudio
runner = BackendRunnerWatsonStudio()
runner.start_backend()

Next, we create a client that connects to the backend and communicates with it.

In [None]:
from key_points_summarization_api.api.clients.keypoints_client import KpsClientWatsonStudio
client = KpsClientWatsonStudio()

Let's run a self-check to make sure everything is configured correctly and working.
`client.run_self_check()` outputs `{'status': 'UP'}` when everything is working, or `{'status': 'DOWN'}` if a problem is detected. The output also shows whether GPUs are being used.

In [None]:
client.run_self_check()
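The self-check result can also be checked in code, for example to stop the notebook early when the backend is down. This is a minimal sketch assuming, based on the description above, that `run_self_check()` returns a dict with a `'status'` key; the helper name `backend_is_up` is hypothetical.

```python
def backend_is_up(self_check_result):
    """Return True if a self-check result reports {'status': 'UP'}.

    Assumes the result is a dict with a 'status' key; any other keys
    (such as GPU details) are ignored.
    """
    return isinstance(self_check_result, dict) and self_check_result.get("status") == "UP"

# Hypothetical usage, assuming run_self_check() returns that dict:
# if not backend_is_up(client.run_self_check()):
#     raise RuntimeError("KPS backend is DOWN; check the runtime and GPU configuration")
```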

Let's read the data from the `dataset_austin.csv` file, which holds the Austin survey dataset.

In [None]:
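import pandas as pd

def get_comments_texts(max_length=3000):
    # Load the Austin community survey responses
    url = "https://raw.githubusercontent.com/IBMDataScience/sample-notebooks/master/Files/Data/dataset_austin.csv"
    df = pd.read_csv(url)
    # Drop missing responses so they are not converted to the string 'nan'
    texts = [str(text) for text in df['text'].dropna().tolist()]
    # Keep only comments shorter than max_length characters
    texts = [text for text in texts if len(text) < max_length]
    return texts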
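The filtering steps of `get_comments_texts`, with missing responses dropped explicitly, can be checked without downloading the CSV by applying them to a small in-memory DataFrame. The helper `clean_texts` and the sample rows below are made up for illustration and are not taken from the survey.

```python
import pandas as pd

def clean_texts(df, max_length=3000):
    """Drop missing responses and keep only texts shorter than max_length characters."""
    texts = [str(text) for text in df['text'].dropna().tolist()]
    return [text for text in texts if len(text) < max_length]

# Made-up rows: one normal comment, one missing value, one overly long comment
sample = pd.DataFrame({'text': ["More bike lanes, please.", None, "x" * 3500, "Fix the potholes."]})
cleaned = clean_texts(sample)
print(cleaned)
```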

Let's first load the data into a list of strings and limit the number of comments for quick analysis:

In [None]:
texts = get_comments_texts()
print(f'There are {len(texts)} comments in the dataset')

limit_comments = 500
texts = texts[:limit_comments]  # keep only the first 500 comments for a quick run
print(f'Analyzing {len(texts)} comments')
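Taking the first `limit_comments` entries is the quickest option, but if the CSV happens to be ordered (for example by date or district), the first rows may not be representative. A seeded random sample is one alternative; this is a sketch, and `sample_comments` and its `seed` parameter are hypothetical names.

```python
import random

def sample_comments(texts, limit=500, seed=42):
    """Return up to `limit` comments drawn at random, reproducibly for a given seed."""
    if len(texts) <= limit:
        return list(texts)
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    return rng.sample(texts, limit)

# Hypothetical alternative to texts[:limit_comments]:
# texts = sample_comments(get_comments_texts(), limit=limit_comments)
```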

## Creating a summarization
We will now analyze the comments using the `client.run_full_kps_flow` method. This may take a while; the more comments there are, the longer it takes.

Comments are analyzed within the scope of a domain, which temporarily stores the data.
A user can create several domains, one for each dataset.

In [None]:
domain = 'austin_test'  # describes the dataset
kps_result = client.run_full_kps_flow(domain, texts)
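Because each dataset gets its own domain, several datasets can be summarized by calling `run_full_kps_flow` once per domain. This sketch uses only the `client.run_full_kps_flow(domain, texts)` call shown above; the helper `summarize_per_domain` and the example domain names are hypothetical.

```python
def summarize_per_domain(client, datasets):
    """Run the full KPS flow once per dataset, each in its own domain.

    `datasets` maps a domain name to that dataset's list of comment strings.
    Returns a dict mapping each domain name to its result object.
    """
    results = {}
    for domain, texts in datasets.items():
        results[domain] = client.run_full_kps_flow(domain, texts)
    return results

# Hypothetical usage with two made-up domain names:
# results = summarize_per_domain(client, {'austin_2023': texts_2023, 'austin_2024': texts_2024})
```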

<a id="exam"></a>
## Examining and saving results
Results are now available. Let's print a summary of the analysis.

In [None]:
kps_result.print_result(n_sentences_per_kp=3, title="Austin sample", n_top_kps=40)
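The `n_top_kps` and `n_sentences_per_kp` parameters appear to limit how many key points are shown and how many example sentences accompany each. The following is a hypothetical illustration of that selection on toy data, ranking key points by prevalence (number of matching sentences); it is not the KPS implementation, and the helper `top_key_points` is made up. It takes the first sentences in list order, whereas KPS may choose examples differently.

```python
def top_key_points(kp_to_sentences, n_top_kps=40, n_sentences_per_kp=3):
    """Return (key point, prevalence, example sentences) for the most prevalent key points."""
    ranked = sorted(kp_to_sentences.items(), key=lambda item: len(item[1]), reverse=True)
    # Keep the top key points and the first few matching sentences of each
    return [(kp, len(sents), sents[:n_sentences_per_kp]) for kp, sents in ranked[:n_top_kps]]

# Toy key points and sentences, made up for illustration
toy = {
    "Housing is unaffordable.": ["Rent is too high.", "I can't afford a home.", "Prices keep rising.", "Taxes push people out."],
    "Public transit is lacking.": ["We need more buses."],
}
for kp, count, examples in top_key_points(toy, n_top_kps=2, n_sentences_per_kp=2):
    print(count, kp, examples)
```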

We can also export the results to files: a summary file, the full analysis, and a user-friendly DOCX report.

In [None]:
import os
output_dir = f'./kps_results/{domain}/'
os.makedirs(output_dir, exist_ok=True)
kps_result.export_to_all_outputs(output_dir=output_dir, result_name=domain)
!ls -al {output_dir}
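The `!ls -al` line is an IPython shell escape, so it only works inside Jupyter and on systems that provide `ls`. A plain-Python alternative using `pathlib` is sketched below; the helper name `list_output_files` is hypothetical, and it makes no assumption about which file names KPS writes.

```python
from pathlib import Path

def list_output_files(output_dir):
    """Return (name, size in bytes) for each regular file in output_dir, sorted by name."""
    return [(p.name, p.stat().st_size) for p in sorted(Path(output_dir).iterdir()) if p.is_file()]

# Usage after export_to_all_outputs:
# for name, size in list_output_files(output_dir):
#     print(f"{size:>10}  {name}")
```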

When we're done, we can stop the backend for a clean termination.

In [None]:
runner.stop_backend()
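If a cell fails midway, the backend may be left running. One way to guarantee a clean termination is to wrap the start and stop calls in a context manager. This sketch uses only the `start_backend()` and `stop_backend()` methods shown in this notebook; the helper name `kps_backend` is hypothetical.

```python
from contextlib import contextmanager

@contextmanager
def kps_backend(runner):
    """Start the backend on entry and always stop it on exit, even after an error.

    `runner` is any object with start_backend() and stop_backend() methods,
    such as the BackendRunnerWatsonStudio instance created above.
    """
    runner.start_backend()
    try:
        yield runner
    finally:
        runner.stop_backend()

# Hypothetical usage:
# with kps_backend(BackendRunnerWatsonStudio()):
#     client = KpsClientWatsonStudio()
#     kps_result = client.run_full_kps_flow(domain, texts)
```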

### Authors:
- **Yoav Katz**
- **Roy Bar-Haim**
- **Yoav Kantor**
- **Lilach Edelstein**

Copyright © 2024 IBM. This notebook and its source code are released under the terms of the MIT License.