In the below code, we have imported two libraries libraries, summa and spacy, for text summarization, as well as json.

1. summa Library:
The summa library is a Python package that provides various text summarization methods. It is used to generate a summary of a given text document.

2. json Library:
The json library is a built-in Python library used for working with JSON (JavaScript Object Notation) data. It provides functions for encoding, decoding, and manipulating JSON data.

3. spacy Library:
The spacy library is an open-source natural language processing (NLP) library for Python. It provides various functionalities for processing and analyzing text data.

In [40]:
from summa import summarizer
import json
import spacy as spacy

The below code defines a function called summarize_reviews that takes two parameters: reviews and target_words. This function aims to generate a summary of a collection of reviews.

1. Concatenating Reviews:
The function begins by concatenating the individual review texts into a single document. The reviews parameter is expected to be a list or iterable containing the review texts. The join() function is used to concatenate the texts, separating them with a space. The resulting concatenated text is stored in the document variable.

2. Generating the Summary:
The code then utilizes the summarizer module from the summa library to generate the summary. The summarizer.summarize() function is called with two arguments: the document variable containing the concatenated text and the target_words parameter.

3. Returning the Summary:
Finally, the function returns the generated summary using the return statement. The summary can be captured and used by the calling code.

In [41]:
def summarize_reviews(reviews, target_words):
    # Concatenate the review texts into a single document
    document = ' '.join(reviews)
    
    # Generate the summary using the Summa summarizer
    summary = summarizer.summarize(document, words=target_words)
    
    return summary

The below code reads the contents of the our text file named "dataset.txt" and stores the lines of the file in a list variable called lines. The readlines() method reads all the lines from the file and returns them as a list, where each line of the file becomes an element in the list. The resulting list of lines is stored in the variable lines. Each line of the file is represented as a separate element in the list.

In [42]:
# Read the dataset from the text file
with open("dataset.txt", "r", encoding="utf-8") as file:
    lines = file.readlines()

The below code parses each line from the lines list as a JSON object and creates a new list called dataset containing these parsed JSON objects.

1. line.strip() removes any leading or trailing whitespace characters from the line.
2. json.loads() is a function from the json library that converts a JSON-formatted string into a Python object. It takes a string as input and returns the corresponding Python object (e.g., dictionary, list, etc.) represented by the JSON string.

In [43]:
# Parse each line as a JSON object
dataset = [json.loads(line.strip()) for line in lines]

The below code extracts the first 1000 reviews with a rating of 1.0 from a dataset and stores them in a list called reviewFor1

In [44]:
# Extract the first 1000 "rating-1.0" reviews
reviewFor1 = []
for review in dataset:
    if review["overall"] == 1.0:
        reviewFor1.append(review["reviewText"])
reviewFor1 = reviewFor1[:1000]

The below code generates a summary for the "rating-1.0" reviews by calling the summarize_reviews() function with the reviewFor1 list and a target word count determined as 1% of the original word count of the reviews.

After executing this code, the summaryFor1 variable will contain the summary generated for the "rating-1.0" reviews.

In [45]:
# Generate the summary for "rating-1.0" reviews (1% of the original word count)
summaryFor1 = summarize_reviews(reviewFor1, target_words=len(' '.join(reviewFor1)) * 0.01)

The below code extracts the first 1000 reviews with a rating of 5.0 from a dataset and stores them in a list called reviewFor5

In [46]:
# Extract the first 1000 "rating-5.0" reviews
reviewFor5 = []
for review in dataset:
    if review["overall"] == 5.0:
        reviewFor5.append(review["reviewText"])
reviewFor5 = reviewFor5[:1000]

The below code generates a summary for the "rating-5.0" reviews by calling the summarize_reviews() function with the reviewFor5 list and a target word count of approximately 300 words.

After executing this code, the summaryFor5 variable will contain the summary generated for the "rating-5.0" reviews, which is limited to approximately 300 words.

In [47]:
# Generate the summary for "rating-5.0" reviews (approximately 300 words)
summaryFor5 = summarize_reviews(reviewFor5, target_words=300)

In [48]:
# Print the summaries
print("Summary of rating-1.0 reviews (1% of the original word count):")
print(summaryFor1)
print()

Summary of rating-1.0 reviews (1% of the original word count):
it worked for the first week then it only charge my phone to 20%.
good reason to buy a case, just not this one.my co-worker delayed in getting a case because he could not find one he liked, and broke the display screen, making his phone unusable.
The Phone charger works, it charges the phone.
However, while the phone is charging it continually beeps as when the battery is too low.It will not stop beeping while it is charging until the battery reaches a certain percentage.I do not think this is the correct charger to be using on the phone I have.Now I am worried based on other reviewers that the battery may blow on the phone and I'll have to replace that too.I would not recommend buying this charger.The charger from my old blackberry works better.
Really poor quality tweezer bent like nothing screw drivers made of cheap plastic adhesive removing tool again cheap plastic you get what you pay for would not recommend to anyone 

The given code defines a function called wordCount that takes a string parameter which we have named as summarize. This function calculates the number of words in the provided string by using the Spacy library's tokenizer. It then returns the word count of the string.

1. Loading the English Language Model:
The code uses the spacy.load() function to load the English language model named "en_core_web_sm". This model provides language processing capabilities, including tokenization, part-of-speech tagging, and entity recognition.

2. Processing the Summary:
The code applies the loaded language model to the summarize string which is the parameter of our function by calling spc(summarize). This processes the summary using the Spacy tokenizer, which breaks the text into individual words and other tokens.

3. Counting the Number of Words:
The code uses a list comprehension to iterate over each token in sum5, which is the processed summary. It checks if the token is not a space character (token.is_space). If the token is not a space, it is considered a word, and it is included in the list comprehension. The length of this list is then calculated using len() to determine the word count.

After executing this code, the number of words in the summaryFor5 summary will be displayed, along with the summary itself.

In [52]:
def wordCount(summarize):
    # Load the English language model
    spc = spacy.load("en_core_web_sm")

    # Process the code using the Spacy tokenizer
    sum5 = spc(summarize)

    # Count the number of words
    wordCount = len([token for token in sum5 if not token.is_space])

    return wordCount

words = wordCount(summaryFor5)
print("Number of words in the file:", words)
print("\n\nSummary of rating-5.0 reviews (approximately 300 words):")
print(summaryFor5)

Number of words in the file: 428


Summary of rating-5.0 reviews (approximately 300 words):
Once we got the phones I called back to check and yes the person from Cingular had done everything as asked, I also checked from the phone itself as you can go to "My Account" from the Main Menu page and phone goes online and checks everything about your account, how many minutes, the plan, when the next bill is due, etc etc, and sure enough everything was as it was supposed to be.As far as signal strength it is excellent in my area, I can go all through our house upstairs and downstairs and it stays between 4-5 bars, I don't recall seeing less than 3 bars all around town so far, the phone gets a great signal, and Cingular "in my area" puts out a great signal.My wife just came back from a business trip, she was about 150 miles away "With NO ROAMING charges" btw our last cell phone service "Alltel" would have been roaming within 20 miles.Anyway when she got to the hotel she called we both had per