IBM/tweet-travel-blog-summary

Text summarization of a travel blog and tweeting the summary to increase reach and grow the business.

In this code pattern, we demonstrate a methodology for summarizing a blog using Watson Studio. We build on the existing code pattern "Text summarization and visualization using Watson Studio", which covers the details of text summarization. This code pattern focuses on extractive summarization of a travel blog: extracting keywords, converting them into relevant hashtags, and tweeting the result on Twitter.

To show how a business can market itself and grow, we use hostelgeeks.com, an up-and-coming website that shares stories from people who have travelled to the places listed in its travel blogs. People are fond of stories, and sharing memories and experiences makes them happy. Sharing travel experiences does two things. First, a travel blog raises awareness among travellers and sparks the desire to visit a place and have the same experience. Second, reviews and comments from a credible source help establish the credibility of a place, which travellers look for before committing to any travel plans. So we ran text summarization on the stories, converted each summary into a meaningful, impactful tweet, and posted it using the Twitter API with relevant hashtags.

A Brief about Text Summarization

When the reader has completed this code pattern, they will understand how to:

  • Quickly summarize the text from documents & news feeds.
  • Create topic modeling on the text to extract important topics.
  • Create visualizations for better understanding of the data.
  • Interpret the summary and visualization of the data.
  • Tweet the summary on Twitter.

Architecture Diagram

Flow

  1. User logs into Watson Studio and creates an instance that includes object storage.
  2. User uploads the data file to the object storage.
  3. User imports a Jupyter Notebook from the URL.
  4. User runs the processing techniques & creates a statistical model for topics in the notebook.
  5. User explores the visualization in the notebook and can export the output to object storage.

Watch the Video

Steps

Follow these steps to set up and run this code pattern. The steps are described in detail below.

  1. Create an account with IBM Cloud
  2. Create a new Watson Studio project
  3. Create the notebook
  4. Add the data
  5. Get access tokens and consumer keys to use the Twitter API
  6. Sample Output
  7. Future Scope and Extension
  8. More Enterprise Use Cases

1. Create an account with IBM Cloud

Sign up for IBM Cloud. Clicking Create a free account gives you a 30-day trial account.

2. Create a new Watson Studio project

Sign up for IBM's Watson Studio.

Click New project and select Data Science.

Define the project by giving it a name, then click Create.

Creating a project in Watson Studio also creates a free-tier Object Storage service in your IBM Cloud account.

3. Create the notebook

4. Run the notebook

When a notebook is executed, each code cell in the notebook is executed in order, from top to bottom.

Each code cell is selectable and is preceded by a tag in the left margin. The tag format is In [x]:. Depending on the state of the notebook, the x can be:

  • A blank: the cell has never been executed.
  • A number: the relative order in which this code step was executed.
  • A *: the cell is currently executing.

There are several ways to execute the code cells in your notebook:

  • One cell at a time.
    • Select the cell, and then press the Play button in the toolbar.
  • Batch mode, in sequential order.
    • From the Cell menu, several options are available. For example, you can Run All cells in your notebook, or you can Run All Below, which starts executing from the first cell under the currently selected cell and continues through all cells that follow.

5. Get access tokens and consumer keys to use the Twitter API

  • Go to https://developer.twitter.com/ and sign in using your Twitter login ID. If you don't have login credentials, sign up.

  • Click on Apps (top right of the navigation bar)

  • Click the Create an app button.

  • Follow the process, answer the questions and it should create your app.

  • Click on the Details button and then click on Keys and Tokens and get the access tokens and consumer keys.
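
Once the keys and tokens are available, they need to reach the notebook without being pasted into a shared cell. The following is a minimal sketch of one way to do that, not the notebook's actual code. It assumes the Tweepy library (suggested by the `api.update_status` call shown in the Sample Output section), and the environment variable names are hypothetical, chosen only for this illustration.

```python
import os

# Hypothetical environment variable names, chosen for this sketch only.
REQUIRED_KEYS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise KeyError("Missing Twitter credentials: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}


def build_api(credentials):
    """Create a Tweepy API client (assumes `pip install tweepy`)."""
    import tweepy  # imported lazily so the loader above works without it

    # Newer Tweepy releases also expose this handler as OAuth1UserHandler.
    auth = tweepy.OAuthHandler(
        credentials["TWITTER_CONSUMER_KEY"],
        credentials["TWITTER_CONSUMER_SECRET"],
    )
    auth.set_access_token(
        credentials["TWITTER_ACCESS_TOKEN"],
        credentials["TWITTER_ACCESS_TOKEN_SECRET"],
    )
    return tweepy.API(auth)
```

With the variables set, `api = build_api(load_twitter_credentials())` would give an `api` object like the one the notebook uses to post the tweet.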

6. Sample Output

Let's look at the summarization of the document. All the key points are included in the summary, so the text ranking algorithm has produced good results.

I dislike entering book shops. Every time I walk into a book shop, I will end up
buying one. I entered an international bookshop in Example, Barcelona, while
looking for some cool things to do in Barcelona. And as expected, a book named
"The dead alleys of Barcelona" got my attention, a crime novel. Long story shot:
I bought it, went home, and started reading. In this book, the author Stefanie
Kremser talks about a special part of El Born, downtown Barcelona. She describes
this magical square, this narrow street the main character lived in. I didn't
know this exact street, and I was curious. I went downtown, wandered around the
square and saw this café with the few tables on the terrace. Until today, 7 years
later, it is still my favorite café in Barcelona – thanks to this book! – – –
This Short Travel Story was written by Matt, the guy behind Hostelgeeks. Here at
Hostelgeeks we award and collect 5 Star Hostels around the world. Fancy more coffee?
Find the 13 best coffee shops in Barcelona here. Barcelona is our home. You can
find our best-kept secret tips for Barcelona as well as 23 fun things to do. It
also includes our favorite Café in Barcelona. Our hottest tip for Barcelona?
Rent a Red Vespa (www.via-vespa.com) to get around easily.  Your email address
will not be published.

As the image below shows, the important words in the corpus are highlighted, which helps in drawing inferences from the data. Word clouds are insightful, with pros and cons. They let you share research results in a way that does not require an understanding of the technicalities. Some of the pros are listed below.

  • They reveal the essential.
  • They delight and provide emotional connection.
  • They are fast & engaging.

As observed, skilled interpretation is what provides the insights.
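
A word cloud is driven by word frequencies: the more often a word occurs, the larger it is drawn. The sketch below shows that counting step using only the standard library. It is an illustration rather than the notebook's code, and the stop-word list is a tiny hypothetical sample.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "in", "of", "to", "is", "it", "this", "for"}


def word_frequencies(text, top_n=5):
    """Return the top_n (word, count) pairs that would dominate a word cloud."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters, Unicode-aware
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(top_n)


print(word_frequencies("Barcelona book Barcelona coffee book Barcelona", top_n=2))
```

A word cloud library can then size each word in proportion to its count.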

Latent Dirichlet Allocation (LDA) is a probabilistic model with interpretable topics. Topic modeling is one of the most popular NLP techniques with several real-world applications such as dimensionality reduction, text summarization, recommendation engine, etc. To visualize our topics in a 2-dimensional space we will use the pyLDAvis library. This visualization is interactive in nature and displays topics along with the most relevant words.

On running the cell with the command

status = api.update_status(tweet_with_summary_hashtags)
print(status.id) 

you will get the ID of the new status, which indicates that the tweet with hashtags has been posted.
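
The string passed to `api.update_status` has to combine the summary with the hashtags and stay within Twitter's 280-character limit. The sketch below shows one way such a string could be assembled. The helper names `to_hashtag` and `compose_tweet` are hypothetical, and the length check counts plain Python characters, which only approximates Twitter's weighted counting.

```python
import re


def to_hashtag(keyword):
    """Turn 'coffee shops' into '#CoffeeShops'; return '' if nothing usable remains."""
    parts = [p for p in re.split(r"[\W_]+", keyword) if p]
    if not parts:
        return ""
    return "#" + "".join(p[0].upper() + p[1:] for p in parts)


def compose_tweet(summary, keywords, limit=280):
    """Join summary and hashtags, truncating the summary so the result fits."""
    tags = " ".join(t for t in map(to_hashtag, keywords) if t)
    room = limit - (len(tags) + 1 if tags else 0)  # space left for the summary
    if room < 1:
        raise ValueError("hashtags alone exceed the tweet limit")
    text = summary.strip()
    if len(text) > room:
        text = text[: room - 1].rstrip() + "…"  # mark the cut with an ellipsis
    return f"{text} {tags}".strip()


tweet_with_summary_hashtags = compose_tweet(
    "It is still my favorite café in Barcelona", ["Barcelona", "coffee shops"]
)
print(tweet_with_summary_hashtags)
```

Truncating the summary rather than dropping hashtags is a design choice made for this sketch; the opposite trade-off is equally reasonable.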

7. Future Scope and Extension

There are two main approaches to summarization:

Extractive summarization: it works by selecting the most meaningful sentences in an article and arranging them in a comprehensive manner. This means the summary sentences are extracted from the article without any modifications.

Abstractive summarization: it works by generating new sentences that paraphrase the most important content of the article, rather than copying sentences from it.
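
To make the extractive approach concrete, here is a minimal sketch that scores each sentence by the average frequency of its words and keeps the top-scoring sentences unchanged, in their original order. This frequency-based scoring is a simpler stand-in chosen for illustration. It is not the text ranking algorithm used in the notebook, and the sentence splitter is deliberately naive.

```python
import re
from collections import Counter


def summarize(text, max_sentences=2):
    """Pick the highest-scoring sentences and return them in document order."""
    # Naive split: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text.lower())
    freq = Counter(w for w in words if len(w) > 3)  # crude filter for short function words

    def score(sentence):
        tokens = re.findall(r"[^\W\d_]+", sentence.lower())
        return sum(freq[t] for t in tokens if len(t) > 3) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore original order
    return " ".join(sentences[i] for i in chosen)


story = "Barcelona has great coffee. I like trains. Coffee shops in Barcelona are cozy."
print(summarize(story))
```

Because sentences are copied verbatim, the output never contains wording that is absent from the source, which is the defining property of extractive summarization.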

There are also two scales of document summarization:

Single-document summarization: the task of summarizing a standalone document. Note that a "document" could refer to different things depending on the use case (URL, internal PDF file, legal contract, financial report, email, etc.).

Multi-document summarization: the task of assembling a collection of documents (usually through a query against a database or search engine) and generating a summary that incorporates perspectives from across documents.

This code pattern showcases extractive single-document summarization and tweeting of the result. Developers can extend it to abstractive single-document, abstractive multi-document, and extractive multi-document summarization. One challenge is that summarization is hard to generalize. For example, summarizing a news article is very different from summarizing a financial earnings report.

There are two common metrics any summarizer attempts to optimize:

  • Topic coverage: does the summary incorporate the main topics from the document?
  • Readability: do the summary sentences flow in a logical way?
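
Topic coverage can be approximated in several ways. One simple proxy, sketched below as an assumption rather than a standard metric, is the fraction of topic keywords that appear in the summary. The function name `topic_coverage` is hypothetical, and readability is not addressed here because it does not reduce to a simple count.

```python
import re


def topic_coverage(summary, topics):
    """Fraction of topic keywords found in the summary, from 0.0 to 1.0."""
    summary_words = set(re.findall(r"[^\W\d_]+", summary.lower()))
    keywords = [t.lower() for t in topics]
    if not keywords:
        return 0.0
    covered = sum(1 for t in keywords if t in summary_words)
    return covered / len(keywords)


print(topic_coverage("My favorite place in Barcelona", ["barcelona", "favorite", "hostel", "book"]))
```

In practice the keyword list could come from the topic model's most relevant words.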

Developers can therefore extend this code pattern and optimize it for other specific enterprise applications. They can read the following papers and blogs to do so.

8. More Enterprise Use Cases

Companies producing long-form content, like whitepapers, e-books and blogs, might be able to leverage summarization to break down this content and make it sharable on social media sites like Twitter or Facebook. This would allow companies to further re-use existing content and also spread awareness amongst their employees.

Other Examples:

  • Goodreads: analyse all the user reviews of a book, create a summary, and tweet it. New users don't have to go through all the review comments.
  • Amazon product reviews: analyse all the user reviews of a product, create a summary, and tweet it. This helps new users make an informed decision without going through all the review comments.

Similarly, it can be used as:

  • Internal Company Knowledge Summarizer.
  • Event summarizer.
  • Meeting summarizer.
  • Slack Group Chat Summarizer.
  • Patent research.

Troubleshooting

See DEBUGGING.md.

License

This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.

ASL FAQ link: http://www.apache.org/foundation/license-faq.html#WhatDoesItMEAN
