**Goal of the project**

In this project, we’ll use the distilbart-cnn-12-6 text summarization model to automatically generate short meta descriptions from product descriptions.

**Load the packages**

In [20]:
# Importing libraries
import pandas as pd
from ecommercetools import nlp

In [21]:
pd.set_option('max_colwidth', 200)

**Load the data**

For this project, I am using a dataset of product descriptions from the GoNutrition website.

In [22]:
# Load dataset
df = pd.read_csv('../input/gonutrition/gonutrition.csv')

In [23]:
# Rename Pandas columns to lower case
df.columns = df.columns.str.lower()

In [24]:
# Examine the data
df.head()

| | product_name | product_description |
|---|---|---|
| 0 | Whey Protein Isolate 90 | What is Whey Protein Isolate? Whey Protein Isolate 90 is our highest quality whey protein powder and provides 23g of protein per 25g serving. This whey protein isolate powder is 90% protein and ex... |
| 1 | Whey Protein 80 | What is Whey Protein 80? Whey Protein 80 is an ultra premium quality 80% whey protein powder exclusively from free range, grass fed cows providing an unrivalled combination of taste, value and res... |
| 2 | Volt Preworkout™ | What is Volt™? Our Volt pre workout formula includes 12 advanced active ingredients that work together to increase energy, mental focus and muscular pump. Volt enables you to achieve the ultimate ... |


**Generate meta descriptions for all products**

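To auto-generate our meta descriptions with the text summarisation feature in EcommerceTools, we’ll use the get_summaries() function from the nlp module. We pass it the dataframe, the name of the column to summarise, and the name of the new column that will hold the summaries. The min_length and max_length arguments bound the length of each generated summary, and do_sample = False turns off sampling so the same input always produces the same output.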

In [25]:
df = nlp.get_summaries(df, 
                       'product_description', 
                       'meta_description', 
                       min_length = 20, 
                       max_length = 28, 
                       do_sample = False)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)
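The log line above is the message the Hugging Face transformers library prints when a summarization pipeline is created without naming a model, which suggests that get_summaries() wraps that pipeline. As a rough sketch under that assumption, the equivalent direct call might look like the following. The build_summariser and summarise_text helpers are hypothetical names used for illustration and are not part of EcommerceTools.

```python
def build_summariser(model_name="sshleifer/distilbart-cnn-12-6"):
    """Create a Hugging Face summarization pipeline for the given checkpoint.

    Hypothetical helper. The import is done lazily because transformers is a
    heavy dependency and the model weights are downloaded on first use.
    """
    from transformers import pipeline

    return pipeline("summarization", model=model_name)


def summarise_text(summariser, text, min_length=20, max_length=28):
    """Summarise one string and return only the summary text.

    `summariser` is any callable that behaves like a transformers
    summarization pipeline, i.e. returns [{"summary_text": "..."}].
    """
    result = summariser(
        text,
        min_length=min_length,  # lower bound on the summary, in model tokens
        max_length=max_length,  # upper bound on the summary, in model tokens
        do_sample=False,        # deterministic decoding, no random sampling
        truncation=True,        # clip inputs longer than the model accepts
    )
    return result[0]["summary_text"]
```

Building the pipeline once and passing it in means the model is loaded a single time, however many product descriptions are summarised afterwards.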


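Since we need our meta descriptions to be a specific length (ideally at least 50 characters and fewer than 160 characters), we’ll also count the words and characters in each one using str.split() and str.len(), so we can tweak the parameters. Because max_length caps the output, a summary can stop partway through a sentence, so we’ll append an ellipsis “...” to every description to signal a possible truncation to the reader. The counts are taken before the ellipsis is added, so the final strings are three characters longer than the characters column shows.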

In [26]:
df['words'] = df['meta_description'].str.split().str.len()
df['characters'] = df['meta_description'].str.len()
df['meta_description'] = df['meta_description'] + '...'
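Appending '...' to every row also marks summaries that already end on a complete sentence. A minimal sketch of a more selective alternative follows, assuming a summary counts as truncated when it does not end in sentence-final punctuation. The mark_truncation helper is a hypothetical name, not something the notebook or EcommerceTools provides.

```python
SENTENCE_ENDINGS = ('.', '!', '?')


def mark_truncation(text, marker='...'):
    """Append `marker` only when `text` does not already end a sentence."""
    stripped = text.rstrip()
    # Leave empty strings and complete sentences untouched
    if not stripped or stripped.endswith(SENTENCE_ENDINGS):
        return stripped
    return stripped + marker


# Cut off mid-sentence, so the marker is added
print(mark_truncation('Contains 20g of premium grade protein per'))
# Already a complete sentence, so it is returned unchanged
print(mark_truncation('Provides 23g of protein per 25g serving.'))
```

In the notebook, this could stand in for the unconditional concatenation via df['meta_description'].apply(mark_truncation).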

**Examine the results**

This approach works fairly well. We get back some perfectly usable meta descriptions that contain a summary of each product page’s content.

In [27]:
df[['meta_description', 'words', 'characters']].head()

| | meta_description | words | characters |
|---|---|---|---|
| 0 | Whey Protein Isolate 90 is our highest quality whey protein powder and provides 23g of protein per 25g serving.... | 19 | 111 |
| 1 | GN Whey Protein 80 is an ultra premium quality 80% whey protein powder. Contains 20g of premium grade protein per... | 20 | 113 |
| 2 | Volt enables you to achieve the ultimate workout so you can maximise your lean muscle, power and strength gains by training harder.... | 22 | 131 |
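To spot descriptions that fall outside the target range, the final strings can be checked against the bounds mentioned earlier. A small sketch, assuming the limits of at least 50 and fewer than 160 characters from the text; length_status is a hypothetical helper name.

```python
MIN_CHARS = 50   # shortest acceptable meta description
MAX_CHARS = 160  # descriptions must be shorter than this


def length_status(text, min_chars=MIN_CHARS, max_chars=MAX_CHARS):
    """Classify a meta description as 'too short', 'ok' or 'too long'."""
    n = len(text)
    if n < min_chars:
        return 'too short'
    if n >= max_chars:
        return 'too long'
    return 'ok'


description = ('Whey Protein Isolate 90 is our highest quality whey protein '
               'powder and provides 23g of protein per 25g serving....')
print(length_status(description))  # ok (111 characters plus the 3-character ellipsis)
```

Applied to the whole column with df['meta_description'].apply(length_status), this would flag any rows that need the min_length and max_length parameters adjusted.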


**Whey Protein Isolate 90**

In [28]:
df['meta_description'][0]

'Whey Protein Isolate 90 is our highest quality whey protein powder and provides 23g of protein per 25g serving....'

**GN Whey Protein 80**

In [29]:
df['meta_description'][1]

'GN Whey Protein 80 is an ultra premium quality 80% whey protein powder. Contains 20g of premium grade protein per...'

**Volt Preworkout**

In [30]:
df['meta_description'][2]

'Volt enables you to achieve the ultimate workout so you can maximise your lean muscle, power and strength gains by training harder....'

As with the previous approach, we do end up with some truncated sentences, such as the Whey Protein 80 description, which stops at “per”. The ellipsis signals the cut to the reader, and a human could easily trim or extend these descriptions to fit.
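The trimming step can be partly automated. One option is the standard library’s textwrap.shorten, which cuts text at a word boundary so that the result, placeholder included, fits a given width. The sketch below assumes the ceiling of fewer than 160 characters from earlier; fit_to_limit is a hypothetical helper name, and shorten also collapses repeated whitespace as a side effect.

```python
import textwrap


def fit_to_limit(text, max_chars=159, placeholder='...'):
    """Return `text` unchanged if it fits, else cut it at a word boundary.

    When the text is cut, `placeholder` is appended and the total length
    stays within `max_chars`.
    """
    return textwrap.shorten(text, width=max_chars, placeholder=placeholder)


# Short enough already, so nothing changes
print(fit_to_limit('Volt enables you to achieve the ultimate workout.'))
# Forced cut with a deliberately small limit to show the behaviour
print(fit_to_limit('Volt enables you to achieve the ultimate workout.', max_chars=30))
```

Extending a description that is too short still needs a human, or another pass through the summariser with a larger min_length.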