# Report Description Generator with GPT2

This notebook demonstrates how to use GPT-2, a pretrained causal language model, to automatically generate descriptions for report views from selected metadata fields.

<a href="https://colab.research.google.com/github/cbadenes/semantic-report-search/blob/main/data/analysis/32_text_generation.ipynb" target="_parent">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/>
</a>

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import pandas as pd

pd.set_option("display.max_colwidth", None)
pd.set_option("display.max_columns", 100)
pd.set_option("display.max_rows", 100)

In [2]:
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)


In [3]:
# Make sure the Excel file is accessible in the current environment
df = pd.read_excel("Reporting_Inventory.xlsx", sheet_name="Views")
df.head()

Unnamed: 0,ID Data Product,Report Name,Product Owner,PBIX_File,Report View,Description,Category,Status,Rename,Dimensions,KPIs,Other Terms,Filters,Tags,Priority
0,RPPBI0032,Feeder Market - 2024,Jonathan Shields,LifeReport.pbix,CRITERIA,Methodolody and definition of the algorithim of Feeder Market,Informative,Productive,,,,,,,Priority 1
1,RPPBI0032,Feeder Market - 2024,Jonathan Shields,LifeReport.pbix,DESTINATION_OF_FEEDER_MARKETS,View focused on understand the performance by hotel for a specific feeder market o selection of feeder marktes.,Functional,Productive,,"Hotel, month, Feeder Market, Segment, Channel Mix, Room Type","Total Revenue, Room Revenue, RN, Lead Time, Lenght of Stay, AOV, ADR, ADR Net, %Cost",,,,Priority 1
2,RPPBI0032,Feeder Market - 2024,Jonathan Shields,LifeReport.pbix,EXECUTIVE VIEW,Global view to understand Feeder Market Performance compared to previous years diferentiating between domestic and international,Executive,Productive,,"Hotel, month, Feeder Market, Segment, Channel Mix, Room Type","Total Revenue, Room Revenue, RN, Lead Time, Lenght of Stay, AOV, ADR, ADR Net, %Cost",,,,Priority 1
3,RPPBI0032,Feeder Market - 2024,Jonathan Shields,LifeReport.pbix,FEEDER MARKET FLOWS,"View focused on understanding the booking behaviour by Feeder Market. It allows to understand when, where and through which channels and segments are producing the different feeder markets for a selected booking period. Besides, it shows the flow (Feeder Market to Destination) by contribution of total revenue",Functional,Productive,,"Hotel, month, Feeder Market, Segment, Channel Mix, Room Type, Booked Year and Booked month","Total Revenue, Room Revenue, RN, Lead Time, Lenght of Stay, AOV, ADR, ADR Net, %Cost",,,,Priority 1
4,RPPBI0032,Feeder Market - 2024,Jonathan Shields,LifeReport.pbix,FEEDER_MARKET_DETAIL,"Detail view of Feeder Markets by Destination including more indepth view by channel, and including Top_Agency and Top_Company information",Functional,Productive,,"Hotel, month, Feeder Market, Segment, Channel Mix, Room Type","Total Revenue, Room Revenue, RN, Lead Time, Lenght of Stay, AOV, ADR, ADR Net, %Cost",,,,Priority 1


In [4]:
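def build_input_text(row):
    """Build the generation prompt from a row's report name, category, and KPIs."""
    # Note: row.get() only falls back to its default when the column is absent;
    # an empty cell is read as NaN and appears in the prompt as "nan".
    return (
        f"Report Name: {row['Report Name']}. "
        f"Category: {row.get('Category', 'N/A')}. "
        f"KPIs: {row.get('KPIs', 'not specified')}. "
        "Suggested Description:"
    )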

df["input_text"] = df.apply(build_input_text, axis=1)
df[["Report Name", "Report View", "input_text"]].head()


Unnamed: 0,Report Name,Report View,input_text
0,Feeder Market - 2024,CRITERIA,Report Name: Feeder Market - 2024. Category: Informative. KPIs: nan. Suggested Description:
1,Feeder Market - 2024,DESTINATION_OF_FEEDER_MARKETS,"Report Name: Feeder Market - 2024. Category: Functional. KPIs: Total Revenue, Room Revenue, RN, Lead Time, Lenght of Stay, AOV, ADR, ADR Net, %Cost. Suggested Description:"
2,Feeder Market - 2024,EXECUTIVE VIEW,"Report Name: Feeder Market - 2024. Category: Executive. KPIs: Total Revenue, Room Revenue, RN, Lead Time, Lenght of Stay, AOV, ADR, ADR Net, %Cost. Suggested Description:"
3,Feeder Market - 2024,FEEDER MARKET FLOWS,"Report Name: Feeder Market - 2024. Category: Functional. KPIs: Total Revenue, Room Revenue, RN, Lead Time, Lenght of Stay, AOV, ADR, ADR Net, %Cost. Suggested Description:"
4,Feeder Market - 2024,FEEDER_MARKET_DETAIL,"Report Name: Feeder Market - 2024. Category: Functional. KPIs: Total Revenue, Room Revenue, RN, Lead Time, Lenght of Stay, AOV, ADR, ADR Net, %Cost. Suggested Description:"
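
The first prompt above contains `KPIs: nan`, and several views of the same report end up with identical prompts because `Report View` is not part of the text. One possible variant is sketched below. It skips empty fields and includes the view name. The helper names `is_missing` and `build_prompt` and the chosen field list are illustrative assumptions, not part of the original notebook.

```python
import math


def is_missing(value):
    """Return True for None, NaN, or blank strings (how empty cells usually appear)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()


def build_prompt(row, fields=("Report Name", "Report View", "Category", "KPIs")):
    """Build a prompt from the fields that have a value, skipping the rest."""
    parts = []
    for field in fields:
        value = row.get(field)
        if not is_missing(value):
            parts.append(f"{field}: {str(value).strip()}.")
    parts.append("Suggested Description:")
    return " ".join(parts)


# A plain dict stands in for a DataFrame row here; a pandas Series also supports .get
row_without_kpis = {
    "Report Name": "Feeder Market - 2024",
    "Report View": "CRITERIA",
    "Category": "Informative",
    "KPIs": float("nan"),
}
print(build_prompt(row_without_kpis))
```

Under these assumptions, `df.apply(build_prompt, axis=1)` could replace the call to `build_input_text`.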


In [5]:
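def generate_description(input_text, max_length=80):
    """Sample one continuation of `input_text` and return only the newly generated text."""
    inputs = tokenizer(input_text, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]
    outputs = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],  # pass the mask explicitly instead of letting it be inferred
        max_length=max_length,  # total length in tokens, prompt included
        num_return_sequences=1,
        no_repeat_ngram_size=2,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
    )
    # Decode only the tokens produced after the prompt
    generated = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
    return generated.strip()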

In [6]:
# Example: generate a description for the second report view (row index 1)
sample_input = df.loc[1, "input_text"]
sample_description = generate_description(sample_input)
print(f"Input text:\n{sample_input}\n")
print(f"Generated description:\n{sample_description}")


Input text:
Report Name: Feeder Market - 2024. Category: Functional. KPIs: Total Revenue, Room Revenue, RN, Lead Time, Lenght of Stay, AOV, ADR, ADR Net, %Cost. Suggested Description:

Generated description:
This is a Feeders Market, with a number of new features and new customers. It is our goal to expand the market in the coming months,
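
The generated text stops mid-sentence because `max_length` caps the number of tokens rather than ending on a sentence boundary. A minimal post-processing sketch follows. It assumes that dropping the trailing fragment is acceptable, and the helper name `trim_to_last_sentence` is hypothetical.

```python
import re


def trim_to_last_sentence(text):
    """Drop a trailing fragment so the text ends at the last '.', '!' or '?'."""
    text = text.strip()
    # Positions just after each sentence-ending mark that is followed by whitespace or end of text
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", text)]
    if not ends:
        return text  # no complete sentence found: return the text unchanged
    return text[: ends[-1]]


sample = (
    "This is a Feeders Market, with a number of new features and new customers. "
    "It is our goal to expand the market in the coming months,"
)
print(trim_to_last_sentence(sample))
```

This simple rule treats any period followed by whitespace as a sentence end, so abbreviations such as "U.S." would also be cut points.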


In [10]:
# Take the first 10 rows as a sample
df_sample = df.head(10).copy()

# Generate a description for each view in the sample
df_sample["Generated_Description"] = df_sample["input_text"].apply(generate_description)

df_sample[["Report Name", "Category", "KPIs", "Generated_Description"]].head(10)

Unnamed: 0,Report Name,Category,KPIs,Generated_Description
0,Feeder Market - 2024,Informative,,"A feeder market for the future of the entire food supply industry, offering a variety of feedable products from the traditional food chain to provide a convenient, inexpensive source of fresh, wholesome, and nutritious food. In a future, when consumers can choose a food source without having to"
1,Feeder Market - 2024,Functional,"Total Revenue, Room Revenue, RN, Lead Time, Lenght of Stay, AOV, ADR, ADR Net, %Cost","This is a feeder market in the US. It is designed to provide a safe, efficient, and cost-effective way to increase customer confidence in"
2,Feeder Market - 2024,Executive,"Total Revenue, Room Revenue, RN, Lead Time, Lenght of Stay, AOV, ADR, ADR Net, %Cost",This website is based on the current estimate of net income derived from the various Feeders Market categories. The net return for each of these categories is derived
3,Feeder Market - 2024,Functional,"Total Revenue, Room Revenue, RN, Lead Time, Lenght of Stay, AOV, ADR, ADR Net, %Cost","This is a Market-based feeder market. As a result, it is not a complete data set. This feed is only a small subset of"
4,Feeder Market - 2024,Functional,"Total Revenue, Room Revenue, RN, Lead Time, Lenght of Stay, AOV, ADR, ADR Net, %Cost","The feeder market is a service provider's goal to provide customer service to a range of customers. In order to do so, they need to have"
5,Feeder Market - 2024,Functional,"Total Revenue, Room Revenue, RN, Lead Time, Lenght of Stay, AOV, ADR, ADR Net, %Cost",The feeder market is an exciting new tool for evaluating how food and beverage companies are investing in their food service operations. The Feeders Market is a
6,Feeder Market - 2024,Index,,This is the feeder market for the 24-hour feed and you can learn more about how Feeders and Feeding Market are designed.\n\nB. How to Use the Market\n. It is important to understand that the market is open source and that you should not use it to
7,Feeder Market - 2024,Functional,"Total Spending, Total Revenue, Arrivals, Nights,",This is the most popular feeder market in the world. You can easily find the largest and most profitable markets for the various types of feeders. This market is a great place to sell your business. Feeders are a very popular and
8,Feeder Market - 2024,Functional,"Total Revenue, Room Revenue, RN,ADR","A small food market for feeders.\n\nFood Market for Feeders - 2025\n, 2025-2025\n:\n. This is a small, market, and will be run by a volunteer, not a big company. It will"
9,Feeder Market - 2025,Informative,"Total Revenue, Room Revenue, RN, Lead Time, Lenght of Stay, AOV, ADR, ADR Net, %Cost","To provide a useful, useful and informative feeder market. The Feeders Market provides a framework for understanding and implementing a wide range of inputs and"
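
Several generated descriptions drift away from the inventory's subject (row 0, for example, talks about the food supply industry). One rough way to quantify that drift is to compare each generated text with the existing `Description` column using word overlap. The sketch below uses Jaccard similarity on lowercase word sets. It is an illustrative assumption rather than an evaluation used in the notebook, and `tokenize` and `jaccard_overlap` are hypothetical names.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard_overlap(generated, reference):
    """Jaccard similarity of the two word sets: |A & B| / |A | B| (0.0 if both are empty)."""
    a, b = tokenize(generated), tokenize(reference)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Reference text copied as-is from the inventory's Description column (row 0)
reference = "Methodolody and definition of the algorithim of Feeder Market"
generated = "A feeder market for the future of the entire food supply industry"
print(round(jaccard_overlap(generated, reference), 2))
```

A low score only signals little shared vocabulary. It says nothing about fluency, so it is best read as a coarse filter.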


In [12]:
# Save the sample, including the generated descriptions, to CSV
df_sample.to_csv("generated_descriptions.csv", index=False)
