In [64]:
import os
from dotenv import load_dotenv
import google.generativeai as genai
import re
import pandas as pd
import json

## LLM Gemini model preparation

Get a `GEMINI_API_KEY` from Google AI Studio first, then add it to a `.env` file in the root directory, following the format shown in the `.env_example` file.

In [65]:
# Load environment variables from .env file
load_dotenv()

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
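If the key is missing, `os.environ["GEMINI_API_KEY"]` fails with a bare `KeyError`. A minimal sketch of a friendlier check follows. `require_env` is a hypothetical helper introduced here for illustration (it is not part of `python-dotenv` or `google.generativeai`), and the error wording is only a suggestion.

```python
import os


def require_env(name, environ=None):
    """Return an environment variable's value, or fail with a clear hint.

    Hypothetical helper: `environ` defaults to os.environ and can be
    replaced with a plain dict when testing.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to the .env file in the root "
            "directory (see .env_example) and call load_dotenv() again."
        )
    return value
```

After `load_dotenv()`, the call would then read `genai.configure(api_key=require_env("GEMINI_API_KEY"))`.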

Gemini 1.5 Flash

Rate limits:

- 15 RPM (requests per minute)
- 1 million TPM (tokens per minute)
- 1,500 RPD (requests per day)

Price (input):

- Free of charge
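A limit of 15 RPM works out to one request every 60 / 15 = 4 seconds. One way to stay under it is to space calls evenly, as in the sketch below. `Throttle` is a hypothetical helper rather than part of the Gemini SDK, and it addresses only the per-minute limit, not the TPM or RPD limits.

```python
import time


class Throttle:
    """Space successive calls at least 60 / rpm seconds apart (hypothetical helper)."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / rpm
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(rpm=15)
print(throttle.min_interval)  # 4.0
```

Calling `throttle.wait()` before each `model.generate_content(...)` would keep requests at or below 15 per minute. Assuming one request per listing, the 2,046 rows loaded later in this notebook would need about 2,046 × 4 s ≈ 136 minutes and would also exceed the 1,500 RPD cap within a single day.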

In [91]:
# Create the model
# See https://ai.google.dev/api/python/google/generativeai/GenerativeModel
generation_config = {
    "temperature": 0,
    "top_p": 0.95,
    "top_k": 64,
    "max_output_tokens": 1024,
    "response_mime_type": "text/plain",
}

SYSTEM_PROMPT = """
You are a helpful assistant. You will analyze the given text and provide a JSON response indicating whether the text mentions or suggests certain things. Please follow these guidelines:
1. "is_bathroom_shared": Check if there is any mention of a shared bathroom. Ignore mentions of other shared amenities like "shared kitchen" or "shared laundry". Look for keywords like "shared bathroom", "bathroom shared", "bathroom is shared", etc.
2. "is_kitchen_shared": Check if there is any mention of a shared kitchen. Ignore mentions of other shared amenities like "shared bathroom" or "shared laundry". Look for keywords like "shared kitchen", "kitchen shared", "kitchen is shared", etc.
3. "is_host_in_unit": Check if there is any mention of the host living in the same unit as the guest. Look for keywords like "we live in the unit", "we live in the other room", etc. If the text explicitly indicates that the host lives upstairs, downstairs, or in a separate building or unit, set this to false.
4. "is_host_on_prem": Check if there is any mention of the host living on premises. Look for keywords like "live on premises", "live in the same building", "live on-site", etc.
5. "shared_utilities": Check if there is any mention of shared utilities or facilities. Ignore mentions of other shared amenities like "shared kitchen" or "shared bathroom". Look for keywords like "shared laundry", "shared laundry room", "laundry shared", "laundry is shared", "shared electricity", "shared internet", "shared cable TV", etc.
6. "is_entrance_separate": Check if there is any mention of separate entrance. Look for keywords like "private entrance", "separate entrance", etc.
7. "other_guests_on_prem": Check if there is any mention of other guests living on the premises/house/apt/apartment or anything indicating that there will be other guests on the premises.
8. "host_profile_mentioned": Check if there is any mention of the host's profile being Superhost or similar detail.
9. "is_separate_suite": Check if there is any mention of private or separate suites/apartments. Look for keywords like "private suite", "separate suite", "separate apartments", etc.
10. "host_interaction": Check if there is any mention of limited or no interaction with the host.
11. "has_house_rules": Check if there is any mention of any specific rule indicating the host's presence. Look for keywords like "noise", "noise levels", "no noise", "curfew", "limited time", "from 10am to 10pm", "until 10pm", "not allowed", etc.
12. "has_personal_touches": Check if there is any mention of language in the text suggesting personal touches.
13. "entire_property": Check if there is any mention of the property type being listed. Ignore mentions of other property types like "entire room", "private room" or "shared room". Look for keywords like "entire home", "entire house", "entire studio", "entire apt", "entire apartment", "entire condo", "entire penthouse", "whole house", "whole apartment", "whole studio", etc.
14. "includes_additional_service": Check if there is any mention of host offering services, therefore, indicating their presence. Look for keywords like "breakfast", "breakfast provided", "breakfast included", "meals provided", "meal provided", "meal included", "available upon request", etc.
15. "has_self_check_in": Check if there is any mention of guest self check-in. Look for keywords like "self check-in", "self check in", "self checkin", etc.

Respond with a JSON formatted output. No additional text, markdown, details or annotations required:
{
  "is_bathroom_shared": <true/false>,
  "is_kitchen_shared": <true/false>,
  "is_host_in_unit": <true/false>,
  "is_host_on_prem": <true/false>,
  "shared_utilities": <true/false>, 
  ...
}
"""

model = genai.GenerativeModel(
    model_name="gemini-1.5-flash-latest",
    generation_config=generation_config,
    system_instruction=SYSTEM_PROMPT
)
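The configuration above requests `text/plain` and relies on the prompt alone to get JSON back. The Gemini API also documents a JSON mode, selected by setting `response_mime_type` to `application/json`. The variant below is a sketch of that setting under the assumption that the chosen model supports it. It is not what this notebook ran, and its effect on the outputs shown later has not been checked here.

```python
# Alternative generation settings (an assumption, not the notebook's own config):
# identical to generation_config except for the response MIME type.
json_generation_config = {
    "temperature": 0,
    "top_p": 0.95,
    "top_k": 64,
    "max_output_tokens": 1024,
    "response_mime_type": "application/json",
}
```

Passing this dictionary as `generation_config=` when constructing the model should reduce the need to strip surrounding text from the response, though validating the parsed keys is still worthwhile.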

## Data preparation

In [92]:
# Specify the path to your .jsonl file
file_path = '../../data/description_parsing/result.jsonl'

# Read the .jsonl file into a DataFrame
df = pd.read_json(file_path, lines=True)

In [93]:
print(df.shape)
df.head(10)

(2046, 26)


Unnamed: 0,listing_id,checking_in_and_out_house_rule,during_your_stay_house_rule,before_you_leave_house_rule,bathroom_amenities,bedroom_and_laundry_amenities,entertainment_amenities,heating_and_cooling_amenities,home_safety_amenities,internet_and_office_amenities,...,the_space_description,guest_access_description,other_things_to_note_description,registration_number_description,accomodation_type,scenic_views_amenities,outdoor_amenities,family_amenities,privacy_and_safety_amenities,during_your_stay_description
0,992310817343498240,"[Check-in after 3:00 p.m., Checkout before 11:...","[4 guests maximum, No pets, No smoking]","[Turn things off, Lock up]",,,,,,,...,,,,,,,,,,
1,52491262,"[Check-in after 3:00 p.m., Checkout before 11:...","[4 guests maximum, No pets, No parties or even...","[Gather used towels, Throw trash away, Turn th...","[Hair dryer, Cleaning products, Shampoo, Condi...","[Washer, Dryer – In building, Essentials, Hang...","[Ethernet connection, TV]","[Portable air conditioning, Indoor fireplace: ...","[Smoke alarm, Carbon monoxide alarm, Fire exti...","[Wifi, Dedicated workspace]",...,,,,,,,,,,
2,671020786592088704,"[Check-in: 4:00 p.m.–9:00 p.m., Checkout befor...","[2 guests maximum, No pets, No parties or even...",,,,,,,,...,The one bedroom condo features a fully equippe...,The entire condo is at your disposal as well a...,"Before you book, please note that this condo i...",24-159113,Entire condo,,,,,
3,840818841362172416,"[Check-in after 4:00 p.m., Checkout before 12:...","[1 guest maximum, Pets allowed, Quiet hours, N...","[Turn things off, Lock up]",,,,,,,...,,,,,,,,,,
4,873876805820264448,"[Check-in: 3:00 p.m.–12:00 a.m., Checkout befo...","[2 guests maximum, No pets, Quiet hours, No pa...","[Gather used towels, Throw trash away, Turn th...","[Hair dryer, Cleaning products, Hot water]","[Essentials, Hangers, Bed linens, Extra pillow...","[40-inch TV, Books and reading material]","[Central air conditioning, Central heating]","[Exterior security cameras on property, Smoke ...","[Fast wifi – 673 Mbps, Dedicated workspace]",...,,,,,,[Garden view],[Private backyard – Fully fenced],,,
5,11605179,"[Check-in: 3:00 p.m.–11:00 p.m., Checkout befo...","[2 guests maximum, No pets, No parties or even...",,,,,,,,...,,,,,,,,,,
6,937111808015809280,"[Check-in after 4:00 p.m., Checkout before 11:...","[4 guests maximum, No pets, Quiet hours, No pa...",[Lock up],"[Bathtub, Hair dryer, Cleaning products, Hot w...","[Washer, Free dryer – In unit, Essentials, Han...","[TV with Amazon Prime Video, Netflix]",[Radiant heating],"[Exterior security cameras on property, Smoke ...",[Wifi],...,,,,,,,,[Pack ’n play/Travel crib – available upon req...,,
7,874531542306878080,"[Check-in after 3:00 p.m., Checkout before 11:...","[3 guests maximum, Pets allowed, Quiet hours, ...","[Gather used towels, Throw trash away, Turn th...","[Bathtub, Hair dryer, Cleaning products, Dove ...","[Essentials, Hangers, Bed linens, Extra pillow...",[75-inch HDTV],"[Portable fans, Radiant heating]","[Exterior security cameras on property, Fire e...","[Wifi, Dedicated workspace]",...,,,,,,,,[Folding or convertible high chair – available...,,
8,859698453756280064,"[Check-in after 4:00 p.m., Checkout before 11:...","[4 guests maximum, Pets allowed, Quiet hours, ...",,"[Bathtub, Hair dryer, Cleaning products, Shamp...","[Free washer – In unit, Free dryer – In unit, ...","[50-inch HDTV with Disney+, Netflix]","[Central air conditioning, Indoor fireplace, C...",,"[Fast wifi – 90 Mbps, Dedicated workspace]",...,,,,,,,"[Shared patio or balcony, Backyard, BBQ grill]",,"[Lock on bedroom door, Exterior security camer...",
9,1009946230387520128,"[Check-in after 5:00 p.m., Checkout before 1:0...","[3 guests maximum, No pets, Quiet hours, No pa...","[Gather used towels, Throw trash away, Turn th...",,,,,,,...,,,,,,,,,,


In [94]:
# Function to join columns with their headers
def join_columns_with_headers(row):
    # Walk df.columns to keep the original column order, but only use the
    # columns present in the row. This way re-running the cell after
    # 'combined_description' has been added does not raise a KeyError.
    return " \n ".join([f"{col.replace('_', ' ').capitalize()}:\n{row[col]}"
                      for col in df.columns
                      if col in row.index])

# Applying the function to each row
df['combined_description'] = df[['place_description', 
                                 'the_space_description', 
                                 'guest_access_description', 
                                 'other_things_to_note_description',
                                 'during_your_stay_description',
                                 'checking_in_and_out_house_rule',
                                 'accomodation_type', 
                                 'location_features_amenities',
                                 'services_amenities']
                                 ].apply(join_columns_with_headers, axis=1)

# Displaying the result
print(df[['listing_id', 'combined_description']])


               listing_id                               combined_description
0      992310817343498240  Checking in and out house rule:\n['Check-in af...
1                52491262  Checking in and out house rule:\n['Check-in af...
2      671020786592088704  Checking in and out house rule:\n['Check-in: 4...
3      840818841362172416  Checking in and out house rule:\n['Check-in af...
4      873876805820264448  Checking in and out house rule:\n['Check-in: 3...
...                   ...                                                ...
2041             51453941  Checking in and out house rule:\nnan \n Locati...
2042             48943875  Checking in and out house rule:\nnan \n Locati...
2043   919040366006914560  Checking in and out house rule:\nnan \n Locati...
2044   663251426089581568  Checking in and out house rule:\nnan \n Locati...
2045  1100359701544645760  Checking in and out house rule:\nnan \n Locati...

[2046 rows x 2 columns]
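The output above shows that missing fields end up in the prompt as the literal text `nan`. Whether that affects the model's answers is not tested here. If it is unwanted, one possible variant skips missing values and flattens list values, as sketched below. `is_missing` and `join_present_fields` are hypothetical names, and the function works on any mapping (a plain `dict` or a pandas row).

```python
import math


def is_missing(value):
    """True for None and float NaN, the two ways an empty field may appear."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def join_present_fields(row, columns):
    """Join 'Header:\\nvalue' blocks for the given columns, skipping missing values."""
    parts = []
    for col in columns:
        value = row[col]
        if is_missing(value):
            continue
        if isinstance(value, (list, tuple)):
            # Flatten list-valued fields instead of embedding their Python repr.
            value = "; ".join(str(item) for item in value)
        header = col.replace("_", " ").capitalize()
        parts.append(f"{header}:\n{value}")
    return " \n ".join(parts)


# Hypothetical row for illustration only.
example_row = {
    "place_description": "Quiet condo near the harbour.",
    "services_amenities": float("nan"),
    "checking_in_and_out_house_rule": ["Check-in after 3:00 p.m.", "Lock up"],
}
print(join_present_fields(example_row, list(example_row)))
```

With a DataFrame, the same function could be applied row-wise, for example `df[cols].apply(join_present_fields, axis=1, columns=cols)`, where `cols` is the list of selected column names.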


In [95]:
sample_combined_description = df.combined_description[2]

print(sample_combined_description)

Checking in and out house rule:
['Check-in: 4:00 p.m.–9:00 p.m.', 'Checkout before 11:00 a.m.', 'Self check-in with smart lock'] 
 Location features amenities:
nan 
 Services amenities:
nan 
 Place description:
Enjoy the breathtaking view from this centrally located condo in Vancouver’s most interesting emerging neighbourhood. It is hard to find a more beautiful view of the north shore mountains and Vancouver’s working harbour. Enjoy the sights and sounds; retreat to the quiet of this luxury condo to cook, work, entertain and sleep. Everything you need in your home away from home. 
 The space description:
The one bedroom condo features a fully equipped gourmet kitchen; a full bathroom, a living and dining area as well as separate work area and washer/dryer. 
 Guest access description:
The entire condo is at your disposal as well as outdoor entertainment areas and exercise areas are accessible to guests 
 Other things to note description:
Before you book, please note that this condo is 

## Run Gemini to generate response

In [96]:
response = model.generate_content(sample_combined_description)

print(response.text)

{
  "is_bathroom_shared": false,
  "is_kitchen_shared": false,
  "is_host_in_unit": false,
  "is_host_on_prem": true,
  "shared_utilities": false,
  "is_entrance_separate": false,
  "other_guests_on_prem": false,
  "host_profile_mentioned": false,
  "is_separate_suite": false,
  "host_interaction": false,
  "has_house_rules": true,
  "has_personal_touches": false,
  "entire_property": true,
  "includes_additional_service": false,
  "has_self_check_in": true
}



In [97]:
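def clean_features(features):
    """Lowercase the response and keep only the outermost {...} block."""
    # Strip everything before the first { and after the last } (e.g. Markdown
    # fences); if no braces are found, the lowercased text is returned as-is.
    cleaned_features = re.sub(r'^.*?(\{.*\}).*$', r'\1', features.lower(), flags=re.DOTALL)
    return cleaned_features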
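To make the behaviour of this cleanup concrete, the sketch below runs a standalone copy of the same regex approach on two made-up inputs. The input strings are hypothetical and were not produced by Gemini.

```python
import json
import re


def clean_features(features):
    # Standalone copy of the regex approach above so this example runs on its own.
    return re.sub(r'^.*?(\{.*\}).*$', r'\1', features.lower(), flags=re.DOTALL)


# Hypothetical response with extra prose and a Python-style boolean.
wrapped = 'Sure! {"has_self_check_in": True} Hope this helps.'
print(clean_features(wrapped))              # {"has_self_check_in": true}
print(json.loads(clean_features(wrapped)))  # {'has_self_check_in': True}

# Without any {...} block the text comes back lowercased but otherwise intact,
# so json.loads raises instead of returning a dict.
try:
    json.loads(clean_features("No JSON here"))
except json.JSONDecodeError as exc:
    print("could not parse:", exc.msg)
```

Because the whole response is lowercased, any string values would lose their casing too. That is harmless here since every expected value is a boolean.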

In [98]:
cleaned_features = clean_features(response.text)

json.loads(cleaned_features)

{'is_bathroom_shared': False,
 'is_kitchen_shared': False,
 'is_host_in_unit': False,
 'is_host_on_prem': True,
 'shared_utilities': False,
 'is_entrance_separate': False,
 'other_guests_on_prem': False,
 'host_profile_mentioned': False,
 'is_separate_suite': False,
 'host_interaction': False,
 'has_house_rules': True,
 'has_personal_touches': False,
 'entire_property': True,
 'includes_additional_service': False,
 'has_self_check_in': True}
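`json.loads` only confirms that the text is valid JSON. It does not confirm that all 15 flags from the system prompt are present, correctly spelled, and boolean. A minimal validation sketch follows. `EXPECTED_KEYS` is copied from the prompt's guidelines, while `parse_features` is a hypothetical helper name.

```python
import json

# The 15 keys listed in the system prompt's guidelines.
EXPECTED_KEYS = (
    "is_bathroom_shared", "is_kitchen_shared", "is_host_in_unit",
    "is_host_on_prem", "shared_utilities", "is_entrance_separate",
    "other_guests_on_prem", "host_profile_mentioned", "is_separate_suite",
    "host_interaction", "has_house_rules", "has_personal_touches",
    "entire_property", "includes_additional_service", "has_self_check_in",
)


def parse_features(raw_json):
    """Parse a cleaned response and check it has exactly the expected boolean flags."""
    parsed = json.loads(raw_json)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in EXPECTED_KEYS if key not in parsed]
    unexpected = [key for key in parsed if key not in EXPECTED_KEYS]
    non_bool = [key for key in EXPECTED_KEYS
                if key in parsed and not isinstance(parsed[key], bool)]
    if missing or unexpected or non_bool:
        raise ValueError(
            f"missing={missing}, unexpected={unexpected}, non_bool={non_bool}"
        )
    return parsed
```

Calling `parse_features(cleaned_features)` in place of `json.loads(cleaned_features)` would surface a misspelled or dropped key immediately, rather than later as a missing column.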