In [106]:
import os
from dotenv import load_dotenv
import google.generativeai as genai
import re
import pandas as pd
import json

## LLM Gemini model preparation

Get a `GEMINI_API_KEY` from Google AI Studio first, then add it to the `.env` file in the root directory, following the format shown in the `.env_example` file.

In [98]:
# Load environment variables from .env file
load_dotenv()

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
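
If the key is missing from `.env`, the lookup above fails with a bare `KeyError`. A minimal sketch of a friendlier check follows. `require_env` is a hypothetical helper introduced here for illustration; it is not part of the notebook, `python-dotenv`, or the Gemini SDK.

```python
import os


def require_env(name, environ=None):
    """Return a non-empty environment variable or raise a readable error.

    Hypothetical helper (an assumption, not a library function).
    `environ` defaults to os.environ and can be replaced for testing.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to the .env file (see .env_example)."
        )
    return value


# Intended use in the cell above (not executed here):
# genai.configure(api_key=require_env("GEMINI_API_KEY"))
```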

Gemini 1.5 Flash

Rate limits:

- 15 RPM (requests per minute)
- 1 million TPM (tokens per minute)
- 1,500 RPD (requests per day)

Price (input):

- Free of charge
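
At 15 RPM, calls need to be spaced at least 60 / 15 = 4 seconds apart. The sketch below shows one simple way to do that. `MinIntervalThrottle` is a hypothetical helper, not part of the Gemini SDK, and it only addresses the per-minute limit; it does not track the daily or token limits.

```python
import time


class MinIntervalThrottle:
    """Space calls at least 60 / rpm seconds apart (hypothetical helper)."""

    def __init__(self, rpm=15, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / rpm
        self._clock = clock      # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


# Intended use (not executed here): call throttle.wait() before each
# model.generate_content(...) request.
```

With this spacing, annotating all 1,387 listings would take roughly 92 minutes and stay under the 1,500 requests-per-day limit, assuming one request per listing and no retries.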

In [273]:
# Create the model
# See https://ai.google.dev/api/python/google/generativeai/GenerativeModel
generation_config = {
    "temperature": 0,
    "top_p": 0.95,
    "top_k": 64,
    "max_output_tokens": 1024,
    "response_mime_type": "text/plain",
}

SYSTEM_PROMPT = """
You are a helpful assistant. You will analyze the given text and provide a JSON response indicating whether the text mentions or suggests certain things. Please follow these guidelines:
1. "is_bathroom_shared": Check if there is any mention of a shared bathroom. Ignore mentions of other shared amenities like "shared kitchen" or "shared laundry". Look for keywords like "shared bathroom", "bathroom shared", "bathroom is shared", etc.
2. "is_kitchen_shared": Check if there is any mention of a shared kitchen. Ignore mentions of other shared amenities like "shared bathroom" or "shared laundry". Look for keywords like "shared kitchen", "kitchen shared", "kitchen is shared", etc.
3. "is_host_in_unit": Check if there is any mention of the host living in the same unit, apartment or house as the guest. Look for keywords like "live in the unit", "live next door", etc.
4. "is_host_on_prem": Check if there is any mention of the host living on premises. Look for keywords like "live on premises", "live in the same building", "live on-site", etc.
5. "shared_utilities": Check if there is any mention of shared utilities or facilities. Ignore mentions of other shared amenities like "shared kitchen" or "shared bathroom". Look for keywords like "shared laundry", "shared laundry room", "laundry shared", "laundry is shared", "shared electricity", "shared internet", "shared cable TV", etc.
6. "is_entrance_separate": Check if there is any mention of separate entrance. Look for keywords like "private entrance", "separate entrance", etc.
7. "other_guests_on_prem": Check if there is any mention of other guests living on the premises/house/apt/apartment or anything indicating that there will be other guests on the premises.
8. "host_profile_mentioned": Check if there is any mention of the host's profile being Superhost or similar detail.
9. "property_separate_suite": Check if there is any mention of private or separate suites/apartments. Look for keywords like "private suite", "separate suite", "separate apartments", etc.
10. "host_interaction": Check if there is any mention of limited or no interaction with the host.
11. "house_rules_host_indication": Check if there is any mention of any specific rule indicating the host's presence. Look for keywords like "noise", "noise levels", "no noise", "curfew", "limited time", "from 10am to 10pm", "until 10pm", etc.
12. "listing_personal_touches": Check if there is any mention of language in the text suggesting personal touches.
13. "entire_property": Check if there is any mention of the property being listed as an entire place. Ignore mentions of other property types like "entire room", "private room" or "shared room". Look for keywords like "entire home", "entire house", "entire studio", "entire apt", "entire apartment", "entire condo", "entire penthouse", "whole house", "whole apartment", "whole studio", etc.
14. "host_additional_services": Check if there is any mention of host offering services, therefore, indicating their presence. Look for keywords like "breakfast", "breakfast provided", "breakfast included", "meals provided", "meal provided", "meal included", "available upon request", etc.
15. "has_self_check_in": Check if there is any mention of guest self check-in. Look for keywords like "self check-in", "self check in", "self checkin", etc.

Respond with JSON-formatted output only. Do not add any other text, markdown, details or annotations:
{
  "is_bathroom_shared": <true/false>,
  "is_kitchen_shared": <true/false>,
  "is_host_on_prem": <true/false>,
  "other_guests_on_prem": <true/false>,
  ...
}
"""

model = genai.GenerativeModel(
    model_name="gemini-1.5-flash-latest",
    generation_config=generation_config,
    system_instruction=SYSTEM_PROMPT
)

## Data preparation

In [274]:
# Specify the path to your .jsonl file
file_path = '../../data/description_parsing/descriptions_output.jsonl'

# Read the .jsonl file into a DataFrame
df = pd.read_json(file_path, lines=True)

In [275]:
print(df.shape)
df.head()

(1387, 7)


Unnamed: 0,listing_id,place_description,the_space_description,guest_access_description,other_things_to_note_description,registration_number_description,during_your_stay_description
0,944162829781112704,Welcome to our brand new and beautiful apartme...,You will enjoy the quiet and private room with...,This is a private big bedroom with a private b...,1) The private bathroom is not inside the room...,,
1,1045877421386474496,"Enjoy a luxury, comfortable stay in this brigh...",,,,00180521,
2,13131513,Enjoy living in Vancouver’s premier neighbourh...,"Freshly painted walls, designer sheets, heated...",During your trip you will have unfettered acce...,.,24-160094,Our host guest relationship is of the utmost i...
3,873338302587192320,We have a comfortable suite with a separate en...,Welcome to our Apollo Sunset Suite! <br />We l...,"-Internet (Telus PureFibre Internet Gigabit, t...","There are Save on Food Supermarket, Shoppers, ...",00136310,
4,874351314046879488,Spend a night in this unique music studio! <br...,"Cozy bedroom with desk, piano and comfy bed. <...",Guests will have access to the basement suite ...,There are only a handful of in-person music le...,,Send a message through the app if you need any...


In [276]:
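# Function to join columns with their headers
def join_columns_with_headers(row):
    # Iterate over the row's own labels rather than the global df.columns, so that
    # re-running this cell after 'combined_description' exists does not raise a KeyError
    return " \n ".join([f"{col.replace('_', ' ').capitalize()}:\n{row[col]}" 
                      for col in row.index 
                      if col not in ['listing_id', 'registration_number_description']
                      ])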

# Applying the function to each row
df['combined_description'] = df[['place_description', 
                                 'the_space_description', 
                                 'guest_access_description', 
                                 'other_things_to_note_description',
                                 'during_your_stay_description']
                                 ].apply(join_columns_with_headers, axis=1)

# Displaying the result
print(df[['listing_id', 'combined_description']])


               listing_id                               combined_description
0      944162829781112704  Place description:\nWelcome to our brand new a...
1     1045877421386474496  Place description:\nEnjoy a luxury, comfortabl...
2                13131513  Place description:\nEnjoy living in Vancouver’...
3      873338302587192320  Place description:\nWe have a comfortable suit...
4      874351314046879488  Place description:\nSpend a night in this uniq...
...                   ...                                                ...
1382  1083582514071370880  Place description:\nEnjoy a stylish experience...
1383             47480096  Place description:\n"Come be our guest in our ...
1384              8273235  Place description:\nWe live in North Vancouver...
1385   970431967548785664  Place description:\nRelax with the whole famil...
1386  1175286792719540224  Place description:\nDescription<br /><br /><br...

[1387 rows x 2 columns]
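
Some listings leave several description fields empty (row 1 in the preview above has only a place description). Formatting a missing value in an f-string produces literal text such as `nan` or `None`, which then ends up in the prompt. A minimal sketch of a variant that skips empty fields is shown below. `join_non_empty_fields` is a hypothetical name, and the function accepts any row with an `.items()` method (a dict or a pandas Series), so it can be tried without a DataFrame.

```python
import math


def join_non_empty_fields(row):
    """Join 'Header:' + value blocks, skipping missing or blank values.

    Hypothetical variant of join_columns_with_headers. `row` is any
    mapping-like object with .items(), e.g. a dict or a pandas Series.
    """
    parts = []
    for col, value in row.items():
        is_nan = isinstance(value, float) and math.isnan(value)
        if value is None or is_nan or not str(value).strip():
            continue  # nothing useful to send to the model for this field
        header = col.replace("_", " ").capitalize()
        parts.append(f"{header}:\n{value}")
    return " \n ".join(parts)
```

It could be applied the same way as the original function, with `.apply(join_non_empty_fields, axis=1)` on the selected description columns.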


In [277]:
sample_combined_description = df.combined_description[3]

print(sample_combined_description)

Place description:
We have a comfortable suite with a separate entrance, full kitchen and dining room, full bathroom, an independent bedroom and one completely isolated room for reclining with a sofa-bed and a large desk with a comfortable chair for you to use as a working space. <br />The house is located in a quiet residential district with walking distance to Restaurants, Supermarkets, Spa,  Coffee Shops and Royal Columbia Hospital.  There is a short distance to bus stops and Sky-train Station. 
 The space description:
Welcome to our Apollo Sunset Suite! <br />We live in a quiet, safe community with beautiful surroundings. Just a short walk from the new Skwo:wech Primary School and Buses. It's a 5-minute walk to Royal Columbian Hospital and an 8-minute walk to the SkyTrain station for quick access to Metrotown and downtown Vancouver. We are very close to several supermarkets (Save On Foods, Shoppers) and recreational facilities such as bars, restaurants, spas, cafes. <br />Highways 

## Run Gemini to generate response

In [278]:
response = model.generate_content(sample_combined_description)

print(response.text)

```json
{
  "is_bathroom_shared": false,
  "is_kitchen_shared": false,
  "is_host_in_unit": false,
  "is_host_on_prem": true,
  "shared_utilities": false,
  "is_entrance_separate": true,
  "other_guests_on_prem": false,
  "host_profile_mentioned": false,
  "property_separate_suite": true,
  "host_interaction": false,
  "house_rules_host_indication": false,
  "listing_personal_touches": false,
  "entire_property": false,
  "host_additional_services": false,
  "has_self_check_in": true
}
```


In [279]:
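def clean_features(features):
    # Keep only the text from the first { to the last }, dropping the Markdown code fence around it.
    # Lowercasing also turns True/False into valid JSON literals; every value here is a boolean.
    cleaned_features = re.sub(r'^.*?(\{.*\}).*$', r'\1', features.lower(), flags=re.DOTALL)
    return cleaned_features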
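
When the response contains no braces at all, `re.sub` returns the input unchanged and the later `json.loads` call raises an exception. The sketch below is one possible alternative that extracts and parses in a single step and returns `None` on failure. `extract_json_object` is a hypothetical helper, and unlike `clean_features` it does not lowercase the text.

```python
import json


def extract_json_object(text):
    """Return the dict found between the first '{' and the last '}', or None.

    Hypothetical helper: returns None instead of raising when the text
    has no braces or the enclosed text is not a valid JSON object.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None
```

Returning `None` makes it easy to collect failed rows for a later retry instead of stopping a long batch run.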

In [280]:
cleaned_features = clean_features(response.text)

json.loads(cleaned_features)

{'is_bathroom_shared': False,
 'is_kitchen_shared': False,
 'is_host_in_unit': False,
 'is_host_on_prem': True,
 'shared_utilities': False,
 'is_entrance_separate': True,
 'other_guests_on_prem': False,
 'host_profile_mentioned': False,
 'property_separate_suite': True,
 'host_interaction': False,
 'house_rules_host_indication': False,
 'listing_personal_touches': False,
 'entire_property': False,
 'host_additional_services': False,
 'has_self_check_in': True}
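
Because the model writes the key names itself, they can drift from the ones defined in the system prompt. A minimal sketch of a sanity check on the parsed dictionary follows. `EXPECTED_KEYS` is copied from the 15 guidelines in `SYSTEM_PROMPT`, and `check_feature_keys` is a hypothetical helper.

```python
# Key names taken from the 15 guidelines in SYSTEM_PROMPT
EXPECTED_KEYS = [
    "is_bathroom_shared",
    "is_kitchen_shared",
    "is_host_in_unit",
    "is_host_on_prem",
    "shared_utilities",
    "is_entrance_separate",
    "other_guests_on_prem",
    "host_profile_mentioned",
    "property_separate_suite",
    "host_interaction",
    "house_rules_host_indication",
    "listing_personal_touches",
    "entire_property",
    "host_additional_services",
    "has_self_check_in",
]


def check_feature_keys(parsed):
    """Return (missing, unexpected, non_boolean) key lists for a parsed response.

    Hypothetical helper: all three lists are empty when the response
    matches the schema described in the system prompt.
    """
    missing = [k for k in EXPECTED_KEYS if k not in parsed]
    unexpected = [k for k in parsed if k not in EXPECTED_KEYS]
    non_boolean = [
        k for k in EXPECTED_KEYS
        if k in parsed and not isinstance(parsed[k], bool)
    ]
    return missing, unexpected, non_boolean
```

For the sample response above, all three lists would be empty, since it contains exactly the 15 expected keys with boolean values.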