# Real Estate Starter Notebook

This starter notebook downloads the dataset for your project and gives some guidance on connecting to Nvidia GPU-powered models.

Make sure to copy or rename this notebook to avoid accidentally overwriting it later.

## Install Libraries

To install the libraries you wish to use, type `!pip install <package>`.

For instance, to install `pandas`, type `!pip install pandas`.

Below are some examples of libraries you may need. The code is commented out, so remove the `#` to execute a line. Uncomment one line at a time to install the libraries one by one.

In [3]:
# !pip install langchain_nvidia_ai_endpoints
# !pip install googledriver
# !pip install openai

In [15]:
from googledriver import download_folder
from openai import OpenAI
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings, NVIDIARerank
from langchain_core.documents import Document
import pandas as pd

## Download the Data

The code below downloads the relevant data for the Real Estate project.

In [28]:
# Google Drive location
URL = "https://drive.google.com/drive/folders/1VVLonRYyGHAWxVuPCZei4CV4oxXUKTRM?usp=drive_link"

# Download Google Drive folder
download_folder(URL, 'real_estate')

['real_estate/Consumer_Price_Index.csv',
 'real_estate/EIBOR_2015_2024.csv',
 'real_estate/Gdp_Quarterly.csv',
 'real_estate/Gross_Domestic_Product_At_Constant_Prices.csv',
 'real_estate/Transactions_test.csv',
 'real_estate/Transactions_training_1.csv',
 'real_estate/Transactions_training_2.csv']

## Import the Data

Due to the file size, the data relating to property prices in Dubai is split into two parts: `Transactions_training_1.csv` and `Transactions_training_2.csv`. The code below ingests the two parts and combines them as `train`.

In [45]:
# Historical Property Prices for Dubai
train_pt1 = pd.read_csv('real_estate/Transactions_training_1.csv')
train_pt2 = pd.read_csv('real_estate/Transactions_training_2.csv', 
                        header=None, names=train_pt1.columns)
train = pd.concat([train_pt1, train_pt2])
train.to_csv('real_estate/Transactions_Training.csv')
train.head()

Unnamed: 0,transaction_id,procedure_id,trans_group_id,trans_group_en,procedure_name_en,instance_date,property_type_id,property_type_en,property_sub_type_id,property_sub_type_en,...,rooms_en,has_parking,procedure_area,rent_value,meter_rent_price,no_of_parties_role_1,no_of_parties_role_2,no_of_parties_role_3,year_instance,actual_worth (Target)
0,1-11-2020-1729,11,1,Sales,Sell,17/2/2020,3,Unit,60,Flat,...,2 B/R,1,153.55,,,1.0,2.0,0.0,2020,2000000
1,1-11-2020-2296,11,1,Sales,Sell,11/3/2020,3,Unit,60,Flat,...,2 B/R,1,153.91,,,1.0,2.0,0.0,2020,2503000
2,1-11-2019-7817,11,1,Sales,Sell,28/8/2019,3,Unit,60,Flat,...,3 B/R,1,207.06,,,1.0,2.0,0.0,2019,2320700
3,1-11-2020-1720,11,1,Sales,Sell,20/2/2020,3,Unit,60,Flat,...,3 B/R,1,386.02,,,1.0,1.0,0.0,2020,8000000
4,1-11-2017-12472,11,1,Sales,Sell,10/10/2017,3,Unit,60,Flat,...,3 B/R,1,190.29,,,1.0,1.0,0.0,2017,2200000
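
Before modelling, it may help to tidy the combined frame. The sketch below runs on a tiny stand-in frame built from a few of the columns in the preview above, not on the real data. Dropping `Unnamed: 0` if present, resetting the index, parsing `instance_date` as day/month/year, and the shorter name `actual_worth` are all assumptions made for illustration, as is the helper name `tidy`.

```python
import pandas as pd

# Tiny stand-in for `train`, using a few columns from the preview above
sample = pd.DataFrame(
    {
        "Unnamed: 0": [0, 1, 0],
        "transaction_id": ["1-11-2020-1729", "1-11-2020-2296", "1-11-2019-7817"],
        "instance_date": ["17/2/2020", "11/3/2020", "28/8/2019"],
        "procedure_area": [153.55, 153.91, 207.06],
        "actual_worth (Target)": [2000000, 2503000, 2320700],
    },
    index=[0, 1, 0],  # repeated labels, as pd.concat leaves them by default
)

def tidy(df):
    """Return a cleaned copy: fresh index, parsed dates, shorter target name."""
    out = df.drop(columns=["Unnamed: 0"], errors="ignore").reset_index(drop=True)
    # Dates in the preview are day/month/year, so state the format explicitly
    out["instance_date"] = pd.to_datetime(out["instance_date"], format="%d/%m/%Y")
    return out.rename(columns={"actual_worth (Target)": "actual_worth"})

clean = tidy(sample)
print(clean.dtypes)
```

With the real data, the same helper could be applied as `train = tidy(train)`.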


## Modelling

### Option 1: Pre-Deployed Models

This section shows how to connect to models that have been deployed on-prem and are available for you to use.

#### LLM

In [6]:
# Define LLM parameters for the Llama 3.1 8B model
llm = ChatNVIDIA(
    base_url="http://nvpoc.ddnsfree.com:9901/v1",
    api_key="n/a",
    model="meta/llama-3.1-8b-instruct"
)

# Create generator object
events = llm.stream("Hi, write a short poem")

# Print elements of object
for e in events:
    print(e.content, end='')

Here's a short poem:

The Sunset Sky

The sun sets slow and paints the west,
A fiery hue that's truly best.
Orange, pink, and purple blend,
A colorful sight to behold, a perfect end.

The stars appear, a twinkling sea,
As night falls slow, and darkness spreads with glee.
The moon glows bright, a silver plate,
A beacon in the dark, a gentle state.

I hope you enjoy it!

#### Embedder

In [7]:
embeddings = NVIDIAEmbeddings(
    base_url="http://nvpoc.ddnsfree.com:9902/v1", 
    model="nvidia/nv-embedqa-e5-v5",
    truncate="END"
)

# This will return the vector embeddings for the word 'Hi'
# embeddings.embed_query("Hi")
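
Embedding vectors are usually compared with cosine similarity. The sketch below shows the calculation on made-up 3-dimensional vectors so it runs without the endpoint; in practice the inputs would be the lists of floats returned by `embeddings.embed_query`. The helper name `cosine_similarity` and the example vectors are illustrative assumptions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up vectors standing in for real embeddings
vec_query = [1.0, 0.0, 1.0]
vec_close = [2.0, 0.0, 2.0]   # same direction as vec_query
vec_far = [0.0, 1.0, 0.0]     # orthogonal to vec_query

print(cosine_similarity(vec_query, vec_close))
print(cosine_similarity(vec_query, vec_far))
```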

#### Re-Ranker

In [8]:
reranker = NVIDIARerank(
    base_url="http://nvpoc.ddnsfree.com:9903/v1", 
    model="nvidia/nv-rerankqa-mistral-4b-v3",
    truncate="END"
)

In [9]:
# Example of using a re-ranker to rank passages by relevance to a query
query = "What is the GPU memory bandwidth of H100 SXM?"
passages = [
    "The Hopper GPU is paired with the Grace CPU using NVIDIA's ultra-fast chip-to-chip interconnect, delivering 900GB/s of bandwidth, 7X faster than PCIe Gen5. This innovative design will deliver up to 30X higher aggregate system memory bandwidth to the GPU compared to today's fastest servers and up to 10X higher performance for applications running terabytes of data.", 
    "A100 provides up to 20X higher performance over the prior generation and can be partitioned into seven GPU instances to dynamically adjust to shifting demands. The A100 80GB debuts the world's fastest memory bandwidth at over 2 terabytes per second (TB/s) to run the largest models and datasets.", 
    "Accelerated servers with H100 deliver the compute power—along with 3 terabytes per second (TB/s) of memory bandwidth per GPU and scalability with NVLink and NVSwitch™.", 
]

# Define Re-ranker
response = reranker.compress_documents(
  query=query,
  documents=[Document(page_content=passage) for passage in passages]
)

# Print Ranked Responses
for x in response:
    print("Score: ", x.metadata.get('relevance_score'))
    print("Passage: ", x.page_content, '\n')

Score:  8.8359375
Passage:  Accelerated servers with H100 deliver the compute power—along with 3 terabytes per second (TB/s) of memory bandwidth per GPU and scalability with NVLink and NVSwitch™. 

Score:  0.29736328125
Passage:  The Hopper GPU is paired with the Grace CPU using NVIDIA's ultra-fast chip-to-chip interconnect, delivering 900GB/s of bandwidth, 7X faster than PCIe Gen5. This innovative design will deliver up to 30X higher aggregate system memory bandwidth to the GPU compared to today's fastest servers and up to 10X higher performance for applications running terabytes of data. 

Score:  -0.10614013671875
Passage:  A100 provides up to 20X higher performance over the prior generation and can be partitioned into seven GPU instances to dynamically adjust to shifting demands. The A100 80GB debuts the world's fastest memory bandwidth at over 2 terabytes per second (TB/s) to run the largest models and datasets. 
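
One possible use of the ranked passages is to pass only the best ones to the LLM as context. The sketch below is a minimal illustration in plain Python: the helper `build_prompt`, the prompt wording, and the abridged passages are assumptions, and the scores are copied from the output above.

```python
def build_prompt(query, ranked_passages, top_k=1):
    """Combine the top_k highest-scoring passages with the question."""
    best = sorted(ranked_passages, key=lambda pair: pair[0], reverse=True)[:top_k]
    context = "\n".join(text for _, text in best)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# (score, passage) pairs; passages are abridged from the example above
ranked = [
    (0.29736328125, "The Hopper GPU is paired with the Grace CPU, delivering 900GB/s of bandwidth."),
    (8.8359375, "Accelerated servers with H100 deliver 3 terabytes per second (TB/s) of memory bandwidth per GPU."),
    (-0.10614013671875, "The A100 80GB debuts memory bandwidth at over 2 terabytes per second (TB/s)."),
]

prompt = build_prompt("What is the GPU memory bandwidth of H100 SXM?", ranked)
print(prompt)
```

The resulting string could then be sent to the model, for example with `llm.stream(prompt)`.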



### Option 2: API Endpoint

If you would like to use other Nvidia models, feel free to browse their catalog: https://build.nvidia.com/explore/discover

You will first need to create a Developer account with them. If you use your ADCB email address, you will be eligible for 5000 free credits. Otherwise, you will be given 1000 free credits. These will be consumed as you query the API.

From the catalog, you can insert the Python code snippet (sample shown below). Just make sure to replace `my_api_key` with your own and `model` with the one you wish to query.

In [13]:
# Generate API key from the Nvidia catalog site
my_api_key = '<your-api-key>'
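
A key pasted into a notebook is easy to share by accident. As an alternative, the sketch below reads the key from an environment variable. The variable name `NVIDIA_API_KEY` and the helper `load_api_key` are assumptions for illustration; set the variable outside the notebook before running it.

```python
import os

def load_api_key(env_var="NVIDIA_API_KEY"):
    """Return the API key stored in env_var, or raise if it is not set."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key

try:
    my_api_key = load_api_key()
except RuntimeError as err:
    # No key found: report it instead of continuing with an empty value
    print(err)
```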

In [14]:
# Define LLM parameters
client = ChatNVIDIA(
  model="meta/llama-3.2-3b-instruct",
  api_key=my_api_key, 
  temperature=0.2,
  top_p=0.7,
  max_tokens=1024,
)

# Print out the response to a query
for chunk in client.stream([{"role":"user",
                             "content":"Write a limerick about the wonders of GPU computing."}]): 
  print(chunk.content, end="")


Here is a limerick about GPU computing:

There once was a GPU so fine,
Whose computing power was truly divine.
It processed with speed,
And calculations with ease,
And made complex tasks truly sublime.

## Test Data for Scoring your Model

After 2:30pm, the test data will be made available so that you can evaluate your model's performance against unseen data. Running the block of code below saves the files to a new folder called `real_estate_test`, which will contain the original data as well as the test dataset.

In [None]:
# Google Drive location
URL = "https://drive.google.com/drive/folders/1VVLonRYyGHAWxVuPCZei4CV4oxXUKTRM?usp=drive_link"

# Download Google Drive folder
download_folder(URL, 'real_estate_test')

The folder will contain the test dataset only after it has been released. If you run the cell before 2:30pm, re-run it after the release.
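
Once predictions are available, they can be compared with the `actual_worth (Target)` values in the test set. The notebook does not say which metric will be used for scoring, so the sketch below simply computes three common regression metrics; the helper `regression_report` and the predicted values are made up for illustration, while the actual values are taken from the preview of the training data.

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Return MAE, RMSE and MAPE (as a fraction) for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_pred - y_true
    return {
        "mae": float(np.mean(np.abs(errors))),
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
        "mape": float(np.mean(np.abs(errors) / np.abs(y_true))),
    }

actual = [2000000, 2503000, 2320700]      # target values
predicted = [2100000, 2403000, 2320700]   # hypothetical model output

report = regression_report(actual, predicted)
print(report)
```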