## Data Modelling

#### Required fields
* Transaction type (i.e. sale vs. rent - string)
* Bedrooms (integer)
* Bathrooms (integer)
* Description (free text string)
* Property type e.g. flat, detached house, terraced house
* Price e.g. 500,000 (typically integer)
* Location :  Key location data here is Postcode district and/or Postcode
* Agent (advertising the property)
* Listing source
* Listing URL
* Other nice-to-have metadata
* If a rental property is furnished or not
* Anything else you deem interesting

In [183]:
# import libaries needed
import pandas as pd
import psycopg2
from datetime import datetime, date
import csv
import 

SyntaxError: invalid syntax (4204300349.py, line 6)

In [145]:
# load dataset
rm_data = pd.read_csv(f'data_output/rightnow_{date.today()}.csv')
omt_data = pd.read_csv(f'data_output/omt_{date.today()}.csv')
z_data = pd.read_csv(f'data_output/zoopla_{date.today()}.csv')

In [98]:
rm_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 14 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   transaction     50 non-null     object 
 1   address         50 non-null     object 
 2   bedroom         48 non-null     float64
 3   bathroom        49 non-null     float64
 4   sales_price     25 non-null     float64
 5   rent_perMonth   25 non-null     float64
 6   rent_perWeek    25 non-null     float64
 7   description     50 non-null     object 
 8   propertyType    50 non-null     object 
 9   location        50 non-null     object 
 10  agent           50 non-null     object 
 11  listing_source  50 non-null     object 
 12  listing_url     50 non-null     object 
 13  listed_date     50 non-null     object 
dtypes: float64(5), object(9)
memory usage: 5.6+ KB


In [146]:
omt_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 46 entries, 0 to 45
Data columns (total 14 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   transaction     46 non-null     object 
 1   address         44 non-null     object 
 2   bedroom         38 non-null     float64
 3   bathroom        44 non-null     float64
 4   sales_price     21 non-null     float64
 5   rent_perMonth   23 non-null     float64
 6   rent_perWeek    23 non-null     float64
 7   description     44 non-null     object 
 8   propertyType    44 non-null     object 
 9   location        44 non-null     object 
 10  agent           4 non-null      object 
 11  listing_source  46 non-null     object 
 12  listing_url     46 non-null     object 
 13  listed_date     44 non-null     object 
dtypes: float64(5), object(9)
memory usage: 5.2+ KB


In [147]:
z_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   transaction     50 non-null     object 
 1   address         50 non-null     object 
 2   bedroom         44 non-null     float64
 3   bathroom        41 non-null     float64
 4   living_room     30 non-null     float64
 5   sales_price     25 non-null     float64
 6   rent_perMonth   25 non-null     float64
 7   rent_perWeek    25 non-null     float64
 8   description     50 non-null     object 
 9   propertyType    46 non-null     object 
 10  location        50 non-null     object 
 11  agent           50 non-null     object 
 12  listing_source  50 non-null     object 
 13  listing_url     50 non-null     object 
 14  listed_date     50 non-null     object 
dtypes: float64(6), object(9)
memory usage: 6.0+ KB


### Create a Unique ID for each data

The unique id will be crated by combining the  [[transaction,agent,address,sales_price,rent_permonth,rent_perweek]] as one attribute and drop all empty columns

In [148]:
# create a unique id field
rm_data['id'] = rm_data['transaction'] + rm_data["bedroom"].astype(str) + rm_data["bathroom"].astype(str) + rm_data["sales_price"].astype(str) + z_data["rent_perMonth"].astype(str) + rm_data["address"]
# drop empty columns
rm_data.dropna(subset=['address'], inplace=True)

# change the data column to data type
rm_data['listed_date'] = pd.to_datetime(rm_data['listed_date'])

# read dataset
rm_data.head()

Unnamed: 0,transaction,address,bedroom,bathroom,sales_price,rent_perMonth,rent_perWeek,description,propertyType,location,agent,listing_source,listing_url,listed_date,id
0,rent,Narcissus Road London NW6,1.0,1.0,,1750.0,404.0,A large and modern one bedroom apartment boast...,Flat,NW6,"Kinleigh Folkard & Hayward - Lettings, West Ha...",rightmove,https://www.rightmove.co.uk/properties/1348085...,2023-05-15,rent1.01.0nan2300.0Narcissus Road London NW6
1,rent,"Flat 2, Refectory Apartments, 3 Lawrence Yard,...",2.0,1.0,,1750.0,404.0,Ellis & Co are delighted to offer on the marke...,Apartment,N15,"Ellis & Co, Tottenham",rightmove,https://www.rightmove.co.uk/properties/1350452...,2023-05-19,"rent2.01.0nan1500.0Flat 2, Refectory Apartment..."
2,rent,"Essex Road, Islington, N1",3.0,1.0,,3250.0,750.0,This superbly located three double bedroom spl...,Apartment,N1,"Charles Henry Peppiatt Ltd, Southgate",rightmove,https://www.rightmove.co.uk/properties/1350452...,2023-05-19,"rent3.01.0nan2496.0Essex Road, Islington, N1"
3,rent,"Kingston Road, London, SW20",1.0,1.0,,1450.0,335.0,"Modern 1 bed, 1 bath flat located within walki...",Flat,SW20,"Criterion Hospitality Limited, Criterion Hospi...",rightmove,https://www.rightmove.co.uk/properties/1350058...,2023-05-19,"rent1.01.0nan1600.0Kingston Road, London, SW20"
4,rent,"Adelaide Road, Chalk Farm, NW3",1.0,1.0,,1650.0,381.0,This charming one bedroom apartment is on the ...,Apartment,NW3,"Charles Henry Peppiatt Ltd, Southgate",rightmove,https://www.rightmove.co.uk/properties/1335260...,2023-05-19,"rent1.01.0nan750.0Adelaide Road, Chalk Farm, NW3"


In [149]:
# create a unique id field
omt_data['id'] = omt_data['id'] = omt_data['transaction'] + omt_data["bedroom"].astype(str) + omt_data["bathroom"].astype(str) + omt_data["sales_price"].astype(str) + z_data["rent_perMonth"].astype(str) + omt_data["address"]
# drop empty columns
omt_data.dropna(subset=['address'], inplace=True)

# # change the data column to data type
# omt_data['listed_date'] = pd.to_datetime(omt_data['listed_date'])

# read dataset
omt_data.head()

Unnamed: 0,transaction,address,bedroom,bathroom,sales_price,rent_perMonth,rent_perWeek,description,propertyType,location,agent,listing_source,listing_url,listed_date,id
0,rent,"Gloucester Terrace, London, W2",3.0,2.0,,8450.0,1950.0,3 bedroom duplex to rent,duplex,W2,Savills - Notting Hill,omt,https://www.onthemarket.com/details/12548982/,> 14 days,"rent3.02.0nan2300.0Gloucester Terrace, London, W2"
1,rent,"Chalklands, Wembley HA9",1.0,1.0,,750.0,173.0,1 bedroom flat to rent,flat,HA9,Home World Management - London,omt,https://www.onthemarket.com/details/13215915/,< 7 days,"rent1.01.0nan1500.0Chalklands, Wembley HA9"
2,rent,"Elmhurst Road, Enfield, London, Enfield EN3",1.0,1.0,,1000.0,231.0,1 bedroom flat to rent,flat,EN3,,omt,https://www.onthemarket.com/details/13215647/,< 7 days,"rent1.01.0nan2496.0Elmhurst Road, Enfield, Lon..."
3,rent,"Hogarth Road , Kensington , Kensington Chelsea...",1.0,1.0,,1647.0,380.0,1 bedroom flat to rent,flat,SW5,,omt,https://www.onthemarket.com/details/8560268/,< 7 days,"rent1.01.0nan1600.0Hogarth Road , Kensington ,..."
4,rent,"Browning Way, Hounslow TW5",,1.0,,1250.0,288.0,Studio to rent,Studio,TW5,,omt,https://www.onthemarket.com/details/13215123/,< 7 days,"rentnan1.0nan750.0Browning Way, Hounslow TW5"


In [150]:
# create a unique id field
z_data['id'] = z_data['id'] = z_data['transaction'] + z_data["bedroom"].astype(str) + z_data["bathroom"].astype(str) + z_data["sales_price"].astype(str) + z_data["rent_perMonth"].astype(str) + z_data["address"]
# drop empty columns
z_data.dropna(subset=['address'], inplace=True)

# change the data column to data type
z_data['listed_date'] = pd.to_datetime(z_data['listed_date'])

# read dataset
z_data.head()

Unnamed: 0,transaction,address,bedroom,bathroom,living_room,sales_price,rent_perMonth,rent_perWeek,description,propertyType,location,agent,listing_source,listing_url,listed_date,id
0,rent,"Belvedere Heights, Lisson Grove, London NW8",2.0,2.0,1.0,,2300.0,531.0,This is a beautifully decorated modern two dou...,flat,NW8,Redac Strattons - Central London Hub,zoopla,https://www.zoopla.co.uk/to-rent/details/64672...,2023-05-19,"rent2.02.0nan2300.0Belvedere Heights, Lisson G..."
1,rent,"Narcissus Road, West Hampstead, London NW6",1.0,1.0,,,1500.0,346.0,All bills included! Introducing a stunning new...,Studio,NW6,Belgrave Estates,zoopla,https://www.zoopla.co.uk/to-rent/details/64672...,2023-05-19,"rent1.01.0nan1500.0Narcissus Road, West Hampst..."
2,rent,"Hoxton Street, London N1",2.0,1.0,,,2496.0,576.0,"The Flat is situated on Hoxton Street, which i...",flat,N1,MK London Properties,zoopla,https://www.zoopla.co.uk/to-rent/details/64672...,2023-05-19,"rent2.01.0nan2496.0Hoxton Street, London N1"
3,rent,"Cambridge Rd, Kingston Upon Thames, London KT1",1.0,1.0,,,1600.0,369.0,Material Information Council Tax Band :C,flat,KT1,B & K Estates,zoopla,https://www.zoopla.co.uk/to-rent/details/64672...,2023-05-19,"rent1.01.0nan1600.0Cambridge Rd, Kingston Upon..."
4,rent,"Old Oak Common Lane, East Acton, London W3",1.0,,,,750.0,173.0,Savoy Property Consultants are delighted to pr...,,W3,Savoy Property Consultants,zoopla,https://www.zoopla.co.uk/to-rent/details/54451...,2023-05-19,"rent1.0nannan750.0Old Oak Common Lane, East Ac..."


### Connecting to the database

In [151]:

def create_database(database_name: str):
    # Connect to the default database
    conn = psycopg2.connect("host=localhost dbname=postgres user=postgres password=1118")
    conn.set_session(autocommit=True)
    cur = conn.cursor()

    # Create the sparkify database with UTF8 encoding
    cur.execute(f"DROP DATABASE IF EXISTS {database_name}")
    cur.execute(f"CREATE DATABASE {database_name} ENCODING 'UTF8'")

    # Close the connection to the default database
    cur.close()
    conn.close()

    # Connect to the new database
    conn = psycopg2.connect(f"host=localhost dbname={database_name} user=postgres password=1118")
    conn.set_session(autocommit=True)
    cur = conn.cursor()

    return cur, conn

cur, conn = create_database('london_propertylisting')

ObjectInUse: database "london_propertylisting" is being accessed by other users
DETAIL:  There are 12 other sessions using the database.


### Create table for each dataset and insert the data into the table
float is used instead of integer because there are NAN values which not be represented as integer

Right move

In [107]:
print(pd.io.sql.get_schema(rm_data, name="rm_data"))

CREATE TABLE "rm_data" (
"transaction" TEXT,
  "address" TEXT,
  "bedroom" REAL,
  "bathroom" REAL,
  "sales_price" REAL,
  "rent_perMonth" REAL,
  "rent_perWeek" REAL,
  "description" TEXT,
  "propertyType" TEXT,
  "location" TEXT,
  "agent" TEXT,
  "listing_source" TEXT,
  "listing_url" TEXT,
  "listed_date" TIMESTAMP,
  "id" TEXT
)


In [152]:
cur.execute("DROP TABLE IF EXISTS rightmove")

rm_table = ("""CREATE TABLE IF NOT EXISTS rightmove (
transaction TEXT,
  address VARCHAR,
  bedroom FLOAT,
  bathroom FLOAT,
  sales_price FLOAT,
  rent_perMonth FLOAT,
  rent_perWeek FLOAT,
  description VARCHAR,
  propertyType VARCHAR,
  location VARCHAR,
  agent VARCHAR,
  listing_source VARCHAR,
  listing_url VARCHAR,
  listed_date VARCHAR,
  id TEXT PRIMARY KEY
  );""")

cur.execute(rm_table)

In [153]:
# insert rm_data into the table created
rm_insert = """
INSERT INTO rightmove (
    transaction,
    address,
    bedroom,
    bathroom,
    sales_price,
    rent_perMonth,
    rent_perWeek,
    description,
    propertyType,
    location,
    agent,
    listing_source,
    listing_url,
    listed_date,
    id)
VALUES
(%s,%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s);
"""
for i, row in rm_data.iterrows():
    cur.execute(rm_insert, list(row))

On the market

In [154]:
cur.execute("DROP TABLE IF EXISTS omt")

omt_table = ("""CREATE TABLE IF NOT EXISTS omt (
transaction TEXT,
  address VARCHAR,
  bedroom FLOAT,
  bathroom FLOAT,
  sales_price FLOAT,
  rent_perMonth FLOAT,
  rent_perWeek FLOAT,
  description VARCHAR,
  propertyType VARCHAR,
  location VARCHAR,
  agent VARCHAR,
  listing_source VARCHAR,
  listing_url VARCHAR,
  listed_date VARCHAR,
  id TEXT PRIMARY KEY
  );""")

cur.execute(omt_table)

In [155]:
# insert omt_data into the table created
omt_insert = """
INSERT INTO omt (
    transaction,
    address,
    bedroom,
    bathroom,
    sales_price,
    rent_perMonth,
    rent_perWeek,
    description,
    propertyType,
    location,
    agent,
    listing_source,
    listing_url,
    listed_date,
    id)
VALUES
(%s,%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s);
"""
for i, row in omt_data.iterrows():
    cur.execute(omt_insert, list(row))

Zoopla

In [156]:
cur.execute("DROP TABLE IF EXISTS zoopla")

z_table = ("""CREATE TABLE IF NOT EXISTS zoopla (
transaction TEXT,
  address VARCHAR,
  bedroom FLOAT,
  bathroom FLOAT,
  living_room FLOAT,
  sales_price FLOAT,
  rent_perMonth FLOAT,
  rent_perWeek FLOAT,
  description VARCHAR,
  propertyType VARCHAR,
  location VARCHAR,
  agent VARCHAR,
  listing_source VARCHAR,
  listing_url VARCHAR,
  listed_date VARCHAR,
  id TEXT PRIMARY KEY
  );""")

cur.execute(z_table)

In [157]:
# insert rm_data into the table created
z_insert = """
INSERT INTO zoopla (
    transaction,
    address,
    bedroom,
    bathroom,
    living_room,
    sales_price,
    rent_perMonth,
    rent_perWeek,
    description,
    propertyType,
    location,
    agent,
    listing_source,
    listing_url,
    listed_date,
    id)
VALUES
(%s,%s, %s, %s, %s, %s,%s, %s, %s, %s, %s, %s, %s, %s, %s, %s);
"""
for i, row in z_data.iterrows():
    cur.execute(z_insert, list(row))

### Pull out the dataset and write query

  Use LEFT outer join to combine all the three dataset as one, remove duplicate and save the distinct data set into a new table in the database and also ouput it as csv

In [170]:
query1 = """SELECT DISTINCT *
FROM (
    SELECT *
    FROM public.rightmove
    UNION ALL
    SELECT *
    FROM public.omt
    UNION ALL
    SELECT *
    FROM public.zoopla
) AS master_list;"""

In [181]:
query1 = """
SELECT DISTINCT *
FROM (
    SELECT transaction, address, bedroom, bathroom, sales_price, rent_perMonth, rent_perWeek,
        description, propertyType, location, agent, listing_source, listing_url, listed_date, id,
        CAST(NULL AS FLOAT) AS living_room
    FROM public.rightmove
    UNION ALL
    SELECT transaction, address, bedroom, bathroom, sales_price, rent_perMonth, rent_perWeek,
        description, propertyType, location, agent, listing_source, listing_url, listed_date, id,
        CAST(NULL AS FLOAT) AS living_room
    FROM public.omt
    UNION ALL
    SELECT transaction, address, bedroom, bathroom, sales_price, rent_perMonth, rent_perWeek,
        description, propertyType, location, agent, listing_source, listing_url, listed_date, id, living_room
    FROM public.zoopla
) AS master_list;
"""

In [182]:
cur.execute(query1)

for row in cur.fetchall():
    print(row)

('sales', 'Bronson Road, Raynes Park', 2.0, 1.0, 785000.0, nan, nan, '2 bedroom terraced house for sale', 'terraced', 'Park', 'NaN', 'omt', 'https://www.onthemarket.com/details/13229071/', '2023-05-19', 'sales2.01.0785000.0nanBronson Road, Raynes Park', None)
('rent', 'Park Mansions, London, SW8', 3.0, 1.0, nan, 2500.0, 577.0, '£2850 Per calendar Month, 3 Minute walk to Tube / Rail Station (Zone 1) Beautiful 3 Bedroom Vauxhall Flat with 1 WC. All within 20 metres of lush and spacious Vauxhall Park. Please note: From the... ** Property Reference: 1695487 **', 'Flat', 'SW8', 'OpenRent, London', 'rightmove', 'https://www.rightmove.co.uk/properties/134355104#/?channel=RES_LET', '2023-05-19 00:00:00', 'rent3.01.0nan2050.0Park Mansions, London, SW8', None)
('sales', 'South Street, London', 5.0, 6.0, 35000000.0, nan, nan, 'An elegant stone-dressed, red-brick, five-storey grade II listed mansion house. Double-fronted Edwardian facade with arts & crafts style detailing and french-inspired fine 

16

### Create both xlsx and csv master list

In [None]:
cur.execute(query1)

for row in cur.fetchall():
    print(row)

# Convert the query result to a DataFrame
df = pd.DataFrame(rows, columns=[column[0] for column in cur.fetchall()])


# Save as CSV
csv_filename = f'data_output/masterList_{date.today()}.csv'
df.to_csv(csv_filename, index=False)

# Save as Excel
excel_filename = f'data_output/masterList_{date.today()}.xlsx'
df.to_excel(excel_filename, index=False)

### Creare a database and load the master list back to the database

In [None]:
cur.execute("DROP TABLE IF EXISTS master_list")

master_list  = """
CREATE TABLE IF NOT EXISTS master_list (
    transaction TEXT,
    address VARCHAR,
    bedroom FLOAT,
    bathroom FLOAT,
    living_room FLOAT,
    sales_price FLOAT,
    rent_perMonth FLOAT,
    rent_perWeek FLOAT,
    description VARCHAR,
    propertyType VARCHAR,
    location VARCHAR,
    agent VARCHAR,
    listing_source VARCHAR,
    listing_url VARCHAR,
    listed_date DATE,
    id TEXT PRIMARY KEY
);
"""

cur.execute(master_list)

Copy the save csv file to the master_list table

In [None]:
copy_data = f"""
COPY master_list (
    transaction, address, bedroom, bathroom, living_room, sales_price, rent_perMonth, rent_perWeek,
    description, propertyType, location, agent, listing_source, listing_url, listed_date, id
) FROM 'data_output/masterList_{date.today()}.csv' DELIMITER ',' CSV HEADER;
"""

cur.execute(copy_data)