# Prediction of prices in Beijing
* url: https://www.kaggle.com/datasets/ruiqurm/lianjia

# Import all the libraries and check their versions

In [1]:
import sys # To check the Python version
import pandas as pd
pd.set_option('display.max_columns', None) # Show all the columns of the DataFrame.
import numpy as np
import os
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

In [2]:
print(f"Python version: {sys.version}")
print(f"OS version: {os.name}")
print(f"Pandas version: {pd.__version__}")
print(f"Numpy version: {np.__version__}")

Python version: 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]
OS version: posix
Pandas version: 2.0.3
Numpy version: 1.21.5


## Data path

In [3]:
main_path = "/home/andres/Proyectos/Python/Machine Learning/Housing price Beijing"
file_name = "housing_beijing.csv"
path = os.path.join(main_path, file_name)

#### Columns of the dataset
* url: the URL from which the record was fetched
* id: the transaction id
* Lng and Lat: coordinates, using the BD09 protocol
* Cid: community id
* tradeTime: the date of the transaction
* DOM: active days on market. More information: https://en.wikipedia.org/wiki/Days_on_market
* followers: the number of people following the transaction
* totalPrice: the total price
* price: the average price per square
* square: the area of the house
* livingRoom: the number of living rooms
* drawingRoom: the number of drawing rooms
* kitchen: the number of kitchens
* bathRoom: the number of bathrooms
* floor: the height of the house. I will turn the Chinese characters to English in the next version.
* buildingType: including tower (1), bungalow (2), combination of plate and tower (3), plate (4).
* constructionTime: the time of construction
* renovationCondition: including other (1), rough (2), simplicity (3), hardcover (4).
* buildingStructure: including unknown (1), mixed (2), brick and wood (3), brick and concrete (4), steel (5) and steel-concrete composite (6).
* ladderRatio: the ratio between the number of residents on the same floor and the number of elevators (ladders). It describes how many ladders a resident has on average.
* elevator: has an elevator (1) or not (0)
* fiveYearsProperty: whether the owner has held the property for less than 5 years.

Most of the transactions took place in 2011-2017; some took place in January 2018, and some even earlier (2009, 2010).

All the data was fetched from https://bj.lianjia.com/chengjiao.

Some columns are missing a description (subway, district and communityAverage).


## Exploratory analysis of the dataset

In [4]:
# dtype = str: load every column as a string to avoid type-inference problems.
# low_memory = False: parse the file in one pass instead of in chunks, which avoids mixed-type inference but uses more memory.
# encoding = 'gbk': needed to read the Chinese characters.
data = pd.read_csv(path, encoding = 'gbk', dtype = str, low_memory = False)
data.head()

Unnamed: 0,url,id,Lng,Lat,Cid,tradeTime,DOM,followers,totalPrice,price,square,livingRoom,drawingRoom,kitchen,bathRoom,floor,buildingType,constructionTime,renovationCondition,buildingStructure,ladderRatio,elevator,fiveYearsProperty,subway,district,communityAverage
0,https://bj.lianjia.com/chengjiao/101084782030....,101084782030,116.475489,40.01952,1111027376244,2016-08-09,1464,106,415.0,31680,131.0,2,1,1,1,高 26,1,2005,3,6,0.217,1.0,0.0,1.0,7,56021
1,https://bj.lianjia.com/chengjiao/101086012217....,101086012217,116.453917,39.881534,1111027381879,2016-07-28,903,126,575.0,43436,132.38,2,2,1,2,高 22,1,2004,4,6,0.667,1.0,1.0,0.0,7,71539
2,https://bj.lianjia.com/chengjiao/101086041636....,101086041636,116.561978,39.877145,1111040862969,2016-12-11,1271,48,1030.0,52021,198.0,3,2,1,3,中 4,4,2005,3,6,0.5,1.0,0.0,0.0,7,48160
3,https://bj.lianjia.com/chengjiao/101086406841....,101086406841,116.43801,40.076114,1111043185817,2016-09-30,965,138,297.5,22202,134.0,3,1,1,1,底 21,1,2008,1,6,0.273,1.0,0.0,0.0,6,51238
4,https://bj.lianjia.com/chengjiao/101086920653....,101086920653,116.428392,39.886229,1111027381174,2016-08-28,927,286,392.0,48396,81.0,2,1,1,1,中 6,4,1960,2,2,0.333,0.0,1.0,1.0,1,62588


In [5]:
data.shape

(318851, 26)

In [6]:
data.dtypes

url                    object
id                     object
Lng                    object
Lat                    object
Cid                    object
tradeTime              object
DOM                    object
followers              object
totalPrice             object
price                  object
square                 object
livingRoom             object
drawingRoom            object
kitchen                object
bathRoom               object
floor                  object
buildingType           object
constructionTime       object
renovationCondition    object
buildingStructure      object
ladderRatio            object
elevator               object
fiveYearsProperty      object
subway                 object
district               object
communityAverage       object
dtype: object
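Since every column was loaded as text, each numeric column has to be converted before it can be used in arithmetic. The following is a minimal sketch of that idea on a tiny in-memory CSV. The three rows are copied from the preview above, while `csv_text` and `sample` are names made up for this illustration.

```python
import io
import pandas as pd

# Tiny stand-in for the real file (which is read from `path` with encoding='gbk').
csv_text = "price,square,floor\n31680,131.0,高 26\n43436,132.38,高 22\n52021,198.0,中 4\n"

# With dtype=str every cell is kept as text, even the ones that look numeric.
sample = pd.read_csv(io.StringIO(csv_text), dtype=str)
all_text = all(isinstance(v, str) for v in sample['price'])

# Convert only the columns that are known to be numeric.
sample['price'] = pd.to_numeric(sample['price'])
sample['square'] = pd.to_numeric(sample['square'])
print(sample.dtypes)
```

Columns such as 'floor', which mix Chinese characters and numbers, stay as text and have to be handled separately.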

The dataset has one problem: the values in the 'totalPrice' column are smaller than those in the 'price' column,
which holds the price per square. The sample rows suggest that 'totalPrice' is expressed in a larger unit and therefore
does not show all the digits (in the first row, 31680 × 131.0 ≈ 4.15 million, while 'totalPrice' is 415.0), so we are
going to drop that column and create a new price column by multiplying the 'price' column by the 'square' column.
We are going to follow these steps:
* Drop the 'totalPrice' column.
* Convert the 'price' and 'square' columns to float (right now they are objects).
* Create the 'Price (M)' column by multiplying the 'price' and 'square' columns, dividing by 1,000,000 and rounding to 3 decimals, so it is easier to read.
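The three steps can be sketched on a toy DataFrame built from the first two rows of the preview above (`toy` and `ratio` are names introduced only for this illustration). Before dropping 'totalPrice', the sketch also compares it with price × square, which in these two rows differ by a factor of roughly 10,000.

```python
import pandas as pd

# First two rows of the preview, kept as strings like in the loaded dataset.
toy = pd.DataFrame({
    'totalPrice': ['415.0', '575.0'],
    'price': ['31680', '43436'],
    'square': ['131.0', '132.38'],
})

# Sanity check: how far apart are price * square and totalPrice?
ratio = (toy['price'].astype(float) * toy['square'].astype(float)) / toy['totalPrice'].astype(float)

# Step 1: drop 'totalPrice'.
toy = toy.drop(columns='totalPrice')
# Step 2: convert 'price' and 'square' to float.
toy[['price', 'square']] = toy[['price', 'square']].astype(float)
# Step 3: total price in millions, rounded to 3 decimals.
toy['Price (M)'] = (toy['price'] * toy['square'] / 1_000_000).round(3)
print(toy)
```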

In [7]:
# Step 1
data = data.drop('totalPrice', axis = 1)
data.head()

Unnamed: 0,url,id,Lng,Lat,Cid,tradeTime,DOM,followers,price,square,livingRoom,drawingRoom,kitchen,bathRoom,floor,buildingType,constructionTime,renovationCondition,buildingStructure,ladderRatio,elevator,fiveYearsProperty,subway,district,communityAverage
0,https://bj.lianjia.com/chengjiao/101084782030....,101084782030,116.475489,40.01952,1111027376244,2016-08-09,1464,106,31680,131.0,2,1,1,1,高 26,1,2005,3,6,0.217,1.0,0.0,1.0,7,56021
1,https://bj.lianjia.com/chengjiao/101086012217....,101086012217,116.453917,39.881534,1111027381879,2016-07-28,903,126,43436,132.38,2,2,1,2,高 22,1,2004,4,6,0.667,1.0,1.0,0.0,7,71539
2,https://bj.lianjia.com/chengjiao/101086041636....,101086041636,116.561978,39.877145,1111040862969,2016-12-11,1271,48,52021,198.0,3,2,1,3,中 4,4,2005,3,6,0.5,1.0,0.0,0.0,7,48160
3,https://bj.lianjia.com/chengjiao/101086406841....,101086406841,116.43801,40.076114,1111043185817,2016-09-30,965,138,22202,134.0,3,1,1,1,底 21,1,2008,1,6,0.273,1.0,0.0,0.0,6,51238
4,https://bj.lianjia.com/chengjiao/101086920653....,101086920653,116.428392,39.886229,1111027381174,2016-08-28,927,286,48396,81.0,2,1,1,1,中 6,4,1960,2,2,0.333,0.0,1.0,1.0,1,62588


In [8]:
data.shape

(318851, 25)

In [9]:
# Step 2
data['price'] = data['price'].astype(float)
data['square'] = data['square'].astype(float)
data.dtypes

url                     object
id                      object
Lng                     object
Lat                     object
Cid                     object
tradeTime               object
DOM                     object
followers               object
price                  float64
square                 float64
livingRoom              object
drawingRoom             object
kitchen                 object
bathRoom                object
floor                   object
buildingType            object
constructionTime        object
renovationCondition     object
buildingStructure       object
ladderRatio             object
elevator                object
fiveYearsProperty       object
subway                  object
district                object
communityAverage        object
dtype: object

In [10]:
# Step 3
data['Price (M)'] = round(data['price'] * data['square'] / 1000000, 3)
data.head()

Unnamed: 0,url,id,Lng,Lat,Cid,tradeTime,DOM,followers,price,square,livingRoom,drawingRoom,kitchen,bathRoom,floor,buildingType,constructionTime,renovationCondition,buildingStructure,ladderRatio,elevator,fiveYearsProperty,subway,district,communityAverage,Price (M)
0,https://bj.lianjia.com/chengjiao/101084782030....,101084782030,116.475489,40.01952,1111027376244,2016-08-09,1464,106,31680.0,131.0,2,1,1,1,高 26,1,2005,3,6,0.217,1.0,0.0,1.0,7,56021,4.15
1,https://bj.lianjia.com/chengjiao/101086012217....,101086012217,116.453917,39.881534,1111027381879,2016-07-28,903,126,43436.0,132.38,2,2,1,2,高 22,1,2004,4,6,0.667,1.0,1.0,0.0,7,71539,5.75
2,https://bj.lianjia.com/chengjiao/101086041636....,101086041636,116.561978,39.877145,1111040862969,2016-12-11,1271,48,52021.0,198.0,3,2,1,3,中 4,4,2005,3,6,0.5,1.0,0.0,0.0,7,48160,10.3
3,https://bj.lianjia.com/chengjiao/101086406841....,101086406841,116.43801,40.076114,1111043185817,2016-09-30,965,138,22202.0,134.0,3,1,1,1,底 21,1,2008,1,6,0.273,1.0,0.0,0.0,6,51238,2.975
4,https://bj.lianjia.com/chengjiao/101086920653....,101086920653,116.428392,39.886229,1111027381174,2016-08-28,927,286,48396.0,81.0,2,1,1,1,中 6,4,1960,2,2,0.333,0.0,1.0,1.0,1,62588,3.92


### Create a subset with the desired columns
Not all the columns are useful, so we are going to keep only the columns that add some value.

In [11]:
columns = data.columns.values.tolist()
columns

['url',
 'id',
 'Lng',
 'Lat',
 'Cid',
 'tradeTime',
 'DOM',
 'followers',
 'price',
 'square',
 'livingRoom',
 'drawingRoom',
 'kitchen',
 'bathRoom',
 'floor',
 'buildingType',
 'constructionTime',
 'renovationCondition',
 'buildingStructure',
 'ladderRatio',
 'elevator',
 'fiveYearsProperty',
 'subway',
 'district',
 'communityAverage',
 'Price (M)']

In [12]:
# Create the desired subset
columns_set = set(columns)
columns_subset = {'url', 'Cid', 'tradeTime', 'id', 'communityAverage'}
desired_columns = columns_set - columns_subset
desired_columns = list(desired_columns)
desired_columns

['followers',
 'bathRoom',
 'renovationCondition',
 'floor',
 'Price (M)',
 'Lng',
 'buildingType',
 'square',
 'elevator',
 'Lat',
 'fiveYearsProperty',
 'drawingRoom',
 'constructionTime',
 'subway',
 'price',
 'DOM',
 'ladderRatio',
 'livingRoom',
 'kitchen',
 'district',
 'buildingStructure']
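Because a set has no defined order, the set difference returns the columns in an arbitrary order, as the output above shows. If the original column order matters, a list comprehension is one alternative. This is a sketch on a shortened column list, and `excluded` and `ordered_columns` are hypothetical names.

```python
# Shortened column list, in the order the columns appear in the dataset.
columns = ['url', 'id', 'Lng', 'Lat', 'Cid', 'tradeTime', 'DOM', 'price']
# Columns to leave out (the same names used in the set difference).
excluded = {'url', 'Cid', 'tradeTime', 'id', 'communityAverage'}

# Keep every column that is not excluded, preserving the original order.
ordered_columns = [c for c in columns if c not in excluded]
print(ordered_columns)
```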

In [13]:
# Create a new DataFrame using the desired columns
desired_data = data[desired_columns]
desired_data.head()

Unnamed: 0,followers,bathRoom,renovationCondition,floor,Price (M),Lng,buildingType,square,elevator,Lat,fiveYearsProperty,drawingRoom,constructionTime,subway,price,DOM,ladderRatio,livingRoom,kitchen,district,buildingStructure
0,106,1,3,高 26,4.15,116.475489,1,131.0,1.0,40.01952,0.0,1,2005,1.0,31680.0,1464,0.217,2,1,7,6
1,126,2,4,高 22,5.75,116.453917,1,132.38,1.0,39.881534,1.0,2,2004,0.0,43436.0,903,0.667,2,1,7,6
2,48,3,3,中 4,10.3,116.561978,4,198.0,1.0,39.877145,0.0,2,2005,0.0,52021.0,1271,0.5,3,1,7,6
3,138,1,1,底 21,2.975,116.43801,1,134.0,1.0,40.076114,0.0,1,2008,0.0,22202.0,965,0.273,3,1,6,6
4,286,1,2,中 6,3.92,116.428392,4,81.0,0.0,39.886229,1.0,1,1960,1.0,48396.0,927,0.333,2,1,1,2


### Renaming the columns

In [14]:
desired_data = desired_data.rename(columns = {'followers': 'Followers', 'ladderRatio': 'Ladder Ratio',
                                  'fiveYearsProperty': 'Five Years Property', 'square': 'Square',  
                                  'livingRoom': 'Living Room', 'bathRoom': 'Bathroom', 'kitchen': 'Kitchen',
                                   'buildingType': 'Building Type', 'buildingStructure': 'Building Structure',
                                  'elevator': 'Elevator', 'constructionTime': 'Construction Time', 
                                  'district': 'District', 'renovationCondition': 'Renovation Condition',
                                  'subway': 'Subway', 'drawingRoom': 'Drawing Room', 'price': 'Price per square',
                                             'floor': 'Floor'})

In [15]:
desired_data.head()

Unnamed: 0,Followers,Bathroom,Renovation Condition,Floor,Price (M),Lng,Building Type,Square,Elevator,Lat,Five Years Property,Drawing Room,Construction Time,Subway,Price per square,DOM,Ladder Ratio,Living Room,Kitchen,District,Building Structure
0,106,1,3,高 26,4.15,116.475489,1,131.0,1.0,40.01952,0.0,1,2005,1.0,31680.0,1464,0.217,2,1,7,6
1,126,2,4,高 22,5.75,116.453917,1,132.38,1.0,39.881534,1.0,2,2004,0.0,43436.0,903,0.667,2,1,7,6
2,48,3,3,中 4,10.3,116.561978,4,198.0,1.0,39.877145,0.0,2,2005,0.0,52021.0,1271,0.5,3,1,7,6
3,138,1,1,底 21,2.975,116.43801,1,134.0,1.0,40.076114,0.0,1,2008,0.0,22202.0,965,0.273,3,1,6,6
4,286,1,2,中 6,3.92,116.428392,4,81.0,0.0,39.886229,1.0,1,1960,1.0,48396.0,927,0.333,2,1,1,2


In [16]:
columns = desired_data.columns.values.tolist()
columns

['Followers',
 'Bathroom',
 'Renovation Condition',
 'Floor',
 'Price (M)',
 'Lng',
 'Building Type',
 'Square',
 'Elevator',
 'Lat',
 'Five Years Property',
 'Drawing Room',
 'Construction Time',
 'Subway',
 'Price per square',
 'DOM',
 'Ladder Ratio',
 'Living Room',
 'Kitchen',
 'District',
 'Building Structure']

### Dealing with annoying values
Let's create a function that tries to convert the values of each column to int or float and, if the conversion fails, stores the offending values in a dictionary.

In [17]:
# Finds, for each column, the unique values that cannot be converted to int or float.
def findValues(columns_list):
    data_copy = desired_data.copy() # Create a copy of the dataset.
    values_dict = {}
    for col in columns_list:
        try:
            data_copy[col] = data_copy[col].astype(int)
        except ValueError: # If it is a float value
            try:
                data_copy[col] = data_copy[col].astype(float)
            except ValueError: # If there is a non number value.
                col_list = []
                for i in data_copy[col].unique().tolist():
                    try:
                        int(i)
                    except ValueError:
                        try:
                            float(i)
                        except ValueError:
                            col_list.append(i)
                values_dict[col] = col_list
    return values_dict
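The same search can be written without nested try/except blocks by letting pandas attempt the conversion. The sketch below is an alternative, not the function used in this notebook. `find_non_numeric` and the toy values are assumptions made for illustration, and `pd.to_numeric` may differ from `int()`/`float()` on a few edge cases.

```python
import pandas as pd

def find_non_numeric(df):
    """Return {column: [unique values that cannot be parsed as numbers]}."""
    result = {}
    for col in df.columns:
        # Values that cannot be parsed become NaN instead of raising an error.
        as_number = pd.to_numeric(df[col], errors='coerce')
        # Keep the values that failed to parse but were not missing to begin with.
        bad = df.loc[as_number.isna() & df[col].notna(), col]
        if not bad.empty:
            result[col] = bad.unique().tolist()
    return result

toy = pd.DataFrame({
    'Bathroom': ['1', '2', '未知'],
    'Square': ['131.0', '132.38', '198.0'],
    'Floor': ['高 26', '中 4', '高 26'],
})
found = find_non_numeric(toy)
print(found)
```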

In [18]:
# Create a DataFrame using values_dict
series_list = []
for col, values in findValues(columns).items():
    series_list.append(pd.Series(values, name = col))

# Concatenate the series in a DataFrame
df = pd.concat(series_list, axis=1)
df

Unnamed: 0,Bathroom,Floor,Drawing Room,Construction Time,Living Room
0,未知,高 26,中 14,未知,#NAME?
1,,高 22,中 15,,
2,,中 4,中 16,,
3,,底 21,中 6,,
4,,中 6,高 14,,
...,...,...,...,...,...
198,,未知 29,,,
199,,未知 24,,,
200,,未知 30,,,
201,,未知 31,,,


In [19]:
# Look for the frequency of certain values
def frequencyValues(values_dict):
    freq_dict = {}
    for key, value in values_dict.items():
        for l in value:
            # Store the count and the relative frequency (%) under a (column, value, label) key.
            freq_dict[(key, l, 'Relative frequency %')] = ((desired_data[key] == l).sum(), ((desired_data[key] == l).sum() / desired_data.shape[0]) * 100)
    return freq_dict
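For a single column, `value_counts` gives the same counts and relative frequencies directly. A minimal sketch with made-up values (`floor` and `summary` are hypothetical names):

```python
import pandas as pd

# Made-up sample of a column that mixes text and numeric strings.
floor = pd.Series(['中 6', '中 6', '高 6', '1'], name='Floor')

counts = floor.value_counts()                       # absolute frequency
percent = floor.value_counts(normalize=True) * 100  # relative frequency in %

# Both Series are indexed by the distinct values, so they align automatically.
summary = pd.DataFrame({'count': counts, 'percent': percent})
print(summary)
```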

In [20]:
frequencyValues(findValues(columns))

{('Bathroom', '未知', 'Relative frequency %'): (2, 0.000627252227529473),
 ('Floor', '高 26', 'Relative frequency %'): (1820, 0.5707995270518205),
 ('Floor', '高 22', 'Relative frequency %'): (2540, 0.7966103289624308),
 ('Floor', '中 4', 'Relative frequency %'): (1598, 0.5011745297960489),
 ('Floor', '底 21', 'Relative frequency %'): (289, 0.09063794687800886),
 ('Floor', '中 6', 'Relative frequency %'): (34788, 10.910425245647653),
 ('Floor', '中 8', 'Relative frequency %'): (1202, 0.3769785887452133),
 ('Floor', '高 6', 'Relative frequency %'): (20904, 6.556040282138052),
 ('Floor', '高 10', 'Relative frequency %'): (1193, 0.37415595372133065),
 ('Floor', '中 23', 'Relative frequency %'): (1484, 0.465421152826869),
 ('Floor', '底 11', 'Relative frequency %'): (676, 0.21201125290496187),
 ('Floor', '底 3', 'Relative frequency %'): (639, 0.20040708669566665),
 ('Floor', '高 24', 'Relative frequency %'): (4014, 1.2588952206516524),
 ('Floor', '低 23', 'Relative frequency %'): (839, 0.2631323094486139

From the above dictionary we can see that the 'Floor' column contains too many combined values (Chinese characters + number), so we are going to drop that column. The other columns have very few rows with non-numeric values, so we can drop those rows. The only exception is 'Construction Time', which has a single non-numeric value, '未知' (unknown). About 6% of the rows have this value, so instead of dropping them we are going to replace it with a random value chosen among the known values of the column.

In [21]:
# Drop 'Floor' column.
desired_data = desired_data.drop('Floor', axis = 1)
desired_data.head()

Unnamed: 0,Followers,Bathroom,Renovation Condition,Price (M),Lng,Building Type,Square,Elevator,Lat,Five Years Property,Drawing Room,Construction Time,Subway,Price per square,DOM,Ladder Ratio,Living Room,Kitchen,District,Building Structure
0,106,1,3,4.15,116.475489,1,131.0,1.0,40.01952,0.0,1,2005,1.0,31680.0,1464,0.217,2,1,7,6
1,126,2,4,5.75,116.453917,1,132.38,1.0,39.881534,1.0,2,2004,0.0,43436.0,903,0.667,2,1,7,6
2,48,3,3,10.3,116.561978,4,198.0,1.0,39.877145,0.0,2,2005,0.0,52021.0,1271,0.5,3,1,7,6
3,138,1,1,2.975,116.43801,1,134.0,1.0,40.076114,0.0,1,2008,0.0,22202.0,965,0.273,3,1,6,6
4,286,1,2,3.92,116.428392,4,81.0,0.0,39.886229,1.0,1,1960,1.0,48396.0,927,0.333,2,1,1,2


In [22]:
# Drop the rows.
columns = desired_data.columns.values.tolist()
dict_keys_object = frequencyValues(findValues(columns)).keys() # The keys are (column, value, label) tuples.
keys_list = list(dict_keys_object) # Turn the keys view into a list to iterate over it.
for i in keys_list: # i is a tuple (column, value, 'Relative frequency %')
    if i[0] != 'Construction Time':
        # Drop the rows of the DataFrame that contain the unwanted value.
        desired_data = desired_data[~desired_data[i[0]].eq(i[1])]

In [23]:
frequencyValues(findValues(columns))

{('Construction Time', '未知', 'Relative frequency %'): (19283,
  6.04825935719013)}

Replace the '未知' values with random values of the 'Construction Time' column.

In [24]:
# Function that replaces a given value with random values drawn from the same column.
def randomReplace(data, column, value):
    list_values = data[column].tolist() # Get all the values of the column.
    list_values = [x for x in list_values if x != value] # Update the list without the unwanted value.
    np.random.shuffle(list_values) # Shuffle the list.
    mask = (data[column] == value) # Create a boolean mask for the rows that contain the unwanted value.
    data.loc[mask, column] = np.random.choice(list_values, size = mask.sum()) # Replace the unwanted value for random values from the list.
    return data.head()
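Random replacement gives a different result on every run unless the random generator is seeded. The sketch below is a variant that returns a new Series and takes a seed. `random_replace`, its `seed` parameter and the toy values are assumptions for illustration, not the function used above.

```python
import numpy as np
import pandas as pd

def random_replace(series, value, seed=0):
    """Return a copy of `series` where `value` is replaced by random known values."""
    rng = np.random.default_rng(seed)   # seeded generator, so results are repeatable
    mask = series.eq(value)             # rows that hold the unwanted value
    known = series[~mask].to_numpy()    # pool of values to draw from
    out = series.copy()
    out[mask] = rng.choice(known, size=int(mask.sum()))
    return out

years = pd.Series(['2005', '未知', '2004', '未知', '1960'])
filled = random_replace(years, '未知')
print(filled.tolist())
```

Drawing from the known values keeps the distribution of the column roughly unchanged, which is the reason for choosing random values over a single constant.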

In [25]:
randomReplace(desired_data, 'Construction Time', '未知')

Unnamed: 0,Followers,Bathroom,Renovation Condition,Price (M),Lng,Building Type,Square,Elevator,Lat,Five Years Property,Drawing Room,Construction Time,Subway,Price per square,DOM,Ladder Ratio,Living Room,Kitchen,District,Building Structure
0,106,1,3,4.15,116.475489,1,131.0,1.0,40.01952,0.0,1,2005,1.0,31680.0,1464,0.217,2,1,7,6
1,126,2,4,5.75,116.453917,1,132.38,1.0,39.881534,1.0,2,2004,0.0,43436.0,903,0.667,2,1,7,6
2,48,3,3,10.3,116.561978,4,198.0,1.0,39.877145,0.0,2,2005,0.0,52021.0,1271,0.5,3,1,7,6
3,138,1,1,2.975,116.43801,1,134.0,1.0,40.076114,0.0,1,2008,0.0,22202.0,965,0.273,3,1,6,6
4,286,1,2,3.92,116.428392,4,81.0,0.0,39.886229,1.0,1,1960,1.0,48396.0,927,0.333,2,1,1,2


In [26]:
frequencyValues(findValues(columns)) # If the dictionary is empty, all the unwanted values were replaced successfully.

{}

### NaN values
Let's see how many rows with NaN values there are in the dataset.

In [27]:
data_copy = desired_data.copy() # Create a copy of the dataset.
data_copy = data_copy.dropna() # Drop NaN rows.
nan_rows = desired_data.shape[0] - data_copy.shape[0]
nan_rows

159284

In [28]:
(nan_rows / desired_data.shape[0]) * 100 # Percentage of rows with NaN values.

49.96063597213466

The above analysis shows that almost half of the rows have at least one NaN value, so deleting them would make us lose half of the data. Instead, we are going to replace the NaN values with a summary statistic of the corresponding column.
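The share of rows with at least one NaN, and the share of NaNs per column, can also be computed without making a copy of the data. A minimal sketch on made-up values (`toy` and the other names are hypothetical):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'DOM': [1464.0, np.nan, np.nan, 965.0],
    'Building Type': [1.0, 1.0, np.nan, 4.0],
})

# True for every row that has at least one NaN.
rows_with_nan = toy.isna().any(axis=1)
n_rows_with_nan = int(rows_with_nan.sum())
pct_rows_with_nan = rows_with_nan.mean() * 100

# Percentage of NaNs in each column; this shows which columns cause the loss.
pct_per_column = toy.isna().mean() * 100
print(n_rows_with_nan, pct_rows_with_nan)
print(pct_per_column)
```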

In [29]:
nans = desired_data.isna().sum()
nans

Followers                    0
Bathroom                     0
Renovation Condition         0
Price (M)                    0
Lng                          0
Building Type             2021
Square                       0
Elevator                     0
Lat                          0
Five Years Property          0
Drawing Room                 0
Construction Time            0
Subway                       0
Price per square             0
DOM                     157970
Ladder Ratio                 0
Living Room                  0
Kitchen                      0
District                     0
Building Structure           0
dtype: int64

In [30]:
data_copy = desired_data.copy()
# Replace NaNs with the 'Building Type' mode.
# Assign the result back instead of using inplace = True on a single column, which may act on a temporary copy.
mode_value = data_copy['Building Type'].mode().iloc[0]
data_copy['Building Type'] = data_copy['Building Type'].fillna(value = mode_value)

In [31]:
nans = data_copy.isna().sum()
nans

Followers                    0
Bathroom                     0
Renovation Condition         0
Price (M)                    0
Lng                          0
Building Type                0
Square                       0
Elevator                     0
Lat                          0
Five Years Property          0
Drawing Room                 0
Construction Time            0
Subway                       0
Price per square             0
DOM                     157970
Ladder Ratio                 0
Living Room                  0
Kitchen                      0
District                     0
Building Structure           0
dtype: int64

In [32]:
# Replace NaNs with the 'DOM' mean.
# Convert the values to numbers; errors = 'coerce' turns the values that cannot be converted into NaN.
data_copy['DOM'] = pd.to_numeric(data_copy['DOM'], errors = 'coerce')
mean_value = int(round(data_copy['DOM'].mean()))
data_copy['DOM'] = data_copy['DOM'].fillna(value = mean_value)
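Both replacements can also be expressed in a single `fillna` call by passing one fill value per column. The sketch below uses made-up values, and `toy`, `fill_values` and `filled` are hypothetical names.

```python
import pandas as pd

toy = pd.DataFrame({
    'Building Type': ['1', '1', None, '4'],
    'DOM': ['10', None, '30', None],
})

# 'DOM' must be numeric before its mean can be computed.
toy['DOM'] = pd.to_numeric(toy['DOM'], errors='coerce')

# One fill value per column: the mode for the category, the rounded mean for the count.
fill_values = {
    'Building Type': toy['Building Type'].mode().iloc[0],
    'DOM': int(round(toy['DOM'].mean())),
}
filled = toy.fillna(value=fill_values)
print(filled)
```

The mode suits 'Building Type' because it is a category code, while the mean is only meaningful for a numeric column such as 'DOM'.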

In [33]:
nans = data_copy.isna().sum()
nans

Followers               0
Bathroom                0
Renovation Condition    0
Price (M)               0
Lng                     0
Building Type           0
Square                  0
Elevator                0
Lat                     0
Five Years Property     0
Drawing Room            0
Construction Time       0
Subway                  0
Price per square        0
DOM                     0
Ladder Ratio            0
Living Room             0
Kitchen                 0
District                0
Building Structure      0
dtype: int64

# Clustering
We are going to group the data into different clusters using the location and the price per square, but first we are going to plot a 3D graph.
## 3D Graph

In [42]:
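# 'Lng' and 'Lat' were loaded as str; make sure they are numeric before taking min/max and plotting.
data_copy[['Lng', 'Lat']] = data_copy[['Lng', 'Lat']].apply(pd.to_numeric)
min_lng = data_copy['Lng'].min()
max_lng = data_copy['Lng'].max()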

In [43]:
min_lat = min(data_copy['Lat'])
max_lat = max(data_copy['Lat'])

In [44]:
min_ps = min(data_copy['Price per square'])
max_ps = max(data_copy['Price per square'])
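These limits are only meaningful if the columns hold numbers. The file was loaded with dtype = str, so unless a column has been converted, `min` and `max` compare the values as text, character by character. A short sketch of the difference, using made-up values (`as_text` and `as_number` are hypothetical names):

```python
import pandas as pd

# Made-up values stored as strings, like the columns right after loading.
as_text = pd.Series(['9.5', '116.4', '40.0'])

# Text comparison: '1' sorts before '4' and '9', so '116.4' is the "minimum".
text_min = min(as_text)

# Numeric comparison after conversion.
as_number = pd.to_numeric(as_text)
number_min = as_number.min()
number_max = as_number.max()
print(text_min, number_min, number_max)
```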

In [48]:
# Create the 3D figure.
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(data_copy['Lng'], data_copy['Lat'], data_copy['Price per square'], c='b', marker='o')
ax.set_xlim(min_lng, max_lng) 
ax.set_ylim(min_lat, max_lat) 
ax.set_zlim(min_ps, max_ps) 
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
ax.set_zlabel('Price per square')
plt.show()
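The clustering step itself is not shown in this section. As a rough sketch of how the grouping by location and price per square could be done, the code below runs a plain k-means on a handful of made-up points. The choice of k-means, the function `kmeans`, the number of clusters and the sample points are all assumptions, not the method used in this notebook.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if it has none).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Made-up (Lng, Lat, price per square) points forming two separated groups.
raw = np.array([
    [116.40, 39.90, 80000.0],
    [116.41, 39.91, 82000.0],
    [116.42, 39.90, 81000.0],
    [116.60, 40.10, 30000.0],
    [116.61, 40.11, 31000.0],
    [116.62, 40.10, 29000.0],
])
# Standardise each feature so that the price does not dominate the distance.
scaled = (raw - raw.mean(axis=0)) / raw.std(axis=0)
labels, centroids = kmeans(scaled, k=2)
print(labels)
```

Standardising matters here because the price is several orders of magnitude larger than the coordinates. Without it, the clusters would be driven almost entirely by price.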