### Importing the necessary libraries

In [70]:
import pandas as pd
import numpy as np

import datetime

### Reading the data file

In [79]:
rent_df = pd.read_csv('../Data/canada_rent.csv')
rent_df

Unnamed: 0,rentfaster_id,city,province,address,latitude,longitude,lease_term,type,price,beds,baths,sq_feet,link,furnishing,availability_date,smoking,cats,dogs
0,468622,Airdrie,Alberta,69 Gateway Dr NE,51.305962,-114.012515,Long Term,Townhouse,2495.0,2 Beds,2.5,1403,/ab/airdrie/rentals/townhouse/2-bedrooms/pet-f...,Unfurnished,Immediate,Non-Smoking,True,True
1,468622,Airdrie,Alberta,69 Gateway Dr NE,51.305962,-114.012515,Long Term,Townhouse,2695.0,3 Beds,2.5,1496,/ab/airdrie/rentals/townhouse/2-bedrooms/pet-f...,Unfurnished,Immediate,Non-Smoking,True,True
2,468622,Airdrie,Alberta,69 Gateway Dr NE,51.305962,-114.012515,Long Term,Townhouse,2295.0,2 Beds,2.5,1180,/ab/airdrie/rentals/townhouse/2-bedrooms/pet-f...,Unfurnished,Immediate,Non-Smoking,True,True
3,468622,Airdrie,Alberta,69 Gateway Dr NE,51.305962,-114.012515,Long Term,Townhouse,2095.0,2 Beds,2.5,1403,/ab/airdrie/rentals/townhouse/2-bedrooms/pet-f...,Unfurnished,November 18,Non-Smoking,True,True
4,468622,Airdrie,Alberta,69 Gateway Dr NE,51.305962,-114.012515,Long Term,Townhouse,2495.0,2 Beds,2.5,1403,/ab/airdrie/rentals/townhouse/2-bedrooms/pet-f...,Unfurnished,Immediate,Non-Smoking,True,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
25766,334285,Yorkton,Saskatchewan,18 Allanbrooke Dr,51.203024,-102.484782,Long Term,Apartment,920.0,Studio,1,400,/sk/yorkton/rentals/apartment/3-bedrooms/pet-f...,Unfurnished,Immediate,Non-Smoking,True,True
25767,334285,Yorkton,Saskatchewan,18 Allanbrooke Dr,51.203024,-102.484782,Long Term,Apartment,1450.0,3 Beds,1,810,/sk/yorkton/rentals/apartment/3-bedrooms/pet-f...,Unfurnished,Immediate,Non-Smoking,True,True
25768,334284,Yorkton,Saskatchewan,363 Northview Rd,51.202667,-102.485765,Long Term,Apartment,1125.0,1 Bed,1,520,/sk/yorkton/rentals/apartment/1-bedroom/pet-fr...,Unfurnished,Immediate,Non-Smoking,True,True
25769,334284,Yorkton,Saskatchewan,363 Northview Rd,51.202667,-102.485765,Long Term,Apartment,920.0,Studio,1,400,/sk/yorkton/rentals/apartment/1-bedroom/pet-fr...,Unfurnished,Immediate,Non-Smoking,True,True


### Cleaning up the dataframe

I will remove columns that add no value to the regression analysis, such as the link and rentfaster_id columns.

I will also remove the latitude and longitude columns and work with the address, city and province instead.

I will strip the string 'Beds' (or 'Bed') from the beds column and replace 'Studio' and 'none Beds' with 0.

I will replace the string 'Immediate' in the availability_date column with today's date, so that the column can be converted to a datetime type.

I will drop all rows where the availability_date column contains the string 'No Vacancy'.

I will replace the strings 'Negotiable' and 'Call for Availability' in the availability_date column with today's date.

In [80]:
# Removing unnecessary columns
rent_df.drop(columns=['link', 'rentfaster_id','latitude','longitude'], inplace=True)

**Filtering the beds column**

In [81]:
# Looking at the possible values of the beds column
print('Before =>',rent_df['beds'].unique())

# Removing 'Beds' / 'Bed' from the beds column
# (the result is assigned back, because inplace=True on a selected column stops working in pandas 3.0)
rent_df['beds'] = rent_df['beds'].replace(r'\s*Beds?\s*','', regex=True)

# Replacing Studio and none with 0
rent_df['beds'] = rent_df['beds'].replace(r'\b(Studio|none)\b','0', regex=True)

# Verifying that the string replacement was done correctly
print('After =>',rent_df['beds'].unique())

Before => ['2 Beds' '3 Beds' 'Studio' '1 Bed' '5 Beds' '4 Beds' nan '6 Beds'
 'none Beds' '8 Beds' '7 Beds' '9 Beds']
After => ['2' '3' '0' '1' '5' '4' nan '6' '8' '7' '9']


rent_df
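
After the text is stripped, the beds values are still strings ('2', '0', nan), which a regression cannot use directly. Below is a minimal sketch of one way to convert them. It runs on a small made-up sample, not the real file, and the nullable `Int64` dtype is an assumption of this sketch, not something the notebook does.

```python
import pandas as pd

# Hypothetical sample mimicking the raw values seen in the 'Before' output
sample = pd.DataFrame({'beds': ['2 Beds', 'Studio', '1 Bed', None, 'none Beds']})

# The same two regex replacements as in the cell above
cleaned = sample['beds'].replace(r'\s*Beds?\s*', '', regex=True)
cleaned = cleaned.replace(r'\b(Studio|none)\b', '0', regex=True)

# Convert to a nullable integer dtype so missing values stay missing
sample['beds'] = pd.to_numeric(cleaned, errors='coerce').astype('Int64')
print(sample['beds'].tolist())
```

Keeping the missing entries as `<NA>` leaves the choice between dropping and imputing them to a later step.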

**Filtering the availability_date column**

In [None]:
# Looking at the possible strings in the availability_date column
print(rent_df['availability_date'].unique())

# Counting the rows where availability_date is 'Negotiable' or 'Call for Availability'
print(rent_df[rent_df['availability_date'].isin(['Negotiable', 'Call for Availability'])].count())

# Counting the rows where availability_date is 'No Vacancy'
print(rent_df[rent_df['availability_date'].isin(['No Vacancy'])].count())
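
Filtering and then calling `count()` prints a count for every column. A possible alternative is `value_counts()`, which returns one tally per distinct string. This is a small sketch on made-up values; the sample series is hypothetical.

```python
import pandas as pd

# Hypothetical sample of availability_date values
availability = pd.Series(
    ['Immediate', 'November 18', 'Negotiable', 'No Vacancy',
     'Immediate', 'Call for Availability', 'No Vacancy'],
    name='availability_date',
)

# Tally only the placeholder strings of interest
placeholders = ['Negotiable', 'Call for Availability', 'No Vacancy']
counts = availability[availability.isin(placeholders)].value_counts()
print(counts)
```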

In [None]:
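# Removing ads (dropping the entire row) where availability_date is 'No Vacancy'
# .copy() makes the result an independent dataframe, so later column assignments are unambiguous
rent_df = rent_df[~rent_df['availability_date'].isin(['No Vacancy'])].copy()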

In [None]:
# Retrieve today's date in the same 'Month day' format used by the column
today = datetime.date.today().strftime('%B %d')

# Replace the placeholder strings with today's date
# (the result is assigned back, because inplace=True on a selected column stops working in pandas 3.0)
rent_df['availability_date'] = rent_df['availability_date'].replace(
    {'Immediate': today, 'Negotiable': today, 'Call for Availability': today}
)


In [87]:
pd.to_datetime(rent_df['availability_date'], format='%B %d')

ValueError: time data "Negotiable" doesn't match format "%B %d", at position 13. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.
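
The error above means at least one value is still not in 'Month day' form when `pd.to_datetime` runs. One way to find such values, sketched here on made-up data, is to parse with `errors='coerce'` and inspect the rows that come back as `NaT`. Note also that `%B %d` carries no year, so parsed dates default to the year 1900. The `assumed_year` below is a hypothetical value used only for illustration.

```python
import pandas as pd

# Hypothetical sample: two valid 'Month day' strings and one leftover placeholder
dates = pd.Series(['February 21', 'Negotiable', 'November 18'])

# Unparseable strings become NaT instead of raising ValueError
parsed = pd.to_datetime(dates, format='%B %d', errors='coerce')

# List the distinct strings that failed to parse
leftovers = dates[parsed.isna()].unique()
print(leftovers)

# Append an assumed year so the result is not dated 1900
assumed_year = 2024  # hypothetical; pick the year the listings refer to
with_year = pd.to_datetime(
    dates + f' {assumed_year}', format='%B %d %Y', errors='coerce'
)
print(with_year)
```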

In [96]:
rent_df['availability_date'].unique()

rent_df[~rent_df['availability_date'].isin(['No Vacancy'])]

Unnamed: 0,city,province,address,lease_term,type,price,beds,baths,sq_feet,furnishing,availability_date,smoking,cats,dogs
0,Airdrie,Alberta,69 Gateway Dr NE,Long Term,Townhouse,2495.0,2,2.5,1403,Unfurnished,February 21,Non-Smoking,True,True
1,Airdrie,Alberta,69 Gateway Dr NE,Long Term,Townhouse,2695.0,3,2.5,1496,Unfurnished,February 21,Non-Smoking,True,True
2,Airdrie,Alberta,69 Gateway Dr NE,Long Term,Townhouse,2295.0,2,2.5,1180,Unfurnished,February 21,Non-Smoking,True,True
3,Airdrie,Alberta,69 Gateway Dr NE,Long Term,Townhouse,2095.0,2,2.5,1403,Unfurnished,November 18,Non-Smoking,True,True
4,Airdrie,Alberta,69 Gateway Dr NE,Long Term,Townhouse,2495.0,2,2.5,1403,Unfurnished,February 21,Non-Smoking,True,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
25766,Yorkton,Saskatchewan,18 Allanbrooke Dr,Long Term,Apartment,920.0,0,1,400,Unfurnished,February 21,Non-Smoking,True,True
25767,Yorkton,Saskatchewan,18 Allanbrooke Dr,Long Term,Apartment,1450.0,3,1,810,Unfurnished,February 21,Non-Smoking,True,True
25768,Yorkton,Saskatchewan,363 Northview Rd,Long Term,Apartment,1125.0,1,1,520,Unfurnished,February 21,Non-Smoking,True,True
25769,Yorkton,Saskatchewan,363 Northview Rd,Long Term,Apartment,920.0,0,1,400,Unfurnished,February 21,Non-Smoking,True,True
