In [1]:
import pandas as pd
import numpy as np
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_columns', None)

In [2]:
listings = pd.read_csv('listings.csv')

# 1. Column analysis

In [3]:
listings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14667 entries, 0 to 14666
Data columns (total 65 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   __type                14667 non-null  object 
 1   MlsNumber             14667 non-null  int64  
 2   CategoryCode          14667 non-null  object 
 3   Lat                   14667 non-null  float64
 4   Lng                   14667 non-null  float64
 5   Address               14667 non-null  object 
 6   City                  14667 non-null  object 
 7   Quartier              11488 non-null  object 
 8   ShortCity             14667 non-null  object 
 9   PostalCode            14667 non-null  object 
 10  ShortDescription      0 non-null      float64
 11  LongDescription       12922 non-null  object 
 12  PhotoText             14667 non-null  object 
 13  BuyPrice              14667 non-null  object 
 14  BuyPriceDesc          2811 non-null   object 
 15  LocationPrice      

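The `info()` listing gives raw non-null counts. For many columns it is easier to compare missingness as a share of rows. Below is a minimal sketch of a per-column missing-value summary. The helper name `null_summary` is hypothetical, and it runs on a small made-up frame so it works without `listings.csv`. On the real data you would call `null_summary(listings)`.

```python
import numpy as np
import pandas as pd

def null_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column count and percentage of missing values, most-missing first."""
    n_missing = df.isna().sum()
    summary = pd.DataFrame({
        'n_missing': n_missing,
        'pct_missing': (n_missing / len(df) * 100).round(1),
    })
    return summary.sort_values('n_missing', ascending=False)

# Hypothetical toy frame imitating a few listings columns
toy = pd.DataFrame({
    'MlsNumber': [1, 2, 3],
    'Quartier': ['A', None, 'B'],
    'ShortDescription': [np.nan, np.nan, np.nan],
})
print(null_summary(toy))
```

Columns at 100% missing, such as `ShortDescription` in the `info()` output, are the obvious drop candidates.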
## 1.1. Column descriptions
- `__type` - ?
- `MlsNumber` - probably the listing ID (MLS = Multiple Listing Service number),
- `CategoryCode` - one of 'COP', 'PPR', 'PCI', 'UNI', 'TER', 'FER', with these subcodes:
    - COP: AP (Apartment), LS (Loft/Studio), MA (Condo/Loft, House) -- Townhouse
    - FER: FE (Farm)
    - PCI: C (Commercial), I (Industrial), VE (Bulk, block sale)
    - PPR: 2X (Duplex), 3X (Triplex), 4X (Quadruplex), 5X (Quintuplex), AU (Revenue Property)
    - TER: TE, TR (Land)
    - UNI: ME/MEM (Single family, 2 or more storeys), MM (Single family, mobile home), MPM (Single family, split level), PP (Single family, bungalow)
- `Lat` - latitude,
- `Lng` - longitude,
- `Address` - street address,
- `City` - 33 different cities
- `Quartier` - 52 different districts
- `ShortDescription` - column to drop, contains only nulls,
- `LongDescription` - description of the property (in French or English),
- `BuyPrice` - buy price in dollars (stored as text, `object` dtype),
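- `BuyPriceDesc` - qualifier for `BuyPrice`: '+GST/QST' (plus the goods and services tax (GST) and the Québec sales tax (QST)), '/square foot', ' /square foot +GST/QST',
- `LocationPrice` - probably the rental price (*location* means rental in French), which fits the per-month and per-year units in `LocationPriceDesc`,
- `LocationPriceDesc` - unit of `LocationPrice`: ' /month', ' /year', ' /month +GST/QST', ' /year /square foot +GST/QST', ' /year +GST/QST',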
- `Category` - type of building, full category name,
- `Construction` - year of construction,
- `OpenHouse` - probably the open-house schedule, i.e. the times when the property for sale can be visited,
- `GenreCode` - ???,
- `CatgCode` - duplicate of column `CategoryCode`,
- `Utilisation` - mostly NaN; other values: 'Residential only', 'Commercial and residential',
    'Industrial and offices', 'Residential and commercial', 'Commercial and office space',
    'Commercial and industrial', 'Commercial only', 'Commercial or industrial (income)',
    'Multi-family dwelling', 'Offices only', 'Industrial only', 'Other', 'Retirement home',
- `RevenuEffectif` - column to drop, contains only zeros,
- `RevenueVrutPotentiel` - Gross potential income (GPI) refers to the total rental income a property can produce if all units were fully leased and rented at market rents with a zero vacancy rate,
- `WalkScore` - walkability score from 0 to 100:

    | Score | Category | Description |
    |---|---|---|
    | 90–100 | Walker's Paradise | Daily errands do not require a car. |
    | 70–89 | Very Walkable | Most errands can be accomplished on foot. |
    | 50–69 | Somewhat Walkable | Some errands can be accomplished on foot. |
    | 25–49 | Car-Dependent | Most errands require a car. |
    | 0–24 | Car-Dependent | Almost all errands require a car. |
- `SuperficieTerrain` - column to drop, contains only zeros,
- `NbPieces` - number of rooms,
- `NbChambers` - number of bedrooms,
- `NbSallesEaux` - number of bathrooms with shower,
- `NbSallesBains` - number of bathrooms with bathtub,
- `NbFoyerPoele` - number of fireplaces or stoves (French *foyer* / *poêle*),
- `NbEquipements` - column to drop, contains only zeros,
- `NbGarages` - number of garages,
- `NbStationnements` - number of parking spaces,
- `NbPiscines` - number of pools,
- `NbBordEaux` - ???,
- `NbAnimals` - column to drop, contains only zeros,
- `NbCultures` - ???,
- `Language` - column to drop, contains only nulls,
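Several columns above are marked "to drop" because they hold only nulls or only zeros. `CatgCode` is also noted as a duplicate of `CategoryCode`. These judgments can be checked in code instead of by eye. Below is a minimal sketch. It assumes that "duplicate" means identical values row by row, which is what `Series.equals` tests (NaNs in the same positions count as equal, and dtypes must match). The helper name `drop_candidates` is hypothetical, and the toy frame stands in for `listings`.

```python
import numpy as np
import pandas as pd

def drop_candidates(df: pd.DataFrame) -> dict:
    """Flag columns that are all-null, all-zero, or an exact copy of an earlier column."""
    all_null = [c for c in df.columns if df[c].isna().all()]
    # eq(0) is False for NaN and for strings, so only genuinely all-zero columns match
    all_zero = [c for c in df.columns if df[c].eq(0).all()]
    duplicates = {}  # later column -> earlier column it repeats
    cols = list(df.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if b not in duplicates and df[a].equals(df[b]):
                duplicates[b] = a
    return {'all_null': all_null, 'all_zero': all_zero, 'duplicates': duplicates}

# Hypothetical toy frame imitating a few listings columns
toy = pd.DataFrame({
    'CategoryCode': ['COP', 'UNI', 'PPR'],
    'CatgCode': ['COP', 'UNI', 'PPR'],
    'NbAnimals': [0, 0, 0],
    'Language': [np.nan, np.nan, np.nan],
    'NbPieces': [5, 7, 0],
})
flags = drop_candidates(toy)
to_drop = flags['all_null'] + flags['all_zero'] + list(flags['duplicates'])
cleaned = toy.drop(columns=to_drop)
print(flags)
print(cleaned.columns.tolist())
```

On the real data, the same call would be `drop_candidates(listings)`. The pairwise duplicate check makes about 2,000 column comparisons for 65 columns, which is cheap at roughly 15k rows.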




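The Walk Score table can be turned into a categorical feature by binning the numeric score. Below is a minimal sketch with `pd.cut`. It assumes `WalkScore` is numeric on the 0 to 100 scale. The bin edges come from the table, with `right=False` so each bin is `[low, high)`. The two "Car-Dependent" bands get distinct labels because `pd.cut` rejects duplicate labels on an ordered result. The names `WALK_BINS`, `WALK_LABELS` and `walk_category` are hypothetical.

```python
import pandas as pd

# Edges follow the Walk Score table; right=False gives bins [0,25), [25,50), ..., [90,101)
WALK_BINS = [0, 25, 50, 70, 90, 101]
WALK_LABELS = [
    "Car-Dependent (almost all errands)",
    "Car-Dependent (most errands)",
    "Somewhat Walkable",
    "Very Walkable",
    "Walker's Paradise",
]

def walk_category(scores: pd.Series) -> pd.Series:
    """Map numeric Walk Scores to ordered categories; NaN or out-of-range values become NaN."""
    return pd.cut(scores, bins=WALK_BINS, labels=WALK_LABELS, right=False)

scores = pd.Series([10, 24, 25, 69, 70, 95, 100])
cats = walk_category(scores)
print(cats.tolist())
```

Applied to the data, this would be `listings['WalkCategory'] = walk_category(listings['WalkScore'])`, where `WalkCategory` is a hypothetical new column name. Because the result is an ordered categorical, comparisons such as `>= "Very Walkable"` respect the walkability order.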

### 1.1.1. City vs. ShortCity

In [4]:
print(listings.City[listings.City != listings.ShortCity])
print(listings.ShortCity[listings.City != listings.ShortCity])

1172      Côte-des-Neiges/Notre-Dame-de-Grâce (Montréal)
1173      Côte-des-Neiges/Notre-Dame-de-Grâce (Montréal)
1174      Côte-des-Neiges/Notre-Dame-de-Grâce (Montréal)
1175      Côte-des-Neiges/Notre-Dame-de-Grâce (Montréal)
1176      Côte-des-Neiges/Notre-Dame-de-Grâce (Montréal)
                              ...                       
14426    Villeray/Saint-Michel/Parc-Extension (Montréal)
14427    Villeray/Saint-Michel/Parc-Extension (Montréal)
14428    Villeray/Saint-Michel/Parc-Extension (Montréal)
14429    Villeray/Saint-Michel/Parc-Extension (Montréal)
14430    Villeray/Saint-Michel/Parc-Extension (Montréal)
Name: City, Length: 2385, dtype: object
1172     Côte-des-Neiges/Notre-Dame-de-Grâce ...
1173     Côte-des-Neiges/Notre-Dame-de-Grâce ...
1174     Côte-des-Neiges/Notre-Dame-de-Grâce ...
1175     Côte-des-Neiges/Notre-Dame-de-Grâce ...
1176     Côte-des-Neiges/Notre-Dame-de-Grâce ...
                          ...                   
14426    Villeray/Saint-Michel/Parc-Ext