### ydata-profiling is a one-line Exploratory Data Analysis (EDA) library that aims to provide an efficient way to perform descriptive analysis. It allows the output of our analysis to be exported to HTML or JSON.

### 1. Let's import our dependencies and load our data.

In [2]:
!pip install ydata-profiling





In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Our dataset was taken from the [City of Edmonton Property Assessment Dataset](https://data.edmonton.ca/City-Administration/Property-Assessment-Data-Current-Calendar-Year-/q7d6-ambg/about_data). We have also modified the original dataset to fit the objective of our analysis.

In [4]:
edmonton = pd.read_csv("Property_Assessment_Data__Current_Calendar_Year__20240314_mod.csv")



### 2. Data Understanding

In [5]:
edmonton.head(5)

Unnamed: 0,Suite,House Number,Street Name,Garage,Neighbourhood,Assessed Value,Latitude,Longitude,Point Location,Assessment Class % 1,Assessment Class 1,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15
0,,8403,156 AVENUE NW,N,BELLE RIVE,295500,53.616313,-113.471973,POINT (-113.47197299412386 53.61631345927817),100,RESIDENTIAL,,,,,
1,,9420,92 STREET NW,N,STRATHEARN,49836000,53.530949,-113.469279,POINT (-113.46927857387664 53.53094866121484),100,OTHER RESIDENTIAL,,,,,
2,,136,GRANDISLE WAY NW,Y,RIVERVIEW AREA,985000,53.436014,-113.663371,POINT (-113.66337124475555 53.436014211845055),100,RESIDENTIAL,,,,,
3,,15112,45 AVENUE NW,Y,RAMSAY HEIGHTS,268500,53.48456,-113.581913,POINT (-113.58191288991183 53.48456049274555),100,RESIDENTIAL,,,,,
4,,9315,175 AVENUE NW,Y,LAGO LINDO,349000,53.63672,-113.485279,POINT (-113.48527866070057 53.63671965668959),100,RESIDENTIAL,,,,,


In [6]:
edmonton.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 381550 entries, 0 to 381549
Data columns (total 16 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   Suite                 82452 non-null   object 
 1   House Number          381550 non-null  int64  
 2   Street Name           381550 non-null  object 
 3   Garage                381550 non-null  object 
 4   Neighbourhood         381550 non-null  object 
 5   Assessed Value        381550 non-null  int64  
 6   Latitude              381550 non-null  float64
 7   Longitude             381550 non-null  float64
 8   Point Location        381550 non-null  object 
 9   Assessment Class % 1  381550 non-null  int64  
 10  Assessment Class 1    381550 non-null  object 
 11  Unnamed: 11           0 non-null       float64
 12  Unnamed: 12           0 non-null       float64
 13  Unnamed: 13           0 non-null       float64
 14  Unnamed: 14           0 non-null       float64
 15  

In [7]:
edmonton.shape

(381550, 16)

In [8]:
# Drop the five empty placeholder columns; drop() returns a new DataFrame by default
edmonton = edmonton.drop(columns=["Unnamed: 11", "Unnamed: 12", "Unnamed: 13", "Unnamed: 14", "Unnamed: 15"])
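
The placeholder columns above are listed by hand. As a minimal sketch (not the approach used in this notebook), pandas can also drop every column that contains no values at all with `dropna(axis=1, how="all")`. The small `toy` frame below is made-up illustration data, not the Edmonton dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset
toy = pd.DataFrame(
    {
        "Suite": [None, "201", None],
        "House Number": [8403, 9420, 136],
        "Unnamed: 11": [np.nan, np.nan, np.nan],
        "Unnamed: 12": [np.nan, np.nan, np.nan],
    }
)

# Drop only the columns in which every value is missing;
# "Suite" survives because it has at least one non-null entry
cleaned = toy.dropna(axis=1, how="all")
print(list(cleaned.columns))
```

Listing the names explicitly, as the notebook does, is safer when a partly empty column should also go; `how="all"` only catches columns that are completely empty.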

In [9]:
edmonton.head()

Unnamed: 0,Suite,House Number,Street Name,Garage,Neighbourhood,Assessed Value,Latitude,Longitude,Point Location,Assessment Class % 1,Assessment Class 1
0,,8403,156 AVENUE NW,N,BELLE RIVE,295500,53.616313,-113.471973,POINT (-113.47197299412386 53.61631345927817),100,RESIDENTIAL
1,,9420,92 STREET NW,N,STRATHEARN,49836000,53.530949,-113.469279,POINT (-113.46927857387664 53.53094866121484),100,OTHER RESIDENTIAL
2,,136,GRANDISLE WAY NW,Y,RIVERVIEW AREA,985000,53.436014,-113.663371,POINT (-113.66337124475555 53.436014211845055),100,RESIDENTIAL
3,,15112,45 AVENUE NW,Y,RAMSAY HEIGHTS,268500,53.48456,-113.581913,POINT (-113.58191288991183 53.48456049274555),100,RESIDENTIAL
4,,9315,175 AVENUE NW,Y,LAGO LINDO,349000,53.63672,-113.485279,POINT (-113.48527866070057 53.63671965668959),100,RESIDENTIAL


In [10]:
# Let's check the shape of our dataframe
edmonton.shape

(381550, 11)

In [11]:
edmonton.describe(include="all")

Unnamed: 0,Suite,House Number,Street Name,Garage,Neighbourhood,Assessed Value,Latitude,Longitude,Point Location,Assessment Class % 1,Assessment Class 1
count,82452.0,381550.0,381550,381550,381550,381550.0,381550.0,381550.0,381550,381550.0,381550
unique,2472.0,,3025,2,402,,,,275902,,6
top,201.0,,104 STREET NW,Y,OLIVER,,,,POINT (-113.69680067871437 53.552227980365664),,RESIDENTIAL
freq,1041.0,,2314,249558,8324,,,,1058,,356192
mean,,7735.899649,,,,532376.0,53.518592,-113.512436,,99.946801,
std,,5467.681064,,,,3857850.0,0.069204,0.085664,,1.286385,
min,,1.0,,,,0.0,53.338595,-113.713318,,41.0,
25%,,2555.0,,,,221000.0,53.458673,-113.573103,,100.0,
50%,,8008.0,,,,359000.0,53.519348,-113.508501,,100.0,
75%,,11437.0,,,,477500.0,53.575658,-113.445229,,100.0,


In [12]:
edmonton.duplicated().sum()

0

In [13]:
# map() applies str.lower to each column name, converting all column names to lowercase
edmonton.columns = map(str.lower, edmonton.columns)
edmonton.head()

Unnamed: 0,suite,house number,street name,garage,neighbourhood,assessed value,latitude,longitude,point location,assessment class % 1,assessment class 1
0,,8403,156 AVENUE NW,N,BELLE RIVE,295500,53.616313,-113.471973,POINT (-113.47197299412386 53.61631345927817),100,RESIDENTIAL
1,,9420,92 STREET NW,N,STRATHEARN,49836000,53.530949,-113.469279,POINT (-113.46927857387664 53.53094866121484),100,OTHER RESIDENTIAL
2,,136,GRANDISLE WAY NW,Y,RIVERVIEW AREA,985000,53.436014,-113.663371,POINT (-113.66337124475555 53.436014211845055),100,RESIDENTIAL
3,,15112,45 AVENUE NW,Y,RAMSAY HEIGHTS,268500,53.48456,-113.581913,POINT (-113.58191288991183 53.48456049274555),100,RESIDENTIAL
4,,9315,175 AVENUE NW,Y,LAGO LINDO,349000,53.63672,-113.485279,POINT (-113.48527866070057 53.63671965668959),100,RESIDENTIAL
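
For comparison, the same renaming can be written with the `.str` accessor on the column index. This is only a sketch on a hypothetical empty `toy` frame that borrows three of the column names; it is meant to show that both spellings produce the same lowercase names.

```python
import pandas as pd

# Hypothetical frame with a few of the original column names and no rows
toy = pd.DataFrame(columns=["House Number", "Street Name", "Assessed Value"])

# Spelling used in the notebook: map str.lower over the column names
via_map = list(map(str.lower, toy.columns))

# Vectorized alternative: the Index .str accessor
via_str = list(toy.columns.str.lower())

print(via_str)
```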


In [14]:
# Let's examine our data types
edmonton.dtypes

suite                    object
house number              int64
street name              object
garage                   object
neighbourhood            object
assessed value            int64
latitude                float64
longitude               float64
point location           object
assessment class % 1      int64
assessment class 1       object
dtype: object

In [15]:
# Convert assessment class % 1 from a percentage (int) to a fraction (float) by dividing by 100

edmonton["assessment class % 1"] = edmonton["assessment class % 1"]/100

In [16]:
edmonton.dtypes 

suite                    object
house number              int64
street name              object
garage                   object
neighbourhood            object
assessed value            int64
latitude                float64
longitude               float64
point location           object
assessment class % 1    float64
assessment class 1       object
dtype: object
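
The dtype change above is a side effect of the division rather than an explicit cast: true division of an integer column always yields floats in pandas. A small sketch with made-up values (the `toy` frame is hypothetical) shows the effect in isolation.

```python
import pandas as pd

# Hypothetical percentages stored as integers
toy = pd.DataFrame({"assessment class % 1": [100, 41, 100]})
before = toy["assessment class % 1"].dtype

# Dividing by 100 turns percentages into fractions and promotes the dtype to float
toy["assessment class % 1"] = toy["assessment class % 1"] / 100
after = toy["assessment class % 1"].dtype

print(before, "->", after)
print(toy["assessment class % 1"].tolist())
```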

In [18]:
from ydata_profiling import ProfileReport

In [19]:
profile = ProfileReport(edmonton, title='Pandas Profiling Report', html={'style':{'full_width':True}})
profile
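
The introduction mentions exporting the report to HTML or JSON, which the cell above does not do. The helper below is a minimal sketch of that step, assuming the report object exposes `to_file(path)` and picks the output format from the file extension, as `ProfileReport` does. The function name `export_report` and the file stem are hypothetical.

```python
from pathlib import Path


def export_report(report, stem):
    """Write `report` to <stem>.html and <stem>.json and return both paths.

    `report` is assumed to expose a `to_file(path)` method, as
    ydata-profiling's ProfileReport does; the output format is chosen
    from the file extension.
    """
    paths = [Path(f"{stem}.html"), Path(f"{stem}.json")]
    for path in paths:
        report.to_file(path)
    return paths
```

With the `profile` object created above, calling `export_report(profile, "edmonton_profile")` would write both files to the working directory.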


