# Data Exploration
This notebook performs exploratory data analysis on the dataset.
To expand on the analysis, attach this notebook to the **bci-avm-dask** cluster,
edit [the options of pandas-profiling](https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/advanced_usage.html), and rerun it.

<p align="center">
<img width=25% src="https://blockchainclimate.org/wp-content/uploads/2020/11/cropped-BCI_Logo_LR-400x333.png" alt="bciAVM" height="300"/>
</p>

[![PyPI](https://badge.fury.io/py/bciavm.svg?maxAge=2592000)](https://badge.fury.io/py/bciavm)
[![PyPI Stats](https://img.shields.io/badge/bciavm-avm-blue)](https://pypistats.org/packages/bciavm)


This notebook contains code to take an `mlflow`-registered model and distribute its work across a `Dask` cluster.
<table>
    <tr>
        <td>
            <img width=25% src="https://saturn-public-assets.s3.us-east-2.amazonaws.com/example-resources/dask.png">
        </td>
    </tr>
</table>

The [Blockchain Climate Institute](https://blockchainclimate.org) (BCI) is a progressive think tank providing leading expertise in the deployment of emerging technologies for climate and sustainability actions. 

As an international network of scientific and technological experts, BCI is at the forefront of innovative efforts, enabling technology transfers, to create a sustainable and clean global future.

# Automated Valuation Model (AVM) 

### About
An automated valuation model (AVM) is a service that uses mathematical modeling combined with databases of existing properties and transactions to calculate real estate values.
Most AVMs compare the values of similar properties at the same point in time.
Many appraisers, and even Wall Street institutions, use this type of model to value residential properties (see [What is an AVM](https://www.investopedia.com/terms/a/automated-valuation-model.asp), Investopedia.com).
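
As a rough illustration of the "compare similar properties" idea, the sketch below estimates a price from the nearest sold properties of the same type. It is not the bciAVM method: the record fields (`type`, `lat`, `lon`, `area`, `price`), the function names, the choice of a median price per unit of floor area, and the toy values are all assumptions made for this example, and the sketch ignores the time dimension entirely.

```python
import math
from statistics import median


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def comparable_estimate(subject, sold, k=3):
    """Estimate a price from the k nearest sold properties of the same type.

    Takes the median price per unit of floor area among the comparables
    and scales it by the subject's floor area.
    """
    same_type = [p for p in sold if p["type"] == subject["type"]]
    if not same_type:
        raise ValueError("no comparable properties of this type")
    nearest = sorted(
        same_type,
        key=lambda p: haversine_km(subject["lat"], subject["lon"], p["lat"], p["lon"]),
    )[:k]
    price_per_area = median(p["price"] / p["area"] for p in nearest)
    return price_per_area * subject["area"]


# Toy records with made-up values, for illustration only.
sold = [
    {"type": "House", "lat": 51.60, "lon": -1.30, "area": 100.0, "price": 300000.0},
    {"type": "House", "lat": 51.61, "lon": -1.31, "area": 80.0, "price": 200000.0},
    {"type": "House", "lat": 51.62, "lon": -1.32, "area": 50.0, "price": 100000.0},
    {"type": "Flat", "lat": 51.60, "lon": -1.30, "area": 50.0, "price": 500000.0},
]
subject = {"type": "House", "lat": 51.60, "lon": -1.30, "area": 90.0}
estimate = comparable_estimate(subject, sold, k=3)
```

The flat is excluded because its type differs from the subject's, so the estimate rests on the three houses only.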

For more detail about the AVM, read the **About** paper at `resources/2021-BCI-AVM-About.pdf`.

### Valuation Process
<img src="resources/valuation_process.png" height="360" >

**Key Functionality**

* **Supervised algorithms** 
* **Tree-based & deep learning algorithms** 
* **Feature engineering derived from small clusters of similar properties** 
* **Ensemble (value blending) approaches** 
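
Value blending can be pictured as a weighted average of the valuations produced by several models. The following is a minimal sketch under that assumption; the model names, weights, and the `blend_values` helper are hypothetical and do not come from the bciAVM code base, which may blend values differently.

```python
def blend_values(predictions, weights=None):
    """Blend several model valuations into one weighted average.

    `predictions` maps a model name to its valuation; `weights` maps the
    same names to non-negative weights (equal weights if omitted).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    if weights is None:
        weights = {name: 1.0 for name in predictions}
    total = sum(weights[name] for name in predictions)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(predictions[name] * weights[name] for name in predictions) / total


# Hypothetical valuations of one property from three models.
predictions = {"tree_model": 210000.0, "neural_net": 230000.0, "comparables": 220000.0}
weights = {"tree_model": 2.0, "neural_net": 1.0, "comparables": 1.0}
blended = blend_values(predictions, weights)
```

In practice the weights would typically be learned or tuned on held-out data rather than fixed by hand.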

### Set the required AWS Environment Variables
```shell
export ACCESS_KEY=YOURACCESS_KEY
export SECRET_KEY=YOURSECRET_KEY
export BUCKET_NAME=bci-transition-risk-data
export TABLE_DIRECTORY=/dbfs/FileStore/tables/
```
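
On the Python side, a small helper can check that these variables are present before any data is downloaded. This is only a sketch: the `load_settings` function and its error message are assumptions for illustration, and how bciAVM itself reads these variables is not shown in this notebook.

```python
import os

# Names taken from the shell snippet above.
REQUIRED_VARS = ("ACCESS_KEY", "SECRET_KEY", "BUCKET_NAME", "TABLE_DIRECTORY")


def load_settings(environ=None):
    """Return the required settings, failing early if any are missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}


# Example with an explicit mapping instead of the real process environment.
example_env = {
    "ACCESS_KEY": "YOURACCESS_KEY",
    "SECRET_KEY": "YOURSECRET_KEY",
    "BUCKET_NAME": "bci-transition-risk-data",
    "TABLE_DIRECTORY": "/dbfs/FileStore/tables/",
}
settings = load_settings(example_env)
```

Calling `load_settings()` without arguments reads from `os.environ`.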

### Next Steps
Read more about bciAVM on our [documentation page](https://blockchainclimate.org/thought-leadership/#blog).

### How does it relate to BCI Risk Modeling?
<img src="resources/bci_flowchart_2.png" height="280" >


### Technical & financial support for development provided by:
<a href="https://www.gcode.ai">
    <img width=15% src="https://staticfiles-img.s3.amazonaws.com/avm/gcode_logo.png" alt="GCODE.ai"  height="25"/>
</a>


### Install [from PyPI](https://pypi.org/project/bciavm/)
```shell
pip install bciavm
```

This notebook covers the following steps:
- Import data from your local machine into the Databricks File System (DBFS)
- Download data from S3
- Train a machine learning model (more precisely, multiple models in a stacked pipeline) on the dataset
- Register the model in MLflow

Runtime Version: _8.3.x-cpu-ml-scala2.12_

```python
import os
import uuid
import shutil
import pandas as pd

from mlflow.tracking import MlflowClient

# Download the input data from MLflow into a pandas DataFrame.
# Create a temporary directory to hold the downloaded artifact.
temp_dir = os.path.join(os.environ["SPARK_LOCAL_DIRS"], str(uuid.uuid4())[:8])
os.makedirs(temp_dir)

# Download the artifact and read it.
client = MlflowClient()
training_data_path = client.download_artifacts("30f2d98161fc441f941c658533c8201d", "data", temp_dir)
df = pd.read_parquet(os.path.join(training_data_path, "training_data"))

# Delete the temporary data.
shutil.rmtree(temp_dir)

target_col = "Price_p"
```
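
The download step above removes its temporary directory only if every earlier line succeeds. One way to guarantee cleanup is to wrap the work in `tempfile.TemporaryDirectory`, as in the sketch below. The helper `read_with_temp_dir` and the stand-in `fake_download` and `read_text` functions are hypothetical; in this notebook the two callables would wrap `client.download_artifacts` and `pd.read_parquet`.

```python
import os
import tempfile


def read_with_temp_dir(download, read, base_dir=None):
    """Download into a temporary directory, read the result, then clean up.

    `download(dst_dir)` must return the path of the downloaded artifact, and
    `read(path)` must load it fully into memory before returning, because the
    directory is deleted as soon as this function exits (even on error).
    """
    with tempfile.TemporaryDirectory(dir=base_dir) as temp_dir:
        path = download(temp_dir)
        return read(path)


# Stand-ins for the MLflow download and the parquet reader (illustration only).
seen_dirs = []


def fake_download(dst_dir):
    seen_dirs.append(dst_dir)
    path = os.path.join(dst_dir, "training_data.txt")
    with open(path, "w") as handle:
        handle.write("Price_p\n210000\n")
    return path


def read_text(path):
    with open(path) as handle:
        return handle.read()


content = read_with_temp_dir(fake_download, read_text)
```

Passing `base_dir=os.environ["SPARK_LOCAL_DIRS"]` would keep the temporary data in the same location the original cell uses.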

## Profiling Results

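```python
from pandas_profiling import ProfileReport

# Build the profiling report and render it inline.
# displayHTML is a Databricks notebook built-in.
df_profile = ProfileReport(df, title="Profiling Report", progress_bar=False)
profile_html = df_profile.to_html()

displayHTML(profile_html)
```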

### Overview

**Dataset statistics**

Statistic,Value
Number of variables,12
Number of observations,622278
Missing cells,533
Missing cells (%),< 0.1%
Duplicate rows,0
Duplicate rows (%),0.0%
Total size in memory,57.0 MiB
Average record size in memory,96.0 B

**Variable types**

Type,Count
Numeric,7
Categorical,5

**Warnings**

Message,Type
POSTCODE has a high cardinality: 390770 distinct values,High cardinality
POSTCODE_OUTCODE has a high cardinality: 2255 distinct values,High cardinality
POSTTOWN_e has a high cardinality: 1182 distinct values,High cardinality
POSTCODE_AREA has a high cardinality: 105 distinct values,High cardinality
unit_indx is uniformly distributed,Uniform
POSTCODE is uniformly distributed,Uniform
unit_indx has unique values,Unique
FLOOR_LEVEL_e has 575649 (92.5%) zeros,Zeros
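
The warnings listed above (high cardinality, unique values, a large share of zeros) can be thought of as simple per-column rules. The sketch below mimics that idea on plain Python lists. The `column_alerts` helper and both thresholds are illustrative assumptions, not the rules or defaults that pandas-profiling applies internally.

```python
def column_alerts(values, cardinality_threshold=50, zeros_threshold=0.5):
    """Return alert labels for one column, given as a list of values.

    Both thresholds are illustrative choices for this sketch.
    """
    alerts = []
    present = [v for v in values if v is not None]
    distinct = len(set(present))
    is_text = bool(present) and all(isinstance(v, str) for v in present)
    if is_text and distinct > cardinality_threshold:
        alerts.append("High cardinality")
    if present and distinct == len(present):
        alerts.append("Unique")
    if values and not is_text:
        # Share of zeros among all rows, including missing ones.
        zero_share = sum(1 for v in present if v == 0) / len(values)
        if zero_share > zeros_threshold:
            alerts.append("Zeros")
    return alerts


# Toy columns (made-up values) that trigger the different alerts.
floor_levels = [0] * 9 + [1]
postcodes = [f"AB{i} 1CD" for i in range(60)]
row_ids = list(range(10))

floor_alerts = column_alerts(floor_levels)
postcode_alerts = column_alerts(postcodes)
row_id_alerts = column_alerts(row_ids)
```

Here `floor_levels` is flagged for zeros, `postcodes` for high cardinality and uniqueness, and `row_ids` for uniqueness only.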

**Reproduction**

Setting,Value
Analysis started,2021-06-10 00:12:56.442742
Analysis finished,2021-06-10 00:15:15.544506
Duration,2 minutes and 19.1 seconds
Software version,pandas-profiling v2.11.0
Download configuration,config.yaml

### Variable: unit_indx

Statistic,Value
Distinct,622278
Distinct (%),100.0%
Missing,0
Missing (%),0.0%
Infinite,0
Infinite (%),0.0%
Mean,430418.559
Minimum,0
Maximum,861313
Zeros,1
Zeros (%),< 0.1%
Memory size,4.7 MiB

**Quantile statistics**

Statistic,Value
Minimum,0.0
5-th percentile,42960.55
Q1,215200.25
Median,430288.5
Q3,645581.75
95-th percentile,818083.15
Maximum,861313.0
Range,861313.0
Interquartile range (IQR),430381.5

**Descriptive statistics**

Statistic,Value
Standard deviation,248568.2248
Coefficient of variation (CV),0.5775035013
Kurtosis,-1.199307105
Mean,430418.559
Median Absolute Deviation (MAD),215187.5
Skewness,0.0008085814892
Sum,2.6784 × 10^11
Variance,6.17861624 × 10^10
Monotonicity,Not monotonic
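
Several of the reported statistics are linked by simple identities, which makes them easy to sanity-check: the coefficient of variation is the standard deviation divided by the mean, the interquartile range is Q3 minus Q1, the range is maximum minus minimum, and the standard deviation is the square root of the variance. The sketch below recomputes them from the `unit_indx` figures shown above; the variable names are chosen for this example only.

```python
import math

# Values copied from the unit_indx statistics above.
mean = 430418.559
std = 248568.2248
variance = 6.17861624e10
q1, q3 = 215200.25, 645581.75
minimum, maximum = 0.0, 861313.0

cv = std / mean                          # coefficient of variation
iqr = q3 - q1                            # interquartile range
value_range = maximum - minimum          # range
std_from_variance = math.sqrt(variance)  # should agree with std up to rounding
```

Small differences in the last digits are expected because the report rounds each figure independently.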

**Common values (unit_indx)**

Value,Count,Frequency (%)
2047,1,< 0.1%
691707,1,< 0.1%
695801,1,< 0.1%
693752,1,< 0.1%
720373,1,< 0.1%
718324,1,< 0.1%
706034,1,< 0.1%
710128,1,< 0.1%
671213,1,< 0.1%
658923,1,< 0.1%

**Minimum 5 values (unit_indx)**

Value,Count,Frequency (%)
0,1,< 0.1%
1,1,< 0.1%
2,1,< 0.1%
3,1,< 0.1%
5,1,< 0.1%

**Maximum 5 values (unit_indx)**

Value,Count,Frequency (%)
861313,1,< 0.1%
861312,1,< 0.1%
861310,1,< 0.1%
861309,1,< 0.1%
861307,1,< 0.1%

### Variable: POSTCODE

Statistic,Value
Distinct,390770
Distinct (%),62.8%
Missing,0
Missing (%),0.0%
Memory size,4.7 MiB

**Most frequent values**

Value,Count
S70 2RP,34
LS2 7QB,26
LS2 7QS,21
EX34 8PF,20
S70 2RH,19
Other values (390765),622158

**Length**

Statistic,Value
Max length,8.0
Median length,7.0
Mean length,7.437081176
Min length,6.0

**Characters and Unicode**

Statistic,Value
Total characters,4627932
Distinct characters,37
Distinct categories,3
Distinct scripts,2
Distinct blocks,1

**Unique**

Statistic,Value
Unique,244928
Unique (%),39.4%

**Sample**

Row,Value
1st row,OX14 4LA
2nd row,LN6 7EP
3rd row,M29 8RN
4th row,DL3 0HE
5th row,OX12 9HX

**Common values (POSTCODE)**

Value,Count,Frequency (%)
S70 2RP,34,< 0.1%
LS2 7QB,26,< 0.1%
LS2 7QS,21,< 0.1%
EX34 8PF,20,< 0.1%
S70 2RH,19,< 0.1%
S60 1NU,18,< 0.1%
S70 2RF,17,< 0.1%
SR1 1XH,14,< 0.1%
S70 4PQ,14,< 0.1%
B42 2SY,13,< 0.1%

**Words (POSTCODE)**

Value,Count,Frequency (%)
cr0,1357,0.1%
ng5,1212,0.1%
le3,1205,0.1%
le2,1151,0.1%
st5,1133,0.1%
cf14,1024,0.1%
st6,1016,0.1%
cv6,1015,0.1%
ng17,991,0.1%
st4,975,0.1%

**Most occurring characters (POSTCODE)**

Value,Count,Frequency (%)
(space),622278,13.4%
1,321518,6.9%
2,222135,4.8%
S,193604,4.2%
N,186695,4.0%
3,178311,3.9%
B,167553,3.6%
L,167088,3.6%
4,158506,3.4%
E,157039,3.4%

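**Most occurring categories (POSTCODE)**

Value,Count,Frequency (%)
Uppercase Letter,2418395,52.3%
Decimal Number,1587259,34.3%
Space Separator,622278,13.4%

**Most frequent characters, Uppercase Letter category (POSTCODE)**

Value,Count,Frequency (%)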
S,193604,8.0%
N,186695,7.7%
B,167553,6.9%
L,167088,6.9%
E,157039,6.5%
D,135609,5.6%
P,129888,5.4%
H,129359,5.3%
A,125543,5.2%
R,121233,5.0%

**Most frequent characters, Decimal Number category (POSTCODE)**

Value,Count,Frequency (%)
1,321518,20.3%
2,222135,14.0%
3,178311,11.2%
4,158506,10.0%
5,134842,8.5%
6,130349,8.2%
7,121056,7.6%
8,112936,7.1%
9,109384,6.9%
0,98222,6.2%

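**Most frequent characters, Space Separator category (POSTCODE)**

Value,Count,Frequency (%)
(space),622278,100.0%

**Most occurring scripts (POSTCODE)**

Value,Count,Frequency (%)
Latin,2418395,52.3%
Common,2209537,47.7%

**Most frequent characters, Latin script (POSTCODE)**

Value,Count,Frequency (%)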
S,193604,8.0%
N,186695,7.7%
B,167553,6.9%
L,167088,6.9%
E,157039,6.5%
D,135609,5.6%
P,129888,5.4%
H,129359,5.3%
A,125543,5.2%
R,121233,5.0%

**Most frequent characters, Common script (POSTCODE)**

Value,Count,Frequency (%)
(space),622278,28.2%
1,321518,14.6%
2,222135,10.1%
3,178311,8.1%
4,158506,7.2%
5,134842,6.1%
6,130349,5.9%
7,121056,5.5%
8,112936,5.1%
9,109384,5.0%

**Most occurring blocks (POSTCODE)**

Value,Count,Frequency (%)
ASCII,4627932,100.0%

**Most frequent characters, ASCII block (POSTCODE)**

Value,Count,Frequency (%)
(space),622278,13.4%
1,321518,6.9%
2,222135,4.8%
S,193604,4.2%
N,186695,4.0%
3,178311,3.9%
B,167553,3.6%
L,167088,3.6%
4,158506,3.4%
E,157039,3.4%

### Variable: POSTCODE_OUTCODE

Statistic,Value
Distinct,2255
Distinct (%),0.4%
Missing,0
Missing (%),0.0%
Memory size,4.7 MiB

**Most frequent values**

Value,Count
CR0,1357
NG5,1212
LE3,1205
LE2,1151
ST5,1133
Other values (2250),616220

**Length**

Statistic,Value
Max length,4.0
Median length,3.0
Mean length,3.437081176
Min length,2.0

**Characters and Unicode**

Statistic,Value
Total characters,2138820
Distinct characters,35
Distinct categories,2
Distinct scripts,2
Distinct blocks,1

**Unique**

Statistic,Value
Unique,13
Unique (%),< 0.1%

**Sample**

Row,Value
1st row,OX14
2nd row,LN6
3rd row,M29
4th row,DL3
5th row,OX12

**Common values (POSTCODE_OUTCODE)**

Value,Count,Frequency (%)
CR0,1357,0.2%
NG5,1212,0.2%
LE3,1205,0.2%
LE2,1151,0.2%
ST5,1133,0.2%
CF14,1024,0.2%
ST6,1016,0.2%
CV6,1015,0.2%
NG17,991,0.2%
ST4,975,0.2%

**Words (POSTCODE_OUTCODE)**

Value,Count,Frequency (%)
cr0,1357,0.2%
ng5,1212,0.2%
le3,1205,0.2%
le2,1151,0.2%
st5,1133,0.2%
cf14,1024,0.2%
st6,1016,0.2%
cv6,1015,0.2%
ng17,991,0.2%
st4,975,0.2%

**Most occurring characters (POSTCODE_OUTCODE)**

Value,Count,Frequency (%)
1,260697,12.2%
2,151438,7.1%
S,131634,6.2%
3,111105,5.2%
N,110384,5.2%
4,93799,4.4%
B,88925,4.2%
L,88298,4.1%
E,74397,3.5%
5,73928,3.5%

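**Most occurring categories (POSTCODE_OUTCODE)**

Value,Count,Frequency (%)
Uppercase Letter,1173839,54.9%
Decimal Number,964981,45.1%

**Most frequent characters, Uppercase Letter category (POSTCODE_OUTCODE)**

Value,Count,Frequency (%)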
S,131634,11.2%
N,110384,9.4%
B,88925,7.6%
L,88298,7.5%
E,74397,6.3%
C,65868,5.6%
T,55396,4.7%
R,54885,4.7%
P,54407,4.6%
D,52210,4.4%

**Most frequent characters, Decimal Number category (POSTCODE_OUTCODE)**

Value,Count,Frequency (%)
1,260697,27.0%
2,151438,15.7%
3,111105,11.5%
4,93799,9.7%
5,73928,7.7%
6,72919,7.6%
7,59979,6.2%
8,49074,5.1%
0,46611,4.8%
9,45431,4.7%

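**Most occurring scripts (POSTCODE_OUTCODE)**

Value,Count,Frequency (%)
Latin,1173839,54.9%
Common,964981,45.1%

**Most frequent characters, Latin script (POSTCODE_OUTCODE)**

Value,Count,Frequency (%)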
S,131634,11.2%
N,110384,9.4%
B,88925,7.6%
L,88298,7.5%
E,74397,6.3%
C,65868,5.6%
T,55396,4.7%
R,54885,4.7%
P,54407,4.6%
D,52210,4.4%

**Most frequent characters, Common script (POSTCODE_OUTCODE)**

Value,Count,Frequency (%)
1,260697,27.0%
2,151438,15.7%
3,111105,11.5%
4,93799,9.7%
5,73928,7.7%
6,72919,7.6%
7,59979,6.2%
8,49074,5.1%
0,46611,4.8%
9,45431,4.7%

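**Most occurring blocks (POSTCODE_OUTCODE)**

Value,Count,Frequency (%)
ASCII,2138820,100.0%

**Most frequent characters, ASCII block (POSTCODE_OUTCODE)**

Value,Count,Frequency (%)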
1,260697,12.2%
2,151438,7.1%
S,131634,6.2%
3,111105,5.2%
N,110384,5.2%
4,93799,4.4%
B,88925,4.2%
L,88298,4.1%
E,74397,3.5%
5,73928,3.5%

### Variable: POSTTOWN_e

Statistic,Value
Distinct,1182
Distinct (%),0.2%
Missing,0
Missing (%),0.0%
Memory size,4.7 MiB

**Most frequent values**

Value,Count
LONDON,27100
BIRMINGHAM,10809
MANCHESTER,10447
NOTTINGHAM,9353
BRISTOL,9108
Other values (1177),555461

**Length**

Statistic,Value
Max length,22.0
Median length,8.0
Mean length,8.815243348
Min length,3.0

**Characters and Unicode**

Statistic,Value
Total characters,5485532
Distinct characters,33
Distinct categories,6
Distinct scripts,2
Distinct blocks,2

**Unique**

Statistic,Value
Unique,40
Unique (%),< 0.1%

**Sample**

Row,Value
1st row,ABINGDON
2nd row,LINCOLN
3rd row,MANCHESTER
4th row,DARLINGTON
5th row,WANTAGE

**Common values (POSTTOWN_e)**

Value,Count,Frequency (%)
LONDON,27100,4.4%
BIRMINGHAM,10809,1.7%
MANCHESTER,10447,1.7%
NOTTINGHAM,9353,1.5%
BRISTOL,9108,1.5%
LIVERPOOL,8464,1.4%
LEEDS,8229,1.3%
SHEFFIELD,7240,1.2%
LEICESTER,5659,0.9%
STOKE-ON-TRENT,5236,0.8%

**Words (POSTTOWN_e)**

Value,Count,Frequency (%)
london,27100,4.0%
birmingham,10809,1.6%
manchester,10447,1.5%
nottingham,9353,1.4%
bristol,9108,1.3%
liverpool,8464,1.2%
leeds,8229,1.2%
sheffield,7240,1.1%
st,7021,1.0%
leicester,5659,0.8%

**Most occurring characters (POSTTOWN_e)**

Value,Count,Frequency (%)
E,543456,9.9%
O,498580,9.1%
N,445903,8.1%
R,427444,7.8%
T,384711,7.0%
L,351617,6.4%
A,331434,6.0%
S,305444,5.6%
H,275841,5.0%
I,266202,4.9%

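**Most occurring categories (POSTTOWN_e)**

Value,Count,Frequency (%)
Uppercase Letter,5365699,97.8%
Space Separator,61014,1.1%
Dash Punctuation,49927,0.9%
Other Punctuation,8889,0.2%
Decimal Number,2,< 0.1%
Final Punctuation,1,< 0.1%

**Most frequent characters, Uppercase Letter category (POSTTOWN_e)**

Value,Count,Frequency (%)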
E,543456,10.1%
O,498580,9.3%
N,445903,8.3%
R,427444,8.0%
T,384711,7.2%
L,351617,6.6%
A,331434,6.2%
S,305444,5.7%
H,275841,5.1%
I,266202,5.0%

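**Most frequent characters, Other Punctuation category (POSTTOWN_e)**

Value,Count,Frequency (%)
.,7012,78.9%
',1877,21.1%

**Most frequent characters, Decimal Number category (POSTTOWN_e)**

Value,Count,Frequency (%)
8,1,50.0%
2,1,50.0%

**Most frequent characters, Space Separator category (POSTTOWN_e)**

Value,Count,Frequency (%)
(space),61014,100.0%

**Most frequent characters, Dash Punctuation category (POSTTOWN_e)**

Value,Count,Frequency (%)
-,49927,100.0%

**Most frequent characters, Final Punctuation category (POSTTOWN_e)**

Value,Count,Frequency (%)
’,1,100.0%

**Most occurring scripts (POSTTOWN_e)**

Value,Count,Frequency (%)
Latin,5365699,97.8%
Common,119833,2.2%

**Most frequent characters, Latin script (POSTTOWN_e)**

Value,Count,Frequency (%)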
E,543456,10.1%
O,498580,9.3%
N,445903,8.3%
R,427444,8.0%
T,384711,7.2%
L,351617,6.6%
A,331434,6.2%
S,305444,5.7%
H,275841,5.1%
I,266202,5.0%

**Most frequent characters, Common script (POSTTOWN_e)**

Value,Count,Frequency (%)
(space),61014,50.9%
-,49927,41.7%
.,7012,5.9%
',1877,1.6%
’,1,< 0.1%
8,1,< 0.1%
2,1,< 0.1%

**Most occurring blocks (POSTTOWN_e)**

Value,Count,Frequency (%)
ASCII,5485531,> 99.9%
Punctuation,1,< 0.1%

**Most frequent characters, ASCII block (POSTTOWN_e)**

Value,Count,Frequency (%)
E,543456,9.9%
O,498580,9.1%
N,445903,8.1%
R,427444,7.8%
T,384711,7.0%
L,351617,6.4%
A,331434,6.0%
S,305444,5.6%
H,275841,5.0%
I,266202,4.9%

**Most frequent characters, Punctuation block (POSTTOWN_e)**

Value,Count,Frequency (%)
’,1,100.0%

### Variable: PROPERTY_TYPE_e

Statistic,Value
Distinct,5
Distinct (%),< 0.1%
Missing,0
Missing (%),0.0%
Memory size,4.7 MiB

**Most frequent values**

Value,Count
House,486535
Bungalow,67309
Flat,59346
Maisonette,9083
Park home,5

**Length**

Statistic,Value
Max length,10.0
Median length,5.0
Mean length,5.302141487
Min length,4.0

**Characters and Unicode**

Statistic,Value
Total characters,3299406
Distinct characters,21
Distinct categories,3
Distinct scripts,2
Distinct blocks,1

**Unique**

Statistic,Value
Unique,0
Unique (%),0.0%

**Sample**

Row,Value
1st row,House
2nd row,House
3rd row,House
4th row,House
5th row,House

**Common values (PROPERTY_TYPE_e)**

Value,Count,Frequency (%)
House,486535,78.2%
Bungalow,67309,10.8%
Flat,59346,9.5%
Maisonette,9083,1.5%
Park home,5,< 0.1%

**Words (PROPERTY_TYPE_e)**

Value,Count,Frequency (%)
house,486535,78.2%
bungalow,67309,10.8%
flat,59346,9.5%
maisonette,9083,1.5%
home,5,< 0.1%
park,5,< 0.1%

**Most occurring characters (PROPERTY_TYPE_e)**

Value,Count,Frequency (%)
o,562932,17.1%
u,553844,16.8%
e,504706,15.3%
s,495618,15.0%
H,486535,14.7%
a,135743,4.1%
l,126655,3.8%
t,77512,2.3%
n,76392,2.3%
B,67309,2.0%

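**Most occurring categories (PROPERTY_TYPE_e)**

Value,Count,Frequency (%)
Lowercase Letter,2677123,81.1%
Uppercase Letter,622278,18.9%
Space Separator,5,< 0.1%

**Most frequent characters, Lowercase Letter category (PROPERTY_TYPE_e)**

Value,Count,Frequency (%)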
o,562932,21.0%
u,553844,20.7%
e,504706,18.9%
s,495618,18.5%
a,135743,5.1%
l,126655,4.7%
t,77512,2.9%
n,76392,2.9%
g,67309,2.5%
w,67309,2.5%

**Most frequent characters, Uppercase Letter category (PROPERTY_TYPE_e)**

Value,Count,Frequency (%)
H,486535,78.2%
B,67309,10.8%
F,59346,9.5%
M,9083,1.5%
P,5,< 0.1%

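**Most frequent characters, Space Separator category (PROPERTY_TYPE_e)**

Value,Count,Frequency (%)
(space),5,100.0%

**Most occurring scripts (PROPERTY_TYPE_e)**

Value,Count,Frequency (%)
Latin,3299401,> 99.9%
Common,5,< 0.1%

**Most frequent characters, Latin script (PROPERTY_TYPE_e)**

Value,Count,Frequency (%)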
o,562932,17.1%
u,553844,16.8%
e,504706,15.3%
s,495618,15.0%
H,486535,14.7%
a,135743,4.1%
l,126655,3.8%
t,77512,2.3%
n,76392,2.3%
B,67309,2.0%

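**Most frequent characters, Common script (PROPERTY_TYPE_e)**

Value,Count,Frequency (%)
(space),5,100.0%

**Most occurring blocks (PROPERTY_TYPE_e)**

Value,Count,Frequency (%)
ASCII,3299406,100.0%

**Most frequent characters, ASCII block (PROPERTY_TYPE_e)**

Value,Count,Frequency (%)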
o,562932,17.1%
u,553844,16.8%
e,504706,15.3%
s,495618,15.0%
H,486535,14.7%
a,135743,4.1%
l,126655,3.8%
t,77512,2.3%
n,76392,2.3%
B,67309,2.0%

### Variable: TOTAL_FLOOR_AREA_e

Statistic,Value
Distinct,15771
Distinct (%),2.5%
Missing,0
Missing (%),0.0%
Infinite,0
Infinite (%),0.0%
Mean,91.82389733
Minimum,1
Maximum,788
Zeros,0
Zeros (%),0.0%
Memory size,4.7 MiB

**Quantile statistics**

Statistic,Value
Minimum,1
5-th percentile,50
Q1,69
Median,84
Q3,105
95-th percentile,161
Maximum,788
Range,787
Interquartile range (IQR),36

**Descriptive statistics**

Statistic,Value
Standard deviation,36.26793152
Coefficient of variation (CV),0.3949726877
Kurtosis,11.61053867
Mean,91.82389733
Median Absolute Deviation (MAD),17
Skewness,2.173138551
Sum,57139991.18
Variance,1315.362857
Monotonicity,Not monotonic

**Common values (TOTAL_FLOOR_AREA_e)**

Value,Count,Frequency (%)
84,10281,1.7%
82,10044,1.6%
83,10013,1.6%
86,9771,1.6%
85,9697,1.6%
78,9664,1.6%
80,9569,1.5%
81,9544,1.5%
77,9397,1.5%
79,9362,1.5%

**Minimum 5 values (TOTAL_FLOOR_AREA_e)**

Value,Count,Frequency (%)
1.0,1,< 0.1%
1.55,1,< 0.1%
2.4,1,< 0.1%
3.42,1,< 0.1%
4.44,1,< 0.1%

**Maximum 5 values (TOTAL_FLOOR_AREA_e)**

Value,Count,Frequency (%)
788.0,2,< 0.1%
783.0,1,< 0.1%
774.0,1,< 0.1%
764.0,1,< 0.1%
750.2,1,< 0.1%

### Variable: NUMBER_HEATED_ROOMS_e

Statistic,Value
Distinct,10
Distinct (%),< 0.1%
Missing,0
Missing (%),0.0%
Infinite,0
Infinite (%),0.0%
Mean,4.566544213
Minimum,0
Maximum,9
Zeros,1887
Zeros (%),0.3%
Memory size,4.7 MiB

**Quantile statistics**

Statistic,Value
Minimum,0
5-th percentile,2
Q1,4
Median,5
Q3,5
95-th percentile,7
Maximum,9
Range,9
Interquartile range (IQR),1

**Descriptive statistics**

Statistic,Value
Standard deviation,1.476658676
Coefficient of variation (CV),0.3233645854
Kurtosis,0.4742862874
Mean,4.566544213
Median Absolute Deviation (MAD),1
Skewness,0.2955743711
Sum,2841660
Variance,2.180520846
Monotonicity,Not monotonic

**Common values (NUMBER_HEATED_ROOMS_e)**

Value,Count,Frequency (%)
5,181047,29.1%
4,166072,26.7%
3,102464,16.5%
6,73861,11.9%
7,38080,6.1%
2,28604,4.6%
8,16732,2.7%
1,7228,1.2%
9,6303,1.0%
0,1887,0.3%

**Minimum 5 values (NUMBER_HEATED_ROOMS_e)**

Value,Count,Frequency (%)
0,1887,0.3%
1,7228,1.2%
2,28604,4.6%
3,102464,16.5%
4,166072,26.7%

**Maximum 5 values (NUMBER_HEATED_ROOMS_e)**

Value,Count,Frequency (%)
9,6303,1.0%
8,16732,2.7%
7,38080,6.1%
6,73861,11.9%
5,181047,29.1%

### Variable: FLOOR_LEVEL_e

Statistic,Value
Distinct,20
Distinct (%),< 0.1%
Missing,533
Missing (%),0.1%
Infinite,0
Infinite (%),0.0%
Mean,0.1301803794
Minimum,-1
Maximum,21
Zeros,575649
Zeros (%),92.5%
Memory size,4.7 MiB

**Quantile statistics**

Statistic,Value
Minimum,-1
5-th percentile,0
Q1,0
Median,0
Q3,0
95-th percentile,1
Maximum,21
Range,22
Interquartile range (IQR),0

**Descriptive statistics**

Statistic,Value
Standard deviation,0.6054417876
Coefficient of variation (CV),4.650791389
Kurtosis,192.2711729
Mean,0.1301803794
Median Absolute Deviation (MAD),0
Skewness,9.941477188
Sum,80939
Variance,0.3665597581
Monotonicity,Not monotonic

**Common values (FLOOR_LEVEL_e)**

Value,Count,Frequency (%)
0,575649,92.5%
1,25341,4.1%
2,12612,2.0%
3,4579,0.7%
4,1606,0.3%
5,760,0.1%
-1,453,0.1%
6,375,0.1%
9,128,< 0.1%
11,55,< 0.1%

**Minimum 5 values (FLOOR_LEVEL_e)**

Value,Count,Frequency (%)
-1,453,0.1%
0,575649,92.5%
1,25341,4.1%
2,12612,2.0%
3,4579,0.7%

**Maximum 5 values (FLOOR_LEVEL_e)**

Value,Count,Frequency (%)
21,32,< 0.1%
20,4,< 0.1%
19,6,< 0.1%
18,7,< 0.1%
17,12,< 0.1%

### Variable: Latitude_m

Statistic,Value
Distinct,390147
Distinct (%),62.7%
Missing,0
Missing (%),0.0%
Infinite,0
Infinite (%),0.0%
Mean,52.40851022
Minimum,0
Maximum,55.78674356
Zeros,16
Zeros (%),< 0.1%
Memory size,4.7 MiB

**Quantile statistics**

Statistic,Value
Minimum,0.0
5-th percentile,50.79251263
Q1,51.46984945
Median,52.34675878
Q3,53.40408633
95-th percentile,54.53728997
Maximum,55.78674356
Range,55.78674356
Interquartile range (IQR),1.934236881

**Descriptive statistics**

Statistic,Value
Standard deviation,1.176932965
Coefficient of variation (CV),0.02245690557
Kurtosis,100.0974824
Mean,52.40851022
Median Absolute Deviation (MAD),0.9309691422
Skewness,-1.961379197
Sum,32612662.92
Variance,1.385171204
Monotonicity,Not monotonic

**Common values (Latitude_m)**

Value,Count,Frequency (%)
53.55373207,34,< 0.1%
53.80343769,26,< 0.1%
51.43733422,26,< 0.1%
52.95566116,25,< 0.1%
53.60610539,23,< 0.1%
54.90896328,22,< 0.1%
53.80325025,21,< 0.1%
53.7894743,20,< 0.1%
51.201557,20,< 0.1%
53.55371597,19,< 0.1%

**Minimum 5 values (Latitude_m)**

Value,Count,Frequency (%)
0.0,16,< 0.1%
49.91276216,1,< 0.1%
49.91332584,1,< 0.1%
49.91398443,2,< 0.1%
49.91430262,1,< 0.1%

**Maximum 5 values (Latitude_m)**

Value,Count,Frequency (%)
55.78674356,1,< 0.1%
55.78599788,1,< 0.1%
55.78509944,2,< 0.1%
55.78477575,1,< 0.1%
55.7840122,2,< 0.1%
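
The latitude tables show 16 rows with a value of exactly 0.0, while the next smallest value is about 49.9, which suggests that zero may act as a placeholder for a missing coordinate. A possible follow-up check is sketched below. The bounding box is a rough assumption for Great Britain, the `suspicious_coordinates` helper is hypothetical, and the second sample row is made up to show a flagged record.

```python
def suspicious_coordinates(rows, lat_range=(49.0, 61.0), lon_range=(-9.0, 2.5)):
    """Return the rows whose coordinates fall outside a rough bounding box.

    The default box is an approximate, assumed extent of Great Britain.
    """
    flagged = []
    for row in rows:
        lat, lon = row["Latitude_m"], row["Longitude_m"]
        inside = (
            lat_range[0] <= lat <= lat_range[1]
            and lon_range[0] <= lon <= lon_range[1]
        )
        if not inside:
            flagged.append(row)
    return flagged


rows = [
    # First row of the dataset sample.
    {"POSTCODE": "OX14 4LA", "Latitude_m": 51.638615, "Longitude_m": -1.314107},
    # Made-up row with a placeholder latitude, for illustration only.
    {"POSTCODE": "ZZ99 9ZZ", "Latitude_m": 0.0, "Longitude_m": -1.5},
]
flagged = suspicious_coordinates(rows)
```

Whether flagged rows should be dropped, re-geocoded, or kept is a modelling decision this sketch does not make.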

### Variable: Longitude_m

Statistic,Value
Distinct,390081
Distinct (%),62.7%
Missing,0
Missing (%),0.0%
Infinite,0
Infinite (%),0.0%
Mean,-1.373857516
Minimum,-6.315782356
Maximum,1.757011771
Zeros,0
Zeros (%),0.0%
Memory size,4.7 MiB

**Quantile statistics**

Statistic,Value
Minimum,-6.315782356
5-th percentile,-3.502808015
Q1,-2.23197609
Median,-1.436574812
Q3,-0.3170242729
95-th percentile,0.7120693897
Maximum,1.757011771
Range,8.072794128
Interquartile range (IQR),1.914951817

**Descriptive statistics**

Statistic,Value
Standard deviation,1.308073664
Coefficient of variation (CV),-0.952117413
Kurtosis,-0.2570506575
Mean,-1.373857516
Median Absolute Deviation (MAD),0.9879051935
Skewness,-0.1202718158
Sum,-854921.3074
Variance,1.711056711
Monotonicity,Not monotonic

**Common values (Longitude_m)**

Value,Count,Frequency (%)
-1.486802079,34,< 0.1%
-1.535111,26,< 0.1%
-2.602802736,26,< 0.1%
-1.142014941,25,< 0.1%
-2.430226807,23,< 0.1%
-1.381406548,22,< 0.1%
-1.535265,21,< 0.1%
-1.549928237,20,< 0.1%
-4.114378,20,< 0.1%
-0.1261449198,19,< 0.1%

**Minimum 5 values (Longitude_m)**

Value,Count,Frequency (%)
-6.315782356,1,< 0.1%
-6.315107,1,< 0.1%
-6.311807224,1,< 0.1%
-6.309106947,2,< 0.1%
-6.301083005,1,< 0.1%

**Maximum 5 values (Longitude_m)**

Value,Count,Frequency (%)
1.757011771,1,< 0.1%
1.75652,1,< 0.1%
1.755024043,1,< 0.1%
1.754483346,1,< 0.1%
1.754417292,1,< 0.1%

### Variable: POSTCODE_AREA

Statistic,Value
Distinct,105
Distinct (%),< 0.1%
Missing,0
Missing (%),0.0%
Memory size,4.7 MiB

**Most frequent values**

Value,Count
B,20786
S,15821
NG,14586
NE,13695
M,12298
Other values (100),545092

**Length**

Statistic,Value
Max length,2.0
Median length,2.0
Mean length,1.885689354
Min length,1.0

**Characters and Unicode**

Statistic,Value
Total characters,1173423
Distinct characters,24
Distinct categories,1
Distinct scripts,1
Distinct blocks,1

**Unique**

Statistic,Value
Unique,0
Unique (%),0.0%

**Sample**

Row,Value
1st row,OX
2nd row,LN
3rd row,M
4th row,DL
5th row,OX

**Common values (POSTCODE_AREA)**

Value,Count,Frequency (%)
B,20786,3.3%
S,15821,2.5%
NG,14586,2.3%
NE,13695,2.2%
M,12298,2.0%
CF,12178,2.0%
PO,11204,1.8%
BS,11087,1.8%
LE,10834,1.7%
PE,10709,1.7%

**Words (POSTCODE_AREA)**

Value,Count,Frequency (%)
b,20786,3.3%
s,15821,2.5%
ng,14586,2.3%
ne,13695,2.2%
m,12298,2.0%
cf,12178,2.0%
po,11204,1.8%
bs,11087,1.8%
le,10834,1.7%
pe,10709,1.7%

**Most occurring characters (POSTCODE_AREA)**

Value,Count,Frequency (%)
S,131634,11.2%
N,110372,9.4%
B,88921,7.6%
L,88298,7.5%
E,74392,6.3%
C,65868,5.6%
T,55392,4.7%
R,54874,4.7%
P,54356,4.6%
D,52210,4.4%

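**Most occurring categories (POSTCODE_AREA)**

Value,Count,Frequency (%)
Uppercase Letter,1173423,100.0%

**Most frequent characters, Uppercase Letter category (POSTCODE_AREA)**

Value,Count,Frequency (%)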
S,131634,11.2%
N,110372,9.4%
B,88921,7.6%
L,88298,7.5%
E,74392,6.3%
C,65868,5.6%
T,55392,4.7%
R,54874,4.7%
P,54356,4.6%
D,52210,4.4%

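**Most occurring scripts (POSTCODE_AREA)**

Value,Count,Frequency (%)
Latin,1173423,100.0%

**Most frequent characters, Latin script (POSTCODE_AREA)**

Value,Count,Frequency (%)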
S,131634,11.2%
N,110372,9.4%
B,88921,7.6%
L,88298,7.5%
E,74392,6.3%
C,65868,5.6%
T,55392,4.7%
R,54874,4.7%
P,54356,4.6%
D,52210,4.4%

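**Most occurring blocks (POSTCODE_AREA)**

Value,Count,Frequency (%)
ASCII,1173423,100.0%

**Most frequent characters, ASCII block (POSTCODE_AREA)**

Value,Count,Frequency (%)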
S,131634,11.2%
N,110372,9.4%
B,88921,7.6%
L,88298,7.5%
E,74392,6.3%
C,65868,5.6%
T,55392,4.7%
R,54874,4.7%
P,54356,4.6%
D,52210,4.4%

### Variable: Price_p

Statistic,Value
Distinct,12182
Distinct (%),2.0%
Missing,0
Missing (%),0.0%
Infinite,0
Infinite (%),0.0%
Mean,262122.3614
Minimum,37750
Maximum,1100000
Zeros,0
Zeros (%),0.0%
Memory size,4.7 MiB

**Quantile statistics**

Statistic,Value
Minimum,37750
5-th percentile,75000
Q1,140000
Median,218000
Q3,330000
95-th percentile,616000
Maximum,1100000
Range,1062250
Interquartile range (IQR),190000

**Descriptive statistics**

Statistic,Value
Standard deviation,172712.61
Coefficient of variation (CV),0.6589007099
Kurtosis,3.27587529
Mean,262122.3614
Median Absolute Deviation (MAD),88000
Skewness,1.635717617
Sum,1.631129788 × 10^11
Variance,2.982964566 × 10^10
Monotonicity,Not monotonic

**Common values (Price_p)**

Value,Count,Frequency (%)
250000,6588,1.1%
150000,6221,1.0%
180000,6195,1.0%
200000,5918,1.0%
160000,5902,0.9%
220000,5877,0.9%
210000,5683,0.9%
170000,5584,0.9%
175000,5571,0.9%
125000,5544,0.9%

**Minimum 5 values (Price_p)**

Value,Count,Frequency (%)
37750,10,< 0.1%
37800,1,< 0.1%
37950,4,< 0.1%
37952,1,< 0.1%
38000,231,< 0.1%

**Maximum 5 values (Price_p)**

Value,Count,Frequency (%)
1100000,385,0.1%
1098500,1,< 0.1%
1098000,1,< 0.1%
1097500,1,< 0.1%
1097000,1,< 0.1%

### Sample

**First rows**

index,unit_indx,POSTCODE,POSTCODE_OUTCODE,POSTTOWN_e,PROPERTY_TYPE_e,TOTAL_FLOOR_AREA_e,NUMBER_HEATED_ROOMS_e,FLOOR_LEVEL_e,Latitude_m,Longitude_m,POSTCODE_AREA,Price_p
0,585322,OX14 4LA,OX14,ABINGDON,House,71.0,3.0,0.0,51.638615,-1.314107,OX,210000.0
1,127649,LN6 7EP,LN6,LINCOLN,House,97.0,1.0,0.0,53.200351,-0.57511,LN,160000.0
2,676794,M29 8RN,M29,MANCHESTER,House,69.68,3.0,0.0,53.513964,-2.457266,M,85000.0
3,289594,DL3 0HE,DL3,DARLINGTON,House,77.34,5.0,0.0,54.547117,-1.549301,DL,40000.0
4,149272,OX12 9HX,OX12,WANTAGE,House,67.0,3.0,0.0,51.593238,-1.440107,OX,215000.0
5,571686,YO8 3UZ,YO8,SELBY,House,202.0,7.0,0.0,53.81284,-1.10719,YO,487000.0
6,135097,NR35 2QQ,NR35,BUNGAY,House,65.0,4.0,0.0,52.469796,1.445647,NR,168000.0
7,709870,NE4 8TQ,NE4,NEWCASTLE UPON TYNE,Flat,57.0,3.0,1.0,54.975188,-1.654788,NE,47000.0
8,2272,TS23 4AH,TS23,BILLINGHAM,House,89.0,4.0,0.0,54.604319,-1.281244,TS,86000.0
9,308568,DN18 5QR,DN18,BARTON-UPON-HUMBER,House,157.0,7.0,0.0,53.685536,-0.438625,DN,250000.0

**Last rows**

index,unit_indx,POSTCODE,POSTCODE_OUTCODE,POSTTOWN_e,PROPERTY_TYPE_e,TOTAL_FLOOR_AREA_e,NUMBER_HEATED_ROOMS_e,FLOOR_LEVEL_e,Latitude_m,Longitude_m,POSTCODE_AREA,Price_p
622268,121111,PR5 4UF,PR5,PRESTON,House,60.8,3.0,0.0,53.730012,-2.67007,PR,120000.0
622269,266940,E17 3DW,E17,LONDON,Flat,41.0,2.0,1.0,51.587929,-0.011556,E,230550.0
622270,297911,FY5 3SW,FY5,THORNTON-CLEVELEYS,Bungalow,72.0,3.0,0.0,53.86671,-3.031645,FY,120000.0
622271,103303,GU12 4BX,GU12,ALDERSHOT,House,93.23,5.0,0.0,51.242584,-0.748817,GU,277500.0
622272,704572,S6 4FE,S6,SHEFFIELD,House,85.0,5.0,0.0,53.407108,-1.510484,S,170000.0
622273,150664,TA9 3ES,TA9,HIGHBRIDGE,Bungalow,83.0,4.0,0.0,51.225737,-2.98682,TA,255000.0
622274,248981,CR0 3HJ,CR0,CROYDON,House,69.0,4.0,0.0,51.38678,-0.123885,CR,305000.0
622275,750304,BD13 2NN,BD13,BRADFORD,House,69.0,3.0,0.0,53.754523,-1.863923,BD,89950.0
622276,12244,DN16 2QS,DN16,SCUNTHORPE,House,85.0,4.0,0.0,53.563553,-0.627013,DN,107500.0
622277,303515,DN14 5JH,DN14,GOOLE,House,131.0,6.0,0.0,53.70691,-0.86401,DN,75000.0
