<img src = "https://global-uploads.webflow.com/5f0d53c042a9ed6288de7f8d/5f6337ae2cfaa10946ceeb06_Hacktiv8%20logo%20horizontal%2001%20black-p-500.png" width = 400>
<h1 align=center><font size = 5>Hacktiv8 PTP Introduction to Data Science Projects 2 // Statistical Treatment for Datasets</font></h1>

Title: Hacktiv8 PTP Introduction to Data Science Projects 2: Statistical Treatment for Datasets Starter Notebook\
Last Updated: September 20, 2020\
Author: Raka Ardhi

## NYC Property Sales Introduction

The aim of this project is to introduce you to practical statistics with Python in as concrete and consistent a way as possible. Using what you’ve learned, download the NYC Property Sales Dataset from Kaggle. This dataset is a record of every building or building unit (apartment, etc.) sold in the New York City property market over a 12-month period.

This dataset contains the location, address, type, sale price, and sale date of building units sold. A reference on the trickier fields:

* `BOROUGH`: A digit code for the borough the property is located in; in order these are Manhattan (1), Bronx (2), Brooklyn (3), Queens (4), and Staten Island (5).
* `BLOCK`, `LOT`: The combination of borough, block, and lot forms a unique key for a property in New York City, commonly called a BBL.
* `BUILDING CLASS AT PRESENT` and `BUILDING CLASS AT TIME OF SALE`: The type of building at various points in time.

Note that because this is a financial transaction dataset, there are some points that need to be kept in mind:

* Many sales occur with a nonsensically small dollar amount, most commonly $0. These sales are actually transfers of deeds between parties: for example, parents transferring ownership of their home to a child after moving out for retirement.
* This dataset uses the financial definition of a building/building unit, for tax purposes. In case a single entity owns the building in question, a sale covers the value of the entire building. In case a building is owned piecemeal by its residents (a condominium), a sale refers to a single apartment (or group of apartments) owned by some individual.

Formulate a question and derive a statistical hypothesis test to answer the question. You have to demonstrate that you’re able to make decisions using data in a scientific manner. Examples of questions can be:

* Is there a difference in units sold between properties built in 1900-2000 and those built from 2001 onward?
* Is there a difference in units sold based on building category?
* What can you discover about New York City real estate by looking at a year's worth of raw transaction records? Can you spot trends in the market?

Please make sure that you have completed the lesson for this course, namely Python and Practical Statistics, which is part of this program.

**Note:** You can take a look at the project rubric below:

| Code Review |  |
| :--- | :--- |
| CRITERIA | SPECIFICATIONS |
| Mean | Student applies the mean to specific columns/data using pandas, numpy, or scipy |
| Median | Student applies the median to specific columns/data using pandas, numpy, or scipy |
| Mode | Student applies the mode to specific columns/data using pandas, numpy, or scipy |
| Central Tendencies | Implementing central tendencies across the dataset |
| Box Plot | Implementing a box plot to visualize specific data |
| Z-Score | Implementing the z-score concept on specific data |
| Probability Distribution | Student analyzes the distribution of the data and gains insight from it |
| Intervals | Implementing confidence or prediction intervals |
| Hypothesis Testing | Make one hypothesis and draw a conclusion from the data |
| Preprocessing | Student preprocesses the dataset before applying the statistical treatment. |
| Does the code run without errors? | The code runs without errors. All code is functional and formatted properly. |

| Readability |  |
| :--- | :--- |
| CRITERIA | SPECIFICATIONS |
| Well Documented | All cells in the notebook are well documented, with markdown above each cell explaining the code |

| Analysis |  |
| :--- | :--- |
| CRITERIA | SPECIFICATIONS |
| Overall Analysis | Gain an insight/conclusion from the overall plots that answers the hypothesis |

**Focus on "Graded-Function" sections.**

------------

## Data Preparation

Load the libraries you need.

Get your NYC property data from [here](https://www.kaggle.com/new-york-city/nyc-property-sales) and load it into a dataframe in your notebook.

In [4]:
# Get your import statement here
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
from datetime import datetime
import math
import statistics
import scipy.stats


In [5]:
# Load your dataset here
df = pd.read_csv('nyc-rolling-sales.csv')
print ('Data read into a pandas dataframe!')

Data read into a pandas dataframe!


Let's view the top 5 rows of the dataset using the `head()` function.

In [6]:
# Write your syntax here
df.head(5)

Unnamed: 0.1,Unnamed: 0,BOROUGH,NEIGHBORHOOD,BUILDING CLASS CATEGORY,TAX CLASS AT PRESENT,BLOCK,LOT,EASE-MENT,BUILDING CLASS AT PRESENT,ADDRESS,...,RESIDENTIAL UNITS,COMMERCIAL UNITS,TOTAL UNITS,LAND SQUARE FEET,GROSS SQUARE FEET,YEAR BUILT,TAX CLASS AT TIME OF SALE,BUILDING CLASS AT TIME OF SALE,SALE PRICE,SALE DATE
0,4,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2A,392,6,,C2,153 AVENUE B,...,5,0,5,1633,6440,1900,2,C2,6625000,2017-07-19 00:00:00
1,5,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2,399,26,,C7,234 EAST 4TH STREET,...,28,3,31,4616,18690,1900,2,C7,-,2016-12-14 00:00:00
2,6,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2,399,39,,C7,197 EAST 3RD STREET,...,16,1,17,2212,7803,1900,2,C7,-,2016-12-09 00:00:00
3,7,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2B,402,21,,C4,154 EAST 7TH STREET,...,10,0,10,2272,6794,1913,2,C4,3936272,2016-09-23 00:00:00
4,8,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2A,404,55,,C2,301 EAST 10TH STREET,...,6,0,6,2369,4615,1900,2,C2,8000000,2016-11-17 00:00:00


We can also view the bottom 5 rows of the dataset using the `tail()` function.

In [8]:
# Write your syntax here
df.tail(5)

Unnamed: 0.1,Unnamed: 0,BOROUGH,NEIGHBORHOOD,BUILDING CLASS CATEGORY,TAX CLASS AT PRESENT,BLOCK,LOT,EASE-MENT,BUILDING CLASS AT PRESENT,ADDRESS,...,RESIDENTIAL UNITS,COMMERCIAL UNITS,TOTAL UNITS,LAND SQUARE FEET,GROSS SQUARE FEET,YEAR BUILT,TAX CLASS AT TIME OF SALE,BUILDING CLASS AT TIME OF SALE,SALE PRICE,SALE DATE
84543,8409,5,WOODROW,02 TWO FAMILY DWELLINGS,1,7349,34,,B9,37 QUAIL LANE,...,2,0,2,2400,2575,1998,1,B9,450000,2016-11-28 00:00:00
84544,8410,5,WOODROW,02 TWO FAMILY DWELLINGS,1,7349,78,,B9,32 PHEASANT LANE,...,2,0,2,2498,2377,1998,1,B9,550000,2017-04-21 00:00:00
84545,8411,5,WOODROW,02 TWO FAMILY DWELLINGS,1,7351,60,,B2,49 PITNEY AVENUE,...,2,0,2,4000,1496,1925,1,B2,460000,2017-07-05 00:00:00
84546,8412,5,WOODROW,22 STORE BUILDINGS,4,7100,28,,K6,2730 ARTHUR KILL ROAD,...,0,7,7,208033,64117,2001,4,K6,11693337,2016-12-21 00:00:00
84547,8413,5,WOODROW,35 INDOOR PUBLIC AND CULTURAL FACILITIES,4,7105,679,,P9,155 CLAY PIT ROAD,...,0,1,1,10796,2400,2006,4,P9,69300,2016-10-27 00:00:00


BOROUGH: A digit code for the borough the property is located in; in order these are Manhattan (1), Bronx (2), Brooklyn (3), Queens (4), and Staten Island (5).

To view the dimensions of the dataframe, we use the `.shape` attribute. Expected result: (84548, 22)

In [9]:
# Write your syntax here
df.shape

(84548, 22)

According to this official page, an easement (the `EASE-MENT` column) "is a right, such as a right of way, which allows an entity to make limited use of another’s real property. For example: MTA railroad tracks that run across a portion of another property". The `Unnamed: 0` column is not mentioned there and was likely just a counter used for iterating through records. So, those two columns are removed for now.

In [10]:
# Drop 'Unnamed: 0' and 'EASE-MENT' features using .drop function
df = df.drop('Unnamed: 0', axis=1)
df = df.drop('EASE-MENT', axis=1)


Let's view the dtype of each feature in the dataframe using the `.info()` function.

In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 84548 entries, 0 to 84547
Data columns (total 20 columns):
BOROUGH                           84548 non-null int64
NEIGHBORHOOD                      84548 non-null object
BUILDING CLASS CATEGORY           84548 non-null object
TAX CLASS AT PRESENT              84548 non-null object
BLOCK                             84548 non-null int64
LOT                               84548 non-null int64
BUILDING CLASS AT PRESENT         84548 non-null object
ADDRESS                           84548 non-null object
APARTMENT NUMBER                  84548 non-null object
ZIP CODE                          84548 non-null int64
RESIDENTIAL UNITS                 84548 non-null int64
COMMERCIAL UNITS                  84548 non-null int64
TOTAL UNITS                       84548 non-null int64
LAND SQUARE FEET                  84548 non-null object
GROSS SQUARE FEET                 84548 non-null object
YEAR BUILT                        84548 non-null int64
TAX

It looks like empty records are not being treated as NA. We convert columns to their appropriate data types to obtain NAs.

In [12]:
#First, let's check which columns should be categorical
print('Column name')
for col in df.columns:
    if df[col].dtype=='object':
        print(col, df[col].nunique())

Column name
NEIGHBORHOOD 254
BUILDING CLASS CATEGORY 47
TAX CLASS AT PRESENT 11
BUILDING CLASS AT PRESENT 167
ADDRESS 67563
APARTMENT NUMBER 3989
LAND SQUARE FEET 6062
GROSS SQUARE FEET 5691
BUILDING CLASS AT TIME OF SALE 166
SALE PRICE 10008
SALE DATE 364


In [13]:
# LAND SQUARE FEET, GROSS SQUARE FEET, SALE PRICE, and BOROUGH should be numeric.
# SALE DATE should be in datetime format.
# Categorical: NEIGHBORHOOD, BUILDING CLASS CATEGORY, TAX CLASS AT PRESENT, BUILDING CLASS AT PRESENT,
# BUILDING CLASS AT TIME OF SALE, TAX CLASS AT TIME OF SALE
# (BOROUGH is a category code too, but it is kept numeric here so it can be filtered by number later.)

numer = ['LAND SQUARE FEET','GROSS SQUARE FEET', 'SALE PRICE', 'BOROUGH']
for col in numer: # coerce for missing values
    df[col] = pd.to_numeric(df[col], errors='coerce')

categ = ['NEIGHBORHOOD', 'BUILDING CLASS CATEGORY', 'TAX CLASS AT PRESENT', 'BUILDING CLASS AT PRESENT', 'BUILDING CLASS AT TIME OF SALE', 'TAX CLASS AT TIME OF SALE']
for col in categ:
    df[col] = df[col].astype('category')

df['SALE DATE'] = pd.to_datetime(df['SALE DATE'], errors='coerce')
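
To see why `errors='coerce'` surfaces the missing values, here is a minimal sketch on a made-up series (the values are illustrative, not read from the dataset). Any entry that cannot be parsed as a number, such as the `-` placeholder visible in `SALE PRICE`, becomes `NaN`.

```python
import pandas as pd

# Hypothetical raw column: numbers stored as text, with '-' marking a blank
raw = pd.Series(['6625000', ' -  ', '3936272', '-'])

# errors='coerce' turns every unparseable entry into NaN instead of raising
clean = pd.to_numeric(raw, errors='coerce')

print(clean)
print(clean.isnull().sum())  # number of entries that became NaN
```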

Our dataset is ready for checking missing values.

In [14]:
missing = df.isnull().sum()/len(df)*100

print(pd.DataFrame([missing[missing>0],pd.Series(df.isnull().sum()[df.isnull().sum()>1000])], index=['percent missing','how many missing']))

                  LAND SQUARE FEET  GROSS SQUARE FEET   SALE PRICE
percent missing          31.049818          32.658372     17.22217
how many missing      26252.000000       27612.000000  14561.00000


Around 30% of GROSS SF and LAND SF are missing. Furthermore, around 17% of SALE PRICE is also missing.

Where only one of the two square-footage columns is missing, we can fill it in from the other, which reduces the number of missing values. Expected values:

(6, 20)

(1366, 20)

In [15]:
print(df[(df['LAND SQUARE FEET'].isnull()) & (df['GROSS SQUARE FEET'].notnull())].shape)
print(df[(df['LAND SQUARE FEET'].notnull()) & (df['GROSS SQUARE FEET'].isnull())].shape)

(6, 20)
(1366, 20)


There are 1372 rows that can be filled in with their approximate values.

In [16]:
df['LAND SQUARE FEET'] = df['LAND SQUARE FEET'].mask((df['LAND SQUARE FEET'].isnull()) & (df['GROSS SQUARE FEET'].notnull()), df['GROSS SQUARE FEET'])
df['GROSS SQUARE FEET'] = df['GROSS SQUARE FEET'].mask((df['LAND SQUARE FEET'].notnull()) & (df['GROSS SQUARE FEET'].isnull()), df['LAND SQUARE FEET'])
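
The two `mask` calls can be hard to read at first, so here is a small sketch of the same idea on a toy frame. The frame and its short column names (`land`, `gross`) are made up for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one value missing on each side, plus a row missing both
toy = pd.DataFrame({'land': [1000.0, np.nan, np.nan],
                    'gross': [np.nan, 2000.0, np.nan]})

# Where 'land' is missing but 'gross' is known, copy 'gross' into 'land'
toy['land'] = toy['land'].mask(toy['land'].isnull() & toy['gross'].notnull(), toy['gross'])
# Same idea in the other direction
toy['gross'] = toy['gross'].mask(toy['gross'].isnull() & toy['land'].notnull(), toy['land'])

print(toy)
```

Rows where both columns are missing stay missing, since there is nothing to copy from.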

In [16]:
# Check for duplicates before dropping them

print(sum(df.duplicated()))

df[df.duplicated(keep=False)].sort_values(['NEIGHBORHOOD', 'ADDRESS']).head(10)

# By default, df.duplicated() flags every repeat of a row but not its first occurrence;
# keep=False flags all copies, so the originals are shown next to their duplicates.

# df.duplicated(subset=[...]) looks for duplicates only in the listed columns.

765


Unnamed: 0,BOROUGH,NEIGHBORHOOD,BUILDING CLASS CATEGORY,TAX CLASS AT PRESENT,BLOCK,LOT,BUILDING CLASS AT PRESENT,ADDRESS,APARTMENT NUMBER,ZIP CODE,RESIDENTIAL UNITS,COMMERCIAL UNITS,TOTAL UNITS,LAND SQUARE FEET,GROSS SQUARE FEET,YEAR BUILT,TAX CLASS AT TIME OF SALE,BUILDING CLASS AT TIME OF SALE,SALE PRICE,SALE DATE
76286,5,ANNADALE,02 TWO FAMILY DWELLINGS,1,6350,7,B2,106 BENNETT PLACE,,10312,2,0,2,8000.0,4208.0,1985,1,B2,,2017-06-27
76287,5,ANNADALE,02 TWO FAMILY DWELLINGS,1,6350,7,B2,106 BENNETT PLACE,,10312,2,0,2,8000.0,4208.0,1985,1,B2,,2017-06-27
76322,5,ANNADALE,05 TAX CLASS 1 VACANT LAND,1B,6459,28,V0,N/A HYLAN BOULEVARD,,0,0,0,0,6667.0,6667.0,0,1,V0,,2017-05-11
76323,5,ANNADALE,05 TAX CLASS 1 VACANT LAND,1B,6459,28,V0,N/A HYLAN BOULEVARD,,0,0,0,0,6667.0,6667.0,0,1,V0,,2017-05-11
76383,5,ARDEN HEIGHTS,01 ONE FAMILY DWELLINGS,1,5741,93,A5,266 ILYSSA WAY,,10312,1,0,1,500.0,1354.0,1996,1,A5,320000.0,2017-06-06
76384,5,ARDEN HEIGHTS,01 ONE FAMILY DWELLINGS,1,5741,93,A5,266 ILYSSA WAY,,10312,1,0,1,500.0,1354.0,1996,1,A5,320000.0,2017-06-06
76643,5,ARROCHAR,02 TWO FAMILY DWELLINGS,1,3103,57,B2,129 MC CLEAN AVENUE,,10305,2,0,2,5000.0,2733.0,1925,1,B2,,2017-03-21
76644,5,ARROCHAR,02 TWO FAMILY DWELLINGS,1,3103,57,B2,129 MC CLEAN AVENUE,,10305,2,0,2,5000.0,2733.0,1925,1,B2,,2017-03-21
50126,4,ASTORIA,03 THREE FAMILY DWELLINGS,1,856,139,C0,22-18 27TH STREET,,11105,3,0,3,2000.0,1400.0,1930,1,C0,,2017-01-12
50127,4,ASTORIA,03 THREE FAMILY DWELLINGS,1,856,139,C0,22-18 27TH STREET,,11105,3,0,3,2000.0,1400.0,1930,1,C0,,2017-01-12


The dataframe has 765 duplicated rows (excluding the original rows).

In [17]:
df.drop_duplicates(inplace=True)

print(sum(df.duplicated()))

0
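
As a quick check of how the `keep` argument changes the count, here is a sketch on a made-up frame (the column names and values are hypothetical).

```python
import pandas as pd

# Hypothetical toy frame: the row ('A', 10) appears three times
toy = pd.DataFrame({'address': ['A', 'A', 'B', 'A'],
                    'price': [10, 10, 20, 10]})

extra_copies = toy.duplicated().sum()          # the first occurrence is not flagged
all_copies = toy.duplicated(keep=False).sum()  # every member of a duplicate group is flagged
deduped = toy.drop_duplicates()                # keeps the first occurrence of each row

print(extra_copies, all_copies, len(deduped))
```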


## Exploratory data analysis

Now, let's get simple descriptive statistics with the `.describe()` function for the rows where `COMMERCIAL UNITS` is 0.

In [18]:
df[df['COMMERCIAL UNITS']==0].describe()

Unnamed: 0,BOROUGH,BLOCK,LOT,ZIP CODE,RESIDENTIAL UNITS,COMMERCIAL UNITS,TOTAL UNITS,LAND SQUARE FEET,GROSS SQUARE FEET,YEAR BUILT,SALE PRICE
count,78777.0,78777.0,78777.0,78777.0,78777.0,78777.0,78777.0,52780.0,52780.0,78777.0,65629.0
mean,3.004329,4273.781015,395.42242,10722.737068,1.691737,0.0,1.724133,3140.14,2714.612,1781.065451,995296.9
std,1.298594,3589.24194,671.604654,1318.493961,9.838994,0.0,9.835016,29299.99,27912.94,551.02457,3329268.0
min,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,2.0,1330.0,23.0,10304.0,0.0,0.0,1.0,1600.0,975.0,1920.0,240000.0
50%,3.0,3340.0,52.0,11209.0,1.0,0.0,1.0,2295.0,1600.0,1940.0,529490.0
75%,4.0,6361.0,1003.0,11357.0,2.0,0.0,2.0,3300.0,2388.0,1967.0,921956.0
max,5.0,16322.0,9106.0,11694.0,889.0,0.0,889.0,4252327.0,4252327.0,2017.0,345000000.0


Let us try to understand the columns. The table above shows descriptive statistics for the numeric columns.

- Some ZIP codes have the value 0.
- Can block/lot numbers go up to 16322?
- Most of the properties have 2 units or fewer, yet the maximum is 1844 units? The latter might mean some company purchased a whole building. This should be treated as an outlier.
- Other columns also have outliers, which need further investigation.
- The `YEAR BUILT` column contains the value 0.
- Most sale prices below 10000 can be treated as gifts or transfer fees.
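
These observations suggest a possible cleanup step before computing statistics. The sketch below runs on made-up rows and rests on two assumptions that you should check against your own question: the 10000 cut-off from the list above marks non-market transfers, and a `YEAR BUILT` of 0 means "unknown". The constant name `MIN_MARKET_PRICE` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical rows that mimic the issues listed above
toy = pd.DataFrame({
    'SALE PRICE': [0.0, 10.0, 450000.0, np.nan, 8000000.0],
    'YEAR BUILT': [1900, 0, 0, 1925, 2006],
})

# Assumption: prices below 10,000 are deed transfers, not market sales.
# Rows with a missing price also drop out, because NaN >= x is False.
MIN_MARKET_PRICE = 10_000
market = toy[toy['SALE PRICE'] >= MIN_MARKET_PRICE].copy()

# Assumption: a year of 0 means "unknown", so mark it as missing
market['YEAR BUILT'] = market['YEAR BUILT'].replace(0, np.nan)

print(market)
```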

Now, let's get simple descriptive statistics with the `.describe()` function for the rows where `RESIDENTIAL UNITS` is not 0.

The function below is graded. (1 point)

In [19]:
# Write your function below
# Graded-Function Begin (~1 Lines)
df[df['RESIDENTIAL UNITS']!=0][['BLOCK','LOT','RESIDENTIAL UNITS','LAND SQUARE FEET','GROSS SQUARE FEET','YEAR BUILT','SALE PRICE']].describe()
# Graded-Function End

Unnamed: 0,BLOCK,LOT,RESIDENTIAL UNITS,LAND SQUARE FEET,GROSS SQUARE FEET,YEAR BUILT,SALE PRICE
count,59237.0,59237.0,59237.0,48231.0,48231.0,59237.0,48752.0
mean,4610.712342,414.339788,2.828705,3156.0,3287.738,1837.171987,1130316.0
std,3686.816317,687.593893,19.645674,28142.0,29824.01,456.570222,4535920.0
min,1.0,1.0,1.0,0.0,0.0,0.0,0.0
25%,1490.0,26.0,1.0,1850.0,1248.0,1920.0,290000.0
50%,3937.0,56.0,1.0,2422.0,1800.0,1935.0,580000.0
75%,6713.0,1009.0,2.0,3500.0,2600.0,1979.0,980000.0
max,16322.0,9106.0,1844.0,4228300.0,3750565.0,2017.0,620000000.0


Write your findings below:
1. Some ZIP codes have the value 0.
2. The `YEAR BUILT` column contains 0. Does that mean some properties have not been built yet?
3. Can the block number be higher than 16317?
4. For `RESIDENTIAL UNITS`, the minimum is 1, the 25th percentile is 1, and the median is 1. Is that plausible?
5. Some sale prices are 0. Does that mean the property has not been sold?

Use the `.value_counts()` function to count how many rows fall in each `BOROUGH`. Expected value:

4    26548\
3    23843\
1    18102\
5     8296\
2     6994\
Name: BOROUGH, dtype: int64

In [26]:
# Write your syntax below
df['BOROUGH'].value_counts()


4    26548
3    23843
1    18102
5     8296
2     6994
Name: BOROUGH, dtype: int64

From here, we can calculate the mean for each borough. Use the `.mean()` function.

The function below is graded. (1 point)

In [64]:
# Write your function below
# Graded-Function Begin (~1 Lines)
# Loop over the borough codes 1..5 and print the column means of each subset.
# (On pandas 2.x, call .mean(numeric_only=True) so the text columns are skipped.)
for i in range(1,len(df['BOROUGH'].value_counts())+1):
    print(df.loc[df['BOROUGH']==i].mean())
# Graded-Function End

BOROUGH                      1.000000e+00
BLOCK                        1.107658e+03
LOT                          7.491904e+02
ZIP CODE                     9.912566e+03
RESIDENTIAL UNITS            2.276931e+00
COMMERCIAL UNITS             2.805215e-01
TOTAL UNITS                  2.597227e+00
LAND SQUARE FEET             5.646946e+03
GROSS SQUARE FEET            3.262300e+04
YEAR BUILT                   1.706537e+03
TAX CLASS AT TIME OF SALE    2.120539e+00
SALE PRICE                   3.344642e+06
dtype: float64
BOROUGH                           2.000000
BLOCK                          4202.934372
LOT                             298.625679
ZIP CODE                      10360.980841
RESIDENTIAL UNITS                 3.343580
COMMERCIAL UNITS                  0.160280
TOTAL UNITS                       3.510152
LAND SQUARE FEET               3909.012725
GROSS SQUARE FEET              4489.147412
YEAR BUILT                     1750.578067
TAX CLASS AT TIME OF SALE         1.525736
SALE PRI
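
The per-borough loop can also be written with `groupby`, which returns one table with a row per borough instead of a series of printouts. This is only a sketch on made-up rows; the neighborhood names and prices are illustrative.

```python
import pandas as pd

# Hypothetical toy data: two boroughs, one text column, one numeric column
toy = pd.DataFrame({
    'BOROUGH': [1, 1, 2, 2, 2],
    'NEIGHBORHOOD': ['CHELSEA', 'SOHO', 'RIVERDALE', 'RIVERDALE', 'FORDHAM'],
    'SALE PRICE': [1_000_000.0, 3_000_000.0, 300_000.0, 500_000.0, 400_000.0],
})

by_borough = toy.groupby('BOROUGH')

# numeric_only=True skips the text column instead of raising an error
mean_table = by_borough.mean(numeric_only=True)
median_table = by_borough.median(numeric_only=True)

print(mean_table)
print(median_table)
```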

From here, we can calculate the median for each borough. Use the `.median()` function.

The function below is graded. (1 point)

In [65]:
# Write your function below
# Graded-Function Begin (~1 Lines)
# Loop over the borough codes 1..5 and print the column medians of each subset.
# (On pandas 2.x, call .median(numeric_only=True) so the text columns are skipped.)
for i in range(1,len(df['BOROUGH'].value_counts())+1):
    print(df.loc[df['BOROUGH']==i].median())
# Graded-Function End

BOROUGH                            1.0
BLOCK                           1170.0
LOT                             1004.0
ZIP CODE                       10022.0
RESIDENTIAL UNITS                  0.0
COMMERCIAL UNITS                   0.0
TOTAL UNITS                        1.0
LAND SQUARE FEET                2498.0
GROSS SQUARE FEET               7520.0
YEAR BUILT                      1937.0
TAX CLASS AT TIME OF SALE          2.0
SALE PRICE                   1155000.0
dtype: float64
BOROUGH                           2.0
BLOCK   

From here, we can calculate the mode for each borough.

The function below is graded. (1 point)

In [67]:
# Write your function below
# Graded-Function Begin (~1 Lines)
# Loop over the borough codes 1..5 and print the most frequent value of each column
for i in range(1,len(df['BOROUGH'].value_counts())+1):
    print(df.loc[df['BOROUGH']==i].mode())
# Graded-Function End

   BOROUGH             NEIGHBORHOOD  \
0        1  UPPER EAST SIDE (59-79)   

                       BUILDING CLASS CATEGORY TAX CLASS AT PRESENT  BLOCK  \
0  13 CONDOS - ELEVATOR APARTMENTS                                2     16   

   LOT BUILDING CLASS AT PRESENT                 ADDRESS APARTMENT NUMBER  \
0    1                        R4  169 WEST 95TH   STREET                    

   ZIP CODE  RESIDENTIAL UNITS  COMMERCIAL UNITS  TOTAL UNITS  \
0     10011                  0                 0            1   

   LAND SQUARE FEET  GROSS SQUARE FEET  YEAR BUILT TAX CLASS AT TIME OF SALE  \
0            2523.0           112850.0           0                         2   

  BUILDING CLASS AT TIME OF SALE  SALE PRICE  SALE DATE  
0                             R4        10.0 2017-08-07  
   BOROUGH NEIGHBORHOOD                      BUILDING CLASS CATEGORY  \
0        2    RIVERDALE  02 TWO FAMILY DWELLINGS                       

  TAX CLASS AT PRESENT  BLOCK  LOT BUILDING CLASS AT PRE
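
Keep in mind that a column can have more than one mode when values tie. The sketch below, on made-up numbers, collects every mode of one column per borough.

```python
import pandas as pd

# Hypothetical toy data: borough 2 has a tie between 1 and 2
toy = pd.DataFrame({
    'BOROUGH': [1, 1, 1, 2, 2, 2, 2],
    'TOTAL UNITS': [1, 1, 3, 2, 2, 1, 1],
})

# Series.mode() returns every value tied for most frequent, sorted ascending
modes = toy.groupby('BOROUGH')['TOTAL UNITS'].agg(lambda s: s.mode().tolist())

print(modes)
```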

From here, we can calculate the range for each borough.

The function below is graded. (1 point)

In [44]:
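# Write your function below
# Graded-Function Begin (~1 Lines)
# A DataFrame has no .range() method, so the range is computed as max - min
# over the numeric columns of each borough subset.
for i in range(1,len(df['BOROUGH'].value_counts())+1):
    numeric = df.loc[df['BOROUGH']==i].select_dtypes(include='number')
    print(numeric.max() - numeric.min())
# Graded-Function End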

From here, we can calculate the variance for each borough.

The function below is graded. (1 point)

In [None]:
# Write your function below

# Graded-Function Begin (~1 Lines)

# Graded-Function End
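
One thing to watch when computing variance is the divisor. The following sketch, on made-up numbers rather than the dataset, shows that pandas defaults to the sample variance while NumPy defaults to the population variance.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three prices per borough
toy = pd.DataFrame({
    'BOROUGH': [1, 1, 1, 2, 2, 2],
    'SALE PRICE': [2.0, 4.0, 6.0, 10.0, 10.0, 13.0],
})

# pandas defaults to the sample variance (ddof=1, divides by n - 1)
sample_var = toy.groupby('BOROUGH')['SALE PRICE'].var()

# NumPy defaults to the population variance (ddof=0, divides by n)
borough_1 = toy.loc[toy['BOROUGH'] == 1, 'SALE PRICE'].to_numpy()
pop_var_b1 = np.var(borough_1)

print(sample_var)
print(pop_var_b1)
```

For a sample drawn from a larger market, the `ddof=1` version is usually the one you want.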

From here, we can calculate the standard deviation (SD) for each borough.

The function below is graded. (1 point)

In [None]:
# Write your function below

# Graded-Function Begin (~1 Lines)

# Graded-Function End
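
The standard deviation is also the building block of the z-score listed in the rubric. Here is a minimal sketch on made-up prices; the cut-off of 2 is an assumption chosen for this tiny example, not a rule.

```python
import numpy as np
import pandas as pd

# Hypothetical prices with one obviously extreme value
prices = pd.Series([200.0, 220.0, 210.0, 190.0, 205.0, 215.0, 195.0, 1000.0])

sd = prices.std()                   # sample standard deviation (ddof=1)
z = (prices - prices.mean()) / sd   # z-score of every observation

# Assumption: |z| > 2 is used as the outlier cut-off for this small example
outliers = prices[z.abs() > 2]

print(round(sd, 2))
print(outliers)
```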

Now we can analyze the probability distribution below.

The function below is graded. (1 point)

In [None]:
# Write your function below

# Graded-Function Begin





# Graded-Function End
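
Price data is often strongly right-skewed, and a log transform is one common way to make it more symmetric. The sketch below uses synthetic log-normal numbers as a stand-in for sale prices; whether the real `SALE PRICE` column behaves this way is something to verify, not assume.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-in for sale prices: log-normal, so strongly right-skewed
prices = rng.lognormal(mean=13.0, sigma=0.8, size=5000)

skew_raw = stats.skew(prices)          # large positive value: long right tail
skew_log = stats.skew(np.log(prices))  # close to 0: roughly symmetric

print(f"skewness of raw prices: {skew_raw:.2f}")
print(f"skewness of log prices: {skew_log:.2f}")
```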

Now we can analyze confidence intervals below.

The function below is graded. (1 point)

In [None]:
# Write your function below

# Graded-Function Begin





# Graded-Function End
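
One possible approach is a t-based confidence interval for a mean. This is a sketch on a synthetic sample, with the 95% level chosen as an assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic stand-in for a sample of sale prices
sample = rng.normal(loc=500_000, scale=100_000, size=200)

n = len(sample)
mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean (uses ddof=1)

# 95% interval from the t distribution with n - 1 degrees of freedom
low, high = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)

print(f"mean = {mean:,.0f}, 95% CI = ({low:,.0f}, {high:,.0f})")
```

The interval describes uncertainty about the mean, not the spread of individual prices; for a single future sale you would need a prediction interval instead.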

Make your hypothesis test below.

The function below is graded. (1 point)

In [None]:
# Write your function below

# Graded-Function Begin





# Graded-Function End
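
As an illustration of the workflow (state the hypotheses, run the test, compare the p-value with a chosen significance level), here is a sketch using Welch's t-test on two synthetic groups. The group names, the numbers, and the 0.05 level are all assumptions; swap in the groups from your own question.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Synthetic stand-ins for two groups, e.g. units sold in older vs newer buildings
older = rng.normal(loc=2.0, scale=1.0, size=300)
newer = rng.normal(loc=3.0, scale=1.5, size=300)

# H0: the two group means are equal; H1: they differ.
# equal_var=False selects Welch's t-test, which does not assume equal variances
t_stat, p_value = stats.ttest_ind(older, newer, equal_var=False)

ALPHA = 0.05  # assumed significance level
decision = "reject H0" if p_value < ALPHA else "fail to reject H0"

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print(decision)
```

If the real data is heavily skewed, a rank-based test such as `scipy.stats.mannwhitneyu` may be a better fit than a t-test.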

Write your final conclusion below.

Your conclusion below is graded. (1 point)