_**DELETE BEFORE PUBLISHING**_

_This is a template also containing the style guide for use cases. The styling uses the use-case css when uploaded to the website, which will not be visible on your local machine._

_Change any text marked with {} and delete any cells marked DELETE_

***

In [1]:
# DELETE BEFORE PUBLISHING
# This is just here so you can preview the styling on your local machine

from IPython.core.display import HTML
HTML("""
<style>
.usecase-title, .usecase-duration, .usecase-section-header {
    padding-left: 15px;
    padding-bottom: 10px;
    padding-top: 10px;
    padding-right: 15px;
    background-color: #0f9295;
    color: #fff;
}

.usecase-title {
    font-size: 1.7em;
    font-weight: bold;
}

.usecase-authors, .usecase-level, .usecase-skill {
    padding-left: 15px;
    padding-bottom: 7px;
    padding-top: 7px;
    background-color: #baeaeb;
    font-size: 1.4em;
    color: #121212;
}

.usecase-level-skill  {
    display: flex;
}

.usecase-level, .usecase-skill {
    width: 50%;
}

.usecase-duration, .usecase-skill {
    text-align: right;
    padding-right: 15px;
    padding-bottom: 8px;
    font-size: 1.4em;
}

.usecase-section-header {
    font-weight: bold;
    font-size: 1.5em;
}

.usecase-subsection-header, .usecase-subsection-blurb {
    font-weight: bold;
    font-size: 1.2em;
    color: #121212;
}

.usecase-subsection-blurb {
    font-size: 1em;
    font-style: italic;
}
</style>
""")

<div class="usecase-title">Exploring Business Establishments in the City of Melbourne by ANZSIC4 Classification and CLUE Small Area</div>

<div class="usecase-authors"><b>Authored by: </b> Naga Nikhil Woopalanchi</div>

<div class="usecase-duration"><b>Duration:</b> {90} mins</div>

<div class="usecase-level-skill">
    <div class="usecase-level"><b>Level: </b>{Intermediate}</div>
    <div class="usecase-skill"><b>Pre-requisite Skills: </b>{Python, and add any more skills needed}</div>
</div>

<div class="usecase-section-header">Scenario</div>

{Write a description of the problem you are trying to solve for this use case, using User Story format.}

<div class="usecase-section-header">What this use case will teach you</div>

At the end of this use case you will:
- {list the skills demonstrated in your use case}

<div class="usecase-section-header">{Heading for introduction or background relating to problem}</div>

{Write your introduction here. Keep it concise. We're not after "War and Peace" but enough background information to inform the reader on the rationale for solving this problem or background non-technical information that helps explain the approach. You may also wish to give information on the datasets, particularly how to source those not being imported from the client's open data portal.}



***

_**DELETE BEFORE PUBLISHING**_

## Style guide for use cases

### Headers

For styling within your markdown cells, there are two choices you can use for headers.

1) You can use HTML classes specific to the use case styling:

```<p class="usecase-subsection-header">This is a subsection header.</p>```

<p style="font-weight: bold; font-size: 1.2em;">This is a subsection header.</p>

```<p class="usecase-subsection-blurb">This is a blurb header.</p>```

<p style="font-weight: bold; font-size: 1em; font-style:italic;">This is a blurb header.</p>


2) Or if you like you can use the markdown header styles:

```# for h1```

```## for h2```

```### for h3```

```#### for h4```

```##### for h5```

## Plot colour schemes

General advice:
1. Use the same colour or colour palette throughout your notebook, unless variety is necessary
2. Select a palette based on the type of data being represented
3. Consider accessibility (colourblindness, low vision)

#### 1) If all of your plots use only 1-2 colours, use one of the company style colours:

| Light theme | Dark Theme |
|-----|-----|
|<p style="color:#2af598;">#2af598</p>|<p style="color:#08af64;">#08af64</p>|
|<p style="color:#22e4ac;">#22e4ac</p>|<p style="color:#14a38e;">#14a38e</p>|
|<p style="color:#1bd7bb;">#1bd7bb</p>|<p style="color:#0f9295;">#0f9295</p>|
|<p style="color:#14c9cb;">#14c9cb</p>|<p style="color:#056b8a;">#056b8a</p>|
|<p style="color:#0fbed8;">#0fbed8</p>|<p style="color:#121212;">#121212</p>|
|<p style="color:#08b3e5;">#08b3e5</p>||


#### 2) If your plot needs multiple colours, choose an appropriate palette using either of the following tutorials:
- https://seaborn.pydata.org/tutorial/color_palettes.html
- https://matplotlib.org/stable/tutorials/colors/colormaps.html

#### 3) Consider accessibility as well.

For qualitative plotting, Seaborn's 'colorblind' palette is recommended. For maps with sequential or diverging data, one of the ColorBrewer schemes is recommended; they can be previewed at https://colorbrewer2.org/.

If you want to design your own colour scheme, it should use the same principles as Cynthia Brewer's research (with variation not only in hue but also in saturation or luminance).

### References

Be sure to acknowledge your sources and any attributions using links or a reference list.

If you have quite a few references, you might wish to have a dedicated section for references at the end of your document, linked using footnote style numbers.

You can connect your in-text reference by adding the number with an HTML link: ```<a href="#fn-1">[1]</a>```

and add a matching ID in the reference list using the ```<fn>``` tag: ```<fn id="fn-1">[1] Author (Year) _Title_, Publisher, Publication location.</fn>```

# Importing Libraries

In [2]:
import warnings
warnings.filterwarnings("ignore")

import requests
import numpy as np
import pandas as pd
from io import StringIO

In [3]:
# business-establishments-and-jobs-data-by-business-size-and-industry

base_url='https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
dataset_id='business-establishments-and-jobs-data-by-business-size-and-industry'


url=f'{base_url}{dataset_id}/exports/csv'
params={'select':'*','limit':-1,'lang':'en','timezone':'UTC' }

response=requests.get(url,params=params)

if response.status_code==200:
    url_content=response.content.decode('utf-8')
    bizsize_industryjobs=pd.read_csv(StringIO(url_content),delimiter=';')   # parse the semicolon-delimited CSV export
    print(bizsize_industryjobs.head(10))
else:
    print(f'Request failed with status code {response.status_code}')

   census_year               clue_small_area  \
0         2015  West Melbourne (Residential)   
1         2015  West Melbourne (Residential)   
2         2015  West Melbourne (Residential)   
3         2015  West Melbourne (Residential)   
4         2015  West Melbourne (Residential)   
5         2015  West Melbourne (Residential)   
6         2014                       Carlton   
7         2014                       Carlton   
8         2014                       Carlton   
9         2014                       Carlton   

                                   anzsic_indusrty  \
0                Health Care and Social Assistance   
1                                    Manufacturing   
2                                    Manufacturing   
3  Professional, Scientific and Technical Services   
4          Rental, Hiring and Real Estate Services   
5                                  Wholesale Trade   
6              Administrative and Support Services   
7                                     C

In [4]:
# business-establishments-with-address-and-industry-classification

base_url='https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
dataset_id='business-establishments-with-address-and-industry-classification'


url=f'{base_url}{dataset_id}/exports/csv'
params={'select':'*','limit':-1,'lang':'en','timezone':'UTC'}

response=requests.get(url,params=params)

if response.status_code==200:
    url_content=response.content.decode('utf-8')
    bizaddressindustry=pd.read_csv(StringIO(url_content),delimiter=';')    # parse the semicolon-delimited CSV export
    print(bizaddressindustry.head(10))
else:
    print(f'Request failed with status code {response.status_code}')

   census_year  block_id  property_id  base_property_id  clue_small_area  \
0         2003       105       100172            100172  Melbourne (CBD)   
1         2003       105       103301            103301  Melbourne (CBD)   
2         2003       105       103302            103302  Melbourne (CBD)   
3         2003       105       103302            103302  Melbourne (CBD)   
4         2003       105       103302            103302  Melbourne (CBD)   
5         2003       105       103302            103302  Melbourne (CBD)   
6         2003       105       103302            103302  Melbourne (CBD)   
7         2003       105       103302            103302  Melbourne (CBD)   
8         2003       105       103302            103302  Melbourne (CBD)   
9         2003       105       109319            109319  Melbourne (CBD)   

                                        trading_name  \
0                           Wilson Parking Australia   
1                Melbourne International Backpacker

In [5]:
# street-addresses

base_url='https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
dataset_id='street-addresses'


url=f'{base_url}{dataset_id}/exports/csv'
params={'select':'*','limit':-1,'lang':'en','timezone':'UTC'}

response=requests.get(url,params=params)

if response.status_code==200:
    url_content=response.content.decode('utf-8')
    street_add=pd.read_csv(StringIO(url_content),delimiter=';')  # parse the semicolon-delimited CSV export
    print(street_add.head(10))
else:
    print(f'Request failed with status code {response.status_code}')

                         geo_point_2d  \
0  -37.802381557572, 144.941473440919   
1  -37.816860132435, 144.969991449806   
2  -37.798830275265, 144.942872100233   
3  -37.810546771396, 144.970906397029   
4  -37.789293657657, 144.939794028368   
5  -37.800782747366, 144.951363910142   
6  -37.826775193662, 144.959358160779   
7  -37.810925617843, 144.965443591832   
8   -37.81275302482, 144.964263891172   
9   -37.821209833971, 144.95403377339   

                                           geo_shape  suburb_id   latitude  \
0  {"coordinates": [144.941473440919, -37.8023815...      592.0 -37.802382   
1  {"coordinates": [144.969991449806, -37.8168601...      591.0 -37.816860   
2  {"coordinates": [144.942872100233, -37.7988302...      592.0 -37.798830   
3  {"coordinates": [144.970906397029, -37.8105467...      591.0 -37.810547   
4  {"coordinates": [144.939794028368, -37.7892936...      592.0 -37.789294   
5  {"coordinates": [144.951363910142, -37.8007827...      592.0 -37.800783   
6 
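
The three download cells above repeat the same request-and-parse steps. A minimal sketch of how they could be folded into one helper follows. The names `build_export_url`, `parse_export` and `fetch_dataset` are hypothetical and not part of the original notebook, the `timeout` value is an assumption, and the request parameters are copied from the cells above.

```python
from io import StringIO

import pandas as pd
import requests

BASE_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
EXPORT_PARAMS = {'select': '*', 'limit': -1, 'lang': 'en', 'timezone': 'UTC'}


def build_export_url(dataset_id):
    """Return the CSV export URL for one dataset on the open data portal."""
    return f'{BASE_URL}{dataset_id}/exports/csv'


def parse_export(csv_text):
    """Parse the semicolon-delimited CSV text returned by the export endpoint."""
    return pd.read_csv(StringIO(csv_text), delimiter=';')


def fetch_dataset(dataset_id, timeout=60):
    """Download one dataset and return it as a DataFrame (hypothetical helper)."""
    response = requests.get(
        build_export_url(dataset_id), params=EXPORT_PARAMS, timeout=timeout
    )
    response.raise_for_status()  # raise an error instead of printing the status code
    return parse_export(response.content.decode('utf-8'))
```

With a helper like this, each dataset could be loaded in one line, for example `street_add = fetch_dataset('street-addresses')`.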

# 1. Preprocessing

## Street addresses dataset

A snippet of the street addresses dataset.

In [6]:
street_add

Unnamed: 0,geo_point_2d,geo_shape,suburb_id,latitude,street_no,str_name,address_pnt,easting,northing,gisid,longitude,suburb,street_id,add_comp
0,"-37.802381557572, 144.941473440919","{""coordinates"": [144.941473440919, -37.8023815...",592.0,-37.802382,133,Laurens Street,133 Laurens Street North Melbourne,318773.161972,5.814115e+06,48531,144.941473,North Melbourne,781,
1,"-37.816860132435, 144.969991449806","{""coordinates"": [144.969991449806, -37.8168601...",591.0,-37.816860,129,Flinders Street,129 Flinders Street Melbourne,321318.983104,5.812563e+06,37711,144.969991,Melbourne,636,
2,"-37.798830275265, 144.942872100233","{""coordinates"": [144.942872100233, -37.7988302...",592.0,-37.798830,44,Macaulay Road,44 Macaulay Road North Melbourne,318887.633593,5.814512e+06,30476,144.942872,North Melbourne,847,
3,"-37.810546771396, 144.970906397029","{""coordinates"": [144.970906397029, -37.8105467...",591.0,-37.810547,13,Punch Lane,13 Punch Lane Melbourne,321384.307636,5.813265e+06,35165,144.970906,Melbourne,1003,
4,"-37.789293657657, 144.939794028368","{""coordinates"": [144.939794028368, -37.7892936...",592.0,-37.789294,61,Racecourse Road,61 Racecourse Road North Melbourne,318593.277492,5.815564e+06,22247,144.939794,North Melbourne,119624,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
63716,"-37.799002138708, 144.974875728308","{""coordinates"": [144.974875728308, -37.7990021...",0.0,-37.799002,190,Nicholson Street,190 Nicholson Street,321705.987201,5.814554e+06,23688,144.974876,,931,
63717,"-37.812386139394, 144.97001218945","{""coordinates"": [144.97001218945, -37.81238613...",591.0,-37.812386,109,Bourke Street,109 Bourke Street Melbourne,321310.019913,5.813060e+06,16739,144.970012,Melbourne,455,
63718,"-37.790814400126, 144.928709232409","{""coordinates"": [144.928709232409, -37.7908144...",590.0,-37.790814,23,McConnell Street,23 McConnell Street Kensington,317620.864400,5.815374e+06,27952,144.928709,Kensington,119634,
63719,"-37.793946584931, 144.92906362331","{""coordinates"": [144.92906362331, -37.79394658...",590.0,-37.793947,516A,Macaulay Road,516A Macaulay Road Kensington,317659.774071,5.815027e+06,40114,144.929064,Kensington,847,


In [7]:
street_add.info()
print()
print(f'The shape of the dataset : {street_add.shape}')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 63721 entries, 0 to 63720
Data columns (total 14 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   geo_point_2d  63721 non-null  object 
 1   geo_shape     63721 non-null  object 
 2   suburb_id     63715 non-null  float64
 3   latitude      63721 non-null  float64
 4   street_no     63069 non-null  object 
 5   str_name      63720 non-null  object 
 6   address_pnt   63721 non-null  object 
 7   easting       63721 non-null  float64
 8   northing      63721 non-null  float64
 9   gisid         63721 non-null  int64  
 10  longitude     63721 non-null  float64
 11  suburb        62991 non-null  object 
 12  street_id     63721 non-null  int64  
 13  add_comp      1352 non-null   object 
dtypes: float64(5), int64(2), object(7)
memory usage: 6.8+ MB

The shape of the dataset : (63721, 14)


The dataset has 63,721 rows and 14 columns: five float64, two int64 and seven object columns.

In [8]:
street_add.isna().sum()

geo_point_2d        0
geo_shape           0
suburb_id           6
latitude            0
street_no         652
str_name            1
address_pnt         0
easting             0
northing            0
gisid               0
longitude           0
suburb            730
street_id           0
add_comp        62369
dtype: int64

The street addresses dataset has many missing values, especially in the add_comp column, which is missing 62,369 values. The suburb and street_no columns are missing 730 and 652 values respectively, suburb_id is missing 6, and str_name is missing only 1.
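
Raw counts are easier to compare when shown next to the share of rows they represent. The sketch below computes both on a toy frame; the values in `sample` are hypothetical stand-ins for `street_add`, not the real data.

```python
import pandas as pd

# Toy stand-in for street_add (hypothetical values, not the real data).
sample = pd.DataFrame({
    'street_no': ['133', None, '44', '13'],
    'suburb': ['North Melbourne', 'Melbourne', None, None],
    'add_comp': [None, None, None, 'Shop 18'],
})

# Count and percentage of missing values per column, most affected first.
missing = pd.DataFrame({
    'missing': sample.isna().sum(),
    'share_pct': (sample.isna().mean() * 100).round(1),
}).sort_values('missing', ascending=False)

print(missing)
```

Applied to the real `street_add` frame, the same two expressions would show at a glance which columns are mostly empty.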

In [9]:
street_add_comp = street_add[street_add['add_comp'].notnull()]

street_add_comp

Unnamed: 0,geo_point_2d,geo_shape,suburb_id,latitude,street_no,str_name,address_pnt,easting,northing,gisid,longitude,suburb,street_id,add_comp
13,"-37.814712070784, 144.96474438094","{""coordinates"": [144.96474438094, -37.81471207...",591.0,-37.814712,317,Bourke Street,Shop 23-27 317 Bourke Street Melbourne,320851.894758,5.812791e+06,48286,144.964744,Melbourne,455,Shop 23-27
45,"-37.819498016204, 144.987284407959","{""coordinates"": [144.987284407959, -37.8194980...",587.0,-37.819498,,Vale Street South,Substation 46 Vale Street South East Melbourne,322847.571365,5.812303e+06,40880,144.987284,East Melbourne,1148,Substation 46
53,"-37.823899191969, 144.946073083263","{""coordinates"": [144.946073083263, -37.8238991...",599.0,-37.823899,,River Esplanade,"Berth 27, Marina YE River Esplanade Docklands",319230.644967,5.811736e+06,42376,144.946073,Docklands,117906,"Berth 27, Marina YE"
193,"-37.796884374544, 144.960863447326","{""coordinates"": [144.960863447326, -37.7968843...",593.0,-37.796884,,Grattan Street,"Shop 7, Ground Union House Building 130, Unive...",320467.073318,5.814762e+06,39649,144.960863,Parkville,674,"Shop 7, Ground Union House Building 130, Unive..."
251,"-37.812389009429, 144.966003397369","{""coordinates"": [144.966003397369, -37.8123890...",591.0,-37.812389,236,Bourke Street,Shop 18 236 Bourke Street Melbourne,320957.115184,5.813052e+06,40904,144.966003,Melbourne,455,Shop 18
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
63441,"-37.824747518381, 144.953365237445","{""coordinates"": [144.953365237445, -37.8247475...",840.0,-37.824748,30,Lorimer Street,Melbourne Maritime Museum 30 Lorimer Street S...,319874.570376,5.811656e+06,41202,144.953365,South Wharf,1370,Melbourne Maritime Museum
63531,"-37.798935727815, 144.923622676394","{""coordinates"": [144.923622676394, -37.7989357...",590.0,-37.798936,1A,Childers Street,The Bill Vanina Sports Pavilion 1A Childers St...,317192.962512,5.814462e+06,42710,144.923623,Kensington,507,The Bill Vanina Sports Pavilion
63623,"-37.813068925214, 144.937529050498","{""coordinates"": [144.937529050498, -37.8130689...",599.0,-37.813069,,Star Crescent,Ground 27 & 29 Star Crescent Docklands,318452.052502,5.812921e+06,48459,144.937529,Docklands,119760,Ground 27 & 29
63651,"-37.823699932356, 144.943722829464","{""coordinates"": [144.943722829464, -37.8236999...",599.0,-37.823700,,River Esplanade,"Berth 122, Marina YE River Esplanade Docklands",319023.286356,5.811754e+06,42212,144.943723,Docklands,117906,"Berth 122, Marina YE"


The add_comp column is missing in 62,369 of the 63,721 rows (about 98%), so it is dropped.


In [10]:
street_add=street_add.drop('add_comp',axis=1)

Next, rows that still contain null values are dropped. The redundant 'geo_point_2d' and 'geo_shape' columns are also dropped, because their coordinates are already available in the latitude and longitude columns.
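
The cleaning steps described here can also be written as a single chain. This is a sketch on a toy frame with hypothetical values, not the notebook's own cells. The final `reset_index` is an optional addition that the original notebook does not perform.

```python
import pandas as pd

# Toy stand-in for street_add after add_comp has been removed (hypothetical values).
sample = pd.DataFrame({
    'geo_point_2d': ['-37.80, 144.94', '-37.81, 144.96', '-37.79, 144.97'],
    'geo_shape': ['{...}', '{...}', '{...}'],
    'suburb_id': [592.0, 591.0, None],
    'suburb': ['North Melbourne', 'Melbourne', None],
})

cleaned = (
    sample
    .dropna()                                     # remove rows with any null value
    .drop(columns=['geo_point_2d', 'geo_shape'])  # remove the redundant geometry columns
    .astype({'suburb_id': int})                   # safe only once the nulls are gone
    .reset_index(drop=True)                       # optional: renumber the remaining rows
)

print(cleaned)
```

Chaining returns a new frame at each step, which avoids assigning into a slice of the original data.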

In [11]:
street_add.isna().sum()


geo_point_2d      0
geo_shape         0
suburb_id         6
latitude          0
street_no       652
str_name          1
address_pnt       0
easting           0
northing          0
gisid             0
longitude         0
suburb          730
street_id         0
dtype: int64

In [12]:
street_add

Unnamed: 0,geo_point_2d,geo_shape,suburb_id,latitude,street_no,str_name,address_pnt,easting,northing,gisid,longitude,suburb,street_id
0,"-37.802381557572, 144.941473440919","{""coordinates"": [144.941473440919, -37.8023815...",592.0,-37.802382,133,Laurens Street,133 Laurens Street North Melbourne,318773.161972,5.814115e+06,48531,144.941473,North Melbourne,781
1,"-37.816860132435, 144.969991449806","{""coordinates"": [144.969991449806, -37.8168601...",591.0,-37.816860,129,Flinders Street,129 Flinders Street Melbourne,321318.983104,5.812563e+06,37711,144.969991,Melbourne,636
2,"-37.798830275265, 144.942872100233","{""coordinates"": [144.942872100233, -37.7988302...",592.0,-37.798830,44,Macaulay Road,44 Macaulay Road North Melbourne,318887.633593,5.814512e+06,30476,144.942872,North Melbourne,847
3,"-37.810546771396, 144.970906397029","{""coordinates"": [144.970906397029, -37.8105467...",591.0,-37.810547,13,Punch Lane,13 Punch Lane Melbourne,321384.307636,5.813265e+06,35165,144.970906,Melbourne,1003
4,"-37.789293657657, 144.939794028368","{""coordinates"": [144.939794028368, -37.7892936...",592.0,-37.789294,61,Racecourse Road,61 Racecourse Road North Melbourne,318593.277492,5.815564e+06,22247,144.939794,North Melbourne,119624
...,...,...,...,...,...,...,...,...,...,...,...,...,...
63716,"-37.799002138708, 144.974875728308","{""coordinates"": [144.974875728308, -37.7990021...",0.0,-37.799002,190,Nicholson Street,190 Nicholson Street,321705.987201,5.814554e+06,23688,144.974876,,931
63717,"-37.812386139394, 144.97001218945","{""coordinates"": [144.97001218945, -37.81238613...",591.0,-37.812386,109,Bourke Street,109 Bourke Street Melbourne,321310.019913,5.813060e+06,16739,144.970012,Melbourne,455
63718,"-37.790814400126, 144.928709232409","{""coordinates"": [144.928709232409, -37.7908144...",590.0,-37.790814,23,McConnell Street,23 McConnell Street Kensington,317620.864400,5.815374e+06,27952,144.928709,Kensington,119634
63719,"-37.793946584931, 144.92906362331","{""coordinates"": [144.92906362331, -37.79394658...",590.0,-37.793947,516A,Macaulay Road,516A Macaulay Road Kensington,317659.774071,5.815027e+06,40114,144.929064,Kensington,847


In [13]:
street_add=street_add.dropna()

In [14]:
street_add.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 62338 entries, 0 to 63720
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   geo_point_2d  62338 non-null  object 
 1   geo_shape     62338 non-null  object 
 2   suburb_id     62338 non-null  float64
 3   latitude      62338 non-null  float64
 4   street_no     62338 non-null  object 
 5   str_name      62338 non-null  object 
 6   address_pnt   62338 non-null  object 
 7   easting       62338 non-null  float64
 8   northing      62338 non-null  float64
 9   gisid         62338 non-null  int64  
 10  longitude     62338 non-null  float64
 11  suburb        62338 non-null  object 
 12  street_id     62338 non-null  int64  
dtypes: float64(5), int64(2), object(6)
memory usage: 6.7+ MB


In [15]:
street_add['suburb_id']= street_add['suburb_id'].astype(int)

In [16]:
street_add = street_add.drop(['geo_point_2d','geo_shape'],axis=1)

In [17]:
street_add

Unnamed: 0,suburb_id,latitude,street_no,str_name,address_pnt,easting,northing,gisid,longitude,suburb,street_id
0,592,-37.802382,133,Laurens Street,133 Laurens Street North Melbourne,318773.161972,5.814115e+06,48531,144.941473,North Melbourne,781
1,591,-37.816860,129,Flinders Street,129 Flinders Street Melbourne,321318.983104,5.812563e+06,37711,144.969991,Melbourne,636
2,592,-37.798830,44,Macaulay Road,44 Macaulay Road North Melbourne,318887.633593,5.814512e+06,30476,144.942872,North Melbourne,847
3,591,-37.810547,13,Punch Lane,13 Punch Lane Melbourne,321384.307636,5.813265e+06,35165,144.970906,Melbourne,1003
4,592,-37.789294,61,Racecourse Road,61 Racecourse Road North Melbourne,318593.277492,5.815564e+06,22247,144.939794,North Melbourne,119624
...,...,...,...,...,...,...,...,...,...,...,...
63715,597,-37.809485,374,Footscray Road,374 Footscray Road West Melbourne,318370.695400,5.813317e+06,36273,144.936705,West Melbourne,640
63717,591,-37.812386,109,Bourke Street,109 Bourke Street Melbourne,321310.019913,5.813060e+06,16739,144.970012,Melbourne,455
63718,590,-37.790814,23,McConnell Street,23 McConnell Street Kensington,317620.864400,5.815374e+06,27952,144.928709,Kensington,119634
63719,590,-37.793947,516A,Macaulay Road,516A Macaulay Road Kensington,317659.774071,5.815027e+06,40114,144.929064,Kensington,847


In [18]:
street_add['suburb'].unique()

array(['North Melbourne', 'Melbourne', 'Southbank', 'Docklands',
       'West Melbourne', 'Parkville', 'Carlton', 'East Melbourne',
       'Kensington', 'Port Melbourne', 'Flemington', 'South Yarra',
       'Carlton North', 'South Wharf'], dtype=object)

In [19]:
bizaddressindustry.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 374210 entries, 0 to 374209
Data columns (total 11 columns):
 #   Column                        Non-Null Count   Dtype  
---  ------                        --------------   -----  
 0   census_year                   374210 non-null  int64  
 1   block_id                      374210 non-null  int64  
 2   property_id                   374210 non-null  int64  
 3   base_property_id              374210 non-null  int64  
 4   clue_small_area               374210 non-null  object 
 5   trading_name                  374083 non-null  object 
 6   business_address              374209 non-null  object 
 7   industry_anzsic4_code         374210 non-null  int64  
 8   industry_anzsic4_description  374210 non-null  object 
 9   longitude                     369425 non-null  float64
 10  latitude                      369425 non-null  float64
dtypes: float64(2), int64(5), object(4)
memory usage: 31.4+ MB


In [20]:
bizaddressindustry.isna().sum()

census_year                        0
block_id                           0
property_id                        0
base_property_id                   0
clue_small_area                    0
trading_name                     127
business_address                   1
industry_anzsic4_code              0
industry_anzsic4_description       0
longitude                       4785
latitude                        4785
dtype: int64

In [21]:
bizaddressindustry = bizaddressindustry.dropna(subset=['longitude', 'latitude'])

bizaddressindustry

Unnamed: 0,census_year,block_id,property_id,base_property_id,clue_small_area,trading_name,business_address,industry_anzsic4_code,industry_anzsic4_description,longitude,latitude
0,2003,105,100172,100172,Melbourne (CBD),Wilson Parking Australia,24-46 A'Beckett Street MELBOURNE 3000,9533,Parking Services,144.962053,-37.808573
1,2003,105,103301,103301,Melbourne (CBD),Melbourne International Backpackers,442-450 Elizabeth Street MELBOURNE 3000,4400,Accommodation,144.960868,-37.808309
2,2003,105,103302,103302,Melbourne (CBD),Vacant,422-440 Elizabeth Street MELBOURNE 3000,0,Vacant Space,144.961017,-37.808630
3,2003,105,103302,103302,Melbourne (CBD),The Garden Cafe,"Shop 3, Ground , 422-440 Elizabeth Street MELB...",4511,Cafes and Restaurants,144.961017,-37.808630
4,2003,105,103302,103302,Melbourne (CBD),Telephony Australia,"Shop 5, Ground , 422-440 Elizabeth Street MELB...",5809,Other Telecommunications Services,144.961017,-37.808630
...,...,...,...,...,...,...,...,...,...,...,...
374205,2017,266,106082,106082,Carlton,RMIT University (BLD 42) (Ericson Building),36-40 Lygon Street CARLTON 3053,8102,Higher Education,144.965375,-37.805471
374206,2017,266,106082,106082,Carlton,RMIT University (BLD 95),24-26 Lygon Street CARLTON 3053,8102,Higher Education,144.965375,-37.805471
374207,2017,266,107083,107083,Carlton,RMIT University,11-13 Orr Street CARLTON 3053,8102,Higher Education,144.965017,-37.806389
374208,2017,266,107087,107087,Carlton,Vacant,8-14 Orr Street CARLTON 3053,0,Vacant Space,144.965370,-37.806513


In [22]:
bizaddressindustry.isna().sum()

census_year                      0
block_id                         0
property_id                      0
base_property_id                 0
clue_small_area                  0
trading_name                    95
business_address                 1
industry_anzsic4_code            0
industry_anzsic4_description     0
longitude                        0
latitude                         0
dtype: int64

In [23]:
missing_trading_name= bizaddressindustry[bizaddressindustry['trading_name'].isna()]

missing_trading_name


Unnamed: 0,census_year,block_id,property_id,base_property_id,clue_small_area,trading_name,business_address,industry_anzsic4_code,industry_anzsic4_description,longitude,latitude
1078,2003,440,105335,105335,West Melbourne (Residential),,483-485 King Street WEST MELBOURNE 3003,0,Vacant Space,144.951599,-37.808715
8951,2007,784,599856,110597,Port Melbourne,,604-608 Lorimer Street FISHERMANS BEND 3207,0,Vacant Space,144.918852,-37.822285
10779,2008,604,105008,105008,East Melbourne,,21-25 Hayes Lane EAST MELBOURNE 3002,0,Vacant Space,144.990008,-37.814132
11437,2008,790,613949,613949,Port Melbourne,,59-65 Wharf Road FISHERMANS BEND 3207,0,Vacant Space,144.908895,-37.827066
11438,2008,790,613950,613950,Port Melbourne,,67-77 Wharf Road FISHERMANS BEND 3207,0,Vacant Space,144.909513,-37.826700
...,...,...,...,...,...,...,...,...,...,...,...
337703,2006,784,599856,110597,Port Melbourne,,604-608 Lorimer Street FISHERMANS BEND 3207,0,Vacant Space,144.918852,-37.822285
349175,2002,1001,557164,557164,West Melbourne (Industrial),,46-48 MacKenzie Road WEST MELBOURNE 3003,5309,Other Warehousing and Storage Services,144.907314,-37.812934
349290,2002,1014,108123,108123,West Melbourne (Industrial),,5 Radcliffe Street WEST MELBOURNE 3003,0,Vacant Space,144.933234,-37.801884
362361,2003,855,103657,103657,South Yarra,,12 Fairlie Court SOUTH YARRA 3141,0,Vacant Space,144.984804,-37.832461


In [24]:
# Fill missing trading names on vacant-space rows with 'Vacant'
vacant_mask = (bizaddressindustry['industry_anzsic4_code'] == 0) & (bizaddressindustry['industry_anzsic4_description'] == 'Vacant Space')
bizaddressindustry.loc[vacant_mask, 'trading_name'] = bizaddressindustry.loc[vacant_mask, 'trading_name'].fillna('Vacant')

bizaddressindustry


Unnamed: 0,census_year,block_id,property_id,base_property_id,clue_small_area,trading_name,business_address,industry_anzsic4_code,industry_anzsic4_description,longitude,latitude
0,2003,105,100172,100172,Melbourne (CBD),Wilson Parking Australia,24-46 A'Beckett Street MELBOURNE 3000,9533,Parking Services,144.962053,-37.808573
1,2003,105,103301,103301,Melbourne (CBD),Melbourne International Backpackers,442-450 Elizabeth Street MELBOURNE 3000,4400,Accommodation,144.960868,-37.808309
2,2003,105,103302,103302,Melbourne (CBD),Vacant,422-440 Elizabeth Street MELBOURNE 3000,0,Vacant Space,144.961017,-37.808630
3,2003,105,103302,103302,Melbourne (CBD),The Garden Cafe,"Shop 3, Ground , 422-440 Elizabeth Street MELB...",4511,Cafes and Restaurants,144.961017,-37.808630
4,2003,105,103302,103302,Melbourne (CBD),Telephony Australia,"Shop 5, Ground , 422-440 Elizabeth Street MELB...",5809,Other Telecommunications Services,144.961017,-37.808630
...,...,...,...,...,...,...,...,...,...,...,...
374205,2017,266,106082,106082,Carlton,RMIT University (BLD 42) (Ericson Building),36-40 Lygon Street CARLTON 3053,8102,Higher Education,144.965375,-37.805471
374206,2017,266,106082,106082,Carlton,RMIT University (BLD 95),24-26 Lygon Street CARLTON 3053,8102,Higher Education,144.965375,-37.805471
374207,2017,266,107083,107083,Carlton,RMIT University,11-13 Orr Street CARLTON 3053,8102,Higher Education,144.965017,-37.806389
374208,2017,266,107087,107087,Carlton,Vacant,8-14 Orr Street CARLTON 3053,0,Vacant Space,144.965370,-37.806513


In [25]:
bizaddressindustry.isna().sum()

census_year                      0
block_id                         0
property_id                      0
base_property_id                 0
clue_small_area                  0
trading_name                    18
business_address                 1
industry_anzsic4_code            0
industry_anzsic4_description     0
longitude                        0
latitude                         0
dtype: int64

In [26]:
bizaddressindustry['trading_name']= bizaddressindustry['trading_name'].fillna('no_name')

bizaddressindustry

Unnamed: 0,census_year,block_id,property_id,base_property_id,clue_small_area,trading_name,business_address,industry_anzsic4_code,industry_anzsic4_description,longitude,latitude
0,2003,105,100172,100172,Melbourne (CBD),Wilson Parking Australia,24-46 A'Beckett Street MELBOURNE 3000,9533,Parking Services,144.962053,-37.808573
1,2003,105,103301,103301,Melbourne (CBD),Melbourne International Backpackers,442-450 Elizabeth Street MELBOURNE 3000,4400,Accommodation,144.960868,-37.808309
2,2003,105,103302,103302,Melbourne (CBD),Vacant,422-440 Elizabeth Street MELBOURNE 3000,0,Vacant Space,144.961017,-37.808630
3,2003,105,103302,103302,Melbourne (CBD),The Garden Cafe,"Shop 3, Ground , 422-440 Elizabeth Street MELB...",4511,Cafes and Restaurants,144.961017,-37.808630
4,2003,105,103302,103302,Melbourne (CBD),Telephony Australia,"Shop 5, Ground , 422-440 Elizabeth Street MELB...",5809,Other Telecommunications Services,144.961017,-37.808630
...,...,...,...,...,...,...,...,...,...,...,...
374205,2017,266,106082,106082,Carlton,RMIT University (BLD 42) (Ericson Building),36-40 Lygon Street CARLTON 3053,8102,Higher Education,144.965375,-37.805471
374206,2017,266,106082,106082,Carlton,RMIT University (BLD 95),24-26 Lygon Street CARLTON 3053,8102,Higher Education,144.965375,-37.805471
374207,2017,266,107083,107083,Carlton,RMIT University,11-13 Orr Street CARLTON 3053,8102,Higher Education,144.965017,-37.806389
374208,2017,266,107087,107087,Carlton,Vacant,8-14 Orr Street CARLTON 3053,0,Vacant Space,144.965370,-37.806513


In [27]:
bizaddressindustry = bizaddressindustry.dropna()

In [28]:
bizaddressindustry.isna().sum()

census_year                     0
block_id                        0
property_id                     0
base_property_id                0
clue_small_area                 0
trading_name                    0
business_address                0
industry_anzsic4_code           0
industry_anzsic4_description    0
longitude                       0
latitude                        0
dtype: int64
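
Because `dropna()` discards whole rows, it can help to record how many rows it removes. The following is a minimal sketch on a small made-up frame (the `records` name and its values are hypothetical, not the notebook's data), combining the placeholder fill and the drop in the same order as the cells above.

```python
import pandas as pd

# Hypothetical sample: one missing trading name, one missing coordinate
records = pd.DataFrame({
    "trading_name": ["Vacant", None, "The Garden Cafe"],
    "longitude": [144.961017, 144.960868, None],
})

# Fill missing names first so those rows are kept
records["trading_name"] = records["trading_name"].fillna("no_name")

# Drop rows with any remaining missing value and report the loss
rows_before = len(records)
cleaned = records.dropna()
print(f"dropna() removed {rows_before - len(cleaned)} of {rows_before} rows")
```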

In [29]:
bizaddressindustry


Unnamed: 0,census_year,block_id,property_id,base_property_id,clue_small_area,trading_name,business_address,industry_anzsic4_code,industry_anzsic4_description,longitude,latitude
0,2003,105,100172,100172,Melbourne (CBD),Wilson Parking Australia,24-46 A'Beckett Street MELBOURNE 3000,9533,Parking Services,144.962053,-37.808573
1,2003,105,103301,103301,Melbourne (CBD),Melbourne International Backpackers,442-450 Elizabeth Street MELBOURNE 3000,4400,Accommodation,144.960868,-37.808309
2,2003,105,103302,103302,Melbourne (CBD),Vacant,422-440 Elizabeth Street MELBOURNE 3000,0,Vacant Space,144.961017,-37.808630
3,2003,105,103302,103302,Melbourne (CBD),The Garden Cafe,"Shop 3, Ground , 422-440 Elizabeth Street MELB...",4511,Cafes and Restaurants,144.961017,-37.808630
4,2003,105,103302,103302,Melbourne (CBD),Telephony Australia,"Shop 5, Ground , 422-440 Elizabeth Street MELB...",5809,Other Telecommunications Services,144.961017,-37.808630
...,...,...,...,...,...,...,...,...,...,...,...
374205,2017,266,106082,106082,Carlton,RMIT University (BLD 42) (Ericson Building),36-40 Lygon Street CARLTON 3053,8102,Higher Education,144.965375,-37.805471
374206,2017,266,106082,106082,Carlton,RMIT University (BLD 95),24-26 Lygon Street CARLTON 3053,8102,Higher Education,144.965375,-37.805471
374207,2017,266,107083,107083,Carlton,RMIT University,11-13 Orr Street CARLTON 3053,8102,Higher Education,144.965017,-37.806389
374208,2017,266,107087,107087,Carlton,Vacant,8-14 Orr Street CARLTON 3053,0,Vacant Space,144.965370,-37.806513


Correcting datatypes:
1. `suburb_id` from float to int
2. `street_no` from object to int
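
The two conversions listed above could be sketched as follows. This is a minimal sketch on a small made-up frame (the `sample` name and the `street_no_int` column are hypothetical), not the notebook's actual code. It assumes `suburb_id` holds whole-number floats with no missing values, and that `street_no` may contain entries such as `516A`, which a plain `astype(int)` would reject.

```python
import pandas as pd

# Hypothetical sample mimicking two street address columns; values are illustrative only
sample = pd.DataFrame({
    "suburb_id": [592.0, 591.0, 590.0],
    "street_no": ["133", "129", "516A"],
})

# 1. suburb_id: float -> int (valid only when no values are missing)
sample["suburb_id"] = sample["suburb_id"].astype("int64")

# 2. street_no: object -> integer. Entries such as "516A" cannot be parsed,
#    so they are coerced to missing and stored in pandas' nullable Int64 dtype
sample["street_no_int"] = pd.to_numeric(
    sample["street_no"], errors="coerce"
).astype("Int64")

print(sample.dtypes)
```

Keeping the original `street_no` column alongside the converted one preserves suffixes like the `A` in `516A`, which would otherwise be lost.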


# Business size, industry and jobs (bizsize_industryjobs)

In [30]:
bizsize_industryjobs

Unnamed: 0,census_year,clue_small_area,anzsic_indusrty,clue_industry,business_size,total_establishments,total_jobs
0,2015,West Melbourne (Residential),Health Care and Social Assistance,Health Care and Social Assistance,Large business,1,
1,2015,West Melbourne (Residential),Manufacturing,Manufacturing,Medium business,5,171.0
2,2015,West Melbourne (Residential),Manufacturing,Manufacturing,Non employing,1,
3,2015,West Melbourne (Residential),"Professional, Scientific and Technical Services",Business Services,Non employing,3,0.0
4,2015,West Melbourne (Residential),"Rental, Hiring and Real Estate Services",Real Estate Services,Small business,5,42.0
...,...,...,...,...,...,...,...
14687,2008,Kensington,"Rental, Hiring and Real Estate Services",Real Estate Services,Medium business,1,
14688,2008,Kensington,Retail Trade,Retail Trade,Medium business,2,
14689,2008,Kensington,"Transport, Postal and Warehousing","Transport, Postal and Storage",Medium business,3,118.0
14690,2008,Kensington,"Transport, Postal and Warehousing","Transport, Postal and Storage",Small business,8,32.0


In [31]:
bizsize_industryjobs.isna().sum()

census_year                0
clue_small_area            0
anzsic_indusrty            0
clue_industry              0
business_size              0
total_establishments       0
total_jobs              4327
dtype: int64

In [32]:
bizsize_industryjobs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14692 entries, 0 to 14691
Data columns (total 7 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   census_year           14692 non-null  int64  
 1   clue_small_area       14692 non-null  object 
 2   anzsic_indusrty       14692 non-null  object 
 3   clue_industry         14692 non-null  object 
 4   business_size         14692 non-null  object 
 5   total_establishments  14692 non-null  int64  
 6   total_jobs            10365 non-null  float64
dtypes: float64(1), int64(2), object(4)
memory usage: 803.6+ KB


In [33]:
bizsize_industryjobs['total_jobs'] = bizsize_industryjobs['total_jobs'].fillna(0)

bizsize_industryjobs.isna().sum()

census_year             0
clue_small_area         0
anzsic_indusrty         0
clue_industry           0
business_size           0
total_establishments    0
total_jobs              0
dtype: int64

In [34]:
bizsize_industryjobs['total_jobs'] = bizsize_industryjobs['total_jobs'].astype('int')

bizsize_industryjobs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14692 entries, 0 to 14691
Data columns (total 7 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   census_year           14692 non-null  int64 
 1   clue_small_area       14692 non-null  object
 2   anzsic_indusrty       14692 non-null  object
 3   clue_industry         14692 non-null  object
 4   business_size         14692 non-null  object
 5   total_establishments  14692 non-null  int64 
 6   total_jobs            14692 non-null  int32 
dtypes: int32(1), int64(2), object(4)
memory usage: 746.2+ KB
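
Filling the missing `total_jobs` values with 0 makes them indistinguishable from rows that already report `0.0` jobs. If that distinction matters, one alternative is pandas' nullable `Int64` dtype, which stores integers while keeping missing entries. The sketch below uses a made-up frame (`jobs` is a hypothetical name); whether a blank value means "not reported" rather than zero is an assumption, not something the dataset states here.

```python
import pandas as pd

# Hypothetical sample: one missing job count, one real count, one true zero
jobs = pd.DataFrame({
    "business_size": ["Large business", "Medium business", "Non employing"],
    "total_jobs": [None, 171.0, 0.0],
})

# Convert to nullable integers instead of filling with 0
jobs["total_jobs"] = jobs["total_jobs"].astype("Int64")

# Missing values are preserved and skipped by aggregations by default
missing_count = jobs["total_jobs"].isna().sum()
jobs_total = jobs["total_jobs"].sum()
print(missing_count, jobs_total)
```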


# Descriptive statistics

In [35]:
street_add

Unnamed: 0,suburb_id,latitude,street_no,str_name,address_pnt,easting,northing,gisid,longitude,suburb,street_id
0,592,-37.802382,133,Laurens Street,133 Laurens Street North Melbourne,318773.161972,5.814115e+06,48531,144.941473,North Melbourne,781
1,591,-37.816860,129,Flinders Street,129 Flinders Street Melbourne,321318.983104,5.812563e+06,37711,144.969991,Melbourne,636
2,592,-37.798830,44,Macaulay Road,44 Macaulay Road North Melbourne,318887.633593,5.814512e+06,30476,144.942872,North Melbourne,847
3,591,-37.810547,13,Punch Lane,13 Punch Lane Melbourne,321384.307636,5.813265e+06,35165,144.970906,Melbourne,1003
4,592,-37.789294,61,Racecourse Road,61 Racecourse Road North Melbourne,318593.277492,5.815564e+06,22247,144.939794,North Melbourne,119624
...,...,...,...,...,...,...,...,...,...,...,...
63715,597,-37.809485,374,Footscray Road,374 Footscray Road West Melbourne,318370.695400,5.813317e+06,36273,144.936705,West Melbourne,640
63717,591,-37.812386,109,Bourke Street,109 Bourke Street Melbourne,321310.019913,5.813060e+06,16739,144.970012,Melbourne,455
63718,590,-37.790814,23,McConnell Street,23 McConnell Street Kensington,317620.864400,5.815374e+06,27952,144.928709,Kensington,119634
63719,590,-37.793947,516A,Macaulay Road,516A Macaulay Road Kensington,317659.774071,5.815027e+06,40114,144.929064,Kensington,847


In [36]:
street_add.describe()

Unnamed: 0,suburb_id,latitude,easting,northing,gisid,longitude,street_id
count,62338.0,62338.0,62338.0,62338.0,62338.0,62338.0,62338.0
mean,592.215182,-37.807994,319876.303332,5813516.0,31890.384517,144.953849,14093.98476
std,12.703847,0.013165,1677.2129,1446.715,18522.850397,0.018903,37228.804331
min,585.0,-37.850544,315203.832432,5808852.0,1.0,144.900124,356.0
25%,590.0,-37.815734,318803.100355,5812674.0,15639.25,144.941935,618.0
50%,591.0,-37.8078,320119.734723,5813536.0,31952.5,144.956616,897.0
75%,594.0,-37.797548,321044.381287,5814660.0,48101.75,144.967056,1179.0
max,840.0,-37.775595,323158.202002,5817084.0,63721.0,144.991066,120186.0
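
`describe()` summarises every numeric column, including identifiers such as `suburb_id`, `gisid` and `street_id`, for which a mean or standard deviation has no real interpretation. A possible refinement, sketched here on a made-up frame (`addresses` and its values are hypothetical), is to describe only the coordinate columns and to count occurrences for the identifier columns instead.

```python
import pandas as pd

# Hypothetical sample with two identifier columns and two coordinate columns
addresses = pd.DataFrame({
    "suburb_id": [592, 591, 592],
    "latitude": [-37.802382, -37.816860, -37.798830],
    "longitude": [144.941473, 144.969991, 144.942872],
    "street_id": [781, 636, 847],
})

# Summary statistics only for genuinely numeric measurements
coord_cols = ["latitude", "longitude"]
coord_summary = addresses[coord_cols].describe()

# Identifiers are better summarised by frequency than by mean or std
suburb_counts = addresses["suburb_id"].value_counts()

print(coord_summary)
print(suburb_counts)
```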
