# Exploring indicators

This notebook checks whether Yi Peng's indicators exist in the ABS and id_population (ID_POPUL) sources.
We then try to segment the data by type and see whether it can be extrapolated.

## Function definitions and django manage

In [2]:
import os

In [4]:
import os

import django
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Move to the Django project directory.
os.chdir('../../deciml_django/')

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'deciml.settings')
# Load the app registry before importing models (harmless if already loaded).
django.setup()

from deciml.apps.db.models import *
from deciml.apps.common.utils import get_actuals

In [5]:
def sources_that_contain_indicator(indicator):
    """
    Return the indicators whose name contains `indicator`, with their sources.

    The match is a case-insensitive substring search, so pass the shortest
    fragment you want to find: indicator='Hoc' would match both Hockey and
    ChOcOlate.
    """
    # One row per (indicator, source) pair whose name contains the fragment.
    df = pd.DataFrame(
        Indicator.objects
        .filter(indicator_info__name__icontains=indicator)
        .values('indicator_info__id',
                'indicator_info__name',
                'indicator_info__description',
                'source__id')
    )

    df = df.drop_duplicates()
    diff_sources = df.source__id.unique()
    diff_ind = df.indicator_info__name.unique()

    print('* Different indicators found ({ndi}): {di}'.format(ndi=len(diff_ind), di=diff_ind))
    print('* Different sources found ({nds}): {ds}'.format(nds=len(diff_sources), ds=diff_sources))
    return df
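
The docstring's matching rule can be tried without a database. The sketch below is only an analogy: it assumes a plain pandas Series of indicator names (taken from the docstring and from the outputs later in this notebook) and mimics the case-insensitive substring behaviour of the `icontains` lookup. The helper name `icontains` is made up for illustration and is not part of the project.

```python
import pandas as pd

# Names taken from the docstring and from the search outputs in this notebook.
names = pd.Series(['Hockey', 'ChOcOlate', 'Vacancy Rate', 'Total Vacancy'])


def icontains(series, fragment):
    """Case-insensitive, literal substring filter (hypothetical helper)."""
    mask = series.str.contains(fragment, case=False, regex=False)
    return series[mask].tolist()


hoc_matches = icontains(names, 'Hoc')
vac_matches = icontains(names, 'vac')
print(hoc_matches)
print(vac_matches)
```

A short fragment casts a wide net, which is why the searches further down use 'vac', 'occ' and 'cap' rather than full indicator names.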

In [6]:
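def indicators_from_source(source):
    """Return the distinct indicator names (and descriptions) of a source."""
    # distinct() with a field name relies on PostgreSQL's DISTINCT ON.
    df = pd.DataFrame(
        Indicator.objects
        .filter(source__id=source)
        .distinct('indicator_info__name')
        .values('indicator_info__name',
                'indicator_info__description')
    )
    return df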

In [7]:
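def values_by_indicator_and_source(source, indicator):
    """
    Return date, value and location of every actual of one indicator.

    Unlike sources_that_contain_indicator, the indicator name must match exactly.
    """
    df = pd.DataFrame(
        Actual.objects
        .filter(indicator__source__id=source,
                indicator__indicator_info__name=indicator)
        .values('date', 'value', 'indicator__location__name')
    )

    return df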


#### Target indicators

- [x] state population (ID_POPUL:Population, ABS:Population)
    * Poor time depth
- [x] city population (ID_POPUL:Population --> Description is wrong)
- [x] GDP (ABS:Gross domestic product)
    * Whole state
- [x] household income (ABS:Gross Disposable Income)
    * Whole state
- [ ] household indebtedness

#### Extra indicators

- [x] Unemployment rate (ABS:unemployment rate)

## Exploring indicators and sources

We want to check whether Yi Peng's indicators are available in several sources. We then check which sources have data on vacancy/occupation, cap_rate/yield and specialty sales.

### Indicators from ID_POPUL and ABS

In [8]:
indicators_from_source('ID_POPUL')

Unnamed: 0,indicator_info__description,indicator_info__name
0,Average people per household,Average household size
1,Percentage of dwelling occupancy,Dwelling occupancy rate
2,Number of dwellings,Dwellings
3,Number of households,Households
4,Number of persons living in a state,Population
5,Population in non private dwellings,Population in non private dwellings


In [9]:
indicators_from_source('ABS')

Unnamed: 0,indicator_info__description,indicator_info__name
0,total value of approved service sector buildings,building approved
1,gross value of service sector businesses,business gross
2,total value of income by service sector busine...,business income
3,value lending to service sector businesses,business lending
4,Female to male ratio of people living in a state,Female to Male ratio
5,Household gross disposable income,Gross Disposable Income
6,Gross domestic product,Gross domestic product
7,Gross State Product,Gross State Product
8,Average number of persons that live per household,Household Size Average
9,Number of persons moving into/out of a country...,Net Overseas Migration


In [10]:
# Population by state
values_by_indicator_and_source('ABS', 'Population')

Unnamed: 0,date,indicator__location__name,value
0,2015-12-31,New South Wales Australia,7616168.0
1,2016-12-31,New South Wales Australia,7732858.0
2,2017-12-31,New South Wales Australia,7867936.0
3,2018-12-31,New South Wales Australia,7988241.0
4,2013-12-31,New South Wales Australia,7404032.0
5,2014-12-31,New South Wales Australia,7508353.0
6,2013-12-31,Victoria Australia,5772669.0
7,2014-12-31,Victoria Australia,5894917.0
8,2015-12-31,Victoria Australia,6022322.0
9,2016-12-31,Victoria Australia,6173172.0
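
The "poor time depth" remark in the target list can be quantified per location. The sketch below is a rough illustration that uses only the ten rows displayed above, so its counts describe this excerpt and not necessarily the full query result; in the notebook the DataFrame would come from `values_by_indicator_and_source('ABS', 'Population')` instead of the hard-coded rows.

```python
import pandas as pd

# Rows copied from the ABS 'Population' output above (displayed rows only).
rows = [
    ('2015-12-31', 'New South Wales Australia', 7616168.0),
    ('2016-12-31', 'New South Wales Australia', 7732858.0),
    ('2017-12-31', 'New South Wales Australia', 7867936.0),
    ('2018-12-31', 'New South Wales Australia', 7988241.0),
    ('2013-12-31', 'New South Wales Australia', 7404032.0),
    ('2014-12-31', 'New South Wales Australia', 7508353.0),
    ('2013-12-31', 'Victoria Australia', 5772669.0),
    ('2014-12-31', 'Victoria Australia', 5894917.0),
    ('2015-12-31', 'Victoria Australia', 6022322.0),
    ('2016-12-31', 'Victoria Australia', 6173172.0),
]
df = pd.DataFrame(rows, columns=['date', 'indicator__location__name', 'value'])
df['date'] = pd.to_datetime(df['date'])

# First date, last date and number of observations per location.
depth = df.groupby('indicator__location__name')['date'].agg(['min', 'max', 'count'])
depth['span_years'] = depth['max'].dt.year - depth['min'].dt.year
print(depth)
```

A span of only a few years per state limits how far any extrapolation can reasonably go.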


In [11]:
# Population by city
values_by_indicator_and_source('ID_POPUL', 'Population').sort_values(by='date')

Unnamed: 0,date,indicator__location__name,value
2239,2008-06-30,SOUTH BURNETT Australia,30583.0
1062,2008-06-30,GLENELG Australia,20020.0
252,2008-06-30,BALLARAT Australia,89531.0
2283,2008-06-30,SOUTHERN DOWNS Australia,33451.0
1076,2008-06-30,GLENORCHY Australia,44542.0
247,2008-06-30,AUGUSTA MARGARET RIVER Australia,11522.0
2591,2008-06-30,WAGGA WAGGA Australia,60329.0
1084,2008-06-30,GOLD COAST Australia,481569.0
1095,2008-06-30,GOLDEN PLAINS Australia,17456.0
2272,2008-06-30,SOUTH WEST GROUP Australia,340806.0


In [12]:
# GDP by state??
values_by_indicator_and_source('ABS', 'Gross domestic product')

Unnamed: 0,date,indicator__location__name,value
0,1977-12-01,Australia,520030.0
1,1976-12-01,Australia,513066.0
2,1975-12-01,Australia,491687.0
3,1974-12-01,Australia,482157.0
4,1973-12-01,Australia,474983.0
5,1972-12-01,Australia,452270.0
6,1971-12-01,Australia,443965.0
7,1970-12-01,Australia,427620.0
8,1969-12-01,Australia,400444.0
9,1968-12-01,Australia,375292.0


In [13]:
# Gross Disposable Income by state??
values_by_indicator_and_source('ABS', 'Gross Disposable Income')

Unnamed: 0,date,indicator__location__name,value
0,2002-06-01,Australia,1137030.0
1,2001-06-01,Australia,1062416.0
2,2000-06-01,Australia,984253.0
3,1999-06-01,Australia,925488.0
4,1998-06-01,Australia,883062.0
5,1997-06-01,Australia,849097.0
6,1996-06-01,Australia,803289.0
7,1995-06-01,Australia,752868.0
8,1994-06-01,Australia,708298.0
9,1993-06-01,Australia,679594.0


In [14]:
# Unemployment rate by state
values_by_indicator_and_source('ABS', 'unemployment rate')

Unnamed: 0,date,indicator__location__name,value
0,2014-12-01,Australia,6.100007
1,2014-11-01,Australia,6.297957
2,2014-10-01,Australia,6.381043
3,2014-09-01,Australia,6.225912
4,2014-08-01,Australia,6.091151
5,2014-07-01,Australia,6.173284
6,2014-06-01,Australia,6.049167
7,2014-05-01,Australia,5.927412
8,2014-04-01,Australia,5.794500
9,2014-03-01,Australia,5.863455


In [15]:
values_by_indicator_and_source('ABS', 'unemployment rate').indicator__location__name.unique()

array(['Australia', 'New South Wales Australia', 'Queensland Australia',
       'South Australia Australia', 'Tasmania Australia',
       'Victoria Australia', 'Western Australia Australia'], dtype=object)
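
The first rows of the unemployment output show only the national 'Australia' series, while the unique locations above also include states. A minimal sketch for separating the two follows; it assumes that state-level names always end in ' Australia', which holds for the seven names listed but has not been checked against the rest of the database. The function name `split_locations` is made up for illustration.

```python
# Location names copied from the unique() output above.
locations = ['Australia', 'New South Wales Australia', 'Queensland Australia',
             'South Australia Australia', 'Tasmania Australia',
             'Victoria Australia', 'Western Australia Australia']

SUFFIX = ' Australia'  # assumed marker of a state-level location


def split_locations(names):
    """Split names into the national aggregate and bare state names."""
    national = [n for n in names if n == 'Australia']
    # Only the trailing suffix is removed, so 'South Australia' stays intact.
    states = [n[:-len(SUFFIX)] for n in names if n.endswith(SUFFIX)]
    return national, states


national, states = split_locations(locations)
print(national)
print(states)
```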

### Who has vacancy data (or occupation)?

In [16]:
_ = sources_that_contain_indicator('vac')

* Different indicators found (18): ['Total Vacancy Grade Premium' 'Vacancy Rate Grade Premium'
 'Total Vacancy' 'Vacancy Rate' 'Total Vacancy Grade A'
 'Vacancy Rate Grade A' 'Total Vacancy Grade B' 'Vacancy Rate Grade B'
 'Total Vacancy Grade C' 'Vacancy Rate Grade C' 'Total Vacancy Grade D'
 'Vacancy Rate Grade D' 'Total Vacancy Secondary Grade'
 'Vacancy Rate Secondary Grade' 'Total Vacancy Overall'
 'Vacancy Rate Overall' 'Specialty Shops Vacant' 'VACANCY']
* Different sources found (2): ['JLL' 'STOCKLAND']


In [17]:
_

Unnamed: 0,indicator_info__description,indicator_info__id,indicator_info__name,source__id
0,Total Vacancy Grade Premium,TOTAL_VAC_PREMIUM,Total Vacancy Grade Premium,JLL
1,Vacancy Rate Grade Premium,TOTAL_VAC_RATE_PREMIUM,Vacancy Rate Grade Premium,JLL
2,Total Vacancy,TOTAL_VACANCY,Total Vacancy,JLL
3,Vacancy Rate,VACANCY_RATE,Vacancy Rate,JLL
4,Total Vacancy Grade A,TOTAL_VACANCY_A_GRADE,Total Vacancy Grade A,JLL
5,Vacancy Rate Grade A,VACANCY_RATE_A_GRADE,Vacancy Rate Grade A,JLL
6,Total Vacancy Grade B,TOTAL_VACANCY_B_GRADE,Total Vacancy Grade B,JLL
7,Vacancy Rate Grade B,VACANCY_RATE_B_GRADE,Vacancy Rate Grade B,JLL
8,Total Vacancy Grade C,TOTAL_VACANCY_C_GRADE,Total Vacancy Grade C,JLL
9,Vacancy Rate Grade C,VACANCY_RATE_C_GRADE,Vacancy Rate Grade C,JLL


In [18]:
_ = sources_that_contain_indicator('occ')

* Different indicators found (11): ['occupation rate' 'specialty occupancy cost' 'Dwelling occupancy rate'
 'Occupied Stock Grade Premium' 'Occupied Stock' 'Occupied Stock Grade A'
 'Occupied Stock Grade B' 'Occupied Stock Grade C'
 'Occupied Stock Grade D' 'Occupied Stock Secondary Grade'
 'Occupied Stock Overall']
* Different sources found (5): ['GPT' 'ID_POPUL' 'STOCKLAND' 'VICINITY' 'JLL']


In [19]:
_

Unnamed: 0,indicator_info__description,indicator_info__id,indicator_info__name,source__id
0,occupation rate,OCC_RATE,occupation rate,GPT
13,costs related to occupying a space in specialt...,SPECIALTY_OCC_COST,specialty occupancy cost,GPT
70,Percentage of dwelling occupancy,DWELLING_OCC_RATE,Dwelling occupancy rate,ID_POPUL
198,costs related to occupying a space in specialt...,SPECIALTY_OCC_COST,specialty occupancy cost,STOCKLAND
246,occupation rate,OCC_RATE,occupation rate,VICINITY
292,costs related to occupying a space in specialt...,SPECIALTY_OCC_COST,specialty occupancy cost,VICINITY
573,Occupied Stock Grade Premium,OCCUPIED_STCK_PREMIUM,Occupied Stock Grade Premium,JLL
574,Occupied Stock,OCCUPIED_STOCK,Occupied Stock,JLL
575,Occupied Stock Grade A,OCCUPIED_STCK_A_GRADE,Occupied Stock Grade A,JLL
576,Occupied Stock Grade B,OCCUPIED_STCK_B_GRADE,Occupied Stock Grade B,JLL


### Who has specialty data?

In [20]:
_ = sources_that_contain_indicator('specialty')

* Different indicators found (9): ['specialty occupancy cost' 'specialty sales' 'Prime Rents Specialty Rent'
 'Prime Rents Average Specialty Rental Growth' 'Specialty Shops Vacant'
 'Super-Prime Rents Average Specialty Rental Growth'
 'Super-Prime Rents Specialty Rent' 'Specialty Rent'
 'Average Specialty Rental Growth']
* Different sources found (5): ['GPT' 'STOCKLAND' 'VICINITY' 'SCENTRE' 'JLL']


In [21]:
_

Unnamed: 0,indicator_info__description,indicator_info__id,indicator_info__name,source__id
0,costs related to occupying a space in specialt...,SPECIALTY_OCC_COST,specialty occupancy cost,GPT
18,costs related to occupying a space in specialt...,SPECIALTY_OCC_COST,specialty occupancy cost,STOCKLAND
66,costs related to occupying a space in specialt...,SPECIALTY_OCC_COST,specialty occupancy cost,VICINITY
228,sales in specialty retail,SPECIALTY_SALES,specialty sales,GPT
246,sales in specialty retail,SPECIALTY_SALES,specialty sales,STOCKLAND
288,sales in specialty retail,SPECIALTY_SALES,specialty sales,VICINITY
335,sales in specialty retail,SPECIALTY_SALES,specialty sales,SCENTRE
492,,Prime_Rents_Specialty_Rent,Prime Rents Specialty Rent,JLL
497,,Prime_Rents_Average_Specialty_Rental_Gr,Prime Rents Average Specialty Rental Growth,JLL
502,,Specialty_Shops_Vacant,Specialty Shops Vacant,JLL


### Who has cap_rate data (also yield)?

In [22]:
_ = sources_that_contain_indicator('cap')

* Different indicators found (6): ['capitalisation rate' 'Capital Value' 'Capital Value Growth q-o-q'
 'Capital Value Growth y-o-y' 'Capital Value Indicator Prime Grade'
 'Capital Value Indicator']
* Different sources found (5): ['GPT' 'STOCKLAND' 'VICINITY' 'SCENTRE' 'JLL']


In [23]:
_

Unnamed: 0,indicator_info__description,indicator_info__id,indicator_info__name,source__id
0,net operating income of a property divided by ...,CAP_RATE,capitalisation rate,GPT
50,net operating income of a property divided by ...,CAP_RATE,capitalisation rate,STOCKLAND
158,net operating income of a property divided by ...,CAP_RATE,capitalisation rate,VICINITY
221,net operating income of a property divided by ...,CAP_RATE,capitalisation rate,SCENTRE
366,Capital Value,CAPITAL_VAL,Capital Value,JLL
367,Capital Value Growth q-o-q,CAPITAL_VAL_GRTH_QOQ,Capital Value Growth q-o-q,JLL
368,Capital Value Growth y-o-y,CAPITAL_VAL_GRTH_YOY,Capital Value Growth y-o-y,JLL
369,Capital Value Indicator Prime Grade,CAPITAL_VAL_IND_PRIME,Capital Value Indicator Prime Grade,JLL
386,,Capital_Value_Indicator,Capital Value Indicator,JLL


In [24]:
_ = sources_that_contain_indicator('yield')

* Different indicators found (10): ['Equivalent Yield Upper Prime Grade' 'Equivalent Yield Lower Prime Grade'
 'Equivalent Average Yield Prime Grade'
 'Equivalent Yield Upper Secondary Grade'
 'Equivalent Yield Lower Secondary Grade'
 'Equivalent Average Yield Secondary Grade' 'Lower Yield'
 'Mid-Point Yield' 'Upper Yield' 'Median Yield']
* Different sources found (1): ['JLL']


In [25]:
_

Unnamed: 0,indicator_info__description,indicator_info__id,indicator_info__name,source__id
0,Equivalent Yield Upper Prime Grade,EQ_YIELD_UPPER_PRIME,Equivalent Yield Upper Prime Grade,JLL
5,Equivalent Yield Lower Prime Grade,EQ_YIELD_LOWER_PRIME,Equivalent Yield Lower Prime Grade,JLL
10,Equivalent Average Yield Prime Grade,EQ_AVG_YIELD_PRIME,Equivalent Average Yield Prime Grade,JLL
15,Equivalent Yield Upper Secondary Grade,EQ_YIELD_UPPER_SEC,Equivalent Yield Upper Secondary Grade,JLL
20,Equivalent Yield Lower Secondary Grade,EQ_YIELD_LOWER_SEC,Equivalent Yield Lower Secondary Grade,JLL
25,Equivalent Average Yield Secondary Grade,EQ_AVG_YIELD_SEC,Equivalent Average Yield Secondary Grade,JLL
30,,Lower_Yield,Lower Yield,JLL
57,,Mid-Point_Yield,Mid-Point Yield,JLL
73,,Upper_Yield,Upper Yield,JLL
100,,Median_Yield,Median Yield,JLL
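
The five searches above can be condensed into one coverage table. The sketch below transcribes the printed "Different sources found" lists by hand; in the notebook the same pairs could presumably be collected from the DataFrames returned by `sources_that_contain_indicator`, which is an assumption about how one would wire it up rather than something done here.

```python
import pandas as pd

# Search term -> sources, transcribed from the printed results above.
found = {
    'vac': ['JLL', 'STOCKLAND'],
    'occ': ['GPT', 'ID_POPUL', 'STOCKLAND', 'VICINITY', 'JLL'],
    'specialty': ['GPT', 'STOCKLAND', 'VICINITY', 'SCENTRE', 'JLL'],
    'cap': ['GPT', 'STOCKLAND', 'VICINITY', 'SCENTRE', 'JLL'],
    'yield': ['JLL'],
}

# One row per (term, source) pair, then a source x term presence matrix.
pairs = pd.DataFrame(
    [(term, source) for term, sources in found.items() for source in sources],
    columns=['term', 'source'],
)
coverage = pd.crosstab(pairs['source'], pairs['term']).astype(bool)
print(coverage)
```

A True cell only means that some indicator name matched the fragment. For 'cap', for example, GPT, STOCKLAND, VICINITY and SCENTRE match on 'capitalisation rate', while JLL matches only on 'Capital Value' indicators, so the table is a starting point and not proof of comparable data.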


## Extracting the relevant data

Vicinity, Stockland and GPT also have housing and office data (besides retail); we are only interested in retail. Check the folder_name variable.
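
A possible shape for the retail filter is sketched below. The real folder_name values are not shown in this notebook, so the rows, the 'retail' label and the helper name `keep_retail` are all placeholders to be replaced once the actual values have been inspected.

```python
import pandas as pd

# Hypothetical rows: the real folder_name values are unknown at this point.
actuals = pd.DataFrame({
    'source': ['VICINITY', 'VICINITY', 'STOCKLAND', 'GPT'],
    'folder_name': ['retail', 'office', 'housing', 'retail'],
    'value': [1.0, 2.0, 3.0, 4.0],
})


def keep_retail(df, column='folder_name'):
    """Keep rows whose folder name mentions 'retail' (case-insensitive)."""
    mask = df[column].str.contains('retail', case=False, na=False)
    return df[mask].reset_index(drop=True)


retail = keep_retail(actuals)
print(retail)
```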

Call get_actuals to pull the JLL data.

Segment by regional/sub_regional/cvd.
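
Once each centre carries a segment label, the segmentation itself is a plain group-by. The sketch below uses invented centres, an invented `segment` column and invented cap rates purely to show the mechanics; none of these values come from the database.

```python
import pandas as pd

# Hypothetical data: centre names, segment labels and values are placeholders.
centres = pd.DataFrame({
    'centre': ['A', 'B', 'C', 'D', 'E'],
    'segment': ['regional', 'regional', 'sub_regional', 'cvd', 'sub_regional'],
    'cap_rate': [5.0, 6.0, 7.0, 4.5, 6.5],
})

# Number of centres and average cap rate per segment.
by_segment = centres.groupby('segment')['cap_rate'].agg(['count', 'mean'])
print(by_segment)
```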