In [10]:
# DELETE BEFORE PUBLISHING
# This is just here so you can preview the styling on your local machine

from IPython.core.display import HTML
HTML("""
<style>
.usecase-title, .usecase-duration, .usecase-section-header {
    padding-left: 15px;
    padding-bottom: 10px;
    padding-top: 10px;
    padding-right: 15px;
    background-color: #0f9295;
    color: #fff;
}

.usecase-title {
    font-size: 1.7em;
    font-weight: bold;
}

.usecase-authors, .usecase-level, .usecase-skill {
    padding-left: 15px;
    padding-bottom: 7px;
    padding-top: 7px;
    background-color: #baeaeb;
    font-size: 1.4em;
    color: #121212;
}

.usecase-level-skill  {
    display: flex;
}

.usecase-level, .usecase-skill {
    width: 50%;
}

.usecase-duration, .usecase-skill {
    text-align: right;
    padding-right: 15px;
    padding-bottom: 8px;
    font-size: 1.4em;
}

.usecase-section-header {
    font-weight: bold;
    font-size: 1.5em;
}

.usecase-subsection-header, .usecase-subsection-blurb {
    font-weight: bold;
    font-size: 1.2em;
    color: #121212;
}

.usecase-subsection-blurb {
    font-size: 1em;
    font-style: italic;
}
</style>
""")

<div class="usecase-title">Population and Job Forecasting</div>

<div class="usecase-authors"><b>Authored by: </b>Rhutuvaruni Kharade and Tharusha Chao</div>

<div class="usecase-duration"><b>Duration:</b> 100 mins</div>

<div class="usecase-level-skill">
    <div class="usecase-level"><b>Level: </b>Intermediate</div>
    <div class="usecase-skill"><b>Pre-requisite Skills: </b>Python, PowerBI, Tableau</div>
</div>

<div class="usecase-section-header">Scenario</div>

<b>As a citizen and a job seeker, I want to find a job in a field that is currently in high demand in this area.</b>
Job seekers often want to know which jobs are in high or low demand, which industries will have more jobs in the future, and which areas have the most jobs in a particular industry. This helps them decide which jobs to study for and which areas have the highest concentration of those jobs, so they can plan where to live. By understanding the number of jobs in each area, job seekers can improve their chances of finding work close to where they live.

<b>As a business owner, I want to establish my business where there is high demand for my service. I want to make sure that the area I choose has a large population so I can attract more customers and potential candidates to work for my company.</b>
Business owners need to know where to establish their company, and the population of an area is a key factor in that decision. A large local population can motivate business owners to set up in that location. Such areas also attract job seekers, which helps businesses find the right employees.



<div class="usecase-section-header">What this use case will teach you</div>

At the end of this use case you will be able to:
- Load the data into a pandas DataFrame and save it for further use.
- Clean, transform, analyze and visualize data, and report findings effectively.
- Create effective visualizations such as scatterplots, heatmaps and histograms to help technical and non-technical readers understand the data.
- Use dashboards for effective storytelling to both technical and non-technical audiences.
- Use version control tools to collaborate on and contribute to the project.


<div class="usecase-section-header">Introduction</div>

This project focuses on the relationship between jobs and population. Jobs in different industries affect the population of an area. This project examines that effect and gives the reader a brief idea of how population changes with the number of jobs in a particular area. The data is sourced from City of Melbourne Open Data, which is openly available, and is used throughout this project. The datasets will be cleaned, transformed, analyzed and visualized, and the relevant insights will be reported and documented. These findings can help stakeholders, policy makers and other readers with further decision making.


<div class="usecase-section-header">Datasets Used </div>

<div class="usecase-section-header">Roadmap</div>

<div class="usecase-section-header">Importing libraries </div>

In [11]:
# importing libraries 
import pandas as pd 
import seaborn as sns 
import numpy as np 
import requests
import os 


<div class="usecase-section-header">Connecting to Dataset and Testing </div>

In [12]:
# # aim : api stuff, creating requests and parsing json 
# # job data - City of Melbourne Jobs Forecasts by Small Area 2021-2041

# # Function to get data from website using API
# def get_data(base, data_url, offset=0):    
#     # Set the filters: limit retrieves 50 rows at a time, offset says where to start data collection
#     filters = f'records?limit=50&offset={offset}&timezone=UTC'
#     # Make the url from the base, data url and filters variables
#     url = f'{base}{data_url}/{filters}'
#     # Use the requests library to get the data
#     result = requests.get(url)
#     # Check that the request worked, status code 200 = successful
#     if result.status_code == 200:
#         # Parse the JSON response into a Python dictionary
#         result_json = result.json()
#         # Store a variable of max_results with total of dataset
#         max_results = result_json['total_count']
#         # Save the results key data to a list variable
#         records = result_json['results']
#     else:
#         # If data is not collected correctly return the error
#         print("ERROR GETTING DATA: ", result.status_code)
#         max_results = 0
#         records = []
#     # At end of function, return the json results in records, max_results count and offset
#     return [records, max_results, offset]


# # Collect data from API
# # Set offset increment
# OFFSET_INCREMENT = 50
# # Base url (this should be the same for all datasets)
# BASE_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
# # Set specific url 
# SPECIFIC_PATH = 'city-of-melbourne-jobs-forecasts-by-small-area-2020-2040'
# # Call the get data function, passing in variables above, save to result
# result = get_data(BASE_URL, SPECIFIC_PATH)
# # Save the records data returned in the get_data function to records list variable
# records = result[0]
# # Save the dataset size data returned in the get_data function to max_results variable
# max_results = result[1] 
# # Increase the offset returned in the get_data function (result[2]) by the offset increment
# offset = result[2] + OFFSET_INCREMENT
# # Compare the number of records collected against the max_results variable
# # While fewer records than max_results have been collected, keep requesting pages
# while len(records) < max_results:
#     # Call the get data function again, passing in url, specific path and new offset value
#     data = get_data(BASE_URL, SPECIFIC_PATH, offset)
#     # Stop if a request fails or returns no rows, otherwise the loop would never end
#     if not data[0]:
#         break
#     # Add the data collected to the existing records list
#     records += data[0]
#     # Increase the offset by the offset increment
#     offset += OFFSET_INCREMENT
# # Convert the records list of dictionaries into a pandas dataframe 
# job = pd.DataFrame(records)
# job

In [13]:

# # Collect data from API
# # Set offset increment
# OFFSET_INCREMENT = 50
# # Base url (this should be the same for all datasets)
# BASE_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
# # Set specific url 
# SPECIFIC_PATH = 'city-of-melbourne-population-forecasts-by-small-area-2020-2040'
# # Call the get data function, passing in variables above, save to result
# result = get_data(BASE_URL, SPECIFIC_PATH)
# # Save the records data returned in the get_data function to records list variable
# records = result[0]
# # Save the dataset size data returned in the get_data function to max_results variable
# max_results = result[1] 
# # Increase the offset returned in the get_data function (result[2]) by the offset increment
# offset = result[2] + OFFSET_INCREMENT
# # Compare the number of records collected against the max_results variable
# # While fewer records than max_results have been collected, keep requesting pages
# while len(records) < max_results:
#     # Call the get data function again, passing in url, specific path and new offset value
#     data = get_data(BASE_URL, SPECIFIC_PATH, offset)
#     # Stop if a request fails or returns no rows, otherwise the loop would never end
#     if not data[0]:
#         break
#     # Add the data collected to the existing records list
#     records += data[0]
#     # Increase the offset by the offset increment
#     offset += OFFSET_INCREMENT
# # Convert the records list of dictionaries into a pandas dataframe 
# pop = pd.DataFrame(records)
# pop
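
The commented-out download cells above repeat the same paging loop for each dataset. A minimal sketch of how that loop could be shared is shown below. The names `PageFetcher`, `make_http_fetcher` and `fetch_all` are hypothetical helpers introduced for illustration and are not part of the original notebook; the `results` and `total_count` keys are the ones read by `get_data` above.

```python
from typing import Callable, List, Tuple

# Hypothetical type alias: a page fetcher takes (offset, limit) and returns
# (records_on_this_page, total_count_reported_by_the_api).
PageFetcher = Callable[[int, int], Tuple[List[dict], int]]


def make_http_fetcher(base_url: str, dataset: str) -> PageFetcher:
    """Build a page fetcher for the records endpoint used in this notebook."""
    def fetch(offset: int, limit: int) -> Tuple[List[dict], int]:
        import requests  # imported lazily so the paging logic runs without it
        url = f"{base_url}{dataset}/records"
        params = {"limit": limit, "offset": offset, "timezone": "UTC"}
        response = requests.get(url, params=params, timeout=30)
        if response.status_code != 200:
            # Report the failure and return an empty page so the caller stops
            print("ERROR GETTING DATA:", response.status_code)
            return [], 0
        payload = response.json()
        return payload["results"], payload["total_count"]
    return fetch


def fetch_all(fetch_page: PageFetcher, page_size: int = 50) -> List[dict]:
    """Collect pages until the reported total is reached or a page is empty."""
    records: List[dict] = []
    offset = 0
    while True:
        page, total = fetch_page(offset, page_size)
        if not page:
            break  # failed request or no more rows
        records.extend(page)
        offset += page_size
        if len(records) >= total:
            break
    return records
```

Under these assumptions, each dataset could be loaded with a call such as `pd.DataFrame(fetch_all(make_http_fetcher(BASE_URL, SPECIFIC_PATH)))`. Passing the fetcher in as an argument also lets the paging logic be checked with a stub function, without calling the API.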

In [14]:
job = pd.read_csv("city-of-melbourne-jobs-forecasts-by-small-area-2020-2040.csv")
job

Unnamed: 0,Geography,Year,Category,Industry Space Use,Value
0,City of Melbourne,2023,Jobs by industry,Accommodation,10286
1,City of Melbourne,2026,Jobs by industry,Accommodation,11631
2,City of Melbourne,2032,Jobs by industry,Accommodation,13207
3,City of Melbourne,2034,Jobs by industry,Accommodation,13420
4,City of Melbourne,2035,Jobs by industry,Accommodation,13529
...,...,...,...,...,...
9109,West Melbourne (Residential),2025,Jobs by space use,Total jobs,5454
9110,West Melbourne (Residential),2026,Jobs by space use,Total jobs,5618
9111,West Melbourne (Residential),2029,Jobs by space use,Total jobs,6118
9112,West Melbourne (Residential),2033,Jobs by space use,Total jobs,6717


In [15]:
job.head()

Unnamed: 0,Geography,Year,Category,Industry Space Use,Value
0,City of Melbourne,2023,Jobs by industry,Accommodation,10286
1,City of Melbourne,2026,Jobs by industry,Accommodation,11631
2,City of Melbourne,2032,Jobs by industry,Accommodation,13207
3,City of Melbourne,2034,Jobs by industry,Accommodation,13420
4,City of Melbourne,2035,Jobs by industry,Accommodation,13529


In [16]:
job.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9114 entries, 0 to 9113
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Geography           9114 non-null   object
 1   Year                9114 non-null   int64 
 2   Category            9114 non-null   object
 3   Industry Space Use  9114 non-null   object
 4   Value               9114 non-null   int64 
dtypes: int64(2), object(3)
memory usage: 356.1+ KB


In [17]:
job.shape

(9114, 5)

In [38]:
job = job.rename(columns = {"Geography":"geography", "Year":"year", "Category":"category", "Industry Space Use": "industry_space_use", "Value": "value"} )

In [28]:
job.geography.unique()


array(['City of Melbourne', 'Carlton', 'Docklands', 'East Melbourne',
       'Kensington', 'Melbourne (CBD)', 'Melbourne (Remainder)',
       'North Melbourne', 'Parkville', 'Port Melbourne', 'South Yarra',
       'Southbank', 'West Melbourne (Industrial)',
       'West Melbourne (Residential)'], dtype=object)

In [36]:
job.year.unique()


array([2023, 2026, 2032, 2034, 2035, 2021, 2025, 2037, 2040, 2027, 2028,
       2036, 2041, 2029, 2031, 2038, 2024, 2030, 2039, 2022, 2033],
      dtype=int64)
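
The years come back in the order they first appear in the file, so it is hard to see at a glance whether any year is missing. The sketch below checks the coverage using the values copied from the output above; in the notebook itself the same check could presumably be run on `job.year` directly.

```python
import pandas as pd

# Year values copied from the job.year.unique() output above.
years = pd.Series([2023, 2026, 2032, 2034, 2035, 2021, 2025, 2037, 2040, 2027,
                   2028, 2036, 2041, 2029, 2031, 2038, 2024, 2030, 2039, 2022,
                   2033])

first_year = int(years.min())
last_year = int(years.max())

# Every year between the first and last should appear at least once.
expected = set(range(first_year, last_year + 1))
missing = sorted(expected - set(years.tolist()))

print(f"Forecast years run from {first_year} to {last_year}")
print("Missing years:", missing)
```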

In [39]:
job.category.unique()

array(['Jobs by industry', 'Jobs by space use'], dtype=object)

<div class="usecase-section-header">About the Jobs Forecast Dataset</div>

This dataset provides jobs forecasts by single year for 2021 to 2041. Prepared by SGS Economics and Planning (Jan-Jun 2022), forecasts are available for the municipality and small areas, as well as by industry and space use type.

The dataset contains the following variables (features): 
<ul>
    <li> <b>geography:</b> Geographical area (Melbourne LGA or small areas used for the City of Melbourne's CLUE analysis). Small areas mostly correspond to traditional suburb boundaries. This is a categorical variable of type <b>object</b>. It takes one of the following values: <b>
    'City of Melbourne', 'Carlton', 'Docklands', 'East Melbourne',
       'Kensington', 'Melbourne (CBD)', 'Melbourne (Remainder)',
       'North Melbourne', 'Parkville', 'Port Melbourne', 'South Yarra',
       'Southbank', 'West Melbourne (Industrial)',
       'West Melbourne (Residential)'</b>
    </li><hr>
    <li>
        <b>year:</b> The year the forecast applies to. This is a numerical variable of type <b>int64</b>. Years range from <b>2021 to 2041</b>.
    </li><hr>
    <li>
        <b>category:</b> The type of breakdown used for the forecast. This is a categorical variable of type <b>object</b>. It takes two values, <b>'Jobs by industry'</b> and <b>'Jobs by space use'</b>; this use case only looks at jobs by industry.
    </li><hr>
    <li>
        <b>industry_space_use:</b> The industry or space use type that the forecast refers to, for example 'Accommodation' or 'Total jobs'. This is a categorical variable of type <b>object</b>.
    </li><hr>
    <li>
        <b>value:</b> The forecast number of jobs for the given geography, year and category. This is a numerical variable of type <b>int64</b>.
    </li><hr>
</ul>
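
Because this use case only looks at the industry breakdown, the space use rows need to be filtered out before analysis. The sketch below shows one way this could be done. It uses a small sample built from rows shown in the table output earlier, with the renamed column names; `sample` and `industry_jobs` are illustrative names, and in the notebook the same filter would be applied to `job`.

```python
import pandas as pd

# Sample rows copied from the job table shown earlier, using the renamed columns.
sample = pd.DataFrame({
    "geography": ["City of Melbourne", "City of Melbourne",
                  "West Melbourne (Residential)", "West Melbourne (Residential)"],
    "year": [2023, 2026, 2025, 2026],
    "category": ["Jobs by industry", "Jobs by industry",
                 "Jobs by space use", "Jobs by space use"],
    "industry_space_use": ["Accommodation", "Accommodation",
                           "Total jobs", "Total jobs"],
    "value": [10286, 11631, 5454, 5618],
})

# Keep only the rows that break jobs down by industry.
industry_jobs = sample[sample["category"] == "Jobs by industry"].reset_index(drop=True)

print(industry_jobs)
```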

<div class="usecase-section-header">References</div>
<ul>
    <li> https://pandas.pydata.org/docs/reference/api/pandas.unique.html </li>
</ul>

<h3> Rhuth's Work Above </h3>
<h1>MAIN SECTION </h1> 
<h3> Tharusha's Work Below </h3>

