# **Analyzing and Predicting Business Activity for Cafes and Restaurants in Melbourne**

**Authored by:** Sachitha Sadeepa Kasthuriarachchi

**Duration:** TBD

**Level:** Beginner/Intermediate

**Pre-requisite Skills:** Python Programming, Jupyter Notebooks, Power BI, Data Analysis, Machine Learning and Geospatial Analysis.


**Background:**

I am planning to open a new cafe in Melbourne and want to ensure that my business meets customer demands and adheres to area-specific trends. To help with this, I am using a comprehensive dataset of existing cafes and restaurants, which includes information about their seating capacity, location, industry type, and other relevant factors.

**Objectives:**

1. Distribution Analysis: Understand the distribution of cafes and restaurants across different areas of Melbourne.

2. Seating Capacity Analysis: Compare the seating capacity for indoor and outdoor seating.
   
3. Industry Analysis: Explore the prevalence of different industries within the cafes and restaurants sector.
     
4. Geospatial Analysis: Visualize the geographic distribution of cafes and restaurants across Melbourne.
    
5. Trend Analysis: Identify trends in business activity over different census years.

6. Prediction Task: Develop a predictive model to forecast the number of seats based on various features.
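The aggregations behind objectives 1, 2, 3 and 5 can be sketched in pandas. This is a minimal illustration on a hypothetical toy frame that reuses column names from the dataset (`clue_small_area`, `trading_name`, `industry_anzsic4_description`, `seating_type`, `number_of_seats`, `census_year`). Counting distinct trading names is an assumed proxy for the number of businesses, and the notebook's own analysis may differ.

```python
import pandas as pd

# Hypothetical toy rows that mimic the dataset's columns (values are illustrative)
sample = pd.DataFrame({
    'census_year': [2016, 2017, 2017, 2017, 2017],
    'clue_small_area': ['Melbourne (CBD)', 'Melbourne (CBD)', 'Melbourne (CBD)',
                        'Carlton', 'Carlton'],
    'trading_name': ['Cafe A', 'Transport Hotel', 'Transport Hotel', 'Cafe B', 'Cafe C'],
    'industry_anzsic4_description': ['Cafes and Restaurants', 'Pubs, Taverns and Bars',
                                     'Pubs, Taverns and Bars', 'Takeaway Food Services',
                                     'Cafes and Restaurants'],
    'seating_type': ['Seats - Indoor', 'Seats - Indoor', 'Seats - Outdoor',
                     'Seats - Outdoor', 'Seats - Indoor'],
    'number_of_seats': [40, 230, 120, 4, 30],
})

# Restrict snapshot analyses to one census year so businesses are not double counted
latest = sample[sample['census_year'] == sample['census_year'].max()]

# 1. Distribution: distinct trading names per CLUE small area (assumed business proxy)
businesses_per_area = latest.groupby('clue_small_area')['trading_name'].nunique()

# 2. Seating capacity: mean seats per area, split into indoor and outdoor columns
seats_by_type = latest.pivot_table(index='clue_small_area', columns='seating_type',
                                   values='number_of_seats', aggfunc='mean')

# 3. Industry: distinct trading names per ANZSIC industry description
businesses_per_industry = latest.groupby('industry_anzsic4_description')['trading_name'].nunique()

# 5. Trend: total seats recorded in each census year (uses every year)
seats_per_year = sample.groupby('census_year')['number_of_seats'].sum()

print(businesses_per_area, seats_by_type, businesses_per_industry, seats_per_year, sep='\n\n')
```

On the real data, the same calls would run against the `df` loaded in Section 1, ideally after checking how chains that share a trading name should be counted.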

**Scenario:**

I am planning to open a new cafe in Melbourne. To ensure that my business is well-prepared to meet customer demands, I use data analysis and predictive analytics to gain valuable insights and make informed decisions.

1. Distribution Analysis:  
     - I start by visualizing the concentration of cafes and restaurants in various areas of Melbourne.
     - I learn that certain areas, like the Melbourne CBD, have a high density of cafes and restaurants, which might indicate a competitive environment but also a large customer base.
2. Seating Capacity Analysis:  
     - I analyze the average seating capacity for indoor and outdoor seating in different areas.
     - This helps me decide whether to focus more on indoor or outdoor seating based on popular trends in my chosen location.
3. Industry Analysis:
     - I explore the prevalent types of businesses (e.g., pubs, takeaway food services) in my desired area.
     - I discover that there is a higher concentration of takeaway food services, prompting me to consider whether I should incorporate a similar model or differentiate my cafe with unique offerings.      
4. Geospatial Analysis:
     - I use a map to highlight the geographic distribution of existing cafes and restaurants.
     - I identify potential hotspots for my cafe where there is a demand for more dining options.
5. Trend Analysis:
     - I look at trends in business activity over recent years, highlighting growth areas and emerging neighborhoods.
     - I notice an upward trend in new cafes opening in the suburbs, indicating a potential opportunity for less competition and growing customer interest.
6. Prediction Task:
     - Using a predictive model, I estimate a suitable number of seats for my cafe based on my chosen location, industry type, and other relevant factors.
     - For example, the model might suggest around 50 indoor seats and 20 outdoor seats, in line with comparable businesses in that area.
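The scenario does not name a specific model. As a hedged starting point, a grouped-median baseline predicts the seat count for an (area, industry, seating type) combination from comparable businesses, falling back to the median for that seating type when the combination is unseen. The helper names `fit_median_baseline` and `predict_seats` and the toy rows are hypothetical; this is a baseline any trained model should beat, not the author's model.

```python
import pandas as pd

# Hypothetical training rows using the dataset's column names (values are illustrative)
train = pd.DataFrame({
    'clue_small_area': ['Carlton', 'Carlton', 'Carlton', 'Melbourne (CBD)', 'Melbourne (CBD)'],
    'industry_anzsic4_description': ['Cafes and Restaurants'] * 5,
    'seating_type': ['Seats - Indoor', 'Seats - Indoor', 'Seats - Outdoor',
                     'Seats - Indoor', 'Seats - Indoor'],
    'number_of_seats': [30, 50, 12, 60, 80],
})

KEYS = ['clue_small_area', 'industry_anzsic4_description', 'seating_type']

def fit_median_baseline(frame):
    """Median seats per (area, industry, seating type), plus a per-seating-type fallback."""
    by_group = frame.groupby(KEYS)['number_of_seats'].median()
    by_type = frame.groupby('seating_type')['number_of_seats'].median()
    return by_group, by_type

def predict_seats(by_group, by_type, area, industry, seating_type):
    """Return the group median, or the seating-type median for unseen combinations."""
    key = (area, industry, seating_type)
    if key in by_group.index:
        return float(by_group.loc[key])
    return float(by_type.loc[seating_type])

by_group, by_type = fit_median_baseline(train)
print(predict_seats(by_group, by_type, 'Carlton', 'Cafes and Restaurants', 'Seats - Indoor'))
print(predict_seats(by_group, by_type, 'Docklands', 'Cafes and Restaurants', 'Seats - Indoor'))
```

The median is used rather than the mean because seat counts are skewed by a few very large venues.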

This scenario demonstrates how data analysis and predictive modeling can provide practical support for launching a successful cafe in Melbourne, helping me meet customer demands and optimize business performance.


**Benefits:**

1. **Informed Decision-Making:** I use the insights provided by the data analysis to make data-driven decisions about my new cafe, reducing the risk of underestimating or overestimating seating capacity.

2. **Competitive Advantage:** By understanding local trends and customer preferences, I can tailor my business strategy to stand out in a competitive market.

3. **Resource Optimization:** The predictive model helps me allocate resources effectively, ensuring I invest appropriately in indoor and outdoor seating based on demand forecasts.

## 1. Importing Data

Below are the links to the four datasets used in this use case.

[Data Set 1](https://data.melbourne.vic.gov.au/explore/dataset/cafes-and-restaurants-with-seating-capacity/information/) **Café, restaurant, bistro seats.** This dataset contains **60055** records.

[Data Set 2](https://data.melbourne.vic.gov.au/explore/dataset/blocks-for-census-of-land-use-and-employment-clue/information/?location=13,-37.80246,144.94417&basemap=mbs-7a7333) **Blocks for Census of Land Use and Employment (CLUE).**

[Data Set 3](https://data.melbourne.vic.gov.au/explore/dataset/employment-by-block-by-clue-industry/information/) **Jobs per CLUE industry for blocks.**

[Data Set 4](https://data.melbourne.vic.gov.au/explore/dataset/floor-space-by-use-by-block/information/) **Floor space per space use for blocks.**
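Each portal link above contains the dataset identifier that the API expects: the path segment after `/explore/dataset/`. The sketch below extracts those identifiers and builds the v2.1 export URLs in the same form that `collect_data` uses. The helper names `dataset_id_from_link` and `export_url` are hypothetical.

```python
from urllib.parse import urlparse

# Base URL copied from collect_data
BASE_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'

# Portal links for the four datasets listed above
PORTAL_LINKS = [
    'https://data.melbourne.vic.gov.au/explore/dataset/cafes-and-restaurants-with-seating-capacity/information/',
    'https://data.melbourne.vic.gov.au/explore/dataset/blocks-for-census-of-land-use-and-employment-clue/information/?location=13,-37.80246,144.94417&basemap=mbs-7a7333',
    'https://data.melbourne.vic.gov.au/explore/dataset/employment-by-block-by-clue-industry/information/',
    'https://data.melbourne.vic.gov.au/explore/dataset/floor-space-by-use-by-block/information/',
]

def dataset_id_from_link(link):
    """Return the path segment that follows /explore/dataset/ (query string ignored)."""
    parts = [p for p in urlparse(link).path.split('/') if p]
    return parts[parts.index('dataset') + 1]

def export_url(dataset_id, export_format='csv'):
    """Build the export URL in the same form as collect_data."""
    return f'{BASE_URL}{dataset_id}/exports/{export_format}'

dataset_ids = [dataset_id_from_link(link) for link in PORTAL_LINKS]
for dataset_id in dataset_ids:
    print(export_url(dataset_id))
```

Each identifier can then be passed to `collect_data`, for example in a dictionary comprehension keyed by dataset ID.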


### 1.1 Import the Dataset through the API

```python
import requests
import pandas as pd
from io import StringIO

# Function to collect a dataset from the City of Melbourne Open Data API
def collect_data(dataset_id):
    base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
    # apikey = api_key  # use if the dataset requires API key permissions
    export_format = 'csv'  # avoids shadowing the built-in format()

    url = f'{base_url}{dataset_id}/exports/{export_format}'
    params = {
        'select': '*',
        'limit': -1,  # -1 exports all records
        'lang': 'en',
        'timezone': 'UTC',
        # 'api_key': apikey  # use if the dataset requires API key permissions
    }

    # GET request
    response = requests.get(url, params=params)

    if response.status_code == 200:
        # Decode the response and wrap it in StringIO so pandas can read it like a file;
        # the CSV export separates fields with semicolons
        url_content = response.content.decode('utf-8')
        dataset = pd.read_csv(StringIO(url_content), delimiter=';')
        return dataset
    else:
        print(f'Request failed with status code {response.status_code}')
        # Return None explicitly so callers can detect a failed request
        return None
```
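`collect_data` passes `delimiter=';'` because the export separates fields with semicolons. As a result, commas inside fields such as `business_address` are read as ordinary characters. The offline sketch below parses a small payload in that shape; the Altius Coffee Brewers row is taken from the preview further down, and the Cafe X row is hypothetical.

```python
import pandas as pd
from io import StringIO

# Payload shaped like the API's CSV export: ';' separates fields,
# while business_address values contain ordinary commas
payload = (
    'census_year;trading_name;business_address;number_of_seats\n'
    '2017;Altius Coffee Brewers;Shop , Ground , 517 Flinders Lane MELBOURNE 3000;4\n'
    '2017;Cafe X;Level 1, 100 Example Lane MELBOURNE 3000;25\n'  # hypothetical row
)

# Same call pattern as in collect_data
parsed = pd.read_csv(StringIO(payload), delimiter=';')
print(parsed.shape)
print(parsed.loc[0, 'business_address'])
```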



### 1.2 Call the function to collect the dataset

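```python
# Dataset identifier used in the API call
dataset_id = 'cafes-and-restaurants-with-seating-capacity'
# Save the dataset to the df variable
df = collect_data(dataset_id)
# Stop early if the request failed (collect_data returns None)
if df is None:
    raise RuntimeError(f'Could not download dataset {dataset_id}')
# Check the number of records in df
print(f'The dataset contains {len(df)} records.')
# View the first three rows
df.head(3)
```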

The dataset contains 60055 records.


| | census_year | block_id | property_id | base_property_id | building_address | clue_small_area | trading_name | business_address | industry_anzsic4_code | industry_anzsic4_description | seating_type | number_of_seats | longitude | latitude | location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017 | 6 | 578324 | 573333 | 2 Swanston Street MELBOURNE 3000 | Melbourne (CBD) | Transport Hotel | Tenancy 29, Ground , 2 Swanston Street MELBOUR... | 4520 | Pubs, Taverns and Bars | Seats - Indoor | 230 | 144.969942 | -37.817778 | -37.817777826050005, 144.96994164279243 |
| 1 | 2017 | 6 | 578324 | 573333 | 2 Swanston Street MELBOURNE 3000 | Melbourne (CBD) | Transport Hotel | Tenancy 29, Ground , 2 Swanston Street MELBOUR... | 4520 | Pubs, Taverns and Bars | Seats - Outdoor | 120 | 144.969942 | -37.817778 | -37.817777826050005, 144.96994164279243 |
| 2 | 2017 | 11 | 103957 | 103957 | 517-537 Flinders Lane MELBOURNE 3000 | Melbourne (CBD) | Altius Coffee Brewers | Shop , Ground , 517 Flinders Lane MELBOURNE 3000 | 4512 | Takeaway Food Services | Seats - Outdoor | 4 | 144.956486 | -37.819875 | -37.819875445799994, 144.95648638781466 |
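For the geospatial objective, the `longitude` and `latitude` columns in this preview can already reveal dense areas before any map is drawn. A minimal sketch, assuming a square grid of 0.01 degrees (roughly 1 km in Melbourne) as the hotspot unit: snap each point to an integer grid cell and count businesses per cell. The first two points come from the preview; the other two are hypothetical. On the real data, filter to one census year and drop repeated coordinates first, since a business appears once per seating type.

```python
import numpy as np
import pandas as pd

# The first two points come from the preview above; Cafe X and Cafe Y are hypothetical
points = pd.DataFrame({
    'trading_name': ['Transport Hotel', 'Altius Coffee Brewers', 'Cafe X', 'Cafe Y'],
    'longitude': [144.969942, 144.956486, 144.9655, 144.9402],
    'latitude': [-37.817778, -37.819875, -37.8150, -37.8001],
})

CELL_DEG = 0.01  # assumed cell size in degrees

# Integer cell indices avoid grouping on floating-point keys
points['cell_x'] = np.floor(points['longitude'] / CELL_DEG).astype(int)
points['cell_y'] = np.floor(points['latitude'] / CELL_DEG).astype(int)

# Count businesses per cell, densest cells first
hotspots = (points.groupby(['cell_y', 'cell_x'])
            .size()
            .sort_values(ascending=False)
            .rename('businesses'))
print(hotspots)
```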


## 2. Preprocessing Data
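As the preview shows, the Transport Hotel appears twice in 2017, once per seating type. One possible preprocessing step, sketched here, reshapes the data so that each business occupies one row with separate indoor and outdoor seat columns. The business key (year, property, trading name, area, industry) and the rule that a missing seating type means zero seats are both assumptions to verify against the real data.

```python
import pandas as pd

# Rows copied from the df.head(3) preview, reduced to the columns needed here
rows = pd.DataFrame({
    'census_year': [2017, 2017, 2017],
    'property_id': [578324, 578324, 103957],
    'trading_name': ['Transport Hotel', 'Transport Hotel', 'Altius Coffee Brewers'],
    'clue_small_area': ['Melbourne (CBD)'] * 3,
    'industry_anzsic4_description': ['Pubs, Taverns and Bars', 'Pubs, Taverns and Bars',
                                     'Takeaway Food Services'],
    'seating_type': ['Seats - Indoor', 'Seats - Outdoor', 'Seats - Outdoor'],
    'number_of_seats': [230, 120, 4],
})

# Assumed business key: one business per (year, property, trading name, area, industry)
KEY = ['census_year', 'property_id', 'trading_name',
       'clue_small_area', 'industry_anzsic4_description']

# One row per business; a missing seating type is assumed to mean zero seats
wide = (rows.pivot_table(index=KEY, columns='seating_type',
                         values='number_of_seats', aggfunc='sum', fill_value=0)
            .rename(columns={'Seats - Indoor': 'indoor_seats',
                             'Seats - Outdoor': 'outdoor_seats'})
            .reset_index())
wide.columns.name = None
print(wide)
```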
